Science.gov

Sample records for probabilistic database search

  1. MSblender: a probabilistic approach for integrating peptide identifications from multiple database search engines

    PubMed Central

    Kwon, Taejoon; Choi, Hyungwon; Vogel, Christine; Nesvizhskii, Alexey I.; Marcotte, Edward M.

    2011-01-01

    Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers from limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for all possible PSMs and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. Increased identifications increment spectral counts for all detected proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that enhanced quantification contributes to improved sensitivity in differential expression analyses. PMID:21488652
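
    As an illustration of the workflow this record describes (not MSblender's actual model, which explicitly accounts for correlation between engine scores), the following Python sketch combines per-engine PSM probabilities under a deliberately naive independence assumption and estimates the false discovery rate by target-decoy counting. All names and numbers are hypothetical.

        def combine_probabilities(engine_probs):
            # Probability that at least one engine's assignment is correct,
            # under a (naive) independence assumption between engines.
            p_all_wrong = 1.0
            for p in engine_probs:
                p_all_wrong *= (1.0 - p)
            return 1.0 - p_all_wrong

        def fdr_at_threshold(psms, threshold):
            # psms: list of (combined_probability, is_decoy) pairs.
            accepted = [is_decoy for p, is_decoy in psms if p >= threshold]
            n_decoy = sum(accepted)
            n_target = len(accepted) - n_decoy
            return n_decoy / max(n_target, 1)

        psms = [(combine_probabilities(ps), is_decoy) for ps, is_decoy in [
            ([0.95, 0.80], False), ([0.40, 0.10], True),
            ([0.70, 0.90], False), ([0.20, 0.30], True),
        ]]
        print(fdr_at_threshold(psms, 0.90))  # 0.0 for this toy data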

  2. MSblender: A probabilistic approach for integrating peptide identifications from multiple database search engines.

    PubMed

    Kwon, Taejoon; Choi, Hyungwon; Vogel, Christine; Nesvizhskii, Alexey I; Marcotte, Edward M

    2011-07-01

    Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers from limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for every possible PSM and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. Increased identifications increment spectral counts for most proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that enhanced quantification contributes to improved sensitivity in differential expression analyses.

  3. Optimal probabilistic search

    SciTech Connect

    Lokutsievskiy, Lev V

    2011-05-31

    This paper is concerned with the optimal search for an object at rest whose exact position in n-dimensional space is unknown. A necessary condition for optimality of a trajectory is obtained. An explicit form of a differential equation for an optimal trajectory is found for searches over R-strongly convex sets. An existence theorem is also established. Bibliography: 8 titles.

  4. Online Database Searching Workbook.

    ERIC Educational Resources Information Center

    Littlejohn, Alice C.; Parker, Joan M.

    Designed primarily for use by first-time searchers, this workbook provides an overview of online searching. Following a brief introduction which defines online searching, databases, and database producers, five steps in carrying out a successful search are described: (1) identifying the main concepts of the search statement; (2) selecting a…

  5. Begin: Online Database Searching Now!

    ERIC Educational Resources Information Center

    Lodish, Erica K.

    1986-01-01

    Because of the increasing importance of online databases, school library media specialists are encouraged to introduce students to online searching. Four books that would help media specialists gain a basic background are reviewed and it is noted that although they are very technical, they can be adapted to individual needs. (EM)

  6. Searching NCBI Databases Using Entrez.

    PubMed

    Gibney, Gretchen; Baxevanis, Andreas D

    2011-10-01

    One of the most widely used interfaces for the retrieval of information from biological databases is the NCBI Entrez system. Entrez capitalizes on the fact that there are pre-existing, logical relationships between the individual entries found in numerous public databases. The existence of such natural connections, mostly biological in nature, argued for the development of a method through which all the information about a particular biological entity could be found without having to sequentially visit and query disparate databases. Two basic protocols describe simple, text-based searches, illustrating the types of information that can be retrieved through the Entrez system. An alternate protocol builds upon the first basic protocol, using additional, built-in features of the Entrez system, and providing alternative ways to issue the initial query. The support protocol reviews how to save frequently issued queries. Finally, Cn3D, a structure visualization tool, is also discussed.
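
    The Entrez queries described in these protocols can also be issued programmatically. Below is a minimal sketch using Biopython's Bio.Entrez module (an assumption of this example; the record itself describes the web interface). It requires the biopython package, network access, and a contact email address per NCBI policy; the query term is invented for illustration.

        from Bio import Entrez

        Entrez.email = "you@example.org"  # NCBI asks for a contact address

        # ESearch: find protein records matching a text query.
        handle = Entrez.esearch(db="protein",
                                term="BRCA1[Gene Name] AND human[Organism]",
                                retmax=5)
        result = Entrez.read(handle)
        handle.close()
        print(result["IdList"])

        # EFetch: retrieve the first record in FASTA format.
        if result["IdList"]:
            handle = Entrez.efetch(db="protein", id=result["IdList"][0],
                                   rettype="fasta", retmode="text")
            print(handle.read())
            handle.close()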

  7. Searching NCBI databases using Entrez.

    PubMed

    Baxevanis, Andreas D

    2008-12-01

    One of the most widely used interfaces for the retrieval of information from biological databases is the NCBI Entrez system. Entrez capitalizes on the fact that there are pre-existing, logical relationships between the individual entries found in numerous public databases. The existence of such natural connections, mostly biological in nature, argued for the development of a method through which all the information about a particular biological entity could be found without having to sequentially visit and query disparate databases. Two Basic Protocols describe simple, text-based searches, illustrating the types of information that can be retrieved through the Entrez system. An Alternate Protocol builds upon the first Basic Protocol, using additional, built-in features of the Entrez system, and providing alternative ways to issue the initial query. The Support Protocol reviews how to save frequently issued queries. Finally, Cn3D, a structure visualization tool, is also discussed.

  8. Searching NCBI databases using Entrez.

    PubMed

    Gibney, Gretchen; Baxevanis, Andreas D

    2011-06-01

    One of the most widely used interfaces for the retrieval of information from biological databases is the NCBI Entrez system. Entrez capitalizes on the fact that there are pre-existing, logical relationships between the individual entries found in numerous public databases. The existence of such natural connections, mostly biological in nature, argued for the development of a method through which all the information about a particular biological entity could be found without having to sequentially visit and query disparate databases. Two basic protocols describe simple, text-based searches, illustrating the types of information that can be retrieved through the Entrez system. An alternate protocol builds upon the first basic protocol, using additional, built-in features of the Entrez system, and providing alternative ways to issue the initial query. The support protocol reviews how to save frequently issued queries. Finally, Cn3D, a structure visualization tool, is also discussed.

  9. Database Search Engines: Paradigms, Challenges and Solutions.

    PubMed

    Verheggen, Kenneth; Martens, Lennart; Berven, Frode S; Barsnes, Harald; Vaudel, Marc

    2016-01-01

    The first step in identifying proteins from mass spectrometry based shotgun proteomics data is to infer peptides from tandem mass spectra, a task generally achieved using database search engines. In this chapter, the basic principles of database search engines are introduced with a focus on open source software, and the use of database search engines is demonstrated using the freely available SearchGUI interface. This chapter also discusses how to tackle general issues related to sequence database searching and shows how to minimize their impact.

  10. Library Instruction and Online Database Searching.

    ERIC Educational Resources Information Center

    Mercado, Heidi

    1999-01-01

    Reviews changes in online database searching in academic libraries. Topics include librarians conducting all searches; the advent of end-user searching and the need for user instruction; compact disk technology; online public catalogs; the Internet; full text databases; electronic information literacy; user education and the remote library user;…

  11. Online Database Searching in Smaller Public Libraries.

    ERIC Educational Resources Information Center

    Roose, Tina

    1983-01-01

    Online database searching experiences of nine Illinois public libraries--Arlington Heights, Deerfield, Elk Grove Village, Evanston, Glenview, Northbrook, Schaumburg Township, Waukegan, Wilmette--are discussed, noting search costs, user charges, popular databases, library acquisition, interaction with users, and staff training. Three sources are…

  12. Using volume holograms to search digital databases

    NASA Astrophysics Data System (ADS)

    Burr, Geoffrey W.; Maltezos, George; Grawert, Felix; Kobras, Sebastian; Hanssen, Holger; Coufal, Hans J.

    2002-01-01

    Holographic data storage offers the potential for simultaneous search of an entire database by performing multiple optical correlations between stored data pages and a search argument. This content-addressable retrieval produces one analog correlation score for each stored volume hologram. We have previously developed fuzzy encoding techniques for this fast parallel search, and holographically searched a small database with high fidelity. We recently showed that such systems can be configured to produce true inner-products, and proposed an architecture in which massively-parallel searches could be implemented. However, the speed advantage over conventional electronic search provided by parallelism brings with it the possibility of erroneous search results, since these analog correlation scores are subject to various noise sources. We show that the fidelity of such an optical search depends not only on the usual holographic storage signal-to-noise factors (such as readout power, diffraction efficiency, and readout speed), but also on the particular database query being made. In effect, the presence of non-matching database records with nearly the same correlation score as the targeted matching records reduces the speed advantage of the parallel search. Thus for any given fidelity target, the performance improvement offered by a content-addressable holographic storage can vary from query to query even within the same database.

  13. Interactive searching of facial image databases

    NASA Astrophysics Data System (ADS)

    Nicholls, Robert A.; Shepherd, John W.; Shepherd, Jean

    1995-09-01

    A set of psychological facial descriptors has been devised to enable computerized searching of criminal photograph albums. The descriptors have been used to encode image databases of up to twelve thousand images. Using a system called FACES, the databases are searched by translating a witness' verbal description into corresponding facial descriptors. Trials of FACES have shown that this coding scheme is more productive and efficient than searching traditional photograph albums. An alternative method of searching the encoded database using a genetic algorithm is currently being tested. The genetic search method does not require the witness to verbalize a description of the target but merely to indicate a degree of similarity between the target and a limited selection of images from the database. The major drawback of FACES is that it requires manual encoding of images. Research is being undertaken to automate the process; however, it will require an algorithm which can predict human descriptive values. Alternatives to human-derived coding schemes exist using statistical classifications of images. Since databases encoded using statistical classifiers do not have an obvious direct mapping to human-derived descriptors, a search method which does not require the entry of human descriptors is needed. A genetic search algorithm is being tested for such a purpose.

  14. A Probabilistic Approach to Information Retrieval in Systems with Boolean Search Request Formulations.

    ERIC Educational Resources Information Center

    Radecki, Tadeusz

    1982-01-01

    Outlines an approach to information retrieval which integrates the existing theory of probabilistic retrieval into a practical methodology based on Boolean searches. Basic concepts, search methodology, and examples of Boolean searching are noted. Twenty-six sources are appended. (EJS)

  15. Searching gene and protein sequence databases.

    PubMed

    Barsalou, T; Brutlag, D L

    1991-01-01

    A large-scale effort to map and sequence the human genome is now under way. Crucial to the success of this research is a group of computer programs that analyze and compare data on molecular sequences. This article describes the classic algorithms for similarity searching and sequence alignment. Because good performance of these algorithms is critical to searching very large and growing databases, we analyze the running times of the algorithms and discuss recent improvements in this area.
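
    The classic similarity-search algorithms surveyed here belong to the dynamic-programming alignment family. A compact sketch of Smith-Waterman local alignment scoring (unit match/mismatch/gap costs, no affine gaps, best score only) illustrates the quadratic running time the article discusses; the scores and sequences are arbitrary.

        def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
            # H[i][j] holds the best local alignment score ending at a[:i], b[:j].
            rows, cols = len(a) + 1, len(b) + 1
            H = [[0] * cols for _ in range(rows)]
            best = 0
            for i in range(1, rows):
                for j in range(1, cols):
                    diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
                    H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
                    best = max(best, H[i][j])
            return best

        print(smith_waterman("HEAGAWGHEE", "PAWHEAE"))  # small toy example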

  16. Efficient search and retrieval in biometric databases

    NASA Astrophysics Data System (ADS)

    Mhatre, Amit J.; Palla, Srinivas; Chikkerur, Sharat; Govindaraju, Venu

    2005-03-01

    Biometric identification has emerged as a reliable means of controlling access to both physical and virtual spaces. Fingerprints, face and voice biometrics are being increasingly used as alternatives to passwords, PINs and visual verification. In spite of the rapid proliferation of large-scale databases, the research has thus far been focused only on accuracy within small databases. In larger applications, response time and retrieval efficiency also become important in addition to accuracy. Unlike structured information such as text or numeric data that can be sorted, biometric data does not have any natural sorting order. Therefore indexing and binning of biometric databases represents a challenging problem. We present results using parallel combination of multiple biometrics to bin the database. Using hand geometry and signature features we show that the search space can be reduced to just 5% of the entire database.

  17. Multi-Database Searching in Forensic Psychology.

    ERIC Educational Resources Information Center

    Piotrowski, Chris; Perdue, Robert W.

    Traditional library skills have been augmented since the introduction of online computerized database services. Because of the complexity of the field, forensic psychology can benefit enormously from the application of comprehensive bibliographic search strategies. The study reported here demonstrated the bibliographic results obtained when a…

  18. Searching Across the International Space Station Databases

    NASA Technical Reports Server (NTRS)

    Maluf, David A.; McDermott, William J.; Smith, Ernest E.; Bell, David G.; Gurram, Mohana

    2007-01-01

    Data access in the enterprise generally requires us to combine data from different sources and different formats. It is thus advantageous to focus on the intersection of the knowledge across sources and domains; keeping irrelevant knowledge around only serves to make the integration more unwieldy and more complicated than necessary. A context search over multiple domains is proposed in this paper, using context-sensitive queries to support disciplined manipulation of domain knowledge resources. The objective of a context search is to provide the capability for interrogating many domain knowledge resources, which are largely semantically disjoint. The search formally supports the tasks of selecting, combining, extending, specializing, and modifying components from a diverse set of domains. This paper demonstrates a new paradigm in composition of information for enterprise applications. In particular, it discusses an approach to achieving data integration across multiple sources in a manner that does not require heavy investment in database and middleware maintenance. This lean approach to integration leads to cost-effectiveness and scalability of data integration with an underlying schemaless object-relational database management system. This highly scalable, information-on-demand framework, called NX-Search, is an implementation of an information system built on NETMARK. NETMARK is a flexible, high-throughput open database integration framework for managing, storing, and searching unstructured or semi-structured arbitrary XML and HTML that is used widely at the National Aeronautics and Space Administration (NASA) and in industry.

  19. A fuzzy approach for mining association rules in a probabilistic database

    NASA Astrophysics Data System (ADS)

    Pei, Bin; Chen, Dingjie; Zhao, Suyun; Chen, Hong

    2013-07-01

    Association rule mining is an essential knowledge discovery method that can find associations in databases. Previous studies on association rule mining focus on finding quantitative association rules from certain data, or finding Boolean association rules from uncertain data. Unfortunately, due to instrument errors, imprecision of sensor monitoring systems, and so on, real-world data tend to be quantitative data with inherent uncertainty. In our paper, we study the discovery of association rules from a probabilistic database with quantitative attributes. Converting the quantitative attributes into fuzzy sets yields a probabilistic database containing fuzzy sets. This is theoretically challenging, since we need appropriate interest measures to define the support and confidence degrees of fuzzy events with probability. We propose a Shannon-like entropy to measure the information of such an event. After that, an algorithm is proposed to find fuzzy association rules from a probabilistic database. Finally, an illustrative example is given to demonstrate the procedure of the algorithm.
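
    A minimal sketch of one ingredient of this setting, assuming a toy schema: the expected support of a fuzzy item over probabilistic tuples, where each record carries an existence probability and each quantitative value a fuzzy membership degree. The fuzzy set and data below are invented; the paper's Shannon-like entropy measure is not reproduced here.

        def high_temp(t):
            # Hypothetical fuzzy set "temperature is high": ramp from 20 to 35.
            return min(max((t - 20.0) / 15.0, 0.0), 1.0)

        # Each tuple: (temperature reading, tuple existence probability).
        db = [(34.0, 0.9), (22.0, 0.6), (18.0, 1.0), (31.0, 0.4)]

        # Expected support: average of membership degree times tuple probability.
        expected_support = sum(high_temp(t) * p for t, p in db) / len(db)
        print(round(expected_support, 3))  # 0.303 for this toy data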

  20. Searching the NCBI databases using Entrez.

    PubMed

    Baxevanis, Andreas D

    2006-03-01

    One of the most widely-used interfaces for the retrieval of information from biological databases is the NCBI Entrez system. Entrez capitalizes on the fact that there are pre-existing, logical relationships between the individual entries found in numerous public databases. The existence of such natural connections, mostly biological in nature, argued for the development of a method through which all the information about a particular biological entity could be found without having to sequentially visit and query disparate databases. Two Basic Protocols describe simple, text-based searches, illustrating the types of information that can be retrieved through the Entrez system. An Alternate Protocol builds upon the first Basic Protocol, using additional, built-in features of the Entrez system, and providing alternative ways to issue the initial query. The Support Protocol reviews how to save frequently-issued queries. Finally, Cn3D, a structure visualization tool, is also discussed.

  1. Searching the NCBI databases using Entrez.

    PubMed

    Baxevanis, Andreas D

    2006-11-01

    One of the most widely-used interfaces for the retrieval of information from biological databases is the NCBI Entrez system. Entrez capitalizes on the fact that there are pre-existing, logical relationships between the individual entries found in numerous public databases. The existence of such natural connections, mostly biological in nature, argued for the development of a method through which all the information about a particular biological entity could be found without having to sequentially visit and query disparate databases. Two Basic Protocols describe simple, text-based searches, illustrating the types of information that can be retrieved through the Entrez system. An Alternate Protocol builds upon the first Basic Protocol, using additional, built-in features of the Entrez system, and providing alternative ways to issue the initial query. The Support Protocol reviews how to save frequently-issued queries. Finally, Cn3D, a structure visualization tool, is also discussed.

  2. Audio stream classification for multimedia database search

    NASA Astrophysics Data System (ADS)

    Artese, M.; Bianco, S.; Gagliardi, I.; Gasparini, F.

    2013-03-01

    Search and retrieval of huge archives of Multimedia data is a challenging task. A classification step is often used to reduce the number of entries on which to perform the subsequent search. In particular, when new entries of the database are continuously added, a fast classification based on simple threshold evaluation is desirable. In this work we present a CART-based (Classification And Regression Tree [1]) classification framework for audio streams belonging to multimedia databases. The database considered is the Archive of Ethnography and Social History (AESS) [2], which is mainly composed of popular songs and other audio records describing the popular traditions handed down generation by generation, such as traditional fairs, and customs. The peculiarities of this database are that it is continuously updated; the audio recordings are acquired in unconstrained environment; and for the non-expert human user is difficult to create the ground truth labels. In our experiments, half of all the available audio files have been randomly extracted and used as training set. The remaining ones have been used as test set. The classifier has been trained to distinguish among three different classes: speech, music, and song. All the audio files in the dataset have been previously manually labeled into the three classes above defined by domain experts.

  3. A Bayesian network approach to the database search problem in criminal proceedings

    PubMed Central

    2012-01-01

    Background The ‘database search problem’, that is, the strengthening of a case (in terms of probative value) against an individual who is found as a result of a database search, has been approached during the last two decades with substantial mathematical analyses, accompanied by lively debate and centrally opposing conclusions. This represents a challenging obstacle in teaching but also hinders a balanced and coherent discussion of the topic within the wider scientific and legal community. This paper revisits and tracks the associated mathematical analyses in terms of Bayesian networks. Their derivation and discussion for capturing probabilistic arguments that explain the database search problem are outlined in detail. The resulting Bayesian networks offer a distinct view on the main debated issues, along with further clarity. Methods As a general framework for representing and analyzing formal arguments in probabilistic reasoning about uncertain target propositions (that is, whether or not a given individual is the source of a crime stain), this paper relies on graphical probability models, in particular, Bayesian networks. This graphical probability modeling approach is used to capture, within a single model, a series of key variables, such as the number of individuals in a database, the size of the population of potential crime stain sources, and the rarity of the corresponding analytical characteristics in a relevant population. Results This paper demonstrates the feasibility of deriving Bayesian network structures for analyzing, representing, and tracking the database search problem. The output of the proposed models can be shown to agree with existing but exclusively formulaic approaches. Conclusions The proposed Bayesian networks allow one to capture and analyze the currently most well-supported but reputedly counter-intuitive and difficult solution to the database search problem in a way that goes beyond the traditional, purely formulaic expressions.
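
    The numerical core of the debate can be stated without graphs. One textbook formulation (in the Balding-Donnelly tradition, a simplification of what the Bayesian networks encode) gives the posterior probability that the matching database member is the source, assuming a flat prior over N plausible sources, a database of size n in which exactly one match is found, and a random match probability gamma for unrelated individuals.

        def posterior_source(N, n, gamma):
            # P(matching individual is the source | one match in database).
            # The n - 1 excluded database members shrink the pool of
            # alternative sources from N - 1 to N - n.
            return 1.0 / (1.0 + (N - n) * gamma)

        print(posterior_source(N=1_000_000, n=100_000, gamma=1e-6))  # ~0.526
        print(posterior_source(N=1_000_000, n=1, gamma=1e-6))        # ~0.500

    On these assumptions a larger search slightly strengthens the case against the matching individual, which is the reputedly counter-intuitive result the record refers to.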

  4. WAIS Searching of the Current Contents Database

    NASA Astrophysics Data System (ADS)

    Banholzer, P.; Grabenstein, M. E.

    The Homer E. Newell Memorial Library of NASA's Goddard Space Flight Center is developing capabilities to permit Goddard personnel to access electronic resources of the Library via the Internet. The Library's support services contractor, Maxima Corporation, and their subcontractor, SANAD Support Technologies, have recently developed a World Wide Web Home Page (http://www-library.gsfc.nasa.gov) to provide the primary means of access. The first searchable database to be made available through the Home Page to Goddard employees is Current Contents, from the Institute for Scientific Information (ISI). The initial implementation includes coverage of articles from the last few months of 1992 to the present. These records are augmented with abstracts and references, and often are more robust than equivalent records in bibliographic databases that currently serve the astronomical community. Maxima/SANAD selected WAIS Incorporated's WAIS product with which to build the interface to Current Contents. This system allows access from Macintosh, IBM PC, and Unix hosts, which is an important feature for Goddard's multiplatform environment. The forms interface is structured to allow both fielded (author, article title, journal name, id number, keyword, subject term, and citation) and unfielded WAIS searches. The system allows a user to:
    • Retrieve individual journal article records.
    • Retrieve the Table of Contents of specific issues of journals.
    • Connect to articles with similar subject terms or keywords.
    • Connect to other issues of the same journal in the same year.
    • Browse journal issues from an alphabetical list of indexed journal names.

  5. The Database Dilemma: Online Search Strategies in Nursing.

    ERIC Educational Resources Information Center

    Fried, Ava K.; And Others

    1989-01-01

    Describes a study that compared the coverage of the nursing profession, subject heading specificity, and ease of retrieval of the MEDLINE and Nursing & Allied Health (CINAHL) online databases. The strengths and weaknesses of each database are discussed and hints for searching on both databases are provided. (four references) (CLB)

  6. Online Bibliographic Searching in the Humanities Databases: An Introduction.

    ERIC Educational Resources Information Center

    Suresh, Raghini S.

    Numerous easily accessible databases cover almost every subject area in the humanities. The principal database resources in the humanities are described. There are two major database vendors for humanities information: BRS (Bibliographic Retrieval Services) and DIALOG Information Services, Inc. As an introduction to online searching, this article…

  7. Multiple Database Searching: Techniques and Pitfalls

    ERIC Educational Resources Information Center

    Hawkins, Donald T.

    1978-01-01

    Problems involved in searching multiple data bases are discussed, including indexing differences, overlap among data bases, variant spellings, and elimination of duplicate items from search output. Discussion focuses on CA Condensates, Inspec, and Metadex data bases. (JPF)

  8. Searching the ASRS Database Using QUORUM Keyword Search, Phrase Search, Phrase Generation, and Phrase Discovery

    NASA Technical Reports Server (NTRS)

    McGreevy, Michael W.; Connors, Mary M. (Technical Monitor)

    2001-01-01

    To support Search Requests and Quick Responses at the Aviation Safety Reporting System (ASRS), four new QUORUM methods have been developed: keyword search, phrase search, phrase generation, and phrase discovery. These methods build upon the core QUORUM methods of text analysis, modeling, and relevance-ranking. QUORUM keyword search retrieves ASRS incident narratives that contain one or more user-specified keywords in typical or selected contexts, and ranks the narratives on their relevance to the keywords in context. QUORUM phrase search retrieves narratives that contain one or more user-specified phrases, and ranks the narratives on their relevance to the phrases. QUORUM phrase generation produces a list of phrases from the ASRS database that contain a user-specified word or phrase. QUORUM phrase discovery finds phrases that are related to topics of interest. Phrase generation and phrase discovery are particularly useful for finding query phrases for input to QUORUM phrase search. The presentation of the new QUORUM methods includes: a brief review of the underlying core QUORUM methods; an overview of the new methods; numerous, concrete examples of ASRS database searches using the new methods; discussion of related methods; and, in the appendices, detailed descriptions of the new methods.

  9. Performance Evaluation of Adaptive Probabilistic Search in P2P Networks

    NASA Astrophysics Data System (ADS)

    Zhang, Haoxiang; Zhang, Lin; Shan, Xiuming; Li, Victor O. K.

    The overall performance of P2P-based file sharing applications is becoming increasingly important. Based on the Adaptive Resource-based Probabilistic Search algorithm (ARPS), which was previously proposed by the authors, a novel probabilistic search algorithm with QoS guarantees is proposed in this letter. The algorithm relies on generating functions to satisfy the user's constraints and to exploit the power-law distribution in the node degree. Simulation results demonstrate that it performs well under various P2P scenarios. The proposed algorithm provides guarantees on the search performance perceived by the user while minimizing the search cost. Furthermore, it allows different QoS levels, resulting in greater flexibility and scalability.

  10. Lost in Search: (Mal-)Adaptation to Probabilistic Decision Environments in Children and Adults

    ERIC Educational Resources Information Center

    Betsch, Tilmann; Lehmann, Anne; Lindow, Stefanie; Lang, Anna; Schoemann, Martin

    2016-01-01

    Adaptive decision making in probabilistic environments requires individuals to use probabilities as weights in predecisional information searches and/or when making subsequent choices. Within a child-friendly computerized environment (Mousekids), we tracked 205 children's (105 children 5-6 years of age and 100 children 9-10 years of age) and 103…

  11. An efficient quantum search engine on unsorted database

    NASA Astrophysics Data System (ADS)

    Lu, Songfeng; Zhang, Yingyu; Liu, Fang

    2013-10-01

    We consider the problem of finding one or more desired items out of an unsorted database. Patel has shown that if the database permits quantum queries, then mere digitization is sufficient for efficient search for one desired item. The algorithm he presented, called the factorized quantum search algorithm, can locate the desired item in an unsorted database using O(log N) queries to factorized oracles. But the algorithm requires that all the attribute values be distinct from each other. In this paper, we discuss how to make a database satisfy the requirements, and present a quantum search engine based on the algorithm. Our goal is achieved by introducing auxiliary files for the attribute values that are not distinct, and converting every complex query request into a sequence of calls to the factorized quantum search algorithm. The query complexity of our algorithm is O(log N) for most cases.
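
    A purely classical illustration of the digitized-query idea (this is not the quantum algorithm, only its query-count argument): if each query may ask one bit of the target's index, an unsorted table of size N can be searched with ceil(log2 N) queries. The bit oracle below is a hypothetical stand-in for the factorized oracles.

        import math

        def find_item(database, is_target):
            n_bits = max(1, math.ceil(math.log2(len(database))))
            index = 0
            for bit in range(n_bits):
                # Hypothetical bit oracle: "is bit `bit` of the target's index 1?"
                if any(is_target(v) and (i >> bit) & 1
                       for i, v in enumerate(database)):
                    index |= 1 << bit
            return index

        db = ["ant", "bee", "cat", "dog", "elk", "fox", "gnu", "owl"]
        print(find_item(db, lambda v: v == "fox"))  # 5, found with 3 bit-queries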

  12. Is Library Database Searching a Language Learning Activity?

    ERIC Educational Resources Information Center

    Bordonaro, Karen

    2010-01-01

    This study explores how non-native speakers of English think of words to enter into library databases when they begin the process of searching for information in English. At issue is whether or not language learning takes place when these students use library databases. Language learning in this study refers to the use of strategies employed by…

  13. Chemical Substructure Searching: Comparing Three Commercially Available Databases.

    ERIC Educational Resources Information Center

    Wagner, A. Ben

    1986-01-01

    Compares the differences in coverage and utility of three substructure databases--Chemical Abstracts, Index Chemicus, and Chemical Information System's Nomenclature Search System. The differences between Chemical Abstracts with two different vendors--STN International and Questel--are described and a summary guide for choosing between databases is…

  14. Searching the PASCAL database - A user's perspective

    NASA Technical Reports Server (NTRS)

    Jack, Robert F.

    1989-01-01

    The operation of PASCAL, a bibliographic data base covering broad subject areas in science and technology, is discussed. The data base includes information from about 1973 to the present, including topics in engineering, chemistry, physics, earth science, environmental science, biology, psychology, and medicine. Data from 1986 to the present may be searched using DIALOG. The procedures and classification codes for searching PASCAL are presented. Examples of citations retrieved from the data base are given and suggestions are made concerning when to use PASCAL.

  15. Exhaustive Database Searching for Amino Acid Mutations in Proteomes

    SciTech Connect

    Hyatt, Philip Douglas; Pan, Chongle

    2012-01-01

    Amino acid mutations in proteins can be found by searching tandem mass spectra acquired in shotgun proteomics experiments against protein sequences predicted from genomes. Traditionally, unconstrained searches for amino acid mutations have been accomplished by using a sequence tagging approach that combines de novo sequencing with database searching. However, this approach is limited by the performance of de novo sequencing. The Sipros algorithm v2.0 was developed to perform unconstrained database searching using high-resolution tandem mass spectra by exhaustively enumerating all single non-isobaric mutations for every residue in a protein database. The performance of Sipros for amino acid mutation identification exceeded that of an established sequence tagging algorithm, Inspect, based on benchmarking results from a Rhodopseudomonas palustris proteomics dataset. To demonstrate the viability of the algorithm for meta-proteomics, Sipros was used to identify amino acid mutations in a natural microbial community in acid mine drainage.
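
    The enumeration step can be sketched as follows: for a given peptide, list every single amino acid substitution together with its monoisotopic mass shift, discarding isobaric swaps such as Leu/Ile. This is an illustrative sketch, not Sipros code; the residue masses are standard monoisotopic values and the tolerance is arbitrary.

        MONO = {  # monoisotopic residue masses (Da)
            'G': 57.02146, 'A': 71.03711, 'S': 87.03203, 'P': 97.05276,
            'V': 99.06841, 'T': 101.04768, 'C': 103.00919, 'L': 113.08406,
            'I': 113.08406, 'N': 114.04293, 'D': 115.02694, 'Q': 128.05858,
            'K': 128.09496, 'E': 129.04259, 'M': 131.04049, 'H': 137.05891,
            'F': 147.06841, 'R': 156.10111, 'Y': 163.06333, 'W': 186.07931,
        }

        def single_mutations(peptide, tol=0.01):
            # Yield every single-residue substitution and its mass shift,
            # keeping only non-isobaric mutations (|shift| above tolerance).
            for pos, orig in enumerate(peptide):
                for new, mass in MONO.items():
                    shift = mass - MONO[orig]
                    if new != orig and abs(shift) > tol:
                        yield peptide[:pos] + new + peptide[pos+1:], round(shift, 5)

        for mutant, shift in list(single_mutations("PEPTIDE"))[:5]:
            print(mutant, shift)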

  16. Probabilistic Cuing in Large-Scale Environmental Search

    ERIC Educational Resources Information Center

    Smith, Alastair D.; Hood, Bruce M.; Gilchrist, Iain D.

    2010-01-01

    Finding an object in our environment is an important human ability that also represents a critical component of human foraging behavior. One type of information that aids efficient large-scale search is the likelihood of the object being in one location over another. In this study we investigated the conditions under which individuals respond to…

  17. Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics.

    PubMed

    Deutsch, Eric W; Sun, Zhi; Campbell, David S; Binz, Pierre-Alain; Farrah, Terry; Shteynberg, David; Mendoza, Luis; Omenn, Gilbert S; Moritz, Robert L

    2016-11-04

    The results of analysis of shotgun proteomics mass spectrometry data can be greatly affected by the selection of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances, a problem especially acute in human sample analysis. All sequence databases are genome-based, with sequences for the predicted genes and their protein translation products compiled. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the different available sources and make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, from a small database consisting of only the ∼20,000 primary isoforms plus contaminants to a very large database that includes almost all nonredundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases suitable for mass spectrometry proteomics sequence database searching is called the Tiered Human Integrated Search Proteome set. In order to evaluate the utility of these databases, we have analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, with each of the four tiers of database complexity. The result is that approximately 0.8%, 1.1%, and 1.5% additional peptides can be identified for Tiers 2, 3, and 4, respectively, as compared with the Tier 1 database, at substantially increasing computational cost. This increase in computational cost may be worth bearing if the identification of sequence variants or the discovery of sequences that are not present in the reviewed knowledge base entries is an important goal of the study. We find that it is useful to search a data set against a simpler database, and then check the uniqueness of the

  18. The Effects of Search Tool Type and Cognitive Style on Performance during Hypermedia Database Searches.

    ERIC Educational Resources Information Center

    Leader, Lars F.; Klein, James D.

    1996-01-01

    Describes a study that investigated the effects of search tools and learner cognitive styles on performance in searches for information within a hypermedia database. Students in a university English-as-a-Second-Language program were assigned to one of four treatment groups, and results show a significant interaction between search tool and…

  19. Searching Harvard Business Review Online. . . Lessons in Searching a Full Text Database.

    ERIC Educational Resources Information Center

    Tenopir, Carol

    1985-01-01

    This article examines the Harvard Business Review Online (HBRO) database (bibliographic description fields, abstracts, extracted information, full text, subject descriptors) and reports on 31 sample HBRO searches conducted in Bibliographic Retrieval Services to test differences between searching full text and searching bibliographic record. Sample…

  20. Forensic utilization of familial searches in DNA databases.

    PubMed

    Gershaw, Cassandra J; Schweighardt, Andrew J; Rourke, Linda C; Wallace, Margaret M

    2011-01-01

    DNA evidence is widely recognized as an invaluable tool in the process of investigation and identification, as well as one of the most sought after types of evidence for presentation to a jury. In the United States, the development of state and federal DNA databases has greatly impacted the forensic community by creating an efficient, searchable system that can be used to eliminate or include suspects in an investigation based on matching DNA profiles: the profile already in the database against the profile of the unknown sample in evidence. Recent changes in legislation have begun to allow for the possibility to expand the parameters of DNA database searches, taking into account the possibility of familial searches. This article discusses prospective positive outcomes of utilizing familial DNA searches and acknowledges potential negative outcomes, thereby presenting both sides of this very complicated, rapidly evolving situation.

  1. Complementary use of the SciSearch database for improved biomedical information searching.

    PubMed Central

    Brown, C M

    1998-01-01

    The use of at least two complementary online biomedical databases is generally considered critical for biomedical scientists seeking to keep fully abreast of recent research developments as well as to retrieve the highest number of relevant citations possible. Although the National Library of Medicine's MEDLINE is usually the database of choice, this paper illustrates the benefits of using another database, the Institute for Scientific Information's SciSearch, when conducting a biomedical information search. When a simple query about red wine consumption and coronary artery disease was posed simultaneously in both MEDLINE and SciSearch, a greater number of relevant citations were retrieved through SciSearch. This paper also provides suggestions for carrying out a comprehensive biomedical literature search in a rapid and efficient manner by using SciSearch in conjunction with MEDLINE. PMID:9549014

  2. Molecule database framework: a framework for creating database applications with chemical structure search capability

    PubMed Central

    2013-01-01

    Background Research in organic chemistry generates samples of novel chemicals together with their properties and other related data. The involved scientists must be able to store this data and search it by chemical structure. There are commercial solutions for common needs like chemical registration systems or electronic lab notebooks. However, for the specific requirements of in-house databases and processes no such solutions exist. Another issue is that commercial solutions carry the risk of vendor lock-in and may require an expensive license for a proprietary relational database management system. To speed up and simplify the development of applications that require chemical structure search capabilities, I have developed Molecule Database Framework. The framework abstracts the storing and searching of chemical structures into method calls. Therefore software developers do not require extensive knowledge about chemistry and the underlying database cartridge. This decreases application development time. Results Molecule Database Framework is written in Java and I created it by integrating existing free and open-source tools and frameworks. The core functionality includes:
    • Support for multi-component compounds (mixtures)
    • Import and export of SD-files
    • Optional security (authorization)
    For chemical structure searching Molecule Database Framework leverages the capabilities of the Bingo Cartridge for PostgreSQL and provides type-safe searching, caching, transactions and optional method level security. Molecule Database Framework supports multi-component chemical compounds (mixtures). Furthermore the design of entity classes and the reasoning behind it are explained. By means of a simple web application I describe how the framework could be used. I then benchmarked this example application to create some basic performance expectations for chemical structure searches and import and export of SD-files. Conclusions By using a simple web application it was

  3. The LAILAPS search engine: relevance ranking in life science databases.

    PubMed

    Lange, Matthias; Spies, Karl; Bargsten, Joachim; Haberhauer, Gregor; Klapperstück, Matthias; Leps, Michael; Weinel, Christian; Wünschiers, Röbbe; Weissbach, Mandy; Stein, Jens; Scholz, Uwe

    2010-01-15

    Search engines and retrieval systems are popular tools on the life science desktop. The manual inspection of hundreds of database entries that reflect a life science concept or fact is time-intensive daily work. Here it is not the number of query results that matters, but their relevance. In this paper, we present the LAILAPS search engine for life science databases. The concept is to combine a novel feature model for relevance ranking, a machine learning approach to model user relevance profiles, ranking improvement by user feedback tracking, and an intuitive and slim web user interface that estimates relevance rank by tracking user interactions. Queries are formulated as simple keyword lists and are expanded with synonyms. Supporting a flexible text index and a simple data import format, LAILAPS can easily be used both as a search engine for comprehensive integrated life science databases and for small in-house project databases. With a set of features extracted from each database hit, in combination with user relevance preferences, a neural network predicts user-specific relevance scores. Using expert knowledge as training data for a predefined neural network, or using users' own relevance training sets, a reliable relevance ranking of database hits has been implemented. In this paper, we present the LAILAPS system, the concepts, benchmarks and use cases. LAILAPS is publicly available for SWISSPROT data at http://lailaps.ipk-gatersleben.de.

  4. Automatic sub-volume registration by probabilistic random search

    NASA Astrophysics Data System (ADS)

    Han, Jingfeng; Qiao, Min; Hornegger, Joachim; Kuwert, Torsten; Bautz, Werner; Römer, Wolfgang

    2006-03-01

    Registration of an individual's image data set to an anatomical atlas provides valuable information to the physician. In many cases, the individual image data sets are partial data, which may be mapped to one part or one organ of the entire atlas data. Most of the existing intensity-based image registration approaches are designed to align images of the entire view. When they are applied to registration with partial data, a manual pre-registration is usually required. This paper proposes a fully automatic approach to the registration of incomplete image data to an anatomical atlas. The spatial transformations are modelled as arbitrary parametric functions. The proposed method is built upon a random search mechanism, which allows the optimal transformation to be found randomly and globally even when the initialization is not ideal. It works more reliably than the existing methods for partial data registration because it successfully overcomes the local optimum problem. With appropriate similarity measures, this framework is applicable to both mono-modal and multi-modal registration problems with partial data. The contribution of this work is the description of the mathematical framework of the proposed algorithm and the implementation of the related software. The medical evaluation on MRI data and the comparison of the proposed method with different existing registration methods show the feasibility and superiority of the proposed method.
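
    A toy version of the mechanism, assuming integer 2-D translations on synthetic arrays rather than the general parametric transforms the paper handles: sample shifts at random, score each with a similarity measure (here negative mean squared difference), and keep the best. All data and ranges are invented.

        import numpy as np

        rng = np.random.default_rng(0)
        reference = rng.random((64, 64))
        true_shift = (5, -3)
        moving = np.roll(reference, true_shift, axis=(0, 1))  # synthetic data

        def similarity(shift):
            # Negative mean squared difference after undoing the shift.
            candidate = np.roll(moving, (-shift[0], -shift[1]), axis=(0, 1))
            return -float(np.mean((candidate - reference) ** 2))

        best_shift, best_score = (0, 0), similarity((0, 0))
        for _ in range(2000):  # random, global search over the shift space
            trial = tuple(int(v) for v in rng.integers(-8, 9, size=2))
            score = similarity(trial)
            if score > best_score:
                best_shift, best_score = trial, score

        print(best_shift)  # recovers (5, -3)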

  5. Content-Based Search on a Database of Geometric Models: Identifying Objects of Similar Shape

    SciTech Connect

    XAVIER, PATRICK G.; HENRY, TYSON R.; LAFARGE, ROBERT A.; MEIRANS, LILITA; RAY, LAWRENCE P.

    2001-11-01

    The Geometric Search Engine is a software system for storing and searching a database of geometric models. The database may be searched for modeled objects similar in shape to a target model supplied by the user. The database models are generally derived from CAD models, while the target model may be either a CAD model or a model generated from range data collected from a physical object. This document describes key generation, database layout, and search of the database.

  6. Are Bibliographic Management Software Search Interfaces Reliable?: A Comparison between Search Results Obtained Using Database Interfaces and the EndNote Online Search Function

    ERIC Educational Resources Information Center

    Fitzgibbons, Megan; Meert, Deborah

    2010-01-01

    The use of bibliographic management software and its internal search interfaces is now pervasive among researchers. This study compares the results between searches conducted in academic databases' search interfaces versus the EndNote search interface. The results show mixed search reliability, depending on the database and type of search…

  7. Feature selection in validating mass spectrometry database search results.

    PubMed

    Fang, Jianwen; Dong, Yinghua; Williams, Todd D; Lushington, Gerald H

    2008-02-01

    Tandem mass spectrometry (MS/MS) combined with protein database searching has been widely used in protein identification. A validation procedure is generally required to reduce the number of false positives. Advanced tools using statistical and machine learning approaches may provide faster and more accurate validation than manual inspection and empirical filtering criteria. In this study, we use two feature selection algorithms based on random forest and support vector machine to identify peptide properties that can be used to improve validation models. We demonstrate that an improved model based on an optimized set of features reduces the number of false positives by 58% relative to the model which used only search engine scores, at the same sensitivity score of 0.8. In addition, we develop classification models based on the physicochemical properties and protein sequence environment of these peptides without using search engine scores. The performance of the best model based on the support vector machine algorithm is at 0.8 AUC, 0.78 accuracy, and 0.7 specificity, suggesting a reasonably accurate classification. The identified properties important to fragmentation and ionization can be either used in independent validation tools or incorporated into peptide sequencing and database search algorithms to improve existing software programs.
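
    A sketch of the random-forest half of this approach on synthetic stand-in data, using scikit-learn. The feature names and data are hypothetical; the study's actual peptide properties and datasets are not reproduced here.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        rng = np.random.default_rng(0)
        n = 1000
        X = np.column_stack([
            rng.normal(0, 1, n),   # hypothetical: search engine score
            rng.normal(0, 1, n),   # hypothetical: precursor mass error (ppm)
            rng.normal(0, 1, n),   # hypothetical: fraction of fragment ions matched
        ])
        # Synthetic labels: "correct PSMs" driven mostly by features 0 and 2.
        y = (1.2 * X[:, 0] + 0.8 * X[:, 2] + rng.normal(0, 1, n)) > 0

        clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
        for name, imp in zip(["score", "mass_error", "ion_fraction"],
                             clf.feature_importances_):
            print(f"{name}: {imp:.3f}")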

  8. Probabilistic acute dietary exposure assessments to captan and tolylfluanid using several European food consumption and pesticide concentration databases.

    PubMed

    Boon, Polly E; Svensson, Kettil; Moussavian, Shahnaz; van der Voet, Hilko; Petersen, Annette; Ruprich, Jiri; Debegnach, Francesca; de Boer, Waldo J; van Donkersgoed, Gerda; Brera, Carlo; van Klaveren, Jacob D; Busk, Leif

    2009-12-01

    Probabilistic dietary acute exposure assessments of captan and tolylfluanid were performed for the populations of the Czech Republic, Denmark, Italy, the Netherlands and Sweden. The basis for these assessments was national databases for food consumption and pesticide concentration data harmonised at the level of raw agricultural commodity. Data were obtained from national food consumption surveys and national monitoring programmes and organised in an electronic platform of databases connected to probabilistic software. The exposure assessments were conducted by linking national food consumption data either (1) to national pesticide concentration data or (2) to a pooled database containing all national pesticide concentration data. We show that with this tool national exposure assessments can be performed in a harmonised way and that pesticide concentrations of other countries can be linked to national food consumption surveys. In this way it is possible to exchange or merge concentration data between countries in situations of data scarcity. This electronic platform in connection with probabilistic software can be seen as a prototype of a data warehouse, including a harmonised approach for dietary exposure modelling.
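
    The modelling step can be sketched as a simple Monte Carlo simulation: sample a consumption amount and a residue concentration, divide by body weight, and report an upper percentile of the resulting exposure distribution. All distributions and parameters below are invented for illustration and are not taken from the national databases described.

        import numpy as np

        rng = np.random.default_rng(42)
        n = 100_000
        consumption_g = rng.lognormal(mean=4.0, sigma=0.8, size=n)   # g/day
        residue_mg_per_kg = rng.exponential(scale=0.05, size=n)      # mg/kg food
        body_weight_kg = rng.normal(loc=70.0, scale=12.0, size=n).clip(min=30)

        # Acute exposure in mg per kg body weight per day.
        exposure = consumption_g / 1000.0 * residue_mg_per_kg / body_weight_kg
        print(f"99.9th percentile: {np.percentile(exposure, 99.9):.2e} mg/kg bw/day")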

  9. Enriching Great Britain's National Landslide Database by searching newspaper archives

    NASA Astrophysics Data System (ADS)

    Taylor, Faith E.; Malamud, Bruce D.; Freeborough, Katy; Demeritt, David

    2015-11-01

    Our understanding of where landslide hazard and impact will be greatest is largely based on our knowledge of past events. Here, we present a method to supplement existing records of landslides in Great Britain by searching an electronic archive of regional newspapers. In Great Britain, the British Geological Survey (BGS) is responsible for updating and maintaining records of landslide events and their impacts in the National Landslide Database (NLD). The NLD contains records of more than 16,500 landslide events in Great Britain. Data sources for the NLD include field surveys, academic articles, grey literature, news, public reports and, since 2012, social media. We aim to supplement the richness of the NLD by (i) identifying additional landslide events, (ii) acting as an additional source of confirmation of events existing in the NLD and (iii) adding more detail to existing database entries. This is done by systematically searching the Nexis UK digital archive of 568 regional newspapers published in the UK. In this paper, we construct a robust Boolean search criterion by experimenting with landslide terminology for four training periods. We then apply this search to all articles published in 2006 and 2012. This resulted in the addition of 111 records of landslide events to the NLD over the 2 years investigated (2006 and 2012). We also find that we were able to obtain information about landslide impact for 60-90% of landslide events identified from newspaper articles. Spatial and temporal patterns of additional landslides identified from newspaper articles are broadly in line with those existing in the NLD, confirming that the NLD is a representative sample of landsliding in Great Britain. This method could now be applied to more time periods and/or other hazards to add richness to databases and thus improve our ability to forecast future events based on records of past events.
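
    A sketch of a Boolean search criterion of the kind the paper constructs: flag an article when it contains at least one landslide term and at least one impact term. The term lists below are illustrative assumptions, not the paper's validated criterion.

        import re

        LANDSLIDE_TERMS = r"\b(landslide|landslip|mudslide|rockfall)s?\b"
        IMPACT_TERMS = r"\b(road|railway|property|injur\w*|blocked|damage\w*)\b"

        def matches(article_text):
            # Boolean AND of the two term groups, case-insensitive.
            text = article_text.lower()
            return (re.search(LANDSLIDE_TERMS, text) is not None
                    and re.search(IMPACT_TERMS, text) is not None)

        print(matches("A landslip blocked the railway near Whitby on Tuesday."))  # True
        print(matches("Heavy rain caused flooding across the county."))           # False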

  10. Compact variant-rich customized sequence database and a fast and sensitive database search for efficient proteogenomic analyses.

    PubMed

    Park, Heejin; Bae, Junwoo; Kim, Hyunwoo; Kim, Sangok; Kim, Hokeun; Mun, Dong-Gi; Joh, Yoonsung; Lee, Wonyeop; Chae, Sehyun; Lee, Sanghyuk; Kim, Hark Kyun; Hwang, Daehee; Lee, Sang-Won; Paek, Eunok

    2014-12-01

    In proteogenomic analysis, construction of a compact, customized database from mRNA-seq data and a sensitive search of both reference and customized databases are essential to accurately determine protein abundances and structural variations at the protein level. However, these tasks have not been systematically explored, but rather performed in an ad-hoc fashion. Here, we present an effective method for constructing a compact database containing comprehensive sequences of sample-specific variants (single nucleotide variants, insertions/deletions, and stop-codon mutations) derived from Exome-seq and RNA-seq data. It occupies less space, however, by storing variant peptides rather than variant proteins. We also present an efficient search method for both customized and reference databases. Separate searches of the two databases increase the search time, while a unified search is less sensitive in identifying variant peptides because the customized database is much smaller than the reference database in the target-decoy setting. Our method searches the unified database once, but performs target-decoy validations separately. Experimental results show that our approach is as fast as the unified search and as sensitive as the separate searches. Our customized database includes mutation information in the headers of variant peptides, thereby facilitating the inspection of peptide-spectrum matches.

  11. MassMatrix: A Database Search Program for Rapid Characterization of Proteins and Peptides from Tandem Mass Spectrometry Data

    PubMed Central

    Xu, Hua; Freitas, Michael A.

    2009-01-01

    MassMatrix is a program that matches tandem mass spectra with theoretical peptide sequences derived from a protein database. The program uses a mass accuracy sensitive probabilistic score model to rank peptide matches. The tandem mass spectrometry search software was evaluated by use of a high mass accuracy data set and its results compared with those from Mascot, SEQUEST, X!Tandem, and OMSSA. For the high mass accuracy data, MassMatrix provided better sensitivity than Mascot, SEQUEST, X!Tandem, and OMSSA for a given specificity, and the percentage of false positives was 2%. More importantly, all manually validated true positives corresponded to a unique peptide/spectrum match. The presence of decoy sequence and additional variable post-translational modifications did not significantly affect the results from the high mass accuracy search. MassMatrix performs well when compared with Mascot, SEQUEST, X!Tandem, and OMSSA with regard to search time. MassMatrix was also run on distributed-memory clusters and achieved search speeds of ~100,000 spectra per hour when searching against a complete human database with 8 variable modifications. The algorithm is available for public searches at http://www.massmatrix.net. PMID:19235167

  12. Fast and accurate database searches with MS-GF+Percolator.

    PubMed

    Granholm, Viktor; Kim, Sangtae; Navarro, José C F; Sjölund, Erik; Smith, Richard D; Käll, Lukas

    2014-02-07

    One can interpret fragmentation spectra stemming from peptides in mass-spectrometry-based proteomics experiments using so-called database search engines. Frequently, one also runs post-processors such as Percolator to assess the confidence, infer unique peptides, and increase the number of identifications. A recent search engine, MS-GF+, has shown promising results, due to a new and efficient scoring algorithm. However, MS-GF+ provides few statistical estimates about the peptide-spectrum matches, hence limiting the biological interpretation. Here, we enabled Percolator processing for MS-GF+ output and observed an increased number of identified peptides for a wide variety of data sets. In addition, Percolator directly reports p values and false discovery rate estimates, such as q values and posterior error probabilities, for peptide-spectrum matches, peptides, and proteins, functions that are useful for the whole proteomics community.
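
    For readers unfamiliar with the statistics Percolator adds, here is a minimal sketch of how q values arise from a target-decoy competition (simplified; Percolator itself also re-ranks PSMs with a semi-supervised learner before this step):

      def q_values(psms):
          """psms: (score, is_decoy) pairs, higher score = better.
          The q value of a PSM is the minimum FDR at which it is accepted."""
          ranked = sorted(psms, key=lambda p: p[0], reverse=True)
          fdrs, targets, decoys = [], 0, 0
          for _, is_decoy in ranked:
              decoys += is_decoy
              targets += not is_decoy
              fdrs.append(decoys / max(targets, 1))
          # A reverse cumulative minimum turns running FDR estimates into q values.
          qvals, best = [0.0] * len(fdrs), float("inf")
          for i in range(len(fdrs) - 1, -1, -1):
              best = min(best, fdrs[i])
              qvals[i] = best
          return list(zip(ranked, qvals))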

  13. A Multilevel Probabilistic Beam Search Algorithm for the Shortest Common Supersequence Problem

    PubMed Central

    Gallardo, José E.

    2012-01-01

    The shortest common supersequence problem is a classical problem with many applications in different fields such as planning, Artificial Intelligence and especially Bioinformatics. Due to its NP-hardness, we cannot expect to solve this problem efficiently using conventional exact techniques. This paper presents a heuristic to tackle the problem based on applying, at different levels, a probabilistic variant of a classical heuristic known as Beam Search. The proposed algorithm is empirically analysed and compared to current approaches in the literature. Experiments show that it provides better-quality solutions in reasonable time for medium and large instances of the problem. For very large instances, our heuristic also provides better solutions, but the required execution times may increase considerably. PMID:23300667
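
    A toy sketch of the core idea (the paper's multilevel construction and its guiding heuristic are more elaborate; the alphabet, weighting, and beam width below are illustrative): rather than deterministically keeping the best beam_width states, survivors are sampled with probability proportional to their progress.

      import random

      def probabilistic_beam_scs(strings, beam_width=10, alphabet="ACGT", seed=0):
          """A state is (positions, supersequence), where positions[i] is the
          length of strings[i] already covered. Appending a character advances
          every string whose next symbol matches, so progress is monotone."""
          rng = random.Random(seed)
          beam = [(tuple(0 for _ in strings), "")]
          while True:
              done = [s for pos, s in beam
                      if all(p == len(w) for p, w in zip(pos, strings))]
              if done:
                  return min(done, key=len)
              candidates = []
              for pos, sup in beam:
                  for c in alphabet:
                      nxt = tuple(p + 1 if p < len(w) and w[p] == c else p
                                  for p, w in zip(pos, strings))
                      if nxt != pos:  # keep only extensions that make progress
                          candidates.append((nxt, sup + c))
              # Probabilistic selection (with replacement): more progress,
              # higher survival probability.
              weights = [1 + sum(pos) for pos, _ in candidates]
              beam = rng.choices(candidates, weights=weights,
                                 k=min(beam_width, len(candidates)))

      print(probabilistic_beam_scs(["ACGT", "CGA", "GAT"]))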

  14. Global Identification of Protein Post-translational Modifications in a Single-Pass Database Search.

    PubMed

    Shortreed, Michael R; Wenger, Craig D; Frey, Brian L; Sheynkman, Gloria M; Scalf, Mark; Keller, Mark P; Attie, Alan D; Smith, Lloyd M

    2015-11-06

    Bottom-up proteomics database search algorithms used for peptide identification cannot comprehensively identify post-translational modifications (PTMs) in a single pass because of high false discovery rates (FDRs). A new approach to database searching enables global PTM (G-PTM) identification by looking exclusively for curated PTMs, thereby avoiding the FDR penalty experienced during conventional variable-modification searches. We identified over 2200 unique, high-confidence modified peptides comprising 26 different PTM types in a single-pass database search.
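
    A schematic of the G-PTM idea under stated assumptions (the protein accession, sites, and table layout below are hypothetical): candidate modified peptides are generated only from a table of curated, site-specific PTMs, rather than by applying every variable modification to every residue.

      # Hypothetical curated table: (protein, site) -> (modification, mass shift in Da).
      CURATED_PTMS = {
          ("PROT1", 16): ("Phospho", 79.96633),
          ("PROT1", 20): ("Acetyl", 42.01057),
      }

      def gptm_candidates(protein_id, peptide, start):
          """Yield (description, mass shift) for a peptide beginning at residue
          `start` of `protein_id`, using only curated PTMs that fall inside it."""
          yield ("unmodified", 0.0)
          for offset, residue in enumerate(peptide):
              key = (protein_id, start + offset)
              if key in CURATED_PTMS:
                  name, shift = CURATED_PTMS[key]
                  yield (f"{name}@{residue}{start + offset}", shift)

      for desc, shift in gptm_candidates("PROT1", "SVEPPLSQETK", 10):
          print(desc, shift)  # unmodified, Phospho@S16, Acetyl@K20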

  15. The Saccharomyces Genome Database: Advanced Searching Methods and Data Mining.

    PubMed

    Cherry, J Michael

    2015-12-02

    At the core of the Saccharomyces Genome Database (SGD) are chromosomal features that encode a product. These include protein-coding genes and major noncoding RNA genes, such as tRNA and rRNA genes. The basic entry point into SGD is a gene or open-reading-frame name that leads directly to the locus summary information page. A keyword describing function, phenotype, selective condition, or text from abstracts will also provide a way into SGD. A DNA or protein sequence can be used to identify a gene or a chromosomal region using BLAST. Protein and DNA sequence identifiers, PubMed and NCBI IDs, author names, and function terms are also valid entry points. The information in SGD has been gathered and is maintained by a group of scientific biocurators and software developers who are devoted to providing researchers with up-to-date information from the published literature, connections to all the major research resources, and tools that allow the data to be explored. Not all the collected information can be represented or summarized for every possible question; therefore, it is necessary to be able to search the structured data in the database. This protocol describes the YeastMine tool, which provides an advanced search capability via an interactive interface. The SGD also archives results from microarray expression experiments, and a strategy designed to explore these data using the SPELL (Serial Pattern of Expression Levels Locator) tool is provided.

  16. Grover's search algorithm with an entangled database state

    NASA Astrophysics Data System (ADS)

    Alsing, Paul M.; McDonald, Nathan

    2011-05-01

    Grover's oracle-based unstructured search algorithm is often stated as "given a phone number in a directory, find the associated name." More formally, the problem can be stated as "given as input a unitary black box U_f for computing an unknown function f: {0,1}^n → {0,1}, find x = x_0 ∈ {0,1}^n such that f(x_0) = 1 (and zero otherwise)." The crucial role of the externally supplied oracle U_f (whose inner workings are unknown to the user) is to change the sign of the solution state |x_0⟩ while leaving all other states unaltered. Thus, U_f depends on the desired solution x_0. This paper examines an amplitude amplification algorithm in which the user encodes the directory (e.g. names and telephone numbers) into an entangled database state, which at a later time can be queried on one supplied component entry (e.g. a given phone number t_0) to find the associated unknown component (e.g. the name x_0). For N = 2^n names x with N associated phone numbers t, performing amplitude amplification on a subspace of size N of the total space of size N^2 produces the desired state |x_0 t_0⟩ in √N steps. We discuss how and why sequential (though not concurrent parallel) searches can be performed on multiple database states. Finally, we show how this procedure can be generalized to databases with more than two correlated lists (e.g. |x t s r ...⟩).
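
    Since the construction builds on standard amplitude amplification, a minimal classical simulation of the Grover iteration (plain numpy; this is the textbook loop, not the paper's entangled-database variant) may help fix the √N behaviour:

      import numpy as np

      n = 10
      N = 2 ** n
      marked = 137                       # plays the role of the queried entry
      amps = np.full(N, 1 / np.sqrt(N))  # uniform superposition

      iterations = int(np.pi / 4 * np.sqrt(N))  # ~25 rounds for N = 1024
      for _ in range(iterations):
          amps[marked] *= -1             # oracle: sign flip on the solution
          amps = 2 * amps.mean() - amps  # diffusion: inversion about the mean
      print(f"P(marked) after {iterations} rounds: {amps[marked]**2:.4f}")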

  17. Accelerating chemical database searching using graphics processing units.

    PubMed

    Liu, Pu; Agrafiotis, Dimitris K; Rassokhin, Dmitrii N; Yang, Eric

    2011-08-22

    The utility of chemoinformatics systems depends on the accurate computer representation and efficient manipulation of chemical compounds. In such systems, a small molecule is often digitized as a large fingerprint vector, where each element indicates the presence/absence or the number of occurrences of a particular structural feature. Since in theory the number of unique features can be exceedingly large, these fingerprint vectors are usually folded into much shorter ones using hashing and modulo operations, allowing fast "in-memory" manipulation and comparison of molecules. There is increasing evidence that lossless fingerprints can substantially improve retrieval performance in chemical database searching (substructure or similarity), which has led to the development of several lossless fingerprint compression algorithms. However, any gains in storage and retrieval afforded by compression need to be weighed against the extra computational burden required for decompression before these fingerprints can be compared. Here we demonstrate that graphics processing units (GPU) can greatly alleviate this problem, enabling the practical application of lossless fingerprints on large databases. More specifically, we show that, with the help of an ordinary ~$500 video card, the entire PubChem database of ~32 million compounds can be searched in ~0.2-2 s on average, which is 2 orders of magnitude faster than a conventional CPU. If multiple query patterns are processed in batch, the speedup is even more dramatic (less than 0.02-0.2 s/query for 1000 queries). In the present study, we use the Elias gamma compression algorithm, which results in a compression ratio as high as 0.097.
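
    Elias gamma coding itself is compact enough to sketch; assuming the set-bit positions of a sparse fingerprint are stored as gaps (this framing and the example values are illustrative), small gaps get short codes:

      def elias_gamma_encode(n: int) -> str:
          """Code for a positive integer: (b - 1) zeros, then the b-bit
          binary representation, where b = n.bit_length()."""
          b = n.bit_length()
          return "0" * (b - 1) + bin(n)[2:]

      def elias_gamma_decode(bits: str):
          """Decode a concatenated stream of Elias gamma codes."""
          out, i = [], 0
          while i < len(bits):
              zeros = 0
              while bits[i] == "0":
                  zeros += 1
                  i += 1
              out.append(int(bits[i:i + zeros + 1], 2))
              i += zeros + 1
          return out

      positions = [3, 7, 8, 42]  # set bits of a sparse fingerprint
      gaps = [positions[0] + 1] + [b - a for a, b in zip(positions, positions[1:])]
      stream = "".join(elias_gamma_encode(g) for g in gaps)
      assert elias_gamma_decode(stream) == gaps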

  18. Comparison Study of Overlap among 21 Scientific Databases in Searching Pesticide Information.

    ERIC Educational Resources Information Center

    Meyer, Daniel E.; And Others

    1983-01-01

    Evaluates overlapping coverage of 21 scientific databases used in 10 online pesticide searches in an attempt to identify the minimum number of databases needed to generate 90 percent of unique, relevant citations for a given search. Comparison of searches combined under given pesticide usage (herbicide, fungicide, insecticide) is discussed. Nine…

  19. Cycloquest: Identification of cyclopeptides via database search of their mass spectra against genome databases

    PubMed Central

    Mohimani, Hosein; Liu, Wei-Ting; Mylne, Joshua S.; Poth, Aaron G.; Colgrave, Michelle L.; Tran, Dat; Selsted, Michael E.; Dorrestein, Pieter C.; Pevzner, Pavel A.

    2011-01-01

    Hundreds of ribosomally synthesized cyclopeptides have been isolated from all domains of life, the vast majority having been reported in the last 15 years. Studies of cyclic peptides have highlighted their exceptional potential both as stable drug scaffolds and as biomedicines in their own right. Despite this, computational techniques for cyclopeptide identification are still in their infancy, with many such peptides remaining uncharacterized. Tandem mass spectrometry has come to occupy a niche role in cyclopeptide identification, taking over from traditional techniques such as nuclear magnetic resonance (NMR) spectroscopy. MS/MS studies require only picogram quantities of peptide (compared to milligrams for NMR studies) and are applicable to complex samples, abolishing the requirement for time-consuming chromatographic purification. While database search tools such as Sequest and Mascot have become standard tools for the MS/MS identification of linear peptides, they are not applicable to cyclopeptides, due to the parent mass shift resulting from cyclization and the different fragmentation patterns of cyclic peptides. In this paper, we describe the development of a novel database search methodology to aid in the identification of cyclopeptides by mass spectrometry, and evaluate its utility in identifying two peptide rings from Helianthus annuus, a bacterial cannibalism factor from Bacillus subtilis, and a θ-defensin from Rhesus macaque. PMID:21851130
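
    The two complications the abstract names, the parent mass shift from cyclization and the distinct fragmentation of rings, can be made concrete in a few lines (residue masses abbreviated; a real table covers all residues):

      RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276}
      WATER = 18.01056

      def ring_openings(cyclic_seq):
          """A cyclopeptide of length n yields n linear sequences, one per
          ring-opening position; a search must consider all of them."""
          return [cyclic_seq[i:] + cyclic_seq[:i] for i in range(len(cyclic_seq))]

      def cyclopeptide_mass(seq):
          """Cyclization removes one water relative to the linear peptide,
          the parent mass shift that defeats linear-peptide search tools."""
          return sum(RESIDUE_MASS[aa] for aa in seq)  # no +WATER for a cycle

      print(ring_openings("GASP"))  # ['GASP', 'ASPG', 'SPGA', 'PGAS']
      print(cyclopeptide_mass("GASP"), cyclopeptide_mass("GASP") + WATER)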

  20. Search Databases and Statistics: Pitfalls and Best Practices in Phosphoproteomics.

    PubMed

    Refsgaard, Jan C; Munk, Stephanie; Jensen, Lars J

    2016-01-01

    Advances in mass spectrometric instrumentation in the past 15 years have resulted in an explosion in the raw data yield of typical phosphoproteomics workflows. This poses the challenge of confidently identifying peptide sequences, localizing phosphosites to proteins, and quantifying these from the vast amounts of raw data. The task is tackled by computational tools implementing algorithms that match the experimental data to databases, providing the user with lists for downstream analysis. Several platforms for such automated interpretation of mass spectrometric data have been developed, each having strengths and weaknesses that must be weighed against individual needs. These are reviewed in this chapter. Equally critical for generating highly confident output datasets is the application of sound statistical criteria to limit the inclusion of incorrect peptide identifications from database searches. Additionally, careful filtering and the use of appropriate statistical tests on the output datasets affect the quality of all downstream analyses and interpretation of the data. Our considerations and general practices on these aspects of phosphoproteomics data processing are presented here.

  1. The Use of AJAX in Searching a Bibliographic Database: A Case Study of the Italian Biblioteche Oggi Database

    ERIC Educational Resources Information Center

    Cavaleri, Piero

    2008-01-01

    Purpose: The purpose of this paper is to describe the use of AJAX for searching the Biblioteche Oggi database of bibliographic records. Design/methodology/approach: The paper is a demonstration of how bibliographic database single page interfaces allow the implementation of more user-friendly features for social and collaborative tasks. Findings:…

  2. Towards computational improvement of DNA database indexing and short DNA query searching.

    PubMed

    Stojanov, Done; Koceski, Sašo; Mileva, Aleksandra; Koceska, Nataša; Bande, Cveta Martinovska

    2014-09-03

    In order to facilitate and speed up the search of massive DNA databases, the database is indexed at the beginning, employing a mapping function. By searching through the indexed data structure, exact query hits can be identified. If the database is searched against an annotated DNA query, such as a known promoter consensus sequence, then the starting locations and the number of potential genes can be determined. This is particularly relevant if unannotated DNA sequences have to be functionally annotated. However, indexing a massive DNA database and searching an indexed data structure with millions of entries is a time-demanding process. In this paper, we propose a fast DNA database indexing and searching approach that identifies all query hits in the database without having to examine all entries in the indexed data structure, at the cost of limiting the maximum length of a query that can be searched against the database. By applying the proposed indexing equation, the whole human genome could be indexed in 10 hours on a personal computer, under the assumption that there is enough RAM to store the indexed data structure. Analysing the methodology proposed by Reneker, we observed that hits at starting positions [Formula: see text] are not reported if the database is searched against a query shorter than [Formula: see text] nucleotides, such that [Formula: see text] is the length of the DNA database words being mapped and [Formula: see text] is the length of the query. A solution to this drawback is also presented.
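
    A generic sketch of the underlying index-then-verify pattern (the authors' indexing equation and its query-length limit are not reproduced; k and the sequences below are illustrative):

      from collections import defaultdict

      def build_index(genome: str, k: int):
          """Map every word of length k to its start positions, so queries
          are resolved without scanning the full sequence."""
          index = defaultdict(list)
          for i in range(len(genome) - k + 1):
              index[genome[i:i + k]].append(i)
          return index

      def search(index, genome: str, query: str, k: int):
          """Exact occurrences of a query (len(query) >= k): look up its first
          k-mer, then verify the remainder at each candidate position."""
          return [pos for pos in index.get(query[:k], [])
                  if genome[pos:pos + len(query)] == query]

      genome = "ACGTACGTGACCA"
      idx = build_index(genome, k=4)
      print(search(idx, genome, "ACGTG", k=4))  # -> [4]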

  3. (Sub)structure Searches in Databases Containing Generic Chemical Structure Representations.

    ERIC Educational Resources Information Center

    Schoch-Grubler, Ursula

    1990-01-01

    Reviews three database systems available for searching generic chemical structure representations: (1) Derwent's Chemical Code System; (2) IDC's Gremas System; and (3) Derwent's Markush DARC System. Various types of searches are described, features desirable to users are discussed, and comparison searches are described that measured recall and…

  4. Searching Databases without Query-Building Aids: Implications for Dyslexic Users

    ERIC Educational Resources Information Center

    Berget, Gerd; Sandnes, Frode Eika

    2015-01-01

    Introduction: Few studies document the information searching behaviour of users with cognitive impairments. This paper therefore addresses the effect of dyslexia on information searching in a database with no tolerance for spelling errors and no query-building aids. The purpose was to identify effective search interface design guidelines that…

  5. Medical Students' Personal Knowledge, Searching Proficiency, and Database Use in Problem Solving.

    ERIC Educational Resources Information Center

    Wildemuth, Barbara M.; And Others

    1995-01-01

    Discusses the relationship between personal knowledge in a domain and online searching proficiency in that domain, and the relationship between searching proficiency and database-assisted problem-solving performance based on a study of medical students. Search results, selection of terms, and efficiency were found to be related to problem-solving…

  6. Adaptive search in mobile peer-to-peer databases

    NASA Technical Reports Server (NTRS)

    Wolfson, Ouri (Inventor); Xu, Bo (Inventor)

    2010-01-01

    Information is stored in a plurality of mobile peers. The peers communicate in a peer to peer fashion, using a short-range wireless network. Occasionally, a peer initiates a search for information in the peer to peer network by issuing a query. Queries and pieces of information, called reports, are transmitted among peers that are within a transmission range. For each search additional peers are utilized, wherein these additional peers search and relay information on behalf of the originator of the search.

  7. Searching the expressed sequence tag (EST) databases: panning for genes.

    PubMed

    Jongeneel, C V

    2000-02-01

    The genomes of living organisms contain many elements, including genes coding for proteins. The portions of the genes expressed as mature mRNA, collectively known as the transcriptome, represent only a small part of the genome. The expressed sequence tag (EST) databases contain an increasingly large part of the transcriptome of many species. For this reason, these databases are probably the most abundant source of new coding sequences available today. However, the raw data deposited in the EST databases are to a large extent unorganised, unannotated, redundant and of relatively low quality. This paper reviews some of the characteristics of the EST data, and the methods that can be used to find novel protein sequences within them. It also documents a collection of databases, software and web sites that can be useful to biologists interested in mining the EST databases over the Internet, or in establishing a local environment for such analyses.

  8. Content Evaluation of Textual CD-ROM and Web Databases. Database Searching Series.

    ERIC Educational Resources Information Center

    Jacso, Peter

    This book provides guidelines for evaluating a variety of database types, including abstracting and indexing, directory, full-text, and page-image databases available in online and/or CD-ROM formats. The book discusses the purpose and techniques of comparing and evaluating the most important characteristics of textual databases, such as their…

  9. Seismic hazard assessment for Myanmar: Earthquake model database, ground-motion scenarios, and probabilistic assessments

    NASA Astrophysics Data System (ADS)

    Chan, C. H.; Wang, Y.; Thant, M.; Maung Maung, P.; Sieh, K.

    2015-12-01

    We have constructed an earthquake and fault database, conducted a series of ground-shaking scenarios, and proposed seismic hazard maps for all of Myanmar and hazard curves for selected cities. Our earthquake database integrates the ISC, ISC-GEM and global ANSS Comprehensive Catalogues, and includes harmonized magnitude scales without duplicate events. Our active fault database includes active fault data from previous studies. Using the parameters from these updated databases (i.e., the Gutenberg-Richter relationship, slip rate, maximum magnitude and the elapsed time since the last events), we have determined the earthquake recurrence models of seismogenic sources. To evaluate ground-shaking behaviour in different tectonic regimes, we conducted a series of tests matching the modelled ground motions to the felt intensities of earthquakes. From the case of the 1975 Bagan earthquake, we determined that the ground motion prediction equations (GMPEs) of Atkinson and Boore (2003) fit the behaviour of subduction events best. Similarly, the 2011 Tarlay and 2012 Thabeikkyin events suggested that the GMPEs of Akkar and Cagnan (2010) fit crustal earthquakes best. We thus incorporated the best-fitting GMPEs and site conditions based on Vs30 (the average shear-wave velocity down to 30 m depth), derived from analysis of topographic slope and microtremor array measurements, to assess seismic hazard. The hazard is highest in regions close to the Sagaing Fault and along the western coast of Myanmar, where seismic sources produce earthquakes at short intervals and/or their most recent events occurred long ago. The hazard curves for the cities of Bago, Mandalay, Sagaing, Taungoo and Yangon show higher hazards for sites close to an active fault or with a low Vs30, e.g., downtown Sagaing and the Shwemawdaw Pagoda in Bago.

  10. Algorithms for database-dependent search of MS/MS data.

    PubMed

    Matthiesen, Rune

    2013-01-01

    The frequently used bottom-up strategy for identification of proteins and their associated modifications nowadays typically generates thousands of MS/MS spectra that are normally matched automatically against a protein sequence database. Search engines that take MS/MS spectra and a protein sequence database as input are referred to as database-dependent search engines. Many programs, both commercial and freely available, exist for database-dependent search of MS/MS spectra, and most have excellent user documentation. The aim here is therefore to outline the algorithmic strategies behind different search engines rather than to provide software user manuals. The process of database-dependent search can be divided into search strategy, peptide scoring, protein scoring, and finally protein inference. Most efforts in the literature have gone into comparing results from different software rather than discussing the underlying algorithms. Such practical comparisons can be cluttered by suboptimal implementations, and the observed differences are frequently caused by software parameter settings that have not been set properly to allow a fair comparison. In other words, an algorithmic idea can still be worth considering even if its software implementation has been shown to be suboptimal. The aim of this chapter is therefore to split the algorithms for database-dependent searching of MS/MS data into the above steps so that the different algorithmic ideas become more transparent and comparable. Most search engines provide good implementations of the first three data analysis steps mentioned above, whereas the final step of protein inference is much less developed in most search engines and is in many cases performed by external software. The final part of this chapter illustrates how protein inference is built into the VEMS search engine and discusses SIR, a stand-alone program for protein inference that can import a Mascot search result.
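
    To make the peptide-scoring step concrete, here is a deliberately minimal shared-peak-count scorer (the engines discussed use far richer probabilistic models; the residue table is abbreviated and the tolerance illustrative):

      RESIDUE = {"A": 71.03711, "G": 57.02146, "S": 87.03203, "K": 128.09496}
      PROTON, WATER = 1.00728, 18.01056

      def by_ions(peptide):
          """Theoretical singly charged b- and y-ion m/z values."""
          masses = [RESIDUE[aa] for aa in peptide]
          prefix = [sum(masses[:i]) for i in range(1, len(masses))]
          b = [m + PROTON for m in prefix]
          y = [sum(masses) + WATER + PROTON - m for m in prefix]
          return b + y

      def match_score(spectrum_mz, peptide, tol=0.02):
          """Score = number of theoretical fragments matched by an observed
          peak within the tolerance (the core of most scoring schemes)."""
          return sum(any(abs(obs - theo) <= tol for obs in spectrum_mz)
                     for theo in by_ions(peptide))

      print(match_score([147.11, 129.07], "GASK"))  # -> 2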

  11. Federated or cached searches: providing expected performance from multiple invasive species databases

    USGS Publications Warehouse

    Graham, Jim; Jarnevich, Catherine S.; Simpson, Annie; Newman, Gregory J.; Stohlgren, Thomas J.

    2011-01-01

    Invasive species are a universal global problem, but the information to identify them, manage them, and prevent invasions is stored around the globe in a variety of formats. The Global Invasive Species Information Network is a consortium of organizations working toward providing seamless access to these disparate databases via the Internet. A distributed network of databases can be created using the Internet and a standard web service protocol. There are two options to provide this integration. First, federated searches have been proposed to allow users to search “deep” web documents such as databases for invasive species. A second method is to create a cache of data from the databases for searching. We compare these two methods and show that federated searches will not provide the performance and flexibility required by users, and that a central cache of the data is required to improve performance.

  12. Federated or cached searches: Providing expected performance from multiple invasive species databases

    NASA Astrophysics Data System (ADS)

    Graham, Jim; Jarnevich, Catherine S.; Simpson, Annie; Newman, Gregory J.; Stohlgren, Thomas J.

    2011-06-01

    Invasive species are a universal global problem, but the information to identify them, manage them, and prevent invasions is stored around the globe in a variety of formats. The Global Invasive Species Information Network is a consortium of organizations working toward providing seamless access to these disparate databases via the Internet. A distributed network of databases can be created using the Internet and a standard web service protocol. There are two options to provide this integration. First, federated searches have been proposed to allow users to search "deep" web documents such as databases for invasive species. A second method is to create a cache of data from the databases for searching. We compare these two methods and show that federated searches will not provide the performance and flexibility required by users, and that a central cache of the data is required to improve performance.

  13. Use of Composite Protein Database including Search Result Sequences for Mass Spectrometric Analysis of Cell Secretome

    PubMed Central

    Shin, Jihye; Kim, Gamin; Kabir, Mohammad Humayun; Park, Seong Jun; Lee, Seoung Taek; Lee, Cheolju

    2015-01-01

    Mass spectrometric (MS) data of human cell secretomes are usually run through the conventional human database for identification. However, the search may result in false identifications due to contamination of the secretome with fetal bovine serum (FBS) proteins. To overcome this challenge, here we provide a composite protein database including human as well as 199 FBS protein sequences for MS data search of human cell secretomes. Searching against the human-FBS database returned more reliable results with fewer false-positive and false-negative identifications compared to using either a human only database or a human-bovine database. Furthermore, the improved results validated our strategy without complex experiments like SILAC. We expect our strategy to improve the accuracy of human secreted protein identification and to also add value for general use. PMID:25822838

  14. Using the Turning Research Into Practice (TRIP) database: how do clinicians really search?*

    PubMed Central

    Meats, Emma; Brassey, Jon; Heneghan, Carl; Glasziou, Paul

    2007-01-01

    Objectives: Clinicians and patients are increasingly accessing information through Internet searches. This study aimed to examine clinicians' search behavior when using the Turning Research Into Practice (TRIP) database, in order to assess how the search engine is used and how it might be improved. Methods: A Web log analysis was undertaken of the TRIP database—a meta-search engine covering 150 health resources including MEDLINE, The Cochrane Library, and a variety of guidelines. The connectors for terms used in searches were studied, and observations were made of 9 users' search behavior when working with the TRIP database. Results: Of 620,735 searches, most used a single term, and 12% (n = 75,947) used a Boolean operator: 11% (n = 69,006) used "AND" and 0.8% (n = 4,941) used "OR." Of the elements of a well-structured clinical question (population, intervention, comparator, and outcome), the population was most commonly used, while fewer searches included the intervention. Comparator and outcome were rarely used. Participants in the observational study were interested in learning how to formulate better searches. Conclusions: Web log analysis showed most searches used a single term and no Boolean operators. The observational study revealed users were interested in conducting efficient searches but did not always know how. Therefore, either better training or better search interfaces are required to assist users and enable more effective searching. PMID:17443248

  15. Techniques for searching the CINAHL database using the EBSCO interface.

    PubMed

    Lawrence, Janna C

    2007-04-01

    The Cumulative Index to Nursing and Allied Health Literature (CINAHL) is a useful research tool for accessing articles of interest to nurses and health care professionals. More than 2,800 journals are indexed in CINAHL and can be searched easily using assigned subject headings. Detailed instructions for conducting, combining, and saving searches in CINAHL are provided in this article. Establishing an account with EBSCO further allows a nurse to save references and searches and to receive e-mail alerts when new articles on a topic of interest are published.

  16. Optimal design of groundwater remediation systems using a probabilistic multi-objective fast harmony search algorithm under uncertainty

    NASA Astrophysics Data System (ADS)

    Luo, Q.; Wu, J.; Qian, J.

    2013-12-01

    This study develops a new probabilistic multi-objective fast harmony search algorithm (PMOFHS) for optimal design of groundwater remediation systems under uncertainty associated with the hydraulic conductivity of aquifers. The PMOFHS integrates the previously developed deterministic multi-objective optimization method, namely the multi-objective fast harmony search algorithm (MOFHS), with probabilistic Pareto domination ranking and a probabilistic niche technique to search for Pareto-optimal solutions to multi-objective optimization problems in a noisy hydrogeological environment arising from insufficient hydraulic conductivity data. The PMOFHS is then coupled with the commonly used flow and transport codes, MODFLOW and MT3DMS, to identify the optimal groundwater remediation system for a two-dimensional hypothetical test problem involving two objectives: (i) minimization of the total remediation cost over the engineering planning horizon, and (ii) minimization of the percentage of mass remaining in the aquifer at the end of the operational period, using Pump-and-Treat (PAT) technology to clean up contaminated groundwater. Monte Carlo (MC) analysis is used to demonstrate the effectiveness of the proposed methodology: the MC analysis is applied to each Pareto solution for every hydraulic conductivity (K) realization, and the statistical mean and the upper and lower bounds of the 95% confidence intervals are then calculated. The MC analysis results show that all of the Pareto-optimal solutions lie between the upper and lower bounds of the MC analysis. Moreover, the root mean square errors (RMSEs) between the Pareto-optimal solutions found by the PMOFHS and the average values of the optimal solutions from the MC analysis are 0.0204 for the first objective and 0.0318 for the second objective, considerably smaller than the corresponding RMSEs between the results of the existing probabilistic multi-objective genetic algorithm (PMOGA) and the MC analysis, 0.0384 and 0.0397, respectively.
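
    The deterministic core that the PMOFHS extends probabilistically is ordinary Pareto domination; a minimal sketch for the two minimization objectives (cost, mass remaining), with made-up design points:

      def dominates(a, b):
          """For minimization: a dominates b if it is no worse in every
          objective and strictly better in at least one."""
          return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

      def pareto_front(solutions):
          """Keep only non-dominated (cost, mass_remaining) pairs."""
          return [s for s in solutions
                  if not any(dominates(t, s) for t in solutions if t != s)]

      designs = [(1.2e6, 0.08), (1.5e6, 0.05), (1.3e6, 0.09), (1.0e6, 0.12)]
      print(pareto_front(designs))  # the (1.3e6, 0.09) design is dominated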

  17. Increasing number of databases searched in systematic reviews and meta-analyses between 1994 and 2014

    PubMed Central

    Lam, Michael T.; McDiarmid, Mary

    2016-01-01

    Objectives The purpose of this study was to determine whether the number of bibliographic databases used to search the health sciences literature in individual systematic reviews (SRs) and meta-analyses (MAs) changed over a twenty-year period related to the official 1995 launch of the Cochrane Database of Systematic Reviews (CDSR). Methods Ovid MEDLINE was searched using a modified version of a strategy developed by the Scottish Intercollegiate Guidelines Network to identify SRs and MAs. Records from 3 milestone years were searched: the year immediately preceding (1994) and 1 (2004) and 2 (2014) decades following the CDSR launch. Records were sorted with randomization software. Abstracts or full texts of the records were examined to identify database usage until 100 relevant records were identified from each of the 3 years. Results The mean and median number of bibliographic databases searched in 1994, 2004, and 2014 were 1.62 and 1, 3.34 and 3, and 3.73 and 4, respectively. Studies that searched only 1 database decreased over the 3 milestone years (60% in 1994, 28% in 2004, and 10% in 2014). Conclusions The number of bibliographic databases searched in individual SRs and MAs increased from 1994 to 2014. PMID:27822149

  18. Uninformed and probabilistic distributed agent combinatorial searches for the unary NP-complete disassembly line balancing problem

    NASA Astrophysics Data System (ADS)

    McGovern, Seamus M.; Gupta, Surendra M.

    2005-11-01

    Disassembly takes place in remanufacturing, recycling, and disposal, with a line being the best choice for automation. The disassembly line balancing problem seeks a sequence which: is feasible, minimizes workstations, and ensures similar idle times, as well as addressing other end-of-life specific concerns. Finding the optimal balance is computationally intensive due to the exponential growth of the solution space. Combinatorial optimization methods hold promise for providing solutions to the disassembly line balancing problem, which is proven here to belong to the class of unary NP-complete problems. Probabilistic (ant colony optimization) and uninformed (H-K) search methods are presented and compared. Numerical results are obtained using a recent case study to illustrate the search implementations and compare their performance. Conclusions drawn include the consistent generation of near-optimal solutions, the ability to preserve precedence, the speed of the techniques, and their practicality due to ease of implementation.

  19. Probabilistic person identification in TV news programs using image web database

    NASA Astrophysics Data System (ADS)

    Battisti, F.; Carli, M.; Leo, M.; Neri, A.

    2014-02-01

    The automatic labeling of faces in TV broadcasts is still a challenging problem. High variability in viewpoints, facial expressions, general appearance, and lighting conditions, as well as occlusions, rapid shot changes, and camera motion, produce significant variations in image appearance. Automatic tools for face recognition are not yet fully established, and human intervention is still needed. In this paper, we deal with automatic face recognition in TV broadcast programs. The aim of the proposed method is to identify the presence of a specific person in a video by means of a set of images downloaded from the Web using a specific search key.

  20. SledgeHMMER: a web server for batch searching the Pfam database.

    PubMed

    Chukkapalli, Giridhar; Guda, Chittibabu; Subramaniam, Shankar

    2004-07-01

    The SledgeHMMER web server is intended for genome-scale searching of the Pfam database without having to install the database and the HMMER software locally. The server implements a parallelized version of hmmpfam, the program used for searching the Pfam HMM database. Pfam search results have been pre-calculated for the entire set of Swiss-Prot and TrEMBL database sequences (approximately 1.2 million) on 256 processors of IA64-based TeraGrid machines. The Pfam database can be searched in local, glocal or merged mode, using either gathering or E-value thresholds. Query sequences are first matched against the pre-calculated entries to retrieve results, and those without matches are processed through a new search. Results are emailed in a space-delimited tabular format upon completion of the search. While most other Pfam-searching web servers set a limit of one sequence per query, this server processes batch sequences with no limit on the number of input sequences. The web server and downloadable data are accessible from http://SledgeHmmer.sdsc.edu.

  1. Dialog's Knowledge Index and BRS/After Dark: Database Searching on Personal Computers.

    ERIC Educational Resources Information Center

    Tenopir, Carol

    1983-01-01

    Describes two new bibliographic information services being marketed to microcomputer owners by DIALOG, Inc. and Bibliographic Retrieval Services to allow access to databases at low rates during evening hours. Subject focus, selection of a database, search strategies employed on each system are discussed, and the two services are compared. (EJS)

  2. STEPS: A Grid Search Methodology for Optimized Peptide Identification Filtering of MS/MS Database Search Results

    SciTech Connect

    Piehowski, Paul D.; Petyuk, Vladislav A.; Sandoval, John D.; Burnum, Kristin E.; Kiebel, Gary R.; Monroe, Matthew E.; Anderson, Gordon A.; Camp, David G.; Smith, Richard D.

    2013-03-01

    For bottom-up proteomics there are a wide variety of database searching algorithms in use for matching peptide sequences to tandem MS spectra. Likewise, there are numerous strategies being employed to produce a confident list of peptide identifications from the different search algorithm outputs. Here we introduce a grid search approach for determining optimal database filtering criteria in shotgun proteomics data analyses that is easily adaptable to any search. Systematic Trial and Error Parameter Selection - referred to as STEPS - utilizes user-defined parameter ranges to test a wide array of parameter combinations to arrive at an optimal "parameter set" for data filtering, thus maximizing confident identifications. The benefits of this approach in terms of numbers of true positive identifications are demonstrated using datasets derived from immunoaffinity-depleted blood serum and a bacterial cell lysate, two common proteomics sample types.
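
    A skeleton of the grid-search idea under stated assumptions (the parameter names, ranges, and PSM fields are illustrative, not the published STEPS configuration):

      from itertools import product

      GRID = {  # hypothetical filter-parameter ranges
          "min_score": [1.5, 2.0, 2.5, 3.0],
          "max_ppm": [5, 10, 20],
          "min_length": [6, 7, 8],
      }

      def ids_at_fdr(params, psms, fdr=0.01):
          """Count passing target PSMs, provided the decoy-estimated FDR of
          the filtered set stays within the limit; otherwise return 0."""
          passing = [p for p in psms
                     if p["score"] >= params["min_score"]
                     and abs(p["ppm"]) <= params["max_ppm"]
                     and p["length"] >= params["min_length"]]
          targets = sum(not p["is_decoy"] for p in passing)
          decoys = sum(p["is_decoy"] for p in passing)
          return targets if targets and decoys / targets <= fdr else 0

      def steps_search(psms):
          """Evaluate every combination; keep the one maximizing confident IDs."""
          return max((dict(zip(GRID, combo)) for combo in product(*GRID.values())),
                     key=lambda params: ids_at_fdr(params, psms))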

  3. Searching for Controlled Trials of Complementary and Alternative Medicine: A Comparison of 15 Databases

    PubMed Central

    Cogo, Elise; Sampson, Margaret; Ajiferuke, Isola; Manheimer, Eric; Campbell, Kaitryn; Daniel, Raymond; Moher, David

    2011-01-01

    This project aims to assess the utility of bibliographic databases beyond the three major ones (MEDLINE, EMBASE and Cochrane CENTRAL) for finding controlled trials of complementary and alternative medicine (CAM). Fifteen databases were searched to identify controlled clinical trials (CCTs) of CAM not also indexed in MEDLINE. Searches were conducted in May 2006 using the revised Cochrane highly sensitive search strategy (HSSS) and the PubMed CAM Subset. Yield of CAM trials per 100 records was determined, and databases were compared over a standardized period (2005). The Acudoc2 RCT, Acubriefs, Index to Chiropractic Literature (ICL) and Hom-Inform databases had the highest concentrations of non-MEDLINE records, with more than 100 non-MEDLINE records per 500. Other productive databases had ratios between 500 and 1500 records to 100 non-MEDLINE records—these were AMED, MANTIS, PsycINFO, CINAHL, Global Health and Alt HealthWatch. Five databases were found to be unproductive: AGRICOLA, CAIRSS, Datadiwan, Herb Research Foundation and IBIDS. Acudoc2 RCT yielded 100 CAM trials in the most recent 100 records screened. Acubriefs, AMED, Hom-Inform, MANTIS, PsycINFO and CINAHL had more than 25 CAM trials per 100 records screened. Global Health, ICL and Alt HealthWatch were below 25 in yield. There were 255 non-MEDLINE trials from eight databases in 2005, with only 10% indexed in more than one database. Yield varied greatly between databases; the most productive databases from both sampling methods were Acubriefs, Acudoc2 RCT, AMED and CINAHL. Low overlap between databases indicates comprehensive CAM literature searches will require multiple databases. PMID:19468052

  4. InfoTrac's SearchBank Databases: Business Information and More.

    ERIC Educational Resources Information Center

    Mehta, Usha; Goodman, Beth

    1997-01-01

    Describes the InfoTrac SearchBank based on experiences at the University of Nevada, Reno, libraries where the service is available through the online catalog. Highlights include remote access through the Internet; indexing and abstracting; full-text access to 460 journal titles; a powerful search engine; and business-oriented databases.…

  5. Extending the Role of the Corporate Library: Corporate Database Applications Using BRS/Search Software.

    ERIC Educational Resources Information Center

    Lammert, Diana

    1993-01-01

    Describes the McKenna Information Center's application of BRS/SEARCH, information retrieval software, as part of its services to Kennmetal Inc., its parent company. Features and uses of the software, including commands, custom searching, menu-driven interfaces, preparing reports, and designing databases are covered. Nine examples of software…

  6. Social Work Literature Searching: Current Issues with Databases and Online Search Engines

    ERIC Educational Resources Information Center

    McGinn, Tony; Taylor, Brian; McColgan, Mary; McQuilkan, Janice

    2016-01-01

    Objectives: To compare the performance of a range of search facilities; and to illustrate the execution of a comprehensive literature search for qualitative evidence in social work. Context: Developments in literature search methods and comparisons of search facilities help facilitate access to the best available evidence for social workers.…

  7. Interspecies extrapolation based on the RepDose database--a probabilistic approach.

    PubMed

    Escher, Sylvia E; Batke, Monika; Hoffmann-Doerr, Simone; Messinger, Horst; Mangelsdorf, Inge

    2013-04-12

    Repeated dose toxicity studies from the RepDose database (DB) were used to determine interspecies differences between rats and mice. NOEL (no observed effect level) ratios based on systemic effects were investigated for three different types of exposure: inhalation, oral food/drinking water and oral gavage. Furthermore, NOEL ratios for local effects in inhalation studies were evaluated. On the basis of the NOEL ratio distributions, interspecies assessment factors (AF) are evaluated. All data sets were best described by a lognormal distribution. No difference was seen between inhalation and oral exposure for systemic effects. Rats and mice were on average equally sensitive at equipotent doses, with geometric mean (GM) values of 1 and geometric standard deviation (GSD) values ranging from 2.30 to 3.08. The local AF based on inhalation exposure resulted in a similar distribution, with GM values of 1 and GSD values between 2.53 and 2.70. Our analysis confirms former analyses of interspecies differences, including dog and human data. Furthermore, it supports the principle of allometric scaling according to caloric demand in cases where body doses are applied. In conclusion, an interspecies distribution animal/human with a GM equal to allometric scaling and a GSD of 2.5 was derived.
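
    Given a lognormal ratio distribution, an assessment factor covering a chosen percentile follows directly from the GM and GSD; a small sketch (the 95% coverage level is our illustrative choice, not the paper's recommendation):

      from statistics import NormalDist

      def assessment_factor(gm: float, gsd: float, coverage: float) -> float:
          """Percentile of a lognormal distribution with geometric mean gm and
          geometric standard deviation gsd: AF = gm * gsd**z, with z the
          standard-normal quantile of the requested coverage."""
          z = NormalDist().inv_cdf(coverage)
          return gm * gsd ** z

      # GM = 1, GSD = 2.5, as in the derived interspecies distribution:
      print(round(assessment_factor(1.0, 2.5, 0.95), 2))  # about 4.5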

  8. Efficiency of 22 online databases in the search for physicochemical, toxicological and ecotoxicological information on chemicals.

    PubMed

    Guerbet, Michel; Guyodo, Gaetan

    2002-03-01

    The objective of this study was to evaluate the efficiency of 22 free online databases that could be used for an exhaustive search of physicochemical, toxicological and/or ecotoxicological information about various chemicals. Twenty-two databases with free access on the Internet were referenced. We then selected 27 major physicochemical, toxicological and ecotoxicological criteria and 14 compounds belonging to seven different chemical classes, which were used to interrogate all the databases. Two indices were successively calculated to evaluate efficiency, with and without taking account of database specialization. More than 50% of the 22 databases 'knew' all of the 14 chemicals, but the quantity of information provided varies greatly from one to another, and most are poorly documented. Two categories clearly appear: specialized and non-specialized databases. The HSDB database is the most efficient general database to be searched first, because it is well documented for most of the 27 criteria. However, some specialized databases (i.e. EXTOXNET, SOLVEDB, etc.) must be searched secondarily to find additional information.

  9. MIDAS: a database-searching algorithm for metabolite identification in metabolomics.

    PubMed

    Wang, Yingfeng; Kora, Guruprasad; Bowen, Benjamin P; Pan, Chongle

    2014-10-07

    A database searching approach can be used for metabolite identification in metabolomics by matching measured tandem mass spectra (MS/MS) against the predicted fragments of metabolites in a database. Here, we present the open-source MIDAS algorithm (Metabolite Identification via Database Searching). To evaluate a metabolite-spectrum match (MSM), MIDAS first enumerates possible fragments from a metabolite by systematic bond dissociation, then calculates the plausibility of the fragments based on their fragmentation pathways, and finally scores the MSM to assess how well the experimental MS/MS spectrum from collision-induced dissociation (CID) is explained by the metabolite's predicted CID MS/MS spectrum. MIDAS was designed to search high-resolution tandem mass spectra acquired on time-of-flight or Orbitrap mass spectrometer against a metabolite database in an automated and high-throughput manner. The accuracy of metabolite identification by MIDAS was benchmarked using four sets of standard tandem mass spectra from MassBank. On average, for 77% of original spectra and 84% of composite spectra, MIDAS correctly ranked the true compounds as the first MSMs out of all MetaCyc metabolites as decoys. MIDAS correctly identified 46% more original spectra and 59% more composite spectra at the first MSMs than an existing database-searching algorithm, MetFrag. MIDAS was showcased by searching a published real-world measurement of a metabolome from Synechococcus sp. PCC 7002 against the MetaCyc metabolite database. MIDAS identified many metabolites missed in the previous study. MIDAS identifications should be considered only as candidate metabolites, which need to be confirmed using standard compounds. To facilitate manual validation, MIDAS provides annotated spectra for MSMs and labels observed mass spectral peaks with predicted fragments. The database searching and manual validation can be performed online at http://midas.omicsbio.org.

  10. Global search tool for the Advanced Photon Source Integrated Relational Model of Installed Systems (IRMIS) database.

    SciTech Connect

    Quock, D. E. R.; Cianciarulo, M. B.; APS Engineering Support Division; Purdue Univ.

    2007-01-01

    The Integrated Relational Model of Installed Systems (IRMIS) is a relational database tool that has been implemented at the Advanced Photon Source to maintain an updated account of approximately 600 control system software applications, 400,000 process variables, and 30,000 control system hardware components. To effectively display this large amount of control system information to operators and engineers, IRMIS was initially built with nine Web-based viewers: Applications Organizing Index, IOC, PLC, Component Type, Installed Components, Network, Controls Spares, Process Variables, and Cables. However, since each viewer is designed to provide details from only one major category of the control system, the necessity for a one-stop global search tool for the entire database became apparent. The user requirements for extremely fast database search time and ease of navigation through search results led to the choice of Asynchronous JavaScript and XML (AJAX) technology in the implementation of the IRMIS global search tool. Unique features of the global search tool include a two-tier level of displayed search results, and a database data integrity validation and reporting mechanism.

  11. The LAILAPS search engine: a feature model for relevance ranking in life science databases.

    PubMed

    Lange, Matthias; Spies, Karl; Colmsee, Christian; Flemming, Steffen; Klapperstück, Matthias; Scholz, Uwe

    2010-03-25

    Efficient and effective information retrieval in the life sciences is one of the most pressing challenges in bioinformatics. The incredible growth of life science databases into a vast network of interconnected information systems is to the same extent a big challenge and a great chance for life science research. The knowledge found on the Web, and in particular in life-science databases, is a valuable major resource. In order to bring it to the scientist's desktop, well-performing search engines are essential. Here, neither the response time nor the number of results is the most important factor; for millions of query results, the most crucial factor is relevance ranking. In this paper, we present a feature model for relevance ranking in life science databases and its implementation in the LAILAPS search engine. Motivated by observation of user behavior during inspection of search engine results, we condensed a set of 9 relevance-discriminating features. These features are intuitively used by scientists, who briefly screen database entries for potential relevance. The features are both sufficient to estimate potential relevance and efficiently quantifiable. The derivation of a relevance prediction function that computes relevance from these features constitutes a regression problem. To solve this problem, we used artificial neural networks trained with a reference set of relevant database entries for 19 protein queries. Supporting a flexible text index and a simple data import format, these concepts are implemented in the LAILAPS search engine. It can easily be used both as a search engine for comprehensive integrated life science databases and for small in-house project databases. LAILAPS is publicly available for SWISSPROT data at http://lailaps.ipk-gatersleben.de.
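
    A toy version of the regression setup (scikit-learn stands in for the authors' network; the features and relevance labels are synthetic): nine features per entry, one small hidden layer, and ranking by predicted relevance.

      import numpy as np
      from sklearn.neural_network import MLPRegressor

      rng = np.random.default_rng(0)
      X = rng.random((200, 9))          # 9 relevance features per database entry
      y = X @ rng.random(9)             # synthetic relevance labels
      y /= y.max()

      model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
      model.fit(X, y)
      ranking = np.argsort(-model.predict(X))  # entries by predicted relevance
      print(ranking[:10])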

  12. CLIP: similarity searching of 3D databases using clique detection.

    PubMed

    Rhodes, Nicholas; Willett, Peter; Calvet, Alain; Dunbar, James B; Humblet, Christine

    2003-01-01

    This paper describes a program for 3D similarity searching, called CLIP (for Candidate Ligand Identification Program), that uses the Bron-Kerbosch clique detection algorithm to find those structures in a file that have large structures in common with a target structure. Structures are characterized by the geometric arrangement of pharmacophore points, and the similarity between two structures is calculated using modifications of the Simpson and Tanimoto association coefficients. This modification takes into account the fact that a distance tolerance is required to ensure that pairs of interatomic distances can be regarded as equivalent during the clique-construction stage of the matching algorithm. Experiments with HIV assay data demonstrate the effectiveness and the efficiency of this approach to virtual screening.
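
    The clique-detection step is small enough to sketch. In CLIP's setting, vertices are matched pharmacophore-point pairs and edges connect pairs whose interatomic distances agree within tolerance, so a maximal clique corresponds to a large common 3D substructure; the toy correspondence graph below is illustrative.

      def bron_kerbosch(R, P, X, graph, cliques):
          """Classic Bron-Kerbosch enumeration of maximal cliques.
          R: growing clique, P: candidate vertices, X: already-processed."""
          if not P and not X:
              cliques.append(R)
              return
          for v in list(P):
              bron_kerbosch(R | {v}, P & graph[v], X & graph[v], graph, cliques)
              P.remove(v)
              X.add(v)

      graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}  # adjacency sets
      cliques = []
      bron_kerbosch(set(), set(graph), set(), graph, cliques)
      print(cliques)  # [{0, 1, 2}, {2, 3}]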

  13. Development and Validation of Search Filters to Identify Articles on Family Medicine in Online Medical Databases.

    PubMed

    Pols, David H J; Bramer, Wichor M; Bindels, Patrick J E; van de Laar, Floris A; Bohnen, Arthur M

    2015-01-01

    Physicians and researchers in the field of family medicine often need to find relevant articles in online medical databases for a variety of reasons. Because a search filter may help improve the efficiency and quality of such searches, we aimed to develop and validate search filters to identify research studies of relevance to family medicine. Using a new and objective method for search filter development, we developed and validated 2 search filters for family medicine. The sensitive filter had a sensitivity of 96.8% and a specificity of 74.9%. The specific filter had a specificity of 97.4% and a sensitivity of 90.3%. Our new filters should aid literature searches in the family medicine field. The sensitive filter may help researchers conducting systematic reviews, whereas the specific filter may help family physicians find answers to clinical questions at the point of care when time is limited.

  14. Vehicle-triggered video compression/decompression for fast and efficient searching in large video databases

    NASA Astrophysics Data System (ADS)

    Bulan, Orhan; Bernal, Edgar A.; Loce, Robert P.; Wu, Wencheng

    2013-03-01

    Video cameras are widely deployed along city streets, interstate highways, traffic lights, stop signs and toll booths by entities that perform traffic monitoring and law enforcement. The videos captured by these cameras are typically compressed and stored in large databases. A rapid search for a specific vehicle within a large database of compressed videos is often required and can be time-critical, even a matter of life or death. In this paper, we propose video compression and decompression algorithms that enable fast and efficient vehicle or, more generally, event searches in large video databases. The proposed algorithm selects reference frames (i.e., I-frames) based on a vehicle having been detected at a specified position within the scene being monitored while compressing a video sequence. A search for a specific vehicle in the compressed video stream is performed across the reference frames only, which does not require decompression of the full video sequence as in traditional search algorithms. Our experimental results on videos captured on a local road show that the proposed algorithm significantly reduces the search space (thus reducing time and computational resources) in vehicle search tasks within compressed video streams, particularly those captured in light traffic conditions.

  15. SW#db: GPU-Accelerated Exact Sequence Similarity Database Search

    PubMed Central

    Korpar, Matija; Šošić, Martin; Blažeka, Dino; Šikić, Mile

    2015-01-01

    In recent years we have witnessed growth in sequencing yield, in the number of samples sequenced, and, as a result, in the size of publicly maintained sequence databases. This ever-increasing volume of data has put high requirements on protein similarity search algorithms, with two opposing goals: keeping running times acceptable while maintaining a high enough level of sensitivity. The most time-consuming step of similarity search is the local alignment between query and database sequences. This step is usually performed using exact local alignment algorithms such as Smith-Waterman. Due to its quadratic time complexity, aligning a query against the whole database is usually too slow. Therefore, the majority of protein similarity search methods apply heuristics before the exact local alignment to reduce the number of candidate sequences in the database. However, there is still a need to align a query sequence to the reduced database. In this paper we present the SW#db tool and a library for fast exact similarity search. Although its running times as a standalone tool are comparable to those of BLAST, it is primarily intended for the exact local alignment phase in which the database of sequences has already been reduced. It uses both GPU and CPU parallelization and, at the time of writing, was 4–5 times faster than SSEARCH, 6–25 times faster than CUDASW++, and more than 20 times faster than SSW, using multiple queries on the Swiss-Prot and UniRef90 databases. PMID:26719890
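
    The quadratic-time step in question is the Smith-Waterman recurrence; a minimal scoring-only version (linear gap penalty, textbook parameters) shows why database-scale use needs parallel hardware or prior filtering:

      def smith_waterman(a: str, b: str, match=3, mismatch=-3, gap=-2):
          """Exact local alignment score by dynamic programming,
          O(len(a) * len(b)) time and memory."""
          H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
          best = 0
          for i in range(1, len(a) + 1):
              for j in range(1, len(b) + 1):
                  diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
                  H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
                  best = max(best, H[i][j])
          return best

      print(smith_waterman("TGTTACGG", "GGTTGACTA"))  # 13, the classic example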

  16. Support Vector Machines for Improved Peptide Identification from Tandem Mass Spectrometry Database Search

    SciTech Connect

    Webb-Robertson, Bobbie-Jo M.

    2009-05-06

    Accurate identification of peptides is a current challenge in mass spectrometry (MS)-based proteomics. The standard approach uses a search routine to compare tandem mass spectra to a database of peptides associated with the target organism. These database search routines yield multiple metrics associated with the quality of the mapping of the experimental spectrum to the theoretical spectrum of a peptide. The structure of these results makes separating correct from false identifications difficult and has created a false-identification problem. Statistical confidence scores are one approach to battling this false-positive problem and have led to significant improvements in peptide identification. We have shown that machine learning, specifically the support vector machine (SVM), is an effective approach to separating true peptide identifications from false ones. The SVM-based peptide statistical scoring method transforms a peptide into a vector representation based on database search metrics in order to train and validate the SVM. In practice, following the database search routine, a peptide is converted to its vector representation and the SVM generates a single statistical score that is then used to classify its presence or absence in the sample.
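
    A schematic of the approach with scikit-learn (the feature choices and synthetic data are ours; the published method defines its own database-search metrics): train an SVM on labeled target/decoy PSM vectors, then use its probability output as the single statistical score.

      import numpy as np
      from sklearn.svm import SVC

      rng = np.random.default_rng(1)
      # Hypothetical PSM vectors: (search score, delta to runner-up,
      # fraction of matched ions, mass error in ppm).
      X_true = rng.normal([3.0, 1.0, 0.7, 0.0], 0.5, (100, 4))   # confident targets
      X_false = rng.normal([1.5, 0.2, 0.3, 0.0], 0.5, (100, 4))  # decoy matches
      X = np.vstack([X_true, X_false])
      y = np.array([1] * 100 + [0] * 100)

      svm = SVC(probability=True).fit(X, y)
      new_psms = rng.normal([2.5, 0.8, 0.6, 0.0], 0.5, (5, 4))
      print(svm.predict_proba(new_psms)[:, 1])  # the single statistical score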

  17. Planning for End-User Database Searching: Drexel and the Mac: A User-Consistent Interface.

    ERIC Educational Resources Information Center

    LaBorie, Tim; Donnelly, Leslie

    Drexel University instituted a microcomputing program in 1984 which required all freshmen to own Apple Macintosh microcomputers. All students were taught database searching on the BRS (Bibliographic Retrieval Services) system as part of the freshman humanities curriculum, and the university library was chosen as the site to house continuing…

  18. Parallel database search and prime factorization with magnonic holographic memory devices

    SciTech Connect

    Khitun, Alexander

    2015-12-28

    In this work, we describe the capabilities of Magnonic Holographic Memory (MHM) for parallel database search and prime factorization. MHM is a type of holographic device which utilizes spin waves for data transfer and processing. Its operation is based on the correlation between the phases and the amplitudes of the input spin waves and the output inductive voltage. The input of MHM is provided by a phased array of spin wave generating elements, allowing the production of phase patterns of arbitrary form. The latter makes it possible to code logic states into the phases of propagating waves and exploit wave superposition for parallel data processing. We present the results of numerical modeling illustrating parallel database search and prime factorization. The results of numerical simulations on database search are in agreement with the available experimental data. The use of classical wave interference may result in a significant speedup over conventional digital logic circuits in special-task data processing (e.g., √n in database search). Potentially, magnonic holographic devices can be implemented as complementary logic units to digital processors. Physical limitations and technological constraints of the spin wave approach are also discussed.

  19. Successful Keyword Searching: Initiating Research on Popular Topics Using Electronic Databases.

    ERIC Educational Resources Information Center

    MacDonald, Randall M.; MacDonald, Susan Priest

    Students are using electronic resources more than ever before to locate information for assignments. Without the proper search terms, results are incomplete, and students are frustrated. Using the keywords, key people, organizations, and Web sites provided in this book and compiled from the most commonly used databases, students will be able to…

  20. Toward a public analysis database for LHC new physics searches using MadAnalysis 5

    NASA Astrophysics Data System (ADS)

    Dumont, B.; Fuks, B.; Kraml, S.; Bein, S.; Chalons, G.; Conte, E.; Kulkarni, S.; Sengupta, D.; Wymant, C.

    2015-02-01

    We present the implementation, in the MadAnalysis 5 framework, of several ATLAS and CMS searches for supersymmetry in data recorded during the first run of the LHC. We provide extensive details on the validation of our implementations and propose to create a public analysis database within this framework.

  1. An Interactive Iterative Method for Electronic Searching of Large Literature Databases

    ERIC Educational Resources Information Center

    Hernandez, Marco A.

    2013-01-01

    PubMed® is an on-line literature database hosted by the U.S. National Library of Medicine. Containing over 21 million citations for biomedical literature--both abstracts and full text--in the areas of the life sciences, behavioral studies, chemistry, and bioengineering, PubMed® represents an important tool for researchers. PubMed® searches return…

  2. Sports Information Online: Searching the SPORT Database and Tips for Finding Sports Medicine Information Online.

    ERIC Educational Resources Information Center

    Janke, Richard V.; And Others

    1988-01-01

    The first article describes SPORT, a database providing international coverage of athletics and physical education, and compares it to other online services in terms of coverage, thesauri, possible search strategies, and actual usage. The second article reviews available online information on sports medicine. (CLB)

  3. Parallel database search and prime factorization with magnonic holographic memory devices

    NASA Astrophysics Data System (ADS)

    Khitun, Alexander

    2015-12-01

    In this work, we describe the capabilities of Magnonic Holographic Memory (MHM) for parallel database search and prime factorization. MHM is a type of holographic device which utilizes spin waves for data transfer and processing. Its operation is based on the correlation between the phases and the amplitudes of the input spin waves and the output inductive voltage. The input of MHM is provided by a phased array of spin wave generating elements, allowing the production of phase patterns of arbitrary form. This makes it possible to encode logic states in the phases of propagating waves and exploit wave superposition for parallel data processing. We present the results of numerical modeling illustrating parallel database search and prime factorization. The results of the numerical simulations on database search are in agreement with the available experimental data. The use of classical wave interference may result in a significant speedup over conventional digital logic circuits in special-task data processing (e.g., a √n speedup in database search). Potentially, magnonic holographic devices can be implemented as complementary logic units to digital processors. Physical limitations and technological constraints of the spin wave approach are also discussed.

  4. Discovering More Chemical Concepts from 3D Chemical Information Searches of Crystal Structure Databases

    ERIC Educational Resources Information Center

    Rzepa, Henry S.

    2016-01-01

    Three new examples are presented illustrating three-dimensional chemical information searches of the Cambridge structure database (CSD) from which basic core concepts in organic and inorganic chemistry emerge. These include connecting the regiochemistry of aromatic electrophilic substitution with the geometrical properties of hydrogen bonding…

  5. Systematic reviews and meta-analyses of traditional chinese medicine must search chinese databases to reduce language bias.

    PubMed

    Wu, Xin-Yin; Tang, Jin-Ling; Mao, Chen; Yuan, Jin-Qiu; Qin, Ying; Chung, Vincent C H

    2013-01-01

    Systematic reviews (SRs) that fail to search non-English databases may miss relevant studies and introduce selection bias. The bias may be particularly severe in SRs of traditional Chinese medicine (TCM), as most randomized controlled trials (RCTs) in TCM are published and accessible only in Chinese. In this study we investigated how often Chinese databases were not searched in SRs of TCM, how many trials were missed, and whether bias may occur if Chinese databases are not searched. We searched 5 databases in English and 3 in Chinese for RCTs of Chinese herbal medicine for coronary artery disease and found that 96.64% (115/119) of eligible studies could be identified only from Chinese databases. In a random sample of 80 Cochrane reviews on TCM, we found that Chinese databases were searched in only 43 (53.75%), in which almost all the included studies were identified from Chinese databases. We also compared SRs on the same topic and found that they may draw different conclusions if Chinese databases are not searched. In conclusion, an overwhelmingly high percentage of eligible trials on TCM could only be identified in Chinese databases. Reviewers in TCM are advised to search Chinese databases to reduce potential selection bias.

  6. Studying Gene Expression: Database Searches and Promoter Fusions to Investigate Transcriptional Regulation in Bacteria

    PubMed Central

    Martinez-Vaz, Betsy M.; Makarevitch, Irina; Stensland, Shane

    2010-01-01

    A laboratory project was designed to illustrate how to search biological databases and utilize the information provided by these resources to investigate transcriptional regulation in Escherichia coli. The students searched several databases (NCBI Genomes, RegulonDB and EcoCyc) to learn about gene function, regulation, and the organization of transcriptional units. A fluorometer and GFP promoter fusions were used to obtain fluorescence data and measure changes in transcriptional activity. The class designed and performed experiments to investigate the regulation of genes necessary for biosynthesis of amino acids and how expression is affected by environmental signals and transcriptional regulators. Assessment data showed that this activity enhanced students’ knowledge of databases, reporter genes and transcriptional regulation. PMID:23653697

  7. Multimedia explorer: image database, image proxy-server and search-engine.

    PubMed Central

    Frankewitsch, T.; Prokosch, U.

    1999-01-01

    Multimedia plays a major role in medicine. Databases containing images, movies, or other types of multimedia objects are increasing in number, especially on the WWW. However, no good retrieval mechanism or search engine currently exists to efficiently track down such multimedia sources in the vast amount of information provided by the WWW. Second, the tools for searching databases are usually not adapted to the properties of images, and HTML pages do not allow complex searches. Establishing a more comfortable retrieval mechanism therefore requires a higher-level programming language such as JAVA. With this platform-independent language it is possible to create extensions to commonly used web browsers. These applets offer a graphical user interface for high-level navigation. We implemented a database using JAVA objects as the primary storage containers, which are then stored in a JAVA-controlled ORACLE8 database. Navigation depends on a structured vocabulary enhanced by a semantic network. With this approach multimedia objects can be encapsulated within a logical module for quick data retrieval. PMID:10566463

  8. A Novel Concept for the Search and Retrieval of the Derwent Markush Resource Database.

    PubMed

    Barth, Andreas; Stengel, Thomas; Litterst, Edwin; Kraut, Hans; Matuszczyk, Henry; Ailer, Franz; Hajkowski, Steve

    2016-05-23

    The representation of and search for generic chemical structures (Markush) remains a challenge. Several research groups have addressed this problem, and over time a limited number of practical solutions have been proposed. Today there are two large commercial providers of Markush databases: Chemical Abstracts Service (CAS) and Thomson Reuters. The Thomson Reuters "Derwent" Markush database is currently offered via the online services Questel and STN and as a data feed for in-house use. The aim of this paper is to briefly review the existing Markush systems (databases plus search engines) and to describe our new approach for the implementation of the Derwent Markush Resource on STN. Our new approach demonstrates the integration of the Derwent Markush Resource database into the existing chemistry-focused STN platform without loss of detail. This provides compatibility with other structure and Markush databases on STN and at the same time makes it possible to deploy the specific features and functions of the Derwent approach. It is shown that the different Markush languages developed by CAS and Derwent can be combined into a single general Markush description. In this concept the generic nodes are grouped together in a unique hierarchy into which all chemical elements and fragments can be integrated. As a consequence, both systems are searchable using a single structure query. Moreover, the presented concept could serve as a promising starting point for a common generalized description of Markush structures.

  9. Tandem Mass Spectrum Sequencing: An Alternative to Database Search Engines in Shotgun Proteomics.

    PubMed

    Muth, Thilo; Rapp, Erdmann; Berven, Frode S; Barsnes, Harald; Vaudel, Marc

    2016-01-01

    Protein identification via database searches has become the gold standard in mass spectrometry based shotgun proteomics. However, as the quality of tandem mass spectra improves, direct mass spectrum sequencing gains interest as a database-independent alternative. In this chapter, the general principle of this so-called de novo sequencing is introduced along with pitfalls and challenges of the technique. The main tools available are presented with a focus on user friendly open source software which can be directly applied in everyday proteomic workflows.

  10. Searching molecular structure databases with tandem mass spectra using CSI:FingerID

    PubMed Central

    Dührkop, Kai; Shen, Huibin; Meusel, Marvin; Rousu, Juho; Böcker, Sebastian

    2015-01-01

    Metabolites provide a direct functional signature of cellular state. Untargeted metabolomics experiments usually rely on tandem MS to identify the thousands of compounds in a biological sample. Today, the vast majority of metabolites remain unknown. We present a method for searching molecular structure databases using tandem MS data of small molecules. Our method computes a fragmentation tree that best explains the fragmentation spectrum of an unknown molecule. We use the fragmentation tree to predict the molecular structure fingerprint of the unknown compound using machine learning. This fingerprint is then used to search a molecular structure database such as PubChem. Our method is shown to improve on the competing methods for computational metabolite identification by a considerable margin. PMID:26392543

  11. Discovery of novel mesangial cell proliferation inhibitors using a three-dimensional database searching method.

    PubMed

    Kurogi, Y; Miyata, K; Okamura, T; Hashimoto, K; Tsutsumi, K; Nasu, M; Moriyasu, M

    2001-07-05

    A three-dimensional pharmacophore model of mesangial cell (MC) proliferation inhibitors was generated from a training set of 4-(diethoxyphosphoryl)methyl-N-(3-phenyl-[1,2,4]thiadiazol-5-yl)benzamide, 2, and its derivatives using the Catalyst/HIPHOP software program. On the basis of the in vitro MC proliferation inhibitory activity, a pharmacophore model was generated with seven features consisting of two hydrophobic regions, two hydrophobic aromatic regions, and three hydrogen bond acceptors. Using this model as a three-dimensional query to search the Maybridge database, 41 structurally novel compounds were identified. Evaluation of available samples of the 41 identified compounds revealed over 50% inhibitory activity in the 100 nM range. Interestingly, the compounds newly identified by the 3D database searching method exhibited reduced inhibition of normal proximal tubular epithelial cell proliferation compared to the training set compounds.

  12. BioSCAN: a network sharable computational resource for searching biosequence databases.

    PubMed

    Singh, R K; Hoffman, D L; Tell, S G; White, C T

    1996-06-01

    We describe a network sharable, interactive computational tool for rapid and sensitive search and analysis of biomolecular sequence databases such as GenBank, GenPept, Protein Identification Resource, and SWISS-PROT. The resource is accessible via the World Wide Web using popular client software such as Mosaic and Netscape. The client software is freely available on a number of computing platforms including Macintosh, IBM-PC, and Unix workstations.

  13. Rapid identification of anonymous subjects in large criminal databases: problems and solutions in IAFIS III/FBI subject searches

    NASA Astrophysics Data System (ADS)

    Kutzleb, C. D.

    1997-02-01

    The high incidence of recidivism (repeat offenders) in the criminal population makes the use of the IAFIS III/FBI criminal database an important tool in law enforcement. The problems and solutions employed by IAFIS III/FBI criminal subject searches are discussed for the following topics: (1) subject search selectivity and reliability; (2) the difficulty and limitations of identifying subjects whose anonymity may be a prime objective; (3) database size, search workload, and search response time; (4) techniques and advantages of normalizing the variability in an individual's name and identifying features into identifiable and discrete categories; and (5) the use of database demographics to estimate the likelihood of a match between a search subject and database subjects.

  14. A hybrid approach for addressing ring flexibility in 3D database searching.

    PubMed

    Sadowski, J

    1997-01-01

    A hybrid approach for flexible 3D database searching is presented that addresses the problem of ring flexibility. It combines the explicit storage of up to 25 conformations per ring (for rings of up to eight atoms), generated by the 3D structure generator CORINA, with the power of a torsional fitting technique implemented in the 3D database system UNITY. A comparison with the original UNITY approach, using a database with about 130,000 entries and five different pharmacophore queries, was performed. The hybrid approach scored, on average, 10-20% more hits than the reference run. Moreover, specific problems with unrealistic hit geometries produced by the original approach can be excluded. In addition, the influence of the maximum number of ring conformations per molecule was investigated. An optimal number of 10 conformations per molecule is recommended.

  15. Fast multiresolution search algorithm for optimal retrieval in large multimedia databases

    NASA Astrophysics Data System (ADS)

    Song, Byung C.; Kim, Myung J.; Ra, Jong Beom

    1999-12-01

    Most content-based image retrieval systems require a distance computation for each candidate image in the database. As a brute-force approach, an exhaustive search can be employed for this computation. However, exhaustive search is time-consuming and limits the usefulness of such systems. Thus, there is a growing demand for a fast algorithm that provides the same retrieval results as the exhaustive search. In this paper, we propose a fast search algorithm based on a multi-resolution data structure. The proposed algorithm computes a lower bound of the distance at each level and compares it with the latest minimum distance, starting from the low-resolution level. Once it is larger than the latest minimum distance, we can exclude the candidate without calculating the full-resolution distance. By doing this, we can dramatically reduce the total computational complexity. Notably, the proposed fast algorithm provides not only the same retrieval results as the exhaustive search, but also a faster searching capability than existing fast algorithms. For additional performance improvement, the proposed algorithm can easily be combined with existing tree-based algorithms. The algorithm can also be used for fast matching of various features such as luminance histograms, edge images, and local binary partition textures.
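
    The pruning loop described above can be sketched as follows, under the assumption (which the paper establishes for its own multi-resolution structure) that each coarse-level distance is a proven lower bound of the full-resolution distance. The pyramid layout and Euclidean metric here are illustrative.

```python
# Sketch of coarse-to-fine lower-bound pruning over feature pyramids.
import math

def search(query_pyramid, candidate_pyramids, dist):
    """Return (index, distance) of the nearest candidate, pruning early."""
    best, best_id = float("inf"), None
    for cid, cand in enumerate(candidate_pyramids):
        # Coarse levels first: each dist() call is assumed to lower-bound
        # the full-resolution distance, so exceeding `best` allows pruning.
        if any(dist(q, c) > best
               for q, c in zip(query_pyramid[:-1], cand[:-1])):
            continue
        d = dist(query_pyramid[-1], cand[-1])  # full-resolution distance
        if d < best:
            best, best_id = d, cid
    return best_id, best

query = [[1.0], [1.0, 2.0], [1.0, 2.0, 3.0, 4.0]]
db = [[[0.9], [0.9, 2.1], [0.9, 2.1, 3.2, 3.9]],
      [[5.0], [5.0, 5.0], [5.0, 5.0, 5.0, 5.0]]]  # pruned at the coarsest level
print(search(query, db, math.dist))
```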

  16. A grammar based methodology for structural motif finding in ncRNA database search.

    PubMed

    Quest, Daniel; Tapprich, William; Ali, Hesham

    2007-01-01

    In recent years, sequence database searching has been conducted through local alignment heuristics, pattern-matching, and comparison of short statistically significant patterns. While these approaches have unlocked many clues as to sequence relationships, they are limited in that they do not provide context-sensitive searching capabilities (e.g., considering pseudoknots, protein binding positions, and complementary base pairs). Stochastic grammars (hidden Markov models (HMMs) and stochastic context-free grammars (SCFGs)) do allow for flexibility in terms of local context, but this flexibility comes at the cost of increased computational complexity. In this paper we introduce a new grammar-based method for searching for RNA motifs that exist within a conserved RNA structure. Our method constrains computational complexity by using a chain of topology elements. Through a case study we present the algorithmic approach and benchmark it against traditional methods.

  17. Pharmacophore modeling and three-dimensional database searching for drug design using catalyst.

    PubMed

    Kurogi, Y; Güner, O F

    2001-07-01

    Perceiving a pharmacophore is the first essential step towards understanding the interaction between a receptor and a ligand. Once a pharmacophore is established, a beneficial use of it is 3D database searching to retrieve novel compounds that match the pharmacophore, without necessarily duplicating the topological features of known active compounds (hence remaining independent of existing patents). As 3D searching technology has evolved over the years, it has been effectively used for lead optimization, combinatorial library focusing, and virtual high-throughput screening. Clearly established as one of the successful computational tools in rational drug design, this technology is reviewed here with a brief history of its evolution and detailed algorithms of Catalyst, the latest 3D searching software to be released. We also provide a brief summary of published successes with this technology, including two recent patent applications.

  18. The effect of wild card designations and rare alleles in forensic DNA database searches.

    PubMed

    Tvedebrink, Torben; Bright, Jo-Anne; Buckleton, John S; Curran, James M; Morling, Niels

    2015-05-01

    Forensic DNA databases are powerful tools used for the identification of persons of interest in criminal investigations. Typically, they consist of two parts: (1) a database containing DNA profiles of known individuals and (2) a database of DNA profiles associated with crime scenes. The risk of adventitious or chance matches between crimes and innocent people increases as the number of profiles within a database grows and more data are shared between forensic DNA databases, e.g. from different jurisdictions. The DNA profiles obtained from crime scenes are often partial because crime samples may be compromised in quantity or quality. When an individual's profile cannot be fully resolved from a DNA mixture, ambiguity is introduced. A wild card, F, may be used in place of an allele that has dropped out or when an ambiguous profile is resolved from a DNA mixture. Variant alleles that do not correspond to any marker in the allelic ladder, or that appear above or below the extent of the allelic ladder range, are assigned the designation R, for rare allele. R alleles are position specific with respect to the observed/unambiguous allele. The F and R designations are made when the exact genotype has not been determined. Both are treated as wild cards for searching, which results in an increased chance of adventitious matches. We investigated the probability of adventitious matches given these two types of wild cards.
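
    A toy sketch of wild-card-aware profile matching is given below. For simplicity it treats both F and R as designations that match any allele at their position, glossing over the position-specific nature of R described above; all profile values are hypothetical.

```python
# Sketch of wild-card matching in DNA database searches (simplified model).

WILDCARDS = {"F", "R"}  # F: dropout/ambiguity; R: rare allele designation

def alleles_match(a: str, b: str) -> bool:
    return a == b or a in WILDCARDS or b in WILDCARDS

def profiles_match(crime_profile, database_profile) -> bool:
    """Profiles are equal-length lists of allele designations, one per locus."""
    return all(alleles_match(a, b)
               for a, b in zip(crime_profile, database_profile))

# A wild card widens the set of genotypes a partial profile matches,
# which is exactly why the chance of adventitious matches increases.
print(profiles_match(["16", "F", "9.3"], ["16", "14", "9.3"]))  # True
print(profiles_match(["16", "15", "9.3"], ["16", "14", "9.3"]))  # False
```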

  19. The Use of Research Electronic Data Capture (REDCap) Software to Create a Database of Librarian-Mediated Literature Searches

    PubMed Central

    LYON, JENNIFER A.; GARCIA-MILIAN, ROLANDO; NORTON, HANNAH F.; TENNANT, MICHELE R.

    2015-01-01

    Expert-mediated literature searching, a keystone service in biomedical librarianship, would benefit significantly from regular methodical review. This paper describes the novel use of Research Electronic Data Capture (REDCap) software to create a database of literature searches conducted at a large academic health sciences library. An archive of paper search requests was entered into REDCap, and librarians now prospectively enter records for current searches. Having search data readily available allows librarians to reuse search strategies and track their workload. In aggregate, this data can help guide practice and determine priorities by identifying users’ needs, tracking librarian effort, and focusing librarians’ continuing education. PMID:25023012

  20. Searching for patterns in remote sensing image databases using neural networks

    NASA Technical Reports Server (NTRS)

    Paola, Justin D.; Schowengerdt, Robert A.

    1995-01-01

    We have investigated a method, based on a successful neural network multispectral image classification system, of searching for single patterns in remote sensing databases. While defining the pattern to search for and the feature to be used for that search (spectral, spatial, temporal, etc.) is challenging, a more difficult task is selecting competing patterns to train against the desired pattern. Schemes for competing pattern selection, including random selection and human interpreted selection, are discussed in the context of an example detection of dense urban areas in Landsat Thematic Mapper imagery. When applying the search to multiple images, a simple normalization method can alleviate the problem of inconsistent image calibration. Another potential problem, that of highly compressed data, was found to have a minimal effect on the ability to detect the desired pattern. The neural network algorithm has been implemented using the PVM (Parallel Virtual Machine) library and nearly-optimal speedups have been obtained that help alleviate the long process of searching through imagery.

  1. Data Analysis Provenance: Use Case for Exoplanet Search in CoRoT Database

    NASA Astrophysics Data System (ADS)

    de Souza, L.; Salete Marcon Gomes Vaz, M.; Emílio, M.; Ferreira da Rocha, J. C.; Janot Pacheco, E.; Carlos Boufleur, R.

    2012-09-01

    CoRoT (COnvection Rotation and Planetary Transits) is a mission led by the French national space agency CNES, in collaboration with Austria, Spain, Germany, Belgium, and Brazil. The mission's priority is dedicated to exoplanet search and stellar seismology. The CoRoT light curve database became public one year after delivery of the data to the CoRoT Co-Is, following the CoRoT data policy. The CoRoT archive contains thousands of light curves in FITS format. Several exoplanet search algorithms require detrending algorithms to remove both stellar and instrumental signals, improving the chance of detecting a transit. Different detrending and transit detection algorithms can be applied to the same database. Tracking the origin of the information and how the data were derived at each level of the data analysis process is essential to allow sharing, reuse, reprocessing, and further analysis. This work aims at applying a formalized and codified knowledge model by means of a domain ontology, which enriches the data analysis with semantics and standardization. It holds the provenance information in the database for a posteriori recovery by humans or software agents.

  2. Exploring Multidisciplinary Data Sets through Database Driven Search Capabilities and Map-Based Web Services

    NASA Astrophysics Data System (ADS)

    O'Hara, S.; Ferrini, V.; Arko, R.; Carbotte, S. M.; Leung, A.; Bonczkowski, J.; Goodwillie, A.; Ryan, W. B.; Melkonian, A. K.

    2008-12-01

    Relational databases containing geospatially referenced data enable the construction of robust data access pathways that can be customized to suit the needs of a diverse user community. Web-based search capabilities driven by radio buttons and pull-down menus can be generated on-the-fly, leveraging the power of the relational database and providing specialists a means of discovering specific data and data sets. While these data access pathways are sufficient for many scientists, map-based data exploration can also be an effective means of data discovery and integration, allowing users to rapidly assess the spatial co-registration of several data types. We present a summary of data access tools currently provided by the Marine Geoscience Data System (www.marine-geo.org) that are intended to serve a diverse community of users and promote data integration. Basic search capabilities allow users to discover data based on data type, device type, geographic region, research program, expedition parameters, personnel, and references. In addition, web services are used to create database-driven map interfaces that provide live access to metadata and data files.

  3. EDULISS: a small-molecule database with data-mining and pharmacophore searching capabilities

    PubMed Central

    Hsin, Kun-Yi; Morgan, Hugh P.; Shave, Steven R.; Hinton, Andrew C.; Taylor, Paul; Walkinshaw, Malcolm D.

    2011-01-01

    We present the relational database EDULISS (EDinburgh University Ligand Selection System), which stores structural, physicochemical and pharmacophoric properties of small molecules. The database comprises a collection of over 4 million commercially available compounds from 28 different suppliers. A user-friendly web-based interface for EDULISS (available at http://eduliss.bch.ed.ac.uk/) has been established providing a number of data-mining possibilities. For each compound a single 3D conformer is stored along with over 1600 calculated descriptor values (molecular properties). A very efficient method for unique compound recognition, especially for a large scale database, is demonstrated by making use of small subgroups of the descriptors. Many of the shape and distance descriptors are held as pre-calculated bit strings permitting fast and efficient similarity and pharmacophore searches which can be used to identify families of related compounds for biological testing. Two ligand searching applications are given to demonstrate how EDULISS can be used to extract families of molecules with selected structural and biophysical features. PMID:21051336
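
    Bit-string descriptors of this kind support fast similarity screening. The sketch below scores toy fingerprints, stored as Python integers, with the Tanimoto coefficient; EDULISS's actual descriptors and search machinery are of course far richer.

```python
# Sketch of fingerprint similarity screening with the Tanimoto coefficient.

def popcount(x: int) -> int:
    return bin(x).count("1")

def tanimoto(fp_a: int, fp_b: int) -> float:
    """Shared on-bits divided by total on-bits (1.0 for two empty prints)."""
    union = popcount(fp_a | fp_b)
    return popcount(fp_a & fp_b) / union if union else 1.0

database = {"cmpd-1": 0b10110110, "cmpd-2": 0b10110111, "cmpd-3": 0b01001000}
query = 0b10110100

# Rank database compounds by similarity to the query fingerprint.
for name in sorted(database, key=lambda k: tanimoto(query, database[k]),
                   reverse=True):
    print(name, round(tanimoto(query, database[name]), 3))
```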

  4. A neotropical Miocene pollen database employing image-based search and semantic modeling

    PubMed Central

    Han, Jing Ginger; Cao, Hongfei; Barb, Adrian; Punyasena, Surangi W.; Jaramillo, Carlos; Shyu, Chi-Ren

    2014-01-01

    • Premise of the study: Digital microscopic pollen images are being generated with increasing speed and volume, producing opportunities to develop new computational methods that increase the consistency and efficiency of pollen analysis and provide the palynological community a computational framework for information sharing and knowledge transfer. • Methods: Mathematical methods were used to assign trait semantics (abstract morphological representations) of the images of neotropical Miocene pollen and spores. Advanced database-indexing structures were built to compare and retrieve similar images based on their visual content. A Web-based system was developed to provide novel tools for automatic trait semantic annotation and image retrieval by trait semantics and visual content. • Results: Mathematical models that map visual features to trait semantics can be used to annotate images with morphology semantics and to search image databases with improved reliability and productivity. Images can also be searched by visual content, providing users with customized emphases on traits such as color, shape, and texture. • Discussion: Content- and semantic-based image searches provide a powerful computational platform for pollen and spore identification. The infrastructure outlined provides a framework for building a community-wide palynological resource, streamlining the process of manual identification, analysis, and species discovery. PMID:25202648

  5. Associative memory model for searching an image database by image snippet

    NASA Astrophysics Data System (ADS)

    Khan, Javed I.; Yun, David Y.

    1994-09-01

    This paper presents an associative memory called multidimensional holographic associative computing (MHAC), which can potentially be used to perform feature-based image database queries using image snippets. MHAC has the unique capability to selectively focus on specific segments of a query frame during associative retrieval. As a result, this model can perform searches on the basis of featural significance described by a subset of the snippet pixels. This capability is critical for visual query in image databases because the cognitive index features in the snippet are often statistically weak. Unlike conventional artificial associative memories, MHAC uses a two-level representation and incorporates additional meta-knowledge about the reliability status of the segments of information it receives and forwards. In this paper we present an analysis of the focus characteristics of MHAC.

  6. Average probability that a "cold hit" in a DNA database search results in an erroneous attribution.

    PubMed

    Song, Yun S; Patil, Anand; Murphy, Erin E; Slatkin, Montgomery

    2009-01-01

    We consider a hypothetical series of cases in which the DNA profile of a crime-scene sample is found to match a known profile in a DNA database (i.e., a "cold hit"), resulting in the identification of a suspect based only on genetic evidence. We show that the average probability that there is another person in the population whose profile matches the crime-scene sample but who is not in the database is approximately 2(N - d)p(A), where N is the number of individuals in the population, d is the number of profiles in the database, and p(A) is the average match probability (AMP) for the population. The AMP is estimated by computing the average of the probabilities that two individuals in the population have the same profile. We show further that if a priori each individual in the population is equally likely to have left the crime-scene sample, then the average probability that the database search attributes the crime-scene sample to a wrong person is (N - d)p(A).
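
    A worked numerical example of these two approximations, with purely illustrative values for the population, database, and AMP:

```python
# Worked example of the paper's two approximations (illustrative numbers).
N = 10_000_000   # individuals in the population
d = 500_000      # profiles in the database
p_A = 1e-9       # average match probability (AMP)

# Average probability that someone outside the database also matches:
p_other_match = 2 * (N - d) * p_A          # ≈ 0.019
# Average probability of a wrong attribution, assuming everyone is a
# priori equally likely to have left the crime-scene sample:
p_wrong = (N - d) * p_A                    # ≈ 0.0095

print(p_other_match, p_wrong)
```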

  7. rasbhari: Optimizing Spaced Seeds for Database Searching, Read Mapping and Alignment-Free Sequence Comparison

    PubMed Central

    Hahn, Lars; Leimeister, Chris-André; Morgenstern, Burkhard

    2016-01-01

    Many algorithms for sequence analysis rely on word matching or word statistics. Often, these approaches can be improved if binary patterns representing match and don’t-care positions are used as a filter, such that only those positions of words are considered that correspond to the match positions of the patterns. The performance of these approaches, however, depends on the underlying patterns. Herein, we show that the overlap complexity of a pattern set, a measure introduced by Ilie and Ilie, is closely related to the variance of the number of matches between two evolutionarily related sequences with respect to this pattern set. We propose a modified hill-climbing algorithm to optimize pattern sets for database searching, read mapping and alignment-free sequence comparison of nucleic-acid sequences; our implementation of this algorithm is called rasbhari. Depending on the application at hand, rasbhari can either minimize the overlap complexity of pattern sets, maximize their sensitivity in database searching or minimize the variance of the number of pattern-based matches in alignment-free sequence comparison. We show that, for database searching, rasbhari generates pattern sets with slightly higher sensitivity than existing approaches. In our Spaced Words approach to alignment-free sequence comparison, pattern sets calculated with rasbhari led to more accurate estimates of phylogenetic distances than the randomly generated pattern sets that we previously used. Finally, we used rasbhari to generate patterns for short read classification with CLARK-S. Here too, the sensitivity of the results could be improved compared to the default patterns of the program. We integrated rasbhari into Spaced Words; the source code of rasbhari is freely available at http://rasbhari.gobics.de/. PMID:27760124
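
    The basic spaced-word operation that such pattern sets support can be sketched as follows: for a binary pattern, extract the word formed by its match positions at every offset of a sequence. This is a simplification of the machinery rasbhari optimises, not its implementation.

```python
# Sketch of spaced-word extraction for one binary pattern
# ("1" = match position, "0" = don't-care position).

def spaced_words(seq: str, pattern: str):
    """Yield (offset, word) pairs keeping only the pattern's match positions."""
    keep = [i for i, c in enumerate(pattern) if c == "1"]
    for off in range(len(seq) - len(pattern) + 1):
        window = seq[off:off + len(pattern)]
        yield off, "".join(window[i] for i in keep)

# Two sequences share a spaced-word match at a position iff they agree
# at all "1" positions of the pattern there.
for off, word in spaced_words("ACGTACGT", "1101"):
    print(off, word)
```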

  8. SAM: String-based sequence search algorithm for mitochondrial DNA database queries

    PubMed Central

    Röck, Alexander; Irwin, Jodi; Dür, Arne; Parsons, Thomas; Parson, Walther

    2011-01-01

    The analysis of the haploid mitochondrial (mt) genome has numerous applications in forensic and population genetics, as well as in disease studies. Although mtDNA haplotypes are usually determined by sequencing, they are rarely reported as a nucleotide string. Traditionally they are presented in a difference-coded position-based format relative to the corrected version of the first sequenced mtDNA. This convention requires recommendations for standardized sequence alignment that is known to vary between scientific disciplines, even between laboratories. As a consequence, database searches that are vital for the interpretation of mtDNA data can suffer from biased results when query and database haplotypes are annotated differently. In the forensic context that would usually lead to underestimation of the absolute and relative frequencies. To address this issue we introduce SAM, a string-based search algorithm that converts query and database sequences to position-free nucleotide strings and thus eliminates the possibility that identical sequences will be missed in a database query. The mere application of a BLAST algorithm would not be a sufficient remedy as it uses a heuristic approach and does not address properties specific to mtDNA, such as phylogenetically stable but also rapidly evolving insertion and deletion events. The software presented here provides additional flexibility to incorporate phylogenetic data, site-specific mutation rates, and other biologically relevant information that would refine the interpretation of mitochondrial DNA data. The manuscript is accompanied by freeware and example data sets that can be used to evaluate the new software (http://stringvalidation.org). PMID:21056022
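
    The core string-conversion idea can be sketched with a hypothetical seven-base reference and substitution-only difference codes; real mtDNA work uses the rCRS and must also handle insertions and deletions, which this sketch omits.

```python
# Sketch: convert difference-coded haplotypes into plain nucleotide strings
# before database comparison (toy reference; substitutions only).

REFERENCE = "GATTACA"  # hypothetical mini-reference, positions 1..7

def to_string(differences) -> str:
    """differences: codes like '3C', meaning position 3 carries a C."""
    seq = list(REFERENCE)
    for code in differences:
        pos, base = int(code[:-1]), code[-1]
        seq[pos - 1] = base
    return "".join(seq)

# Once both query and database entries are plain strings, identical
# haplotypes can no longer be missed because of annotation differences.
print(to_string(["3C"]))          # GACTACA
print(to_string(["3C", "7G"]))    # GACTACG
```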

  9. Addressing statistical biases in nucleotide-derived protein databases for proteogenomic search strategies.

    PubMed

    Blakeley, Paul; Overton, Ian M; Hubbard, Simon J

    2012-11-02

    Proteogenomics has the potential to advance genome annotation through high-quality peptide identifications derived from mass spectrometry experiments, which demonstrate that a given gene or isoform is expressed and translated at the protein level. This can advance our understanding of genome function, discovering novel genes and gene structures that have not yet been identified or validated. Because of the high-throughput shotgun nature of most proteomics experiments, it is essential to carefully control for false positives and prevent any potential misannotation. A number of statistical procedures to deal with this are in wide use in proteomics, calculating false discovery rate (FDR) and posterior error probability (PEP) values for groups of and individual peptide spectrum matches (PSMs). These methods control for multiple testing and exploit decoy databases to estimate statistical significance. Here, we show that database choice has a major effect on these confidence estimates, leading to significant differences in the number of PSMs reported. We note that standard target:decoy approaches using six-frame translations of nucleotide sequences, such as assembled transcriptome data, apparently underestimate the confidence assigned to the PSMs. The source of this error stems from the inflated and unusual nature of the six-frame database, where for every target sequence there exist five "incorrect" targets that are unlikely to code for protein. The attendant FDR and PEP estimates lead to fewer accepted PSMs at fixed thresholds, and we show that this effect is a product of the database and statistical modeling and not the search engine. A variety of approaches to limit database size and remove noncoding target sequences are examined and discussed in terms of the altered statistical estimates generated and PSMs reported. These results are of importance to groups carrying out proteogenomics, aiming to maximize the validation and discovery of gene structure in sequenced genomes.
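
    For context, a minimal target:decoy FDR estimate at a fixed score threshold might look like the sketch below (toy scores). The argument above concerns how an inflated six-frame database distorts the score distributions feeding exactly this kind of estimate.

```python
# Sketch of target:decoy FDR estimation at a score threshold (toy scores).

def fdr_at_threshold(target_scores, decoy_scores, threshold) -> float:
    """Estimated FDR = decoy matches / target matches above the cutoff."""
    targets = sum(s >= threshold for s in target_scores)
    decoys = sum(s >= threshold for s in decoy_scores)
    return decoys / targets if targets else 0.0

target_psm_scores = [42.0, 35.5, 33.1, 20.4, 18.9]
decoy_psm_scores = [21.0, 17.2, 15.0, 9.8, 7.7]
print(fdr_at_threshold(target_psm_scores, decoy_psm_scores, 20.0))  # 0.25
```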

  10. The Relationship between Searches Performed in Online Databases and the Number of Full-Text Articles Accessed: Measuring the Interaction between Database and E-Journal Collections

    ERIC Educational Resources Information Center

    Lamothe, Alain R.

    2011-01-01

    The purpose of this paper is to report the results of a quantitative analysis exploring the interaction and relationship between the online database and electronic journal collections at the J. N. Desmarais Library of Laurentian University. A very strong relationship exists between the number of searches and the size of the online database…

  11. MHC-I ligand discovery using targeted database searches of mass spectrometry data: Implications for T cell immunotherapies.

    PubMed

    Murphy, John Patrick; Konda, Prathyusha; Kowalewski, Daniel J; Schuster, Heiko; Clements, Derek; Kim, Youra; Cohen, Alejandro Martin; Sharif, Tanveer; Nielsen, Morten; Stevanović, Stefan; Lee, Patrick W; Gujar, Shashi

    2017-02-28

    Class I major histocompatibility complex (MHC-I)-bound peptide ligands dictate the activation and specificity of CD8+ T-cells and thus are important for devising T cell immunotherapies. In recent times, advances in mass spectrometry (MS) have enabled the precise identification of these MHC-I peptides, wherein MS spectra are compared against a reference proteome. Unfortunately, matching these spectra to reference proteome databases is hindered by inflated search spaces attributed to a lack of enzyme restriction in the searches, limiting the efficiency with which MHC ligands are discovered. Here, we offer a solution to this problem whereby we developed a targeted database search approach, and an accompanying tool SpectMHC, based on a priori-predicted MHC-I peptides. We first validated the approach using mass spectrometry data from 2 different allotype-specific mouse antibodies for the C57BL/6 mouse background. We then developed allotype-specific HLA databases to search previously published MS datasets of human peripheral blood mononuclear cells (PBMCs). This targeted search strategy improved peptide identifications for both mouse and human ligandomes by greater than two-fold and is superior to traditional "no enzyme" searches of reference proteomes. Our targeted database search promises to uncover otherwise missed novel T cell epitopes of therapeutic potential.

  12. Internet Databases of the Properties, Enzymatic Reactions, and Metabolism of Small Molecules—Search Options and Applications in Food Science

    PubMed Central

    Minkiewicz, Piotr; Darewicz, Małgorzata; Iwaniak, Anna; Bucholska, Justyna; Starowicz, Piotr; Czyrko, Emilia

    2016-01-01

    Internet databases of small molecules, their enzymatic reactions, and metabolism have emerged as useful tools in food science. Database searching is also introduced as part of chemistry or enzymology courses for food technology students. Such resources support the search for information about single compounds and facilitate the introduction of secondary analyses of large datasets. Information can be retrieved from databases by searching for the compound name or structure, annotating with the help of chemical codes or drawn using molecule editing software. Data mining options may be enhanced by navigating through a network of links and cross-links between databases. Exemplary databases reviewed in this article belong to two classes: tools concerning small molecules (including general and specialized databases annotating food components) and tools annotating enzymes and metabolism. Some problems associated with database application are also discussed. Data summarized in computer databases may be used for calculation of daily intake of bioactive compounds, prediction of metabolism of food components, and their biological activity as well as for prediction of interactions between food component and drugs. PMID:27929431

  13. Internet Databases of the Properties, Enzymatic Reactions, and Metabolism of Small Molecules-Search Options and Applications in Food Science.

    PubMed

    Minkiewicz, Piotr; Darewicz, Małgorzata; Iwaniak, Anna; Bucholska, Justyna; Starowicz, Piotr; Czyrko, Emilia

    2016-12-06

    Internet databases of small molecules, their enzymatic reactions, and metabolism have emerged as useful tools in food science. Database searching is also introduced as part of chemistry or enzymology courses for food technology students. Such resources support the search for information about single compounds and facilitate the introduction of secondary analyses of large datasets. Information can be retrieved from databases by searching for the compound name or structure, annotating with the help of chemical codes or drawn using molecule editing software. Data mining options may be enhanced by navigating through a network of links and cross-links between databases. Exemplary databases reviewed in this article belong to two classes: tools concerning small molecules (including general and specialized databases annotating food components) and tools annotating enzymes and metabolism. Some problems associated with database application are also discussed. Data summarized in computer databases may be used for calculation of daily intake of bioactive compounds, prediction of metabolism of food components, and their biological activity as well as for prediction of interactions between food component and drugs.

  14. Real-Time Ligand Binding Pocket Database Search Using Local Surface Descriptors

    PubMed Central

    Chikhi, Rayan; Sael, Lee; Kihara, Daisuke

    2010-01-01

    Due to the increasing number of structures of unknown function accumulated by ongoing structural genomics projects, there is an urgent need for computational methods for characterizing protein tertiary structures. As the functions of many of these proteins are not easily predicted by conventional sequence database searches, a legitimate strategy is to utilize structure information in function characterization. Of particular interest is the prediction of ligand binding to a protein, as ligand molecule recognition is a major part of the molecular function of proteins. Predicting whether a ligand molecule binds a protein is a complex problem due to the physical nature of protein-ligand interactions and the flexibility of both binding sites and ligand molecules. However, geometric and physicochemical complementarity between the ligand and its binding site is observed in many cases. Therefore, ligand molecules which bind to a local surface site in a protein can be predicted by finding similar local pockets of known binding ligands in the structure database. Here, we present two representations of ligand binding pockets and utilize them for ligand binding prediction by pocket shape comparison. These representations are based on the mapping of surface properties of binding pockets, which are compactly described either by two-dimensional pseudo-Zernike moments or by 3D Zernike descriptors. These compact representations allow fast, real-time pocket searching against a database. A thorough benchmark study employing two different datasets shows that our representations are competitive with other existing methods. Limitations and potentials of the shape-based methods, as well as possible improvements, are discussed. PMID:20455259

  15. Fine-grained Database Field Search Using Attribute-Based Encryption for E-Healthcare Clouds.

    PubMed

    Guo, Cheng; Zhuang, Ruhan; Jie, Yingmo; Ren, Yizhi; Wu, Ting; Choo, Kim-Kwang Raymond

    2016-11-01

    An effectively designed e-healthcare system can significantly enhance the quality of access and experience of healthcare users, including facilitating medical and healthcare providers in ensuring a smooth delivery of services. Ensuring the security of patients' electronic health records (EHRs) in the e-healthcare system is an active research area. EHRs may be outsourced to a third party, such as a community healthcare cloud service provider, for storage as a cost-saving measure. Generally, encrypting the EHRs when they are stored in the system (i.e., data-at-rest) or prior to outsourcing is used to ensure data confidentiality. Searchable encryption (SE) is a promising technique that can ensure the protection of private information without compromising performance. In this paper, we propose a novel framework for controlling access to EHRs stored in semi-trusted cloud servers (e.g., a private cloud or a community cloud). To achieve fine-grained access control for EHRs, we leverage the ciphertext-policy attribute-based encryption (CP-ABE) technique to encrypt tables published by hospitals, including patients' EHRs, with the table stored in the database under the primary key of the patient's unique identity. Our framework enables different users with different privileges to search on different database fields. Differing from previous attempts to secure the outsourcing of data, we emphasize control over the searches of the fields within the database. We demonstrate the utility of the scheme by evaluating it using datasets from the University of California, Irvine.

  16. Designing novel nicotinic agonists by searching a database of molecular shapes

    NASA Astrophysics Data System (ADS)

    Sheridan, Robert P.; Venkataraghavan, R.

    1987-10-01

    We introduce an approach by which novel ligands can be designed for a receptor if a pharmacophore geometry has been established and the receptor-bound conformations of other ligands are known. We use the shape-matching method of Kuntz et al. [J. Mol. Biol., 161 (1982) 269-288] to search a database of molecular shapes for those molecules which can fit inside the combined volume of the known ligands and which have interatomic distances compatible with the pharmacophore geometry. Some of these molecules are then modified by interactive modeling techniques to better match the chemical properties of the known ligands. Our shape database (about 5000 candidate molecules) is derived from a subset of the Cambridge Crystallographic Database [Allen et al., Acta Crystallogr., Sect. B,35 (1979) 2331-2339]. We show, as an example, how several novel designs for nicotinic agonists can be derived by this approach, given a pharmacophore model derived from known agonists [Sheridan et al., J. Med. Chem., 29 (1986) 889-906]. This report complements our previous report [DesJarlais et al., J. Med. Chem., in press], which introduced a similar method for designing ligands when the structure of the receptor is known.

  17. ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches.

    PubMed

    Rognes, T

    2001-04-01

    There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. This approximate score is used to select the most interesting database sequences for a subsequent Smith-Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign was found to be as good as Smith-Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/
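
    The first, ungapped phase can be sketched in scalar form as below; ParAlign performs the equivalent computation across many cells at once with SIMD instructions, and the per-diagonal scores then feed its gapped heuristic.

```python
# Sketch of exact optimal ungapped scores per diagonal (scalar version).

def best_diagonal_scores(query: str, dbseq: str, match=2, mismatch=-1):
    """Best ungapped local score on every diagonal of the alignment matrix."""
    scores = {}
    for diag in range(-len(query) + 1, len(dbseq)):
        best = run = 0
        for i in range(len(query)):
            j = i + diag
            if 0 <= j < len(dbseq):
                run = max(0, run + (match if query[i] == dbseq[j] else mismatch))
                best = max(best, run)
        scores[diag] = best
    return scores

# The highest diagonal score is a cheap filter for selecting database
# sequences worth a full (gapped) Smith-Waterman alignment.
print(max(best_diagonal_scores("GATTACA", "ATTAC").values()))  # 10
```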

  18. Seismic Search Engine: A distributed database for mining large scale seismic data

    NASA Astrophysics Data System (ADS)

    Liu, Y.; Vaidya, S.; Kuzma, H. A.

    2009-12-01

    The International Monitoring System (IMS) of the CTBTO collects terabytes worth of seismic measurements from many receiver stations situated around the earth with the goal of detecting underground nuclear testing events and distinguishing them from other benign, but more common events such as earthquakes and mine blasts. The International Data Center (IDC) processes and analyzes these measurements, as they are collected by the IMS, to summarize event detections in daily bulletins. Thereafter, the data measurements are archived into a large format database. Our proposed Seismic Search Engine (SSE) will facilitate a framework for data exploration of the seismic database as well as the development of seismic data mining algorithms. Analogous to GenBank, the annotated genetic sequence database maintained by NIH, through SSE, we intend to provide public access to seismic data and a set of processing and analysis tools, along with community-generated annotations and statistical models to help interpret the data. SSE will implement queries as user-defined functions composed from standard tools and models. Each query is compiled and executed over the database internally before reporting results back to the user. Since queries are expressed with standard tools and models, users can easily reproduce published results within this framework for peer-review and making metric comparisons. As an illustration, an example query is “what are the best receiver stations in East Asia for detecting events in the Middle East?” Evaluating this query involves listing all receiver stations in East Asia, characterizing known seismic events in that region, and constructing a profile for each receiver station to determine how effective its measurements are at predicting each event. The results of this query can be used to help prioritize how data is collected, identify defective instruments, and guide future sensor placements.

  19. Integration of first-principles methods and crystallographic database searches for new ferroelectrics: Strategies and explorations

    NASA Astrophysics Data System (ADS)

    Bennett, Joseph W.; Rabe, Karin M.

    2012-11-01

    In this concept paper, the development of strategies for the integration of first-principles methods with crystallographic database mining for the discovery and design of novel ferroelectric materials is discussed, drawing on the results and experience derived from exploratory investigations on three different systems: (1) the double perovskite Sr(Sb1/2Mn1/2)O3 as a candidate semiconducting ferroelectric; (2) polar derivatives of schafarzikite MSb2O4; and (3) ferroelectric semiconductors with formula M2P2(S,Se)6. A variety of avenues for further research and investigation are suggested, including automated structure type classification, low-symmetry improper ferroelectrics, and high-throughput first-principles searches for additional representatives of structural families with desirable functional properties.

  20. Internet programs for drawing moth pheromone analogs and searching literature database.

    PubMed

    Byers, John A

    2002-04-01

    An Internet web page is described for organizing and analyzing information about lepidopteran sex pheromone components. Hypertext markup language (HTML) with JavaScript program code is used to draw moth pheromone analogs by combining GIF bitmap images for viewing by web browsers such as Netscape or Microsoft Internet Explorer. Straight-chain hydrocarbons of 5-22 carbons with epoxides or unsaturated positions of E or Z geometrical configuration, with several alternative functional groups, can be drawn by simply checking menu bars or checkboxes representing chain length, E/Z unsaturation points, epoxide position and chirality, and optional functional groups. The functional group can be an aldehyde, alcohol, or ester of formate, acetate, propionate, or butyrate. The program is capable of drawing several million structures and naming them [e.g., (E,E)-8,10-dodecadien-1-ol, abbreviated E8E10-12:OH]. A Java applet program run from the same page searches for the presently drawn structure in an internal database compiled from the Pherolist and, if the component is found, provides a textarea display of the families and species using the component. Links are automatically specified for drawn components if found in the Pherolist web site (maintained by H. Arn). Windowed links can also be made to two other JavaScript programs that allow searches of a web site database with over 5900 research citations on lepidopteran semiochemicals and a calculator of vapor pressures of some moth sex pheromone analogs at a specified temperature. Various evolutionary and biosynthetic aspects are discussed in regard to the diversity of moth sex pheromone components.

  1. Proteomics of Soil and Sediment: Protein Identification by De Novo Sequencing of Mass Spectra Complements Traditional Database Searching

    NASA Astrophysics Data System (ADS)

    Miller, S.; Rizzo, A. I.; Waldbauer, J.

    2014-12-01

    Proteomics has the potential to elucidate the metabolic pathways and taxa responsible for in situ biogeochemical transformations. However, low rates of protein identification from high resolution mass spectra have been a barrier to the development of proteomics in complex environmental samples. Much of the difficulty lies in the computational challenge of linking mass spectra to their corresponding proteins. Traditional database search methods for matching peptide sequences to mass spectra are often inadequate due to the complexity of environmental proteomes and the large database search space, as we demonstrate with soil and sediment proteomes generated via a range of extraction methods. One alternative to traditional database searching is de novo sequencing, which identifies peptide sequences without the need for a database. BLAST can then be used to match de novo sequences to similar genetic sequences. Assigning confidence to putative identifications has been one hurdle for the implementation of de novo sequencing. We found that accurate de novo sequences can be screened by quality score and length. Screening criteria are verified by comparing the results of de novo sequencing and traditional database searching for well-characterized proteomes from simple biological systems. The BLAST hits of screened sequences are interrogated for taxonomic and functional information. We applied de novo sequencing to organic topsoil and marine sediment proteomes. Peak-rich proteomes, which can result from various extraction techniques, yield thousands of high-confidence protein identifications, an improvement over previous proteomic studies of soil and sediment. User-friendly software tools for de novo metaproteomics analysis have been developed. This "De Novo Analysis" Pipeline is also a faster method of data analysis than constructing a tailored sequence database for traditional database searching.
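
    The screening step described above might look like the following sketch, with illustrative thresholds and hypothetical de novo peptide calls.

```python
# Sketch: screen de novo peptide calls by score and length before BLAST
# (thresholds are illustrative, not the study's values).

def screen(calls, min_score=80.0, min_length=8):
    """Keep only sequences confident and long enough to be worth BLASTing."""
    return [c for c in calls
            if c["score"] >= min_score and len(c["sequence"]) >= min_length]

calls = [{"sequence": "LGEYGFQNALIVR", "score": 95.2},
         {"sequence": "KAT", "score": 99.0},           # too short to be informative
         {"sequence": "MDVFMKGLSK", "score": 45.0}]    # low-confidence call
print(screen(calls))  # only the first call survives
```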

  2. Proteomics of Soil and Sediment: Protein Identification by De Novo Sequencing of Mass Spectra Complements Traditional Database Searching

    NASA Astrophysics Data System (ADS)

    Miller, S.; Rizzo, A. I.; Waldbauer, J.

    2015-12-01

    Proteomics has the potential to elucidate the metabolic pathways and taxa responsible for in situ biogeochemical transformations. However, low rates of protein identification from high resolution mass spectra have been a barrier to the development of proteomics in complex environmental samples. Much of the difficulty lies in the computational challenge of linking mass spectra to their corresponding proteins. Traditional database search methods for matching peptide sequences to mass spectra are often inadequate due to the complexity of environmental proteomes and the large database search space, as we demonstrate with soil and sediment proteomes generated via a range of extraction methods. One alternative to traditional database searching is de novo sequencing, which identifies peptide sequences without the need for a database. BLAST can then be used to match de novo sequences to similar genetic sequences. Assigning confidence to putative identifications has been one hurdle for the implementation of de novo sequencing. We found that accurate de novo sequences can be screened by quality score and length. Screening criteria are verified by comparing the results of de novo sequencing and traditional database searching for well-characterized proteomes from simple biological systems. The BLAST hits of screened sequences are interrogated for taxonomic and functional information. We applied de novo sequencing to organic topsoil and marine sediment proteomes. Peak-rich proteomes, which can result from various extraction techniques, yield thousands of high-confidence protein identifications, an improvement over previous proteomic studies of soil and sediment. User-friendly software tools for de novo metaproteomics analysis have been developed. This "De Novo Analysis" Pipeline is also a faster method of data analysis than constructing a tailored sequence database for traditional database searching.

  3. webPRC: the Profile Comparer for alignment-based searching of public domain databases.

    PubMed

    Brandt, Bernd W; Heringa, Jaap

    2009-07-01

    Profile-profile methods are well suited to detect remote evolutionary relationships between protein families. Profile Comparer (PRC) is an existing stand-alone program for scoring and aligning hidden Markov models (HMMs), which are based on multiple sequence alignments. Since PRC compares profile HMMs instead of sequences, it can be used to find distant homologues. For this purpose, PRC is used by, for example, the CATH and Pfam domain databases. As PRC is a profile comparer, it only reports profile HMM alignments and does not produce multiple sequence alignments. We have developed the webPRC server, which makes it straightforward to search for distant homologues or similar alignments in a number of domain databases. In addition, it provides the results both as multiple sequence alignments and as aligned HMMs. Furthermore, the user can view the domain annotation, evaluate the PRC hits with the Jalview multiple alignment editor, and generate logos from the aligned HMMs or the aligned multiple alignments. Thus, this server assists in detecting distant homologues with PRC as well as in evaluating and using the results. The webPRC interface is available at http://www.ibi.vu.nl/programs/prcwww/.

  4. Neuron-Miner: An Advanced Tool for Morphological Search and Retrieval in Neuroscientific Image Databases.

    PubMed

    Conjeti, Sailesh; Mesbah, Sepideh; Negahdar, Mohammadreza; Rautenberg, Philipp L; Zhang, Shaoting; Navab, Nassir; Katouzian, Amin

    2016-10-01

    The steadily growing amount of digital neuroscientific data demands a reliable, systematic, and computationally effective retrieval algorithm. In this paper, we present Neuron-Miner, a tool for fast and accurate reference-based retrieval within neuron image databases. The proposed algorithm is built upon a hashing (search and retrieval) technique employing multiple unsupervised random trees, collectively called Hashing Forests (HF). The HF are trained to parse the neuromorphological space hierarchically and to preserve the inherent neuron neighborhoods while encoding them with compact binary codewords. We further introduce an inverse-coding formulation within HF to effectively mitigate pairwise neuron similarity comparisons, thus allowing scalability to massive databases with little additional time overhead. The proposed hashing tool achieves a superior approximation of the true neuromorphological neighborhood, with better retrieval and ranking performance than existing generalized hashing methods. This is exhaustively validated by quantifying the results over 31,266 neuron reconstructions from the NeuroMorpho.org dataset, curated from 147 different archives. We envisage that finding and ranking similar neurons through reference-based querying via Neuron-Miner will assist neuroscientists in objectively understanding the relationship between neuronal structure and function, with applications in comparative anatomy and diagnosis.
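
    The retrieval step can be pictured as ranking database entries by Hamming distance between compact binary codewords. The sketch below shows only that generic step, with invented codes; the Hashing Forests encoding itself (unsupervised random trees) is not reproduced here.

        # Generic binary-codeword retrieval: rank neurons by Hamming
        # distance to the query code (not the actual HF algorithm).
        def hamming(a, b):
            return bin(a ^ b).count("1")

        def rank(query_code, database):
            """database: dict mapping neuron id -> integer bit code."""
            return sorted(database, key=lambda nid: hamming(query_code, database[nid]))

        db = {"neuron_1": 0b10110010, "neuron_2": 0b10110011, "neuron_3": 0b01001100}
        print(rank(0b10110110, db))  # nearest codes first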

  5. A method for fast database search for all k-nucleotide repeats.

    PubMed Central

    Benson, G; Waterman, M S

    1994-01-01

    A significant portion of DNA consists of repeating patterns of various sizes, from very small (one, two and three nucleotides) to very large (over 300 nucleotides). Although the functions of these repeating regions are not well understood, they appear important for understanding the expression, regulation and evolution of DNA. For example, increases in the number of trinucleotide repeats have been associated with human genetic disease, including Fragile-X mental retardation and Huntington's disease. Repeats are also useful as a tool in mapping and identifying DNA; the number of copies of a particular pattern at a site is often variable among individuals (polymorphic) and is therefore helpful in locating genes via linkage studies and also in providing DNA fingerprints of individuals. The number of repeating regions is unknown as is the distribution of pattern sizes. It would be useful to search for such regions in the DNA database in order that they may be studied more fully. The DNA database currently consists of approximately 150 million basepairs and is growing exponentially. Therefore, any program to look for repeats must be efficient and fast. In this paper, we present some new techniques that are useful in recognizing repeating patterns and describe a new program for rapidly detecting repeat regions in the DNA database where the basic unit of the repeat has size up to 32 nucleotides. It is our hope that the examples in this paper will illustrate the unrealized diversity of repeats in DNA and that the program we have developed will be a useful tool for locating new and interesting repeats. PMID:7984436
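
    As a toy illustration of the core task, the sketch below scans a sequence for tandem repeats with unit sizes up to a small bound; the published program uses far more efficient recognition techniques and statistics, and handles unit sizes up to 32 nucleotides.

        # Naive tandem-repeat scan; illustration only, not the paper's algorithm.
        def find_repeats(seq, max_unit=3, min_copies=3):
            hits = []
            for unit in range(1, max_unit + 1):
                i = 0
                while i + unit <= len(seq):
                    copies = 1
                    while seq[i + copies * unit : i + (copies + 1) * unit] == seq[i : i + unit]:
                        copies += 1
                    if copies >= min_copies:
                        hits.append((i, seq[i : i + unit], copies))
                        i += copies * unit
                    else:
                        i += 1
            return hits

        # [(8, 'G', 3), (0, 'AC', 4), (11, 'CAG', 3)]
        print(find_repeats("ACACACACGGGCAGCAGCAGTT"))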

  6. MSDsite: a database search and retrieval system for the analysis and viewing of bound ligands and active sites.

    PubMed

    Golovin, Adel; Dimitropoulos, Dimitris; Oldfield, Tom; Rachedi, Abdelkrim; Henrick, Kim

    2005-01-01

    The three-dimensional environments of ligand binding sites have been derived from the parsing and loading of the PDB entries into a relational database. For each bound molecule the biological assembly of the quaternary structure has been used to determine all contact residues and a fast interactive search and retrieval system has been developed. Prosite pattern and short sequence search options are available together with a novel graphical query generator for inter-residue contacts. The database and its query interface are accessible from the Internet through a web server located at: http://www.ebi.ac.uk/msd-srv/msdsite.

  7. GPU-Acceleration of Sequence Homology Searches with Database Subsequence Clustering

    PubMed Central

    Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka

    2016-01-01

    Sequence homology searches are used in various fields and require large amounts of computation time, especially for metagenomic analysis, owing to the large number of queries and the database size. To accelerate computing analyses, graphics processing units (GPUs) are widely used as a low-cost, high-performance computing platform. Therefore, we mapped the time-consuming steps involved in GHOSTZ, which is a state-of-the-art homology search algorithm for protein sequences, onto a GPU and implemented it as GHOSTZ-GPU. In addition, we optimized memory access for GPU calculations and for communication between the CPU and GPU. In evaluation tests on metagenomic data, GHOSTZ-GPU with 12 CPU threads and 1 GPU was approximately 3.0- to 4.1-fold faster than GHOSTZ with 12 CPU threads. Moreover, GHOSTZ-GPU with 12 CPU threads and 3 GPUs was approximately 5.8- to 7.7-fold faster than GHOSTZ with 12 CPU threads. PMID:27482905

  8. GPU-Acceleration of Sequence Homology Searches with Database Subsequence Clustering.

    PubMed

    Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka

    2016-01-01

    Sequence homology searches are used in various fields and require large amounts of computation time, especially for metagenomic analysis, owing to the large number of queries and the database size. To accelerate computing analyses, graphics processing units (GPUs) are widely used as a low-cost, high-performance computing platform. Therefore, we mapped the time-consuming steps involved in GHOSTZ, which is a state-of-the-art homology search algorithm for protein sequences, onto a GPU and implemented it as GHOSTZ-GPU. In addition, we optimized memory access for GPU calculations and for communication between the CPU and GPU. In evaluation tests on metagenomic data, GHOSTZ-GPU with 12 CPU threads and 1 GPU was approximately 3.0- to 4.1-fold faster than GHOSTZ with 12 CPU threads. Moreover, GHOSTZ-GPU with 12 CPU threads and 3 GPUs was approximately 5.8- to 7.7-fold faster than GHOSTZ with 12 CPU threads.

  9. Kappa-alpha plot derived structural alphabet and BLOSUM-like substitution matrix for rapid search of protein structure database

    PubMed Central

    2007-01-01

    We present a novel protein structure database search tool, 3D-BLAST, that is useful for analyzing novel structures and can return a ranked list of alignments. This tool has the features of BLAST (for example, robust statistical basis, and effective and reliable search capabilities) and employs a kappa-alpha (κ, α) plot derived structural alphabet and a new substitution matrix. 3D-BLAST searches more than 12,000 protein structures in 1.2 s and yields good results in zones with low sequence similarity. PMID:17335583
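
    The encoding idea can be sketched as binning each residue's (kappa, alpha) angle pair into a letter, so that a structure becomes a string searchable with BLAST-style tools and a substitution matrix. The grid below is an arbitrary stand-in, not the 23-state alphabet the authors derived from the plot.

        # Illustrative (kappa, alpha) -> letter binning; the grid is invented.
        import string

        def encode(kappa, alpha, n_kappa_bins=4, n_alpha_bins=6):
            """kappa in [0, 180) degrees, alpha in [-180, 180) degrees."""
            ki = int(kappa / 180.0 * n_kappa_bins)
            ai = int((alpha + 180.0) / 360.0 * n_alpha_bins)
            return string.ascii_uppercase[ki * n_alpha_bins + ai]

        # a two-residue fragment becomes the string "IR"
        print("".join(encode(k, a) for k, a in [(45.0, -60.0), (100.0, 150.0)]))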

  10. Complementary Value of Databases for Discovery of Scholarly Literature: A User Survey of Online Searching for Publications in Art History

    ERIC Educational Resources Information Center

    Nemeth, Erik

    2010-01-01

    Discovery of academic literature through Web search engines challenges the traditional role of specialized research databases. Creation of literature outside academic presses and peer-reviewed publications expands the content for scholarly research within a particular field. The resulting body of literature raises the question of whether scholars…

  11. Searching for NEO precoveries in the PS1 and MPC databases

    NASA Astrophysics Data System (ADS)

    Weryk, Robert J.; Wainscoat, Richard J.

    2016-10-01

    The Pan-STARRS (PS1) survey telescope, operated by the University of Hawai`i, covers the sky north of -49 degrees declination with its seven square degree field-of-view. Described in detail by Wainscoat et al. (2015), it has become the leading telescope for new Near Earth Object (NEO) discoveries. In 2015, it found almost half of the new Near Earth Asteroids, as well as half of the new comets. Observations of potential NEOs must be followed up before they can be confirmed and announced as new discoveries, and we are dependent on the follow-up capabilities of other telescopes for this. However, not every NEO candidate is immediately followed up and linked into a well-established orbit, possibly because smaller bodies may not be visible at current instrument sensitivity limits for very long, or because their predicted orbits are too uncertain, so follow-up telescopes look in the wrong location. In certain cases, however, these objects may have been observed during previous lunations. We present a method to search for precovery detections in both the PS1 database and the Isolated Tracklet File (ITF) provided by the Minor Planet Center (MPC). This file contains over 12 million detections, mostly from the large surveys, which are not associated with any known objects. We demonstrate that multi-tracklet linkages for both known and unknown objects may be found in these databases, including detections of both NEOs and non-NEOs that often appear on the MPC's NEO Confirmation Page. [1] Wainscoat, R. et al., IAU Symposium 318, editors S. Chesley and R. Jedicke (2015)
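
    At its core, a precovery search compares an ephemeris prediction against archival detections within an angular tolerance. The sketch below shows that comparison for single sky positions, with an invented tolerance and coordinates; the real problem also involves time, apparent motion and orbit uncertainty.

        # Toy position match for precovery candidates; parameters invented.
        import math

        def ang_sep_deg(ra1, dec1, ra2, dec2):
            r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
            cos_s = (math.sin(d1) * math.sin(d2) +
                     math.cos(d1) * math.cos(d2) * math.cos(r1 - r2))
            return math.degrees(math.acos(min(1.0, max(-1.0, cos_s))))

        def precovery_candidates(predicted, detections, tol_deg=0.02):
            """predicted: (ra, dec); detections: list of (id, ra, dec)."""
            return [d for d in detections
                    if ang_sep_deg(predicted[0], predicted[1], d[1], d[2]) <= tol_deg]

        dets = [("t1", 150.001, -10.002), ("t2", 151.500, -10.000)]
        print(precovery_candidates((150.0, -10.0), dets))  # only "t1" is close enough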

  12. Introducing a New Interface for the Online MagIC Database by Integrating Data Uploading, Searching, and Visualization

    NASA Astrophysics Data System (ADS)

    Jarboe, N.; Minnett, R.; Constable, C.; Koppers, A. A.; Tauxe, L.

    2013-12-01

    The Magnetics Information Consortium (MagIC) is dedicated to supporting the paleomagnetic, geomagnetic, and rock magnetic communities through the development and maintenance of an online database (http://earthref.org/MAGIC/), data upload and quality control, searches, data downloads, and visualization tools. While MagIC has completed importing some of the IAGA paleomagnetic databases (TRANS, PINT, PSVRL, GPMDB) and continues to import others (ARCHEO, MAGST and SECVR), further individual data uploading from the community contributes a wealth of easily accessible, rich datasets. Previously, uploading data to the MagIC database required an Excel spreadsheet on either a Mac or PC. The new method of uploading data utilizes an HTML5 web interface whose only requirement is a modern browser. This web interface highlights all errors discovered in the dataset at once, instead of the iterative error-checking process of the previous Excel spreadsheet data checker. As a web service, the community will always have easy access to the most up-to-date and bug-free version of the data upload software. The filtering search mechanism of the MagIC database has been changed to a more intuitive system where the data from each contribution are displayed in tables similar to how the data are uploaded (http://earthref.org/MAGIC/search/). Searches themselves can be saved as a permanent URL, if desired. The saved search URL could then be used as a citation in a publication. When appropriate, plots (equal area, Zijderveld, ARAI, demagnetization, etc.) are associated with the data to give the user a quicker understanding of the underlying dataset. The MagIC database will continue to evolve to meet the needs of the paleomagnetic, geomagnetic, and rock magnetic communities.

  13. UniCon3D: de novo protein structure prediction using united-residue conformational search via stepwise, probabilistic sampling

    PubMed Central

    Bhattacharya, Debswapna; Cao, Renzhi; Cheng, Jianlin

    2016-01-01

    Motivation: Recent experimental studies have suggested that proteins fold via stepwise assembly of structural units named ‘foldons’ through the process of sequential stabilization. In parallel, recent computational developments based on probabilistic modeling have shown a promising direction for performing de novo protein conformational sampling from continuous space. However, existing computational approaches for de novo protein structure prediction often sample protein conformational space randomly, as opposed to the experimentally suggested stepwise sampling. Results: Here, we develop a novel generative, probabilistic model that simultaneously captures local structural preferences of backbone and side chain conformational space of polypeptide chains in a united-residue representation and performs experimentally motivated conditional conformational sampling via stepwise synthesis and assembly of foldon units that minimizes a composite physics- and knowledge-based energy function for de novo protein structure prediction. The proposed method, UniCon3D, has been found to (i) sample lower energy conformations with higher accuracy than traditional random sampling in a small benchmark of 6 proteins; (ii) perform comparably with the top five automated methods on 30 difficult target domains from the 11th Critical Assessment of Protein Structure Prediction (CASP) experiment and on 15 difficult target domains from the 10th CASP experiment; and (iii) outperform two state-of-the-art approaches and a baseline counterpart of UniCon3D that performs traditional random sampling for protein modeling aided by predicted residue-residue contacts on 45 targets from the 10th edition of CASP. Availability and Implementation: Source code, executable versions, manuals and example data of UniCon3D for Linux and OSX are freely available to non-commercial users at http://sysbio.rnet.missouri.edu/UniCon3D/. Contact: chengji@missouri.edu Supplementary information: Supplementary data are available at Bioinformatics online.

  14. SPLICE: A program to assemble partial query solutions from three-dimensional database searches into novel ligands

    NASA Astrophysics Data System (ADS)

    Ho, Chris M. W.; Marshall, Garland R.

    1993-12-01

    SPLICE is a program that processes partial query solutions retrieved from 3D structural databases to generate novel, aggregate ligands. It is designed to interface with the database searching program FOUNDATION, which retrieves fragments containing any combination of a user-specified minimum number of matching query elements. SPLICE eliminates aspects of structures that are physically incapable of binding within the active site. Then, a systematic rule-based procedure is performed upon the remaining fragments to ensure receptor complementarity. All modifications are automated and remain transparent to the user. Ligands are then assembled by linking components into composite structures through overlapping bonds. As a control experiment, FOUNDATION and SPLICE were used to reconstruct a known HIV-1 protease inhibitor after it had been fragmented, reoriented, and added to a sham database of fifty different small molecules. To illustrate the capabilities of this program, a 3D search query containing the pharmacophoric elements of an aspartic proteinase-inhibitor crystal complex was searched using FOUNDATION against a subset of the Cambridge Structural Database. One hundred thirty-one compounds were retrieved, each containing any combination of at least four query elements. Compounds were automatically screened and edited for receptor complementarity. Numerous combinations of fragments were discovered that could be linked to form novel structures, containing a greater number of pharmacophoric elements than any single retrieved fragment.

  15. Novel DOCK clique driven 3D similarity database search tools for molecule shape matching and beyond: adding flexibility to the search for ligand kin.

    PubMed

    Good, Andrew C

    2007-10-01

    With readily available CPU power and copious disk storage, it is now possible to undertake rapid comparison of 3D properties derived from explicit ligand overlay experiments. With this in mind, shape software tools originally devised in the 1990s are revisited, modified and applied to the problem of ligand database shape comparison. The utility of Connolly surface data is highlighted using the program MAKESITE, which leverages surface normal data to a create ligand shape cast. This cast is applied directly within DOCK, allowing the program to be used unmodified as a shape searching tool. In addition, DOCK has undergone multiple modifications to create a dedicated ligand shape comparison tool KIN. Scoring has been altered to incorporate the original incarnation of Gaussian function derived shape description based on STO-3G atomic electron density. In addition, a tabu-like search refinement has been added to increase search speed by removing redundant starting orientations produced during clique matching. The ability to use exclusion regions, again based on Gaussian shape overlap, has also been integrated into the scoring function. The use of both DOCK with MAKESITE and KIN in database screening mode is illustrated using a published ligand shape virtual screening template. The advantages of using a clique-driven search paradigm are highlighted, including shape optimization within a pharmacophore constrained framework, and easy incorporation of additional scoring function modifications. The potential for further development of such methods is also discussed.
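
    The Gaussian shape description mentioned above can be illustrated with pairwise atom overlaps in the style of Grant and Pickup, where each atom is a spherical Gaussian and similarity accumulates overlap volume over an alignment. The amplitude and width parameters below are illustrative, not KIN's STO-3G-derived values.

        # Illustrative Gaussian atom-overlap score; parameters are invented.
        import math

        def pair_overlap(r2, alpha=1.0, p=2.7):
            """Overlap volume of two identical atom Gaussians at squared distance r2."""
            return (p * p * math.exp(-(alpha * alpha / (alpha + alpha)) * r2)
                    * (math.pi / (alpha + alpha)) ** 1.5)

        def shape_overlap(mol_a, mol_b):
            """mol_a, mol_b: lists of (x, y, z) atom centres after alignment."""
            return sum(pair_overlap((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2)
                       for xa, ya, za in mol_a for xb, yb, zb in mol_b)

        mol = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
        print(shape_overlap(mol, mol))  # self-overlap, a normalization reference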

  16. Preparing College Students To Search Full-Text Databases: Is Instruction Necessary?

    ERIC Educational Resources Information Center

    Riley, Cheryl; Wales, Barbara

    Full-text databases allow Central Missouri State University's clients to access some of the serials that libraries have had to cancel due to escalating subscription costs; EbscoHost, the subject of this study, is one such database. The database is available free to all Missouri residents. A survey was designed consisting of 21 questions intended…

  17. Searching for first-degree familial relationships in California's offender DNA database: validation of a likelihood ratio-based approach.

    PubMed

    Myers, Steven P; Timken, Mark D; Piucci, Matthew L; Sims, Gary A; Greenwald, Michael A; Weigand, James J; Konzak, Kenneth C; Buoncristiani, Martin R

    2011-11-01

    A validation study was performed to measure the effectiveness of using a likelihood ratio-based approach to search for possible first-degree familial relationships (full-sibling and parent-child) by comparing an evidence autosomal short tandem repeat (STR) profile to California's ∼1,000,000-profile State DNA Index System (SDIS) database. Test searches used autosomal STR and Y-STR profiles generated for 100 artificial test families. When the test sample and the first-degree relative in the database were characterized at the 15 Identifiler® (Applied Biosystems, Foster City, CA) STR loci, the search procedure included 96% of the fathers and 72% of the full-siblings. When the relative profile was limited to the 13 Combined DNA Index System (CODIS) core loci, the search procedure included 93% of the fathers and 61% of the full-siblings. These results, combined with those of functional tests using three real families, support the effectiveness of this tool. Based upon these results, the validated approach was implemented as a key, pragmatic and demonstrably practical component of the California Department of Justice's Familial Search Program. An investigative lead created through this process recently led to an arrest in the Los Angeles Grim Sleeper serial murders.
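
    For intuition about the likelihood-ratio machinery, the sketch below computes the textbook single-locus parent-child likelihood ratio (the paternity index), ignoring mutation, silent alleles and subpopulation correction; a multi-locus LR is the product over loci. The allele frequencies and genotypes are invented.

        # Textbook single-locus parent-child LR; simplifications as noted above.
        def parent_child_lr(child, parent, p):
            """child, parent: genotype tuples like ("16", "18"); p: allele freqs."""
            a, b = child
            shared = set(child) & set(parent)
            if not shared:
                return 0.0  # exclusion (no mutation model)
            if a == b:  # child homozygous a,a
                return 1.0 / p[a] if parent == (a, a) else 1.0 / (2 * p[a])
            if set(parent) == {a, b}:  # both heterozygous a,b
                return (p[a] + p[b]) / (4 * p[a] * p[b])
            s = shared.pop()  # exactly one shared allele
            return 1.0 / (2 * p[s]) if parent[0] == parent[1] else 1.0 / (4 * p[s])

        freqs = {"16": 0.25, "18": 0.10, "20": 0.30}
        print(parent_child_lr(("16", "18"), ("18", "20"), freqs))  # 1/(4*0.10) = 2.5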

  18. In search of a statistical probability model for petroleum-resource assessment: a critique of the probabilistic significance of certain concepts and methods used in petroleum-resource assessment: to that end, a probabilistic model is sketched

    USGS Publications Warehouse

    Grossling, Bernardo F.

    1975-01-01

    Exploratory drilling is still in incipient or youthful stages in those areas of the world where the bulk of the potential petroleum resources is yet to be discovered. Methods of assessing resources from projections based on historical production and reserve data are limited to mature areas. For most of the world's petroleum-prospective areas, a more speculative situation calls for a critical review of resource-assessment methodology. The language of mathematical statistics is required to define more rigorously the appraisal of petroleum resources. Basically, two approaches have been used to appraise the amounts of undiscovered mineral resources in a geologic province: (1) projection models, which use statistical data on the past outcome of exploration and development in the province; and (2) estimation models of the overall resources of the province, which use certain known parameters of the province together with the outcome of exploration and development in analogous provinces. These two approaches often lead to widely different estimates. Some of the controversy that arises results from a confusion of the probabilistic significance of the quantities yielded by each of the two approaches. Also, inherent limitations of analytic projection models, such as those using the logistic and Gompertz functions, have often been ignored. The resource-assessment problem should be recast in terms that provide for consideration of the probability of existence of the resource and of the probability of discovery of a deposit. Then the two above-mentioned models occupy the two ends of the probability range. The new approach accounts for (1) what can be expected with reasonably high certainty by mere projections of what has been accomplished in the past; (2) the inherent biases of decision-makers and resource estimators; (3) upper bounds that can be set up as goals for exploration; and (4) the uncertainties in geologic conditions in a search for minerals. Actual outcomes can then be evaluated within this probabilistic framework.
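
    One way to read the proposed recasting, stated here only as an illustrative decomposition rather than the author's exact model, is as a factorization of the chance of finding a deposit,

        P(\text{discovery}) = P(\text{exists}) \cdot P(\text{found} \mid \text{exists}),

    so that projection models chiefly constrain the second factor while province-analogy estimation models inform the first.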

  19. Millennial Students' Mental Models of Search: Implications for Academic Librarians and Database Developers

    ERIC Educational Resources Information Center

    Holman, Lucy

    2011-01-01

    Today's students exhibit generational differences in the way they search for information. Observations of first-year students revealed a proclivity for simple keyword or phrases searches with frequent misspellings and incorrect logic. Although no students had strong mental models of search mechanisms, those with stronger models did construct more…

  20. Comparative Recall and Precision of Simple and Expert Searches in Google Scholar and Eight Other Databases

    ERIC Educational Resources Information Center

    Walters, William H.

    2011-01-01

    This study evaluates the effectiveness of simple and expert searches in Google Scholar (GS), EconLit, GEOBASE, PAIS, POPLINE, PubMed, Social Sciences Citation Index, Social Sciences Full Text, and Sociological Abstracts. It assesses the recall and precision of 32 searches in the field of later-life migration: nine simple keyword searches and 23…

  1. HRGRN: A Graph Search-Empowered Integrative Database of Arabidopsis Signaling Transduction, Metabolism and Gene Regulation Networks

    PubMed Central

    Dai, Xinbin; Li, Jun; Liu, Tingsong; Zhao, Patrick Xuechun

    2016-01-01

    The biological networks controlling plant signal transduction, metabolism and gene regulation are composed of not only tens of thousands of genes, compounds, proteins and RNAs but also the complicated interactions and co-ordination among them. These networks play critical roles in many fundamental mechanisms, such as plant growth, development and environmental response. Although much is known about these complex interactions, the knowledge and data are currently scattered throughout the published literature, publicly available high-throughput data sets and third-party databases. Many ‘unknown’ yet important interactions among genes need to be mined and established through extensive computational analysis. However, exploring these complex biological interactions at the network level from existing heterogeneous resources remains challenging and time-consuming for biologists. Here, we introduce HRGRN, a graph search-empowered integrative database of Arabidopsis signal transduction, metabolism and gene regulatory networks. HRGRN utilizes Neo4j, which is a highly scalable graph database management system, to host large-scale biological interactions among genes, proteins, compounds and small RNAs that were either validated experimentally or predicted computationally. The associated biological pathway information was also specially marked for the interactions that are involved in the pathway to facilitate the investigation of cross-talk between pathways. Furthermore, HRGRN integrates a series of graph path search algorithms to discover novel relationships among genes, compounds, RNAs and even pathways from heterogeneous biological interaction data that could be missed by traditional SQL database search methods. Users can also build subnetworks based on known interactions. The outcomes are visualized with rich text, figures and interactive network graphs on web pages. The HRGRN database is freely available at http://plantgrn.noble.org/hrgrn/. PMID:26657893
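
    To give a flavor of the graph path searches such a Neo4j-backed database enables, here is a sketch using the official neo4j Python driver; the URI, credentials, node labels and property names are hypothetical and do not reflect HRGRN's actual schema or public interface.

        # Hypothetical shortest-path query; assumes a running Neo4j instance.
        from neo4j import GraphDatabase

        driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

        query = """
        MATCH p = shortestPath((a:Gene {name: $start})-[*..6]-(b:Gene {name: $end}))
        RETURN [n IN nodes(p) | n.name] AS path
        """

        with driver.session() as session:
            for record in session.run(query, start="GENE_A", end="GENE_B"):
                print(record["path"])  # nodes along the discovered path
        driver.close()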

  2. Feasibility of LC/TOFMS and elemental database searching as a spectral library for pesticides in food.

    PubMed

    Thurman, E Michael; Ferrer, Imma; Malato, Octavio; Fernández-Alba, Amadeo Rodriguez

    2006-11-01

    Traditionally, the screening of unknown pesticides in food has been accomplished by GC/MS methods using conventional library-searching routines. However, many of the new polar and thermally labile pesticides are more readily and easily analysed by LC/MS methods and no searchable libraries currently exist (with the exception of some user libraries, which are limited). Therefore, there is a need for LC/MS libraries that can detect pesticides and their degradation products. This paper reports an identification scheme using a combination of LC/MS time-of-flight (accurate mass) and an Access database of 350 pesticides that are amenable to positive ion electrospray. The approach differs from conventional library searching of fragment ions. The concept consists of three parts: (1) initial screening of possible pesticides in actual market-place fruit extracts (apple and orange) using accurate mass and generating an accurate mass via an automatic ion-extraction routine, (2) searching the Access database manually for screening identification of a pesticide, and (3) identification of the suspected compound by accurate mass of at least one fragment ion and comparison of retention time with an actual standard. Imazalil and iprodione were identified in apples and thiabendazole in oranges using this database approach.
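
    The core screening operation, matching a measured accurate mass against theoretical masses within a ppm tolerance, can be sketched as follows; the [M+H]+ values are approximate and for illustration only.

        # Accurate-mass screening within a ppm window; masses are illustrative.
        def ppm_error(measured, theoretical):
            return (measured - theoretical) / theoretical * 1e6

        def screen(measured_mz, database, tol_ppm=5.0):
            """database: dict of compound name -> theoretical [M+H]+ m/z."""
            return [name for name, mz in database.items()
                    if abs(ppm_error(measured_mz, mz)) <= tol_ppm]

        db = {"imazalil": 297.0556, "thiabendazole": 202.0433, "iprodione": 330.0407}
        print(screen(297.0561, db))  # ['imazalil'], under 2 ppm error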

  3. A continued search for transient events in the COBE DMR database simultaneous with cosmic gamma-ray bursts

    NASA Astrophysics Data System (ADS)

    Stacy, J. Gregory; Jackson, Peter D.; Bontekoe, Tj. Romke; Winkler, Christoph

    1996-08-01

    We report on the status of our ongoing project to search the database of the COBE Differential Microwave Radiometer (DMR) experiment for transient signals at microwave wavelengths simultaneous with cosmic gamma-ray bursts (GRBs). To date we have carried out a complete search of the DMR database using burst positions taken from the original BATSE 1B catalog for the eight-month period of overlap (May-December 1991) corresponding to the first public release of COBE data. We are currently repeating our original search of the COBE DMR database using the revised burst positions of the newly-released BATSE 3B catalog. Using BATSE 1B positions, at least two apparent simultaneous observations of GRBs by the COBE DMR occurred in 1991, along with a number of "near misses" within 30 seconds in time. At present, only upper limits to burst microwave emission are indicated. Even in the event of a non-detection of a GRB by the COBE DMR, unprecedented observational limits will still be obtained, constraining the predictions of the many theoretical models proposed to explain the origin of GRBs.

  4. A Multivariate Mixture Model to Estimate the Accuracy of Glycosaminoglycan Identifications Made by Tandem Mass Spectrometry (MS/MS) and Database Search.

    PubMed

    Chiu, Yulun; Schliekelman, Paul; Orlando, Ron; Sharp, Joshua S

    2017-02-01

    We present a statistical model to estimate the accuracy of derivatized heparin and heparan sulfate (HS) glycosaminoglycan (GAG) assignments to tandem mass (MS/MS) spectra made by the first published database search application, GAG-ID. Employing a multivariate expectation-maximization algorithm, this statistical model distinguishes correct from ambiguous and incorrect database search results when computing the probability that heparin/HS GAG assignments to spectra are correct based upon database search scores. Using GAG-ID search results for spectra generated from a defined mixture of 21 synthesized tetrasaccharide sequences as well as seven spectra of longer defined oligosaccharides, we demonstrate that the computed probabilities are accurate and have high power to discriminate between correctly, ambiguously, and incorrectly assigned heparin/HS GAGs. This analysis makes it possible to filter large MS/MS database search results with predictable false identification error rates.
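
    As a simplified picture of the mixture idea, the sketch below fits a univariate two-component Gaussian mixture to search scores with EM and returns, for each score, the posterior probability of coming from the high-scoring ("correct") component. The published model is multivariate and tailored to GAG-ID's scores; this is a generic stand-in.

        # Univariate two-component Gaussian EM; generic illustration only.
        import math

        def em_two_gaussians(scores, iters=200):
            mu = [min(scores), max(scores)]  # init: incorrect low, correct high
            sd = [1.0, 1.0]
            w = [0.5, 0.5]
            def pdf(x, m, s):
                return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
            for _ in range(iters):
                # E-step: responsibility of the "correct" component per score
                r = [w[1] * pdf(x, mu[1], sd[1]) /
                     (w[0] * pdf(x, mu[0], sd[0]) + w[1] * pdf(x, mu[1], sd[1]))
                     for x in scores]
                # M-step: update weights, means, standard deviations
                for k, resp in ((0, [1 - ri for ri in r]), (1, r)):
                    n = sum(resp)
                    w[k] = n / len(scores)
                    mu[k] = sum(ri * x for ri, x in zip(resp, scores)) / n
                    sd[k] = max(1e-3, math.sqrt(
                        sum(ri * (x - mu[k]) ** 2 for ri, x in zip(resp, scores)) / n))
            return r  # posterior P(correct | score) per input score

        scores = [2.1, 2.5, 3.0, 8.8, 9.4, 10.1]
        print([round(p, 3) for p in em_two_gaussians(scores)])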

  5. Digital cloning: identification of human cDNAs homologous to novel kinases through expressed sequence tag database searching.

    PubMed

    Chen, H C; Kung, H J; Robinson, D

    1998-01-01

    Identification of novel kinases based on their sequence conservation within the kinase catalytic domain has so far relied on two major approaches: low-stringency hybridization of cDNA libraries, and PCR using degenerate primers. Both of these approaches are at times technically difficult and time-consuming. We have developed a procedure that can significantly reduce the time and effort involved in searching for novel kinases and increase the sensitivity of the analysis. This procedure exploits the computer analysis of a vast resource of human cDNA sequences represented in the expressed sequence tag (EST) database. Seventeen novel human cDNA clones showing significant homology to serine/threonine kinases, including STE-20, CDK- and YAK-related family kinases, were identified by searching the EST database. Further sequence analysis of these novel kinases, obtained either directly from EST clones or from PCR-RACE products, confirmed their identity as protein kinases. Given the rapid accumulation of the EST database and the advent of powerful computer analysis software, this approach provides a fast, sensitive, and economical way to identify novel kinases as well as other genes from the EST database.

  6. Effect of cleavage enzyme, search algorithm and decoy database on mass spectrometric identification of wheat gluten proteins.

    PubMed

    Vensel, William H; Dupont, Frances M; Sloane, Stacia; Altenbach, Susan B

    2011-07-01

    While tandem mass spectrometry (MS/MS) is routinely used to identify proteins from complex mixtures, certain types of proteins present unique challenges for MS/MS analyses. The major wheat gluten proteins, gliadins and glutenins, are particularly difficult to distinguish by MS/MS. Each of these groups contains many individual proteins with similar sequences that include repetitive motifs rich in proline and glutamine. These proteins have few cleavable tryptic sites, often resulting in only one or two tryptic peptides that may not provide sufficient information for identification. Additionally, there are fewer than 14,000 complete protein sequences from wheat in the current NCBInr release. In this paper, MS/MS methods were optimized for the identification of the wheat gluten proteins. Chymotrypsin and thermolysin as well as trypsin were used to digest the proteins, and the collision energy was adjusted to improve fragmentation of chymotryptic and thermolytic peptides. Specialized databases were constructed that included protein sequences derived from contigs from several assemblies of wheat expressed sequence tags (ESTs), including contigs assembled from ESTs of the cultivar under study. Two different search algorithms were used to interrogate the database and the results were analyzed and displayed using a commercially available software package (Scaffold). We examined the effect of protein database content and size on the false discovery rate. We found that as database size increased above 30,000 sequences there was a decrease in the number of proteins identified. Also, the type of decoy database influenced the number of proteins identified. Using three enzymes, two search algorithms and a specialized database allowed us to greatly increase the number of detected peptides and distinguish proteins within each gluten protein group.

  7. Param-Medic: A Tool for Improving MS/MS Database Search Yield by Optimizing Parameter Settings.

    PubMed

    May, Damon H; Tamura, Kaipo; Noble, William S

    2017-03-13

    In shotgun proteomics analysis, user-specified parameters are critical to database search performance and therefore to the yield of confident peptide-spectrum matches (PSMs). Two of the most important parameters are related to the accuracy of the mass spectrometer. Precursor mass tolerance defines the peptide candidates considered for each spectrum. Fragment mass tolerance or bin size determines how close observed and theoretical fragments must be to be considered a match. For either of these two parameters, too wide a setting yields randomly high-scoring false PSMs, whereas too narrow a setting erroneously excludes true PSMs, in both cases, lowering the yield of peptides detected at a given false discovery rate. We describe a strategy for inferring optimal search parameters by assembling and analyzing pairs of spectra that are likely to have been generated by the same peptide ion to infer precursor and fragment mass error. This strategy does not rely on a database search, making it usable in a wide variety of settings. In our experiments on data from a variety of instruments including Orbitrap and Q-TOF acquisitions, this strategy yields more high-confidence PSMs than using settings based on instrument defaults or determined by experts. Param-Medic is open-source and cross-platform. It is available as a standalone tool ( http://noble.gs.washington.edu/proj/param-medic/ ) and has been integrated into the Crux proteomics toolkit ( http://crux.ms ), providing automatic parameter selection for the Comet and Tide search engines.

  8. GHOSTX: an improved sequence homology search algorithm using a query suffix array and a database suffix array.

    PubMed

    Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka

    2014-01-01

    DNA sequences are translated into protein coding sequences and then further assigned to protein families in metagenomic analyses, because of the need for sensitivity. However, huge amounts of sequence data create the problem that even general homology search analyses using BLASTX become difficult in terms of computational cost. We designed a new homology search algorithm that finds seed sequences based on the suffix arrays of a query and a database, and have implemented it as GHOSTX. GHOSTX achieved approximately 131-165 times acceleration over a BLASTX search at similar levels of sensitivity. GHOSTX is distributed under the BSD 2-clause license and is available for download at http://www.bi.cs.titech.ac.jp/ghostx/. Currently, sequencing technology continues to improve, and sequencers are increasingly producing larger and larger quantities of data. This explosion of sequence data makes computational analysis with contemporary tools more difficult. We offer this tool as a potential solution to this problem.
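
    The database-suffix-array idea can be illustrated with a toy exact-seed lookup: sort all suffixes of the database once, then locate every occurrence of a seed by binary search. GHOSTX's actual algorithm also builds a suffix array over the query and extends seeds with scoring, none of which is reproduced here.

        # Toy suffix-array seed lookup; the O(n^2 log n) build is for clarity only.
        from bisect import bisect_left, bisect_right

        def suffix_array(text):
            return sorted(range(len(text)), key=lambda i: text[i:])

        def seed_hits(text, sa, seed):
            prefixes = [text[i:i + len(seed)] for i in sa]  # sorted along with sa
            lo = bisect_left(prefixes, seed)
            hi = bisect_right(prefixes, seed)
            return sorted(sa[lo:hi])  # start positions of every occurrence

        db = "MKVLAAGMKVL"
        sa = suffix_array(db)
        print(seed_hits(db, sa, "MKV"))  # [0, 7]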

  9. Low template STR typing: effect of replicate number and consensus method on genotyping reliability and DNA database search results.

    PubMed

    Benschop, Corina C G; van der Beek, Cornelis P; Meiland, Hugo C; van Gorp, Ankie G M; Westen, Antoinette A; Sijen, Titia

    2011-08-01

    To analyze DNA samples with very low DNA concentrations, various methods have been developed that sensitize short tandem repeat (STR) typing. Sensitized DNA typing is accompanied by stochastic amplification effects, such as allele drop-outs and drop-ins. Therefore, low template (LT) DNA profiles are interpreted with care. One can either try to infer the genotype by a consensus method that uses alleles confirmed in replicate analyses, or one can use a statistical model to evaluate the strength of the evidence in a direct comparison with a known DNA profile. In this study we focused on the first strategy and we show that the procedure by which the consensus profile is assembled will affect genotyping reliability. In order to gain insight into the roles of replicate number and requested level of reproducibility, we generated six independent amplifications of samples of known donors. The LT methods included both increased cycling and enhanced capillary electrophoresis (CE) injection [1]. Consensus profiles were assembled from two to six of the replications using four methods: composite (include all alleles), n-1 (include alleles detected in all but one replicate), n/2 (include alleles detected in at least half of the replicates) and 2× (include alleles detected twice). We compared the consensus DNA profiles with the DNA profile of the known donor, studied the stochastic amplification effects and examined the effect of the consensus procedure on DNA database search results. From all these analyses we conclude that the accuracy of LT DNA typing and the efficiency of database searching improve when the number of replicates is increased and the consensus method is n/2. The most functional number of replicates within this n/2 method is four (although a replicate number of three suffices for samples showing >25% of the alleles in standard STR typing). This approach was also the optimal strategy for the analysis of 2-person mixtures, although modified search strategies may be required.
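
    The n/2 consensus rule itself is simple to state in code: keep an allele if it is detected in at least half of the replicate amplifications. The allele calls below are invented for illustration.

        # Sketch of the n/2 consensus rule; allele calls are invented.
        from collections import Counter
        from math import ceil

        def consensus(replicates):
            """replicates: list of allele sets, one per replicate amplification."""
            counts = Counter(a for rep in replicates for a in rep)
            need = ceil(len(replicates) / 2)
            return {allele for allele, n in counts.items() if n >= need}

        reps = [{"15", "17"}, {"15"}, {"15", "17", "19"}, {"17"}]  # four replicates
        print(sorted(consensus(reps)))  # ['15', '17']; the drop-in '19' is excluded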

  10. Probabilistic Risk Assessment: A Bibliography

    NASA Technical Reports Server (NTRS)

    2000-01-01

    Probabilistic risk analysis is an integration of failure modes and effects analysis (FMEA), fault tree analysis and other techniques to assess the potential for failure and to find ways to reduce risk. This bibliography references 160 documents in the NASA STI Database that contain the major concepts "probabilistic risk assessment" and "risk and probability theory" in the basic index or major subject terms. An abstract is included with most citations, followed by the applicable subject terms.

  11. Review and Comparison of the Search Effectiveness and User Interface of Three Major Online Chemical Databases

    ERIC Educational Resources Information Center

    Bharti, Neelam; Leonard, Michelle; Singh, Shailendra

    2016-01-01

    Online chemical databases are the largest source of chemical information and, therefore, the main resource for retrieving results from published journals, books, patents, conference abstracts, and other relevant sources. Various commercial, as well as free, chemical databases are available. SciFinder, Reaxys, and Web of Science are three major…

  12. Reach for Reference. Don't Judge a Database by Its Search Screen

    ERIC Educational Resources Information Center

    Safford, Barbara Ripp

    2005-01-01

    In this column, the author provides a description and brief review of the "Children's Literature Comprehensive Database" (CLCD). This subscription database is a 1999 spinoff from Marilyn Courtot's "Children's Literature" website, which began in 1993 and is a free resource of reviews and features about books, authors, and illustrators. The separate…

  13. Closing the loop in cortically-coupled computer vision: a brain-computer interface for searching image databases.

    PubMed

    Pohlmeyer, Eric A; Wang, Jun; Jangraw, David C; Lou, Bin; Chang, Shih-Fu; Sajda, Paul

    2011-06-01

    We describe a closed-loop brain-computer interface that re-ranks an image database by iterating between user generated 'interest' scores and computer vision generated visual similarity measures. The interest scores are based on decoding the electroencephalographic (EEG) correlates of target detection, attentional shifts and self-monitoring processes, which result from the user paying attention to target images interspersed in rapid serial visual presentation (RSVP) sequences. The highest scored images are passed to a semi-supervised computer vision system that reorganizes the image database accordingly, using a graph-based representation that captures visual similarity between images. The system can either query the user for more information, by adaptively resampling the database to create additional RSVP sequences, or it can converge to a 'done' state. The done state includes a final ranking of the image database and also a 'guess' of the user's chosen category of interest. We find that the closed-loop system's re-rankings can substantially expedite database searches for target image categories chosen by the subjects. Furthermore, better reorganizations are achieved than by relying on EEG interest rankings alone, or if the system were simply run in an open loop format without adaptive resampling.

  14. Lead generation using pharmacophore mapping and three-dimensional database searching: application to muscarinic M3 receptor antagonists.

    PubMed

    Marriott, D P; Dougall, I G; Meghani, P; Liu, Y J; Flower, D R

    1999-08-26

    By using a pharmacophore model, a geometrical representation of the features necessary for molecules to show a particular biological activity, it is possible to search databases containing the 3D structures of molecules and identify novel compounds which may possess this activity. We describe our experiences of establishing a working 3D database system and its use in rational drug design. By using muscarinic M3 receptor antagonists as an example, we show that it is possible to identify potent novel lead compounds using this approach. Pharmacophore generation based on the structures of known M3 receptor antagonists, 3D database searching, and medium-throughput screening were used to identify candidate compounds. Three compounds were chosen to define the pharmacophore: a lung-selective M3 antagonist patented by Pfizer and two Astra compounds which show affinity at the M3 receptor. From these, a pharmacophore model was generated, using the program DISCO, and this was used subsequently to search a UNITY 3D database of proprietary compounds; 172 compounds were found to fit the pharmacophore. These compounds were then screened, and 1-[2-(2-(diethylamino)ethoxy)phenyl]-2-phenylethanone (pA2 6.67) was identified as the best hit, with N-[2-(piperidin-1-ylmethyl)cyclohexyl]-2-propoxybenzamide (pA2 4.83) and phenylcarbamic acid 2-(morpholin-4-ylmethyl)cyclohexyl ester (pA2 5.54) demonstrating lower activity. As well as its potency, 1-[2-(2-(diethylamino)ethoxy)phenyl]-2-phenylethanone is a simple structure with limited similarity to existing M3 receptor antagonists.

  15. An impatient evolutionary algorithm with probabilistic tabu search for unified solution of some NP-hard problems in graph and set theory via clique finding.

    PubMed

    Guturu, Parthasarathy; Dantu, Ram

    2008-06-01

    Many graph- and set-theoretic problems, because of their tremendous application potential and theoretical appeal, have been well investigated by researchers in complexity theory and were found to be NP-hard. Since the combinatorial complexity of these problems does not permit exhaustive searches for optimal solutions, only near-optimal solutions can be explored using either various problem-specific heuristic strategies or metaheuristic global-optimization methods, such as simulated annealing, genetic algorithms, etc. In this paper, we propose a unified evolutionary algorithm (EA) for the problems of maximum clique finding, maximum independent set, minimum vertex cover, subgraph and double subgraph isomorphism, set packing, set partitioning, and set cover. In the proposed approach, we first map these problems onto the maximum clique-finding problem (MCP), which is later solved using an evolutionary strategy. The proposed impatient EA with probabilistic tabu search (IEA-PTS) for the MCP integrates the best features of earlier successful approaches with a number of new heuristics that we developed to yield a performance that advances the state of the art in EAs for the exploration of the maximum cliques in a graph. Results of experimentation with the 37 DIMACS benchmark graphs and comparative analyses with six state-of-the-art algorithms, including two from the smaller EA community and four from the larger metaheuristics community, indicate that the IEA-PTS outperforms the EAs with respect to a Pareto-lexicographic ranking criterion and offers competitive performance on some graph instances when individually compared to the other heuristic algorithms. It has also successfully set a new benchmark on one graph instance. On another benchmark suite, called Benchmarks with Hidden Optimal Solutions, IEA-PTS ranks second, after a very recent algorithm called COVER, among its peers that have experimented with this suite.

  16. Published and perished? The influence of the searched protein database on the long-term storage of proteomics data.

    PubMed

    Griss, Johannes; Côté, Richard G; Gerner, Christopher; Hermjakob, Henning; Vizcaíno, Juan Antonio

    2011-09-01

    In proteomics, protein identifications are reported and stored using an unstable reference system: protein identifiers. These proprietary identifiers are created individually by every protein database and can change or may even be deleted over time. To estimate the effect of the searched protein sequence database on the long-term storage of proteomics data, we analyzed the changes of reported protein identifiers from all public experiments in the Proteomics Identifications (PRIDE) database as of November 2010. To map the submitted protein identifier to a currently active entry, two distinct approaches were used. The first approach used the Protein Identifier Cross Referencing (PICR) service at the EBI, which maps protein identifiers based on 100% sequence identity. The second one (called the logical mapping algorithm) accessed the source databases and retrieved the current status of the reported identifier. Our analysis showed the differences between the main protein databases (International Protein Index (IPI), UniProt Knowledgebase (UniProtKB), National Center for Biotechnology Information nr database (NCBI nr), and Ensembl) with respect to identifier stability. For example, whereas 20% of submitted IPI entries were deleted after two years, virtually all UniProtKB entries remained either active or replaced. Furthermore, the two mapping algorithms produced markedly different results. For example, the PICR service reported 10% more IPI entries deleted compared with the logical mapping algorithm. We found several cases where experiments contained more than 10% deleted identifiers already at the time of publication. We also assessed the proportion of peptide identifications in these data sets that still fitted the originally identified protein sequences. Finally, we performed the same overall analysis on all records from IPI, Ensembl, and UniProtKB, using two releases per year from 2005 onwards. This analysis showed for the first time the true effect of changing protein identifiers on the long-term storage of proteomics data.

  17. Boolean Logic: An Aid for Searching Computer Databases in Special Education and Rehabilitation.

    ERIC Educational Resources Information Center

    Summers, Edward G.

    1989-01-01

    The article discusses using Boolean logic as a tool for searching computerized information retrieval systems in special education and rehabilitation technology. It includes discussion of the Boolean search operators AND, OR, and NOT; Venn diagrams; and disambiguating parentheses. Six suggestions are offered for development of good Boolean logic…
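
    The three operators map directly onto set algebra over the document sets returned for each term, as in this short sketch with invented postings.

        # Boolean search operators as Python set algebra; postings are invented.
        autism = {1, 2, 5, 9}
        technology = {2, 3, 9, 12}
        preschool = {9, 12, 40}

        print(autism & technology)                 # AND -> {2, 9}
        print(autism | preschool)                  # OR  -> {1, 2, 5, 9, 12, 40}
        print(autism - preschool)                  # NOT -> {1, 2, 5}
        print(autism & (technology | preschool))   # parentheses disambiguate -> {2, 9}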

  18. The Magnetics Information Consortium (MagIC) Online Database: Uploading, Searching and Visualizing Paleomagnetic and Rock Magnetic Data

    NASA Astrophysics Data System (ADS)

    Koppers, A.; Tauxe, L.; Constable, C.; Pisarevsky, S.; Jackson, M.; Solheid, P.; Banerjee, S.; Johnson, C.; Genevey, A.; Delaney, R.; Baker, P.; Sbarbori, E.

    2005-12-01

    The Magnetics Information Consortium (MagIC) operates an online relational database including both rock and paleomagnetic data. The goal of MagIC is to store all measurements and their derived properties for studies of paleomagnetic directions (inclination, declination) and their intensities, and for rock magnetic experiments (hysteresis, remanence, susceptibility, anisotropy). MagIC is hosted under EarthRef.org at http://earthref.org/MAGIC/ and has two search nodes, one for paleomagnetism and one for rock magnetism. These nodes provide basic search capabilities based on location, reference, methods applied, material type and geological age, while allowing the user to drill down from sites all the way to the measurements. At each stage, the data can be saved and, if the available data supports it, the data can be visualized by plotting equal area plots, VGP location maps or typical Zijderveld, hysteresis, FORC, and various magnetization and remanence diagrams. All plots are made in SVG (scalable vector graphics) and thus can be saved and easily read into the user's favorite graphics programs without loss of resolution. User contributions to the MagIC database are critical to achieving a useful research tool. We have developed a standard data and metadata template (version 1.6) that can be used to format and upload all data at the time of publication in Earth Science journals. Software tools are provided to facilitate easy population of these templates within Microsoft Excel. These tools allow for the import/export of text files and they provide advanced functionality to manage/edit the data, and to perform various internal checks to high-grade the data and to make them ready for uploading. The uploading is all done online by using the MagIC Contribution Wizard at http://earthref.org/MAGIC/upload.htm, which takes only a few minutes to process a contribution of approximately 5,000 data records. After uploading, these standardized MagIC template files are stored in the digital archives of EarthRef.org.

  19. Federated Search Tools in Fusion Centers: Bridging Databases in the Information Sharing Environment

    DTIC Science & Technology

    2012-09-01

  20. Code optimization of the subroutine to remove near identical matches in the sequence database homology search tool PSI-BLAST.

    PubMed

    Aspnäs, Mats; Mattila, Kimmo; Osowski, Kristoffer; Westerholm, Jan

    2010-06-01

    A central task in protein sequence characterization is the use of a sequence database homology search tool to find similar protein sequences in other individuals or species. PSI-BLAST is a widely used module of the BLAST package that calculates a position-specific score matrix from the best-matching sequences and performs iterated searches, using a subroutine that removes near-identical matches so that they do not dominate the score matrix. For some queries and parameter settings, PSI-BLAST may find many similar high-scoring matches, and therefore up to 80% of the total run time may be spent in this procedure. In this article, we present code optimizations that improve the cache utilization and the overall performance of this procedure. Measurements show that, for queries where the number of similar matches is high, the optimized PSI-BLAST program may be as much as 2.9 times faster than the original program.

  1. The Magnetics Information Consortium (MagIC) Online Database: Uploading, Searching and Visualizing Paleomagnetic and Rock Magnetic Data

    NASA Astrophysics Data System (ADS)

    Minnett, R.; Koppers, A.; Tauxe, L.; Constable, C.; Pisarevsky, S. A.; Jackson, M.; Solheid, P.; Banerjee, S.; Johnson, C.

    2006-12-01

    The Magnetics Information Consortium (MagIC) is commissioned to implement and maintain an online portal to a relational database populated by both rock and paleomagnetic data. The goal of MagIC is to archive all measurements and the derived properties for studies of paleomagnetic directions (inclination, declination) and intensities, and for rock magnetic experiments (hysteresis, remanence, susceptibility, anisotropy). MagIC is hosted under EarthRef.org at http://earthref.org/MAGIC/ and has two search nodes, one for paleomagnetism and one for rock magnetism. Both nodes provide query building based on location, reference, methods applied, material type and geological age, as well as a visual map interface to browse and select locations. The query result set is displayed in a digestible tabular format allowing the user to descend through hierarchical levels such as from locations to sites, samples, specimens, and measurements. At each stage, the result set can be saved and, if supported by the data, can be visualized by plotting global location maps, equal area plots, or typical Zijderveld, hysteresis, and various magnetization and remanence diagrams. User contributions to the MagIC database are critical to achieving a useful research tool. We have developed a standard data and metadata template (Version 2.1) that can be used to format and upload all data at the time of publication in Earth Science journals. Software tools are provided to facilitate population of these templates within Microsoft Excel. These tools allow for the import/export of text files and provide advanced functionality to manage and edit the data, and to perform various internal checks to maintain data integrity and prepare for uploading. The MagIC Contribution Wizard at http://earthref.org/MAGIC/upload.htm executes the upload and takes only a few minutes to process several thousand data records. The standardized MagIC template files are stored in the digital archives of EarthRef.org, where they remain permanently accessible.

  2. Online Searching of Bibliographic Databases: Microcomputer Access to National Information Systems.

    ERIC Educational Resources Information Center

    Coons, Bill

    This paper describes the range and scope of various information databases available for technicians, researchers, and managers employed in forestry and the forest products industry. Availability of information on reports of field and laboratory research, business trends, product prices, and company profiles through national distributors of…

  3. Online/CD-ROM Bibliographic Database Searching in a Small Academic Library.

    ERIC Educational Resources Information Center

    Pitet, Lynn T.

    The purpose of the project described in this paper was to gather information about online/CD-ROM database systems that would be useful in improving the services offered at the University of Findlay, a small private liberal arts college in northwestern Ohio. A survey was sent to 67 libraries serving colleges similar in size which included questions…

  4. Fast 3D molecular superposition and similarity search in databases of flexible molecules

    NASA Astrophysics Data System (ADS)

    Krämer, Andreas; Horn, Hans W.; Rice, Julia E.

    2003-01-01

    We present a new method (fFLASH) for the virtual screening of compound databases that is based on explicit three-dimensional molecular superpositions. fFLASH takes the torsional flexibility of the database molecules fully into account, and can deal with an arbitrary number of conformation-dependent molecular features. The method utilizes a fragmentation-reassembly approach which allows for an efficient sampling of the conformational space. A fast clique-based pattern matching algorithm generates alignments of pairs of adjacent molecular fragments on the rigid query molecule that are subsequently reassembled to complete database molecules. Using conventional molecular features (hydrogen bond donors and acceptors, charges, and hydrophobic groups) we show that fFLASH is able to rapidly produce accurate alignments of medium-sized drug-like molecules. Experiments with a test database containing a diverse set of 1780 drug-like molecules (including all conformers) have shown that average query processing times of the order of 0.1 seconds per molecule can be achieved on a PC.

  5. SCOOP: A Measurement and Database of Student Online Search Behavior and Performance

    ERIC Educational Resources Information Center

    Zhou, Mingming

    2015-01-01

    The ability to access and process massive amounts of online information is required in many learning situations. In order to develop a better understanding of student online search process especially in academic contexts, an online tool (SCOOP) is developed for tracking mouse behavior on the web to build a more extensive account of student web…

  6. A Search for Nontriggered Gamma-Ray Bursts in the BATSE Database

    NASA Technical Reports Server (NTRS)

    Kommers, Jefferson M.; Lewin, Walter H. G.; Kouveliotou, Chryssa; VanParadus, Jan; Pendleton, Geoffrey N.; Meegan, Charles A.; Fishman, Gerald J.

    1997-01-01

    We describe a search of archival data from the Burst and Transient Source Experiment (BATSE). The purpose of the search is to find astronomically interesting transients that did not activate the burst-detection (or "trigger") system on board the spacecraft. Our search is sensitive to events with peak fluxes (on the 1.024 s timescale) that are lower by a factor of approximately 2 than can be detected with the on-board burst trigger. In a search of 345 days of archival data, we detected 91 events in the 50-300 keV range that resemble classical gamma-ray bursts but that did not activate the on-board burst trigger. We also detected 110 low-energy (25-50 keV) events of unknown origin that may include activity from soft gamma repeater (SGR) 1806-20 and bursts and flares from X-ray binaries. This paper gives the occurrence times, estimated source directions, durations, peak fluxes, and fluences for the 91 gamma-ray burst candidates. The direction and intensity distributions of these bursts imply that the biases inherent in the on-board trigger mechanism have not significantly affected the completeness of the published BATSE gamma-ray burst catalogs.
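
    The off-line scan can be sketched as a sliding-window threshold test on archival count rates; the background window, exclusion gap, and significance threshold below are illustrative assumptions, not the BATSE pipeline's values:

        import numpy as np

        def find_candidate_bursts(counts, bg_window=60, n_sigma=4.0):
            """Scan archival per-bin counts (e.g. 1.024 s bins) for bins exceeding a
            locally estimated background by n_sigma Poisson standard deviations."""
            counts = np.asarray(counts, dtype=float)
            hits = []
            for i in range(bg_window, len(counts) - bg_window):
                # Background from surrounding bins, excluding the 5 bins nearest the test bin.
                bg = np.concatenate([counts[i - bg_window:i - 5], counts[i + 5:i + bg_window]])
                mu = bg.mean()
                if mu > 0 and counts[i] > mu + n_sigma * np.sqrt(mu):
                    hits.append(i)
            return hits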

  7. Information Retrieval Strategies of Millennial Undergraduate Students in Web and Library Database Searches

    ERIC Educational Resources Information Center

    Porter, Brandi

    2009-01-01

    Millennial students make up a large portion of undergraduate students attending colleges and universities, and they have a variety of online resources available to them to complete academically related information searches, primarily Web based and library-based online information retrieval systems. The content, ease of use, and required search…

  8. Exchange, interpretation, and database-search of ion mobility spectra supported by data format JCAMP-DX

    NASA Technical Reports Server (NTRS)

    Baumback, J. I.; Davies, A. N.; Vonirmer, A.; Lampen, P. H.

    1995-01-01

    To assist peak assignment in ion mobility spectrometry it is important to have quality reference data. The reference collection should be stored in a database system which is capable of being searched using spectral or substance information. We propose to build such a database customized for ion mobility spectra. At the outset it is important to quickly reach a critical mass of data in the collection, so we wish to obtain as many spectra, combined with their IMS parameters, as possible. Spectra suppliers will be rewarded for their participation with access to the database. To make the data exchange between users and system administration possible, it is important to define a file format specially made for the requirements of ion mobility spectra. The format should be computer readable and flexible enough for extensive comments to be included. In this document we propose such a data exchange format and invite comments on it. For international data exchange it is important to have a standard format, and we propose to base its definition on the JCAMP-DX protocol, which was developed for the exchange of infrared spectra. This standard, maintained by the Joint Committee on Atomic and Molecular Physical Data, is of a flexible design. The aim of this paper is to adapt JCAMP-DX to the special requirements of ion mobility spectra.
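
    To make the proposal concrete, here is a minimal, hypothetical writer for a JCAMP-DX-style ion mobility record; the DATA TYPE label and the point-pair form are assumptions for illustration, not the adopted standard, and a conforming file would follow the full JCAMP-DX specification:

        def write_ims_jcamp(path, title, drift_times_ms, intensities):
            """Write a minimal JCAMP-DX-style record for an ion mobility spectrum."""
            lines = [
                "##TITLE=" + title,
                "##JCAMP-DX=4.24",
                "##DATA TYPE=ION MOBILITY SPECTRUM",   # hypothetical label
                "##XUNITS=MILLISECONDS",
                "##YUNITS=ARBITRARY UNITS",
                "##NPOINTS=%d" % len(drift_times_ms),
                "##XYPOINTS=(XY..XY)",
            ]
            lines += ["%.4f, %.4f" % (x, y) for x, y in zip(drift_times_ms, intensities)]
            lines.append("##END=")
            with open(path, "w") as fh:
                fh.write("\n".join(lines) + "\n")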

  9. Searching the UVSP database and a list of experiments showing mass motions

    NASA Technical Reports Server (NTRS)

    Thompson, William

    1986-01-01

    Since the Solar Maximum Mission (SMM) satellite was launched, a large database has been built up of experiments using the Ultraviolet Spectrometer and Polarimeter (UVSP) instrument. Access to this database can be gained through the SMM Vax 750 computer at Goddard Space Flight Center. One useful way to do this is with a program called USEARCH. This program allows one to make a listing of different types of UVSP experiments. It is evident that this program is useful to those who would wish to make use of UVSP data, but who don't know what data is available. Therefore it was decided to include a short description of how to make use of the USEARCH program. Also described, but not included, is a listing of all UVSP experiments showing mass motions in prominences and filaments. This list was made with the aid of the USEARCH program.

  10. On-line biomedical databases-the best source for quick search of the scientific information in the biomedicine.

    PubMed

    Masic, Izet; Milinovic, Katarina

    2012-06-01

    Most medical journals now have an electronic version available over public networks. Printed and electronic versions may appear in parallel, but the two forms need not be published simultaneously or have identical content: the electronic version can appear a few weeks before the printed form. An electronic journal may also include features the printed form cannot carry, such as animations or 3D displays, and may offer full text (mostly in PDF or XML format), only the table of contents, or just summaries. Access to full text is usually not free and is typically possible only if the institution (library or host) enters into an access agreement. Many medical journals, however, provide free access to some articles, or to the complete content after a certain time (six months to a year). Network archives such as HighWire Press and FreeMedicalJournals.com help locate such journals. Particular mention should be made of PubMed and PubMed Central, the first public digital archives collecting freely available medical literature, which operate within the National Library of Medicine in Bethesda (USA). There are also so-called online medical journals published only in electronic form, which can be searched through online databases. In this paper the authors briefly describe about 30 databases and give short instructions on how to access and search the papers published in indexed medical journals.

  11. Image Content Engine (ICE): A System for Fast Image Database Searches

    SciTech Connect

    Brase, J M; Paglieroni, D W; Weinert, G F; Grant, C W; Lopez, A S; Nikolaev, S

    2005-03-22

    The Image Content Engine (ICE) is being developed to provide cueing assistance to human image analysts faced with increasingly large and intractable amounts of image data. The ICE architecture includes user configurable feature extraction pipelines which produce intermediate feature vector and match surface files which can then be accessed by interactive relational queries. Application of the feature extraction algorithms to large collections of images may be extremely time consuming and is launched as a batch job on a Linux cluster. The query interface accesses only the intermediate files and returns candidate hits nearly instantaneously. Queries may be posed for individual objects or collections. The query interface prompts the user for feedback, and applies relevance feedback algorithms to revise the feature vector weighting and focus on relevant search results. Examples of feature extraction and both model-based and search-by-example queries are presented.
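
    The relevance-feedback step can be illustrated with one common scheme, Rocchio-style reweighting of feature-vector weights from user-flagged hits; this is a generic sketch under that assumption, not ICE's actual algorithm:

        import numpy as np

        def reweight_features(weights, relevant, irrelevant, lr=0.1):
            """Boost the weights of features whose mean values differ most between
            user-flagged relevant and irrelevant hits, then renormalise.
            `relevant` and `irrelevant` are 2D arrays of feature vectors."""
            gap = np.abs(np.mean(relevant, axis=0) - np.mean(irrelevant, axis=0))
            new = np.asarray(weights, dtype=float) + lr * gap
            return new / new.sum()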

  12. Crescendo: A Protein Sequence Database Search Engine for Tandem Mass Spectra

    NASA Astrophysics Data System (ADS)

    Wang, Jianqi; Zhang, Yajie; Yu, Yonghao

    2015-07-01

    A search engine that discovers more peptides reliably is essential to the progress of computational proteomics. We propose two new scoring functions (L- and P-scores), which aim to capture similar characteristics of a peptide-spectrum match (PSM) as Sequest and Comet do. Crescendo, introduced here, is a software program that implements these two scores for peptide identification. We applied Crescendo to test datasets and compared its performance with widely used search engines, including Mascot, Sequest, and Comet. The results indicate that Crescendo identifies a similar or larger number of peptides at various predefined false discovery rates (FDR). Importantly, it also provides a better separation between the true and decoy PSMs, warranting the future development of a companion post-processing filtering algorithm.
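
    The L- and P-scores themselves are not specified in this abstract; the toy function below, a shared-peak count with a factorial reward in the spirit of hyperscore-like functions, merely illustrates what a PSM scoring function consumes and produces:

        from math import log, factorial

        def toy_psm_score(spectrum_mzs, theoretical_mzs, tol=0.02):
            """Count theoretical fragment m/z values matched by an observed peak
            within `tol`, and reward the count factorially.  Purely illustrative;
            not the L- or P-score of this record."""
            matched = sum(1 for mz in theoretical_mzs
                          if any(abs(mz - p) <= tol for p in spectrum_mzs))
            return log(factorial(matched)) if matched else 0.0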

  13. Crescendo: A Protein Sequence Database Search Engine for Tandem Mass Spectra.

    PubMed

    Wang, Jianqi; Zhang, Yajie; Yu, Yonghao

    2015-07-01

    A search engine that discovers more peptides reliably is essential to the progress of computational proteomics. We propose two new scoring functions (L- and P-scores), which aim to capture similar characteristics of a peptide-spectrum match (PSM) as Sequest and Comet do. Crescendo, introduced here, is a software program that implements these two scores for peptide identification. We applied Crescendo to test datasets and compared its performance with widely used search engines, including Mascot, Sequest, and Comet. The results indicate that Crescendo identifies a similar or larger number of peptides at various predefined false discovery rates (FDR). Importantly, it also provides a better separation between the true and decoy PSMs, warranting the future development of a companion post-processing filtering algorithm.

  14. Chemical and biological warfare: General studies. (Latest citations from the NTIS bibliographic database). Published Search

    SciTech Connect

    1997-11-01

    The bibliography contains citations concerning federally sponsored and conducted studies into chemical and biological warfare operations and planning. These studies cover areas not addressed in other parts of this series. The topics include production and storage of agents, delivery techniques, training, military and civil defense, general planning studies, psychological reactions to chemical warfare, evaluations of materials exposed to chemical agents, and studies on banning or limiting chemical warfare. Other published searches in this series on chemical warfare cover detection and warning, defoliants, protection, and biological studies, including chemistry and toxicology. (Contains 50-250 citations and includes a subject term index and title list.) (Copyright NERAC, Inc. 1995)

  15. Chemical and biological warfare: General studies. (Latest citations from the NTIS bibliographic database). NewSearch

    SciTech Connect

    Not Available

    1994-10-01

    The bibliography contains citations concerning federally sponsored and conducted studies into chemical and biological warfare operations and planning. These studies cover areas not addressed in other parts of this series. The topics include production and storage of agents, delivery techniques, training, military and civil defense, general planning studies, psychological reactions to chemical warfare, evaluations of materials exposed to chemical agents, and studies on banning or limiting chemical warfare. Other published searches in this series on chemical warfare cover detection and warning, defoliants, protection, and biological studies, including chemistry and toxicology. (Contains 250 citations and includes a subject term index and title list.)

  16. Chemical and biological warfare: General studies. (Latest citations from the NTIS bibliographic database). Published Search

    SciTech Connect

    1996-10-01

    The bibliography contains citations concerning federally sponsored and conducted studies into chemical and biological warfare operations and planning. These studies cover areas not addressed in other parts of this series. The topics include production and storage of agents, delivery techniques, training, military and civil defense, general planning studies, psychological reactions to chemical warfare, evaluations of materials exposed to chemical agents, and studies on banning or limiting chemical warfare. Other published searches in this series on chemical warfare cover detection and warning, defoliants, protection, and biological studies, including chemistry and toxicology. (Contains 50-250 citations and includes a subject term index and title list.) (Copyright NERAC, Inc. 1995)

  17. Chemical and biological warfare: General studies. (Latest citations from the NTIS bibliographic database). Published Search

    SciTech Connect

    1995-09-01

    The bibliography contains citations concerning federally sponsored and conducted studies into chemical and biological warfare operations and planning. These studies cover areas not addressed in other parts of this series. The topics include production and storage of agents, delivery techniques, training, military and civil defense, general planning studies, psychological reactions to chemical warfare, evaluations of materials exposed to chemical agents, and studies on banning or limiting chemical warfare. Other published searches in this series on chemical warfare cover detection and warning, defoliants, protection, and biological studies, including chemistry and toxicology. (Contains 50-250 citations and includes a subject term index and title list.) (Copyright NERAC, Inc. 1995)

  18. Chemical and biological warfare: General studies. (Latest citations from the NTIS Bibliographic database). Published Search

    SciTech Connect

    Not Available

    1993-11-01

    The bibliography contains citations concerning federally sponsored and conducted studies into chemical and biological warfare operations and planning. These studies cover areas not addressed in other parts of this series. The topics include production and storage of agents, delivery techniques, training, military and civil defense, general planning studies, psychological reactions to chemical warfare, evaluations of materials exposed to chemical agents, and studies on banning or limiting chemical warfare. Other published searches in this series on chemical warfare cover detection and warning, defoliants, protection, and biological studies, including chemistry and toxicology. (Contains 250 citations and includes a subject term index and title list.)

  19. Searching the databases: a quick look at Amazon and two other online catalogues.

    PubMed

    Potts, Hilary

    2003-01-01

    The Amazon Online Catalogue was compared with the Library of Congress Catalogue and the British Library Catalogue, both also available online, by searching on both neutral (Gay, Lesbian, Homosexual) and pejorative (Perversion, Sex Crime) subject terms, and also by searches using Boolean logic in an attempt to identify Lesbian Fiction items and religion-based anti-gay material. Amazon was much more likely to be the first port of call for non-academic enquiries. Although excluding much material necessary for academic research, it carried more information about the individual books and less historical homophobic baggage in its terminology than the great national catalogues. Its back catalogue of second-hand books outnumbered those in print. Current attitudes may be partially gauged by the relative numbers of titles published under each heading--e.g., there may be an inverse relationship between concern about child sex abuse and homophobia, more noticeable in the U.S. because of the activities of the religious right.

  20. Heart research advances using database search engines, Human Protein Atlas and the Sydney Heart Bank.

    PubMed

    Li, Amy; Estigoy, Colleen; Raftery, Mark; Cameron, Darryl; Odeberg, Jacob; Pontén, Fredrik; Lal, Sean; Dos Remedios, Cristobal G

    2013-10-01

    This Methodological Review is intended as a guide for research students who may have just discovered a human "novel" cardiac protein, but it may also help hard-pressed reviewers of journal submissions on a "novel" protein reported in an animal model of human heart failure. Whether you are an expert or not, you may know little or nothing about this particular protein of interest. In this review we provide a strategic guide on how to proceed. We ask: How do you discover what has been published (even in an abstract or research report) about this protein? Everyone knows how to undertake literature searches using PubMed and Medline but these are usually encyclopaedic, often producing long lists of papers, most of which are either irrelevant or only vaguely relevant to your query. Relatively few will be aware of more advanced search engines such as Google Scholar and even fewer will know about Quertle. Next, we provide a strategy for discovering if your "novel" protein is expressed in the normal, healthy human heart, and if it is, we show you how to investigate its subcellular location. This can usually be achieved by visiting the website "Human Protein Atlas" without doing a single experiment. Finally, we provide a pathway to discovering if your protein of interest changes its expression level with heart failure/disease or with ageing.

  1. Accelerating Smith-Waterman Alignment for Protein Database Search Using Frequency Distance Filtration Scheme Based on CPU-GPU Collaborative System.

    PubMed

    Liu, Yu; Hong, Yang; Lin, Chun-Yuan; Hung, Che-Lun

    2015-01-01

    The Smith-Waterman (SW) algorithm has been widely utilized for searching biological sequence databases in bioinformatics. Recently, several works have adopted graphics cards with Graphic Processing Units (GPUs) and the associated CUDA model to enhance the performance of SW computations. However, these works mainly focused on protein database search using the intertask parallelization technique, using the GPU capability only to perform the SW computations one by one. Hence, in this paper, we propose an efficient SW alignment method, called CUDA-SWfr, for protein database search that uses the intratask parallelization technique based on a CPU-GPU collaborative system. Before doing the SW computations on the GPU, a procedure is applied on the CPU using the frequency distance filtration scheme (FDFS) to eliminate unnecessary alignments. The experimental results indicate that CUDA-SWfr runs 9.6 times and 96 times faster than the CPU-based SW method, without and with FDFS, respectively.
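
    The filtration idea can be read as follows: amino acid composition differences lower-bound the number of edits separating two sequences, so a cheap composition distance can screen out hopeless database entries before the costly SW alignment. The metric and threshold in this sketch are our illustrative assumptions, not necessarily the paper's exact scheme:

        from collections import Counter

        def frequency_distance(qc, sc):
            """Sum of absolute amino acid composition differences between two
            sequences; large values mean many edits would be needed, so the
            expensive SW alignment can be skipped entirely."""
            return sum(abs(qc[a] - sc[a]) for a in set(qc) | set(sc))

        def prefilter(query, database, max_dist):
            """Keep only database sequences close to the query in composition."""
            qc = Counter(query)
            return [s for s in database if frequency_distance(qc, Counter(s)) <= max_dist]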

  2. Pivotal role of computers and software in mass spectrometry - SEQUEST and 20 years of tandem MS database searching.

    PubMed

    Yates, John R

    2015-11-01

    Advances in computer technology and software have driven developments in mass spectrometry over the last 50 years. Computers and software have been impactful in three areas: the automation of difficult calculations to aid interpretation, the collection of data and control of instruments, and data interpretation. As the power of computers has grown, so too has the utility and impact on mass spectrometers and their capabilities. This has been particularly evident in the use of tandem mass spectrometry data to search protein and nucleotide sequence databases to identify peptide and protein sequences. This capability has driven the development of many new approaches to study biological systems, including the use of "bottom-up shotgun proteomics" to directly analyze protein mixtures.

  3. Pivotal Role of Computers and Software in Mass Spectrometry - SEQUEST and 20 Years of Tandem MS Database Searching

    NASA Astrophysics Data System (ADS)

    Yates, John R.

    2015-11-01

    Advances in computer technology and software have driven developments in mass spectrometry over the last 50 years. Computers and software have been impactful in three areas: the automation of difficult calculations to aid interpretation, the collection of data and control of instruments, and data interpretation. As the power of computers has grown, so too has the utility and impact on mass spectrometers and their capabilities. This has been particularly evident in the use of tandem mass spectrometry data to search protein and nucleotide sequence databases to identify peptide and protein sequences. This capability has driven the development of many new approaches to study biological systems, including the use of "bottom-up shotgun proteomics" to directly analyze protein mixtures.

  4. An ultra-tolerant database search reveals that a myriad of modified peptides contributes to unassigned spectra in shotgun proteomics

    PubMed Central

    Chick, Joel M.; Kolippakkam, Deepak; Nusinow, David P.; Zhai, Bo; Rad, Ramin; Huttlin, Edward L.; Gygi, Steven P.

    2015-01-01

    Fewer than half of all tandem mass spectrometry (MS/MS) spectra acquired in shotgun proteomics experiments are typically matched to a peptide with high confidence. Here we determine the identity of unassigned peptides using an ultra-tolerant Sequest database search that allows peptide matching even with modifications of unknown masses up to ±500 Da. In a proteome-wide dataset on HEK293 cells (9,513 proteins and 396,736 peptides), this approach matched an additional 184,000 modified peptides, which were linked to biological and chemical modifications representing 523 distinct mass bins, including phosphorylation, glycosylation, and methylation. We localized all unknown modification masses to specific regions within a peptide. Known modifications were assigned to the correct amino acids with frequencies often >90%. We conclude that at least one third of unassigned spectra arise from peptides with substoichiometric modifications. PMID:26076430
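
    The candidate-selection step of such an open search can be sketched as a wide precursor-mass window plus mass-shift bookkeeping; the dictionary interface below is an illustrative assumption, and the actual Sequest-based implementation is far more elaborate:

        def open_search_candidates(precursor_mass, peptide_masses, tol_da=500.0):
            """Ultra-tolerant candidate selection: keep every peptide whose mass lies
            within +/- tol_da of the precursor and record the mass shift.  Shifts that
            pile up in narrow bins across many PSMs point to a specific modification
            (e.g. a shift near +79.966 Da suggests phosphorylation)."""
            hits = []
            for peptide, mass in peptide_masses.items():
                delta = precursor_mass - mass
                if abs(delta) <= tol_da:
                    hits.append((peptide, round(delta, 3)))
            return hits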

  5. FTP-Server for exchange, interpretation, and database-search of ion mobility spectra, literature, preprints and software

    NASA Technical Reports Server (NTRS)

    Baumbach, J. I.; Vonirmer, A.

    1995-01-01

    To assist current discussion in the field of ion mobility spectrometry, an FTP server began operation on 4 December 1994 at the Institut fur Spektrochemie und angewandte Spektroskopie, Dortmund, available to all research groups at universities and institutes and to research workers in industry. We support the exchange, interpretation, and database search of ion mobility spectra through the data format JCAMP-DX (Joint Committee on Atomic and Molecular Physical Data), as well as literature retrieval and a preprint, notice, and discussion board. We describe in general lines the access conditions, local addresses, and main code words. For further details, a monthly news report will be prepared for all common users. The Internet email address for subscribing is included in this document.

  6. Improving the Mapping of Smith-Waterman Sequence Database Searches onto CUDA-Enabled GPUs

    PubMed Central

    Huang, Liang-Tsung; Wu, Chao-Chin; Lai, Lien-Fu; Li, Yun-Ju

    2015-01-01

    Sequence alignment lies at the heart of bioinformatics. The Smith-Waterman algorithm is one of the key sequence search algorithms and has gained popularity due to improved implementations and rapidly increasing compute power. Recently, the Smith-Waterman algorithm has been successfully mapped onto the emerging general-purpose graphics processing units (GPUs). In this paper, we focused on how to improve the mapping, especially for short query sequences, by better usage of shared memory. We performed and evaluated the proposed method on two different platforms (Tesla C1060 and Tesla K20) and compared it with two classic methods in CUDASW++. Further, the performance on different numbers of threads and blocks has been analyzed. The results showed that the proposed method significantly improves the Smith-Waterman algorithm on CUDA-enabled GPUs when block and thread numbers are properly allocated. PMID:26339591
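
    For reference, the recurrence the GPU kernels parallelize is the plain Smith-Waterman local-alignment dynamic program; the CPU sketch below uses illustrative scoring parameters, while GPU versions evaluate the same matrix along anti-diagonals with rows staged in shared memory:

        def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
            """Plain Smith-Waterman local alignment score for two sequences."""
            H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
            best = 0
            for i in range(1, len(a) + 1):
                for j in range(1, len(b) + 1):
                    diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
                    # Local alignment: scores are clamped at zero.
                    H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
                    best = max(best, H[i][j])
            return best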

  7. Quality Control of Biomedicinal Allergen Products - Highly Complex Isoallergen Composition Challenges Standard MS Database Search and Requires Manual Data Analyses.

    PubMed

    Spiric, Jelena; Engin, Anna M; Karas, Michael; Reuter, Andreas

    2015-01-01

    Allergy against birch pollen is among the most common causes of spring pollinosis in Europe and is diagnosed and treated using extracts from natural sources. Quality control is crucial for safe and effective diagnosis and treatment. However, current methods are very difficult to standardize and do not address individual allergen or isoallergen composition. MS provides information regarding selected proteins or the entire proteome and could overcome the aforementioned limitations. We studied the proteome of birch pollen, focusing on allergens and isoallergens, to clarify which of the 93 published sequence variants of the major allergen, Bet v 1, are expressed as proteins within one source material in parallel. The unexpectedly complex Bet v 1 isoallergen composition required manual data interpretation and a specific design of databases, as current database search engines fail to unambiguously assign spectra to highly homologous, partially identical proteins. We identified 47 non-allergenic proteins and all 5 known birch pollen allergens, and unambiguously proved the existence of 18 Bet v 1 isoallergens and variants by manual data analysis. This highly complex isoallergen composition raises questions whether isoallergens can be ignored or must be included for the quality control of allergen products, and which data analysis strategies are to be applied.

  8. The Zebrafish Model Organism Database: new support for human disease models, mutation details, gene expression phenotypes and searching.

    PubMed

    Howe, Douglas G; Bradford, Yvonne M; Eagle, Anne; Fashena, David; Frazer, Ken; Kalita, Patrick; Mani, Prita; Martin, Ryan; Moxon, Sierra Taylor; Paddock, Holly; Pich, Christian; Ramachandran, Sridhar; Ruzicka, Leyla; Schaper, Kevin; Shao, Xiang; Singer, Amy; Toro, Sabrina; Van Slyke, Ceri; Westerfield, Monte

    2017-01-04

    The Zebrafish Model Organism Database (ZFIN; http://zfin.org) is the central resource for zebrafish (Danio rerio) genetic, genomic, phenotypic and developmental data. ZFIN curators provide expert manual curation and integration of comprehensive data involving zebrafish genes, mutants, transgenic constructs and lines, phenotypes, genotypes, gene expressions, morpholinos, TALENs, CRISPRs, antibodies, anatomical structures, models of human disease and publications. We integrate curated, directly submitted, and collaboratively generated data, making these available to the zebrafish research community. Among the vertebrate model organisms, zebrafish are superbly suited for rapid generation of sequence-targeted mutant lines, characterization of phenotypes including gene expression patterns, and generation of human disease models. The recent rapid adoption of zebrafish as human disease models is making management of these data particularly important to both the research and clinical communities. Here, we describe recent enhancements to ZFIN including use of the zebrafish experimental conditions ontology, 'Fish' records in the ZFIN database, support for gene expression phenotypes, models of human disease, mutation details at the DNA, RNA and protein levels, and updates to the ZFIN single box search.

  9. The Zebrafish Model Organism Database: new support for human disease models, mutation details, gene expression phenotypes and searching

    PubMed Central

    Howe, Douglas G.; Bradford, Yvonne M.; Eagle, Anne; Fashena, David; Frazer, Ken; Kalita, Patrick; Mani, Prita; Martin, Ryan; Moxon, Sierra Taylor; Paddock, Holly; Pich, Christian; Ramachandran, Sridhar; Ruzicka, Leyla; Schaper, Kevin; Shao, Xiang; Singer, Amy; Toro, Sabrina; Van Slyke, Ceri; Westerfield, Monte

    2017-01-01

    The Zebrafish Model Organism Database (ZFIN; http://zfin.org) is the central resource for zebrafish (Danio rerio) genetic, genomic, phenotypic and developmental data. ZFIN curators provide expert manual curation and integration of comprehensive data involving zebrafish genes, mutants, transgenic constructs and lines, phenotypes, genotypes, gene expressions, morpholinos, TALENs, CRISPRs, antibodies, anatomical structures, models of human disease and publications. We integrate curated, directly submitted, and collaboratively generated data, making these available to the zebrafish research community. Among the vertebrate model organisms, zebrafish are superbly suited for rapid generation of sequence-targeted mutant lines, characterization of phenotypes including gene expression patterns, and generation of human disease models. The recent rapid adoption of zebrafish as human disease models is making management of these data particularly important to both the research and clinical communities. Here, we describe recent enhancements to ZFIN including use of the zebrafish experimental conditions ontology, ‘Fish’ records in the ZFIN database, support for gene expression phenotypes, models of human disease, mutation details at the DNA, RNA and protein levels, and updates to the ZFIN single box search. PMID:27899582

  10. Utility of rapid database searching for quality assurance: 'detective work' in uncovering radiology coding and billing errors

    NASA Astrophysics Data System (ADS)

    Horii, Steven C.; Kim, Woojin; Boonn, William; Iyoob, Christopher; Maston, Keith; Coleman, Beverly G.

    2011-03-01

    When the first-quarter 2010 Department of Radiology statistics were provided to the Section Chiefs, the authors (SH, BC) were alarmed to discover that Ultrasound showed a decrease of 2.5 percent in billed examinations. This seemed to be in direct contradistinction to the experience of the ultrasound faculty members and sonographers: their experience was that they were far busier than during the same quarter of 2009. The one exception all acknowledged was the month of February 2010, when several major winter storms resulted in much decreased hospital admissions and Emergency Department visits. Since these statistics in part help establish priorities for capital budget items, professional and technical staffing levels, and levels of incentive salary, they are taken very seriously. The availability of a desktop, Web-based RIS database search tool developed by two of the authors (WK, WB), together with built-in database functions of the ultrasound miniPACS, made it possible for us to develop and test hypotheses very rapidly for why the number of billable examinations was declining in the face of what experience told the authors was an increasing number of examinations being performed. Within a short time, we identified the major cause as errors on the part of the company retained to verify billable Current Procedural Terminology (CPT) codes against ultrasound reports. This information is being used going forward to recover unbilled examinations and take measures to reduce or eliminate the types of coding errors that resulted in the problem.

  11. A Cautionary Tale on the Inclusion of Variable Posttranslational Modifications in Database-Dependent Searches of Mass Spectrometry Data.

    PubMed

    Svozil, J; Baerenfaller, K

    2017-01-01

    Mass spectrometry-based proteomics allows in principle the identification of unknown target proteins of posttranslational modifications and the sites of attachment. Including a variety of posttranslational modifications in database-dependent searches of high-throughput mass spectrometry data holds the promise to gain spectrum assignments to modified peptides, thereby increasing the number of assigned spectra, and to identify potentially interesting modification events. However, these potential benefits come at the price of an increased search space, which can lead to reduced scores, increased score thresholds, and erroneous peptide spectrum matches. We have assessed here the advantages and disadvantages of including the variable posttranslational modifications methionine oxidation, protein N-terminal acetylation, cysteine carbamidomethylation, transformation of N-terminal glutamine to pyroglutamic acid (Gln→pyro-Glu), and deamidation of asparagine and glutamine. Based on calculations of local false discovery rates and comparisons to known features of the respective modifications, we recommend that searches of samples not enriched for specific posttranslational modifications include only methionine oxidation, protein N-terminal acetylation, and peptide N-terminal Gln→pyro-Glu as variable modifications. The principle of the validation strategy adopted here can also be applied for assessing the inclusion of posttranslational modifications for differently prepared samples, or for additional modifications. In addition, we have reassessed the special properties of the ubiquitin footprint, which is the remainder of ubiquitin moieties attached to lysines after tryptic digest. We show here that the ubiquitin footprint often breaks off as a neutral loss and that it can be distinguished from dicarbamidomethylation events.
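
    The paper's local-FDR machinery is not reproduced here, but the target-decoy cut-off estimation that such assessments build on can be sketched as follows (an illustrative global, rather than local, FDR estimate):

        def score_cutoff_at_fdr(target_scores, decoy_scores, fdr=0.01):
            """Return the lowest score cut-off whose target-decoy FDR estimate,
            (#decoys >= cut-off) / (#targets >= cut-off), stays at or below `fdr`."""
            cutoff = None
            for s in sorted(set(target_scores), reverse=True):
                t = sum(1 for x in target_scores if x >= s)
                d = sum(1 for x in decoy_scores if x >= s)
                if t and d / t <= fdr:
                    cutoff = s   # accepting everything scoring >= s still meets fdr
            return cutoff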

  12. Similarity landscapes: An improved method for scientific visualization of information from protein and DNA database searches

    SciTech Connect

    Dogget, N.; Myers, G.; Wills, C.J.

    1998-12-01

    This is the final report of a three-year, Laboratory Directed Research and Development (LDRD) project at the Los Alamos National Laboratory (LANL). The authors have used computer simulations and examination of a variety of databases to answer questions about a wide range of evolutionary questions. The authors have found that there is a clear distinction in the evolution of HIV-1 and HIV-2, with the former and more virulent virus evolving more rapidly at a functional level. The authors have discovered highly non-random patterns in the evolution of HIV-1 that can be attributed to a variety of selective pressures. In the course of examination of microsatellite DNA (short repeat regions) in microorganisms, the authors have found clear differences between prokaryotes and eukaryotes in their distribution, differences that can be tied to different selective pressures. They have developed a new method (topiary pruning) for enhancing the phylogenetic information contained in DNA sequences. Most recently, the authors have discovered effects in complex rainforest ecosystems that indicate strong frequency-dependent interactions between host species and their parasites, leading to the maintenance of ecosystem variability.

  13. Probabilistic consensus scoring improves tandem mass spectrometry peptide identification.

    PubMed

    Nahnsen, Sven; Bertsch, Andreas; Rahnenführer, Jörg; Nordheim, Alfred; Kohlbacher, Oliver

    2011-08-05

    Database search is a standard technique for identifying peptides from their tandem mass spectra. To increase the number of correctly identified peptides, we suggest a probabilistic framework that allows the combination of scores from different search engines into a joint consensus score. Central to the approach is a novel method to estimate scores for peptides not found by an individual search engine. This approach allows the estimation of p-values for each candidate peptide and their combination across all search engines. The consensus approach works better than any single search engine across all different instrument types considered in this study. Improvements vary strongly from platform to platform and from search engine to search engine. Compared to the industry standard MASCOT, our approach can identify up to 60% more peptides. The software for consensus predictions is implemented in C++ as part of OpenMS, a software framework for mass spectrometry. The source code is available in the current development version of OpenMS and can easily be used as a command line application or via the graphical pipeline designer TOPPAS.
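
    The consensus model itself is more elaborate, but the core idea of combining per-engine p-values for one PSM can be illustrated with Fisher's method, used here as a stand-in rather than the paper's estimator:

        from math import exp, log, factorial

        def fisher_combine(pvalues):
            """Fisher's method: under the null, -2*sum(ln p_i) is chi-square with
            2k degrees of freedom; for even df the survival function has the
            closed form exp(-x/2) * sum_{i<k} (x/2)**i / i!."""
            k = len(pvalues)
            x = -2.0 * sum(log(max(p, 1e-300)) for p in pvalues)
            return exp(-x / 2.0) * sum((x / 2.0) ** i / factorial(i) for i in range(k))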

  14. Recovery actions in PRA (probabilistic risk assessment) for the Risk Methods Integration and Evaluation Program (RMIEP): Volume 1, Development of the data-based method

    SciTech Connect

    Weston, L M; Whitehead, D W; Graves, N L

    1987-06-01

    In a probabilistic risk assessment (PRA) for a nuclear power plant, the analyst identifies a set of potential core damage events consisting of equipment failures and human errors and their estimated probabilities of occurrence. If operator recovery from an event within some specified time is considered, then the probability of this recovery can be included in the PRA. This report provides PRA analysts with an improved methodology for including recovery actions in a PRA. A recovery action can be divided into two distinct phases: a Diagnosis Phase (realizing that there is a problem with a critical parameter and deciding upon the correct course of action) and an Action Phase (physically accomplishing the required action). In this methodology, simulator data are used to estimate recovery probabilities for the diagnosis phase. Different time-reliability curves showing the probability of failure of diagnosis as a function of time from the compelling cue for the event are presented. These curves are based on simulator exercises, and the actions are grouped based upon their operational similarities. This is an improvement over existing diagnosis models that rely greatly upon subjective judgment to obtain such estimates. The action phase is modeled using estimates from available sources. The methodology also includes a recommendation on where and when to apply the recovery action in the PRA process.
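
    A time-reliability curve of the kind described can be sketched as a lognormal crew response-time model; the median and spread below are illustrative assumptions, not values derived from the simulator data:

        from math import erf, log, sqrt

        def p_diagnosis_failure(t_minutes, median_minutes=5.0, sigma=1.0):
            """Probability that diagnosis has NOT yet occurred t minutes after the
            compelling cue, under a lognormal response-time model."""
            if t_minutes <= 0:
                return 1.0
            z = (log(t_minutes) - log(median_minutes)) / sigma
            return 1.0 - 0.5 * (1.0 + erf(z / sqrt(2.0)))   # 1 - lognormal CDF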

  15. pFind-Alioth: A novel unrestricted database search algorithm to improve the interpretation of high-resolution MS/MS data.

    PubMed

    Chi, Hao; He, Kun; Yang, Bing; Chen, Zhen; Sun, Rui-Xiang; Fan, Sheng-Bo; Zhang, Kun; Liu, Chao; Yuan, Zuo-Fei; Wang, Quan-Hui; Liu, Si-Qi; Dong, Meng-Qiu; He, Si-Min

    2015-07-01

    Database search is the dominant approach in high-throughput proteomic analysis. However, the interpretation rate of MS/MS spectra is very low in such a restricted mode, which is mainly due to unexpected modifications and irregular digestion types. In this study, we developed a new algorithm called Alioth, to be integrated into the search engine of pFind, for fast and accurate unrestricted database search on high-resolution MS/MS data. An ion index is constructed for both peptide precursors and fragment ions, by which arbitrary digestions and a single site of any modification or mutation can be searched efficiently. A new re-ranking algorithm is used to distinguish the correct peptide-spectrum matches from random ones. The algorithm is tested on several HCD datasets and the interpretation rate of MS/MS spectra using Alioth is as high as 60%-80%. Peptides from semi- and non-specific digestions, as well as those with unexpected modifications or mutations, can be effectively identified using Alioth and confidently validated using other search engines. The average processing speed of Alioth is 5-10 times faster than some other unrestricted search engines and is comparable to or even faster than the restricted search algorithms tested.
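
    The ion index can be read as an inverted index from binned fragment m/z values to peptide ids; the sketch below is a generic version of that structure under our reading, not pFind's implementation:

        from collections import defaultdict

        def build_fragment_index(peptides, bin_width=0.02):
            """Map binned fragment m/z to peptide ids: each observed peak becomes an
            O(1) lookup that votes for every peptide able to produce it, which is
            what makes unrestricted search tractable."""
            index = defaultdict(set)
            for pid, frag_mzs in peptides.items():
                for mz in frag_mzs:
                    index[round(mz / bin_width)].add(pid)
            return index

        def candidate_votes(index, spectrum_mzs, bin_width=0.02):
            """Tally, per peptide, how many spectrum peaks fall in its fragment bins."""
            votes = defaultdict(int)
            for mz in spectrum_mzs:
                b = round(mz / bin_width)
                hit = set()
                for nb in (b - 1, b, b + 1):      # tolerate bin-edge effects
                    hit |= index.get(nb, set())
                for pid in hit:
                    votes[pid] += 1
            return votes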

  16. Reprint of "pFind-Alioth: A novel unrestricted database search algorithm to improve the interpretation of high-resolution MS/MS data".

    PubMed

    Chi, Hao; He, Kun; Yang, Bing; Chen, Zhen; Sun, Rui-Xiang; Fan, Sheng-Bo; Zhang, Kun; Liu, Chao; Yuan, Zuo-Fei; Wang, Quan-Hui; Liu, Si-Qi; Dong, Meng-Qiu; He, Si-Min

    2015-11-03

    Database search is the dominant approach in high-throughput proteomic analysis. However, the interpretation rate of MS/MS spectra is very low in such a restricted mode, which is mainly due to unexpected modifications and irregular digestion types. In this study, we developed a new algorithm called Alioth, to be integrated into the search engine of pFind, for fast and accurate unrestricted database search on high-resolution MS/MS data. An ion index is constructed for both peptide precursors and fragment ions, by which arbitrary digestions and a single site of any modification or mutation can be searched efficiently. A new re-ranking algorithm is used to distinguish the correct peptide-spectrum matches from random ones. The algorithm is tested on several HCD datasets and the interpretation rate of MS/MS spectra using Alioth is as high as 60%-80%. Peptides from semi- and non-specific digestions, as well as those with unexpected modifications or mutations, can be effectively identified using Alioth and confidently validated using other search engines. The average processing speed of Alioth is 5-10 times faster than some other unrestricted search engines and is comparable to or even faster than the restricted search algorithms tested. This article is part of a Special Issue entitled: Computational Proteomics.

  17. Automatic sorting of toxicological information into the IUCLID (International Uniform Chemical Information Database) endpoint-categories making use of the semantic search engine Go3R.

    PubMed

    Sauer, Ursula G; Wächter, Thomas; Hareng, Lars; Wareing, Britta; Langsch, Angelika; Zschunke, Matthias; Alvers, Michael R; Landsiedel, Robert

    2014-06-01

    The knowledge-based search engine Go3R, www.Go3R.org, has been developed to assist scientists from industry and regulatory authorities in collecting comprehensive toxicological information with a special focus on identifying available alternatives to animal testing. The semantic search paradigm of Go3R makes use of expert knowledge on 3Rs methods and regulatory toxicology, laid down in the ontology, a network of concepts, terms, and synonyms, to recognize the contents of documents. Search results are automatically sorted into a dynamic table of contents presented alongside the list of documents retrieved. This table of contents allows the user to quickly filter the set of documents by topics of interest. Documents containing hazard information are automatically assigned to a user interface following the endpoint-specific IUCLID5 categorization scheme required, e.g. for REACH registration dossiers. For this purpose, complex endpoint-specific search queries were compiled and integrated into the search engine (based upon a gold standard of 310 references that had been assigned manually to the different endpoint categories). Go3R sorts 87% of the references concordantly into the respective IUCLID5 categories. Currently, Go3R searches in the 22 million documents available in the PubMed and TOXNET databases. However, it can be customized to search in other databases including in-house databanks.

  18. Novel lead generation through hypothetical pharmacophore three-dimensional database searching: discovery of isoflavonoids as nonsteroidal inhibitors of rat 5 alpha-reductase.

    PubMed

    Chen, G S; Chang, C S; Kan, W M; Chang, C L; Wang, K C; Chern, J W

    2001-11-08

    A hypothetical pharmacophore of 5 alpha-reductase inhibitors was generated and served as a template in virtual screening. When the pharmacophore was used, eight isoflavone derivatives were characterized as novel potential nonsteroidal inhibitors of rat 5 alpha-reductase. This investigation has demonstrated a practical approach toward the development of lead compounds through a hypothetical pharmacophore via three-dimensional database searching.

  19. Conceptual changes arising from the use of a search interface developed for an elementary science curriculum database

    NASA Astrophysics Data System (ADS)

    Dwyer, William Michael

    1998-12-01

    The purpose of this study was to look for evidence of change in preservice elementary teachers' notions of science teaching after practice using a search interface for a database of elementary science curriculum materials. The Science Helper K--8 CD-ROM uses search criteria that include science content and process theme to provide appropriate science lessons for elementary educators. Training that took place when Science Helper was first disseminated revealed the possibility that notions about teaching science change with use of the resource. This study looked for evidence of conceptual change compatible with notions in recent reform materials, such as the National Science Education Standards. The study design consisted of a pretest-treatment-posttest model. The treatment included a brief training session in the use of Science Helper, followed by practical application, which consisted of finding appropriate lessons to form a science mini-unit. An analysis of covariance (ANCOVA), however, did not find significant differences between pretest and posttest scores for the treatment group. Study participants also wrote brief narratives about their experiences using Science Helper. A pattern analysis of the narratives found that most of the preservice teachers had positive experiences, saying the resource was easy to use and contained many interesting science activities. A closer examination of the comments revealed a subset of participants who expressed an understanding of the importance of criteria searches and the relatedness of the lessons produced. An ANCOVA of the treatment group controlling for pretest did not find significant differences between pretest and posttest scores for the group who expressed such understanding. Science Helper, with its affordances as a teacher resource, can be regarded as a "knowledge system" in a distributed environment. The interactions among people and material resources in a distributed environment result in a distributed

  20. Evidential significance of automotive paint trace evidence using a pattern recognition based infrared library search engine for the Paint Data Query Forensic Database.

    PubMed

    Lavine, Barry K; White, Collin G; Allen, Matthew D; Fasasi, Ayuba; Weakley, Andrew

    2016-10-01

    A prototype library search engine has been further developed to search the infrared spectral libraries of the paint data query database to identify the line and model of a vehicle from the clear coat, surfacer-primer, and e-coat layers of an intact paint chip. For this study, search prefilters were developed from 1181 automotive paint systems spanning 3 manufacturers: General Motors, Chrysler, and Ford. The best match between each unknown and the spectra in the hit list generated by the search prefilters was identified using a cross-correlation library search algorithm that performed both a forward and backward search. In the forward search, spectra were divided into intervals and further subdivided into windows (which corresponds to the time lag for the comparison) within those intervals. The top five hits identified in each search window were compiled; a histogram was computed that summarized the frequency of occurrence for each library sample, with the IR spectra most similar to the unknown flagged. The backward search computed the frequency and occurrence of each line and model without regard to the identity of the individual spectra. Only those lines and models with a frequency of occurrence greater than or equal to 20% were included in the final hit list. If there was agreement between the forward and backward search results, the specific line and model common to both hit lists was always the correct assignment. Samples assigned to the same line and model by both searches are always well represented in the library and correlate well on an individual basis to specific library samples. For these samples, one can have confidence in the accuracy of the match. This was not the case for the results obtained using commercial library search algorithms, as the hit quality index scores for the top twenty hits were always greater than 99%.
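
    The forward, windowed correlation search described here can be sketched as below, with an illustrative window count and hit-list depth and the assumption that all spectra share one x-grid; the deployed engine adds search prefilters and the backward line/model tally:

        import numpy as np
        from collections import Counter

        def windowed_correlation_hits(query, library, n_windows=10, top=5):
            """Split spectra into windows, score each library entry by normalised
            correlation within each window, and tally how often an entry lands
            among the top hits; frequent entries are the forward-search result."""
            tally = Counter()
            q = np.asarray(query, dtype=float)
            bounds = np.linspace(0, len(q), n_windows + 1, dtype=int)
            for lo, hi in zip(bounds[:-1], bounds[1:]):
                qw = q[lo:hi]
                scores = {}
                for name, spec in library.items():
                    s = np.asarray(spec, dtype=float)[lo:hi]
                    denom = np.linalg.norm(qw) * np.linalg.norm(s)
                    scores[name] = float(qw @ s / denom) if denom else 0.0
                for name, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top]:
                    tally[name] += 1
            return tally.most_common()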

  1. Image Databases.

    ERIC Educational Resources Information Center

    Pettersson, Rune

    Different kinds of pictorial databases are described with respect to aims, user groups, search possibilities, storage, and distribution. Some specific examples are given for databases used for the following purposes: (1) labor markets for artists; (2) document management; (3) telling a story; (4) preservation (archives and museums); (5) research;…

  2. Pattern Recognition-Assisted Infrared Library Searching of the Paint Data Query Database to Enhance Lead Information from Automotive Paint Trace Evidence.

    PubMed

    Lavine, Barry K; White, Collin G; Allen, Matthew D; Weakley, Andrew

    2017-03-01

    Multilayered automotive paint fragments, which are one of the most complex materials encountered in the forensic science laboratory, provide crucial links in criminal investigations and prosecutions. To determine the origin of these paint fragments, forensic automotive paint examiners have turned to the paint data query (PDQ) database, which allows the forensic examiner to compare the layer sequence and color, texture, and composition of the sample to paint systems of the original equipment manufacturer (OEM). However, modern automotive paints have a thin color coat and this layer on a microscopic fragment is often too thin to obtain accurate chemical and topcoat color information. A search engine has been developed for the infrared (IR) spectral libraries of the PDQ database in an effort to improve discrimination capability and permit quantification of discrimination power for OEM automotive paint comparisons. The similarity of IR spectra of the corresponding layers of various records for original finishes in the PDQ database often results in poor discrimination using commercial library search algorithms. A pattern recognition approach employing pre-filters and a cross-correlation library search algorithm that performs both a forward and backward search has been used to significantly improve the discrimination of IR spectra in the PDQ database and thus improve the accuracy of the search. This improvement permits inter-comparison of OEM automotive paint layer systems using the IR spectra alone. Such information can serve to quantify the discrimination power of the original automotive paint encountered in casework and further efforts to succinctly communicate trace evidence to the courts.

  3. Astrobiological Complexity with Probabilistic Cellular Automata

    NASA Astrophysics Data System (ADS)

    Vukotić, Branislav; Ćirković, Milan M.

    2012-08-01

    The search for extraterrestrial life and intelligence constitutes one of the major endeavors in science, but has as yet been quantitatively modeled only rarely, and then in a cursory and superficial fashion. We argue that probabilistic cellular automata (PCA) represent the best quantitative framework for modeling the astrobiological history of the Milky Way and its Galactic Habitable Zone. The relevant astrobiological parameters are to be modeled as the elements of the input probability matrix for the PCA kernel. With the underlying simplicity of the cellular automata constructs, this approach enables a quick analysis of the large and ambiguous space of input parameters. We perform a simple clustering analysis of typical astrobiological histories with a "Copernican" choice of input parameters and discuss the relevant boundary conditions of practical importance for planning and guiding empirical astrobiological and SETI projects. In addition to showing how the present framework is adaptable to more complex situations and updated observational databases from current and near-future space missions, we demonstrate how numerical results could offer a cautious rationale for continuation of practical SETI searches.
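
    A minimal two-state probabilistic cellular automaton kernel, with the transition probabilities supplied as an input matrix as the abstract describes, looks as follows; the toroidal grid, Moore neighbourhood, and example probabilities are illustrative, not the authors' astrobiological parameters:

        import random

        def pca_step(grid, p_matrix):
            """One synchronous update of a two-state probabilistic cellular automaton
            on a toroidal grid: the probability that a cell is 'inhabited' next step
            depends on its current state and its count of inhabited neighbours.
            Example kernel: p_matrix = [[0.02 * k for k in range(9)], [0.95] * 9]."""
            n = len(grid)
            new = [[0] * n for _ in range(n)]
            for i in range(n):
                for j in range(n):
                    nbrs = sum(grid[(i + di) % n][(j + dj) % n]
                               for di in (-1, 0, 1) for dj in (-1, 0, 1)
                               if (di, dj) != (0, 0))
                    new[i][j] = 1 if random.random() < p_matrix[grid[i][j]][nbrs] else 0
            return new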

  4. Astrobiological complexity with probabilistic cellular automata.

    PubMed

    Vukotić, Branislav; Ćirković, Milan M

    2012-08-01

    The search for extraterrestrial life and intelligence constitutes one of the major endeavors in science, but has as yet been quantitatively modeled only rarely, and then in a cursory and superficial fashion. We argue that probabilistic cellular automata (PCA) represent the best quantitative framework for modeling the astrobiological history of the Milky Way and its Galactic Habitable Zone. The relevant astrobiological parameters are to be modeled as the elements of the input probability matrix for the PCA kernel. With the underlying simplicity of the cellular automata constructs, this approach enables a quick analysis of the large and ambiguous space of input parameters. We perform a simple clustering analysis of typical astrobiological histories with a "Copernican" choice of input parameters and discuss the relevant boundary conditions of practical importance for planning and guiding empirical astrobiological and SETI projects. In addition to showing how the present framework is adaptable to more complex situations and updated observational databases from current and near-future space missions, we demonstrate how numerical results could offer a cautious rationale for continuation of practical SETI searches.

  5. Use of DNA profiles for investigation using a simulated national DNA database: Part II. Statistical and ethical considerations on familial searching.

    PubMed

    Hicks, T; Taroni, F; Curran, J; Buckleton, J; Castella, V; Ribaux, O

    2010-10-01

    Familial searching consists of searching for a full profile left at a crime scene in a National DNA Database (NDNAD). In this paper we are interested in the circumstance where no full match is returned, but a partial match is found between a database member's profile and the crime stain. Because close relatives share more of their DNA than unrelated persons, this partial match may indicate that the crime stain was left by a close relative of the person with whom the partial match was found. This approach has successfully solved important crimes in the UK and the USA. In a previous paper, a model, which takes into account substructure and siblings, was used to simulate a NDNAD. In this paper, we have used this model to test the usefulness of familial searching and offer guidelines for pre-assessment of the cases based on the likelihood ratio. Siblings of "persons" present in the simulated Swiss NDNAD were created. These profiles (N=10,000) were used as traces and were then compared to the whole database (N=100,000). The statistical results obtained show that the technique has great potential confirming the findings of previous studies. However, effectiveness of the technique is only one part of the story. Familial searching has juridical and ethical aspects that should not be ignored. In Switzerland for example, there are no specific guidelines to the legality or otherwise of familial searching. This article both presents statistical results, and addresses criminological and civil liberties aspects to take into account risks and benefits of familial searching.
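
    The statistical signal familial searching exploits, siblings sharing more alleles than unrelated persons, can be reproduced with a toy one-locus simulation; the allele frequencies are illustrative, and real casework uses many STR loci combined into likelihood ratios:

        import random

        def sample_genotype(freqs):
            """Draw a two-allele genotype at one locus from allele frequencies."""
            alleles, weights = list(freqs), list(freqs.values())
            return (random.choices(alleles, weights)[0], random.choices(alleles, weights)[0])

        def child_of(p1, p2):
            """A child inherits one random allele from each parent."""
            return (random.choice(p1), random.choice(p2))

        def shared_alleles(g1, g2):
            """Count alleles (0-2) common to two genotypes, respecting multiplicity."""
            pool, n = list(g2), 0
            for a in g1:
                if a in pool:
                    pool.remove(a)
                    n += 1
            return n

        freqs = {"A": 0.4, "B": 0.35, "C": 0.25}          # toy allele frequencies
        p1, p2 = sample_genotype(freqs), sample_genotype(freqs)
        sib1, sib2 = child_of(p1, p2), child_of(p1, p2)   # full siblings
        stranger = sample_genotype(freqs)
        print(shared_alleles(sib1, sib2), shared_alleles(sib1, stranger))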

  6. Matching unknown empirical formulas to chemical structure using LC/MS TOF accurate mass and database searching: example of unknown pesticides on tomato skins.

    PubMed

    Thurman, E Michael; Ferrer, Imma; Fernández-Alba, Amadeo Rodriguez

    2005-03-04

    Traditionally, the screening of unknown pesticides in food has been accomplished by GC/MS methods using conventional library searching routines. However, many of the new polar and thermally labile pesticides and their degradates are more readily and easily analyzed by LC/MS methods, and no searchable libraries currently exist (with the exception of some user libraries, which are limited). Therefore, there is a need for LC/MS approaches to detect unknown non-target pesticides in food. This report develops an identification scheme using a combination of LC/MS time-of-flight (accurate mass) and LC/MS ion trap MS (MS/MS) with searching of empirical formulas generated through accurate mass and a ChemIndex database or Merck Index database. The approach is different from conventional library searching of fragment ions. The concept here consists of four parts. First is the initial detection of a possible unknown pesticide in actual market-place vegetable extracts (tomato skins) using accurate mass and generating empirical formulas. Second is searching either the Merck Index database on CD (10,000 compounds) or the ChemIndex (77,000 compounds) for possible structures. Third is MS/MS of the unknown pesticide in the tomato-skin extract followed by fragment ion identification using chemical drawing software and comparison with accurate-mass ion fragments. Fourth is the verification with authentic standards, if available. Three examples of unknown, non-target pesticides are shown using a tomato-skin extract from an actual market place sample. Limitations of the approach are discussed, including the use of A + 2 isotope signatures, extended databases, lack of authentic standards, and natural product unknowns in food extracts.

  7. DESIGNING ENVIRONMENTAL MONITORING DATABASES FOR STATISTICAL ASSESSMENT

    EPA Science Inventory

    Databases designed for statistical analyses have characteristics that distinguish them from databases intended for general use. EMAP uses a probabilistic sampling design to collect data to produce statistical assessments of environmental conditions. In addition to supporting the ...

  8. RNA FRABASE 2.0: an advanced web-accessible database with the capacity to search the three-dimensional fragments within RNA structures

    PubMed Central

    2010-01-01

    Background Recent discoveries concerning novel functions of RNA, such as RNA interference, have contributed towards the growing importance of the field. In this respect, a deeper knowledge of complex three-dimensional RNA structures is essential to understand their new biological functions. A number of bioinformatic tools have been proposed to explore two major structural databases (PDB, NDB) in order to analyze various aspects of RNA tertiary structures. One of these tools is RNA FRABASE 1.0, the first web-accessible database with an engine for automatic search of 3D fragments within PDB-derived RNA structures. This search is based upon the user-defined RNA secondary structure pattern. In this paper, we present and discuss RNA FRABASE 2.0. This second version of the system represents a major extension of this tool in terms of providing new data and a wide spectrum of novel functionalities. An intuitively operated web server platform enables very fast, user-tailored searches of three-dimensional RNA fragments, their multi-parameter conformational analysis and visualization. Description RNA FRABASE 2.0 has stored information on 1565 PDB-deposited RNA structures, including all NMR models. The RNA FRABASE 2.0 search engine algorithms operate on the database of the RNA sequences and the new library of RNA secondary structures, coded in the dot-bracket format extended to hold multi-stranded structures and to cover residues whose coordinates are missing in the PDB files. The library of RNA secondary structures (and their graphics) is made available. A high level of efficiency of the 3D search has been achieved by introducing novel tools to formulate advanced searching patterns and to screen highly populated tertiary structure elements. RNA FRABASE 2.0 also stores data and conformational parameters in order to provide "on the spot" structural filters to explore the three-dimensional RNA structures. An instant visualization of the 3D RNA structures is provided. RNA FRABASE

  9. Accelerating Smith-Waterman Alignment for Protein Database Search Using Frequency Distance Filtration Scheme Based on CPU-GPU Collaborative System

    PubMed Central

    Liu, Yu; Hong, Yang; Lin, Chun-Yuan; Hung, Che-Lun

    2015-01-01

    The Smith-Waterman (SW) algorithm has been widely utilized for searching biological sequence databases in bioinformatics. Recently, several works have adopted graphics cards with Graphics Processing Units (GPUs) and their associated CUDA model to enhance the performance of SW computations. However, these works mainly focused on protein database search using the intertask parallelization technique, using the GPU only to perform the SW computations one at a time. Hence, in this paper, we propose an efficient SW alignment method, called CUDA-SWfr, for protein database search that uses the intratask parallelization technique based on a CPU-GPU collaborative system. Before doing the SW computations on the GPU, a procedure is applied on the CPU using the frequency distance filtration scheme (FDFS) to eliminate unnecessary alignments. The experimental results indicate that CUDA-SWfr runs 9.6 times and 96 times faster than the CPU-based SW method without and with FDFS, respectively. PMID:26568953
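
    For readers unfamiliar with the underlying recurrence, the sketch below implements the plain serial Smith-Waterman dynamic program with linear gap penalties; it is not CUDA-SWfr itself, and the scoring constants are arbitrary. GPU implementations such as the one described parallelize the anti-diagonals of this same recurrence.

```python
import numpy as np

# Illustrative sketch (not CUDA-SWfr): the serial Smith-Waterman
# dynamic program with linear gap penalties.
MATCH, MISMATCH, GAP = 3, -3, -2

def smith_waterman(a: str, b: str) -> int:
    """Return the best local alignment score between sequences a and b."""
    H = np.zeros((len(a) + 1, len(b) + 1), dtype=int)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = H[i - 1, j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            H[i, j] = max(0, diag, H[i - 1, j] + GAP, H[i, j - 1] + GAP)
    return int(H.max())

print(smith_waterman("TGTTACGG", "GGTTGACTA"))  # classic textbook example: 13
```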

  10. On the applicability of probabilistics

    SciTech Connect

    Roth, P.G.

    1996-12-31

    GEAE's traditional lifing approach, based on Low Cycle Fatigue (LCF) curves, is evolving for fracture critical powder metal components by incorporating probabilistic fracture mechanics analysis. Supporting this move is a growing validation database which convincingly demonstrates that probabilistics work given the right inputs. Significant efforts are being made to ensure the right inputs. For example, Heavy Liquid Separation (HLS) analysis has been developed to quantify and control inclusion content (1). Also, an intensive seeded fatigue program providing a model for crack initiation at inclusions is ongoing (2). Despite the optimism and energy, probabilistics are only tools and have limitations. Designing to low failure probabilities helps provide protection, but other strategies are needed to protect against surprises. A low risk design limit derived from a predicted failure distribution can lead to a high risk deployment if there are unaccounted-for deviations from analysis assumptions. Recognized deviations which are statistically quantifiable can be integrated into the probabilistic analysis (an advantage of the approach). When deviations are known to be possible but are not properly describable statistically, it may be more appropriate to maintain the traditional position of conservatively bounding relevant input parameters. Finally, safety factors on analysis results may be called for in cases where there is little experience supporting new design concepts or material applications (where unrecognized deviations might be expected).

  11. Asking Better Questions: How Presentation Formats Influence Information Search.

    PubMed

    Wu, Charley M; Meder, Björn; Filimon, Flavia; Nelson, Jonathan D

    2017-03-20

    While the influence of presentation formats has been widely studied in Bayesian reasoning tasks, we present the first systematic investigation of how presentation formats influence information search decisions. Four experiments were conducted across different probabilistic environments, where subjects (N = 2,858) chose between 2 possible search queries, each with binary probabilistic outcomes, with the goal of maximizing classification accuracy. We studied 14 different numerical and visual formats for presenting information about the search environment, constructed across 6 design features that have been prominently related to improvements in Bayesian reasoning accuracy (natural frequencies, posteriors, complement, spatial extent, countability, and part-to-whole information). The posterior variants of the icon array and bar graph formats led to the highest proportion of correct responses, and were substantially better than the standard probability format. Results suggest that presenting information in terms of posterior probabilities and visualizing natural frequencies using spatial extent (a perceptual feature) were especially helpful in guiding search decisions, although environments with a mixture of probabilistic and certain outcomes were challenging across all formats. Subjects who made more accurate probability judgments did not perform better on the search task, suggesting that simple decision heuristics may be used to make search decisions without explicitly applying Bayesian inference to compute probabilities. We propose a new take-the-difference (TTD) heuristic that identifies the accuracy-maximizing query without explicit computation of posterior probabilities.
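
    The search-decision criterion studied here reduces to a small computation: the expected classification accuracy of a binary query is the sum, over its outcomes, of the larger joint probability mass. The sketch below illustrates that computation with invented numbers; it is not the study's stimulus set, nor the TTD heuristic itself.

```python
# Illustrative sketch (hypothetical numbers, not the study's stimuli):
# expected classification accuracy of a binary query about two classes.
# prior[c] is P(class = c); like[c] is P(positive outcome | class = c).
def expected_accuracy(prior: dict, like: dict) -> float:
    classes = list(prior)
    acc = 0.0
    for outcome in (True, False):
        # joint mass P(class, outcome); an accuracy-maximizing responder
        # picks the class with the larger joint mass for each outcome
        joints = [prior[c] * (like[c] if outcome else 1 - like[c])
                  for c in classes]
        acc += max(joints)
    return acc

prior = {"A": 0.7, "B": 0.3}
query1 = {"A": 0.9, "B": 0.2}   # P(positive result | class)
query2 = {"A": 0.6, "B": 0.5}
for name, q in (("query1", query1), ("query2", query2)):
    print(name, expected_accuracy(prior, q))   # query1 wins: 0.87 vs 0.70
```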

  12. Racial differences in the association between preoperative serum cholesterol and prostate cancer recurrence: results from the SEARCH database

    PubMed Central

    Allott, Emma H.; Howard, Lauren E.; Aronson, William J.; Terris, Martha K.; Kane, Christopher J.; Amling, Christopher L.; Cooperberg, Matthew R.; Freedland, Stephen J.

    2016-01-01

    Background Black men are disproportionately affected by both cardiovascular disease and prostate cancer. Epidemiologic evidence linking dyslipidemia, an established cardiovascular risk factor, and prostate cancer progression is mixed. As existing studies were conducted in predominantly non-black populations, research in black men is lacking. Methods We identified 628 black and 1,020 non-black men who underwent radical prostatectomy and never used statins before surgery in the Shared Equal Access Regional Cancer Hospital (SEARCH) database. Median follow up was 2.9 years. The impact of preoperative hypercholesterolemia on risk of biochemical recurrence was examined using multivariable, race-stratified proportional hazards models. In secondary analysis, we examined associations with low-density lipoprotein (LDL), high-density lipoprotein (HDL) and triglycerides, overall and among men with dyslipidemia. Results High cholesterol was associated with increased risk of recurrence in black (HR per 10 mg/dl 1.06; 95% CI 1.02–1.11) but not non-black men (HR per 10 mg/dl 0.99; 95% CI 0.95–1.03; p-interaction=0.011). Elevated triglycerides were associated with increased risk in both black and non-black men (HR per 10 mg/dl 1.02; 95% CI 1.00–1.03 and 1.02; 95% CI 1.00–1.02, respectively; p-interaction=0.458). There were no significant associations between LDL or HDL and recurrence risk in either race. Associations with cholesterol, LDL and triglycerides were similar among men with dyslipidemia, but low HDL was associated with increased risk of recurrence in black, but not non-black men with dyslipidemia (p-interaction=0.047). Conclusion Elevated cholesterol was a risk factor for recurrence in black but not non-black men, whereas high triglycerides were associated with increased risk regardless of race. Impact Significantly contrasting associations by race may provide insight into prostate cancer racial disparities. PMID:26809276

  13. Similarity searching in databases of flexible 3D structures using autocorrelation vectors derived from smoothed bounded distance matrices.

    PubMed

    Rhodes, Nicholas; Clark, David E; Willett, Peter

    2006-01-01

    This paper presents an exploratory study of a novel method for flexible 3-D similarity searching based on autocorrelation vectors and smoothed bounded distance matrices. Although the new approach is unable to outperform an existing 2-D similarity search method in terms of enrichment factors, it is able to retrieve different compounds at a given percentage of the hit-list and so may be a useful adjunct to other similarity searching methods.
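
    As an illustration of the general idea (not the published method, which additionally smooths bounded distance matrices), the sketch below builds an autocorrelation vector from an interatomic distance matrix and per-atom property values, accumulating property products into distance bins.

```python
import numpy as np

# Illustrative sketch (not the published method): a 3-D autocorrelation
# vector built from an interatomic distance matrix D and per-atom
# property values p. Bin b accumulates products p[i]*p[j] for atom
# pairs whose distance falls in that bin.
def autocorrelation_vector(D: np.ndarray, p: np.ndarray,
                           bins: np.ndarray) -> np.ndarray:
    n = len(p)
    vec = np.zeros(len(bins) - 1)
    for i in range(n):
        for j in range(i + 1, n):
            b = np.searchsorted(bins, D[i, j], side="right") - 1
            if 0 <= b < len(vec):
                vec[b] += p[i] * p[j]
    return vec

coords = np.random.default_rng(1).normal(size=(10, 3))  # toy "molecule"
D = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
p = np.ones(10)                       # e.g., unit partial charges
print(autocorrelation_vector(D, p, bins=np.arange(0.0, 6.0, 1.0)))
```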

  14. HMMER web server: interactive sequence similarity searching.

    PubMed

    Finn, Robert D; Clements, Jody; Eddy, Sean R

    2011-07-01

    HMMER is a software suite for protein sequence similarity searches using probabilistic methods. Previously, HMMER has mainly been available only as a computationally intensive UNIX command-line tool, restricting its use. Recent advances in the software, HMMER3, have resulted in a 100-fold speed gain relative to previous versions. It is now feasible to make efficient profile hidden Markov model (profile HMM) searches via the web. A HMMER web server (http://hmmer.janelia.org) has been designed and implemented such that most protein database searches return within a few seconds. Methods are available for searching either a single protein sequence, a multiple protein sequence alignment, or a profile HMM against a target sequence database, and for searching a protein sequence against Pfam. The web server is designed to cater to a range of user expertise levels and accepts batch uploading of multiple queries at once. All search methods are also available as RESTful web services, thereby allowing them to be readily integrated as remotely executed tasks in locally scripted workflows. We have focused on minimizing search times and the ability to rapidly display tabular results, regardless of the number of matches found, developing graphical summaries of the search results to provide a quick, intuitive appraisal of them.

  15. Integrating Boolean Queries in Conjunctive Normal Form with Probabilistic Retrieval Models.

    ERIC Educational Resources Information Center

    Losee, Robert M.; Bookstein, Abraham

    1988-01-01

    Presents a model that places Boolean database queries into conjunctive normal form, thereby allowing probabilistic ranking of documents and the incorporation of relevance feedback. Experimental results compare the performance of a sequential learning probabilistic retrieval model with the proposed integrated Boolean probabilistic model and a fuzzy…

  16. Familial searching: a specialist forensic DNA profiling service utilising the National DNA Database to identify unknown offenders via their relatives--the UK experience.

    PubMed

    Maguire, C N; McCallum, L A; Storey, C; Whitaker, J P

    2014-01-01

    The National DNA Database (NDNAD) of England and Wales was established on April 10th 1995. The NDNAD is governed by a variety of legislative instruments that mean that DNA samples can be taken if an individual is arrested and detained in a police station. The biological samples and the DNA profiles derived from them can be used for purposes related to the prevention and detection of crime, the investigation of an offence and for the conduct of a prosecution. Following the South East Asian Tsunami of December 2004, the legislation was amended to allow the use of the NDNAD to assist in the identification of a deceased person or of a body part where death has occurred from natural causes or from a natural disaster. The UK NDNAD now contains the DNA profiles of approximately 6 million individuals representing 9.6% of the UK population. As the science of DNA profiling advanced, the National DNA Database provided a potential resource for increased intelligence beyond the direct matching for which it was originally created. The familial searching service offered to the police by several UK forensic science providers exploits the size and geographic coverage of the NDNAD and the fact that close relatives of an offender may share a significant proportion of that offender's DNA profile and will often reside in close geographic proximity to him or her. Between 2002 and 2011 Forensic Science Service Ltd. (FSS) provided familial search services to support 188 police investigations, 70 of which are still active cases. This technique, which may be used in serious crime cases or in 'cold case' reviews when there are few or no investigative leads, has led to the identification of 41 perpetrators or suspects. In this paper we discuss the processes, utility, and governance of the familial search service in which the NDNAD is searched for close genetic relatives of an offender who has left DNA evidence at a crime scene, but whose DNA profile is not represented within the NDNAD. We

  17. Searching for coexpressed genes in three-color cDNA microarray data using a probabilistic model-based Hough Transform.

    PubMed

    Tino, Peter; Zhao, Hongya; Yan, Hong

    2011-01-01

    The effects of a drug on the genomic scale can be assessed in a three-color cDNA microarray with the three color intensities represented through the so-called hexaMplot. In our recent study, we have shown that the Hough Transform (HT) applied to the hexaMplot can be used to detect groups of coexpressed genes in the normal-disease-drug samples. However, the standard HT is not well suited for the purpose because 1) the assayed genes need first to be hard-partitioned into equally and differentially expressed genes, with HT ignoring possible information in the former group; 2) the hexaMplot coordinates are negatively correlated and there is no direct way of expressing this in the standard HT and 3) it is not clear how to quantify the association of coexpressed genes with the line along which they cluster. We address these deficiencies by formulating a dedicated probabilistic model-based HT. The approach is demonstrated by assessing effects of the drug Rg1 on homocysteine-treated human umbilical vein endothelial cells. Compared with our previous study, we robustly detect stronger natural groupings of coexpressed genes. Moreover, the gene groups show coherent biological functions with high significance, as detected by the Gene Ontology analysis.

  18. Reduction in database search space by utilization of amino acid composition information from electron transfer dissociation and higher-energy collisional dissociation mass spectra.

    PubMed

    Hansen, Thomas A; Kryuchkov, Fedor; Kjeldsen, Frank

    2012-08-07

    With high mass accuracy and consecutively obtained electron transfer dissociation (ETD) and higher-energy collisional dissociation (HCD) tandem mass spectrometry (MS/MS), reliable (≥97%) and sensitive fragment ions have been extracted for identification of specific amino acid residues in peptide sequences. The analytical benefit of these specific amino acid composition (AAC) ions is to restrict the database search space and provide identification of peptides with higher confidence and reduced false negative rates. The 6706 uniquely identified peptide sequences determined with a conservative Mascot score of >30 were used to characterize the AAC ions. The loss of amino acid side chains (small neutral losses, SNLs) from the charge-reduced peptide radical cations was studied using ETD. Complementary AAC information from HCD spectra was provided by immonium ions. From the ETD/HCD mass spectra, 5162 and 6720 reliable SNLs and immonium ions were successfully extracted, respectively. Automated application of the AAC information during database searching resulted in an average 3.5-fold higher confidence level of peptide identification. In addition, 4% and 28% more peptides were identified above the significance level in a standard and extended search space, respectively.
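
    A minimal sketch of how AAC information can restrict a search space follows; it is not the authors' implementation. It requires candidate peptides to contain every residue whose immonium ion appears in the spectrum, using standard approximate monoisotopic immonium m/z values; in practice such ions support rather than strictly require a residue, so a production filter would be more forgiving.

```python
# Illustrative sketch (not the authors' implementation): use immonium
# ions observed in an HCD spectrum to require specific residues in
# candidate peptides, shrinking the database search space. The m/z
# values below are standard approximate literature values.
IMMONIUM = {"H": 110.0718, "F": 120.0813, "Y": 136.0762, "W": 159.0922}

def required_residues(peaks_mz, tol=0.01):
    """Residues whose immonium ion appears in the spectrum."""
    return {aa for aa, mz in IMMONIUM.items()
            if any(abs(peak - mz) <= tol for peak in peaks_mz)}

def filter_candidates(candidates, peaks_mz):
    need = required_residues(peaks_mz)
    return [pep for pep in candidates if need <= set(pep)]

spectrum = [110.071, 120.081, 175.119]           # toy peak list
db = ["HFPEPTIDEK", "FPEPTIDEK", "HAPPYFOLKR"]
print(filter_candidates(db, spectrum))           # peptides with both H and F
```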

  19. The Object-analogue approach for probabilistic forecasting

    NASA Astrophysics Data System (ADS)

    Frediani, M. E.; Hopson, T. M.; Anagnostou, E. N.; Hacker, J.

    2015-12-01

    The object-analogue is a new method to estimate forecast uncertainty and to derive probabilistic predictions of gridded forecast fields over larger regions rather than point locations. The method has been developed for improving the forecast of 10-meter wind speed over the northeast US, and it can be extended to other forecast variables, vertical levels, and other regions. The object-analogue approach combines the analogue post-processing technique (Hopson 2005; Hamill 2006; Delle Monache 2011) with the Method for Object-based Diagnostic Evaluation (MODE) for forecast verification (Davis et al 2006a, b). MODE was originally used mainly to verify precipitation forecasts, with the features of a forecast region represented by an object. The analogue technique is used to reduce the NWP systematic and random errors of a gridded forecast field. In this study we use MODE-derived objects to characterize wind field forecasts by attributes such as object area, centroid location, and intensity percentiles, and apply the analogue concept to these objects. The object-analogue method uses a database of objects derived from reforecasts and their respective reanalyses. Given a real-time forecast field, it searches the database and selects the top-ranked objects with the most similar set of attributes using the MODE fuzzy logic algorithm for object matching. The attribute probabilities obtained with the set of selected object-analogues are used to derive a multi-layer probabilistic prediction. The attribute probabilities are combined into three uncertainty layers that address the main concerns of most applications: location, area, and magnitude. The multi-layer uncertainty can be weighted and combined or used independently, such that it provides a more accurate prediction, adjusted according to the application interest. In this study we present preliminary results of the object-analogue method. Using a database with one hundred storms we perform a leave-one-out cross-validation to
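
    A toy version of the analogue-selection step is sketched below, assuming invented object attributes (area, centroid, an intensity percentile) and arbitrary weights; it is not the authors' MODE-based system. The outcomes of the top-ranked analogues are pooled into a simple exceedance probability.

```python
import numpy as np

# Illustrative sketch (not the authors' MODE-based code): rank a
# database of forecast objects by similarity of a few attributes and
# use the best analogues' observed outcomes as a probabilistic forecast.
def top_analogues(query, database, weights, k=5):
    """Rows of `database`: [area, centroid_x, centroid_y, intensity_p90]."""
    diffs = (database - query) / database.std(axis=0)  # crude normalization
    dist = np.sqrt(((diffs * weights) ** 2).sum(axis=1))
    return np.argsort(dist)[:k]

rng = np.random.default_rng(2)
db_attrs = rng.normal(loc=[100, 0, 0, 15], scale=[20, 1, 1, 3], size=(200, 4))
db_outcomes = rng.normal(loc=14, scale=3, size=200)    # verifying wind speed
query = np.array([105.0, 0.2, -0.1, 16.0])

idx = top_analogues(query, db_attrs, weights=np.array([1.0, 1.0, 1.0, 2.0]))
print("P(wind > 15 m/s) ~", (db_outcomes[idx] > 15).mean())
```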

  20. JICST Factual Database JICST DNA Database

    NASA Astrophysics Data System (ADS)

    Shirokizawa, Yoshiko; Abe, Atsushi

    Japan Information Center of Science and Technology (JICST) started its online DNA database service in October 1988. This database is composed of the EMBL Nucleotide Sequence Library and the Genetic Sequence Data Bank. The authors outline the database system, data items, and search commands. Examples of retrieval sessions are presented.

  1. Mixed deterministic and probabilistic networks.

    PubMed

    Mateescu, Robert; Dechter, Rina

    2008-11-01

    The paper introduces mixed networks, a new graphical model framework for expressing and reasoning with probabilistic and deterministic information. The motivation to develop mixed networks stems from the desire to fully exploit the deterministic information (constraints) that is often present in graphical models. Several concepts and algorithms specific to belief networks and constraint networks are combined, achieving computational efficiency, semantic coherence and user-interface convenience. We define the semantics and graphical representation of mixed networks, and discuss the two main types of algorithms for processing them: inference-based and search-based. A preliminary experimental evaluation shows the benefits of the new model.

  2. Mixed deterministic and probabilistic networks

    PubMed Central

    Dechter, Rina

    2010-01-01

    The paper introduces mixed networks, a new graphical model framework for expressing and reasoning with probabilistic and deterministic information. The motivation to develop mixed networks stems from the desire to fully exploit the deterministic information (constraints) that is often present in graphical models. Several concepts and algorithms specific to belief networks and constraint networks are combined, achieving computational efficiency, semantic coherence and user-interface convenience. We define the semantics and graphical representation of mixed networks, and discuss the two main types of algorithms for processing them: inference-based and search-based. A preliminary experimental evaluation shows the benefits of the new model. PMID:20981243

  3. Effect of cleavage enzyme, search algorithm and decoy database on mass spectrometric identification of wheat gluten proteins

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Tandem mass spectrometry (MS/MS) is routinely used to identify proteins by comparing peptide spectra to those generated in silico from protein sequence databases. Wheat storage proteins (gliadins and glutenins) are difficult to distinguish by MS/MS as they have few cleavable tryptic sites, often res...

  4. The Opera del Vocabolario Italiano Database: Full-Text Searching Early Italian Vernacular Sources on the Web.

    ERIC Educational Resources Information Center

    DuPont, Christian

    2001-01-01

    Introduces and describes the functions of the Opera del Vocabolario Italiano (OVI) database, a powerful Web-based, full-text, searchable electronic archive that contains early Italian vernacular texts whose composition may be dated prior to 1375. Examples are drawn from scholars in various disciplines who have employed the OVI in support of their…

  5. Visualization Tools and Techniques for Search and Validation of Large Earth Science Spatial-Temporal Metadata Databases

    NASA Astrophysics Data System (ADS)

    Baskin, W. E.; Herbert, A.; Kusterer, J.

    2014-12-01

    Spatial-temporal metadata databases are critical components of interactive data discovery services for ordering Earth Science datasets. The development staff at the Atmospheric Science Data Center (ASDC) works closely with satellite Earth Science mission teams such as CERES, CALIPSO, TES, MOPITT, and CATS to create and maintain metadata databases that are tailored to the data discovery needs of the Earth Science community. This presentation focuses on the visualization tools and techniques used by the ASDC software development team for data discovery and validation/optimization of spatial-temporal objects in large multi-mission spatial-temporal metadata databases. The following topics will be addressed: optimizing the level of detail of spatial-temporal metadata to provide interactive spatial query performance over a multi-year Earth Science mission; generating appropriately scaled sensor footprint gridded (raster) metadata from Level 1 and Level 2 satellite and aircraft time-series data granules; and a performance comparison of raster vs. vector spatial granule footprint mask queries in a large metadata database, with a description of the visualization tools used to assist with this analysis.

  6. Identifying Gel-Separated Proteins Using In-Gel Digestion, Mass Spectrometry, and Database Searching: Consider the Chemistry

    ERIC Educational Resources Information Center

    Albright, Jessica C.; Dassenko, David J.; Mohamed, Essa A.; Beussman, Douglas J.

    2009-01-01

    Matrix-assisted laser desorption/ionization (MALDI) mass spectrometry is an important bioanalytical technique in drug discovery, proteomics, and research at the biology-chemistry interface. This is an especially powerful tool when combined with gel separation of proteins and database mining using the mass spectral data. Currently, few hands-on…

  7. Proteome analysis of Sorangium cellulosum employing 2D-HPLC-MS/MS and improved database searching strategies for CID and ETD fragment spectra.

    PubMed

    Leinenbach, Andreas; Hartmer, Ralf; Lubeck, Markus; Kneissl, Benny; Elnakady, Yasser A; Baessmann, Carsten; Müller, Rolf; Huber, Christian G

    2009-09-01

    Shotgun proteome analysis of the myxobacterial model strain for secondary metabolite biosynthesis Sorangium cellulosum was performed employing off-line two-dimensional high-pH reversed-phase HPLC x low-pH ion-pair reversed-phase HPLC and dual tandem mass spectrometry with collision-induced dissociation (CID) and electron transfer dissociation (ETD) as complementary fragmentation techniques. Peptide identification using database searching was optimized for ETD fragment spectra to obtain the maximum number of identifications at equivalent false discovery rates (1.0%) in the evaluation of both fragmentation techniques. In the database search of the CID MS/MS data, the mass tolerance was set to the well-established 0.3 Da window, whereas for ETD data, it was widened to 1.1 Da to account for hydrogen rearrangement in the radical intermediate of the peptide precursor ion. To achieve a false discovery rate comparable to the CID results, we increased the significance threshold for peptide identification to 0.001 for the ETD data. The ETD-based analysis yielded about 74% of all peptides and about 78% of all proteins compared to the CID method. In the combined data set, 952 proteins of S. cellulosum were confidently identified by at least two peptides per protein, facilitating the study of the function of regulatory proteins in the social myxobacteria and their role in secondary metabolism.

  8. Data manipulation in heterogeneous databases

    SciTech Connect

    Chatterjee, A.; Segev, A.

    1991-10-01

    Many important information systems applications require access to data stored in multiple heterogeneous databases. This paper examines a problem in inter-database data manipulation within a heterogeneous environment, where conventional techniques are no longer useful. To solve the problem, a broader definition of the join operator is proposed. Also, a method to probabilistically estimate the accuracy of the join is discussed.
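
    A minimal sketch of such a broadened join follows, assuming string-similarity matching with Python's difflib as a stand-in for whatever matching function the paper proposes; each output row carries a score that can be read as a rough probability that the pairing is correct.

```python
# Illustrative sketch (not the paper's operator): an approximate join
# across two heterogeneous tables whose keys do not match exactly.
from difflib import SequenceMatcher

left  = [("Intl. Business Machines", 120.5), ("Acme Corp", 33.1)]
right = [("International Business Machines", "IBM"), ("ACME Corporation", "ACME")]

def prob_join(left, right, threshold=0.6):
    out = []
    for lkey, lval in left:
        for rkey, rval in right:
            # similarity score doubles as a crude match probability
            score = SequenceMatcher(None, lkey.lower(), rkey.lower()).ratio()
            if score >= threshold:
                out.append((lkey, rkey, lval, rval, round(score, 2)))
    return out

for row in prob_join(left, right):
    print(row)
```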

  9. Local image descriptor-based searching framework of usable similar cases in a radiation treatment planning database for stereotactic body radiotherapy

    NASA Astrophysics Data System (ADS)

    Nonaka, Ayumi; Arimura, Hidetaka; Nakamura, Katsumasa; Shioyama, Yoshiyuki; Soufi, Mazen; Magome, Taiki; Honda, Hiroshi; Hirata, Hideki

    2014-03-01

    Radiation treatment planning (RTP) for stereotactic body radiotherapy (SBRT) is more complex than for conventional radiotherapy because of the larger number of beam directions used. We have reported that similar planning cases can help treatment planners with less SBRT experience determine beam directions. The aim of this study was to develop a framework for searching a RTP database for usable cases similar to an unplanned case, based on a local image descriptor. The proposed framework consists of three steps: searching, selection, and rearrangement. In the first step, the RTP database was searched for the 10 cases most similar to object cases based on the shape similarity of the two-dimensional lung region at the isocenter plane. In the second step, the 5 most similar cases were selected by using geometric features related to the location, size and shape of the planning target volume, lung and spinal cord. In the third step, the selected 5 cases were rearranged by use of the Euclidean distance of a local image descriptor, which is a similarity index based on the magnitudes and orientations of image gradients within a region of interest around the isocenter. It was assumed that the local image descriptor represents the information around lung tumors relevant to treatment planning. The cases selected as most similar to test cases by the proposed method resembled them more closely in terms of tumor location than those selected by a conventional method. For evaluation of the proposed method, we applied a similar-cases-based beam arrangement method developed in a previous study to the similar cases selected by the proposed method, based on a linear registration. The proposed method has the potential to suggest superior beam arrangements from the treatment point of view.
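
    The third-step distance computation can be illustrated compactly. The sketch below, which is not the authors' implementation, builds a local image descriptor as a gradient-magnitude-weighted orientation histogram over a region of interest and compares two cases by Euclidean distance.

```python
import numpy as np

# Illustrative sketch (not the authors' implementation): a local image
# descriptor built from gradient magnitudes and orientations inside a
# region of interest, compared between cases by Euclidean distance.
def descriptor(roi: np.ndarray, n_bins: int = 8) -> np.ndarray:
    gy, gx = np.gradient(roi.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)                     # -pi .. pi
    hist, _ = np.histogram(ang, bins=n_bins, range=(-np.pi, np.pi),
                           weights=mag)          # magnitude-weighted bins
    return hist / (hist.sum() + 1e-12)           # normalized histogram

rng = np.random.default_rng(3)
roi_query = rng.random((32, 32))                 # toy CT patches
roi_case = rng.random((32, 32))
d = np.linalg.norm(descriptor(roi_query) - descriptor(roi_case))
print("descriptor distance:", d)
```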

  10. Accelerated Profile HMM Searches.

    PubMed

    Eddy, Sean R

    2011-10-01

    Profile hidden Markov models (profile HMMs) and probabilistic inference methods have made important contributions to the theory of sequence database homology search. However, practical use of profile HMM methods has been hindered by the computational expense of existing software implementations. Here I describe an acceleration heuristic for profile HMMs, the "multiple segment Viterbi" (MSV) algorithm. The MSV algorithm computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment. MSV scores follow the same statistical distribution as gapped optimal local alignment scores, allowing rapid evaluation of significance of an MSV score and thus facilitating its use as a heuristic filter. I also describe a 20-fold acceleration of the standard profile HMM Forward/Backward algorithms using a method I call "sparse rescaling". These methods are assembled in a pipeline in which high-scoring MSV hits are passed on for reanalysis with the full HMM Forward/Backward algorithm. This accelerated pipeline is implemented in the freely available HMMER3 software package. Performance benchmarks show that the use of the heuristic MSV filter sacrifices negligible sensitivity compared to unaccelerated profile HMM searches. HMMER3 is substantially more sensitive and 100- to 1000-fold faster than HMMER2. HMMER3 is now about as fast as BLAST for protein searches.
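
    The core idea behind MSV, scoring ungapped local alignment segments, can be illustrated without profile HMM machinery. The sketch below (not HMMER's striped, vectorized code) finds the best single ungapped segment between two plain sequences by running a maximum-subarray scan along every diagonal; MSV generalizes this to profile scores and to sums of multiple segments.

```python
# Illustrative sketch (not HMMER's vectorized MSV): score the best
# ungapped local alignment between two sequences with Kadane's
# maximum-subarray algorithm applied along every diagonal of the
# match-score matrix.
MATCH, MISMATCH = 2, -3

def best_ungapped_segment(a: str, b: str) -> int:
    best = 0
    for offset in range(-len(a) + 1, len(b)):     # each diagonal
        run = 0
        for i in range(len(a)):
            j = i + offset
            if 0 <= j < len(b):
                run = max(0, run + (MATCH if a[i] == b[j] else MISMATCH))
                best = max(best, run)
    return best

print(best_ungapped_segment("HEAGAWGHEE", "PAWHEAE"))  # 6 ("HEA" run)
```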

  11. VIEWCACHE: An incremental database access method for autonomous interoperable databases

    NASA Technical Reports Server (NTRS)

    Roussopoulos, Nick; Sellis, Timoleon

    1991-01-01

    The objective is to illustrate the concept of incremental access to distributed databases. An experimental database management system, ADMS, which has been developed at the University of Maryland, in College Park, uses VIEWCACHE, a database access method based on incremental search. VIEWCACHE is a pointer-based access method that provides a uniform interface for accessing distributed databases and catalogues. The compactness of the pointer structures formed during database browsing and the incremental access method allow the user to search and do inter-database cross-referencing with no actual data movement between database sites. Once the search is complete, the set of collected pointers pointing to the desired data are dereferenced.

  12. Genome databases

    SciTech Connect

    Courteau, J.

    1991-10-11

    Since the Genome Project began several years ago, a plethora of databases have been developed or are in the works. They range from the massive Genome Data Base at Johns Hopkins University, the central repository of all gene mapping information, to small databases focusing on single chromosomes or organisms. Some are publicly available, others are essentially private electronic lab notebooks. Still others limit access to a consortium of researchers working on, say, a single human chromosome. An increasing number incorporate sophisticated search and analytical software, while others operate as little more than data lists. In consultation with numerous experts in the field, a list has been compiled of some key genome-related databases. The list was not limited to map and sequence databases but also included the tools investigators use to interpret and elucidate genetic data, such as protein sequence and protein structure databases. Because a major goal of the Genome Project is to map and sequence the genomes of several experimental animals, including E. coli, yeast, fruit fly, nematode, and mouse, the available databases for those organisms are listed as well. The author also includes several databases that are still under development - including some ambitious efforts that go beyond data compilation to create what are being called electronic research communities, enabling many users, rather than just one or a few curators, to add or edit the data and tag it as raw or confirmed.

  13. MapReduce Implementation of a Hybrid Spectral Library-Database Search Method for Large-Scale Peptide Identification

    SciTech Connect

    Kalyanaraman, Anantharaman; Cannon, William R.; Latt, Benjamin K.; Baxter, Douglas J.

    2011-11-01

    A MapReduce-based implementation called MR-MSPolygraph for parallelizing peptide identification from mass spectrometry data is presented. The underlying serial method, MSPolygraph, uses a novel hybrid approach to match an experimental spectrum against a combination of a protein sequence database and a spectral library. Our MapReduce implementation can run on any Hadoop cluster environment. Experimental results demonstrate that, relative to the serial version, MR-MSPolygraph reduces the time to solution from weeks to hours, for processing tens of thousands of experimental spectra. Speedup and other related performance studies are also reported on a 400-core Hadoop cluster using spectral datasets from environmental microbial communities as inputs.
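
    The MapReduce decomposition the paper exploits has a simple shape, sketched below with a toy scoring function rather than MSPolygraph's hybrid spectral/database scorer: the map phase scores each spectrum against candidates, and the reduce phase keeps the best hit per spectrum.

```python
# Illustrative sketch (not MR-MSPolygraph itself): the MapReduce shape
# of parallel peptide identification.
from collections import defaultdict

def map_phase(spectra, score_fn, candidates):
    """Map: emit (spectrum_id, (peptide, score)) pairs."""
    for spec_id, spectrum in spectra:
        for pep in candidates:
            yield spec_id, (pep[0], score_fn(spectrum, pep))

def reduce_phase(pairs):
    """Reduce: keep the best-scoring peptide per spectrum."""
    best = defaultdict(lambda: (None, float("-inf")))
    for spec_id, (name, score) in pairs:
        if score > best[spec_id][1]:
            best[spec_id] = (name, score)
    return dict(best)

# Toy scoring: count shared "peaks" (ints standing in for binned m/z).
score = lambda spectrum, pep: len(set(spectrum) & set(pep[1]))
spectra = [("s1", [100, 200, 300]), ("s2", [150, 250])]
candidates = [("PEPTIDEA", [100, 200]), ("PEPTIDEB", [150, 250, 350])]
print(reduce_phase(map_phase(spectra, score, candidates)))
```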

  14. Search for 5'-leader regulatory RNA structures based on gene annotation aided by the RiboGap database.

    PubMed

    Naghdi, Mohammad Reza; Smail, Katia; Wang, Joy X; Wade, Fallou; Breaker, Ronald R; Perreault, Jonathan

    2017-03-15

    The discovery of noncoding RNAs (ncRNAs) and their importance for gene regulation led us to develop bioinformatics tools to pursue the discovery of novel ncRNAs. Finding ncRNAs de novo is challenging, first due to the difficulty of retrieving large numbers of sequences for given gene activities, and second due to exponential demands on calculation needed for comparative genomics on a large scale. Recently, several tools for the prediction of conserved RNA secondary structure were developed, but many of them are not designed to uncover new ncRNAs, or are too slow for conducting analyses on a large scale. Here we present various approaches using the database RiboGap as a primary tool for finding known ncRNAs and for uncovering simple sequence motifs with regulatory roles. This database also can be used to easily extract intergenic sequences of eubacteria and archaea to find conserved RNA structures upstream of given genes. We also show how to extend analysis further to choose the best candidate ncRNAs for experimental validation.

  15. Comparing the Hematopoetic Syndrome Time Course in the NHP Animal Model to Radiation Accident Cases From the Database Search.

    PubMed

    Graessle, Dieter H; Dörr, Harald; Bennett, Alexander; Shapiro, Alla; Farese, Ann M; MacVittie, Thomas J; Meineke, Viktor

    2015-11-01

    Since controlled clinical studies on drug administration for the acute radiation syndrome are lacking, clinical data of human radiation accident victims as well as experimental animal models are the main sources of information. This leads to the question of how to compare and link clinical observations collected after human radiation accidents with experimental observations in non-human primate (NHP) models. Using the example of granulocyte counts in the peripheral blood following radiation exposure, approaches for adaptation between NHP and patient databases on data comparison and transformation are introduced. As a substitute for studying the effects of administration of granulocyte-colony stimulating factor (G-CSF) in human clinical trials, the method of mathematical modeling is suggested using the example of G-CSF administration to NHP after total body irradiation.

  16. Detection and identification of heme c-modified peptides by histidine affinity chromatography, high-performance liquid chromatography-mass spectrometry, and database searching.

    PubMed

    Merkley, Eric D; Anderson, Brian J; Park, Jea; Belchik, Sara M; Shi, Liang; Monroe, Matthew E; Smith, Richard D; Lipton, Mary S

    2012-12-07

    Multiheme c-type cytochromes (proteins with covalently attached heme c moieties) play important roles in extracellular metal respiration in dissimilatory metal-reducing bacteria. Liquid chromatography-tandem mass spectrometry (LC-MS/MS) characterization of c-type cytochromes is hindered by the presence of multiple heme groups, since the heme c modified peptides are typically not observed or, if observed, not identified. Using a recently reported histidine affinity chromatography (HAC) procedure, we enriched heme c tryptic peptides from purified bovine heart cytochrome c and two bacterial decaheme cytochromes, and subjected these samples to LC-MS/MS analysis. Enriched bovine cytochrome c samples yielded 3- to 6-fold more confident peptide-spectrum matches to heme c containing peptides than unenriched digests. In unenriched digests of the decaheme cytochrome MtoA from Sideroxydans lithotrophicus ES-1, heme c peptides for 4 of the 10 expected sites were observed by LC-MS/MS; following HAC fractionation, peptides covering 9 out of 10 sites were obtained. Heme c peptide spiked into E. coli lysates at mass ratios as low as 1×10(-4) was detected with good signal-to-noise after HAC and LC-MS/MS analysis. In addition to HAC, we have developed a proteomics database search strategy that takes into account the unique physicochemical properties of heme c peptides. The results suggest that accounting for the double thioether link between heme c and peptide, and the use of the labile heme fragment as a reporter ion, can improve database searching results. The combination of affinity chromatography and heme-specific informatics yielded increases in the number of peptide-spectrum matches of 20-100-fold for bovine cytochrome c.

  17. Detection and Identification of Heme c-Modified Peptides by Histidine Affinity Chromatography, High-Performance Liquid Chromatography-Mass Spectrometry, and Database Searching

    SciTech Connect

    Merkley, Eric D.; Anderson, Brian J.; Park, Jea H.; Belchik, Sara M.; Shi, Liang; Monroe, Matthew E.; Smith, Richard D.; Lipton, Mary S.

    2012-12-07

    Multiheme c-type cytochromes (proteins with covalently attached heme c moieties) play important roles in extracellular metal respiration in dissimilatory metal-reducing bacteria. Liquid chromatography-tandem mass spectrometry (LC-MS/MS) characterization of c-type cytochromes is hindered by the presence of multiple heme groups, since the heme c modified peptides are typically not observed, or if observed, not identified. Using a recently reported histidine affinity chromatography (HAC) procedure, we enriched heme c tryptic peptides from purified bovine heart cytochrome c and a bacterial decaheme cytochrome, and subjected these samples to LC-MS/MS analysis. Enriched bovine cytochrome c samples yielded three- to six-fold more confident peptide-spectrum matches to heme c-containing peptides than unenriched digests. In unenriched digests of the decaheme cytochrome MtoA from Sideroxydans lithotrophicus ES-1, heme c peptides for four of the ten expected sites were observed by LC-MS/MS; following HAC fractionation, peptides covering nine out of ten sites were obtained. Heme c peptide spiked into E. coli lysates at mass ratios as low as 10(-4) was detected with good signal-to-noise after HAC and LC-MS/MS analysis. In addition to HAC, we have developed a proteomics database search strategy that takes into account the unique physicochemical properties of heme c peptides. The results suggest that accounting for the double thioether link between heme c and peptide, and the use of the labile heme fragment as a reporter ion, can improve database searching results. The combination of affinity chromatography and heme-specific informatics yielded increases in the number of peptide-spectrum matches of 20-100-fold for bovine cytochrome c.
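
    A minimal sketch of the reporter-ion idea from the two entries above follows: screen spectra for a heme-derived fragment before database search. The REPORTER_MZ constant is a placeholder (the value below approximates the well-known heme b ion mass, not necessarily the heme c fragment the authors used), and the tolerance is arbitrary.

```python
# Illustrative sketch (not the published informatics pipeline): screen
# MS/MS spectra for a heme-derived reporter ion before database search.
# REPORTER_MZ is a placeholder value, used here only for illustration.
REPORTER_MZ = 616.18   # ~heme b ion; the true heme c value may differ

def has_reporter(peaks_mz, reporter=REPORTER_MZ, tol_ppm=20.0):
    """True if any peak lies within tol_ppm of the reporter m/z."""
    tol = reporter * tol_ppm / 1e6
    return any(abs(mz - reporter) <= tol for mz in peaks_mz)

spectra = {"scan1": [616.181, 700.4], "scan2": [500.2, 812.9]}
heme_spectra = {k: v for k, v in spectra.items() if has_reporter(v)}
print(sorted(heme_spectra))   # ['scan1']
```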

  18. Probabilistic Structural Analysis Program

    NASA Technical Reports Server (NTRS)

    Pai, Shantaram S.; Chamis, Christos C.; Murthy, Pappu L. N.; Stefko, George L.; Riha, David S.; Thacker, Ben H.; Nagpal, Vinod K.; Mital, Subodh K.

    2010-01-01

    NASA/NESSUS 6.2c is a general-purpose, probabilistic analysis program that computes probability of failure and probabilistic sensitivity measures of engineered systems. Because NASA/NESSUS uses highly computationally efficient and accurate analysis techniques, probabilistic solutions can be obtained even for extremely large and complex models. Once the probabilistic response is quantified, the results can be used to support risk-informed decisions regarding reliability for safety-critical and one-of-a-kind systems, as well as for maintaining a level of quality while reducing manufacturing costs for larger-quantity products. NASA/NESSUS has been successfully applied to a diverse range of problems in aerospace, gas turbine engines, biomechanics, pipelines, defense, weaponry, and infrastructure. This program combines state-of-the-art probabilistic algorithms with general-purpose structural analysis and lifing methods to compute the probabilistic response and reliability of engineered structures. Uncertainties in load, material properties, geometry, boundary conditions, and initial conditions can be simulated. The structural analysis methods include non-linear finite-element methods, heat-transfer analysis, polymer/ceramic matrix composite analysis, monolithic (conventional metallic) materials life-prediction methodologies, boundary element methods, and user-written subroutines. Several probabilistic algorithms are available such as the advanced mean value method and the adaptive importance sampling method. NASA/NESSUS 6.2c is structured in a modular format with 15 elements.
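
    At its simplest, the probability-of-failure computation such programs perform can be illustrated by direct Monte Carlo sampling of a stress-strength interference problem, as sketched below with invented distributions; NESSUS itself relies on far more efficient methods such as the advanced mean value and adaptive importance sampling algorithms mentioned above.

```python
import numpy as np

# Illustrative sketch (not NESSUS itself): Monte Carlo estimate of
# failure probability when both the applied stress and the material
# strength are uncertain. Distributions are invented for illustration.
rng = np.random.default_rng(4)
n = 1_000_000
stress = rng.normal(loc=300.0, scale=30.0, size=n)     # MPa
strength = rng.normal(loc=450.0, scale=40.0, size=n)   # MPa
p_fail = np.mean(stress > strength)                    # failure: demand > capacity
print(f"estimated probability of failure: {p_fail:.2e}")  # ~1.3e-03
```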

  19. Chemical and biological warfare: Protection, decontamination, and disposal. (Latest citations from the NTIS bibliographic database). Published Search

    SciTech Connect

    1997-11-01

    The bibliography contains citations concerning the means to defend against chemical and biological agents used in military operations, and to eliminate the effects of such agents on personnel, equipment, and grounds. Protection is accomplished through protective clothing and masks, and in buildings and shelters through filtration. Elimination of effects includes decontamination and removal of the agents from clothing, equipment, buildings, grounds, and water, using chemical deactivation, incineration, and controlled disposal of material in injection wells and ocean dumping. Other Published Searches in this series cover chemical warfare detection; defoliants; general studies; biochemistry and therapy; and biology, chemistry, and toxicology associated with chemical warfare agents.(Contains 50-250 citations and includes a subject term index and title list.) (Copyright NERAC, Inc. 1995)

  20. Robots for hazardous duties: Military, space, and nuclear facility applications. (Latest citations from the NTIS bibliographic database). Published Search

    SciTech Connect

    1993-09-01

    The bibliography contains citations concerning the design and application of robots used in place of humans where the environment could be hazardous. Military applications include autonomous land vehicles, robotic howitzers, and battlefield support operations. Space operations include docking, maintenance, mission support, and intra-vehicular and extra-vehicular activities. Nuclear applications include operations within the containment vessel, radioactive waste operations, fueling operations, and plant security. Many of the articles reference control techniques and the use of expert systems in robotic operations. Applications involving industrial manufacturing, walking robots, and robot welding are cited in other published searches in this series. (Contains a minimum of 183 citations and includes a subject term index and title list.)

  1. Quality Control of Biomedicinal Allergen Products – Highly Complex Isoallergen Composition Challenges Standard MS Database Search and Requires Manual Data Analyses

    PubMed Central

    Spiric, Jelena; Engin, Anna M.; Karas, Michael; Reuter, Andreas

    2015-01-01

    Allergy to birch pollen is among the most common causes of spring pollinosis in Europe and is diagnosed and treated using extracts from natural sources. Quality control is crucial for safe and effective diagnosis and treatment. However, current methods are very difficult to standardize and do not address individual allergen or isoallergen composition. MS provides information regarding selected proteins or the entire proteome and could overcome the aforementioned limitations. We studied the proteome of birch pollen, focusing on allergens and isoallergens, to clarify which of the 93 published sequence variants of the major allergen, Bet v 1, are expressed as proteins within one source material in parallel. The unexpectedly complex Bet v 1 isoallergen composition required manual data interpretation and a specific design of databases, as current database search engines fail to unambiguously assign spectra to highly homologous, partially identical proteins. We identified 47 non-allergenic proteins and all 5 known birch pollen allergens, and unambiguously proved the existence of 18 Bet v 1 isoallergens and variants by manual data analysis. This highly complex isoallergen composition raises the questions of whether isoallergens can be ignored or must be included for the quality control of allergen products, and which data analysis strategies are to be applied. PMID:26561299

  2. Discovery of novel aldose reductase inhibitors using a protein structure-based approach: 3D-database search followed by design and synthesis.

    PubMed

    Iwata, Y; Arisawa, M; Hamada, R; Kita, Y; Mizutani, M Y; Tomioka, N; Itai, A; Miyamoto, S

    2001-05-24

    Aldose reductase (AR) has been implicated in the etiology of diabetic complications. Due to the limited number of currently available drugs for the treatment of diabetic complications, we have carried out structure-based drug design and synthesis in an attempt to find new types of AR inhibitors. With the ADAM&EVE program, a three-dimensional database (ACD3D) was searched using the ligand binding site of the AR crystal structure. Out of 179 compounds selected through this search followed by visual inspection, 36 compounds were purchased and subjected to a biological assay. Ten compounds showed more than 40% inhibition of AR at a 15 microg/mL concentration. In a subsequent lead optimization, a series of analogues of the most active compound was synthesized based on the docking mode derived by ADAM&EVE. Many of these congeners exhibited higher activities compared to the mother compound. Indeed, the most potent synthesized compound showed an approximately 20-fold increase in inhibitory activity (IC(50) = 0.21 vs 4.3 microM). Furthermore, a new hydrophobic subsite was inferred, which would be useful for the design of inhibitors with improved affinity for AR.

  3. High-throughput database search and large-scale negative polarity liquid chromatography-tandem mass spectrometry with ultraviolet photodissociation for complex proteomic samples.

    PubMed

    Madsen, James A; Xu, Hua; Robinson, Michelle R; Horton, Andrew P; Shaw, Jared B; Giles, David K; Kaoud, Tamer S; Dalby, Kevin N; Trent, M Stephen; Brodbelt, Jennifer S

    2013-09-01

    The use of ultraviolet photodissociation (UVPD) for the activation and dissociation of peptide anions is evaluated for broader coverage of the proteome. To facilitate interpretation and assignment of the resulting UVPD mass spectra of peptide anions, the MassMatrix database search algorithm was modified to allow automated analysis of negative polarity MS/MS spectra. The new UVPD algorithms were developed based on the MassMatrix database search engine by adding specific fragmentation pathways for UVPD. The new UVPD fragmentation pathways in MassMatrix were rigorously and statistically optimized using two large data sets with high mass accuracy and high mass resolution for both MS(1) and MS(2) data acquired on an Orbitrap mass spectrometer for complex Halobacterium and HeLa proteome samples. Negative mode UVPD led to the identification of 3663 and 2350 peptides for the Halo and HeLa tryptic digests, respectively, corresponding to 655 and 645 peptides that were unique when compared with electron transfer dissociation (ETD), higher energy collision-induced dissociation, and collision-induced dissociation results for the same digests analyzed in the positive mode. In sum, 805 and 619 proteins were identified via UVPD for the Halobacterium and HeLa samples, respectively, with 49 and 50 unique proteins identified in contrast to the more conventional MS/MS methods. The algorithm also features automated charge determination for low mass accuracy data, precursor filtering (including intact charge-reduced peaks), and the ability to combine both positive and negative MS/MS spectra into a single search, and it is freely open to the public. The accuracy and specificity of the MassMatrix UVPD search algorithm was also assessed for low resolution, low mass accuracy data on a linear ion trap. Analysis of a known mixture of three mitogen-activated kinases yielded similar sequence coverage percentages for UVPD of peptide anions versus conventional collision-induced dissociation of

  4. Integration of an Evidence Base into a Probabilistic Risk Assessment Model. The Integrated Medical Model Database: An Organized Evidence Base for Assessing In-Flight Crew Health Risk and System Design

    NASA Technical Reports Server (NTRS)

    Saile, Lynn; Lopez, Vilma; Bickham, Grandin; FreiredeCarvalho, Mary; Kerstman, Eric; Byrne, Vicky; Butler, Douglas; Myers, Jerry; Walton, Marlei

    2011-01-01

    This slide presentation reviews the Integrated Medical Model (IMM) database, an organized evidence base for assessing in-flight crew health risk. It is a relational database accessible to many users. The database ranks the model inputs for each medical condition with a Level of Evidence (LOE) rating and a Quality of Evidence (QOE) score, which together provide an assessment of the evidence base for that condition. The IMM evidence base has already provided invaluable information for designers and for other uses.

  5. Scopus database: a review.

    PubMed

    Burnham, Judy F

    2006-03-08

    The Scopus database provides access to STM journal articles and the references included in those articles, allowing the searcher to search both forward and backward in time. The database can be used for collection development as well as for research. This review provides information on the key points of the database and compares it to Web of Science. Neither database is all-inclusive; rather, the two complement each other. If a library can afford only one, the choice must be based on institutional needs.

  6. Use of Probabilistic Topic Models for Search

    DTIC Science & Technology

    2009-09-01

    [Table of example topic word lists garbled in extraction; only scattered word clusters survive: computing (software, hardware, computer, video, disk), Greek mythology (god, greek, zeus, gods, mythology), sports (play, team, player, football), medicine (medical, acupuncture, disease, pain), Christianity (christian, church, jesus).] ... $\hat{P}(w|T) = K \big/ \sum_{k=1}^{K} \frac{1}{P(w|z_k)}$, in which K denotes the number of samples taken. The number of topics that maximizes $\hat{P}(w|T)$ will then be accepted.
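
    A small sketch of that model-selection rule follows, with synthetic per-sample log-likelihoods standing in for values obtained from posterior sampling; the harmonic mean is computed in log space to avoid underflow.

```python
import numpy as np

# Illustrative sketch of the model-selection rule above: for each
# candidate number of topics T, estimate P(w|T) by the harmonic mean of
# per-sample likelihoods P(w|z_k), then keep the T with the largest
# estimate. The sample log-likelihoods here are synthetic.
def harmonic_mean_log_evidence(log_liks: np.ndarray) -> float:
    """log P_hat(w|T) = log K - logsumexp(-log P(w|z_k))."""
    k = len(log_liks)
    neg = -log_liks
    m = neg.max()                                  # stable logsumexp
    return np.log(k) - (m + np.log(np.exp(neg - m).sum()))

rng = np.random.default_rng(5)
candidates = {T: rng.normal(-1000 - 5 * abs(T - 40), 3.0, size=50)
              for T in (10, 20, 40, 80)}           # synthetic log P(w|z_k)
best_T = max(candidates, key=lambda T: harmonic_mean_log_evidence(candidates[T]))
print("selected number of topics:", best_T)        # 40 by construction
```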

  7. Design of a bioactive small molecule that targets the myotonic dystrophy type 1 RNA via an RNA motif-ligand database and chemical similarity searching.

    PubMed

    Parkesh, Raman; Childs-Disney, Jessica L; Nakamori, Masayuki; Kumar, Amit; Wang, Eric; Wang, Thomas; Hoskins, Jason; Tran, Tuan; Housman, David; Thornton, Charles A; Disney, Matthew D

    2012-03-14

    Myotonic dystrophy type 1 (DM1) is a triplet repeating disorder caused by expanded CTG repeats in the 3'-untranslated region of the dystrophia myotonica protein kinase (DMPK) gene. The transcribed repeats fold into an RNA hairpin with multiple copies of a 5'CUG/3'GUC motif that binds the RNA splicing regulator muscleblind-like 1 protein (MBNL1). Sequestration of MBNL1 by expanded r(CUG) repeats causes splicing defects in a subset of pre-mRNAs including the insulin receptor, the muscle-specific chloride ion channel, sarco(endo)plasmic reticulum Ca(2+) ATPase 1, and cardiac troponin T. Based on these observations, the development of small-molecule ligands that target specifically expanded DM1 repeats could be of use as therapeutics. In the present study, chemical similarity searching was employed to improve the efficacy of pentamidine and Hoechst 33258 ligands that have been shown previously to target the DM1 triplet repeat. A series of in vitro inhibitors of the RNA-protein complex were identified with low micromolar IC(50)'s, which are >20-fold more potent than the query compounds. Importantly, a bis-benzimidazole identified from the Hoechst query improves DM1-associated pre-mRNA splicing defects in cell and mouse models of DM1 (when dosed with 1 mM and 100 mg/kg, respectively). Since Hoechst 33258 was identified as a DM1 binder through analysis of an RNA motif-ligand database, these studies suggest that lead ligands targeting RNA with improved biological activity can be identified by using a synergistic approach that combines analysis of known RNA-ligand interactions with chemical similarity searching.
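
    The chemical similarity searching step can be illustrated with the Tanimoto coefficient, the most common fingerprint similarity measure; the sketch below uses toy bit sets rather than real structural fingerprints, and is not the study's software.

```python
# Illustrative sketch (not the study's software): chemical similarity
# searching with the Tanimoto coefficient over binary fingerprints.
# Fingerprints are toy bit sets standing in for real structural keys.
def tanimoto(fp1: set, fp2: set) -> float:
    return len(fp1 & fp2) / len(fp1 | fp2) if fp1 | fp2 else 0.0

query = {1, 4, 9, 17, 23, 42}                  # hypothetical query bits
library = {
    "cmpd_A": {1, 4, 9, 17, 23, 40},
    "cmpd_B": {2, 5, 11},
    "cmpd_C": {1, 4, 9, 23, 42, 57, 58},
}
hits = sorted(library, key=lambda n: tanimoto(query, library[n]), reverse=True)
for name in hits:
    print(name, round(tanimoto(query, library[name]), 2))  # A 0.71, C 0.62, B 0.0
```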

  8. Web Search Engines: Search Syntax and Features.

    ERIC Educational Resources Information Center

    Ojala, Marydee

    2002-01-01

    Presents a chart that explains the search syntax, features, and commands used by the 12 most widely used general Web search engines. Discusses Web standardization, expanded types of content searched, size of databases, and search engines that include both simple and advanced versions. (LRW)

  9. Query-Dependent Banding (QDB) for Faster RNA Similarity Searches

    PubMed Central

    Nawrocki, Eric P; Eddy, Sean R

    2007-01-01

    When searching sequence databases for RNAs, it is desirable to score both primary sequence and RNA secondary structure similarity. Covariance models (CMs) are probabilistic models well-suited for RNA similarity search applications. However, the computational complexity of CM dynamic programming alignment algorithms has limited their practical application. Here we describe an acceleration method called query-dependent banding (QDB), which uses the probabilistic query CM to precalculate regions of the dynamic programming lattice that have negligible probability, independently of the target database. We have implemented QDB in the freely available Infernal software package. QDB reduces the average-case time complexity of CM alignment from LN^2.4 to LN^1.3 for a query RNA of N residues and a target database of L residues, resulting in a 4-fold speedup for typical RNA queries. Combined with other improvements to Infernal, including informative mixture Dirichlet priors on model parameters, benchmarks also show increased sensitivity and specificity resulting from improved parameterization. PMID:17397253
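
    The banding idea can be sketched independently of covariance models: for each state, given a probability distribution over subsequence lengths d, discard lengths in each tail whose cumulative mass is below a threshold beta. The length distribution and beta values below are illustrative assumptions, not Infernal's actual parameterization.

        import numpy as np

        def qdb_band(length_pmf, beta=1e-7):
            # Return (dmin, dmax) so that each tail outside the band
            # carries probability mass of at most beta.
            cdf = np.cumsum(length_pmf)
            dmin = int(np.searchsorted(cdf, beta))
            dmax = int(np.searchsorted(cdf, 1.0 - beta))
            return dmin, dmax

        d = np.arange(2000)
        pmf = np.exp(-0.5 * ((d - 300) / 40.0) ** 2)  # toy length distribution
        pmf /= pmf.sum()
        print(qdb_band(pmf))          # wide band, tiny truncation error
        print(qdb_band(pmf, 1e-3))    # narrower band, looser guarantee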

  10. Basics of Online Searching.

    ERIC Educational Resources Information Center

    Meadow, Charles T.; Cochrane, Pauline (Atherton)

    Intended to teach the principles of interactive bibliographic searching to those with little or no prior experience, this textbook explains the basic elements of online information retrieval and compares the major database search systems. Its chapters address (1) relevant definitions and vocabulary; (2) the conceptual facets of database searching,…

  11. Probabilistic Composite Design

    NASA Technical Reports Server (NTRS)

    Chamis, Christos C.

    1997-01-01

    Probabilistic composite design is described in terms of a computational simulation. This simulation tracks probabilistically the composite design evolution from constituent materials, fabrication process, through composite mechanics and structural components. Comparisons with experimental data are provided to illustrate selection of probabilistic design allowables, test methods/specimen guidelines, and identification of in situ versus pristine strength. For example, results show that: in situ fiber tensile strength is 90% of its pristine strength; flat-wise long-tapered specimens are most suitable for setting ply tensile strength allowables; a composite radome can be designed with a reliability of 0.999999; and laminate fatigue exhibits wide-spread scatter at 90% cyclic-stress to static-strength ratios.

  12. Formalizing Probabilistic Safety Claims

    NASA Technical Reports Server (NTRS)

    Herencia-Zapana, Heber; Hagen, George E.; Narkawicz, Anthony J.

    2011-01-01

    A safety claim for a system is a statement that the system, which is subject to hazardous conditions, satisfies a given set of properties. Following work by John Rushby and Bev Littlewood, this paper presents a mathematical framework that can be used to state and formally prove probabilistic safety claims. It also enables hazardous conditions, their uncertainties, and their interactions to be integrated into the safety claim. This framework provides a formal description of the probabilistic composition of an arbitrary number of hazardous conditions and their effects on system behavior. An example is given of a probabilistic safety claim for a conflict detection algorithm for aircraft in a 2D airspace. The motivation for developing this mathematical framework is that it can be used in an automated theorem prover to formally verify safety claims.

  13. Probabilistic liquefaction triggering based on the cone penetration test

    USGS Publications Warehouse

    Moss, R.E.S.; Seed, R.B.; Kayen, R.E.; Stewart, J.P.; Tokimatsu, K.

    2005-01-01

    Performance-based earthquake engineering requires a probabilistic treatment of potential failure modes in order to accurately quantify the overall stability of the system. This paper is a summary of the application portions of the probabilistic liquefaction triggering correlations recently proposed by Moss and co-workers. To enable probabilistic treatment of liquefaction triggering, the variables comprising the seismic load and the liquefaction resistance were treated as inherently uncertain. Supporting data from an extensive Cone Penetration Test (CPT)-based liquefaction case history database were used to develop a probabilistic correlation. The methods used to measure the uncertainty of the load and resistance variables, how the interactions of these variables were treated using Bayesian updating, and how reliability analysis was applied to produce curves of equal probability of liquefaction are presented. The normalization for effective overburden stress, the magnitude correlated duration weighting factor, and the non-linear shear mass participation factor used are also discussed.

  14. Probabilistic composite micromechanics

    NASA Technical Reports Server (NTRS)

    Stock, T. A.; Bellini, P. X.; Murthy, P. L. N.; Chamis, C. C.

    1988-01-01

    Probabilistic composite micromechanics methods are developed that simulate expected uncertainties in unidirectional fiber composite properties. These methods are in the form of computational procedures using Monte Carlo simulation. A graphite/epoxy unidirectional composite (ply) is studied to demonstrate fiber composite material properties at the micro level. Regression results are presented to show the relative correlation between predictor and response variables in the study.

  15. Probabilistic Causation without Probability.

    ERIC Educational Resources Information Center

    Holland, Paul W.

    The failure of Hume's "constant conjunction" to describe apparently causal relations in science and everyday life has led to various "probabilistic" theories of causation of which the study by P. C. Suppes (1970) is an important example. A formal model that was developed for the analysis of comparative agricultural experiments…

  16. Probabilistic Threshold Criterion

    SciTech Connect

    Gresshoff, M; Hrousis, C A

    2010-03-09

    The Probabilistic Shock Threshold Criterion (PSTC) Project at LLNL develops phenomenological criteria for estimating safety or performance margin on high explosive (HE) initiation in the shock initiation regime, creating tools for safety assessment and design of initiation systems and HE trains in general. Until recently, there has been little foundation for probabilistic assessment of HE initiation scenarios. This work attempts to use probabilistic information that is available from both historic and ongoing tests to develop a basis for such assessment. Current PSTC approaches start with the functional form of the James Initiation Criterion as a backbone, and generalize to include varying areas of initiation and provide a probabilistic response based on test data for 1.8 g/cc (Ultrafine) 1,3,5-triamino-2,4,6-trinitrobenzene (TATB) and LX-17 (92.5% TATB, 7.5% Kel-F 800 binder). Application of the PSTC methodology is presented investigating the safety and performance of a flying plate detonator and the margin of an Ultrafine TATB booster initiating LX-17.

  17. Searching Sociological Abstracts.

    ERIC Educational Resources Information Center

    Kerbel, Sandra Sandor

    1981-01-01

    Describes the scope, content, and retrieval characteristics of Sociological Abstracts, an online database of literature in the social sciences. Sample searches are displayed, and the strengths and weaknesses of the database are summarized. (FM)

  18. Exact and Approximate Probabilistic Symbolic Execution

    NASA Technical Reports Server (NTRS)

    Luckow, Kasper; Pasareanu, Corina S.; Dwyer, Matthew B.; Filieri, Antonio; Visser, Willem

    2014-01-01

    Probabilistic software analysis seeks to quantify the likelihood of reaching a target event under uncertain environments. Recent approaches compute probabilities of execution paths using symbolic execution, but do not support nondeterminism. Nondeterminism arises naturally when no suitable probabilistic model can capture a program behavior, e.g., for multithreading or distributed systems. In this work, we propose a technique, based on symbolic execution, to synthesize schedulers that resolve nondeterminism to maximize the probability of reaching a target event. To scale to large systems, we also introduce approximate algorithms to search for good schedulers, speeding up established random sampling and reinforcement learning results through the quantification of path probabilities based on symbolic execution. We implemented the techniques in Symbolic PathFinder and evaluated them on nondeterministic Java programs. We show that our algorithms significantly improve upon a state-of-the-art statistical model checking algorithm, originally developed for Markov Decision Processes.
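
    The underlying optimization, synthesizing a scheduler that maximizes the probability of reaching a target, can be illustrated with value iteration on an explicit Markov decision process; the paper itself works over symbolic path conditions, and the toy states, actions, and probabilities below are invented.

        # Max-reachability value iteration: V(s) = max_a sum_s' P(s'|s,a) * V(s')
        mdp = {  # state -> action -> list of (successor, probability)
            "s0": {"a": [("s1", 0.5), ("s2", 0.5)], "b": [("s2", 1.0)]},
            "s1": {"a": [("goal", 0.9), ("fail", 0.1)]},
            "s2": {"a": [("goal", 0.3), ("fail", 0.7)]},
            "goal": {}, "fail": {},
        }
        V = {s: float(s == "goal") for s in mdp}
        for _ in range(100):  # iterate until (practically) converged
            for s, acts in mdp.items():
                if acts:
                    V[s] = max(sum(p * V[t] for t, p in succ) for succ in acts.values())
        scheduler = {s: max(acts, key=lambda a: sum(p * V[t] for t, p in acts[a]))
                     for s, acts in mdp.items() if acts}
        print(V["s0"], scheduler)  # 0.6, prefers action "a" in s0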

  19. Atomic Spectra Database (ASD)

    National Institute of Standards and Technology Data Gateway

    SRD 78 NIST Atomic Spectra Database (ASD) (Web, free access)   This database provides access and search capability for NIST critically evaluated data on atomic energy levels, wavelengths, and transition probabilities that are reasonably up-to-date. The NIST Atomic Spectroscopy Data Center has carried out these critical compilations.

  20. JICST Factual Database

    NASA Astrophysics Data System (ADS)

    Suzuki, Kazuaki; Shimura, Kazuki; Monma, Yoshio; Sakamoto, Masao; Morishita, Hiroshi; Kanazawa, Kenji

    The Japan Information Center of Science and Technology (JICST) started the online service of the JICST/NRIM Materials Strength Database for Engineering Steels and Alloys (JICST ME) in March 1990. This database has been developed under joint research between JICST and the National Research Institute for Metals (NRIM). It provides material strength data (creep, fatigue, etc.) for engineering steels and alloys. The data can be searched and displayed online, analyzed statistically, and plotted on a graphic display. The database system and the data in JICST ME are described.

  1. Classification Clustering, Probabilistic Information Retrieval, and the Online Catalog.

    ERIC Educational Resources Information Center

    Larson, Ray R.

    1991-01-01

    Discusses problems with subject searches in online library catalogs and examines theoretical principles for the design of effective information retrieval systems. Probabilistic ranking methods are discussed, and an experimental online catalog called CHESHIRE is described. It is noted that CHESHIRE uses classification clustering, provides natural…

  2. Mathematical Notation in Bibliographic Databases.

    ERIC Educational Resources Information Center

    Pasterczyk, Catherine E.

    1990-01-01

    Discusses ways in which using mathematical symbols to search online bibliographic databases in scientific and technical areas can improve search results. The representations used for Greek letters, relations, binary operators, arrows, and miscellaneous special symbols in the MathSci, Inspec, Compendex, and Chemical Abstracts databases are…

  3. Probabilistic authenticated quantum dialogue

    NASA Astrophysics Data System (ADS)

    Hwang, Tzonelih; Luo, Yi-Ping

    2015-12-01

    This work proposes a probabilistic authenticated quantum dialogue (PAQD) based on Bell states with the following notable features. (1) In our proposed scheme, the dialogue is encoded in a probabilistic way, i.e., the same messages can be encoded into different quantum states, whereas in the state-of-the-art authenticated quantum dialogue (AQD), the dialogue is encoded in a deterministic way. (2) The pre-shared secret key between two communicants can be reused without any security loophole. (3) Each dialogue in the proposed PAQD can be exchanged within only one step of quantum communication and one step of classical communication, whereas in the state-of-the-art AQD protocols, both communicants have to run a QKD protocol for each dialogue, and each dialogue requires multiple quantum as well as classical communication steps. (4) The proposed scheme can resist the man-in-the-middle attack, the modification attack, and other well-known attacks.

  4. Probabilistic Fatigue: Computational Simulation

    NASA Technical Reports Server (NTRS)

    Chamis, Christos C.

    2002-01-01

    Fatigue is a primary consideration in the design of aerospace structures for long-term durability and reliability. There are several types of fatigue that must be considered in the design, including low cycle, high cycle, and combined cycles for different cyclic loading conditions - for example, mechanical, thermal, erosion, etc. The traditional approach to evaluating fatigue has been to conduct many tests in the various service-environment conditions that the component will be subjected to in a specific design. This approach is reasonable and robust for that specific design. However, it is time consuming, costly, and in general needs to be repeated for designs in different operating conditions. Recent research has demonstrated that fatigue of structural components/structures can be evaluated by computational simulation based on a novel paradigm. The main features of this novel paradigm are progressive telescoping scale mechanics, progressive scale substructuring, and progressive structural fracture, encompassed with probabilistic simulation. The generic features of this approach are to probabilistically telescope local material-point damage all the way up to the structural component and to probabilistically decompose structural loads and boundary conditions all the way down to the material point. Additional features include a multifactor interaction model that probabilistically describes material property evolution and any changes due to various cyclic loads and other mutually interacting effects. The objective of the paper is to describe this novel paradigm of computational simulation and present typical fatigue results for structural components. Additionally, the advantages, versatility, and inclusiveness of computational simulation versus testing are discussed, and guidelines for complementing simulated results with strategic testing are outlined. Typical results are shown for computational simulation of fatigue in metallic composite structures to demonstrate the approach.

  5. Geothermal probabilistic cost study

    NASA Astrophysics Data System (ADS)

    Orren, L. H.; Ziman, G. M.; Jones, S. C.; Lee, T. K.; Noll, R.; Wilde, L.; Sadanand, V.

    1981-08-01

    A tool is presented to quantify the risks of geothermal projects, the Geothermal Probabilistic Cost Model (GPCM). The GPCM model was used to evaluate a geothermal reservoir for a binary-cycle electric plant at Heber, California. Three institutional aspects of geothermal risk that can shift the risk among different agents were analyzed. The leasing of geothermal land, contracting between the producer and the user of the geothermal heat, and insurance against faulty performance were examined.

  6. Probabilistic Model Development

    NASA Technical Reports Server (NTRS)

    Adam, James H., Jr.

    2010-01-01

    Objective: Develop a probabilistic model for the solar energetic particle environment, and a tool to provide a reference solar particle radiation environment that will not be exceeded at a user-specified confidence level. The tool will provide reference environments for (a) peak flux, (b) event-integrated fluence, and (c) mission-integrated fluence; each reference environment will consist of elemental energy spectra for protons, helium, and heavier ions.

  7. Geothermal probabilistic cost study

    NASA Technical Reports Server (NTRS)

    Orren, L. H.; Ziman, G. M.; Jones, S. C.; Lee, T. K.; Noll, R.; Wilde, L.; Sadanand, V.

    1981-01-01

    A tool is presented to quantify the risks of geothermal projects, the Geothermal Probabilistic Cost Model (GPCM). The GPCM model was used to evaluate a geothermal reservoir for a binary-cycle electric plant at Heber, California. Three institutional aspects of geothermal risk that can shift the risk among different agents were analyzed. The leasing of geothermal land, contracting between the producer and the user of the geothermal heat, and insurance against faulty performance were examined.

  8. Probabilistic Graph Layout for Uncertain Network Visualization.

    PubMed

    Schulz, Christoph; Nocaj, Arlind; Goertler, Jochen; Deussen, Oliver; Brandes, Ulrik; Weiskopf, Daniel

    2017-01-01

    We present a novel uncertain network visualization technique based on node-link diagrams. Nodes expand spatially in our probabilistic graph layout, depending on the underlying probability distributions of edges. The visualization is created by computing a two-dimensional graph embedding that combines samples from the probabilistic graph. A Monte Carlo process is used to decompose a probabilistic graph into its possible instances and to continue with our graph layout technique. Splatting and edge bundling are used to visualize point clouds and network topology. The results provide insights into probability distributions for the entire network, not only for individual nodes and edges. We validate our approach using three data sets that represent a wide range of network types: synthetic data, protein-protein interactions from the STRING database, and travel times extracted from Google Maps. Our approach reveals general limitations of the force-directed layout and allows the user to recognize that some nodes of the graph are at a specific position just by chance.
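
    A bare-bones version of the Monte Carlo decomposition step, sampling concrete graphs from edge probabilities, laying each instance out, and aggregating node positions, might look as follows; the edge probabilities are invented, and the paper's splatting and edge bundling are omitted.

        import random
        import networkx as nx

        prob_edges = {("a", "b"): 0.9, ("b", "c"): 0.5,
                      ("a", "c"): 0.2, ("c", "d"): 0.8}
        nodes = {n for e in prob_edges for n in e}
        rng = random.Random(0)

        layouts = []
        for _ in range(200):  # decompose into possible instances
            G = nx.Graph()
            G.add_nodes_from(nodes)
            G.add_edges_from(e for e, p in prob_edges.items() if rng.random() < p)
            layouts.append(nx.spring_layout(G, seed=0))  # fixed seed aligns layouts

        # Mean position per node; the spread over samples indicates uncertainty.
        mean_pos = {n: sum(pos[n] for pos in layouts) / len(layouts) for n in nodes}
        print(mean_pos)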

  9. On Relations between Current Global Volcano Databases

    NASA Astrophysics Data System (ADS)

    Newhall, C. G.; Siebert, L.; Sparks, S.

    2009-12-01

    The Smithsonian’s Volcano Reference File (VRF), the database that underlies Volcanoes of the World and This Dynamic Planet, is the premier source for the “what, when, where, and how big?” of Holocene and historical eruptions. VOGRIPA (Volcanic Global Risk Identification and Analysis) will catalogue details of large eruptions, including specific phenomena and their impacts. CCDB (Collapse Caldera Database) also considers large eruptions, with an emphasis on the resulting calderas. WOVOdat is bringing monitoring data from the world’s observatories into a centralized database in common formats, so that they can be searched and compared during volcanic crises and for research on pre-eruption processes. Oceanographic and space institutions worldwide have growing archives of volcano imagery and derivative products. Petrologic databases such as PETRODB and GEOROC offer compositions of many erupted and non-erupted magmas. Each of these informs and complements the others. Examples of interrelations include: ● Information in the VRF about individual volcanoes is the starting point and major source of background “volcano” data in WOVOdat, VOGRIPA, and petrologic databases. ● Images and digital topography from remote-sensing archives offer high-resolution, consistent geospatial "base maps" for all of the other databases. ● VRF data about eruptions show whether unrest recorded in WOVOdat culminated in an eruption and, if yes, its type and magnitude. ● Data from WOVOdat fill in the “blanks” between eruptions in the VRF. ● VOGRIPA adds more detail to the VRF’s descriptions of eruptions, including quantification of runout distances, expanded estimated column heights and eruption impact data, and other parameters not included in the Smithsonian VRF. ● Petrologic databases can add detail to the existing petrologic data of the VRF, WOVOdat, and VOGRIPA, e.g., the detail needed to estimate melt viscosity and its influence on magma and eruption dynamics. ● Hazard

  10. Multiclient Identification System Using Adaptive Probabilistic Model

    NASA Astrophysics Data System (ADS)

    Lin, Chin-Teng; Siana, Linda; Shou, Yu-Wen; Yang, Chien-Ting

    2010-12-01

    This paper aims at integrating detection and identification of human faces into a more practical and real-time face recognition system. The proposed face detection system is based on the cascade Adaboost method to improve precision and robustness toward unstable surrounding lighting. Our Adaboost method innovates by adjusting to environmental lighting conditions through histogram lighting normalization and by accurately locating face regions through a region-based clustering process. We also address the problem of multi-scale faces by using 12 different scales of searching windows and 5 different orientations for each client, in pursuit of multi-view independent face identification. There are two main methodological parts in our face identification system: PCA (principal component analysis) facial feature extraction and an adaptive probabilistic model (APM). Our implemented APM, a weighted combination of simple probabilistic functions, constructs the likelihood functions via the probabilistic constraint in the similarity measures. In addition, our proposed method can add a new client online and update the information of registered clients, thanks to the constructed APM. The experimental results show the superior performance of our proposed system for both offline and real-time online testing.

  11. Criteria for Comparing Children's Web Search Tools.

    ERIC Educational Resources Information Center

    Kuntz, Jerry

    1999-01-01

    Presents criteria for evaluating and comparing Web search tools designed for children. Highlights include database size; accountability; categorization; search access methods; help files; spell check; URL searching; links to alternative search services; advertising; privacy policy; and layout and design. (LRW)

  12. Custom Search Engines: Tools & Tips

    ERIC Educational Resources Information Center

    Notess, Greg R.

    2008-01-01

    Few have the resources to build a Google or Yahoo! from scratch. Yet anyone can build a search engine based on a subset of the large search engines' databases. Use Google Custom Search Engine or Yahoo! Search Builder or any of the other similar programs to create a vertical search engine targeting sites of interest to users. The basic steps to…

  13. JICST Factual Database: JICST Chemical Substance Safety Regulation Database

    NASA Astrophysics Data System (ADS)

    Abe, Atsushi; Sohma, Tohru

    JICST Chemical Substance Safety Regulation Database is based on the Database of Safety Laws for Chemical Compounds constructed by the Japan Chemical Industry Ecology-Toxicology & Information Center (JETOC), sponsored by the Science and Technology Agency, in 1987. JICST has modified the JETOC database system, added data, and started the online service through JOIS-F (JICST Online Information Service-Factual database) in January 1990. The JICST database comprises eighty-three laws and fourteen hundred compounds. The authors outline the database, data items, files, and search commands. An example of an online session is presented.

  14. Topics in Probabilistic Judgment Aggregation

    ERIC Educational Resources Information Center

    Wang, Guanchun

    2011-01-01

    This dissertation is a compilation of several studies that are united by their relevance to probabilistic judgment aggregation. In the face of complex and uncertain events, panels of judges are frequently consulted to provide probabilistic forecasts, and aggregation of such estimates in groups often yield better results than could have been made…

  15. Time Analysis for Probabilistic Workflows

    SciTech Connect

    Czejdo, Bogdan; Ferragut, Erik M

    2012-01-01

    There are many theoretical and practical results in the area of workflow modeling, especially for more formal workflows. In this paper we focus on probabilistic workflows. We show algorithms for time computations in probabilistic workflows. With activity times modeled more precisely, we can improve work cooperation and its analysis, including simulation and visualization.
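
    One basic computation, the expected completion time of a workflow with probabilistic branching, reduces to a recursion over the workflow graph; the structure and numbers below are invented for illustration, and workflows with loops would require solving linear equations instead.

        from functools import lru_cache

        # node -> (activity duration, [(successor, branch probability), ...])
        workflow = {
            "start":   (2.0, [("approve", 0.7), ("rework", 0.3)]),
            "rework":  (5.0, [("approve", 1.0)]),
            "approve": (1.0, [("done", 1.0)]),
            "done":    (0.0, []),
        }

        @lru_cache(maxsize=None)
        def expected_time(node):
            # E[T(node)] = duration + sum_i p_i * E[T(successor_i)]
            duration, succs = workflow[node]
            return duration + sum(p * expected_time(nxt) for nxt, p in succs)

        print(expected_time("start"))  # 2 + 0.7*1 + 0.3*(5 + 1) = 4.5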

  16. Quantum probabilistic logic programming

    NASA Astrophysics Data System (ADS)

    Balu, Radhakrishnan

    2015-05-01

    We describe a quantum mechanics based logic programming language that supports Horn clauses, random variables, and covariance matrices to express and solve problems in probabilistic logic. The Horn clauses of the language wrap random variables, including infinite valued, to express probability distributions and statistical correlations, a powerful feature to capture relationships between distributions that are not independent. The expressive power of the language is based on a mechanism to implement statistical ensembles and to solve the underlying SAT instances using quantum mechanical machinery. We exploit the fact that classical random variables have quantum decompositions to build the Horn clauses. We establish the semantics of the language in a rigorous fashion by considering an existing probabilistic logic language called PRISM with classical probability measures defined on the Herbrand base and extending it to the quantum context. In the classical case, H-interpretations form the sample space, and probability measures defined on them lead to a consistent definition of probabilities for well-formed formulae. In the quantum counterpart, we define probability amplitudes on H-interpretations, facilitating model generation and verification via quantum mechanical superpositions and entanglements. We cast the well-formed formulae of the language as quantum mechanical observables, thus providing an elegant interpretation for their probabilities. We discuss several examples that combine statistical ensembles and predicates of first order logic to reason about situations involving uncertainty.

  17. Probabilistic cellular automata.

    PubMed

    Agapie, Alexandru; Andreica, Anca; Giuclea, Marius

    2014-09-01

    Cellular automata are binary lattices used for modeling complex dynamical systems. The automaton evolves iteratively from one configuration to another, using some local transition rule based on the number of ones in the neighborhood of each cell. With respect to the number of cells allowed to change per iteration, we speak of either synchronous or asynchronous automata. If randomness is involved to some degree in the transition rule, we speak of probabilistic automata; otherwise they are called deterministic. Whichever type of cellular automaton we are dealing with, the main theoretical challenge stays the same: starting from an arbitrary initial configuration, predict (with highest accuracy) the end configuration. If the automaton is deterministic, the outcome simplifies to one of two configurations, all zeros or all ones. If the automaton is probabilistic, the whole process is modeled by a finite homogeneous Markov chain, and the outcome is the corresponding stationary distribution. Based on our previous results for the asynchronous case, connecting the probability of a configuration in the stationary distribution to its number of zero-one borders, the article offers both numerical and theoretical insight into the long-term behavior of synchronous cellular automata.
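
    A synchronous probabilistic cellular automaton on a ring takes a few lines of NumPy: all cells update simultaneously, each turning on with a probability that depends on the number of ones among its left neighbor, itself, and its right neighbor. The transition probabilities below are invented for illustration.

        import numpy as np

        rng = np.random.default_rng(1)
        n_cells, steps = 100, 500
        state = rng.integers(0, 2, n_cells)

        # P(cell -> 1 | number of ones among left, self, right)
        p_on = np.array([0.05, 0.3, 0.7, 0.95])

        for _ in range(steps):
            ones = np.roll(state, 1) + state + np.roll(state, -1)
            state = (rng.random(n_cells) < p_on[ones]).astype(int)  # synchronous update

        print("fraction of ones after", steps, "steps:", state.mean())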

  18. Is the basic conditional probabilistic?

    PubMed

    Goodwin, Geoffrey P

    2014-06-01

    Nine experiments examined whether individuals treat the meaning of basic conditional assertions as deterministic or probabilistic. In Experiments 1-4, participants were presented with either probabilistic or deterministic relations, which they had to describe with a conditional. These experiments consistently showed that people tend only to use the basic if p then q construction to describe deterministic relations between antecedent and consequent, whereas they use a probabilistically qualified construction, if p then probably q, to describe probabilistic relations-suggesting that the default interpretation of the conditional is deterministic. Experiments 5 and 6 showed that when directly asked, individuals typically report that conditional assertions admit no exceptions (i.e., they are seen as deterministic). Experiments 7-9 showed that individuals judge the truth of conditional assertions in accordance with this deterministic interpretation. Together, these results pose a challenge to probabilistic accounts of the meaning of conditionals and support mental models, formal rules, and suppositional accounts.

  19. Probabilistic Structural Analysis Theory Development

    NASA Technical Reports Server (NTRS)

    Burnside, O. H.

    1985-01-01

    The objective of the Probabilistic Structural Analysis Methods (PSAM) project is to develop analysis techniques and computer programs for predicting the probabilistic response of critical structural components for current and future space propulsion systems. This technology will play a central role in establishing system performance and durability. The first year's technical activity is concentrating on probabilistic finite element formulation strategy and code development. Work is also in progress to survey critical materials and space shuttle main engine components. The probabilistic finite element computer program NESSUS (Numerical Evaluation of Stochastic Structures Under Stress) is being developed. The final probabilistic code will have, in the general case, the capability of performing nonlinear dynamic analysis of stochastic structures. The goal of the approximate methods effort is to increase problem-solving efficiency relative to finite element methods by using energy methods to generate trial solutions that satisfy the structural boundary conditions. These approximate methods will be less computer-intensive than the finite element approach.

  20. Chemical Kinetics Database

    National Institute of Standards and Technology Data Gateway

    SRD 17 NIST Chemical Kinetics Database (Web, free access)   The NIST Chemical Kinetics Database includes essentially all reported kinetics results for thermal gas-phase chemical reactions. The database is designed to be searched for kinetics data based on the specific reactants involved, for reactions resulting in specified products, for all the reactions of a particular species, or for various combinations of these. In addition, the bibliography can be searched by author name or combination of names. The database contains in excess of 38,000 separate reaction records for over 11,700 distinct reactant pairs. These data have been abstracted from over 12,000 papers with literature coverage through early 2000.

  1. Improving sensitivity in proteome studies by analysis of false discovery rates for multiple search engines

    PubMed Central

    Jones, Andrew R.; Siepen, Jennifer A.; Hubbard, Simon J.; Paton, Norman W.

    2010-01-01

    Tandem mass spectrometry, run in combination with liquid chromatography (LC-MS/MS), can generate large numbers of peptide and protein identifications, for which a variety of database search engines are available. Distinguishing correct identifications from false positives is far from trivial because all data sets are noisy, and tend to be too large for manual inspection, therefore probabilistic methods must be employed to balance the trade-off between sensitivity and specificity. Decoy databases are becoming widely used to place statistical confidence in result sets, allowing the false discovery rate (FDR) to be estimated. It has previously been demonstrated that different MS search engines produce different peptide identification sets, and as such, employing more than one search engine could result in an increased number of peptides being identified. However, such efforts are hindered by the lack of a single scoring framework employed by all search engines. We have developed a search engine independent scoring framework based on FDR which allows peptide identifications from different search engines to be combined, called the FDRScore. We observe that peptide identifications made by three search engines are infrequently false positives, and identifications made by only a single search engine, even with a strong score from the source search engine, are significantly more likely to be false positives. We have developed a second score based on the FDR within peptide identifications grouped according to the set of search engines that have made the identification, called the combined FDRScore. We demonstrate by searching large publicly available data sets that the combined FDRScore can differentiate between correct and incorrect peptide identifications with high accuracy, allowing on average 35% more peptide identifications to be made at a fixed FDR than using a single search engine. PMID:19253293
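
    The target-decoy estimate underneath such scores is simple: at any score threshold, FDR is approximately the number of decoy hits divided by the number of target hits at or above it. A minimal sketch of turning one engine's scores into q-values (data invented; the paper's FDRScore and combined FDRScore add search-engine-agreement grouping on top of this):

        def q_values(psms):
            # psms: list of (score, is_decoy); returns {score: q-value}.
            ranked = sorted(psms, key=lambda x: -x[0])
            fdrs, targets, decoys = [], 0, 0
            for _, is_decoy in ranked:
                decoys += is_decoy
                targets += not is_decoy
                fdrs.append(decoys / max(targets, 1))
            q, out = float("inf"), {}
            for (score, _), fdr in zip(reversed(ranked), reversed(fdrs)):
                q = min(q, fdr)  # q-value: minimal FDR at which this PSM is accepted
                out[score] = q
            return out

        psms = [(9.1, False), (8.7, False), (8.2, True), (7.9, False), (7.5, True)]
        print(q_values(psms))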

  2. 78 FR 15746 - Compendium of Analyses To Investigate Select Level 1 Probabilistic Risk Assessment End-State...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-03-12

    ... COMMISSION Compendium of Analyses To Investigate Select Level 1 Probabilistic Risk Assessment End-State... document entitled: Compendium of Analyses to Investigate Select Level 1 Probabilistic Risk Assessment End..., select ``ADAMS Public Documents'' and then select ``Begin Web- based ADAMS Search.'' For problems...

  3. Specialist Bibliographic Databases.

    PubMed

    Gasparyan, Armen Yuri; Yessirkepov, Marlen; Voronov, Alexander A; Trukhachev, Vladimir I; Kostyukova, Elena I; Gerasimov, Alexey N; Kitas, George D

    2016-05-01

    Specialist bibliographic databases offer essential online tools for researchers and authors who work on specific subjects and perform comprehensive and systematic syntheses of evidence. This article presents examples of the established specialist databases, which may be of interest to those engaged in multidisciplinary science communication. Access to most specialist databases is through subscription schemes and membership in professional associations. Several aggregators of information and database vendors, such as EBSCOhost and ProQuest, facilitate advanced searches supported by specialist keyword thesauri. Searches of items through specialist databases are complementary to those through multidisciplinary research platforms, such as PubMed, Web of Science, and Google Scholar. Familiarizing with the functional characteristics of biomedical and nonbiomedical bibliographic search tools is mandatory for researchers, authors, editors, and publishers. The database users are offered updates of the indexed journal lists, abstracts, author profiles, and links to other metadata. Editors and publishers may find particularly useful source selection criteria and apply for coverage of their peer-reviewed journals and grey literature sources. These criteria are aimed at accepting relevant sources with established editorial policies and quality controls.

  4. Specialist Bibliographic Databases

    PubMed Central

    2016-01-01

    Specialist bibliographic databases offer essential online tools for researchers and authors who work on specific subjects and perform comprehensive and systematic syntheses of evidence. This article presents examples of the established specialist databases, which may be of interest to those engaged in multidisciplinary science communication. Access to most specialist databases is through subscription schemes and membership in professional associations. Several aggregators of information and database vendors, such as EBSCOhost and ProQuest, facilitate advanced searches supported by specialist keyword thesauri. Searches of items through specialist databases are complementary to those through multidisciplinary research platforms, such as PubMed, Web of Science, and Google Scholar. Familiarizing with the functional characteristics of biomedical and nonbiomedical bibliographic search tools is mandatory for researchers, authors, editors, and publishers. The database users are offered updates of the indexed journal lists, abstracts, author profiles, and links to other metadata. Editors and publishers may find particularly useful source selection criteria and apply for coverage of their peer-reviewed journals and grey literature sources. These criteria are aimed at accepting relevant sources with established editorial policies and quality controls. PMID:27134485

  5. Probabilistic retinal vessel segmentation

    NASA Astrophysics Data System (ADS)

    Wu, Chang-Hua; Agam, Gady

    2007-03-01

    Optic fundus assessment is widely used for diagnosing vascular and non-vascular pathology. Inspection of the retinal vasculature may reveal hypertension, diabetes, arteriosclerosis, cardiovascular disease and stroke. Due to various imaging conditions, retinal images may be degraded. Consequently, the enhancement of such images and vessels in them is an important task with direct clinical applications. We propose a novel technique for vessel enhancement in retinal images that is capable of enhancing vessel junctions in addition to linear vessel segments. This is an extension of vessel filters we have previously developed for vessel enhancement in thoracic CT scans. The proposed approach is based on probabilistic models which can discern vessels and junctions. Evaluation shows the proposed filter is better than several known techniques and is comparable to the state of the art when evaluated on a standard dataset. A ridge-based vessel tracking process is applied on the enhanced image to demonstrate the effectiveness of the enhancement filter.

  6. Probabilistic Fiber Composite Micromechanics

    NASA Technical Reports Server (NTRS)

    Stock, Thomas A.

    1996-01-01

    Probabilistic composite micromechanics methods are developed that simulate expected uncertainties in unidirectional fiber composite properties. These methods are in the form of computational procedures using Monte Carlo simulation. The variables in which uncertainties are accounted for include constituent and void volume ratios, constituent elastic properties and strengths, and fiber misalignment. A graphite/epoxy unidirectional composite (ply) is studied to demonstrate fiber composite material property variations induced by random changes expected at the material micro level. Regression results are presented to show the relative correlation between predictor and response variables in the study. These computational procedures make possible a formal description of anticipated random processes at the intra-ply level, and the related effects of these on composite properties.
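
    The Monte Carlo idea is easy to illustrate with the simplest micromechanics relations, the rule of mixtures and its inverse; the property distributions below are illustrative stand-ins, not the study's graphite/epoxy inputs.

        import numpy as np

        rng = np.random.default_rng(42)
        n = 100_000

        Ef = rng.normal(230e9, 12e9, n)    # fiber modulus, Pa (illustrative scatter)
        Em = rng.normal(3.5e9, 0.3e9, n)   # matrix modulus, Pa
        Vf = rng.normal(0.60, 0.02, n)     # fiber volume ratio

        E11 = Vf * Ef + (1 - Vf) * Em           # longitudinal: rule of mixtures
        E22 = 1.0 / (Vf / Ef + (1 - Vf) / Em)   # transverse: inverse rule

        for name, E in (("E11", E11), ("E22", E22)):
            print(f"{name}: mean {E.mean() / 1e9:.1f} GPa, CV {E.std() / E.mean():.1%}")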

  7. Novel probabilistic neuroclassifier

    NASA Astrophysics Data System (ADS)

    Hong, Jiang; Serpen, Gursel

    2003-09-01

    A novel probabilistic potential function neural network classifier algorithm to deal with classes which are multi-modally distributed and formed from sets of disjoint pattern clusters is proposed in this paper. The proposed classifier has a number of desirable properties which distinguish it from other neural network classifiers. A complete description of the algorithm in terms of its architecture and the pseudocode is presented. Simulation analysis of the newly proposed neuro-classifier algorithm on a set of benchmark problems is presented. Benchmark problems tested include IRIS, Sonar, Vowel Recognition, Two-Spiral, Wisconsin Breast Cancer, Cleveland Heart Disease and Thyroid Gland Disease. Simulation results indicate that the proposed neuro-classifier performs consistently better for a subset of problems for which other neural classifiers perform relatively poorly.
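
    A classical probabilistic (potential function) classifier assigns a pattern to the class with the largest average Gaussian kernel response over that class's training patterns, which naturally handles classes formed from disjoint clusters. The sketch below is the generic Parzen/PNN scheme, not the paper's specific algorithm; the data and kernel width are invented.

        import numpy as np

        def pnn_predict(X_train, y_train, X_test, sigma=0.5):
            # Class score = mean Gaussian kernel to that class's training patterns.
            classes = np.unique(y_train)
            preds = []
            for x in X_test:
                k = np.exp(-((X_train - x) ** 2).sum(axis=1) / (2 * sigma**2))
                preds.append(classes[np.argmax([k[y_train == c].mean() for c in classes])])
            return np.array(preds)

        rng = np.random.default_rng(0)
        # Each class is two disjoint clusters (multi-modally distributed classes).
        X0 = np.vstack([rng.normal([0, 0], 0.3, (20, 2)), rng.normal([4, 4], 0.3, (20, 2))])
        X1 = np.vstack([rng.normal([0, 4], 0.3, (20, 2)), rng.normal([4, 0], 0.3, (20, 2))])
        X, y = np.vstack([X0, X1]), np.array([0] * 40 + [1] * 40)
        print(pnn_predict(X, y, np.array([[0.1, 0.2], [3.9, 0.1]])))  # -> [0 1]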

  8. Probabilistic Mesomechanical Fatigue Model

    NASA Technical Reports Server (NTRS)

    Tryon, Robert G.

    1997-01-01

    A probabilistic mesomechanical fatigue life model is proposed to link the microstructural material heterogeneities to the statistical scatter in the macrostructural response. The macrostructure is modeled as an ensemble of microelements. Cracks nucleate within the microelements and grow from the microelements to final fracture. Variations of the microelement properties are defined using statistical parameters. A micromechanical slip band decohesion model is used to determine the crack nucleation life and size. A crack tip opening displacement model is used to determine the small crack growth life and size. The Paris law is used to determine the long crack growth life. The models are combined in a Monte Carlo simulation to determine the statistical distribution of total fatigue life for the macrostructure. The modeled response is compared to trends in experimental observations from the literature.
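
    The long-crack stage alone can be sketched as a Paris-law integration inside a Monte Carlo loop; the constants, distributions, and the restriction to long-crack growth are illustrative simplifications of the full nucleation plus small-crack plus long-crack model.

        import numpy as np

        rng = np.random.default_rng(7)
        n = 50_000

        # Illustrative random inputs (da/dN in m/cycle, delta-K in MPa*sqrt(m))
        C = rng.lognormal(np.log(5e-12), 0.3, n)   # Paris coefficient
        m = rng.normal(3.0, 0.1, n)                # Paris exponent
        a0 = rng.lognormal(np.log(1e-4), 0.2, n)   # initial crack size, m
        af, dsigma = 5e-3, 120.0                   # final crack size (m), stress range (MPa)

        # Closed form of N = integral da / (C * (dsigma*sqrt(pi*a))**m), for m != 2
        expo = 1 - m / 2
        N = (af**expo - a0**expo) / (C * (dsigma * np.sqrt(np.pi))**m * expo)

        print(f"median life {np.median(N):.3g} cycles; "
              f"10-90% range {np.percentile(N, 10):.3g} - {np.percentile(N, 90):.3g}")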

  9. Probabilistic brains: knowns and unknowns

    PubMed Central

    Pouget, Alexandre; Beck, Jeffrey M; Ma, Wei Ji; Latham, Peter E

    2015-01-01

    There is strong behavioral and physiological evidence that the brain both represents probability distributions and performs probabilistic inference. Computational neuroscientists have started to shed light on how these probabilistic representations and computations might be implemented in neural circuits. One particularly appealing aspect of these theories is their generality: they can be used to model a wide range of tasks, from sensory processing to high-level cognition. To date, however, these theories have only been applied to very simple tasks. Here we discuss the challenges that will emerge as researchers start focusing their efforts on real-life computations, with a focus on probabilistic learning, structural learning and approximate inference. PMID:23955561

  10. Probabilistic Design of Composite Structures

    NASA Technical Reports Server (NTRS)

    Chamis, Christos C.

    2006-01-01

    A formal procedure for the probabilistic design evaluation of a composite structure is described. The uncertainties in all aspects of a composite structure (constituent material properties, fabrication variables, structural geometry, and service environments, etc.), which result in the uncertain behavior in the composite structural responses, are included in the evaluation. The probabilistic evaluation consists of: (1) design criteria, (2) modeling of composite structures and uncertainties, (3) simulation methods, and (4) the decision-making process. A sample case is presented to illustrate the formal procedure and to demonstrate that composite structural designs can be probabilistically evaluated with accuracy and efficiency.

  11. Online Petroleum Industry Bibliographic Databases: A Review.

    ERIC Educational Resources Information Center

    Anderson, Margaret B.

    This paper discusses the present status of the bibliographic database industry, reviews the development of online databases of interest to the petroleum industry, and considers future developments in online searching and their effect on libraries and information centers. Three groups of databases are described: (1) databases developed by the…

  12. Probabilistic graphic models applied to identification of diseases

    PubMed Central

    Sato, Renato Cesar; Sato, Graziela Tiemy Kajita

    2015-01-01

    Decision-making is fundamental when making diagnosis or choosing treatment. The broad dissemination of computed systems and databases allows systematization of part of decisions through artificial intelligence. In this text, we present basic use of probabilistic graphic models as tools to analyze causality in health conditions. This method has been used to make diagnosis of Alzheimer's disease, sleep apnea and heart diseases. PMID:26154555

  13. Probabilistic graphic models applied to identification of diseases.

    PubMed

    Sato, Renato Cesar; Sato, Graziela Tiemy Kajita

    2015-01-01

    Decision-making is fundamental when making diagnosis or choosing treatment. The broad dissemination of computed systems and databases allows systematization of part of decisions through artificial intelligence. In this text, we present basic use of probabilistic graphic models as tools to analyze causality in health conditions. This method has been used to make diagnosis of Alzheimer's disease, sleep apnea and heart diseases.

  14. A Probabilistic Tsunami Hazard Assessment Methodology

    NASA Astrophysics Data System (ADS)

    Gonzalez, Frank; Geist, Eric; Jaffe, Bruce; Kanoglu, Utku; Mofjeld, Harold; Synolakis, Costas; Titov, Vasily; Arcas, Diego

    2010-05-01

    A methodology for probabilistic tsunami hazard assessment (PTHA) will be described for multiple near- and far-field seismic sources. The method integrates tsunami inundation modeling with the approach of probabilistic seismic hazard assessment (PSHA). A database of inundation simulations is developed, with each simulation corresponding to an earthquake source for which the seismic parameters and mean interevent time have been estimated. A Poissonian model is then adopted for estimating the probability that tsunami flooding will exceed a given level during a specified period of time, taking into account multiple sources and multiple causes of uncertainty. Uncertainty in the tidal stage at tsunami arrival is dealt with by developing a parametric expression for the probability density function of the sum of the tides and a tsunami; uncertainty in the slip distribution of the near-field source is dealt with probabilistically by considering multiple sources in which width and slip values vary, subject to the constraint of a constant moment magnitude. The method was applied to Seaside, Oregon, to obtain estimates of the spatial distribution of 100- and 500-year maximum tsunami amplitudes, i.e., amplitudes with 1% and 0.2% annual probability of exceedance. These results will be presented and discussed, including the primary remaining sources of uncertainty -- those associated with interevent time estimates, the modeling of background sea level, and temporal changes in bathymetry and topography. PTHA represents an important contribution to tsunami hazard assessment techniques; viewed in the broader context of risk analysis, PTHA provides a method for quantifying estimates of the likelihood and severity of the tsunami hazard, which can then be combined with vulnerability and exposure to yield estimates of tsunami risk.
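
    The Poissonian aggregation step is compact: each source contributes its mean annual rate whenever its modeled flooding at the site exceeds the level of interest, and the exceedance probability over an exposure time T is 1 - exp(-lambda*T). The source amplitudes and rates below are invented for illustration.

        import math

        # (modeled max tsunami amplitude at the site in m, mean rate in events/yr)
        sources = [(1.2, 1 / 300), (2.5, 1 / 500), (4.0, 1 / 1000), (6.5, 1 / 2500)]

        def p_exceed(level_m, T_years):
            # P(flooding > level within T years) under a Poisson arrival model
            lam = sum(rate for amp, rate in sources if amp > level_m)
            return 1.0 - math.exp(-lam * T_years)

        for level in (1.0, 2.0, 4.0):
            print(f"{level:.1f} m: P(exceedance in 50 yr) = {p_exceed(level, 50):.3f}")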

  15. Evaluation of Online Databases and Their Uses in Collection Evaluation.

    ERIC Educational Resources Information Center

    Rice, Barbara A.

    1985-01-01

    Addresses importance of online databases as part of library reference collection and focuses on bibliographic databases (ready reference, print versus online migration, bibliographic utilities); selection/evaluation of online databases (vendor selection, individual databases); full-text databases; numeric databases; end-user searching; and use of…

  16. Content-addressable holographic databases

    NASA Astrophysics Data System (ADS)

    Grawert, Felix; Kobras, Sebastian; Burr, Geoffrey W.; Coufal, Hans J.; Hanssen, Holger; Riedel, Marc; Jefferson, C. Michael; Jurich, Mark C.

    2000-11-01

    Holographic data storage allows the simultaneous search of an entire database by performing multiple optical correlations between stored data pages and a search argument. We have recently developed fuzzy encoding techniques for this fast parallel search and demonstrated a holographic data storage system that searches digital data records with high fidelity. This content-addressable retrieval is based on the ability to take the two-dimensional inner product between the search page and each stored data page. We show that this ability is lost when the correlator is defocused to avoid material oversaturation, but can be regained by the combination of a random phase mask and beam confinement through total internal reflection. Finally, we propose an architecture in which spatially multiplexed holograms are distributed along the path of the search beam, allowing parallel search of large databases.

  17. Common Difficulties with Probabilistic Reasoning.

    ERIC Educational Resources Information Center

    Hope, Jack A.; Kelly, Ivan W.

    1983-01-01

    Several common errors reflecting difficulties in probabilistic reasoning are identified, relating to ambiguity, previous outcomes, sampling, unusual events, and estimating. Knowledge of these mistakes and interpretations may help mathematics teachers understand the thought processes of their students. (MNS)

  18. A Probabilistic Ontology Development Methodology

    DTIC Science & Technology

    2014-06-01

    ...to have a tool guiding the user on the steps necessary to create a probabilistic ontology and link this documentation to its implementation... an extension that is beyond the scope of this work and includes methods such as ONIONS, FCA-Merge, and PROMPT. The interested reader may find these...

  19. Evaluation of Federated Searching Options for the School Library

    ERIC Educational Resources Information Center

    Abercrombie, Sarah E.

    2008-01-01

    Three hosted federated search tools, Follett One Search, Gale PowerSearch Plus, and WebFeat Express, were configured and implemented in a school library. Databases from five vendors and the OPAC were systematically searched. Federated search results were compared with each other and to the results of the same searches in the database's native…

  20. Probabilistic theories with purification

    SciTech Connect

    Chiribella, Giulio; D'Ariano, Giacomo Mauro; Perinotti, Paolo

    2010-06-15

    We investigate general probabilistic theories in which every mixed state has a purification, unique up to reversible channels on the purifying system. We show that the purification principle is equivalent to the existence of a reversible realization of every physical process, that is, to the fact that every physical process can be regarded as arising from a reversible interaction of the system with an environment, which is eventually discarded. From the purification principle we also construct an isomorphism between transformations and bipartite states that possesses all structural properties of the Choi-Jamiolkowski isomorphism in quantum theory. Such an isomorphism allows one to prove most of the basic features of quantum theory, such as the existence of pure bipartite states giving perfect correlations in independent experiments, no information without disturbance, no joint discrimination of all pure states, no cloning, teleportation, no programming, no bit commitment, complementarity between correctable channels and deletion channels, characterization of entanglement-breaking channels as measure-and-prepare channels, and others, without resorting to the mathematical framework of Hilbert spaces.

  1. Advanced Web Searching: Tricks of the Trade.

    ERIC Educational Resources Information Center

    Zorn, Peggy; And Others

    1996-01-01

    Discusses World Wide Web searching techniques for information professionals, and describes and evaluates four search systems that provide advanced search features and that search a comprehensive and authoritative database of Internet sites. Sample searches are explained and professional searching on the Web is discussed. (LRW)

  2. Probabilistic Plan Management

    DTIC Science & Technology

    2009-11-17

    [Table-of-contents fragments: 6.3 Strategies; 6.4 Experimental Results; 6.4.2 Comparison of Strengthening Strategies; 6.4.3 Effects of Global Strengthening. Figure 6.1 caption fragment: the baseline strengthening strategy explores the full search space of the different orderings of backfills, swapping and pruning steps that can...]

  3. The ERIC Search: A Programmed Text.

    ERIC Educational Resources Information Center

    Szymanski, Cynthia; Arnold, Joann

    This text is intended as a guide for searching ERIC using the SilverPlatter CD-ROM database. The text describes the content and function of the ERIC database and demonstrates fundamental search techniques by following an example of a successful ERIC computer search. The search begins with the choice of a topic, the formulation of a search…

  4. A search for pre-main sequence stars in the high-latitude molecular clouds. II - A survey of the Einstein database

    NASA Technical Reports Server (NTRS)

    Caillault, Jean-Pierre; Magnani, Loris

    1990-01-01

    Preliminary results are reported of a survey of every Einstein image that overlaps any high-latitude molecular cloud, in a search for X-ray-emitting pre-main-sequence stars. This survey, together with complementary KPNO and IRAS data, will allow the determination of how prevalent low-mass star formation is in these clouds in general and, particularly, in the translucent molecular clouds.

  5. Standardization of Keyword Search Mode

    ERIC Educational Resources Information Center

    Su, Di

    2010-01-01

    In spite of its popularity, keyword search mode has not been standardized. Though information professionals are quick to adapt to various presentations of keyword search mode, novice end-users may find keyword search confusing. This article compares keyword search mode in some major reference databases and calls for standardization. (Contains 3…

  6. WOVOdat, A Worldwide Volcano Unrest Database, to Improve Eruption Forecasts

    NASA Astrophysics Data System (ADS)

    Widiwijayanti, C.; Costa, F.; Win, N. T. Z.; Tan, K.; Newhall, C. G.; Ratdomopurbo, A.

    2015-12-01

    WOVOdat is the World Organization of Volcano Observatories' Database of Volcanic Unrest, an international effort to develop common standards for compiling and storing data on volcanic unrest in a centralized database that is freely web-accessible for reference during volcanic crises, comparative studies, and basic research on pre-eruption processes. WOVOdat will be to volcanology as an epidemiological database is to medicine. Despite the large spectrum of monitoring techniques, interpreting monitoring data throughout the evolution of unrest and making timely forecasts remain the most challenging tasks for volcanologists. The field of eruption forecasting is becoming more quantitative, based on the understanding of pre-eruptive magmatic processes and the dynamic interaction between the variables at play in a volcanic system. Such forecasts must also acknowledge and express uncertainties; therefore most current research in this field focuses on the application of event-tree analysis to reflect multiple possible scenarios and the probability of each scenario. Such forecasts are critically dependent on comprehensive and authoritative global volcano-unrest data sets - the very information currently collected in WOVOdat. As the database becomes more complete, Boolean searches, side-by-side digital (and thus scalable) comparisons of unrest, and pattern recognition will generate reliable results. Statistical distributions obtained from WOVOdat can then be used to estimate the probabilities of each scenario after specific patterns of unrest. We established the main web interface for data submission and visualization, and have now incorporated ~20% of worldwide unrest data into the database, covering more than 100 eruptive episodes. In the upcoming years we will concentrate on acquiring data from volcano observatories, developing a robust data query interface, optimizing data mining, and creating tools by which WOVOdat can be used for probabilistic eruption forecasting.

  7. WWW Search Tools in Reference Services.

    ERIC Educational Resources Information Center

    Kimmel, Stacey

    1997-01-01

    Provides an introduction to World Wide Web search tools for reference services. Discusses characteristics of search services and types of tools available, including search engines/robot-generated databases, directories, metasearch engines, and review/rating sites. (AEF)

  8. Degradation monitoring using probabilistic inference

    NASA Astrophysics Data System (ADS)

    Alpay, Bulent

    In order to increase safety and improve economy and performance in a nuclear power plant (NPP), the source and extent of component degradations should be identified before failures and breakdowns occur. A degradation monitoring system is also crucial for the next generation of NPPs, which are designed to have a long core life and high fuel burnup, in order to keep the reactor in a safe state, meet the designed reactor core lifetime, and optimize scheduled maintenance. Model-based methods determine the inconsistencies between the actual and expected behavior of the plant, and use these inconsistencies for detection and diagnosis of degradations. By defining degradation as a random abrupt change from the nominal to a constant degraded state of a component, we employed nonlinear filtering techniques based on state/parameter estimation. We utilized a Bayesian recursive estimation formulation in the sequential probabilistic inference framework and constructed a hidden Markov model to represent a general physical system. By addressing a filter's inability to estimate an abrupt change (called the oblivious filter problem in nonlinear extensions of Kalman filtering, and the sample impoverishment problem in particle filtering), we developed techniques to modify filtering algorithms by utilizing additional data sources to improve the filter's response to this problem. We utilized a reliability degradation database, which can be constructed from plant-specific operational experience and test and maintenance reports, to generate proposal densities for probable degradation modes. These are used in a multiple hypothesis testing algorithm. We then test samples drawn from these proposal densities against the particle filtering estimates based on the Bayesian recursive estimation formulation, using the Metropolis-Hastings algorithm, a well-known Markov chain Monte Carlo (MCMC) method. This multiple hypothesis testing improves the filter's response to abrupt degradation-induced changes.
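
    A bootstrap particle filter on a scalar state with an abrupt jump reproduces the behavior the thesis targets: after the jump, few particles sit near the new state, weights degenerate, and the estimate lags. The dynamics, noise levels, and jump size below are invented, and the thesis's reliability-database proposals and multiple hypothesis testing are omitted.

        import numpy as np

        rng = np.random.default_rng(3)
        T, n_particles = 60, 1000

        # True state: nominal 1.0, abrupt degradation to 0.6 at t = 30
        truth = np.where(np.arange(T) < 30, 1.0, 0.6)
        obs = truth + rng.normal(0, 0.05, T)

        particles = rng.normal(1.0, 0.05, n_particles)
        estimates = []
        for y in obs:
            particles = particles + rng.normal(0, 0.02, n_particles)  # random-walk prediction
            w = np.exp(-0.5 * ((y - particles) / 0.05) ** 2)          # Gaussian likelihood
            w /= w.sum()
            particles = particles[rng.choice(n_particles, n_particles, p=w)]  # resample
            estimates.append(particles.mean())

        # The estimate tracks 1.0 well but responds sluggishly to the abrupt jump.
        print("t=25:", round(estimates[25], 3), " t=35:", round(estimates[35], 3),
              " t=55:", round(estimates[55], 3))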

  9. Biofuel Database

    National Institute of Standards and Technology Data Gateway

    Biofuel Database (Web, free access)   This database brings together structural, biological, and thermodynamic data for enzymes that are either in current use or are being considered for use in the production of biofuels.

  10. Hawaii bibliographic database

    NASA Astrophysics Data System (ADS)

    Wright, Thomas L.; Takahashi, Taeko Jane

    The Hawaii bibliographic database has been created to contain all of the literature, from 1779 to the present, pertinent to the volcanological history of the Hawaiian-Emperor volcanic chain. References are entered in a PC- and Macintosh-compatible EndNote Plus bibliographic database with keywords and abstracts or (if no abstract) with annotations as to content. Keywords emphasize location, discipline, process, identification of new chemical data or age determinations, and type of publication. The database is updated approximately three times a year and is available for download from an ftp site. The bibliography contained 8460 references at the time this paper was submitted for publication. Use of the database greatly enhances the power and completeness of library searches for anyone interested in Hawaiian volcanism.

  11. Hawaii bibliographic database

    USGS Publications Warehouse

    Wright, T.L.; Takahashi, T.J.

    1998-01-01

    The Hawaii bibliographic database has been created to contain all of the literature, from 1779 to the present, pertinent to the volcanological history of the Hawaiian-Emperor volcanic chain. References are entered in a PC- and Macintosh-compatible EndNote Plus bibliographic database with keywords and abstracts or (if no abstract) with annotations as to content. Keywords emphasize location, discipline, process, identification of new chemical data or age determinations, and type of publication. The database is updated approximately three times a year and is available for download from an ftp site. The bibliography contained 8460 references at the time this paper was submitted for publication. Use of the database greatly enhances the power and completeness of library searches for anyone interested in Hawaiian volcanism.

  12. PROBABILISTIC INFORMATION INTEGRATION TECHNOLOGY

    SciTech Connect

    J. BOOKER; M. MEYER; ET AL

    2001-02-01

    The Statistical Sciences Group at Los Alamos has successfully developed a structured, probabilistic, quantitative approach for the evaluation of system performance based on multiple information sources, called Information Integration Technology (IIT). The technology integrates diverse types and sources of data and information (both quantitative and qualitative), and their associated uncertainties, to develop distributions for performance metrics, such as reliability. Applications include predicting complex system performance, where test data are lacking or expensive to obtain, through the integration of expert judgment, historical data, computer/simulation model predictions, and any relevant test/experimental data. The technology is particularly well suited for tracking estimated system performance for systems under change (e.g., development, aging), and can be used at any time during product development, including concept and early design phases, prior to prototyping, testing, or production, and before costly design decisions are made. Techniques from various disciplines (e.g., state-of-the-art expert elicitation, statistical and reliability analysis, design engineering, physics modeling, and knowledge management) are merged and modified to develop formal methods for data/information integration. This technology, known as PREDICT (Performance and Reliability Evaluation with Diverse Information Combination and Tracking), won a 1999 R&D 100 Award (Meyer, Booker, Bement, Kerscher, 1999). Specifically, PREDICT is a formal, multidisciplinary process for estimating the performance of a product when test data are sparse or nonexistent. The acronym indicates the purpose of the methodology: to evaluate the performance or reliability of a product/system by combining all available (often diverse) sources of information and then tracking that performance as the product undergoes changes.

  13. Probabilistic exposure fusion.

    PubMed

    Song, Mingli; Tao, Dacheng; Chen, Chun; Bu, Jiajun; Luo, Jiebo; Zhang, Chengqi

    2012-01-01

    The luminance of a natural scene is often of high dynamic range (HDR). In this paper, we propose a new scheme to handle HDR scenes by integrating locally adaptive scene detail capture and suppressing gradient reversals introduced by the local adaptation. The proposed scheme is novel for capturing an HDR scene by using a standard dynamic range (SDR) device and synthesizing an image suitable for SDR displays. In particular, we use an SDR capture device to record scene details (i.e., the visible contrasts and the scene gradients) in a series of SDR images with different exposure levels. Each SDR image responds to a fraction of the HDR and partially records scene details. With the captured SDR image series, we first calculate the image luminance levels, which maximize the visible contrasts, and then the scene gradients embedded in these images. Next, we synthesize an SDR image by using a probabilistic model that preserves the calculated image luminance levels and suppresses reversals in the image luminance gradients. The synthesized SDR image contains many more scene details than any of the captured SDR images. Moreover, the proposed scheme also functions as a tone-mapping operator from an HDR image to an SDR image, and it is superior to both global and local tone-mapping operators. This is because global operators fail to preserve visual details when the contrast ratio of a scene is large, whereas local operators often produce halos in the synthesized SDR image. The proposed scheme does not require any human interaction or parameter tuning for different scenes. Subjective evaluations have shown that it is preferred over a number of existing approaches.
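
    The sketch below shows the simplest form of the underlying idea - blending an SDR exposure stack with per-pixel weights that favor well-exposed luminance. It is a generic exposure-fusion toy with invented parameters, not the authors' probabilistic model, which additionally preserves scene gradients and suppresses gradient reversals.

        import numpy as np

        def fuse(stack):
            """stack: (n_exposures, H, W) grayscale images in [0, 1]."""
            # Gaussian "well-exposedness" weight centered at mid-gray.
            weights = np.exp(-((stack - 0.5) ** 2) / (2 * 0.2 ** 2))
            weights /= weights.sum(axis=0, keepdims=True)
            return (weights * stack).sum(axis=0)

        base = np.linspace(0, 1, 64).reshape(8, 8)          # synthetic scene
        stack = np.clip(np.stack([base * g for g in (0.5, 1.0, 2.0)]), 0, 1)
        fused = fuse(stack)
        print(fused.shape, float(fused.min()), float(fused.max()))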

  14. Andromeda: a peptide search engine integrated into the MaxQuant environment.

    PubMed

    Cox, Jürgen; Neuhauser, Nadin; Michalski, Annette; Scheltema, Richard A; Olsen, Jesper V; Mann, Matthias

    2011-04-01

    A key step in mass spectrometry (MS)-based proteomics is the identification of peptides in sequence databases by their fragmentation spectra. Here we describe Andromeda, a novel peptide search engine using a probabilistic scoring model. On proteome data, Andromeda performs as well as Mascot, a widely used commercial search engine, as judged by sensitivity and specificity analysis based on target decoy searches. Furthermore, it can handle data with arbitrarily high fragment mass accuracy, is able to assign and score complex patterns of post-translational modifications, such as highly phosphorylated peptides, and accommodates extremely large databases. The algorithms of Andromeda are provided. Andromeda can function independently or as an integrated search engine of the widely used MaxQuant computational proteomics platform; both are freely available at www.maxquant.org. The combination enables analysis of large data sets in a simple analysis workflow on a desktop computer. For searching individual spectra, Andromeda is also accessible via a web server. We demonstrate the flexibility of the system by implementing the capability to identify cofragmented peptides, significantly improving the total number of identified peptides.
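
    The abstract describes Andromeda's scoring as probabilistic; the toy function below shows one common form of such a score - the tail probability that k of n theoretical fragment ions match by chance, reported on a -10*log10 scale. The formula and parameters here are illustrative assumptions, not Andromeda's published implementation.

        from math import comb, log10

        def binomial_tail(n, k, p):
            """P(X >= k) for X ~ Binomial(n, p)."""
            return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

        def psm_score(n_theoretical, n_matched, match_prob):
            # Higher score = the match is less likely to arise by chance.
            return -10 * log10(binomial_tail(n_theoretical, n_matched, match_prob))

        # 8 of 20 theoretical fragments matched, 5% chance of a random match:
        print(round(psm_score(20, 8, 0.05), 1))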

  15. Database Administrator

    ERIC Educational Resources Information Center

    Moore, Pam

    2010-01-01

    The Internet and electronic commerce (e-commerce) generate lots of data. Data must be stored, organized, and managed. Database administrators, or DBAs, work with database software to find ways to do this. They identify user needs, set up computer databases, and test systems. They ensure that systems perform as they should and add people to the…

  16. IDPredictor: predict database links in biomedical database.

    PubMed

    Mehlhorn, Hendrik; Lange, Matthias; Scholz, Uwe; Schreiber, Falk

    2012-06-26

    Knowledge found in biomedical databases, in particular in Web information systems, is a major bioinformatics resource. In general, this biological knowledge is represented worldwide in a network of databases. These data are spread among thousands of databases, which overlap in content but differ substantially with respect to content detail, interface, formats and data structure. To support functional annotation of lab data, such as protein sequences, metabolites or DNA sequences, as well as semi-automated data exploration in information retrieval environments, an integrated view of databases is essential. Search engines have the potential to assist in data retrieval from these structured sources, but fall short of providing comprehensive knowledge beyond the individually interlinked databases. A prerequisite for supporting the concept of an integrated data view is to acquire insights into the cross-references among database entities. This issue is hampered by the fact that only a fraction of all possible cross-references are explicitly tagged in the particular biomedical information systems. In this work, we investigate to what extent an automated construction of an integrated data network is possible. We propose a method that predicts and extracts cross-references from multiple life science databases and possible referenced data targets. We study the retrieval quality of our method and report on first, promising results. The method is implemented as the tool IDPredictor, which is published under the DOI 10.5447/IPK/2012/4 and is freely available at the URL http://dx.doi.org/10.5447/IPK/2012/4.

  17. A search for pre-main-sequence stars in high-latitude molecular clouds. 3: A survey of the Einstein database

    NASA Technical Reports Server (NTRS)

    Caillault, Jean-Pierre; Magnani, Loris; Fryer, Chris

    1995-01-01

    In order to discern whether the high-latitude molecular clouds are regions of ongoing star formation, we have used X-ray emission as a tracer of youthful stars. The entire Einstein database yields 18 images which overlap 10 of the clouds mapped partially or completely in the CO (1-0) transition, providing a total of approximately 6 square degrees of overlap. Five previously unidentified X-ray sources were detected: one has an optical counterpart which is a pre-main-sequence (PMS) star, and two have normal main-sequence stellar counterparts, while the other two are probably extragalactic sources. The PMS star is located in a high-Galactic-latitude Lynds dark cloud, so this result is not too surprising. The translucent clouds, though, have yet to reveal any evidence of star formation.

  18. Vagueness as Probabilistic Linguistic Knowledge

    NASA Astrophysics Data System (ADS)

    Lassiter, Daniel

    Consideration of the metalinguistic effects of utterances involving vague terms has led Barker [1] to treat vagueness using a modified Stalnakerian model of assertion. I present a sorites-like puzzle for factual beliefs in the standard Stalnakerian model [28] and show that it can be resolved by enriching the model to make use of probabilistic belief spaces. An analogous problem arises for metalinguistic information in Barker's model, and I suggest that a similar enrichment is needed here as well. The result is a probabilistic theory of linguistic representation that retains a classical metalanguage but avoids the undesirable divorce between meaning and use inherent in the epistemic theory [34]. I also show that the probabilistic approach provides a plausible account of the sorites paradox and higher-order vagueness and that it fares well empirically and conceptually in comparison to leading competitors.

  19. Probabilistic progressive buckling of trusses

    NASA Technical Reports Server (NTRS)

    Pai, Shantaram S.; Chamis, Christos C.

    1991-01-01

    A three-bay, space, cantilever truss is probabilistically evaluated to describe progressive buckling and truss collapse in view of the numerous uncertainties associated with the structural, material, and load variables (primitive variables) that describe the truss. Initially, the truss is deterministically analyzed for member forces, and member(s) in which the axial force exceeds the Euler buckling load are identified. These member(s) are then discretized with several intermediate nodes and a probabilistic buckling analysis is performed on the truss to obtain its probabilistic buckling loads and respective mode shapes. Furthermore, sensitivities associated with the uncertainties in the primitive variables are investigated, margin of safety values for the truss are determined, and truss end node displacements are noted. These steps are repeated by sequentially removing the buckled member(s) until onset of truss collapse is reached. Results show that this procedure yields an optimum truss configuration for a given loading and for a specified reliability.

  20. Probabilistic progressive buckling of trusses

    NASA Technical Reports Server (NTRS)

    Pai, Shantaram S.; Chamis, Christos C.

    1994-01-01

    A three-bay, space, cantilever truss is probabilistically evaluated to describe progressive buckling and truss collapse in view of the numerous uncertainties associated with the structural, material, and load variables that describe the truss. Initially, the truss is deterministically analyzed for member forces, and members in which the axial force exceeds the Euler buckling load are identified. These members are then discretized with several intermediate nodes, and a probabilistic buckling analysis is performed on the truss to obtain its probabilistic buckling loads and the respective mode shapes. Furthermore, sensitivities associated with the uncertainties in the primitive variables are investigated, margin of safety values for the truss are determined, and truss end node displacements are noted. These steps are repeated by sequentially removing buckled members until onset of truss collapse is reached. Results show that this procedure yields an optimum truss configuration for a given loading and for a specified reliability.
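
    A minimal numerical illustration of the first step in the procedure described in the two entries above: comparing a member's axial force with its Euler buckling load, here with Monte Carlo scatter on the primitive variables. All distributions and values are invented for illustration and are unrelated to the NASA study.

        import numpy as np

        rng = np.random.default_rng(1)

        # Euler critical load for a pinned member: P_cr = pi^2 * E * I / L^2.
        n = 100_000
        E = rng.normal(70e9, 3.5e9, n)    # Young's modulus (Pa)
        I = rng.normal(8e-9, 4e-10, n)    # area moment of inertia (m^4)
        L = rng.normal(1.5, 0.01, n)      # member length (m)
        P = rng.normal(2.0e3, 3.0e2, n)   # applied axial force (N)

        P_cr = np.pi**2 * E * I / L**2
        print("P(member buckles) ~", (P >= P_cr).mean())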

  1. Database Selection in the Life Sciences.

    ERIC Educational Resources Information Center

    Snow, Bonnie

    1985-01-01

    Focuses on indexing refinements in major life science databases--those specializing in biological/biomedical literature coverage--which influence cross-life searching decisions. Tables included highlight database descriptions, comparisons in coverage, ease of access (indexing of secondary concepts or search modifiers), chemical substance indexing…

  2. Online Database Coverage of Forensic Medicine.

    ERIC Educational Resources Information Center

    Snow, Bonnie; Ifshin, Steven L.

    1984-01-01

    Online searches of sample topics in the area of forensic medicine were conducted in the following life science databases: Biosis Previews, Excerpta Medica, Medline, Scisearch, and Chemical Abstracts Search. Search outputs analyzed according to criteria of recall, uniqueness, overlap, and utility reveal the need for a cross-database approach to…

  3. The Giardia genome project database.

    PubMed

    McArthur, A G; Morrison, H G; Nixon, J E; Passamaneck, N Q; Kim, U; Hinkle, G; Crocker, M K; Holder, M E; Farr, R; Reich, C I; Olsen, G E; Aley, S B; Adam, R D; Gillin, F D; Sogin, M L

    2000-08-15

    The Giardia genome project database provides an online resource for Giardia lamblia (WB strain, clone C6) genome sequence information. The database includes edited single-pass reads, the results of BLASTX searches, and details of progress towards sequencing the entire 12 million-bp Giardia genome. Pre-sorted BLASTX results can be retrieved based on keyword searches and BLAST searches of the high throughput Giardia data can be initiated from the web site or through NCBI. Descriptions of the genomic DNA libraries, project protocols and summary statistics are also available. Although the Giardia genome project is ongoing, new sequences are made available on a bi-monthly basis to ensure that researchers have access to information that may assist them in the search for genes and their biological function. The current URL of the Giardia genome project database is www.mbl.edu/Giardia.

  4. Probabilistic inversion: a preliminary discussion

    NASA Astrophysics Data System (ADS)

    Battista Rossi, Giovanni; Crenna, Francesco

    2015-02-01

    We continue the discussion on the possibility of interpreting probability as a logic, which we began at the previous IMEKO TC1-TC7-TC13 Symposium. We show here how a probabilistic logic can be extended to include direct and inverse functions. We also discuss the relationship between this framework and the Bayes-Laplace rule, showing how the latter can be formally interpreted as a probabilistic inversion device. We suggest that these findings open a new perspective in the evaluation of measurement uncertainty.
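
    In standard notation (not necessarily the authors'), the inversion reading of the Bayes-Laplace rule is the familiar identity that recovers the inverse conditional P(x|y) from the direct model P(y|x):

        P(x \mid y) = \frac{P(y \mid x)\, P(x)}{\sum_{x'} P(y \mid x')\, P(x')}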

  5. Comparison of probabilistic and deterministic fiber tracking of cranial nerves.

    PubMed

    Zolal, Amir; Sobottka, Stephan B; Podlesek, Dino; Linn, Jennifer; Rieger, Bernhard; Juratli, Tareq A; Schackert, Gabriele; Kitzler, Hagen H

    2016-12-16

    OBJECTIVE The depiction of cranial nerves (CNs) using diffusion tensor imaging (DTI) is of great interest in skull base tumor surgery, and DTI used with deterministic tracking methods has been reported previously. However, there are still no good methods for eliminating noise from the resulting depictions. The authors hypothesized that probabilistic tracking could lead to more accurate results, because it extracts information from the underlying data more efficiently. Moreover, the authors adapted a previously described technique for noise elimination using gradual threshold increases to probabilistic tracking. To evaluate the utility of this new approach, this work provides a comparison between the gradual-threshold-increase method in probabilistic and in deterministic tracking of CNs. METHODS Both tracking methods were used to depict CNs II, III, V, and the VII+VIII bundle. Depiction of 240 CNs was attempted with each of the above methods in 30 healthy subjects, whose data were obtained from 2 public databases: the Kirby repository (KR) and the Human Connectome Project (HCP). Elimination of erroneous fibers was attempted by gradually increasing the respective thresholds (fractional anisotropy [FA] and probabilistic index of connectivity [PICo]). The results were compared with predefined ground truth images based on corresponding anatomical scans. Two label overlap measures (false-positive error and Dice similarity coefficient) were used to evaluate the success of both methods in depicting the CNs. Moreover, the differences between these parameters obtained from the KR and HCP (with higher angular resolution) databases were evaluated. Additionally, visualization of 10 CNs in 5 clinical cases was attempted with both methods and evaluated by comparing the depictions with intraoperative findings. RESULTS Maximum Dice similarity coefficients were significantly higher with probabilistic tracking (p < 0.001; Wilcoxon signed-rank test). The false

  6. Using Quick Search.

    ERIC Educational Resources Information Center

    Maxfield, Sandy, Ed.; Kabus, Karl, Ed.

    This document is a guide to the use of Quick Search, a library service that provides access to more than 100 databases which contain references to journal articles and other research materials through two commercial systems--BRS After/Dark and DIALOG's Knowledge Index. The guide is divided into five sections: (1) Using Quick Search; (2) The…

  7. A Probabilistic Reformulation of No Free Lunch: Continuous Lunches Are Not Free.

    PubMed

    Lockett, Alan J; Miikkulainen, Risto

    2016-10-04

    No Free Lunch (NFL) theorems have been developed in many settings over the last two decades. Whereas NFL is known to be possible in any domain based on set-theoretic concepts, probabilistic versions of NFL are presently believed to be impossible in continuous domains. This article develops a new formalization of probabilistic NFL that is sufficiently expressive to prove the existence of NFL in large search domains, such as continuous spaces or function spaces. This formulation is arguably more complicated than its set-theoretic variants, mostly as a result of the numerous technical complications within probability theory itself. However, a probabilistic conceptualization of NFL is important because stochastic optimization methods inherently need to be evaluated probabilistically. Thus the present study fills an important gap in the study of performance of stochastic optimizers.

  8. Full-Text Databases in Medicine.

    ERIC Educational Resources Information Center

    Sievert, MaryEllen C.; And Others

    1995-01-01

    Describes types of full-text databases in medicine; discusses features for searching full-text journal databases available through online vendors; reviews research on full-text databases in medicine; and describes the MEDLINE/Full-Text Research Project at the University of Missouri (Columbia) which investigated precision, recall, and relevancy.…

  9. Adjacency and Proximity Searching in the Science Citation Index and Google

    DTIC Science & Technology

    2005-01-01

    major database search engines, including commercial S&T database search engines (e.g., Science Citation Index (SCI), Engineering Compendex (EC)..., PubMed, OVID), Federal agency award database search engines (e.g., NSF, NIH, DOE, EPA, as accessed in Federal R&D Project Summaries), Web search engines (e.g., ...) searching. Some database search engines allow strict constrained co-occurrence searching as a user option (e.g., OVID, EC), while others do not (e.g., SCI

  10. Probabilistic assessment of composite structures

    NASA Technical Reports Server (NTRS)

    Shiao, Michael E.; Abumeri, Galib H.; Chamis, Christos C.

    1993-01-01

    A general computational simulation methodology for an integrated probabilistic assessment of composite structures is discussed and demonstrated using aircraft fuselage (stiffened composite cylindrical shell) structures with rectangular cutouts. The computational simulation was performed for the probabilistic assessment of the structural behavior including buckling loads, vibration frequencies, global displacements, and local stresses. The scatter in the structural response is simulated based on the inherent uncertainties in the primitive (independent random) variables at the fiber matrix constituent, ply, laminate, and structural scales that describe the composite structures. The effect of uncertainties due to fabrication process variables such as fiber volume ratio, void volume ratio, ply orientation, and ply thickness is also included. The methodology has been embedded in the computer code IPACS (Integrated Probabilistic Assessment of Composite Structures). In addition to the simulated scatter, the IPACS code also calculates the sensitivity of the composite structural behavior to all the primitive variables that influence the structural behavior. This information is useful for assessing reliability and providing guidance for improvement. The results from the probabilistic assessment for the composite structure with rectangular cutouts indicate that the uncertainty in the longitudinal ply stress is mainly caused by the uncertainty in the laminate thickness, and the large overlap of the scatter in the first four buckling loads implies that the buckling mode shape for a specific buckling load can be either of the four modes.

  11. Research on probabilistic information processing

    NASA Technical Reports Server (NTRS)

    Edwards, W.

    1973-01-01

    The work accomplished on probabilistic information processing (PIP) is reported. The research proposals and decision analysis are discussed along with the results of research on MSC setting, multiattribute utilities, and Bayesian research. Abstracts of reports concerning the PIP research are included.

  12. Making Probabilistic Relational Categories Learnable

    ERIC Educational Resources Information Center

    Jung, Wookyoung; Hummel, John E.

    2015-01-01

    Theories of relational concept acquisition (e.g., schema induction) based on structured intersection discovery predict that relational concepts with a probabilistic (i.e., family resemblance) structure ought to be extremely difficult to learn. We report four experiments testing this prediction by investigating conditions hypothesized to facilitate…

  13. Development of probabilistic internal dosimetry computer code

    NASA Astrophysics Data System (ADS)

    Noh, Siwan; Kwon, Tae-Eun; Lee, Jai-Ki

    2017-02-01

    Internal radiation dose assessment involves biokinetic models, the corresponding parameters, measured data, and many assumptions. Every component considered in the internal dose assessment has its own uncertainty, which is propagated into the intake activity and internal dose estimates. For research or scientific purposes, and for retrospective dose reconstruction for accident scenarios occurring in workplaces having a large quantity of unsealed radionuclides, such as nuclear power plants, nuclear fuel cycle facilities, and facilities in which nuclear medicine is practiced, a quantitative uncertainty assessment of the internal dose is often required. However, no calculation tools or computer codes that incorporate all the relevant processes and their corresponding uncertainties, i.e., from the measured data to the committed dose, are available. Thus, the objective of the present study is to develop an integrated probabilistic internal-dose-assessment computer code. First, the uncertainty components in internal dosimetry are identified, and quantitative uncertainty data are collected. Then, an uncertainty database is established for each component. In order to propagate these uncertainties in an internal dose assessment, a probabilistic internal-dose-assessment system that employs Bayesian and Monte Carlo methods was constructed. Based on the developed system, we developed a probabilistic internal-dose-assessment code using MATLAB so as to estimate the dose distributions from the measured data with uncertainty. Using the developed code, we calculated the internal dose distribution and statistical values (e.g., the 2.5th, 5th, 50th (median), 95th, and 97.5th percentiles) for three sample scenarios. On the basis of the distributions, we performed a sensitivity analysis to determine the influence of each component on the resulting dose in order to identify the major component of the uncertainty in a bioassay. The results of this study can be applied to various situations. In cases of
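
    A minimal sketch of the Monte Carlo step described above: propagate uncertainty from a measured bioassay quantity through a biokinetic retention fraction and a dose coefficient to a dose distribution, then read off the reported percentiles. All distributions and values are hypothetical placeholders, not the code's actual models.

        import numpy as np

        rng = np.random.default_rng(2)

        n = 50_000
        M = 120.0 * rng.lognormal(0.0, 0.2, n)       # measured activity (Bq)
        m_t = rng.normal(0.05, 0.01, n).clip(1e-4)   # retention fraction at time t
        DC = rng.lognormal(np.log(2.0e-8), 0.3, n)   # dose coefficient (Sv/Bq)

        intake = M / m_t                             # estimated intake (Bq)
        dose_mSv = intake * DC * 1e3                 # committed dose (mSv)
        for q in (2.5, 5, 50, 95, 97.5):
            print(f"{q:5.1f}th percentile: {np.percentile(dose_mSv, q):.3f} mSv")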

  14. The comprehensive peptaibiotics database.

    PubMed

    Stoppacher, Norbert; Neumann, Nora K N; Burgstaller, Lukas; Zeilinger, Susanne; Degenkolb, Thomas; Brückner, Hans; Schuhmacher, Rainer

    2013-05-01

    Peptaibiotics are nonribosomally biosynthesized peptides, which - according to definition - contain the marker amino acid α-aminoisobutyric acid (Aib) and possess antibiotic properties. Since the first representatives were described in 1958, a constantly increasing number of peptaibiotics have been reported and investigated, with a particular emphasis on those from hypocrealean fungi. Starting from the existing online 'Peptaibol Database', first published in 1997, an exhaustive literature survey of all known peptaibiotics was carried out and resulted in a list of 1043 peptaibiotics. The gathered information was compiled and used to create the new 'The Comprehensive Peptaibiotics Database', which is presented here. The database was devised as a software tool based on Microsoft (MS) Access. It is freely available from the internet at http://peptaibiotics-database.boku.ac.at and can easily be installed and operated on any computer offering a Windows XP/7 environment. It provides useful information on characteristic properties of the peptaibiotics included, such as peptide category, group name of the microheterogeneous mixture to which the peptide belongs, amino acid sequence, sequence length, producing fungus, peptide subfamily, molecular formula, and monoisotopic mass. All these characteristics can be used and combined for automated searches within the database, which makes The Comprehensive Peptaibiotics Database a versatile tool for the retrieval of valuable information about peptaibiotics. Sequence data have been considered up to December 14, 2012.

  15. Probabilistic Mass Growth Uncertainties

    NASA Technical Reports Server (NTRS)

    Plumer, Eric; Elliott, Darren

    2013-01-01

    Mass has been widely used as a variable input parameter for Cost Estimating Relationships (CER) for space systems. As these space systems progress from early concept studies and drawing boards to the launch pad, their masses tend to grow substantially, hence adversely affecting a primary input to most modeling CERs. Modeling and predicting mass uncertainty, based on historical and analogous data, is therefore critical and is an integral part of modeling cost risk. This paper presents the results of an ongoing NASA effort to publish mass growth datasheets for adjusting single-point Technical Baseline Estimates (TBE) of the masses of space instruments as well as spacecraft, for both Earth-orbiting and deep space missions, at various stages of a project's lifecycle. This paper also discusses the long-term strategy of NASA Headquarters to publish similar results, using a variety of cost-driving metrics, on an annual basis. This paper provides quantitative results that show decreasing mass growth uncertainties as mass estimate maturity increases. The analysis is based on historical data obtained from the NASA Cost Analysis Data Requirements (CADRe) database.

  16. MS-ONLINE Mass Spectral Database

    NASA Astrophysics Data System (ADS)

    Tokizane, Soichi; Nagaoka, Nobuaki

    A mass spectral database, MS-ONLINE, is described; it is produced by FIZ Chemie in the Federal Republic of Germany and offered online through the INKADATA system. The data source of this database is the WILEY/NBS MASS SPECTRAL DATA BASE, and it includes 80,680 spectra. Spectral data can be retrieved by a substance search (by assigning molecular weight, molecular formula, or name), by a specific peak search, or by a similarity search of peak patterns called the SISCOM search. Furthermore, the system has functions supporting the identification of the components of mixtures and identification from isotopic abundances. The algorithm of the SISCOM search is explained in detail.

  17. Maize databases

    Technology Transfer Automated Retrieval System (TEKTRAN)

    This chapter is a succinct overview of maize data held in the species-specific database MaizeGDB (the Maize Genomics and Genetics Database), and selected multi-species data repositories, such as Gramene/Ensembl Plants, Phytozome, UniProt and the National Center for Biotechnology Information (NCBI), ...

  18. Probabilistic load simulation: Code development status

    NASA Technical Reports Server (NTRS)

    Newell, J. F.; Ho, H.

    1991-01-01

    The objective of the Composite Load Spectra (CLS) project is to develop generic load models to simulate the composite load spectra that are included in space propulsion system components. The probabilistic loads thus generated are part of the probabilistic design analysis (PDA) of a space propulsion system that also includes probabilistic structural analyses, reliability, and risk evaluations. Probabilistic load simulation for space propulsion systems demands sophisticated probabilistic methodology and requires large amounts of load information and engineering data. The CLS approach is to implement a knowledge based system coupled with a probabilistic load simulation module. The knowledge base manages and furnishes load information and expertise and sets up the simulation runs. The load simulation module performs the numerical computation to generate the probabilistic loads with load information supplied from the CLS knowledge base.

  19. Probabilistic load simulation: Code development status

    NASA Astrophysics Data System (ADS)

    Newell, J. F.; Ho, H.

    1991-05-01

    The objective of the Composite Load Spectra (CLS) project is to develop generic load models to simulate the composite load spectra that are included in space propulsion system components. The probabilistic loads thus generated are part of the probabilistic design analysis (PDA) of a space propulsion system that also includes probabilistic structural analyses, reliability, and risk evaluations. Probabilistic load simulation for space propulsion systems demands sophisticated probabilistic methodology and requires large amounts of load information and engineering data. The CLS approach is to implement a knowledge based system coupled with a probabilistic load simulation module. The knowledge base manages and furnishes load information and expertise and sets up the simulation runs. The load simulation module performs the numerical computation to generate the probabilistic loads with load information supplied from the CLS knowledge base.

  20. Working with Data: Discovering Knowledge through Mining and Analysis; Systematic Knowledge Management and Knowledge Discovery; Text Mining; Methodological Approach in Discovering User Search Patterns through Web Log Analysis; Knowledge Discovery in Databases Using Formal Concept Analysis; Knowledge Discovery with a Little Perspective.

    ERIC Educational Resources Information Center

    Qin, Jian; Jurisica, Igor; Liddy, Elizabeth D.; Jansen, Bernard J; Spink, Amanda; Priss, Uta; Norton, Melanie J.

    2000-01-01

    These six articles discuss knowledge discovery in databases (KDD). Topics include data mining; knowledge management systems; applications of knowledge discovery; text and Web mining; text mining and information retrieval; user search patterns through Web log analysis; concept analysis; data collection; and data structure inconsistency. (LRW)

  1. Hierarchies of Indices for Text Searching.

    ERIC Educational Resources Information Center

    Baeza-Yates, Ricardo; And Others

    1996-01-01

    Discusses indexes for text databases and presents an efficient implementation of an index for text searching called PAT array, or suffix array, where the database is stored on secondary storage devices such as magnetic or optical disks. Additional hierarchical index structures and searching algorithms are proposed that improve searching time, and…
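
    A toy in-memory sketch of the suffix (PAT) array idea described above: sort all suffix start positions once, then binary-search the sorted order to count occurrences of a pattern. The paper's contribution concerns doing this efficiently on secondary storage; this sketch only shows the core data structure.

        def build_suffix_array(text):
            # O(n^2 log n) toy construction; fine for short texts.
            return sorted(range(len(text)), key=lambda i: text[i:])

        def count_occurrences(text, sa, pattern):
            m = len(pattern)
            lo, hi = 0, len(sa)
            while lo < hi:                            # lower bound
                mid = (lo + hi) // 2
                if text[sa[mid]:sa[mid] + m] < pattern:
                    lo = mid + 1
                else:
                    hi = mid
            first, hi = lo, len(sa)
            while lo < hi:                            # upper bound
                mid = (lo + hi) // 2
                if text[sa[mid]:sa[mid] + m] <= pattern:
                    lo = mid + 1
                else:
                    hi = mid
            return lo - first

        text = "probabilistic database search"
        sa = build_suffix_array(text)
        print(count_occurrences(text, sa, "ba"))   # 2 (in "probabilistic" and "database")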

  2. The Weaknesses of Full-Text Searching

    ERIC Educational Resources Information Center

    Beall, Jeffrey

    2008-01-01

    This paper provides a theoretical critique of the deficiencies of full-text searching in academic library databases. Because full-text searching relies on matching words in a search query with words in online resources, it is an inefficient method of finding information in a database. This matching fails to retrieve synonyms, and it also retrieves…

  3. Zero Result Searches. . . How to Minimize Them.

    ERIC Educational Resources Information Center

    Atkinson, Steve

    1986-01-01

    Based on manual observation of 187 zero-result online searches at a university library, this article addresses three types of problems that can produce such search results: multiple database searching, topic negotiation, and database availability. A summary of conceptual and practical recommendations for searchers is provided. (6 references) (EJS)

  4. (Meta)Search like Google

    ERIC Educational Resources Information Center

    Rochkind, Jonathan

    2007-01-01

    The ability to search and receive results in more than one database through a single interface--or metasearch--is something many users want. Google Scholar--the search engine of specifically scholarly content--and library metasearch products like Ex Libris's MetaLib, Serials Solution's Central Search, WebFeat, and products based on MuseGlobal used…

  5. Search Engines for Tomorrow's Scholars

    ERIC Educational Resources Information Center

    Fagan, Jody Condit

    2011-01-01

    Today's scholars face an outstanding array of choices when choosing search tools: Google Scholar, discipline-specific abstracts and index databases, library discovery tools, and more recently, Microsoft's re-launch of their academic search tool, now dubbed Microsoft Academic Search. What are these tools' strengths for the emerging needs of…

  6. De novo protein conformational sampling using a probabilistic graphical model.

    PubMed

    Bhattacharya, Debswapna; Cheng, Jianlin

    2015-11-06

    Efficient exploration of protein conformational space remains challenging, especially for large proteins, when assembling discretized structural fragments extracted from a protein structure database. We propose a fragment-free probabilistic graphical model, FUSION, for conformational sampling in continuous space and assess its accuracy using 'blind' protein targets with lengths of up to 250 residues from the CASP11 structure prediction exercise. The method reduces sampling bottlenecks, exhibits strong convergence, and demonstrates better performance than the popular fragment assembly method ROSETTA on the relatively larger proteins (more than 150 residues) in our benchmark set. FUSION is freely available through a web server at http://protein.rnet.missouri.edu/FUSION/.

  7. Environmental probabilistic quantitative assessment methodologies

    USGS Publications Warehouse

    Crovelli, R.A.

    1995-01-01

    In this paper, four petroleum resource assessment methodologies are presented as possible pollution assessment methodologies, even though petroleum as a resource is desirable, whereas pollution is undesirable. A methodology is defined in this paper as consisting of a probability model and a probabilistic method, where the method is used to solve the model. The following four basic types of probability models are considered: (1) direct assessment, (2) accumulation size, (3) volumetric yield, and (4) reservoir engineering. Three of the four petroleum resource assessment methodologies were written as microcomputer systems, viz. TRIAGG for direct assessment, APRAS for accumulation size, and FASPU for reservoir engineering. A fourth microcomputer system, termed PROBDIST, supports the three assessment systems. The three assessment systems have different probability models but the same type of probabilistic method. The advantages of the analytic method are computational speed and flexibility, making it ideal for a microcomputer.

  8. Synaptic Computation Underlying Probabilistic Inference

    PubMed Central

    Soltani, Alireza; Wang, Xiao-Jing

    2010-01-01

    In this paper we propose that synapses may be the workhorse of the neuronal computations that underlie probabilistic reasoning. We built a neural circuit model for probabilistic inference in which information provided by different sensory cues must be integrated and the predictive powers of individual cues about an outcome are deduced through experience. We found that bounded synapses naturally compute, through reward-dependent plasticity, the posterior probability that a choice alternative is correct given that a cue is presented. Furthermore, a decision circuit endowed with such synapses makes choices based on the summated log posterior odds and performs near-optimal cue combination. The model is validated by reproducing salient observations of, and provides insights into, a monkey experiment using a categorization task. Our model thus suggests a biophysical instantiation of the Bayesian decision rule, while predicting important deviations from it, similar to the ‘base-rate neglect’ observed in human studies when alternatives have unequal priors. PMID:20010823
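
    A drastically simplified illustration of the paper's core claim - that a bounded synapse under reward-dependent plasticity converges to the posterior probability that a choice is correct given a cue. The learning rule and numbers below are a toy stochastic approximation, not the paper's biophysical model of stochastic binary synapses.

        import numpy as np

        rng = np.random.default_rng(3)

        p_correct_given_cue = 0.73    # ground-truth predictive power of the cue
        w, rate = 0.5, 0.02           # synaptic strength, bounded in [0, 1]
        for _ in range(5000):
            rewarded = rng.random() < p_correct_given_cue
            w += rate * ((1.0 if rewarded else 0.0) - w)  # potentiate/depress
        print(f"synaptic strength ~ {w:.2f} vs posterior {p_correct_given_cue}")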

  9. PCEMCAN - Probabilistic Ceramic Matrix Composites Analyzer: User's Guide, Version 1.0

    NASA Technical Reports Server (NTRS)

    Shah, Ashwin R.; Mital, Subodh K.; Murthy, Pappu L. N.

    1998-01-01

    PCEMCAN (Probabilistic CEramic Matrix Composites ANalyzer) is an integrated computer code developed at NASA Lewis Research Center that simulates uncertainties associated with the constituent properties, manufacturing process, and geometric parameters of fiber-reinforced ceramic matrix composites and quantifies their random thermomechanical behavior. The PCEMCAN code can perform deterministic as well as probabilistic analyses to predict thermomechanical properties. This user's guide details the step-by-step procedure for creating an input file and updating/modifying the material properties database required to run the PCEMCAN computer code. An overview of the geometric conventions, micromechanical unit cell, nonlinear constitutive relationship and probabilistic simulation methodology is also provided in the manual. Fast probability integration as well as Monte Carlo simulation methods are available for the uncertainty simulation. The options available in the code to simulate probabilistic material properties and quantify the sensitivity of the primitive random variables are described, and deterministic as well as probabilistic results are illustrated using demonstration problems. For a detailed theoretical description of the deterministic and probabilistic analyses, the user is referred to the companion documents "Computational Simulation of Continuous Fiber-Reinforced Ceramic Matrix Composite Behavior," NASA TP-3602, 1996, and "Probabilistic Micromechanics and Macromechanics for Ceramic Matrix Composites," NASA TM-4766, June 1997.

  10. Demystifying the Search Button

    PubMed Central

    McKeever, Liam; Nguyen, Van; Peterson, Sarah J.; Gomez-Perez, Sandra

    2015-01-01

    A thorough review of the literature is the basis of all research and evidence-based practice. A gold-standard efficient and exhaustive search strategy is needed to ensure all relevant citations have been captured and that the search performed is reproducible. The PubMed database comprises both the MEDLINE and non-MEDLINE databases. MEDLINE-based search strategies are robust but capture only 89% of the total available citations in PubMed. The remaining 11% include the most recent and possibly relevant citations but are only searchable through less efficient techniques. An effective search strategy must employ both the MEDLINE and the non-MEDLINE portion of PubMed to ensure all studies have been identified. The robust MEDLINE search strategies are used for the MEDLINE portion of the search. Usage of the less robust strategies is then efficiently confined to search only the remaining 11% of PubMed citations that have not been indexed for MEDLINE. The current article offers step-by-step instructions for building such a search exploring methods for the discovery of medical subject heading (MeSH) terms to search MEDLINE, text-based methods for exploring the non-MEDLINE database, information on the limitations of convenience algorithms such as the “related citations feature,” the strengths and pitfalls associated with commonly used filters, the proper usage of Boolean operators to organize a master search strategy, and instructions for automating that search through “MyNCBI” to receive search query updates by email as new citations become available. PMID:26129895
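
    A minimal sketch of the two-part strategy in PubMed's query syntax, built as strings for clarity: medline[sb] is PubMed's subset filter for MEDLINE-indexed records, so NOT medline[sb] confines a query to the remaining un-indexed citations. The topic terms are placeholders, not a recommended search.

        # MEDLINE portion: robust MeSH-based search, restricted to indexed records.
        mesh_part = '("malnutrition"[MeSH Terms]) AND medline[sb]'
        # Non-MEDLINE remainder: text-word search over un-indexed records only.
        text_part = '(malnutrition[tiab]) NOT medline[sb]'
        # Master strategy: the union of the two halves.
        master = f"({mesh_part}) OR ({text_part})"
        print(master)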

  11. Binary Encoded-Prototype Tree for Probabilistic Model Building GP

    NASA Astrophysics Data System (ADS)

    Yanase, Toshihiko; Hasegawa, Yoshihiko; Iba, Hitoshi

    In recent years, program evolution algorithms based on the estimation of distribution algorithm (EDA) have been proposed to improve the search ability of genetic programming (GP) and to overcome GP-hard problems. One such method is the probabilistic prototype tree (PPT) based algorithm. The PPT-based method explores the optimal tree structure by using the full tree whose number of child nodes is the maximum among possible trees. This algorithm, however, suffers from problems arising from function nodes having different numbers of child nodes: such function nodes cause intron nodes, which do not affect the fitness function, and function nodes with many child nodes enlarge the search space and increase the number of samples necessary for properly constructing the probabilistic model. In order to solve this problem, we propose a binary encoding for the PPT. We convert each function node to a subtree of binary nodes such that the converted tree remains grammatically correct. Our method reduces the ineffectual search space, and the binary-encoded tree can express the same tree structures as the original method. The effectiveness of the proposed method is demonstrated through two computational experiments.

  12. Probabilistic methods for rotordynamics analysis

    NASA Technical Reports Server (NTRS)

    Wu, Y.-T.; Torng, T. Y.; Millwater, H. R.; Fossum, A. F.; Rheinfurth, M. H.

    1991-01-01

    This paper summarizes the development of the methods and a computer program to compute the probability of instability of dynamic systems that can be represented by a system of second-order ordinary linear differential equations. Two instability criteria based upon the eigenvalues or Routh-Hurwitz test functions are investigated. Computational methods based on a fast probability integration concept and an efficient adaptive importance sampling method are proposed to perform efficient probabilistic analysis. A numerical example is provided to demonstrate the methods.
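
    A plain Monte Carlo sketch of the instability-probability idea described above, using a one-degree-of-freedom damped oscillator in state-space form: sample the uncertain parameters, compute the eigenvalues of the system matrix, and count samples whose largest real part is positive. Plain sampling stands in for the paper's fast probability integration and adaptive importance sampling; the parameter scatter is hypothetical.

        import numpy as np

        rng = np.random.default_rng(4)

        n = 10_000
        c = rng.normal(0.05, 0.04, n)   # damping; occasionally negative
        k = rng.normal(4.0, 0.2, n)     # stiffness

        unstable = 0
        for ci, ki in zip(c, k):
            A = np.array([[0.0, 1.0], [-ki, -ci]])   # state matrix of x'' + c x' + k x = 0
            if np.linalg.eigvals(A).real.max() > 0:   # eigenvalue instability criterion
                unstable += 1
        print("P(instability) ~", unstable / n)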

  13. Probabilistic Simulation for Nanocomposite Characterization

    NASA Technical Reports Server (NTRS)

    Chamis, Christos C.; Coroneos, Rula M.

    2007-01-01

    A unique probabilistic theory is described to predict the properties of nanocomposites. The simulation is based on composite micromechanics with progressive substructuring down to a nanoscale slice of a nanofiber where all the governing equations are formulated. These equations have been programmed in a computer code, which is used here to simulate the uniaxial strength properties of a mononanofiber laminate. The results are presented graphically and discussed with respect to their practical significance. These results show smooth distributions.

  14. Applications of Probabilistic Risk Assessment

    SciTech Connect

    Burns, K.J.; Chapman, J.R.; Follen, S.M.; O'Regan, P.J.

    1991-05-01

    This report provides a summary of potential and actual applications of Probabilistic Risk Assessment (PRA) technology and insights. Individual applications are derived from the experiences of a number of US nuclear utilities. This report identifies numerous applications of PRA techniques beyond those typically associated with PRAs. In addition, believing that the future use of PRA techniques should not be limited to those of the past, areas of plant operations, maintenance, and financial resource allocation are discussed. 9 refs., 3 tabs.

  15. NASA Records Database

    NASA Technical Reports Server (NTRS)

    Callac, Christopher; Lunsford, Michelle

    2005-01-01

    The NASA Records Database, comprising a Web-based application program and a database, is used to administer an archive of paper records at Stennis Space Center. The system begins with an electronic form, into which a user enters information about records that the user is sending to the archive. The form is smart: it provides instructions for entering information correctly and prompts the user to enter all required information. Once complete, the form is digitally signed and submitted to the database. The system determines which storage locations are not in use, assigns the user's boxes of records to some of them, and enters these assignments in the database. Thereafter, the software tracks the boxes and can be used to locate them. By use of the search capabilities of the software, specific records can be sought by box storage locations, accession numbers, record dates, submitting organizations, or details of the records themselves. Boxes can be marked with such statuses as checked out, lost, transferred, and destroyed. The system can generate reports showing boxes awaiting destruction or transfer. When boxes are transferred to the National Archives and Records Administration (NARA), the system can automatically fill out NARA records-transfer forms. Currently, several other NASA Centers are considering deploying the NASA Records Database to help automate their records archives.

  16. Probabilistic Assessment of Cancer Risk for Astronauts on Lunar Missions

    NASA Technical Reports Server (NTRS)

    Kim, Myung-Hee Y.; Cucinotta, Francis A.

    2009-01-01

    During future lunar missions, exposure to solar particle events (SPEs) is a major safety concern for crew members during extra-vehicular activities (EVAs) on the lunar surface or Earth-to-moon transit. NASA's new lunar program anticipates that up to 15% of crew time may be on EVA, with minimal radiation shielding. Because of the operational challenge of responding to events of unknown size and duration, a probabilistic risk assessment approach is essential for mission planning and design. Using the historical database of proton measurements during the past 5 solar cycles, a typical hazard function for SPE occurrence was defined using a non-homogeneous Poisson model as a function of time within a non-specific future solar cycle of 4000 days duration. Distributions ranging from the 5th to the 95th percentile of particle fluences for a specified mission period were simulated. Organ doses corresponding to particle fluences at the median and at the 95th percentile for a specified mission period were assessed using NASA's baryon transport model, BRYNTRN. The cancer fatality risk for astronauts as a function of age, gender, and solar cycle activity was then analyzed. The probability of exceeding the NASA 30-day limit of blood-forming organ (BFO) dose inside a typical spacecraft was calculated. Future work will involve using this probabilistic risk assessment approach to SPE forecasting, combined with a probabilistic approach to the radiobiological factors that contribute to the uncertainties in projecting cancer risks.
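
    A sketch of the occurrence model described above: SPE arrivals as a non-homogeneous Poisson process over a 4000-day cycle, simulated by thinning. The hazard function here (peaking near mid-cycle) is a hypothetical stand-in for the one fitted to the historical proton database.

        import numpy as np

        rng = np.random.default_rng(5)

        T = 4000.0                                     # cycle length (days)
        lam = lambda t: 0.002 + 0.010 * np.exp(-((t - 1500.0) / 800.0) ** 2)
        lam_max = 0.012                                # upper bound on lam(t)

        def simulate_cycle():
            events, t = [], 0.0
            while True:
                t += rng.exponential(1.0 / lam_max)    # candidate arrival
                if t > T:
                    return events
                if rng.random() < lam(t) / lam_max:    # thinning acceptance
                    events.append(t)

        counts = [len(simulate_cycle()) for _ in range(2000)]
        print("mean SPEs per simulated cycle:", np.mean(counts))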

  17. Citation Searching: Search Smarter & Find More

    ERIC Educational Resources Information Center

    Hammond, Chelsea C.; Brown, Stephanie Willen

    2008-01-01

    The staff at University of Connecticut are participating in Elsevier's Student Ambassador Program (SAmP) in which graduate students train their peers on "citation searching" research using Scopus and Web of Science, two tremendous citation databases. They are in the fourth semester of these training programs, and they are wildly successful: They…

  18. Probabilistic Aeroelastic Analysis of Turbomachinery Components

    NASA Technical Reports Server (NTRS)

    Reddy, T. S. R.; Mital, S. K.; Stefko, G. L.

    2004-01-01

    A probabilistic approach is described for the aeroelastic analysis of turbomachinery blade rows. Blade rows with subsonic flow and blade rows with supersonic flow with a subsonic leading edge are considered. To demonstrate the probabilistic approach, the flutter frequency, damping, and forced response of a blade row representing a compressor geometry are considered. The analysis accounts for uncertainties in structural and aerodynamic design variables. The results are presented in the form of probability density functions (PDF) and sensitivity factors. For the subsonic-flow cascade, comparisons are also made with different probabilistic distributions, probabilistic methods, and Monte Carlo simulation. The results show that the probabilistic approach provides a more realistic and systematic way to assess the effect of uncertainties in design variables on aeroelastic instabilities and response.

  19. Typing mineral deposits using their associated rocks, grades and tonnages using a probabilistic neural network

    USGS Publications Warehouse

    Singer, D.A.

    2006-01-01

    A probabilistic neural network is employed to classify 1610 mineral deposits into 18 types using tonnage, average Cu, Mo, Ag, Au, Zn, and Pb grades, and six generalized rock types. The purpose is to examine whether neural networks might serve for integrating geoscience information available in large mineral databases to classify sites by deposit type. Successful classifications of 805 deposits not used in training - 87% with grouped porphyry copper deposits - and the nature of misclassifications demonstrate the power of probabilistic neural networks and the value of quantitative mineral-deposit models. The results also suggest that neural networks can classify deposits as well as experienced economic geologists. © International Association for Mathematical Geology 2006.
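
    A minimal probabilistic neural network (Parzen-window classifier) in the spirit of the method used above: each class's score is a kernel density estimate over its training points, and the highest-scoring class wins. The two synthetic "deposit type" clouds and the smoothing parameter are invented; real inputs would be grades, tonnages, and rock-type codes.

        import numpy as np

        rng = np.random.default_rng(6)

        def pnn_predict(x, train_X, train_y, sigma=0.5):
            scores = {}
            for cls in np.unique(train_y):
                pts = train_X[train_y == cls]
                d2 = ((pts - x) ** 2).sum(axis=1)
                scores[cls] = np.exp(-d2 / (2 * sigma**2)).mean()  # class density
            return max(scores, key=scores.get)

        X = np.vstack([rng.normal([0, 0], 0.7, (50, 2)),   # type A deposits
                       rng.normal([2, 1], 0.7, (50, 2))])  # type B deposits
        y = np.array([0] * 50 + [1] * 50)
        print(pnn_predict(np.array([1.8, 0.9]), X, y))     # -> 1 (type B)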

  20. Probabilistic Assessment of Cancer Risk from Solar Particle Events

    NASA Technical Reports Server (NTRS)

    Kim, Myung-Hee Y.; Cucinotta, Francis A.

    2010-01-01

    For long-duration missions outside the protection of the Earth's magnetic field, space radiation presents significant health risks, including cancer mortality. Space radiation consists of solar particle events (SPEs), comprised largely of medium-energy protons (less than several hundred MeV), and galactic cosmic rays (GCR), which include high-energy protons and heavy ions. While the frequency distribution of SPEs depends strongly upon the phase within the solar activity cycle, the individual SPE occurrences themselves are random in nature. We estimated the probability of SPE occurrence using a non-homogeneous Poisson model fitted to the historical database of proton measurements. Distributions of particle fluences of SPEs for a specified mission period were simulated, ranging from the 5th to the 95th percentile, to assess the cancer risk distribution. Spectral variability of SPEs was also examined, because the detailed energy spectra of protons, especially at high energies, are important for assessing the cancer risk associated with energetic particles for large events. We estimated the overall cumulative probability of the GCR environment for a specified mission period using a solar modulation model for the temporal characterization of the GCR environment, represented by the deceleration potential (φ). Probabilistic assessments of cancer fatality risk were calculated for various periods of lunar and Mars missions. This probabilistic approach to risk assessment from space radiation is in support of mission design and operational planning for future manned space exploration missions. In future work, this probabilistic approach to space radiation will be combined with a probabilistic approach to the radiobiological factors that contribute to the uncertainties in projecting cancer risks.

  1. MetaBase—the wiki-database of biological databases

    PubMed Central

    Bolser, Dan M.; Chibon, Pierre-Yves; Palopoli, Nicolas; Gong, Sungsam; Jacob, Daniel; Angel, Victoria Dominguez Del; Swan, Dan; Bassi, Sebastian; González, Virginia; Suravajhala, Prashanth; Hwang, Seungwoo; Romano, Paolo; Edwards, Rob; Bishop, Bryan; Eargle, John; Shtatland, Timur; Provart, Nicholas J.; Clements, Dave; Renfro, Daniel P.; Bhak, Daeui; Bhak, Jong

    2012-01-01

    Biology is generating more data than ever. As a result, there is an ever increasing number of publicly available databases that analyse, integrate and summarize the available data, providing an invaluable resource for the biological community. As this trend continues, there is a pressing need to organize, catalogue and rate these resources, so that the information they contain can be most effectively exploited. MetaBase (MB) (http://MetaDatabase.Org) is a community-curated database containing more than 2000 commonly used biological databases. Each entry is structured using templates and can carry various user comments and annotations. Entries can be searched, listed, browsed or queried. The database was created using the same MediaWiki technology that powers Wikipedia, allowing users to contribute on many different levels. The initial release of MB was derived from the content of the 2007 Nucleic Acids Research (NAR) Database Issue. Since then, approximately 100 databases have been manually collected from the literature, and users have added information for over 240 databases. MB is synchronized annually with the static Molecular Biology Database Collection provided by NAR. To date, there have been 19 significant contributors to the project; each one is listed as an author here to highlight the community aspect of the project. PMID:22139927

  2. USGS Dam Removal Science Database

    USGS Publications Warehouse

    Bellmore, J. Ryan; Vittum, Katherine; Duda, Jeff J.; Greene, Samantha L.

    2015-01-01

    This database is the result of an extensive literature search aimed at identifying documents relevant to the emerging field of dam removal science. In total the database contains 179 citations that contain empirical monitoring information associated with 130 different dam removals across the United States and abroad. The data include publications through 2014 and were supplemented with the U.S. Army Corps of Engineers National Inventory of Dams database, the U.S. Geological Survey National Water Information System, and aerial photos to estimate locations when coordinates were not provided. Publications were located using the Web of Science, Google Scholar, and the Clearinghouse for Dam Removal Information.

  3. Experiment Databases

    NASA Astrophysics Data System (ADS)

    Vanschoren, Joaquin; Blockeel, Hendrik

    Beyond running machine learning algorithms based on inductive queries, much can be learned by directly querying the combined results of many prior studies. Indeed, all around the globe, thousands of machine learning experiments are being executed on a daily basis, generating a constant stream of empirical information on machine learning techniques. While the information contained in these experiments might have many uses beyond their original intent, results are typically described very concisely in papers and discarded afterwards. If we properly store and organize these results in central databases, they can be immediately reused for further analysis, thus boosting future research. In this chapter, we propose the use of experiment databases: databases designed to collect all the necessary details of these experiments, and to intelligently organize them in online repositories to enable fast and thorough analysis of a myriad of collected results. They constitute an additional, queryable source of empirical meta-data based on principled descriptions of algorithm executions, without reimplementing the algorithms in an inductive database. As such, they engender a very dynamic, collaborative approach to experimentation, in which experiments can be freely shared, linked together, and immediately reused by researchers all over the world. They can be set up for personal use, to share results within a lab or to create open, community-wide repositories. Here, we provide a high-level overview of their design, and use an existing experiment database to answer various interesting research questions about machine learning algorithms and to verify a number of recent studies.

  4. Probabilistic Evaluation of Blade Impact Damage

    NASA Technical Reports Server (NTRS)

    Chamis, C. C.; Abumeri, G. H.

    2003-01-01

    The response to high velocity impact of a composite blade is probabilistically evaluated. The evaluation is focused on quantifying probabilistically the effects of uncertainties (scatter) in the variables that describe the impact, the blade make-up (geometry and material), the blade response (displacements, strains, stresses, frequencies), the blade residual strength after impact, and the blade damage tolerance. The results of the probabilistic evaluation are expressed in terms of probability cumulative distribution functions and probabilistic sensitivities. Results show that the blade has relatively low damage tolerance at 0.999 probability of structural failure and substantial damage tolerance at 0.01 probability.

  5. Semantics for Biological Data Resource: Cell Image Database

    National Institute of Standards and Technology Data Gateway

    SRD 165 NIST Semantics for Biological Data Resource: Cell Image Database (Web, free access)   This database is a prototype to test concepts for semantic searching of cell image data based on experimental details.

  6. Aggregated Interdisciplinary Databases and the Needs of Undergraduate Researchers

    ERIC Educational Resources Information Center

    Fister, Barbara; Gilbert, Julie; Fry, Amy Ray

    2008-01-01

    After seeing growing frustration among inexperienced undergraduate researchers searching a popular aggregated interdisciplinary database, the authors questioned whether the leading interdisciplinary databases are serving undergraduates' needs. As a preliminary exploration of this question, the authors queried vendors, analyzed their marketing…

  7. Solubility Database

    National Institute of Standards and Technology Data Gateway

    SRD 106 IUPAC-NIST Solubility Database (Web, free access)   These solubilities are compiled from 18 volumes of the International Union of Pure and Applied Chemistry (IUPAC)-NIST Solubility Data Series. The database includes liquid-liquid, solid-liquid, and gas-liquid systems. Typical solvents and solutes include water, seawater, heavy water, inorganic compounds, and a variety of organic compounds such as hydrocarbons, halogenated hydrocarbons, alcohols, acids, esters, and nitrogen compounds. There are over 67,500 solubility measurements and over 1800 references.

  8. Search Moves Made by Novice End Users.

    ERIC Educational Resources Information Center

    Wildemuth, Barbara M.; And Others

    1992-01-01

    Describes a study at the University of North Carolina at Chapel Hill that analyzed the transaction logs of medical students' searches of a factual database to determine the overall frequency of search moves, the interaction between the problem statement and students' search strategies, the search moves selected, and the tactics used by students.…

  9. Probabilistic risk assessment: Number 219

    SciTech Connect

    Bari, R.A.

    1985-11-13

    This report describes a methodology for analyzing the safety of nuclear power plants. A historical overview of plants in the US is provided, and past, present, and future nuclear safety and risk assessment are discussed. A primer on nuclear power plants is provided with a discussion of pressurized water reactors (PWR) and boiling water reactors (BWR) and their operation and containment. Probabilistic Risk Assessment (PRA), utilizing both event-tree and fault-tree analysis, is discussed as a tool in reactor safety, decision making, and communications. (FI)

  10. Probabilistic approach to EMP assessment

    SciTech Connect

    Bevensee, R.M.; Cabayan, H.S.; Deadrick, F.J.; Martin, L.C.; Mensing, R.W.

    1980-09-01

    The development of nuclear EMP hardness requirements must account for uncertainties in the environment, in interaction and coupling, and in the susceptibility of subsystems and components. Typical uncertainties of the last two kinds are briefly summarized, and an assessment methodology is outlined, based on a probabilistic approach that encompasses the basic concepts of reliability. It is suggested that statements of survivability be made compatible with system reliability. Validation of the approach taken for simple antenna/circuit systems is performed with experiments and calculations that involve a Transient Electromagnetic Range, numerical antenna modeling, separate device failure data, and a failure analysis computer program.

  11. Probabilistic Simulation for Nanocomposite Fracture

    NASA Technical Reports Server (NTRS)

    Chamis, Christos C.

    2010-01-01

    A unique probabilistic theory is described to predict the uniaxial strengths and fracture properties of nanocomposites. The simulation is based on composite micromechanics with progressive substructuring down to a nanoscale slice of a nanofiber where all the governing equations are formulated. These equations have been programmed in a computer code. That computer code is used to simulate uniaxial strengths and fracture of a nanofiber laminate. The results are presented graphically and discussed with respect to their practical significance. These results show smooth distributions from low probability to high.

  12. An evaluation, comparison, and accurate benchmarking of several publicly available MS/MS search algorithms: Sensitivity and Specificity analysis.

    SciTech Connect

    Kapp, Eugene; Schutz, Frederick; Connolly, Lisa M.; Chakel, John A.; Meza, Jose E.; Miller, Christine A.; Fenyo, David; Eng, Jimmy K.; Adkins, Joshua N.; Omenn, Gilbert; Simpson, Richard

    2005-08-01

    MS/MS and associated database search algorithms are essential proteomic tools for identifying peptides. Due to their widespread use, it is now time to perform a systematic analysis of the various algorithms currently in use. Using blood specimens from the HUPO Plasma Proteome Project, we have evaluated five search algorithms with respect to their sensitivity and specificity, and have also accurately benchmarked them based on specified false-positive (FP) rates. Spectrum Mill and SEQUEST performed well in terms of sensitivity, but were inferior to MASCOT, X-Tandem, and Sonar in terms of specificity. Overall, MASCOT, a probabilistic search algorithm, correctly identified most peptides based on a specified FP rate. The rescoring algorithm, PeptideProphet, enhanced the overall performance of the SEQUEST algorithm and provided predictable FP error rates. Ideally, score thresholds should be calculated for each peptide spectrum or, minimally, derived from a reversed-sequence search, as demonstrated in this study on a validated data set. The availability of open-source search algorithms, such as X-Tandem, makes it feasible to further improve the validation process (manual or automatic) on the basis of "consensus scoring", i.e., the use of multiple (at least two) search algorithms to reduce the number of FPs.
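
    A minimal sketch of the reversed-sequence validation step mentioned above: estimate the FP rate at a score threshold from decoy hits, then pick the loosest threshold that meets a target rate. The score lists and the 1% target are placeholders, not values from the study.

        def estimate_fdr(target_scores, decoy_scores, threshold):
            # Decoy-based estimate: decoys passing / targets passing the threshold.
            n_target = sum(s >= threshold for s in target_scores)
            n_decoy = sum(s >= threshold for s in decoy_scores)
            return (n_decoy / n_target) if n_target else 0.0

        def threshold_for_fdr(target_scores, decoy_scores, max_fdr=0.01):
            # Lowest score threshold whose estimated error rate meets the target.
            for t in sorted(set(target_scores)):
                if estimate_fdr(target_scores, decoy_scores, t) <= max_fdr:
                    return t
            return None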

  13. Artificial Intelligence Databases: A Survey and Comparison.

    ERIC Educational Resources Information Center

    Stern, David

    1990-01-01

    Identifies and describes online databases containing references to materials on artificial intelligence, robotics, and expert systems, and compares them in terms of scope and usage. Recommendations for conducting online searches on artificial intelligence and related fields are offered. (CLB)

  14. New probabilistic graphical models for genetic regulatory networks studies.

    PubMed

    Wang, Junbai; Cheung, Leo Wang-Kit; Delabie, Jan

    2005-12-01

    This paper introduces two new probabilistic graphical models for reconstruction of genetic regulatory networks using DNA microarray data. One is an independence graph (IG) model with either a forward or a backward search algorithm and the other one is a Gaussian network (GN) model with a novel greedy search method. The performances of both models were evaluated on four MAPK pathways in yeast and three simulated data sets. Generally, an IG model provides a sparse graph but a GN model produces a dense graph where more information about gene-gene interactions may be preserved. The results of our proposed models were compared with several other commonly used models, and our models have shown to give superior performance. Additionally, we found the same common limitations in the prediction of genetic regulatory networks when using only DNA microarray data.
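
    The independence-graph idea can be sketched with partial correlations derived from the precision matrix of the expression data; this simplified stand-in assumes roughly Gaussian data and a fixed threshold, and is not the forward/backward search procedure the authors describe.

        import numpy as np

        def independence_graph(X, threshold=0.2):
            # X: samples x genes expression matrix.
            corr = np.corrcoef(X, rowvar=False)
            prec = np.linalg.pinv(corr)                  # precision matrix
            d = np.sqrt(np.diag(prec))
            partial = -prec / np.outer(d, d)             # partial correlations
            np.fill_diagonal(partial, 0.0)
            # Connect gene pairs whose conditional dependence is non-negligible.
            edges = np.argwhere(np.triu(np.abs(partial) > threshold, k=1))
            return [tuple(e) for e in edges]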

  15. Expert searching in public health

    PubMed Central

    Alpi, Kristine M.

    2005-01-01

    Objective: The article explores the characteristics of public health information needs and the resources available to address those needs that distinguish it as an area of searching requiring particular expertise. Methods: Public health searching activities from reference questions and literature search requests at a large, urban health department library were reviewed to identify the challenges in finding relevant public health information. Results: The terminology of the information request frequently differed from the vocabularies available in the databases. Searches required the use of multiple databases and/or Web resources with diverse interfaces. Issues of the scope and features of the databases relevant to the search questions were considered. Conclusion: Expert searching in public health differs from other types of expert searching in the subject breadth and technical demands of the databases to be searched, the fluidity and lack of standardization of the vocabulary, and the relative scarcity of high-quality investigations at the appropriate level of geographic specificity. Health sciences librarians require a broad exposure to databases, gray literature, and public health terminology to perform as expert searchers in public health. PMID:15685281

  16. High Resolution Soil Water from Regional Databases and Satellite Images

    NASA Technical Reports Server (NTRS)

    Morris, Robin D.; Smelyanskiy, Vadim N.; Coughlin, Joseph; Dungan, Jennifer; Clancy, Daniel (Technical Monitor)

    2002-01-01

    This viewgraph presentation provides information on the ways in which plant growth can be inferred from satellite data and can then be used to infer soil water. There are several steps in this process, the first of which is the acquisition of data from satellite observations and relevant information databases such as the State Soil Geographic Database (STATSGO). Then probabilistic analysis and inversion with the Bayes' theorem reveals sources of uncertainty. The Markov chain Monte Carlo method is also used.
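
    A minimal illustration of the Bayesian inversion step: a Metropolis sampler draws from a posterior given only its unnormalized log density. The prior, observation, and noise level below are invented for the example, not taken from the presentation.

        import math, random

        def metropolis(log_post, x0, steps=10_000, scale=0.1):
            # Random-walk Metropolis: propose, then accept with prob min(1, ratio).
            x, lp = x0, log_post(x0)
            samples = []
            for _ in range(steps):
                xp = x + random.gauss(0.0, scale)
                lpp = log_post(xp)
                if random.random() < math.exp(min(0.0, lpp - lp)):
                    x, lp = xp, lpp
                samples.append(x)
            return samples

        # Toy posterior: soil-water fraction with prior N(0.3, 0.1) and one noisy
        # observation 0.42 of the (hypothetical) model output.
        log_post = lambda w: (-0.5 * ((w - 0.3) / 0.1) ** 2
                              - 0.5 * ((0.42 - w) / 0.05) ** 2)
        draws = metropolis(log_post, x0=0.3)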

  17. Integration of Information Retrieval and Database Management Systems.

    ERIC Educational Resources Information Center

    Deogun, Jitender S.; Raghavan, Vijay V.

    1988-01-01

    Discusses the motivation for integrating information retrieval and database management systems, and proposes a probabilistic retrieval model in which records in a file may be composed of attributes (formatted data items) and descriptors (content indicators). The details and resolutions of difficulties involved in integrating such systems are…

  18. Probabilistic Cue Combination: Less Is More

    ERIC Educational Resources Information Center

    Yurovsky, Daniel; Boyer, Ty W.; Smith, Linda B.; Yu, Chen

    2013-01-01

    Learning about the structure of the world requires learning probabilistic relationships: rules in which cues do not predict outcomes with certainty. However, in some cases, the ability to track probabilistic relationships is a handicap, leading adults to perform non-normatively in prediction tasks. For example, in the "dilution effect,"…

  19. Error Discounting in Probabilistic Category Learning

    ERIC Educational Resources Information Center

    Craig, Stewart; Lewandowsky, Stephan; Little, Daniel R.

    2011-01-01

    The assumption in some current theories of probabilistic categorization is that people gradually attenuate their learning in response to unavoidable error. However, existing evidence for this error discounting is sparse and open to alternative interpretations. We report 2 probabilistic-categorization experiments in which we investigated error…

  20. Probabilistic Tsunami Hazard Assessment: the Seaside, Oregon Pilot Study

    NASA Astrophysics Data System (ADS)

    Gonzalez, F. I.; Geist, E. L.; Synolakis, C.; Titov, V. V.

    2004-12-01

    A pilot study of Seaside, Oregon is underway, to develop methodologies for probabilistic tsunami hazard assessments that can be incorporated into Flood Insurance Rate Maps (FIRMs) developed by FEMA's National Flood Insurance Program (NFIP). Current NFIP guidelines for tsunami hazard assessment rely on the science, technology and methodologies developed in the 1970s; although generally regarded as groundbreaking and state-of-the-art for its time, this approach is now superseded by modern methods that reflect substantial advances in tsunami research achieved in the last two decades. In particular, post-1990 technical advances include: improvements in tsunami source specification; improved tsunami inundation models; better computational grids by virtue of improved bathymetric and topographic databases; a larger database of long-term paleoseismic and paleotsunami records and short-term, historical earthquake and tsunami records that can be exploited to develop improved probabilistic methodologies; better understanding of earthquake recurrence and probability models. The NOAA-led U.S. National Tsunami Hazard Mitigation Program (NTHMP), in partnership with FEMA, USGS, NSF and Emergency Management and Geotechnical agencies of the five Pacific States, incorporates these advances into site-specific tsunami hazard assessments for coastal communities in Alaska, California, Hawaii, Oregon and Washington. NTHMP hazard assessment efforts currently focus on developing deterministic, "credible worst-case" scenarios that provide valuable guidance for hazard mitigation and emergency management. The NFIP focus, on the other hand, is on actuarial needs that require probabilistic hazard assessments such as those that characterize 100- and 500-year flooding events. There are clearly overlaps in NFIP and NTHMP objectives. NTHMP worst-case scenario assessments that include an estimated probability of occurrence could benefit the NFIP; NFIP probabilistic assessments of 100- and 500-yr

  1. Probabilistic analysis of tsunami hazards

    USGS Publications Warehouse

    Geist, E.L.; Parsons, T.

    2006-01-01

    Determining the likelihood of a disaster is a key component of any comprehensive hazard assessment. This is particularly true for tsunamis, even though most tsunami hazard assessments have in the past relied on scenario or deterministic type models. We discuss probabilistic tsunami hazard analysis (PTHA) from the standpoint of integrating computational methods with empirical analysis of past tsunami runup. PTHA is derived from probabilistic seismic hazard analysis (PSHA), with the main difference being that PTHA must account for far-field sources. The computational methods rely on numerical tsunami propagation models rather than empirical attenuation relationships as in PSHA in determining ground motions. Because a number of source parameters affect local tsunami runup height, PTHA can become complex and computationally intensive. Empirical analysis can function in one of two ways, depending on the length and completeness of the tsunami catalog. For site-specific studies where there is sufficient tsunami runup data available, hazard curves can primarily be derived from empirical analysis, with computational methods used to highlight deficiencies in the tsunami catalog. For region-wide analyses and sites where there are little to no tsunami data, a computationally based method such as Monte Carlo simulation is the primary method to establish tsunami hazards. Two case studies that describe how computational and empirical methods can be integrated are presented for Acapulco, Mexico (site-specific) and the U.S. Pacific Northwest coastline (region-wide analysis).
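
    The computational branch of PTHA can be sketched as a Monte Carlo estimate of exceedance probabilities; the source rate, lognormal runup model, and every parameter below are hypothetical placeholders, not values from the study.

        import numpy as np

        rng = np.random.default_rng(7)

        def exceedance_probability(rate_per_yr, years, runup_levels, n_sims=50_000):
            # Sample tsunami counts over the window, draw a runup per event from
            # a (hypothetical) lognormal source model, and count windows whose
            # peak runup exceeds each level.
            levels = np.asarray(runup_levels, dtype=float)
            exceed = np.zeros(len(levels))
            for _ in range(n_sims):
                n = rng.poisson(rate_per_yr * years)
                peak = rng.lognormal(0.5, 0.8, size=n).max() if n else 0.0
                exceed += peak > levels
            return exceed / n_sims   # P(exceed within `years`)

        # For a T-year event, P(exceed in t years) ~= 1 - exp(-t / T).
        p = exceedance_probability(0.05, 100, runup_levels=[1, 2, 5, 10])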

  2. Software for Probabilistic Risk Reduction

    NASA Technical Reports Server (NTRS)

    Hensley, Scott; Michel, Thierry; Madsen, Soren; Chapin, Elaine; Rodriguez, Ernesto

    2004-01-01

    A computer program implements a methodology, denoted probabilistic risk reduction, that is intended to aid in planning the development of complex software and/or hardware systems. This methodology integrates two complementary prior methodologies: (1) that of probabilistic risk assessment and (2) a risk-based planning methodology, implemented in a prior computer program known as Defect Detection and Prevention (DDP), in which multiple requirements and the beneficial effects of risk-mitigation actions are taken into account. The present methodology and the software are able to accommodate both process knowledge (notably of the efficacy of development practices) and product knowledge (notably of the logical structure of a system, the development of which one seeks to plan). Estimates of the costs and benefits of a planned development can be derived. Functional and non-functional aspects of software can be taken into account, and trades made among them. It becomes possible to optimize the planning process in the sense that it becomes possible to select the best suite of process steps and design choices to maximize the expectation of success while remaining within budget.

  3. Is probabilistic evidence a source of knowledge?

    PubMed

    Friedman, Ori; Turri, John

    2015-07-01

    We report a series of experiments examining whether people ascribe knowledge for true beliefs based on probabilistic evidence. Participants were less likely to ascribe knowledge for beliefs based on probabilistic evidence than for beliefs based on perceptual evidence (Experiments 1 and 2A) or testimony providing causal information (Experiment 2B). Denial of knowledge for beliefs based on probabilistic evidence did not arise because participants viewed such beliefs as unjustified, nor because such beliefs leave open the possibility of error. These findings rule out traditional philosophical accounts for why probabilistic evidence does not produce knowledge. The experiments instead suggest that people deny knowledge because they distrust drawing conclusions about an individual based on reasoning about the population to which it belongs, a tendency previously identified by "judgment and decision making" researchers. Consistent with this, participants were more willing to ascribe knowledge for beliefs based on probabilistic evidence that is specific to a particular case (Experiments 3A and 3B).

  4. Markovian Search Games in Heterogeneous Spaces

    SciTech Connect

    Griffin, Christopher H

    2009-01-01

    We consider how to search for a mobile evader in a large heterogeneous region when sensors are used for detection. Sensors are modeled using probability of detection. Due to environmental effects, this probability will not be constant over the entire region. We map this problem to a graph search problem and, even though deterministic graph search is NP-complete, we derive a tractable, optimal, probabilistic search strategy. We do this by defining the problem as a differential game played on a Markov chain. We prove that this strategy is optimal in the sense of Nash. Simulations of an example problem illustrate our approach and verify our claims.

  5. Are we safe? NLM's household products database.

    PubMed

    Bronson Fitzpatrick, Roberta

    2004-01-01

    This column features an overview of the Division of Specialized Information Services, National Library of Medicine Household Products Database. Basic searching techniques are presented, as well as a brief overview of the data contained in this file. The Household Products Database contains information on chemical ingredients in various products used in U.S. homes.

  6. Six Online Periodical Databases: A Librarian's View.

    ERIC Educational Resources Information Center

    Willems, Harry

    1999-01-01

    Compares the following World Wide Web-based periodical databases, focusing on their usefulness in K-12 school libraries: EBSCO, Electric Library, Facts on File, SIRS, Wilson, and UMI. Search interfaces, display options, help screens, printing, home access, copyright restrictions, database administration, and making a decision are discussed. A…

  7. Web Database Development: Implications for Academic Publishing.

    ERIC Educational Resources Information Center

    Fernekes, Bob

    This paper discusses the preliminary planning, design, and development of a pilot project to create an Internet accessible database and search tool for locating and distributing company data and scholarly work. Team members established four project objectives: (1) to develop a Web accessible database and decision tool that creates Web pages on the…

  8. Developing an Inhouse Database from Online Sources.

    ERIC Educational Resources Information Center

    Smith-Cohen, Deborah

    1993-01-01

    Describes the development of an in-house bibliographic database by the U.S. Army Corp of Engineers Cold Regions Research and Engineering Laboratory on arctic wetlands research. Topics discussed include planning; identifying relevant search terms and commercial online databases; downloading citations; criteria for software selection; management…

  9. Electronic Reference Library: Silverplatter's Database Networking Solution.

    ERIC Educational Resources Information Center

    Millea, Megan

    Silverplatter's Electronic Reference Library (ERL) provides wide area network access to its databases using TCP/IP communications and client-server architecture. ERL has two main components: The ERL clients (retrieval interface) and the ERL server (search engines). ERL clients provide patrons with seamless access to multiple databases on multiple…

  10. RLIN Special Databases: Serving the Humanist.

    ERIC Educational Resources Information Center

    Muratori, Fred

    1990-01-01

    Describes online databases available through the Research Libraries Information Network (RLIN) that focus on research in the humanities. A survey of five databases (AVERY, ESTC, RIPD, SCIPIO, and AMC) is presented that includes contents, search capabilities, and record formats; and future plans are discussed. (12 references) (LRW)

  11. A database/knowledge structure for a robotics vision system

    NASA Technical Reports Server (NTRS)

    Dearholt, D. W.; Gonzales, N. N.

    1987-01-01

    Desirable properties of robotics vision database systems are given, and structures which possess properties appropriate for some aspects of such database systems are examined. Included in the structures discussed is a family of networks in which link membership is determined by measures of proximity between pairs of the entities stored in the database. This type of network is shown to have properties which guarantee that the search for a matching feature vector is monotonic. That is, the database can be searched with no backtracking, if there is a feature vector in the database which matches the feature vector of the external entity which is to be identified. The construction of the database is discussed, and the search procedure is presented. A section on the support provided by the database for description of the decision-making processes and the search path is also included.
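
    The monotonic, no-backtracking search can be illustrated with a greedy descent over a proximity network; the guarantee holds only when the graph is constructed as the paper describes, and the helper below is a generic sketch. Here vectors is an array of stored feature vectors and neighbors maps each node to its proximity-linked nodes.

        import numpy as np

        def greedy_search(vectors, neighbors, query, start=0):
            # From the current node, move to whichever neighbor is closer to the
            # query; stop at a node none of whose neighbors improve the distance.
            current = start
            dist = np.linalg.norm(vectors[current] - query)
            while True:
                best, best_d = current, dist
                for nb in neighbors[current]:
                    d = np.linalg.norm(vectors[nb] - query)
                    if d < best_d:
                        best, best_d = nb, d
                if best == current:      # no closer neighbor: match (or no match)
                    return current, dist
                current, dist = best, best_d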

  12. NASA Taxonomies for Searching Problem Reports and FMEAs

    NASA Technical Reports Server (NTRS)

    Malin, Jane T.; Throop, David R.

    2006-01-01

    Many types of hazard and risk analyses are used during the life cycle of complex systems, including Failure Modes and Effects Analysis (FMEA), Hazard Analysis, Fault Tree and Event Tree Analysis, Probabilistic Risk Assessment, Reliability Analysis and analysis of Problem Reporting and Corrective Action (PRACA) databases. The success of these methods depends on the availability of input data and the analyst's knowledge. Standard nomenclature can increase the reusability of hazard, risk and problem data. When nomenclature in the source texts is not standard, taxonomies with mapping words (sets of rough synonyms) can be combined with semantic search to identify items and tag them with metadata based on a rich standard nomenclature. Semantic search uses word meanings in the context of parsed phrases to find matches. The NASA taxonomies provide the word meanings. Spacecraft taxonomies and ontologies (generalization hierarchies with attributes and relationships, based on terms' meanings) are being developed for types of subsystems, functions, entities, hazards and failures. The ontologies are broad and general, covering hardware, software and human systems. Semantic search of Space Station texts was used to validate and extend the taxonomies. The taxonomies have also been used to extract system connectivity (interaction) models and functions from requirements text. Now the Reconciler semantic search tool and the taxonomies are being applied to improve search in the Space Shuttle PRACA database, to discover recurring patterns of failure. Usual methods of string search and keyword search fall short because the entries are terse and have numerous shortcuts (irregular abbreviations, nonstandard acronyms, cryptic codes) and modifier words cannot be used in sentence context to refine the search. The limited and fixed FMEA categories associated with the entries do not make the fine distinctions needed in the search. The approach assigns PRACA report titles to problem classes in

  13. Windows on the brain: the emerging role of atlases and databases in neuroscience

    NASA Technical Reports Server (NTRS)

    Van Essen, David C.; VanEssen, D. C. (Principal Investigator)

    2002-01-01

    Brain atlases and associated databases have great potential as gateways for navigating, accessing, and visualizing a wide range of neuroscientific data. Recent progress towards realizing this potential includes the establishment of probabilistic atlases, surface-based atlases and associated databases, combined with improvements in visualization capabilities and internet access.

  14. Spectroscopic data for an astronomy database

    NASA Technical Reports Server (NTRS)

    Parkinson, W. H.; Smith, Peter L.

    1995-01-01

    Very few of the atomic and molecular data used in analyses of astronomical spectra are currently available in World Wide Web (WWW) databases that are searchable with hypertext browsers. We have begun to rectify this situation by making extensive atomic data files available with simple search procedures. We have also established links to other on-line atomic and molecular databases. All can be accessed from our database homepage with URL: http://cfa-www.harvard.edu/amp/data/amdata.html.

  15. A Probabilistic Model for Reducing Medication Errors

    PubMed Central

    Nguyen, Phung Anh; Syed-Abdul, Shabbir; Iqbal, Usman; Hsu, Min-Huei; Huang, Chen-Ling; Li, Hsien-Chang; Clinciu, Daniel Livius; Jian, Wen-Shan; Li, Yu-Chuan Jack

    2013-01-01

    Background: Medication errors are common, life threatening, costly, but preventable. Information technology and automated systems are highly efficient for preventing medication errors and are therefore widely employed in hospital settings. The aim of this study was to construct a probabilistic model that can reduce medication errors by identifying uncommon or rare associations between medications and diseases. Methods and Findings: Association rule mining techniques were applied to 103.5 million prescriptions from Taiwan's National Health Insurance database. The dataset included 204.5 million diagnoses with ICD9-CM codes and 347.7 million medications coded with ATC codes. Disease-Medication (DM) and Medication-Medication (MM) associations were computed from their co-occurrence, and association strength was measured by the interestingness, or lift, value, referred to here as the Q value. The DMQs and MMQs were used to develop the AOP model to predict the appropriateness of a given prescription. The model was validated by comparing its evaluations against those verified by human experts. The results showed 96% accuracy for appropriate and 45% accuracy for inappropriate prescriptions, with a sensitivity and specificity of 75.9% and 89.5%, respectively. Conclusions: We successfully developed the AOP model as an efficient tool for automatic identification of uncommon or rare disease-medication and medication-medication associations in prescriptions. The AOP model helps reduce medication errors by alerting physicians, improving patient safety and the overall quality of care. PMID:24312659
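
    The interestingness (lift) measure reduces to a one-line computation; the counts below are invented to show the arithmetic, not drawn from the Taiwan dataset.

        def lift(n_dm, n_d, n_m, n_total):
            # Q for a disease-medication pair: observed co-occurrence rate divided
            # by the rate expected if the two were prescribed independently.
            p_dm = n_dm / n_total      # P(disease and medication together)
            p_d = n_d / n_total        # P(disease)
            p_m = n_m / n_total        # P(medication)
            return p_dm / (p_d * p_m)

        # Toy counts: 1,200 co-occurrences in 1,000,000 prescriptions gives Q = 2,
        # i.e., the pairing is twice as common as independence would predict.
        q = lift(n_dm=1_200, n_d=20_000, n_m=30_000, n_total=1_000_000)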

  16. Probabilistic Reasoning for Plan Robustness

    NASA Technical Reports Server (NTRS)

    Schaffer, Steve R.; Clement, Bradley J.; Chien, Steve A.

    2005-01-01

    A planning system must reason about the uncertainty of continuous variables in order to accurately project the possible system state over time. A method is devised for directly reasoning about the uncertainty in continuous activity duration and resource usage for planning problems. By representing random variables as parametric distributions, computing projected system state can be simplified in some cases. Common approximation and novel methods are compared for over-constrained and lightly constrained domains. The system compares a few common approximation methods for an iterative repair planner. Results show improvements in robustness over the conventional non-probabilistic representation by reducing the number of constraint violations witnessed by execution. The improvement is more significant for larger problems and problems with higher resource subscription levels but diminishes as the system is allowed to accept higher risk levels.
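
    The parametric-distribution idea admits a closed form in the simplest setting: if independent activity durations are modeled as normals, their sum is normal and the deadline-violation probability follows directly. The activities and deadline below are hypothetical.

        import math

        def p_violation(durations, deadline):
            # durations: list of (mean, std) for independent normal activities.
            mean = sum(m for m, s in durations)
            var = sum(s * s for m, s in durations)
            z = (deadline - mean) / math.sqrt(var)
            # P(total duration > deadline) = 1 - Phi(z).
            return 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

        # Three activities (hours) against a 12-hour deadline: ~13% risk.
        risk = p_violation([(3.0, 0.5), (4.0, 1.0), (3.5, 0.7)], deadline=12.0)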

  17. Probabilistic cloning of equidistant states

    SciTech Connect

    Jimenez, O.; Roa, Luis; Delgado, A.

    2010-08-15

    We study the probabilistic cloning of equidistant states. These states are such that the inner product between them is a complex constant or its conjugate. Thereby, it is possible to study their cloning in a simple way. In particular, we are interested in the behavior of the cloning probability as a function of the phase of the overlap among the involved states. We show that for certain families of equidistant states Duan and Guo's cloning machine leads to cloning probabilities lower than the optimal unambiguous discrimination probability of equidistant states. We propose an alternative cloning machine whose cloning probability is higher than or equal to the optimal unambiguous discrimination probability for any family of equidistant states. Both machines achieve the same probability for equidistant states whose inner product is a positive real number.

  18. Probabilistic direct counterfactual quantum communication

    NASA Astrophysics Data System (ADS)

    Zhang, Sheng

    2017-02-01

    It is striking that the quantum Zeno effect can be used to launch direct counterfactual communication between two spatially separated parties, Alice and Bob. So far, existing protocols of this type provide only a deterministic counterfactual communication service. However, this counterfactuality comes at a price. First, the transmission time is much longer than that of a classical transmission. Second, the chained-cycle structure makes these protocols more sensitive to channel noise. Here, we extend the idea of counterfactual communication and present a probabilistic-counterfactual quantum communication protocol, which is proved to have advantages over the deterministic ones. Moreover, the presented protocol could evolve to a deterministic one solely by adjusting the parameters of the beam splitters. Project supported by the National Natural Science Foundation of China (Grant No. 61300203).

  19. Probabilistic Fatigue Damage Program (FATIG)

    NASA Technical Reports Server (NTRS)

    Michalopoulos, Constantine

    2012-01-01

    FATIG computes fatigue damage/fatigue life using the stress rms (root mean square) value, the total number of cycles, and S-N curve parameters. The damage is computed by the following methods: (a) the traditional method using Miner's rule with stress cycles determined from a Rayleigh distribution up to 3*sigma; and (b) the classical fatigue damage formula involving the Gamma function, which is derived from the integral version of Miner's rule. The integration is carried out over all stress amplitudes. This software solves the problem of probabilistic fatigue damage using the integral form of the Palmgren-Miner rule. The software computes fatigue life using an approach involving all stress amplitudes, up to N*sigma, as specified by the user. It can be used in the design of structural components subjected to random dynamic loading, or by any stress analyst with minimal training for fatigue life estimates of structural components.
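
    Method (b) has a standard closed form for a narrow-band Gaussian stress history, shown below with invented S-N parameters; it assumes an S-N curve of the form N_f = C * S**(-m) and Rayleigh-distributed stress amplitudes, and is a sketch of the classical formula rather than the FATIG code itself.

        import math

        def rayleigh_fatigue_damage(sigma_rms, n_cycles, m, C):
            # Integrating Miner's rule over Rayleigh-distributed amplitudes gives
            # E[D] = (n/C) * (sqrt(2)*sigma_rms)**m * Gamma(1 + m/2).
            return (n_cycles / C) * (math.sqrt(2.0) * sigma_rms) ** m \
                * math.gamma(1.0 + m / 2.0)

        # Illustrative values only: 30 MPa rms, 1e7 cycles, S-N exponent m = 3.
        d = rayleigh_fatigue_damage(sigma_rms=30.0, n_cycles=1e7, m=3.0, C=1e12)
        # d >= 1 would indicate predicted fatigue failure under Miner's rule.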

  20. Development of probabilistic multimedia multipathway computer codes.

    SciTech Connect

    Yu, C.; LePoire, D.; Gnanapragasam, E.; Arnish, J.; Kamboj, S.; Biwer, B. M.; Cheng, J.-J.; Zielen, A. J.; Chen, S. Y.; Mo, T.; Abu-Eid, R.; Thaggard, M.; Sallo, A., III.; Peterson, H., Jr.; Williams, W. A.; Environmental Assessment; NRC; EM

    2002-01-01

    The deterministic multimedia dose/risk assessment codes RESRAD and RESRAD-BUILD have been widely used for many years for evaluation of sites contaminated with residual radioactive materials. The RESRAD code applies to the cleanup of sites (soils) and the RESRAD-BUILD code applies to the cleanup of buildings and structures. This work describes the procedure used to enhance the deterministic RESRAD and RESRAD-BUILD codes for probabilistic dose analysis. A six-step procedure was used in developing default parameter distributions and the probabilistic analysis modules. These six steps include (1) listing and categorizing parameters; (2) ranking parameters; (3) developing parameter distributions; (4) testing parameter distributions for probabilistic analysis; (5) developing probabilistic software modules; and (6) testing probabilistic modules and integrated codes. The procedures used can be applied to the development of other multimedia probabilistic codes. The probabilistic versions of RESRAD and RESRAD-BUILD codes provide tools for studying the uncertainty in dose assessment caused by uncertain input parameters. The parameter distribution data collected in this work can also be applied to other multimedia assessment tasks and multimedia computer codes.

  1. Searches Conducted for Engineers.

    ERIC Educational Resources Information Center

    Lorenz, Patricia

    This paper reports an industrial information specialist's experience in performing online searches for engineers and surveys the databases used. Engineers seeking assistance fall into three categories: (1) those who recognize the value of online retrieval; (2) those referred by colleagues; and (3) those who do not seek help. As more successful searches…

  2. Hierarchical searching in model-based LADAR ATR using statistical separability tests

    NASA Astrophysics Data System (ADS)

    DelMarco, Stephen; Sobel, Erik; Douglas, Joel

    2006-05-01

    In this work we investigate simultaneous object identification improvement and efficient library search for model-based object recognition applications. We develop an algorithm to provide efficient, prioritized, hierarchical searching of the object model database. A common approach to model-based object recognition chooses the object label corresponding to the best match score. However, due to corrupting effects the best match score does not always correspond to the correct object model. To address this problem, we propose a search strategy which exploits information contained in a number of representative elements of the library to drill down to a small class with high probability of containing the object. We first optimally partition the library into a hierarchic taxonomy of disjoint classes. A small number of representative elements are used to characterize each object model class. At each hierarchy level, the observed object is matched against the representative elements of each class to generate score sets. A hypothesis testing problem, using a distribution-free statistical test, is defined on the score sets and used to choose the appropriate class for a prioritized search. We conduct a probabilistic analysis of the computational cost savings, and provide a formula measuring the computational advantage of the proposed approach. We generate numerical results using match scores derived from matching highly-detailed CAD models of civilian ground vehicles used in 3-D LADAR ATR. We present numerical results showing effects on classification performance of significance level and representative element number in the score set hypothesis testing problem.
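
    The class-prioritization step can be sketched with a two-sample Kolmogorov-Smirnov test standing in for the distribution-free statistic (an assumption, since the abstract does not name the test): score the observed object against each class's representative elements, then search first the classes whose score sets separate most strongly from a background of non-match scores.

        from scipy.stats import ks_2samp

        def prioritize_classes(scores_by_class, background_scores):
            # scores_by_class: class -> match scores of the observed object
            # against that class's representative elements.
            stat = lambda c: ks_2samp(scores_by_class[c],
                                      background_scores).statistic
            # Most separable class first: drill down there before the others.
            return sorted(scores_by_class, key=stat, reverse=True)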

  3. Needle Federated Search Engine

    SciTech Connect

    2009-12-01

    The Idaho National Laboratory (INL) has combined a number of technologies, tools, and resources to accomplish a new means of federating search results. The resulting product is a search engine called Needle, an open-source-based tool that the INL uses internally for researching across a wide variety of information repositories. Needle has a flexible search interface that allows end users to point at any available data source. A user can select multiple sources such as commercial databases (Web of Science, Engineering Index), external resources (WorldCat, Google Scholar), and internal corporate resources (email, document management system, library collections) in a single interface with one search query. In the future, INL hopes to offer this open-source engine to the public. This session will outline the development processes for making Needle's search interface and simplifying the federation of internal and external data sources.

  4. A Search Engine Features Comparison.

    ERIC Educational Resources Information Center

    Vorndran, Gerald

    Until recently, the World Wide Web (WWW) public access search engines have not included many of the advanced commands, options, and features commonly available with the for-profit online database user interfaces, such as DIALOG. This study evaluates the features and characteristics common to both types of search interfaces, examines the Web search…

  5. Probabilistic population projections with migration uncertainty.

    PubMed

    Azose, Jonathan J; Ševčíková, Hana; Raftery, Adrian E

    2016-06-07

    We produce probabilistic projections of population for all countries based on probabilistic projections of fertility, mortality, and migration. We compare our projections to those from the United Nations' Probabilistic Population Projections, which uses similar methods for fertility and mortality but deterministic migration projections. We find that uncertainty in migration projection is a substantial contributor to uncertainty in population projections for many countries. Prediction intervals for the populations of Northern America and Europe are over 70% wider, whereas prediction intervals for the populations of Africa, Asia, and the world as a whole are nearly unchanged. Out-of-sample validation shows that the model is reasonably well calibrated.
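
    A toy Monte Carlo projection shows the mechanism: sampling net migration rather than fixing it widens the resulting prediction interval. Every rate and count below is invented for illustration and is unrelated to the UN model.

        import numpy as np

        rng = np.random.default_rng(0)

        def project(pop0, years, n_sims=20_000, g_mu=0.005, g_sd=0.002,
                    mig_mu=50_000, mig_sd=40_000):
            # Each year, draw a growth rate and a net migration count per path.
            pops = np.full(n_sims, float(pop0))
            for _ in range(years):
                g = rng.normal(g_mu, g_sd, n_sims)
                mig = rng.normal(mig_mu, mig_sd, n_sims)
                pops = pops * (1.0 + g) + mig
            return np.percentile(pops, [2.5, 50, 97.5])  # 95% prediction interval

        lo, med, hi = project(pop0=10_000_000, years=35)
        # Setting mig_sd=0 collapses much of the spread between lo and hi.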

  6. Probabilistic machine learning and artificial intelligence.

    PubMed

    Ghahramani, Zoubin

    2015-05-28

    How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery.

  7. Probabilistic machine learning and artificial intelligence

    NASA Astrophysics Data System (ADS)

    Ghahramani, Zoubin

    2015-05-01

    How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery.

  8. Using KLSH to Rapidly Search Large Seismic Signal Archives on a Desktop Computer

    NASA Astrophysics Data System (ADS)

    Young, C. J.; Woodbridge, J.; Shaw, R.; Slinkard, M.

    2015-12-01

    The use of waveform correlation detection has become increasingly important in the last decade; as the basic calculation is straightforward and the online archives of past signals are ever-increasing, the use of the technique should only become more widespread. Yet there is an inherent limitation in how widely the method can be applied due to the computational demands of searching large signal archives quickly. In this study, we investigate the applicability of Kernelized Locality-Sensitive Hashing (KLSH) to decrease the computational requirements significantly, to the point that searches can be done on a commodity desktop computer. KLSH probabilistically interrogates the database such that much of it is ignored when searching for closest matches, thereby dramatically reducing the number of correlations that need to be calculated. We evaluate KLSH using data from the IMS primary station MKAR. First we built a KLSH-indexed archive using all associated signals from the IDC LEB catalog for 2002-2013 (~308,000 signals). We then tested the signal matching capability using the ~26,000 IDC-detected signals from 2014, including a variety of regional and teleseismic phases (56% are teleseismic P). We used the LEB phase assignments as ground truth to score the results. Using a simple 0.60 correlation threshold, requiring at least two archive matches, and applying screening criteria based on consistency of the metadata of archive matches, we were able to robustly identify 12% of the 2014 signals, including many teleseismic P phases from a variety of locations. Comparing KLSH against a full search, we established a recall rate of > 0.9, with search time on the order of 10 ms.
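
    The bucketing idea can be sketched with plain sign-random-projection LSH (the kernelized variant used in the study is more involved): a query is correlated only against archive signals that land in its hash bucket, so most of the archive is never touched.

        import numpy as np

        rng = np.random.default_rng(1)

        def build_lsh_index(archive, n_bits=16):
            # Sign of random projections buckets similar waveforms together.
            planes = rng.standard_normal((n_bits, archive.shape[1]))
            index = {}
            for i, row in enumerate((archive @ planes.T > 0).astype(np.uint8)):
                index.setdefault(bytes(row), []).append(i)
            return planes, index

        def query(signal, planes, index, archive, threshold=0.6):
            key = bytes((planes @ signal > 0).astype(np.uint8))
            hits = []
            for i in index.get(key, []):     # candidates sharing the bucket
                a = archive[i]
                cc = np.dot(a, signal) / (np.linalg.norm(a) * np.linalg.norm(signal))
                if cc >= threshold:
                    hits.append((i, cc))
            return hits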

  9. Literature searches on Ayurveda: An update

    PubMed Central

    Aggithaya, Madhur G.; Narahari, Saravu R.

    2015-01-01

    Introduction: The journals that publish on Ayurveda have increasingly been indexed by popular medical databases in recent years. However, many Eastern journals are not indexed in biomedical journal databases such as PubMed. Literature searches for Ayurveda continue to be challenging due to the nonavailability of active, unbiased, dedicated databases for Ayurvedic literature. In 2010, the authors identified 46 databases that can be used for systematic searches of Ayurvedic papers and theses. This update reviews our previous recommendation and identifies currently relevant databases. Aims: To provide an update on Ayurveda literature searching and a strategy to retrieve the maximum number of publications. Methods: The authors used psoriasis as an example to search the previously listed databases and to identify new ones. The population, intervention, control, and outcome (PICO) table included keywords related to psoriasis and Ayurvedic terminologies for skin diseases. Current citation update status, search results, and search options of the previous databases were assessed. Eight search strategies were developed. One hundred and five journals, both biomedical and Ayurvedic, that publish on Ayurveda were identified. Variability across databases was explored to identify bias in journal citation. Results: Five among the 46 databases are now relevant – AYUSH research portal, Annotated Bibliography of Indian Medicine, Digital Helpline for Ayurveda Research Articles (DHARA), PubMed, and the Directory of Open Access Journals. Search options in these databases are not uniform, and only PubMed allows a complex search strategy. "The Researches in Ayurveda" and the "Ayurvedic Research Database" (ARD) are important grey resources for hand searching. About 44/105 (41.5%) of the journals publishing Ayurvedic studies are not indexed in any database. Only 11/105 (10.4%) exclusively Ayurvedic journals are indexed in PubMed. Conclusion: The AYUSH research portal and DHARA are two major portals after 2010. It is mandatory to search PubMed and four other databases because all five

  10. Variable stars in the MACHO Collaboration database

    SciTech Connect

    Cook, K.H.; Alcock, C.; Allsman, R.A.

    1995-02-01

    The MACHO Collaboration's search for baryonic dark matter via its gravitational microlensing signature has generated a massive database of time-ordered photometry of millions of stars in the LMC and the bulge of the Milky Way. The search's experimental design and capabilities are reviewed and the dark matter results are briefly noted. Preliminary analysis of the ~39,000 variable stars discovered in the LMC database is presented and examples of periodic variables are shown. A class of aperiodically variable Be stars is described which is the closest background to microlensing which has been found. Plans for future work on variable stars using the MACHO data are described.

  11. Atomic Databases

    NASA Astrophysics Data System (ADS)

    Mendoza, Claudio

    2000-10-01

    Atomic and molecular data are required in a variety of fields ranging from traditional astronomy, atmospheric science, and fusion research to fast-growing technologies such as lasers, lighting, low-temperature plasmas, plasma-assisted etching, and radiotherapy. In this context, a number of research groups, both theoretical and experimental, scattered around the world attend to most of this data demand, but the implementation of atomic databases has grown independently out of sheer necessity. In some cases the latter has been associated with the data production process or with data centers involved in data collection and evaluation; sometimes it has been the result of individual initiatives that have been quite successful. In any case, the development and maintenance of atomic databases call for a number of skills and an entrepreneurial spirit that are not usually associated with most physics researchers. In the present report we highlight developments in this area over the past five years and discuss what we think are some of the main issues that have to be addressed.

  12. Hierarchical Spatio-Temporal Probabilistic Graphical Model with Multiple Feature Fusion for Binary Facial Attribute Classification in Real-World Face Videos.

    PubMed

    Demirkus, Meltem; Precup, Doina; Clark, James J; Arbel, Tal

    2016-06-01

    Recent literature shows that facial attributes, i.e., contextual facial information, can be beneficial for improving the performance of real-world applications, such as face verification, face recognition, and image search. Examples of face attributes include gender, skin color, facial hair, etc. How to robustly obtain these facial attributes (traits) is still an open problem, especially in the presence of the challenges of real-world environments: non-uniform illumination conditions, arbitrary occlusions, motion blur and background clutter. What makes this problem even more difficult is the enormous variability presented by the same subject, due to arbitrary face scales, head poses, and facial expressions. In this paper, we focus on the problem of facial trait classification in real-world face videos. We have developed a fully automatic hierarchical and probabilistic framework that models the collective set of frame class distributions and feature spatial information over a video sequence. The experiments are conducted on a large real-world face video database that we have collected, labelled and made publicly available. The proposed method is flexible enough to be applied to any facial classification problem. Experiments on the large, real-world video database McGillFaces [1] of 18,000 video frames reveal that the proposed framework outperforms alternative approaches, by up to 16.96% and 10.13%, for the facial attributes of gender and facial hair, respectively.

  13. Effects of distributed database modeling on evaluation of transaction rollbacks

    NASA Technical Reports Server (NTRS)

    Mukkamala, Ravi

    1991-01-01

    Data distribution, degree of data replication, and transaction access patterns are key factors in determining the performance of distributed database systems. In order to simplify the evaluation of performance measures, database designers and researchers tend to make simplistic assumptions about the system. The effect of such modeling assumptions on the evaluation of one such measure, the number of transaction rollbacks, is studied in a partitioned distributed database system. Six probabilistic models are developed, and expressions for the number of rollbacks under each of these models are derived. Essentially, the models differ in terms of the available system information. The analytical results so obtained are compared to results from simulation. From this, it is concluded that most of the probabilistic models yield overly conservative estimates of the number of rollbacks. The effect of transaction commutativity on system throughput is also grossly underestimated when such models are employed.

  14. On Building a Search Interface Discovery System

    NASA Astrophysics Data System (ADS)

    Shestakov, Denis

    A huge portion of the Web, known as the deep Web, is accessible via search interfaces to myriads of databases on the Web. While relatively good approaches for querying the contents of web databases have been proposed recently, one cannot fully utilize them while most search interfaces remain undiscovered. Thus, the automatic recognition of search interfaces to online databases is crucial for any application accessing the deep Web. This paper describes the architecture of the I-Crawler, a system for finding and classifying search interfaces. The I-Crawler is intentionally designed to be used in deep web characterization surveys and for constructing directories of deep web resources.

  15. The Eruption Forecasting Information System (EFIS) database project

    NASA Astrophysics Data System (ADS)

    Ogburn, Sarah; Harpel, Chris; Pesicek, Jeremy; Wellik, Jay; Pallister, John; Wright, Heather

    2016-04-01

    The Eruption Forecasting Information System (EFIS) project is a new initiative of the U.S. Geological Survey-USAID Volcano Disaster Assistance Program (VDAP) with the goal of enhancing VDAP's ability to forecast the outcome of volcanic unrest. The EFIS project seeks to: (1) move away from relying on collective memory and toward probability estimation using databases; (2) create databases useful for pattern recognition and for answering common VDAP questions, e.g., how commonly does unrest lead to eruption? how commonly do phreatic eruptions portend magmatic eruptions, and what is the range of antecedence times? (3) create generic probabilistic event trees using global data for different volcano 'types'; (4) create background, volcano-specific probabilistic event trees for frequently active or particularly hazardous volcanoes in advance of a crisis; and (5) quantify and communicate uncertainty in probabilities. A major component of the project is the global EFIS relational database, which contains multiple modules designed to aid in the construction of probabilistic event trees and to answer common questions that arise during volcanic crises. The primary module contains chronologies of volcanic unrest, including the timing of phreatic eruptions, column heights, eruptive products, etc., and will be initially populated using chronicles of eruptive activity from Alaskan volcanic eruptions in the GeoDIVA database (Cameron et al. 2013). This database module allows us to query across other global databases such as the WOVOdat database of monitoring data and the Smithsonian Institution's Global Volcanism Program (GVP) database of eruptive histories and volcano information. The EFIS database is in the early stages of development and population; thus, this contribution also serves as a request for feedback from the community.

  16. Subject Retrieval from Full-Text Databases in the Humanities

    ERIC Educational Resources Information Center

    East, John W.

    2007-01-01

    This paper examines the problems involved in subject retrieval from full-text databases of secondary materials in the humanities. Ten such databases were studied and their search functionality evaluated, focusing on factors such as Boolean operators, document surrogates, limiting by subject area, proximity operators, phrase searching, wildcards,…

  17. Databases for Computer Science and Electronics: COMPENDEX, ELCOM, and INSPEC.

    ERIC Educational Resources Information Center

    Marsden, Tom; Laub, Barbara

    1981-01-01

    Describes the selection policies, subject access, search aids, indexing, coverage, and currency of three online databases in the fields of electronics and computer science: COMPENDEX, ELCOM, and INSPEC. Sample searches are displayed for each database. A bibliography cites five references. (FM)

  18. Robot-Generated Databases on the World Wide Web.

    ERIC Educational Resources Information Center

    Kimmel, Stacey

    1996-01-01

    Provides an overview of robots that retrieve World Wide Web documents and index data and then store it in a database. Nine robot-generated databases are described, including record content, services, search features, and sample search results; and sidebars discuss the controversy about Web robots and other resource discovery tools. (LRW)

  19. Detection and measurement of fetal anatomies from ultrasound images using a constrained probabilistic boosting tree.

    PubMed

    Carneiro, Gustavo; Georgescu, Bogdan; Good, Sara; Comaniciu, Dorin

    2008-09-01

    We propose a novel method for the automatic detection and measurement of fetal anatomical structures in ultrasound images. This problem offers a myriad of challenges, including the difficulty of modeling the appearance variations of the visual object of interest, robustness to speckle noise and signal dropout, and the large search space of the detection procedure. Previous solutions typically rely on the explicit encoding of prior knowledge and on formulating the problem as a perceptual grouping task solved through clustering or variational approaches. These methods are constrained by the validity of their underlying assumptions and are usually insufficient to capture the complex appearances of fetal anatomies. We propose a novel system for fast automatic detection and measurement of fetal anatomies that directly exploits a large database of expert-annotated fetal anatomical structures in ultrasound images. Our method learns automatically to distinguish between the appearance of the object of interest and the background by training a constrained probabilistic boosting tree classifier. The system produces automatic segmentations of several fetal anatomies using the same basic detection algorithm. We show results on fully automatic measurement of biparietal diameter (BPD), head circumference (HC), abdominal circumference (AC), femur length (FL), humerus length (HL), and crown rump length (CRL). Our approach is the first in the literature to deal with the HL and CRL measurements. Extensive experiments (with clinical validation) show that our system is, on average, close to the accuracy of experts in terms of segmentation and obstetric measurements. The system runs in under half a second on a standard dual-core PC.
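
    The constrained probabilistic boosting tree itself is described in the paper; the simplified sketch below shows only the generic way a probabilistic boosting tree (after Tu, 2005) combines per-node classifier outputs into a posterior, with stub classifiers standing in for trained ones and the paper's constraints omitted.

    ```python
    # Simplified probabilistic-boosting-tree posterior: each node holds a soft
    # classifier q(x) = P(object | x); subtree posteriors are mixed by q.
    class PBTNode:
        def __init__(self, clf, left=None, right=None):
            self.clf = clf      # callable x -> probability at this node
            self.left = left    # subtree refining the negative branch
            self.right = right  # subtree refining the positive branch

        def posterior(self, x):
            q = self.clf(x)
            p_left = self.left.posterior(x) if self.left else 0.0
            p_right = self.right.posterior(x) if self.right else 1.0
            return (1.0 - q) * p_left + q * p_right

    # Toy tree: the root is uncertain, its children are more confident.
    root = PBTNode(lambda x: 0.6,
                   left=PBTNode(lambda x: 0.1),
                   right=PBTNode(lambda x: 0.9))
    print(root.posterior(x=None))   # 0.4*0.1 + 0.6*0.9 = 0.58
    ```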

  20. Information-Limited Parallel Processing in Difficult Heterogeneous Covert Visual Search

    ERIC Educational Resources Information Center

    Dosher, Barbara Anne; Han, Songmei; Lu, Zhong-Lin

    2010-01-01

    Difficult visual search is often attributed to time-limited serial attention operations, although neural computations in the early visual system are parallel. Using probabilistic search models (Dosher, Han, & Lu, 2004) and a full time-course analysis of the dynamics of covert visual search, we distinguish unlimited capacity parallel versus serial…

  1. Non-unitary probabilistic quantum computing

    NASA Technical Reports Server (NTRS)

    Gingrich, Robert M.; Williams, Colin P.

    2004-01-01

    We present a method for designing quantum circuits that perform non-unitary quantum computations on n-qubit states probabilistically, and give analytic expressions for the success probability and fidelity.
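
    The abstract gives no circuit details here; a standard piece of linear algebra behind such constructions (not necessarily the authors' design) is to rescale the non-unitary operator and embed it in a unitary on a doubled space, with the success probability read off the post-selected branch. A sketch under those assumptions:

    ```python
    # Unitary dilation of a (rescaled) non-unitary operator M:
    # U = [[M, sqrt(I - M M+)], [sqrt(I - M+ M), -M+]] is unitary when ||M|| <= 1.
    import numpy as np
    from scipy.linalg import sqrtm, svdvals

    def dilate(M):
        M = M / (svdvals(M)[0] * (1 + 1e-9))   # enforce ||M|| < 1 (numerical margin)
        n = M.shape[0]
        I = np.eye(n)
        U = np.block([[M, sqrtm(I - M @ M.conj().T)],
                      [sqrtm(I - M.conj().T @ M), -M.conj().T]])
        return M, U

    M = np.array([[1.0, 0.5], [0.0, 0.5]])     # some non-unitary operator
    Ms, U = dilate(M)
    print(np.allclose(U @ U.conj().T, np.eye(4)))   # True: U is unitary

    psi = np.array([1.0, 0.0])                 # input state |0>
    # Post-selecting the ancilla on |0> applies Ms; success prob. is its norm^2.
    print("success probability:", np.linalg.norm(Ms @ psi) ** 2)
    ```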

  2. Probabilistic micromechanics for high-temperature composites

    NASA Technical Reports Server (NTRS)

    Reddy, J. N.

    1993-01-01

    The three-year program of research had the following technical objectives: the development of probabilistic methods for micromechanics-based constitutive and failure models, application of the probabilistic methodology in the evaluation of various composite materials and simulation of expected uncertainties in unidirectional fiber composite properties, and influence of the uncertainties in composite properties on the structural response. The first year of research was devoted to the development of probabilistic methodology for micromechanics models. The second year of research focused on the evaluation of the Chamis-Hopkins constitutive model and Aboudi constitutive model using the methodology developed in the first year of research. The third year of research was devoted to the development of probabilistic finite element analysis procedures for laminated composite plate and shell structures.

  3. Overview of selected molecular biological databases

    SciTech Connect

    Rayl, K.D.; Gaasterland, T.

    1994-11-01

    This paper presents an overview of the purpose, content, and design of a subset of the currently available biological databases, with an emphasis on protein databases. Databases included in this summary are 3D-ALI, Berlin RNA databank, Blocks, DSSP, EMBL Nucleotide Database, EMP, ENZYME, FSSP, GDB, GenBank, HSSP, LiMB, PDB, PIR, PKCDD, ProSite, and SWISS-PROT. The goal is to provide a starting point for researchers who wish to take advantage of the myriad available databases. Rather than providing a complete explanation of each database, we present its content and form by explaining the details of typical entries. Pointers to more complete "user guides" are included, along with general information on where to search for a new database.

  4. Multidimensional analysis and probabilistic model of volcanic and seismic activities

    NASA Astrophysics Data System (ADS)

    Fedorov, V.

    2009-04-01

    .I. Gushchenko, 1979) and seismological (database of USGS/NEIC Significant Worldwide Earthquakes, 2150 B.C.-1994 A.D.) information that reflects the dynamics of endogenic relief-forming processes from 1900 to 1994. In the course of the analysis, the calendar variable was replaced by a corresponding astronomical one and the epoch superposition method was applied. In essence, the method differentiates the bodies of information on volcanic eruptions (1900-1977) and seismic events (1900-1994) with respect to the values of astronomical parameters corresponding to the calendar dates of the known eruptions and earthquakes, regardless of the calendar year. The resulting spectra of volcanic eruptions and violent earthquakes, distributed over the parameters of the Earth's orbital movement, were used as a basis for calculating frequency spectra and the diurnal probability of volcanic and seismic activity. The objective of the proposed investigations is the development of a probabilistic model of volcanic and seismic events, as well as the design of a GIS for monitoring and forecasting volcanic and seismic activity. In accordance with this objective, three probability parameters were established in the course of preliminary studies; they form the basis for GIS monitoring and forecast development. 1. A multidimensional analysis of volcanic eruptions and earthquakes (of magnitude 7) has been performed in terms of the Earth's orbital movement. Probability characteristics of volcanism and seismicity have been defined for the Earth as a whole. Time intervals have been identified with a diurnal probability twice the mean value. The diurnal probability of volcanic and seismic events has been calculated up to 2020. 2. A regularity in the duration of dormant (repose) periods has been established: a relationship has been found between the distribution of the repose-period probability density and the duration of the period. 3
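
    The epoch superposition step lends itself to a toy illustration: event dates from many calendar years are folded onto a single astronomical coordinate (crudely proxied below by day-of-year), and the folded histogram is compared with the uniform mean to flag intervals whose diurnal probability is twice the mean, as in the abstract. All dates, bins, and counts here are illustrative.

    ```python
    # Fold event dates onto phase bins of the Earth's orbit, ignoring the year.
    from datetime import date
    from collections import Counter

    events = [date(1902, 5, 8), date(1980, 5, 18), date(1991, 6, 15),
              date(1912, 6, 6), date(1883, 8, 27)]   # toy eruption dates

    bins = 12
    counts = Counter((e.timetuple().tm_yday - 1) * bins // 366 for e in events)

    mean = len(events) / bins
    for b in range(bins):
        ratio = counts.get(b, 0) / mean
        flag = "  <- twice the mean" if ratio >= 2 else ""
        print(f"phase bin {b:2d}: {counts.get(b, 0)} events, "
              f"{ratio:.1f}x mean{flag}")
    ```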

  5. Probabilistic Assessment of Radiation Risk for Astronauts in Space Missions

    NASA Technical Reports Server (NTRS)

    Kim, Myung-Hee; DeAngelis, Giovanni; Cucinotta, Francis A.

    2009-01-01

    Accurate predictions of the health risks to astronauts from space radiation exposure are necessary for enabling future lunar and Mars missions. Space radiation consists of solar particle events (SPEs), composed largely of medium-energy protons (less than 100 MeV), and galactic cosmic rays (GCR), which include protons and heavy ions of higher energies. While the expected frequency of SPEs is strongly influenced by the solar activity cycle, SPE occurrences themselves are random in nature. A solar modulation model has been developed for the temporal characterization of the GCR environment, which is represented by the deceleration potential, phi. The risk of radiation exposure from SPEs during extra-vehicular activities (EVAs) or in lightly shielded vehicles is a major concern for radiation protection, including determining the shielding and operational requirements for astronauts and hardware. To support the probabilistic risk assessment for EVAs, which could occupy up to 15% of crew time on lunar missions, we estimated the probability of SPE occurrence as a function of time within a solar cycle using a nonhomogeneous Poisson model fitted to the historical database of measurements of protons with energy >30 MeV, Φ30. The resultant organ doses and dose equivalents, as well as effective whole-body doses for acute and cancer risk estimation, are analyzed for a conceptual habitat module and a lunar rover during defined space mission periods. This probabilistic approach to radiation risk assessment from SPEs and GCR supports mission design and operational planning to manage radiation risks for space exploration.
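
    For a nonhomogeneous Poisson model of this kind, the probability of at least one SPE in a mission window follows from integrating the fitted intensity over the window. The sketch below uses an invented intensity profile, not the model fitted to the Φ30 database.

    ```python
    # P(N >= 1 in [t0, t1]) = 1 - exp(-integral of lambda(t) dt) for a
    # nonhomogeneous Poisson process; lambda here is a toy solar-cycle shape.
    import math

    def spe_rate(t_years):
        """Toy SPE intensity over an 11-year cycle, peaking near solar maximum."""
        return 8.0 * math.exp(-((t_years - 5.5) / 2.0) ** 2)   # events / year

    def prob_at_least_one(t0, t1, n=1000):
        dt = (t1 - t0) / n
        integral = sum(spe_rate(t0 + (i + 0.5) * dt) for i in range(n)) * dt
        return 1.0 - math.exp(-integral)

    print(f"P(>=1 SPE) in a 6-month window at solar max: "
          f"{prob_at_least_one(5.25, 5.75):.3f}")
    ```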

  6. A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more.

    PubMed

    Rivas, Elena; Lang, Raymond; Eddy, Sean R

    2012-02-01

    The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been reported on generative probabilistic models (stochastic context-free grammars [SCFGs]) of comparable complexity, although probabilistic models are generally easier to train and to use. To explore a range of probabilistic models of increasing complexity, and to directly compare probabilistic, thermodynamic, and discriminative approaches, we created TORNADO, a computational tool that can parse a wide spectrum of RNA grammar architectures (including the standard nearest-neighbor model and more) using a generalized super-grammar that can be parameterized with probabilities, energies, or arbitrary scores. By using TORNADO, we find that probabilistic nearest-neighbor models perform comparably to (but not significantly better than) discriminative methods. We find that complex statistical models are prone to overfitting RNA structure and that evaluations should use structurally nonhomologous training and test data sets. Overfitting has affected at least one published method (ContextFold). The most important barrier to improving statistical approaches for RNA secondary structure prediction is the lack of diversity of well-curated single-sequence RNA secondary structures in current RNA databases.

  7. Stackfile Database

    NASA Technical Reports Server (NTRS)

    deVarvalho, Robert; Desai, Shailen D.; Haines, Bruce J.; Kruizinga, Gerhard L.; Gilmer, Christopher

    2013-01-01

    This software provides storage, retrieval, and analysis functionality for managing satellite altimetry data. It improves on existing database software in efficiency, analysis capability, flexibility, and documentation. It offers flexibility in the type of data that can be stored, and retrieval is efficient across either the spatial or the time domain. Built-in analysis tools are provided for frequently performed altimetry tasks. This software package is used for storing and manipulating satellite measurement data. It was developed with a focus on handling the requirements of repeat-track altimetry missions such as Topex and Jason, but was designed to work with a wide variety of satellite measurement data (e.g., the Gravity Recovery And Climate Experiment, GRACE). The software consists of several command-line tools for importing, retrieving, and analyzing satellite measurement data.

  8. Probabilistic regularization in inverse optical imaging.

    PubMed

    De Micheli, E; Viano, G A

    2000-11-01

    The problem of object restoration in the case of spatially incoherent illumination is considered. A regularized solution to the inverse problem is obtained through a probabilistic approach, and a numerical algorithm based on the statistical analysis of the noisy data is presented. Particular emphasis is placed on the question of the positivity constraint, which is incorporated into the probabilistically regularized solution by means of a quadratic programming technique. Numerical examples illustrating the main steps of the algorithm are also given.
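
    The paper's positivity constraint enters through a quadratic programming technique; a minimal stand-in with the same x >= 0 constraint is non-negative least squares on a Tikhonov-augmented system, sketched below with a toy blur model rather than the paper's incoherent-imaging one.

    ```python
    # Solve min ||Ax - b||^2 + alpha ||x||^2 subject to x >= 0 by stacking
    # [A; sqrt(alpha) I] and [b; 0] and calling NNLS.
    import numpy as np
    from scipy.optimize import nnls

    rng = np.random.default_rng(0)
    n = 40
    x_true = np.maximum(0, np.sin(np.linspace(0, 3 * np.pi, n)))   # nonneg object
    A = np.exp(-0.5 * ((np.arange(n)[:, None] - np.arange(n)) / 2.0) ** 2)  # blur
    b = A @ x_true + 0.01 * rng.standard_normal(n)                 # noisy data

    alpha = 0.05
    A_aug = np.vstack([A, np.sqrt(alpha) * np.eye(n)])
    b_aug = np.concatenate([b, np.zeros(n)])
    x_hat, _ = nnls(A_aug, b_aug)                                  # x >= 0 enforced

    print("relative restoration error:",
          np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
    ```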

  9. Probabilistic Approaches for Evaluating Space Shuttle Risks

    NASA Technical Reports Server (NTRS)

    Vesely, William

    2001-01-01

    The objectives of the Space Shuttle PRA (Probabilistic Risk Assessment) are to: (1) evaluate mission risks; (2) evaluate uncertainties and sensitivities; (3) prioritize contributors; (4) evaluate upgrades; (5) track risks; and (6) provide decision tools. This report discusses the significance of a Space Shuttle PRA and its participants. The elements and type of losses to be included are discussed. The program and probabilistic approaches are then discussed.

  10. Probabilistic cloning of three symmetric states

    SciTech Connect

    Jimenez, O.; Bergou, J.; Delgado, A.

    2010-12-15

    We study the probabilistic cloning of three symmetric states. These states are defined by a single complex quantity, the inner product among them. We show that three different probabilistic cloning machines are necessary to optimally clone all possible families of three symmetric states. We also show that the optimal cloning probability of generating M copies out of one original can be cast as the quotient between the success probability of unambiguously discriminating one and M copies of symmetric states.

  11. Parallel and Distributed Systems for Probabilistic Reasoning

    DTIC Science & Technology

    2012-12-01

    Fragmentary record assembled from the thesis table of contents and text. Recoverable content: the thesis covers parallel and distributed systems for probabilistic reasoning, including belief propagation as a core operation in probabilistic graphical models, scalable online probabilistic reasoning, and the use of fixed-lag smoothing (Russell and Norvig, 1995) for online inference; material accompanying one chapter is available from the online repository at http://gonzalezlabs/thesis.

  12. Integrating Variances into an Analytical Database

    NASA Technical Reports Server (NTRS)

    Sanchez, Carlos

    2010-01-01

    For this project, I enrolled in numerous SATERN courses that taught the basics of database programming, including Basic Access 2007 Forms, Introduction to Database Systems, Overview of Database Design, and others. My main job was to create an analytical database that can handle many stored forms and that makes them easy to interpret and organize. Additionally, I helped improve an existing database and populate it with information. These databases were designed to be used with data from Safety Variances and DCR forms. The research consisted of analyzing the database and comparing the data to find out which entries were repeated the most. If an entry happened to be repeated several times in the database, that would mean that the rule or requirement targeted by that variance has been bypassed many times already, and so the requirement may not really be needed but rather should be changed to allow the variance's conditions permanently. The project was not restricted to the design and development of the database system; it also covered exporting the data from the database to a different format (e.g., Excel or Word) so it could be analyzed in a simpler fashion. Thanks to the change in format, the data was organized in a spreadsheet that made it possible to sort the data by categories or types and helped speed up searches. Once my work with the database was done, the records of variances could be arranged so that they were displayed in numerical order, or one could search for a specific document targeted by the variances and restrict the search to only include variances that modified a specific requirement. A great contributor to my learning was SATERN, NASA's resource for education. Thanks to the SATERN online courses I took over the summer, I was able to learn many new things about computers and databases and also go more in depth into topics I already knew about.

  13. Policy implications for familial searching.

    PubMed

    Kim, Joyce; Mammo, Danny; Siegel, Marni B; Katsanis, Sara H

    2011-11-01

    In the United States, several states have made policy decisions regarding whether and how to use familial searching of the Combined DNA Index System (CODIS) database in criminal investigations. Familial searching pushes DNA typing beyond merely identifying individuals to detecting genetic relatedness, an application previously reserved for missing persons identifications and custody battles. The intentional search of CODIS for partial matches to an item of evidence offers law enforcement agencies a powerful tool for developing investigative leads, apprehending criminals, revitalizing cold cases and exonerating wrongfully convicted individuals. As familial searching involves a range of logistical, social, ethical and legal considerations, states are now grappling with policy options for implementing familial searching to balance crime fighting with its potential impact on society. When developing policies for familial searching, legislators should take into account the impact of familial searching on select populations and the need to minimize personal intrusion on relatives of individuals in the DNA database. This review describes the approaches used to narrow a suspect pool from a partial match search of CODIS and summarizes the economic, ethical, logistical and political challenges of implementing familial searching. We examine particular US state policies and the policy options adopted to address these issues. The aim of this review is to provide objective background information on the controversial approach of familial searching to inform policy decisions in this area. Herein we highlight key policy options and recommendations regarding effective utilization of familial searching that minimize harm to and afford maximum protection of US citizens.

  14. Policy implications for familial searching

    PubMed Central

    2011-01-01

    In the United States, several states have made policy decisions regarding whether and how to use familial searching of the Combined DNA Index System (CODIS) database in criminal investigations. Familial searching pushes DNA typing beyond merely identifying individuals to detecting genetic relatedness, an application previously reserved for missing persons identifications and custody battles. The intentional search of CODIS for partial matches to an item of evidence offers law enforcement agencies a powerful tool for developing investigative leads, apprehending criminals, revitalizing cold cases and exonerating wrongfully convicted individuals. As familial searching involves a range of logistical, social, ethical and legal considerations, states are now grappling with policy options for implementing familial searching to balance crime fighting with its potential impact on society. When developing policies for familial searching, legislators should take into account the impact of familial searching on select populations and the need to minimize personal intrusion on relatives of individuals in the DNA database. This review describes the approaches used to narrow a suspect pool from a partial match search of CODIS and summarizes the economic, ethical, logistical and political challenges of implementing familial searching. We examine particular US state policies and the policy options adopted to address these issues. The aim of this review is to provide objective background information on the controversial approach of familial searching to inform policy decisions in this area. Herein we highlight key policy options and recommendations regarding effective utilization of familial searching that minimize harm to and afford maximum protection of US citizens. PMID:22040348

  15. Using the TIGR gene index databases for biological discovery.

    PubMed

    Lee, Yuandan; Quackenbush, John

    2003-11-01

    The TIGR Gene Index web pages provide access to analyses of ESTs and gene sequences for nearly 60 species, as well as a number of resources derived from these. Each species-specific database is presented using a common format with a homepage. A variety of methods exist that allow users to search each species-specific database. Methods implemented currently include nucleotide or protein sequence queries using WU-BLAST, text-based searches using various sequence identifiers, searches by gene, tissue and library name, and searches using functional classes through Gene Ontology assignments. This protocol provides guidance for using the Gene Index Databases to extract information.

  16. Probabilistic Prediction of Lifetimes of Ceramic Parts

    NASA Technical Reports Server (NTRS)

    Nemeth, Noel N.; Gyekenyesi, John P.; Jadaan, Osama M.; Palfi, Tamas; Powers, Lynn; Reh, Stefan; Baker, Eric H.

    2006-01-01

    ANSYS/CARES/PDS is a software system that combines the ANSYS Probabilistic Design System (PDS) software with a modified version of the Ceramics Analysis and Reliability Evaluation of Structures Life (CARES/Life) Version 6.0 software. [A prior version of CARES/Life was reported in Program for Evaluation of Reliability of Ceramic Parts (LEW-16018), NASA Tech Briefs, Vol. 20, No. 3 (March 1996), page 28.] CARES/Life models effects of stochastic strength, slow crack growth, and stress distribution on the overall reliability of a ceramic component. The essence of the enhancement in CARES/Life 6.0 is the capability to predict the probability of failure using results from transient finite-element analysis. ANSYS PDS models the effects of uncertainty in material properties, dimensions, and loading on the stress distribution and deformation. ANSYS/CARES/PDS accounts for the effects of probabilistic strength, probabilistic loads, probabilistic material properties, and probabilistic tolerances on the lifetime and reliability of the component. Even failure probability becomes a stochastic quantity that can be tracked as a response variable. ANSYS/CARES/PDS enables tracking of all stochastic quantities in the design space, thereby enabling more precise probabilistic prediction of lifetimes of ceramic components.

  17. Probabilistic simulation of uncertainties in thermal structures

    NASA Technical Reports Server (NTRS)

    Chamis, Christos C.; Shiao, Michael

    1990-01-01

    Development of probabilistic structural analysis methods for hot structures is a major activity at Lewis Research Center. It consists of five program elements: (1) probabilistic loads; (2) probabilistic finite element analysis; (3) probabilistic material behavior; (4) assessment of reliability and risk; and (5) probabilistic structural performance evaluation. Recent progress includes: (1) quantification of the effects of uncertainties for several variables on high pressure fuel turbopump (HPFT) blade temperature, pressure, and torque of the Space Shuttle Main Engine (SSME); (2) the evaluation of the cumulative distribution function for various structural response variables based on assumed uncertainties in primitive structural variables; (3) evaluation of the failure probability; (4) reliability and risk-cost assessment; and (5) an outline of an emerging approach for eventual hot structures certification. Collectively, the results demonstrate that the structural durability/reliability of hot structural components can be effectively evaluated in a formal probabilistic framework. In addition, the approach can be readily extended to computationally simulate certification of hot structures for aerospace environments.

  18. Sex determination using the Probabilistic Sex Diagnosis (DSP: Diagnose Sexuelle Probabiliste) tool in a virtual environment.

    PubMed

    Chapman, Tara; Lefevre, Philippe; Semal, Patrick; Moiseev, Fedor; Sholukha, Victor; Louryan, Stéphane; Rooze, Marcel; Van Sint Jan, Serge

    2014-01-01

    The hip bone is one of the most reliable indicators of sex in the human body because it is the most dimorphic bone. Probabilistic Sex Diagnosis (DSP: Diagnose Sexuelle Probabiliste), developed by Murail et al. in 2005, is a sex determination method based on a worldwide hip bone metrical database. Sex is determined by comparing specific measurements taken from each specimen using sliding callipers and computing the probability of the specimen being female or male. In forensic science it is sometimes not possible to sex a body due to corpse decay or injury, and skeletalization and dissection of a body is a laborious process that desecrates the body. This study had two aims. The first was to examine the accuracy of the DSP method in comparison with a current visual sexing method. The second was to see whether the DSP method could be applied virtually, to both the hip bone and the pelvic girdle, so that it can be used in the forensic sciences. For the first part of the study, forty-nine dry hip bones of unknown sex were obtained from the Body Donation Programme of the Université Libre de Bruxelles (ULB). A comparison was made between DSP analysis and visual sexing on dry bone by two researchers. CT scans of the bones were then analysed to obtain three-dimensional (3D) virtual models, and the DSP method was applied virtually by importing the models into a customised software programme called lhpFusionBox, developed at ULB. The software enables DSP distances to be measured via virtually palpated bony landmarks. There was found to be 100% agreement of sex between the manual and virtual DSP methods. The second part of the study aimed to further validate the method by analysing thirty-nine supplementary pelvic girdles of known sex blind. There was found to be a 100% accuracy rate, further demonstrating that the virtual DSP method is robust. Statistically significant differences were found in the identification of sex

  19. A probabilistic safety analysis of incidents in nuclear research reactors.

    PubMed

    Lopes, Valdir Maciel; Agostinho Angelo Sordi, Gian Maria; Moralles, Mauricio; Filho, Tufic Madi

    2012-06-01

    This work aims to evaluate the potential risks of incidents in nuclear research reactors. For its development, two databases of the International Atomic Energy Agency (IAEA) were used: the Research Reactor Data Base (RRDB) and the Incident Report System for Research Reactor (IRSRR). The study used probabilistic safety analysis (PSA), with the probability calculations based on the theory and equations in IAEA TECDOC-636. A program to carry out the probability analysis was developed in Scilab 5.1.1 for two distributions, Fisher and chi-square, both at a 90% confidence level. Using Sordi's equations, the maximum admissible doses were obtained for comparison with the risk limits established by the International Commission on Radiological Protection (ICRP). All results of this probability analysis led to the conclusion that the incidents which occurred involved radiation doses within the stochastic-effects reference interval established in ICRP-64.
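
    The chi-square machinery mentioned above is the standard way to put confidence bounds on an event rate estimated from a count and an exposure time. A sketch with invented counts and exposure (not the IAEA incident data):

    ```python
    # One-sided 90% chi-square bounds on an incident rate lambda, given
    # n events observed over an exposure T (reactor-years).
    from scipy.stats import chi2

    n_incidents = 3       # hypothetical incident count
    T = 450.0             # hypothetical cumulative reactor-years
    conf = 0.90

    lam_hat = n_incidents / T
    lam_lower = chi2.ppf(1 - conf, 2 * n_incidents) / (2 * T)
    lam_upper = chi2.ppf(conf, 2 * (n_incidents + 1)) / (2 * T)

    print(f"point estimate       : {lam_hat:.2e} per reactor-year")
    print(f"one-sided 90% bounds : [{lam_lower:.2e}, {lam_upper:.2e}]")
    ```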

  20. De novo protein conformational sampling using a probabilistic graphical model

    NASA Astrophysics Data System (ADS)

    Bhattacharya, Debswapna; Cheng, Jianlin

    2015-11-01

    Efficient exploration of protein conformational space remains challenging, especially for large proteins, when assembling discretized structural fragments extracted from a protein structure database. We propose a fragment-free probabilistic graphical model, FUSION, for conformational sampling in continuous space and assess its accuracy using ‘blind’ protein targets with a length of up to 250 residues from the CASP11 structure prediction exercise. The method reduces sampling bottlenecks, exhibits strong convergence, and demonstrates better performance than the popular fragment assembly method, ROSETTA, on relatively larger proteins with a length of more than 150 residues in our benchmark set. FUSION is freely available through a web server at http://protein.rnet.missouri.edu/FUSION/.

  1. Robust Control Design for Systems With Probabilistic Uncertainty

    NASA Technical Reports Server (NTRS)

    Crespo, Luis G.; Kenny, Sean P.

    2005-01-01

    This paper presents a reliability- and robustness-based formulation for robust control synthesis for systems with probabilistic uncertainty. In a reliability-based formulation, the probability of violating design requirements prescribed by inequality constraints is minimized. In a robustness-based formulation, a metric which measures the tendency of a random variable/process to cluster close to a target scalar/function is minimized. A multi-objective optimization procedure, which combines stability and performance requirements in the time and frequency domains, is used to search for robustly optimal compensators. Some of the fundamental differences between the proposed strategy and conventional robust control methods are: (i) unnecessary conservatism is eliminated since there is no need for convex supports, (ii) the most likely plants are favored during synthesis, allowing for probabilistic robust optimality, (iii) the tradeoff between robust stability and robust performance can be explored numerically, (iv) the uncertainty set is closely related to parameters with clear physical meaning, and (v) compensators with improved robust characteristics for a given control structure can be synthesized.

  2. Medical literature search dot com.

    PubMed

    Jain, Vivek; Raut, Deepak K

    2011-01-01

    The Internet provides quick access to a plethora of medical literature in the form of journals, databases, dictionaries, textbooks, indexes, and e-journals, thereby allowing access to more varied, individualized, and systematic educational opportunities. A web search engine is a tool designed to search for information on the World Wide Web, which may be in the form of web pages, images, information, and other types of files. Search engines for internet-based searches of the medical literature include Google, Google Scholar, the Yahoo search engine, etc., and databases include MEDLINE, PubMed, MEDLARS, etc. Commercial web resources (Medscape, MedConnect, MedicineNet) add to the list of resource databases, providing some of their content for open access. Several web libraries (Medical Matrix, Emory libraries) have been developed as meta-sites, providing useful links to health resources globally. The availability of specific dermatology-related websites (DermIS, DermNet, and Genamics JournalSeek) is a useful addition to the ever-growing list of web-based resources. A researcher must keep in mind the strengths and limitations of a particular search engine/database while searching for a particular type of data. Knowledge about the types of literature and levels of detail available, the user interface, ease of access, reputable content, and the period of time covered allows their optimal use and maximal utility in the field of medicine.

  3. DEPOT database: Reference manual and user's guide

    SciTech Connect

    Clancey, P.; Logg, C.

    1991-03-01

    DEPOT has been developed to provide tracking for the Stanford Linear Collider (SLC) control system equipment. For each piece of equipment entered into the database, complete location, service, maintenance, modification, certification, and radiation exposure histories can be maintained. To facilitate data entry accuracy, efficiency, and consistency, barcoding technology has been used extensively. DEPOT has been an important tool in improving the reliability of the microsystems controlling SLC. This document describes the components of the DEPOT database, the elements in the database records, and the use of the supporting programs for entering data, searching the database, and producing reports from the information.

  4. Search Cloud

    MedlinePlus

    MedlinePlus search cloud: https://medlineplus.gov/cloud.html. The page lists frequently run MedlinePlus searches (e.g., "chest pa and lateral") and lets users share the MedlinePlus search cloud by embedding it in their own pages.

  5. Development and use of a train-level probabilistic risk assessment

    SciTech Connect

    Smith, C.L.; Fowler, R.D.; Wolfram, L.M.

    1993-04-01

    The Idaho National Engineering Laboratory examined the potential for the development of train-level probabilistic risk assessment (PRA) databases. These train-level databases will allow the Nuclear Regulatory Commission to investigate effects on plant core damage frequency (CDF) given that a train is failed or taken out of service. The intent of this task was to develop user-friendly databases that required a minimal amount of personnel involvement to be usable. It was originally intended that the train-level models would not be expanded to include basic events below the top gate of a train, with the possible exception of including some of the major train-related components (e.g., important pumps and motor-operated valves). It was found that a database similar to the original plant PRA provided the accuracy needed to measure the changes in plant CDF. The Peach Bottom Unit 2 NUREG-1150 PRA (a large fault tree model) and the Beaver Valley Unit 2 IPE (a large event tree model) were selected to demonstrate the feasibility of developing train-level databases. Five different methods for developing train-level databases were hypothesized and are examined. Ultimately, two train-level databases were developed using the Peach Bottom Unit 2 PRA and one train-level database was developed using the Beaver Valley Unit 2 IPE. The development, use, limitations, and results of these train-level databases are discussed.

  6. Probabilistic risk assessment familiarization training

    SciTech Connect

    Phillabaum, J.L.

    1989-01-01

    Philadelphia Electric Company (PECo) created a Nuclear Group Risk and Reliability Assessment Program Plan in order to focus the utilization of probabilistic risk assessment (PRA) in support of Limerick Generating Station and Peach Bottom Atomic Power Station. PECo committed to the U.S. Nuclear Regulatory Commission (NRC) to continue the PRA program prior to the issuance of an operating license for Limerick Unit 1. It is believed that increased use of PRA techniques to support activities at Limerick and Peach Bottom will enhance PECo's overall nuclear excellence. The familiarization training is designed to be attended once by all nuclear group personnel so that they understand PRA and its potential effect on their jobs. The training describes the history of PRA and how it applies to PECo's nuclear activities. Key PRA concepts serve as the foundation for the familiarization training. These key concepts are covered in all classes to facilitate an appreciation of the remaining material, which is tailored to the audience. Some of the concepts covered are the comparison of regulatory philosophy to PRA techniques, fundamentals of risk/success, the risk equation and risk summation, and fault trees and event trees. Building on these concepts, PRA insights and applications are then described, tailored to the audience.
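
    Of the key concepts listed, the risk equation and risk summation are the easiest to make concrete: total risk is the frequency-weighted sum of consequences over accident sequences. The sequences and numbers below are purely illustrative.

    ```python
    # Risk summation R = sum_i f_i * c_i over accident sequences.
    sequences = [
        # (sequence, frequency per year, relative consequence)
        ("loss of offsite power -> recovery",  1e-2, 0.0),
        ("small LOCA -> injection fails",      4e-6, 1.0),
        ("transient -> heat removal fails",    2e-5, 0.3),
    ]

    risk = sum(freq * cons for _, freq, cons in sequences)
    print(f"total risk = {risk:.2e} (consequence units per year)")
    ```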

  7. Probabilistic elastography: estimating lung elasticity.

    PubMed

    Risholm, Petter; Ross, James; Washko, George R; Wells, William M

    2011-01-01

    We formulate registration-based elastography in a probabilistic framework and apply it to study lung elasticity in the presence of emphysematous and fibrotic tissue. The elasticity calculations are based on a finite element discretization of a linear elastic biomechanical model. We marginalize over the boundary conditions (deformation) of the biomechanical model to determine the posterior distribution over elasticity parameters. Image similarity is included in the likelihood, an elastic prior is included to constrain the boundary conditions, and a Markov model is used to spatially smooth the inhomogeneous elasticity. We use a Markov chain Monte Carlo (MCMC) technique to characterize the posterior distribution over elasticity, from which we extract the most probable elasticity as well as the uncertainty of this estimate. Even though registration-based lung elastography with inhomogeneous elasticity is challenging due to the problem's highly underdetermined nature and the sparse image information available in lung CT, we show promising preliminary results on estimating lung elasticity contrast in the presence of emphysematous and fibrotic tissue.
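
    A bare-bones version of the MCMC characterization described above, with a single scalar standing in for the inhomogeneous elasticity field and a toy Gaussian log-posterior in place of the registration likelihood and priors:

    ```python
    # Metropolis-Hastings over one elasticity parameter; the posterior mean and
    # spread play the roles of the most probable elasticity and its uncertainty.
    import math, random

    def log_posterior(elasticity):
        if elasticity <= 0:                    # prior support: positive moduli
            return -math.inf
        return -0.5 * ((elasticity - 4.0) / 0.8) ** 2   # toy image-match term

    random.seed(1)
    samples, e = [], 1.0
    for _ in range(20000):
        e_prop = e + random.gauss(0, 0.3)      # symmetric random-walk proposal
        if math.log(random.random()) < log_posterior(e_prop) - log_posterior(e):
            e = e_prop
        samples.append(e)

    burn = samples[5000:]
    mean = sum(burn) / len(burn)
    std = (sum((s - mean) ** 2 for s in burn) / len(burn)) ** 0.5
    print(f"posterior mean {mean:.2f}, std {std:.2f} (true mode 4.0)")
    ```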

  8. Probabilistic modeling of children's handwriting

    NASA Astrophysics Data System (ADS)

    Puri, Mukta; Srihari, Sargur N.; Hanson, Lisa

    2013-12-01

    There is little work done in the analysis of children's handwriting, which can be useful in developing automatic evaluation systems and in quantifying handwriting individuality. We consider the statistical analysis of children's handwriting in early grades. Samples of handwriting of children in Grades 2-4 who were taught the Zaner-Bloser style were considered. The commonly occurring word "and", written in cursive style as well as hand-print, was extracted from extended writing. The samples were assigned feature values by human examiners using a truthing tool. The human examiners looked at how the children constructed letter formations in their writing, looking for similarities and differences from the instructions taught in the handwriting copy book. These similarities and differences were measured using a feature space distance measure. Results indicate that the handwriting develops towards more conformity with the class characteristics of the Zaner-Bloser copybook which, with practice, is the expected result. Bayesian networks were learnt from the data to enable answering various probabilistic queries, such as determining which students may continue to produce letter formations as taught during lessons in school, which students will develop different forms or variations of those letter formations, and how many different types of letter formations occur.

  9. Optimal probabilistic dense coding schemes

    NASA Astrophysics Data System (ADS)

    Kögler, Roger A.; Neves, Leonardo

    2017-04-01

    Dense coding with non-maximally entangled states has been investigated in many different scenarios. We revisit this problem for protocols adopting the standard encoding scheme. In this case, the set of possible classical messages cannot be perfectly distinguished due to the non-orthogonality of the quantum states carrying them. So far, the decoding process has been approached in two ways: (i) The message is always inferred, but with an associated (minimum) error; (ii) the message is inferred without error, but only sometimes; in case of failure, nothing else is done. Here, we generalize on these approaches and propose novel optimal probabilistic decoding schemes. The first uses quantum-state separation to increase the distinguishability of the messages with an optimal success probability. This scheme is shown to include (i) and (ii) as special cases and continuously interpolate between them, which enables the decoder to trade-off between the level of confidence desired to identify the received messages and the success probability for doing so. The second scheme, called multistage decoding, applies only for qudits (d-level quantum systems with d > 2) and consists of further attempts in the state identification process in case of failure in the first one. We show that this scheme is advantageous over (ii) as it increases the mutual information between the sender and receiver.

  10. Probabilistic description of traffic flow

    NASA Astrophysics Data System (ADS)

    Mahnke, R.; Kaupužs, J.; Lubashevsky, I.

    2005-03-01

    A stochastic description of traffic flow, called probabilistic traffic flow theory, is developed. The general master equation is applied to relatively simple models to describe the formation and dissolution of traffic congestions. Our approach is mainly based on spatially homogeneous systems like periodically closed circular rings without on- and off-ramps. We consider a stochastic one-step process of growth or shrinkage of a car cluster (jam). As generalization we discuss the coexistence of several car clusters of different sizes. The basic problem is to find a physically motivated ansatz for the transition rates of the attachment and detachment of individual cars to a car cluster consistent with the empirical observations in real traffic. The emphasis is put on the analogy with first-order phase transitions and nucleation phenomena in physical systems like supersaturated vapour. The results are summarized in the flux-density relation, the so-called fundamental diagram of traffic flow, and compared with empirical data. Different regimes of traffic flow are discussed: free flow, congested mode as stop-and-go regime, and heavy viscous traffic. The traffic breakdown is studied based on the master equation as well as the Fokker-Planck approximation to calculate mean first passage times or escape rates. Generalizations are developed to allow for on-ramp effects. The calculated flux-density relation and characteristic breakdown times coincide with empirical data measured on highways. Finally, a brief summary of the stochastic cellular automata approach is given.
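
    The one-step cluster dynamics behind such a master equation can be simulated directly. The sketch below uses simple placeholder attachment and detachment rates rather than the physically motivated ansatz the abstract calls for.

    ```python
    # Gillespie-style simulation of a one-step (birth-death) process for the
    # size n of a single car cluster (jam) on a closed ring.
    import random

    N_CARS, ROAD = 60, 100.0

    def w_plus(n):                  # a free car attaches to the cluster
        return max(N_CARS - n, 0) / ROAD

    def w_minus(n):                 # the front car escapes the cluster
        return 0.4 if n > 0 else 0.0

    random.seed(2)
    n, t, history = 0, 0.0, []
    while t < 500.0:
        total = w_plus(n) + w_minus(n)
        t += random.expovariate(total)          # exponential waiting time
        if random.random() < w_plus(n) / total:
            n += 1
        else:
            n -= 1
        history.append(n)

    print("mean cluster (jam) size:", sum(history) / len(history))
    ```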

  11. Symbolic representation of probabilistic worlds.

    PubMed

    Feldman, Jacob

    2012-04-01

    Symbolic representation of environmental variables is a ubiquitous and often debated component of cognitive science. Yet notwithstanding centuries of philosophical discussion, the efficacy, scope, and validity of such representation has rarely been given direct consideration from a mathematical point of view. This paper introduces a quantitative measure of the effectiveness of symbolic representation, and develops formal constraints under which such representation is in fact warranted. The effectiveness of symbolic representation hinges on the probabilistic structure of the environment that is to be represented. For arbitrary probability distributions (i.e., environments), symbolic representation is generally not warranted. But in modal environments, defined here as those that consist of mixtures of component distributions that are narrow ("spiky") relative to their spreads, symbolic representation can be shown to represent the environment with a relatively negligible loss of information. Modal environments support propositional forms, logical relations, and other familiar features of symbolic representation. Hence the assumption that our environment is, in fact, modal is a key tacit assumption underlying the use of symbols in cognitive science.

  12. Dynamical systems probabilistic risk assessment

    SciTech Connect

    Denman, Matthew R.; Ames, Arlo Leroy

    2014-03-01

    Probabilistic Risk Assessment (PRA) is the primary tool used to risk-inform nuclear power regulatory and licensing activities. Risk-informed regulations are intended to reduce inherent conservatism in regulatory metrics (e.g., allowable operating conditions and technical specifications) which are built into the regulatory framework by quantifying both the total risk profile as well as the change in the risk profile caused by an event or action (e.g., in-service inspection procedures or power uprates). Dynamical Systems (DS) analysis has been used to understand unintended time-dependent feedbacks in both industrial and organizational settings. In dynamical systems analysis, feedback loops can be characterized and studied as a function of time to describe the changes to the reliability of plant Structures, Systems and Components (SSCs). While DS has been used in many subject areas, some even within the PRA community, it has not been applied toward creating long-time horizon, dynamic PRAs (with time scales ranging between days and decades depending upon the analysis). Understanding slowly developing dynamic effects, such as wear-out, on SSC reliabilities may be instrumental in ensuring a safely and reliably operating nuclear fleet. Improving the estimation of a plant's continuously changing risk profile will allow for more meaningful risk insights, greater stakeholder confidence in risk insights, and increased operational flexibility.

  13. Database Management Systems: New Homes for Migrating Bibliographic Records.

    ERIC Educational Resources Information Center

    Brooks, Terrence A.; Bierbaum, Esther G.

    1987-01-01

    Assesses bibliographic databases as part of visionary text systems such as hypertext and scholars' workstations. Downloading is discussed in terms of the capability to search records and to maintain unique bibliographic descriptions, and relational database management systems, file managers, and text databases are reviewed as possible hosts for…

  14. The Cystic Fibrosis Database: Content and Research Opportunities.

    ERIC Educational Resources Information Center

    Shaw, William M., Jr.; And Others

    1991-01-01

    Describes the files contained in the Cystic Fibrosis (CF) database and discusses educational and research opportunities using this database. Topics discussed include queries, evaluating the relevance of items retrieved, and use of the database in an online searching course in the School of Information and Library Science at the University of North…

  15. A survey of scholarly literature databases for clinical laboratory science.

    PubMed

    O'Malley, Donna L

    2008-01-01

    This article reviews the use of journal literature databases including CINAHL, EMBASE, and Web of Science; summarizing databases including Cochrane Database of Systematic Reviews, online textbooks, and clinical decision-support tools; and the Internet search engines Google and Google Scholar. The series closes with a practical example employing a cross-section of the knowledge and skills gained from all three articles.

  16. A Longitudinal Study of Database-Assisted Problem Solving.

    ERIC Educational Resources Information Center

    Wildemuth, Barbara M.; Friedman, Charles P.; Keyes, John; Downs, Stephen M.

    2000-01-01

    Examines the effects of database assistance on clinical problem solving across three cohorts of medical students and two database interfaces. Discusses the relationship between personal domain knowledge and problem solving, personal domain knowledge and database searching, and comparisons of different interface styles in information retrieval…

  17. The Majorana Parts Tracking Database

    SciTech Connect

    Abgrall, N.

    2015-01-16

    The Majorana Demonstrator is an ultra-low background physics experiment searching for the neutrinoless double beta decay of 76Ge. The Majorana Parts Tracking Database is used to record the history of components used in the construction of the Demonstrator. The tracking implementation takes a novel approach based on the schema-free database technology CouchDB. Transportation, storage, and processes undergone by parts such as machining or cleaning are linked to part records. Tracking parts provides a great logistics benefit and an important quality assurance reference during construction. In addition, the location history of parts provides an estimate of their exposure to cosmic radiation. In summary, a web application for data entry and a radiation exposure calculator have been developed as tools for achieving the extreme radio-purity required for this rare decay search.
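
    A sketch of what recording a part history against a schema-free CouchDB store can look like, using CouchDB's standard HTTP document API; the database name, document fields, and local URL are hypothetical, not the Majorana schema, and an open local instance is assumed.

    ```python
    # Create a database and store one part record as a JSON document.
    import requests

    COUCH = "http://localhost:5984"        # assumed local CouchDB instance
    db, part_id = "parts", "cu-plate-0042"

    requests.put(f"{COUCH}/{db}")          # create the database if absent

    doc = {
        "part_type": "copper shield plate",
        "history": [
            {"event": "machining", "where": "surface shop", "date": "2013-02-01"},
            {"event": "storage", "where": "underground lab", "date": "2013-02-10"},
        ],
        "days_above_ground": 9,            # input to a cosmic-exposure estimate
    }
    r = requests.put(f"{COUCH}/{db}/{part_id}", json=doc)
    print(r.status_code, r.json())         # 201 and {'ok': True, ...} on success
    ```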

  18. The Majorana Parts Tracking Database

    DOE PAGES

    Abgrall, N.

    2015-01-16

    The Majorana Demonstrator is an ultra-low background physics experiment searching for the neutrinoless double beta decay of 76Ge. The Majorana Parts Tracking Database is used to record the history of components used in the construction of the Demonstrator. The tracking implementation takes a novel approach based on the schema-free database technology CouchDB. Transportation, storage, and processes undergone by parts such as machining or cleaning are linked to part records. Tracking parts provides a great logistics benefit and an important quality assurance reference during construction. In addition, the location history of parts provides an estimate of their exposure to cosmic radiation. In summary, a web application for data entry and a radiation exposure calculator have been developed as tools for achieving the extreme radio-purity required for this rare decay search.

  19. An Enhanced Artificial Bee Colony Algorithm with Solution Acceptance Rule and Probabilistic Multisearch.

    PubMed

    Yurtkuran, Alkın; Emel, Erdal

    2016-01-01

    The artificial bee colony (ABC) algorithm is a popular swarm-based technique inspired by the intelligent foraging behavior of honeybee swarms. This paper proposes a new variant of the ABC algorithm, namely enhanced ABC with solution acceptance rule and probabilistic multisearch (ABC-SA), to address global optimization problems. A new solution acceptance rule is proposed where, instead of greedy selection between the old solution and the new candidate solution, worse candidate solutions have a probability of being accepted. Additionally, the acceptance probability of worse candidates is nonlinearly decreased throughout the search process adaptively. Moreover, in order to improve the performance of the ABC and balance intensification and diversification, a probabilistic multisearch strategy is presented: three different search equations with distinct characteristics are employed using predetermined search probabilities. By implementing the new solution acceptance rule and the probabilistic multisearch approach, the intensification and diversification performance of the ABC algorithm is improved. The proposed algorithm has been tested on well-known benchmark functions of varying dimensions by comparing against novel ABC variants as well as several recent state-of-the-art algorithms. Computational results show that the proposed ABC-SA outperforms other ABC variants and is superior to state-of-the-art algorithms proposed in the literature.
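
    The two ingredients named in the title can be sketched in a few lines: a worse candidate is accepted with a probability that decays nonlinearly over the run, and one of three search equations is chosen by fixed probabilities. The decay law and the probabilities below are illustrative; the paper's exact forms may differ.

    ```python
    # Solution acceptance rule plus probabilistic multisearch selection.
    import math, random

    def accept(f_new, f_old, iteration, max_iter, p0=0.3):
        """Greedy if better; otherwise accept with a nonlinearly decaying chance."""
        if f_new <= f_old:                  # minimization
            return True
        p_worse = p0 * math.exp(-5.0 * iteration / max_iter)
        return random.random() < p_worse

    def pick_search_equation():
        """Choose one of three update rules with predetermined probabilities."""
        r, p = random.random(), [0.5, 0.3, 0.2]
        if r < p[0]:
            return "original ABC update"    # v = x + phi * (x - x_neighbor)
        if r < p[0] + p[1]:
            return "best-guided update"     # biased toward the best food source
        return "randomized update"          # stronger diversification

    random.seed(3)
    print(accept(1.2, 1.0, iteration=10, max_iter=1000))    # worse move, early run
    print(accept(1.2, 1.0, iteration=950, max_iter=1000))   # almost never accepted
    print(pick_search_equation())
    ```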

  20. A probabilistic coevolutionary biclustering algorithm for discovering coherent patterns in gene expression dataset

    PubMed Central

    2012-01-01

    Background: Biclustering has been utilized to find functionally important patterns in biological problems. Here a bicluster is a submatrix that consists of a subset of rows and a subset of columns in a matrix, and contains homogeneous patterns. The problem of finding biclusters remains challenging due to the computational complexity of capturing patterns across two-dimensional features. Results: We propose a Probabilistic COevolutionary Biclustering Algorithm (PCOBA) that can cluster the rows and columns in a matrix simultaneously by utilizing a dynamic adaptation of multiple species and adopting probabilistic learning. In biclustering problems, a coevolutionary search is suitable since it can optimize interdependent subcomponents formed of rows and columns. Furthermore, acquiring statistical information on two populations using probabilistic learning can improve the ability of the search to move toward the optimum. We evaluated the performance of PCOBA on a synthetic dataset and on yeast expression profiles. The results demonstrate that PCOBA outperformed previous evolutionary computation methods as well as other biclustering methods. Conclusions: Our approach to searching for particular biological patterns could be valuable for systematically understanding functional relationships between genes and other biological components at a genome-wide level. PMID:23282075
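
    Any such search needs a fitness that scores how homogeneous a candidate submatrix is. The abstract does not specify PCOBA's exact fitness, so the sketch below uses the classic mean squared residue of Cheng and Church as a stand-in coherence measure.

    ```python
    # Mean squared residue of a bicluster: lower means more coherent.
    import numpy as np

    def mean_squared_residue(X, rows, cols):
        B = X[np.ix_(rows, cols)]
        return ((B - B.mean(axis=1, keepdims=True)
                   - B.mean(axis=0, keepdims=True) + B.mean()) ** 2).mean()

    rng = np.random.default_rng(4)
    X = rng.normal(size=(50, 30))
    # Plant a coherent (additive row + column effect) pattern in a corner.
    X[:10, :8] = (rng.normal(size=(10, 1)) + rng.normal(size=(1, 8))
                  + 0.05 * rng.normal(size=(10, 8)))

    print(mean_squared_residue(X, range(10), range(8)))          # small: coherent
    print(mean_squared_residue(X, range(10, 20), range(8, 16)))  # ~1: noise
    ```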

  1. Multitasking Information Seeking and Searching Processes.

    ERIC Educational Resources Information Center

    Spink, Amanda; Ozmutlu, H. Cenk; Ozmutlu, Seda

    2002-01-01

    Presents findings from four studies of the prevalence of multitasking information seeking and searching by Web (via the Excite search engine), information retrieval system (mediated online database searching), and academic library users. Highlights include human information coordinating behavior (HICB); and implications for models of information…

  2. Precision and Recall in Title Keyword Searches.

    ERIC Educational Resources Information Center

    McJunkin, Monica Cahill

    This study examines precision and recall for title and keyword searches performed in the "FirstSearch" WorldCat database when keywords are used with and without adjacency of terms specified. A random sample of 68 titles in economics were searched in the OCLC (Online Computer Library Center) Online Union Catalog in order to obtain their…

  3. Searching Chemical Abstracts Online in Undergraduate Chemistry.

    ERIC Educational Resources Information Center

    Krumpolc, Miroslav; And Others

    1987-01-01

    Discusses the advantages of conducting online computer searches of "Chemical Abstracts." Introduces the logical sequences involved in searching an online database. Explains Boolean logic, proximity operators, truncation, searchable fields, and command language, as they relate to the use of online searches in undergraduate chemistry…

  4. Integration of Evidence Base into a Probabilistic Risk Assessment

    NASA Technical Reports Server (NTRS)

    Saile, Lyn; Lopez, Vilma; Bickham, Grandin; Kerstman, Eric; FreiredeCarvalho, Mary; Byrne, Vicky; Butler, Douglas; Myers, Jerry; Walton, Marlei

    2011-01-01

    INTRODUCTION: A probabilistic decision support model such as the Integrated Medical Model (IMM) utilizes an immense amount of input data, which necessitates a systematic, integrated approach to data collection and management. As a result of this approach, the IMM is able to forecast medical events, resource utilization, and crew health during space flight. METHODS: Inflight data is the most desirable input for the Integrated Medical Model. Non-attributable inflight data is collected from the Lifetime Surveillance of Astronaut Health study as well as from the engineers, flight surgeons, and astronauts themselves. When inflight data is unavailable, cohort studies, other models, and Bayesian analyses are used, supplemented on occasion by subject matter experts' input. To determine the quality of evidence for a medical condition, the data source is categorized and assigned a level of evidence from 1 to 5, with level 1 the highest. The collected data reside and are managed in a relational SQL database with a web-based interface for data entry and review. The database is also capable of interfacing with outside applications, which expands capabilities within the database itself. Via the public interface, customers can access a formatted Clinical Findings Form (CLiFF) that outlines the model input and evidence base for each medical condition. Changes to the database are tracked using a documented Configuration Management process. DISCUSSION: This strategic approach provides a comprehensive data management plan for the IMM. The IMM Database's structure and architecture have proven to support additional usages, as seen in the analysis of resource utilization across medical conditions. In addition, the IMM Database's web-based interface provides a user-friendly format for customers to browse and download the clinical information for medical conditions. It is this type of functionality that will provide Exploratory Medicine Capabilities the evidence base for their medical condition list

  5. The new international GLE database

    NASA Astrophysics Data System (ADS)

    Duldig, M. L.; Watts, D. J.

    2001-08-01

    The Australian Antarctic Division has agreed to host the international GLE database. Access to the database is via a world-wide-web interface and initially covers all GLEs since the start of the 22nd solar cycle. Access restriction for recent events is controlled by password protection and these data are available only to those groups contributing data to the database. The restrictions to data will be automatically removed for events older than 2 years, in accordance with the data exchange provisions of the Antarctic Treaty. Use of the data requires acknowledgment of the database as the source of the data and acknowledgment of the specific groups that provided the data used. Furthermore, some groups that provide data to the database have specific acknowledgment requirements or wording. A new submission format has been developed that will allow easier exchange of data, although the old format will be acceptable for some time. Data download options include direct web based download and email. Data may also be viewed as listings or plots with web browsers. Search options have also been incorporated. Development of the database will be ongoing with extension to viewing and delivery options, addition of earlier data and the development of mirror sites. It is expected that two mirror sites, one in North America and one in Europe, will be developed to enable fast access for the whole cosmic ray community.

  6. Rice Glycosyltransferase (GT) Phylogenomic Database

    DOE Data Explorer

    Ronald, Pamela

    The Ronald Laboratory staff at the University of California-Davis has a primary research focus on the genes of the rice plant. They study the role that genetics plays in the way rice plants respond to their environment. They created the Rice GT Database in order to integrate functional genomic information for putative rice Glycosyltransferases (GTs). This database contains information on nearly 800 putative rice GTs (gene models) identified by sequence similarity searches based on the Carbohydrate Active enZymes (CAZy) database. The Rice GT Database provides a platform to display user-selected functional genomic data on a phylogenetic tree. This includes sequence information, mutant line information, expression data, etc. An interactive chromosomal map shows the position of all rice GTs, and links to rice annotation databases are included. The format is intended to "facilitate the comparison of closely related GTs within different families, as well as perform global comparisons between sets of related families." [From http://ricephylogenomics.ucdavis.edu/cellwalls/gt/genInfo.shtml] See also the primary paper discussing this work: Peijian Cao, Laura E. Bartley, Ki-Hong Jung and Pamela C. Ronald. Construction of a Rice Glycosyltransferase Phylogenomic Database and Identification of Rice-Diverged Glycosyltransferases. Molecular Plant, 2008, 1(5): 858-877.

  7. Probabilistic Methods for Uncertainty Propagation Applied to Aircraft Design

    NASA Technical Reports Server (NTRS)

    Green, Lawrence L.; Lin, Hong-Zong; Khalessi, Mohammad R.

    2002-01-01

    Three methods of probabilistic uncertainty propagation and quantification (the method of moments, Monte Carlo simulation, and a nongradient simulation search method) are applied to an aircraft analysis and conceptual design program to demonstrate design under uncertainty. The chosen example problems appear to have discontinuous design spaces and thus these examples pose difficulties for many popular methods of uncertainty propagation and quantification. However, specific implementation features of the first and third methods chosen for use in this study enable successful propagation of small uncertainties through the program. Input uncertainties in two configuration design variables are considered. Uncertainties in aircraft weight are computed. The effects of specifying required levels of constraint satisfaction with specified levels of input uncertainty are also demonstrated. The results show, as expected, that the designs under uncertainty are typically heavier and more conservative than those in which no input uncertainties exist.
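
    To make the Monte Carlo option concrete, here is a minimal sketch of propagating input uncertainty through a design code. The weight function, the distributions, and the constraint threshold are all hypothetical stand-ins, not the paper's aircraft analysis program:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def aircraft_weight(wing_area, thrust):
        # Hypothetical stand-in for the conceptual-design code:
        # any deterministic mapping from design variables to weight.
        return 12_000 + 35.0 * wing_area + 0.08 * thrust

    # Small input uncertainties on two configuration design variables
    # (normal distributions are an assumption for illustration).
    n = 100_000
    wing_area = rng.normal(150.0, 2.0, n)    # m^2
    thrust = rng.normal(60_000.0, 500.0, n)  # N

    weight = aircraft_weight(wing_area, thrust)
    print(f"mean weight: {weight.mean():.1f} kg, std: {weight.std():.1f} kg")
    # Probability a (hypothetical) weight constraint is satisfied:
    print(f"P(weight < 22_100): {(weight < 22_100).mean():.3f}")
    ```

    Sampling-based propagation of this kind makes no smoothness assumptions about the underlying code, which is why it tolerates the discontinuous design spaces the paper describes.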

  8. Probabilistic Modeling of the Renal Stone Formation Module

    NASA Technical Reports Server (NTRS)

    Best, Lauren M.; Myers, Jerry G.; Goodenow, Debra A.; McRae, Michael P.; Jackson, Travis C.

    2013-01-01

    The Integrated Medical Model (IMM) is a probabilistic tool used in mission planning decision making and medical systems risk assessments. The IMM project maintains a database of over 80 medical conditions that could occur during a spaceflight, documenting an incidence rate and end case scenarios for each. In some cases, where observational data are insufficient to adequately define the inflight medical risk, the IMM utilizes external probabilistic modules to model and estimate the event likelihoods. One such medical event of interest is an unpassed renal stone. Due to a high-salt diet and high concentrations of calcium in the blood (due to bone depletion caused by unloading in the microgravity environment), astronauts are at a considerably elevated risk of developing renal calculi (nephrolithiasis) while in space. The lack of observed incidences of nephrolithiasis has led HRP to initiate the development of the Renal Stone Formation Module (RSFM) to create a probabilistic simulator capable of estimating the likelihood of symptomatic renal stone presentation in astronauts on exploration missions. The model consists of two major parts. The first is the probabilistic component, which utilizes probability distributions to assess the range of urine electrolyte parameters and a multivariate regression to transform estimated crystal density and size distributions into the likelihood of the presentation of nephrolithiasis symptoms. The second is a deterministic physical and chemical model of renal stone growth in the kidney developed by Kassemi et al. The probabilistic component of the renal stone model couples the input probability distributions describing the urine chemistry, astronaut physiology, and system parameters with the physical and chemical outputs and inputs of the deterministic stone growth model. These two parts of the model are necessary to capture the uncertainty in the likelihood estimate. The model will be driven by Monte Carlo simulations, continuously
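
    A minimal sketch of the coupling pattern described above: probability distributions over urine-chemistry inputs drive a deterministic growth model, whose output is mapped to a symptom likelihood. Every distribution, the growth function, and the logistic mapping below are hypothetical placeholders, not the RSFM's actual equations:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def stone_growth(calcium, oxalate, urine_volume):
        # Placeholder for the deterministic physicochemical growth
        # model (Kassemi et al.); returns a stone diameter in mm.
        supersaturation = calcium * oxalate / urine_volume
        return 0.5 + 2.0 * np.log1p(supersaturation)

    def p_symptomatic(diameter_mm):
        # Hypothetical logistic mapping from stone size to the
        # probability of symptomatic presentation.
        return 1.0 / (1.0 + np.exp(-(diameter_mm - 4.0)))

    # Monte Carlo over assumed urine-chemistry distributions.
    n = 50_000
    calcium = rng.lognormal(mean=1.0, sigma=0.3, size=n)   # mmol/day
    oxalate = rng.lognormal(mean=-1.0, sigma=0.4, size=n)  # mmol/day
    urine_volume = rng.normal(1.5, 0.3, size=n).clip(0.5)  # L/day

    diam = stone_growth(calcium, oxalate, urine_volume)
    likelihood = p_symptomatic(diam).mean()
    print(f"estimated likelihood of a symptomatic stone: {likelihood:.3f}")
    ```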

  9. Pre-Service Teachers' Use of Library Databases: Some Insights

    ERIC Educational Resources Information Center

    Lamb, Janeen; Howard, Sarah; Easey, Michael

    2014-01-01

    The aim of this study is to investigate if providing mathematics education pre-service teachers with animated library tutorials on library and database searches changes their searching practices. This study involved the completion of a survey by 138 students and seven individual interviews before and after library search demonstration videos were…

  10. Probabilistic Survivability Versus Time Modeling

    NASA Technical Reports Server (NTRS)

    Joyner, James J., Sr.

    2015-01-01

    This technical paper documents the Kennedy Space Center Independent Assessment team's work on three assessments for the Ground Systems Development and Operations (GSDO) Program, performed to assist the Chief Safety and Mission Assurance Officer (CSO) and GSDO management during key programmatic reviews. The assessments provided the GSDO Program with an analysis of how egress time affects the likelihood of astronaut and worker survival during an emergency. For each assessment, the team developed probability distributions for hazard scenarios to address statistical uncertainty, resulting in survivability plots over time. The first assessment developed a mathematical model of probabilistic survivability versus time to reach a safe location using an ideal Emergency Egress System at Launch Complex 39B (LC-39B); the second used the first model to evaluate and compare various egress systems under consideration at LC-39B. The third used a modified LC-39B model to determine whether a specific hazard decreased survivability more rapidly than other events during flight hardware processing in Kennedy's Vehicle Assembly Building (VAB). Based on the composite survivability-versus-time graphs from the first two assessments, there was a soft knee in the Figure of Merit graphs at eight minutes (ten minutes after egress was ordered). Thus, the graphs illustrated to the decision makers that the final emergency egress design selected should be capable of transporting the flight crew from the top of LC-39B to a safe location in eight minutes or less. Results for the third assessment were dominated by hazards that were classified as instantaneous in nature (e.g., stacking mishaps) and therefore had no effect on survivability versus time to egress the VAB. VAB emergency scenarios that degraded over time (e.g., fire) produced survivability-versus-time graphs that were in line with aerospace industry norms.
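
    As a rough illustration of survivability-versus-time modeling, one can treat the time from the egress order until conditions become non-survivable as a random variable T and read survivability at egress time t as P(T > t). The mixture and its parameters below are invented for illustration and are not GSDO data:

    ```python
    import numpy as np

    rng = np.random.default_rng(2)

    # Hypothetical hazard: time (minutes) from the egress order until
    # conditions become non-survivable, mixing a fast-developing and a
    # slow-developing scenario. All parameters are illustrative.
    n = 200_000
    fast = rng.lognormal(mean=np.log(6.0), sigma=0.25, size=n)
    slow = rng.lognormal(mean=np.log(20.0), sigma=0.4, size=n)
    t_lethal = np.where(rng.random(n) < 0.3, fast, slow)

    # Survivability as a function of egress time: P(T_lethal > t).
    for t in (4, 6, 8, 10, 12):
        print(f"egress in {t:2d} min -> survivability "
              f"{np.mean(t_lethal > t):.3f}")
    ```

    Plotting P(T > t) against t produces exactly the kind of curve in which a "soft knee" marks the egress time beyond which survivability degrades quickly.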

  11. Probabilistic Modeling of Rosette Formation

    PubMed Central

    Long, Mian; Chen, Juan; Jiang, Ning; Selvaraj, Periasamy; McEver, Rodger P.; Zhu, Cheng

    2006-01-01

    Rosetting, or forming a cell aggregate between a single target nucleated cell and a number of red blood cells (RBCs), is a simple assay for cell adhesion mediated by specific receptor-ligand interaction. For example, rosette formation between sheep RBCs and human lymphocytes has been used to differentiate T cells from B cells. The rosetting assay is commonly used to determine the interaction of Fc γ-receptors (FcγR) expressed on inflammatory cells with IgG coated on RBCs. Despite its wide use in measuring cell adhesion, the biophysical parameters of rosette formation have not been well characterized. Here we developed a probabilistic model to describe the distribution of rosette sizes, which is Poissonian. The average rosette size is predicted to be proportional to the apparent two-dimensional binding affinity of the interacting receptor-ligand pair and their site densities. The model has been supported by experiments of rosettes mediated by four molecular interactions: FcγRIII interacting with IgG, T cell receptor and coreceptor CD8 interacting with antigen peptide presented by a major histocompatibility molecule, P-selectin interacting with P-selectin glycoprotein ligand 1 (PSGL-1), and L-selectin interacting with PSGL-1. The latter two are structurally similar and are different from the former two. Fitting the model to data enabled us to evaluate the apparent effective two-dimensional binding affinity of the interacting molecular pairs: 7.19 × 10⁻⁵ μm⁴ for the FcγRIII-IgG interaction, 4.66 × 10⁻³ μm⁴ for the P-selectin-PSGL-1 interaction, and 0.94 × 10⁻³ μm⁴ for the L-selectin-PSGL-1 interaction. These results elucidate the biophysical mechanism of rosette formation and enable it to become a semiquantitative assay that relates the rosette size to the effective affinity for receptor-ligand binding. PMID:16603493
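
    A small sketch of how the Poisson model can be used in practice, assuming the mean rosette size takes the product form λ = m_r · m_l · K2d (consistent with the abstract's proportionality claim): the maximum-likelihood estimate of λ is the sample mean, and dividing out the site densities yields an affinity estimate. The site densities and data here are simulated; only the FcγRIII-IgG affinity value is taken from the abstract:

    ```python
    import numpy as np

    rng = np.random.default_rng(3)

    # Simulated rosette sizes (RBCs bound per target cell), Poisson
    # with mean lambda = m_r * m_l * K2d, where m_r and m_l are
    # receptor/ligand site densities (sites/um^2) and K2d is the
    # apparent effective two-dimensional affinity (um^4).
    m_r, m_l = 50.0, 270.0   # hypothetical site densities
    true_K2d = 7.19e-5       # um^4, FcgammaRIII-IgG value from the paper
    sizes = rng.poisson(m_r * m_l * true_K2d, size=500)

    # Maximum-likelihood fit: the Poisson mean is the sample mean,
    # so the affinity estimate follows by dividing out the densities.
    lam_hat = sizes.mean()
    K2d_hat = lam_hat / (m_r * m_l)
    print(f"mean rosette size {lam_hat:.2f}, "
          f"estimated K2d {K2d_hat:.2e} um^4")
    ```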

  12. Talent Searches.

    ERIC Educational Resources Information Center

    Silverman, Linda Kreger, Ed.

    1994-01-01

    Talent searches are discussed in this journal theme issue, with two feature articles and several recurring columns. "Talent Search: A Driving Force in Gifted Education," by Paula Olszewski-Kubilius, defines what a talent search is, how the Talent Search was developed by Dr. Julian Stanley at Johns Hopkins University in Maryland, the…

  13. Probabilistic numerics and uncertainty in computations

    PubMed Central

    Hennig, Philipp; Osborne, Michael A.; Girolami, Mark

    2015-01-01

    We deliver a call to arms for probabilistic numerical methods: algorithms for numerical tasks, including linear algebra, integration, optimization and solving differential equations, that return uncertainties in their calculations. Such uncertainties, arising from the loss of precision induced by numerical calculation with limited time or hardware, are important for much contemporary science and industry. Within applications such as climate science and astrophysics, the need to make decisions on the basis of computations with large and complex data has led to a renewed focus on the management of numerical uncertainty. We describe how several seminal classic numerical methods can be interpreted naturally as probabilistic inference. We then show that the probabilistic view suggests new algorithms that can flexibly be adapted to suit application specifics, while delivering improved empirical performance. We provide concrete illustrations of the benefits of probabilistic numeric algorithms on real scientific problems from astrometry and astronomical imaging, while highlighting open problems with these new algorithms. Finally, we describe how probabilistic numerical methods provide a coherent framework for identifying the uncertainty in calculations performed with a combination of numerical algorithms (e.g. both numerical optimizers and differential equation solvers), potentially allowing the diagnosis (and control) of error sources in computations. PMID:26346321

  14. Probabilistic numerics and uncertainty in computations.

    PubMed

    Hennig, Philipp; Osborne, Michael A; Girolami, Mark

    2015-07-08

    We deliver a call to arms for probabilistic numerical methods: algorithms for numerical tasks, including linear algebra, integration, optimization and solving differential equations, that return uncertainties in their calculations. Such uncertainties, arising from the loss of precision induced by numerical calculation with limited time or hardware, are important for much contemporary science and industry. Within applications such as climate science and astrophysics, the need to make decisions on the basis of computations with large and complex data has led to a renewed focus on the management of numerical uncertainty. We describe how several seminal classic numerical methods can be interpreted naturally as probabilistic inference. We then show that the probabilistic view suggests new algorithms that can flexibly be adapted to suit application specifics, while delivering improved empirical performance. We provide concrete illustrations of the benefits of probabilistic numeric algorithms on real scientific problems from astrometry and astronomical imaging, while highlighting open problems with these new algorithms. Finally, we describe how probabilistic numerical methods provide a coherent framework for identifying the uncertainty in calculations performed with a combination of numerical algorithms (e.g. both numerical optimizers and differential equation solvers), potentially allowing the diagnosis (and control) of error sources in computations.
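
    A toy example of the idea, using plain Monte Carlo integration: the numerical result is reported together with an uncertainty (here a CLT-based standard error) rather than as a bare point estimate. Full probabilistic numerical methods go further, e.g. by placing a Gaussian-process prior on the integrand, but the output contract is the same:

    ```python
    import numpy as np

    rng = np.random.default_rng(4)

    # Numerical task: integrate f on [0, 1]. The estimator returns
    # both a value and an uncertainty, so downstream computations
    # can account for numerical error.
    f = lambda x: np.exp(-x**2)

    n = 10_000
    samples = f(rng.random(n))
    estimate = samples.mean()
    std_error = samples.std(ddof=1) / np.sqrt(n)
    print(f"integral ~ {estimate:.5f} +/- {std_error:.5f} "
          f"(true value ~ 0.74682)")
    ```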

  15. Summary Report for the SINBAD Search Tool Project

    SciTech Connect

    Cunha Da Silva, Alice

    2012-06-01

    The Shielding Integral Benchmark Archive Database (SINBAD) Search Tool has been developed to serve as an interface to the SINBAD database, providing a simple and quick means of searching for information related to experimental benchmark problems. The Search Tool is written in Java and provides a more efficient way to retrieve information from the SINBAD database. Searches can be performed quickly and easily. Notably, users are no longer required to know the name of a benchmark to search the database; instead, a search can be performed by specifying the experimental facility, the constituents of the experimental benchmark, and similar attributes. In summary, a new, powerful database search tool has been developed for SINBAD.
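
    To illustrate the kind of attribute-based search described (by facility or constituents rather than by benchmark name), here is a minimal sketch against a hypothetical schema; the real tool is written in Java and its schema is not reproduced here:

    ```python
    import sqlite3

    # Hypothetical table of benchmark metadata; names and columns
    # are invented for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE benchmark (
        name TEXT, facility TEXT, constituents TEXT)""")
    conn.executemany(
        "INSERT INTO benchmark VALUES (?, ?, ?)",
        [("benchmark-A", "facility-1", "iron"),
         ("benchmark-B", "facility-2", "graphite")])

    # Search by facility or constituent -- no benchmark name needed.
    rows = conn.execute(
        "SELECT name FROM benchmark "
        "WHERE facility = ? OR constituents LIKE ?",
        ("facility-2", "%iron%")).fetchall()
    print(rows)
    ```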

  16. Probabilistic Exposure Analysis for Chemical Risk Characterization

    PubMed Central

    Bogen, Kenneth T.; Cullen, Alison C.; Frey, H. Christopher; Price, Paul S.

    2009-01-01

    This paper summarizes the state of the science of probabilistic exposure assessment (PEA) as applied to chemical risk characterization. Current probabilistic risk analysis methods applied to PEA are reviewed. PEA within the context of risk-based decision making is discussed, including probabilistic treatment of related uncertainty, interindividual heterogeneity, and other sources of variability. Key examples of recent experience gained in assessing human exposures to chemicals in the environment, and other applications to chemical risk characterization and assessment, are presented. It is concluded that, although improvements continue to be made, existing methods suffice for effective application of PEA to support quantitative analyses of the risk of chemically induced toxicity that play an increasing role in key decision-making objectives involving health protection, triage, civil justice, and criminal justice. Different types of information required to apply PEA to these different decision contexts are identified, and specific PEA methods are highlighted that are best suited to exposure assessment in these separate contexts. PMID:19223660
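
    As a concrete instance of PEA, a textbook-style Monte Carlo average-daily-dose calculation is sketched below; all distributions and parameter values are illustrative placeholders, not figures from the review:

    ```python
    import numpy as np

    rng = np.random.default_rng(5)

    # Simplified average daily dose: ADD = C * IR * EF / BW, with the
    # exposure frequency expressed as a fraction of the year. The
    # distributions capture interindividual variability.
    n = 100_000
    C = rng.lognormal(np.log(0.01), 0.5, n)   # concentration, mg/L
    IR = rng.lognormal(np.log(1.4), 0.3, n)   # ingestion rate, L/day
    EF = 350.0 / 365.0                        # exposure frequency
    BW = rng.normal(70.0, 12.0, n).clip(40)   # body weight, kg

    add = C * IR * EF / BW                    # mg/(kg*day)
    print(f"median ADD: {np.median(add):.2e}")
    print(f"95th percentile ADD: {np.percentile(add, 95):.2e}")
    ```

    Reporting the full distribution, rather than a single worst-case point value, is what lets risk managers separate uncertainty from interindividual heterogeneity in the decision contexts the paper discusses.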

  17. bayesPop: Probabilistic Population Projections

    PubMed Central

    Ševčíková, Hana; Raftery, Adrian E.

    2016-01-01

    We describe bayesPop, an R package for producing probabilistic population projections for all countries. This uses probabilistic projections of total fertility and life expectancy generated by Bayesian hierarchical models. It produces a sample from the joint posterior predictive distribution of future age- and sex-specific population counts, fertility rates and mortality rates, as well as future numbers of births and deaths. It provides graphical ways of summarizing this information, including trajectory plots and various kinds of probabilistic population pyramids. An expression language is introduced which allows the user to produce the predictive distribution of a wide variety of derived population quantities, such as the median age or the old age dependency ratio. The package produces aggregated projections for sets of countries, such as UN regions or trading blocs. The methodology has been used by the United Nations to produce their most recent official population projections for all countries, published in the World Population Prospects. PMID:28077933

  18. bayesPop: Probabilistic Population Projections.

    PubMed

    Ševčíková, Hana; Raftery, Adrian E

    2016-12-01

    We describe bayesPop, an R package for producing probabilistic population projections for all countries. This uses probabilistic projections of total fertility and life expectancy generated by Bayesian hierarchical models. It produces a sample from the joint posterior predictive distribution of future age- and sex-specific population counts, fertility rates and mortality rates, as well as future numbers of births and deaths. It provides graphical ways of summarizing this information, including trajectory plots and various kinds of probabilistic population pyramids. An expression language is introduced which allows the user to produce the predictive distribution of a wide variety of derived population quantities, such as the median age or the old age dependency ratio. The package produces aggregated projections for sets of countries, such as UN regions or trading blocs. The methodology has been used by the United Nations to produce their most recent official population projections for all countries, published in the World Population Prospects.

  19. A probabilistic approach to spectral graph matching.

    PubMed

    Egozi, Amir; Keller, Yosi; Guterman, Hugo

    2013-01-01

    Spectral Matching (SM) is a computationally efficient approach to approximating the solution of pairwise matching problems that are NP-hard. In this paper, we present a probabilistic interpretation of spectral matching schemes and derive a novel Probabilistic Matching (PM) scheme that is shown to outperform previous approaches. We show that spectral matching can be interpreted as a Maximum Likelihood (ML) estimate of the assignment probabilities and that the Graduated Assignment (GA) algorithm can be cast as a Maximum a Posteriori (MAP) estimator. Based on this analysis, we derive a ranking scheme for spectral matchings based on their reliability, and propose a novel iterative probabilistic matching algorithm that relaxes some of the implicit assumptions used in prior works. We experimentally show our approaches to outperform previous schemes when applied to exhaustive synthetic tests as well as the analysis of real image sequences.
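
    For orientation, a minimal sketch of the baseline SM step that the paper reinterprets probabilistically: the leading eigenvector of the pairwise-affinity matrix over candidate assignments is read as a vector of assignment confidences and then greedily discretized to a one-to-one matching. The affinity matrix below is hand-made for illustration:

    ```python
    import numpy as np

    def spectral_match(M, n1, n2):
        """Baseline spectral matching. M is the (n1*n2) x (n1*n2)
        symmetric affinity matrix over candidate assignments i -> a;
        the leading eigenvector gives per-assignment confidences
        (the quantity interpreted as ML assignment probabilities),
        which are discretized greedily to a one-to-one matching."""
        vals, vecs = np.linalg.eigh(M)
        x = np.abs(vecs[:, -1]).reshape(n1, n2)  # leading eigenvector
        match, used_rows, used_cols = [], set(), set()
        # Greedy one-to-one discretization in order of confidence.
        for idx in np.argsort(-x, axis=None):
            i, a = divmod(int(idx), n2)
            if i not in used_rows and a not in used_cols:
                match.append((i, a))
                used_rows.add(i)
                used_cols.add(a)
        return match

    # Tiny example: 2 points matched to 2 points. Assignment order is
    # (0->0, 0->1, 1->0, 1->1); the off-diagonal entries reward
    # geometrically consistent assignment pairs.
    M = np.zeros((4, 4))
    M[0, 3] = M[3, 0] = 1.0   # 0->0 and 1->1 are mutually consistent
    M[1, 2] = M[2, 1] = 0.3   # 0->1 and 1->0 are weakly consistent
    np.fill_diagonal(M, [0.5, 0.1, 0.1, 0.5])
    print(spectral_match(M, 2, 2))  # matches 0->0 and 1->1
    ```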

  20. Probabilistic Cue Combination: Less is More

    PubMed Central

    Yurovsky, Daniel; Boyer, Ty W.; Smith, Linda B.; Yu, Chen

    2012-01-01

    Learning about the structure of the world requires learning probabilistic relationships: rules in which cues do not predict outcomes with certainty. However, in some cases, the ability to track probabilistic relationships is a handicap, leading adults to perform non-normatively in prediction tasks. For example, in the dilution effect, predictions made from the combination of two cues of different strengths are less accurate than those made from the stronger cue alone. Here we show that dilution is an adult problem; 11-month-old infants combine strong and weak predictors normatively. These results extend and add support for the "less is more" hypothesis: limited cognitive resources can lead children to represent probabilistic information differently from adults, and this difference in representation can have important downstream consequences for prediction. PMID:23432826