Science.gov

Sample records for probabilistic database search

  1. MSblender: a probabilistic approach for integrating peptide identifications from multiple database search engines

    PubMed Central

    Kwon, Taejoon; Choi, Hyungwon; Vogel, Christine; Nesvizhskii, Alexey I.; Marcotte, Edward M.

    2011-01-01

    Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers from limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high-scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for all possible PSMs and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. Increased identifications increment spectral counts for all detected proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that enhanced quantification contributes to improved sensitivity in differential expression analyses. PMID:21488652
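
    As an illustration of the probability-to-FDR step described above (a minimal sketch in the spirit of the abstract, not MSblender's actual code; the numbers are hypothetical), PSM-level posterior probabilities can be turned into a decoy-free false discovery rate estimate:

```python
"""Estimate FDR from PSM posterior probabilities (a sketch, not MSblender's code).

If p_i is the posterior probability that PSM i is correct, the expected number
of false PSMs among an accepted set A is sum(1 - p_i for i in A), so
FDR(A) ~= sum(1 - p_i) / |A|.
"""

def fdr_at_threshold(posteriors, threshold):
    accepted = [p for p in posteriors if p >= threshold]
    if not accepted:
        return 0.0
    return sum(1.0 - p for p in accepted) / len(accepted)

def threshold_for_fdr(posteriors, target_fdr=0.01):
    """Lowest probability cutoff whose estimated FDR stays below target_fdr."""
    for t in sorted(set(posteriors)):       # try cutoffs from permissive to strict
        if fdr_at_threshold(posteriors, t) <= target_fdr:
            return t
    return 1.0

# Example: mixed-confidence PSMs; accept those above the computed cutoff.
psms = [0.99, 0.98, 0.97, 0.95, 0.90, 0.60, 0.40, 0.10]
cutoff = threshold_for_fdr(psms, target_fdr=0.05)
print(cutoff, fdr_at_threshold(psms, cutoff))
```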

  2. MSblender: A probabilistic approach for integrating peptide identifications from multiple database search engines.

    PubMed

    Kwon, Taejoon; Choi, Hyungwon; Vogel, Christine; Nesvizhskii, Alexey I; Marcotte, Edward M

    2011-07-01

    Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers from limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high-scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for every possible PSM and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. Increased identifications increment spectral counts for most proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that enhanced quantification contributes to improved sensitivity in differential expression analyses. PMID:21488652

  3. XLSearch: a Probabilistic Database Search Algorithm for Identifying Cross-Linked Peptides.

    PubMed

    Ji, Chao; Li, Sujun; Reilly, James P; Radivojac, Predrag; Tang, Haixu

    2016-06-01

    Chemical cross-linking combined with mass spectrometric analysis has become an important technique for probing protein three-dimensional structure and protein-protein interactions. A key step in this process is the accurate identification and validation of cross-linked peptides from tandem mass spectra. The identification of cross-linked peptides, however, presents challenges related to the expanded nature of the search space (all pairs of peptides in a sequence database) and the fact that some peptide-spectrum matches (PSMs) contain one correct and one incorrect peptide but often receive scores that are comparable to those in which both peptides are correctly identified. To address these problems and improve detection of cross-linked peptides, we propose a new database search algorithm, XLSearch, for identifying cross-linked peptides. Our approach is based on a data-driven scoring scheme that independently estimates the probability of correctly identifying each individual peptide in the cross-link given knowledge of the correct or incorrect identification of the other peptide. These conditional probabilities are subsequently used to estimate the joint posterior probability that both peptides are correctly identified. Using the data from two previous cross-link studies, we show the effectiveness of this scoring scheme, particularly in distinguishing between true identifications and those containing one incorrect peptide. We also provide evidence that XLSearch achieves more identifications than two alternative methods at the same false discovery rate (availability: https://github.com/COL-IU/XLSearch ). PMID:27068484
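
    The conditional-to-joint computation the abstract describes can be illustrated with a few lines of arithmetic (a hypothetical sketch; the probabilities and variable names are invented for illustration and are not XLSearch's API):

```python
def crosslink_posteriors(p_b, p_a_given_b_true, p_a_given_b_false):
    """Joint outcome probabilities for a cross-linked PSM (illustrative sketch).

    p_b               : posterior that peptide B is correctly identified
    p_a_given_b_true  : P(A correct | B correct)
    p_a_given_b_false : P(A correct | B incorrect)
    """
    both    = p_a_given_b_true * p_b                  # true-true identification
    only_a  = p_a_given_b_false * (1.0 - p_b)         # true-false identification
    only_b  = (1.0 - p_a_given_b_true) * p_b
    neither = (1.0 - p_a_given_b_false) * (1.0 - p_b)
    return {"both": both, "one": only_a + only_b, "neither": neither}

# A confident B with a strongly coupled A favors the true-true interpretation,
# which is exactly the case a true-false PSM score would obscure.
print(crosslink_posteriors(0.9, 0.8, 0.1))
```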

  4. Online Database Searching Workbook.

    ERIC Educational Resources Information Center

    Littlejohn, Alice C.; Parker, Joan M.

    Designed primarily for use by first-time searchers, this workbook provides an overview of online searching. Following a brief introduction which defines online searching, databases, and database producers, five steps in carrying out a successful search are described: (1) identifying the main concepts of the search statement; (2) selecting a…

  5. Database Searching by Managers.

    ERIC Educational Resources Information Center

    Arnold, Stephen E.

    Managers and executives need the easy and quick access to business and management information that online databases can provide, but many have difficulty articulating their search needs to an intermediary. One possible solution would be to encourage managers and their immediate support staff members to search textual databases directly as they now…

  6. Paleomagnetic database search possible

    NASA Astrophysics Data System (ADS)

    Harbert, William

    I have recently finished an on-line search program which allows remote users to search the “Abase” ASCII version of the World Paleomagnetic Database developed by Lock and McElhinny [1991]. The program is very simple to use and will search the Soviet, non-Soviet, rock unit, and reference databases and create output files that can be downloaded back to a researcher's local system using the ftp command. To use Search, telnet to 130.49.3.1 (earth.eps.pitt.edu) and login as the user “Search”. There is no password, and the user is asked a series of questions that define the geographic region and ages of interest. The program will also ask for an identifier with which to create the output file names. The program has three modes of operation: text-only, Tektronix graphics, or X11/R5 graphics; the proper choice depends on the computer hardware used by the searcher.

  7. Online Search Patterns: NLM CATLINE Database.

    ERIC Educational Resources Information Center

    Tolle, John E.; Hah, Sehchang

    1985-01-01

    Presents analysis of online search patterns within user searching sessions of National Library of Medicine ELHILL system and examines user search patterns on the CATLINE database. Data previously analyzed on MEDLINE database for same period is used to compare the performance parameters of different databases within the same information system.…

  8. A probabilistic approach to information retrieval in heterogeneous databases

    SciTech Connect

    Chatterjee, A.; Segev, A.

    1991-08-01

    During the past decade, organizations have extended their scope and operations beyond their traditional geographic boundaries. At the same time, they have adopted heterogeneous and incompatible information systems, independent of one another, without careful consideration that one day these systems might need to be integrated. As a result of this diversity, many important business applications today require access to data stored in multiple autonomous databases. This paper examines the problem of inter-database information retrieval in a heterogeneous environment, where conventional techniques are no longer efficient. To solve the problem, broader definitions for the join, union, intersection, and selection operators are proposed. A probabilistic method to specify the selectivity of these operators is also discussed, and an algorithm to compute these probabilities is provided in pseudocode.
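
    As a rough sketch of what a probabilistic selectivity estimate for such broadened operators might look like (an invented example; the paper's actual algorithm is given in its own pseudocode and is not reproduced here), one can compute the expected fraction of tuple pairs satisfying a probabilistic match predicate:

```python
from itertools import product

def probabilistic_join_selectivity(r1, r2, p_match):
    """Expected fraction of tuple pairs joined under a probabilistic predicate.

    r1, r2  : lists of join-attribute values from two heterogeneous databases
    p_match : function giving the probability two values denote the same entity
    """
    pairs = list(product(r1, r2))
    expected_matches = sum(p_match(u, v) for u, v in pairs)
    return expected_matches / len(pairs)

def name_match(u, v):
    """Hypothetical matcher for incompatible naming conventions."""
    if u == v:
        return 1.0
    if u.lower() == v.lower():
        return 0.9        # same name, different capitalization conventions
    return 0.0

dept_a = ["Sales", "R&D", "Finance"]
dept_b = ["SALES", "R&D", "Legal"]
print(probabilistic_join_selectivity(dept_a, dept_b, name_match))  # ~0.21
```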

  9. Library Instruction and Online Database Searching.

    ERIC Educational Resources Information Center

    Mercado, Heidi

    1999-01-01

    Reviews changes in online database searching in academic libraries. Topics include librarians conducting all searches; the advent of end-user searching and the need for user instruction; compact disk technology; online public catalogs; the Internet; full text databases; electronic information literacy; user education and the remote library user;…

  10. Quantum search of a real unstructured database

    NASA Astrophysics Data System (ADS)

    Broda, Bogusław

    2016-02-01

    A simple circuit implementation of the oracle for Grover's quantum search of a real unstructured classical database is proposed. The oracle contains a kind of quantumly accessible classical memory, which stores the database.
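
    For readers unfamiliar with the search the oracle plugs into, a minimal statevector simulation of Grover's algorithm (illustrative only; the paper's contribution is the oracle circuit itself, which is not modeled here) looks like this:

```python
import numpy as np

def grover_search(n_qubits, marked, n_iters=None):
    """Simulate Grover's search over 2**n_qubits items on the full statevector."""
    n = 2 ** n_qubits
    if n_iters is None:
        n_iters = int(np.floor(np.pi / 4 * np.sqrt(n)))  # standard iteration count
    state = np.full(n, 1 / np.sqrt(n))                   # uniform superposition
    for _ in range(n_iters):
        state[marked] *= -1                              # oracle: flip marked amplitude
        state = 2 * state.mean() - state                 # diffusion: invert about mean
    return np.argmax(state ** 2), state[marked] ** 2

index, prob = grover_search(n_qubits=8, marked=42)
print(index, round(float(prob), 4))   # finds item 42 with probability near 1
```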

  11. Ranking search for probabilistic fingerprinting codes

    NASA Astrophysics Data System (ADS)

    Schäfer, Marcel; Berchtold, Waldemar; Steinebach, Martin

    2012-03-01

    Digital transaction watermarking is today a widely accepted mechanism to discourage the illegal distribution of multimedia. The transaction watermark is a user-specific message that is embedded in every copy of one content item, making each copy individual and thus allowing copyright infringements to be traced back. One major threat to transaction watermarking is the collusion attack, in which multiple individualized copies of the work are compared and/or combined to attack the integrity or availability of the embedded watermark message. One solution to counter such attacks is mathematical codes called collusion-secure fingerprinting codes. Problems arise when applying such codes to multimedia files with a small payload, e.g. short audio tracks or images: the code length has to be shortened, which increases the error rates and/or the effort of the tracing algorithm. In this work we propose an approach that can be used as an addition to probabilistic fingerprinting codes to reduce tracing effort and increase security, as well as a new, separate method that provides shorter codes with a very fast and highly accurate tracing algorithm.

  12. Searching and Indexing Genomic Databases via Kernelization

    PubMed Central

    Gagie, Travis; Puglisi, Simon J.

    2015-01-01

    The rapid advance of DNA sequencing technologies has yielded databases of thousands of genomes. To search and index these databases effectively, it is important that we take advantage of the similarity between those genomes. Several authors have recently suggested searching or indexing only one reference genome and the parts of the other genomes where they differ. In this paper, we survey the 20-year history of this idea and discuss its relation to kernelization in parameterized complexity. PMID:25710001

  13. Interactive searching of facial image databases

    NASA Astrophysics Data System (ADS)

    Nicholls, Robert A.; Shepherd, John W.; Shepherd, Jean

    1995-09-01

    A set of psychological facial descriptors has been devised to enable computerized searching of criminal photograph albums. The descriptors have been used to encode image databases of up to twelve thousand images. Using a system called FACES, the databases are searched by translating a witness' verbal description into corresponding facial descriptors. Trials of FACES have shown that this coding scheme is more productive and efficient than searching traditional photograph albums. An alternative method of searching the encoded database using a genetic algorithm is currently being tested. The genetic search method does not require the witness to verbalize a description of the target but merely to indicate a degree of similarity between the target and a limited selection of images from the database. The major drawback of FACES is that it requires manual encoding of images. Research is being undertaken to automate the process; however, it will require an algorithm which can predict human descriptive values. Alternatives to human-derived coding schemes exist using statistical classifications of images. Since databases encoded using statistical classifiers do not have an obvious direct mapping to human-derived descriptors, a search method which does not require the entry of human descriptors is needed. A genetic search algorithm is being tested for this purpose.

  14. Active fault database of Japan: Its construction and search system

    NASA Astrophysics Data System (ADS)

    Yoshioka, T.; Miyamoto, F.

    2011-12-01

    The Active Fault Database of Japan was constructed by the Active Fault and Earthquake Research Center, GSJ/AIST, and opened to the public on the Internet from 2005 to enable probabilistic evaluation of future faulting events and earthquake occurrence on major active faults in Japan. The database consists of three sub-databases: 1) a sub-database on individual sites, which includes long-term slip data and paleoseismicity data with error ranges and reliability; 2) a sub-database on details of paleoseismicity, which includes the excavated geological units and faulting event horizons with age control; and 3) a sub-database on characteristics of behavioral segments, which includes fault length, long-term slip rate, recurrence intervals, most recent event, slip per event, and the best estimate of the cascade earthquake. The database covers major seismogenic faults, approximately the best-estimate segments of cascade earthquakes; each has a length of 20 km or longer and a slip rate of 0.1 m/ky or larger and is composed of about two behavioral segments on average. The database contains information on active faults in Japan organized by the concept of "behavioral segments" (McCalpin, 1996). The faults are subdivided into 550 behavioral segments based on surface trace geometry and rupture history revealed by paleoseismic studies. Behavioral segments can be searched on Google Maps: one can select a behavioral segment directly or search for segments within a rectangular area on the map. The result of a search is shown on a fixed map or on Google Maps with geologic and paleoseismic parameters, including slip rate, slip per event, recurrence interval, and the calculated rupture probability in the future. Behavioral segments can also be searched by name or by a combination of fault parameters. All data are compiled from journal articles, theses, and other documents. We are currently developing a revised edition, which is based on an improved database system. More than ten

  15. Use of a Probabilistic Motif Search to Identify Histidine Phosphotransfer Domain-Containing Proteins.

    PubMed

    Surujon, Defne; Ratner, David I

    2016-01-01

    The wealth of newly obtained proteomic information affords researchers the possibility of searching for proteins of a given structure or function. Here we describe a general method for the detection of a protein domain of interest in any species for which a complete proteome exists. In particular, we apply this approach to identify histidine phosphotransfer (HPt) domain-containing proteins across a range of eukaryotic species. From the sequences of known HPt domains, we created an amino acid occurrence matrix which we then used to define a conserved, probabilistic motif. Examination of various organisms either known to contain (plant and fungal species) or believed to lack (mammals) HPt domains established criteria by which new HPt candidates were identified and ranked. Search results using a probabilistic motif matrix compare favorably with data to be found in several commonly used protein structure/function databases: our method identified all known HPt proteins in the Arabidopsis thaliana proteome, confirmed the absence of such motifs in mice and humans, and suggests new candidate HPts in several organisms. Moreover, probabilistic motif searching can be applied more generally, in a manner both readily customized and computationally compact, to other protein domains; this utility is demonstrated by our identification of histones in a range of eukaryotic organisms. PMID:26751210
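
    The occurrence-matrix idea described above generalizes the familiar position-specific scoring matrix. Below is a toy sketch of that approach (the motif examples and thresholds are hypothetical, not the paper's HPt data):

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(aligned_seqs, pseudocount=1.0, background=0.05):
    """Log-odds position-specific scoring matrix from aligned example domains,
    a toy stand-in for the paper's amino acid occurrence matrix."""
    length = len(aligned_seqs[0])
    pssm = []
    for i in range(length):
        counts = Counter(seq[i] for seq in aligned_seqs)
        total = len(aligned_seqs) + pseudocount * len(AMINO_ACIDS)
        pssm.append({aa: math.log(((counts[aa] + pseudocount) / total) / background)
                     for aa in AMINO_ACIDS})
    return pssm

def best_window_score(pssm, protein):
    """Slide the motif over the protein; report the best-scoring window."""
    w = len(pssm)
    scores = [(sum(col[protein[i + j]] for j, col in enumerate(pssm)), i)
              for i in range(len(protein) - w + 1)]
    return max(scores)

# Hypothetical 4-residue motif examples and a candidate protein (not real HPt data).
examples = ["HQKS", "HQKT", "HAKS", "HQRS"]
pssm = build_pssm(examples)
print(best_window_score(pssm, "MGAHQKSVLD"))   # (score, position of best window)
```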

  16. Use of a Probabilistic Motif Search to Identify Histidine Phosphotransfer Domain-Containing Proteins

    PubMed Central

    Surujon, Defne; Ratner, David I.

    2016-01-01

    The wealth of newly obtained proteomic information affords researchers the possibility of searching for proteins of a given structure or function. Here we describe a general method for the detection of a protein domain of interest in any species for which a complete proteome exists. In particular, we apply this approach to identify histidine phosphotransfer (HPt) domain-containing proteins across a range of eukaryotic species. From the sequences of known HPt domains, we created an amino acid occurrence matrix which we then used to define a conserved, probabilistic motif. Examination of various organisms either known to contain (plant and fungal species) or believed to lack (mammals) HPt domains established criteria by which new HPt candidates were identified and ranked. Search results using a probabilistic motif matrix compare favorably with data to be found in several commonly used protein structure/function databases: our method identified all known HPt proteins in the Arabidopsis thaliana proteome, confirmed the absence of such motifs in mice and humans, and suggests new candidate HPts in several organisms. Moreover, probabilistic motif searching can be applied more generally, in a manner both readily customized and computationally compact, to other protein domains; this utility is demonstrated by our identification of histones in a range of eukaryotic organisms. PMID:26751210

  17. Online search patterns: NLM CATLINE database.

    PubMed

    Tolle, J E; Hah, S

    1985-03-01

    In this article the authors present their analysis of the online search patterns within user searching sessions of the National Library of Medicine ELHILL system and examine the user search patterns on the CATLINE database. In addition to the CATLINE analysis, a comparison is made using data previously analyzed on the MEDLINE database for the same time period, thus offering an opportunity to compare the performance parameters of different databases within the same information system. Data collection covers eight weeks and includes 441,282 transactions and over 11,067 user sessions, which accounted for 1680 hours of system usage. The descriptive analysis contained in this report can assist system design activities, while the predictive power of the transaction log analysis methodology may assist the development of real-time aids. PMID:10300015

  18. Efficient search and retrieval in biometric databases

    NASA Astrophysics Data System (ADS)

    Mhatre, Amit J.; Palla, Srinivas; Chikkerur, Sharat; Govindaraju, Venu

    2005-03-01

    Biometric identification has emerged as a reliable means of controlling access to both physical and virtual spaces. Fingerprints, face and voice biometrics are being increasingly used as alternatives to passwords, PINs and visual verification. In spite of the rapid proliferation of large-scale databases, the research has thus far been focused only on accuracy within small databases. In larger applications, response time and retrieval efficiency also become important in addition to accuracy. Unlike structured information such as text or numeric data that can be sorted, biometric data does not have any natural sorting order. Therefore indexing and binning of biometric databases represents a challenging problem. We present results using parallel combination of multiple biometrics to bin the database. Using hand geometry and signature features we show that the search space can be reduced to just 5% of the entire database.

  19. Multi-Database Searching in Forensic Psychology.

    ERIC Educational Resources Information Center

    Piotrowski, Chris; Perdue, Robert W.

    Traditional library skills have been augmented since the introduction of online computerized database services. Because of the complexity of the field, forensic psychology can benefit enormously from the application of comprehensive bibliographic search strategies. The study reported here demonstrated the bibliographic results obtained when a…

  20. Searching Across the International Space Station Databases

    NASA Technical Reports Server (NTRS)

    Maluf, David A.; McDermott, William J.; Smith, Ernest E.; Bell, David G.; Gurram, Mohana

    2007-01-01

    Data access in the enterprise generally requires combining data from different sources and different formats. It is thus advantageous to focus on the intersection of the knowledge across sources and domains; keeping irrelevant knowledge around only serves to make the integration more unwieldy and more complicated than necessary. A context search over multiple domains is proposed in this paper, using context-sensitive queries to support disciplined manipulation of domain knowledge resources. The objective of a context search is to provide the capability for interrogating many domain knowledge resources, which are largely semantically disjoint. The search formally supports the tasks of selecting, combining, extending, specializing, and modifying components from a diverse set of domains. This paper demonstrates a new paradigm in the composition of information for enterprise applications. In particular, it discusses an approach to achieving data integration across multiple sources in a manner that does not require heavy investment in database and middleware maintenance. This lean approach to integration leads to cost-effectiveness and scalability of data integration with an underlying schemaless object-relational database management system. The resulting highly scalable, information-on-demand system framework, called NX-Search, is an implementation of an information system built on NETMARK, a flexible, high-throughput open database integration framework for managing, storing, and searching unstructured or semi-structured arbitrary XML and HTML that is used widely at the National Aeronautics and Space Administration (NASA) and in industry.

  1. Perturbation method for probabilistic search for the traveling salesperson problem

    NASA Astrophysics Data System (ADS)

    Cohoon, James P.; Karro, John E.; Martin, Worthy N.; Niebel, William D.; Nagel, Klaus

    1998-10-01

    The Traveling Salesperson Problem (TSP) is an NP-complete combinatorial optimization problem of substantial importance in many scheduling applications. Here we show the viability of SPAN, a hybrid approach to solving the TSP that incorporates a perturbation method applied to a classic heuristic in the overall context of a probabilistic search control strategy. In particular, the heuristic for the TSP is based on the minimal spanning tree of the city locations, the perturbation method is a simple modification of the city locations, and the control strategy is a genetic algorithm (GA). The crucial concept here is that the perturbation of the problem allows variant solutions to be generated by the heuristic and applied to the original problem, thus providing the GA with capabilities for both exploration and exploitation in its search process. We demonstrate that SPAN outperforms, with regard to solution quality, one of the best GA systems reported in the literature.
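
    To make the perturb-solve-reevaluate pattern concrete, here is a hedged sketch that substitutes a nearest-neighbor heuristic for SPAN's minimal-spanning-tree heuristic and a plain keep-the-best loop for its genetic algorithm; only the perturbation idea itself is taken from the abstract:

```python
import random, math

def tour_length(tour, cities):
    return sum(math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def nearest_neighbor_tour(cities):
    """Classic constructive heuristic, standing in for SPAN's MST-based one."""
    unvisited = set(range(1, len(cities)))
    tour = [0]
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda j: math.dist(cities[last], cities[j]))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour

def perturbation_search(cities, rounds=200, sigma=0.05):
    """Perturb city coordinates, solve the perturbed instance heuristically,
    and keep whichever variant tour scores best on the ORIGINAL instance."""
    best = nearest_neighbor_tour(cities)
    best_len = tour_length(best, cities)
    for _ in range(rounds):
        jittered = [(x + random.gauss(0, sigma), y + random.gauss(0, sigma))
                    for x, y in cities]
        cand = nearest_neighbor_tour(jittered)
        cand_len = tour_length(cand, cities)   # evaluate on the real problem
        if cand_len < best_len:
            best, best_len = cand, cand_len
    return best, best_len

random.seed(1)
cities = [(random.random(), random.random()) for _ in range(40)]
print(perturbation_search(cities)[1])
```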

  2. ZINCPharmer: pharmacophore search of the ZINC database

    PubMed Central

    Koes, David Ryan; Camacho, Carlos J.

    2012-01-01

    ZINCPharmer (http://zincpharmer.csb.pitt.edu) is an online interface for searching the purchasable compounds of the ZINC database using the Pharmer pharmacophore search technology. A pharmacophore describes the spatial arrangement of the essential features of an interaction. Compounds that match a well-defined pharmacophore serve as potential lead compounds for drug discovery. ZINCPharmer provides tools for constructing and refining pharmacophore hypotheses directly from molecular structure. A search of 176 million conformers of 18.3 million compounds typically takes less than a minute. The results can be immediately viewed, or the aligned structures may be downloaded for off-line analysis. ZINCPharmer enables the rapid and interactive search of purchasable chemical space. PMID:22553363

  3. Protein structure database search and evolutionary classification.

    PubMed

    Yang, Jinn-Moon; Tung, Chi-Hua

    2006-01-01

    As more protein structures become available and structural genomics efforts provide structural models in a genome-wide strategy, there is a growing need for fast and accurate methods for discovering homologous proteins and evolutionary classifications of newly determined structures. We have developed 3D-BLAST, in part, to address these issues. 3D-BLAST is as fast as BLAST and calculates the statistical significance (E-value) of an alignment to indicate the reliability of the prediction. Using this method, we first identified 23 states of the structural alphabet that represent pattern profiles of the backbone fragments and then used them to represent protein structure databases as structural alphabet sequence databases (SADB). Our method enhanced BLAST as a search method, using a new structural alphabet substitution matrix (SASM) to find the longest common substructures with high-scoring structured segment pairs from an SADB database. Using personal computers with Intel Pentium4 (2.8 GHz) processors, our method searched more than 10 000 protein structures in 1.3 s and achieved a good agreement with search results from detailed structure alignment methods. [3D-BLAST is available at http://3d-blast.life.nctu.edu.tw]. PMID:16885238

  4. Audio stream classification for multimedia database search

    NASA Astrophysics Data System (ADS)

    Artese, M.; Bianco, S.; Gagliardi, I.; Gasparini, F.

    2013-03-01

    Search and retrieval of huge archives of multimedia data is a challenging task. A classification step is often used to reduce the number of entries on which to perform the subsequent search. In particular, when new entries are continuously added to the database, a fast classification based on simple threshold evaluation is desirable. In this work we present a CART-based (Classification And Regression Tree [1]) classification framework for audio streams belonging to multimedia databases. The database considered is the Archive of Ethnography and Social History (AESS) [2], which is mainly composed of popular songs and other audio records describing popular traditions handed down generation by generation, such as traditional fairs and customs. The peculiarities of this database are that it is continuously updated, the audio recordings are acquired in unconstrained environments, and it is difficult for the non-expert human user to create the ground truth labels. In our experiments, half of all the available audio files were randomly extracted and used as the training set; the remaining ones were used as the test set. The classifier has been trained to distinguish among three different classes: speech, music, and song. All the audio files in the dataset had previously been manually labeled into these three classes by domain experts.

  5. A Bayesian network approach to the database search problem in criminal proceedings

    PubMed Central

    2012-01-01

    Background The ‘database search problem’, that is, the strengthening of a case - in terms of probative value - against an individual who is found as a result of a database search, has been approached during the last two decades with substantial mathematical analyses, accompanied by lively debate and centrally opposing conclusions. This represents a challenging obstacle in teaching but also hinders a balanced and coherent discussion of the topic within the wider scientific and legal community. This paper revisits and tracks the associated mathematical analyses in terms of Bayesian networks. Their derivation and discussion for capturing probabilistic arguments that explain the database search problem are outlined in detail. The resulting Bayesian networks offer a distinct view on the main debated issues, along with further clarity. Methods As a general framework for representing and analyzing formal arguments in probabilistic reasoning about uncertain target propositions (that is, whether or not a given individual is the source of a crime stain), this paper relies on graphical probability models, in particular, Bayesian networks. This graphical probability modeling approach is used to capture, within a single model, a series of key variables, such as the number of individuals in a database, the size of the population of potential crime stain sources, and the rarity of the corresponding analytical characteristics in a relevant population. Results This paper demonstrates the feasibility of deriving Bayesian network structures for analyzing, representing, and tracking the database search problem. The output of the proposed models can be shown to agree with existing but exclusively formulaic approaches. Conclusions The proposed Bayesian networks allow one to capture and analyze the currently most well-supported but reputedly counter-intuitive and difficult solution to the database search problem in a way that goes beyond the traditional, purely formulaic expressions.
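
    The reputedly counter-intuitive solution referred to above can be reproduced numerically. Here is a minimal sketch of the standard analysis the Bayesian networks formalize, under explicitly labeled assumptions (uniform prior over the population, exactly one database match, all other database members excluded; the parameter values are hypothetical):

```python
def source_posterior(population, db_size, gamma):
    """P(the single database match is the true source | search results).

    Assumes a uniform prior over `population` potential sources, a random match
    probability `gamma` for non-sources, one database member matching, and the
    other db_size - 1 members being excluded. The (1 - gamma)**(db_size - 1)
    factor is common to every surviving hypothesis and cancels.
    """
    untested = population - db_size    # individuals never compared to the stain
    return 1.0 / (1.0 + gamma * untested)

# Counter-intuitively, a LARGER database (more exclusions) strengthens the case:
print(source_posterior(population=1_000_000, db_size=10_000, gamma=1e-6))   # ~0.50
print(source_posterior(population=1_000_000, db_size=500_000, gamma=1e-6))  # ~0.67
```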

  6. Incremental learning of probabilistic rules from clinical databases based on rough set theory.

    PubMed Central

    Tsumoto, S.; Tanaka, H.

    1997-01-01

    Several rule induction methods have been introduced to discover meaningful knowledge from databases, including those in the medical domain. However, most of these approaches induce rules from all the data in a database and cannot induce incrementally when new samples arrive. In this paper, a new approach to knowledge acquisition, which induces probabilistic rules incrementally using rough set techniques, is introduced and evaluated on two clinical databases. The results show that this method induces the same rules as those induced by ordinary non-incremental learning methods, which extract rules from the whole dataset, but that the former method requires more computational resources than the latter. PMID:9357616

  7. Database Search Strategies & Tips. Reprints from the Best of "ONLINE" [and]"DATABASE."

    ERIC Educational Resources Information Center

    Online, Inc., Weston, CT.

    Reprints of 17 articles presenting strategies and tips for searching databases online appear in this collection, which is one in a series of volumes of reprints from "ONLINE" and "DATABASE" magazines. Edited for information professionals who use electronically distributed databases, these articles address such topics as: (1) searching full-text…

  8. WAIS Searching of the Current Contents Database

    NASA Astrophysics Data System (ADS)

    Banholzer, P.; Grabenstein, M. E.

    The Homer E. Newell Memorial Library of NASA's Goddard Space Flight Center is developing capabilities to permit Goddard personnel to access electronic resources of the Library via the Internet. The Library's support services contractor, Maxima Corporation, and their subcontractor, SANAD Support Technologies have recently developed a World Wide Web Home Page (http://www-library.gsfc.nasa.gov) to provide the primary means of access. The first searchable database to be made available through the HomePage to Goddard employees is Current Contents, from the Institute for Scientific Information (ISI). The initial implementation includes coverage of articles from the last few months of 1992 to present. These records are augmented with abstracts and references, and often are more robust than equivalent records in bibliographic databases that currently serve the astronomical community. Maxima/SANAD selected Wais Incorporated's WAIS product with which to build the interface to Current Contents. This system allows access from Macintosh, IBM PC, and Unix hosts, which is an important feature for Goddard's multiplatform environment. The forms interface is structured to allow both fielded (author, article title, journal name, id number, keyword, subject term, and citation) and unfielded WAIS searches. The system allows a user to: Retrieve individual journal article records. Retrieve Table of Contents of specific issues of journals. Connect to articles with similar subject terms or keywords. Connect to other issues of the same journal in the same year. Browse journal issues from an alphabetical list of indexed journal names.

  9. Multiple Database Searching: Techniques and Pitfalls

    ERIC Educational Resources Information Center

    Hawkins, Donald T.

    1978-01-01

    Problems involved in searching multiple data bases are discussed, including indexing differences, overlap among data bases, variant spellings, and elimination of duplicate items from search output. Discussion focuses on the CA Condensates, Inspec, and Metadex data bases. (JPF)

  10. An efficient quantum search engine on unsorted database

    NASA Astrophysics Data System (ADS)

    Lu, Songfeng; Zhang, Yingyu; Liu, Fang

    2013-10-01

    We consider the problem of finding one or more desired items out of an unsorted database. Patel has shown that if the database permits quantum queries, then mere digitization is sufficient for efficient search for one desired item. His algorithm, called the factorized quantum search algorithm, can locate the desired item in an unsorted database using O() queries to factorized oracles. But the algorithm requires that all the attribute values be distinct from each other. In this paper, we discuss how to make a database satisfy this requirement, and present a quantum search engine based on the algorithm. Our goal is achieved by introducing auxiliary files for the attribute values that are not distinct and by converting every complex query request into a sequence of calls to the factorized quantum search algorithm. The query complexity of our algorithm is O() for most cases.

  11. Searching the ASRS Database Using QUORUM Keyword Search, Phrase Search, Phrase Generation, and Phrase Discovery

    NASA Technical Reports Server (NTRS)

    McGreevy, Michael W.; Connors, Mary M. (Technical Monitor)

    2001-01-01

    To support Search Requests and Quick Responses at the Aviation Safety Reporting System (ASRS), four new QUORUM methods have been developed: keyword search, phrase search, phrase generation, and phrase discovery. These methods build upon the core QUORUM methods of text analysis, modeling, and relevance-ranking. QUORUM keyword search retrieves ASRS incident narratives that contain one or more user-specified keywords in typical or selected contexts, and ranks the narratives on their relevance to the keywords in context. QUORUM phrase search retrieves narratives that contain one or more user-specified phrases, and ranks the narratives on their relevance to the phrases. QUORUM phrase generation produces a list of phrases from the ASRS database that contain a user-specified word or phrase. QUORUM phrase discovery finds phrases that are related to topics of interest. Phrase generation and phrase discovery are particularly useful for finding query phrases for input to QUORUM phrase search. The presentation of the new QUORUM methods includes: a brief review of the underlying core QUORUM methods; an overview of the new methods; numerous, concrete examples of ASRS database searches using the new methods; discussion of related methods; and, in the appendices, detailed descriptions of the new methods.

  12. Is Library Database Searching a Language Learning Activity?

    ERIC Educational Resources Information Center

    Bordonaro, Karen

    2010-01-01

    This study explores how non-native speakers of English think of words to enter into library databases when they begin the process of searching for information in English. At issue is whether or not language learning takes place when these students use library databases. Language learning in this study refers to the use of strategies employed by…

  13. Chemical Substructure Searching: Comparing Three Commercially Available Databases.

    ERIC Educational Resources Information Center

    Wagner, A. Ben

    1986-01-01

    Compares the differences in coverage and utility of three substructure databases--Chemical Abstracts, Index Chemicus, and Chemical Information System's Nomenclature Search System. The differences between Chemical Abstracts with two different vendors--STN International and Questel--are described and a summary guide for choosing between databases is…

  14. Lost in Search: (Mal-)Adaptation to Probabilistic Decision Environments in Children and Adults

    ERIC Educational Resources Information Center

    Betsch, Tilmann; Lehmann, Anne; Lindow, Stefanie; Lang, Anna; Schoemann, Martin

    2016-01-01

    Adaptive decision making in probabilistic environments requires individuals to use probabilities as weights in predecisional information searches and/or when making subsequent choices. Within a child-friendly computerized environment (Mousekids), we tracked 205 children's (105 children 5-6 years of age and 100 children 9-10 years of age) and 103…

  15. Searching the PASCAL database - A user's perspective

    NASA Technical Reports Server (NTRS)

    Jack, Robert F.

    1989-01-01

    The operation of PASCAL, a bibliographic data base covering broad subject areas in science and technology, is discussed. The data base includes information from about 1973 to the present, including topics in engineering, chemistry, physics, earth science, environmental science, biology, psychology, and medicine. Data from 1986 to the present may be searched using DIALOG. The procedures and classification codes for searching PASCAL are presented. Examples of citations retrieved from the data base are given and suggestions are made concerning when to use PASCAL.

  16. Exhaustive Database Searching for Amino Acid Mutations in Proteomes

    SciTech Connect

    Hyatt, Philip Douglas; Pan, Chongle

    2012-01-01

    Amino acid mutations in proteins can be found by searching tandem mass spectra acquired in shotgun proteomics experiments against protein sequences predicted from genomes. Traditionally, unconstrained searches for amino acid mutations have been accomplished by using a sequence tagging approach that combines de novo sequencing with database searching. However, this approach is limited by the performance of de novo sequencing. The Sipros algorithm v2.0 was developed to perform unconstrained database searching using high-resolution tandem mass spectra by exhaustively enumerating all single non-isobaric mutations for every residue in a protein database. The performance of Sipros for amino acid mutation identification exceeded that of an established sequence tagging algorithm, Inspect, based on benchmarking results from a Rhodopseudomonas palustris proteomics dataset. To demonstrate the viability of the algorithm for meta-proteomics, Sipros was used to identify amino acid mutations in a natural microbial community in acid mine drainage.
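
    The exhaustive enumeration Sipros performs can be sketched as follows (illustrative only: the residue masses are the standard monoisotopic values, and the non-isobaric tolerance shown is arbitrary rather than Sipros's actual setting):

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
        "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
        "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
        "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
        "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931}

def single_mutations(peptide, min_shift=0.01):
    """Enumerate single, non-isobaric substitutions of a peptide, in the spirit
    of Sipros's exhaustive search (e.g., L->I is skipped: zero mass shift)."""
    for i, original in enumerate(peptide):
        for substitute, mass in MASS.items():
            shift = mass - MASS[original]
            if substitute != original and abs(shift) > min_shift:
                yield peptide[:i] + substitute + peptide[i + 1:], round(shift, 5)

# Candidate sequences for a spectrum whose precursor mass is offset from 'PEPTIDE':
for mutant, shift in list(single_mutations("PEPTIDE"))[:5]:
    print(mutant, shift)
```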

  17. [Online tutorial for searching a dental database].

    PubMed

    Liem, S L

    2009-05-01

    With millions of resources available on the Internet, it is still difficult to search for appropriate and relevant information, even with the use of advanced search engines. With no systematic quality control of online resources, it is difficult to determine how reliable information is. The consortium Intute, which administers a databank of high quality information available via the Internet, which is intended to support scientific teaching and research, ensures that all information provided has been evaluated and investigated by its own team of specialists in various disciplines. A part of the website of Intute which is accessible free of charge is the Virtual Training Suite, by means of which one can improve one's competence in Internet searching and where a number of reliable and qualitatively superior sources for daily practice are available. PMID:19507421

  18. Assigning statistical significance to proteotypic peptides via database searches

    PubMed Central

    Alves, Gelio; Ogurtsov, Aleksey Y.; Yu, Yi-Kuo

    2011-01-01

    Querying MS/MS spectra against a database containing only proteotypic peptides reduces data analysis time due to reduction of database size. Despite the speed advantage, this search strategy is challenged by issues of statistical significance and coverage. The former requires separating systematically significant identifications from less confident identifications, while the latter arises when the underlying peptide is not present, due to single amino acid polymorphisms (SAPs) or post-translational modifications (PTMs), in the proteotypic peptide libraries searched. To address both issues simultaneously, we have extended RAId’s knowledge database to include proteotypic information, utilized RAId’s statistical strategy to assign statistical significance to proteotypic peptides, and modified RAId’s programs to allow for consideration of proteotypic information during database searches. The extended database alleviates the coverage problem since all annotated modifications, even those that occurred within proteotypic peptides, may be considered. Taking into account the likelihoods of observation, the statistical strategy of RAId provides accurate E-value assignments regardless of whether a candidate peptide is proteotypic or not. The advantage of including proteotypic information is evidenced by its superior retrieval performance when compared to regular database searches. PMID:21055489

  19. Assigning statistical significance to proteotypic peptides via database searches.

    PubMed

    Alves, Gelio; Ogurtsov, Aleksey Y; Yu, Yi-Kuo

    2011-02-01

    Querying MS/MS spectra against a database containing only proteotypic peptides reduces data analysis time due to reduction of database size. Despite the speed advantage, this search strategy is challenged by issues of statistical significance and coverage. The former requires separating systematically significant identifications from less confident identifications, while the latter arises when the underlying peptide is not present, due to single amino acid polymorphisms (SAPs) or post-translational modifications (PTMs), in the proteotypic peptide libraries searched. To address both issues simultaneously, we have extended RAId's knowledge database to include proteotypic information, utilized RAId's statistical strategy to assign statistical significance to proteotypic peptides, and modified RAId's programs to allow for consideration of proteotypic information during database searches. The extended database alleviates the coverage problem since all annotated modifications, even those that occurred within proteotypic peptides, may be considered. Taking into account the likelihoods of observation, the statistical strategy of RAId provides accurate E-value assignments regardless of whether a candidate peptide is proteotypic or not. The advantage of including proteotypic information is evidenced by its superior retrieval performance when compared to regular database searches. PMID:21055489

  20. Privacy-preserving search for chemical compound databases

    PubMed Central

    2015-01-01

    Background Searching for similar compounds in a database is the most important process for in-silico drug screening. Since a query compound is an important starting point for the new drug, a query holder, who is afraid of the query being monitored by the database server, usually downloads all the records in the database and uses them in a closed network. However, a serious dilemma arises when the database holder also wants to output no information except for the search results, and such a dilemma prevents the use of many important data resources. Results In order to overcome this dilemma, we developed a novel cryptographic protocol that enables database searching while keeping both the query holder's privacy and database holder's privacy. Generally, the application of cryptographic techniques to practical problems is difficult because versatile techniques are computationally expensive while computationally inexpensive techniques can perform only trivial computation tasks. In this study, our protocol is successfully built only from an additive-homomorphic cryptosystem, which allows only addition performed on encrypted values but is computationally efficient compared with versatile techniques such as general purpose multi-party computation. In an experiment searching ChEMBL, which consists of more than 1,200,000 compounds, the proposed method was 36,900 times faster in CPU time and 12,000 times as efficient in communication size compared with general purpose multi-party computation. Conclusion We proposed a novel privacy-preserving protocol for searching chemical compound databases. The proposed method, easily scaling for large-scale databases, may help to accelerate drug discovery research by making full use of unused but valuable data that includes sensitive information. PMID:26678650
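
    The additive homomorphism the protocol relies on can be demonstrated with a toy Paillier cryptosystem (a textbook construction with demo-sized primes, not the paper's actual protocol or parameters): multiplying ciphertexts adds the underlying plaintexts, which is exactly the primitive needed to accumulate fingerprint-bit counts on encrypted values.

```python
import math, random

# Minimal Paillier cryptosystem (toy parameters!). Real keys use ~1024-bit primes.
p, q = 131071, 524287
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
g = n + 1

def L(u):
    return (u - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)    # precomputed decryption constant

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

# The server can sum encrypted counts without ever seeing the plaintexts:
a, b = encrypt(23), encrypt(19)
print(decrypt((a * b) % n2))           # -> 42, computed on ciphertexts only
```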

  1. Searching Harvard Business Review Online. . . Lessons in Searching a Full Text Database.

    ERIC Educational Resources Information Center

    Tenopir, Carol

    1985-01-01

    This article examines the Harvard Business Review Online (HBRO) database (bibliographic description fields, abstracts, extracted information, full text, subject descriptors) and reports on 31 sample HBRO searches conducted in Bibliographic Retrieval Services to test differences between searching full text and searching bibliographic record. Sample…

  2. An Analysis of Performance and Cost Factors in Searching Large Text Databases Using Parallel Search Systems.

    ERIC Educational Resources Information Center

    Couvreur, T. R.; And Others

    1994-01-01

    Discusses the results of modeling the performance of searching large text databases via various parallel hardware architectures and search algorithms. The performance under load and the cost of each configuration are compared, and a common search workload used in the modeling is described. (Contains 26 references.) (LRW)

  3. Molecule database framework: a framework for creating database applications with chemical structure search capability

    PubMed Central

    2013-01-01

    Background Research in organic chemistry generates samples of novel chemicals together with their properties and other related data. The involved scientists must be able to store this data and search it by chemical structure. There are commercial solutions for common needs like chemical registration systems or electronic lab notebooks. However, for the specific requirements of in-house databases and processes no such solutions exist. Another issue is that commercial solutions carry the risk of vendor lock-in and may require an expensive license for a proprietary relational database management system. To speed up and simplify the development of applications that require chemical structure search capabilities, I have developed Molecule Database Framework. The framework abstracts the storing and searching of chemical structures into method calls, so software developers do not require extensive knowledge about chemistry and the underlying database cartridge. This decreases application development time. Results Molecule Database Framework is written in Java and I created it by integrating existing free and open-source tools and frameworks. The core functionality includes:
    • Support for multi-component compounds (mixtures)
    • Import and export of SD-files
    • Optional security (authorization)
    For chemical structure searching, Molecule Database Framework leverages the capabilities of the Bingo Cartridge for PostgreSQL and provides type-safe searching, caching, transactions, and optional method-level security. Molecule Database Framework supports multi-component chemical compounds (mixtures). Furthermore, the design of the entity classes and the reasoning behind it are explained. By means of a simple web application I describe how the framework could be used. I then benchmarked this example application to create some basic performance expectations for chemical structure searches and the import and export of SD-files. Conclusions By using a simple web application it was

  4. Probabilistic Cuing in Large-Scale Environmental Search

    ERIC Educational Resources Information Center

    Smith, Alastair D.; Hood, Bruce M.; Gilchrist, Iain D.

    2010-01-01

    Finding an object in our environment is an important human ability that also represents a critical component of human foraging behavior. One type of information that aids efficient large-scale search is the likelihood of the object being in one location over another. In this study we investigated the conditions under which individuals respond to…

  5. Advanced exact structure searching in large databases of chemical compounds.

    PubMed

    Trepalin, Sergey V; Skorenko, Andrey V; Balakin, Konstantin V; Nasonov, Anatoly F; Lang, Stanley A; Ivashchenko, Andrey A; Savchuk, Nikolay P

    2003-01-01

    Efficient recognition of tautomeric compound forms in large corporate or commercially available compound databases is a difficult and labor-intensive task. Our data indicate that up to 0.5% of commercially available compound collections for bioscreening contain tautomers. Though in large registry databases such as Beilstein and CAS tautomers are found in an automated fashion using high-performance computational technologies, their real-time recognition in non-registry corporate databases, as a rule, remains problematic. We have developed an effective algorithm for tautomer searching based on a proprietary chemoinformatics platform. This algorithm reduces the compound to a canonical structure, which enables rapid, automated computer searching of most of the known tautomeric transformations that occur in databases of organic compounds. Another useful extension of this methodology is the ability to effectively search for different forms of compounds that contain ionic and semipolar bonds. The computations are performed in the Windows environment on a standard personal computer, a very useful feature. The practical application of the proposed methodology is illustrated by several examples of successful recovery of tautomers and different forms of ionic compounds from real, commercially available non-registry databases. PMID:12767143
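
    The canonical-structure idea can be sketched with the open-source RDKit toolkit (an illustration only; the paper uses its own proprietary chemoinformatics platform). Tautomers of the same compound should reduce to the same registry key, so duplicates collide on lookup:

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

enumerator = rdMolStandardize.TautomerEnumerator()

def registry_key(smiles):
    """Canonical SMILES of the canonical tautomer: tautomeric forms of the
    same compound map to the same key."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(enumerator.Canonicalize(mol)) if mol else None

# 2-hydroxypyridine and 2-pyridone are tautomers of one compound:
print(registry_key("Oc1ccccn1") == registry_key("O=c1cccc[nH]1"))   # True
```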

  6. Content-Based Search on a Database of Geometric Models: Identifying Objects of Similar Shape

    SciTech Connect

    XAVIER, PATRICK G.; HENRY, TYSON R.; LAFARGE, ROBERT A.; MEIRANS, LILITA; RAY, LAWRENCE P.

    2001-11-01

    The Geometric Search Engine is a software system for storing and searching a database of geometric models. The database may be searched for modeled objects similar in shape to a target model supplied by the user. The database models generally come from CAD models, while the target model may be either a CAD model or a model generated from range data collected from a physical object. This document describes key generation, database layout, and search of the database.

  7. Automatic sub-volume registration by probabilistic random search

    NASA Astrophysics Data System (ADS)

    Han, Jingfeng; Qiao, Min; Hornegger, Joachim; Kuwert, Torsten; Bautz, Werner; Römer, Wolfgang

    2006-03-01

    Registration of an individual's image data set to an anatomical atlas provides valuable information to the physician. In many cases, the individual image data sets are partial data, which may be mapped to one part or one organ of the entire atlas. Most of the existing intensity-based image registration approaches are designed to align images of the entire view. When they are applied to registration with partial data, a manual pre-registration is usually required. This paper proposes a fully automatic approach to the registration of incomplete image data to an anatomical atlas. The spatial transformations are modeled as arbitrary parametric functions. The proposed method is built upon a random search mechanism, which allows the optimal transformation to be found randomly and globally even when the initialization is not ideal. It works more reliably than existing methods for partial-data registration because it successfully overcomes the local optimum problem. With appropriate similarity measures, this framework is applicable to both mono-modal and multi-modal registration problems with partial data. The contribution of this work is the description of the mathematical framework of the proposed algorithm and the implementation of the related software. The medical evaluation on MRI data and the comparison of the proposed method with existing registration methods show the feasibility and superiority of the proposed method.
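
    A toy version of such a probabilistic random search, reduced to a 1-D translation between two signals (an invented example; the paper handles full parametric 3-D transformations and multi-modal similarity measures):

```python
import numpy as np

def random_search_register(fixed, moving, x, iters=500, sigma0=5.0, seed=0):
    """Probabilistic random search for a 1-D translation aligning two signals,
    a toy stand-in for sub-volume-to-atlas registration: sample offsets around
    the current best, keep improvements, shrink the exploration radius."""
    rng = np.random.default_rng(seed)
    def ssd(t):                                   # similarity: sum of squared diffs
        warped = np.interp(x - t, x, moving)      # moving signal shifted by t
        return float(np.sum((fixed - warped) ** 2))
    best_t, best_cost = 0.0, ssd(0.0)
    for k in range(iters):
        sigma = sigma0 * (1.0 - k / iters) + 0.1  # anneal exploration radius
        t = best_t + rng.normal(0.0, sigma)       # global, random proposal
        cost = ssd(t)
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

x = np.linspace(0, 20, 400)
moving = np.exp(-(x - 8.0) ** 2)
fixed = np.exp(-(x - 11.5) ** 2)                  # moving shifted by +3.5
print(round(random_search_register(fixed, moving, x), 2))   # ~3.5
```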

  8. A Taxonomic Search Engine: Federating taxonomic databases using web services

    PubMed Central

    Page, Roderic DM

    2005-01-01

    Background The taxonomic name of an organism is a key link between different databases that store information on that organism. However, in the absence of a single, comprehensive database of organism names, individual databases lack an easy means of checking the correctness of a name. Furthermore, the same organism may have more than one name, and the same name may apply to more than one organism. Results The Taxonomic Search Engine (TSE) is a web application written in PHP that queries multiple taxonomic databases (ITIS, Index Fungorum, IPNI, NCBI, and uBIO) and summarises the results in a consistent format. It supports "drill-down" queries to retrieve a specific record. The TSE can optionally suggest alternative spellings the user can try. It also acts as a Life Science Identifier (LSID) authority for the source taxonomic databases, providing globally unique identifiers (and associated metadata) for each name. Conclusion The Taxonomic Search Engine is available at and provides a simple demonstration of the potential of the federated approach to providing access to taxonomic names. PMID:15757517

  9. Indexing and searching structure for multimedia database systems

    NASA Astrophysics Data System (ADS)

    Chen, ShuChing; Sista, Srinivas; Shyu, Mei-Ling; Kashyap, Rangasami L.

    1999-12-01

    Recently, multimedia database systems have emerged as a fruitful area for research due to the progress in high-speed communication networks, large-capacity storage devices, digitized media, and data compression technologies over the last few years. Multimedia information has been used in a variety of applications including manufacturing, education, medicine, entertainment, etc. A multimedia database system integrates text, images, audio, and graphics; all of the different media are brought together into one single unit, all controlled by a computer. As more information sources become available in multimedia systems, how to model and search this data becomes an important issue; here, image processing techniques are used to model multimedia data. A Simultaneous Partition and Class Parameter Estimation algorithm, which treats video frame segmentation as a joint estimation of the partition and class parameter variables, has been developed and implemented to identify objects and their corresponding spatial relations. Based on the obtained object information, a web spatial model (WSM) is constructed. A WSM is a multimedia database searching structure that models the temporal and spatial relations of semantic objects so that multimedia database queries related to the objects' temporal and spatial relations in images or video frames can be answered efficiently.

  10. Fast and accurate database searches with MS-GF+Percolator

    SciTech Connect

    Granholm, Viktor; Kim, Sangtae; Navarro, José C.; Sjolund, Erik; Smith, Richard D.; Kall, Lukas

    2014-02-28

    To identify peptides and proteins from the large number of fragmentation spectra in mass spectrometry-based proteomics, researchers commonly employ so-called database search engines. Additionally, post-processors like Percolator have been used on the results from such search engines to assess confidence, infer peptides and generally increase the number of identifications. A recent search engine, MS-GF+, has previously been shown to outperform these classical search engines in terms of the number of identified spectra. However, MS-GF+ generates only limited statistical estimates of the results, hence hampering the biological interpretation. Here, we enabled Percolator processing for MS-GF+ output, and observed an increased number of identified peptides for a wide variety of datasets. In addition, Percolator directly reports false discovery rate estimates, such as q values and posterior error probabilities, as well as p values, for peptide-spectrum matches, peptides and proteins, functions useful for the whole proteomics community.

  11. The MAO NASU glass archive database: search and management tools

    NASA Astrophysics Data System (ADS)

    Pakuliak, L.

    2005-06-01

    At the Main Astronomical Observatory of the National Academy of Sciences of Ukraine (MAO NASU), the astronomical glass archive contains more than 50,000 plates obtained in various observational projects during the last 50 years of the past century. The local single-user database of the glass archive, created on the basis of observational logs and partly on measurement results, has been transformed into an online multi-user system to provide remote access to the plate archive. In this paper, online tools for data search and database management are presented.

  12. Significant speedup of database searches with HMMs by search space reduction with PSSM family models

    PubMed Central

    Beckstette, Michael; Homann, Robert; Giegerich, Robert; Kurtz, Stefan

    2009-01-01

    Motivation: Profile hidden Markov models (pHMMs) are currently the most popular modeling concept for protein families. They provide sensitive family descriptors, and sequence database searching with pHMMs has become a standard task in today's genome annotation pipelines. On the downside, searching with pHMMs is computationally expensive. Results: We propose a new method for efficient protein family classification and for speeding up database searches with pHMMs, as is necessary for large-scale analysis scenarios. We employ simpler models of protein families called position-specific scoring matrix family models (PSSM-FMs). For fast database search, we combine full-text indexing, efficient exact p-value computation of PSSM match scores and fast fragment chaining. The resulting method is well suited to prefilter the set of sequences to be searched for subsequent database searches with pHMMs. We achieved a classification performance only marginally inferior to hmmsearch, yet results could be obtained in a fraction of the runtime, with a speedup of >64-fold. In experiments addressing the method's ability to prefilter the sequence space for subsequent database searches with pHMMs, our method reduces the number of sequences to be searched with hmmsearch to only 0.80% of all sequences. The filter is very fast and leads to a total speedup factor of 43 over the unfiltered search, while retaining >99.5% of the original results. In a lossless filter setup for hmmsearch on UniProtKB/Swiss-Prot, we observed a speedup factor of 92. Availability: The presented algorithms are implemented in the program PoSSuMsearch2, available for download at http://bibiserv.techfak.uni-bielefeld.de/possumsearch2/. Contact: beckstette@zbh.uni-hamburg.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19828575

  13. On the predictability of protein database search complexity and its relevance to optimization of distributed searches.

    PubMed

    Deciu, Cosmin; Sun, Jun; Wall, Mark A

    2007-09-01

    We discuss several aspects related to load balancing of database search jobs in a distributed computing environment, such as a Linux cluster. Load balancing is a technique for making the most of multiple computational resources, which is particularly relevant in environments in which the usage of such resources is very high. The particular case of the Sequest program is considered here, but the general methodology should apply to any similar database search program. We show how the runtimes for Sequest searches of tandem mass spectral data can be predicted from profiles of previous representative searches, and how this information can be used for better load balancing of novel data. A well-known heuristic load balancing method is shown to be applicable to this problem, and its performance is analyzed for a variety of search parameters. PMID:17663575

  14. Are Bibliographic Management Software Search Interfaces Reliable?: A Comparison between Search Results Obtained Using Database Interfaces and the EndNote Online Search Function

    ERIC Educational Resources Information Center

    Fitzgibbons, Megan; Meert, Deborah

    2010-01-01

    The use of bibliographic management software and its internal search interfaces is now pervasive among researchers. This study compares the results between searches conducted in academic databases' search interfaces versus the EndNote search interface. The results show mixed search reliability, depending on the database and type of search…

  15. Compact variant-rich customized sequence database and a fast and sensitive database search for efficient proteogenomic analyses.

    PubMed

    Park, Heejin; Bae, Junwoo; Kim, Hyunwoo; Kim, Sangok; Kim, Hokeun; Mun, Dong-Gi; Joh, Yoonsung; Lee, Wonyeop; Chae, Sehyun; Lee, Sanghyuk; Kim, Hark Kyun; Hwang, Daehee; Lee, Sang-Won; Paek, Eunok

    2014-12-01

    In proteogenomic analysis, construction of a compact, customized database from mRNA-seq data and a sensitive search of both reference and customized databases are essential to accurately determine protein abundances and structural variations at the protein level. However, these tasks have not been systematically explored, but rather performed in an ad-hoc fashion. Here, we present an effective method for constructing a compact database containing comprehensive sequences of sample-specific variants--single nucleotide variants, insertions/deletions, and stop-codon mutations derived from Exome-seq and RNA-seq data. It, however, occupies less space by storing variant peptides, not variant proteins. We also present an efficient search method for both customized and reference databases. The separate searches of the two databases increase the search time, and a unified search is less sensitive to identify variant peptides due to the smaller size of the customized database, compared to the reference database, in the target-decoy setting. Our method searches the unified database once, but performs target-decoy validations separately. Experimental results show that our approach is as fast as the unified search and as sensitive as the separate searches. Our customized database includes mutation information in the headers of variant peptides, thereby facilitating the inspection of peptide-spectrum matches. PMID:25316439

  16. Enriching Great Britain's National Landslide Database by searching newspaper archives

    NASA Astrophysics Data System (ADS)

    Taylor, Faith E.; Malamud, Bruce D.; Freeborough, Katy; Demeritt, David

    2015-11-01

    Our understanding of where landslide hazard and impact will be greatest is largely based on our knowledge of past events. Here, we present a method to supplement existing records of landslides in Great Britain by searching an electronic archive of regional newspapers. In Great Britain, the British Geological Survey (BGS) is responsible for updating and maintaining records of landslide events and their impacts in the National Landslide Database (NLD). The NLD contains records of more than 16,500 landslide events in Great Britain. Data sources for the NLD include field surveys, academic articles, grey literature, news, public reports and, since 2012, social media. We aim to supplement the richness of the NLD by (i) identifying additional landslide events, (ii) acting as an additional source of confirmation of events existing in the NLD and (iii) adding more detail to existing database entries. This is done by systematically searching the Nexis UK digital archive of 568 regional newspapers published in the UK. In this paper, we construct a robust Boolean search criterion by experimenting with landslide terminology for four training periods. We then apply this search to all articles published in 2006 and 2012. This resulted in the addition of 111 records of landslide events to the NLD over the 2 years investigated (2006 and 2012). We also find that we were able to obtain information about landslide impact for 60-90% of landslide events identified from newspaper articles. Spatial and temporal patterns of additional landslides identified from newspaper articles are broadly in line with those existing in the NLD, confirming that the NLD is a representative sample of landsliding in Great Britain. This method could now be applied to more time periods and/or other hazards to add richness to databases and thus improve our ability to forecast future events based on records of past events.

  17. Building high dimensional imaging database for content based image search

    NASA Astrophysics Data System (ADS)

    Sun, Qinpei; Sun, Jianyong; Ling, Tonghui; Wang, Mingqing; Yang, Yuanyuan; Zhang, Jianguo

    2016-03-01

    In medical imaging informatics, content-based image retrieval (CBIR) techniques are employed to aid radiologists in the retrieval of images with similar image contents. CBIR uses visual contents, normally referred to as image features, to search images from large-scale image databases according to users' requests in the form of a query image. However, most current CBIR systems require distance computations over image feature vectors to perform a query, and these computations become time consuming as the number of image features grows, which limits the usability of such systems. In this presentation, we propose a novel framework which uses a high-dimensional database to index the image features, improving the accuracy and retrieval speed of CBIR in an integrated RIS/PACS.

  18. Fast and accurate database searches with MS-GF+Percolator.

    PubMed

    Granholm, Viktor; Kim, Sangtae; Navarro, José C F; Sjölund, Erik; Smith, Richard D; Käll, Lukas

    2014-02-01

    One can interpret fragmentation spectra stemming from peptides in mass-spectrometry-based proteomics experiments using so-called database search engines. Frequently, one also runs post-processors such as Percolator to assess the confidence, infer unique peptides, and increase the number of identifications. A recent search engine, MS-GF+, has shown promising results, due to a new and efficient scoring algorithm. However, MS-GF+ provides few statistical estimates about the peptide-spectrum matches, hence limiting the biological interpretation. Here, we enabled Percolator processing for MS-GF+ output and observed an increased number of identified peptides for a wide variety of data sets. In addition, Percolator directly reports p values and false discovery rate estimates, such as q values and posterior error probabilities, for peptide-spectrum matches, peptides, and proteins, functions that are useful for the whole proteomics community. PMID:24344789

  19. The Saccharomyces Genome Database: Advanced Searching Methods and Data Mining.

    PubMed

    Cherry, J Michael

    2015-12-01

    At the core of the Saccharomyces Genome Database (SGD) are chromosomal features that encode a product. These include protein-coding genes and major noncoding RNA genes, such as tRNA and rRNA genes. The basic entry point into SGD is a gene or open-reading frame name that leads directly to the locus summary information page. A keyword describing function, phenotype, selective condition, or text from abstracts will also provide a door into the SGD. A DNA or protein sequence can be used to identify a gene or a chromosomal region using BLAST. Protein and DNA sequence identifiers, PubMed and NCBI IDs, author names, and function terms are also valid entry points. The information in SGD has been gathered and is maintained by a group of scientific biocurators and software developers who are devoted to providing researchers with up-to-date information from the published literature, connections to all the major research resources, and tools that allow the data to be explored. All the collected information cannot be represented or summarized for every possible question; therefore, it is necessary to be able to search the structured data in the database. This protocol describes the YeastMine tool, which provides an advanced search capability via an interactive tool. The SGD also archives results from microarray expression experiments, and a strategy designed to explore these data using the SPELL (Serial Pattern of Expression Levels Locator) tool is provided. PMID:26631124

  20. 3-D lookup: Fast protein structure database searches

    SciTech Connect

    Holm, L.; Sander, C.

    1995-12-31

    There are far fewer classes of three-dimensional protein folds than sequence families, but the problem of detecting three-dimensional similarities is NP-complete. We present a novel heuristic for identifying 3-D similarities between a query structure and the database of known protein structures. Many methods for structure alignment use a bottom-up approach, first identifying local matches and then solving a combinatorial problem in building up larger clusters of matching substructures. Here the top-down approach is to start with the global comparison and select a rough superimposition using a fast 3-D lookup of secondary structure motifs. The superimposition is then extended to an alignment of Cα atoms by an iterative dynamic programming step. An all-against-all comparison of 385 representative proteins (150,000 pair comparisons) took 1 day of computer time on a single R8000 processor. In other words, one query structure is scanned against the database in a matter of minutes. The method is rated at 90% reliability at capturing statistically significant similarities. It is useful as a rapid preprocessor to a comprehensive protein structure database search system.

  1. Numerical database system based on a weighted search tree

    NASA Astrophysics Data System (ADS)

    Park, S. C.; Bahri, C.; Draayer, J. P.; Zheng, S.-Q.

    1994-09-01

    An on-line numerical database system, based on the concept of a weighted search tree and functioning like a file directory, is introduced. The system, which is designed to aid in reducing time-consuming redundant calculations in numerically intensive computations, can be used to fetch, insert and delete items from a dynamically generated list in optimal time, O(log n), where n is the number of items in the list. Items in the list are ordered according to a priority queue, with the initial priority for each element set either automatically or by a user-supplied algorithm. The priority queue is updated on the fly to reflect element hit frequency. Items can be added to a database so long as there is space to accommodate them; when there is not, the lowest-priority element(s) are removed to make room for incoming element(s) with higher priority. The system acts passively and therefore can be applied to any number of databases, with the same or different structures, within a single application.
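
    The following is a minimal sketch of the idea, not the authors' implementation: a bounded cache whose entries carry hit-frequency priorities, with O(log n) heap updates (lazy invalidation) standing in for the paper's weighted search tree. All names are assumptions.

      import heapq

      class WeightedCache:
          """Bounded cache for expensive numerical results: each entry's priority
          grows with hit frequency, and the lowest-priority entry is evicted when
          the cache is full."""

          def __init__(self, capacity):
              self.capacity = capacity
              self.store = {}        # key -> [priority, value]
              self.heap = []         # (priority, key) eviction candidates

          def get(self, key):
              entry = self.store.get(key)
              if entry is None:
                  return None
              entry[0] += 1                                # bump priority on every hit
              heapq.heappush(self.heap, (entry[0], key))   # stale copies skipped later
              return entry[1]

          def put(self, key, value, priority=1):
              while len(self.store) >= self.capacity:
                  prio, victim = heapq.heappop(self.heap)
                  # skip stale heap entries whose priority has since been bumped
                  if victim in self.store and self.store[victim][0] == prio:
                      del self.store[victim]
              self.store[key] = [priority, value]
              heapq.heappush(self.heap, (priority, key))

      cache = WeightedCache(capacity=1000)
      if cache.get(("coupling", 3, 5)) is None:
          cache.put(("coupling", 3, 5), 0.7071)   # store a costly coefficient once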

  2. Grover's search algorithm with an entangled database state

    NASA Astrophysics Data System (ADS)

    Alsing, Paul M.; McDonald, Nathan

    2011-05-01

    Grover's oracle-based unstructured search algorithm is often stated as "given a phone number in a directory, find the associated name." More formally, the problem can be stated as: given as input a unitary black box Uf for computing an unknown function f: {0,1}^n -> {0,1}, find x = x0 in {0,1}^n such that f(x0) = 1 (and f(x) = 0 otherwise). The crucial role of the externally supplied oracle Uf (whose inner workings are unknown to the user) is to change the sign of the solution |x0>, while leaving all other states unaltered. Thus, Uf depends on the desired solution x0. This paper examines an amplitude amplification algorithm in which the user encodes the directory (e.g. names and telephone numbers) into an entangled database state, which at a later time can be queried on one supplied component entry (e.g. a given phone number t0) to find the other associated unknown component (e.g. name x0). For N = 2^n names x with N associated phone numbers t, performing amplitude amplification on a subspace of size N of the total space of size N^2 produces the desired state |x0 t0> in sqrt(N) steps. We discuss how and why sequential (though not concurrent parallel) searches can be performed on multiple database states. Finally, we show how this procedure can be generalized to databases with more than two correlated lists (e.g. |x t s r ...>).
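
    A minimal numerical sketch of the underlying amplitude amplification (the standard Grover iteration, not the paper's entangled-database variant): the oracle flips the sign of the solution amplitude and the diffusion step inverts about the mean, driving the solution probability to near 1 in about (pi/4)·sqrt(N) steps.

      import numpy as np

      n, x0 = 8, 42                               # N = 2**n items, marked index x0
      N = 2 ** n
      state = np.full(N, 1.0 / np.sqrt(N))        # uniform superposition

      iterations = int(np.floor(np.pi / 4 * np.sqrt(N)))
      for _ in range(iterations):
          state[x0] *= -1.0                        # oracle: flip sign of the solution
          state = 2 * state.mean() - state         # diffusion: inversion about the mean

      print(np.argmax(state ** 2), state[x0] ** 2)  # -> 42, probability near 1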

  3. Accelerating chemical database searching using graphics processing units.

    PubMed

    Liu, Pu; Agrafiotis, Dimitris K; Rassokhin, Dmitrii N; Yang, Eric

    2011-08-22

    The utility of chemoinformatics systems depends on the accurate computer representation and efficient manipulation of chemical compounds. In such systems, a small molecule is often digitized as a large fingerprint vector, where each element indicates the presence/absence or the number of occurrences of a particular structural feature. Since in theory the number of unique features can be exceedingly large, these fingerprint vectors are usually folded into much shorter ones using hashing and modulo operations, allowing fast "in-memory" manipulation and comparison of molecules. There is increasing evidence that lossless fingerprints can substantially improve retrieval performance in chemical database searching (substructure or similarity), which has led to the development of several lossless fingerprint compression algorithms. However, any gains in storage and retrieval afforded by compression need to be weighed against the extra computational burden required for decompression before these fingerprints can be compared. Here we demonstrate that graphics processing units (GPU) can greatly alleviate this problem, enabling the practical application of lossless fingerprints on large databases. More specifically, we show that, with the help of a ~$500 ordinary video card, the entire PubChem database of ~32 million compounds can be searched in ~0.2-2 s on average, which is 2 orders of magnitude faster than a conventional CPU. If multiple query patterns are processed in batch, the speedup is even more dramatic (less than 0.02-0.2 s/query for 1000 queries). In the present study, we use the Elias gamma compression algorithm, which results in a compression ratio as high as 0.097. PMID:21696144
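
    For reference, Elias gamma coding writes each positive integer x as floor(log2 x) zero bits followed by x in binary; sparse fingerprints are typically encoded as gaps between set-bit positions so the integers stay small. A minimal CPU-side Python sketch (not the paper's GPU code):

      def elias_gamma_encode(values):
          """Elias gamma code: for x >= 1, emit floor(log2 x) zero bits, then x in binary."""
          bits = []
          for x in values:
              assert x >= 1
              b = bin(x)[2:]                     # binary, no '0b' prefix
              bits.append("0" * (len(b) - 1) + b)
          return "".join(bits)

      def elias_gamma_decode(bits):
          out, i = [], 0
          while i < len(bits):
              zeros = 0
              while bits[i] == "0":              # unary length prefix
                  zeros += 1
                  i += 1
              out.append(int(bits[i:i + zeros + 1], 2))
              i += zeros + 1
          return out

      # Encode a fingerprint's set-bit positions as gaps (deltas).
      positions = [3, 7, 8, 20, 21]
      gaps = [positions[0]] + [b - a for a, b in zip(positions, positions[1:])]
      encoded = elias_gamma_encode(gaps)
      assert elias_gamma_decode(encoded) == gaps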

  4. A Multilevel Probabilistic Beam Search Algorithm for the Shortest Common Supersequence Problem

    PubMed Central

    Gallardo, José E.

    2012-01-01

    The shortest common supersequence problem is a classical problem with many applications in different fields such as planning, Artificial Intelligence and especially in Bioinformatics. Due to its NP-hardness, we cannot expect to efficiently solve this problem using conventional exact techniques. This paper presents a heuristic to tackle this problem based on the use at different levels of a probabilistic variant of a classical heuristic known as Beam Search. The proposed algorithm is empirically analysed and compared to current approaches in the literature. Experiments show that it provides better quality solutions in a reasonable time for medium and large instances of the problem. For very large instances, our heuristic also provides better solutions, but required execution times may increase considerably. PMID:23300667

  5. A multilevel probabilistic beam search algorithm for the shortest common supersequence problem.

    PubMed

    Gallardo, José E

    2012-01-01

    The shortest common supersequence problem is a classical problem with many applications in different fields such as planning, Artificial Intelligence and especially in Bioinformatics. Due to its NP-hardness, we cannot expect to efficiently solve this problem using conventional exact techniques. This paper presents a heuristic to tackle this problem based on the use at different levels of a probabilistic variant of a classical heuristic known as Beam Search. The proposed algorithm is empirically analysed and compared to current approaches in the literature. Experiments show that it provides better quality solutions in a reasonable time for medium and large instances of the problem. For very large instances, our heuristic also provides better solutions, but required execution times may increase considerably. PMID:23300667
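
    A much-simplified probabilistic beam search for the shortest common supersequence, to illustrate the idea (not the authors' multilevel algorithm): a state records how far each input string has been consumed, and the next beam is drawn by weighted sampling over a greedy score instead of a deterministic top-k. Sampling is with replacement for brevity; all names are assumptions.

      import random

      def probabilistic_beam_scs(strings, beam_width=10, seed=0):
          rng = random.Random(seed)
          beam = [("", tuple(0 for _ in strings))]   # (supersequence so far, indices)
          while True:
              # return as soon as some state has consumed every input string
              for seq, idx in beam:
                  if all(i == len(s) for s, i in zip(strings, idx)):
                      return seq
              candidates = []
              for seq, idx in beam:
                  nxt = {s[i] for s, i in zip(strings, idx) if i < len(s)}
                  for c in nxt:
                      # appending c advances every string whose next symbol is c
                      new_idx = tuple(i + (i < len(s) and s[i] == c)
                                      for s, i in zip(strings, idx))
                      score = sum(new_idx) + 1       # consumed symbols (kept positive)
                      candidates.append((score, seq + c, new_idx))
              # probabilistic selection: sample the next beam in proportion to score
              weights = [sc for sc, _, _ in candidates]
              chosen = rng.choices(candidates, weights=weights,
                                   k=min(beam_width, len(candidates)))
              beam = [(seq, idx) for _, seq, idx in chosen]

      print(probabilistic_beam_scs(["cagt", "gact", "ctga"], beam_width=20))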

  6. Comparison Study of Overlap among 21 Scientific Databases in Searching Pesticide Information.

    ERIC Educational Resources Information Center

    Meyer, Daniel E.; And Others

    1983-01-01

    Evaluates overlapping coverage of 21 scientific databases used in 10 online pesticide searches in an attempt to identify the minimum number of databases needed to generate 90 percent of the unique, relevant citations for a given search. Comparison of searches combined under given pesticide usage (herbicide, fungicide, insecticide) is discussed. Nine…

  7. Global vs. Localized Search: A Comparison of Database Selection Methods in a Hierarchical Environment.

    ERIC Educational Resources Information Center

    Conrad, Jack G.; Claussen, Joanne Smestad; Yang, Changwen

    2002-01-01

    Compares standard global information retrieval searching with more localized techniques to address the database selection problem that users often have when searching for the most relevant database, based on experiences with the Westlaw Directory. Findings indicate that a browse plus search approach in a hierarchical environment produces the most…

  8. Accuracy of a probabilistic record-linkage methodology used to track blood donors in the Mortality Information System database

    PubMed Central

    Capuani, Ligia; Bierrenbach, Ana Luiza; Abreu, Fatima; Takecian, Pedro Losco; Ferreira, João Eduardo; Sabino, Ester Cerdeira

    2016-01-01

    Probabilistic record linkage (PRL) is based on a likelihood score that measures the degree of similarity of several matching variables. Screening test results for different diseases are available for the blood donor population. In this paper, we describe the accuracy of a PRL process used to track blood donors from the Fundação Pró-Sangue (FPS) in the Mortality Information System (SIM), so that future studies might determine blood donors' causes of death. The databases used for linkage were SIM and a database made up of individuals who were living (200 blood donors in 2007) or dead (196 individuals from the Hospital das Clínicas de São Paulo who died in 2001–2005). The method consists of cleaning and linking the databases using three blocking steps that compare the variables "Name/Mother's Name/Date of Birth" to determine a cut-off score. For a cut-off score of 7.06, the sensitivity and specificity of the method are 94.4% (95%CI: 90.0–97.0) and 100% (95%CI: 98.0–100.0), respectively. This method can be used in studies that aim to track blood donors from the FPS database in SIM. PMID:25210903
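
    A toy Fellegi-Sunter style match score, the kind of likelihood score a PRL method thresholds at a cut-off (here 7.06, as reported above). The m- and u-probabilities are invented for illustration: m is the chance a field agrees for a true match, u the chance it agrees by accident.

      from math import log2

      FIELDS = {
          #                m      u
          "name":         (0.95, 0.010),
          "mother_name":  (0.90, 0.005),
          "birth_date":   (0.98, 0.003),
      }

      def match_score(rec_a, rec_b):
          score = 0.0
          for field, (m, u) in FIELDS.items():
              if rec_a[field] == rec_b[field]:
                  score += log2(m / u)                  # agreement weight (positive)
              else:
                  score += log2((1 - m) / (1 - u))      # disagreement weight (negative)
          return score

      donor = {"name": "MARIA SILVA", "mother_name": "ANA SILVA", "birth_date": "1970-03-02"}
      death = {"name": "MARIA SILVA", "mother_name": "ANA SILVA", "birth_date": "1970-03-02"}
      print(match_score(donor, death) > 7.06)   # classify as a link above the cut-off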

  9. An approach in building a chemical compound search engine in oracle database.

    PubMed

    Wang, H; Volarath, P; Harrison, R

    2005-01-01

    Searching for or identifying chemical compounds is an important process in drug design and in chemistry research. An efficient search engine involves a close coupling of the search algorithm and the database implementation. The database must process chemical structures, which demands approaches to represent, store, and retrieve structures in a database system. In this paper, a general database framework for working as a chemical compound search engine in an Oracle database is described. The framework is devoted to eliminating data type constraints for potential search algorithms, which is a crucial step toward building a domain-specific query language on top of SQL. A search engine implementation based on the database framework is also demonstrated. The convenience of the implementation emphasizes the efficiency and simplicity of the framework. PMID:17282834

  10. On optimizing distance-based similarity search for biological databases.

    PubMed

    Mao, Rui; Xu, Weijia; Ramakrishnan, Smriti; Nuckolls, Glen; Miranker, Daniel P

    2005-01-01

    Similarity search leveraging distance-based index structures is increasingly being used for both multimedia and biological database applications. We consider distance-based indexing for three important biological data types: protein k-mers with the metric PAM model, DNA k-mers with Hamming distance, and peptide fragmentation spectra with a pseudo-metric derived from cosine distance. To date, the primary driver of this research has been multimedia applications, where similarity functions are often Euclidean norms on high-dimensional feature vectors. We develop results showing that the character of these biological workloads is different from multimedia workloads. In particular, they are not intrinsically very high dimensional and deserve different optimization heuristics. Based on MVP-trees, we develop a pivot selection heuristic that seeks centers and show that it outperforms the most widely used corner-seeking heuristic. Similarly, we develop a data partitioning approach sensitive to the actual data distribution in lieu of median splits. PMID:16447992
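
    A sketch of center-seeking pivot selection in the spirit of the heuristic described above (not the authors' MVP-tree code): among sampled candidates, pick the point whose distances to the rest have the smallest spread. The sample size and the Hamming-distance workload are illustrative assumptions.

      import random
      from statistics import pvariance

      def center_pivot(points, dist, sample_size=16, seed=0):
          """Return the sampled candidate whose distance distribution to all other
          points has minimal variance, i.e. a 'center' rather than a 'corner'."""
          rng = random.Random(seed)
          cands = rng.sample(points, min(sample_size, len(points)))
          def spread(c):
              return pvariance([dist(c, p) for p in points if p is not c])
          return min(cands, key=spread)

      # Hamming distance on DNA k-mers, one of the workloads discussed above.
      def hamming(a, b):
          return sum(x != y for x, y in zip(a, b))

      kmers = ["ACGTACGT", "ACGTTCGT", "TTTTACGT", "ACGAACGA", "GCGTACGT"]
      print(center_pivot(kmers, hamming, sample_size=3))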

  11. The Use of AJAX in Searching a Bibliographic Database: A Case Study of the Italian Biblioteche Oggi Database

    ERIC Educational Resources Information Center

    Cavaleri, Piero

    2008-01-01

    Purpose: The purpose of this paper is to describe the use of AJAX for searching the Biblioteche Oggi database of bibliographic records. Design/methodology/approach: The paper is a demonstration of how bibliographic database single page interfaces allow the implementation of more user-friendly features for social and collaborative tasks. Findings:…

  12. A Comparison of Computer-Based Bibliographic Database Searching vs. Manual Bibliographic Searching. Final CARE Grant Report #5964.

    ERIC Educational Resources Information Center

    Pritchard, Eileen E.; Rockman, Ilene F.

    The purpose of this study was to improve upon previous research investigations by analyzing the elements of cost effectiveness, precision, recall, citation overlap, and hours searching in a comparison between computerized database searching and manual searching, using the setting of an academic library environment with a diverse group of students…

  13. Impact of Prior Knowledge of Informational Content and Organization on Learning Search Principles in a Database.

    ERIC Educational Resources Information Center

    Linde, Lena; Bergstrom, Monica

    1988-01-01

    The importance of prior knowledge of informational content and organization for search performance on a database was evaluated for 17 undergraduates. Pretraining related to content and organization did facilitate learning logical search principles in a relational database; content pretraining was more efficient. (SLD)

  14. EasyKSORD: A Platform of Keyword Search Over Relational Databases

    NASA Astrophysics Data System (ADS)

    Peng, Zhaohui; Li, Jing; Wang, Shan

    Keyword Search Over Relational Databases (KSORD) enables casual users to use keyword queries (a set of keywords) to search relational databases just like searching the Web, without any knowledge of the database schema or any need of writing SQL queries. Based on our previous work, we design and implement a novel KSORD platform named EasyKSORD for users and system administrators to use and manage different KSORD systems in a novel and simple manner. EasyKSORD supports advanced queries, efficient data-graph-based search engines, multiform result presentations, and system logging and analysis. Through EasyKSORD, users can search relational databases easily and read search results conveniently, and system administrators can easily monitor and analyze the operations of KSORD and manage KSORD systems much better.

  15. Seismic hazard assessment for Myanmar: Earthquake model database, ground-motion scenarios, and probabilistic assessments

    NASA Astrophysics Data System (ADS)

    Chan, C. H.; Wang, Y.; Thant, M.; Maung Maung, P.; Sieh, K.

    2015-12-01

    We have constructed an earthquake and fault database, conducted a series of ground-shaking scenarios, and proposed seismic hazard maps for all of Myanmar and hazard curves for selected cities. Our earthquake database integrates the ISC, ISC-GEM and global ANSS Comprehensive Catalogues, and includes harmonized magnitude scales without duplicate events. Our active fault database includes active fault data from previous studies. Using the parameters from these updated databases (i.e., the Gutenberg-Richter relationship, slip rate, maximum magnitude and the elapsed time since the last events), we have determined the earthquake recurrence models of seismogenic sources. To evaluate ground-shaking behaviour in different tectonic regimes, we conducted a series of tests matching the modelled ground motions to the felt intensities of earthquakes. Through the case of the 1975 Bagan earthquake, we determined that the ground motion prediction equations (GMPEs) of Atkinson and Boore (2003) fit the behaviour of subduction events best. Likewise, the 2011 Tarlay and 2012 Thabeikkyin events suggested that the GMPEs of Akkar and Cagnan (2010) fit crustal earthquakes best. We thus incorporated the best-fitting GMPEs and site conditions based on Vs30 (the average shear velocity down to 30 m depth), derived from analysis of topographic slope and microtremor array measurements, to assess seismic hazard. The hazard is highest in regions close to the Sagaing Fault and along the western coast of Myanmar, as seismic sources there produce earthquakes at short intervals and/or their last events occurred long ago. The hazard curves for the cities of Bago, Mandalay, Sagaing, Taungoo and Yangon show higher hazards for sites close to an active fault or with a low Vs30, e.g., downtown Sagaing and the Shwemawdaw Pagoda in Bago.
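
    The step from a Gutenberg-Richter recurrence model to a hazard number can be sketched as follows; the a and b values and the exposure window are illustrative assumptions, not the paper's Myanmar parameters.

      from math import exp

      def gr_annual_rate(a, b, m):
          """Annual rate of earthquakes with magnitude >= m from the
          Gutenberg-Richter relation log10 N(>=m) = a - b*m."""
          return 10 ** (a - b * m)

      def poisson_exceedance(rate, years):
          """Probability of at least one event in `years`, assuming a Poisson
          process -- the standard step from recurrence model to hazard curve."""
          return 1.0 - exp(-rate * years)

      rate = gr_annual_rate(a=4.5, b=1.0, m=6.5)          # events/year with M >= 6.5
      print(rate, poisson_exceedance(rate, years=50))      # 50-year exceedance probability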

  16. Searching Databases without Query-Building Aids: Implications for Dyslexic Users

    ERIC Educational Resources Information Center

    Berget, Gerd; Sandnes, Frode Eika

    2015-01-01

    Introduction: Few studies document the information searching behaviour of users with cognitive impairments. This paper therefore addresses the effect of dyslexia on information searching in a database with no tolerance for spelling errors and no query-building aids. The purpose was to identify effective search interface design guidelines that…

  17. Adaptive search in mobile peer-to-peer databases

    NASA Technical Reports Server (NTRS)

    Wolfson, Ouri (Inventor); Xu, Bo (Inventor)

    2010-01-01

    Information is stored in a plurality of mobile peers. The peers communicate in a peer to peer fashion, using a short-range wireless network. Occasionally, a peer initiates a search for information in the peer to peer network by issuing a query. Queries and pieces of information, called reports, are transmitted among peers that are within a transmission range. For each search additional peers are utilized, wherein these additional peers search and relay information on behalf of the originator of the search.

  18. Probabilistic person identification in TV news programs using image web database

    NASA Astrophysics Data System (ADS)

    Battisti, F.; Carli, M.; Leo, M.; Neri, A.

    2014-02-01

    The automatic labeling of faces in TV broadcasting is still a challenging problem. The high variability in viewpoints, facial expressions, general appearance, and lighting conditions, as well as occlusions, rapid shot changes, and camera motions, produces significant variations in image appearance. The application of automatic tools for face recognition is not yet fully established, and human intervention is needed. In this paper, we deal with automatic face recognition in TV broadcast programs. The target of the proposed method is to identify the presence of a specific person in a video by means of a set of images downloaded from the Web using a specific search key.

  19. Content Evaluation of Textual CD-ROM and Web Databases. Database Searching Series.

    ERIC Educational Resources Information Center

    Jacso, Peter

    This book provides guidelines for evaluating a variety of database types, including abstracting and indexing, directory, full-text, and page-image databases available in online and/or CD-ROM formats. The book discusses the purpose and techniques of comparing and evaluating the most important characteristics of textual databases, such as their…

  20. Efficiency of database search for identification of mutated and modified proteins via mass spectrometry.

    PubMed

    Pevzner, P A; Mulyukov, Z; Dancik, V; Tang, C L

    2001-02-01

    Although protein identification by matching tandem mass spectra (MS/MS) against protein databases is a widespread tool in mass spectrometry, the question about reliability of such searches remains open. Absence of rigorous significance scores in MS/MS database search makes it difficult to discard random database hits and may lead to erroneous protein identification, particularly in the case of mutated or post-translationally modified peptides. This problem is especially important for high-throughput MS/MS projects when the possibility of expert analysis is limited. Thus, algorithms that sort out reliable database hits from unreliable ones and identify mutated and modified peptides are sought. Most MS/MS database search algorithms rely on variations of the Shared Peaks Count approach that scores pairs of spectra by the peaks (masses) they have in common. Although this approach proved to be useful, it has a high error rate in identification of mutated and modified peptides. We describe new MS/MS database search tools, MS-CONVOLUTION and MS-ALIGNMENT, which implement the spectral convolution and spectral alignment approaches to peptide identification. We further analyze these approaches to identification of modified peptides and demonstrate their advantages over the Shared Peaks Count. We also use the spectral alignment approach as a filter in a new database search algorithm that reliably identifies peptides differing by up to two mutations/modifications from a peptide in a database. PMID:11157792
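
    The contrast between the Shared Peaks Count and the spectral convolution can be shown with integer masses: the convolution is the multiset of pairwise mass differences between two spectra, the Shared Peaks Count is just its value at shift 0, and a modification of mass delta appears as a second large peak at shift = delta. The masses below are invented for illustration.

      from collections import Counter

      def spectral_convolution(spec_a, spec_b):
          """Multiset of pairwise mass differences between two spectra (integer-mass toy)."""
          return Counter(b - a for a in spec_a for b in spec_b)

      theoretical = [97, 226, 339, 468, 565]     # unmodified peptide prefix masses
      observed    = [97, 226, 355, 484, 581]     # same peptide with +16 after 2nd residue

      conv = spectral_convolution(theoretical, observed)
      print(conv[0], conv[16])
      # -> 2 shared peaks at shift 0, but 3 peaks explained by a single +16 Da
      #    shift: the convolution sees the modified peptide that peak counting misses.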

  1. Federated or cached searches: Providing expected performance from multiple invasive species databases

    NASA Astrophysics Data System (ADS)

    Graham, Jim; Jarnevich, Catherine S.; Simpson, Annie; Newman, Gregory J.; Stohlgren, Thomas J.

    2011-06-01

    Invasive species are a universal global problem, but the information to identify them, manage them, and prevent invasions is stored around the globe in a variety of formats. The Global Invasive Species Information Network is a consortium of organizations working toward providing seamless access to these disparate databases via the Internet. A distributed network of databases can be created using the Internet and a standard web service protocol. There are two options to provide this integration. First, federated searches are being proposed to allow users to search "deep" web documents such as databases for invasive species. A second method is to create a cache of data from the databases for searching. We compare these two methods, and show that federated searches will not provide the performance and flexibility required from users and a central cache of the datum are required to improve performance.

  2. Federated or cached searches: providing expected performance from multiple invasive species databases

    USGS Publications Warehouse

    Graham, Jim; Jarnevich, Catherine S.; Simpson, Annie; Newman, Gregory J.; Stohlgren, Thomas J.

    2011-01-01

    Invasive species are a universal global problem, but the information to identify them, manage them, and prevent invasions is stored around the globe in a variety of formats. The Global Invasive Species Information Network is a consortium of organizations working toward providing seamless access to these disparate databases via the Internet. A distributed network of databases can be created using the Internet and a standard web service protocol. There are two options to provide this integration. First, federated searches are being proposed to allow users to search “deep” web documents such as databases for invasive species. A second method is to create a cache of data from the databases for searching. We compare these two methods, and show that federated searches will not provide the performance and flexibility required from users and a central cache of the datum are required to improve performance.

  3. Optimal design of groundwater remediation system using a probabilistic multi-objective fast harmony search algorithm under uncertainty

    NASA Astrophysics Data System (ADS)

    Luo, Qiankun; Wu, Jianfeng; Yang, Yun; Qian, Jiazhong; Wu, Jichun

    2014-11-01

    This study develops a new probabilistic multi-objective fast harmony search algorithm (PMOFHS) for optimal design of groundwater remediation systems under uncertainty associated with the hydraulic conductivity (K) of aquifers. The PMOFHS integrates the previously developed deterministic multi-objective optimization method, namely multi-objective fast harmony search algorithm (MOFHS) with a probabilistic sorting technique to search for Pareto-optimal solutions to multi-objective optimization problems in a noisy hydrogeological environment arising from insufficient K data. The PMOFHS is then coupled with the commonly used flow and transport codes, MODFLOW and MT3DMS, to identify the optimal design of groundwater remediation systems for a two-dimensional hypothetical test problem and a three-dimensional Indiana field application involving two objectives: (i) minimization of the total remediation cost through the engineering planning horizon, and (ii) minimization of the mass remaining in the aquifer at the end of the operational period, whereby the pump-and-treat (PAT) technology is used to clean up contaminated groundwater. Also, Monte Carlo (MC) analysis is employed to evaluate the effectiveness of the proposed methodology. Comprehensive analysis indicates that the proposed PMOFHS can find Pareto-optimal solutions with low variability and high reliability and is a potentially effective tool for optimizing multi-objective groundwater remediation problems under uncertainty.
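
    The core operation behind such multi-objective searches is Pareto dominance; under uncertainty, each design's objectives are noisy across hydraulic-conductivity realizations. The sketch below ranks on mean objective vectors, a simplified stand-in for the paper's probabilistic sorting technique; the design names and sample values are invented.

      import statistics

      def dominates(u, v):
          """Pareto dominance for minimization: u dominates v if it is no worse in
          every objective and strictly better in at least one."""
          return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))

      def pareto_front(solutions):
          """Non-dominated subset, ranking each solution by its mean objective
          vector over the Monte Carlo samples."""
          means = {name: tuple(map(statistics.mean, zip(*samples)))
                   for name, samples in solutions.items()}
          return [n for n, m in means.items()
                  if not any(dominates(means[o], m) for o in means if o != n)]

      # (cost, mass-remaining) pairs sampled over hypothetical K realizations:
      designs = {
          "A": [(1.0, 0.40), (1.2, 0.38)],
          "B": [(1.5, 0.20), (1.4, 0.22)],
          "C": [(1.6, 0.45), (1.7, 0.43)],   # dominated by both A and B on average
      }
      print(pareto_front(designs))            # -> ['A', 'B']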

  4. Using the Turning Research Into Practice (TRIP) database: how do clinicians really search?*

    PubMed Central

    Meats, Emma; Brassey, Jon; Heneghan, Carl; Glasziou, Paul

    2007-01-01

    Objectives: Clinicians and patients are increasingly accessing information through Internet searches. This study examined clinicians' search behavior when using the Turning Research Into Practice (TRIP) database, with the aim of characterizing search engine use and the ways it might be improved. Methods: A Web log analysis was undertaken of the TRIP database, a meta-search engine covering 150 health resources including MEDLINE, The Cochrane Library, and a variety of guidelines. The connectors for terms used in searches were studied, and observations were made of 9 users' search behavior when working with the TRIP database. Results: Of 620,735 searches, most used a single term, and 12% (n = 75,947) used a Boolean operator: 11% (n = 69,006) used "AND" and 0.8% (n = 4,941) used "OR." Of the elements of a well-structured clinical question (population, intervention, comparator, and outcome), the population was most commonly used, while fewer searches included the intervention. Comparator and outcome were rarely used. Participants in the observational study were interested in learning how to formulate better searches. Conclusions: Web log analysis showed most searches used a single term and no Boolean operators. The observational study revealed users were interested in conducting efficient searches but did not always know how. Therefore, either better training or better search interfaces are required to assist users and enable more effective searching. PMID:17443248
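
    The Boolean-operator tallying such a log analysis performs can be reproduced with a few lines; the query-log format and field names here are assumptions.

      import re
      from collections import Counter

      def boolean_usage(queries):
          """Count single-term queries and Boolean operator use across a query log."""
          stats = Counter()
          for q in queries:
              if len(q.split()) == 1:
                  stats["single_term"] += 1
              if re.search(r"\bAND\b", q):
                  stats["AND"] += 1
              if re.search(r"\bOR\b", q):
                  stats["OR"] += 1
              stats["total"] += 1
          return stats

      log = ["asthma", "aspirin AND stroke", "statins", "mi OR infarction"]
      s = boolean_usage(log)
      print(s["AND"] / s["total"], s["OR"] / s["total"])   # operator shares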

  5. InfoTrac's SearchBank Databases: Business Information and More.

    ERIC Educational Resources Information Center

    Mehta, Usha; Goodman, Beth

    1997-01-01

    Describes the InfoTrac SearchBank based on experiences at the University of Nevada, Reno, libraries where the service is available through the online catalog. Highlights include remote access through the Internet; indexing and abstracting; full-text access to 460 journal titles; a powerful search engine; and business-oriented databases.…

  6. A student's guide to searching the literature using online databases

    NASA Astrophysics Data System (ADS)

    Miller, Casey W.; Belyea, Dustin; Chabot, Michelle; Messina, Troy

    2012-02-01

    A method is described to empower students to efficiently perform general and specific literature searches using online resources [Miller et al., Am. J. Phys. 77, 1112 (2009)]. The method was tested on multiple groups, including undergraduate and graduate students with varying backgrounds in scientific literature searches. Students involved in this study showed marked improvement in their awareness of how and where to find scientific information. Repeated exposure to literature searching methods appears worthwhile, starting early in the undergraduate career, and even in graduate school orientation.

  7. STEPS: A Grid Search Methodology for Optimized Peptide Identification Filtering of MS/MS Database Search Results

    SciTech Connect

    Piehowski, Paul D.; Petyuk, Vladislav A.; Sandoval, John D.; Burnum, Kristin E.; Kiebel, Gary R.; Monroe, Matthew E.; Anderson, Gordon A.; Camp, David G.; Smith, Richard D.

    2013-03-01

    For bottom-up proteomics there is a wide variety of database searching algorithms in use for matching peptide sequences to tandem MS spectra. Likewise, there are numerous strategies being employed to produce a confident list of peptide identifications from the different search algorithm outputs. Here we introduce a grid search approach for determining optimal database filtering criteria in shotgun proteomics data analyses that is easily adaptable to any search algorithm. Systematic Trial and Error Parameter Selection, referred to as STEPS, utilizes user-defined parameter ranges to test a wide array of parameter combinations to arrive at an optimal "parameter set" for data filtering, thus maximizing confident identifications. The benefits of this approach in terms of the number of true positive identifications are demonstrated using datasets derived from immunoaffinity-depleted blood serum and a bacterial cell lysate, two common proteomics sample types.
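
    A grid search over filtering criteria in the spirit of STEPS can be sketched as follows; the PSM field names and the decoy-based FDR estimator are assumptions, not the published implementation.

      from itertools import product

      def filter_grid_search(psms, score_cuts, ppm_cuts, fdr_target=0.01):
          """Try every combination of score and mass-error thresholds and keep the
          parameter set that maximizes identifications while the decoy-estimated
          FDR stays under the target."""
          best = (0, None)
          for s_cut, p_cut in product(score_cuts, ppm_cuts):
              kept = [p for p in psms
                      if p["score"] >= s_cut and abs(p["ppm"]) <= p_cut]
              if not kept:
                  continue
              fdr = sum(p["decoy"] for p in kept) / len(kept)
              if fdr <= fdr_target and len(kept) > best[0]:
                  best = (len(kept), (s_cut, p_cut))
          return best

      psms = [{"score": 3.2, "ppm": 1.5, "decoy": False},
              {"score": 2.1, "ppm": 8.0, "decoy": True},
              {"score": 4.0, "ppm": 0.7, "decoy": False}]
      print(filter_grid_search(psms, score_cuts=[2.0, 3.0], ppm_cuts=[2.0, 10.0]))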

  8. Optimal design of groundwater remediation systems using a probabilistic multi-objective fast harmony search algorithm under uncertainty

    NASA Astrophysics Data System (ADS)

    Luo, Q.; Wu, J.; Qian, J.

    2013-12-01

    This study develops a new probabilistic multi-objective fast harmony search algorithm (PMOFHS) for optimal design of groundwater remediation systems under uncertainty associated with the hydraulic conductivity of aquifers. The PMOFHS integrates the previously developed deterministic multi-objective optimization method, namely the multi-objective fast harmony search algorithm (MOFHS), with probabilistic Pareto domination ranking and a probabilistic niche technique to search for Pareto-optimal solutions to multi-objective optimization problems in a noisy hydrogeological environment arising from insufficient hydraulic conductivity data. The PMOFHS is then coupled with the commonly used flow and transport codes, MODFLOW and MT3DMS, to identify the optimal groundwater remediation system for a two-dimensional hypothetical test problem involving two objectives: (i) minimization of the total remediation cost through the engineering planning horizon, and (ii) minimization of the percentage of mass remaining in the aquifer at the end of the operational period, using pump-and-treat (PAT) technology to clean up contaminated groundwater. Monte Carlo (MC) analysis is used to demonstrate the effectiveness of the proposed methodology: the MC analysis is applied to each Pareto solution for every hydraulic conductivity realization, and the statistical mean and the upper and lower bounds of 95% confidence intervals are calculated. The MC analysis results show that all of the Pareto-optimal solutions are located between the upper and lower bounds of the MC analysis. Moreover, the root mean square errors (RMSEs) between the Pareto-optimal solutions found by the PMOFHS and the average values of optimal solutions from the MC analysis are 0.0204 for the first objective and 0.0318 for the second, much smaller than the corresponding RMSEs between the results of the existing probabilistic multi-objective genetic algorithm (PMOGA) and the MC analysis, 0.0384 and 0.0397, respectively.

  9. Uninformed and probabilistic distributed agent combinatorial searches for the unary NP-complete disassembly line balancing problem

    NASA Astrophysics Data System (ADS)

    McGovern, Seamus M.; Gupta, Surendra M.

    2005-11-01

    Disassembly takes place in remanufacturing, recycling, and disposal, with a line being the best choice for automation. The disassembly line balancing problem seeks a sequence which is feasible, minimizes the number of workstations, and ensures similar idle times, while addressing other end-of-life specific concerns. Finding the optimal balance is computationally intensive due to the exponential growth of the solution space. Combinatorial optimization methods hold promise for providing solutions to the disassembly line balancing problem, which is proven here to belong to the class of unary NP-complete problems. Probabilistic (ant colony optimization) and uninformed (H-K) search methods are presented and compared. Numerical results are obtained using a recent case study to illustrate the search implementations and compare their performance. Conclusions drawn include the consistent generation of near-optimal solutions, the ability to preserve precedence, the speed of the techniques, and their practicality due to ease of implementation.

  10. A Practical Introduction to Non-Bibliographic Database Searching.

    ERIC Educational Resources Information Center

    Rocke, Hans J.; And Others

    This guide comprises four reports on the Laboratory Animal Data Bank (LADB), the National Institutes of Health/Environmental Protection Agency (NIH/EPA) Chemical Information System (CIS), nonbibliographic databases for the social sciences, and the Toxicology Data Bank (TDB) and Registry of Toxic Effects of Chemical Substances (RTECS). The first…

  11. A student's guide to searching the literature using online databases

    NASA Astrophysics Data System (ADS)

    Miller, Casey W.; Belyea, Dustin; Chabot, Michelle; Messina, Troy

    2011-03-01

    A method is described to empower students to efficiently perform general and specific literature searches using online resources [Miller et al., Am. J. Phys. 77, 1112 (2009)]. The method was tested on undergraduate and graduate students with varying backgrounds in scientific literature. Students involved in this study showed marked improvement in their awareness of how and where to find scientific information. Repeated exposure to literature searching methods appears worthwhile, starting early in the undergraduate career, and even in graduate school orientation. Supported by NSF-CAREER, and the Mattie Allen Broyles and Gus S. Wortham Endowments.

  12. Using "Reader's Guide to Periodical Literature" on CD-Rom To Teach Database Searching to High School Students.

    ERIC Educational Resources Information Center

    Kern, Joanne F.

    The lack of opportunity for high school sophomores to learn database searching was addressed by the implementation of a computerized magazine article search program. "Reader's Guide to Periodical Literature" on CD-ROM was used to train students in database searching during the time they were assigned to the library to do research papers for…

  13. MIDAS: a database-searching algorithm for metabolite identification in metabolomics.

    PubMed

    Wang, Yingfeng; Kora, Guruprasad; Bowen, Benjamin P; Pan, Chongle

    2014-10-01

    A database searching approach can be used for metabolite identification in metabolomics by matching measured tandem mass spectra (MS/MS) against the predicted fragments of metabolites in a database. Here, we present the open-source MIDAS algorithm (Metabolite Identification via Database Searching). To evaluate a metabolite-spectrum match (MSM), MIDAS first enumerates possible fragments from a metabolite by systematic bond dissociation, then calculates the plausibility of the fragments based on their fragmentation pathways, and finally scores the MSM to assess how well the experimental MS/MS spectrum from collision-induced dissociation (CID) is explained by the metabolite's predicted CID MS/MS spectrum. MIDAS was designed to search high-resolution tandem mass spectra acquired on time-of-flight or Orbitrap mass spectrometer against a metabolite database in an automated and high-throughput manner. The accuracy of metabolite identification by MIDAS was benchmarked using four sets of standard tandem mass spectra from MassBank. On average, for 77% of original spectra and 84% of composite spectra, MIDAS correctly ranked the true compounds as the first MSMs out of all MetaCyc metabolites as decoys. MIDAS correctly identified 46% more original spectra and 59% more composite spectra at the first MSMs than an existing database-searching algorithm, MetFrag. MIDAS was showcased by searching a published real-world measurement of a metabolome from Synechococcus sp. PCC 7002 against the MetaCyc metabolite database. MIDAS identified many metabolites missed in the previous study. MIDAS identifications should be considered only as candidate metabolites, which need to be confirmed using standard compounds. To facilitate manual validation, MIDAS provides annotated spectra for MSMs and labels observed mass spectral peaks with predicted fragments. The database searching and manual validation can be performed online at http://midas.omicsbio.org. PMID:25157598
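
    The core scoring step of such a metabolite-spectrum match (MSM) can be illustrated as a simple fragment-matching fraction; the m/z values are invented, and this sketch neither enumerates fragments by bond dissociation nor weights them by pathway plausibility as MIDAS does.

      def msm_score(predicted, observed, tol=0.01):
          """Fraction of predicted fragment m/z values matched by an observed
          peak within `tol` (toy stand-in for a full MSM score)."""
          matched = sum(any(abs(p - o) <= tol for o in observed) for p in predicted)
          return matched / len(predicted)

      predicted = [74.0242, 88.0399, 116.0712, 132.1021]   # hypothetical fragments
      observed  = [74.0239, 116.0708, 132.1019, 150.0583]
      print(msm_score(predicted, observed))                 # -> 0.75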

  14. Social Work Literature Searching: Current Issues with Databases and Online Search Engines

    ERIC Educational Resources Information Center

    McGinn, Tony; Taylor, Brian; McColgan, Mary; McQuilkan, Janice

    2016-01-01

    Objectives: To compare the performance of a range of search facilities; and to illustrate the execution of a comprehensive literature search for qualitative evidence in social work. Context: Developments in literature search methods and comparisons of search facilities help facilitate access to the best available evidence for social workers.…

  15. Sagace: A web-based search engine for biomedical databases in Japan

    PubMed Central

    2012-01-01

    Background In the big data era, biomedical research continues to generate a large amount of data, and the generated information is often stored in a database and made publicly available. Although combining data from multiple databases should accelerate further studies, the current number of life sciences databases is too large to grasp features and contents of each database. Findings We have developed Sagace, a web-based search engine that enables users to retrieve information from a range of biological databases (such as gene expression profiles and proteomics data) and biological resource banks (such as mouse models of disease and cell lines). With Sagace, users can search more than 300 databases in Japan. Sagace offers features tailored to biomedical research, including manually tuned ranking, a faceted navigation to refine search results, and rich snippets constructed with retrieved metadata for each database entry. Conclusions Sagace will be valuable for experts who are involved in biomedical research and drug development in both academia and industry. Sagace is freely available at http://sagace.nibio.go.jp/en/. PMID:23110816

  16. Global search tool for the Advanced Photon Source Integrated Relational Model of Installed Systems (IRMIS) database.

    SciTech Connect

    Quock, D. E. R.; Cianciarulo, M. B.; APS Engineering Support Division; Purdue Univ.

    2007-01-01

    The Integrated Relational Model of Installed Systems (IRMIS) is a relational database tool that has been implemented at the Advanced Photon Source to maintain an updated account of approximately 600 control system software applications, 400,000 process variables, and 30,000 control system hardware components. To effectively display this large amount of control system information to operators and engineers, IRMIS was initially built with nine Web-based viewers: Applications Organizing Index, IOC, PLC, Component Type, Installed Components, Network, Controls Spares, Process Variables, and Cables. However, since each viewer is designed to provide details from only one major category of the control system, the necessity for a one-stop global search tool for the entire database became apparent. The user requirements for extremely fast database search time and ease of navigation through search results led to the choice of Asynchronous JavaScript and XML (AJAX) technology in the implementation of the IRMIS global search tool. Unique features of the global search tool include a two-tier level of displayed search results, and a database data integrity validation and reporting mechanism.

  17. Toxic release inventory database. (Latest citations from the NTIS Bibliographic database). Published Search

    SciTech Connect

    Not Available

    1993-09-01

    The bibliography contains citations concerning the Toxic Release Inventory (TRI) database issued by the Environmental Protection Agency (EPA). The TRI database was begun by EPA in response to Section 313 of the Emergency Planning and Community Right-to-Know Act of the Superfund Amendments and Reauthorization Act of 1986, which required EPA to establish an inventory by states of routine toxic chemical emissions from certain facilities. There are over 300 chemicals and categories on these lists. The reporting requirement applies to owners and operators of manufacturing facilities that employ 10 or more full-time employees and that manufacture, process, or otherwise use a listed toxic chemical in excess of specified threshold quantities. The data file is contained on diskettes in dBASE III format or LOTUS 1-2-3 format available from the National Technical Information Service (NTIS). (Contains 250 citations and includes a subject term index and title list.)

  18. Using homology relations within a database markedly boosts protein sequence similarity search.

    PubMed

    Tong, Jing; Sadreyev, Ruslan I; Pei, Jimin; Kinch, Lisa N; Grishin, Nick V

    2015-06-01

    Inference of homology from protein sequences provides an essential tool for analyzing protein structure, function, and evolution. Current sequence-based homology search methods are still unable to detect many similarities evident from protein spatial structures. In computer science a search engine can be improved by considering networks of known relationships within the search database. Here, we apply this idea to protein-sequence-based homology search and show that it dramatically enhances the search accuracy. Our new method, COMPADRE (COmparison of Multiple Protein sequence Alignments using Database RElationships) assesses the relationship between the query sequence and a hit in the database by considering the similarity between the query and hit's known homologs. This approach increases detection quality, boosting the precision rate from 18% to 83% at half-coverage of all database homologs. The increased precision rate allows detection of a large fraction of protein structural relationships, thus providing structure and function predictions for previously uncharacterized proteins. Our results suggest that this general approach is applicable to a wide variety of methods for detection of biological similarities. The web server is available at prodata.swmed.edu/compadre. PMID:26038555

  19. Using homology relations within a database markedly boosts protein sequence similarity search

    PubMed Central

    Tong, Jing; Sadreyev, Ruslan I.; Pei, Jimin; Kinch, Lisa N.; Grishin, Nick V.

    2015-01-01

    Inference of homology from protein sequences provides an essential tool for analyzing protein structure, function, and evolution. Current sequence-based homology search methods are still unable to detect many similarities evident from protein spatial structures. In computer science, a search engine can be improved by considering networks of known relationships within the search database. Here, we apply this idea to protein-sequence–based homology search and show that it dramatically enhances the search accuracy. Our new method, COMPADRE (COmparison of Multiple Protein sequence Alignments using Database RElationships), assesses the relationship between the query sequence and a hit in the database by considering the similarity between the query and the hit’s known homologs. This approach increases detection quality, boosting the precision rate from 18% to 83% at half-coverage of all database homologs. The increased precision rate allows detection of a large fraction of protein structural relationships, thus providing structure and function predictions for previously uncharacterized proteins. Our results suggest that this general approach is applicable to a wide variety of methods for detection of biological similarities. The web server is available at prodata.swmed.edu/compadre. PMID:26038555

  20. PLAST: parallel local alignment search tool for database comparison

    PubMed Central

    Nguyen, Van Hoa; Lavenier, Dominique

    2009-01-01

    Background Sequence similarity searching is an important and challenging task in molecular biology and next-generation sequencing should further strengthen the need for faster algorithms to process such vast amounts of data. At the same time, the internal architecture of current microprocessors is tending towards more parallelism, leading to the use of chips with two, four and more cores integrated on the same die. The main purpose of this work was to design an effective algorithm to fit with the parallel capabilities of modern microprocessors. Results A parallel algorithm for comparing large genomic banks and targeting middle-range computers has been developed and implemented in PLAST software. The algorithm exploits two key parallel features of existing and future microprocessors: the SIMD programming model (SSE instruction set) and the multithreading concept (multicore). Compared to multithreaded BLAST software, tests performed on an 8-processor server have shown speedup ranging from 3 to 6 with a similar level of accuracy. Conclusion A parallel algorithmic approach driven by the knowledge of the internal microprocessor architecture allows significant speedup to be obtained while preserving standard sensitivity for similarity search problems. PMID:19821978
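    The multicore half of this strategy can be sketched with Python's standard library: the database is split across worker processes, each scores its own chunk, and the partial hit lists are merged. The 4-mer word counter below is a placeholder scorer, not PLAST's SSE-vectorized alignment, which is hardware-specific.

```python
# Coarse-grained sketch of database-level parallelism (the multicore axis of
# PLAST); the fine-grained SIMD/SSE axis is omitted. Scorer is a placeholder.
from multiprocessing import Pool

def crude_score(query, subject):
    # Invented stand-in for a vectorized Smith-Waterman: count shared 4-mers.
    words = {query[i:i + 4] for i in range(len(query) - 3)}
    return sum(1 for i in range(len(subject) - 3) if subject[i:i + 4] in words)

def search_chunk(args):
    query, chunk = args
    return [(name, crude_score(query, seq)) for name, seq in chunk]

def parallel_search(query, database, workers=4):
    chunks = [database[i::workers] for i in range(workers)]  # split the bank
    with Pool(workers) as pool:
        parts = pool.map(search_chunk, [(query, c) for c in chunks])
    return sorted((hit for part in parts for hit in part),
                  key=lambda h: h[1], reverse=True)

if __name__ == "__main__":
    bank = [("s1", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"), ("s2", "MSDNKQQEF")]
    print(parallel_search("MKTAYIAKQR", bank)[:5])
```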

  1. Development and Validation of Search Filters to Identify Articles on Family Medicine in Online Medical Databases.

    PubMed

    Pols, David H J; Bramer, Wichor M; Bindels, Patrick J E; van de Laar, Floris A; Bohnen, Arthur M

    2015-01-01

    Physicians and researchers in the field of family medicine often need to find relevant articles in online medical databases for a variety of reasons. Because a search filter may help improve the efficiency and quality of such searches, we aimed to develop and validate search filters to identify research studies of relevance to family medicine. Using a new and objective method for search filter development, we developed and validated 2 search filters for family medicine. The sensitive filter had a sensitivity of 96.8% and a specificity of 74.9%. The specific filter had a specificity of 97.4% and a sensitivity of 90.3%. Our new filters should aid literature searches in the family medicine field. The sensitive filter may help researchers conducting systematic reviews, whereas the specific filter may help family physicians find answers to clinical questions at the point of care when time is limited. PMID:26195683
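    For readers less familiar with the two validation metrics, the sketch below shows how they are computed. The underlying counts are invented so that the output reproduces the sensitive filter's reported figures; they are not taken from the study.

```python
# Hedged illustration of filter validation metrics (counts are hypothetical).
def sensitivity(tp, fn):  # fraction of relevant (family medicine) records retrieved
    return tp / (tp + fn)

def specificity(tn, fp):  # fraction of irrelevant records correctly excluded
    return tn / (tn + fp)

tp, fn, tn, fp = 968, 32, 749, 251   # invented validation counts
print(f"sensitivity={sensitivity(tp, fn):.1%}, specificity={specificity(tn, fp):.1%}")
# -> sensitivity=96.8%, specificity=74.9% (the sensitive filter's reported figures)
```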

  2. Searching for planning operators with context-dependent and probabilistic effects

    SciTech Connect

    Oates, T.; Cohen, P.R.

    1996-12-31

    Providing a complete and accurate domain model for an agent situated in a complex environment can be an extremely difficult task. Actions may have different effects depending on the context in which they are taken, and actions may or may not induce their intended effects, with the probability of success again depending on context. We present an algorithm for automatically learning planning operators with context-dependent and probabilistic effects in environments where exogenous events change the state of the world. Empirical results show that the algorithm successfully finds operators that capture the true structure of an agent's interactions with its environment, and avoids spurious associations between actions and exogenous events.
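    The core estimation problem can be illustrated with a toy counting sketch: tabulating how often an action yields each effect in each context gives operators with context-dependent, probabilistic effects. The published algorithm additionally searches over candidate contexts and screens out exogenous events, which this sketch omits; the domain below is invented.

```python
# Toy sketch: estimate P(effect | action, context) from observed transitions.
from collections import Counter, defaultdict

observations = [
    ("door_locked", "push", "door_closed"),
    ("door_locked", "push", "door_closed"),
    ("door_unlocked", "push", "door_open"),
    ("door_unlocked", "push", "door_open"),
    ("door_unlocked", "push", "door_closed"),
]

counts = defaultdict(Counter)
for context, action, effect in observations:
    counts[(context, action)][effect] += 1

for (context, action), effects in counts.items():
    total = sum(effects.values())
    for effect, n in effects.items():
        print(f"P({effect} | {action}, {context}) = {n/total:.2f}")
```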

  3. Vehicle-triggered video compression/decompression for fast and efficient searching in large video databases

    NASA Astrophysics Data System (ADS)

    Bulan, Orhan; Bernal, Edgar A.; Loce, Robert P.; Wu, Wencheng

    2013-03-01

    Video cameras are widely deployed along city streets, interstate highways, traffic lights, stop signs and toll booths by entities that perform traffic monitoring and law enforcement. The videos captured by these cameras are typically compressed and stored in large databases. Performing a rapid search for a specific vehicle within a large database of compressed videos is often required and can be time-critical, even a matter of life or death. In this paper, we propose video compression and decompression algorithms that enable fast and efficient vehicle or, more generally, event searches in large video databases. The proposed algorithm selects reference frames (i.e., I-frames) based on a vehicle having been detected at a specified position within the scene being monitored while compressing a video sequence. A search for a specific vehicle in the compressed video stream is performed across the reference frames only, which does not require decompression of the full video sequence as in traditional search algorithms. Our experimental results on videos captured on a local road show that the proposed algorithm significantly reduces the search space (and thus time and computational resources) in vehicle search tasks within compressed video streams, particularly those captured in light traffic conditions.
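    A schematic of the triggering idea, under the assumption of a detector callback: a new group of pictures (GOP) is started whenever a vehicle is seen at the trigger position, so a later search touches only the reference frames. The function names, GOP cap, and toy data are illustrative; a real encoder would hand the I-frame decision to the codec.

```python
# Sketch only: I-frame placement driven by vehicle detection.
def choose_iframes(frames, vehicle_at_trigger, gop_max=250):
    iframes, since_last = [], gop_max
    for i, frame in enumerate(frames):
        # Cut a new GOP on detection, at the start, or when the cap is hit.
        if vehicle_at_trigger(frame) or since_last >= gop_max:
            iframes.append(i)
            since_last = 0
        since_last += 1
    return iframes

def search_vehicle(frames, iframes, matches_target):
    # Decode and match the reference frames only -- the claimed saving.
    return [i for i in iframes if matches_target(frames[i])]

# Toy run: a "frame" is just a number; a vehicle triggers every 300 frames.
frames = list(range(1000))
iframes = choose_iframes(frames, lambda f: f % 300 == 0)
print(search_vehicle(frames, iframes, lambda f: f == 600))  # [600]
```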

  4. SW#db: GPU-Accelerated Exact Sequence Similarity Database Search.

    PubMed

    Korpar, Matija; Šošić, Martin; Blažeka, Dino; Šikić, Mile

    2015-01-01

    In recent years we have witnessed a growth in sequencing yield, the number of samples sequenced, and, as a result, the growth of publicly maintained sequence databases. The increase in available data has put high requirements on protein similarity search algorithms, with two ever-opposing goals: keeping running times acceptable while maintaining a high enough level of sensitivity. The most time-consuming step of similarity search is the local alignment between query and database sequences. This step is usually performed using exact local alignment algorithms such as Smith-Waterman. Due to its quadratic time complexity, aligning a query to the whole database is usually too slow. Therefore, the majority of protein similarity search methods apply heuristics to reduce the number of candidate sequences in the database before doing the exact local alignment. However, there is still a need for the alignment of a query sequence to a reduced database. In this paper we present the SW#db tool and a library for fast exact similarity search. Although its running times as a standalone tool are comparable to those of BLAST, it is primarily intended for the exact local alignment phase, in which the database of sequences has already been reduced. It uses both GPU and CPU parallelization and was 4-5 times faster than SSEARCH, 6-25 times faster than CUDASW++ and more than 20 times faster than SSW at the time of writing, using multiple queries on the Swiss-Prot and UniRef90 databases. PMID:26719890
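    For reference, the quadratic-time dynamic program that dominates this phase, and that SW#db offloads to the GPU, fits in a dozen lines of Python (linear gap penalty for brevity; production tools use affine gaps and substitution matrices):

```python
# Reference Smith-Waterman local alignment score (O(len(a) * len(b)) time).
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            # Local alignment floors every cell at zero.
            H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
            best = max(best, H[i][j])
    return best  # score of the best local alignment

print(smith_waterman("HEAGAWGHEE", "PAWHEAE"))
```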

  5. SW#db: GPU-Accelerated Exact Sequence Similarity Database Search

    PubMed Central

    Korpar, Matija; Šošić, Martin; Blažeka, Dino; Šikić, Mile

    2015-01-01

    In recent years we have witnessed a growth in sequencing yield, the number of samples sequenced, and, as a result, the growth of publicly maintained sequence databases. The increase in available data has put high requirements on protein similarity search algorithms, with two ever-opposing goals: keeping running times acceptable while maintaining a high enough level of sensitivity. The most time-consuming step of similarity search is the local alignment between query and database sequences. This step is usually performed using exact local alignment algorithms such as Smith-Waterman. Due to its quadratic time complexity, aligning a query to the whole database is usually too slow. Therefore, the majority of protein similarity search methods apply heuristics to reduce the number of candidate sequences in the database before doing the exact local alignment. However, there is still a need for the alignment of a query sequence to a reduced database. In this paper we present the SW#db tool and a library for fast exact similarity search. Although its running times as a standalone tool are comparable to those of BLAST, it is primarily intended for the exact local alignment phase, in which the database of sequences has already been reduced. It uses both GPU and CPU parallelization and was 4–5 times faster than SSEARCH, 6–25 times faster than CUDASW++ and more than 20 times faster than SSW at the time of writing, using multiple queries on the Swiss-Prot and UniRef90 databases. PMID:26719890

  6. On a Probabilistic Approach to Determining the Similarity between Boolean Search Request Formulations.

    ERIC Educational Resources Information Center

    Radecki, Tadeusz

    1982-01-01

    Presents and discusses the results of research into similarity measures for search request formulations which employ Boolean combinations of index terms. The use of a weighting mechanism to indicate the importance of attributes in a search formulation is described. A 16-item reference list is included. (JL)

  7. Database search for safety information on cosmetic ingredients.

    PubMed

    Pauwels, Marleen; Rogiers, Vera

    2007-12-01

    Ethical considerations with respect to experimental animal use and regulatory testing are under heavy discussion worldwide and are, in certain cases, taken up in legislative measures. The most explicit example is the European cosmetic legislation, which established a testing ban on finished cosmetic products as of 11 September 2004 and enforces that the safety of a cosmetic product be assessed by taking into consideration "the general toxicological profile of the ingredients, their chemical structure and their level of exposure" (OJ L151, 32-37, 23 June 1993; OJ L066, 26-35, 11 March 2003). The availability of referenced and reliable information on cosmetic ingredients therefore becomes a dire necessity. Given the high-speed progress of World Wide Web services and the concurrent drastic increase in free access to information, identifying relevant data sources and evaluating the scientific value and quality of the retrieved data are crucial. Based upon our own practical experience, a survey of freely and commercially available data sources has been put together, with individual descriptions, fields of application, benefits and drawbacks. It should be mentioned that the search strategies described are equally useful as a starting point for any quest for safety data on chemicals or chemical-related substances in general. PMID:17919791

  8. Sports Information Online: Searching the SPORT Database and Tips for Finding Sports Medicine Information Online.

    ERIC Educational Resources Information Center

    Janke, Richard V.; And Others

    1988-01-01

    The first article describes SPORT, a database providing international coverage of athletics and physical education, and compares it to other online services in terms of coverage, thesauri, possible search strategies, and actual usage. The second article reviews available online information on sports medicine. (CLB)

  9. Support Vector Machines for Improved Peptide Identification from Tandem Mass Spectrometry Database Search

    SciTech Connect

    Webb-Robertson, Bobbie-Jo M.

    2009-05-06

    Accurate identification of peptides is a current challenge in mass spectrometry (MS) based proteomics. The standard approach uses a search routine to compare tandem mass spectra to a database of peptides associated with the target organism. These database search routines yield multiple metrics associated with the quality of the mapping of the experimental spectrum to the theoretical spectrum of a peptide. The structure of these results makes separating correct from false identifications difficult and has created a false identification problem. Statistical confidence scores are an approach to combating this false-positive problem and have led to significant improvements in peptide identification. We have shown that machine learning, specifically the support vector machine (SVM), is an effective approach to separating true peptide identifications from false ones. The SVM-based peptide statistical scoring method transforms a peptide into a vector representation based on database search metrics to train and validate the SVM. In practice, following the database search routine, a peptide is converted into its vector representation and the SVM generates a single statistical score that is then used to classify it as present or absent in the sample.
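    A minimal sketch of this scheme using scikit-learn, with synthetic stand-ins for the database search metrics; the real feature set, kernel choice and training data come from the paper, not from this sketch.

```python
# Hypothetical PSM scoring with an SVM: rows are PSMs, columns are invented
# database-search metrics (e.g. a correlation score, a delta score, mass error).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
true_psms  = rng.normal(loc=[3.5, 0.4, 0.1], scale=0.5, size=(200, 3))
false_psms = rng.normal(loc=[1.5, 0.1, 0.8], scale=0.5, size=(200, 3))
X = np.vstack([true_psms, false_psms])
y = np.array([1] * 200 + [0] * 200)

svm = SVC(kernel="rbf").fit(X, y)
# The decision value plays the role of the single statistical score per PSM.
print(svm.decision_function(np.array([[3.2, 0.35, 0.15]])))
```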

  10. Planning for End-User Database Searching: Drexel and the Mac: A User-Consistent Interface.

    ERIC Educational Resources Information Center

    LaBorie, Tim; Donnelly, Leslie

    Drexel University instituted a microcomputing program in 1984 which required all freshmen to own Apple Macintosh microcomputers. All students were taught database searching on the BRS (Bibliographic Retrieval Services) system as part of the freshman humanities curriculum, and the university library was chosen as the site to house continuing…

  11. More Databases Searched by a Business Generalist--Part 2: A Veritable Cornucopia of Sources.

    ERIC Educational Resources Information Center

    Meredith, Meri

    1986-01-01

    This second installment describes databases irregularly searched in the Business Information Center, Cummins Engine Company (Columbus, Indiana). Highlights include typical research topics (happenings among similar manufacturers); government topics (Department of Defense contracts); market and industry topics; corporate intelligence; and personnel,…

  12. Discovering More Chemical Concepts from 3D Chemical Information Searches of Crystal Structure Databases

    ERIC Educational Resources Information Center

    Rzepa, Henry S.

    2016-01-01

    Three new examples are presented illustrating three-dimensional chemical information searches of the Cambridge structure database (CSD) from which basic core concepts in organic and inorganic chemistry emerge. These include connecting the regiochemistry of aromatic electrophilic substitution with the geometrical properties of hydrogen bonding…

  13. Parallel database search and prime factorization with magnonic holographic memory devices

    SciTech Connect

    Khitun, Alexander

    2015-12-28

    In this work, we describe the capabilities of Magnonic Holographic Memory (MHM) for parallel database search and prime factorization. MHM is a type of holographic device which utilizes spin waves for data transfer and processing. Its operation is based on the correlation between the phases and the amplitudes of the input spin waves and the output inductive voltage. The input of MHM is provided by a phased array of spin wave generating elements, allowing the production of phase patterns of arbitrary form. The latter makes it possible to code logic states into the phases of propagating waves and exploit wave superposition for parallel data processing. We present the results of numerical modeling illustrating parallel database search and prime factorization. The results of numerical simulations on the database search are in agreement with the available experimental data. The use of classical wave interference may result in a significant speedup over conventional digital logic circuits in special-task data processing (e.g., a √n speedup in database search). Potentially, magnonic holographic devices can be implemented as complementary logic units to digital processors. Physical limitations and technological constraints of the spin wave approach are also discussed.

  14. Successful Keyword Searching: Initiating Research on Popular Topics Using Electronic Databases.

    ERIC Educational Resources Information Center

    MacDonald, Randall M.; MacDonald, Susan Priest

    Students are using electronic resources more than ever before to locate information for assignments. Without the proper search terms, results are incomplete, and students are frustrated. Using the keywords, key people, organizations, and Web sites provided in this book and compiled from the most commonly used databases, students will be able to…

  15. Parallel database search and prime factorization with magnonic holographic memory devices

    NASA Astrophysics Data System (ADS)

    Khitun, Alexander

    2015-12-01

    In this work, we describe the capabilities of Magnonic Holographic Memory (MHM) for parallel database search and prime factorization. MHM is a type of holographic device which utilizes spin waves for data transfer and processing. Its operation is based on the correlation between the phases and the amplitudes of the input spin waves and the output inductive voltage. The input of MHM is provided by a phased array of spin wave generating elements, allowing the production of phase patterns of arbitrary form. The latter makes it possible to code logic states into the phases of propagating waves and exploit wave superposition for parallel data processing. We present the results of numerical modeling illustrating parallel database search and prime factorization. The results of numerical simulations on the database search are in agreement with the available experimental data. The use of classical wave interference may result in a significant speedup over conventional digital logic circuits in special-task data processing (e.g., a √n speedup in database search). Potentially, magnonic holographic devices can be implemented as complementary logic units to digital processors. Physical limitations and technological constraints of the spin wave approach are also discussed.

  16. An Investigation of the Optimization of Search Logic for the MEDLINE Database.

    ERIC Educational Resources Information Center

    Heine, M. H.; Tague, J. M.

    1991-01-01

    Discussion of the role of Boolean logic in the information retrieval process focuses on a study that investigated the optimization of search logic for the MEDLINE database. Measures of retrieval effectiveness are discussed, the relationship of weighting schema and the logical schema is considered, and further investigations are suggested. (17…

  17. An Interactive Iterative Method for Electronic Searching of Large Literature Databases

    ERIC Educational Resources Information Center

    Hernandez, Marco A.

    2013-01-01

    PubMed® is an on-line literature database hosted by the U.S. National Library of Medicine. Containing over 21 million citations for biomedical literature--both abstracts and full text--in the areas of the life sciences, behavioral studies, chemistry, and bioengineering, PubMed® represents an important tool for researchers. PubMed® searches return…

  18. Gapped Spectral Dictionaries and Their Applications for Database Searches of Tandem Mass Spectra

    NASA Astrophysics Data System (ADS)

    Jeong, Kyowon; Kim, Sangtae; Bandeira, Nuno; Pevzner, Pavel A.

    Generating all plausible de novo interpretations of a peptide tandem mass (MS/MS) spectrum (its Spectral Dictionary) and quickly matching them against the database represent a recently emerged alternative approach to peptide identification. However, the sizes of Spectral Dictionaries grow quickly with the peptide length, making their generation impractical for long peptides. We introduce Gapped Spectral Dictionaries (all plausible de novo interpretations with gaps) that can be easily generated for any peptide length, thus addressing the shortcoming of the Spectral Dictionary approach. We show that Gapped Spectral Dictionaries are small, thus opening the possibility of using them to speed up MS/MS database searches. Our MS-GappedDictionary algorithm (based on Gapped Spectral Dictionaries) enables proteogenomics applications that are prohibitively time consuming with existing approaches. We further introduce gapped tags, which have advantages over conventional peptide sequence tags in filtration-based MS/MS database searches.
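    The compression effect of gaps can be illustrated with a toy sketch: merging ambiguous runs of residues into a single summed mass makes several full-length interpretations collapse into one gapped entry. Masses are rounded and the merge spans are chosen by hand here; the real method derives them from spectrum probabilities.

```python
# Toy illustration of why gapped interpretations shrink a Spectral Dictionary.
MASS = {"G": 57.02, "A": 71.04, "S": 87.03, "P": 97.05, "V": 99.07,
        "L": 113.08, "N": 114.04, "Q": 128.06, "K": 128.09}

def gapped(peptide, merge_spans):
    """Replace each (start, end) residue span with its summed mass."""
    out, spans, i = [], dict(merge_spans), 0
    while i < len(peptide):
        if i in spans:
            j = spans[i]
            out.append(round(sum(MASS[r] for r in peptide[i:j]), 2))
            i = j
        else:
            out.append(peptide[i])
            i += 1
    return tuple(out)

# "GA" and "Q" are nearly isobaric, so two full interpretations merge into one:
full_dictionary = ["GAVLK", "QVLK", "KVLK"]
gapped_dictionary = {gapped("GAVLK", [(0, 2)]), gapped("QVLK", [(0, 1)]),
                     gapped("KVLK", [(0, 1)])}
print(len(full_dictionary), "->", len(gapped_dictionary), gapped_dictionary)
```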

  19. Using Search Algorithms and Probabilistic Graphical Models to Understand the Influence of Atmospheric Circulation on Western US Drought

    NASA Astrophysics Data System (ADS)

    Malevich, S. B.; Woodhouse, C. A.

    2015-12-01

    This work explores a new approach to quantifying cool-season mid-latitude circulation dynamics as they relate to western US streamflow variability and drought. This information is used to probabilistically associate patterns of synoptic atmospheric circulation with spatial patterns of drought in western US streamflow. Cool-season storms transport moisture from the Pacific Ocean and are a primary source of western US streamflow. Studies over the past several decades have emphasized that the western US hydroclimate is influenced by the intensity and phasing of ocean and atmosphere dynamics and teleconnections, such as ENSO and North Pacific variability. These complex interactions are realized in atmospheric circulation along the west coast of North America. The region's atmospheric circulation can encourage a preferential flow in winter storm tracks from the Pacific, and thus influence the moisture conditions of a given river basin over the course of the cool season. These dynamics have traditionally been measured with atmospheric indices based on values from fixed points in space or principal component loadings. This study uses collective search agents to quantify the position and intensity of potentially non-stationary atmosphere features in climate reanalysis datasets, relative to regional hydrology. Results underline the spatio-temporal relationship between semi-permanent atmosphere characteristics and naturalized streamflow from major river basins of the western US. A probabilistic graphical model quantifies this relationship while accounting for uncertainty from noisy climate processes and, eventually, limitations from dataset length. This yields probabilities for semi-permanent atmosphere features that we hope to associate with extreme droughts of the paleo record, based on our understanding of atmosphere-streamflow relations observed in the instrumental record.

  20. IMPROVED SEARCH OF PRINCIPAL COMPONENT ANALYSIS DATABASES FOR SPECTRO-POLARIMETRIC INVERSION

    SciTech Connect

    Casini, R.; Lites, B. W.; Ramos, A. Asensio

    2013-08-20

    We describe a simple technique for the acceleration of spectro-polarimetric inversions based on principal component analysis (PCA) of Stokes profiles. This technique involves the indexing of the database models based on the sign of the projections (PCA coefficients) of the first few relevant orders of principal components of the four Stokes parameters. In this way, each model in the database can be attributed a distinctive binary number of 2^{4n} bits, where n is the number of PCA orders used for the indexing. Each of these binary numbers (indices) identifies a group of "compatible" models for the inversion of a given set of observed Stokes profiles sharing the same index. The complete set of the binary numbers so constructed evidently determines a partition of the database. The search of the database for the PCA inversion of spectro-polarimetric data can profit greatly from this indexing. In practical cases it becomes possible to approach the ideal acceleration factor of 2^{4n} as compared to the systematic search of a non-indexed database for a traditional PCA inversion. This indexing method relies on the existence of a physical meaning in the sign of the PCA coefficients of a model. For this reason, the presence of model ambiguities and of spectro-polarimetric noise in the observations limits in practice the number n of relevant PCA orders that can be used for the indexing.
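    A small sketch of the indexing arithmetic, assuming the PCA projections are already available: the signs of n orders for each of the four Stokes parameters form a bit key, partitioning the database into up to 2^{4n} groups, and an inversion only scans the group matching the observation. The model data below are invented.

```python
# Sign-based indexing of PCA coefficients (toy model database, n = 2 orders).
import numpy as np

def pca_sign_index(coeffs):
    """coeffs: array of shape (4, n) -- PCA projections of I, Q, U, V."""
    bits = (np.asarray(coeffs).ravel() >= 0).astype(int)
    return int("".join(map(str, bits)), 2)

def build_index(database_coeffs):
    # Partition the model database once, keyed by the sign index.
    groups = {}
    for model_id, c in database_coeffs.items():
        groups.setdefault(pca_sign_index(c), []).append(model_id)
    return groups

db = {"m1": [[0.3, -1.2], [0.1, 0.4], [-0.5, 0.2], [0.9, -0.1]],
      "m2": [[-0.3, 1.0], [0.2, 0.1], [0.4, -0.2], [0.5, 0.3]]}
groups = build_index(db)
obs = [[0.2, -0.9], [0.3, 0.2], [-0.1, 0.1], [1.1, -0.4]]
print(groups.get(pca_sign_index(obs), []))  # only these models are inverted
```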

  1. A Novel Concept for the Search and Retrieval of the Derwent Markush Resource Database.

    PubMed

    Barth, Andreas; Stengel, Thomas; Litterst, Edwin; Kraut, Hans; Matuszczyk, Henry; Ailer, Franz; Hajkowski, Steve

    2016-05-23

    The representation of and search for generic chemical structures (Markush) remains a continuing challenge. Several research groups have addressed this problem, and over time a limited number of practical solutions have been proposed. Today there are two large commercial providers of Markush databases: Chemical Abstracts Service (CAS) and Thomson Reuters. The Thomson Reuters "Derwent" Markush database is currently offered via the online services Questel and STN and as a data feed for in-house use. The aim of this paper is to briefly review the existing Markush systems (databases plus search engines) and to describe our new approach for the implementation of the Derwent Markush Resource on STN. Our new approach demonstrates the integration of the Derwent Markush Resource database into the existing chemistry-focused STN platform without loss of detail. This provides compatibility with other structure and Markush databases on STN and at the same time makes it possible to deploy the specific features and functions of the Derwent approach. It is shown that the different Markush languages developed by CAS and Derwent can be combined into a single general Markush description. In this concept the generic nodes are grouped together in a unique hierarchy where all chemical elements and fragments can be integrated. As a consequence, both systems are searchable using a single structure query. Moreover, the presented concept could serve as a promising starting point for a common generalized description of Markush structures. PMID:27123583

  2. Improved Search of Principal Component Analysis Databases for Spectro-polarimetric Inversion

    NASA Astrophysics Data System (ADS)

    Casini, R.; Asensio Ramos, A.; Lites, B. W.; López Ariste, A.

    2013-08-01

    We describe a simple technique for the acceleration of spectro-polarimetric inversions based on principal component analysis (PCA) of Stokes profiles. This technique involves the indexing of the database models based on the sign of the projections (PCA coefficients) of the first few relevant orders of principal components of the four Stokes parameters. In this way, each model in the database can be attributed a distinctive binary number of 2^{4n} bits, where n is the number of PCA orders used for the indexing. Each of these binary numbers (indices) identifies a group of "compatible" models for the inversion of a given set of observed Stokes profiles sharing the same index. The complete set of the binary numbers so constructed evidently determines a partition of the database. The search of the database for the PCA inversion of spectro-polarimetric data can profit greatly from this indexing. In practical cases it becomes possible to approach the ideal acceleration factor of 2^{4n} as compared to the systematic search of a non-indexed database for a traditional PCA inversion. This indexing method relies on the existence of a physical meaning in the sign of the PCA coefficients of a model. For this reason, the presence of model ambiguities and of spectro-polarimetric noise in the observations limits in practice the number n of relevant PCA orders that can be used for the indexing.

  3. Probabilistic modeling of eye movement data during conjunction search via feature-based attention.

    PubMed

    Rutishauser, Ueli; Koch, Christof

    2007-01-01

    Where the eyes fixate during search is not random; rather, gaze reflects the combination of information about the target and the visual input. It is not clear, however, what information about a target is used to bias the underlying neuronal responses. We here engage subjects in a variety of simple conjunction search tasks while tracking their eye movements. We derive a generative model that reproduces these eye movements and calculate the conditional probabilities that observers fixate, given the target, on or near an item in the display sharing a specific feature with the target. We use these probabilities to infer which features were biased by top-down attention: Color seems to be the dominant stimulus dimension for guiding search, followed by object size, and lastly orientation. We use the number of fixations it took to find the target as a measure of task difficulty. We find that only a model that biases multiple feature dimensions in a hierarchical manner can account for the data. Contrary to common assumptions, memory plays almost no role in search performance. Our model can be fit to average data of multiple subjects or to individual subjects. Small variations of a few key parameters account well for the intersubject differences. The model is compatible with neurophysiological findings of V4 and frontal eye fields (FEF) neurons and predicts the gain modulation of these cells. PMID:17685788

  4. Current Comparative Table (CCT) automates customized searches of dynamic biological databases

    PubMed Central

    Landsteiner, Benjamin R.; Olson, Michael R.; Rutherford, Robert

    2005-01-01

    The Current Comparative Table (CCT) software program enables working biologists to automate customized bioinformatics searches, typically of remote sequence or HMM (hidden Markov model) databases. CCT currently supports BLAST, hmmpfam and other programs useful for gene and ortholog identification. The software is web based, has a BioPerl core and can be used remotely via a browser or locally on Mac OS X or Linux machines. CCT is particularly useful to scientists who study large sets of molecules in today's evolving information landscape because it color-codes all result files by age and highlights even tiny changes in sequence or annotation. By empowering non-bioinformaticians to automate custom searches and examine current results in context at a glance, CCT allows a remote database submission in the evening to influence the next morning's bench experiment. A demonstration of CCT is available at http://orb.public.stolaf.edu/CCTdemo and the open source software is freely available from http://sourceforge.net/projects/orb-cct. PMID:15980582

  5. Discovery of novel mesangial cell proliferation inhibitors using a three-dimensional database searching method.

    PubMed

    Kurogi, Y; Miyata, K; Okamura, T; Hashimoto, K; Tsutsumi, K; Nasu, M; Moriyasu, M

    2001-07-01

    A three-dimensional pharmacophore model of mesangial cell (MC) proliferation inhibitors was generated from a training set of 4-(diethoxyphosphoryl)methyl-N-(3-phenyl-[1,2,4]thiadiazol-5-yl)benzamide, 2, and its derivatives using the Catalyst/HIPHOP software program. On the basis of the in vitro MC proliferation inhibitory activity, a pharmacophore model was generated comprising seven features: two hydrophobic regions, two hydrophobic aromatic regions, and three hydrogen bond acceptors. Using this model as a three-dimensional query to search the Maybridge database, 41 structurally novel compounds were identified. When MC proliferation inhibitory activity was evaluated using the available samples of the 41 identified compounds, over 50% inhibitory activity was observed in the 100 nM range. Interestingly, the compounds newly identified by the 3D database searching method exhibited reduced inhibition of normal proximal tubular epithelial cell proliferation compared to the training set compounds. PMID:11428924

  6. Searching molecular structure databases with tandem mass spectra using CSI:FingerID

    PubMed Central

    Dührkop, Kai; Shen, Huibin; Meusel, Marvin; Rousu, Juho; Böcker, Sebastian

    2015-01-01

    Metabolites provide a direct functional signature of cellular state. Untargeted metabolomics experiments usually rely on tandem MS to identify the thousands of compounds in a biological sample. Today, the vast majority of metabolites remain unknown. We present a method for searching molecular structure databases using tandem MS data of small molecules. Our method computes a fragmentation tree that best explains the fragmentation spectrum of an unknown molecule. We use the fragmentation tree to predict the molecular structure fingerprint of the unknown compound using machine learning. This fingerprint is then used to search a molecular structure database such as PubChem. Our method is shown to improve on the competing methods for computational metabolite identification by a considerable margin. PMID:26392543
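    The last stage of such a pipeline, ranking database structures against a predicted fingerprint, reduces to a similarity computation. The sketch below uses toy bit sets and hypothetical PubChem-style identifiers rather than CSI:FingerID's machine-learned probabilistic fingerprints.

```python
# Toy final step: rank candidate structures by Tanimoto similarity between
# their fingerprints and the fingerprint predicted from the fragmentation tree.
def tanimoto(a, b):
    union = len(a | b)
    return len(a & b) / union if union else 0.0

predicted = {1, 4, 7, 9, 15}                      # predicted "on" bits
candidates = {"CID_001": {1, 4, 7, 9, 15, 21},    # invented identifiers
              "CID_002": {2, 4, 9},
              "CID_003": {1, 7, 15}}

ranking = sorted(candidates, key=lambda c: tanimoto(predicted, candidates[c]),
                 reverse=True)
print(ranking)  # best-matching structure first
```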

  7. Tempest: Accelerated MS/MS Database Search Software for Heterogeneous Computing Platforms.

    PubMed

    Adamo, Mark E; Gerber, Scott A

    2016-01-01

    MS/MS database search algorithms derive a set of candidate peptide sequences from in silico digest of a protein sequence database, and compute theoretical fragmentation patterns to match these candidates against observed MS/MS spectra. The original Tempest publication described these operations mapped to a CPU-GPU model, in which the CPU (central processing unit) generates peptide candidates that are asynchronously sent to a discrete GPU (graphics processing unit) to be scored against experimental spectra in parallel. The current version of Tempest expands this model, incorporating OpenCL to offer seamless parallelization across multicore CPUs, GPUs, integrated graphics chips, and general-purpose coprocessors. Three protocols describe how to configure and run a Tempest search, including discussion of how to leverage Tempest's unique feature set to produce optimal results. © 2016 by John Wiley & Sons, Inc. PMID:27603022

  8. Development of smart searching algorithms for vulnerability and uncertainty analyses in probabilistic risk assessments

    SciTech Connect

    Benjamin, A.S.

    1993-11-01

    In order to evaluate the risk inherent in a complex system, including a reasonable accounting of the many uncertainties that characterize both the environments to which the system is exposed and the responses of the system to these environments, risk analysts are forced to use statistical approaches. If the system is very well designed for safety, then simple convolution or sampling approaches may have to be augmented by intelligent searching schemes. An excellent example is the prediction of the probability of an accidental nuclear detonation for a nuclear weapon system. Under practically all situations, accidental nuclear detonation is virtually impossible because modern nuclear weapon systems are designed to preclude it. They have a series of strong links and weak links that are guaranteed to fail in a prescribed order when exposed to virtually all credible abnormal environments, such as fires, impacts, punctures, crushes, external pressures, lightning or chemical attack. However, no engineered system is perfect, and under certain peculiar conditions involving combined environments that are spatially directed in the worst possible way, the strong links may fail before the weak links and an accidental nuclear detonation may become possible. The task is to find such conditions through an intelligent searching process and to determine whether their probability of occurrence is high enough to be of concern. In the weapon system application, Latin hypercube sampling (LHS) generates millions of sample members, each consisting of a different set of parameter values, and we must employ intelligent searching techniques to reduce this set to a more workable number. We do this through a discriminator subprocess, which we will describe.
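    A hedged sketch of the sampling-plus-screening loop: draw a Latin hypercube sample over normalized environment parameters, then apply a discriminator that keeps only members where environments combine adversely. The two-parameter criterion and thresholds below are invented, not the report's actual discriminator.

```python
# Latin hypercube sample over [0,1)^d, then screen for adverse combinations.
import numpy as np

def latin_hypercube(n_samples, n_dims, rng):
    # One stratified draw per dimension: permuted strata plus jitter.
    strata = np.tile(np.arange(n_samples), (n_dims, 1))
    return (rng.permuted(strata, axis=1).T + rng.random((n_samples, n_dims))) / n_samples

def discriminator(samples):
    # Invented criterion: keep members where two environment parameters
    # (say, thermal and impact loads) are simultaneously extreme.
    return samples[(samples[:, 0] > 0.95) & (samples[:, 1] > 0.95)]

rng = np.random.default_rng(1)
sample = latin_hypercube(1_000_000, 2, rng)
worst = discriminator(sample)
print(len(worst), "of", len(sample), "members kept for detailed analysis")
```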

  9. Rapid identification of anonymous subjects in large criminal databases: problems and solutions in IAFIS III/FBI subject searches

    NASA Astrophysics Data System (ADS)

    Kutzleb, C. D.

    1997-02-01

    The high incidence of recidivism (repeat offenders) in the criminal population makes the use of the IAFIS III/FBI criminal database an important tool in law enforcement. The problems and solutions employed by IAFIS III/FBI criminal subject searches are discussed for the following topics: (1) subject search selectivity and reliability; (2) the difficulty and limitations of identifying subjects whose anonymity may be a prime objective; (3) database size, search workload, and search response time; (4) techniques and advantages of normalizing the variability in an individual's name and identifying features into identifiable and discrete categories; and (5) the use of database demographics to estimate the likelihood of a match between a search subject and database subjects.

  10. BioSCAN: a network sharable computational resource for searching biosequence databases.

    PubMed

    Singh, R K; Hoffman, D L; Tell, S G; White, C T

    1996-06-01

    We describe a network sharable, interactive computational tool for rapid and sensitive search and analysis of biomolecular sequence databases such as GenBank, GenPept, Protein Identification Resource, and SWISS-PROT. The resource is accessible via the World Wide Web using popular client software such as Mosaic and Netscape. The client software is freely available on a number of computing platforms including Macintosh, IBM-PC, and Unix workstations. PMID:8872387

  11. A hybrid approach for addressing ring flexibility in 3D database searching.

    PubMed

    Sadowski, J

    1997-01-01

    A hybrid approach for flexible 3D database searching is presented that addresses the problem of ring flexibility. It combines the explicit storage of up to 25 multiple conformations of rings with up to eight atoms, generated by the 3D structure generator CORINA, with the power of a torsional fitting technique implemented in the 3D database system UNITY. A comparison with the original UNITY approach, using a database with about 130,000 entries and five different pharmacophore queries, was performed. The hybrid approach scored, on average, 10-20% more hits than the reference run. Moreover, specific problems with unrealistic hit geometries produced by the original approach can be excluded. In addition, the influence of the maximum number of ring conformations per molecule was investigated. An optimal number of 10 conformations per molecule is recommended. PMID:9139112

  12. Improved classification of mass spectrometry database search results using newer machine learning approaches.

    PubMed

    Ulintz, Peter J; Zhu, Ji; Qin, Zhaohui S; Andrews, Philip C

    2006-03-01

    Manual analysis of mass spectrometry data is a current bottleneck in high throughput proteomics. In particular, the need to manually validate the results of mass spectrometry database searching algorithms can be prohibitively time-consuming. Development of software tools that attempt to quantify the confidence in the assignment of a protein or peptide identity to a mass spectrum is an area of active interest. We sought to extend work in this area by investigating the potential of recent machine learning algorithms to improve the accuracy of these approaches and as a flexible framework for accommodating new data features. Specifically, we demonstrated the ability of boosting and random forest approaches to improve the discrimination of true hits from false positive identifications in the results of mass spectrometry database search engines compared with thresholding and other machine learning approaches. We accommodated additional attributes obtainable from database search results, including a factor addressing proton mobility. Performance was evaluated using publicly available electrospray data and a new collection of MALDI data generated from purified human reference proteins. PMID:16321970

  13. Accelerating Smith-Waterman Algorithm for Biological Database Search on CUDA-Compatible GPUs

    NASA Astrophysics Data System (ADS)

    Munekawa, Yuma; Ino, Fumihiko; Hagihara, Kenichi

    This paper presents a fast method capable of accelerating the Smith-Waterman algorithm for biological database search on a cluster of graphics processing units (GPUs). Our method is implemented using compute unified device architecture (CUDA), which is available on the nVIDIA GPU. As compared with previous methods, our method has four major contributions. (1) The method efficiently uses on-chip shared memory to reduce the data amount being transferred between off-chip video memory and processing elements in the GPU. (2) It also reduces the number of data fetches by applying a data reuse technique to query and database sequences. (3) A pipelined method is also implemented to overlap GPU execution with database access. (4) Finally, a master/worker paradigm is employed to accelerate hundreds of database searches on a cluster system. In experiments, the peak performance on a GeForce GTX 280 card reaches 8.32 giga cell updates per second (GCUPS). We also find that our method reduces the amount of data fetches to 1/140, achieving approximately three times higher performance than a previous CUDA-based method. Our 32-node cluster version is approximately 28 times faster than a single GPU version. Furthermore, the effective performance reaches 75.6 giga instructions per second (GIPS) using 32 GeForce 8800 GTX cards.

  14. An efficient similarity search based on indexing in large DNA databases.

    PubMed

    Jeong, In-Seon; Park, Kyoung-Wook; Kang, Seung-Ho; Lim, Hyeong-Seok

    2010-04-01

    Index-based search algorithms are an important part of a genomic search, and how indices are constructed is the key to computing similarities between two DNA sequences with an index-based search algorithm. In this paper, we propose an efficient query processing method that uses special transformations to construct an index. It uses little storage and rapidly finds the similarity between two sequences in a DNA sequence database. First, a sequence is partitioned into equal-length windows. We select the likely subsequences by computing the Hamming distance to the query sequence. The algorithm then transforms the subsequences in each window into a multidimensional vector space by indexing the frequencies of the characters, including the positional information of the characters in the subsequences. The results of our experiments show that the algorithm has a faster run time than other heuristic algorithms based on index structures, while being just as accurate. PMID:20418167
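    An illustrative version of the two-stage idea, with an invented positional weighting: window subsequences are first screened by Hamming distance to the query, and survivors are compared as small frequency vectors rather than strings. This is a sketch of the general approach, not the paper's exact transform.

```python
# Invented weighting for illustration: character frequency plus a crude
# positional term, mimicking "frequencies including positional information".
def to_vector(subseq):
    vec = {base: 0.0 for base in "ACGT"}
    for pos, base in enumerate(subseq):
        vec[base] += 1 + pos / len(subseq)
    return [vec[b] for b in "ACGT"]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

query, window = "ACGTACGT", "ACGAACGT"
# Stage 1: cheap Hamming filter selects likely subsequences per window.
if hamming(query, window) <= 2:
    # Stage 2: compare in vector space (Euclidean distance on the index).
    q, w = to_vector(query), to_vector(window)
    print(sum((x - y) ** 2 for x, y in zip(q, w)) ** 0.5)
```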

  15. Data Analysis Provenance: Use Case for Exoplanet Search in CoRoT Database

    NASA Astrophysics Data System (ADS)

    de Souza, L.; Salete Marcon Gomes Vaz, M.; Emílio, M.; Ferreira da Rocha, J. C.; Janot Pacheco, E.; Carlos Boufleur, R.

    2012-09-01

    CoRoT (COnvection Rotation and planetary Transits) is a mission led by the French national space agency CNES, in collaboration with Austria, Spain, Germany, Belgium and Brazil. The mission's priority is exoplanet search and stellar seismology. The CoRoT light curve database became public one year after delivery of the data to the CoRoT Co-Is, following the CoRoT data policy. The CoRoT archive contains thousands of light curves in FITS format. Several exoplanet search algorithms require detrending algorithms to remove both stellar and instrumental signals, improving the chance of detecting a transit. Different detrending and transit detection algorithms can be applied to the same database. Tracking the origin of the information and how the data were derived at each level of the data analysis process is essential to allow sharing, reuse, reprocessing and further analysis. This work applies a formalized and codified knowledge model by means of a domain ontology, which enriches the data analysis with semantics and standardization. The provenance information is held in the database for a posteriori recovery by humans or software agents.

  16. EDULISS: a small-molecule database with data-mining and pharmacophore searching capabilities

    PubMed Central

    Hsin, Kun-Yi; Morgan, Hugh P.; Shave, Steven R.; Hinton, Andrew C.; Taylor, Paul; Walkinshaw, Malcolm D.

    2011-01-01

    We present the relational database EDULISS (EDinburgh University Ligand Selection System), which stores structural, physicochemical and pharmacophoric properties of small molecules. The database comprises a collection of over 4 million commercially available compounds from 28 different suppliers. A user-friendly web-based interface for EDULISS (available at http://eduliss.bch.ed.ac.uk/) has been established providing a number of data-mining possibilities. For each compound a single 3D conformer is stored along with over 1600 calculated descriptor values (molecular properties). A very efficient method for unique compound recognition, especially for a large scale database, is demonstrated by making use of small subgroups of the descriptors. Many of the shape and distance descriptors are held as pre-calculated bit strings permitting fast and efficient similarity and pharmacophore searches which can be used to identify families of related compounds for biological testing. Two ligand searching applications are given to demonstrate how EDULISS can be used to extract families of molecules with selected structural and biophysical features. PMID:21051336

  17. Searching for patterns in remote sensing image databases using neural networks

    NASA Technical Reports Server (NTRS)

    Paola, Justin D.; Schowengerdt, Robert A.

    1995-01-01

    We have investigated a method, based on a successful neural network multispectral image classification system, of searching for single patterns in remote sensing databases. While defining the pattern to search for and the feature to be used for that search (spectral, spatial, temporal, etc.) is challenging, a more difficult task is selecting competing patterns to train against the desired pattern. Schemes for competing pattern selection, including random selection and human interpreted selection, are discussed in the context of an example detection of dense urban areas in Landsat Thematic Mapper imagery. When applying the search to multiple images, a simple normalization method can alleviate the problem of inconsistent image calibration. Another potential problem, that of highly compressed data, was found to have a minimal effect on the ability to detect the desired pattern. The neural network algorithm has been implemented using the PVM (Parallel Virtual Machine) library and nearly-optimal speedups have been obtained that help alleviate the long process of searching through imagery.

  18. A neotropical Miocene pollen database employing image-based search and semantic modeling

    PubMed Central

    Han, Jing Ginger; Cao, Hongfei; Barb, Adrian; Punyasena, Surangi W.; Jaramillo, Carlos; Shyu, Chi-Ren

    2014-01-01

    • Premise of the study: Digital microscopic pollen images are being generated with increasing speed and volume, producing opportunities to develop new computational methods that increase the consistency and efficiency of pollen analysis and provide the palynological community a computational framework for information sharing and knowledge transfer. • Methods: Mathematical methods were used to assign trait semantics (abstract morphological representations) of the images of neotropical Miocene pollen and spores. Advanced database-indexing structures were built to compare and retrieve similar images based on their visual content. A Web-based system was developed to provide novel tools for automatic trait semantic annotation and image retrieval by trait semantics and visual content. • Results: Mathematical models that map visual features to trait semantics can be used to annotate images with morphology semantics and to search image databases with improved reliability and productivity. Images can also be searched by visual content, providing users with customized emphases on traits such as color, shape, and texture. • Discussion: Content- and semantic-based image searches provide a powerful computational platform for pollen and spore identification. The infrastructure outlined provides a framework for building a community-wide palynological resource, streamlining the process of manual identification, analysis, and species discovery. PMID:25202648

  19. Integration of first-principles methods and crystallographic database searches for new ferroelectrics: Strategies and explorations

    SciTech Connect

    Bennett, Joseph W.; Rabe, Karin M.

    2012-11-15

    In this concept paper, the development of strategies for the integration of first-principles methods with crystallographic database mining for the discovery and design of novel ferroelectric materials is discussed, drawing on the results and experience derived from exploratory investigations on three different systems: (1) the double perovskite Sr(Sb_{1/2}Mn_{1/2})O_3 as a candidate semiconducting ferroelectric; (2) polar derivatives of schafarzikite MSb_2O_4; and (3) ferroelectric semiconductors with formula M_2P_2(S,Se)_6. A variety of avenues for further research and investigation are suggested, including automated structure type classification, low-symmetry improper ferroelectrics, and high-throughput first-principles searches for additional representatives of structural families with desirable functional properties. Highlights: integration of first-principles methods and database mining; minor structural families with desirable functional properties; survey of polar entries in the Inorganic Crystal Structure Database.

  20. Allie: a database and a search service of abbreviations and long forms.

    PubMed

    Yamamoto, Yasunori; Yamaguchi, Atsuko; Bono, Hidemasa; Takagi, Toshihisa

    2011-01-01

    Many abbreviations are used in the literature especially in the life sciences, and polysemous abbreviations appear frequently, making it difficult to read and understand scientific papers that are outside of a reader's expertise. Thus, we have developed Allie, a database and a search service of abbreviations and their long forms (a.k.a. full forms or definitions). Allie searches for abbreviations and their corresponding long forms in a database that we have generated based on all titles and abstracts in MEDLINE. When a user query matches an abbreviation, Allie returns all potential long forms of the query along with their bibliographic data (i.e. title and publication year). In addition, for each candidate, co-occurring abbreviations and a research field in which it frequently appears in the MEDLINE data are displayed. This function helps users learn about the context in which an abbreviation appears. To deal with synonymous long forms, we use a dictionary called GENA that contains domain-specific terms such as gene, protein or disease names along with their synonymic information. Conceptually identical domain-specific terms are regarded as one term, and then conceptually identical abbreviation-long form pairs are grouped taking into account their appearance in MEDLINE. To keep up with new abbreviations that are continuously introduced, Allie has an automatic update system. In addition, the database of abbreviations and their long forms with their corresponding PubMed IDs is constructed and updated weekly. Database URL: The Allie service is available at http://allie.dbcls.jp/. PMID:21498548

  1. Allie: a database and a search service of abbreviations and long forms

    PubMed Central

    Yamamoto, Yasunori; Yamaguchi, Atsuko; Bono, Hidemasa; Takagi, Toshihisa

    2011-01-01

    Many abbreviations are used in the literature especially in the life sciences, and polysemous abbreviations appear frequently, making it difficult to read and understand scientific papers that are outside of a reader’s expertise. Thus, we have developed Allie, a database and a search service of abbreviations and their long forms (a.k.a. full forms or definitions). Allie searches for abbreviations and their corresponding long forms in a database that we have generated based on all titles and abstracts in MEDLINE. When a user query matches an abbreviation, Allie returns all potential long forms of the query along with their bibliographic data (i.e. title and publication year). In addition, for each candidate, co-occurring abbreviations and a research field in which it frequently appears in the MEDLINE data are displayed. This function helps users learn about the context in which an abbreviation appears. To deal with synonymous long forms, we use a dictionary called GENA that contains domain-specific terms such as gene, protein or disease names along with their synonymic information. Conceptually identical domain-specific terms are regarded as one term, and then conceptually identical abbreviation-long form pairs are grouped taking into account their appearance in MEDLINE. To keep up with new abbreviations that are continuously introduced, Allie has an automatic update system. In addition, the database of abbreviations and their long forms with their corresponding PubMed IDs is constructed and updated weekly. Database URL: The Allie service is available at http://allie.dbcls.jp/. PMID:21498548

  2. Protein structure determination by exhaustive search of Protein Data Bank derived databases

    PubMed Central

    Stokes-Rees, Ian; Sliz, Piotr

    2010-01-01

    Parallel sequence and structure alignment tools have become ubiquitous and invaluable at all levels in the study of biological systems. We demonstrate the application and utility of this same parallel search paradigm to the process of protein structure determination, benefitting from the large and growing corpus of known structures. Such searches were previously computationally intractable. Through the method of Wide Search Molecular Replacement, developed here, they can be completed in a few hours with the aid of national-scale federated cyberinfrastructure. By dramatically expanding the range of models considered for structure determination, we show that small (less than 12% structural coverage) and low sequence identity (less than 20% identity) template structures can be identified through multidimensional template scoring metrics and used for structure determination. Many new macromolecular complexes can benefit significantly from such a technique due to the lack of known homologous protein folds or sequences. We demonstrate the effectiveness of the method by determining the structure of a full-length p97 homologue from Trichoplusia ni. Example cases with the MHC/T-cell receptor complex and the EmoB protein provide systematic estimates of minimum sequence identity, structure coverage, and structural similarity required for this method to succeed. We describe how this structure-search approach and other novel computationally intensive workflows are made tractable through integration with the US national computational cyberinfrastructure, allowing, for example, rapid processing of the entire Structural Classification of Proteins protein fragment database. PMID:21098306

  3. Protein structure determination by exhaustive search of Protein Data Bank derived databases.

    PubMed

    Stokes-Rees, Ian; Sliz, Piotr

    2010-12-14

    Parallel sequence and structure alignment tools have become ubiquitous and invaluable at all levels in the study of biological systems. We demonstrate the application and utility of this same parallel search paradigm to the process of protein structure determination, benefitting from the large and growing corpus of known structures. Such searches were previously computationally intractable. Through the method of Wide Search Molecular Replacement, developed here, they can be completed in a few hours with the aid of national-scale federated cyberinfrastructure. By dramatically expanding the range of models considered for structure determination, we show that small (less than 12% structural coverage) and low sequence identity (less than 20% identity) template structures can be identified through multidimensional template scoring metrics and used for structure determination. Many new macromolecular complexes can benefit significantly from such a technique due to the lack of known homologous protein folds or sequences. We demonstrate the effectiveness of the method by determining the structure of a full-length p97 homologue from Trichoplusia ni. Example cases with the MHC/T-cell receptor complex and the EmoB protein provide systematic estimates of minimum sequence identity, structure coverage, and structural similarity required for this method to succeed. We describe how this structure-search approach and other novel computationally intensive workflows are made tractable through integration with the US national computational cyberinfrastructure, allowing, for example, rapid processing of the entire Structural Classification of Proteins protein fragment database. PMID:21098306

  4. The Relationship between Searches Performed in Online Databases and the Number of Full-Text Articles Accessed: Measuring the Interaction between Database and E-Journal Collections

    ERIC Educational Resources Information Center

    Lamothe, Alain R.

    2011-01-01

    The purpose of this paper is to report the results of a quantitative analysis exploring the interaction and relationship between the online database and electronic journal collections at the J. N. Desmarais Library of Laurentian University. A very strong relationship exists between the number of searches and the size of the online database…

  5. Addressing statistical biases in nucleotide-derived protein databases for proteogenomic search strategies.

    PubMed

    Blakeley, Paul; Overton, Ian M; Hubbard, Simon J

    2012-11-01

    Proteogenomics has the potential to advance genome annotation through high quality peptide identifications derived from mass spectrometry experiments, which demonstrate a given gene or isoform is expressed and translated at the protein level. This can advance our understanding of genome function, discovering novel genes and gene structure that have not yet been identified or validated. Because of the high-throughput shotgun nature of most proteomics experiments, it is essential to carefully control for false positives and prevent any potential misannotation. A number of statistical procedures to deal with this are in wide use in proteomics, calculating false discovery rate (FDR) and posterior error probability (PEP) values for groups and individual peptide spectrum matches (PSMs). These methods control for multiple testing and exploit decoy databases to estimate statistical significance. Here, we show that database choice has a major effect on these confidence estimates leading to significant differences in the number of PSMs reported. We note that standard target:decoy approaches using six-frame translations of nucleotide sequences, such as assembled transcriptome data, apparently underestimate the confidence assigned to the PSMs. The source of this error stems from the inflated and unusual nature of the six-frame database, where for every target sequence there exists five "incorrect" targets that are unlikely to code for protein. The attendant FDR and PEP estimates lead to fewer accepted PSMs at fixed thresholds, and we show that this effect is a product of the database and statistical modeling and not the search engine. A variety of approaches to limit database size and remove noncoding target sequences are examined and discussed in terms of the altered statistical estimates generated and PSMs reported. These results are of importance to groups carrying out proteogenomics, aiming to maximize the validation and discovery of gene structure in sequenced genomes
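
    To make the statistical inflation concrete: a six-frame translation emits one biologically plausible reading frame and five implausible ones per transcript, and all six enter the target database. Below is a minimal Python sketch of six-frame translation over an invented toy transcript; it illustrates the effect and is not the authors' pipeline.

      # Standard genetic code, bases enumerated in TCAG order.
      BASES = "TCAG"
      AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
      CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))

      def revcomp(seq):
          return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

      def translate(seq):
          return "".join(CODE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))

      def six_frame(seq):
          """All six conceptual translations of a nucleotide sequence."""
          return [translate(strand[offset:])
                  for strand in (seq, revcomp(seq)) for offset in (0, 1, 2)]

      transcript = "ATGGCTAGCTGGAAACGTTAA"  # toy transcript with one real ORF
      for i, frame in enumerate(six_frame(transcript), 1):
          print(f"frame {i}: {frame}")
      # Only one frame encodes the real protein; the other five still sit in
      # the target database and distort target:decoy confidence estimates.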

  6. Current Comparative Table (CCT) automates customized searches of dynamic biological databases.

    PubMed

    Landsteiner, Benjamin R; Olson, Michael R; Rutherford, Robert

    2005-07-01

    The Current Comparative Table (CCT) software program enables working biologists to automate customized bioinformatics searches, typically of remote sequence or HMM (hidden Markov model) databases. CCT currently supports BLAST, hmmpfam and other programs useful for gene and ortholog identification. The software is web based, has a BioPerl core and can be used remotely via a browser or locally on Mac OS X or Linux machines. CCT is particularly useful to scientists who study large sets of molecules in today's evolving information landscape because it color-codes all result files by age and highlights even tiny changes in sequence or annotation. By empowering non-bioinformaticians to automate custom searches and examine current results in context at a glance, CCT allows a remote database submission in the evening to influence the next morning's bench experiment. A demonstration of CCT is available at http://orb.public.stolaf.edu/CCTdemo and the open source software is freely available from http://sourceforge.net/projects/orb-cct. PMID:15980582

  7. Analysis of the tryptic search space in UniProt databases

    PubMed Central

    Alpi, Emanuele; Griss, Johannes; da Silva, Alan Wilter Sousa; Bely, Benoit; Antunes, Ricardo; Zellner, Hermann; Ríos, Daniel; O'Donovan, Claire; Vizcaíno, Juan Antonio; Martin, Maria J

    2015-01-01

    In this article, we provide a comprehensive study of the content of the Universal Protein Resource (UniProt) protein data sets for human and mouse. The tryptic search spaces of the UniProtKB (UniProt knowledgebase) complete proteome sets were compared with other data sets from UniProtKB and with the corresponding International Protein Index, reference sequence, Ensembl, and UniRef100 (where UniRef is UniProt reference clusters) organism-specific data sets. All protein forms annotated in UniProtKB (both the canonical sequences and isoforms) were evaluated in this study. In addition, natural and disease-associated amino acid variants annotated in UniProtKB were included in the evaluation. The peptide unicity was also evaluated for each data set. Furthermore, the peptide information in the UniProtKB data sets was also compared against the available peptide-level identifications in the main MS-based proteomics repositories. Identifying the peptides observed in these repositories is an important resource of information for protein databases as they provide supporting evidence for the existence of otherwise predicted proteins. Likewise, the repositories could use the information available in UniProtKB to direct reprocessing efforts on specific sets of peptides/proteins of interest. In summary, we provide comprehensive information about the different organism-specific sequence data sets available from UniProt, together with the pros and cons for each, in terms of search space for MS-based bottom-up proteomics workflows. The aim of the analysis is to provide a clear view of the tryptic search space of UniProt and other protein databases to enable scientists to select those most appropriate for their purposes. PMID:25307260
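
    The tryptic search space discussed above can be enumerated with a simple in-silico digest under the standard rule (cleave after K or R, except before P). The Python sketch below uses invented sequences and illustrative length and missed-cleavage settings, not the parameters of the study.

      import re

      def tryptic_peptides(protein, min_len=7, max_len=40, missed=1):
          """In-silico tryptic digest: cleave after K or R unless followed by P."""
          pieces = re.split(r"(?<=[KR])(?!P)", protein)
          peptides = set()
          for i in range(len(pieces)):
              for m in range(missed + 1):
                  pep = "".join(pieces[i:i + m + 1])
                  if min_len <= len(pep) <= max_len:
                      peptides.add(pep)
          return peptides

      # Adding an isoform enlarges the tryptic search space only by the
      # peptides that are genuinely new; shared peptides are counted once.
      canonical = tryptic_peptides("MKWVTFISLLLLFSSAYSRGVFRRDTHKSEIAHR")
      isoform = tryptic_peptides("MKWVTFISLLLLFSSAYSRGVFRRDAHKSEIAHR")
      print(len(canonical), len(canonical | isoform), len(isoform - canonical))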

  8. Seismic Search Engine: A distributed database for mining large scale seismic data

    NASA Astrophysics Data System (ADS)

    Liu, Y.; Vaidya, S.; Kuzma, H. A.

    2009-12-01

    The International Monitoring System (IMS) of the CTBTO collects terabytes worth of seismic measurements from many receiver stations situated around the earth with the goal of detecting underground nuclear testing events and distinguishing them from other benign, but more common events such as earthquakes and mine blasts. The International Data Center (IDC) processes and analyzes these measurements, as they are collected by the IMS, to summarize event detections in daily bulletins. Thereafter, the data measurements are archived into a large format database. Our proposed Seismic Search Engine (SSE) will facilitate a framework for data exploration of the seismic database as well as the development of seismic data mining algorithms. Analogous to GenBank, the annotated genetic sequence database maintained by NIH, through SSE, we intend to provide public access to seismic data and a set of processing and analysis tools, along with community-generated annotations and statistical models to help interpret the data. SSE will implement queries as user-defined functions composed from standard tools and models. Each query is compiled and executed over the database internally before reporting results back to the user. Since queries are expressed with standard tools and models, users can easily reproduce published results within this framework for peer-review and making metric comparisons. As an illustration, an example query is “what are the best receiver stations in East Asia for detecting events in the Middle East?” Evaluating this query involves listing all receiver stations in East Asia, characterizing known seismic events in that region, and constructing a profile for each receiver station to determine how effective its measurements are at predicting each event. The results of this query can be used to help prioritize how data is collected, identify defective instruments, and guide future sensor placements.

  9. SCANPS: a web server for iterative protein sequence database searching by dynamic programing, with display in a hierarchical SCOP browser.

    PubMed

    Walsh, Thomas P; Webber, Caleb; Searle, Stephen; Sturrock, Shane S; Barton, Geoffrey J

    2008-07-01

    SCANPS performs iterative profile searching similar to PSI-BLAST but with full dynamic programing on each cycle and on-the-fly estimation of significance. This combination gives good sensitivity and selectivity that outperforms PSI-BLAST in domain-searching benchmarks. Although computationally expensive, SCANPS exploits on-chip parallelism (MMX and SSE2 instructions on Intel chips) as well as MPI parallelism to give acceptable turnround times even for large databases. A web server developed to run SCANPS searches is now available at http://www.compbio.dundee.ac.uk/www-scanps. The server interface allows a range of different protein sequence databases to be searched including the SCOP database of protein domains. The server provides the user with regularly updated versions of the main protein sequence databases and is backed up by significant computing resources which ensure that searches are performed rapidly. For SCOP searches, the results may be viewed in a new tree-based representation that reflects the structure of the SCOP hierarchy; this aids the user in placing each hit in the context of its SCOP classification and understanding its relationship to other domains in SCOP. PMID:18503088

  10. Proteomics of Soil and Sediment: Protein Identification by De Novo Sequencing of Mass Spectra Complements Traditional Database Searching

    NASA Astrophysics Data System (ADS)

    Miller, S.; Rizzo, A. I.; Waldbauer, J.

    2014-12-01

    Proteomics has the potential to elucidate the metabolic pathways and taxa responsible for in situ biogeochemical transformations. However, low rates of protein identification from high resolution mass spectra have been a barrier to the development of proteomics in complex environmental samples. Much of the difficulty lies in the computational challenge of linking mass spectra to their corresponding proteins. Traditional database search methods for matching peptide sequences to mass spectra are often inadequate due to the complexity of environmental proteomes and the large database search space, as we demonstrate with soil and sediment proteomes generated via a range of extraction methods. One alternative to traditional database searching is de novo sequencing, which identifies peptide sequences without the need for a database. BLAST can then be used to match de novo sequences to similar genetic sequences. Assigning confidence to putative identifications has been one hurdle for the implementation of de novo sequencing. We found that accurate de novo sequences can be screened by quality score and length. Screening criteria are verified by comparing the results of de novo sequencing and traditional database searching for well-characterized proteomes from simple biological systems. The BLAST hits of screened sequences are interrogated for taxonomic and functional information. We applied de novo sequencing to organic topsoil and marine sediment proteomes. Peak-rich proteomes, which can result from various extraction techniques, yield thousands of high-confidence protein identifications, an improvement over previous proteomic studies of soil and sediment. User-friendly software tools for de novo metaproteomics analysis have been developed. This "De Novo Analysis" Pipeline is also a faster method of data analysis than constructing a tailored sequence database for traditional database searching.

  11. Proteomics of Soil and Sediment: Protein Identification by De Novo Sequencing of Mass Spectra Complements Traditional Database Searching

    NASA Astrophysics Data System (ADS)

    Miller, S.; Rizzo, A. I.; Waldbauer, J.

    2015-12-01

    Proteomics has the potential to elucidate the metabolic pathways and taxa responsible for in situ biogeochemical transformations. However, low rates of protein identification from high resolution mass spectra have been a barrier to the development of proteomics in complex environmental samples. Much of the difficulty lies in the computational challenge of linking mass spectra to their corresponding proteins. Traditional database search methods for matching peptide sequences to mass spectra are often inadequate due to the complexity of environmental proteomes and the large database search space, as we demonstrate with soil and sediment proteomes generated via a range of extraction methods. One alternative to traditional database searching is de novo sequencing, which identifies peptide sequences without the need for a database. BLAST can then be used to match de novo sequences to similar genetic sequences. Assigning confidence to putative identifications has been one hurdle for the implementation of de novo sequencing. We found that accurate de novo sequences can be screened by quality score and length. Screening criteria are verified by comparing the results of de novo sequencing and traditional database searching for well-characterized proteomes from simple biological systems. The BLAST hits of screened sequences are interrogated for taxonomic and functional information. We applied de novo sequencing to organic topsoil and marine sediment proteomes. Peak-rich proteomes, which can result from various extraction techniques, yield thousands of high-confidence protein identifications, an improvement over previous proteomic studies of soil and sediment. User-friendly software tools for de novo metaproteomics analysis have been developed. This "De Novo Analysis" Pipeline is also a faster method of data analysis than constructing a tailored sequence database for traditional database searching.
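
    The screening step described in the two records above, filtering de novo peptide calls by quality score and length before BLAST, reduces to a few lines of code. In the Python sketch below the score field (a 0-100 scale of the kind common de novo tools report) and both cutoffs are illustrative assumptions, not the study's values.

      # Keep de novo peptides only if they clear a score and length threshold.
      denovo_psms = [
          {"peptide": "LSSPATLNSR", "score": 96.1},
          {"peptide": "GYSLGNWVCAAK", "score": 88.4},
          {"peptide": "VGK", "score": 99.0},          # too short to be informative
          {"peptide": "TLDAGQWNNER", "score": 41.7},  # low confidence
      ]

      MIN_SCORE, MIN_LENGTH = 80.0, 7  # illustrative cutoffs

      screened = [p for p in denovo_psms
                  if p["score"] >= MIN_SCORE and len(p["peptide"]) >= MIN_LENGTH]
      for p in screened:
          print(p["peptide"])  # candidates to submit to BLAST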

  12. Decision-making in familial database searching: KI alone or not alone?

    PubMed

    Balding, David J; Krawczak, Michael; Buckleton, John S; Curran, James M

    2013-01-01

    We consider the comparison of hypotheses "parent-child" or "full siblings" against the alternative of "unrelated" for pairs of individuals for whom DNA profiles are available. This is a situation that occurs repeatedly in familial database searching. A decision rule that uses both the kinship index (KI), also known as the likelihood ratio, and the identity-by-state statistic (IBS) was advocated in a recent report as superior to the use of KI alone. Such proposal appears to conflict with the Neyman-Pearson Lemma of statistics, which states that the likelihood ratio alone provides the most powerful criterion for distinguishing between any two simple hypotheses. We therefore performed a simulation study that was two orders of magnitude larger than in the previous report, and our results corroborate the theoretical expectation that KI alone provides a better decision rule than KI combined with IBS. PMID:22749791
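
    The kinship index at the heart of this debate is an ordinary likelihood ratio, multiplied across loci. The Python sketch below computes a single-locus KI for "parent-child" versus "unrelated" from invented allele frequencies; by the Neyman-Pearson lemma, thresholding this quantity alone is the most powerful test, which is the conclusion the simulation supports.

      def kinship_index(parent, child, freq):
          """Single-locus KI (likelihood ratio) for parent-child vs unrelated.

          parent, child: tuples of two allele labels; freq: population
          allele frequencies. The multi-locus KI is the product over loci.
          """
          def transmit(allele):  # P(tested parent passes this allele)
              return parent.count(allele) / 2.0

          c1, c2 = child
          if c1 == c2:  # homozygous child
              numer = transmit(c1) * freq[c1]
              denom = freq[c1] ** 2
          else:         # heterozygous child
              numer = transmit(c1) * freq[c2] + transmit(c2) * freq[c1]
              denom = 2 * freq[c1] * freq[c2]
          return numer / denom

      freq = {"A": 0.1, "B": 0.3, "C": 0.6}
      print(kinship_index(("A", "B"), ("A", "C"), freq))  # 2.5 at this locus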

  13. Neuron-Miner: An Advanced Tool for Morphological Search and Retrieval in Neuroscientific Image Databases.

    PubMed

    Conjeti, Sailesh; Mesbah, Sepideh; Negahdar, Mohammadreza; Rautenberg, Philipp L; Zhang, Shaoting; Navab, Nassir; Katouzian, Amin

    2016-10-01

    The steadily growing amount of digital neuroscientific data demands a reliable, systematic, and computationally effective retrieval algorithm. In this paper, we present Neuron-Miner, a tool for fast and accurate reference-based retrieval within neuron image databases. The proposed algorithm is built on a hashing (search and retrieval) technique employing multiple unsupervised random trees, collectively called Hashing Forests (HF). The HF are trained to parse the neuromorphological space hierarchically and to preserve the inherent neuron neighborhoods while encoding them as compact binary codewords. We further introduce an inverse-coding formulation within HF to avoid exhaustive pairwise neuron similarity comparisons, thus allowing scalability to massive databases with little additional time overhead. The proposed hashing tool approximates the true neuromorphological neighborhood more closely, with better retrieval and ranking performance, than existing generalized hashing methods. This is exhaustively validated over 31,266 neuron reconstructions from the NeuroMorpho.org dataset, curated from 147 different archives. We envisage that finding and ranking similar neurons through reference-based querying via Neuron-Miner will assist neuroscientists in objectively understanding the relationship between neuronal structure and function, with applications in comparative anatomy and diagnosis. PMID:27155864
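
    The retrieval primitive such hashing schemes rely on is ranking by Hamming distance between compact binary codewords. A minimal Python sketch with invented codes; the contribution of the hashing forests is to learn an encoding in which this cheap proxy tracks true neuromorphological similarity.

      db_codes = {
          "neuron_001": 0b10110010,
          "neuron_002": 0b10110011,
          "neuron_003": 0b01001100,
      }

      def hamming(a, b):
          """Number of differing bits between two codewords."""
          return bin(a ^ b).count("1")

      query = 0b10110110
      ranked = sorted(db_codes, key=lambda n: hamming(db_codes[n], query))
      print([(n, hamming(db_codes[n], query)) for n in ranked])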

  14. [Evidence-based clinical practice. Part II--Searching evidence databases].

    PubMed

    Bernardo, Wanderley Marques; Nobre, Moacyr Roberto Cuce; Jatene, Fábio Biscegli

    2004-01-01

    Most traditional sources of medical information, such as textbooks and review articles, are inadequate to sustain clinical decisions based on the best evidence currently available, exposing the patient to unnecessary risk. Although not integrated around clinical problem areas in the convenient way of textbooks, the current best evidence from specific studies of clinical problems can be found in a growing number of Internet and electronic databases. Sources that have already undergone rigorous critical appraisal are classified as secondary information sources; those that provide access to original articles or abstracts are primary information sources, for which quality assessment of the article rests with the clinician. The most useful primary information sources are SciELO, the online collection of Brazilian scientific journals, and MEDLINE, the most comprehensive database of the US National Library of Medicine, where a search may start with keywords obtained while constructing the structured question (P.I.C.O.), combined with the Boolean operators "AND", "OR" and "NOT". Among the secondary information sources, some provide critically appraised articles, such as ACP Journal Club, Evidence-Based Medicine and InfoPOEMs; others provide evidence organized as online texts, such as "Clinical Evidence" and "UpToDate"; finally, the Cochrane Library is composed of systematic reviews of randomized controlled trials. Retrieving studies that can answer the clinical question is part of a mindful practice that is becoming ever faster and more dynamic with the use of PDAs, palmtops and notebooks. PMID:15253037

  15. GPU-Acceleration of Sequence Homology Searches with Database Subsequence Clustering.

    PubMed

    Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka

    2016-01-01

    Sequence homology searches are used in various fields and require large amounts of computation time, especially for metagenomic analysis, owing to the large number of queries and the database size. To accelerate computing analyses, graphics processing units (GPUs) are widely used as a low-cost, high-performance computing platform. Therefore, we mapped the time-consuming steps involved in GHOSTZ, which is a state-of-the-art homology search algorithm for protein sequences, onto a GPU and implemented it as GHOSTZ-GPU. In addition, we optimized memory access for GPU calculations and for communication between the CPU and GPU. As per results of the evaluation test involving metagenomic data, GHOSTZ-GPU with 12 CPU threads and 1 GPU was approximately 3.0- to 4.1-fold faster than GHOSTZ with 12 CPU threads. Moreover, GHOSTZ-GPU with 12 CPU threads and 3 GPUs was approximately 5.8- to 7.7-fold faster than GHOSTZ with 12 CPU threads. PMID:27482905

  16. GPU-Acceleration of Sequence Homology Searches with Database Subsequence Clustering

    PubMed Central

    Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka

    2016-01-01

    Sequence homology searches are used in various fields and require large amounts of computation time, especially for metagenomic analysis, owing to the large number of queries and the database size. To accelerate computing analyses, graphics processing units (GPUs) are widely used as a low-cost, high-performance computing platform. Therefore, we mapped the time-consuming steps involved in GHOSTZ, which is a state-of-the-art homology search algorithm for protein sequences, onto a GPU and implemented it as GHOSTZ-GPU. In addition, we optimized memory access for GPU calculations and for communication between the CPU and GPU. As per results of the evaluation test involving metagenomic data, GHOSTZ-GPU with 12 CPU threads and 1 GPU was approximately 3.0- to 4.1-fold faster than GHOSTZ with 12 CPU threads. Moreover, GHOSTZ-GPU with 12 CPU threads and 3 GPUs was approximately 5.8- to 7.7-fold faster than GHOSTZ with 12 CPU threads. PMID:27482905

  17. Complementary Value of Databases for Discovery of Scholarly Literature: A User Survey of Online Searching for Publications in Art History

    ERIC Educational Resources Information Center

    Nemeth, Erik

    2010-01-01

    Discovery of academic literature through Web search engines challenges the traditional role of specialized research databases. Creation of literature outside academic presses and peer-reviewed publications expands the content for scholarly research within a particular field. The resulting body of literature raises the question of whether scholars…

  18. Introducing a New Interface for the Online MagIC Database by Integrating Data Uploading, Searching, and Visualization

    NASA Astrophysics Data System (ADS)

    Jarboe, N.; Minnett, R.; Constable, C.; Koppers, A. A.; Tauxe, L.

    2013-12-01

    The Magnetics Information Consortium (MagIC) is dedicated to supporting the paleomagnetic, geomagnetic, and rock magnetic communities through the development and maintenance of an online database (http://earthref.org/MAGIC/), data upload and quality control, searches, data downloads, and visualization tools. While MagIC has completed importing some of the IAGA paleomagnetic databases (TRANS, PINT, PSVRL, GPMDB) and continues to import others (ARCHEO, MAGST and SECVR), individual data uploads from the community contribute a further wealth of easily accessible, rich datasets. Previously, uploading data to the MagIC database required an Excel spreadsheet on either a Mac or PC. The new upload method uses an HTML5 web interface whose only computer requirement is a modern browser. This web interface highlights all errors discovered in the dataset at once, replacing the iterative error-checking process of the previous Excel spreadsheet data checker. Because it is a web service, the community will always have easy access to the most up-to-date and bug-free version of the data upload software. The filtering search mechanism of the MagIC database has been changed to a more intuitive system in which the data from each contribution are displayed in tables resembling the format in which the data were uploaded (http://earthref.org/MAGIC/search/). Searches themselves can be saved as a permanent URL, if desired. The saved search URL can then be used as a citation in a publication. When appropriate, plots (equal area, Zijderveld, ARAI, demagnetization, etc.) are associated with the data to give the user a quicker understanding of the underlying dataset. The MagIC database will continue to evolve to meet the needs of the paleomagnetic, geomagnetic, and rock magnetic communities.

  19. Trichinella spiralis: genome database searches for the presence and immunolocalization of protein disulphide isomerase family members.

    PubMed

    Freitas, C P; Clemente, I; Mendes, T; Novo, C

    2016-01-01

    The formation of nurse cells in host muscle cells during Trichinella spiralis infection is a key step in the infective mechanism. Collagen trimerization is set up via disulphide bond formation, catalysed by protein disulphide isomerase (PDI). In T. spiralis, some PDI family members have been identified but no localization is described and no antibodies specific for T. spiralis PDIs are available. In this work, computational approaches were used to search for non-described PDIs in the T. spiralis genome database and to check the cross-reactivity of commercial anti-human antibodies with T. spiralis orthologues. In addition to a previously described PDI (PDIA2), endoplasmic reticulum protein (ERp57/PDIA3), ERp72/PDIA4, and the molecular chaperones calreticulin (CRT), calnexin (CNX) and immunoglobulin-binding protein/glucose-regulated protein (BIP/GRP78), we identified orthologues of the human thioredoxin-related-transmembrane proteins (TMX1, TMX2 and TMX3) in the genome protein database, as well as ERp44 (PDIA10) and endoplasmic reticulum disulphide reductase (ERdj5/PDIA19). Immunocytochemical staining of paraffin sections of muscle infected by T. spiralis enabled us to localize some orthologues of the human PDIs (PDIA3 and TMX1) and the chaperone GRP78. A theoretical three-dimensional model for T. spiralis PDIA3 was constructed. The localization and characteristics of the predicted linear B-cell epitopes and amino acid sequence of the immunogens used for commercial production of anti-human PDIA3 antibodies validated the use of these antibodies for the immunolocalization of T. spiralis PDIA3 orthologues. These results suggest that further study of the role of the PDIs and chaperones during nurse cell formation is desirable. PMID:25475092

  20. Elimination of Duplicate Citations from Cross Database Searching Using an "Intelligent" Terminal to Produce Report Style Searches.

    ERIC Educational Resources Information Center

    Riley, Connie; And Others

    1981-01-01

    The Tarrytown Technical Information Center at General Foods produces report style searches by using upgraded equipment which allows search strategies to be stored and edited offline, thus reducing costs for both online searching and for the searcher's time. A computer program for eliminating duplicate bibliographic citations is included. (RBF)
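
    The core of such duplicate elimination is keying every citation on a normalized form of its fields. The Python sketch below is a modern analogue (the 1981 program ran on an "intelligent" terminal, not Python); the record fields and entries are invented.

      import re

      def norm_key(record):
          """Lowercased title stripped of punctuation and spacing, plus year,
          so trivial formatting differences between databases collapse."""
          title = re.sub(r"[^a-z0-9]", "", record["title"].lower())
          return (title, record.get("year"))

      def dedupe(citations):
          seen, unique = set(), []
          for rec in citations:
              key = norm_key(rec)
              if key not in seen:
                  seen.add(key)
                  unique.append(rec)
          return unique

      hits = [
          {"title": "Probabilistic database search.", "year": 1981, "db": "ERIC"},
          {"title": "Probabilistic Database Search", "year": 1981, "db": "NTIS"},
          {"title": "Boolean logic primer", "year": 1980, "db": "ERIC"},
      ]
      print(len(dedupe(hits)))  # 2 -- the cross-database duplicate is dropped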

  1. Oracle Database 10g: a platform for BLAST search and Regular Expression pattern matching in life sciences.

    PubMed

    Stephens, Susie M; Chen, Jake Y; Davidson, Marcel G; Thomas, Shiby; Trute, Barry M

    2005-01-01

    As database management systems expand their array of analytical functionality, they become powerful research engines for biomedical data analysis and drug discovery. Databases can hold most of the data types commonly required in life sciences and consequently can be used as flexible platforms for the implementation of knowledgebases. Performing data analysis in the database simplifies data management by minimizing the movement of data from disks to memory, allowing pre-filtering and post-processing of datasets, and enabling data to remain in a secure, highly available environment. This article describes the Oracle Database 10g implementation of BLAST and Regular Expression Searches and provides case studies of their usage in bioinformatics. http://www.oracle.com/technology/software/index.html. PMID:15608287
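
    The regular-expression half of that functionality is easy to picture. The Python sketch below runs a PROSITE-style motif search of the kind such a platform executes inside the database engine; the sequences are invented and the motif is the classic Walker A (P-loop) pattern written as a regex.

      import re

      # PROSITE pattern [AG]-x(4)-G-K-[ST] rendered as a regular expression.
      PLOOP = re.compile(r"[AG].{4}GK[ST]")

      proteins = {
          "kras_like": "MTEYKLVVVGAGGVGKSALTIQ",
          "random_seq": "MDELSPQRWNAV",
      }
      for name, seq in proteins.items():
          hit = PLOOP.search(seq)
          print(name, hit.group() if hit else "no match")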

  2. Searching biosignal databases by content and context: Research Oriented Integration System for ECG Signals (ROISES).

    PubMed

    Kokkinaki, Alexandra; Chouvarda, Ioanna; Maglaveras, Nicos

    2012-11-01

    Technological advances in the textile, biosensor and electrocardiography domains have induced the widespread use of bio-signal acquisition devices, leading to the generation of massive bio-signal datasets. Among the most popular bio-signals, the electrocardiogram (ECG) has the longest tradition in bio-signal monitoring and recording, being a strong and relatively robust signal. As research resources grow, the research community is promoting the need to extract new knowledge from bio-signals towards the adoption of new medical procedures. However, integrated access, querying and management of ECGs are impeded by the diversity and heterogeneity of bio-signal storage formats. In this scope, the proposed work introduces a new methodology for unified access to bio-signal databases and the accompanying metadata. It decouples information retrieval from the actual underlying data-source structures and enables transparent content- and context-based searching across multiple data resources. Our approach is based on the definition of an interactive global ontology which manipulates the similarities and differences of the underlying sources either to establish similarity mappings or to enrich its terminological structure. We also introduce ROISES (Research Oriented Integration System for ECG Signals) for the definition of complex content-based queries against the diverse bio-signal data sources. PMID:21397354

  3. Preparing College Students To Search Full-Text Databases: Is Instruction Necessary?

    ERIC Educational Resources Information Center

    Riley, Cheryl; Wales, Barbara

    Full-text databases allow Central Missouri State University's clients to access some of the serials that libraries have had to cancel due to escalating subscription costs; EbscoHost, the subject of this study, is one such database. The database is available free to all Missouri residents. A survey was designed consisting of 21 questions intended…

  4. UniCon3D: de novo protein structure prediction using united-residue conformational search via stepwise, probabilistic sampling

    PubMed Central

    Bhattacharya, Debswapna; Cao, Renzhi; Cheng, Jianlin

    2016-01-01

    Motivation: Recent experimental studies have suggested that proteins fold via stepwise assembly of structural units named ‘foldons’ through the process of sequential stabilization. In parallel, recent computational developments based on probabilistic modeling have shown a promising direction for performing de novo protein conformational sampling from continuous space. However, existing computational approaches for de novo protein structure prediction often sample protein conformational space randomly, as opposed to the experimentally suggested stepwise sampling. Results: Here, we develop a novel generative, probabilistic model that simultaneously captures local structural preferences of the backbone and side-chain conformational space of polypeptide chains in a united-residue representation and performs experimentally motivated conditional conformational sampling via stepwise synthesis and assembly of foldon units, minimizing a composite physics- and knowledge-based energy function for de novo protein structure prediction. The proposed method, UniCon3D, has been found to (i) sample lower-energy conformations with higher accuracy than traditional random sampling in a small benchmark of six proteins; (ii) perform comparably with the top five automated methods on 30 difficult target domains from the 11th Critical Assessment of Protein Structure Prediction (CASP) experiment and on 15 difficult target domains from the 10th CASP experiment; and (iii) outperform two state-of-the-art approaches and a baseline counterpart of UniCon3D that performs traditional random sampling, for protein modeling aided by predicted residue-residue contacts on 45 targets from the 10th edition of CASP. Availability and Implementation: Source code, executable versions, manuals and example data of UniCon3D for Linux and OSX are freely available to non-commercial users at http://sysbio.rnet.missouri.edu/UniCon3D/. Contact: chengji@missouri.edu Supplementary information: Supplementary data are available at Bioinformatics online.

  5. Novel DOCK clique driven 3D similarity database search tools for molecule shape matching and beyond: adding flexibility to the search for ligand kin.

    PubMed

    Good, Andrew C

    2007-10-01

    With readily available CPU power and copious disk storage, it is now possible to undertake rapid comparison of 3D properties derived from explicit ligand overlay experiments. With this in mind, shape software tools originally devised in the 1990s are revisited, modified and applied to the problem of ligand database shape comparison. The utility of Connolly surface data is highlighted using the program MAKESITE, which leverages surface normal data to a create ligand shape cast. This cast is applied directly within DOCK, allowing the program to be used unmodified as a shape searching tool. In addition, DOCK has undergone multiple modifications to create a dedicated ligand shape comparison tool KIN. Scoring has been altered to incorporate the original incarnation of Gaussian function derived shape description based on STO-3G atomic electron density. In addition, a tabu-like search refinement has been added to increase search speed by removing redundant starting orientations produced during clique matching. The ability to use exclusion regions, again based on Gaussian shape overlap, has also been integrated into the scoring function. The use of both DOCK with MAKESITE and KIN in database screening mode is illustrated using a published ligand shape virtual screening template. The advantages of using a clique-driven search paradigm are highlighted, including shape optimization within a pharmacophore constrained framework, and easy incorporation of additional scoring function modifications. The potential for further development of such methods is also discussed. PMID:17482856
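
    The Gaussian shape description mentioned above rests on a closed-form overlap integral between spherical Gaussians centred on atoms. The Python sketch below turns summed pairwise overlaps into a shape Tanimoto score; the width and height constants are generic hard-sphere-mimicking values, not the STO-3G-derived parameters of the paper.

      import math

      def gaussian_overlap(mol_a, mol_b, p=2.7, alpha=0.836):
          """Summed pairwise overlap volume of two atom lists [(x, y, z), ...],
          each atom modeled as the spherical Gaussian p * exp(-alpha * r**2)."""
          v = 0.0
          for ax, ay, az in mol_a:
              for bx, by, bz in mol_b:
                  d2 = (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
                  v += (p * p * (math.pi / (2 * alpha)) ** 1.5
                        * math.exp(-0.5 * alpha * d2))
          return v

      def shape_tanimoto(a, b):
          vab = gaussian_overlap(a, b)
          return vab / (gaussian_overlap(a, a) + gaussian_overlap(b, b) - vab)

      query = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
      hit = [(0.1, 0.0, 0.0), (1.4, 0.1, 0.0)]
      print(round(shape_tanimoto(query, hit), 3))  # close to 1.0: similar shapes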

  6. User support for a library-managed online database search service: the BMA Library free MEDLINE service.

    PubMed

    Rowlands, J; Yeadon, J; Forrester, W; McSeán, T

    1997-07-01

    This paper discusses user support in the context of a library-managed online database search service. Experience is drawn from the British Medical Association (BMA) Library's Free MEDLINE Service. More than 9,600 BMA members, who are largely unfamiliar with computer communications and database searching, have registered as users of the service. User support has played a significant role in the development of the service and has comprised four main aspects: an information pack, a help desk, online help, and MEDLINE courses. The paper includes an analysis of help desk usage statistics collected from January 1996 through June 1996, and highlights other relevant research. Plans for further service enhancements and their implications in terms of future user support are discussed. PMID:9285124

  7. Methods and pitfalls in searching drug safety databases utilising the Medical Dictionary for Regulatory Activities (MedDRA).

    PubMed

    Brown, Elliot G

    2003-01-01

    The Medical Dictionary for Regulatory Activities (MedDRA) is a unified standard terminology for recording and reporting adverse drug event data. Its introduction is widely seen as a significant improvement on the previous situation, where a multitude of terminologies of widely varying scope and quality were in use. However, there are some complexities that may cause difficulties, and these will form the focus for this paper. Two methods of searching MedDRA-coded databases are described: searching based on term selection from all of MedDRA and searching based on terms in the safety database. There are several potential traps for the unwary in safety searches. There may be multiple locations of relevant terms within a system organ class (SOC) and lack of recognition of appropriate group terms; the user may think that group terms are more inclusive than is the case. MedDRA may distribute terms relevant to one medical condition across several primary SOCs. If the database supports the MedDRA model, it is possible to perform multiaxial searching: while this may help find terms that might have been missed, it is still necessary to consider the entire contents of the SOCs to find all relevant terms and there are many instances of incomplete secondary linkages. It is important to adjust for multiaxiality if data are presented using primary and secondary locations. Other sources for errors in searching are non-intuitive placement and the selection of terms as preferred terms (PTs) that may not be widely recognised. Some MedDRA rules could also result in errors in data retrieval if the individual is unaware of these: in particular, the lack of multiaxial linkages for the Investigations SOC, Social circumstances SOC and Surgical and medical procedures SOC and the requirement that a PT may only be present under one High Level Term (HLT) and one High Level Group Term (HLGT) within any single SOC. Special Search Categories (collections of PTs assembled from various SOCs by
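
    The pitfalls described here are easy to reproduce in miniature. The Python sketch below uses an invented PT-to-SOC table (the term and SOC names follow MedDRA conventions, but the links are illustrative) to show how a primary-link-only search undercounts, and how terms in a SOC without multiaxial links are missed entirely.

      PT_LINKS = {
          "Myocardial infarction": {"primary": "Cardiac disorders",
                                    "secondary": []},
          "Coronary artery occlusion": {"primary": "Vascular disorders",
                                        "secondary": ["Cardiac disorders"]},
          # The Investigations SOC carries no multiaxial links at all.
          "Troponin increased": {"primary": "Investigations",
                                 "secondary": []},
      }

      def pts_for_soc(soc, include_secondary=True):
          """Collect PTs reachable from a SOC via primary and, optionally,
          secondary links; primary-only searches silently miss terms."""
          hits = set()
          for pt, links in PT_LINKS.items():
              if links["primary"] == soc:
                  hits.add(pt)
              elif include_secondary and soc in links["secondary"]:
                  hits.add(pt)
          return hits

      print(pts_for_soc("Cardiac disorders", include_secondary=False))
      print(pts_for_soc("Cardiac disorders"))  # one more PT via a secondary link
      # "Troponin increased" is found by neither query, despite its relevance.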

  8. HRGRN: A Graph Search-Empowered Integrative Database of Arabidopsis Signaling Transduction, Metabolism and Gene Regulation Networks.

    PubMed

    Dai, Xinbin; Li, Jun; Liu, Tingsong; Zhao, Patrick Xuechun

    2016-01-01

    The biological networks controlling plant signal transduction, metabolism and gene regulation are composed of not only tens of thousands of genes, compounds, proteins and RNAs but also the complicated interactions and co-ordination among them. These networks play critical roles in many fundamental mechanisms, such as plant growth, development and environmental response. Although much is known about these complex interactions, the knowledge and data are currently scattered throughout the published literature, publicly available high-throughput data sets and third-party databases. Many 'unknown' yet important interactions among genes need to be mined and established through extensive computational analysis. However, exploring these complex biological interactions at the network level from existing heterogeneous resources remains challenging and time-consuming for biologists. Here, we introduce HRGRN, a graph search-empowered integrative database of Arabidopsis signal transduction, metabolism and gene regulatory networks. HRGRN utilizes Neo4j, which is a highly scalable graph database management system, to host large-scale biological interactions among genes, proteins, compounds and small RNAs that were either validated experimentally or predicted computationally. The associated biological pathway information was also specially marked for the interactions that are involved in the pathway to facilitate the investigation of cross-talk between pathways. Furthermore, HRGRN integrates a series of graph path search algorithms to discover novel relationships among genes, compounds, RNAs and even pathways from heterogeneous biological interaction data that could be missed by traditional SQL database search methods. Users can also build subnetworks based on known interactions. The outcomes are visualized with rich text, figures and interactive network graphs on web pages. The HRGRN database is freely available at http://plantgrn.noble.org/hrgrn/. PMID:26657893

  9. HRGRN: A Graph Search-Empowered Integrative Database of Arabidopsis Signaling Transduction, Metabolism and Gene Regulation Networks

    PubMed Central

    Dai, Xinbin; Li, Jun; Liu, Tingsong; Zhao, Patrick Xuechun

    2016-01-01

    The biological networks controlling plant signal transduction, metabolism and gene regulation are composed of not only tens of thousands of genes, compounds, proteins and RNAs but also the complicated interactions and co-ordination among them. These networks play critical roles in many fundamental mechanisms, such as plant growth, development and environmental response. Although much is known about these complex interactions, the knowledge and data are currently scattered throughout the published literature, publicly available high-throughput data sets and third-party databases. Many ‘unknown’ yet important interactions among genes need to be mined and established through extensive computational analysis. However, exploring these complex biological interactions at the network level from existing heterogeneous resources remains challenging and time-consuming for biologists. Here, we introduce HRGRN, a graph search-empowered integrative database of Arabidopsis signal transduction, metabolism and gene regulatory networks. HRGRN utilizes Neo4j, which is a highly scalable graph database management system, to host large-scale biological interactions among genes, proteins, compounds and small RNAs that were either validated experimentally or predicted computationally. The associated biological pathway information was also specially marked for the interactions that are involved in the pathway to facilitate the investigation of cross-talk between pathways. Furthermore, HRGRN integrates a series of graph path search algorithms to discover novel relationships among genes, compounds, RNAs and even pathways from heterogeneous biological interaction data that could be missed by traditional SQL database search methods. Users can also build subnetworks based on known interactions. The outcomes are visualized with rich text, figures and interactive network graphs on web pages. The HRGRN database is freely available at http://plantgrn.noble.org/hrgrn/. PMID:26657893
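
    The advantage claimed over plain SQL is path discovery across heterogeneous node types. The Python sketch below runs a breadth-first search over an invented gene/compound interaction graph (identifiers are illustrative); in HRGRN itself, queries of this shape are served by Neo4j graph algorithms.

      from collections import deque

      # Toy directed interaction graph: node -> downstream partners.
      EDGES = {
          "AT1G01060": ["AT5G61380"],
          "AT5G61380": ["jasmonate", "AT2G46830"],
          "jasmonate": ["AT3G15500"],
          "AT2G46830": [],
          "AT3G15500": [],
      }

      def shortest_path(src, dst):
          """Breadth-first path discovery between two nodes."""
          queue, seen = deque([[src]]), {src}
          while queue:
              path = queue.popleft()
              if path[-1] == dst:
                  return path
              for nxt in EDGES.get(path[-1], []):
                  if nxt not in seen:
                      seen.add(nxt)
                      queue.append(path + [nxt])
          return None

      print(shortest_path("AT1G01060", "AT3G15500"))
      # A gene-to-gene route passing through a compound node -- the kind of
      # mixed-type relationship that is awkward to express in plain SQL.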

  10. Probabilistic Risk Assessment: A Bibliography

    NASA Technical Reports Server (NTRS)

    2000-01-01

    Probabilistic risk analysis is an integration of failure modes and effects analysis (FMEA), fault tree analysis and other techniques to assess the potential for failure and to find ways to reduce risk. This bibliography references 160 documents in the NASA STI Database that contain the major concepts, probabilistic risk assessment, risk and probability theory, in the basic index or major subject terms. An abstract is included with most citations, followed by the applicable subject terms.

  11. Millennial Students' Mental Models of Search: Implications for Academic Librarians and Database Developers

    ERIC Educational Resources Information Center

    Holman, Lucy

    2011-01-01

    Today's students exhibit generational differences in the way they search for information. Observations of first-year students revealed a proclivity for simple keyword or phrases searches with frequent misspellings and incorrect logic. Although no students had strong mental models of search mechanisms, those with stronger models did construct more…

  12. Comparative Recall and Precision of Simple and Expert Searches in Google Scholar and Eight Other Databases

    ERIC Educational Resources Information Center

    Walters, William H.

    2011-01-01

    This study evaluates the effectiveness of simple and expert searches in Google Scholar (GS), EconLit, GEOBASE, PAIS, POPLINE, PubMed, Social Sciences Citation Index, Social Sciences Full Text, and Sociological Abstracts. It assesses the recall and precision of 32 searches in the field of later-life migration: nine simple keyword searches and 23…

  13. In search of a statistical probability model for petroleum-resource assessment : a critique of the probabilistic significance of certain concepts and methods used in petroleum-resource assessment : to that end, a probabilistic model is sketched

    USGS Publications Warehouse

    Grossling, Bernardo F.

    1975-01-01

    Exploratory drilling is still in incipient or youthful stages in those areas of the world where the bulk of the potential petroleum resources is yet to be discovered. Methods of assessing resources from projections based on historical production and reserve data are limited to mature areas. For most of the world's petroleum-prospective areas, a more speculative situation calls for a critical review of resource-assessment methodology. The language of mathematical statistics is required to define more rigorously the appraisal of petroleum resources. Basically, two approaches have been used to appraise the amounts of undiscovered mineral resources in a geologic province: (1) projection models, which use statistical data on the past outcome of exploration and development in the province; and (2) estimation models of the overall resources of the province, which use certain known parameters of the province together with the outcome of exploration and development in analogous provinces. These two approaches often lead to widely different estimates. Some of the controversy that arises results from a confusion of the probabilistic significance of the quantities yielded by each of the two approaches. Also, inherent limitations of analytic projection models, such as those using the logistic and Gompertz functions, have often been ignored. The resource-assessment problem should be recast in terms that provide for consideration of the probability of existence of the resource and of the probability of discovery of a deposit. Then the two above-mentioned models occupy the two ends of the probability range. The new approach accounts for (1) what can be expected with reasonably high certainty by mere projections of what has been accomplished in the past; (2) the inherent biases of decision-makers and resource estimators; (3) upper bounds that can be set up as goals for exploration; and (4) the uncertainties in geologic conditions in a search for minerals. Actual outcomes can then
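
    For reference, the two projection curves the critique names have the standard forms sketched below in Python; the parameter values are illustrative only, chosen to show how fits sharing the same ultimate resource K can still disagree sharply about mid-history discovery rates.

      import math

      def logistic(t, K, r, t0):
          """Logistic cumulative-discovery curve: K / (1 + exp(-r*(t - t0)))."""
          return K / (1.0 + math.exp(-r * (t - t0)))

      def gompertz(t, K, b, c):
          """Gompertz curve: K * exp(-b * exp(-c*t)), asymmetric about its midpoint."""
          return K * math.exp(-b * math.exp(-c * t))

      K = 100.0  # ultimate recoverable resource, arbitrary units
      for t in (10, 30, 50, 70):
          print(t, round(logistic(t, K, 0.12, 40), 1),
                round(gompertz(t, K, 8.0, 0.06), 1))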

  14. CAZymes Analysis Toolkit (CAT): Web service for searching and analyzing carbohydrate-active enzymes in a newly sequenced organism using the CAZy database

    SciTech Connect

    Karpinets, Tatiana V; Park, Byung; Syed, Mustafa H; Uberbacher, Edward C; Leuze, Michael Rex

    2010-01-01

    The Carbohydrate-Active Enzyme (CAZy) database provides a rich set of manually annotated enzymes that degrade, modify, or create glycosidic bonds. Despite rich and invaluable information stored in the database, software tools utilizing this information for annotation of newly sequenced genomes by CAZy families are limited. We have employed two annotation approaches to fill the gap between manually curated high-quality protein sequences collected in the CAZy database and the growing number of other protein sequences produced by genome or metagenome sequencing projects. The first approach is based on a similarity search against the entire non-redundant sequences of the CAZy database. The second approach performs annotation using links or correspondences between the CAZy families and protein family domains. The links were discovered using the association rule learning algorithm applied to sequences from the CAZy database. The approaches complement each other and in combination achieved high specificity and sensitivity when cross-evaluated with the manually curated genomes of Clostridium thermocellum ATCC 27405 and Saccharophagus degradans 2-40. The capability of the proposed framework to predict the function of unknown protein domains (DUF) and of hypothetical proteins in the genome of Neurospora crassa is demonstrated. The framework is implemented as a Web service, the CAZymes Analysis Toolkit (CAT), and is available at http://cricket.ornl.gov/cgi-bin/cat.cgi.

  15. Lead generation using pharmacophore mapping and three-dimensional database searching: application to muscarinic M(3) receptor antagonists.

    PubMed

    Marriott, D P; Dougall, I G; Meghani, P; Liu, Y J; Flower, D R

    1999-08-26

    By using a pharmacophore model, a geometrical representation of the features necessary for molecules to show a particular biological activity, it is possible to search databases containing the 3D structures of molecules and identify novel compounds which may possess this activity. We describe our experiences of establishing a working 3D database system and its use in rational drug design. By using muscarinic M(3) receptor antagonists as an example, we show that it is possible to identify potent novel lead compounds using this approach. Pharmacophore generation based on the structures of known M(3) receptor antagonists, 3D database searching, and medium-throughput screening were used to identify candidate compounds. Three compounds were chosen to define the pharmacophore: a lung-selective M(3) antagonist patented by Pfizer and two Astra compounds which show affinity at the M(3) receptor. From these, a pharmacophore model was generated, using the program DISCO, and this was used subsequently to search a UNITY 3D database of proprietary compounds; 172 compounds were found to fit the pharmacophore. These compounds were then screened, and 1-[2-(2-(diethylamino)ethoxy)phenyl]-2-phenylethanone (pA(2) 6.67) was identified as the best hit, with N-[2-(piperidin-1-ylmethyl)cyclohexyl]-2-propoxybenzamide (pA(2) 4.83) and phenylcarbamic acid 2-(morpholin-4-ylmethyl)cyclohexyl ester (pA(2) 5.54) demonstrating lower activity. As well as its potency, 1-[2-(2-(diethylamino)ethoxy)phenyl]-2-phenylethanone is a simple structure with limited similarity to existing M(3) receptor antagonists. PMID:10464008
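
    Geometrically, fitting a pharmacophore amounts to checking inter-feature distances against tolerance ranges. The Python sketch below encodes an invented three-point model (feature names, coordinates and distance bounds are all illustrative, not the DISCO/UNITY model of the study).

      import math

      # Pairwise distance constraints in angstroms between three features.
      CONSTRAINTS = {
          ("amine", "aromatic"): (5.0, 7.0),
          ("amine", "acceptor"): (3.0, 5.5),
          ("aromatic", "acceptor"): (4.0, 6.5),
      }

      def fits_pharmacophore(features):
          """features: dict of feature name -> (x, y, z) for one conformer."""
          for (f1, f2), (lo, hi) in CONSTRAINTS.items():
              if not lo <= math.dist(features[f1], features[f2]) <= hi:
                  return False
          return True

      candidate = {"amine": (0.0, 0.0, 0.0),
                   "aromatic": (5.8, 0.0, 0.0),
                   "acceptor": (2.5, 3.4, 0.0)}
      print(fits_pharmacophore(candidate))  # True: all three distances in range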

  16. High-performance hardware implementation of a parallel database search engine for real-time peptide mass fingerprinting

    PubMed Central

    Bogdán, István A.; Rivers, Jenny; Beynon, Robert J.; Coca, Daniel

    2008-01-01

    Motivation: Peptide mass fingerprinting (PMF) is a method for protein identification in which a protein is fragmented by a defined cleavage protocol (usually proteolysis with trypsin), and the masses of these products constitute a ‘fingerprint’ that can be searched against theoretical fingerprints of all known proteins. In the first stage of PMF, the raw mass spectrometric data are processed to generate a peptide mass list. In the second stage this protein fingerprint is used to search a database of known proteins for the best protein match. Although current software solutions can typically deliver a match in a relatively short time, a system that can find a match in real time could change the way in which PMF is deployed and presented. In a paper published earlier we presented a hardware design of a raw mass spectra processor that, when implemented in Field Programmable Gate Array (FPGA) hardware, achieves almost 170-fold speed gain relative to a conventional software implementation running on a dual processor server. In this article we present a complementary hardware realization of a parallel database search engine that, when running on a Xilinx Virtex 2 FPGA at 100 MHz, delivers 1800-fold speed-up compared with an equivalent C software routine, running on a 3.06 GHz Xeon workstation. The inherent scalability of the design means that processing speed can be multiplied by deploying the design on multiple FPGAs. The database search processor and the mass spectra processor, running on a reconfigurable computing platform, provide a complete real-time PMF protein identification solution. Contact: d.coca@sheffield.ac.uk PMID:18453553
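
    The database-search stage being accelerated is, at its core, a tolerance-window mass-matching count per protein. The Python sketch below shows that scoring with invented masses and tolerance; the FPGA engine evaluates the same comparison for many database proteins in parallel.

      def pmf_score(observed, theoretical, tol=0.2):
          """Count observed peptide masses matching a protein's theoretical
          tryptic masses within +/- tol Da."""
          return sum(any(abs(o - t) <= tol for t in theoretical) for o in observed)

      # Toy database of theoretical fingerprints (masses in Da, invented).
      database = {
          "proteinA": [842.5, 1045.6, 1479.8, 2211.1],
          "proteinB": [912.4, 1325.7, 1479.9, 1862.0],
      }
      spectrum_masses = [842.6, 1479.8, 2211.0]

      best = max(database, key=lambda p: pmf_score(spectrum_masses, database[p]))
      print(best, pmf_score(spectrum_masses, database[best]))  # proteinA 3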

  17. Automated Assistance in the Formulation of Search Statements for Bibliographic Databases.

    ERIC Educational Resources Information Center

    Oakes, Michael P.; Taylor, Malcolm J.

    1998-01-01

    Reports on the design of an automated query system to help pharmacologists access the Derwent Drug File (DDF). Topics include knowledge types; knowledge representation; role of the search intermediary; vocabulary selection, thesaurus, and user input in natural language; browsing; evaluation methods; and search statement generation for the World…

  18. Boolean Logic: An Aid for Searching Computer Databases in Special Education and Rehabilitation.

    ERIC Educational Resources Information Center

    Summers, Edward G.

    1989-01-01

    The article discusses using Boolean logic as a tool for searching computerized information retrieval systems in special education and rehabilitation technology. It includes discussion of the Boolean search operators AND, OR, and NOT; Venn diagrams; and disambiguating parentheses. Six suggestions are offered for development of good Boolean logic…
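
    The operators reduce to set algebra over the record sets each term retrieves, which the Python sketch below makes explicit with invented record IDs; the final line shows why the disambiguating parentheses matter.

      deafness = {101, 102, 103, 104}
      education = {103, 104, 105}
      preschool = {104, 106}

      print(deafness & education)                # AND: records in both sets
      print(deafness | preschool)                # OR: records in either set
      print((deafness & education) - preschool)  # AND ... NOT
      # Parentheses change the result, exactly as in the search systems:
      print(deafness & (education | preschool),
            (deafness & education) | preschool)  # different answers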

  19. An impatient evolutionary algorithm with probabilistic tabu search for unified solution of some NP-hard problems in graph and set theory via clique finding.

    PubMed

    Guturu, Parthasarathy; Dantu, Ram

    2008-06-01

    Many graph- and set-theoretic problems, because of their tremendous application potential and theoretical appeal, have been well investigated by the researchers in complexity theory and were found to be NP-hard. Since the combinatorial complexity of these problems does not permit exhaustive searches for optimal solutions, only near-optimal solutions can be explored using either various problem-specific heuristic strategies or metaheuristic global-optimization methods, such as simulated annealing, genetic algorithms, etc. In this paper, we propose a unified evolutionary algorithm (EA) to the problems of maximum clique finding, maximum independent set, minimum vertex cover, subgraph and double subgraph isomorphism, set packing, set partitioning, and set cover. In the proposed approach, we first map these problems onto the maximum clique-finding problem (MCP), which is later solved using an evolutionary strategy. The proposed impatient EA with probabilistic tabu search (IEA-PTS) for the MCP integrates the best features of earlier successful approaches with a number of new heuristics that we developed to yield a performance that advances the state of the art in EAs for the exploration of the maximum cliques in a graph. Results of experimentation with the 37 DIMACS benchmark graphs and comparative analyses with six state-of-the-art algorithms, including two from the smaller EA community and four from the larger metaheuristics community, indicate that the IEA-PTS outperforms the EAs with respect to a Pareto-lexicographic ranking criterion and offers competitive performance on some graph instances when individually compared to the other heuristic algorithms. It has also successfully set a new benchmark on one graph instance. On another benchmark suite called Benchmarks with Hidden Optimal Solutions, IEA-PTS ranks second, after a very recent algorithm called COVER, among its peers that have experimented with this suite. PMID:18558530
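
    As a point of reference for the clique-finding core, the Python sketch below builds maximal cliques by randomized greedy construction; the IEA-PTS of the paper layers evolutionary recombination and probabilistic tabu refinement on top of building blocks of roughly this kind. The graph and seeds are invented.

      import random

      def greedy_clique(adj, seed=None):
          """One randomized greedy construction of a maximal clique."""
          rng = random.Random(seed)
          nodes = list(adj)
          rng.shuffle(nodes)
          clique = []
          for v in nodes:
              if all(v in adj[u] for u in clique):
                  clique.append(v)
          return clique

      # Toy undirected graph as an adjacency-set dict.
      adj = {
          1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {1, 2, 3, 5},
          5: {4, 6}, 6: {5},
      }
      # Restarts with different seeds mimic a population of candidate cliques.
      best = max((greedy_clique(adj, s) for s in range(20)), key=len)
      print(sorted(best))  # [1, 2, 3, 4] is the maximum clique of this graph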

  20. The Magnetics Information Consortium (MagIC) Online Database: Uploading, Searching and Visualizing Paleomagnetic and Rock Magnetic Data

    NASA Astrophysics Data System (ADS)

    Koppers, A.; Tauxe, L.; Constable, C.; Pisarevsky, S.; Jackson, M.; Solheid, P.; Banerjee, S.; Johnson, C.; Genevey, A.; Delaney, R.; Baker, P.; Sbarbori, E.

    2005-12-01

    The Magnetics Information Consortium (MagIC) operates an online relational database including both rock and paleomagnetic data. The goal of MagIC is to store all measurements and their derived properties for studies of paleomagnetic directions (inclination, declination) and their intensities, and for rock magnetic experiments (hysteresis, remanence, susceptibility, anisotropy). MagIC is hosted under EarthRef.org at http://earthref.org/MAGIC/ and has two search nodes, one for paleomagnetism and one for rock magnetism. These nodes provide basic search capabilities based on location, reference, methods applied, material type and geological age, while allowing the user to drill down from sites all the way to the measurements. At each stage, the data can be saved and, if the available data supports it, the data can be visualized by plotting equal area plots, VGP location maps or typical Zijderveld, hysteresis, FORC, and various magnetization and remanence diagrams. All plots are made in SVG (scalable vector graphics) and thus can be saved and easily read into the user's favorite graphics programs without loss of resolution. User contributions to the MagIC database are critical to achieve a useful research tool. We have developed a standard data and metadata template (version 1.6) that can be used to format and upload all data at the time of publication in Earth Science journals. Software tools are provided to facilitate easy population of these templates within Microsoft Excel. These tools allow for the import/export of text files and they provide advanced functionality to manage/edit the data, and to perform various internal checks to high grade the data and to make them ready for uploading. The uploading is all done online by using the MagIC Contribution Wizard at http://earthref.org/MAGIC/upload.htm that takes only a few minutes to process a contribution of approximately 5,000 data records. After uploading these standardized MagIC template files will be stored in the

  1. Combining history of medicine and library instruction: an innovative approach to teaching database searching to medical students.

    PubMed

    Timm, Donna F; Jones, Dee; Woodson, Deidra; Cyrus, John W

    2012-01-01

    Library faculty members at the Health Sciences Library at the LSU Health Shreveport campus offer a database searching class for third-year medical students during their surgery rotation. For a number of years, students completed "ten-minute clinical challenges," but the instructors decided to replace the clinical challenges with innovative exercises using The Edwin Smith Surgical Papyrus to emphasize concepts learned. The Surgical Papyrus is an online resource that is part of the National Library of Medicine's "Turning the Pages" digital initiative. In addition, vintage surgical instruments and historic books are displayed in the classroom to enhance the learning experience. PMID:22853300

  2. Comparison of novel decoy database designs for optimizing protein identification searches using ABRF sPRG2006 standard MS/MS data sets.

    PubMed

    Blanco, Luca; Mead, Jennifer A; Bessant, Conrad

    2009-04-01

    Decoy database searches are used to filter out false positive protein identifications derived from search engines, but there is no consensus about which decoy design is "the best". We evaluate nine different decoy designs using public data sets from samples of known composition. Statistically significant performance differences were found, but no single decoy stood out among the best performers. Ultimately, we recommend peptide-level reverse decoys searched independently from the target database. PMID:19714810
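
    As a concrete illustration of the recommended design, the sketch below builds a peptide-level reversed decoy in Python: each tryptic peptide is reversed in place while its C-terminal K/R is kept fixed, so the decoy retains target-like cleavage properties. The simple trypsin rule and the example sequence are simplifying assumptions, not the exact procedure benchmarked in the record.

      # Sketch: peptide-level reversed ("pseudo-reverse") decoy construction.
      import re

      def tryptic_peptides(protein: str):
          # Cleave C-terminal to K/R, except before P (simple trypsin rule).
          return re.split(r'(?<=[KR])(?!P)', protein)

      def reversed_peptide_decoy(protein: str) -> str:
          # Reverse each peptide but keep its C-terminal K/R in place, so the
          # decoy preserves the tryptic cleavage sites of the target.
          decoy = []
          for pep in tryptic_peptides(protein):
              if pep and pep[-1] in 'KR':
                  decoy.append(pep[-2::-1] + pep[-1])
              else:
                  decoy.append(pep[::-1])
          return ''.join(decoy)

      print(reversed_peptide_decoy('MKWVTFISLLR'))  # 'MK' + 'LLSIFTVWR'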

  3. The Magnetics Information Consortium (MagIC) Online Database: Uploading, Searching and Visualizing Paleomagnetic and Rock Magnetic Data

    NASA Astrophysics Data System (ADS)

    Minnett, R.; Koppers, A.; Tauxe, L.; Constable, C.; Pisarevsky, S. A.; Jackson, M.; Solheid, P.; Banerjee, S.; Johnson, C.

    2006-12-01

    The Magnetics Information Consortium (MagIC) is commissioned to implement and maintain an online portal to a relational database populated by both rock and paleomagnetic data. The goal of MagIC is to archive all measurements and the derived properties for studies of paleomagnetic directions (inclination, declination) and intensities, and for rock magnetic experiments (hysteresis, remanence, susceptibility, anisotropy). MagIC is hosted under EarthRef.org at http://earthref.org/MAGIC/ and has two search nodes, one for paleomagnetism and one for rock magnetism. Both nodes provide query building based on location, reference, methods applied, material type and geological age, as well as a visual map interface to browse and select locations. The query result set is displayed in a digestible tabular format allowing the user to descend through hierarchical levels such as from locations to sites, samples, specimens, and measurements. At each stage, the result set can be saved and, if supported by the data, can be visualized by plotting global location maps, equal area plots, or typical Zijderveld, hysteresis, and various magnetization and remanence diagrams. User contributions to the MagIC database are critical to achieving a useful research tool. We have developed a standard data and metadata template (Version 2.1) that can be used to format and upload all data at the time of publication in Earth Science journals. Software tools are provided to facilitate population of these templates within Microsoft Excel. These tools allow for the import/export of text files and provide advanced functionality to manage and edit the data, and to perform various internal checks to maintain data integrity and prepare for uploading. The MagIC Contribution Wizard at http://earthref.org/MAGIC/upload.htm executes the upload and takes only a few minutes to process several thousand data records. The standardized MagIC template files are stored in the digital archives of EarthRef.org.

  4. Searching the Cambridge Structural Database for the 'best' representative of each unique polymorph.

    PubMed

    van de Streek, Jacco

    2006-08-01

    A computer program has been written that removes suspicious crystal structures from the Cambridge Structural Database and clusters the remaining crystal structures as polymorphs or redeterminations. For every set of redeterminations, one crystal structure is selected to be the best representative of that polymorph. The results, 243,355 well determined crystal structures grouped by unique polymorph, are presented and analysed. PMID:16840806

  5. Online Searching of Bibliographic Databases: Microcomputer Access to National Information Systems.

    ERIC Educational Resources Information Center

    Coons, Bill

    This paper describes the range and scope of various information databases available for technicians, researchers, and managers employed in forestry and the forest products industry. Availability of information on reports of field and laboratory research, business trends, product prices, and company profiles through national distributors of…

  6. Searching Reference Databases: What Students Experience and What Teachers Believe that Students Experience

    ERIC Educational Resources Information Center

    Avdic, Anders; Eklund, Anders

    2010-01-01

    The Internet has made it possible for students to access a vast amount of high-quality references when writing papers. Yet research has shown that the use of reference databases is poor and the quality of student papers is consequently often below expectation. The objective of this article is twofold. First, it aims to describe the problems…

  7. Exchange, interpretation, and database-search of ion mobility spectra supported by data format JCAMP-DX

    NASA Technical Reports Server (NTRS)

    Baumbach, J. I.; Davies, A. N.; Vonirmer, A.; Lampen, P. H.

    1995-01-01

    To assist peak assignment in ion mobility spectrometry it is important to have quality reference data. The reference collection should be stored in a database system which is capable of being searched using spectral or substance information. We propose to build such a database customized for ion mobility spectra. At the outset it is important to reach a critical mass of data in the collection quickly; we therefore wish to obtain as many spectra, combined with their IMS parameters, as possible. Spectra suppliers will be rewarded for their participation with access to the database. To make the data exchange between users and system administration possible, it is important to define a file format tailored to the requirements of ion mobility spectra. The format should be computer readable and flexible enough for extensive comments to be included. In this document we propose a data exchange format and invite comments on it. For international data exchange it is important to have a standard exchange format, and we propose to base its definition on the JCAMP-DX protocol, which was developed for the exchange of infrared spectra. This standard, maintained by the Joint Committee on Atomic and Molecular Physical Data, is of a flexible design. The aim of this paper is to adapt JCAMP-DX to the special requirements of ion mobility spectra.
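
    A minimal, purely illustrative example of what such a JCAMP-DX-style record for an ion mobility spectrum could look like is sketched below. The labels follow general JCAMP-DX conventions (labeled data records, $$ comments, an (X++(Y..Y)) data table), but the field choices and values are hypothetical and do not reproduce the definition proposed in the paper.

      ##TITLE= Example ion mobility spectrum (illustrative only)
      ##JCAMP-DX= 4.24
      ##DATA TYPE= ION MOBILITY SPECTRUM
      ##ORIGIN= (laboratory name)
      ##OWNER= (contributor)
      ##XUNITS= MILLISECONDS        $$ drift time
      ##YUNITS= ARBITRARY UNITS     $$ detector signal
      ##FIRSTX= 0.0
      ##LASTX= 25.0
      ##NPOINTS= 6
      ##XYDATA= (X++(Y..Y))
      0.0 12 15 21 340 95 18
      ##END=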

  8. Efficient HPLC method development using structure-based database search, physico-chemical prediction and chromatographic simulation.

    PubMed

    Wang, Lin; Zheng, Jinjian; Gong, Xiaoyi; Hartman, Robert; Antonucci, Vincent

    2015-02-01

    Development of a robust HPLC method for pharmaceutical analysis can be very challenging and time-consuming. In our laboratory, we have developed a new workflow leveraging ACD/Labs software tools to improve the performance of HPLC method development. First, we established ACD-based analytical method databases that can be searched by chemical structure similarity. By taking advantage of the existing knowledge of HPLC methods archived in the databases, one can find a good starting point for HPLC method development, or even reuse an existing method as is for a new project. Second, we used the software to predict compound physicochemical properties before running actual experiments to help select appropriate method conditions for targeted screening experiments. Finally, after selecting stationary and mobile phases, we used modeling software to simulate chromatographic separations and optimize the temperature and gradient program. The optimized new method was then uploaded to internal databases as knowledge available to assist future method development efforts. Routine implementation of such standardized workflows has the potential to reduce the number of experiments required for method development and facilitate systematic and efficient development of faster, greener and more robust methods leading to greater productivity. In this article, we used Loratadine method development as an example to demonstrate efficient method development using this new workflow. PMID:25481084
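
    The first step of this workflow, a structure-similarity search against an in-house method database, can be sketched generically in Python with the open-source RDKit toolkit. The ACD/Labs tooling in the record is commercial, so the database contents, method names, and fingerprint settings below are illustrative assumptions only.

      # Sketch: ranking archived methods by query-structure similarity (RDKit).
      from rdkit import Chem, DataStructs
      from rdkit.Chem import AllChem

      archive = {  # hypothetical method database: SMILES -> method name
          'CC(=O)Oc1ccccc1C(=O)O': 'aspirin_method_v3',
          'CC(C)Cc1ccc(cc1)C(C)C(=O)O': 'ibuprofen_method_v1',
      }

      def rank_methods(query_smiles: str):
          fp = lambda s: AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(s), 2, 2048)
          q = fp(query_smiles)
          hits = [(DataStructs.TanimotoSimilarity(q, fp(s)), name) for s, name in archive.items()]
          return sorted(hits, reverse=True)   # most similar archived method first

      print(rank_methods('CC(=O)Nc1ccc(O)cc1'))  # paracetamol as the query structure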

  9. A Pseudo MS3 Approach for Identification of Disulfide-Bonded Proteins: Uncommon Product Ions and Database Search

    NASA Astrophysics Data System (ADS)

    Chen, Jianzhong; Shiyanov, Pavel; Schlager, John J.; Green, Kari B.

    2012-02-01

    It has previously been reported that disulfide and backbone bonds of native intact proteins can be concurrently cleaved using electrospray ionization (ESI) and collision-induced dissociation (CID) tandem mass spectrometry (MS/MS). However, the cleavages of disulfide bonds result in different cysteine modifications in product ions, making it difficult to identify the disulfide-bonded proteins via database search. To solve this identification problem, we have developed a pseudo MS3 approach by combining nozzle-skimmer dissociation (NSD) and CID on a quadrupole time-of-flight (Q-TOF) mass spectrometer using chicken lysozyme as a model. Although many of the product ions were similar to those typically seen in MS/MS spectra of enzymatically derived peptides, additional uncommon product ions were detected, including c(i-1) ions (where the ith residue is aspartic acid, arginine, lysine, or dehydroalanine) as well as ions from a scrambled sequence. The formation of these uncommon types of product ions, likely caused by the lack of mobile protons, was proposed to involve bond rearrangements via a six-membered ring transition state and/or salt bridge(s). A search of 20 pseudo MS3 spectra against the Gallus gallus (chicken) database using Batch-Tag, a program originally designed for bottom-up MS/MS analysis, identified chicken lysozyme as the only hit, with expectation values less than 0.02 for 12 of the spectra. The pseudo MS3 approach may help to identify disulfide-bonded proteins and determine the associated post-translational modifications (PTMs); the confidence in the identification may be improved by incorporating the fragmentation characteristics into currently available search programs.

  10. Meteor shower search in the CMN and SonotaCo orbital databases

    NASA Astrophysics Data System (ADS)

    Šegon, Damir; Gural, Peter; Andreić, Željko; Vida, Denis; Skokić, Ivica; Korlević, Korado; Novoselnik, Filip

    2014-01-01

    The following article is a summarized version of a paper published for the Meteoroids 2013 Conference on meteoroid-stream parent-body searches and new stream discovery; further details and published findings can be found in Šegon et al. (2014).

  11. SCOOP: A Measurement and Database of Student Online Search Behavior and Performance

    ERIC Educational Resources Information Center

    Zhou, Mingming

    2015-01-01

    The ability to access and process massive amounts of online information is required in many learning situations. In order to develop a better understanding of student online search process especially in academic contexts, an online tool (SCOOP) is developed for tracking mouse behavior on the web to build a more extensive account of student web…

  12. Selecting Telecommunications Hardware for a School Library's Online Database Searching Program.

    ERIC Educational Resources Information Center

    La Faille, Eugene

    1988-01-01

    Discussion of criteria for use in evaluating telecommunications hardware for a school library's online searching program focuses on modem selection. The differences between internal and external modems are described, and a prioritized checklist of recommended features for external modems is presented. (CLB)

  13. Computer, System, and Subject Knowledge in Novice Searching of a Full-Text, Multifile Database.

    ERIC Educational Resources Information Center

    Jacobson, Thomas; Fusani, David

    1992-01-01

    This study examined the experiences of 59 novice end users with a multifile, full-text information retrieval system. A regression model was developed of the relative contributions of computer, system, and subject knowledge to search success as measured by user judgments of the relevance of retrieved documents. Results indicated all three variables…

  14. Information Retrieval Strategies of Millennial Undergraduate Students in Web and Library Database Searches

    ERIC Educational Resources Information Center

    Porter, Brandi

    2009-01-01

    Millennial students make up a large portion of undergraduate students attending colleges and universities, and they have a variety of online resources available to them to complete academically related information searches, primarily Web based and library-based online information retrieval systems. The content, ease of use, and required search…

  15. A Search for Gamma-ray Burst Subgroups in the SWIFT and RHESSI Databases

    SciTech Connect

    Ripa, Jakub; Huja, David; Meszaros, Attila; Hudec, Rene; Hajdas, Wojtek; Wigger, Claudia

    2008-10-22

    A sample of 286 gamma-ray bursts (GRBs) detected by the Swift satellite and 358 GRBs detected by the RHESSI satellite are studied statistically. Previously published articles, based on the BATSE GRB Catalog, claimed the existence of an intermediate subgroup of GRBs with respect to duration. We use the statistical χ² test and the F-test to compare the number of GRB subgroups in our databases with the earlier BATSE results. Similarly to the BATSE database, the short and long subgroups are well detected in the Swift and RHESSI data. However, contrary to the BATSE data, we have not found a statistically significant intermediate subgroup in either Swift or RHESSI data.
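
    A simplified version of this kind of subgroup test can be sketched in Python: fit two- and three-component Gaussian mixtures to log burst durations and ask whether the third component is statistically justified. The sketch below uses scikit-learn and BIC on synthetic durations rather than the chi-square and F-test machinery applied to the actual Swift/RHESSI samples.

      # Sketch: 2 vs 3 duration subgroups via Gaussian mixtures on log10(T90).
      import numpy as np
      from sklearn.mixture import GaussianMixture

      rng = np.random.default_rng(0)
      log_t90 = np.concatenate([rng.normal(-0.5, 0.50, 80),     # "short" bursts
                                rng.normal(1.5, 0.45, 220)]     # "long" bursts
                               ).reshape(-1, 1)

      for k in (2, 3):
          gm = GaussianMixture(n_components=k, random_state=0).fit(log_t90)
          print(k, 'components: BIC =', round(gm.bic(log_t90), 1))
      # The lower BIC wins; a 3-component fit must improve enough to justify
      # its extra parameters before an intermediate subgroup is claimed.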

  16. The BioPrompt-box: an ontology-based clustering tool for searching in biological databases

    PubMed Central

    Corsi, Claudio; Ferragina, Paolo; Marangoni, Roberto

    2007-01-01

    Background: High-throughput molecular biology provides new data at an incredible rate, so that the increase in the size of biological databanks is enormous and very rapid. This scenario generates severe problems not only at indexing time, where suitable algorithmic techniques for data indexing and retrieval are required, but also at query time, since a user query may produce such a large set of results that their browsing and "understanding" becomes humanly impractical. This problem is well known to the Web community, where a new generation of Web search engines is being developed, like Vivisimo. These tools organize on-the-fly the results of a user query in a hierarchy of labeled folders that ease their browsing and knowledge extraction. We investigate this approach on biological data, and propose the so-called BioPrompt-box software system, which deploys ontology-driven clustering strategies for making the searching process of biologists more efficient and effective. Results: The BioPrompt-box (Bpb) defines a document as a biological sequence plus its associated meta-data taken from the underlying databank, such as references to ontologies or to external databanks, plain-text comments of researchers, and the title, abstract or even body of papers. Bpb offers several tools to customize the search and the clustering process over its indexed documents. The user can search a set of keywords within a specific field of the document schema, or can execute Blast to find documents relative to homologue sequences. In both cases the search task returns a set of documents (hits) which constitute the answer to the user query. Since the number of hits may be large, Bpb clusters them into groups of homogeneous content, organized as a hierarchy of labeled clusters. The user can choose among several ontology-based hierarchical clustering strategies, each offering a different "view" of the returned hits. Bpb computes these views by exploiting the meta-data present within

  17. Crescendo: A Protein Sequence Database Search Engine for Tandem Mass Spectra

    NASA Astrophysics Data System (ADS)

    Wang, Jianqi; Zhang, Yajie; Yu, Yonghao

    2015-07-01

    A search engine that reliably identifies more peptides is essential to the progress of computational proteomics. We propose two new scoring functions (L- and P-scores), which aim to capture characteristics of a peptide-spectrum match (PSM) similar to those captured by Sequest and Comet. Crescendo, introduced here, is a software program that implements these two scores for peptide identification. We applied Crescendo to test datasets and compared its performance with widely used search engines, including Mascot, Sequest, and Comet. The results indicate that Crescendo identifies a similar or larger number of peptides at various predefined false discovery rates (FDR). Importantly, it also provides better separation between true and decoy PSMs, warranting the future development of a companion post-processing filtering algorithm.

  18. RAId_DbS: Method for Peptide ID using Database Search with Accurate Statistics

    NASA Astrophysics Data System (ADS)

    Alves, Gelio; Ogurtsov, Aleksey; Yu, Yi-Kuo

    2007-03-01

    The key to proteomics studies, essential in systems biology, is peptide identification. Under tandem mass spectrometry, each spectrum generated consists of a list of mass/charge peaks along with their intensities. Software analysis is then required to identify peptide candidates that best interpret the spectrum. The library search, which compares the spectral peaks against theoretical peaks generated by each peptide in a library, is among the most popular methods. This method, although robust, lacks good quantitative statistical underpinning. As we show, many library search algorithms suffer from statistical instability. The need for a better statistical basis prompted us to develop RAId_DbS. Taking into account the skewness in the peak intensity distribution while scoring peptides, RAId_DbS provides an accurate statistical significance assignment to each peptide candidate. RAId_DbS will be a valuable tool especially when one intends to identify proteins through peptide identifications.
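
    The statistical idea here, assigning each candidate peptide an accurate significance rather than a bare score, can be illustrated with a generic empirical null. RAId_DbS derives its statistics analytically, accounting for the skewed peak-intensity distribution; the Python sketch below is only a conceptual stand-in that estimates a p value from the scores of random, decoy-like candidates.

      # Sketch: empirical significance for a peptide-spectrum match score.
      import random

      def empirical_p_value(score: float, null_scores: list) -> float:
          # Fraction of null scores at least as large as the observed score;
          # the +1 terms give a conservative, never-zero estimate.
          exceed = sum(1 for s in null_scores if s >= score)
          return (exceed + 1) / (len(null_scores) + 1)

      null = [random.gauss(10, 3) for _ in range(9999)]   # synthetic null scores
      print(empirical_p_value(22.0, null))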

  19. Recovery actions in PRA (probabilistic risk assessment) for the Risk Methods Integration and Evaluation Program (RMIEP): Volume 1, Development of the data-based method

    SciTech Connect

    Weston, L M; Whitehead, D W; Graves, N L

    1987-06-01

    In a probabilistic risk assessment (PRA) for a nuclear power plant, the analyst identifies a set of potential core damage events consisting of equipment failures and human errors and their estimated probabilities of occurrence. If operator recovery from an event within some specified time is considered, then the probability of this recovery can be included in the PRA. This report provides PRA analysts with an improved methodology for including recovery actions in a PRA. A recovery action can be divided into two distinct phases: a Diagnosis Phase (realizing that there is a problem with a critical parameter and deciding upon the correct course of action) and an Action Phase (physically accomplishing the required action). In this methodology, simulator data are used to estimate recovery probabilities for the diagnosis phase. Different time-reliability curves showing the probability of failure of diagnosis as a function of time from the compelling cue for the event are presented. These curves are based on simulator exercises, and the actions are grouped based upon their operational similarities. This is an improvement over existing diagnosis models that rely greatly upon subjective judgment to obtain such estimates. The action phase is modeled using estimates from available sources. The methodology also includes a recommendation on where and when to apply the recovery action in the PRA process.
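
    A time-reliability curve of the kind described, giving the probability that diagnosis has not yet succeeded as a function of time since the compelling cue, can be sketched as follows. The lognormal form and its parameters are illustrative assumptions, not the simulator-derived curves of the report.

      # Sketch: probability that diagnosis has NOT succeeded by time t.
      from math import log, sqrt, erf

      def p_diagnosis_failure(t_minutes: float, median: float = 5.0, sigma: float = 0.8) -> float:
          # Survival function of a lognormal response-time distribution.
          if t_minutes <= 0:
              return 1.0
          z = (log(t_minutes) - log(median)) / sigma
          return 1.0 - 0.5 * (1.0 + erf(z / sqrt(2.0)))

      for t in (1, 5, 15, 30):
          print(t, 'min:', round(p_diagnosis_failure(t), 3))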

  20. Pivotal role of computers and software in mass spectrometry - SEQUEST and 20 years of tandem MS database searching.

    PubMed

    Yates, John R

    2015-11-01

    Advances in computer technology and software have driven developments in mass spectrometry over the last 50 years. Computers and software have been impactful in three areas: the automation of difficult calculations to aid interpretation, the collection of data and control of instruments, and data interpretation. As the power of computers has grown, so too has the utility and impact on mass spectrometers and their capabilities. This has been particularly evident in the use of tandem mass spectrometry data to search protein and nucleotide sequence databases to identify peptide and protein sequences. This capability has driven the development of many new approaches to study biological systems, including the use of "bottom-up shotgun proteomics" to directly analyze protein mixtures. PMID:26286455

  1. Pivotal Role of Computers and Software in Mass Spectrometry - SEQUEST and 20 Years of Tandem MS Database Searching

    NASA Astrophysics Data System (ADS)

    Yates, John R.

    2015-11-01

    Advances in computer technology and software have driven developments in mass spectrometry over the last 50 years. Computers and software have been impactful in three areas: the automation of difficult calculations to aid interpretation, the collection of data and control of instruments, and data interpretation. As the power of computers has grown, so too has the utility and impact on mass spectrometers and their capabilities. This has been particularly evident in the use of tandem mass spectrometry data to search protein and nucleotide sequence databases to identify peptide and protein sequences. This capability has driven the development of many new approaches to study biological systems, including the use of "bottom-up shotgun proteomics" to directly analyze protein mixtures.

  2. FTP-Server for exchange, interpretation, and database-search of ion mobility spectra, literature, preprints and software

    NASA Technical Reports Server (NTRS)

    Baumbach, J. I.; Vonirmer, A.

    1995-01-01

    To assist current discussion in the field of ion mobility spectrometry, an FTP server began operating at the Institut fur Spectrochemie und angewandte Spektroskopie, Dortmund, on 4 December 1994; it is available to all research groups at universities and institutes and to researchers in industry. We support the exchange, interpretation, and database-search of ion mobility spectra through the data format JCAMP-DX (Joint Committee on Atomic and Molecular Physical Data), as well as literature retrieval and preprint, notice, and discussion boards. We describe in general terms the access conditions, local addresses, and main code words. For further details, a monthly news report will be prepared for all users; the Internet email address for subscribing is included in this document.

  3. An ultra-tolerant database search reveals that a myriad of modified peptides contributes to unassigned spectra in shotgun proteomics

    PubMed Central

    Chick, Joel M.; Kolippakkam, Deepak; Nusinow, David P.; Zhai, Bo; Rad, Ramin; Huttlin, Edward L.; Gygi, Steven P.

    2015-01-01

    Fewer than half of all tandem mass spectrometry (MS/MS) spectra acquired in shotgun proteomics experiments are typically matched to a peptide with high confidence. Here we determine the identity of unassigned peptides using an ultra-tolerant Sequest database search that allows peptide matching even with modifications of unknown masses up to ±500 Da. In a proteome-wide dataset on HEK293 cells (9,513 proteins and 396,736 peptides), this approach matched an additional 184,000 modified peptides, which were linked to biological and chemical modifications representing 523 distinct mass bins, including phosphorylation, glycosylation, and methylation. We localized all unknown modification masses to specific regions within a peptide. Known modifications were assigned to the correct amino acids with frequencies often >90%. We conclude that at least one third of unassigned spectra arise from peptides with substoichiometric modifications. PMID:26076430
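
    The essence of the ultra-tolerant ("open") search is to let the precursor mass differ from the candidate peptide mass by up to ±500 Da and then histogram those differences, so that recurring offsets reveal modification masses. A minimal sketch of that binning step is shown below; the delta values are synthetic and the 0.01 Da bin width is an arbitrary choice, not the record's procedure.

      # Sketch: binning precursor-vs-peptide mass differences in an open search.
      from collections import Counter

      # Each delta would come from a PSM whose precursor mass differs from the
      # matched (unmodified) peptide mass; here they are made-up examples.
      deltas = [79.966, 79.967, 15.995, 15.994, 0.984, 79.966, 42.011, 15.995]

      bins = Counter(round(d, 2) for d in deltas)      # ~0.01 Da mass bins
      for mass, count in bins.most_common(3):
          print(f'{mass:+.2f} Da observed {count}x')   # e.g. +79.97 ~ phosphorylation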

  4. Chemical and biological warfare: General studies. (Latest citations from the NTIS bibliographic database). Published Search

    SciTech Connect

    1995-09-01

    The bibliography contains citations concerning federally sponsored and conducted studies into chemical and biological warfare operations and planning. These studies cover areas not addressed in other parts of this series. The topics include production and storage of agents, delivery techniques, training, military and civil defense, general planning studies, psychological reactions to chemical warfare, evaluations of materials exposed to chemical agents, and studies on banning or limiting chemical warfare. Other published searches in this series on chemical warfare cover detection and warning, defoliants, protection, and biological studies, including chemistry and toxicology.(Contains 50-250 citations and includes a subject term index and title list.) (Copyright NERAC, Inc. 1995)

  5. Chemical and biological warfare: General studies. (Latest citations from the NTIS bibliographic database). Published Search

    SciTech Connect

    1997-11-01

    The bibliography contains citations concerning federally sponsored and conducted studies into chemical and biological warfare operations and planning. These studies cover areas not addressed in other parts of this series. The topics include production and storage of agents, delivery techniques, training, military and civil defense, general planning studies, psychological reactions to chemical warfare, evaluations of materials exposed to chemical agents, and studies on banning or limiting chemical warfare. Other published searches in this series on chemical warfare cover detection and warning, defoliants, protection, and biological studies, including chemistry and toxicology.(Contains 50-250 citations and includes a subject term index and title list.) (Copyright NERAC, Inc. 1995)

  6. Chemical and biological warfare: General studies. (Latest citations from the NTIS bibliographic database). Published Search

    SciTech Connect

    1996-10-01

    The bibliography contains citations concerning federally sponsored and conducted studies into chemical and biological warfare operations and planning. These studies cover areas not addressed in other parts of this series. The topics include production and storage of agents, delivery techniques, training, military and civil defense, general planning studies, psychological reactions to chemical warfare, evaluations of materials exposed to chemical agents, and studies on banning or limiting chemical warfare. Other published searches in this series on chemical warfare cover detection and warning, defoliants, protection, and biological studies, including chemistry and toxicology. (Contains 50-250 citations and includes a subject term index and title list.) (Copyright NERAC, Inc. 1995)

  7. Chemical and biological warfare: General studies. (Latest citations from the NTIS bibliographic database). NewSearch

    SciTech Connect

    Not Available

    1994-10-01

    The bibliography contains citations concerning federally sponsored and conducted studies into chemical and biological warfare operations and planning. These studies cover areas not addressed in other parts of this series. The topics include production and storage of agents, delivery techniques, training, military and civil defense, general planning studies, psychological reactions to chemical warfare, evaluations of materials exposed to chemical agents, and studies on banning or limiting chemical warfare. Other published searches in this series on chemical warfare cover detection and warning, defoliants, protection, and biological studies, including chemistry and toxicology. (Contains 250 citations and includes a subject term index and title list.)

  8. Chemical and biological warfare: General studies. (Latest citations from the NTIS Bibliographic database). Published Search

    SciTech Connect

    Not Available

    1993-11-01

    The bibliography contains citations concerning federally sponsored and conducted studies into chemical and biological warfare operations and planning. These studies cover areas not addressed in other parts of this series. The topics include production and storage of agents, delivery techniques, training, military and civil defense, general planning studies, psychological reactions to chemical warfare, evaluations of materials exposed to chemical agents, and studies on banning or limiting chemical warfare. Other published searches in this series on chemical warfare cover detection and warning, defoliants, protection, and biological studies, including chemistry and toxicology. (Contains 250 citations and includes a subject term index and title list.)

  9. Chemical and biological warfare: Detection and warning systems. (Latest citations from the NTIS database). Published Search

    SciTech Connect

    Not Available

    1993-03-01

    The bibliography contains citations concerning the design and testing of samplers and detectors to provide identification and warning of the presence of chemical and biological agents used in military operations. The sampling techniques are applicable to air and water testing, and evaluation of personnel and equipment exposure. Techniques involve enzyme alarms, chromotography, conductivity meters, spectrophotometry, luminescence, and solid state microsensor devices. Other Published Searches in this series on chemical warfare cover protection, defoliants, general studies, and biological studies, including chemistry and toxicology. (Contains 250 citations and includes a subject term index and title list.)

  10. The Generating Function of CID, ETD, and CID/ETD Pairs of Tandem Mass Spectra: Applications to Database Search*

    PubMed Central

    Kim, Sangtae; Mischerikow, Nikolai; Bandeira, Nuno; Navarro, J. Daniel; Wich, Louis; Mohammed, Shabaz; Heck, Albert J. R.; Pevzner, Pavel A.

    2010-01-01

    Recent emergence of new mass spectrometry techniques (e.g. electron transfer dissociation, ETD) and improved availability of additional proteases (e.g. Lys-N) for protein digestion in high-throughput experiments raised the challenge of designing new algorithms for interpreting the resulting new types of tandem mass (MS/MS) spectra. Traditional MS/MS database search algorithms such as SEQUEST and Mascot were originally designed for collision induced dissociation (CID) of tryptic peptides and are largely based on expert knowledge about fragmentation of tryptic peptides (rather than machine learning techniques) to design CID-specific scoring functions. As a result, the performance of these algorithms is suboptimal for new mass spectrometry technologies or nontryptic peptides. We recently proposed the generating function approach (MS-GF) for CID spectra of tryptic peptides. In this study, we extend MS-GF to automatically derive scoring parameters from a set of annotated MS/MS spectra of any type (e.g. CID, ETD, etc.), and present a new database search tool MS-GFDB based on MS-GF. We show that MS-GFDB outperforms Mascot for ETD spectra or peptides digested with Lys-N. For example, in the case of ETD spectra, the number of tryptic and Lys-N peptides identified by MS-GFDB increased by a factor of 2.7 and 2.6 as compared with Mascot. Moreover, even following a decade of Mascot developments for analyzing CID spectra of tryptic peptides, MS-GFDB (which is not particularly tailored for CID spectra or tryptic peptides) resulted in a 28% increase over Mascot in the number of peptide identifications. Finally, we propose a statistical framework for analyzing multiple spectra from the same precursor (e.g. CID/ETD spectral pairs) and assigning p values to peptide-spectrum-spectrum matches. PMID:20829449
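
    The last idea, assigning a single significance value to a peptide-spectrum-spectrum match from a CID/ETD pair, can be illustrated with a generic p-value combination rule. Fisher's method below is a simple stand-in, not the statistical framework actually developed for MS-GFDB.

      # Sketch: combining evidence from a CID/ETD spectral pair (Fisher's method).
      from math import log
      from scipy.stats import chi2

      def fisher_combined_p(p_cid: float, p_etd: float) -> float:
          # -2 * sum of log p values is chi-square with 2k = 4 degrees of freedom
          # under the null hypothesis that both matches are random.
          stat = -2.0 * (log(p_cid) + log(p_etd))
          return chi2.sf(stat, df=4)

      print(fisher_combined_p(0.01, 0.03))  # stronger than either spectrum alone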

  11. Heart research advances using database search engines, Human Protein Atlas and the Sydney Heart Bank.

    PubMed

    Li, Amy; Estigoy, Colleen; Raftery, Mark; Cameron, Darryl; Odeberg, Jacob; Pontén, Fredrik; Lal, Sean; Dos Remedios, Cristobal G

    2013-10-01

    This Methodological Review is intended as a guide for research students who may have just discovered a human "novel" cardiac protein, but it may also help hard-pressed reviewers of journal submissions on a "novel" protein reported in an animal model of human heart failure. Whether you are an expert or not, you may know little or nothing about this particular protein of interest. In this review we provide a strategic guide on how to proceed. We ask: How do you discover what has been published (even in an abstract or research report) about this protein? Everyone knows how to undertake literature searches using PubMed and Medline but these are usually encyclopaedic, often producing long lists of papers, most of which are either irrelevant or only vaguely relevant to your query. Relatively few will be aware of more advanced search engines such as Google Scholar and even fewer will know about Quertle. Next, we provide a strategy for discovering if your "novel" protein is expressed in the normal, healthy human heart, and if it is, we show you how to investigate its subcellular location. This can usually be achieved by visiting the website "Human Protein Atlas" without doing a single experiment. Finally, we provide a pathway to discovering if your protein of interest changes its expression level with heart failure/disease or with ageing. PMID:23856366

  12. Utility of rapid database searching for quality assurance: 'detective work' in uncovering radiology coding and billing errors

    NASA Astrophysics Data System (ADS)

    Horii, Steven C.; Kim, Woojin; Boonn, William; Iyoob, Christopher; Maston, Keith; Coleman, Beverly G.

    2011-03-01

    When the first quarter of 2010 Department of Radiology statistics were provided to the Section Chiefs, the authors (SH, BC) were alarmed to discover that Ultrasound showed a decrease of 2.5 percent in billed examinations. This seemed to be in direct contradistinction to the experience of the ultrasound faculty members and sonographers. Their experience was that they were far busier than during the same quarter of 2009. The one exception that all acknowledged was the month of February, 2010 when several major winter storms resulted in a much decreased Hospital admission and Emergency Department visit rate. Since these statistics in part help establish priorities for capital budget items, professional and technical staffing levels, and levels of incentive salary, they are taken very seriously. The availability of a desktop, Web-based RIS database search tool developed by two of the authors (WK, WB) and built-in database functions of the ultrasound miniPACS, made it possible for us very rapidly to develop and test hypotheses for why the number of billable examinations was declining in the face of what experience told the authors was an increasing number of examinations being performed. Within a short time, we identified the major cause as errors on the part of the company retained to verify billable Current Procedural Terminology (CPT) codes against ultrasound reports. This information is being used going forward to recover unbilled examinations and take measures to reduce or eliminate the types of coding errors that resulted in the problem.

  13. Generalized method for probability-based peptide and protein identification from tandem mass spectrometry data and sequence database searching.

    PubMed

    Ramos-Fernández, Antonio; Paradela, Alberto; Navajas, Rosana; Albar, Juan Pablo

    2008-09-01

    Tandem mass spectrometry-based proteomics is currently in great demand of computational methods that facilitate the elimination of likely false positives in peptide and protein identification. In the last few years, a number of new peptide identification programs have been described, but scores or other significance measures reported by these programs cannot always be directly translated into an easy to interpret error rate measurement such as the false discovery rate. In this work we used generalized lambda distributions to model frequency distributions of database search scores computed by MASCOT, X!TANDEM with k-score plug-in, OMSSA, and InsPecT. From these distributions, we could successfully estimate p values and false discovery rates with high accuracy. From the set of peptide assignments reported by any of these engines, we also defined a generic protein scoring scheme that enabled accurate estimation of protein-level p values by simulation of random score distributions; this scheme was also found to yield good estimates of the protein-level false discovery rate. The performance of these methods was evaluated by searching four freely available data sets ranging from 40,000 to 285,000 MS/MS spectra. PMID:18515861
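
    Once search scores have been converted into accurate p values (which the record achieves by fitting generalized lambda distributions), turning those p values into false discovery rate estimates is mechanical. The sketch below shows a standard Benjamini-Hochberg pass in Python as one common way to make that final step concrete; it is not the authors' own implementation.

      # Sketch: p values -> Benjamini-Hochberg FDR (q-value-style) estimates.
      import numpy as np

      def bh_fdr(p_values: np.ndarray) -> np.ndarray:
          n = len(p_values)
          order = np.argsort(p_values)
          ranked = p_values[order] * n / np.arange(1, n + 1)
          # Enforce monotonicity from the largest p value downwards.
          fdr = np.minimum.accumulate(ranked[::-1])[::-1]
          out = np.empty(n)
          out[order] = np.clip(fdr, 0, 1)
          return out

      print(bh_fdr(np.array([0.001, 0.009, 0.04, 0.2, 0.7])))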

  14. A search for streams and associations in meteor databases. Method of Indices

    NASA Astrophysics Data System (ADS)

    Svoreň, J.; Neslušan, L.; Porubčan, V.

    2000-08-01

    A new method of searching for minor meteor streams and associations is presented and discussed. The procedure, based only on mathematical statistics, enables a parallel separation of major and minor streams or associations. The approach utilizes a division of the ranges of examined parameters into equidistant intervals. The method is tested on the IAU Meteor Data Center Lund catalogue of precise photographic orbits representing the most extensive set of photographic meteor orbits. Besides the five orbital elements incorporated in the Southworth-Hawkins D-criterion, we have also included in the procedure the coordinates of the radiant which belong to the most accurately known parameters and the geocentric velocity as a significant parameter characteristic for physically related orbits. The basic idea of the procedure is a division of the observed ranges of parameters into a number of equidistant intervals and assignment of indices to a meteor according to the intervals pertinent to its parameters. The meteors with equal indices are regarded as mutually related. Since various parameters listed in the catalogue contain various relative errors, it is necessary to use several intervals in the division of each parameter to obtain a good fit with the real orbital distribution. The relative ratios, approximated by small integers, corresponding to the reciprocal values of the relative errors, were applied as the basic numbers for the division of the parameters. To test the quality of this method, the first step presented in this paper is aimed at wider intervals providing a less detailed classification (a smaller branching). In this step all the major streams (except of the northern branch of δ-Aquarids) were identified, confirming the efficiency of the procedure. After combining the related groups, 16 streams were identified. The search program also identifies widely spread Taurids. There are separated orbits pertinent to some minor streams such as the o-Draconids, κ
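
    The core of the Method of Indices is easy to make concrete: divide each parameter's observed range into equidistant intervals, assign each meteor the tuple of its interval indices, and treat meteors with identical tuples as mutually related. In the Python sketch below the interval counts are arbitrary placeholders, whereas the authors derive them from the relative errors of the catalogued parameters.

      # Sketch: grouping meteors by equidistant-interval index vectors.
      import numpy as np
      from collections import defaultdict

      def index_orbits(params: np.ndarray, n_bins: tuple) -> dict:
          # params: one row per meteor, one column per parameter (q, e, i, ...).
          groups = defaultdict(list)
          lo, hi = params.min(axis=0), params.max(axis=0)
          for row_id, row in enumerate(params):
              idx = tuple(
                  min(int((v - l) / (h - l) * b), b - 1)   # equidistant bin index
                  for v, l, h, b in zip(row, lo, hi, n_bins)
              )
              groups[idx].append(row_id)
          return {k: v for k, v in groups.items() if len(v) > 1}  # related meteors

      demo = np.array([[0.98, 0.65, 12.0], [0.97, 0.66, 12.4], [0.31, 0.90, 71.0]])
      print(index_orbits(demo, n_bins=(8, 8, 8)))   # meteors 0 and 1 share an index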

  15. Improving the Mapping of Smith-Waterman Sequence Database Searches onto CUDA-Enabled GPUs

    PubMed Central

    Huang, Liang-Tsung; Wu, Chao-Chin; Lai, Lien-Fu; Li, Yun-Ju

    2015-01-01

    Sequence alignment lies at the heart of bioinformatics. The Smith-Waterman algorithm is one of the key sequence search algorithms and has gained popularity due to improved implementations and rapidly increasing compute power. Recently, the Smith-Waterman algorithm has been successfully mapped onto the emerging general-purpose graphics processing units (GPUs). In this paper, we focus on how to improve the mapping, especially for short query sequences, by better usage of shared memory. We implemented and evaluated the proposed method on two different platforms (Tesla C1060 and Tesla K20) and compared it with two classic methods in CUDASW++. Further, the performance on different numbers of threads and blocks has been analyzed. The results show that the proposed method significantly improves the Smith-Waterman algorithm on CUDA-enabled GPUs given proper allocation of block and thread numbers. PMID:26339591
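
    For readers unfamiliar with the kernel being accelerated, a plain CPU version of Smith-Waterman local alignment scoring is sketched below. It uses a linear gap penalty for brevity, whereas the CUDASW++ implementations discussed above use affine gaps and extensive GPU-specific tiling.

      # Sketch: Smith-Waterman local alignment score (linear gap penalty).
      def smith_waterman(a: str, b: str, match=2, mismatch=-1, gap=-2) -> int:
          rows = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
          best = 0
          for i in range(1, len(a) + 1):
              for j in range(1, len(b) + 1):
                  diag = rows[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
                  # Local alignment: scores are floored at zero.
                  rows[i][j] = max(0, diag, rows[i-1][j] + gap, rows[i][j-1] + gap)
                  best = max(best, rows[i][j])
          return best

      print(smith_waterman('HEAGAWGHEE', 'PAWHEAE'))   # classic textbook pair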

  16. The search for planets around CM Draconis: Analysis of the 1994--96 database

    NASA Astrophysics Data System (ADS)

    Martin, E. L.; Deeg, H.; TEP Collaboration

    We present results of the international collaboration ``TEP'' aimed at monitoring photometrically the eclipsing binary CM Draconis in search of circumbinary planets. Planetary companions with more than ~2 earth radii should produce detectable occultations at the accuracy of our measurements. The observations started in spring 1994 and are still continuing with the participation of the following observatories: Haute Provence (France), Kourovka (Russia), Lick (USA), La Palma (Spain), Rochester (USA), Skinakas (Greece), Taejon (Korea), Teide (Spain) and Wise (Israel). We have obtained a homogeneous dataset based on a common set of reference stars. So far, we have obtained some 1,000 hours of effective integration time on CM Dra. The analysis of the 1994-96 observations is complete, and a list of about 20 events in the light curve has been obtained. These are being evaluated for their compatibility with possible planetary transits. Observations will be taken again in 1997 to determine periodicities among these events.

  17. A large database DNA sequence handling program with generalized searching specifications.

    PubMed

    Stockwell, P A

    1982-01-11

    The program described allows for the creation and manipulation of files of DNA sequence data of very great lengths. The program uses its own paging system to load segments of the sequence into a small internal buffer, so the program does not have excessive memory requirements. The program offers a menu of functions to the user and has been written to be forgiving of user errors. A code for the generalised specification of bases as a series of groups (e.g. A or T, purine, etc.) has been devised and can be used in search specifications or in sequence files. Versions of the program have been developed to run with special efficiency under DIGITAL's RT11 operating system or to run under systems with a suitable implementation of FORTRAN IV. PMID:7063398
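
    The generalized base codes described here correspond closely to what later became the standard IUPAC ambiguity alphabet. The Python sketch below searches a sequence with such degenerate codes; the code table is the IUPAC one, not the program's own (unspecified) encoding.

      # Sketch: searching DNA with degenerate (grouped) base codes.
      IUPAC = {'A': 'A', 'C': 'C', 'G': 'G', 'T': 'T',
               'R': 'AG', 'Y': 'CT', 'W': 'AT', 'S': 'CG',
               'K': 'GT', 'M': 'AC', 'B': 'CGT', 'D': 'AGT',
               'H': 'ACT', 'V': 'ACG', 'N': 'ACGT'}

      def find_degenerate(seq: str, pattern: str):
          hits = []
          for i in range(len(seq) - len(pattern) + 1):
              window = seq[i:i + len(pattern)]
              # Each sequence base must belong to the group named by the code.
              if all(base in IUPAC[p] for base, p in zip(window, pattern)):
                  hits.append(i)
          return hits

      print(find_degenerate('GAATTCTGGATCCA', 'GRATNC'))  # matches GAATTC and GGATCC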

  18. Chemical and biological warfare: Protection, decontamination, and disposal. (Latest citations from the NTIS database). Published Search

    SciTech Connect

    Not Available

    1993-04-01

    The bibliography contains citations concerning the means to defend against chemical and biological agents used in military operations, and to eliminate the effects of such agents on personnel, equipment, and grounds. Protection is accomplished through protective clothing and masks, and in buildings and shelters through filtration. Elimination of effects includes decontamination and removal of the agents from clothing, equipment, buildings, grounds, and water, using chemical deactivation, incineration, and controlled disposal of material in injection wells and ocean dumping. Other Published Searches in this series cover chemical warfare detection; defoliants; general studies; biochemistry and therapy; and biology, chemistry, and toxicology associated with chemical warfare agents. (Contains 250 citations and includes a subject term index and title list.)

  19. ANDY: A general, fault-tolerant tool for database searching on computer clusters

    SciTech Connect

    Smith, Andrew; Chandonia, John-Marc; Brenner, Steven E.

    2005-12-21

    Summary: ANDY (seArch coordination aND analYsis) is a set of Perl programs and modules for distributing large biological database searches, and in general any sequence of commands, across the nodes of a Linux computer cluster. ANDY is compatible with several commonly used Distributed Resource Management (DRM) systems, and it can be easily extended to new DRMs. A distinctive feature of ANDY is the choice of either dedicated or fair-use operation: ANDY is almost as efficient as single-purpose tools that require a dedicated cluster, but it runs on a general-purpose cluster along with any other jobs scheduled by a DRM. Other features include communication through named pipes for performance, flexible customizable routines for error-checking and summarizing results, and multiple fault-tolerance mechanisms. Availability: ANDY is freely available and may be obtained from http://compbio.berkeley.edu/proj/andy; this site also contains supplemental data and figures and a more detailed overview of the software.

  20. Improving the Mapping of Smith-Waterman Sequence Database Searches onto CUDA-Enabled GPUs.

    PubMed

    Huang, Liang-Tsung; Wu, Chao-Chin; Lai, Lien-Fu; Li, Yun-Ju

    2015-01-01

    Sequence alignment lies at the heart of bioinformatics. The Smith-Waterman algorithm is one of the key sequence search algorithms and has gained popularity due to improved implementations and rapidly increasing compute power. Recently, the Smith-Waterman algorithm has been successfully mapped onto the emerging general-purpose graphics processing units (GPUs). In this paper, we focus on how to improve the mapping, especially for short query sequences, by better usage of shared memory. We implemented and evaluated the proposed method on two different platforms (Tesla C1060 and Tesla K20) and compared it with two classic methods in CUDASW++. Further, the performance on different numbers of threads and blocks has been analyzed. The results show that the proposed method significantly improves the Smith-Waterman algorithm on CUDA-enabled GPUs given proper allocation of block and thread numbers. PMID:26339591

  1. Uploading, Searching and Visualizing of Paleomagnetic and Rock Magnetic Data in the Online MagIC Database

    NASA Astrophysics Data System (ADS)

    Minnett, R.; Koppers, A.; Tauxe, L.; Constable, C.; Donadini, F.

    2007-12-01

    The Magnetics Information Consortium (MagIC) is commissioned to implement and maintain an online portal to a relational database populated by both rock and paleomagnetic data. The goal of MagIC is to archive all available measurements and derived properties from paleomagnetic studies of directions and intensities, and for rock magnetic experiments (hysteresis, remanence, susceptibility, anisotropy). MagIC is hosted under EarthRef.org at http://earthref.org/MAGIC/ and will soon implement two search nodes, one for paleomagnetism and one for rock magnetism. Currently the PMAG node is operational. Both nodes provide query building based on location, reference, methods applied, material type and geological age, as well as a visual map interface to browse and select locations. Users can also browse the database by data type or by data compilation to view all contributions associated with well known earlier collections like PINT, GMPDB or PSVRL. The query result set is displayed in a digestible tabular format allowing the user to descend from locations to sites, samples, specimens and measurements. At each stage, the result set can be saved and, where appropriate, can be visualized by plotting global location maps, equal area, XY, age, and depth plots, or typical Zijderveld, hysteresis, magnetization and remanence diagrams. User contributions to the MagIC database are critical to achieving a useful research tool. We have developed a standard data and metadata template (version 2.3) that can be used to format and upload all data at the time of publication in Earth Science journals. Software tools are provided to facilitate population of these templates within Microsoft Excel. These tools allow for the import/export of text files and provide advanced functionality to manage and edit the data, and to perform various internal checks to maintain data integrity and prepare for uploading. The MagIC Contribution Wizard at http://earthref.org/MAGIC/upload.htm executes the upload

  2. Similarity landscapes: An improved method for scientific visualization of information from protein and DNA database searches

    SciTech Connect

    Dogget, N.; Myers, G.; Wills, C.J.

    1998-12-01

    This is the final report of a three-year, Laboratory Directed Research and Development (LDRD) project at the Los Alamos National Laboratory (LANL). The authors have used computer simulations and examination of a variety of databases to address a wide range of evolutionary questions. The authors have found that there is a clear distinction in the evolution of HIV-1 and HIV-2, with the former and more virulent virus evolving more rapidly at a functional level. The authors have discovered highly non-random patterns in the evolution of HIV-1 that can be attributed to a variety of selective pressures. In the course of examination of microsatellite DNA (short repeat regions) in microorganisms, the authors have found clear differences between prokaryotes and eukaryotes in their distribution, differences that can be tied to different selective pressures. They have developed a new method (topiary pruning) for enhancing the phylogenetic information contained in DNA sequences. Most recently, the authors have discovered effects in complex rainforest ecosystems that indicate strong frequency-dependent interactions between host species and their parasites, leading to the maintenance of ecosystem variability.

  3. PhenoMeter: A Metabolome Database Search Tool Using Statistical Similarity Matching of Metabolic Phenotypes for High-Confidence Detection of Functional Links

    PubMed Central

    Carroll, Adam J.; Zhang, Peng; Whitehead, Lynne; Kaines, Sarah; Tcherkez, Guillaume; Badger, Murray R.

    2015-01-01

    This article describes PhenoMeter (PM), a new type of metabolomics database search that accepts metabolite response patterns as queries and searches the MetaPhen database of reference patterns for responses that are statistically significantly similar or inverse for the purposes of detecting functional links. To identify a similarity measure that would detect functional links as reliably as possible, we compared the performance of four statistics in correctly top-matching metabolic phenotypes of Arabidopsis thaliana metabolism mutants affected in different steps of the photorespiration metabolic pathway to reference phenotypes of mutants affected in the same enzymes by independent mutations. The best performing statistic, the PM score, was a function of both Pearson correlation and Fisher’s Exact Test of directional overlap. This statistic outperformed Pearson correlation, biweight midcorrelation and Fisher’s Exact Test used alone. To demonstrate general applicability, we show that the PM reliably retrieved the most closely functionally linked response in the database when queried with responses to a wide variety of environmental and genetic perturbations. Attempts to match metabolic phenotypes between independent studies were met with varying success and possible reasons for this are discussed. Overall, our results suggest that integration of pattern-based search tools into metabolomics databases will aid functional annotation of newly recorded metabolic phenotypes analogously to the way sequence similarity search algorithms have aided the functional annotation of genes and proteins. PM is freely available at MetabolomeExpress (https://www.metabolome-express.org/phenometer.php). PMID:26284240
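
    A score in the same spirit as the PM score, combining Pearson correlation with a Fisher's Exact Test of directional overlap, can be sketched as below. The exact combination rule and the even-chance reference table are illustrative guesses on our part, not the published definition of the PM score.

      # Sketch: a PM-like similarity score for two metabolite response patterns.
      import numpy as np
      from scipy.stats import pearsonr, fisher_exact

      def pm_like_score(query: np.ndarray, reference: np.ndarray) -> float:
          r, _ = pearsonr(query, reference)                  # overall shape similarity
          same = int(np.sum(np.sign(query) == np.sign(reference)))
          half = len(query) // 2
          # 2x2 table: observed same/opposite direction counts vs even chance.
          _, p = fisher_exact([[same, len(query) - same], [half, half]])
          return r * -np.log10(max(p, 1e-300))               # correlation weighted by overlap evidence

      q = np.array([1.2, -0.8, 2.1, -1.5, 0.9, -0.3])
      ref = q + np.random.default_rng(1).normal(0.0, 0.2, q.size)
      print(pm_like_score(q, ref))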

  4. Exploring Site-Specific N-Glycosylation Microheterogeneity of Haptoglobin using Glycopeptide CID Tandem Mass Spectra and Glycan Database Search

    PubMed Central

    Chandler, Kevin Brown; Pompach, Petr; Goldman, Radoslav

    2013-01-01

    Glycosylation is a common protein modification with a significant role in many vital cellular processes and human diseases, making the characterization of protein-attached glycan structures important for understanding cell biology and disease processes. Direct analysis of protein N-glycosylation by tandem mass spectrometry of glycopeptides promises site-specific elucidation of N-glycan microheterogeneity, something which detached N-glycan and de-glycosylated peptide analyses cannot provide. However, successful implementation of direct N-glycopeptide analysis by tandem mass spectrometry remains a challenge. In this work, we consider algorithmic techniques for the analysis of LC-MS/MS data acquired from glycopeptide-enriched fractions of enzymatic digests of purified proteins. We implement a computational strategy which takes advantage of the properties of CID fragmentation spectra of N-glycopeptides, matching the MS/MS spectra to peptide-glycan pairs from protein sequences and glycan structure databases. Significantly, we also propose a novel false-discovery-rate estimation technique to estimate and manage the number of false identifications. We use a human glycoprotein standard, haptoglobin, digested with trypsin and GluC, enriched for glycopeptides using HILIC chromatography, and analyzed by LC-MS/MS to demonstrate our algorithmic strategy and evaluate its performance. Our software, GlycoPeptideSearch (GPS), assigned glycopeptide identifications to 246 of the spectra at false-discovery-rate 5.58%, identifying 42 distinct haptoglobin peptide-glycan pairs at each of the four haptoglobin N-linked glycosylation sites. We further demonstrate the effectiveness of this approach by analyzing plasma-derived haptoglobin, identifying 136 N-linked glycopeptide spectra at false-discovery-rate 0.4%, representing 15 distinct glycopeptides on at least three of the four N-linked glycosylation sites. The software, GlycoPeptideSearch, is available for download from http

  5. Exploring site-specific N-glycosylation microheterogeneity of haptoglobin using glycopeptide CID tandem mass spectra and glycan database search.

    PubMed

    Chandler, Kevin Brown; Pompach, Petr; Goldman, Radoslav; Edwards, Nathan

    2013-08-01

    Glycosylation is a common protein modification with a significant role in many vital cellular processes and human diseases, making the characterization of protein-attached glycan structures important for understanding cell biology and disease processes. Direct analysis of protein N-glycosylation by tandem mass spectrometry of glycopeptides promises site-specific elucidation of N-glycan microheterogeneity, something that detached N-glycan and deglycosylated peptide analyses cannot provide. However, successful implementation of direct N-glycopeptide analysis by tandem mass spectrometry remains a challenge. In this work, we consider algorithmic techniques for the analysis of LC-MS/MS data acquired from glycopeptide-enriched fractions of enzymatic digests of purified proteins. We implement a computational strategy that takes advantage of the properties of CID fragmentation spectra of N-glycopeptides, matching the MS/MS spectra to peptide-glycan pairs from protein sequences and glycan structure databases. Significantly, we also propose a novel false discovery rate estimation technique to estimate and manage the number of false identifications. We use a human glycoprotein standard, haptoglobin, digested with trypsin and GluC, enriched for glycopeptides using HILIC chromatography, and analyzed by LC-MS/MS to demonstrate our algorithmic strategy and evaluate its performance. Our software, GlycoPeptideSearch (GPS), assigned glycopeptide identifications to 246 of the spectra at a false discovery rate of 5.58%, identifying 42 distinct haptoglobin peptide-glycan pairs at each of the four haptoglobin N-linked glycosylation sites. We further demonstrate the effectiveness of this approach by analyzing plasma-derived haptoglobin, identifying 136 N-linked glycopeptide spectra at a false discovery rate of 0.4%, representing 15 distinct glycopeptides on at least three of the four N-linked glycosylation sites. The software, GlycoPeptideSearch, is available for download from http
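
    The false-discovery-rate idea underlying GPS can be illustrated with the generic target-decoy estimate from which such glycopeptide-specific techniques depart: the decoy count above a score threshold serves as a proxy for the number of false target matches. The record's estimator is more involved, and the numbers below are illustrative only.

      # Sketch: generic target-decoy FDR estimate at a score threshold.
      def fdr_at_threshold(ids, threshold):
          # ids: (score, is_decoy) pairs for all identifications.
          targets = sum(1 for score, is_decoy in ids if score >= threshold and not is_decoy)
          decoys = sum(1 for score, is_decoy in ids if score >= threshold and is_decoy)
          return decoys / max(targets, 1)

      ids = [(31.2, False), (28.9, False), (27.5, True), (25.0, False), (24.1, True)]
      print(fdr_at_threshold(ids, 25.0))  # 1 decoy / 3 targets ~ 0.33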

  6. DESIGNING ENVIRONMENTAL MONITORING DATABASES FOR STATISTIC ASSESSMENT

    EPA Science Inventory

    Databases designed for statistical analyses have characteristics that distinguish them from databases intended for general use. EMAP uses a probabilistic sampling design to collect data to produce statistical assessments of environmental conditions. In addition to supporting the ...

  7. Astrobiological complexity with probabilistic cellular automata.

    PubMed

    Vukotić, Branislav; Ćirković, Milan M

    2012-08-01

    The search for extraterrestrial life and intelligence constitutes one of the major endeavors in science, but it has so far been quantitatively modeled only rarely, and then in a cursory and superficial fashion. We argue that probabilistic cellular automata (PCA) represent the best quantitative framework for modeling the astrobiological history of the Milky Way and its Galactic Habitable Zone. The relevant astrobiological parameters are to be modeled as the elements of the input probability matrix for the PCA kernel. With the underlying simplicity of the cellular automata constructs, this approach enables a quick analysis of the large and ambiguous space of input parameters. We perform a simple clustering analysis of typical astrobiological histories with "Copernican" choice of input parameters and discuss the relevant boundary conditions of practical importance for planning and guiding empirical astrobiological and SETI projects. In addition to showing how the present framework is adaptable to more complex situations and updated observational databases from current and near-future space missions, we demonstrate how numerical results could offer a cautious rationale for continuation of practical SETI searches. PMID:22832998

  8. Astrobiological Complexity with Probabilistic Cellular Automata

    NASA Astrophysics Data System (ADS)

    Vukotić, Branislav; Ćirković, Milan M.

    2012-08-01

    The search for extraterrestrial life and intelligence constitutes one of the major endeavors in science, but it has as yet been quantitatively modeled only rarely, and then in a cursory and superficial fashion. We argue that probabilistic cellular automata (PCA) represent the best quantitative framework for modeling the astrobiological history of the Milky Way and its Galactic Habitable Zone. The relevant astrobiological parameters are to be modeled as the elements of the input probability matrix for the PCA kernel. With the underlying simplicity of the cellular automata constructs, this approach enables a quick analysis of the large and ambiguous space of input parameters. We perform a simple clustering analysis of typical astrobiological histories with a “Copernican” choice of input parameters and discuss the relevant boundary conditions of practical importance for planning and guiding empirical astrobiological and SETI projects. In addition to showing how the present framework is adaptable to more complex situations and updated observational databases from current and near-future space missions, we demonstrate how numerical results could offer a cautious rationale for continuation of practical SETI searches.
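
    As a concrete illustration of the PCA formalism described in these two records (not the authors' code), each cell of a grid holds a discrete astrobiological state and is updated stochastically according to an input probability matrix. The states, probabilities, and neighbor rule below are invented for the example.

        import random

        # Toy probabilistic cellular automaton: 0 = no life, 1 = simple life,
        # 2 = complex life. P[s] lists transition probabilities out of state s;
        # the numbers are illustrative, not calibrated astrobiological inputs.
        P = {0: [0.95, 0.05, 0.00],
             1: [0.10, 0.80, 0.10],
             2: [0.05, 0.15, 0.80]}

        def step(grid):
            n = len(grid)
            new = [row[:] for row in grid]
            for i in range(n):
                for j in range(n):
                    probs = P[grid[i][j]][:]
                    # Any complex-life neighbor nudges the cell toward state 2
                    # (a crude stand-in for colonization/panspermia effects).
                    if any(grid[(i + di) % n][(j + dj) % n] == 2
                           for di in (-1, 0, 1) for dj in (-1, 0, 1)
                           if (di, dj) != (0, 0)):
                        probs = [0.9 * probs[0], probs[1],
                                 1.0 - 0.9 * probs[0] - probs[1]]
                    new[i][j] = random.choices((0, 1, 2), weights=probs)[0]
            return new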

  9. Evidential significance of automotive paint trace evidence using a pattern recognition based infrared library search engine for the Paint Data Query Forensic Database.

    PubMed

    Lavine, Barry K; White, Collin G; Allen, Matthew D; Fasasi, Ayuba; Weakley, Andrew

    2016-10-01

    A prototype library search engine has been further developed to search the infrared spectral libraries of the paint data query database to identify the line and model of a vehicle from the clear coat, surfacer-primer, and e-coat layers of an intact paint chip. For this study, search prefilters were developed from 1181 automotive paint systems spanning 3 manufacturers: General Motors, Chrysler, and Ford. The best match between each unknown and the spectra in the hit list generated by the search prefilters was identified using a cross-correlation library search algorithm that performed both a forward and backward search. In the forward search, spectra were divided into intervals and further subdivided into windows (which correspond to the time lags used for the comparison) within those intervals. The top five hits identified in each search window were compiled; a histogram was computed that summarized the frequency of occurrence for each library sample, with the IR spectra most similar to the unknown flagged. The backward search computed the frequency of occurrence of each line and model without regard to the identity of the individual spectra. Only those lines and models with a frequency of occurrence greater than or equal to 20% were included in the final hit list. If there was agreement between the forward and backward search results, the specific line and model common to both hit lists was always the correct assignment. Samples assigned to the same line and model by both searches are always well represented in the library and correlate well on an individual basis to specific library samples. For these samples, one can have confidence in the accuracy of the match. This was not the case for the results obtained using commercial library search algorithms, as the hit quality index scores for the top twenty hits were always greater than 99%. PMID:27474314
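
    The forward search's window-and-vote scheme can be sketched compactly; the zero-lag normalized cross-correlation scorer and equal-width windows below are simplifying assumptions rather than the PDQ engine's actual implementation.

        import numpy as np
        from collections import Counter

        def forward_search(unknown, library, n_windows=10, top_k=5):
            # Rank library spectra within each window by normalized
            # cross-correlation at zero lag (the real engine also scans
            # time lags inside each window), then histogram the top hits.
            votes = Counter()
            edges = np.linspace(0, len(unknown), n_windows + 1, dtype=int)
            for lo, hi in zip(edges[:-1], edges[1:]):
                u = unknown[lo:hi]
                scores = {}
                for name, spec in library.items():
                    v = spec[lo:hi]
                    denom = (np.linalg.norm(u) * np.linalg.norm(v)) or 1.0
                    scores[name] = float(np.dot(u, v)) / denom
                votes.update(sorted(scores, key=scores.get, reverse=True)[:top_k])
            # Most frequently top-ranked library spectra are flagged first.
            return votes.most_common()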

  10. Integrated approach using multistep enzyme digestion, TiO2 enrichment, and database search for in-depth phosphoproteomic profiling.

    PubMed

    Han, Dohyun; Jin, Jonghwa; Yu, Jiyoung; Kim, Kyunggon; Kim, Youngsoo

    2015-01-01

    Protein phosphorylation is a major PTM that regulates important cell signaling mechanisms. In-depth phosphoproteomic analysis provides a method of examining this complex interplay, yielding a mechanistic understanding of the cellular processes and pathogenesis of various diseases. However, the analysis of protein phosphorylation is challenging, due to the low concentration of phosphoproteins in highly complex mixtures and the high variability of phosphorylation sites. Thus, typical phosphoproteome studies that are based on MS require large amounts of starting material and extensive fractionation steps to reduce the sample complexity. To address this, we present a simple strategy (integrated multistep enzyme digestion, enrichment, and database search: iMEED) that improves coverage of the phosphoproteome from smaller sample amounts and is faster than other commonly used approaches. It is inexpensive and adaptable to low sample amounts and saves time and effort with regard to sample preparation and mass spectrometric analysis, allowing samples to be prepared without prefractionation or specific instruments, such as HPLC. All MS data have been deposited in the ProteomeXchange with identifier PXD001033 (http://proteomecentral.proteomexchange.org/dataset/PXD001033). PMID:25159016

  11. Assigning in vivo carbamylation and acetylation in human lens proteins using tandem mass spectrometry and database searching

    NASA Astrophysics Data System (ADS)

    Park, Zee-Yong; Sadygov, Rovshan; Clark, Judy M.; Clark, John I.; Yates, John R., III

    2007-01-01

    In this paper, we show that ion trap mass spectrometers can differentiate acetylation and carbamylation modifications based on database search results for a lens protein sample. These types of modifications are difficult to distinguish on ion trap instruments because of their lower resolution and mass accuracy. The results were corroborated by using accurate mass information derived from MALDI TOF MS analysis of eluted peptides from a duplicate capillary RPLC separation. Tandem mass spectra of lysine-carbamylated peptides were further verified by manual assignments of fragment ions and by the presence of characteristic fragment ions of carbamylated peptides. It was also observed that carbamylated peptides show a strong neutral loss of the carbamyl group in collision-induced dissociation (CID), a feature that can be diagnostic for carbamylation. In a lens tissue sample of a 67-year-old patient, 12 in vivo carbamylation sites were detected on 7 different lens proteins and 4 lysine acetylation sites were detected on 3 different lens proteins. Among the 12 in vivo carbamylation sites, 9 are novel in vivo carbamylation modification sites. Notably, the in vivo carbamylation of γS crystallin, βA4 crystallin, βB1 crystallin, and βB2 crystallin observed in this study has never been reported before.
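
    The near-isobaric character of the two modifications is easy to quantify from standard monoisotopic elemental masses: acetylation adds C2H2O and carbamylation adds HCNO. The sketch below uses only textbook mass values and shows why a low-resolution instrument can confuse a carbamyl peak with the first isotope peak of an acetyl peak.

        # Monoisotopic elemental masses (Da).
        H, C, N, O = 1.007825, 12.000000, 14.003074, 15.994915

        acetylation   = 2 * C + 2 * H + O   # C2H2O, about +42.0106 Da
        carbamylation = H + C + N + O       # HCNO,  about +43.0058 Da

        delta = carbamylation - acetylation # about 0.9952 Da
        c13_spacing = 1.003355              # gap to the first isotope peak

        print(f"acetylation   +{acetylation:.4f} Da")
        print(f"carbamylation +{carbamylation:.4f} Da")
        print(f"difference     {delta:.4f} Da vs 13C spacing {c13_spacing:.4f} Da")
        # At low resolution, a +43.0058 carbamyl peak is hard to separate
        # from the +1.0034 isotope of a +42.0106 acetyl peak.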

  12. Accelerating Smith-Waterman Alignment for Protein Database Search Using Frequency Distance Filtration Scheme Based on CPU-GPU Collaborative System

    PubMed Central

    Liu, Yu; Hong, Yang; Lin, Chun-Yuan; Hung, Che-Lun

    2015-01-01

    The Smith-Waterman (SW) algorithm has been widely utilized for searching biological sequence databases in bioinformatics. Recently, several works have adopted graphics cards with Graphics Processing Units (GPUs) and the associated CUDA model to enhance the performance of SW computations. However, these works mainly focused on protein database search using the intertask parallelization technique, using the GPU only to perform the SW computations one at a time. Hence, in this paper, we propose an efficient SW alignment method, called CUDA-SWfr, for protein database search that uses the intratask parallelization technique based on a CPU-GPU collaborative system. Before the SW computations are performed on the GPU, a CPU-side procedure applies the frequency distance filtration scheme (FDFS) to eliminate unnecessary alignments. The experimental results indicate that CUDA-SWfr runs 9.6 times and 96 times faster than the CPU-based SW method without and with FDFS, respectively. PMID:26568953
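
    For reference, the serial recurrence that CUDA-SWfr parallelizes is the classic Smith-Waterman local-alignment dynamic program, sketched below in plain Python with illustrative linear gap and match scores; the GPU striping and the FDFS prefilter are deliberately omitted.

        def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
            # H[i][j] is the best score of a local alignment ending at
            # a[i-1], b[j-1]; the 0 floor restarts poor alignments.
            rows, cols = len(a) + 1, len(b) + 1
            H = [[0] * cols for _ in range(rows)]
            best = 0
            for i in range(1, rows):
                for j in range(1, cols):
                    diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
                    H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
                    best = max(best, H[i][j])
            return best

        # Score one query against a small "database" of sequences.
        query = "AGCACACA"
        for seq in ("ACACACTA", "AGCACACC", "TTTTTTTT"):
            print(seq, smith_waterman(query, seq))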

  13. Accelerating Smith-Waterman Alignment for Protein Database Search Using Frequency Distance Filtration Scheme Based on CPU-GPU Collaborative System.

    PubMed

    Liu, Yu; Hong, Yang; Lin, Chun-Yuan; Hung, Che-Lun

    2015-01-01

    The Smith-Waterman (SW) algorithm has been widely utilized for searching biological sequence databases in bioinformatics. Recently, several works have adopted graphics cards with Graphics Processing Units (GPUs) and the associated CUDA model to enhance the performance of SW computations. However, these works mainly focused on protein database search using the intertask parallelization technique, using the GPU only to perform the SW computations one at a time. Hence, in this paper, we propose an efficient SW alignment method, called CUDA-SWfr, for protein database search that uses the intratask parallelization technique based on a CPU-GPU collaborative system. Before the SW computations are performed on the GPU, a CPU-side procedure applies the frequency distance filtration scheme (FDFS) to eliminate unnecessary alignments. The experimental results indicate that CUDA-SWfr runs 9.6 times and 96 times faster than the CPU-based SW method without and with FDFS, respectively. PMID:26568953

  14. MitoZoa 2.0: a database resource and search tools for comparative and evolutionary analyses of mitochondrial genomes in Metazoa

    PubMed Central

    D'Onorio de Meo, Paolo; D'Antonio, Mattia; Griggio, Francesca; Lupi, Renato; Borsani, Massimiliano; Pavesi, Giulio; Castrignanò, Tiziana; Pesole, Graziano; Gissi, Carmela

    2012-01-01

    The MITOchondrial genome database of metaZOAns (MitoZoa) is a public resource for comparative analyses of metazoan mitochondrial genomes (mtDNA) at both the sequence and genomic organizational levels. The main characteristics of the MitoZoa database are the careful revision of mtDNA entry annotations and the possibility of retrieving gene order and non-coding region (NCR) data in appropriate formats. The MitoZoa retrieval system enables basic and complex queries at various taxonomic levels using different search menus. MitoZoa 2.0 has been enhanced in several aspects, including: a re-annotation pipeline to check the correctness of protein-coding gene predictions; a standardized annotation of introns and of precursor ORFs whose functionality is post-transcriptionally recovered by RNA editing or programmed translational frameshifting; updates of taxon-related fields and a BLAST sequence similarity search tool. Database novelties and the definition of standard mtDNA annotation rules, together with the user-friendly retrieval system and the BLAST service, make MitoZoa a valuable resource for comparative and evolutionary analyses as well as a reference database to assist in the annotation of novel mtDNA sequences. MitoZoa is freely accessible at http://www.caspur.it/mitozoa. PMID:22123747

  15. Dietary Supplement Label Database (DSLD)

    MedlinePlus

    ... The Dietary Supplement Label Database (DSLD) is a joint project of the National ... participants in the latest survey in the DSLD database (NHANES): The search options: Quick Search, Browse Dietary ...

  16. Semi-automated identification of N-Glycopeptides by hydrophilic interaction chromatography, nano-reverse-phase LC-MS/MS, and glycan database search.

    PubMed

    Pompach, Petr; Chandler, Kevin B; Lan, Renny; Edwards, Nathan; Goldman, Radoslav

    2012-03-01

    Glycoproteins fulfill many indispensable biological functions, and changes in protein glycosylation have been observed in various diseases. Improved analytical methods are needed to allow a complete characterization of this complex and common post-translational modification. In this study, we present a workflow for the analysis of the microheterogeneity of N-glycoproteins that couples hydrophilic interaction and nano-reverse-phase C18 chromatography to tandem QTOF mass spectrometric analysis. A glycan database search program, GlycoPeptideSearch, was developed to match N-glycopeptide MS/MS spectra with glycopeptides composed of a glycan drawn from the GlycomeDB glycan structure database and a peptide from a user-specified set of potentially glycosylated peptides. Application of the workflow to human haptoglobin and hemopexin, two microheterogeneous N-glycoproteins, identified a total of 57 distinct site-specific glycoforms in the case of haptoglobin and 14 site-specific glycoforms of hemopexin. Using glycan oxonium ions and peptide-characteristic glycopeptide fragment ions and by collapsing topologically redundant glycans, the search software was able to make unique N-glycopeptide assignments for 51% of assigned spectra, with the remaining assignments primarily representing isobaric topological rearrangements. The optimized workflow, coupled with GlycoPeptideSearch, is expected to make high-throughput semiautomated glycopeptide identification feasible for a wide range of users. PMID:22239659
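
    The central matching step, pairing a candidate peptide with a candidate glycan so that their combined mass explains an observed precursor, can be sketched as follows. The residue-mass bookkeeping and tolerance handling are simplified assumptions, not the GlycoPeptideSearch implementation.

        def match_precursor(precursor_mass, peptides, glycans, tol_ppm=20.0):
            # peptides, glycans: dicts of name -> monoisotopic mass in Da
            # (glycan masses taken as residue masses, so the condensation
            # water is already accounted for). Returns candidate pairs
            # whose summed mass matches the precursor within tolerance.
            hits = []
            for pep, pm in peptides.items():
                for gly, gm in glycans.items():
                    total = pm + gm
                    ppm = abs(total - precursor_mass) / precursor_mass * 1e6
                    if ppm <= tol_ppm:
                        hits.append((pep, gly, total, ppm))
            return sorted(hits, key=lambda h: h[3])  # best mass fit first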

  17. CAZymes Analysis Toolkit (CAT): web service for searching and analyzing carbohydrate-active enzymes in a newly sequenced organism using CAZy database.

    PubMed

    Park, Byung H; Karpinets, Tatiana V; Syed, Mustafa H; Leuze, Michael R; Uberbacher, Edward C

    2010-12-01

    The Carbohydrate-Active Enzyme (CAZy) database provides a rich set of manually annotated enzymes that degrade, modify, or create glycosidic bonds. Despite rich and invaluable information stored in the database, software tools utilizing this information for annotation of newly sequenced genomes by CAZy families are limited. We have employed two annotation approaches to fill the gap between manually curated high-quality protein sequences collected in the CAZy database and the growing number of other protein sequences produced by genome or metagenome sequencing projects. The first approach is based on a similarity search against the entire nonredundant sequences of the CAZy database. The second approach performs annotation using links or correspondences between the CAZy families and protein family domains. The links were discovered using the association rule learning algorithm applied to sequences from the CAZy database. The approaches complement each other and in combination achieved high specificity and sensitivity when cross-evaluated with the manually curated genomes of Clostridium thermocellum ATCC 27405 and Saccharophagus degradans 2-40. The capability of the proposed framework to predict the function of unknown protein domains and of hypothetical proteins in the genome of Neurospora crassa is demonstrated. The framework is implemented as a Web service, the CAZymes Analysis Toolkit, and is available at http://cricket.ornl.gov/cgi-bin/cat.cgi. PMID:20696711
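
    The second CAT approach rests on mined associations between protein family domains and CAZy families. A minimal support/confidence computation of that flavor is sketched below; the data layout is invented for illustration and is not the CAT implementation.

        from collections import Counter

        def mine_rules(annotations, min_support=5, min_confidence=0.8):
            # annotations: iterable of (set_of_domains, cazy_family), one
            # entry per CAZy sequence with known family and domain content.
            domain_counts, pair_counts = Counter(), Counter()
            for domains, family in annotations:
                for d in domains:
                    domain_counts[d] += 1
                    pair_counts[(d, family)] += 1
            rules = []
            for (d, family), n in pair_counts.items():
                confidence = n / domain_counts[d]
                if n >= min_support and confidence >= min_confidence:
                    # Rule "domain d => CAZy family", with support and confidence.
                    rules.append((d, family, n, confidence))
            return sorted(rules, key=lambda r: -r[3])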

  18. Database search of spontaneous reports and pharmacological investigations on the sulfonylureas and glinides-induced atrophy in skeletal muscle

    PubMed Central

    Mele, Antonietta; Calzolaro, Sara; Cannone, Gianluigi; Cetrone, Michela; Conte, Diana; Tricarico, Domenico

    2014-01-01

    The ATP-sensitive K+ (KATP) channel is an emerging pathway in skeletal muscle atrophy, which is a comorbid condition in diabetes. The in vitro effects of the sulfonylureas and glinides were evaluated on the protein content/muscle weight, fiber viability, mitochondrial succinic dehydrogenase (SDH) activity, and channel currents in oxidative soleus (SOL), glycolytic/oxidative flexor digitorum brevis (FDB), and glycolytic extensor digitorum longus (EDL) muscle fibers of mice using biochemical and Cell Counting Kit-8 assays, image analysis, and patch-clamp techniques. The sulfonylureas were tolbutamide, glibenclamide, and glimepiride; the glinides were repaglinide and nateglinide. A search of the Food and Drug Administration Adverse Event Reporting System (FDA-AERS) database for atrophy-related signals associated with the use of these drugs in humans was performed. After 24 h of incubation, the drugs reduced the protein content/muscle weight and fiber viability more effectively in FDB and SOL than in EDL. The order of efficacy of the drugs in reducing the protein content in FDB was: repaglinide (EC50 = 5.21 × 10−6) ≥ glibenclamide (EC50 = 8.84 × 10−6) > glimepiride (EC50 = 2.93 × 10−5) > tolbutamide (EC50 = 1.07 × 10−4) > nateglinide (EC50 = 1.61 × 10−4); in SOL it was: repaglinide (EC50 = 7.15 × 10−5) ≥ glibenclamide (EC50 = 9.10 × 10−5) > nateglinide (EC50 = 1.80 × 10−4) ≥ tolbutamide (EC50 = 2.19 × 10−4) > glimepiride (EC50 = –). The drug-induced atrophy can be explained by KATP channel block and by enhancement of the mitochondrial SDH activity. In an 8-month period, muscle atrophy was reported in 0.27% of the glibenclamide reports in humans and in 0.022% of the reports for drugs other than sulfonylureas and glinides. No reports of atrophy were found for the other sulfonylureas and glinides in the FDA-AERS. Glibenclamide induces atrophy in animal experiments and in human patients. Glimepiride shows less potential for inducing

  19. The Object-analogue approach for probabilistic forecasting

    NASA Astrophysics Data System (ADS)

    Frediani, M. E.; Hopson, T. M.; Anagnostou, E. N.; Hacker, J.

    2015-12-01

    The object-analogue is a new method to estimate forecast uncertainty and to derive probabilistic predictions of gridded forecast fields over larger regions rather than point locations. The method has been developed for improving the forecast of 10-meter wind speed over the northeast US, and it can be extended to other forecast variables, vertical levels, and other regions. The object-analogue approach combines the analog post-processing technique (Hopson 2005; Hamill 2006; Delle Monache 2011) with the Method for Object-based Diagnostic Evaluation (MODE) used in forecast verification (Davis et al 2006a, b). MODE was originally used mainly to verify precipitation forecasts, representing features of a forecast region as objects. The analog technique is used to reduce the NWP systematic and random errors of a gridded forecast field. In this study we use MODE-derived objects to characterize wind field forecasts by attributes such as object area, centroid location, and intensity percentiles, and apply the analogue concept to these objects. The object-analogue method uses a database of objects derived from reforecasts and their respective reanalyses. Given a real-time forecast field, it searches the database and selects the top-ranked objects with the most similar set of attributes using the MODE fuzzy logic algorithm for object matching. The attribute probabilities obtained with the set of selected object-analogues are used to derive a multi-layer probabilistic prediction. The attribute probabilities are combined into three uncertainty layers that address the main concerns of most applications: location, area, and magnitude. The multi-layer uncertainty can be weighted and combined, or used independently, such that it provides a more accurate prediction adjusted according to the application interest. In this study we present preliminary results of the object-analogue method. Using a database with one hundred storms we perform a leave-one-out cross-validation to
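
    A stripped-down version of the analogue lookup, ranking stored objects by distance over normalized attribute vectors (centroid location, area, intensity percentiles), is sketched below; the attribute set and the plain Euclidean distance are assumptions standing in for MODE's fuzzy-logic matching.

        import numpy as np

        def top_analogues(query, database, k=5, weights=None):
            # query: 1-D attribute vector; database: 2-D array with one
            # stored object per row, same attribute order as the query.
            db = np.asarray(database, dtype=float)
            q = np.asarray(query, dtype=float)
            # Scale each attribute by its spread so units don't dominate.
            scale = db.std(axis=0)
            scale[scale == 0] = 1.0
            w = np.ones_like(q) if weights is None else np.asarray(weights, float)
            d = np.sqrt(((((db - q) / scale) ** 2) * w).sum(axis=1))
            return np.argsort(d)[:k]  # indices of the k nearest analogues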

  20. Familial searching: a specialist forensic DNA profiling service utilising the National DNA Database to identify unknown offenders via their relatives--the UK experience.

    PubMed

    Maguire, C N; McCallum, L A; Storey, C; Whitaker, J P

    2014-01-01

    The National DNA Database (NDNAD) of England and Wales was established on April 10th 1995. The NDNAD is governed by a variety of legislative instruments that mean that DNA samples can be taken if an individual is arrested and detained in a police station. The biological samples and the DNA profiles derived from them can be used for purposes related to the prevention and detection of crime, the investigation of an offence and for the conduct of a prosecution. Following the South East Asian Tsunami of December 2004, the legislation was amended to allow the use of the NDNAD to assist in the identification of a deceased person or of a body part where death has occurred from natural causes or from a natural disaster. The UK NDNAD now contains the DNA profiles of approximately 6 million individuals representing 9.6% of the UK population. As the science of DNA profiling advanced, the National DNA Database provided a potential resource for increased intelligence beyond the direct matching for which it was originally created. The familial searching service offered to the police by several UK forensic science providers exploits the size and geographic coverage of the NDNAD and the fact that close relatives of an offender may share a significant proportion of that offender's DNA profile and will often reside in close geographic proximity to him or her. Between 2002 and 2011 Forensic Science Service Ltd. (FSS) provided familial search services to support 188 police investigations, 70 of which are still active cases. This technique, which may be used in serious crime cases or in 'cold case' reviews when there are few or no investigative leads, has led to the identification of 41 perpetrators or suspects. In this paper we discuss the processes, utility, and governance of the familial search service in which the NDNAD is searched for close genetic relatives of an offender who has left DNA evidence at a crime scene, but whose DNA profile is not represented within the NDNAD. We

  1. Savvy Searching.

    ERIC Educational Resources Information Center

    Jacso, Peter

    2002-01-01

    Explains desktop metasearch engines, which search the databases of several search engines simultaneously. Reviews two particular versions, the Copernic 2001 Pro and the BullsEye Pro 3, comparing costs, subject categories, display capabilities, and layout for presenting results. (LRW)

  2. HMMER web server: interactive sequence similarity searching

    PubMed Central

    Finn, Robert D.; Clements, Jody; Eddy, Sean R.

    2011-01-01

    HMMER is a software suite for protein sequence similarity searches using probabilistic methods. Previously, HMMER has mainly been available only as a computationally intensive UNIX command-line tool, restricting its use. Recent advances in the software, HMMER3, have resulted in a 100-fold speed gain relative to previous versions. It is now feasible to make efficient profile hidden Markov model (profile HMM) searches via the web. A HMMER web server (http://hmmer.janelia.org) has been designed and implemented such that most protein database searches return within a few seconds. Methods are available for searching either a single protein sequence, multiple protein sequence alignment or profile HMM against a target sequence database, and for searching a protein sequence against Pfam. The web server is designed to cater to users with a range of expertise and accepts batch uploading of multiple queries at once. All search methods are also available as RESTful web services, thereby allowing them to be readily integrated as remotely executed tasks in locally scripted workflows. We have focused on minimizing search times and the ability to rapidly display tabular results, regardless of the number of matches found, developing graphical summaries of the search results to provide quick, intuitive appraisal of them. PMID:21593126

  3. Genome databases

    SciTech Connect

    Courteau, J.

    1991-10-11

    Since the Genome Project began several years ago, a plethora of databases have been developed or are in the works. They range from the massive Genome Data Base at Johns Hopkins University, the central repository of all gene mapping information, to small databases focusing on single chromosomes or organisms. Some are publicly available, others are essentially private electronic lab notebooks. Still others limit access to a consortium of researchers working on, say, a single human chromosome. An increasing number incorporate sophisticated search and analytical software, while others operate as little more than data lists. In consultation with numerous experts in the field, a list has been compiled of some key genome-related databases. The list was not limited to map and sequence databases but also included the tools investigators use to interpret and elucidate genetic data, such as protein sequence and protein structure databases. Because a major goal of the Genome Project is to map and sequence the genomes of several experimental animals, including E. coli, yeast, fruit fly, nematode, and mouse, the available databases for those organisms are listed as well. The author also includes several databases that are still under development - including some ambitious efforts that go beyond data compilation to create what are being called electronic research communities, enabling many users, rather than just one or a few curators, to add or edit the data and tag it as raw or confirmed.

  4. VIEWCACHE: An incremental database access method for autonomous interoperable databases

    NASA Technical Reports Server (NTRS)

    Roussopoulos, Nick; Sellis, Timoleon

    1991-01-01

    The objective is to illustrate the concept of incremental access to distributed databases. An experimental database management system, ADMS, which has been developed at the University of Maryland, in College Park, uses VIEWCACHE, a database access method based on incremental search. VIEWCACHE is a pointer-based access method that provides a uniform interface for accessing distributed databases and catalogues. The compactness of the pointer structures formed during database browsing and the incremental access method allow the user to search and do inter-database cross-referencing with no actual data movement between database sites. Once the search is complete, the collected pointers to the desired data are dereferenced.

  5. Self-Service Computerized Bibliographic Retrieval: A Comparison of Colleague and PaperChase, Programs That Search the MEDLINE Database

    PubMed Central

    Porter, Douglas; Wigton, Robert S.; Reidelbach, Marie A.; Bleich, Howard L.; Slack, Warner V.

    1988-01-01

    Colleague and PaperChase are the two most widely used computer systems designed for clinicians and scientists who wish to search the National Library of Medicine's MEDLINE database of biomedical references. The present study compares the performance of these two systems. Two matched groups of second-year medical students each received three hours of instruction, one group in Colleague, the other in PaperChase. Each student then attempted 10 test searches. The next day the groups were reversed, and each student attempted five additional searches. During the 3.5 hours allocated for searching, users of Colleague attempted 64 test searches and retrieved 326 target references; users of PaperChase attempted 78 searches and retrieved 496. Users of Colleague took a mean of 2.2 minutes and spent a mean of $1.20 to find each target reference; users of PaperChase took 1.6 minutes and spent $0.92. We conclude that after limited training, medical students find more references faster and at lower cost with PaperChase than with Colleague.

  6. Comparing the Precision of Information Retrieval of MeSH-Controlled Vocabulary Search Method and a Visual Method in the Medline Medical Database

    PubMed Central

    Hariri, Nadjla; Ravandi, Somayyeh Nadi

    2014-01-01

    Background: Medline is one of the most important databases in the biomedical field. One of the most important hosts for Medline is Elton B. Stephens Co. (EBSCO), which has presented different search methods that can be used based on the needs of the users. Visual search and MeSH-controlled search methods are among the most common methods. The goal of this research was to compare the precision of the retrieved sources in the EBSCO Medline host using MeSH-controlled and visual search methods. Methods: This research was a semi-empirical study. By holding training workshops, 70 students of higher education in different educational departments of Kashan University of Medical Sciences were taught MeSH-controlled and visual search methods in 2012. Then, the precision of 300 searches made by these students was calculated based on Best Precision, Useful Precision, and Objective Precision formulas and analyzed in SPSS software using the independent-samples t-test, and the three precisions obtained with the three precision formulas were studied for the two search methods. Results: The mean precision of the visual method was greater than that of the MeSH-controlled search for all three types of precision, i.e. Best Precision, Useful Precision, and Objective Precision, and their mean precisions were significantly different (P < 0.001). Sixty-five percent of the researchers indicated that, although the visual method was better than the controlled method, the control of keywords in the controlled method resulted in finding more proper keywords for the searches. Fifty-three percent of the participants in the research also mentioned that the use of the combination of the two methods produced better results. Conclusion: For users, it is more appropriate to use a natural-language-based method, such as the visual method, in the EBSCO Medline host than to use the controlled method, which requires users to use special keywords. The potential reason for their preference was that the visual

  7. The Opera del Vocabolario Italiano Database: Full-Text Searching Early Italian Vernacular Sources on the Web.

    ERIC Educational Resources Information Center

    DuPont, Christian

    2001-01-01

    Introduces and describes the functions of the Opera del Vocabolario Italiano (OVI) database, a powerful Web-based, full-text, searchable electronic archive that contains early Italian vernacular texts whose composition may be dated prior to 1375. Examples are drawn from scholars in various disciplines who have employed the OVI in support of their…

  8. Identifying Gel-Separated Proteins Using In-Gel Digestion, Mass Spectrometry, and Database Searching: Consider the Chemistry

    ERIC Educational Resources Information Center

    Albright, Jessica C.; Dassenko, David J.; Mohamed, Essa A.; Beussman, Douglas J.

    2009-01-01

    Matrix-assisted laser desorption/ionization (MALDI) mass spectrometry is an important bioanalytical technique in drug discovery, proteomics, and research at the biology-chemistry interface. This is an especially powerful tool when combined with gel separation of proteins and database mining using the mass spectral data. Currently, few hands-on…

  9. Visualization Tools and Techniques for Search and Validation of Large Earth Science Spatial-Temporal Metadata Databases

    NASA Astrophysics Data System (ADS)

    Baskin, W. E.; Herbert, A.; Kusterer, J.

    2014-12-01

    Spatial-temporal metadata databases are critical components of interactive data discovery services for ordering Earth Science datasets. The development staff at the Atmospheric Science Data Center (ASDC) works closely with satellite Earth Science mission teams such as CERES, CALIPSO, TES, MOPITT, and CATS to create and maintain metadata databases that are tailored to the data discovery needs of the Earth Science community. This presentation focuses on the visualization tools and techniques used by the ASDC software development team for data discovery and validation/optimization of spatial-temporal objects in large multi-mission spatial-temporal metadata databases. The following topics will be addressed: optimizing the level of detail of spatial-temporal metadata to provide interactive spatial query performance over a multi-year Earth Science mission; generating appropriately scaled sensor-footprint gridded (raster) metadata from Level 1 and Level 2 satellite and aircraft time-series data granules; and comparing the performance of raster vs. vector spatial granule footprint mask queries in a large metadata database, with a description of the visualization tools used to assist with this analysis.

  10. Effect of cleavage enzyme, search algorithm and decoy database on mass spectrometric identification of wheat gluten proteins

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Tandem mass spectrometry (MS/MS) is routinely used to identify proteins by comparing peptide spectra to those generated in silico from protein sequence databases. Wheat storage proteins (gliadins and glutenins) are difficult to distinguish by MS/MS as they have few cleavable tryptic sites, often res...

  11. Application of probabilistic ordinal optimization concepts to a continuous-variable probabilistic optimization problem.

    SciTech Connect

    Romero, Vicente Jose; Ayon, Douglas V.; Chen, Chun-Hung

    2003-09-01

    A very general and robust approach to solving optimization problems involving probabilistic uncertainty is through the use of Probabilistic Ordinal Optimization. At each step in the optimization problem, improvement is based only on a relative ranking of the probabilistic merits of local design alternatives, rather than on crisp quantification of the alternatives. Thus, we simply ask the question: 'Is that alternative better or worse than this one?' to some level of statistical confidence we require, not: 'HOW MUCH better or worse is that alternative than this one?'. In this paper we illustrate an elementary application of probabilistic ordinal concepts in a 2-D optimization problem. Two uncertain variables contribute to uncertainty in the response function. We use a simple Coordinate Pattern Search non-gradient-based optimizer to step toward the statistical optimum in the design space. We also discuss more sophisticated implementations, and some of the advantages and disadvantages versus non-ordinal approaches for optimization under uncertainty.
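
    The ordinal question (is that alternative better or worse than this one, to some required confidence?) can be illustrated with a paired Monte Carlo comparison. The sampling model and the normal-approximation confidence interval below are illustrative assumptions, not the paper's procedure.

        import math
        import random

        def ordinal_compare(design_a, design_b, simulate, n=200, z=1.96):
            # simulate(design, u): response for one draw u of the two
            # uncertain variables; smaller responses are better here.
            wins = 0
            for _ in range(n):
                u = (random.gauss(0, 1), random.gauss(0, 1))
                if simulate(design_a, u) < simulate(design_b, u):
                    wins += 1
            p = wins / n
            half = z * math.sqrt(p * (1 - p) / n)  # CI half-width on p
            if p - half > 0.5:
                return "A better"
            if p + half < 0.5:
                return "B better"
            return "not distinguishable at this confidence"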

  12. What's My Substrate? Computational Function Assignment of Candida parapsilosis ADH5 by Genome Database Search, Virtual Screening, and QM/MM Calculations.

    PubMed

    Dhoke, Gaurao V; Ensari, Yunus; Davari, Mehdi D; Ruff, Anna Joëlle; Schwaneberg, Ulrich; Bocola, Marco

    2016-07-25

    Zinc-dependent medium-chain reductase from Candida parapsilosis can be used in the reduction of carbonyl compounds to pharmacologically important chiral secondary alcohols. To date, the nomenclature of cpADH5 differs in the literature (CPCR2/RCR/SADH), and its natural substrate is not known. In this study, we utilized a substrate-docking-based virtual screening method combined with searches of the KEGG and MetaCyc pathway databases and the Candida genome database for the discovery of natural substrates of cpADH5. The virtual screening of 7834 carbonyl compounds from the ZINC database provided 94 aldehydes or methyl/ethyl ketones as putative carbonyl substrates, out of which 52 carbonyl substrates of cpADH5 with catalytically active docking poses were identified by employing a mechanism-based substrate docking protocol. Comparison of the virtual screening results with the KEGG and MetaCyc database searches and Candida genome pathway analysis suggests that cpADH5 might be involved in the Ehrlich pathway (reduction of fusel aldehydes in leucine, isoleucine, and valine degradation). Our QM/MM calculations and experimental activity measurements confirmed that butyraldehyde substrates are the potential natural substrates of cpADH5, suggesting a carbonyl reductase role for this enzyme in butyraldehyde reduction in aliphatic amino acid degradation pathways. Phylogenetic tree analysis of known ADHs from Candida albicans shows that cpADH5 is close to caADH5. We therefore propose, according to the experimental substrate identification and sequence similarity, the common name butyraldehyde dehydrogenase cpADH5 for Candida parapsilosis CPCR2/RCR/SADH. PMID:27387009

  13. Local image descriptor-based searching framework of usable similar cases in a radiation treatment planning database for stereotactic body radiotherapy

    NASA Astrophysics Data System (ADS)

    Nonaka, Ayumi; Arimura, Hidetaka; Nakamura, Katsumasa; Shioyama, Yoshiyuki; Soufi, Mazen; Magome, Taiki; Honda, Hiroshi; Hirata, Hideki

    2014-03-01

    Radiation treatment planning (RTP) for stereotactic body radiotherapy (SBRT) is more complex than for conventional radiotherapy because a larger number of beam directions is used. We have reported that similar planning cases can help treatment planners who have less experience with SBRT determine beam directions. The aim of this study was to develop a framework for searching a RTP database for usable cases similar to an unplanned case, based on a local image descriptor. The proposed framework consists of three steps: searching, selection, and rearrangement. In the first step, the RTP database was searched for the 10 cases most similar to the object case based on the shape similarity of the two-dimensional lung region at the isocenter plane. In the second step, the 5 most similar cases were selected by using geometric features related to the location, size, and shape of the planning target volume, lung, and spinal cord. In the third step, the selected 5 cases were rearranged by use of the Euclidean distance of a local image descriptor, a similarity index based on the magnitudes and orientations of image gradients within a region of interest around the isocenter. It was assumed that the local image descriptor represents the information around lung tumors relevant to treatment planning. The cases selected by the proposed method as most similar to the test cases resembled them more closely in terms of tumor location than those selected by a conventional method. For evaluation of the proposed method, we applied a similar-cases-based beam arrangement method developed in a previous study to the similar cases selected by the proposed method based on a linear registration. The proposed method has the potential to suggest superior beam arrangements from a treatment point of view.
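
    A stripped-down version of such a descriptor and the rearrangement ranking is sketched below: a magnitude-weighted histogram of gradient orientations over the region of interest, compared by Euclidean distance. The histogram form is an assumption for illustration, not the authors' exact descriptor.

        import numpy as np

        def descriptor(roi, n_bins=8):
            # Magnitude-weighted histogram of gradient orientations over a
            # region of interest centered on the isocenter.
            gy, gx = np.gradient(roi.astype(float))
            mag = np.hypot(gx, gy)
            ang = np.arctan2(gy, gx)  # orientations in [-pi, pi]
            hist, _ = np.histogram(ang, bins=n_bins, range=(-np.pi, np.pi),
                                   weights=mag)
            total = hist.sum()
            return hist / total if total > 0 else hist

        def rank_cases(query_roi, case_rois):
            # Rearrange candidate planning cases by descriptor distance.
            q = descriptor(query_roi)
            dists = [np.linalg.norm(q - descriptor(c)) for c in case_rois]
            return np.argsort(dists)  # most similar cases first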

  14. [Method of traditional Chinese medicine formula design based on 3D-database pharmacophore search and patent retrieval].

    PubMed

    He, Yu-su; Sun, Zhi-yi; Zhang, Yan-ling

    2014-11-01

    Using a pharmacophore model of mineralocorticoid receptor antagonists as a starting point, this study explores a method of traditional Chinese medicine formula design for anti-hypertensive use. Pharmacophore models were generated by the 3D-QSAR pharmacophore (HypoGen) program of DS 3.5, based on a training set composed of 33 mineralocorticoid receptor antagonists. The best pharmacophore model consisted of two hydrogen-bond acceptors, three hydrophobic features, and four excluded volumes. The correlation coefficients for the training and test sets were 0.9534 and 0.6748, and the N and CAI values were 2.878 and 1.119, respectively. Database screening yielded 1700 active compounds from 86 source plants. Because traditional theory lacks an applicable anti-hypertensive medication strategy, this article takes advantage of patent retrieval in the world traditional medicine patent database in order to design the drug formula. Finally, two formulae were obtained for anti-hypertensive use. PMID:25850277

  15. Integration of an Evidence Base into a Probabilistic Risk Assessment Model. The Integrated Medical Model Database: An Organized Evidence Base for Assessing In-Flight Crew Health Risk and System Design

    NASA Technical Reports Server (NTRS)

    Saile, Lynn; Lopez, Vilma; Bickham, Grandin; FreiredeCarvalho, Mary; Kerstman, Eric; Byrne, Vicky; Butler, Douglas; Myers, Jerry; Walton, Marlei

    2011-01-01

    This slide presentation reviews the Integrated Medical Model (IMM) database, an organized evidence base for assessing in-flight crew health risk. The IMM database is a relational database accessible to many users. It quantifies the model inputs with a Level of Evidence (LOE) ranking, based on the highest value of the data, and a Quality of Evidence (QOE) score that assesses the evidence base for each medical condition. The IMM evidence base has already provided invaluable information for designers and for other uses.

  16. MapReduce Implementation of a Hybrid Spectral Library-Database Search Method for Large-Scale Peptide Identification

    SciTech Connect

    Kalyanaraman, Anantharaman; Cannon, William R.; Latt, Benjamin K.; Baxter, Douglas J.

    2011-11-01

    A MapReduce-based implementation called MR-MSPolygraph for parallelizing peptide identification from mass spectrometry data is presented. The underlying serial method, MSPolygraph, uses a novel hybrid approach to match an experimental spectrum against a combination of a protein sequence database and a spectral library. Our MapReduce implementation can run on any Hadoop cluster environment. Experimental results demonstrate that, relative to the serial version, MR-MSPolygraph reduces the time to solution from weeks to hours for processing tens of thousands of experimental spectra. Speedup and other related performance studies are also reported on a 400-core Hadoop cluster using spectral datasets from environmental microbial communities as inputs.
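
    The MapReduce dataflow is easy to sketch: map tasks score one experimental spectrum against candidate database entries, and the reduce step keeps the best-scoring candidate per spectrum. The toy scorer interface below stands in for MSPolygraph's hybrid sequence/spectral-library scoring, and the whole thing runs serially rather than on Hadoop.

        def map_phase(spectrum, candidates, score):
            # Emit (spectrum_id, (candidate_id, score)) for each candidate.
            sid, peaks = spectrum
            return [(sid, (cid, score(peaks, ref))) for cid, ref in candidates]

        def reduce_phase(pairs):
            # Keep the highest-scoring candidate per spectrum id.
            best = {}
            for sid, (cid, s) in pairs:
                if sid not in best or s > best[sid][1]:
                    best[sid] = (cid, s)
            return best

        def run(spectra, candidates, score):
            # Serial stand-in for the Hadoop job: map over all spectra,
            # then reduce the emitted pairs to one best match each.
            emitted = []
            for spec in spectra:
                emitted.extend(map_phase(spec, candidates, score))
            return reduce_phase(emitted)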

  17. Probabilistic Structural Analysis Program

    NASA Technical Reports Server (NTRS)

    Pai, Shantaram S.; Chamis, Christos C.; Murthy, Pappu L. N.; Stefko, George L.; Riha, David S.; Thacker, Ben H.; Nagpal, Vinod K.; Mital, Subodh K.

    2010-01-01

    NASA/NESSUS 6.2c is a general-purpose, probabilistic analysis program that computes probability of failure and probabilistic sensitivity measures of engineered systems. Because NASA/NESSUS uses highly computationally efficient and accurate analysis techniques, probabilistic solutions can be obtained even for extremely large and complex models. Once the probabilistic response is quantified, the results can be used to support risk-informed decisions regarding reliability for safety-critical and one-of-a-kind systems, as well as for maintaining a level of quality while reducing manufacturing costs for larger-quantity products. NASA/NESSUS has been successfully applied to a diverse range of problems in aerospace, gas turbine engines, biomechanics, pipelines, defense, weaponry, and infrastructure. This program combines state-of-the-art probabilistic algorithms with general-purpose structural analysis and lifing (life-prediction) methods to compute the probabilistic response and reliability of engineered structures. Uncertainties in load, material properties, geometry, boundary conditions, and initial conditions can be simulated. The structural analysis methods include non-linear finite-element methods, heat-transfer analysis, polymer/ceramic matrix composite analysis, monolithic (conventional metallic) materials life-prediction methodologies, boundary element methods, and user-written subroutines. Several probabilistic algorithms are available such as the advanced mean value method and the adaptive importance sampling method. NASA/NESSUS 6.2c is structured in a modular format with 15 elements.

  18. Chemical and biological warfare: Detection and warning systems. (Latest citations from the NTIS Bibliographic database). Published Search

    SciTech Connect

    Not Available

    1993-11-01

    The bibliography contains citations concerning the design and testing of samplers and detectors to provide identification and warning of the presence of chemical and biological agents used in military operations. The sampling techniques are applicable to air and water testing, and evaluation of personnel and equipment exposure. Techniques involve enzyme alarms, chromatography, conductivity meters, spectrophotometry, luminescence, and solid state microsensor devices. Other Published Searches in this series on chemical warfare cover protection, defoliants, general studies, and biological studies, including chemistry and toxicology. (Contains 250 citations and includes a subject term index and title list.)

  19. Detection and Identification of Heme c-Modified Peptides by Histidine Affinity Chromatography, High-Performance Liquid Chromatography-Mass Spectrometry, and Database Searching

    SciTech Connect

    Merkley, Eric D.; Anderson, Brian J.; Park, Jea H.; Belchik, Sara M.; Shi, Liang; Monroe, Matthew E.; Smith, Richard D.; Lipton, Mary S.

    2012-12-07

    Multiheme c-type cytochromes (proteins with covalently attached heme c moieties) play important roles in extracellular metal respiration in dissimilatory metal-reducing bacteria. Liquid chromatography-tandem mass spectrometry (LC-MS/MS) characterization of c-type cytochromes is hindered by the presence of multiple heme groups, since the heme c modified peptides are typically not observed or, if observed, not identified. Using a recently reported histidine affinity chromatography (HAC) procedure, we enriched heme c tryptic peptides from purified bovine heart cytochrome c and a bacterial decaheme cytochrome, and subjected these samples to LC-MS/MS analysis. Enriched bovine cytochrome c samples yielded three- to six-fold more confident peptide-spectrum matches to heme c-containing peptides than unenriched digests. In unenriched digests of the decaheme cytochrome MtoA from Sideroxydans lithotrophicus ES-1, heme c peptides for four of the ten expected sites were observed by LC-MS/MS; following HAC fractionation, peptides covering nine out of ten sites were obtained. Heme c peptide spiked into E. coli lysates at mass ratios as low as 10−4 was detected with good signal-to-noise after HAC and LC-MS/MS analysis. In addition to HAC, we have developed a proteomics database search strategy that takes into account the unique physicochemical properties of heme c peptides. The results suggest that accounting for the double thioether link between heme c and the peptide, and the use of the labile heme fragment as a reporter ion, can improve database searching results. The combination of affinity chromatography and heme-specific informatics yielded increases in the number of peptide-spectrum matches of 20- to 100-fold for bovine cytochrome c.

  20. Accelerated Profile HMM Searches

    PubMed Central

    Eddy, Sean R.

    2011-01-01

    Profile hidden Markov models (profile HMMs) and probabilistic inference methods have made important contributions to the theory of sequence database homology search. However, practical use of profile HMM methods has been hindered by the computational expense of existing software implementations. Here I describe an acceleration heuristic for profile HMMs, the “multiple segment Viterbi” (MSV) algorithm. The MSV algorithm computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment. MSV scores follow the same statistical distribution as gapped optimal local alignment scores, allowing rapid evaluation of significance of an MSV score and thus facilitating its use as a heuristic filter. I also describe a 20-fold acceleration of the standard profile HMM Forward/Backward algorithms using a method I call “sparse rescaling”. These methods are assembled in a pipeline in which high-scoring MSV hits are passed on for reanalysis with the full HMM Forward/Backward algorithm. This accelerated pipeline is implemented in the freely available HMMER3 software package. Performance benchmarks show that the use of the heuristic MSV filter sacrifices negligible sensitivity compared to unaccelerated profile HMM searches. HMMER3 is substantially more sensitive and 100- to 1000-fold faster than HMMER2. HMMER3 is now about as fast as BLAST for protein searches. PMID:22039361
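
    The core quantity behind the MSV filter, the best ungapped local alignment segment, reduces to a maximum-subarray scan along each diagonal of the comparison matrix. The serial sketch below illustrates the single-segment case; HMMER3's striped SIMD vectorization and the multiple-segment sum are beyond this illustration.

        def best_ungapped_segment(query, target, score):
            # score(a, b): residue substitution score (e.g., from BLOSUM62).
            # Kadane's maximum-subarray scan applied along every diagonal.
            best = 0
            for d in range(-len(query) + 1, len(target)):
                run = 0
                for i in range(len(query)):
                    j = i + d
                    if 0 <= j < len(target):
                        run = max(0, run + score(query[i], target[j]))
                        best = max(best, run)
            return best

        # Toy scoring: +2 for a match, -1 for a mismatch.
        print(best_ungapped_segment("HMMER", "PROGRAMMER",
                                    lambda a, b: 2 if a == b else -1))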

  1. Compression of Probabilistic XML Documents

    NASA Astrophysics Data System (ADS)

    Veldman, Irma; de Keijzer, Ander; van Keulen, Maurice

    Database techniques to store, query, and manipulate data that contains uncertainty are receiving increasing research interest. Such uncertain DBMSs (UDBMSs) can be classified according to their underlying data model: relational, XML, or RDF. We focus on uncertain XML DBMSs, with the Probabilistic XML model (PXML) of [10,9] as a representative example. The size of a PXML document is obviously a factor in performance. There are PXML-specific techniques to reduce the size, such as a push-down mechanism that produces equivalent but more compact PXML documents. It can only be applied, however, where possibilities are dependent. For normal XML documents there also exist several techniques for compressing a document. Since Probabilistic XML is (a special form of) normal XML, it might benefit from these methods even more. In this paper, we show that existing compression mechanisms can be combined with PXML-specific compression techniques. We also show that the best compression rates are obtained with a combination of a PXML-specific technique and a rather simple generic DAG-compression technique.
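
    Generic DAG-compression of an XML tree is essentially hash-consing: structurally identical subtrees are interned once and shared. The sketch below works on plain nested tuples and ignores PXML's probability annotations, so it illustrates only the generic technique, not the paper's combined method.

        def compress_dag(node, table=None):
            # node: (label, [children...]). Identical subtrees are interned
            # in 'table' and shared, turning the tree into a DAG.
            if table is None:
                table = {}
            label, children = node
            shared = tuple(compress_dag(child, table) for child in children)
            key = (label, shared)
            return table.setdefault(key, key)  # reuse an equal subtree if seen

        # Two identical <b><c/></b> subtrees under <a> collapse to one node.
        tree = ("a", [("b", [("c", [])]), ("b", [("c", [])])])
        dag = compress_dag(tree)
        print(dag[1][0] is dag[1][1])  # True: children are the same object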

  2. An evaluation for cross-species proteomics research by publicly available expressed sequence tag database search using tandem mass spectral data.

    PubMed

    Huang, Mei; Chen, Tong; Chan, ZhuLong

    2006-01-01

    With 1383 tandem mass spectra derived from 120 individual protein spots separated by two-dimensional (2-D) gel electrophoresis of protein samples from three different species, comparative analyses were performed by searching the Expressed Sequence Tag (EST) database (DB) and the NCBI non-redundant (nr) DB of green plants, respectively, using the Mascot search engine to establish a statistical basis. It was confirmed that the former could identify, in a statistically significant manner, more peptides manually validated by de novo sequencing (DNS) from fewer species with closer phylogenetic relationships than the latter. Our data demonstrated that some correct peptide identifications were given low Mascot scores (e.g. 6-14) while some incorrect peptide identifications were given high Mascot scores (e.g. 68-83). Our data also showed that the current evaluation approaches to protein assignments are unsatisfactory, because a few 'false-positive' proteins are recognized and several 'false-negative' proteins are rescued by manual validation. PMID:16941525

  3. High-throughput database search and large-scale negative polarity liquid chromatography-tandem mass spectrometry with ultraviolet photodissociation for complex proteomic samples.

    PubMed

    Madsen, James A; Xu, Hua; Robinson, Michelle R; Horton, Andrew P; Shaw, Jared B; Giles, David K; Kaoud, Tamer S; Dalby, Kevin N; Trent, M Stephen; Brodbelt, Jennifer S

    2013-09-01

    The use of ultraviolet photodissociation (UVPD) for the activation and dissociation of peptide anions is evaluated for broader coverage of the proteome. To facilitate interpretation and assignment of the resulting UVPD mass spectra of peptide anions, the MassMatrix database search algorithm was modified to allow automated analysis of negative polarity MS/MS spectra. The new UVPD algorithms were developed based on the MassMatrix database search engine by adding specific fragmentation pathways for UVPD. The new UVPD fragmentation pathways in MassMatrix were rigorously and statistically optimized using two large data sets with high mass accuracy and high mass resolution for both MS(1) and MS(2) data acquired on an Orbitrap mass spectrometer for complex Halobacterium and HeLa proteome samples. Negative mode UVPD led to the identification of 3663 and 2350 peptides for the Halo and HeLa tryptic digests, respectively, corresponding to 655 and 645 peptides that were unique when compared with electron transfer dissociation (ETD), higher energy collision-induced dissociation, and collision-induced dissociation results for the same digests analyzed in the positive mode. In sum, 805 and 619 proteins were identified via UVPD for the Halobacterium and HeLa samples, respectively, with 49 and 50 unique proteins identified in contrast to the more conventional MS/MS methods. The algorithm also features automated charge determination for low mass accuracy data, precursor filtering (including intact charge-reduced peaks), and the ability to combine both positive and negative MS/MS spectra into a single search, and it is freely open to the public. The accuracy and specificity of the MassMatrix UVPD search algorithm was also assessed for low resolution, low mass accuracy data on a linear ion trap. Analysis of a known mixture of three mitogen-activated kinases yielded similar sequence coverage percentages for UVPD of peptide anions versus conventional collision-induced dissociation of

  4. High-throughput Database Search and Large-scale Negative Polarity Liquid Chromatography–Tandem Mass Spectrometry with Ultraviolet Photodissociation for Complex Proteomic Samples*

    PubMed Central

    Madsen, James A.; Xu, Hua; Robinson, Michelle R.; Horton, Andrew P.; Shaw, Jared B.; Giles, David K.; Kaoud, Tamer S.; Dalby, Kevin N.; Trent, M. Stephen; Brodbelt, Jennifer S.

    2013-01-01

    The use of ultraviolet photodissociation (UVPD) for the activation and dissociation of peptide anions is evaluated for broader coverage of the proteome. To facilitate interpretation and assignment of the resulting UVPD mass spectra of peptide anions, the MassMatrix database search algorithm was modified to allow automated analysis of negative polarity MS/MS spectra. The new UVPD algorithms were developed based on the MassMatrix database search engine by adding specific fragmentation pathways for UVPD. The new UVPD fragmentation pathways in MassMatrix were rigorously and statistically optimized using two large data sets with high mass accuracy and high mass resolution for both MS1 and MS2 data acquired on an Orbitrap mass spectrometer for complex Halobacterium and HeLa proteome samples. Negative mode UVPD led to the identification of 3663 and 2350 peptides for the Halo and HeLa tryptic digests, respectively, corresponding to 655 and 645 peptides that were unique when compared with electron transfer dissociation (ETD), higher energy collision-induced dissociation, and collision-induced dissociation results for the same digests analyzed in the positive mode. In sum, 805 and 619 proteins were identified via UVPD for the Halobacterium and HeLa samples, respectively, with 49 and 50 unique proteins identified in contrast to the more conventional MS/MS methods. The algorithm also features automated charge determination for low mass accuracy data, precursor filtering (including intact charge-reduced peaks), and the ability to combine both positive and negative MS/MS spectra into a single search, and it is freely open to the public. The accuracy and specificity of the MassMatrix UVPD search algorithm was also assessed for low resolution, low mass accuracy data on a linear ion trap. Analysis of a known mixture of three mitogen-activated kinases yielded similar sequence coverage percentages for UVPD of peptide anions versus conventional collision-induced dissociation of

  5. Robots for hazardous duties: Military, space, and nuclear facility applications. (Latest citations from the NTIS bibliographic database). Published Search

    SciTech Connect

    1993-09-01

    The bibliography contains citations concerning the design and application of robots used in place of humans where the environment could be hazardous. Military applications include autonomous land vehicles, robotic howitzers, and battlefield support operations. Space operations include docking, maintenance, mission support, and intra-vehicular and extra-vehicular activities. Nuclear applications include operations within the containment vessel, radioactive waste operations, fueling operations, and plant security. Many of the articles reference control techniques and the use of expert systems in robotic operations. Applications involving industrial manufacturing, walking robots, and robot welding are cited in other published searches in this series. (Contains a minimum of 183 citations and includes a subject term index and title list.)

  6. Chemical and biological warfare: Protection, decontamination, and disposal. (Latest citations from the NTIS bibliographic database). Published Search

    SciTech Connect

    1997-11-01

    The bibliography contains citations concerning the means to defend against chemical and biological agents used in military operations, and to eliminate the effects of such agents on personnel, equipment, and grounds. Protection is accomplished through protective clothing and masks, and in buildings and shelters through filtration. Elimination of effects includes decontamination and removal of the agents from clothing, equipment, buildings, grounds, and water, using chemical deactivation, incineration, and controlled disposal of material in injection wells and ocean dumping. Other Published Searches in this series cover chemical warfare detection; defoliants; general studies; biochemistry and therapy; and biology, chemistry, and toxicology associated with chemical warfare agents. (Contains 50-250 citations and includes a subject term index and title list.) (Copyright NERAC, Inc. 1995)

  7. Chemical and biological warfare: Protection, decontamination, and disposal. (Latest citations from the NTIS bibliographic database). NewSearch

    SciTech Connect

    Not Available

    1994-10-01

    The bibliography contains citations concerning the means to defend against chemical and biological agents used in military operations, and to eliminate the effects of such agents on personnel, equipment, and grounds. Protection is accomplished through protective clothing and masks, and in buildings and shelters through filtration. Elimination of effects includes decontamination and removal of the agents from clothing, equipment, buildings, grounds, and water, using chemical deactivation, incineration, and controlled disposal of material in injection wells and ocean dumping. Other Published Searches in this series cover chemical warfare detection; defoliants; general studies; biochemistry and therapy; and biology, chemistry, and toxicology associated with chemical warfare agents. (Contains 250 citations and includes a subject term index and title list.)

  8. Chemical and biological warfare: Protection, decontamination, and disposal. (Latest citations from the NTIS bibliographic database). Published Search

    SciTech Connect

    1996-10-01

    The bibliography contains citations concerning the means to defend against chemical and biological agents used in military operations, and to eliminate the effects of such agents on personnel, equipment, and grounds. Protection is accomplished through protective clothing and masks, and in buildings and shelters through filtration. Elimination of effects includes decontamination and removal of the agents from clothing, equipment, buildings, grounds, and water, using chemical deactivation, incineration, and controlled disposal of material in injection wells and ocean dumping. Other Published Searches in this series cover chemical warfare detection; defoliants; general studies; biochemistry and therapy; and biology, chemistry, and toxicology associated with chemical warfare agents. (Contains 50-250 citations and includes a subject term index and title list.) (Copyright NERAC, Inc. 1995)

  9. Chemical and biological warfare: Protection, decontamination, and disposal. (Latest citations from the NTIS Bibliographic database). Published Search

    SciTech Connect

    Not Available

    1993-10-01

    The bibliography contains citations concerning the means to defend against chemical and biological agents used in military operations, and to eliminate the effects of such agents on personnel, equipment, and grounds. Protection is accomplished through protective clothing and masks, and in buildings and shelters through filtration. Elimination of effects includes decontamination and removal of the agents from clothing, equipment, buildings, grounds, and water, using chemical deactivation, incineration, and controlled disposal of material in injection wells and ocean dumping. Other Published Searches in this series cover chemical warfare detection; defoliants; general studies; biochemistry and therapy; and biology, chemistry, and toxicology associated with chemical warfare agents. (Contains 250 citations and includes a subject term index and title list.)

  10. Chemical and biological warfare: Protection, decontamination, and disposal. (Latest citations from the NTIS bibliographic database). Published Search

    SciTech Connect

    1995-09-01

    The bibliography contains citations concerning the means to defend against chemical and biological agents used in military operations, and to eliminate the effects of such agents on personnel, equipment, and grounds. Protection is accomplished through protective clothing and masks, and in buildings and shelters through filtration. Elimination of effects includes decontamination and removal of the agents from clothing, equipment, buildings, grounds, and water, using chemical deactivation, incineration, and controlled disposal of material in injection wells and ocean dumping. Other Published Searches in this series cover chemical warfare detection; defoliants; general studies; biochemistry and therapy; and biology, chemistry, and toxicology associated with chemical warfare agents. (Contains 50-250 citations and includes a subject term index and title list.) (Copyright NERAC, Inc. 1995)

  11. Chemical and biological warfare: Protection, decontamination, and disposal. (Latest citations from the NTIS bibliographic database). Published Search

    SciTech Connect

    Not Available

    1994-07-01

    The bibliography contains citations concerning the means to defend against chemical and biological agents used in military operations, and to eliminate the effects of such agents on personnel, equipment, and grounds. Protection is accomplished through protective clothing and masks, and in buildings and shelters through filtration. Elimination of effects includes decontamination and removal of the agents from clothing, equipment, buildings, grounds, and water, using chemical deactivation, incineration, and controlled disposal of material in injection wells and ocean dumping. Other Published Searches in this series cover chemical warfare detection; defoliants; general studies; biochemistry and therapy; and biology, chemistry, and toxicology associated with chemical warfare agents. (Contains 250 citations and includes a subject term index and title list.)

  12. Atomic Spectra Database (ASD)

    National Institute of Standards and Technology Data Gateway

    SRD 78 NIST Atomic Spectra Database (ASD) (Web, free access)   This database provides access and search capability for NIST critically evaluated, reasonably up-to-date data on atomic energy levels, wavelengths, and transition probabilities. The NIST Atomic Spectroscopy Data Center has carried out these critical compilations.

  13. Database in Artificial Intelligence.

    ERIC Educational Resources Information Center

    Wilkinson, Julia

    1986-01-01

    Describes a specialist bibliographic database of literature in the field of artificial intelligence created by the Turing Institute (Glasgow, Scotland) using the BRS/Search information retrieval software. The subscription method for end-users--i.e., annual fee entitles user to unlimited access to database, document provision, and printed awareness…

  14. CPDB: Carcinogenic Potency Database.

    PubMed

    Fitzpatrick, Roberta Bronson

    2008-01-01

    The Carcinogenic Potency Database reports analyses of animal cancer tests on 1,547 chemicals. These tests are used in support of cancer risk assessments for humans. Results are searchable and are made available via the National Library of Medicine's (NLM) TOXNET system. This column will provide background information on the database, as well as present search basics. PMID:19042710

  15. Probabilistic record linkage

    PubMed Central

    Sayers, Adrian; Ben-Shlomo, Yoav; Blom, Ashley W; Steele, Fiona

    2016-01-01

    Studies involving the use of probabilistic record linkage are becoming increasingly common. However, the methods underpinning probabilistic record linkage are not widely taught or understood, and therefore these studies can appear to be a ‘black box’ research tool. In this article, we aim to describe the process of probabilistic record linkage through a simple exemplar. We first introduce the concept of deterministic linkage and contrast this with probabilistic linkage. We illustrate each step of the process using a simple exemplar and describe the data structure required to perform a probabilistic linkage. We describe the process of calculating and interpreting match weights and how to convert match weights into posterior probabilities of a match using Bayes' theorem. We conclude this article with a brief discussion of some of the computational demands of record linkage, how you might assess the quality of your linkage algorithm, and how epidemiologists can maximize the value of their record-linked research using robust record linkage methods. PMID:26686842
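
    To make the weight calculation concrete, the sketch below computes Fellegi-Sunter-style match weights and converts them into a posterior match probability with Bayes' theorem, as the article describes. It is a minimal illustration rather than the authors' code: the field names and the m- and u-probabilities are invented for the example.

        import math

        # Illustrative m- and u-probabilities for three fields (hypothetical values):
        # m = P(field agrees | true match), u = P(field agrees | non-match).
        FIELDS = {
            "surname":    {"m": 0.95, "u": 0.01},
            "birth_year": {"m": 0.99, "u": 0.05},
            "postcode":   {"m": 0.90, "u": 0.02},
        }

        def match_weight(agreements):
            """Composite match weight: sum of log2 likelihood ratios over fields."""
            w = 0.0
            for field, p in FIELDS.items():
                if agreements[field]:
                    w += math.log2(p["m"] / p["u"])              # agreement weight
                else:
                    w += math.log2((1 - p["m"]) / (1 - p["u"]))  # disagreement weight
            return w

        def posterior_match_probability(weight, prior):
            """P(match | evidence) from the weight via Bayes' theorem on the odds scale."""
            posterior_odds = (prior / (1 - prior)) * 2 ** weight
            return posterior_odds / (1 + posterior_odds)

        # A candidate pair agreeing on surname and birth year but not postcode,
        # with one true match expected per 1,000 candidate pairs:
        w = match_weight({"surname": True, "birth_year": True, "postcode": False})
        print(w, posterior_match_probability(w, prior=1 / 1000))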

  16. Probabilistic microcell prediction model

    NASA Astrophysics Data System (ADS)

    Kim, Song-Kyoo

    2002-06-01

    A microcell is a cell with a radius of 1 km or less, suitable for heavily urbanized areas such as metropolitan cities. This paper deals with a microcell prediction model of propagation loss that uses probabilistic techniques. The RSL (Received Signal Level) is the factor by which the performance of a microcell can be evaluated, and the LOS (Line-Of-Sight) component and the blockage loss directly affect the RSL. We combine probabilistic methods to obtain these performance factors. The mathematical methods include the CLT (Central Limit Theorem) and SPC (Statistical Process Control) to obtain the parameters of the distribution. This probabilistic solution gives better estimates of the performance factors. In addition, it enables probabilistic optimization of strategies such as the number of cells, cell locations, cell capacities, cell ranges, and so on. In particular, the probabilistic optimization techniques can themselves be applied to real-world problems such as computer networking, human resources, and manufacturing processes.
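
    The abstract does not reproduce the paper's formulas, so the following sketch only illustrates the general idea: estimate the RSL distribution by Monte Carlo simulation with a deterministic LOS loss and a random blockage loss. All path-loss constants and distributions here are invented, not taken from the paper.

        import math
        import random
        import statistics

        def rsl_samples(tx_power_dbm=30.0, distance_m=500.0, n=10_000):
            """Monte Carlo RSL (dBm): a distance-power-law LOS loss plus a random
            blockage loss. All constants are illustrative."""
            samples = []
            for _ in range(n):
                los_loss = 40.0 + 25.0 * math.log10(distance_m)  # deterministic LOS component
                blockage = max(0.0, random.gauss(8.0, 4.0))      # random building blockage (dB)
                samples.append(tx_power_dbm - los_loss - blockage)
            return samples

        s = rsl_samples()
        # By the CLT, the mean RSL over many samples is approximately normal,
        # which is what makes SPC-style control limits applicable.
        print(statistics.mean(s), statistics.stdev(s))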

  17. Probabilistic drug connectivity mapping

    PubMed Central

    2014-01-01

    Background: The aim of connectivity mapping is to match drugs using drug-treatment gene expression profiles from multiple cell lines. This can be viewed as an information retrieval task, with the goal of finding the most relevant profiles for a given query drug. We infer the relevance for retrieval by data-driven probabilistic modeling of the drug responses, resulting in probabilistic connectivity mapping, and further consider the available cell lines as different data sources. We use a special type of probabilistic model to separate what is shared and specific between the sources, in contrast to earlier connectivity mapping methods that have intentionally aggregated all available data, neglecting information about the differences between the cell lines. Results: We show that the probabilistic multi-source connectivity mapping method is superior to alternatives in finding functionally and chemically similar drugs from the Connectivity Map data set. We also demonstrate that an extension of the method is capable of retrieving combinations of drugs that match different relevant parts of the query drug response profile. Conclusions: The probabilistic modeling-based connectivity mapping method provides a promising alternative to earlier methods. Principled integration of data from different cell lines helps to identify relevant responses for specific drug repositioning applications. PMID:24742351

  18. Method for the Compound Annotation of Conjugates in Nontargeted Metabolomics Using Accurate Mass Spectrometry, Multistage Product Ion Spectra and Compound Database Searching

    PubMed Central

    Ogura, Tairo; Bamba, Takeshi; Tai, Akihiro; Fukusaki, Eiichiro

    2015-01-01

    Owing to biotransformation, xenobiotics are often found in conjugated form in biological samples such as urine and plasma. Liquid chromatography coupled with accurate mass spectrometry with multistage collision-induced dissociation provides spectral information concerning these metabolites in complex materials. Unfortunately, compound databases typically do not contain a sufficient number of records for such conjugates. We report here on the development of a novel protocol, referred to as ChemProphet, to annotate compounds, including conjugates, using compound databases such as PubChem and ChemSpider. The annotation of conjugates involves three steps: 1. Recognition of the type and number of conjugates in the sample; 2. Compound search and annotation of the deconjugated form; and 3. In silico evaluation of the candidate conjugate. ChemProphet assigns a spectrum to each candidate by automatically exploring the substructures corresponding to the observed product ion spectrum. When finished, it annotates the candidates, assigning each a rank based on a calculated score that reflects its relative likelihood. We assessed our protocol by annotating a benchmark dataset that includes the product ion spectra of 102 compounds, by annotating a commercially available standard of quercetin 3-glucuronide, and by conducting a model experiment using urine from mice that had been administered a green tea extract. The results show that by using the ChemProphet approach, it is possible to annotate not only the deconjugated molecules but also the conjugated molecules using an automatic interpretation method based on deconjugation that involves multistage collision-induced dissociation and in silico calculated conjugation. PMID:26819907

  19. Probabilistic liquefaction triggering based on the cone penetration test

    USGS Publications Warehouse

    Moss, R.E.S.; Seed, R.B.; Kayen, R.E.; Stewart, J.P.; Tokimatsu, K.

    2005-01-01

    Performance-based earthquake engineering requires a probabilistic treatment of potential failure modes in order to accurately quantify the overall stability of the system. This paper is a summary of the application portions of the probabilistic liquefaction triggering correlations recently proposed by Moss and co-workers. To enable probabilistic treatment of liquefaction triggering, the variables comprising the seismic load and the liquefaction resistance were treated as inherently uncertain. Supporting data from an extensive Cone Penetration Test (CPT)-based liquefaction case history database were used to develop a probabilistic correlation. The methods used to measure the uncertainty of the load and resistance variables, how the interactions of these variables were treated using Bayesian updating, and how reliability analysis was applied to produce curves of equal probability of liquefaction are presented. The normalization for effective overburden stress, the magnitude-correlated duration weighting factor, and the non-linear shear mass participation factor used are also discussed.
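
    The load-resistance formulation can be illustrated with a generic reliability calculation: treat the cyclic stress ratio (load) and cyclic resistance ratio (resistance) as lognormal random variables and compute the probability that load exceeds resistance. This is a textbook illustration of the idea, not the paper's regressed CPT-based correlation; all medians and coefficients of variation below are invented.

        from math import erf, log, sqrt

        def p_liquefaction(csr_median, crr_median, cov_load=0.3, cov_resist=0.4):
            """P(liquefaction) = P(load > resistance) for lognormal load and
            resistance. A generic reliability sketch; the paper's correlation
            has its own regressed coefficients and variables."""
            beta_l = sqrt(log(1 + cov_load ** 2))    # lognormal log-std of load
            beta_r = sqrt(log(1 + cov_resist ** 2))  # lognormal log-std of resistance
            z = (log(csr_median) - log(crr_median)) / sqrt(beta_l ** 2 + beta_r ** 2)
            return 0.5 * (1 + erf(z / sqrt(2)))      # standard normal CDF

        print(p_liquefaction(csr_median=0.25, crr_median=0.30))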

  20. Probabilistic Approaches: Composite Design

    NASA Technical Reports Server (NTRS)

    Chamis, Christos C.

    1997-01-01

    Probabilistic composite design is described in terms of a computational simulation. This simulation tracks probabilistically the composite design evolution from constituent materials, fabrication process through composite mechanics, and structural component. Comparisons with experimental data are provided to illustrate selection of probabilistic design allowables, test methods/specimen guidelines, and identification of in situ versus pristine strength. For example, results show that: in situ fiber tensile strength is 90 percent of its pristine strength; flat-wise long-tapered specimens are most suitable for setting ply tensile strength allowables; a composite radome can be designed with a reliability of 0.999999; and laminate fatigue exhibits widespread scatter at 90 percent cyclic-stress to static-strength ratios.

  1. Probabilistic boundary element method

    NASA Technical Reports Server (NTRS)

    Cruse, T. A.; Raveendra, S. T.

    1989-01-01

    The purpose of the Probabilistic Structural Analysis Method (PSAM) project is to develop structural analysis capabilities for the design analysis of advanced space propulsion system hardware. The boundary element method (BEM) is used as the basis of the Probabilistic Advanced Analysis Methods (PADAM) which is discussed. The probabilistic BEM code (PBEM) is used to obtain the structural response and sensitivity results to a set of random variables. As such, PBEM performs analogously to other structural analysis codes, such as finite element codes, in the PSAM system. For linear problems, unlike the finite element method (FEM), the BEM governing equations are written at the boundary of the body only; thus, the method eliminates the need to model the volume of the body. However, for general body force problems, a direct condensation of the governing equations to the boundary of the body is not possible and therefore volume modeling is generally required.

  2. Formalizing Probabilistic Safety Claims

    NASA Technical Reports Server (NTRS)

    Herencia-Zapana, Heber; Hagen, George E.; Narkawicz, Anthony J.

    2011-01-01

    A safety claim for a system is a statement that the system, which is subject to hazardous conditions, satisfies a given set of properties. Following work by John Rushby and Bev Littlewood, this paper presents a mathematical framework that can be used to state and formally prove probabilistic safety claims. It also enables hazardous conditions, their uncertainties, and their interactions to be integrated into the safety claim. This framework provides a formal description of the probabilistic composition of an arbitrary number of hazardous conditions and their effects on system behavior. An example is given of a probabilistic safety claim for a conflict detection algorithm for aircraft in a 2D airspace. The motivation for developing this mathematical framework is that it can be used in an automated theorem prover to formally verify safety claims.
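
    A schematic of the kind of claim such a framework formalizes is given below, with notation chosen here for illustration (the paper's formalization is richer and also handles interacting hazards). Decomposing over which subset A of independent hazards H_1, ..., H_n occurs, each with occurrence probability p_i, a probabilistic safety claim bounds the probability of unsafe behavior:

        \[
          P(\neg S) \;=\; \sum_{A \subseteq \{1,\dots,n\}}
             P\!\Bigl(\neg S \,\Big|\, \bigwedge_{i \in A} H_i \wedge \bigwedge_{j \notin A} \neg H_j \Bigr)\,
             \prod_{i \in A} p_i \prod_{j \notin A} (1 - p_j)
          \;\le\; \epsilon
        \]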

  3. Probabilistic Composite Design

    NASA Technical Reports Server (NTRS)

    Chamis, Christos C.

    1997-01-01

    Probabilistic composite design is described in terms of a computational simulation. This simulation tracks probabilistically the composite design evolution from constituent materials, fabrication process, through composite mechanics and structural components. Comparisons with experimental data are provided to illustrate selection of probabilistic design allowables, test methods/specimen guidelines, and identification of in situ versus pristine strength. For example, results show that: in situ fiber tensile strength is 90% of its pristine strength; flat-wise long-tapered specimens are most suitable for setting ply tensile strength allowables; a composite radome can be designed with a reliability of 0.999999; and laminate fatigue exhibits widespread scatter at 90% cyclic-stress to static-strength ratios.

  4. Design of a bioactive small molecule that targets the myotonic dystrophy type 1 RNA via an RNA motif-ligand database and chemical similarity searching.

    PubMed

    Parkesh, Raman; Childs-Disney, Jessica L; Nakamori, Masayuki; Kumar, Amit; Wang, Eric; Wang, Thomas; Hoskins, Jason; Tran, Tuan; Housman, David; Thornton, Charles A; Disney, Matthew D

    2012-03-14

    Myotonic dystrophy type 1 (DM1) is a triplet repeating disorder caused by expanded CTG repeats in the 3'-untranslated region of the dystrophia myotonica protein kinase (DMPK) gene. The transcribed repeats fold into an RNA hairpin with multiple copies of a 5'CUG/3'GUC motif that binds the RNA splicing regulator muscleblind-like 1 protein (MBNL1). Sequestration of MBNL1 by expanded r(CUG) repeats causes splicing defects in a subset of pre-mRNAs including the insulin receptor, the muscle-specific chloride ion channel, sarco(endo)plasmic reticulum Ca(2+) ATPase 1, and cardiac troponin T. Based on these observations, the development of small-molecule ligands that specifically target expanded DM1 repeats could be of use as therapeutics. In the present study, chemical similarity searching was employed to improve the efficacy of pentamidine and Hoechst 33258 ligands that have been shown previously to target the DM1 triplet repeat. A series of in vitro inhibitors of the RNA-protein complex were identified with low micromolar IC(50) values, which are >20-fold more potent than the query compounds. Importantly, a bis-benzimidazole identified from the Hoechst query improves DM1-associated pre-mRNA splicing defects in cell and mouse models of DM1 (when dosed with 1 mM and 100 mg/kg, respectively). Since Hoechst 33258 was identified as a DM1 binder through analysis of an RNA motif-ligand database, these studies suggest that lead ligands targeting RNA with improved biological activity can be identified by using a synergistic approach that combines analysis of known RNA-ligand interactions with chemical similarity searching. PMID:22300544

  5. Probabilistic composite analysis

    NASA Technical Reports Server (NTRS)

    Chamis, C. C.; Murthy, P. L. N.

    1991-01-01

    Formal procedures are described that are used to computationally simulate the probabilistic behavior of composite structures. The computational simulation starts with the uncertainties associated with all aspects of a composite structure (constituents, fabrication, assembling, etc.) and encompasses all aspects of composite behavior (micromechanics, macromechanics, combined stress failure, laminate theory, structural response, and tailoring/optimization). Typical cases are included to illustrate the formal procedure for computational simulation. The collective results of the sample cases demonstrate that uncertainties in composite behavior and structural response can be probabilistically quantified.

  6. HMMerThread: Detecting Remote, Functional Conserved Domains in Entire Genomes by Combining Relaxed Sequence-Database Searches with Fold Recognition

    PubMed Central

    Bradshaw, Charles Richard; Surendranath, Vineeth; Henschel, Robert; Mueller, Matthias Stefan; Habermann, Bianca Hermine

    2011-01-01

    Conserved domains in proteins are one of the major sources of functional information for experimental design and genome-level annotation. Though search tools for conserved domain databases such as Hidden Markov Models (HMMs) are sensitive in detecting conserved domains in proteins when they share sufficient sequence similarity, they tend to miss more divergent family members, as they lack a reliable statistical framework for the detection of low sequence similarity. We have developed a greatly improved HMMerThread algorithm that can detect remotely conserved domains in highly divergent sequences. HMMerThread combines relaxed conserved domain searches with fold recognition to eliminate false-positive, sequence-based identifications. With an accuracy of 90%, our software is able to automatically predict highly divergent members of conserved domain families with an associated 3-dimensional structure. We give additional confidence to our predictions by validation across species. We have run HMMerThread searches on eight proteomes including human and present a rich resource of remotely conserved domains, which adds significantly to the functional annotation of entire proteomes. We find ∼4500 cross-species validated, remotely conserved domain predictions in the human proteome alone. As an example, we find a DNA-binding domain in the C-terminal part of the A-kinase anchor protein 10 (AKAP10), a PKA adaptor that has been implicated in cardiac arrhythmias and premature cardiac death, which upon stress likely translocates from mitochondria to the nucleus/nucleolus. Based on our prediction, we propose that with this HLH-domain, AKAP10 is involved in the transcriptional control of stress response. Further remotely conserved domains we discuss are examples from areas such as sporulation, chromosome segregation and signalling during immune response. The HMMerThread algorithm is able to automatically detect the presence of remotely conserved domains in proteins based on weak

  7. Probabilistic Threshold Criterion

    SciTech Connect

    Gresshoff, M; Hrousis, C A

    2010-03-09

    The Probabilistic Shock Threshold Criterion (PSTC) Project at LLNL develops phenomenological criteria for estimating safety or performance margin on high explosive (HE) initiation in the shock initiation regime, creating tools for safety assessment and design of initiation systems and HE trains in general. Until recently, there has been little foundation for probabilistic assessment of HE initiation scenarios. This work attempts to use probabilistic information that is available from both historic and ongoing tests to develop a basis for such assessment. Current PSTC approaches start with the functional form of the James Initiation Criterion as a backbone, and generalize to include varying areas of initiation and provide a probabilistic response based on test data for 1.8 g/cc (Ultrafine) 1,3,5-triamino-2,4,6-trinitrobenzene (TATB) and LX-17 (92.5% TATB, 7.5% Kel-F 800 binder). Application of the PSTC methodology is presented investigating the safety and performance of a flying plate detonator and the margin of an Ultrafine TATB booster initiating LX-17.

  8. Probabilistic, Multidimensional Unfolding Analysis

    ERIC Educational Resources Information Center

    Zinnes, Joseph L.; Griggs, Richard A.

    1974-01-01

    Probabilistic assumptions are added to single and multidimensional versions of the Coombs unfolding model for preferential choice (Coombs, 1950) and practical ways of obtaining maximum likelihood estimates of the scale parameters and goodness-of-fit tests of the model are presented. A Monte Carlo experiment is discussed. (Author/RC)

  9. Exact and Approximate Probabilistic Symbolic Execution

    NASA Technical Reports Server (NTRS)

    Luckow, Kasper; Pasareanu, Corina S.; Dwyer, Matthew B.; Filieri, Antonio; Visser, Willem

    2014-01-01

    Probabilistic software analysis seeks to quantify the likelihood of reaching a target event under uncertain environments. Recent approaches compute probabilities of execution paths using symbolic execution, but do not support nondeterminism. Nondeterminism arises naturally when no suitable probabilistic model can capture a program behavior, e.g., for multithreading or distributed systems. In this work, we propose a technique, based on symbolic execution, to synthesize schedulers that resolve nondeterminism to maximize the probability of reaching a target event. To scale to large systems, we also introduce approximate algorithms to search for good schedulers, speeding up established random sampling and reinforcement learning results through the quantification of path probabilities based on symbolic execution. We implemented the techniques in Symbolic PathFinder and evaluated them on nondeterministic Java programs. We show that our algorithms significantly improve upon a state-of-the-art statistical model checking algorithm, originally developed for Markov Decision Processes.
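
    The scheduler-synthesis idea can be sketched on a tiny Markov decision process: resolve each nondeterministic choice so as to maximize the probability of reaching the target. The states, actions, and probabilities below are invented for illustration; the paper derives such models from symbolic execution of real programs.

        # Tiny MDP: actions "a"/"b" are nondeterministic choices; outcomes are
        # probabilistic, given as lists of (successor, probability) pairs.
        ACTIONS = {
            "s0": {"a": [("s1", 0.6), ("s2", 0.4)],
                   "b": [("s2", 1.0)]},
            "s1": {"a": [("target", 0.9), ("fail", 0.1)]},
            "s2": {"a": [("target", 0.3), ("fail", 0.7)]},
        }

        def max_reach_probability(n_iter=100):
            """Value iteration for the maximal probability of reaching "target",
            plus the scheduler (one choice per state) that attains it."""
            v = {"target": 1.0, "fail": 0.0, "s0": 0.0, "s1": 0.0, "s2": 0.0}
            for _ in range(n_iter):
                for s, acts in ACTIONS.items():
                    v[s] = max(sum(p * v[t] for t, p in succ) for succ in acts.values())
            scheduler = {s: max(acts, key=lambda a: sum(p * v[t] for t, p in acts[a]))
                         for s, acts in ACTIONS.items()}
            return v["s0"], scheduler

        # Best choice in s0 is "a": 0.6 * 0.9 + 0.4 * 0.3 = 0.66 > 0.3 for "b".
        print(max_reach_probability())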

  10. On Relations between Current Global Volcano Databases

    NASA Astrophysics Data System (ADS)

    Newhall, C. G.; Siebert, L.; Sparks, S.

    2009-12-01

    The Smithsonian’s Volcano Reference File (VRF), the database that underlies Volcanoes of the World and This Dynamic Planet, is the premier source for the “what, when, where, and how big?” of Holocene and historical eruptions. VOGRIPA (Volcanic Global Risk Identification and Analysis) will catalogue details of large eruptions, including specific phenomena and their impacts. CCDB (Collapse Caldera Database) also considers large eruptions with an emphasis on the resulting calderas. WOVOdat is bringing monitoring data from the world’s observatories into a centralized database in common formats, so that they can be searched and compared during volcanic crises and for research on preeruption processes. Oceanographic and space institutions worldwide have growing archives of volcano imagery and derivative products. Petrologic databases such as PETRODB and GEOROC offer compositions of many erupted and non-erupted magmas. Each of these informs and complements the others. Examples of interrelations include:
    ● Information in the VRF about individual volcanoes is the starting point and major source of background “volcano” data in WOVOdat, VOGRIPA, and petrologic databases.
    ● Images and digital topography from remote sensing archives offer high-resolution, consistent geospatial "base maps" for all of the other databases.
    ● VRF data about eruptions shows whether unrest of WOVOdat culminated in an eruption and, if yes, its type and magnitude.
    ● Data from WOVOdat fills in the “blanks” between eruptions in the VRF.
    ● VOGRIPA adds more detail to the VRF’s descriptions of eruptions, including quantification of runout distances, expanded estimated column heights and eruption impact data, and other parameters not included in the Smithsonian VRF.
    ● Petrologic databases can add detail to existing petrologic data of the VRF, WOVOdat, and VOGRIPA, e.g., detail needed to estimate viscosity of melt and its influence on magma and eruption dynamics.
    ● Hazard

  11. Web Search Engines: Search Syntax and Features.

    ERIC Educational Resources Information Center

    Ojala, Marydee

    2002-01-01

    Presents a chart that explains the search syntax, features, and commands used by the 12 most widely used general Web search engines. Discusses Web standardization, expanded types of content searched, size of databases, and search engines that include both simple and advanced versions. (LRW)

  12. Conducting a Web Search.

    ERIC Educational Resources Information Center

    Miller-Whitehead, Marie

    Keyword and text string searches of online library catalogs often provide different results according to library and database used and depending upon how books and journals are indexed. For this reason, online databases such as ERIC often provide tutorials and recommendations for searching their site, such as how to use Boolean search strategies.…

  13. Influence of the conditions in pharmacophore generation, scoring, and 3D database search for chemical feature-based pharmacophore models: one application study of ETA- and ETB-selective antagonists.

    PubMed

    Cucarull-González, Joan R; Laggner, Christian; Langer, Thierry

    2006-01-01

    Using the commercial pharmacophore modeling suite Catalyst, we have studied the influence of the Catalyst parameter compare.scaledMultiBlobFeatureErrors on pharmacophore generation, hypothesis scoring, and database searching. This parameter, introduced in Catalyst 4.7, changed its default value in Catalyst 4.8, and it strongly influences the statistical quality of pharmacophore generation, the scoring of hypotheses, and database searching. Two different pharmacophore models have been constructed for the ETA and ETB receptor antagonists. Both models contain one positive ionizable, one negative ionizable, one hydrogen-bond acceptor, one hydrophobic aromatic, and one hydrophobic aliphatic feature. The models have been compared, and some differences in the position of the hydrogen-bond acceptor in the putative binding pocket have been highlighted. PMID:16711764

  14. Characterization of the phosphorylation sites of human high molecular weight neurofilament protein by electrospray ionization tandem mass spectrometry and database searching.

    PubMed

    Jaffe, H; Veeranna; Shetty, K T; Pant, H C

    1998-03-17

    Hyperphosphorylated high molecular weight neurofilament protein (NF-H) exhibits extensive phosphorylation on lysine-serine-proline (KSP) repeats in the C-terminal domain of the molecule. Specific phosphorylation sites in human NF-H were identified by proteolytic digestion and analysis of the resulting digests by a combination of microbore liquid chromatography, electrospray ionization tandem (MS/MS) ion trap mass spectrometry, and database searching. The computer programs utilized (PEPSEARCH and SEQUEST) are capable of identifying peptides and phosphorylation sites from uninterpreted MS/MS spectra, and by use of these methods, 27 phosphopeptides and their phosphorylated residues were identified. On the basis of these phosphopeptides, 38 phosphorylation sites in human NF-H were characterized. These include 33 KSP, lysine-threonine-proline (KTP) or arginine-serine-proline (RSP) sites and four unphosphorylated sites, all of which occur in the KSP repeat domain (residues 502-823); and one threonine phosphorylation site observed in a KVPTPEK motif. Six KSP sites were not characterized because of the failure to isolate and identify corresponding phosphopeptides. Heterogeneity in serine and threonine phosphorylation was observed at three sites or deduced to occur at three sites on the basis of enzyme specificity. As a result of the phosphorylated motifs identified (KSPAKEE, KSPVKEE, KS/TPEKAK, KSPEKEE, KSPVKAE, KSPAEAK, KSPPEAK, KSPEAKT, KSPAEVK, and KVPTPEK), human NF-H tail domain is postulated to be a substrate of proline-directed kinases. The threonine-phosphorylated KVPTPEK motif suggested the existence of a novel proline-directed kinase. PMID:9521714

  15. A Hybrid Probabilistic Model for Unified Collaborative and Content-Based Image Tagging.

    PubMed

    Zhou, Ning; Cheung, William K; Qiu, Guoping; Xue, Xiangyang

    2011-07-01

    The increasing availability of large quantities of user contributed images with labels has provided opportunities to develop automatic tools to tag images to facilitate image search and retrieval. In this paper, we present a novel hybrid probabilistic model (HPM) which integrates low-level image features and high-level user provided tags to automatically tag images. For images without any tags, HPM predicts new tags based solely on the low-level image features. For images with user provided tags, HPM jointly exploits both the image features and the tags in a unified probabilistic framework to recommend additional tags to label the images. The HPM framework makes use of the tag-image association matrix (TIAM). However, since the number of images is usually very large and user-provided tags are diverse, TIAM is very sparse, thus making it difficult to reliably estimate tag-to-tag co-occurrence probabilities. We developed a collaborative filtering method based on nonnegative matrix factorization (NMF) for tackling this data sparsity issue. Also, an L1 norm kernel method is used to estimate the correlations between image features and semantic concepts. The effectiveness of the proposed approach has been evaluated using three databases containing 5,000 images with 371 tags, 31,695 images with 5,587 tags, and 269,648 images with 5,018 tags, respectively. PMID:21079279
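
    The data-sparsity step can be illustrated with a bare-bones nonnegative matrix factorization that smooths a sparse tag-image association matrix. The toy data and rank below are invented, and this is only the factorization idea; the paper's full HPM additionally integrates low-level image features and an L1-norm kernel correlation estimate.

        import numpy as np

        def nmf(X, rank=2, n_iter=500, eps=1e-9):
            """Multiplicative-update NMF: X is approximated by W @ H with W, H >= 0."""
            rng = np.random.default_rng(0)
            n, m = X.shape
            W, H = rng.random((n, rank)), rng.random((rank, m))
            for _ in range(n_iter):
                H *= (W.T @ X) / (W.T @ W @ H + eps)
                W *= (X @ H.T) / (W @ H @ H.T + eps)
            return W, H

        # Rows = images, columns = tags; zeros are unobserved tag assignments.
        X = np.array([[1.0, 1.0, 0.0, 0.0],
                      [1.0, 0.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0, 1.0]])
        W, H = nmf(X)
        print(np.round(W @ H, 2))   # densified matrix suggests likely missing tags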

  16. Probabilistic authenticated quantum dialogue

    NASA Astrophysics Data System (ADS)

    Hwang, Tzonelih; Luo, Yi-Ping

    2015-12-01

    This work proposes a probabilistic authenticated quantum dialogue (PAQD) based on Bell states with the following notable features. (1) In our proposed scheme, the dialogue is encoded in a probabilistic way, i.e., the same messages can be encoded into different quantum states, whereas in the state-of-the-art authenticated quantum dialogue (AQD), the dialogue is encoded in a deterministic way; (2) the pre-shared secret key between two communicants can be reused without any security loophole; (3) each dialogue in the proposed PAQD can be exchanged within only one-step quantum communication and one-step classical communication, whereas in the state-of-the-art AQD protocols, both communicants have to run a QKD protocol for each dialogue, and each dialogue requires multiple quantum as well as classical communication steps; (4) the proposed scheme can nevertheless resist the man-in-the-middle attack, the modification attack, and other well-known attacks.

  17. Probabilistic functional tractography of the human cortex.

    PubMed

    David, Olivier; Job, Anne-Sophie; De Palma, Luca; Hoffmann, Dominique; Minotti, Lorella; Kahane, Philippe

    2013-10-15

    Single-pulse direct electrical stimulation of cortical regions in patients suffering from focal drug-resistant epilepsy who are explored using intracranial electrodes induces cortico-cortical potentials that can be used to infer functional and anatomical connectivity. Here, we describe a neuroimaging framework that allows development of a new probabilistic atlas of functional tractography of the human cortex from those responses. This atlas is unique because it allows inference in vivo of the directionality and latency of cortico-cortical connectivity, which are still largely unknown at the human brain level. In this technical note, we include 1535 stimulation runs performed in 35 adult patients. We use a case of frontal lobe epilepsy to illustrate the asymmetrical connectivity between the posterior hippocampal gyrus and the orbitofrontal cortex. In addition, as a proof of concept for group studies, we study the probabilistic functional tractography between the posterior superior temporal gyrus and the inferior frontal gyrus. In the near future, the atlas database will be continuously increased, and the methods will be improved in parallel, for more accurate estimation of features of interest. Generated probabilistic maps will be freely distributed to the community because they provide critical information for further understanding and modelling of large-scale brain networks. PMID:23707583

  18. Probabilistic Models for Solar Particle Events

    NASA Technical Reports Server (NTRS)

    Adams, James H., Jr.; Dietrich, W. F.; Xapsos, M. A.; Welton, A. M.

    2009-01-01

    Probabilistic Models of Solar Particle Events (SPEs) are used in space mission design studies to provide a description of the worst-case radiation environment that the mission must be designed to tolerate. The models determine the worst-case environment using a description of the mission and a user-specified confidence level that the provided environment will not be exceeded. This poster will focus on completing the existing suite of models by developing models for peak flux and event-integrated fluence elemental spectra for the Z>2 elements. It will also discuss methods to take into account uncertainties in the database and the uncertainties resulting from the limited number of solar particle events in the database. These new probabilistic models are based on an extensive survey of SPE measurements of peak and event-integrated elemental differential energy spectra. Attempts are made to fit the measured spectra with eight different published models. The model giving the best fit to each spectrum is chosen and used to represent that spectrum for any energy in the energy range covered by the measurements. The set of all such spectral representations for each element is then used to determine the worst-case spectrum as a function of confidence level. The spectral representation that best fits these worst-case spectra is found and its dependence on confidence level is parameterized. This procedure creates probabilistic models for the peak and event-integrated spectra.
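
    The model-selection step (fit several spectral forms, keep the best) can be sketched with two candidate shapes fitted by least squares in log space; the energies and fluences below are synthetic, and the paper considers eight published spectral forms rather than these two.

        import numpy as np

        def fit_log_linear(x, logy):
            """OLS fit of logy = b0 + b1 * x; returns coefficients and SSE."""
            A = np.vstack([np.ones_like(x), x]).T
            coef, res, *_ = np.linalg.lstsq(A, logy, rcond=None)
            sse = float(res[0]) if res.size else 0.0
            return coef, sse

        E = np.array([10.0, 30.0, 60.0, 100.0])   # energy (MeV)
        flux = 2e6 * E ** (-2.1)                  # synthetic power-law fluence spectrum
        logf = np.log(flux)

        # Power law: log(flux) is linear in log(E); exponential: linear in E.
        (_, slope_pl), sse_pl = fit_log_linear(np.log(E), logf)
        (_, slope_ex), sse_ex = fit_log_linear(E, logf)
        best = "power_law" if sse_pl < sse_ex else "exponential"
        print(best, slope_pl)                     # selects power_law, slope ~ -2.1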

  19. Geothermal probabilistic cost study

    NASA Technical Reports Server (NTRS)

    Orren, L. H.; Ziman, G. M.; Jones, S. C.; Lee, T. K.; Noll, R.; Wilde, L.; Sadanand, V.

    1981-01-01

    A tool is presented to quantify the risks of geothermal projects, the Geothermal Probabilistic Cost Model (GPCM). The GPCM model was used to evaluate a geothermal reservoir for a binary-cycle electric plant at Heber, California. Three institutional aspects of the geothermal risk that can shift the risk among different agents were analyzed. The leasing of geothermal land, contracting between the producer and the user of the geothermal heat, and insurance against faulty performance were examined.

  20. Geothermal probabilistic cost study

    SciTech Connect

    Orren, L.H.; Ziman, G.M.; Jones, S.C.; Lee, T.K.; Noll, R.; Wilde, L.; Sadanand, V.

    1981-08-01

    A tool is presented to quantify the risks of geothermal projects, the Geothermal Probabilistic Cost Model (GPCM). The GPCM model is used to evaluate a geothermal reservoir for a binary-cycle electric plant at Heber, California. Three institutional aspects of the geothermal risk which can shift the risk among different agents are analyzed. The leasing of geothermal land, contracting between the producer and the user of the geothermal heat, and insurance against faulty performance are examined. (MHR)

  1. Geothermal probabilistic cost study

    NASA Astrophysics Data System (ADS)

    Orren, L. H.; Ziman, G. M.; Jones, S. C.; Lee, T. K.; Noll, R.; Wilde, L.; Sadanand, V.

    1981-08-01

    A tool is presented to quantify the risks of geothermal projects, the Geothermal Probabilistic Cost Model (GPCM). The GPCM model was used to evaluate a geothermal reservoir for a binary-cycle electric plant at Heber, California. Three institutional aspects of the geothermal risk that can shift the risk among different agents were analyzed. The leasing of geothermal land, contracting between the producer and the user of the geothermal heat, and insurance against faulty performance were examined.

  2. Probabilistic Fatigue: Computational Simulation

    NASA Technical Reports Server (NTRS)

    Chamis, Christos C.

    2002-01-01

    Fatigue is a primary consideration in the design of aerospace structures for long-term durability and reliability. Several types of fatigue must be considered in the design, including low-cycle, high-cycle, and combined fatigue under different cyclic loading conditions (for example, mechanical, thermal, and erosion). The traditional approach to evaluating fatigue has been to conduct many tests in the various service-environment conditions that the component will be subjected to in a specific design. This approach is reasonable and robust for that specific design. However, it is time-consuming and costly, and it generally must be repeated for designs in different operating conditions. Recent research has demonstrated that fatigue of structural components/structures can be evaluated by computational simulation based on a novel paradigm. The main features of this paradigm are progressive telescoping scale mechanics, progressive scale substructuring, and progressive structural fracture, encompassed with probabilistic simulation. These generic features probabilistically telescope local material-point damage all the way up to the structural component, and probabilistically decompose structural loads and boundary conditions all the way down to the material point. Additional features include a multifactor interaction model that probabilistically describes the evolution of material properties and any changes due to various cyclic loads and other mutually interacting effects. The objective of the proposed paper is to describe this novel paradigm of computational simulation and present typical fatigue results for structural components. Additionally, the advantages, versatility, and inclusiveness of computational simulation versus testing are discussed. Guidelines for complementing simulated results with strategic testing are outlined. Typical results are shown for computational simulation of fatigue in metallic composite structures to demonstrate the

  3. Probabilistic simple splicing systems

    NASA Astrophysics Data System (ADS)

    Selvarajoo, Mathuri; Heng, Fong Wan; Sarmin, Nor Haniza; Turaev, Sherzod

    2014-06-01

    A splicing system, one of the early theoretical models for DNA computing was introduced by Head in 1987. Splicing systems are based on the splicing operation which, informally, cuts two strings of DNA molecules at the specific recognition sites and attaches the prefix of the first string to the suffix of the second string, and the prefix of the second string to the suffix of the first string, thus yielding the new strings. For a specific type of splicing systems, namely the simple splicing systems, the recognition sites are the same for both strings of DNA molecules. It is known that splicing systems with finite sets of axioms and splicing rules only generate regular languages. Hence, different types of restrictions have been considered for splicing systems in order to increase their computational power. Recently, probabilistic splicing systems have been introduced where the probabilities are initially associated with the axioms, and the probabilities of the generated strings are computed from the probabilities of the initial strings. In this paper, some properties of probabilistic simple splicing systems are investigated. We prove that probabilistic simple splicing systems can also increase the computational power of the splicing languages generated.
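
    A toy illustration of the probabilistic flavor follows, simplified from the formal definitions: both strings are cut at the same recognition site, prefixes and suffixes are recombined, and each product inherits the product of its parents' probabilities, renormalized over all products. The alphabet, site, and probabilities are invented for the example.

        def splice(x, y, site):
            """Cut x and y immediately after the first occurrence of the shared
            recognition site and recombine prefixes with suffixes."""
            i, j = x.find(site), y.find(site)
            if i < 0 or j < 0:
                return []
            cx, cy = i + len(site), j + len(site)
            return [x[:cx] + y[cy:], y[:cy] + x[cx:]]

        axioms = {"aacgtt": 0.6, "ggcgaa": 0.4}   # axioms with initial probabilities
        site = "cg"
        products = {}
        for x, px in axioms.items():
            for y, py in axioms.items():
                for w in splice(x, y, site):
                    products[w] = products.get(w, 0.0) + px * py
        total = sum(products.values())
        print({w: round(p / total, 3) for w, p in products.items()})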

  4. Selecting Software for a Development Information Database.

    ERIC Educational Resources Information Center

    Geethananda, Hemamalee

    1991-01-01

    Describes software selection criteria considered for use with the bibliographic database of the Development Information Network for South Asia (DEVINSA), which is located in Sri Lanka. Highlights include ease of database creation, database size, input, editing, data validation, inverted files, searching, storing searches, vocabulary control, user…

  5. Chemical Kinetics Database

    National Institute of Standards and Technology Data Gateway

    SRD 17 NIST Chemical Kinetics Database (Web, free access)   The NIST Chemical Kinetics Database includes essentially all reported kinetics results for thermal gas-phase chemical reactions. The database is designed to be searched for kinetics data based on the specific reactants involved, for reactions resulting in specified products, for all the reactions of a particular species, or for various combinations of these. In addition, the bibliography can be searched by author name or combination of names. The database contains in excess of 38,000 separate reaction records for over 11,700 distinct reactant pairs. These data have been abstracted from over 12,000 papers with literature coverage through early 2000.

  6. Probabilistic Tsunami Hazard Analysis

    NASA Astrophysics Data System (ADS)

    Thio, H. K.; Ichinose, G. A.; Somerville, P. G.; Polet, J.

    2006-12-01

    The recent tsunami disaster caused by the 2004 Sumatra-Andaman earthquake has focused our attention to the hazard posed by large earthquakes that occur under water, in particular subduction zone earthquakes, and the tsunamis that they generate. Even though these kinds of events are rare, the very large loss of life and material destruction caused by this earthquake warrant a significant effort towards the mitigation of the tsunami hazard. For ground motion hazard, Probabilistic Seismic Hazard Analysis (PSHA) has become a standard practice in the evaluation and mitigation of seismic hazard to populations in particular with respect to structures, infrastructure and lifelines. Its ability to condense the complexities and variability of seismic activity into a manageable set of parameters greatly facilitates the design of effective seismic resistant buildings but also the planning of infrastructure projects. Probabilistic Tsunami Hazard Analysis (PTHA) achieves the same goal for hazards posed by tsunami. There are great advantages of implementing such a method to evaluate the total risk (seismic and tsunami) to coastal communities. The method that we have developed is based on the traditional PSHA and therefore completely consistent with standard seismic practice. Because of the strong dependence of tsunami wave heights on bathymetry, we use a full waveform tsunami computation in lieu of attenuation relations that are common in PSHA. By pre-computing and storing, at points along the coast, the tsunami waveforms generated for sets of subfaults that comprise larger earthquake faults, we can efficiently synthesize tsunami waveforms for any slip distribution on those faults by summing the individual subfault tsunami waveforms (weighted by their slip). This efficiency makes it feasible to use Green's function summation in lieu of attenuation relations to provide very accurate estimates of tsunami height for probabilistic calculations, where one typically computes
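
    The Green's function summation amounts to a slip-weighted sum of precomputed subfault waveforms. In the sketch below the "precomputed" waveforms are synthetic Gaussian pulses standing in for full tsunami simulations; the sampling, subfault count, and slip values are all invented.

        import numpy as np

        t = np.linspace(0.0, 3600.0, 720)   # one hour of signal at 5 s sampling
        n_subfaults = 4
        # One precomputed unit-slip waveform per subfault at a given coastal point:
        G = np.array([np.exp(-((t - 900.0 - 300.0 * k) / 180.0) ** 2)
                      for k in range(n_subfaults)])

        def synthesize(slip):
            """Coastal waveform for a slip vector (m): slip-weighted subfault sum."""
            return slip @ G                  # (n,) @ (n, nt) -> (nt,)

        wave = synthesize(np.array([2.0, 5.0, 1.0, 0.0]))
        print(wave.max())                    # peak amplitude at this coastal point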

  7. Online Patent Searching: The Realities.

    ERIC Educational Resources Information Center

    Kaback, Stuart M.

    1983-01-01

    Considers patent subject searching capabilities of major online databases, noting patent claims, "deep-indexed" files, test searches, retrieval of related references, multi-database searching, improvements needed in indexing of chemical structures, full text searching, improvements needed in handling numerical data, and augmenting a subject search…

  8. Specialist Bibliographic Databases.

    PubMed

    Gasparyan, Armen Yuri; Yessirkepov, Marlen; Voronov, Alexander A; Trukhachev, Vladimir I; Kostyukova, Elena I; Gerasimov, Alexey N; Kitas, George D

    2016-05-01

    Specialist bibliographic databases offer essential online tools for researchers and authors who work on specific subjects and perform comprehensive and systematic syntheses of evidence. This article presents examples of the established specialist databases, which may be of interest to those engaged in multidisciplinary science communication. Access to most specialist databases is through subscription schemes and membership in professional associations. Several aggregators of information and database vendors, such as EBSCOhost and ProQuest, facilitate advanced searches supported by specialist keyword thesauri. Searches of items through specialist databases are complementary to those through multidisciplinary research platforms, such as PubMed, Web of Science, and Google Scholar. Familiarizing with the functional characteristics of biomedical and nonbiomedical bibliographic search tools is mandatory for researchers, authors, editors, and publishers. The database users are offered updates of the indexed journal lists, abstracts, author profiles, and links to other metadata. Editors and publishers may find particularly useful source selection criteria and apply for coverage of their peer-reviewed journals and grey literature sources. These criteria are aimed at accepting relevant sources with established editorial policies and quality controls. PMID:27134485

  9. Specialist Bibliographic Databases

    PubMed Central

    2016-01-01

    Specialist bibliographic databases offer essential online tools for researchers and authors who work on specific subjects and perform comprehensive and systematic syntheses of evidence. This article presents examples of the established specialist databases, which may be of interest to those engaged in multidisciplinary science communication. Access to most specialist databases is through subscription schemes and membership in professional associations. Several aggregators of information and database vendors, such as EBSCOhost and ProQuest, facilitate advanced searches supported by specialist keyword thesauri. Searches of items through specialist databases are complementary to those through multidisciplinary research platforms, such as PubMed, Web of Science, and Google Scholar. Familiarizing with the functional characteristics of biomedical and nonbiomedical bibliographic search tools is mandatory for researchers, authors, editors, and publishers. The database users are offered updates of the indexed journal lists, abstracts, author profiles, and links to other metadata. Editors and publishers may find particularly useful source selection criteria and apply for coverage of their peer-reviewed journals and grey literature sources. These criteria are aimed at accepting relevant sources with established editorial policies and quality controls. PMID:27134485

  10. JICST Factual Database(2)

    NASA Astrophysics Data System (ADS)

    Araki, Keisuke

    A computer program that builds atom-bond connection tables from nomenclature has been developed. Chemical substances are input with their nomenclature and a variety of trivial names or experimental code numbers. The chemical structures in the database are stored stereospecifically and can be searched and displayed according to stereochemistry. Source data come from the laws and regulations of Japan, the RTECS of the US, and so on. The database plays a central role within JICST's integrated fact database service and makes interrelational retrieval possible.

  11. Online Petroleum Industry Bibliographic Databases: A Review.

    ERIC Educational Resources Information Center

    Anderson, Margaret B.

    This paper discusses the present status of the bibliographic database industry, reviews the development of online databases of interest to the petroleum industry, and considers future developments in online searching and their effect on libraries and information centers. Three groups of databases are described: (1) databases developed by the…

  12. Strategies for Introducing Databasing into Science.

    ERIC Educational Resources Information Center

    Anderson, Christopher L.

    1990-01-01

    Outlines techniques used in the context of a sixth grade science class to teach database structure and search strategies for science using the AppleWorks program. Provides templates and questions for class and element databases. (Author/YP)

  13. Switching strategies to optimize search

    NASA Astrophysics Data System (ADS)

    Shlesinger, Michael F.

    2016-03-01

    Search strategies are explored when the search time is fixed, success is probabilistic and the estimate for success can diminish with time if there is not a successful result. Under the time constraint the problem is to find the optimal time to switch a search strategy or search location. Several variables are taken into account, including cost, gain, rate of success if a target is present and the probability that a target is present.
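
    One way to make the trade-off concrete (with invented rates and priors, not the paper's model): as search at the current location continues without success, the posterior probability that a target is present decays, and it pays to switch once the effective detection rate falls below what a fresh location offers.

        import math

        def posterior_present(t, prior=0.5, rate=1.0):
            """P(target present | no detection by time t) for Poisson detections."""
            miss = math.exp(-rate * t)
            return prior * miss / (prior * miss + (1 - prior))

        def optimal_switch_time(prior=0.5, rate=1.0, fresh_rate=0.3,
                                horizon=10.0, dt=0.01):
            """First time the effective detection rate here falls below the
            alternative location's rate, within the fixed total search time."""
            t = 0.0
            while t < horizon:
                if posterior_present(t, prior, rate) * rate < fresh_rate:
                    return t
                t += dt
            return horizon                   # never worth switching in time

        print(round(optimal_switch_time(), 2))   # ~0.85 for these numbers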

  14. Time Analysis for Probabilistic Workflows

    SciTech Connect

    Czejdo, Bogdan; Ferragut, Erik M

    2012-01-01

    There are many theoretical and practical results in the area of workflow modeling, especially when the more formal workflows are used. In this paper we focus on probabilistic workflows. We show algorithms for time computations in probabilistic workflows. With activity times modeled more precisely, we can improve work cooperation and the analysis of cooperation, including simulation and visualization.

  15. Topics in Probabilistic Judgment Aggregation

    ERIC Educational Resources Information Center

    Wang, Guanchun

    2011-01-01

    This dissertation is a compilation of several studies that are united by their relevance to probabilistic judgment aggregation. In the face of complex and uncertain events, panels of judges are frequently consulted to provide probabilistic forecasts, and aggregation of such estimates in groups often yield better results than could have been made…

  16. Probabilistic analysis of mechanical systems

    SciTech Connect

    Priddy, T.G.; Paez, T.L.; Veers, P.S.

    1993-09-01

    This paper proposes a framework for the comprehensive analysis of complex problems in probabilistic structural mechanics. Tools that can be used to accurately estimate the probabilistic behavior of mechanical systems are discussed, and some of the techniques proposed in the paper are developed and used in the solution of a problem in nonlinear structural dynamics.

  17. Probabilistic cellular automata.

    PubMed

    Agapie, Alexandru; Andreica, Anca; Giuclea, Marius

    2014-09-01

    Cellular automata are binary lattices used for modeling complex dynamical systems. The automaton evolves iteratively from one configuration to another, using some local transition rule based on the number of ones in the neighborhood of each cell. With respect to the number of cells allowed to change per iteration, we speak of either synchronous or asynchronous automata. If randomness is involved to some degree in the transition rule, we speak of probabilistic automata; otherwise they are called deterministic. Whichever type of cellular automaton we are dealing with, the main theoretical challenge stays the same: starting from an arbitrary initial configuration, predict (with highest accuracy) the end configuration. If the automaton is deterministic, the outcome simplifies to one of two configurations, all zeros or all ones. If the automaton is probabilistic, the whole process is modeled by a finite homogeneous Markov chain, and the outcome is the corresponding stationary distribution. Based on our previous results for the asynchronous case (connecting the probability of a configuration in the stationary distribution to its number of zero-one borders), the article offers both numerical and theoretical insight into the long-term behavior of synchronous cellular automata. PMID:24999557
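
    For concreteness, here is a minimal synchronous probabilistic cellular automaton on a ring (the rule and noise level are invented): every cell copies its 3-cell-neighborhood majority and flips with a small probability, and the long-run density samples the stationary behavior the abstract describes.

        import numpy as np

        rng = np.random.default_rng(1)

        def step(config, eps=0.05):
            """Synchronous update: each cell takes the majority of its 3-cell
            neighborhood, then flips independently with probability eps."""
            left, right = np.roll(config, 1), np.roll(config, -1)
            majority = (left + config + right) >= 2
            noise = rng.random(config.size) < eps
            return (majority ^ noise).astype(int)

        config = (rng.random(200) < 0.4).astype(int)   # random initial configuration
        for _ in range(500):
            config = step(config)
        print(config.mean())   # long-run density: a sample of the stationary behavior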

  18. Quantum probabilistic logic programming

    NASA Astrophysics Data System (ADS)

    Balu, Radhakrishnan

    2015-05-01

    We describe a quantum mechanics based logic programming language that supports Horn clauses, random variables, and covariance matrices to express and solve problems in probabilistic logic. The Horn clauses of the language wrap random variables, including infinite-valued ones, to express probability distributions and statistical correlations, a powerful feature to capture relationships between distributions that are not independent. The expressive power of the language is based on a mechanism to implement statistical ensembles and to solve the underlying SAT instances using quantum mechanical machinery. We exploit the fact that classical random variables have quantum decompositions to build the Horn clauses. We establish the semantics of the language in a rigorous fashion by considering an existing probabilistic logic language called PRISM with classical probability measures defined on the Herbrand base and extending it to the quantum context. In the classical case H-interpretations form the sample space and probability measures defined on them lead to consistent definition of probabilities for well formed formulae. In the quantum counterpart, we define probability amplitudes on H-interpretations facilitating the model generations and verifications via quantum mechanical superpositions and entanglements. We cast the well formed formulae of the language as quantum mechanical observables thus providing an elegant interpretation for their probabilities. We discuss several examples to combine statistical ensembles and predicates of first order logic to reason with situations involving uncertainty.

  19. Criteria for Comparing Children's Web Search Tools.

    ERIC Educational Resources Information Center

    Kuntz, Jerry

    1999-01-01

    Presents criteria for evaluating and comparing Web search tools designed for children. Highlights include database size; accountability; categorization; search access methods; help files; spell check; URL searching; links to alternative search services; advertising; privacy policy; and layout and design. (LRW)

  20. Probabilistic Finite Element: Variational Theory

    NASA Technical Reports Server (NTRS)

    Belytschko, T.; Liu, W. K.

    1985-01-01

    The goal of this research is to provide techniques which are cost-effective and enable the engineer to evaluate the effect of uncertainties in complex finite element models. Embedding the probabilistic aspects in a variational formulation is a natural approach. In addition, a variational approach to probabilistic finite elements enables them to be incorporated within standard finite element methodologies. Therefore, once the procedures are developed, they can easily be adapted to existing general purpose programs. Furthermore, the variational basis for these methods enables them to be adapted to a wide variety of structural elements and provides a consistent basis for incorporating probabilistic features in many aspects of the structural problem. Completed tasks include the theoretical development of probabilistic variational equations for structural dynamics, the development of efficient numerical algorithms for probabilistic sensitivity displacement and stress analysis, and the integration of these methodologies into a pilot computer code.

  1. 78 FR 15746 - Compendium of Analyses To Investigate Select Level 1 Probabilistic Risk Assessment End-State...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-03-12

    ... COMMISSION Compendium of Analyses To Investigate Select Level 1 Probabilistic Risk Assessment End-State... document entitled: Compendium of Analyses to Investigate Select Level 1 Probabilistic Risk Assessment End..., select ``ADAMS Public Documents'' and then select ``Begin Web- based ADAMS Search.'' For problems...

  2. Probabilistic Mesomechanical Fatigue Model

    NASA Technical Reports Server (NTRS)

    Tryon, Robert G.

    1997-01-01

    A probabilistic mesomechanical fatigue life model is proposed to link the microstructural material heterogeneities to the statistical scatter in the macrostructural response. The macrostructure is modeled as an ensemble of microelements. Cracks nucleate within the microelements and grow from the microelements to final fracture. Variations of the microelement properties are defined using statistical parameters. A micromechanical slip band decohesion model is used to determine the crack nucleation life and size. A crack tip opening displacement model is used to determine the small crack growth life and size. The Paris law is used to determine the long crack growth life. The models are combined in a Monte Carlo simulation to determine the statistical distribution of total fatigue life for the macrostructure. The modeled response is compared to trends in experimental observations from the literature.
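    The stage-wise Monte Carlo structure described above can be sketched in a few lines. The snippet below is a hedged illustration, not the paper's models: the nucleation and small-crack stage lives are placeholder lognormal draws, and the long-crack stage integrates a Paris law da/dN = C(ΔK)^m with a randomly drawn coefficient C.

```python
# Hedged Monte Carlo sketch of total fatigue life =
# nucleation life + small-crack life + long-crack (Paris law) life.
import numpy as np

rng = np.random.default_rng(1)
n_sim = 10000
stress = 200.0                          # MPa stress amplitude (assumed)

# placeholder random lives for the nucleation and small-crack stages
n_nucleation = rng.lognormal(mean=10.0, sigma=0.3, size=n_sim)
n_small = rng.lognormal(mean=9.0, sigma=0.4, size=n_sim)

# long-crack stage: integrate da/dN = C * dK^m from a0 to af
m = 3.0
C = rng.lognormal(mean=np.log(1e-11), sigma=0.2, size=n_sim)
a = np.linspace(1e-3, 1e-2, 200)        # crack length grid, meters
dK = stress * np.sqrt(np.pi * a)        # simplified Delta-K, MPa*sqrt(m)
base = 1.0 / dK**m                      # cycles per meter, up to the 1/C factor
integral = np.sum(0.5 * (base[1:] + base[:-1]) * np.diff(a))  # trapezoid rule
n_long = integral / C                   # one integral reused for every sample

total = n_nucleation + n_small + n_long
print("median life %.3g cycles, 1%% quantile %.3g cycles"
      % (np.median(total), np.quantile(total, 0.01)))
```

    The 1% quantile of the simulated life distribution is the kind of statistic such a model feeds into inspection-interval decisions.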

  3. Novel probabilistic neuroclassifier

    NASA Astrophysics Data System (ADS)

    Hong, Jiang; Serpen, Gursel

    2003-09-01

    A novel probabilistic potential function neural network classifier algorithm to deal with classes which are multi-modally distributed and formed from sets of disjoint pattern clusters is proposed in this paper. The proposed classifier has a number of desirable properties which distinguish it from other neural network classifiers. A complete description of the algorithm in terms of its architecture and the pseudocode is presented. Simulation analysis of the newly proposed neuro-classifier algorithm on a set of benchmark problems is presented. Benchmark problems tested include IRIS, Sonar, Vowel Recognition, Two-Spiral, Wisconsin Breast Cancer, Cleveland Heart Disease and Thyroid Gland Disease. Simulation results indicate that the proposed neuro-classifier performs consistently better for a subset of problems for which other neural classifiers perform relatively poorly.

  4. Probabilistic fracture finite elements

    NASA Technical Reports Server (NTRS)

    Liu, W. K.; Belytschko, T.; Lua, Y. J.

    1991-01-01

    Probabilistic Fracture Mechanics (PFM) is a promising method for estimating the fatigue life and inspection cycles for mechanical and structural components. The Probabilistic Finite Element Method (PFEM), which is based on second moment analysis, has proved to be a promising, practical approach to handle problems with uncertainties. As the PFEM provides a powerful computational tool to determine the first and second moments of random parameters, the second moment reliability method can easily be combined with the PFEM to obtain measures of the reliability of the structural system. The method is also being applied to fatigue crack growth. Uncertainties in the material properties of advanced materials such as polycrystalline alloys, ceramics, and composites are commonly observed in experimental tests. This is mainly attributed to intrinsic microcracks, which are randomly distributed as a result of the applied load and the residual stress.

  5. Probabilistic Fiber Composite Micromechanics

    NASA Technical Reports Server (NTRS)

    Stock, Thomas A.

    1996-01-01

    Probabilistic composite micromechanics methods are developed that simulate expected uncertainties in unidirectional fiber composite properties. These methods are in the form of computational procedures using Monte Carlo simulation. The variables in which uncertainties are accounted for include constituent and void volume ratios, constituent elastic properties and strengths, and fiber misalignment. A graphite/epoxy unidirectional composite (ply) is studied to demonstrate fiber composite material property variations induced by random changes expected at the material micro level. Regression results are presented to show the relative correlation between predictor and response variables in the study. These computational procedures make possible a formal description of anticipated random processes at the intra-ply level, and the related effects of these on composite properties.

  6. Probabilistic retinal vessel segmentation

    NASA Astrophysics Data System (ADS)

    Wu, Chang-Hua; Agam, Gady

    2007-03-01

    Optic fundus assessment is widely used for diagnosing vascular and non-vascular pathology. Inspection of the retinal vasculature may reveal hypertension, diabetes, arteriosclerosis, cardiovascular disease and stroke. Due to various imaging conditions retinal images may be degraded. Consequently, the enhancement of such images and vessels in them is an important task with direct clinical applications. We propose a novel technique for vessel enhancement in retinal images that is capable of enhancing vessel junctions in addition to linear vessel segments. This is an extension of vessel filters we have previously developed for vessel enhancement in thoracic CT scans. The proposed approach is based on probabilistic models which can discern vessels and junctions. Evaluation shows the proposed filter is better than several known techniques and is comparable to the state of the art when evaluated on a standard dataset. A ridge-based vessel tracking process is applied on the enhanced image to demonstrate the effectiveness of the enhancement filter.

  7. Searching LEXIS and WESTLAW: Part II.

    ERIC Educational Resources Information Center

    Franklin, Carl

    1986-01-01

    This second of a three-part series compares search features (i.e., truncation symbols, boolean operators, proximity operators, phrase searching, save searches) of two databases providing legal information. Search tips concerning charges and effective searching and tables listing functions of commands and proximity operators for both databases are…

  8. Probabilistic graphic models applied to identification of diseases.

    PubMed

    Sato, Renato Cesar; Sato, Graziela Tiemy Kajita

    2015-01-01

    Decision-making is fundamental when making a diagnosis or choosing a treatment. The broad dissemination of computerized systems and databases allows systematization of part of these decisions through artificial intelligence. In this text, we present the basic use of probabilistic graphic models as tools to analyze causality in health conditions. This method has been used to make diagnoses of Alzheimer's disease, sleep apnea, and heart diseases. PMID:26154555

  9. Probabilistic graphic models applied to identification of diseases

    PubMed Central

    Sato, Renato Cesar; Sato, Graziela Tiemy Kajita

    2015-01-01

    ABSTRACT Decision-making is fundamental when making a diagnosis or choosing a treatment. The broad dissemination of computerized systems and databases allows systematization of part of these decisions through artificial intelligence. In this text, we present the basic use of probabilistic graphic models as tools to analyze causality in health conditions. This method has been used to make diagnoses of Alzheimer's disease, sleep apnea, and heart diseases. PMID:26154555

  10. Databases for K-8 Students

    ERIC Educational Resources Information Center

    Young, Terrence E., Jr.

    2004-01-01

    Today's elementary school students have been exposed to computers since birth, so it is not surprising that they are so proficient at using them. As a result, they are ready to search databases that include topics and information appropriate for their age level. Subscription databases are digital copies of magazines, newspapers, journals,…

  11. Improving sensitivity in proteome studies by analysis of false discovery rates for multiple search engines

    PubMed Central

    Jones, Andrew R.; Siepen, Jennifer A.; Hubbard, Simon J.; Paton, Norman W.

    2010-01-01

    Tandem mass spectrometry, run in combination with liquid chromatography (LC-MS/MS), can generate large numbers of peptide and protein identifications, for which a variety of database search engines are available. Distinguishing correct identifications from false positives is far from trivial because all data sets are noisy and tend to be too large for manual inspection; therefore, probabilistic methods must be employed to balance the trade-off between sensitivity and specificity. Decoy databases are becoming widely used to place statistical confidence in results sets, allowing the false discovery rate (FDR) to be estimated. It has previously been demonstrated that different MS search engines produce different peptide identification sets, and as such, employing more than one search engine could result in an increased number of peptides being identified. However, such efforts are hindered by the lack of a single scoring framework employed by all search engines. We have developed a search engine independent scoring framework based on FDR, called the FDRScore, which allows peptide identifications from different search engines to be combined. We observe that peptide identifications made by all three search engines are infrequently false positives, while identifications made by only a single search engine, even with a strong score from the source search engine, are significantly more likely to be false positives. We have developed a second score, called the combined FDRScore, based on the FDR within peptide identifications grouped according to the set of search engines that have made the identification. We demonstrate by searching large publicly available data sets that the combined FDRScore can differentiate between correct and incorrect peptide identifications with high accuracy, allowing on average 35% more peptide identifications to be made at a fixed FDR than using a single search engine. PMID:19253293
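    The decoy-based FDR machinery underlying this approach is easy to prototype. The sketch below is an assumption-laden illustration, not the published FDRScore algorithm: it estimates q-value-like FDRs per engine from target/decoy labels and then counts the identifications on which two simulated engines agree.

```python
# Illustrative decoy-based FDR estimation plus a simple
# engine-agreement check, on synthetic scores.
import numpy as np

def decoy_fdr(scores, is_decoy):
    """q-value-like FDR: for each threshold,
    FDR ~ #decoys above threshold / #targets above threshold."""
    order = np.argsort(-scores)
    decoys = np.cumsum(is_decoy[order])
    targets = np.cumsum(~is_decoy[order])
    fdr = decoys / np.maximum(targets, 1)
    # enforce monotonicity (q-values), walking from worst score to best
    q = np.minimum.accumulate(fdr[::-1])[::-1]
    out = np.empty_like(q)
    out[order] = q
    return out

rng = np.random.default_rng(2)
n = 1000
is_decoy = rng.random(n) < 0.5
# hypothetical scores from two engines; targets score higher on average
e1 = rng.normal(loc=np.where(is_decoy, 0.0, 1.0), scale=1.0)
e2 = rng.normal(loc=np.where(is_decoy, 0.0, 1.0), scale=1.0)

q1, q2 = decoy_fdr(e1, is_decoy), decoy_fdr(e2, is_decoy)
both = (q1 < 0.05) & (q2 < 0.05)        # PSMs confidently found by both
print("PSMs at 5%% FDR: engine1=%d, engine2=%d, agreement=%d"
      % ((q1 < 0.05).sum(), (q2 < 0.05).sum(), both.sum()))
```

    Grouping identifications by which engines agree, as in the combined FDRScore, then amounts to computing such an FDR separately within each agreement group.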

  12. Biofuel Database

    National Institute of Standards and Technology Data Gateway

    Biofuel Database (Web, free access)   This database brings together structural, biological, and thermodynamic data for enzymes that are either in current use or are being considered for use in the production of biofuels.

  13. Human genome protein function database.

    PubMed Central

    Sorenson, D. K.

    1991-01-01

    A database which focuses on the normal functions of the currently-known protein products of the Human Genome was constructed. Information is stored as text, figures, tables, and diagrams. The program contains built-in functions to modify, update, categorize, hypertext, search, create reports, and establish links to other databases. The semi-automated categorization feature of the database program was used to classify these proteins in terms of biomedical functions. PMID:1807638

  14. WOVOdat, A Worldwide Volcano Unrest Database, to Improve Eruption Forecasts

    NASA Astrophysics Data System (ADS)

    Widiwijayanti, C.; Costa, F.; Win, N. T. Z.; Tan, K.; Newhall, C. G.; Ratdomopurbo, A.

    2015-12-01

    WOVOdat is the World Organization of Volcano Observatories' Database of Volcanic Unrest, an international effort to develop common standards for compiling and storing data on volcanic unrest in a centralized database that is freely web-accessible for reference during volcanic crises, comparative studies, and basic research on pre-eruption processes. WOVOdat will be to volcanology as an epidemiological database is to medicine. Despite the large spectrum of monitoring techniques, interpreting monitoring data throughout the evolution of unrest and making timely forecasts remain the most challenging tasks for volcanologists. The field of eruption forecasting is becoming more quantitative, based on the understanding of pre-eruptive magmatic processes and the dynamic interaction between variables that are at play in a volcanic system. Such forecasts must also acknowledge and express their uncertainties; therefore most current research in this field has focused on the application of event tree analysis to reflect multiple possible scenarios and the probability of each scenario. Such forecasts are critically dependent on comprehensive and authoritative global volcano unrest data sets, the very information currently collected in WOVOdat. As the database becomes more complete, Boolean searches, side-by-side digital (and thus scalable) comparisons of unrest, and pattern recognition will generate reliable results. Statistical distributions obtained from WOVOdat can then be used to estimate the probabilities of each scenario after specific patterns of unrest. We have established the main web interface for data submission and visualization, and have now incorporated ~20% of worldwide unrest data into the database, covering more than 100 eruptive episodes. In the upcoming years we will concentrate on acquiring data from volcano observatories, developing a robust data query interface, optimizing data mining, and creating tools by which WOVOdat can be used for probabilistic eruption

  15. Probabilistic brains: knowns and unknowns

    PubMed Central

    Pouget, Alexandre; Beck, Jeffrey M; Ma, Wei Ji; Latham, Peter E

    2015-01-01

    There is strong behavioral and physiological evidence that the brain both represents probability distributions and performs probabilistic inference. Computational neuroscientists have started to shed light on how these probabilistic representations and computations might be implemented in neural circuits. One particularly appealing aspect of these theories is their generality: they can be used to model a wide range of tasks, from sensory processing to high-level cognition. To date, however, these theories have only been applied to very simple tasks. Here we discuss the challenges that will emerge as researchers start focusing their efforts on real-life computations, with a focus on probabilistic learning, structural learning and approximate inference. PMID:23955561

  16. Probabilistic Design of Composite Structures

    NASA Technical Reports Server (NTRS)

    Chamis, Christos C.

    2006-01-01

    A formal procedure for the probabilistic design evaluation of a composite structure is described. The uncertainties in all aspects of a composite structure (constituent material properties, fabrication variables, structural geometry, and service environments, etc.), which result in the uncertain behavior in the composite structural responses, are included in the evaluation. The probabilistic evaluation consists of: (1) design criteria, (2) modeling of composite structures and uncertainties, (3) simulation methods, and (4) the decision-making process. A sample case is presented to illustrate the formal procedure and to demonstrate that composite structural designs can be probabilistically evaluated with accuracy and efficiency.

  17. Probabilistic methods for structural response analysis

    NASA Technical Reports Server (NTRS)

    Wu, Y.-T.; Burnside, O. H.; Cruse, T. A.

    1988-01-01

    This paper addresses current work to develop probabilistic structural analysis methods for integration with a specially developed probabilistic finite element code. The goal is to establish distribution functions for the structural responses of stochastic structures under uncertain loadings. Several probabilistic analysis methods are proposed, covering efficient structural probabilistic analysis methods, correlated random variables, and the response of linear systems under stationary random loading.

  18. Electronic Databases.

    ERIC Educational Resources Information Center

    Williams, Martha E.

    1985-01-01

    Presents examples of bibliographic, full-text, and numeric databases. Also discusses how to access these databases online, aids to online retrieval, and several issues and trends (including copyright and downloading, transborder data flow, use of optical disc/videodisc technology, and changing roles in database generation and processing). (JN)

  19. Database Administrator

    ERIC Educational Resources Information Center

    Moore, Pam

    2010-01-01

    The Internet and electronic commerce (e-commerce) generate lots of data. Data must be stored, organized, and managed. Database administrators, or DBAs, work with database software to find ways to do this. They identify user needs, set up computer databases, and test systems. They ensure that systems perform as they should and add people to the…

  20. Hawaii bibliographic database

    USGS Publications Warehouse

    Wright, T.L.; Takahashi, T.J.

    1998-01-01

    The Hawaii bibliographic database has been created to contain all of the literature, from 1779 to the present, pertinent to the volcanological history of the Hawaiian-Emperor volcanic chain. References are entered in a PC- and Macintosh-compatible EndNote Plus bibliographic database with keywords and abstracts or (if no abstract) with annotations as to content. Keywords emphasize location, discipline, process, identification of new chemical data or age determinations, and type of publication. The database is updated approximately three times a year and is available for download from an ftp site. The bibliography contained 8460 references at the time this paper was submitted for publication. Use of the database greatly enhances the power and completeness of library searches for anyone interested in Hawaiian volcanism.

  1. Custom Search Engines: Tools & Tips

    ERIC Educational Resources Information Center

    Notess, Greg R.

    2008-01-01

    Few have the resources to build a Google or Yahoo! from scratch. Yet anyone can build a search engine based on a subset of the large search engines' databases. Use Google Custom Search Engine or Yahoo! Search Builder or any of the other similar programs to create a vertical search engine targeting sites of interest to users. The basic steps to…

  2. PADB : Published Association Database

    PubMed Central

    Rhee, Hwanseok; Lee, Jin-Sung

    2007-01-01

    Background Although molecular pathway information and the International HapMap Project data can help biomedical researchers to investigate the aetiology of complex diseases more effectively, such information is missing or insufficient in current genetic association databases. In addition, only a few of the environmental risk factors are included as gene-environment interactions, and the risk measures of associations are not indexed in any association databases. Description We have developed a published association database (PADB; ) that includes both the genetic associations and the environmental risk factors available in the PubMed database. Each genetic risk factor is linked to a molecular pathway database and the HapMap database through human gene symbols identified in the abstracts. Risk measures such as odds ratios or hazard ratios are extracted automatically from the abstracts when available. Thus, users can review the association data sorted by the risk measures, and genetic associations can be grouped by human genes or molecular pathways. The search results can also be saved to tab-delimited text files for further sorting or analysis. Currently, PADB indexes more than 1,500,000 PubMed abstracts that include 3442 human genes, 461 molecular pathways and about 190,000 risk measures ranging from 0.00001 to 4878.9. Conclusion PADB is a unique online database of published associations that will serve as a novel and powerful resource for reviewing and interpreting the huge body of association data on complex human diseases. PMID:17877839

  3. Probabilistic Open Set Recognition

    NASA Astrophysics Data System (ADS)

    Jain, Lalit Prithviraj

    Real-world tasks in computer vision, pattern recognition and machine learning often touch upon the open set recognition problem: multi-class recognition with incomplete knowledge of the world and many unknown inputs. An obvious way to approach such problems is to develop a recognition system that thresholds probabilities to reject unknown classes. Traditional rejection techniques are not about the unknown; they are about the uncertain boundary and rejection around that boundary. Thus traditional techniques only represent the "known unknowns". However, a proper open set recognition algorithm is needed to reduce the risk from the "unknown unknowns". This dissertation examines this concept and finds that existing probabilistic multi-class recognition approaches are ineffective for true open set recognition. We hypothesize the cause is due to weak ad hoc assumptions combined with closed-world assumptions made by existing calibration techniques. Intuitively, if we could accurately model just the positive data for any known class without overfitting, we could reject the large set of unknown classes even under this assumption of incomplete class knowledge. For this, we formulate the problem as one of modeling positive training data by invoking statistical extreme value theory (EVT) near the decision boundary of positive data with respect to negative data. We provide a new algorithm called the PI-SVM for estimating the unnormalized posterior probability of class inclusion. This dissertation also introduces a new open set recognition model called Compact Abating Probability (CAP), where the probability of class membership decreases in value (abates) as points move from known data toward open space. We show that CAP models improve open set recognition for multiple algorithms. Leveraging the CAP formulation, we go on to describe the novel Weibull-calibrated SVM (W-SVM) algorithm, which combines the useful properties of statistical EVT for score calibration with one-class and binary

  4. Probabilistic theories with purification

    SciTech Connect

    Chiribella, Giulio; D'Ariano, Giacomo Mauro; Perinotti, Paolo

    2010-06-15

    We investigate general probabilistic theories in which every mixed state has a purification, unique up to reversible channels on the purifying system. We show that the purification principle is equivalent to the existence of a reversible realization of every physical process, that is, to the fact that every physical process can be regarded as arising from a reversible interaction of the system with an environment, which is eventually discarded. From the purification principle we also construct an isomorphism between transformations and bipartite states that possesses all structural properties of the Choi-Jamiolkowski isomorphism in quantum theory. Such an isomorphism allows one to prove most of the basic features of quantum theory, like, e.g., existence of pure bipartite states giving perfect correlations in independent experiments, no information without disturbance, no joint discrimination of all pure states, no cloning, teleportation, no programming, no bit commitment, complementarity between correctable channels and deletion channels, characterization of entanglement-breaking channels as measure-and-prepare channels, and others, without resorting to the mathematical framework of Hilbert spaces.

  5. Online Database Coverage of Forensic Medicine.

    ERIC Educational Resources Information Center

    Snow, Bonnie; Ifshin, Steven L.

    1984-01-01

    Online searches of sample topics in the area of forensic medicine were conducted in the following life science databases: Biosis Previews, Excerpta Medica, Medline, Scisearch, and Chemical Abstracts Search. Search outputs analyzed according to criteria of recall, uniqueness, overlap, and utility reveal the need for a cross-database approach to…

  6. A Stemming Algorithm for Latin Text Databases.

    ERIC Educational Resources Information Center

    Schinke, Robyn; And Others

    1996-01-01

    Describes the design of a stemming algorithm for searching Latin text databases. The algorithm uses a longest-match approach with some recoding but differs from most stemmers in its use of two separate suffix dictionaries for processing query and database words that enables users to pursue specific searches for single grammatical forms of words.…
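    A longest-match suffix stripper of the kind described is compact to write down. In the toy sketch below, the two suffix lists and the recoding rule are invented stand-ins, not the published dictionaries.

```python
# Toy longest-match Latin suffix stripper with separate suffix lists
# for query words and database words, plus a small recoding step.
QUERY_SUFFIXES = ["ibus", "arum", "orum", "ae", "is", "os", "um", "a", "e", "i"]
DB_SUFFIXES = ["ibus", "arum", "orum", "ae", "is", "um", "a"]

def stem(word, suffixes, min_stem=3):
    """Strip the longest matching suffix, keeping at least min_stem chars."""
    for suf in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suf) and len(word) - len(suf) >= min_stem:
            stem_ = word[: -len(suf)]
            # toy recoding rule: normalize a consonant doubled by inflection
            if stem_.endswith("ll"):
                stem_ = stem_[:-1]
            return stem_
    return word

for w in ["rosarum", "rosae", "bellis", "temporibus"]:
    print(w, "->", stem(w, DB_SUFFIXES))
```

    Keeping separate query-side and database-side suffix lists, as the abstract notes, is what lets a user deliberately search for a single grammatical form rather than always conflating them.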

  7. First Look: VU/TEXT Databases.

    ERIC Educational Resources Information Center

    Willmann, Donna

    1985-01-01

    Profiles of online services provided by VU/TEXT, which maintains market access to electronic newspaper databases, highlights scope (newspapers, business information, wire services and nonnewspaper regional information, encyclopedia); search techniques; strengths; and upcoming enhancements. Descriptions of 17 databases and sample searches are…

  8. Data mining in forensic image databases

    NASA Astrophysics Data System (ADS)

    Geradts, Zeno J.; Bijhold, Jurrien

    2002-07-01

    Forensic image databases appear in a wide variety. The oldest computer database contains fingerprints. Other examples of databases are shoeprints, handwriting, cartridge cases, toolmarks, drug tablets, and faces. In these databases, searches are conducted on shape, color, and other forensic features. A wide variety of methods exist for searching the images in these databases. The result will be a list of candidates that should be compared manually. The challenge in forensic science is to combine the information acquired. The combination of the shape of a partial shoe print with information on a cartridge case can result in stronger evidence. It is expected that by searching these databases in combination with other databases (e.g., network traffic information), more crimes will be solved. Searching in image databases is still difficult, as we can see in databases of faces. Due to lighting conditions and alteration of the face by aging, it is nearly impossible for an image-searching method to rank the right face from a database of one million faces in top position without using other information. The methods for data mining in images in databases (e.g., the MPEG-7 framework) are discussed, and expectations for future developments are presented in this study.

  9. Degradation monitoring using probabilistic inference

    NASA Astrophysics Data System (ADS)

    Alpay, Bulent

    In order to increase safety and improve economy and performance in a nuclear power plant (NPP), the source and extent of component degradations should be identified before failures and breakdowns occur. It is also crucial for the next generation of NPPs, which are designed to have a long core life and high fuel burnup, to have a degradation monitoring system in order to keep the reactor in a safe state, to meet the designed reactor core lifetime, and to optimize the scheduled maintenance. Model-based methods determine the inconsistencies between the actual and expected behavior of the plant and use these inconsistencies for detection and diagnostics of degradations. By defining degradation as a random abrupt change from the nominal to a constant degraded state of a component, we employed nonlinear filtering techniques based on state/parameter estimation. We utilized a Bayesian recursive estimation formulation in the sequential probabilistic inference framework and constructed a hidden Markov model to represent a general physical system. By addressing the problem of a filter's inability to estimate an abrupt change, which is called the oblivious filter problem in nonlinear extensions of Kalman filtering, and the sample impoverishment problem in particle filtering, we developed techniques to modify filtering algorithms by utilizing additional data sources to improve the filter's response to this problem. We utilized a reliability degradation database, which can be constructed from plant-specific operational experience and test and maintenance reports, to generate proposal densities for probable degradation modes. These are used in a multiple hypothesis testing algorithm. We then test samples drawn from these proposal densities against the particle filtering estimates based on the Bayesian recursive estimation formulation with the Metropolis-Hastings algorithm, which is a well-known Markov chain Monte Carlo method (MCMC). This multiple hypothesis testing
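    The key trick described above, injecting proposals for the degraded mode so the filter is not oblivious to an abrupt jump, can be illustrated with a bare-bones particle filter. Everything in the sketch below (models, noise levels, the degraded-mode proposal) is invented for illustration.

```python
# Bare-bones particle filter tracking a parameter that degrades abruptly.
# A fraction of particles is proposed from a "degraded-mode" density,
# standing in for the reliability-database proposals described above.
import numpy as np

rng = np.random.default_rng(4)
T, n_particles = 60, 500
true = np.where(np.arange(T) < 30, 1.0, 0.6)         # abrupt degradation
obs = true + rng.normal(0, 0.05, T)                  # noisy measurements

def degraded_proposal(n):
    # proposal density built, hypothetically, from failure-mode data
    return rng.normal(0.6, 0.1, n)

particles = np.ones(n_particles)
for t in range(T):
    k = n_particles // 10
    particles[:k] = degraded_proposal(k)             # 10% mode injection
    particles += rng.normal(0, 0.01, n_particles)    # process noise
    w = np.exp(-0.5 * ((obs[t] - particles) / 0.05) ** 2)
    w /= w.sum()
    particles = rng.choice(particles, n_particles, p=w)  # resample
    if t % 10 == 0:
        print("t=%2d estimate=%.3f truth=%.2f" % (t, particles.mean(), true[t]))
```

    Without the injection step, nearly all particles sit near the nominal value at the moment of the jump and the filter recovers slowly, which is the oblivious-filter behavior the abstract refers to.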

  10. Statistical databases

    SciTech Connect

    Kogalovskii, M.R.

    1995-03-01

    This paper presents a review of problems related to statistical database systems, which are widespread in various fields of activity. Statistical databases (SDBs) are databases that are used for statistical analysis. Topics under consideration are: SDB peculiarities, properties of data models adequate for SDB requirements, metadata functions, null-value problems, SDB compromise protection problems, stored data compression techniques, and statistical data representation means. Also examined is whether present Database Management Systems (DBMSs) satisfy SDB requirements. Some current research directions in SDB systems are considered.

  11. Probabilistic exposure fusion.

    PubMed

    Song, Mingli; Tao, Dacheng; Chen, Chun; Bu, Jiajun; Luo, Jiebo; Zhang, Chengqi

    2012-01-01

    The luminance of a natural scene is often of high dynamic range (HDR). In this paper, we propose a new scheme to handle HDR scenes by integrating locally adaptive scene detail capture and suppressing gradient reversals introduced by the local adaptation. The proposed scheme is novel for capturing an HDR scene by using a standard dynamic range (SDR) device and synthesizing an image suitable for SDR displays. In particular, we use an SDR capture device to record scene details (i.e., the visible contrasts and the scene gradients) in a series of SDR images with different exposure levels. Each SDR image responds to a fraction of the HDR and partially records scene details. With the captured SDR image series, we first calculate the image luminance levels, which maximize the visible contrasts, and then the scene gradients embedded in these images. Next, we synthesize an SDR image by using a probabilistic model that preserves the calculated image luminance levels and suppresses reversals in the image luminance gradients. The synthesized SDR image contains many more scene details than any of the captured SDR images. Moreover, the proposed scheme also functions as the tone mapping of an HDR image to an SDR image, and it is superior to both global and local tone mapping operators. This is because global operators fail to preserve visual details when the contrast ratio of a scene is large, whereas local operators often produce halos in the synthesized SDR image. The proposed scheme does not require any human interaction or parameter tuning for different scenes. Subjective evaluations have shown that it is preferred over a number of existing approaches. PMID:21609883
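    A drastically simplified flavor of exposure-series fusion can be shown in a few lines. The per-pixel well-exposedness weighting below is an assumed stand-in for the paper's probabilistic model, which additionally handles gradient reversals.

```python
# Minimal exposure-fusion sketch: each pixel of each SDR exposure is
# weighted by how well exposed it is (closeness of luminance to 0.5),
# and the weighted average keeps detail from the best exposure.
import numpy as np

def fuse(exposures, sigma=0.2):
    """exposures: list of float images in [0, 1] with identical shapes."""
    stack = np.stack(exposures)                      # (k, h, w)
    weights = np.exp(-((stack - 0.5) ** 2) / (2 * sigma**2))
    weights /= weights.sum(axis=0, keepdims=True)    # normalize per pixel
    return (weights * stack).sum(axis=0)

rng = np.random.default_rng(3)
scene = rng.random((4, 4)) * 4.0                     # toy HDR luminance
exposures = [np.clip(scene * g, 0.0, 1.0) for g in (0.25, 0.5, 1.0)]
print(fuse(exposures).round(2))
```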

  12. PROBABILISTIC INFORMATION INTEGRATION TECHNOLOGY

    SciTech Connect

    J. BOOKER; M. MEYER; ET AL

    2001-02-01

    The Statistical Sciences Group at Los Alamos has successfully developed a structured, probabilistic, quantitative approach for the evaluation of system performance based on multiple information sources, called Information Integration Technology (IIT). The technology integrates diverse types and sources of data and information (both quantitative and qualitative), and their associated uncertainties, to develop distributions for performance metrics, such as reliability. Applications include predicting complex system performance, where test data are lacking or expensive to obtain, through the integration of expert judgment, historical data, computer/simulation model predictions, and any relevant test/experimental data. The technology is particularly well suited for tracking estimated system performance for systems under change (e.g. development, aging), and can be used at any time during product development, including concept and early design phases, prior to prototyping, testing, or production, and before costly design decisions are made. Techniques from various disciplines (e.g., state-of-the-art expert elicitation, statistical and reliability analysis, design engineering, physics modeling, and knowledge management) are merged and modified to develop formal methods for the data/information integration. The power of this technology, known as PREDICT (Performance and Reliability Evaluation with Diverse Information Combination and Tracking), won a 1999 R and D 100 Award (Meyer, Booker, Bement, Kerscher, 1999). Specifically the PREDICT application is a formal, multidisciplinary process for estimating the performance of a product when test data are sparse or nonexistent. The acronym indicates the purpose of the methodology: to evaluate the performance or reliability of a product/system by combining all available (often diverse) sources of information and then tracking that performance as the product undergoes changes.

  13. The comprehensive peptaibiotics database.

    PubMed

    Stoppacher, Norbert; Neumann, Nora K N; Burgstaller, Lukas; Zeilinger, Susanne; Degenkolb, Thomas; Brückner, Hans; Schuhmacher, Rainer

    2013-05-01

    Peptaibiotics are nonribosomally biosynthesized peptides which, by definition, contain the marker amino acid α-aminoisobutyric acid (Aib) and possess antibiotic properties. Since the first report in 1958, a constantly increasing number of peptaibiotics have been described and investigated, with a particular emphasis on hypocrealean fungi. Starting from the existing online 'Peptaibol Database', first published in 1997, an exhaustive literature survey of all known peptaibiotics was carried out and resulted in a list of 1043 peptaibiotics. The gathered information was compiled and used to create the new 'The Comprehensive Peptaibiotics Database', which is presented here. The database was devised as a software tool based on Microsoft (MS) Access. It is freely available from the internet at http://peptaibiotics-database.boku.ac.at and can easily be installed and operated on any computer offering a Windows XP/7 environment. It provides useful information on characteristic properties of the peptaibiotics included, such as peptide category, group name of the microheterogeneous mixture to which the peptide belongs, amino acid sequence, sequence length, producing fungus, peptide subfamily, molecular formula, and monoisotopic mass. All these characteristics can be used and combined for automated search within the database, which makes The Comprehensive Peptaibiotics Database a versatile tool for the retrieval of valuable information about peptaibiotics. Sequence data are covered through December 14, 2012. PMID:23681723

  14. Evaluation of Federated Searching Options for the School Library

    ERIC Educational Resources Information Center

    Abercrombie, Sarah E.

    2008-01-01

    Three hosted federated search tools, Follett One Search, Gale PowerSearch Plus, and WebFeat Express, were configured and implemented in a school library. Databases from five vendors and the OPAC were systematically searched. Federated search results were compared with each other and to the results of the same searches in the database's native…

  15. A search for pre-main sequence stars in the high-latitude molecular clouds. II - A survey of the Einstein database

    NASA Technical Reports Server (NTRS)

    Caillault, Jean-Pierre; Magnani, Loris

    1990-01-01

    Preliminary results are reported from a survey of every EINSTEIN image which overlaps any high-latitude molecular cloud, conducted in a search for X-ray-emitting pre-main-sequence stars. This survey, together with complementary KPNO and IRAS data, will allow the determination of how prevalent low-mass star formation is in these clouds in general and in the translucent molecular clouds in particular.

  16. Database Manager

    ERIC Educational Resources Information Center

    Martin, Andrew

    2010-01-01

    It is normal practice today for organizations to store large quantities of records of related information as computer-based files or databases. Purposeful information is retrieved by performing queries on the data sets. The purpose of DATABASE MANAGER is to communicate to students the method by which the computer performs these queries. This…

  17. Maize databases

    Technology Transfer Automated Retrieval System (TEKTRAN)

    This chapter is a succinct overview of maize data held in the species-specific database MaizeGDB (the Maize Genomics and Genetics Database), and selected multi-species data repositories, such as Gramene/Ensembl Plants, Phytozome, UniProt and the National Center for Biotechnology Information (NCBI), ...

  18. Collection Fusion Using Bayesian Estimation of a Linear Regression Model in Image Databases on the Web.

    ERIC Educational Resources Information Center

    Kim, Deok-Hwan; Chung, Chin-Wan

    2003-01-01

    Discusses the collection fusion problem of image databases, concerned with retrieving relevant images by content based retrieval from image databases distributed on the Web. Focuses on a metaserver which selects image databases supporting similarity measures and proposes a new algorithm which exploits a probabilistic technique using Bayesian…
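    Although the abstract is truncated, the title names Bayesian estimation of a linear regression model; a minimal conjugate version is sketched below on invented data, with the regression relating a database's reported similarity scores to the relevance fractions a metaserver observes. The prior and noise variances are assumptions, and this generic sketch is not the paper's algorithm.

```python
# Minimal conjugate Bayesian linear regression: posterior over the
# slope/intercept relating similarity scores to observed relevance.
import numpy as np

rng = np.random.default_rng(5)
x = rng.random(30)                           # mean similarity score per query
y = 2.0 * x + 0.3 + rng.normal(0, 0.1, 30)   # observed relevant fraction

X = np.column_stack([x, np.ones_like(x)])
tau2, sigma2 = 10.0, 0.01                    # prior variance, noise variance
# posterior over [slope, intercept] is Gaussian N(mean, cov)
cov = np.linalg.inv(X.T @ X / sigma2 + np.eye(2) / tau2)
mean = cov @ (X.T @ y / sigma2)
print("posterior mean slope=%.2f intercept=%.2f" % (mean[0], mean[1]))
```

    A metaserver could use such a posterior to predict how many relevant images each distributed database is likely to return and allocate result quotas accordingly.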

  19. Cytochrome P450 database.

    PubMed

    Lisitsa, A V; Gusev, S A; Karuzina, I I; Archakov, A I; Koymans, L

    2001-01-01

    This paper describes a specialized database dedicated exclusively to the cytochrome P450 superfamily. The system presents the superfamily's nomenclature and describes the structure and function of different P450 enzymes. Information on P450-catalyzed reactions, substrate preferences, and peculiarities of induction and inhibition is available through the database management system. The source genes and the corresponding translated proteins can also be retrieved, together with literature references. The programming solution provides a flexible interface for browsing, searching, grouping, and reporting the information. A local version of the database manager and the required data files are distributed on a compact disc; a network version of the software is also available on the Internet. The network version implements a mechanism for permanent online extension of the data scope. PMID:11769119

  20. Probabilistic progressive buckling of trusses

    NASA Technical Reports Server (NTRS)

    Pai, Shantaram S.; Chamis, Christos C.

    1991-01-01

    A three-bay, space, cantilever truss is probabilistically evaluated to describe progressive buckling and truss collapse in view of the numerous uncertainties associated with the structural, material, and load variables (primitive variables) that describe the truss. Initially, the truss is deterministically analyzed for member forces, and member(s) in which the axial force exceeds the Euler buckling load are identified. These member(s) are then discretized with several intermediate nodes and a probabilistic buckling analysis is performed on the truss to obtain its probabilistic buckling loads and respective mode shapes. Furthermore, sensitivities associated with the uncertainties in the primitive variables are investigated, margin of safety values for the truss are determined, and truss end node displacements are noted. These steps are repeated by sequentially removing the buckled member(s) until onset of truss collapse is reached. Results show that this procedure yields an optimum truss configuration for a given loading and for a specified reliability.

  1. Rule Learning with Probabilistic Smoothing

    NASA Astrophysics Data System (ADS)

    Costa, Gianni; Guarascio, Massimo; Manco, Giuseppe; Ortale, Riccardo; Ritacco, Ettore

    A hierarchical classification framework is proposed for discriminating rare classes in imprecise domains, characterized by rarity (of both classes and cases), noise and low class separability. The devised framework couples each rule of a rule-based classifier with a local probabilistic generative model, trained over the coverage of the corresponding rule to better catch those globally rare cases/classes that become less rare in the coverage. Two novel schemes for tightly integrating rule-based and probabilistic classification are introduced, which classify unlabeled cases by considering multiple classifier rules as well as their local probabilistic counterparts. An intensive evaluation shows that the proposed framework is competitive and often superior in accuracy with respect to established competitors, while overcoming them in dealing with rare classes.

  2. Vagueness as Probabilistic Linguistic Knowledge

    NASA Astrophysics Data System (ADS)

    Lassiter, Daniel

    Consideration of the metalinguistic effects of utterances involving vague terms has led Barker [1] to treat vagueness using a modified Stalnakerian model of assertion. I present a sorites-like puzzle for factual beliefs in the standard Stalnakerian model [28] and show that it can be resolved by enriching the model to make use of probabilistic belief spaces. An analogous problem arises for metalinguistic information in Barker's model, and I suggest that a similar enrichment is needed here as well. The result is a probabilistic theory of linguistic representation that retains a classical metalanguage but avoids the undesirable divorce between meaning and use inherent in the epistemic theory [34]. I also show that the probabilistic approach provides a plausible account of the sorites paradox and higher-order vagueness and that it fares well empirically and conceptually in comparison to leading competitors.

  3. Database systems for knowledge-based discovery.

    PubMed

    Jagarlapudi, Sarma A R P; Kishan, K V Radha

    2009-01-01

    Several database systems have been developed to provide valuable information in a structured format to users ranging from the bench chemist to the biologist and from the medical practitioner to the pharmaceutical scientist. The advent of information technology and computational power enhanced the ability to access large volumes of data in the form of a database, where one could do compilation, searching, archiving, analysis, and finally knowledge derivation. Although data are of variable types, the tools used for database creation, searching, and retrieval are similar. GVK BIO has been developing databases from publicly available scientific literature in specific areas like medicinal chemistry, clinical research, and mechanism-based toxicity, so that the structured databases containing vast data could be used in several areas of research. These databases were classified as reference-centric or compound-centric depending on the way the database systems were designed. Integration of these databases with knowledge derivation tools would enhance their value toward better drug design and discovery. PMID:19727614

  4. Probabilistic framework for network partition

    NASA Astrophysics Data System (ADS)

    Li, Tiejun; Liu, Jian; E, Weinan

    2009-08-01

    Given a large and complex network, we would like to find the partition of this network into a small number of clusters. This question has been addressed in many different ways. In a previous paper, we proposed a deterministic framework for an optimal partition of a network as well as the associated algorithms. In this paper, we extend this framework to a probabilistic setting, in which each node has a certain probability of belonging to a certain cluster. Two classes of numerical algorithms for such a probabilistic network partition are presented and tested. Application to three representative examples is discussed.
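    A toy version of such a soft partition can be built with plain soft label propagation: each node carries a probability of belonging to each cluster, updated by averaging its neighbors' memberships. This stands in for, and is not, the framework's actual algorithms.

```python
# Soft (probabilistic) two-cluster partition of a small graph by
# repeatedly averaging neighbors' membership probabilities.
import numpy as np

# adjacency of a small graph: two triangles joined by one edge
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

P = np.full((n, 2), 0.5)          # membership probabilities, 2 clusters
P[0] = [1.0, 0.0]                 # seed node for cluster 0
P[5] = [0.0, 1.0]                 # seed node for cluster 1

deg = A.sum(axis=1, keepdims=True)
for _ in range(50):
    P = (A @ P) / deg             # average the neighbors' memberships
    P[0], P[5] = [1.0, 0.0], [0.0, 1.0]   # keep the seeds clamped

print(np.round(P, 3))             # soft membership of each node
```

    The bridge nodes (2 and 3) end up with intermediate probabilities, which is exactly the information a hard partition discards.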

  5. Probabilistic coding of quantum states

    SciTech Connect

    Grudka, Andrzej; Wojcik, Antoni; Czechlewski, Mikolaj

    2006-07-15

    We discuss the properties of probabilistic coding of two qubits to one qutrit and generalize the scheme to higher dimensions. We show that the protocol preserves the entanglement between the qubits to be encoded and the environment and can also be applied to mixed states. We present a protocol that enables encoding of n qudits to one qudit of dimension smaller than the Hilbert space of the original system and then allows probabilistic but error-free decoding of any subset of k qudits. We give a formula for the probability of successful decoding.

  6. Standardization of Keyword Search Mode

    ERIC Educational Resources Information Center

    Su, Di

    2010-01-01

    In spite of its popularity, keyword search mode has not been standardized. Though information professionals are quick to adapt to various presentations of keyword search mode, novice end-users may find keyword search confusing. This article compares keyword search mode in some major reference databases and calls for standardization. (Contains 3…

  7. A search for pre-main-sequence stars in high-latitude molecular clouds. 3: A survey of the Einstein database

    NASA Technical Reports Server (NTRS)

    Caillault, Jean-Pierre; Magnani, Loris; Fryer, Chris

    1995-01-01

    In order to discern whether the high-latitude molecular clouds are regions of ongoing star formation, we have used X-ray emission as a tracer of youthful stars. The entire Einstein database yields 18 images which overlap 10 of the clouds mapped partially or completely in the CO (1-0) transition, providing a total of approximately 6 deg squared of overlap. Five previously unidentified X-ray sources were detected: one has an optical counterpart which is a pre-main-sequence (PMS) star, and two have normal main-sequence stellar counterparts, while the other two are probably extragalactic sources. The PMS star is located in a high Galactic latitude Lynds dark cloud, so this result is not too surprising. The translucent clouds, though, have yet to reveal any evidence of star formation.

  8. BIOMARKERS DATABASE

    EPA Science Inventory

    This database was developed by assembling and evaluating the literature relevant to human biomarkers. It catalogues and evaluates the usefulness of biomarkers of exposure, susceptibility and effect which may be relevant for a longitudinal cohort study. In addition to describing ...

  9. Using the Reactome Database

    PubMed Central

    Haw, Robin

    2012-01-01

    There is considerable interest in the bioinformatics community in creating pathway databases. The Reactome project (a collaboration between the Ontario Institute for Cancer Research, Cold Spring Harbor Laboratory, New York University Medical Center and the European Bioinformatics Institute) is one such pathway database and collects structured information on all the biological pathways and processes in humans. It is an expert-authored and peer-reviewed, curated collection of well-documented molecular reactions that span the gamut from simple intermediate metabolism to signaling pathways and complex cellular events. This information is supplemented with likely orthologous molecular reactions in mouse, rat, zebrafish, worm and other model organisms. This unit describes how to use the Reactome database to learn the steps of a biological pathway; navigate and browse through the Reactome database; identify the pathways in which a molecule of interest is involved; use the Pathway and Expression analysis tools to search the database for, and visualize, possible connections between a user-supplied experimental data set and Reactome pathways; and use the Species Comparison tool to compare human and model organism pathways. PMID:22700314

  10. NASA Records Database

    NASA Technical Reports Server (NTRS)

    Callac, Christopher; Lunsford, Michelle

    2005-01-01

    The NASA Records Database, comprising a Web-based application program and a database, is used to administer an archive of paper records at Stennis Space Center. The system begins with an electronic form, into which a user enters information about records that the user is sending to the archive. The form is smart: it provides instructions for entering information correctly and prompts the user to enter all required information. Once complete, the form is digitally signed and submitted to the database. The system determines which storage locations are not in use, assigns the user's boxes of records to some of them, and enters these assignments in the database. Thereafter, the software tracks the boxes and can be used to locate them. By use of search capabilities of the software, specific records can be sought by box storage locations, accession numbers, record dates, submitting organizations, or details of the records themselves. Boxes can be marked with such statuses as checked out, lost, transferred, and destroyed. The system can generate reports showing boxes awaiting destruction or transfer. When boxes are transferred to the National Archives and Records Administration (NARA), the system can automatically fill out NARA records-transfer forms. Currently, several other NASA Centers are considering deploying the NASA Records Database to help automate their records archives.

  11. A Database Selection Expert System Based on Reference Librarian's Database Selection Strategy: A Usability and Empirical Evaluation.

    ERIC Educational Resources Information Center

    Ma, Wei

    2002-01-01

    Describes the development of a prototype Web-based database selection expert system at the University of Illinois at Urbana-Champaign that is based on reference librarians' database selection strategy which allows users to simultaneously search all available databases to identify those most relevant to their search using free-text keywords or…

  12. Research on probabilistic information processing

    NASA Technical Reports Server (NTRS)

    Edwards, W.

    1973-01-01

    The work accomplished on probabilistic information processing (PIP) is reported. The research proposals and decision analysis are discussed along with the results of research on MSC setting, multiattribute utilities, and Bayesian research. Abstracts of reports concerning the PIP research are included.

  13. Probabilistic assessment of composite structures

    NASA Technical Reports Server (NTRS)

    Shiao, Michael E.; Abumeri, Galib H.; Chamis, Christos C.

    1993-01-01

    A general computational simulation methodology for an integrated probabilistic assessment of composite structures is discussed and demonstrated using aircraft fuselage (stiffened composite cylindrical shell) structures with rectangular cutouts. The computational simulation was performed for the probabilistic assessment of the structural behavior including buckling loads, vibration frequencies, global displacements, and local stresses. The scatter in the structural response is simulated based on the inherent uncertainties in the primitive (independent random) variables at the fiber matrix constituent, ply, laminate, and structural scales that describe the composite structures. The effect of uncertainties due to fabrication process variables such as fiber volume ratio, void volume ratio, ply orientation, and ply thickness is also included. The methodology has been embedded in the computer code IPACS (Integrated Probabilistic Assessment of Composite Structures). In addition to the simulated scatter, the IPACS code also calculates the sensitivity of the composite structural behavior to all the primitive variables that influence the structural behavior. This information is useful for assessing reliability and providing guidance for improvement. The results from the probabilistic assessment for the composite structure with rectangular cutouts indicate that the uncertainty in the longitudinal ply stress is mainly caused by the uncertainty in the laminate thickness, and the large overlap of the scatter in the first four buckling loads implies that the buckling mode shape for a specific buckling load can be either of the four modes.

  14. Probabilistic Techniques for Phrase Extraction.

    ERIC Educational Resources Information Center

    Feng, Fangfang; Croft, W. Bruce

    2001-01-01

    This study proposes a probabilistic model for automatically extracting English noun phrases for indexing or information retrieval. The technique is based on a Markov model, whose initial parameters are estimated by a phrase lookup program with a phrase dictionary, then optimized by a set of maximum entropy parameters. (Author/LRW)
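    The Markov-model idea can be illustrated with a tiny two-state (inside/outside a noun phrase) chain decoded with Viterbi. All transition and emission probabilities below are invented; a real system would estimate them from data and refine them with maximum entropy parameters, as the abstract describes.

```python
# Toy two-state Markov model for phrase extraction: label tokens as
# inside (I) or outside (O) a noun phrase via Viterbi decoding.
import numpy as np

states = ["O", "I"]
trans = np.log(np.array([[0.8, 0.2],     # O->O, O->I
                         [0.4, 0.6]]))   # I->O, I->I
# emission log-probabilities of each token type under each state
emit = {"DET": np.log([0.10, 0.30]), "ADJ": np.log([0.05, 0.25]),
        "NOUN": np.log([0.05, 0.40]), "VERB": np.log([0.80, 0.05])}

def viterbi(tags):
    v = np.log([0.9, 0.1]) + emit[tags[0]]          # initial distribution
    back = []
    for t in tags[1:]:
        scores = v[:, None] + trans                  # (from, to)
        back.append(scores.argmax(axis=0))
        v = scores.max(axis=0) + emit[t]
    path = [int(v.argmax())]
    for b in reversed(back):                         # trace back pointers
        path.append(int(b[path[-1]]))
    return [states[s] for s in reversed(path)]

tokens = ["the", "fast", "probabilistic", "model", "runs"]
tags = ["DET", "ADJ", "ADJ", "NOUN", "VERB"]
print(list(zip(tokens, viterbi(tags))))
```

    Maximal runs of I-labeled tokens are then emitted as candidate noun phrases for indexing.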

  15. Designing Probabilistic Tasks for Kindergartners

    ERIC Educational Resources Information Center

    Skoumpourdi, Chrysanthi; Kafoussi, Sonia; Tatsis, Konstantinos

    2009-01-01

    Recent research suggests that children could be engaged in probability tasks at an early age and task characteristics seem to play an important role in the way children perceive an activity. To this direction in the present article we investigate the role of some basic characteristics of probabilistic tasks in their design and implementation. In…

  16. Making Probabilistic Relational Categories Learnable

    ERIC Educational Resources Information Center

    Jung, Wookyoung; Hummel, John E.

    2015-01-01

    Theories of relational concept acquisition (e.g., schema induction) based on structured intersection discovery predict that relational concepts with a probabilistic (i.e., family resemblance) structure ought to be extremely difficult to learn. We report four experiments testing this prediction by investigating conditions hypothesized to facilitate…

  17. Probabilistic Mass Growth Uncertainties

    NASA Technical Reports Server (NTRS)

    Plumer, Eric; Elliott, Darren

    2013-01-01

    Mass has been widely used as a variable input parameter for Cost Estimating Relationships (CER) for space systems. As these space systems progress from early concept studies and drawing boards to the launch pad, their masses tend to grow substantially, hence adversely affecting a primary input to most modeling CERs. Modeling and predicting mass uncertainty, based on historical and analogous data, is therefore critical and is an integral part of modeling cost risk. This paper presents the results of an ongoing NASA effort to publish mass growth datasheets for adjusting single-point Technical Baseline Estimates (TBE) of masses of space instruments as well as spacecraft, for both earth-orbiting and deep-space missions at various stages of a project's lifecycle. This paper also discusses the long-term strategy of NASA Headquarters in publishing similar results, using a variety of cost-driving metrics, on an annual basis. This paper provides quantitative results that show decreasing mass growth uncertainties as mass estimate maturity increases. The analysis is based on historical data obtained from the NASA Cost Analysis Data Requirements (CADRe) database.

  18. Probabilistic Fatigue And Flaw-Propagation Analysis

    NASA Technical Reports Server (NTRS)

    Moore, Nicholas; Newlin, Laura; Ebbeler, Donald; Sutharshana, Sravan; Creager, Matthew

    1995-01-01

    The Probabilistic Failure Assessment for Fatigue and Flaw Propagation (PFAFAT II) package of software utilizes probabilistic failure-assessment (PFA) methodology to model flaw-propagation and low-cycle-fatigue modes of failure of structural components. It comprises one program for performing probabilistic crack-growth analysis and two programs for performing probabilistic low-cycle-fatigue analysis. These programs perform probabilistic fatigue and crack-propagation analysis by means of Monte Carlo simulation. PFAFAT II is an extension of, rather than a replacement for, the PFAFAT software (NPO-18965). It is written in FORTRAN 77.

  19. A probabilistic Hu-Washizu variational principle

    NASA Technical Reports Server (NTRS)

    Liu, W. K.; Belytschko, T.; Besterfield, G. H.

    1987-01-01

    A Probabilistic Hu-Washizu Variational Principle (PHWVP) for the Probabilistic Finite Element Method (PFEM) is presented. This formulation is developed for both linear and nonlinear elasticity. The PHWVP allows incorporation of the probabilistic distributions for the constitutive law, compatibility condition, equilibrium, domain and boundary conditions into the PFEM. Thus, a complete probabilistic analysis can be performed where all aspects of the problem are treated as random variables and/or fields. The Hu-Washizu variational formulation is available in many conventional finite element codes thereby enabling the straightforward inclusion of the probabilistic features into present codes.

  20. A Gaussian Model-Based Probabilistic Approach for Pulse Transit Time Estimation.

    PubMed

    Jang, Dae-Geun; Park, Seung-Hun; Hahn, Minsoo

    2016-01-01

    In this paper, we propose a new probabilistic approach to pulse transit time (PTT) estimation using a Gaussian distribution model. It is motivated basically by the hypothesis that PTTs normalized by RR intervals follow the Gaussian distribution. To verify the hypothesis, we demonstrate the effects of arterial compliance on the normalized PTTs using the Moens-Korteweg equation. Furthermore, we observe a Gaussian distribution of the normalized PTTs on real data. In order to estimate the PTT using the hypothesis, we first assumed that R-waves in the electrocardiogram (ECG) can be correctly identified. The R-waves limit searching ranges to detect pulse peaks in the photoplethysmogram (PPG) and to synchronize the results with cardiac beats--i.e., the peaks of the PPG are extracted within the corresponding RR interval of the ECG as pulse peak candidates. Their probabilities of being the actual pulse peak are then calculated using a Gaussian probability function. The parameters of the Gaussian function are automatically updated when a new pulse peak is identified. This update makes the probability function adaptive to variations of cardiac cycles. Finally, the pulse peak is identified as the candidate with the highest probability. The proposed approach is tested on a database where ECG and PPG waveforms are collected simultaneously during the submaximal bicycle ergometer exercise test. The results are promising, suggesting that the method provides a simple but more accurate PTT estimation in real applications. PMID:25420274
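
    The candidate-scoring step above lends itself to a compact illustration. The following is a minimal Python sketch with hypothetical function names; the exponentially weighted update rule stands in for the paper's parameter-update scheme, whose exact form is not reproduced here.

      import numpy as np

      def select_pulse_peak(candidates, rr_interval, mu, sigma):
          # Score each PPG peak candidate by the Gaussian likelihood of its
          # RR-normalized transit time; return the most probable candidate.
          # `candidates` is a list of (time_offset_from_R_wave, amplitude).
          norm_ptt = np.array([t / rr_interval for t, _ in candidates])
          probs = np.exp(-0.5 * ((norm_ptt - mu) / sigma) ** 2)
          best = int(np.argmax(probs))
          return candidates[best], norm_ptt[best]

      def update_gaussian(mu, sigma, new_value, alpha=0.1):
          # Exponentially weighted update (assumed rule) that keeps the
          # distribution adaptive to variations of the cardiac cycle.
          mu = (1 - alpha) * mu + alpha * new_value
          var = (1 - alpha) * sigma ** 2 + alpha * (new_value - mu) ** 2
          return mu, np.sqrt(var)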

  1. Entanglement and thermodynamics in general probabilistic theories

    NASA Astrophysics Data System (ADS)

    Chiribella, Giulio; Scandolo, Carlo Maria

    2015-10-01

    Entanglement is one of the most striking features of quantum mechanics, and yet it is not specifically quantum. More specific to quantum mechanics is the connection between entanglement and thermodynamics, which leads to an identification between entropies and measures of pure state entanglement. Here we search for the roots of this connection, investigating the relation between entanglement and thermodynamics in the framework of general probabilistic theories. We first address the question whether an entangled state can be transformed into another by means of local operations and classical communication. Under two operational requirements, we prove a general version of the Lo-Popescu theorem, which lies at the foundations of the theory of pure-state entanglement. We then consider a resource theory of purity where free operations are random reversible transformations, modelling the scenario where an agent has limited control over the dynamics of a closed system. Our key result is a duality between the resource theory of entanglement and the resource theory of purity, valid for every physical theory where all processes arise from pure states and reversible interactions at the fundamental level. As an application of the main result, we establish a one-to-one correspondence between entropies and measures of pure bipartite entanglement. The correspondence is then used to define entanglement measures in the general probabilistic framework. Finally, we show a duality between the task of information erasure and the task of entanglement generation, whereby the existence of entropy sinks (systems that can absorb arbitrary amounts of information) becomes equivalent to the existence of entanglement sources (correlated systems from which arbitrary amounts of entanglement can be extracted).

  2. Probabilistic load simulation: Code development status

    NASA Technical Reports Server (NTRS)

    Newell, J. F.; Ho, H.

    1991-01-01

    The objective of the Composite Load Spectra (CLS) project is to develop generic load models to simulate the composite load spectra that are induced in space propulsion system components. The probabilistic loads thus generated are part of the probabilistic design analysis (PDA) of a space propulsion system that also includes probabilistic structural analyses, reliability, and risk evaluations. Probabilistic load simulation for space propulsion systems demands sophisticated probabilistic methodology and requires large amounts of load information and engineering data. The CLS approach is to implement a knowledge based system coupled with a probabilistic load simulation module. The knowledge base manages and furnishes load information and expertise and sets up the simulation runs. The load simulation module performs the numerical computation to generate the probabilistic loads with load information supplied from the CLS knowledge base.

  3. HANFORD NUCLEAR CRITICALITY SAFETY PROGRAM DATABASE

    SciTech Connect

    TOFFER, H.

    2005-05-02

    The Hanford Database is a useful information retrieval tool for a criticality safety practitioner. The database contains nuclear criticality literature screened for parameter studies. The entries, characterized with a value index, are segregated into 16 major and six minor categories. A majority of the screened entries have abstracts and a limited number are connected to the Office of Scientific and Technology Information (OSTI) database of full-size documents. Simple and complex searches of the data can be accomplished very rapidly and the end-product of the searches could be a full-size document. The paper contains a description of the database, user instructions, and a number of examples.

  4. Experiment Databases

    NASA Astrophysics Data System (ADS)

    Vanschoren, Joaquin; Blockeel, Hendrik

    Next to running machine learning algorithms based on inductive queries, much can be learned by immediately querying the combined results of many prior studies. Indeed, all around the globe, thousands of machine learning experiments are being executed on a daily basis, generating a constant stream of empirical information on machine learning techniques. While the information contained in these experiments might have many uses beyond their original intent, results are typically described very concisely in papers and discarded afterwards. If we properly store and organize these results in central databases, they can be immediately reused for further analysis, thus boosting future research. In this chapter, we propose the use of experiment databases: databases designed to collect all the necessary details of these experiments, and to intelligently organize them in online repositories to enable fast and thorough analysis of a myriad of collected results. They constitute an additional, queriable source of empirical meta-data based on principled descriptions of algorithm executions, without reimplementing the algorithms in an inductive database. As such, they engender a very dynamic, collaborative approach to experimentation, in which experiments can be freely shared, linked together, and immediately reused by researchers all over the world. They can be set up for personal use, to share results within a lab or to create open, community-wide repositories. Here, we provide a high-level overview of their design, and use an existing experiment database to answer various interesting research questions about machine learning algorithms and to verify a number of recent studies.

  5. Probabilistic validation of protein NMR chemical shift assignments.

    PubMed

    Dashti, Hesam; Tonelli, Marco; Lee, Woonghee; Westler, William M; Cornilescu, Gabriel; Ulrich, Eldon L; Markley, John L

    2016-01-01

    Data validation plays an important role in ensuring the reliability and reproducibility of studies. NMR investigations of the functional properties, dynamics, chemical kinetics, and structures of proteins depend critically on the correctness of chemical shift assignments. We present a novel probabilistic method named ARECA for validating chemical shift assignments that relies on nuclear Overhauser effect (NOE) data. ARECA has been evaluated through its application to 26 case studies and has been shown to be complementary to, and usually more reliable than, approaches based on chemical shift databases. ARECA is available online at http://areca.nmrfam.wisc.edu/. PMID:26724815

  6. Design of a Bioactive Small Molecule that Targets the Myotonic Dystrophy Type 1 RNA Via an RNA Motif-Ligand Database & Chemical Similarity Searching

    PubMed Central

    Parkesh, Raman; Childs-Disney, Jessica L.; Nakamori, Masayuki; Kumar, Amit; Wang, Eric; Wang, Thomas; Hoskins, Jason; Tran, Tuan; Housman, David; Thornton, Charles A.; Disney, Matthew D.

    2012-01-01

    Myotonic dystrophy type 1 (DM1) is a triplet repeat disorder caused by expanded CTG repeats in the 3′ untranslated region of the dystrophia myotonica protein kinase (DMPK) gene. The transcribed repeats fold into an RNA hairpin with multiple copies of a 5′CUG/3′GUC motif that binds the RNA splicing regulator muscleblind-like 1 protein (MBNL1). Sequestration of MBNL1 by expanded r(CUG) repeats causes splicing defects in a subset of pre-mRNAs including the insulin receptor, the muscle-specific chloride ion channel, Sarco(endo)plasmic reticulum Ca2+ ATPase 1 (Serca1/Atp2a1), and cardiac troponin T (cTNT). Based on these observations, the development of small molecule ligands that specifically target expanded DM1 repeats could serve as therapeutics. In the present study, computational screening was employed to improve the efficacy of pentamidine and Hoechst 33258, ligands that have been shown previously to target the DM1 triplet repeat. A series of inhibitors of the RNA-protein complex with low micromolar IC50s, which are >20-fold more potent than the query compounds, were identified. Importantly, a bis-benzimidazole identified from the Hoechst query improves DM1-associated pre-mRNA splicing defects in cell and mouse models of DM1 (when dosed at 1 mM and 100 mg/kg, respectively). Since Hoechst 33258 was identified as a DM1 binder through analysis of an RNA motif-ligand database, these studies suggest that lead ligands targeting RNA with improved biological activity can be identified by a synergistic approach that combines analysis of known RNA-ligand interactions with virtual screening. PMID:22300544

  7. Working with Data: Discovering Knowledge through Mining and Analysis; Systematic Knowledge Management and Knowledge Discovery; Text Mining; Methodological Approach in Discovering User Search Patterns through Web Log Analysis; Knowledge Discovery in Databases Using Formal Concept Analysis; Knowledge Discovery with a Little Perspective.

    ERIC Educational Resources Information Center

    Qin, Jian; Jurisica, Igor; Liddy, Elizabeth D.; Jansen, Bernard J; Spink, Amanda; Priss, Uta; Norton, Melanie J.

    2000-01-01

    These six articles discuss knowledge discovery in databases (KDD). Topics include data mining; knowledge management systems; applications of knowledge discovery; text and Web mining; text mining and information retrieval; user search patterns through Web log analysis; concept analysis; data collection; and data structure inconsistency. (LRW)

  8. Solubility Database

    National Institute of Standards and Technology Data Gateway

    SRD 106 IUPAC-NIST Solubility Database (Web, free access)   These solubilities are compiled from 18 volumes of the International Union of Pure and Applied Chemistry (IUPAC)-NIST Solubility Data Series. The database includes liquid-liquid, solid-liquid, and gas-liquid systems. Typical solvents and solutes include water, seawater, heavy water, inorganic compounds, and a variety of organic compounds such as hydrocarbons, halogenated hydrocarbons, alcohols, acids, esters, and nitrogen compounds. There are over 67,500 solubility measurements and over 1800 references.

  9. Semantics for Biological Data Resource: Cell Image Database

    National Institute of Standards and Technology Data Gateway

    SRD 165 NIST Semantics for Biological Data Resource: Cell Image Database (Web, free access)   This Database is a prototype to test concepts for semantic searching of cell image data based on experimental details.

  10. Aggregated Interdisciplinary Databases and the Needs of Undergraduate Researchers

    ERIC Educational Resources Information Center

    Fister, Barbara; Gilbert, Julie; Fry, Amy Ray

    2008-01-01

    After seeing growing frustration among inexperienced undergraduate researchers searching a popular aggregated interdisciplinary database, the authors questioned whether the leading interdisciplinary databases are serving undergraduates' needs. As a preliminary exploration of this question, the authors queried vendors, analyzed their marketing…

  11. A Java implementation of the probabilistic argumentation system for data fusion in missile defense applications

    NASA Astrophysics Data System (ADS)

    Chan, Moses W.; Hansen, Terri N.; Monney, Paul-Andre; Baker, Todd L.

    2004-04-01

    In missile defense target recognition applications, knowledge about the problem may be imperfect, imprecise, and incomplete. Consequently, complete probabilistic models are not available. In order to obtain robust inference results and avoid making inaccurate assumptions, the probabilistic argumentation system (PAS) is employed. In PAS, knowledge is encoded as logical rules with probabilistically weighted assumptions. These rules map directly to Dempster-Shafer belief functions, which allow for uncertainty reasoning in the absence of complete probabilistic models. The PAS can be used to compute arguments for and against hypotheses of interest, and numerical answers that quantify these arguments. These arguments can be used as explanations that describe how inference results are computed. This explanation facility can also be used to validate intelligent information, which can in turn improve inference results. This paper presents a Java implementation of the probabilistic argumentation system as well as a number of new features. A rule-based syntax is defined as a problem encoding mechanism and for Monte Carlo simulation purposes. In addition, a graphical user interface (GUI) is implemented so that users can encode the knowledge database, and visualize relationships among rules and probabilistically weighted assumptions. Furthermore, a graphical model is used to represent these rules, which in turn provides graphical explanations of the inference results. We provide examples that illustrate how classical pattern recognition problems can be solved using canonical rule sets, as well as examples that demonstrate how this new software can be used as an explanation facility that describes how the inference results are determined.
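
    The rule-to-belief-function mapping can be made concrete with a small sketch. The following Python fragment (not the Java implementation described in the paper) estimates a degree of support by Monte Carlo over probabilistically weighted assumptions; all rule and assumption names are invented.

      import random

      def degree_of_support(supports, assumption_probs, n_samples=100_000):
          # `supports` lists the arguments for a hypothesis: each is a set of
          # assumptions whose joint truth entails the hypothesis. Sample
          # assumption "worlds" and count those in which some argument holds.
          hits = 0
          for _ in range(n_samples):
              world = {a: random.random() < p for a, p in assumption_probs.items()}
              if any(all(world[a] for a in s) for s in supports):
                  hits += 1
          return hits / n_samples

      # Two hypothetical arguments for the hypothesis "target is hostile":
      supports = [{"sensor_ok", "signature_match"}, {"intel_report"}]
      probs = {"sensor_ok": 0.95, "signature_match": 0.7, "intel_report": 0.4}
      print(degree_of_support(supports, probs))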

  12. PCEMCAN - Probabilistic Ceramic Matrix Composites Analyzer: User's Guide, Version 1.0

    NASA Technical Reports Server (NTRS)

    Shah, Ashwin R.; Mital, Subodh K.; Murthy, Pappu L. N.

    1998-01-01

    PCEMCAN (Probabilistic CEramic Matrix Composites ANalyzer) is an integrated computer code developed at NASA Lewis Research Center that simulates uncertainties associated with the constituent properties, manufacturing process, and geometric parameters of fiber-reinforced ceramic matrix composites and quantifies their random thermomechanical behavior. The PCEMCAN code can perform deterministic as well as probabilistic analyses to predict thermomechanical properties. This user's guide details the step-by-step procedure to create an input file and to update/modify the material properties database required to run the PCEMCAN computer code. An overview of the geometric conventions, micromechanical unit cell, nonlinear constitutive relationship, and probabilistic simulation methodology is also provided in the manual. Fast probability integration as well as Monte Carlo simulation methods are available for the uncertainty simulation. Various options available in the code to simulate probabilistic material properties and to quantify the sensitivity of the primitive random variables are described. Deterministic as well as probabilistic results are presented using demonstration problems. For a detailed theoretical description of the deterministic and probabilistic analyses, the user is referred to the companion documents "Computational Simulation of Continuous Fiber-Reinforced Ceramic Matrix Composite Behavior," NASA TP-3602, 1996, and "Probabilistic Micromechanics and Macromechanics for Ceramic Matrix Composites," NASA TM 4766, June 1997.

  13. LQTS gene LOVD database.

    PubMed

    Zhang, Tao; Moss, Arthur; Cong, Peikuan; Pan, Min; Chang, Bingxi; Zheng, Liangrong; Fang, Quan; Zareba, Wojciech; Robinson, Jennifer; Lin, Changsong; Li, Zhongxiang; Wei, Junfang; Zeng, Qiang; Qi, Ming

    2010-11-01

    The Long QT Syndrome (LQTS) is a group of genetically heterogeneous disorders that predisposes young individuals to ventricular arrhythmias and sudden death. LQTS is mainly caused by mutations in genes encoding subunits of cardiac ion channels (KCNQ1, KCNH2, SCN5A, KCNE1, and KCNE2). Many other genes involved in LQTS have been described recently (KCNJ2, AKAP9, ANK2, CACNA1C, SCNA4B, SNTA1, and CAV3). We created an online database (http://www.genomed.org/LOVD/introduction.html) that provides information on variants in LQTS-associated genes. As of February 2010, the database contains 1738 unique variants in 12 genes. A total of 950 variants are considered pathogenic, 265 are possibly pathogenic, 131 are unknown/unclassified, and 292 have no known pathogenicity. In addition to these mutations collected from published literature, we also submitted information on gene variants, including one possible novel pathogenic mutation in the KCNH2 splice site found in ten Chinese families with documented arrhythmias. The remote user is able to search the data and is encouraged to submit new mutations into the database. The LQTS database will become a powerful tool for both researchers and clinicians. PMID:20809527

  14. Environmental probabilistic quantitative assessment methodologies

    USGS Publications Warehouse

    Crovelli, R.A.

    1995-01-01

    In this paper, four petroleum resource assessment methodologies are presented as possible pollution assessment methodologies, even though petroleum as a resource is desirable, whereas pollution is undesirable. A methodology is defined in this paper to consist of a probability model and a probabilistic method, where the method is used to solve the model. The following four basic types of probability models are considered: 1) direct assessment, 2) accumulation size, 3) volumetric yield, and 4) reservoir engineering. Three of the four petroleum resource assessment methodologies were written as microcomputer systems, viz. TRIAGG for direct assessment, APRAS for accumulation size, and FASPU for reservoir engineering. A fourth microcomputer system termed PROBDIST supports the three assessment systems. The three assessment systems have different probability models but the same type of probabilistic method. The advantages of the analytic method are its computational speed and flexibility, which make it ideal for a microcomputer. -from Author

  15. Probabilistic Assessment of Cancer Risk for Astronauts on Lunar Missions

    NASA Technical Reports Server (NTRS)

    Kim, Myung-Hee Y.; Cucinotta, Francis A.

    2009-01-01

    During future lunar missions, exposure to solar particle events (SPEs) is a major safety concern for crew members during extra-vehicular activities (EVAs) on the lunar surface or Earth-to-moon transit. NASA's new lunar program anticipates that up to 15% of crew time may be on EVA, with minimal radiation shielding. For the operational challenge to respond to events of unknown size and duration, a probabilistic risk assessment approach is essential for mission planning and design. Using the historical database of proton measurements during the past 5 solar cycles, a typical hazard function for SPE occurrence was defined using a non-homogeneous Poisson model as a function of time within a non-specific future solar cycle of 4000 days duration. Distributions ranging from the 5th to 95th percentile of particle fluences for a specified mission period were simulated. Organ doses corresponding to particle fluences at the median and at the 95th percentile for a specified mission period were assessed using NASA's baryon transport model, BRYNTRN. The cancer fatality risks for astronauts as functions of age, gender, and solar cycle activity were then analyzed. The probability of exceeding the NASA 30-day limit of blood forming organ (BFO) dose inside a typical spacecraft was calculated. Future work will involve using this probabilistic risk assessment approach to SPE forecasting, combined with a probabilistic approach to the radiobiological factors that contribute to the uncertainties in projecting cancer risks.
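
    The non-homogeneous Poisson occurrence model is straightforward to simulate by thinning. A minimal Python sketch follows; the illustrative bell-shaped hazard over a 4000-day cycle is an assumption standing in for the hazard function fitted to the historical proton database.

      import numpy as np

      rng = np.random.default_rng(0)

      def sample_spe_times(hazard, t_max, lam_max):
          # Thinning: draw candidate event times at the bounding rate lam_max,
          # keep each candidate with probability hazard(t) / lam_max.
          times, t = [], 0.0
          while True:
              t += rng.exponential(1.0 / lam_max)
              if t > t_max:
                  return np.array(times)
              if rng.random() < hazard(t) / lam_max:
                  times.append(t)

      # Illustrative hazard (events/day): activity peaking mid-cycle.
      hazard = lambda t: 0.05 * np.exp(-(((t - 2000.0) / 800.0) ** 2))
      print(len(sample_spe_times(hazard, 4000.0, 0.05)), "simulated SPEs")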

  16. Probabilistic Simulation for Nanocomposite Characterization

    NASA Technical Reports Server (NTRS)

    Chamis, Christos C.; Coroneos, Rula M.

    2007-01-01

    A unique probabilistic theory is described to predict the properties of nanocomposites. The simulation is based on composite micromechanics with progressive substructuring down to a nanoscale slice of a nanofiber where all the governing equations are formulated. These equations have been programmed in a computer code. That computer code is used to simulate uniaxial strengths properties of a mononanofiber laminate. The results are presented graphically and discussed with respect to their practical significance. These results show smooth distributions.

  17. Probabilistic methods for rotordynamics analysis

    NASA Technical Reports Server (NTRS)

    Wu, Y.-T.; Torng, T. Y.; Millwater, H. R.; Fossum, A. F.; Rheinfurth, M. H.

    1991-01-01

    This paper summarizes the development of the methods and a computer program to compute the probability of instability of dynamic systems that can be represented by a system of second-order ordinary linear differential equations. Two instability criteria based upon the eigenvalues or Routh-Hurwitz test functions are investigated. Computational methods based on a fast probability integration concept and an efficient adaptive importance sampling method are proposed to perform efficient probabilistic analysis. A numerical example is provided to demonstrate the methods.
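
    The eigenvalue-based instability criterion pairs naturally with sampling. The following is a minimal Python sketch, not the fast probability integration or adaptive importance sampling developed in the paper: it checks the eigenvalues of the first-order state matrix for a one-degree-of-freedom system whose damping is uncertain (illustrative numbers).

      import numpy as np

      rng = np.random.default_rng(1)

      def unstable(m, c, k):
          # m*x'' + c*x' + k*x = 0 is unstable if any eigenvalue of the
          # equivalent first-order state matrix has a positive real part.
          M, C, K = (np.atleast_2d(a) for a in (m, c, k))
          n = M.shape[0]
          A = np.block([[np.zeros((n, n)), np.eye(n)],
                        [-np.linalg.solve(M, K), -np.linalg.solve(M, C)]])
          return np.max(np.linalg.eigvals(A).real) > 0.0

      # Damping coefficient uncertain and possibly negative (e.g., cross-coupling):
      trials = [unstable(1.0, rng.normal(0.02, 0.03), 4.0) for _ in range(20_000)]
      print("P(instability) ~", np.mean(trials))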

  18. Probabilistic Cloning and Quantum Computation

    NASA Astrophysics Data System (ADS)

    Gao, Ting; Yan, Feng-Li; Wang, Zhi-Xi

    2004-06-01

    We discuss the usefulness of quantum cloning and present examples of quantum computation tasks for which the cloning offers an advantage which cannot be matched by any approach that does not resort to quantum cloning. In these quantum computations, we need to distribute quantum information contained in the states about which we have some partial information. To perform quantum computations, we use a state-dependent probabilistic quantum cloning procedure to distribute quantum information in the middle of a quantum computation.

  19. SearchLit.

    ERIC Educational Resources Information Center

    Joachim, Robert

    1986-01-01

    Describes file management package offering relatively sophisticated features: conversion of previously downloaded records from commercial database into local file; keyword searching on inverted keyword file; full-text searching on author, title, bibliographic source, abstract; sorting by author, title, bibliographic source, publication year;…

  20. Using Quick Search.

    ERIC Educational Resources Information Center

    Maxfield, Sandy, Ed.; Kabus, Karl, Ed.

    This document is a guide to the use of Quick Search, a library service that provides access to more than 100 databases which contain references to journal articles and other research materials through two commercial systems--BRS After/Dark and DIALOG's Knowledge Index. The guide is divided into five sections: (1) Using Quick Search; (2) The…

  1. Distribution functions of probabilistic automata

    NASA Technical Reports Server (NTRS)

    Vatan, F.

    2001-01-01

    Each probabilistic automaton M over an alphabet A defines a probability measure Prob_M on the set of all finite and infinite words over A. We can identify a k letter alphabet A with the set {0, 1,..., k-1}, and, hence, we can consider every finite or infinite word w over A as a radix k expansion of a real number X(w) in the interval [0, 1]. This makes X(w) a random variable and the distribution function of M is defined as usual: F(x) := Prob_M { w: X(w) < x }. Utilizing the fixed-point semantics (denotational semantics), extended to probabilistic computations, we investigate the distribution functions of probabilistic automata in detail. Automata with continuous distribution functions are characterized. By a new and much easier method, it is shown that the distribution function F(x) is an analytic function if it is a polynomial. Finally, answering a question posed by D. Knuth and A. Yao, we show that a polynomial distribution function F(x) on [0, 1] can be generated by a probabilistic automaton iff all the roots of F'(x) = 0 in this interval, if any, are rational numbers. For this, we define two dynamical systems on the set of polynomial distributions and study attracting fixed points of random composition of these two systems.
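
    The random variable X(w) is easy to simulate, which gives an empirical view of F(x). A minimal Python sketch, assuming a hypothetical two-state automaton over the alphabet {0, 1} with invented transition probabilities:

      import numpy as np

      rng = np.random.default_rng(2)

      def sample_x(trans, start, depth=30):
          # Emit `depth` letters from the automaton and map the word w to the
          # real number X(w) = sum_i w_i * 2**(-i) (radix-2 expansion).
          # trans[s] = (prob of emitting 0, next state on 0, next state on 1).
          s, x = start, 0.0
          for i in range(1, depth + 1):
              p0, on0, on1 = trans[s]
              bit = 0 if rng.random() < p0 else 1
              x += bit * 2.0 ** (-i)
              s = on0 if bit == 0 else on1
          return x

      trans = {0: (0.7, 0, 1), 1: (0.3, 1, 0)}  # invented parameters
      xs = np.sort([sample_x(trans, 0) for _ in range(50_000)])
      F = lambda x: np.searchsorted(xs, x) / xs.size  # empirical F(x)
      print(F(0.5))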

  2. Binary Encoded-Prototype Tree for Probabilistic Model Building GP

    NASA Astrophysics Data System (ADS)

    Yanase, Toshihiko; Hasegawa, Yoshihiko; Iba, Hitoshi

    In recent years, program evolution algorithms based on the estimation of distribution algorithm (EDA) have been proposed to improve the search ability of genetic programming (GP) and to overcome GP-hard problems. One such method is the probabilistic prototype tree (PPT) based algorithm. The PPT based method explores the optimal tree structure by using the full tree whose number of child nodes is maximum among possible trees. This algorithm, however, suffers from problems arising from function nodes having different numbers of child nodes. These function nodes cause intron nodes, which do not affect the fitness function. Moreover, the function nodes having many child nodes increase the search space and the number of samples necessary for properly constructing the probabilistic model. In order to solve this problem, we propose binary encoding for PPT. In this article, we convert each function node to a subtree of binary nodes such that the converted tree is grammatically correct. Our method reduces the ineffectual search space, and the binary encoded tree is able to express the same tree structures as the original method. The effectiveness of the proposed method is demonstrated through two computational experiments.

  3. Artificial Intelligence Databases: A Survey and Comparison.

    ERIC Educational Resources Information Center

    Stern, David

    1990-01-01

    Identifies and describes online databases containing references to materials on artificial intelligence, robotics, and expert systems, and compares them in terms of scope and usage. Recommendations for conducting online searches on artificial intelligence and related fields are offered. (CLB)

  4. Typing mineral deposits using their associated rocks, grades and tonnages using a probabilistic neural network

    USGS Publications Warehouse

    Singer, D.A.

    2006-01-01

    A probabilistic neural network is employed to classify 1610 mineral deposits into 18 types using tonnage, average Cu, Mo, Ag, Au, Zn, and Pb grades, and six generalized rock types. The purpose is to examine whether neural networks might serve for integrating geoscience information available in large mineral databases to classify sites by deposit type. Successful classifications of 805 deposits not used in training - 87% with grouped porphyry copper deposits - and the nature of misclassifications demonstrate the power of probabilistic neural networks and the value of quantitative mineral-deposit models. The results also suggest that neural networks can classify deposits as well as experienced economic geologists. © International Association for Mathematical Geology 2006.
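
    A probabilistic neural network of this kind is essentially a Parzen-window classifier: each training deposit contributes a Gaussian kernel to its type's density estimate. A minimal Python sketch with invented toy features (the study's actual grade, tonnage, and rock-type variables are not reproduced):

      import numpy as np

      def pnn_classify(x, train_X, train_y, sigma=0.5):
          # Average a Gaussian kernel over each class's training vectors and
          # return the class with the largest kernel response at x.
          classes = np.unique(train_y)
          scores = [np.mean(np.exp(-np.sum((train_X[train_y == c] - x) ** 2,
                                           axis=1) / (2.0 * sigma ** 2)))
                    for c in classes]
          return classes[int(np.argmax(scores))]

      # Hypothetical standardized features for two deposit types:
      X = np.array([[0.0, 0.1], [0.2, -0.1], [2.0, 1.9], [2.1, 2.2]])
      y = np.array(["porphyry Cu", "porphyry Cu", "VMS", "VMS"])
      print(pnn_classify(np.array([1.8, 2.0]), X, y))  # -> "VMS"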

  5. Probabilistic Assessment of Cancer Risk from Solar Particle Events

    NASA Technical Reports Server (NTRS)

    Kim, Myung-Hee Y.; Cucinotta, Francis A.

    2010-01-01

    For long duration missions outside of the protection of the Earth's magnetic field, space radiation presents significant health risks including cancer mortality. Space radiation consists of solar particle events (SPEs), comprised largely of medium energy protons (less than several hundred MeV); and galactic cosmic ray (GCR), which include high energy protons and heavy ions. While the frequency distribution of SPEs depends strongly upon the phase within the solar activity cycle, the individual SPE occurrences themselves are random in nature. We estimated the probability of SPE occurrence using a non-homogeneous Poisson model to fit the historical database of proton measurements. Distributions of particle fluences of SPEs for a specified mission period were simulated ranging from its 5th to 95th percentile to assess the cancer risk distribution. Spectral variability of SPEs was also examined, because the detailed energy spectra of protons are important especially at high energy levels for assessing the cancer risk associated with energetic particles for large events. We estimated the overall cumulative probability of GCR environment for a specified mission period using a solar modulation model for the temporal characterization of the GCR environment represented by the deceleration potential (φ). Probabilistic assessment of cancer fatal risk was calculated for various periods of lunar and Mars missions. This probabilistic approach to risk assessment from space radiation is in support of mission design and operational planning for future manned space exploration missions. In future work, this probabilistic approach to the space radiation will be combined with a probabilistic approach to the radiobiological factors that contribute to the uncertainties in projecting cancer risks.

  6. Probabilistic Assessment of Cancer Risk from Solar Particle Events

    NASA Astrophysics Data System (ADS)

    Kim, Myung-Hee Y.; Cucinotta, Francis A.

    For long duration missions outside of the protection of the Earth's magnetic field, space radiation presents significant health risks including cancer mortality. Space radiation consists of solar particle events (SPEs), comprised largely of medium energy protons (less than several hundred MeV); and galactic cosmic ray (GCR), which include high energy protons and heavy ions. While the frequency distribution of SPEs depends strongly upon the phase within the solar activity cycle, the individual SPE occurrences themselves are random in nature. We estimated the probability of SPE occurrence using a non-homogeneous Poisson model to fit the historical database of proton measurements. Distributions of particle fluences of SPEs for a specified mission period were simulated ranging from its 5th to 95th percentile to assess the cancer risk distribution. Spectral variability of SPEs was also examined, because the detailed energy spectra of protons are important especially at high energy levels for assessing the cancer risk associated with energetic particles for large events. We estimated the overall cumulative probability of GCR environment for a specified mission period using a solar modulation model for the temporal characterization of the GCR environment represented by the deceleration potential (φ). Probabilistic assessment of cancer fatal risk was calculated for various periods of lunar and Mars missions. This probabilistic approach to risk assessment from space radiation is in support of mission design and operational planning for future manned space exploration missions. In future work, this probabilistic approach to the space radiation will be combined with a probabilistic approach to the radiobiological factors that contribute to the uncertainties in projecting cancer risks.

  7. High Resolution Soil Water from Regional Databases and Satellite Images

    NASA Technical Reports Server (NTRS)

    Morris, Robin D.; Smelyanskly, Vadim N.; Coughlin, Joseph; Dungan, Jennifer; Clancy, Daniel (Technical Monitor)

    2002-01-01

    This viewgraph presentation provides information on the ways in which plant growth can be inferred from satellite data and can then be used to infer soil water. There are several steps in this process, the first of which is the acquisition of data from satellite observations and relevant information databases such as the State Soil Geographic Database (STATSGO). Then probabilistic analysis and inversion with Bayes' theorem reveal sources of uncertainty. The Markov chain Monte Carlo method is also used.
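
    The Markov chain Monte Carlo step can be illustrated compactly. A minimal random-walk Metropolis sketch in Python on a toy one-parameter inversion; the actual soil-water model and satellite data are not reproduced, and all numbers below are invented.

      import numpy as np

      rng = np.random.default_rng(3)

      def metropolis(log_post, theta0, step, n):
          # Draw samples from a posterior known only up to a constant.
          theta, lp = theta0, log_post(theta0)
          chain = []
          for _ in range(n):
              prop = theta + step * rng.standard_normal()
              lp_prop = log_post(prop)
              if np.log(rng.random()) < lp_prop - lp:  # accept/reject
                  theta, lp = prop, lp_prop
              chain.append(theta)
          return np.array(chain)

      # Toy inversion: observation y = 2*theta + noise, standard normal prior.
      y_obs = 0.9
      log_post = lambda th: (-0.5 * ((y_obs - 2.0 * th) / 0.1) ** 2
                             - 0.5 * th ** 2)
      print(metropolis(log_post, 0.0, 0.2, 20_000)[5_000:].mean())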

  8. Online Searching and the University Researchers.

    ERIC Educational Resources Information Center

    Horner, Jan; Thirlwall, David

    1988-01-01

    Describes a survey that compared search behaviors of humanities and social sciences researchers to those of researchers in science and technology. The factors discussed include frequency of use of online databases; personal searches versus intermediaries; and attitudes toward, interest in, knowledge about, and access to online database searching.…

  9. The Weaknesses of Full-Text Searching

    ERIC Educational Resources Information Center

    Beall, Jeffrey

    2008-01-01

    This paper provides a theoretical critique of the deficiencies of full-text searching in academic library databases. Because full-text searching relies on matching words in a search query with words in online resources, it is an inefficient method of finding information in a database. This matching fails to retrieve synonyms, and it also retrieves…

  10. Drinking Water Treatability Database (Database)

    EPA Science Inventory

    The drinking Water Treatability Database (TDB) will provide data taken from the literature on the control of contaminants in drinking water, and will be housed on an interactive, publicly-available USEPA web site. It can be used for identifying effective treatment processes, rec...

  11. Probabilistic Aeroelastic Analysis of Turbomachinery Components

    NASA Technical Reports Server (NTRS)

    Reddy, T. S. R.; Mital, S. K.; Stefko, G. L.

    2004-01-01

    A probabilistic approach is described for aeroelastic analysis of turbomachinery blade rows. Blade rows with subsonic flow and blade rows with supersonic flow with a subsonic leading edge are considered. To demonstrate the probabilistic approach, the flutter frequency, damping, and forced response of a blade row representing a compressor geometry are considered. The analysis accounts for uncertainties in structural and aerodynamic design variables. The results are presented in the form of probability density functions (PDF) and sensitivity factors. For the subsonic flow cascade, comparisons are also made among different probabilistic distributions, probabilistic methods, and Monte Carlo simulation. The results show that the probabilistic approach provides a more realistic and systematic way to assess the effect of uncertainties in design variables on aeroelastic instabilities and response.

  12. Probabilistic Computational Methods in Structural Failure Analysis

    NASA Astrophysics Data System (ADS)

    Krejsa, Martin; Kralik, Juraj

    2015-12-01

    Probabilistic methods are used in engineering where a computational model contains random variables. Each random variable in the probabilistic calculations contains uncertainties. Typical sources of uncertainties are properties of the material and production and/or assembly inaccuracies in the geometry or the environment where the structure should be located. The paper is focused on methods for the calculation of failure probabilities in structural failure and reliability analysis, with special attention to the newly developed probabilistic method Direct Optimized Probabilistic Calculation (DOProC), which is highly efficient in terms of calculation time and the accuracy of the solution. The novelty of the proposed method lies in an optimized numerical integration that does not require any simulation technique. The algorithm has been implemented in software applications and has been used several times in probabilistic tasks and probabilistic reliability assessments.
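
    The core idea of direct integration without simulation shows up clearly on the classic reliability problem P_f = P(R < S). A minimal Python sketch, assuming discretized distributions ("histograms") for resistance R and load effect S; DOProC's actual optimizations go well beyond this brute-force double sum.

      import numpy as np

      def failure_probability(r_vals, r_probs, s_vals, s_probs):
          # Sum the joint probability mass over all failing combinations
          # (R < S); purely numerical, no random sampling involved.
          R, S = np.meshgrid(r_vals, s_vals, indexing="ij")
          return np.outer(r_probs, s_probs)[R < S].sum()

      def discretize(lo, hi, n, mean, sd):
          # Truncated-Gaussian histogram as an illustrative input distribution.
          v = np.linspace(lo, hi, n)
          p = np.exp(-0.5 * ((v - mean) / sd) ** 2)
          return v, p / p.sum()

      r_vals, r_probs = discretize(200.0, 400.0, 201, 300.0, 25.0)  # resistance
      s_vals, s_probs = discretize(100.0, 320.0, 221, 200.0, 30.0)  # load effect
      print("P_f ~", failure_probability(r_vals, r_probs, s_vals, s_probs))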

  13. Probabilistic simulation of uncertainties in thermal structures

    NASA Technical Reports Server (NTRS)

    Chamis, C. C.; Shiao, Michael

    1990-01-01

    Development of probabilistic structural analysis methods for hot structures is a major activity at NASA-Lewis, and consists of five program elements: (1) probabilistic loads, (2) probabilistic finite element analysis, (3) probabilistic material behavior, (4) assessment of reliability and risk, and (5) probabilistic structural performance evaluation. Attention is given to quantification of the effects of uncertainties for several variables on High Pressure Fuel Turbopump blade temperature, pressure, and torque of the Space Shuttle Main Engine; the evaluation of the cumulative distribution function for various structural response variables based on assumed uncertainties in primitive structural variables; evaluation of the failure probability; reliability and risk-cost assessment; and an outline of an emerging approach for eventual hot structures certification. Collectively, the results demonstrate that the structural durability/reliability of hot structural components can be effectively evaluated in a formal probabilistic framework. In addition, the approach can be readily extended to computationally simulate certification of hot structures for aerospace environments.

  14. Integration of Information Retrieval and Database Management Systems.

    ERIC Educational Resources Information Center

    Deogun, Jitender S.; Raghavan, Vijay V.

    1988-01-01

    Discusses the motivation for integrating information retrieval and database management systems, and proposes a probabilistic retrieval model in which records in a file may be composed of attributes (formatted data items) and descriptors (content indicators). The details and resolutions of difficulties involved in integrating such systems are…

  15. Probabilistic structural analysis methods development for SSME

    NASA Technical Reports Server (NTRS)

    Chamis, C. C.; Hopkins, D. A.

    1988-01-01

    The development of probabilistic structural analysis methods is a major part of the SSME Structural Durability Program and consists of three program elements: composite load spectra, probabilistic finite element structural analysis, and probabilistic structural analysis applications. Recent progress includes: (1) the effects of the uncertainties of several factors on the HPFP blade temperature, pressure, and torque; (2) the evaluation of the cumulative distribution function of structural response variables based on assumed uncertainties in primitive structural variables; and (3) evaluation of the failure probability. Collectively, the results obtained demonstrate that the structural durability of critical SSME components can be probabilistically evaluated.

  16. Probabilistic cloning of three nonorthogonal states

    NASA Astrophysics Data System (ADS)

    Zhang, Wen; Rui, Pinshu; Yang, Qun; Zhao, Yan; Zhang, Ziyun

    2015-04-01

    We study the probabilistic cloning of three nonorthogonal states with equal success probabilities. For simplicity, we assume that the three states belong to a special set. The analytical form of the maximal success probability for probabilistic cloning is calculated. With the maximal success probability, we deduce the explicit form of the probabilistic quantum cloning machine. In the case of cloning, we get the unambiguous form of the unitary operation. It is demonstrated that the upper bound for the probabilistic quantum cloning machine in (Qiu in J Phys A 35:6931, 2002) can be reached only if the three states are equidistant.

  17. Six Online Periodical Databases: A Librarian's View.

    ERIC Educational Resources Information Center

    Willems, Harry

    1999-01-01

    Compares the following World Wide Web-based periodical databases, focusing on their usefulness in K-12 school libraries: EBSCO, Electric Library, Facts on File, SIRS, Wilson, and UMI. Search interfaces, display options, help screens, printing, home access, copyright restrictions, database administration, and making a decision are discussed. A…

  18. Electronic Reference Library: Silverplatter's Database Networking Solution.

    ERIC Educational Resources Information Center

    Millea, Megan

    Silverplatter's Electronic Reference Library (ERL) provides wide area network access to its databases using TCP/IP communications and client-server architecture. ERL has two main components: The ERL clients (retrieval interface) and the ERL server (search engines). ERL clients provide patrons with seamless access to multiple databases on multiple…

  19. Web Database Development: Implications for Academic Publishing.

    ERIC Educational Resources Information Center

    Fernekes, Bob

    This paper discusses the preliminary planning, design, and development of a pilot project to create an Internet accessible database and search tool for locating and distributing company data and scholarly work. Team members established four project objectives: (1) to develop a Web accessible database and decision tool that creates Web pages on the…

  20. A database/knowledge structure for a robotics vision system

    NASA Technical Reports Server (NTRS)

    Dearholt, D. W.; Gonzales, N. N.

    1987-01-01

    Desirable properties of robotics vision database systems are given, and structures which possess properties appropriate for some aspects of such database systems are examined. Included in the structures discussed is a family of networks in which link membership is determined by measures of proximity between pairs of the entities stored in the database. This type of network is shown to have properties which guarantee that the search for a matching feature vector is monotonic. That is, the database can be searched with no backtracking, if there is a feature vector in the database which matches the feature vector of the external entity which is to be identified. The construction of the database is discussed, and the search procedure is presented. A section on the support provided by the database for description of the decision-making processes and the search path is also included.
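
    The monotonic, backtrack-free search described above reduces to greedy descent in the proximity network. A minimal Python sketch with a hypothetical feature set and link structure (invented data):

      import numpy as np

      def greedy_match(graph, features, query, start):
          # From the current node, move to the neighbor closest to the query;
          # stop when no neighbor improves, so the path never backtracks.
          dist = lambda i: float(np.linalg.norm(features[i] - query))
          node = start
          while True:
              best = min(graph[node], key=dist, default=node)
              if dist(best) >= dist(node):
                  return node
              node = best

      features = {0: np.array([0.0, 0.0]), 1: np.array([1.0, 0.0]),
                  2: np.array([1.0, 1.0]), 3: np.array([2.0, 1.0])}
      graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
      print(greedy_match(graph, features, np.array([2.0, 1.0]), 0))  # -> 3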

  1. Spectroscopic data for an astronomy database

    NASA Technical Reports Server (NTRS)

    Parkinson, W. H.; Smith, Peter L.

    1995-01-01

    Very few of the atomic and molecular data used in analyses of astronomical spectra are currently available in World Wide Web (WWW) databases that are searchable with hypertext browsers. We have begun to rectify this situation by making extensive atomic data files available with simple search procedures. We have also established links to other on-line atomic and molecular databases. All can be accessed from our database homepage with URL: http://cfa-www.harvard.edu/amp/data/amdata.html.

  2. Search Engines for Tomorrow's Scholars

    ERIC Educational Resources Information Center

    Fagan, Jody Condit

    2011-01-01

    Today's scholars face an outstanding array of choices when choosing search tools: Google Scholar, discipline-specific abstracts and index databases, library discovery tools, and more recently, Microsoft's re-launch of their academic search tool, now dubbed Microsoft Academic Search. What are these tools' strengths for the emerging needs of…

  3. (Meta)Search like Google

    ERIC Educational Resources Information Center

    Rochkind, Jonathan

    2007-01-01

    The ability to search and receive results in more than one database through a single interface--or metasearch--is something many users want. Google Scholar--the search engine of specifically scholarly content--and library metasearch products like Ex Libris's MetaLib, Serials Solution's Central Search, WebFeat, and products based on MuseGlobal used…

  4. Navy precision optical interferometer database

    NASA Astrophysics Data System (ADS)

    Ryan, K. K.; Jorgensen, A. M.; Hall, T.; Armstrong, J. T.; Hutter, D.; Mozurkewich, D.

    2012-07-01

    The Navy Precision Optical Interferometer (NPOI) has now been recording astronomical observations for the better part of two decades. During that time hundreds of thousands of observations have been obtained, with a total data volume of multiple terabytes. Additionally, in the next few years the data rate from the NPOI is expected to increase significantly. To make it easier for NPOI users to search the observations and obtain data, we have constructed an easily accessible and searchable database of observations. The database is based on a MySQL server and uses Structured Query Language (SQL). In this paper we will describe the database table layout and show examples of possible database queries.
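
    The kind of query such a database supports can be sketched with an in-memory SQLite stand-in (Python's sqlite3 module). The table layout and rows below are hypothetical and much simpler than the real NPOI schema, but the SQL is standard.

      import sqlite3

      con = sqlite3.connect(":memory:")
      con.execute("""CREATE TABLE observations
                     (star TEXT, mjd REAL, n_scans INTEGER, config TEXT)""")
      con.executemany("INSERT INTO observations VALUES (?, ?, ?, ?)",
                      [("FKV0193", 51544.2, 12, "AC-AE-AW"),   # invented rows
                       ("FKV0622", 51544.3, 8, "AC-AE-AW"),
                       ("FKV0193", 51545.1, 15, "AC-AW-E6")])
      # How many nights and scans exist per star?
      for row in con.execute("""SELECT star, COUNT(*), SUM(n_scans)
                                FROM observations GROUP BY star"""):
          print(row)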

  5. Windows on the brain: the emerging role of atlases and databases in neuroscience

    NASA Technical Reports Server (NTRS)

    Van Essen, David C.; VanEssen, D. C. (Principal Investigator)

    2002-01-01

    Brain atlases and associated databases have great potential as gateways for navigating, accessing, and visualizing a wide range of neuroscientific data. Recent progress towards realizing this potential includes the establishment of probabilistic atlases, surface-based atlases and associated databases, combined with improvements in visualization capabilities and internet access.

  6. Probabilistic Simulation for Nanocomposite Fracture

    NASA Technical Reports Server (NTRS)

    Chamis, Christos C.

    2010-01-01

    A unique probabilistic theory is described to predict the uniaxial strengths and fracture properties of nanocomposites. The simulation is based on composite micromechanics with progressive substructuring down to a nanoscale slice of a nanofiber where all the governing equations are formulated. These equations have been programmed in a computer code. That computer code is used to simulate uniaxial strengths and fracture of a nanofiber laminate. The results are presented graphically and discussed with respect to their practical significance. These results show smooth distributions from low probability to high.

  7. Probabilistic risk assessment: Number 219

    SciTech Connect

    Bari, R.A.

    1985-11-13

    This report describes a methodology for analyzing the safety of nuclear power plants. A historical overview of plants in the US is provided, and past, present, and future nuclear safety and risk assessment are discussed. A primer on nuclear power plants is provided with a discussion of pressurized water reactors (PWR) and boiling water reactors (BWR) and their operation and containment. Probabilistic Risk Assessment (PRA), utilizing both event-tree and fault-tree analysis, is discussed as a tool in reactor safety, decision making, and communications. (FI)

  8. Probabilistic approach to EMP assessment

    SciTech Connect

    Bevensee, R.M.; Cabayan, H.S.; Deadrick, F.J.; Martin, L.C.; Mensing, R.W.

    1980-09-01

    The development of nuclear EMP hardness requirements must account for uncertainties in the environment, in interaction and coupling, and in the susceptibility of subsystems and components. Typical uncertainties of the last two kinds are briefly summarized, and an assessment methodology is outlined, based on a probabilistic approach that encompasses the basic concepts of reliability. It is suggested that statements of survivability be made compatible with system reliability. Validation of the approach taken for simple antenna/circuit systems is performed with experiments and calculations that involve a Transient Electromagnetic Range, numerical antenna modeling, separate device failure data, and a failure analysis computer program.

  9. ELNET--The Electronic Library Database System.

    ERIC Educational Resources Information Center

    King, Shirley V.

    1991-01-01

    ELNET (Electronic Library Network), a Japanese language database, allows searching of index terms and free text terms from articles and stores the full text of the articles on an optical disc system. Users can order fax copies of the text from the optical disc. This article also explains online searching and discusses machine translation. (LRW)

  10. Curcumin Resource Database

    PubMed Central

    Kumar, Anil; Chetia, Hasnahana; Sharma, Swagata; Kabiraj, Debajyoti; Talukdar, Narayan Chandra; Bora, Utpal

    2015-01-01

    Curcumin is one of the most intensively studied diarylheptanoids, Curcuma longa being its principal producer. Apart from this, a class of promising curcumin analogs has been generated in laboratories, aptly named curcuminoids, which are showing huge potential in the fields of medicine, food technology, etc. The lack of a universal source of data on curcumin as well as curcuminoids has long been felt by the curcumin research community. Hence, in an attempt to address this stumbling block, we have developed the Curcumin Resource Database (CRDB), which aims to serve as a gateway-cum-repository to access all relevant data and related information on curcumin and its analogs. Currently, this database encompasses 1186 curcumin analogs, 195 molecular targets, 9075 peer-reviewed publications, 489 patents, and 176 varieties of C. longa obtained by extensive data mining and careful curation from numerous sources. Each data entry is identified by a unique CRDB ID (identifier). Furnished with a user-friendly web interface and an in-built search engine, CRDB provides well-curated and cross-referenced information that is hyperlinked with external sources. CRDB is expected to be highly useful to researchers working on structure- as well as ligand-based molecular design of curcumin analogs. Database URL: http://www.crdb.in PMID:26220923

  11. Probabilistic Tsunami Hazard Assessment: the Seaside, Oregon Pilot Study

    NASA Astrophysics Data System (ADS)

    Gonzalez, F. I.; Geist, E. L.; Synolakis, C.; Titov, V. V.

    2004-12-01

    A pilot study of Seaside, Oregon is underway, to develop methodologies for probabilistic tsunami hazard assessments that can be incorporated into Flood Insurance Rate Maps (FIRMs) developed by FEMA's National Flood Insurance Program (NFIP). Current NFIP guidelines for tsunami hazard assessment rely on the science, technology and methodologies developed in the 1970s; although generally regarded as groundbreaking and state-of-the-art for its time, this approach is now superseded by modern methods that reflect substantial advances in tsunami research achieved in the last two decades. In particular, post-1990 technical advances include: improvements in tsunami source specification; improved tsunami inundation models; better computational grids by virtue of improved bathymetric and topographic databases; a larger database of long-term paleoseismic and paleotsunami records and short-term, historical earthquake and tsunami records that can be exploited to develop improved probabilistic methodologies; better understanding of earthquake recurrence and probability models. The NOAA-led U.S. National Tsunami Hazard Mitigation Program (NTHMP), in partnership with FEMA, USGS, NSF and Emergency Management and Geotechnical agencies of the five Pacific States, incorporates these advances into site-specific tsunami hazard assessments for coastal communities in Alaska, California, Hawaii, Oregon and Washington. NTHMP hazard assessment efforts currently focus on developing deterministic, "credible worst-case" scenarios that provide valuable guidance for hazard mitigation and emergency management. The NFIP focus, on the other hand, is on actuarial needs that require probabilistic hazard assessments such as those that characterize 100- and 500-year flooding events. There are clearly overlaps in NFIP and NTHMP objectives. NTHMP worst-case scenario assessments that include an estimated probability of occurrence could benefit the NFIP; NFIP probabilistic assessments of 100- and 500-yr

  12. Knowledge Abstraction in Chinese Chess Endgame Databases

    NASA Astrophysics Data System (ADS)

    Chen, Bo-Nian; Liu, Pangfeng; Hsu, Shun-Chin; Hsu, Tsan-Sheng

    Retrograde analysis is a well known approach to constructing endgame databases. However, the endgame databases are too large to be loaded into the main memory of a computer during tournaments. In this paper, a novel knowledge abstraction strategy is proposed to compress endgame databases. The goal is to obtain succinct knowledge for practical endgames. A specialized goal-oriented search method is described and applied to the important endgame KRKNMM. Combining a search algorithm with a small amount of knowledge makes it possible to handle endgame positions up to a limited depth, but with a high degree of correctness.

  13. Is Probabilistic Evidence a Source of Knowledge?

    ERIC Educational Resources Information Center

    Friedman, Ori; Turri, John

    2015-01-01

    We report a series of experiments examining whether people ascribe knowledge for true beliefs based on probabilistic evidence. Participants were less likely to ascribe knowledge for beliefs based on probabilistic evidence than for beliefs based on perceptual evidence (Experiments 1 and 2A) or testimony providing causal information (Experiment 2B).…

  14. Probabilistic Cue Combination: Less Is More

    ERIC Educational Resources Information Center

    Yurovsky, Daniel; Boyer, Ty W.; Smith, Linda B.; Yu, Chen

    2013-01-01

    Learning about the structure of the world requires learning probabilistic relationships: rules in which cues do not predict outcomes with certainty. However, in some cases, the ability to track probabilistic relationships is a handicap, leading adults to perform non-normatively in prediction tasks. For example, in the "dilution effect,"…

  15. Error Discounting in Probabilistic Category Learning

    ERIC Educational Resources Information Center

    Craig, Stewart; Lewandowsky, Stephan; Little, Daniel R.

    2011-01-01

    The assumption in some current theories of probabilistic categorization is that people gradually attenuate their learning in response to unavoidable error. However, existing evidence for this error discounting is sparse and open to alternative interpretations. We report 2 probabilistic-categorization experiments in which we investigated error…

  16. Software for Probabilistic Risk Reduction

    NASA Technical Reports Server (NTRS)

    Hensley, Scott; Michel, Thierry; Madsen, Soren; Chapin, Elaine; Rodriguez, Ernesto

    2004-01-01

    A computer program implements a methodology, denoted probabilistic risk reduction, that is intended to aid in planning the development of complex software and/or hardware systems. This methodology integrates two complementary prior methodologies: (1) that of probabilistic risk assessment and (2) a risk-based planning methodology, implemented in a prior computer program known as Defect Detection and Prevention (DDP), in which multiple requirements and the beneficial effects of risk-mitigation actions are taken into account. The present methodology and the software are able to accommodate both process knowledge (notably of the efficacy of development practices) and product knowledge (notably of the logical structure of a system, the development of which one seeks to plan). Estimates of the costs and benefits of a planned development can be derived. Functional and non-functional aspects of software can be taken into account, and trades made among them. It becomes possible to optimize the planning process in the sense that it becomes possible to select the best suite of process steps and design choices to maximize the expectation of success while remaining within budget.

  17. Bioinformatics: searching the Net.

    PubMed

    Kastin, S; Wexler, J

    1998-04-01

    During the past 30 years, there has been an explosion in the volume of published medical information. As this volume has increased, so has the need for efficient methods for searching the data. MEDLINE, the primary medical database, is currently limited to abstracts of the medical literature. MEDLINE searches use AND/OR/NOT logical searching for keywords that have been assigned to each article and for textwords included in article abstracts. Recently, the complete text of some scientific journals, including figures and tables, has become accessible electronically. Keyword and textword searches can provide an overwhelming number of results. Search engines that use phrase searching, or searches that limit the number of words between two matched terms, improve search precision. The development of the Internet as a vehicle for worldwide communication, and the emergence of the World Wide Web (WWW) as a common vehicle for communication, have made instantaneous access to much of the entire body of medical information an exciting possibility. There is more than one way to search the WWW for information. At the present time, two broad strategies have emerged for cataloging the WWW: directories and search engines. These allow more efficient searching of the WWW. Directories catalog WWW information by creating categories and subcategories of information and then publishing pointers to information within the category listings. Directories are analogous to the yellow pages of a phone book. Search engines make no attempt to categorize information. They automatically scour the WWW looking for words and then automatically create an index of those words. When a specific search engine is used, its index is searched for a particular word. Usually, search engines are nonspecific and produce voluminous results. Use of AND/OR/NOT and "near" and "adjacent" search refinements greatly improves the results of a search. Search engines that limit their scope to specific sites, and
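
    As a toy illustration of the AND/OR/NOT logic described above (invented documents and terms, not the article's software), keyword retrieval reduces to set operations over an inverted index:

        # Boolean keyword retrieval over a toy inverted index.
        docs = {
            1: "tuberculosis treatment outcomes",
            2: "asthma treatment guidelines",
            3: "tuberculosis vaccine trial",
        }
        index = {}
        for doc_id, text in docs.items():
            for word in text.split():
                index.setdefault(word, set()).add(doc_id)

        print(index["tuberculosis"] & index["treatment"])   # AND -> {1}
        print(index["tuberculosis"] | index["treatment"])   # OR  -> {1, 2, 3}
        print(index["treatment"] - index["tuberculosis"])   # NOT -> {2}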

  18. Demystifying the Search Button

    PubMed Central

    McKeever, Liam; Nguyen, Van; Peterson, Sarah J.; Gomez-Perez, Sandra

    2015-01-01

    A thorough review of the literature is the basis of all research and evidence-based practice. A gold-standard efficient and exhaustive search strategy is needed to ensure all relevant citations have been captured and that the search performed is reproducible. The PubMed database comprises both the MEDLINE and non-MEDLINE databases. MEDLINE-based search strategies are robust but capture only 89% of the total available citations in PubMed. The remaining 11% include the most recent and possibly relevant citations but are only searchable through less efficient techniques. An effective search strategy must employ both the MEDLINE and the non-MEDLINE portion of PubMed to ensure all studies have been identified. The robust MEDLINE search strategies are used for the MEDLINE portion of the search. Usage of the less robust strategies is then efficiently confined to search only the remaining 11% of PubMed citations that have not been indexed for MEDLINE. The current article offers step-by-step instructions for building such a search, exploring methods for the discovery of medical subject heading (MeSH) terms to search MEDLINE, text-based methods for exploring the non-MEDLINE database, information on the limitations of convenience algorithms such as the “related citations feature,” the strengths and pitfalls associated with commonly used filters, the proper usage of Boolean operators to organize a master search strategy, and instructions for automating that search through “MyNCBI” to receive search query updates by email as new citations become available. PMID:26129895
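
    A sketch of how such a two-part strategy might be assembled into a single query string; medline[sb] is standard PubMed subset syntax, but the topic terms and field tags here are illustrative placeholders rather than the article's worked example.

        # Hypothetical master search: a MeSH arm confined to MEDLINE plus a
        # text-word arm confined to the non-MEDLINE remainder of PubMed.
        mesh_arm = '"parenteral nutrition"[MeSH Terms] AND medline[sb]'
        text_arm = '"parenteral nutrition"[Title/Abstract] NOT medline[sb]'
        master = f"({mesh_arm}) OR ({text_arm})"
        print(master)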

  19. Models And Results Database System.

    Energy Science and Technology Software Center (ESTSC)

    2001-03-27

    Version 00 MAR-D 4.16 is a program that is used primarily for Probabilistic Risk Assessment (PRA) data loading. This program defines a common relational database structure that is used by other PRA programs. This structure allows all of the software to access and manipulate data created by other software in the system without performing a lengthy conversion. The MAR-D program also provides the facilities for loading and unloading of PRA data from the relational database structure used to store the data to an ASCII format for interchange with other PRA software. The primary function of MAR-D is to create a data repository for NUREG-1150 and other permanent data by providing input, conversion, and output capabilities for data used by IRRAS, SARA, SETS and FRANTIC.

  20. YCRD: Yeast Combinatorial Regulation Database

    PubMed Central

    Wu, Wei-Sheng; Hsieh, Yen-Chen; Lai, Fu-Jou

    2016-01-01

    In eukaryotes, the precise transcriptional control of gene expression is typically achieved through combinatorial regulation using cooperative transcription factors (TFs). Therefore, a database which provides regulatory associations between cooperative TFs and their target genes is helpful for biologists to study the molecular mechanisms of transcriptional regulation of gene expression. Because no such database exists in the public domain, we constructed one, called the Yeast Combinatorial Regulation Database (YCRD), which deposits 434,197 regulatory associations between 2535 cooperative TF pairs and 6243 genes. The comprehensive collection of more than 2500 cooperative TF pairs was retrieved from 17 existing algorithms in the literature. The target genes of a cooperative TF pair (e.g. TF1-TF2) are defined as the common target genes of TF1 and TF2, where a TF's experimentally validated target genes were downloaded from the YEASTRACT database. In YCRD, users can (i) search the target genes of a cooperative TF pair of interest, (ii) search the cooperative TF pairs which regulate a gene of interest and (iii) identify important cooperative TF pairs which regulate a given set of genes. We believe that YCRD will be a valuable resource for yeast biologists to study combinatorial regulation of gene expression. YCRD is available at http://cosbi.ee.ncku.edu.tw/YCRD/ or http://cosbi2.ee.ncku.edu.tw/YCRD/. PMID:27392072
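
    A minimal sketch of the target-gene definition above, with invented TF and gene names: the targets of a cooperative pair are the intersection of the two TFs' validated target sets.

        # Common targets of a cooperative TF pair as a set intersection.
        validated_targets = {
            "TF1": {"GAL1", "GAL7", "CYC1"},
            "TF2": {"GAL1", "GAL7", "HIS3"},
        }
        pair_targets = validated_targets["TF1"] & validated_targets["TF2"]
        print(sorted(pair_targets))   # -> ['GAL1', 'GAL7']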

  1. Variable stars in the MACHO Collaboration database

    SciTech Connect

    Cook, K.H.; Alcock, C.; Allsman, R.A.

    1995-02-01

    The MACHO Collaboration's search for baryonic dark matter via its gravitational microlensing signature has generated a massive database of time-ordered photometry of millions of stars in the LMC and the bulge of the Milky Way. The search's experimental design and capabilities are reviewed and the dark matter results are briefly noted. Preliminary analysis of the approximately 39,000 variable stars discovered in the LMC database is presented and examples of periodic variables are shown. A class of aperiodically variable Be stars is described which is the closest background to microlensing which has been found. Plans for future work on variable stars using the MACHO data are described.

  2. eSLDB: eukaryotic subcellular localization database.

    PubMed

    Pierleoni, Andrea; Martelli, Pier Luigi; Fariselli, Piero; Casadio, Rita

    2007-01-01

    Eukaryotic Subcellular Localization DataBase collects the annotations of subcellular localization of eukaryotic proteomes. So far five proteomes have been processed and stored: Homo sapiens, Mus musculus, Caenorhabditis elegans, Saccharomyces cerevisiae and Arabidopsis thaliana. For each sequence, the database lists localization obtained adopting three different approaches: (i) experimentally determined (when available); (ii) homology-based (when possible); and (iii) predicted. The latter is computed with a suite of machine learning based methods, developed in house. All the data are available at our website and can be searched by sequence, by protein code and/or by protein description. Furthermore, a more complex search can be performed combining different search fields and keys. All the data contained in the database can be freely downloaded in flat file format. The database is available at http://gpcr.biocomp.unibo.it/esldb/. PMID:17108361

  3. Overview of selected molecular biological databases

    SciTech Connect

    Rayl, K.D.; Gaasterland, T.

    1994-11-01

    This paper presents an overview of the purpose, content, and design of a subset of the currently available biological databases, with an emphasis on protein databases. Databases included in this summary are 3D-ALI, Berlin RNA databank, Blocks, DSSP, EMBL Nucleotide Database, EMP, ENZYME, FSSP, GDB, GenBank, HSSP, LiMB, PDB, PIR, PKCDD, ProSite, and SWISS-PROT. The goal is to provide a starting point for researchers who wish to take advantage of the myriad available databases. Rather than providing a complete explanation of each database, we present its content and form by explaining the details of typical entries. Pointers to more complete "user guides" are included, along with general information on where to search for a new database.

  4. PMAG: Relational Database Definition

    NASA Astrophysics Data System (ADS)

    Keizer, P.; Koppers, A.; Tauxe, L.; Constable, C.; Genevey, A.; Staudigel, H.; Helly, J.

    2002-12-01

    The Scripps center for Physical and Chemical Earth References (PACER) was established to help create databases for reference data and make them available to the Earth science community. As part of these efforts PACER supports GERM, REM and PMAG and maintains multiple online databases under the http://earthref.org umbrella website. This website has been built on top of a relational database that allows for the archiving and electronic access to a great variety of data types and formats, permitting data queries using a wide range of metadata. These online databases are designed in Oracle 8.1.5 and they are maintained at the San Diego Supercomputer Center. They are directly available via http://earthref.org/databases/. A prototype of the PMAG relational database is now operational within the existing EarthRef.org framework under http://earthref.org/databases/PMAG/. As will be shown in our presentation, the PMAG design focuses around the general workflow that results in the determination of typical paleo-magnetic analyses. This ensures that individual data points can be traced between the actual analysis and the specimen, sample, site, locality and expedition it belongs to. These relations guarantee traceability of the data by distinguishing between original and derived data, where the actual (raw) measurements are performed on the specimen level, and data on the sample level and higher are then derived products in the database. These relations may also serve to recalculate site means when new data becomes available for that locality. The PMAG data records are extensively described in terms of metadata. These metadata are used when scientists search through this online database in order to view and download their needed data. They minimally include method descriptions for field sampling, laboratory techniques and statistical analyses. They also include selection criteria used during the interpretation of the data and, most importantly, critical information about the

  5. Interactive Graphical Queries for Bibliographic Search.

    ERIC Educational Resources Information Center

    Brooks, Martin; Campbell, Jennifer

    1999-01-01

    Presents "Islands," an interactive graphical interface for construction, modification, and management of queries during a search session on a bibliographic database. Discusses motivation and bibliographic search semantics and compares the Islands interface to the Dialog interface. (Author/LRW)

  6. The Eruption Forecasting Information System (EFIS) database project

    NASA Astrophysics Data System (ADS)

    Ogburn, Sarah; Harpel, Chris; Pesicek, Jeremy; Wellik, Jay; Pallister, John; Wright, Heather

    2016-04-01

    The Eruption Forecasting Information System (EFIS) project is a new initiative of the U.S. Geological Survey-USAID Volcano Disaster Assistance Program (VDAP) with the goal of enhancing VDAP's ability to forecast the outcome of volcanic unrest. The EFIS project seeks to: (1) move away from reliance on collective memory and toward probability estimation using databases; (2) create databases useful for pattern recognition and for answering common VDAP questions, e.g. how commonly does unrest lead to eruption? how commonly do phreatic eruptions portend magmatic eruptions and what is the range of antecedence times?; (3) create generic probabilistic event trees using global data for different volcano 'types'; (4) create background, volcano-specific, probabilistic event trees for frequently active or particularly hazardous volcanoes in advance of a crisis; and (5) quantify and communicate uncertainty in probabilities. A major component of the project is the global EFIS relational database, which contains multiple modules designed to aid in the construction of probabilistic event trees and to answer common questions that arise during volcanic crises. The primary module contains chronologies of volcanic unrest, including the timing of phreatic eruptions, column heights, eruptive products, etc., and will be initially populated using chronicles of eruptive activity from Alaskan volcanic eruptions in the GeoDIVA database (Cameron et al. 2013). This database module allows us to query across other global databases such as the WOVOdat database of monitoring data and the Smithsonian Institution's Global Volcanism Program (GVP) database of eruptive histories and volcano information. The EFIS database is in the early stages of development and population; thus, this contribution also serves as a request for feedback from the community.

  7. The CIS Database: Occupational Health and Safety Information Online.

    ERIC Educational Resources Information Center

    Siegel, Herbert; Scurr, Erica

    1985-01-01

    Describes document acquisition, selection, indexing, and abstracting and discusses online searching of the CIS database, an online system produced by the International Occupational Safety and Health Information Centre. This database comprehensively covers information in the field of occupational health and safety. Sample searches and search…

  8. Subject Retrieval from Full-Text Databases in the Humanities

    ERIC Educational Resources Information Center

    East, John W.

    2007-01-01

    This paper examines the problems involved in subject retrieval from full-text databases of secondary materials in the humanities. Ten such databases were studied and their search functionality evaluated, focusing on factors such as Boolean operators, document surrogates, limiting by subject area, proximity operators, phrase searching, wildcards,…

  9. Effects of distributed database modeling on evaluation of transaction rollbacks

    NASA Technical Reports Server (NTRS)

    Mukkamala, Ravi

    1991-01-01

    Data distribution, degree of data replication, and transaction access patterns are key factors in determining the performance of distributed database systems. In order to simplify the evaluation of performance measures, database designers and researchers tend to make simplistic assumptions about the system. The effect of modeling assumptions on the evaluation of one such measure, the number of transaction rollbacks in a partitioned distributed database system, is studied. Six probabilistic models are developed, along with expressions for the number of rollbacks under each of these models. Essentially, the models differ in terms of the available system information. The analytical results so obtained are compared to results from simulation. From this, it is concluded that most of the probabilistic models yield overly conservative estimates of the number of rollbacks. The effect of transaction commutativity on system throughput is also grossly undermined when such models are employed.

  10. Effects of distributed database modeling on evaluation of transaction rollbacks

    NASA Technical Reports Server (NTRS)

    Mukkamala, Ravi

    1991-01-01

    Data distribution, degree of data replication, and transaction access patterns are key factors in determining the performance of distributed database systems. In order to simplify the evaluation of performance measures, database designers and researchers tend to make simplistic assumptions about the system. Here, researchers investigate the effect of modeling assumptions on the evaluation of one such measure, the number of transaction rollbacks in a partitioned distributed database system. The researchers developed six probabilistic models and expressions for the number of rollbacks under each of these models. Essentially, the models differ in terms of the available system information. The analytical results obtained are compared to results from simulation. It was concluded that most of the probabilistic models yield overly conservative estimates of the number of rollbacks. The effect of transaction commutativity on system throughput is also grossly undermined when such models are employed.

  11. Stackfile Database

    NASA Technical Reports Server (NTRS)

    deVarvalho, Robert; Desai, Shailen D.; Haines, Bruce J.; Kruizinga, Gerhard L.; Gilmer, Christopher

    2013-01-01

    This software provides storage, retrieval, and analysis functionality for managing satellite altimetry data. It improves on the efficiency and analysis capabilities of existing database software, adding flexibility and documentation. It offers flexibility in the type of data that can be stored. There is efficient retrieval either across the spatial domain or the time domain. Built-in analysis tools are provided for frequently performed altimetry tasks. This software package is used for storing and manipulating satellite measurement data. It was developed with a focus on handling the requirements of repeat-track altimetry missions such as Topex and Jason. It was, however, designed to work with a wide variety of satellite measurement data (e.g., the Gravity Recovery And Climate Experiment, GRACE). The software consists of several command-line tools for importing, retrieving, and analyzing satellite measurement data.

  12. The GLIMS Glacier Database

    NASA Astrophysics Data System (ADS)

    Raup, B. H.; Khalsa, S. S.; Armstrong, R.

    2007-12-01

    The Global Land Ice Measurements from Space (GLIMS) project has built a geospatial and temporal database of glacier data, composed of glacier outlines and various scalar attributes. These data are being derived primarily from satellite imagery, such as from ASTER and Landsat. Each "snapshot" of a glacier is from a specific time, and the database is designed to store multiple snapshots representative of different times. We have implemented two web-based interfaces to the database; one enables exploration of the data via interactive maps (web map server), while the other allows searches based on text-field constraints. The web map server is an Open Geospatial Consortium (OGC) compliant Web Map Server (WMS) and Web Feature Server (WFS). This means that other web sites can display glacier layers from our site over the Internet, or retrieve glacier features in vector format. All components of the system are implemented using Open Source software: Linux, PostgreSQL, PostGIS (geospatial extensions to the database), MapServer (WMS and WFS), and several supporting components such as Proj.4 (a geographic projection library) and PHP. These tools are robust and provide a flexible and powerful framework for web mapping applications. As a service to the GLIMS community, the database contains metadata on all ASTER imagery acquired over glacierized terrain. Reduced-resolution versions of the images (browse imagery) can be viewed either as a layer in the MapServer application, or overlaid on the virtual globe within Google Earth. The interactive map application allows the user to constrain by time what data appear on the map. For example, ASTER imagery or glacier outlines from 2002 only, or from Autumn in any year, can be displayed. The system allows users to download their selected glacier data in a choice of formats. The results of a query based on spatial selection (using a mouse) or text-field constraints can be downloaded in any of these formats: ESRI shapefiles, KML (Google Earth), Map

  13. An evaluation, comparison, and accurate benchmarking of several publicly available MS/MS search algorithms: Sensitivity and Specificity analysis.

    SciTech Connect

    Kapp, Eugene; Schutz, Frederick; Connolly, Lisa M.; Chakel, John A.; Meza, Jose E.; Miller, Christine A.; Fenyo, David; Eng, Jimmy K.; Adkins, Joshua N.; Omenn, Gilbert; Simpson, Richard

    2005-08-01

    MS/MS and associated database search algorithms are essential proteomic tools for identifying peptides. Due to their widespread use, it is now time to perform a systematic analysis of the various algorithms currently in use. Using blood specimens from the HUPO Plasma Proteome Project, we have evaluated five search algorithms with respect to their sensitivity and specificity, and have also accurately benchmarked them based on specified false-positive (FP) rates. Spectrum Mill and SEQUEST performed well in terms of sensitivity, but were inferior to MASCOT, X-Tandem, and Sonar in terms of specificity. Overall, MASCOT, a probabilistic search algorithm, correctly identified most peptides based on a specified FP rate. The rescoring algorithm, Peptide Prophet, enhanced the overall performance of the SEQUEST algorithm, as well as provided predictable FP error rates. Ideally, score thresholds should be calculated for each peptide spectrum or, minimally, derived from a reversed-sequence search as demonstrated in this study based on a validated data set. The availability of open-source search algorithms, such as X-Tandem, makes it feasible to further improve the validation process (manual or automatic) on the basis of "consensus scoring", i.e., the use of multiple (at least two) search algorithms to reduce the number of FPs.
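
    A minimal sketch of the "consensus scoring" idea, with invented engine outputs: a peptide-spectrum match (PSM) is accepted only when at least two engines report it.

        # Consensus filtering of PSMs across search engines.
        from collections import Counter

        engine_hits = {
            "MASCOT":   {("spec1", "PEPTIDER"), ("spec2", "SAMPLEK")},
            "SEQUEST":  {("spec1", "PEPTIDER"), ("spec3", "OTHERK")},
            "X-Tandem": {("spec1", "PEPTIDER"), ("spec2", "SAMPLEK")},
        }
        votes = Counter(psm for hits in engine_hits.values() for psm in hits)
        consensus = {psm for psm, n in votes.items() if n >= 2}
        print(consensus)   # -> {('spec1', 'PEPTIDER'), ('spec2', 'SAMPLEK')}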

  14. Integrating Variances into an Analytical Database

    NASA Technical Reports Server (NTRS)

    Sanchez, Carlos

    2010-01-01

    For this project, I enrolled in numerous SATERN courses that taught the basics of database programming. These include: Basic Access 2007 Forms, Introduction to Database Systems, Overview of Database Design, and others. My main job was to create an analytical database that can handle many stored forms and make them easy to interpret and organize. Additionally, I helped improve an existing database and populate it with information. These databases were designed to be used with data from Safety Variances and DCR forms. The research consisted of analyzing the database and comparing the data to find out which entries were repeated the most. If an entry happened to be repeated several times in the database, that would mean that the rule or requirement targeted by that variance has been bypassed many times already, and so the requirement may not really be needed, but rather should be changed to allow the variance's conditions permanently. This project was not restricted to the design and development of the database system; it also involved exporting the data from the database to a different format (e.g., Excel or Word) so it could be analyzed in a simpler fashion. Thanks to the change in format, the data was organized in a spreadsheet that made it possible to sort the data by categories or types and helped speed up searches. Once my work with the database was done, the records of variances could be arranged so that they were displayed in numerical order, or one could search for a specific document targeted by the variances and restrict the search to only include variances that modified a specific requirement. A great part that contributed to my learning was SATERN, NASA's resource for education. Thanks to the SATERN online courses I took over the summer, I was able to learn many new things about computers and databases and also go more in depth into topics I already knew about.

  15. Citation Searching: Search Smarter & Find More

    ERIC Educational Resources Information Center

    Hammond, Chelsea C.; Brown, Stephanie Willen

    2008-01-01

    The staff at University of Connecticut are participating in Elsevier's Student Ambassador Program (SAmP) in which graduate students train their peers on "citation searching" research using Scopus and Web of Science, two tremendous citation databases. They are in the fourth semester of these training programs, and they are wildly successful: They…

  16. Database Marketplace 2002: The Database Universe.

    ERIC Educational Resources Information Center

    Tenopir, Carol; Baker, Gayle; Robinson, William

    2002-01-01

    Reviews the database industry over the past year, including new companies and services, company closures, popular database formats, popular access methods, and changes in existing products and services. Lists 33 firms and their database services; 33 firms and their database products; and 61 company profiles. (LRW)

  17. DEPOT database: Reference manual and user's guide

    SciTech Connect

    Clancey, P.; Logg, C.

    1991-03-01

    DEPOT has been developed to provide tracking for the Stanford Linear Collider (SLC) control system equipment. For each piece of equipment entered into the database, complete location, service, maintenance, modification, certification, and radiation exposure histories can be maintained. To facilitate data entry accuracy, efficiency, and consistency, barcoding technology has been used extensively. DEPOT has been an important tool in improving the reliability of the microsystems controlling SLC. This document describes the components of the DEPOT database, the elements in the database records, and the use of the supporting programs for entering data, searching the database, and producing reports from the information.

  18. Construction of Database for Pulsating Variable Stars

    NASA Astrophysics Data System (ADS)

    Chen, B. Q.; Yang, M.; Jiang, B. W.

    2011-07-01

    A database for the pulsating variable stars is constructed for Chinese astronomers to study the variable stars conveniently. The database includes about 230000 variable stars in the Galactic bulge, LMC and SMC observed by the MACHO (MAssive Compact Halo Objects) and OGLE (Optical Gravitational Lensing Experiment) projects at present. The software used for the construction is LAMP, i.e., Linux+Apache+MySQL+PHP. A web page is provided to search the photometric data and the light curve in the database through the right ascension and declination of the object. More data will be incorporated into the database.
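
    The coordinate search could plausibly be served by a query of the following shape (a sketch against a hypothetical `stars` table; a simple box search in degrees rather than a true cone search):

        # Positional box query; parameters and table layout are invented.
        query = """
            SELECT star_id, ra, dec
            FROM stars
            WHERE ra  BETWEEN %s AND %s
              AND dec BETWEEN %s AND %s
        """
        ra0, dec0, radius = 81.5, -69.8, 0.05   # an illustrative LMC field
        params = (ra0 - radius, ra0 + radius, dec0 - radius, dec0 + radius)
        # cursor.execute(query, params)          # with any DB-API MySQL client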

  19. Using the DFCI Gene Index Databases for Biological Discovery

    PubMed Central

    Antonescu, Corina; Antonescu, Valentin; Sultana, Razvan; Quackenbush, John

    2014-01-01

    The DFCI Gene Index Web pages provide access to analyses of ESTs and gene sequences for nearly 114 species, as well as a number of resources derived from these. Each species-specific database is presented using a common format with a home page. A variety of methods exist that allow users to search each species-specific database. Methods implemented currently include nucleotide or protein sequence queries using WU-BLAST, text-based searches using various sequence identifiers, searches by gene, tissue and library name, and searches using functional classes through Gene Ontology assignments. This protocol provides guidance for using the Gene Index Databases to extract information. PMID:20205187

  20. Probabilistic Fatigue Damage Program (FATIG)

    NASA Technical Reports Server (NTRS)

    Michalopoulos, Constantine

    2012-01-01

    FATIG computes fatigue damage/fatigue life using the stress rms (root mean square) value, the total number of cycles, and S-N curve parameters. The damage is computed by the following methods: (a) traditional method using Miner's rule with stress cycles determined from a Rayleigh distribution up to 3*sigma; and (b) classical fatigue damage formula involving the Gamma function, which is derived from the integral version of Miner's rule. The integration is carried out over all stress amplitudes. This software solves the problem of probabilistic fatigue damage using the integral form of the Palmgren-Miner rule. The software computes fatigue life using an approach involving all stress amplitudes, up to N*sigma, as specified by the user. It can be used in the design of structural components subjected to random dynamic loading, or by any stress analyst with minimal training for fatigue life estimates of structural components.
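
    For reference, method (b) corresponds to the classical narrow-band result (a textbook identity, not a formula quoted from the program's documentation): for an S-N curve of the form N S^b = A and Rayleigh-distributed stress amplitudes with rms value sigma, the expected damage over n cycles is

        E[D] = \frac{n}{A} \, (\sqrt{2}\,\sigma)^{b} \, \Gamma\!\left(1 + \frac{b}{2}\right)

    Truncating the stress amplitudes at N*sigma, as the software allows, replaces the complete Gamma function with its lower incomplete counterpart.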

  1. Probabilistic cloning of equidistant states

    SciTech Connect

    Jimenez, O.; Roa, Luis; Delgado, A.

    2010-08-15

    We study the probabilistic cloning of equidistant states. These states are such that the inner product between them is a complex constant or its conjugate. Thereby, it is possible to study their cloning in a simple way. In particular, we are interested in the behavior of the cloning probability as a function of the phase of the overlap among the involved states. We show that for certain families of equidistant states Duan and Guo's cloning machine leads to cloning probabilities lower than the optimal unambiguous discrimination probability of equidistant states. We propose an alternative cloning machine whose cloning probability is higher than or equal to the optimal unambiguous discrimination probability for any family of equidistant states. Both machines achieve the same probability for equidistant states whose inner product is a positive real number.

  2. Probabilistic Reasoning for Plan Robustness

    NASA Technical Reports Server (NTRS)

    Schaffer, Steve R.; Clement, Bradley J.; Chien, Steve A.

    2005-01-01

    A planning system must reason about the uncertainty of continuous variables in order to accurately project the possible system state over time. A method is devised for directly reasoning about the uncertainty in continuous activity duration and resource usage for planning problems. By representing random variables as parametric distributions, computing projected system state can be simplified in some cases. Common approximation and novel methods are compared for over-constrained and lightly constrained domains. The system compares a few common approximation methods for an iterative repair planner. Results show improvements in robustness over the conventional non-probabilistic representation by reducing the number of constraint violations witnessed by execution. The improvement is more significant for larger problems and problems with higher resource subscription levels but diminishes as the system is allowed to accept higher risk levels.
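
    A minimal sketch of the parametric-distribution idea under an independence assumption, with invented numbers: Gaussian activity durations are summed analytically, and the probability of violating a deadline is read off the normal CDF.

        # Propagate Gaussian duration uncertainty through a sequential plan.
        import math

        activities = [(10.0, 2.0), (5.0, 1.0), (8.0, 1.5)]   # (mean, std) per activity
        mean = sum(m for m, s in activities)
        var = sum(s * s for m, s in activities)              # independent durations
        deadline = 27.0
        z = (deadline - mean) / math.sqrt(var)
        p_violate = 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))   # P(total > deadline)
        print(round(p_violate, 3))   # -> ~0.069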

  3. Database Management Systems: New Homes for Migrating Bibliographic Records.

    ERIC Educational Resources Information Center

    Brooks, Terrence A.; Bierbaum, Esther G.

    1987-01-01

    Assesses bibliographic databases as part of visionary text systems such as hypertext and scholars' workstations. Downloading is discussed in terms of the capability to search records and to maintain unique bibliographic descriptions, and relational database management systems, file managers, and text databases are reviewed as possible hosts for…

  4. The Cystic Fibrosis Database: Content and Research Opportunities.

    ERIC Educational Resources Information Center

    Shaw, William M., Jr.; And Others

    1991-01-01

    Describes the files contained in the Cystic Fibrosis (CF) database and discusses educational and research opportunities using this database. Topics discussed include queries, evaluating the relevance of items retrieved, and use of the database in an online searching course in the School of Information and Library Science at the University of North…

  5. Development of probabilistic multimedia multipathway computer codes.

    SciTech Connect

    Yu, C.; LePoire, D.; Gnanapragasam, E.; Arnish, J.; Kamboj, S.; Biwer, B. M.; Cheng, J.-J.; Zielen, A. J.; Chen, S. Y.; Mo, T.; Abu-Eid, R.; Thaggard, M.; Sallo, A., III.; Peterson, H., Jr.; Williams, W. A.; Environmental Assessment; NRC; EM

    2002-01-01

    The deterministic multimedia dose/risk assessment codes RESRAD and RESRAD-BUILD have been widely used for many years for evaluation of sites contaminated with residual radioactive materials. The RESRAD code applies to the cleanup of sites (soils) and the RESRAD-BUILD code applies to the cleanup of buildings and structures. This work describes the procedure used to enhance the deterministic RESRAD and RESRAD-BUILD codes for probabilistic dose analysis. A six-step procedure was used in developing default parameter distributions and the probabilistic analysis modules. These six steps include (1) listing and categorizing parameters; (2) ranking parameters; (3) developing parameter distributions; (4) testing parameter distributions for probabilistic analysis; (5) developing probabilistic software modules; and (6) testing probabilistic modules and integrated codes. The procedures used can be applied to the development of other multimedia probabilistic codes. The probabilistic versions of RESRAD and RESRAD-BUILD codes provide tools for studying the uncertainty in dose assessment caused by uncertain input parameters. The parameter distribution data collected in this work can also be applied to other multimedia assessment tasks and multimedia computer codes.
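
    A minimal Monte Carlo sketch of steps (3) through (6), with an invented toy dose model and arbitrary distributions (not the RESRAD equations or default parameters):

        # Sample uncertain inputs and propagate them through a toy dose model.
        import random

        def dose(soil_conc, ingestion_rate, transfer_factor):
            return soil_conc * ingestion_rate * transfer_factor   # toy model only

        random.seed(1)
        samples = []
        for _ in range(10_000):
            c = random.lognormvariate(0.0, 0.5)      # soil concentration
            r = random.uniform(0.05, 0.15)           # ingestion rate
            f = random.triangular(0.1, 0.5, 0.2)     # transfer factor (low, high, mode)
            samples.append(dose(c, r, f))

        samples.sort()
        print(samples[int(0.95 * len(samples))])     # 95th-percentile dose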

  6. Hierarchical Spatio-Temporal Probabilistic Graphical Model with Multiple Feature Fusion for Binary Facial Attribute Classification in Real-World Face Videos.

    PubMed

    Demirkus, Meltem; Precup, Doina; Clark, James J; Arbel, Tal

    2016-06-01

    Recent literature shows that facial attributes, i.e., contextual facial information, can be beneficial for improving the performance of real-world applications, such as face verification, face recognition, and image search. Examples of face attributes include gender, skin color, facial hair, etc. How to robustly obtain these facial attributes (traits) is still an open problem, especially in the presence of the challenges of real-world environments: non-uniform illumination conditions, arbitrary occlusions, motion blur and background clutter. What makes this problem even more difficult is the enormous variability presented by the same subject, due to arbitrary face scales, head poses, and facial expressions. In this paper, we focus on the problem of facial trait classification in real-world face videos. We have developed a fully automatic hierarchical and probabilistic framework that models the collective set of frame class distributions and feature spatial information over a video sequence. The experiments are conducted on a large real-world face video database that we have collected, labelled and made publicly available. The proposed method is flexible enough to be applied to any facial classification problem. Experiments on a large, real-world video database McGillFaces [1] of 18,000 video frames reveal that the proposed framework outperforms alternative approaches, by up to 16.96 and 10.13%, for the facial attributes of gender and facial hair, respectively. PMID:26415152

  7. Rice Glycosyltransferase (GT) Phylogenomic Database

    DOE Data Explorer

    Ronald, Pamela

    The Ronald Laboratory staff at the University of California-Davis has a primary research focus on the genes of the rice plant. They study the role that genetics plays in the way rice plants respond to their environment. They created the Rice GT Database in order to integrate functional genomic information for putative rice Glycosyltransferases (GTs). This database contains information on nearly 800 putative rice GTs (gene models) identified by sequence similarity searches based on the Carbohydrate Active enZymes (CAZy) database. The Rice GT Database provides a platform to display user-selected functional genomic data on a phylogenetic tree. This includes sequence information, mutant line information, expression data, etc. An interactive chromosomal map shows the position of all rice GTs, and links to rice annotation databases are included. The format is intended to "facilitate the comparison of closely related GTs within different families, as well as perform global comparisons between sets of related families." [From http://ricephylogenomics.ucdavis.edu/cellwalls/gt/genInfo.shtml] See also the primary paper discussing this work: Peijian Cao, Laura E. Bartley, Ki-Hong Jung and Pamela C. Ronald. Construction of a Rice Glycosyltransferase Phylogenomic Database and Identification of Rice-Diverged Glycosyltransferases. Molecular Plant, 2008, 1(5): 858-877.

  8. The EMBL Nucleotide Sequence Database.

    PubMed

    Stoesser, G; Tuli, M A; Lopez, R; Sterk, P

    1999-01-01

    The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl.html) constitutes Europe's primary nucleotide sequence resource. Main sources for DNA and RNA sequences are direct submissions from individual researchers, genome sequencing projects and patent applications. While automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO), the preferred submission tool for individual submitters is Webin (WWW). Through all stages, dataflow is monitored by EBI biologists communicating with the sequencing groups. In collaboration with DDBJ and GenBank the database is produced, maintained and distributed at the European Bioinformatics Institute (EBI). Database releases are produced quarterly and are distributed on CD-ROM. Network services allow access to the most up-to-date data collection via the Internet and a World Wide Web interface. EBI's Sequence Retrieval System (SRS) is a Network Browser for Databanks in Molecular Biology, integrating and linking the main nucleotide and protein databases, plus many specialised databases. For sequence similarity searching, a variety of tools (e.g., Blitz, Fasta, Blast) are available for external users to compare their own sequences against the most currently available data in the EMBL Nucleotide Sequence Database and SWISS-PROT. PMID:9847133

  9. The Majorana Parts Tracking Database

    DOE PAGESBeta

    Abgrall, N.

    2015-01-16

    The Majorana Demonstrator is an ultra-low background physics experiment searching for the neutrinoless double beta decay of 76Ge. The Majorana Parts Tracking Database is used to record the history of components used in the construction of the Demonstrator. The tracking implementation takes a novel approach based on the schema-free database technology CouchDB. Transportation, storage, and processes undergone by parts such as machining or cleaning are linked to part records. Tracking parts provides a great logistics benefit and an important quality assurance reference during construction. In addition, the location history of parts provides an estimate of their exposure to cosmic radiation. In summary, a web application for data entry and a radiation exposure calculator have been developed as tools for achieving the extreme radio-purity required for this rare decay search.

  10. The Majorana Parts Tracking Database

    SciTech Connect

    Abgrall, N.

    2015-01-16

    The Majorana Demonstrator is an ultra-low background physics experiment searching for the neutrinoless double beta decay of 76Ge. The Majorana Parts Tracking Database is used to record the history of components used in the construction of the Demonstrator. The tracking implementation takes a novel approach based on the schema-free database technology CouchDB. Transportation, storage, and processes undergone by parts such as machining or cleaning are linked to part records. Tracking parts provides a great logistics benefit and an important quality assurance reference during construction. In addition, the location history of parts provides an estimate of their exposure to cosmic radiation. In summary, a web application for data entry and a radiation exposure calculator have been developed as tools for achieving the extreme radio-purity required for this rare decay search.

  11. The MAJORANA Parts Tracking Database

    NASA Astrophysics Data System (ADS)

    Abgrall, N.; Aguayo, E.; Avignone, F. T.; Barabash, A. S.; Bertrand, F. E.; Brudanin, V.; Busch, M.; Byram, D.; Caldwell, A. S.; Chan, Y.-D.; Christofferson, C. D.; Combs, D. C.; Cuesta, C.; Detwiler, J. A.; Doe, P. J.; Efremenko, Yu.; Egorov, V.; Ejiri, H.; Elliott, S. R.; Esterline, J.; Fast, J. E.; Finnerty, P.; Fraenkle, F. M.; Galindo-Uribarri, A.; Giovanetti, G. K.; Goett, J.; Green, M. P.; Gruszko, J.; Guiseppe, V. E.; Gusev, K.; Hallin, A. L.; Hazama, R.; Hegai, A.; Henning, R.; Hoppe, E. W.; Howard, S.; Howe, M. A.; Keeter, K. J.; Kidd, M. F.; Kochetov, O.; Konovalov, S. I.; Kouzes, R. T.; LaFerriere, B. D.; Leon, J. Diaz; Leviner, L. E.; Loach, J. C.; MacMullin, J.; Martin, R. D.; Meijer, S. J.; Mertens, S.; Miller, M. L.; Mizouni, L.; Nomachi, M.; Orrell, J. L.; O'Shaughnessy, C.; Overman, N. R.; Petersburg, R.; Phillips, D. G.; Poon, A. W. P.; Pushkin, K.; Radford, D. C.; Rager, J.; Rielage, K.; Robertson, R. G. H.; Romero-Romero, E.; Ronquest, M. C.; Shanks, B.; Shima, T.; Shirchenko, M.; Snavely, K. J.; Snyder, N.; Soin, A.; Suriano, A. M.; Tedeschi, D.; Thompson, J.; Timkin, V.; Tornow, W.; Trimble, J. E.; Varner, R. L.; Vasilyev, S.; Vetter, K.; Vorren, K.; White, B. R.; Wilkerson, J. F.; Wiseman, C.; Xu, W.; Yakushev, E.; Young, A. R.; Yu, C.-H.; Yumatov, V.; Zhitnikov, I.

    2015-04-01

    The MAJORANA DEMONSTRATOR is an ultra-low background physics experiment searching for the neutrinoless double beta decay of 76Ge. The MAJORANA Parts Tracking Database is used to record the history of components used in the construction of the DEMONSTRATOR. The tracking implementation takes a novel approach based on the schema-free database technology CouchDB. Transportation, storage, and processes undergone by parts such as machining or cleaning are linked to part records. Tracking parts provides a great logistics benefit and an important quality assurance reference during construction. In addition, the location history of parts provides an estimate of their exposure to cosmic radiation. A web application for data entry and a radiation exposure calculator have been developed as tools for achieving the extreme radio-purity required for this rare decay search.
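
    A minimal sketch of the exposure bookkeeping described above, with invented locations, dose rates, and history: a part's cosmic-ray exposure is the integral of per-location dose rates over its location history.

        # Accumulate exposure from a part's location history (illustrative units).
        dose_rate = {"surface_lab": 1.0, "shallow_storage": 0.3, "underground": 0.01}
        history = [("surface_lab", 120.0), ("shallow_storage", 400.0), ("underground", 2000.0)]

        exposure = sum(dose_rate[loc] * days for loc, days in history)
        print(exposure)   # rate x days, summed over the history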

  12. Probabilistic machine learning and artificial intelligence.

    PubMed

    Ghahramani, Zoubin

    2015-05-28

    How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery. PMID:26017444

  13. Probabilistic machine learning and artificial intelligence

    NASA Astrophysics Data System (ADS)

    Ghahramani, Zoubin

    2015-05-01

    How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery.

  14. Probabilistic population projections with migration uncertainty

    PubMed Central

    Azose, Jonathan J.; Ševčíková, Hana; Raftery, Adrian E.

    2016-01-01

    We produce probabilistic projections of population for all countries based on probabilistic projections of fertility, mortality, and migration. We compare our projections to those from the United Nations’ Probabilistic Population Projections, which uses similar methods for fertility and mortality but deterministic migration projections. We find that uncertainty in migration projection is a substantial contributor to uncertainty in population projections for many countries. Prediction intervals for the populations of Northern America and Europe are over 70% wider, whereas prediction intervals for the populations of Africa, Asia, and the world as a whole are nearly unchanged. Out-of-sample validation shows that the model is reasonably well calibrated. PMID:27217571

  15. Probabilistic population projections with migration uncertainty.

    PubMed

    Azose, Jonathan J; Ševčíková, Hana; Raftery, Adrian E

    2016-06-01

    We produce probabilistic projections of population for all countries based on probabilistic projections of fertility, mortality, and migration. We compare our projections to those from the United Nations' Probabilistic Population Projections, which uses similar methods for fertility and mortality but deterministic migration projections. We find that uncertainty in migration projection is a substantial contributor to uncertainty in population projections for many countries. Prediction intervals for the populations of Northern America and Europe are over 70% wider, whereas prediction intervals for the populations of Africa, Asia, and the world as a whole are nearly unchanged. Out-of-sample validation shows that the model is reasonably well calibrated. PMID:27217571
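
    A minimal simulation sketch, with invented rates, of why stochastic migration widens intervals: vital rates are held fixed while net migration is drawn at random, and a 95% interval is read off the simulated trajectories.

        # Prediction interval for population under uncertain net migration.
        import random

        random.seed(0)
        trajectories = []
        for _ in range(5000):
            pop = 100.0                               # millions, illustrative
            for year in range(50):
                migration = random.gauss(0.2, 0.3)    # uncertain net migration
                pop = pop * 1.002 + migration         # fixed natural increase
            trajectories.append(pop)

        trajectories.sort()
        lo, hi = trajectories[int(0.025 * 5000)], trajectories[int(0.975 * 5000)]
        print(round(lo, 1), round(hi, 1))             # 95% prediction interval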

  16. Mouse Phenome Database

    PubMed Central

    Grubb, Stephen C.; Bult, Carol J.; Bogue, Molly A.

    2014-01-01

    The Mouse Phenome Database (MPD; phenome.jax.org) was launched in 2001 as the data coordination center for the international Mouse Phenome Project. MPD integrates quantitative phenotype, gene expression and genotype data into a common annotated framework to facilitate query and analysis. MPD contains >3500 phenotype measurements or traits relevant to human health, including cancer, aging, cardiovascular disorders, obesity, infectious disease susceptibility, blood disorders, neurosensory disorders, drug addiction and toxicity. Since our 2012 NAR report, we have added >70 new data sets, including data from Collaborative Cross lines and Diversity Outbred mice. During this time we have completely revamped our homepage, improved search and navigational aspects of the MPD application, developed several web-enabled data analysis and visualization tools, annotated phenotype data to public ontologies, developed an ontology browser and released new single nucleotide polymorphism query functionality with much higher density coverage than before. Here, we summarize recent data acquisitions and describe our latest improvements. PMID:24243846

  17. Mouse phenome database.

    PubMed

    Grubb, Stephen C; Bult, Carol J; Bogue, Molly A

    2014-01-01

    The Mouse Phenome Database (MPD; phenome.jax.org) was launched in 2001 as the data coordination center for the international Mouse Phenome Project. MPD integrates quantitative phenotype, gene expression and genotype data into a common annotated framework to facilitate query and analysis. MPD contains >3500 phenotype measurements or traits relevant to human health, including cancer, aging, cardiovascular disorders, obesity, infectious disease susceptibility, blood disorders, neurosensory disorders, drug addiction and toxicity. Since our 2012 NAR report, we have added >70 new data sets, including data from Collaborative Cross lines and Diversity Outbred mice. During this time we have completely revamped our homepage, improved search and navigational aspects of the MPD application, developed several web-enabled data analysis and visualization tools, annotated phenotype data to public ontologies, developed an ontology browser and released new single nucleotide polymorphism query functionality with much higher density coverage than before. Here, we summarize recent data acquisitions and describe our latest improvements. PMID:24243846

  18. Detection and measurement of fetal anatomies from ultrasound images using a constrained probabilistic boosting tree.

    PubMed

    Carneiro, Gustavo; Georgescu, Bogdan; Good, Sara; Comaniciu, Dorin

    2008-09-01

    We propose a novel method for the automatic detection and measurement of fetal anatomical structures in ultrasound images. This problem offers a myriad of challenges, including: difficulty of modeling the appearance variations of the visual object of interest, robustness to speckle noise and signal dropout, and large search space of the detection procedure. Previous solutions typically rely on the explicit encoding of prior knowledge and formulation of the problem as a perceptual grouping task solved through clustering or variational approaches. These methods are constrained by the validity of the underlying assumptions and usually are not enough to capture the complex appearances of fetal anatomies. We propose a novel system for fast automatic detection and measurement of fetal anatomies that directly exploits a large database of expert-annotated fetal anatomical structures in ultrasound images. Our method learns automatically to distinguish between the appearance of the object of interest and background by training a constrained probabilistic boosting tree classifier. This system is able to produce the automatic segmentation of several fetal anatomies using the same basic detection algorithm. We show results on fully automatic measurement of biparietal diameter (BPD), head circumference (HC), abdominal circumference (AC), femur length (FL), humerus length (HL), and crown rump length (CRL). Notice that our approach is the first in the literature to deal with the HL and CRL measurements. Extensive experiments (with clinical validation) show that our system is, on average, close to the accuracy of experts in terms of segmentation and obstetric measurements. Finally, this system runs in under half a second on a standard dual-core PC. PMID:18753047

  19. Blocks database and its applications.

    PubMed

    Henikoff, J G; Henikoff, S

    1996-01-01

    Protein blocks consist of multiply aligned sequence segments without gaps that represent the most highly conserved regions of protein families. A database of blocks has been constructed by successive application of the fully automated PROTOMAT system to lists of protein family members obtained from Prosite documentation. Currently, Blocks 8.0, based on protein families documented in Prosite 12, consists of 2884 blocks representing 770 families. Searches of the Blocks Database are carried out using protein or DNA sequence queries, and results are returned with measures of significance for both single and multiple block hits. The database has also proved useful for derivation of amino acid substitution matrices (the Blosum series) and other sets of parameters. WWW and E-mail servers provide access to the database and associated functions, including a block maker for sequences provided by the user. PMID:8743679

  20. Open Clients for Distributed Databases

    NASA Astrophysics Data System (ADS)

    Chayes, D. N.; Arko, R. A.

    2001-12-01

    We are actively developing a collection of open source example clients that demonstrate use of our "back end" data management infrastructure. The data management system is reported elsewhere at this meeting (Arko and Chayes: A Scaleable Database Infrastructure). In addition to their primary goal of being examples for others to build upon, some of these clients may have limited utility in themselves. More information about the clients and the data infrastructure is available online at http://data.ldeo.columbia.edu. The examples to be demonstrated include several web-based clients: those developed for the Community Review System of the Digital Library for Earth System Education, a real-time watch stander's log book, an offline interface to log book entries, a simple client to search multibeam metadata, and others. These are Internet-enabled, generally web-based front ends that support searches against one or more relational databases using industry-standard SQL queries. In addition to the web-based clients, simple SQL searches from within Excel and similar applications will be demonstrated. By defining, documenting and publishing a clear interface to the fully searchable databases, it becomes relatively easy to construct client interfaces that are optimized for specific applications, in comparison to building a monolithic data and user interface system.

  1. A simple probabilistic model of multibody interactions in proteins.

    PubMed

    Johansson, Kristoffer Enøe; Hamelryck, Thomas

    2013-08-01

    Protein structure prediction methods typically use statistical potentials, which rely on statistics derived from a database of known protein structures. In the vast majority of cases, these potentials involve pairwise distances or contacts between amino acids or atoms. Although some potentials beyond pairwise interactions have been described, the formulation of a general multibody potential is seen as intractable due to the perceived limited amount of data. In this article, we show that it is possible to formulate a probabilistic model of higher order interactions in proteins, without arbitrarily limiting the number of contacts. The success of this approach is based on replacing a naive table-based approach with a simple hierarchical model involving suitable probability distributions and conditional independence assumptions. The model captures the joint probability distribution of an amino acid and its neighbors, local structure and solvent exposure. We show that this model can be used to approximate the conditional probability distribution of an amino acid sequence given a structure using a pseudo-likelihood approach. We verify the model by decoy recognition and site-specific amino acid predictions. Our coarse-grained model is compared to state-of-art methods that use full atomic detail. This article illustrates how the use of simple probabilistic models can lead to new opportunities in the treatment of nonlocal interactions in knowledge-based protein structure prediction and design. PMID:23468247
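
    The pseudo-likelihood approximation mentioned above has the generic form (notation ours, not taken from the paper): the conditional probability of a sequence s given a structure x is replaced by a product of per-residue conditionals,

        P(s \mid x) \approx \prod_{i=1}^{n} P(s_i \mid s_{N(i)}, x)

    where N(i) denotes the structural neighbors of residue i.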

  2. Probabilistic Assessment of Radiation Risk for Astronauts in Space Missions

    NASA Technical Reports Server (NTRS)

    Kim, Myung-Hee; DeAngelis, Giovanni; Cucinotta, Francis A.

    2009-01-01

    Accurate predictions of the health risks to astronauts from space radiation exposure are necessary for enabling future lunar and Mars missions. Space radiation consists of solar particle events (SPEs), comprised largely of medium-energy protons (less than 100 MeV), and galactic cosmic rays (GCR), which include protons and heavy ions of higher energies. While the expected frequency of SPEs is strongly influenced by the solar activity cycle, SPE occurrences themselves are random in nature. A solar modulation model has been developed for the temporal characterization of the GCR environment, which is represented by the deceleration potential, φ. The risk of radiation exposure from SPEs during extra-vehicular activities (EVAs) or in lightly shielded vehicles is a major concern for radiation protection, including determining the shielding and operational requirements for astronauts and hardware. To support the probabilistic risk assessment for EVAs, which would be up to 15% of crew time on lunar missions, we estimated the probability of SPE occurrence as a function of time within a solar cycle using a nonhomogeneous Poisson model to fit the historical database of measurements of protons with energy > 30 MeV, Φ30. The resultant organ doses and dose equivalents, as well as effective whole body doses for acute and cancer risk estimations, are analyzed for a conceptual habitat module and a lunar rover during defined space mission periods. This probabilistic approach to radiation risk assessment from SPE and GCR is in support of mission design and operational planning to manage radiation risks for space exploration.
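
    The nonhomogeneous Poisson model underlying such a fit implies the standard relation (a textbook identity, not a formula quoted from the paper) for the probability of at least one SPE in a mission window [t1, t2]:

        P\{N(t_1, t_2) \geq 1\} = 1 - \exp\left(-\int_{t_1}^{t_2} \lambda(t)\,dt\right)

    where lambda(t) is the fitted occurrence intensity as a function of time within the solar cycle.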

  3. Accuracy of Probabilistic Linkage Using the Enhanced Matching System for Public Health and Epidemiological Studies

    PubMed Central

    Aldridge, Robert W.; Shaji, Kunju; Hayward, Andrew C.; Abubakar, Ibrahim

    2015-01-01

    Background The Enhanced Matching System (EMS) is a probabilistic record linkage program developed by the tuberculosis section at Public Health England to match data for individuals across two datasets. This paper outlines how EMS works and investigates its accuracy for linkage across public health datasets. Methods EMS is a configurable Microsoft SQL Server database program. To examine the accuracy of EMS, two public health databases were matched using National Health Service (NHS) numbers as a gold standard unique identifier. Probabilistic linkage was then performed on the same two datasets without inclusion of NHS number. Sensitivity analyses were carried out to examine the effect of varying matching process parameters. Results Exact matching using NHS number between two datasets (containing 5931 and 1759 records) identified 1071 matched pairs. EMS probabilistic linkage identified 1068 record pairs. The sensitivity of probabilistic linkage was calculated as 99.5% (95%CI: 98.9, 99.8), specificity 100.0% (95%CI: 99.9, 100.0), positive predictive value 99.8% (95%CI: 99.3, 100.0), and negative predictive value 99.9% (95%CI: 99.8, 100.0). Probabilistic matching was most accurate when including address variables and using the automatically generated threshold for determining links with manual review. Conclusion With the establishment of national electronic datasets across health and social care, EMS enables previously unanswerable research questions to be tackled with confidence in the accuracy of the linkage process. In scenarios where a small sample is being matched into a very large database (such as national records of hospital attendance), the positive predictive value or sensitivity may drop relative to the results presented in this analysis, according to the prevalence of matches between the databases. Despite this possible limitation, probabilistic linkage has great potential to be used where exact matching using a common identifier is not possible, including in
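
    EMS's internals aside, classical probabilistic linkage scores a candidate pair by summing log-odds match weights over fields. The sketch below is a generic Fellegi-Sunter-style weight, not EMS's implementation; the m-probabilities (field agrees given a true match) and u-probabilities (field agrees given a non-match) are invented and would normally be estimated from the data, e.g. by EM.

        import math

        # Assumed (m, u) probabilities per field; illustrative values only.
        FIELDS = {
            "surname":    (0.95, 0.01),
            "birth_date": (0.98, 0.003),
            "postcode":   (0.90, 0.05),
        }

        def match_weight(rec_a, rec_b):
            """Composite weight: log2(m/u) for agreeing fields,
            log2((1-m)/(1-u)) for disagreeing fields."""
            w = 0.0
            for field, (m, u) in FIELDS.items():
                if rec_a[field] == rec_b[field]:
                    w += math.log2(m / u)
                else:
                    w += math.log2((1 - m) / (1 - u))
            return w

        a = {"surname": "SMITH", "birth_date": "1980-01-02", "postcode": "NW1 2BU"}
        b = {"surname": "SMITH", "birth_date": "1980-01-02", "postcode": "NW1 2BX"}
        print(match_weight(a, b))  # pairs above a threshold become candidate links

    Thresholding these weights, with manual review of the borderline band, is what the automatically generated threshold in the study controls.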

  4. A Robust Tsunami Deposit Database For California

    NASA Astrophysics Data System (ADS)

    Wilson, R. I.; Hemphill-Haley, E.; Admire, A. R.

    2012-12-01

    The California Geological Survey (CGS) has partnered with Humboldt State University (HSU) to produce a robust statewide tsunami deposit database to facilitate the evaluation of tsunami hazard products for both emergency response and land-use planning and development. The California tsunami deposit database attributes complement and expand on existing tsunami deposit databases from the National Geophysical Data Center (NGDC) (Global), the USGS (Cascadia Subduction Zone), and the Oregon Department of Geology and Mineral Industries (DOGAMI) (adjacent state). Whereas the existing NGDC and USGS databases focus on references or individual tsunami layers, this new State-maintained database concentrates on the location and contents of individual cores/trenches that sample tsunami deposits, including laboratory tests to evaluate sample grain-size, geochemistry, microfossils, and age-dating results. The first generation of the database is completed and includes 94 cores from six studies in northern California, at the southern end of the Cascadia Subduction Zone. A second generation of the database will include recently collected tsunami deposit information for the rest of California. These data provide an important observational benchmark for evaluating the results of tsunami inundation modeling. CGS is collaborating with and sharing the database entry form with other states to encourage its continued development beyond California's coastline so that tsunami deposits can be more easily evaluated on a regional basis, a recommendation of the National Tsunami Hazard Mitigation Program. This database is being used to help CGS in the development and validation of updates to their existing inundation maps for emergency planning, and probabilistic tsunami hazard analyses (PTHA) of value to local land-use planning and coastal development.
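
    As a hedged sketch of what a core/trench-centric entry might capture (the actual CGS/HSU schema is not reproduced here), the following creates a minimal table keyed on individual samples rather than on references or event layers; every field name is illustrative.

        import sqlite3

        conn = sqlite3.connect(":memory:")
        conn.executescript("""
        CREATE TABLE cores (
            core_id        TEXT PRIMARY KEY,   -- individual core or trench
            latitude       REAL, longitude REAL,
            study_ref      TEXT,               -- source publication
            grain_size     TEXT,               -- lab grain-size summary
            geochemistry   TEXT,               -- lab geochemical results
            microfossils   TEXT,               -- diatom/foram observations
            age_c14_yr_bp  REAL                -- radiocarbon age, if dated
        );
        """)
        conn.execute("INSERT INTO cores VALUES ('HUM-01', 41.74, -124.18, "
                     "'Study A', 'fine sand', 'marine signature', 'marine diatoms', 280)")
        print(conn.execute("SELECT core_id FROM cores "
                           "WHERE age_c14_yr_bp < 500").fetchall())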

  5. Development and use of a train-level probabilistic risk assessment

    SciTech Connect

    Smith, C.L.; Fowler, R.D.; Wolfram, L.M.

    1993-04-01

    The Idaho National Engineering Laboratory examined the potential for the development of train-level probabilistic risk assessment (PRA) databases. These train-level databases will allow the Nuclear Regulatory Commission to investigate effects on plant core damage frequency (CDF) given a train is failed or taken out of service. The intent of this task was to develop user-friendly databases that required a minimal amount of personnel involvement to be usable. It was originally intended that the train-level models would not be expanded to include basic events below the top gate of a train, with the possible exception of including some of the major train-related components (e.g., important pumps and motor-operated valves). It was found that a database similar to the original plant PRA provided the accuracy needed to measure the changes in plant CDF. The Peach Bottom Unit 2 NUREG-1150 PRA (a large fault tree model) and the Beaver Valley Unit 2 IPE (a large event tree model) were selected to demonstrate the feasibility of developing train-level databases. Five different methods for developing train-level databases were hypothesized and are examined. Ultimately, two train-level databases were developed using the Peach Bottom Unit 2 PRA and one train-level database was developed using the Beaver Valley Unit 2 IPE. The development, use, limitations, and results of these train-level databases are discussed.
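
    The quantity these databases expose is essentially a conditional core damage frequency: re-evaluate the cut-set equation with a train's failure probability set to 1. The three-cut-set model below, including its event names and probabilities, is a toy stand-in for a real plant PRA.

        # Rare-event approximation: CDF ~ sum over minimal cut sets of the
        # product of their basic-event probabilities. Toy model, invented numbers.
        CUT_SETS = [("train_A", "train_B"), ("train_A", "dg_1"), ("lops", "dg_1")]
        P = {"train_A": 1e-3, "train_B": 1e-3, "dg_1": 5e-3, "lops": 1e-4}

        def cdf(prob):
            total = 0.0
            for cut_set in CUT_SETS:
                term = 1.0
                for event in cut_set:
                    term *= prob[event]
                total += term
            return total

        base = cdf(P)
        conditional = cdf({**P, "train_A": 1.0})  # train A failed / out of service
        print(base, conditional, conditional / base)  # ratio = risk achievement worth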

  6. A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more.

    PubMed

    Rivas, Elena; Lang, Raymond; Eddy, Sean R

    2012-02-01

    The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been reported on generative probabilistic models (stochastic context-free grammars [SCFGs]) of comparable complexity, although probabilistic models are generally easier to train and to use. To explore a range of probabilistic models of increasing complexity, and to directly compare probabilistic, thermodynamic, and discriminative approaches, we created TORNADO, a computational tool that can parse a wide spectrum of RNA grammar architectures (including the standard nearest-neighbor model and more) using a generalized super-grammar that can be parameterized with probabilities, energies, or arbitrary scores. By using TORNADO, we find that probabilistic nearest-neighbor models perform comparably to (but not significantly better than) discriminative methods. We find that complex statistical models are prone to overfitting RNA structure and that evaluations should use structurally nonhomologous training and test data sets. Overfitting has affected at least one published method (ContextFold). The most important barrier to improving statistical approaches for RNA secondary structure prediction is the lack of diversity of well-curated single-sequence RNA secondary structures in current RNA databases. PMID:22194308
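
    For a feel of what a generative probabilistic grammar over secondary structure looks like, the toy sampler below draws dot-bracket strings from a three-rule SCFG; the grammar and its rule probabilities are invented and far simpler than TORNADO's super-grammar.

        import random

        # Toy SCFG over dot-bracket strings: S -> '(' S ')' S | '.' S | empty,
        # with invented rule probabilities (they must sum to 1).
        RULES = [(0.3, "pair"), (0.5, "unpaired"), (0.2, "end")]

        def sample_structure(max_depth=12):
            """Recursively expand the start symbol S into a dot-bracket string."""
            if max_depth == 0:
                return ""
            r, acc = random.random(), 0.0
            for p, rule in RULES:
                acc += p
                if r < acc:
                    if rule == "pair":
                        return ("(" + sample_structure(max_depth - 1) + ")"
                                + sample_structure(max_depth - 1))
                    if rule == "unpaired":
                        return "." + sample_structure(max_depth - 1)
                    return ""  # 'end' rule
            return ""

        random.seed(7)
        print(sample_structure())  # one sampled, well-nested structure

    Training replaces these fixed probabilities with values estimated from a database of structural RNAs, which is exactly where the overfitting risk discussed above enters.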

  7. Curcumin Resource Database.

    PubMed

    Kumar, Anil; Chetia, Hasnahana; Sharma, Swagata; Kabiraj, Debajyoti; Talukdar, Narayan Chandra; Bora, Utpal

    2015-01-01

    Curcumin is one of the most intensively studied diarylheptanoids, Curcuma longa being its principal producer. Apart from this, a class of promising curcumin analogs has been generated in laboratories, aptly named curcuminoids, which are showing huge potential in the fields of medicine, food technology, etc. The lack of a universal source of data on curcumin as well as curcuminoids has long been felt by the curcumin research community. Hence, in an attempt to address this stumbling block, we have developed the Curcumin Resource Database (CRDB), which aims to serve as a gateway-cum-repository for all relevant data and related information on curcumin and its analogs. Currently, this database encompasses 1186 curcumin analogs, 195 molecular targets, 9075 peer-reviewed publications, 489 patents and 176 varieties of C. longa, obtained by extensive data mining and careful curation from numerous sources. Each data entry is identified by a unique CRDB ID (identifier). Furnished with a user-friendly web interface and an in-built search engine, CRDB provides well-curated and cross-referenced information hyperlinked to external sources. CRDB is expected to be highly useful to researchers working on structure- as well as ligand-based molecular design of curcumin analogs. PMID:26220923

  8. Multidimensional analysis and probabilistic model of volcanic and seismic activities

    NASA Astrophysics Data System (ADS)

    Fedorov, V.

    2009-04-01

    .I. Gushchenko, 1979) and seismological (database of USGS/NEIC Significant Worldwide Earthquakes, 2150 B.C.- 1994 A.D.) information which displays dynamics of endogenic relief-forming processes over a period of 1900 to 1994. In the course of the analysis, a substitution of the calendar variable by a corresponding astronomical one has been performed and the epoch superposition method was applied. In essence, the method consists in that the massifs of information on volcanic eruptions (over a period of 1900 to 1977) and seismic events (1900-1994) are differentiated with respect to the values of astronomical parameters which correspond to the calendar dates of the known eruptions and earthquakes, regardless of the calendar year. The obtained spectra of volcanic eruption and violent earthquake distribution in the fields of the Earth orbital movement parameters were used as a basis for calculation of frequency spectra and diurnal probability of volcanic and seismic activity. The objective of the proposed investigations is a probabilistic model development of the volcanic and seismic events, as well as GIS designing for monitoring and forecast of volcanic and seismic activities. In accordance with the stated objective, three probability parameters have been found in the course of preliminary studies; they form the basis for GIS-monitoring and forecast development. 1. A multidimensional analysis of volcanic eruptions and earthquakes (of magnitude 7) has been performed in terms of the Earth orbital movement. Probability characteristics of volcanism and seismicity have been defined for the Earth as a whole. Time intervals have been identified with a diurnal probability twice as great as the mean value. Diurnal probability of volcanic and seismic events has been calculated up to 2020. 2. A regularity in the duration of dormant (repose) periods has been established. A relationship has been found between the distribution of the repose period probability density and the duration of the period. 3
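
    A minimal sketch of the epoch superposition step, under simplifying assumptions: here the astronomical parameter is approximated by day-of-year, and the event dates are randomly generated stand-ins for the eruption/earthquake catalogs, so the numbers illustrate the folding-and-counting procedure only.

        import numpy as np

        # Fold event dates onto a single orbital cycle and estimate the
        # diurnal event probability from the superposed counts.
        rng = np.random.default_rng(0)
        event_doy = rng.integers(1, 366, size=500)   # stand-in event dates

        counts = np.bincount(event_doy, minlength=366)[1:]  # events per day
        years_spanned = 78                                  # e.g. a 1900-1977 window
        p_daily = counts / years_spanned                    # diurnal probability

        print("days with p > 2x mean:",
              np.flatnonzero(p_daily > 2 * p_daily.mean()) + 1)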

  9. Probabilistic model better defines development well risks

    SciTech Connect

    Connolly, M.R.

    1996-10-14

    Probabilistic techniques to compare and rank projects, such as the drilling of development wells, often are more representative than decision tree or deterministic approaches. As opposed to traditional deterministic methods, probabilistic analysis gives decision-makers ranges of outcomes with associated probabilities of occurrence. This article analyzes the drilling of a hypothetical development well with actual field data (such as stabilized initial rates, production declines, and gas/oil ratios) to calculate probabilistic reserves and production flow streams. Analog operating data were included to build distributions for capital and operating costs. Economics from the Monte Carlo simulation include probabilistic production flow streams and cost distributions. Results include single-parameter distributions (reserves, net present value, and profitability index) and time-function distributions (annual production and net cash flow).
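
    The core of such an analysis is a short Monte Carlo loop: draw reserves, price, and costs from distributions, compute an economic outcome per trial, and report percentiles. The distribution choices and every number below are illustrative assumptions, not field data, and the one-period cash flow is a deliberate simplification of a full flow-stream model.

        import numpy as np

        rng = np.random.default_rng(42)
        N = 10_000  # Monte Carlo trials

        reserves = rng.lognormal(mean=np.log(500_000), sigma=0.4, size=N)  # bbl
        price    = rng.normal(60, 8, size=N)                               # $/bbl
        capex    = rng.normal(2.5e6, 0.3e6, size=N)                        # $
        opex_bbl = rng.normal(12, 2, size=N)                               # $/bbl

        npv = reserves * (price - opex_bbl) - capex  # one-period toy cash flow

        p10, p50, p90 = np.percentile(npv, [10, 50, 90])
        print(f"NPV P10={p10:,.0f}  P50={p50:,.0f}  P90={p90:,.0f}")
        print("P(NPV < 0) =", np.mean(npv < 0))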

  10. Non-unitary probabilistic quantum computing

    NASA Technical Reports Server (NTRS)

    Gingrich, Robert M.; Williams, Colin P.

    2004-01-01

    We present a method for designing quantum circuits that perform non-unitary quantum computations on n-qubit states probabilistically, and give analytic expressions for the success probability and fidelity.
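
    One standard way to realize a non-unitary operator M probabilistically is to embed it in a larger unitary (a Halmos dilation) acting on the state plus an ancilla, then postselect on the ancilla outcome; the success probability is ||M|psi>||^2. The numerical sketch below verifies this for an invented single-qubit M; it illustrates the general principle, not the paper's specific circuit constructions.

        import numpy as np
        from scipy.linalg import sqrtm

        def dilate(M):
            """Halmos dilation of a contraction M (singular values <= 1):
            U = [[M, sqrt(I-MM*)], [sqrt(I-M*M), -M*]] is unitary."""
            n = M.shape[0]
            I = np.eye(n)
            return np.block([[M, sqrtm(I - M @ M.conj().T)],
                             [sqrtm(I - M.conj().T @ M), -M.conj().T]])

        M = np.array([[0.6, 0.2], [0.0, 0.4]])        # invented non-unitary operator
        U = dilate(M)
        assert np.allclose(U @ U.conj().T, np.eye(4))  # U really is unitary

        psi = np.array([1, 1]) / np.sqrt(2)
        out = U @ np.concatenate([psi, np.zeros(2)])   # ancilla starts in |0>
        branch = out[:2]                               # postselect ancilla = |0>
        p_success = np.linalg.norm(branch) ** 2
        print(p_success, branch / np.linalg.norm(branch))  # prob., normalized M|psi>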

  11. Do probabilistic forecasts lead to better decisions?

    NASA Astrophysics Data System (ADS)

    Ramos, M. H.; van Andel, S. J.; Pappenberger, F.

    2012-12-01

    The last decade has seen growing research in producing probabilistic hydro-meteorological forecasts and increasing their reliability. This followed the promise that, supplied with information about uncertainty, people would take better risk-based decisions. In recent years, therefore, research and operational developments have also started paying attention to ways of communicating the probabilistic forecasts to decision makers. Communicating probabilistic forecasts includes preparing tools and products for visualization, but also requires understanding how decision makers perceive and use uncertainty information in real time. At the EGU General Assembly 2012, we conducted a laboratory-style experiment in which several cases of flood forecasts and a choice of actions to take were presented as part of a game to participants, who acted as decision makers. Answers were collected and analyzed. In this paper, we present the results of this exercise and discuss if indeed we make better decisions on the basis of probabilistic forecasts.

  12. Do probabilistic forecasts lead to better decisions?

    NASA Astrophysics Data System (ADS)

    Ramos, M. H.; van Andel, S. J.; Pappenberger, F.

    2013-06-01

    The last decade has seen growing research in producing probabilistic hydro-meteorological forecasts and increasing their reliability. This followed the promise that, supplied with information about uncertainty, people would take better risk-based decisions. In recent years, therefore, research and operational developments have also started focusing attention on ways of communicating the probabilistic forecasts to decision-makers. Communicating probabilistic forecasts includes preparing tools and products for visualisation, but also requires understanding how decision-makers perceive and use uncertainty information in real time. At the EGU General Assembly 2012, we conducted a laboratory-style experiment in which several cases of flood forecasts and a choice of actions to take were presented as part of a game to participants, who acted as decision-makers. Answers were collected and analysed. In this paper, we present the results of this exercise and discuss if we indeed make better decisions on the basis of probabilistic forecasts.

  13. COMMUNICATING PROBABILISTIC RISK OUTCOMES TO RISK MANAGERS

    EPA Science Inventory

    Increasingly, risk assessors are moving away from simple deterministic assessments to probabilistic approaches that explicitly incorporate ecological variability, measurement imprecision, and lack of knowledge (collectively termed "uncertainty"). While the new methods provide an...

  14. Probabilistic micromechanics for high-temperature composites

    NASA Technical Reports Server (NTRS)

    Reddy, J. N.

    1993-01-01

    The three-year program of research had the following technical objectives: the development of probabilistic methods for micromechanics-based constitutive and failure models, application of the probabilistic methodology in the evaluation of various composite materials and simulation of expected uncertainties in unidirectional fiber composite properties, and influence of the uncertainties in composite properties on the structural response. The first year of research was devoted to the development of probabilistic methodology for micromechanics models. The second year of research focused on the evaluation of the Chamis-Hopkins constitutive model and Aboudi constitutive model using the methodology developed in the first year of research. The third year of research was devoted to the development of probabilistic finite element analysis procedures for laminated composite plate and shell structures.
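
    A minimal sketch of the "simulation of expected uncertainties in composite properties" step, under stated assumptions: uncertainty in fiber and matrix properties is propagated through the rule of mixtures for the longitudinal modulus, E1 = Vf*Ef + (1-Vf)*Em, which is far simpler than the Chamis-Hopkins or Aboudi models evaluated in the program; all distributions are illustrative.

        import numpy as np

        rng = np.random.default_rng(1)
        N = 20_000
        Vf = rng.normal(0.60, 0.02, N)    # fiber volume fraction
        Ef = rng.normal(380.0, 15.0, N)   # fiber modulus, GPa
        Em = rng.normal(70.0, 5.0, N)     # matrix modulus, GPa

        E1 = Vf * Ef + (1 - Vf) * Em      # rule-of-mixtures longitudinal modulus
        print(f"E1 mean={E1.mean():.1f} GPa, std={E1.std():.1f} GPa")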

  15. A Probabilistic Formulation for Hausdorff Matching

    NASA Technical Reports Server (NTRS)

    Olson, Clark F.

    1998-01-01

    Matching images based on a Hausdorff measure has become popular for computer vision applications. In this paper, we develop a probabilistic formulation for Hausdorff matching in terms of maximum likelihood estimation.
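
    To ground the idea, the sketch below computes the classical directed Hausdorff measure and a likelihood-style alternative that scores each model point's nearest-image-point distance under an assumed Gaussian-plus-outlier density; the density, its parameters, and the point sets are illustrative, not Olson's exact formulation.

        import numpy as np
        from scipy.spatial import cKDTree

        model = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
        image = np.array([[0.1, 0.0], [1.0, 0.1], [0.9, 1.1], [5.0, 5.0]])

        tree = cKDTree(image)
        d, _ = tree.query(model)   # nearest image point for each model point

        hausdorff = d.max()        # classical directed Hausdorff measure

        # Likelihood formulation: each distance scored under an assumed mixture
        # of a Gaussian inlier density and a uniform outlier floor.
        sigma, floor = 0.2, 1e-3
        log_lik = np.sum(np.log(np.exp(-d**2 / (2 * sigma**2))
                                / (sigma * np.sqrt(2 * np.pi)) + floor))

        print(hausdorff, log_lik)  # maximize log_lik over candidate poses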

  16. De novo protein conformational sampling using a probabilistic graphical model

    NASA Astrophysics Data System (ADS)

    Bhattacharya, Debswapna; Cheng, Jianlin

    2015-11-01

    Efficient exploration of protein conformational space remains challenging especially for large proteins when assembling discretized structural fragments extracted from a protein structure database. We propose a fragment-free probabilistic graphical model, FUSION, for conformational sampling in continuous space and assess its accuracy using ‘blind’ protein targets with a length up to 250 residues from the CASP11 structure prediction exercise. The method reduces sampling bottlenecks, exhibits strong convergence, and demonstrates better performance than the popular fragment assembly method, ROSETTA, on relatively larger proteins with a length of more than 150 residues in our benchmark set. FUSION is freely available through a web server at http://protein.rnet.missouri.edu/FUSION/.

  17. De novo protein conformational sampling using a probabilistic graphical model

    PubMed Central

    Bhattacharya, Debswapna; Cheng, Jianlin

    2015-01-01

    Efficient exploration of protein conformational space remains challenging especially for large proteins when assembling discretized structural fragments extracted from a protein structure database. We propose a fragment-free probabilistic graphical model, FUSION, for conformational sampling in continuous space and assess its accuracy using ‘blind’ protein targets with a length up to 250 residues from the CASP11 structure prediction exercise. The method reduces sampling bottlenecks, exhibits strong convergence, and demonstrates better performance than the popular fragment assembly method, ROSETTA, on relatively larger proteins with a length of more than 150 residues in our benchmark set. FUSION is freely available through a web server at http://protein.rnet.missouri.edu/FUSION/. PMID:26541939

  18. Emulation for probabilistic weather forecasting

    NASA Astrophysics Data System (ADS)

    Cornford, Dan; Barillec, Remi

    2010-05-01

    Numerical weather prediction models are typically very expensive to run due to their complexity and resolution. Characterising the sensitivity of the model to its initial condition and/or to its parameters requires numerous runs of the model, which is impractical for all but the simplest models. To produce probabilistic forecasts requires knowledge of the distribution of the model outputs, given the distribution over the inputs, where the inputs include the initial conditions, boundary conditions and model parameters. Such uncertainty analysis for complex weather prediction models seems a long way off, given current computing power, with ensembles providing only a partial answer. One possible way forward that we develop in this work is the use of statistical emulators. Emulators provide an efficient statistical approximation to the model (or simulator) while quantifying the uncertainty introduced. In the emulator framework, a Gaussian process is fitted to the simulator response as a function of the simulator inputs using some training data. The emulator is essentially an interpolator of the simulator output and the response in unobserved areas is dictated by the choice of covariance structure and parameters in the Gaussian process. Suitable parameters are inferred from the data in a maximum likelihood, or Bayesian framework. Once trained, the emulator allows operations such as sensitivity analysis or uncertainty analysis to be performed at a much lower computational cost. The efficiency of emulators can be further improved by exploiting the redundancy in the simulator output through appropriate dimension reduction techniques. We demonstrate this using both Principal Component Analysis on the model output and a new reduced-rank emulator in which an optimal linear projection operator is estimated jointly with other parameters, in the context of simple low order models, such as the Lorenz 40D system. We present the application of emulators to probabilistic weather
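
    In its simplest form, emulation fits a Gaussian process to a handful of simulator runs and then predicts, with quantified uncertainty, everywhere else. The sketch below emulates a cheap stand-in "simulator"; the kernel choice and the toy function are assumptions for illustration, and for high-dimensional output a dimension reduction step (such as the PCA mentioned above) would precede the fit.

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF, ConstantKernel

        def simulator(x):
            """Cheap stand-in for an expensive model run."""
            return np.sin(3 * x) + 0.5 * x

        X_train = np.linspace(0, 2, 8).reshape(-1, 1)   # 8 'expensive' runs
        y_train = simulator(X_train).ravel()

        gp = GaussianProcessRegressor(
            kernel=ConstantKernel() * RBF(length_scale=0.5),
            normalize_y=True)
        gp.fit(X_train, y_train)   # hyperparameters fit by maximum likelihood

        X_new = np.linspace(0, 2, 50).reshape(-1, 1)
        mean, std = gp.predict(X_new, return_std=True)  # mean + uncertainty
        print(mean[:3], std[:3])   # cheap surrogate for sensitivity/UA studies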

  19. End User Information Searching on the Internet: How Do Users Search and What Do They Search For? (SIG USE)

    ERIC Educational Resources Information Center

    Saracevic, Tefko

    2000-01-01

    Summarizes a presentation that discussed findings and implications of research projects using an Internet search service and Internet-accessible vendor databases, representing the two sides of public database searching: query formulation and resource utilization. Presenters included: Tefko Saracevic, Amanda Spink, Dietmar Wolfram and Hong Xie.…

  20. NASA Taxonomies for Searching Problem Reports and FMEAs

    NASA Technical Reports Server (NTRS)

    Malin, Jane T.; Throop, David R.

    2006-01-01

    Many types of hazard and risk analyses are used during the life cycle of complex systems, including Failure Modes and Effects Analysis (FMEA), Hazard Analysis, Fault Tree and Event Tree Analysis, Probabilistic Risk Assessment, Reliability Analysis and analysis of Problem Reporting and Corrective Action (PRACA) databases. The success of these methods depends on the availability of input data and the analysts' knowledge. Standard nomenclature can increase the reusability of hazard, risk and problem data. When nomenclature in the source texts is not standard, taxonomies with mapping words (sets of rough synonyms) can be combined with semantic search to identify items and tag them with metadata based on a rich standard nomenclature. Semantic search uses word meanings in the context of parsed phrases to find matches. The NASA taxonomies provide the word meanings. Spacecraft taxonomies and ontologies (generalization hierarchies with attributes and relationships, based on the terms' meanings) are being developed for types of subsystems, functions, entities, hazards and failures. The ontologies are broad and general, covering hardware, software and human systems. Semantic search of Space Station texts was used to validate and extend the taxonomies. The taxonomies have also been used to extract system connectivity (interaction) models and functions from requirements text. Now the Reconciler semantic search tool and the taxonomies are being applied to improve search in the Space Shuttle PRACA database, to discover recurring patterns of failure. Usual methods of string search and keyword search fall short because the entries are terse and have numerous shortcuts (irregular abbreviations, nonstandard acronyms, cryptic codes) and modifier words cannot be used in sentence context to refine the search. The limited and fixed FMEA categories associated with the entries do not make the fine distinctions needed in the search. The approach assigns PRACA report titles to problem classes in
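
    A minimal version of the mapping-words idea: expand each taxonomy class into a set of rough synonyms and shortcut forms, then tag terse titles whose tokens hit a set. The classes, synonym lists, and sample title below are invented for illustration and much simpler than the Reconciler tool's parsed-phrase semantic search.

        import re

        # Invented taxonomy fragment: class -> mapping words (rough synonyms,
        # irregular abbreviations, nonstandard acronyms seen in terse entries).
        TAXONOMY = {
            "valve_failure": {"valve", "vlv", "mov", "solenoid"},
            "leakage":       {"leak", "lkg", "seep", "weep"},
            "short_circuit": {"short", "shrt", "arc", "arcing"},
        }

        def classify(title):
            """Return taxonomy classes whose mapping words appear in the title."""
            tokens = set(re.findall(r"[a-z]+", title.lower()))
            return [cls for cls, words in TAXONOMY.items() if tokens & words]

        print(classify("FWD RCS VLV LKG DURING PRELAUNCH"))
        # -> ['valve_failure', 'leakage']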