Science.gov

Sample records for probabilistic database search

  1. MSblender: A probabilistic approach for integrating peptide identifications from multiple database search engines.

    PubMed

    Kwon, Taejoon; Choi, Hyungwon; Vogel, Christine; Nesvizhskii, Alexey I; Marcotte, Edward M

    2011-07-01

    Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers from limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high-scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for every possible PSM and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. Increased identifications increment spectral counts for most proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that this enhanced quantification contributes to improved sensitivity in differential expression analyses.
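
    A minimal sketch of the decoy-counting idea behind the false discovery rate estimates mentioned above; the concatenated target-decoy setup and all function names are illustrative assumptions, not MSblender's actual implementation.

        # Sketch: FDR from a concatenated target-decoy search (assumed setup).
        def estimate_fdr(psms, threshold):
            """psms: list of (score, is_decoy) pairs; higher score is better."""
            accepted = [(s, d) for s, d in psms if s >= threshold]
            n_decoy = sum(1 for _, d in accepted if d)
            n_target = len(accepted) - n_decoy
            # Decoy hits estimate how many accepted target hits are false.
            return n_decoy / max(n_target, 1)

        def threshold_at_fdr(psms, fdr=0.01):
            """Lowest score cutoff whose estimated FDR stays at or below fdr."""
            for s in sorted({s for s, _ in psms}):
                if estimate_fdr(psms, s) <= fdr:
                    return s
            return None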

  2. XLSearch: a Probabilistic Database Search Algorithm for Identifying Cross-Linked Peptides.

    PubMed

    Ji, Chao; Li, Sujun; Reilly, James P; Radivojac, Predrag; Tang, Haixu

    2016-06-01

    Chemical cross-linking combined with mass spectrometric analysis has become an important technique for probing protein three-dimensional structure and protein-protein interactions. A key step in this process is the accurate identification and validation of cross-linked peptides from tandem mass spectra. The identification of cross-linked peptides, however, presents challenges related to the expanded nature of the search space (all pairs of peptides in a sequence database) and the fact that some peptide-spectrum matches (PSMs) contain one correct and one incorrect peptide but often receive scores that are comparable to those in which both peptides are correctly identified. To address these problems and improve detection of cross-linked peptides, we propose a new database search algorithm, XLSearch, for identifying cross-linked peptides. Our approach is based on a data-driven scoring scheme that independently estimates the probability of correctly identifying each individual peptide in the cross-link given knowledge of the correct or incorrect identification of the other peptide. These conditional probabilities are subsequently used to estimate the joint posterior probability that both peptides are correctly identified. Using the data from two previous cross-link studies, we show the effectiveness of this scoring scheme, particularly in distinguishing between true identifications and those containing one incorrect peptide. We also provide evidence that XLSearch achieves more identifications than two alternative methods at the same false discovery rate (availability: https://github.com/COL-IU/XLSearch ). PMID:27068484
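
    The joint-posterior step rests on a one-line probability identity; the sketch below is plain probability algebra with invented numbers, not XLSearch's scoring code.

        # P(both peptides correct) = P(A | B correct) * P(B correct).
        def joint_posterior(p_a_given_b, p_b):
            return p_a_given_b * p_b

        # A confident cross-link pair vs. one whose second peptide is doubtful:
        print(joint_posterior(0.98, 0.96))  # ~0.94: both likely correct
        print(joint_posterior(0.98, 0.30))  # ~0.29: probably one wrong peptide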

  3. Online Database Searching Workbook.

    ERIC Educational Resources Information Center

    Littlejohn, Alice C.; Parker, Joan M.

    Designed primarily for use by first-time searchers, this workbook provides an overview of online searching. Following a brief introduction which defines online searching, databases, and database producers, five steps in carrying out a successful search are described: (1) identifying the main concepts of the search statement; (2) selecting a…

  4. Database Searching by Managers.

    ERIC Educational Resources Information Center

    Arnold, Stephen E.

    Managers and executives need the easy and quick access to business and management information that online databases can provide, but many have difficulty articulating their search needs to an intermediary. One possible solution would be to encourage managers and their immediate support staff members to search textual databases directly as they now…

  5. Begin: Online Database Searching Now!

    ERIC Educational Resources Information Center

    Lodish, Erica K.

    1986-01-01

    Because of the increasing importance of online databases, school library media specialists are encouraged to introduce students to online searching. Four books that would help media specialists gain a basic background are reviewed and it is noted that although they are very technical, they can be adapted to individual needs. (EM)

  6. Searching NCBI databases using Entrez.

    PubMed

    Gibney, Gretchen; Baxevanis, Andreas D

    2011-06-01

    One of the most widely used interfaces for the retrieval of information from biological databases is the NCBI Entrez system. Entrez capitalizes on the fact that there are pre-existing, logical relationships between the individual entries found in numerous public databases. The existence of such natural connections, mostly biological in nature, argued for the development of a method through which all the information about a particular biological entity could be found without having to sequentially visit and query disparate databases. Two basic protocols describe simple, text-based searches, illustrating the types of information that can be retrieved through the Entrez system. An alternate protocol builds upon the first basic protocol, using additional, built-in features of the Entrez system, and providing alternative ways to issue the initial query. The support protocol reviews how to save frequently issued queries. Finally, Cn3D, a structure visualization tool, is also discussed.

  7. Searching NCBI databases using Entrez.

    PubMed

    Baxevanis, Andreas D

    2008-12-01

    One of the most widely used interfaces for the retrieval of information from biological databases is the NCBI Entrez system. Entrez capitalizes on the fact that there are pre-existing, logical relationships between the individual entries found in numerous public databases. The existence of such natural connections, mostly biological in nature, argued for the development of a method through which all the information about a particular biological entity could be found without having to sequentially visit and query disparate databases. Two Basic Protocols describe simple, text-based searches, illustrating the types of information that can be retrieved through the Entrez system. An Alternate Protocol builds upon the first Basic Protocol, using additional, built-in features of the Entrez system, and providing alternative ways to issue the initial query. The Support Protocol reviews how to save frequently issued queries. Finally, Cn3D, a structure visualization tool, is also discussed.
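
    The protocols above describe the web interface; the same text-based query can also be issued programmatically through the Entrez E-utilities, for example via Biopython (a sketch; the e-mail address and query term are placeholders).

        # Requires `pip install biopython`; NCBI asks for a contact address.
        from Bio import Entrez

        Entrez.email = "you@example.org"
        handle = Entrez.esearch(db="pubmed", term="probabilistic database search")
        record = Entrez.read(handle)
        handle.close()
        print(record["Count"], record["IdList"][:5])  # hit count, first five PMIDs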

  8. Online Search Patterns: NLM CATLINE Database.

    ERIC Educational Resources Information Center

    Tolle, John E.; Hah, Sehchang

    1985-01-01

    Presents analysis of online search patterns within user searching sessions of National Library of Medicine ELHILL system and examines user search patterns on the CATLINE database. Data previously analyzed on MEDLINE database for same period is used to compare the performance parameters of different databases within the same information system.…

  9. A probabilistic approach to information retrieval in heterogeneous databases

    SciTech Connect

    Chatterjee, A.; Segev, A.

    1991-08-01

    During the past decade, organizations have increased their scope and operations beyond their traditional geographic boundaries. At the same time, they have adopted heterogeneous and incompatible information systems, independent of each other, without careful consideration that one day they might need to be integrated. As a result of this diversity, many important business applications today require access to data stored in multiple autonomous databases. This paper examines the problem of inter-database information retrieval in a heterogeneous environment, where conventional techniques are no longer efficient. To solve the problem, broader definitions for the join, union, intersection and selection operators are proposed. Also, a probabilistic method to specify the selectivity of these operators is discussed. An algorithm to compute these probabilities is provided in pseudocode.

  10. Library Instruction and Online Database Searching.

    ERIC Educational Resources Information Center

    Mercado, Heidi

    1999-01-01

    Reviews changes in online database searching in academic libraries. Topics include librarians conducting all searches; the advent of end-user searching and the need for user instruction; compact disk technology; online public catalogs; the Internet; full text databases; electronic information literacy; user education and the remote library user;…

  11. Quantum search of a real unstructured database

    NASA Astrophysics Data System (ADS)

    Broda, Bogusław

    2016-02-01

    A simple circuit implementation of the oracle for Grover's quantum search of a real unstructured classical database is proposed. The oracle contains a kind of quantumly accessible classical memory, which stores the database.
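
    The abstract does not reproduce the circuit, but the amplitude dynamics of Grover's search are easy to simulate classically; below is a toy statevector sketch for N = 8 items (illustrative only, not the proposed oracle construction).

        import numpy as np

        N, marked = 8, 5
        state = np.full(N, 1 / np.sqrt(N))          # uniform superposition
        iters = int(round(np.pi / 4 * np.sqrt(N)))  # near-optimal iteration count
        for _ in range(iters):
            state[marked] *= -1                     # oracle: flip the marked sign
            state = 2 * state.mean() - state        # diffusion: invert about mean
        print(np.argmax(state**2), state[marked]**2)  # -> 5, ~0.945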

  12. Online Database Searching in Smaller Public Libraries.

    ERIC Educational Resources Information Center

    Roose, Tina

    1983-01-01

    Online database searching experiences of nine Illinois public libraries--Arlington Heights, Deerfield, Elk Grove Village, Evanston, Glenview, Northbrook, Schaumburg Township, Waukegan, Wilmette--are discussed, noting search costs, user charges, popular databases, library acquisition, interaction with users, and staff training. Three sources are…

  13. Ranking search for probabilistic fingerprinting codes

    NASA Astrophysics Data System (ADS)

    Schäfer, Marcel; Berchtold, Waldemar; Steinebach, Martin

    2012-03-01

    Digital transaction watermarking is today a widely accepted mechanism to discourage the illegal distribution of multimedia. The transaction watermark is a user-specific message that is embedded in all copies of one content, making each copy individual and thereby allowing copyright infringements to be traced back. One major threat to transaction watermarking is collusion attacks, in which multiple individualized copies of the work are compared and/or combined to attack the integrity or availability of the embedded watermark message. One solution to counter such attacks is mathematical codes called collusion-secure fingerprinting codes. Problems arise when applying such codes to multimedia files with small payload, e.g. short audio tracks or images: the code length has to be shortened, which increases the error rates and/or the effort of the tracing algorithm. In this work we propose an approach that can be used as an addition to probabilistic fingerprinting codes to reduce this effort and increase security, as well as a new, separate method that provides shorter codes with a very fast and highly accurate tracing algorithm.

  14. Searching and Indexing Genomic Databases via Kernelization

    PubMed Central

    Gagie, Travis; Puglisi, Simon J.

    2015-01-01

    The rapid advance of DNA sequencing technologies has yielded databases of thousands of genomes. To search and index these databases effectively, it is important that we take advantage of the similarity between those genomes. Several authors have recently suggested searching or indexing only one reference genome and the parts of the other genomes where they differ. In this paper, we survey the 20-year history of this idea and discuss its relation to kernelization in parameterized complexity. PMID:25710001

  15. Use of a Probabilistic Motif Search to Identify Histidine Phosphotransfer Domain-Containing Proteins.

    PubMed

    Surujon, Defne; Ratner, David I

    2016-01-01

    The wealth of newly obtained proteomic information affords researchers the possibility of searching for proteins of a given structure or function. Here we describe a general method for the detection of a protein domain of interest in any species for which a complete proteome exists. In particular, we apply this approach to identify histidine phosphotransfer (HPt) domain-containing proteins across a range of eukaryotic species. From the sequences of known HPt domains, we created an amino acid occurrence matrix which we then used to define a conserved, probabilistic motif. Examination of various organisms either known to contain (plant and fungal species) or believed to lack (mammals) HPt domains established criteria by which new HPt candidates were identified and ranked. Search results using a probabilistic motif matrix compare favorably with data to be found in several commonly used protein structure/function databases: our method identified all known HPt proteins in the Arabidopsis thaliana proteome, confirmed the absence of such motifs in mice and humans, and suggests new candidate HPts in several organisms. Moreover, probabilistic motif searching can be applied more generally, in a manner both readily customized and computationally compact, to other protein domains; this utility is demonstrated by our identification of histones in a range of eukaryotic organisms. PMID:26751210

  16. Use of a Probabilistic Motif Search to Identify Histidine Phosphotransfer Domain-Containing Proteins

    PubMed Central

    Surujon, Defne; Ratner, David I.

    2016-01-01

    The wealth of newly obtained proteomic information affords researchers the possibility of searching for proteins of a given structure or function. Here we describe a general method for the detection of a protein domain of interest in any species for which a complete proteome exists. In particular, we apply this approach to identify histidine phosphotransfer (HPt) domain-containing proteins across a range of eukaryotic species. From the sequences of known HPt domains, we created an amino acid occurrence matrix which we then used to define a conserved, probabilistic motif. Examination of various organisms either known to contain (plant and fungal species) or believed to lack (mammals) HPt domains established criteria by which new HPt candidates were identified and ranked. Search results using a probabilistic motif matrix compare favorably with data to be found in several commonly used protein structure/function databases: our method identified all known HPt proteins in the Arabidopsis thaliana proteome, confirmed the absence of such motifs in mice and humans, and suggests new candidate HPts in several organisms. Moreover, probabilistic motif searching can be applied more generally, in a manner both readily customized and computationally compact, to other protein domains; this utility is demonstrated by our identification of histones in a range of eukaryotic organisms. PMID:26751210
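
    The occurrence-matrix-to-motif step can be made concrete with a small position-specific scoring sketch; the pseudocount scheme and the toy alignment below are assumptions, not the authors' exact matrix.

        import math

        def build_pssm(aligned_domains, alphabet="ACDEFGHIKLMNPQRSTVWY"):
            """Log-odds matrix from an occurrence matrix, with add-one
            pseudocounts against a uniform 1/20 background (an assumption)."""
            pssm = []
            for i in range(len(aligned_domains[0])):
                col = [seq[i] for seq in aligned_domains]
                pssm.append({a: math.log((col.count(a) + 1) / (len(col) + 20) * 20)
                             for a in alphabet})
            return pssm

        def best_window_score(protein, pssm):
            """Best-scoring window of the motif's length in a protein."""
            w = len(pssm)
            return max(sum(pssm[i].get(protein[j + i], 0.0) for i in range(w))
                       for j in range(len(protein) - w + 1))

        toy_alignment = ["HQLKG", "HELKG", "HQIKG"]   # invented domain instances
        print(best_window_score("MAHQLKGSS", build_pssm(toy_alignment)))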

  17. Interactive searching of facial image databases

    NASA Astrophysics Data System (ADS)

    Nicholls, Robert A.; Shepherd, John W.; Shepherd, Jean

    1995-09-01

    A set of psychological facial descriptors has been devised to enable computerized searching of criminal photograph albums. The descriptors have been used to encode image databases of up to twelve thousand images. Using a system called FACES, the databases are searched by translating a witness' verbal description into corresponding facial descriptors. Trials of FACES have shown that this coding scheme is more productive and efficient than searching traditional photograph albums. An alternative method of searching the encoded database using a genetic algorithm is currently being tested. The genetic search method does not require the witness to verbalize a description of the target but merely to indicate a degree of similarity between the target and a limited selection of images from the database. The major drawback of FACES is that it requires manual encoding of images. Research is being undertaken to automate the process; however, it will require an algorithm which can predict human descriptive values. Alternatives to human-derived coding schemes exist using statistical classifications of images. Since databases encoded using statistical classifiers do not have an obvious direct mapping to human-derived descriptors, a search method which does not require the entry of human descriptors is needed. A genetic search algorithm is being tested for this purpose.

  18. A Probabilistic Approach to Information Retrieval in Systems with Boolean Search Request Formulations.

    ERIC Educational Resources Information Center

    Radecki, Tadeusz

    1982-01-01

    Outlines an approach to information retrieval which integrates the existing theory of probabilistic retrieval into a practical methodology based on Boolean searches. Basic concepts, search methodology, and examples of Boolean searching are noted. Twenty-six sources are appended. (EJS)

  19. Active fault database of Japan: Its construction and search system

    NASA Astrophysics Data System (ADS)

    Yoshioka, T.; Miyamoto, F.

    2011-12-01

    The Active Fault Database of Japan was constructed by the Active Fault and Earthquake Research Center, GSJ/AIST, and has been open to the public on the Internet since 2005 to support probabilistic evaluation of future faulting events and earthquake occurrence on major active faults in Japan. The database consists of three sub-databases: 1) a sub-database on individual sites, which includes long-term slip data and paleoseismicity data with error ranges and reliability; 2) a sub-database on details of paleoseismicity, which includes the excavated geological units and faulting event horizons with age control; 3) a sub-database on characteristics of behavioral segments, which includes the fault length, long-term slip rate, recurrence intervals, most recent event, slip per event and best estimate of the cascade earthquake. Major seismogenic faults, which are approximately the best-estimate segments of cascade earthquakes, each 20 km or longer with a slip rate of 0.1 m/ky or larger and composed of about two behavioral segments on average, are included in the database. The database contains information on active faults in Japan, sorted by the concept of "behavioral segments" (McCalpin, 1996). The faults are subdivided into 550 behavioral segments based on surface trace geometry and rupture history revealed by paleoseismic studies. Behavioral segments can be searched on Google Maps: one can select a behavioral segment directly or search for segments within a rectangular area on the map. The result of a search is shown on a fixed map or on Google Maps with geologic and paleoseismic parameters including slip rate, slip per event, recurrence interval, and the calculated rupture probability in the future. Behavioral segments can also be searched by name or by a combination of fault parameters. All data are compiled from journal articles, theses, and other documents. We are currently developing a revised edition, which is based on an improved database system. More than ten

  20. Online search patterns: NLM CATLINE database.

    PubMed

    Tolle, J E; Hah, S

    1985-03-01

    In this article the authors present their analysis of the online search patterns within user searching sessions of the National Library of Medicine ELHILL system and examine user search patterns on the CATLINE database. In addition to the CATLINE analysis, a comparison is made using data previously analyzed on the MEDLINE database for the same time period, thus offering an opportunity to compare the performance parameters of different databases within the same information system. Data collection covers eight weeks and includes 441,282 transactions and over 11,067 user sessions, which accounted for 1680 hours of system usage. The descriptive analysis contained in this report can assist system design activities, while the predictive power of the transaction log analysis methodology may assist the development of real-time aids. PMID:10300015

  21. Multi-Database Searching in Forensic Psychology.

    ERIC Educational Resources Information Center

    Piotrowski, Chris; Perdue, Robert W.

    Traditional library skills have been augmented since the introduction of online computerized database services. Because of the complexity of the field, forensic psychology can benefit enormously from the application of comprehensive bibliographic search strategies. The study reported here demonstrated the bibliographic results obtained when a…

  22. Searching Across the International Space Station Databases

    NASA Technical Reports Server (NTRS)

    Maluf, David A.; McDermott, William J.; Smith, Ernest E.; Bell, David G.; Gurram, Mohana

    2007-01-01

    Data access in the enterprise generally requires combining data from different sources and different formats. It is thus advantageous to focus on the intersection of the knowledge across sources and domains; keeping irrelevant knowledge around only serves to make the integration more unwieldy and more complicated than necessary. A context search over multiple domains is proposed in this paper, using context-sensitive queries to support disciplined manipulation of domain knowledge resources. The objective of a context search is to provide the capability for interrogating many domain knowledge resources, which are largely semantically disjoint. The search formally supports the tasks of selecting, combining, extending, specializing, and modifying components from a diverse set of domains. This paper demonstrates a new paradigm in the composition of information for enterprise applications. In particular, it discusses an approach to achieving data integration across multiple sources in a manner that does not require heavy investment in database and middleware maintenance. This lean approach to integration leads to cost-effectiveness and scalability of data integration with an underlying schemaless object-relational database management system. This highly scalable, information-on-demand framework, called NX-Search, is an implementation of an information system built on NETMARK, a flexible, high-throughput open database integration framework for managing, storing, and searching unstructured or semi-structured arbitrary XML and HTML that is used widely at the National Aeronautics and Space Administration (NASA) and in industry.

  23. Audio stream classification for multimedia database search

    NASA Astrophysics Data System (ADS)

    Artese, M.; Bianco, S.; Gagliardi, I.; Gasparini, F.

    2013-03-01

    Search and retrieval in huge archives of multimedia data is a challenging task. A classification step is often used to reduce the number of entries on which to perform the subsequent search. In particular, when new entries are continuously added to the database, a fast classification based on simple threshold evaluation is desirable. In this work we present a CART-based (Classification And Regression Tree [1]) classification framework for audio streams belonging to multimedia databases. The database considered is the Archive of Ethnography and Social History (AESS) [2], which is mainly composed of popular songs and other audio records describing popular traditions handed down generation by generation, such as traditional fairs and customs. The peculiarities of this database are that it is continuously updated, that the audio recordings are acquired in unconstrained environments, and that it is difficult for non-expert human users to create ground-truth labels. In our experiments, half of all the available audio files were randomly extracted and used as the training set; the remaining ones were used as the test set. The classifier was trained to distinguish among three different classes: speech, music, and song. All the audio files in the dataset had previously been manually labeled into these three classes by domain experts.

  24. A Bayesian network approach to the database search problem in criminal proceedings

    PubMed Central

    2012-01-01

    Background The ‘database search problem’, that is, the strengthening of a case - in terms of probative value - against an individual who is found as a result of a database search, has been approached during the last two decades with substantial mathematical analyses, accompanied by lively debate and centrally opposing conclusions. This represents a challenging obstacle in teaching but also hinders a balanced and coherent discussion of the topic within the wider scientific and legal community. This paper revisits and tracks the associated mathematical analyses in terms of Bayesian networks. Their derivation and discussion for capturing probabilistic arguments that explain the database search problem are outlined in detail. The resulting Bayesian networks offer a distinct view on the main debated issues, along with further clarity. Methods As a general framework for representing and analyzing formal arguments in probabilistic reasoning about uncertain target propositions (that is, whether or not a given individual is the source of a crime stain), this paper relies on graphical probability models, in particular, Bayesian networks. This graphical probability modeling approach is used to capture, within a single model, a series of key variables, such as the number of individuals in a database, the size of the population of potential crime stain sources, and the rarity of the corresponding analytical characteristics in a relevant population. Results This paper demonstrates the feasibility of deriving Bayesian network structures for analyzing, representing, and tracking the database search problem. The output of the proposed models can be shown to agree with existing but exclusively formulaic approaches. Conclusions The proposed Bayesian networks allow one to capture and analyze the currently most well-supported but reputedly counter-intuitive and difficult solution to the database search problem in a way that goes beyond the traditional, purely formulaic expressions
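
    For orientation, the classical island-model result that such networks reproduce can be stated in a few lines; the numbers below are invented, and the simplification (uniform priors, a single matching database member) hides exactly the subtleties the paper's Bayesian networks make explicit.

        def posterior_source(match_prob, n_database, n_population):
            """P(matching database member is the source), assuming uniform
            priors and that the search excluded the other database members."""
            expected_other_matches = (n_population - n_database) * match_prob
            return 1 / (1 + expected_other_matches)

        # Rare profile, 100k-person database, 1M potential sources:
        print(posterior_source(1e-6, n_database=100_000, n_population=1_000_000))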

  25. Universal database search tool for proteomics

    PubMed Central

    Kim, Sangtae; Pevzner, Pavel A.

    2015-01-01

    Mass spectrometry (MS) instruments and experimental protocols are rapidly advancing, but the software tools to analyze tandem mass spectra are lagging behind. We present a database search tool, MS-GF+, that is sensitive (it identifies more peptides than most other database search tools) and universal (it works well for diverse types of spectra, different configurations of MS instruments and different experimental protocols). We benchmark MS-GF+ using diverse spectral datasets: (i) spectra of varying fragmentation methods; (ii) spectra of multiple enzyme digests; (iii) spectra of phosphorylated peptides; (iv) spectra of peptides with unusual fragmentation propensities produced by a novel alpha-lytic protease. For all these datasets, MS-GF+ significantly increases the number of identified peptides compared to commonly used methods for peptide identification. We emphasize that while MS-GF+ is not specifically designed for any particular experimental set-up, it improves upon the performance of tools specifically designed for these applications (e.g., specialized tools for phosphoproteomics). PMID:25358478

  26. Online Bibliographic Searching in the Humanities Databases: An Introduction.

    ERIC Educational Resources Information Center

    Suresh, Raghini S.

    Numerous easily accessible databases cover almost every subject area in the humanities. The principal database resources in the humanities are described. There are two major database vendors for humanities information: BRS (Bibliographic Retrieval Services) and DIALOG Information Services, Inc. As an introduction to online searching, this article…

  27. Multiple Database Searching: Techniques and Pitfalls

    ERIC Educational Resources Information Center

    Hawkins, Donald T.

    1978-01-01

    Problems involved in searching multiple data bases are discussed, including indexing differences, overlap among data bases, variant spellings, and elimination of duplicate items from search output. Discussion focuses on the CA Condensates, Inspec, and Metadex data bases. (JPF)

  28. Lost in Search: (Mal-)Adaptation to Probabilistic Decision Environments in Children and Adults

    ERIC Educational Resources Information Center

    Betsch, Tilmann; Lehmann, Anne; Lindow, Stefanie; Lang, Anna; Schoemann, Martin

    2016-01-01

    Adaptive decision making in probabilistic environments requires individuals to use probabilities as weights in predecisional information searches and/or when making subsequent choices. Within a child-friendly computerized environment (Mousekids), we tracked 205 children's (105 children 5-6 years of age and 100 children 9-10 years of age) and 103…

  29. Semantic search among heterogeneous biological databases based on gene ontology.

    PubMed

    Cao, Shun-Liang; Qin, Lei; He, Wei-Zhong; Zhong, Yang; Zhu, Yang-Yong; Li, Yi-Xue

    2004-05-01

    Semantic search is a key issue in the integration of heterogeneous biological databases. In this paper, we present a methodology for implementing semantic search in BioDW, an integrated biological data warehouse. Two tables are presented: the DB2GO table, which correlates Gene Ontology (GO)-annotated entries from BioDW data sources with GO, and the semantic similarity table, which records similarity scores derived from any pair of GO terms. Based on the two tables, multiple ways of performing semantic search are provided, and the corresponding entries in heterogeneous biological databases can be conveniently searched in semantic terms.
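
    The two-table design reduces to simple lookups; here is a toy rendering with dictionaries standing in for the DB2GO and similarity tables (the GO identifiers and scores are invented).

        db2go = {"entryA": ["GO:0006915"], "entryB": ["GO:0012501"]}
        similarity = {("GO:0006915", "GO:0012501"): 0.83}

        def sim(a, b):
            """Similarity score for a pair of GO terms (symmetric lookup)."""
            if a == b:
                return 1.0
            return similarity.get((a, b), similarity.get((b, a), 0.0))

        def semantic_hits(query_term, threshold=0.8):
            """Entries annotated with a term semantically close to the query."""
            return [e for e, terms in db2go.items()
                    if any(sim(query_term, t) >= threshold for t in terms)]

        print(semantic_hits("GO:0006915"))  # -> ['entryA', 'entryB']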

  30. An efficient quantum search engine on unsorted database

    NASA Astrophysics Data System (ADS)

    Lu, Songfeng; Zhang, Yingyu; Liu, Fang

    2013-10-01

    We consider the problem of finding one or more desired items in an unsorted database. Patel has shown that if the database permits quantum queries, then mere digitization is sufficient for efficient search for one desired item. The algorithm he presented, called the factorized quantum search algorithm, can locate the desired item in an unsorted database using O(log₄ N) queries to factorized oracles. But the algorithm requires that all the attribute values be distinct from each other. In this paper, we discuss how to make a database satisfy the requirements, and present a quantum search engine based on the algorithm. Our goal is achieved by introducing auxiliary files for the attribute values that are not distinct, and converting every complex query request into a sequence of calls to the factorized quantum search algorithm. The query complexity of our algorithm is O(log₄ N) for most cases.

  31. Searching the ASRS Database Using QUORUM Keyword Search, Phrase Search, Phrase Generation, and Phrase Discovery

    NASA Technical Reports Server (NTRS)

    McGreevy, Michael W.; Connors, Mary M. (Technical Monitor)

    2001-01-01

    To support Search Requests and Quick Responses at the Aviation Safety Reporting System (ASRS), four new QUORUM methods have been developed: keyword search, phrase search, phrase generation, and phrase discovery. These methods build upon the core QUORUM methods of text analysis, modeling, and relevance-ranking. QUORUM keyword search retrieves ASRS incident narratives that contain one or more user-specified keywords in typical or selected contexts, and ranks the narratives on their relevance to the keywords in context. QUORUM phrase search retrieves narratives that contain one or more user-specified phrases, and ranks the narratives on their relevance to the phrases. QUORUM phrase generation produces a list of phrases from the ASRS database that contain a user-specified word or phrase. QUORUM phrase discovery finds phrases that are related to topics of interest. Phrase generation and phrase discovery are particularly useful for finding query phrases for input to QUORUM phrase search. The presentation of the new QUORUM methods includes: a brief review of the underlying core QUORUM methods; an overview of the new methods; numerous, concrete examples of ASRS database searches using the new methods; discussion of related methods; and, in the appendices, detailed descriptions of the new methods.

  32. Is Library Database Searching a Language Learning Activity?

    ERIC Educational Resources Information Center

    Bordonaro, Karen

    2010-01-01

    This study explores how non-native speakers of English think of words to enter into library databases when they begin the process of searching for information in English. At issue is whether or not language learning takes place when these students use library databases. Language learning in this study refers to the use of strategies employed by…

  33. Searching the PASCAL database - A user's perspective

    NASA Technical Reports Server (NTRS)

    Jack, Robert F.

    1989-01-01

    The operation of PASCAL, a bibliographic data base covering broad subject areas in science and technology, is discussed. The data base includes information from about 1973 to the present, including topics in engineering, chemistry, physics, earth science, environmental science, biology, psychology, and medicine. Data from 1986 to the present may be searched using DIALOG. The procedures and classification codes for searching PASCAL are presented. Examples of citations retrieved from the data base are given and suggestions are made concerning when to use PASCAL.

  34. [Identifying phosphopeptide by searching a site annotated protein database].

    PubMed

    Cheng, Kai; Wang, Fangjun; Bian, Yangyang; Ye, Mingliang; Zou, Hanfa

    2015-01-01

    Phosphoproteome analysis is one of the important research fields in proteomics. In shotgun proteomics, phosphopeptides can be identified directly by setting phosphorylation as a variable modification in the database search. However, the search space increases significantly when variable modifications are set in post-translational modification (PTM) analysis, which decreases identification sensitivity: setting a variable modification on a specific type of amino acid residue means that every residue of that type in the database might be modified, which is not consistent with actual conditions. Phosphorylation and dephosphorylation are regulated by protein kinases and phosphatases, which act only on particular substrates; therefore only residues within specific sequence contexts are potential modification sites. To address this issue, we extracted the characteristic sequences from identified phosphorylation sites and created an annotated database containing phosphorylation site information, which allows the search engine to set variable modifications only on the serine, threonine and tyrosine residues previously identified as phosphorylated. In this database only annotated serine, threonine and tyrosine residues can be modified. This strategy significantly reduces the search space. The performance of the new database searching strategy was evaluated by searching different types of data with Mascot, and higher sensitivity for phosphopeptide identification was achieved with high reliability.
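
    The search-space reduction can be illustrated by generating modification patterns only at annotated sites; the annotation format and all names below are assumptions, not the paper's database schema.

        from itertools import combinations

        known_sites = {("PROT1", 15), ("PROT1", 42)}  # curated phosphosites

        def candidate_mod_patterns(protein_id, peptide, start, max_mods=2):
            """Yield tuples of peptide positions allowed to carry a phosphate:
            S/T/Y residues whose absolute position is annotated."""
            allowed = [i for i, aa in enumerate(peptide)
                       if aa in "STY" and (protein_id, start + i) in known_sites]
            for k in range(min(max_mods, len(allowed)) + 1):
                yield from combinations(allowed, k)

        # Only one of this peptide's four S/T/Y residues is annotated:
        print(list(candidate_mod_patterns("PROT1", "AKSDESTY", start=13)))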

  35. Exhaustive Database Searching for Amino Acid Mutations in Proteomes

    SciTech Connect

    Hyatt, Philip Douglas; Pan, Chongle

    2012-01-01

    Amino acid mutations in proteins can be found by searching tandem mass spectra acquired in shotgun proteomics experiments against protein sequences predicted from genomes. Traditionally, unconstrained searches for amino acid mutations have been accomplished by using a sequence tagging approach that combines de novo sequencing with database searching. However, this approach is limited by the performance of de novo sequencing. The Sipros algorithm v2.0 was developed to perform unconstrained database searching using high-resolution tandem mass spectra by exhaustively enumerating all single non-isobaric mutations for every residue in a protein database. The performance of Sipros for amino acid mutation identification exceeded that of an established sequence tagging algorithm, Inspect, based on benchmarking results from a Rhodopseudomonas palustris proteomics dataset. To demonstrate the viability of the algorithm for meta-proteomics, Sipros was used to identify amino acid mutations in a natural microbial community in acid mine drainage.
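
    The core enumeration is straightforward to sketch: every residue is swapped against every other residue, and isobaric swaps (which leave the precursor mass unchanged) are skipped. The mass table is abridged and the 0.01 Da cutoff is an invented threshold, not Sipros' parameterization.

        # Monoisotopic residue masses (abridged).
        MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
                "V": 99.06841, "T": 101.04768, "I": 113.08406, "L": 113.08406,
                "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
                "E": 129.04259}

        def single_mutations(peptide, min_shift=0.01):
            """Yield (position, old, new, mass_shift) for non-isobaric swaps."""
            for i, old in enumerate(peptide):
                for new, m in MASS.items():
                    shift = m - MASS[old]
                    if new != old and abs(shift) > min_shift:
                        yield i, old, new, shift

        # I<->L is skipped as isobaric; K<->Q (0.036 Da apart) is kept.
        print(sum(1 for _ in single_mutations("PEPTIDE")))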

  36. An Exact Quantum Search Algorithm with Arbitrary Database

    NASA Astrophysics Data System (ADS)

    Liu, Yang

    2014-08-01

    In the standard Grover algorithm for quantum searching, the probability of finding a marked state is not exactly 1, and modified versions of Grover's algorithm that search a marked state in an evenly distributed database with full success rate have been presented. In this article, we present a generalized quantum search algorithm that searches M marked states in an arbitrarily distributed N-item quantum database with a zero theoretical failure rate, where N need not be a power of 2. Analyzing the general properties of our search algorithm, we find that it is periodic with period 2J + 1 and that it succeeds with certainty after J + (2J + 1)m iterations, where m is an arbitrary nonnegative integer.

  37. Assigning statistical significance to proteotypic peptides via database searches

    PubMed Central

    Alves, Gelio; Ogurtsov, Aleksey Y.; Yu, Yi-Kuo

    2011-01-01

    Querying MS/MS spectra against a database containing only proteotypic peptides reduces data analysis time due to the reduction of database size. Despite the speed advantage, this search strategy is challenged by issues of statistical significance and coverage. The former requires separating systematically significant identifications from less confident identifications, while the latter arises when the underlying peptide is not present, due to single amino acid polymorphisms (SAPs) or post-translational modifications (PTMs), in the proteotypic peptide libraries searched. To address both issues simultaneously, we have extended RAId’s knowledge database to include proteotypic information, utilized RAId’s statistical strategy to assign statistical significance to proteotypic peptides, and modified RAId’s programs to allow for consideration of proteotypic information during database searches. The extended database alleviates the coverage problem since all annotated modifications, even those occurring within proteotypic peptides, may be considered. Taking into account the likelihoods of observation, the statistical strategy of RAId provides accurate E-value assignments regardless of whether a candidate peptide is proteotypic or not. The advantage of including proteotypic information is evidenced by its superior retrieval performance when compared to regular database searches. PMID:21055489

  38. Privacy-preserving search for chemical compound databases

    PubMed Central

    2015-01-01

    Background Searching for similar compounds in a database is the most important process for in-silico drug screening. Since a query compound is an important starting point for a new drug, a query holder, who is afraid of the query being monitored by the database server, usually downloads all the records in the database and uses them in a closed network. However, a serious dilemma arises when the database holder also wants to output no information except for the search results, and such a dilemma prevents the use of many important data resources. Results In order to overcome this dilemma, we developed a novel cryptographic protocol that enables database searching while keeping both the query holder's privacy and the database holder's privacy. Generally, the application of cryptographic techniques to practical problems is difficult because versatile techniques are computationally expensive while computationally inexpensive techniques can perform only trivial computation tasks. In this study, our protocol is successfully built only from an additive-homomorphic cryptosystem, which allows only addition to be performed on encrypted values but is computationally efficient compared with versatile techniques such as general-purpose multi-party computation. In an experiment searching ChEMBL, which consists of more than 1,200,000 compounds, the proposed method was 36,900 times faster in CPU time and 12,000 times more efficient in communication size compared with general-purpose multi-party computation. Conclusion We proposed a novel privacy-preserving protocol for searching chemical compound databases. The proposed method, which scales easily to large databases, may help to accelerate drug discovery research by making full use of unused but valuable data that includes sensitive information. PMID:26678650
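
    The additive-homomorphic building block can be shown in a few lines with the python-paillier (phe) library: the server computes an encrypted fingerprint dot product, the numerator of similarity scores such as Tanimoto, without ever seeing the query bits. This is a sketch of the primitive only, not the paper's full protocol.

        from phe import paillier

        public_key, private_key = paillier.generate_paillier_keypair()

        query_fp = [1, 0, 1, 1, 0]                       # client's fingerprint bits
        enc_query = [public_key.encrypt(b) for b in query_fp]

        db_fp = [1, 1, 1, 0, 0]                          # a server-side compound
        # Server side: with 0/1 database bits, the dot product is just the sum
        # of the ciphertexts at the server's on-bits (addition only).
        enc_dot = sum(c for c, d in zip(enc_query, db_fp) if d)

        print(private_key.decrypt(enc_dot))              # -> 2 shared on-bits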

  39. Molecule database framework: a framework for creating database applications with chemical structure search capability

    PubMed Central

    2013-01-01

    Background Research in organic chemistry generates samples of novel chemicals together with their properties and other related data. The involved scientists must be able to store this data and search it by chemical structure. There are commercial solutions for common needs like chemical registration systems or electronic lab notebooks. However for specific requirements of in-house databases and processes no such solutions exist. Another issue is that commercial solutions have the risk of vendor lock-in and may require an expensive license of a proprietary relational database management system. To speed up and simplify the development for applications that require chemical structure search capabilities, I have developed Molecule Database Framework. The framework abstracts the storing and searching of chemical structures into method calls. Therefore software developers do not require extensive knowledge about chemistry and the underlying database cartridge. This decreases application development time. Results Molecule Database Framework is written in Java and I created it by integrating existing free and open-source tools and frameworks. The core functionality includes: • Support for multi-component compounds (mixtures) • Import and export of SD-files • Optional security (authorization) For chemical structure searching Molecule Database Framework leverages the capabilities of the Bingo Cartridge for PostgreSQL and provides type-safe searching, caching, transactions and optional method level security. Molecule Database Framework supports multi-component chemical compounds (mixtures). Furthermore the design of entity classes and the reasoning behind it are explained. By means of a simple web application I describe how the framework could be used. I then benchmarked this example application to create some basic performance expectations for chemical structure searches and import and export of SD-files. Conclusions By using a simple web application it was

  40. Searching Harvard Business Review Online... Lessons in Searching a Full Text Database.

    ERIC Educational Resources Information Center

    Tenopir, Carol

    1985-01-01

    This article examines the Harvard Business Review Online (HBRO) database (bibliographic description fields, abstracts, extracted information, full text, subject descriptors) and reports on 31 sample HBRO searches conducted in Bibliographic Retrieval Services to test differences between searching full text and searching bibliographic record. Sample…

  41. Cognitive Style and On-Line Database Search Experience as Predictors of Web Search Performance.

    ERIC Educational Resources Information Center

    Palmquist, Ruth A.; Kim, Kyung-Sun

    2000-01-01

    Describes a study that investigated the effects of cognitive style (field dependent and field independent) and online database search experience (novice and experienced) on undergraduate World Wide Web search performance. Considers user factors to predict search efficiency and discusses implications for Web interface design and user training…

  42. An Analysis of Performance and Cost Factors in Searching Large Text Databases Using Parallel Search Systems.

    ERIC Educational Resources Information Center

    Couvreur, T. R.; And Others

    1994-01-01

    Discusses the results of modeling the performance of searching large text databases via various parallel hardware architectures and search algorithms. The performance under load and the cost of each configuration are compared, and a common search workload used in the modeling is described. (Contains 26 references.) (LRW)

  43. Searching the MINT database for protein interaction information.

    PubMed

    Cesareni, Gianni; Chatr-aryamontri, Andrew; Licata, Luana; Ceol, Arnaud

    2008-06-01

    The Molecular Interactions Database (MINT) is a relational database designed to store information about protein interactions. Expert curators extract the relevant information from the scientific literature and deposit it in a computer readable form. Currently (April 2008), MINT contains information on almost 29,000 proteins and more than 100,000 interactions from more than 30 model organisms. This unit provides protocols for searching MINT over the Internet, using the MINT Viewer.

  44. The LAILAPS search engine: relevance ranking in life science databases.

    PubMed

    Lange, Matthias; Spies, Karl; Bargsten, Joachim; Haberhauer, Gregor; Klapperstück, Matthias; Leps, Michael; Weinel, Christian; Wünschiers, Röbbe; Weissbach, Mandy; Stein, Jens; Scholz, Uwe

    2010-01-15

    Search engines and retrieval systems are popular tools at the life science desktop. The manual inspection of hundreds of database entries that reflect a life science concept or fact is time-intensive daily work. Here, it is not the number of query results that matters, but their relevance. In this paper, we present the LAILAPS search engine for life science databases. The concept is to combine a novel feature model for relevance ranking, a machine learning approach to model user relevance profiles, ranking improvement by user feedback tracking, and an intuitive and slim web user interface that estimates relevance rank by tracking user interactions. Queries are formulated as simple keyword lists and are expanded by synonyms. Supporting a flexible text index and a simple data import format, LAILAPS can easily be used both as a search engine for comprehensive integrated life science databases and for small in-house project databases. With a set of features extracted from each database hit, in combination with user relevance preferences, a neural network predicts user-specific relevance scores. Using expert knowledge as training data for a predefined neural network, or using users' own relevance training sets, a reliable relevance ranking of database hits has been implemented. In this paper, we present the LAILAPS system, its concepts, benchmarks and use cases. LAILAPS is publicly available for SWISSPROT data at http://lailaps.ipk-gatersleben.de.
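
    A toy rendering of the feature-to-relevance idea: hit features and a user profile feed a tiny network that outputs a score in (0, 1). The architecture and random weights below are placeholders; LAILAPS trains its network on expert or user-specific relevance data.

        import numpy as np

        rng = np.random.default_rng(0)
        W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=3)  # untrained toy weights

        def relevance(hit_features, user_profile):
            x = np.concatenate([hit_features, user_profile])  # 2 + 2 inputs
            hidden = np.tanh(x @ W1)
            return 1 / (1 + np.exp(-hidden @ W2))             # score in (0, 1)

        hits = {"hit1": [0.9, 0.2], "hit2": [0.4, 0.8]}
        profile = [0.7, 0.1]
        print(sorted(hits, key=lambda h: -relevance(hits[h], profile)))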

  45. Complementary use of the SciSearch database for improved biomedical information searching.

    PubMed Central

    Brown, C M

    1998-01-01

    The use of at least two complementary online biomedical databases is generally considered critical for biomedical scientists seeking to keep fully abreast of recent research developments as well as to retrieve the highest number of relevant citations possible. Although the National Library of Medicine's MEDLINE is usually the database of choice, this paper illustrates the benefits of using another database, the Institute for Scientific Information's SciSearch, when conducting a biomedical information search. When a simple query about red wine consumption and coronary artery disease was posed simultaneously in both MEDLINE and SciSearch, a greater number of relevant citations were retrieved through SciSearch. This paper also provides suggestions for carrying out a comprehensive biomedical literature search in a rapid and efficient manner by using SciSearch in conjunction with MEDLINE. PMID:9549014

  46. Probabilistic acute dietary exposure assessments to captan and tolylfluanid using several European food consumption and pesticide concentration databases.

    PubMed

    Boon, Polly E; Svensson, Kettil; Moussavian, Shahnaz; van der Voet, Hilko; Petersen, Annette; Ruprich, Jiri; Debegnach, Francesca; de Boer, Waldo J; van Donkersgoed, Gerda; Brera, Carlo; van Klaveren, Jacob D; Busk, Leif

    2009-12-01

    Probabilistic dietary acute exposure assessments of captan and tolylfluanid were performed for the populations of the Czech Republic, Denmark, Italy, the Netherlands and Sweden. The basis for these assessments was national databases for food consumption and pesticide concentration data harmonised at the level of raw agricultural commodity. Data were obtained from national food consumption surveys and national monitoring programmes and organised in an electronic platform of databases connected to probabilistic software. The exposure assessments were conducted by linking national food consumption data either (1) to national pesticide concentration data or (2) to a pooled database containing all national pesticide concentration data. We show that with this tool national exposure assessments can be performed in a harmonised way and that pesticide concentrations of other countries can be linked to national food consumption surveys. In this way it is possible to exchange or merge concentration data between countries in situations of data scarcity. This electronic platform in connection with probabilistic software can be seen as a prototype of a data warehouse, including a harmonised approach for dietary exposure modelling.
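
    Probabilistic acute exposure assessment is, at its core, Monte Carlo sampling over consumption and concentration data; the sketch below uses invented placeholder numbers, not the harmonised national databases described above.

        import random

        consumption_g = [120, 85, 200, 40, 150]      # single-day portions, g
        residue_mg_kg = [0.0, 0.0, 0.05, 0.2, 0.01]  # measured concentrations
        body_weight_kg = 60.0

        def p999_exposure(n=100_000):
            """99.9th percentile of simulated acute exposure, mg/kg bw/day."""
            draws = sorted(random.choice(consumption_g) / 1000
                           * random.choice(residue_mg_kg) / body_weight_kg
                           for _ in range(n))
            return draws[int(0.999 * n)]

        print(f"P99.9 acute exposure: {p999_exposure():.6f} mg/kg bw/day")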

  47. Ontology searching and browsing at the Rat Genome Database.

    PubMed

    Laulederkind, Stanley J F; Tutaj, Marek; Shimoyama, Mary; Hayman, G Thomas; Lowry, Timothy F; Nigam, Rajni; Petri, Victoria; Smith, Jennifer R; Wang, Shur-Jen; de Pons, Jeff; Dwinell, Melinda R; Jacob, Howard J

    2012-01-01

    The Rat Genome Database (RGD) is the premier repository of rat genomic and genetic data and currently houses over 40 000 rat gene records, as well as human and mouse orthologs, 1857 rat and 1912 human quantitative trait loci (QTLs) and 2347 rat strains. Biological information curated for these data objects includes disease associations, phenotypes, pathways, molecular functions, biological processes and cellular components. RGD uses more than a dozen different ontologies to standardize annotation information for genes, QTLs and strains. That means a lot of time can be spent searching and browsing ontologies for the appropriate terms needed both for curating and mining the data. RGD has upgraded its ontology term search to make it more versatile and more robust. A term search result is connected to a term browser so the user can fine-tune the search by viewing parent and children terms. Most publicly available term browsers display a hierarchical organization of terms in an expandable tree format. RGD has replaced its old tree browser format with a 'driller' type of browser that allows quicker drilling up and down through the term branches, which has been confirmed by testing. The RGD ontology report pages have also been upgraded. Expanded functionality allows more choice in how annotations are displayed and what subsets of annotations are displayed. The new ontology search, browser and report features have been designed to enhance both manual data curation and manual data extraction. DATABASE URL: http://rgd.mcw.edu/rgdweb/ontology/search.html.

  48. Multi-Database Searching in the Behavioral Sciences--Part I: Basic Techniques and Core Databases.

    ERIC Educational Resources Information Center

    Angier, Jennifer J.; Epstein, Barbara A.

    1980-01-01

    Outlines practical searching techniques in seven core behavioral science databases accessing psychological literature: Psychological Abstracts, Social Science Citation Index, Biosis, Medline, Excerpta Medica, Sociological Abstracts, ERIC. Use of individual files is discussed and their relative strengths/weaknesses are compared. Appended is a list…

  49. Fast and accurate database searches with MS-GF+Percolator

    SciTech Connect

    Granholm, Viktor; Kim, Sangtae; Navarro, José C.; Sjölund, Erik; Smith, Richard D.; Käll, Lukas

    2014-02-28

    To identify peptides and proteins from the large numbers of fragmentation spectra in mass-spectrometry-based proteomics, researchers commonly employ so-called database search engines. Additionally, post-processors like Percolator have been used on the results from such search engines to assess confidence, infer peptides and generally increase the number of identifications. A recent search engine, MS-GF+, has previously been shown to outperform these classical search engines in terms of the number of identified spectra. However, MS-GF+ generates only limited statistical estimates of the results, hampering biological interpretation. Here, we enabled Percolator processing of MS-GF+ output and observed an increased number of identified peptides for a wide variety of datasets. In addition, Percolator directly reports false discovery rate estimates, such as q values and posterior error probabilities, as well as p values, for peptide-spectrum matches, peptides and proteins, all of which are useful to the whole proteomics community.

  50. Significant speedup of database searches with HMMs by search space reduction with PSSM family models

    PubMed Central

    Beckstette, Michael; Homann, Robert; Giegerich, Robert; Kurtz, Stefan

    2009-01-01

    Motivation: Profile hidden Markov models (pHMMs) are currently the most popular modeling concept for protein families. They provide sensitive family descriptors, and sequence database searching with pHMMs has become a standard task in today's genome annotation pipelines. On the downside, searching with pHMMs is computationally expensive. Results: We propose a new method for efficient protein family classification and for speeding up database searches with pHMMs, as is necessary for large-scale analysis scenarios. We employ simpler models of protein families called position-specific scoring matrix family models (PSSM-FMs). For fast database search, we combine full-text indexing, efficient exact p-value computation of PSSM match scores and fast fragment chaining. The resulting method is well suited to prefilter the set of sequences to be searched for subsequent database searches with pHMMs. We achieved a classification performance only marginally inferior to hmmsearch, yet results could be obtained in a fraction of the runtime, with a speedup of >64-fold. In experiments addressing the method's ability to prefilter the sequence space for subsequent database searches with pHMMs, our method reduces the number of sequences to be searched with hmmsearch to only 0.80% of all sequences. The filter is very fast and leads to a total speedup of factor 43 over the unfiltered search, while retaining >99.5% of the original results. In a lossless filter setup for hmmsearch on UniProtKB/Swiss-Prot, we observed a speedup of factor 92. Availability: The presented algorithms are implemented in the program PoSSuMsearch2, available for download at http://bibiserv.techfak.uni-bielefeld.de/possumsearch2/. Contact: beckstette@zbh.uni-hamburg.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19828575

  51. Understanding the improved sensitivity of spectral library searching over sequence database searching in proteomics data analysis.

    PubMed

    Zhang, Xin; Li, Yunzi; Shao, Wenguang; Lam, Henry

    2011-03-01

    Spectral library searching has recently been proposed as an alternative to sequence database searching for peptide identification from MS/MS. We performed a systematic comparison between spectral library searching and sequence database searching using a wide variety of data to better demonstrate, and understand, the superior sensitivity of the former observed in preliminary studies. By decoupling the effect of search space, we demonstrated that the success of spectral library searching is primarily attributable to the use of real library spectra for matching, without which the sensitivity advantage largely disappears. We further determined the extent to which the use of real peak intensities and non-canonical fragments, both under-utilized information in sequence database searching, contributes to the sensitivity advantage. Lastly, we showed that spectral library searching is disproportionately more successful in identifying low-quality spectra and complex spectra of higher-charged precursors, both important frontiers in peptide sequencing. Our results answered important outstanding questions about this promising yet unproven method using well-controlled computational experiments and sound statistical approaches.
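
    The heart of spectral library matching is a similarity score between an observed spectrum and a library spectrum; a common choice is a normalized dot product after m/z binning. The sketch below simplifies aggressively (unit-width bins, raw intensities), whereas real tools weight intensities and handle mass tolerance more carefully.

        import math
        from collections import defaultdict

        def binned(peaks, width=1.0):
            """Collapse (mz, intensity) peaks into fixed-width m/z bins."""
            vec = defaultdict(float)
            for mz, intensity in peaks:
                vec[round(mz / width)] += intensity
            return vec

        def spectral_similarity(query, library):
            q, l = binned(query), binned(library)
            dot = sum(q[k] * l[k] for k in q.keys() & l.keys())
            norm = math.sqrt(sum(x * x for x in q.values())
                             * sum(x * x for x in l.values()))
            return dot / norm if norm else 0.0

        print(spectral_similarity([(175.1, 50), (263.1, 30)],
                                  [(175.1, 55), (263.2, 28)]))  # ~0.998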

  12. The database search problem: a question of rational decision making.

    PubMed

    Gittelson, S; Biedermann, A; Bozza, S; Taroni, F

    2012-10-10

    This paper applies probability and decision theory in the graphical interface of an influence diagram to study the formal requirements of rationality which justify the individualization of a person found through a database search. The decision-theoretic part of the analysis studies the parameters that a rational decision maker would use to individualize the selected person. The modeling part (in the form of an influence diagram) clarifies the relationships between this decision and the ingredients that make up the database search problem, i.e., the results of the database search and the different pairs of propositions describing whether an individual is at the source of the crime stain. These analyses evaluate the desirability associated with the decision of 'individualizing' (and 'not individualizing'). They point out that this decision is a function of (i) the probability that the individual in question is, in fact, at the source of the crime stain (i.e., the state of nature), and (ii) the decision maker's preferences among the possible consequences of the decision (i.e., the decision maker's loss function). We discuss the relevance and argumentative implications of these insights with respect to recent comments in specialized literature, which suggest points of view that are opposed to the results of our study.
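
    The decision rule described here can be made concrete with a toy calculation. A minimal sketch, assuming illustrative loss values that are not taken from the paper: individualize only when the expected loss of individualizing falls below that of refraining.

```python
# Toy numeric sketch of the decision-theoretic argument: compare expected
# losses of 'individualizing' vs 'not individualizing'. Loss values are
# illustrative placeholders, not the paper's.

def expected_losses(p_source, loss_false_individualization=100.0,
                    loss_missed_individualization=1.0):
    """p_source: probability the selected person is the crime stain's source."""
    el_individualize = (1 - p_source) * loss_false_individualization
    el_refrain = p_source * loss_missed_individualization
    return el_individualize, el_refrain

for p in (0.90, 0.99, 0.999):
    ind, ref = expected_losses(p)
    decision = "individualize" if ind < ref else "do not individualize"
    print(f"P(source) = {p}: {decision} (EL {ind:.3f} vs {ref:.3f})")
```

    With these placeholder losses, even a 99% source probability does not justify individualization; this is exactly the interplay of state-of-nature probability and loss function that the abstract describes.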

  13. Peptide identification by database search of mixture tandem mass spectra.

    PubMed

    Wang, Jian; Bourne, Philip E; Bandeira, Nuno

    2011-12-01

    In high-throughput proteomics, developments in computational methods and in experimental strategies often rely on each other. In certain areas, mass spectrometry methods for data acquisition are ahead of the computational methods needed to interpret the resulting tandem mass spectra. In particular, although there are numerous situations in which a mixture tandem mass spectrum can contain fragment ions from two or more peptides, nearly all database search tools still assume that each tandem mass spectrum comes from one peptide. Common examples include mixture spectra from co-eluting peptides in complex samples, spectra generated by data-independent acquisition methods, and spectra from peptides with complex post-translational modifications. We propose a new database search tool (MixDB) that is able to identify mixture tandem mass spectra from more than one peptide. We show that peptides can be reliably identified with up to 95% accuracy from mixture spectra while considering only 0.01% of all possible peptide pairs (a four-orders-of-magnitude speedup). Comparison with current database search methods indicates that our approach has better or comparable sensitivity and precision at identifying single-peptide spectra while simultaneously being able to identify 38% more peptides from mixture spectra at significantly higher precision.

  14. Are Bibliographic Management Software Search Interfaces Reliable?: A Comparison between Search Results Obtained Using Database Interfaces and the EndNote Online Search Function

    ERIC Educational Resources Information Center

    Fitzgibbons, Megan; Meert, Deborah

    2010-01-01

    The use of bibliographic management software and its internal search interfaces is now pervasive among researchers. This study compares the results between searches conducted in academic databases' search interfaces versus the EndNote search interface. The results show mixed search reliability, depending on the database and type of search…

  15. Compact variant-rich customized sequence database and a fast and sensitive database search for efficient proteogenomic analyses.

    PubMed

    Park, Heejin; Bae, Junwoo; Kim, Hyunwoo; Kim, Sangok; Kim, Hokeun; Mun, Dong-Gi; Joh, Yoonsung; Lee, Wonyeop; Chae, Sehyun; Lee, Sanghyuk; Kim, Hark Kyun; Hwang, Daehee; Lee, Sang-Won; Paek, Eunok

    2014-12-01

    In proteogenomic analysis, construction of a compact, customized database from mRNA-seq data and a sensitive search of both reference and customized databases are essential to accurately determine protein abundances and structural variations at the protein level. However, these tasks have not been systematically explored, but rather performed in an ad-hoc fashion. Here, we present an effective method for constructing a compact database containing comprehensive sequences of sample-specific variants--single nucleotide variants, insertions/deletions, and stop-codon mutations--derived from Exome-seq and RNA-seq data. The database occupies less space by storing variant peptides rather than whole variant proteins. We also present an efficient search method for both customized and reference databases. Separate searches of the two databases increase the search time, while a unified search is less sensitive in identifying variant peptides because the customized database is much smaller than the reference database in the target-decoy setting. Our method searches the unified database once, but performs target-decoy validations separately. Experimental results show that our approach is as fast as the unified search and as sensitive as the separate searches. Our customized database includes mutation information in the headers of variant peptides, thereby facilitating the inspection of peptide-spectrum matches.
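
    The separate-validation idea can be sketched compactly: search one unified database, then stratify the PSMs by originating database and estimate the target-decoy FDR within each stratum. The PSM fields, score distributions and the 1% cutoff below are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch: unified search, separate target-decoy FDR validation for
# reference and variant PSMs so the small variant stratum is not swamped.

def fdr_filter(psms, alpha=0.01):
    """psms: (score, is_decoy) tuples; returns target scores kept before the
    decoy-estimated FDR first exceeds alpha."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    kept, targets, decoys = [], 0, 0
    for score, is_decoy in ranked:
        decoys += is_decoy
        targets += not is_decoy
        if decoys / max(targets, 1) > alpha:
            break
        if not is_decoy:
            kept.append(score)
    return kept

def validate_unified_search(psms):
    """psms: (score, is_decoy, origin) with origin 'reference' or 'variant'."""
    by_origin = {"reference": [], "variant": []}
    for score, is_decoy, origin in psms:
        by_origin[origin].append((score, is_decoy))
    return {origin: fdr_filter(group) for origin, group in by_origin.items()}

if __name__ == "__main__":
    import random
    random.seed(0)
    data = ([(random.gauss(10, 2), False, "reference") for _ in range(200)]
            + [(random.gauss(5, 2), True, "reference") for _ in range(200)]
            + [(random.gauss(9, 2), False, "variant") for _ in range(20)]
            + [(random.gauss(5, 2), True, "variant") for _ in range(20)])
    print({k: len(v) for k, v in validate_unified_search(data).items()})
```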

  16. Enriching Great Britain's National Landslide Database by searching newspaper archives

    NASA Astrophysics Data System (ADS)

    Taylor, Faith E.; Malamud, Bruce D.; Freeborough, Katy; Demeritt, David

    2015-11-01

    Our understanding of where landslide hazard and impact will be greatest is largely based on our knowledge of past events. Here, we present a method to supplement existing records of landslides in Great Britain by searching an electronic archive of regional newspapers. In Great Britain, the British Geological Survey (BGS) is responsible for updating and maintaining records of landslide events and their impacts in the National Landslide Database (NLD). The NLD contains records of more than 16,500 landslide events in Great Britain. Data sources for the NLD include field surveys, academic articles, grey literature, news, public reports and, since 2012, social media. We aim to supplement the richness of the NLD by (i) identifying additional landslide events, (ii) acting as an additional source of confirmation of events existing in the NLD and (iii) adding more detail to existing database entries. This is done by systematically searching the Nexis UK digital archive of 568 regional newspapers published in the UK. In this paper, we construct a robust Boolean search criterion by experimenting with landslide terminology for four training periods. We then apply this search to all articles published in 2006 and 2012. This resulted in the addition of 111 records of landslide events to the NLD over the 2 years investigated (2006 and 2012). We also find that we were able to obtain information about landslide impact for 60-90% of landslide events identified from newspaper articles. Spatial and temporal patterns of additional landslides identified from newspaper articles are broadly in line with those existing in the NLD, confirming that the NLD is a representative sample of landsliding in Great Britain. This method could now be applied to more time periods and/or other hazards to add richness to databases and thus improve our ability to forecast future events based on records of past events.
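
    The kind of Boolean screening criterion described above can be sketched in a few lines. The include and exclude terms below are illustrative stand-ins, not the tuned criterion from the paper, which was developed over four training periods.

```python
# Minimal sketch of a Boolean first-pass filter over newspaper articles:
# require a landslide term, reject common figurative uses.
import re

INCLUDE = ["landslide", "landslip", "mudslide", "rockfall"]
EXCLUDE = ["election landslide", "landslide victory", "landslide win"]

def matches(article_text):
    text = article_text.lower()
    if any(phrase in text for phrase in EXCLUDE):
        return False
    return any(re.search(r"\b" + term + r"\b", text) for term in INCLUDE)

articles = [
    "A landslip closed the A85 near Oban after heavy rain.",
    "The MP won by a landslide victory in Thursday's poll.",
]
print([matches(a) for a in articles])  # [True, False]
```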

  17. A Multilevel Probabilistic Beam Search Algorithm for the Shortest Common Supersequence Problem

    PubMed Central

    Gallardo, José E.

    2012-01-01

    The shortest common supersequence problem is a classical problem with many applications in different fields such as planning, Artificial Intelligence and especially in Bioinformatics. Due to its NP-hardness, we cannot expect to efficiently solve this problem using conventional exact techniques. This paper presents a heuristic to tackle this problem based on the use at different levels of a probabilistic variant of a classical heuristic known as Beam Search. The proposed algorithm is empirically analysed and compared to current approaches in the literature. Experiments show that it provides better quality solutions in a reasonable time for medium and large instances of the problem. For very large instances, our heuristic also provides better solutions, but required execution times may increase considerably. PMID:23300667

  18. A multilevel probabilistic beam search algorithm for the shortest common supersequence problem.

    PubMed

    Gallardo, José E

    2012-01-01

    The shortest common supersequence problem is a classical problem with many applications in different fields such as planning, Artificial Intelligence and especially in Bioinformatics. Due to its NP-hardness, we cannot expect to efficiently solve this problem using conventional exact techniques. This paper presents a heuristic to tackle this problem based on the use at different levels of a probabilistic variant of a classical heuristic known as Beam Search. The proposed algorithm is empirically analysed and compared to current approaches in the literature. Experiments show that it provides better quality solutions in a reasonable time for medium and large instances of the problem. For very large instances, our heuristic also provides better solutions, but required execution times may increase considerably.
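
    A minimal sketch of a probabilistic beam search for the shortest common supersequence follows; it is a simplified, single-level variant, and the paper's multilevel scheme is not reproduced. States covering more of the input strings are more likely to survive into the next beam, but worse states can also survive (and duplicates may occur, since survivors are sampled with replacement), which helps escape local optima.

```python
# Simplified probabilistic beam search for the shortest common supersequence.
import random

def remaining(strings, pos):
    return sum(len(s) - p for s, p in zip(strings, pos))

def probabilistic_beam_scs(strings, beam_width=10, seed=0):
    rng = random.Random(seed)
    beam = [("", tuple(0 for _ in strings))]
    while True:
        done = [seq for seq, pos in beam if remaining(strings, pos) == 0]
        if done:
            return min(done, key=len)
        children = []
        for seq, pos in beam:
            # Candidate next characters: any character some string still needs.
            for c in {s[p] for s, p in zip(strings, pos) if p < len(s)}:
                new_pos = tuple(p + (p < len(s) and s[p] == c)
                                for s, p in zip(strings, pos))
                children.append((seq + c, new_pos))
        # Probabilistic survival: fewer remaining characters => higher weight.
        weights = [1.0 / (1 + remaining(strings, pos)) for _, pos in children]
        beam = rng.choices(children, weights=weights,
                           k=min(beam_width, len(children)))

result = probabilistic_beam_scs(["CAGT", "AGGT", "CGTT"])
print(result, len(result))
```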

  19. Supporting ontology-based keyword search over medical databases.

    PubMed

    Kementsietsidis, Anastasios; Lim, Lipyeow; Wang, Min

    2008-11-06

    The proliferation of medical terms poses a number of challenges in the sharing of medical information among different stakeholders. Ontologies are commonly used to establish relationships between different terms, yet their role in querying has not been investigated in detail. In this paper, we study the problem of supporting ontology-based keyword search queries on a database of electronic medical records. We present several approaches to support this type of query, study the advantages and limitations of each approach, and summarize the lessons learned as best practices.

  20. Numerical database system based on a weighted search tree

    NASA Astrophysics Data System (ADS)

    Park, S. C.; Bahri, C.; Draayer, J. P.; Zheng, S.-Q.

    1994-09-01

    An on-line numerical database system that is based on the concept of a weighted search tree and functions like a file directory is introduced. The system, which is designed to aid in reducing time-consuming redundant calculations in numerically intensive computations, can be used to fetch, insert and delete items from a dynamically generated list in optimal [O(log n), where n is the number of items in the list] time. Items in the list are ordered according to a priority queue, with the initial priority for each element set either automatically or by a user-supplied algorithm. The priority queue is updated on-the-fly to reflect element hit frequency. Items can be added to a database so long as there is space to accommodate them; when there is not, the lowest-priority element(s) are removed to make room for incoming element(s) with higher priority. The system acts passively and therefore can be applied to any number of databases, with the same or different structures, within a single application.
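
    As an illustration of the caching idea only (not the authors' weighted-search-tree data structure), the sketch below keeps a bounded table of computed values, raises an entry's priority on every hit, and evicts the lowest-priority entry when space runs out; heapq with lazy deletion keeps updates at O(log n). Keys and values are made up.

```python
# Bounded memoization cache with hit-frequency priorities and lazy-deletion
# heap; evicts the lowest-priority entry when full.
import heapq

class PriorityCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = {}      # key -> (priority, value)
        self.heap = []       # (priority, key) entries, possibly stale

    def get(self, key):
        if key not in self.store:
            return None
        prio, value = self.store[key]
        self.store[key] = (prio + 1, value)      # a hit raises the priority
        heapq.heappush(self.heap, (prio + 1, key))
        return value

    def put(self, key, value, priority=1):
        while len(self.store) >= self.capacity:
            prio, victim = heapq.heappop(self.heap)
            # Skip stale heap entries whose priority has since been raised.
            if victim in self.store and self.store[victim][0] == prio:
                del self.store[victim]
        self.store[key] = (priority, value)
        heapq.heappush(self.heap, (priority, key))

cache = PriorityCache(capacity=2)
cache.put("cg(1,2)", 0.372)          # e.g. an expensive coefficient (made up)
cache.get("cg(1,2)")                 # hit raises its priority
cache.put("cg(3,4)", -0.118)
cache.put("cg(5,6)", 0.541)          # evicts the least-used entry, cg(3,4)
print(sorted(cache.store))           # ['cg(1,2)', 'cg(5,6)']
```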

  1. Comparison Study of Overlap among 21 Scientific Databases in Searching Pesticide Information.

    ERIC Educational Resources Information Center

    Meyer, Daniel E.; And Others

    1983-01-01

    Evaluates overlapping coverage of 21 scientific databases used in 10 online pesticide searches in an attempt to identify the minimum number of databases needed to generate 90 percent of unique, relevant citations for a given search. Comparison of searches combined under given pesticide usage (herbicide, fungicide, insecticide) is discussed. Nine…

  2. On optimizing distance-based similarity search for biological databases.

    PubMed

    Mao, Rui; Xu, Weijia; Ramakrishnan, Smriti; Nuckolls, Glen; Miranker, Daniel P

    2005-01-01

    Similarity search leveraging distance-based index structures is increasingly being used for both multimedia and biological database applications. We consider distance-based indexing for three important biological data types: protein k-mers with the metric PAM model, DNA k-mers with Hamming distance, and peptide fragmentation spectra with a pseudo-metric derived from cosine distance. To date, the primary driver of this research has been multimedia applications, where similarity functions are often Euclidean norms on high-dimensional feature vectors. We develop results showing that the character of these biological workloads is different from multimedia workloads. In particular, they are not intrinsically very high dimensional and deserve different optimization heuristics. Based on MVP-trees, we develop a pivot-selection heuristic that seeks centers and show that it outperforms the most widely used corner-seeking heuristic. Similarly, we develop a data-partitioning approach sensitive to the actual data distribution in lieu of median splits. PMID:16447992
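
    A hedged sketch of the center-versus-corner distinction: one simple way to operationalize it is mean distance to a data sample, with a corner-like pivot maximizing that mean and a center-like pivot minimizing it. Hamming distance on DNA k-mers stands in for the paper's metrics, and the MVP-tree machinery is not shown.

```python
# Contrast of pivot-selection heuristics for distance-based indexing.

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def pick_pivot(candidates, sample, seek="center"):
    def mean_dist(p):
        return sum(hamming(p, s) for s in sample) / len(sample)
    chooser = min if seek == "center" else max
    return chooser(candidates, key=mean_dist)

kmers = ["ACGTA", "ACGTT", "ACGAA", "TTTTT", "ACGTC", "AAGTA"]
print("center-like pivot:", pick_pivot(kmers, kmers))
print("corner-like pivot:", pick_pivot(kmers, kmers, seek="corner"))
```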

  3. Finding current evidence: search strategies and common databases.

    PubMed

    Gillespie, Lesley Diane; Gillespie, William John

    2003-08-01

    With more than 100 orthopaedic, sports medicine, or hand surgery journals indexed in MEDLINE, it is no longer possible to keep abreast of developments in orthopaedic surgery by reading a few journals each month. Electronic resources are easier to search and more current than most print sources. We provide a practical approach to finding useful information to guide orthopaedic practice. We focus first on where to find the information by providing details about many useful databases and web links. Sources for identifying guidelines, systematic reviews, and randomized controlled trials are identified. The second section discusses how to find the information, from the first stage of formulating a question and identifying the concepts of interest, through to writing a simple strategy. Sources for additional self-directed learning are provided.

  4. Phase4: automatic evaluation of database search methods.

    PubMed

    Rehmsmeier, Marc

    2002-12-01

    It has become standard to evaluate newly devised database search methods in terms of sensitivity and selectivity and to compare them with existing methods. This involves the construction of a suitable evaluation scenario, the execution of the methods, the assessment of their performances, and the presentation of the results. Each of these four phases and their smooth connection usually imposes formidable work. To relieve the evaluator of this burden, a system has been designed with which evaluations can be effected rapidly. It is implemented in the programming language Python whose object-oriented features are used to offer a great flexibility in changing the evaluation design. A graphical user interface is provided which offers the usual amenities such as radio- and checkbuttons or file browsing facilities.

  5. Search Databases and Statistics: Pitfalls and Best Practices in Phosphoproteomics.

    PubMed

    Refsgaard, Jan C; Munk, Stephanie; Jensen, Lars J

    2016-01-01

    Advances in mass spectrometric instrumentation in the past 15 years have resulted in an explosion in the raw data yield from typical phosphoproteomics workflows. This poses the challenge of confidently identifying peptide sequences, localizing phosphosites to proteins and quantifying these from the vast amounts of raw data. This task is tackled by computational tools implementing algorithms that match the experimental data to databases, providing the user with lists for downstream analysis. Several platforms for such automated interpretation of mass spectrometric data have been developed, each having strengths and weaknesses that must be considered for the individual needs. These are reviewed in this chapter. Equally critical for generating highly confident output datasets is the application of sound statistical criteria to limit the inclusion of incorrect peptide identifications from database searches. Additionally, careful filtering and use of appropriate statistical tests on the output datasets affects the quality of all downstream analyses and interpretation of the data. Our considerations and general practices on these aspects of phosphoproteomics data processing are presented here. PMID:26584936

  6. Search Databases and Statistics: Pitfalls and Best Practices in Phosphoproteomics.

    PubMed

    Refsgaard, Jan C; Munk, Stephanie; Jensen, Lars J

    2016-01-01

    Advances in mass spectrometric instrumentation in the past 15 years have resulted in an explosion in the raw data yield from typical phosphoproteomics workflows. This poses the challenge of confidently identifying peptide sequences, localizing phosphosites to proteins and quantifying these from the vast amounts of raw data. This task is tackled by computational tools implementing algorithms that match the experimental data to databases, providing the user with lists for downstream analysis. Several platforms for such automated interpretation of mass spectrometric data have been developed, each having strengths and weaknesses that must be considered for the individual needs. These are reviewed in this chapter. Equally critical for generating highly confident output datasets is the application of sound statistical criteria to limit the inclusion of incorrect peptide identifications from database searches. Additionally, careful filtering and use of appropriate statistical tests on the output datasets affects the quality of all downstream analyses and interpretation of the data. Our considerations and general practices on these aspects of phosphoproteomics data processing are presented here.

  7. The Use of AJAX in Searching a Bibliographic Database: A Case Study of the Italian Biblioteche Oggi Database

    ERIC Educational Resources Information Center

    Cavaleri, Piero

    2008-01-01

    Purpose: The purpose of this paper is to describe the use of AJAX for searching the Biblioteche Oggi database of bibliographic records. Design/methodology/approach: The paper is a demonstration of how bibliographic database single page interfaces allow the implementation of more user-friendly features for social and collaborative tasks. Findings:…

  8. A Mentor/Research Model to Teach Library Skills: An Introduction to Database Searching.

    ERIC Educational Resources Information Center

    Nissen, Karen R.; Ross, Barbara A.

    1996-01-01

    As a way to teach database searching skills, students in a nursing research class were paired with, and performed literature searches for, faculty members or community health care professionals. Because students did not formulate questions or initial search terms, they concentrated on the search process, learning how professionals find and use…

  9. Seismic hazard assessment for Myanmar: Earthquake model database, ground-motion scenarios, and probabilistic assessments

    NASA Astrophysics Data System (ADS)

    Chan, C. H.; Wang, Y.; Thant, M.; Maung Maung, P.; Sieh, K.

    2015-12-01

    We have constructed an earthquake and fault database, conducted a series of ground-shaking scenarios, and proposed seismic hazard maps for all of Myanmar and hazard curves for selected cities. Our earthquake database integrates the ISC, ISC-GEM and global ANSS Comprehensive Catalogues, and includes harmonized magnitude scales without duplicate events. Our active fault database includes active fault data from previous studies. Using the parameters from these updated databases (i.e., the Gutenberg-Richter relationship, slip rate, maximum magnitude and the elapsed time since the last events), we have determined the earthquake recurrence models of seismogenic sources. To evaluate ground-shaking behaviours in different tectonic regimes, we conducted a series of tests by matching the modelled ground motions to the felt intensities of earthquakes. Through the case of the 1975 Bagan earthquake, we determined that the ground motion prediction equations (GMPEs) of Atkinson and Boore (2003) best fit the behaviour of subduction events. Also, the 2011 Tarlay and 2012 Thabeikkyin events suggested that the GMPEs of Akkar and Cagnan (2010) fit crustal earthquakes best. We thus incorporated the best-fitting GMPEs and site conditions based on Vs30 (the average shear-wave velocity down to 30 m depth) from analysis of topographic slope and microtremor array measurements to assess seismic hazard. The hazard is highest in regions close to the Sagaing Fault and along the western coast of Myanmar, as seismic sources there produce earthquakes at short intervals and/or their last events occurred long ago. The hazard curves for the cities of Bago, Mandalay, Sagaing, Taungoo and Yangon show higher hazards for sites close to an active fault or with a low Vs30, e.g., downtown Sagaing and the Shwemawdaw Pagoda in Bago.

  10. A common philosophy and FORTRAN 77 software package for implementing and searching sequence databases.

    PubMed

    Claverie, J M

    1984-01-11

    I present a common philosophy for implementing the EMBL and GENBANK (BBN-Los Alamos) nucleic acid sequence databases, as well as the National Biomedical Research Foundation (Dayhoff) protein sequence database. The associated, fully transportable FORTRAN 77 software package includes: 1) modules for implementing each of these databases from the initial magnetic tape file, 2) modules performing fast mnemonic access, 3) modules performing key-string access and allowing the definition of user-specific database subsets, and 4) a common probe-searching module allowing the stacking of multiple combined search requests over the databases. This software is particularly suitable for 32-bit mini/microcomputers but would eventually run on 16-bit computers.

  11. EasyKSORD: A Platform of Keyword Search Over Relational Databases

    NASA Astrophysics Data System (ADS)

    Peng, Zhaohui; Li, Jing; Wang, Shan

    Keyword Search Over Relational Databases (KSORD) enables casual users to use keyword queries (a set of keywords) to search relational databases just like searching the Web, without any knowledge of the database schema or any need of writing SQL queries. Based on our previous work, we design and implement a novel KSORD platform named EasyKSORD through which users and system administrators can use and manage different KSORD systems in a simple manner. EasyKSORD supports advanced queries, efficient data-graph-based search engines, multiform result presentations, and system logging and analysis. Through EasyKSORD, users can search relational databases easily and read search results conveniently, and system administrators can easily monitor and analyze the operations of KSORD and manage KSORD systems much better.

  12. Probabilistic person identification in TV news programs using image web database

    NASA Astrophysics Data System (ADS)

    Battisti, F.; Carli, M.; Leo, M.; Neri, A.

    2014-02-01

    The automatic labeling of faces in TV broadcasting is still a challenging problem. The high variability in viewpoints, facial expressions, general appearance, and lighting conditions, as well as occlusions, rapid shot changes, and camera motions, produce significant variations in image appearance. The application of automatic tools for face recognition is not yet fully established, and human intervention is still needed. In this paper, we deal with automatic face recognition in TV broadcasting programs. The target of the proposed method is to identify the presence of a specific person in a video by means of a set of images downloaded from the Web using a specific search key.

  13. Compressed binary bit trees: a new data structure for accelerating database searching.

    PubMed

    Smellie, Andrew

    2009-02-01

    Molecules are often represented as bit-string fingerprints in databases. These bit strings are used for similarity searching with the Tanimoto coefficient and for rapid indexing. A new data structure, the compressed binary bit tree, is introduced that speeds up searching and indexing by up to a factor of 30. Results are shown for databases of up to 1 M compounds with a variety of search parameters.
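
    The underlying similarity measure is easy to illustrate; the tree data structure itself is not reproduced here. A minimal sketch of Tanimoto screening over fingerprints stored as Python integers, with made-up molecules and an assumed 0.7 threshold:

```python
# Tanimoto similarity on bit-string fingerprints, plus a linear-scan search.

def tanimoto(fp_a: int, fp_b: int) -> float:
    common = bin(fp_a & fp_b).count("1")
    union = bin(fp_a | fp_b).count("1")
    return common / union if union else 1.0

def search(database, query, threshold=0.7):
    return [(name, tanimoto(fp, query))
            for name, fp in database
            if tanimoto(fp, query) >= threshold]

db = [("mol1", 0b1011_0110), ("mol2", 0b0001_0001), ("mol3", 0b1011_0111)]
print(search(db, query=0b1011_0010))   # [('mol1', 0.8)]
```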

  14. Searching Databases without Query-Building Aids: Implications for Dyslexic Users

    ERIC Educational Resources Information Center

    Berget, Gerd; Sandnes, Frode Eika

    2015-01-01

    Introduction: Few studies document the information searching behaviour of users with cognitive impairments. This paper therefore addresses the effect of dyslexia on information searching in a database with no tolerance for spelling errors and no query-building aids. The purpose was to identify effective search interface design guidelines that…

  15. Interspecies extrapolation based on the RepDose database--a probabilistic approach.

    PubMed

    Escher, Sylvia E; Batke, Monika; Hoffmann-Doerr, Simone; Messinger, Horst; Mangelsdorf, Inge

    2013-04-12

    Repeated dose toxicity studies from the RepDose database (DB) were used to determine interspecies differences for rats and mice. NOEL (no observed effect level) ratios based on systemic effects were investigated for three different types of exposure: inhalation, oral food/drinking water and oral gavage. Furthermore, NOEL ratios for local effects in inhalation studies were evaluated. On the basis of the NOEL ratio distributions, interspecies assessment factors (AF) are evaluated. All data sets were best described by a lognormal distribution. No difference was seen between inhalation and oral exposure for systemic effects. Rats and mice were on average equally sensitive at equipotent doses with geometric mean (GM) values of 1 and geometric standard deviation (GSD) values ranging from 2.30 to 3.08. The local AF based on inhalation exposure resulted in a similar distribution with GM values of 1 and GSD values between 2.53 and 2.70. Our analysis confirms former analyses on interspecies differences, including also dog and human data. Furthermore it supports the principle of allometric scaling according to caloric demand in the case that body doses are applied. In conclusion, an interspecies distribution animal/human with a GM equal to allometric scaling and a GSD of 2.5 was derived.
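
    For readers unfamiliar with the summary statistics quoted above, the geometric mean (GM) and geometric standard deviation (GSD) of a set of ratios are computed on the log scale, consistent with a lognormal fit. A small sketch with made-up NOEL ratios:

```python
# GM and GSD of NOEL ratios under a lognormal assumption.
import math

def gm_gsd(ratios):
    logs = [math.log(r) for r in ratios]
    mean = sum(logs) / len(logs)
    var = sum((x - mean) ** 2 for x in logs) / (len(logs) - 1)
    return math.exp(mean), math.exp(math.sqrt(var))

noel_ratios = [0.4, 0.8, 1.0, 1.1, 1.6, 2.5, 3.0]   # illustrative values
gm, gsd = gm_gsd(noel_ratios)
print(f"GM = {gm:.2f}, GSD = {gsd:.2f}")
```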

  16. Adaptive search in mobile peer-to-peer databases

    NASA Technical Reports Server (NTRS)

    Wolfson, Ouri (Inventor); Xu, Bo (Inventor)

    2010-01-01

    Information is stored in a plurality of mobile peers. The peers communicate in a peer to peer fashion, using a short-range wireless network. Occasionally, a peer initiates a search for information in the peer to peer network by issuing a query. Queries and pieces of information, called reports, are transmitted among peers that are within a transmission range. For each search additional peers are utilized, wherein these additional peers search and relay information on behalf of the originator of the search.

  17. Content Evaluation of Textual CD-ROM and Web Databases. Database Searching Series.

    ERIC Educational Resources Information Center

    Jacso, Peter

    This book provides guidelines for evaluating a variety of database types, including abstracting and indexing, directory, full-text, and page-image databases available in online and/or CD-ROM formats. The book discusses the purpose and techniques of comparing and evaluating the most important characteristics of textual databases, such as their…

  18. Federated or cached searches: Providing expected performance from multiple invasive species databases

    NASA Astrophysics Data System (ADS)

    Graham, Jim; Jarnevich, Catherine S.; Simpson, Annie; Newman, Gregory J.; Stohlgren, Thomas J.

    2011-06-01

    Invasive species are a universal global problem, but the information to identify them, manage them, and prevent invasions is stored around the globe in a variety of formats. The Global Invasive Species Information Network is a consortium of organizations working toward providing seamless access to these disparate databases via the Internet. A distributed network of databases can be created using the Internet and a standard web service protocol. There are two options to provide this integration. First, federated searches are being proposed to allow users to search "deep" web documents such as databases for invasive species. A second method is to create a cache of data from the databases for searching. We compare these two methods, and show that federated searches will not provide the performance and flexibility required by users; a central cache of the data is required to improve performance.

  19. Federated or cached searches: providing expected performance from multiple invasive species databases

    USGS Publications Warehouse

    Graham, Jim; Jarnevich, Catherine S.; Simpson, Annie; Newman, Gregory J.; Stohlgren, Thomas J.

    2011-01-01

    Invasive species are a universal global problem, but the information to identify them, manage them, and prevent invasions is stored around the globe in a variety of formats. The Global Invasive Species Information Network is a consortium of organizations working toward providing seamless access to these disparate databases via the Internet. A distributed network of databases can be created using the Internet and a standard web service protocol. There are two options to provide this integration. First, federated searches are being proposed to allow users to search “deep” web documents such as databases for invasive species. A second method is to create a cache of data from the databases for searching. We compare these two methods, and show that federated searches will not provide the performance and flexibility required by users; a central cache of the data is required to improve performance.

  20. Algorithms for database-dependent search of MS/MS data.

    PubMed

    Matthiesen, Rune

    2013-01-01

    The frequently used bottom-up strategy for identification of proteins and their associated modifications nowadays typically generates thousands of MS/MS spectra that are normally matched automatically against a protein sequence database. Search engines that take as input MS/MS spectra and a protein sequence database are referred to as database-dependent search engines. Many programs, both commercial and freely available, exist for database-dependent search of MS/MS spectra, and most of them have excellent user documentation. The aim here is therefore to outline the algorithmic strategies behind different search engines rather than providing software user manuals. The process of database-dependent search can be divided into search strategy, peptide scoring, protein scoring, and finally protein inference. Most efforts in the literature have gone into comparing results from different software rather than discussing the underlying algorithms. Such practical comparisons can be cluttered by suboptimal implementations, and the observed differences are frequently caused by software parameter settings that have not been set properly to allow an even comparison. In other words, an algorithmic idea can still be worth considering even if the software implementation has been demonstrated to be suboptimal. The aim in this chapter is therefore to split the algorithms for database-dependent searching of MS/MS data into the above steps so that the different algorithmic ideas become more transparent and comparable. Most search engines provide good implementations of the first three data analysis steps mentioned above, whereas the final step of protein inference is much less developed for most search engines and is in many cases performed by external software. The final part of this chapter illustrates how protein inference is built into the VEMS search engine and discusses a stand-alone program, SIR, for protein inference that can import a Mascot search result.
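
    Of the steps named above, peptide scoring is the easiest to illustrate. The hedged sketch below predicts singly charged b- and y-ion m/z values for an unmodified candidate peptide and counts how many spectrum peaks they explain (a shared-peak count); real engines refine such raw counts into probabilistic scores. The residue mass table is abridged and the spectrum is a toy.

```python
# Shared-peak-count scoring of one candidate peptide against one spectrum.
MONO = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
        "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
        "D": 115.02694, "K": 128.09496, "E": 129.04259, "F": 147.06841,
        "R": 156.10111, "Y": 163.06333}
PROTON, WATER = 1.00728, 18.01056

def fragment_mzs(peptide):
    mzs = []
    for i in range(1, len(peptide)):
        mzs.append(sum(MONO[aa] for aa in peptide[:i]) + PROTON)          # b ion
        mzs.append(sum(MONO[aa] for aa in peptide[i:]) + WATER + PROTON)  # y ion
    return mzs

def shared_peak_count(peptide, peaks, tol=0.02):
    return sum(any(abs(mz - p) <= tol for p in peaks)
               for mz in fragment_mzs(peptide))

peaks = [98.060, 147.113, 276.155, 400.000]   # toy spectrum m/z values
print(shared_peak_count("PEK", peaks))        # 3 of the 4 predicted ions match
```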

  1. Optimal design of groundwater remediation systems using a probabilistic multi-objective fast harmony search algorithm under uncertainty

    NASA Astrophysics Data System (ADS)

    Luo, Q.; Wu, J.; Qian, J.

    2013-12-01

    This study develops a new probabilistic multi-objective fast harmony search algorithm (PMOFHS) for optimal design of groundwater remediation systems under uncertainty associated with the hydraulic conductivity of aquifers. The PMOFHS integrates the previously developed deterministic multi-objective optimization method, namely the multi-objective fast harmony search algorithm (MOFHS), with a probabilistic Pareto domination ranking and a probabilistic niche technique to search for Pareto-optimal solutions to multi-objective optimization problems in a noisy hydrogeological environment arising from insufficient hydraulic conductivity data. The PMOFHS is then coupled with the commonly used flow and transport codes, MODFLOW and MT3DMS, to identify the optimal groundwater remediation system of a two-dimensional hypothetical test problem involving two objectives: (i) minimization of the total remediation cost through the engineering planning horizon, and (ii) minimization of the percentage of mass remaining in the aquifer at the end of the operational period, which uses the Pump-and-Treat (PAT) technology to clean up contaminated groundwater. Also, Monte Carlo (MC) analysis is used to demonstrate the effectiveness of the proposed methodology. The MC analysis is applied to each Pareto solution for every hydraulic conductivity (K) realization. Then the statistical mean and the upper and lower bounds of uncertainty intervals at the 95% confidence level are calculated. The MC analysis results show that all of the Pareto-optimal solutions are located between the upper and lower bounds of the MC analysis. Moreover, the root mean square errors (RMSEs) between the Pareto-optimal solutions by the PMOFHS and the average values of optimal solutions by the MC analysis are 0.0204 for the first objective and 0.0318 for the second objective, considerably smaller than the RMSEs between the results by the existing probabilistic multi-objective genetic algorithm (PMOGA) and the MC analysis, 0.0384 and 0.0397, respectively.

  2. Using the Turning Research Into Practice (TRIP) database: how do clinicians really search?*

    PubMed Central

    Meats, Emma; Brassey, Jon; Heneghan, Carl; Glasziou, Paul

    2007-01-01

    Objectives: Clinicians and patients are increasingly accessing information through Internet searches. This study aimed to examine clinicians' current search behavior when using the Turning Research Into Practice (TRIP) database to examine search engine use and the ways it might be improved. Methods: A Web log analysis was undertaken of the TRIP database—a meta-search engine covering 150 health resources including MEDLINE, The Cochrane Library, and a variety of guidelines. The connectors for terms used in searches were studied, and observations were made of 9 users' search behavior when working with the TRIP database. Results: Of 620,735 searches, most used a single term, and 12% (n = 75,947) used a Boolean operator: 11% (n = 69,006) used “AND” and 0.8% (n = 4,941) used “OR.” Of the elements of a well-structured clinical question (population, intervention, comparator, and outcome), the population was most commonly used, while fewer searches included the intervention. Comparator and outcome were rarely used. Participants in the observational study were interested in learning how to formulate better searches. Conclusions: Web log analysis showed most searches used a single term and no Boolean operators. Observational study revealed users were interested in conducting efficient searches but did not always know how. Therefore, either better training or better search interfaces are required to assist users and enable more effective searching. PMID:17443248

  3. Searching for Controlled Trials of Complementary and Alternative Medicine: A Comparison of 15 Databases

    PubMed Central

    Cogo, Elise; Sampson, Margaret; Ajiferuke, Isola; Manheimer, Eric; Campbell, Kaitryn; Daniel, Raymond; Moher, David

    2011-01-01

    This project aims to assess the utility of bibliographic databases beyond the three major ones (MEDLINE, EMBASE and Cochrane CENTRAL) for finding controlled trials of complementary and alternative medicine (CAM). Fifteen databases were searched to identify controlled clinical trials (CCTs) of CAM not also indexed in MEDLINE. Searches were conducted in May 2006 using the revised Cochrane highly sensitive search strategy (HSSS) and the PubMed CAM Subset. Yield of CAM trials per 100 records was determined, and databases were compared over a standardized period (2005). The Acudoc2 RCT, Acubriefs, Index to Chiropractic Literature (ICL) and Hom-Inform databases had the highest concentrations of non-MEDLINE records, with more than 100 non-MEDLINE records per 500. Other productive databases had ratios between 500 and 1500 records to 100 non-MEDLINE records—these were AMED, MANTIS, PsycINFO, CINAHL, Global Health and Alt HealthWatch. Five databases were found to be unproductive: AGRICOLA, CAIRSS, Datadiwan, Herb Research Foundation and IBIDS. Acudoc2 RCT yielded 100 CAM trials in the most recent 100 records screened. Acubriefs, AMED, Hom-Inform, MANTIS, PsycINFO and CINAHL had more than 25 CAM trials per 100 records screened. Global Health, ICL and Alt HealthWatch were below 25 in yield. There were 255 non-MEDLINE trials from eight databases in 2005, with only 10% indexed in more than one database. Yield varied greatly between databases; the most productive databases from both sampling methods were Acubriefs, Acudoc2 RCT, AMED and CINAHL. Low overlap between databases indicates comprehensive CAM literature searches will require multiple databases. PMID:19468052

  4. Research Report on Development of a Probabilistic Author Search and Matching Technique for Retrieval and Creation of Bibliographic Records.

    ERIC Educational Resources Information Center

    Hickey, Thomas B.

    Using macro and micro-structure analysis of large files of personal author names, this study developed retrieval techniques and algorithms to automatically correct and/or flag typographical errors in names, identify names in a database that are similar to a name entered by a user during a search, and measure similarities between names. It was…

  5. Techniques for searching the CINAHL database using the EBSCO interface.

    PubMed

    Lawrence, Janna C

    2007-04-01

    The Cumulative Index to Nursing and Allied Health Literature (CINAHL) is a useful research tool for accessing articles of interest to nurses and health care professionals. More than 2,800 journals are indexed by CINAHL and can be searched easily using assigned subject headings. Detailed instructions about conducting, combining, and saving searches in CINAHL are provided in this article. Establishing an account at EBSCO further allows a nurse to save references and searches and to receive e-mail alerts when new articles on a topic of interest are published.

  6. New showers from parent body search across several video meteor databases

    NASA Astrophysics Data System (ADS)

    Šegon, Damir; Gural, Peter; Andreić, Željko; Skokić, Ivica; Korlević, Korado; Vida, Denis; Novoselnik, Filip

    2014-04-01

    This work was initiated by utilizing the latest complete set of both comets and NEOs downloaded from the JPL small-body database search engine. Rather than search for clustering within a single given database of meteor orbits, the method employed herein is to use all the known parent bodies with their individual orbital elements as the starting point, and to find statistically significant associations across a variety of meteor databases. Fifteen new showers possibly related to a comet or a NEO were found.

  7. STEPS: a grid search methodology for optimized peptide identification filtering of MS/MS database search results.

    PubMed

    Piehowski, Paul D; Petyuk, Vladislav A; Sandoval, John D; Burnum, Kristin E; Kiebel, Gary R; Monroe, Matthew E; Anderson, Gordon A; Camp, David G; Smith, Richard D

    2013-03-01

    For bottom-up proteomics, there is a wide variety of database-searching algorithms in use for matching peptide sequences to tandem MS spectra. Likewise, there are numerous strategies being employed to produce a confident list of peptide identifications from the different search algorithm outputs. Here we introduce a grid-search approach for determining optimal database filtering criteria in shotgun proteomics data analyses that is easily adaptable to any search. Systematic Trial and Error Parameter Selection--referred to as STEPS--utilizes user-defined parameter ranges to test a wide array of parameter combinations to arrive at an optimal "parameter set" for data filtering, thus maximizing confident identifications. The benefits of this approach in terms of numbers of true-positive identifications are demonstrated using datasets derived from immunoaffinity-depleted blood serum and a bacterial cell lysate, two common proteomics sample types.

  8. STEPS: A Grid Search Methodology for Optimized Peptide Identification Filtering of MS/MS Database Search Results

    SciTech Connect

    Piehowski, Paul D.; Petyuk, Vladislav A.; Sandoval, John D.; Burnum, Kristin E.; Kiebel, Gary R.; Monroe, Matthew E.; Anderson, Gordon A.; Camp, David G.; Smith, Richard D.

    2013-03-01

    For bottom-up proteomics there are a wide variety of database searching algorithms in use for matching peptide sequences to tandem MS spectra. Likewise, there are numerous strategies being employed to produce a confident list of peptide identifications from the different search algorithm outputs. Here we introduce a grid search approach for determining optimal database filtering criteria in shotgun proteomics data analyses that is easily adaptable to any search. Systematic Trial and Error Parameter Selection - referred to as STEPS - utilizes user-defined parameter ranges to test a wide array of parameter combinations to arrive at an optimal "parameter set" for data filtering, thus maximizing confident identifications. The benefits of this approach in terms of numbers of true positive identifications are demonstrated using datasets derived from immunoaffinity-depleted blood serum and a bacterial cell lysate, two common proteomics sample types.
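
    The grid-search idea in these two records lends itself to a compact sketch: sweep every combination of user-defined filter thresholds and keep the combination that maximizes target identifications while the decoy-estimated FDR stays within a limit. The PSM fields, parameter ranges and 1% FDR limit below are illustrative assumptions, not STEPS defaults.

```python
# Grid search over filtering parameters, maximizing IDs at a fixed FDR.
from itertools import product

def grid_search(psms, score_cuts, mass_err_cuts, fdr_limit=0.01):
    """psms: (score, mass_error_ppm, is_decoy) tuples."""
    best = (0, None)
    for score_cut, err_cut in product(score_cuts, mass_err_cuts):
        kept = [(s, d) for s, e, d in psms
                if s >= score_cut and abs(e) <= err_cut]
        targets = sum(1 for _, d in kept if not d)
        decoys = len(kept) - targets
        if targets and decoys / targets <= fdr_limit:
            best = max(best, (targets, (score_cut, err_cut)))
    return best

if __name__ == "__main__":
    import random
    random.seed(2)
    psms = ([(random.gauss(3, 1), random.gauss(0, 3), False) for _ in range(500)]
            + [(random.gauss(1, 1), random.gauss(0, 8), True) for _ in range(500)])
    n, params = grid_search(psms, score_cuts=[1, 2, 3, 4],
                            mass_err_cuts=[2, 5, 10])
    print(f"best parameter set {params} keeps {n} target PSMs")
```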

  9. Databases and Search Services North of the Border.

    ERIC Educational Resources Information Center

    Janke, Richard V.

    This paper presents an overview of the major Canadian online systems, which account for approximately 5% of information vendors and databases worldwide. Also described is DATAPAC, Canada's national telecommunications network, and DATAPAC's X.75 interface with TELENET, TYMNET, UNINET, and France's TRANSPAC. The online systems described include: (1)…

  10. A Practical Introduction to Non-Bibliographic Database Searching.

    ERIC Educational Resources Information Center

    Rocke, Hans J.; And Others

    This guide comprises four reports on the Laboratory Animal Data Bank (LADB), the National Institutes of Health/Environmental Protection Agency (NIH/EPA) Chemical Information System (CIS), nonbibliographic databases for the social sciences, and the Toxicology Data Bank (TDB) and Registry of Toxic Effects of Chemical Substances (RTECS). The first…

  11. A searching and reporting system for relational databases using a graph-based metadata representation.

    PubMed

    Hewitt, Robin; Gobbi, Alberto; Lee, Man-Ling

    2005-01-01

    Relational databases are the current standard for storing and retrieving data in the pharmaceutical and biotech industries. However, retrieving data from a relational database requires specialized knowledge of the database schema and of the SQL query language. At Anadys, we have developed an easy-to-use system for searching and reporting data in a relational database to support our drug discovery project teams. This system is fast and flexible and allows users to access all data without having to write SQL queries. This paper presents the hierarchical, graph-based metadata representation and SQL-construction methods that, together, are the basis of this system's capabilities.
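
    A minimal sketch of the general approach described here (not Anadys' system): represent foreign-key links as a graph over tables, find a join path between the tables the user's selections touch, and emit the SQL mechanically. The table and column names are invented for illustration.

```python
# Build a SQL query from a foreign-key graph via BFS over join paths.
from collections import deque

# graph: table -> {neighbor_table: (column, neighbor_column)}
FK_GRAPH = {
    "compound": {"assay_result": ("id", "compound_id")},
    "assay_result": {"compound": ("compound_id", "id"),
                     "assay": ("assay_id", "id")},
    "assay": {"assay_result": ("id", "assay_id")},
}

def join_path(start, goal):
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in FK_GRAPH[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

def build_sql(columns, start, goal):
    path = join_path(start, goal)
    sql = f"SELECT {', '.join(columns)} FROM {path[0]}"
    for a, b in zip(path, path[1:]):
        col_a, col_b = FK_GRAPH[a][b]
        sql += f" JOIN {b} ON {a}.{col_a} = {b}.{col_b}"
    return sql

print(build_sql(["compound.id", "assay.name"], "compound", "assay"))
```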

  12. Social Work Literature Searching: Current Issues with Databases and Online Search Engines

    ERIC Educational Resources Information Center

    McGinn, Tony; Taylor, Brian; McColgan, Mary; McQuilkan, Janice

    2016-01-01

    Objectives: To compare the performance of a range of search facilities; and to illustrate the execution of a comprehensive literature search for qualitative evidence in social work. Context: Developments in literature search methods and comparisons of search facilities help facilitate access to the best available evidence for social workers.…

  13. Searching for planning operators with context-dependent and probabilistic effects

    SciTech Connect

    Oates, T.; Cohen, P.R.

    1996-12-31

    Providing a complete and accurate domain model for an agent situated in a complex environment can be an extremely difficult task. Actions may have different effects depending on the context in which they are taken, and actions may or may not induce their intended effects, with the probability of success again depending on context. We present an algorithm for automatically learning planning operators with context-dependent and probabilistic effects in environments where exogenous events change the state of the world. Empirical results show that the algorithm successfully finds operators that capture the true structure of an agent's interactions with its environment, and avoids spurious associations between actions and exogenous events.

  14. MIDAS: a database-searching algorithm for metabolite identification in metabolomics.

    PubMed

    Wang, Yingfeng; Kora, Guruprasad; Bowen, Benjamin P; Pan, Chongle

    2014-10-01

    A database searching approach can be used for metabolite identification in metabolomics by matching measured tandem mass spectra (MS/MS) against the predicted fragments of metabolites in a database. Here, we present the open-source MIDAS algorithm (Metabolite Identification via Database Searching). To evaluate a metabolite-spectrum match (MSM), MIDAS first enumerates possible fragments from a metabolite by systematic bond dissociation, then calculates the plausibility of the fragments based on their fragmentation pathways, and finally scores the MSM to assess how well the experimental MS/MS spectrum from collision-induced dissociation (CID) is explained by the metabolite's predicted CID MS/MS spectrum. MIDAS was designed to search high-resolution tandem mass spectra acquired on time-of-flight or Orbitrap mass spectrometers against a metabolite database in an automated and high-throughput manner. The accuracy of metabolite identification by MIDAS was benchmarked using four sets of standard tandem mass spectra from MassBank. On average, for 77% of original spectra and 84% of composite spectra, MIDAS correctly ranked the true compounds as the first MSMs out of all MetaCyc metabolites as decoys. MIDAS correctly identified 46% more original spectra and 59% more composite spectra at the first MSMs than an existing database-searching algorithm, MetFrag. MIDAS was showcased by searching a published real-world measurement of a metabolome from Synechococcus sp. PCC 7002 against the MetaCyc metabolite database. MIDAS identified many metabolites missed in the previous study. MIDAS identifications should be considered only as candidate metabolites, which need to be confirmed using standard compounds. To facilitate manual validation, MIDAS provides annotated spectra for MSMs and labels observed mass spectral peaks with predicted fragments. The database searching and manual validation can be performed online at http://midas.omicsbio.org.

  15. Sagace: A web-based search engine for biomedical databases in Japan

    PubMed Central

    2012-01-01

    Background In the big data era, biomedical research continues to generate a large amount of data, and the generated information is often stored in a database and made publicly available. Although combining data from multiple databases should accelerate further studies, the current number of life science databases is too large for researchers to grasp the features and contents of each one. Findings We have developed Sagace, a web-based search engine that enables users to retrieve information from a range of biological databases (such as gene expression profiles and proteomics data) and biological resource banks (such as mouse models of disease and cell lines). With Sagace, users can search more than 300 databases in Japan. Sagace offers features tailored to biomedical research, including manually tuned ranking, faceted navigation to refine search results, and rich snippets constructed with retrieved metadata for each database entry. Conclusions Sagace will be valuable for experts who are involved in biomedical research and drug development in both academia and industry. Sagace is freely available at http://sagace.nibio.go.jp/en/. PMID:23110816

  16. Global search tool for the Advanced Photon Source Integrated Relational Model of Installed Systems (IRMIS) database.

    SciTech Connect

    Quock, D. E. R.; Cianciarulo, M. B.; APS Engineering Support Division; Purdue Univ.

    2007-01-01

    The Integrated Relational Model of Installed Systems (IRMIS) is a relational database tool that has been implemented at the Advanced Photon Source to maintain an updated account of approximately 600 control system software applications, 400,000 process variables, and 30,000 control system hardware components. To effectively display this large amount of control system information to operators and engineers, IRMIS was initially built with nine Web-based viewers: Applications Organizing Index, IOC, PLC, Component Type, Installed Components, Network, Controls Spares, Process Variables, and Cables. However, since each viewer is designed to provide details from only one major category of the control system, the necessity for a one-stop global search tool for the entire database became apparent. The user requirements for extremely fast database search time and ease of navigation through search results led to the choice of Asynchronous JavaScript and XML (AJAX) technology in the implementation of the IRMIS global search tool. Unique features of the global search tool include a two-tier level of displayed search results, and a database data integrity validation and reporting mechanism.

  17. Learning from decoys to improve the sensitivity and specificity of proteomics database search results.

    PubMed

    Yadav, Amit Kumar; Kumar, Dhirendra; Dash, Debasis

    2012-01-01

    The statistical validation of database search results is a complex issue in bottom-up proteomics. The correct and incorrect peptide spectrum match (PSM) scores overlap significantly, making an accurate assessment of true peptide matches challenging. Since complete separation between the true and false hits is practically never achieved, there is a need for better methods and rescoring algorithms to improve upon the primary database search results. Here we describe the calibration and False Discovery Rate (FDR) estimation of database search scores through a dynamic FDR calculation method, FlexiFDR, which increases both the sensitivity and specificity of search results. By modelling a simple linear regression on the decoy hits for different charge states, the method maximized the number of true positives and reduced the number of false negatives in several standard datasets of varying complexity (18-mix, 49-mix, 200-mix) and a few complex datasets (E. coli and Yeast) obtained from a wide variety of MS platforms. The net positive gain for correct spectral and peptide identifications was up to 14.81% and 6.2%, respectively. The approach is applicable to different search methodologies--separate as well as concatenated database search, high mass accuracy, and semi-tryptic and modification searches. FlexiFDR was also applied to Mascot results and showed better performance than before. We have shown that an appropriate threshold learnt from decoys can be very effective in improving database search results. FlexiFDR adapts itself to different instruments, data types and MS platforms. It learns from the decoy hits and sets a flexible threshold that automatically aligns itself to the underlying variables of data quality and size.
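
    FlexiFDR's regression model is not reproduced here, but the underlying idea of thresholds that adapt to data strata can be sketched: compute a separate score cutoff per precursor charge state so that each stratum meets the target FDR on its own decoys, rather than applying one global cutoff. Field names and the demo data are illustrative.

```python
# Per-charge-state score cutoffs from decoy hits (simplified illustration).
from collections import defaultdict

def per_charge_cutoffs(psms, alpha=0.01):
    """psms: (score, charge, is_decoy) tuples; returns {charge: cutoff}."""
    by_charge = defaultdict(list)
    for score, charge, is_decoy in psms:
        by_charge[charge].append((score, is_decoy))
    cutoffs = {}
    for charge, group in by_charge.items():
        group.sort(reverse=True)                 # best scores first
        targets = decoys = 0
        cutoff = float("inf")
        for score, is_decoy in group:
            decoys += is_decoy
            targets += not is_decoy
            if decoys / max(targets, 1) > alpha:
                break
            cutoff = score                       # last score still within FDR
        cutoffs[charge] = cutoff
    return cutoffs

if __name__ == "__main__":
    import random
    random.seed(3)
    psms = ([(random.gauss(8 + z, 2), z, False) for z in (2, 3) for _ in range(200)]
            + [(random.gauss(4 + z, 2), z, True) for z in (2, 3) for _ in range(200)])
    print(per_charge_cutoffs(psms))
```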

  18. The LAILAPS search engine: a feature model for relevance ranking in life science databases.

    PubMed

    Lange, Matthias; Spies, Karl; Colmsee, Christian; Flemming, Steffen; Klapperstück, Matthias; Scholz, Uwe

    2010-03-25

    Efficient and effective information retrieval in the life sciences is one of the most pressing challenges in bioinformatics. The incredible growth of life science databases into a vast network of interconnected information systems is to the same extent a big challenge and a great chance for life science research. The knowledge found on the Web, and in particular in life-science databases, is a valuable major resource. In order to bring it to the scientist's desktop, it is essential to have well-performing search engines. Here, neither the response time nor the number of results is the decisive factor: for millions of query results, the most crucial factor is the relevance ranking. In this paper, we present a feature model for relevance ranking in life science databases and its implementation in the LAILAPS search engine. Motivated by observing user behaviour during the inspection of search engine results, we condensed a set of 9 relevance-discriminating features. These features are intuitively used by scientists who briefly screen database entries for potential relevance. The features are both sufficient to estimate potential relevance and efficiently quantifiable. The derivation of a relevance prediction function that computes the relevance from these features constitutes a regression problem. To solve this problem, we used artificial neural networks trained with a reference set of relevant database entries for 19 protein queries. Supporting a flexible text index and a simple data import format, these concepts are implemented in the LAILAPS search engine. It can easily be used both as a search engine for comprehensive integrated life science databases and for small in-house project databases. LAILAPS is publicly available for SWISSPROT data at http://lailaps.ipk-gatersleben.de.
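
    The regression problem described above can be illustrated in a few lines. A linear least-squares model stands in for the paper's artificial neural network, and the four feature names and training labels are invented for illustration.

```python
# Learning a relevance-prediction function from labelled database entries.
import numpy as np

# Each row: quantified relevance features of one database entry, e.g.
# (query term frequency, hit in title?, annotation quality, citation signal).
X = np.array([[0.9, 1, 0.8, 0.7],
              [0.2, 0, 0.4, 0.1],
              [0.7, 1, 0.5, 0.3],
              [0.1, 0, 0.2, 0.9]])
y = np.array([1.0, 0.1, 0.8, 0.2])      # relevance judged by curators

# Least-squares fit of w in y ~ [X, 1] w (the regression problem in the text).
w, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], y, rcond=None)

def relevance(features):
    return float(np.dot(np.append(features, 1.0), w))

print(round(relevance([0.8, 1, 0.6, 0.5]), 2))   # rank new hits by this score
```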

  19. Using homology relations within a database markedly boosts protein sequence similarity search.

    PubMed

    Tong, Jing; Sadreyev, Ruslan I; Pei, Jimin; Kinch, Lisa N; Grishin, Nick V

    2015-06-01

    Inference of homology from protein sequences provides an essential tool for analyzing protein structure, function, and evolution. Current sequence-based homology search methods are still unable to detect many similarities evident from protein spatial structures. In computer science a search engine can be improved by considering networks of known relationships within the search database. Here, we apply this idea to protein-sequence-based homology search and show that it dramatically enhances the search accuracy. Our new method, COMPADRE (COmparison of Multiple Protein sequence Alignments using Database RElationships) assesses the relationship between the query sequence and a hit in the database by considering the similarity between the query and hit's known homologs. This approach increases detection quality, boosting the precision rate from 18% to 83% at half-coverage of all database homologs. The increased precision rate allows detection of a large fraction of protein structural relationships, thus providing structure and function predictions for previously uncharacterized proteins. Our results suggest that this general approach is applicable to a wide variety of methods for detection of biological similarities. The web server is available at prodata.swmed.edu/compadre.

  20. Using homology relations within a database markedly boosts protein sequence similarity search

    PubMed Central

    Tong, Jing; Sadreyev, Ruslan I.; Pei, Jimin; Kinch, Lisa N.; Grishin, Nick V.

    2015-01-01

    Inference of homology from protein sequences provides an essential tool for analyzing protein structure, function, and evolution. Current sequence-based homology search methods are still unable to detect many similarities evident from protein spatial structures. In computer science a search engine can be improved by considering networks of known relationships within the search database. Here, we apply this idea to protein-sequence–based homology search and show that it dramatically enhances the search accuracy. Our new method, COMPADRE (COmparison of Multiple Protein sequence Alignments using Database RElationships) assesses the relationship between the query sequence and a hit in the database by considering the similarity between the query and hit’s known homologs. This approach increases detection quality, boosting the precision rate from 18% to 83% at half-coverage of all database homologs. The increased precision rate allows detection of a large fraction of protein structural relationships, thus providing structure and function predictions for previously uncharacterized proteins. Our results suggest that this general approach is applicable to a wide variety of methods for detection of biological similarities. The web server is available at prodata.swmed.edu/compadre. PMID:26038555

  2. Development and Validation of Search Filters to Identify Articles on Family Medicine in Online Medical Databases.

    PubMed

    Pols, David H J; Bramer, Wichor M; Bindels, Patrick J E; van de Laar, Floris A; Bohnen, Arthur M

    2015-01-01

    Physicians and researchers in the field of family medicine often need to find relevant articles in online medical databases for a variety of reasons. Because a search filter may help improve the efficiency and quality of such searches, we aimed to develop and validate search filters to identify research studies of relevance to family medicine. Using a new and objective method for search filter development, we developed and validated 2 search filters for family medicine. The sensitive filter had a sensitivity of 96.8% and a specificity of 74.9%. The specific filter had a specificity of 97.4% and a sensitivity of 90.3%. Our new filters should aid literature searches in the family medicine field. The sensitive filter may help researchers conducting systematic reviews, whereas the specific filter may help family physicians find answers to clinical questions at the point of care when time is limited.

  3. Search extension transforms Wiki into a relational system: A case for flavonoid metabolite database

    PubMed Central

    Arita, Masanori; Suwa, Kazuhiro

    2008-01-01

    Background In computer science, database systems are based on the relational model founded by Edgar Codd in 1970. On the other hand, in the area of biology, the word 'database' often refers to loosely formatted, very large text files. Although such bio-databases may describe conflicts or ambiguities (e.g., reports that a protein pair both does and does not interact, or unknown parameters) in a positive sense, the flexibility of the data format sacrifices a systematic query mechanism equivalent to the widely used SQL. Results To overcome this disadvantage, we propose embeddable string-search commands on a Wiki-based system and designed a half-formatted database. As proof of principle, a database of flavonoids with 6902 molecular structures from over 1687 plant species was implemented on MediaWiki, the background system of Wikipedia. Registered users can describe any information in an arbitrary format. The structured part is subject to text-string searches that realize relational operations. The system was written in PHP as an extension of MediaWiki. All modifications are open-source and publicly available. Conclusion This scheme benefits from both the free-format Wiki style and the concise, structured relational-database style. MediaWiki supports multi-user environments for document management, and the cost of database maintenance is alleviated. PMID:18822113

  4. SW#db: GPU-Accelerated Exact Sequence Similarity Database Search

    PubMed Central

    Korpar, Matija; Šošić, Martin; Blažeka, Dino; Šikić, Mile

    2015-01-01

    In recent years we have witnessed growth in sequencing yield and in the number of samples sequenced, and as a result, the growth of publicly maintained sequence databases. This ubiquitous increase of data has placed high demands on protein similarity search algorithms, with two ever-opposing goals: keeping running times acceptable while maintaining a high enough level of sensitivity. The most time-consuming step of similarity search is the local alignment between query and database sequences. This step is usually performed using exact local alignment algorithms such as Smith-Waterman. Due to its quadratic time complexity, alignment of a query against the whole database is usually too slow. Therefore, the majority of protein similarity search methods apply heuristics to reduce the number of candidate sequences in the database before performing the exact local alignment. However, there is still a need for the alignment of a query sequence to a reduced database. In this paper we present the SW#db tool and a library for fast exact similarity search. Although its running times as a standalone tool are comparable to those of BLAST, it is primarily intended to be used for the exact local alignment phase in which the database of sequences has already been reduced. It uses both GPU and CPU parallelization and was 4–5 times faster than SSEARCH, 6–25 times faster than CUDASW++ and more than 20 times faster than SSW at the time of writing, using multiple queries on the Swiss-prot and Uniref90 databases. PMID:26719890
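
    For reference, the exact local alignment step that SW#db accelerates follows the classic Smith-Waterman recurrence; a plain, quadratic-time, scoring-only version (with illustrative scoring parameters) looks like this:

    ```python
    def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
        """Return the best local alignment score of sequences a and b.

        O(len(a) * len(b)) time -- the cost that motivates GPU acceleration
        and the candidate-reduction heuristics described above.
        """
        cols = len(b) + 1
        prev = [0] * cols
        best = 0
        for i in range(1, len(a) + 1):
            curr = [0] * cols
            for j in range(1, cols):
                s = match if a[i - 1] == b[j - 1] else mismatch
                curr[j] = max(0,                  # local alignment: never below zero
                              prev[j - 1] + s,    # diagonal: (mis)match
                              prev[j] + gap,      # gap in b
                              curr[j - 1] + gap)  # gap in a
                best = max(best, curr[j])
            prev = curr
        return best

    print(smith_waterman("HEAGAWGHEE", "PAWHEAE"))
    ```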

  5. SW#db: GPU-Accelerated Exact Sequence Similarity Database Search.

    PubMed

    Korpar, Matija; Šošić, Martin; Blažeka, Dino; Šikić, Mile

    2015-01-01

    In recent years we have witnessed growth in sequencing yield and in the number of samples sequenced, and as a result, the growth of publicly maintained sequence databases. This ubiquitous increase of data has placed high demands on protein similarity search algorithms, with two ever-opposing goals: keeping running times acceptable while maintaining a high enough level of sensitivity. The most time-consuming step of similarity search is the local alignment between query and database sequences. This step is usually performed using exact local alignment algorithms such as Smith-Waterman. Due to its quadratic time complexity, alignment of a query against the whole database is usually too slow. Therefore, the majority of protein similarity search methods apply heuristics to reduce the number of candidate sequences in the database before performing the exact local alignment. However, there is still a need for the alignment of a query sequence to a reduced database. In this paper we present the SW#db tool and a library for fast exact similarity search. Although its running times as a standalone tool are comparable to those of BLAST, it is primarily intended to be used for the exact local alignment phase in which the database of sequences has already been reduced. It uses both GPU and CPU parallelization and was 4-5 times faster than SSEARCH, 6-25 times faster than CUDASW++ and more than 20 times faster than SSW at the time of writing, using multiple queries on the Swiss-prot and Uniref90 databases. PMID:26719890

  6. Using Search Algorithms and Probabilistic Graphical Models to Understand the Influence of Atmospheric Circulation on Western US Drought

    NASA Astrophysics Data System (ADS)

    Malevich, S. B.; Woodhouse, C. A.

    2015-12-01

    This work explores a new approach to quantifying cool-season mid-latitude circulation dynamics as they relate to western US streamflow variability and drought. This information is used to probabilistically associate patterns of synoptic atmospheric circulation with spatial patterns of drought in western US streamflow. Cool-season storms transport moisture from the Pacific Ocean and are a primary source of western US streamflow. Studies over the past several decades have emphasized that western US hydroclimate is influenced by the intensity and phasing of ocean and atmosphere dynamics and teleconnections, such as ENSO and North Pacific variability. These complex interactions are realized in atmospheric circulation along the west coast of North America. The region's atmospheric circulation can encourage a preferential flow in winter storm tracks from the Pacific, and thus influence the moisture conditions of a given river basin over the course of the cool season. These dynamics have traditionally been measured with atmospheric indices based on values from fixed points in space or principal component loadings. This study uses collective search agents to quantify the position and intensity of potentially non-stationary atmospheric features in climate reanalysis datasets, relative to regional hydrology. Results underline the spatio-temporal relationship between semi-permanent atmospheric characteristics and naturalized streamflow from major river basins of the western US. A probabilistic graphical model quantifies this relationship while accounting for uncertainty from noisy climate processes and, eventually, limitations from dataset length. This yields probabilities for semi-permanent atmospheric features that we hope to associate with extreme droughts of the paleo record, based on our understanding of atmosphere-streamflow relations observed in the instrumental record.

  7. Database search for safety information on cosmetic ingredients.

    PubMed

    Pauwels, Marleen; Rogiers, Vera

    2007-12-01

    Ethical considerations with respect to experimental animal use and regulatory testing are under heavy discussion worldwide and are, in certain cases, taken up in legislative measures. The most explicit example is the European cosmetic legislation, which established a testing ban on finished cosmetic products on 11 September 2004 and enforces that the safety of a cosmetic product be assessed by taking into consideration "the general toxicological profile of the ingredients, their chemical structure and their level of exposure" (OJ L151, 32-37, 23 June 1993; OJ L066, 26-35, 11 March 2003). The availability of referenced and reliable information on cosmetic ingredients therefore becomes a dire necessity. Given the high-speed progress of World Wide Web services and the concurrent drastic increase in free access to information, identifying relevant data sources and evaluating the scientific value and quality of the retrieved data are crucial. Based on our own practical experience, we have put together a survey of freely and commercially available data sources, each with a description, field of application, benefits and drawbacks. It should be mentioned that the search strategies described are equally useful as a starting point for any quest for safety data on chemicals or chemical-related substances in general.

  8. Parallel database search and prime factorization with magnonic holographic memory devices

    NASA Astrophysics Data System (ADS)

    Khitun, Alexander

    2015-12-01

    In this work, we describe the capabilities of Magnonic Holographic Memory (MHM) for parallel database search and prime factorization. MHM is a type of holographic device which utilizes spin waves for data transfer and processing. Its operation is based on the correlation between the phases and the amplitudes of the input spin waves and the output inductive voltage. The input of MHM is provided by a phased array of spin wave generating elements, allowing the production of phase patterns of arbitrary form. The latter makes it possible to code logic states into the phases of propagating waves and exploit wave superposition for parallel data processing. We present the results of numerical modeling illustrating parallel database search and prime factorization. The results of numerical simulations on database search are in agreement with the available experimental data. The use of classical wave interference may result in a significant speedup over conventional digital logic circuits in special-task data processing (e.g., √n in database search). Potentially, magnonic holographic devices can be implemented as complementary logic units to digital processors. Physical limitations and technological constraints of the spin wave approach are also discussed.

  9. Parallel database search and prime factorization with magnonic holographic memory devices

    SciTech Connect

    Khitun, Alexander

    2015-12-28

    In this work, we describe the capabilities of Magnonic Holographic Memory (MHM) for parallel database search and prime factorization. MHM is a type of holographic device which utilizes spin waves for data transfer and processing. Its operation is based on the correlation between the phases and the amplitudes of the input spin waves and the output inductive voltage. The input of MHM is provided by a phased array of spin wave generating elements, allowing the production of phase patterns of arbitrary form. The latter makes it possible to code logic states into the phases of propagating waves and exploit wave superposition for parallel data processing. We present the results of numerical modeling illustrating parallel database search and prime factorization. The results of numerical simulations on database search are in agreement with the available experimental data. The use of classical wave interference may result in a significant speedup over conventional digital logic circuits in special-task data processing (e.g., √n in database search). Potentially, magnonic holographic devices can be implemented as complementary logic units to digital processors. Physical limitations and technological constraints of the spin wave approach are also discussed.

  10. Discovering More Chemical Concepts from 3D Chemical Information Searches of Crystal Structure Databases

    ERIC Educational Resources Information Center

    Rzepa, Henry S.

    2016-01-01

    Three new examples are presented illustrating three-dimensional chemical information searches of the Cambridge Structural Database (CSD) from which basic core concepts in organic and inorganic chemistry emerge. These include connecting the regiochemistry of aromatic electrophilic substitution with the geometrical properties of hydrogen bonding…

  11. An Interactive Iterative Method for Electronic Searching of Large Literature Databases

    ERIC Educational Resources Information Center

    Hernandez, Marco A.

    2013-01-01

    PubMed® is an on-line literature database hosted by the U.S. National Library of Medicine. Containing over 21 million citations for biomedical literature--both abstracts and full text--in the areas of the life sciences, behavioral studies, chemistry, and bioengineering, PubMed® represents an important tool for researchers. PubMed® searches return…

  12. More Databases Searched by a Business Generalist--Part 2: A Veritable Cornucopia of Sources.

    ERIC Educational Resources Information Center

    Meredith, Meri

    1986-01-01

    This second installment describes databases irregularly searched in the Business Information Center, Cummins Engine Company (Columbus, Indiana). Highlights include typical research topics (happenings among similar manufacturers); government topics (Department of Defense contracts); market and industry topics; corporate intelligence; and personnel,…

  13. Support Vector Machines for Improved Peptide Identification from Tandem Mass Spectrometry Database Search

    SciTech Connect

    Webb-Robertson, Bobbie-Jo M.

    2009-05-06

    Accurate identification of peptides is a current challenge in mass spectrometry (MS) based proteomics. The standard approach uses a search routine to compare tandem mass spectra to a database of peptides associated with the target organism. These database search routines yield multiple metrics associated with the quality of the mapping of the experimental spectrum to the theoretical spectrum of a peptide. The structure of these results makes separating correct from false identifications difficult and has created a false-identification problem. Statistical confidence scores are one approach to this false-positive problem and have led to significant improvements in peptide identification. We have shown that machine learning, specifically the support vector machine (SVM), is an effective approach to separating true peptide identifications from false ones. The SVM-based peptide statistical scoring method transforms a peptide into a vector representation based on database search metrics in order to train and validate the SVM. In practice, following the database search routine, a peptide is converted to its vector representation and the SVM generates a single statistical score that is then used to classify its presence in or absence from the sample.
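
    A minimal sketch of this scheme, assuming each PSM has already been reduced to a vector of database search metrics; the three features, the synthetic training data and the SVM settings are illustrative, not those of the published method:

    ```python
    import numpy as np
    from sklearn.svm import SVC

    # Each PSM as a vector of search metrics, e.g. (Xcorr, deltaCn, Sp);
    # labels: 1 = correct identification, 0 = false (e.g. decoy) identification.
    rng = np.random.default_rng(1)
    X_true = rng.normal(loc=2.0, size=(100, 3))
    X_false = rng.normal(loc=0.5, size=(100, 3))
    X = np.vstack([X_true, X_false])
    y = np.array([1] * 100 + [0] * 100)

    svm = SVC(kernel="rbf").fit(X, y)

    def psm_score(psm_vector):
        """Single statistical score: signed distance from the SVM boundary."""
        return float(svm.decision_function([psm_vector])[0])

    print(psm_score([2.1, 1.8, 2.4]))  # positive -> classified as correct
    ```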

  14. Sports Information Online: Searching the SPORT Database and Tips for Finding Sports Medicine Information Online.

    ERIC Educational Resources Information Center

    Janke, Richard V.; And Others

    1988-01-01

    The first article describes SPORT, a database providing international coverage of athletics and physical education, and compares it to other online services in terms of coverage, thesauri, possible search strategies, and actual usage. The second article reviews available online information on sports medicine. (CLB)

  15. Planning for End-User Database Searching: Drexel and the Mac: A User-Consistent Interface.

    ERIC Educational Resources Information Center

    LaBorie, Tim; Donnelly, Leslie

    Drexel University instituted a microcomputing program in 1984 which required all freshmen to own Apple Macintosh microcomputers. All students were taught database searching on the BRS (Bibliographic Retrieval Services) system as part of the freshman humanities curriculum, and the university library was chosen as the site to house continuing…

  16. Successful Keyword Searching: Initiating Research on Popular Topics Using Electronic Databases.

    ERIC Educational Resources Information Center

    MacDonald, Randall M.; MacDonald, Susan Priest

    Students are using electronic resources more than ever before to locate information for assignments. Without the proper search terms, results are incomplete, and students are frustrated. Using the keywords, key people, organizations, and Web sites provided in this book and compiled from the most commonly used databases, students will be able to…

  17. A guide for medical information searches of bibliographic databases - psychiatric research as an example.

    PubMed

    Löhönen, Johanna; Isohanni, Matti; Nieminen, Pentti; Miettunen, Jouko

    2009-09-01

    Information overload, demanding work with strict time limits, and the extensive number of medical bibliographic databases and other research sources all underline the importance of being able to search for up-to-date information effectively. Medical journals play a key role in providing access to the latest information in medicine and health and bibliographic databases play an important role in accessing them. This paper sheds light on the role of the information search process and discusses how to approach key medical bibliographic databases and information sources, using the field of psychiatry as an example. Because of an increasing amount of information, the constant renewal within the discipline and a variety of services available, those seeking information must precisely define what kind of information they are looking for and from which sources the information needed may be found.

  18. IMPROVED SEARCH OF PRINCIPAL COMPONENT ANALYSIS DATABASES FOR SPECTRO-POLARIMETRIC INVERSION

    SciTech Connect

    Casini, R.; Lites, B. W.; Ramos, A. Asensio

    2013-08-20

    We describe a simple technique for the acceleration of spectro-polarimetric inversions based on principal component analysis (PCA) of Stokes profiles. This technique involves the indexing of the database models based on the sign of the projections (PCA coefficients) of the first few relevant orders of principal components of the four Stokes parameters. In this way, each model in the database can be attributed a distinctive binary number of 2^(4n) bits, where n is the number of PCA orders used for the indexing. Each of these binary numbers (indices) identifies a group of "compatible" models for the inversion of a given set of observed Stokes profiles sharing the same index. The complete set of the binary numbers so constructed evidently determines a partition of the database. The search of the database for the PCA inversion of spectro-polarimetric data can profit greatly from this indexing. In practical cases it becomes possible to approach the ideal acceleration factor of 2^(4n) as compared to the systematic search of a non-indexed database for a traditional PCA inversion. This indexing method relies on the existence of a physical meaning in the sign of the PCA coefficients of a model. For this reason, the presence of model ambiguities and of spectro-polarimetric noise in the observations limits in practice the number n of relevant PCA orders that can be used for the indexing.
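
    To make the indexing scheme concrete, the sketch below packs the signs of the first n PCA coefficients of each of the four Stokes parameters into a 4n-bit integer and partitions the model database by that index; the data layout is an assumption made for illustration.

    ```python
    import numpy as np
    from collections import defaultdict

    def sign_index(coeffs, n):
        """Pack the signs of the first n PCA coefficients of the four
        Stokes parameters (coeffs shape: (4, >= n)) into a 4n-bit integer."""
        bits = (np.asarray(coeffs)[:, :n] > 0).astype(int).ravel()
        index = 0
        for b in bits:
            index = (index << 1) | int(b)
        return index

    def build_partition(model_coeffs, n):
        """Partition database models by their 4n-bit sign index."""
        buckets = defaultdict(list)
        for model_id, coeffs in enumerate(model_coeffs):
            buckets[sign_index(coeffs, n)].append(model_id)
        return buckets

    # The inversion then searches only the bucket "compatible" with the
    # observation: candidates = buckets[sign_index(observed_coeffs, n)]
    ```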

  19. Improved Search of Principal Component Analysis Databases for Spectro-polarimetric Inversion

    NASA Astrophysics Data System (ADS)

    Casini, R.; Asensio Ramos, A.; Lites, B. W.; López Ariste, A.

    2013-08-01

    We describe a simple technique for the acceleration of spectro-polarimetric inversions based on principal component analysis (PCA) of Stokes profiles. This technique involves the indexing of the database models based on the sign of the projections (PCA coefficients) of the first few relevant orders of principal components of the four Stokes parameters. In this way, each model in the database can be attributed a distinctive binary number of 2^(4n) bits, where n is the number of PCA orders used for the indexing. Each of these binary numbers (indices) identifies a group of "compatible" models for the inversion of a given set of observed Stokes profiles sharing the same index. The complete set of the binary numbers so constructed evidently determines a partition of the database. The search of the database for the PCA inversion of spectro-polarimetric data can profit greatly from this indexing. In practical cases it becomes possible to approach the ideal acceleration factor of 2^(4n) as compared to the systematic search of a non-indexed database for a traditional PCA inversion. This indexing method relies on the existence of a physical meaning in the sign of the PCA coefficients of a model. For this reason, the presence of model ambiguities and of spectro-polarimetric noise in the observations limits in practice the number n of relevant PCA orders that can be used for the indexing.

  20. Speeding up tandem mass spectrometry-based database searching by longest common prefix

    PubMed Central

    2010-01-01

    Background Tandem mass spectrometry-based database searching has become an important technology for peptide and protein identification. One of the key challenges in database searching is the remarkable increase in computational demand, brought about by the expansion of protein databases, semi- or non-specific enzymatic digestion, post-translational modifications and other factors. Some software tools choose peptide indexing to accelerate processing. However, peptide indexing requires a large amount of time and space for construction, especially for non-specific digestion, and it is inflexible to use. Results We developed an algorithm based on the longest common prefix (ABLCP) to efficiently organize a protein sequence database. The longest common prefix is a data structure that is always coupled to the suffix array. It eliminates redundant candidate peptides in databases and reduces the corresponding peptide-spectrum matching times, thereby decreasing the identification time. This algorithm is based on the property of the longest common prefix. Enzymatic digestion poses a challenge to this property, but some adjustments can be made to the algorithm to ensure that no candidate peptides are omitted. Compared with peptide indexing, ABLCP requires much less time and space for construction and is subject to fewer restrictions. Conclusions The ABLCP algorithm can help to improve data analysis efficiency. A software tool implementing this algorithm is available at http://pfind.ict.ac.cn/pfind2dot5/index.htm PMID:21108792
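
    The deduplication idea can be illustrated without the full suffix-array machinery: after sorting, a candidate peptide is redundant exactly when its longest common prefix with its predecessor covers both strings, so identical peptides contributed by different proteins are scored only once. A toy sketch:

    ```python
    def lcp(a, b):
        """Length of the longest common prefix of strings a and b."""
        n = min(len(a), len(b))
        i = 0
        while i < n and a[i] == b[i]:
            i += 1
        return i

    def unique_candidates(peptides):
        """Collapse redundant candidate peptides via sorting + LCP."""
        out = []
        for p in sorted(peptides):
            # p duplicates its predecessor iff their LCP covers both strings
            if out and lcp(out[-1], p) == len(p) == len(out[-1]):
                continue
            out.append(p)
        return out

    print(unique_candidates(["PEPTIDE", "PEPTIDE", "PEPTIDEK", "AK"]))
    # ['AK', 'PEPTIDE', 'PEPTIDEK'] -- each unique candidate is scored once
    ```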

  1. A Novel Concept for the Search and Retrieval of the Derwent Markush Resource Database.

    PubMed

    Barth, Andreas; Stengel, Thomas; Litterst, Edwin; Kraut, Hans; Matuszczyk, Henry; Ailer, Franz; Hajkowski, Steve

    2016-05-23

    The representation of and search for generic chemical structures (Markush) remains a continuing challenge. Several research groups have addressed this problem, and over time a limited number of practical solutions have been proposed. Today there are two large commercial providers of Markush databases: Chemical Abstracts Service (CAS) and Thomson Reuters. The Thomson Reuters "Derwent" Markush database is currently offered via the online services Questel and STN and as a data feed for in-house use. The aim of this paper is to briefly review the existing Markush systems (databases plus search engines) and to describe our new approach for the implementation of the Derwent Markush Resource on STN. Our new approach demonstrates the integration of the Derwent Markush Resource database into the existing chemistry-focused STN platform without loss of detail. This provides compatibility with other structure and Markush databases on STN and at the same time makes it possible to deploy the specific features and functions of the Derwent approach. It is shown that the different Markush languages developed by CAS and Derwent can be combined into a single general Markush description. In this concept the generic nodes are grouped together in a unique hierarchy where all chemical elements and fragments can be integrated. As a consequence, both systems are searchable using a single structure query. Moreover, the presented concept could serve as a promising starting point for a common generalized description of Markush structures.

  2. Optimization of filtering criterion for SEQUEST database searching to improve proteome coverage in shotgun proteomics

    PubMed Central

    Jiang, Xinning; Jiang, Xiaogang; Han, Guanghui; Ye, Mingliang; Zou, Hanfa

    2007-01-01

    Background In proteomic analysis, MS/MS spectra acquired by a mass spectrometer are assigned to peptides by database searching algorithms such as SEQUEST. The assignments of peptides to MS/MS spectra by the SEQUEST searching algorithm are characterized by several scores, including Xcorr, ΔCn, Sp, Rsp, matched ion count and so on. A filtering criterion combining several of these scores is used to isolate correct identifications from random assignments. However, such filtering criteria have not been systematically optimized to date. Results In this study, we implemented a machine learning approach known as a predictive genetic algorithm (GA) for the optimization of filtering criteria to maximize the number of identified peptides at a fixed false-discovery rate (FDR) for SEQUEST database searching. As the FDR is directly determined by the decoy database search scheme, the GA-based optimization approach does not require any prior knowledge of the characteristics of the data set, which represents a significant advantage over statistical approaches such as PeptideProphet. Compared with PeptideProphet, the GA-based approach can achieve similar performance in distinguishing true from false assignments with only 1/10 of the processing time. Moreover, the GA-based approach can be easily extended to process other database search results as it does not rely on any assumptions about the data. Conclusion Our results indicate that filtering criteria should be optimized individually for different samples. The newly developed software using the GA provides a convenient and fast way to create tailored optimal criteria for different proteome samples to improve proteome coverage. PMID:17761002
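
    A bare-bones rendition of this strategy, evolving score thresholds to maximize passing target PSMs while a decoy-estimated FDR stays within a fixed limit, might look like the following; the two-threshold criterion, the FDR estimator and all GA parameters are illustrative simplifications of the published approach:

    ```python
    import random

    def fdr(passing):
        """Decoy-based FDR estimate: #decoys / #targets among passing PSMs."""
        decoys = sum(1 for p in passing if p["decoy"])
        targets = len(passing) - decoys
        return decoys / targets if targets else 1.0

    def fitness(thresholds, psms, max_fdr=0.01):
        x_min, d_min = thresholds
        passing = [p for p in psms if p["xcorr"] >= x_min and p["dcn"] >= d_min]
        if fdr(passing) > max_fdr:
            return 0                      # infeasible criterion
        return sum(1 for p in passing if not p["decoy"])

    def optimize(psms, generations=50, pop_size=20):
        """Evolve (Xcorr, deltaCn) thresholds by selection plus mutation."""
        pop = [(random.uniform(0, 5), random.uniform(0, 0.5))
               for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=lambda t: fitness(t, psms), reverse=True)
            survivors = pop[: pop_size // 2]
            # Mutate survivors with Gaussian noise to refill the population.
            pop = survivors + [(max(0, x + random.gauss(0, 0.2)),
                                max(0, d + random.gauss(0, 0.02)))
                               for x, d in survivors]
        return max(pop, key=lambda t: fitness(t, psms))
    ```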

  3. Tempest: Accelerated MS/MS Database Search Software for Heterogeneous Computing Platforms.

    PubMed

    Adamo, Mark E; Gerber, Scott A

    2016-01-01

    MS/MS database search algorithms derive a set of candidate peptide sequences from in silico digest of a protein sequence database, and compute theoretical fragmentation patterns to match these candidates against observed MS/MS spectra. The original Tempest publication described these operations mapped to a CPU-GPU model, in which the CPU (central processing unit) generates peptide candidates that are asynchronously sent to a discrete GPU (graphics processing unit) to be scored against experimental spectra in parallel. The current version of Tempest expands this model, incorporating OpenCL to offer seamless parallelization across multicore CPUs, GPUs, integrated graphics chips, and general-purpose coprocessors. Three protocols describe how to configure and run a Tempest search, including discussion of how to leverage Tempest's unique feature set to produce optimal results. © 2016 by John Wiley & Sons, Inc. PMID:27603022

  4. Searching molecular structure databases with tandem mass spectra using CSI:FingerID.

    PubMed

    Dührkop, Kai; Shen, Huibin; Meusel, Marvin; Rousu, Juho; Böcker, Sebastian

    2015-10-13

    Metabolites provide a direct functional signature of cellular state. Untargeted metabolomics experiments usually rely on tandem MS to identify the thousands of compounds in a biological sample. Today, the vast majority of metabolites remain unknown. We present a method for searching molecular structure databases using tandem MS data of small molecules. Our method computes a fragmentation tree that best explains the fragmentation spectrum of an unknown molecule. We use the fragmentation tree to predict the molecular structure fingerprint of the unknown compound using machine learning. This fingerprint is then used to search a molecular structure database such as PubChem. Our method is shown to improve on the competing methods for computational metabolite identification by a considerable margin.
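
    The final database-search step can be pictured as follows: the predicted fingerprint is a vector of per-bit probabilities, and each candidate structure's binary fingerprint is scored by its log-likelihood under those probabilities. This simple scoring rule is a stand-in for CSI:FingerID's actual matching procedure, not the published scoring function:

    ```python
    import math

    def fingerprint_loglik(pred_probs, candidate_bits):
        """Log-likelihood of a candidate's binary fingerprint given the
        per-bit probabilities predicted from the fragmentation tree."""
        total = 0.0
        for p, bit in zip(pred_probs, candidate_bits):
            p = min(max(p, 1e-6), 1.0 - 1e-6)   # guard against log(0)
            total += math.log(p if bit else 1.0 - p)
        return total

    def search(pred_probs, database):
        """Rank (name, fingerprint) candidates, best first."""
        return sorted(database,
                      key=lambda entry: fingerprint_loglik(pred_probs, entry[1]),
                      reverse=True)

    hits = search([0.9, 0.2, 0.8], [("cand_A", [1, 0, 1]), ("cand_B", [0, 1, 1])])
    print(hits[0][0])  # cand_A
    ```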

  5. Searching molecular structure databases with tandem mass spectra using CSI:FingerID

    PubMed Central

    Dührkop, Kai; Shen, Huibin; Meusel, Marvin; Rousu, Juho; Böcker, Sebastian

    2015-01-01

    Metabolites provide a direct functional signature of cellular state. Untargeted metabolomics experiments usually rely on tandem MS to identify the thousands of compounds in a biological sample. Today, the vast majority of metabolites remain unknown. We present a method for searching molecular structure databases using tandem MS data of small molecules. Our method computes a fragmentation tree that best explains the fragmentation spectrum of an unknown molecule. We use the fragmentation tree to predict the molecular structure fingerprint of the unknown compound using machine learning. This fingerprint is then used to search a molecular structure database such as PubChem. Our method is shown to improve on the competing methods for computational metabolite identification by a considerable margin. PMID:26392543

  7. Discovery of novel mesangial cell proliferation inhibitors using a three-dimensional database searching method.

    PubMed

    Kurogi, Y; Miyata, K; Okamura, T; Hashimoto, K; Tsutsumi, K; Nasu, M; Moriyasu, M

    2001-07-01

    A three-dimensional pharmacophore model of mesangial cell (MC) proliferation inhibitors was generated from a training set of 4-(diethoxyphosphoryl)methyl-N-(3-phenyl-[1,2,4]thiadiazol-5-yl)benzamide, 2, and its derivatives using the Catalyst/HIPHOP software program. On the basis of the in vitro MC proliferation inhibitory activity, a pharmacophore model was generated with seven features consisting of two hydrophobic regions, two hydrophobic aromatic regions, and three hydrogen bond acceptors. Using this model as a three-dimensional query to search the Maybridge database, 41 structurally novel compounds were identified. Evaluation of the available samples among the 41 identified compounds showed over 50% MC proliferation inhibitory activity in the 100 nM range. Interestingly, the compounds newly identified by the 3D database searching method exhibited reduced inhibition of normal proximal tubular epithelial cell proliferation compared to the training set compounds. PMID:11428924

  9. The evidential value in the DNA database search controversy and the two-stain problem.

    PubMed

    Meester, Ronald; Sjerps, Marjan

    2003-09-01

    Does the evidential strength of a DNA match depend on whether the suspect was identified through a database search or through other evidence ("probable cause")? In Balding and Donnelly (1995, Journal of the Royal Statistical Society, Series A 158, 21-53) and elsewhere, it has been argued that the evidential strength is slightly larger in a database search case than in a probable cause case, while Stockmarr (1999, Biometrics 55, 671-677) reached the opposite conclusion. Both of these approaches use likelihood ratios. By making an excursion to a similar problem, the two-stain problem, we argue in this article that there are certain fundamental difficulties with the use of a likelihood ratio, which can be avoided by concentrating on the posterior odds. This approach helps resolve the above-mentioned conflict.
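
    The posterior-odds view can be made concrete under the simplest possible assumptions: a uniform prior over a population of N individuals, match probability p for an untyped person, and a database of d profiles yielding exactly one match. The toy calculation below follows the Balding-Donnelly line of reasoning rather than either paper's full treatment:

    ```python
    def posterior_odds_probable_cause(N, p):
        """Suspect found by other evidence: N - 1 alternative sources remain,
        each matching with probability p."""
        return 1.0 / ((N - 1) * p)

    def posterior_odds_database_search(N, d, p):
        """Suspect found by a database search: the d - 1 non-matching database
        members are excluded, leaving N - d untyped alternatives."""
        return 1.0 / ((N - d) * p)

    N, d, p = 1_000_000, 100_000, 1e-7     # illustrative values
    print(posterior_odds_probable_cause(N, p))      # ~10.0
    print(posterior_odds_database_search(N, d, p))  # ~11.1 -> slightly stronger
    ```

    Under these assumptions the database search yields slightly larger posterior odds, because the d - 1 non-matching database members have been positively excluded as alternative sources.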

  10. Rapid identification of anonymous subjects in large criminal databases: problems and solutions in IAFIS III/FBI subject searches

    NASA Astrophysics Data System (ADS)

    Kutzleb, C. D.

    1997-02-01

    The high incidence of recidivism (repeat offenders) in the criminal population makes the use of the IAFIS III/FBI criminal database an important tool in law enforcement. The problems and solutions employed by IAFIS III/FBI criminal subject searches are discussed for the following topics: (1) subject search selectivity and reliability; (2) the difficulty and limitations of identifying subjects whose anonymity may be a prime objective; (3) database size, search workload, and search response time; (4) techniques and advantages of normalizing the variability in an individual's name and identifying features into identifiable and discrete categories; and (5) the use of database demographics to estimate the likelihood of a match between a search subject and database subjects.

  11. A hybrid approach for addressing ring flexibility in 3D database searching.

    PubMed

    Sadowski, J

    1997-01-01

    A hybrid approach for flexible 3D database searching is presented that addresses the problem of ring flexibility. It combines the explicit storage of up to 25 conformations of rings with up to eight atoms, generated by the 3D structure generator CORINA, with the power of a torsional fitting technique implemented in the 3D database system UNITY. A comparison with the original UNITY approach, using a database with about 130,000 entries and five different pharmacophore queries, was performed. The hybrid approach scored, on average, 10-20% more hits than the reference run. Moreover, specific problems with unrealistic hit geometries produced by the original approach can be excluded. In addition, the influence of the maximum number of ring conformations per molecule was investigated. An optimal number of 10 conformations per molecule is recommended.

  12. Accelerating Smith-Waterman Algorithm for Biological Database Search on CUDA-Compatible GPUs

    NASA Astrophysics Data System (ADS)

    Munekawa, Yuma; Ino, Fumihiko; Hagihara, Kenichi

    This paper presents a fast method capable of accelerating the Smith-Waterman algorithm for biological database search on a cluster of graphics processing units (GPUs). Our method is implemented using compute unified device architecture (CUDA), which is available on the nVIDIA GPU. As compared with previous methods, our method has four major contributions. (1) The method efficiently uses on-chip shared memory to reduce the data amount being transferred between off-chip video memory and processing elements in the GPU. (2) It also reduces the number of data fetches by applying a data reuse technique to query and database sequences. (3) A pipelined method is also implemented to overlap GPU execution with database access. (4) Finally, a master/worker paradigm is employed to accelerate hundreds of database searches on a cluster system. In experiments, the peak performance on a GeForce GTX 280 card reaches 8.32 giga cell updates per second (GCUPS). We also find that our method reduces the amount of data fetches to 1/140, achieving approximately three times higher performance than a previous CUDA-based method. Our 32-node cluster version is approximately 28 times faster than a single GPU version. Furthermore, the effective performance reaches 75.6 giga instructions per second (GIPS) using 32 GeForce 8800 GTX cards.

  13. A grammar based methodology for structural motif finding in ncRNA database search.

    PubMed

    Quest, Daniel; Tapprich, William; Ali, Hesham

    2007-01-01

    In recent years, sequence database searching has been conducted through local alignment heuristics, pattern-matching, and comparison of short statistically significant patterns. While these approaches have unlocked many clues as to sequence relationships, they are limited in that they do not provide context-sensitive searching capabilities (e.g., considering pseudoknots, protein binding positions, and complementary base pairs). Stochastic grammars (hidden Markov models, HMMs, and stochastic context-free grammars, SCFGs) do allow for flexibility in terms of local context, but the context comes at the cost of increased computational complexity. In this paper we introduce a new grammar-based method for searching for RNA motifs that exist within a conserved RNA structure. Our method constrains computational complexity by using a chain of topology elements. Through a case study we present the algorithmic approach and benchmark it against traditional methods.

  14. Protein backbone angle restraints from searching a database for chemical shift and sequence homology.

    PubMed

    Cornilescu, G; Delaglio, F; Bax, A

    1999-03-01

    Chemical shifts of backbone atoms in proteins are exquisitely sensitive to local conformation, and homologous proteins show quite similar patterns of secondary chemical shifts. The inverse of this relation is used to search a database for triplets of adjacent residues with secondary chemical shifts and sequence similarity which provide the best match to the query triplet of interest. The database contains 13C alpha, 13C beta, 13C', 1H alpha and 15N chemical shifts for 20 proteins for which a high resolution X-ray structure is available. The computer program TALOS was developed to search this database for strings of residues with chemical shift and residue type homology. The relative importance of the weighting factors attached to the secondary chemical shifts of the five types of resonances relative to that of sequence similarity was optimized empirically. TALOS yields the 10 triplets which have the closest similarity in secondary chemical shift and amino acid sequence to those of the query sequence. If the central residues in these 10 triplets exhibit similar phi and psi backbone angles, their averages can reliably be used as angular restraints for the protein whose structure is being studied. Tests carried out for proteins of known structure indicate that the root-mean-square difference (rmsd) between the output of TALOS and the X-ray derived backbone angles is about 15 degrees. Approximately 3% of the predictions made by TALOS are found to be in error.
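
    In outline, the search is a weighted nearest-neighbor lookup over residue triplets. The sketch below uses flat, made-up weights and a crude identity-based residue similarity in place of TALOS's empirically optimized factors:

    ```python
    import heapq

    NUCLEI = ("CA", "CB", "C", "HA", "N")
    SHIFT_WEIGHTS = {n: 1.0 for n in NUCLEI}  # illustrative, not TALOS's weights
    W_SEQ = 1.0                               # sequence-similarity weight (illustrative)

    def residue_similarity(a, b):
        """Crude stand-in for an amino acid similarity score."""
        return 1.0 if a == b else 0.0

    def triplet_distance(query, entry):
        """query/entry: three residues, each a dict with 'aa' and per-nucleus
        secondary chemical 'shifts'. Lower distance = better match."""
        d = 0.0
        for q_res, e_res in zip(query, entry):
            for nuc in NUCLEI:
                d += SHIFT_WEIGHTS[nuc] * (q_res["shifts"][nuc]
                                           - e_res["shifts"][nuc]) ** 2
            d -= W_SEQ * residue_similarity(q_res["aa"], e_res["aa"])
        return d

    def best_matches(query, database, k=10):
        """Return the k database triplets closest to the query triplet."""
        return heapq.nsmallest(k, database, key=lambda e: triplet_distance(query, e))
    ```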

  15. EDULISS: a small-molecule database with data-mining and pharmacophore searching capabilities

    PubMed Central

    Hsin, Kun-Yi; Morgan, Hugh P.; Shave, Steven R.; Hinton, Andrew C.; Taylor, Paul; Walkinshaw, Malcolm D.

    2011-01-01

    We present the relational database EDULISS (EDinburgh University Ligand Selection System), which stores structural, physicochemical and pharmacophoric properties of small molecules. The database comprises a collection of over 4 million commercially available compounds from 28 different suppliers. A user-friendly web-based interface for EDULISS (available at http://eduliss.bch.ed.ac.uk/) has been established providing a number of data-mining possibilities. For each compound a single 3D conformer is stored along with over 1600 calculated descriptor values (molecular properties). A very efficient method for unique compound recognition, especially for a large scale database, is demonstrated by making use of small subgroups of the descriptors. Many of the shape and distance descriptors are held as pre-calculated bit strings permitting fast and efficient similarity and pharmacophore searches which can be used to identify families of related compounds for biological testing. Two ligand searching applications are given to demonstrate how EDULISS can be used to extract families of molecules with selected structural and biophysical features. PMID:21051336
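
    Storing fingerprints as pre-computed bit strings reduces each comparison to a pair of bitwise operations. The sketch below shows the kind of Tanimoto screen such descriptors enable; the fingerprints are arbitrary small integers rather than EDULISS's actual descriptor encoding:

    ```python
    def tanimoto(fp_a, fp_b):
        """Tanimoto similarity of two fingerprints stored as Python ints."""
        inter = bin(fp_a & fp_b).count("1")
        union = bin(fp_a | fp_b).count("1")
        return inter / union if union else 0.0

    def screen(query_fp, database, threshold=0.7):
        """Return (compound_id, similarity) pairs above the threshold."""
        return [(cid, s) for cid, fp in database
                if (s := tanimoto(query_fp, fp)) >= threshold]

    db = [("cpd-1", 0b101101), ("cpd-2", 0b010010), ("cpd-3", 0b101111)]
    print(screen(0b101101, db))  # cpd-1 is an exact match, cpd-3 is close (0.8)
    ```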

  16. Exploring Multidisciplinary Data Sets through Database Driven Search Capabilities and Map-Based Web Services

    NASA Astrophysics Data System (ADS)

    O'Hara, S.; Ferrini, V.; Arko, R.; Carbotte, S. M.; Leung, A.; Bonczkowski, J.; Goodwillie, A.; Ryan, W. B.; Melkonian, A. K.

    2008-12-01

    Relational databases containing geospatially referenced data enable the construction of robust data access pathways that can be customized to suit the needs of a diverse user community. Web-based search capabilities driven by radio buttons and pull-down menus can be generated on-the-fly leveraging the power of the relational database and providing specialists a means of discovering specific data and data sets. While these data access pathways are sufficient for many scientists, map-based data exploration can also be an effective means of data discovery and integration by allowing users to rapidly assess the spatial co- registration of several data types. We present a summary of data access tools currently provided by the Marine Geoscience Data System (www.marine-geo.org) that are intended to serve a diverse community of users and promote data integration. Basic search capabilities allow users to discover data based on data type, device type, geographic region, research program, expedition parameters, personnel and references. In addition, web services are used to create database driven map interfaces that provide live access to metadata and data files.

  17. Searching for patterns in remote sensing image databases using neural networks

    NASA Technical Reports Server (NTRS)

    Paola, Justin D.; Schowengerdt, Robert A.

    1995-01-01

    We have investigated a method, based on a successful neural network multispectral image classification system, of searching for single patterns in remote sensing databases. While defining the pattern to search for and the feature to be used for that search (spectral, spatial, temporal, etc.) is challenging, a more difficult task is selecting competing patterns to train against the desired pattern. Schemes for competing pattern selection, including random selection and human interpreted selection, are discussed in the context of an example detection of dense urban areas in Landsat Thematic Mapper imagery. When applying the search to multiple images, a simple normalization method can alleviate the problem of inconsistent image calibration. Another potential problem, that of highly compressed data, was found to have a minimal effect on the ability to detect the desired pattern. The neural network algorithm has been implemented using the PVM (Parallel Virtual Machine) library and nearly-optimal speedups have been obtained that help alleviate the long process of searching through imagery.

  18. A neotropical Miocene pollen database employing image-based search and semantic modeling

    PubMed Central

    Han, Jing Ginger; Cao, Hongfei; Barb, Adrian; Punyasena, Surangi W.; Jaramillo, Carlos; Shyu, Chi-Ren

    2014-01-01

    • Premise of the study: Digital microscopic pollen images are being generated with increasing speed and volume, producing opportunities to develop new computational methods that increase the consistency and efficiency of pollen analysis and provide the palynological community a computational framework for information sharing and knowledge transfer. • Methods: Mathematical methods were used to assign trait semantics (abstract morphological representations) of the images of neotropical Miocene pollen and spores. Advanced database-indexing structures were built to compare and retrieve similar images based on their visual content. A Web-based system was developed to provide novel tools for automatic trait semantic annotation and image retrieval by trait semantics and visual content. • Results: Mathematical models that map visual features to trait semantics can be used to annotate images with morphology semantics and to search image databases with improved reliability and productivity. Images can also be searched by visual content, providing users with customized emphases on traits such as color, shape, and texture. • Discussion: Content- and semantic-based image searches provide a powerful computational platform for pollen and spore identification. The infrastructure outlined provides a framework for building a community-wide palynological resource, streamlining the process of manual identification, analysis, and species discovery. PMID:25202648

  19. mirPub: a database for searching microRNA publications

    PubMed Central

    Vergoulis, Thanasis; Kanellos, Ilias; Kostoulas, Nikos; Georgakilas, Georgios; Sellis, Timos; Hatzigeorgiou, Artemis; Dalamagas, Theodore

    2015-01-01

    Summary: Identifying, amongst millions of publications available in MEDLINE, those that are relevant to specific microRNAs (miRNAs) of interest based on keyword search faces major obstacles. References to miRNA names in the literature often deviate from standard nomenclature for various reasons, since even the official nomenclature evolves. For instance, a single miRNA name may identify two completely different molecules or two different names may refer to the same molecule. mirPub is a database with a powerful and intuitive interface, which facilitates searching for miRNA literature, addressing the aforementioned issues. To provide effective search services, mirPub applies text mining techniques on MEDLINE, integrates data from several curated databases and exploits data from its user community following a crowdsourcing approach. Other key features include an interactive visualization service that illustrates intuitively the evolution of miRNA data, tag clouds summarizing the relevance of publications to particular diseases, cell types or tissues and access to TarBase 6.0 data to oversee genes related to miRNA publications. Availability and Implementation: mirPub is freely available at http://www.microrna.gr/mirpub/. Contact: vergoulis@imis.athena-innovation.gr or dalamag@imis.athena-innovation.gr Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25527833

  20. Integration of first-principles methods and crystallographic database searches for new ferroelectrics: Strategies and explorations

    SciTech Connect

    Bennett, Joseph W.; Rabe, Karin M.

    2012-11-15

    In this concept paper, the development of strategies for the integration of first-principles methods with crystallographic database mining for the discovery and design of novel ferroelectric materials is discussed, drawing on the results and experience derived from exploratory investigations on three different systems: (1) the double perovskite Sr(Sb1/2Mn1/2)O3 as a candidate semiconducting ferroelectric; (2) polar derivatives of schafarzikite MSb2O4; and (3) ferroelectric semiconductors with formula M2P2(S,Se)6. A variety of avenues for further research and investigation are suggested, including automated structure type classification, low-symmetry improper ferroelectrics, and high-throughput first-principles searches for additional representatives of structural families with desirable functional properties. Graphical abstract: Integration of first-principles methods with crystallographic database mining, for the discovery and design of novel ferroelectric materials, could potentially lead to new classes of multifunctional materials. Highlights: Integration of first-principles methods and database mining. Minor structural families with desirable functional properties. Survey of polar entries in the Inorganic Crystal Structural Database.

  1. Average probability that a "cold hit" in a DNA database search results in an erroneous attribution.

    PubMed

    Song, Yun S; Patil, Anand; Murphy, Erin E; Slatkin, Montgomery

    2009-01-01

    We consider a hypothetical series of cases in which the DNA profile of a crime-scene sample is found to match a known profile in a DNA database (i.e., a "cold hit"), resulting in the identification of a suspect based only on genetic evidence. We show that the average probability that there is another person in the population whose profile matches the crime-scene sample but who is not in the database is approximately 2(N - d)p(A), where N is the number of individuals in the population, d is the number of profiles in the database, and p(A) is the average match probability (AMP) for the population. The AMP is estimated by computing the average of the probabilities that two individuals in the population have the same profile. We show further that if a priori each individual in the population is equally likely to have left the crime-scene sample, then the average probability that the database search attributes the crime-scene sample to a wrong person is (N - d)p(A).
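
    Plugging illustrative numbers into the two approximations stated in the abstract makes their scale concrete (the values of N, d and the average match probability below are invented for the example):

    ```python
    def p_unmatched_duplicate(N, d, p_avg):
        """Average probability of a matching person outside the database,
        using the abstract's approximation 2 (N - d) p_A."""
        return 2 * (N - d) * p_avg

    def p_wrong_attribution(N, d, p_avg):
        """Average probability the cold hit names the wrong person, assuming
        a uniform prior over the population: (N - d) p_A."""
        return (N - d) * p_avg

    N, d, p_avg = 10_000_000, 500_000, 1e-9   # illustrative values
    print(p_unmatched_duplicate(N, d, p_avg))  # 0.019
    print(p_wrong_attribution(N, d, p_avg))    # 0.0095
    ```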

  2. rasbhari: Optimizing Spaced Seeds for Database Searching, Read Mapping and Alignment-Free Sequence Comparison

    PubMed Central

    Hahn, Lars; Leimeister, Chris-André; Morgenstern, Burkhard

    2016-01-01

    Many algorithms for sequence analysis rely on word matching or word statistics. Often, these approaches can be improved if binary patterns representing match and don’t-care positions are used as a filter, such that only those positions of words are considered that correspond to the match positions of the patterns. The performance of these approaches, however, depends on the underlying patterns. Herein, we show that the overlap complexity of a pattern set that was introduced by Ilie and Ilie is closely related to the variance of the number of matches between two evolutionarily related sequences with respect to this pattern set. We propose a modified hill-climbing algorithm to optimize pattern sets for database searching, read mapping and alignment-free sequence comparison of nucleic-acid sequences; our implementation of this algorithm is called rasbhari. Depending on the application at hand, rasbhari can either minimize the overlap complexity of pattern sets, maximize their sensitivity in database searching or minimize the variance of the number of pattern-based matches in alignment-free sequence comparison. We show that, for database searching, rasbhari generates pattern sets with slightly higher sensitivity than existing approaches. In our Spaced Words approach to alignment-free sequence comparison, pattern sets calculated with rasbhari led to more accurate estimates of phylogenetic distances than the randomly generated pattern sets that we previously used. Finally, we used rasbhari to generate patterns for short read classification with CLARK-S. Here too, the sensitivity of the results could be improved, compared to the default patterns of the program. We integrated rasbhari into Spaced Words; the source code of rasbhari is freely available at http://rasbhari.gobics.de/ PMID:27760124
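
    The pattern-based match count that rasbhari optimizes patterns for can be sketched directly: given a binary pattern, count the positions at which two sequences agree on every match position, with don't-care positions ignored. The hill-climbing and overlap-complexity machinery is omitted here, and the pattern and sequences are toy examples:

    ```python
    def spaced_matches(s1, s2, pattern):
        """Count pattern-based matches between two sequences: positions i
        where s1 and s2 agree at every '1' (match) position of the pattern;
        '0' positions are don't-care."""
        match_pos = [j for j, c in enumerate(pattern) if c == "1"]
        span = len(pattern)
        n = min(len(s1), len(s2)) - span + 1
        return sum(1 for i in range(n)
                   if all(s1[i + j] == s2[i + j] for j in match_pos))

    s1 = "ACGTACGTAC"
    s2 = "ACGAACGTAC"                      # one substitution at index 3
    print(spaced_matches(s1, s2, "1111"))  # 3: the mismatch ruins 4 of 7 windows
    print(spaced_matches(s1, s2, "1101"))  # 4: the don't-care position rescues one
    ```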

  3. Allie: a database and a search service of abbreviations and long forms.

    PubMed

    Yamamoto, Yasunori; Yamaguchi, Atsuko; Bono, Hidemasa; Takagi, Toshihisa

    2011-01-01

    Many abbreviations are used in the literature especially in the life sciences, and polysemous abbreviations appear frequently, making it difficult to read and understand scientific papers that are outside of a reader's expertise. Thus, we have developed Allie, a database and a search service of abbreviations and their long forms (a.k.a. full forms or definitions). Allie searches for abbreviations and their corresponding long forms in a database that we have generated based on all titles and abstracts in MEDLINE. When a user query matches an abbreviation, Allie returns all potential long forms of the query along with their bibliographic data (i.e. title and publication year). In addition, for each candidate, co-occurring abbreviations and a research field in which it frequently appears in the MEDLINE data are displayed. This function helps users learn about the context in which an abbreviation appears. To deal with synonymous long forms, we use a dictionary called GENA that contains domain-specific terms such as gene, protein or disease names along with their synonymic information. Conceptually identical domain-specific terms are regarded as one term, and then conceptually identical abbreviation-long form pairs are grouped taking into account their appearance in MEDLINE. To keep up with new abbreviations that are continuously introduced, Allie has an automatic update system. In addition, the database of abbreviations and their long forms with their corresponding PubMed IDs is constructed and updated weekly. Database URL: The Allie service is available at http://allie.dbcls.jp/. PMID:21498548
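
    A crude illustration of mining abbreviation-long-form pairs from text, using a toy initial-letter heuristic; Allie's actual extraction from MEDLINE titles and abstracts is considerably more sophisticated:

    ```python
    import re

    PAREN = re.compile(r"((?:\w+[ -]){1,8})\(([A-Za-z]{2,10})\)")

    def extract_pairs(text):
        """Find 'long form (SF)' pairs whose short form matches the initial
        letters of the immediately preceding words."""
        pairs = []
        for m in PAREN.finditer(text):
            words = m.group(1).split()
            abbr = m.group(2)
            if len(words) < len(abbr):
                continue
            cand = words[-len(abbr):]      # one word per abbreviation letter
            if all(w[0].lower() == a.lower() for w, a in zip(cand, abbr)):
                pairs.append((" ".join(cand), abbr))
        return pairs

    print(extract_pairs("Peptides are scored by tandem mass spectrometry (MS) "
                        "and checked for false discovery rate (FDR)."))
    # [('mass spectrometry', 'MS'), ('false discovery rate', 'FDR')]
    ```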

  4. Allie: a database and a search service of abbreviations and long forms

    PubMed Central

    Yamamoto, Yasunori; Yamaguchi, Atsuko; Bono, Hidemasa; Takagi, Toshihisa

    2011-01-01

    Many abbreviations are used in the literature especially in the life sciences, and polysemous abbreviations appear frequently, making it difficult to read and understand scientific papers that are outside of a reader’s expertise. Thus, we have developed Allie, a database and a search service of abbreviations and their long forms (a.k.a. full forms or definitions). Allie searches for abbreviations and their corresponding long forms in a database that we have generated based on all titles and abstracts in MEDLINE. When a user query matches an abbreviation, Allie returns all potential long forms of the query along with their bibliographic data (i.e. title and publication year). In addition, for each candidate, co-occurring abbreviations and a research field in which it frequently appears in the MEDLINE data are displayed. This function helps users learn about the context in which an abbreviation appears. To deal with synonymous long forms, we use a dictionary called GENA that contains domain-specific terms such as gene, protein or disease names along with their synonymic information. Conceptually identical domain-specific terms are regarded as one term, and then conceptually identical abbreviation-long form pairs are grouped taking into account their appearance in MEDLINE. To keep up with new abbreviations that are continuously introduced, Allie has an automatic update system. In addition, the database of abbreviations and their long forms with their corresponding PubMed IDs is constructed and updated weekly. Database URL: The Allie service is available at http://allie.dbcls.jp/. PMID:21498548
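
    The extraction step can be made concrete with a self-contained sketch of abbreviation/long-form pairing in the spirit of the classic Schwartz-Hearst letter-matching heuristic, with naive lowercase/plural normalization standing in for Allie's GENA-based grouping of synonymous long forms. This is an illustration of the task, not Allie's actual pipeline.

        import re
        from collections import defaultdict

        def explains(sf, lf):
            """True if the short form's characters occur, in order, in the long
            form, with the first character starting a word (case-insensitive)."""
            sf, lf = sf.lower(), lf.lower()
            j = len(lf) - 1
            for i in range(len(sf) - 1, -1, -1):
                c = sf[i]
                while j >= 0 and (lf[j] != c or
                                  (i == 0 and j > 0 and lf[j - 1].isalnum())):
                    j -= 1
                if j < 0:
                    return False
                j -= 1
            return True

        def extract_pairs(text):
            """Find 'long form (SF)' candidates and trim each long form to the
            shortest word suffix that still explains the short form."""
            pairs = []
            for m in re.finditer(r'\(([A-Za-z][A-Za-z0-9-]{1,9})\)', text):
                sf = m.group(1)
                words = text[:m.start()].rstrip().split()
                window = words[-min(len(sf) + 5, len(sf) * 2):]
                if not explains(sf, ' '.join(window)):
                    continue
                while len(window) > 1 and explains(sf, ' '.join(window[1:])):
                    window = window[1:]
                pairs.append((sf, ' '.join(window)))
            return pairs

        index = defaultdict(set)
        text = ("We trained a hidden Markov model (HMM) on MEDLINE titles; "
                "hidden markov models (HMM) are widely used.")
        for sf, lf in extract_pairs(text):
            index[sf].add(lf.lower().rstrip('s'))   # crude synonym merging
        print(dict(index))   # {'HMM': {'hidden markov model'}}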

  5. Intelligent technique to search for patterns within images in massive databases.

    PubMed

    Vega, J; Murari, A; Pereira, A; Portas, A; Castro, P

    2008-10-01

    An image retrieval system for JET has been developed. The image database contains the images of the JET high speed visible camera. The system input is a pattern selected inside an image and the output is the group of frames (defined by their discharge numbers and time slices) that show patterns similar to the selected one. This approach is based on morphological pattern recognition and it should be emphasized that the pattern is found independently of its location in the frame. The technique encodes images into characters and, therefore, it transforms the pattern search into a character-matching problem.
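
    A toy version of the encode-then-match idea, assuming grayscale frames held as NumPy arrays: cell intensities are quantized into a small alphabet, and the encoded query is slid over the encoded frame, so a hit depends only on the symbols and not on where the pattern sits. The real system's encoding and similarity measure are more elaborate; this only illustrates the reduction of pattern search to character matching.

        import numpy as np

        def encode(image, cell=8, alphabet="abcd"):
            """Mean-pool the image over cell x cell blocks, then quantize each
            block's intensity into one symbol of a small alphabet."""
            h, w = image.shape
            gh, gw = h // cell, w // cell
            blocks = (image[:gh * cell, :gw * cell]
                      .reshape(gh, cell, gw, cell).mean(axis=(1, 3)))
            edges = np.linspace(blocks.min(), blocks.max() + 1e-9, len(alphabet) + 1)
            idx = np.clip(np.digitize(blocks, edges) - 1, 0, len(alphabet) - 1)
            return np.array(list(alphabet))[idx]

        def find_pattern(frame_sym, pat_sym):
            """Slide the encoded pattern over the encoded frame; a hit is any
            offset where all symbols agree, independent of location."""
            fh, fw = frame_sym.shape
            ph, pw = pat_sym.shape
            return [(r, c)
                    for r in range(fh - ph + 1)
                    for c in range(fw - pw + 1)
                    if np.array_equal(frame_sym[r:r + ph, c:c + pw], pat_sym)]

        rng = np.random.default_rng(0)
        frame = rng.random((128, 128))
        frame[40:72, 64:96] += 2.0                  # embed a bright "pattern"
        sym = encode(frame)
        print(find_pattern(sym, sym[5:9, 8:12]))    # recovers the region's offset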

  6. Computational methodologies for compound database searching that utilize experimental protein-ligand interaction information.

    PubMed

    Tan, Lu; Batista, Jose; Bajorath, Jürgen

    2010-09-01

    Ligand- and target structure-based methods are widely used in virtual screening, but there is currently no methodology available that fully integrates these different approaches. Herein, we provide an overview of various attempts that have been made to combine ligand- and structure-based computational screening methods. We then review different types of approaches that utilize protein-ligand interaction information for database screening and filtering. Interaction-based approaches make use of a variety of methodological concepts including pharmacophore modeling and direct or indirect encoding of protein-ligand interactions in fingerprint formats. These interaction-based methods have been successfully applied to tackle different tasks related to virtual screening including postprocessing of docking poses, prioritization of binding modes, selectivity analysis, or similarity searching. Furthermore, we discuss the recently developed interacting fragment approach that indirectly incorporates 3D interaction information into 2D similarity searching and bridges between ligand- and structure-based methods.

  7. Local alignment tool for clinical history: temporal semantic search of clinical databases.

    PubMed

    Lee, Wei-Nchih; Das, Amar K

    2010-11-13

    Data collected through electronic health records is increasingly used for clinical research purposes. Common research tasks, like identifying treatment cohorts based on similar treatment histories, assessing adherence to protocol-based care, or determining clinical 'best practices', can be difficult given the complex array of treatment choices and the longitudinal nature of patient care. We present a temporal sequence alignment strategy to find patients with similar treatment histories starting from their initial regimen. The algorithm relies on a user-defined threshold heuristic to further reduce the search space in large clinical databases. It also uses an ontology-based scoring schema to measure the distance between two clinical treatment histories. We validate the algorithm with a search for patients who are placed on a guideline-recommended alternative regimen for HIV after failing initial ideal therapy. Our approach can be used to summarize patterns of care as well as predict outcomes of care.
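
    The core idea, local alignment of regimen sequences with substitution scores derived from an ontology, can be sketched compactly. The drug codes, ontology shape, and scoring constants below are hypothetical illustrations, not the paper's schema.

        PARENT = {  # toy treatment ontology: child -> parent (hypothetical)
            "AZT": "NRTI", "3TC": "NRTI", "EFV": "NNRTI", "LPV": "PI",
            "NRTI": "ARV", "NNRTI": "ARV", "PI": "ARV",
        }

        def ancestors(term):
            seen = {term}
            while term in PARENT:
                term = PARENT[term]
                seen.add(term)
            return seen

        def sim(a, b):
            """Ontology-based substitution score: exact match > same class >
            unrelated."""
            if a == b:
                return 2.0
            if (ancestors(a) & ancestors(b)) - {"ARV"}:
                return 1.0      # share a sub-root ancestor, e.g. both NRTIs
            return -1.0

        def local_align(s, t, gap=-1.5):
            """Smith-Waterman-style local alignment score of two regimen
            histories (lists of drug codes, one entry per treatment step)."""
            H = [[0.0] * (len(t) + 1) for _ in range(len(s) + 1)]
            best = 0.0
            for i in range(1, len(s) + 1):
                for j in range(1, len(t) + 1):
                    H[i][j] = max(0.0,
                                  H[i - 1][j - 1] + sim(s[i - 1], t[j - 1]),
                                  H[i - 1][j] + gap,
                                  H[i][j - 1] + gap)
                    best = max(best, H[i][j])
            return best

        print(local_align(["AZT", "3TC", "EFV"], ["3TC", "3TC", "EFV"]))  # 5.0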

  8. Protein structure determination by exhaustive search of Protein Data Bank derived databases.

    PubMed

    Stokes-Rees, Ian; Sliz, Piotr

    2010-12-14

    Parallel sequence and structure alignment tools have become ubiquitous and invaluable at all levels in the study of biological systems. We demonstrate the application and utility of this same parallel search paradigm to the process of protein structure determination, benefiting from the large and growing corpus of known structures. Such searches were previously computationally intractable. Through the method of Wide Search Molecular Replacement, developed here, they can be completed in a few hours with the aid of national-scale federated cyberinfrastructure. By dramatically expanding the range of models considered for structure determination, we show that small (less than 12% structural coverage) and low sequence identity (less than 20% identity) template structures can be identified through multidimensional template scoring metrics and used for structure determination. Many new macromolecular complexes can benefit significantly from such a technique due to the lack of known homologous protein folds or sequences. We demonstrate the effectiveness of the method by determining the structure of a full-length p97 homologue from Trichoplusia ni. Example cases with the MHC/T-cell receptor complex and the EmoB protein provide systematic estimates of minimum sequence identity, structure coverage, and structural similarity required for this method to succeed. We describe how this structure-search approach and other novel computationally intensive workflows are made tractable through integration with the US national computational cyberinfrastructure, allowing, for example, rapid processing of the entire Structural Classification of Proteins protein fragment database.

  9. The Relationship between Searches Performed in Online Databases and the Number of Full-Text Articles Accessed: Measuring the Interaction between Database and E-Journal Collections

    ERIC Educational Resources Information Center

    Lamothe, Alain R.

    2011-01-01

    The purpose of this paper is to report the results of a quantitative analysis exploring the interaction and relationship between the online database and electronic journal collections at the J. N. Desmarais Library of Laurentian University. A very strong relationship exists between the number of searches and the size of the online database…

  10. Addressing Statistical Biases in Nucleotide-Derived Protein Databases for Proteogenomic Search Strategies

    PubMed Central

    2012-01-01

    Proteogenomics has the potential to advance genome annotation through high-quality peptide identifications derived from mass spectrometry experiments, which demonstrate a given gene or isoform is expressed and translated at the protein level. This can advance our understanding of genome function, discovering novel genes and gene structure that have not yet been identified or validated. Because of the high-throughput shotgun nature of most proteomics experiments, it is essential to carefully control for false positives and prevent any potential misannotation. A number of statistical procedures to deal with this are in wide use in proteomics, calculating false discovery rate (FDR) and posterior error probability (PEP) values for groups and individual peptide spectrum matches (PSMs). These methods control for multiple testing and exploit decoy databases to estimate statistical significance. Here, we show that database choice has a major effect on these confidence estimates, leading to significant differences in the number of PSMs reported. We note that standard target:decoy approaches using six-frame translations of nucleotide sequences, such as assembled transcriptome data, apparently underestimate the confidence assigned to the PSMs. The source of this error stems from the inflated and unusual nature of the six-frame database, where for every target sequence there exist five “incorrect” targets that are unlikely to code for protein. The attendant FDR and PEP estimates lead to fewer accepted PSMs at fixed thresholds, and we show that this effect is a product of the database and statistical modeling and not the search engine. A variety of approaches to limit database size and remove noncoding target sequences are examined and discussed in terms of the altered statistical estimates generated and PSMs reported. These results are of importance to groups carrying out proteogenomics, aiming to maximize the validation and discovery of gene structure in sequenced genomes.
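
    The target-decoy arithmetic at issue is simple enough to sketch. A minimal version, assuming one score per PSM and the common convention that decoy counts estimate the number of false targets; the scores below are illustrative.

        def fdr_at(threshold, target_scores, decoy_scores):
            """Target-decoy FDR estimate at a score threshold:
            (#decoys passing) / (#targets passing)."""
            passing_t = sum(s >= threshold for s in target_scores)
            passing_d = sum(s >= threshold for s in decoy_scores)
            return passing_d / passing_t if passing_t else 0.0

        def accept_at_fdr(target_scores, decoy_scores, alpha=0.01):
            """Target PSMs kept at the most permissive threshold whose
            estimated FDR is still below alpha."""
            for t in sorted(target_scores):          # low -> high thresholds
                if fdr_at(t, target_scores, decoy_scores) <= alpha:
                    return [s for s in target_scores if s >= t]
            return []

        # With a six-frame decoy, the decoy score distribution shifts, so the
        # same alpha admits fewer target PSMs -- the bias the article examines.
        targets = [12.1, 9.8, 8.4, 7.7, 7.2, 5.0, 4.1]
        decoys = [6.9, 4.3, 3.8, 3.5, 2.9, 2.2, 1.7]
        print(accept_at_fdr(targets, decoys, alpha=0.2))   # keeps six PSMs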

  11. Analysis of the tryptic search space in UniProt databases

    PubMed Central

    Alpi, Emanuele; Griss, Johannes; da Silva, Alan Wilter Sousa; Bely, Benoit; Antunes, Ricardo; Zellner, Hermann; Ríos, Daniel; O'Donovan, Claire; Vizcaíno, Juan Antonio; Martin, Maria J

    2015-01-01

    In this article, we provide a comprehensive study of the content of the Universal Protein Resource (UniProt) protein data sets for human and mouse. The tryptic search spaces of the UniProtKB (UniProt knowledgebase) complete proteome sets were compared with other data sets from UniProtKB and with the corresponding International Protein Index, reference sequence, Ensembl, and UniRef100 (where UniRef is UniProt reference clusters) organism-specific data sets. All protein forms annotated in UniProtKB (both the canonical sequences and isoforms) were evaluated in this study. In addition, natural and disease-associated amino acid variants annotated in UniProtKB were included in the evaluation. The peptide unicity was also evaluated for each data set. Furthermore, the peptide information in the UniProtKB data sets was also compared against the available peptide-level identifications in the main MS-based proteomics repositories. Identifying the peptides observed in these repositories is an important resource of information for protein databases as they provide supporting evidence for the existence of otherwise predicted proteins. Likewise, the repositories could use the information available in UniProtKB to direct reprocessing efforts on specific sets of peptides/proteins of interest. In summary, we provide comprehensive information about the different organism-specific sequence data sets available from UniProt, together with the pros and cons for each, in terms of search space for MS-based bottom-up proteomics workflows. The aim of the analysis is to provide a clear view of the tryptic search space of UniProt and other protein databases to enable scientists to select those most appropriate for their purposes. PMID:25307260
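
    The notion of a tryptic search space can be made concrete with an in-silico digest. A minimal sketch using the common cleave-after-K/R-except-before-P rule, plus a toy version of the peptide-unicity count; the article's actual length bounds and missed-cleavage settings may differ, and the two sequences are illustrative.

        import re
        from collections import Counter

        def trypsin_digest(seq, missed=1):
            """Cleave C-terminal to K or R unless followed by P; also emit
            peptides with up to `missed` missed cleavages."""
            pieces = re.split(r'(?<=[KR])(?!P)', seq)
            peps = []
            for i in range(len(pieces)):
                for j in range(i, min(i + missed, len(pieces) - 1) + 1):
                    peps.append(''.join(pieces[i:j + 1]))
            return peps

        def unique_peptides(proteome, min_len=7):
            """Peptides that map to exactly one protein ('peptide unicity')."""
            owner = Counter()
            for acc, seq in proteome.items():
                for pep in set(trypsin_digest(seq)):
                    if len(pep) >= min_len:
                        owner[pep] += 1
            return {pep for pep, n in owner.items() if n == 1}

        proteome = {
            "P1": "MKWVTFISLLFLFSSAYSRGVFRRDAHK",
            "P2": "MKWVTFISLLFLFSSAYSRGLFRRDAHK",   # one substitution vs P1
        }
        print(sorted(unique_peptides(proteome)))   # shared peptides drop out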

  12. Analysis of the tryptic search space in UniProt databases.

    PubMed

    Alpi, Emanuele; Griss, Johannes; da Silva, Alan Wilter Sousa; Bely, Benoit; Antunes, Ricardo; Zellner, Hermann; Ríos, Daniel; O'Donovan, Claire; Vizcaíno, Juan Antonio; Martin, Maria J

    2015-01-01

    In this article, we provide a comprehensive study of the content of the Universal Protein Resource (UniProt) protein data sets for human and mouse. The tryptic search spaces of the UniProtKB (UniProt knowledgebase) complete proteome sets were compared with other data sets from UniProtKB and with the corresponding International Protein Index, reference sequence, Ensembl, and UniRef100 (where UniRef is UniProt reference clusters) organism-specific data sets. All protein forms annotated in UniProtKB (both the canonical sequences and isoforms) were evaluated in this study. In addition, natural and disease-associated amino acid variants annotated in UniProtKB were included in the evaluation. The peptide unicity was also evaluated for each data set. Furthermore, the peptide information in the UniProtKB data sets was also compared against the available peptide-level identifications in the main MS-based proteomics repositories. Identifying the peptides observed in these repositories is an important resource of information for protein databases as they provide supporting evidence for the existence of otherwise predicted proteins. Likewise, the repositories could use the information available in UniProtKB to direct reprocessing efforts on specific sets of peptides/proteins of interest. In summary, we provide comprehensive information about the different organism-specific sequence data sets available from UniProt, together with the pros and cons for each, in terms of search space for MS-based bottom-up proteomics workflows. The aim of the analysis is to provide a clear view of the tryptic search space of UniProt and other protein databases to enable scientists to select those most appropriate for their purposes.

  13. Seismic Search Engine: A distributed database for mining large scale seismic data

    NASA Astrophysics Data System (ADS)

    Liu, Y.; Vaidya, S.; Kuzma, H. A.

    2009-12-01

    The International Monitoring System (IMS) of the CTBTO collects terabytes of seismic measurements from many receiver stations situated around the earth, with the goal of detecting underground nuclear testing events and distinguishing them from other benign, but more common, events such as earthquakes and mine blasts. The International Data Center (IDC) processes and analyzes these measurements, as they are collected by the IMS, to summarize event detections in daily bulletins. Thereafter, the data measurements are archived into a large-format database. Our proposed Seismic Search Engine (SSE) will facilitate a framework for data exploration of the seismic database as well as the development of seismic data mining algorithms. Analogous to GenBank, the annotated genetic sequence database maintained by the NIH, SSE is intended to provide public access to seismic data and a set of processing and analysis tools, along with community-generated annotations and statistical models to help interpret the data. SSE will implement queries as user-defined functions composed from standard tools and models. Each query is compiled and executed over the database internally before reporting results back to the user. Since queries are expressed with standard tools and models, users can easily reproduce published results within this framework for peer review and for making metric comparisons. As an illustration, an example query is “what are the best receiver stations in East Asia for detecting events in the Middle East?” Evaluating this query involves listing all receiver stations in East Asia, characterizing known seismic events in that region, and constructing a profile for each receiver station to determine how effective its measurements are at predicting each event. The results of this query can be used to help prioritize how data is collected, identify defective instruments, and guide future sensor placements.

  14. Associations between chromosomal anomalies and congenital heart defects: a database search.

    PubMed

    van Karnebeek, C D; Hennekam, R C

    1999-05-21

    Recent technical advances in molecular biology and cytogenetics, as well as a more developmental approach to congenital heart disorders (CHDs), have led to considerable progress in our understanding of their pathogenesis, especially of the important causative role of genetic factors. The complex embryology of the heart suggests the involvement of numerous genes, and hence, numerous chromosomal loci, such as the recently identified 22q11, in normal cardiomorphogenesis. In order to identify other loci, the Human Cytogenetics DataBase was searched for all chromosome anomalies associated with CHD. Through the application of several (arbitrary) criteria we have selected associations occurring so frequently that they may not be fortuitous, suggesting assignment of a gene or genes responsible for specific CHDs to certain chromosome regions. The results of this study may be a first step in the detection of specific chromosome defects responsible for CHD, which will be useful in daily patient care and may provide clues for further cytogenetic and molecular studies.

  15. Integration of first-principles methods and crystallographic database searches for new ferroelectrics: Strategies and explorations

    NASA Astrophysics Data System (ADS)

    Bennett, Joseph W.; Rabe, Karin M.

    2012-11-01

    In this concept paper, the development of strategies for the integration of first-principles methods with crystallographic database mining for the discovery and design of novel ferroelectric materials is discussed, drawing on the results and experience derived from exploratory investigations on three different systems: (1) the double perovskite Sr(Sb1/2Mn1/2)O3 as a candidate semiconducting ferroelectric; (2) polar derivatives of schafarzikite MSb2O4; and (3) ferroelectric semiconductors with formula M2P2(S,Se)6. A variety of avenues for further research and investigation are suggested, including automated structure type classification, low-symmetry improper ferroelectrics, and high-throughput first-principles searches for additional representatives of structural families with desirable functional properties.

  16. Proteomics of Soil and Sediment: Protein Identification by De Novo Sequencing of Mass Spectra Complements Traditional Database Searching

    NASA Astrophysics Data System (ADS)

    Miller, S.; Rizzo, A. I.; Waldbauer, J.

    2014-12-01

    Proteomics has the potential to elucidate the metabolic pathways and taxa responsible for in situ biogeochemical transformations. However, low rates of protein identification from high resolution mass spectra have been a barrier to the development of proteomics in complex environmental samples. Much of the difficulty lies in the computational challenge of linking mass spectra to their corresponding proteins. Traditional database search methods for matching peptide sequences to mass spectra are often inadequate due to the complexity of environmental proteomes and the large database search space, as we demonstrate with soil and sediment proteomes generated via a range of extraction methods. One alternative to traditional database searching is de novo sequencing, which identifies peptide sequences without the need for a database. BLAST can then be used to match de novo sequences to similar genetic sequences. Assigning confidence to putative identifications has been one hurdle for the implementation of de novo sequencing. We found that accurate de novo sequences can be screened by quality score and length. Screening criteria are verified by comparing the results of de novo sequencing and traditional database searching for well-characterized proteomes from simple biological systems. The BLAST hits of screened sequences are interrogated for taxonomic and functional information. We applied de novo sequencing to organic topsoil and marine sediment proteomes. Peak-rich proteomes, which can result from various extraction techniques, yield thousands of high-confidence protein identifications, an improvement over previous proteomic studies of soil and sediment. User-friendly software tools for de novo metaproteomics analysis have been developed. This "De Novo Analysis" Pipeline is also a faster method of data analysis than constructing a tailored sequence database for traditional database searching.
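
    The screening step described above lends itself to a one-line filter. A sketch, assuming each de novo call carries a per-peptide confidence score (here called alc, for average local confidence, as reported by several de novo tools); the thresholds and sequences are illustrative, not the authors' values.

        def screen_denovo(calls, min_alc=80.0, min_len=8):
            """Keep de novo peptide calls passing both the quality-score and
            the length screens."""
            return [c for c in calls
                    if c["alc"] >= min_alc and len(c["sequence"]) >= min_len]

        calls = [
            {"sequence": "LSDEVAK", "alc": 91.2},           # too short
            {"sequence": "GLSDGEWQQVLNVWGK", "alc": 95.0},  # passes
            {"sequence": "VEADIAGHGQEVLIR", "alc": 62.3},   # low score
        ]
        print(screen_denovo(calls))   # only the second call survives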

  17. Proteomics of Soil and Sediment: Protein Identification by De Novo Sequencing of Mass Spectra Complements Traditional Database Searching

    NASA Astrophysics Data System (ADS)

    Miller, S.; Rizzo, A. I.; Waldbauer, J.

    2015-12-01

    Proteomics has the potential to elucidate the metabolic pathways and taxa responsible for in situ biogeochemical transformations. However, low rates of protein identification from high resolution mass spectra have been a barrier to the development of proteomics in complex environmental samples. Much of the difficulty lies in the computational challenge of linking mass spectra to their corresponding proteins. Traditional database search methods for matching peptide sequences to mass spectra are often inadequate due to the complexity of environmental proteomes and the large database search space, as we demonstrate with soil and sediment proteomes generated via a range of extraction methods. One alternative to traditional database searching is de novo sequencing, which identifies peptide sequences without the need for a database. BLAST can then be used to match de novo sequences to similar genetic sequences. Assigning confidence to putative identifications has been one hurdle for the implementation of de novo sequencing. We found that accurate de novo sequences can be screened by quality score and length. Screening criteria are verified by comparing the results of de novo sequencing and traditional database searching for well-characterized proteomes from simple biological systems. The BLAST hits of screened sequences are interrogated for taxonomic and functional information. We applied de novo sequencing to organic topsoil and marine sediment proteomes. Peak-rich proteomes, which can result from various extraction techniques, yield thousands of high-confidence protein identifications, an improvement over previous proteomic studies of soil and sediment. User-friendly software tools for de novo metaproteomics analysis have been developed. This "De Novo Analysis" Pipeline is also a faster method of data analysis than constructing a tailored sequence database for traditional database searching.

  18. Neuron-Miner: An Advanced Tool for Morphological Search and Retrieval in Neuroscientific Image Databases.

    PubMed

    Conjeti, Sailesh; Mesbah, Sepideh; Negahdar, Mohammadreza; Rautenberg, Philipp L; Zhang, Shaoting; Navab, Nassir; Katouzian, Amin

    2016-10-01

    The steadily growing amount of digital neuroscientific data demands a reliable, systematic, and computationally effective retrieval algorithm. In this paper, we present Neuron-Miner, a tool for fast and accurate reference-based retrieval within neuron image databases. The proposed algorithm is built upon a hashing (search and retrieval) technique employing multiple unsupervised random trees, collectively called Hashing Forests (HF). The HF are trained to parse the neuromorphological space hierarchically and to preserve the inherent neuron neighborhoods while encoding them with compact binary codewords. We further introduce an inverse-coding formulation within HF to effectively mitigate pairwise neuron similarity comparisons, thus allowing scalability to massive databases with little additional time overhead. The proposed hashing tool offers a superior approximation of the true neuromorphological neighborhood, with better retrieval and ranking performance than existing generalized hashing methods. This is exhaustively validated by quantifying the results over 31,266 neuron reconstructions from the Neuromorpho.org dataset, curated from 147 different archives. We envisage that finding and ranking similar neurons through reference-based querying via Neuron-Miner will assist neuroscientists in objectively understanding the relationship between neuronal structure and function for applications in comparative anatomy or diagnosis. PMID:27155864
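
    For orientation, here is a generic random-hyperplane hashing baseline over precomputed morphology descriptors, of the 'generalized hashing' family the paper compares against. The paper's Hashing Forests instead learn the binary codes with unsupervised trees, which this sketch does not reproduce.

        import numpy as np

        rng = np.random.default_rng(0)

        def train_codes(features, n_bits=32):
            """Random-hyperplane hashing: keep the sign pattern of random
            projections as a compact binary code per neuron."""
            planes = rng.standard_normal((features.shape[1], n_bits))
            return (features @ planes > 0).astype(np.uint8), planes

        def query(code_db, planes, q, k=5):
            """Rank database items by Hamming distance to the query's code."""
            qc = (q @ planes > 0).astype(np.uint8)
            hamming = (code_db != qc).sum(axis=1)
            return np.argsort(hamming)[:k]

        feats = rng.standard_normal((1000, 16))   # stand-in morphology vectors
        codes, planes = train_codes(feats)
        print(query(codes, planes, feats[42]))    # item 42 ranks first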

  1. GPU-Acceleration of Sequence Homology Searches with Database Subsequence Clustering.

    PubMed

    Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka

    2016-01-01

    Sequence homology searches are used in various fields and require large amounts of computation time, especially for metagenomic analysis, owing to the large number of queries and the database size. To accelerate such analyses, graphics processing units (GPUs) are widely used as a low-cost, high-performance computing platform. Therefore, we mapped the time-consuming steps involved in GHOSTZ, which is a state-of-the-art homology search algorithm for protein sequences, onto a GPU and implemented it as GHOSTZ-GPU. In addition, we optimized memory access for GPU calculations and for communication between the CPU and GPU. In an evaluation on metagenomic data, GHOSTZ-GPU with 12 CPU threads and 1 GPU was approximately 3.0- to 4.1-fold faster than GHOSTZ with 12 CPU threads. Moreover, GHOSTZ-GPU with 12 CPU threads and 3 GPUs was approximately 5.8- to 7.7-fold faster than GHOSTZ with 12 CPU threads. PMID:27482905

  2. GPU-Acceleration of Sequence Homology Searches with Database Subsequence Clustering

    PubMed Central

    Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka

    2016-01-01

    Sequence homology searches are used in various fields and require large amounts of computation time, especially for metagenomic analysis, owing to the large number of queries and the database size. To accelerate such analyses, graphics processing units (GPUs) are widely used as a low-cost, high-performance computing platform. Therefore, we mapped the time-consuming steps involved in GHOSTZ, which is a state-of-the-art homology search algorithm for protein sequences, onto a GPU and implemented it as GHOSTZ-GPU. In addition, we optimized memory access for GPU calculations and for communication between the CPU and GPU. In an evaluation on metagenomic data, GHOSTZ-GPU with 12 CPU threads and 1 GPU was approximately 3.0- to 4.1-fold faster than GHOSTZ with 12 CPU threads. Moreover, GHOSTZ-GPU with 12 CPU threads and 3 GPUs was approximately 5.8- to 7.7-fold faster than GHOSTZ with 12 CPU threads. PMID:27482905

  3. deconSTRUCT: general purpose protein database search on the substructure level.

    PubMed

    Zhang, Zong Hong; Bharatham, Kavitha; Sherman, Westley A; Mihalek, Ivana

    2010-07-01

    The deconSTRUCT webserver offers an interface to a protein database search engine, usable for general-purpose detection of similar protein (sub)structures. Initially, it deconstructs the query structure into its secondary structure elements (SSEs) and reassembles the match to the target by requiring a (tunable) degree of similarity in the direction and sequential order of SSEs. Hierarchical organization and judicious use of the information about protein structure enable deconSTRUCT to achieve the sensitivity and specificity of the established search engines at orders-of-magnitude greater speed, without irretrievably tying up the substructure information in the form of a hash. In a post-processing step, a match on the level of the backbone atoms is constructed. The results presented to the user consist of the list of the matched SSEs, the transformation matrix for rigid superposition of the structures and several ways of visualization, both downloadable and implemented as a web-browser plug-in. The server is available at http://epsf.bmad.bii.a-star.edu.sg/struct_server.html.

  4. Complementary Value of Databases for Discovery of Scholarly Literature: A User Survey of Online Searching for Publications in Art History

    ERIC Educational Resources Information Center

    Nemeth, Erik

    2010-01-01

    Discovery of academic literature through Web search engines challenges the traditional role of specialized research databases. Creation of literature outside academic presses and peer-reviewed publications expands the content for scholarly research within a particular field. The resulting body of literature raises the question of whether scholars…

  5. Searching for NEO precoveries in the PS1 and MPC databases

    NASA Astrophysics Data System (ADS)

    Weryk, Robert J.; Wainscoat, Richard J.

    2016-10-01

    The Pan-STARRS (PS1) survey telescope, operated by the University of Hawai`i, covers the sky north of -49 degrees declination with its seven-square-degree field of view. Described in detail by Wainscoat et al. (2015), it has become the leading telescope for new Near Earth Object (NEO) discoveries. In 2015, it found almost half of the new Near Earth Asteroids, as well as half of the new comets. Observations of potential NEOs must be followed up before they can be confirmed and announced as new discoveries, and we are dependent on the follow-up capabilities of other telescopes for this. However, not every NEO candidate is immediately followed up and linked into a well-established orbit, possibly because smaller bodies may not be visible at current instrument sensitivity limits for very long, or because their predicted orbits are too uncertain, so follow-up telescopes look in the wrong location. But in certain cases, these objects may have been observed during previous lunations. We present a method to search for precovery detections in both the PS1 database and the Isolated Tracklet File (ITF) provided by the Minor Planet Center (MPC). This file contains over 12 million detections, mostly from the large surveys, which are not associated with any known objects. We demonstrate that multi-tracklet linkages for both known and unknown objects may be found in these databases, including detections for both NEOs and non-NEOs which often appear on the MPC's NEO Confirmation Page. [1] Wainscoat, R. et al., IAU Symposium 318, editors S. Chesley and R. Jedicke (2015)

  6. Introducing a New Interface for the Online MagIC Database by Integrating Data Uploading, Searching, and Visualization

    NASA Astrophysics Data System (ADS)

    Jarboe, N.; Minnett, R.; Constable, C.; Koppers, A. A.; Tauxe, L.

    2013-12-01

    The Magnetics Information Consortium (MagIC) is dedicated to supporting the paleomagnetic, geomagnetic, and rock magnetic communities through the development and maintenance of an online database (http://earthref.org/MAGIC/), data upload and quality control, searches, data downloads, and visualization tools. While MagIC has completed importing some of the IAGA paleomagnetic databases (TRANS, PINT, PSVRL, GPMDB) and continues to import others (ARCHEO, MAGST and SECVR), further individual data uploading from the community contributes a wealth of easily accessible, rich datasets. Previously, uploading data to the MagIC database required an Excel spreadsheet on either a Mac or a PC. The new method of uploading data utilizes an HTML5 web interface whose only requirement is a modern browser. This web interface highlights all errors discovered in the dataset at once, instead of the iterative error-checking process found in the previous Excel spreadsheet data checker. As a web service, the community will always have easy access to the most up-to-date and bug-free version of the data-upload software. The filtering search mechanism of the MagIC database has been changed to a more intuitive system in which the data from each contribution are displayed in tables similar to how the data are uploaded (http://earthref.org/MAGIC/search/). Searches themselves can be saved as a permanent URL, if desired. The saved search URL could then be used as a citation in a publication. When appropriate, plots (equal area, Zijderveld, ARAI, demagnetization, etc.) are associated with the data to give the user a quicker understanding of the underlying dataset. The MagIC database will continue to evolve to meet the needs of the paleomagnetic, geomagnetic, and rock magnetic communities.

  7. Elimination of Duplicate Citations from Cross Database Searching Using an "Intelligent" Terminal to Produce Report Style Searches.

    ERIC Educational Resources Information Center

    Riley, Connie; And Others

    1981-01-01

    The Tarrytown Technical Information Center at General Foods produces report-style searches by using upgraded equipment that allows search strategies to be stored and edited offline, thus reducing costs for both online searching and the searcher's time. A computer program for eliminating duplicate bibliographic citations is included. (RBF)

  8. UniCon3D: de novo protein structure prediction using united-residue conformational search via stepwise, probabilistic sampling

    PubMed Central

    Bhattacharya, Debswapna; Cao, Renzhi; Cheng, Jianlin

    2016-01-01

    Motivation: Recent experimental studies have suggested that proteins fold via stepwise assembly of structural units named ‘foldons’ through the process of sequential stabilization. Alongside, recent developments on the computational side, based on probabilistic modeling, have shown a promising direction for performing de novo protein conformational sampling from continuous space. However, existing computational approaches for de novo protein structure prediction often sample protein conformational space randomly, as opposed to the experimentally suggested stepwise sampling. Results: Here, we develop a novel generative, probabilistic model that simultaneously captures local structural preferences of backbone and side chain conformational space of polypeptide chains in a united-residue representation and performs experimentally motivated conditional conformational sampling via stepwise synthesis and assembly of foldon units that minimizes a composite physics- and knowledge-based energy function for de novo protein structure prediction. The proposed method, UniCon3D, has been found to (i) sample lower-energy conformations with higher accuracy than traditional random sampling in a small benchmark of 6 proteins; (ii) perform comparably with the top five automated methods on 30 difficult target domains from the 11th Critical Assessment of Protein Structure Prediction (CASP) experiment and on 15 difficult target domains from the 10th CASP experiment; and (iii) outperform two state-of-the-art approaches and a baseline counterpart of UniCon3D that performs traditional random sampling for protein modeling aided by predicted residue-residue contacts on 45 targets from the 10th edition of CASP. Availability and Implementation: Source code, executable versions, manuals and example data of UniCon3D for Linux and OSX are freely available to non-commercial users at http://sysbio.rnet.missouri.edu/UniCon3D/. Contact: chengji@missouri.edu Supplementary information: Supplementary data are available at Bioinformatics online.

  9. Probabilistic Risk Assessment: A Bibliography

    NASA Technical Reports Server (NTRS)

    2000-01-01

    Probabilistic risk analysis is an integration of failure modes and effects analysis (FMEA), fault tree analysis and other techniques to assess the potential for failure and to find ways to reduce risk. This bibliography references 160 documents in the NASA STI Database that contain the major concepts, probabilistic risk assessment, risk and probability theory, in the basic index or major subject terms. An abstract is included with most citations, followed by the applicable subject terms.

  10. Semi-automated Search For Lyman-alpha And Other Emission Lines In The DEEP2 And DEEP3 Databases

    NASA Astrophysics Data System (ADS)

    McCormick, Katherine; Alvarez-Buylla, A.; Dean, V.; Guhathakurta, P.; Lai, K.; Sawicki, M.; Lemaux, B.; Grishaw-Jones, C.; DEEP2; DEEP3

    2012-01-01

    The DEEP2 and DEEP3 redshift surveys have obtained spectra of approximately 75,000 distant galaxies. In an effort to obtain as much information as possible from these spectra, we have performed a semi-automated, systematic search for emission lines in the DEEP2 and DEEP3 databases. The process has two steps: first, we run the SExtractor software on sky-subtracted 2D DEIMOS spectra to find emission lines, and we categorize these emission lines based on whether they are associated with the target galaxy, are single emission lines, are possible artifacts resulting from poorly subtracted night-sky emission lines, etc. Next, we supplement the automated search with both a guided and an unguided visual search and compare our findings with the output of the program. During this visual inspection process, we check the program for completeness and contamination. By introducing an automated element to the search, we have compiled a more objective and complete census of the emission lines in the DEEP2 and DEEP3 databases than a pure visual search would yield. Our program has detected some faint emission lines that had been missed by the human eye. In addition, through our semi-automated search, we have located several possible serendipitous high-redshift Lyman-alpha-emitting galaxies in the redshift range of 3 to 7. This research was funded by the NSF and the Science Internship Program (SIP) at UCSC.

  11. Searching biosignal databases by content and context: Research Oriented Integration System for ECG Signals (ROISES).

    PubMed

    Kokkinaki, Alexandra; Chouvarda, Ioanna; Maglaveras, Nicos

    2012-11-01

    Technological advances in the textile, biosensor and electrocardiography domains have induced the widespread use of bio-signal acquisition devices, leading to the generation of massive bio-signal datasets. Among the most popular bio-signals, the electrocardiogram (ECG) possesses the longest tradition in bio-signal monitoring and recording, being a strong and relatively robust signal. As research resources are fostered, the research community is promoting the need to extract new knowledge from bio-signals towards the adoption of new medical procedures. However, integrated access, query and management of ECGs are impeded by the diversity and heterogeneity of bio-signal storage data formats. In this scope, the proposed work introduces a new methodology for unified access to bio-signal databases and the accompanying metadata. It allows decoupling information retrieval from the actual underlying datasource structures and enables transparent content- and context-based searching across multiple data resources. Our approach is based on the definition of an interactive global ontology which manipulates the similarities and the differences of the underlying sources to either establish similarity mappings or enrich its terminological structure. We also introduce ROISES (Research Oriented Integration System for ECG Signals), for the definition of complex content-based queries against the diverse bio-signal data sources. PMID:21397354

  12. In search of yoga: Research trends in a western medical database

    PubMed Central

    McCall, Marcy C

    2014-01-01

    Context: The promotion of yoga practice as a preventative and treatment therapy for health outcomes in the western hemisphere is increasing rapidly. As the commercial success of yoga burgeons in popular culture, it is important to investigate the trends of yoga as a therapeutic intervention in the academic literature. The free-access search engine PubMed is a preeminent resource for identifying health-related research articles published for academics, health practitioners and others. Aims: To report the recent yoga-related publications in the western healthcare context, with particular interest in the subject and type of yoga titles. Materials and Methods: A bibliometric analysis to describe the annual trends in publication on PubMed from January 1950 to December 2012. Results: The number of yoga-related titles included in the PubMed database is limited until a marked increase in 2000 and a steady surge since 2007. Bibliometric analysis indicates that more than 200 new titles have been added per annum since 2011. Systematic reviews and yoga trials are increasing exponentially, indicating a potential increase in the quality of evidence. Titles including pain management, stress or anxiety, depression and cancer conditions are highly correlated with yoga and healthcare research. Conclusions: The prevalence of yoga research in western healthcare is increasing. The marked increase in volume indicates the need for more systematic analysis of the literature in terms of quality and results. PMID:25035601

  13. In search of a statistical probability model for petroleum-resource assessment : a critique of the probabilistic significance of certain concepts and methods used in petroleum-resource assessment : to that end, a probabilistic model is sketched

    USGS Publications Warehouse

    Grossling, Bernardo F.

    1975-01-01

    Exploratory drilling is still in incipient or youthful stages in those areas of the world where the bulk of the potential petroleum resources is yet to be discovered. Methods of assessing resources from projections based on historical production and reserve data are limited to mature areas. For most of the world's petroleum-prospective areas, a more speculative situation calls for a critical review of resource-assessment methodology. The language of mathematical statistics is required to define more rigorously the appraisal of petroleum resources. Basically, two approaches have been used to appraise the amounts of undiscovered mineral resources in a geologic province: (1) projection models, which use statistical data on the past outcome of exploration and development in the province; and (2) estimation models of the overall resources of the province, which use certain known parameters of the province together with the outcome of exploration and development in analogous provinces. These two approaches often lead to widely different estimates. Some of the controversy that arises results from a confusion of the probabilistic significance of the quantities yielded by each of the two approaches. Also, inherent limitations of analytic projection models, such as those using the logistic and Gompertz functions, have often been ignored. The resource-assessment problem should be recast in terms that provide for consideration of the probability of existence of the resource and of the probability of discovery of a deposit. Then the two above-mentioned models occupy the two ends of the probability range. The new approach accounts for (1) what can be expected with reasonably high certainty by mere projections of what has been accomplished in the past; (2) the inherent biases of decision-makers and resource estimators; (3) upper bounds that can be set up as goals for exploration; and (4) the uncertainties in geologic conditions in a search for minerals. Actual outcomes can then

  14. Preparing College Students To Search Full-Text Databases: Is Instruction Necessary?

    ERIC Educational Resources Information Center

    Riley, Cheryl; Wales, Barbara

    Full-text databases allow Central Missouri State University's clients to access some of the serials that libraries have had to cancel due to escalating subscription costs; EbscoHost, the subject of this study, is one such database. The database is available free to all Missouri residents. A survey was designed consisting of 21 questions intended…

  15. Searching for first-degree familial relationships in California's offender DNA database: validation of a likelihood ratio-based approach.

    PubMed

    Myers, Steven P; Timken, Mark D; Piucci, Matthew L; Sims, Gary A; Greenwald, Michael A; Weigand, James J; Konzak, Kenneth C; Buoncristiani, Martin R

    2011-11-01

    A validation study was performed to measure the effectiveness of using a likelihood ratio-based approach to search for possible first-degree familial relationships (full-sibling and parent-child) by comparing an evidence autosomal short tandem repeat (STR) profile to California's ∼1,000,000-profile State DNA Index System (SDIS) database. Test searches used autosomal STR and Y-STR profiles generated for 100 artificial test families. When the test sample and the first-degree relative in the database were characterized at the 15 Identifiler(®) (Applied Biosystems(®), Foster City, CA) STR loci, the search procedure included 96% of the fathers and 72% of the full-siblings. When the relative profile was limited to the 13 Combined DNA Index System (CODIS) core loci, the search procedure included 93% of the fathers and 61% of the full-siblings. These results, combined with those of functional tests using three real families, support the effectiveness of this tool. Based upon these results, the validated approach was implemented as a key, pragmatic and demonstrably practical component of the California Department of Justice's Familial Search Program. An investigative lead created through this process recently led to an arrest in the Los Angeles Grim Sleeper serial murders.
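
    The per-locus arithmetic behind such a search can be sketched directly. A minimal parent-child likelihood ratio at a single STR locus, ignoring mutation and subpopulation correction (the full-sibling formula adds identity-by-descent weights); the per-locus ratios multiply across the 13-15 typed loci, and the allele frequencies below are illustrative.

        def genotype_freq(g, f):
            """Random-match frequency of a genotype under Hardy-Weinberg."""
            a, b = g
            return f[a] ** 2 if a == b else 2 * f[a] * f[b]

        def parent_child_lr(parent, child, f):
            """LR for 'one child allele inherited from this parent' versus
            'unrelated': numerator is P(child genotype | one allele from the
            parent, one from the population); denominator is the random
            genotype frequency."""
            a, b = child
            num = 0.0
            for t in parent:          # each parental allele passed w.p. 1/2
                if a == b:
                    num += 0.5 * (f[a] if t == a else 0.0)
                else:
                    num += 0.5 * ((f[b] if t == a else 0.0) +
                                  (f[a] if t == b else 0.0))
            return num / genotype_freq(child, f)

        freqs = {"12": 0.18, "13": 0.11, "14": 0.25}    # illustrative
        lr = parent_child_lr(parent=("12", "13"), child=("13", "14"), f=freqs)
        print(lr)   # 1/(4*0.11) = 2.27...: weak support from a single locus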

  16. Similarity searching of chemical databases using atom environment descriptors (MOLPRINT 2D): evaluation of performance.

    PubMed

    Bender, Andreas; Mussa, Hamse Y; Glen, Robert C; Reiling, Stephan

    2004-01-01

    A molecular similarity searching technique based on atom environments, information-gain-based feature selection, and the naive Bayesian classifier has been applied to a series of diverse datasets and its performance compared to those of alternative searching methods. Atom environments are count vectors of heavy atoms present at a topological distance from each heavy atom of a molecular structure. In this application, using a recently published dataset of more than 100000 molecules from the MDL Drug Data Report database, the atom environment approach appears to outperform fusion of ranking scores as well as binary kernel discrimination, which are both used in combination with Unity fingerprints. Overall retrieval rates among the top 5% of the sorted library are nearly 10% better (more than 14% better in relative numbers) than those of the second best method, Unity fingerprints and binary kernel discrimination. In 10 out of 11 sets of active compounds the combination of atom environments and the naive Bayesian classifier appears to be the superior method, while in the remaining dataset, data fusion and binary kernel discrimination in combination with Unity fingerprints is the method of choice. Binary kernel discrimination in combination with Unity fingerprints generally comes second in performance overall. The difference in performance can largely be attributed to the different molecular descriptors used. Atom environments outperform Unity fingerprints by a large margin if the combination of these descriptors with the Tanimoto coefficient is compared. The naive Bayesian classifier in combination with information-gain-based feature selection and selection of a sensible number of features performs about as well as binary kernel discrimination in experiments where these classification methods are compared. When used on a monoaminooxidase dataset, atom environments and the naive Bayesian classifier perform as well as binary kernel discrimination in the case of a 50
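
    The descriptor-plus-classifier combination is easy to reproduce in outline. The sketch below uses RDKit Morgan (circular) fingerprints as a readily available stand-in for MOLPRINT 2D atom environments, together with a Bernoulli naive Bayes model from scikit-learn; it omits the information-gain feature selection step, and the SMILES strings are arbitrary examples rather than the study's datasets.

        # Requires RDKit and scikit-learn. Morgan fingerprints describe the
        # heavy-atom neighborhood around each atom, as atom environments do.
        import numpy as np
        from rdkit import Chem
        from rdkit.Chem import AllChem
        from sklearn.naive_bayes import BernoulliNB

        def fingerprint(smiles, n_bits=1024):
            mol = Chem.MolFromSmiles(smiles)
            bv = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=n_bits)
            return np.array(list(bv), dtype=np.uint8)

        actives = ["CCOc1ccc2nc(S(N)(=O)=O)sc2c1", "CC(=O)Oc1ccccc1C(=O)O"]
        inactives = ["CCCCCC", "c1ccccc1", "CCO"]
        X = np.array([fingerprint(s) for s in actives + inactives])
        y = np.array([1] * len(actives) + [0] * len(inactives))

        model = BernoulliNB().fit(X, y)     # naive Bayes over fingerprint bits
        library = ["CC(=O)Nc1ccc(O)cc1", "CCCC"]
        scores = model.predict_proba(np.array([fingerprint(s) for s in library]))[:, 1]
        for score, smi in sorted(zip(scores, library), reverse=True):
            print(f"{score:.3f}  {smi}")    # library ranked by predicted activity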

  17. HRGRN: A Graph Search-Empowered Integrative Database of Arabidopsis Signaling Transduction, Metabolism and Gene Regulation Networks

    PubMed Central

    Dai, Xinbin; Li, Jun; Liu, Tingsong; Zhao, Patrick Xuechun

    2016-01-01

    The biological networks controlling plant signal transduction, metabolism and gene regulation are composed of not only tens of thousands of genes, compounds, proteins and RNAs but also the complicated interactions and co-ordination among them. These networks play critical roles in many fundamental mechanisms, such as plant growth, development and environmental response. Although much is known about these complex interactions, the knowledge and data are currently scattered throughout the published literature, publicly available high-throughput data sets and third-party databases. Many ‘unknown’ yet important interactions among genes need to be mined and established through extensive computational analysis. However, exploring these complex biological interactions at the network level from existing heterogeneous resources remains challenging and time-consuming for biologists. Here, we introduce HRGRN, a graph search-empowered integrative database of Arabidopsis signal transduction, metabolism and gene regulatory networks. HRGRN utilizes Neo4j, which is a highly scalable graph database management system, to host large-scale biological interactions among genes, proteins, compounds and small RNAs that were either validated experimentally or predicted computationally. The associated biological pathway information was also specially marked for the interactions that are involved in the pathway to facilitate the investigation of cross-talk between pathways. Furthermore, HRGRN integrates a series of graph path search algorithms to discover novel relationships among genes, compounds, RNAs and even pathways from heterogeneous biological interaction data that could be missed by traditional SQL database search methods. Users can also build subnetworks based on known interactions. The outcomes are visualized with rich text, figures and interactive network graphs on web pages. The HRGRN database is freely available at http://plantgrn.noble.org/hrgrn/. PMID:26657893

  18. HRGRN: A Graph Search-Empowered Integrative Database of Arabidopsis Signaling Transduction, Metabolism and Gene Regulation Networks.

    PubMed

    Dai, Xinbin; Li, Jun; Liu, Tingsong; Zhao, Patrick Xuechun

    2016-01-01

    The biological networks controlling plant signal transduction, metabolism and gene regulation are composed of not only tens of thousands of genes, compounds, proteins and RNAs but also the complicated interactions and co-ordination among them. These networks play critical roles in many fundamental mechanisms, such as plant growth, development and environmental response. Although much is known about these complex interactions, the knowledge and data are currently scattered throughout the published literature, publicly available high-throughput data sets and third-party databases. Many 'unknown' yet important interactions among genes need to be mined and established through extensive computational analysis. However, exploring these complex biological interactions at the network level from existing heterogeneous resources remains challenging and time-consuming for biologists. Here, we introduce HRGRN, a graph search-empowered integrative database of Arabidopsis signal transduction, metabolism and gene regulatory networks. HRGRN utilizes Neo4j, which is a highly scalable graph database management system, to host large-scale biological interactions among genes, proteins, compounds and small RNAs that were either validated experimentally or predicted computationally. The associated biological pathway information was also specially marked for the interactions that are involved in the pathway to facilitate the investigation of cross-talk between pathways. Furthermore, HRGRN integrates a series of graph path search algorithms to discover novel relationships among genes, compounds, RNAs and even pathways from heterogeneous biological interaction data that could be missed by traditional SQL database search methods. Users can also build subnetworks based on known interactions. The outcomes are visualized with rich text, figures and interactive network graphs on web pages. The HRGRN database is freely available at http://plantgrn.noble.org/hrgrn/. PMID:26657893
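
    The kind of path query a graph store enables, and that is awkward to express in SQL, looks like the following with the Neo4j Python driver. The node labels, relationship types, gene symbols, and connection settings are hypothetical, since the abstract does not publish HRGRN's schema.

        # Requires the official Neo4j Python driver (pip install neo4j).
        from neo4j import GraphDatabase

        driver = GraphDatabase.driver("bolt://localhost:7687",
                                      auth=("neo4j", "password"))

        # Shortest interaction/regulation path between two genes, up to six hops.
        CYPHER = """
        MATCH p = shortestPath(
          (a:Gene {symbol: $source})-[:INTERACTS_WITH|REGULATES*..6]-
          (b:Gene {symbol: $target}))
        RETURN [n IN nodes(p) | n.symbol] AS path
        """

        with driver.session() as session:
            for record in session.run(CYPHER, source="PHYB", target="FT"):
                print(record["path"])   # gene symbols along one shortest path
        driver.close()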

  20. Identifying proteins from two-dimensional gels by molecular mass searching of peptide fragments in protein sequence databases.

    PubMed

    Henzel, W J; Billeci, T M; Stults, J T; Wong, S C; Grimley, C; Watanabe, C

    1993-06-01

    A rapid method for the identification of known proteins separated by two-dimensional gel electrophoresis is described in which molecular masses of peptide fragments are used to search a protein sequence database. The peptides are generated by in situ reduction, alkylation, and tryptic digestion of proteins electroblotted from two-dimensional gels. Masses are determined at the subpicomole level by matrix-assisted laser desorption/ionization mass spectrometry of the unfractionated digest. A computer program has been developed that searches the protein sequence database for multiple peptides of individual proteins that match the measured masses. To ensure that the most recent database updates are included, a theoretical digest of the entire database is generated each time the program is executed. This method facilitates simultaneous processing of a large number of two-dimensional gel spots. The method was applied to a two-dimensional gel of a crude Escherichia coli extract that was electroblotted onto poly(vinylidene difluoride) membrane. Ten randomly chosen spots were analyzed. With as few as three peptide masses, each protein was uniquely identified from over 91,000 protein sequences. All identifications were verified by concurrent N-terminal sequencing of identical spots from a second blot. One of the spots contained an N-terminally blocked protein that required enzymatic cleavage, peptide separation, and Edman degradation for confirmation of its identity.
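
    The mass-search step described above can be sketched in a few lines: digest each database protein in silico using trypsin cleavage rules, then count how many measured masses match a predicted peptide within a tolerance. This is a minimal sketch with an invented two-protein database, not the authors' program.

        import re

        # Monoisotopic residue masses for the amino acids used below.
        RESIDUE = {"G": 57.02146, "A": 71.03711, "S": 87.03203,
                   "P": 97.05276, "V": 99.06841, "T": 101.04768,
                   "L": 113.08406, "D": 115.02694, "K": 128.09496,
                   "R": 156.10111, "F": 147.06841}
        WATER = 18.01056  # mass added by the terminal H and OH

        def tryptic_peptides(seq):
            # Trypsin rule: cleave after K or R, but not before P.
            return [p for p in re.split(r"(?<=[KR])(?!P)", seq) if p]

        def peptide_mass(pep):
            return sum(RESIDUE[a] for a in pep) + WATER

        def count_matches(seq, measured, tol=0.5):
            masses = [peptide_mass(p) for p in tryptic_peptides(seq)]
            return sum(any(abs(m - x) <= tol for m in masses)
                       for x in measured)

        database = {"prot1": "AGKLSRPVT", "prot2": "FFKDDR"}  # invented
        measured = [peptide_mass("AGK"), peptide_mass("DDR")]
        for name, seq in database.items():
            print(name, count_matches(seq, measured))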

  1. Millennial Students' Mental Models of Search: Implications for Academic Librarians and Database Developers

    ERIC Educational Resources Information Center

    Holman, Lucy

    2011-01-01

    Today's students exhibit generational differences in the way they search for information. Observations of first-year students revealed a proclivity for simple keyword or phrases searches with frequent misspellings and incorrect logic. Although no students had strong mental models of search mechanisms, those with stronger models did construct more…

  2. Comparative Recall and Precision of Simple and Expert Searches in Google Scholar and Eight Other Databases

    ERIC Educational Resources Information Center

    Walters, William H.

    2011-01-01

    This study evaluates the effectiveness of simple and expert searches in Google Scholar (GS), EconLit, GEOBASE, PAIS, POPLINE, PubMed, Social Sciences Citation Index, Social Sciences Full Text, and Sociological Abstracts. It assesses the recall and precision of 32 searches in the field of later-life migration: nine simple keyword searches and 23…

  3. CAZymes Analysis Toolkit (CAT): Web service for searching and analyzing carbohydrate-active enzymes in a newly sequenced organism using CAZy database

    SciTech Connect

    Karpinets, Tatiana V; Park, Byung; Syed, Mustafa H; Uberbacher, Edward C; Leuze, Michael Rex

    2010-01-01

    The Carbohydrate-Active Enzyme (CAZy) database provides a rich set of manually annotated enzymes that degrade, modify, or create glycosidic bonds. Despite the rich and invaluable information stored in the database, software tools utilizing this information for annotation of newly sequenced genomes by CAZy families are limited. We have employed two annotation approaches to fill the gap between manually curated high-quality protein sequences collected in the CAZy database and the growing number of other protein sequences produced by genome or metagenome sequencing projects. The first approach is based on a similarity search against the entire non-redundant sequences of the CAZy database. The second approach performs annotation using links or correspondences between the CAZy families and protein family domains. The links were discovered using the association rule learning algorithm applied to sequences from the CAZy database. The approaches complement each other and in combination achieved high specificity and sensitivity when cross-evaluated with the manually curated genomes of Clostridium thermocellum ATCC 27405 and Saccharophagus degradans 2-40. The capability of the proposed framework to predict the function of unknown protein domains (DUF) and of hypothetical proteins in the genome of Neurospora crassa is demonstrated. The framework is implemented as a Web service, the CAZymes Analysis Toolkit (CAT), and is available at http://cricket.ornl.gov/cgi-bin/cat.cgi.

  4. Review and Comparison of the Search Effectiveness and User Interface of Three Major Online Chemical Databases

    ERIC Educational Resources Information Center

    Bharti, Neelam; Leonard, Michelle; Singh, Shailendra

    2016-01-01

    Online chemical databases are the largest source of chemical information and, therefore, the main resource for retrieving results from published journals, books, patents, conference abstracts, and other relevant sources. Various commercial, as well as free, chemical databases are available. SciFinder, Reaxys, and Web of Science are three major…

  5. Reach for Reference. Don't Judge a Database by Its Search Screen

    ERIC Educational Resources Information Center

    Safford, Barbara Ripp

    2005-01-01

    In this column, the author provides a description and brief review of the "Children's Literature Comprehensive Database" (CLCD). This subscription database is a 1999 spinoff from Marilyn Courtot's "Children's Literature" website, which began in 1993 and is a free resource of reviews and features about books, authors, and illustrators. The separate…

  6. Lead generation using pharmacophore mapping and three-dimensional database searching: application to muscarinic M(3) receptor antagonists.

    PubMed

    Marriott, D P; Dougall, I G; Meghani, P; Liu, Y J; Flower, D R

    1999-08-26

    By using a pharmacophore model, a geometrical representation of the features necessary for molecules to show a particular biological activity, it is possible to search databases containing the 3D structures of molecules and identify novel compounds which may possess this activity. We describe our experiences of establishing a working 3D database system and its use in rational drug design. By using muscarinic M(3) receptor antagonists as an example, we show that it is possible to identify potent novel lead compounds using this approach. Pharmacophore generation based on the structures of known M(3) receptor antagonists, 3D database searching, and medium-throughput screening were used to identify candidate compounds. Three compounds were chosen to define the pharmacophore: a lung-selective M(3) antagonist patented by Pfizer and two Astra compounds which show affinity at the M(3) receptor. From these, a pharmacophore model was generated, using the program DISCO, and this was used subsequently to search a UNITY 3D database of proprietary compounds; 172 compounds were found to fit the pharmacophore. These compounds were then screened, and 1-[2-(2-(diethylamino)ethoxy)phenyl]-2-phenylethanone (pA(2) 6.67) was identified as the best hit, with N-[2-(piperidin-1-ylmethyl)cyclohexyl]-2-propoxybenzamide (pA(2) 4.83) and phenylcarbamic acid 2-(morpholin-4-ylmethyl)cyclohexyl ester (pA(2) 5.54) demonstrating lower activity. As well as its potency, 1-[2-(2-(diethylamino)ethoxy)phenyl]-2-phenylethanone is a simple structure with limited similarity to existing M(3) receptor antagonists.
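
    At its simplest, a 3D pharmacophore query of the kind used above reduces to pairwise distance constraints between feature points. The sketch below checks one conformer against such constraints; the feature names, coordinates, and tolerances are invented for illustration and do not reproduce DISCO or UNITY.

        import math

        # Pairwise distance constraints between pharmacophore features:
        # (target distance in angstroms, tolerance). All values invented.
        PHARMACOPHORE = {("amine", "aromatic"): (5.2, 0.8),
                         ("amine", "carbonyl"): (7.4, 1.0),
                         ("aromatic", "carbonyl"): (4.1, 0.8)}

        def dist(a, b):
            return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

        def fits_pharmacophore(features):
            """features: dict feature name -> (x, y, z) for one conformer."""
            return all(
                abs(dist(features[f1], features[f2]) - target) <= tol
                for (f1, f2), (target, tol) in PHARMACOPHORE.items())

        conformer = {"amine": (0.0, 0.0, 0.0),  # invented coordinates
                     "aromatic": (5.0, 1.0, 0.5),
                     "carbonyl": (7.0, -2.0, 0.3)}
        print(fits_pharmacophore(conformer))  # -> True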

  7. Boolean Logic: An Aid for Searching Computer Databases in Special Education and Rehabilitation.

    ERIC Educational Resources Information Center

    Summers, Edward G.

    1989-01-01

    The article discusses using Boolean logic as a tool for searching computerized information retrieval systems in special education and rehabilitation technology. It includes discussion of the Boolean search operators AND, OR, and NOT; Venn diagrams; and disambiguating parentheses. Six suggestions are offered for development of good Boolean logic…
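
    In retrieval terms, the Boolean operators discussed above are set algebra over an inverted index: AND is intersection, OR is union, and NOT is difference. A minimal sketch with an invented index follows.

        # Inverted index: term -> set of record IDs (invented contents).
        index = {"deafness":   {1, 4, 7, 9},
                 "children":   {2, 4, 8, 9},
                 "employment": {3, 4, 7}}
        all_ids = set().union(*index.values())

        # (deafness OR employment) AND children
        hits = (index["deafness"] | index["employment"]) & index["children"]
        print(sorted(hits))                         # -> [4, 9]

        # NOT children, as set difference against the whole collection
        print(sorted(all_ids - index["children"]))  # -> [1, 3, 7]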

  8. SimShiftDB; local conformational restraints derived from chemical shift similarity searches on a large synthetic database.

    PubMed

    Ginzinger, Simon W; Coles, Murray

    2009-03-01

    We present SimShiftDB, a new program to extract conformational data from protein chemical shifts using structural alignments. The alignments are obtained in searches of a large database containing 13,000 structures and corresponding back-calculated chemical shifts. SimShiftDB makes use of chemical shift data to provide accurate results even in the case of low sequence similarity, and with even coverage of the conformational search space. We compare SimShiftDB to HHSearch, a state-of-the-art sequence-based search tool, and to TALOS, the current standard tool for the task. We show that for a significant fraction of the predicted similarities, SimShiftDB outperforms the other two methods. Particularly, the high coverage afforded by the larger database often allows predictions to be made for residues not involved in canonical secondary structure, where TALOS predictions are both less frequent and more error prone. Thus SimShiftDB can be seen as a complement to currently available methods.

  9. Supervised learning of tools for content-based search of image databases

    NASA Astrophysics Data System (ADS)

    Delanoy, Richard L.

    1996-03-01

    A computer environment, called the Toolkit for Image Mining (TIM), is being developed with the goal of enabling users with diverse interests and varied computer skills to create search tools for content-based image retrieval and other pattern matching tasks. Search tools are generated using a simple paradigm of supervised learning that is based on the user pointing at mistakes of classification made by the current search tool. As mistakes are identified, a learning algorithm uses the identified mistakes to build up a model of the user's intentions, construct a new search tool, apply the search tool to a test image, display the match results as feedback to the user, and accept new inputs from the user. Search tools are constructed in the form of functional templates, which are generalized matched filters capable of knowledge-based image processing. The ability of this system to learn the user's intentions from experience contrasts with other existing approaches to content-based image retrieval that base searches on the characteristics of a single input example or on a predefined and semantically constrained textual query. Currently, TIM is capable of learning spectral and textural patterns, but should be adaptable to the learning of shapes, as well. Possible applications of TIM include not only content-based image retrieval, but also quantitative image analysis, the generation of metadata for annotating images, data prioritization or data reduction in bandwidth-limited situations, and the construction of components for larger, more complex computer vision algorithms.

  10. Combining history of medicine and library instruction: an innovative approach to teaching database searching to medical students.

    PubMed

    Timm, Donna F; Jones, Dee; Woodson, Deidra; Cyrus, John W

    2012-01-01

    Library faculty members at the Health Sciences Library at the LSU Health Shreveport campus offer a database searching class for third-year medical students during their surgery rotation. For a number of years, students completed "ten-minute clinical challenges," but the instructors decided to replace the clinical challenges with innovative exercises using The Edwin Smith Surgical Papyrus to emphasize concepts learned. The Surgical Papyrus is an online resource that is part of the National Library of Medicine's "Turning the Pages" digital initiative. In addition, vintage surgical instruments and historic books are displayed in the classroom to enhance the learning experience.

  11. Combining history of medicine and library instruction: an innovative approach to teaching database searching to medical students.

    PubMed

    Timm, Donna F; Jones, Dee; Woodson, Deidra; Cyrus, John W

    2012-01-01

    Library faculty members at the Health Sciences Library at the LSU Health Shreveport campus offer a database searching class for third-year medical students during their surgery rotation. For a number of years, students completed "ten-minute clinical challenges," but the instructors decided to replace the clinical challenges with innovative exercises using The Edwin Smith Surgical Papyrus to emphasize concepts learned. The Surgical Papyrus is an online resource that is part of the National Library of Medicine's "Turning the Pages" digital initiative. In addition, vintage surgical instruments and historic books are displayed in the classroom to enhance the learning experience. PMID:22853300

  12. Online/CD-ROM Bibliographic Database Searching in a Small Academic Library.

    ERIC Educational Resources Information Center

    Pitet, Lynn T.

    The purpose of the project described in this paper was to gather information about online/CD-ROM database systems that would be useful in improving the services offered at the University of Findlay, a small private liberal arts college in northwestern Ohio. A survey was sent to 67 libraries serving colleges similar in size which included questions…

  13. Online Searching of Bibliographic Databases: Microcomputer Access to National Information Systems.

    ERIC Educational Resources Information Center

    Coons, Bill

    This paper describes the range and scope of various information databases available for technicians, researchers, and managers employed in forestry and the forest products industry. Availability of information on reports of field and laboratory research, business trends, product prices, and company profiles through national distributors of…

  14. Searching Reference Databases: What Students Experience and What Teachers Believe that Students Experience

    ERIC Educational Resources Information Center

    Avdic, Anders; Eklund, Anders

    2010-01-01

    The Internet has made it possible for students to access a vast amount of high-quality references when writing papers. Yet research has shown that the use of reference databases is poor and the quality of student papers is consequently often below expectation. The objective of this article is twofold. First, it aims to describe the problems…

  15. Searching the Cambridge Structural Database for the 'best' representative of each unique polymorph.

    PubMed

    van de Streek, Jacco

    2006-08-01

    A computer program has been written that removes suspicious crystal structures from the Cambridge Structural Database and clusters the remaining crystal structures as polymorphs or redeterminations. For every set of redeterminations, one crystal structure is selected to be the best representative of that polymorph. The results, 243,355 well determined crystal structures grouped by unique polymorph, are presented and analysed. PMID:16840806

  16. Exchange, interpretation, and database-search of ion mobility spectra supported by data format JCAMP-DX

    NASA Technical Reports Server (NTRS)

    Baumbach, J. I.; Davies, A. N.; von Irmer, A.; Lampen, P. H.

    1995-01-01

    To assist peak assignment in ion mobility spectrometry it is important to have quality reference data. The reference collection should be stored in a database system which is capable of being searched using spectral or substance information. We propose to build such a database customized for ion mobility spectra. To start off with, it is important to quickly reach a critical mass of data in the collection; we wish to obtain as many spectra, combined with their IMS parameters, as possible. Spectra suppliers will be rewarded for their participation with access to the database. To make the data exchange between users and the system administration possible, it is important to define a file format specially made for the requirements of ion mobility spectra. The format should be computer readable and flexible enough for extensive comments to be included. In this document we propose a data exchange format and invite comments on it. For international data exchange it is important to have a standard data exchange format. We propose to base the definition of this format on the JCAMP-DX protocol, which was developed for the exchange of infrared spectra. This standard, created by the Joint Committee on Atomic and Molecular Physical Data, is of a flexible design. The aim of this paper is to adapt JCAMP-DX to the special requirements of ion mobility spectra.
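
    A JCAMP-DX file is plain text built from ##LABEL=value records followed by tabular X,Y data, which is what makes the format easy to read, extend, and comment. The sketch below writes such a file for an ion mobility spectrum; the IMS-specific labels are placeholders, since the abstract's proposed definitions are not reproduced here.

        # Invented example spectrum: drift times (ms) and intensities.
        drift_times_ms = [4.0, 4.1, 4.2, 4.3]
        intensities = [12, 85, 240, 60]

        lines = ["##TITLE=Example ion mobility spectrum",
                 "##JCAMP-DX=4.24",
                 "##DATA TYPE=ION MOBILITY SPECTRUM",  # placeholder label
                 "##XUNITS=MILLISECONDS",
                 "##YUNITS=ARBITRARY UNITS",
                 f"##NPOINTS={len(intensities)}",
                 "##XYDATA=(XY..XY)"]
        lines += [f"{x:.2f}, {y}" for x, y in zip(drift_times_ms, intensities)]
        lines.append("##END=")
        print("\n".join(lines))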

  17. The "Clipping Thesis": An Exercise in Developing Critical Thinking and Online Database Searching Skills.

    ERIC Educational Resources Information Center

    Minnich, Nancy P.; McCarthy, Carrol B.

    1986-01-01

    Designed to help high school students develop critical thinking and writing skills, the "Clipping Thesis" project requires students to find newspaper and journal articles on a given topic through printed indexes or online searching, read the articles, write brief and final summaries of their readings, and compile a bibliography. (EM)

  18. SCOOP: A Measurement and Database of Student Online Search Behavior and Performance

    ERIC Educational Resources Information Center

    Zhou, Mingming

    2015-01-01

    The ability to access and process massive amounts of online information is required in many learning situations. In order to develop a better understanding of student online search process especially in academic contexts, an online tool (SCOOP) is developed for tracking mouse behavior on the web to build a more extensive account of student web…

  19. Information Retrieval Strategies of Millennial Undergraduate Students in Web and Library Database Searches

    ERIC Educational Resources Information Center

    Porter, Brandi

    2009-01-01

    Millennial students make up a large portion of undergraduate students attending colleges and universities, and they have a variety of online resources available to them to complete academically related information searches, primarily Web based and library-based online information retrieval systems. The content, ease of use, and required search…

  20. A Search for Nontriggered Gamma-Ray Bursts in the BATSE Database

    NASA Technical Reports Server (NTRS)

    Kommers, Jefferson M.; Lewin, Walter H. G.; Kouveliotou, Chryssa; van Paradijs, Jan; Pendleton, Geoffrey N.; Meegan, Charles A.; Fishman, Gerald J.

    1997-01-01

    We describe a search of archival data from the Burst and Transient Source Experiment (BATSE). The purpose of the search is to find astronomically interesting transients that did not activate the burst-detection (or "trigger") system on board the spacecraft. Our search is sensitive to events with peak fluxes (on the 1.024 s timescale) that are lower by a factor of approximately 2 than can be detected with the on-board burst trigger. In a search of 345 days of archival data, we detected 91 events in the 50-300 keV range that resemble classical gamma-ray bursts but that did not activate the on-board burst trigger. We also detected 110 low-energy (25-50 keV) events of unknown origin that may include activity from soft gamma repeater (SGR) 1806-20 and bursts and flares from X-ray binaries. This paper gives the occurrence times, estimated source directions, durations, peak fluxes, and fluences for the 91 gamma-ray burst candidates. The direction and intensity distributions of these bursts imply that the biases inherent in the on-board trigger mechanism have not significantly affected the completeness of the published BATSE gamma-ray burst catalogs.

  1. Searching the UVSP database and a list of experiments showing mass motions

    NASA Technical Reports Server (NTRS)

    Thompson, William

    1986-01-01

    Since the Solar Maximum Mission (SMM) satellite was launched, a large database has been built up of experiments using the Ultraviolet Spectrometer and Polarimeter (UVSP) instrument. Access to this database can be gained through the SMM Vax 750 computer at Goddard Space Flight Center. One useful way to do this is with a program called USEARCH. This program allows one to make a listing of different types of UVSP experiments. It is evident that this program is useful to those who would wish to make use of UVSP data, but who don't know what data is available. Therefore it was decided to include a short description of how to make use of the USEARCH program. Also described, but not included, is a listing of all UVSP experiments showing mass motions in prominences and filaments. This list was made with the aid of the USEARCH program.

  2. Toward Non-Ergodic and Site-Specific Probabilistic Seismic Hazard Assessment: Requirements for the Next Generation of Strong Motion Network and Databases.

    NASA Astrophysics Data System (ADS)

    Cotton, F.; Ktenidou, O. J.; Derras, B.; Roumelioti, Z.; Bard, P. Y.; Hollender, F.

    2014-12-01

    Ground-motion models used in engineering seismology are usually calibrated on global databases created by mixing data from different regions. These models also assume that the ground-motion variability observed in a global dataset is the same as the variability in ground motion at a single site-source combination. This assumption is referred to as the ergodic assumption. New data give a unique opportunity to remove the ergodic assumption and take into account regional source, path and site specificities. Using recent data analysis performed on the EUROSEISTEST valley (Greece) and global ground-motion datasets (KiK-net, K-NET, NGA-West2 and the European strong-motion databases) we will show the impact of source parameters, site monitoring and site characterisation on the uncertainty of ground-motion estimates and the associated hazard curves. Our results suggest that future strong-motion networks should use higher sampling rates (to better evaluate site-specific high-frequency attenuation) and record both strong and weak motions (to evaluate single-station sigma). These results also quantify the impact of better characterisation of source parameters (depth, fault maturity, source-to-site distances) and site parameters on ground-motion models. We will finally show how new networks and high-level strong-motion databases may help to build consistent ergodic PSHA at a regional scale as well as non-ergodic, site-specific PSHA.
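
    For orientation, single-station sigma is conventionally obtained by decomposing total aleatory variability into a between-event part (tau) and a within-event part (phi), and then removing the repeatable site-to-site term (phi_S2S) from the latter. The formulas below use the standard notation of the ground-motion literature and are not taken from this abstract.

        \[
          \sigma = \sqrt{\tau^{2} + \phi^{2}}, \qquad
          \phi_{SS} = \sqrt{\phi^{2} - \phi_{S2S}^{2}}, \qquad
          \sigma_{SS} = \sqrt{\tau^{2} + \phi_{SS}^{2}}
        \]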

  3. On-line biomedical databases-the best source for quick search of the scientific information in the biomedicine.

    PubMed

    Masic, Izet; Milinovic, Katarina

    2012-06-01

    Most medical journals now have an electronic version available over public networks. Printed and electronic versions often exist in parallel, but the two forms need not be published simultaneously: the electronic version of a journal can appear a few weeks before the printed form and need not have identical content. The electronic form may also include features the printed form cannot, such as animations or 3D displays, and may offer full text (mostly in PDF or XML format) or just the table of contents or a summary. Access to full text is usually not free and can be obtained only if the institution (library or host) enters into an access agreement. Many medical journals, however, provide free access to some articles, or to the complete content after a certain time (6 months or a year). Archives such as HighWire Press and FreeMedicalJournals.com allow such journals to be found. Particularly notable are PubMed and PubMed Central, the first public digital archives to collect the available medical literature without restriction, which operate within the National Library of Medicine in Bethesda (USA). There are also so-called online medical journals published only in electronic form, which can be searched through online databases. In this paper the authors briefly describe about 30 databases and give short instructions on how to access them and search for papers published in indexed medical journals. PMID:23322957

  4. Crescendo: A Protein Sequence Database Search Engine for Tandem Mass Spectra

    NASA Astrophysics Data System (ADS)

    Wang, Jianqi; Zhang, Yajie; Yu, Yonghao

    2015-07-01

    A search engine that reliably discovers more peptides is essential to the progress of computational proteomics. We propose two new scoring functions (L- and P-scores), which aim to capture the same characteristics of a peptide-spectrum match (PSM) as Sequest and Comet do. Crescendo, introduced here, is a software program that implements these two scores for peptide identification. We applied Crescendo to test datasets and compared its performance with widely used search engines, including Mascot, Sequest, and Comet. The results indicate that Crescendo identifies a similar or larger number of peptides at various predefined false discovery rates (FDR). Importantly, it also provides better separation between true and decoy PSMs, warranting the future development of a companion post-processing filtering algorithm.
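
    The "peptides at a predefined FDR" criterion is usually implemented by target-decoy counting: sort PSMs by score and keep the deepest cutoff at which the decoy-to-target ratio stays within budget. The sketch below uses invented scores and a simplified rule, not Crescendo's actual procedure.

        def psms_at_fdr(psms, fdr=0.01):
            """psms: list of (score, is_decoy) pairs, higher score = better."""
            ordered = sorted(psms, reverse=True)
            best_cut, decoys = 0, 0
            for i, (score, is_decoy) in enumerate(ordered, start=1):
                decoys += is_decoy
                targets = i - decoys
                # keep the deepest cutoff whose decoy/target ratio fits
                if targets and decoys / targets <= fdr:
                    best_cut = i
            return [s for s, d in ordered[:best_cut] if not d]

        toy = [(9.1, False), (8.7, False), (8.2, False),
               (7.9, True), (7.5, False)]
        print(psms_at_fdr(toy, fdr=0.5))  # -> [9.1, 8.7, 8.2, 7.5]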

  5. Crescendo: A Protein Sequence Database Search Engine for Tandem Mass Spectra.

    PubMed

    Wang, Jianqi; Zhang, Yajie; Yu, Yonghao

    2015-07-01

    A search engine that reliably discovers more peptides is essential to the progress of computational proteomics. We propose two new scoring functions (L- and P-scores), which aim to capture the same characteristics of a peptide-spectrum match (PSM) as Sequest and Comet do. Crescendo, introduced here, is a software program that implements these two scores for peptide identification. We applied Crescendo to test datasets and compared its performance with widely used search engines, including Mascot, Sequest, and Comet. The results indicate that Crescendo identifies a similar or larger number of peptides at various predefined false discovery rates (FDR). Importantly, it also provides better separation between true and decoy PSMs, warranting the future development of a companion post-processing filtering algorithm.

  6. Image content engine (ICE): a system for fast image database searches

    NASA Astrophysics Data System (ADS)

    Brase, James M.; Poland, Douglas N.; Paglieroni, David W.; Weinert, George F.; Grant, Charles W.; Lopez, Aseneth S.; Nikolaev, Sergei

    2005-05-01

    The Image Content Engine (ICE) is being developed to provide cueing assistance to human image analysts faced with increasingly large and intractable amounts of image data. The ICE architecture includes user configurable feature extraction pipelines which produce intermediate feature vector and match surface files which can then be accessed by interactive relational queries. Application of the feature extraction algorithms to large collections of images may be extremely time consuming and is launched as a batch job on a Linux cluster. The query interface accesses only the intermediate files and returns candidate hits nearly instantaneously. Queries may be posed for individual objects or collections. The query interface prompts the user for feedback, and applies relevance feedback algorithms to revise the feature vector weighting and focus on relevant search results. Examples of feature extraction and both model-based and search-by-example queries are presented.

  7. Image Content Engine (ICE): A System for Fast Image Database Searches

    SciTech Connect

    Brase, J M; Paglieroni, D W; Weinert, G F; Grant, C W; Lopez, A S; Nikolaev, S

    2005-03-22

    The Image Content Engine (ICE) is being developed to provide cueing assistance to human image analysts faced with increasingly large and intractable amounts of image data. The ICE architecture includes user configurable feature extraction pipelines which produce intermediate feature vector and match surface files which can then be accessed by interactive relational queries. Application of the feature extraction algorithms to large collections of images may be extremely time consuming and is launched as a batch job on a Linux cluster. The query interface accesses only the intermediate files and returns candidate hits nearly instantaneously. Queries may be posed for individual objects or collections. The query interface prompts the user for feedback, and applies relevance feedback algorithms to revise the feature vector weighting and focus on relevant search results. Examples of feature extraction and both model-based and search-by-example queries are presented.

  8. Probabilistic consensus scoring improves tandem mass spectrometry peptide identification.

    PubMed

    Nahnsen, Sven; Bertsch, Andreas; Rahnenführer, Jörg; Nordheim, Alfred; Kohlbacher, Oliver

    2011-08-01

    Database search is a standard technique for identifying peptides from their tandem mass spectra. To increase the number of correctly identified peptides, we suggest a probabilistic framework that allows the combination of scores from different search engines into a joint consensus score. Central to the approach is a novel method to estimate scores for peptides not found by an individual search engine. This approach allows the estimation of p-values for each candidate peptide and their combination across all search engines. The consensus approach works better than any single search engine across all different instrument types considered in this study. Improvements vary strongly from platform to platform and from search engine to search engine. Compared to the industry standard MASCOT, our approach can identify up to 60% more peptides. The software for consensus predictions is implemented in C++ as part of OpenMS, a software framework for mass spectrometry. The source code is available in the current development version of OpenMS and can easily be used as a command line application or via the graphical pipeline designer TOPPAS.
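
    As a much simpler stand-in for the consensus machinery described above, p-values from several engines can be combined with Fisher's classical method. This illustrates only the combination step, not the paper's estimation of scores for peptides missed by an individual engine.

        import math

        def fisher_combine(pvalues):
            # Fisher's statistic: X = -2 * sum(ln p_i), chi-square with
            # 2n degrees of freedom under the null.
            x = -2.0 * sum(math.log(p) for p in pvalues)
            n = len(pvalues)
            # Closed-form chi-square survival function for even dof (2n).
            term, total = 1.0, 1.0
            for k in range(1, n):
                term *= (x / 2.0) / k
                total += term
            return math.exp(-x / 2.0) * total

        # p-values for one PSM from three engines (invented numbers)
        print(fisher_combine([0.01, 0.04, 0.20]))  # -> ~0.0044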

  9. Accelerating Smith-Waterman Alignment for Protein Database Search Using Frequency Distance Filtration Scheme Based on CPU-GPU Collaborative System.

    PubMed

    Liu, Yu; Hong, Yang; Lin, Chun-Yuan; Hung, Che-Lun

    2015-01-01

    The Smith-Waterman (SW) algorithm has been widely utilized for searching biological sequence databases in bioinformatics. Recently, several works have adopted graphics cards with Graphics Processing Units (GPUs) and their associated CUDA model to enhance the performance of SW computations. However, these works mainly focused on protein database search using the intertask parallelization technique, only using the GPU capability to do the SW computations one by one. Hence, in this paper, we propose an efficient SW alignment method, called CUDA-SWfr, for protein database search using the intratask parallelization technique based on a CPU-GPU collaborative system. Before doing the SW computations on the GPU, a procedure is applied on the CPU using the frequency distance filtration scheme (FDFS) to eliminate unnecessary alignments. The experimental results indicate that CUDA-SWfr runs 9.6 times and 96 times faster than the CPU-based SW method without and with FDFS, respectively.
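
    The filtration idea is that a cheap distance between residue-frequency vectors can rule out most database sequences before the expensive SW step is run. The sketch below is a simplified stand-in for FDFS, with an invented distance threshold and invented sequences.

        from collections import Counter

        def freq_distance(a, b):
            # L1 distance between residue-frequency vectors: a cheap
            # screen used to discard hopeless candidates early.
            ca, cb = Counter(a), Counter(b)
            return sum(abs(ca[x] - cb[x]) for x in set(ca) | set(cb))

        def candidates(query, database, max_dist=6):  # invented threshold
            return [s for s in database
                    if freq_distance(query, s) <= max_dist]

        db = ["MKTAYIAKQR", "GGGGGGGGGG", "MKTAYLAKQR"]  # invented
        print(candidates("MKTAYIAKQK", db))
        # -> ['MKTAYIAKQR', 'MKTAYLAKQR']; only these go on to full SW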

  10. Chemical and biological warfare: General studies. (Latest citations from the NTIS bibliographic database). NewSearch

    SciTech Connect

    Not Available

    1994-10-01

    The bibliography contains citations concerning federally sponsored and conducted studies into chemical and biological warfare operations and planning. These studies cover areas not addressed in other parts of this series. The topics include production and storage of agents, delivery techniques, training, military and civil defense, general planning studies, psychological reactions to chemical warfare, evaluations of materials exposed to chemical agents, and studies on banning or limiting chemical warfare. Other published searches in this series on chemical warfare cover detection and warning, defoliants, protection, and biological studies, including chemistry and toxicology. (Contains 250 citations and includes a subject term index and title list.)

  11. Chemical and biological warfare: General studies. (Latest citations from the NTIS bibliographic database). Published Search

    SciTech Connect

    1996-10-01

    The bibliography contains citations concerning federally sponsored and conducted studies into chemical and biological warfare operations and planning. These studies cover areas not addressed in other parts of this series. The topics include production and storage of agents, delivery techniques, training, military and civil defense, general planning studies, psychological reactions to chemical warfare, evaluations of materials exposed to chemical agents, and studies on banning or limiting chemical warfare. Other published searches in this series on chemical warfare cover detection and warning, defoliants, protection, and biological studies, including chemistry and toxicology. (Contains 50-250 citations and includes a subject term index and title list.) (Copyright NERAC, Inc. 1995)

  12. Chemical and biological warfare: General studies. (Latest citations from the NTIS bibliographic database). Published Search

    SciTech Connect

    1995-09-01

    The bibliography contains citations concerning federally sponsored and conducted studies into chemical and biological warfare operations and planning. These studies cover areas not addressed in other parts of this series. The topics include production and storage of agents, delivery techniques, training, military and civil defense, general planning studies, psychological reactions to chemical warfare, evaluations of materials exposed to chemical agents, and studies on banning or limiting chemical warfare. Other published searches in this series on chemical warfare cover detection and warning, defoliants, protection, and biological studies, including chemistry and toxicology. (Contains 50-250 citations and includes a subject term index and title list.) (Copyright NERAC, Inc. 1995)

  13. Chemical and biological warfare: General studies. (Latest citations from the NTIS bibliographic database). Published Search

    SciTech Connect

    1997-11-01

    The bibliography contains citations concerning federally sponsored and conducted studies into chemical and biological warfare operations and planning. These studies cover areas not addressed in other parts of this series. The topics include production and storage of agents, delivery techniques, training, military and civil defense, general planning studies, psychological reactions to chemical warfare, evaluations of materials exposed to chemical agents, and studies on banning or limiting chemical warfare. Other published searches in this series on chemical warfare cover detection and warning, defoliants, protection, and biological studies, including chemistry and toxicology. (Contains 50-250 citations and includes a subject term index and title list.) (Copyright NERAC, Inc. 1995)

  14. Chemical and biological warfare: General studies. (Latest citations from the NTIS Bibliographic database). Published Search

    SciTech Connect

    Not Available

    1993-11-01

    The bibliography contains citations concerning federally sponsored and conducted studies into chemical and biological warfare operations and planning. These studies cover areas not addressed in other parts of this series. The topics include production and storage of agents, delivery techniques, training, military and civil defense, general planning studies, psychological reactions to chemical warfare, evaluations of materials exposed to chemical agents, and studies on banning or limiting chemical warfare. Other published searches in this series on chemical warfare cover detection and warning, defoliants, protection, and biological studies, including chemistry and toxicology. (Contains 250 citations and includes a subject term index and title list.)

  15. Genetic Networks of Complex Disorders: from a Novel Search Engine for PubMed Article Database.

    PubMed

    Jung, Jae-Yoon; Wall, Dennis Paul

    2013-01-01

    Finding genetic risk factors of complex disorders may involve reviewing hundreds of genes or thousands of research articles iteratively, but few tools have been available to facilitate this procedure. In this work, we built a novel publication search engine that can identify target-disorder-specific, genetics-oriented research articles and extract the genes with significant results. Preliminary test results showed that the output of this engine has better coverage, in terms of genes and publications, than other existing applications. We consider it an essential tool for understanding the genetic networks of complex disorders.

  16. Pivotal role of computers and software in mass spectrometry - SEQUEST and 20 years of tandem MS database searching.

    PubMed

    Yates, John R

    2015-11-01

    Advances in computer technology and software have driven developments in mass spectrometry over the last 50 years. Computers and software have been impactful in three areas: the automation of difficult calculations to aid interpretation, the collection of data and control of instruments, and data interpretation. As the power of computers has grown, so too has the utility and impact on mass spectrometers and their capabilities. This has been particularly evident in the use of tandem mass spectrometry data to search protein and nucleotide sequence databases to identify peptide and protein sequences. This capability has driven the development of many new approaches to study biological systems, including the use of "bottom-up shotgun proteomics" to directly analyze protein mixtures. PMID:26286455

  17. FTP-Server for exchange, interpretation, and database-search of ion mobility spectra, literature, preprints and software

    NASA Technical Reports Server (NTRS)

    Baumbach, J. I.; von Irmer, A.

    1995-01-01

    To assist current discussion in the field of ion mobility spectrometry, an FTP server began operation on 4 December 1994 at the Institut für Spektrochemie und angewandte Spektroskopie, Dortmund; it is available to all research groups at universities and institutes and to researchers in industry. We support the exchange, interpretation, and database search of ion mobility spectra through the data format JCAMP-DX (Joint Committee on Atomic and Molecular Physical Data), as well as a literature retrieval, preprint, notice, and discussion board. We describe in general terms the access conditions, local addresses, and main code words. For further details, a monthly news report will be prepared for all users. The Internet e-mail address for subscribing is included in this document.

  18. Pivotal role of computers and software in mass spectrometry - SEQUEST and 20 years of tandem MS database searching.

    PubMed

    Yates, John R

    2015-11-01

    Advances in computer technology and software have driven developments in mass spectrometry over the last 50 years. Computers and software have been impactful in three areas: the automation of difficult calculations to aid interpretation, the collection of data and control of instruments, and data interpretation. As the power of computers has grown, so too has the utility and impact on mass spectrometers and their capabilities. This has been particularly evident in the use of tandem mass spectrometry data to search protein and nucleotide sequence databases to identify peptide and protein sequences. This capability has driven the development of many new approaches to study biological systems, including the use of "bottom-up shotgun proteomics" to directly analyze protein mixtures.

  19. An ultra-tolerant database search reveals that a myriad of modified peptides contributes to unassigned spectra in shotgun proteomics

    PubMed Central

    Chick, Joel M.; Kolippakkam, Deepak; Nusinow, David P.; Zhai, Bo; Rad, Ramin; Huttlin, Edward L.; Gygi, Steven P.

    2015-01-01

    Fewer than half of all tandem mass spectrometry (MS/MS) spectra acquired in shotgun proteomics experiments are typically matched to a peptide with high confidence. Here we determine the identity of unassigned peptides using an ultra-tolerant Sequest database search that allows peptide matching even with modifications of unknown masses up to ±500 Da. In a proteome-wide dataset on HEK293 cells (9,513 proteins and 396,736 peptides), this approach matched an additional 184,000 modified peptides, which were linked to biological and chemical modifications representing 523 distinct mass bins, including phosphorylation, glycosylation, and methylation. We localized all unknown modification masses to specific regions within a peptide. Known modifications were assigned to the correct amino acids with frequencies often >90%. We conclude that at least one third of unassigned spectra arise from peptides with substoichiometric modifications. PMID:26076430
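
    The core of an ultra-tolerant search can be illustrated by histogramming precursor-mass offsets between spectra and candidate peptides: recurring offsets correspond to modification masses (roughly +79.97 Da for phosphorylation, for example). The sketch below uses invented masses and a simplified binning rule.

        from collections import Counter

        def offset_bins(pairs, window=500.0, bin_width=0.01):
            """pairs: (observed precursor mass, candidate peptide mass)."""
            bins = Counter()
            for observed, peptide in pairs:
                delta = observed - peptide
                if abs(delta) <= window:  # the "ultra-tolerant" window
                    bins[round(delta / bin_width) * bin_width] += 1
            return bins

        # Invented masses; two PSMs share a ~+79.97 Da offset (the
        # phosphorylation mass), one is unmodified.
        pairs = [(1234.67, 1154.70), (2047.99, 1968.02), (899.40, 899.40)]
        for delta, n in offset_bins(pairs).most_common():
            print(f"{delta:+.2f} Da: {n} PSMs")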

  20. Pivotal Role of Computers and Software in Mass Spectrometry - SEQUEST and 20 Years of Tandem MS Database Searching

    NASA Astrophysics Data System (ADS)

    Yates, John R.

    2015-11-01

    Advances in computer technology and software have driven developments in mass spectrometry over the last 50 years. Computers and software have been impactful in three areas: the automation of difficult calculations to aid interpretation, the collection of data and control of instruments, and data interpretation. As the power of computers has grown, so too has the utility and impact on mass spectrometers and their capabilities. This has been particularly evident in the use of tandem mass spectrometry data to search protein and nucleotide sequence databases to identify peptide and protein sequences. This capability has driven the development of many new approaches to study biological systems, including the use of "bottom-up shotgun proteomics" to directly analyze protein mixtures.

  1. A Filtered Database Search Algorithm for Endogenous Serum Protein Carbonyl Modifications in a Mouse Model of Inflammation*

    PubMed Central

    Slade, Peter G.; Williams, Michelle V.; Chiang, Alison; Iffrig, Elizabeth; Tannenbaum, Steven R.; Wishnok, John S.

    2011-01-01

    During inflammation, the resulting oxidative stress can damage surrounding host tissue, forming protein-carbonyls. The SJL mouse is an experimental animal model used to assess in vivo toxicological responses to reactive oxygen and nitrogen species from inflammation. The goals of this study were to identify the major serum proteins modified with a carbonyl functionality and to identify the types of carbonyl adducts. To select for carbonyl-modified proteins, serum proteins were reacted with an aldehyde reactive probe that biotinylated the carbonyl modification. Modified proteins were enriched by avidin affinity and identified by two-dimensional liquid chromatography tandem MS. To identify the carbonyl modification, tryptic peptides from serum proteins were subjected to avidin affinity and the enriched modified peptides were analyzed by liquid chromatography tandem MS. It was noted that the aldehyde reactive probe tag created tag-specific fragment ions and neutral losses, and these extra features in the mass spectra inhibited identification of the modified peptides by database searching. To enhance the identification of carbonyl-modified peptides, a program was written that used the tag-specific fragment ions as a fingerprint (in silico filter program) and filtered the mass spectrometry data to highlight only modified peptides. A de novo-like database search algorithm was written (biotin peptide identification program) to identify the carbonyl-modified peptides. Although written specifically for our experiments, this software can be adapted to other modification and enrichment systems. Using these routines, a number of lipid peroxidation-derived protein carbonyls and direct side-chain oxidation proteins carbonyls were identified in SJL mouse serum. PMID:21768395
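
    The in silico filter described above amounts to keeping only the spectra whose peak lists contain the tag-specific fragment ions, so the downstream search sees just candidate modified peptides. In the minimal sketch below, the fingerprint m/z values and the tolerance are hypothetical placeholders, not the real aldehyde reactive probe ions.

        # Hypothetical tag fragment m/z values and tolerance (stand-ins,
        # not the real aldehyde-reactive-probe ions).
        TAG_IONS = [227.08, 270.19]
        TOL = 0.02

        def has_fingerprint(peaks):
            # peaks: list of (m/z, intensity); require every tag ion.
            return all(any(abs(mz - ion) <= TOL for mz, _ in peaks)
                       for ion in TAG_IONS)

        spectra = {"scan_1": [(227.09, 1500.0), (270.18, 900.0),
                              (415.30, 120.0)],
                   "scan_2": [(301.10, 400.0), (512.20, 80.0)]}
        print([s for s, peaks in spectra.items() if has_fingerprint(peaks)])
        # -> ['scan_1']; only the tagged spectrum reaches the search step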

  2. Heart research advances using database search engines, Human Protein Atlas and the Sydney Heart Bank.

    PubMed

    Li, Amy; Estigoy, Colleen; Raftery, Mark; Cameron, Darryl; Odeberg, Jacob; Pontén, Fredrik; Lal, Sean; Dos Remedios, Cristobal G

    2013-10-01

    This Methodological Review is intended as a guide for research students who may have just discovered a human "novel" cardiac protein, but it may also help hard-pressed reviewers of journal submissions on a "novel" protein reported in an animal model of human heart failure. Whether you are an expert or not, you may know little or nothing about this particular protein of interest. In this review we provide a strategic guide on how to proceed. We ask: How do you discover what has been published (even in an abstract or research report) about this protein? Everyone knows how to undertake literature searches using PubMed and Medline but these are usually encyclopaedic, often producing long lists of papers, most of which are either irrelevant or only vaguely relevant to your query. Relatively few will be aware of more advanced search engines such as Google Scholar and even fewer will know about Quertle. Next, we provide a strategy for discovering if your "novel" protein is expressed in the normal, healthy human heart, and if it is, we show you how to investigate its subcellular location. This can usually be achieved by visiting the website "Human Protein Atlas" without doing a single experiment. Finally, we provide a pathway to discovering if your protein of interest changes its expression level with heart failure/disease or with ageing.

  3. Searching the databases: a quick look at Amazon and two other online catalogues.

    PubMed

    Potts, Hilary

    2003-01-01

    The Amazon Online Catalogue was compared with the Library of Congress Catalogue and the British Library Catalogue, both also available online, by searching on both neutral (Gay, Lesbian, Homosexual) and pejorative (Perversion, Sex Crime) subject terms, and also by searches using Boolean logic in an attempt to identify Lesbian Fiction items and religion-based anti-gay material. Amazon was much more likely to be the first port of call for non-academic enquiries. Although excluding much material necessary for academic research, it carried more information about the individual books and less historical homophobic baggage in its terminology than the great national catalogues. Its back catalogue of second-hand books outnumbered those in print. Current attitudes may partially be gauged by the relative numbers of titles published under each heading--e.g., there may be an inverse relationship between concern about child sex abuse and homophobia, more noticeable in U.S. because of the activities of the religious right. PMID:14567657

  4. Searching the databases: a quick look at Amazon and two other online catalogues.

    PubMed

    Potts, Hilary

    2003-01-01

    The Amazon Online Catalogue was compared with the Library of Congress Catalogue and the British Library Catalogue, both also available online, by searching on both neutral (Gay, Lesbian, Homosexual) and pejorative (Perversion, Sex Crime) subject terms, and also by searches using Boolean logic in an attempt to identify Lesbian Fiction items and religion-based anti-gay material. Amazon was much more likely to be the first port of call for non-academic enquiries. Although excluding much material necessary for academic research, it carried more information about the individual books and less historical homophobic baggage in its terminology than the great national catalogues. Its back catalogue of second-hand books outnumbered those in print. Current attitudes may partially be gauged by the relative numbers of titles published under each heading--e.g., there may be an inverse relationship between concern about child sex abuse and homophobia, more noticeable in U.S. because of the activities of the religious right.

  5. Utility of rapid database searching for quality assurance: 'detective work' in uncovering radiology coding and billing errors

    NASA Astrophysics Data System (ADS)

    Horii, Steven C.; Kim, Woojin; Boonn, William; Iyoob, Christopher; Maston, Keith; Coleman, Beverly G.

    2011-03-01

    When the first quarter of 2010 Department of Radiology statistics were provided to the Section Chiefs, the authors (SH, BC) were alarmed to discover that Ultrasound showed a decrease of 2.5 percent in billed examinations. This seemed to be in direct contradistinction to the experience of the ultrasound faculty members and sonographers. Their experience was that they were far busier than during the same quarter of 2009. The one exception that all acknowledged was the month of February, 2010 when several major winter storms resulted in a much decreased Hospital admission and Emergency Department visit rate. Since these statistics in part help establish priorities for capital budget items, professional and technical staffing levels, and levels of incentive salary, they are taken very seriously. The availability of a desktop, Web-based RIS database search tool developed by two of the authors (WK, WB) and built-in database functions of the ultrasound miniPACS, made it possible for us very rapidly to develop and test hypotheses for why the number of billable examinations was declining in the face of what experience told the authors was an increasing number of examinations being performed. Within a short time, we identified the major cause as errors on the part of the company retained to verify billable Current Procedural Terminology (CPT) codes against ultrasound reports. This information is being used going forward to recover unbilled examinations and take measures to reduce or eliminate the types of coding errors that resulted in the problem.

  6. TESS: a geometric hashing algorithm for deriving 3D coordinate templates for searching structural databases. Application to enzyme active sites.

    PubMed

    Wallace, A C; Borkakoti, N; Thornton, J M

    1997-11-01

    It is well established that sequence templates such as those in the PROSITE and PRINTS databases are powerful tools for predicting the biological function and tertiary structure for newly derived protein sequences. The number of X-ray and NMR protein structures is increasing rapidly and it is apparent that a 3D equivalent of the sequence templates is needed. Here, we describe an algorithm called TESS that automatically derives 3D templates from structures deposited in the Brookhaven Protein Data Bank. While a new sequence can be searched for sequence patterns, a new structure can be scanned against these 3D templates to identify functional sites. As examples, 3D templates are derived for enzymes with an O-His-O "catalytic triad" and for the ribonucleases and lysozymes. When these 3D templates are applied to a large data set of nonidentical proteins, several interesting hits are located. This suggests that the development of a 3D template database may help to identify the function of new protein structures, if unknown, as well as to design proteins with specific functions.

  7. Comparison of fixed-fee Grateful Med database use and searching success rates given the continued availability of MEDLINE in other formats.

    PubMed Central

    Blecic, D D

    1996-01-01

    This study compared in-house Grateful Med database use and searching success rates for four-month periods at four sites of the Library of the Health Sciences of the University of Illinois at Chicago. Data were collected from Grateful Med workstation uselogs and analyzed. Database use patterns and searching success rates were fairly consistent across the four sites. Though MEDLINE was available in other formats and use in these other formats remained high, 65.48% of all Grateful Med searching was done by using MEDLINE, and an additional 21.34% was done by using the MEDLINE Backfiles. In-house use patterns were similar to the overall use pattern for the University of Illinois, with the exception of MEDLINE Backfiles. Overall, 54.30% of searches were successful. PMID:8913553

  8. ANDY: A general, fault-tolerant tool for database searching oncomputer clusters

    SciTech Connect

    Smith, Andrew; Chandonia, John-Marc; Brenner, Steven E.

    2005-12-21

    Summary: ANDY (seArch coordination aND analYsis) is a set of Perl programs and modules for distributing large biological database searches, and in general any sequence of commands, across the nodes of a Linux computer cluster. ANDY is compatible with several commonly used Distributed Resource Management (DRM) systems, and it can be easily extended to new DRMs. A distinctive feature of ANDY is the choice of either dedicated or fair-use operation: ANDY is almost as efficient as single-purpose tools that require a dedicated cluster, but it runs on a general-purpose cluster along with any other jobs scheduled by a DRM. Other features include communication through named pipes for performance, flexible customizable routines for error-checking and summarizing results, and multiple fault-tolerance mechanisms. Availability: ANDY is freely available and may be obtained from http://compbio.berkeley.edu/proj/andy; this site also contains supplemental data and figures and a more detailed overview of the software.

  9. Improving the Mapping of Smith-Waterman Sequence Database Searches onto CUDA-Enabled GPUs.

    PubMed

    Huang, Liang-Tsung; Wu, Chao-Chin; Lai, Lien-Fu; Li, Yun-Ju

    2015-01-01

    Sequence alignment lies at the heart of bioinformatics. The Smith-Waterman algorithm is one of the key sequence search algorithms and has gained popularity due to improved implementations and rapidly increasing compute power. Recently, the Smith-Waterman algorithm has been successfully mapped onto the emerging general-purpose graphics processing units (GPUs). In this paper, we focused on how to improve the mapping, especially for short query sequences, by better usage of shared memory. We performed and evaluated the proposed method on two different platforms (Tesla C1060 and Tesla K20) and compared it with two classic methods in CUDASW++. Further, the performance on different numbers of threads and blocks has been analyzed. The results showed that the proposed method significantly improves the Smith-Waterman algorithm on CUDA-enabled GPUs through proper allocation of block and thread numbers.
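
    For reference, the recurrence that these GPU mappings tile into shared memory is the basic Smith-Waterman dynamic program. A plain CPU version with linear gap penalties (scoring parameters chosen arbitrarily) fits in a dozen lines.

        def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
            # H[i][j] = best local alignment score ending at a[i-1], b[j-1]
            H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
            best = 0
            for i in range(1, len(a) + 1):
                for j in range(1, len(b) + 1):
                    sub = match if a[i-1] == b[j-1] else mismatch
                    H[i][j] = max(0, H[i-1][j-1] + sub,
                                  H[i-1][j] + gap, H[i][j-1] + gap)
                    best = max(best, H[i][j])
            return best

        print(smith_waterman("TGTTACGG", "GGTTGACTA"))  # -> 8 here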

  10. Uploading, Searching and Visualizing of Paleomagnetic and Rock Magnetic Data in the Online MagIC Database

    NASA Astrophysics Data System (ADS)

    Minnett, R.; Koppers, A.; Tauxe, L.; Constable, C.; Donadini, F.

    2007-12-01

    The Magnetics Information Consortium (MagIC) is commissioned to implement and maintain an online portal to a relational database populated by both rock and paleomagnetic data. The goal of MagIC is to archive all available measurements and derived properties from paleomagnetic studies of directions and intensities, and for rock magnetic experiments (hysteresis, remanence, susceptibility, anisotropy). MagIC is hosted under EarthRef.org at http://earthref.org/MAGIC/ and will soon implement two search nodes, one for paleomagnetism and one for rock magnetism. Currently the PMAG node is operational. Both nodes provide query building based on location, reference, methods applied, material type and geological age, as well as a visual map interface to browse and select locations. Users can also browse the database by data type or by data compilation to view all contributions associated with well known earlier collections like PINT, GMPDB or PSVRL. The query result set is displayed in a digestible tabular format allowing the user to descend from locations to sites, samples, specimens and measurements. At each stage, the result set can be saved and, where appropriate, can be visualized by plotting global location maps, equal area, XY, age, and depth plots, or typical Zijderveld, hysteresis, magnetization and remanence diagrams. User contributions to the MagIC database are critical to achieving a useful research tool. We have developed a standard data and metadata template (version 2.3) that can be used to format and upload all data at the time of publication in Earth Science journals. Software tools are provided to facilitate population of these templates within Microsoft Excel. These tools allow for the import/export of text files and provide advanced functionality to manage and edit the data, and to perform various internal checks to maintain data integrity and prepare for uploading. The MagIC Contribution Wizard at http://earthref.org/MAGIC/upload.htm executes the upload

  11. Similarity landscapes: An improved method for scientific visualization of information from protein and DNA database searches

    SciTech Connect

    Dogget, N.; Myers, G.; Wills, C.J.

    1998-12-01

    This is the final report of a three-year, Laboratory Directed Research and Development (LDRD) project at the Los Alamos National Laboratory (LANL). The authors have used computer simulations and examination of a variety of databases to address a wide range of evolutionary questions. The authors have found that there is a clear distinction in the evolution of HIV-1 and HIV-2, with the former and more virulent virus evolving more rapidly at a functional level. The authors have discovered highly non-random patterns in the evolution of HIV-1 that can be attributed to a variety of selective pressures. In the course of examination of microsatellite DNA (short repeat regions) in microorganisms, the authors have found clear differences between prokaryotes and eukaryotes in their distribution, differences that can be tied to different selective pressures. They have developed a new method (topiary pruning) for enhancing the phylogenetic information contained in DNA sequences. Most recently, the authors have discovered effects in complex rainforest ecosystems that indicate strong frequency-dependent interactions between host species and their parasites, leading to the maintenance of ecosystem variability.

  12. Astrobiological Complexity with Probabilistic Cellular Automata

    NASA Astrophysics Data System (ADS)

    Vukotić, Branislav; Ćirković, Milan M.

    2012-08-01

    The search for extraterrestrial life and intelligence constitutes one of the major endeavors in science, but has so far been quantitatively modeled only rarely and in a cursory and superficial fashion. We argue that probabilistic cellular automata (PCA) represent the best quantitative framework for modeling the astrobiological history of the Milky Way and its Galactic Habitable Zone. The relevant astrobiological parameters are to be modeled as the elements of the input probability matrix for the PCA kernel. With the underlying simplicity of the cellular automata constructs, this approach enables a quick analysis of the large and ambiguous space of the input parameters. We perform a simple clustering analysis of typical astrobiological histories with "Copernican" choice of input parameters and discuss the relevant boundary conditions of practical importance for planning and guiding empirical astrobiological and SETI projects. In addition to showing how the present framework is adaptable to more complex situations and updated observational databases from current and near-future space missions, we demonstrate how numerical results could offer a cautious rationale for continuation of practical SETI searches.

  13. Astrobiological complexity with probabilistic cellular automata.

    PubMed

    Vukotić, Branislav; Ćirković, Milan M

    2012-08-01

    The search for extraterrestrial life and intelligence constitutes one of the major endeavors in science, but has so far been quantitatively modeled only rarely and in a cursory and superficial fashion. We argue that probabilistic cellular automata (PCA) represent the best quantitative framework for modeling the astrobiological history of the Milky Way and its Galactic Habitable Zone. The relevant astrobiological parameters are to be modeled as the elements of the input probability matrix for the PCA kernel. With the underlying simplicity of the cellular automata constructs, this approach enables a quick analysis of the large and ambiguous space of the input parameters. We perform a simple clustering analysis of typical astrobiological histories with "Copernican" choice of input parameters and discuss the relevant boundary conditions of practical importance for planning and guiding empirical astrobiological and SETI projects. In addition to showing how the present framework is adaptable to more complex situations and updated observational databases from current and near-future space missions, we demonstrate how numerical results could offer a cautious rationale for continuation of practical SETI searches. PMID:22832998
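
    To make the framework concrete, below is a minimal probabilistic cellular automaton sketch in Python. The three site states and the transition probabilities are illustrative stand-ins for the paper's astrobiological input probability matrix, not its actual parameters.

      import random

      # Toy PCA on a 1-D ring: 0 = dead, 1 = simple life, 2 = complex life.
      P_UP = {0: 0.02, 1: 0.01}   # per-step chance a site advances one state
      P_RESET = 0.005             # per-step chance a catastrophe resets a site

      def step(cells):
          nxt = []
          for i, s in enumerate(cells):
              neighbors = cells[i - 1], cells[(i + 1) % len(cells)]
              if random.random() < P_RESET:
                  nxt.append(0)
              elif s < 2 and random.random() < P_UP[s] * (1 + sum(neighbors)):
                  nxt.append(s + 1)   # advancement aided by living neighbors
              else:
                  nxt.append(s)
          return nxt

      cells = [0] * 200
      for _ in range(1000):
          cells = step(cells)
      print("complex-life sites:", cells.count(2))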

  15. PhenoMeter: A Metabolome Database Search Tool Using Statistical Similarity Matching of Metabolic Phenotypes for High-Confidence Detection of Functional Links

    PubMed Central

    Carroll, Adam J.; Zhang, Peng; Whitehead, Lynne; Kaines, Sarah; Tcherkez, Guillaume; Badger, Murray R.

    2015-01-01

    This article describes PhenoMeter (PM), a new type of metabolomics database search that accepts metabolite response patterns as queries and searches the MetaPhen database of reference patterns for responses that are statistically significantly similar or inverse for the purposes of detecting functional links. To identify a similarity measure that would detect functional links as reliably as possible, we compared the performance of four statistics in correctly top-matching metabolic phenotypes of Arabidopsis thaliana metabolism mutants affected in different steps of the photorespiration metabolic pathway to reference phenotypes of mutants affected in the same enzymes by independent mutations. The best performing statistic, the PM score, was a function of both Pearson correlation and Fisher’s Exact Test of directional overlap. This statistic outperformed Pearson correlation, biweight midcorrelation and Fisher’s Exact Test used alone. To demonstrate general applicability, we show that the PM reliably retrieved the most closely functionally linked response in the database when queried with responses to a wide variety of environmental and genetic perturbations. Attempts to match metabolic phenotypes between independent studies were met with varying success and possible reasons for this are discussed. Overall, our results suggest that integration of pattern-based search tools into metabolomics databases will aid functional annotation of newly recorded metabolic phenotypes analogously to the way sequence similarity search algorithms have aided the functional annotation of genes and proteins. PM is freely available at MetabolomeExpress (https://www.metabolome-express.org/phenometer.php). PMID:26284240
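
    The abstract states that the PM score is a function of both Pearson correlation and Fisher's Exact Test of directional overlap but does not give the exact combination, so the sketch below joins the two statistics in an assumed way (correlation times the negative log p-value) purely for illustration.

      import math
      from scipy import stats

      def pm_like_score(query, reference):
          # query/reference: equal-length signed metabolite response patterns
          r, _ = stats.pearsonr(query, reference)
          up_up = sum(a > 0 and b > 0 for a, b in zip(query, reference))
          up_dn = sum(a > 0 and b <= 0 for a, b in zip(query, reference))
          dn_up = sum(a <= 0 and b > 0 for a, b in zip(query, reference))
          dn_dn = sum(a <= 0 and b <= 0 for a, b in zip(query, reference))
          _, p = stats.fisher_exact([[up_up, up_dn], [dn_up, dn_dn]])
          return r * -math.log10(max(p, 1e-300))

      print(pm_like_score([1.2, -0.5, 2.0, -1.1], [0.9, -0.4, 1.7, -0.8]))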

  16. Reprint of "pFind-Alioth: A novel unrestricted database search algorithm to improve the interpretation of high-resolution MS/MS data".

    PubMed

    Chi, Hao; He, Kun; Yang, Bing; Chen, Zhen; Sun, Rui-Xiang; Fan, Sheng-Bo; Zhang, Kun; Liu, Chao; Yuan, Zuo-Fei; Wang, Quan-Hui; Liu, Si-Qi; Dong, Meng-Qiu; He, Si-Min

    2015-11-01

    Database search is the dominant approach in high-throughput proteomic analysis. However, the interpretation rate of MS/MS spectra is very low in such a restricted mode, which is mainly due to unexpected modifications and irregular digestion types. In this study, we developed a new algorithm called Alioth, to be integrated into the search engine of pFind, for fast and accurate unrestricted database search on high-resolution MS/MS data. An ion index is constructed for both peptide precursors and fragment ions, by which arbitrary digestions and a single site of any modifications and mutations can be searched efficiently. A new re-ranking algorithm is used to distinguish the correct peptide-spectrum matches from random ones. The algorithm is tested on several HCD datasets and the interpretation rate of MS/MS spectra using Alioth is as high as 60%-80%. Peptides from semi- and non-specific digestions, as well as those with unexpected modifications or mutations, can be effectively identified using Alioth and confidently validated using other search engines. The average processing speed of Alioth is 5-10 times faster than some other unrestricted search engines and is comparable to or even faster than the restricted search algorithms tested. This article is part of a Special Issue entitled: Computational Proteomics.

  17. Automatic sorting of toxicological information into the IUCLID (International Uniform Chemical Information Database) endpoint-categories making use of the semantic search engine Go3R.

    PubMed

    Sauer, Ursula G; Wächter, Thomas; Hareng, Lars; Wareing, Britta; Langsch, Angelika; Zschunke, Matthias; Alvers, Michael R; Landsiedel, Robert

    2014-06-01

    The knowledge-based search engine Go3R, www.Go3R.org, has been developed to assist scientists from industry and regulatory authorities in collecting comprehensive toxicological information with a special focus on identifying available alternatives to animal testing. The semantic search paradigm of Go3R makes use of expert knowledge on 3Rs methods and regulatory toxicology, laid down in the ontology, a network of concepts, terms, and synonyms, to recognize the contents of documents. Search results are automatically sorted into a dynamic table of contents presented alongside the list of documents retrieved. This table of contents allows the user to quickly filter the set of documents by topics of interest. Documents containing hazard information are automatically assigned to a user interface following the endpoint-specific IUCLID5 categorization scheme required, e.g. for REACH registration dossiers. For this purpose, complex endpoint-specific search queries were compiled and integrated into the search engine (based upon a gold standard of 310 references that had been assigned manually to the different endpoint categories). Go3R sorts 87% of the references concordantly into the respective IUCLID5 categories. Currently, Go3R searches in the 22 million documents available in the PubMed and TOXNET databases. However, it can be customized to search in other databases including in-house databanks.

  18. pFind-Alioth: A novel unrestricted database search algorithm to improve the interpretation of high-resolution MS/MS data.

    PubMed

    Chi, Hao; He, Kun; Yang, Bing; Chen, Zhen; Sun, Rui-Xiang; Fan, Sheng-Bo; Zhang, Kun; Liu, Chao; Yuan, Zuo-Fei; Wang, Quan-Hui; Liu, Si-Qi; Dong, Meng-Qiu; He, Si-Min

    2015-07-01

    Database search is the dominant approach in high-throughput proteomic analysis. However, the interpretation rate of MS/MS spectra is very low in such a restricted mode, which is mainly due to unexpected modifications and irregular digestion types. In this study, we developed a new algorithm called Alioth, to be integrated into the search engine of pFind, for fast and accurate unrestricted database search on high-resolution MS/MS data. An ion index is constructed for both peptide precursors and fragment ions, by which arbitrary digestions and a single site of any modifications and mutations can be searched efficiently. A new re-ranking algorithm is used to distinguish the correct peptide-spectrum matches from random ones. The algorithm is tested on several HCD datasets and the interpretation rate of MS/MS spectra using Alioth is as high as 60%-80%. Peptides from semi- and non-specific digestions, as well as those with unexpected modifications or mutations, can be effectively identified using Alioth and confidently validated using other search engines. The average processing speed of Alioth is 5-10 times faster than some other unrestricted search engines and is comparable to or even faster than the restricted search algorithms tested.
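
    A minimal sketch of the fragment ion index idea: an inverted index from binned fragment masses to peptides lets a spectrum vote for candidates in a single pass. The bin width, the b/y-only fragment model, and the small residue mass table are illustrative assumptions, not pFind-Alioth's actual scheme.

      from collections import defaultdict

      AA = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
            "V": 99.06841, "L": 113.08406, "K": 128.09496, "R": 156.10111}
      BIN = 0.02   # Da per index bin

      def fragments(peptide):
          masses = [AA[a] for a in peptide]
          for i in range(1, len(masses)):
              yield sum(masses[:i]) + 1.008             # b-ion (approx.)
              yield sum(masses[i:]) + 18.011 + 1.008    # y-ion (approx.)

      index = defaultdict(set)
      for pep in ["PVLKS", "GASPK", "VLKRA"]:
          for m in fragments(pep):
              index[round(m / BIN)].add(pep)

      def candidates(peaks):
          votes = defaultdict(int)
          for mz in peaks:
              for pep in index.get(round(mz / BIN), ()):
                  votes[pep] += 1
          return sorted(votes.items(), key=lambda kv: -kv[1])

      print(candidates(list(fragments("PVLKS"))))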

  19. DESIGNING ENVIRONMENTAL MONITORING DATABASES FOR STATISTIC ASSESSMENT

    EPA Science Inventory

    Databases designed for statistical analyses have characteristics that distinguish them from databases intended for general use. EMAP uses a probabilistic sampling design to collect data to produce statistical assessments of environmental conditions. In addition to supporting the ...

  20. Evidential significance of automotive paint trace evidence using a pattern recognition based infrared library search engine for the Paint Data Query Forensic Database.

    PubMed

    Lavine, Barry K; White, Collin G; Allen, Matthew D; Fasasi, Ayuba; Weakley, Andrew

    2016-10-01

    A prototype library search engine has been further developed to search the infrared spectral libraries of the paint data query database to identify the line and model of a vehicle from the clear coat, surfacer-primer, and e-coat layers of an intact paint chip. For this study, search prefilters were developed from 1181 automotive paint systems spanning 3 manufacturers: General Motors, Chrysler, and Ford. The best match between each unknown and the spectra in the hit list generated by the search prefilters was identified using a cross-correlation library search algorithm that performed both a forward and backward search. In the forward search, spectra were divided into intervals and further subdivided into windows (which corresponds to the time lag for the comparison) within those intervals. The top five hits identified in each search window were compiled; a histogram was computed that summarized the frequency of occurrence for each library sample, with the IR spectra most similar to the unknown flagged. The backward search computed the frequency and occurrence of each line and model without regard to the identity of the individual spectra. Only those lines and models with a frequency of occurrence greater than or equal to 20% were included in the final hit list. If there was agreement between the forward and backward search results, the specific line and model common to both hit lists was always the correct assignment. Samples assigned to the same line and model by both searches are always well represented in the library and correlate well on an individual basis to specific library samples. For these samples, one can have confidence in the accuracy of the match. This was not the case for the results obtained using commercial library search algorithms, as the hit quality index scores for the top twenty hits were always greater than 99%.
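
    A minimal sketch of the voting step described above: the top hits from each search window are compiled into a histogram, only lines/models occurring in at least 20% of the windows are kept, and an assignment is made only when the forward and backward searches agree. The per-window hit lists, which the paper derives from a cross-correlation search, are passed in directly here.

      from collections import Counter

      def consensus(per_window_hits, frac=0.20):
          votes = Counter()
          for hits in per_window_hits:      # hits: library IDs for one window
              votes.update(set(hits))
          cutoff = frac * len(per_window_hits)
          return {lib for lib, n in votes.items() if n >= cutoff}

      fwd = [["GM-A", "GM-B"], ["GM-A"], ["GM-A", "FORD-C"], ["GM-A"],
             ["GM-A"], ["GM-A", "GM-B"], ["GM-A"], ["GM-A"],
             ["FORD-C"], ["GM-A"]]
      bwd = [["GM-A"], ["GM-A", "CHRY-D"], ["GM-A"], ["GM-A", "CHRY-D"],
             ["GM-A"], ["GM-A"], ["GM-A", "GM-B"], ["GM-A"], ["GM-A"],
             ["GM-A"]]
      print(consensus(fwd) & consensus(bwd))   # assign only on agreement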

  2. Does an English appeal court ruling increase the risks of miscarriages of justice when complex DNA profiles are searched against the national DNA database?

    PubMed

    Gill, P; Bleka, Ø; Egeland, T

    2014-11-01

    Likelihood ratio (LR) methods to interpret multi-contributor, low template, complex DNA mixtures are becoming standard practice. The next major development will be to introduce search engines based on the new methods to interrogate very large national DNA databases, such as those held by China, the USA and the UK. Here we describe a rapid method that was used to assign an LR to each individual member of a database of 5 million genotypes, which can be ranked in order. Previous authors have only considered database trawls in the context of binary match or non-match criteria. However, the concept of match/non-match no longer applies within the new paradigm introduced, since the distribution of resultant LRs is continuous for practical purposes. An English appeal court decision allows scientists to routinely report complex DNA profiles using nothing more than their subjective personal 'experience of casework' and 'observations' in order to apply an expression of the rarity of an evidential sample. This ruling must be considered in the context of a recent high profile English case, where an individual was extracted from a database and wrongly accused of a serious crime. In this case the DNA evidence was used to negate the overwhelming exculpatory (non-DNA) evidence. Demonstrable confirmation bias, also known as the 'CSI-effect', seriously affected the investigation. The case demonstrated that in practice, databases could be used to select and prosecute an individual, simply because he ranked high in the list of possible matches. We have identified this phenomenon as a cognitive error which we term 'the naïve investigator effect'. We take the opportunity to test the performance of database extraction strategies either by using a simple matching allele count (MAC) method or LR. The example heard by the appeal court is used as the exemplar case. It is demonstrated that the LR search-method offers substantial benefits compared to searches based on a simple matching allele count (MAC).
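
    To make the MAC-versus-LR contrast concrete, here is a toy single-locus sketch in Python. The allele frequencies and the single-contributor likelihood ratio are deliberate simplifications; the paper's rapid method handles complex low-template mixtures.

      FREQ = {"a": 0.05, "b": 0.20, "c": 0.30, "d": 0.45}   # toy frequencies

      def mac(crime, ref):
          # matching allele count: shared alleles between profiles
          return len(set(crime) & set(ref))

      def lr(crime, ref):
          # single-contributor LR: 1 / random match probability on full match
          if set(ref) != set(crime):
              return 0.0
          p, q = FREQ[ref[0]], FREQ[ref[1]]
          rmp = p * p if ref[0] == ref[1] else 2 * p * q
          return 1.0 / rmp

      database = {"X": ("a", "b"), "Y": ("c", "d"), "Z": ("a", "a")}
      crime = ("a", "b")
      print(sorted(database, key=lambda k: mac(crime, database[k]), reverse=True))
      print(sorted(database, key=lambda k: lr(crime, database[k]), reverse=True))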

  3. Probabilistic Choice, Reversibility, Loops, and Miracles

    NASA Astrophysics Data System (ADS)

    Stoddart, Bill; Bell, Pete

    We consider an addition of probabilistic choice to Abrial's Generalised Substitution Language (GSL) in a form that accommodates the backtracking interpretation of non-deterministic choice. Our formulation is introduced as an extension of the Prospective Values formalism we have developed to describe the results from a backtracking search. Significant features are that probabilistic choice is governed by feasibility, and non-termination is strict. The former property allows us to use probabilistic choice to generate search heuristics. In this paper we are particularly interested in iteration. By demonstrating sub-conjunctivity and monotonicity properties of expectations we give the basis for a fixed point semantics of iterative constructs, and we consider the practical proof treatment of probabilistic loops. We discuss loop invariants, loops with probabilistic behaviour, and probabilistic termination in the context of a formalism in which a small probability of non-termination can dominate our calculations, proposing a method of limits to avoid this problem. The formal programming constructs described have been implemented in a reversible virtual machine (RVM).

  4. Combining text retrieval and content-based image retrieval for searching a large-scale medical image database in an integrated RIS/PACS environment

    NASA Astrophysics Data System (ADS)

    He, Zhenyu; Zhu, Yanjie; Ling, Tonghui; Zhang, Jianguo

    2009-02-01

    Medical imaging modalities generate huge amounts of medical images daily, and there are urgent demands to search large-scale image databases in an RIS-integrated PACS environment to support medical research and diagnosis by using image visual content to find visually similar images. However, most current content-based image retrieval (CBIR) systems require distance computations to perform query by image content. Distance computations can be time consuming when the image database grows large, which limits the usability of such systems. Furthermore, there is still a semantic gap between the low-level visual features automatically extracted and the high-level concepts that users normally search for. To address these problems, we propose a novel framework that combines text retrieval and CBIR techniques in order to support searching large-scale medical image databases while an integrated RIS/PACS is in place. A prototype system for CBIR has been implemented, which can query similar medical images both by their visual content and by relevant semantic descriptions (symptoms and/or possible diagnosis). It also can be used as a decision support tool for radiology diagnosis and a learning tool for education.
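
    A minimal sketch of one common way to fuse text retrieval with content-based scores, namely min-max normalization followed by a weighted sum. The weighting and normalization are assumptions for illustration, not the configuration reported in this record.

      def combined_rank(text_scores, image_scores, alpha=0.5):
          # min-max normalize each ranker, then blend with weight alpha
          def norm(d):
              lo, hi = min(d.values()), max(d.values())
              return {k: (v - lo) / ((hi - lo) or 1.0) for k, v in d.items()}
          t, v = norm(text_scores), norm(image_scores)
          fused = {k: alpha * t.get(k, 0.0) + (1 - alpha) * v.get(k, 0.0)
                   for k in set(t) | set(v)}
          return sorted(fused, key=fused.get, reverse=True)

      text = {"img1": 3.2, "img2": 1.0, "img3": 0.2}     # semantic matches
      visual = {"img1": 0.4, "img2": 0.9, "img3": 0.8}   # CBIR similarities
      print(combined_rank(text, visual))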

  5. Accelerating Smith-Waterman Alignment for Protein Database Search Using Frequency Distance Filtration Scheme Based on CPU-GPU Collaborative System

    PubMed Central

    Liu, Yu; Hong, Yang; Lin, Chun-Yuan; Hung, Che-Lun

    2015-01-01

    The Smith-Waterman (SW) algorithm has been widely utilized for searching biological sequence databases in bioinformatics. Recently, several works have adopted the graphic card with Graphic Processing Units (GPUs) and their associated CUDA model to enhance the performance of SW computations. However, these works mainly focused on the protein database search by using the intertask parallelization technique, only using the GPU capability to do the SW computations one by one. Hence, in this paper, we propose an efficient SW alignment method, called CUDA-SWfr, for the protein database search by using the intratask parallelization technique based on a CPU-GPU collaborative system. Before doing the SW computations on the GPU, a procedure is applied on the CPU using the frequency distance filtration scheme (FDFS) to eliminate unnecessary alignments. The experimental results indicate that CUDA-SWfr runs 9.6 times and 96 times faster than the CPU-based SW method without and with FDFS, respectively.
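
    A simplified sketch of frequency-based filtration: the shared residue multiset bounds the best achievable match score, so database subjects that cannot reach a threshold are skipped before any Smith-Waterman computation. This bound is a simplification of the paper's FDFS, shown only to illustrate the filter-then-align structure.

      from collections import Counter

      def shared_residues(query, subject):
          q, s = Counter(query), Counter(subject)
          return sum(min(q[a], s[a]) for a in q)

      def passes_filter(query, subject, match=2, min_score=10):
          # SW score cannot exceed match * shared residue count
          return shared_residues(query, subject) * match >= min_score

      db = ["MKVLATTTLGHHH", "GGGGGGGG", "MKVLATT"]
      query = "MKVLAT"
      print([s for s in db if passes_filter(query, s)])   # SW only on these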

  6. MitoZoa 2.0: a database resource and search tools for comparative and evolutionary analyses of mitochondrial genomes in Metazoa

    PubMed Central

    D'Onorio de Meo, Paolo; D'Antonio, Mattia; Griggio, Francesca; Lupi, Renato; Borsani, Massimiliano; Pavesi, Giulio; Castrignanò, Tiziana; Pesole, Graziano; Gissi, Carmela

    2012-01-01

    The MITOchondrial genome database of metaZOAns (MitoZoa) is a public resource for comparative analyses of metazoan mitochondrial genomes (mtDNA) at both the sequence and genomic organizational levels. The main characteristics of the MitoZoa database are the careful revision of mtDNA entry annotations and the possibility of retrieving gene order and non-coding region (NCR) data in appropriate formats. The MitoZoa retrieval system enables basic and complex queries at various taxonomic levels using different search menus. MitoZoa 2.0 has been enhanced in several aspects, including: a re-annotation pipeline to check the correctness of protein-coding gene predictions; a standardized annotation of introns and of precursor ORFs whose functionality is post-transcriptionally recovered by RNA editing or programmed translational frameshifting; updates of taxon-related fields and a BLAST sequence similarity search tool. Database novelties and the definition of standard mtDNA annotation rules, together with the user-friendly retrieval system and the BLAST service, make MitoZoa a valuable resource for comparative and evolutionary analyses as well as a reference database to assist in the annotation of novel mtDNA sequences. MitoZoa is freely accessible at http://www.caspur.it/mitozoa. PMID:22123747

  7. MitoZoa 2.0: a database resource and search tools for comparative and evolutionary analyses of mitochondrial genomes in Metazoa.

    PubMed

    D'Onorio de Meo, Paolo; D'Antonio, Mattia; Griggio, Francesca; Lupi, Renato; Borsani, Massimiliano; Pavesi, Giulio; Castrignanò, Tiziana; Pesole, Graziano; Gissi, Carmela

    2012-01-01

    The MITOchondrial genome database of metaZOAns (MitoZoa) is a public resource for comparative analyses of metazoan mitochondrial genomes (mtDNA) at both the sequence and genomic organizational levels. The main characteristics of the MitoZoa database are the careful revision of mtDNA entry annotations and the possibility of retrieving gene order and non-coding region (NCR) data in appropriate formats. The MitoZoa retrieval system enables basic and complex queries at various taxonomic levels using different search menus. MitoZoa 2.0 has been enhanced in several aspects, including: a re-annotation pipeline to check the correctness of protein-coding gene predictions; a standardized annotation of introns and of precursor ORFs whose functionality is post-transcriptionally recovered by RNA editing or programmed translational frameshifting; updates of taxon-related fields and a BLAST sequence similarity search tool. Database novelties and the definition of standard mtDNA annotation rules, together with the user-friendly retrieval system and the BLAST service, make MitoZoa a valuable resource for comparative and evolutionary analyses as well as a reference database to assist in the annotation of novel mtDNA sequences. MitoZoa is freely accessible at http://www.caspur.it/mitozoa.

  8. The Object-analogue approach for probabilistic forecasting

    NASA Astrophysics Data System (ADS)

    Frediani, M. E.; Hopson, T. M.; Anagnostou, E. N.; Hacker, J.

    2015-12-01

    The object-analogue is a new method to estimate forecast uncertainty and to derive probabilistic predictions of gridded forecast fields over larger regions rather than point locations. The method has been developed for improving the forecast of 10-meter wind speed over the northeast US, and it can be extended to other forecast variables, vertical levels, and other regions. The object-analogue approach combines the analog post-processing technique (Hopson 2005; Hamill 2006; Delle Monache 2011) with the Method for Object-based Diagnostic Evaluation (MODE) for forecast verification (Davis et al 2006a, b). MODE was originally used mainly to verify precipitation forecasts, using features of a forecast region represented by an object. The analog technique is used to reduce the NWP systematic and random errors of a gridded forecast field. In this study we use MODE-derived objects to characterize the wind field forecasts into attributes such as object area, centroid location, and intensity percentiles, and apply the analogue concept to these objects. The object-analogue method uses a database of objects derived from reforecasts and their respective reanalysis. Given a real-time forecast field, it searches the database and selects the top-ranked objects with the most similar set of attributes, using the MODE fuzzy logic algorithm for object matching. The attribute probabilities obtained with the set of selected object-analogues are used to derive a multi-layer probabilistic prediction. The attribute probabilities are combined into three uncertainty layers that address the main concerns of most applications: location, area, and magnitude. The multi-layer uncertainty can be weighted and combined or used independently, such that it provides a more accurate prediction, adjusted according to the application interest. In this study we present preliminary results of the object-analogue method. Using a database with one hundred storms, we perform a leave-one-out cross-validation.
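
    A minimal sketch of the analogue lookup: forecast objects are ranked by a weighted similarity over a few MODE-style attributes and the top matches kept. The attribute names, weights, and similarity shape here are illustrative assumptions, not the study's fuzzy-logic configuration.

      def similarity(obj, ref, weights=(0.4, 0.3, 0.3),
                     scales=(500.0, 2e4, 10.0)):
          feats = ("centroid_km", "area_km2", "intensity_p90")
          s = 0.0
          for w, scale, f in zip(weights, scales, feats):
              s += w * max(0.0, 1.0 - abs(obj[f] - ref[f]) / scale)
          return s

      def top_analogues(obj, database, k=3):
          ranked = sorted(database, key=lambda r: similarity(obj, r),
                          reverse=True)
          return ranked[:k]

      db = [{"centroid_km": 120.0, "area_km2": 8e3, "intensity_p90": 22.0},
            {"centroid_km": 900.0, "area_km2": 5e4, "intensity_p90": 9.0}]
      query = {"centroid_km": 150.0, "area_km2": 9e3, "intensity_p90": 20.0}
      print(top_analogues(query, db, k=1))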

  9. Learning Probabilistic Logic Models from Probabilistic Examples.

    PubMed

    Chen, Jianzhong; Muggleton, Stephen; Santos, José

    2008-10-01

    We revisit an application developed originally using abductive Inductive Logic Programming (ILP) for modeling inhibition in metabolic networks. The example data was derived from studies of the effects of toxins on rats using Nuclear Magnetic Resonance (NMR) time-trace analysis of their biofluids together with background knowledge representing a subset of the Kyoto Encyclopedia of Genes and Genomes (KEGG). We now apply two Probabilistic ILP (PILP) approaches - abductive Stochastic Logic Programs (SLPs) and PRogramming In Statistical modeling (PRISM) to the application. Both approaches support abductive learning and probability predictions. Abductive SLPs are a PILP framework that provides possible worlds semantics to SLPs through abduction. Instead of learning logic models from non-probabilistic examples as done in ILP, the PILP approach applied in this paper is based on a general technique for introducing probability labels within a standard scientific experimental setting involving control and treated data. Our results demonstrate that the PILP approach provides a way of learning probabilistic logic models from probabilistic examples, and the PILP models learned from probabilistic examples lead to a significant decrease in error accompanied by improved insight from the learned results compared with the PILP models learned from non-probabilistic examples.

  10. Sequential interval motif search: unrestricted database surveys of global MS/MS data sets for detection of putative post-translational modifications.

    PubMed

    Liu, Jian; Erassov, Alexandre; Halina, Patrick; Canete, Myra; Nguyen, Dinh Vo; Chung, Clement; Cagney, Gerard; Ignatchenko, Alexandr; Fong, Vincent; Emili, Andrew

    2008-10-15

    Tandem mass spectrometry is the prevailing approach for large-scale peptide sequencing in high-throughput proteomic profiling studies. Effective database search engines have been developed to identify peptide sequences from MS/MS fragmentation spectra. Since proteins are polymorphic and subject to post-translational modifications (PTM), however, computational methods for detecting unanticipated variants are also needed to achieve true proteome-wide coverage. Different from existing "unrestrictive" search tools, we present a novel algorithm, termed SIMS (for Sequential Motif Interval Search), that interprets pairs of product ion peaks, representing potential amino acid residues or "intervals", as a means of mapping PTMs or substitutions in a blind database search mode. An effective heuristic software program was likewise developed to evaluate, rank, and filter optimal combinations of relevant intervals to identify candidate sequences, and any associated PTM or polymorphism, from large collections of MS/MS spectra. The prediction performance of SIMS was benchmarked extensively against annotated reference spectral data sets and compared favorably with, and was complementary to, current state-of-the-art methods. An exhaustive discovery screen using SIMS also revealed thousands of previously overlooked putative PTMs in a compendium of yeast protein complexes and in a proteome-wide map of adult mouse cardiomyocytes. We demonstrate that SIMS, freely accessible for academic research use, addresses gaps in current proteomic data interpretation pipelines, improving overall detection coverage, and facilitating comprehensive investigations of the fundamental multiplicity of the expressed proteome.
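
    A toy sketch of the interval idea: the mass gap between two product-ion peaks is matched against single-residue masses, and a gap matching residue plus offset suggests a putative modification of that mass. The residue masses, modification list, and 0.02 Da tolerance are illustrative assumptions.

      AA = {"G": 57.02146, "A": 71.03711, "S": 87.03203,
            "V": 99.06841, "L": 113.08406, "K": 128.09496}
      PTM = {"phospho": 79.96633, "oxidation": 15.99491}
      TOL = 0.02

      def explain_gap(gap):
          # prefer an unmodified residue; otherwise try residue + PTM offset
          for res, m in AA.items():
              if abs(gap - m) <= TOL:
                  return res, None
          for res, m in AA.items():
              for mod, dm in PTM.items():
                  if abs(gap - (m + dm)) <= TOL:
                      return res, mod
          return None

      peaks = [200.00, 287.03, 454.03]   # synthetic fragment ladder
      for lo, hi in zip(peaks, peaks[1:]):
          print(round(hi - lo, 3), explain_gap(hi - lo))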

  11. Semi-Automated Identification of N-Glycopeptides by Hydrophilic Interaction Chromatography, nano-Reverse-Phase LC-MS/MS, and Glycan Database Search

    PubMed Central

    Pompach, Petr; Chandler, Kevin B.; Lan, Renny; Edwards, Nathan; Goldman, Radoslav

    2012-01-01

    Glycoproteins fulfill many indispensable biological functions and changes in protein glycosylation have been observed in various diseases. Improved analytical methods are needed to allow a complete characterization of this complex and common posttranslational modification. In this study, we present a workflow for the analysis of the microheterogeneity of N-glycoproteins which couples hydrophilic interaction and nano-reverse-phase C18 chromatography to tandem QTOF mass spectrometric analysis. A glycan database search program, GlycoPeptideSearch, was developed to match N-glycopeptide MS/MS spectra with the glycopeptides comprised of a glycan drawn from the GlycomeDB glycan structure database and a peptide from a user-specified set of potentially glycosylated peptides. Application of the workflow to human haptoglobin and hemopexin, two microheterogeneous N-glycoproteins, identified a total of 57 distinct site-specific glycoforms in the case of haptoglobin and 14 site-specific glycoforms of hemopexin. Using glycan oxonium ions, peptide-characteristic glycopeptide fragment ions, and by collapsing topologically redundant glycans, the search software was able to make unique N-glycopeptide assignments for 51% of assigned spectra, with the remaining assignments primarily representing isobaric topological rearrangements. The optimized workflow, coupled with GlycoPeptideSearch, is expected to make high-throughput semi-automated glycopeptide identification feasible for a wide range of users. PMID:22239659

  12. Dietary Supplement Label Database (DSLD)

    MedlinePlus

    The Dietary Supplement Label Database (DSLD) is a joint project of the National ... participants in the latest survey in the DSLD database (NHANES): The search options: Quick Search, Browse Dietary ...

  13. Database search of spontaneous reports and pharmacological investigations on the sulfonylureas and glinides-induced atrophy in skeletal muscle

    PubMed Central

    Mele, Antonietta; Calzolaro, Sara; Cannone, Gianluigi; Cetrone, Michela; Conte, Diana; Tricarico, Domenico

    2014-01-01

    The ATP-sensitive K+ (KATP) channel is an emerging pathway in skeletal muscle atrophy, which is a comorbid condition in diabetes. The “in vitro” effects of the sulfonylureas and glinides were evaluated on the protein content/muscle weight, fibers viability, mitochondrial succinic dehydrogenases (SDH) activity, and channel currents in oxidative soleus (SOL), glycolitic/oxidative flexor digitorum brevis (FDB), and glycolitic extensor digitorum longus (EDL) muscle fibers of mice using biochemical and cell-counting Kit-8 assay, image analysis, and patch-clamp techniques. The sulfonylureas were: tolbutamide, glibenclamide, and glimepiride; the glinides were: repaglinide and nateglinide. Food and Drug Administration-Adverse Effects Reporting System (FDA-AERS) database searching of atrophy-related signals associated with the use of these drugs in humans has been performed. After 24 h of incubation the drugs reduced the protein content/muscle weight and fibers viability more effectively in FDB and SOL than in the EDL. The order of efficacy of the drugs in reducing the protein content in FDB was: repaglinide (EC50 = 5.21 × 10−6) ≥ glibenclamide (EC50 = 8.84 × 10−6) > glimepiride (EC50 = 2.93 × 10−5) > tolbutamide (EC50 = 1.07 × 10−4) > nateglinide (EC50 = 1.61 × 10−4), and it was: repaglinide (EC50 = 7.15 × 10−5) ≥ glibenclamide (EC50 = 9.10 × 10−5) > nateglinide (EC50 = 1.80 × 10−4) ≥ tolbutamide (EC50 = 2.19 × 10−4) > glimepiride (EC50 = –) in SOL. The drug-induced atrophy can be explained by the KATP channel block and by the enhancement of the mitochondrial SDH activity. In an 8-month period, muscle atrophy was found in 0.27% of the glibenclamide reports in humans and in 0.022% of the reports for drugs other than the sulfonylureas and glinides. No reports of atrophy were found for the other sulfonylureas and glinides in the FDA-AERS. Glibenclamide induces atrophy in animal experiments and in human patients. Glimepiride shows less potential for inducing atrophy.

  14. Database search of spontaneous reports and pharmacological investigations on the sulfonylureas and glinides-induced atrophy in skeletal muscle.

    PubMed

    Mele, Antonietta; Calzolaro, Sara; Cannone, Gianluigi; Cetrone, Michela; Conte, Diana; Tricarico, Domenico

    2014-02-01

    The ATP-sensitive K(+) (KATP) channel is an emerging pathway in skeletal muscle atrophy, which is a comorbid condition in diabetes. The "in vitro" effects of the sulfonylureas and glinides were evaluated on the protein content/muscle weight, fibers viability, mitochondrial succinic dehydrogenases (SDH) activity, and channel currents in oxidative soleus (SOL), glycolitic/oxidative flexor digitorum brevis (FDB), and glycolitic extensor digitorum longus (EDL) muscle fibers of mice using biochemical and cell-counting Kit-8 assay, image analysis, and patch-clamp techniques. The sulfonylureas were: tolbutamide, glibenclamide, and glimepiride; the glinides were: repaglinide and nateglinide. Food and Drug Administration-Adverse Effects Reporting System (FDA-AERS) database searching of atrophy-related signals associated with the use of these drugs in humans has been performed. After 24 h of incubation the drugs reduced the protein content/muscle weight and fibers viability more effectively in FDB and SOL than in the EDL. The order of efficacy of the drugs in reducing the protein content in FDB was: repaglinide (EC50 = 5.21 × 10(-6)) ≥ glibenclamide (EC50 = 8.84 × 10(-6)) > glimepiride (EC50 = 2.93 × 10(-5)) > tolbutamide (EC50 = 1.07 × 10(-4)) > nateglinide (EC50 = 1.61 × 10(-4)), and it was: repaglinide (EC50 = 7.15 × 10(-5)) ≥ glibenclamide (EC50 = 9.10 × 10(-5)) > nateglinide (EC50 = 1.80 × 10(-4)) ≥ tolbutamide (EC50 = 2.19 × 10(-4)) > glimepiride (EC50 = -) in SOL. The drug-induced atrophy can be explained by the KATP channel block and by the enhancement of the mitochondrial SDH activity. In an 8-month period, muscle atrophy was found in 0.27% of the glibenclamide reports in humans and in 0.022% of the reports for drugs other than the sulfonylureas and glinides. No reports of atrophy were found for the other sulfonylureas and glinides in the FDA-AERS. Glibenclamide induces atrophy in animal experiments and in human patients. Glimepiride shows less potential for inducing atrophy.

  15. Familial searching: a specialist forensic DNA profiling service utilising the National DNA Database to identify unknown offenders via their relatives--the UK experience.

    PubMed

    Maguire, C N; McCallum, L A; Storey, C; Whitaker, J P

    2014-01-01

    The National DNA Database (NDNAD) of England and Wales was established on April 10th 1995. The NDNAD is governed by a variety of legislative instruments that mean that DNA samples can be taken if an individual is arrested and detained in a police station. The biological samples and the DNA profiles derived from them can be used for purposes related to the prevention and detection of crime, the investigation of an offence and for the conduct of a prosecution. Following the South East Asian Tsunami of December 2004, the legislation was amended to allow the use of the NDNAD to assist in the identification of a deceased person or of a body part where death has occurred from natural causes or from a natural disaster. The UK NDNAD now contains the DNA profiles of approximately 6 million individuals representing 9.6% of the UK population. As the science of DNA profiling advanced, the National DNA Database provided a potential resource for increased intelligence beyond the direct matching for which it was originally created. The familial searching service offered to the police by several UK forensic science providers exploits the size and geographic coverage of the NDNAD and the fact that close relatives of an offender may share a significant proportion of that offender's DNA profile and will often reside in close geographic proximity to him or her. Between 2002 and 2011 Forensic Science Service Ltd. (FSS) provided familial search services to support 188 police investigations, 70 of which are still active cases. This technique, which may be used in serious crime cases or in 'cold case' reviews when there are few or no investigative leads, has led to the identification of 41 perpetrators or suspects. In this paper we discuss the processes, utility, and governance of the familial search service in which the NDNAD is searched for close genetic relatives of an offender who has left DNA evidence at a crime scene, but whose DNA profile is not represented within the NDNAD.

  16. Reduction in database search space by utilization of amino acid composition information from electron transfer dissociation and higher-energy collisional dissociation mass spectra.

    PubMed

    Hansen, Thomas A; Kryuchkov, Fedor; Kjeldsen, Frank

    2012-08-01

    With high-mass accuracy and consecutively obtained electron transfer dissociation (ETD) and higher-energy collisional dissociation (HCD) tandem mass spectrometry (MS/MS), reliable (≥97%) and sensitive fragment ions have been extracted for identification of specific amino acid residues in peptide sequences. The analytical benefit of these specific amino acid composition (AAC) ions is to restrict the database search space and provide identification of peptides with higher confidence and reduced false negative rates. The 6706 uniquely identified peptide sequences determined with a conservative Mascot score of >30 were used to characterize the AAC ions. The loss of amino acid side chains (small neutral losses, SNLs) from the charge reduced peptide radical cations was studied using ETD. Complementary AAC information from HCD spectra was provided by immonium ions. From the ETD/HCD mass spectra, 5162 and 6720 reliable SNLs and immonium ions were successfully extracted, respectively. Automated application of the AAC information during database searching resulted in an average 3.5-fold higher confidence level of peptide identification. In addition, 4% and 28% more peptides were identified above the significance level in a standard and extended search space, respectively.
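
    A minimal sketch of search-space restriction by amino acid composition: residues evidenced by immonium ions or side-chain losses must occur in a candidate peptide, and residues whose diagnostic ions are reliably absent may be required to be missing. How the evidence is extracted from the ETD/HCD spectra is outside this sketch.

      def aac_filter(candidates, present=frozenset(), absent=frozenset()):
          # keep peptides consistent with the observed composition evidence
          return [pep for pep in candidates
                  if present <= set(pep) and not (absent & set(pep))]

      db = ["MKWLTTR", "GASPDER", "WWTKLNR"]
      print(aac_filter(db, present={"W", "K"}, absent={"P"}))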

  17. Genome databases

    SciTech Connect

    Courteau, J.

    1991-10-11

    Since the Genome Project began several years ago, a plethora of databases have been developed or are in the works. They range from the massive Genome Data Base at Johns Hopkins University, the central repository of all gene mapping information, to small databases focusing on single chromosomes or organisms. Some are publicly available, others are essentially private electronic lab notebooks. Still others limit access to a consortium of researchers working on, say, a single human chromosome. An increasing number incorporate sophisticated search and analytical software, while others operate as little more than data lists. In consultation with numerous experts in the field, a list has been compiled of some key genome-related databases. The list was not limited to map and sequence databases but also included the tools investigators use to interpret and elucidate genetic data, such as protein sequence and protein structure databases. Because a major goal of the Genome Project is to map and sequence the genomes of several experimental animals, including E. coli, yeast, fruit fly, nematode, and mouse, the available databases for those organisms are listed as well. The author also includes several databases that are still under development - including some ambitious efforts that go beyond data compilation to create what are being called electronic research communities, enabling many users, rather than just one or a few curators, to add or edit the data and tag it as raw or confirmed.

  18. The Factor VIII Mutation Database on the World Wide Web: the haemophilia A mutation, search, test and resource site. HAMSTeRS update (version 3.0).

    PubMed Central

    Kemball-Cook, G; Tuddenham, E G

    1997-01-01

    The HAMSTeRS WWW site was set up in 1996 in order to facilitate easy access to, and aid understanding of, the causes of haemophilia A at the molecular level; previously, the first and second text editions of the database have been published in Nucleic Acids Research. This report describes the facilities originally available at the site and the recent additions which we have made to increase its usefulness to clinicians, the molecular genetics community and structural biologists interested in factor VIII. The database (version 3.0) has been completely updated with easy submission of point mutations, deletions and insertions via e-mail of custom-designed forms. The searching of point mutations in the database has been made simpler and more robust, with a concomitantly expanded real-time bioinformatic analysis of the database. A methods section devoted to mutation detection has been added, highlighting issues such as choice of technique and PCR primer sequences. Finally, a FVIII structure section gives access to 3D VRML (Virtual Reality Modelling Language) files for any user-definable residue in a FVIII A domain homology model based on the crystal structure of human caeruloplasmin, together with secondary structural data and a sound+video animation of the model. It is intended that the general availability of this model will assist both in interpretation of causative mutations and selection of candidate residues for in vitro mutagenesis. The HAMSTeRS URL is http://europium.mrc.rpms.ac.uk. PMID:9016520

  19. The Factor VIII Mutation Database on the World Wide Web: the haemophilia A mutation, search, test and resource site. HAMSTeRS update (version 3.0).

    PubMed

    Kemball-Cook, G; Tuddenham, E G

    1997-01-01

    The HAMSTeRS WWW site was set up in 1996 in order to facilitate easy access to, and aid understanding of, the causes of haemophilia A at the molecular level; previously, the first and second text editions of the database have been published in Nucleic Acids Research. This report describes the facilities originally available at the site and the recent additions which we have made to increase its usefulness to clinicians, the molecular genetics community and structural biologists interested in factor VIII. The database (version 3.0) has been completely updated with easy submission of point mutations, deletions and insertions via e-mail of custom-designed forms. The searching of point mutations in the database has been made simpler and more robust, with a concomitantly expanded real-time bioinformatic analysis of the database. A methods section devoted to mutation detection has been added, highlighting issues such as choice of technique and PCR primer sequences. Finally, a FVIII structure section gives access to 3D VRML (Virtual Reality Modelling Language) files for any user-definable residue in a FVIII A domain homology model based on the crystal structure of human caeruloplasmin, together with secondary structural data and a sound+video animation of the model. It is intended that the general availability of this model will assist both in interpretation of causative mutations and selection of candidate residues for in vitro mutagenesis. The HAMSTeRS URL is http://europium.mrc.rpms.ac.uk.

  20. Probabilistic record linkage.

    PubMed

    Sayers, Adrian; Ben-Shlomo, Yoav; Blom, Ashley W; Steele, Fiona

    2016-06-01

    Studies involving the use of probabilistic record linkage are becoming increasingly common. However, the methods underpinning probabilistic record linkage are not widely taught or understood, and therefore these studies can appear to be a 'black box' research tool. In this article, we aim to describe the process of probabilistic record linkage through a simple exemplar. We first introduce the concept of deterministic linkage and contrast this with probabilistic linkage. We illustrate each step of the process using a simple exemplar and describe the data structure required to perform a probabilistic linkage. We describe the process of calculating and interpreting matched weights and how to convert matched weights into posterior probabilities of a match using Bayes theorem. We conclude this article with a brief discussion of some of the computational demands of record linkage, how you might assess the quality of your linkage algorithm, and how epidemiologists can maximize the value of their record-linked research using robust record linkage methods. PMID:26686842
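
    To make the steps concrete, here is a minimal record linkage sketch in the standard weighting style the article describes: each compared field contributes log2(m/u) on agreement and log2((1-m)/(1-u)) on disagreement, where m = P(agree | match) and u = P(agree | non-match), and the total weight is converted to a posterior match probability with Bayes theorem. The m, u, and prior values are illustrative.

      import math

      FIELDS = {"surname": (0.95, 0.01), "birth_year": (0.99, 0.05),
                "postcode": (0.90, 0.02)}   # field: (m, u), assumed values

      def match_weight(rec_a, rec_b):
          w = 0.0
          for field, (m, u) in FIELDS.items():
              if rec_a[field] == rec_b[field]:
                  w += math.log2(m / u)
              else:
                  w += math.log2((1 - m) / (1 - u))
          return w

      def posterior(weight, prior=1e-4):
          odds = (prior / (1 - prior)) * 2 ** weight   # prior odds times LR
          return odds / (1 + odds)

      a = {"surname": "BLOM", "birth_year": 1974, "postcode": "BS8"}
      b = {"surname": "BLOM", "birth_year": 1974, "postcode": "BS8"}
      print(posterior(match_weight(a, b)))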

  1. Savvy Searching.

    ERIC Educational Resources Information Center

    Jacso, Peter

    2002-01-01

    Explains desktop metasearch engines, which search the databases of several search engines simultaneously. Reviews two particular versions, the Copernic 2001 Pro and the BullsEye Pro 3, comparing costs, subject categories, display capabilities, and layout for presenting results. (LRW)

  2. VIEWCACHE: An incremental database access method for autonomous interoperable databases

    NASA Technical Reports Server (NTRS)

    Roussopoulos, Nick; Sellis, Timoleon

    1991-01-01

    The objective is to illustrate the concept of incremental access to distributed databases. An experimental database management system, ADMS, which has been developed at the University of Maryland, in College Park, uses VIEWCACHE, a database access method based on incremental search. VIEWCACHE is a pointer-based access method that provides a uniform interface for accessing distributed databases and catalogues. The compactness of the pointer structures formed during database browsing and the incremental access method allow the user to search and do inter-database cross-referencing with no actual data movement between database sites. Once the search is complete, the set of collected pointers pointing to the desired data are dereferenced.

  3. Probabilistic Structural Analysis Program

    NASA Technical Reports Server (NTRS)

    Pai, Shantaram S.; Chamis, Christos C.; Murthy, Pappu L. N.; Stefko, George L.; Riha, David S.; Thacker, Ben H.; Nagpal, Vinod K.; Mital, Subodh K.

    2010-01-01

    NASA/NESSUS 6.2c is a general-purpose, probabilistic analysis program that computes probability of failure and probabilistic sensitivity measures of engineered systems. Because NASA/NESSUS uses highly computationally efficient and accurate analysis techniques, probabilistic solutions can be obtained even for extremely large and complex models. Once the probabilistic response is quantified, the results can be used to support risk-informed decisions regarding reliability for safety-critical and one-of-a-kind systems, as well as for maintaining a level of quality while reducing manufacturing costs for larger-quantity products. NASA/NESSUS has been successfully applied to a diverse range of problems in aerospace, gas turbine engines, biomechanics, pipelines, defense, weaponry, and infrastructure. This program combines state-of-the-art probabilistic algorithms with general-purpose structural analysis and lifting methods to compute the probabilistic response and reliability of engineered structures. Uncertainties in load, material properties, geometry, boundary conditions, and initial conditions can be simulated. The structural analysis methods include non-linear finite-element methods, heat-transfer analysis, polymer/ceramic matrix composite analysis, monolithic (conventional metallic) materials life-prediction methodologies, boundary element methods, and user-written subroutines. Several probabilistic algorithms are available such as the advanced mean value method and the adaptive importance sampling method. NASA/NESSUS 6.2c is structured in a modular format with 15 elements.
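
    NESSUS itself relies on methods such as the advanced mean value and adaptive importance sampling algorithms named above; as a conceptual stand-in, the sketch below estimates a probability of failure by plain Monte Carlo over an assumed stress-strength limit state, with illustrative distribution parameters.

      import random

      def prob_of_failure(n=100_000, seed=1):
          rng = random.Random(seed)
          failures = 0
          for _ in range(n):
              stress = rng.gauss(400.0, 40.0)     # MPa, applied load effect
              strength = rng.gauss(520.0, 30.0)   # MPa, material capacity
              if strength - stress < 0.0:         # limit state g < 0: failure
                  failures += 1
          return failures / n

      print(prob_of_failure())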

  4. Ten Most Searched Databases by a Business Generalist--Part 1 or A Day in the Life of....

    ERIC Educational Resources Information Center

    Meredith, Meri

    1986-01-01

    Describes databases frequently used in Business Information Center, Cummins Engine Company (Columbus, Indiana): Dun and Bradstreet Business Information Report System, Newsearch, Dun and Bradstreet Market Identifiers, Trade and Industry Index, PTS PROMT, Bureau of Labor Statistics files, ABI/INFORM, Magazine Index, NEXIS, Dow Jones News/Retrieval.…

  5. Identifying Gel-Separated Proteins Using In-Gel Digestion, Mass Spectrometry, and Database Searching: Consider the Chemistry

    ERIC Educational Resources Information Center

    Albright, Jessica C.; Dassenko, David J.; Mohamed, Essa A.; Beussman, Douglas J.

    2009-01-01

    Matrix-assisted laser desorption/ionization (MALDI) mass spectrometry is an important bioanalytical technique in drug discovery, proteomics, and research at the biology-chemistry interface. This is an especially powerful tool when combined with gel separation of proteins and database mining using the mass spectral data. Currently, few hands-on…

  6. Visualization Tools and Techniques for Search and Validation of Large Earth Science Spatial-Temporal Metadata Databases

    NASA Astrophysics Data System (ADS)

    Baskin, W. E.; Herbert, A.; Kusterer, J.

    2014-12-01

    Spatial-temporal metadata databases are critical components of interactive data discovery services for ordering Earth Science datasets. The development staff at the Atmospheric Science Data Center (ASDC) works closely with satellite Earth Science mission teams such as CERES, CALIPSO, TES, MOPITT, and CATS to create and maintain metadata databases that are tailored to the data discovery needs of the Earth Science community. This presentation focuses on the visualization tools and techniques used by the ASDC software development team for data discovery and validation/optimization of spatial-temporal objects in large multi-mission spatial-temporal metadata databases. The following topics will be addressed: optimizing the level of detail of spatial-temporal metadata to provide interactive spatial query performance over a multi-year Earth Science mission; generating appropriately scaled sensor-footprint gridded (raster) metadata from Level 1 and Level 2 satellite and aircraft time-series data granules; and a performance comparison of raster vs. vector spatial granule footprint mask queries in a large metadata database, with a description of the visualization tools used to assist with this analysis.

  7. Effect of cleavage enzyme, search algorithm and decoy database on mass spectrometric identification of wheat gluten proteins

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Tandem mass spectrometry (MS/MS) is routinely used to identify proteins by comparing peptide spectra to those generated in silico from protein sequence databases. Wheat storage proteins (gliadins and glutenins) are difficult to distinguish by MS/MS as they have few cleavable tryptic sites, often res...

  8. Direct identification of human cellular microRNAs by nanoflow liquid chromatography-high-resolution tandem mass spectrometry and database searching.

    PubMed

    Nakayama, Hiroshi; Yamauchi, Yoshio; Taoka, Masato; Isobe, Toshiaki

    2015-03-01

    MicroRNAs (miRNAs) are small noncoding RNAs that regulate gene networks and participate in many physiological and pathological pathways. To date, miRNAs have been characterized mostly by genetic technologies, which have the advantages of being very sensitive and using high-throughput instrumentation; however, these techniques cannot identify most post-transcriptional modifications of miRNAs that would affect their functions. Herein, we report an analytical system for the direct identification of miRNAs that incorporates nanoflow liquid chromatography-high-resolution tandem mass spectrometry and RNA-sequence database searching. By introducing a spray-assisting device that stabilizes negative nanoelectrospray ionization of RNAs and by searching an miRNA sequence database using the obtained tandem mass spectrometry data for the RNA mixture, we successfully identified femtomole quantities of human cellular miRNAs and their 3'-terminal variants. This is the first report of a fully automated, and thus objective, tandem mass spectrometry-based analytical system that can be used to identify miRNAs.

  9. Construction of an Indonesian herbal constituents database and its use in Random Forest modelling in a search for inhibitors of aldose reductase.

    PubMed

    Naeem, Sadaf; Hylands, Peter; Barlow, David

    2012-02-01

    Data on phytochemical constituents of plants commonly used in traditional Indonesian medicine have been compiled as a computer database. This database (the Indonesian Herbal constituents database, IHD) currently contains details on ∼1,000 compounds found in 33 different plants. For each entry, the IHD gives details of chemical structure, trivial and systematic name, CAS registry number, pharmacology (where known), toxicology (LD(50)), botanical species, the part(s) of the plant(s) where the compounds are found, typical dosage(s), and reference(s). A second database has also been compiled for plant-derived compounds with known activity against the enzyme aldose reductase (AR). This database (the aldose reductase inhibitors database, ARID) contains the same details as the IHD and currently comprises information on 120 different AR inhibitors. Virtual screening of all compounds in the IHD has been performed using Random Forest (RF) modelling, in a search for novel leads active against AR, to provide new forms of symptomatic relief for diabetic patients. For the RF modelling, a set of simple 2D chemical descriptors was employed to classify all compounds in the combined ARID and IHD databases as either active or inactive as AR inhibitors. The resulting RF models (which gave misclassification rates of 21%) were used to identify putative new AR inhibitors in the IHD, with such compounds being identified as those giving RF scores >0.5 in each of the three different RF models developed. In vitro assays were subsequently performed for four of the compounds obtained as hits in this in silico screening, to determine their inhibitory activity against human recombinant AR. The two compounds having the highest RF scores (prunetin and ononin) were shown to have the highest activities experimentally (giving ∼58% and ∼52% inhibition at a concentration of 15 μM, respectively), while the compounds with the lowest RF scores (vanillic acid and cinnamic acid) showed the…
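
    As an illustration of the screening step described above, here is a minimal sketch using scikit-learn's RandomForestClassifier: train on descriptor vectors labeled active/inactive, then flag screened compounds whose predicted probability of activity exceeds the 0.5 threshold the abstract mentions. The descriptor values and set sizes are random placeholders, not the IHD/ARID data:

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(5)

    # Placeholder training data: 2D-descriptor vectors with active/inactive labels.
    n_train, n_screen, n_descriptors = 240, 1000, 20
    X_train = rng.random((n_train, n_descriptors))
    y_train = rng.integers(0, 2, size=n_train)      # 1 = AR inhibitor
    X_ihd = rng.random((n_screen, n_descriptors))   # compounds to screen

    rf = RandomForestClassifier(n_estimators=500, random_state=0)
    rf.fit(X_train, y_train)

    scores = rf.predict_proba(X_ihd)[:, 1]  # fraction of trees voting "active"
    hits = np.flatnonzero(scores > 0.5)     # putative AR inhibitors
    print(f"{hits.size} putative hits; top score {scores.max():.2f}")
    ```

    In the study, hits flagged this way were then prioritized for in vitro assay against human recombinant AR.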

  10. Integration of an Evidence Base into a Probabilistic Risk Assessment Model. The Integrated Medical Model Database: An Organized Evidence Base for Assessing In-Flight Crew Health Risk and System Design

    NASA Technical Reports Server (NTRS)

    Saile, Lynn; Lopez, Vilma; Bickham, Grandin; FreiredeCarvalho, Mary; Kerstman, Eric; Byrne, Vicky; Butler, Douglas; Myers, Jerry; Walton, Marlei

    2011-01-01

    This slide presentation reviews the Integrated Medical Model (IMM) database, an organized evidence base for assessing in-flight crew health risk. The IMM database is a relational database accessible to many users. It quantifies the model inputs by ranking the underlying data with a level of evidence (LOE) and a quality of evidence (QOE) score that together assess the evidence base for each medical condition. The IMM evidence base has already provided invaluable information for designers and for other uses.

  11. What's My Substrate? Computational Function Assignment of Candida parapsilosis ADH5 by Genome Database Search, Virtual Screening, and QM/MM Calculations.

    PubMed

    Dhoke, Gaurao V; Ensari, Yunus; Davari, Mehdi D; Ruff, Anna Joëlle; Schwaneberg, Ulrich; Bocola, Marco

    2016-07-25

    Zinc-dependent medium chain reductase from Candida parapsilosis can be used in the reduction of carbonyl compounds to pharmacologically important chiral secondary alcohols. To date, the nomenclature of cpADH5 differs in the literature (CPCR2/RCR/SADH), and its natural substrate is not known. In this study, we utilized a substrate-docking-based virtual screening method, combined with searches of the KEGG, MetaCyc pathway, and Candida genome databases, for the discovery of natural substrates of cpADH5. The virtual screening of 7834 carbonyl compounds from the ZINC database provided 94 aldehydes or methyl/ethyl ketones as putative carbonyl substrates, of which 52 carbonyl substrates of cpADH5 with catalytically active docking poses were identified by employing a mechanism-based substrate docking protocol. Comparison of the virtual screening results with the KEGG and MetaCyc database searches and Candida genome pathway analysis suggests that cpADH5 might be involved in the Ehrlich pathway (reduction of fusel aldehydes in leucine, isoleucine, and valine degradation). Our QM/MM calculations and experimental activity measurements affirmed that butyraldehyde substrates are the potential natural substrates of cpADH5, suggesting a carbonyl reductase role for this enzyme in butyraldehyde reduction in aliphatic amino acid degradation pathways. Phylogenetic tree analysis of known ADHs from Candida albicans shows that cpADH5 is close to caADH5. We therefore propose, according to the experimental substrate identification and sequence similarity, the common name butyraldehyde dehydrogenase cpADH5 for Candida parapsilosis CPCR2/RCR/SADH. PMID:27387009

  12. Local image descriptor-based searching framework of usable similar cases in a radiation treatment planning database for stereotactic body radiotherapy

    NASA Astrophysics Data System (ADS)

    Nonaka, Ayumi; Arimura, Hidetaka; Nakamura, Katsumasa; Shioyama, Yoshiyuki; Soufi, Mazen; Magome, Taiki; Honda, Hiroshi; Hirata, Hideki

    2014-03-01

    Radiation treatment planning (RTP) for stereotactic body radiotherapy (SBRT) is more complex than for conventional radiotherapy because it uses a large number of beam directions. We have reported that similar planning cases can help treatment planners with less SBRT experience determine beam directions. The aim of this study was to develop a framework for searching a RTP database for usable cases similar to an unplanned case, based on a local image descriptor. The proposed framework consists of three steps: searching, selection, and rearrangement. In the first step, the RTP database was searched for the 10 cases most similar to the object case based on the shape similarity of the two-dimensional lung region at the isocenter plane. In the second step, the 5 most similar cases were selected using geometric features related to the location, size, and shape of the planning target volume, lung, and spinal cord. In the third step, the selected 5 cases were rearranged by the Euclidean distance of a local image descriptor, a similarity index based on the magnitudes and orientations of image gradients within a region of interest around the isocenter. It was assumed that the local image descriptor represents the information around lung tumors relevant to treatment planning. The cases selected as most similar to test cases by the proposed method resembled them more closely in terms of tumor location than those selected by a conventional method. For evaluation of the proposed method, we applied a similar-cases-based beam arrangement method developed in a previous study to the similar cases selected by the proposed method based on a linear registration. The proposed method has the potential to suggest superior beam arrangements from the treatment point of view.
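
    The rearrangement step, ranking candidates by the Euclidean distance between local image descriptors, can be sketched as follows. The descriptor here is a simple gradient-orientation histogram over a region of interest, a stand-in assumption for the paper's descriptor:

    ```python
    import numpy as np

    rng = np.random.default_rng(4)

    def local_descriptor(roi, n_bins=8):
        """Gradient-orientation histogram weighted by gradient magnitude over a
        region of interest around the isocenter (a stand-in descriptor)."""
        gy, gx = np.gradient(roi.astype(float))
        mag, ang = np.hypot(gx, gy), np.arctan2(gy, gx)
        hist, _ = np.histogram(ang, bins=n_bins, range=(-np.pi, np.pi), weights=mag)
        return hist / (hist.sum() + 1e-12)

    def rearrange(query_roi, candidate_rois):
        """Rank candidate plans by Euclidean distance between descriptors."""
        q = local_descriptor(query_roi)
        dists = [np.linalg.norm(q - local_descriptor(c)) for c in candidate_rois]
        return np.argsort(dists)  # indices of candidates, most similar first

    query = rng.random((32, 32))          # placeholder ROI around the isocenter
    candidates = [rng.random((32, 32)) for _ in range(5)]
    print(rearrange(query, candidates))
    ```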

  13. [Method of traditional Chinese medicine formula design based on 3D-database pharmacophore search and patent retrieval].

    PubMed

    He, Yu-su; Sun, Zhi-yi; Zhang, Yan-ling

    2014-11-01

    By using the pharmacophore model of mineralocorticoid receptor antagonists as a starting point, this experiment studies a method of traditional Chinese medicine formula design for antihypertensives. Pharmacophore models were generated by the 3D-QSAR pharmacophore (HypoGen) program of DS 3.5, based on a training set composed of 33 mineralocorticoid receptor antagonists. The best pharmacophore model consisted of two hydrogen-bond acceptors, three hydrophobic features, and four excluded volumes. Its correlation coefficients for the training set and test set, N, and CAI values were 0.9534, 0.6748, 2.878, and 1.119, respectively. From the database screening, 1,700 active compounds from 86 source plants were obtained. Because traditional theory lacks an applicable antihypertensive medication strategy, this article takes advantage of patent retrieval in the world traditional medicine patent database in order to design drug formulae. Finally, two formulae were obtained for antihypertensive use. PMID:25850277

  14. Scopus database: a review

    PubMed Central

    Burnham, Judy F

    2006-01-01

    The Scopus database provides access to STM journal articles and the references included in those articles, allowing the searcher to search both forward and backward in time. The database can be used for collection development as well as for research. This review provides information on the key points of the database and compares it to Web of Science. Neither database is all-inclusive; rather, the two complement each other. If a library can afford only one, the choice must be based on institutional needs. PMID:16522216

  15. Scopus database: a review.

    PubMed

    Burnham, Judy F

    2006-03-08

    The Scopus database provides access to STM journal articles and the references included in those articles, allowing the searcher to search both forward and backward in time. The database can be used for collection development as well as for research. This review provides information on the key points of the database and compares it to Web of Science. Neither database is all-inclusive; rather, the two complement each other. If a library can afford only one, the choice must be based on institutional needs.

  16. MapReduce Implementation of a Hybrid Spectral Library-Database Search Method for Large-Scale Peptide Identification

    SciTech Connect

    Kalyanaraman, Anantharaman; Cannon, William R.; Latt, Benjamin K.; Baxter, Douglas J.

    2011-11-01

    A MapReduce-based implementation called MR-MSPolygraph for parallelizing peptide identification from mass spectrometry data is presented. The underlying serial method, MSPolygraph, uses a novel hybrid approach to match an experimental spectrum against a combination of a protein sequence database and a spectral library. Our MapReduce implementation can run on any Hadoop cluster environment. Experimental results demonstrate that, relative to the serial version, MR-MSPolygraph reduces the time to solution from weeks to hours for processing tens of thousands of experimental spectra. Speedup and other related performance studies are also reported on a 400-core Hadoop cluster using spectral datasets from environmental microbial communities as inputs.
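
    The map/reduce decomposition behind such tools can be sketched in plain Python: map each (spectrum, database partition) pair to candidate peptide-spectrum matches, then reduce per spectrum to keep the best-scoring match. The scoring function below is a crude stand-in, not MSPolygraph's hybrid score:

    ```python
    from collections import defaultdict

    def score(spectrum, peptide):
        """Crude stand-in score: closeness of an average-residue-mass estimate
        to the precursor mass (not MSPolygraph's hybrid scoring)."""
        return -abs(spectrum["precursor_mass"] - len(peptide) * 110.0)

    def mapper(spectrum, db_partition):
        """Map task: emit (spectrum id, (peptide, score)) for one DB partition."""
        return [(spectrum["id"], (pep, score(spectrum, pep))) for pep in db_partition]

    def reducer(pairs):
        """Reduce task: keep the best-scoring peptide per spectrum."""
        best = defaultdict(lambda: (None, float("-inf")))
        for sid, (pep, s) in pairs:
            if s > best[sid][1]:
                best[sid] = (pep, s)
        return dict(best)

    spectra = [{"id": 1, "precursor_mass": 550.0}, {"id": 2, "precursor_mass": 880.0}]
    partitions = [["PEPTK", "AVGLK"], ["MNQRSTV", "GG"]]
    mapped = [kv for s in spectra for part in partitions for kv in mapper(s, part)]
    print(reducer(mapped))
    ```

    On a Hadoop cluster the map tasks run in parallel across partitions of the sequence database and spectral library, which is what collapses weeks of serial processing into hours.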

  17. Where the bugs are: analyzing distributions of bacterial phyla by descriptor keyword search in the nucleotide database

    PubMed Central

    2011-01-01

    Background The associations between bacteria and environment underlie their preferential interactions with given physical or chemical conditions. Microbial ecology aims at extracting conserved patterns of occurrence of bacterial taxa in relation to defined habitats and contexts. Results In the present report the NCBI nucleotide sequence database is used as a dataset to extract information on the distribution of each of the 24 phyla of the Bacteria superkingdom and of the Archaea. Over two and a half million records are filtered in their cross-association with each of 48 sets of keywords, defined to cover natural or artificial habitats, interactions with plant, animal or human hosts, and physical-chemical conditions. The results are processed showing: (a) how the different descriptors enrich or deplete the proportions at which the phyla occur in the total database; (b) in which order of abundance the different keywords score for each phylum (preferred habitats or conditions), and to what extent phyla are clustered to few descriptors (specific) or spread across many (cosmopolitan); (c) which keywords individuate the communities ranking highest for diversity and evenness. Conclusions A number of cues emerge from the results, helping to sharpen the picture of the functional systematic diversity of prokaryotes. Suggestions are given for a future automated service dedicated to refining and updating such kinds of analyses via public bioinformatic engines. PMID:22587876

  18. Probabilistic record linkage

    PubMed Central

    Sayers, Adrian; Ben-Shlomo, Yoav; Blom, Ashley W; Steele, Fiona

    2016-01-01

    Studies involving the use of probabilistic record linkage are becoming increasingly common. However, the methods underpinning probabilistic record linkage are not widely taught or understood, and therefore these studies can appear to be a ‘black box’ research tool. In this article, we aim to describe the process of probabilistic record linkage through a simple exemplar. We first introduce the concept of deterministic linkage and contrast this with probabilistic linkage. We illustrate each step of the process using a simple exemplar and describe the data structure required to perform a probabilistic linkage. We describe the process of calculating and interpreting match weights and how to convert match weights into posterior probabilities of a match using Bayes' theorem. We conclude this article with a brief discussion of some of the computational demands of record linkage, how you might assess the quality of your linkage algorithm, and how epidemiologists can maximize the value of their record-linked research using robust record linkage methods. PMID:26686842
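
    A minimal sketch of the Fellegi-Sunter calculation the abstract walks through, with hypothetical m- and u-probabilities and prior: per-field log-likelihood ratios are summed into a match weight, which Bayes' theorem converts into a posterior probability of a match:

    ```python
    from math import log2

    # Hypothetical m- and u-probabilities for three identifying fields.
    # m: P(field agrees | records truly match); u: P(field agrees | non-match).
    FIELDS = {
        "surname":       {"m": 0.95, "u": 0.01},
        "date_of_birth": {"m": 0.98, "u": 0.003},
        "postcode":      {"m": 0.90, "u": 0.05},
    }

    def match_weight(agreements):
        """Sum per-field log2 likelihood ratios (the Fellegi-Sunter match weight)."""
        w = 0.0
        for field, p in FIELDS.items():
            if agreements[field]:
                w += log2(p["m"] / p["u"])              # agreement weight
            else:
                w += log2((1 - p["m"]) / (1 - p["u"]))  # disagreement weight
        return w

    def posterior_match_probability(agreements, prior=1e-4):
        """Convert a match weight into a posterior probability via Bayes' theorem."""
        prior_odds = prior / (1 - prior)
        posterior_odds = prior_odds * 2 ** match_weight(agreements)
        return posterior_odds / (1 + posterior_odds)

    print(posterior_match_probability(
        {"surname": True, "date_of_birth": True, "postcode": False}))
    ```

    The prior here stands in for the proportion of true matches among all candidate record pairs, which in practice must be estimated from the data.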

  19. Probabilistic microcell prediction model

    NASA Astrophysics Data System (ADS)

    Kim, Song-Kyoo

    2002-06-01

    A microcell is a cell with a radius of 1 km or less, which is suitable for heavily urbanized areas such as a metropolitan city. This paper deals with a microcell prediction model of propagation loss which uses probabilistic techniques. The RSL (Received Signal Level) is the factor by which the performance of a microcell can be evaluated, and the LOS (Line-Of-Sight) component and the blockage loss directly affect the RSL. We combine probabilistic methods to obtain these performance factors. The mathematical methods include the CLT (Central Limit Theorem) and SPC (Statistical Process Control) to obtain the parameters of the distributions. This probabilistic solution gives better estimates of the performance factors. In addition, it supports the probabilistic optimization of strategies such as the number of cells, cell location, cell capacity, and cell range. In particular, the probabilistic optimization techniques can themselves be applied to real-world problems such as computer networking, human resources, and manufacturing processes.
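
    A toy Monte Carlo version of the idea: treat path loss and LOS blockage as random, simulate the RSL link budget, and estimate a coverage probability; by the CLT the mean RSL across trials is approximately normal. All distributions and thresholds below are assumptions, not the paper's calibrated model:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Assumed link-budget terms; the distributions are illustrative only.
    TX_POWER_DBM = 30.0
    N_TRIALS = 100_000

    path_loss = rng.normal(loc=110.0, scale=6.0, size=N_TRIALS)  # dB
    blockage = rng.exponential(scale=4.0, size=N_TRIALS)         # dB, LOS blockage
    rsl = TX_POWER_DBM - path_loss - blockage                    # received level, dBm

    threshold = -90.0  # assumed receiver sensitivity, dBm
    print(f"P(RSL > {threshold} dBm) ~ {(rsl > threshold).mean():.3f}")

    # By the CLT, the mean RSL across trials is approximately normal, so a
    # z-interval around the simulated mean brackets the expected RSL.
    mean, sem = rsl.mean(), rsl.std() / np.sqrt(N_TRIALS)
    print(f"mean RSL = {mean:.2f} dBm +/- {1.96 * sem:.3f} (95% CI)")
    ```

    The same simulation can be re-run across candidate cell counts, locations, and ranges to drive the probabilistic optimization the abstract mentions.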

  20. Detection and Identification of Heme c-Modified Peptides by Histidine Affinity Chromatography, High-Performance Liquid Chromatography-Mass Spectrometry, and Database Searching

    SciTech Connect

    Merkley, Eric D.; Anderson, Brian J.; Park, Jea H.; Belchik, Sara M.; Shi, Liang; Monroe, Matthew E.; Smith, Richard D.; Lipton, Mary S.

    2012-12-07

    Multiheme c-type cytochromes (proteins with covalently attached heme c moieties) play important roles in extracellular metal respiration in dissimilatory metal-reducing bacteria. Liquid chromatography-tandem mass spectrometry (LC-MS/MS) characterization of c-type cytochromes is hindered by the presence of multiple heme groups, since the heme c-modified peptides are typically not observed, or if observed, not identified. Using a recently reported histidine affinity chromatography (HAC) procedure, we enriched heme c tryptic peptides from purified bovine heart cytochrome c and a bacterial decaheme cytochrome and subjected these samples to LC-MS/MS analysis. Enriched bovine cytochrome c samples yielded three- to six-fold more confident peptide-spectrum matches to heme c-containing peptides than unenriched digests. In unenriched digests of the decaheme cytochrome MtoA from Sideroxydans lithotrophicus ES-1, heme c peptides for four of the ten expected sites were observed by LC-MS/MS; following HAC fractionation, peptides covering nine out of ten sites were obtained. Heme c peptide spiked into E. coli lysates at mass ratios as low as 10^-4 was detected with good signal-to-noise after HAC and LC-MS/MS analysis. In addition to HAC, we have developed a proteomics database search strategy that takes into account the unique physicochemical properties of heme c peptides. The results suggest that accounting for the double thioether link between heme c and peptide, and the use of the labile heme fragment as a reporter ion, can improve database searching results. The combination of affinity chromatography and heme-specific informatics yielded increases in the number of peptide-spectrum matches of 20-100-fold for bovine cytochrome c.

  1. Detection and identification of heme c-modified peptides by histidine affinity chromatography, high-performance liquid chromatography-mass spectrometry, and database searching.

    PubMed

    Merkley, Eric D; Anderson, Brian J; Park, Jea; Belchik, Sara M; Shi, Liang; Monroe, Matthew E; Smith, Richard D; Lipton, Mary S

    2012-12-01

    Multiheme c-type cytochromes (proteins with covalently attached heme c moieties) play important roles in extracellular metal respiration in dissimilatory metal-reducing bacteria. Liquid chromatography-tandem mass spectrometry (LC-MS/MS) characterization of c-type cytochromes is hindered by the presence of multiple heme groups, since the heme c modified peptides are typically not observed or, if observed, not identified. Using a recently reported histidine affinity chromatography (HAC) procedure, we enriched heme c tryptic peptides from purified bovine heart cytochrome c and two bacterial decaheme cytochromes and subjected these samples to LC-MS/MS analysis. Enriched bovine cytochrome c samples yielded 3- to 6-fold more confident peptide-spectrum matches to heme c containing peptides than unenriched digests. In unenriched digests of the decaheme cytochrome MtoA from Sideroxydans lithotrophicus ES-1, heme c peptides for 4 of the 10 expected sites were observed by LC-MS/MS; following HAC fractionation, peptides covering 9 out of 10 sites were obtained. Heme c peptide spiked into E. coli lysates at mass ratios as low as 1×10^-4 was detected with good signal-to-noise after HAC and LC-MS/MS analysis. In addition to HAC, we have developed a proteomics database search strategy that takes into account the unique physicochemical properties of heme c peptides. The results suggest that accounting for the double thioether link between heme c and peptide, and the use of the labile heme fragment as a reporter ion, can improve database searching results. The combination of affinity chromatography and heme-specific informatics yielded increases in the number of peptide-spectrum matches of 20-100-fold for bovine cytochrome c. PMID:23082897

  2. Quality Control of Biomedicinal Allergen Products – Highly Complex Isoallergen Composition Challenges Standard MS Database Search and Requires Manual Data Analyses

    PubMed Central

    Spiric, Jelena; Engin, Anna M.; Karas, Michael; Reuter, Andreas

    2015-01-01

    Allergy against birch pollen is among the most common causes of spring pollinosis in Europe and is diagnosed and treated using extracts from natural sources. Quality control is crucial for safe and effective diagnosis and treatment. However, current methods are very difficult to standardize and do not address individual allergen or isoallergen composition. MS provides information regarding selected proteins or the entire proteome and could overcome the aforementioned limitations. We studied the proteome of birch pollen, focusing on allergens and isoallergens, to clarify which of the 93 published sequence variants of the major allergen, Bet v 1, are expressed as proteins within one source material in parallel. The unexpectedly complex Bet v 1 isoallergen composition required manual data interpretation and a specific design of databases, as current database search engines fail to unambiguously assign spectra to highly homologous, partially identical proteins. We identified 47 non-allergenic proteins and all 5 known birch pollen allergens, and unambiguously proved the existence of 18 Bet v 1 isoallergens and variants by manual data analysis. This highly complex isoallergen composition raises questions whether isoallergens can be ignored or must be included for the quality control of allergen products, and which data analysis strategies are to be applied. PMID:26561299

  3. Probabilistic Composite Design

    NASA Technical Reports Server (NTRS)

    Chamis, Christos C.

    1997-01-01

    Probabilistic composite design is described in terms of a computational simulation. This simulation tracks probabilistically the composite design evolution from constituent materials and fabrication process, through composite mechanics, to structural components. Comparisons with experimental data are provided to illustrate selection of probabilistic design allowables, test methods/specimen guidelines, and identification of in situ versus pristine strength. For example, results show that: in situ fiber tensile strength is 90% of its pristine strength; flat-wise long-tapered specimens are most suitable for setting ply tensile strength allowables; a composite radome can be designed with a reliability of 0.999999; and laminate fatigue exhibits wide-spread scatter at 90% cyclic-stress to static-strength ratios.

  4. Formalizing Probabilistic Safety Claims

    NASA Technical Reports Server (NTRS)

    Herencia-Zapana, Heber; Hagen, George E.; Narkawicz, Anthony J.

    2011-01-01

    A safety claim for a system is a statement that the system, which is subject to hazardous conditions, satisfies a given set of properties. Following work by John Rushby and Bev Littlewood, this paper presents a mathematical framework that can be used to state and formally prove probabilistic safety claims. It also enables hazardous conditions, their uncertainties, and their interactions to be integrated into the safety claim. This framework provides a formal description of the probabilistic composition of an arbitrary number of hazardous conditions and their effects on system behavior. An example is given of a probabilistic safety claim for a conflict detection algorithm for aircraft in a 2D airspace. The motivation for developing this mathematical framework is that it can be used in an automated theorem prover to formally verify safety claims.

  5. Probabilistic composite analysis

    NASA Technical Reports Server (NTRS)

    Chamis, C. C.; Murthy, P. L. N.

    1991-01-01

    Formal procedures are described which are used to computationally simulate the probabilistic behavior of composite structures. The computational simulation starts with the uncertainties associated with all aspects of a composite structure (constituents, fabrication, assembling, etc.) and encompasses all aspects of composite behavior (micromechanics, macromechanics, combined stress failure, laminate theory, structural response, and tailoring/optimization). Typical cases are included to illustrate the formal procedure for computational simulation. The collective results of the sample cases demonstrate that uncertainties in composite behavior and structural response can be probabilistically quantified.

  6. Chemical and biological warfare: Protection, decontamination, and disposal. (Latest citations from the NTIS bibliographic database). Published Search

    SciTech Connect

    1997-11-01

    The bibliography contains citations concerning the means to defend against chemical and biological agents used in military operations, and to eliminate the effects of such agents on personnel, equipment, and grounds. Protection is accomplished through protective clothing and masks, and in buildings and shelters through filtration. Elimination of effects includes decontamination and removal of the agents from clothing, equipment, buildings, grounds, and water, using chemical deactivation, incineration, and controlled disposal of material in injection wells and ocean dumping. Other Published Searches in this series cover chemical warfare detection; defoliants; general studies; biochemistry and therapy; and biology, chemistry, and toxicology associated with chemical warfare agents.(Contains 50-250 citations and includes a subject term index and title list.) (Copyright NERAC, Inc. 1995)

  7. Probabilistic liquefaction triggering based on the cone penetration test

    USGS Publications Warehouse

    Moss, R.E.S.; Seed, R.B.; Kayen, R.E.; Stewart, J.P.; Tokimatsu, K.

    2005-01-01

    Performance-based earthquake engineering requires a probabilistic treatment of potential failure modes in order to accurately quantify the overall stability of the system. This paper is a summary of the application portions of the probabilistic liquefaction triggering correlations recently proposed by Moss and co-workers. To enable probabilistic treatment of liquefaction triggering, the variables comprising the seismic load and the liquefaction resistance were treated as inherently uncertain. Supporting data from an extensive Cone Penetration Test (CPT)-based liquefaction case history database were used to develop a probabilistic correlation. The methods used to measure the uncertainty of the load and resistance variables, how the interactions of these variables were treated using Bayesian updating, and how reliability analysis was applied to produce curves of equal probability of liquefaction are presented. The normalization for effective overburden stress, the magnitude correlated duration weighting factor, and the non-linear shear mass participation factor used are also discussed.

  8. Probabilistic Threshold Criterion

    SciTech Connect

    Gresshoff, M; Hrousis, C A

    2010-03-09

    The Probabilistic Shock Threshold Criterion (PSTC) Project at LLNL develops phenomenological criteria for estimating safety or performance margin on high explosive (HE) initiation in the shock initiation regime, creating tools for safety assessment and design of initiation systems and HE trains in general. Until recently, there has been little foundation for probabilistic assessment of HE initiation scenarios. This work attempts to use probabilistic information that is available from both historic and ongoing tests to develop a basis for such assessment. Current PSTC approaches start with the functional form of the James Initiation Criterion as a backbone, and generalize to include varying areas of initiation and provide a probabilistic response based on test data for 1.8 g/cc (Ultrafine) 1,3,5-triamino-2,4,6-trinitrobenzene (TATB) and LX-17 (92.5% TATB, 7.5% Kel-F 800 binder). Application of the PSTC methodology is presented investigating the safety and performance of a flying plate detonator and the margin of an Ultrafine TATB booster initiating LX-17.

  9. Database in Artificial Intelligence.

    ERIC Educational Resources Information Center

    Wilkinson, Julia

    1986-01-01

    Describes a specialist bibliographic database of literature in the field of artificial intelligence created by the Turing Institute (Glasgow, Scotland) using the BRS/Search information retrieval software. The subscription method for end-users--i.e., annual fee entitles user to unlimited access to database, document provision, and printed awareness…

  10. Atomic Spectra Database (ASD)

    National Institute of Standards and Technology Data Gateway

    SRD 78 NIST Atomic Spectra Database (ASD) (Web, free access)   This database provides access and search capability for NIST critically evaluated data on atomic energy levels, wavelengths, and transition probabilities that are reasonably up-to-date. The NIST Atomic Spectroscopy Data Center has carried out these critical compilations.

  11. Exact and Approximate Probabilistic Symbolic Execution

    NASA Technical Reports Server (NTRS)

    Luckow, Kasper; Pasareanu, Corina S.; Dwyer, Matthew B.; Filieri, Antonio; Visser, Willem

    2014-01-01

    Probabilistic software analysis seeks to quantify the likelihood of reaching a target event under uncertain environments. Recent approaches compute probabilities of execution paths using symbolic execution, but do not support nondeterminism. Nondeterminism arises naturally when no suitable probabilistic model can capture a program behavior, e.g., for multithreading or distributed systems. In this work, we propose a technique, based on symbolic execution, to synthesize schedulers that resolve nondeterminism to maximize the probability of reaching a target event. To scale to large systems, we also introduce approximate algorithms to search for good schedulers, speeding up established random sampling and reinforcement learning results through the quantification of path probabilities based on symbolic execution. We implemented the techniques in Symbolic PathFinder and evaluated them on nondeterministic Java programs. We show that our algorithms significantly improve upon a state-of-the-art statistical model checking algorithm, originally developed for Markov Decision Processes.

  12. Method for the Compound Annotation of Conjugates in Nontargeted Metabolomics Using Accurate Mass Spectrometry, Multistage Product Ion Spectra and Compound Database Searching.

    PubMed

    Ogura, Tairo; Bamba, Takeshi; Tai, Akihiro; Fukusaki, Eiichiro

    2015-01-01

    Owing to biotransformation, xenobiotics are often found in conjugated form in biological samples such as urine and plasma. Liquid chromatography coupled with accurate mass spectrometry with multistage collision-induced dissociation provides spectral information concerning these metabolites in complex materials. Unfortunately, compound databases typically do not contain a sufficient number of records for such conjugates. We report here on the development of a novel protocol, referred to as ChemProphet, to annotate compounds, including conjugates, using compound databases such as PubChem and ChemSpider. The annotation of conjugates involves three steps: (1) recognition of the type and number of conjugate moieties in the sample; (2) compound search and annotation of the deconjugated form; and (3) in silico evaluation of the candidate conjugate. ChemProphet assigns a spectrum to each candidate by automatically exploring the substructures corresponding to the observed product ion spectrum. When finished, it annotates the candidates, assigning each a rank based on a calculated score that reflects its relative likelihood. We assessed our protocol by annotating a benchmark dataset comprising product ion spectra for 102 compounds, by annotating the commercially available standard for quercetin 3-glucuronide, and by conducting a model experiment using urine from mice that had been administered a green tea extract. The results show that the ChemProphet approach makes it possible to annotate not only the deconjugated molecules but also the conjugated molecules using an automatic interpretation method based on deconjugation that involves multistage collision-induced dissociation and in silico calculated conjugation.
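
    Steps (1) and (2) of the protocol, recognizing a conjugate and searching on the deconjugated mass, reduce to simple arithmetic on neutral losses. The sketch below assumes [M+H]+ ions and a small illustrative table of common conjugate losses; the candidate masses it prints would then be submitted to a compound database such as PubChem or ChemSpider:

    ```python
    # Hypothetical helper; the conjugate table and ion assumption are illustrative.
    PROTON = 1.007276
    CONJUGATE_LOSSES = {
        "glucuronide": 176.03209,  # C6H8O6 neutral loss
        "sulfate": 79.95682,       # SO3 neutral loss
        "glucoside": 162.05282,    # C6H10O5 neutral loss
    }

    def deconjugated_masses(precursor_mz, charge=1):
        """Infer candidate deconjugated (aglycone) neutral masses by stripping
        common conjugate moieties from an assumed [M+H]+ precursor."""
        neutral = precursor_mz * charge - charge * PROTON
        return {name: neutral - loss for name, loss in CONJUGATE_LOSSES.items()}

    # Quercetin 3-glucuronide, [M+H]+ ~ 479.0820: the glucuronide entry recovers
    # the quercetin aglycone mass (~302.0427), which is then database-searched.
    for name, mass in deconjugated_masses(479.0820).items():
        print(f"{name:12s} candidate neutral mass: {mass:.4f}")
    ```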

  13. JICST Factual Database

    NASA Astrophysics Data System (ADS)

    Suzuki, Kazuaki; Shimura, Kazuki; Monma, Yoshio; Sakamoto, Masao; Morishita, Hiroshi; Kanazawa, Kenji

    The Japan Information Center of Science and Technology (JICST) started the online service of the JICST/NRIM Materials Strength Database for Engineering Steels and Alloys (JICST ME) in March 1990. This database was developed under joint research between JICST and the National Research Institute for Metals (NRIM). It provides material strength data (creep, fatigue, etc.) for engineering steels and alloys. The database can be searched and displayed online, and the retrieved data can be analyzed statistically and the results plotted on a graphic display. The database system and the data in JICST ME are described.

  14. Probabilistic authenticated quantum dialogue

    NASA Astrophysics Data System (ADS)

    Hwang, Tzonelih; Luo, Yi-Ping

    2015-12-01

    This work proposes a probabilistic authenticated quantum dialogue (PAQD) based on Bell states with the following notable features. (1) In our proposed scheme, the dialogue is encoded in a probabilistic way, i.e., the same messages can be encoded into different quantum states, whereas in the state-of-the-art authenticated quantum dialogue (AQD), the dialogue is encoded in a deterministic way; (2) the pre-shared secret key between two communicants can be reused without any security loophole; (3) each dialogue in the proposed PAQD can be exchanged within only one-step quantum communication and one-step classical communication. However, in the state-of-the-art AQD protocols, both communicants have to run a QKD protocol for each dialogue and each dialogue requires multiple quantum as well as classical communicational steps; (4) nevertheless, the proposed scheme can resist the man-in-the-middle attack, the modification attack, and even other well-known attacks.

  15. Geothermal probabilistic cost study

    NASA Astrophysics Data System (ADS)

    Orren, L. H.; Ziman, G. M.; Jones, S. C.; Lee, T. K.; Noll, R.; Wilde, L.; Sadanand, V.

    1981-08-01

    A tool is presented to quantify the risks of geothermal projects, the Geothermal Probabilistic Cost Model (GPCM). The GPCM model was used to evaluate a geothermal reservoir for a binary-cycle electric plant at Heber, California. Three institutional aspects of geothermal risk which can shift the risk among different agents were analyzed. The leasing of geothermal land, contracting between the producer and the user of the geothermal heat, and insurance against faulty performance were examined.

  16. Geothermal probabilistic cost study

    NASA Technical Reports Server (NTRS)

    Orren, L. H.; Ziman, G. M.; Jones, S. C.; Lee, T. K.; Noll, R.; Wilde, L.; Sadanand, V.

    1981-01-01

    A tool is presented to quantify the risks of geothermal projects, the Geothermal Probabilistic Cost Model (GPCM). The GPCM model was used to evaluate a geothermal reservoir for a binary-cycle electric plant at Heber, California. Three institutional aspects of geothermal risk which can shift the risk among different agents were analyzed. The leasing of geothermal land, contracting between the producer and the user of the geothermal heat, and insurance against faulty performance were examined.

  17. Geothermal probabilistic cost study

    SciTech Connect

    Orren, L.H.; Ziman, G.M.; Jones, S.C.; Lee, T.K.; Noll, R.; Wilde, L.; Sadanand, V.

    1981-08-01

    A tool is presented to quantify the risks of geothermal projects, the Geothermal Probabilistic Cost Model (GPCM). The GPCM model is used to evaluate a geothermal reservoir for a binary-cycle electric plant at Heber, California. Three institutional aspects of the geothermal risk which can shift the risk among different agents are analyzed. The leasing of geothermal land, contracting between the producer and the user of the geothermal heat, and insurance against faulty performance are examined. (MHR)

  18. Probabilistic river forecast methodology

    NASA Astrophysics Data System (ADS)

    Kelly, Karen Suzanne

    1997-09-01

    The National Weather Service (NWS) operates deterministic conceptual models to predict the hydrologic response of a river basin to precipitation. The outputs from these models are forecasted hydrographs (time series of the future river stage) at certain locations along a river. In order for the forecasts to be useful for optimal decision making, the uncertainty associated with them must be quantified. A methodology is developed for this purpose that (i) can be implemented with any deterministic hydrologic model, (ii) receives a probabilistic forecast of precipitation as input, (iii) quantifies all sources of uncertainty, (iv) operates in real-time and within computing constraints, and (v) produces probability distributions of future river stages. The Bayesian theory which supports the methodology involves transformation of a distribution of future precipitation into one of future river stage, and statistical characterization of the uncertainty in the hydrologic model. This is accomplished by decomposing total uncertainty into that associated with future precipitation and that associated with the hydrologic transformations. These are processed independently and then integrated into a predictive distribution which constitutes a probabilistic river stage forecast. A variety of models are presented for implementation of the methodology. In the most general model, a probability of exceedance associated with a given future hydrograph is specified. In the simplest model, a probability of exceedance associated with a given future river stage is specified. In conjunction with the Ohio River Forecast Center of the NWS, the simplest model is used to demonstrate the feasibility of producing probabilistic river stage forecasts for a river basin located in headwaters. Previous efforts to quantify uncertainty in river forecasting have only considered selected sources of uncertainty, been specific to a particular hydrologic model, or have not obtained an entire probability
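
    The decomposition into precipitation uncertainty and hydrologic uncertainty can be illustrated with a toy Monte Carlo model: sample future precipitation from a probabilistic forecast, push each sample through a deterministic rainfall-to-stage model, add model error, and read probabilities off the resulting predictive distribution. The model and distributions below are invented stand-ins, not NWS components:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def hydrologic_model(precip_mm):
        """Stand-in deterministic rainfall-to-stage transformation (assumed)."""
        return 1.5 + 0.08 * precip_mm  # stage in meters

    N = 50_000
    # (i) Uncertainty in future precipitation: sample from the probabilistic
    # quantitative precipitation forecast (here an assumed gamma distribution).
    precip = rng.gamma(shape=2.0, scale=12.0, size=N)
    # (ii) Hydrologic uncertainty: additive model error around the deterministic
    # transformation (assumed normal).
    stage = hydrologic_model(precip) + rng.normal(0.0, 0.3, size=N)

    flood_stage = 4.0  # meters (assumed)
    print(f"P(stage > {flood_stage} m) ~ {(stage > flood_stage).mean():.3f}")
    print("90% predictive interval:", np.percentile(stage, [5, 95]).round(2))
    ```

    The empirical distribution of `stage` plays the role of the predictive distribution that constitutes the probabilistic river stage forecast.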

  19. Probabilistic simple splicing systems

    NASA Astrophysics Data System (ADS)

    Selvarajoo, Mathuri; Heng, Fong Wan; Sarmin, Nor Haniza; Turaev, Sherzod

    2014-06-01

    A splicing system, one of the early theoretical models for DNA computing, was introduced by Head in 1987. Splicing systems are based on the splicing operation which, informally, cuts two strings of DNA molecules at the specific recognition sites and attaches the prefix of the first string to the suffix of the second string, and the prefix of the second string to the suffix of the first string, thus yielding the new strings. For a specific type of splicing systems, namely the simple splicing systems, the recognition sites are the same for both strings of DNA molecules. It is known that splicing systems with finite sets of axioms and splicing rules only generate regular languages. Hence, different types of restrictions have been considered for splicing systems in order to increase their computational power. Recently, probabilistic splicing systems have been introduced, where the probabilities are initially associated with the axioms, and the probabilities of the generated strings are computed from the probabilities of the initial strings. In this paper, some properties of probabilistic simple splicing systems are investigated. We prove that probabilistic simple splicing systems can also increase the computational power of the splicing languages generated.
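
    A toy implementation of one probabilistic simple splicing step: both strings are cut at the same recognition symbol and suffixes are swapped, and each recombinant inherits the product of its parents' probabilities (normalized here; this convention and the axiom set are illustrative assumptions, not the paper's exact definitions):

    ```python
    from itertools import product

    AXIOMS = {"acgta": 0.6, "tgcat": 0.4}  # axiom strings with probabilities
    MARKER = "g"  # recognition site; simple splicing uses the same site for both

    def splice_once(axioms, marker):
        out = {}
        for (x, px), (y, py) in product(axioms.items(), repeat=2):
            i, j = x.find(marker), y.find(marker)
            if i == -1 or j == -1:
                continue
            # cut after the marker and attach the prefix of x to the suffix of y
            w = x[: i + 1] + y[j + 1 :]
            out[w] = out.get(w, 0.0) + px * py
        total = sum(out.values())
        return {w: p / total for w, p in out.items()}

    print(splice_once(AXIOMS, MARKER))
    ```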

  20. Probabilistic Models for Solar Particle Events

    NASA Technical Reports Server (NTRS)

    Adams, James H., Jr.; Dietrich, W. F.; Xapsos, M. A.; Welton, A. M.

    2009-01-01

    Probabilistic Models of Solar Particle Events (SPEs) are used in space mission design studies to provide a description of the worst-case radiation environment that the mission must be designed to tolerate. The models determine the worst-case environment using a description of the mission and a user-specified confidence level that the provided environment will not be exceeded. This poster will focus on completing the existing suite of models by developing models for peak flux and event-integrated fluence elemental spectra for the Z>2 elements. It will also discuss methods to take into account uncertainties in the database and the uncertainties resulting from the limited number of solar particle events in the database. These new probabilistic models are based on an extensive survey of SPE measurements of peak and event-integrated elemental differential energy spectra. Attempts are made to fit the measured spectra with eight different published models. The model giving the best fit to each spectrum is chosen and used to represent that spectrum for any energy in the energy range covered by the measurements. The set of all such spectral representations for each element is then used to determine the worst case spectrum as a function of confidence level. The spectral representation that best fits these worst case spectra is found and its dependence on confidence level is parameterized. This procedure creates probabilistic models for the peak and event-integrated spectra.

  1. Probabilistic Tsunami Hazard Analysis

    NASA Astrophysics Data System (ADS)

    Thio, H. K.; Ichinose, G. A.; Somerville, P. G.; Polet, J.

    2006-12-01

    The recent tsunami disaster caused by the 2004 Sumatra-Andaman earthquake has focused our attention on the hazard posed by large earthquakes that occur under water, in particular subduction zone earthquakes, and the tsunamis that they generate. Even though these kinds of events are rare, the very large loss of life and material destruction caused by this earthquake warrant a significant effort toward the mitigation of the tsunami hazard. For ground motion hazard, Probabilistic Seismic Hazard Analysis (PSHA) has become a standard practice in the evaluation and mitigation of seismic hazard to populations, in particular with respect to structures, infrastructure, and lifelines. Its ability to condense the complexities and variability of seismic activity into a manageable set of parameters greatly facilitates the design of effective seismic-resistant buildings and also the planning of infrastructure projects. Probabilistic Tsunami Hazard Analysis (PTHA) achieves the same goal for hazards posed by tsunami. There are great advantages to implementing such a method to evaluate the total risk (seismic and tsunami) to coastal communities. The method that we have developed is based on the traditional PSHA and is therefore completely consistent with standard seismic practice. Because of the strong dependence of tsunami wave heights on bathymetry, we use full tsunami waveform computations in lieu of the attenuation relations that are common in PSHA. By pre-computing and storing the tsunami waveforms at points along the coast generated for sets of subfaults that comprise larger earthquake faults, we can efficiently synthesize tsunami waveforms for any slip distribution on those faults by summing the individual subfault tsunami waveforms (weighted by their slip). This efficiency makes it feasible to use Green's function summation in lieu of attenuation relations to provide very accurate estimates of tsunami height for probabilistic calculations, where one typically computes
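
    The Green's function summation itself is a one-line linear operation: with one precomputed unit-slip waveform per subfault, the waveform for any slip distribution is the slip-weighted sum. The sketch below uses synthetic waveforms in place of real precomputed tsunami Green's functions:

    ```python
    import numpy as np

    rng = np.random.default_rng(2)

    # One precomputed unit-slip tsunami waveform per subfault at a single coastal
    # point. Real systems store these "Green's functions"; here they are synthetic.
    n_subfaults, n_samples = 12, 720
    greens = rng.normal(0.0, 0.02, size=(n_subfaults, n_samples)).cumsum(axis=1)

    def synthesize_waveform(slip_m):
        """Green's function summation: the waveform for an arbitrary slip
        distribution is the slip-weighted sum of the subfault waveforms."""
        return np.asarray(slip_m) @ greens

    uniform = synthesize_waveform(np.full(n_subfaults, 2.0))
    tapered = synthesize_waveform(np.linspace(4.0, 0.5, n_subfaults))
    print(f"peak amplitude, uniform slip: {uniform.max():.3f} m")
    print(f"peak amplitude, tapered slip: {tapered.max():.3f} m")
    ```

    Because the summation is cheap, thousands of candidate slip distributions can be evaluated inside a probabilistic hazard calculation.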

  2. Mathematical Notation in Bibliographic Databases.

    ERIC Educational Resources Information Center

    Pasterczyk, Catherine E.

    1990-01-01

    Discusses ways in which using mathematical symbols to search online bibliographic databases in scientific and technical areas can improve search results. The representations used for Greek letters, relations, binary operators, arrows, and miscellaneous special symbols in the MathSci, Inspec, Compendex, and Chemical Abstracts databases are…

  3. Computer Databases: A Survey. Part 3: Product Databases.

    ERIC Educational Resources Information Center

    O'Leary, Mick

    1987-01-01

    Describes five online databases that focus on computer products, primarily software and microcomputing hardware, and compares the databases in terms of record content, product coverage, vertical market coverage, currency, availability, and price. Sample records and searches are provided, as well as a directory of product databases. (CLB)

  4. On Relations between Current Global Volcano Databases

    NASA Astrophysics Data System (ADS)

    Newhall, C. G.; Siebert, L.; Sparks, S.

    2009-12-01

    The Smithsonian’s Volcano Reference File (VRF), the database that underlies Volcanoes of the World and This Dynamic Planet, is the premier source for the “what, when, where, and how big?” of Holocene and historical eruptions. VOGRIPA (Volcanic Global Risk Identification and Analysis) will catalogue details of large eruptions, including specific phenomena and their impacts. CCDB (Collapse Caldera Database) also considers large eruptions with an emphasis on the resulting calderas. WOVOdat is bringing monitoring data from the world’s observatories into a centralized database in common formats, so that they can be searched and compared during volcanic crises and for research on pre-eruption processes. Oceanographic and space institutions worldwide have growing archives of volcano imagery and derivative products. Petrologic databases such as PETRODB and GEOROC offer compositions of many erupted and non-erupted magmas. Each of these informs and complements the others. Examples of interrelations include:
    ● Information in the VRF about individual volcanoes is the starting point and major source of background “volcano” data in WOVOdat, VOGRIPA, and petrologic databases.
    ● Images and digital topography from remote-sensing archives offer high-resolution, consistent geospatial "base maps" for all of the other databases.
    ● VRF data about eruptions show whether unrest in WOVOdat culminated in an eruption and, if yes, its type and magnitude.
    ● Data from WOVOdat fill in the “blanks” between eruptions in the VRF.
    ● VOGRIPA adds more detail to the VRF’s descriptions of eruptions, including quantification of runout distances, expanded estimates of column heights and eruption impact data, and other parameters not included in the Smithsonian VRF.
    ● Petrologic databases can add detail to the existing petrologic data of the VRF, WOVOdat, and VOGRIPA, e.g., the detail needed to estimate melt viscosity and its influence on magma and eruption dynamics.
    ● Hazard

  5. Probabilistic Failure Assessment For Fatigue

    NASA Technical Reports Server (NTRS)

    Moore, Nicholas; Ebbeler, Donald; Newlin, Laura; Sutharshana, Sravan; Creager, Matthew

    1995-01-01

    Probabilistic Failure Assessment for Fatigue (PFAFAT) package of software utilizing probabilistic failure-assessment (PFA) methodology to model high- and low-cycle-fatigue modes of failure of structural components. Consists of nine programs. Three programs perform probabilistic fatigue analysis by means of Monte Carlo simulation. Other six used for generating random processes, characterizing fatigue-life data pertaining to materials, and processing outputs of computational simulations. Written in FORTRAN 77.

  6. Dynamical systems and probabilistic methods in partial differential equations

    SciTech Connect

    Deift, P.; Levermore, C.D.; Wayne, C.E.

    1996-12-31

    This publication covers material presented at the American Mathematical Society summer seminar in June, 1994. This seminar sought to provide participants exposure to a wide range of interesting and ongoing work on dynamic systems and the application of probabilistic methods in applied mathematics. Topics discussed include: the application of dynamical systems theory to the solution of partial differential equations; specific work with the complex Ginzburg-Landau, nonlinear Schroedinger, and Korteweg-deVries equations; applications in the area of fluid mechanics; turbulence studies from the perspective of probabilistic methods. Separate abstracts have been indexed into the database from articles in this proceedings.

  7. Probabilistic analysis of fires in nuclear plants

    SciTech Connect

    Unione, A.; Teichmann, T.

    1985-01-01

    The aim of this paper is to describe a multilevel (i.e., staged) probabilistic analysis of fire risks in nuclear plants (as part of a general PRA) which maximizes the benefits of the FRA (fire risk assessment) in a cost-effective way. The approach uses several stages of screening, physical modeling of clearly dominant risk contributors, searches for direct (e.g., equipment dependences) and secondary (e.g., fire-induced internal flooding) interactions, and relies on lessons learned and available data from surrogate FRAs. The general methodology is outlined. 6 figs., 10 tabs.

  8. Probabilistic Seismic Hazard Analysis

    SciTech Connect

    Not Available

    1988-01-01

    The purpose of Probabilistic Seismic Hazard Analysis (PSHA) is to evaluate the hazard of seismic ground motion at a site by considering all possible earthquakes in the area, estimating the associated shaking at the site, and calculating the probabilities of these occurrences. The Panel on Seismic Hazard Analysis is charged with assessment of the capabilities, limitations, and future trends of PSHA in the context of alternatives. The report identifies and discusses key issues of PSHA and is addressed to decision makers with a modest scientific and technical background and to the scientific and technical community. 37 refs., 19 figs.
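
    The PSHA recipe in the abstract (enumerate sources, estimate shaking, and sum probabilities) reduces to a small calculation. The sketch below combines assumed annual source rates with an assumed lognormal ground-motion model (illustrative, not a published GMPE) to produce annual exceedance rates at a site:

    ```python
    from math import erf, exp, log, sqrt

    # Assumed source model: (annual rate, magnitude, distance in km) per source.
    SOURCES = [
        (0.05, 6.5, 20.0),
        (0.20, 5.5, 10.0),
        (0.01, 7.2, 40.0),
    ]

    def p_exceed(pga_g, mag, dist_km):
        """Assumed lognormal ground-motion model: P(PGA > pga_g | event)."""
        ln_median = -3.5 + 0.9 * mag - 1.1 * log(dist_km + 10.0)
        z = (log(pga_g) - ln_median) / 0.6
        return 0.5 * (1.0 - erf(z / sqrt(2.0)))

    def annual_exceedance_rate(pga_g):
        """Sum over sources: event rate times P(shaking exceeds the level)."""
        return sum(rate * p_exceed(pga_g, m, r) for rate, m, r in SOURCES)

    for pga in (0.1, 0.2, 0.4):
        lam = annual_exceedance_rate(pga)
        print(f"PGA > {pga:.1f} g: {lam:.4f}/yr, "
              f"P(exceed in 50 yr) = {1 - exp(-50 * lam):.3f}")
    ```

    Sweeping the ground-motion level traces out the hazard curve that PSHA reports to decision makers.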

  9. Web Search Engines: Search Syntax and Features.

    ERIC Educational Resources Information Center

    Ojala, Marydee

    2002-01-01

    Presents a chart that explains the search syntax, features, and commands used by the 12 most widely used general Web search engines. Discusses Web standardization, expanded types of content searched, size of databases, and search engines that include both simple and advanced versions. (LRW)

  10. Conducting a Web Search.

    ERIC Educational Resources Information Center

    Miller-Whitehead, Marie

    Keyword and text string searches of online library catalogs often provide different results according to library and database used and depending upon how books and journals are indexed. For this reason, online databases such as ERIC often provide tutorials and recommendations for searching their site, such as how to use Boolean search strategies.…

  11. Probabilistic Cellular Automata

    PubMed Central

    Agapie, Alexandru; Giuclea, Marius

    2014-01-01

    Cellular automata are binary lattices used for modeling complex dynamical systems. The automaton evolves iteratively from one configuration to another, using some local transition rule based on the number of ones in the neighborhood of each cell. With respect to the number of cells allowed to change per iteration, we speak of either synchronous or asynchronous automata. If randomness is involved to some degree in the transition rule, we speak of probabilistic automata, otherwise they are called deterministic. With either type of cellular automaton we are dealing with, the main theoretical challenge stays the same: starting from an arbitrary initial configuration, predict (with highest accuracy) the end configuration. If the automaton is deterministic, the outcome simplifies to one of two configurations, all zeros or all ones. If the automaton is probabilistic, the whole process is modeled by a finite homogeneous Markov chain, and the outcome is the corresponding stationary distribution. Based on our previous results for the asynchronous case—connecting the probability of a configuration in the stationary distribution to its number of zero-one borders—the article offers both numerical and theoretical insight into the long-term behavior of synchronous cellular automata. PMID:24999557
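
    A small simulation makes the absorbing behavior concrete. The rule below is an illustrative choice, not the paper's: each cell becomes 1 with probability equal to the fraction of ones in its three-cell neighborhood, so all-zeros and all-ones are the two absorbing configurations of the resulting finite Markov chain:

    ```python
    import numpy as np

    rng = np.random.default_rng(7)

    def step(config):
        """Synchronous probabilistic rule: each cell becomes 1 with probability
        k/3, where k counts the ones among its left neighbor, itself, and its
        right neighbor (periodic boundary)."""
        k = np.roll(config, 1) + config + np.roll(config, -1)
        return (rng.random(config.size) < k / 3.0).astype(int)

    def p_absorb_all_ones(n=31, trials=200, max_steps=20000):
        """Estimate the probability of ending in the all-ones configuration
        from uniformly random initial configurations."""
        hits = 0
        for _ in range(trials):
            c = rng.integers(0, 2, size=n)
            for _ in range(max_steps):
                if c.all() or not c.any():
                    break
                c = step(c)
            hits += int(c.all())
        return hits / trials

    print("P(absorbed in all-ones) ~", p_absorb_all_ones())
    ```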

  12. Quantum probabilistic logic programming

    NASA Astrophysics Data System (ADS)

    Balu, Radhakrishnan

    2015-05-01

    We describe a quantum mechanics based logic programming language that supports Horn clauses, random variables, and covariance matrices to express and solve problems in probabilistic logic. The Horn clauses of the language wrap random variables, including infinite valued, to express probability distributions and statistical correlations, a powerful feature to capture relationship between distributions that are not independent. The expressive power of the language is based on a mechanism to implement statistical ensembles and to solve the underlying SAT instances using quantum mechanical machinery. We exploit the fact that classical random variables have quantum decompositions to build the Horn clauses. We establish the semantics of the language in a rigorous fashion by considering an existing probabilistic logic language called PRISM with classical probability measures defined on the Herbrand base and extending it to the quantum context. In the classical case H-interpretations form the sample space and probability measures defined on them lead to consistent definition of probabilities for well formed formulae. In the quantum counterpart, we define probability amplitudes on H-interpretations, facilitating the model generations and verifications via quantum mechanical superpositions and entanglements. We cast the well formed formulae of the language as quantum mechanical observables thus providing an elegant interpretation for their probabilities. We discuss several examples to combine statistical ensembles and predicates of first order logic to reason with situations involving uncertainty.

  13. Topics in Probabilistic Judgment Aggregation

    ERIC Educational Resources Information Center

    Wang, Guanchun

    2011-01-01

    This dissertation is a compilation of several studies that are united by their relevance to probabilistic judgment aggregation. In the face of complex and uncertain events, panels of judges are frequently consulted to provide probabilistic forecasts, and aggregation of such estimates in groups often yield better results than could have been made…

  14. Passage Retrieval: A Probabilistic Technique.

    ERIC Educational Resources Information Center

    Melucci, Massimo

    1998-01-01

    Presents a probabilistic technique to retrieve passages from texts having a large size or heterogeneous semantic content. Results of experiments comparing the probabilistic technique to one based on a text segmentation algorithm revealed that the passage size affects passage retrieval performance; text organization and query generality may have an…

  15. Chemical Kinetics Database

    National Institute of Standards and Technology Data Gateway

    SRD 17 NIST Chemical Kinetics Database (Web, free access)   The NIST Chemical Kinetics Database includes essentially all reported kinetics results for thermal gas-phase chemical reactions. The database is designed to be searched for kinetics data based on the specific reactants involved, for reactions resulting in specified products, for all the reactions of a particular species, or for various combinations of these. In addition, the bibliography can be searched by author name or combination of names. The database contains in excess of 38,000 separate reaction records for over 11,700 distinct reactant pairs. These data have been abstracted from over 12,000 papers with literature coverage through early 2000.

  16. Probabilistic retinal vessel segmentation

    NASA Astrophysics Data System (ADS)

    Wu, Chang-Hua; Agam, Gady

    2007-03-01

    Optic fundus assessment is widely used for diagnosing vascular and non-vascular pathology. Inspection of the retinal vasculature may reveal hypertension, diabetes, arteriosclerosis, cardiovascular disease and stroke. Due to various imaging conditions retinal images may be degraded. Consequently, the enhancement of such images and vessels in them is an important task with direct clinical applications. We propose a novel technique for vessel enhancement in retinal images that is capable of enhancing vessel junctions in addition to linear vessel segments. This is an extension of vessel filters we have previously developed for vessel enhancement in thoracic CT scans. The proposed approach is based on probabilistic models which can discern vessels and junctions. Evaluation shows the proposed filter is better than several known techniques and is comparable to the state of the art when evaluated on a standard dataset. A ridge-based vessel tracking process is applied on the enhanced image to demonstrate the effectiveness of the enhancement filter.

  17. Probabilistic Fiber Composite Micromechanics

    NASA Technical Reports Server (NTRS)

    Stock, Thomas A.

    1996-01-01

    Probabilistic composite micromechanics methods are developed that simulate expected uncertainties in unidirectional fiber composite properties. These methods are in the form of computational procedures using Monte Carlo simulation. The variables in which uncertainties are accounted for include constituent and void volume ratios, constituent elastic properties and strengths, and fiber misalignment. A graphite/epoxy unidirectional composite (ply) is studied to demonstrate fiber composite material property variations induced by random changes expected at the material micro level. Regression results are presented to show the relative correlation between predictor and response variables in the study. These computational procedures make possible a formal description of anticipated random processes at the intra-ply level, and the related effects of these on composite properties.

  18. Probabilistic Mesomechanical Fatigue Model

    NASA Technical Reports Server (NTRS)

    Tryon, Robert G.

    1997-01-01

    A probabilistic mesomechanical fatigue life model is proposed to link the microstructural material heterogeneities to the statistical scatter in the macrostructural response. The macrostructure is modeled as an ensemble of microelements. Cracks nucleate within the microelements and grow from the microelements to final fracture. Variations of the microelement properties are defined using statistical parameters. A micromechanical slip band decohesion model is used to determine the crack nucleation life and size. A crack tip opening displacement model is used to determine the small crack growth life and size. Paris law is used to determine the long crack growth life. The models are combined in a Monte Carlo simulation to determine the statistical distribution of total fatigue life for the macrostructure. The modeled response is compared to trends in experimental observations from the literature.
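
    The nucleation-to-fracture chain can be sketched as a two-stage Monte Carlo in which the long-crack stage integrates Paris' law in closed form; the distributions and constants below are hypothetical stand-ins, not the paper's values:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical scatter in the microstructural inputs (illustrative only):
a0 = rng.lognormal(np.log(20e-6), 0.3, n)  # nucleated crack size [m]
Nn = rng.lognormal(np.log(1e4), 0.5, n)    # nucleation life [cycles]
C  = rng.lognormal(np.log(1e-11), 0.2, n)  # Paris coefficient [m/cycle/(MPa*sqrt(m))^m]
m, Y, dS, af = 3.0, 1.12, 150.0, 5e-3      # Paris exponent, geometry factor,
                                           # stress range [MPa], final crack size [m]

# Long-crack life: integrate da/dN = C*(Y*dS*sqrt(pi*a))**m from a0 to af
# in closed form (valid for m != 2).
k = 1 - m / 2
Ng = (af**k - a0**k) / (C * (Y * dS * np.sqrt(np.pi))**m * k)

total = Nn + Ng
print(f"median fatigue life ~ {np.median(total):.3g} cycles")
print("10th/90th percentiles:", np.percentile(total, [10, 90]).round())
```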

  19. Novel probabilistic neuroclassifier

    NASA Astrophysics Data System (ADS)

    Hong, Jiang; Serpen, Gursel

    2003-09-01

    A novel probabilistic potential function neural network classifier algorithm to deal with classes which are multi-modally distributed and formed from sets of disjoint pattern clusters is proposed in this paper. The proposed classifier has a number of desirable properties which distinguish it from other neural network classifiers. A complete description of the algorithm in terms of its architecture and the pseudocode is presented. Simulation analysis of the newly proposed neuro-classifier algorithm on a set of benchmark problems is presented. Benchmark problems tested include IRIS, Sonar, Vowel Recognition, Two-Spiral, Wisconsin Breast Cancer, Cleveland Heart Disease and Thyroid Gland Disease. Simulation results indicate that the proposed neuro-classifier performs consistently better for a subset of problems for which other neural classifiers perform relatively poorly.
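
    The abstract gives no equations, but the flavor of a potential-function (Parzen-window) probabilistic classifier, which copes naturally with classes formed from disjoint pattern clusters, can be sketched as follows; the kernel width and toy data are my own assumptions, not the proposed algorithm:

```python
import numpy as np

def pnn_predict(X_train, y_train, X_test, sigma=0.3):
    """Parzen-window probabilistic classifier: every training point exerts
    a Gaussian potential, and a test point goes to the class with the
    largest mean potential. Because each class density is a sum over all of
    that class's points, disjoint (multi-modal) clusters pose no problem."""
    classes = np.unique(y_train)
    preds = []
    for x in X_test:
        d2 = ((X_train - x) ** 2).sum(axis=1)
        pot = np.exp(-d2 / (2 * sigma ** 2))
        preds.append(classes[np.argmax([pot[y_train == c].mean() for c in classes])])
    return np.array(preds)

# Toy problem: class 0 is split into two disjoint clusters around class 1.
rng = np.random.default_rng(2)
X0 = np.vstack([rng.normal([0, 0], 0.2, (50, 2)), rng.normal([3, 3], 0.2, (50, 2))])
X1 = rng.normal([1.5, 1.5], 0.2, (100, 2))
X, y = np.vstack([X0, X1]), np.array([0] * 100 + [1] * 100)
print(pnn_predict(X, y, np.array([[0.1, 0.1], [2.9, 3.1], [1.4, 1.6]])))
```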

  20. Specialist Bibliographic Databases

    PubMed Central

    2016-01-01

    Specialist bibliographic databases offer essential online tools for researchers and authors who work on specific subjects and perform comprehensive and systematic syntheses of evidence. This article presents examples of the established specialist databases, which may be of interest to those engaged in multidisciplinary science communication. Access to most specialist databases is through subscription schemes and membership in professional associations. Several aggregators of information and database vendors, such as EBSCOhost and ProQuest, facilitate advanced searches supported by specialist keyword thesauri. Searches of items through specialist databases are complementary to those through multidisciplinary research platforms, such as PubMed, Web of Science, and Google Scholar. Familiarity with the functional characteristics of biomedical and nonbiomedical bibliographic search tools is mandatory for researchers, authors, editors, and publishers. The database users are offered updates of the indexed journal lists, abstracts, author profiles, and links to other metadata. Editors and publishers may find the source selection criteria particularly useful and may apply for coverage of their peer-reviewed journals and grey literature sources. These criteria are aimed at accepting relevant sources with established editorial policies and quality controls. PMID:27134485

  1. Specialist Bibliographic Databases.

    PubMed

    Gasparyan, Armen Yuri; Yessirkepov, Marlen; Voronov, Alexander A; Trukhachev, Vladimir I; Kostyukova, Elena I; Gerasimov, Alexey N; Kitas, George D

    2016-05-01

    Specialist bibliographic databases offer essential online tools for researchers and authors who work on specific subjects and perform comprehensive and systematic syntheses of evidence. This article presents examples of the established specialist databases, which may be of interest to those engaged in multidisciplinary science communication. Access to most specialist databases is through subscription schemes and membership in professional associations. Several aggregators of information and database vendors, such as EBSCOhost and ProQuest, facilitate advanced searches supported by specialist keyword thesauri. Searches of items through specialist databases are complementary to those through multidisciplinary research platforms, such as PubMed, Web of Science, and Google Scholar. Familiarity with the functional characteristics of biomedical and nonbiomedical bibliographic search tools is mandatory for researchers, authors, editors, and publishers. The database users are offered updates of the indexed journal lists, abstracts, author profiles, and links to other metadata. Editors and publishers may find the source selection criteria particularly useful and may apply for coverage of their peer-reviewed journals and grey literature sources. These criteria are aimed at accepting relevant sources with established editorial policies and quality controls. PMID:27134485

  2. 78 FR 15746 - Compendium of Analyses To Investigate Select Level 1 Probabilistic Risk Assessment End-State...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-03-12

    ... COMMISSION Compendium of Analyses To Investigate Select Level 1 Probabilistic Risk Assessment End-State... document entitled: Compendium of Analyses to Investigate Select Level 1 Probabilistic Risk Assessment End..., select "ADAMS Public Documents" and then select "Begin Web-based ADAMS Search." For problems...

  3. Probabilistic brains: knowns and unknowns

    PubMed Central

    Pouget, Alexandre; Beck, Jeffrey M; Ma, Wei Ji; Latham, Peter E

    2015-01-01

    There is strong behavioral and physiological evidence that the brain both represents probability distributions and performs probabilistic inference. Computational neuroscientists have started to shed light on how these probabilistic representations and computations might be implemented in neural circuits. One particularly appealing aspect of these theories is their generality: they can be used to model a wide range of tasks, from sensory processing to high-level cognition. To date, however, these theories have only been applied to very simple tasks. Here we discuss the challenges that will emerge as researchers start focusing their efforts on real-life computations, with a focus on probabilistic learning, structural learning and approximate inference. PMID:23955561

  4. Probabilistic Design of Composite Structures

    NASA Technical Reports Server (NTRS)

    Chamis, Christos C.

    2006-01-01

    A formal procedure for the probabilistic design evaluation of a composite structure is described. The uncertainties in all aspects of a composite structure (constituent material properties, fabrication variables, structural geometry, service environments, etc.), which result in uncertain behavior in the composite structural responses, are included in the evaluation. The probabilistic evaluation consists of: (1) design criteria, (2) modeling of composite structures and uncertainties, (3) simulation methods, and (4) the decision-making process. A sample case is presented to illustrate the formal procedure and to demonstrate that composite structural designs can be probabilistically evaluated with accuracy and efficiency.

  5. Online Patent Searching: The Realities.

    ERIC Educational Resources Information Center

    Kaback, Stuart M.

    1983-01-01

    Considers patent subject searching capabilities of major online databases, noting patent claims, "deep-indexed" files, test searches, retrieval of related references, multi-database searching, improvements needed in indexing of chemical structures, full text searching, improvements needed in handling numerical data, and augmenting a subject search…

  6. Probabilistic graphic models applied to identification of diseases.

    PubMed

    Sato, Renato Cesar; Sato, Graziela Tiemy Kajita

    2015-01-01

    Decision-making is fundamental when making diagnosis or choosing treatment. The broad dissemination of computed systems and databases allows systematization of part of decisions through artificial intelligence. In this text, we present basic use of probabilistic graphic models as tools to analyze causality in health conditions. This method has been used to make diagnosis of Alzheimer's disease, sleep apnea and heart diseases.

  7. Probabilistic graphic models applied to identification of diseases

    PubMed Central

    Sato, Renato Cesar; Sato, Graziela Tiemy Kajita

    2015-01-01

    Decision-making is fundamental when making diagnosis or choosing treatment. The broad dissemination of computed systems and databases allows systematization of part of decisions through artificial intelligence. In this text, we present basic use of probabilistic graphic models as tools to analyze causality in health conditions. This method has been used to make diagnosis of Alzheimer's disease, sleep apnea and heart diseases. PMID:26154555

  8. Switching strategies to optimize search

    NASA Astrophysics Data System (ADS)

    Shlesinger, Michael F.

    2016-03-01

    Search strategies are explored when the search time is fixed, success is probabilistic and the estimate for success can diminish with time if there is not a successful result. Under the time constraint the problem is to find the optimal time to switch a search strategy or search location. Several variables are taken into account, including cost, gain, rate of success if a target is present and the probability that a target is present.
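
    A toy version of the switch-time question under assumed numbers: with a prior p that the target is in location A and exponential detection, the posterior success rate in A decays as unsuccessful search accumulates, and the moment to switch is when the alternative location's rate overtakes it:

```python
import numpy as np

p, lam_a, lam_b = 0.7, 1.0, 0.8   # assumed prior P(target in A) and detection rates

def rates(t):
    """Instantaneous success rates after t hours of fruitless search in A."""
    w = p * np.exp(-lam_a * t)
    post_a = w / (w + (1 - p))    # posterior P(target in A | no detection yet)
    return post_a * lam_a, (1 - post_a) * lam_b

t = np.linspace(0, 5, 5001)
ra, rb = rates(t)
t_switch = t[np.argmax(ra < rb)]  # first instant at which B's expected rate wins
print(f"switch after ~{t_switch:.2f} h "
      f"(rates at switch: A={rates(t_switch)[0]:.3f}, B={rates(t_switch)[1]:.3f})")
```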

  9. Online Petroleum Industry Bibliographic Databases: A Review.

    ERIC Educational Resources Information Center

    Anderson, Margaret B.

    This paper discusses the present status of the bibliographic database industry, reviews the development of online databases of interest to the petroleum industry, and considers future developments in online searching and their effect on libraries and information centers. Three groups of databases are described: (1) databases developed by the…

  10. Probabilistic Open Set Recognition

    NASA Astrophysics Data System (ADS)

    Jain, Lalit Prithviraj

    Real-world tasks in computer vision, pattern recognition and machine learning often touch upon the open set recognition problem: multi-class recognition with incomplete knowledge of the world and many unknown inputs. An obvious way to approach such problems is to develop a recognition system that thresholds probabilities to reject unknown classes. Traditional rejection techniques are not about the unknown; they are about the uncertain boundary and rejection around that boundary. Thus traditional techniques only represent the "known unknowns". However, a proper open set recognition algorithm is needed to reduce the risk from the "unknown unknowns". This dissertation examines this concept and finds existing probabilistic multi-class recognition approaches are ineffective for true open set recognition. We hypothesize the cause is due to weak ad hoc assumptions combined with closed-world assumptions made by existing calibration techniques. Intuitively, if we could accurately model just the positive data for any known class without overfitting, we could reject the large set of unknown classes even under this assumption of incomplete class knowledge. For this, we formulate the problem as one of modeling positive training data by invoking statistical extreme value theory (EVT) near the decision boundary of positive data with respect to negative data. We provide a new algorithm called the PI-SVM for estimating the unnormalized posterior probability of class inclusion. This dissertation also introduces a new open set recognition model called Compact Abating Probability (CAP), where the probability of class membership decreases in value (abates) as points move from known data toward open space. We show that CAP models improve open set recognition for multiple algorithms. Leveraging the CAP formulation, we go on to describe the novel Weibull-calibrated SVM (W-SVM) algorithm, which combines the useful properties of statistical EVT for score calibration with one-class and binary
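
    Not the published PI-SVM/W-SVM code, but a sketch of the EVT-calibration idea it builds on: fit a Weibull to the tail of positive-class decision scores nearest the boundary and read a probability of class inclusion off its CDF (the scores below are simulated):

```python
import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(3)
pos_scores = rng.normal(2.0, 0.6, 500)   # stand-in decision scores for one known class

# EVT step: model only the lower tail, i.e. the scores nearest the boundary.
tail = np.sort(pos_scores)[:50]
shift = tail.min() - 1e-6                # move the tail onto Weibull support (0, inf)
c, loc, scale = weibull_min.fit(tail - shift, floc=0)

def p_inclusion(score):
    """P(known class | score); abates toward 0 as scores sink into open space."""
    return float(weibull_min.cdf(score - shift, c, loc=loc, scale=scale))

for s in (0.5, 1.2, 3.0):
    print(f"score {s}: P(inclusion) ~ {p_inclusion(s):.3f}")
```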

  11. UniPROBE, update 2011: expanded content and search tools in the online database of protein-binding microarray data on protein-DNA interactions.

    PubMed

    Robasky, Kimberly; Bulyk, Martha L

    2011-01-01

    The Universal PBM Resource for Oligonucleotide-Binding Evaluation (UniPROBE) database is a centralized repository of information on the DNA-binding preferences of proteins as determined by universal protein-binding microarray (PBM) technology. Each entry for a protein (or protein complex) in UniPROBE provides the quantitative preferences for all possible nucleotide sequence variants ('words') of length k ('k-mers'), as well as position weight matrix (PWM) and graphical sequence logo representations of the k-mer data. In this update, we describe >130% expansion of the database content, incorporation of a protein BLAST (blastp) tool for finding protein sequence matches in UniPROBE, the introduction of UniPROBE accession numbers and additional database enhancements. The UniPROBE database is available at http://uniprobe.org.
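
    A sketch of how PWM entries of this kind are typically used, scanning a sequence for the best-scoring window under a log-odds score; the 4-position matrix below is invented, not a real UniPROBE record:

```python
import numpy as np

# Hypothetical 4-position PWM (rows A, C, G, T; columns sum to 1); real
# matrices for specific proteins can be downloaded from http://uniprobe.org.
pwm = np.array([[0.7, 0.1, 0.1, 0.8],
                [0.1, 0.1, 0.1, 0.05],
                [0.1, 0.7, 0.1, 0.05],
                [0.1, 0.1, 0.7, 0.1]])
idx = {b: i for i, b in enumerate("ACGT")}
bg = 0.25  # uniform background frequency

def best_site(seq):
    """Slide the PWM along seq; return (log-odds score, offset) of the best window."""
    k = pwm.shape[1]
    return max((sum(np.log2(pwm[idx[b], j] / bg) for j, b in enumerate(seq[i:i + k])), i)
               for i in range(len(seq) - k + 1))

print(best_site("TTAGTACGGC"))  # the consensus AGTA sits at offset 2
```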

  12. Probabilistic theories with purification

    SciTech Connect

    Chiribella, Giulio; D'Ariano, Giacomo Mauro; Perinotti, Paolo

    2010-06-15

    We investigate general probabilistic theories in which every mixed state has a purification, unique up to reversible channels on the purifying system. We show that the purification principle is equivalent to the existence of a reversible realization of every physical process, that is, to the fact that every physical process can be regarded as arising from a reversible interaction of the system with an environment, which is eventually discarded. From the purification principle we also construct an isomorphism between transformations and bipartite states that possesses all structural properties of the Choi-Jamiolkowski isomorphism in quantum theory. Such an isomorphism allows one to prove most of the basic features of quantum theory, like, e.g., existence of pure bipartite states giving perfect correlations in independent experiments, no information without disturbance, no joint discrimination of all pure states, no cloning, teleportation, no programming, no bit commitment, complementarity between correctable channels and deletion channels, characterization of entanglement-breaking channels as measure-and-prepare channels, and others, without resorting to the mathematical framework of Hilbert spaces.

  13. Simple implementation of a quantum search with trapped ions

    NASA Astrophysics Data System (ADS)

    Ivanov, Svetoslav S.; Ivanov, Peter A.; Vitanov, Nikolay V.

    2008-09-01

    We propose an ion-trap implementation of Grover’s quantum search algorithm for an unstructured database of arbitrary length N. The experimental implementation is appealingly simple because the linear ion trap allows for a straightforward construction, in a single interaction step and without a multitude of Hadamard transforms, of the reflection operator, which is the engine of the Grover algorithm. Consequently, a dramatic reduction in the number of required physical steps takes place, to just O(√N), the same as the number of mathematical steps. The proposed setup allows for demonstration of both the original (probabilistic) Grover search and its deterministic variation, and is remarkably robust to imperfections in the register initialization.
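
    The scaling is easy to check with a plain state-vector simulation of the Grover iteration (oracle sign flip plus reflection about the mean), independent of any particular physical implementation:

```python
import numpy as np

N, marked = 64, 37                      # database size and the marked item
amp = np.full(N, 1 / np.sqrt(N))        # uniform superposition over all items

iters = int(np.floor(np.pi / 4 * np.sqrt(N)))   # ~O(sqrt(N)) Grover steps
for _ in range(iters):
    amp[marked] *= -1                   # oracle: flip the marked amplitude
    amp = 2 * amp.mean() - amp          # reflection about the mean amplitude

print(f"{iters} iterations; success probability = {amp[marked]**2:.4f}")
```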

  14. Biofuel Database

    National Institute of Standards and Technology Data Gateway

    Biofuel Database (Web, free access)   This database brings together structural, biological, and thermodynamic data for enzymes that are either in current use or are being considered for use in the production of biofuels.

  15. Degradation monitoring using probabilistic inference

    NASA Astrophysics Data System (ADS)

    Alpay, Bulent

    In order to increase safety and improve economy and performance in a nuclear power plant (NPP), the source and extent of component degradations should be identified before failures and breakdowns occur. It is also crucial for the next generation of NPPs, which are designed to have a long core life and high fuel burnup to have a degradation monitoring system in order to keep the reactor in a safe state, to meet the designed reactor core lifetime and to optimize the scheduled maintenance. Model-based methods are based on determining the inconsistencies between the actual and expected behavior of the plant, and use these inconsistencies for detection and diagnostics of degradations. By defining degradation as a random abrupt change from the nominal to a constant degraded state of a component, we employed nonlinear filtering techniques based on state/parameter estimation. We utilized a Bayesian recursive estimation formulation in the sequential probabilistic inference framework and constructed a hidden Markov model to represent a general physical system. By addressing the problem of a filter's inability to estimate an abrupt change, which is called the oblivious filter problem in nonlinear extensions of Kalman filtering, and the sample impoverishment problem in particle filtering, we developed techniques to modify filtering algorithms by utilizing additional data sources to improve the filter's response to this problem. We utilized a reliability degradation database that can be constructed from plant specific operational experience and test and maintenance reports to generate proposal densities for probable degradation modes. These are used in a multiple hypothesis testing algorithm. We then test samples drawn from these proposal densities with the particle filtering estimates based on the Bayesian recursive estimation formulation with the Metropolis Hastings algorithm, which is a well-known Markov chain Monte Carlo method (MCMC). This multiple hypothesis testing
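
    Not the dissertation's algorithm, but a generic bootstrap particle filter on synthetic data illustrates the two ingredients discussed above: an abrupt jump that a plain random-walk filter tracks poorly, and injection of particles from a degradation-mode proposal (standing in for the reliability-database proposals) to counter sample impoverishment:

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic health index: nominal 1.0, abrupt degradation to 0.6 at t = 60.
T, change = 120, 60
truth = np.where(np.arange(T) < change, 1.0, 0.6)
obs = truth + rng.normal(0, 0.05, T)          # noisy plant measurements

N = 2000
particles = 1.0 + rng.normal(0, 0.02, N)
est = []
for y in obs:
    particles = particles + rng.normal(0, 0.01, N)         # random-walk dynamics
    # Hypothesis injection: a few particles per step come from a prior over
    # degraded states, protecting the filter from impoverishment at the jump.
    particles[: N // 50] = rng.uniform(0.4, 1.1, N // 50)
    w = np.exp(-0.5 * ((y - particles) / 0.05) ** 2)        # Gaussian likelihood
    particles = particles[rng.choice(N, N, p=w / w.sum())]  # resample
    est.append(particles.mean())

print(f"estimate pre-change ~ {np.mean(est[40:60]):.2f}, post-change ~ {np.mean(est[90:]):.2f}")
```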

  16. WOVOdat, A Worldwide Volcano Unrest Database, to Improve Eruption Forecasts

    NASA Astrophysics Data System (ADS)

    Widiwijayanti, C.; Costa, F.; Win, N. T. Z.; Tan, K.; Newhall, C. G.; Ratdomopurbo, A.

    2015-12-01

    WOVOdat is the World Organization of Volcano Observatories' Database of Volcanic Unrest, an international effort to develop common standards for compiling and storing data on volcanic unrest in a centralized database that is freely web-accessible for reference during volcanic crises, comparative studies, and basic research on pre-eruption processes. WOVOdat will be to volcanology as an epidemiological database is to medicine. Despite the large spectrum of monitoring techniques, interpreting monitoring data throughout the evolution of unrest and making timely forecasts remain the most challenging tasks for volcanologists. The field of eruption forecasting is becoming more quantitative, based on the understanding of pre-eruptive magmatic processes and the dynamic interaction between the variables at play in a volcanic system. Such forecasts must also acknowledge and express uncertainties; therefore, most current research in this field focuses on the application of event-tree analysis to reflect multiple possible scenarios and the probability of each scenario. Such forecasts are critically dependent on comprehensive and authoritative global volcano-unrest data sets - the very information currently collected in WOVOdat. As the database becomes more complete, Boolean searches, side-by-side digital (and thus scalable) comparisons of unrest, and pattern recognition will generate reliable results. Statistical distributions obtained from WOVOdat can then be used to estimate the probabilities of each scenario after specific patterns of unrest. We have established the main web interface for data submission and visualization, and have now incorporated ~20% of worldwide unrest data into the database, covering more than 100 eruptive episodes. In the upcoming years we will concentrate on acquiring data from volcano observatories, developing a robust data query interface, optimizing data mining, and creating tools by which WOVOdat can be used for probabilistic eruption

  17. Databases for K-8 Students

    ERIC Educational Resources Information Center

    Young, Terrence E., Jr.

    2004-01-01

    Today's elementary school students have been exposed to computers since birth, so it is not surprising that they are so proficient at using them. As a result, they are ready to search databases that include topics and information appropriate for their age level. Subscription databases are digital copies of magazines, newspapers, journals,…

  18. PROBABILISTIC INFORMATION INTEGRATION TECHNOLOGY

    SciTech Connect

    J. BOOKER; M. MEYER; ET AL

    2001-02-01

    The Statistical Sciences Group at Los Alamos has successfully developed a structured, probabilistic, quantitative approach for the evaluation of system performance based on multiple information sources, called Information Integration Technology (IIT). The technology integrates diverse types and sources of data and information (both quantitative and qualitative), and their associated uncertainties, to develop distributions for performance metrics, such as reliability. Applications include predicting complex system performance, where test data are lacking or expensive to obtain, through the integration of expert judgment, historical data, computer/simulation model predictions, and any relevant test/experimental data. The technology is particularly well suited for tracking estimated system performance for systems under change (e.g. development, aging), and can be used at any time during product development, including concept and early design phases, prior to prototyping, testing, or production, and before costly design decisions are made. Techniques from various disciplines (e.g., state-of-the-art expert elicitation, statistical and reliability analysis, design engineering, physics modeling, and knowledge management) are merged and modified to develop formal methods for the data/information integration. The power of this technology, known as PREDICT (Performance and Reliability Evaluation with Diverse Information Combination and Tracking), won a 1999 R and D 100 Award (Meyer, Booker, Bement, Kerscher, 1999). Specifically the PREDICT application is a formal, multidisciplinary process for estimating the performance of a product when test data are sparse or nonexistent. The acronym indicates the purpose of the methodology: to evaluate the performance or reliability of a product/system by combining all available (often diverse) sources of information and then tracking that performance as the product undergoes changes.
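
    A minimal flavor of this kind of information integration, reduced to a single metric: encode expert judgment as a prior on reliability and update it as sparse test data arrive. This is a Beta-Binomial stand-in with invented numbers, not the PREDICT methodology itself:

```python
from scipy.stats import beta

# Expert judgment encoded as a Beta prior on reliability: best estimate ~0.95
# with weight equivalent to 20 pseudo-trials (numbers invented).
a0, b0 = 19.0, 1.0

# Update as sparse test data arrive: 58 successes in 60 trials.
s, f = 58, 2
post = beta(a0 + s, b0 + f)
print(f"posterior mean reliability = {post.mean():.4f}")
print("90% credible interval =", post.ppf([0.05, 0.95]).round(4))
```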

  19. Database Administrator

    ERIC Educational Resources Information Center

    Moore, Pam

    2010-01-01

    The Internet and electronic commerce (e-commerce) generate lots of data. Data must be stored, organized, and managed. Database administrators, or DBAs, work with database software to find ways to do this. They identify user needs, set up computer databases, and test systems. They ensure that systems perform as they should and add people to the…

  1. Comparison of vacuum matrix-assisted laser desorption/ionization (MALDI) and atmospheric pressure MALDI (AP-MALDI) tandem mass spectrometry of 2-dimensional separated and trypsin-digested glomerular proteins for database search derived identification.

    PubMed

    Mayrhofer, Corina; Krieger, Sigurd; Raptakis, Emmanuel; Allmaier, Günter

    2006-08-01

    Mass spectrometric sequencing of enzymatically generated peptides is widely used to obtain specific sequence tags allowing the unambiguous identification of proteins. In the present study, two types of desorption/ionization techniques combined with different modes of ion dissociation, namely vacuum matrix-assisted laser desorption/ionization (vMALDI) high energy collision induced dissociation (CID) and post-source decay (PSD), as well as atmospheric pressure (AP)-MALDI low energy CID, were applied for the fragmentation of singly protonated peptide ions derived from two-dimensionally separated, silver-stained and trypsin-digested hydrophilic as well as hydrophobic glomerular proteins. Characteristic properties of the fragmentation patterns generated by each mode could thereby be observed. Furthermore, the compatibility of the various PSD and CID (MS/MS) data with database search derived identification was evaluated using two publicly accessible search algorithms. The peptide sequence tag information obtained by PSD and high energy CID enabled unambiguous identification in the majority of cases. In contrast, some of the data obtained by low energy CID were not assignable using similar search parameters, and therefore no clear results were obtainable. Knowledge of the properties of the available MALDI-based fragmentation techniques is an important factor for data interpretation using publicly accessible search algorithms and, moreover, for the identification of two-dimensional gel separated proteins.

  2. Hawaii bibliographic database

    USGS Publications Warehouse

    Wright, T.L.; Takahashi, T.J.

    1998-01-01

    The Hawaii bibliographic database has been created to contain all of the literature, from 1779 to the present, pertinent to the volcanological history of the Hawaiian-Emperor volcanic chain. References are entered in a PC- and Macintosh-compatible EndNote Plus bibliographic database with keywords and abstracts or (if no abstract) with annotations as to content. Keywords emphasize location, discipline, process, identification of new chemical data or age determinations, and type of publication. The database is updated approximately three times a year and is available to upload from an ftp site. The bibliography contained 8460 references at the time this paper was submitted for publication. Use of the database greatly enhances the power and completeness of library searches for anyone interested in Hawaiian volcanism.

  3. Vagueness as Probabilistic Linguistic Knowledge

    NASA Astrophysics Data System (ADS)

    Lassiter, Daniel

    Consideration of the metalinguistic effects of utterances involving vague terms has led Barker [1] to treat vagueness using a modified Stalnakerian model of assertion. I present a sorites-like puzzle for factual beliefs in the standard Stalnakerian model [28] and show that it can be resolved by enriching the model to make use of probabilistic belief spaces. An analogous problem arises for metalinguistic information in Barker's model, and I suggest that a similar enrichment is needed here as well. The result is a probabilistic theory of linguistic representation that retains a classical metalanguage but avoids the undesirable divorce between meaning and use inherent in the epistemic theory [34]. I also show that the probabilistic approach provides a plausible account of the sorites paradox and higher-order vagueness and that it fares well empirically and conceptually in comparison to leading competitors.

  4. Custom Search Engines: Tools & Tips

    ERIC Educational Resources Information Center

    Notess, Greg R.

    2008-01-01

    Few have the resources to build a Google or Yahoo! from scratch. Yet anyone can build a search engine based on a subset of the large search engines' databases. Use Google Custom Search Engine or Yahoo! Search Builder or any of the other similar programs to create a vertical search engine targeting sites of interest to users. The basic steps to…

  5. Probabilistic reasoning in data analysis.

    PubMed

    Sirovich, Lawrence

    2011-09-20

    This Teaching Resource provides lecture notes, slides, and a student assignment for a lecture on probabilistic reasoning in the analysis of biological data. General probabilistic frameworks are introduced, and a number of standard probability distributions are described using simple intuitive ideas. Particular attention is focused on random arrivals that are independent of prior history (Markovian events), with an emphasis on waiting times, Poisson processes, and Poisson probability distributions. The use of these various probability distributions is applied to biomedical problems, including several classic experimental studies.
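
    A quick numerical companion to the lecture's central example, with rate and sample-size choices of my own: memoryless (Markovian) arrivals give exponential waiting times and Poisson-distributed counts per window:

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(4)
lam, T, n = 3.0, 1.0, 200_000      # arrival rate, window length, trials

waits = rng.exponential(1 / lam, n)            # waiting times between events
print(f"mean wait {waits.mean():.4f} (theory {1 / lam:.4f})")

counts = rng.poisson(lam * T, n)               # events per window
for k in range(6):                             # empirical vs Poisson pmf
    print(k, round((counts == k).mean(), 4), round(poisson.pmf(k, lam * T), 4))
```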

  6. JCGGDB: Japan Consortium for Glycobiology and Glycotechnology Database.

    PubMed

    Maeda, Masako; Fujita, Noriaki; Suzuki, Yoshinori; Sawaki, Hiromichi; Shikanai, Toshihide; Narimatsu, Hisashi

    2015-01-01

    The biological significance of glycans has been widely studied and reported in the past. However, most achievements of our predecessors are not readily available in existing databases. JCGGDB is a meta-database involving 15 original databases in AIST and 5 cooperative databases in alliance with JCGG: Japan Consortium for Glycobiology and Glycotechnology. It centers on a glycan structure database and accumulates information such as glycan preferences of lectins, glycosylation sites in proteins, and genes related to glycan syntheses from glycoscience and related fields. This chapter illustrates how to use three major search interfaces (Keyword Search, Structure Search, and GlycoChem Explorer) available in JCGGDB to search across multiple databases.

  7. The Giardia genome project database.

    PubMed

    McArthur, A G; Morrison, H G; Nixon, J E; Passamaneck, N Q; Kim, U; Hinkle, G; Crocker, M K; Holder, M E; Farr, R; Reich, C I; Olsen, G E; Aley, S B; Adam, R D; Gillin, F D; Sogin, M L

    2000-08-15

    The Giardia genome project database provides an online resource for Giardia lamblia (WB strain, clone C6) genome sequence information. The database includes edited single-pass reads, the results of BLASTX searches, and details of progress towards sequencing the entire 12 million-bp Giardia genome. Pre-sorted BLASTX results can be retrieved based on keyword searches and BLAST searches of the high throughput Giardia data can be initiated from the web site or through NCBI. Descriptions of the genomic DNA libraries, project protocols and summary statistics are also available. Although the Giardia genome project is ongoing, new sequences are made available on a bi-monthly basis to ensure that researchers have access to information that may assist them in the search for genes and their biological function. The current URL of the Giardia genome project database is www.mbl.edu/Giardia.

  8. Online Database Coverage of Forensic Medicine.

    ERIC Educational Resources Information Center

    Snow, Bonnie; Ifshin, Steven L.

    1984-01-01

    Online searches of sample topics in the area of forensic medicine were conducted in the following life science databases: Biosis Previews, Excerpta Medica, Medline, Scisearch, and Chemical Abstracts Search. Search outputs analyzed according to criteria of recall, uniqueness, overlap, and utility reveal the need for a cross-database approach to…

  9. A Stemming Algorithm for Latin Text Databases.

    ERIC Educational Resources Information Center

    Schinke, Robyn; And Others

    1996-01-01

    Describes the design of a stemming algorithm for searching Latin text databases. The algorithm uses a longest-match approach with some recoding but differs from most stemmers in its use of two separate suffix dictionaries for processing query and database words that enables users to pursue specific searches for single grammatical forms of words.…

  10. ProbCons: Probabilistic consistency-based multiple sequence alignment.

    PubMed

    Do, Chuong B; Mahabhashyam, Mahathi S P; Brudno, Michael; Batzoglou, Serafim

    2005-02-01

    To study gene evolution across a wide range of organisms, biologists need accurate tools for multiple sequence alignment of protein families. Obtaining accurate alignments, however, is a difficult computational problem because of not only the high computational cost but also the lack of proper objective functions for measuring alignment quality. In this paper, we introduce probabilistic consistency, a novel scoring function for multiple sequence comparisons. We present ProbCons, a practical tool for progressive protein multiple sequence alignment based on probabilistic consistency, and evaluate its performance on several standard alignment benchmark data sets. On the BAliBASE, SABmark, and PREFAB benchmark alignment databases, ProbCons achieves statistically significant improvement over other leading methods while maintaining practical speed. ProbCons is publicly available as a Web resource.
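
    The probabilistic-consistency transform itself is compact: posterior match probabilities between sequences x and y are re-estimated by summing over alignments through a third sequence z, which in matrix form is a product. The sketch below simplifies to a single intermediary and uses random row-normalized matrices as placeholders for pair-HMM output:

```python
import numpy as np

rng = np.random.default_rng(9)

def rand_posterior(n, m):
    """Random row-normalized stand-in for a pair-HMM posterior match matrix."""
    P = rng.random((n, m))
    return P / P.sum(axis=1, keepdims=True)

# P[i, j] ~ P(residue i of one sequence aligns to residue j of the other).
Pxy, Pxz, Pzy = rand_posterior(5, 6), rand_posterior(5, 4), rand_posterior(4, 6)

# Consistency transform: re-estimate x-y match probabilities by summing over
# paths through z (a matrix product), then average with the direct estimate.
Pxy_consistent = 0.5 * (Pxy + Pxz @ Pzy)
print(Pxy_consistent.round(3))
```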

  11. Data mining in forensic image databases

    NASA Astrophysics Data System (ADS)

    Geradts, Zeno J.; Bijhold, Jurrien

    2002-07-01

    Forensic image databases appear in a wide variety. The oldest computer database is that of fingerprints. Other examples are databases of shoeprints, handwriting, cartridge cases, toolmarks, drug tablets, and faces. In these databases, searches are conducted on shape, color, and other forensic features. A wide variety of methods exists for searching the images in these databases. The result is a list of candidates that should be compared manually. The challenge in forensic science is to combine the information acquired: combining the shape of a partial shoe print with information on a cartridge case can result in stronger evidence. It is expected that by searching combinations of these databases with other databases (e.g., network traffic information), more crimes will be solved. Searching in image databases is still difficult, as can be seen with databases of faces. Due to lighting conditions and alteration of the face by aging, it is nearly impossible for an image-searching method to rank the right face from a database of one million faces in top position without using other information. Methods for data mining in image databases (e.g., the MPEG-7 framework) are discussed, and expectations for future developments are presented in this study.

  12. Generalization in probabilistic RAM nets.

    PubMed

    Clarkson, T G; Guan, Y; Taylor, J G; Gorse, D

    1993-01-01

    The probabilistic RAM (pRAM) is a hardware-realizable neural device which is stochastic in operation and highly nonlinear. Even small nets of pRAMs offer high levels of functionality. The means by which a pRAM network generalizes when trained in noise is shown and the results of this behavior are described.

  13. Probabilistic assessment of composite structures

    NASA Technical Reports Server (NTRS)

    Shiao, Michael E.; Abumeri, Galib H.; Chamis, Christos C.

    1993-01-01

    A general computational simulation methodology for an integrated probabilistic assessment of composite structures is discussed and demonstrated using aircraft fuselage (stiffened composite cylindrical shell) structures with rectangular cutouts. The computational simulation was performed for the probabilistic assessment of the structural behavior including buckling loads, vibration frequencies, global displacements, and local stresses. The scatter in the structural response is simulated based on the inherent uncertainties in the primitive (independent random) variables at the fiber matrix constituent, ply, laminate, and structural scales that describe the composite structures. The effect of uncertainties due to fabrication process variables such as fiber volume ratio, void volume ratio, ply orientation, and ply thickness is also included. The methodology has been embedded in the computer code IPACS (Integrated Probabilistic Assessment of Composite Structures). In addition to the simulated scatter, the IPACS code also calculates the sensitivity of the composite structural behavior to all the primitive variables that influence the structural behavior. This information is useful for assessing reliability and providing guidance for improvement. The results from the probabilistic assessment for the composite structure with rectangular cutouts indicate that the uncertainty in the longitudinal ply stress is mainly caused by the uncertainty in the laminate thickness, and the large overlap of the scatter in the first four buckling loads implies that the buckling mode shape for a specific buckling load can be either of the four modes.

  14. Research on probabilistic information processing

    NASA Technical Reports Server (NTRS)

    Edwards, W.

    1973-01-01

    The work accomplished on probabilistic information processing (PIP) is reported. The research proposals and decision analysis are discussed along with the results of research on MSC setting, multiattribute utilities, and Bayesian research. Abstracts of reports concerning the PIP research are included.

  15. Enhanced probabilistic microcell prediction model

    NASA Astrophysics Data System (ADS)

    Kim, Song-Kyoo

    2005-06-01

    A microcell is a cell with a radius of 1 km or less, suitable not only for heavily urbanized areas such as metropolitan cities but also for in-building areas such as offices and shopping malls. This paper deals with a microcell propagation-loss prediction model, focused on in-building solutions, that is analyzed by probabilistic techniques. The RSL (Receive Signal Level) is the factor by which the performance of a microcell can be evaluated, and the LOS (Line-Of-Sight) component and the blockage loss directly affect the RSL. A combination of probabilistic methods is applied to obtain these performance factors. The mathematical methods include the CLT (Central Limit Theorem) and SSQC (Six-Sigma Quality Control) to obtain the parameters of the distribution. This probabilistic solution gives a compact measure of the performance factors. In addition, it yields the probabilistic optimization of strategies such as the number of cells, cell location, cell capacity, cell range, and so on. Optimal strategies for antenna allocation in a building can also be obtained using this model.
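
    A sketch of the probabilistic RSL idea under invented link-budget numbers: the received level is transmit power minus a stack of random losses (the CLT motivating a near-Gaussian total), and an outage probability falls out of the simulated distribution:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 100_000

p_tx = 20.0                                   # transmit power [dBm]
path_loss = rng.normal(95.0, 4.0, n)          # distance-dependent loss [dB]
walls = rng.integers(0, 4, n)                 # 0-3 obstructing walls
blockage = walls * rng.exponential(3.0, n)    # blockage loss [dB]

rsl = p_tx - path_loss - blockage             # receive signal level [dBm]
thr = -85.0
print(f"mean RSL = {rsl.mean():.1f} dBm")
print(f"outage probability P(RSL < {thr} dBm) = {(rsl < thr).mean():.3%}")
```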

  16. Designing Probabilistic Tasks for Kindergartners

    ERIC Educational Resources Information Center

    Skoumpourdi, Chrysanthi; Kafoussi, Sonia; Tatsis, Konstantinos

    2009-01-01

    Recent research suggests that children can engage with probability tasks at an early age, and task characteristics seem to play an important role in the way children perceive an activity. In this direction, in the present article we investigate the role of some basic characteristics of probabilistic tasks in their design and implementation. In…

  17. A rapid profiling of gallotannins and flavonoids of the aqueous extract of Rhus coriaria L. by flow injection analysis with high-resolution mass spectrometry assisted with database searching.

    PubMed

    Regazzoni, Luca; Arlandini, Emanuele; Garzon, Davide; Santagati, Natale Alfredo; Beretta, Giangiacomo; Maffei Facino, Roberto

    2013-01-01

    Hydrolysable tannins appear to have some extremely important biological roles, including antioxidant, antibacterial, anti-inflammatory, anti-hypoglycemic, anti-angiogenic, and anticancer activities. The aim of this work was to set up a flow injection high-resolution mass spectrometric approach combined with database searching to rapidly obtain a profile of gallotannins and other phenolics in a crude extract from plant tissue. The flow injection analysis (FIA) takes place in the electrospray ionization source of a hybrid orbitrap high-resolution mass spectrometry system (ESI-HR-MS/MS(2), resolution 100,000, negative ion mode), and polyphenols are tentatively identified by matching the monoisotopic masses of the spectra with those in polyphenol databases. This leads to the most probable molecular formulas and to the possible structures among those reported in the database. Structure confirmation is achieved by checking the compliance of MS(2) fragments with those in a commercial fragment-prediction database. With this method we identified in the aqueous extract of sumac leaves, with a maximum error of 1.7 ppm, a group of ten gallotannins, from mono- to deca-galloyl glycosides of the class of hydrolysable tannins, and a set of coextracted flavonoid derivatives including myricetin, quercetin-3-O-rhamnoside, myricetin-3-O-glucoside, myricetin-3-O-glucuronide, and myricetin-3-O-rhamnoglucoside. The separation of the isomers of gallotannins and flavonoids present in the same extract was carried out by high-performance liquid chromatography with electrospray ionization tandem mass spectrometry (HPLC/ESI-HR-MS(2)); this approach allowed the structure resolution of the isobaric flavonoids quercetin-3-O-glucoside and myricetin-3-O-rhamnoside.
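
    The database-matching step reduces to a monoisotopic-mass lookup within a ppm tolerance. A self-contained sketch with a three-compound stand-in table (a real workflow would query a full polyphenol database):

```python
# Tiny stand-in compound table (name, neutral monoisotopic mass [Da]).
compounds = [("quercetin-3-O-rhamnoside", 448.1006),
             ("myricetin-3-O-glucoside", 480.0904),
             ("monogalloyl glucose", 332.0743)]

PROTON = 1.007276  # Da

def match_negative_ion(mz, tol_ppm=2.0):
    """Match an [M-H]- ion m/z against the table; return (hit, error in ppm)."""
    neutral = mz + PROTON
    return [(name, round(1e6 * (neutral - m) / m, 2))
            for name, m in compounds
            if abs(neutral - m) / m * 1e6 <= tol_ppm]

print(match_negative_ion(447.0933))  # [M-H]- of quercetin-3-O-rhamnoside
```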

  18. Probabilistic Mass Growth Uncertainties

    NASA Technical Reports Server (NTRS)

    Plumer, Eric; Elliott, Darren

    2013-01-01

    Mass has been widely used as a variable input parameter for Cost Estimating Relationships (CERs) for space systems. As these space systems progress from early concept studies and drawing boards to the launch pad, their masses tend to grow substantially, adversely affecting a primary input to most modeling CERs. Modeling and predicting mass uncertainty, based on historical and analogous data, is therefore critical and is an integral part of modeling cost risk. This paper presents the results of an ongoing NASA effort to publish mass growth datasheets for adjusting single-point Technical Baseline Estimates (TBE) of the masses of space instruments as well as spacecraft, for both earth orbiting and deep space missions, at various stages of a project's lifecycle. This paper also discusses the long-term strategy of NASA Headquarters of publishing similar results, using a variety of cost-driving metrics, on an annual basis. The analysis, based on historical data obtained from the NASA Cost Analysis Data Requirements (CADRe) database, provides quantitative results that show decreasing mass growth uncertainties as mass estimate maturity increases.

  19. The comprehensive peptaibiotics database.

    PubMed

    Stoppacher, Norbert; Neumann, Nora K N; Burgstaller, Lukas; Zeilinger, Susanne; Degenkolb, Thomas; Brückner, Hans; Schuhmacher, Rainer

    2013-05-01

    Peptaibiotics are nonribosomally biosynthesized peptides, which - according to definition - contain the marker amino acid α-aminoisobutyric acid (Aib) and possess antibiotic properties. Known since 1958, peptaibiotics have been described and investigated in constantly increasing numbers, with a particular emphasis on hypocrealean fungi. Starting from the existing online 'Peptaibol Database', first published in 1997, an exhaustive literature survey of all known peptaibiotics was carried out and resulted in a list of 1043 peptaibiotics. The gathered information was compiled and used to create the new 'The Comprehensive Peptaibiotics Database', which is presented here. The database was devised as a software tool based on Microsoft (MS) Access. It is freely available from the internet at http://peptaibiotics-database.boku.ac.at and can easily be installed and operated on any computer offering a Windows XP/7 environment. It provides useful information on characteristic properties of the included peptaibiotics, such as peptide category, the group name of the microheterogeneous mixture to which the peptide belongs, amino acid sequence, sequence length, producing fungus, peptide subfamily, molecular formula, and monoisotopic mass. All of these characteristics can be used and combined for automated search within the database, which makes The Comprehensive Peptaibiotics Database a versatile tool for the retrieval of valuable information about peptaibiotics. Sequence data have been considered as of December 14, 2012. PMID:23681723

  20. The apoptosis database.

    PubMed

    Doctor, K S; Reed, J C; Godzik, A; Bourne, P E

    2003-06-01

    The apoptosis database is a public resource for researchers and students interested in the molecular biology of apoptosis. The resource provides functional annotation, literature references, diagrams/images, and alternative nomenclatures on a set of proteins having 'apoptotic domains'. These are the distinctive domains that are often, if not exclusively, found in proteins involved in apoptosis. The initial choice of proteins to be included is defined by apoptosis experts and bioinformatics tools. Users can browse through the web accessible lists of domains, proteins containing these domains and their associated homologs. The database can also be searched by sequence homology using basic local alignment search tool, text word matches of the annotation, and identifiers for specific records. The resource is available at http://www.apoptosis-db.org and is updated on a regular basis.

  1. Collection Fusion Using Bayesian Estimation of a Linear Regression Model in Image Databases on the Web.

    ERIC Educational Resources Information Center

    Kim, Deok-Hwan; Chung, Chin-Wan

    2003-01-01

    Discusses the collection fusion problem of image databases, concerned with retrieving relevant images by content based retrieval from image databases distributed on the Web. Focuses on a metaserver which selects image databases supporting similarity measures and proposes a new algorithm which exploits a probabilistic technique using Bayesian…

  2. Database Manager

    ERIC Educational Resources Information Center

    Martin, Andrew

    2010-01-01

    It is normal practice today for organizations to store large quantities of records of related information as computer-based files or databases. Purposeful information is retrieved by performing queries on the data sets. The purpose of DATABASE MANAGER is to communicate to students the method by which the computer performs these queries. This…

  3. Maize databases

    Technology Transfer Automated Retrieval System (TEKTRAN)

    This chapter is a succinct overview of maize data held in the species-specific database MaizeGDB (the Maize Genomics and Genetics Database), and selected multi-species data repositories, such as Gramene/Ensembl Plants, Phytozome, UniProt and the National Center for Biotechnology Information (NCBI), ...

  4. A search for pre-main sequence stars in the high-latitude molecular clouds. II - A survey of the Einstein database

    NASA Technical Reports Server (NTRS)

    Caillault, Jean-Pierre; Magnani, Loris

    1990-01-01

    Preliminary results are reported from a survey of every Einstein image that overlaps any high-latitude molecular cloud, in a search for X-ray emitting pre-main sequence stars. This survey, together with complementary KPNO and IRAS data, will allow the determination of how prevalent low-mass star formation is in these clouds in general and in the translucent molecular clouds in particular.

  5. A probabilistic Hu-Washizu variational principle

    NASA Technical Reports Server (NTRS)

    Liu, W. K.; Belytschko, T.; Besterfield, G. H.

    1987-01-01

    A Probabilistic Hu-Washizu Variational Principle (PHWVP) for the Probabilistic Finite Element Method (PFEM) is presented. This formulation is developed for both linear and nonlinear elasticity. The PHWVP allows incorporation of the probabilistic distributions for the constitutive law, compatibility condition, equilibrium, domain and boundary conditions into the PFEM. Thus, a complete probabilistic analysis can be performed where all aspects of the problem are treated as random variables and/or fields. The Hu-Washizu variational formulation is available in many conventional finite element codes thereby enabling the straightforward inclusion of the probabilistic features into present codes.

  6. Entanglement and thermodynamics in general probabilistic theories

    NASA Astrophysics Data System (ADS)

    Chiribella, Giulio; Scandolo, Carlo Maria

    2015-10-01

    Entanglement is one of the most striking features of quantum mechanics, and yet it is not specifically quantum. More specific to quantum mechanics is the connection between entanglement and thermodynamics, which leads to an identification between entropies and measures of pure state entanglement. Here we search for the roots of this connection, investigating the relation between entanglement and thermodynamics in the framework of general probabilistic theories. We first address the question whether an entangled state can be transformed into another by means of local operations and classical communication. Under two operational requirements, we prove a general version of the Lo-Popescu theorem, which lies at the foundations of the theory of pure-state entanglement. We then consider a resource theory of purity where free operations are random reversible transformations, modelling the scenario where an agent has limited control over the dynamics of a closed system. Our key result is a duality between the resource theory of entanglement and the resource theory of purity, valid for every physical theory where all processes arise from pure states and reversible interactions at the fundamental level. As an application of the main result, we establish a one-to-one correspondence between entropies and measures of pure bipartite entanglement. The correspondence is then used to define entanglement measures in the general probabilistic framework. Finally, we show a duality between the task of information erasure and the task of entanglement generation, whereby the existence of entropy sinks (systems that can absorb arbitrary amounts of information) becomes equivalent to the existence of entanglement sources (correlated systems from which arbitrary amounts of entanglement can be extracted).

  7. Evaluation of Federated Searching Options for the School Library

    ERIC Educational Resources Information Center

    Abercrombie, Sarah E.

    2008-01-01

    Three hosted federated search tools, Follett One Search, Gale PowerSearch Plus, and WebFeat Express, were configured and implemented in a school library. Databases from five vendors and the OPAC were systematically searched. Federated search results were compared with each other and to the results of the same searches in the database's native…

  8. Database systems for knowledge-based discovery.

    PubMed

    Jagarlapudi, Sarma A R P; Kishan, K V Radha

    2009-01-01

    Several database systems have been developed to provide valuable information from the bench chemist to biologist, medical practitioner to pharmaceutical scientist in a structured format. The advent of information technology and computational power enhanced the ability to access large volumes of data in the form of a database where one could do compilation, searching, archiving, analysis, and finally knowledge derivation. Although, data are of variable types the tools used for database creation, searching and retrieval are similar. GVK BIO has been developing databases from publicly available scientific literature in specific areas like medicinal chemistry, clinical research, and mechanism-based toxicity so that the structured databases containing vast data could be used in several areas of research. These databases were classified as reference centric or compound centric depending on the way the database systems were designed. Integration of these databases with knowledge derivation tools would enhance the value of these systems toward better drug design and discovery. PMID:19727614

  9. Andromeda: a peptide search engine integrated into the MaxQuant environment.

    PubMed

    Cox, Jürgen; Neuhauser, Nadin; Michalski, Annette; Scheltema, Richard A; Olsen, Jesper V; Mann, Matthias

    2011-04-01

    A key step in mass spectrometry (MS)-based proteomics is the identification of peptides in sequence databases by their fragmentation spectra. Here we describe Andromeda, a novel peptide search engine using a probabilistic scoring model. On proteome data, Andromeda performs as well as Mascot, a widely used commercial search engine, as judged by sensitivity and specificity analysis based on target decoy searches. Furthermore, it can handle data with arbitrarily high fragment mass accuracy, is able to assign and score complex patterns of post-translational modifications, such as highly phosphorylated peptides, and accommodates extremely large databases. The algorithms of Andromeda are provided. Andromeda can function independently or as an integrated search engine of the widely used MaxQuant computational proteomics platform and both are freely available at www.maxquant.org. The combination enables analysis of large data sets in a simple analysis workflow on a desktop computer. For searching individual spectra Andromeda is also accessible via a web server. We demonstrate the flexibility of the system by implementing the capability to identify cofragmented peptides, significantly improving the total number of identified peptides.
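
    The heart of Andromeda's published scoring is a binomial model over matched fragment ions (Cox et al., 2011). A simplified sketch, with the per-peak random-match probability p as an assumed input rather than derived from peak depth and mass tolerance as in the full algorithm:

```python
from math import comb, log10

def binomial_score(n, k, p):
    """-10*log10 of the chance of matching >= k of n theoretical fragment
    ions at random, each with match probability p (binomial survival
    function, the core of Andromeda-style scoring)."""
    tail = sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))
    return -10 * log10(tail)

# 9 of 20 theoretical fragments matched, assumed per-peak chance p = 0.01.
print(round(binomial_score(20, 9, 0.01), 1))
```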

  10. A search for pre-main-sequence stars in high-latitude molecular clouds. 3: A survey of the Einstein database

    NASA Technical Reports Server (NTRS)

    Caillault, Jean-Pierre; Magnani, Loris; Fryer, Chris

    1995-01-01

    In order to discern whether the high-latitude molecular clouds are regions of ongoing star formation, we have used X-ray emission as a tracer of youthful stars. The entire Einstein database yields 18 images which overlap 10 of the clouds mapped partially or completely in the CO (1-0) transition, providing a total of approximately 6 deg squared of overlap. Five previously unidentified X-ray sources were detected: one has an optical counterpart which is a pre-main-sequence (PMS) star, and two have normal main-sequence stellar counterparts, while the other two are probably extragalactic sources. The PMS star is located in a high Galactic latitude Lynds dark cloud, so this result is not too surprising. The translucent clouds, though, have yet to reveal any evidence of star formation.

  12. Environmental probabilistic quantitative assessment methodologies

    USGS Publications Warehouse

    Crovelli, R.A.

    1995-01-01

    In this paper, four petroleum resource assessment methodologies are presented as possible pollution assessment methodologies, even though petroleum as a resource is desirable, whereas pollution is undesirable. A methodology is defined in this paper to consist of a probability model and a probabilistic method, where the method is used to solve the model. The following four basic types of probability models are considered: 1) direct assessment, 2) accumulation size, 3) volumetric yield, and 4) reservoir engineering. Three of the four petroleum resource assessment methodologies were written as microcomputer systems, viz. TRIAGG for direct assessment, APRAS for accumulation size, and FASPU for reservoir engineering. A fourth microcomputer system termed PROBDIST supports the three assessment systems. The three assessment systems have different probability models but the same type of probabilistic method. The advantages of the analytic method are computational speed and flexibility, making it ideal for a microcomputer. -from Author
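    
    As a toy illustration of the direct-assessment idea (expert percentile estimates treated as probability distributions and aggregated), a Monte Carlo sketch over hypothetical triangular estimates is given below; the actual TRIAGG/APRAS/FASPU models are more elaborate:
    
        import random
        
        # Hypothetical expert estimates (min, mode, max) for three plays, in MMbbl.
        plays = [(10, 40, 120), (5, 25, 90), (20, 60, 200)]
        
        def draw_totals(plays, trials=100_000, seed=1):
            """Monte Carlo aggregation of independent triangular distributions."""
            rng = random.Random(seed)
            return sorted(sum(rng.triangular(lo, hi, mode) for lo, mode, hi in plays)
                          for _ in range(trials))
        
        totals = draw_totals(plays)
        for q in (0.05, 0.5, 0.95):            # fractiles of the aggregate resource
            print(f"F{int(q*100):02d}: {totals[int(q * len(totals))]:.0f} MMbbl")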

  13. WWW Search Tools in Reference Services.

    ERIC Educational Resources Information Center

    Kimmel, Stacey

    1997-01-01

    Provides an introduction to World Wide Web search tools for reference services. Discusses characteristics of search services and types of tools available, including search engines/robot-generated databases, directories, metasearch engines, and review/rating sites. (AEF)

  14. Probabilistic Simulation for Nanocomposite Characterization

    NASA Technical Reports Server (NTRS)

    Chamis, Christos C.; Coroneos, Rula M.

    2007-01-01

    A unique probabilistic theory is described to predict the properties of nanocomposites. The simulation is based on composite micromechanics with progressive substructuring down to a nanoscale slice of a nanofiber where all the governing equations are formulated. These equations have been programmed in a computer code. That computer code is used to simulate the uniaxial strength properties of a mononanofiber laminate. The results are presented graphically and discussed with respect to their practical significance. These results show smooth distributions.

  15. Probabilistic Cloning and Quantum Computation

    NASA Astrophysics Data System (ADS)

    Gao, Ting; Yan, Feng-Li; Wang, Zhi-Xi

    2004-06-01

    We discuss the usefulness of quantum cloning and present examples of quantum computation tasks for which the cloning offers an advantage which cannot be matched by any approach that does not resort to quantum cloning. In these quantum computations, we need to distribute quantum information contained in the states about which we have some partial information. To perform quantum computations, we use a state-dependent probabilistic quantum cloning procedure to distribute quantum information in the middle of a quantum computation.

  16. Distribution functions of probabilistic automata

    NASA Technical Reports Server (NTRS)

    Vatan, F.

    2001-01-01

    Each probabilistic automaton M over an alphabet A defines a probability measure Prob_M on the set of all finite and infinite words over A. We can identify a k-letter alphabet A with the set {0, 1,..., k-1}, and, hence, we can consider every finite or infinite word w over A as a radix-k expansion of a real number X(w) in the interval [0, 1]. This makes X(w) a random variable, and the distribution function of M is defined as usual: F(x) := Prob_M{ w : X(w) < x }. Utilizing the fixed-point semantics (denotational semantics), extended to probabilistic computations, we investigate the distribution functions of probabilistic automata in detail. Automata with continuous distribution functions are characterized. By a new and much easier method, it is shown that the distribution function F(x) is an analytic function if it is a polynomial. Finally, answering a question posed by D. Knuth and A. Yao, we show that a polynomial distribution function F(x) on [0, 1] can be generated by a probabilistic automaton iff all the roots of F'(x) = 0 in this interval, if any, are rational numbers. For this, we define two dynamical systems on the set of polynomial distributions and study attracting fixed points of random composition of these two systems.
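    
    For intuition, a memoryless binary automaton that emits 1 with probability q induces a distribution on X(w) = sum_i w_i * 2^(-i), and F(x) can be approximated by simulation. A sketch follows; q and the truncation depth are arbitrary choices here, not part of the paper:
    
        import random
        
        def sample_x(q, depth, rng):
            """Draw a word w of length `depth` from the memoryless automaton that
            emits 1 with probability q, and return X(w) = sum w_i * 2**-(i+1)."""
            return sum(2**-(i + 1) for i in range(depth) if rng.random() < q)
        
        def empirical_F(x, q=0.3, depth=40, n=50_000, seed=0):
            """Monte Carlo estimate of F(x) = Prob_M{ w : X(w) < x }."""
            rng = random.Random(seed)
            return sum(sample_x(q, depth, rng) < x for _ in range(n)) / n
        
        # For q = 1/2, X is uniform on [0, 1], so F(x) should be close to x:
        print(empirical_F(0.25, q=0.5), empirical_F(0.75, q=0.5))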

  17. Using the Reactome Database

    PubMed Central

    Haw, Robin

    2012-01-01

    There is considerable interest in the bioinformatics community in creating pathway databases. The Reactome project (a collaboration between the Ontario Institute for Cancer Research, Cold Spring Harbor Laboratory, New York University Medical Center and the European Bioinformatics Institute) is one such pathway database and collects structured information on all the biological pathways and processes in humans. It is an expert-authored and peer-reviewed, curated collection of well-documented molecular reactions that span the gamut from simple intermediate metabolism to signaling pathways and complex cellular events. This information is supplemented with likely orthologous molecular reactions in mouse, rat, zebrafish, worm and other model organisms. This unit describes how to use the Reactome database to learn the steps of a biological pathway; navigate and browse through the Reactome database; identify the pathways in which a molecule of interest is involved; use the Pathway and Expression analysis tools to search the database for, and visualize, possible connections between a user-supplied experimental data set and Reactome pathways; and use the Species Comparison tool to compare human and model organism pathways. PMID:22700314

  18. NASA Records Database

    NASA Technical Reports Server (NTRS)

    Callac, Christopher; Lunsford, Michelle

    2005-01-01

    The NASA Records Database, comprising a Web-based application program and a database, is used to administer an archive of paper records at Stennis Space Center. The system begins with an electronic form, into which a user enters information about records that the user is sending to the archive. The form is smart: it provides instructions for entering information correctly and prompts the user to enter all required information. Once complete, the form is digitally signed and submitted to the database. The system determines which storage locations are not in use, assigns the user's boxes of records to some of them, and enters these assignments in the database. Thereafter, the software tracks the boxes and can be used to locate them. By use of the search capabilities of the software, specific records can be sought by box storage locations, accession numbers, record dates, submitting organizations, or details of the records themselves. Boxes can be marked with such statuses as checked out, lost, transferred, and destroyed. The system can generate reports showing boxes awaiting destruction or transfer. When boxes are transferred to the National Archives and Records Administration (NARA), the system can automatically fill out NARA records-transfer forms. Currently, several other NASA Centers are considering deploying the NASA Records Database to help automate their records archives.
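    
    A minimal sketch of the kind of box-tracking schema the abstract describes, using SQLite; the table and column names are hypothetical, not those of the actual system:
    
        import sqlite3
        
        con = sqlite3.connect(":memory:")
        con.execute("""CREATE TABLE boxes (
            accession    TEXT PRIMARY KEY,
            location     TEXT,
            organization TEXT,
            record_date  TEXT,
            status       TEXT DEFAULT 'stored'  -- stored/checked out/lost/transferred/destroyed
        )""")
        con.execute("INSERT INTO boxes VALUES ('A-001', 'Shelf 12-B', 'Engineering', '1998-03-01', 'stored')")
        con.execute("UPDATE boxes SET status = 'transferred' WHERE accession = 'A-001'")
        
        # Report boxes awaiting transfer, analogous to the system's NARA report
        for row in con.execute("SELECT accession, location FROM boxes WHERE status = 'transferred'"):
            print(row)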

  19. PCEMCAN - Probabilistic Ceramic Matrix Composites Analyzer: User's Guide, Version 1.0

    NASA Technical Reports Server (NTRS)

    Shah, Ashwin R.; Mital, Subodh K.; Murthy, Pappu L. N.

    1998-01-01

    PCEMCAN (Probabilistic CEramic Matrix Composites ANalyzer) is an integrated computer code developed at NASA Lewis Research Center that simulates uncertainties associated with the constituent properties, manufacturing process, and geometric parameters of fiber-reinforced ceramic matrix composites and quantifies their random thermomechanical behavior. The PCEMCAN code can perform deterministic as well as probabilistic analyses to predict thermomechanical properties. This user's guide details the step-by-step procedure to create an input file and to update/modify the material properties database required to run the PCEMCAN computer code. An overview of the geometric conventions, micromechanical unit cell, nonlinear constitutive relationship, and probabilistic simulation methodology is also provided in the manual. Fast probability integration as well as Monte Carlo simulation methods are available for the uncertainty simulation. Various options available in the code to simulate probabilistic material properties and quantify the sensitivity of the primitive random variables are described. Deterministic as well as probabilistic results are illustrated using demonstration problems. For a detailed theoretical description of the deterministic and probabilistic analyses, the user is referred to the companion documents "Computational Simulation of Continuous Fiber-Reinforced Ceramic Matrix Composite Behavior," NASA TP-3602, 1996 and "Probabilistic Micromechanics and Macromechanics for Ceramic Matrix Composites," NASA TM 4766, June 1997.

  20. Search algorithm complexity modeling with application to image alignment and matching

    NASA Astrophysics Data System (ADS)

    DelMarco, Stephen

    2014-05-01

    Search algorithm complexity modeling, in the form of penetration rate estimation, provides a useful way to estimate search efficiency in application domains which involve searching over a hypothesis space of reference templates or models, as in model-based object recognition, automatic target recognition, and biometric recognition. The penetration rate quantifies the expected portion of the database that must be searched, and is useful for estimating search algorithm computational requirements. In this paper we perform mathematical modeling to derive general equations for penetration rate estimates that are applicable to a wide range of recognition problems. We extend previous penetration rate analyses to use more general probabilistic modeling assumptions. In particular we provide penetration rate equations within the framework of a model-based image alignment application domain in which a prioritized hierarchical grid search is used to rank subspace bins based on matching probability. We derive general equations, and provide special cases based on simplifying assumptions. We show how previously-derived penetration rate equations are special cases of the general formulation. We apply the analysis to model-based logo image alignment in which a hierarchical grid search is used over a geometric misalignment transform hypothesis space. We present numerical results validating the modeling assumptions and derived formulation.
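    
    In the simplest setting, if a database of N bins is searched in rank order and bin i contains the true match with probability p_i, the expected penetration rate is the probability-weighted average of the fraction searched. A toy computation follows (this is not the paper's more general hierarchical-grid formulation):
    
        def penetration_rate(p):
            """Expected fraction of a ranked database searched before the match,
            given p[i] = probability the true hypothesis lies in the rank-(i+1) bin."""
            n = len(p)
            assert abs(sum(p) - 1.0) < 1e-9
            return sum((i + 1) / n * pi for i, pi in enumerate(p))
        
        # A good ranking concentrates probability early and lowers the rate:
        print(penetration_rate([0.6, 0.25, 0.1, 0.05]))    # 0.40
        print(penetration_rate([0.25, 0.25, 0.25, 0.25]))  # uniform ranking: 0.625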

  1. Probabilistic Assessment of Cancer Risk for Astronauts on Lunar Missions

    NASA Technical Reports Server (NTRS)

    Kim, Myung-Hee Y.; Cucinotta, Francis A.

    2009-01-01

    During future lunar missions, exposure to solar particle events (SPEs) is a major safety concern for crew members during extra-vehicular activities (EVAs) on the lunar surface or Earth-to-moon transit. NASA's new lunar program anticipates that up to 15% of crew time may be on EVA, with minimal radiation shielding. For the operational challenge of responding to events of unknown size and duration, a probabilistic risk assessment approach is essential for mission planning and design. Using the historical database of proton measurements during the past 5 solar cycles, a typical hazard function for SPE occurrence was defined using a non-homogeneous Poisson model as a function of time within a non-specific future solar cycle of 4000 days duration. Distributions ranging from the 5th to 95th percentile of particle fluences for a specified mission period were simulated. Organ doses corresponding to particle fluences at the median and at the 95th percentile for a specified mission period were assessed using NASA's baryon transport model, BRYNTRN. Cancer fatality risks for astronauts as functions of age, gender, and solar cycle activity were then analyzed. The probability of exceeding the NASA 30-day limit of blood forming organ (BFO) dose inside a typical spacecraft was calculated. Future work will involve using this probabilistic risk assessment approach to SPE forecasting, combined with a probabilistic approach to the radiobiological factors that contribute to the uncertainties in projecting cancer risks.
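    
    A sketch of simulating SPE occurrence times from a non-homogeneous Poisson process by thinning; the hazard function below is a made-up placeholder, not the one fitted to the five-cycle proton database:
    
        import math
        import random
        
        def hazard(t, cycle_len=4000.0):
            """Hypothetical SPE hazard (events/day), peaking mid solar cycle."""
            return 0.05 * math.sin(math.pi * t / cycle_len) ** 2
        
        def simulate_nhpp(t_end=4000.0, lam_max=0.05, seed=7):
            """Ogata thinning: propose events at rate lam_max, accept with hazard/lam_max."""
            rng, t, events = random.Random(seed), 0.0, []
            while True:
                t += rng.expovariate(lam_max)
                if t > t_end:
                    return events
                if rng.random() < hazard(t) / lam_max:
                    events.append(t)
        
        print(len(simulate_nhpp()), "simulated SPEs in one 4000-day cycle")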

  2. Standardization of Keyword Search Mode

    ERIC Educational Resources Information Center

    Su, Di

    2010-01-01

    In spite of its popularity, keyword search mode has not been standardized. Though information professionals are quick to adapt to various presentations of keyword search mode, novice end-users may find keyword search confusing. This article compares keyword search mode in some major reference databases and calls for standardization. (Contains 3…

  3. A Database Selection Expert System Based on Reference Librarian's Database Selection Strategy: A Usability and Empirical Evaluation.

    ERIC Educational Resources Information Center

    Ma, Wei

    2002-01-01

    Describes the development of a prototype Web-based database selection expert system at the University of Illinois at Urbana-Champaign that is based on reference librarians' database selection strategy which allows users to simultaneously search all available databases to identify those most relevant to their search using free-text keywords or…

  4. Binary Encoded-Prototype Tree for Probabilistic Model Building GP

    NASA Astrophysics Data System (ADS)

    Yanase, Toshihiko; Hasegawa, Yoshihiko; Iba, Hitoshi

    In recent years, program evolution algorithms based on the estimation of distribution algorithm (EDA) have been proposed to improve the search ability of genetic programming (GP) and to overcome GP-hard problems. One such method is the probabilistic prototype tree (PPT) based algorithm. The PPT-based method explores the optimal tree structure by using the full tree whose number of child nodes is the maximum among possible trees. This algorithm, however, suffers from problems arising from function nodes having different numbers of child nodes. These function nodes cause intron nodes, which do not affect the fitness function. Moreover, function nodes having many child nodes increase the search space and the number of samples necessary for properly constructing the probabilistic model. In order to solve this problem, we propose binary encoding for the PPT. In this article, we convert each function node to a subtree of binary nodes such that the converted tree remains grammatically correct. Our method reduces the ineffectual search space, and the binary encoded tree is able to express the same tree structures as the original method. The effectiveness of the proposed method is demonstrated through two computational experiments.

  5. Probabilistic Computational Methods in Structural Failure Analysis

    NASA Astrophysics Data System (ADS)

    Krejsa, Martin; Kralik, Juraj

    2015-12-01

    Probabilistic methods are used in engineering where a computational model contains random variables. Each random variable in the probabilistic calculations carries uncertainty. Typical sources of uncertainty are the properties of the material and production and/or assembly inaccuracies in the geometry or in the environment where the structure is located. The paper focuses on methods for calculating failure probabilities in structural failure and reliability analysis, with special attention to the newly developed probabilistic method Direct Optimized Probabilistic Calculation (DOProC), which is highly efficient in terms of calculation time and solution accuracy. The novelty of the proposed method lies in an optimized numerical integration that does not require any simulation technique. The algorithm has been implemented in software applications and has been used several times in probabilistic tasks and probabilistic reliability assessments.

  6. Probabilistic Aeroelastic Analysis of Turbomachinery Components

    NASA Technical Reports Server (NTRS)

    Reddy, T. S. R.; Mital, S. K.; Stefko, G. L.

    2004-01-01

    A probabilistic approach is described for the aeroelastic analysis of turbomachinery blade rows. Blade rows with subsonic flow and blade rows with supersonic flow with a subsonic leading edge are considered. To demonstrate the probabilistic approach, the flutter frequency, damping, and forced response of a blade row representing a compressor geometry are considered. The analysis accounts for uncertainties in structural and aerodynamic design variables. The results are presented in the form of probability density functions (PDFs) and sensitivity factors. For the subsonic flow cascade, comparisons are also made with different probabilistic distributions, probabilistic methods, and Monte Carlo simulation. The study shows that the probabilistic approach provides a more realistic and systematic way to assess the effect of uncertainties in design variables on aeroelastic instabilities and response.

  7. Probabilistic models of language processing and acquisition.

    PubMed

    Chater, Nick; Manning, Christopher D

    2006-07-01

    Probabilistic methods are providing new explanatory approaches to fundamental cognitive science questions of how humans structure, process and acquire language. This review examines probabilistic models defined over traditional symbolic structures. Language comprehension and production involve probabilistic inference in such models; and acquisition involves choosing the best model, given innate constraints and linguistic and other input. Probabilistic models can account for the learning and processing of language, while maintaining the sophistication of symbolic models. A recent burgeoning of theoretical developments and online corpus creation has enabled large models to be tested, revealing probabilistic constraints in processing, undermining acquisition arguments based on a perceived poverty of the stimulus, and suggesting fruitful links with probabilistic theories of categorization and ambiguity resolution in perception.

  8. The Candida genome database incorporates multiple Candida species: multispecies search and analysis tools with curated gene and protein information for Candida albicans and Candida glabrata.

    PubMed

    Inglis, Diane O; Arnaud, Martha B; Binkley, Jonathan; Shah, Prachi; Skrzypek, Marek S; Wymore, Farrell; Binkley, Gail; Miyasato, Stuart R; Simison, Matt; Sherlock, Gavin

    2012-01-01

    The Candida Genome Database (CGD, http://www.candidagenome.org/) is an internet-based resource that provides centralized access to genomic sequence data and manually curated functional information about genes and proteins of the fungal pathogen Candida albicans and other Candida species. As the scope of Candida research, and the number of sequenced strains and related species, has grown in recent years, the need for expanded genomic resources has also grown. To answer this need, CGD has expanded beyond storing data solely for C. albicans, now integrating data from multiple species. Herein we describe the incorporation of this multispecies information, which includes curated gene information and the reference sequence for C. glabrata, as well as orthology relationships that interconnect Locus Summary pages, allowing easy navigation between genes of C. albicans and C. glabrata. These orthology relationships are also used to predict GO annotations of their products. We have also added protein information pages that display domains, structural information and physicochemical properties; bibliographic pages highlighting important topic areas in Candida biology; and a laboratory strain lineage page that describes the lineage of commonly used laboratory strains. All of these data are freely available at http://www.candidagenome.org/. We welcome feedback from the research community at candida-curator@lists.stanford.edu.

  9. Physical Education Research--Computerized Databases in an Interdisciplinary Field.

    ERIC Educational Resources Information Center

    Clever, Elaine Cox; Dillard, David P.

    1993-01-01

    With the advent of online searching, the terminology available for topic searching has greatly expanded and deepened. CD-ROMs and computerized databases available through online search services offer a variety of approaches to research in physical education. The article explains the use of computerized databases in an interdisciplinary field. (SM)

  10. Typing mineral deposits using their associated rocks, grades and tonnages using a probabilistic neural network

    USGS Publications Warehouse

    Singer, D.A.

    2006-01-01

    A probabilistic neural network is employed to classify 1610 mineral deposits into 18 types using tonnage, average Cu, Mo, Ag, Au, Zn, and Pb grades, and six generalized rock types. The purpose is to examine whether neural networks might serve for integrating geoscience information available in large mineral databases to classify sites by deposit type. Successful classifications of 805 deposits not used in training - 87% with grouped porphyry copper deposits - and the nature of misclassifications demonstrate the power of probabilistic neural networks and the value of quantitative mineral-deposit models. The results also suggest that neural networks can classify deposits as well as experienced economic geologists. © International Association for Mathematical Geology 2006.
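    
    A probabilistic neural network is essentially a Parzen-window (Gaussian kernel) density estimate per class, with classification by the largest class-conditional density weighted by the class prior. A compact sketch on made-up feature vectors (not the USGS data or its 13-dimensional feature set):
    
        import numpy as np
        
        def pnn_predict(X_train, y_train, x, sigma=0.5):
            """Probabilistic neural network: Gaussian Parzen density per class,
            weighted by the empirical class prior; returns the best class."""
            scores = {}
            for c in np.unique(y_train):
                Xc = X_train[y_train == c]
                d2 = ((Xc - x) ** 2).sum(axis=1)
                scores[c] = np.exp(-d2 / (2 * sigma**2)).mean() * (len(Xc) / len(X_train))
            return max(scores, key=scores.get)
        
        # Toy stand-in for (log tonnage, Cu grade)-style features, two deposit types
        X = np.array([[1.0, 0.6], [1.2, 0.5], [3.0, 0.1], [2.8, 0.2]])
        y = np.array(["porphyry", "porphyry", "vein", "vein"])
        print(pnn_predict(X, y, np.array([1.1, 0.55])))   # -> "porphyry"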

  11. Probabilistic Assessment of Cancer Risk from Solar Particle Events

    NASA Astrophysics Data System (ADS)

    Kim, Myung-Hee Y.; Cucinotta, Francis A.

    For long duration missions outside of the protection of the Earth's magnetic field, space radiation presents significant health risks including cancer mortality. Space radiation consists of solar particle events (SPEs), comprised largely of medium energy protons (less than several hundred MeV), and galactic cosmic rays (GCR), which include high energy protons and heavy ions. While the frequency distribution of SPEs depends strongly upon the phase within the solar activity cycle, the individual SPE occurrences themselves are random in nature. We estimated the probability of SPE occurrence using a non-homogeneous Poisson model to fit the historical database of proton measurements. Distributions of particle fluences of SPEs for a specified mission period were simulated, ranging from the 5th to the 95th percentile, to assess the cancer risk distribution. Spectral variability of SPEs was also examined, because the detailed energy spectra of protons are important, especially at high energy levels, for assessing the cancer risk associated with energetic particles for large events. We estimated the overall cumulative probability of the GCR environment for a specified mission period using a solar modulation model for the temporal characterization of the GCR environment represented by the deceleration potential (φ). Probabilistic assessment of fatal cancer risk was calculated for various periods of lunar and Mars missions. This probabilistic approach to risk assessment from space radiation is in support of mission design and operational planning for future manned space exploration missions. In future work, this probabilistic approach to the space radiation will be combined with a probabilistic approach to the radiobiological factors that contribute to the uncertainties in projecting cancer risks.

  12. Probabilistic Assessment of Cancer Risk from Solar Particle Events

    NASA Technical Reports Server (NTRS)

    Kim, Myung-Hee Y.; Cucinotta, Francis A.

    2010-01-01

    For long duration missions outside of the protection of the Earth's magnetic field, space radiation presents significant health risks including cancer mortality. Space radiation consists of solar particle events (SPEs), comprised largely of medium energy protons (less than several hundred MeV), and galactic cosmic rays (GCR), which include high energy protons and heavy ions. While the frequency distribution of SPEs depends strongly upon the phase within the solar activity cycle, the individual SPE occurrences themselves are random in nature. We estimated the probability of SPE occurrence using a non-homogeneous Poisson model to fit the historical database of proton measurements. Distributions of particle fluences of SPEs for a specified mission period were simulated, ranging from the 5th to the 95th percentile, to assess the cancer risk distribution. Spectral variability of SPEs was also examined, because the detailed energy spectra of protons are important, especially at high energy levels, for assessing the cancer risk associated with energetic particles for large events. We estimated the overall cumulative probability of the GCR environment for a specified mission period using a solar modulation model for the temporal characterization of the GCR environment represented by the deceleration potential (φ). Probabilistic assessment of fatal cancer risk was calculated for various periods of lunar and Mars missions. This probabilistic approach to risk assessment from space radiation is in support of mission design and operational planning for future manned space exploration missions. In future work, this probabilistic approach to the space radiation will be combined with a probabilistic approach to the radiobiological factors that contribute to the uncertainties in projecting cancer risks.

  13. Probabilistic cloning of three nonorthogonal states

    NASA Astrophysics Data System (ADS)

    Zhang, Wen; Rui, Pinshu; Yang, Qun; Zhao, Yan; Zhang, Ziyun

    2015-04-01

    We study the probabilistic cloning of three nonorthogonal states with equal success probabilities. For simplicity, we assume that the three states belong to a special set. The analytical form of the maximal success probability for probabilistic cloning is calculated. With the maximal success probability, we deduce the explicit form of the probabilistic quantum cloning machine. In the case of cloning, we give the unambiguous form of the unitary operation. It is demonstrated that the upper bound for the probabilistic quantum cloning machine in (Qiu in J Phys A 35:6931, 2002) can be reached only if the three states are equidistant.

  14. Solubility Database

    National Institute of Standards and Technology Data Gateway

    SRD 106 IUPAC-NIST Solubility Database (Web, free access)   These solubilities are compiled from 18 volumes of the International Union of Pure and Applied Chemistry (IUPAC)-NIST Solubility Data Series. The database includes liquid-liquid, solid-liquid, and gas-liquid systems. Typical solvents and solutes include water, seawater, heavy water, inorganic compounds, and a variety of organic compounds such as hydrocarbons, halogenated hydrocarbons, alcohols, acids, esters and nitrogen compounds. There are over 67,500 solubility measurements and over 1800 references.

  15. Probabilistic Planning with Imperfect Sensing Actions Using Hybrid Probabilistic Logic Programs

    NASA Astrophysics Data System (ADS)

    Saad, Emad

    Effective planning in uncertain environments is important to agents and multi-agent systems. In this paper, we introduce a new logic-based approach to probabilistic contingent planning (probabilistic planning with imperfect sensing actions) by relating probabilistic contingent planning to normal hybrid probabilistic logic programs with probabilistic answer set semantics [24]. We show that any probabilistic contingent planning problem can be encoded as a normal hybrid probabilistic logic program. We formally prove the correctness of our approach. Moreover, we show that the complexity of finding a probabilistic contingent plan in our approach is NP-complete. In addition, we show that any probabilistic contingent planning problem PP can be encoded as a classical normal logic program with answer set semantics, whose answer sets correspond to valid trajectories in PP. We show that probabilistic contingent planning problems can be encoded as SAT problems. We present a new high-level probabilistic action description language that allows the representation of sensing actions with probabilistic outcomes.

  16. The EMBL nucleotide sequence database.

    PubMed Central

    Stoesser, G; Moseley, M A; Sleep, J; McGowran, M; Garcia-Pastor, M; Sterk, P

    1998-01-01

    The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl.html) constitutes Europe's primary nucleotide sequence resource. DNA and RNA sequences are directly submitted from researchers and genome sequencing groups and collected from the scientific literature and patent applications. In collaboration with DDBJ and GenBank, the database is produced, maintained and distributed at the European Bioinformatics Institute. Database releases are produced quarterly and are distributed on CD-ROM. EBI's network services allow access to the most up-to-date data collection via Internet and World Wide Web interfaces, providing database searching and sequence similarity facilities plus access to a large number of additional databases. PMID:9399791

  17. Working with Data: Discovering Knowledge through Mining and Analysis; Systematic Knowledge Management and Knowledge Discovery; Text Mining; Methodological Approach in Discovering User Search Patterns through Web Log Analysis; Knowledge Discovery in Databases Using Formal Concept Analysis; Knowledge Discovery with a Little Perspective.

    ERIC Educational Resources Information Center

    Qin, Jian; Jurisica, Igor; Liddy, Elizabeth D.; Jansen, Bernard J; Spink, Amanda; Priss, Uta; Norton, Melanie J.

    2000-01-01

    These six articles discuss knowledge discovery in databases (KDD). Topics include data mining; knowledge management systems; applications of knowledge discovery; text and Web mining; text mining and information retrieval; user search patterns through Web log analysis; concept analysis; data collection; and data structure inconsistency. (LRW)

  18. Probabilistic Simulation for Nanocomposite Fracture

    NASA Technical Reports Server (NTRS)

    Chamis, Christos C.

    2010-01-01

    A unique probabilistic theory is described to predict the uniaxial strengths and fracture properties of nanocomposites. The simulation is based on composite micromechanics with progressive substructuring down to a nanoscale slice of a nanofiber where all the governing equations are formulated. These equations have been programmed in a computer code. That computer code is used to simulate uniaxial strengths and fracture of a nanofiber laminate. The results are presented graphically and discussed with respect to their practical significance. These results show smooth distributions from low probability to high.

  19. Probabilistic risk assessment: Number 219

    SciTech Connect

    Bari, R.A.

    1985-11-13

    This report describes a methodology for analyzing the safety of nuclear power plants. A historical overview of plants in the US is provided, and past, present, and future nuclear safety and risk assessment are discussed. A primer on nuclear power plants is provided with a discussion of pressurized water reactors (PWR) and boiling water reactors (BWR) and their operation and containment. Probabilistic Risk Assessment (PRA), utilizing both event-tree and fault-tree analysis, is discussed as a tool in reactor safety, decision making, and communications. (FI)

  20. LQTS gene LOVD database.

    PubMed

    Zhang, Tao; Moss, Arthur; Cong, Peikuan; Pan, Min; Chang, Bingxi; Zheng, Liangrong; Fang, Quan; Zareba, Wojciech; Robinson, Jennifer; Lin, Changsong; Li, Zhongxiang; Wei, Junfang; Zeng, Qiang; Qi, Ming

    2010-11-01

    The Long QT Syndrome (LQTS) is a group of genetically heterogeneous disorders that predisposes young individuals to ventricular arrhythmias and sudden death. LQTS is mainly caused by mutations in genes encoding subunits of cardiac ion channels (KCNQ1, KCNH2, SCN5A, KCNE1, and KCNE2). Many other genes involved in LQTS have been described recently (KCNJ2, AKAP9, ANK2, CACNA1C, SCNA4B, SNTA1, and CAV3). We created an online database (http://www.genomed.org/LOVD/introduction.html) that provides information on variants in LQTS-associated genes. As of February 2010, the database contains 1738 unique variants in 12 genes. A total of 950 variants are considered pathogenic, 265 are possibly pathogenic, 131 are unknown/unclassified, and 292 have no known pathogenicity. In addition to these mutations collected from published literature, we also submitted information on gene variants, including one possible novel pathogenic mutation in the KCNH2 splice site found in ten Chinese families with documented arrhythmias. The remote user is able to search the data and is encouraged to submit new mutations into the database. The LQTS database will become a powerful tool for both researchers and clinicians. PMID:20809527

  2. Semantics for Biological Data Resource: Cell Image Database

    National Institute of Standards and Technology Data Gateway

    SRD 165 NIST Semantics for Biological Data Resource: Cell Image Database (Web, free access)   This Database is a prototype to test concepts for semantic searching of cell image data based on experimental details.

  3. Aggregated Interdisciplinary Databases and the Needs of Undergraduate Researchers

    ERIC Educational Resources Information Center

    Fister, Barbara; Gilbert, Julie; Fry, Amy Ray

    2008-01-01

    After seeing growing frustration among inexperienced undergraduate researchers searching a popular aggregated interdisciplinary database, the authors questioned whether the leading interdisciplinary databases are serving undergraduates' needs. As a preliminary exploration of this question, the authors queried vendors, analyzed their marketing…

  4. High Resolution Soil Water from Regional Databases and Satellite Images

    NASA Technical Reports Server (NTRS)

    Morris, Robin D.; Smelyanskly, Vadim N.; Coughlin, Joseph; Dungan, Jennifer; Clancy, Daniel (Technical Monitor)

    2002-01-01

    This viewgraph presentation provides information on the ways in which plant growth can be inferred from satellite data and then used to infer soil water. There are several steps in this process, the first of which is the acquisition of data from satellite observations and relevant databases such as the State Soil Geographic Database (STATSGO). Probabilistic analysis and inversion with Bayes' theorem then reveal the sources of uncertainty. The Markov chain Monte Carlo method is also used.
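    
    The inversion step amounts to Bayes' theorem: p(soil water | observation) is proportional to p(observation | soil water) times p(soil water). A grid-based sketch with hypothetical Gaussian terms (the forward model and numbers below are invented for illustration):
    
        import numpy as np
        
        theta = np.linspace(0.0, 0.5, 501)         # volumetric soil-water grid
        
        # Hypothetical Gaussian prior (e.g., from a STATSGO soil class) and a
        # linear forward model mapping soil water to an observed greenness index.
        prior = np.exp(-0.5 * ((theta - 0.20) / 0.08) ** 2)
        
        def likelihood(obs, noise=0.02):
            predicted = 0.1 + 1.5 * theta          # made-up forward model
            return np.exp(-0.5 * ((obs - predicted) / noise) ** 2)
        
        post = prior * likelihood(obs=0.46)        # Bayes' theorem, unnormalized
        post /= post.sum()                         # normalize on the grid
        print("posterior mean soil water:", round(float((theta * post).sum()), 3))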

  5. Software for Probabilistic Risk Reduction

    NASA Technical Reports Server (NTRS)

    Hensley, Scott; Michel, Thierry; Madsen, Soren; Chapin, Elaine; Rodriguez, Ernesto

    2004-01-01

    A computer program implements a methodology, denoted probabilistic risk reduction, that is intended to aid in planning the development of complex software and/or hardware systems. This methodology integrates two complementary prior methodologies: (1) that of probabilistic risk assessment and (2) a risk-based planning methodology, implemented in a prior computer program known as Defect Detection and Prevention (DDP), in which multiple requirements and the beneficial effects of risk-mitigation actions are taken into account. The present methodology and the software are able to accommodate both process knowledge (notably of the efficacy of development practices) and product knowledge (notably of the logical structure of a system, the development of which one seeks to plan). Estimates of the costs and benefits of a planned development can be derived. Functional and non-functional aspects of software can be taken into account, and trades made among them. It becomes possible to optimize the planning process in the sense that it becomes possible to select the best suite of process steps and design choices to maximize the expectation of success while remaining within budget.

  6. Probabilistic analysis of tsunami hazards

    USGS Publications Warehouse

    Geist, E.L.; Parsons, T.

    2006-01-01

    Determining the likelihood of a disaster is a key component of any comprehensive hazard assessment. This is particularly true for tsunamis, even though most tsunami hazard assessments have in the past relied on scenario or deterministic type models. We discuss probabilistic tsunami hazard analysis (PTHA) from the standpoint of integrating computational methods with empirical analysis of past tsunami runup. PTHA is derived from probabilistic seismic hazard analysis (PSHA), with the main difference being that PTHA must account for far-field sources. The computational methods rely on numerical tsunami propagation models rather than empirical attenuation relationships as in PSHA in determining ground motions. Because a number of source parameters affect local tsunami runup height, PTHA can become complex and computationally intensive. Empirical analysis can function in one of two ways, depending on the length and completeness of the tsunami catalog. For site-specific studies where there is sufficient tsunami runup data available, hazard curves can primarily be derived from empirical analysis, with computational methods used to highlight deficiencies in the tsunami catalog. For region-wide analyses and sites where there are little to no tsunami data, a computationally based method such as Monte Carlo simulation is the primary method to establish tsunami hazards. Two case studies that describe how computational and empirical methods can be integrated are presented for Acapulco, Mexico (site-specific) and the U.S. Pacific Northwest coastline (region-wide analysis).
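    
    A schematic Monte Carlo hazard curve in the PTHA spirit: sample source events from an assumed annual rate, convert each to a runup with a made-up source-to-runup relation, and count exceedances. Every number and the scaling relation below are hypothetical:
    
        import math
        import random
        
        RATE = 0.1          # hypothetical tsunamigenic events per year at the source
        YEARS = 100_000     # simulated catalog length
        
        def simulate_runup(rng):
            """Made-up source-to-runup relation with lognormal scatter."""
            m = min(7.0 + rng.expovariate(1.5), 9.5)                  # magnitude
            return 0.5 * math.exp(1.1 * (m - 7.0)) * rng.lognormvariate(0.0, 0.5)
        
        rng = random.Random(42)
        # One Bernoulli trial per year approximates a Poisson process at low rate
        n_events = sum(1 for _ in range(YEARS) if rng.random() < RATE)
        runups = [simulate_runup(rng) for _ in range(n_events)]
        
        for r0 in (0.5, 1.0, 2.0, 5.0):   # hazard curve: annual exceedance rate
            rate = sum(r > r0 for r in runups) / YEARS
            print(f"runup > {r0:>3} m: {rate:.2e} / yr")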

  7. Is Probabilistic Evidence a Source of Knowledge?

    ERIC Educational Resources Information Center

    Friedman, Ori; Turri, John

    2015-01-01

    We report a series of experiments examining whether people ascribe knowledge for true beliefs based on probabilistic evidence. Participants were less likely to ascribe knowledge for beliefs based on probabilistic evidence than for beliefs based on perceptual evidence (Experiments 1 and 2A) or testimony providing causal information (Experiment 2B).…

  8. Probabilistic Cue Combination: Less Is More

    ERIC Educational Resources Information Center

    Yurovsky, Daniel; Boyer, Ty W.; Smith, Linda B.; Yu, Chen

    2013-01-01

    Learning about the structure of the world requires learning probabilistic relationships: rules in which cues do not predict outcomes with certainty. However, in some cases, the ability to track probabilistic relationships is a handicap, leading adults to perform non-normatively in prediction tasks. For example, in the "dilution effect,"…

  9. Error Discounting in Probabilistic Category Learning

    ERIC Educational Resources Information Center

    Craig, Stewart; Lewandowsky, Stephan; Little, Daniel R.

    2011-01-01

    The assumption in some current theories of probabilistic categorization is that people gradually attenuate their learning in response to unavoidable error. However, existing evidence for this error discounting is sparse and open to alternative interpretations. We report 2 probabilistic-categorization experiments in which we investigated error…

  10. Drinking Water Treatability Database (Database)

    EPA Science Inventory

    The Drinking Water Treatability Database (TDB) will provide data taken from the literature on the control of contaminants in drinking water and will be housed on an interactive, publicly available USEPA web site. It can be used for identifying effective treatment processes, rec...

  11. Artificial Intelligence Databases: A Survey and Comparison.

    ERIC Educational Resources Information Center

    Stern, David

    1990-01-01

    Identifies and describes online databases containing references to materials on artificial intelligence, robotics, and expert systems, and compares them in terms of scope and usage. Recommendations for conducting online searches on artificial intelligence and related fields are offered. (CLB)

  12. Using Quick Search.

    ERIC Educational Resources Information Center

    Maxfield, Sandy, Ed.; Kabus, Karl, Ed.

    This document is a guide to the use of Quick Search, a library service that provides access to more than 100 databases which contain references to journal articles and other research materials through two commercial systems--BRS After/Dark and DIALOG's Knowledge Index. The guide is divided into five sections: (1) Using Quick Search; (2) The…

  13. Is probabilistic evidence a source of knowledge?

    PubMed

    Friedman, Ori; Turri, John

    2015-07-01

    We report a series of experiments examining whether people ascribe knowledge for true beliefs based on probabilistic evidence. Participants were less likely to ascribe knowledge for beliefs based on probabilistic evidence than for beliefs based on perceptual evidence (Experiments 1 and 2A) or testimony providing causal information (Experiment 2B). Denial of knowledge for beliefs based on probabilistic evidence did not arise because participants viewed such beliefs as unjustified, nor because such beliefs leave open the possibility of error. These findings rule out traditional philosophical accounts for why probabilistic evidence does not produce knowledge. The experiments instead suggest that people deny knowledge because they distrust drawing conclusions about an individual based on reasoning about the population to which it belongs, a tendency previously identified by "judgment and decision making" researchers. Consistent with this, participants were more willing to ascribe knowledge for beliefs based on probabilistic evidence that is specific to a particular case (Experiments 3A and 3B).

  14. A World Wide Web (WWW) server database engine for an organelle database, MitoDat.

    PubMed

    Lemkin, P F; Chipperfield, M; Merril, C; Zullo, S

    1996-03-01

    We describe a simple database search engine "dbEngine" which may be used to quickly create a searchable database on a World Wide Web (WWW) server. Data may be prepared from spreadsheet programs (such as Excel, etc.) or from tables exported from relational database systems. This Common Gateway Interface (CGI-BIN) program is used with a WWW server such as those available commercially, or from the National Center for Supercomputing Applications (NCSA) or CERN. Its capabilities include: (i) searching records by combinations of terms connected with ANDs or ORs; (ii) returning search results as hypertext links to other WWW database servers; (iii) mapping lists of literature reference identifiers to the full references; (iv) creating bidirectional hypertext links between pictures and the database. DbEngine has been used to support the MitoDat database (Mendelian and non-Mendelian inheritance associated with the mitochondrion) on the WWW.
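    
    The AND/OR term search in capability (i) reduces to simple predicate combination over records. A minimal sketch of such a search (record layout and field names are hypothetical):
    
        records = [
            {"id": 1, "gene": "ATP5A1", "keywords": "mitochondrion atp synthase"},
            {"id": 2, "gene": "MT-CO1", "keywords": "mitochondrion cytochrome oxidase"},
            {"id": 3, "gene": "GAPDH",  "keywords": "glycolysis cytoplasm"},
        ]
        
        def search(records, terms, mode="AND"):
            """Return records whose keyword field contains all (AND) or any (OR) terms."""
            op = all if mode == "AND" else any
            return [r for r in records
                    if op(t.lower() in r["keywords"] for t in terms)]
        
        print([r["gene"] for r in search(records, ["mitochondrion", "oxidase"], "AND")])
        print([r["gene"] for r in search(records, ["glycolysis", "oxidase"], "OR")])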

  15. The YH database: the first Asian diploid genome database.

    PubMed

    Li, Guoqing; Ma, Lijia; Song, Chao; Yang, Zhentao; Wang, Xiulan; Huang, Hui; Li, Yingrui; Li, Ruiqiang; Zhang, Xiuqing; Yang, Huanming; Wang, Jian; Wang, Jun

    2009-01-01

    The YH database is a server that allows the user to easily browse and download data from the first Asian diploid genome. The aim of this platform is to facilitate the study of this Asian genome and to enable improved organization and presentation of large-scale personal genome data. Powered by GBrowse, we illustrate here the genome sequences, SNPs, and sequencing reads in the MapView. The relationships between phenotype and genotype can be searched by location, dbSNP ID, HGMD ID, gene symbol and disease name. A BLAST web service is also provided for the purpose of aligning query sequences against the YH genome consensus. The YH database is currently one of the three personal genome databases, organizing the original data and analysis results in a user-friendly interface, an endeavor toward the fundamental goals of establishing personalized medicine. The database is available at http://yh.genomics.org.cn.

  16. Databases Online and on CD-ROM: How Do They Differ, Let Us Count the Ways.

    ERIC Educational Resources Information Center

    Nahl-Jakobovits, Diane; Tenopir, Carol

    1992-01-01

    Discussion of the use of CD-ROM databases for end-user searching versus online database searching in academic libraries focuses on a comparison of two databases, "Psychological Abstracts" and "Sociological Abstracts," that were searched in both formats. Factors of response time, coverage, content, and cost were investigated for each version. (12…

  17. Zero Result Searches. . . How to Minimize Them.

    ERIC Educational Resources Information Center

    Atkinson, Steve

    1986-01-01

    Based on manual observation of 187 zero-result online searches at a university library, this article addresses three types of problems that can produce such search results: multiple database searching, topic negotiation, and database availability. A summary of conceptual and practical recommendations for searchers is provided. (6 references) (EJS)

  18. The Weaknesses of Full-Text Searching

    ERIC Educational Resources Information Center

    Beall, Jeffrey

    2008-01-01

    This paper provides a theoretical critique of the deficiencies of full-text searching in academic library databases. Because full-text searching relies on matching words in a search query with words in online resources, it is an inefficient method of finding information in a database. This matching fails to retrieve synonyms, and it also retrieves…

  19. Hierarchies of Indices for Text Searching.

    ERIC Educational Resources Information Center

    Baeza-Yates, Ricardo; And Others

    1996-01-01

    Discusses indexes for text databases and presents an efficient implementation of an index for text searching called PAT array, or suffix array, where the database is stored on secondary storage devices such as magnetic or optical disks. Additional hierarchical index structures and searching algorithms are proposed that improve searching time, and…
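    
    A suffix (PAT) array is simply the lexicographically sorted array of suffix start positions, searched by binary search. An in-memory sketch follows (the article's contribution concerns hierarchical layouts on secondary storage built on top of this idea):
    
        from bisect import bisect_left, bisect_right  # key= requires Python 3.10+
        
        def build_suffix_array(text):
            """Positions of all suffixes of `text`, sorted lexicographically."""
            return sorted(range(len(text)), key=lambda i: text[i:])
        
        def occurrences(text, sa, pattern):
            """All start positions of `pattern`, via binary search on the suffix array."""
            lo = bisect_left(sa, pattern, key=lambda i: text[i:i + len(pattern)])
            hi = bisect_right(sa, pattern, key=lambda i: text[i:i + len(pattern)])
            return sorted(sa[lo:hi])
        
        text = "abracadabra"
        sa = build_suffix_array(text)
        print(occurrences(text, sa, "abra"))   # [0, 7]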

  20. A Probabilistic Model for Reducing Medication Errors

    PubMed Central

    Nguyen, Phung Anh; Syed-Abdul, Shabbir; Iqbal, Usman; Hsu, Min-Huei; Huang, Chen-Ling; Li, Hsien-Chang; Clinciu, Daniel Livius; Jian, Wen-Shan; Li, Yu-Chuan Jack

    2013-01-01

    Background Medication errors are common, life threatening, costly, but preventable. Information technology and automated systems are highly efficient for preventing medication errors and are therefore widely employed in hospital settings. The aim of this study was to construct a probabilistic model that can reduce medication errors by identifying uncommon or rare associations between medications and diseases. Methods and Finding(s) Association rule mining techniques were applied to 103.5 million prescriptions from Taiwan's National Health Insurance database. The dataset included 204.5 million diagnoses with ICD9-CM codes and 347.7 million medications with ATC codes. Disease-Medication (DM) and Medication-Medication (MM) associations were computed from their co-occurrence, and the strength of the associations was measured by the interestingness or lift values, referred to as Q values. The DMQs and MMQs were used to develop the AOP model to predict the appropriateness of a given prescription. The model was validated by comparing the evaluations performed by the AOP model with those of human experts. The results showed 96% accuracy for appropriate and 45% accuracy for inappropriate prescriptions, with a sensitivity and specificity of 75.9% and 89.5%, respectively. Conclusions We successfully developed the AOP model as an efficient tool for automatic identification of uncommon or rare associations between disease-medication and medication-medication pairs in prescriptions. The AOP model helps to reduce medication errors by alerting physicians, improving patients' safety and the overall quality of care. PMID:24312659
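    
    The interestingness (lift, Q) of a disease-medication pair is the observed co-occurrence divided by the co-occurrence expected under independence; values near 0 flag rare, possibly erroneous pairings. A small sketch over hypothetical prescription records:
    
        prescriptions = [
            {"dx": {"diabetes"}, "rx": {"metformin"}},
            {"dx": {"diabetes"}, "rx": {"metformin", "aspirin"}},
            {"dx": {"hypertension"}, "rx": {"amlodipine"}},
            {"dx": {"diabetes", "hypertension"}, "rx": {"metformin", "amlodipine"}},
        ]
        
        def lift(dx, rx, data):
            """Q-like value: P(dx & rx) / (P(dx) * P(rx)); > 1 means associated."""
            n = len(data)
            p_dx = sum(dx in r["dx"] for r in data) / n
            p_rx = sum(rx in r["rx"] for r in data) / n
            p_both = sum(dx in r["dx"] and rx in r["rx"] for r in data) / n
            return p_both / (p_dx * p_rx)
        
        print(lift("diabetes", "metformin", prescriptions))      # 1.33: common pairing
        print(lift("hypertension", "metformin", prescriptions))  # 0.67: weaker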

  1. Windows on the brain: the emerging role of atlases and databases in neuroscience

    NASA Technical Reports Server (NTRS)

    Van Essen, David C.; VanEssen, D. C. (Principal Investigator)

    2002-01-01

    Brain atlases and associated databases have great potential as gateways for navigating, accessing, and visualizing a wide range of neuroscientific data. Recent progress towards realizing this potential includes the establishment of probabilistic atlases, surface-based atlases and associated databases, combined with improvements in visualization capabilities and internet access.

  2. Electronic Reference Library: Silverplatter's Database Networking Solution.

    ERIC Educational Resources Information Center

    Millea, Megan

    Silverplatter's Electronic Reference Library (ERL) provides wide area network access to its databases using TCP/IP communications and client-server architecture. ERL has two main components: The ERL clients (retrieval interface) and the ERL server (search engines). ERL clients provide patrons with seamless access to multiple databases on multiple…

  3. Six Online Periodical Databases: A Librarian's View.

    ERIC Educational Resources Information Center

    Willems, Harry

    1999-01-01

    Compares the following World Wide Web-based periodical databases, focusing on their usefulness in K-12 school libraries: EBSCO, Electric Library, Facts on File, SIRS, Wilson, and UMI. Search interfaces, display options, help screens, printing, home access, copyright restrictions, database administration, and making a decision are discussed. A…

  4. Web Database Development: Implications for Academic Publishing.

    ERIC Educational Resources Information Center

    Fernekes, Bob

    This paper discusses the preliminary planning, design, and development of a pilot project to create an Internet accessible database and search tool for locating and distributing company data and scholarly work. Team members established four project objectives: (1) to develop a Web accessible database and decision tool that creates Web pages on the…

  5. RLIN Special Databases: Serving the Humanist.

    ERIC Educational Resources Information Center

    Muratori, Fred

    1990-01-01

    Describes online databases available through the Research Libraries Information Network (RLIN) that focus on research in the humanities. A survey of five databases (AVERY, ESTC, RIPD, SCIPIO, and AMC) is presented that includes contents, search capabilities, and record formats; and future plans are discussed. (12 references) (LRW)

  6. Probabilistic cloning of equidistant states

    SciTech Connect

    Jimenez, O.; Roa, Luis; Delgado, A.

    2010-08-15

    We study the probabilistic cloning of equidistant states. These states are such that the inner product between them is a complex constant or its conjugate. Thereby, it is possible to study their cloning in a simple way. In particular, we are interested in the behavior of the cloning probability as a function of the phase of the overlap among the involved states. We show that for certain families of equidistant states Duan and Guo's cloning machine leads to cloning probabilities lower than the optimal unambiguous discrimination probability of equidistant states. We propose an alternative cloning machine whose cloning probability is higher than or equal to the optimal unambiguous discrimination probability for any family of equidistant states. Both machines achieve the same probability for equidistant states whose inner product is a positive real number.

  7. Probabilistic Fatigue Damage Program (FATIG)

    NASA Technical Reports Server (NTRS)

    Michalopoulos, Constantine

    2012-01-01

    FATIG computes fatigue damage/fatigue life using the stress rms (root-mean-square) value, the total number of cycles, and S-N curve parameters. The damage is computed by the following methods: (a) the traditional method using Miner's rule with stress cycles determined from a Rayleigh distribution up to 3*sigma; and (b) the classical fatigue damage formula involving the Gamma function, which is derived from the integral version of Miner's rule. The integration is carried out over all stress amplitudes. This software solves the problem of probabilistic fatigue damage using the integral form of the Palmgren-Miner rule. The software computes fatigue life using an approach involving all stress amplitudes, up to N*sigma, as specified by the user. It can be used in the design of structural components subjected to random dynamic loading, or by any stress analyst with minimal training for fatigue life estimates of structural components.
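    
    For reference, the closed form behind method (b) for a narrowband Gaussian stress process is the standard textbook result (a hedged reconstruction; FATIG's exact parameterization may differ). With S-N curve N(S) = A*S^(-b), zero-crossing rate nu_0, duration T, and stress rms sigma, Rayleigh-distributed amplitudes give, in LaTeX notation:
    
        \mathrm{E}[D] \;=\; \frac{\nu_0\,T}{A}\,\bigl(\sqrt{2}\,\sigma\bigr)^{b}\,\Gamma\!\left(1 + \frac{b}{2}\right)
    
    Method (a), which truncates the Rayleigh distribution at 3*sigma, omits the high-amplitude tail that the Gamma-function form integrates in full, and therefore predicts somewhat less damage.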

  8. Introducing a probabilistic Budyko framework

    NASA Astrophysics Data System (ADS)

    Greve, P.; Gudmundsson, L.; Orlowsky, B.; Seneviratne, S. I.

    2015-04-01

    Water availability is of importance for a wide range of ecological, climatological, and socioeconomic applications. Over land, the partitioning of precipitation into evapotranspiration and runoff essentially determines the availability of water. At mean annual catchment scales, the widely used Budyko framework provides a simple, deterministic, first-order relationship to estimate this partitioning as a function of the prevailing climatic conditions. Here we extend the framework by introducing a method to specify probabilistic estimates of water availability that account for the nonlinearity of the underlying phase space. The new framework allows evaluation of the predictability of water availability that is related to varying catchment characteristics and conditional on the underlying climatic conditions. Corresponding results support the practical experience of low predictability of river runoff in transitional climates.
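    
    One widely used parametric form of the Budyko curve is Fu's equation (assumed here for illustration; the probabilistic framework in the abstract is more general). It relates mean annual evapotranspiration E, precipitation P, and potential evaporation E_p through a catchment parameter omega, in LaTeX notation:
    
        \frac{E}{P} \;=\; 1 + \frac{E_p}{P} - \left[\,1 + \left(\frac{E_p}{P}\right)^{\omega}\right]^{1/\omega}
    
    Treating omega as a random variable conditioned on climate turns this deterministic curve into a distribution of evaporative fractions, which is one way to read the probabilistic extension described above.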

  9. Spectroscopic data for an astronomy database

    NASA Technical Reports Server (NTRS)

    Parkinson, W. H.; Smith, Peter L.

    1995-01-01

    Very few of the atomic and molecular data used in analyses of astronomical spectra are currently available in World Wide Web (WWW) databases that are searchable with hypertext browsers. We have begun to rectify this situation by making extensive atomic data files available with simple search procedures. We have also established links to other on-line atomic and molecular databases. All can be accessed from our database homepage with URL: http://cfa-www.harvard.edu/amp/data/amdata.html.

  10. The PIR-International Protein Sequence Database.

    PubMed

    Barker, W C; Garavelli, J S; McGarvey, P B; Marzec, C R; Orcutt, B C; Srinivasarao, G Y; Yeh, L S; Ledley, R S; Mewes, H W; Pfeiffer, F; Tsugita, A; Wu, C

    1999-01-01

    The Protein Information Resource (PIR; http://www-nbrf.georgetown.edu/pir/) supports research on molecular evolution, functional genomics, and computational biology by maintaining a comprehensive, non-redundant, well-organized and freely available protein sequence database. Since 1988 the database has been maintained collaboratively by PIR-International, an international association of data collection centers cooperating to develop this resource during a period of explosive growth in new sequence data and new computer technologies. The PIR Protein Sequence Database entries are classified into superfamilies, families and homology domains, for which sequence alignments are available. Full-scale family classification supports comparative genomics research, aids sequence annotation, assists database organization and improves database integrity. The PIR WWW server supports direct on-line sequence similarity searches, information retrieval, and knowledge discovery by providing the Protein Sequence Database and other supplementary databases. Sequence entries are extensively cross-referenced and hypertext-linked to major nucleic acid, literature, genome, structure, sequence alignment and family databases. The weekly release of the Protein Sequence Database can be accessed through the PIR Web site. The quarterly release of the database is freely available from our anonymous FTP server and is also available on CD-ROM with the accompanying ATLAS database search program.

  11. A database/knowledge structure for a robotics vision system

    NASA Technical Reports Server (NTRS)

    Dearholt, D. W.; Gonzales, N. N.

    1987-01-01

    Desirable properties of robotics vision database systems are given, and structures which possess properties appropriate for some aspects of such database systems are examined. Included in the structures discussed is a family of networks in which link membership is determined by measures of proximity between pairs of the entities stored in the database. This type of network is shown to have properties which guarantee that the search for a matching feature vector is monotonic. That is, the database can be searched with no backtracking, if there is a feature vector in the database which matches the feature vector of the external entity which is to be identified. The construction of the database is discussed, and the search procedure is presented. A section on the support provided by the database for description of the decision-making processes and the search path is also included.
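
    A minimal sketch of the kind of monotonic, backtrack-free search described, assuming stored feature vectors and a precomputed adjacency list of proximity links; the link construction used here (plain k-nearest neighbors) is illustrative and does not by itself guarantee the monotonicity property the authors establish for their networks:

    ```python
    import numpy as np

    def greedy_proximity_search(query, vectors, neighbors, start=0):
        """Walk the proximity network: move to whichever linked neighbor is
        closer to the query; stop when no neighbor improves. On a suitably
        constructed network, the distance to the query never increases, so
        the search needs no backtracking."""
        current = start
        best = np.linalg.norm(vectors[current] - query)
        improved = True
        while improved:
            improved = False
            for nb in neighbors[current]:
                d = np.linalg.norm(vectors[nb] - query)
                if d < best:
                    current, best, improved = nb, d, True
                    break
        return current, best

    rng = np.random.default_rng(0)
    vecs = rng.normal(size=(200, 8))                 # stored feature vectors
    dist = np.linalg.norm(vecs[:, None] - vecs[None], axis=-1)
    neighbors = {i: list(np.argsort(dist[i])[1:6]) for i in range(len(vecs))}
    print(greedy_proximity_search(rng.normal(size=8), vecs, neighbors))
    ```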

  12. Development of probabilistic multimedia multipathway computer codes.

    SciTech Connect

    Yu, C.; LePoire, D.; Gnanapragasam, E.; Arnish, J.; Kamboj, S.; Biwer, B. M.; Cheng, J.-J.; Zielen, A. J.; Chen, S. Y.; Mo, T.; Abu-Eid, R.; Thaggard, M.; Sallo, A., III.; Peterson, H., Jr.; Williams, W. A.; Environmental Assessment; NRC; EM

    2002-01-01

    The deterministic multimedia dose/risk assessment codes RESRAD and RESRAD-BUILD have been widely used for many years for evaluation of sites contaminated with residual radioactive materials. The RESRAD code applies to the cleanup of sites (soils) and the RESRAD-BUILD code applies to the cleanup of buildings and structures. This work describes the procedure used to enhance the deterministic RESRAD and RESRAD-BUILD codes for probabilistic dose analysis. A six-step procedure was used in developing default parameter distributions and the probabilistic analysis modules. These six steps include (1) listing and categorizing parameters; (2) ranking parameters; (3) developing parameter distributions; (4) testing parameter distributions for probabilistic analysis; (5) developing probabilistic software modules; and (6) testing probabilistic modules and integrated codes. The procedures used can be applied to the development of other multimedia probabilistic codes. The probabilistic versions of RESRAD and RESRAD-BUILD codes provide tools for studying the uncertainty in dose assessment caused by uncertain input parameters. The parameter distribution data collected in this work can also be applied to other multimedia assessment tasks and multimedia computer codes.
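
    As a toy illustration of steps (3)-(5), assuming stand-in parameter distributions and a placeholder dose model (not the actual RESRAD parameters or equations), a probabilistic module can wrap a deterministic calculation with stratified sampling:

    ```python
    import numpy as np
    from scipy.stats import qmc, lognorm, uniform

    sampler = qmc.LatinHypercube(d=2, seed=1)
    u = sampler.random(n=1000)                    # stratified uniforms in [0, 1)^2
    Kd = lognorm(s=1.0, scale=50.0).ppf(u[:, 0])  # hypothetical soil distribution coefficient
    infil = uniform(0.1, 0.4).ppf(u[:, 1])        # hypothetical infiltration rate, 0.1-0.5 m/yr

    def toy_dose(Kd, infil):
        return 100.0 * infil / (1.0 + Kd)         # placeholder deterministic dose response

    doses = toy_dose(Kd, infil)
    print(f"mean dose {doses.mean():.2f}, 95th percentile {np.quantile(doses, 0.95):.2f}")
    ```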

  13. Search Engines for Tomorrow's Scholars

    ERIC Educational Resources Information Center

    Fagan, Jody Condit

    2011-01-01

    Today's scholars face an outstanding array of choices when choosing search tools: Google Scholar, discipline-specific abstracts and index databases, library discovery tools, and more recently, Microsoft's re-launch of their academic search tool, now dubbed Microsoft Academic Search. What are these tools' strengths for the emerging needs of…

  14. (Meta)Search like Google

    ERIC Educational Resources Information Center

    Rochkind, Jonathan

    2007-01-01

    The ability to search and receive results in more than one database through a single interface--or metasearch--is something many users want. Google Scholar--the search engine of specifically scholarly content--and library metasearch products like Ex Libris's MetaLib, Serials Solution's Central Search, WebFeat, and products based on MuseGlobal used…

  15. Analytical Methods for Online Searching.

    ERIC Educational Resources Information Center

    Vigil, Peter J.

    1983-01-01

    Analytical methods for facilitating comparison of multiple sets during online searching are illustrated by description of specific searching methods that eliminate duplicate citations and a factoring procedure based on syntactic relationships that establishes ranked sets. Searches executed in National Center for Mental Health database on…

  16. Curcumin Resource Database

    PubMed Central

    Kumar, Anil; Chetia, Hasnahana; Sharma, Swagata; Kabiraj, Debajyoti; Talukdar, Narayan Chandra; Bora, Utpal

    2015-01-01

    Curcumin is one of the most intensively studied diarylheptanoids, Curcuma longa being its principal producer. Apart from this, a class of promising curcumin analogs, aptly named curcuminoids, has been generated in laboratories and is showing huge potential in the fields of medicine, food technology, etc. The lack of a universal source of data on curcumin as well as curcuminoids has long been felt by the curcumin research community. Hence, in an attempt to address this stumbling block, we have developed the Curcumin Resource Database (CRDB), which aims to serve as a gateway-cum-repository for all relevant data and related information on curcumin and its analogs. Currently, this database encompasses 1186 curcumin analogs, 195 molecular targets, 9075 peer-reviewed publications, 489 patents and 176 varieties of C. longa, obtained by extensive data mining and careful curation from numerous sources. Each data entry is identified by a unique CRDB ID (identifier). Furnished with a user-friendly web interface and an in-built search engine, CRDB provides well-curated and cross-referenced information that is hyperlinked to external sources. CRDB is expected to be highly useful to researchers working on structure- as well as ligand-based molecular design of curcumin analogs. Database URL: http://www.crdb.in PMID:26220923

  17. Probabilistic population projections with migration uncertainty

    PubMed Central

    Azose, Jonathan J.; Ševčíková, Hana; Raftery, Adrian E.

    2016-01-01

    We produce probabilistic projections of population for all countries based on probabilistic projections of fertility, mortality, and migration. We compare our projections to those from the United Nations’ Probabilistic Population Projections, which uses similar methods for fertility and mortality but deterministic migration projections. We find that uncertainty in migration projection is a substantial contributor to uncertainty in population projections for many countries. Prediction intervals for the populations of Northern America and Europe are over 70% wider, whereas prediction intervals for the populations of Africa, Asia, and the world as a whole are nearly unchanged. Out-of-sample validation shows that the model is reasonably well calibrated. PMID:27217571

  18. Probabilistic cognition in two indigenous Mayan groups.

    PubMed

    Fontanari, Laura; Gonzalez, Michel; Vallortigara, Giorgio; Girotto, Vittorio

    2014-12-01

    Is there a sense of chance shared by all individuals, regardless of their schooling or culture? To test whether the ability to make correct probabilistic evaluations depends on educational and cultural guidance, we investigated probabilistic cognition in preliterate and prenumerate Kaqchikel and K'iche', two indigenous Mayan groups, living in remote areas of Guatemala. Although the tested individuals had no formal education, they performed correctly in tasks in which they had to consider prior and posterior information, proportions and combinations of possibilities. Their performance was indistinguishable from that of Mayan school children and Western controls. Our results provide evidence for the universal nature of probabilistic cognition.

  19. Probabilistic machine learning and artificial intelligence.

    PubMed

    Ghahramani, Zoubin

    2015-05-28

    How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery.

  1. Probabilistic machine learning and artificial intelligence

    NASA Astrophysics Data System (ADS)

    Ghahramani, Zoubin

    2015-05-01

    How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery.

  2. Probabilistic cognition in two indigenous Mayan groups

    PubMed Central

    Fontanari, Laura; Gonzalez, Michel; Vallortigara, Giorgio; Girotto, Vittorio

    2014-01-01

    Is there a sense of chance shared by all individuals, regardless of their schooling or culture? To test whether the ability to make correct probabilistic evaluations depends on educational and cultural guidance, we investigated probabilistic cognition in preliterate and prenumerate Kaqchikel and K’iche’, two indigenous Mayan groups, living in remote areas of Guatemala. Although the tested individuals had no formal education, they performed correctly in tasks in which they had to consider prior and posterior information, proportions and combinations of possibilities. Their performance was indistinguishable from that of Mayan school children and Western controls. Our results provide evidence for the universal nature of probabilistic cognition. PMID:25368160

  3. Models And Results Database System.

    2001-03-27

    Version 00 MAR-D 4.16 is a program that is used primarily for Probabilistic Risk Assessment (PRA) data loading. This program defines a common relational database structure that is used by other PRA programs. This structure allows all of the software to access and manipulate data created by other software in the system without performing a lengthy conversion. The MAR-D program also provides the facilities for loading and unloading of PRA data from the relational database structure used to store the data to an ASCII format for interchange with other PRA software. The primary function of MAR-D is to create a data repository for NUREG-1150 and other permanent data by providing input, conversion, and output capabilities for data used by IRRAS, SARA, SETS and FRANTIC.

  4. ELNET--The Electronic Library Database System.

    ERIC Educational Resources Information Center

    King, Shirley V.

    1991-01-01

    ELNET (Electronic Library Network), a Japanese language database, allows searching of index terms and free text terms from articles and stores the full text of the articles on an optical disc system. Users can order fax copies of the text from the optical disc. This article also explains online searching and discusses machine translation. (LRW)

  5. Probabilistic and Non-probabilistic Synthetic Reliability Model for Space Structures

    NASA Astrophysics Data System (ADS)

    Hong, Dongpao; Hu, Xiao; Zhang, Jing

    2016-07-01

    As an alternative to reliability analysis, the non-probabilistic model is an effective supplement when the interval information exists. We describe the uncertain parameters of the structures with interval variables, and establish a non-probabilistic reliability model of structures. Then, we analyze the relation between the typical interference mode and the reliability according to the structure stress-strength interference model, and propose a new measure of structure non-probabilistic reliability. Furthermore we describe other uncertain parameters with random variables when probabilistic information also exists. For the complex structures including both random variables and interval variables, we propose a probabilistic and non-probabilistic synthetic reliability model. The illustrative example shows that the presented model is feasible for structure reliability analysis and design.
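
    For the interval part, a common measure from the non-probabilistic reliability literature (the authors' synthetic model, which mixes random and interval variables, is more elaborate) rates the limit state M = strength - stress by the ratio of its interval center to its interval radius:

    ```python
    def interval_reliability(strength, stress):
        """Non-probabilistic reliability index for M = strength - stress with
        interval inputs (lo, hi). eta > 1: M > 0 over the whole interval
        (reliable); eta < -1: failure is certain; between -1 and 1: uncertain."""
        m_lo = strength[0] - stress[1]
        m_hi = strength[1] - stress[0]
        center, radius = 0.5 * (m_hi + m_lo), 0.5 * (m_hi - m_lo)
        return center / radius

    print(interval_reliability((520.0, 580.0), (380.0, 450.0)))  # ~2.08, reliable
    ```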

  6. The RIKEN integrated database of mammals

    PubMed Central

    Masuya, Hiroshi; Makita, Yuko; Kobayashi, Norio; Nishikata, Koro; Yoshida, Yuko; Mochizuki, Yoshiki; Doi, Koji; Takatsuki, Terue; Waki, Kazunori; Tanaka, Nobuhiko; Ishii, Manabu; Matsushima, Akihiro; Takahashi, Satoshi; Hijikata, Atsushi; Kozaki, Kouji; Furuichi, Teiichi; Kawaji, Hideya; Wakana, Shigeharu; Nakamura, Yukio; Yoshiki, Atsushi; Murata, Takehide; Fukami-Kobayashi, Kaoru; Mohan, Sujatha; Ohara, Osamu; Hayashizaki, Yoshihide; Mizoguchi, Riichiro; Obata, Yuichi; Toyoda, Tetsuro

    2011-01-01

    The RIKEN integrated database of mammals (http://scinets.org/db/mammal) is the official undertaking to integrate its mammalian databases produced from multiple large-scale programs that have been promoted by the institute. The database integrates not only RIKEN’s original databases, such as FANTOM, the ENU mutagenesis program, the RIKEN Cerebellar Development Transcriptome Database and the Bioresource Database, but also imported data from public databases, such as Ensembl, MGI and biomedical ontologies. Our integrated database has been implemented on the infrastructure of publication medium for databases, termed SciNetS/SciNeS, or the Scientists’ Networking System, where the data and metadata are structured as a semantic web and are downloadable in various standardized formats. The top-level ontology-based implementation of mammal-related data directly integrates the representative knowledge and individual data records in existing databases to ensure advanced cross-database searches and reduced unevenness of the data management operations. Through the development of this database, we propose a novel methodology for the development of standardized comprehensive management of heterogeneous data sets in multiple databases to improve the sustainability, accessibility, utility and publicity of the data of biomedical information. PMID:21076152

  7. The RIKEN integrated database of mammals.

    PubMed

    Masuya, Hiroshi; Makita, Yuko; Kobayashi, Norio; Nishikata, Koro; Yoshida, Yuko; Mochizuki, Yoshiki; Doi, Koji; Takatsuki, Terue; Waki, Kazunori; Tanaka, Nobuhiko; Ishii, Manabu; Matsushima, Akihiro; Takahashi, Satoshi; Hijikata, Atsushi; Kozaki, Kouji; Furuichi, Teiichi; Kawaji, Hideya; Wakana, Shigeharu; Nakamura, Yukio; Yoshiki, Atsushi; Murata, Takehide; Fukami-Kobayashi, Kaoru; Mohan, Sujatha; Ohara, Osamu; Hayashizaki, Yoshihide; Mizoguchi, Riichiro; Obata, Yuichi; Toyoda, Tetsuro

    2011-01-01

    The RIKEN integrated database of mammals (http://scinets.org/db/mammal) is the official undertaking to integrate its mammalian databases produced from multiple large-scale programs that have been promoted by the institute. The database integrates not only RIKEN's original databases, such as FANTOM, the ENU mutagenesis program, the RIKEN Cerebellar Development Transcriptome Database and the Bioresource Database, but also imported data from public databases, such as Ensembl, MGI and biomedical ontologies. Our integrated database has been implemented on the infrastructure of publication medium for databases, termed SciNetS/SciNeS, or the Scientists' Networking System, where the data and metadata are structured as a semantic web and are downloadable in various standardized formats. The top-level ontology-based implementation of mammal-related data directly integrates the representative knowledge and individual data records in existing databases to ensure advanced cross-database searches and reduced unevenness of the data management operations. Through the development of this database, we propose a novel methodology for the development of standardized comprehensive management of heterogeneous data sets in multiple databases to improve the sustainability, accessibility, utility and publicity of the data of biomedical information.

  8. Knowledge Abstraction in Chinese Chess Endgame Databases

    NASA Astrophysics Data System (ADS)

    Chen, Bo-Nian; Liu, Pangfeng; Hsu, Shun-Chin; Hsu, Tsan-Sheng

    Retrograde analysis is a well-known approach to constructing endgame databases. However, endgame databases are too large to be loaded into the main memory of a computer during tournaments. In this paper, a novel knowledge abstraction strategy is proposed to compress endgame databases. The goal is to obtain succinct knowledge for practical endgames. A specialized goal-oriented search method is described and applied to the important endgame KRKNMM. Combining a search algorithm with a small body of knowledge makes it possible to handle endgame positions up to a limited depth, but with a high degree of correctness.
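
    A generic sketch of the retrograde pass, with game-specific move generation supplied by the caller; terminal conventions (e.g., how stalemate is scored) vary by game and are omitted:

    ```python
    from collections import deque

    def retrograde(positions, successors, predecessors, is_mate):
        """Label positions ('win', n) / ('loss', n): the side to move wins or
        loses in n plies. Positions never labeled are draws."""
        label = {p: ("loss", 0) for p in positions if is_mate(p)}
        frontier = deque(label)
        while frontier:
            p = frontier.popleft()
            kind, n = label[p]
            for q in predecessors(p):
                if q in label:
                    continue
                if kind == "loss":
                    label[q] = ("win", n + 1)      # q can move into a lost position
                    frontier.append(q)
                else:
                    succ = list(successors(q))     # q loses only if every move wins for the opponent
                    if succ and all(label.get(s, ("?",))[0] == "win" for s in succ):
                        label[q] = ("loss", 1 + max(label[s][1] for s in succ))
                        frontier.append(q)
        return label
    ```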

  9. Effects of distributed database modeling on evaluation of transaction rollbacks

    NASA Technical Reports Server (NTRS)

    Mukkamala, Ravi

    1991-01-01

    Data distribution, degree of data replication, and transaction access patterns are key factors in determining the performance of distributed database systems. In order to simplify the evaluation of performance measures, database designers and researchers tend to make simplistic assumptions about the system. The effect of such modeling assumptions is studied for the evaluation of one such measure, the number of transaction rollbacks, in a partitioned distributed database system. Six probabilistic models are developed, and expressions for the number of rollbacks are derived under each of these models. Essentially, the models differ in terms of the available system information. The analytical results so obtained are compared to results from simulation. It is concluded that most of the probabilistic models yield overly conservative estimates of the number of rollbacks. The effect of transaction commutativity on system throughput is also grossly underestimated when such models are employed.

  10. The Eruption Forecasting Information System (EFIS) database project

    NASA Astrophysics Data System (ADS)

    Ogburn, Sarah; Harpel, Chris; Pesicek, Jeremy; Wellik, Jay; Pallister, John; Wright, Heather

    2016-04-01

    The Eruption Forecasting Information System (EFIS) project is a new initiative of the U.S. Geological Survey-USAID Volcano Disaster Assistance Program (VDAP) with the goal of enhancing VDAP's ability to forecast the outcome of volcanic unrest. The EFIS project seeks to: (1) move away from relying on collective memory toward probability estimation using databases; (2) create databases useful for pattern recognition and for answering common VDAP questions, e.g., how commonly does unrest lead to eruption? how commonly do phreatic eruptions portend magmatic eruptions, and what is the range of antecedence times?; (3) create generic probabilistic event trees using global data for different volcano 'types'; (4) create background, volcano-specific probabilistic event trees for frequently active or particularly hazardous volcanoes in advance of a crisis; and (5) quantify and communicate uncertainty in probabilities. A major component of the project is the global EFIS relational database, which contains multiple modules designed to aid in the construction of probabilistic event trees and to answer common questions that arise during volcanic crises. The primary module contains chronologies of volcanic unrest, including the timing of phreatic eruptions, column heights, eruptive products, etc., and will be initially populated using chronicles of eruptive activity from Alaskan volcanic eruptions in the GeoDIVA database (Cameron et al. 2013). This database module allows us to query across other global databases such as the WOVOdat database of monitoring data and the Smithsonian Institution's Global Volcanism Program (GVP) database of eruptive histories and volcano information. The EFIS database is in the early stages of development and population; thus, this contribution also serves as a request for feedback from the community.

  11. Stackfile Database

    NASA Technical Reports Server (NTRS)

    deVarvalho, Robert; Desai, Shailen D.; Haines, Bruce J.; Kruizinga, Gerhard L.; Gilmer, Christopher

    2013-01-01

    This software provides storage, retrieval, and analysis functionality for managing satellite altimetry data. It improves on the efficiency and analysis capabilities of existing database software, with improved flexibility and documentation. It offers flexibility in the type of data that can be stored. There is efficient retrieval either across the spatial domain or the time domain. Built-in analysis tools are provided for frequently performed altimetry tasks. This software package is used for storing and manipulating satellite measurement data. It was developed with a focus on handling the requirements of repeat-track altimetry missions such as Topex and Jason. It was, however, designed to work with a wide variety of satellite measurement data (e.g., the Gravity Recovery And Climate Experiment, GRACE). The software consists of several command-line tools for importing, retrieving, and analyzing satellite measurement data.

  12. YCRD: Yeast Combinatorial Regulation Database

    PubMed Central

    Wu, Wei-Sheng; Hsieh, Yen-Chen; Lai, Fu-Jou

    2016-01-01

    In eukaryotes, the precise transcriptional control of gene expression is typically achieved through combinatorial regulation using cooperative transcription factors (TFs). Therefore, a database which provides regulatory associations between cooperative TFs and their target genes is helpful for biologists studying the molecular mechanisms of transcriptional regulation of gene expression. Because no such database existed in the public domain, we were prompted to construct the Yeast Combinatorial Regulation Database (YCRD), which deposits 434,197 regulatory associations between 2535 cooperative TF pairs and 6243 genes. The comprehensive collection of more than 2500 cooperative TF pairs was retrieved from 17 existing algorithms in the literature. The target genes of a cooperative TF pair (e.g. TF1-TF2) are defined as the common target genes of TF1 and TF2, where a TF’s experimentally validated target genes were downloaded from the YEASTRACT database. In YCRD, users can (i) search the target genes of a cooperative TF pair of interest, (ii) search the cooperative TF pairs which regulate a gene of interest and (iii) identify important cooperative TF pairs which regulate a given set of genes. We believe that YCRD will be a valuable resource for yeast biologists to study combinatorial regulation of gene expression. YCRD is available at http://cosbi.ee.ncku.edu.tw/YCRD/ or http://cosbi2.ee.ncku.edu.tw/YCRD/. PMID:27392072

  13. YCRD: Yeast Combinatorial Regulation Database.

    PubMed

    Wu, Wei-Sheng; Hsieh, Yen-Chen; Lai, Fu-Jou

    2016-01-01

    In eukaryotes, the precise transcriptional control of gene expression is typically achieved through combinatorial regulation using cooperative transcription factors (TFs). Therefore, a database which provides regulatory associations between cooperative TFs and their target genes is helpful for biologists studying the molecular mechanisms of transcriptional regulation of gene expression. Because no such database existed in the public domain, we were prompted to construct the Yeast Combinatorial Regulation Database (YCRD), which deposits 434,197 regulatory associations between 2535 cooperative TF pairs and 6243 genes. The comprehensive collection of more than 2500 cooperative TF pairs was retrieved from 17 existing algorithms in the literature. The target genes of a cooperative TF pair (e.g. TF1-TF2) are defined as the common target genes of TF1 and TF2, where a TF's experimentally validated target genes were downloaded from the YEASTRACT database. In YCRD, users can (i) search the target genes of a cooperative TF pair of interest, (ii) search the cooperative TF pairs which regulate a gene of interest and (iii) identify important cooperative TF pairs which regulate a given set of genes. We believe that YCRD will be a valuable resource for yeast biologists to study combinatorial regulation of gene expression. YCRD is available at http://cosbi.ee.ncku.edu.tw/YCRD/ or http://cosbi2.ee.ncku.edu.tw/YCRD/. PMID:27392072

  14. COMMUNICATING PROBABILISTIC RISK OUTCOMES TO RISK MANAGERS

    EPA Science Inventory

    Increasingly, risk assessors are moving away from simple deterministic assessments to probabilistic approaches that explicitly incorporate ecological variability, measurement imprecision, and lack of knowledge (collectively termed "uncertainty"). While the new methods provide an...

  15. Probabilistic micromechanics for high-temperature composites

    NASA Technical Reports Server (NTRS)

    Reddy, J. N.

    1993-01-01

    The three-year program of research had the following technical objectives: the development of probabilistic methods for micromechanics-based constitutive and failure models, application of the probabilistic methodology in the evaluation of various composite materials and simulation of expected uncertainties in unidirectional fiber composite properties, and influence of the uncertainties in composite properties on the structural response. The first year of research was devoted to the development of probabilistic methodology for micromechanics models. The second year of research focused on the evaluation of the Chamis-Hopkins constitutive model and Aboudi constitutive model using the methodology developed in the first year of research. The third year of research was devoted to the development of probabilistic finite element analysis procedures for laminated composite plate and shell structures.

  16. Do probabilistic forecasts lead to better decisions?

    NASA Astrophysics Data System (ADS)

    Ramos, M. H.; van Andel, S. J.; Pappenberger, F.

    2013-06-01

    The last decade has seen growing research in producing probabilistic hydro-meteorological forecasts and increasing their reliability. This followed the promise that, supplied with information about uncertainty, people would take better risk-based decisions. In recent years, therefore, research and operational developments have also started focusing attention on ways of communicating the probabilistic forecasts to decision-makers. Communicating probabilistic forecasts includes preparing tools and products for visualisation, but also requires understanding how decision-makers perceive and use uncertainty information in real time. At the EGU General Assembly 2012, we conducted a laboratory-style experiment in which several cases of flood forecasts and a choice of actions to take were presented as part of a game to participants, who acted as decision-makers. Answers were collected and analysed. In this paper, we present the results of this exercise and discuss if we indeed make better decisions on the basis of probabilistic forecasts.

  17. Non-unitary probabilistic quantum computing

    NASA Technical Reports Server (NTRS)

    Gingrich, Robert M.; Williams, Colin P.

    2004-01-01

    We present a method for designing quantum circuits that perform non-unitary quantum computations on n-qubit states probabilistically, and give analytic expressions for the success probability and fidelity.
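
    The authors' specific circuit construction is not reproduced here; a standard way to realize a non-unitary operator probabilistically is to embed it in a unitary on a larger space (a dilation) and post-select on an ancilla measurement, as in this sketch:

    ```python
    import numpy as np
    from scipy.linalg import sqrtm

    def unitary_dilation(M):
        """Halmos dilation: embed a contraction M (||M|| <= 1) into a unitary
        on a doubled space; applying it with the ancilla in |0> and then
        measuring |0> on the ancilla implements M probabilistically."""
        I = np.eye(M.shape[0])
        return np.block([[M, sqrtm(I - M @ M.conj().T)],
                         [sqrtm(I - M.conj().T @ M), -M.conj().T]])

    M = np.array([[0.6, 0.0], [0.3, 0.5]])           # illustrative non-unitary target
    U = unitary_dilation(M)
    assert np.allclose(U @ U.conj().T, np.eye(4), atol=1e-8)
    out = U @ np.concatenate([np.array([1.0, 0.0]), np.zeros(2)])  # input |0>, ancilla |0>
    p_success = np.linalg.norm(out[:2]) ** 2         # ancilla measured as |0>
    print(f"success probability = {p_success:.2f}")  # surviving state ~ M|0>, renormalized
    ```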

  18. PROBABILISTIC MODELING FOR ADVANCED HUMAN EXPOSURE ASSESSMENT

    EPA Science Inventory

    Human exposures to environmental pollutants widely vary depending on the emission patterns that result in microenvironmental pollutant concentrations, as well as behavioral factors that determine the extent of an individual's contact with these pollutants. Probabilistic human exp...

  19. Probabilistic Assessment of Radiation Risk for Astronauts in Space Missions

    NASA Technical Reports Server (NTRS)

    Kim, Myung-Hee; DeAngelis, Giovanni; Cucinotta, Francis A.

    2009-01-01

    Accurate predictions of the health risks to astronauts from space radiation exposure are necessary for enabling future lunar and Mars missions. Space radiation consists of solar particle events (SPEs), comprised largely of medium-energy protons (less than 100 MeV), and galactic cosmic rays (GCR), which include protons and heavy ions of higher energies. While the expected frequency of SPEs is strongly influenced by the solar activity cycle, SPE occurrences themselves are random in nature. A solar modulation model has been developed for the temporal characterization of the GCR environment, which is represented by the deceleration potential, phi. The risk of radiation exposure from SPEs during extra-vehicular activities (EVAs) or in lightly shielded vehicles is a major concern for radiation protection, including determining the shielding and operational requirements for astronauts and hardware. To support the probabilistic risk assessment for EVAs, which would be up to 15% of crew time on lunar missions, we estimated the probability of SPE occurrence as a function of time within a solar cycle using a nonhomogeneous Poisson model fitted to the historical database of measurements of protons with energy > 30 MeV, (phi)30. The resultant organ doses and dose equivalents, as well as effective whole-body doses for acute and cancer risk estimations, are analyzed for a conceptual habitat module and a lunar rover during defined space mission periods. This probabilistic approach to radiation risk assessment from SPE and GCR is in support of mission design and operational planning to manage radiation risks for space exploration.
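
    A nonhomogeneous Poisson process with a cycle-dependent intensity can be simulated by thinning; the intensity below is purely illustrative, not the fitted (phi)30 model:

    ```python
    import numpy as np

    def sample_nhpp(rate_fn, t_max, rate_max, rng):
        """Lewis-Shedler thinning: propose events at the bounding rate
        rate_max, keep each with probability rate_fn(t) / rate_max."""
        t, events = 0.0, []
        while True:
            t += rng.exponential(1.0 / rate_max)
            if t > t_max:
                return np.array(events)
            if rng.random() < rate_fn(t) / rate_max:
                events.append(t)

    rate = lambda t: 6.0 * np.sin(np.pi * (t % 11.0) / 11.0) ** 2  # events/yr, peaking mid-cycle
    times = sample_nhpp(rate, t_max=11.0, rate_max=6.0, rng=np.random.default_rng(7))
    print(f"{times.size} simulated SPEs in one 11-year cycle")
    ```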

  20. Variable stars in the MACHO Collaboration database

    SciTech Connect

    Cook, K.H.; Alcock, C.; Allsman, R.A.

    1995-02-01

    The MACHO Collaboration's search for baryonic dark matter via its gravitational microlensing signature has generated a massive database of time-ordered photometry of millions of stars in the LMC and the bulge of the Milky Way. The search's experimental design and capabilities are reviewed and the dark matter results are briefly noted. Preliminary analysis of the approximately 39,000 variable stars discovered in the LMC database is presented and examples of periodic variables are shown. A class of aperiodically variable Be stars is described which is the closest background to microlensing which has been found. Plans for future work on variable stars using the MACHO data are described.

  1. Hierarchical Spatio-Temporal Probabilistic Graphical Model with Multiple Feature Fusion for Binary Facial Attribute Classification in Real-World Face Videos.

    PubMed

    Demirkus, Meltem; Precup, Doina; Clark, James J; Arbel, Tal

    2016-06-01

    Recent literature shows that facial attributes, i.e., contextual facial information, can be beneficial for improving the performance of real-world applications, such as face verification, face recognition, and image search. Examples of face attributes include gender, skin color, facial hair, etc. How to robustly obtain these facial attributes (traits) is still an open problem, especially in the presence of the challenges of real-world environments: non-uniform illumination conditions, arbitrary occlusions, motion blur and background clutter. What makes this problem even more difficult is the enormous variability presented by the same subject, due to arbitrary face scales, head poses, and facial expressions. In this paper, we focus on the problem of facial trait classification in real-world face videos. We have developed a fully automatic hierarchical and probabilistic framework that models the collective set of frame class distributions and feature spatial information over a video sequence. The experiments are conducted on a large real-world face video database that we have collected, labelled and made publicly available. The proposed method is flexible enough to be applied to any facial classification problem. Experiments on a large, real-world video database McGillFaces [1] of 18,000 video frames reveal that the proposed framework outperforms alternative approaches, by up to 16.96 and 10.13%, for the facial attributes of gender and facial hair, respectively.

  2. Overview of selected molecular biological databases

    SciTech Connect

    Rayl, K.D.; Gaasterland, T.

    1994-11-01

    This paper presents an overview of the purpose, content, and design of a subset of the currently available biological databases, with an emphasis on protein databases. Databases included in this summary are 3D-ALI, Berlin RNA databank, Blocks, DSSP, EMBL Nucleotide Database, EMP, ENZYME, FSSP, GDB, GenBank, HSSP, LiMB, PDB, PIR, PKCDD, ProSite, and SWISS-PROT. The goal is to provide a starting point for researchers who wish to take advantage of the myriad available databases. Rather than providing a complete explanation of each database, we present its content and form by explaining the details of typical entries. Pointers to more complete "user guides" are included, along with general information on where to search for a new database.

  3. The GLIMS Glacier Database

    NASA Astrophysics Data System (ADS)

    Raup, B. H.; Khalsa, S. S.; Armstrong, R.

    2007-12-01

    The Global Land Ice Measurements from Space (GLIMS) project has built a geospatial and temporal database of glacier data, composed of glacier outlines and various scalar attributes. These data are being derived primarily from satellite imagery, such as from ASTER and Landsat. Each "snapshot" of a glacier is from a specific time, and the database is designed to store multiple snapshots representative of different times. We have implemented two web-based interfaces to the database; one enables exploration of the data via interactive maps (web map server), while the other allows searches based on text-field constraints. The web map server is an Open Geospatial Consortium (OGC) compliant Web Map Server (WMS) and Web Feature Server (WFS). This means that other web sites can display glacier layers from our site over the Internet, or retrieve glacier features in vector format. All components of the system are implemented using Open Source software: Linux, PostgreSQL, PostGIS (geospatial extensions to the database), MapServer (WMS and WFS), and several supporting components such as Proj.4 (a geographic projection library) and PHP. These tools are robust and provide a flexible and powerful framework for web mapping applications. As a service to the GLIMS community, the database contains metadata on all ASTER imagery acquired over glacierized terrain. Reduced-resolution versions of the images (browse imagery) can be viewed either as a layer in the MapServer application, or overlaid on the virtual globe within Google Earth. The interactive map application allows the user to constrain by time what data appear on the map. For example, ASTER or glacier outlines from 2002 only, or from Autumn in any year, can be displayed. The system allows users to download their selected glacier data in a choice of formats. The results of a query based on spatial selection (using a mouse) or text-field constraints can be downloaded in any of these formats: ESRI shapefiles, KML (Google Earth), Map
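
    Because the server is OGC-compliant, a glacier layer can be requested with a standard WMS GetMap query; the base URL and layer name below are hypothetical placeholders, and the TIME parameter carries the temporal constraint described above:

    ```python
    import urllib.parse

    base = "https://example.org/glims/wms"  # placeholder, not the real endpoint
    params = {
        "SERVICE": "WMS", "VERSION": "1.1.1", "REQUEST": "GetMap",
        "LAYERS": "glacier_outlines",       # hypothetical layer name
        "SRS": "EPSG:4326", "BBOX": "86.5,27.8,87.2,28.2",
        "WIDTH": "512", "HEIGHT": "512", "FORMAT": "image/png",
        "TIME": "2002-01-01/2002-12-31",    # e.g., outlines from 2002 only
    }
    print(base + "?" + urllib.parse.urlencode(params))
    ```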

  4. Open geochemical database

    NASA Astrophysics Data System (ADS)

    Zhilin, Denis; Ilyin, Vladimir; Bashev, Anton

    2010-05-01

    We regard "geochemical data" as data on chemical parameters of the environment, linked with the geographical position of the corresponding point. The rapid development of global positioning systems (GPS) and measuring instruments allows fast collection of huge amounts of geochemical data. Presently these data are published in scientific journals in text format, which hampers searching for information about particular places and meta-analysis of data collected by different researchers. Part of the information is never published. To make the data available and easy to find, it seems reasonable to elaborate an open database of geochemical information, accessible via the Internet. It also seems reasonable to link the data with maps or space images, for example, from the GoogleEarth service. For this purpose an open geochemical database is being developed (http://maps.sch192.ru). Any user, after registration, can upload geochemical data (position, type of parameter and value of the parameter) and edit them. Every user (including unregistered) can (a) extract the values of parameters fulfilling desired conditions and (b) see the points, linked to the GoogleEarth space image, colored according to the value of a selected parameter; the extracted values can then be treated any way the user likes. There are the following data types in the database: authors, points, seasons and parameters. An author is a person who publishes the data. Every author can declare his own profile. A point is characterized by its geographical position and the type of the object (i.e. river, lake etc). Values of parameters are linked to a point, an author and the season when they were obtained. A user can choose a parameter to place on the GoogleEarth space image and a scale to color the points on the image according to the value of the parameter. Currently (December, 2009) the database is under construction, but several functions (uploading data on pH and electrical conductivity and placing colored points onto the GoogleEarth space image) are

  5. Probabilistic cloning of three symmetric states

    SciTech Connect

    Jimenez, O.; Bergou, J.; Delgado, A.

    2010-12-15

    We study the probabilistic cloning of three symmetric states. These states are defined by a single complex quantity, the inner product among them. We show that three different probabilistic cloning machines are necessary to optimally clone all possible families of three symmetric states. We also show that the optimal cloning probability of generating M copies out of one original can be cast as the quotient between the success probability of unambiguously discriminating one and M copies of symmetric states.

  6. Probabilistic structural analysis for nuclear thermal propulsion

    NASA Technical Reports Server (NTRS)

    Shah, Ashwin

    1993-01-01

    Viewgraphs of probabilistic structural analysis for nuclear thermal propulsion are presented. The objective of the study was to develop a methodology to certify the Space Nuclear Propulsion System (SNPS) nozzle with assured reliability. Topics covered include: the advantage of probabilistic structural analysis; uncertainties in the random variables of the space nuclear propulsion system nozzle; SNPS nozzle natural frequency; and the sensitivity of SNPS nozzle natural frequency and shell stress to primitive variable uncertainties.

  7. Bioinformatics: searching the Net.

    PubMed

    Kastin, S; Wexler, J

    1998-04-01

    During the past 30 years, there has been an explosion in the volume of published medical information. As this volume has increased, so has the need for efficient methods for searching the data. MEDLINE, the primary medical database, is currently limited to abstracts of the medical literature. MEDLINE searches use AND/OR/NOT logical searching for keywords that have been assigned to each article and for textwords included in article abstracts. Recently, the complete text of some scientific journals, including figures and tables, has become accessible electronically. Keyword and textword searches can provide an overwhelming number of results. Search engines that use phrase searching, or searches that limit the number of words between two finds, improve the precision of search engines. The development of the Internet as a vehicle for worldwide communication, and the emergence of the World Wide Web (WWW) as a common vehicle for communication have made instantaneous access to much of the entire body of medical information an exciting possibility. There is more than one way to search the WWW for information. At the present time, two broad strategies have emerged for cataloging the WWW: directories and search engines. These allow more efficient searching of the WWW. Directories catalog WWW information by creating categories and subcategories of information and then publishing pointers to information within the category listings. Directories are analogous to yellow pages of the phone book. Search engines make no attempt to categorize information. They automatically scour the WWW looking for words and then automatically create an index of those words. When a specific search engine is used, its index is searched for a particular word. Usually, search engines are nonspecific and produce voluminous results. Use of AND/OR/NOT and "near" and "adjacent" search refinements greatly improve the results of a search. Search engines that limit their scope to specific sites, and

  8. Demystifying the Search Button

    PubMed Central

    McKeever, Liam; Nguyen, Van; Peterson, Sarah J.; Gomez-Perez, Sandra

    2015-01-01

    A thorough review of the literature is the basis of all research and evidence-based practice. A gold-standard efficient and exhaustive search strategy is needed to ensure all relevant citations have been captured and that the search performed is reproducible. The PubMed database comprises both the MEDLINE and non-MEDLINE databases. MEDLINE-based search strategies are robust but capture only 89% of the total available citations in PubMed. The remaining 11% include the most recent and possibly relevant citations but are only searchable through less efficient techniques. An effective search strategy must employ both the MEDLINE and the non-MEDLINE portion of PubMed to ensure all studies have been identified. The robust MEDLINE search strategies are used for the MEDLINE portion of the search. Usage of the less robust strategies is then efficiently confined to search only the remaining 11% of PubMed citations that have not been indexed for MEDLINE. The current article offers step-by-step instructions for building such a search exploring methods for the discovery of medical subject heading (MeSH) terms to search MEDLINE, text-based methods for exploring the non-MEDLINE database, information on the limitations of convenience algorithms such as the “related citations feature,” the strengths and pitfalls associated with commonly used filters, the proper usage of Boolean operators to organize a master search strategy, and instructions for automating that search through “MyNCBI” to receive search query updates by email as new citations become available. PMID:26129895

  9. Multidimensional analysis and probabilistic model of volcanic and seismic activities

    NASA Astrophysics Data System (ADS)

    Fedorov, V.

    2009-04-01

    .I. Gushchenko, 1979) and seismological (database of USGS/NEIC Significant Worldwide Earthquakes, 2150 B.C.-1994 A.D.) information which displays the dynamics of endogenic relief-forming processes over the period 1900 to 1994. In the course of the analysis, a substitution of the calendar variable by a corresponding astronomical one was performed and the epoch superposition method was applied. In essence, the method consists in differentiating the massifs of information on volcanic eruptions (over the period 1900 to 1977) and seismic events (1900-1994) with respect to the values of the astronomical parameters that correspond to the calendar dates of the known eruptions and earthquakes, regardless of the calendar year. The obtained spectra of the distribution of volcanic eruptions and violent earthquakes in the fields of the Earth's orbital movement parameters were used as a basis for the calculation of frequency spectra and the diurnal probability of volcanic and seismic activity. The objective of the proposed investigations is the development of a probabilistic model of volcanic and seismic events, as well as the design of a GIS for monitoring and forecasting of volcanic and seismic activities. In accordance with the stated objective, three probability parameters have been found in the course of preliminary studies; they form the basis for GIS-monitoring and forecast development. 1. A multidimensional analysis of volcanic eruptions and earthquakes (of magnitude 7) has been performed in terms of the Earth's orbital movement. Probability characteristics of volcanism and seismicity have been defined for the Earth as a whole. Time intervals have been identified with a diurnal probability twice as great as the mean value. The diurnal probability of volcanic and seismic events has been calculated up to 2020. 2. A regularity in the duration of dormant (repose) periods has been established. A relationship has been found between the distribution of the repose period probability density and the duration of the period. 3

  10. A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more.

    PubMed

    Rivas, Elena; Lang, Raymond; Eddy, Sean R

    2012-02-01

    The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been reported on generative probabilistic models (stochastic context-free grammars [SCFGs]) of comparable complexity, although probabilistic models are generally easier to train and to use. To explore a range of probabilistic models of increasing complexity, and to directly compare probabilistic, thermodynamic, and discriminative approaches, we created TORNADO, a computational tool that can parse a wide spectrum of RNA grammar architectures (including the standard nearest-neighbor model and more) using a generalized super-grammar that can be parameterized with probabilities, energies, or arbitrary scores. By using TORNADO, we find that probabilistic nearest-neighbor models perform comparably to (but not significantly better than) discriminative methods. We find that complex statistical models are prone to overfitting RNA structure and that evaluations should use structurally nonhomologous training and test data sets. Overfitting has affected at least one published method (ContextFold). The most important barrier to improving statistical approaches for RNA secondary structure prediction is the lack of diversity of well-curated single-sequence RNA secondary structures in current RNA databases. PMID:22194308
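
    TORNADO's grammars are not reproduced in the abstract; as a minimal stand-in for single-sequence structure prediction, the classic Nussinov dynamic program below maximizes the number of complementary base pairs (real SCFG and nearest-neighbor models score loops, stacks, and other features):

    ```python
    def nussinov(seq, min_loop=3):
        """Maximum number of nested base pairs with a minimum hairpin loop."""
        pairs = {("A", "U"), ("U", "A"), ("G", "C"),
                 ("C", "G"), ("G", "U"), ("U", "G")}
        n = len(seq)
        dp = [[0] * n for _ in range(n)]
        for span in range(min_loop + 1, n):
            for i in range(n - span):
                j = i + span
                best = dp[i + 1][j]                # i left unpaired
                for k in range(i + min_loop + 1, j + 1):
                    if (seq[i], seq[k]) in pairs:  # pair i with k, split the rest
                        right = dp[k + 1][j] if k + 1 <= j else 0
                        best = max(best, 1 + dp[i + 1][k - 1] + right)
                dp[i][j] = best
        return dp[0][n - 1]

    print(nussinov("GGGAAAUCC"))  # -> 2
    ```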

  11. An evaluation, comparison, and accurate benchmarking of several publicly available MS/MS search algorithms: sensitivity and specificity analysis.

    PubMed

    Kapp, Eugene A; Schütz, Frédéric; Connolly, Lisa M; Chakel, John A; Meza, Jose E; Miller, Christine A; Fenyo, David; Eng, Jimmy K; Adkins, Joshua N; Omenn, Gilbert S; Simpson, Richard J

    2005-08-01

    MS/MS and associated database search algorithms are essential proteomic tools for identifying peptides. Due to their widespread use, it is now time to perform a systematic analysis of the various algorithms currently in use. Using blood specimens used in the HUPO Plasma Proteome Project, we have evaluated five search algorithms with respect to their sensitivity and specificity, and have also accurately benchmarked them based on specified false-positive (FP) rates. Spectrum Mill and SEQUEST performed well in terms of sensitivity, but were inferior to MASCOT, X!Tandem, and Sonar in terms of specificity. Overall, MASCOT, a probabilistic search algorithm, correctly identified most peptides based on a specified FP rate. The rescoring algorithm, PeptideProphet, enhanced the overall performance of the SEQUEST algorithm, as well as provided predictable FP error rates. Ideally, score thresholds should be calculated for each peptide spectrum or minimally, derived from a reversed-sequence search as demonstrated in this study based on a validated data set. The availability of open-source search algorithms, such as X!Tandem, makes it feasible to further improve the validation process (manual or automatic) on the basis of "consensus scoring", i.e., the use of multiple (at least two) search algorithms to reduce the number of FPs.
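
    The "consensus scoring" idea reduces, in its simplest form, to keeping only identifications on which two engines agree; a toy sketch, assuming each engine's output is a mapping from spectrum IDs to top-ranked peptides:

    ```python
    def consensus_psms(engine_a, engine_b):
        """Retain only peptide-spectrum matches reported identically by both
        engines, trading some sensitivity for fewer false positives."""
        return {spec: pep for spec, pep in engine_a.items()
                if engine_b.get(spec) == pep}

    a = {"scan_001": "LVNEVTEFAK", "scan_002": "AEFAEVSK"}
    b = {"scan_001": "LVNEVTEFAK", "scan_002": "TCVADESAENCDK"}
    print(consensus_psms(a, b))  # {'scan_001': 'LVNEVTEFAK'}
    ```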

  12. An evaluation, comparison, and accurate benchmarking of several publicly available MS/MS search algorithms: Sensitivity and Specificity analysis.

    SciTech Connect

    Kapp, Eugene; Schutz, Frederick; Connolly, Lisa M.; Chakel, John A.; Meza, Jose E.; Miller, Christine A.; Fenyo, David; Eng, Jimmy K.; Adkins, Joshua N.; Omenn, Gilbert; Simpson, Richard

    2005-08-01

    MS/MS and associated database search algorithms are essential proteomic tools for identifying peptides. Due to their widespread use, it is now time to perform a systematic analysis of the various algorithms currently in use. Using blood specimens used in the HUPO Plasma Proteome Project, we have evaluated five search algorithms with respect to their sensitivity and specificity, and have also accurately benchmarked them based on specified false-positive (FP) rates. Spectrum Mill and SEQUEST performed well in terms of sensitivity, but were inferior to MASCOT, X!Tandem, and Sonar in terms of specificity. Overall, MASCOT, a probabilistic search algorithm, correctly identified most peptides based on a specified FP rate. The rescoring algorithm, PeptideProphet, enhanced the overall performance of the SEQUEST algorithm, as well as provided predictable FP error rates. Ideally, score thresholds should be calculated for each peptide spectrum or minimally, derived from a reversed-sequence search as demonstrated in this study based on a validated data set. The availability of open-source search algorithms, such as X!Tandem, makes it feasible to further improve the validation process (manual or automatic) on the basis of "consensus scoring", i.e., the use of multiple (at least two) search algorithms to reduce the number of FPs.

  13. Sex determination using the Probabilistic Sex Diagnosis (DSP: Diagnose Sexuelle Probabiliste) tool in a virtual environment.

    PubMed

    Chapman, Tara; Lefevre, Philippe; Semal, Patrick; Moiseev, Fedor; Sholukha, Victor; Louryan, Stéphane; Rooze, Marcel; Van Sint Jan, Serge

    2014-01-01

    The hip bone is one of the most reliable indicators of sex in the human body because it is the most dimorphic bone. Probabilistic Sex Diagnosis (DSP: Diagnose Sexuelle Probabiliste), developed by Murail et al. in 2005, is a sex determination method based on a worldwide hip bone metrical database. Sex is determined by comparing specific measurements taken from each specimen using sliding callipers and computing the probability of specimens being female or male. In forensic science it is sometimes not possible to sex a body due to corpse decay or injury. Skeletalization and dissection of a body is a laborious process and desecrates the body. There were two aims to this study. The first aim was to examine the accuracy of the DSP method in comparison with a current visual sexing method. A further aim was to see if it was possible to utilise the DSP method virtually, on both the hip bone and the pelvic girdle, in order to make the method usable in forensic science. For the first part of the study, forty-nine dry hip bones of unknown sex were obtained from the Body Donation Programme of the Université Libre de Bruxelles (ULB). A comparison was made between DSP analysis and visual sexing on dry bone by two researchers. CT scans of bones were then analysed to obtain three-dimensional (3D) virtual models and the method of DSP was analysed virtually by importing the models into a customised software programme called lhpFusionBox which was developed at ULB. The software enables DSP distances to be measured via virtually-palpated bony landmarks. There was found to be 100% agreement of sex between the manual and virtual DSP method. The second part of the study aimed to further validate the method by analysing thirty-nine supplementary pelvic girdles of known sex, blind. There was found to be a 100% accuracy rate, further demonstrating that the virtual DSP method is robust. Statistically significant differences were found in the identification of sex

  16. Subject Retrieval from Full-Text Databases in the Humanities

    ERIC Educational Resources Information Center

    East, John W.

    2007-01-01

    This paper examines the problems involved in subject retrieval from full-text databases of secondary materials in the humanities. Ten such databases were studied and their search functionality evaluated, focusing on factors such as Boolean operators, document surrogates, limiting by subject area, proximity operators, phrase searching, wildcards,…

  17. Databases for Computer Science and Electronics: COMPENDEX, ELCOM, and INSPEC.

    ERIC Educational Resources Information Center

    Marsden, Tom; Laub, Barbara

    1981-01-01

    Describes the selection policies, subject access, search aids, indexing, coverage, and currency of three online databases in the fields of electronics and computer science: COMPENDEX, ELCOM, and INSPEC. Sample searches are displayed for each database. A bibliography cites five references. (FM)

  18. Integrating Variances into an Analytical Database

    NASA Technical Reports Server (NTRS)

    Sanchez, Carlos

    2010-01-01

    For this project, I enrolled in numerous SATERN courses that taught the basics of database programming, including Basic Access 2007 Forms, Introduction to Database Systems, and Overview of Database Design. My main job was to create an analytical database that could handle many stored forms and make them easy to interpret and organize. Additionally, I helped improve an existing database and populate it with information. These databases were designed to hold data from Safety Variances and DCR forms. The research consisted of analyzing the database and comparing the data to find out which entries were repeated the most. If an entry was repeated several times, the rule or requirement targeted by that variance had already been bypassed many times, suggesting that the requirement may not really be needed and should instead be changed to allow the variance's conditions permanently. The project was not restricted to the design and development of the database system; it also involved exporting the data to other formats (e.g., Excel or Word) so it could be analyzed more simply. Thanks to the change in format, the data was organized in a spreadsheet that made it possible to sort the data by categories or types and helped speed up searches. Once my work with the database was done, the records of variances could be arranged in numerical order, or one could search for a specific document targeted by the variances and restrict the search to only those variances that modified a specific requirement. A great part of my learning came from SATERN, NASA's resource for education. Thanks to the SATERN online courses I took over the summer, I learned many new things about computers and databases and went more in depth into topics I already knew.
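
    As a minimal sketch of the repeated-entry analysis described above, assuming a hypothetical CSV export with a "requirement" column recording which rule each variance targeted (the file and column names are illustrative only):

      import csv
      from collections import Counter

      # Count how often each requirement was targeted by a variance.
      with open("variances.csv", newline="") as f:
          counts = Counter(row["requirement"] for row in csv.DictReader(f))

      # Requirements bypassed most often are candidates for permanent change.
      for requirement, n in counts.most_common(10):
          print(f"{requirement}: bypassed {n} times")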

  19. De novo protein conformational sampling using a probabilistic graphical model

    NASA Astrophysics Data System (ADS)

    Bhattacharya, Debswapna; Cheng, Jianlin

    2015-11-01

    Efficient exploration of protein conformational space remains challenging, especially for large proteins, when assembling discretized structural fragments extracted from a protein structure database. We propose a fragment-free probabilistic graphical model, FUSION, for conformational sampling in continuous space and assess its accuracy using ‘blind’ protein targets with a length of up to 250 residues from the CASP11 structure prediction exercise. The method reduces sampling bottlenecks, exhibits strong convergence, and demonstrates better performance than the popular fragment assembly method, ROSETTA, on the relatively larger proteins (more than 150 residues) in our benchmark set. FUSION is freely available through a web server at http://protein.rnet.missouri.edu/FUSION/.
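
    As a generic illustration of the fragment-free idea (not FUSION's actual model), backbone torsion angles can be drawn from a continuous probabilistic model instead of copied from discrete fragments; the von Mises parameters below are invented.

      import numpy as np
      from scipy.stats import vonmises

      rng = np.random.default_rng(0)
      n_residues = 250
      # Hypothetical helix-like mode: circular means (radians), concentration kappa.
      phi = vonmises(kappa=8.0, loc=np.deg2rad(-63.0)).rvs(n_residues, random_state=rng)
      psi = vonmises(kappa=8.0, loc=np.deg2rad(-43.0)).rvs(n_residues, random_state=rng)
      backbone = np.stack([phi, psi], axis=1)  # one sampled conformation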

  20. A probabilistic safety analysis of incidents in nuclear research reactors.

    PubMed

    Lopes, Valdir Maciel; Agostinho Angelo Sordi, Gian Maria; Moralles, Mauricio; Filho, Tufic Madi

    2012-06-01

    This work aims to evaluate the potential risks of incidents in nuclear research reactors using probabilistic safety analysis (PSA). Two International Atomic Energy Agency (IAEA) databases were used: the Research Reactor Data Base (RRDB) and the Incident Reporting System for Research Reactors (IRSRR). The probability calculations for the PSA followed the theory and equations of the IAEA TECDOC-636 report. A program to compute the probabilities was developed within Scilab 5.1.1 for two distributions, Fisher and chi-square, both at the 90% confidence level. Sordi's equations were then used to obtain the maximum admissible doses for comparison with the risk limits established by the International Commission on Radiological Protection (ICRP). All results of this probability analysis led to the conclusion that the incidents that occurred involved radiation doses within the stochastic-effects reference interval established by ICRP-64.
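
    As a minimal sketch, assuming the standard chi-square interval estimator for Poisson event rates used in PSA guidance of the IAEA TECDOC-636 kind: with n incidents observed over T reactor-years, an upper confidence bound on the incident rate is chi2.ppf(conf, 2(n+1)) / (2T). The counts below are invented.

      from scipy.stats import chi2

      def rate_upper_bound(n_events, exposure_years, confidence=0.90):
          # One-sided upper bound on a Poisson event rate (per reactor-year).
          return chi2.ppf(confidence, 2 * (n_events + 1)) / (2 * exposure_years)

      # Hypothetical usage: 3 incidents in 1200 reactor-years, 90% confidence.
      print(rate_upper_bound(3, 1200.0))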

  1. Database Marketplace 2002: The Database Universe.

    ERIC Educational Resources Information Center

    Tenopir, Carol; Baker, Gayle; Robinson, William

    2002-01-01

    Reviews the database industry over the past year, including new companies and services, company closures, popular database formats, popular access methods, and changes in existing products and services. Lists 33 firms and their database services; 33 firms and their database products; and 61 company profiles. (LRW)

  2. Symbolic representation of probabilistic worlds.

    PubMed

    Feldman, Jacob

    2012-04-01

    Symbolic representation of environmental variables is a ubiquitous and often debated component of cognitive science. Yet notwithstanding centuries of philosophical discussion, the efficacy, scope, and validity of such representation has rarely been given direct consideration from a mathematical point of view. This paper introduces a quantitative measure of the effectiveness of symbolic representation, and develops formal constraints under which such representation is in fact warranted. The effectiveness of symbolic representation hinges on the probabilistic structure of the environment that is to be represented. For arbitrary probability distributions (i.e., environments), symbolic representation is generally not warranted. But in modal environments, defined here as those that consist of mixtures of component distributions that are narrow ("spiky") relative to their spreads, symbolic representation can be shown to represent the environment with a relatively negligible loss of information. Modal environments support propositional forms, logical relations, and other familiar features of symbolic representation. Hence the assumption that our environment is, in fact, modal is a key tacit assumption underlying the use of symbols in cognitive science. PMID:22270145
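
    The claim can be checked numerically; the sketch below, with invented parameters, shows that for a modal environment (a mixture of narrow components) the component label carries nearly all the information about the variable, so a symbolic, label-based representation loses little.

      import numpy as np
      from scipy.stats import norm

      xs = np.linspace(-10, 10, 4001)
      dx = xs[1] - xs[0]

      def entropy(p):  # differential entropy in nats, evaluated on a grid
          p = np.clip(p, 1e-300, None)
          return -np.sum(p * np.log(p)) * dx

      spiky = 0.5 * norm.pdf(xs, -3, 0.1) + 0.5 * norm.pdf(xs, 3, 0.1)
      h_mix = entropy(spiky)
      h_within = 0.5 * entropy(norm.pdf(xs, -3, 0.1)) + \
                 0.5 * entropy(norm.pdf(xs, 3, 0.1))
      # I(X; label) = H(X) - H(X | label) approaches H(label) = ln 2.
      print(h_mix - h_within, np.log(2))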

  3. Probabilistic computation by neuromime networks.

    PubMed

    Hangartner, R D; Cull, P

    2000-01-01

    In this paper, we address the question: can biologically feasible neural nets compute more than can be computed by deterministic polynomial-time algorithms? Since we want to maintain a claim of plausibility and reasonableness, we restrict ourselves to nets that are algorithmically easy to construct, and we rule out infinite precision in parameters and in any analog parts of the computation. Our approach is to consider the recent advances in randomized algorithms and see whether such randomized computations can be described by neural nets. We start with a pair of neurons and show that if they are connected with reciprocal inhibition and some tonic input, the steady state will be one neuron ON and one neuron OFF, but which neuron will be ON and which will be OFF is chosen at random (perhaps it would be better to say that microscopic noise in the analog computation is turned into a macroscale random bit). We then show that we can build a small network that uses this random-bit process to repeatedly generate random bits. This random-bit generator can then be connected to a neural net representing the deterministic part of a randomized algorithm. We therefore demonstrate that these neural nets can carry out probabilistic computation and are thus less limited than classical neural nets.
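
    A minimal simulation sketch (not the paper's formal construction) of the flip-flop idea: two rate neurons with tonic drive and reciprocal inhibition, perturbed by small noise, settle with one ON and one OFF, and which one wins is random.

      import numpy as np

      def random_bit(steps=5000, dt=1e-3, tonic=1.0, inhibition=2.0, noise=0.05):
          rng = np.random.default_rng()
          v = np.zeros(2)                      # firing rates of the two neurons
          for _ in range(steps):
              drive = tonic - inhibition * v[::-1]      # each inhibits the other
              v += dt * (-v + np.maximum(drive, 0.0))   # rectified rate dynamics
              v += noise * np.sqrt(dt) * rng.standard_normal(2)  # microscopic noise
              v = np.clip(v, 0.0, None)
          return int(v[0] > v[1])              # the winner is a random bit

      bits = [random_bit() for _ in range(8)]  # repeated draws give random bits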

  4. Probabilistic modeling of children's handwriting

    NASA Astrophysics Data System (ADS)

    Puri, Mukta; Srihari, Sargur N.; Hanson, Lisa

    2013-12-01

    Little work has been done on the analysis of children's handwriting, which could be useful in developing automatic evaluation systems and in quantifying handwriting individuality. We consider the statistical analysis of children's handwriting in early grades. Samples of handwriting of children in Grades 2-4 who were taught the Zaner-Bloser style were considered. The commonly occurring word "and", written in cursive style as well as hand-print, was extracted from extended writing. The samples were assigned feature values by human examiners using a truthing tool. The human examiners looked at how the children constructed letter formations in their writing, looking for similarities and differences from the instructions taught in the handwriting copybook. These similarities and differences were measured using a feature-space distance measure. Results indicate that the handwriting develops towards more conformity with the class characteristics of the Zaner-Bloser copybook, which, with practice, is the expected result. Bayesian networks were learnt from the data to enable answering various probabilistic queries, such as determining which students may continue to produce letter formations as taught during lessons in school, which students will develop different letter formations or variations of them, and how many different types of letter formations occur.
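
    A minimal sketch, assuming the pgmpy library, of learning a Bayesian network over handwriting features and answering one probabilistic query of the kind described; the variable names and data are invented stand-ins for the examiner-assigned features.

      import pandas as pd
      from pgmpy.models import BayesianNetwork
      from pgmpy.inference import VariableElimination

      data = pd.DataFrame({
          "grade":    [2, 2, 3, 3, 4, 4, 2, 4],
          "style":    [0, 1, 0, 1, 0, 1, 1, 0],  # 0 = cursive, 1 = hand-print
          "conforms": [0, 0, 1, 0, 1, 1, 0, 1],  # matches copybook formation
      })
      model = BayesianNetwork([("grade", "conforms"), ("style", "conforms")])
      model.fit(data)  # maximum-likelihood conditional probability tables

      # Query: probability that a Grade 4 cursive writer conforms to the copybook.
      print(VariableElimination(model).query(["conforms"],
                                             evidence={"grade": 4, "style": 0}))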

  5. Representation of probabilistic scientific knowledge.

    PubMed

    Soldatova, Larisa N; Rzhetsky, Andrey; De Grave, Kurt; King, Ross D

    2013-04-15

    The theory of probability is widely used in biomedical research for data analysis and modelling. In previous work the probabilities of the research hypotheses have been recorded as experimental metadata. The ontology HELO is designed to support probabilistic reasoning, and provides semantic descriptors for reporting on research that involves operations with probabilities. HELO explicitly links research statements such as hypotheses, models, laws, conclusions, etc. to the associated probabilities of these statements being true. HELO enables the explicit semantic representation and accurate recording of probabilities in hypotheses, as well as the inference methods used to generate and update those hypotheses. We demonstrate the utility of HELO on three worked examples: changes in the probability of the hypothesis that sirtuins regulate human life span; changes in the probability of hypotheses about gene functions in the S. cerevisiae aromatic amino acid pathway; and the use of active learning in drug design (quantitative structure activity relation learning), where a strategy for the selection of compounds with the highest probability of improving on the best known compound was used. HELO is open source and available at https://github.com/larisa-soldatova/HELO. PMID:23734675

  7. Dynamical systems probabilistic risk assessment.

    SciTech Connect

    Denman, Matthew R.; Ames, Arlo Leroy

    2014-03-01

    Probabilistic Risk Assessment (PRA) is the primary tool used to risk-inform nuclear power regulatory and licensing activities. Risk-informed regulations are intended to reduce inherent conservatism in regulatory metrics (e.g., allowable operating conditions and technical specifications) which are built into the regulatory framework by quantifying both the total risk profile as well as the change in the risk profile caused by an event or action (e.g., in-service inspection procedures or power uprates). Dynamical Systems (DS) analysis has been used to understand unintended time-dependent feedbacks in both industrial and organizational settings. In dynamical systems analysis, feedback loops can be characterized and studied as a function of time to describe the changes to the reliability of plant Structures, Systems and Components (SSCs). While DS has been used in many subject areas, some even within the PRA community, it has not been applied toward creating long-time horizon, dynamic PRAs (with time scales ranging between days and decades depending upon the analysis). Understanding slowly developing dynamic effects, such as wear-out, on SSC reliabilities may be instrumental in ensuring a safely and reliably operating nuclear fleet. Improving the estimation of a plant's continuously changing risk profile will allow for more meaningful risk insights, greater stakeholder confidence in risk insights, and increased operational flexibility.
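
    As a generic illustration (not the report's model) of the wear-out effect mentioned above, a Weibull hazard with shape parameter greater than one makes a component's failure rate, and hence the plant's risk profile, grow slowly over decades; the shape and scale values are invented.

      def weibull_hazard(t_years, shape=2.5, scale=40.0):
          """Instantaneous failure rate h(t) = (k/s) * (t/s)**(k-1)."""
          return (shape / scale) * (t_years / scale) ** (shape - 1)

      for t in (1.0, 10.0, 20.0, 40.0):
          print(t, weibull_hazard(t))  # failures per year, rising with age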

  8. Probabilistic risk assessment familiarization training

    SciTech Connect

    Phillabaum, J.L.

    1989-01-01

    Philadelphia Electric Company (PECo) created a Nuclear Group Risk and Reliability Assessment Program Plan in order to focus the utilization of probabilistic risk assessment (PRA) in support of Limerick Generating Station and Peach Bottom Atomic Power Station. PECo committed to the U.S. Nuclear Regulatory Commission (NRC) to continue the PRA program prior to the issuance of an operating license for Limerick Unit 1. It is believed that increased use of PRA techniques to support activities at Limerick and Peach Bottom will enhance PECo's overall nuclear excellence. The familiarization training is designed to be attended once by all nuclear group personnel so that they understand PRA and its potential effect on their jobs. The training content describes the history of PRA and how it applies to PECo's nuclear activities. Key PRA concepts serve as the foundation for the familiarization training. These key concepts are covered in all classes to facilitate an appreciation of the remaining material, which is tailored to the audience. Some of the concepts covered are the comparison of regulatory philosophy to PRA techniques, fundamentals of risk/success, the risk equation/risk summation, and fault trees and event trees. Building on these concepts, PRA insights and applications are then described, tailored to the audience.
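
    As a minimal sketch of the fault-tree arithmetic behind the "risk equation" concepts listed above: independent basic-event probabilities combine through AND gates (product) and OR gates (one minus the product of complements); the event probabilities are invented.

      def and_gate(*probs):
          out = 1.0
          for p in probs:
              out *= p
          return out

      def or_gate(*probs):
          out = 1.0
          for p in probs:
              out *= (1.0 - p)
          return 1.0 - out

      # Hypothetical top event: both pumps fail, OR offsite power and the
      # diesel generator both fail (all numbers invented for illustration).
      pump_a, pump_b, offsite, diesel = 1e-2, 1e-2, 3e-2, 5e-2
      top = or_gate(and_gate(pump_a, pump_b), and_gate(offsite, diesel))
      print(top)  # approximate top-event probability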

  9. Probabilistic description of traffic flow

    NASA Astrophysics Data System (ADS)

    Mahnke, R.; Kaupužs, J.; Lubashevsky, I.

    2005-03-01

    A stochastic description of traffic flow, called probabilistic traffic flow theory, is developed. The general master equation is applied to relatively simple models to describe the formation and dissolution of traffic congestions. Our approach is mainly based on spatially homogeneous systems like periodically closed circular rings without on- and off-ramps. We consider a stochastic one-step process of growth or shrinkage of a car cluster (jam). As generalization we discuss the coexistence of several car clusters of different sizes. The basic problem is to find a physically motivated ansatz for the transition rates of the attachment and detachment of individual cars to a car cluster consistent with the empirical observations in real traffic. The emphasis is put on the analogy with first-order phase transitions and nucleation phenomena in physical systems like supersaturated vapour. The results are summarized in the flux-density relation, the so-called fundamental diagram of traffic flow, and compared with empirical data. Different regimes of traffic flow are discussed: free flow, congested mode as stop-and-go regime, and heavy viscous traffic. The traffic breakdown is studied based on the master equation as well as the Fokker-Planck approximation to calculate mean first passage times or escape rates. Generalizations are developed to allow for on-ramp effects. The calculated flux-density relation and characteristic breakdown times coincide with empirical data measured on highways. Finally, a brief summary of the stochastic cellular automata approach is given.
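
    A minimal sketch, with invented rate constants, of the one-step process described above: a car cluster (jam) of size n grows by attachment at rate w_plus and shrinks by detachment at rate w_minus, simulated exactly with the Gillespie algorithm.

      import numpy as np

      def simulate_jam(t_end=600.0, n0=0, w_plus=0.30, w_minus=0.25, seed=0):
          rng = np.random.default_rng(seed)
          t, n, path = 0.0, n0, [(0.0, n0)]
          while t < t_end:
              rates = (w_plus, w_minus if n > 0 else 0.0)  # attach / detach
              total = rates[0] + rates[1]
              t += rng.exponential(1.0 / total)            # waiting time
              n += 1 if rng.random() < rates[0] / total else -1
              path.append((t, n))
          return path

      path = simulate_jam()  # cluster-size trajectory n(t)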

  10. MOND using a probabilistic approach

    NASA Astrophysics Data System (ADS)

    Raut, Usha

    2009-05-01

    MOND has been proposed as a viable alternative to the dark matter hypothesis. In the original MOND formulation [1], a modification of Newtonian dynamics was brought about by postulating new equations of particle motion at extremely low accelerations, as a possible explanation for the flat rotation curves of spiral galaxies. In this paper, we attempt a different approach to modifying the usual force laws by trying to link gravity with the probabilistic aspects of quantum mechanics [2]. In order to achieve this, one starts by replacing the classical notion of a continuous distance between two elementary particles with a statistical probability function, π. The gravitational force between two elementary particles can then be interpreted in terms of the probability of interaction between them. We attempt to show that such a modified gravitational force would fall off much more slowly than the usual inverse square law predicts, leading to revised MOND equations. In the limit that the statistical aggregate of the probabilities becomes equal to the usual inverse square law force, we recover Newtonian/Einstein gravity. [1] Milgrom, M. 1983, ApJ, 270, 365. [2] Goradia, S. 2002, arXiv:physics/0210040

  11. Development and use of a train-level probabilistic risk assessment

    SciTech Connect

    Smith, C.L.; Fowler, R.D.; Wolfram, L.M.

    1993-04-01

    The Idaho National Engineering Laboratory examined the potential for the development of train-level probabilistic risk assessment (PRA) databases. These train-level databases will allow the Nuclear Regulatory Commission to investigate effects on plant core damage frequency (CDF) given that a train is failed or taken out of service. The intent of this task was to develop user-friendly databases that required a minimal amount of personnel involvement to be usable. It was originally intended that the train-level models would not be expanded to include basic events below the top gate of a train, with the possible exception of including some of the major train-related components (e.g., important pumps and motor-operated valves). It was found that a database similar to the original plant PRA provided the accuracy needed to measure the changes in plant CDF. The Peach Bottom Unit 2 NUREG-1150 PRA (a large fault tree model) and the Beaver Valley Unit 2 IPE (a large event tree model) were selected to demonstrate the feasibility of developing train-level databases. Five different methods for developing train-level databases were hypothesized and are examined. Ultimately, two train-level databases were developed using the Peach Bottom Unit 2 PRA and one train-level database was developed using the Beaver Valley Unit 2 IPE. The development, use, limitations, and results of these train-level databases are discussed.

  12. Interactive Graphical Queries for Bibliographic Search.

    ERIC Educational Resources Information Center

    Brooks, Martin; Campbell, Jennifer

    1999-01-01

    Presents "Islands," an interactive graphical interface for construction, modification, and management of queries during a search session on a bibliographic database. Discusses motivation and bibliographic search semantics and compares the Islands interface to the Dialog interface. (Author/LRW)

  13. CD-ROM-aided Databases

    NASA Astrophysics Data System (ADS)

    Hasegawa, Tamae; Osanai, Masaaki

    This paper focuses on practical examples for using the CD-ROM version of Books In Print Plus, a database of book information produced by R. R. Bowker. The paper details the contents, installation and operation procedures, hardware requirements, search functions, search items, print commands, and special features of Books in Print Plus. The paper also includes an evaluation of this product based on four examples from actual office use. The paper concludes with a brief introduction to Ulrich’s Plus, a similar CD-ROM product for periodical information.

  14. Citation Searching: Search Smarter & Find More

    ERIC Educational Resources Information Center

    Hammond, Chelsea C.; Brown, Stephanie Willen

    2008-01-01

    The staff at University of Connecticut are participating in Elsevier's Student Ambassador Program (SAmP) in which graduate students train their peers on "citation searching" research using Scopus and Web of Science, two tremendous citation databases. They are in the fourth semester of these training programs, and they are wildly successful: They…

  15. Construction of Database for Pulsating Variable Stars

    NASA Astrophysics Data System (ADS)

    Chen, B. Q.; Yang, M.; Jiang, B. W.

    2011-07-01

    A database of pulsating variable stars has been constructed to let Chinese astronomers study variable stars conveniently. The database currently includes about 230000 variable stars in the Galactic bulge, LMC, and SMC observed by the MACHO (MAssive Compact Halo Objects) and OGLE (Optical Gravitational Lensing Experiment) projects. The software used for the construction is LAMP, i.e., Linux+Apache+MySQL+PHP. A web page is provided to search the photometric data and light curves in the database by the right ascension and declination of the object. More data will be incorporated into the database.
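
    A minimal sketch, with a hypothetical file name and table layout, of the positional search the web page performs: select the stars whose right ascension and declination fall inside a small box around the queried coordinates (a production cone search would also scale the RA half-width by cos(dec)).

      import sqlite3

      conn = sqlite3.connect("variables.db")  # hypothetical database file
      ra, dec, box = 81.25, -69.75, 0.01      # degrees: query centre, half-width
      rows = conn.execute(
          "SELECT star_id, ra, dec FROM stars "
          "WHERE ra BETWEEN ? AND ? AND dec BETWEEN ? AND ?",
          (ra - box, ra + box, dec - box, dec + box),
      ).fetchall()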

  16. DEPOT database: Reference manual and user's guide

    SciTech Connect

    Clancey, P.; Logg, C.

    1991-03-01

    DEPOT has been developed to provide tracking for the Stanford Linear Collider (SLC) control system equipment. For each piece of equipment entered into the database, complete location, service, maintenance, modification, certification, and radiation exposure histories can be maintained. To facilitate data entry accuracy, efficiency, and consistency, barcoding technology has been used extensively. DEPOT has been an important tool in improving the reliability of the microsystems controlling SLC. This document describes the components of the DEPOT database, the elements in the database records, and the use of the supporting programs for entering data, searching the database, and producing reports from the information.

  17. Integration of Evidence Base into a Probabilistic Risk Assessment

    NASA Technical Reports Server (NTRS)

    Saile, Lyn; Lopez, Vilma; Bickham, Grandin; Kerstman, Eric; FreiredeCarvalho, Mary; Byrne, Vicky; Butler, Douglas; Myers, Jerry; Walton, Marlei

    2011-01-01

    INTRODUCTION: A probabilistic decision support model such as the Integrated Medical Model (IMM) utilizes an immense amount of input data that necessitates a systematic, integrated approach to data collection and management. As a result of this approach, IMM is able to forecast medical events, resource utilization, and crew health during space flight. METHODS: Inflight data is the most desirable input for the Integrated Medical Model. Non-attributable inflight data is collected from the Lifetime Surveillance of Astronaut Health study as well as from the engineers, flight surgeons, and astronauts themselves. When inflight data is unavailable, cohort studies, other models, and Bayesian analyses are used, supplemented on occasion by input from subject matter experts. To determine the quality of evidence for a medical condition, the data source is categorized and assigned a level of evidence from 1 to 5, with 1 the highest. The collected data reside and are managed in a relational SQL database with a web-based interface for data entry and review. The database is also capable of interfacing with outside applications, which expands capabilities within the database itself. Via the public interface, customers can access a formatted Clinical Findings Form (CLiFF) that outlines the model input and evidence base for each medical condition. Changes to the database are tracked using a documented Configuration Management process. DISCUSSION: This strategic approach provides a comprehensive data management plan for IMM. The IMM Database's structure and architecture have proven to support additional uses, as seen in the analysis of resource utilization across medical conditions. In addition, the IMM Database's web-based interface provides a user-friendly format for customers to browse and download the clinical information for medical conditions. It is this type of functionality that will provide Exploratory Medicine Capabilities the evidence base for their medical condition list

  18. On the probabilistic optimization of spiking neural networks.

    PubMed

    Schliebs, Stefan; Kasabov, Nikola; Defoin-Platel, Michaël

    2010-12-01

    The construction of a Spiking Neural Network (SNN), i.e. the choice of an appropriate topology and the configuration of its internal parameters, represents a great challenge for SNN based applications. Evolutionary Algorithms (EAs) offer an elegant solution for these challenges and methods capable of exploring both types of search spaces simultaneously appear to be the most promising ones. A variety of such heterogeneous optimization algorithms have emerged recently, in particular in the field of probabilistic optimization. In this paper, a literature review on heterogeneous optimization algorithms is presented and an example of probabilistic optimization of SNN is discussed in detail. The paper provides an experimental analysis of a novel Heterogeneous Multi-Model Estimation of Distribution Algorithm (hMM-EDA). First, practical guidelines for configuring the method are derived and then the performance of hMM-EDA is compared to state-of-the-art optimization algorithms. Results show hMM-EDA as a light-weight, fast and reliable optimization method that requires the configuration of only very few parameters. Its performance on a synthetic heterogeneous benchmark problem is highly competitive and suggests its suitability for the optimization of SNN.
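
    For orientation, the sketch below shows a univariate estimation of distribution algorithm (PBIL-style), the simplest member of the probabilistic-optimization family that hMM-EDA belongs to; it is not the paper's heterogeneous algorithm, and all parameters are invented.

      import numpy as np

      def pbil(fitness, n_bits, pop=50, elite=5, lr=0.1, gens=100, seed=0):
          rng = np.random.default_rng(seed)
          p = np.full(n_bits, 0.5)                  # marginal bit probabilities
          for _ in range(gens):
              pop_bits = rng.random((pop, n_bits)) < p      # sample population
              scores = [fitness(x) for x in pop_bits]
              best = pop_bits[np.argsort(scores)[-elite:]]  # keep the elites
              p = (1 - lr) * p + lr * best.mean(axis=0)     # update the model
          return p

      # Hypothetical usage: maximize the number of ones (OneMax).
      print(pbil(lambda x: x.sum(), n_bits=20).round(2))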

  19. Robust Control Design for Systems With Probabilistic Uncertainty

    NASA Technical Reports Server (NTRS)

    Crespo, Luis G.; Kenny, Sean P.

    2005-01-01

    This paper presents a reliability- and robustness-based formulation for robust control synthesis for systems with probabilistic uncertainty. In a reliability-based formulation, the probability of violating design requirements prescribed by inequality constraints is minimized. In a robustness-based formulation, a metric which measures the tendency of a random variable/process to cluster close to a target scalar/function is minimized. A multi-objective optimization procedure, which combines stability and performance requirements in the time and frequency domains, is used to search for robustly optimal compensators. Some of the fundamental differences between the proposed strategy and conventional robust control methods are: (i) unnecessary conservatism is eliminated since there is no need for convex supports, (ii) the most likely plants are favored during synthesis, allowing for probabilistic robust optimality, (iii) the tradeoff between robust stability and robust performance can be explored numerically, (iv) the uncertainty set is closely related to parameters with clear physical meaning, and (v) compensators with improved robust characteristics for a given control structure can be synthesized.
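
    A minimal sketch, with an invented scalar plant, of the reliability-based idea above: estimate by Monte Carlo the probability that a random plant violates an inequality requirement, then trade that probability off against control effort when choosing the gain.

      import numpy as np

      rng = np.random.default_rng(1)
      a = rng.normal(-1.0, 0.3, 10000)  # uncertain plant pole (probabilistic)

      def violation_prob(k):
          # Requirement: the closed-loop pole a - k must stay left of -0.5.
          return np.mean(a - k > -0.5)

      gains = np.linspace(0.0, 2.0, 81)
      cost = [violation_prob(k) + 0.05 * k for k in gains]  # effort penalty
      best = gains[int(np.argmin(cost))]
      print(best, violation_prob(best))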

  20. The new international GLE database

    NASA Astrophysics Data System (ADS)

    Duldig, M. L.; Watts, D. J.

    2001-08-01

    The Australian Antarctic Division has agreed to host the international GLE database. Access to the database is via a world-wide-web interface and initially covers all GLEs since the start of the 22nd solar cycle. Access restriction for recent events is controlled by password protection and these data are available only to those groups contributing data to the database. The restrictions to data will be automatically removed for events older than 2 years, in accordance with the data exchange provisions of the Antarctic Treaty. Use of the data requires acknowledgment of the database as the source of the data and acknowledgment of the specific groups that provided the data used. Furthermore, some groups that provide data to the database have specific acknowledgment requirements or wording. A new submission format has been developed that will allow easier exchange of data, although the old format will be acceptable for some time. Data download options include direct web based download and email. Data may also be viewed as listings or plots with web browsers. Search options have also been incorporated. Development of the database will be ongoing with extension to viewing and delivery options, addition of earlier data and the development of mirror sites. It is expected that two mirror sites, one in North America and one in Europe, will be developed to enable fast access for the whole cosmic ray community.