Sample records for querying biomedical terminologies

  1. Factors affecting the effectiveness of biomedical document indexing and retrieval based on terminologies.

    PubMed

    Dinh, Duy; Tamine, Lynda; Boubekeur, Fatiha

    2013-02-01

    The aim of this work is to evaluate a set of indexing and retrieval strategies based on the integration of several biomedical terminologies on the available TREC Genomics collections for an ad hoc information retrieval (IR) task. We propose a multi-terminology concept extraction approach that selects the best concepts from free text by means of voting techniques. We instantiate this general approach on four terminologies (MeSH, SNOMED, ICD-10 and GO). We particularly focus on the effect of integrating terminologies into a biomedical IR process, and on the utility of voting techniques for combining the concepts extracted from each document into a list of unique concepts. Experimental studies conducted on the TREC Genomics collections show that our multi-terminology IR approach based on voting techniques yields statistically significant improvements over the baseline. For example, tested on the 2005 TREC Genomics collection, our multi-terminology IR approach provides an improvement rate of +6.98% in terms of MAP (mean average precision) (p<0.05) compared to the baseline. In addition, our experimental results show that document expansion using preferred terms, in combination with query expansion using terms from top-ranked expanded documents, improves biomedical IR effectiveness. We have evaluated several voting models for combining concepts issued from multiple terminologies. Through this study, we identified several factors affecting the effectiveness of a biomedical IR system, including the term weighting, query expansion, and document expansion models. An appropriate combination of these factors can improve IR performance. Copyright © 2012 Elsevier B.V. All rights reserved.
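
    The voting idea can be sketched as rank fusion over per-terminology concept lists. Below, CombMNZ stands in as one representative voting technique (the paper evaluates several), and the concept identifiers are invented for illustration:

```python
from collections import defaultdict

def combmnz(ranked_lists):
    """Fuse ranked concept lists extracted from several terminologies:
    sum each concept's rank-based scores, then multiply by the number
    of lists in which the concept appears (CombMNZ)."""
    scores = defaultdict(float)
    hits = defaultdict(int)
    for ranking in ranked_lists:
        n = len(ranking)
        for rank, concept in enumerate(ranking):
            scores[concept] += (n - rank) / n  # normalized rank score in (0, 1]
            hits[concept] += 1
    fused = {c: scores[c] * hits[c] for c in scores}
    return sorted(fused, key=fused.get, reverse=True)

# hypothetical concepts extracted from one document under three terminologies
mesh = ["D003920", "D007333", "D008288"]
snomed = ["D003920", "D008288"]
go = ["GO:0006006", "D003920"]
print(combmnz([mesh, snomed, go]))  # "D003920" ranks first: it appears in all three lists
```

    Concepts found by several terminologies are boosted twice, by their summed scores and by their hit count, which is the intuition behind using voting to produce a single list of unique concepts.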

  2. Auditing the NCI Thesaurus with Semantic Web Technologies

    PubMed Central

    Mougin, Fleur; Bodenreider, Olivier

    2008-01-01

    Auditing biomedical terminologies often results in the identification of inconsistencies and thus helps to improve their quality. In this paper, we present a method based on Semantic Web technologies for auditing biomedical terminologies and apply it to the NCI thesaurus. We stored the NCI thesaurus concepts and their properties in an RDF triple store. By querying this store, we assessed the consistency of both hierarchical and associative relations from the NCI thesaurus among themselves and with corresponding relations in the UMLS Semantic Network. We show that the consistency is better for associative relations than for hierarchical relations. Causes for inconsistency and benefits from using Semantic Web technologies for auditing purposes are discussed. PMID:18999265
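
    The auditing approach can be illustrated with a toy in-memory triple store. The concept names, semantic types, and the consistency check below are invented for illustration; the actual work stores the full NCI Thesaurus in an RDF triple store and queries it with Semantic Web tooling:

```python
# Toy triple store: each triple is (subject, predicate, object).
triples = {
    ("Melanoma", "subClassOf", "Neoplasm"),
    ("Melanoma", "hasSemanticType", "Neoplastic Process"),
    ("Neoplasm", "hasSemanticType", "Neoplastic Process"),
    ("Gene_X", "subClassOf", "Neoplasm"),           # deliberately inconsistent
    ("Gene_X", "hasSemanticType", "Gene or Genome"),
}

def objects(s, p):
    """All objects o such that the triple (s, p, o) is in the store."""
    return {o for (s2, p2, o) in triples if s2 == s and p2 == p}

def audit_hierarchy():
    """Flag subClassOf pairs whose semantic types disagree, mimicking a
    check of terminology relations against an upper-level semantic network."""
    issues = []
    for (s, p, o) in triples:
        if p == "subClassOf" and objects(s, "hasSemanticType") != objects(o, "hasSemanticType"):
            issues.append((s, o))
    return issues

print(audit_hierarchy())  # -> [('Gene_X', 'Neoplasm')]
```

    A real audit would issue SPARQL queries of this shape against the full concept graph and compare the results with the corresponding relations in the UMLS Semantic Network.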

  3. Auditing the NCI thesaurus with semantic web technologies.

    PubMed

    Mougin, Fleur; Bodenreider, Olivier

    2008-11-06

    Auditing biomedical terminologies often results in the identification of inconsistencies and thus helps to improve their quality. In this paper, we present a method based on Semantic Web technologies for auditing biomedical terminologies and apply it to the NCI thesaurus. We stored the NCI thesaurus concepts and their properties in an RDF triple store. By querying this store, we assessed the consistency of both hierarchical and associative relations from the NCI thesaurus among themselves and with corresponding relations in the UMLS Semantic Network. We show that the consistency is better for associative relations than for hierarchical relations. Causes for inconsistency and benefits from using Semantic Web technologies for auditing purposes are discussed.

  4. Heterogeneous database integration in biomedicine.

    PubMed

    Sujansky, W

    2001-08-01

    The rapid expansion of biomedical knowledge, reduction in computing costs, and spread of internet access have created an ocean of electronic data. The decentralized nature of our scientific community and healthcare system, however, has resulted in a patchwork of diverse, or heterogeneous, database implementations, making access to and aggregation of data across databases very difficult. The database heterogeneity problem applies equally to clinical data describing individual patients and biological data characterizing our genome. Specifically, databases are highly heterogeneous with respect to the data models they employ, the data schemas they specify, the query languages they support, and the terminologies they recognize. Heterogeneous database systems attempt to unify disparate databases by providing uniform conceptual schemas that resolve representational heterogeneities, and by providing querying capabilities that aggregate and integrate distributed data. Research in this area has applied a variety of database and knowledge-based techniques, including semantic data modeling, ontology definition, query translation, query optimization, and terminology mapping. Existing systems have addressed heterogeneous database integration in the realms of molecular biology, hospital information systems, and application portability.

  5. Cross-terminology mapping challenges: a demonstration using medication terminological systems.

    PubMed

    Saitwal, Himali; Qing, David; Jones, Stephen; Bernstam, Elmer V; Chute, Christopher G; Johnson, Todd R

    2012-08-01

    Standardized terminological systems for biomedical information have provided considerable benefits to biomedical applications and research. However, practical use of this information often requires mapping across terminological systems: a complex and time-consuming process. This paper demonstrates the complexity and challenges of mapping across terminological systems in the context of medication information. It provides a review of medication terminological systems and their linkages, then describes a case study in which we mapped proprietary medication codes from an electronic health record to SNOMED CT and the UMLS Metathesaurus. The goal was to create a polyhierarchical classification system for querying an i2b2 clinical data warehouse. We found that three methods were required to accurately map the majority of actively prescribed medications. Only 62.5% of source medication codes could be mapped automatically. The remaining codes were mapped using a combination of semi-automated string comparison with expert selection, and a completely manual approach. Compound drugs were especially difficult to map: only 7.5% could be mapped using the automatic method. General challenges to mapping across terminological systems include (1) the availability of up-to-date information to assess the suitability of a given terminological system for a particular use case, and to assess the quality and completeness of cross-terminology links; (2) the difficulty of correctly using complex, rapidly evolving, modern terminologies; (3) the time and effort required to complete and evaluate the mapping; (4) the need to address differences in granularity between the source and target terminologies; and (5) the need to continuously update the mapping as terminological systems evolve. Copyright © 2012 Elsevier Inc. All rights reserved.
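
    A semi-automated string-comparison step of this kind can be sketched with Python's standard-library `difflib`. The drug names and the threshold below are illustrative, not the study's actual tooling; candidates scoring below the threshold are deferred to expert selection:

```python
from difflib import SequenceMatcher

def best_match(source_term, target_terms, threshold=0.85):
    """Return the closest target term by string similarity,
    or None to flag the source code for expert review."""
    def score(t):
        return SequenceMatcher(None, source_term.lower(), t.lower()).ratio()
    best = max(target_terms, key=score)
    return best if score(best) >= threshold else None

# hypothetical target terminology entries
targets = ["Acetaminophen 500 MG Oral Tablet", "Ibuprofen 200 MG Oral Tablet"]
exact = best_match("Ibuprofen 200 MG Oral Tablet", targets)
unmatched = best_match("Compound Drug XYZ-12", targets)  # None: left for manual mapping
```

    This mirrors the study's finding that automated and semi-automated matching handle straightforward names, while compound drugs and oddly formatted codes fall through to the manual approach.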

  6. Cross-terminology mapping challenges: A demonstration using medication terminological systems

    PubMed Central

    Saitwal, Himali; Qing, David; Jones, Stephen; Bernstam, Elmer; Chute, Christopher G.; Johnson, Todd R.

    2015-01-01

    Standardized terminological systems for biomedical information have provided considerable benefits to biomedical applications and research. However, practical use of this information often requires mapping across terminological systems—a complex and time-consuming process. This paper demonstrates the complexity and challenges of mapping across terminological systems in the context of medication information. It provides a review of medication terminological systems and their linkages, then describes a case study in which we mapped proprietary medication codes from an electronic health record to SNOMED-CT and the UMLS Metathesaurus. The goal was to create a polyhierarchical classification system for querying an i2b2 clinical data warehouse. We found that three methods were required to accurately map the majority of actively prescribed medications. Only 62.5% of source medication codes could be mapped automatically. The remaining codes were mapped using a combination of semi-automated string comparison with expert selection, and a completely manual approach. Compound drugs were especially difficult to map: only 7.5% could be mapped using the automatic method. General challenges to mapping across terminological systems include (1) the availability of up-to-date information to assess the suitability of a given terminological system for a particular use case, and to assess the quality and completeness of cross-terminology links; (2) the difficulty of correctly using complex, rapidly evolving, modern terminologies; (3) the time and effort required to complete and evaluate the mapping; (4) the need to address differences in granularity between the source and target terminologies; and (5) the need to continuously update the mapping as terminological systems evolve. PMID:22750536

  7. Searching for rare diseases in PubMed: a blind comparison of Orphanet expert query and query based on terminological knowledge.

    PubMed

    Griffon, N; Schuers, M; Dhombres, F; Merabti, T; Kerdelhué, G; Rollin, L; Darmoni, S J

    2016-08-02

    Despite international initiatives like Orphanet, it remains difficult to find up-to-date information about rare diseases. The aim of this study is to propose an exhaustive set of queries for PubMed based on terminological knowledge and to evaluate it against the expert-built queries provided by the most frequently used resource in Europe: Orphanet. Four rare-disease terminologies (MeSH, OMIM, HPO and HRDO) were manually mapped to each other, permitting the automatic creation of expanded terminological queries for rare diseases. For 30 rare diseases, 30 citations retrieved by the Orphanet expert query and/or the query based on terminological knowledge were assessed for relevance by two independent reviewers unaware of the query's origin. An adjudication procedure was used to resolve any discrepancy. Precision, relative recall and F-measure were computed. For each Orphanet rare disease (n = 8982), there was a corresponding terminological query, in contrast with only 2284 queries provided by Orphanet. Only 553 citations were evaluated because some queries returned zero or only a few hits. There was no significant difference in precision between the Orpha queries and the terminological queries (0.61 vs 0.52; p = 0.13). Nevertheless, terminological queries retrieved more citations more often than Orpha queries (0.57 vs. 0.33; p = 0.01). Interestingly, Orpha queries seemed to retrieve older citations than terminological queries (p < 0.0001). The terminological queries proposed in this study are now available for all rare diseases. They may be a useful tool for both precision- and recall-oriented literature searches.
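
    The evaluation metrics here are the standard set-based ones; a minimal sketch, with made-up citation identifiers:

```python
def precision_recall_f(retrieved, relevant):
    """Set-based precision, recall and F-measure for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)
    p = tp / len(retrieved) if retrieved else 0.0
    r = tp / len(relevant) if relevant else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# hypothetical retrieved and relevant citation sets for one rare-disease query
p, r, f = precision_recall_f(retrieved=["1", "2", "3", "4"], relevant=["2", "3", "5"])
print(p, r)  # 0.5 0.6666...
```

    In the study, recall is computed relative to the pooled results of both query types (relative recall), since the true set of all relevant citations in PubMed is unknown.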

  8. Improve Biomedical Information Retrieval using Modified Learning to Rank Methods.

    PubMed

    Xu, Bo; Lin, Hongfei; Lin, Yuan; Ma, Yunlong; Yang, Liang; Wang, Jian; Yang, Zhihao

    2016-06-14

    In recent years, the number of biomedical articles has increased exponentially, making it impossible for biologists to capture all the needed information manually. Information retrieval technologies, as the core of search engines, can address the problem automatically, providing users with the needed information. However, applying these technologies directly to biomedical retrieval is a great challenge because of the abundance of domain-specific terminology. To enhance biomedical retrieval, we propose a novel framework based on learning to rank, a family of state-of-the-art information retrieval techniques that has proved effective in many retrieval tasks. In the proposed framework, we tackle the abundance of terminology by constructing ranking models that focus not only on retrieving the most relevant documents, but also on diversifying the search results to increase the completeness of the result list for a given query. In model training, we propose two novel document labeling strategies and combine several traditional retrieval models as learning features. We also investigate the usefulness of different learning-to-rank approaches in our framework. Experimental results on TREC Genomics datasets demonstrate the effectiveness of our framework for biomedical information retrieval.
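
    As a minimal illustration of the learning-to-rank idea (not the authors' models), a pairwise perceptron learns weights over retrieval-score features so that preferred documents score higher. The feature vectors and preference pairs below are invented:

```python
def pairwise_perceptron(pairs, n_features, epochs=50, lr=0.1):
    """Tiny pairwise learning-to-rank sketch: learn linear weights so that
    the preferred document in each (better, worse) feature-vector pair
    scores higher; update only on mis-ordered pairs."""
    w = [0.0] * n_features
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    for _ in range(epochs):
        for better, worse in pairs:
            if dot(w, better) <= dot(w, worse):  # mis-ordered pair
                w = [wi + lr * (b - c) for wi, b, c in zip(w, better, worse)]
    return w

# hypothetical features per document: (BM25 score, query-term coverage)
pairs = [((2.0, 0.9), (1.0, 0.2)), ((1.5, 0.8), (1.6, 0.1))]
w = pairwise_perceptron(pairs, 2)
```

    Production learning-to-rank systems use the same ingredients at scale: labeled preference (or graded) judgments, many retrieval-model features, and a trained scoring function used to reorder candidate documents.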

  9. Olelo: a web application for intuitive exploration of biomedical literature

    PubMed Central

    Niedermeier, Julian; Jankrift, Marcel; Tietböhl, Sören; Stachewicz, Toni; Folkerts, Hendrik; Uflacker, Matthias; Neves, Mariana

    2017-01-01

    Abstract Researchers usually query the large biomedical literature in PubMed via keywords, logical operators and filters, none of which is very intuitive. Question answering systems are an alternative to keyword searches: they accept questions in natural language as input, and the results reflect the type of question asked, such as short answers or summaries. Few such systems are available online, and those that are suffer from long response times and support only a limited number of question and result types. Additionally, their user interfaces are usually restricted to displaying the retrieved information. For our Olelo web application, we combined biomedical literature and terminologies in a fast in-memory database to enable real-time responses to researchers’ queries. Further, we extended the built-in natural language processing features of the database with question answering and summarization procedures. Combined with a new explorative approach to document filtering and a clean user interface, Olelo enables a fast and intelligent search through the ever-growing biomedical literature. Olelo is available at http://www.hpi.de/plattner/olelo. PMID:28472397

  10. A publicly available benchmark for biomedical dataset retrieval: the reference standard for the 2016 bioCADDIE dataset retrieval challenge

    PubMed Central

    Gururaj, Anupama E.; Chen, Xiaoling; Pournejati, Saeid; Alter, George; Hersh, William R.; Demner-Fushman, Dina; Ohno-Machado, Lucila

    2017-01-01

    Abstract The rapid proliferation of publicly available biomedical datasets has provided abundant resources that are potentially of value as a means to reproduce prior experiments, and to generate and explore novel hypotheses. However, there are a number of barriers to the re-use of such datasets, which are distributed across a broad array of dataset repositories, focusing on different data types and indexed using different terminologies. New methods are needed to enable biomedical researchers to locate datasets of interest within this rapidly expanding information ecosystem, and new resources are needed for the formal evaluation of these methods as they emerge. In this paper, we describe the design and generation of a benchmark for information retrieval of biomedical datasets, which was developed and used for the 2016 bioCADDIE Dataset Retrieval Challenge. In the tradition of the seminal Cranfield experiments, and as exemplified by the Text Retrieval Conference (TREC), this benchmark includes a corpus (biomedical datasets), a set of queries, and relevance judgments relating these queries to elements of the corpus. This paper describes the process through which each of these elements was derived, with a focus on those aspects that distinguish this benchmark from typical information retrieval reference sets. Specifically, we discuss the origin of our queries in the context of a larger collaborative effort, the biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium, and the distinguishing features of biomedical dataset retrieval as a task. The resulting benchmark set has been made publicly available to advance research in the area of biomedical dataset retrieval. Database URL: https://biocaddie.org/benchmark-data PMID:29220453
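
    Benchmarks of this kind are consumed by scoring a system's ranking against the relevance judgments at some rank cutoff; a minimal sketch, with invented dataset identifiers and graded judgments:

```python
def precision_at_k(ranked_ids, qrels, k=10):
    """Precision at cutoff k. `qrels` maps a dataset id to a graded
    judgment (0 = not relevant, 1 = partially, 2 = relevant);
    any grade above 0 counts as a hit."""
    return sum(1 for d in ranked_ids[:k] if qrels.get(d, 0) > 0) / k

qrels = {"DS-1": 2, "DS-3": 1}            # hypothetical judgments for one query
ranking = ["DS-1", "DS-2", "DS-3", "DS-4"]  # a system's ranked output
print(precision_at_k(ranking, qrels, k=4))  # 0.5
```

    The corpus, query set, and judgments play exactly the three roles named in the abstract's Cranfield/TREC framing: documents, topics, and qrels.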

  11. Improving average ranking precision in user searches for biomedical research datasets

    PubMed Central

    Gobeill, Julien; Gaudinat, Arnaud; Vachon, Thérèse; Ruch, Patrick

    2017-01-01

    Abstract The availability of research datasets is a keystone of reproducibility and scientific progress in the health and life sciences. Because of the heterogeneity and complexity of these data, a main challenge for research data management systems is to provide users with the best answers to their search queries. In the context of the 2016 bioCADDIE Dataset Retrieval Challenge, we investigate a novel ranking pipeline to improve the search of datasets used in biomedical experiments. Our system comprises a query expansion model based on word embeddings, a similarity measure algorithm that takes into consideration the relevance of the query terms, and a dataset categorization method that boosts the rank of datasets matching query constraints. The system was evaluated using a corpus of 800k datasets and 21 annotated user queries, and provided competitive results when compared to the other challenge participants. In the official run, it achieved the highest infAP, +22.3% higher than the median infAP of the participants' best submissions. Overall, it ranks in the top 2 under an aggregated metric using the best official measures per participant. The query expansion method had a positive impact on the system's performance, increasing our baseline by up to +5.0% and +3.4% for the infAP and infNDCG metrics, respectively. The similarity measure algorithm showed robust performance under different training conditions, with small performance variations compared to the Divergence from Randomness framework. Finally, the result categorization did not have a significant impact on the system's performance. We believe that our solution could be used to enhance biomedical dataset management systems. Data-driven expansion methods, such as those based on word embeddings, could be an alternative to the complexity of biomedical terminologies. Nevertheless, due to the limited size of the assessment set, further experiments need to be performed to draw conclusive results. Database URL: https://biocaddie.org/benchmark-data PMID:29220475
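
    Word-embedding query expansion amounts to a nearest-neighbour lookup in vector space. The three-dimensional vectors below are toy values invented for illustration; real systems load pretrained biomedical embeddings with hundreds of dimensions:

```python
import math

# toy embedding table; a real system would load pretrained biomedical vectors
embeddings = {
    "neoplasm": (0.9, 0.1, 0.0),
    "tumor":    (0.85, 0.15, 0.05),
    "melanoma": (0.7, 0.3, 0.1),
    "keyboard": (0.0, 0.1, 0.95),   # unrelated term, should never be added
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def expand(term, k=2):
    """Add the k terms nearest to `term` in embedding space to the query."""
    others = [t for t in embeddings if t != term]
    others.sort(key=lambda t: cosine(embeddings[term], embeddings[t]), reverse=True)
    return [term] + others[:k]

print(expand("neoplasm", 1))  # ['neoplasm', 'tumor']
```

    Because the neighbours come from corpus statistics rather than curated synonymy, this is the "data-driven alternative to biomedical terminologies" the abstract alludes to.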

  12. CoMetaR: A Collaborative Metadata Repository for Biomedical Research Networks.

    PubMed

    Stöhr, Mark R; Helm, Gudrun; Majeed, Raphael W; Günther, Andreas

    2017-01-01

    The German Center for Lung Research (DZL) is a research network devoted to respiratory diseases. Performing consortium-wide queries through a single interface requires a uniform conceptual structure, and no single terminology covers all our concepts. To achieve a broadly accepted and complete ontology, we developed "CoMetaR", a platform for collaborative metadata management. Anyone can browse and discuss the ontology, while editing is restricted to authenticated users.

  13. A Query Integrator and Manager for the Query Web

    PubMed Central

    Brinkley, James F.; Detwiler, Landon T.

    2012-01-01

    We introduce two concepts: the Query Web as a layer of interconnected queries over the document web and the semantic web, and a Query Web Integrator and Manager (QI) that enables the Query Web to evolve. QI permits users to write, save and reuse queries over any web accessible source, including other queries saved in other installations of QI. The saved queries may be in any language (e.g. SPARQL, XQuery); the only condition for interconnection is that the queries return their results in some form of XML. This condition allows queries to chain off each other, and to be written in whatever language is appropriate for the task. We illustrate the potential use of QI for several biomedical use cases, including ontology view generation using a combination of graph-based and logical approaches, value set generation for clinical data management, image annotation using terminology obtained from an ontology web service, ontology-driven brain imaging data integration, small-scale clinical data integration, and wider-scale clinical data integration. Such use cases illustrate the current range of applications of QI and lead us to speculate about the potential evolution from smaller groups of interconnected queries into a larger query network that layers over the document and semantic web. The resulting Query Web could greatly aid researchers and others who now have to manually navigate through multiple information sources in order to answer specific questions. PMID:22531831
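
    The chaining condition (every query returns XML) can be sketched with the standard library: each "query" consumes XML and emits XML, so the output of one can be piped into the next regardless of the language each was written in. The element names and XPath expressions below are invented:

```python
import xml.etree.ElementTree as ET

def run_query(xml_input, xpath):
    """One query in the chain: consume XML, select nodes, return XML."""
    root = ET.fromstring(xml_input)
    out = ET.Element("results")
    for node in root.findall(xpath):
        out.append(node)
    return ET.tostring(out, encoding="unicode")

# chain two queries: the XML output of the first is the input of the second
source = "<concepts><c type='region'>frontal lobe</c><c type='cell'>neuron</c></concepts>"
step1 = run_query(source, "c")                   # select all concepts
step2 = run_query(step1, "c[@type='region']")    # refine to brain regions only
```

    In QI the chained queries would be saved SPARQL or XQuery queries against live web sources; only the XML-in/XML-out contract shown here is essential.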

  14. SEMCARE: Multilingual Semantic Search in Semi-Structured Clinical Data.

    PubMed

    López-García, Pablo; Kreuzthaler, Markus; Schulz, Stefan; Scherr, Daniel; Daumke, Philipp; Markó, Kornél; Kors, Jan A; van Mulligen, Erik M; Wang, Xinkai; Gonna, Hanney; Behr, Elijah; Honrado, Ángel

    2016-01-01

    The vast amount of clinical data in electronic health records constitutes a great potential for secondary use. However, most of this content consists of unstructured or semi-structured texts, which are difficult to process. Several challenges remain: the idiosyncrasies of medical language across different natural languages, and the large variety of medical terminology systems. In this paper we present SEMCARE, a European initiative designed to minimize these problems by providing a multilingual platform (English, German, and Dutch) that allows users to express complex queries and obtain relevant search results from clinical texts. SEMCARE is based on a selection of adapted biomedical terminologies, together with Apache UIMA and Apache Solr as open-source, state-of-the-art natural language processing and indexing technologies. SEMCARE has been deployed and is currently being tested at three medical institutions in the UK, Austria, and the Netherlands, showing promising results in a cardiology use case.

  15. The National Center for Biomedical Ontology

    PubMed Central

    Noy, Natalya F; Shah, Nigam H; Whetzel, Patricia L; Chute, Christopher G; Story, Margaret-Anne; Smith, Barry

    2011-01-01

    The National Center for Biomedical Ontology is now in its seventh year. The goals of this National Center for Biomedical Computing are to: create and maintain a repository of biomedical ontologies and terminologies; build tools and web services that enable the use of ontologies and terminologies in clinical and translational research; educate its trainees and the scientific community broadly about biomedical ontology, ontology-based technology and best practices; and collaborate with a variety of groups who develop and use ontologies and terminologies in biomedicine. The centerpiece of the National Center for Biomedical Ontology is a web-based resource known as BioPortal. BioPortal makes more than 270 of the world's biomedical ontologies and terminologies available for research in computationally useful forms, and supports a wide range of web services that enable investigators to use the ontologies to annotate and retrieve data, to generate value sets and special-purpose lexicons, and to perform advanced analytics on a wide range of biomedical data. PMID:22081220

  16. Ontological Approach to Military Knowledge Modeling and Management

    DTIC Science & Technology

    2004-03-01

    The federated search mechanism has to reformulate user queries (expressed using the ontology) in the query languages of the different sources (e.g. SQL). Key elements of the approach: ontologies as a common terminology; a unified query to perform federated search; query processing that maps the ontology to the sources in order to reformulate queries.

  17. Where to search top-K biomedical ontologies?

    PubMed

    Oliveira, Daniela; Butt, Anila Sahar; Haller, Armin; Rebholz-Schuhmann, Dietrich; Sahay, Ratnesh

    2018-03-20

    Searching for precise terms and terminological definitions in the biomedical data space is problematic, as researchers find overlapping, closely related and even equivalent concepts in a single ontology or across multiple ontologies. Search engines that retrieve ontological resources often suggest an extensive list of search results for a given input term, which leads to the tedious task of selecting the best-fit ontological resource (class or property) for the input term and reduces user confidence in the retrieval engines. A systematic evaluation of these search engines is necessary to understand their strengths and weaknesses under different search requirements. We implemented seven comparable Information Retrieval ranking algorithms for searching through ontologies and compared them against four ontology search engines. Free-text queries were performed, the outcomes were judged by experts, and the ranking algorithms and search engines were evaluated against the expert-based ground truth (GT). In addition, we propose a probabilistic GT, developed automatically, that provides deeper insight into and confidence in the expert-based GT and allows a broader range of search queries to be evaluated. The main outcome of this work is the identification of key search factors for biomedical ontologies, together with search requirements and a set of recommendations that will help biomedical experts and ontology engineers select the best-suited retrieval mechanism for their search scenarios. We expect that this evaluation will allow researchers and practitioners to apply the current search techniques more reliably and will help them select the right solution for their daily work. The source code (of the seven ranking algorithms), ground truths and experimental results are available at https://github.com/danielapoliveira/bioont-search-benchmark.
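
    Comparing ranking algorithms against an expert ground truth typically uses graded metrics such as NDCG; a minimal sketch, with `gt_grades` standing in for expert judgments (higher grade = more relevant):

```python
import math

def ndcg(ranked, gt_grades, k=10):
    """Normalized discounted cumulative gain at cutoff k."""
    def dcg(grades):
        return sum(g / math.log2(i + 2) for i, g in enumerate(grades[:k]))
    actual = dcg([gt_grades.get(r, 0) for r in ranked])
    ideal = dcg(sorted(gt_grades.values(), reverse=True))
    return actual / ideal if ideal else 0.0

grades = {"concept_a": 2, "concept_b": 1}   # hypothetical expert judgments
perfect = ndcg(["concept_a", "concept_b"], grades)   # 1.0
swapped = ndcg(["concept_b", "concept_a"], grades)   # < 1.0
```

    A ranking matching the expert ordering scores 1.0; placing less relevant resources higher is penalized more strongly near the top of the list, which matches how users scan ontology search results.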

  18. An alternative database approach for management of SNOMED CT and improved patient data queries.

    PubMed

    Campbell, W Scott; Pedersen, Jay; McClay, James C; Rao, Praveen; Bastola, Dhundy; Campbell, James R

    2015-10-01

    SNOMED CT is the international lingua franca of terminologies for human health. Based on Description Logics (DL), the terminology enables data queries that incorporate inferences between data elements as well as the relationships that are explicitly stated. However, the ontologic and polyhierarchical nature of the SNOMED CT concept model makes it difficult to implement in its entirety within electronic health record systems, which largely employ object-oriented or relational database architectures. The result is a reduction of data richness, limitations on query capability and increased systems overhead. The hypothesis of this research was that a graph database (graph DB) architecture using SNOMED CT as the basis for the data model, and subsequently modeling patient data upon the semantic core of SNOMED CT, could exploit the full value of the terminology to enrich and support advanced querying of patient data sets. The hypothesis was tested by instantiating a graph DB with the fully classified SNOMED CT concept model. The graph DB instance was tested for integrity by calculating the transitive closure table for the SNOMED CT hierarchy and comparing the results with transitive closure tables created using current, validated methods. The graph DB was then populated with 461,171 anonymized patient record fragments and over 2.1 million associated SNOMED CT clinical findings. Queries, including concept negation and disjunction, were then run against the graph database and against an enterprise Oracle relational database (RDBMS) holding the same patient data sets. The graph DB was then populated with laboratory data encoded using LOINC as well as medication data encoded with RxNorm, and complex queries were performed using LOINC, RxNorm and SNOMED CT to identify uniquely described patient populations. A graph database instance was successfully created for two international releases of SNOMED CT and two US SNOMED CT editions. Transitive closure tables and descriptive statistics generated using the graph database were identical to those produced using validated methods. Patient queries produced patient counts identical to the Oracle RDBMS in comparable times. Database queries involving the defining attributes of SNOMED CT concepts were possible with the graph DB. The same queries could not be performed directly with the Oracle RDBMS representation of the patient data and required the creation and use of external terminology services. Further, queries of undefined depth succeeded in identifying unknown relationships between patient cohorts. The results of this study supported the hypothesis that a patient database built upon and around the semantic model of SNOMED CT is possible. The model supported queries that leveraged all aspects of the SNOMED CT logical model to produce clinically relevant results. Logical disjunction and negation queries were possible using the data model, as were queries that extended beyond the structural IS_A hierarchy of SNOMED CT to employ the defining attribute-values of SNOMED CT concepts as search parameters. As medical terminologies such as SNOMED CT continue to expand, they will become more complex, and model consistency will be more difficult to assure. Simultaneously, consumers of data will increasingly demand improved query functionality to accommodate additional granularity of clinical concepts without sacrificing speed. This new line of research provides an alternative approach to instantiating and querying patient data represented using advanced computable clinical terminologies. Copyright © 2015 Elsevier Inc. All rights reserved.
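
    The transitive closure used to validate the graph instantiation can be sketched over a toy IS_A table. The concept names are invented; real SNOMED CT releases contain hundreds of thousands of concepts, for which this memoized traversal is the same idea at scale:

```python
def transitive_closure(is_a):
    """Compute all ancestor sets from direct IS_A edges (child -> parents),
    the same table used to cross-check the graph DB against validated methods."""
    closure = {}
    def ancestors(c):
        if c not in closure:
            closure[c] = set()
            for p in is_a.get(c, ()):
                closure[c] |= {p} | ancestors(p)  # parent plus its ancestors
        return closure[c]
    for c in list(is_a):
        ancestors(c)
    return closure

# toy IS_A hierarchy with illustrative concept names
closure = transitive_closure({"melanoma": ["neoplasm"], "neoplasm": ["disease"]})
print(closure["melanoma"])  # contains both 'neoplasm' and 'disease'
```

    Inferred ancestor pairs like (melanoma, disease) are exactly what subsumption queries over patient data rely on, and why an identical closure table is a meaningful integrity check.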

  19. Multi-field query expansion is effective for biomedical dataset retrieval.

    PubMed

    Bouadjenek, Mohamed Reda; Verspoor, Karin

    2017-01-01

    In the context of the bioCADDIE challenge addressing information retrieval of biomedical datasets, we propose a method for retrieval of biomedical data sets with heterogeneous schemas through query reformulation. In particular, the proposed method transforms the initial query into a multi-field query that is then enriched with terms that are likely to occur in the relevant datasets. We compare and evaluate two query expansion strategies, one based on the Rocchio method and another based on a biomedical lexicon. We then perform a comprehensive comparative evaluation of our method on the bioCADDIE dataset collection for biomedical retrieval. We demonstrate the effectiveness of our multi-field query method compared to two baselines, with MAP improved from 0.2171 and 0.2669 to 0.2996. We also show the benefits of query expansion, where the Rocchio expansion method improves the MAP for our two baselines from 0.2171 and 0.2669 to 0.335. We show that the Rocchio query expansion method slightly outperforms the one based on the biomedical lexicon as a source of terms, with an improvement of roughly 3% in MAP. However, the query expansion method based on the biomedical lexicon is much less resource-intensive, since it requires neither computation of a relevance feedback set nor an initial execution of the query. Hence, in terms of the trade-off between efficiency (execution time) and retrieval accuracy, we argue that the query expansion method based on the biomedical lexicon offers the best performance for a prototype biomedical data search engine intended to be used at large scale. In the official bioCADDIE challenge results, although our approach ranks seventh on the infNDCG evaluation metric, it ranks second on P@10 and NDCG. Hence, the method proposed here provides good overall retrieval performance relative to the approaches of other competitors. Consequently, the observations made in this paper should benefit the development of a Data Discovery Index prototype or the improvement of the existing one. © The Author(s) 2017. Published by Oxford University Press.
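
    The Rocchio-based expansion strategy described above can be illustrated with a minimal sketch: the query's term weights are combined with the centroid of terms from the (pseudo-)relevant feedback documents, and the top-weighted terms form the expanded query. This uses raw term counts rather than the paper's actual weighting or multi-field structure, and all names and data are hypothetical.

```python
from collections import Counter

def rocchio_expand(query_terms, feedback_docs, alpha=1.0, beta=0.75, top_k=10):
    """Rocchio-style expansion: combine the original query vector with the
    centroid of (pseudo-)relevant documents, then keep the top-weighted terms.
    Term weights here are raw counts; a real system would use TF-IDF."""
    q = Counter(query_terms)
    centroid = Counter()
    for doc in feedback_docs:
        centroid.update(doc)
    n_docs = max(len(feedback_docs), 1)
    expanded = Counter()
    for term in set(q) | set(centroid):
        expanded[term] = alpha * q[term] + beta * centroid[term] / n_docs
    return [t for t, _ in expanded.most_common(top_k)]

docs = [["gene", "expression", "dataset", "microarray"],
        ["gene", "dataset", "rna", "expression"]]
print(rocchio_expand(["gene", "expression"], docs, top_k=5))
```

    Note the trade-off the abstract highlights: this method requires a feedback set (and hence an initial retrieval run), whereas a lexicon-based expansion needs neither.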

  20. Multi-field query expansion is effective for biomedical dataset retrieval

    PubMed Central

    2017-01-01

    Abstract In the context of the bioCADDIE challenge addressing information retrieval of biomedical datasets, we propose a method for retrieval of biomedical data sets with heterogeneous schemas through query reformulation. In particular, the proposed method transforms the initial query into a multi-field query that is then enriched with terms that are likely to occur in the relevant datasets. We compare and evaluate two query expansion strategies, one based on the Rocchio method and another based on a biomedical lexicon. We then perform a comprehensive comparative evaluation of our method on the bioCADDIE dataset collection for biomedical retrieval. We demonstrate the effectiveness of our multi-field query method compared to two baselines, with MAP improved from 0.2171 and 0.2669 to 0.2996. We also show the benefits of query expansion, where the Rocchio expansion method improves the MAP for our two baselines from 0.2171 and 0.2669 to 0.335. We show that the Rocchio query expansion method slightly outperforms the one based on the biomedical lexicon as a source of terms, with an improvement of roughly 3% in MAP. However, the query expansion method based on the biomedical lexicon is much less resource-intensive, since it requires neither computation of a relevance feedback set nor an initial execution of the query. Hence, in terms of the trade-off between efficiency (execution time) and retrieval accuracy, we argue that the query expansion method based on the biomedical lexicon offers the best performance for a prototype biomedical data search engine intended to be used at large scale. In the official bioCADDIE challenge results, although our approach ranks seventh on the infNDCG evaluation metric, it ranks second on P@10 and NDCG. Hence, the method proposed here provides good overall retrieval performance relative to the approaches of other competitors. Consequently, the observations made in this paper should benefit the development of a Data Discovery Index prototype or the improvement of the existing one. PMID:29220457

  1. Customization of biomedical terminologies.

    PubMed

    Homo, Julien; Dupuch, Laëtitia; Benbrahim, Allel; Grabar, Natalia; Dupuch, Marie

    2012-01-01

    Within the biomedical area over one hundred terminologies exist and are merged in the Unified Medical Language System (UMLS) Metathesaurus, which contains over 1 million concepts. When such huge terminological resources are available, users must cope with them, and in particular with the parts of these terminologies that are irrelevant to their task. We propose to exploit seed terms and semantic distance algorithms in order to customize the terminologies and to delimit a semantically homogeneous space within them. An evaluation performed by a medical expert indicates that the proposed approach is relevant for the customization of terminologies and that the extracted terms are mostly relevant to the seeds. It also indicates that different algorithms provide similar or identical results within a given terminology; the differences are due to the terminologies exploited. Special attention must be paid to defining the optimal association between the semantic similarity algorithms and the thresholds specific to a given terminology.
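
    One simple way to realize the seed-based customization described above is a breadth-first walk over the terminology's is-a links, keeping only concepts within a fixed distance of a seed term. This sketch uses plain edge-count distance; the paper evaluates more refined semantic-distance algorithms, and the example concepts here are hypothetical.

```python
from collections import deque

def customize(hierarchy, seeds, max_dist=2):
    """Keep only concepts within max_dist edges of a seed term, treating the
    terminology's is-a links as an undirected graph (a simple stand-in for
    the semantic-distance algorithms evaluated in the paper)."""
    # Build undirected adjacency from (child, parent) links.
    adj = {}
    for child, parent in hierarchy:
        adj.setdefault(child, set()).add(parent)
        adj.setdefault(parent, set()).add(child)
    kept = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, d = frontier.popleft()
        if d == max_dist:
            continue
        for nb in adj.get(node, ()):
            if nb not in kept:
                kept.add(nb)
                frontier.append((nb, d + 1))
    return kept

edges = [("myocardial infarction", "heart disease"),
         ("heart disease", "cardiovascular disease"),
         ("asthma", "lung disease")]
print(customize(edges, {"myocardial infarction"}, max_dist=2))
```

    Varying `max_dist` plays the role of the per-terminology threshold the abstract says must be tuned.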

  2. Use of controlled vocabularies to improve biomedical information retrieval tasks.

    PubMed

    Pasche, Emilie; Gobeill, Julien; Vishnyakova, Dina; Ruch, Patrick; Lovis, Christian

    2013-01-01

    The high heterogeneity of biomedical vocabulary is a major obstacle for information retrieval in large biomedical collections. Therefore, using biomedical controlled vocabularies is crucial for managing these contents. We investigate the impact of query expansion based on controlled vocabularies to improve the effectiveness of two search engines. Our strategy relies on the enrichment of users' queries with additional terms, directly derived from such vocabularies applied to infectious diseases and chemical patents. We observed that query expansion based on pathogen names resulted in improvements of the top-precision of our first search engine, while the normalization of diseases degraded the top-precision. The expansion of chemical entities, which was performed on the second search engine, positively affected the mean average precision. We have shown that query expansion of some types of biomedical entities has a great potential to improve search effectiveness; therefore a fine-tuning of query expansion strategies could help improving the performances of search engines.

  3. Environmental/Biomedical Terminology Index

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huffstetler, J.K.; Dailey, N.S.; Rickert, L.W.

    1976-12-01

    The Information Center Complex (ICC), a centrally administered group of information centers, provides information support to environmental and biomedical research groups and others within and outside Oak Ridge National Laboratory. In-house data base building and development of specialized document collections are important elements of the ongoing activities of these centers. ICC groups must be concerned with language which will adequately classify and ensure retrievability of document records. Language control problems are compounded when the complexity of modern scientific problem solving demands an interdisciplinary approach. Although there are several word lists, indexes, and thesauri specific to the various scientific disciplines usually grouped as Environmental Sciences, no single generally recognized authority can be used as a guide to the terminology of all environmental science. If biomedical terminology for the description of research on environmental effects is also needed, the problem becomes even more complex. The building of a word list which can be used as a general guide to the environmental/biomedical sciences has been a continuing activity of the Information Center Complex. This activity resulted in the publication of the Environmental Biomedical Terminology Index (EBTI).

  4. Biomedical Terminology Mapper for UML projects.

    PubMed

    Thibault, Julien C; Frey, Lewis

    2013-01-01

    As the biomedical community collects and generates more and more data, the need to describe these datasets for exchange and interoperability becomes crucial. This paper presents a mapping algorithm that can help developers expose local implementations described with UML through standard terminologies. The input UML class or attribute name is first normalized and tokenized, then lookups in a UMLS-based dictionary are performed. For the evaluation of the algorithm 142 UML projects were extracted from caGrid and automatically mapped to National Cancer Institute (NCI) terminology concepts. Resulting mappings at the UML class and attribute levels were compared to the manually curated annotations provided in caGrid. Results are promising and show that this type of algorithm could speed-up the tedious process of mapping local implementations to standard biomedical terminologies.
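
    The normalization-and-lookup pipeline described above can be sketched roughly as follows: split a UML class or attribute name on camel-case boundaries and separators, then try a phrase-level lookup before falling back to per-token lookups. The tiny dictionary stands in for the UMLS-based dictionary; the concept identifiers shown are illustrative, not verified UMLS CUIs.

```python
import re

def tokenize(name):
    """Split a UML class/attribute name on camel-case boundaries and
    separators, lower-casing the tokens
    (e.g. 'patientBirthDate' -> ['patient', 'birth', 'date'])."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return [t.lower() for t in re.split(r"[\s_\-]+", spaced) if t]

def map_to_concepts(name, dictionary):
    """Look up the normalized phrase first, then fall back to per-token
    lookups (a simplified stand-in for the UMLS-based dictionary lookups
    in the paper)."""
    tokens = tokenize(name)
    phrase = " ".join(tokens)
    if phrase in dictionary:
        return [dictionary[phrase]]
    return [dictionary[t] for t in tokens if t in dictionary]

# Illustrative dictionary; identifiers are hypothetical, not verified CUIs.
umls_like = {"patient": "C0030705", "birth date": "C0421451", "gene": "C0017337"}
print(map_to_concepts("PatientGene", umls_like))
```

    Resulting class- and attribute-level mappings would then be compared against curated annotations, as in the caGrid evaluation.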

  5. Biomedical Terminology Mapper for UML projects

    PubMed Central

    Thibault, Julien C.; Frey, Lewis

    As the biomedical community collects and generates more and more data, the need to describe these datasets for exchange and interoperability becomes crucial. This paper presents a mapping algorithm that can help developers expose local implementations described with UML through standard terminologies. The input UML class or attribute name is first normalized and tokenized, then lookups in a UMLS-based dictionary are performed. For the evaluation of the algorithm 142 UML projects were extracted from caGrid and automatically mapped to National Cancer Institute (NCI) terminology concepts. Resulting mappings at the UML class and attribute levels were compared to the manually curated annotations provided in caGrid. Results are promising and show that this type of algorithm could speed-up the tedious process of mapping local implementations to standard biomedical terminologies. PMID:24303278

  6. Management of Dynamic Biomedical Terminologies: Current Status and Future Challenges

    PubMed Central

    Dos Reis, J. C.; Pruski, C.

    2015-01-01

    Summary Objectives Controlled terminologies and their dependent artefacts provide a consensual understanding of a domain while reducing ambiguities and enabling reasoning. However, the evolution of a domain’s knowledge directly impacts these terminologies and generates inconsistencies in the underlying biomedical information systems. In this article, we review existing work addressing the dynamic aspect of terminologies as well as their effects on mappings and semantic annotations. Methods We investigate approaches related to the identification, characterization and propagation of changes in terminologies, mappings and semantic annotations including techniques to update their content. Results and conclusion Based on the explored issues and existing methods, we outline open research challenges requiring investigation in the near future. PMID:26293859

  7. Clinical data integration model. Core interoperability ontology for research using primary care data.

    PubMed

    Ethier, J-F; Curcin, V; Barton, A; McGilchrist, M M; Bastiaens, H; Andreasson, A; Rossiter, J; Zhao, L; Arvanitis, T N; Taweel, A; Delaney, B C; Burgun, A

    2015-01-01

    This article is part of the Focus Theme of METHODS of Information in Medicine on "Managing Interoperability and Complexity in Health Systems". Primary care data is the single richest source of routine health care data. However, its use, both in research and clinical work, often requires data from multiple clinical sites, clinical trials databases and registries. Data integration and interoperability are therefore of utmost importance. TRANSFoRm's general approach relies on a unified interoperability framework, described in a previous paper. We developed a core ontology for an interoperability framework based on data mediation. This article presents how such an ontology, the Clinical Data Integration Model (CDIM), can be designed to support, in conjunction with appropriate terminologies, biomedical data federation within TRANSFoRm, an EU FP7 project that aims to develop the digital infrastructure for a learning healthcare system in European primary care. TRANSFoRm utilizes a unified structural/terminological interoperability framework based on the local-as-view mediation paradigm. Such an approach mandates that the global information model describe the domain of interest independently of the data sources to be explored. Following a requirements analysis process, no existing ontology focusing on primary care research was identified; we therefore designed a realist ontology based on Basic Formal Ontology to support our framework, in collaboration with various terminologies used in primary care. The resulting ontology has 549 classes and 82 object properties and is used to support data integration for TRANSFoRm's use cases. Concepts identified by researchers were successfully expressed in queries using CDIM and pertinent terminologies. As an example, we illustrate how, in TRANSFoRm, the Query Formulation Workbench can capture eligibility criteria in a computable representation based on CDIM.
A unified mediation approach to semantic interoperability provides a flexible and extensible framework for all types of interaction between health record systems and research systems. CDIM, as core ontology of such an approach, enables simplicity and consistency of design across the heterogeneous software landscape and can support the specific needs of EHR-driven phenotyping research using primary care data.

  8. [Big data, medical language and biomedical terminology systems].

    PubMed

    Schulz, Stefan; López-García, Pablo

    2015-08-01

    A variety of rich terminology systems, such as thesauri, classifications, nomenclatures and ontologies support information and knowledge processing in health care and biomedical research. Nevertheless, human language, manifested as individually written texts, persists as the primary carrier of information, in the description of disease courses or treatment episodes in electronic medical records, and in the description of biomedical research in scientific publications. In the context of the discussion about big data in biomedicine, we hypothesize that the abstraction of the individuality of natural language utterances into structured and semantically normalized information facilitates the use of statistical data analytics to distil new knowledge out of textual data from biomedical research and clinical routine. Computerized human language technologies are constantly evolving and are increasingly ready to annotate narratives with codes from biomedical terminology. However, this depends heavily on linguistic and terminological resources. The creation and maintenance of such resources is labor-intensive. Nevertheless, it is sensible to assume that big data methods can be used to support this process. Examples include the learning of hierarchical relationships, the grouping of synonymous terms into concepts and the disambiguation of homonyms. Although clear evidence is still lacking, the combination of natural language technologies, semantic resources, and big data analytics is promising.

  9. The caBIG Terminology Review Process

    PubMed Central

    Cimino, James J.; Hayamizu, Terry F.; Bodenreider, Olivier; Davis, Brian; Stafford, Grace A.; Ringwald, Martin

    2009-01-01

    The National Cancer Institute (NCI) is developing an integrated biomedical informatics infrastructure, the cancer Biomedical Informatics Grid (caBIG®), to support collaboration within the cancer research community. A key part of the caBIG architecture is the establishment of terminology standards for representing data. In order to evaluate the suitability of existing controlled terminologies, the caBIG Vocabulary and Data Elements Workspace (VCDE WS) working group has developed a set of criteria that serve to assess a terminology's structure, content, documentation, and editorial process. This paper describes the evolution of these criteria and the results of their use in evaluating four standard terminologies: the Gene Ontology (GO), the NCI Thesaurus (NCIt), the Common Terminology Criteria for Adverse Events (CTCAE), and the laboratory portion of the Logical Observation Identifiers Names and Codes (LOINC). The resulting caBIG criteria are presented as a matrix that may be applicable to any terminology standardization effort. PMID:19154797

  10. DataMed - an open source discovery index for finding biomedical datasets.

    PubMed

    Chen, Xiaoling; Gururaj, Anupama E; Ozyurt, Burak; Liu, Ruiling; Soysal, Ergin; Cohen, Trevor; Tiryaki, Firat; Li, Yueling; Zong, Nansu; Jiang, Min; Rogith, Deevakar; Salimi, Mandana; Kim, Hyeon-Eui; Rocca-Serra, Philippe; Gonzalez-Beltran, Alejandra; Farcas, Claudiu; Johnson, Todd; Margolis, Ron; Alter, George; Sansone, Susanna-Assunta; Fore, Ian M; Ohno-Machado, Lucila; Grethe, Jeffrey S; Xu, Hua

    2018-01-13

    Finding relevant datasets is important for promoting data reuse in the biomedical domain, but it is challenging given the volume and complexity of biomedical data. Here we describe the development of an open source biomedical data discovery system called DataMed, with the goal of promoting the building of additional data indexes in the biomedical domain. DataMed, which can efficiently index and search diverse types of biomedical datasets across repositories, is developed through the National Institutes of Health-funded biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium. It consists of 2 main components: (1) a data ingestion pipeline that collects and transforms original metadata information to a unified metadata model, called DatA Tag Suite (DATS), and (2) a search engine that finds relevant datasets based on user-entered queries. In addition to describing its architecture and techniques, we evaluated individual components within DataMed, including the accuracy of the ingestion pipeline, the prevalence of the DATS model across repositories, and the overall performance of the dataset retrieval engine. Our manual review shows that the ingestion pipeline could achieve an accuracy of 90% and that core elements of DATS had varied frequency across repositories. On a manually curated benchmark dataset, the DataMed search engine achieved an inferred average precision of 0.2033 and a precision at 10 (P@10, the fraction of relevant results in the top 10 search results) of 0.6022, by implementing advanced natural language processing and terminology services. Currently, we have made the DataMed system publicly available as an open source package for the biomedical community. © The Author 2018. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com
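
    The retrieval metrics reported above can be made concrete with a short sketch. bioCADDIE's official measure is an *inferred* average precision computed over sampled relevance judgments; shown here are the plain P@k and average-precision definitions for fully judged rankings, with invented data.

```python
def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k retrieved items that are relevant (P@k)."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant item
    appears; averaging this over queries gives MAP."""
    hits, total = 0, 0.0
    for i, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

ranked = ["d1", "d4", "d2", "d9", "d3"]  # system output, best first
rel = {"d1", "d2", "d3"}                 # judged-relevant items
print(precision_at_k(ranked, rel, k=5), average_precision(ranked, rel))
```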

  11. Terminology representation guidelines for biomedical ontologies in the semantic web notations.

    PubMed

    Tao, Cui; Pathak, Jyotishman; Solbrig, Harold R; Wei, Wei-Qi; Chute, Christopher G

    2013-02-01

    Terminologies and ontologies are increasingly prevalent in healthcare and biomedicine. However, they suffer from inconsistent renderings, distribution formats, and syntax that make their use through common terminology services challenging. To address the problem, one could posit a shared representation syntax, associated schema, and tags. We identified a set of commonly used elements in biomedical ontologies and terminologies based on our experience with the Common Terminology Services 2 (CTS2) Specification as well as the Lexical Grid (LexGrid) project. We propose guidelines for precisely such a shared terminology model, and recommend tags assembled from SKOS, OWL, Dublin Core, RDF Schema, and DCMI meta-terms. We divide these guidelines into lexical information (e.g., synonyms and definitions) and semantic information (e.g., hierarchies); the latter we distinguish for use by informal terminologies vs. formal ontologies. We then evaluate the guidelines with a spectrum of widely used terminologies and ontologies to examine how the lexical guidelines are implemented, and whether our proposed guidelines would enhance interoperability. Copyright © 2012 Elsevier Inc. All rights reserved.

  12. Terminology issues in user access to Web-based medical information.

    PubMed Central

    McCray, A. T.; Loane, R. F.; Browne, A. C.; Bangalore, A. K.

    1999-01-01

    We conducted a study of user queries to the National Library of Medicine Web site over a three month period. Our purpose was to study the nature and scope of these queries in order to understand how to improve users' access to the information they are seeking on our site. The results show that the queries are primarily medical in content (94%), with only a small percentage (5.5%) relating to library services, and with a very small percentage (.5%) not being medically relevant at all. We characterize the data set, and conclude with a discussion of our plans to develop a UMLS-based terminology server to assist NLM Web users. PMID:10566330

  13. Searching PubMed for studies on bacteremia, bloodstream infection, septicemia, or whatever the best term is: a note of caution.

    PubMed

    Søgaard, Mette; Andersen, Jens P; Schønheyder, Henrik C

    2012-04-01

    There is inconsistency in the terminology used to describe bacteremia. To demonstrate the impact on information retrieval, we compared the yield of articles from PubMed MEDLINE using the terms "bacteremia," "bloodstream infection," and "septicemia." We searched for articles published between 1966 and 2009, and depicted the relationships among queries graphically. To examine the content of the retrieved articles, we extracted all Medical Subject Headings (MeSH) terms and compared topic similarity using a cosine measure. The recovered articles differed greatly by term, and only 53 articles were captured by all terms. Of the articles retrieved by the "bacteremia" query, 21,438 (84.1%) were not captured when searching for "bloodstream infection" or "septicemia." Likewise, only 2,243 of the 11,796 articles recovered by free-text query for "bloodstream infection" were retrieved by the "bacteremia" query (19%). Entering "bloodstream infection" as a phrase, 46.1% of the records overlapped with the "bacteremia" query. Similarity measures ranged from 0.52 to 0.78 and were lowest for "bloodstream infection" as a phrase compared with "septicemia." Inconsistent terminology has a major impact on the yield of queries. Agreement on terminology should be sought and promoted by scientific journals. An immediate solution is to add "bloodstream infection" as entry term for bacteremia in the MeSH vocabulary. Copyright © 2012 Association for Professionals in Infection Control and Epidemiology, Inc. Published by Mosby, Inc. All rights reserved.
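
    The MeSH-based topic-similarity comparison the study performs amounts to a cosine measure over bags of descriptor counts, along these lines (the term profiles below are invented for illustration, not the study's data):

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bags of MeSH descriptor counts —
    the kind of topic-similarity measure the study describes."""
    common = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in common)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical descriptor profiles for two query result sets.
bacteremia = Counter({"Bacteremia": 10, "Sepsis": 4, "Cross Infection": 2})
bsi = Counter({"Sepsis": 6, "Cross Infection": 3, "Catheter-Related Infections": 5})
print(round(cosine(bacteremia, bsi), 3))
```

    A low similarity between two term variants' profiles, as found for "bloodstream infection" vs. "septicemia", signals that the queries retrieve substantially different literature.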

  14. Translating standards into practice - one Semantic Web API for Gene Expression.

    PubMed

    Deus, Helena F; Prud'hommeaux, Eric; Miller, Michael; Zhao, Jun; Malone, James; Adamusiak, Tomasz; McCusker, Jim; Das, Sudeshna; Rocca Serra, Philippe; Fox, Ronan; Marshall, M Scott

    2012-08-01

    Sharing and describing experimental results unambiguously with sufficient detail to enable replication of results is a fundamental tenet of scientific research. In today's cluttered world of "-omics" sciences, data standards and standardized use of terminologies and ontologies for biomedical informatics play an important role in reporting high-throughput experiment results in formats that can be interpreted by both researchers and analytical tools. Increasing adoption of Semantic Web and Linked Data technologies for the integration of heterogeneous and distributed health care and life sciences (HCLS) datasets has made the reuse of standards even more pressing; dynamic semantic query federation can be used for integrative bioinformatics when ontologies and identifiers are reused across data instances. We present here a methodology to integrate the results and experimental context of three different representations of microarray-based transcriptomic experiments: the Gene Expression Atlas, the W3C BioRDF task force approach to reporting provenance of microarray experiments, and the HSCI blood genomics project. Our approach does not attempt to improve the expressivity of existing standards for genomics but, instead, to enable integration of existing datasets published from microarray-based transcriptomic experiments. SPARQL CONSTRUCT is used to create a posteriori mappings of concepts and properties and linking rules that match entities based on query constraints. We discuss how our integrative approach can encourage reuse of the Experimental Factor Ontology (EFO) and the Ontology for Biomedical Investigations (OBI) for the reporting of experimental context and results of gene expression studies. Copyright © 2012 Elsevier Inc. All rights reserved.
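
    An a-posteriori mapping of the kind described above, expressed with SPARQL CONSTRUCT, might look like the following sketch. The `local:` vocabulary and the specific EFO class are illustrative assumptions, not the project's actual mapping rules.

```sparql
# Sketch of an a-posteriori mapping rule (all URIs illustrative):
# re-express a locally modeled sample attribute as an EFO-typed factor value.
PREFIX efo:   <http://www.ebi.ac.uk/efo/>
PREFIX local: <http://example.org/local-schema/>

CONSTRUCT {
  ?sample local:hasFactorValue ?value .
  ?value  a efo:EFO_0000001 .            # hypothetical target class
}
WHERE {
  ?sample local:attribute ?value .
  ?value  local:attributeType "organism part" .
}
```

    Because CONSTRUCT emits new triples rather than altering sources, such rules can link datasets without modifying the original published RDF.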

  15. Fostering Multilinguality in the UMLS: A Computational Approach to Terminology Expansion for Multiple Languages

    PubMed Central

    Hellrich, Johannes; Hahn, Udo

    2014-01-01

    We here report on efforts to computationally support the maintenance and extension of multilingual biomedical terminology resources. Our main idea is to treat term acquisition as a classification problem guided by term alignment in parallel multilingual corpora, using termhood information coming from a named entity recognition system as a novel feature. We report on experiments for the Spanish, French, German and Dutch parts of a multilingual UMLS-derived biomedical terminology. These efforts yielded 19k, 18k, 23k and 12k new terms and synonyms, respectively, about half of which relate to concepts without a previously available term label in these non-English languages. Based on expert assessment of a novel German terminology sample, 80% of the newly acquired terms were judged as reasonable additions to the terminology. PMID:25954371

  16. A Review of Auditing Methods Applied to the Content of Controlled Biomedical Terminologies

    PubMed Central

    Zhu, Xinxin; Fan, Jung-Wei; Baorto, David M.; Weng, Chunhua; Cimino, James J.

    2012-01-01

    Although controlled biomedical terminologies have been with us for centuries, it is only in the last couple of decades that close attention has been paid to the quality of these terminologies. The result of this attention has been the development of auditing methods that apply formal methods to assessing whether terminologies are complete and accurate. We have performed an extensive literature review to identify published descriptions of these methods and have created a framework for characterizing them. The framework considers manual, systematic and heuristic methods that use knowledge (within or external to the terminology) to measure quality factors of different aspects of the terminology content (terms, semantic classification, and semantic relationships). The quality factors examined included concept orientation, consistency, non-redundancy, soundness and comprehensive coverage. We reviewed 130 studies that were retrieved based on keyword search on publications in PubMed, and present our assessment of how they fit into our framework. We also identify which terminologies have been audited with the methods and provide examples to illustrate each part of the framework. PMID:19285571

  17. Terminology development towards harmonizing multiple clinical neuroimaging research repositories.

    PubMed

    Turner, Jessica A; Pasquerello, Danielle; Turner, Matthew D; Keator, David B; Alpert, Kathryn; King, Margaret; Landis, Drew; Calhoun, Vince D; Potkin, Steven G; Tallis, Marcelo; Ambite, Jose Luis; Wang, Lei

    2015-07-01

    Data sharing and mediation across disparate neuroimaging repositories requires extensive effort to ensure that the different domains of data types are referred to by commonly agreed upon terms. Within the SchizConnect project, which enables querying across decentralized databases of neuroimaging, clinical, and cognitive data from various studies of schizophrenia, we developed a model for each data domain, identified common usable terms that could be agreed upon across the repositories, and linked them to standard ontological terms where possible. We had the goal of facilitating both the current user experience in querying and future automated computations and reasoning regarding the data. We found that existing terminologies are incomplete for these purposes, even with the history of neuroimaging data sharing in the field; and we provide a model for efforts focused on querying multiple clinical neuroimaging repositories.

  18. Terminology development towards harmonizing multiple clinical neuroimaging research repositories

    PubMed Central

    Turner, Jessica A.; Pasquerello, Danielle; Turner, Matthew D.; Keator, David B.; Alpert, Kathryn; King, Margaret; Landis, Drew; Calhoun, Vince D.; Potkin, Steven G.; Tallis, Marcelo; Ambite, Jose Luis; Wang, Lei

    2015-01-01

    Data sharing and mediation across disparate neuroimaging repositories requires extensive effort to ensure that the different domains of data types are referred to by commonly agreed upon terms. Within the SchizConnect project, which enables querying across decentralized databases of neuroimaging, clinical, and cognitive data from various studies of schizophrenia, we developed a model for each data domain, identified common usable terms that could be agreed upon across the repositories, and linked them to standard ontological terms where possible. We had the goal of facilitating both the current user experience in querying and future automated computations and reasoning regarding the data. We found that existing terminologies are incomplete for these purposes, even with the history of neuroimaging data sharing in the field; and we provide a model for efforts focused on querying multiple clinical neuroimaging repositories. PMID:26688838

  19. Modeling a description logic vocabulary for cancer research.

    PubMed

    Hartel, Frank W; de Coronado, Sherri; Dionne, Robert; Fragoso, Gilberto; Golbeck, Jennifer

    2005-04-01

    The National Cancer Institute has developed the NCI Thesaurus, a biomedical vocabulary for cancer research, covering terminology across a wide range of cancer research domains. A major design goal of the NCI Thesaurus is to facilitate translational research. We describe: the features of Ontylog, a description logic used to build NCI Thesaurus; our methodology for enhancing the terminology through collaboration between ontologists and domain experts, and for addressing certain real world challenges arising in modeling the Thesaurus; and finally, we describe the conversion of NCI Thesaurus from Ontylog into Web Ontology Language Lite. Ontylog has proven well suited for constructing big biomedical vocabularies. We have capitalized on the Ontylog constructs Kind and Role in the collaboration process described in this paper to facilitate communication between ontologists and domain experts. The artifacts and processes developed by NCI for collaboration may be useful in other biomedical terminology development efforts.

  20. A prototype system to support evidence-based practice.

    PubMed

    Demner-Fushman, Dina; Seckman, Charlotte; Fisher, Cheryl; Hauser, Susan E; Clayton, Jennifer; Thoma, George R

    2008-11-06

    Translating evidence into clinical practice is a complex process that depends on the availability of evidence, the environment into which the research evidence is translated, and the system that facilitates the translation. This paper presents InfoBot, a system designed for automatic delivery of patient-specific information from evidence-based resources. A prototype system has been implemented to support development of individualized patient care plans. The prototype explores possibilities to automatically extract patients' problems from the interdisciplinary team notes and query evidence-based resources using the extracted terms. Using 4,335 de-identified interdisciplinary team notes for 525 patients, the system automatically extracted biomedical terminology from 4,219 notes and linked resources to 260 patient records. Sixty of those records (15 each for Pediatrics, Oncology & Hematology, Medical & Surgical, and Behavioral Health units) have been selected for an ongoing evaluation of the quality of automatically and proactively delivered evidence and its usefulness in development of care plans.

  1. A Prototype System to Support Evidence-based Practice

    PubMed Central

    Demner-Fushman, Dina; Seckman, Charlotte; Fisher, Cheryl; Hauser, Susan E.; Clayton, Jennifer; Thoma, George R.

    2008-01-01

    Translating evidence into clinical practice is a complex process that depends on the availability of evidence, the environment into which the research evidence is translated, and the system that facilitates the translation. This paper presents InfoBot, a system designed for automatic delivery of patient-specific information from evidence-based resources. A prototype system has been implemented to support development of individualized patient care plans. The prototype explores possibilities to automatically extract patients’ problems from the interdisciplinary team notes and query evidence-based resources using the extracted terms. Using 4,335 de-identified interdisciplinary team notes for 525 patients, the system automatically extracted biomedical terminology from 4,219 notes and linked resources to 260 patient records. Sixty of those records (15 each for Pediatrics, Oncology & Hematology, Medical & Surgical, and Behavioral Health units) have been selected for an ongoing evaluation of the quality of automatically proactively delivered evidence and its usefulness in development of care plans. PMID:18998835

  2. NCBI2RDF: enabling full RDF-based access to NCBI databases.

    PubMed

    Anguita, Alberto; García-Remesal, Miguel; de la Iglesia, Diana; Maojo, Victor

    2013-01-01

    RDF has become the standard technology for enabling interoperability among heterogeneous biomedical databases. The NCBI provides access to a large set of life sciences databases through a common interface called Entrez. However, the latter does not provide RDF-based access to such databases, and, therefore, they cannot be integrated with other RDF-compliant databases and accessed via SPARQL query interfaces. This paper presents the NCBI2RDF system, aimed at providing RDF-based access to the complete NCBI data repository. This API creates a virtual endpoint for servicing SPARQL queries over different NCBI repositories and presenting to users the query results in SPARQL results format, thus enabling this data to be integrated and/or stored with other RDF-compliant repositories. SPARQL queries are dynamically resolved, decomposed, and forwarded to the NCBI-provided E-utilities programmatic interface to access the NCBI data. Furthermore, we show how our approach increases the expressiveness of the native NCBI querying system, allowing several databases to be accessed simultaneously. This feature significantly boosts productivity when working with complex queries and saves time and effort to biomedical researchers. Our approach has been validated with a large number of SPARQL queries, thus proving its reliability and enhanced capabilities in biomedical environments.
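The query decomposition the abstract describes can be sketched as follows. The `esearch.fcgi` endpoint and its `db`/`term`/`retmax`/`retmode` parameters are the real E-utilities interface; the decomposition logic is a minimal stand-in for NCBI2RDF's SPARQL resolver, not its implementation.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def esearch_url(db, term, retmax=20):
    """Build one E-utilities esearch request for a single NCBI database."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"

def decompose(per_db_terms):
    """Fan a multi-database request out into one esearch call per
    repository, mimicking how a federated SPARQL query is decomposed."""
    return {db: esearch_url(db, term) for db, term in per_db_terms.items()}

urls = decompose({"pubmed": "BRCA1[Gene]", "protein": "BRCA1"})
```

A real resolver would then collect the per-repository results and re-serialize them in SPARQL results format.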

  3. Integrating unified medical language system and association mining techniques into relevance feedback for biomedical literature search.

    PubMed

    Ji, Yanqing; Ying, Hao; Tran, John; Dews, Peter; Massanari, R Michael

    2016-07-19

    Finding highly relevant articles from biomedical databases is challenging not only because it is often difficult to accurately express a user's underlying intention through keywords but also because a keyword-based query normally returns a long list of hits with many citations being unwanted by the user. This paper proposes a novel biomedical literature search system, called BiomedSearch, which supports complex queries and relevance feedback. The system employed association mining techniques to build a k-profile representing a user's relevance feedback. More specifically, we developed a weighted interest measure and an association mining algorithm to find the strength of association between a query and each concept in the article(s) selected by the user as feedback. The top concepts were utilized to form a k-profile used for the next-round search. BiomedSearch relies on Unified Medical Language System (UMLS) knowledge sources to map text files to standard biomedical concepts. It was designed to support queries with any levels of complexity. A prototype of BiomedSearch software was made and it was preliminarily evaluated using the Genomics data from TREC (Text Retrieval Conference) 2006 Genomics Track. Initial experiment results indicated that BiomedSearch increased the mean average precision (MAP) for a set of queries. With UMLS and association mining techniques, BiomedSearch can effectively utilize users' relevance feedback to improve the performance of biomedical literature search.
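A minimal sketch of the k-profile idea: score every concept that co-occurs with the query in the user-selected feedback documents and keep the top k for the next-round search. The plain co-occurrence count below stands in for the paper's weighted interest measure, and the concepts are illustrative.

```python
from collections import Counter

def k_profile(feedback_docs, query_terms, k=3):
    """Build a k-profile from relevance feedback: concepts appearing
    alongside the query in selected documents, ranked by frequency."""
    query = set(query_terms)
    scores = Counter()
    for concepts in feedback_docs:
        if query & set(concepts):            # document touches the query
            for c in set(concepts) - query:  # reward co-occurring concepts
                scores[c] += 1
    return [c for c, _ in scores.most_common(k)]

docs = [["asthma", "albuterol", "wheezing"],
        ["asthma", "albuterol"],
        ["diabetes", "insulin"]]
profile = k_profile(docs, ["asthma"], k=2)
```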

  4. Semantator: semantic annotator for converting biomedical text to linked data.

    PubMed

    Tao, Cui; Song, Dezhao; Sharma, Deepak; Chute, Christopher G

    2013-10-01

More than 80% of biomedical data is embedded in plain text. The unstructured nature of these text-based documents makes it challenging to easily browse and query the data of interest in them. One approach to facilitate browsing and querying biomedical text is to convert the plain text to a linked web of data, i.e., converting data originally in free text to structured formats with defined meta-level semantics. In this paper, we introduce Semantator (Semantic Annotator), a semantic-web-based environment for annotating data of interest in biomedical documents, browsing and querying the annotated data, and interactively refining annotation results if needed. Through Semantator, information of interest can be either annotated manually or semi-automatically using plug-in information extraction tools. The annotated results will be stored in RDF and can be queried using the SPARQL query language. In addition, semantic reasoners can be directly applied to the annotated data for consistency checking and knowledge inference. Semantator has been released online and has been used by the biomedical ontology community, which provided positive feedback. Our evaluation results indicated that (1) Semantator can perform the annotation functionalities as designed; (2) Semantator can be adopted in real applications in clinical and translational research; and (3) the annotated results using Semantator can be easily used in semantic-web-based reasoning tools for further inference. Copyright © 2013 Elsevier Inc. All rights reserved.
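The annotate-then-query pipeline can be illustrated by serializing one text annotation as RDF N-Triples, the kind of record an annotator stores for later SPARQL querying. The predicate URIs below are hypothetical, not Semantator's actual vocabulary.

```python
def annotation_to_ntriples(ann_id, doc_uri, concept_uri, covered_text):
    """Serialize one annotation (document, concept, covered text) as
    three N-Triples statements."""
    p = "http://example.org/ann#"  # hypothetical predicate namespace
    return "\n".join([
        f"<{ann_id}> <{p}annotates> <{doc_uri}> .",
        f"<{ann_id}> <{p}hasConcept> <{concept_uri}> .",
        f'<{ann_id}> <{p}coversText> "{covered_text}" .',
    ])

triples = annotation_to_ntriples(
    "http://example.org/ann/1",
    "http://example.org/doc/42",
    "http://purl.bioontology.org/ontology/SNOMEDCT/195967001",
    "asthma")
```

Once annotations are in RDF like this, a standard triple store provides the SPARQL querying and reasoner hookup the abstract describes.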

  5. Supporting infobuttons with terminological knowledge.

    PubMed Central

    Cimino, J. J.; Elhanan, G.; Zeng, Q.

    1997-01-01

    We have developed several prototype applications which integrate clinical systems with on-line information resources by using patient data to drive queries in response to user information needs. We refer to these collectively as infobuttons because they are evoked with a minimum of keyboard entry. We make use of knowledge in our terminology, the Medical Entities Dictionary (MED) to assist with the selection of appropriate queries and resources, as well as the translation of patient data to forms recognized by the resources. This paper describes the kinds of knowledge in the MED, including literal attributes, hierarchical links and other semantic links, and how this knowledge is used in system integration. PMID:9357682

  6. Supporting infobuttons with terminological knowledge.

    PubMed

    Cimino, J J; Elhanan, G; Zeng, Q

    1997-01-01

    We have developed several prototype applications which integrate clinical systems with on-line information resources by using patient data to drive queries in response to user information needs. We refer to these collectively as infobuttons because they are evoked with a minimum of keyboard entry. We make use of knowledge in our terminology, the Medical Entities Dictionary (MED) to assist with the selection of appropriate queries and resources, as well as the translation of patient data to forms recognized by the resources. This paper describes the kinds of knowledge in the MED, including literal attributes, hierarchical links and other semantic links, and how this knowledge is used in system integration.

  7. Facilitating biomedical researchers' interrogation of electronic health record data: ideas from outside of biomedical informatics.

    PubMed

    Hruby, Gregory W; Matsoukas, Konstantina; Cimino, James J; Weng, Chunhua

    2016-04-01

    Electronic health records (EHR) are a vital data resource for research uses, including cohort identification, phenotyping, pharmacovigilance, and public health surveillance. To realize the promise of EHR data for accelerating clinical research, it is imperative to enable efficient and autonomous EHR data interrogation by end users such as biomedical researchers. This paper surveys state-of-the-art approaches and key methodological considerations for this purpose. We adapted a previously published conceptual framework for interactive information retrieval, which defines three entities: user, channel, and source, by elaborating on channels for query formulation in the context of facilitating end users to interrogate EHR data. We show that the current progress in biomedical informatics mainly lies in support for query execution and information modeling, primarily due to emphases on infrastructure development for data integration and data access via self-service query tools, but has neglected the user support needed during iterative query formulation processes, which can be costly and error-prone. In contrast, the information science literature has offered elaborate theories and methods for user modeling and query formulation support. The two bodies of literature are complementary, implying opportunities for cross-disciplinary idea exchange. On this basis, we outline directions for future informatics research to improve our understanding of user needs and requirements for facilitating autonomous interrogation of EHR data by biomedical researchers. We suggest that cross-disciplinary translational research between biomedical informatics and information science can benefit our research in facilitating efficient data access in the life sciences. Copyright © 2016 Elsevier Inc. All rights reserved.

  8. Faceted Visualization of Three Dimensional Neuroanatomy By Combining Ontology with Faceted Search

    PubMed Central

    Veeraraghavan, Harini; Miller, James V.

    2013-01-01

    In this work, we present a faceted-search based approach for visualization of anatomy by combining a three dimensional digital atlas with an anatomy ontology. Specifically, our approach provides a drill-down search interface that exposes the relevant pieces of information (obtained by searching the ontology) for a user query. Hence, the user can produce visualizations starting with minimally specified queries. Furthermore, by automatically translating the user queries into the controlled terminology our approach eliminates the need for the user to use controlled terminology. We demonstrate the scalability of our approach using an abdominal atlas and the same ontology. We implemented our visualization tool on the open-source 3D Slicer software. We present results of our visualization approach by combining a modified Foundational Model of Anatomy (FMA) ontology with the Surgical Planning Laboratory (SPL) Brain 3D digital atlas, and geometric models specific to patients computed using the SPL brain tumor dataset. PMID:24006207

  9. Faceted visualization of three dimensional neuroanatomy by combining ontology with faceted search.

    PubMed

    Veeraraghavan, Harini; Miller, James V

    2014-04-01

    In this work, we present a faceted-search based approach for visualization of anatomy by combining a three dimensional digital atlas with an anatomy ontology. Specifically, our approach provides a drill-down search interface that exposes the relevant pieces of information (obtained by searching the ontology) for a user query. Hence, the user can produce visualizations starting with minimally specified queries. Furthermore, by automatically translating the user queries into the controlled terminology our approach eliminates the need for the user to use controlled terminology. We demonstrate the scalability of our approach using an abdominal atlas and the same ontology. We implemented our visualization tool on the open-source 3D Slicer software. We present results of our visualization approach by combining a modified Foundational Model of Anatomy (FMA) ontology with the Surgical Planning Laboratory (SPL) Brain 3D digital atlas, and geometric models specific to patients computed using the SPL brain tumor dataset.
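The drill-down search both versions of this record describe can be sketched as successive facet filtering: each value the user picks narrows the candidate set, so even a minimally specified query converges on structures to render. The facet names and structures below are illustrative, not FMA terms.

```python
def drill_down(structures, selections):
    """Filter atlas structures by the facet values a user has picked;
    what remains drives both the next facet menu and the visualization."""
    for facet, value in selections.items():
        structures = [s for s in structures if s.get(facet) == value]
    return structures

atlas = [
    {"name": "hippocampus", "system": "limbic", "side": "left"},
    {"name": "hippocampus", "system": "limbic", "side": "right"},
    {"name": "caudate", "system": "basal ganglia", "side": "left"},
]
hits = drill_down(atlas, {"system": "limbic", "side": "left"})
```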

  10. Development of a Pediatric Adverse Events Terminology

    PubMed Central

    Gipson, Debbie S.; Kirkendall, Eric S.; Gumbs-Petty, Brenda; Quinn, Theresa; Steen, A.; Hicks, Amanda; McMahon, Ann; Nicholas, Savian; Zhao-Wong, Anna; Taylor-Zapata, Perdita; Turner, Mark; Herreshoff, Emily; Jones, Charlotte; Davis, Jonathan M.; Haber, Margaret; Hirschfeld, Steven

    2017-01-01

    In 2009, the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) established the Pediatric Terminology Harmonization Initiative to establish a core library of terms to facilitate the acquisition and sharing of knowledge between pediatric clinical research, practice, and safety reporting. A coalition of partners established a Pediatric Terminology Adverse Event Working Group in 2013 to develop a specific terminology relevant to international pediatric adverse event (AE) reporting. Pediatric specialists with backgrounds in clinical care, research, safety reporting, or informatics, supported by biomedical terminology experts from the National Cancer Institute’s Enterprise Vocabulary Services participated. The multinational group developed a working definition of AEs and reviewed concepts (terms, synonyms, and definitions) from 16 pediatric clinical domains. The resulting AE terminology contains >1000 pediatric diseases, disorders, or clinical findings. The terms were tested for proof of concept use in 2 different settings: hospital readmissions and the NICU. The advantages of the AE terminology include ease of adoption due to integration with well-established and internationally accepted biomedical terminologies, a uniquely temporal focus on pediatric health and disease from conception through adolescence, and terms that could be used in both well- and underresourced environments. The AE terminology is available for use without restriction through the National Cancer Institute’s Enterprise Vocabulary Services and is fully compatible with, and represented in, the Medical Dictionary for Regulatory Activities. The terminology is intended to mature with use, user feedback, and optimization. PMID:28028203

  11. Development of a Pediatric Adverse Events Terminology.

    PubMed

    Gipson, Debbie S; Kirkendall, Eric S; Gumbs-Petty, Brenda; Quinn, Theresa; Steen, A; Hicks, Amanda; McMahon, Ann; Nicholas, Savian; Zhao-Wong, Anna; Taylor-Zapata, Perdita; Turner, Mark; Herreshoff, Emily; Jones, Charlotte; Davis, Jonathan M; Haber, Margaret; Hirschfeld, Steven

    2017-01-01

    In 2009, the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) established the Pediatric Terminology Harmonization Initiative to establish a core library of terms to facilitate the acquisition and sharing of knowledge between pediatric clinical research, practice, and safety reporting. A coalition of partners established a Pediatric Terminology Adverse Event Working Group in 2013 to develop a specific terminology relevant to international pediatric adverse event (AE) reporting. Pediatric specialists with backgrounds in clinical care, research, safety reporting, or informatics, supported by biomedical terminology experts from the National Cancer Institute's Enterprise Vocabulary Services participated. The multinational group developed a working definition of AEs and reviewed concepts (terms, synonyms, and definitions) from 16 pediatric clinical domains. The resulting AE terminology contains >1000 pediatric diseases, disorders, or clinical findings. The terms were tested for proof of concept use in 2 different settings: hospital readmissions and the NICU. The advantages of the AE terminology include ease of adoption due to integration with well-established and internationally accepted biomedical terminologies, a uniquely temporal focus on pediatric health and disease from conception through adolescence, and terms that could be used in both well- and underresourced environments. The AE terminology is available for use without restriction through the National Cancer Institute's Enterprise Vocabulary Services and is fully compatible with, and represented in, the Medical Dictionary for Regulatory Activities. The terminology is intended to mature with use, user feedback, and optimization. Copyright © 2017 by the American Academy of Pediatrics.

  12. Querying phenotype-genotype relationships on patient datasets using semantic web technology: the example of Cerebrotendinous xanthomatosis.

    PubMed

    Taboada, María; Martínez, Diego; Pilo, Belén; Jiménez-Escrig, Adriano; Robinson, Peter N; Sobrido, María J

    2012-07-31

    Semantic Web technology can considerably catalyze translational genetics and genomics research in medicine, where the interchange of information between basic research and clinical levels becomes crucial. This exchange involves mapping abstract phenotype descriptions from research resources, such as knowledge databases and catalogs, to unstructured datasets produced through experimental methods and clinical practice. This is especially true for the construction of mutation databases. This paper presents a way of harmonizing abstract phenotype descriptions with patient data from clinical practice, and of querying this dataset about relationships between phenotypes and genetic variants at different levels of abstraction. Given the current availability of ontological and terminological resources that have already reached some consensus in biomedicine, a reuse-based ontology engineering approach was followed. The proposed approach uses the Web Ontology Language (OWL) to represent the phenotype ontology and the patient model, the Semantic Web Rule Language (SWRL) to bridge the gap between phenotype descriptions and clinical data, and the Semantic Query-enhanced Web Rule Language (SQWRL) to query relevant phenotype-genotype bidirectional relationships. The work tests the use of semantic web technology in the biomedical research domain of cerebrotendinous xanthomatosis (CTX), using a real dataset and ontologies. A framework to query relevant phenotype-genotype bidirectional relationships is provided. Phenotype descriptions and patient data were harmonized by defining 28 Horn-like rules in terms of the OWL concepts. In total, 24 patterns of SQWRL queries were designed following the initial list of competency questions. As the approach is based on OWL, the semantics of the framework adopt the standard open-world assumption. This work demonstrates how semantic web technologies can be used to support the flexible representation and computational inference mechanisms required to query patient datasets at different levels of abstraction. The open-world assumption is especially well suited to describing only partially known phenotype-genotype relationships in a way that is easily extensible. In the future, this type of approach could offer researchers a valuable resource for inferring new data from patient data for statistical analysis in translational research. In conclusion, phenotype description formalization and mapping to clinical data are two key elements for interchanging knowledge between basic and clinical research.
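The Horn-like rules the abstract mentions can be emulated with a minimal forward-chaining loop, the inference mechanism SWRL rules provide for bridging phenotype descriptions and patient data. The facts and rules below are invented for illustration (CYP27A1 is the gene associated with CTX, but the phenotype terms are not taken from the paper).

```python
def forward_chain(facts, rules):
    """Apply Horn-like rules (body -> head) to a fact set until no new
    conclusions can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if body <= derived and head not in derived:
                derived.add(head)
                changed = True
    return derived

rules = [
    ({"has_cataract", "has_tendon_xanthoma"}, "phenotype:classic"),
    ({"phenotype:classic", "variant:CYP27A1"}, "genotype_supported"),
]
out = forward_chain({"has_cataract", "has_tendon_xanthoma",
                     "variant:CYP27A1"}, rules)
```

Note that this closed loop only derives what the rules entail; under OWL's open-world assumption, the absence of a fact is never treated as its negation.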

  13. NCBI2RDF: Enabling Full RDF-Based Access to NCBI Databases

    PubMed Central

    Anguita, Alberto; García-Remesal, Miguel; de la Iglesia, Diana; Maojo, Victor

    2013-01-01

    RDF has become the standard technology for enabling interoperability among heterogeneous biomedical databases. The NCBI provides access to a large set of life sciences databases through a common interface called Entrez. However, the latter does not provide RDF-based access to such databases, and, therefore, they cannot be integrated with other RDF-compliant databases and accessed via SPARQL query interfaces. This paper presents the NCBI2RDF system, aimed at providing RDF-based access to the complete NCBI data repository. This API creates a virtual endpoint for servicing SPARQL queries over different NCBI repositories and presenting to users the query results in SPARQL results format, thus enabling this data to be integrated and/or stored with other RDF-compliant repositories. SPARQL queries are dynamically resolved, decomposed, and forwarded to the NCBI-provided E-utilities programmatic interface to access the NCBI data. Furthermore, we show how our approach increases the expressiveness of the native NCBI querying system, allowing several databases to be accessed simultaneously. This feature significantly boosts productivity when working with complex queries and saves time and effort to biomedical researchers. Our approach has been validated with a large number of SPARQL queries, thus proving its reliability and enhanced capabilities in biomedical environments. PMID:23984425

  14. A bayesian translational framework for knowledge propagation, discovery, and integration under specific contexts.

    PubMed

    Deng, Michelle; Zollanvari, Amin; Alterovitz, Gil

    2012-01-01

    The immense corpus of biomedical literature existing today poses challenges in information search and integration. Many links between pieces of knowledge occur or are significant only under certain contexts-rather than under the entire corpus. This study proposes using networks of ontology concepts, linked based on their co-occurrences in annotations of abstracts of biomedical literature and descriptions of experiments, to draw conclusions based on context-specific queries and to better integrate existing knowledge. In particular, a Bayesian network framework is constructed to allow for the linking of related terms from two biomedical ontologies under the queried context concept. Edges in such a Bayesian network allow associations between biomedical concepts to be quantified and inference to be made about the existence of some concepts given prior information about others. This approach could potentially be a powerful inferential tool for context-specific queries, applicable to ontologies in other fields as well.

  15. A Bayesian Translational Framework for Knowledge Propagation, Discovery, and Integration Under Specific Contexts

    PubMed Central

    Deng, Michelle; Zollanvari, Amin; Alterovitz, Gil

    2012-01-01

    The immense corpus of biomedical literature existing today poses challenges in information search and integration. Many links between pieces of knowledge occur or are significant only under certain contexts—rather than under the entire corpus. This study proposes using networks of ontology concepts, linked based on their co-occurrences in annotations of abstracts of biomedical literature and descriptions of experiments, to draw conclusions based on context-specific queries and to better integrate existing knowledge. In particular, a Bayesian network framework is constructed to allow for the linking of related terms from two biomedical ontologies under the queried context concept. Edges in such a Bayesian network allow associations between biomedical concepts to be quantified and inference to be made about the existence of some concepts given prior information about others. This approach could potentially be a powerful inferential tool for context-specific queries, applicable to ontologies in other fields as well. PMID:22779044
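The edge quantities in the described Bayesian network can be estimated from co-annotation counts: an edge a → b carries the conditional probability of concept b appearing given concept a. A minimal sketch with invented concept identifiers follows.

```python
def edge_strength(annotation_sets, a, b):
    """Estimate P(b | a) from concept co-occurrence across annotated
    abstracts: of the abstracts annotated with a, the fraction also
    annotated with b."""
    with_a = [s for s in annotation_sets if a in s]
    if not with_a:
        return 0.0
    return sum(1 for s in with_a if b in s) / len(with_a)

corpus = [{"GO:apoptosis", "HP:tumor"},
          {"GO:apoptosis", "HP:tumor", "GO:cell_cycle"},
          {"GO:apoptosis"},
          {"HP:tumor"}]
p = edge_strength(corpus, "GO:apoptosis", "HP:tumor")
```

Restricting `corpus` to abstracts annotated with a queried context concept yields the context-specific associations the paper emphasizes.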

  16. Spatial and symbolic queries for 3D image data

    NASA Astrophysics Data System (ADS)

    Benson, Daniel C.; Zick, Gregory L.

    1992-04-01

    We present a query system for an object-oriented biomedical imaging database containing 3-D anatomical structures and their corresponding 2-D images. The graphical interface facilitates the formation of spatial queries, nonspatial or symbolic queries, and combined spatial/symbolic queries. A query editor is used for the creation and manipulation of 3-D query objects as volumes, surfaces, lines, and points. Symbolic predicates are formulated through a combination of text fields and multiple choice selections. Query results, which may include images, image contents, composite objects, graphics, and alphanumeric data, are displayed in multiple views. Objects returned by the query may be selected directly within the views for further inspection or modification, or for use as query objects in subsequent queries. Our image database query system provides visual feedback and manipulation of spatial query objects, multiple views of volume data, and the ability to combine spatial and symbolic queries. The system allows for incremental enhancement of existing objects and the addition of new objects and spatial relationships. The query system is designed for databases containing symbolic and spatial data. This paper discusses its application to data acquired in biomedical 3-D image reconstruction, but it is applicable to other areas such as CAD/CAM, geographical information systems, and computer vision.
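A combined spatial/symbolic query of the kind the abstract describes can be sketched over a toy object model: a spatial predicate (point inside an axis-aligned box) intersected with a symbolic predicate (label match), either of which may be omitted. The data model is an illustrative stand-in for the paper's object-oriented database.

```python
def combined_query(objects, box=None, label=None):
    """Return the names of objects passing both an optional spatial
    predicate (3-D point inside a box) and an optional symbolic
    predicate (exact label match)."""
    hits = []
    for name, (x, y, z) in objects:
        if box is not None:
            lo, hi = box
            if not (lo[0] <= x <= hi[0] and
                    lo[1] <= y <= hi[1] and
                    lo[2] <= z <= hi[2]):
                continue  # fails the spatial predicate
        if label is not None and name != label:
            continue      # fails the symbolic predicate
        hits.append(name)
    return hits

objs = [("ventricle", (1, 2, 3)),
        ("ventricle", (9, 9, 9)),
        ("cortex", (1, 1, 1))]
res = combined_query(objs, box=((0, 0, 0), (5, 5, 5)), label="ventricle")
```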

  17. PharmARTS: terminology web services for drug safety data coding and retrieval.

    PubMed

    Alecu, Iulian; Bousquet, Cédric; Degoulet, Patrice; Jaulent, Marie-Christine

    2007-01-01

    MedDRA and WHO-ART are the terminologies used to encode drug safety reports. The standardisation achieved with these terminologies facilitates: 1) The sharing of safety databases; 2) Data mining for the continuous reassessment of benefit-risk ratio at national or international level or in the pharmaceutical industry. There is some debate about the capacity of these terminologies for retrieving case reports related to similar medical conditions. We have developed a resource that allows grouping similar medical conditions more effectively than WHO-ART and MedDRA. We describe here a software tool facilitating the use of this terminological resource thanks to an RDF framework with support for RDF Schema inferencing and querying. This tool eases coding and data retrieval in drug safety.

  18. The Use of the Medical Dictionary for Regulatory Activities in the Identification of Mitochondrial Dysfunction in HIV-Infected Children.

    PubMed

    Chernoff, Miriam; Ford-Chatterton, Heather; Crain, Marilyn J

    2012-10-01

    To demonstrate the utility of a medical terminology-based method for identifying cases of possible mitochondrial dysfunction (MD) in a large cohort of youths with perinatal HIV infection and to describe the scoring algorithms. Medical Dictionary for Regulatory Activities (MedDRA®) version 6 terminology was used to query clinical criteria for mitochondrial dysfunction by two published classifications, the Enquête Périnatale Française (EPF) and the Mitochondrial Disease Classification (MDC). Data from 2,931 participants with perinatal HIV infection on PACTG 219/219C were analyzed. Data were qualified for severity and persistence, after which clinical reviews of MedDRA-coded and other study data were performed. Of 14,000 data records captured by the EPF MedDRA query, there were 3,331 singular events. Of 18,000 captured by the MDC query, there were 3,841 events. Ten clinicians blindly reviewed non-MedDRA-coded supporting data for 15 separate clinical conditions. We used the Statistical Analysis System (SAS) language to code scoring algorithms. 768 participants (26%) met the EPF case definition of possible MD; 694 (24%) met the MDC case definition, and 480 (16%) met both definitions. Subjective application of codes could have affected our results. MedDRA terminology does not include indicators of severity or persistence. Version 6.0 of MedDRA did not include Standard MedDRA Queries, which would have reduced the time needed to map MedDRA terms to EPF and MDC criteria. Together with a computer-coded scoring algorithm, MedDRA terminology enabled identification of potential MD based on clinical data from almost 3000 children with substantially less effort than a case-by-case review. The article is accessible to readers with a background in statistical hypothesis testing. An exposure to public health issues is useful but not strictly necessary.

  19. The Use of the Medical Dictionary for Regulatory Activities in the Identification of Mitochondrial Dysfunction in HIV-Infected Children

    PubMed Central

    Chernoff, Miriam; Ford-Chatterton, Heather; Crain, Marilyn J.

    2012-01-01

    Objective To demonstrate the utility of a medical terminology-based method for identifying cases of possible mitochondrial dysfunction (MD) in a large cohort of youths with perinatal HIV infection and to describe the scoring algorithms. Methods Medical Dictionary for Regulatory Activities (MedDRA)® version 6 terminology was used to query clinical criteria for mitochondrial dysfunction by two published classifications, the Enquête Périnatale Française (EPF) and the Mitochondrial Disease Classification (MDC). Data from 2,931 participants with perinatal HIV infection on PACTG 219/219C were analyzed. Data were qualified for severity and persistence, after which clinical reviews of MedDRA-coded and other study data were performed. Results Of 14,000 data records captured by the EPF MedDRA query, there were 3,331 singular events. Of 18,000 captured by the MDC query, there were 3,841 events. Ten clinicians blindly reviewed non MedDRA-coded supporting data for 15 separate clinical conditions. We used the Statistical Analysis System (SAS) language to code scoring algorithms. 768 participants (26%) met the EPF case definition of possible MD; 694 (24%) met the MDC case definition, and 480 (16%) met both definitions. Limitations Subjective application of codes could have affected our results. MedDRA terminology does not include indicators of severity or persistence. Version 6.0 of MedDRA did not include Standard MedDRA Queries, which would have reduced the time needed to map MedDRA terms to EPF and MDC criteria. Conclusion Together with a computer-coded scoring algorithm, MedDRA terminology enabled identification of potential MD based on clinical data from almost 3000 children with substantially less effort than a case by case review. The article is accessible to readers with a background in statistical hypothesis testing. An exposure to public health issues is useful but not strictly necessary. PMID:23797349
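The computer-coded scoring pass (implemented in SAS by the authors) can be sketched in outline: a participant meets a case definition when their MedDRA-coded events satisfy a minimum number of the classification's criteria. The criterion groupings and thresholds below are illustrative, not the EPF or MDC definitions.

```python
def meets_case_definition(coded_events, criteria, min_criteria=2):
    """Return True when the coded events satisfy at least `min_criteria`
    of the named criterion groups (each a set of qualifying terms)."""
    satisfied = [name for name, terms in criteria.items()
                 if set(terms) & set(coded_events)]
    return len(satisfied) >= min_criteria

criteria = {"neurologic": {"Seizure", "Developmental delay"},
            "metabolic": {"Lactic acidosis"},
            "hepatic": {"Hepatomegaly"}}
flag = meets_case_definition({"Seizure", "Lactic acidosis"}, criteria)
```

Running such a pass over every participant's qualified events is what replaces the case-by-case chart review the abstract mentions.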

  20. Comparative effectiveness research designs: an analysis of terms and coverage in Medical Subject Headings (MeSH) and Emtree

    PubMed Central

    Bekhuis, Tanja; Demner-Fushman, Dina; Crowley, Rebecca S.

    2013-01-01

    Objectives: We analyzed the extent to which comparative effectiveness research (CER) organizations share terms for designs, analyzed coverage of CER designs in Medical Subject Headings (MeSH) and Emtree, and explored whether scientists use CER design terms. Methods: We developed local terminologies (LTs) and a CER design terminology by extracting terms in documents from five organizations. We defined coverage as the distribution over match type in MeSH and Emtree. We created a crosswalk by recording terms to which design terms mapped in both controlled vocabularies. We analyzed the hits for queries restricted to titles and abstracts to explore scientists' language. Results: Pairwise LT overlap ranged from 22.64% (12/53) to 75.61% (31/41). The CER design terminology (n = 78 terms) consisted of terms for primary study designs and a few terms useful for evaluating evidence, such as opinion paper and systematic review. Patterns of coverage were similar in MeSH and Emtree (gamma = 0.581, P = 0.002). Conclusions: Stakeholder terminologies vary, and terms are inconsistently covered in MeSH and Emtree. The CER design terminology and crosswalk may be useful for expert searchers. For partially mapped terms, queries could consist of free text for modifiers such as nonrandomized or interrupted added to broad or related controlled terms. PMID:23646024

  1. Development and empirical user-centered evaluation of semantically-based query recommendation for an electronic health record search engine.

    PubMed

    Hanauer, David A; Wu, Danny T Y; Yang, Lei; Mei, Qiaozhu; Murkowski-Steffy, Katherine B; Vydiswaran, V G Vinod; Zheng, Kai

    2017-03-01

    The utility of biomedical information retrieval environments can be severely limited when users lack expertise in constructing effective search queries. To address this issue, we developed a computer-based query recommendation algorithm that suggests semantically interchangeable terms based on an initial user-entered query. In this study, we assessed the value of this approach, which has broad applicability in biomedical information retrieval, by demonstrating its application as part of a search engine that facilitates retrieval of information from electronic health records (EHRs). The query recommendation algorithm utilizes MetaMap to identify medical concepts from search queries and indexed EHR documents. Synonym variants from UMLS are used to expand the concepts along with a synonym set curated from historical EHR search logs. The empirical study involved 33 clinicians and staff who evaluated the system through a set of simulated EHR search tasks. User acceptance was assessed using the widely used technology acceptance model. The search engine's performance was rated consistently higher with the query recommendation feature turned on vs. off. The relevance of computer-recommended search terms was also rated high, and in most cases the participants had not thought of these terms on their own. The questions on perceived usefulness and perceived ease of use received overwhelmingly positive responses. A vast majority of the participants wanted the query recommendation feature to be available to assist in their day-to-day EHR search tasks. Challenges persist for users to construct effective search queries when retrieving information from biomedical documents including those from EHRs. This study demonstrates that semantically-based query recommendation is a viable solution to addressing this challenge. Published by Elsevier Inc.
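The synonym-based recommendation step can be sketched as a table lookup over the tokens of the user's query. The table here stands in for the UMLS synonym variants plus the log-curated synonym set the abstract describes; its contents are illustrative.

```python
def recommend_terms(query, synonym_table):
    """Suggest semantically interchangeable terms for each token of the
    query, skipping terms the user already typed and duplicates."""
    tokens = query.lower().split()
    seen, out = set(tokens), []
    for tok in tokens:
        for syn in synonym_table.get(tok, []):
            if syn not in seen:
                seen.add(syn)
                out.append(syn)
    return out

table = {"mi": ["myocardial infarction", "heart attack"],
         "htn": ["hypertension"]}
suggestions = recommend_terms("MI htn", table)
```

A production version would first run concept mapping (e.g., MetaMap) rather than whitespace tokenization, so multi-word concepts are recognized before lookup.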

  2. In defense of the Desiderata.

    PubMed

    Cimino, James J

    2006-06-01

    A 1998 paper that delineated desirable characteristics, or desiderata, for controlled medical terminologies attempted to summarize emerging consensus regarding structural issues of such terminologies. Among the Desiderata was a call for terminologies to be "concept oriented." Since then, research has trended toward the extension of terminologies into ontologies. A paper by Smith, entitled "From Concepts to Clinical Reality: An Essay on the Benchmarking of Biomedical Terminologies," urges a realist approach that seeks terminologies composed of universals, rather than concepts. The current paper addresses issues raised by Smith and attempts to extend the Desiderata, not away from concepts, but toward recognition that concepts and universals must both be embraced and can coexist peaceably in controlled terminologies. To that end, additional Desiderata are defined that deal with the purpose, rather than the structure, of controlled medical terminologies.

  3. An Examination of Natural Language as a Query Formation Tool for Retrieving Information on E-Health from PubMed.

    ERIC Educational Resources Information Center

    Peterson, Gabriel M.; Su, Kuichun; Ries, James E.; Sievert, Mary Ellen C.

    2002-01-01

    Discussion of Internet use for information searches on health-related topics focuses on a study that examined complexity and variability of natural language in using search terms that express the concept of electronic health (e-health). Highlights include precision of retrieved information; shift in terminology; and queries using the PubMed…

  4. From concepts to clinical reality: an essay on the benchmarking of biomedical terminologies.

    PubMed

    Smith, Barry

    2006-06-01

    It is only by fixing on agreed meanings of terms in biomedical terminologies that we will be in a position to achieve that accumulation and integration of knowledge that is indispensable to progress at the frontiers of biomedicine. Standardly, the goal of fixing meanings is seen as being realized through the alignment of terms on what are called 'concepts.' Part I addresses three versions of the concept-based approach--by Cimino, by Wüster, and by Campbell and associates--and surveys some of the problems to which they give rise, all of which have to do with a failure to anchor the terms in terminologies to corresponding referents in reality. Part II outlines a new, realist solution to this anchorage problem, which sees terminology construction as being motivated by the goal of alignment not on concepts but on the universals (kinds, types) in reality and thereby also on the corresponding instances (individuals, tokens). We outline the realist approach and show how on its basis we can provide a benchmark of correctness for terminologies which will at the same time allow a new type of integration of terminologies and electronic health records. We conclude by outlining ways in which the framework thus defined might be exploited for purposes of diagnostic decision-support.

  5. Querying phenotype-genotype relationships on patient datasets using semantic web technology: the example of cerebrotendinous xanthomatosis

    PubMed Central

    2012-01-01

    Background Semantic Web technology can considerably catalyze translational genetics and genomics research in medicine, where the interchange of information between basic research and clinical levels becomes crucial. This exchange involves mapping abstract phenotype descriptions from research resources, such as knowledge databases and catalogs, to unstructured datasets produced through experimental methods and clinical practice. This is especially true for the construction of mutation databases. This paper presents a way of harmonizing abstract phenotype descriptions with patient data from clinical practice, and querying this dataset about relationships between phenotypes and genetic variants, at different levels of abstraction. Methods Due to the current availability of ontological and terminological resources that have already reached some consensus in biomedicine, a reuse-based ontology engineering approach was followed. The proposed approach uses the Web Ontology Language (OWL) to represent the phenotype ontology and the patient model, the Semantic Web Rule Language (SWRL) to bridge the gap between phenotype descriptions and clinical data, and the Semantic Query-Enhanced Web Rule Language (SQWRL) to query relevant phenotype-genotype bidirectional relationships. The work tests the use of semantic web technology in the biomedical research domain of cerebrotendinous xanthomatosis (CTX), using a real dataset and ontologies. Results A framework to query relevant phenotype-genotype bidirectional relationships is provided. Phenotype descriptions and patient data were harmonized by defining 28 Horn-like rules in terms of the OWL concepts. In total, 24 patterns of SQWRL queries were designed following the initial list of competency questions. As the approach is based on OWL, the semantics of the framework adopt the open world assumption of the standard logical model.
Conclusions This work demonstrates how semantic web technologies can be used to support the flexible representation and computational inference mechanisms required to query patient datasets at different levels of abstraction. The open world assumption is especially well suited to describing only partially known phenotype-genotype relationships in a way that is easily extensible. In the future, this type of approach could offer researchers a valuable resource for inferring new data from patient data for statistical analysis in translational research. In conclusion, phenotype description formalization and mapping to clinical data are two key elements for interchanging knowledge between basic and clinical research. PMID:22849591
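A minimal, hypothetical sketch of the Horn-like bridging idea described above: rules whose antecedents match coded findings in patient data and whose consequents assert abstract phenotype descriptions. The finding codes and phenotype names are invented; the study expresses such rules in SWRL over OWL concepts.

```python
# Toy forward application of Horn-like rules: if all antecedent finding
# codes are present in a patient's data, the abstract phenotype in the
# rule head is inferred. All identifiers here are illustrative.

RULES = [
    # (set of required finding codes, implied phenotype)
    ({"finding:tendon_xanthoma"}, "phenotype:Xanthomatosis"),
    ({"finding:cataract", "finding:juvenile_onset"}, "phenotype:JuvenileCataract"),
]

def infer_phenotypes(findings):
    """Return phenotypes implied by the recorded findings."""
    inferred = set()
    for body, head in RULES:
        if body <= findings:  # every antecedent finding is present
            inferred.add(head)
    return inferred
```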

  6. Generating and Executing Complex Natural Language Queries across Linked Data.

    PubMed

    Hamon, Thierry; Mougin, Fleur; Grabar, Natalia

    2015-01-01

    With the recent and intensive research in the biomedical area, the knowledge accumulated is disseminated through various knowledge bases. Links between these knowledge bases are needed in order to use them jointly. Linked Data, the SPARQL language, and natural language question-answering interfaces provide interesting solutions for querying such knowledge bases. We propose a method for translating natural language questions into SPARQL queries. We use Natural Language Processing tools, semantic resources, and RDF triple descriptions. The method was designed on 50 questions over 3 biomedical knowledge bases, and evaluated on 27 questions. It achieves 0.78 F-measure on the test set. The method for translating natural language questions into SPARQL queries is implemented as a Perl module available at http://search.cpan.org/~thhamon/RDF-NLP-SPARQLQuery.
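Under heavy simplification, the translation idea can be sketched as pattern-to-template rewriting. The question pattern and predicate URIs below are invented for illustration; the actual method uses NLP tools and the knowledge bases' RDF descriptions rather than fixed templates.

```python
import re

# Hypothetical sketch: recognize one question shape and fill a SPARQL
# template. Real NL-to-SPARQL systems generalize this with linguistic
# analysis and schema-aware query construction.

TEMPLATE = (
    "SELECT ?drug WHERE {{ ?drug <http://example.org/treats> "
    "<http://example.org/disease/{disease}> }}"
)

def to_sparql(question):
    """Translate one supported question pattern; None if unsupported."""
    m = re.match(r"which drugs treat (\w+)\?", question.lower())
    return TEMPLATE.format(disease=m.group(1)) if m else None
```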

  7. Using a terminology server and consumer search phrases to help patients find physicians with particular expertise.

    PubMed

    Cole, Curtis L; Kanter, Andrew S; Cummens, Michael; Vostinar, Sean; Naeymi-Rad, Frank

    2004-01-01

    To design and implement a real-world application using a terminology server to assist patients and physicians who use common-language search terms to find specialist physicians with a particular clinical expertise. Terminology servers have been developed to help users encode information using complicated structured vocabularies during data entry tasks, such as recording clinical information. We describe a methodology using Personal Health Terminology™ and a SNOMED CT-based hierarchical concept server, and the construction of a pilot mediated-search engine to assist users who use vernacular speech to query data that is more technical than vernacular. This approach, which combines theoretical and practical requirements, provides a useful example of concept-based searching for physician referrals.

  8. A Domain-Specific Terminology for Retinopathy of Prematurity and Its Applications in Clinical Settings.

    PubMed

    Zhang, Yinsheng; Zhang, Guoming

    2018-01-01

    A terminology (or coding system) is a formal set of controlled vocabulary in a specific domain. With a well-defined terminology, each concept in the target domain is assigned a unique code, which can be identified and processed across different medical systems in an unambiguous way. Although there are many well-known biomedical terminologies, there is currently no domain-specific terminology for ROP (retinopathy of prematurity). Based on a collection of historical ROP patients' data in the electronic medical record system, we extracted the most frequent terms in the domain and organized them into a hierarchical coding system, the ROP Minimal Standard Terminology, which contains 62 core concepts in 4 categories. This terminology has been successfully used to provide highly structured and semantically rich clinical data in several ROP-related applications.
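A hierarchical coding system of this kind can be illustrated with a few lines of code. The codes and labels below are invented for illustration; the real ROP Minimal Standard Terminology defines 62 concepts in 4 categories.

```python
# Illustrative miniature of a dotted hierarchical coding system:
# each concept gets a unique code, and the hierarchy is encoded in
# the code itself. All codes and labels are hypothetical.

CODES = {
    "R01": "Diagnosis",
    "R01.01": "ROP stage 1",
    "R01.02": "ROP stage 2",
    "R02": "Treatment",
}

def parent(code):
    """Derive the parent code from the dotted hierarchy (None at root)."""
    return code.rsplit(".", 1)[0] if "." in code else None
```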

  9. Assessing the practice of biomedical ontology evaluation: Gaps and opportunities.

    PubMed

    Amith, Muhammad; He, Zhe; Bian, Jiang; Lossio-Ventura, Juan Antonio; Tao, Cui

    2018-04-01

    With the proliferation of heterogeneous health care data in the last three decades, biomedical ontologies and controlled biomedical terminologies play an increasingly important role in knowledge representation and management, data integration, natural language processing, and decision support for health information systems and biomedical research. Biomedical ontologies and controlled terminologies are intended to assure interoperability. Nevertheless, the quality of biomedical ontologies has hindered their applicability and subsequent adoption in real-world applications. Ontology evaluation is an integral part of ontology development and maintenance. In the biomedicine domain, ontology evaluation is often conducted by third parties as a quality assurance (or auditing) effort that focuses on identifying modeling errors and inconsistencies. In this work, we first organized four categorical schemes of ontology evaluation methods from the existing literature into an integrated taxonomy. Further, to understand ontology evaluation practice in the biomedicine domain, we reviewed a sample of 200 ontologies from the National Center for Biomedical Ontology (NCBO) BioPortal, the largest repository of biomedical ontologies, and observed that only 15 of these ontologies have documented evaluation in their corresponding inception papers. We then surveyed the recent quality assurance approaches for biomedical ontologies and their use. We also mapped these quality assurance approaches to the ontology evaluation criteria. It is our anticipation that ontology evaluation and quality assurance approaches will be more widely adopted in the development life cycle of biomedical ontologies. Copyright © 2018 Elsevier Inc. All rights reserved.

  10. Impact of Scientific Versus Emotional Wording of Patient Questions on Doctor-Patient Communication in an Internet Forum: A Randomized Controlled Experiment with Medical Students.

    PubMed

    Bientzle, Martina; Griewatz, Jan; Kimmerle, Joachim; Küppers, Julia; Cress, Ulrike; Lammerding-Koeppel, Maria

    2015-11-25

    Medical expert forums on the Internet play an increasing role in patient counseling. Therefore, it is important to understand how doctor-patient communication is influenced in such forums both by features of the patients or advice seekers, as expressed in their forum queries, and by characteristics of the medical experts involved. In this experimental study, we aimed to examine in what way (1) the particular wording of patient queries and (2) medical experts' therapeutic health concepts (for example, beliefs around adhering to a distinctly scientific understanding of diagnosis and treatment and a clear focus on evidence-based medicine) impact communication behavior of the medical experts in an Internet forum. Advanced medical students (in their ninth semester of medical training) were recruited as participants. Participation in the online forum was part of a communication training embedded in a gynecology course. We first measured their biomedical therapeutic health concept (hereinafter called "biomedical concept"). Then they participated in an online forum where they answered fictitious patient queries about mammography screening that included either scientific or emotional wording, in a between-group design. We analyzed participants' replies with regard to the following dimensions: their use of scientific or emotional wording, the amount of communicated information, and their attempt to build a positive doctor-patient relationship. This study was carried out with 117 medical students (73 women, 41 men, 3 did not indicate their sex). We found evidence that both the wording of patient queries and the participants' biomedical concept influenced participants' response behavior. They answered emotional patient queries in a more emotional way (mean 0.92, SD 1.02) than scientific patient queries (mean 0.26, SD 0.55; t(74)=3.48, P<.001, d=0.81).
We also found a significant interaction effect between participants' use of scientific or emotional wording and the type of patient query (F(2,74)=10.29, P<.01, partial η²=0.12), indicating that participants used scientific wording independently of the type of patient query, whereas they used emotional wording particularly when replying to emotional patient queries. In addition, the more pronounced the medical experts' biomedical concept was, the more scientifically (adjusted β=.20; F(1,75)=2.95, P=.045) and the less emotionally (adjusted β=-.22; F(1,74)=3.66, P=.03) they replied to patient queries. Finally, we found that participants' biomedical concept predicted their engagement in relationship building (adjusted β=-.26): the more pronounced their biomedical concept was, the less they attempted to build a positive doctor-patient relationship (F(1,74)=5.39, P=.02). Communication training for medical experts could aim to teach them to recognize patients' communication styles and needs in certain situations and to take those aspects adequately into account. In addition, communication training should also make medical experts aware of their individual therapeutic health concepts and the consequent implications in communication situations.

  11. Probabilistic and machine learning-based retrieval approaches for biomedical dataset retrieval

    PubMed Central

    Karisani, Payam; Qin, Zhaohui S; Agichtein, Eugene

    2018-01-01

    The bioCADDIE dataset retrieval challenge brought together different approaches to retrieval of biomedical datasets relevant to a user’s query, expressed as a text description of a needed dataset. We describe experiments in applying a data-driven, machine learning-based approach to biomedical dataset retrieval as part of this challenge. We report on a series of experiments carried out to evaluate the performance of both probabilistic and machine learning-driven techniques from information retrieval, as applied to this challenge. Our experiments with probabilistic information retrieval methods, such as query term weight optimization, automatic query expansion and simulated user relevance feedback, demonstrate that automatically boosting the weights of important keywords in a verbose query is more effective than other methods. We also show that although there is a rich space of potential representations and features available in this domain, machine learning-based re-ranking models are not able to improve on probabilistic information retrieval techniques with the currently available training data. The models and algorithms presented in this paper can serve as a viable implementation of a search engine to provide access to biomedical datasets. The retrieval performance is expected to be further improved by using additional training data that is created by expert annotation, or gathered through usage logs, clicks and other processes during natural operation of the system. Database URL: https://github.com/emory-irlab/biocaddie PMID:29688379
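The keyword-boosting finding above can be illustrated with a simple IDF-style weighting over a toy collection (invented here): rare, discriminative terms in a verbose query receive higher weights than common ones. The actual challenge systems used more elaborate term-weight optimization.

```python
import math

# Illustrative smoothed-IDF weighting of query terms against a tiny,
# invented document collection. Terms that occur in every document get
# weight ~0; rarer terms get boosted.

DOCS = [
    "gene expression microarray dataset",
    "protein structure dataset",
    "gene regulatory network dataset",
]

def idf_weights(query):
    """Weight each query term by smoothed inverse document frequency."""
    n = len(DOCS)
    weights = {}
    for term in query.lower().split():
        df = sum(term in d.split() for d in DOCS)
        weights[term] = math.log((n + 1) / (df + 1))
    return weights
```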

  12. Automated mapping of clinical terms into SNOMED-CT. An application to codify procedures in pathology.

    PubMed

    Allones, J L; Martinez, D; Taboada, M

    2014-10-01

    Clinical terminologies are considered a key technology for capturing clinical data in a precise and standardized manner, which is critical to accurately exchange information among different applications, medical records and decision support systems. An important step to promote the real use of clinical terminologies, such as SNOMED-CT, is to facilitate the process of finding mappings between local terms of medical records and concepts of terminologies. In this paper, we propose a mapping tool to discover text-to-concept mappings in SNOMED-CT. Name-based techniques were combined with a query expansion system to generate alternative search terms, and with a strategy to analyze and take advantage of the semantic relationships of the SNOMED-CT concepts. The developed tool was evaluated and compared to the search services provided by two SNOMED-CT browsers. Our tool automatically mapped clinical terms from a Spanish glossary of procedures in pathology with 88.0% precision and 51.4% recall, providing a substantial improvement in recall (of 28% and 60%) over other publicly accessible mapping services. The improvements reached by the mapping tool are encouraging. Our results demonstrate the feasibility of accurately mapping clinical glossaries to SNOMED-CT concepts by means of a combination of structural, query expansion, and name-based techniques. We have shown that SNOMED-CT is a great source of knowledge for inferring synonyms in the medical domain. Results show that an automated query expansion system partially overcomes the challenge of vocabulary mismatch.
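A toy sketch of name-based matching combined with lexical query expansion, loosely following the approach above. The concept identifier, concept name, and expansion pair are all invented; a real system would add stemming, token reordering, and traversal of SNOMED-CT semantic relationships.

```python
# Hypothetical sketch: normalize a local clinical term with a simple
# lexical expansion table, then match it against concept names by
# exact string comparison.

CONCEPTS = {  # concept name -> invented identifier
    "excision of skin lesion": "concept:0001",
}
EXPANSIONS = {"removal": "excision"}  # invented synonym pair

def normalize(term):
    words = [EXPANSIONS.get(w, w) for w in term.lower().split()]
    return " ".join(words)

def map_term(term):
    """Return the concept id for a local term, or None if unmapped."""
    return CONCEPTS.get(normalize(term))
```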

  13. Development of a Model for the Representation of Nanotechnology-Specific Terminology

    PubMed Central

    Bailey, LeeAnn O.; Kennedy, Christopher H.; Fritts, Martin J.; Hartel, Francis W.

    2006-01-01

    Nanotechnology is an important, rapidly-evolving, multidisciplinary field [1]. The tremendous growth in this area necessitates the establishment of a common, open-source terminology to support the diverse biomedical applications of nanotechnology. Currently, the consensus process to define and categorize conceptual entities pertaining to nanotechnology is in a rudimentary stage. We have constructed a nanotechnology-specific conceptual hierarchy that can be utilized by end users to retrieve accurate, controlled terminology regarding emerging nanotechnology and corresponding clinical applications. PMID:17238469

  14. Query Refinement: Negation Detection and Proximity Learning Georgetown at TREC 2014 Clinical Decision Support Track

    DTIC Science & Technology

    2014-11-01

    created to serve as idealized representations of actual medical records, and include information such as medical history, current symptoms, diagnosis... NLM Medical Text Indexer (MTI). MeSH, or Medical Subject Headings, are terminology used by the NLM to index articles, catalog books, and searching... MeSH-indexed databases such as PubMed. However, since many medical conditions may be expressed in varying terminology, a single representation of a

  15. Clinical terminology support for a national ambulatory practice outcomes research network.

    PubMed

    Ricciardi, Thomas N; Lieberman, Michael I; Kahn, Michael G; Masarie, F E

    2005-01-01

    The Medical Quality Improvement Consortium (MQIC) is a nationwide collaboration of 74 healthcare delivery systems, consisting of 3755 clinicians, who contribute de-identified clinical data from the same commercial electronic medical record (EMR) for quality reporting, outcomes research and clinical research in public health and practice benchmarking. Despite the existence of a common, centrally-managed, shared terminology for core concepts (medications, problem lists, observation names), a substantial "back-end" information management process is required to ensure terminology and data harmonization for creating multi-facility clinically-acceptable queries and comparable results. We describe the information architecture created to support terminology harmonization across this data-sharing consortium and discuss the implications for large scale data sharing envisioned by proponents for the national adoption of ambulatory EMR systems.

  16. Clinical Terminology Support for a National Ambulatory Practice Outcomes Research Network

    PubMed Central

    Ricciardi, Thomas N.; Lieberman, Michael I.; Kahn, Michael G.; Masarie, F.E. “Chip”

    2005-01-01

    The Medical Quality Improvement Consortium (MQIC) is a nationwide collaboration of 74 healthcare delivery systems, consisting of 3755 clinicians, who contribute de-identified clinical data from the same commercial electronic medical record (EMR) for quality reporting, outcomes research and clinical research in public health and practice benchmarking. Despite the existence of a common, centrally-managed, shared terminology for core concepts (medications, problem lists, observation names), a substantial “back-end” information management process is required to ensure terminology and data harmonization for creating multi-facility clinically-acceptable queries and comparable results. We describe the information architecture created to support terminology harmonization across this data-sharing consortium and discuss the implications for large scale data sharing envisioned by proponents for the national adoption of ambulatory EMR systems. PMID:16779116

  17. Learning Terminology in Order to Become an Active Agent in the Development of Basque Biomedical Registers

    ERIC Educational Resources Information Center

    Zabala Unzalu, Igone; San Martin Egia, Itziar; Lersundi Ayestaran, Mikel

    2016-01-01

    The aim of this article is to describe some theoretical and methodological bases underpinning the design of the course Health Communication in Basque (HCB) at the University of the Basque Country (UPV/EHU). Based on some relevant theoretical tenets of the socioterminologic and communicative approaches to Terminology, the authors assume that…

  18. Cluster-Based Query Expansion Using Language Modeling for Biomedical Literature Retrieval

    ERIC Educational Resources Information Center

    Xu, Xuheng

    2011-01-01

    The tremendously huge volume of biomedical literature, scientists' specific information needs, long terms of multiples words, and fundamental problems of synonym and polysemy have been challenging issues facing the biomedical information retrieval community researchers. Search engines have significantly improved the efficiency and effectiveness of…

  19. Understanding PubMed user search behavior through log analysis.

    PubMed

    Islamaj Dogan, Rezarta; Murray, G Craig; Névéol, Aurélie; Lu, Zhiyong

    2009-01-01

    This article reports on a detailed investigation of PubMed users' needs and behavior as a step toward improving biomedical information retrieval. PubMed provides researchers free access to more than 19 million citations for biomedical articles from MEDLINE and life science journals, and is accessed by millions of users each day. Efficient search tools are crucial for biomedical researchers to keep abreast of the biomedical literature relating to their own research. This study provides insight into PubMed users' needs and their behavior. This investigation was conducted through the analysis of one month of log data, consisting of more than 23 million user sessions and more than 58 million user queries. Multiple aspects of users' interactions with PubMed are characterized in detail with evidence from these logs. Despite having many features in common with general Web searches, biomedical information searches have unique characteristics that are made evident in this study. PubMed users are more persistent in seeking information and they reformulate queries often. The three most frequent types of search are search by author name, search by gene/protein, and search by disease. Use of abbreviations in queries is very frequent. Factors such as result set size influence users' decisions. Analysis of characteristics such as these plays a critical role in identifying users' information needs and their search habits. In turn, such an analysis also provides useful insight for improving biomedical information retrieval. Database URL: http://www.ncbi.nlm.nih.gov/PubMed.

  20. UCSC genome browser: deep support for molecular biomedical research.

    PubMed

    Mangan, Mary E; Williams, Jennifer M; Lathe, Scott M; Karolchik, Donna; Lathe, Warren C

    2008-01-01

    The volume and complexity of genomic sequence data, and the additional experimental data required for annotation of the genomic context, pose a major challenge for display and access for biomedical researchers. Genome browsers organize this data and make it available in various ways to extract useful information to advance research projects. The UCSC Genome Browser is one of these resources. The official sequence data for a given species forms the framework to display many other types of data such as expression, variation, cross-species comparisons, and more. Visual representations of the data are available for exploration. Data can be queried with sequences. Complex database queries are also easily achieved with the Table Browser interface. Associated tools permit additional query types or access to additional data sources such as images of in situ localizations. Support for solving researchers' issues is provided through active discussion mailing lists and updated training materials. The UCSC Genome Browser provides a source of deep support for a wide range of biomedical molecular research (http://genome.ucsc.edu).

  1. Building Interoperable FHIR-Based Vocabulary Mapping Services: A Case Study of OHDSI Vocabularies and Mappings.

    PubMed

    Jiang, Guoqian; Kiefer, Richard; Prud'hommeaux, Eric; Solbrig, Harold R

    2017-01-01

    The OHDSI Common Data Model (CDM) is a deep information model, in which its vocabulary component plays a critical role in enabling consistent coding and query of clinical data. The objective of the study is to create methods and tools to expose the OHDSI vocabularies and mappings as the vocabulary mapping services using two HL7 FHIR core terminology resources ConceptMap and ValueSet. We discuss the benefits and challenges in building the FHIR-based terminology services.
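A minimal sketch of the kind of structure such a service exposes: a ConceptMap-shaped mapping and a lookup that emulates the FHIR $translate operation for a single code. The local code is invented and the resource is heavily abbreviated relative to the FHIR specification, which defines full source/target system URIs and equivalence attributes.

```python
# Abbreviated, hypothetical ConceptMap-shaped structure mapping a local
# code to a target code. The target code shown is SNOMED CT's code for
# myocardial infarction; the local code "L001" is invented.

concept_map = {
    "resourceType": "ConceptMap",
    "group": [{
        "source": "http://example.org/local-codes",
        "target": "http://snomed.info/sct",
        "element": [
            {"code": "L001", "target": [{"code": "22298006"}]},
        ],
    }],
}

def translate(cm, source_code):
    """Emulate a ConceptMap $translate lookup for one source code."""
    return [
        t["code"]
        for g in cm["group"]
        for e in g["element"]
        if e["code"] == source_code
        for t in e["target"]
    ]
```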

  2. G-Bean: an ontology-graph based web tool for biomedical literature retrieval

    PubMed Central

    2014-01-01

    Background Currently, most people use NCBI's PubMed to search the MEDLINE database, an important bibliographical information source for life science and biomedical information. However, PubMed has some drawbacks that make it difficult to find relevant publications pertaining to users' individual intentions, especially for non-expert users. To ameliorate the disadvantages of PubMed, we developed G-Bean, a graph based biomedical search engine, to search biomedical articles in MEDLINE database more efficiently. Methods G-Bean addresses PubMed's limitations with three innovations: (1) Parallel document index creation: a multithreaded index creation strategy is employed to generate the document index for G-Bean in parallel; (2) Ontology-graph based query expansion: an ontology graph is constructed by merging four major UMLS (Version 2013AA) vocabularies, MeSH, SNOMEDCT, CSP and AOD, to cover all concepts in National Library of Medicine (NLM) database; a Personalized PageRank algorithm is used to compute concept relevance in this ontology graph and the Term Frequency - Inverse Document Frequency (TF-IDF) weighting scheme is used to re-rank the concepts. The top 500 ranked concepts are selected for expanding the initial query to retrieve more accurate and relevant information; (3) Retrieval and re-ranking of documents based on user's search intention: after the user selects any article from the existing search results, G-Bean analyzes user's selections to determine his/her true search intention and then uses more relevant and more specific terms to retrieve additional related articles. The new articles are presented to the user in the order of their relevance to the already selected articles. Results Performance evaluation with 106 OHSUMED benchmark queries shows that G-Bean returns more relevant results than PubMed does when using these queries to search the MEDLINE database. 
PubMed could not even return any search result for some OHSUMED queries because it failed to form the appropriate Boolean query statement automatically from the natural language query strings. G-Bean is available at http://bioinformatics.clemson.edu/G-Bean/index.php. Conclusions G-Bean addresses PubMed's limitations with ontology-graph based query expansion, automatic document indexing, and user search intention discovery. It shows significant advantages in finding relevant articles from the MEDLINE database to meet the information need of the user. PMID:25474588
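The TF-IDF re-ranking step described above can be sketched as follows. The concept frequencies are invented; G-Bean additionally combines such scores with Personalized PageRank relevance computed over its UMLS-derived ontology graph.

```python
import math

# Rank candidate expansion concepts by TF-IDF so that frequent but
# generic concepts score lower than rarer, discriminative ones.
# The (tf, df) counts below are invented for illustration.

def tfidf(tf, df, n_docs):
    """Classic TF-IDF score for one term/concept."""
    return tf * math.log(n_docs / df)

candidates = {"neoplasm": (12, 900), "carcinoma": (8, 120)}  # concept: (tf, df)
ranked = sorted(
    candidates,
    key=lambda c: tfidf(*candidates[c], n_docs=1000),
    reverse=True,
)
```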

  3. G-Bean: an ontology-graph based web tool for biomedical literature retrieval.

    PubMed

    Wang, James Z; Zhang, Yuanyuan; Dong, Liang; Li, Lin; Srimani, Pradip K; Yu, Philip S

    2014-01-01

    Currently, most people use NCBI's PubMed to search the MEDLINE database, an important bibliographical information source for life science and biomedical information. However, PubMed has some drawbacks that make it difficult to find relevant publications pertaining to users' individual intentions, especially for non-expert users. To ameliorate the disadvantages of PubMed, we developed G-Bean, a graph based biomedical search engine, to search biomedical articles in MEDLINE database more efficiently. G-Bean addresses PubMed's limitations with three innovations: (1) Parallel document index creation: a multithreaded index creation strategy is employed to generate the document index for G-Bean in parallel; (2) Ontology-graph based query expansion: an ontology graph is constructed by merging four major UMLS (Version 2013AA) vocabularies, MeSH, SNOMEDCT, CSP and AOD, to cover all concepts in National Library of Medicine (NLM) database; a Personalized PageRank algorithm is used to compute concept relevance in this ontology graph and the Term Frequency - Inverse Document Frequency (TF-IDF) weighting scheme is used to re-rank the concepts. The top 500 ranked concepts are selected for expanding the initial query to retrieve more accurate and relevant information; (3) Retrieval and re-ranking of documents based on user's search intention: after the user selects any article from the existing search results, G-Bean analyzes user's selections to determine his/her true search intention and then uses more relevant and more specific terms to retrieve additional related articles. The new articles are presented to the user in the order of their relevance to the already selected articles. Performance evaluation with 106 OHSUMED benchmark queries shows that G-Bean returns more relevant results than PubMed does when using these queries to search the MEDLINE database. 
PubMed could not even return any search result for some OHSUMED queries because it failed to form the appropriate Boolean query statement automatically from the natural language query strings. G-Bean is available at http://bioinformatics.clemson.edu/G-Bean/index.php. G-Bean addresses PubMed's limitations with ontology-graph based query expansion, automatic document indexing, and user search intention discovery. It shows significant advantages in finding relevant articles from the MEDLINE database to meet the information need of the user.

  4. Discovering Related Clinical Concepts Using Large Amounts of Clinical Notes

    PubMed Central

    Ganesan, Kavita; Lloyd, Shane; Sarkar, Vikren

    2016-01-01

    The ability to find highly related clinical concepts is essential for many applications, such as hypothesis generation, query expansion for medical literature search, search-results filtering, and ICD-10 code filtering. While manually constructed medical terminologies such as SNOMED CT can surface certain related concepts, these terminologies are inadequate: they depend on the expertise of several subject matter experts, leaving the terminology curation process open to geographic and language bias. In addition, these terminologies provide no quantifiable evidence of how related the concepts are. In this work, we explore an unsupervised graphical approach to mining related concepts by leveraging the sheer volume of clinical notes. Our evaluation shows that we are able to use a data-driven approach to discover highly related concepts for various search terms, including medications, symptoms, and diseases. PMID:27656096
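The data-driven idea behind this kind of approach can be sketched simply: treat concepts that co-occur in the same note as related, with the co-occurrence count as the quantifiable evidence the abstract calls for. The notes and concepts below are illustrative, and the authors' actual graphical method is more sophisticated than this counting sketch.

```python
# Data-driven sketch of mining related concepts from clinical notes:
# concepts that frequently co-occur in the same note are treated as related,
# with the co-occurrence count as a quantifiable strength.
from collections import Counter
from itertools import combinations

notes = [  # each note reduced to its set of extracted concepts
    {"metformin", "type 2 diabetes", "hyperglycemia"},
    {"metformin", "type 2 diabetes"},
    {"insulin", "type 2 diabetes", "hyperglycemia"},
    {"insulin", "hypoglycemia"},
]

cooc = Counter()  # undirected co-occurrence edges with counts
for concepts in notes:
    for a, b in combinations(sorted(concepts), 2):
        cooc[(a, b)] += 1

def related(term, top_k=3):
    """Rank the neighbours of `term` by co-occurrence count."""
    scores = Counter()
    for (a, b), n in cooc.items():
        if a == term:
            scores[b] += n
        elif b == term:
            scores[a] += n
    return [c for c, _ in scores.most_common(top_k)]
```

With real note collections the counts would typically be normalized (e.g. pointwise mutual information) so that ubiquitous concepts do not dominate every neighbour list.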

  5. Development and Evaluation of Thesauri-Based Bibliographic Biomedical Search Engine

    ERIC Educational Resources Information Center

    Alghoson, Abdullah

    2017-01-01

    Due to the large volume and exponential growth of biomedical documents (e.g., books, journal articles), it has become increasingly challenging for biomedical search engines to retrieve relevant documents based on users' search queries. Part of the challenge is the matching mechanism of free-text indexing that performs matching based on…

  6. Ontology-Based Vaccine Adverse Event Representation and Analysis.

    PubMed

    Xie, Jiangan; He, Yongqun

    2017-01-01

    Vaccines are among the greatest inventions of modern medicine, having contributed enormously to the relief of human misery and the dramatic increase in life expectancy. In 1796, an English country physician, Edward Jenner, discovered that inoculation with cowpox protects against smallpox (Riedel S, Edward Jenner and the history of smallpox and vaccination. Proceedings (Baylor University. Medical Center) 18(1):21, 2005). Through worldwide vaccination, smallpox was finally eradicated in 1977 (Henderson, Vaccine 29:D7-D9, 2011). Other disabling and lethal diseases, like poliomyelitis and measles, are targeted for eradication (Bonanni, Vaccine 17:S120-S125, 1999). Although vaccine development and administration are tremendously successful and cost-effective practices for human health, no vaccine is 100% safe for everyone, because each person reacts to vaccination differently given different genetic backgrounds and health conditions. Although all licensed vaccines are generally safe for the majority of people, vaccinees may still suffer adverse events (AEs) in reaction to various vaccines, some of which can be serious or even fatal (Haber et al., Drug Saf 32(4):309-323, 2009). Hence, the double-edged sword of vaccination remains a concern. To support integrative AE data collection and analysis, it is critical to adopt an AE normalization strategy. In the past decades, different controlled terminologies, including the Medical Dictionary for Regulatory Activities (MedDRA) (Brown EG, Wood L, Wood S, et al., Drug Saf 20(2):109-117, 1999), the Common Terminology Criteria for Adverse Events (CTCAE) (NCI, The Common Terminology Criteria for Adverse Events (CTCAE). Available from: http://evs.nci.nih.gov/ftp1/CTCAE/About.html . Access on 7 Oct 2015), and the World Health Organization (WHO) Adverse Reactions Terminology (WHO-ART) (WHO, The WHO Adverse Reaction Terminology - WHO-ART. 
Available from: https://www.umc-products.com/graphics/28010.pdf ), have been developed with the specific aim of standardizing AE categorization. However, these controlled terminologies have many drawbacks, such as a lack of textual definitions, poorly defined hierarchies, and a lack of semantic axioms that provide logical relations among terms. A biomedical ontology is a set of consensus-based, computer- and human-interpretable terms and relations that represent entities in a specific biomedical domain and how they relate to each other. To represent and analyze vaccine adverse events (VAEs), our research group has initiated and led the development of a community-based ontology: the Ontology of Adverse Events (OAE) (He et al., J Biomed Semant 5:29, 2014). The OAE has been shown to overcome the drawbacks of those controlled terminologies (He et al., Curr Pharmacol Rep :1-16. doi:10.1007/s40495-016-0055-0, 2014). By expanding the OAE and the community-based Vaccine Ontology (VO) (He et al., VO: vaccine ontology. In The 1st International Conference on Biomedical Ontology (ICBO-2009). Nature Precedings, Buffalo. http://precedings.nature.com/documents/3552/version/1 ; J Biomed Semant 2(Suppl 2):S8; J Biomed Semant 3(1):17, 2009; Ozgur et al., J Biomed Semant 2(2):S8, 2011; Lin Y, He Y, J Biomed Semant 3(1):17, 2012), we have also developed the Ontology of Vaccine Adverse Events (OVAE) to represent known VAEs associated with licensed vaccines (Marcos E, Zhao B, He Y, J Biomed Semant 4:40, 2013). In this book chapter, we first introduce basic information about VAEs, VAE safety surveillance systems, and how to specifically query and analyze VAEs using the US VAE database VAERS (Chen et al., Vaccine 12(10):960-960, 1994). In the second half of the chapter, we introduce the development and applications of the OAE and OVAE. 
Throughout this chapter, we use the influenza vaccine Flublok as the running example (Huber VC, McCullers JA, Curr Opin Mol Ther 10(1):75-85, 2008). Flublok is a recombinant hemagglutinin influenza vaccine indicated for active immunization against disease caused by influenza virus subtypes A and type B. On January 16, 2013, Flublok was approved by the FDA for the prevention of seasonal influenza in people 18 years and older in the USA. Now, more than 3 years later, an exploration of the reported AEs associated with this vaccine is urgently needed.

  7. A comparison of two methods for retrieving ICD-9-CM data: the effect of using an ontology-based method for handling terminology changes.

    PubMed

    Yu, Alexander C; Cimino, James J

    2011-04-01

    Most existing controlled terminologies can be characterized as collections of terms, wherein the terms are arranged in a simple list or organized in a hierarchy. These kinds of terminologies are considered useful for standardizing terms and encoding data and are currently used in many existing information systems. However, they suffer from a number of limitations that make data reuse difficult. Relatively recently, it has been proposed that formal ontological methods can be applied to some of the problems of terminological design. Biomedical ontologies organize concepts (embodiments of knowledge about biomedical reality) whereas terminologies organize terms (what is used to code patient data at a certain point in time, based on the particular terminology version). However, the application of these methods to existing terminologies is not straightforward. The use of these terminologies is firmly entrenched in many systems, and what might seem to be a simple option of replacing these terminologies is not possible. Moreover, these terminologies evolve over time in order to suit the needs of users. Any methodology must therefore take these constraints into consideration, hence the need for formal methods of managing changes. Along these lines, we have developed a formal representation of the concept-term relation, around which we have also developed a methodology for management of terminology changes. The objective of this study was to determine whether our methodology would result in improved retrieval of data. We compared two methods for retrieving data encoded with terms from the International Classification of Diseases (ICD-9-CM), based on their recall when retrieving data for ICD-9-CM terms whose codes had changed but which had retained their original meaning (code change); the measurements were recall and the interclass correlation coefficient. Statistically significant differences were detected (p<0.05) with the McNemar test for two terms whose codes had changed. 
Furthermore, when all the cases are combined in an overall category, our method also performs statistically significantly better (p<0.05). Our study shows that an ontology-based ICD-9-CM data retrieval method that takes into account the effects of terminology changes performs better on recall than one that does not in the retrieval of data for terms whose codes had changed but which retained their original meaning. Copyright © 2011 Elsevier Inc. All rights reserved.

  8. A Comparison of Two Methods for Retrieving ICD-9-CM data: The Effect of Using an Ontology-based Method for Handling Terminology Changes

    PubMed Central

    Yu, Alexander C.; Cimino, James J.

    2012-01-01

    Objective Most existing controlled terminologies can be characterized as collections of terms, wherein the terms are arranged in a simple list or organized in a hierarchy. These kinds of terminologies are considered useful for standardizing terms and encoding data and are currently used in many existing information systems. However, they suffer from a number of limitations that make data reuse difficult. Relatively recently, it has been proposed that formal ontological methods can be applied to some of the problems of terminological design. Biomedical ontologies organize concepts (embodiments of knowledge about biomedical reality) whereas terminologies organize terms (what is used to code patient data at a certain point in time, based on the particular terminology version). However, the application of these methods to existing terminologies is not straightforward. The use of these terminologies is firmly entrenched in many systems, and what might seem to be a simple option of replacing these terminologies is not possible. Moreover, these terminologies evolve over time in order to suit the needs of users. Any methodology must therefore take these constraints into consideration, hence the need for formal methods of managing changes. Along these lines, we have developed a formal representation of the concept-term relation, around which we have also developed a methodology for management of terminology changes. The objective of this study was to determine whether our methodology would result in improved retrieval of data. Design Comparison of two methods for retrieving data encoded with terms from the International Classification of Diseases (ICD-9-CM), based on their recall when retrieving data for ICD-9-CM terms whose codes had changed but which had retained their original meaning (code change). Measurements Recall and interclass correlation coefficient. 
Results Statistically significant differences were detected (p<0.05) with the McNemar test for two terms whose codes had changed. Furthermore, when all the cases are combined in an overall category, our method also performs statistically significantly better (p < 0.05). Conclusion Our study shows that an ontology-based ICD-9-CM data retrieval method that takes into account the effects of terminology changes performs better on recall than one that does not in the retrieval of data for terms whose codes had changed but which retained their original meaning. PMID:21262390

  9. Improving information retrieval with multiple health terminologies in a quality-controlled gateway.

    PubMed

    Soualmia, Lina F; Sakji, Saoussen; Letord, Catherine; Rollin, Laetitia; Massari, Philippe; Darmoni, Stéfan J

    2013-01-01

    The Catalog and Index of French-language Health Internet resources (CISMeF) is a quality-controlled health gateway, primarily for Web resources in French (n=89,751). Recently, we achieved a major improvement in the structure of the catalogue by setting up multiple terminologies, based on twelve health terminologies available in French, to overcome the potential weaknesses of the MeSH thesaurus, which has been our main, pivotal terminology for indexing and retrieval since 1995. The main aim of this study was to estimate the added value of exploiting several terminologies and their semantic relationships to improve Web resource indexing and retrieval in CISMeF, in order to provide additional health resources that meet users' expectations. Twelve terminologies were integrated into the CISMeF information system to set up multiple-terminology indexing and retrieval. The same set of thirty queries was run: (i) by exploiting the hierarchical structure of the MeSH, and (ii) by exploiting the additional twelve terminologies and their semantic links. The two search modes were evaluated and compared. The overall coverage of the multiple-terminology search mode was improved by comparison with the coverage of the MeSH alone (16,283 vs. 14,159 results, +15%). These additional findings were estimated at 56.6% relevant results, 24.7% intermediate results, and 18.7% irrelevant. The multiple-terminology approach improved information retrieval. These results suggest that integrating additional health terminologies was able to improve recall. Since performing the study, 21 other terminologies have been added, which should enable us to conduct broader studies in multiple-terminology information retrieval.

  10. Developing tools and resources for the biomedical domain of the Greek language.

    PubMed

    Vagelatos, Aristides; Mantzari, Elena; Pantazara, Mavina; Tsalidis, Christos; Kalamara, Chryssoula

    2011-06-01

    This paper presents the design and implementation of terminological and specialized textual resources that were produced in the framework of the Greek research project "IATROLEXI". The aim of the project was to create the critical infrastructure for the Greek language, i.e. linguistic resources and tools for use in high-level Natural Language Processing (NLP) applications in the domain of biomedicine. The project was built upon existing resources developed by the project partners and further enhanced within its framework, i.e. a Greek morphological lexicon of about 100,000 words, and language processing tools such as a lemmatiser and a morphosyntactic tagger. Additionally, it developed new assets, such as a specialized corpus of biomedical texts and an ontology of medical terminology.

  11. BioTCM-SE: a semantic search engine for the information retrieval of modern biology and traditional Chinese medicine.

    PubMed

    Chen, Xi; Chen, Huajun; Bi, Xuan; Gu, Peiqin; Chen, Jiaoyan; Wu, Zhaohui

    2014-01-01

    Understanding the functional mechanisms of the complex biological system as a whole is drawing more and more attention in global health care management. Traditional Chinese Medicine (TCM), essentially different from Western Medicine (WM), is gaining increasing attention due to its emphasis on individual wellness and natural herbal medicine, which satisfies the goal of integrative medicine. However, with the explosive growth of biomedical data on the Web, biomedical researchers are now confronted with the problem of large-scale data analysis and data query. Moreover, biomedical data has wide coverage, usually comes from multiple heterogeneous data sources, and uses different taxonomies, making big biomedical data hard to integrate and query. Embedded with domain knowledge from different disciplines, all regarding human biological systems, the heterogeneous data repositories are implicitly connected by human expert knowledge. Traditional search engines cannot provide accurate and comprehensive search results for such semantically associated knowledge, since they only support keyword-based searches. In this paper, we present BioTCM-SE, a semantic search engine for the information retrieval of modern biology and TCM, which provides biologists with a comprehensive and accurate associated knowledge query platform to greatly facilitate implicit knowledge discovery between WM and TCM.

  12. BioTCM-SE: A Semantic Search Engine for the Information Retrieval of Modern Biology and Traditional Chinese Medicine

    PubMed Central

    Chen, Xi; Chen, Huajun; Bi, Xuan; Gu, Peiqin; Chen, Jiaoyan; Wu, Zhaohui

    2014-01-01

    Understanding the functional mechanisms of the complex biological system as a whole is drawing more and more attention in global health care management. Traditional Chinese Medicine (TCM), essentially different from Western Medicine (WM), is gaining increasing attention due to its emphasis on individual wellness and natural herbal medicine, which satisfies the goal of integrative medicine. However, with the explosive growth of biomedical data on the Web, biomedical researchers are now confronted with the problem of large-scale data analysis and data query. Moreover, biomedical data has wide coverage, usually comes from multiple heterogeneous data sources, and uses different taxonomies, making big biomedical data hard to integrate and query. Embedded with domain knowledge from different disciplines, all regarding human biological systems, the heterogeneous data repositories are implicitly connected by human expert knowledge. Traditional search engines cannot provide accurate and comprehensive search results for such semantically associated knowledge, since they only support keyword-based searches. In this paper, we present BioTCM-SE, a semantic search engine for the information retrieval of modern biology and TCM, which provides biologists with a comprehensive and accurate associated knowledge query platform to greatly facilitate implicit knowledge discovery between WM and TCM. PMID:24772189

  13. [Presence and characteristics of nursing terminology in Wikipedia].

    PubMed

    Sanz-Lorente, María; Guardiola-Wanden-Berghe, Rocío; Wanden-Berghe, Carmina; Sanz-Valero, Javier

    2013-10-01

    To determine the presence of, and consultations of, nursing terminology in the Spanish edition of Wikipedia, and to analyze the differences with the English edition. We confirmed the existence of the terms via the Internet by accessing the Spanish and English editions of Wikipedia. We calculated the study sample (n = 386) from the 1840 nursing terms. Of these, 337 were found in the Spanish edition and 350 in the English edition. We found significant differences between the two editions (p < 0.001). Differences were also found in the number of references per term (p < 0.001). However, there were no differences in the currency/obsolescence of the information, nor in the number of queries. The entries (articles) on nursing terminology in the Spanish edition of Wikipedia have not yet reached an optimal level. The differences between the Spanish and English editions of Wikipedia relate more to whether a term exists than to the adequacy of its information.

  14. Issues in the design of a pilot concept-based query interface for the neuroinformatics information framework.

    PubMed

    Marenco, Luis; Li, Yuli; Martone, Maryann E; Sternberg, Paul W; Shepherd, Gordon M; Miller, Perry L

    2008-09-01

    This paper describes a pilot query interface that has been constructed to help us explore a "concept-based" approach for searching the Neuroscience Information Framework (NIF). The query interface is concept-based in the sense that the search terms submitted through the interface are selected from a standardized vocabulary of terms (concepts) that are structured in the form of an ontology. The NIF contains three primary resources: the NIF Resource Registry, the NIF Document Archive, and the NIF Database Mediator. These NIF resources are very different in their nature and therefore pose challenges when designing a single interface from which searches can be automatically launched against all three resources simultaneously. The paper first discusses briefly several background issues involving the use of standardized biomedical vocabularies in biomedical information retrieval, and then presents a detailed example that illustrates how the pilot concept-based query interface operates. The paper concludes by discussing certain lessons learned in the development of the current version of the interface.

  15. Issues in the Design of a Pilot Concept-Based Query Interface for the Neuroinformatics Information Framework

    PubMed Central

    Li, Yuli; Martone, Maryann E.; Sternberg, Paul W.; Shepherd, Gordon M.; Miller, Perry L.

    2009-01-01

    This paper describes a pilot query interface that has been constructed to help us explore a “concept-based” approach for searching the Neuroscience Information Framework (NIF). The query interface is concept-based in the sense that the search terms submitted through the interface are selected from a standardized vocabulary of terms (concepts) that are structured in the form of an ontology. The NIF contains three primary resources: the NIF Resource Registry, the NIF Document Archive, and the NIF Database Mediator. These NIF resources are very different in their nature and therefore pose challenges when designing a single interface from which searches can be automatically launched against all three resources simultaneously. The paper first discusses briefly several background issues involving the use of standardized biomedical vocabularies in biomedical information retrieval, and then presents a detailed example that illustrates how the pilot concept-based query interface operates. The paper concludes by discussing certain lessons learned in the development of the current version of the interface. PMID:18953674

  16. Deriving a probabilistic syntacto-semantic grammar for biomedicine based on domain-specific terminologies

    PubMed Central

    Fan, Jung-Wei; Friedman, Carol

    2011-01-01

    Biomedical natural language processing (BioNLP) is a useful technique that unlocks valuable information stored in textual data for practice and/or research. Syntactic parsing is a critical component of BioNLP applications that rely on correctly determining the sentence and phrase structure of free text. In addition to dealing with the vast amount of domain-specific terms, a robust biomedical parser needs to model the semantic grammar to obtain viable syntactic structures. With either a rule-based or corpus-based approach, the grammar engineering process requires substantial time and knowledge from experts, and does not always yield a semantically transferable grammar. To reduce the human effort and to promote semantic transferability, we propose an automated method for deriving a probabilistic grammar based on a training corpus consisting of concept strings and semantic classes from the Unified Medical Language System (UMLS), a comprehensive terminology resource widely used by the community. The grammar is designed to specify noun phrases only, due to the nominal nature of the majority of biomedical terminological concepts. Evaluated on manually parsed clinical notes, the derived grammar achieved a recall of 0.644, precision of 0.737, and average cross-bracketing of 0.61, which demonstrated better performance than a control grammar with the semantic information removed. Error analysis revealed shortcomings that could be addressed to improve performance. The results indicated the feasibility of an approach which automatically incorporates terminology semantics in the building of an operational grammar. Although the current performance of the unsupervised solution does not adequately replace manual engineering, we believe that once the performance issues are addressed, it could serve as an aid in a semi-supervised solution. PMID:21549857
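The core of such a corpus-based derivation can be illustrated with relative-frequency estimation: each terminology concept string, mapped to its sequence of semantic classes, counts as evidence for one noun-phrase production, and rule probabilities come from normalized counts. The semantic classes and strings below are illustrative, not the paper's actual UMLS training data.

```python
# Sketch of deriving a probabilistic noun-phrase grammar from terminology
# strings: count semantic-class sequences and normalize into probabilities.
from collections import Counter

# UMLS-style training data: each concept string reduced to a sequence of
# semantic classes (classes and examples are illustrative).
sequences = [
    ("Body Part", "Disease"),     # e.g. "lung carcinoma"
    ("Body Part", "Disease"),
    ("Chemical", "Disease"),      # e.g. "drug-induced hepatitis"
    ("Body Part", "Procedure"),   # e.g. "knee replacement"
]

# Each sequence is evidence for one production NP -> class sequence.
rule_counts = Counter(("NP", seq) for seq in sequences)
total = sum(rule_counts.values())
grammar = {rule: n / total for rule, n in rule_counts.items()}
```

A parser would then prefer bracketings whose productions have high estimated probability, which is how the terminology's semantics steer the syntactic analysis.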

  17. Accessing Biomedical Literature in the Current Information Landscape

    PubMed Central

    Khare, Ritu; Leaman, Robert; Lu, Zhiyong

    2015-01-01

    Biomedical and life sciences literature is unique because of its exponentially increasing volume and interdisciplinary nature. Biomedical literature access is essential for several types of users including biomedical researchers, clinicians, database curators, and bibliometricians. In the past few decades, several online search tools and literature archives, generic as well as biomedicine-specific, have been developed. We present this chapter in the light of three consecutive steps of literature access: searching for citations, retrieving full-text, and viewing the article. The first section presents the current state of practice of biomedical literature access, including an analysis of the search tools most frequently used by the users, including PubMed, Google Scholar, Web of Science, Scopus, and Embase, and a study on biomedical literature archives such as PubMed Central. The next section describes current research and the state-of-the-art systems motivated by the challenges a user faces during query formulation and interpretation of search results. The research solutions are classified into five key areas related to text and data mining: text similarity search, semantic search, query support, relevance ranking, and clustering results. Finally, the last section describes some predicted future trends for improving biomedical literature access, such as searching and reading articles on portable devices, and adoption of the open access policy. PMID:24788259

  18. Toward a Cognitive Task Analysis for Biomedical Query Mediation

    PubMed Central

    Hruby, Gregory W.; Cimino, James J.; Patel, Vimla; Weng, Chunhua

    2014-01-01

    In many institutions, data analysts use a Biomedical Query Mediation (BQM) process to facilitate data access for medical researchers. However, understanding of the BQM process is limited in the literature. To bridge this gap, we performed the initial steps of a cognitive task analysis using 31 BQM instances conducted between one analyst and 22 researchers in one academic department. We identified five top-level tasks, i.e., clarify research statement, explain clinical process, identify related data elements, locate EHR data element, and end BQM with either a database query or unmet, infeasible information needs, and 10 sub-tasks. We evaluated the BQM task model with seven data analysts from different clinical research institutions. Evaluators found all the tasks completely valid or semi-valid. This study contributes initial knowledge towards the development of a generalizable cognitive task representation for BQM. PMID:25954589

  19. Toward a cognitive task analysis for biomedical query mediation.

    PubMed

    Hruby, Gregory W; Cimino, James J; Patel, Vimla; Weng, Chunhua

    2014-01-01

    In many institutions, data analysts use a Biomedical Query Mediation (BQM) process to facilitate data access for medical researchers. However, understanding of the BQM process is limited in the literature. To bridge this gap, we performed the initial steps of a cognitive task analysis using 31 BQM instances conducted between one analyst and 22 researchers in one academic department. We identified five top-level tasks, i.e., clarify research statement, explain clinical process, identify related data elements, locate EHR data element, and end BQM with either a database query or unmet, infeasible information needs, and 10 sub-tasks. We evaluated the BQM task model with seven data analysts from different clinical research institutions. Evaluators found all the tasks completely valid or semi-valid. This study contributes initial knowledge towards the development of a generalizable cognitive task representation for BQM.

  20. Enabling online studies of conceptual relationships between medical terms: developing an efficient web platform.

    PubMed

    Albin, Aaron; Ji, Xiaonan; Borlawsky, Tara B; Ye, Zhan; Lin, Simon; Payne, Philip Ro; Huang, Kun; Xiang, Yang

    2014-10-07

    The Unified Medical Language System (UMLS) contains many important ontologies in which terms are connected by semantic relations. For many studies on the relationships between biomedical concepts, the use of transitively associated information from ontologies and the UMLS has been shown to be effective. Although there are a few tools and methods available for extracting transitive relationships from the UMLS, they usually have major restrictions on the length of transitive relations or on the number of data sources. Our goal was to design an online platform that enables efficient studies of the conceptual relationships between any medical terms. To overcome the restrictions of available methods and to facilitate studies on the conceptual relationships between medical terms, we developed a Web platform, onGrid, that supports efficient transitive queries and conceptual relationship studies using the UMLS. This framework uses the latest techniques for converting natural language queries into UMLS concepts, performs efficient transitive queries, and visualizes the result paths. It also dynamically builds a relationship matrix for two sets of input biomedical terms. We are thus able to perform effective studies on conceptual relationships between medical terms based on their relationship matrix. The advantage of onGrid is that it can be applied to study any two sets of biomedical concept relations as well as the relations within one set of biomedical concepts. We used onGrid to study the disease-disease relationships in Online Mendelian Inheritance in Man (OMIM). By cross-validating our results with an external database, the Comparative Toxicogenomics Database (CTD), we demonstrated that onGrid is effective for the study of conceptual relationships between medical terms. onGrid is an efficient tool for querying the UMLS for transitive relations, studying the relationships between medical terms, and generating hypotheses.
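A transitive query of the kind onGrid supports amounts to path search over the UMLS relation graph. The sketch below uses plain breadth-first search over a hypothetical mini relation set; onGrid's actual engine handles far larger graphs and visualizes the result paths.

```python
# Sketch of a transitive relation query: follow semantic relations across
# any number of hops and return the shortest chain between two concepts.
# The mini relation set is illustrative, not real UMLS data.
from collections import deque

relations = {  # concept -> {directly related concepts}
    "marfan syndrome": {"connective tissue disease"},
    "connective tissue disease": {"disease"},
    "aortic aneurysm": {"aortic disease"},
    "aortic disease": {"cardiovascular disease"},
    "cardiovascular disease": {"disease"},
}

def transitive_path(src, dst):
    """Breadth-first search: shortest chain of relations from src to dst."""
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in relations.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no transitive relation found

path = transitive_path("marfan syndrome", "disease")
```

A relationship matrix for two sets of terms, as the abstract describes, could then be built by running `transitive_path` over every pair and recording path existence or length.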

  1. Semi-Automated Annotation of Biobank Data Using Standard Medical Terminologies in a Graph Database.

    PubMed

    Hofer, Philipp; Neururer, Sabrina; Goebel, Georg

    2016-01-01

    Data describing biobank resources frequently contains unstructured free-text information or insufficient coding standards. (Bio-)medical ontologies like the Orphanet Rare Diseases Ontology (ORDO) or the Human Disease Ontology (DOID) provide a high number of concepts, synonyms, and entity-relationship properties. Such standard terminologies increase the quality and granularity of input data by adding comprehensive semantic background knowledge from validated entity relationships. Moreover, cross-references between terminology concepts facilitate data integration across databases using different coding standards. In order to encourage the use of standard terminologies, our aim is to identify and link relevant concepts with free-text diagnosis inputs within a biobank registry. Relevant concepts are selected automatically by lexical matching and SPARQL queries against an RDF triplestore. To ensure the correctness of annotations, proposed concepts have to be confirmed by medical data administration experts before they are entered into the registry database. Relevant (bio-)medical terminologies describing diseases and phenotypes were identified and stored in a graph database which was tied to a local biobank registry. Concept recommendations during data input trigger a structured description of medical data and facilitate data linkage between heterogeneous systems.
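The lexical-matching step can be sketched without a triplestore: scan the free-text diagnosis for ontology labels and synonyms and propose the matching concept IDs for expert confirmation. The concept entries below are illustrative stand-ins for DOID-style content; the real system issues SPARQL queries against an RDF graph rather than in-memory scans.

```python
# Sketch of semi-automated annotation: propose ontology concepts whose
# label or synonym appears in a free-text diagnosis. Entries are illustrative.
concepts = {
    "DOID:9352": {"label": "type 2 diabetes mellitus",
                  "synonyms": {"niddm", "adult-onset diabetes"}},
    "DOID:10763": {"label": "hypertension",
                   "synonyms": {"high blood pressure"}},
}

def suggest(free_text):
    """Return candidate concept IDs matched by label or synonym."""
    text = free_text.lower()
    hits = []
    for cid, c in concepts.items():
        terms = {c["label"]} | c["synonyms"]
        if any(t in text for t in terms):
            hits.append(cid)
    return hits  # proposals still need expert confirmation before storage
```

Keeping the expert-confirmation step after the automatic match mirrors the workflow the abstract describes: the matcher only recommends, it never writes to the registry directly.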

  2. Graphical Methods for Reducing, Visualizing and Analyzing Large Data Sets Using Hierarchical Terminologies

    PubMed Central

    Jing, Xia; Cimino, James J.

    2011-01-01

    Objective: To explore new graphical methods for reducing and analyzing large data sets in which the data are coded with a hierarchical terminology. Methods: We use a hierarchical terminology to organize a data set and display it in a graph. We reduce the size and complexity of the data set by considering the terminological structure and the data set itself (using a variety of thresholds) as well as contributions of child level nodes to parent level nodes. Results: We found that our methods can reduce large data sets to manageable size and highlight the differences among graphs. The thresholds used as filters to reduce the data set can be used alone or in combination. We applied our methods to two data sets containing information about how nurses and physicians query online knowledge resources. The reduced graphs make the differences between the two groups readily apparent. Conclusions: This is a new approach to reduce size and complexity of large data sets and to simplify visualization. This approach can be applied to any data sets that are coded with hierarchical terminologies. PMID:22195119
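The roll-up-and-threshold idea can be sketched as follows: aggregate each node's count with the contributions of its descendants, then prune subtrees whose aggregated counts fall below a cutoff, leaving a smaller graph to visualize. The hierarchy, counts, and threshold below are illustrative, not the paper's data.

```python
# Sketch of threshold-based reduction over a hierarchical terminology:
# roll leaf counts up to parents, then drop nodes below a cutoff.
children = {  # illustrative mini hierarchy
    "Diseases": ["Cardiovascular", "Respiratory"],
    "Cardiovascular": ["Hypertension", "Arrhythmia"],
    "Respiratory": ["Asthma"],
    "Hypertension": [], "Arrhythmia": [], "Asthma": [],
}
counts = {"Hypertension": 40, "Arrhythmia": 2, "Asthma": 15}  # e.g. query counts

def rolled_up(node):
    """A node's own count plus the contributions of all its descendants."""
    return counts.get(node, 0) + sum(rolled_up(c) for c in children[node])

def reduce_tree(node, threshold):
    """Keep a node only if its aggregated count reaches the threshold."""
    if rolled_up(node) < threshold:
        return None
    kept = [reduce_tree(c, threshold) for c in children[node]]
    return {node: [k for k in kept if k is not None]}

pruned = reduce_tree("Diseases", threshold=10)
```

Here "Arrhythmia" (count 2) is filtered out while its parent survives, which is exactly the kind of size reduction that makes differences between two groups' graphs easier to see.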

  3. A search engine to access PubMed monolingual subsets: proof of concept and evaluation in French.

    PubMed

    Griffon, Nicolas; Schuers, Matthieu; Soualmia, Lina Fatima; Grosjean, Julien; Kerdelhué, Gaétan; Kergourlay, Ivan; Dahamna, Badisse; Darmoni, Stéfan Jacques

    2014-12-01

    PubMed contains numerous articles in languages other than English. However, existing solutions to access these articles in the language in which they were written remain unconvincing. The aim of this study was to propose a practical search engine, called Multilingual PubMed, which will permit access to a PubMed subset in 1 language and to evaluate the precision and coverage for the French version (Multilingual PubMed-French). To create this tool, translations of MeSH were enriched (eg, adding synonyms and translations in French) and integrated into a terminology portal. PubMed subsets in several European languages were also added to our database using a dedicated parser. The response time for the generic semantic search engine was evaluated for simple queries. BabelMeSH, Multilingual PubMed-French, and 3 different PubMed strategies were compared by searching for literature in French. Precision and coverage were measured for 20 randomly selected queries. The results were evaluated as relevant to title and abstract, the evaluator being blind to search strategy. More than 650,000 PubMed citations in French were integrated into the Multilingual PubMed-French information system. The response times were all below the threshold defined for usability (2 seconds). Two search strategies (Multilingual PubMed-French and 1 PubMed strategy) showed high precision (0.93 and 0.97, respectively), but coverage was 4 times higher for Multilingual PubMed-French. It is now possible to freely access biomedical literature using a practical search tool in French. This tool will be of particular interest for health professionals and other end users who do not read or query sufficiently in English. The information system is theoretically well suited to expand the approach to other European languages, such as German, Spanish, Norwegian, and Portuguese.

  4. A Search Engine to Access PubMed Monolingual Subsets: Proof of Concept and Evaluation in French

    PubMed Central

    Schuers, Matthieu; Soualmia, Lina Fatima; Grosjean, Julien; Kerdelhué, Gaétan; Kergourlay, Ivan; Dahamna, Badisse; Darmoni, Stéfan Jacques

    2014-01-01

    Background PubMed contains numerous articles in languages other than English. However, existing solutions to access these articles in the language in which they were written remain unconvincing. Objective The aim of this study was to propose a practical search engine, called Multilingual PubMed, which will permit access to a PubMed subset in 1 language and to evaluate the precision and coverage for the French version (Multilingual PubMed-French). Methods To create this tool, translations of MeSH were enriched (eg, adding synonyms and translations in French) and integrated into a terminology portal. PubMed subsets in several European languages were also added to our database using a dedicated parser. The response time for the generic semantic search engine was evaluated for simple queries. BabelMeSH, Multilingual PubMed-French, and 3 different PubMed strategies were compared by searching for literature in French. Precision and coverage were measured for 20 randomly selected queries. The results were evaluated as relevant to title and abstract, the evaluator being blind to search strategy. Results More than 650,000 PubMed citations in French were integrated into the Multilingual PubMed-French information system. The response times were all below the threshold defined for usability (2 seconds). Two search strategies (Multilingual PubMed-French and 1 PubMed strategy) showed high precision (0.93 and 0.97, respectively), but coverage was 4 times higher for Multilingual PubMed-French. Conclusions It is now possible to freely access biomedical literature using a practical search tool in French. This tool will be of particular interest for health professionals and other end users who do not read or query sufficiently in English. The information system is theoretically well suited to expand the approach to other European languages, such as German, Spanish, Norwegian, and Portuguese. PMID:25448528

  5. Using the Abstraction Network in Complement to Description Logics for Quality Assurance in Biomedical Terminologies - A Case Study in SNOMED CT

    PubMed Central

    Wei, Duo; Bodenreider, Olivier

    2015-01-01

    Objectives To investigate errors identified in SNOMED CT by human reviewers with help from the Abstraction Network methodology and examine why they had escaped detection by the Description Logic (DL) classifier. Methods Case study; two examples of errors are presented in detail (one missing IS-A relation and one duplicate concept). After correction, SNOMED CT is reclassified to ensure that no new inconsistency was introduced. Conclusions DL-based auditing techniques built in terminology development environments ensure the logical consistency of the terminology. However, complementary approaches are needed for identifying and addressing other types of errors. PMID:20841848

  6. Using the abstraction network in complement to description logics for quality assurance in biomedical terminologies - a case study in SNOMED CT.

    PubMed

    Wei, Duo; Bodenreider, Olivier

    2010-01-01

    To investigate errors identified in SNOMED CT by human reviewers with help from the Abstraction Network methodology and examine why they had escaped detection by the Description Logic (DL) classifier. Case study: two examples of errors are presented in detail (one missing IS-A relation and one duplicate concept). After correction, SNOMED CT is reclassified to ensure that no new inconsistency was introduced. DL-based auditing techniques built in terminology development environments ensure the logical consistency of the terminology. However, complementary approaches are needed for identifying and addressing other types of errors.

  7. Improving biomedical information retrieval by linear combinations of different query expansion techniques.

    PubMed

    Abdulla, Ahmed AbdoAziz Ahmed; Lin, Hongfei; Xu, Bo; Banbhrani, Santosh Kumar

    2016-07-25

    Biomedical literature retrieval is becoming increasingly complex, and there is a fundamental need for advanced information retrieval systems. Information Retrieval (IR) systems scour unstructured materials such as text documents in large data repositories, usually stored on computers. IR is concerned with the representation, storage and organization of information items, as well as with access to them. One of the main problems in IR is determining which documents are relevant to the user's needs and which are not. Under the current regime, users cannot construct queries precisely enough to retrieve particular pieces of data from large repositories, and basic information retrieval systems produce low-quality search results. In this paper we present a new technique to refine IR searches so that they better represent the user's information need: we apply different query expansion techniques and combine them linearly, two expansion results at a time. Query expansion enlarges the search query, for example by finding synonyms and reweighting original terms, and provides significantly more focused, particularized search results than basic search queries do. Retrieval performance is measured by variants of MAP (Mean Average Precision); according to our experimental results, the combination of the best query expansion results enhances the retrieved documents, outperforming our baseline by 21.06% and a previous study by 7.12%. We propose several query expansion techniques and their linear combinations to make user queries more cognizable to search engines and to produce higher-quality search results.
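
    The pairwise linear combination of two expansion runs can be sketched as a weighted score fusion; the document scores and the weight alpha below are invented for illustration, not the paper's values.

```python
# Sketch: linearly combine two query-expansion result lists. Each run scores
# documents; the fused score is a weighted sum. Scores/alpha are illustrative.

def fuse(scores_a, scores_b, alpha=0.6):
    docs = set(scores_a) | set(scores_b)
    return {d: alpha * scores_a.get(d, 0.0) + (1 - alpha) * scores_b.get(d, 0.0)
            for d in docs}

run_synonyms = {"d1": 0.9, "d2": 0.4}   # e.g. synonym-based expansion
run_reweight = {"d2": 0.8, "d3": 0.5}   # e.g. term-reweighting expansion
fused = fuse(run_synonyms, run_reweight)
ranking = sorted(fused, key=fused.get, reverse=True)
print(ranking)
```

    A document supported by both expansion runs (d2 here) can overtake one ranked highly by only a single run, which is the intuition behind combining expansions.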

  8. KaBOB: ontology-based semantic integration of biomedical databases.

    PubMed

    Livingston, Kevin M; Bada, Michael; Baumgartner, William A; Hunter, Lawrence E

    2015-04-23

    The ability to query many independent biological databases using a common ontology-based semantic model would facilitate deeper integration and more effective utilization of these diverse and rapidly growing resources. Despite ongoing work moving toward shared data formats and linked identifiers, significant problems persist in establishing shared identity and shared meaning across heterogeneous biomedical data sources. We present five processes for semantic data integration that, when applied collectively, solve seven key problems. These processes include making explicit the differences between biomedical concepts and database records, aggregating sets of identifiers denoting the same biomedical concepts across data sources, and using declaratively represented forward-chaining rules to take information that is variably represented in source databases and integrate it into a consistent biomedical representation. We demonstrate these processes and solutions by presenting KaBOB (the Knowledge Base Of Biomedicine), a knowledge base of semantically integrated data from 18 prominent biomedical databases using common representations grounded in Open Biomedical Ontologies. An instance of KaBOB with data about humans and seven major model organisms can be built using on the order of 500 million RDF triples. All source code for building KaBOB is available under an open-source license. KaBOB is an integrated knowledge base of biomedical data representationally based in prominent, actively maintained Open Biomedical Ontologies, thus enabling queries of the underlying data in terms of biomedical concepts (e.g., genes and gene products, interactions and processes) rather than features of source-specific data schemas or file formats. KaBOB resolves many of the issues that routinely plague biomedical researchers intending to work with data from multiple data sources and provides a platform for ongoing data integration and development and for formal reasoning over a wealth of integrated biomedical data.
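
    The identifier-aggregation step described above (grouping identifiers from different databases that denote one biomedical concept) can be sketched with a small union-find; the identifier pairs below are invented examples, not KaBOB data.

```python
# Sketch: union identifiers connected by cross-references into one concept
# group. The cross-reference pairs here are illustrative only.

from collections import defaultdict

same_as = [("UniProt:P04637", "HGNC:11998"), ("HGNC:11998", "NCBIGene:7157")]

def concept_groups(pairs):
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x
    for a, b in pairs:
        parent[find(a)] = find(b)           # union the two groups
    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    return [sorted(g) for g in groups.values()]

print(concept_groups(same_as))
```

    KaBOB performs this kind of aggregation declaratively over RDF rather than imperatively, but the effect, one concept node per set of cross-referenced identifiers, is the same.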

  9. Biotea: semantics for Pubmed Central.

    PubMed

    Garcia, Alexander; Lopez, Federico; Garcia, Leyla; Giraldo, Olga; Bucheli, Victor; Dumontier, Michel

    2018-01-01

    A significant portion of biomedical literature is represented in a manner that makes it difficult for consumers to find or aggregate content through a computational query. One approach to facilitate reuse of the scientific literature is to structure this information as linked data using standardized web technologies. In this paper we present the second version of Biotea, a semantic, linked data version of the open-access subset of PubMed Central that has been enhanced with specialized annotation pipelines that use existing infrastructure from the National Center for Biomedical Ontology. We expose our models, services, software and datasets. Our infrastructure enables manual and semi-automatic annotation; the resulting data are represented as RDF-based linked data and can be readily queried using the SPARQL query language. We illustrate the utility of our system with several use cases. Our datasets, methods and techniques are available at http://biotea.github.io.

  10. Essie: A Concept-based Search Engine for Structured Biomedical Text

    PubMed Central

    Ide, Nicholas C.; Loane, Russell F.; Demner-Fushman, Dina

    2007-01-01

    This article describes the algorithms implemented in the Essie search engine that is currently serving several Web sites at the National Library of Medicine. Essie is a phrase-based search engine with term and concept query expansion and probabilistic relevancy ranking. Essie’s design is motivated by the observation that query terms are often conceptually related to terms in a document, without actually occurring in the document text. Essie’s performance was evaluated using data and standard evaluation methods from the 2003 and 2006 Text REtrieval Conference (TREC) Genomics track. Essie was the best-performing search engine in the 2003 TREC Genomics track and achieved results comparable to those of the highest-ranking systems on the 2006 TREC Genomics track task. Essie shows that a judicious combination of exploiting document structure, phrase searching, and concept-based query expansion is a useful approach for information retrieval in the biomedical domain. PMID:17329729

  11. Recommending images of user interests from the biomedical literature

    NASA Astrophysics Data System (ADS)

    Clukey, Steven; Xu, Songhua

    2013-03-01

    Every year hundreds of thousands of biomedical images are published in journals and conferences. Consequently, finding images relevant to one's interests becomes an ever more daunting task. This vast amount of literature creates a need for intelligent and easy-to-use tools that can help researchers effectively navigate through the content corpus and conveniently locate materials of interest. Traditionally, literature search tools allow users to query content using topic keywords. However, manual query composition is often time- and energy-consuming. A better system would be one that can automatically deliver relevant content to a researcher without requiring the end user to manually express their search intent and interests via search queries. Such computer-aided assistance for information access can be provided by a system that first determines a researcher's interests automatically and then recommends images relevant to those interests accordingly. The technology can greatly improve a researcher's ability to stay up to date in their fields of study by allowing them to efficiently browse images and documents matching their needs and interests among the vast amount of the biomedical literature. A prototype system implementation of the technology can be accessed via http://www.smartdataware.com.
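
    The matching idea can be sketched as scoring candidate images by the overlap between a researcher's (inferred) interest terms and each image's caption terms; the profile, image IDs and captions below are invented for illustration.

```python
# Sketch: rank images by overlap between an interest profile and caption
# terms. All terms and image IDs here are illustrative placeholders.

def score(interests, caption_terms):
    return len(interests & caption_terms) / len(interests) if interests else 0.0

interests = {"mri", "brain", "tumor"}
images = {
    "fig1": {"mri", "brain", "scan"},
    "fig2": {"gel", "electrophoresis"},
}
recommended = sorted(images, key=lambda i: score(interests, images[i]), reverse=True)
print(recommended)
```

    A deployed recommender would infer the interest profile automatically (e.g. from the researcher's publications) rather than take it as a hand-written set.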

  12. Biomedical Requirements for High Productivity Computing Systems

    DTIC Science & Technology

    2005-04-01

    server at http://www.ncbi.nlm.nih.gov/BLAST/. There are many variants of BLAST, including: 1. BLASTN - Compares a DNA query to a DNA database. Searches ...database (3 reading frames from each strand of the DNA) searching. 4. TBLASTN - Compares a protein query to a DNA database, in the 6 possible...the molecule during this phase. After eliminating molecules that could not match the query, an atom-by-atom search for the molecules is conducted

  13. TopFed: TCGA tailored federated query processing and linking to LOD.

    PubMed

    Saleem, Muhammad; Padmanabhuni, Shanmukha S; Ngomo, Axel-Cyrille Ngonga; Iqbal, Aftab; Almeida, Jonas S; Decker, Stefan; Deus, Helena F

    2014-01-01

    The Cancer Genome Atlas (TCGA) is a multidisciplinary, multi-institutional effort to catalogue genetic mutations responsible for cancer using genome analysis techniques. One of the aims of this project is to create a comprehensive and open repository of cancer-related molecular analyses, to be exploited by bioinformaticians towards advancing cancer knowledge. However, devising bioinformatics applications to analyse such a large dataset is still challenging, as it often requires downloading large archives and parsing the relevant text files, which makes it difficult to enable the virtual data integration needed to collect the critical co-variates necessary for analysis. We address these issues by transforming the TCGA data into the Semantic Web standard Resource Description Framework (RDF), linking it to relevant datasets in the Linked Open Data (LOD) cloud, and further proposing an efficient data distribution strategy to host the resulting 20.4 billion triples via several SPARQL endpoints. With the TCGA data distributed across multiple SPARQL endpoints, we enable biomedical scientists to query and retrieve information from these endpoints through TopFed, a TCGA-tailored federated SPARQL query processing engine. We compare TopFed with a well-established federation engine, FedX, in terms of source selection and query execution time using 10 different federated SPARQL queries with varying requirements. Our evaluation results show that TopFed selects on average less than half of the sources (with 100% recall), with query execution time one third that of FedX. With TopFed, we aim to offer biomedical scientists a single point of access through which distributed TCGA data can be accessed in unison. We believe the proposed system can greatly help researchers in the biomedical domain carry out their research effectively with TCGA, as the amount and diversity of data exceed the ability of local resources to handle their retrieval and parsing.
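
    The source-selection step a federated engine performs can be sketched as below: query only the endpoints whose pre-computed catalogue contains at least one predicate used by the query. The endpoint names, predicates and catalogue are invented for illustration, not TopFed's actual index.

```python
# Sketch: pick the SPARQL endpoints relevant to a query based on a
# predicate catalogue. Endpoints/predicates are illustrative placeholders.

catalogue = {
    "endpoint_A": {"tcga:barcode", "tcga:mutation"},
    "endpoint_B": {"tcga:barcode", "tcga:expression"},
    "endpoint_C": {"tcga:clinical"},
}

def select_sources(query_predicates, catalogue):
    """Return endpoints whose catalogue overlaps the query's predicates."""
    return sorted(ep for ep, preds in catalogue.items()
                  if query_predicates & preds)

print(select_sources({"tcga:mutation"}, catalogue))
```

    Selecting fewer sources while keeping 100% recall, as TopFed reportedly does, directly reduces the number of sub-queries sent over the network.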

  14. MOLGENIS/connect: a system for semi-automatic integration of heterogeneous phenotype data with applications in biobanks.

    PubMed

    Pang, Chao; van Enckevort, David; de Haan, Mark; Kelpin, Fleur; Jetten, Jonathan; Hendriksen, Dennis; de Boer, Tommy; Charbon, Bart; Winder, Erwin; van der Velde, K Joeri; Doiron, Dany; Fortier, Isabel; Hillege, Hans; Swertz, Morris A

    2016-07-15

    While the size and number of biobanks, patient registries and other data collections are increasing, biomedical researchers still often need to pool data for statistical power, a task that requires time-intensive retrospective integration. To address this challenge, we developed MOLGENIS/connect, a semi-automatic system to find, match and pool data from different sources. The system shortlists relevant source attributes from thousands of candidates using ontology-based query expansion to overcome variations in terminology. Then it generates algorithms that transform source attributes to a common target DataSchema. These include unit conversion, categorical value matching and complex conversion patterns (e.g. calculation of BMI). In comparison to human experts, MOLGENIS/connect was able to auto-generate 27% of the algorithms perfectly, with an additional 46% needing only minor editing, representing a reduction in the human effort and expertise needed to pool data. Source code, binaries and documentation are available as open-source under LGPLv3 from http://github.com/molgenis/molgenis and www.molgenis.org/connect. Contact: m.a.swertz@rug.nl. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
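
    The kind of transformation algorithm the system generates (unit conversion plus a derived attribute such as BMI) can be sketched as follows; the field names are invented placeholders, not MOLGENIS/connect's actual DataSchema.

```python
# Sketch: transform a source record to a target schema with a unit
# conversion (cm -> m) and a derived attribute (BMI). Field names are
# illustrative only.

def to_target(record):
    height_m = record["height_cm"] / 100.0          # unit conversion
    weight_kg = record["weight_kg"]
    return {
        "height_m": height_m,
        "bmi": round(weight_kg / height_m ** 2, 1)  # derived attribute
    }

print(to_target({"height_cm": 180, "weight_kg": 81}))
```

    In MOLGENIS/connect such mappings are auto-generated and then reviewed, which is where the reported "27% perfect / 46% minor editing" figures apply.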

  15. MOLGENIS/connect: a system for semi-automatic integration of heterogeneous phenotype data with applications in biobanks

    PubMed Central

    Pang, Chao; van Enckevort, David; de Haan, Mark; Kelpin, Fleur; Jetten, Jonathan; Hendriksen, Dennis; de Boer, Tommy; Charbon, Bart; Winder, Erwin; van der Velde, K. Joeri; Doiron, Dany; Fortier, Isabel; Hillege, Hans

    2016-01-01

    Motivation: While the size and number of biobanks, patient registries and other data collections are increasing, biomedical researchers still often need to pool data for statistical power, a task that requires time-intensive retrospective integration. Results: To address this challenge, we developed MOLGENIS/connect, a semi-automatic system to find, match and pool data from different sources. The system shortlists relevant source attributes from thousands of candidates using ontology-based query expansion to overcome variations in terminology. Then it generates algorithms that transform source attributes to a common target DataSchema. These include unit conversion, categorical value matching and complex conversion patterns (e.g. calculation of BMI). In comparison to human experts, MOLGENIS/connect was able to auto-generate 27% of the algorithms perfectly, with an additional 46% needing only minor editing, representing a reduction in the human effort and expertise needed to pool data. Availability and Implementation: Source code, binaries and documentation are available as open-source under LGPLv3 from http://github.com/molgenis/molgenis and www.molgenis.org/connect. Contact: m.a.swertz@rug.nl Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153686

  16. Knowledge and Theme Discovery across Very Large Biological Data Sets Using Distributed Queries: A Prototype Combining Unstructured and Structured Data

    PubMed Central

    Repetski, Stephen; Venkataraman, Girish; Che, Anney; Luke, Brian T.; Girard, F. Pascal; Stephens, Robert M.

    2013-01-01

    As the discipline of biomedical science continues to apply new technologies capable of producing unprecedented volumes of noisy and complex biological data, it has become evident that available methods for deriving meaningful information from such data are simply not keeping pace. In order to achieve useful results, researchers require methods that consolidate, store and query combinations of structured and unstructured data sets efficiently and effectively. As we move towards personalized medicine, the need to combine unstructured data, such as medical literature, with large amounts of highly structured and high-throughput data such as human variation or expression data from very large cohorts, is especially urgent. For our study, we investigated a likely biomedical query using the Hadoop framework. We ran queries using native MapReduce tools we developed as well as other open source and proprietary tools. Our results suggest that the available technologies within the Big Data domain can reduce the time and effort needed to utilize and apply distributed queries over large datasets in practical clinical applications in the life sciences domain. The methodologies and technologies discussed in this paper set the stage for a more detailed evaluation that investigates how various data structures and data models are best mapped to the proper computational framework. PMID:24312478
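
    The distributed-query pattern used in such a Hadoop prototype can be sketched in pure Python: a map step emits (term, 1) pairs from text fragments and a reduce step sums per term. The documents below are invented for illustration, and a real deployment would shard the map work across cluster nodes.

```python
# Pure-Python sketch of the MapReduce pattern behind such distributed
# queries. Documents are illustrative placeholders.

from itertools import chain

docs = ["BRCA1 variant pathogenic", "BRCA1 variant benign"]

def map_step(doc):
    """Emit (term, 1) pairs for one document fragment."""
    return [(term.lower(), 1) for term in doc.split()]

def reduce_step(pairs):
    """Sum counts per term across all mapped fragments."""
    counts = {}
    for term, n in pairs:
        counts[term] = counts.get(term, 0) + n
    return counts

print(reduce_step(chain.from_iterable(map_step(d) for d in docs)))
```

    The same map/reduce decomposition is what lets Hadoop scale this query over data too large for a single machine.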

  17. Knowledge and theme discovery across very large biological data sets using distributed queries: a prototype combining unstructured and structured data.

    PubMed

    Mudunuri, Uma S; Khouja, Mohamad; Repetski, Stephen; Venkataraman, Girish; Che, Anney; Luke, Brian T; Girard, F Pascal; Stephens, Robert M

    2013-01-01

    As the discipline of biomedical science continues to apply new technologies capable of producing unprecedented volumes of noisy and complex biological data, it has become evident that available methods for deriving meaningful information from such data are simply not keeping pace. In order to achieve useful results, researchers require methods that consolidate, store and query combinations of structured and unstructured data sets efficiently and effectively. As we move towards personalized medicine, the need to combine unstructured data, such as medical literature, with large amounts of highly structured and high-throughput data such as human variation or expression data from very large cohorts, is especially urgent. For our study, we investigated a likely biomedical query using the Hadoop framework. We ran queries using native MapReduce tools we developed as well as other open source and proprietary tools. Our results suggest that the available technologies within the Big Data domain can reduce the time and effort needed to utilize and apply distributed queries over large datasets in practical clinical applications in the life sciences domain. The methodologies and technologies discussed in this paper set the stage for a more detailed evaluation that investigates how various data structures and data models are best mapped to the proper computational framework.

  18. Knowledge Representation and Management, It's Time to Integrate!

    PubMed

    Dhombres, F; Charlet, J

    2017-08-01

    Objectives: To select, present, and summarize the best papers published in 2016 in the field of Knowledge Representation and Management (KRM). Methods: A comprehensive and standardized review of the medical informatics literature was performed based on a PubMed query. Results: Among the 1,421 retrieved papers, the review process resulted in the selection of four best papers focused on the integration of heterogeneous data via the development and the alignment of terminological resources. In the first article, the authors provide a curated and standardized version of the publicly available US FDA Adverse Event Reporting System. Such a resource will improve the quality of the underlying data and enable standardized analyses using common vocabularies. The second article describes a project developed in order to facilitate heterogeneous data integration in the i2b2 framework. Its originality is to allow users to integrate data described in different terminologies and to build a new repository with a unique model able to support the representation of the various data. The third paper is dedicated to modeling the association between multiple phenotypic traits described within the Human Phenotype Ontology (HPO) and the corresponding genotype in the specific context of rare diseases (rare variants). Finally, the fourth paper presents solutions to annotation-ontology mapping in genome-scale data. Of particular interest in this work is the Experimental Factor Ontology (EFO) and its generic association model, the Ontology of Biomedical AssociatioN (OBAN). Conclusion: Ontologies have started to show their efficiency to integrate medical data for various tasks in medical informatics: electronic health records data management, clinical research, and knowledge-based systems development. Georg Thieme Verlag KG Stuttgart.

  19. Multi-terminology indexing for the assignment of MeSH descriptors to medical abstracts in French.

    PubMed

    Pereira, Suzanne; Sakji, Saoussen; Névéol, Aurélie; Kergourlay, Ivan; Kerdelhué, Gaétan; Serrot, Elisabeth; Joubert, Michel; Darmoni, Stéfan J

    2009-11-14

    To facilitate information retrieval in the biomedical domain, a system for the automatic assignment of Medical Subject Headings to documents curated by an online quality-controlled health gateway was implemented. The French Multi-Terminology Indexer (F-MTI) implements a multiterminology approach using nine main medical terminologies in French and the mappings between them. This paper presents recent efforts to assess the added value of (a) integrating four new terminologies (Orphanet, ATC, drug names, MeSH supplementary concepts) into F-MTI's knowledge sources and (b) performing the automatic indexing on the titles and abstracts (vs. title only) of the online health resources. F-MTI was evaluated on a CISMeF corpus comprising 18,161 manually indexed resources. The performance of F-MTI including nine health terminologies on CISMeF resources with Title only was 27.9% precision and 19.7% recall, while the performance on CISMeF resources with Title and Abstract is 14.9% precision (-13.0%) and 25.9% recall (+6.2%). In a few weeks, CISMeF will launch the indexing of resources based on title and abstract, using nine terminologies.

  20. Partial automation of database processing of simulation outputs from L-systems models of plant morphogenesis.

    PubMed

    Chen, Yi-Ping Phoebe; Hanan, Jim

    2002-01-01

    Models of plant architecture allow us to explore how genotype-environment interactions affect the development of plant phenotypes. Such models generate masses of data organised in complex hierarchies. This paper presents a generic system for creating and automatically populating a relational database from data generated by the widely used L-system approach to modelling plant morphogenesis. Techniques from compiler technology are applied to generate attributes (new fields) in the database and to simplify query development for the recursively structured branching relationship. The use of biological terminology in an interactive query builder contributes towards making the system biologist-friendly.
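
    The L-system modelling approach whose outputs the paper processes can be sketched as repeated string rewriting; the rules below are the classic textbook "algae" example, not the paper's plant model.

```python
# Minimal L-system sketch: rewrite a string repeatedly using production
# rules. The rules are a standard textbook example, for illustration only.

def derive(axiom, rules, steps):
    s = axiom
    for _ in range(steps):
        s = "".join(rules.get(ch, ch) for ch in s)  # apply all rules in parallel
    return s

rules = {"A": "AB", "B": "A"}
print(derive("A", rules, 4))
```

    The hierarchical, recursively branching strings such derivations produce are exactly the kind of output the paper's database-population system has to flatten into relational form.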

  1. Environmental/chemical thesaurus

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shriner, C.R.; Dailey, N.S.; Jordan, A.C.

    The Environmental/Chemical Thesaurus approaches scientific language control problems from a multidisciplinary view. The Environmental/Biomedical Terminology Index (EBTI) was used as a base for the present thesaurus. The Environmental/Chemical Thesaurus, funded by the Environmental Protection Agency, used as its source of new terms those major terms found in 13 Environmental Protection Agency data bases. The scope of this thesaurus includes not only environmental and biomedical sciences, but also the physical sciences with emphasis placed on chemistry. Specific chemical compounds are not included; only classes of chemicals are given. To adhere to this level of classification, drugs and pesticides are identified by class rather than by specific chemical name. An attempt was also made to expand the areas of sociology and economics. Terminology dealing with law, demography, and geography was expanded. Proper names of languages and races were excluded. Geographic terms were expanded to include proper names for oceans, continents, major lakes, rivers, and islands. Political divisions were added to allow for proper names of countries and states. With such a broad scope, terminology for specific sciences does not provide for indexing to the lowest levels in plant, animal, or chemical classifications.

  2. KISTI at TREC 2014 Clinical Decision Support Track: Concept-based Document Re-ranking to Biomedical Information Retrieval

    DTIC Science & Technology

    2014-11-01

    semantic type: Injury or Poisoning (inpo, T037); Anatomical Abnormality (anab, T190). Given a document D, a concept vector = {1, 2, …, ...integrating biomedical terminology. Nucleic Acids Research 32, Database issue (2004), 267-270. 5. Chapman, W.W., Hillert, D., Velupillai, S., et...Conference (TREC), (2011). 9. Koopman, B. and Zuccon, G. Understanding negation and family history to improve clinical information retrieval. Proceedings

  3. ReVeaLD: a user-driven domain-specific interactive search platform for biomedical research.

    PubMed

    Kamdar, Maulik R; Zeginis, Dimitris; Hasnain, Ali; Decker, Stefan; Deus, Helena F

    2014-02-01

    Bioinformatics research relies heavily on the ability to discover and correlate data from various sources. The specialization of life sciences over the past decade, coupled with an increasing number of biomedical datasets available through standardized interfaces, has created opportunities towards new methods in biomedical discovery. Despite the popularity of semantic web technologies in tackling the integrative bioinformatics challenge, there are many obstacles towards their usage by non-technical research audiences. In particular, fully exploiting integrated information requires improved interactive methods that are intuitive to biomedical experts. In this report we present ReVeaLD (a Real-time Visual Explorer and Aggregator of Linked Data), a user-centered visual analytics platform devised to increase intuitive interaction with data from distributed sources. ReVeaLD facilitates query formulation using a domain-specific language (DSL) identified by biomedical experts and mapped to a self-updated catalogue of elements from external sources. ReVeaLD was implemented in a cancer research setting; queries included retrieving data from in silico experiments, protein modeling and gene expression. ReVeaLD was developed using Scalable Vector Graphics and JavaScript, and a demo with explanatory video is available at http://www.srvgal78.deri.ie:8080/explorer. A set of user-defined graphic rules controls the display of information through media-rich user interfaces. Evaluation of ReVeaLD was carried out as a game: biomedical researchers were asked to assemble a set of 5 challenge questions, and time and interactions with the platform were recorded. Preliminary results indicate that complex queries could be formulated in less than two minutes by unskilled researchers. The results also indicate that supporting the identification of the elements of a DSL significantly increased the intuitiveness of the platform and the usability of semantic web technologies by domain users. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.

  4. Master's level education in biomedical optics: four-year experience at the University of Latvia

    NASA Astrophysics Data System (ADS)

    Spigulis, Janis

    2000-06-01

    A pilot program for Master's studies in Biomedical Optics was developed and launched at the University of Latvia in 1995. The curriculum contains several basic subjects, including Fundamentals of Biomedical Optics, Medical Lightguides, Anatomy and Physiology, Lasers and Non-coherent Light Sources, Optical Instrumentation for Healthcare, Optical Methods for Patient Treatment, and Basic Physics. Special English Terminology and Laboratory-Clinical Praxis are also involved, and the Master's thesis is the final step toward the degree award. Based on the four-year teaching experience, some observations, conclusions and possible future activities are discussed.

  5. BOSS: context-enhanced search for biomedical objects

    PubMed Central

    2012-01-01

    Background There exist many academic search solutions, and most of them fall at one of two ends of a spectrum: general-purpose search and domain-specific "deep" search systems. General-purpose search systems, such as PubMed, offer a flexible query interface but return a list of matching documents that users must sift through to find the answers to their queries. On the other hand, "deep" search systems, such as PPI Finder and iHOP, return precompiled results in a structured way. Their results, however, are often found only within some predefined contexts. To alleviate these problems, we introduce a new search engine, BOSS, the Biomedical Object Search System. Methods Unlike conventional search systems, BOSS indexes segments rather than documents. A segment refers to a Maximal Coherent Semantic Unit (MCSU) such as a phrase, clause or sentence that is semantically coherent in the given context (e.g., biomedical objects or their relations). For a user query, BOSS finds all matching segments, identifies the objects appearing in those segments, and aggregates the segments for each object. Finally, it returns a ranked list of the objects along with their matching segments. Results The working prototype of BOSS is available at http://boss.korea.ac.kr. The current version of BOSS has indexed the abstracts of more than 20 million articles published over the 16 years from 1996 to 2011 across all science disciplines. Conclusion BOSS fills the gap between the two ends of the spectrum by allowing users to pose context-free queries and by returning a structured set of results. Furthermore, BOSS exhibits good scalability, just as conventional document search engines do, because it is designed to use a standard document-indexing model with minimal modifications. With these features, BOSS raises the technological level of traditional solutions for searching biomedical information. PMID:22595092
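    The segment-indexing idea described in this record, indexing semantically coherent units rather than whole documents and then aggregating matching segments per biomedical object, can be sketched as follows. This is a minimal illustration, not BOSS's actual implementation: the example sentences, the gene-symbol object recognizer, and the word-overlap matching are all invented stand-ins for real MCSU detection and object identification.

    ```python
    from collections import defaultdict

    # Toy corpus: each "segment" is a sentence (a stand-in for an MCSU).
    segments = [
        "TP53 mutations are frequent in ovarian cancer",
        "BRCA1 interacts with TP53 in DNA repair",
        "BRCA1 variants increase breast cancer risk",
    ]

    # Hypothetical object recognizer: here, just a set of known gene symbols.
    KNOWN_OBJECTS = {"TP53", "BRCA1"}

    def index_segments(segments):
        """Build an inverted index from biomedical object -> segments mentioning it."""
        index = defaultdict(list)
        for seg in segments:
            for token in seg.split():
                if token in KNOWN_OBJECTS:
                    index[token].append(seg)
        return index

    def search(index, query):
        """Return objects ranked by the number of query-matching segments,
        each object paired with its supporting segments."""
        terms = set(query.lower().split())
        scored = {}
        for obj, segs in index.items():
            hits = [s for s in segs if terms & set(s.lower().split())]
            if hits:
                scored[obj] = hits
        return sorted(scored.items(), key=lambda kv: -len(kv[1]))

    index = index_segments(segments)
    for obj, hits in search(index, "cancer"):
        print(obj, len(hits))
    ```

    The key design point mirrored here is that the unit of retrieval is the segment, but the unit of presentation is the object with its aggregated evidence.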

  6. Framing Electronic Medical Records as Polylingual Documents in Query Expansion

    PubMed Central

    Huang, Edward W; Wang, Sheng; Lee, Doris Jung-Lin; Zhang, Runshun; Liu, Baoyan; Zhou, Xuezhong; Zhai, ChengXiang

    2017-01-01

    We present a study of electronic medical record (EMR) retrieval that emulates situations in which a doctor treats a new patient. Given a query consisting of a new patient’s symptoms, the retrieval system returns the set of most relevant records of previously treated patients. However, due to semantic, functional, and treatment synonyms in medical terminology, queries are often incomplete and thus require enhancement. In this paper, we present a topic model that frames symptoms and treatments as separate languages. Our experimental results show that this method improves retrieval performance over several baselines with statistical significance. These baselines include methods used in prior studies as well as state-of-the-art embedding techniques. Finally, we show that our proposed topic model discovers all three types of synonyms to improve medical record retrieval. PMID:29854161

  7. Multi-terminology indexing for the assignment of MeSH descriptors to medical abstracts in French

    PubMed Central

    Pereira, Suzanne; Sakji, Saoussen; Névéol, Aurélie; Kergourlay, Ivan; Kerdelhué, Gaétan; Serrot, Elisabeth; Joubert, Michel; Darmoni, Stéfan J.

    2009-01-01

    Background: To facilitate information retrieval in the biomedical domain, a system for the automatic assignment of Medical Subject Headings to documents curated by an online quality-controlled health gateway was implemented. The French Multi-Terminology Indexer (F-MTI) implements a multi-terminology approach using nine main medical terminologies in French and the mappings between them. Objective: This paper presents recent efforts to assess the added value of (a) integrating four new terminologies (Orphanet, ATC, drug names, MeSH supplementary concepts) into F-MTI’s knowledge sources and (b) performing the automatic indexing on the titles and abstracts (vs. title only) of the online health resources. Methods: F-MTI was evaluated on a CISMeF corpus comprising 18,161 manually indexed resources. Results: The performance of F-MTI including nine health terminologies on CISMeF resources with Title only was 27.9% precision and 19.7% recall, while the performance on CISMeF resources with Title and Abstract was 14.9% precision (−13.0%) and 25.9% recall (+6.2%). Conclusion: In a few weeks, CISMeF will launch the indexing of resources based on title and abstract, using nine terminologies. PMID:20351910

  8. Query Log Analysis of an Electronic Health Record Search Engine

    PubMed Central

    Yang, Lei; Mei, Qiaozhu; Zheng, Kai; Hanauer, David A.

    2011-01-01

    We analyzed a longitudinal collection of query logs of a full-text search engine designed to facilitate information retrieval in electronic health records (EHR). The collection, 202,905 queries and 35,928 user sessions recorded over a course of 4 years, represents the information-seeking behavior of 533 medical professionals, including frontline practitioners, coding personnel, patient safety officers, and biomedical researchers, with respect to patient data stored in EHR systems. In this paper, we present descriptive statistics of the queries, a categorization of the information needs manifested through the queries, and temporal patterns of the users' information-seeking behavior. The results suggest that information needs in the medical domain are substantially more sophisticated than those that general-purpose web search engines need to accommodate. We therefore envision a significant challenge, along with significant opportunities, in providing intelligent query recommendations to facilitate information retrieval in EHRs. PMID:22195150

  9. Optimizing a Query by Transformation and Expansion.

    PubMed

    Glocker, Katrin; Knurr, Alexander; Dieter, Julia; Dominick, Friederike; Forche, Melanie; Koch, Christian; Pascoe Pérez, Analie; Roth, Benjamin; Ückert, Frank

    2017-01-01

    In the biomedical sector, not only is the amount of information produced and uploaded to the web enormous, but so is the number of sources where these data can be found. Clinicians and researchers spend huge amounts of time trying to access this information and to filter out the most important answers to a given question. As the formulation of these queries is crucial, automated query expansion is an effective tool to optimize a query and receive the best possible results. In this paper we introduce the concept of a workflow for optimizing queries in the medical and biological sector using a series of tools for expansion and transformation of the query. After the definition of attributes by the user, the query string is compared to previous queries in order to add semantically co-occurring terms to the query. Additionally, the query is enlarged by including synonyms. Translation into database-specific ontologies ensures the optimal query formulation for the chosen database(s). As this process can be performed on various databases at once, the results are ranked and normalized in order to achieve a comparable list of answers for a question.
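    The workflow this record describes, enriching a query with co-occurring terms from previous queries, adding synonyms, and translating the result into a database-specific vocabulary, might be sketched as below. The co-occurrence table, synonym table, and MeSH-style identifier are invented for illustration; a real system would derive them from query logs and terminology services.

    ```python
    # Hypothetical knowledge sources (illustrative values only).
    CO_OCCURRING = {"diabetes": ["insulin"], "insulin": ["glucose"]}
    SYNONYMS = {"diabetes": ["diabetes mellitus"]}
    DB_VOCABULARY = {"diabetes": "D003920"}  # an invented MeSH-style identifier

    def expand_query(terms):
        """Enrich the raw query with co-occurring terms and synonyms."""
        expanded = list(terms)
        for t in terms:
            expanded += CO_OCCURRING.get(t, [])
            expanded += SYNONYMS.get(t, [])
        return expanded

    def translate(terms):
        """Map each term onto the target database's vocabulary where a
        mapping exists; keep the free-text term otherwise."""
        return [DB_VOCABULARY.get(t, t) for t in terms]

    print(translate(expand_query(["diabetes"])))
    ```

    The same expanded query can be translated once per target database, after which the per-database result lists are ranked and normalized for comparison, as the abstract describes.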

  10. Integrating systems biology models and biomedical ontologies

    PubMed Central

    2011-01-01

    Background Systems biology is an approach to biology that emphasizes the structure and dynamic behavior of biological systems and the interactions that occur within them. To succeed, systems biology crucially depends on the accessibility and integration of data across domains and levels of granularity. Biomedical ontologies were developed to facilitate such an integration of data and are often used to annotate biosimulation models in systems biology. Results We provide a framework to integrate representations of in silico systems biology with those of in vivo biology as described by biomedical ontologies and demonstrate this framework using the Systems Biology Markup Language. We developed the SBML Harvester software that automatically converts annotated SBML models into OWL and we apply our software to those biosimulation models that are contained in the BioModels Database. We utilize the resulting knowledge base for complex biological queries that can bridge levels of granularity, verify models based on the biological phenomenon they represent and provide a means to establish a basic qualitative layer on which to express the semantics of biosimulation models. Conclusions We establish an information flow between biomedical ontologies and biosimulation models and we demonstrate that the integration of annotated biosimulation models and biomedical ontologies enables the verification of models as well as expressive queries. Establishing a bi-directional information flow between systems biology and biomedical ontologies has the potential to enable large-scale analyses of biological systems that span levels of granularity from molecules to organisms. PMID:21835028

  11. A study on PubMed search tag usage pattern: association rule mining of a full-day PubMed query log.

    PubMed

    Mosa, Abu Saleh Mohammad; Yoo, Illhoi

    2013-01-09

    The practice of evidence-based medicine requires efficient biomedical literature search tools such as PubMed/MEDLINE. Retrieval performance relies heavily on the efficient use of search field tags. The purpose of this study was to analyze PubMed log data in order to understand end users' usage patterns of search tags in PubMed/MEDLINE search. A PubMed query log file containing anonymous user identification, timestamp, and query text was obtained from the National Library of Medicine. Inconsistent records were removed from the dataset and the search tags were extracted from the query texts. A total of 2,917,159 queries issued by 613,061 users were selected for this study. The analysis of frequent co-occurrences and usage patterns of the search tags was conducted using an association mining algorithm. The percentage of search tag usage was low (11.38% of the total queries) and only 2.95% of queries contained two or more tags. Three out of four users used no search tag, and about two-thirds of them issued fewer than four queries. Among the queries containing at least one tagged search term, the average number of search tags was almost half the number of total search terms. Navigational search tags are more frequently used than informational search tags. While no strong association was observed between informational and navigational tags, six (out of 19) informational tags and six (out of 29) navigational tags showed strong associations in PubMed searches. The low percentage of search tag usage implies that PubMed/MEDLINE users do not utilize the features of PubMed/MEDLINE widely, are not aware of such features, or depend solely on the high-recall-focused query translation performed by PubMed's Automatic Term Mapping. Users need further education and interactive search applications for effective use of the search tags in order to fulfill their biomedical information needs from PubMed/MEDLINE.
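    The core of the analysis, extracting field tags such as [au] or [ti] from query strings and mining their co-occurrences, could be approximated in a few lines. The example queries and the support threshold below are illustrative only, not the study's data or parameters.

    ```python
    import re
    from collections import Counter
    from itertools import combinations

    # Toy query log; the actual log contains millions of entries.
    queries = [
        "smith ja[au] AND cancer[ti]",
        "jones[au] 2012[dp]",
        "doe[au] AND heart[ti]",
    ]

    TAG = re.compile(r"\[([a-z]+)\]")

    def mine_tag_pairs(queries, min_support=2):
        """Count single tags and co-occurring tag pairs across queries,
        keeping pairs seen at least `min_support` times (Apriori-style),
        and compute the confidence of each rule a -> b as P(b | a)."""
        singles, pairs = Counter(), Counter()
        for q in queries:
            tags = sorted(set(TAG.findall(q.lower())))
            singles.update(tags)
            pairs.update(combinations(tags, 2))
        frequent = {p: n for p, n in pairs.items() if n >= min_support}
        confidence = {(a, b): n / singles[a] for (a, b), n in frequent.items()}
        return frequent, confidence

    frequent, confidence = mine_tag_pairs(queries)
    print(frequent)     # here, the pair ('au', 'ti') occurs in 2 of 3 queries
    print(confidence)
    ```

    A production analysis would use a full association-rule miner, but the support/confidence bookkeeping is the same.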

  12. A Study on Pubmed Search Tag Usage Pattern: Association Rule Mining of a Full-day Pubmed Query Log

    PubMed Central

    2013-01-01

    Background The practice of evidence-based medicine requires efficient biomedical literature search tools such as PubMed/MEDLINE. Retrieval performance relies heavily on the efficient use of search field tags. The purpose of this study was to analyze PubMed log data in order to understand end users' usage patterns of search tags in PubMed/MEDLINE search. Methods A PubMed query log file containing anonymous user identification, timestamp, and query text was obtained from the National Library of Medicine. Inconsistent records were removed from the dataset and the search tags were extracted from the query texts. A total of 2,917,159 queries issued by 613,061 users were selected for this study. The analysis of frequent co-occurrences and usage patterns of the search tags was conducted using an association mining algorithm. Results The percentage of search tag usage was low (11.38% of the total queries) and only 2.95% of queries contained two or more tags. Three out of four users used no search tag, and about two-thirds of them issued fewer than four queries. Among the queries containing at least one tagged search term, the average number of search tags was almost half the number of total search terms. Navigational search tags are more frequently used than informational search tags. While no strong association was observed between informational and navigational tags, six (out of 19) informational tags and six (out of 29) navigational tags showed strong associations in PubMed searches. Conclusions The low percentage of search tag usage implies that PubMed/MEDLINE users do not utilize the features of PubMed/MEDLINE widely, are not aware of such features, or depend solely on the high-recall-focused query translation performed by PubMed's Automatic Term Mapping. Users need further education and interactive search applications for effective use of the search tags in order to fulfill their biomedical information needs from PubMed/MEDLINE. PMID:23302604

  13. Using Google Blogs and Discussions to Recommend Biomedical Resources: A Case Study

    PubMed Central

    Reed, Robyn B.; Chattopadhyay, Ansuman; Iwema, Carrie L.

    2013-01-01

    This case study investigated whether data gathered from discussions within the social media provide a reliable basis for a biomedical resources recommendation system. Using a search query to mine text from Google Blogs and Discussions, a ranking of biomedical resources was determined based on those most frequently mentioned. To establish quality, these results were compared to rankings by subject experts. An overall agreement between the frequency of social media discussions and subject expert recommendations was observed when identifying key bioinformatics and consumer health resources. Testing the method in more than one biomedical area implies this procedure could be employed across different subjects. PMID:24180648

  14. Integration of relational and textual biomedical sources. A pilot experiment using a semi-automated method for logical schema acquisition.

    PubMed

    García-Remesal, M; Maojo, V; Billhardt, H; Crespo, J

    2010-01-01

    Bringing together structured and text-based sources is an exciting challenge for biomedical informaticians, since most relevant biomedical sources belong to one of these categories. In this paper we evaluate the feasibility of integrating relational and text-based biomedical sources using: i) an original logical schema acquisition method for textual databases developed by the authors, and ii) OntoFusion, a system originally designed by the authors for the integration of relational sources. We conducted an integration experiment involving a test set of seven differently structured sources covering the domain of genetic diseases. We used our logical schema acquisition method to generate schemas for all textual sources. The sources were integrated using the methods and tools provided by OntoFusion. The integration was validated using a test set of 500 queries. A panel of experts answered a questionnaire to evaluate i) the quality of the extracted schemas, ii) the query processing performance of the integrated set of sources, and iii) the relevance of the retrieved results. The results of the survey show that our method extracts coherent and representative logical schemas. Experts' feedback on the performance of the integrated system and the relevance of the retrieved results was also positive. Regarding the validation of the integration, the system successfully provided correct results for all queries in the test set. The results of the experiment suggest that text-based sources including a logical schema can be regarded as equivalent to structured databases. Using our method, previous research and existing tools designed for the integration of structured databases can be reused, possibly with minor modifications, to integrate differently structured sources.

  15. Image acquisition context: procedure description attributes for clinically relevant indexing and selective retrieval of biomedical images.

    PubMed

    Bidgood, W D; Bray, B; Brown, N; Mori, A R; Spackman, K A; Golichowski, A; Jones, R H; Korman, L; Dove, B; Hildebrand, L; Berg, M

    1999-01-01

    To support clinically relevant indexing of biomedical images and image-related information based on the attributes of image acquisition procedures and the judgments (observations) expressed by observers in the process of image interpretation. The authors introduce the notion of "image acquisition context," the set of attributes that describe image acquisition procedures, and present a standards-based strategy for utilizing the attributes of image acquisition context as indexing and retrieval keys for digital image libraries. The authors' indexing strategy is based on an interdependent message/terminology architecture that combines the Digital Imaging and Communication in Medicine (DICOM) standard, the SNOMED (Systematized Nomenclature of Human and Veterinary Medicine) vocabulary, and the SNOMED DICOM microglossary. The SNOMED DICOM microglossary provides context-dependent mapping of terminology to DICOM data elements. The capability of embedding standard coded descriptors in DICOM image headers and image-interpretation reports improves the potential for selective retrieval of image-related information. This favorably affects information management in digital libraries.

  16. Defining datasets and creating data dictionaries for quality improvement and research in chronic disease using routinely collected data: an ontology-driven approach.

    PubMed

    de Lusignan, Simon; Liaw, Siaw-Teng; Michalakidis, Georgios; Jones, Simon

    2011-01-01

    The burden of chronic disease is increasing, and research and quality improvement will be less effective if case finding strategies are suboptimal. To describe an ontology-driven approach to case finding in chronic disease and how this approach can be used to create a data dictionary and make the codes used in case finding transparent. A five-step process: (1) identifying a reference coding system or terminology; (2) using an ontology-driven approach to identify cases; (3) developing metadata that can be used to identify the extracted data; (4) mapping the extracted data to the reference terminology; and (5) creating the data dictionary. Hypertension is presented as an exemplar. A patient with hypertension can be represented by a range of codes, including diagnostic, history and administrative codes. Metadata can link the coding system and data extraction queries to the correct data mapping and translation tool, which then maps each code to the equivalent code in the reference terminology. The extracted code, its term, its domain and subdomain, and the name of the data extraction query can then be automatically grouped and published online as a readily searchable data dictionary; an online exemplar is available at www.clininf.eu/qickd-data-dictionary.html. Adopting an ontology-driven approach to case finding could improve the quality of disease registers and of research based on routine data. It would offer considerable advantages over using limited datasets to define cases. This approach should be considered by those involved in research and quality improvement projects which utilise routine data.
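    The five-step process in this record can be illustrated with a minimal sketch. The hypertension codes, mapping, and field names below are all invented stand-ins; the real QICKD data dictionary is far richer and uses actual clinical coding systems.

    ```python
    # Step 1: a reference terminology (codes are illustrative, not real).
    REFERENCE = {"H001": "Essential hypertension"}

    # Steps 2-3: a named data-extraction query with the local codes it finds.
    extraction_query = {
        "name": "hypertension_case_finding",
        "codes": {"X100": "Hypertension (diagnosis)",
                  "X101": "History of hypertension"},
    }

    # Step 4: mapping from local codes to the reference terminology.
    MAPPING = {"X100": "H001", "X101": "H001"}

    # Step 5: assemble the searchable data dictionary entries.
    def build_dictionary(query, mapping, reference):
        entries = []
        for code, term in query["codes"].items():
            ref = mapping[code]
            entries.append({
                "code": code,
                "term": term,
                "reference_code": ref,
                "reference_term": reference[ref],
                "query": query["name"],
            })
        return entries

    for entry in build_dictionary(extraction_query, MAPPING, REFERENCE):
        print(entry["code"], "->", entry["reference_code"], entry["query"])
    ```

    The point of the structure is transparency: every extracted code carries its source query and its mapping to the reference terminology, so the basis of case finding is visible to anyone reading the dictionary.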

  17. HPC AND GRID COMPUTING FOR INTEGRATIVE BIOMEDICAL RESEARCH

    PubMed Central

    Kurc, Tahsin; Hastings, Shannon; Kumar, Vijay; Langella, Stephen; Sharma, Ashish; Pan, Tony; Oster, Scott; Ervin, David; Permar, Justin; Narayanan, Sivaramakrishnan; Gil, Yolanda; Deelman, Ewa; Hall, Mary; Saltz, Joel

    2010-01-01

    Integrative biomedical research projects query, analyze, and integrate many different data types and make use of datasets obtained from measurements or simulations of structure and function at multiple biological scales. With the increasing availability of high-throughput and high-resolution instruments, integrative biomedical research imposes many challenging requirements on software middleware systems. In this paper, we examine some of these requirements using example research pattern templates. We then discuss how middleware systems that incorporate Grid and high-performance computing could be employed to address the requirements. PMID:20107625

  18. Evaluation of research in biomedical ontologies

    PubMed Central

    Dumontier, Michel; Gkoutos, Georgios V.

    2013-01-01

    Ontologies are now pervasive in biomedicine, where they serve as a means to standardize terminology, to enable access to domain knowledge, to verify data consistency and to facilitate integrative analyses over heterogeneous biomedical data. For this purpose, research on biomedical ontologies applies theories and methods from diverse disciplines such as information management, knowledge representation, cognitive science, linguistics and philosophy. Depending on the desired applications in which ontologies are being applied, the evaluation of research in biomedical ontologies must follow different strategies. Here, we provide a classification of research problems in which ontologies are being applied, focusing on the use of ontologies in basic and translational research, and we demonstrate how research results in biomedical ontologies can be evaluated. The evaluation strategies depend on the desired application and measure the success of using an ontology for a particular biomedical problem. For many applications, the success can be quantified, thereby facilitating the objective evaluation and comparison of research in biomedical ontology. The objective, quantifiable comparison of research results based on scientific applications opens up the possibility for systematically improving the utility of ontologies in biomedical research. PMID:22962340

  19. Entrez Neuron RDFa: a pragmatic semantic web application for data integration in neuroscience research.

    PubMed

    Samwald, Matthias; Lim, Ernest; Masiar, Peter; Marenco, Luis; Chen, Huajun; Morse, Thomas; Mutalik, Pradeep; Shepherd, Gordon; Miller, Perry; Cheung, Kei-Hoi

    2009-01-01

    The amount of biomedical data available in Semantic Web formats has been rapidly growing in recent years. While these formats are machine-friendly, user-friendly web interfaces allowing easy querying of these data are typically lacking. We present "Entrez Neuron", a pilot neuron-centric interface that allows for keyword-based queries against a coherent repository of OWL ontologies. These ontologies describe neuronal structures, physiology, mathematical models and microscopy images. The returned query results are organized hierarchically according to brain architecture. Where possible, the application makes use of entities from the Open Biomedical Ontologies (OBO) and the 'HCLS knowledgebase' developed by the W3C Interest Group for Health Care and Life Science. It makes use of the emerging RDFa standard to embed ontology fragments and semantic annotations within its HTML-based user interface. The application and underlying ontologies demonstrate how Semantic Web technologies can be used for information integration within a curated information repository and between curated information repositories. It also demonstrates how information integration can be accomplished on the client side, through simple copying and pasting of portions of documents that contain RDFa markup.

  20. Combining clinical and genomics queries using i2b2 – Three methods

    PubMed Central

    Murphy, Shawn N.; Avillach, Paul; Bellazzi, Riccardo; Phillips, Lori; Gabetta, Matteo; Eran, Alal; McDuffie, Michael T.; Kohane, Isaac S.

    2017-01-01

    We are fortunate to be living in an era of twin biomedical data surges: a burgeoning representation of human phenotypes in the medical records of our healthcare systems, and high-throughput sequencing making rapid technological advances. The difficulty representing genomic data and its annotations has almost by itself led to the recognition of a biomedical “Big Data” challenge, and the complexity of healthcare data only compounds the problem to the point that coherent representation of both systems on the same platform seems insuperably difficult. We investigated the capability for complex, integrative genomic and clinical queries to be supported in the Informatics for Integrating Biology and the Bedside (i2b2) translational software package. Three different data integration approaches were developed: The first is based on Sequence Ontology, the second is based on the tranSMART engine, and the third on CouchDB. These novel methods for representing and querying complex genomic and clinical data on the i2b2 platform are available today for advancing precision medicine. PMID:28388645

  21. From Lexical Regularities to Axiomatic Patterns for the Quality Assurance of Biomedical Terminologies and Ontologies.

    PubMed

    van Damme, Philip; Quesada-Martínez, Manuel; Cornet, Ronald; Fernández-Breis, Jesualdo Tomás

    2018-06-13

    Ontologies and terminologies have been identified as key resources for the achievement of semantic interoperability in biomedical domains. The development of ontologies is performed as joint work by domain experts and knowledge engineers. The maintenance and auditing of these resources is also the responsibility of such experts, and this is usually a time-consuming, mostly manual task. Manual auditing is impractical and ineffective for most biomedical ontologies, especially larger ones. An example is SNOMED CT, a key resource in many countries for codifying medical information. SNOMED CT contains more than 300,000 concepts, so its auditing requires the support of automatic methods. Many biomedical ontologies contain natural language content for humans and logical axioms for machines. The 'lexically suggest, logically define' principle means that there should be a relation between what is expressed in natural language and what is expressed as logical axioms, and that such a relation should be useful for auditing and quality assurance. Moreover, this principle implies that the natural language content for humans could be used to generate the logical axioms for the machines. In this work, we propose a method that combines lexical analysis and clustering techniques to (1) identify regularities in the natural language content of ontologies; (2) cluster, by similarity, labels exhibiting a regularity; (3) extract relevant information from those clusters; and (4) propose logical axioms for each cluster with the support of axiom templates. These logical axioms can then be evaluated against the existing axioms in the ontology to check their correctness and completeness, which are two fundamental objectives in auditing and quality assurance. 
In this paper, we describe the application of the method to two SNOMED CT modules: a 'congenital' module, obtained using concepts exhibiting the attribute Occurrence - Congenital, and a 'chronic' module, using concepts exhibiting the attribute Clinical course - Chronic. We obtained a precision and recall of 75% and 28%, respectively, for the 'congenital' module, and 64% and 40% for the 'chronic' one. We consider these results promising, so our method can contribute to supporting content editors by providing automatic methods for assuring the quality of biomedical ontologies and terminologies. Copyright © 2018. Published by Elsevier Inc.
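    A crude version of the "cluster labels by regularity, then propose axioms" idea might group concept labels sharing a lexical pattern and attach an axiom template to each cluster. The labels, the regular-expression regularity, and the axiom template below are invented for illustration; the paper's method uses proper lexical analysis and similarity clustering rather than a single hand-written pattern.

    ```python
    import re
    from collections import defaultdict

    # Invented concept labels (not actual SNOMED CT content).
    labels = [
        "congenital anomaly of heart",
        "congenital anomaly of kidney",
        "chronic disease of liver",
    ]

    # Hypothetical regularity: "congenital anomaly of <structure>".
    PATTERN = re.compile(r"^congenital anomaly of (\w+)$")

    def cluster_and_propose(labels):
        """Cluster labels that exhibit the regularity and propose, per label,
        a candidate logical axiom from a fixed template."""
        clusters = defaultdict(list)
        for label in labels:
            m = PATTERN.match(label)
            if m:
                structure = m.group(1)
                axiom = (f"SubClassOf: (occurrence some Congenital) "
                         f"and (finding_site some {structure})")
                clusters["congenital anomaly of X"].append((label, axiom))
        return clusters

    for cluster, members in cluster_and_propose(labels).items():
        print(cluster, len(members))
    ```

    The proposed axioms would then be compared against the ontology's existing axioms to flag missing or inconsistent definitions, which is the auditing step the abstract describes.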

  22. User evaluation of ride technology research

    NASA Technical Reports Server (NTRS)

    Mckenzie, J. R.; Brumaghim, S. H.

    1976-01-01

    The 23 organizations queried represent government, carrier, and manufacturing interests in air, marine, rail, and surface transportation systems. Results indicate a strong need for common terminology and data analysis/reporting techniques. The various types of ride criteria currently in use are discussed, particularly in terms of their respective data base requirements. A plan of action is proposed for fulfilling the ride technology needs identified by this study.

  23. Abstraction networks for terminologies: Supporting management of "big knowledge".

    PubMed

    Halper, Michael; Gu, Huanying; Perl, Yehoshua; Ochs, Christopher

    2015-05-01

    Terminologies and terminological systems have assumed important roles in many medical information processing environments, giving rise to the "big knowledge" challenge when terminological content comprises tens of thousands to millions of concepts arranged in a tangled web of relationships. Use and maintenance of knowledge structures on that scale can be daunting. The notion of abstraction network is presented as a means of facilitating the usability, comprehensibility, visualization, and quality assurance of terminologies. An abstraction network overlays a terminology's underlying network structure at a higher level of abstraction. In particular, it provides a more compact view of the terminology's content, avoiding the display of minutiae. General abstraction network characteristics are discussed. Moreover, the notion of meta-abstraction network, existing at an even higher level of abstraction than a typical abstraction network, is described for cases where even the abstraction network itself represents a case of "big knowledge." Various features in the design of abstraction networks are demonstrated in a methodological survey of some existing abstraction networks previously developed and deployed for a variety of terminologies. The applicability of the general abstraction-network framework is shown through use-cases of various terminologies, including the Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT), the Medical Entities Dictionary (MED), and the Unified Medical Language System (UMLS). Important characteristics of the surveyed abstraction networks are provided, e.g., the magnitude of the respective size reduction referred to as the abstraction ratio. Specific benefits of these alternative terminology-network views, particularly their use in terminology quality assurance, are discussed. Examples of meta-abstraction networks are presented. 
The "big knowledge" challenge constitutes the use and maintenance of terminological structures that comprise tens of thousands to millions of concepts and their attendant complexity. The notion of abstraction network has been introduced as a tool in helping to overcome this challenge, thus enhancing the usefulness of terminologies. Abstraction networks have been shown to be applicable to a variety of existing biomedical terminologies, and these alternative structural views hold promise for future expanded use with additional terminologies. Copyright © 2015 Elsevier B.V. All rights reserved.

  4. Mining biomedical images towards valuable information retrieval in biomedical and life sciences

    PubMed Central

    Ahmed, Zeeshan; Zeeshan, Saman; Dandekar, Thomas

    2016-01-01

Biomedical images are helpful sources for scientists and practitioners in drawing significant hypotheses, exemplifying approaches, and describing experimental results in the published biomedical literature. In recent decades, there has been an enormous increase in the production and publication of heterogeneous biomedical images, creating a need for bioimaging platforms that can extract and analyze the textual and visual content of biomedical images in support of effective information retrieval systems. In this review, we summarize technologies related to data mining of figures. We describe and compare the potential of different approaches in terms of their developmental aspects, methodologies used, results produced, accuracies achieved, and limitations. Our comparative conclusions include current challenges for bioimaging software with selective image mining, embedded text extraction, and processing of complex natural language queries. PMID:27538578

  5. Consensus-Driven Development of a Terminology for Biobanking, the Duke Experience.

    PubMed

    Ellis, Helena; Joshi, Mary-Beth; Lynn, Aenoch J; Walden, Anita

    2017-04-01

Biobanking at Duke University has existed for decades and has grown over time in silos shaped by specialized needs, as is true of most biomedical research centers. These silos developed informatics systems to support their own individual requirements, with no regard for semantic or syntactic interoperability. Duke undertook an initiative to implement an enterprise-wide biobanking information system to serve its many diverse biobanking entities. A significant part of this initiative was the development of a common terminology for use in the commercial software platform. Common terminology provides the foundation for interoperability across biobanks for data and information sharing. We engaged experts in research, informatics, and biobanking through a consensus-driven process to agree on 361 terms and their definitions that encompass the lifecycle of a biospecimen. Existing standards, common terms, and data elements from published articles provided a foundation on which to build the biobanking terminology; a broader set of stakeholders then provided additional input and feedback in a secondary vetting process. The resulting standardized biobanking terminology is now available for sharing with the biobanking community to serve as a foundation for other institutions that are considering a similar initiative.

  6. Consensus-Driven Development of a Terminology for Biobanking, the Duke Experience

    PubMed Central

    Joshi, Mary-Beth; Lynn, Aenoch J.; Walden, Anita

    2017-01-01

Biobanking at Duke University has existed for decades and has grown over time in silos shaped by specialized needs, as is true of most biomedical research centers. These silos developed informatics systems to support their own individual requirements, with no regard for semantic or syntactic interoperability. Duke undertook an initiative to implement an enterprise-wide biobanking information system to serve its many diverse biobanking entities. A significant part of this initiative was the development of a common terminology for use in the commercial software platform. Common terminology provides the foundation for interoperability across biobanks for data and information sharing. We engaged experts in research, informatics, and biobanking through a consensus-driven process to agree on 361 terms and their definitions that encompass the lifecycle of a biospecimen. Existing standards, common terms, and data elements from published articles provided a foundation on which to build the biobanking terminology; a broader set of stakeholders then provided additional input and feedback in a secondary vetting process. The resulting standardized biobanking terminology is now available for sharing with the biobanking community to serve as a foundation for other institutions that are considering a similar initiative. PMID:28338350

  7. An infrastructure for ontology-based information systems in biomedicine: RICORDO case study.

    PubMed

    Wimalaratne, Sarala M; Grenon, Pierre; Hoehndorf, Robert; Gkoutos, Georgios V; de Bono, Bernard

    2012-02-01

The article presents an infrastructure for supporting the semantic interoperability of biomedical resources based on the management (storing and inference-based querying) of their ontology-based annotations. This infrastructure consists of: (i) a repository to store and query ontology-based annotations; (ii) a knowledge base server with an inference engine to support the storage of and reasoning over ontologies used in the annotation of resources; (iii) a set of applications and services allowing interaction with the integrated repository and knowledge base. The infrastructure is being prototyped, developed, and evaluated by the RICORDO project in support of the knowledge management of biomedical resources, including physiology and pharmacology models and associated clinical data. The RICORDO toolkit and its source code are freely available from http://ricordo.eu/relevant-resources. sarala@ebi.ac.uk.

  8. XGI: a graphical interface for XQuery creation.

    PubMed

    Li, Xiang; Gennari, John H; Brinkley, James F

    2007-10-11

    XML has become the default standard for data exchange among heterogeneous data sources, and in January 2007 XQuery (XML Query language) was recommended by the World Wide Web Consortium as the query language for XML. However, XQuery is a complex language that is difficult for non-programmers to learn. We have therefore developed XGI (XQuery Graphical Interface), a visual interface for graphically generating XQuery. In this paper we demonstrate the functionality of XGI through its application to a biomedical XML dataset. We describe the system architecture and the features of XGI in relation to several existing querying systems, we demonstrate the system's usability through a sample query construction, and we discuss a preliminary evaluation of XGI. Finally, we describe some limitations of the system, and our plans for future improvements.
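XGI itself generates XQuery, which has no standard-library support in most general-purpose languages. As a rough analogue of the kind of query such a tool produces, here is an XPath-style selection over a small biomedical XML fragment using Python's built-in `xml.etree.ElementTree`; the element names, attribute names, and data are invented for illustration:

```python
import xml.etree.ElementTree as ET

# Invented example data: the kind of biomedical XML a tool like XGI might query.
doc = """
<patients>
  <patient id="p1"><diagnosis code="C34">lung carcinoma</diagnosis></patient>
  <patient id="p2"><diagnosis code="I10">hypertension</diagnosis></patient>
</patients>
"""
root = ET.fromstring(doc)

# Query: find patients whose diagnosis code starts with "C" (ICD-10 neoplasms)
hits = [p.get("id") for p in root.findall("patient")
        if p.find("diagnosis").get("code").startswith("C")]
print(hits)  # ['p1']
```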

  9. Semi-automatic semantic annotation of PubMed Queries: a study on quality, efficiency, satisfaction

    PubMed Central

    Névéol, Aurélie; Islamaj-Doğan, Rezarta; Lu, Zhiyong

    2010-01-01

Information processing algorithms require significant amounts of annotated data for training and testing. The availability of such data is often hindered by the complexity and high cost of production. In this paper, we investigate the benefits of a state-of-the-art tool to help with the semantic annotation of a large set of biomedical information queries. Seven annotators were recruited to annotate a set of 10,000 PubMed® queries with 16 biomedical and bibliographic categories. About half of the queries were annotated from scratch, while the other half were automatically pre-annotated and manually corrected. The impact of the automatic pre-annotations was assessed on several aspects of the task: time, number of actions, annotator satisfaction, inter-annotator agreement, and the quality and number of the resulting annotations. The analysis of annotation results showed that the number of required hand annotations is 28.9% lower when pre-annotated results from automatic tools are used. As a result, the overall annotation time was substantially lower when pre-annotations were used, while inter-annotator agreement was significantly higher. In addition, there was no statistically significant difference in the semantic distribution or number of annotations produced when pre-annotations were used. The annotated query corpus is freely available to the research community. This study shows that automatic pre-annotations are found helpful by most annotators. Our experience suggests using an automatic tool to assist large-scale manual annotation projects: it speeds up annotation and improves consistency while maintaining the high quality of the final annotations. PMID:21094696

  10. A common layer of interoperability for biomedical ontologies based on OWL EL.

    PubMed

    Hoehndorf, Robert; Dumontier, Michel; Oellrich, Anika; Wimalaratne, Sarala; Rebholz-Schuhmann, Dietrich; Schofield, Paul; Gkoutos, Georgios V

    2011-04-01

Ontologies are essential in biomedical research due to their ability to semantically integrate content from different scientific databases and resources. Their application improves capabilities for querying and mining biological knowledge. An increasing number of ontologies are being developed for this purpose, and considerable effort is invested into formally defining them in order to represent their semantics explicitly. However, current biomedical ontologies do not yet facilitate data integration and interoperability, since reasoning over these ontologies is very complex and often cannot be performed efficiently, or at all. We propose the use of less expressive subsets of ontology representation languages to enable efficient reasoning and achieve the goal of genuine interoperability between ontologies. We present and evaluate EL Vira, a framework that transforms OWL ontologies into the OWL EL subset, thereby enabling the use of tractable reasoning. We illustrate which OWL constructs and inferences are kept and lost following the conversion and demonstrate the performance gain of reasoning, indicated by a significant reduction in processing time. We applied EL Vira to the open biomedical ontologies and provide a repository of ontologies resulting from this conversion. EL Vira creates a common layer of ontological interoperability that, for the first time, enables the creation of software solutions that can employ biomedical ontologies to perform inferences and answer complex queries to support scientific analyses. The EL Vira software is available from http://el-vira.googlecode.com and converted OBO ontologies and their mappings are available from http://bioonto.gen.cam.ac.uk/el-ont.
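EL Vira's conversion procedure is not detailed in the abstract, but the motivation for OWL EL is that subsumption reasoning becomes polynomial-time. As a toy illustration of that kind of tractable inference (all class names invented, and restricted to named-class subclass axioms only, a far simpler setting than full OWL EL), a transitive subsumption closure can be computed with a plain graph traversal:

```python
# Toy illustration of tractable subsumption reasoning over named classes,
# the style of inference that survives restriction to an EL-like profile.
# All class names are invented.
subclass_of = {
    "renal_clear_cell_carcinoma": {"kidney_cancer"},
    "kidney_cancer": {"cancer"},
    "cancer": {"disease"},
}

def ancestors(cls):
    """All (transitively) inferred superclasses of cls."""
    seen, stack = set(), [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

print(sorted(ancestors("renal_clear_cell_carcinoma")))
# ['cancer', 'disease', 'kidney_cancer']
```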

  11. Supporting inter-topic entity search for biomedical Linked Data based on heterogeneous relationships.

    PubMed

    Zong, Nansu; Lee, Sungin; Ahn, Jinhyun; Kim, Hong-Gee

    2017-08-01

The keyword-based entity search restricts the search space based on the preferences of the searcher. When the given keywords and preferences do not relate to the same biomedical topic, existing biomedical Linked Data search engines fail to deliver satisfactory results. This research aims to tackle this issue by supporting inter-topic search: improving search when the inputs, keywords and preferences, fall under different topics. This study developed an effective algorithm in which the relations between biomedical entities are used in tandem with a keyword-based entity search engine, Siren. The algorithm, PERank, an adaptation of Personalized PageRank (PPR), takes a pair of inputs, (1) search preferences and (2) entities from a keyword-based entity search with a keyword query, and formalizes the search results on the fly based on an index of precomputed Individual Personalized PageRank Vectors (IPPVs). Our experiments were performed over ten linked life-science datasets for two query sets, one with keyword-preference topic correspondence (intra-topic search) and the other without (inter-topic search). The experiments showed that the proposed method achieved better search results, for example a 14% increase in precision for the inter-topic search over the baseline keyword-based search engine. The proposed method improved keyword-based biomedical entity search by supporting inter-topic search without degrading intra-topic search, based on the relations between different entities. Copyright © 2017 Elsevier Ltd. All rights reserved.
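PERank's own indexing scheme (the precomputed IPPVs) is not specified in the abstract, but the Personalized PageRank computation it adapts is standard. Here is a minimal power-iteration sketch on a tiny invented entity graph; the damping factor, graph, and entity names are illustrative assumptions:

```python
def personalized_pagerank(graph, preference, damping=0.85, iters=50):
    """Power iteration for Personalized PageRank.

    graph: dict mapping node -> list of out-neighbours
    preference: dict mapping node -> teleport probability (sums to 1)
    """
    nodes = list(graph)
    rank = {n: preference.get(n, 0.0) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - damping) * preference.get(n, 0.0) for n in nodes}
        for n in nodes:
            out = graph[n]
            if out:
                share = damping * rank[n] / len(out)
                for m in out:
                    nxt[m] += share
            else:  # dangling node: redistribute its mass by preference
                for m in nodes:
                    nxt[m] += damping * rank[n] * preference.get(m, 0.0)
        rank = nxt
    return rank

# Toy entity graph (names invented): drug -> target -> condition
g = {"aspirin": ["COX1"], "COX1": ["inflammation"], "inflammation": []}
r = personalized_pagerank(g, {"aspirin": 1.0})
print(max(r, key=r.get))  # entities near the preferred entity rank highest: aspirin
```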

  12. Biomedical data integration in computational drug design and bioinformatics.

    PubMed

    Seoane, Jose A; Aguiar-Pulido, Vanessa; Munteanu, Cristian R; Rivero, Daniel; Rabunal, Juan R; Dorado, Julian; Pazos, Alejandro

    2013-03-01

In recent years, in the post-genomic era, more and more data are being generated by high-throughput biological technologies, such as proteomics and transcriptomics. These omics data can be very useful, but the real challenge is to analyze all of the data as a whole, after integrating them. Biomedical data integration enables querying of different, heterogeneous, and distributed biomedical data sources. Data integration solutions can be very useful not only in the context of drug design, but also in biomedical information retrieval, clinical diagnosis, systems biology, etc. In this review, we analyze the most common approaches to biomedical data integration, such as federated databases, data warehousing, multi-agent systems, and semantic technology, as well as the solutions developed using these approaches in the past few years.

  13. Mining biomedical images towards valuable information retrieval in biomedical and life sciences.

    PubMed

    Ahmed, Zeeshan; Zeeshan, Saman; Dandekar, Thomas

    2016-01-01

Biomedical images are helpful sources for scientists and practitioners in drawing significant hypotheses, exemplifying approaches, and describing experimental results in the published biomedical literature. In recent decades, there has been an enormous increase in the production and publication of heterogeneous biomedical images, creating a need for bioimaging platforms that can extract and analyze the textual and visual content of biomedical images in support of effective information retrieval systems. In this review, we summarize technologies related to data mining of figures. We describe and compare the potential of different approaches in terms of their developmental aspects, methodologies used, results produced, accuracies achieved, and limitations. Our comparative conclusions include current challenges for bioimaging software with selective image mining, embedded text extraction, and processing of complex natural language queries. © The Author(s) 2016. Published by Oxford University Press.

  14. Entrez Neuron RDFa: a pragmatic Semantic Web application for data integration in neuroscience research

    PubMed Central

    Samwald, Matthias; Lim, Ernest; Masiar, Peter; Marenco, Luis; Chen, Huajun; Morse, Thomas; Mutalik, Pradeep; Shepherd, Gordon; Miller, Perry; Cheung, Kei-Hoi

    2013-01-01

The amount of biomedical data available in Semantic Web formats has been rapidly growing in recent years. While these formats are machine-friendly, user-friendly web interfaces allowing easy querying of these data are typically lacking. We present “Entrez Neuron”, a pilot neuron-centric interface that allows for keyword-based queries against a coherent repository of OWL ontologies. These ontologies describe neuronal structures, physiology, mathematical models and microscopy images. The returned query results are organized hierarchically according to brain architecture. Where possible, the application makes use of entities from the Open Biomedical Ontologies (OBO) and the ‘HCLS knowledgebase’ developed by the W3C Interest Group for Health Care and Life Science. It makes use of the emerging RDFa standard to embed ontology fragments and semantic annotations within its HTML-based user interface. The application and underlying ontologies demonstrate how Semantic Web technologies can be used for information integration within a curated information repository and between curated information repositories. They also demonstrate how information integration can be accomplished on the client side, through simple copying and pasting of portions of documents that contain RDFa markup. PMID:19745321

  15. Physical properties of biological entities: an introduction to the ontology of physics for biology.

    PubMed

    Cook, Daniel L; Bookstein, Fred L; Gennari, John H

    2011-01-01

As biomedical investigators strive to integrate data and analyses across spatiotemporal scales and biomedical domains, they have recognized the benefits of formalizing languages and terminologies via computational ontologies. Although ontologies for biological entities (molecules, cells, organs) are well established, there are no principled ontologies of the physical properties (energies, volumes, flow rates) of those entities. In this paper, we introduce the Ontology of Physics for Biology (OPB), a reference ontology of classical physics designed for annotating the biophysical content of growing repositories of biomedical datasets and analytical models. The OPB's semantic framework, traceable to James Clerk Maxwell, encompasses modern theories of system dynamics and thermodynamics, and is implemented as a computational ontology that references available upper ontologies. In this paper we focus on the OPB classes that are designed for annotating physical properties encoded in biomedical datasets and computational models, and we discuss how the OPB framework will facilitate biomedical knowledge integration. © 2011 Cook et al.

  16. Objective and automated protocols for the evaluation of biomedical search engines using No Title Evaluation protocols.

    PubMed

    Campagne, Fabien

    2008-02-29

The evaluation of information retrieval techniques has traditionally relied on human judges to determine which documents are relevant to a query and which are not. This protocol is used in the Text Retrieval Evaluation Conference (TREC), organized annually for the past 15 years, to support the unbiased evaluation of novel information retrieval approaches. The TREC Genomics Track has recently been introduced to measure the performance of information retrieval for biomedical applications. We describe two protocols for evaluating biomedical information retrieval techniques without human relevance judgments, which we call No Title Evaluation (NT Evaluation). The first protocol measures performance for focused searches, where only one relevant document exists for each query. The second protocol measures performance for queries expected to have potentially many relevant documents (high-recall searches). Both protocols take advantage of the clear separation of titles and abstracts found in Medline. We compare the performance obtained with these evaluation protocols to results obtained by reusing the relevance judgments produced in the 2004 and 2005 TREC Genomics Track, and observe significant correlations between the performance rankings generated by our approach and TREC's. Spearman's correlation coefficients in the range of 0.79-0.92 are observed when comparing bpref measured with NT Evaluation to TREC evaluations. For comparison, coefficients in the range 0.86-0.94 are observed when evaluating the same set of methods with data from two independent TREC Genomics Track evaluations. We discuss the advantages of NT Evaluation over the TRels and data fusion evaluation protocols introduced recently. Our results suggest that the NT Evaluation protocols described here could be used to optimize some search engine parameters before human evaluation. Further research is needed to determine whether NT Evaluation or variants of these protocols can fully substitute for human evaluations.
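The correlation statistic reported above is standard Spearman rank correlation between two rankings of the same systems. A minimal sketch (the bpref scores below are invented for illustration; real values would come from NT Evaluation and TREC runs):

```python
def spearman(xs, ys):
    """Spearman's rank correlation for two score lists (no tie handling)."""
    def ranks(vs):
        order = sorted(range(len(vs)), key=lambda i: vs[i], reverse=True)
        r = [0] * len(vs)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical bpref scores for five retrieval systems under two protocols
nt_eval   = [0.41, 0.35, 0.52, 0.28, 0.47]
trec_eval = [0.44, 0.25, 0.55, 0.33, 0.46]
print(round(spearman(nt_eval, trec_eval), 3))  # 0.9
```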

  17. From Gene to Protein: A 3-Week Intensive Course in Molecular Biology for Physical Scientists

    ERIC Educational Resources Information Center

    Nadeau, Jay L.

    2009-01-01

This article describes a 3-week intensive molecular biology methods course based upon fluorescent proteins, which is successfully taught at McGill University to advanced undergraduates and graduate students in physics, chemical engineering, biomedical engineering, and medicine. No previous knowledge of biological terminology or methods is expected, so…

  18. A Diagram Editor for Efficient Biomedical Knowledge Capture and Integration

    PubMed Central

    Yu, Bohua; Jakupovic, Elvis; Wilson, Justin; Dai, Manhong; Xuan, Weijian; Mirel, Barbara; Athey, Brian; Watson, Stanley; Meng, Fan

    2008-01-01

    Understanding the molecular mechanisms underlying complex disorders requires the integration of data and knowledge from different sources including free text literature and various biomedical databases. To facilitate this process, we created the Biomedical Concept Diagram Editor (BCDE) to help researchers distill knowledge from data and literature and aid the process of hypothesis development. A key feature of BCDE is the ability to capture information with a simple drag-and-drop. This is a vast improvement over manual methods of knowledge and data recording and greatly increases the efficiency of the biomedical researcher. BCDE also provides a unique concept matching function to enforce consistent terminology, which enables conceptual relationships deposited by different researchers in the BCDE database to be mined and integrated for intelligible and useful results. We hope BCDE will promote the sharing and integration of knowledge from different researchers for effective hypothesis development. PMID:21347131

  19. Enabling complex queries to drug information sources through functional composition.

    PubMed

    Peters, Lee; Mortensen, Jonathan; Nguyen, Thang; Bodenreider, Olivier

    2013-01-01

Our objective was to enable an end-user to create complex queries to drug information sources through functional composition, by creating sequences of functions from application program interfaces (APIs) to drug terminologies. The development of a functional composition model seeks to link functions from two distinct APIs. An ontology was developed using Protégé to model the functions of the RxNorm and NDF-RT APIs by describing the semantics of their input and output. A set of rules was developed to define the conditions under which functions are interoperable for composition. The operational definition of interoperability between function pairs is established by executing the rules on the ontology. We illustrate that the functional composition model supports common use cases, including checking interactions for RxNorm drugs and deploying allergy lists defined in reference to drug properties in NDF-RT. This model supports the RxMix application (http://mor.nlm.nih.gov/RxMix/), an application we developed for enabling complex queries to the RxNorm and NDF-RT APIs.
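The core composition rule, that two API functions can be chained when the first one's output type matches the second one's input type, can be sketched very simply. The function names and type labels below are invented stand-ins, not the actual RxNorm/NDF-RT API signatures:

```python
# Hypothetical sketch of the interoperability rule: a pair (f, g) composes
# when f's output type equals g's input type. Names and types are invented.
def compose_ok(f_meta, g_meta):
    return f_meta["output"] == g_meta["input"]

find_rxcui_by_name = {"name": "findRxcuiByName", "input": "drug_name", "output": "rxcui"}
get_interactions   = {"name": "getInteractions", "input": "rxcui", "output": "interaction_list"}

# A drug name can be resolved to an identifier, then fed to the interaction check
print(compose_ok(find_rxcui_by_name, get_interactions))  # True
print(compose_ok(get_interactions, find_rxcui_by_name))  # False
```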

  20. Using text mining to link journal articles to neuroanatomical databases

    PubMed Central

    French, Leon; Pavlidis, Paul

    2013-01-01

The electronic linking of neuroscience information, including data embedded in the primary literature, would permit powerful queries and analyses driven by structured databases. This task would be facilitated by automated procedures that can identify biological concepts in journals. Here we apply an approach for automatically mapping formal identifiers of neuroanatomical regions to text found in journal abstracts, and apply it to a large body of abstracts from the Journal of Comparative Neurology (JCN). The analyses yield over one hundred thousand brain region mentions, which we map to 8,225 brain region concepts in multiple organisms. Based on the analysis of a manually annotated corpus, we estimate that mentions are mapped at 95% precision and 63% recall. Our results provide insights into the patterns of publication on brain regions and species of study in the Journal, but also point to important challenges in the standardization of neuroanatomical nomenclatures. We find that many terms in the formal terminologies never appear in a JCN abstract, while conversely, many terms authors use are not reflected in the terminologies. To improve the terminologies we deposited 136 unrecognized brain regions into the Neuroscience Lexicon (NeuroLex). The training data, terminologies, normalizations, evaluations and annotated journal abstracts are freely available at http://www.chibi.ubc.ca/WhiteText/. PMID:22120205
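The paper's actual mapping pipeline is more sophisticated, but the basic step of normalizing free-text mentions against a formal terminology can be sketched as a lexicon lookup with word-boundary matching. The region identifiers and synonyms below are invented:

```python
import re

# Minimal sketch of mapping free-text mentions to formal region identifiers
# via a lexicon lookup; identifiers and synonyms are invented.
lexicon = {
    "substantia nigra": "REGION:0001",
    "sn": "REGION:0001",          # abbreviation for the same region
    "dentate gyrus": "REGION:0002",
}

def map_mentions(abstract):
    """Return sorted region identifiers mentioned in the abstract."""
    text = abstract.lower()
    # \b word boundaries keep short abbreviations from matching inside words
    return sorted({rid for term, rid in lexicon.items()
                   if re.search(r"\b" + re.escape(term) + r"\b", text)})

print(map_mentions("Neurons of the substantia nigra project to..."))
# ['REGION:0001']
```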

  1. Managing biomedical image metadata for search and retrieval of similar images.

    PubMed

    Korenblum, Daniel; Rubin, Daniel; Napel, Sandy; Rodriguez, Cesar; Beaulieu, Chris

    2011-08-01

Radiology images are generally disconnected from the metadata describing their contents, such as imaging observations ("semantic" metadata), which are usually described in text reports that are not directly linked to the images. We developed a system, the Biomedical Image Metadata Manager (BIMM), to (1) address the problem of managing biomedical image metadata and (2) facilitate the retrieval of similar images using semantic feature metadata. Our approach allows radiologists, researchers, and students to take advantage of the vast and growing repositories of medical image data by explicitly linking images to their associated metadata in a relational database that is globally accessible through a Web application. BIMM receives input in the form of standards-based metadata files via a Web service, and parses and stores the metadata in a relational database, providing efficient data query and maintenance capabilities. Upon querying BIMM for images, 2D regions of interest (ROIs) stored as metadata are automatically rendered onto preview images included in search results. The system's "match observations" function retrieves images with similar ROIs based on specific semantic features describing imaging observation characteristics (IOCs). We demonstrate that the system, using IOCs alone, can accurately retrieve images with diagnoses matching the query images, and we evaluate its performance on a set of annotated liver lesion images. BIMM has several potential applications, e.g., computer-aided detection and diagnosis, content-based image retrieval, automating medical analysis protocols, and gathering population statistics such as disease prevalence. The system provides a framework for decision support systems, potentially improving their diagnostic accuracy and selection of appropriate therapies.
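The abstract does not specify BIMM's similarity measure, but retrieval over sets of semantic features can be illustrated with a simple Jaccard-similarity ranking. The feature names and image identifiers below are invented:

```python
# Hypothetical sketch of a "match observations"-style retrieval: rank images
# by overlap of their semantic imaging-observation features (names invented).
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

query_features = {"hypodense", "well-circumscribed", "homogeneous"}
database = {
    "img_012": {"hypodense", "well-circumscribed", "heterogeneous"},
    "img_047": {"hyperdense", "ill-defined"},
}
ranked = sorted(database, key=lambda k: jaccard(query_features, database[k]),
                reverse=True)
print(ranked)  # images sharing more features with the query come first
```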

  2. Generating a focused view of disease ontology cancer terms for pan-cancer data integration and analysis

    PubMed Central

    Wu, Tsung-Jung; Schriml, Lynn M.; Chen, Qing-Rong; Colbert, Maureen; Crichton, Daniel J.; Finney, Richard; Hu, Ying; Kibbe, Warren A.; Kincaid, Heather; Meerzaman, Daoud; Mitraka, Elvira; Pan, Yang; Smith, Krista M.; Srivastava, Sudhir; Ward, Sari; Yan, Cheng; Mazumder, Raja

    2015-01-01

    Bio-ontologies provide terminologies for the scientific community to describe biomedical entities in a standardized manner. There are multiple initiatives that are developing biomedical terminologies for the purpose of providing better annotation, data integration and mining capabilities. Terminology resources devised for multiple purposes inherently diverge in content and structure. A major issue of biomedical data integration is the development of overlapping terms, ambiguous classifications and inconsistencies represented across databases and publications. The disease ontology (DO) was developed over the past decade to address data integration, standardization and annotation issues for human disease data. We have established a DO cancer project to be a focused view of cancer terms within the DO. The DO cancer project mapped 386 cancer terms from the Catalogue of Somatic Mutations in Cancer (COSMIC), The Cancer Genome Atlas (TCGA), International Cancer Genome Consortium, Therapeutically Applicable Research to Generate Effective Treatments, Integrative Oncogenomics and the Early Detection Research Network into a cohesive set of 187 DO terms represented by 63 top-level DO cancer terms. For example, the COSMIC term ‘kidney, NS, carcinoma, clear_cell_renal_cell_carcinoma’ and TCGA term ‘Kidney renal clear cell carcinoma’ were both grouped to the term ‘Disease Ontology Identification (DOID):4467 / renal clear cell carcinoma’ which was mapped to the TopNodes_DOcancerslim term ‘DOID:263 / kidney cancer’. Mapping of diverse cancer terms to DO and the use of top level terms (DO slims) will enable pan-cancer analysis across datasets generated from any of the cancer term sources where pan-cancer means including or relating to all or multiple types of cancer. The terms can be browsed from the DO web site (http://www.disease-ontology.org) and downloaded from the DO’s Apache Subversion or GitHub repositories. Database URL: http://www.disease-ontology.org PMID:25841438
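The two-level mapping described above (source-specific term to DO term, then DO term to a top-level "DO slim" term) can be sketched as a pair of lookups. The COSMIC/TCGA strings and the DOID:4467/DOID:263 pairing are quoted from the abstract; the data layout and function are our illustrative assumptions:

```python
# Sketch of the pan-cancer term mapping: source term -> DO term -> DO slim term.
source_to_do = {
    "kidney, NS, carcinoma, clear_cell_renal_cell_carcinoma": "DOID:4467",  # COSMIC
    "Kidney renal clear cell carcinoma": "DOID:4467",                       # TCGA
}
do_to_slim = {"DOID:4467": "DOID:263"}  # renal clear cell carcinoma -> kidney cancer

def slim_term(source_term):
    """Resolve a source-specific cancer term to its top-level DO slim term."""
    return do_to_slim[source_to_do[source_term]]

print(slim_term("Kidney renal clear cell carcinoma"))  # DOID:263
```

Grouping heterogeneous source vocabularies under shared slim terms like this is what enables counting or analyzing samples "pan-cancer" across datasets.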

  3. The BioLexicon: a large-scale terminological resource for biomedical text mining

    PubMed Central

    2011-01-01

Background: Due to the rapidly expanding body of biomedical literature, biologists require increasingly sophisticated and efficient systems to help them to search for relevant information. Such systems should account for the multiple written variants used to represent biomedical concepts, and allow the user to search for specific pieces of knowledge (or events) involving these concepts, e.g., protein-protein interactions. Such functionality requires access to detailed information about words used in the biomedical literature. Existing databases and ontologies often have a specific focus and are oriented towards human use. Consequently, biological knowledge is dispersed amongst many resources, which often do not attempt to account for the large and frequently changing set of variants that appear in the literature. Additionally, such resources typically do not provide information about how terms relate to each other in texts to describe events. Results: This article provides an overview of the design, construction and evaluation of a large-scale lexical and conceptual resource for the biomedical domain, the BioLexicon. The resource can be exploited by text mining tools at several levels, e.g., part-of-speech tagging, recognition of biomedical entities, and the extraction of events in which they are involved. As such, the BioLexicon must account for real usage of words in biomedical texts. In particular, the BioLexicon gathers together different types of terms from several existing data resources into a single, unified repository, and augments them with new term variants automatically extracted from biomedical literature. Extraction of events is facilitated through the inclusion of biologically pertinent verbs (around which events are typically organized) together with information about typical patterns of grammatical and semantic behaviour, which are acquired from domain-specific texts. In order to foster interoperability, the BioLexicon is modelled using the Lexical Markup Framework, an ISO standard. Conclusions: The BioLexicon contains over 2.2 M lexical entries and over 1.8 M terminological variants, as well as over 3.3 M semantic relations, including over 2 M synonymy relations. Its exploitation can benefit both application developers and users. We demonstrate some such benefits by describing integration of the resource into a number of different tools, and evaluating improvements in performance that this can bring. PMID:21992002

  4. The BioLexicon: a large-scale terminological resource for biomedical text mining.

    PubMed

    Thompson, Paul; McNaught, John; Montemagni, Simonetta; Calzolari, Nicoletta; del Gratta, Riccardo; Lee, Vivian; Marchi, Simone; Monachini, Monica; Pezik, Piotr; Quochi, Valeria; Rupp, C J; Sasaki, Yutaka; Venturi, Giulia; Rebholz-Schuhmann, Dietrich; Ananiadou, Sophia

    2011-10-12

Due to the rapidly expanding body of biomedical literature, biologists require increasingly sophisticated and efficient systems to help them to search for relevant information. Such systems should account for the multiple written variants used to represent biomedical concepts, and allow the user to search for specific pieces of knowledge (or events) involving these concepts, e.g., protein-protein interactions. Such functionality requires access to detailed information about words used in the biomedical literature. Existing databases and ontologies often have a specific focus and are oriented towards human use. Consequently, biological knowledge is dispersed amongst many resources, which often do not attempt to account for the large and frequently changing set of variants that appear in the literature. Additionally, such resources typically do not provide information about how terms relate to each other in texts to describe events. This article provides an overview of the design, construction and evaluation of a large-scale lexical and conceptual resource for the biomedical domain, the BioLexicon. The resource can be exploited by text mining tools at several levels, e.g., part-of-speech tagging, recognition of biomedical entities, and the extraction of events in which they are involved. As such, the BioLexicon must account for real usage of words in biomedical texts. In particular, the BioLexicon gathers together different types of terms from several existing data resources into a single, unified repository, and augments them with new term variants automatically extracted from biomedical literature. Extraction of events is facilitated through the inclusion of biologically pertinent verbs (around which events are typically organized) together with information about typical patterns of grammatical and semantic behaviour, which are acquired from domain-specific texts. In order to foster interoperability, the BioLexicon is modelled using the Lexical Markup Framework, an ISO standard. The BioLexicon contains over 2.2 M lexical entries and over 1.8 M terminological variants, as well as over 3.3 M semantic relations, including over 2 M synonymy relations. Its exploitation can benefit both application developers and users. We demonstrate some such benefits by describing integration of the resource into a number of different tools, and evaluating improvements in performance that this can bring.

  5. A Deep Learning Method to Automatically Identify Reports of Scientifically Rigorous Clinical Research from the Biomedical Literature: Comparative Analytic Study.

    PubMed

    Del Fiol, Guilherme; Michelson, Matthew; Iorio, Alfonso; Cotoi, Chris; Haynes, R Brian

    2018-06-25

    A major barrier to the practice of evidence-based medicine is efficiently finding scientifically sound studies on a given clinical topic. To investigate a deep learning approach to retrieve scientifically sound treatment studies from the biomedical literature. We trained a Convolutional Neural Network using a noisy dataset of 403,216 PubMed citations with title and abstract as features. The deep learning model was compared with state-of-the-art search filters, such as PubMed's Clinical Query Broad treatment filter, McMaster's textword search strategy (no Medical Subject Heading, MeSH, terms), and Clinical Query Balanced treatment filter. A previously annotated dataset (Clinical Hedges) was used as the gold standard. The deep learning model obtained significantly lower recall than the Clinical Queries Broad treatment filter (96.9% vs 98.4%; P<.001); and equivalent recall to McMaster's textword search (96.9% vs 97.1%; P=.57) and Clinical Queries Balanced filter (96.9% vs 97.0%; P=.63). Deep learning obtained significantly higher precision than the Clinical Queries Broad filter (34.6% vs 22.4%; P<.001) and McMaster's textword search (34.6% vs 11.8%; P<.001), but was significantly lower than the Clinical Queries Balanced filter (34.6% vs 40.9%; P<.001). Deep learning performed well compared to state-of-the-art search filters, especially when citations were not indexed. Unlike previous machine learning approaches, the proposed deep learning model does not require feature engineering, or time-sensitive or proprietary features, such as MeSH terms and bibliometrics. Deep learning is a promising approach to identifying reports of scientifically rigorous clinical research. Further work is needed to optimize the deep learning model and to assess generalizability to other areas, such as diagnosis, etiology, and prognosis. ©Guilherme Del Fiol, Matthew Michelson, Alfonso Iorio, Chris Cotoi, R Brian Haynes. 
Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 25.06.2018.

  6. Abstracting data warehousing issues in scientific research.

    PubMed

    Tews, Cody; Bracio, Boris R

    2002-01-01

This paper presents the design and implementation of the Idaho Biomedical Data Management System (IBDMS). This system preprocesses biomedical data from the IMPROVE (Improving Control of Patient Status in Critical Care) library via an Open Database Connectivity (ODBC) connection. The ODBC connection allows local and remote simulations to access filtered, joined, and sorted data using the Structured Query Language (SQL). The tool can provide an overview of available data as well as user-defined data subsets for verifying models of the human respiratory system.
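The filtered, joined, and sorted access pattern described above can be sketched with Python's built-in sqlite3 standing in for the ODBC connection; the table and column names below are invented for illustration and are not taken from the IMPROVE library.

```python
import sqlite3

# In-memory stand-in for the biomedical library; schema and values
# are hypothetical, chosen only to illustrate the query pattern.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE respiratory (patient_id INTEGER, t REAL, flow REAL);
    INSERT INTO patients VALUES (1, 'A'), (2, 'B');
    INSERT INTO respiratory VALUES (1, 0.0, 0.52), (1, 0.1, 0.47),
                                   (2, 0.0, 0.61), (2, 0.1, 0.58);
""")

# A filtered, joined, and sorted subset, as a simulation client
# might request it over the database connection.
rows = conn.execute("""
    SELECT p.name, r.t, r.flow
    FROM respiratory r JOIN patients p ON p.id = r.patient_id
    WHERE r.flow > 0.5
    ORDER BY r.t, p.name
""").fetchall()
print(rows)  # [('A', 0.0, 0.52), ('B', 0.0, 0.61), ('B', 0.1, 0.58)]
```

The same SQL would run unchanged against an ODBC-backed database; only the connection object differs.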

  7. Enhanced functionalities for annotating and indexing clinical text with the NCBO Annotator.

    PubMed

    Tchechmedjiev, Andon; Abdaoui, Amine; Emonet, Vincent; Melzi, Soumia; Jonnagaddala, Jitendra; Jonquet, Clement

    2018-06-01

Secondary use of clinical data commonly involves annotating biomedical text with terminologies and ontologies. The National Center for Biomedical Ontology (NCBO) Annotator is a frequently used annotation service, originally designed for biomedical data but not well suited to clinical text annotation. In order to add new functionalities to the NCBO Annotator without hosting or modifying the original Web service, we have designed a proxy architecture that enables seamless extensions by pre-processing the input text and parameters and post-processing the annotations. We have then implemented enhanced functionalities for annotating and indexing free text, such as scoring, detection of context (negation, experiencer, temporality), new output formats, and coarse-grained concept recognition (with UMLS Semantic Groups). In this paper, we present the NCBO Annotator+, a Web service which incorporates these new functionalities, as well as a small set of evaluation results for concept recognition and clinical context detection on two standard evaluation tasks (CLEF eHealth 2017, SemEval 2014). The Annotator+ has been successfully integrated into the SIFR BioPortal platform, an implementation of NCBO BioPortal for French biomedical terminologies and ontologies, to annotate English text. A Web user interface is available for testing and ontology selection (http://bioportal.lirmm.fr/ncbo_annotatorplus); however, the Annotator+ is meant to be used through the Web service application programming interface (http://services.bioportal.lirmm.fr/ncbo_annotatorplus). The code is openly available, and we also provide a Docker packaging to enable easy local deployment for processing sensitive (e.g. clinical) data in-house (https://github.com/sifrproject). andon.tchechmedjiev@lirmm.fr. Supplementary data are available at Bioinformatics online.
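The proxy idea of wrapping an existing annotation service between a pre-processing and a post-processing step can be sketched as follows; the stub annotator and the naive negation check are illustrative stand-ins, not the actual NCBO Annotator API or the service's real context-detection method.

```python
def base_annotator(text):
    """Stub for the remote annotation service: returns concept hits
    with character offsets (illustrative, not the NCBO response format)."""
    found = []
    for concept in ["pneumonia", "fever"]:
        pos = text.lower().find(concept)
        if pos >= 0:
            found.append({"concept": concept, "start": pos})
    return found

def annotate_plus(text):
    """Proxy wrapper: pre-process input, call the base service,
    then post-process the annotations with extra functionality."""
    cleaned = " ".join(text.split())          # pre-processing: normalize whitespace
    annotations = base_annotator(cleaned)
    for ann in annotations:                   # post-processing: naive negation flag
        prefix = cleaned[:ann["start"]].rstrip().lower()
        ann["negated"] = prefix.endswith("no")
    return annotations

print(annotate_plus("Chest x-ray shows  no pneumonia; fever present."))
```

The base service is never modified; all added behaviour lives in the wrapper, which is the point of the proxy architecture.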

  8. Using AberOWL for fast and scalable reasoning over BioPortal ontologies.

    PubMed

    Slater, Luke; Gkoutos, Georgios V; Schofield, Paul N; Hoehndorf, Robert

    2016-08-08

    Reasoning over biomedical ontologies using their OWL semantics has traditionally been a challenging task due to the high theoretical complexity of OWL-based automated reasoning. As a consequence, ontology repositories, as well as most other tools utilizing ontologies, either provide access to ontologies without use of automated reasoning, or limit the number of ontologies for which automated reasoning-based access is provided. We apply the AberOWL infrastructure to provide automated reasoning-based access to all accessible and consistent ontologies in BioPortal (368 ontologies). We perform an extensive performance evaluation to determine query times, both for queries of different complexity and for queries that are performed in parallel over the ontologies. We demonstrate that, with the exception of a few ontologies, even complex and parallel queries can now be answered in milliseconds, therefore allowing automated reasoning to be used on a large scale, to run in parallel, and with rapid response times.

  9. The Biomedical Resource Ontology (BRO) to Enable Resource Discovery in Clinical and Translational Research

    PubMed Central

    Tenenbaum, Jessica D.; Whetzel, Patricia L.; Anderson, Kent; Borromeo, Charles D.; Dinov, Ivo D.; Gabriel, Davera; Kirschner, Beth; Mirel, Barbara; Morris, Tim; Noy, Natasha; Nyulas, Csongor; Rubenson, David; Saxman, Paul R.; Singh, Harpreet; Whelan, Nancy; Wright, Zach; Athey, Brian D.; Becich, Michael J.; Ginsburg, Geoffrey S.; Musen, Mark A.; Smith, Kevin A.; Tarantal, Alice F.; Rubin, Daniel L; Lyster, Peter

    2010-01-01

    The biomedical research community relies on a diverse set of resources, both within their own institutions and at other research centers. In addition, an increasing number of shared electronic resources have been developed. Without effective means to locate and query these resources, it is challenging, if not impossible, for investigators to be aware of the myriad resources available, or to effectively perform resource discovery when the need arises. In this paper, we describe the development and use of the Biomedical Resource Ontology (BRO) to enable semantic annotation and discovery of biomedical resources. We also describe the Resource Discovery System (RDS) which is a federated, inter-institutional pilot project that uses the BRO to facilitate resource discovery on the Internet. Through the RDS framework and its associated Biositemaps infrastructure, the BRO facilitates semantic search and discovery of biomedical resources, breaking down barriers and streamlining scientific research that will improve human health. PMID:20955817

  10. The BioIntelligence Framework: a new computational platform for biomedical knowledge computing.

    PubMed

    Farley, Toni; Kiefer, Jeff; Lee, Preston; Von Hoff, Daniel; Trent, Jeffrey M; Colbourn, Charles; Mousses, Spyro

    2013-01-01

    Breakthroughs in molecular profiling technologies are enabling a new data-intensive approach to biomedical research, with the potential to revolutionize how we study, manage, and treat complex diseases. The next great challenge for clinical applications of these innovations will be to create scalable computational solutions for intelligently linking complex biomedical patient data to clinically actionable knowledge. Traditional database management systems (DBMS) are not well suited to representing complex syntactic and semantic relationships in unstructured biomedical information, introducing barriers to realizing such solutions. We propose a scalable computational framework for addressing this need, which leverages a hypergraph-based data model and query language that may be better suited for representing complex multi-lateral, multi-scalar, and multi-dimensional relationships. We also discuss how this framework can be used to create rapid learning knowledge base systems to intelligently capture and relate complex patient data to biomedical knowledge in order to automate the recovery of clinically actionable information.
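A hypergraph data model of the kind described, where a single edge may link an arbitrary set of entities rather than just a pair, can be sketched minimally in Python; the entities and hyperedges below are invented for illustration.

```python
# Toy hypergraph: each hyperedge links an arbitrary set of entities,
# a relationship shape that is awkward to express as rows in a
# conventional relational DBMS. All names here are hypothetical.
hyperedges = {
    "e1": {"patient:42", "variant:BRAF_V600E", "drug:vemurafenib"},
    "e2": {"patient:42", "diagnosis:melanoma"},
    "e3": {"drug:vemurafenib", "pathway:MAPK"},
}

def incident(node):
    """All hyperedges touching a node, the basic traversal primitive."""
    return {e for e, nodes in hyperedges.items() if node in nodes}

def neighbours(node):
    """Entities co-occurring with `node` in any hyperedge."""
    result = set()
    for e in incident(node):
        result |= hyperedges[e]
    result.discard(node)
    return result

print(sorted(neighbours("patient:42")))
```

A query language over such a store would compose these primitives; the multi-way edges carry the multi-lateral relationships directly instead of decomposing them into join tables.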

  11. Improving accuracy for identifying related PubMed queries by an integrated approach.

    PubMed

    Lu, Zhiyong; Wilbur, W John

    2009-10-01

    PubMed is the most widely used tool for searching biomedical literature online. As with many other online search tools, a user often types a series of multiple related queries before retrieving satisfactory results to fulfill a single information need. Meanwhile, it is also a common phenomenon to see a user type queries on unrelated topics in a single session. In order to study PubMed users' search strategies, it is necessary to be able to automatically separate unrelated queries and group together related queries. Here, we report a novel approach combining both lexical and contextual analyses for segmenting PubMed query sessions and identifying related queries and compare its performance with the previous approach based solely on concept mapping. We experimented with our integrated approach on sample data consisting of 1539 pairs of consecutive user queries in 351 user sessions. The prediction results of 1396 pairs agreed with the gold-standard annotations, achieving an overall accuracy of 90.7%. This demonstrates that our approach is significantly better than the previously published method. By applying this approach to a one day query log of PubMed, we found that a significant proportion of information needs involved more than one PubMed query, and that most of the consecutive queries for the same information need are lexically related. Finally, the proposed PubMed distance is shown to be an accurate and meaningful measure for determining the contextual similarity between biological terms. The integrated approach can play a critical role in handling real-world PubMed query log data as is demonstrated in our experiments.
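The lexical side of such an approach, grouping consecutive queries that share terms and splitting a session where neighbours appear unrelated, can be sketched as follows; the Jaccard measure and threshold are illustrative assumptions, not the paper's exact method.

```python
def lexically_related(q1, q2, threshold=0.2):
    """Jaccard token overlap as a crude stand-in for lexical analysis;
    the threshold value is an illustrative assumption."""
    a, b = set(q1.lower().split()), set(q2.lower().split())
    return len(a & b) / len(a | b) >= threshold

def segment(queries, threshold=0.2):
    """Split a stream of consecutive queries wherever two neighbours
    appear lexically unrelated."""
    sessions = [[queries[0]]]
    for prev, cur in zip(queries, queries[1:]):
        if lexically_related(prev, cur, threshold):
            sessions[-1].append(cur)
        else:
            sessions.append([cur])
    return sessions

log = ["breast cancer treatment",
       "breast cancer tamoxifen",
       "zebrafish fin regeneration"]
print(segment(log))
```

The paper's integrated approach additionally applies contextual analysis to catch related queries that share no tokens, which pure lexical overlap misses.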

  12. Improving accuracy for identifying related PubMed queries by an integrated approach

    PubMed Central

    Lu, Zhiyong; Wilbur, W. John

    2009-01-01

    PubMed is the most widely used tool for searching biomedical literature online. As with many other online search tools, a user often types a series of multiple related queries before retrieving satisfactory results to fulfill a single information need. Meanwhile, it is also a common phenomenon to see a user type queries on unrelated topics in a single session. In order to study PubMed users’ search strategies, it is necessary to be able to automatically separate unrelated queries and group together related queries. Here, we report a novel approach combining both lexical and contextual analyses for segmenting PubMed query sessions and identifying related queries and compare its performance with the previous approach based solely on concept mapping. We experimented with our integrated approach on sample data consisting of 1,539 pairs of consecutive user queries in 351 user sessions. The prediction results of 1,396 pairs agreed with the gold-standard annotations, achieving an overall accuracy of 90.7%. This demonstrates that our approach is significantly better than the previously published method. By applying this approach to a one day query log of PubMed, we found that a significant proportion of information needs involved more than one PubMed query, and that most of the consecutive queries for the same information need are lexically related. Finally, the proposed PubMed distance is shown to be an accurate and meaningful measure for determining the contextual similarity between biological terms. The integrated approach can play a critical role in handling real-world PubMed query log data as is demonstrated in our experiments. PMID:19162232

  13. Comparing image search behaviour in the ARRS GoldMiner search engine and a clinical PACS/RIS.

    PubMed

    De-Arteaga, Maria; Eggel, Ivan; Do, Bao; Rubin, Daniel; Kahn, Charles E; Müller, Henning

    2015-08-01

Information search has changed the way we manage knowledge, and the ubiquity of information access has made search a frequent activity, whether via Internet search engines or, increasingly, via mobile devices. Medical information search is in this respect no different, and much research has been devoted to analyzing the ways in which physicians access information. Medical image search is a much smaller domain but has gained much attention as it has different characteristics from search for text documents. While web search log files have been analysed many times to better understand user behaviour, the log files of hospital-internal systems for search in a PACS/RIS (Picture Archival and Communication System, Radiology Information System) have rarely been analysed. Such a comparison between a hospital PACS/RIS search and a web system for searching images of the biomedical literature is the goal of this paper. Objectives are to identify similarities and differences in the search behaviour of the two systems, which could then be used to optimize existing systems and build new search engines. Log files of the ARRS GoldMiner medical image search engine (freely accessible on the Internet) containing 222,005 queries, and log files of Stanford's internal PACS/RIS search, called radTF, containing 18,068 queries were analysed. Each query was preprocessed and all query terms were mapped to the RadLex (Radiology Lexicon) terminology, a comprehensive lexicon of radiology terms created and maintained by the Radiological Society of North America, so that the semantic content in the queries and the links between terms could be analysed and synonyms for the same concept could be detected. RadLex was mainly created for use in radiology reports, to aid structured reporting and the preparation of educational material (Langlotz, 2006) [1]. 
In standard medical vocabularies such as MeSH (Medical Subject Headings) and UMLS (Unified Medical Language System), radiology-specific terms are often underrepresented; therefore, RadLex was considered the best option for this task. The results show a surprising similarity between the usage behaviour in the two systems, but several subtle differences can also be noted. The average number of terms per query is 2.21 for GoldMiner and 2.07 for radTF; the RadLex axes used (anatomy, pathology, findings, …) have almost the same distribution, with clinical findings being the most frequent and anatomical entities second; combinations of RadLex axes are also extremely similar between the two systems. Differences include longer sessions in radTF than in GoldMiner (3.4 versus 1.9 queries per session on average). Several frequent search terms overlap, but some strong differences exist in the details. In radTF the term "normal" is frequent, whereas in GoldMiner it is not. This makes intuitive sense, as normal cases are rarely described in the literature, whereas in clinical work comparison with normal cases is often a first step. The general similarity in many respects is likely due to the fact that users of the two systems are influenced by their daily use of standard web search engines and follow this behaviour in their professional search. This means that many results and insights gained from standard web search can likely be transferred to more specialized search systems. Still, specialized log files can be used to learn more about reformulations and the detailed strategies users employ to find the right content. Copyright © 2015 Elsevier Inc. All rights reserved.
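Log-file statistics of the kind reported above, terms per query and queries per session, can be computed with a few lines of Python; the toy log format (session id, query string) is an assumption, not the actual GoldMiner or radTF log layout.

```python
# Illustrative query log: (session_id, query) pairs.
log = [
    ("s1", "liver lesion ct"),
    ("s1", "hepatic adenoma"),
    ("s2", "normal chest xray"),
]

# Average number of terms per query.
terms_per_query = sum(len(q.split()) for _, q in log) / len(log)

# Average number of queries per session.
sessions = {}
for sid, q in log:
    sessions.setdefault(sid, []).append(q)
queries_per_session = len(log) / len(sessions)

print(round(terms_per_query, 2), queries_per_session)
```

A real analysis would additionally map each term to RadLex before counting, so that synonyms collapse onto one concept.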

  14. The NIFSTD and BIRNLex Vocabularies: Building Comprehensive Ontologies for Neuroscience

    PubMed Central

    Bug, William J.; Ascoli, Giorgio A.; Grethe, Jeffrey S.; Gupta, Amarnath; Fennema-Notestine, Christine; Laird, Angela R.; Larson, Stephen D.; Rubin, Daniel; Shepherd, Gordon M.; Turner, Jessica A.; Martone, Maryann E.

    2009-01-01

    A critical component of the Neuroscience Information Framework (NIF) project is a consistent, flexible terminology for describing and retrieving neuroscience-relevant resources. Although the original NIF specification called for a loosely structured controlled vocabulary for describing neuroscience resources, as the NIF system evolved, the requirement for a formally structured ontology for neuroscience with sufficient granularity to describe and access a diverse collection of information became obvious. This requirement led to the NIF standardized (NIFSTD) ontology, a comprehensive collection of common neuroscience domain terminologies woven into an ontologically consistent, unified representation of the biomedical domains typically used to describe neuroscience data (e.g., anatomy, cell types, techniques), as well as digital resources (tools, databases) being created throughout the neuroscience community. NIFSTD builds upon a structure established by the BIRNLex, a lexicon of concepts covering clinical neuroimaging research developed by the Biomedical Informatics Research Network (BIRN) project. Each distinct domain module is represented using the Web Ontology Language (OWL). As much as has been practical, NIFSTD reuses existing community ontologies that cover the required biomedical domains, building the more specific concepts required to annotate NIF resources. By following this principle, an extensive vocabulary was assembled in a relatively short period of time for NIF information annotation, organization, and retrieval, in a form that promotes easy extension and modification. We report here on the structure of the NIFSTD, and its predecessor BIRNLex, the principles followed in its construction and provide examples of its use within NIF. PMID:18975148

  15. Normalizing biomedical terms by minimizing ambiguity and variability

    PubMed Central

    Tsuruoka, Yoshimasa; McNaught, John; Ananiadou, Sophia

    2008-01-01

Background: One of the difficulties in mapping biomedical named entities, e.g. genes, proteins, chemicals and diseases, to their concept identifiers stems from the potential variability of the terms. Soft string matching is a possible solution to the problem, but its inherently heavy computational cost discourages its use when the dictionaries are large or when real-time processing is required. A less computationally demanding approach is to normalize the terms by using heuristic rules, which enables us to look up a dictionary in constant time regardless of its size. The development of good heuristic rules, however, requires extensive knowledge of the terminology in question and is thus the bottleneck of the normalization approach. Results: We present a novel framework for discovering a list of normalization rules from a dictionary in a fully automated manner. The rules are discovered in such a way that they minimize the ambiguity and variability of the terms in the dictionary. We evaluated our algorithm using two large dictionaries: a human gene/protein name dictionary built from BioThesaurus and a disease name dictionary built from UMLS. Conclusions: The experimental results showed that automatically discovered rules can perform comparably to carefully crafted heuristic rules in term mapping tasks, and the computational overhead of rule application is small enough that a very fast implementation is possible. This work will help improve the performance of term-concept mapping tasks in biomedical information extraction, especially when good normalization heuristics for the target terminology are not fully known. PMID:18426547
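A rule-based normalization pipeline of this general shape can be sketched as follows; the three rules and the protein identifier are illustrative assumptions, not the rules the paper's algorithm actually discovers.

```python
import re

# Hypothetical rule set in the spirit of the paper: each rule rewrites
# a term toward a canonical form, collapsing surface variants.
RULES = [
    lambda t: t.lower(),                      # case normalization
    lambda t: t.replace("-", " "),            # hyphen/space variation
    lambda t: re.sub(r"\s+", " ", t).strip(), # whitespace collapsing
]

def normalize(term):
    for rule in RULES:
        term = rule(term)
    return term

# A dictionary keyed by normalized form gives constant-time lookup
# regardless of its size or how many variants exist.
dictionary = {normalize("NF-kappa B"): "P19838"}  # illustrative concept ID

for variant in ["NF-Kappa B", "nf kappa  b", "NF-KAPPA B"]:
    print(variant, "->", dictionary[normalize(variant)])
```

The paper's contribution is learning which rules to apply automatically, by measuring how each candidate rule changes ambiguity (distinct concepts per normalized form) and variability (distinct forms per concept) over the whole dictionary.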

  16. The NIFSTD and BIRNLex vocabularies: building comprehensive ontologies for neuroscience.

    PubMed

    Bug, William J; Ascoli, Giorgio A; Grethe, Jeffrey S; Gupta, Amarnath; Fennema-Notestine, Christine; Laird, Angela R; Larson, Stephen D; Rubin, Daniel; Shepherd, Gordon M; Turner, Jessica A; Martone, Maryann E

    2008-09-01

    A critical component of the Neuroscience Information Framework (NIF) project is a consistent, flexible terminology for describing and retrieving neuroscience-relevant resources. Although the original NIF specification called for a loosely structured controlled vocabulary for describing neuroscience resources, as the NIF system evolved, the requirement for a formally structured ontology for neuroscience with sufficient granularity to describe and access a diverse collection of information became obvious. This requirement led to the NIF standardized (NIFSTD) ontology, a comprehensive collection of common neuroscience domain terminologies woven into an ontologically consistent, unified representation of the biomedical domains typically used to describe neuroscience data (e.g., anatomy, cell types, techniques), as well as digital resources (tools, databases) being created throughout the neuroscience community. NIFSTD builds upon a structure established by the BIRNLex, a lexicon of concepts covering clinical neuroimaging research developed by the Biomedical Informatics Research Network (BIRN) project. Each distinct domain module is represented using the Web Ontology Language (OWL). As much as has been practical, NIFSTD reuses existing community ontologies that cover the required biomedical domains, building the more specific concepts required to annotate NIF resources. By following this principle, an extensive vocabulary was assembled in a relatively short period of time for NIF information annotation, organization, and retrieval, in a form that promotes easy extension and modification. We report here on the structure of the NIFSTD, and its predecessor BIRNLex, the principles followed in its construction and provide examples of its use within NIF.

  17. Teaching of laser medical topics: Latvian experience

    NASA Astrophysics Data System (ADS)

    Spigulis, Janis

    2002-10-01

A pilot program for Master's studies in Biomedical Optics was developed and launched at the University of Latvia in 1995. The curriculum contains several basic subjects such as Fundamentals of Biomedical Optics, Medical Lightguides, Anatomy and Physiology, Lasers and Non-coherent Light Sources, Optical Instrumentation for Healthcare, Optical Methods for Patient Treatment, and Basic Physics. Special English Terminology and Laboratory-Clinical Praxis are also included, and the Master's thesis is the final step for the degree award. Recently, a new extensive short course for medical laser users, "Lasers and Bio-optics in Medicine", was prepared in PowerPoint format and successfully presented in Latvia, Lithuania and Sweden.

  18. Research Trend Visualization by MeSH Terms from PubMed.

    PubMed

    Yang, Heyoung; Lee, Hyuck Jai

    2018-05-30

Motivation: PubMed is a primary source of biomedical information, comprising a search tool and the biomedical literature from MEDLINE (the US National Library of Medicine's premier bibliographic database), life science journals, and online books. Complementary tools to PubMed have been developed to help users search for literature and acquire knowledge. However, these tools are insufficient to overcome the difficulties users face given the proliferation of biomedical literature. A new method is needed for searching knowledge in the biomedical field. Methods: A new method is proposed in this study for visualizing recent research trends based on the documents retrieved for a search query given by the user. Medical Subject Headings (MeSH) are used as the primary analytical element. MeSH terms are extracted from the literature and the correlations between them are calculated. A MeSH network, called MeSH Net, is generated as the final result based on the Pathfinder Network algorithm. Results: A case study for verification of the proposed method was carried out on a research area defined by the search query (immunotherapy and cancer and "tumor microenvironment"). The MeSH Net generated by the method is in good agreement with the actual research activities in the research area (immunotherapy). Conclusion: A prototype application generating MeSH Net was developed. The application, which could be used as a "guide map for travelers", allows users to quickly and easily acquire knowledge of research trends. The combination of PubMed and MeSH Net is expected to be an effective complementary system for researchers in the biomedical field who experience difficulties with search and information analysis.
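The co-occurrence counting underlying such a MeSH network can be sketched as follows; the documents are invented, and raw pair counts stand in for the paper's correlation measure and Pathfinder pruning.

```python
from itertools import combinations
from collections import Counter

# MeSH annotations per retrieved document (illustrative toy data).
docs = [
    {"Immunotherapy", "Neoplasms", "Tumor Microenvironment"},
    {"Immunotherapy", "Neoplasms"},
    {"Neoplasms", "Tumor Microenvironment"},
]

# Count how often each pair of MeSH terms co-occurs in a document;
# these counts are the edge weights a Pathfinder Network would then
# prune down to the strongest links.
cooc = Counter()
for mesh in docs:
    for a, b in combinations(sorted(mesh), 2):
        cooc[(a, b)] += 1

for pair, n in cooc.most_common():
    print(pair, n)
```

Sorting each document's terms before pairing keeps (a, b) and (b, a) from being counted as distinct edges.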

  19. Hydrocephalus caused by unilateral foramen of Monro obstruction: A review on terminology

    PubMed Central

    Nigri, Flavio; Gobbi, Gabriel Neffa; da Costa Ferreira Pinto, Pedro Henrique; Simões, Elington Lannes; Caparelli-Daquer, Egas Moniz

    2016-01-01

Background: Hydrocephalus caused by unilateral foramen of Monro (FM) obstruction has been referred to in the literature by many different terms. Precise terminology describing hydrocephalus confined to just one lateral ventricle has very important prognostic value and determines whether or not the patient can be shunt free after an endoscopic procedure. Methods: Aiming to define the best term for unilateral FM obstruction, 19 terms were searched as quoted phrases on the PubMed database (http://www.ncbi.nlm.nih.gov/pubmed). Results: A total of 194 articles were found. Four patterns of hydrocephalus were discriminated as a result of our term query and were divided into types for didactic purposes. Type A - partial dilation of the lateral ventricle; Type B - pure unilateral obstruction of the FM; Type C - previously shunted patients with secondary obstruction of the FM; and Type D - asymmetric lateral ventricles with patent FM. Conclusion: In unilateral FM obstruction hydrocephalus, an in-depth review of terminology is critical to avoid mistakes that may compromise comparisons among different series. This terminology review suggests that Type B hydrocephalus, i.e., hydrocephalus confined to just one lateral ventricle with no other sites of cerebrospinal fluid circulation blockage, is best described by the terms unilateral hydrocephalus (UH) and monoventricular hydrocephalus, the first being by far the most popular. Type A hydrocephalus is best represented in the literature by the terms uniloculated hydrocephalus and loculated ventricle; Type C hydrocephalus by the terms isolated lateral ventricle and isolated UH; and Type D hydrocephalus by the term asymmetric hydrocephalus. PMID:27274402

  20. Development and evaluation of a biomedical search engine using a predicate-based vector space model.

    PubMed

    Kwak, Myungjae; Leroy, Gondy; Martinez, Jesse D; Harwell, Jeffrey

    2013-10-01

Although the biomedical information available in articles and patents is increasing exponentially, we continue to rely on the same information retrieval methods and use very few keywords to search millions of documents. We are developing a fundamentally different approach for finding much more precise and complete information with a single query, using predicates instead of keywords for both query and document representation. Predicates are triples that are more complex data structures than keywords and contain more structured information. To make optimal use of them, we developed a new predicate-based vector space model and query-document similarity function with an adjusted tf-idf weighting and a boost function. Using a test bed of 107,367 PubMed abstracts, we evaluated the first essential function: retrieving information. Cancer researchers provided 20 realistic queries, for which the top 15 abstracts were retrieved using a predicate-based (new) and a keyword-based (baseline) approach. Each abstract was evaluated, double-blind, by cancer researchers on a 0-5 point scale to calculate precision (0 versus higher) and relevance (0-5 score). Precision was significantly higher (p<.001) for the predicate-based (80%) than for the keyword-based (71%) approach. Relevance was almost doubled with the predicate-based approach: 2.1 versus 1.6 without rank order adjustment (p<.001) and 1.34 versus 0.98 with rank order adjustment (p<.001) for the predicate- versus keyword-based approach, respectively. Predicates can support more precise searching than keywords, laying the foundation for rich and sophisticated information search. Copyright © 2013 Elsevier Inc. All rights reserved.
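A predicate-based vector space can be sketched by treating each document as a bag of triples and weighting the triples with plain tf-idf; the triples, corpus, and weighting details below are illustrative assumptions, not the paper's adjusted tf-idf or boost function.

```python
import math
from collections import Counter

# Documents represented as bags of predicate triples rather than
# keywords; all triples here are invented for illustration.
docs = {
    "d1": [("tamoxifen", "TREATS", "breast cancer"),
           ("tamoxifen", "INHIBITS", "ESR1")],
    "d2": [("tamoxifen", "TREATS", "breast cancer")],
    "d3": [("aspirin", "TREATS", "pain")],
}

N = len(docs)
# Document frequency of each triple across the corpus.
df = Counter(t for triples in docs.values() for t in set(triples))

def tfidf(triples):
    """Vector over triples: term frequency times inverse document frequency."""
    tf = Counter(triples)
    return {t: tf[t] * math.log(N / df[t]) for t in tf}

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

query = [("tamoxifen", "TREATS", "breast cancer")]
scores = {d: cosine(tfidf(query), tfidf(triples)) for d, triples in docs.items()}
print(max(scores, key=scores.get))
```

Because whole triples are the vector dimensions, a document mentioning tamoxifen and breast cancer in an unrelated relation would score zero, which is the precision gain the predicate representation aims at.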

  1. Vaccine and Drug Ontology Studies (VDOS 2014).

    PubMed

    Tao, Cui; He, Yongqun; Arabandi, Sivaram

    2016-01-01

The "Vaccine and Drug Ontology Studies" (VDOS) international workshop series focuses on vaccine- and drug-related ontology modeling and applications. Drugs and vaccines have been critical to preventing and treating human and animal diseases. Work in the two areas is closely related, from preclinical research and development to manufacturing, clinical trials, government approval and regulation, and post-licensure usage surveillance and monitoring. Over the last decade, tremendous efforts have been made in the biomedical ontology community to ontologically represent various areas associated with vaccines and drugs: extending existing clinical terminology systems such as SNOMED, RxNorm, NDF-RT, and MedDRA; developing new models such as the Vaccine Ontology (VO) and the Ontology of Adverse Events (OAE); and incorporating vernacular medical terminologies such as the Consumer Health Vocabulary (CHV). The VDOS workshop series provides a platform for discussing innovative solutions as well as the challenges in the development and application of biomedical ontologies for representing and analyzing drugs and vaccines, their administration, host immune responses, adverse events, and other related topics. The five full-length papers included in this 2014 thematic issue focus on two main themes: (i) general vaccine/drug-related ontology development and exploration, and (ii) interaction and network-related ontology studies.

  2. Constructing a Graph Database for Semantic Literature-Based Discovery.

    PubMed

    Hristovski, Dimitar; Kastrin, Andrej; Dinevski, Dejan; Rindflesch, Thomas C

    2015-01-01

    Literature-based discovery (LBD) generates discoveries, or hypotheses, by combining what is already known in the literature. Potential discoveries have the form of relations between biomedical concepts; for example, a drug may be determined to treat a disease other than the one for which it was intended. LBD views the knowledge in a domain as a network; a set of concepts along with the relations between them. As a starting point, we used SemMedDB, a database of semantic relations between biomedical concepts extracted with SemRep from Medline. SemMedDB is distributed as a MySQL relational database, which has some problems when dealing with network data. We transformed and uploaded SemMedDB into the Neo4j graph database, and implemented the basic LBD discovery algorithms with the Cypher query language. We conclude that storing the data needed for semantic LBD is more natural in a graph database. Also, implementing LBD discovery algorithms is conceptually simpler with a graph query language when compared with standard SQL.
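The basic open-discovery (ABC) pattern that such a graph database supports can be sketched over an in-memory relation set; in the paper the edges live in Neo4j and the traversal is written in Cypher, but the logic is the same. The example relations echo Swanson's classic fish oil / Raynaud disease discovery.

```python
# Toy semantic-relation store: (subject, predicate, object) triples
# of the kind SemRep extracts from Medline (illustrative examples).
relations = {
    ("fish oil", "REDUCES", "blood viscosity"),
    ("blood viscosity", "ASSOCIATED_WITH", "Raynaud disease"),
    ("fish oil", "TREATS", "hypertension"),
}

def open_discovery(a):
    """A -> B -> C: propose concepts C reachable from A through an
    intermediate B but not yet directly linked to A in the literature."""
    known = {o for s, _, o in relations if s == a}
    hypotheses = set()
    for s1, _, b in relations:
        if s1 != a:
            continue
        for s2, _, c in relations:
            if s2 == b and c not in known and c != a:
                hypotheses.add(c)
    return hypotheses

print(open_discovery("fish oil"))  # {'Raynaud disease'}
```

In a graph database this nested loop collapses to a single two-hop path pattern, which is why the authors find a graph query language a more natural fit than SQL joins for LBD.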

  3. Structural Design and Physicochemical Foundations of Hydrogels for Biomedical Applications.

    PubMed

    Li, Qingyong; Ning, Zhengxiang; Ren, Jiaoyan; Liao, Wenzhen

    2018-01-01

    Biomedical research, also known as medical research, supports and promotes the development of knowledge in the field of medicine. Hydrogels have been used extensively in many biomedical fields due to their highly absorbent and flexible properties. Smart hydrogels, in particular, can respond to a broad range of external stimuli such as temperature, pH, light, and electric and magnetic fields. With excellent biocompatibility, tunable rheology, mechanical properties, porosity, and a hydrated molecular structure, hydrogels are considered promising candidates for simulating the local tissue microenvironment. In this review article, we focus mainly on the most recent developments in engineering synthetic hydrogels; moreover, their classification, properties, and especially their biomedical applications, including tissue engineering and cell scaffolding, drug and gene delivery, and immunotherapies and vaccines, are summarized and discussed. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  4. Applications of Nanoflowers in Biomedicine.

    PubMed

    Negahdary, Masoud; Heli, Hossein

    2018-02-14

    Nanotechnology has opened new windows for biomedical research and the treatment of diseases. Nanostructures with flower-like shapes (nanoflowers), which have distinctive morphologies and properties, have attracted the interest of many researchers. In this review, applications of nanoflowers in biomedical research and patents are investigated and reviewed from various aspects. Nanoflowers have attracted serious attention across biomedical fields such as cardiovascular disease, microbiology, sensors and biosensors, biochemical and cellular studies, cancer therapy, and healthcare. Their competitiveness against technologies currently in use has led to successful achievements in these biomedical studies. The use of nanoflowers in biomedicine improves accuracy, reduces the time needed to obtain results, reduces costs, creates optimal treatment conditions while avoiding side effects in the treatment of specific diseases, and increases functional strength. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  5. Finding and accessing diagrams in biomedical publications.

    PubMed

    Kuhn, Tobias; Luong, ThaiBinh; Krauthammer, Michael

    2012-01-01

    Complex relationships in biomedical publications are often communicated by diagrams such as bar and line charts, which are a very effective way of summarizing and communicating multi-faceted data sets. Given the ever-increasing amount of published data, we argue that the precise retrieval of such diagrams is of great value for answering specific and otherwise hard-to-meet information needs. To this end, we demonstrate the use of advanced image processing and classification for identifying bar and line charts by the shape and relative location of the different image elements that make up the charts. With recall and precision close to 90% for the detection of relevant figures, we discuss the use of this technology in an existing biomedical image search engine, and outline how it enables new forms of literature queries over biomedical relationships that are represented in these charts.

  6. Towards a Consistent and Scientifically Accurate Drug Ontology.

    PubMed

    Hogan, William R; Hanna, Josh; Joseph, Eric; Brochhausen, Mathias

    2013-01-01

    Our use case for comparative effectiveness research requires an ontology of drugs that enables querying National Drug Codes (NDCs) by active ingredient, mechanism of action, physiological effect, and therapeutic class of the drug products they represent. We conducted an ontological analysis of drugs from the realist perspective, and evaluated existing drug terminology, ontology, and database artifacts from (1) the technical perspective, (2) the perspective of pharmacology and medical science (3) the perspective of description logic semantics (if they were available in Web Ontology Language or OWL), and (4) the perspective of our realism-based analysis of the domain. No existing resource was sufficient. Therefore, we built the Drug Ontology (DrOn) in OWL, which we populated with NDCs and other classes from RxNorm using only content created by the National Library of Medicine. We also built an application that uses DrOn to query for NDCs as outlined above, available at: http://ingarden.uams.edu/ingredients. The application uses an OWL-based description logic reasoner to execute end-user queries. DrOn is available at http://code.google.com/p/dr-on.

  7. Unsupervised method for automatic construction of a disease dictionary from a large free text collection.

    PubMed

    Xu, Rong; Supekar, Kaustubh; Morgan, Alex; Das, Amar; Garber, Alan

    2008-11-06

    Concept specific lexicons (e.g. diseases, drugs, anatomy) are a critical source of background knowledge for many medical language-processing systems. However, the rapid pace of biomedical research and the lack of constraints on usage ensure that such dictionaries are incomplete. Focusing on disease terminology, we have developed an automated, unsupervised, iterative pattern learning approach for constructing a comprehensive medical dictionary of disease terms from randomized clinical trial (RCT) abstracts, and we compared different ranking methods for automatically extracting contextual patterns and concept terms. When used to identify disease concepts from 100 randomly chosen, manually annotated clinical abstracts, our disease dictionary shows significant performance improvement (F1 increased by 35-88%) over available, manually created disease terminologies.
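    One iteration of such an iterative pattern-learning loop can be sketched as follows. The corpus sentences, seed terms, and single slot pattern are hypothetical simplifications; the actual approach ranks candidate patterns and terms rather than accepting every match, and iterates until convergence.

```python
import re

# Toy corpus of RCT-abstract-like sentences (hypothetical examples)
corpus = [
    "patients with diabetes were randomized",
    "patients with asthma were randomized",
    "patients with migraine were randomized",
    "treatment of hypertension in adults",
]

seeds = {"diabetes", "asthma"}  # known disease terms to bootstrap from

def learn_patterns(sentences, terms):
    """Replace known terms with a capture slot to obtain contextual patterns."""
    patterns = set()
    for s in sentences:
        for t in terms:
            if t in s:
                patterns.add(s.replace(t, "(\\w+)"))
    return patterns

def apply_patterns(sentences, patterns):
    """Extract candidate terms matched by the learned patterns."""
    found = set()
    for s in sentences:
        for p in patterns:
            m = re.fullmatch(p, s)
            if m:
                found.add(m.group(1))
    return found

patterns = learn_patterns(corpus, seeds)
candidates = apply_patterns(corpus, patterns) - seeds
print(candidates)  # {'migraine'}
```

In a full system the newly accepted terms would be added to the seed set and the loop repeated, with ranking used to filter noisy patterns and candidates at each step.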

  8. Unsupervised Method for Automatic Construction of a Disease Dictionary from a Large Free Text Collection

    PubMed Central

    Xu, Rong; Supekar, Kaustubh; Morgan, Alex; Das, Amar; Garber, Alan

    2008-01-01

    Concept specific lexicons (e.g. diseases, drugs, anatomy) are a critical source of background knowledge for many medical language-processing systems. However, the rapid pace of biomedical research and the lack of constraints on usage ensure that such dictionaries are incomplete. Focusing on disease terminology, we have developed an automated, unsupervised, iterative pattern learning approach for constructing a comprehensive medical dictionary of disease terms from randomized clinical trial (RCT) abstracts, and we compared different ranking methods for automatically extracting contextual patterns and concept terms. When used to identify disease concepts from 100 randomly chosen, manually annotated clinical abstracts, our disease dictionary shows significant performance improvement (F1 increased by 35–88%) over available, manually created disease terminologies. PMID:18999169

  9. Discovering biomedical semantic relations in PubMed queries for information retrieval and database curation

    PubMed Central

    Huang, Chung-Chi; Lu, Zhiyong

    2016-01-01

    Identifying relevant papers from the literature is a common task in biocuration. Most current biomedical literature search systems primarily rely on matching user keywords. Semantic search, on the other hand, seeks to improve search accuracy by understanding the entities and contextual relations in user keywords. However, past research has mostly focused on semantically identifying biological entities (e.g. chemicals, diseases and genes) with little effort on discovering semantic relations. In this work, we aim to discover biomedical semantic relations in PubMed queries in an automated and unsupervised fashion. Specifically, we focus on extracting and understanding the contextual information (or context patterns) that is used by PubMed users to represent semantic relations between entities such as ‘CHEMICAL-1 compared to CHEMICAL-2.’ With the advances in automatic named entity recognition, we first tag entities in PubMed queries and then use tagged entities as knowledge to recognize pattern semantics. More specifically, we transform PubMed queries into context patterns involving participating entities, which are subsequently projected to latent topics via latent semantic analysis (LSA) to avoid the data sparseness and specificity issues. Finally, we mine semantically similar contextual patterns or semantic relations based on LSA topic distributions. Our two separate evaluation experiments of chemical-chemical (CC) and chemical–disease (CD) relations show that the proposed approach significantly outperforms a baseline method, which simply measures pattern semantics by similarity in participating entities. The highest performance achieved by our approach is nearly 0.9 and 0.85 respectively for the CC and CD task when compared against the ground truth in terms of normalized discounted cumulative gain (nDCG), a standard measure of ranking quality. 
These results suggest that our approach can effectively identify and return related semantic patterns in a ranked order covering diverse bio-entity relations. To assess the potential utility of our automated top-ranked patterns of a given relation in semantic search, we performed a pilot study on frequently sought semantic relations in PubMed and observed improved literature retrieval effectiveness based on post-hoc human relevance evaluation. Further investigation in larger tests and in real-world scenarios is warranted. PMID:27016698
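    The projection of context patterns to latent topics via LSA can be sketched with a truncated SVD over a pattern-by-entity-pair count matrix; similar patterns then end up close in topic space. The patterns, entity pairs, and counts below are toy illustrations, not data from the study.

```python
import numpy as np

# pattern x entity-pair co-occurrence counts (hypothetical toy data)
patterns = ["X compared to Y", "X versus Y", "X induced Y", "X caused Y"]
pairs = ["aspirin|ibuprofen", "statin|fibrate", "drugX|nausea"]
M = np.array([
    [4, 2, 0],
    [3, 3, 0],
    [0, 0, 5],
    [0, 0, 4],
], dtype=float)

# LSA: truncated SVD projects each pattern to k latent topics
U, S, Vt = np.linalg.svd(M, full_matrices=False)
k = 2
topic_vecs = U[:, :k] * S[:k]  # one row of topic weights per pattern

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# The two comparison patterns should be closer to each other in topic
# space than a comparison pattern is to a causation pattern.
sim_compare = cosine(topic_vecs[0], topic_vecs[1])
sim_cause = cosine(topic_vecs[0], topic_vecs[2])
print(sim_compare > sim_cause)  # True
```

Mining semantically similar patterns then reduces to ranking pattern pairs by this topic-space similarity, which is robust to the sparseness of raw pattern-entity counts.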

  10. Image BOSS: a biomedical object storage system

    NASA Astrophysics Data System (ADS)

    Stacy, Mahlon C.; Augustine, Kurt E.; Robb, Richard A.

    1997-05-01

    Researchers using biomedical images have data management needs that are orthogonal to those of clinical PACS. The Image BOSS system is designed to permit researchers to organize and select images based on research topic, image metadata, and a thumbnail of the image. Image information is captured from existing images in a Unix-based filesystem, stored in an object-oriented database, and presented to the user in a familiar laboratory notebook metaphor. In addition, Image BOSS is designed to provide an extensible infrastructure for future content-based queries directly on the images.

  11. vSPARQL: A View Definition Language for the Semantic Web

    PubMed Central

    Shaw, Marianne; Detwiler, Landon T.; Noy, Natalya; Brinkley, James; Suciu, Dan

    2010-01-01

    Translational medicine applications would like to leverage the biological and biomedical ontologies, vocabularies, and data sets available on the semantic web. We present a general solution for RDF information set reuse inspired by database views. Our view definition language, vSPARQL, allows applications to specify the exact content that they are interested in and how that content should be restructured or modified. Applications can access relevant content by querying against these view definitions. We evaluate the expressivity of our approach by defining views for practical use cases and comparing our view definition language to existing query languages. PMID:20800106

  12. Meeting medical terminology needs--the Ontology-Enhanced Medical Concept Mapper.

    PubMed

    Leroy, G; Chen, H

    2001-12-01

    This paper describes the development and testing of the Medical Concept Mapper, a tool designed to facilitate access to online medical information sources by providing users with appropriate medical search terms for their personal queries. Our system is valuable for patients whose knowledge of medical vocabularies is inadequate to find the desired information, and for medical experts who search for information outside their field of expertise. The Medical Concept Mapper maps synonyms and semantically related concepts to a user's query. The system is unique because it integrates our natural language processing tool, i.e., the Arizona (AZ) Noun Phraser, with human-created ontologies, the Unified Medical Language System (UMLS) and WordNet, and our computer generated Concept Space, into one system. Our unique contribution results from combining the UMLS Semantic Net with Concept Space in our deep semantic parsing (DSP) algorithm. This algorithm establishes a medical query context based on the UMLS Semantic Net, which allows Concept Space terms to be filtered so as to isolate related terms relevant to the query. We performed two user studies in which Medical Concept Mapper terms were compared against human experts' terms. We conclude that the AZ Noun Phraser is well suited to extract medical phrases from user queries, that WordNet is not well suited to provide strictly medical synonyms, that the UMLS Metathesaurus is well suited to provide medical synonyms, and that Concept Space is well suited to provide related medical terms, especially when these terms are limited by our DSP algorithm.

  13. Exploring performance issues for a clinical database organized using an entity-attribute-value representation.

    PubMed

    Chen, R S; Nadkarni, P; Marenco, L; Levin, F; Erdos, J; Miller, P L

    2000-01-01

    The entity-attribute-value representation with classes and relationships (EAV/CR) provides a flexible and simple database schema to store heterogeneous biomedical data. In certain circumstances, however, the EAV/CR model is known to retrieve data less efficiently than conventionally based database schemas. To perform a pilot study that systematically quantifies performance differences for database queries directed at real-world microbiology data modeled with EAV/CR and conventional representations, and to explore the relative merits of different EAV/CR query implementation strategies. Clinical microbiology data obtained over a ten-year period were stored using both database models. Query execution times were compared for four clinically oriented attribute-centered and entity-centered queries operating under varying conditions of database size and system memory. The performance characteristics of three different EAV/CR query strategies were also examined. Performance was similar for entity-centered queries in the two database models. Performance in the EAV/CR model was approximately three to five times less efficient than its conventional counterpart for attribute-centered queries. The differences in query efficiency became slightly greater as database size increased, although they were reduced with the addition of system memory. The authors found that EAV/CR queries formulated using multiple, simple SQL statements executed in batch were more efficient than single, large SQL statements. This paper describes a pilot project to explore issues in and compare query performance for EAV/CR and conventional database representations. Although attribute-centered queries were less efficient in the EAV/CR model, these inefficiencies may be addressable, at least in part, by the use of more powerful hardware or more memory, or both.
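    The contrast between the two schemas can be sketched with an attribute-centered query written against each. The table names and microbiology values below are hypothetical, and the sketch illustrates only the query shapes, not the performance measurements reported in the study.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Conventional schema: one column per attribute
cur.execute("CREATE TABLE culture (id INTEGER, organism TEXT, site TEXT)")
cur.executemany("INSERT INTO culture VALUES (?,?,?)",
                [(1, "E. coli", "urine"), (2, "S. aureus", "blood")])

# EAV schema: one row per (entity, attribute, value) fact
cur.execute("CREATE TABLE eav (entity INTEGER, attribute TEXT, value TEXT)")
cur.executemany("INSERT INTO eav VALUES (?,?,?)", [
    (1, "organism", "E. coli"), (1, "site", "urine"),
    (2, "organism", "S. aureus"), (2, "site", "blood"),
])

# Attribute-centered query: all entities whose organism is E. coli
conventional = cur.execute(
    "SELECT id FROM culture WHERE organism = 'E. coli'").fetchall()

# In EAV the same question filters the single fact table on the
# attribute name, and every additional attribute condition requires
# another pass or self-join; this is one source of the slowdown for
# attribute-centered queries in the EAV model.
eav = cur.execute(
    "SELECT entity FROM eav WHERE attribute = 'organism' "
    "AND value = 'E. coli'").fetchall()

print(conventional, eav)  # [(1,)] [(1,)]
```

Entity-centered queries ("show all facts about entity 1") read a contiguous slice of the fact table, which is consistent with the similar performance the authors observed for that query class.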

  14. Finding relevant biomedical datasets: the UC San Diego solution for the bioCADDIE Retrieval Challenge

    PubMed Central

    Wei, Wei; Ji, Zhanglong; He, Yupeng; Zhang, Kai; Ha, Yuanchi; Li, Qi; Ohno-Machado, Lucila

    2018-01-01

    The number and diversity of biomedical datasets grew rapidly in the last decade. A large number of datasets are stored in various repositories, with different formats. Existing dataset retrieval systems lack the capability of cross-repository search. As a result, users spend time searching datasets in known repositories, and they typically do not find new repositories. The biomedical and healthcare data discovery index ecosystem (bioCADDIE) team organized a challenge to solicit new indexing and searching strategies for retrieving biomedical datasets across repositories. We describe the work of one team that built a retrieval pipeline and examined its performance. The pipeline used online resources to supplement dataset metadata, automatically generated queries from users’ free-text questions, produced high-quality retrieval results and achieved the highest inferred Normalized Discounted Cumulative Gain among competitors. The results showed that it is a promising solution for cross-database, cross-domain and cross-repository biomedical dataset retrieval. Database URL: https://github.com/w2wei/dataset_retrieval_pipeline PMID:29688374

  15. The BioIntelligence Framework: a new computational platform for biomedical knowledge computing

    PubMed Central

    Farley, Toni; Kiefer, Jeff; Lee, Preston; Von Hoff, Daniel; Trent, Jeffrey M; Colbourn, Charles

    2013-01-01

    Breakthroughs in molecular profiling technologies are enabling a new data-intensive approach to biomedical research, with the potential to revolutionize how we study, manage, and treat complex diseases. The next great challenge for clinical applications of these innovations will be to create scalable computational solutions for intelligently linking complex biomedical patient data to clinically actionable knowledge. Traditional database management systems (DBMS) are not well suited to representing complex syntactic and semantic relationships in unstructured biomedical information, introducing barriers to realizing such solutions. We propose a scalable computational framework for addressing this need, which leverages a hypergraph-based data model and query language that may be better suited for representing complex multi-lateral, multi-scalar, and multi-dimensional relationships. We also discuss how this framework can be used to create rapid learning knowledge base systems to intelligently capture and relate complex patient data to biomedical knowledge in order to automate the recovery of clinically actionable information. PMID:22859646

  16. View-Based Searching Systems--Progress Towards Effective Disintermediation.

    ERIC Educational Resources Information Center

    Pollitt, A. Steven; Smith, Martin P.; Treglown, Mark; Braekevelt, Patrick

    This paper presents the background and then reports progress made in the development of two view-based searching systems--HIBROWSE for EMBASE, searching Europe's most important biomedical bibliographic database, and HIBROWSE for EPOQUE, improving access to the European Parliament's Online Query System. The HIBROWSE approach to searching promises…

  17. Invited article: is it time for neurohospitalists?

    PubMed

    Freeman, William D; Gronseth, Gary; Eidelman, Benjamin H

    2008-04-08

    Explosive growth of hospital-based medicine specialists, termed hospitalists, has occurred in the past decade. This was fueled by pressures within the American health care system for timely, cost-effective, and high-quality care and by the growing chasm between inpatient and outpatient care. In this article, we sought to answer five questions: 1) What is a neurohospitalist? 2) How many neurohospitalists practice in the United States? 3) What are potential advantages of neurohospitalists? 4) What are the challenges of implementing a neurohospitalist practice? 5) What effect does a neurohospitalist have on clinical outcomes? We queried biomedical databases (e.g., PubMed) by using the search terms "hospitalist," "neurohospitalist," and "neurology hospitalist." We also searched the Society of Hospital Medicine and the American Academy of Neurology Dendrite classified advertisement Web sites for hospitalist and neurology hospitalist growth by using the same search terms. We defined neurology hospitalists (neurohospitalists) as neurologists who devote at least one-quarter of their time managing inpatients with neurologic disease. Although the number of hospitalists has grown considerably over the past decade, limited data on neurohospitalists exist. Advertisements for neurohospitalist positions have increased from 2003 through 2007, but accurate assessment of growth is limited by the lack of a central organizational affiliation and unifying terminology. Health care pressures spawned the growth of medicine and pediatric hospitalists, who provide efficient, cost-effective care by reducing the length of hospitalization. Because neurologists experience the same pressures, we expect neurohospitalists to increase in number, especially within areas that have sufficient inpatient volume and resources.

  18. Predicate Oriented Pattern Analysis for Biomedical Knowledge Discovery

    PubMed Central

    Shen, Feichen; Liu, Hongfang; Sohn, Sunghwan; Larson, David W.; Lee, Yugyung

    2017-01-01

    In the current biomedical data movement, numerous efforts have been made to convert and normalize large amounts of traditional structured and unstructured data (e.g., EHRs, reports) into semi-structured data (e.g., RDF, OWL). With increasing amounts of semi-structured data coming into the biomedical community, data integration and knowledge discovery across heterogeneous domains have become important research problems. At the application level, detecting related concepts among medical ontologies is an important goal of life science research. It is crucial to determine how different concepts are related, within a single ontology or across multiple ontologies, by analyzing the predicates in different knowledge bases. In today's world of information explosion, however, it is extremely difficult for biomedical researchers to find existing or potential predicates for linking cross-domain concepts without support from schema pattern analysis. There is therefore a need for a mechanism that performs predicate-oriented pattern analysis to partition heterogeneous ontologies into smaller, more closely related topics, and that generates queries to discover cross-domain knowledge from each topic. In this paper, we present such a model: it analyzes predicate-oriented patterns based on the close relationships among predicates and generates a similarity matrix. Based on this similarity matrix, we apply a novel unsupervised learning algorithm to partition large data sets into smaller, closer topics and to generate meaningful queries that fully discover knowledge over a set of interlinked data sources. We have implemented a prototype system named BmQGen and evaluated the proposed model on a colorectal surgical cohort from the Mayo Clinic. PMID:28983419

  19. Finding and Accessing Diagrams in Biomedical Publications

    PubMed Central

    Kuhn, Tobias; Luong, ThaiBinh; Krauthammer, Michael

    2012-01-01

    Complex relationships in biomedical publications are often communicated by diagrams such as bar and line charts, which are a very effective way of summarizing and communicating multi-faceted data sets. Given the ever-increasing amount of published data, we argue that the precise retrieval of such diagrams is of great value for answering specific and otherwise hard-to-meet information needs. To this end, we demonstrate the use of advanced image processing and classification for identifying bar and line charts by the shape and relative location of the different image elements that make up the charts. With recall and precision close to 90% for the detection of relevant figures, we discuss the use of this technology in an existing biomedical image search engine, and outline how it enables new forms of literature queries over biomedical relationships that are represented in these charts. PMID:23304318

  20. Prototype of Multifunctional Full-text Library in the Architecture Web-browser / Web-server / SQL-server

    NASA Astrophysics Data System (ADS)

    Lyapin, Sergey; Kukovyakin, Alexey

    Within the framework of the research program "Textaurus," an operational prototype of the multifunctional library T-Libra v.4.1 has been created, which makes it possible to carry out flexible, parametrizable search within a full-text database. The information system is realized in the architecture Web-browser / Web-server / SQL-server. This makes it possible to combine universality and efficiency of text processing, on the one hand, with convenience and minimal cost for the end user (a standard Web browser serves as the client application), on the other. The following principles underlie the information system: (a) multifunctionality; (b) intelligence; (c) multilingual primary texts and full-text searching; (d) development of the digital library (DL) by a user (an "administrative client"); (e) multi-platform operation. A "library of concepts" (a block of functional models of semantic, concept-oriented searching), together with a closely connected subsystem of parametrizable queries to the full-text database, serves as the conceptual basis for the multifunctionality and "intelligence" of the DL T-Libra v.4.1. The author's paragraph is the unit of full-text searching in the suggested technology, and the "logic" of an educational or scientific topic or problem can be built into a multilevel, flexible query structure and the "library of concepts," which developers and experts can extend. About 10 queries of various levels of complexity and conceptuality are realized in this version of the information system, from simple terminological searching (taking into account the lexical and grammatical paradigms of Russian) to several kinds of explication of terminological fields and tunable two-parameter thematic searching (the parameters being a set of terms and the distance between terms within an author's paragraph).

  1. Meshable: searching PubMed abstracts by utilizing MeSH and MeSH-derived topical terms.

    PubMed

    Kim, Sun; Yeganova, Lana; Wilbur, W John

    2016-10-01

    Medical Subject Headings (MeSH®) is a controlled vocabulary for indexing and searching biomedical literature. MeSH terms and subheadings are organized in a hierarchical structure and are used to indicate the topics of an article. Biologists can use either MeSH terms as queries or the MeSH interface provided in PubMed® for searching PubMed abstracts. However, these are rarely used, and there is no convenient way to link standardized MeSH terms to user queries. Here, we introduce a web interface which allows users to enter queries to find MeSH terms closely related to the queries. Our method relies on co-occurrence of text words and MeSH terms to find keywords that are related to each MeSH term. A query is then matched with the keywords for MeSH terms, and candidate MeSH terms are ranked based on their relatedness to the query. The experimental results show that our method achieves the best performance among several term extraction approaches in terms of topic coherence. Moreover, the interface can be effectively used to find full names of abbreviations and to disambiguate user queries. Availability: https://www.ncbi.nlm.nih.gov/IRET/MESHABLE/. Contact: sun.kim@nih.gov. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
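    The co-occurrence-based matching described above can be sketched as follows. The keyword profiles and weights are invented for illustration; they are not Meshable's actual data or scoring formula.

```python
# Hypothetical keyword profiles: text words that co-occur with each
# MeSH term, weighted by co-occurrence strength (illustrative numbers).
mesh_keywords = {
    "Myocardial Infarction": {"heart": 0.9, "attack": 0.8, "infarct": 0.7},
    "Migraine Disorders":    {"headache": 0.9, "aura": 0.6},
    "Hypertension":          {"blood": 0.8, "pressure": 0.9},
}

def rank_mesh(query):
    """Rank MeSH terms by the summed weight of query words in their profiles."""
    words = query.lower().split()
    scores = {}
    for term, kws in mesh_keywords.items():
        s = sum(kws.get(w, 0.0) for w in words)
        if s > 0:
            scores[term] = s
    return sorted(scores, key=scores.get, reverse=True)

print(rank_mesh("heart attack"))  # ['Myocardial Infarction']
```

Because matching goes through free-text keywords rather than exact MeSH strings, a lay query such as "heart attack" can still surface the standardized heading.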

  2. BIOMedical Search Engine Framework: Lightweight and customized implementation of domain-specific biomedical search engines.

    PubMed

    Jácome, Alberto G; Fdez-Riverola, Florentino; Lourenço, Anália

    2016-07-01

    Text mining and semantic analysis approaches can be applied to the construction of biomedical domain-specific search engines and provide an attractive alternative to create personalized and enhanced search experiences. Therefore, this work introduces the new open-source BIOMedical Search Engine Framework for the fast and lightweight development of domain-specific search engines. The rationale behind this framework is to incorporate core features typically available in search engine frameworks with flexible and extensible technologies to retrieve biomedical documents, annotate meaningful domain concepts, and develop highly customized Web search interfaces. The BIOMedical Search Engine Framework integrates taggers for major biomedical concepts, such as diseases, drugs, genes, proteins, compounds and organisms, and enables the use of domain-specific controlled vocabulary. Technologies from the Typesafe Reactive Platform, the AngularJS JavaScript framework and the Bootstrap HTML/CSS framework support the customization of the domain-oriented search application. Moreover, the RESTful API of the BIOMedical Search Engine Framework allows the integration of the search engine into existing systems or a complete web interface personalization. The construction of the Smart Drug Search is described as proof-of-concept of the BIOMedical Search Engine Framework. This public search engine catalogs scientific literature about antimicrobial resistance, microbial virulence and topics alike. The keyword-based queries of the users are transformed into concepts and search results are presented and ranked accordingly. The semantic graph view portrays all the concepts found in the results, and the researcher may look into the relevance of different concepts, the strength of direct relations, and non-trivial, indirect relations. The number of occurrences of a concept shows its importance to the query, and the frequency of concept co-occurrence is indicative of biological relations meaningful to that particular scope of research. Conversely, indirect concept associations, i.e. concepts related by other intermediary concepts, can be useful to integrate information from different studies and look into non-trivial relations. The BIOMedical Search Engine Framework supports the development of domain-specific search engines. The key strengths of the framework are modularity and extensibility in terms of software design, the use of open-source consolidated Web technologies, and the ability to integrate any number of biomedical text mining tools and information resources. Currently, the Smart Drug Search keeps over 1,186,000 documents, containing more than 11,854,000 annotations for 77,200 different concepts. The Smart Drug Search is publicly accessible at http://sing.ei.uvigo.es/sds/. The BIOMedical Search Engine Framework is freely available for non-commercial use at https://github.com/agjacome/biomsef. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  3. BioPortal: enhanced functionality via new Web services from the National Center for Biomedical Ontology to access and use ontologies in software applications.

    PubMed

    Whetzel, Patricia L; Noy, Natalya F; Shah, Nigam H; Alexander, Paul R; Nyulas, Csongor; Tudorache, Tania; Musen, Mark A

    2011-07-01

    The National Center for Biomedical Ontology (NCBO) is one of the National Centers for Biomedical Computing funded under the NIH Roadmap Initiative. Contributing to the national computing infrastructure, NCBO has developed BioPortal, a web portal that provides access to a library of biomedical ontologies and terminologies (http://bioportal.bioontology.org) via the NCBO Web services. BioPortal enables community participation in the evaluation and evolution of ontology content by providing features to add mappings between terms, to add comments linked to specific ontology terms and to provide ontology reviews. The NCBO Web services (http://www.bioontology.org/wiki/index.php/NCBO_REST_services) enable this functionality and provide a uniform mechanism to access ontologies from a variety of knowledge representation formats, such as Web Ontology Language (OWL) and Open Biological and Biomedical Ontologies (OBO) format. The Web services provide multi-layered access to the ontology content, from getting all terms in an ontology to retrieving metadata about a term. Users can easily incorporate the NCBO Web services into software applications to generate semantically aware applications and to facilitate structured data collection.

  4. Auditing Associative Relations across Two Knowledge Sources

    PubMed Central

    Vizenor, Lowell T.; Bodenreider, Olivier; McCray, Alexa T.

    2009-01-01

    Objectives This paper proposes a novel semantic method for auditing associative relations in biomedical terminologies. We tested our methodology on two Unified Medical Language System (UMLS) knowledge sources. Methods We use the UMLS semantic groups as high-level representations of the domain and range of relationships in the Metathesaurus and in the Semantic Network. A mapping created between Metathesaurus relationships and Semantic Network relationships forms the basis for comparing the signatures of a given Metathesaurus relationship to the signatures of the semantic relationship to which it is mapped. The consistency of Metathesaurus relations is studied for each relationship. Results Of the 177 associative relationships in the Metathesaurus, 84 (48%) exhibit a high degree of consistency with the corresponding Semantic Network relationships. Overall, 63% of the 1.8M associative relations in the Metathesaurus are consistent with relations in the Semantic Network. Conclusion The semantics of associative relationships in biomedical terminologies should be defined explicitly by their developers. The Semantic Network would benefit from being extended with new relationships and with new relations for some existing relationships. The UMLS editing environment could take advantage of the correspondence established between relationships in the Metathesaurus and the Semantic Network. Finally, the auditing method also yielded useful information for refining the mapping of associative relationships between the two sources. PMID:19475724
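
    The auditing method above compares the semantic-group signature of a Metathesaurus relationship against the Semantic Network relationship it maps to. The group names and counts below are invented toy data; a sketch of that consistency computation:

```python
def consistency(instances, sn_signature):
    """Fraction of Metathesaurus relation instances whose
    (domain group, range group) pair also occurs in the mapped
    Semantic Network relationship's signature."""
    ok = sum(1 for pair in instances if pair in sn_signature)
    return ok / len(instances)

# Toy data: semantic-group signatures for a 'treats'-like relationship.
sn_treats = {("Chemicals & Drugs", "Disorders")}
meta_treats = [
    ("Chemicals & Drugs", "Disorders"),  # consistent
    ("Chemicals & Drugs", "Disorders"),  # consistent
    ("Procedures", "Disorders"),         # inconsistent instance
]
rate = consistency(meta_treats, sn_treats)  # 2 of 3 instances consistent
```

    Relationships with a low rate are candidates for auditing, mirroring the paper's finding that only some relationships exhibit a high degree of consistency.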

  5. What Four Million Mappings Can Tell You about Two Hundred Ontologies

    NASA Astrophysics Data System (ADS)

    Ghazvinian, Amir; Noy, Natalya F.; Jonquet, Clement; Shah, Nigam; Musen, Mark A.

    The field of biomedicine has embraced the Semantic Web probably more than any other field. As a result, there is a large number of biomedical ontologies covering overlapping areas of the field. We have developed BioPortal—an open community-based repository of biomedical ontologies. We analyzed ontologies and terminologies in BioPortal and the Unified Medical Language System (UMLS), creating more than 4 million mappings between concepts in these ontologies and terminologies based on the lexical similarity of concept names and synonyms. We then analyzed the mappings and what they tell us about the ontologies themselves, the structure of the ontology repository, and the ways in which the mappings can help in the process of ontology design and evaluation. For example, we can use the mappings to guide users who are new to a field to the most pertinent ontologies in that field, to identify areas of the domain that are not covered sufficiently by the ontologies in the repository, and to identify which ontologies will serve well as background knowledge in domain-specific tools. While we used a specific (but large) ontology repository for the study, we believe that the lessons we learned about the value of a large-scale set of mappings to ontology users and developers are general and apply in many other domains.
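
    The mappings described above were created from the lexical similarity of concept names and synonyms. The concept identifiers and names below are invented; a sketch of that idea, matching concepts whose preferred name or any synonym agrees after normalization:

```python
import re

def normalize(name):
    """Lowercase, strip punctuation, collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", " ", name.lower())).strip()

def lexical_mappings(onto_a, onto_b):
    """Map concept IDs across two ontologies when any of their
    names or synonyms match after normalization."""
    index = {}
    for cid, names in onto_b.items():
        for n in names:
            index.setdefault(normalize(n), set()).add(cid)
    return {(a, b)
            for a, names in onto_a.items()
            for n in names
            for b in index.get(normalize(n), ())}

a = {"A:1": ["Myocardial Infarction", "heart attack"]}
b = {"B:9": ["Heart attack"], "B:7": ["Stroke"]}
maps = lexical_mappings(a, b)
```

    At BioPortal scale this simple indexing step is what makes computing millions of candidate mappings tractable.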

  6. The Clinical Practice Library of Medicine (CPLM): An on-line biomedical computer library. System documentation

    NASA Technical Reports Server (NTRS)

    Grams, R. R.

    1982-01-01

    A system designed to access a large range of available medical textbook information in an online interactive fashion is described. A high level query type database manager, INQUIRE, is used. Operating instructions, system flow diagrams, database descriptions, text generation, and error messages are discussed. User information is provided.

  7. vSPARQL: a view definition language for the semantic web.

    PubMed

    Shaw, Marianne; Detwiler, Landon T; Noy, Natalya; Brinkley, James; Suciu, Dan

    2011-02-01

    Translational medicine applications would like to leverage the biological and biomedical ontologies, vocabularies, and data sets available on the semantic web. We present a general solution for RDF information set reuse inspired by database views. Our view definition language, vSPARQL, allows applications to specify the exact content that they are interested in and how that content should be restructured or modified. Applications can access relevant content by querying against these view definitions. We evaluate the expressivity of our approach by defining views for practical use cases and comparing our view definition language to existing query languages. Copyright © 2010 Elsevier Inc. All rights reserved.
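
    vSPARQL itself extends SPARQL with view definitions. As a language-neutral illustration of the database-view idea the record describes, a sketch that materializes a restructured subset of an RDF-style triple set (the predicate names are invented):

```python
def define_view(triples, predicate, new_predicate):
    """A 'view' over a triple set: select triples with a given
    predicate and restructure them under a new predicate, so an
    application can query the view instead of the full data set."""
    return {(s, new_predicate, o)
            for s, p, o in triples if p == predicate}

data = {("aspirin", "ex:treats", "headache"),
        ("aspirin", "ex:hasForm", "tablet"),
        ("ibuprofen", "ex:treats", "pain")}
view = define_view(data, "ex:treats", "view:indication")
```

    In vSPARQL the selection and restructuring would be expressed declaratively in the view definition rather than in application code.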

  8. A Comparison of Evaluation Metrics for Biomedical Journals, Articles, and Websites in Terms of Sensitivity to Topic

    PubMed Central

    Fu, Lawrence D.; Aphinyanaphongs, Yindalon; Wang, Lily; Aliferis, Constantin F.

    2011-01-01

    Evaluating the biomedical literature and health-related websites for quality are challenging information retrieval tasks. Current commonly used methods include impact factor for journals, PubMed’s clinical query filters and machine learning-based filter models for articles, and PageRank for websites. Previous work has focused on the average performance of these methods without considering the topic, and it is unknown how performance varies for specific topics or focused searches. Clinicians, researchers, and users should be aware when expected performance is not achieved for specific topics. The present work analyzes the behavior of these methods for a variety of topics. Impact factor, clinical query filters, and PageRank vary widely across different topics while a topic-specific impact factor and machine learning-based filter models are more stable. The results demonstrate that a method may perform excellently on average but struggle when used on a number of narrower topics. Topic adjusted metrics and other topic robust methods have an advantage in such situations. Users of traditional topic-sensitive metrics should be aware of their limitations. PMID:21419864

  9. BIOSMILE web search: a web application for annotating biomedical entities and relations.

    PubMed

    Dai, Hong-Jie; Huang, Chi-Hsin; Lin, Ryan T K; Tsai, Richard Tzong-Han; Hsu, Wen-Lian

    2008-07-01

    BIOSMILE web search (BWS) is a web-based NCBI-PubMed search application that analyzes articles for selected biomedical verbs and gives users relational information, such as subject, object, location, manner and time. After receiving keyword query input, BWS retrieves matching PubMed abstracts and lists them along with snippets in order of relevancy to protein-protein interaction. Users can then select articles for further analysis, and BWS will find and mark up biomedical relations in the text. The analysis results can be viewed in the abstract text or in table form. To date, BWS has been field tested by over 30 biologists, and questionnaires have shown that subjects are highly satisfied with its capabilities and usability. BWS is accessible free of charge at http://bioservices.cse.yzu.edu.tw/BWS.

  10. Isosemantic rendering of clinical information using formal ontologies and RDF.

    PubMed

    Martínez-Costa, Catalina; Bosca, Diego; Legaz-García, Mari Carmen; Tao, Cui; Fernández Breis, Jesualdo Tomás; Schulz, Stefan; Chute, Christopher G

    2013-01-01

    The generation of a semantic clinical infostructure requires linking ontologies, clinical models and terminologies [1]. Here we describe an approach that would permit data coming from different sources and represented in different standards to be queried in a homogeneous and integrated way. Our assumption is that data providers should be able to agree and share the meaning of the data they want to exchange and to exploit. We will describe how Clinical Element Model (CEM) and OpenEHR datasets can be jointly exploited in Semantic Web environments.

  11. Cell-based Assays for Assessing Toxicity: A Basic Guide.

    PubMed

    Parboosing, Raveen; Mzobe, Gugulethu; Chonco, Louis; Moodley, Indres

    2016-01-01

    Assessment of toxicity is an important component of the drug discovery process. Cell-based assays are a popular choice for assessing cytotoxicity. However, these assays are complex because of the wide variety of formats and methods available, the lack of standardization, confusing terminology and the inherent variability of biological systems and measurements. This review is intended as a guide on how to take these factors into account when planning, conducting and/or interpreting cell-based toxicity assays. Copyright © Bentham Science Publishers; For any queries, please email epub@benthamscience.org.

  12. The National Institutes of Health's Biomedical Translational Research Information System (BTRIS): Design, Contents, Functionality and Experience to Date

    PubMed Central

    Cimino, James J.; Ayres, Elaine J.; Remennik, Lyubov; Rath, Sachi; Freedman, Robert; Beri, Andrea; Chen, Yang; Huser, Vojtech

    2013-01-01

    The US National Institutes of Health (NIH) has developed the Biomedical Translational Research Information System (BTRIS) to support researchers’ access to translational and clinical data. BTRIS includes a data repository, a set of programs for loading data from NIH electronic health records and research data management systems, an ontology for coding the disparate data with a single terminology, and a set of user interface tools that provide access to identified data from individual research studies and data across all studies from which individually identifiable data have been removed. This paper reports on unique design elements of the system, progress to date and user experience after five years of development and operation. PMID:24262893

  13. Biophotonics Master studies: teaching and training experience at University of Latvia

    NASA Astrophysics Data System (ADS)

    Spigulis, Janis

    2007-06-01

    A two-year Master's program in Biophotonics (Biomedical Optics) has been developed and run at the University of Latvia since 1995. The curriculum contains basic subjects such as Fundamentals of Biomedical Optics, Medical Lightguides, Anatomy and Physiology, Lasers and Non-coherent Light Sources, and Basic Physics. Student laboratories, specialized English terminology and laboratory-clinical praxis are also included as training components, and a Master's project is the final step for the degree award. Life-long learning is supported by several e-courses and an extensive short course for medical laser users, "Lasers and Bio-optics in Medicine". Recently a new inter-university European Social Fund project was started to adapt the program to the Bologna Declaration guidelines.

  14. Discovering biomedical semantic relations in PubMed queries for information retrieval and database curation.

    PubMed

    Huang, Chung-Chi; Lu, Zhiyong

    2016-01-01

    Identifying relevant papers from the literature is a common task in biocuration. Most current biomedical literature search systems primarily rely on matching user keywords. Semantic search, on the other hand, seeks to improve search accuracy by understanding the entities and contextual relations in user keywords. However, past research has mostly focused on semantically identifying biological entities (e.g. chemicals, diseases and genes) with little effort on discovering semantic relations. In this work, we aim to discover biomedical semantic relations in PubMed queries in an automated and unsupervised fashion. Specifically, we focus on extracting and understanding the contextual information (or context patterns) that is used by PubMed users to represent semantic relations between entities such as 'CHEMICAL-1 compared to CHEMICAL-2'. With the advances in automatic named entity recognition, we first tag entities in PubMed queries and then use tagged entities as knowledge to recognize pattern semantics. More specifically, we transform PubMed queries into context patterns involving participating entities, which are subsequently projected to latent topics via latent semantic analysis (LSA) to avoid the data sparseness and specificity issues. Finally, we mine semantically similar contextual patterns or semantic relations based on LSA topic distributions. Our two separate evaluation experiments of chemical-chemical (CC) and chemical-disease (CD) relations show that the proposed approach significantly outperforms a baseline method, which simply measures pattern semantics by similarity in participating entities. The highest performance achieved by our approach is nearly 0.9 and 0.85 respectively for the CC and CD task when compared against the ground truth in terms of normalized discounted cumulative gain (nDCG), a standard measure of ranking quality.
These results suggest that our approach can effectively identify and return related semantic patterns in a ranked order covering diverse bio-entity relations. To assess the potential utility of our automated top-ranked patterns of a given relation in semantic search, we performed a pilot study on frequently sought semantic relations in PubMed and observed improved literature retrieval effectiveness based on post-hoc human relevance evaluation. Further investigation in larger tests and in real-world scenarios is warranted. Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the US.
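
    The entity tags and the example query below are invented; a sketch of the pattern-extraction step the record describes, which replaces tagged entities in a query with typed placeholders so that queries expressing the same relation collapse to one context pattern:

```python
def to_pattern(query, entities):
    """Replace tagged entity mentions with numbered type placeholders,
    e.g. 'aspirin compared to ibuprofen' -> 'CHEMICAL-1 compared to CHEMICAL-2'."""
    counts = {}
    pattern = query
    for mention, etype in entities:  # (text, type) pairs in query order
        counts[etype] = counts.get(etype, 0) + 1
        pattern = pattern.replace(mention, f"{etype}-{counts[etype]}", 1)
    return pattern

p = to_pattern("aspirin compared to ibuprofen",
               [("aspirin", "CHEMICAL"), ("ibuprofen", "CHEMICAL")])
```

    In the paper, the resulting patterns are then projected to latent topics via LSA before similar relations are mined.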

  15. Enhancing biomedical text summarization using semantic relation extraction.

    PubMed

    Shang, Yue; Li, Yanpeng; Lin, Hongfei; Yang, Zhihao

    2011-01-01

    Automatic text summarization for a biomedical concept can help researchers to get the key points of a certain topic from large amount of biomedical literature efficiently. In this paper, we present a method for generating text summary for a given biomedical concept, e.g., H1N1 disease, from multiple documents based on semantic relation extraction. Our approach includes three stages: 1) We extract semantic relations in each sentence using the semantic knowledge representation tool SemRep. 2) We develop a relation-level retrieval method to select the relations most relevant to each query concept and visualize them in a graphic representation. 3) For relations in the relevant set, we extract informative sentences that can interpret them from the document collection to generate text summary using an information retrieval based method. Our major focus in this work is to investigate the contribution of semantic relation extraction to the task of biomedical text summarization. The experimental results on summarization for a set of diseases show that the introduction of semantic knowledge improves the performance and our results are better than the MEAD system, a well-known tool for text summarization.

  16. NOBLE - Flexible concept recognition for large-scale biomedical natural language processing.

    PubMed

    Tseytlin, Eugene; Mitchell, Kevin; Legowski, Elizabeth; Corrigan, Julia; Chavan, Girish; Jacobson, Rebecca S

    2016-01-14

    Natural language processing (NLP) applications are increasingly important in biomedical data analysis, knowledge engineering, and decision support. Concept recognition is an important component task for NLP pipelines, and can be either general-purpose or domain-specific. We describe a novel, flexible, and general-purpose concept recognition component for NLP pipelines, and compare its speed and accuracy against five commonly used alternatives on both a biological and clinical corpus. NOBLE Coder implements a general algorithm for matching terms to concepts from an arbitrary vocabulary set. The system's matching options can be configured individually or in combination to yield specific system behavior for a variety of NLP tasks. The software is open source, freely available, and easily integrated into UIMA or GATE. We benchmarked speed and accuracy of the system against the CRAFT and ShARe corpora as reference standards and compared it to MMTx, MGrep, Concept Mapper, cTAKES Dictionary Lookup Annotator, and cTAKES Fast Dictionary Lookup Annotator. We describe key advantages of the NOBLE Coder system and associated tools, including its greedy algorithm, configurable matching strategies, and multiple terminology input formats. These features provide unique functionality when compared with existing alternatives, including state-of-the-art systems. On two benchmarking tasks, NOBLE's performance exceeded commonly used alternatives, performing almost as well as the most advanced systems. Error analysis revealed differences in error profiles among systems. NOBLE Coder is comparable to other widely used concept recognition systems in terms of accuracy and speed. Advantages of NOBLE Coder include its interactive terminology builder tool, ease of configuration, and adaptability to various domains and tasks. NOBLE provides a term-to-concept matching system suitable for general concept recognition in biomedical NLP pipelines.
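
    NOBLE Coder's actual algorithm offers many configurable matching strategies; as a toy illustration of the greedy term-to-concept matching mentioned above (the vocabulary entries are invented, loosely styled after UMLS concept identifiers):

```python
def greedy_match(tokens, vocab, max_len=5):
    """Greedy longest-match lookup: at each position, take the longest
    token span that is a vocabulary term, then continue after it."""
    i, found = 0, []
    while i < len(tokens):
        for j in range(min(len(tokens), i + max_len), i, -1):
            phrase = " ".join(tokens[i:j]).lower()
            if phrase in vocab:
                found.append((phrase, vocab[phrase]))
                i = j
                break
        else:
            i += 1  # no term starts here; advance one token
    return found

vocab = {"lung cancer": "C0242379", "cancer": "C0006826"}
hits = greedy_match("patient with lung cancer".split(), vocab)
```

    Greedy longest-match prefers "lung cancer" over the shorter "cancer", which is the behavior typically wanted for concept recognition.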

  17. Augmenting Oracle Text with the UMLS for enhanced searching of free-text medical reports.

    PubMed

    Ding, Jing; Erdal, Selnur; Dhaval, Rakesh; Kamal, Jyoti

    2007-10-11

    The intrinsic complexity of free-text medical reports imposes great challenges for information retrieval systems. We have developed a prototype search engine for retrieving clinical reports that leverages the powerful indexing and querying capabilities of Oracle Text, and the rich biomedical domain knowledge and semantic structures that are captured in the UMLS Metathesaurus.

  18. CDAPubMed: a browser extension to retrieve EHR-based biomedical literature.

    PubMed

    Perez-Rey, David; Jimenez-Castellanos, Ana; Garcia-Remesal, Miguel; Crespo, Jose; Maojo, Victor

    2012-04-05

    Over the last few decades, the ever-increasing output of scientific publications has led to new challenges to keep up to date with the literature. In the biomedical area, this growth has introduced new requirements for professionals, e.g., physicians, who have to locate the exact papers that they need for their clinical and research work amongst a huge number of publications. Against this backdrop, novel information retrieval methods are even more necessary. While web search engines are widespread in many areas, facilitating access to all kinds of information, additional tools are required to automatically link information retrieved from these engines to specific biomedical applications. In the case of clinical environments, this also means considering aspects such as patient data security and confidentiality or structured contents, e.g., electronic health records (EHRs). In this scenario, we have developed a new tool to facilitate query building to retrieve scientific literature related to EHRs. We have developed CDAPubMed, an open-source web browser extension to integrate EHR features in biomedical literature retrieval approaches. Clinical users can use CDAPubMed to: (i) load patient clinical documents, i.e., EHRs based on the Health Level 7-Clinical Document Architecture Standard (HL7-CDA), (ii) identify relevant terms for scientific literature search in these documents, i.e., Medical Subject Headings (MeSH), automatically driven by the CDAPubMed configuration, which advanced users can optimize to adapt to each specific situation, and (iii) generate and launch literature search queries to a major search engine, i.e., PubMed, to retrieve citations related to the EHR under examination. CDAPubMed is a platform-independent tool designed to facilitate literature searching using keywords contained in specific EHRs. CDAPubMed is visually integrated, as an extension of a widespread web browser, within the standard PubMed interface. 
It has been tested on a public dataset of HL7-CDA documents, returning significantly fewer citations because queries are focused on characteristics identified within the EHR. For instance, compared with more than 200,000 citations retrieved by the query 'breast neoplasm', fewer than ten citations were retrieved when ten patient features were added using CDAPubMed. This is an open source tool that can be freely used for non-profit purposes and integrated with other existing systems.

  19. search GenBank: interactive orchestration and ad-hoc choreography of Web services in the exploration of the biomedical resources of the National Center For Biotechnology Information

    PubMed Central

    2013-01-01

    Background Due to the growing number of biomedical entries in data repositories of the National Center for Biotechnology Information (NCBI), it is difficult to collect, manage and process all of these entries in one place by third-party software developers without significant investment in hardware and software infrastructure, its maintenance and administration. Web services allow development of software applications that integrate in one place the functionality and processing logic of distributed software components, without integrating the components themselves and without integrating the resources to which they have access. This is achieved by appropriate orchestration or choreography of available Web services and their shared functions. After the successful application of Web services in the business sector, this technology can now be used to build composite software tools that are oriented towards biomedical data processing. Results We have developed a new tool for efficient and dynamic data exploration in GenBank and other NCBI databases. A dedicated search GenBank system makes use of NCBI Web services and a package of Entrez Programming Utilities (eUtils) in order to provide extended searching capabilities in NCBI data repositories. In search GenBank users can use one of the three exploration paths: simple data searching based on the specified user’s query, advanced data searching based on the specified user’s query, and advanced data exploration with the use of macros. search GenBank orchestrates calls of particular tools available through the NCBI Web service providing requested functionality, while users interactively browse selected records in search GenBank and traverse between NCBI databases using available links. On the other hand, by building macros in the advanced data exploration mode, users create choreographies of eUtils calls, which can lead to the automatic discovery of related data in the specified databases. 
Conclusions search GenBank extends standard capabilities of the NCBI Entrez search engine in querying biomedical databases. The possibility of creating and saving macros in the search GenBank is a unique feature and has a great potential. The potential will further grow in the future with the increasing density of networks of relationships between data stored in particular databases. search GenBank is available for public use at http://sgb.biotools.pl/. PMID:23452691

  20. search GenBank: interactive orchestration and ad-hoc choreography of Web services in the exploration of the biomedical resources of the National Center For Biotechnology Information.

    PubMed

    Mrozek, Dariusz; Małysiak-Mrozek, Bożena; Siążnik, Artur

    2013-03-01

    Due to the growing number of biomedical entries in data repositories of the National Center for Biotechnology Information (NCBI), it is difficult to collect, manage and process all of these entries in one place by third-party software developers without significant investment in hardware and software infrastructure, its maintenance and administration. Web services allow development of software applications that integrate in one place the functionality and processing logic of distributed software components, without integrating the components themselves and without integrating the resources to which they have access. This is achieved by appropriate orchestration or choreography of available Web services and their shared functions. After the successful application of Web services in the business sector, this technology can now be used to build composite software tools that are oriented towards biomedical data processing. We have developed a new tool for efficient and dynamic data exploration in GenBank and other NCBI databases. A dedicated search GenBank system makes use of NCBI Web services and a package of Entrez Programming Utilities (eUtils) in order to provide extended searching capabilities in NCBI data repositories. In search GenBank users can use one of the three exploration paths: simple data searching based on the specified user's query, advanced data searching based on the specified user's query, and advanced data exploration with the use of macros. search GenBank orchestrates calls of particular tools available through the NCBI Web service providing requested functionality, while users interactively browse selected records in search GenBank and traverse between NCBI databases using available links. On the other hand, by building macros in the advanced data exploration mode, users create choreographies of eUtils calls, which can lead to the automatic discovery of related data in the specified databases. 
search GenBank extends standard capabilities of the NCBI Entrez search engine in querying biomedical databases. The possibility of creating and saving macros in the search GenBank is a unique feature and has a great potential. The potential will further grow in the future with the increasing density of networks of relationships between data stored in particular databases. search GenBank is available for public use at http://sgb.biotools.pl/.
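
    A search GenBank macro is essentially a choreography of eUtils calls. As a minimal sketch of composing one such call (URL construction only, no request is sent; the esearch endpoint and its db/term/retmax parameters are part of the real NCBI eUtils API, but see the eUtils documentation for the full parameter set):

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def esearch_url(db, term, retmax=20):
    """Compose an eUtils esearch request for a given NCBI database."""
    return f"{EUTILS}/esearch.fcgi?" + urlencode(
        {"db": db, "term": term, "retmax": retmax})

url = esearch_url("nucleotide", "BRCA1[Gene] AND human[Organism]")
```

    A macro would chain the IDs returned by esearch into subsequent elink or efetch calls to discover related records in other NCBI databases.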

  1. CDAPubMed: a browser extension to retrieve EHR-based biomedical literature

    PubMed Central

    2012-01-01

    Background Over the last few decades, the ever-increasing output of scientific publications has led to new challenges to keep up to date with the literature. In the biomedical area, this growth has introduced new requirements for professionals, e.g., physicians, who have to locate the exact papers that they need for their clinical and research work amongst a huge number of publications. Against this backdrop, novel information retrieval methods are even more necessary. While web search engines are widespread in many areas, facilitating access to all kinds of information, additional tools are required to automatically link information retrieved from these engines to specific biomedical applications. In the case of clinical environments, this also means considering aspects such as patient data security and confidentiality or structured contents, e.g., electronic health records (EHRs). In this scenario, we have developed a new tool to facilitate query building to retrieve scientific literature related to EHRs. Results We have developed CDAPubMed, an open-source web browser extension to integrate EHR features in biomedical literature retrieval approaches. Clinical users can use CDAPubMed to: (i) load patient clinical documents, i.e., EHRs based on the Health Level 7-Clinical Document Architecture Standard (HL7-CDA), (ii) identify relevant terms for scientific literature search in these documents, i.e., Medical Subject Headings (MeSH), automatically driven by the CDAPubMed configuration, which advanced users can optimize to adapt to each specific situation, and (iii) generate and launch literature search queries to a major search engine, i.e., PubMed, to retrieve citations related to the EHR under examination. Conclusions CDAPubMed is a platform-independent tool designed to facilitate literature searching using keywords contained in specific EHRs. CDAPubMed is visually integrated, as an extension of a widespread web browser, within the standard PubMed interface. 
It has been tested on a public dataset of HL7-CDA documents, returning significantly fewer citations because queries are focused on characteristics identified within the EHR. For instance, compared with more than 200,000 citations retrieved by the query 'breast neoplasm', fewer than ten citations were retrieved when ten patient features were added using CDAPubMed. This is an open source tool that can be freely used for non-profit purposes and integrated with other existing systems. PMID:22480327

  2. Interoperability between biomedical ontologies through relation expansion, upper-level ontologies and automatic reasoning.

    PubMed

    Hoehndorf, Robert; Dumontier, Michel; Oellrich, Anika; Rebholz-Schuhmann, Dietrich; Schofield, Paul N; Gkoutos, Georgios V

    2011-01-01

    Researchers design ontologies as a means to accurately annotate and integrate experimental data across heterogeneous and disparate data- and knowledge bases. Formal ontologies make the semantics of terms and relations explicit such that automated reasoning can be used to verify the consistency of knowledge. However, many biomedical ontologies do not sufficiently formalize the semantics of their relations and are therefore limited with respect to automated reasoning for large scale data integration and knowledge discovery. We describe a method to improve automated reasoning over biomedical ontologies and identify several thousand contradictory class definitions. Our approach aligns terms in biomedical ontologies with foundational classes in a top-level ontology and formalizes composite relations as class expressions. We describe the semi-automated repair of contradictions and demonstrate expressive queries over interoperable ontologies. Our work forms an important cornerstone for data integration, automatic inference and knowledge discovery based on formal representations of knowledge. Our results and analysis software are available at http://bioonto.de/pmwiki.php/Main/ReasonableOntologies.

  3. A comprehensive SWOT audit of the role of the biomedical physicist in the education of healthcare professionals in Europe.

    PubMed

    Caruana, C J; Wasilewska-Radwanska, M; Aurengo, A; Dendy, P P; Karenauskaite, V; Malisan, M R; Meijer, J H; Mihov, D; Mornstein, V; Rokita, E; Vano, E; Weckstrom, M; Wucherer, M

    2010-04-01

    Although biomedical physicists provide educational services to the healthcare professions in the majority of universities in Europe, their precise role with respect to the education of the healthcare professions has not been studied systematically. To address this issue we are conducting a research project to produce a strategic development model for the role using the well-established SWOT (Strengths, Weaknesses, Opportunities, Threats) methodology. SWOT based strategic planning is a two-step process: one first carries out a SWOT position audit and then uses the identified SWOT themes to construct the strategic development model. This paper reports the results of a SWOT audit for the role of the biomedical physicist in the education of the healthcare professions in Europe. Internal Strengths and Weaknesses of the role were identified through a qualitative survey of biomedical physics departments and biomedical physics curricula delivered to healthcare professionals across Europe. External environmental Opportunities and Threats were identified through a systematic survey of the healthcare, healthcare professional education and higher education literature and categorized under standard PEST (Political, Economic, Social-Psychological, Technological-Scientific) categories. The paper includes an appendix of terminology. Defined terms are marked with an asterisk in the text. Copyright 2009 Associazione Italiana di Fisica Medica. Published by Elsevier Ltd. All rights reserved.

  4. The Unified Medical Language System (UMLS): integrating biomedical terminology

    PubMed Central

    Bodenreider, Olivier

    2004-01-01

    The Unified Medical Language System (http://umlsks.nlm.nih.gov) is a repository of biomedical vocabularies developed by the US National Library of Medicine. The UMLS integrates over 2 million names for some 900 000 concepts from more than 60 families of biomedical vocabularies, as well as 12 million relations among these concepts. Vocabularies integrated in the UMLS Metathesaurus include the NCBI taxonomy, Gene Ontology, the Medical Subject Headings (MeSH), OMIM and the Digital Anatomist Symbolic Knowledge Base. UMLS concepts are not only inter-related, but may also be linked to external resources such as GenBank. In addition to data, the UMLS includes tools for customizing the Metathesaurus (MetamorphoSys), for generating lexical variants of concept names (lvg) and for extracting UMLS concepts from text (MetaMap). The UMLS knowledge sources are updated quarterly. All vocabularies are available at no fee for research purposes within an institution, but UMLS users are required to sign a license agreement. The UMLS knowledge sources are distributed on CD-ROM and by FTP. PMID:14681409

  5. The Unified Medical Language System (UMLS): integrating biomedical terminology.

    PubMed

    Bodenreider, Olivier

    2004-01-01

    The Unified Medical Language System (http://umlsks.nlm.nih.gov) is a repository of biomedical vocabularies developed by the US National Library of Medicine. The UMLS integrates over 2 million names for some 900,000 concepts from more than 60 families of biomedical vocabularies, as well as 12 million relations among these concepts. Vocabularies integrated in the UMLS Metathesaurus include the NCBI taxonomy, Gene Ontology, the Medical Subject Headings (MeSH), OMIM and the Digital Anatomist Symbolic Knowledge Base. UMLS concepts are not only inter-related, but may also be linked to external resources such as GenBank. In addition to data, the UMLS includes tools for customizing the Metathesaurus (MetamorphoSys), for generating lexical variants of concept names (lvg) and for extracting UMLS concepts from text (MetaMap). The UMLS knowledge sources are updated quarterly. All vocabularies are available at no fee for research purposes within an institution, but UMLS users are required to sign a license agreement. The UMLS knowledge sources are distributed on CD-ROM and by FTP.

  6. Towards structured sharing of raw and derived neuroimaging data across existing resources

    PubMed Central

    Keator, D.B.; Helmer, K.; Steffener, J.; Turner, J.A.; Van Erp, T.G.M.; Gadde, S.; Ashish, N.; Burns, G.A.; Nichols, B.N.

    2013-01-01

    Data sharing efforts increasingly contribute to the acceleration of scientific discovery. Neuroimaging data is accumulating in distributed domain-specific databases, and there is currently neither an integrated access mechanism nor an accepted format for the critically important meta-data that is necessary for making use of the combined, available neuroimaging data. In this manuscript, we present work from the Derived Data Working Group, an open-access group sponsored by the Biomedical Informatics Research Network (BIRN) and the International Neuroinformatics Coordinating Facility (INCF) focused on practical tools for distributed access to neuroimaging data. The working group develops models and tools facilitating the structured interchange of neuroimaging meta-data and is making progress towards a unified set of tools for such data and meta-data exchange. We report on the key components required for integrated access to raw and derived neuroimaging data as well as associated meta-data and provenance across neuroimaging resources. The components include (1) a structured terminology that provides semantic context to data, (2) a formal data model for neuroimaging with robust tracking of data provenance, (3) a web service-based application programming interface (API) that provides a consistent mechanism to access and query the data model, and (4) a provenance library that can be used for the extraction of provenance data by image analysts and imaging software developers. We believe that the framework and set of tools outlined in this manuscript have great potential for solving many of the issues the neuroimaging community faces when sharing raw and derived neuroimaging data across the various existing database systems for the purpose of accelerating scientific discovery. PMID:23727024

  7. Biomedical data integration - capturing similarities while preserving disparities.

    PubMed

    Bianchi, Stefano; Burla, Anna; Conti, Costanza; Farkash, Ariel; Kent, Carmel; Maman, Yonatan; Shabo, Amnon

    2009-01-01

    One of the challenges of healthcare data processing, analysis and warehousing is the integration of data gathered from disparate and diverse data sources. Promoting the adoption of widely accepted information standards and common terminologies, together with the use of technologies derived from Semantic Web representation, is a suitable path to achieving that. To that end, the HL7 V3 Reference Information Model (RIM) [1] has been used as the underlying information model, coupled with the Web Ontology Language (OWL) [2] as the semantic data integration technology. In this paper we depict a biomedical data integration process and demonstrate how it was used for integrating various data sources, containing clinical, environmental and genomic data, within Hypergenes, a European Commission funded project exploring the essential hypertension [3] disease model.

  8. Scalable Architecture for Federated Translational Inquiries Network (SAFTINet) Technology Infrastructure for a Distributed Data Network

    PubMed Central

    Schilling, Lisa M.; Kwan, Bethany M.; Drolshagen, Charles T.; Hosokawa, Patrick W.; Brandt, Elias; Pace, Wilson D.; Uhrich, Christopher; Kamerick, Michael; Bunting, Aidan; Payne, Philip R.O.; Stephens, William E.; George, Joseph M.; Vance, Mark; Giacomini, Kelli; Braddy, Jason; Green, Mika K.; Kahn, Michael G.

    2013-01-01

    Introduction: Distributed Data Networks (DDNs) offer infrastructure solutions for sharing electronic health data from across disparate data sources to support comparative effectiveness research. Data sharing mechanisms must address technical and governance concerns stemming from network security and data disclosure laws and best practices, such as HIPAA. Methods: The Scalable Architecture for Federated Translational Inquiries Network (SAFTINet) deploys TRIAD grid technology, a common data model, detailed technical documentation, and custom software for data harmonization to facilitate data sharing in collaboration with stakeholders in the care of safety net populations. Data sharing partners host TRIAD grid nodes containing harmonized clinical data within their internal or hosted network environments. Authorized users can use a central web-based query system to request analytic data sets. Discussion: SAFTINet DDN infrastructure achieved a number of data sharing objectives, including scalable and sustainable systems for ensuring harmonized data structures and terminologies and secure distributed queries. Initial implementation challenges were resolved through iterative discussions, development and implementation of technical documentation, governance, and technology solutions. PMID:25848567

  9. Scalable Architecture for Federated Translational Inquiries Network (SAFTINet) Technology Infrastructure for a Distributed Data Network.

    PubMed

    Schilling, Lisa M; Kwan, Bethany M; Drolshagen, Charles T; Hosokawa, Patrick W; Brandt, Elias; Pace, Wilson D; Uhrich, Christopher; Kamerick, Michael; Bunting, Aidan; Payne, Philip R O; Stephens, William E; George, Joseph M; Vance, Mark; Giacomini, Kelli; Braddy, Jason; Green, Mika K; Kahn, Michael G

    2013-01-01

    Distributed Data Networks (DDNs) offer infrastructure solutions for sharing electronic health data from across disparate data sources to support comparative effectiveness research. Data sharing mechanisms must address technical and governance concerns stemming from network security and data disclosure laws and best practices, such as HIPAA. The Scalable Architecture for Federated Translational Inquiries Network (SAFTINet) deploys TRIAD grid technology, a common data model, detailed technical documentation, and custom software for data harmonization to facilitate data sharing in collaboration with stakeholders in the care of safety net populations. Data sharing partners host TRIAD grid nodes containing harmonized clinical data within their internal or hosted network environments. Authorized users can use a central web-based query system to request analytic data sets. SAFTINet DDN infrastructure achieved a number of data sharing objectives, including scalable and sustainable systems for ensuring harmonized data structures and terminologies and secure distributed queries. Initial implementation challenges were resolved through iterative discussions, development and implementation of technical documentation, governance, and technology solutions.

  10. A comparison of evaluation metrics for biomedical journals, articles, and websites in terms of sensitivity to topic.

    PubMed

    Fu, Lawrence D; Aphinyanaphongs, Yindalon; Wang, Lily; Aliferis, Constantin F

    2011-08-01

    Evaluating the biomedical literature and health-related websites for quality are challenging information retrieval tasks. Current commonly used methods include impact factor for journals, PubMed's clinical query filters and machine learning-based filter models for articles, and PageRank for websites. Previous work has focused on the average performance of these methods without considering the topic, and it is unknown how performance varies for specific topics or focused searches. Clinicians, researchers, and users should be aware when expected performance is not achieved for specific topics. The present work analyzes the behavior of these methods for a variety of topics. Impact factor, clinical query filters, and PageRank vary widely across different topics while a topic-specific impact factor and machine learning-based filter models are more stable. The results demonstrate that a method may perform excellently on average but struggle when used on a number of narrower topics. Topic-adjusted metrics and other topic robust methods have an advantage in such situations. Users of traditional topic-sensitive metrics should be aware of their limitations. Copyright © 2011 Elsevier Inc. All rights reserved.
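    A topic-specific impact factor applies the usual two-year citation ratio after restricting both citations and citable items to one topic, which is why it can behave very differently from the journal-wide figure. A toy sketch with hypothetical counts:

    ```python
    def impact_factor(citations, articles):
        """Citations in year Y to items published in years Y-1 and Y-2,
        divided by the number of citable items from those two years."""
        return citations / articles if articles else 0.0

    # Hypothetical counts for one journal: overall vs. restricted to a topic.
    overall = impact_factor(citations=900, articles=300)      # 3.0
    oncology_only = impact_factor(citations=30, articles=25)  # 1.2 on this topic
    ```

    The gap between the two numbers is exactly the topic sensitivity the paper measures: a journal can look strong on average while underperforming on a narrower topic.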

  11. Enhancing acronym/abbreviation knowledge bases with semantic information.

    PubMed

    Torii, Manabu; Liu, Hongfang

    2007-10-11

    In the biomedical domain, a terminology knowledge base that associates acronyms/abbreviations (denoted as SFs) with their definitions (denoted as LFs) is much needed. Toward the construction of such a terminology knowledge base, we investigate the feasibility of building a system that automatically assigns semantic categories to LFs extracted from text. Given a collection of pairs (SF, LF) derived from text, we i) assess the coverage of LFs and pairs (SF, LF) in the UMLS and justify the need for a semantic category assignment system; and ii) automatically derive name phrases annotated with semantic categories and construct a system using machine learning. Utilizing ADAM, an existing collection of (SF, LF) pairs extracted from MEDLINE, our system achieved an f-measure of 87% when assigning eight UMLS-based semantic groups to LFs. The system has been incorporated into a web interface which integrates SF knowledge from multiple SF knowledge bases. Web site: http://gauss.dbb.georgetown.edu/liblab/SFThesurus.
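    The (SF, LF) pairs such a system consumes are typically mined with pattern-based heuristics. A simplified sketch in that spirit (not the actual ADAM pipeline): accept the words preceding a parenthesized token as the long form when the short form is a character subsequence of them.

    ```python
    import re

    def extract_sf_lf(sentence):
        """Very simplified short-form/long-form pairing: for each parenthesized
        token, keep at most len(SF) preceding words and accept them as the long
        form if every character of the SF appears, in order, within them. A
        rough heuristic, far simpler than published abbreviation extractors."""
        pairs = []
        for m in re.finditer(r"([\w\- ]+?)\s*\((\w+)\)", sentence):
            candidate, sf = m.group(1).strip(), m.group(2)
            lf = " ".join(candidate.split()[-len(sf):])
            chars = iter(lf.lower())
            if all(ch in chars for ch in sf.lower()):  # subsequence test
                pairs.append((sf, lf))
        return pairs

    extract_sf_lf("Patients with congestive heart failure (CHF) were enrolled.")
    # → [("CHF", "congestive heart failure")]
    ```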

  12. The unexpected high practical value of medical ontologies.

    PubMed

    Pinciroli, Francesco; Pisanelli, Domenico M

    2006-01-01

    Ontology is no longer a mere research topic, but its relevance has been recognized in several practical fields. Current application areas include natural language translation, e-commerce, geographic information systems, legal information systems, and biology and medicine. It is the backbone of solid and effective applications in health care and can help to build more powerful and more interoperable medical information systems. The design and implementation of ontologies in medicine is mainly focused on the re-organization of medical terminologies. This is obviously a difficult task and requires a deep analysis of the structure and the concepts of such terminologies, in order to define domain ontologies able to provide both flexibility and consistency to medical information systems. The aim of this special issue of Computers in Biology and Medicine is to report the current evolution of research in biomedical ontologies, presenting both papers devoted to methodological issues and works with a more applicative emphasis.

  13. Secure count query on encrypted genomic data.

    PubMed

    Hasan, Mohammad Zahidul; Mahdi, Md Safiur Rahman; Sadat, Md Nazmus; Mohammed, Noman

    2018-05-01

    Human genomic information can yield more effective healthcare by guiding medical decisions. Therefore, genomics research is gaining popularity, as it can identify potential correlations between a disease and a certain gene, which improves the safety and efficacy of drug treatment and can also inform more effective prevention strategies [1]. To reduce sampling error and to increase the statistical accuracy of this type of research project, data from different sources need to be brought together, since a single organization does not necessarily possess the required amount of data. In this case, data sharing among multiple organizations must satisfy strict policies (for instance, HIPAA and PIPEDA) that have been enforced to regulate privacy-sensitive data sharing. Storage and computation on the shared data can be outsourced to a third-party cloud service provider equipped with enormous storage and computation resources. However, outsourcing data to a third party is associated with a potential risk of privacy violation for the participants whose genomic sequences or clinical profiles are used in these studies. In this article, we propose a method for secure sharing and computation on genomic data in a semi-honest cloud server. There are two main contributions. First, the proposed method can handle biomedical data containing both genotypes and phenotypes. Second, our proposed index tree scheme significantly reduces the computational overhead of executing a secure count query operation. In our proposed method, the confidentiality of shared data is ensured through encryption, while keeping the entire computation process efficient and scalable for cutting-edge biomedical applications. We evaluated our proposed method in terms of efficiency on a database of Single-Nucleotide Polymorphism (SNP) sequences, and experimental results demonstrate that the execution time for a query of 50 SNPs in a database of 50,000 records, where each record contains 500 SNPs, is approximately 5 s. Executing the same query on a database that also includes phenotypes requires 69.7 s. Copyright © 2018 Elsevier Inc. All rights reserved.
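    The query being protected here is a conjunctive count: how many records carry every genotype value named in the query. A plaintext sketch of that semantics, omitting the encryption and index-tree layers that are the paper's actual contribution (the records below are hypothetical):

    ```python
    # Plaintext illustration of count-query semantics over SNP records.
    # Each record maps SNP position -> genotype value; a query is a set of
    # (position, value) constraints that must all hold.
    def count_query(records, query):
        return sum(
            all(rec.get(pos) == val for pos, val in query.items())
            for rec in records
        )

    records = [
        {"rs1": "AA", "rs2": "AG"},
        {"rs1": "AA", "rs2": "GG"},
        {"rs1": "AG", "rs2": "AG"},
    ]
    count_query(records, {"rs1": "AA", "rs2": "AG"})  # → 1
    ```

    In the secure setting, both the genotype values and the query constraints are encrypted, and the index tree lets the server prune records without decrypting them.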

  14. AlzPharm: integration of neurodegeneration data using RDF.

    PubMed

    Lam, Hugo Y K; Marenco, Luis; Clark, Tim; Gao, Yong; Kinoshita, June; Shepherd, Gordon; Miller, Perry; Wu, Elizabeth; Wong, Gwendolyn T; Liu, Nian; Crasto, Chiquito; Morse, Thomas; Stephens, Susie; Cheung, Kei-Hoi

    2007-05-09

    Neuroscientists often need to access a wide range of data sets distributed over the Internet. These data sets, however, are typically neither integrated nor interoperable, resulting in a barrier to answering complex neuroscience research questions. Domain ontologies can enable the querying of heterogeneous data sets, but they are not sufficient for neuroscience since the data of interest commonly span multiple research domains. To this end, e-Neuroscience seeks to provide an integrated platform for neuroscientists to discover new knowledge through seamless integration of the very diverse types of neuroscience data. Here we present a Semantic Web approach to building this e-Neuroscience framework by using the Resource Description Framework (RDF) and its vocabulary description language, RDF Schema (RDFS), as a standard data model to facilitate both representation and integration of the data. We have constructed a pilot ontology for BrainPharm (a subset of SenseLab) using RDFS and then converted a subset of the BrainPharm data into RDF according to the ontological structure. We have also integrated the converted BrainPharm data with existing RDF hypothesis and publication data from a pilot version of SWAN (Semantic Web Applications in Neuromedicine). Our implementation uses the RDF Data Model in Oracle Database 10g release 2 for data integration, query, and inference, while our Web interface allows users to query the data and retrieve the results in a convenient fashion. Accessing and integrating biomedical data which cuts across multiple disciplines will be increasingly indispensable and beneficial to neuroscience researchers. The Semantic Web approach we undertook has demonstrated a promising way to semantically integrate data sets created independently. It also shows how advanced queries and inferences can be performed over the integrated data, which are hard to achieve using traditional data integration approaches. Our pilot results suggest that our Semantic Web approach is suitable for realizing e-Neuroscience and generic enough to be applied in other biomedical fields.
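    What makes RDF suit this kind of integration is that every statement is a (subject, predicate, object) triple, so merging independently created data sets is just a union of triple sets. A dependency-free sketch of that idea (the URIs below are hypothetical, not the actual BrainPharm or SWAN vocabularies):

    ```python
    # Two independently created "graphs" as sets of (s, p, o) triples.
    brainpharm = {
        ("ex:APP_V717F", "rdf:type", "ex:MutatedProtein"),
        ("ex:APP_V717F", "ex:affects", "ex:DentateGyrusGranuleCell"),
    }
    swan = {
        ("ex:Hypothesis42", "ex:discusses", "ex:APP_V717F"),
    }
    merged = brainpharm | swan  # integration is set union of triples

    def objects(graph, subject, predicate):
        """Naive triple-pattern match: the core operation behind a SPARQL
        basic graph pattern."""
        return {o for s, p, o in graph if s == subject and p == predicate}

    objects(merged, "ex:APP_V717F", "ex:affects")
    # → {"ex:DentateGyrusGranuleCell"}
    ```

    A production system would use an RDF store (as the paper does with Oracle 10g release 2) rather than in-memory sets, but the merge-by-union property is the same.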

  15. AlzPharm: integration of neurodegeneration data using RDF

    PubMed Central

    Lam, Hugo YK; Marenco, Luis; Clark, Tim; Gao, Yong; Kinoshita, June; Shepherd, Gordon; Miller, Perry; Wu, Elizabeth; Wong, Gwendolyn T; Liu, Nian; Crasto, Chiquito; Morse, Thomas; Stephens, Susie; Cheung, Kei-Hoi

    2007-01-01

    Background Neuroscientists often need to access a wide range of data sets distributed over the Internet. These data sets, however, are typically neither integrated nor interoperable, resulting in a barrier to answering complex neuroscience research questions. Domain ontologies can enable the querying of heterogeneous data sets, but they are not sufficient for neuroscience since the data of interest commonly span multiple research domains. To this end, e-Neuroscience seeks to provide an integrated platform for neuroscientists to discover new knowledge through seamless integration of the very diverse types of neuroscience data. Here we present a Semantic Web approach to building this e-Neuroscience framework by using the Resource Description Framework (RDF) and its vocabulary description language, RDF Schema (RDFS), as a standard data model to facilitate both representation and integration of the data. Results We have constructed a pilot ontology for BrainPharm (a subset of SenseLab) using RDFS and then converted a subset of the BrainPharm data into RDF according to the ontological structure. We have also integrated the converted BrainPharm data with existing RDF hypothesis and publication data from a pilot version of SWAN (Semantic Web Applications in Neuromedicine). Our implementation uses the RDF Data Model in Oracle Database 10g release 2 for data integration, query, and inference, while our Web interface allows users to query the data and retrieve the results in a convenient fashion. Conclusion Accessing and integrating biomedical data which cuts across multiple disciplines will be increasingly indispensable and beneficial to neuroscience researchers. The Semantic Web approach we undertook has demonstrated a promising way to semantically integrate data sets created independently. It also shows how advanced queries and inferences can be performed over the integrated data, which are hard to achieve using traditional data integration approaches. Our pilot results suggest that our Semantic Web approach is suitable for realizing e-Neuroscience and generic enough to be applied in other biomedical fields. PMID:17493287

  16. A unified structural/terminological interoperability framework based on LexEVS: application to TRANSFoRm.

    PubMed

    Ethier, Jean-François; Dameron, Olivier; Curcin, Vasa; McGilchrist, Mark M; Verheij, Robert A; Arvanitis, Theodoros N; Taweel, Adel; Delaney, Brendan C; Burgun, Anita

    2013-01-01

    Biomedical research increasingly relies on the integration of information from multiple heterogeneous data sources. Despite the fact that structural and terminological aspects of interoperability are interdependent and rely on a common set of requirements, current efforts typically address them in isolation. We propose a unified ontology-based knowledge framework to facilitate interoperability between heterogeneous sources, and investigate if using the LexEVS terminology server is a viable implementation method. We developed a framework based on an ontology, the general information model (GIM), to unify structural models and terminologies, together with relevant mapping sets. This allowed uniform access to these resources within LexEVS to facilitate interoperability by various components and data sources from implementing architectures. Our unified framework has been tested in the context of the EU Framework Program 7 TRANSFoRm project, where it was used to achieve data integration in a retrospective diabetes cohort study. The GIM was successfully instantiated in TRANSFoRm as the clinical data integration model, and necessary mappings were created to support effective information retrieval for software tools in the project. We present a novel, unifying approach to address interoperability challenges in heterogeneous data sources, by representing structural and semantic models in one framework. Systems using this architecture can rely solely on the GIM that abstracts over both the structure and coding. Information models, terminologies and mappings are all stored in LexEVS and can be accessed in a uniform manner (implementing the HL7 CTS2 service functional model). The system is flexible and should reduce the effort needed from data source personnel for implementing and managing the integration.

  17. A unified structural/terminological interoperability framework based on LexEVS: application to TRANSFoRm

    PubMed Central

    Ethier, Jean-François; Dameron, Olivier; Curcin, Vasa; McGilchrist, Mark M; Verheij, Robert A; Arvanitis, Theodoros N; Taweel, Adel; Delaney, Brendan C; Burgun, Anita

    2013-01-01

    Objective Biomedical research increasingly relies on the integration of information from multiple heterogeneous data sources. Despite the fact that structural and terminological aspects of interoperability are interdependent and rely on a common set of requirements, current efforts typically address them in isolation. We propose a unified ontology-based knowledge framework to facilitate interoperability between heterogeneous sources, and investigate if using the LexEVS terminology server is a viable implementation method. Materials and methods We developed a framework based on an ontology, the general information model (GIM), to unify structural models and terminologies, together with relevant mapping sets. This allowed uniform access to these resources within LexEVS to facilitate interoperability by various components and data sources from implementing architectures. Results Our unified framework has been tested in the context of the EU Framework Program 7 TRANSFoRm project, where it was used to achieve data integration in a retrospective diabetes cohort study. The GIM was successfully instantiated in TRANSFoRm as the clinical data integration model, and necessary mappings were created to support effective information retrieval for software tools in the project. Conclusions We present a novel, unifying approach to address interoperability challenges in heterogeneous data sources, by representing structural and semantic models in one framework. Systems using this architecture can rely solely on the GIM that abstracts over both the structure and coding. Information models, terminologies and mappings are all stored in LexEVS and can be accessed in a uniform manner (implementing the HL7 CTS2 service functional model). The system is flexible and should reduce the effort needed from data source personnel for implementing and managing the integration. PMID:23571850

  18. Developing a comprehensive system for content-based retrieval of image and text data from a national survey

    NASA Astrophysics Data System (ADS)

    Antani, Sameer K.; Natarajan, Mukil; Long, Jonathan L.; Long, L. Rodney; Thoma, George R.

    2005-04-01

    The article describes the status of our ongoing R&D at the U.S. National Library of Medicine (NLM) towards the development of an advanced multimedia database biomedical information system that supports content-based image retrieval (CBIR). NLM maintains a collection of 17,000 digitized spinal X-rays along with text survey data from the Second National Health and Nutritional Examination Survey (NHANES II). These data serve as a rich data source for epidemiologists and researchers of osteoarthritis and musculoskeletal diseases. It is currently possible to access these through text keyword queries using our Web-based Medical Information Retrieval System (WebMIRS). CBIR methods developed specifically for biomedical images could offer direct visual searching of these images by means of example image or user sketch. We are building a system which supports hybrid queries that have text and image-content components. R&D goals include developing algorithms for robust image segmentation for localizing and identifying relevant anatomy, labeling the segmented anatomy based on its pathology, developing suitable indexing and similarity matching methods for images and image features, and associating the survey text information for query and retrieval along with the image data. Some highlights of the system developed in MATLAB and Java are: use of a networked or local centralized database for text and image data; flexibility to incorporate new research work; provides a means to control access to system components under development; and use of XML for structured reporting. The article details the design, features, and algorithms in this third revision of this prototype system, CBIR3.
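    A hybrid query of the kind described combines a text-match score against the survey data with an image-feature similarity against the X-ray collection. A toy sketch of one common way to fuse the two, using a weighted sum; the records, scores and weights below are hypothetical, not values from the CBIR3 system:

    ```python
    # Fuse a text-match score and an image-similarity score into one ranking
    # score. Weights are illustrative; real systems tune them empirically.
    def hybrid_score(text_score, image_score, w_text=0.4, w_image=0.6):
        return w_text * text_score + w_image * image_score

    # Hypothetical candidate records with per-component scores in [0, 1].
    candidates = {
        "case-017": hybrid_score(text_score=1.0, image_score=0.35),  # 0.61
        "case-102": hybrid_score(text_score=0.5, image_score=0.90),  # 0.74
    }
    best = max(candidates, key=candidates.get)  # "case-102"
    ```

    The ranking flips depending on the weights, which is why the balance between text and image components is itself a design decision in hybrid retrieval.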

  19. A multi-site cognitive task analysis for biomedical query mediation.

    PubMed

    Hruby, Gregory W; Rasmussen, Luke V; Hanauer, David; Patel, Vimla L; Cimino, James J; Weng, Chunhua

    2016-09-01

    To apply cognitive task analyses of biomedical query mediation (BQM) processes for EHR data retrieval at multiple sites towards the development of a generic BQM process model. We conducted semi-structured interviews with eleven data analysts from five academic institutions and one government agency, and performed cognitive task analyses on their BQM processes. A coding schema was developed through iterative refinement and used to annotate the interview transcripts. The annotated dataset was used to reconstruct and verify each BQM process and to develop a harmonized BQM process model. A survey was conducted to evaluate the face and content validity of this harmonized model. The harmonized process model is hierarchical, encompassing tasks, activities, and steps. The face validity evaluation concluded the model to be representative of the BQM process. In the content validity evaluation, out of the 27 tasks for BQM, 19 meet the threshold for semi-valid, including 3 fully valid: "Identify potential index phenotype," "If needed, request EHR database access rights," and "Perform query and present output to medical researcher", and 8 are invalid. We aligned the goals of the tasks within the BQM model with the five components of the reference interview. The similarity between the process of BQM and the reference interview is promising and suggests the BQM tasks are powerful for eliciting implicit information needs. We contribute a BQM process model based on a multi-site study. This model promises to inform the standardization of the BQM process towards improved communication efficiency and accuracy. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  20. A Multi-Site Cognitive Task Analysis for Biomedical Query Mediation

    PubMed Central

    Hruby, Gregory W.; Rasmussen, Luke V.; Hanauer, David; Patel, Vimla; Cimino, James J.; Weng, Chunhua

    2016-01-01

    Objective To apply cognitive task analyses of biomedical query mediation (BQM) processes for EHR data retrieval at multiple sites towards the development of a generic BQM process model. Materials and Methods We conducted semi-structured interviews with eleven data analysts from five academic institutions and one government agency, and performed cognitive task analyses on their BQM processes. A coding schema was developed through iterative refinement and used to annotate the interview transcripts. The annotated dataset was used to reconstruct and verify each BQM process and to develop a harmonized BQM process model. A survey was conducted to evaluate the face and content validity of this harmonized model. Results The harmonized process model is hierarchical, encompassing tasks, activities, and steps. The face validity evaluation concluded the model to be representative of the BQM process. In the content validity evaluation, out of the 27 tasks for BQM, 19 meet the threshold for semi-valid, including 3 fully valid: “Identify potential index phenotype,” “If needed, request EHR database access rights,” and “Perform query and present output to medical researcher”, and 8 are invalid. Discussion We aligned the goals of the tasks within the BQM model with the five components of the reference interview. The similarity between the process of BQM and the reference interview is promising and suggests the BQM tasks are powerful for eliciting implicit information needs. Conclusions We contribute a BQM process model based on a multi-site study. This model promises to inform the standardization of the BQM process towards improved communication efficiency and accuracy. PMID:27435950

  1. Dynamic tables: an architecture for managing evolving, heterogeneous biomedical data in relational database management systems.

    PubMed

    Corwin, John; Silberschatz, Avi; Miller, Perry L; Marenco, Luis

    2007-01-01

    Data sparsity and schema evolution issues affecting clinical informatics and bioinformatics communities have led to the adoption of vertical or object-attribute-value-based database schemas to overcome limitations posed when using conventional relational database technology. This paper explores these issues and discusses why biomedical data are difficult to model using conventional relational techniques. The authors propose a solution to these obstacles based on a relational database engine using a sparse, column-store architecture. The authors provide benchmarks comparing the performance of queries and schema-modification operations using three different strategies: (1) the standard conventional relational design; (2) past approaches used by biomedical informatics researchers; and (3) their sparse, column-store architecture. The performance results show that their architecture is a promising technique for storing and processing many types of data that are not handled well by the other two semantic data models.
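    The vertical, entity-attribute-value (EAV) layout that the paper's sparse architecture improves upon can be sketched in a few lines of SQL: sparse, heterogeneous records need no schema change when a new attribute appears. The rows below are hypothetical:

    ```python
    import sqlite3

    # Entity-attribute-value (EAV) storage: one narrow table holds all
    # attributes, so adding a new attribute is just another row, not an
    # ALTER TABLE. Data are hypothetical.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE eav (entity TEXT, attribute TEXT, value TEXT)")
    rows = [
        ("patient1", "diagnosis", "hypertension"),
        ("patient1", "sbp_mmHg", "150"),
        ("patient2", "genotype_rs699", "AG"),  # new attribute, no schema change
    ]
    db.executemany("INSERT INTO eav VALUES (?, ?, ?)", rows)

    # Retrieving one attribute across entities scans a single logical column,
    # which is the access pattern a column store optimizes.
    diagnoses = db.execute(
        "SELECT entity, value FROM eav WHERE attribute = 'diagnosis'"
    ).fetchall()  # [('patient1', 'hypertension')]
    ```

    The trade-off the paper benchmarks is that reconstructing a conventional wide row from EAV requires pivoting many such scans, which is where a sparse column-store engine can outperform both the EAV and the conventional relational design.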

  2. Characteristics Desired in Clinical Data Warehouse for Biomedical Research

    PubMed Central

    Shin, Soo-Yong; Kim, Woo Sung

    2014-01-01

    Objectives Due to the unique characteristics of clinical data, clinical data warehouses (CDWs) have not been successful so far. Specifically, the use of CDWs for biomedical research has been relatively unsuccessful thus far. The characteristics necessary for the successful implementation and operation of a CDW for biomedical research have not yet been clearly defined. Methods Three examples of CDWs were reviewed: a multipurpose CDW in a hospital, a CDW for independent multi-institutional research, and a CDW for research use in an institution. After reviewing the three CDW examples, we propose some key characteristics needed in a CDW for biomedical research. Results A CDW for research should include an honest broker system and an Institutional Review Board approval interface to comply with governmental regulations. It should also include a simple query interface, an anonymized data review tool, and a data extraction tool. Also, it should be a biomedical research platform for data repository use as well as data analysis. Conclusions The proposed characteristics desired in a CDW may have limited transfer value to organizations in other countries. However, these analysis results are still valid in Korea, and we have developed a clinical research data warehouse based on these desiderata. PMID:24872909

  3. OPPL-Galaxy, a Galaxy tool for enhancing ontology exploitation as part of bioinformatics workflows

    PubMed Central

    2013-01-01

    Background Biomedical ontologies are key elements for building up the Life Sciences Semantic Web. Reusing and building biomedical ontologies requires flexible and versatile tools to manipulate them efficiently, in particular for enriching their axiomatic content. The Ontology Pre Processor Language (OPPL) is an OWL-based language for automating the changes to be performed in an ontology. OPPL augments the ontologists’ toolbox by providing a more efficient, and less error-prone, mechanism for enriching a biomedical ontology than that obtained by a manual treatment. Results We present OPPL-Galaxy, a wrapper for using OPPL within Galaxy. The functionality delivered by OPPL (i.e. automated ontology manipulation) can be combined with the tools and workflows devised within the Galaxy framework, resulting in an enhancement of OPPL. Use cases are provided in order to demonstrate OPPL-Galaxy’s capability for enriching, modifying and querying biomedical ontologies. Conclusions Coupling OPPL-Galaxy with other bioinformatics tools of the Galaxy framework results in a system that is more than the sum of its parts. OPPL-Galaxy opens a new dimension of analyses and exploitation of biomedical ontologies, including automated reasoning, paving the way towards advanced biological data analyses. PMID:23286517

  4. Enhancing Biomedical Text Summarization Using Semantic Relation Extraction

    PubMed Central

    Shang, Yue; Li, Yanpeng; Lin, Hongfei; Yang, Zhihao

    2011-01-01

    Automatic text summarization for a biomedical concept can help researchers to get the key points of a certain topic from a large amount of biomedical literature efficiently. In this paper, we present a method for generating a text summary for a given biomedical concept, e.g., H1N1 disease, from multiple documents based on semantic relation extraction. Our approach includes three stages: 1) We extract semantic relations in each sentence using the semantic knowledge representation tool SemRep. 2) We develop a relation-level retrieval method to select the relations most relevant to each query concept and visualize them in a graphic representation. 3) For relations in the relevant set, we extract informative sentences that can interpret them from the document collection to generate a text summary using an information retrieval based method. Our major focus in this work is to investigate the contribution of semantic relation extraction to the task of biomedical text summarization. The experimental results on summarization for a set of diseases show that the introduction of semantic knowledge improves the performance and our results are better than those of the MEAD system, a well-known tool for text summarization. PMID:21887336
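    The three-stage pipeline above lends itself to a compact sketch of the final step. The toy relations and the overlap-counting score below are illustrative assumptions on my part; SemRep's actual output and the authors' retrieval-based sentence selection are considerably richer:

```python
# Sketch of relation-guided sentence selection (stage 3 of the pipeline
# described above). The (subject, predicate, object) triples stand in for
# SemRep output; the scoring scheme is an assumption, not the authors' method.

def score_sentence(sentence, relations):
    """Count how many relations have both arguments present in the sentence."""
    words = set(sentence.lower().split())
    return sum(1 for subj, _, obj in relations
               if subj in words and obj in words)

def summarize(sentences, relations, k=2):
    """Return the k sentences that cover the most extracted relations."""
    ranked = sorted(sentences, key=lambda s: score_sentence(s, relations),
                    reverse=True)
    return ranked[:k]

relations = [("h1n1", "CAUSES", "fever"), ("oseltamivir", "TREATS", "h1n1")]
sentences = [
    "H1N1 infection frequently presents with fever .",
    "Vaccination programs were expanded in 2009 .",
    "Oseltamivir is used to treat H1N1 influenza .",
]
```

    Sentences that instantiate no extracted relation (the vaccination sentence here) sink to the bottom of the ranking and are excluded from the summary.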

  5. Mapping Phenotypic Information in Heterogeneous Textual Sources to a Domain-Specific Terminological Resource

    PubMed Central

    Ananiadou, Sophia

    2016-01-01

    Biomedical literature articles and narrative content from Electronic Health Records (EHRs) both constitute rich sources of disease-phenotype information. Phenotype concepts may be mentioned in text in multiple ways, using phrases with a variety of structures. This variability stems partly from the different backgrounds of the authors, but also from the different writing styles typically used in each text type. Since EHR narrative reports and literature articles contain different but complementary types of valuable information, combining details from each text type can help to uncover new disease-phenotype associations. However, the alternative ways in which the same concept may be mentioned in each source constitutes a barrier to the automatic integration of information. Accordingly, identification of the unique concepts represented by phrases in text can help to bridge the gap between text types. We describe our development of a novel method, PhenoNorm, which integrates a number of different similarity measures to allow automatic linking of phenotype concept mentions to known concepts in the UMLS Metathesaurus, a biomedical terminological resource. PhenoNorm was developed using the PhenoCHF corpus—a collection of literature articles and narratives in EHRs, annotated for phenotypic information relating to congestive heart failure (CHF). We evaluate the performance of PhenoNorm in linking CHF-related phenotype mentions to Metathesaurus concepts, using a newly enriched version of PhenoCHF, in which each phenotype mention has an expert-verified link to a concept in the UMLS Metathesaurus. We show that PhenoNorm outperforms a number of alternative methods applied to the same task. Furthermore, we demonstrate PhenoNorm’s wider utility, by evaluating its ability to link mentions of various other types of medically-related information, occurring in texts covering wider subject areas, to concepts in different terminological resources. 
We show that PhenoNorm can maintain performance levels, and that its accuracy compares favourably to other methods applied to these tasks. PMID:27643689
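    The core idea of combining several similarity measures to link a mention to a concept can be sketched briefly. The candidate list, the two measures, and the equal weights below are illustrative assumptions, not PhenoNorm's actual configuration; a real system would score against the full UMLS Metathesaurus:

```python
# Minimal sketch of linking a phenotype mention to a terminology concept by
# combining string-similarity measures, in the spirit of PhenoNorm.
from difflib import SequenceMatcher

def token_jaccard(a, b):
    """Jaccard overlap of the two phrases' token sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def char_ratio(a, b):
    """Character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def link_mention(mention, candidates, w_jaccard=0.5, w_char=0.5):
    """Return the (id, term) candidate with the highest combined score."""
    def combined(term):
        return w_jaccard * token_jaccard(mention, term) + w_char * char_ratio(mention, term)
    return max(candidates, key=lambda c: combined(c[1]))

# Hypothetical (CUI, preferred term) pairs standing in for UMLS concepts.
candidates = [("C0018802", "congestive heart failure"),
              ("C0020538", "hypertensive disease")]
cui, term = link_mention("congestive cardiac failure", candidates)
```

    Combining a token-level and a character-level measure lets the linker tolerate both word substitutions ("cardiac" for "heart") and minor spelling variation, which neither measure handles well alone.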

  6. Using local language syndromic terminology in participatory epidemiology: Lessons for One Health practitioners among the Maasai of Ngorongoro, Tanzania.

    PubMed

    Queenan, Kevin; Mangesho, Peter; Ole-Neselle, Moses; Karimuribo, Esron; Rweyemamu, Mark; Kock, Richard; Häsler, Barbara

    2017-04-01

    Pastoralists and agro-pastoralists often occupy remote and hostile environments, which lack infrastructure and capacity in human and veterinary healthcare and disease surveillance systems. Participatory epidemiology (PE) and Participatory Disease Surveillance (PDS) are particularly useful in situations of resource scarcity, where conventional diagnostics and surveillance data of disease prevalence may be intermittent or limited. Livestock keepers, when participating in PE studies about health issues, commonly use their local language terms, which are often syndromic and descriptive in nature. Practitioners of PE recommend confirmation of their findings with triangulation including biomedical diagnostic techniques. However, the latter is not practiced in all studies, usually due to time, financial or logistical constraints. A cross sectional study was undertaken with the Maasai of Ngorongoro District, Tanzania. It aimed to identify the terms used to describe the infectious diseases of livestock and humans with the greatest perceived impact on livelihoods. Furthermore, it aimed to characterise the usefulness and limitations of relying on local terminology when conducting PE studies in which diagnoses were not confirmed. Semi-structured interviews were held with 23 small groups, totalling 117 community members within five villages across the district. In addition, informal discussions and field observations were conducted with village elders, district veterinary and medical officers, meat inspectors and livestock field officers. For human conditions including zoonoses, several biomedical terms are now part of the common language. Conversely, livestock conditions are described using local Maasai terms, usually associated with the signs observed by the livestock keeper. Several of these descriptive, syndromic terms are used inconsistently and showed temporal and spatial variations. 
This study highlights the complexity and ambiguity which may exist in local terminology when used in PE studies. It emphasises the need for further analysis of such findings, including laboratory diagnosis where possible to improve specificity, before incorporating them into PDS or disease control interventions. Copyright © 2017 Elsevier B.V. All rights reserved.

  7. Automated Database Mediation Using Ontological Metadata Mappings

    PubMed Central

    Marenco, Luis; Wang, Rixin; Nadkarni, Prakash

    2009-01-01

    Objective To devise an automated approach for integrating federated database information using database ontologies constructed from their extended metadata. Background One challenge of database federation is that the granularity of representation of equivalent data varies across systems. Dealing effectively with this problem is analogous to dealing with precoordinated vs. postcoordinated concepts in biomedical ontologies. Model Description The authors describe an approach based on ontological metadata mapping rules defined with elements of a global vocabulary, which allows a query specified at one granularity level to fetch data, where possible, from databases within the federation that use different granularities. This is implemented in OntoMediator, a newly developed production component of our previously described Query Integrator System. OntoMediator's operation is illustrated with a query that accesses three geographically separate, interoperating databases. An example based on SNOMED also illustrates the applicability of high-level rules to support the enforcement of constraints that can prevent inappropriate curator or power-user actions. Summary A rule-based framework simplifies the design and maintenance of systems where categories of data must be mapped to each other, for the purpose of either cross-database query or for curation of the contents of compositional controlled vocabularies. PMID:19567801

  8. CellLineNavigator: a workbench for cancer cell line analysis

    PubMed Central

    Krupp, Markus; Itzel, Timo; Maass, Thorsten; Hildebrandt, Andreas; Galle, Peter R.; Teufel, Andreas

    2013-01-01

    The CellLineNavigator database, freely available at http://www.medicalgenomics.org/celllinenavigator, is a web-based workbench for large scale comparisons of a large collection of diverse cell lines. It aims to support experimental design in the fields of genomics, systems biology and translational biomedical research. Currently, this compendium holds genome-wide expression profiles of 317 different cancer cell lines, categorized into 57 different pathological states and 28 individual tissues. To enlarge the scope of CellLineNavigator, the database was furthermore closely linked to commonly used bioinformatics databases and knowledge repositories. To ensure easy data access and searchability, a simple data interface and an intuitive query interface were implemented. These allow the user to explore and filter gene expression, focusing on pathological or physiological conditions. For a more complex search, the advanced query interface may be used to query for (i) differentially expressed genes; (ii) pathological or physiological conditions; or (iii) gene names or functional attributes, such as Kyoto Encyclopaedia of Genes and Genomes pathway maps. These queries may also be combined. Finally, CellLineNavigator allows additional advanced analysis of differentially regulated genes by a direct link to the Database for Annotation, Visualization and Integrated Discovery (DAVID) Bioinformatics Resources. PMID:23118487

  9. Image query and indexing for digital x rays

    NASA Astrophysics Data System (ADS)

    Long, L. Rodney; Thoma, George R.

    1998-12-01

    The web-based medical information retrieval system (WebMIRS) allows Internet access to databases containing 17,000 digitized x-ray spine images and associated text data from National Health and Nutrition Examination Surveys (NHANES). WebMIRS allows SQL query of the text, and viewing of the returned text records and images using a standard browser. We are now working (1) to determine the utility of data directly derived from the images in our databases, and (2) to investigate the feasibility of computer-assisted or automated indexing of the images to support retrieval of images of interest to biomedical researchers in the field of osteoarthritis. To build an initial database based on image data, we are manually segmenting a subset of the vertebrae, using techniques from vertebral morphometry. From this, we will derive vertebral features and add them to the database. This image-derived data will enhance the user's data access capability by enabling the creation of combined SQL/image-content queries.

  10. An RDF/OWL knowledge base for query answering and decision support in clinical pharmacogenetics.

    PubMed

    Samwald, Matthias; Freimuth, Robert; Luciano, Joanne S; Lin, Simon; Powers, Robert L; Marshall, M Scott; Adlassnig, Klaus-Peter; Dumontier, Michel; Boyce, Richard D

    2013-01-01

    Genetic testing for personalizing pharmacotherapy is bound to become an important part of clinical routine. To address associated issues with data management and quality, we are creating a semantic knowledge base for clinical pharmacogenetics. The knowledge base is made up of three components: an expressive ontology formalized in the Web Ontology Language (OWL 2 DL), a Resource Description Framework (RDF) model for capturing detailed results of manual annotation of pharmacogenomic information in drug product labels, and an RDF conversion of relevant biomedical datasets. Our work goes beyond the state of the art in that it makes both automated reasoning as well as query answering as simple as possible, and the reasoning capabilities go beyond the capabilities of previously described ontologies.

  11. A LDA-based approach to promoting ranking diversity for genomics information retrieval.

    PubMed

    Chen, Yan; Yin, Xiaoshi; Li, Zhoujun; Hu, Xiaohua; Huang, Jimmy Xiangji

    2012-06-11

    In the biomedical domain, there is an immense amount of data and a tremendous increase in the number of genomics and biomedical publications. This wealth of information has led to increasing interest in and need for applying information retrieval techniques to access the scientific literature in genomics and related biomedical disciplines. In many cases, the desired answer to a query asked by biologists is a list of a certain type of entities covering different aspects related to the question, such as cells, genes, diseases, proteins, mutations, etc. Hence, it is important for a biomedical IR system to be able to provide relevant and diverse answers to fulfill biologists' information needs. However, traditional IR models are concerned only with the relevance between retrieved documents and the user query, and do not take redundancy between retrieved documents into account. This leads to high redundancy and low diversity in the retrieved ranked lists. In this paper, we propose an approach which employs a topic generative model called Latent Dirichlet Allocation (LDA) to promote ranking diversity for biomedical information retrieval. Different from other approaches or models which consider aspects at the word level, our approach assumes that aspects should be identified by the topics of retrieved documents. We use the LDA model to discover the topic distribution of retrieved passages and the word distribution of each topic dimension, and then re-rank retrieval results by topic distribution similarity between passages based on an N-size sliding window. We apply our approach to the TREC 2007 Genomics collection and two distinctive IR baseline runs, achieving an 8% improvement over the highest Aspect MAP reported in the TREC 2007 Genomics track. The proposed method is the first study to adopt a topic model for genomics information retrieval, and demonstrates its effectiveness in promoting ranking diversity as well as in improving the relevance of ranked lists of genomics search results. Moreover, we propose a distance measure to quantify how much a passage can increase topical diversity by considering both topical importance and topical coefficients estimated by LDA; the distance measure is a modified Euclidean distance.
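    The diversity-promoting re-ranking idea can be illustrated with a small sketch. The topic distributions below are hard-coded rather than inferred with LDA, and the greedy trade-off between relevance and topical novelty is a simplification of the paper's sliding-window method; the numbers and the alpha weight are assumptions:

```python
# Illustrative re-ranking sketch: given per-passage topic distributions
# (here hard-coded instead of inferred with LDA), greedily promote passages
# whose topic mix is far (Euclidean distance) from passages already ranked.
import math

def euclid(p, q):
    """Euclidean distance between two topic distributions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def diversify(passages, topics, alpha=0.5):
    """Re-rank by relevance score plus distance to already-selected topics."""
    remaining = list(passages)
    selected = []
    while remaining:
        def gain(pid):
            score = passages[pid]
            novelty = min((euclid(topics[pid], topics[s]) for s in selected),
                          default=1.0)
            return (1 - alpha) * score + alpha * novelty
        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected

passages = {"p1": 0.9, "p2": 0.85, "p3": 0.5}   # relevance scores
topics = {"p1": [0.8, 0.1, 0.1],                 # p2 nearly duplicates p1's topics
          "p2": [0.75, 0.15, 0.1],
          "p3": [0.1, 0.1, 0.8]}
order = diversify(passages, topics)
```

    Although p2 is the second most relevant passage, its topic mix nearly duplicates p1's, so the topically distinct p3 is promoted ahead of it.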

  12. SWS: accessing SRS sites contents through Web Services.

    PubMed

    Romano, Paolo; Marra, Domenico

    2008-03-26

    Web Services and Workflow Management Systems can support creation and deployment of network systems, able to automate data analysis and retrieval processes in biomedical research. Web Services have been implemented at bioinformatics centres and workflow systems have been proposed for biological data analysis. New databanks are often developed by taking these technologies into account, but many existing databases do not allow programmatic access. Only a fraction of available databanks can thus be queried through programmatic interfaces. SRS is a well-known indexing and search engine for biomedical databanks offering public access to many databanks and analysis tools. Unfortunately, these data are not easily and efficiently accessible through Web Services. We have developed 'SRS by WS' (SWS), a tool that makes information available in SRS sites accessible through Web Services. Information on known sites is maintained in a database, srsdb. SWS consists of a suite of WS that can query both srsdb, for information on sites and databases, and SRS sites. SWS returns results in a text-only format and can be accessed through a WSDL compliant client. SWS enables interoperability between workflow systems and SRS implementations, by also managing access to alternative sites, in order to cope with network and maintenance problems, and selecting the most up-to-date among available systems. Development and implementation of Web Services that allow programmatic access to an exhaustive set of biomedical databases can significantly improve automation of in-silico analysis. SWS supports this activity by making biological databanks that are managed in public SRS sites available through a programmatic interface.

  13. Alkemio: association of chemicals with biomedical topics by text and data mining

    PubMed Central

    Gijón-Correas, José A.; Andrade-Navarro, Miguel A.; Fontaine, Jean F.

    2014-01-01

    The PubMed® database of biomedical citations allows the retrieval of scientific articles studying the function of chemicals in biology and medicine. Mining millions of available citations to search reported associations between chemicals and topics of interest would require substantial human time. We have implemented the Alkemio text mining web tool and SOAP web service to help in this task. The tool uses biomedical articles discussing chemicals (including drugs), predicts their relatedness to the query topic with a naïve Bayesian classifier and ranks all chemicals by P-values computed from random simulations. Benchmarks on seven human pathways showed good retrieval performance (areas under the receiver operating characteristic curves ranged from 73.6 to 94.5%). Comparison with existing tools to retrieve chemicals associated with eight diseases showed the higher precision and recall of Alkemio when considering the top 10 candidate chemicals. Alkemio is a high-performing web tool ranking chemicals for any biomedical topic, and it is free to non-commercial users. Availability: http://cbdm.mdc-berlin.de/∼medlineranker/cms/alkemio. PMID:24838570
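    The ranking step, P-values computed from random simulations, can be sketched in a few lines. The classifier scores and the null distribution below are fabricated for illustration; in Alkemio the scores would come from the naïve Bayesian classifier over PubMed abstracts:

```python
# Sketch of empirical P-value ranking: each chemical's topic-relatedness
# score is compared against scores from random simulations. Scores here are
# made up; a real system would obtain them from a trained classifier.
import random

def empirical_pvalue(score, null_scores):
    """Fraction of null scores at least as large as the observed score."""
    hits = sum(1 for s in null_scores if s >= score)
    return (hits + 1) / (len(null_scores) + 1)   # add-one to avoid p = 0

random.seed(0)
null_scores = [random.gauss(0.0, 1.0) for _ in range(999)]

chemicals = {"aspirin": 3.2, "caffeine": 0.1}    # hypothetical classifier scores
ranked = sorted(chemicals, key=lambda c: empirical_pvalue(chemicals[c], null_scores))
```

    The add-one correction is a common convention for permutation-style P-values: an observed score larger than every simulated score gets the smallest resolvable P-value rather than an impossible zero.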

  14. Harvest: an open platform for developing web-based biomedical data discovery and reporting applications.

    PubMed

    Pennington, Jeffrey W; Ruth, Byron; Italia, Michael J; Miller, Jeffrey; Wrazien, Stacey; Loutrel, Jennifer G; Crenshaw, E Bryan; White, Peter S

    2014-01-01

    Biomedical researchers share a common challenge of making complex data understandable and accessible as they seek inherent relationships between attributes in disparate data types. Data discovery in this context is limited by a lack of query systems that efficiently show relationships between individual variables, but without the need to navigate underlying data models. We have addressed this need by developing Harvest, an open-source framework of modular components, and using it for the rapid development and deployment of custom data discovery software applications. Harvest incorporates visualizations of highly dimensional data in a web-based interface that promotes rapid exploration and export of any type of biomedical information, without exposing researchers to underlying data models. We evaluated Harvest with two cases: clinical data from pediatric cardiology and demonstration data from the OpenMRS project. Harvest's architecture and public open-source code offer a set of rapid application development tools to build data discovery applications for domain-specific biomedical data repositories. All resources, including the OpenMRS demonstration, can be found at http://harvest.research.chop.edu.

  15. Harvest: an open platform for developing web-based biomedical data discovery and reporting applications

    PubMed Central

    Pennington, Jeffrey W; Ruth, Byron; Italia, Michael J; Miller, Jeffrey; Wrazien, Stacey; Loutrel, Jennifer G; Crenshaw, E Bryan; White, Peter S

    2014-01-01

    Biomedical researchers share a common challenge of making complex data understandable and accessible as they seek inherent relationships between attributes in disparate data types. Data discovery in this context is limited by a lack of query systems that efficiently show relationships between individual variables, but without the need to navigate underlying data models. We have addressed this need by developing Harvest, an open-source framework of modular components, and using it for the rapid development and deployment of custom data discovery software applications. Harvest incorporates visualizations of highly dimensional data in a web-based interface that promotes rapid exploration and export of any type of biomedical information, without exposing researchers to underlying data models. We evaluated Harvest with two cases: clinical data from pediatric cardiology and demonstration data from the OpenMRS project. Harvest's architecture and public open-source code offer a set of rapid application development tools to build data discovery applications for domain-specific biomedical data repositories. All resources, including the OpenMRS demonstration, can be found at http://harvest.research.chop.edu PMID:24131510

  16. Harnessing Biomedical Natural Language Processing Tools to Identify Medicinal Plant Knowledge from Historical Texts.

    PubMed

    Sharma, Vivekanand; Law, Wayne; Balick, Michael J; Sarkar, Indra Neil

    2017-01-01

    The growing amount of data describing historical medicinal uses of plants from digitization efforts provides the opportunity to develop systematic approaches for identifying potential plant-based therapies. However, cataloguing plant use information from natural language text is a challenging task for ethnobotanists. To date, there has been only limited adoption of informatics approaches for supporting the identification of ethnobotanical information associated with medicinal uses. This study explored the feasibility of using biomedical terminologies and natural language processing approaches for extracting relevant plant-associated therapeutic use information from the historical biodiversity literature collection available from the Biodiversity Heritage Library. The results from this preliminary study suggest that informatics methods have potential utility for identifying medicinal plant knowledge from digitized resources, and also highlight opportunities for improvement.

  17. Harnessing Biomedical Natural Language Processing Tools to Identify Medicinal Plant Knowledge from Historical Texts

    PubMed Central

    Sharma, Vivekanand; Law, Wayne; Balick, Michael J.; Sarkar, Indra Neil

    2017-01-01

    The growing amount of data describing historical medicinal uses of plants from digitization efforts provides the opportunity to develop systematic approaches for identifying potential plant-based therapies. However, cataloguing plant use information from natural language text is a challenging task for ethnobotanists. To date, there has been only limited adoption of informatics approaches for supporting the identification of ethnobotanical information associated with medicinal uses. This study explored the feasibility of using biomedical terminologies and natural language processing approaches for extracting relevant plant-associated therapeutic use information from the historical biodiversity literature collection available from the Biodiversity Heritage Library. The results from this preliminary study suggest that informatics methods have potential utility for identifying medicinal plant knowledge from digitized resources, and also highlight opportunities for improvement. PMID:29854223

  18. Simplified Deployment of Health Informatics Applications by Providing Docker Images.

    PubMed

    Löbe, Matthias; Ganslandt, Thomas; Lotzmann, Lydia; Mate, Sebastian; Christoph, Jan; Baum, Benjamin; Sariyar, Murat; Wu, Jie; Stäubert, Sebastian

    2016-01-01

    Due to the specific needs of biomedical researchers, in-house development of software is widespread. A common problem is to maintain and enhance software after the funded project has ended. Even if many tools are made open source, only a couple of projects manage to attract a user basis large enough to ensure sustainability. Reasons for this include complex installation and configuration of biomedical software as well as an ambiguous terminology of the features provided; all of which make evaluation of software laborious. Docker is a para-virtualization technology based on Linux containers that eases deployment of applications and facilitates evaluation. We investigated a suite of software developments funded by a large umbrella organization for networked medical research within the last 10 years and created Docker containers for a number of applications to support utilization and dissemination.

  19. CrossQuery: a web tool for easy associative querying of transcriptome data.

    PubMed

    Wagner, Toni U; Fischer, Andreas; Thoma, Eva C; Schartl, Manfred

    2011-01-01

    Enormous amounts of data are being generated by modern methods such as transcriptome or exome sequencing and microarray profiling. Primary analyses such as quality control, normalization, statistics and mapping are highly complex and need to be performed by specialists. Thereafter, results are handed back to biomedical researchers, who are then confronted with complicated data lists. For rather simple tasks like data filtering, sorting and cross-association there is a need for new tools which can be used by non-specialists. Here, we describe CrossQuery, a web tool that enables straightforward, simple-syntax queries to be executed on transcriptome sequencing and microarray datasets. We provide deep-sequencing data sets of stem cell lines derived from the model fish Medaka and microarray data of human endothelial cells. In the example datasets provided, mRNA expression levels, gene, transcript and sample identification numbers, GO-terms and gene descriptions can be freely correlated, filtered and sorted. Queries can be saved for later reuse and results can be exported to standard formats that allow copy-and-paste to all widespread data visualization tools such as Microsoft Excel. CrossQuery enables researchers to quickly and freely work with transcriptome and microarray data sets while requiring only minimal computer skills. Furthermore, CrossQuery allows the progressive association of multiple datasets as long as at least one common point of correlated information, such as transcript identification numbers or GO-terms, is shared between samples. For advanced users, the object-oriented plug-in and event-driven code design of both server-side and client-side scripts allow easy addition of new features, data sources and data types.
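    The cross-association idea, joining datasets on a shared identifier and then filtering and sorting, can be sketched compactly. The field names and records below are illustrative, not CrossQuery's actual schema:

```python
# Minimal sketch of CrossQuery-style cross-association: two datasets are
# joined on a shared transcript identifier, then filtered and sorted.

rnaseq = [{"transcript": "T1", "expr": 120.0},
          {"transcript": "T2", "expr": 3.5},
          {"transcript": "T3", "expr": 48.0}]
annotation = [{"transcript": "T1", "go": "GO:0006915"},
              {"transcript": "T3", "go": "GO:0007049"}]

def cross_associate(left, right, key):
    """Inner-join two record lists on a common key field."""
    index = {r[key]: r for r in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]

joined = cross_associate(rnaseq, annotation, "transcript")
expressed = sorted((r for r in joined if r["expr"] > 10),
                   key=lambda r: r["expr"], reverse=True)
```

    As in CrossQuery, the only prerequisite for associating two datasets is one shared point of correlated information, here the transcript identifier.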

  20. Using RxNorm and NDF-RT to classify medication data extracted from electronic health records: experiences from the Rochester Epidemiology Project.

    PubMed

    Pathak, Jyotishman; Murphy, Sean P; Willaert, Brian N; Kremers, Hilal M; Yawn, Barbara P; Rocca, Walter A; Chute, Christopher G

    2011-01-01

    RxNorm and NDF-RT published by the National Library of Medicine (NLM) and Veterans Affairs (VA), respectively, are two publicly available federal medication terminologies. In this study, we evaluate the applicability of RxNorm and National Drug File-Reference Terminology (NDF-RT) for extraction and classification of medication data retrieved using structured querying and natural language processing techniques from electronic health records at two different medical centers within the Rochester Epidemiology Project (REP). Specifically, we explore how mappings between RxNorm concept codes and NDF-RT drug classes can be leveraged for hierarchical organization and grouping of REP medication data, identify gaps and coverage issues, and analyze the recently released NLM's NDF-RT Web service API. Our study concludes that RxNorm and NDF-RT can be applied together for classification of medication extracted from multiple EHR systems, although several issues and challenges remain to be addressed. We further conclude that the Web service APIs developed by the NLM provide useful functionalities for such activities.
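    The hierarchical grouping step described above can be sketched as a simple classification pass over extracted medication records. The RxCUI-to-class mapping below is a hand-made stand-in: a real pipeline would obtain these mappings from the NLM's RxNorm and NDF-RT web services, and the class names here are illustrative assumptions:

```python
# Sketch of grouping EHR medication records via an RxNorm-to-NDF-RT class map,
# including collection of coverage gaps (RxCUIs with no class mapping).
from collections import defaultdict

rxcui_to_class = {"1191": "Analgesics",          # hypothetical assignments
                  "6809": "Antidiabetic Agents",
                  "860975": "Antidiabetic Agents"}

ehr_medications = [("1191", "aspirin 325 mg tab"),
                   ("6809", "metformin"),
                   ("860975", "metformin ER 500 mg"),
                   ("99999", "unmapped compound")]

def classify(meds, mapping):
    """Group medication entries by drug class; collect coverage gaps."""
    grouped, unmapped = defaultdict(list), []
    for rxcui, text in meds:
        if rxcui in mapping:
            grouped[mapping[rxcui]].append(text)
        else:
            unmapped.append(text)
    return dict(grouped), unmapped

grouped, unmapped = classify(ehr_medications, rxcui_to_class)
```

    Tracking the unmapped entries explicitly is what surfaces the gaps and coverage issues the study analyzes.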

  1. Designing Reliable Cohorts of Cardiac Patients across MIMIC and eICU

    PubMed Central

    Chronaki, Catherine; Shahin, Abdullah; Mark, Roger

    2016-01-01

    The design of the patient cohort is an essential and fundamental part of any clinical patient study. Knowledge of the Electronic Health Records, underlying Database Management System, and the relevant clinical workflows are central to an effective cohort design. However, with technical, semantic, and organizational interoperability limitations, the database queries associated with a patient cohort may need to be reconfigured in every participating site. i2b2 and SHRINE advance the notion of patient cohorts as first class objects to be shared, aggregated, and recruited for research purposes across clinical sites. This paper reports on initial efforts to assess the integration of Medical Information Mart for Intensive Care (MIMIC) and Philips eICU, two large-scale anonymized intensive care unit (ICU) databases, using standard terminologies, i.e. LOINC, ICD9-CM and SNOMED-CT. The focus of this work is on lab and microbiology observations and key demographics for patients with a primary cardiovascular ICD9-CM diagnosis. Results and discussion, reflecting on core reference terminology standards, offer insights on efforts to combine detailed intensive care data from multiple ICUs worldwide. PMID:27774488
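    A cohort definition like "patients with a primary cardiovascular ICD9-CM diagnosis" reduces to a code-range test, since ICD9-CM assigns diseases of the circulatory system to codes 390-459. The records below are fabricated; MIMIC and eICU each use their own schemas:

```python
# Toy cohort filter in the spirit of the study above: select ICU stays whose
# primary ICD9-CM diagnosis falls in the circulatory-system range (390-459).

def is_cardiovascular(icd9_code):
    """True for ICD9-CM codes 390-459 (diseases of the circulatory system)."""
    try:
        major = int(icd9_code.split(".")[0])
    except ValueError:            # V/E codes have non-numeric prefixes
        return False
    return 390 <= major <= 459

stays = [{"id": 1, "primary_dx": "410.71"},   # acute myocardial infarction
         {"id": 2, "primary_dx": "486"},      # pneumonia
         {"id": 3, "primary_dx": "428.0"}]    # congestive heart failure

cohort = [s["id"] for s in stays if is_cardiovascular(s["primary_dx"])]
```

    Expressing the cohort as a reusable predicate rather than a site-specific SQL query is one way to cope with the interoperability limitations the paper discusses: only the data access layer, not the cohort definition, must be reconfigured per site.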

  2. Quality assessment of structure and language elements of written responses given by seven Scandinavian drug information centres.

    PubMed

    Reppe, Linda Amundstuen; Spigset, Olav; Kampmann, Jens Peter; Damkier, Per; Christensen, Hanne Rolighed; Böttiger, Ylva; Schjøtt, Jan

    2017-05-01

    The aim of this study was to identify structure and language elements affecting the quality of responses from Scandinavian drug information centres (DICs). Six different fictitious drug-related queries were sent to each of seven Scandinavian DICs. The centres were blinded to which queries were part of the study. The responses were assessed qualitatively by six clinical pharmacologists (internal experts) and six general practitioners (GPs, external experts). In addition, linguistic aspects of the responses were evaluated by a plain language expert. The quality of responses was generally judged as satisfactory to good. Presenting specific advice and conclusions was considered to improve the quality of the responses. However, small nuances in language formulations could affect the individual judgments of the experts, e.g. on whether or not advice was given. Some experts preferred the use of primary sources to the use of secondary and tertiary sources. Both internal and external experts criticised the use of abbreviations, professional terminology and study findings that were left unexplained. The plain language expert emphasised the importance of defining and explaining pharmacological terms to ensure that enquirers understand the response as intended. In addition, more use of active voice and a less compressed text structure would be desirable. This evaluation of responses to DIC queries may give some indications of how to improve written responses to drug-related queries with respect to language and text structure. Giving specific advice and precise conclusions and avoiding overly compressed language and non-standard abbreviations may help to reach this goal.

  3. Modeling and mining term association for improving biomedical information retrieval performance.

    PubMed

    Hu, Qinmin; Huang, Jimmy Xiangji; Hu, Xiaohua

    2012-06-11

    The growth of biomedical information requires most information retrieval systems to provide short and specific answers in response to complex user queries. Semantic information in free text is structured in a way that makes it straightforward for humans to read but difficult for computers to interpret automatically and search efficiently. One reason is that most traditional information retrieval models assume terms are conditionally independent given a document or passage. We are therefore motivated to consider term associations within different contexts to help the models understand semantic information and use it to improve biomedical information retrieval performance. We propose a term association approach to discover associations among the keywords of a query. The experiments are conducted on the TREC 2004-2007 Genomics data sets and the TREC 2004 HARD data set. The proposed approach is promising and outperforms both the baselines and the GSP results. Parameter settings and different indices are investigated: the sentence-based index produces the best results at the document level, the word-based index at the passage level, and the paragraph-based index at the passage2 level. Furthermore, the best term association results always come from the best baseline. The tuning number k in the proposed recursive re-ranking algorithm is discussed and locally optimized to be 10. First, modelling term association for improving biomedical information retrieval using factor analysis is one of the major contributions of our work. Second, the experiments confirm that term association considering co-occurrence and dependency among the keywords can produce better results than baselines that treat the keywords independently. Third, the baselines are re-ranked according to the importance and reliance of latent factors behind term associations. These latent factors are decided by the proposed model and the term appearances in the first-round retrieved passages.
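    The abstract above describes re-ranking passages by latent factors extracted from term associations. As a rough illustration only (the paper's actual factor-analysis model is not reproduced here), the sketch below builds a toy term-by-passage co-occurrence matrix, extracts latent factors with an SVD, and scores each passage by its loading on the strongest factors. All names and data are invented for the example.

```python
import numpy as np

def latent_term_scores(term_passage_counts, n_factors=2):
    """Decompose a term-by-passage count matrix with SVD and score each
    passage by how strongly its terms load on the top latent factors.
    A toy stand-in for the factor-analysis model in the abstract."""
    # Center each term's counts across passages so the factors capture
    # co-occurrence structure rather than raw term frequency.
    X = term_passage_counts - term_passage_counts.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Passage score: factor loadings weighted by singular values
    # (the "importance" of each latent factor).
    return (s[:n_factors, None] * np.abs(Vt[:n_factors])).sum(axis=0)

# Toy data: rows = query terms, columns = first-round retrieved passages.
counts = np.array([[3., 0., 2.],
                   [2., 1., 2.],
                   [0., 4., 0.]])
order = np.argsort(-latent_term_scores(counts))  # re-ranked passage indices
```

In the paper's setting such scores would be combined with the baseline ranking over the first-round retrieved passages rather than used alone.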

  4. Modeling and mining term association for improving biomedical information retrieval performance

    PubMed Central

    2012-01-01

    Background The growth of biomedical information requires most information retrieval systems to provide short and specific answers in response to complex user queries. Semantic information in free text is structured in a way that makes it straightforward for humans to read but difficult for computers to interpret automatically and search efficiently. One reason is that most traditional information retrieval models assume terms are conditionally independent given a document or passage. We are therefore motivated to consider term associations within different contexts to help the models understand semantic information and use it to improve biomedical information retrieval performance. Results We propose a term association approach to discover associations among the keywords of a query. The experiments are conducted on the TREC 2004-2007 Genomics data sets and the TREC 2004 HARD data set. The proposed approach is promising and outperforms both the baselines and the GSP results. Parameter settings and different indices are investigated: the sentence-based index produces the best results at the document level, the word-based index at the passage level, and the paragraph-based index at the passage2 level. Furthermore, the best term association results always come from the best baseline. The tuning number k in the proposed recursive re-ranking algorithm is discussed and locally optimized to be 10. Conclusions First, modelling term association for improving biomedical information retrieval using factor analysis is one of the major contributions of our work. Second, the experiments confirm that term association considering co-occurrence and dependency among the keywords can produce better results than baselines that treat the keywords independently. Third, the baselines are re-ranked according to the importance and reliance of latent factors behind term associations. These latent factors are decided by the proposed model and the term appearances in the first-round retrieved passages. PMID:22901087

  5. Formal ontologies in biomedical knowledge representation.

    PubMed

    Schulz, S; Jansen, L

    2013-01-01

    Medical decision support and other intelligent applications in the life sciences depend on increasing amounts of digital information. Knowledge bases as well as formal ontologies are being used to organize biomedical knowledge and data. However, these two kinds of artefacts are not always clearly distinguished. Whereas the popular RDF(S) standard provides an intuitive triple-based representation, it is semantically weak. Description logics based ontology languages like OWL-DL carry a clear-cut semantics, but they are computationally expensive, and they are often misinterpreted to encode all kinds of statements, including those which are not ontological. We distinguish four kinds of statements needed to comprehensively represent domain knowledge: universal statements, terminological statements, statements about particulars and contingent statements. We argue that the task of formal ontologies is solely to represent universal statements, while the non-ontological kinds of statements can nevertheless be connected with ontological representations. To illustrate these four types of representations, we use a running example from parasitology. We finally formulate recommendations for semantically adequate ontologies that can efficiently be used as a stable framework for more context-dependent biomedical knowledge representation and reasoning applications like clinical decision support systems.

  6. Conceptual mapping of user's queries to medical subject headings.

    PubMed Central

    Zieman, Y. L.; Bleich, H. L.

    1997-01-01

    This paper describes a way to map users' queries to relevant Medical Subject Headings (MeSH terms) used by the National Library of Medicine to index the biomedical literature. The method, called SENSE (SEarch with New SEmantics), transforms words and phrases in the users' queries into primary conceptual components and compares these components with those of the MeSH vocabulary. Similar to the way in which most numbers can be split into numerical factors and expressed as their product -- for example, 42 can be expressed as 2*21, 6*7, 3*14, or 2*3*7 -- so most medical concepts can be split into "semantic factors" and expressed as their juxtaposition. Note that if we split 42 into its primary factors, the breakdown is unique: 2*3*7. Similarly, when we split medical concepts into their "primary semantic factors" the breakdown is also unique. For example, the MeSH term 'renovascular hypertension' can be split morphologically into reno, vascular, hyper, and tension -- morphemes that can then be translated into their primary semantic factors -- kidney, blood vessel, high, and pressure. By "factoring" each MeSH term in this way, and by similarly factoring the user's query, we can match query to MeSH term by searching for combinations of common factors. Unlike UMLS and other methods that match at the level of words or phrases, SENSE matches at the level of concepts; in this way, a wide variety of words and phrases that have the same meaning produce the same match. Now used in PaperChase, the method is surprisingly powerful in matching users' queries to Medical Subject Headings. PMID:9357680
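    The factoring idea above can be sketched in a few lines. The lexicon below is purely illustrative (the real SENSE factor tables are not public): it maps morphemes and lay words to primary semantic factors, and a MeSH term matches when its factor set is fully covered by the query's factors.

```python
# Illustrative lexicon mapping surface forms (morphemes or lay words)
# to primary semantic factors; the real SENSE tables are not public.
LEXICON = {
    "reno": "kidney", "kidney": "kidney",
    "vascular": "blood vessel", "vessel": "blood vessel",
    "hyper": "high", "high": "high",
    "tension": "pressure", "pressure": "pressure",
}

def semantic_factors(text):
    """Return the set of primary semantic factors for every known
    surface form found in the text."""
    t = text.lower()
    return {factor for surface, factor in LEXICON.items() if surface in t}

def match_mesh(query, mesh_terms):
    """A MeSH term matches when its factor set is non-empty and fully
    covered by the query's factor set."""
    q = semantic_factors(query)
    return [m for m in mesh_terms
            if semantic_factors(m) and semantic_factors(m) <= q]

hits = match_mesh("high blood pressure in the kidney vessels",
                  ["renovascular hypertension", "cardiomegaly"])
```

Here the lay query and the Latinate MeSH term share no words at all, yet factor to the same concept set {kidney, blood vessel, high, pressure}, which is exactly the behaviour the abstract describes.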

  7. SeqWare Query Engine: storing and searching sequence data in the cloud.

    PubMed

    O'Connor, Brian D; Merriman, Barry; Nelson, Stanley F

    2010-12-21

    Since the introduction of next-generation DNA sequencers, the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provides a compelling solution to these ever increasing demands. In this work, we present the SeqWare Query Engine, which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc.) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a natural fit for storing and searching ever-growing genome sequence datasets.
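    A key property that makes HBase-style stores suit variant databases is that rows are kept sorted by key, so a genomic interval query becomes a cheap key-range scan. The toy in-memory store below illustrates that design idea; the key layout and column values are invented for the example and differ from SeqWare's actual schema.

```python
import bisect

def row_key(genome, chrom, pos):
    """Encode genome, chromosome and zero-padded position so that
    lexicographic key order matches genomic coordinate order.
    (Illustrative layout; the real SeqWare schema differs.)"""
    return f"{genome}.{chrom}.{pos:010d}"

class ToyVariantStore:
    """In-memory stand-in for an ordered NoSQL table such as HBase."""
    def __init__(self):
        self._keys, self._rows = [], {}

    def put(self, key, value):
        bisect.insort(self._keys, key)
        self._rows[key] = value

    def scan(self, start, stop):
        # Like an HBase Scan: all rows with start <= key < stop.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]

store = ToyVariantStore()
store.put(row_key("U87MG", "chr7", 55249071), {"type": "SNV", "consequence": "missense"})
store.put(row_key("U87MG", "chr7", 55260002), {"type": "indel"})
store.put(row_key("U87MG", "chr8", 128750540), {"type": "SNV"})

# All chr7 variants between two coordinates, via one ordered range scan:
hits = store.scan(row_key("U87MG", "chr7", 55000000),
                  row_key("U87MG", "chr7", 56000000))
```

Zero-padding the position is what keeps string order and numeric order in agreement; without it, position 100 would sort before position 99.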

  8. SeqWare Query Engine: storing and searching sequence data in the cloud

    PubMed Central

    2010-01-01

    Background Since the introduction of next-generation DNA sequencers, the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provides a compelling solution to these ever increasing demands. Results In this work, we present the SeqWare Query Engine, which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc.) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). Conclusions The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a natural fit for storing and searching ever-growing genome sequence datasets. PMID:21210981

  9. Cancer surviving patients' rehabilitation – understanding failure through application of theoretical perspectives from Habermas

    PubMed Central

    Mikkelsen, Thorbjørn H; Soendergaard, Jens; Jensen, Anders B; Olesen, Frede

    2008-01-01

    This study aims to analyze whether the rehabilitation of cancer surviving patients (CSPs) can be better organized. The data for this paper consists of focus group interviews (FGIs) with CSPs, general practitioners (GPs) and hospital physicians. The analysis draws on the theoretical framework of Jürgen Habermas, utilizing his notions of 'the system and the life world' and 'communicative and strategic action'. In Habermas' terminology, the social security system and the healthcare system are subsystems that belong to what he calls the 'system', where actions are based on strategic actions activated by the means of media such as money and power which provide the basis for other actors' actions. The social life, on the other hand, in Habermas' terminology, belongs to what he calls the 'life world', where communicative action is based on consensual coordination among individuals. Our material suggests that, within the hospital world, the strategic actions related to diagnosis, treatment and cure in the biomedical discourse dominate. They function as inclusion/exclusion criteria for further treatment. However, the GPs appear to accept the CSPs' previous cancer diagnosis as a precondition sufficient for providing assistance. Although the GPs use the biomedical discourse and often give biomedical examples to exemplify rehabilitation needs, they find psychosocial aspects, so-called lifeworld aspects, to be an important component of their job when helping CSPs. In this way, they appear more open to communicative action in relation to the CSPs' lifeworld than do the hospital physicians. Our data also suggests that the CSPs' lifeworld can be partly colonized by the system during hospitalization, making it difficult for CSPs when they are discharged at the end of treatment. This situation seems to be crucial to our understanding of why CSPs often feel left in limbo after discharge. 
We conclude that the distinction between the system and the lifeworld and the implications of a possible colonization during hospitalization offers an important theoretical framework for determining and addressing different types of rehabilitation needs. PMID:18538001

  10. Semantic Web repositories for genomics data using the eXframe platform.

    PubMed

    Merrill, Emily; Corlosquet, Stéphane; Ciccarese, Paolo; Clark, Tim; Das, Sudeshna

    2014-01-01

    With the advent of inexpensive assay technologies, there has been an unprecedented growth in genomics data as well as the number of databases in which it is stored. In these databases, sample annotation using ontologies and controlled vocabularies is becoming more common. However, the annotation is rarely available as Linked Data, in a machine-readable format, or for standardized queries using SPARQL. This makes large-scale reuse, or integration with other knowledge bases, very difficult. To address this challenge, we have developed the second generation of our eXframe platform, a reusable framework for creating online repositories of genomics experiments. This second generation model now publishes Semantic Web data. To accomplish this, we created an experiment model that covers provenance, citations, external links, assays, biomaterials used in the experiment, and the data collected during the process. The elements of our model are mapped to classes and properties from various established biomedical ontologies. Resource Description Framework (RDF) data is automatically produced using these mappings and indexed in an RDF store with a built-in SPARQL Protocol and RDF Query Language (SPARQL) endpoint. Using the open-source eXframe software, institutions and laboratories can create Semantic Web repositories of their experiments, integrate them with heterogeneous resources and make them interoperable with the vast Semantic Web of biomedical knowledge.

  11. Baseline and extensions approach to information retrieval of complex medical data: Poznan's approach to the bioCADDIE 2016

    PubMed Central

    Cieslewicz, Artur; Dutkiewicz, Jakub; Jedrzejek, Czeslaw

    2018-01-01

    Abstract Information retrieval from biomedical repositories has become a challenging task because of their increasing size and complexity. To facilitate the research aimed at improving the search for relevant documents, various information retrieval challenges have been launched. In this article, we present the improved medical information retrieval systems designed by Poznan University of Technology and Poznan University of Medical Sciences as a contribution to the bioCADDIE 2016 challenge—a task focusing on information retrieval from a collection of 794 992 datasets generated from 20 biomedical repositories. The system developed by our team utilizes the Terrier 4.2 search platform enhanced by a query expansion method using word embeddings. This approach, after post-challenge modifications and improvements (with particular regard to assigning proper weights for original and expanded terms), allowed us to achieve the second best infNDCG measure (0.4539) among the challenge results, with an infAP of 0.3978. This demonstrates that proper utilization of word embeddings can be a valuable addition to the information retrieval process. Some analysis is provided on related work involving other bioCADDIE contributions. We discuss the possibility of improving our results by using better word embedding schemes to find candidates for query expansion. Database URL: https://biocaddie.org/benchmark-data PMID:29688372
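    The weighting idea the abstract highlights (original terms keep full weight, embedding-derived expansion terms get a smaller, similarity-scaled weight) can be sketched as follows. The tiny embedding table is invented for the example; a real system would load vectors trained on biomedical text.

```python
import numpy as np

# Toy embeddings; real systems load vectors trained on biomedical corpora.
EMB = {
    "cancer":  np.array([1.0, 0.1, 0.0]),
    "tumor":   np.array([0.9, 0.2, 0.0]),
    "glioma":  np.array([0.8, 0.3, 0.1]),
    "weather": np.array([0.0, 0.0, 1.0]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def expand_query(terms, k=2, orig_weight=1.0, exp_weight=0.3):
    """Weighted query expansion: each original term keeps full weight,
    and its k nearest embedding neighbours are added with a smaller,
    similarity-scaled weight, mirroring the idea in the abstract."""
    weighted = {t: orig_weight for t in terms}
    for t in terms:
        if t not in EMB:
            continue
        neighbours = sorted(
            ((cosine(EMB[t], EMB[c]), c)
             for c in EMB if c != t and c not in terms),
            reverse=True)[:k]
        for sim, c in neighbours:
            weighted[c] = max(weighted.get(c, 0.0), exp_weight * sim)
    return weighted

q = expand_query(["cancer"], k=2)
```

The weighted term map would then be handed to the retrieval engine (Terrier, in the paper's case) as a weighted query; the abstract notes that tuning the original-versus-expanded weight ratio was what improved their post-challenge results.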

  12. BiobankConnect: software to rapidly connect data elements for pooled analysis across biobanks using ontological and lexical indexing.

    PubMed

    Pang, Chao; Hendriksen, Dennis; Dijkstra, Martijn; van der Velde, K Joeri; Kuiper, Joel; Hillege, Hans L; Swertz, Morris A

    2015-01-01

    Pooling data across biobanks is necessary to increase statistical power, reveal more subtle associations, and synergize the value of data sources. However, searching for desired data elements among the thousands of available elements and harmonizing differences in terminology, data collection, and structure, is arduous and time consuming. To speed up biobank data pooling we developed BiobankConnect, a system to semi-automatically match desired data elements to available elements by: (1) annotating the desired elements with ontology terms using BioPortal; (2) automatically expanding the query for these elements with synonyms and subclass information using OntoCAT; (3) automatically searching available elements for these expanded terms using Lucene lexical matching; and (4) shortlisting relevant matches sorted by matching score. We evaluated BiobankConnect using human curated matches from EU-BioSHaRE, searching for 32 desired data elements in 7461 available elements from six biobanks. We found 0.75 precision at rank 1 and 0.74 recall at rank 10 compared to a manually curated set of relevant matches. In addition, best matches chosen by BioSHaRE experts ranked first in 63.0% and in the top 10 in 98.4% of cases, indicating that our system has the potential to significantly reduce manual matching work. BiobankConnect provides an easy user interface to significantly speed up the biobank harmonization process. It may also prove useful for other forms of biomedical data integration. All the software can be downloaded as a MOLGENIS open source app from http://www.github.com/molgenis, with a demo available at http://www.biobankconnect.org. © The Author 2014. Published by Oxford University Press on behalf of the American Medical Informatics Association.
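    The four-step matching pipeline described above (annotate, expand with synonyms and subclasses, lexically search, shortlist by score) can be mocked end to end in a few lines. The ontology table and the token-overlap score below are crude stand-ins for BioPortal/OntoCAT and Lucene respectively, invented for illustration.

```python
# Toy ontology; BiobankConnect queries BioPortal/OntoCAT instead.
ONTOLOGY = {
    "hypertension": {
        "synonyms": ["high blood pressure"],
        "subclasses": ["renovascular hypertension"],
    },
}

def expand(term):
    """Step 2: expand a desired element with synonyms and subclasses."""
    entry = ONTOLOGY.get(term, {})
    return [term] + entry.get("synonyms", []) + entry.get("subclasses", [])

def match_elements(desired, available, top=10):
    """Steps 3-4: score available elements by token overlap with any
    expanded query string (a crude stand-in for Lucene scoring) and
    shortlist the best matches."""
    queries = [set(q.split()) for q in expand(desired)]
    scored = []
    for elem in available:
        tokens = set(elem.lower().split())
        score = max(len(q & tokens) / len(q) for q in queries)
        if score > 0:
            scored.append((score, elem))
    scored.sort(reverse=True)
    return [e for _, e in scored[:top]]

shortlist = match_elements(
    "hypertension",
    ["history of high blood pressure", "blood glucose level", "smoking status"])
```

Note that the top hit shares no tokens with the desired element itself; it is reached only through the synonym expansion, which is exactly why the ontology step matters for harmonizing terminology across biobanks.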

  13. BiobankConnect: software to rapidly connect data elements for pooled analysis across biobanks using ontological and lexical indexing

    PubMed Central

    Pang, Chao; Hendriksen, Dennis; Dijkstra, Martijn; van der Velde, K Joeri; Kuiper, Joel; Hillege, Hans L; Swertz, Morris A

    2015-01-01

    Objective Pooling data across biobanks is necessary to increase statistical power, reveal more subtle associations, and synergize the value of data sources. However, searching for desired data elements among the thousands of available elements and harmonizing differences in terminology, data collection, and structure, is arduous and time consuming. Materials and methods To speed up biobank data pooling we developed BiobankConnect, a system to semi-automatically match desired data elements to available elements by: (1) annotating the desired elements with ontology terms using BioPortal; (2) automatically expanding the query for these elements with synonyms and subclass information using OntoCAT; (3) automatically searching available elements for these expanded terms using Lucene lexical matching; and (4) shortlisting relevant matches sorted by matching score. Results We evaluated BiobankConnect using human curated matches from EU-BioSHaRE, searching for 32 desired data elements in 7461 available elements from six biobanks. We found 0.75 precision at rank 1 and 0.74 recall at rank 10 compared to a manually curated set of relevant matches. In addition, best matches chosen by BioSHaRE experts ranked first in 63.0% and in the top 10 in 98.4% of cases, indicating that our system has the potential to significantly reduce manual matching work. Conclusions BiobankConnect provides an easy user interface to significantly speed up the biobank harmonization process. It may also prove useful for other forms of biomedical data integration. All the software can be downloaded as a MOLGENIS open source app from http://www.github.com/molgenis, with a demo available at http://www.biobankconnect.org. PMID:25361575

  14. Medical terminology in online patient-patient communication: evidence of high health literacy?

    PubMed

    Fage-Butler, Antoinette M; Nisbeth Jensen, Matilde

    2016-06-01

    Health communication research and guidelines often recommend that medical terminology be avoided when communicating with patients due to their limited understanding of medical terms. However, growing numbers of e-patients use the Internet to equip themselves with specialized biomedical knowledge that is couched in medical terms, which they then share on participatory media, such as online patient forums. Given possible discrepancies between preconceptions about the kind of language that patients can understand and the terms they may actually know and use, the purpose of this paper was to investigate medical terminology used by patients in online patient forums. Using data from online patient-patient communication where patients communicate with each other without expert moderation or intervention, we coded two data samples from two online patient forums dedicated to thyroid issues. Previous definitions of medical terms (dichotomized into technical and semi-technical) proved too rudimentary to encapsulate the types of medical terms the patients used. Therefore, using an inductive approach, we developed an analytical framework consisting of five categories of medical terms: dictionary-defined medical terms, co-text-defined medical terms, medical initialisms, medication brand names and colloquial technical terms. The patients in our data set used many medical terms from all of these categories. Our findings suggest the value of a situated, condition-specific approach to health literacy that recognizes the vertical kind of knowledge that patients with chronic diseases may have. We make cautious recommendations for clinical practice, arguing for an adaptive approach to medical terminology use with patients. © 2015 The Authors. Health Expectations Published by John Wiley & Sons Ltd.

  15. Explorative search of distributed bio-data to answer complex biomedical questions

    PubMed Central

    2014-01-01

    Background The huge amount of biomedical-molecular data increasingly produced is providing scientists with potentially valuable information. Yet, such data quantity makes it difficult to find and extract the data that are most reliable and most related to the biomedical questions to be answered, which are increasingly complex and often involve many different biomedical-molecular aspects. Such questions can be addressed only by comprehensively searching and exploring different types of data, which frequently are ordered and provided by different data sources. Search Computing has been proposed for the management and integration of ranked results from heterogeneous search services. Here, we present its novel application to the explorative search of distributed biomedical-molecular data and the integration of the search results to answer complex biomedical questions. Results A set of available bioinformatics search services has been modelled and registered in the Search Computing framework, and a Bioinformatics Search Computing application (Bio-SeCo) using such services has been created and made publicly available at http://www.bioinformatics.deib.polimi.it/bio-seco/seco/. It offers an integrated environment which eases search, exploration and ranking-aware combination of heterogeneous data provided by the available registered services, and supplies global results that can support answering complex multi-topic biomedical questions. Conclusions By using Bio-SeCo, scientists can explore the very large and very heterogeneous biomedical-molecular data available. They can easily make different explorative search attempts, inspect obtained results, select the most appropriate, expand or refine them and move forward and backward in the construction of a global complex biomedical query on multiple distributed sources that could eventually find the most relevant results.
Thus, it provides an extremely useful automated support for exploratory integrated bio search, which is fundamental for Life Science data driven knowledge discovery. PMID:24564278

  16. Ontobee: A linked ontology data server to support ontology term dereferencing, linkage, query and integration

    PubMed Central

    Ong, Edison; Xiang, Zuoshuang; Zhao, Bin; Liu, Yue; Lin, Yu; Zheng, Jie; Mungall, Chris; Courtot, Mélanie; Ruttenberg, Alan; He, Yongqun

    2017-01-01

    Linked Data (LD) aims to achieve interconnected data by representing entities using Uniform Resource Identifiers (URIs), and sharing information using the Resource Description Framework (RDF) and HTTP. Ontologies, which logically represent entities and relations in specific domains, are the basis of LD. Ontobee (http://www.ontobee.org/) is a linked ontology data server that stores ontology information using RDF triple store technology and supports query, visualization and linkage of ontology terms. Ontobee is also the default linked data server for publishing and browsing biomedical ontologies in the Open Biological Ontology (OBO) Foundry (http://obofoundry.org) library. Ontobee currently hosts more than 180 ontologies (including 131 OBO Foundry Library ontologies) with over four million terms. Ontobee provides a user-friendly web interface for querying and visualizing the details and hierarchy of a specific ontology term. Using the eXtensible Stylesheet Language Transformation (XSLT) technology, Ontobee is able to dereference a single ontology term URI, and then output RDF/eXtensible Markup Language (XML) for computer processing or display the HTML information on a web browser for human users. Statistics and detailed information are generated and displayed for each ontology listed in Ontobee. In addition, a SPARQL web interface is provided for custom advanced SPARQL queries of one or multiple ontologies. PMID:27733503
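    A custom query against an endpoint like the one Ontobee exposes is just a SPARQL string plus handling of the standard SPARQL JSON results format. The sketch below builds a label-lookup query and parses a canned response offline; the term IRI is a hypothetical example, and a real call would POST the query to the endpoint (http://www.ontobee.org/sparql) over HTTP.

```python
import json

def label_query(term_iri):
    """SPARQL to fetch the rdfs:label of one ontology term; this string
    would be POSTed to a SPARQL endpoint such as Ontobee's."""
    return f"""
    SELECT ?label WHERE {{
        <{term_iri}> <http://www.w3.org/2000/01/rdf-schema#label> ?label .
    }}"""

def parse_bindings(sparql_json):
    """Flatten a SPARQL 1.1 JSON result into a list of plain dicts."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in json.loads(sparql_json)["results"]["bindings"]
    ]

# Canned response in the standard SPARQL JSON results format,
# standing in for a live endpoint reply:
sample = '{"results": {"bindings": [{"label": {"type": "literal", "value": "vaccine"}}]}}'
rows = parse_bindings(sample)
```

The same `parse_bindings` helper works unchanged for multi-variable queries, since each binding row maps every selected variable to a typed cell.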

  17. Omicseq: a web-based search engine for exploring omics datasets

    PubMed Central

    Sun, Xiaobo; Pittard, William S.; Xu, Tianlei; Chen, Li; Zwick, Michael E.; Jiang, Xiaoqian; Wang, Fusheng

    2017-01-01

    Abstract The development and application of high-throughput genomics technologies has resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets wherein the vast majority of their content is numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve ‘findability’ of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable and elastic, NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant. Omicseq is freely available at http://www.omicseq.org. PMID:28402462
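    The core claim above is that ranking should use the numeric content of a dataset, not just its metadata. As a loose analogue only (the published trackRank algorithm is more involved and is not reproduced here), the toy below ranks signal tracks by how enriched their values are over the query gene's bins relative to the track average. All track names and numbers are invented.

```python
import numpy as np

def rank_datasets(datasets, gene_bins):
    """Rank omics tracks by enrichment of numeric signal over the query
    gene's bins relative to the track-wide mean (a toy analogue of
    content-based ranking; not the published trackRank algorithm)."""
    lo, hi = gene_bins
    scores = {
        name: float(track[lo:hi].mean() / (track.mean() + 1e-9))
        for name, track in datasets.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

tracks = {
    "chipseq_A": np.array([0.1, 0.1, 5.0, 6.0, 0.1]),  # strong peak over the gene
    "chipseq_B": np.array([1.0, 1.0, 1.0, 1.0, 1.0]),  # flat background
}
ranking = rank_datasets(tracks, gene_bins=(2, 4))
```

A metadata-only search would treat both tracks identically if their descriptions never mention the gene; inspecting the numeric signal is what separates them, which is the barrier Omicseq was built to overcome.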

  18. A case study: semantic integration of gene-disease associations for type 2 diabetes mellitus from literature and biomedical data resources.

    PubMed

    Rebholz-Schuhmann, Dietrich; Grabmüller, Christoph; Kavaliauskas, Silvestras; Croset, Samuel; Woollard, Peter; Backofen, Rolf; Filsell, Wendy; Clark, Dominic

    2014-07-01

    In the Semantic Enrichment of the Scientific Literature (SESL) project, researchers from academia and from life science and publishing companies collaborated in a pre-competitive way to integrate and share information for type 2 diabetes mellitus (T2DM) in adults. This case study exposes benefits from semantic interoperability after integrating the scientific literature with biomedical data resources, such as UniProt Knowledgebase (UniProtKB) and the Gene Expression Atlas (GXA). We annotated scientific documents in a standardized way, by applying public terminological resources for diseases and proteins, and other text-mining approaches. Eventually, we compared the genetic causes of T2DM across the data resources to demonstrate the benefits from the SESL triple store. Our solution enables publishers to distribute their content with little overhead into remote data infrastructures, such as into any Virtual Knowledge Broker. Copyright © 2013. Published by Elsevier Ltd.

  19. The semantic measures library and toolkit: fast computation of semantic similarity and relatedness using biomedical ontologies.

    PubMed

    Harispe, Sébastien; Ranwez, Sylvie; Janaqi, Stefan; Montmain, Jacky

    2014-03-01

    The semantic measures library and toolkit are robust open-source and easy to use software solutions dedicated to semantic measures. They can be used for large-scale computations and analyses of semantic similarities between terms/concepts defined in terminologies and ontologies. The comparison of entities (e.g. genes) annotated by concepts is also supported. A large collection of measures is available. Not limited to a specific application context, the library and the toolkit can be used with various controlled vocabularies and ontology specifications (e.g. Open Biomedical Ontology, Resource Description Framework). The project targets both designers and practitioners of semantic measures providing a JAVA library, as well as a command-line tool that can be used on personal computers or computer clusters. Downloads, documentation, tutorials, evaluation and support are available at http://www.semantic-measures-library.org.
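    One widely used family of measures such a library implements is information-content-based similarity over an ontology DAG. The sketch below computes Lin similarity (Lin, 1998) on a three-concept toy ontology; the concept names and counts are invented, and the SML itself is a Java library with a far larger catalogue of measures.

```python
import math

# Toy ontology DAG with corpus-style annotation counts.
PARENTS = {"renal disease": ["disease"], "hypertension": ["disease"], "disease": []}
COUNTS = {"disease": 100, "renal disease": 10, "hypertension": 20}
TOTAL = 100

def ancestors(c):
    """The concept itself plus all its transitive ancestors."""
    seen, stack = {c}, [c]
    while stack:
        for p in PARENTS[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def ic(c):
    """Information content: rarer concepts are more informative."""
    return -math.log(COUNTS[c] / TOTAL)

def lin(a, b):
    """Lin similarity: 2*IC(most informative common ancestor)
    divided by IC(a) + IC(b)."""
    mica = max(ic(c) for c in ancestors(a) & ancestors(b))
    return 2 * mica / (ic(a) + ic(b))

sim = lin("renal disease", "hypertension")
```

Here the only shared ancestor is the root, whose IC is zero, so the two siblings score 0; a concept compared with itself scores 1. Entity-level comparison (e.g. between two genes) is typically built by aggregating such concept-to-concept scores over each entity's annotation sets.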

  20. Improving integrative searching of systems chemical biology data using semantic annotation.

    PubMed

    Chen, Bin; Ding, Ying; Wild, David J

    2012-03-08

    Systems chemical biology and chemogenomics are considered critical, integrative disciplines in modern biomedical research, but require data mining of large, integrated, heterogeneous datasets from chemistry and biology. We previously developed an RDF-based resource called Chem2Bio2RDF that enabled querying of such data using the SPARQL query language. Whilst this work has proved useful in its own right as one of the first major resources in these disciplines, its utility could be greatly improved by the application of an ontology for annotation of the nodes and edges in the RDF graph, enabling a much richer range of semantic queries to be issued. We developed a generalized chemogenomics and systems chemical biology OWL ontology called Chem2Bio2OWL that describes the semantics of chemical compounds, drugs, protein targets, pathways, genes, diseases and side-effects, and the relationships between them. The ontology also includes data provenance. We used it to annotate our Chem2Bio2RDF dataset, making it a rich semantic resource. Through a series of scientific case studies we demonstrate how this (i) simplifies the process of building SPARQL queries, (ii) enables useful new kinds of queries on the data and (iii) makes possible intelligent reasoning and semantic graph mining in chemogenomics and systems chemical biology. Chem2Bio2OWL is available at http://chem2bio2rdf.org/owl. The document is available at http://chem2bio2owl.wikispaces.com.

  1. Exposing the cancer genome atlas as a SPARQL endpoint

    PubMed Central

    Deus, Helena F.; Veiga, Diogo F.; Freire, Pablo R.; Weinstein, John N.; Mills, Gordon B.; Almeida, Jonas S.

    2011-01-01

    The Cancer Genome Atlas (TCGA) is a multidisciplinary, multi-institutional effort to characterize several types of cancer. Datasets from biomedical domains such as TCGA present a particularly challenging task for those interested in dynamically aggregating its results because the data sources are typically both heterogeneous and distributed. The Linked Data best practices offer a solution to integrate and discover data with those characteristics, namely through exposure of data as Web services supporting SPARQL, the Resource Description Framework query language. Most SPARQL endpoints, however, cannot easily be queried by data experts. Furthermore, exposing experimental data as SPARQL endpoints remains a challenging task because, in most cases, data must first be converted to Resource Description Framework triples. In line with those requirements, we have developed an infrastructure to expose clinical, demographic and molecular data elements generated by TCGA as a SPARQL endpoint by assigning elements to entities of the Simple Sloppy Semantic Database (S3DB) management model. All components of the infrastructure are available as independent Representational State Transfer (REST) Web services to encourage reusability, and a simple interface was developed to automatically assemble SPARQL queries by navigating a representation of the TCGA domain. A key feature of the proposed solution that greatly facilitates assembly of SPARQL queries is the distinction between the TCGA domain descriptors and data elements. Furthermore, the use of the S3DB management model as a mediator enables queries to both public and protected data without the need for prior submission to a single data source. PMID:20851208

  2. Quality Assurance of NCI Thesaurus by Mining Structural-Lexical Patterns

    PubMed Central

    Abeysinghe, Rashmie; Brooks, Michael A.; Talbert, Jeffery; Cui, Licong

    2017-01-01

    Quality assurance of biomedical terminologies such as the National Cancer Institute (NCI) Thesaurus is an essential part of the terminology management lifecycle. We investigate a structural-lexical approach based on non-lattice subgraphs to automatically identify missing hierarchical relations and missing concepts in the NCI Thesaurus. We mine six structural-lexical patterns exhibited in non-lattice subgraphs: containment, union, intersection, union-intersection, inference-contradiction, and inference-union. Each pattern indicates a potential specific type of error and suggests a potential type of remediation. We found 809 non-lattice subgraphs with these patterns in the NCI Thesaurus (version 16.12d). Domain experts evaluated a random sample of 50 small non-lattice subgraphs, of which 33 were confirmed to contain errors and make correct suggestions (33/50 = 66%). Of the 25 evaluated subgraphs revealing multiple patterns, 22 were verified correct (22/25 = 88%). This shows the effectiveness of our structural-lexical-pattern-based approach in detecting errors and suggesting remediations in the NCI Thesaurus. PMID:29854100
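    The core structural notion can be sketched in a few lines: a concept pair is "non-lattice" when its closest common ancestors are not unique, so the pair lacks a single least upper bound. The toy hierarchy below uses invented concept IDs, not NCI Thesaurus codes.

```python
# Toy subsumption hierarchy: child -> set of direct parents.
PARENTS = {
    "A": set(), "B": set(),
    "C": {"A", "B"},
    "D": {"A", "B"},   # C and D share two incomparable ancestors: A and B
    "E": {"C"},
}

def ancestors(n):
    """All ancestors of a concept, including itself."""
    seen, stack = {n}, [n]
    while stack:
        for p in PARENTS[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def lowest_common_ancestors(x, y):
    """Common ancestors that are not strict ancestors of another one."""
    common = (ancestors(x) & ancestors(y)) - {x, y}
    return {c for c in common
            if not any(o != c and c in ancestors(o) for o in common)}

def is_non_lattice_pair(x, y):
    """More than one closest shared ancestor => non-lattice fragment."""
    return len(lowest_common_ancestors(x, y)) > 1

print(lowest_common_ancestors("C", "D"))  # → {'A', 'B'}
```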

  3. Image Acquisition Context

    PubMed Central

    Bidgood, W. Dean; Bray, Bruce; Brown, Nicolas; Mori, Angelo Rossi; Spackman, Kent A.; Golichowski, Alan; Jones, Robert H.; Korman, Louis; Dove, Brent; Hildebrand, Lloyd; Berg, Michael

    1999-01-01

    Objective: To support clinically relevant indexing of biomedical images and image-related information based on the attributes of image acquisition procedures and the judgments (observations) expressed by observers in the process of image interpretation. Design: The authors introduce the notion of “image acquisition context,” the set of attributes that describe image acquisition procedures, and present a standards-based strategy for utilizing the attributes of image acquisition context as indexing and retrieval keys for digital image libraries. Methods: The authors' indexing strategy is based on an interdependent message/terminology architecture that combines the Digital Imaging and Communication in Medicine (DICOM) standard, the SNOMED (Systematized Nomenclature of Human and Veterinary Medicine) vocabulary, and the SNOMED DICOM microglossary. The SNOMED DICOM microglossary provides context-dependent mapping of terminology to DICOM data elements. Results: The capability of embedding standard coded descriptors in DICOM image headers and image-interpretation reports improves the potential for selective retrieval of image-related information. This favorably affects information management in digital libraries. PMID:9925229

  4. The NCI Thesaurus quality assurance life cycle.

    PubMed

    de Coronado, Sherri; Wright, Lawrence W; Fragoso, Gilberto; Haber, Margaret W; Hahn-Dantona, Elizabeth A; Hartel, Francis W; Quan, Sharon L; Safran, Tracy; Thomas, Nicole; Whiteman, Lori

    2009-06-01

    The National Cancer Institute Enterprise Vocabulary Services (NCI EVS) uses a wide range of quality assurance (QA) techniques to maintain and extend NCI Thesaurus (NCIt). NCIt is a reference terminology and biomedical ontology used in a growing number of NCI and other systems that extend from translational and basic research through clinical care to public information and administrative activities. Both automated and manual QA techniques are employed throughout the editing and publication cycle, which includes inserting and editing NCIt in NCI Metathesaurus. NCI EVS conducts its own additional periodic and ongoing content QA. External reviews, and extensive evaluation by and interaction with EVS partners and other users, have also played an important part in the QA process. There have always been tensions and compromises between meeting the needs of dependent systems and providing consistent and well-structured content; external QA and feedback have been important in identifying and addressing such issues. Currently, NCI EVS is exploring new approaches to broaden external participation in the terminology development and QA process.

  5. SATORI: a system for ontology-guided visual exploration of biomedical data repositories.

    PubMed

    Lekschas, Fritz; Gehlenborg, Nils

    2018-04-01

    The ever-increasing number of biomedical datasets provides tremendous opportunities for re-use but current data repositories provide limited means of exploration apart from text-based search. Ontological metadata annotations provide context by semantically relating datasets. Visualizing this rich network of relationships can improve the explorability of large data repositories and help researchers find datasets of interest. We developed SATORI, an integrative search and visual exploration interface for the exploration of biomedical data repositories. The design is informed by a requirements analysis through a series of semi-structured interviews. We evaluated the implementation of SATORI in a field study on a real-world data collection. SATORI enables researchers to seamlessly search, browse and semantically query data repositories via two visualizations that are highly interconnected with a powerful search interface. SATORI is an open-source web application, which is freely available at http://satori.refinery-platform.org and integrated into the Refinery Platform. nils@hms.harvard.edu. Supplementary data are available at Bioinformatics online.

  6. Biomedical information retrieval across languages.

    PubMed

    Daumke, Philipp; Markó, Kornél; Poprat, Michael; Schulz, Stefan; Klar, Rüdiger

    2007-06-01

    This work presents a new dictionary-based approach to biomedical cross-language information retrieval (CLIR) that addresses many of the general and domain-specific challenges in current CLIR research. Our method is based on a multilingual lexicon that was generated partly manually and partly automatically, and currently covers six European languages. It contains morphologically meaningful word fragments, termed subwords. Using subwords instead of entire words significantly reduces the number of lexical entries necessary to sufficiently cover a specific language and domain. Mediation between queries and documents is based on these subwords as well as on lists of word-n-grams that are generated from large monolingual corpora and constitute possible translation units. The translations are then sent to a standard Internet search engine. This process makes our approach an effective tool for searching the biomedical content of the World Wide Web in different languages. We evaluate this approach using the OHSUMED corpus, a large medical document collection, within a cross-language retrieval setting.
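    The subword idea can be sketched as a greedy longest-match segmenter over a small lexicon. The lexicon entries below are invented for illustration; the actual multilingual subword lexicon described above is far larger and was built partly manually.

```python
# Hypothetical subword lexicon (real lexicons cover whole languages).
LEXICON = {"gastr", "entero", "enter", "itis", "o"}

def segment(word):
    """Greedily split a word into known subwords, longest match first."""
    parts, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in LEXICON:
                parts.append(word[i:j])
                i = j
                break
        else:
            return None  # unknown fragment: no full segmentation
    return parts

print(segment("gastroenteritis"))  # → ['gastr', 'o', 'enter', 'itis']
```

    A few morphologically meaningful fragments can thus index many surface words, which is what lets a small lexicon cover a whole domain vocabulary.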

  7. Evaluation of relational and NoSQL database architectures to manage genomic annotations.

    PubMed

    Schulz, Wade L; Nelson, Brent G; Felker, Donn K; Durant, Thomas J S; Torres, Richard

    2016-12-01

    While the adoption of next generation sequencing has rapidly expanded, the informatics infrastructure used to manage the data generated by this technology has not kept pace. Historically, relational databases have provided much of the framework for data storage and retrieval. Newer technologies based on NoSQL architectures may provide significant advantages in storage and query efficiency, thereby reducing the cost of data management. But their relative advantage when applied to biomedical data sets, such as genetic data, has not been characterized. To this end, we compared the storage, indexing, and query efficiency of a common relational database (MySQL), a document-oriented NoSQL database (MongoDB), and a relational database with NoSQL support (PostgreSQL). When used to store genomic annotations from the dbSNP database, we found the NoSQL architectures to outperform traditional, relational models for speed of data storage, indexing, and query retrieval in nearly every operation. These findings strongly support the use of novel database technologies to improve the efficiency of data management within the biological sciences. Copyright © 2016 Elsevier Inc. All rights reserved.
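    The relational side of such a comparison can be sketched with SQLite as a stand-in (the study itself used MySQL, MongoDB and PostgreSQL; the schema and data below are invented). The example shows the indexing step whose cost and benefit the authors measured.

```python
# Minimal sketch of timing an annotation lookup before and after indexing.
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE snp (rsid TEXT, chrom TEXT, pos INTEGER)")
conn.executemany(
    "INSERT INTO snp VALUES (?, ?, ?)",
    [(f"rs{i}", "chr1", i) for i in range(10000)],
)

t0 = time.perf_counter()
hit = conn.execute("SELECT pos FROM snp WHERE rsid = ?", ("rs9999",)).fetchone()
unindexed = time.perf_counter() - t0  # full table scan

conn.execute("CREATE INDEX idx_rsid ON snp (rsid)")
t0 = time.perf_counter()
hit2 = conn.execute("SELECT pos FROM snp WHERE rsid = ?", ("rs9999",)).fetchone()
indexed = time.perf_counter() - t0  # B-tree lookup

print(hit, unindexed, indexed)
```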

  8. Analysis of PubMed User Sessions Using a Full-Day PubMed Query Log: A Comparison of Experienced and Nonexperienced PubMed Users

    PubMed Central

    2015-01-01

    Background PubMed is the largest biomedical bibliographic information source on the Internet. PubMed has been considered one of the most important and reliable sources of up-to-date health care evidence. Previous studies examined the effects of domain expertise/knowledge on search performance using PubMed. However, very little is known about PubMed users’ knowledge of information retrieval (IR) functions and their usage in query formulation. Objective The purpose of this study was to shed light on how experienced/nonexperienced PubMed users perform their search queries by analyzing a full-day query log. Our hypotheses were that (1) experienced PubMed users who use system functions quickly retrieve relevant documents and (2) nonexperienced PubMed users who do not use them have longer search sessions than experienced users. Methods To test these hypotheses, we analyzed PubMed query log data containing nearly 3 million queries. User sessions were divided into two categories: experienced and nonexperienced. We compared experienced and nonexperienced users per number of sessions, and experienced and nonexperienced user sessions per session length, with a focus on how fast they completed their sessions. Results To test our hypotheses, we measured how successful information retrieval was (at retrieving relevant documents), represented as the decrease rates of experienced and nonexperienced users from a session length of 1 to 2, 3, 4, and 5. The decrease rate (from a session length of 1 to 2) of the experienced users was significantly larger than that of the nonexperienced groups. Conclusions Experienced PubMed users retrieve relevant documents more quickly than nonexperienced PubMed users in terms of session length. PMID:26139516
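    The "decrease rate" statistic used above can be computed directly from session-length counts: the relative drop in the number of sessions between two lengths. The session data below is invented to illustrate the comparison.

```python
# Sketch of the decrease-rate measure on invented session-length data.
from collections import Counter

def decrease_rate(session_lengths, start=1, end=2):
    """Relative drop in session count from length `start` to length `end`."""
    counts = Counter(session_lengths)
    return (counts[start] - counts[end]) / counts[start]

# Hypothetical users: experienced sessions end sooner.
experienced = [1] * 80 + [2] * 20 + [3] * 5
novice = [1] * 50 + [2] * 35 + [3] * 15
print(decrease_rate(experienced), decrease_rate(novice))  # → 0.75 0.3
```

    A larger decrease rate means more sessions finished after a single query, which is the paper's proxy for quickly finding relevant documents.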

  9. Alkemio: association of chemicals with biomedical topics by text and data mining.

    PubMed

    Gijón-Correas, José A; Andrade-Navarro, Miguel A; Fontaine, Jean F

    2014-07-01

    The PubMed® database of biomedical citations allows the retrieval of scientific articles studying the function of chemicals in biology and medicine. Mining millions of available citations to search reported associations between chemicals and topics of interest would require substantial human time. We have implemented the Alkemio text-mining web tool and SOAP web service to help in this task. The tool uses biomedical articles discussing chemicals (including drugs), predicts their relatedness to the query topic with a naïve Bayesian classifier and ranks all chemicals by P-values computed from random simulations. Benchmarks on seven human pathways showed good retrieval performance (areas under the receiver operating characteristic curves ranged from 73.6 to 94.5%). Comparison with existing tools to retrieve chemicals associated with eight diseases showed the higher precision and recall of Alkemio when considering the top 10 candidate chemicals. Alkemio is a high-performing web tool ranking chemicals for any biomedical topic and it is free to non-commercial users. http://cbdm.mdc-berlin.de/∼medlineranker/cms/alkemio. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
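    The ranking step, a P-value from random simulations, can be sketched as an empirical permutation test: how often a randomly drawn background score meets or exceeds the observed classifier score. The scores below are invented; this is not the Alkemio implementation.

```python
# Hedged sketch of an empirical P-value from random simulations.
import random

def empirical_p(score, background, n=10000, seed=0):
    """Fraction of randomly drawn background scores >= the observed score."""
    rng = random.Random(seed)
    draws = [rng.choice(background) for _ in range(n)]
    return sum(d >= score for d in draws) / n

background = [0.1, 0.2, 0.3, 0.4, 0.5]  # hypothetical classifier scores
p = empirical_p(0.45, background)
print(p)  # close to 0.2: one background score in five is >= 0.45
```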

  10. Extracting biomedical events from pairs of text entities

    PubMed Central

    2015-01-01

    Background Huge amounts of electronic biomedical documents, such as molecular biology reports or genomic papers are generated daily. Nowadays, these documents are mainly available in the form of unstructured free texts, which require heavy processing for their registration into organized databases. This organization is instrumental for information retrieval, enabling to answer the advanced queries of researchers and practitioners in biology, medicine, and related fields. Hence, the massive data flow calls for efficient automatic methods of text-mining that extract high-level information, such as biomedical events, from biomedical text. The usual computational tools of Natural Language Processing cannot be readily applied to extract these biomedical events, due to the peculiarities of the domain. Indeed, biomedical documents contain highly domain-specific jargon and syntax. These documents also describe distinctive dependencies, making text-mining in molecular biology a specific discipline. Results We address biomedical event extraction as the classification of pairs of text entities into the classes corresponding to event types. The candidate pairs of text entities are recursively provided to a multiclass classifier relying on Support Vector Machines. This recursive process extracts events involving other events as arguments. Compared to joint models based on Markov Random Fields, our model simplifies inference and hence requires shorter training and prediction times along with lower memory capacity. Compared to usual pipeline approaches, our model passes over a complex intermediate problem, while making a more extensive usage of sophisticated joint features between text entities. Our method focuses on the core event extraction of the Genia task of BioNLP challenges yielding the best result reported so far on the 2013 edition. PMID:26201478

  11. A Scalable Data Access Layer to Manage Structured Heterogeneous Biomedical Data.

    PubMed

    Delussu, Giovanni; Lianas, Luca; Frexia, Francesca; Zanetti, Gianluigi

    2016-01-01

    This work presents a scalable data access layer, called PyEHR, designed to support the implementation of data management systems for secondary use of structured heterogeneous biomedical and clinical data. PyEHR adopts the openEHR's formalisms to guarantee the decoupling of data descriptions from implementation details and exploits structure indexing to accelerate searches. Data persistence is guaranteed by a driver layer with a common driver interface. Interfaces for two NoSQL Database Management Systems are already implemented: MongoDB and Elasticsearch. We evaluated the scalability of PyEHR experimentally through two types of tests, called "Constant Load" and "Constant Number of Records", with queries of increasing complexity on synthetic datasets of ten million records each, containing very complex openEHR archetype structures, distributed on up to ten computing nodes.
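    The driver-layer idea, a common interface behind which concrete database drivers are swapped, can be sketched as follows. The interface and the in-memory "driver" are invented stand-ins; PyEHR's actual driver API is not shown in the abstract.

```python
# Sketch of a common driver interface with pluggable backends.
class Driver:
    """Hypothetical common interface all storage drivers implement."""
    def add_record(self, rec):
        raise NotImplementedError
    def search(self, **criteria):
        raise NotImplementedError

class InMemoryDriver(Driver):
    """Stand-in for e.g. a MongoDB or Elasticsearch driver."""
    def __init__(self):
        self.records = []
    def add_record(self, rec):
        self.records.append(rec)
    def search(self, **criteria):
        return [r for r in self.records
                if all(r.get(k) == v for k, v in criteria.items())]

db = InMemoryDriver()
db.add_record({"archetype": "blood_pressure", "systolic": 120})
db.add_record({"archetype": "heart_rate", "rate": 72})
print(db.search(archetype="blood_pressure"))
```

    Code written against `Driver` is unchanged when the backend is replaced, which is the decoupling the abstract describes.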

  12. The NASA Air Traffic Management Ontology: Technical Documentation

    NASA Technical Reports Server (NTRS)

    Keller, Richard M.

    2017-01-01

    This document is intended to serve as comprehensive documentation for the NASA Air Traffic Management (ATM) Ontology. The ATM Ontology is a conceptual model that defines key classes of entities and relationships pertaining to the US National Airspace System (NAS) and the management of air traffic through that system. A wide variety of classes are represented in the ATM Ontology, including classes corresponding to flights, aircraft, manufacturers, airports, airlines, air routes, NAS facilities, air traffic control advisories, weather phenomena, and many others. The Ontology can be useful in the context of a variety of information management tasks relevant to NAS, including information exchange, data query and search, information organization, information integration, and terminology standardization.

  13. The value of Retrospective and Concurrent Think Aloud in formative usability testing of a physician data query tool.

    PubMed

    Peute, Linda W P; de Keizer, Nicolette F; Jaspers, Monique W M

    2015-06-01

    To compare the performance of the Concurrent (CTA) and Retrospective (RTA) Think Aloud method and to assess their value in a formative usability evaluation of an Intensive Care Registry-physician data query tool designed to support ICU quality improvement processes. Sixteen representative intensive care physicians participated in the usability evaluation study. Subjects were allocated to either the CTA or RTA method by a matched randomized design. Each subject performed six usability-testing tasks of varying complexity in the query tool in a real-working context. Methods were compared with regard to number and type of problems detected. Verbal protocols of CTA and RTA were analyzed in depth to assess differences in verbal output. Standardized measures were applied to assess thoroughness in usability problem detection weighted per problem severity level and method overall effectiveness in detecting usability problems with regard to the time subjects spent per method. The usability evaluation of the data query tool revealed a total of 43 unique usability problems that the intensive care physicians encountered. CTA detected unique usability problems with regard to graphics/symbols, navigation issues, error messages, and the organization of information on the query tool's screens. RTA detected unique issues concerning system match with subjects' language and applied terminology. The in-depth verbal protocol analysis of CTA provided information on intensive care physicians' query design strategies. Overall, CTA performed significantly better than RTA in detecting usability problems. CTA usability problem detection effectiveness was 0.80 vs. 0.62 (p<0.05) respectively, with an average difference of 42% less time spent per subject compared to RTA. In addition, CTA was more thorough in detecting usability problems of a moderate (0.85 vs. 0.7) and severe nature (0.71 vs. 0.57). In this study, CTA was more effective in usability-problem detection and provided clarification of intensive care physicians' query design strategies to inform redesign of the query tool. However, CTA did not render RTA redundant: RTA additionally elucidated unique usability problems and new user requirements. Based on the results of this study, we recommend the use of CTA in formative usability evaluation studies of health information technology. However, we recommend further research on the application of RTA in usability studies with regard to user expertise and experience when focusing on user-profile-customized (re)design. Copyright © 2015 Elsevier Inc. All rights reserved.
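    The two standardized measures reported above, thoroughness (the share of known problems a method detects, optionally weighted by severity) and its reading against evaluation time, can be sketched numerically. The problem IDs and severity weights below are invented; only the metric definitions follow the abstract.

```python
# Sketch of severity-weighted thoroughness in usability-problem detection.
def thoroughness(detected, known, weight=None):
    """Weighted share of known problems that a method detected."""
    w = weight or {p: 1 for p in known}
    return sum(w[p] for p in detected) / sum(w[p] for p in known)

known = {"nav1", "nav2", "err1", "term1"}          # all 43 in the study
severity = {"nav1": 3, "nav2": 2, "err1": 3, "term1": 1}
cta = {"nav1", "nav2", "err1"}                      # hypothetical detections
print(thoroughness(cta, known), thoroughness(cta, known, severity))
```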

  14. caCORE version 3: Implementation of a model driven, service-oriented architecture for semantic interoperability.

    PubMed

    Komatsoulis, George A; Warzel, Denise B; Hartel, Francis W; Shanbhag, Krishnakant; Chilukuri, Ram; Fragoso, Gilberto; Coronado, Sherri de; Reeves, Dianne M; Hadfield, Jillaine B; Ludet, Christophe; Covitz, Peter A

    2008-02-01

    One of the requirements for a federated information system is interoperability, the ability of one computer system to access and use the resources of another system. This feature is particularly important in biomedical research systems, which need to coordinate a variety of disparate types of data. In order to meet this need, the National Cancer Institute Center for Bioinformatics (NCICB) has created the cancer Common Ontologic Representation Environment (caCORE), an interoperability infrastructure based on Model Driven Architecture. The caCORE infrastructure provides a mechanism to create interoperable biomedical information systems. Systems built using the caCORE paradigm address both aspects of interoperability: the ability to access data (syntactic interoperability) and understand the data once retrieved (semantic interoperability). This infrastructure consists of an integrated set of three major components: a controlled terminology service (Enterprise Vocabulary Services), a standards-based metadata repository (the cancer Data Standards Repository) and an information system with an Application Programming Interface (API) based on Domain Model Driven Architecture. This infrastructure is being leveraged to create a Semantic Service-Oriented Architecture (SSOA) for cancer research by the National Cancer Institute's cancer Biomedical Informatics Grid (caBIG).

  15. caCORE version 3: Implementation of a model driven, service-oriented architecture for semantic interoperability

    PubMed Central

    Komatsoulis, George A.; Warzel, Denise B.; Hartel, Frank W.; Shanbhag, Krishnakant; Chilukuri, Ram; Fragoso, Gilberto; de Coronado, Sherri; Reeves, Dianne M.; Hadfield, Jillaine B.; Ludet, Christophe; Covitz, Peter A.

    2008-01-01

    One of the requirements for a federated information system is interoperability, the ability of one computer system to access and use the resources of another system. This feature is particularly important in biomedical research systems, which need to coordinate a variety of disparate types of data. In order to meet this need, the National Cancer Institute Center for Bioinformatics (NCICB) has created the cancer Common Ontologic Representation Environment (caCORE), an interoperability infrastructure based on Model Driven Architecture. The caCORE infrastructure provides a mechanism to create interoperable biomedical information systems. Systems built using the caCORE paradigm address both aspects of interoperability: the ability to access data (syntactic interoperability) and understand the data once retrieved (semantic interoperability). This infrastructure consists of an integrated set of three major components: a controlled terminology service (Enterprise Vocabulary Services), a standards-based metadata repository (the cancer Data Standards Repository) and an information system with an Application Programming Interface (API) based on Domain Model Driven Architecture. This infrastructure is being leveraged to create a Semantic Service Oriented Architecture (SSOA) for cancer research by the National Cancer Institute’s cancer Biomedical Informatics Grid (caBIG™). PMID:17512259

  16. A unified anatomy ontology of the vertebrate skeletal system.

    PubMed

    Dahdul, Wasila M; Balhoff, James P; Blackburn, David C; Diehl, Alexander D; Haendel, Melissa A; Hall, Brian K; Lapp, Hilmar; Lundberg, John G; Mungall, Christopher J; Ringwald, Martin; Segerdell, Erik; Van Slyke, Ceri E; Vickaryous, Matthew K; Westerfield, Monte; Mabee, Paula M

    2012-01-01

    The skeleton is of fundamental importance in research in comparative vertebrate morphology, paleontology, biomechanics, developmental biology, and systematics. Motivated by research questions that require computational access to and comparative reasoning across the diverse skeletal phenotypes of vertebrates, we developed a module of anatomical concepts for the skeletal system, the Vertebrate Skeletal Anatomy Ontology (VSAO), to accommodate and unify the existing skeletal terminologies for the species-specific (mouse, the frog Xenopus, zebrafish) and multispecies (teleost, amphibian) vertebrate anatomy ontologies. Previous differences between these terminologies prevented even simple queries across databases pertaining to vertebrate morphology. This module of upper-level and specific skeletal terms currently includes 223 defined terms and 179 synonyms that integrate skeletal cells, tissues, biological processes, organs (skeletal elements such as bones and cartilages), and subdivisions of the skeletal system. The VSAO is designed to integrate with other ontologies, including the Common Anatomy Reference Ontology (CARO), Gene Ontology (GO), Uberon, and Cell Ontology (CL), and it is freely available to the community to be updated with additional terms required for research. Its structure accommodates anatomical variation among vertebrate species in development, structure, and composition. Annotation of diverse vertebrate phenotypes with this ontology will enable novel inquiries across the full spectrum of phenotypic diversity.

  17. A Unified Anatomy Ontology of the Vertebrate Skeletal System

    PubMed Central

    Dahdul, Wasila M.; Balhoff, James P.; Blackburn, David C.; Diehl, Alexander D.; Haendel, Melissa A.; Hall, Brian K.; Lapp, Hilmar; Lundberg, John G.; Mungall, Christopher J.; Ringwald, Martin; Segerdell, Erik; Van Slyke, Ceri E.; Vickaryous, Matthew K.; Westerfield, Monte; Mabee, Paula M.

    2012-01-01

    The skeleton is of fundamental importance in research in comparative vertebrate morphology, paleontology, biomechanics, developmental biology, and systematics. Motivated by research questions that require computational access to and comparative reasoning across the diverse skeletal phenotypes of vertebrates, we developed a module of anatomical concepts for the skeletal system, the Vertebrate Skeletal Anatomy Ontology (VSAO), to accommodate and unify the existing skeletal terminologies for the species-specific (mouse, the frog Xenopus, zebrafish) and multispecies (teleost, amphibian) vertebrate anatomy ontologies. Previous differences between these terminologies prevented even simple queries across databases pertaining to vertebrate morphology. This module of upper-level and specific skeletal terms currently includes 223 defined terms and 179 synonyms that integrate skeletal cells, tissues, biological processes, organs (skeletal elements such as bones and cartilages), and subdivisions of the skeletal system. The VSAO is designed to integrate with other ontologies, including the Common Anatomy Reference Ontology (CARO), Gene Ontology (GO), Uberon, and Cell Ontology (CL), and it is freely available to the community to be updated with additional terms required for research. Its structure accommodates anatomical variation among vertebrate species in development, structure, and composition. Annotation of diverse vertebrate phenotypes with this ontology will enable novel inquiries across the full spectrum of phenotypic diversity. PMID:23251424

  18. PubFocus: semantic MEDLINE/PubMed citations analytics through integration of controlled biomedical dictionaries and ranking algorithm

    PubMed Central

    Plikus, Maksim V; Zhang, Zina; Chuong, Cheng-Ming

    2006-01-01

    Background Understanding research activity within any given biomedical field is important. Search outputs generated by MEDLINE/PubMed are not well classified and require lengthy manual citation analysis. Automation of citation analytics can be very useful and timesaving for both novices and experts. Results PubFocus web server automates analysis of MEDLINE/PubMed search queries by enriching them with two widely used human factor-based bibliometric indicators of publication quality: journal impact factor and volume of forward references. In addition to providing basic volumetric statistics, PubFocus also prioritizes citations and evaluates authors' impact on the field of search. PubFocus also analyses the presence and occurrence of biomedical key terms within citations by utilizing controlled vocabularies. Conclusion We have developed a citation prioritisation algorithm based on journal impact factor, forward referencing volume, referencing dynamics, and author's contribution level. It can be applied either to the primary set of PubMed search results or to the subsets of these results identified through key terms from controlled biomedical vocabularies and ontologies. The NCI (National Cancer Institute) thesaurus and MGD (Mouse Genome Database) mammalian gene orthology have been implemented for key terms analytics. PubFocus provides a scalable platform for the integration of multiple available ontology databases. PubFocus analytics can be adapted for input sources of biomedical citations other than PubMed. PMID:17014720
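    A prioritisation score combining the two named indicators can be sketched as below. The additive log-scaled weighting is an invented illustration, not the published PubFocus algorithm, and the paper values are made up.

```python
# Hedged sketch: rank citations by impact factor + forward references.
import math

def priority(impact_factor, forward_refs, w_if=1.0, w_refs=1.0):
    """Simple additive score on log-scaled bibliometric indicators."""
    return w_if * math.log1p(impact_factor) + w_refs * math.log1p(forward_refs)

# Hypothetical papers: (journal impact factor, forward-reference count).
papers = {"a": (30.0, 250), "b": (2.5, 10), "c": (8.0, 40)}
ranked = sorted(papers, key=lambda p: priority(*papers[p]), reverse=True)
print(ranked)  # → ['a', 'c', 'b']
```

    Log scaling keeps a single very highly cited paper from swamping the ranking, one plausible design choice among many.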

  19. PAUSE: Predictive Analytics Using SPARQL-Endpoints

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sukumar, Sreenivas R; Ainsworth, Keela; Bond, Nathaniel

    2014-07-11

    This invention relates to the medical industry and more specifically to methods of predicting risks. With the impetus towards personalized and evidence-based medicine, the need for a framework to analyze/interpret quantitative measurements (blood work, toxicology, etc.) with qualitative descriptions (specialist reports after reading images, bio-medical knowledgebase, etc.) to predict diagnostic risks is fast emerging. We describe a software solution that leverages hardware for scalable in-memory analytics and applies next-generation semantic query tools on medical data.

  20. PolySearch2: a significantly improved text-mining system for discovering associations between human diseases, genes, drugs, metabolites, toxins and more.

    PubMed

    Liu, Yifeng; Liang, Yongjie; Wishart, David

    2015-07-01

    PolySearch2 (http://polysearch.ca) is an online text-mining system for identifying relationships between biomedical entities such as human diseases, genes, SNPs, proteins, drugs, metabolites, toxins, metabolic pathways, organs, tissues, subcellular organelles, positive health effects, negative health effects, drug actions, Gene Ontology terms, MeSH terms, ICD-10 medical codes, biological taxonomies and chemical taxonomies. PolySearch2 supports a generalized 'Given X, find all associated Ys' query, where X and Y can be selected from the aforementioned biomedical entities. An example query might be: 'Find all diseases associated with Bisphenol A'. To find its answers, PolySearch2 searches for associations against comprehensive collections of free text, including local versions of MEDLINE abstracts, PubMed Central full-text articles, Wikipedia full-text articles and US Patent application abstracts. PolySearch2 also searches 14 widely used, text-rich biological databases such as UniProt, DrugBank and Human Metabolome Database to improve its accuracy and coverage. PolySearch2 maintains an extensive thesaurus of biological terms and exploits the latest search engine technology to rapidly retrieve relevant articles and database records. PolySearch2 also generates, ranks and annotates associative candidates and presents results with relevancy statistics and highlighted key sentences to facilitate user interpretation. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
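    The 'Given X, find all associated Ys' pattern reduces, in its simplest form, to a co-occurrence lookup over annotated documents. The three-document corpus below is invented; PolySearch2's actual scoring uses relevancy statistics well beyond raw counts.

```python
# Sketch of 'Given X, find all associated Ys' as co-occurrence counting.
from collections import Counter

DOCS = [  # hypothetical per-document entity annotations
    {"chemicals": {"bisphenol a"}, "diseases": {"obesity", "diabetes"}},
    {"chemicals": {"bisphenol a"}, "diseases": {"obesity"}},
    {"chemicals": {"aspirin"}, "diseases": {"fever"}},
]

def associated(x, x_field, y_field):
    """Rank Y entities by how often they co-occur with X across documents."""
    counts = Counter()
    for doc in DOCS:
        if x in doc[x_field]:
            counts.update(doc[y_field])
    return counts.most_common()

print(associated("bisphenol a", "chemicals", "diseases"))
```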

  1. Semantic Web repositories for genomics data using the eXframe platform

    PubMed Central

    2014-01-01

    Background With the advent of inexpensive assay technologies, there has been an unprecedented growth in genomics data as well as the number of databases in which it is stored. In these databases, sample annotation using ontologies and controlled vocabularies is becoming more common. However, the annotation is rarely available as Linked Data, in a machine-readable format, or for standardized queries using SPARQL. This makes large-scale reuse, or integration with other knowledge bases, very difficult. Methods To address this challenge, we have developed the second generation of our eXframe platform, a reusable framework for creating online repositories of genomics experiments. This second generation model now publishes Semantic Web data. To accomplish this, we created an experiment model that covers provenance, citations, external links, assays, biomaterials used in the experiment, and the data collected during the process. The elements of our model are mapped to classes and properties from various established biomedical ontologies. Resource Description Framework (RDF) data is automatically produced using these mappings and indexed in an RDF store with a built-in Sparql Protocol and RDF Query Language (SPARQL) endpoint. Conclusions Using the open-source eXframe software, institutions and laboratories can create Semantic Web repositories of their experiments, integrate them with heterogeneous resources and make them interoperable with the vast Semantic Web of biomedical knowledge. PMID:25093072

  2. PolySearch2: a significantly improved text-mining system for discovering associations between human diseases, genes, drugs, metabolites, toxins and more

    PubMed Central

    Liu, Yifeng; Liang, Yongjie; Wishart, David

    2015-01-01

    PolySearch2 (http://polysearch.ca) is an online text-mining system for identifying relationships between biomedical entities such as human diseases, genes, SNPs, proteins, drugs, metabolites, toxins, metabolic pathways, organs, tissues, subcellular organelles, positive health effects, negative health effects, drug actions, Gene Ontology terms, MeSH terms, ICD-10 medical codes, biological taxonomies and chemical taxonomies. PolySearch2 supports a generalized ‘Given X, find all associated Ys’ query, where X and Y can be selected from the aforementioned biomedical entities. An example query might be: ‘Find all diseases associated with Bisphenol A’. To find its answers, PolySearch2 searches for associations against comprehensive collections of free-text collections, including local versions of MEDLINE abstracts, PubMed Central full-text articles, Wikipedia full-text articles and US Patent application abstracts. PolySearch2 also searches 14 widely used, text-rich biological databases such as UniProt, DrugBank and Human Metabolome Database to improve its accuracy and coverage. PolySearch2 maintains an extensive thesaurus of biological terms and exploits the latest search engine technology to rapidly retrieve relevant articles and databases records. PolySearch2 also generates, ranks and annotates associative candidates and present results with relevancy statistics and highlighted key sentences to facilitate user interpretation. PMID:25925572

  3. Exposing the cancer genome atlas as a SPARQL endpoint.

    PubMed

    Deus, Helena F; Veiga, Diogo F; Freire, Pablo R; Weinstein, John N; Mills, Gordon B; Almeida, Jonas S

    2010-12-01

    The Cancer Genome Atlas (TCGA) is a multidisciplinary, multi-institutional effort to characterize several types of cancer. Datasets from biomedical domains such as TCGA present a particularly challenging task for those interested in dynamically aggregating its results because the data sources are typically both heterogeneous and distributed. The Linked Data best practices offer a solution to integrate and discover data with those characteristics, namely through exposure of data as Web services supporting SPARQL, the Resource Description Framework query language. Most SPARQL endpoints, however, cannot easily be queried by data experts. Furthermore, exposing experimental data as SPARQL endpoints remains a challenging task because, in most cases, data must first be converted to Resource Description Framework triples. In line with those requirements, we have developed an infrastructure to expose clinical, demographic and molecular data elements generated by TCGA as a SPARQL endpoint by assigning elements to entities of the Simple Sloppy Semantic Database (S3DB) management model. All components of the infrastructure are available as independent Representational State Transfer (REST) Web services to encourage reusability, and a simple interface was developed to automatically assemble SPARQL queries by navigating a representation of the TCGA domain. A key feature of the proposed solution that greatly facilitates assembly of SPARQL queries is the distinction between the TCGA domain descriptors and data elements. Furthermore, the use of the S3DB management model as a mediator enables queries to both public and protected data without the need for prior submission to a single data source. Copyright © 2010 Elsevier Inc. All rights reserved.
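
    The SPARQL exposure described above boils down to matching triple patterns against RDF triples. The following sketch shows that core idea over an in-memory triple set; the data is invented, not real TCGA content, and the actual S3DB-backed endpoint is far richer.

```python
# Tiny in-memory "triple store" (subject, predicate, object) — illustrative only.
TRIPLES = {
    ("patient1", "hasDiagnosis", "glioblastoma"),
    ("patient1", "hasAge", "54"),
    ("patient2", "hasDiagnosis", "ovarian_carcinoma"),
}

def match(pattern, triples):
    """Match one triple pattern; terms starting with '?' are variables.
    Returns a list of variable bindings, like rows of a SPARQL SELECT."""
    rows = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                binding[term] = value
            elif term != value:
                break  # constant term does not match this triple
        else:
            rows.append(binding)
    return rows

# SELECT ?p WHERE { ?p hasDiagnosis glioblastoma }
print(match(("?p", "hasDiagnosis", "glioblastoma"), TRIPLES))
# → [{'?p': 'patient1'}]
```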

  4. Bio-TDS: bioscience query tool discovery system.

    PubMed

    Gnimpieba, Etienne Z; VanDiermen, Menno S; Gustafson, Shayla M; Conn, Bill; Lushbough, Carol M

    2017-01-04

    Bioinformatics and computational biology play a critical role in bioscience and biomedical research. As researchers design their experimental projects, one major challenge is to find the most relevant bioinformatics toolkits that will lead to new knowledge discovery from their data. The Bio-TDS (Bioscience Query Tool Discovery Systems, http://biotds.org/) has been developed to assist researchers in retrieving the most applicable analytic tools by allowing them to formulate their questions as free text. The Bio-TDS is a flexible retrieval system that affords users from multiple bioscience domains (e.g. genomic, proteomic, bio-imaging) the ability to query over 12 000 analytic tool descriptions integrated from well-established, community repositories. One of the primary components of the Bio-TDS is the ontology and natural language processing workflow for annotation, curation, query processing, and evaluation. The Bio-TDS's scientific impact was evaluated using sample questions posed by researchers retrieved from Biostars, a site focusing on biological data analysis. The Bio-TDS was compared to five similar bioscience analytic tool retrieval systems with the Bio-TDS outperforming the others in terms of relevance and completeness. The Bio-TDS offers researchers the capacity to associate their bioscience question with the most relevant computational toolsets required for the data analysis in their knowledge discovery process. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  5. Ontobee: A linked ontology data server to support ontology term dereferencing, linkage, query and integration.

    PubMed

    Ong, Edison; Xiang, Zuoshuang; Zhao, Bin; Liu, Yue; Lin, Yu; Zheng, Jie; Mungall, Chris; Courtot, Mélanie; Ruttenberg, Alan; He, Yongqun

    2017-01-04

    Linked Data (LD) aims to achieve interconnected data by representing entities using Uniform Resource Identifiers (URIs), and sharing information using the Resource Description Framework (RDF) and HTTP. Ontologies, which logically represent entities and relations in specific domains, are the basis of LD. Ontobee (http://www.ontobee.org/) is a linked ontology data server that stores ontology information using RDF triple store technology and supports query, visualization and linkage of ontology terms. Ontobee is also the default linked data server for publishing and browsing biomedical ontologies in the Open Biological Ontology (OBO) Foundry (http://obofoundry.org) library. Ontobee currently hosts more than 180 ontologies (including 131 OBO Foundry Library ontologies) with over four million terms. Ontobee provides a user-friendly web interface for querying and visualizing the details and hierarchy of a specific ontology term. Using the eXtensible Stylesheet Language Transformation (XSLT) technology, Ontobee is able to dereference a single ontology term URI, and then output RDF/eXtensible Markup Language (XML) for computer processing or display the HTML information on a web browser for human users. Statistics and detailed information are generated and displayed for each ontology listed in Ontobee. In addition, a SPARQL web interface is provided for custom advanced SPARQL queries of one or multiple ontologies.  © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
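
    Term dereferencing of the kind described above rests on content negotiation: one term URI, with the representation chosen by the client's Accept header. This is a minimal sketch of that dispatch logic; the renderers are stand-ins, not Ontobee's actual XSLT pipeline, and the term record is illustrative.

```python
# One ontology term record (illustrative; DOID_162 is the Disease Ontology
# identifier commonly used for 'cancer').
TERM = {"uri": "http://purl.obolibrary.org/obo/DOID_162", "label": "cancer"}

def dereference(term, accept):
    """Return a machine-readable or human-readable representation of the
    same term URI, depending on the HTTP Accept header."""
    if "application/rdf+xml" in accept:                 # machine client
        return f'<rdf:Description rdf:about="{term["uri"]}"/>'
    return f'<h1>{term["label"]}</h1>'                  # default: HTML for browsers

print(dereference(TERM, "application/rdf+xml"))
print(dereference(TERM, "text/html"))
```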

  6. Omicseq: a web-based search engine for exploring omics datasets.

    PubMed

    Sun, Xiaobo; Pittard, William S; Xu, Tianlei; Chen, Li; Zwick, Michael E; Jiang, Xiaoqian; Wang, Fusheng; Qin, Zhaohui S

    2017-07-03

    The development and application of high-throughput genomics technologies has resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets wherein the vast majority of their content is numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve 'findability' of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable and elastic, NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant. Omicseq is freely available at http://www.omicseq.org. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
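
    Ranking datasets by their numeric content rather than their metadata, as the abstract describes, can be sketched as follows: score each dataset by how prominent the query gene's signal is within that dataset's values. The datasets, genes, and percentile scoring below are invented; the actual trackRank algorithm is more involved.

```python
# Toy omics datasets: gene -> signal value (all numbers invented).
DATASETS = {
    "exp_A": {"TP53": 9.1, "BRCA1": 2.3, "EGFR": 1.1},
    "exp_B": {"TP53": 0.4, "BRCA1": 7.8, "EGFR": 6.5},
}

def rank_datasets(gene, datasets):
    """Score each dataset by the query gene's percentile rank within the
    dataset's own numeric signal, then sort datasets by that score."""
    scores = {}
    for name, signal in datasets.items():
        values = list(signal.values())
        scores[name] = sum(v <= signal[gene] for v in values) / len(values)
    return sorted(scores, key=scores.get, reverse=True)

print(rank_datasets("TP53", DATASETS))
# → ['exp_A', 'exp_B']  (TP53 dominates exp_A's signal, not exp_B's)
```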

  7. A Query Expansion Framework in Image Retrieval Domain Based on Local and Global Analysis

    PubMed Central

    Rahman, M. M.; Antani, S. K.; Thoma, G. R.

    2011-01-01

    We present an image retrieval framework based on automatic query expansion in a concept feature space by generalizing the vector space model of information retrieval. In this framework, images are represented by vectors of weighted concepts similar to the keyword-based representation used in text retrieval. To generate the concept vocabularies, a statistical model is built by utilizing Support Vector Machine (SVM)-based classification techniques. The images are represented as “bag of concepts” that comprise perceptually and/or semantically distinguishable color and texture patches from local image regions in a multi-dimensional feature space. To explore the correlation between the concepts and overcome the assumption of feature independence in this model, we propose query expansion techniques in the image domain from a new perspective based on both local and global analysis. For the local analysis, the correlations between the concepts based on the co-occurrence pattern, and the metrical constraints based on the neighborhood proximity between the concepts in encoded images, are analyzed by considering local feedback information. We also analyze the concept similarities in the collection as a whole in the form of a similarity thesaurus and propose an efficient query expansion based on the global analysis. The experimental results on a photographic collection of natural scenes and a biomedical database of different imaging modalities demonstrate the effectiveness of the proposed framework in terms of precision and recall. PMID:21822350

  8. Development, dissemination, and applications of a new terminological resource, the Q-Code taxonomy for professional aspects of general practice/family medicine.

    PubMed

    Jamoulle, Marc; Resnick, Melissa; Grosjean, Julien; Ittoo, Ashwin; Cardillo, Elena; Vander Stichele, Robert; Darmoni, Stefan; Vanmeerbeek, Marc

    2018-12-01

    While documentation of clinical aspects of General Practice/Family Medicine (GP/FM) is assured by the International Classification of Primary Care (ICPC), there is no taxonomy for the professional aspects (context and management) of GP/FM. To present the development, dissemination, applications, and resulting face validity of the Q-Codes taxonomy specifically designed to describe contextual features of GP/FM, proposed as an extension to the ICPC. The Q-Codes taxonomy was developed from Lamberts' seminal idea for indexing contextual content (1987) by a multi-disciplinary team of knowledge engineers, linguists and general practitioners, through a qualitative and iterative analysis of 1702 abstracts from six GP/FM conferences using Atlas.ti software. A total of 182 concepts, called Q-Codes, representing professional aspects of GP/FM were identified and organized in a taxonomy. Dissemination: The taxonomy is published as an online terminological resource, using semantic web techniques and web ontology language (OWL) (http://www.hetop.eu/Q). Each Q-Code is identified with a Uniform Resource Identifier (URI), and provided with preferred terms and scope notes in ten languages (Portuguese, Spanish, English, French, Dutch, Korean, Vietnamese, Turkish, Georgian, German) and search filters for MEDLINE and web searches. This taxonomy has already been used to support queries in bibliographic databases (e.g., MEDLINE), to facilitate indexing of grey literature in GP/FM (e.g., congress abstracts, master's theses, websites), and as an educational tool in vocational teaching. Conclusions: The rapidly growing list of practical applications provides face validity for the usefulness of this freely available new terminological resource.

  9. Reliable and Persistent Identification of Linked Data Elements

    NASA Astrophysics Data System (ADS)

    Wood, David

    Linked Data techniques rely upon common terminology in a manner similar to a relational database's reliance on a schema. Linked Data terminology anchors metadata descriptions and facilitates navigation of information. Common vocabularies ease the human, social tasks of understanding datasets sufficiently to construct queries and help to relate otherwise disparate datasets. Vocabulary terms must, when using the Resource Description Framework, be grounded in URIs. A current best practice on the World Wide Web is to serve vocabulary terms as Uniform Resource Locators (URLs) and present both human-readable and machine-readable representations to the public. Linked Data terminology published to the World Wide Web may be used by others without reference or notification to the publishing party. That presents a problem: vocabulary publishers take on an implicit responsibility to maintain and publish their terms via the URLs originally assigned, regardless of the inconvenience such a responsibility may cause. Over the course of years, people change jobs, publishing organizations change Internet domain names, computers change IP addresses, and systems administrators publish old material in new ways. Clearly, a mechanism is required to manage Web-based vocabularies over the long term. This chapter places Linked Data vocabularies in context with the wider concepts of metadata in general and specifically metadata on the Web. Persistent identifier mechanisms are reviewed, with a particular emphasis on Persistent URLs, or PURLs. PURLs and PURL services are discussed in the context of Linked Data. Finally, historic weaknesses of PURLs are resolved by the introduction of a federation of PURL services to address needs specific to Linked Data.
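
    The PURL mechanism reviewed above amounts to a level of indirection: a persistent identifier resolves, via an HTTP redirect, to whatever the current target URL is, and only the mapping changes when content moves. A minimal sketch (the mapping entries are hypothetical):

```python
# Persistent path -> current target URL (hypothetical registry entries).
PURL_TABLE = {
    "/net/dc/terms/title":
        "https://www.dublincore.org/specifications/dublin-core/dcmi-terms/",
}

def resolve(purl_path):
    """Return (status, location): an HTTP 302 redirect when the PURL is
    registered, else a 404. When content moves, only PURL_TABLE changes;
    the persistent path stays stable for everyone who cited it."""
    target = PURL_TABLE.get(purl_path)
    return (302, target) if target else (404, None)

print(resolve("/net/dc/terms/title"))
print(resolve("/no/such/purl"))  # → (404, None)
```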

  10. A journey to Semantic Web query federation in the life sciences.

    PubMed

    Cheung, Kei-Hoi; Frost, H Robert; Marshall, M Scott; Prud'hommeaux, Eric; Samwald, Matthias; Zhao, Jun; Paschke, Adrian

    2009-10-01

    As interest in adopting the Semantic Web in the biomedical domain continues to grow, Semantic Web technology has been evolving and maturing. A variety of technological approaches including triplestore technologies, SPARQL endpoints, Linked Data, and Vocabulary of Interlinked Datasets have emerged in recent years. In addition to the data warehouse construction, these technological approaches can be used to support dynamic query federation. As a community effort, the BioRDF task force, within the Semantic Web for Health Care and Life Sciences Interest Group, is exploring how these emerging approaches can be utilized to execute distributed queries across different neuroscience data sources. We have created two health care and life science knowledge bases. We have explored a variety of Semantic Web approaches to describe, map, and dynamically query multiple datasets. We have demonstrated several federation approaches that integrate diverse types of information about neurons and receptors that play an important role in basic, clinical, and translational neuroscience research. Particularly, we have created a prototype receptor explorer which uses OWL mappings to provide an integrated list of receptors and executes individual queries against different SPARQL endpoints. We have also employed the AIDA Toolkit, which is directed at groups of knowledge workers who cooperatively search, annotate, interpret, and enrich large collections of heterogeneous documents from diverse locations. We have explored a tool called "FeDeRate", which enables a global SPARQL query to be decomposed into subqueries against the remote databases offering either SPARQL or SQL query interfaces. Finally, we have explored how to use the vocabulary of interlinked Datasets (voiD) to create metadata for describing datasets exposed as Linked Data URIs or SPARQL endpoints. 
We have demonstrated the use of a set of novel and state-of-the-art Semantic Web technologies in support of a neuroscience query federation scenario. We have identified both the strengths and weaknesses of these technologies. While the Semantic Web offers a global data model including the use of Uniform Resource Identifiers (URIs), the proliferation of semantically equivalent URIs hinders large-scale data integration. Our work helps direct research and tool development, which will be of benefit to this community.
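
    The FeDeRate-style decomposition described in this record can be sketched as follows: a global query is split into per-source subqueries whose bindings are joined. The two "endpoints" below are plain dicts standing in for SPARQL services, and the receptor/region data is invented.

```python
# Two mock data sources (stand-ins for remote SPARQL/SQL endpoints).
RECEPTOR_ENDPOINT = {"5-HT2A": "serotonin receptor"}   # receptor -> family
EXPRESSION_ENDPOINT = {"5-HT2A": "cortex"}             # receptor -> brain region

def federated_query(family):
    """Decompose a global query into two subqueries and join the bindings:
    (1) ask source A for receptors of the requested family;
    (2) ask source B where each bound receptor is expressed."""
    ids = [r for r, fam in RECEPTOR_ENDPOINT.items() if fam == family]
    return [(r, EXPRESSION_ENDPOINT[r]) for r in ids if r in EXPRESSION_ENDPOINT]

print(federated_query("serotonin receptor"))
# → [('5-HT2A', 'cortex')]
```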

  11. A journey to Semantic Web query federation in the life sciences

    PubMed Central

    Cheung, Kei-Hoi; Frost, H Robert; Marshall, M Scott; Prud'hommeaux, Eric; Samwald, Matthias; Zhao, Jun; Paschke, Adrian

    2009-01-01

    Background As interest in adopting the Semantic Web in the biomedical domain continues to grow, Semantic Web technology has been evolving and maturing. A variety of technological approaches including triplestore technologies, SPARQL endpoints, Linked Data, and Vocabulary of Interlinked Datasets have emerged in recent years. In addition to the data warehouse construction, these technological approaches can be used to support dynamic query federation. As a community effort, the BioRDF task force, within the Semantic Web for Health Care and Life Sciences Interest Group, is exploring how these emerging approaches can be utilized to execute distributed queries across different neuroscience data sources. Methods and results We have created two health care and life science knowledge bases. We have explored a variety of Semantic Web approaches to describe, map, and dynamically query multiple datasets. We have demonstrated several federation approaches that integrate diverse types of information about neurons and receptors that play an important role in basic, clinical, and translational neuroscience research. Particularly, we have created a prototype receptor explorer which uses OWL mappings to provide an integrated list of receptors and executes individual queries against different SPARQL endpoints. We have also employed the AIDA Toolkit, which is directed at groups of knowledge workers who cooperatively search, annotate, interpret, and enrich large collections of heterogeneous documents from diverse locations. We have explored a tool called "FeDeRate", which enables a global SPARQL query to be decomposed into subqueries against the remote databases offering either SPARQL or SQL query interfaces. Finally, we have explored how to use the vocabulary of interlinked Datasets (voiD) to create metadata for describing datasets exposed as Linked Data URIs or SPARQL endpoints. 
Conclusion We have demonstrated the use of a set of novel and state-of-the-art Semantic Web technologies in support of a neuroscience query federation scenario. We have identified both the strengths and weaknesses of these technologies. While the Semantic Web offers a global data model including the use of Uniform Resource Identifiers (URIs), the proliferation of semantically equivalent URIs hinders large-scale data integration. Our work helps direct research and tool development, which will be of benefit to this community. PMID:19796394

  12. NCBO Ontology Recommender 2.0: an enhanced approach for biomedical ontology recommendation.

    PubMed

    Martínez-Romero, Marcos; Jonquet, Clement; O'Connor, Martin J; Graybeal, John; Pazos, Alejandro; Musen, Mark A

    2017-06-07

    Ontologies and controlled terminologies have become increasingly important in biomedical research. Researchers use ontologies to annotate their data with ontology terms, enabling better data integration and interoperability across disparate datasets. However, the number, variety and complexity of current biomedical ontologies make it cumbersome for researchers to determine which ones to reuse for their specific needs. To overcome this problem, in 2010 the National Center for Biomedical Ontology (NCBO) released the Ontology Recommender, which is a service that receives a biomedical text corpus or a list of keywords and suggests ontologies appropriate for referencing the indicated terms. We developed a new version of the NCBO Ontology Recommender. Called Ontology Recommender 2.0, it uses a novel recommendation approach that evaluates the relevance of an ontology to biomedical text data according to four different criteria: (1) the extent to which the ontology covers the input data; (2) the acceptance of the ontology in the biomedical community; (3) the level of detail of the ontology classes that cover the input data; and (4) the specialization of the ontology to the domain of the input data. Our evaluation shows that the enhanced recommender provides higher quality suggestions than the original approach, providing better coverage of the input data, more detailed information about their concepts, increased specialization for the domain of the input data, and greater acceptance and use in the community. In addition, it provides users with more explanatory information, along with suggestions of not only individual ontologies but also groups of ontologies to use together. It also can be customized to fit the needs of different ontology recommendation scenarios. Ontology Recommender 2.0 suggests relevant ontologies for annotating biomedical text data. 
It combines the strengths of its predecessor with a range of adjustments and new features that improve its reliability and usefulness. Ontology Recommender 2.0 recommends over 500 biomedical ontologies from the NCBO BioPortal platform, where it is openly available (both via the user interface at http://bioportal.bioontology.org/recommender , and via a Web service API).
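
    The four criteria listed in this record can be combined into a single ranking score, for example as a weighted sum. The weights and per-ontology numbers below are invented for illustration and are not the recommender's actual values.

```python
# Invented weights over the four criteria from the abstract.
WEIGHTS = {"coverage": 0.55, "acceptance": 0.15,
           "detail": 0.15, "specialization": 0.15}

# Invented per-criterion scores for two hypothetical candidate ontologies.
CANDIDATES = {
    "Ontology_A": {"coverage": 0.9, "acceptance": 0.6, "detail": 0.7, "specialization": 0.8},
    "Ontology_B": {"coverage": 0.5, "acceptance": 0.9, "detail": 0.4, "specialization": 0.3},
}

def recommend(candidates, weights):
    """Rank candidate ontologies by a weighted sum of criterion scores."""
    score = lambda c: sum(weights[k] * c[k] for k in weights)
    return sorted(candidates, key=lambda name: score(candidates[name]), reverse=True)

print(recommend(CANDIDATES, WEIGHTS))
# → ['Ontology_A', 'Ontology_B']  (coverage dominates under these weights)
```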

  13. A novel biomedical image indexing and retrieval system via deep preference learning.

    PubMed

    Pang, Shuchao; Orgun, Mehmet A; Yu, Zhezhou

    2018-05-01

    Traditional biomedical image retrieval methods, as well as content-based image retrieval (CBIR) methods originally designed for non-biomedical images, either rely only on pixel-based and other low-level features to describe an image or use deep features, but still leave considerable room for improving both accuracy and efficiency. In this work, we propose a new approach, which exploits deep learning technology to extract high-level and compact features from biomedical images. The deep feature extraction process leverages multiple hidden layers to capture substantial feature structures of high-resolution images and represent them at different levels of abstraction, leading to an improved performance for indexing and retrieval of biomedical images. We exploit currently popular multi-layered deep neural networks, namely stacked denoising autoencoders (SDAE) and convolutional neural networks (CNN), to represent the discriminative features of biomedical images by transferring the feature representations and parameters of pre-trained deep neural networks from another domain. Moreover, in order to index all the images for finding the similarly referenced images, we also introduce preference learning technology to train and learn a preference model for the query image, which can output the similarity ranking list of images from a biomedical image database. To the best of our knowledge, this paper introduces preference learning technology for the first time into biomedical image retrieval. We evaluate the performance of two powerful algorithms based on our proposed system and compare them with those of popular biomedical image indexing approaches and existing regular image retrieval methods with detailed experiments over several well-known public biomedical image databases. 
Based on different criteria for the evaluation of retrieval performance, experimental results demonstrate that our proposed algorithms outperform the state-of-the-art techniques in indexing biomedical images. We propose a novel and automated indexing system based on deep preference learning to characterize biomedical images for developing computer aided diagnosis (CAD) systems in healthcare. Our proposed system shows an outstanding indexing ability and high efficiency for biomedical image retrieval applications and it can be used to collect and annotate the high-resolution images in a biomedical database for further biomedical image research and applications. Copyright © 2018 Elsevier B.V. All rights reserved.
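
    The retrieval step this record describes can be sketched independently of the learning machinery: given feature vectors for the database images (here tiny invented vectors standing in for SDAE/CNN activations), rank images by cosine similarity to the query image's vector. The learned preference model in the paper replaces this fixed similarity with a trained ranking function.

```python
import math

# Invented 3-dimensional "deep features" for three database images.
DB = {"img1": [1.0, 0.0, 1.0], "img2": [0.0, 1.0, 0.0], "img3": [1.0, 0.1, 0.9]}

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(a) * norm(b))

def rank(query, db):
    """Return database image names ordered by similarity to the query."""
    return sorted(db, key=lambda name: cosine(query, db[name]), reverse=True)

print(rank([1.0, 0.0, 1.0], DB))
# → ['img1', 'img3', 'img2']
```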

  14. Bioinformatics in proteomics: application, terminology, and pitfalls.

    PubMed

    Wiemer, Jan C; Prokudin, Alexander

    2004-01-01

    Bioinformatics applies data mining, i.e., modern computer-based statistics, to biomedical data. It leverages machine learning approaches, such as artificial neural networks, decision trees and clustering algorithms, and is ideally suited for handling huge amounts of data. In this article, we review the analysis of mass spectrometry data in proteomics, starting with common pre-processing steps and using single decision trees and decision tree ensembles for classification. Special emphasis is put on the pitfall of overfitting, i.e., of generating overly complex single decision trees. Finally, we discuss the pros and cons of the two different decision tree usages.

  15. BioFed: federated query processing over life sciences linked open data.

    PubMed

    Hasnain, Ali; Mehmood, Qaiser; Sana E Zainab, Syeda; Saleem, Muhammad; Warren, Claude; Zehra, Durre; Decker, Stefan; Rebholz-Schuhmann, Dietrich

    2017-03-15

    Biomedical data, e.g. from knowledge bases and ontologies, is increasingly made available following open linked data principles, at best as RDF triple data. This is a necessary step towards unified access to biological data sets, but it still requires solutions to query multiple endpoints for their heterogeneous data to eventually retrieve all the meaningful information. Suggested solutions are based on query federation approaches, which require the submission of SPARQL queries to endpoints. Due to the size and complexity of available data, these solutions have to be optimised for efficient retrieval times and for users in life sciences research. Last but not least, over time, the reliability of data resources in terms of access and quality has to be monitored. Our solution (BioFed) federates data over 130 SPARQL endpoints in life sciences and tailors query submission according to the provenance information. BioFed has been evaluated against the state-of-the-art solution FedX and forms an important benchmark for the life science domain. The efficient cataloguing approach of the federated query processing system 'BioFed', the triple pattern wise source selection and the semantic source normalisation form the core of our solution. It gathers and integrates data from newly identified public endpoints for federated access. Basic provenance information is linked to the retrieved data. Last but not least, BioFed makes use of the latest SPARQL standard (i.e., 1.1) to leverage the full benefits for query federation. The evaluation is based on 10 simple and 10 complex queries, which address data in 10 major and very popular data sources (e.g., DrugBank, SIDER). BioFed is a solution for a single point of access to a large number of SPARQL endpoints providing life science data. It facilitates efficient query generation for data access and provides basic provenance information in combination with the retrieved data. 
BioFed fully supports SPARQL 1.1 and reports endpoint availability based on the EndpointData graph. Our evaluation of BioFed against FedX is based on 20 heterogeneous federated SPARQL queries and shows competitive execution performance in comparison to FedX, which can be attributed to the provision of provenance information for the source selection. Developing and testing federated query engines for life sciences data is still a challenging task. According to our findings, it is advantageous to optimise the source selection. The cataloguing of SPARQL endpoints, including type and property indexing, leads to efficient querying of data resources over the Web of Data. This could be further improved through the use of ontologies, e.g., for abstract normalisation of query terms.
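
    The triple pattern wise source selection at the core of this record can be sketched with a catalogue that records which predicates each endpoint serves, so that each triple pattern of a federated query is routed only to endpoints able to answer it. The endpoint URLs and predicate sets below are invented.

```python
# Hypothetical catalogue: endpoint URL -> predicates it indexes.
CATALOGUE = {
    "http://endpoint.example/drugs":    {"hasTarget", "hasSideEffect"},
    "http://endpoint.example/pathways": {"partOfPathway"},
}

def select_sources(query_patterns, catalogue):
    """For each triple pattern's predicate, list the endpoints that can
    answer it — avoiding the naive strategy of querying every endpoint
    with every pattern."""
    return {p: [ep for ep, preds in catalogue.items() if p in preds]
            for (_, p, _) in query_patterns}

QUERY = [("?drug", "hasTarget", "?protein"),
         ("?protein", "partOfPathway", "?pw")]
print(select_sources(QUERY, CATALOGUE))
```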

  16. Concept annotation in the CRAFT corpus.

    PubMed

    Bada, Michael; Eckert, Miriam; Evans, Donald; Garcia, Kristin; Shipley, Krista; Sitnikov, Dmitry; Baumgartner, William A; Cohen, K Bretonnel; Verspoor, Karin; Blake, Judith A; Hunter, Lawrence E

    2012-07-09

    Manually annotated corpora are critical for the training and evaluation of automated methods to identify concepts in biomedical text. This paper presents the concept annotations of the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of 97 full-length, open-access biomedical journal articles that have been annotated both semantically and syntactically to serve as a research resource for the biomedical natural-language-processing (NLP) community. CRAFT identifies all mentions of nearly all concepts from nine prominent biomedical ontologies and terminologies: the Cell Type Ontology, the Chemical Entities of Biological Interest ontology, the NCBI Taxonomy, the Protein Ontology, the Sequence Ontology, the entries of the Entrez Gene database, and the three subontologies of the Gene Ontology. The first public release includes the annotations for 67 of the 97 articles, reserving two sets of 15 articles for future text-mining competitions (after which these too will be released). Concept annotations were created based on a single set of guidelines, which has enabled us to achieve consistently high interannotator agreement. As the initial 67-article release contains more than 560,000 tokens (and the full set more than 790,000 tokens), our corpus is among the largest gold-standard annotated biomedical corpora. Unlike most others, the journal articles that comprise the corpus are drawn from diverse biomedical disciplines and are marked up in their entirety. Additionally, with a concept-annotation count of nearly 100,000 in the 67-article subset (and more than 140,000 in the full collection), the scale of conceptual markup is also among the largest of comparable corpora. The concept annotations of the CRAFT Corpus have the potential to significantly advance biomedical text mining by providing a high-quality gold standard for NLP systems. 
The corpus, annotation guidelines, and other associated resources are freely available at http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml.

  17. Concept annotation in the CRAFT corpus

    PubMed Central

    2012-01-01

    Background Manually annotated corpora are critical for the training and evaluation of automated methods to identify concepts in biomedical text. Results This paper presents the concept annotations of the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of 97 full-length, open-access biomedical journal articles that have been annotated both semantically and syntactically to serve as a research resource for the biomedical natural-language-processing (NLP) community. CRAFT identifies all mentions of nearly all concepts from nine prominent biomedical ontologies and terminologies: the Cell Type Ontology, the Chemical Entities of Biological Interest ontology, the NCBI Taxonomy, the Protein Ontology, the Sequence Ontology, the entries of the Entrez Gene database, and the three subontologies of the Gene Ontology. The first public release includes the annotations for 67 of the 97 articles, reserving two sets of 15 articles for future text-mining competitions (after which these too will be released). Concept annotations were created based on a single set of guidelines, which has enabled us to achieve consistently high interannotator agreement. Conclusions As the initial 67-article release contains more than 560,000 tokens (and the full set more than 790,000 tokens), our corpus is among the largest gold-standard annotated biomedical corpora. Unlike most others, the journal articles that comprise the corpus are drawn from diverse biomedical disciplines and are marked up in their entirety. Additionally, with a concept-annotation count of nearly 100,000 in the 67-article subset (and more than 140,000 in the full collection), the scale of conceptual markup is also among the largest of comparable corpora. The concept annotations of the CRAFT Corpus have the potential to significantly advance biomedical text mining by providing a high-quality gold standard for NLP systems. 
The corpus, annotation guidelines, and other associated resources are freely available at http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml. PMID:22776079

  18. Past and future trends in cancer and biomedical research: a comparison between Egypt and the world using PubMed-indexed publications.

    PubMed

    Zeeneldin, Ahmed Abdelmabood; Taha, Fatma Mohamed; Moneer, Manar

    2012-07-10

    PubMed is a free web literature search service that contains almost 21 million abstracts and publications and serves almost 5 million user queries daily. The purposes of the study were to compare trends in PubMed-indexed cancer and biomedical publications from Egypt with those of the world and to predict future publication volumes. PubMed was searched for biomedical publications dated between 1991 and 2010. Affiliation was then limited to Egypt. Further limits were applied for cancer, human and animal publications. A Poisson regression model was used to predict the number of publications between 2011 and 2020. Cancer publications contributed 23% of biomedical publications, both for Egypt and for the world. Egyptian biomedical and cancer publications contributed about 0.13% to their world counterparts. This contribution more than doubled over the study period. Egyptian and world publications increased from year to year, with a rapid rise starting in 2003. Egyptian as well as world human cancer publications showed the highest increases. Egyptian publications had some peculiarities: they showed a drop in 1994 and 2002, and apart from the decline in the animal-to-human ratio over time, Egyptian publications in the period 1991-2000 were significantly fewer than those in 2001-2010 (P < 0.05 for all). By 2020, Egyptian biomedical and cancer publications are projected to increase by 158.7% and 280% relative to 2010, constituting 0.34% and 0.17% of total PubMed publications, respectively. The Egyptian contribution to the world's biomedical and cancer publications needs significant improvement through strategic research planning, setting national research priorities, adequate funding and researcher training.
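    The log-linear Poisson trend fit used in this study for extrapolating yearly publication counts can be sketched with a small Newton-Raphson routine (a minimal illustration under invented function names and synthetic counts, not the authors' actual model specification, which will have been fitted with a full statistical package):

```python
import math

def fit_poisson_trend(years, counts, iters=50):
    """Fit E[count] = exp(a + b * year) by maximum likelihood (Newton-Raphson)."""
    y0 = sum(years) / len(years)          # center years for numerical stability
    x = [t - y0 for t in years]
    a, b = math.log(sum(counts) / len(counts)), 0.0
    for _ in range(iters):
        mu = [math.exp(a + b * xi) for xi in x]
        # Gradient of the Poisson log-likelihood
        ga = sum(c - m for c, m in zip(counts, mu))
        gb = sum((c - m) * xi for c, m, xi in zip(counts, mu, x))
        # 2x2 Hessian (negative definite near the optimum)
        haa = -sum(mu)
        hab = -sum(m * xi for m, xi in zip(mu, x))
        hbb = -sum(m * xi * xi for m, xi in zip(mu, x))
        det = haa * hbb - hab * hab
        # Newton step: theta <- theta - H^{-1} g
        a -= (hbb * ga - hab * gb) / det
        b -= (haa * gb - hab * ga) / det
    return a, b, y0

def predict(a, b, y0, year):
    """Extrapolate the fitted trend to a future year."""
    return math.exp(a + b * (year - y0))
```

    Fitted on yearly counts for 1991-2010, `predict()` then yields projections for 2011-2020 in the same way the study projects 2020 volumes from historical data.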

  19. Past and future trends in cancer and biomedical research: a comparison between Egypt and the World using PubMed-indexed publications

    PubMed Central

    2012-01-01

    Background PubMed is a free web literature search service that contains almost 21 million abstracts and publications and serves almost 5 million user queries daily. The purposes of the study were to compare trends in PubMed-indexed cancer and biomedical publications from Egypt with those of the world and to predict future publication volumes. Methods PubMed was searched for biomedical publications dated between 1991 and 2010. Affiliation was then limited to Egypt. Further limits were applied for cancer, human and animal publications. A Poisson regression model was used to predict the number of publications between 2011 and 2020. Results Cancer publications contributed 23% of biomedical publications, both for Egypt and for the world. Egyptian biomedical and cancer publications contributed about 0.13% to their world counterparts. This contribution more than doubled over the study period. Egyptian and world publications increased from year to year, with a rapid rise starting in 2003. Egyptian as well as world human cancer publications showed the highest increases. Egyptian publications had some peculiarities: they showed a drop in 1994 and 2002, and apart from the decline in the animal-to-human ratio over time, Egyptian publications in the period 1991-2000 were significantly fewer than those in 2001-2010 (P < 0.05 for all). By 2020, Egyptian biomedical and cancer publications are projected to increase by 158.7% and 280% relative to 2010, constituting 0.34% and 0.17% of total PubMed publications, respectively. Conclusions The Egyptian contribution to the world's biomedical and cancer publications needs significant improvement through strategic research planning, setting national research priorities, adequate funding and researcher training. PMID:22780908

  20. An advanced search engine for patent analytics in medicinal chemistry.

    PubMed

    Pasche, Emilie; Gobeill, Julien; Teodoro, Douglas; Gaudinat, Arnaud; Vishnykova, Dina; Lovis, Christian; Ruch, Patrick

    2012-01-01

    Patent collections contain a significant amount of medically relevant knowledge, but existing tools have been reported to lack useful functionalities. We present here the development of TWINC, an advanced search engine dedicated to patent retrieval in the domain of health and life sciences. Our tool embeds two search modes: an ad hoc search to retrieve relevant patents given a short query, and a related-patent search to retrieve similar patents given a patent. Both search modes rely on tuning experiments performed during several patent retrieval competitions. Moreover, TWINC is enhanced with interactive modules, such as chemical query expansion, which is of prime importance for coping with the various ways of naming biomedical entities. While the related-patent search showed promising performance, the ad hoc search produced rather mixed results. Nonetheless, TWINC performed well during the Chemathlon task of the PatOlympics competition, and experts appreciated its usability.

  1. A Scalable Data Access Layer to Manage Structured Heterogeneous Biomedical Data

    PubMed Central

    Lianas, Luca; Frexia, Francesca; Zanetti, Gianluigi

    2016-01-01

    This work presents a scalable data access layer, called PyEHR, designed to support the implementation of data management systems for secondary use of structured heterogeneous biomedical and clinical data. PyEHR adopts the openEHR’s formalisms to guarantee the decoupling of data descriptions from implementation details and exploits structure indexing to accelerate searches. Data persistence is guaranteed by a driver layer with a common driver interface. Interfaces for two NoSQL Database Management Systems are already implemented: MongoDB and Elasticsearch. We evaluated the scalability of PyEHR experimentally through two types of tests, called “Constant Load” and “Constant Number of Records”, with queries of increasing complexity on synthetic datasets of ten million records each, containing very complex openEHR archetype structures, distributed on up to ten computing nodes. PMID:27936191
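    The common driver interface described above can be illustrated with a minimal sketch (the class and method names are invented for the example and do not reproduce PyEHR's actual API; an in-memory driver stands in for the MongoDB and Elasticsearch drivers):

```python
from abc import ABC, abstractmethod

class DriverInterface(ABC):
    """Common driver interface; each concrete driver wraps one NoSQL backend."""
    @abstractmethod
    def add_record(self, record): ...
    @abstractmethod
    def get_records(self, predicate): ...

class InMemoryDriver(DriverInterface):
    """Stand-in backend used here in place of a MongoDB or Elasticsearch driver."""
    def __init__(self):
        self._records = []
    def add_record(self, record):
        self._records.append(record)
    def get_records(self, predicate):
        return [r for r in self._records if predicate(r)]

class DataAccessLayer:
    """Routes persistence and queries through whichever driver was configured,
    decoupling the data descriptions from the storage implementation."""
    def __init__(self, driver: DriverInterface):
        self.driver = driver
    def save(self, record):
        self.driver.add_record(record)
    def query(self, predicate):
        return self.driver.get_records(predicate)
```

    Swapping backends then amounts to passing a different `DriverInterface` implementation to the access layer, which is the design property the driver layer provides.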

  2. Protecting personal data in epidemiological research: DataSHIELD and UK law.

    PubMed

    Wallace, Susan E; Gaye, Amadou; Shoush, Osama; Burton, Paul R

    2014-01-01

    Data from individual collections, such as biobanks and cohort studies, are now being shared in order to create combined datasets which can be queried to ask complex scientific questions. But this sharing must be done with due regard for data protection principles. DataSHIELD is a new technology that queries nonaggregated, individual-level data in situ but returns query data in an anonymous format. This raises questions about the ability of DataSHIELD to adequately protect participant confidentiality. An ethico-legal analysis was conducted that examined each step of the DataSHIELD process from the perspective of UK case law, regulations, and guidance. DataSHIELD reaches agreed UK standards of protection for the sharing of biomedical data. All direct processing of personal data is conducted within the protected environment of the contributing study; participating studies have scientific, ethics, and data access approvals in place prior to the analysis; studies are clear that their consents conform with this use of data, and participants are informed that anonymisation for further disclosure will take place. DataSHIELD can provide a flexible means of interrogating data while protecting the participants' confidentiality in accordance with applicable legislation and guidance. © 2014 S. Karger AG, Basel.
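    The in-situ query model can be illustrated with a toy federated analysis (DataSHIELD itself is an R-based infrastructure; this Python sketch, with invented function names, only shows the principle that each site returns non-disclosive aggregates rather than individual records):

```python
def local_summary(values):
    """Runs inside each study's protected environment; only the
    non-disclosive aggregates (sum and count) leave the site."""
    return sum(values), len(values)

def pooled_mean(site_summaries):
    """Central analysis combines the per-site aggregates; the
    individual-level records never cross a study's boundary."""
    total = sum(s for s, _ in site_summaries)
    n = sum(c for _, c in site_summaries)
    return total / n
```

    The central analyst sees only `(sum, count)` pairs, yet obtains the same mean as if the combined individual-level dataset had been pooled directly.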

  3. Biomedical informatics: development of a comprehensive data warehouse for clinical and genomic breast cancer research.

    PubMed

    Hu, Hai; Brzeski, Henry; Hutchins, Joe; Ramaraj, Mohan; Qu, Long; Xiong, Richard; Kalathil, Surendran; Kato, Rand; Tenkillaya, Santhosh; Carney, Jerry; Redd, Rosann; Arkalgudvenkata, Sheshkumar; Shahzad, Kashif; Scott, Richard; Cheng, Hui; Meadow, Stephen; McMichael, John; Sheu, Shwu-Lin; Rosendale, David; Kvecher, Leonid; Ahern, Stephen; Yang, Song; Zhang, Yonghong; Jordan, Rick; Somiari, Stella B; Hooke, Jeffrey; Shriver, Craig D; Somiari, Richard I; Liebman, Michael N

    2004-10-01

    The Windber Research Institute is an integrated high-throughput research center employing clinical, genomic and proteomic platforms that produce terabyte levels of data. We use biomedical informatics technologies to integrate all of these operations. This report describes a multi-year, multi-phase hybrid data warehouse project currently under development at the Institute. The purpose of the warehouse is to host the terabytes of internally generated experimental data as well as data from public sources. We have previously reported on the phase I development, which integrated limited internal data sources and selected public databases. Currently, we are completing phase II development, which integrates our internal automated data sources and develops visualization tools to query across these data types. This paper summarizes our clinical and experimental operations, the data warehouse development, and the challenges we have faced. In phase III we plan to federate additional manual internal and public data sources, and then to develop and adapt more data analysis and mining tools. We expect that the final implementation of the data warehouse will greatly facilitate biomedical informatics research.

  4. Twelfth International Symposium on Methodologies for Intelligent Systems (ISMIS 2000) Held in Charlotte, North Carolina on October 11-14, 2000

    DTIC Science & Technology

    2000-10-14

    without any knowledge of the problem area. Therefore, Darwinian-type evolutionary computation has found a very wide range of applications, including many ...the author examined many biomedical studies that included literature searches. The Science Citation Index (SCI) Abstracts of these studies...yield many records that are non-relevant to the main technical themes of the study. In summary, these types of simple limited queries can result in two

  5. The caCORE Software Development Kit: streamlining construction of interoperable biomedical information services.

    PubMed

    Phillips, Joshua; Chilukuri, Ram; Fragoso, Gilberto; Warzel, Denise; Covitz, Peter A

    2006-01-06

    Robust, programmatically accessible biomedical information services that syntactically and semantically interoperate with other resources are challenging to construct. Such systems require the adoption of common information models, data representations and terminology standards as well as documented application programming interfaces (APIs). The National Cancer Institute (NCI) developed the cancer common ontologic representation environment (caCORE) to provide the infrastructure necessary to achieve interoperability across the systems it develops or sponsors. The caCORE Software Development Kit (SDK) was designed to provide developers both within and outside the NCI with the tools needed to construct such interoperable software systems. The caCORE SDK requires a Unified Modeling Language (UML) tool to begin the development workflow with the construction of a domain information model in the form of a UML Class Diagram. Models are annotated with concepts and definitions from a description logic terminology source using the Semantic Connector component. The annotated model is registered in the Cancer Data Standards Repository (caDSR) using the UML Loader component. System software is automatically generated using the Codegen component, which produces middleware that runs on an application server. The caCORE SDK was initially tested and validated using a seven-class UML model, and has been used to generate the caCORE production system, which includes models with dozens of classes. The deployed system supports access through object-oriented APIs with consistent syntax for retrieval of any type of data object across all classes in the original UML model. The caCORE SDK is currently being used by several development teams, including by participants in the cancer biomedical informatics grid (caBIG) program, to create compatible data services. 
caBIG compatibility standards are based upon caCORE resources, and thus the caCORE SDK has emerged as a key enabling technology for caBIG. The caCORE SDK substantially lowers the barrier to implementing systems that are syntactically and semantically interoperable by providing workflow and automation tools that standardize and expedite modeling, development, and deployment. It has gained acceptance among developers in the caBIG program, and is expected to provide a common mechanism for creating data service nodes on the data grid that is under development.
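    The model-driven generation step at the heart of this workflow can be illustrated with a toy generator (a sketch only; the actual Codegen component emits Java middleware from annotated UML models registered in the caDSR, and the names below are invented):

```python
def generate_class(name, attributes):
    """Emit simple data-holder class source from a minimal model description,
    mimicking code generation from a one-class UML model."""
    lines = [f"class {name}:", "    def __init__(self):"]
    for attr in attributes:
        lines.append(f"        self.{attr} = None   # generated attribute")
    return "\n".join(lines)
```

    Running the generator on a one-class model produces source that can be executed directly, which is the essence of turning an annotated domain model into working data-access code.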

  6. Adapting a Clinical Data Repository to ICD-10-CM through the use of a Terminology Repository

    PubMed Central

    Cimino, James J.; Remennick, Lyubov

    2014-01-01

    Clinical data repositories frequently contain patient diagnoses coded with the International Classification of Diseases, Ninth Revision (ICD-9-CM). These repositories now need to accommodate data coded with the Tenth Revision (ICD-10-CM). Database users wish to retrieve relevant data regardless of the system by which they are coded. We demonstrate how a terminology repository (the Research Entities Dictionary or RED) serves as an ontology relating terms of both ICD versions to each other to support seamless version-independent retrieval from the Biomedical Translational Research Information System (BTRIS) at the National Institutes of Health. We make use of the Center for Medicare and Medicaid Services’ General Equivalence Mappings (GEMs) to reduce the modeling effort required to determine whether ICD-10-CM terms should be added to the RED as new concepts or as synonyms of existing concepts. A divide-and-conquer approach is used to develop integration heuristics that offer a satisfactory interim solution and facilitate additional refinement of the integration as time and resources allow. PMID:25954344
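    A GEMs-driven integration heuristic of the kind described can be sketched as follows (the GEM entries and the decision rule shown are illustrative inventions, not the authors' actual integration logic, which involved additional modeling review):

```python
# Hypothetical GEM entries: ICD-10-CM code -> candidate ICD-9-CM codes.
gem = {
    "E11.9": ["250.00"],               # one-to-one mapping
    "S52.501A": ["813.42", "813.52"],  # one-to-many mapping
}

def classify(icd10_code, gem_map):
    """Decide whether an ICD-10-CM term can be added as a synonym of an
    existing repository concept or needs a new concept of its own."""
    targets = gem_map.get(icd10_code, [])
    if len(targets) == 1:
        # An unambiguous equivalence suggests the same underlying concept.
        return ("synonym_of", targets[0])
    # No mapping, or several candidates: model a new concept and refine later.
    return ("new_concept", None)
```

    The heuristic gives a satisfactory interim placement for each term while flagging the ambiguous cases for later refinement, which is the divide-and-conquer idea the abstract describes.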

  7. Implementation of a platform dedicated to the biomedical analysis terminologies management

    PubMed Central

    Cormont, Sylvie; Vandenbussche, Pierre-Yves; Buemi, Antoine; Delahousse, Jean; Lepage, Eric; Charlet, Jean

    2011-01-01

    Background and objectives. Assistance Publique - Hôpitaux de Paris (AP-HP) is implementing a new laboratory management system (LMS) common to its 12 hospital groups. The first step in this process was to acquire a biological analysis dictionary. This dictionary is interfaced with the international nomenclature LOINC and has been developed in collaboration with experts from all biological disciplines. In this paper we describe in three steps (modeling, data migration and integration/verification) the implementation of a platform for publishing and maintaining the AP-HP laboratory data dictionary (AnaBio). Material and Methods. Due to data complexity and volume, setting up a platform dedicated to terminology management was a key requirement. This is an enhancement tackling identified weaknesses of the previous spreadsheet tool. Our core model allows interoperability with data exchange standards and supports dictionary evolution. Results. We completed our goals within one year. In addition, structuring the data representation has led to a significant improvement in data quality (impacting more than 10% of the data). The platform is active in the 21 hospitals of the institution, spread across 165 laboratories. PMID:22195205

  8. A methodology for extending domain coverage in SemRep.

    PubMed

    Rosemblat, Graciela; Shin, Dongwook; Kilicoglu, Halil; Sneiderman, Charles; Rindflesch, Thomas C

    2013-12-01

    We describe a domain-independent methodology to extend SemRep coverage beyond the biomedical domain. SemRep, a natural language processing application originally designed for biomedical texts, uses the knowledge sources provided by the Unified Medical Language System (UMLS©). Ontological and terminological extensions to the system are needed in order to support other areas of knowledge. We extended SemRep's application by developing a semantic representation of a previously unsupported domain. This was achieved by adapting well-known ontology engineering phases and integrating them with the UMLS knowledge sources on which SemRep crucially depends. While the process to extend SemRep coverage has been successfully applied in earlier projects, this paper presents in detail the step-wise approach we followed and the mechanisms implemented. A case study in the field of medical informatics illustrates how the ontology engineering phases have been adapted for optimal integration with the UMLS. We provide qualitative and quantitative results, which indicate the validity and usefulness of our methodology. Published by Elsevier Inc.

  9. Ontology-based knowledge management for personalized adverse drug events detection.

    PubMed

    Cao, Feng; Sun, Xingzhi; Wang, Xiaoyuan; Li, Bo; Li, Jing; Pan, Yue

    2011-01-01

    Since Adverse Drug Events (ADEs) have become a leading cause of death around the world, there is high demand for tools that help clinicians or patients identify possible hazards from drug effects. Motivated by this, we present a personalized ADE detection system, focusing on the application of ontology-based knowledge management techniques to enhance ADE detection services. The development of electronic health records makes it possible to automate personalized ADE detection, i.e., to take patient clinical conditions into account during ADE detection. Specifically, we define an ADE ontology to uniformly manage ADE knowledge from multiple sources. We take advantage of the rich semantics of the SNOMED-CT terminology and apply them to ADE detection via semantic query and reasoning.

  10. Chronic Diseases in North-West Tanzania and Southern Uganda. Public Perceptions of Terminologies, Aetiologies, Symptoms and Preferred Management.

    PubMed

    Nnko, Soori; Bukenya, Dominic; Kavishe, Bazil Balthazar; Biraro, Samuel; Peck, Robert; Kapiga, Saidi; Grosskurth, Heiner; Seeley, Janet

    2015-01-01

    Research has shown that health system utilization is low for chronic diseases (CDs) other than HIV. We describe the knowledge and perceptions of CDs identified from rural and urban communities in north-west Tanzania and southern Uganda. Data were collected through a quantitative population survey, a quantitative health facility survey, and focus group discussions (FGDs) and in-depth interviews (IDIs) in subgroups of population survey participants. The main focus of this paper is the findings from the FGDs and IDIs. We conducted 24 FGDs involving approximately 180 adult participants, and IDIs with 116 participants (≥18 years). The CDs studied included asthma/chronic obstructive pulmonary disease (COPD), diabetes, epilepsy, hypertension, cardiac failure and HIV-related disease. The understanding of most chronic conditions involved a combination of biomedical information gleaned from health facility visits, accounts from local people who had suffered from a complaint or knew others who had, and beliefs drawn from information shared in the community. The biomedical contribution shows some understanding of the aetiology of a condition and of its management. However, local beliefs about certain conditions (such as epilepsy) suggest that biomedical treatment may be futile and may therefore work counter to biomedical prescriptions for management. Current perceptions of selected CDs may represent a barrier that prevents people from adopting efficacious health- and treatment-seeking behaviours. Interventions to improve this situation must include efforts to improve the quality of existing health services, so that people can access relevant, reliable and trustworthy services.

  11. A study of the influence of task familiarity on user behaviors and performance with a MeSH term suggestion interface for PubMed bibliographic search.

    PubMed

    Tang, Muh-Chyun; Liu, Ying-Hsang; Wu, Wan-Ching

    2013-09-01

    Previous research has shown that information seekers in the biomedical domain need more support in formulating their queries. A user study was conducted to evaluate the effectiveness of a metadata-based query suggestion interface for PubMed bibliographic search. The study also investigated the impact of search task familiarity on search behaviors and on the effectiveness of the interface. A real-user, real-request, real-system approach was used: unlike traditional IR evaluation, where assigned tasks are used, the participants searched on requests of their own. Forty-four researchers in the health sciences participated in the evaluation; each conducted two search requests of their own, alternately with the proposed interface and the PubMed baseline. Several performance criteria were measured to assess the potential benefits of the experimental interface, including users' assessments of their original and eventual queries, the perceived usefulness of the interfaces, satisfaction with the search results, and the average relevance score of the saved records. The results show that, when searching for an unfamiliar topic, users were more likely to change their queries, indicating the effect of familiarity on search behaviors. The results also show that the interface scored higher on several of the performance criteria, such as the "goodness" of the queries, perceived usefulness, and user satisfaction. Furthermore, in line with our hypothesis, the proposed interface was relatively more effective when less familiar search requests were attempted. The results indicate that there is a selective compatibility between task familiarity and search interface. One implication of this research for system evaluation is the importance of taking task familiarity into consideration when assessing the effectiveness of interactive IR systems. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  12. What Does Anonymization Mean? DataSHIELD and the Need for Consensus on Anonymization Terminology.

    PubMed

    Wallace, Susan E

    2016-06-01

    Anonymization is a recognized process by which identifiers can be removed from identifiable data to protect an individual's confidentiality, and it is standard practice when sharing data in biomedical research. However, a plethora of terms, such as coding, pseudonymization, unlinked, and deidentified, have been and continue to be used, leading to confusion and uncertainty. This article shows that this is a historic problem and argues that such continuing uncertainty regarding the levels of protection given to data risks damaging initiatives designed to assist researchers conducting cross-national studies and sharing data internationally. DataSHIELD and the creation of a legal template are used as examples of initiatives that rely on anonymization but where inconsistency in terminology could hinder progress. More broadly, this article argues that there is a real possibility of damage to the public's trust in research, and in the institutions that carry it out, from relying on vague notions of the anonymization process. Research participants whose lack of clear understanding of the research process is compensated for by trusting those carrying out the research may have that trust damaged if the level of protection given to their data does not match their expectations. One step toward ensuring understanding between parties would be the consistent use of clearly defined terminology, applied internationally, so that all those involved are clear on the level of identifiability of any particular set of data and, therefore, on how those data can be accessed and shared.

  13. User centered and ontology based information retrieval system for life sciences.

    PubMed

    Sy, Mohameth-François; Ranwez, Sylvie; Montmain, Jacky; Regnault, Armelle; Crampes, Michel; Ranwez, Vincent

    2012-01-25

    Because of the increasing number of electronic resources, designing efficient tools to retrieve and exploit them is a major challenge. Some improvements have been offered by semantic Web technologies and applications based on domain ontologies. In the life sciences, for instance, the Gene Ontology is widely exploited in genomic applications, and the Medical Subject Headings are the basis of the biomedical publication indexing and information retrieval process proposed by PubMed. However, current search engines suffer from two main drawbacks: there is limited user interaction with the list of retrieved resources, and no explanation of their adequacy to the query is provided. Users may thus be confused by the selection and have no idea how to adapt their queries so that the results match their expectations. This paper describes an information retrieval system that relies on a domain ontology to widen the set of relevant documents retrieved and that uses a graphical rendering of query results to favor user interaction. Semantic proximities between ontology concepts and aggregating models are used to assess a document's adequacy with respect to a query. The selected documents are displayed on a semantic map that provides graphical indications of the extent to which they match the user's query; this man/machine interface favors a more interactive and iterative exploration of the data corpus by facilitating the weighting of query concepts and visual explanation. We illustrate the benefits of this information retrieval system on two case studies, one of which aims at collecting human genes related to transcription factors involved in the hemopoiesis pathway. The ontology-based information retrieval system described in this paper (OBIRS) is freely available at: http://www.ontotoolkit.mines-ales.fr/ObirsClient/. This environment is a first step towards a user-centred application in which the system highlights relevant information to provide decision support.
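    The idea of scoring a document by aggregating semantic proximities between query and document concepts can be sketched as follows (a minimal illustration with an invented similarity table; OBIRS derives its proximities from the ontology structure and supports richer aggregation models than the mean used here):

```python
def document_score(query_concepts, doc_concepts, sim):
    """Score a document's adequacy to a query: each query concept takes its
    best-matching document concept, and the matches are aggregated
    (here with an arithmetic mean)."""
    if not query_concepts:
        return 0.0
    best = []
    for q in query_concepts:
        # Identical concepts match perfectly; others fall back on the table.
        matches = [sim.get((q, d), 1.0 if q == d else 0.0) for d in doc_concepts]
        best.append(max(matches, default=0.0))
    return sum(best) / len(best)
```

    Because each query concept contributes its own best match, the score also indicates *which* concepts a document satisfies, which is what the semantic map makes visible to the user.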

  14. User centered and ontology based information retrieval system for life sciences

    PubMed Central

    2012-01-01

    Background Because of the increasing number of electronic resources, designing efficient tools to retrieve and exploit them is a major challenge. Some improvements have been offered by semantic Web technologies and applications based on domain ontologies. In the life sciences, for instance, the Gene Ontology is widely exploited in genomic applications, and the Medical Subject Headings are the basis of the biomedical publication indexing and information retrieval process proposed by PubMed. However, current search engines suffer from two main drawbacks: there is limited user interaction with the list of retrieved resources, and no explanation of their adequacy to the query is provided. Users may thus be confused by the selection and have no idea how to adapt their queries so that the results match their expectations. Results This paper describes an information retrieval system that relies on a domain ontology to widen the set of relevant documents retrieved and that uses a graphical rendering of query results to favor user interaction. Semantic proximities between ontology concepts and aggregating models are used to assess a document's adequacy with respect to a query. The selected documents are displayed on a semantic map that provides graphical indications of the extent to which they match the user's query; this man/machine interface favors a more interactive and iterative exploration of the data corpus by facilitating the weighting of query concepts and visual explanation. We illustrate the benefits of this information retrieval system on two case studies, one of which aims at collecting human genes related to transcription factors involved in the hemopoiesis pathway. Conclusions The ontology-based information retrieval system described in this paper (OBIRS) is freely available at: http://www.ontotoolkit.mines-ales.fr/ObirsClient/. This environment is a first step towards a user-centred application in which the system highlights relevant information to provide decision support. PMID:22373375

  15. Private and Efficient Query Processing on Outsourced Genomic Databases.

    PubMed

    Ghasemi, Reza; Al Aziz, Md Momin; Mohammed, Noman; Dehkordi, Massoud Hadian; Jiang, Xiaoqian

    2017-09-01

    Applications of genomic studies are spreading rapidly in many domains of science and technology, such as healthcare, biomedical research, direct-to-consumer services, and legal and forensic work. However, a number of obstacles make it hard to access and process a big genomic database for these applications. First, sequencing a genome is a time-consuming and expensive process. Second, processing genomic sequences requires large-scale computation and storage systems. Third, genomic databases are often owned by different organizations and thus are not available for public usage. The cloud computing paradigm can be leveraged to facilitate the creation and sharing of big genomic databases for these applications. Genomic data owners can outsource their databases to a centralized cloud server to ease access to their data. However, data owners are reluctant to adopt this model, as it requires outsourcing the data to an untrusted cloud service provider that may cause data breaches. In this paper, we propose a privacy-preserving model for outsourcing genomic data to a cloud. The proposed model enables query processing while providing privacy protection for genomic databases. Privacy of the individuals is guaranteed by permuting the database and adding fake genomic records. These techniques allow the cloud to evaluate count and top-k queries securely and efficiently. Experimental results demonstrate that a count and a top-k query over 40 Single Nucleotide Polymorphisms (SNPs) in a database of 20,000 records take around 100 and 150 s, respectively.
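    The permute-and-add-fakes idea behind the count queries can be sketched in miniature (a deliberately simplified illustration: in the actual scheme the real/fake distinction and the record contents are hidden from the cloud cryptographically, whereas here the flag is kept in the clear and all names are invented):

```python
import random

def outsource(records, n_fakes, rng):
    """Tag real records, append fake ones, then permute the whole database
    so that a record's position in the outsourced copy leaks nothing."""
    tagged = [(r, True) for r in records]
    fakes = [({"snp": rng.choice("ACGT")}, False) for _ in range(n_fakes)]
    db = tagged + fakes
    rng.shuffle(db)
    return db

def cloud_count(db, snp_value):
    """Cloud-side count over all records, fakes included."""
    return sum(1 for rec, _ in db if rec["snp"] == snp_value)

def corrected_count(db, snp_value):
    """Client-side correction: subtract the matches contributed by fakes,
    recovering the true count without the cloud ever learning it."""
    fake_matches = sum(1 for rec, real in db
                       if not real and rec["snp"] == snp_value)
    return cloud_count(db, snp_value) - fake_matches
```

    The cloud's raw count is inflated by however many fakes happen to match, so the cloud learns only a noisy value; the client, who knows which records are fake, removes that noise exactly.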

  16. Private and Efficient Query Processing on Outsourced Genomic Databases

    PubMed Central

    Ghasemi, Reza; Al Aziz, Momin; Mohammed, Noman; Dehkordi, Massoud Hadian; Jiang, Xiaoqian

    2017-01-01

    Applications of genomic studies are spreading rapidly in many domains of science and technology, such as healthcare, biomedical research, direct-to-consumer services, and legal and forensic services. However, a number of obstacles make it hard to access and process a big genomic database for these applications. First, sequencing a genome is a time-consuming and expensive process. Second, processing genomic sequences requires large-scale computation and storage systems. Third, genomic databases are often owned by different organizations and thus not available for public usage. The cloud computing paradigm can be leveraged to facilitate the creation and sharing of big genomic databases for these applications. Genomic data owners can outsource their databases to a centralized cloud server to ease access to their databases. However, data owners are reluctant to adopt this model, as it requires outsourcing the data to an untrusted cloud service provider that may cause data breaches. In this paper, we propose a privacy-preserving model for outsourcing genomic data to a cloud. The proposed model enables query processing while providing privacy protection for genomic databases. Privacy of the individuals is guaranteed by permuting the database and adding fake genomic records to it. These techniques allow the cloud to evaluate count and top-k queries securely and efficiently. Experimental results demonstrate that a count and a top-k query over 40 SNPs in a database of 20,000 records take around 100 and 150 seconds, respectively. PMID:27834660

  17. Biomedical Simulation: Evolution, Concepts, Challenges and Future Trends.

    PubMed

    Sá-Couto, Carla; Patrão, Luís; Maio-Matos, Francisco; Pêgo, José Miguel

    2016-12-30

    Biomedical simulation is an effective educational complement for healthcare training, at both the undergraduate and postgraduate levels. It enables knowledge, skills and attitudes to be acquired in a safe, educationally orientated and efficient manner. In this context, simulation provides skills and experience that facilitate the transfer of cognitive, psychomotor and communication competences, thus changing behavior and attitudes, and ultimately improving patient safety. Beyond the impact on individual and team performance, simulation provides an opportunity to study organizational failures and improve system performance. Over the last decades, simulation in healthcare has shown slow but steady growth, with visible maturation in the last ten years. The simulation community must continue to provide the core leadership in developing standards. There is a need for strategy and policy development to ensure its coordinated and cost-effective implementation, applied to patient safety. This paper reviews the evolutionary movements of biomedical simulation, including a review of the Portuguese initiatives and nationwide programs. To level knowledge and standardize terminology, basic but essential concepts in clinical simulation are presented, together with some considerations on assessment, validation and reliability. The final sections discuss the current challenges and future initiatives and strategies, crucial for the integration of simulation programs in the greater movement toward patient safety.

  18. Creating a classification of image types in the medical literature for visual categorization

    NASA Astrophysics Data System (ADS)

    Müller, Henning; Kalpathy-Cramer, Jayashree; Demner-Fushman, Dina; Antani, Sameer

    2012-02-01

    Content-based image retrieval (CBIR) from specialized collections has often been proposed for use in such areas as diagnostic aid, clinical decision support, and teaching. Visual retrieval from broad image collections such as teaching files, the medical literature or web images, by contrast, has not yet reached a high maturity level compared to textual information retrieval. Visual image classification into a relatively small number of classes (20-100), on the other hand, has been shown to deliver good results in several benchmarks. It is, however, currently underused as a basic technology for retrieval tasks, for example, to limit the search space. Most classification schemes for medical images are focused on specific areas and consider mainly the medical image types (modalities), imaged anatomy, and view, and merge them into a single descriptor or classification hierarchy. Furthermore, they often ignore other important image types such as biological images, statistical figures, flowcharts, and diagrams that frequently occur in the biomedical literature. Most of the current classifications have also been created for radiology images, which are not the only types to be taken into account. With Open Access becoming increasingly widespread, particularly in medicine, images from the biomedical literature are more easily available for use. Visual information from these images, and knowledge that an image is of a specific type or medical modality, could enrich retrieval. This enrichment is hampered by the lack of a commonly agreed image classification scheme. This paper presents a hierarchy for the classification of biomedical illustrations with the goal of using it for visual classification and thus as a basis for retrieval. The proposed hierarchy is based on relevant parts of existing terminologies, such as the IRMA code (Image Retrieval in Medical Applications), ad hoc classifications and hierarchies used in ImageCLEF (the image retrieval task at the Cross-Language Evaluation Forum) and NLM's (National Library of Medicine) OpenI. Furthermore, mappings to NLM's MeSH (Medical Subject Headings), RSNA's RadLex (Radiological Society of North America, Radiology Lexicon), and the IRMA code are also attempted for relevant image types. Advantages derived from such a hierarchical classification for medical image retrieval are being evaluated through benchmarks such as ImageCLEF, and R&D systems such as NLM's OpenI. The goal is to extend this hierarchy progressively and, through adding image types occurring in the biomedical literature, to arrive at a terminology for visual image classification based on image types distinguishable by visual means and occurring in the medical open access literature.

  19. NeuroNames: an ontology for the BrainInfo portal to neuroscience on the web.

    PubMed

    Bowden, Douglas M; Song, Evan; Kosheleva, Julia; Dubach, Mark F

    2012-01-01

    BrainInfo (http://braininfo.org) is a growing portal to neuroscientific information on the Web. It is indexed by NeuroNames, an ontology designed to compensate for ambiguities in neuroanatomical nomenclature. The 20-year-old ontology continues to evolve toward the ideal of recognizing all names of neuroanatomical entities and accommodating all structural concepts about which neuroscientists communicate, including multiple concepts of entities for which neuroanatomists have yet to determine the best or 'true' conceptualization. To make the definitions of structural concepts unambiguous and terminologically consistent, we created a 'default vocabulary' of unique structure names selected from existing terminology. We selected standard names by criteria designed to maximize practicality for use in verbal communication as well as computerized knowledge management. The ontology of NeuroNames accommodates synonyms and homonyms of the standard terms in many languages. It defines complex structures as models composed of primary structures, which are defined in unambiguous operational terms. NeuroNames currently relates more than 16,000 names in eight languages to some 2,500 neuroanatomical concepts. The ontology is maintained in a relational database with three core tables: Names, Concepts and Models. BrainInfo uses NeuroNames to index information by structure, to interpret users' queries and to clarify terminology on remote web pages. NeuroNames is a resource vocabulary of the NLM's Unified Medical Language System (UMLS, 2011) and the basis for the brain regions component of NIFSTD (NeuroLex, 2011). The current version has been downloaded to hundreds of laboratories for indexing data and linking to BrainInfo, which attracts some 400 visitors/day, downloading 2,000 pages/day.
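The record above says the ontology lives in a relational database with three core tables: Names, Concepts and Models. A minimal sketch of how such a schema can resolve a synonym in any language to the standard default-vocabulary term is shown below; the column names and the sample rows are invented for illustration, not taken from NeuroNames itself.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Three core tables, mirroring the Names/Concepts/Models split described above.
db.executescript("""
CREATE TABLE concepts (concept_id INTEGER PRIMARY KEY, standard_name TEXT);
CREATE TABLE names    (name TEXT, language TEXT, concept_id INTEGER REFERENCES concepts);
CREATE TABLE models   (complex_id INTEGER REFERENCES concepts,
                       part_id    INTEGER REFERENCES concepts);
""")
db.execute("INSERT INTO concepts VALUES (1, 'hippocampal formation'), (2, 'dentate gyrus')")
db.executemany("INSERT INTO names VALUES (?, ?, ?)", [
    ("hippocampal formation", "en", 1),
    ("formation hippocampique", "fr", 1),
    ("dentate gyrus", "en", 2),
])
# A complex structure is modeled as a composition of primary structures:
db.execute("INSERT INTO models VALUES (1, 2)")

def standard_term(name):
    """Map any synonym, in any language, to the default-vocabulary term."""
    row = db.execute("""SELECT c.standard_name
                        FROM names n JOIN concepts c USING (concept_id)
                        WHERE n.name = ?""", (name,)).fetchone()
    return row[0] if row else None
```

Keeping names and concepts in separate tables is what lets one concept carry 16,000+ names across eight languages while queries always land on a single standard term.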

  20. Construction of an annotated corpus to support biomedical information extraction

    PubMed Central

    Thompson, Paul; Iqbal, Syed A; McNaught, John; Ananiadou, Sophia

    2009-01-01

    Background Information Extraction (IE) is a component of text mining that facilitates knowledge discovery by automatically locating instances of interesting biomedical events from huge document collections. As events are usually centred on verbs and nominalised verbs, understanding the syntactic and semantic behaviour of these words is highly important. Corpora annotated with information concerning this behaviour can constitute a valuable resource in the training of IE components and resources. Results We have defined a new scheme for annotating sentence-bound gene regulation events, centred on both verbs and nominalised verbs. For each event instance, all participants (arguments) in the same sentence are identified and assigned a semantic role from a rich set of 13 roles tailored to biomedical research articles, together with a biological concept type linked to the Gene Regulation Ontology. To our knowledge, our scheme is unique within the biomedical field in terms of the range of event arguments identified. Using the scheme, we have created the Gene Regulation Event Corpus (GREC), consisting of 240 MEDLINE abstracts, in which events relating to gene regulation and expression have been annotated by biologists. A novel method of evaluating various different facets of the annotation task showed that average inter-annotator agreement rates fall within the range of 66% - 90%. Conclusion The GREC is a unique resource within the biomedical field, in that it annotates not only core relationships between entities, but also a range of other important details about these relationships, e.g., location, temporal, manner and environmental conditions. As such, it is specifically designed to support bio-specific tool and resource development. It has already been used to acquire semantic frames for inclusion within the BioLexicon (a lexical, terminological resource to aid biomedical text mining). 
Initial experiments have also shown that the corpus may viably be used to train IE components, such as semantic role labellers. The corpus and annotation guidelines are freely available for academic purposes. PMID:19852798

  1. Lignocellulosic Biomass Derived Functional Materials: Synthesis and Applications in Biomedical Engineering.

    PubMed

    Zhang, Lei; Peng, Xinwen; Zhong, Linxin; Chua, Weitian; Xiang, Zhihua; Sun, Runcang

    2017-09-18

    The resource shortages arising from global climate change in recent years have accentuated the importance of environmentally friendly materials. Beyond the merits of cellulose, the most abundant natural polysaccharide on earth, the incorporation of lignocellulosic biomass has the potential to add value to the recent development of cellulose derivatives in drug delivery systems. Lignocellulosic biomass has a hierarchical structure comprising cellulose, hemicellulose and lignin. As an excellent substrate that is renewable, biodegradable, biocompatible and chemically accessible for modification, lignocellulosic biomass sets forth a myriad of applications. To date, materials derived from lignocellulosic biomass have been extensively explored for new technological developments and applications, such as biomedical, green electronics and energy products. In this review, the chemical constituents of lignocellulosic biomass are first discussed before we critically examine its potential in the field of biomedical application. In addition, the pretreatment methods for extracting cellulose, hemicellulose and lignin from lignocellulosic biomass, as well as their biological applications including drug delivery, biosensors and tissue engineering, are reviewed. It is anticipated that there will be increasing interest and research findings in cellulose, hemicellulose and lignin from natural resources, which will help provide important directions for development in biomedical applications. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  2. KNODWAT: A scientific framework application for testing knowledge discovery methods for the biomedical domain

    PubMed Central

    2013-01-01

    Background Professionals in the biomedical domain are confronted with an increasing mass of data. Developing methods to assist professional end users in the field of Knowledge Discovery to identify, extract, visualize and understand useful information from these huge amounts of data is a major challenge. However, there are so many diverse methods and methodologies available that, for biomedical researchers who are inexperienced in the use of even relatively popular knowledge discovery methods, it can be very difficult to select the most appropriate method for their particular research problem. Results A web application, called KNODWAT (KNOwledge Discovery With Advanced Techniques), has been developed using Java on the Spring Framework 3.1 and following a user-centered approach. The software runs on Java 1.6 and above and requires a web server such as Apache Tomcat and a database server such as MySQL Server. For frontend functionality and styling, Twitter Bootstrap was used, as well as jQuery for interactive user interface operations. Conclusions The framework presented is user-centric, highly extensible and flexible. Since it enables methods to be tested on existing data to assess suitability and performance, it is especially suitable for inexperienced biomedical researchers who are new to the field of knowledge discovery and data mining. For testing purposes two algorithms, CART and C4.5, were implemented using the WEKA data mining framework. PMID:23763826

  3. KNODWAT: a scientific framework application for testing knowledge discovery methods for the biomedical domain.

    PubMed

    Holzinger, Andreas; Zupan, Mario

    2013-06-13

    Professionals in the biomedical domain are confronted with an increasing mass of data. Developing methods to assist professional end users in the field of Knowledge Discovery to identify, extract, visualize and understand useful information from these huge amounts of data is a major challenge. However, there are so many diverse methods and methodologies available that, for biomedical researchers who are inexperienced in the use of even relatively popular knowledge discovery methods, it can be very difficult to select the most appropriate method for their particular research problem. A web application, called KNODWAT (KNOwledge Discovery With Advanced Techniques), has been developed using Java on the Spring Framework 3.1 and following a user-centered approach. The software runs on Java 1.6 and above and requires a web server such as Apache Tomcat and a database server such as MySQL Server. For frontend functionality and styling, Twitter Bootstrap was used, as well as jQuery for interactive user interface operations. The framework presented is user-centric, highly extensible and flexible. Since it enables methods to be tested on existing data to assess suitability and performance, it is especially suitable for inexperienced biomedical researchers who are new to the field of knowledge discovery and data mining. For testing purposes two algorithms, CART and C4.5, were implemented using the WEKA data mining framework.
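Both KNODWAT records name CART and C4.5 as the algorithms implemented for testing. The heart of CART's node-splitting step, an exhaustive search for the (feature, threshold) pair that minimises weighted Gini impurity, fits in a few lines. This is a from-scratch illustration of the technique, not KNODWAT's WEKA-based code:

```python
def gini(labels):
    """Gini impurity of a list of class labels: 0 means pure."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, labels):
    """Exhaustively search (feature, threshold) minimising the weighted
    Gini impurity of the two resulting partitions, as CART does per node."""
    best = (None, None, float("inf"))
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # degenerate split, skip
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best[2]:
                best = (f, t, score)
    return best
```

Applying `best_split` recursively to each resulting partition, until nodes are pure or too small, yields the full CART tree; C4.5 differs mainly in using information gain ratio instead of Gini and in its pruning strategy.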

  4. Securely and Flexibly Sharing a Biomedical Data Management System

    PubMed Central

    Wang, Fusheng; Hussels, Phillip; Liu, Peiya

    2011-01-01

    Biomedical database systems need not only to address the issues of managing complex data, but also to provide data security and access control. These include not only system-level security, but also instance-level access control, such as access to documents, schemas, or aggregations of information. The latter is becoming more important as multiple users can share a single scientific data management system to conduct their research, while data have to be protected before they are published or IP-protected. This problem is challenging, as users’ needs for data security vary dramatically from one application to another, in terms of whom to share with, what resources are to be shared, and at what access level. We develop a comprehensive data access framework for the biomedical data management system SciPort. SciPort provides fine-grained, multi-level, space-based access control of resources, not only at the object level (documents and schemas) but also at the space level (resource sets aggregated in a hierarchical way). Furthermore, to simplify the management of users and privileges, a customizable role-based user model is developed. The access control is implemented efficiently by integrating access privileges into the backend XML database, so that efficient queries are supported. The secure access approach we take makes it possible for multiple users to share the same biomedical data management system with flexible access management and high data security. PMID:21625285
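The key property of space-based access control as described above is that spaces form a hierarchy and a grant on a space covers everything beneath it, with roles bundling grants for easier administration. A minimal sketch of that check (role names, space paths, and actions are all hypothetical examples, not SciPort's actual model):

```python
# role -> {space-path prefix: allowed actions}; all names are illustrative.
ROLE_GRANTS = {
    "pi":     {("lab",): {"read", "write", "share"}},
    "member": {("lab", "projectA"): {"read", "write"}},
    "guest":  {("lab", "projectA", "published"): {"read"}},
}

def authorized(role, action, space):
    """A grant on a space applies to that space and every descendant space,
    so we test whether any granted path is a prefix of the target path."""
    for prefix, actions in ROLE_GRANTS.get(role, {}).items():
        if space[:len(prefix)] == prefix and action in actions:
            return True
    return False
```

Storing grants per space rather than per document keeps the policy table small: one row on `("lab",)` covers every project and document the lab ever creates.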

  5. Phase 2 of CATALISE: a multinational and multidisciplinary Delphi consensus study of problems with language development: Terminology.

    PubMed

    Bishop, Dorothy V M; Snowling, Margaret J; Thompson, Paul A; Greenhalgh, Trisha

    2017-10-01

    Lack of agreement about criteria and terminology for children's language problems affects access to services as well as hindering research and practice. We report the second phase of a study using an online Delphi method to address these issues. In the first phase, we focused on criteria for language disorder. Here we consider terminology. The Delphi method is an iterative process in which an initial set of statements is rated by a panel of experts, who then have the opportunity to view anonymised ratings from other panel members. On this basis they can either revise their views or make a case for their position. The statements are then revised based on panel feedback, and again rated by and commented on by the panel. In this study, feedback from a second round was used to prepare a final set of statements in narrative form. The panel included 57 individuals representing a range of professions and nationalities. We achieved at least 78% agreement for 19 of 21 statements within two rounds of ratings. These were collapsed into 12 statements for the final consensus reported here. The term 'Language Disorder' is recommended to refer to a profile of difficulties that causes functional impairment in everyday life and is associated with poor prognosis. The term, 'Developmental Language Disorder' (DLD) was endorsed for use when the language disorder was not associated with a known biomedical aetiology. It was also agreed that (a) presence of risk factors (neurobiological or environmental) does not preclude a diagnosis of DLD, (b) DLD can co-occur with other neurodevelopmental disorders (e.g. ADHD) and (c) DLD does not require a mismatch between verbal and nonverbal ability. This Delphi exercise highlights reasons for disagreements about terminology for language disorders and proposes standard definitions and nomenclature. © 2017 The Authors. 
Journal of Child Psychology and Psychiatry published by John Wiley & Sons Ltd on behalf of Association for Child and Adolescent Mental Health.

  6. The Protein Disease Database of human body fluids: II. Computer methods and data issues.

    PubMed

    Lemkin, P F; Orr, G A; Goldstein, M P; Creed, G J; Myrick, J E; Merril, C R

    1995-01-01

    The Protein Disease Database (PDD) is a relational database of proteins and diseases. With this database it is possible to screen for quantitative protein abnormalities associated with disease states. These quantitative relationships use data drawn from the peer-reviewed biomedical literature. Assays may also include those observed in high-resolution electrophoretic gels that offer the potential to quantitate many proteins in a single test as well as data gathered by enzymatic or immunologic assays. We are using the Internet World Wide Web (WWW) and the Web browser paradigm as an access method for wide distribution and querying of the Protein Disease Database. The WWW hypertext transfer protocol and its Common Gateway Interface make it possible to build powerful graphical user interfaces that can support easy-to-use data retrieval using query specification forms or images. The details of these interactions are totally transparent to the users of these forms. Using a client-server SQL relational database, user query access, initial data entry and database maintenance are all performed over the Internet with a Web browser. We discuss the underlying design issues, mapping mechanisms and assumptions that we used in constructing the system, data entry, access to the database server, security, and synthesis of derived two-dimensional gel image maps and hypertext documents resulting from SQL database searches.
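The PDD record above describes Web query forms feeding an SQL relational database of quantitative protein-disease relationships. A minimal sketch of the kind of screening query such a form might back is shown below; the schema, column names, and sample rows are invented for illustration. The parameter binding (`?`) is also the safe way to pass form fields from a CGI-style interface into SQL:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical two-table layout: proteins, plus per-disease quantitative findings.
db.executescript("""
CREATE TABLE proteins (protein_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE findings (protein_id INTEGER REFERENCES proteins,
                       disease TEXT, direction TEXT, assay TEXT, pmid TEXT);
""")
db.execute("INSERT INTO proteins VALUES (1, 'transferrin'), (2, 'haptoglobin')")
db.executemany("INSERT INTO findings VALUES (?, ?, ?, ?, ?)", [
    (1, "anemia", "decreased", "2D gel", "0000001"),
    (2, "inflammation", "increased", "immunoassay", "0000002"),
])

def screen(disease):
    """Back a query form: which proteins show abnormalities in this disease?"""
    return db.execute("""
        SELECT p.name, f.direction, f.assay
        FROM findings f JOIN proteins p ON p.protein_id = f.protein_id
        WHERE f.disease = ?""", (disease,)).fetchall()
```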

  7. Improving information retrieval using Medical Subject Headings Concepts: a test case on rare and chronic diseases.

    PubMed

    Darmoni, Stéfan J; Soualmia, Lina F; Letord, Catherine; Jaulent, Marie-Christine; Griffon, Nicolas; Thirion, Benoît; Névéol, Aurélie

    2012-07-01

    As more scientific work is published, it is important to improve access to the biomedical literature. Since 2000, when Medical Subject Headings (MeSH) Concepts were introduced, the MeSH Thesaurus has been concept-based. Nevertheless, information retrieval is still performed at the MeSH Descriptor or Supplementary Concept level. This study assesses the benefit of using MeSH Concepts for indexing and information retrieval. Three sets of queries were built for thirty-two rare diseases and twenty-two chronic diseases: (1) using PubMed Automatic Term Mapping (ATM), (2) using Catalog and Index of French-language Health Internet (CISMeF) ATM, and (3) extrapolating the MEDLINE citations that should be indexed with a MeSH Concept. Type 3 queries retrieve significantly fewer results than type 1 or type 2 queries (about 18,000 citations versus 200,000 for rare diseases; about 300,000 citations versus 2,000,000 for chronic diseases). CISMeF ATM also provides better precision than PubMed ATM for both disease categories. Using MeSH Concept indexing instead of ATM could theoretically improve retrieval performance under the current indexing policy. However, using MeSH Concept-based information retrieval and indexing rules would be a fundamentally better approach. These modifications have already been implemented in the CISMeF search engine.

  8. BioEve Search: A Novel Framework to Facilitate Interactive Literature Search

    PubMed Central

    Ahmed, Syed Toufeeq; Davulcu, Hasan; Tikves, Sukru; Nair, Radhika; Zhao, Zhongming

    2012-01-01

    Background. Recent advances in computational and biological methods over the last two decades have remarkably changed the scale of biomedical research, and with them began unprecedented growth in both the production of biomedical data and the amount of published literature discussing it. An automated extraction system coupled with a cognitive search and navigation service over these document collections would not only save time and effort, but also pave the way to discovering hitherto unknown information implicitly conveyed in the texts. Results. We developed a novel framework (named “BioEve”) that seamlessly integrates Faceted Search (Information Retrieval) with an Information Extraction module to provide an interactive search experience for researchers in the life sciences. It enables guided, step-by-step search query refinement by suggesting concepts and entities (like genes, drugs, and diseases) to quickly filter and modify the search direction, thereby facilitating an enriched paradigm in which the user can discover related concepts and keywords while information seeking. Conclusions. The BioEve Search framework makes it easier to enable scalable interactive search over a large collection of textual articles and to discover knowledge hidden in thousands of biomedical literature articles with ease. PMID:22693501

  9. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Crain, Steven P.; Yang, Shuang-Hong; Zha, Hongyuan

    Access to health information by consumers is hampered by a fundamental language gap. Current attempts to close the gap leverage consumer-oriented health information, which does not, however, have good coverage of slang medical terminology. In this paper, we present a Bayesian model to automatically align documents with different dialects (slang, common and technical) while extracting their semantic topics. The proposed diaTM model enables effective information retrieval, even when the query contains slang words, by explicitly modeling the mixtures of dialects in documents and the joint influence of dialects and topics on word selection. Simulations using consumer questions to retrieve medical information from a corpus of medical documents show that diaTM achieves a 25% improvement in information retrieval relevance by nDCG@5 over an LDA baseline.
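The nDCG@5 figure quoted in this record is a standard ranking metric: the discounted cumulative gain of the first five results, normalised by the gain of the ideal ordering. A minimal implementation with graded relevance and the usual log2 discount:

```python
import math

def dcg(relevances, k):
    """Discounted cumulative gain over the top-k ranked relevances."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg(relevances, k=5):
    """nDCG@k: DCG of the actual ranking divided by DCG of the ideal ranking."""
    best = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / best if best > 0 else 0.0
```

A perfect ranking scores 1.0, so "a 25% improvement in nDCG@5 over an LDA baseline" means the diaTM-ranked lists place relevant documents markedly closer to the top of the first five results.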

  10. The health care and life sciences community profile for dataset descriptions

    PubMed Central

    Alexiev, Vladimir; Ansell, Peter; Bader, Gary; Baran, Joachim; Bolleman, Jerven T.; Callahan, Alison; Cruz-Toledo, José; Gaudet, Pascale; Gombocz, Erich A.; Gonzalez-Beltran, Alejandra N.; Groth, Paul; Haendel, Melissa; Ito, Maori; Jupp, Simon; Juty, Nick; Katayama, Toshiaki; Kobayashi, Norio; Krishnaswami, Kalpana; Laibe, Camille; Le Novère, Nicolas; Lin, Simon; Malone, James; Miller, Michael; Mungall, Christopher J.; Rietveld, Laurens; Wimalaratne, Sarala M.; Yamaguchi, Atsuko

    2016-01-01

    Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets. PMID:27602295
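The guideline described above specifies RDF vocabularies for common dataset metadata elements. A small Turtle fragment in that spirit is sketched below; the dataset URI and property values are invented for illustration, and the exact property set required by the HCLS profile is richer than shown here:

```turtle
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix pav:  <http://purl.org/pav/> .

<http://example.org/dataset/snp-catalog> a dcat:Dataset ;
    dct:title "Example SNP catalog"@en ;
    dct:description "Illustrative dataset description."@en ;
    dct:publisher <http://example.org/org/lab42> ;
    dct:license <http://creativecommons.org/licenses/by/4.0/> ;
    pav:version "1.0" .
```

Reusing established vocabularies (Dublin Core terms, DCAT, PAV) rather than minting new properties is what makes such descriptions uniformly indexable and queryable across repositories.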

  11. Besides Precision & Recall: Exploring Alternative Approaches to Evaluating an Automatic Indexing Tool for MEDLINE

    PubMed Central

    Névéol, Aurélie; Zeng, Kelly; Bodenreider, Olivier

    2006-01-01

    Objective This paper explores alternative approaches for the evaluation of an automatic indexing tool for MEDLINE, complementing the traditional precision and recall method. Materials and methods The performance of MTI, the Medical Text Indexer used at NLM to produce MeSH recommendations for biomedical journal articles is evaluated on a random set of MEDLINE citations. The evaluation examines semantic similarity at the term level (indexing terms). In addition, the documents retrieved by queries resulting from MTI index terms for a given document are compared to the PubMed related citations for this document. Results Semantic similarity scores between sets of index terms are higher than the corresponding Dice similarity scores. Overall, 75% of the original documents and 58% of the top ten related citations are retrieved by queries based on the automatic indexing. Conclusions The alternative measures studied in this paper confirm previous findings and may be used to select particular documents from the test set for a more thorough analysis. PMID:17238409
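The comparison above between semantic similarity scores and Dice similarity scores rests on a simple set-overlap measure: Dice gives credit only for exact term matches, which is why semantic scores (which also reward near-synonymous term pairs) come out higher. The Dice coefficient over two sets of indexing terms:

```python
def dice(a, b):
    """Dice coefficient between two sets of indexing terms (exact match only)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty term sets are trivially identical
    return 2 * len(a & b) / (len(a) + len(b))
```

For example, {"Humans", "Neoplasms"} versus {"Humans", "Breast Neoplasms"} scores 0.5, even though "Neoplasms" and "Breast Neoplasms" are closely related MeSH headings; a semantic measure would score this pair higher.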

  12. Besides precision & recall: exploring alternative approaches to evaluating an automatic indexing tool for MEDLINE.

    PubMed

    Neveol, Aurélie; Zeng, Kelly; Bodenreider, Olivier

    2006-01-01

    This paper explores alternative approaches for the evaluation of an automatic indexing tool for MEDLINE, complementing the traditional precision and recall method. The performance of MTI, the Medical Text Indexer used at NLM to produce MeSH recommendations for biomedical journal articles is evaluated on a random set of MEDLINE citations. The evaluation examines semantic similarity at the term level (indexing terms). In addition, the documents retrieved by queries resulting from MTI index terms for a given document are compared to the PubMed related citations for this document. Semantic similarity scores between sets of index terms are higher than the corresponding Dice similarity scores. Overall, 75% of the original documents and 58% of the top ten related citations are retrieved by queries based on the automatic indexing. The alternative measures studied in this paper confirm previous findings and may be used to select particular documents from the test set for a more thorough analysis.

  13. The UCSC Genome Browser: What Every Molecular Biologist Should Know

    PubMed Central

    Mangan, Mary E.; Williams, Jennifer M.; Kuhn, Robert M.; Lathe, Warren C.

    2014-01-01

    Electronic data resources can enable molecular biologists to quickly get information from around the world that a decade ago would have been buried in papers scattered throughout the library. The ability to access, query, and display these data makes benchwork much more efficient and drives new discoveries. Increasingly, mastery of software resources and corresponding data repositories is required to fully explore the volume of data generated in biomedical and agricultural research, because only small amounts of data are actually found in traditional publications. The UCSC Genome Browser provides a wealth of data and tools that advance understanding of genomic context for many species, enable detailed analysis of data, and provide the ability to interrogate regions of interest across disparate data sets from a wide variety of sources. Researchers can also supplement the standard display with their own data to query and share this with others. Effective use of these resources has become crucial to biological research today, and this unit describes some practical applications of the UCSC Genome Browser. PMID:24984850

  14. Clinical decision rules, spinal pain classification and prediction of treatment outcome: A discussion of recent reports in the rehabilitation literature

    PubMed Central

    2012-01-01

    Clinical decision rules are an increasingly common presence in the biomedical literature and represent one strategy of enhancing clinical-decision making with the goal of improving the efficiency and effectiveness of healthcare delivery. In the context of rehabilitation research, clinical decision rules have been predominantly aimed at classifying patients by predicting their treatment response to specific therapies. Traditionally, recommendations for developing clinical decision rules propose a multistep process (derivation, validation, impact analysis) using defined methodology. Research efforts aimed at developing a “diagnosis-based clinical decision rule” have departed from this convention. Recent publications in this line of research have used the modified terminology “diagnosis-based clinical decision guide.” Modifications to terminology and methodology surrounding clinical decision rules can make it more difficult for clinicians to recognize the level of evidence associated with a decision rule and understand how this evidence should be implemented to inform patient care. We provide a brief overview of clinical decision rule development in the context of the rehabilitation literature and two specific papers recently published in Chiropractic and Manual Therapies. PMID:22726639

  15. From Ambiguities to Insights: Query-based Comparisons of High-Dimensional Data

    NASA Astrophysics Data System (ADS)

    Kowalski, Jeanne; Talbot, Conover; Tsai, Hua L.; Prasad, Nijaguna; Umbricht, Christopher; Zeiger, Martha A.

    2007-11-01

    Genomic technologies will revolutionize drug discovery and development; that much is universally agreed upon. The high dimension of data from such technologies has challenged available data analytic methods; that much is apparent. To date, large-scale data repositories have not been utilized in ways that permit their wealth of information to be efficiently processed for knowledge, presumably due in large part to inadequate analytical tools to address numerous comparisons of high-dimensional data. In candidate gene discovery, expression comparisons are often made between two features (e.g., cancerous versus normal), such that the enumeration of outcomes is manageable. With multiple features, the setting becomes more complex, in terms of comparing expression levels of tens of thousands of transcripts across hundreds of features. In this case, the number of outcomes, while enumerable, becomes rapidly large and unmanageable, and scientific inquiries become more abstract, such as "which one of these (compounds, stimuli, etc.) is not like the others?" We develop analytical tools that promote more extensive, efficient, and rigorous utilization of the public data resources generated by the massive support of genomic studies. Our work innovates by enabling access to such metadata with logically formulated scientific inquiries that define, compare and integrate query-comparison pair relations for analysis. We demonstrate our computational tool's potential to address an outstanding biomedical informatics issue of identifying reliable molecular markers in thyroid cancer. Our proposed query-based comparison (QBC) facilitates access to and efficient utilization of metadata through logically formed inquiries expressed as query-based comparisons, organizing and comparing results from biotechnologies to address applications in biomedicine.
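    The abstract's guiding question, "which one of these is not like the others?", can be made concrete as a small query over an expression matrix. The sketch below is purely illustrative (the feature names and data are invented, and this is not the paper's QBC implementation): it flags the feature whose expression profile is least correlated with the rest.

```python
import numpy as np

def odd_one_out(expr, feature_names):
    """Identify the feature (column) least correlated with the others.

    expr: (transcripts x features) expression matrix.
    Returns the name of the feature whose mean correlation with the
    remaining features is lowest -- a toy form of the query
    "which one of these is not like the others?".
    """
    corr = np.corrcoef(expr, rowvar=False)   # feature-by-feature correlations
    np.fill_diagonal(corr, np.nan)           # ignore self-correlation
    mean_corr = np.nanmean(corr, axis=1)     # average similarity to the rest
    return feature_names[int(np.argmin(mean_corr))]

# Three concordant expression profiles plus one dissimilar one (synthetic data).
rng = np.random.default_rng(0)
base = rng.normal(size=200)
expr = np.column_stack([base + rng.normal(scale=0.1, size=200) for _ in range(3)]
                       + [rng.normal(size=200)])
print(odd_one_out(expr, ["stim_A", "stim_B", "stim_C", "stim_D"]))  # -> stim_D
```

    With hundreds of features, the same pairwise-comparison idea scales naturally, which is exactly where enumerating outcomes by hand becomes unmanageable.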

  16. ARIANE: integration of information databases within a hospital intranet.

    PubMed

    Joubert, M; Aymard, S; Fieschi, D; Volot, F; Staccini, P; Robert, J J; Fieschi, M

    1998-05-01

    Large information systems handle massive volumes of data stored in heterogeneous sources. Each server has its own model of representation of concepts with regard to its aims. One of the main problems end-users encounter when accessing different servers is to match their own viewpoint on biomedical concepts with the various representations that are made in the database servers. The aim of the ARIANE project is to provide end-users with easy-to-use and natural means to access and query heterogeneous information databases. The objectives of this research work consist of building a conceptual interface by means of Internet technology inside an enterprise Intranet and of proposing a method to realize it. This method is based on the knowledge sources provided by the Unified Medical Language System (UMLS) project of the US National Library of Medicine. Experiments concern queries to three different information servers: PubMed, a Medline server of the NLM; Thériaque, a French database on drugs implemented in the Hospital Intranet; and a Web site dedicated to Internet resources in gastroenterology and nutrition, located at the Faculty of Medicine of Nice (France). Access to each of these servers differs according to the kind of information delivered and the technology used to query it. With the health care professional workstation in mind, the authors introduced quality criteria into the ARIANE project in order to build, in a homogeneous and efficient way, a query system that can be integrated into existing information systems and that can integrate existing and new information sources.

  17. The proportion of cancer-related entries in PubMed has increased considerably; is cancer truly "The Emperor of All Maladies"?

    PubMed

    Reyes-Aldasoro, Constantino Carlos

    2017-01-01

    In this work, the public database of biomedical literature PubMed was mined using queries with combinations of keywords and year restrictions. It was found that the proportion of Cancer-related entries per year in PubMed has risen from around 6% in 1950 to more than 16% in 2016. This increase is not shared by other conditions such as AIDS, Malaria, Tuberculosis, Diabetes, Cardiovascular, Stroke and Infection, some of which have, on the contrary, decreased as a proportion of the total entries per year. Organ-related queries were performed to analyse the variation of some specific cancers. A series of queries related to incidence, funding, and relationship with DNA, Computing and Mathematics, were performed to test correlation between the keywords, with the hope of elucidating the cause behind the rise of Cancer in PubMed. Interestingly, the proportion of Cancer-related entries that contain "DNA", "Computational" or "Mathematical" has increased, which suggests that the impact of these scientific advances on Cancer has been stronger than in other conditions. It is important to highlight that the results obtained with the data mining approach here presented are limited to the presence or absence of the keywords on a single, yet extensive, database. Therefore, results should be observed with caution. All the data used for this work is publicly available through PubMed and the UK's Office for National Statistics. All queries and figures were generated with the software platform Matlab and the files are available as supplementary material.
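    The keyword-plus-year queries described above have a simple mechanical form. The sketch below (in Python rather than the study's Matlab) shows how such query strings and per-year proportions can be built; the counts shown are placeholders, not real PubMed totals, and in practice each count would come from an NCBI E-utilities ESearch call.

```python
def build_query(keyword, year):
    """PubMed query restricting a keyword to a publication year,
    e.g. 'cancer AND 2016[PDAT]' ([PDAT] is PubMed's publication-date field tag)."""
    return f"{keyword} AND {year}[PDAT]"

def proportion(keyword_count, total_count):
    """Share of a year's PubMed entries matching the keyword, as a percentage."""
    return 100.0 * keyword_count / total_count

print(build_query("cancer", 2016))  # -> cancer AND 2016[PDAT]

# Illustrative only: these counts are invented placeholders. In practice each
# would be the Count field returned by an E-utilities ESearch request, e.g.
# Bio.Entrez.esearch(db="pubmed", term=build_query("cancer", 2016), retmax=0).
print(proportion(keyword_count=16, total_count=100))  # -> 16.0 (percent)
```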

  18. The proportion of cancer-related entries in PubMed has increased considerably; is cancer truly “The Emperor of All Maladies”?

    PubMed Central

    2017-01-01

    In this work, the public database of biomedical literature PubMed was mined using queries with combinations of keywords and year restrictions. It was found that the proportion of Cancer-related entries per year in PubMed has risen from around 6% in 1950 to more than 16% in 2016. This increase is not shared by other conditions such as AIDS, Malaria, Tuberculosis, Diabetes, Cardiovascular, Stroke and Infection, some of which have, on the contrary, decreased as a proportion of the total entries per year. Organ-related queries were performed to analyse the variation of some specific cancers. A series of queries related to incidence, funding, and relationship with DNA, Computing and Mathematics, were performed to test correlation between the keywords, with the hope of elucidating the cause behind the rise of Cancer in PubMed. Interestingly, the proportion of Cancer-related entries that contain “DNA”, “Computational” or “Mathematical” has increased, which suggests that the impact of these scientific advances on Cancer has been stronger than in other conditions. It is important to highlight that the results obtained with the data mining approach here presented are limited to the presence or absence of the keywords on a single, yet extensive, database. Therefore, results should be observed with caution. All the data used for this work is publicly available through PubMed and the UK’s Office for National Statistics. All queries and figures were generated with the software platform Matlab and the files are available as supplementary material. PMID:28282418

  19. Ontology-Based Approach to Social Data Sentiment Analysis: Detection of Adolescent Depression Signals.

    PubMed

    Jung, Hyesil; Park, Hyeoun-Ae; Song, Tae-Min

    2017-07-24

    Social networking services (SNSs) contain abundant information about the feelings, thoughts, interests, and patterns of behavior of adolescents that can be obtained by analyzing SNS postings. An ontology that expresses the shared concepts and their relationships in a specific field could be used as a semantic framework for social media data analytics. The aim of this study was to refine an adolescent depression ontology and terminology as a framework for analyzing social media data and to evaluate description logics between classes and the applicability of this ontology to sentiment analysis. The domain and scope of the ontology were defined using competency questions. The concepts constituting the ontology and terminology were collected from clinical practice guidelines, the literature, and social media postings on adolescent depression. Class concepts, their hierarchy, and the relationships among class concepts were defined. An internal structure of the ontology was designed using the entity-attribute-value (EAV) triplet data model, and superclasses of the ontology were aligned with the upper ontology. Description logics between classes were evaluated by mapping concepts extracted from the answers to frequently asked questions (FAQs) onto the ontology concepts derived from description logic queries. The applicability of the ontology was validated by examining the representability of 1358 sentiment phrases using the ontology EAV model and conducting sentiment analyses of social media data using ontology class concepts. We developed an adolescent depression ontology that comprised 443 classes and 60 relationships among the classes; the terminology comprised 1682 synonyms of the 443 classes. In the description logics test, no error in relationships between classes was found, and about 89% (55/62) of the concepts cited in the answers to FAQs mapped onto the ontology class. 
Regarding applicability, the EAV triplet models of the ontology class represented about 91.4% of the sentiment phrases included in the sentiment dictionary. In the sentiment analyses, "academic stresses" and "suicide" contributed negatively to the sentiment of adolescent depression. The ontology and terminology developed in this study provide a semantic foundation for analyzing social media data on adolescent depression. To be useful in social media data analysis, the ontology, especially the terminology, needs to be updated constantly to reflect rapidly changing terms used by adolescents in social media postings. In addition, more attributes and value sets reflecting depression-related sentiments should be added to the ontology. ©Hyesil Jung, Hyeoun-Ae Park, Tae-Min Song. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 24.07.2017.

  20. Ontology-Based Approach to Social Data Sentiment Analysis: Detection of Adolescent Depression Signals

    PubMed Central

    Jung, Hyesil; Song, Tae-Min

    2017-01-01

    Background Social networking services (SNSs) contain abundant information about the feelings, thoughts, interests, and patterns of behavior of adolescents that can be obtained by analyzing SNS postings. An ontology that expresses the shared concepts and their relationships in a specific field could be used as a semantic framework for social media data analytics. Objective The aim of this study was to refine an adolescent depression ontology and terminology as a framework for analyzing social media data and to evaluate description logics between classes and the applicability of this ontology to sentiment analysis. Methods The domain and scope of the ontology were defined using competency questions. The concepts constituting the ontology and terminology were collected from clinical practice guidelines, the literature, and social media postings on adolescent depression. Class concepts, their hierarchy, and the relationships among class concepts were defined. An internal structure of the ontology was designed using the entity-attribute-value (EAV) triplet data model, and superclasses of the ontology were aligned with the upper ontology. Description logics between classes were evaluated by mapping concepts extracted from the answers to frequently asked questions (FAQs) onto the ontology concepts derived from description logic queries. The applicability of the ontology was validated by examining the representability of 1358 sentiment phrases using the ontology EAV model and conducting sentiment analyses of social media data using ontology class concepts. Results We developed an adolescent depression ontology that comprised 443 classes and 60 relationships among the classes; the terminology comprised 1682 synonyms of the 443 classes. In the description logics test, no error in relationships between classes was found, and about 89% (55/62) of the concepts cited in the answers to FAQs mapped onto the ontology class. 
Regarding applicability, the EAV triplet models of the ontology class represented about 91.4% of the sentiment phrases included in the sentiment dictionary. In the sentiment analyses, “academic stresses” and “suicide” contributed negatively to the sentiment of adolescent depression. Conclusions The ontology and terminology developed in this study provide a semantic foundation for analyzing social media data on adolescent depression. To be useful in social media data analysis, the ontology, especially the terminology, needs to be updated constantly to reflect rapidly changing terms used by adolescents in social media postings. In addition, more attributes and value sets reflecting depression-related sentiments should be added to the ontology. PMID:28739560
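    The entity-attribute-value (EAV) triplet model used in the ontology above can be sketched in a few lines. The class and attribute names below are hypothetical stand-ins, not the study's actual 443 classes; the sketch only shows how a sentiment phrase maps onto a triplet and how triplets can then be queried by class concept.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EAV:
    entity: str     # ontology class concept, e.g. "AcademicStress" (hypothetical)
    attribute: str  # property of that concept
    value: str      # member of the attribute's value set, e.g. a sentiment polarity

# Hypothetical triplets encoding sentiment phrases (not the study's actual model).
triplets = [
    EAV("AcademicStress", "sentiment", "negative"),
    EAV("Suicide", "sentiment", "negative"),
    EAV("PeerSupport", "sentiment", "positive"),
]

def sentiments_for(entity):
    """All sentiment values recorded for a given class concept."""
    return [t.value for t in triplets
            if t.entity == entity and t.attribute == "sentiment"]

print(sentiments_for("AcademicStress"))  # -> ['negative']
```

    Aggregating such polarities per class concept is one way the negative contributions of concepts like "academic stresses" and "suicide" could surface in a sentiment analysis.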

  1. The caCORE Software Development Kit: Streamlining construction of interoperable biomedical information services

    PubMed Central

    Phillips, Joshua; Chilukuri, Ram; Fragoso, Gilberto; Warzel, Denise; Covitz, Peter A

    2006-01-01

    Background Robust, programmatically accessible biomedical information services that syntactically and semantically interoperate with other resources are challenging to construct. Such systems require the adoption of common information models, data representations and terminology standards as well as documented application programming interfaces (APIs). The National Cancer Institute (NCI) developed the cancer common ontologic representation environment (caCORE) to provide the infrastructure necessary to achieve interoperability across the systems it develops or sponsors. The caCORE Software Development Kit (SDK) was designed to provide developers both within and outside the NCI with the tools needed to construct such interoperable software systems. Results The caCORE SDK requires a Unified Modeling Language (UML) tool to begin the development workflow with the construction of a domain information model in the form of a UML Class Diagram. Models are annotated with concepts and definitions from a description logic terminology source using the Semantic Connector component. The annotated model is registered in the Cancer Data Standards Repository (caDSR) using the UML Loader component. System software is automatically generated using the Codegen component, which produces middleware that runs on an application server. The caCORE SDK was initially tested and validated using a seven-class UML model, and has been used to generate the caCORE production system, which includes models with dozens of classes. The deployed system supports access through object-oriented APIs with consistent syntax for retrieval of any type of data object across all classes in the original UML model. The caCORE SDK is currently being used by several development teams, including by participants in the cancer biomedical informatics grid (caBIG) program, to create compatible data services. 
caBIG compatibility standards are based upon caCORE resources, and thus the caCORE SDK has emerged as a key enabling technology for caBIG. Conclusion The caCORE SDK substantially lowers the barrier to implementing systems that are syntactically and semantically interoperable by providing workflow and automation tools that standardize and expedite modeling, development, and deployment. It has gained acceptance among developers in the caBIG program, and is expected to provide a common mechanism for creating data service nodes on the data grid that is under development. PMID:16398930

  2. Towards a semantic lexicon for biological language processing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Verspoor, K.

    It is well understood that natural language processing (NLP) applications require sophisticated lexical resources to support their processing goals. In the biomedical domain, we are privileged to have access to extensive terminological resources in the form of controlled vocabularies and ontologies, which have been integrated into the framework of the National Library of Medicine's Unified Medical Language System (UMLS) Metathesaurus. However, the existence of such terminological resources does not guarantee their utility for NLP. In particular, we have two core requirements for lexical resources for NLP in addition to the basic enumeration of important domain terms: representation of morphosyntactic information about those terms, specifically part-of-speech information and inflectional patterns to support parsing and lemma assignment, and representation of semantic information indicating general categorical information about terms and significant relations between terms to support text understanding and inference (Hahn et al., 1999). Biomedical vocabularies by and large leave out morphosyntactic information, and where they address semantic considerations, they often do so in an unprincipled manner, for instance by indicating a relation between two concepts without indicating the type of that relation. But all is not lost. The UMLS knowledge sources include two additional relevant resources: the SPECIALIST lexicon, which addresses our morphosyntactic requirements, and the Semantic Network, a representation of core conceptual categories in the biomedical domain. The coverage of these two knowledge sources with respect to the full Metathesaurus is, however, not entirely clear. Furthermore, when our goal is specifically to process biological text, often more specifically text in the molecular biology domain, it is difficult to say whether the coverage of these resources is meaningful. The utility of the UMLS knowledge sources for medical language processing (MLP) has been explored (Johnson, 1999; Friedman et al., 2001); the time has now come to repeat these experiments with respect to biological language processing (BLP). To that end, this paper presents an analysis of the UMLS resources, specifically with an eye towards constructing lexical resources suitable for BLP. We follow the paradigm presented in Johnson (1999) for medical language, exploring the overlap between the UMLS Metathesaurus and SPECIALIST lexicon to construct a morphosyntactically and semantically specified lexicon, and then further explore the overlap with a relevant domain corpus for molecular biology.
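    At its core, the overlap analysis described above asks what fraction of one term set is covered by another. A toy sketch, with invented term lists standing in for the Metathesaurus and the SPECIALIST lexicon:

```python
def coverage(target_terms, lexicon_terms):
    """Fraction of target terms that appear in the lexicon (case-insensitive),
    a crude proxy for the Metathesaurus/SPECIALIST overlap analysis above."""
    lex = {t.lower() for t in lexicon_terms}
    covered = [t for t in target_terms if t.lower() in lex]
    return len(covered) / len(target_terms)

# Invented sample terms, purely for illustration.
metathesaurus = ["heart", "myocardial infarction", "p53 protein", "kinase"]
specialist = ["Heart", "Myocardial Infarction", "Kinase"]
print(coverage(metathesaurus, specialist))  # -> 0.75
```

    Real overlap studies would of course also need to handle lexical variants and inflection, which is precisely what the SPECIALIST lexicon records.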

  3. A new method for the automatic retrieval of medical cases based on the RadLex ontology.

    PubMed

    Spanier, A B; Cohen, D; Joskowicz, L

    2017-03-01

    The goal of medical case-based image retrieval (M-CBIR) is to assist radiologists in the clinical decision-making process by finding medical cases in large archives that most resemble a given case. Cases are described by radiology reports comprised of radiological images and textual information on the anatomy and pathology findings. The textual information, when available in standardized terminology, e.g., the RadLex ontology, and used in conjunction with the radiological images, provides a substantial advantage for M-CBIR systems. We present a new method for incorporating textual radiological findings from medical case reports in M-CBIR. The input is a database of medical cases, a query case, and the number of desired relevant cases. The output is an ordered list of the most relevant cases in the database. The method is based on a new case formulation, the Augmented RadLex Graph and an Anatomy-Pathology List. It uses a new case relatedness metric [Formula: see text] that prioritizes more specific medical terms in the RadLex tree over less specific ones and that incorporates the length of the query case. An experimental study on 8 CT queries from the 2015 VISCERAL 3D Case Retrieval Challenge database consisting of 1497 volumetric CT scans shows that our method has accuracy rates of 82 and 70% on the first 10 and 30 most relevant cases, respectively, thereby outperforming six other methods. The increasing amount of medical imaging data acquired in clinical practice constitutes a vast database of untapped diagnostically relevant information. This paper presents a new hybrid approach to retrieving the most relevant medical cases based on textual and image information.
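    The abstract only indicates the relatedness metric as a formula, so the sketch below is not the paper's metric. It illustrates the stated design ideas with an invented miniature hierarchy: more specific (deeper) RadLex terms weigh more, and the score is normalized by query length.

```python
# Toy RadLex-like hierarchy, child -> parent (invented terms for illustration).
parent = {
    "ground-glass opacity": "opacity",
    "opacity": "imaging observation",
    "imaging observation": "RadLex",
    "nodule": "imaging observation",
}

def depth(term):
    """Distance from the root; deeper terms are more specific."""
    d = 0
    while term in parent:
        term, d = parent[term], d + 1
    return d

def relatedness(query_terms, case_terms):
    """Depth-weighted term overlap, normalized by query length (an
    assumption standing in for the paper's actual metric)."""
    shared = set(query_terms) & set(case_terms)
    return sum(depth(t) for t in shared) / max(len(query_terms), 1)

q = ["ground-glass opacity", "nodule"]
print(relatedness(q, ["ground-glass opacity"]))  # shared deep term: 3/2 = 1.5
```

    Ranking all database cases by such a score and returning the top k gives the ordered relevance list the method outputs.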

  4. Conceptual Knowledge Acquisition in Biomedicine: A Methodological Review

    PubMed Central

    Payne, Philip R.O.; Mendonça, Eneida A.; Johnson, Stephen B.; Starren, Justin B.

    2007-01-01

    The use of conceptual knowledge collections or structures within the biomedical domain is pervasive, spanning a variety of applications including controlled terminologies, semantic networks, ontologies, and database schemas. A number of theoretical constructs and practical methods or techniques support the development and evaluation of conceptual knowledge collections. This review will provide an overview of the current state of knowledge concerning conceptual knowledge acquisition, drawing from multiple contributing academic disciplines such as biomedicine, computer science, cognitive science, education, linguistics, semiotics, and psychology. In addition, multiple taxonomic approaches to the description and selection of conceptual knowledge acquisition and evaluation techniques will be proposed in order to partially address the apparent fragmentation of the current literature concerning this domain. PMID:17482521

  5. Semantic biomedical resource discovery: a Natural Language Processing framework.

    PubMed

    Sfakianaki, Pepi; Koumakis, Lefteris; Sfakianakis, Stelios; Iatraki, Galatia; Zacharioudakis, Giorgos; Graf, Norbert; Marias, Kostas; Tsiknakis, Manolis

    2015-09-30

    A plethora of publicly available biomedical resources currently exist and are increasing at a fast rate. In parallel, specialized repositories are being developed, indexing numerous clinical and biomedical tools. The main drawback of such repositories is the difficulty of locating appropriate resources for a clinical or biomedical decision task, especially for users who are not information technology experts. Moreover, although NLP research in the clinical domain has been active since the 1960s, progress in the development of NLP applications has been slow and lags behind progress in the general NLP domain. The aim of the present study is to investigate the use of semantics for annotating biomedical resources with domain-specific ontologies, and to exploit Natural Language Processing methods to empower non-expert users to search for biomedical resources efficiently using natural language. A Natural Language Processing engine which can "translate" free text into targeted queries, automatically transforming a clinical research question into a request description that contains only terms of ontologies, has been implemented. The implementation is based on information extraction techniques for text in natural language, guided by integrated ontologies. Furthermore, knowledge from robust text mining methods has been incorporated to map descriptions into suitable domain ontologies, in order to ensure that the biomedical resource descriptions are domain oriented and to enhance the accuracy of service discovery. The framework is freely available as a web application at http://calchas.ics.forth.gr/. For our experiments, a range of clinical questions was established based on descriptions of clinical trials from the ClinicalTrials.gov registry as well as recommendations from clinicians. Domain experts manually identified the tools in a tools repository suitable for addressing the clinical questions at hand, either individually or as a set of tools forming a computational pipeline. The results were compared with those obtained from an automated discovery of candidate biomedical tools, using precision and recall measurements for the evaluation. Our results indicate that the proposed framework has high precision and low recall, implying that the system returns essentially more relevant results than irrelevant ones. The biomedical ontologies already available, existing NLP tools, and current biomedical annotation systems are adequate for implementing a biomedical resource discovery framework based on the semantic annotation of resources and the use of NLP techniques. The results of the present study demonstrate the clinical utility of the proposed framework, which aims to bridge the gap between clinical questions in natural language and efficient dynamic biomedical resource discovery.
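    The precision and recall measurements used in the evaluation above reduce to set arithmetic over the retrieved and relevant tool sets. A minimal sketch (tool names invented):

```python
def precision_recall(retrieved, relevant):
    """Precision = fraction of retrieved items that are relevant;
    recall = fraction of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# High precision, low recall: everything returned is relevant, but most
# relevant tools are missed -- the pattern the study reports.
p, r = precision_recall(retrieved=["toolA", "toolB"],
                        relevant=["toolA", "toolB", "toolC", "toolD", "toolE"])
print(p, r)  # -> 1.0 0.4
```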

  6. A service-oriented distributed semantic mediator: integrating multiscale biomedical information.

    PubMed

    Mora, Oscar; Engelbrecht, Gerhard; Bisbal, Jesus

    2012-11-01

    Biomedical research continuously generates large amounts of heterogeneous and multimodal data spread over multiple data sources. These data, if appropriately shared and exploited, could dramatically improve the research practice itself, and ultimately the quality of health care delivered. This paper presents DISMED (DIstributed Semantic MEDiator), an open source semantic mediator that provides a unified view of a federated environment of multiscale biomedical data sources. DISMED is a Web-based software application to query and retrieve information distributed over a set of registered data sources, using semantic technologies. It also offers a user-friendly interface specifically designed to simplify the usage of these technologies by non-expert users. Although the architecture of the software mediator is generic and domain independent, in the context of this paper, DISMED has been evaluated for managing biomedical environments and facilitating research with respect to the handling of scientific data distributed in multiple heterogeneous data sources. As part of this contribution, a quantitative evaluation framework has been developed. It consists of a benchmarking scenario and the definition of five realistic use-cases. This framework, created entirely with public datasets, has been used to compare the performance of DISMED against other available mediators. It is also available to the scientific community in order to evaluate progress in the domain of semantic mediation, in a systematic and comparable manner. The results show an average improvement in the execution time by DISMED of 55% compared to the second best alternative in four out of the five use-cases of the experimental evaluation.

  7. Elsevier’s approach to the bioCADDIE 2016 Dataset Retrieval Challenge

    PubMed Central

    Scerri, Antony; Kuriakose, John; Deshmane, Amit Ajit; Stanger, Mark; Moore, Rebekah; Naik, Raj; de Waard, Anita

    2017-01-01

    Abstract We developed a two-stream, Apache Solr-based information retrieval system in response to the bioCADDIE 2016 Dataset Retrieval Challenge. One stream was based on the principle of word embeddings; the other was rooted in ontology-based indexing. Despite encountering several issues in the data, the evaluation procedure and the technologies used, the system performed quite well. We provide some pointers towards future work: in particular, we suggest that more work in query expansion could benefit future biomedical search engines. Database URL: https://data.mendeley.com/datasets/zd9dxpyybg/1 PMID:29220454
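    A two-stream retrieval system must somehow merge the ranked lists its streams produce. The abstract does not say how Elsevier's system fused its streams, so the sketch below uses reciprocal rank fusion (RRF), a standard fusion heuristic, purely to illustrate the idea; the document ids are invented.

```python
def reciprocal_rank_fusion(runs, k=60):
    """Merge ranked document-id lists by summing 1/(k + rank) per document.
    RRF is a common fusion heuristic; whether the system described above
    used it is an assumption, not a claim from the paper."""
    scores = {}
    for run in runs:
        for rank, doc in enumerate(run, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical outputs of the two streams for one query.
embedding_stream = ["d3", "d1", "d2"]
ontology_stream = ["d1", "d4", "d3"]
print(reciprocal_rank_fusion([embedding_stream, ontology_stream]))
# -> ['d1', 'd3', 'd4', 'd2']
```

    Documents ranked highly by both streams (here d1 and d3) rise to the top, which is the point of running complementary streams at all.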

  8. Recent progress in automatically extracting information from the pharmacogenomic literature

    PubMed Central

    Garten, Yael; Coulet, Adrien; Altman, Russ B

    2011-01-01

    The biomedical literature holds our understanding of pharmacogenomics, but it is dispersed across many journals. In order to integrate our knowledge, connect important facts across publications and generate new hypotheses we must organize and encode the contents of the literature. By creating databases of structured pharmocogenomic knowledge, we can make the value of the literature much greater than the sum of the individual reports. We can, for example, generate candidate gene lists or interpret surprising hits in genome-wide association studies. Text mining automatically adds structure to the unstructured knowledge embedded in millions of publications, and recent years have seen a surge in work on biomedical text mining, some specific to pharmacogenomics literature. These methods enable extraction of specific types of information and can also provide answers to general, systemic queries. In this article, we describe the main tasks of text mining in the context of pharmacogenomics, summarize recent applications and anticipate the next phase of text mining applications. PMID:21047206

  9. Smartphone home monitoring of ECG

    NASA Astrophysics Data System (ADS)

    Szu, Harold; Hsu, Charles; Moon, Gyu; Landa, Joseph; Nakajima, Hiroshi; Hata, Yutaka

    2012-06-01

    Ambulatory Holter electrocardiography (ECG) monitoring systems that record and transmit heartbeat data over the Internet are already commercially available. However, such systems enjoy only qualified confidence and thus limited market penetration. Our system targets aging global villagers with growing biomedical wellness (BMW) home-care needs, as opposed to hospital-related biomedical illness (BMI). It was designed within SWaP-C (Size, Weight, Power, and Cost) constraints using three innovative modules: (i) Smart Electrode (low-power mixed-signal hardware embedding modern compressive sensing and nanotechnology to improve the electrodes' contact impedance); (ii) Learnable Database (adaptive wavelet-transform QRST feature extraction and a Sequential Query Relational database allowing home-care monitoring with retrievable Aided Target Recognition); (iii) Smartphone (touch-screen interface, powerful computation capability, caretaker reporting with GPI, ID, and a patient panic button for a programmable emergency procedure). It can provide a supplementary home screening system for post- or pre-diagnosis care at home, with a built-in database searchable by the time, place, and degree of urgency of each event, using in-situ screening.

  10. CRISPR-cas System as a Genome Engineering Platform: Applications in Biomedicine and Biotechnology.

    PubMed

    Hashemi, Atieh

    2018-01-01

    Genome editing mediated by Clustered Regularly Interspaced Palindromic Repeats (CRISPR) and its associated proteins (Cas) has recently been considered as an efficient, rapid and site-specific tool for modifying endogenous genes in biomedically important cell types and whole organisms. It has become a predictable and precise method of choice for genome engineering, specifying a 20-nt targeting sequence within its guide RNA. This review first describes the biology of the CRISPR system. It then reviews the applications of CRISPR-Cas9, including the efficient generation of a wide variety of biomedically important cellular and animal models, modifying epigenomes, conducting genome-wide screens, gene therapy, labelling specific genomic loci in living cells, metabolic engineering of yeast and bacteria, and regulation of endogenous gene expression by an altered version of this system. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  11. New Perspectives on Biomedical Applications of Iron Oxide Nanoparticles.

    PubMed

    Magro, Massimiliano; Baratella, Davide; Bonaiuto, Emanuela; de A Roger, Jessica; Vianello, Fabio

    2018-02-12

    Iron oxide nanomaterials are considered promising tools for improved therapeutic efficacy and diagnostic applications in biomedicine. Accordingly, engineered iron oxide nanomaterials are increasingly proposed in biomedicine, and interdisciplinary research involving physics, chemistry, biology (nanotechnology) and medicine has led to exciting developments in recent decades. Progress in the development of magnetic nanoparticles with tailored physico-chemical and surface properties has produced a variety of clinically relevant applications, spanning magnetic resonance imaging (MRI), drug delivery, magnetic hyperthermia, and in vitro diagnostics. Notwithstanding the wide use of well-known conventional synthetic procedures, recent advances in synthetic methods open the door to new generations of naked iron oxide nanoparticles possessing peculiar surface chemistries, suitable for other competitive biomedical applications. New abilities to rationally manipulate iron oxides and their physical, chemical, and biological properties allow the emergence of additional possibilities for designing novel nanomaterials for theranostic applications. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  12. A novel method for efficient archiving and retrieval of biomedical images using MPEG-7

    NASA Astrophysics Data System (ADS)

    Meyer, Joerg; Pahwa, Ash

    2004-10-01

    Digital archiving and efficient retrieval of radiological scans have become critical steps in contemporary medical diagnostics. Since more and more images and image sequences (single scans or video) from various modalities (CT/MRI/PET/digital X-ray) are now available in digital formats (e.g., DICOM-3), hospitals and radiology clinics need to implement efficient protocols capable of managing the enormous amounts of data generated daily in a typical clinical routine. We present a method that appears to be a viable way to eliminate the tedious step of manually annotating image and video material for database indexing. MPEG-7 is a new framework that standardizes the way images are characterized in terms of color, shape, and other abstract, content-related criteria. A set of standardized descriptors that are automatically generated from an image is used to compare an image to other images in a database, and to compute the distance between two images for a given application domain. Text-based database queries can be replaced with image-based queries using MPEG-7. Consequently, image queries can be conducted without any prior knowledge of the keys that were used as indices in the database. Since the decoding and matching steps are not part of the MPEG-7 standard, this method also enables searches that were not planned by the time the keys were generated.
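    The descriptor-and-distance matching described above can be sketched in a few lines. This is a hedged illustration only: it uses a plain gray-level histogram as a stand-in for an MPEG-7 visual descriptor such as ScalableColor, and the image data and names are invented.

```python
# Illustrative sketch: content-based retrieval with a color-histogram
# descriptor, standing in for an MPEG-7 descriptor such as ScalableColor.
# Images are modeled as flat lists of 8-bit gray values; a real system
# would extract standardized descriptors from decoded pixel data.

def histogram_descriptor(pixels, bins=8):
    """Quantize 8-bit intensities into a normalized histogram."""
    hist = [0] * bins
    for p in pixels:
        hist[p * bins // 256] += 1
    total = len(pixels)
    return [h / total for h in hist]

def l1_distance(d1, d2):
    """Matching metric: L1 distance between two descriptors."""
    return sum(abs(a - b) for a, b in zip(d1, d2))

def query_by_example(query_pixels, database):
    """Rank database images by descriptor distance to the query image,
    i.e. an image-based query instead of a text-based one."""
    q = histogram_descriptor(query_pixels)
    scored = [(l1_distance(q, histogram_descriptor(px)), name)
              for name, px in database.items()]
    return sorted(scored)

db = {"bright_scan": [220] * 64, "dark_scan": [30] * 64}
print(query_by_example([215] * 64, db)[0][1])  # nearest: "bright_scan"
```

    Note that, as the abstract states, the matching step itself is outside the MPEG-7 standard; only the descriptor format is standardized, so different systems may rank with different metrics.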

  13. Evaluating the granularity balance of hierarchical relationships within large biomedical terminologies towards quality improvement.

    PubMed

    Luo, Lingyun; Tong, Ling; Zhou, Xiaoxi; Mejino, Jose L V; Ouyang, Chunping; Liu, Yongbin

    2017-11-01

    Organizing the descendants of a concept under a particular semantic relationship may be carried out rather arbitrarily during the manual creation of large biomedical terminologies, resulting in imbalances in relationship granularity. This work proposes scalable models for systematically evaluating the granularity balance of semantic relationships. We first utilize the "parallel concepts set (PCS)" and two features (the length and the strength) of the paths between PCSs to design the general evaluation models, based on which we propose eight concrete evaluation models generated by two specific types of PCSs: the single concept set and the symmetric concepts set. We then apply those concrete models to the IS-A relationship in FMA and SNOMED CT's Body Structure subset, as well as to the Part-Of relationship in FMA. Moreover, without loss of generality, we conduct two additional rounds of applications on the Part-Of relationship after removing length redundancies and strength redundancies sequentially. Finally, we perform automatic evaluation of the imbalances detected after the final round to identify missing concepts, misaligned relations and inconsistencies. For the IS-A relationship, 34 missing concepts, 80 misalignments and 18 redundancies in FMA, as well as 28 missing concepts, 114 misalignments and 1 redundancy in SNOMED CT, were uncovered. In addition, 6,801 instances of imbalance for the Part-Of relationship in FMA were also identified, including 3,246 redundancies. After removing those redundancies from FMA, the total number of Part-Of imbalances was dramatically reduced to 327, including 51 missing concepts, 294 misaligned relations, and 36 inconsistencies. Manual curation performed by the FMA project leader confirmed the effectiveness of our method in identifying curation errors.
In conclusion, the granularity balance of hierarchical semantic relationship is a valuable property to check for ontology quality assurance, and the scalable evaluation models proposed in this study are effective in fulfilling this task, especially in auditing relationships with sub-hierarchies, such as the seldom evaluated Part-Of relationship. Copyright © 2017 Elsevier Inc. All rights reserved.
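    The length feature of the evaluation models above can be illustrated with a toy IS-A fragment. This is a hedged sketch under invented anatomy-style names, not the authors' implementation or actual FMA or SNOMED CT content:

```python
# Hedged sketch of the "length" path feature: for sibling concepts under
# a common ancestor, compare the number of IS-A edges up to that
# ancestor. Unequal lengths flag a granularity imbalance, e.g. a
# missing intermediate concept. The ontology fragment is invented.

ISA = {  # child -> parent
    "left_hand": "hand", "right_hand": "hand",
    "hand": "upper_limb",
    "left_foot": "lower_limb",   # note: no intermediate "foot" concept
    "upper_limb": "limb", "lower_limb": "limb",
}

def path_length(concept, ancestor):
    """Number of IS-A edges from concept up to ancestor (None if
    the ancestor is not reachable)."""
    steps = 0
    while concept != ancestor:
        if concept not in ISA:
            return None
        concept = ISA[concept]
        steps += 1
    return steps

def check_balance(concepts, ancestor):
    """Report per-concept path lengths; differing values suggest a
    missing concept or a misaligned relation for manual curation."""
    lengths = {c: path_length(c, ancestor) for c in concepts}
    balanced = len(set(lengths.values())) == 1
    return lengths, balanced

lengths, balanced = check_balance(["left_hand", "left_foot"], "limb")
print(lengths, balanced)  # {'left_hand': 3, 'left_foot': 2} False
```

    In this toy fragment the imbalance points at the absent "foot" level; the paper's models additionally weigh path strength and handle symmetric concept sets, which this sketch omits.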

  14. Knowledge based word-concept model estimation and refinement for biomedical text mining.

    PubMed

    Jimeno Yepes, Antonio; Berlanga, Rafael

    2015-02-01

    Text mining of scientific literature has been essential for setting up large public biomedical databases, which are being widely used by the research community. In the biomedical domain, the existence of a large number of terminological resources and knowledge bases (KB) has enabled a myriad of machine learning methods for different text mining related tasks. Unfortunately, KBs have not been devised for text mining tasks but for human interpretation, so the performance of KB-based methods is usually lower than that of supervised machine learning methods. The disadvantage of supervised methods, however, is that they require labeled training data and are therefore not practical for large scale biomedical text mining systems. KB-based methods do not have this limitation. In this paper, we describe a novel method to generate word-concept probabilities from a KB, which can serve as a basis for several text mining tasks. This method not only takes into account the underlying patterns within the descriptions contained in the KB but also those in texts available from large unlabeled corpora such as MEDLINE. The parameters of the model have been estimated without training data. Patterns from MEDLINE have been built using MetaMap for entity recognition and related using co-occurrences. The word-concept probabilities were evaluated on the task of word sense disambiguation (WSD). The results showed that our method obtained a higher degree of accuracy than other state-of-the-art approaches when evaluated on the MSH WSD data set. We also evaluated our method on the task of document ranking using MEDLINE citations. These results also showed an increase in performance over existing baseline retrieval approaches. Copyright © 2014 Elsevier Inc. All rights reserved.
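    As a hedged sketch of the general idea (not the paper's actual estimator), word-concept probabilities can be derived from co-occurrence counts and then applied to word sense disambiguation; the counts, words, and concept names below are invented:

```python
# Minimal sketch: normalize (word, concept) co-occurrence counts into
# P(concept | word), then disambiguate an ambiguous word by combining
# the probabilities of its context words under a naive independence
# assumption. All counts and concept names are illustrative.

from collections import defaultdict

# (word, concept) co-occurrence counts, e.g. harvested from a KB plus
# an unlabeled corpus such as MEDLINE.
cooc = {
    ("cold", "Common_Cold"): 30, ("cold", "Cold_Temperature"): 10,
    ("fever", "Common_Cold"): 25, ("fever", "Cold_Temperature"): 1,
    ("weather", "Common_Cold"): 2, ("weather", "Cold_Temperature"): 20,
}

def word_concept_probs(counts):
    """Normalize counts into P(concept | word)."""
    totals = defaultdict(float)
    for (w, _), n in counts.items():
        totals[w] += n
    return {(w, c): n / totals[w] for (w, c), n in counts.items()}

def disambiguate(word, context, probs, concepts):
    """Pick the candidate concept with the highest joint score over
    the ambiguous word and its context words."""
    def score(c):
        s = probs.get((word, c), 1e-9)
        for ctx in context:
            s *= probs.get((ctx, c), 1e-9)
        return s
    return max(concepts, key=score)

probs = word_concept_probs(cooc)
print(disambiguate("cold", ["fever"], probs,
                   ["Common_Cold", "Cold_Temperature"]))  # Common_Cold
```

    With "fever" in the context, the disease sense wins; a context like "weather" would pull the score toward the temperature sense instead.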

  15. Conjugates of classical DNA/RNA binder with nucleobase: chemical, biochemical and biomedical applications.

    PubMed

    Saftic, Dijana; Ban, Zeljka; Matic, Josipa; Tumir, Lidija-Marija; Piantanida, Ivo

    2018-05-07

    Among the most intensively studied classes of small molecules (molecular weight < 650) in biomedical research are those that bind non-covalently to DNA/RNA; another intensively studied class comprises nucleobase derivatives. Both classes have been extensively treated in many books and reviews. However, conjugates consisting of a DNA/RNA binder covalently linked to a nucleobase are much less studied and have not been reviewed in the last two decades. Therefore, this review summarizes reports on the design of classical DNA/RNA binder-nucleobase conjugates, as well as data on their interactions with various DNA or RNA targets and, in some cases, the protein targets involved. Based on these data, the most important structural aspects of selective or even specific recognition between small molecule and target are proposed, and where possible the related biochemical and biomedical aspects are discussed. The general conclusion is that this rather new class of molecules has yielded an impressive set of recognition tools for numerous DNA or RNA targets over the last two decades, as well as a few intriguing in vitro and in vivo selectivities. Several lead research lines show promising advancement toward either novel, highly selective markers or bioactive, potentially druggable molecules. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  16. Biomedical Applications Of Aromatic Azo Compounds: From Chromophore To Pharmacophore.

    PubMed

    Ali, Yousaf; Hamid, Shafida Abd; Rashid, Umer

    2018-05-23

    Azo dyes are widely used in the textile, fiber, cosmetic, leather, paint and printing industries. Besides their characteristic coloring function, biological properties of certain azo compounds, including antibacterial, antiviral, antifungal and cytotoxic activities, have also been reported. Azo compounds can be used as drug carriers, either by acting as a carrier that entraps therapeutic agents as 'cargo' or through a prodrug approach. The drug is released by internal or external stimuli in the region of interest, as observed in colon-targeted drug delivery. Besides drug-like and drug-carrier properties, a number of azo dyes are used in cellular staining to visualize cellular components and metabolic processes. However, the biological significance of azo compounds, especially in cancer chemotherapy, is still in its infancy. This may be linked to early findings that identified azo compounds as one of the possible causes of cancer and mutagenesis. Currently, researchers are screening aromatic azo compounds for potential biomedical use, including cancer diagnosis and therapy. The medical applications of azo compounds, particularly in cancer research, are discussed here. The biomedical significance of cis-trans interchange and the negative implications of azo compounds are also highlighted in brief. This review may provide researchers a platform in the quest for more potent therapeutic agents of this class. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  17. S3QL: A distributed domain specific language for controlled semantic integration of life sciences data

    PubMed Central

    2011-01-01

    Background The value and usefulness of data increases when it is explicitly interlinked with related data. This is the core principle of Linked Data. For life sciences researchers, harnessing the power of Linked Data to improve biological discovery is still challenged by a need to keep pace with rapidly evolving domains and requirements for collaboration and control as well as with the reference semantic web ontologies and standards. Knowledge organization systems (KOSs) can provide an abstraction for publishing biological discoveries as Linked Data without complicating transactions with contextual minutiae such as provenance and access control. We have previously described the Simple Sloppy Semantic Database (S3DB) as an efficient model for creating knowledge organization systems using Linked Data best practices with explicit distinction between domain and instantiation and support for a permission control mechanism that automatically migrates between the two. In this report we present a domain specific language, the S3DB query language (S3QL), to operate on its underlying core model and facilitate management of Linked Data. Results Reflecting the data driven nature of our approach, S3QL has been implemented as an application programming interface for S3DB systems hosting biomedical data, and its syntax was subsequently generalized beyond the S3DB core model. This achievement is illustrated with the assembly of an S3QL query to manage entities from the Simple Knowledge Organization System. The illustrative use cases include gastrointestinal clinical trials, genomic characterization of cancer by The Cancer Genome Atlas (TCGA) and molecular epidemiology of infectious diseases. Conclusions S3QL was found to provide a convenient mechanism to represent context for interoperation between public and private datasets hosted at biomedical research institutions and linked data formalisms. PMID:21756325

  18. S3QL: a distributed domain specific language for controlled semantic integration of life sciences data.

    PubMed

    Deus, Helena F; Correa, Miriã C; Stanislaus, Romesh; Miragaia, Maria; Maass, Wolfgang; de Lencastre, Hermínia; Fox, Ronan; Almeida, Jonas S

    2011-07-14

    The value and usefulness of data increases when it is explicitly interlinked with related data. This is the core principle of Linked Data. For life sciences researchers, harnessing the power of Linked Data to improve biological discovery is still challenged by a need to keep pace with rapidly evolving domains and requirements for collaboration and control as well as with the reference semantic web ontologies and standards. Knowledge organization systems (KOSs) can provide an abstraction for publishing biological discoveries as Linked Data without complicating transactions with contextual minutiae such as provenance and access control. We have previously described the Simple Sloppy Semantic Database (S3DB) as an efficient model for creating knowledge organization systems using Linked Data best practices with explicit distinction between domain and instantiation and support for a permission control mechanism that automatically migrates between the two. In this report we present a domain specific language, the S3DB query language (S3QL), to operate on its underlying core model and facilitate management of Linked Data. Reflecting the data driven nature of our approach, S3QL has been implemented as an application programming interface for S3DB systems hosting biomedical data, and its syntax was subsequently generalized beyond the S3DB core model. This achievement is illustrated with the assembly of an S3QL query to manage entities from the Simple Knowledge Organization System. The illustrative use cases include gastrointestinal clinical trials, genomic characterization of cancer by The Cancer Genome Atlas (TCGA) and molecular epidemiology of infectious diseases. S3QL was found to provide a convenient mechanism to represent context for interoperation between public and private datasets hosted at biomedical research institutions and linked data formalisms.
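    S3QL's concrete syntax is not reproduced here. As a hedged illustration of the kind of operation described, the following models SKOS-style Linked Data as subject-predicate-object statements and answers a simple pattern query; the concept names are invented:

```python
# Illustrative sketch only, not actual S3QL syntax: Linked Data as a
# list of subject-predicate-object statements, a pattern-match query
# over them, and a transitive walk over skos:narrower, i.e. managing
# entities from a Simple Knowledge Organization System (SKOS).

triples = [
    ("Neoplasm", "skos:narrower", "Carcinoma"),
    ("Carcinoma", "skos:narrower", "Adenocarcinoma"),
    ("Carcinoma", "skos:prefLabel", "carcinoma"),
]

def query(triples, s=None, p=None, o=None):
    """Return all statements matching the non-None components."""
    return [(ts, tp, to) for ts, tp, to in triples
            if s in (None, ts) and p in (None, tp) and o in (None, to)]

def descendants(triples, concept):
    """Transitive closure over skos:narrower."""
    out, stack = [], [concept]
    while stack:
        c = stack.pop()
        for _, _, child in query(triples, s=c, p="skos:narrower"):
            out.append(child)
            stack.append(child)
    return out

print(descendants(triples, "Neoplasm"))  # ['Carcinoma', 'Adenocarcinoma']
```

    A production system would of course layer the permission-control and provenance context the paper emphasizes on top of such statements rather than exposing raw triples.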

  19. CHEMICAL PRIORITIZATION FOR DEVELOPMENTAL ...

    EPA Pesticide Factsheets

    Defining a predictive model of developmental toxicity from in vitro and high-throughput screening (HTS) assays can be limited by the availability of developmental defects data. ToxRefDB (www.epa.gov/ncct/todrefdb) was built from animal studies on data-rich environmental chemicals, and has been used as an anchor for predictive modeling of ToxCast™ data. Scaling to thousands of untested chemicals requires another approach. ToxPlorer™ was developed as a tool to query and extract specific facts about defined biological entities from the open scientific literature and to coherently synthesize relevant knowledge about relationships, pathways and processes in toxicity. Here, we investigated the specific application of ToxPlorer to weighting HTS assay targets for relevance to developmental defects as defined in the literature. First, we systematically analyzed 88,193 PubMed abstracts selected by bulk query using harmonized terminology for 862 developmental endpoints (www.devtox.net) and 364,334 dictionary term entities in our VT-KB (virtual tissues knowledgebase). We specifically focused on entities corresponding to genes/proteins mapped across >500 ToxCast HTS assays. The 88,193 devtox abstracts mentioned 244 gene/protein entities in an aggregated total of ~8,000 occurrences. Each of the 244 assays was scored and weighted by the number of devtox articles and relevance to developmental processes. This score was used as a feature for chemical prioritization by Toxic

  20. Equity in Medicaid Reimbursement for Otolaryngologists.

    PubMed

    Conduff, Joseph H; Coelho, Daniel H

    2017-12-01

    Objective To study state Medicaid reimbursement rates for inpatient and outpatient otolaryngology services and to compare with federal Medicare benchmarks. Study Design State and federal database query. Setting Not applicable. Methods Based on Medicare claims data, 26 of the most common Current Procedural Terminology codes reimbursed to otolaryngologists were selected and the payments recorded. These were further divided into outpatient and operative services. Medicaid payment schemes were queried for the same services in 49 states and Washington, DC. The difference in Medicaid and Medicare payment in dollars and percentage was determined and the reimbursement per relative value unit calculated. Medicaid reimbursement differences (by dollar amount and by percentage) were qualified as a shortfall or excess as compared with the Medicare benchmark. Results Marked differences in Medicaid and Medicare reimbursement exist for all services provided by otolaryngologists, most commonly as a substantial shortfall. The Medicaid shortfall varied in amount among states, and great variability in reimbursement exists within and between operative and outpatient services. Operative services were more likely than outpatient services to have a greater Medicaid shortfall. Shortfalls and excesses were not consistent among procedures or states. Conclusions The variation in Medicaid payment models reflects marked differences in the value of the same work provided by otolaryngologists, in many cases far less than federal benchmarks. These results question the fairness of the Medicaid reimbursement scheme in otolaryngology, with potentially serious implications for access to care for this underserved patient population.

  1. DISCOVERY, SEARCH, AND COMMUNICATION OF TEXTUAL KNOWLEDGE RESOURCES IN DISTRIBUTED SYSTEMS; a. Discovering and Utilizing Knowledge Sources for Metasearch Knowledge Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zamora, Antonio

    Advanced Natural Language Processing Tools for Web Information Retrieval, Content Analysis, and Synthesis. The goal of this SBIR was to implement and evaluate several advanced Natural Language Processing (NLP) tools and techniques to enhance the precision and relevance of search results by analyzing and augmenting search queries and by helping to organize the search output obtained from heterogeneous databases and web pages containing textual information of interest to DOE and the scientific-technical user communities in general. The SBIR investigated 1) the incorporation of spelling checkers in search applications, 2) identification of significant phrases and concepts using a combination of linguistic and statistical techniques, and 3) enhancement of the query interface and search retrieval results through the use of semantic resources, such as thesauri. A search program with a flexible query interface was developed to search reference databases with the objective of enhancing search results from web queries or queries of specialized search systems such as DOE's Information Bridge. The DOE ETDE/INIS Joint Thesaurus was processed to create a searchable database. Term frequencies and term co-occurrences were used to enhance the web information retrieval by providing algorithmically-derived objective criteria to organize relevant documents into clusters containing significant terms. A thesaurus provides an authoritative overview and classification of a field of knowledge. By organizing the results of a search using the thesaurus terminology, the output is more meaningful than when the results are organized based only on the terms that co-occur in the retrieved documents, some of which may not be significant. An attempt was made to take advantage of the hierarchy provided by broader and narrower terms, as well as other field-specific information in the thesauri. 
The search program uses linguistic morphological routines to find relevant entries regardless of whether terms are stored in singular or plural form. Implementation of additional inflectional morphology processes for verbs can enhance retrieval further, but this has to be balanced against the possibility of broadening the results too much. In addition to the DOE energy thesaurus, other sources of specialized organized knowledge such as the Medical Subject Headings (MeSH), the Unified Medical Language System (UMLS), and Wikipedia were investigated. The supporting role of the NLP thesaurus search program was enhanced by incorporating spelling aid and a part-of-speech tagger to cope with misspellings in the queries and to determine the grammatical roles of the query words and identify nouns for special processing. To improve precision, multiple modes of searching were implemented, including Boolean operators and field-specific searches. Programs to convert a thesaurus or reference file into searchable support files can be deployed easily, and the resulting files are immediately searchable to produce relevance-ranked results with built-in spelling aid, morphological processing, and advanced search logic. Demonstration systems were built for several databases, including the DOE energy thesaurus.
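    Two of the features described, singular/plural folding so inflected query forms match the same thesaurus entries, and relevance ranking of matched entries, can be sketched as follows. The stemming rules and thesaurus entries are simplified, invented illustrations, not the SBIR's implementation:

```python
# Hedged sketch: crude singular/plural folding plus relevance ranking
# by how many normalized query words a thesaurus entry covers. Real
# systems use fuller inflectional morphology and richer scoring.

def normalize(term):
    """Fold common English plural endings to a singular form."""
    t = term.lower()
    if t.endswith("ies"):
        return t[:-3] + "y"
    if t.endswith("s") and not t.endswith("ss"):
        return t[:-1]
    return t

THESAURUS = {  # entry -> related (broader/narrower) terms
    "solar energy": ["renewable energy", "photovoltaics"],
    "batteries": ["energy storage"],
}
INDEX = {normalize(k): (k, v) for k, v in THESAURUS.items()}

def search(query):
    """Match normalized query words against thesaurus entries and
    rank entries by how many query words they cover."""
    words = [normalize(w) for w in query.split()]
    hits = []
    for key, (entry, related) in INDEX.items():
        overlap = sum(1 for w in words if w in key.split())
        if overlap:
            hits.append((overlap, entry, related))
    return sorted(hits, reverse=True)

print(search("battery storage"))  # [(1, 'batteries', ['energy storage'])]
```

    Here the singular query form "battery" still retrieves the plural thesaurus entry "batteries", along with its related terms for query expansion.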

  2. Proximal, Distal, and the Politics of Causation: What’s Level Got to Do With It?

    PubMed Central

    Krieger, Nancy

    2008-01-01

    Causal thinking in public health, and especially in the growing literature on social determinants of health, routinely employs the terminology of proximal (or downstream) and distal (or upstream). I argue that the use of these terms is problematic and adversely affects public health research, practice, and causal accountability. At issue are distortions created by conflating measures of space, time, level, and causal strength. To make this case, I draw on an ecosocial perspective to show how public health got caught in the middle of the problematic proximal–distal divide—surprisingly embraced by both biomedical and social determinist frameworks—and propose replacing the terms proximal and distal with explicit language about levels, pathways, and power. PMID:18172144

  3. MD-CTS: An integrated terminology reference of clinical and translational medicine.

    PubMed

    Ray, Will; Finamore, Joe; Rastegar-Mojarad, Majid; Kadolph, Chris; Ye, Zhan; Bohne, Jacquie; Xu, Yin; Burish, Dan; Sondelski, Joshua; Easker, Melissa; Finnegan, Brian; Bartkowiak, Barbara; Smith, Catherine Arnott; Tachinardi, Umberto; Mendonca, Eneida A; Weichelt, Bryan; Lin, Simon M

    2016-01-01

    New vocabularies are rapidly evolving in the literature relative to the practice of clinical medicine and translational research. To provide integrated access to new terms, we developed a mobile and desktop online reference, the Marshfield Dictionary of Clinical and Translational Science (MD-CTS). It is the first public resource that comprehensively integrates Wiktionary (word definition), BioPortal (ontology), Wiki (image reference), and Medline abstract (word usage) information. MD-CTS is accessible at http://spellchecker.mfldclin.edu/. The website provides a broadened capacity for the wider clinical and translational science community to keep pace with newly emerging scientific vocabulary. An initial evaluation using 63 randomly selected biomedical words suggests that online references generally provided better coverage (73-95%) than paper-based dictionaries (57-71%).

  4. Desiderata for Healthcare Integrated Data Repositories Based on Architectural Comparison of Three Public Repositories

    PubMed Central

    Huser, Vojtech; Cimino, James J.

    2013-01-01

    Integrated data repositories (IDRs) are indispensable tools for numerous biomedical research studies. We compare three large IDRs (Informatics for Integrating Biology and the Bedside (i2b2), HMO Research Network’s Virtual Data Warehouse (VDW) and Observational Medical Outcomes Partnership (OMOP) repository) in order to identify common architectural features that enable efficient storage and organization of large amounts of clinical data. We define three high-level classes of underlying data storage models and we analyze each repository using this classification. We look at how a set of sample facts is represented in each repository and conclude with a list of desiderata for IDRs that deal with the information storage model, terminology model, data integration and value-sets management. PMID:24551366

  5. Desiderata for healthcare integrated data repositories based on architectural comparison of three public repositories.

    PubMed

    Huser, Vojtech; Cimino, James J

    2013-01-01

    Integrated data repositories (IDRs) are indispensable tools for numerous biomedical research studies. We compare three large IDRs (Informatics for Integrating Biology and the Bedside (i2b2), HMO Research Network's Virtual Data Warehouse (VDW) and Observational Medical Outcomes Partnership (OMOP) repository) in order to identify common architectural features that enable efficient storage and organization of large amounts of clinical data. We define three high-level classes of underlying data storage models and we analyze each repository using this classification. We look at how a set of sample facts is represented in each repository and conclude with a list of desiderata for IDRs that deal with the information storage model, terminology model, data integration and value-sets management.

  6. Proximal, distal, and the politics of causation: what's level got to do with it?

    PubMed

    Krieger, Nancy

    2008-02-01

    Causal thinking in public health, and especially in the growing literature on social determinants of health, routinely employs the terminology of proximal (or downstream) and distal (or upstream). I argue that the use of these terms is problematic and adversely affects public health research, practice, and causal accountability. At issue are distortions created by conflating measures of space, time, level, and causal strength. To make this case, I draw on an ecosocial perspective to show how public health got caught in the middle of the problematic proximal-distal divide--surprisingly embraced by both biomedical and social determinist frameworks--and propose replacing the terms proximal and distal with explicit language about levels, pathways, and power.

  7. An Overview of the Technological and Scientific Achievements of the Terahertz

    NASA Astrophysics Data System (ADS)

    Rostami, Ali; Rasooli, Hassan; Baghban, Hamed

    2011-01-01

    Given the importance of terahertz radiation over the past several years in spectroscopy, astrophysics, and imaging techniques, notably biomedical applications (its low-interference and non-ionizing characteristics make it a good candidate for safe, in vivo medical imaging), we review terahertz technology and its associated scientific achievements. The review covers terahertz terminology, different applications, and the main components used for the detection and generation of terahertz radiation. A brief theoretical study of the generation and detection of terahertz pulses is also presented. Finally, the chapter closes with the use of organic materials for the generation and detection of terahertz radiation.

  8. BioSearch: a semantic search engine for Bio2RDF

    PubMed Central

    Qiu, Honglei; Huang, Jiacheng

    2017-01-01

    Biomedical data are growing at an incredible pace and require substantial expertise to organize data in a manner that makes them easily findable, accessible, interoperable and reusable. Massive effort has been devoted to using Semantic Web standards and technologies to create a network of Linked Data for the life sciences, among others. However, while these data are accessible through programmatic means, effective user interfaces for non-experts to SPARQL endpoints are few and far between. Contributing to user frustrations is that data are not necessarily described using common vocabularies, thereby making it difficult to aggregate results, especially when distributed across multiple SPARQL endpoints. We propose BioSearch, a semantic search engine that uses ontologies to enhance federated query construction and organize search results. BioSearch also features a simplified query interface that allows users to optionally filter their keywords according to classes, properties and datasets. User evaluation demonstrated that BioSearch is more effective and usable than two state-of-the-art search and browsing solutions. Database URL: http://ws.nju.edu.cn/biosearch/ PMID:29220451

  9. The UCSC Genome Browser: What Every Molecular Biologist Should Know.

    PubMed

    Mangan, Mary E; Williams, Jennifer M; Kuhn, Robert M; Lathe, Warren C

    2014-07-01

    Electronic data resources can enable molecular biologists to quickly get information from around the world that a decade ago would have been buried in papers scattered throughout the library. The ability to access, query, and display these data makes benchwork much more efficient and drives new discoveries. Increasingly, mastery of software resources and corresponding data repositories is required to fully explore the volume of data generated in biomedical and agricultural research, because only small amounts of data are actually found in traditional publications. The UCSC Genome Browser provides a wealth of data and tools that advance understanding of genomic context for many species, enable detailed analysis of data, and provide the ability to interrogate regions of interest across disparate data sets from a wide variety of sources. Researchers can also supplement the standard display with their own data to query and share this with others. Effective use of these resources has become crucial to biological research today, and this unit describes some practical applications of the UCSC Genome Browser. Copyright © 2014 John Wiley & Sons, Inc.

  10. Proposal for an Update of the Definition and Scope of Behavioral Medicine.

    PubMed

    Dekker, Joost; Stauder, Adrienne; Penedo, Frank J

    2017-02-01

    We aim to provide an update of the definition and scope of behavioral medicine in the Charter of ISBM, as the present version was developed more than 25 years ago. We identify issues which need clarification or updating. This leads us to propose an update of the definition and scope of behavioral medicine. Issues in need of clarification or updating include the scope of behavioral medicine (biobehavioral mechanisms, clinical diagnosis and intervention, and prevention and health promotion); research as an essential characteristic of all three areas of behavioral medicine; the application of behavioral medicine; the terminology of behavioral medicine as a multidisciplinary field; and the relationship and distinction between behavioral medicine, mental health, health psychology, and psychosomatic medicine. We propose the following updated definition and scope of behavioral medicine: "Behavioral medicine can be defined as the multidisciplinary field concerned with the development and integration of biomedical and behavioral knowledge relevant to health and disease, and the application of this knowledge to prevention, health promotion, diagnosis, treatment, rehabilitation, and care. The scope of behavioral medicine extends from biobehavioral mechanisms (i.e., the interaction of biomedical processes with psychological, social, societal, cultural, and environmental processes), to clinical diagnosis and intervention, and to public health."

  11. Informatics Support for Basic Research in Biomedicine

    PubMed Central

    Rindflesch, Thomas C.; Blake, Catherine L.; Fiszman, Marcelo; Kilicoglu, Halil; Rosemblat, Graciela; Schneider, Jodi; Zeiss, Caroline J.

    2017-01-01

    Abstract Informatics methodologies exploit computer-assisted techniques to help biomedical researchers manage large amounts of information. In this paper, we focus on the biomedical research literature (MEDLINE). We first provide an overview of some text mining techniques that offer assistance in research by identifying biomedical entities (e.g., genes, substances, and diseases) and relations between them in text. We then discuss Semantic MEDLINE, an application that integrates PubMed document retrieval, concept and relation identification, and visualization, thus enabling a user to explore concepts and relations from within a set of retrieved citations. Semantic MEDLINE provides a roadmap through content and helps users discern patterns in large numbers of retrieved citations. We illustrate its use with an informatics method we call “discovery browsing,” which provides a principled way of navigating through selected aspects of some biomedical research area. The method supports an iterative process that accommodates learning and hypothesis formation in which a user is provided with high-level connections before delving into details. As a use case, we examine current developments in basic research on mechanisms of Alzheimer’s disease. Out of the nearly 90 000 citations returned by the PubMed query “Alzheimer’s disease,” discovery browsing led us to 73 citations on sortilin and that disorder. We provide a synopsis of the basic research reported in 15 of these. There is widespread consensus among researchers working with a range of animal models and human cells that increased sortilin expression and decreased receptor expression are associated with amyloid beta and/or amyloid precursor protein. PMID:28838071
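The PubMed retrieval step that feeds a pipeline like Semantic MEDLINE can be sketched with NCBI's E-utilities esearch endpoint. The service and its `db`/`term` parameters are real, but this is only URL construction, not the application's actual code, and no request is sent.

```python
# Sketch: building E-utilities esearch URLs for a broad query and the
# narrowed query used in the discovery-browsing use case above.
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def esearch_url(term, retmax=20):
    """Return an esearch URL for a PubMed query (no network call made)."""
    params = urlencode({"db": "pubmed", "term": term,
                        "retmax": retmax, "retmode": "json"})
    return f"{EUTILS}/esearch.fcgi?{params}"

broad = esearch_url("Alzheimer's disease")
narrow = esearch_url("Alzheimer's disease AND sortilin")
```

Issuing the broad query returns tens of thousands of citation IDs; the narrowed query illustrates how discovery browsing drills down to a tractable subset.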

  12. Trends in Utilization of Vocal Fold Injection Procedures.

    PubMed

    Rosow, David E

    2015-11-01

    Office-based vocal fold injections have become increasingly popular over the past 15 years. Examination of trends in procedure coding for vocal fold injections in the United States from 2000 to 2012 was undertaken to see if they reflect this shift. The US Part B Medicare claims database was queried from 2000 through 2012 for multiple Current Procedural Terminology codes. Over the period studied, the number of nonoperative laryngoscopic injections (31513, 31570) and operative medialization laryngoplasties (31588) remained constant. Operative vocal fold injection (31571) demonstrated marked linear growth over the 12-year study period, from 744 procedures in 2000 to 4788 in 2012-an increase >640%. The dramatic increased incidence in the use of code 31571 reflects an increasing share of vocal fold injections being performed in the operating room and not in an office setting, running counter to the prevailing trend toward awake, office-based injection procedures. © American Academy of Otolaryngology—Head and Neck Surgery Foundation 2015.

  13. A feature dictionary supporting a multi-domain medical knowledge base.

    PubMed

    Naeymi-Rad, F

    1989-01-01

    Because different terminology is used by physicians of different specialties in different locations to refer to the same feature (signs, symptoms, test results), it is essential that our knowledge development tools provide a means to access a common pool of terms. This paper discusses the design of an online medical dictionary that provides a solution to this problem for developers of multi-domain knowledge bases for MEDAS (Medical Emergency Decision Assistance System). Our Feature Dictionary supports phrase equivalents for features, feature interactions, feature classifications, and translations to the binary features generated by the expert during knowledge creation. It is also used in the conversion of a domain knowledge base to the database used by the MEDAS inference diagnostic sessions. The Feature Dictionary also provides capabilities for complex queries across multiple domains using the supported relations. The Feature Dictionary supports three feature representations: (1) binary features, (2) continuous-valued features, and (3) derived features.
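A minimal sketch of the feature-dictionary idea: phrase equivalents resolve to one canonical feature, and each feature records its representation type. All feature and synonym names here are invented for illustration and are not taken from the actual MEDAS implementation.

```python
# Sketch of a MEDAS-style feature dictionary: synonyms from different
# specialties map to canonical features tagged as binary, continuous,
# or derived. Names are hypothetical.

FEATURES = {
    "fever": {"kind": "continuous", "unit": "degC"},
    "chest_pain": {"kind": "binary"},
    "tachycardia": {"kind": "derived", "from": ["heart_rate"]},
}

# Phrase equivalents used by different specialties/locations.
SYNONYMS = {
    "pyrexia": "fever",
    "elevated temperature": "fever",
    "thoracic pain": "chest_pain",
}

def canonical(phrase):
    """Resolve a clinician's phrase to the common feature name, or None."""
    phrase = phrase.lower().strip()
    return SYNONYMS.get(phrase, phrase if phrase in FEATURES else None)

print(canonical("Pyrexia"))  # fever
```

Cross-domain queries then operate on canonical names only, so two domain knowledge bases that use different phrasing still join on the same feature.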

  14. The Neuroscience Information Framework: A Data and Knowledge Environment for Neuroscience

    PubMed Central

    Akil, Huda; Ascoli, Giorgio A.; Bowden, Douglas M.; Bug, William; Donohue, Duncan E.; Goldberg, David H.; Grafstein, Bernice; Grethe, Jeffrey S.; Gupta, Amarnath; Halavi, Maryam; Kennedy, David N.; Marenco, Luis; Martone, Maryann E.; Miller, Perry L.; Müller, Hans-Michael; Robert, Adrian; Shepherd, Gordon M.; Sternberg, Paul W.; Van Essen, David C.; Williams, Robert W.

    2009-01-01

    With support from the Institutes and Centers forming the NIH Blueprint for Neuroscience Research, we have designed and implemented a new initiative for integrating access to and use of Web-based neuroscience resources: the Neuroscience Information Framework. The Framework arises from the expressed need of the neuroscience community for neuroinformatic tools and resources to aid scientific inquiry, builds upon prior development of neuroinformatics by the Human Brain Project and others, and directly derives from the Society for Neuroscience’s Neuroscience Database Gateway. Partnered with the Society, its Neuroinformatics Committee, and volunteer consultant-collaborators, our multi-site consortium has developed: (1) a comprehensive, dynamic inventory of Web-accessible neuroscience resources, (2) an extended and integrated terminology describing resources and contents, and (3) a framework accepting and aiding concept-based queries. Evolving instantiations of the Framework may be viewed at http://nif.nih.gov, http://neurogateway.org, and other sites as they come on line. PMID:18946742

  15. The neuroscience information framework: a data and knowledge environment for neuroscience.

    PubMed

    Gardner, Daniel; Akil, Huda; Ascoli, Giorgio A; Bowden, Douglas M; Bug, William; Donohue, Duncan E; Goldberg, David H; Grafstein, Bernice; Grethe, Jeffrey S; Gupta, Amarnath; Halavi, Maryam; Kennedy, David N; Marenco, Luis; Martone, Maryann E; Miller, Perry L; Müller, Hans-Michael; Robert, Adrian; Shepherd, Gordon M; Sternberg, Paul W; Van Essen, David C; Williams, Robert W

    2008-09-01

    With support from the Institutes and Centers forming the NIH Blueprint for Neuroscience Research, we have designed and implemented a new initiative for integrating access to and use of Web-based neuroscience resources: the Neuroscience Information Framework. The Framework arises from the expressed need of the neuroscience community for neuroinformatic tools and resources to aid scientific inquiry, builds upon prior development of neuroinformatics by the Human Brain Project and others, and directly derives from the Society for Neuroscience's Neuroscience Database Gateway. Partnered with the Society, its Neuroinformatics Committee, and volunteer consultant-collaborators, our multi-site consortium has developed: (1) a comprehensive, dynamic inventory of Web-accessible neuroscience resources, (2) an extended and integrated terminology describing resources and contents, and (3) a framework accepting and aiding concept-based queries. Evolving instantiations of the Framework may be viewed at http://nif.nih.gov, http://neurogateway.org, and other sites as they come on line.

  16. A systematic review of the effects of euthanasia and occupational stress in personnel working with animals in animal shelters, veterinary clinics, and biomedical research facilities.

    PubMed

    Scotney, Rebekah L; McLaughlin, Deirdre; Keates, Helen L

    2015-11-15

    The study of occupational stress and compassion fatigue in personnel working in animal-related occupations has gained momentum over the last decade. However, there remains incongruence in understanding what is currently termed compassion fatigue and the associated unique contributory factors. Furthermore, there is minimal established evidence of the likely influence of these conditions on the health and well-being of individuals working in various animal-related occupations. To assess currently available evidence and terminology regarding occupational stress and compassion fatigue in personnel working in animal shelters, veterinary clinics, and biomedical research facilities. Studies were identified by searching the following electronic databases with no publication date restrictions: ProQuest Research Library, ProQuest Social Science Journals, PsycARTICLES, Web of Science, Science Direct, Scopus, PsychINFO databases, and Google Scholar. Search terms included (euthanasia AND animals) OR (compassion fatigue AND animals) OR (occupational stress AND animals). Only articles published in English in peer-reviewed journals that included use of quantitative or qualitative techniques to investigate the incidence of occupational stress or compassion fatigue in the veterinary profession or animal-related occupations were included. On the basis of predefined criteria, 1 author extracted articles, and the data set was then independently reviewed by the other 2 authors. 12 articles met the selection criteria and included a variety of study designs and methods of data analysis. Seven studies evaluated animal shelter personnel, with the remainder evaluating veterinary nurses and technicians (2), biomedical research technicians (1), and personnel in multiple animal-related occupations (2). There was a lack of consistent terminology and agreed definitions for the articles reviewed. 
Personnel directly engaged in euthanasia reported significantly higher levels of work stress and lower levels of job satisfaction, which may have resulted in higher employee turnover, psychological distress, and other stress-related conditions. Results of this review suggested a high incidence of occupational stress and euthanasia-related strain in animal care personnel. The disparity of nomenclature and heterogeneity of research methods may contribute to general misunderstanding and confusion and impede the ability to generate high-quality evidence regarding the unique stressors experienced by personnel working with animals. The present systematic review provided insufficient foundation from which to identify consistent causal factors and outcomes to use as a basis for development of evidence-based stress management programs, and it highlights the need for further research.

  17. New concepts for building vocabulary for cell image ontologies.

    PubMed

    Plant, Anne L; Elliott, John T; Bhat, Talapady N

    2011-12-21

    There are significant challenges associated with the building of ontologies for cell biology experiments including the large numbers of terms and their synonyms. These challenges make it difficult to simultaneously query data from multiple experiments or ontologies. If vocabulary terms were consistently used and reused across and within ontologies, queries would be possible through shared terms. One approach to achieving this is to strictly control the terms used in ontologies in the form of a pre-defined schema, but this approach limits the individual researcher's ability to create new terms when needed to describe new experiments. Here, we propose the use of a limited number of highly reusable common root terms, and rules for an experimentalist to locally expand terms by adding more specific terms under more general root terms to form specific new vocabulary hierarchies that can be used to build ontologies. We illustrate the application of the method to build vocabularies and a prototype database for cell images that uses a visual data-tree of terms to facilitate sophisticated queries based on experimental parameters. We demonstrate how the terminology might be extended by adding new vocabulary terms into the hierarchy of terms in an evolving process. In this approach, image data and metadata are handled separately, so we also describe a robust file-naming scheme to unambiguously identify image and other files associated with each metadata value. The prototype database http://sbd.nist.gov/ consists of more than 2000 images of cells and benchmark materials, and 163 metadata terms that describe experimental details, including many details about cell culture and handling. Image files of interest can be retrieved, and their data can be compared, by choosing one or more relevant metadata values as search terms. Metadata values for any dataset can be compared with corresponding values of another dataset through logical operations.
Organizing metadata for cell imaging experiments under a framework of rules that include highly reused root terms will facilitate the addition of new terms into a vocabulary hierarchy and encourage the reuse of terms. These vocabulary hierarchies can be converted into XML schema or RDF graphs for displaying and querying, but this is not necessary for using it to annotate cell images. Vocabulary data trees from multiple experiments or laboratories can be aligned at the root terms to facilitate query development. This approach of developing vocabularies is compatible with the major advances in database technology and could be used for building the Semantic Web.
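The root-term idea above can be sketched as a small term hierarchy that is expanded locally and aligned at its roots. The terms and the parent-child rule below are illustrative; a real system would enforce the paper's full naming rules before conversion to XML schema or RDF.

```python
# Sketch: a vocabulary hierarchy grown from reusable root terms, where
# new, more specific terms are attached under more general ones.

PARENT = {}  # child term -> parent term

def add_term(term, parent=None):
    """Attach a new, more specific term under a more general one."""
    if parent is not None and parent not in PARENT:
        raise ValueError(f"unknown parent: {parent}")
    PARENT[term] = parent or "root"

def path_to_root(term):
    """Chain of increasingly general terms, used to align queries."""
    chain = [term]
    while PARENT.get(term, "root") != "root":
        term = PARENT[term]
        chain.append(term)
    return chain

add_term("cell")                  # highly reused root-level term
add_term("fibroblast", "cell")    # local, more specific expansion
add_term("NIH-3T3", "fibroblast")
print(path_to_root("NIH-3T3"))  # ['NIH-3T3', 'fibroblast', 'cell']
```

Because every specific term resolves to a shared root, data trees built independently in different laboratories can still be joined at "cell" for cross-experiment queries.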

  18. New concepts for building vocabulary for cell image ontologies

    PubMed Central

    2011-01-01

    Background There are significant challenges associated with the building of ontologies for cell biology experiments including the large numbers of terms and their synonyms. These challenges make it difficult to simultaneously query data from multiple experiments or ontologies. If vocabulary terms were consistently used and reused across and within ontologies, queries would be possible through shared terms. One approach to achieving this is to strictly control the terms used in ontologies in the form of a pre-defined schema, but this approach limits the individual researcher's ability to create new terms when needed to describe new experiments. Results Here, we propose the use of a limited number of highly reusable common root terms, and rules for an experimentalist to locally expand terms by adding more specific terms under more general root terms to form specific new vocabulary hierarchies that can be used to build ontologies. We illustrate the application of the method to build vocabularies and a prototype database for cell images that uses a visual data-tree of terms to facilitate sophisticated queries based on experimental parameters. We demonstrate how the terminology might be extended by adding new vocabulary terms into the hierarchy of terms in an evolving process. In this approach, image data and metadata are handled separately, so we also describe a robust file-naming scheme to unambiguously identify image and other files associated with each metadata value. The prototype database http://sbd.nist.gov/ consists of more than 2000 images of cells and benchmark materials, and 163 metadata terms that describe experimental details, including many details about cell culture and handling. Image files of interest can be retrieved, and their data can be compared, by choosing one or more relevant metadata values as search terms. Metadata values for any dataset can be compared with corresponding values of another dataset through logical operations.
Conclusions Organizing metadata for cell imaging experiments under a framework of rules that include highly reused root terms will facilitate the addition of new terms into a vocabulary hierarchy and encourage the reuse of terms. These vocabulary hierarchies can be converted into XML schema or RDF graphs for displaying and querying, but this is not necessary for using it to annotate cell images. Vocabulary data trees from multiple experiments or laboratories can be aligned at the root terms to facilitate query development. This approach of developing vocabularies is compatible with the major advances in database technology and could be used for building the Semantic Web. PMID:22188658

  19. COEUS: “semantic web in a box” for biomedical applications

    PubMed Central

    2012-01-01

    Background As the “omics” revolution unfolds, the growth in data quantity and diversity is bringing about the need for pioneering bioinformatics software, capable of significantly improving the research workflow. To cope with these computer science demands, biomedical software engineers are adopting emerging semantic web technologies that better suit the life sciences domain. The latter’s complex relationships are easily mapped into semantic web graphs, enabling a superior understanding of collected knowledge. Despite increased awareness of semantic web technologies in bioinformatics, their use is still limited. Results COEUS is a new semantic web framework, aiming at a streamlined application development cycle and following a “semantic web in a box” approach. The framework provides a single package including advanced data integration and triplification tools, base ontologies, a web-oriented engine and a flexible exploration API. Resources can be integrated from heterogeneous sources, including CSV and XML files or SQL and SPARQL query results, and mapped directly to one or more ontologies. Advanced interoperability features include REST services, a SPARQL endpoint and LinkedData publication. These enable the creation of multiple applications for web, desktop or mobile environments, and empower a new knowledge federation layer. Conclusions The platform, targeted at biomedical application developers, provides a complete skeleton ready for rapid application deployment, enhancing the creation of new semantic information systems. COEUS is available as open source at http://bioinformatics.ua.pt/coeus/. PMID:23244467

  20. COEUS: "semantic web in a box" for biomedical applications.

    PubMed

    Lopes, Pedro; Oliveira, José Luís

    2012-12-17

    As the "omics" revolution unfolds, the growth in data quantity and diversity is bringing about the need for pioneering bioinformatics software, capable of significantly improving the research workflow. To cope with these computer science demands, biomedical software engineers are adopting emerging semantic web technologies that better suit the life sciences domain. The latter's complex relationships are easily mapped into semantic web graphs, enabling a superior understanding of collected knowledge. Despite increased awareness of semantic web technologies in bioinformatics, their use is still limited. COEUS is a new semantic web framework, aiming at a streamlined application development cycle and following a "semantic web in a box" approach. The framework provides a single package including advanced data integration and triplification tools, base ontologies, a web-oriented engine and a flexible exploration API. Resources can be integrated from heterogeneous sources, including CSV and XML files or SQL and SPARQL query results, and mapped directly to one or more ontologies. Advanced interoperability features include REST services, a SPARQL endpoint and LinkedData publication. These enable the creation of multiple applications for web, desktop or mobile environments, and empower a new knowledge federation layer. The platform, targeted at biomedical application developers, provides a complete skeleton ready for rapid application deployment, enhancing the creation of new semantic information systems. COEUS is available as open source at http://bioinformatics.ua.pt/coeus/.
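A COEUS deployment exposes its knowledge through a SPARQL endpoint; the interaction can be sketched as below. The endpoint URL is a hypothetical placeholder (each COEUS instance defines its own service URL and ontology), and only the request URL is built, with no network call.

```python
# Sketch: preparing a query against a COEUS-style SPARQL endpoint.
from urllib.parse import urlencode

def sparql_request(endpoint, query):
    """Return the GET URL for a SPARQL query (no network call made)."""
    return f"{endpoint}?{urlencode({'query': query, 'format': 'json'})}"

QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?item ?label WHERE {
  ?item rdfs:label ?label .
} LIMIT 10
"""

url = sparql_request("http://example.org/coeus/sparql", QUERY)
```

The same query string works unchanged against any SPARQL 1.1 endpoint, which is what makes the framework's LinkedData publication interoperable with other semantic web tooling.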

  1. Biomedical Applications of Nanomaterials as Therapeutics.

    PubMed

    Ng, Cheng-Teng; Baeg, Gyeong-Hun; Yu, Liya E; Ong, Choon-Nam; Bay, Boon-Huat

    2018-01-01

    As nanomaterials possess attractive physicochemical properties, immense research efforts have been channeled towards their development for biological and biomedical applications. In particular, zinc oxide nanomaterials (nZnOs) have shown great potential for use in the medical and pharmaceutical fields, and as tools for novel antimicrobial treatment, thereby capitalizing on their unique antimicrobial effects. We conducted a literature search using databases to retrieve the relevant articles related to the synthesis, properties and current applications of nZnOs in the diagnosis and treatment of diseases. A total of 86 publications were selected for inclusion in this review. Besides studies on the properties and the methodology for the synthesis of nZnOs, many studies have focused on the application of nZnOs as delivery agents, biosensors and antimicrobial agents, as well as in bioimaging. This review gives an overview of the current development of nZnOs for their potential use as theranostic agents. However, more comprehensive studies are needed to better assess the valuable contributions and the safety of nZnOs in nanomedicine. Copyright © Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  2. Semantic similarity measure in biomedical domain leverage web search engine.

    PubMed

    Chen, Chi-Huang; Hsieh, Sheau-Ling; Weng, Yung-Ching; Chang, Wen-Yung; Lai, Feipei

    2010-01-01

    Semantic similarity measure plays an essential role in Information Retrieval and Natural Language Processing. In this paper we propose a page-count-based semantic similarity measure and apply it in biomedical domains. Previous research in semantic web related applications has deployed various semantic similarity measures. Despite the usefulness of the measurements in those applications, measuring semantic similarity between two terms remains a challenging task. The proposed method exploits page counts returned by the Web Search Engine. We define various similarity scores for two given terms P and Q, using the page counts for querying P, Q and P AND Q. Moreover, we propose a novel approach to compute semantic similarity using lexico-syntactic patterns with page counts. These different similarity scores are integrated by adapting support vector machines, to leverage the robustness of semantic similarity measures. Experimental results on two datasets achieve correlation coefficients of 0.798 on the dataset provided by A. Hliaoutakis, 0.705 on the dataset provided by T. Pedersen with physician scores and 0.496 on the dataset provided by T. Pedersen et al. with expert scores.
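Similarity scores of this family can be sketched from the three page counts the abstract names: H(P), H(Q), and H(P AND Q). The Jaccard- and PMI-style formulas below are standard page-count measures, not necessarily the exact ones the paper integrates, and the counts and index size N are made-up illustrations.

```python
# Sketch: page-count-based similarity between two terms P and Q,
# given hit counts hp = H(P), hq = H(Q), hpq = H(P AND Q).
import math

N = 10**10  # assumed number of pages indexed by the search engine

def web_jaccard(hp, hq, hpq):
    """Overlap of the two terms' result sets."""
    return 0.0 if hpq == 0 else hpq / (hp + hq - hpq)

def web_pmi(hp, hq, hpq):
    """Pointwise mutual information estimated from page counts."""
    if hpq == 0:
        return 0.0
    return math.log2((hpq / N) / ((hp / N) * (hq / N)))

# Hypothetical counts for two biomedical terms:
hp, hq, hpq = 1_200_000, 800_000, 90_000
scores = (web_jaccard(hp, hq, hpq), web_pmi(hp, hq, hpq))
```

In the paper's setup, several such scores become the feature vector that the support vector machine combines into one robust similarity estimate.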

  3. The importance of trace element speciation in biomedical science.

    PubMed

    Templeton, Douglas M

    2003-04-01

    According to IUPAC terminology, trace element speciation reflects differences in chemical composition at multiple levels from nuclear and electronic structure to macromolecular complexation. In the medical sciences, all levels of composition are important in various circumstances, and each can affect the bioavailability, distribution, physiological function, toxicity, diagnostic utility, and therapeutic potential of an element. Here we discuss, with specific examples, three biological principles in the intimate relation between speciation and biological behavior: i) the kinetics of interconversion of species determines distribution within the organism, ii) speciation governs transport across various biological barriers, and iii) speciation can limit potentially undesirable interactions between physiologically essential elements. We will also describe differences in the speciation of iron in states of iron overload, to illustrate how speciation analysis can provide insight into cellular processes in human disease.

  4. Comparison of health conditions treated with traditional and biomedical health care in a Quechua community in rural Bolivia.

    PubMed

    Vandebroek, Ina; Thomas, Evert; Sanca, Sabino; Van Damme, Patrick; Puyvelde, Luc Van; De Kimpe, Norbert

    2008-01-14

    The objective of the present study was to reveal patterns in the treatment of health conditions in a Quechua-speaking community in the Bolivian Andes based on plant use data from traditional healers and patient data from a primary health care (PHC) service, and to demonstrate similarities and differences between the type of illnesses treated with traditional and biomedical health care, respectively. A secondary analysis of plant use data from semi-structured interviews with eight healers was conducted and diagnostic data was collected from 324 patients in the community PHC service. Health conditions were ranked according to: (A) the percentage of patients in the PHC service diagnosed with these conditions; and (B) the citation frequency of plant use reports to treat these conditions by healers. Healers were also queried about the payment modalities they offer to their patients. Plant use reports from healers yielded 1166 responses about 181 medicinal plant species, which are used to treat 67 different health conditions, ranging from general symptoms (e.g. fever and body pain), to more specific ailments, such as arthritis, biliary colic and pneumonia. The results show that treatment offered by traditional medicine overlaps with biomedical health care in the case of respiratory infections, wounds and bruises, fever and biliary colic/cholecystitis. Furthermore, traditional health care appears to be complementary to biomedical health care for chronic illnesses, especially arthritis, and for folk illnesses that are particularly relevant within the local cultural context. Payment from patients to healers included flexible, outcome contingent and non-monetary options. Traditional medicine in the study area is adaptive because it corresponds well with local patterns of morbidity, health care needs in relation to chronic illnesses, cultural perceptions of health conditions and socio-economic aspects of health care. 
The quantitative analysis of plant use reports and patient data represents a novel approach to compare the contribution of traditional and biomedical health care to treatment of particular health conditions.

  5. Comparison of health conditions treated with traditional and biomedical health care in a Quechua community in rural Bolivia

    PubMed Central

    Vandebroek, Ina; Thomas, Evert; Sanca, Sabino; Van Damme, Patrick; Puyvelde, Luc Van; De Kimpe, Norbert

    2008-01-01

    Background The objective of the present study was to reveal patterns in the treatment of health conditions in a Quechua-speaking community in the Bolivian Andes based on plant use data from traditional healers and patient data from a primary health care (PHC) service, and to demonstrate similarities and differences between the type of illnesses treated with traditional and biomedical health care, respectively. Methods A secondary analysis of plant use data from semi-structured interviews with eight healers was conducted and diagnostic data was collected from 324 patients in the community PHC service. Health conditions were ranked according to: (A) the percentage of patients in the PHC service diagnosed with these conditions; and (B) the citation frequency of plant use reports to treat these conditions by healers. Healers were also queried about the payment modalities they offer to their patients. Results Plant use reports from healers yielded 1166 responses about 181 medicinal plant species, which are used to treat 67 different health conditions, ranging from general symptoms (e.g. fever and body pain), to more specific ailments, such as arthritis, biliary colic and pneumonia. The results show that treatment offered by traditional medicine overlaps with biomedical health care in the case of respiratory infections, wounds and bruises, fever and biliary colic/cholecystitis. Furthermore, traditional health care appears to be complementary to biomedical health care for chronic illnesses, especially arthritis, and for folk illnesses that are particularly relevant within the local cultural context. Payment from patients to healers included flexible, outcome contingent and non-monetary options. Conclusion Traditional medicine in the study area is adaptive because it corresponds well with local patterns of morbidity, health care needs in relation to chronic illnesses, cultural perceptions of health conditions and socio-economic aspects of health care. 
The quantitative analysis of plant use reports and patient data represents a novel approach to compare the contribution of traditional and biomedical health care to treatment of particular health conditions. PMID:18194568

  6. Development of an information retrieval tool for biomedical patents.

    PubMed

    Alves, Tiago; Rodrigues, Rúben; Costa, Hugo; Rocha, Miguel

    2018-06-01

    The volume of biomedical literature has been increasing in recent years. Patent documents have also followed this trend, being important sources of biomedical knowledge, technical details and curated data, which are put together along the granting process. The field of biomedical text mining (BioTM) has been creating solutions for the problems posed by the unstructured nature of natural language, which makes searching for information a challenging task. Several BioTM techniques can be applied to patents. From those, Information Retrieval (IR) includes processes where relevant data are obtained from collections of documents. In this work, the main goal was to build a patent pipeline addressing IR tasks over patent repositories to make these documents amenable to BioTM tasks. The pipeline was developed within @Note2, an open-source computational framework for BioTM, adding a number of modules to the core libraries, including patent metadata and full text retrieval, PDF to text conversion and optical character recognition. Also, user interfaces were developed for the main operations, materialized in a new @Note2 plug-in. The integration of these tools in @Note2 opens opportunities to run BioTM tools over patent texts, including tasks from Information Extraction, such as Named Entity Recognition or Relation Extraction. We demonstrated the pipeline's main functions with a case study, using an available benchmark dataset from BioCreative challenges. Also, we show the use of the plug-in with a user query related to the production of vanillin. This work makes all the relevant content from patents available to the scientific community, drastically decreasing the time required for this task, and provides graphical interfaces to ease the use of these tools. Copyright © 2018 Elsevier B.V. All rights reserved.
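The retrieval step at the heart of such a pipeline can be sketched as ranking patent texts against a user query, here with a plain term-frequency score standing in for whatever weighting @Note2 actually uses; the documents and query echo the vanillin use case but are invented.

```python
# Sketch: scoring and ranking patent texts against a keyword query.
from collections import Counter

def tokenize(text):
    return [w.lower().strip(".,;()") for w in text.split()]

def score(query, doc):
    """Sum of query-term frequencies in the document."""
    tf = Counter(tokenize(doc))
    return sum(tf[t] for t in tokenize(query))

docs = {
    "P1": "Process for the biosynthesis of vanillin from ferulic acid.",
    "P2": "Apparatus for optical character recognition of scanned patents.",
}
ranked = sorted(docs, key=lambda d: score("vanillin production", docs[d]),
                reverse=True)
print(ranked)  # P1 ranks first for the vanillin query
```

A production pipeline would add the steps the abstract lists upstream of this: metadata and full-text retrieval, PDF-to-text conversion, and OCR for scanned pages.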

  7. Architecture for knowledge-based and federated search of online clinical evidence.

    PubMed

    Coiera, Enrico; Walther, Martin; Nguyen, Ken; Lovell, Nigel H

    2005-10-24

    It is increasingly difficult for clinicians to keep up-to-date with the rapidly growing biomedical literature. Online evidence retrieval methods are now seen as a core tool to support evidence-based health practice. However, standard search engine technology is not designed to manage the many different types of evidence sources that are available or to handle the very different information needs of various clinical groups, who often work in widely different settings. The objectives of this paper are (1) to describe the design considerations and system architecture of a wrapper-mediator approach to federated search system design, including the use of knowledge-based, meta-search filters, and (2) to analyze the implications of system design choices on performance measurements. A trial was performed to evaluate the technical performance of a federated evidence retrieval system, which provided access to eight distinct online resources, including e-journals, PubMed, and electronic guidelines. The Quick Clinical system architecture utilized a universal query language to reformulate queries internally and utilized meta-search filters to optimize search strategies across resources. We recruited 227 family physicians from across Australia who used the system to retrieve evidence in a routine clinical setting over a 4-week period. The total search time for a query was recorded, along with the duration of individual queries sent to different online resources. Clinicians performed 1662 searches over the trial. The average search duration was 4.9 +/- 3.2 s (N = 1662 searches). Mean search duration to the individual sources was between 0.05 s and 4.55 s. Average system time (ie, system overhead) was 0.12 s. The relatively small system overhead compared to the average time it takes to perform a search for an individual source shows that the system achieves a good trade-off between performance and reliability.
Furthermore, despite the additional effort required to incorporate the capabilities of each individual source (to improve the quality of search results), system maintenance requires only a small additional overhead.
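    The wrapper-mediator pattern described above can be sketched as follows. The wrapper names and per-source query syntaxes are invented for illustration; only the idea of reformulating one universal query per registered source (and timing each reformulation) comes from the abstract.

```python
import time

# Hypothetical wrappers: each adapts the universal query to one source's
# syntax. Names and syntaxes are illustrative, not Quick Clinical's actual
# interfaces.
def pubmed_wrapper(terms):
    return " AND ".join(f"{t}[tiab]" for t in terms)

def guideline_wrapper(terms):
    return "+".join(terms)

WRAPPERS = {"pubmed": pubmed_wrapper, "guidelines": guideline_wrapper}

def mediate(universal_query):
    """Reformulate one universal query for every registered source."""
    terms = universal_query.lower().split()
    plans = {}
    for source, wrap in WRAPPERS.items():
        start = time.perf_counter()
        plans[source] = wrap(terms)
        # per-source overhead could be accumulated here, as in the trial
        _ = time.perf_counter() - start
    return plans

plans = mediate("asthma children")
```

    Adding a ninth source would mean registering one more wrapper, which matches the paper's point that per-source effort adds only a small maintenance overhead.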

  8. A unified framework for managing provenance information in translational research

    PubMed Central

    2011-01-01

    Background A critical aspect of the NIH Translational Research roadmap, which seeks to accelerate the delivery of "bench-side" discoveries to the patient's "bedside," is the management of the provenance metadata that keeps track of the origin and history of data resources as they traverse the path from the bench to the bedside and back. A comprehensive provenance framework is essential for researchers to verify the quality of data, reproduce scientific results published in peer-reviewed literature, validate the scientific process, and associate trust value with data and results. Traditional approaches to provenance management have focused on only partial sections of the translational research life cycle, and they do not incorporate "domain semantics", which is essential to support domain-specific querying and analysis by scientists. Results We identify a common set of challenges in managing provenance information across the pre-publication and post-publication phases of data in the translational research lifecycle. We define the semantic provenance framework (SPF), underpinned by the Provenir upper-level provenance ontology, to address these challenges in the four stages of provenance metadata: (a) provenance collection, during data generation; (b) provenance representation, to support interoperability and reasoning and to incorporate domain semantics; (c) provenance storage and propagation, to allow efficient storage and seamless propagation of provenance as the data is transferred across applications; and (d) provenance query, to support queries of increasing complexity over large data sizes and to support knowledge discovery applications. We apply the SPF to two exemplar translational research projects, namely the Semantic Problem Solving Environment for Trypanosoma cruzi (T.cruzi SPSE) and the Biomedical Knowledge Repository (BKR) project, to demonstrate its effectiveness. 
Conclusions The SPF provides a unified framework to effectively manage the provenance of translational research data during the pre- and post-publication phases. This framework is underpinned by an upper-level provenance ontology called Provenir that is extended to create domain-specific provenance ontologies, facilitating provenance interoperability, seamless propagation of provenance, automated querying, and analysis. PMID:22126369

  9. Rcupcake: an R package for querying and analyzing biomedical data through the BD2K PIC-SURE RESTful API.

    PubMed

    Gutiérrez-Sacristán, Alba; Guedj, Romain; Korodi, Gabor; Stedman, Jason; Furlong, Laura I; Patel, Chirag J; Kohane, Isaac S; Avillach, Paul

    2018-04-15

    In the era of big data and precision medicine, the number of databases containing clinical, environmental, self-reported and biochemical variables is increasing exponentially. Enabling experts to focus on their research questions rather than on computational data management, access and analysis is one of the most significant challenges nowadays. We present Rcupcake, an R package that contains a variety of functions for leveraging different databases through the BD2K PIC-SURE RESTful API and facilitating query, analysis and interpretation of the retrieved data. The package offers a variety of analysis and visualization tools, including the study of phenotype co-occurrence and prevalence, according to multiple layers of data, such as the phenome, exposome or genome. The package is implemented in R and is available under the Mozilla v2 license from GitHub (https://github.com/hms-dbmi/Rcupcake). Two reproducible case studies are also available (https://github.com/hms-dbmi/Rcupcake-case-studies/blob/master/SSCcaseStudy_v01.ipynb, https://github.com/hms-dbmi/Rcupcake-case-studies/blob/master/NHANEScaseStudy_v01.ipynb). Contact: paul_avillach@hms.harvard.edu. Supplementary data are available at Bioinformatics online.

  10. A network medicine approach to quantify distance between hereditary disease modules on the interactome

    NASA Astrophysics Data System (ADS)

    Caniza, Horacio; Romero, Alfonso E.; Paccanaro, Alberto

    2015-12-01

    We introduce a MeSH-based method that accurately quantifies similarity between heritable diseases at molecular level. This method effectively brings together the existing information about diseases that is scattered across the vast corpus of biomedical literature. We prove that sets of MeSH terms provide a highly descriptive representation of heritable disease and that the structure of MeSH provides a natural way of combining individual MeSH vocabularies. We show that our measure can be used effectively in the prediction of candidate disease genes. We developed a web application to query more than 28.5 million relationships between 7,574 hereditary diseases (96% of OMIM) based on our similarity measure.
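    As a rough illustration of the idea that sets of MeSH terms can represent diseases, the sketch below scores disease pairs by information-content-weighted term overlap. This is a simplification: the paper's measure exploits the structure of the MeSH hierarchy, and the annotations here are invented.

```python
import math

# Toy corpus: diseases annotated with MeSH terms (illustrative, not OMIM data).
annotations = {
    "disease_A": {"Cardiomyopathies", "Arrhythmias", "Heart Failure"},
    "disease_B": {"Cardiomyopathies", "Heart Failure", "Muscular Diseases"},
    "disease_C": {"Neoplasms", "Mutation"},
}

def information_content(term, corpus):
    """Rarer terms are more informative: IC = -log p(term)."""
    freq = sum(term in terms for terms in corpus.values())
    return -math.log(freq / len(corpus))

def similarity(d1, d2, corpus):
    """IC-weighted overlap of the two diseases' MeSH term sets
    (a set-based stand-in for structure-aware MeSH measures)."""
    shared = corpus[d1] & corpus[d2]
    union = corpus[d1] | corpus[d2]
    ic = lambda s: sum(information_content(t, corpus) for t in s)
    return ic(shared) / ic(union) if union else 0.0
```

    Two cardiac diseases sharing informative terms score well above an unrelated pair, which is the property candidate-gene prediction relies on.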

  11. Using the Weighted Keyword Model to Improve Information Retrieval for Answering Biomedical Questions

    PubMed Central

    Yu, Hong; Cao, Yong-gang

    2009-01-01

    Physicians ask many complex questions during the patient encounter. Information retrieval systems that can provide immediate and relevant answers to these questions can be invaluable aids to the practice of evidence-based medicine. In this study, we first automatically identify topic keywords from ad hoc clinical questions with a Conditional Random Field model that is trained over thousands of manually annotated clinical questions. We then report on a linear model that assigns weights to query terms based on their automatically identified semantic roles: topic keywords, domain-specific terms, and their synonyms. Our evaluation shows that this weighted keyword model improves information retrieval from the Text Retrieval Conference Genomics track data. PMID:21347188
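    The weighted keyword model can be pictured as a linear scorer over query terms grouped by semantic role. The role weights and the example query below are invented placeholders; in the paper the weights are learned, not hand-picked.

```python
# Illustrative role weights (the actual linear model's weights are learned).
ROLE_WEIGHTS = {"topic_keyword": 3.0, "domain_term": 2.0, "synonym": 1.0}

def score(document_tokens, query_terms):
    """Weighted-keyword score: each matched query term contributes
    a weight determined by its semantic role."""
    tokens = set(document_tokens)
    return sum(ROLE_WEIGHTS[role] for term, role in query_terms if term in tokens)

query = [("nsclc", "topic_keyword"), ("chemotherapy", "domain_term"),
         ("carcinoma", "synonym")]
doc = "first line chemotherapy options in advanced nsclc".split()
doc_score = score(doc, query)
```

    A document matching the topic keyword outranks one matching only a synonym, which is the intended effect of role-based weighting.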

  13. From wholes to fragments to wholes-what gets lost in translation?

    PubMed

    Kirkengen, Anna Luise

    2018-05-31

    The highly demanding and, in a certain sense, unique, working conditions of general practitioners (GPs) are characterized by two phenomena: First, they involve an increasing familiarity with individual patients over time, which promotes a deepening of insight. Second, they enable the GP to encounter all kinds of health problems, which in turn facilitates pattern recognition, at both individual and group levels, particularly the kind of patterns currently termed "multimorbidity." Whereas the term "comorbidity" is used to denote states of bad health in which 1 disease is considered to predate and evoke other ailments or diseases, the term multimorbidity is applied when finding several presumably separate diseases in a person who suffers from them either sequentially or simultaneously. Encounters with patients whose suffering fits the biomedical concept and terminology of multimorbidity are among the most common which GPs face, presenting them with some of their most demanding tasks. The term multimorbidity needs to be examined, however. As it alludes to a multiplicity of diseases, it rests on an assumption of separateness of states of bad health that might not be well founded. An adequate determination of what to deem a "separate" state of bad health would require that the biomedical concept of causation be scrutinized. © 2018 John Wiley & Sons, Ltd.

  14. Characterizing upper urinary tract dilation on ultrasound: a survey of North American pediatric radiologists' practices.

    PubMed

    Swenson, David W; Darge, Kassa; Ziniel, Sonja I; Chow, Jeanne S

    2015-04-01

    Radiologists commonly evaluate children first diagnosed with urinary tract dilation on prenatal ultrasound (US). To establish how North American pediatric radiologists define and report findings of urinary tract dilation on US. A web-based survey was sent to North American members of the Society for Pediatric Radiology (SPR) from January to February 2014. Reporting practices and interpretation of three image-based cases using free text were queried. Responses to close-ended questions were analyzed with descriptive statistics, while free-text responses to the three cases were categorized and analyzed according to (1) whether they used descriptive terminology or an established numerical grading system and (2) whether they provided a quantitative term for the degree of dilation. Two hundred eighty-four pediatric radiologists answered the survey, resulting in a response rate of 19.0%. There is great variety in the terms used to describe urinary tract dilation, with 66.2% using descriptive terminology, 35.6% using the Society for Fetal Urology (SFU) grading system and 35.9% measuring the anterior-posterior diameter (APD) of the renal pelvis. There is no consensus for a normal postnatal APD or the meaning of hydronephrosis. For the same images, descriptions vary widely in degree of severity, ranging from normal to mild to severe. Similar variability exists among those using the SFU system. Ninety-seven percent believe a unified descriptive system would be helpful and 87.7% would use it if available. Pediatric radiologists do not have a standardized method for describing urinary tract dilation but have a great desire for such a system and would follow it if available.

  15. Effect of Obesity on Complication Rate After Elbow Arthroscopy in a Medicare Population.

    PubMed

    Werner, Brian C; Fashandi, Ahmad H; Chhabra, A Bobby; Deal, D Nicole

    2016-03-01

    To use a national insurance database to explore the association of obesity with the incidence of complications after elbow arthroscopy in a Medicare population. Using Current Procedural Terminology (CPT) and International Classification of Diseases, 9th Revision (ICD-9) procedure codes, we queried the PearlDiver database for patients undergoing elbow arthroscopy. Patients were divided into obese (body mass index [BMI] >30) and nonobese (BMI <30) cohorts using ICD-9 codes for BMI and obesity. Nonobese patients were matched to obese patients based on age, sex, tobacco use, diabetes, and rheumatoid arthritis. Postoperative complications were assessed with ICD-9 and CPT codes, including infection, nerve injury, stiffness, and medical complications. A total of 2,785 Medicare patients who underwent elbow arthroscopy were identified from 2005 to 2012; 628 patients (22.5%) were coded as obese or morbidly obese, and 628 matched nonobese patients formed the control group. There were no differences between the obese patients and matched nonobese controls regarding type of elbow arthroscopy, previous elbow fracture, or previous elbow arthroscopy. Obese patients had greater rates of all assessed complications, including infection (odds ratio [OR] 2.8, P = .037), nerve injury (OR 5.4, P = .001), stiffness (OR 1.9, P = .016) and medical complications (OR 6.9, P < .0001). Obesity is associated with significantly increased rates of all assessed complications after elbow arthroscopy in a Medicare population, including infection, nerve injury, stiffness, and medical complications. Therapeutic Level III, case-control study. Copyright © 2016 Arthroscopy Association of North America. Published by Elsevier Inc. All rights reserved.

  16. OntoADR a semantic resource describing adverse drug reactions to support searching, coding, and information retrieval.

    PubMed

    Souvignet, Julien; Declerck, Gunnar; Asfari, Hadyl; Jaulent, Marie-Christine; Bousquet, Cédric

    2016-10-01

    Searching and coding in databases that use terminological resources require that those resources support efficient data retrieval. The Medical Dictionary for Regulatory Activities (MedDRA) is a reference terminology used by several countries and organizations to code adverse drug reactions (ADRs) for pharmacovigilance. Ontologies that are available in the medical domain provide several advantages, such as reasoning to improve data retrieval. The field of pharmacovigilance does not yet benefit from a fully operational ontology to formally represent the MedDRA terms. Our objective was to build OntoADR, a semantic resource based on formal description logic, to improve MedDRA term retrieval and aid the generation of on-demand custom groupings by appropriately and efficiently selecting terms. The method consists of the following steps: (1) mapping between MedDRA terms and SNOMED-CT, (2) generation of semantic definitions using semi-automatic methods, (3) storage of the resource and (4) manual curation by pharmacovigilance experts. We built a semantic resource for ADRs enabling a new type of semantics-based term search. OntoADR adds new search capabilities relative to previous approaches, overcoming the usual limitations of computation using lightweight description logic, such as the intractability of unions or negation queries, bringing it closer to user needs. Our automated approach for defining MedDRA terms enabled the association of at least one defining relationship with 67% of preferred terms. The curation work performed on our sample showed an error level of 14% for this automated approach. We tested OntoADR in practice, which allowed us to build custom groupings for several medical topics of interest. The methods we describe in this article could be adapted and extended to other terminologies which do not benefit from a formal semantic representation, thus enabling better data retrieval performance. 
Our custom groupings of MedDRA terms were used while performing signal detection, which suggests that the graphical user interface we are currently implementing to process OntoADR could be usefully integrated into specialized pharmacovigilance software that rely on MedDRA. Copyright © 2016 Elsevier Inc. All rights reserved.

  17. Minimizing the semantic gap in biomedical content-based image retrieval

    NASA Astrophysics Data System (ADS)

    Guan, Haiying; Antani, Sameer; Long, L. Rodney; Thoma, George R.

    2010-03-01

    A major challenge in biomedical Content-Based Image Retrieval (CBIR) is to achieve meaningful mappings that minimize the semantic gap between the high-level biomedical semantic concepts and the low-level visual features in images. This paper presents a comprehensive learning-based scheme toward meeting this challenge and improving retrieval quality. The article presents two algorithms: a learning-based feature selection and fusion algorithm and the Ranking Support Vector Machine (Ranking SVM) algorithm. The feature selection algorithm aims to select 'good' features and fuse them using different similarity measurements to provide a better representation of the high-level concepts with the low-level image features. Ranking SVM is applied to learn the retrieval rank function and associate the selected low-level features with query concepts, given the ground-truth ranking of the training samples. The proposed scheme addresses four major issues in CBIR to improve retrieval accuracy: image feature extraction, selection, and fusion; similarity measurements; the association of low-level features with high-level concepts; and the generation of the rank function to support high-level semantic image retrieval. It models the relationship between semantic concepts and image features, and enables retrieval at the semantic level. We apply it to the problem of vertebra shape retrieval from a digitized spine x-ray image set collected by the second National Health and Nutrition Examination Survey (NHANES II). The experimental results show an improvement of up to 41.92% in the mean average precision (MAP) over conventional image similarity computation methods.
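    The core trick behind Ranking SVM, learning a rank function from ground-truth orderings, can be sketched via the standard pairwise reduction: every correctly ordered pair of training samples contributes one feature-difference example. A simple perceptron update stands in for the SVM solver here, and the toy feature vectors are invented.

```python
# Pairwise reduction behind Ranking SVM: learning to rank becomes binary
# classification on pairwise feature differences (perceptron in place of the
# SVM solver; illustrative only).

def pairwise_differences(samples):
    """samples: list of (features, relevance). Yield (x_i - x_j) whenever
    item i is ranked above item j in the ground truth."""
    for xi, ri in samples:
        for xj, rj in samples:
            if ri > rj:
                yield [a - b for a, b in zip(xi, xj)]

def learn_rank_weights(samples, epochs=50, lr=0.1):
    w = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        for diff in pairwise_differences(samples):
            if sum(wi * di for wi, di in zip(w, diff)) <= 0:  # pair misordered
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w

# Toy training data: (low-level feature vector, ground-truth relevance).
train = [([0.9, 0.2], 2), ([0.5, 0.1], 1), ([0.1, 0.8], 0)]
w = learn_rank_weights(train)
rank_score = lambda x: sum(wi * xi for wi, xi in zip(w, x))
```

    After training, `rank_score` orders unseen feature vectors; associating such scores with query concepts is what enables retrieval at the semantic level.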

  18. Learning to rank diversified results for biomedical information retrieval from multiple features.

    PubMed

    Wu, Jiajin; Huang, Jimmy; Ye, Zheng

    2014-01-01

    Different from traditional information retrieval (IR), promoting diversity in IR takes into consideration the relationships between documents in order to promote novelty and reduce redundancy, thus providing diversified results that satisfy various user intents. Diversity is especially important in biomedical IR, as biologists sometimes want diversified results pertinent to their query. A combined learning-to-rank (LTR) framework is learned through a general ranking model (gLTR) and a diversity-biased model. The former is learned from general ranking features by a conventional learning-to-rank approach; the latter is constructed with added diversity-indicating features, which are extracted based on the topics of the retrieved passages, detected using Wikipedia, and the ranking order produced by the general learning-to-rank model; final ranking results are given by the combination of both models. Compared with the baselines BM25 and DirKL on the 2006 and 2007 collections, the gLTR achieves an aspect-level mean average precision (Aspect MAP) of 0.2292 (+16.23% and +44.1% improvement over BM25 and DirKL, respectively) and 0.1873 (+15.78% and +39.0% improvement over BM25 and DirKL, respectively). The LTR method outperforms gLTR on the 2006 and 2007 collections with 4.7% and 2.4% improvement in terms of Aspect MAP. The learning-to-rank method is an effective approach for biomedical information retrieval, and the diversity-biased features are beneficial for promoting diversity in ranking results.
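    A greedy re-ranking sketch in the spirit of the combined framework: a general relevance score is interpolated with a diversity signal based on topic novelty. The interpolation weight, document names, and topic sets below are invented placeholders; the paper learns its combination rather than hand-tuning it.

```python
# Greedy diversified re-ranking: pick, at each step, the document maximizing
# an interpolation of general relevance and topic novelty (illustrative).

def diversified_rank(candidates, lam=0.7):
    """candidates: dict doc -> (relevance_score, set of topic labels).
    Greedily pick the doc maximizing lam*relevance + (1-lam)*novelty."""
    ranked, covered = [], set()
    remaining = dict(candidates)
    while remaining:
        def combined(doc):
            rel, topics = remaining[doc]
            novelty = len(topics - covered) / max(len(topics), 1)
            return lam * rel + (1 - lam) * novelty
        best = max(remaining, key=combined)
        ranked.append(best)
        covered |= remaining.pop(best)[1]
    return ranked

docs = {
    "d1": (0.9, {"gene_regulation"}),
    "d2": (0.85, {"gene_regulation"}),
    "d3": (0.6, {"protein_binding"}),
}
order = diversified_rank(docs)
```

    Here the diversity term promotes the less relevant but topic-novel d3 above the redundant d2, which is exactly the novelty-over-redundancy trade-off the abstract describes.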

  19. Gaps within the Biomedical Literature: Initial Characterization and Assessment of Strategies for Discovery

    PubMed Central

    Peng, Yufang; Bonifield, Gary; Smalheiser, Neil R.

    2017-01-01

    Within well-established fields of biomedical science, we identify “gaps”, topical areas of investigation that might be expected to occur but are missing. We define a field by carrying out a topical PubMed query, and analyze the Medical Subject Headings (MeSH terms) by which the set of retrieved articles is indexed. MeSH terms which occur in >1% of the articles are examined pairwise to see how often they are predicted to co-occur within individual articles (assuming that they are independent of each other). A pair of MeSH terms that are predicted to co-occur in at least 10 articles, yet are not observed to co-occur in any article, is a “gap”; such pairs were studied further in a corpus of 10 disease-related article sets and 10 related to biological processes. Overall, articles that filled gaps were cited more heavily than non-gap-filling articles and were 61% more likely to be published in multidisciplinary high-impact journals. Nine different features of these “gaps” were characterized and tested to learn which, if any, correlate with the appearance of one or more articles containing both MeSH terms within the next five years. Several different types of gaps were identified, each having distinct combinations of predictive features: a) those arising as a byproduct of MeSH indexing rules; b) those having little biological meaning; c) those representing “low hanging fruit” for immediate exploitation; and d) those representing gaps across disciplines or sub-disciplines that do not talk to each other or work together. We have built a free, open tool called “Mine the Gap!” that identifies and characterizes the “gaps” for any PubMed query, which can be accessed via the Anne O’Tate value-added PubMed search interface (http://arrowsmith.psych.uic.edu/cgi-bin/arrowsmith_uic/AnneOTate.cgi). PMID:29271976
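    The independence-based gap test lends itself to a direct sketch. The article counts below are invented; the decision rule, an expected co-occurrence of at least 10 articles against an observed co-occurrence of zero, follows the abstract.

```python
# Gap test sketch: under independence, two MeSH terms attached to fractions
# p_a and p_b of N articles are expected to co-occur in N * p_a * p_b of them.
# A "gap" is a pair expected to co-occur in >= 10 articles but observed in none.

def find_gaps(n_articles, term_counts, observed_cooccurrence, threshold=10):
    terms = sorted(term_counts)
    gaps = []
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            expected = (n_articles
                        * (term_counts[a] / n_articles)
                        * (term_counts[b] / n_articles))
            if expected >= threshold and observed_cooccurrence.get((a, b), 0) == 0:
                gaps.append((a, b))
    return gaps

# Toy field of 10,000 articles (counts are illustrative).
counts = {"Apoptosis": 800, "Autophagy": 500, "Biomarkers": 300}
observed = {("Apoptosis", "Autophagy"): 120}  # pairs absent here co-occur in 0
gaps = find_gaps(10_000, counts, observed)
```

    With these counts, Apoptosis-Autophagy is expected 40 times and seen 120 times (no gap), while both pairs involving Biomarkers are expected 24 and 15 times yet never observed, so they surface as gaps.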

  20. [Comparison of Japanese Notation and Meanings among Three Terminologies in Radiological Technology Domain].

    PubMed

    Yagahara, Ayako; Tsuji, Shintaro; Hukuda, Akihisa; Nishimoto, Naoki; Ogasawara, Katsuhiko

    2016-03-01

    The purpose of this study is to investigate the differences in the notation of technical terms and their meanings among three terminologies in Japanese radiology-related societies. The three terminologies compared in this study were "radiological technology terminology" and its supplement published by the Japan Society of Radiological Technology, "medical physics terminology" published by the Japan Society of Medical Physics, and "electric radiation terminology" published by the Japan Radiological Society. Terms were entered into spreadsheets and classified into the following three categories: Japanese notation, English notation, and meanings. In the English notation, terms were matched as character strings across the three terminologies and the matches were extracted and compared. The Japanese notations were then compared among the three terminologies, and the meanings were compared between two of the terminologies: radiological technology terminology and electric radiation terminology. There were a total of 14,982 terms in the three terminologies. In English character strings, 2,735 terms were matched in at least two terminologies, with 801 of these terms matched in all three. Of the terms matched in all three terminologies by English character strings, 752 also matched in Japanese character strings. Of the terms matched in exactly two terminologies, 1,240 also matched in Japanese character strings. With regard to the meanings category, eight terms had mismatched meanings between the two terminologies. For these terms, the two definitions shared a common underlying concept, and the divergent meanings appear to be domain-specific derivations of that concept.
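    The matching step, counting English-notation strings shared across the three term lists, reduces to set operations. The miniature term lists below are placeholders, not the societies' actual vocabularies.

```python
# Count terms shared by at least two, and by all three, terminologies
# (string matching on English notation, as in the study).

def overlap_counts(*terminologies):
    all_terms = set().union(*terminologies)
    in_two_or_more = sum(
        1 for t in all_terms
        if sum(t in term_set for term_set in terminologies) >= 2)
    in_all = len(set.intersection(*terminologies))
    return in_two_or_more, in_all

# Hypothetical mini-vocabularies standing in for the three societies' lists.
jsrt = {"absorbed dose", "attenuation", "grid"}
jsmp = {"absorbed dose", "attenuation", "monte carlo"}
jrs = {"absorbed dose", "grid", "contrast"}
shared, everywhere = overlap_counts(jsrt, jsmp, jrs)
```

    The same intersection logic, applied to Japanese notation within each matched group, yields the 752/1,240 figures reported above.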

  1. GOVERNING GENETIC DATABASES: COLLECTION, STORAGE AND USE

    PubMed Central

    Gibbons, Susan M.C.; Kaye, Jane

    2008-01-01

    This paper provides an introduction to a collection of five papers, published as a special symposium journal issue, under the title: “Governing Genetic Databases: Collection, Storage and Use”. It begins by setting the scene, to provide a backdrop and context for the papers. It describes the evolving scientific landscape around genetic databases and genomic research, particularly within the biomedical and criminal forensic investigation fields. It notes the lack of any clear, coherent or coordinated legal governance regime, either at the national or international level. It then identifies and reflects on key cross-cutting issues and themes that emerge from the five papers, in particular: terminology and definitions; consent; special concerns around population genetic databases (biobanks) and forensic databases; international harmonisation; data protection; data access; boundary-setting; governance; and issues around balancing individual interests against public good values. PMID:18841252

  2. Use of NLM medical subject headings with the MeSH2010 thesaurus in the PORTAL-DOORS system.

    PubMed

    Taswell, Carl

    2010-01-01

    The NLM MeSH Thesaurus has been incorporated for use in the PORTAL-DOORS System (PDS) for resource metadata management on the semantic web. All 25,588 descriptor records from the NLM 2010 MeSH Thesaurus have been exposed as web-accessible resources by the PDS MeSH2010 Thesaurus, implemented as a PDS PORTAL Registry operating as a RESTful web service. Examples of records from the PDS MeSH2010 PORTAL are demonstrated along with their use by records in other PDS PORTAL Registries that reference the concepts from the MeSH2010 Thesaurus. Use of this important biomedical terminology will greatly enhance the quality of the metadata content of other PDS records, thus improving cross-domain searches between different problem-oriented domains and amongst different clinical specialty fields.

  3. Molecular genetics made simple

    PubMed Central

    Kassem, Heba Sh.; Girolami, Francesca; Sanoudou, Despina

    2012-01-01

    Abstract Genetics has undoubtedly become an integral part of biomedical science and clinical practice, with important implications in deciphering disease pathogenesis and progression, identifying diagnostic and prognostic markers, as well as designing better targeted treatments. The exponential growth of our understanding of different genetic concepts is paralleled by a growing list of genetic terminology that can easily intimidate the unfamiliar reader. Rendering genetics incomprehensible to the clinician, however, defeats the very essence of genetic research: its utilization for combating disease and improving quality of life. Herein we attempt to correct this notion by presenting the basic genetic concepts along with their usefulness in the cardiology clinic. Bringing genetics closer to the clinician will enable its harmonious incorporation into clinical care, thus not only restoring our perception of its simple and elegant nature, but importantly ensuring the maximal benefit for our patients. PMID:25610837

  4. SchizConnect: Mediating Neuroimaging Databases on Schizophrenia and Related Disorders for Large-Scale Integration

    PubMed Central

    Wang, Lei; Alpert, Kathryn I.; Calhoun, Vince D.; Cobia, Derin J.; Keator, David B.; King, Margaret D.; Kogan, Alexandr; Landis, Drew; Tallis, Marcelo; Turner, Matthew D.; Potkin, Steven G.; Turner, Jessica A.; Ambite, Jose Luis

    2015-01-01

    SchizConnect (www.schizconnect.org) is built to address the issues of multiple data repositories in schizophrenia neuroimaging studies. It includes a level of mediation—translating across data sources—so that the user can place one query, e.g. for diffusion images from male individuals with schizophrenia, find out from across participating data sources how many datasets there are, and download the imaging and related data. The current version handles the Data Usage Agreements across different studies, as well as interpreting database-specific terminologies into a common framework. New data repositories can also be mediated to bring immediate access to existing datasets. Compared with centralized, upload-based data sharing models, SchizConnect is a unique, virtual database with a focus on schizophrenia and related disorders that can mediate live data as information is updated at each data source. It is our hope that SchizConnect can facilitate testing new hypotheses through aggregated datasets, promoting discovery related to the mechanisms underlying schizophrenic dysfunction. PMID:26142271

  5. Using RDF and Git to Realize a Collaborative Metadata Repository.

    PubMed

    Stöhr, Mark R; Majeed, Raphael W; Günther, Andreas

    2018-01-01

    The German Center for Lung Research (DZL) is a research network with the aim of researching respiratory diseases. The participating study sites' register data differ in software and coding system as well as in data field coverage. To perform meaningful consortium-wide queries through a single interface, a uniform conceptual structure is required covering the DZL common data elements. No single existing terminology includes all our concepts. Potential candidates such as LOINC and SNOMED only cover specific subject areas or are not granular enough for our needs. To achieve a broadly accepted and complete ontology, we developed a platform for collaborative metadata management. The DZL data management group formulated detailed requirements regarding the metadata repository and the user interfaces for metadata editing. Our solution builds upon existing standard technologies allowing us to meet those requirements. Its key parts are RDF and the distributed version control system Git. We developed a software system to publish updated metadata automatically and immediately after performing validation tests for completeness and consistency.
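    The completeness check that gates publication can be sketched over plain subject-predicate-object triples. An RDF store behind the repository would play the same role; the property names and data elements below are hypothetical placeholders, not DZL's actual schema.

```python
# Pre-publication validation sketch: every data element must carry a set of
# required properties before the metadata is published.

def validate(triples):
    """triples: iterable of (subject, predicate, object). Return subjects
    missing a required property, i.e. records failing the completeness test."""
    required = {"rdfs:label", "dzl:dataType"}  # hypothetical requirement set
    seen = {}
    for s, p, o in triples:
        seen.setdefault(s, set()).add(p)
    return sorted(s for s, props in seen.items() if not required <= props)

# Hypothetical metadata triples, as they might sit in a Git-tracked RDF file.
metadata = [
    ("dzl:fev1", "rdfs:label", "Forced expiratory volume in 1 s"),
    ("dzl:fev1", "dzl:dataType", "xsd:decimal"),
    ("dzl:smoker", "rdfs:label", "Smoking status"),  # missing dzl:dataType
]
incomplete = validate(metadata)
```

    Wiring such a check into a Git hook means an edit that leaves a data element incomplete is rejected before automatic publication, which is the workflow the abstract describes.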

  6. Hippocampome.org: a knowledge base of neuron types in the rodent hippocampus.

    PubMed

    Wheeler, Diek W; White, Charise M; Rees, Christopher L; Komendantov, Alexander O; Hamilton, David J; Ascoli, Giorgio A

    2015-09-24

    Hippocampome.org is a comprehensive knowledge base of neuron types in the rodent hippocampal formation (dentate gyrus, CA3, CA2, CA1, subiculum, and entorhinal cortex). Although the hippocampal literature is remarkably information-rich, neuron properties are often reported with incompletely defined and notoriously inconsistent terminology, creating a formidable challenge for data integration. Our extensive literature mining and data reconciliation identified 122 neuron types based on neurotransmitter, axonal and dendritic patterns, synaptic specificity, electrophysiology, and molecular biomarkers. All ∼3700 annotated properties are individually supported by specific evidence (∼14,000 pieces) in peer-reviewed publications. Systematic analysis of this unprecedented amount of machine-readable information reveals novel correlations among neuron types and properties, the potential connectivity of the full hippocampal circuitry, and outstanding knowledge gaps. User-friendly browsing and online querying of Hippocampome.org may aid design and interpretation of both experiments and simulations. This powerful, simple, and extensible neuron classification endeavor is unique in its detail, utility, and completeness.

  7. Biomedical and veterinary science can increase our understanding of coral disease

    USGS Publications Warehouse

    Work, Thierry M.; Richardson, Laurie L.; Reynolds, T.L.; Willis, Bette L.

    2008-01-01

    A balanced approach to coral disease investigation is critical for understanding the global decline of corals. Such an approach should involve the proper use of biomedical concepts, tools, and terminology to address confusion and promote clarity in the coral disease literature. Investigating disease in corals should follow a logical series of steps including identification of disease, systematic morphologic descriptions of lesions at the gross and cellular levels, measurement of health indices, and experiments to understand disease pathogenesis and the complex interactions between host, pathogen, and the environment. This model for disease investigation is widely accepted in the medical, veterinary and invertebrate pathology disciplines. We present standard biomedical rationale behind the detection, description, and naming of diseases and offer examples of the application of Koch's postulates to elucidate the etiology of some infectious diseases. Basic epidemiologic concepts are introduced to help investigators think systematically about the cause(s) of complex diseases. A major goal of disease investigation in corals and other organisms is to gather data that will enable the establishment of standardized case definitions to distinguish among diseases. Concepts and facts amassed from empirical studies over the centuries by medical and veterinary pathologists have standardized disease investigation and are invaluable to coral researchers because of the robust comparisons they enable; examples of these are given throughout this paper. 
Arguments over whether coral diseases are caused by primary versus opportunistic pathogens reflect the lack of data available to prove or refute such hypotheses and emphasize the need for coral disease investigations that focus on: characterizing the normal microbiota and physiology of the healthy host; defining ecological interactions within the microbial community associated with the host; and investigating host immunity, host-agent interactions, pathology, pathogenesis, and factors that promote the pathogenicity of the causative agent(s) of disease.

  8. Gene Ontology synonym generation rules lead to increased performance in biomedical concept recognition.

    PubMed

    Funk, Christopher S; Cohen, K Bretonnel; Hunter, Lawrence E; Verspoor, Karin M

    2016-09-09

    Gene Ontology (GO) terms represent the standard for annotation and representation of molecular functions, biological processes and cellular compartments, but a large gap exists between the way concepts are represented in the ontology and how they are expressed in natural language text. The construction of highly specific GO terms is formulaic, consisting of parts and pieces from more simple terms. We present two different types of manually generated rules to help capture the variation of how GO terms can appear in natural language text. The first set of rules takes into account the compositional nature of GO and recursively decomposes the terms into their smallest constituent parts. The second set of rules generates derivational variations of these smaller terms and compositionally combines all generated variants to form the original term. By applying both types of rules, new synonyms are generated for two-thirds of all GO terms and an increase in F-measure performance for recognition of GO on the CRAFT corpus from 0.498 to 0.636 is observed. Additionally, we evaluated the combination of both types of rules over one million full text documents from Elsevier; manual validation and error analysis show we are able to recognize GO concepts with reasonable accuracy (88 %) based on random sampling of annotations. In this work we present a set of simple synonym generation rules that utilize the highly compositional and formulaic nature of the Gene Ontology concepts. We illustrate how the generated synonyms aid in improving recognition of GO concepts on two different biomedical corpora. We discuss other applications of our rules for GO ontology quality assurance, explore the issue of overgeneration, and provide examples of how similar methodologies could be applied to other biomedical terminologies. Additionally, we provide all generated synonyms for use by the text-mining community.
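    The two rule types this record describes (decomposing compositional terms, then recombining derivational variants of the parts) can be sketched compactly. This is an illustrative sketch, not the authors' rule set: the derivation table, the '<part> of <part>' decomposition pattern, and the example term are all invented for the example.

    ```python
    from itertools import product

    # Hypothetical derivational variants for two simple constituent terms.
    DERIVATIONS = {
        "regulation": ["regulation", "regulating"],
        "transcription": ["transcription", "transcriptional"],
    }

    def decompose(term):
        """Split a compositional GO-style term into its constituent parts,
        assuming the formulaic '<part> of <part>' pattern for illustration."""
        return [p.strip() for p in term.split(" of ")]

    def generate_synonyms(term):
        """Recombine derivational variants of each part into candidate synonyms."""
        parts = decompose(term)
        variants = [DERIVATIONS.get(p, [p]) for p in parts]
        synonyms = set()
        for combo in product(*variants):
            synonyms.add(" of ".join(combo))          # e.g. 'regulation of transcription'
            synonyms.add(" ".join(reversed(combo)))   # e.g. 'transcriptional regulation'
        return synonyms

    syns = generate_synonyms("regulation of transcription")
    ```

    Note that blind recombination also produces implausible strings, which is the overgeneration issue the abstract raises.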

  9. National Medical Terminology Server in Korea

    NASA Astrophysics Data System (ADS)

    Lee, Sungin; Song, Seung-Jae; Koh, Soonjeong; Lee, Soo Kyoung; Kim, Hong-Gee

    Interoperable EHR (Electronic Health Record) necessitates at least the use of standardized medical terminologies. This paper describes a medical terminology server, LexCare Suite, which houses terminology management applications, such as a terminology editor, and a terminology repository populated with international standard terminology systems such as Systematized Nomenclature of Medicine (SNOMED). The server is intended to meet the need for quality terminology systems in local primary through tertiary hospitals. Our partner general hospitals have used the server to test its applicability; this paper describes the server and the results of that test.

  10. The Function Biomedical Informatics Research Network Data Repository

    PubMed Central

    Keator, David B.; van Erp, Theo G.M.; Turner, Jessica A.; Glover, Gary H.; Mueller, Bryon A.; Liu, Thomas T.; Voyvodic, James T.; Rasmussen, Jerod; Calhoun, Vince D.; Lee, Hyo Jong; Toga, Arthur W.; McEwen, Sarah; Ford, Judith M.; Mathalon, Daniel H.; Diaz, Michele; O’Leary, Daniel S.; Bockholt, H. Jeremy; Gadde, Syam; Preda, Adrian; Wible, Cynthia G.; Stern, Hal S.; Belger, Aysenil; McCarthy, Gregory; Ozyurt, Burak; Potkin, Steven G.

    2015-01-01

    The Function Biomedical Informatics Research Network (FBIRN) developed methods and tools for conducting multi-scanner functional magnetic resonance imaging (fMRI) studies. Method and tool development were based on two major goals: 1) to assess the major sources of variation in fMRI studies conducted across scanners, including instrumentation, acquisition protocols, challenge tasks, and analysis methods, and 2) to provide a distributed network infrastructure and an associated federated database to host and query large, multi-site, fMRI and clinical datasets. In the process of achieving these goals the FBIRN test bed generated several multi-scanner brain imaging data sets to be shared with the wider scientific community via the BIRN Data Repository (BDR). The FBIRN Phase 1 dataset consists of a traveling subject study of 5 healthy subjects, each scanned on 10 different 1.5 to 4 Tesla scanners. The FBIRN Phase 2 and Phase 3 datasets consist of subjects with schizophrenia or schizoaffective disorder along with healthy comparison subjects scanned at multiple sites. In this paper, we provide concise descriptions of FBIRN’s multi-scanner brain imaging data sets and details about the BIRN Data Repository instance of the Human Imaging Database (HID) used to publicly share the data. PMID:26364863

  11. Combating unethical publications with plagiarism detection services

    PubMed Central

    Garner, H.R.

    2010-01-01

    About 3,000 new citations that are highly similar to citations in previously published manuscripts appear each year in the biomedical literature (Medline) alone. This underscores the importance of giving editors and reviewers a detection system that identifies highly similar text in submitted manuscripts so that they can then review them for novelty. New software-based services, both commercial and free, provide this capability. The availability of such tools provides both a way to intercept suspect manuscripts and a deterrent. Unfortunately, the capabilities of these services vary considerably, mainly as a consequence of the availability and completeness of the literature bases to which new queries are compared. Most of the commercial software has been designed for detection of plagiarism in high school and college papers; however, there is at least one fee-based service (CrossRef) and one free service (etblast.org) which are designed to target the needs of the biomedical publication industry. Information on these various services, examples of their operability and output, and issues that publishers, editors and reviewers need to consider before selecting and using these services is provided. PMID:21194644

  12. SCALEUS: Semantic Web Services Integration for Biomedical Applications.

    PubMed

    Sernadela, Pedro; González-Castro, Lorena; Oliveira, José Luís

    2017-04-01

    In recent years, we have witnessed an explosion of biological data resulting largely from the demands of life science research. The vast majority of these data are freely available via diverse bioinformatics platforms, including relational databases and conventional keyword search applications. This type of approach has achieved great results in the last few years, but proves infeasible when information needs to be combined or shared among different and scattered sources. More recently, many of these data distribution challenges have been addressed through the adoption of semantic web technologies. Despite the evident benefits of this technology, its adoption introduced new challenges related to migrating existing systems to the semantic level. To facilitate this transition, we have developed Scaleus, a semantic web migration tool that can be deployed on top of traditional systems in order to bring knowledge, inference rules, and query federation to the existing data. Targeted at the biomedical domain, this web-based platform offers, in a single package, straightforward data integration and semantic web services that help developers and researchers create new semantically enhanced information systems. SCALEUS is available as open source at http://bioinformatics-ua.github.io/scaleus/.

  13. A semantic web ontology for small molecules and their biological targets.

    PubMed

    Choi, Jooyoung; Davis, Melissa J; Newman, Andrew F; Ragan, Mark A

    2010-05-24

    A wide range of data on sequences, structures, pathways, and networks of genes and gene products is available for hypothesis testing and discovery in biological and biomedical research. However, data describing the physical, chemical, and biological properties of small molecules have not been well-integrated with these resources. Semantically rich representations of chemical data, combined with Semantic Web technologies, have the potential to enable the integration of small molecule and biomolecular data resources, expanding the scope and power of biomedical and pharmacological research. We employed the Semantic Web technologies Resource Description Framework (RDF) and Web Ontology Language (OWL) to generate a Small Molecule Ontology (SMO) that represents concepts and provides unique identifiers for biologically relevant properties of small molecules and their interactions with biomolecules, such as proteins. We instantiated SMO with data from three public sources (DrugBank, PubChem, and UniProt), converting the data to RDF triples. Evaluation of SMO by use of predetermined competency questions implemented as SPARQL queries demonstrated that data from chemical and biomolecular data sources were effectively represented and that useful knowledge can be extracted. These results illustrate the potential of Semantic Web technologies in chemical, biological, and pharmacological research and in drug discovery.
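    The triple-based representation and competency-question querying described in this record can be sketched in miniature. This is an illustrative sketch only, not the SMO code: the identifiers, the tiny in-memory store, and the `query` helper (a pattern matcher standing in for a SPARQL basic graph pattern) are all invented for the example.

    ```python
    # RDF-style (subject, predicate, object) triples; names are hypothetical
    # stand-ins for SMO/DrugBank/UniProt identifiers.
    triples = {
        ("drug:Imatinib", "smo:hasTarget", "protein:ABL1"),
        ("drug:Imatinib", "smo:molecularWeight", "493.6"),
        ("protein:ABL1", "rdf:type", "smo:Protein"),
    }

    def query(store, s=None, p=None, o=None):
        """Return triples matching a (subject, predicate, object) pattern;
        None acts as a wildcard, like a SPARQL variable."""
        return [(ts, tp, to) for (ts, tp, to) in store
                if s in (None, ts) and p in (None, tp) and o in (None, to)]

    # Competency question: which proteins does Imatinib target?
    targets = [o for (_, _, o) in query(triples, s="drug:Imatinib", p="smo:hasTarget")]
    ```

    A real system would issue the equivalent query over a SPARQL endpoint; the wildcard pattern here plays the role of the query variables.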

  14. Combating unethical publications with plagiarism detection services.

    PubMed

    Garner, H R

    2011-01-01

    About 3,000 new citations that are highly similar to citations in previously published manuscripts appear each year in the biomedical literature (Medline) alone. This underscores the importance of giving editors and reviewers a detection system that identifies highly similar text in submitted manuscripts so that they can then review them for novelty. New software-based services, both commercial and free, provide this capability. The availability of such tools provides both a way to intercept suspect manuscripts and a deterrent. Unfortunately, the capabilities of these services vary considerably, mainly as a consequence of the availability and completeness of the literature bases to which new queries are compared. Most of the commercial software has been designed for detection of plagiarism in high school and college papers; however, there is at least 1 fee-based service (CrossRef) and 1 free service (etblast.org), which are designed to target the needs of the biomedical publication industry. Information on these various services, examples of their operability and output, and issues that publishers, editors, and reviewers need to consider before selecting and using these services is provided. Copyright © 2011 Elsevier Inc. All rights reserved.

  15. Learning to rank-based gene summary extraction.

    PubMed

    Shang, Yue; Hao, Huihui; Wu, Jiajin; Lin, Hongfei

    2014-01-01

    In recent years, the biomedical literature has been growing rapidly. These articles provide a large amount of information about proteins, genes and their interactions. Reading such a huge amount of literature to gain knowledge about a gene is a tedious task for researchers. As a result, it is valuable for biomedical researchers to gain a quick understanding of a query concept by integrating its relevant resources. In the task of gene summary generation, we regard automatic summarization as a ranking problem and apply the method of learning to rank to solve it. This paper uses three features as a basis for sentence selection: gene ontology relevance, topic relevance and TextRank. We then obtain the feature weight vector using a learning-to-rank algorithm, predict the scores of candidate summary sentences, and select the top sentences to generate the summary. ROUGE (a toolkit for automatic evaluation of summarization) was used to evaluate the summarization result, and the experimental results showed that our method outperforms the baseline techniques. According to the experimental results, the combination of three features can improve the performance of the summary. The application of learning to rank can facilitate the further expansion of features for measuring the significance of sentences.
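    The ranking step this record describes reduces to scoring each candidate sentence by a weighted combination of its features. A minimal sketch, assuming a fixed weight vector for illustration (in the paper the weights come from a learning-to-rank algorithm) and invented feature values:

    ```python
    def score(sentence_features, weights):
        """Weighted linear combination of per-sentence feature values."""
        return sum(w * f for w, f in zip(weights, sentence_features))

    # (gene-ontology relevance, topic relevance, TextRank) per candidate sentence;
    # values and sentence IDs are invented for the example.
    candidates = {
        "s1": (0.9, 0.7, 0.5),
        "s2": (0.2, 0.3, 0.8),
        "s3": (0.6, 0.9, 0.7),
    }
    weights = (0.5, 0.3, 0.2)  # hypothetical learned weight vector

    # Rank sentences by predicted score and keep the top two as the summary.
    ranked = sorted(candidates, key=lambda s: score(candidates[s], weights), reverse=True)
    summary = ranked[:2]
    ```

    Extending the feature tuple (the expansion the conclusion mentions) only lengthens `weights` and `sentence_features`; the ranking machinery is unchanged.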

  16. A Scalable Data Integration and Analysis Architecture for Sensor Data of Pediatric Asthma.

    PubMed

    Stripelis, Dimitris; Ambite, José Luis; Chiang, Yao-Yi; Eckel, Sandrah P; Habre, Rima

    2017-04-01

    According to the Centers for Disease Control, in the United States there are 6.8 million children living with asthma. Despite the importance of the disease, the available prognostic tools are not sufficient for biomedical researchers to thoroughly investigate the potential risks of the disease at scale. To overcome these challenges we present a big data integration and analysis infrastructure developed by our Data and Software Coordination and Integration Center (DSCIC) of the NIBIB-funded Pediatric Research using Integrated Sensor Monitoring Systems (PRISMS) program. Our goal is to help biomedical researchers to efficiently predict and prevent asthma attacks. The PRISMS-DSCIC is responsible for collecting, integrating, storing, and analyzing real-time environmental, physiological and behavioral data obtained from heterogeneous sensor and traditional data sources. Our architecture is based on the Apache Kafka, Spark and Hadoop frameworks and PostgreSQL DBMS. A main contribution of this work is extending the Spark framework with a mediation layer, based on logical schema mappings and query rewriting, to facilitate data analysis over a consistent harmonized schema. The system provides both batch and stream analytic capabilities over the massive data generated by wearable and fixed sensors.
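    The mediation layer described in this record rests on logical schema mappings: a query posed against the harmonized schema is rewritten into each source's local schema. A toy sketch of that rewriting, with invented source and attribute names (the real PRISMS-DSCIC layer operates inside Spark):

    ```python
    # Hypothetical logical schema mappings: harmonized attribute -> local name.
    MAPPINGS = {
        "sensor_a": {"pm25": "particulate_25", "ts": "time_stamp"},
        "sensor_b": {"pm25": "pm_2_5", "ts": "t"},
    }

    def rewrite(query, source):
        """Rewrite {harmonized_attr: filter} into the source's local schema."""
        local = MAPPINGS[source]
        return {local[attr]: value for attr, value in query.items()}

    # One query over the harmonized schema, rewritten per source.
    q = {"pm25": ">35", "ts": "2017-04-01"}
    qa = rewrite(q, "sensor_a")
    qb = rewrite(q, "sensor_b")
    ```

    The analyst only ever sees the harmonized names; each sensor source answers the rewritten form.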

  17. An introduction to information retrieval: applications in genomics

    PubMed Central

    Nadkarni, P M

    2011-01-01

    Information retrieval (IR) is the field of computer science that deals with the processing of documents containing free text, so that they can be rapidly retrieved based on keywords specified in a user’s query. IR technology is the basis of Web-based search engines, and plays a vital role in biomedical research, because it is the foundation of software that supports literature search. Documents can be indexed by both the words they contain, as well as the concepts that can be matched to domain-specific thesauri; concept matching, however, poses several practical difficulties that make it unsuitable for use by itself. This article provides an introduction to IR and summarizes various applications of IR and related technologies to genomics. PMID:12049181
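    The word-based indexing this record introduces is conventionally implemented as an inverted index: a map from each word to the set of documents containing it, intersected at query time. A minimal sketch with invented example documents:

    ```python
    from collections import defaultdict

    # Toy document collection (IDs and text are invented examples).
    docs = {
        1: "BRCA1 mutation linked to breast cancer risk",
        2: "gene expression profiling of breast tumours",
        3: "BRCA1 protein interactions in DNA repair",
    }

    # Build the inverted index: word -> set of document IDs containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)

    def search(query):
        """Return IDs of documents containing every query keyword."""
        terms = query.lower().split()
        result = index.get(terms[0], set()).copy()
        for t in terms[1:]:
            result &= index.get(t, set())
        return sorted(result)

    hits = search("BRCA1 breast")
    ```

    Concept-based indexing, as the article notes, would additionally map words like "BRCA1" to thesaurus concept identifiers before insertion into the index.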

  18. Pathology data integration with eXtensible Markup Language.

    PubMed

    Berman, Jules J

    2005-02-01

    It is impossible to overstate the importance of XML (eXtensible Markup Language) as a data organization tool. With XML, pathologists can annotate all of their data (clinical and anatomic) in a format that can transform every pathology report into a database, without compromising narrative structure. The purpose of this manuscript is to provide an overview of XML for pathologists. Examples will demonstrate how pathologists can use XML to annotate individual data elements and to structure reports in a common format that can be merged with other XML files or queried using standard XML tools. This manuscript gives pathologists a glimpse into how XML allows pathology data to be linked to other types of biomedical data and reduces our dependence on centralized proprietary databases.
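    The annotation idea in this record can be sketched with the standard library: a pathology report whose individual data elements are XML-tagged can be queried with ordinary XML tools. The element and attribute names below are invented for illustration, not a published pathology schema.

    ```python
    import xml.etree.ElementTree as ET

    # A hypothetical XML-annotated pathology report: narrative values sit inside
    # tagged, queryable data elements.
    report_xml = """
    <pathology_report>
      <patient id="P-001"/>
      <specimen site="colon">
        <diagnosis code="M-81403">adenocarcinoma</diagnosis>
      </specimen>
    </pathology_report>
    """

    root = ET.fromstring(report_xml)
    # Query the structured report with standard XPath-style expressions.
    diagnosis = root.find("./specimen/diagnosis")
    site = root.find("./specimen").get("site")
    ```

    Because each element is tagged, reports in this form can be merged, filtered by code, or loaded into a database without parsing free text.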

  19. Enabling Graph Appliance for Genome Assembly

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Singh, Rina; Graves, Jeffrey A; Lee, Sangkeun

    2015-01-01

    In recent years, there has been a huge growth in the amount of genomic data available as reads generated from various genome sequencers. The number of reads generated can be huge, ranging from hundreds to billions, each read varying in size. Assembling such large amounts of data is one of the challenging computational problems for both biomedical and data scientists. Most genome assemblers developed have used de Bruijn graph techniques. A de Bruijn graph represents a collection of read sequences by billions of vertices and edges, which require large amounts of memory and computational power to store and process. This is the major drawback to de Bruijn graph assembly. Massively parallel, multi-threaded, shared memory systems can be leveraged to overcome some of these issues. The objective of our research is to investigate the feasibility and scalability issues of de Bruijn graph assembly on Cray's Urika-GD system; Urika-GD is a high performance graph appliance with a large shared memory and massively multithreaded custom processor designed for executing SPARQL queries over large-scale RDF data sets. However, to the best of our knowledge, there is no research on representing a de Bruijn graph as an RDF graph or finding Eulerian paths in RDF graphs using SPARQL for potential genome discovery. In this paper, we address the issues involved in representing de Bruijn graphs as RDF graphs and propose an iterative querying approach for finding Eulerian paths in large RDF graphs. We evaluate the performance of our implementation on real-world Ebola genome datasets and illustrate how genome assembly can be accomplished with Urika-GD using iterative SPARQL queries.
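    The de Bruijn construction this record builds on is compact to sketch: each read is broken into k-mers, and each k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix; assembly then seeks an Eulerian path through those edges. The toy reads and k below are invented for illustration.

    ```python
    from collections import defaultdict

    def de_bruijn(reads, k):
        """Build a de Bruijn graph: (k-1)-mer prefix -> list of (k-1)-mer suffixes."""
        graph = defaultdict(list)
        for read in reads:
            for i in range(len(read) - k + 1):
                kmer = read[i:i + k]
                graph[kmer[:-1]].append(kmer[1:])  # one edge per k-mer occurrence
        return graph

    # Two overlapping toy reads; k = 3.
    graph = de_bruijn(["ACGT", "CGTA"], 3)
    ```

    In the RDF encoding the paper proposes, each prefix-to-suffix edge would instead be stored as a triple, and the Eulerian walk would be driven by iterative SPARQL queries rather than in-memory traversal.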

  20. Exploration of Global Trend on Biomedical Application of Polyhydroxyalkanoate (PHA): A Patent Survey.

    PubMed

    Ponnaiah, Paulraj; Vnoothenei, Nagiah; Chandramohan, Muruganandham; Thevarkattil, Mohamed Javad Pazhayakath

    2018-01-30

    Polyhydroxyalkanoates are bio-based, biodegradable naturally occurring polymers produced by a wide range of organisms, from bacteria to higher mammals. The properties and biocompatibility of PHA make a wide spectrum of applications possible. In this context, we analyze the potential applications of PHA in biomedical science by exploring the global trend through a patent survey. The survey suggests that PHA is an attractive candidate whose applications are widely distributed across the medical industry, drug delivery systems, dental materials, tissue engineering, packaging materials, and other useful products. In our present study, we explored patents associated with various biomedical applications of polyhydroxyalkanoates. Patent databases of the European Patent Office, the United States Patent and Trademark Office and the World Intellectual Property Organization were mined. We developed an intensive exploration approach to eliminate overlapping patents and sort out significant patents. We demarcated the keywords and search criteria and established search patterns for the database requests. We retrieved documents from the recent 6 years, 2010 to 2016, and sorted the collected data stepwise to gather the most appropriate documents in patent families for further scrutiny. By this approach, we retrieved 23,368 patent documents from all three databases, and the patent titles were further analyzed for the relevance of polyhydroxyalkanoates in biomedical applications. This resulted in the documentation of approximately 226 significant patents associated with biomedical applications of polyhydroxyalkanoates, and the information was classified into six major groups.
There are many avenues through which PHA & PHB could be used. Our analysis shows that patent information can be used to identify various applications of PHA and its representatives in the biomedical field. Future studies can focus on applications of PHA in different fields to discover related topics and connect them to this study. We believe that this approach and these findings can encourage new researchers to undertake similar studies in their respective fields to fill the gap between patent articles and research publications. Copyright © Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  1. Current biomedical scientific impact (2013) of institutions, academic journals and researchers in the Republic of Macedonia.

    PubMed

    Spiroski, Mirko

    2014-01-01

    To analyse the current (2013) ranking of institutions, journals and researchers in the Republic of Macedonia, the country rankings of R. Macedonia were analyzed with SCImago Country & Journal Rank (SJR) for the subject area Medicine in the years 1996-2013, ordered by H-index. SCImago Institutions Rankings for 2013 was used for the scientific impact of biomedical institutions in the Republic of Macedonia. Journal metrics from Elsevier for Macedonian scholarly journals for the period 2009-2013 were analysed: Source Normalized Impact per Paper (SNIP), the Impact per Publication (IPP), and SCImago Journal Rank (SJR). Macedonian scholarly biomedical journals included in Google Scholar metrics (2013, 2012) were analysed with h5-index and h5-median (June 2014). A semantic analysis of the PubMed database was performed with GoPubMed on November 2, 2014 in order to identify published papers from the field of biomedical sciences affiliated with the country of Macedonia. Harzing's Publish or Perish software was used for author impact analysis and the calculation of the Hirsch index based on a Google Scholar query. The rank of the subject area Medicine of R. Macedonia according to the SCImago Journal & Country Rank (SJR) is 110th in the world and 17th in Eastern Europe. Of 20 universities in Macedonia, only Ss Cyril and Methodius University, Skopje, and the University St Clement of Ohrid, Bitola, are listed in the SCImago Institutions Rankings (SIR) for 2013. A very small number of Macedonian scholarly journals is included in Web of Science (2), PubMed (1), PubMed Central (1), SCOPUS (6), SCImago (6), and Google Scholar metrics (6). The rank by Hirsch index (h-index) differed from the rank by number of abstracts indexed in PubMed for the top 20 authors from R. Macedonia. The current biomedical scientific impact (2013) of institutions, academic journals and researchers in R. Macedonia is very low.
There is an urgent need for organized measures to improve the quality and output of institutions, scholarly journals, and researchers in R. Macedonia in order to achieve higher international standards.

  2. Supporting cognition in systems biology analysis: findings on users' processes and design implications.

    PubMed

    Mirel, Barbara

    2009-02-13

    Current usability studies of bioinformatics tools suggest that tools for exploratory analysis support some tasks related to finding relationships of interest but not the deep causal insights necessary for formulating plausible and credible hypotheses. To better understand design requirements for gaining these causal insights in systems biology analyses, a longitudinal field study of 15 biomedical researchers was conducted. Researchers interacted with the same protein-protein interaction tools to discover possible disease mechanisms for further experimentation. Findings reveal patterns in scientists' exploratory and explanatory analysis and show that the tools positively supported a number of well-structured query and analysis tasks. But for several of scientists' more complex, higher-order ways of knowing and reasoning, the tools did not offer adequate support. Results show that, for a better fit with scientists' cognition during exploratory analysis, systems biology tools need to better match scientists' processes for validating, for making a transition from classification to model-based reasoning, and for engaging in causal mental modelling. As the next great frontier in bioinformatics usability, tool designs for exploratory systems biology analysis need to move beyond the successes already achieved in supporting formulaic query and analysis tasks and reduce current mismatches with several of scientists' higher-order analytical practices. The implications of the results for tool designs are discussed.

  3. Capturing domain knowledge from multiple sources: the rare bone disorders use case.

    PubMed

    Groza, Tudor; Tudorache, Tania; Robinson, Peter N; Zankl, Andreas

    2015-01-01

    Lately, ontologies have become a fundamental building block in the process of formalising and storing complex biomedical information. The community-driven ontology curation process, however, ignores the possibility of multiple communities building, in parallel, conceptualisations of the same domain, and thus providing slightly different perspectives on the same knowledge. The individual nature of this effort leads to the need of a mechanism to enable us to create an overarching and comprehensive overview of the different perspectives on the domain knowledge. We introduce an approach that enables the loose integration of knowledge emerging from diverse sources under a single coherent interoperable resource. To accurately track the original knowledge statements, we record the provenance at very granular levels. We exemplify the approach in the rare bone disorders domain by proposing the Rare Bone Disorders Ontology (RBDO). Using RBDO, researchers are able to answer queries, such as: "What phenotypes describe a particular disorder and are common to all sources?" or to understand similarities between disorders based on divergent groupings (classifications) provided by the underlying sources. RBDO is available at http://purl.org/skeletome/rbdo. In order to support lightweight query and integration, the knowledge captured by RBDO has also been made available as a SPARQL Endpoint at http://bio-lark.org/se_skeldys.html.

  4. Cafe Variome: general-purpose software for making genotype-phenotype data discoverable in restricted or open access contexts.

    PubMed

    Lancaster, Owen; Beck, Tim; Atlan, David; Swertz, Morris; Thangavelu, Dhiwagaran; Veal, Colin; Dalgleish, Raymond; Brookes, Anthony J

    2015-10-01

    Biomedical data sharing is desirable, but problematic. Data "discovery" approaches-which establish the existence rather than the substance of data-precisely connect data owners with data seekers, and thereby promote data sharing. Cafe Variome (http://www.cafevariome.org) was therefore designed to provide a general-purpose, Web-based, data discovery tool that can be quickly installed by any genotype-phenotype data owner, or network of data owners, to make safe or sensitive content appropriately discoverable. Data fields or content of any type can be accommodated, from simple ID and label fields through to extensive genotype and phenotype details based on ontologies. The system provides a "shop window" in front of data, with main interfaces being a simple search box and a powerful "query-builder" that enable very elaborate queries to be formulated. After a successful search, counts of records are reported grouped by "openAccess" (data may be directly accessed), "linkedAccess" (a source link is provided), and "restrictedAccess" (facilitated data requests and subsequent provision of approved records). An administrator interface provides a wide range of options for system configuration, enabling highly customized single-site or federated networks to be established. Current uses include rare disease data discovery, patient matchmaking, and a Beacon Web service. © 2015 WILEY PERIODICALS, INC.
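    The search-result grouping this record describes (counts of matching records per access category) is straightforward to sketch. The record contents are invented examples; Cafe Variome's actual implementation is not shown here.

    ```python
    from collections import Counter

    # Hypothetical records matching a search, each tagged with an access level.
    records = [
        {"id": "r1", "access": "openAccess"},
        {"id": "r2", "access": "linkedAccess"},
        {"id": "r3", "access": "restrictedAccess"},
        {"id": "r4", "access": "openAccess"},
    ]

    # Report counts of matches grouped by access category.
    counts = Counter(r["access"] for r in records)
    ```

    Only the counts are disclosed for restricted data; substance is released later via the facilitated request workflow the abstract mentions.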

  5. Publication of nuclear magnetic resonance experimental data with semantic web technology and the application thereof to biomedical research of proteins.

    PubMed

    Yokochi, Masashi; Kobayashi, Naohiro; Ulrich, Eldon L; Kinjo, Akira R; Iwata, Takeshi; Ioannidis, Yannis E; Livny, Miron; Markley, John L; Nakamura, Haruki; Kojima, Chojiro; Fujiwara, Toshimichi

    2016-05-05

    The nuclear magnetic resonance (NMR) spectroscopic data for biological macromolecules archived at the BioMagResBank (BMRB) provide a rich resource of biophysical information at atomic resolution. The NMR data archived in NMR-STAR ASCII format have been implemented in a relational database. However, it is still fairly difficult for users to retrieve data from the NMR-STAR files or the relational database in association with data from other biological databases. To enhance the interoperability of the BMRB database, we present a full conversion of BMRB entries to two standard structured data formats, XML and RDF, as common open representations of the NMR-STAR data. Moreover, a SPARQL endpoint has been deployed. The described case study demonstrates that a simple query of the SPARQL endpoints of the BMRB, UniProt, and Online Mendelian Inheritance in Man (OMIM), can be used in NMR and structure-based analysis of proteins combined with information of single nucleotide polymorphisms (SNPs) and their phenotypes. We have developed BMRB/XML and BMRB/RDF and demonstrate their use in performing a federated SPARQL query linking the BMRB to other databases through standard semantic web technologies. This will facilitate data exchange across diverse information resources.

  6. Terminology, a Translational Discipline.

    ERIC Educational Resources Information Center

    Ahrens, Helga

    1994-01-01

    Discusses the importance of qualified terminology and its implications for terminological activity. Argues that students have to learn how to organize their terminological activity. Suggests that translation is a special kind of intercultural communication and is an indispensable part of translational action. Argues that terminology be examined…

  7. Liquid-liquid two phase systems for the production of porous hydrogels and hydrogel microspheres for biomedical applications: A tutorial review

    PubMed Central

    Elbert, Donald L.

    2010-01-01

    Macroporous hydrogels may have direct applications in regenerative medicine as scaffolds to support tissue formation. Hydrogel microspheres may be used as drug delivery vehicles or as building blocks to assemble modular scaffolds. A variety of techniques exist to produce macroporous hydrogels and hydrogel microspheres. A subset of these relies on liquid-liquid two phase systems. Within this subset, vastly different types of polymerization processes are found. In this review, the history, terminology and classification of liquid-liquid two phase polymerization and crosslinking are described. Instructive examples of hydrogel microsphere and macroporous scaffold formation by precipitation/dispersion, emulsion and suspension polymerizations are used to illustrate the nature of these processes. The role of the kinetics of phase separation in determining the morphology of scaffolds and microspheres is also delineated. Brief descriptions of miniemulsion, microemulsion polymerization and ionotropic gelation are also included. PMID:20659596

  8. The Plant Structure Ontology, a Unified Vocabulary of Anatomy and Morphology of a Flowering Plant

    PubMed Central

    Ilic, Katica; Kellogg, Elizabeth A.; Jaiswal, Pankaj; Zapata, Felipe; Stevens, Peter F.; Vincent, Leszek P.; Avraham, Shulamit; Reiser, Leonore; Pujar, Anuradha; Sachs, Martin M.; Whitman, Noah T.; McCouch, Susan R.; Schaeffer, Mary L.; Ware, Doreen H.; Stein, Lincoln D.; Rhee, Seung Y.

    2007-01-01

    Formal description of plant phenotypes and standardized annotation of gene expression and protein localization data require uniform terminology that accurately describes plant anatomy and morphology. This facilitates cross species comparative studies and quantitative comparison of phenotypes and expression patterns. A major drawback is variable terminology that is used to describe plant anatomy and morphology in publications and genomic databases for different species. The same terms are sometimes applied to different plant structures in different taxonomic groups. Conversely, similar structures are named by their species-specific terms. To address this problem, we created the Plant Structure Ontology (PSO), the first generic ontological representation of anatomy and morphology of a flowering plant. The PSO is intended for a broad plant research community, including bench scientists, curators in genomic databases, and bioinformaticians. The initial releases of the PSO integrated existing ontologies for Arabidopsis (Arabidopsis thaliana), maize (Zea mays), and rice (Oryza sativa); more recent versions of the ontology encompass terms relevant to Fabaceae, Solanaceae, additional cereal crops, and poplar (Populus spp.). Databases such as The Arabidopsis Information Resource, Nottingham Arabidopsis Stock Centre, Gramene, MaizeGDB, and SOL Genomics Network are using the PSO to describe expression patterns of genes and phenotypes of mutants and natural variants and are regularly contributing new annotations to the Plant Ontology database. The PSO is also used in specialized public databases, such as BRENDA, GENEVESTIGATOR, NASCArrays, and others. Over 10,000 gene annotations and phenotype descriptions from participating databases can be queried and retrieved using the Plant Ontology browser. The PSO, as well as contributed gene associations, can be obtained at www.plantontology.org. PMID:17142475

  9. Advancing the Framework: Use of Health Data—A Report of a Working Conference of the American Medical Informatics Association

    PubMed Central

    Bloomrosen, Meryl; Detmer, Don

    2008-01-01

    The fields of health informatics and biomedical research increasingly depend on the availability of aggregated health data. Yet, despite over fifteen years of policy work on health data issues, the United States (U.S.) lacks coherent policy to guide users striving to navigate the ethical, political, technical, and economic challenges associated with health data use. In 2007, building on more than a decade of previous work, the American Medical Informatics Association (AMIA) convened a panel of experts to stimulate discussion about and action on a national framework for health data use. This initiative is being carried out in the context of rapidly accelerating advances in the fields of health informatics and biomedical research, many of which are dependent on the availability of aggregated health data. Use of these data poses complex challenges that must be addressed by public policy. This paper highlights the results of the meeting, presents data stewardship as a key building block in the national framework, and outlines stewardship principles for the management of health information. The authors also introduce a taxonomy developed to focus definitions and terminology in the evolving field of health data applications. Finally, they identify areas for further policy analysis and recommend that public and private sector organizations elevate consideration of a national framework on the uses of health data to a top priority. PMID:18755988

  10. Aggregated Indexing of Biomedical Time Series Data

    PubMed Central

    Woodbridge, Jonathan; Mortazavi, Bobak; Sarrafzadeh, Majid; Bui, Alex A.T.

    2016-01-01

    Remote and wearable medical sensing has the potential to create very large and high dimensional datasets. Medical time series databases must be able to efficiently store, index, and mine these datasets to enable medical professionals to effectively analyze data collected from their patients. Conventional high dimensional indexing methods are a two-stage process. First, a superset of the true matches is efficiently extracted from the database. Second, supersets are pruned by comparing each of their objects to the query object and rejecting any objects falling outside a predetermined radius. This pruning stage heavily dominates the computational complexity of most conventional search algorithms. Therefore, indexing algorithms can be significantly improved by reducing the amount of pruning. This paper presents an online algorithm to aggregate biomedical time series data to significantly reduce the search space (index size) without compromising the quality of search results. This algorithm is built on the observation that biomedical time series signals are composed of cyclical and often similar patterns. This algorithm takes in a stream of segments and groups them into highly concentrated collections. Locality Sensitive Hashing (LSH) is used to reduce the overall complexity of the algorithm, allowing it to run online. The output of this aggregation is used to populate an index. The proposed algorithm yields logarithmic growth of the index (with respect to the total number of objects) while keeping sensitivity and specificity simultaneously above 98%. Both memory and runtime complexities of time series search are improved when using aggregated indexes. In addition, data mining tasks, such as clustering, exhibit runtimes that are orders of magnitude faster when run on aggregated indexes. PMID:27617298
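    The aggregation the abstract describes, grouping near-identical cyclical segments and indexing one representative per group, can be sketched with random-hyperplane LSH. The hash family and the choice of representative below are illustrative assumptions, not the paper's exact construction:

```python
import random

def lsh_signature(segment, hyperplanes):
    """Hash a fixed-length segment to a bit signature with random-hyperplane
    LSH: one bit per hyperplane, the sign of the dot product."""
    return tuple(
        1 if sum(s * h for s, h in zip(segment, plane)) >= 0 else 0
        for plane in hyperplanes
    )

def aggregate(segments, n_planes=8, seed=42):
    """One online pass over a stream of segments: group similar segments
    into LSH buckets and keep one representative (the element-wise mean)
    per bucket, shrinking the index as the abstract describes."""
    dim = len(segments[0])
    rng = random.Random(seed)
    planes = [[rng.gauss(0, 1) for _ in range(dim)]
              for _ in range(n_planes)]
    buckets = {}
    for seg in segments:  # online: each segment is hashed exactly once
        buckets.setdefault(lsh_signature(seg, planes), []).append(seg)
    return [[sum(col) / len(group) for col in zip(*group)]
            for group in buckets.values()]
```

    Because identical and near-identical segments collide in the same bucket, the number of indexed representatives grows far more slowly than the number of raw segments.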

  11. Rembrandt: Helping Personalized Medicine Become a Reality Through Integrative Translational Research

    PubMed Central

    Madhavan, Subha; Zenklusen, Jean-Claude; Kotliarov, Yuri; Sahni, Himanso; Fine, Howard A.; Buetow, Kenneth

    2009-01-01

    Finding better therapies for the treatment of brain tumors is hampered by the lack of consistently obtained molecular data in a large sample set, and by the limited ability to integrate biomedical data from disparate sources to enable translation of therapies from bench to bedside. Hence, a critical factor in the advancement of biomedical research and clinical translation is the ease with which data can be integrated, redistributed and analyzed both within and across functional domains. Novel biomedical informatics infrastructure and tools are essential for developing individualized patient treatment based on the specific genomic signatures in each patient’s tumor. Here we present Rembrandt, Repository of Molecular BRAin Neoplasia DaTa, a cancer clinical genomics database and a web-based data mining and analysis platform aimed at facilitating discovery by connecting the dots between clinical information and genomic characterization data. To date, Rembrandt contains data generated through the Glioma Molecular Diagnostic Initiative from 874 glioma specimens comprising nearly 566 gene expression arrays, 834 copy number arrays and 13,472 clinical phenotype data points. Data can be queried and visualized for a selected gene across all data platforms or for multiple genes in a selected platform. Additionally, gene sets can be limited to clinically important annotations including secreted, kinase, membrane, and known gene-anomaly pairs to facilitate the discovery of novel biomarkers and therapeutic targets. We believe that REMBRANDT represents a prototype of how high throughput genomic and clinical data can be integrated in a way that will allow expeditious and efficient translation of laboratory discoveries to the clinic. PMID:19208739

  12. Role of Nanoparticles in Drug Delivery and Regenerative Therapy for Bone Diseases.

    PubMed

    Gera, Sonia; Sampathi, Sunitha; Dodoala, Sujatha

    2017-01-01

    Osteoporosis is a disease characterized by progressive bone loss due to aging and menopause in women, leading to bone fragility and increased susceptibility to fractures. The silent disease weakens the bone by altering its microstructure and mass. Therapy is based on either promoting strength (via osteoblast action) or preventing disease (via osteoclast action). Current therapy with different drugs of the antiresorptive, anabolic and hormonal classes suffers from poor pharmacokinetic and pharmacodynamic profiles. Nanoparticles provide a breakthrough as an alternative therapeutic carrier and biomedical imaging tool in bone diseases. The current review highlights bone physiology and pathology along with potential applications of nanoparticles in osteoporosis through the use of organic and inorganic particles for drug delivery, biomedical imaging as well as bone tissue regeneration therapy. Inorganic nanoparticles of gold, cerium, platinum and silica have effects on osteoblastic and osteoclastic lineages. Labelling and tracking of bone cells by quantum dots and gold nanoparticles are advanced and non-invasive techniques. Incorporation of nanoparticles into scaffolds is a more recent technique for improving mechanical strength as well as regeneration during bone grafting. Promising results from in vitro and in vivo studies depict the effects of nanoparticles on biochemical markers and biomechanical parameters during osteoporosis, suggesting a bright future for nanoparticles in bone applications. Any therapy that improves the drug profile and delivery to bone tissue would be a promising approach. Superparamagnetic, gold and mesoporous silica nanoparticles and quantum dots provide golden opportunities for biomedical imaging by replacing the traditional invasive radionuclide techniques. Copyright© Bentham Science Publishers.

  13. Understanding terminological systems. II: Experience with conceptual and formal representation of structure.

    PubMed

    de Keizer, N F; Abu-Hanna, A

    2000-03-01

    This article describes the application of two popular conceptual and formal representation formalisms, as part of a framework for understanding terminological systems. A precise understanding of the structure of a terminological system is essential to assess existing terminological systems, to recognize patterns in various systems and to build new terminological systems. Our experience with the application of this framework to five well-known terminological systems is described.

  14. Architecture for Knowledge-Based and Federated Search of Online Clinical Evidence

    PubMed Central

    Walther, Martin; Nguyen, Ken; Lovell, Nigel H

    2005-01-01

    Background It is increasingly difficult for clinicians to keep up-to-date with the rapidly growing biomedical literature. Online evidence retrieval methods are now seen as a core tool to support evidence-based health practice. However, standard search engine technology is not designed to manage the many different types of evidence sources that are available or to handle the very different information needs of various clinical groups, who often work in widely different settings. Objectives The objectives of this paper are (1) to describe the design considerations and system architecture of a wrapper-mediator approach to federated search system design, including the use of knowledge-based, meta-search filters, and (2) to analyze the implications of system design choices on performance measurements. Methods A trial was performed to evaluate the technical performance of a federated evidence retrieval system, which provided access to eight distinct online resources, including e-journals, PubMed, and electronic guidelines. The Quick Clinical system architecture utilized a universal query language to reformulate queries internally and utilized meta-search filters to optimize search strategies across resources. We recruited 227 family physicians from across Australia who used the system to retrieve evidence in a routine clinical setting over a 4-week period. The total search time for a query was recorded, along with the duration of individual queries sent to different online resources. Results Clinicians performed 1662 searches over the trial. The average search duration was 4.9 ± 3.2 s (N = 1662 searches). Mean search duration for the individual sources was between 0.05 s and 4.55 s. Average system time (ie, system overhead) was 0.12 s. Conclusions The relatively small system overhead compared to the average time it takes to perform a search for an individual source shows that the system achieves a good trade-off between performance and reliability. 
Furthermore, despite the additional effort required to incorporate the capabilities of each individual source (to improve the quality of search results), system maintenance requires only a small additional overhead. PMID:16403716

  15. Perceptions of community-based field workers on the effect of a longitudinal biomedical research project on their sustainable livelihoods.

    PubMed

    Moyo, Christabelle S; Francis, Joseph; Bessong, Pascal O

    2017-03-17

    Researchers involved in biomedical community-based projects rarely seek the perspectives of community fieldworkers, who are the 'foot soldiers' in such projects. Understanding the effect of biomedical research on community-based field workers could identify benefits and shortfalls that may be crucial to the success of community-based studies. The present study explored the perceptions of community-based field workers on the effect of the "Etiology, Risk Factors and Interactions of Enteric Infections and Malnutrition and the Consequences for Child Health and Development" (MAL-ED) South Africa project on their tangible and intangible capital, which together comprise sustainable livelihoods. The study was conducted in the Dzimauli community in Limpopo Province, South Africa, between January and February 2016. The sustainable livelihoods framework was used to query community-based field workers' perspectives of both tangible assets, such as income and physical assets, and intangible assets, such as social capital, confidence, and skills. Data were collected through twenty-one individual in-depth interviews and one focus group discussion. Data were analysed using the Thematic Content Analysis approach supported by ATLAS.ti, version 7.5.10 software. All the field workers indicated that they benefitted from the MAL-ED South Africa project. The benefits included intangible assets such as acquisition of knowledge and skills, stronger social capital and personal development. Additionally, all indicated that MAL-ED South Africa provided them with the tangible assets of increased income and physical assets. Observations obtained from the focus group discussion and the community-based leaders concurred with the findings from the in-depth interviews. Additionally, some field workers expressed the desire for training in public relations, communication, problem solving and confidence building. 
The MAL-ED South Africa biomedical research project had positive effects on the tangible and intangible assets that compose the sustainable livelihoods of community-based fieldworkers. However, the field workers expressed the need to acquire social skills to enable them to carry out their duties more efficiently.

  16. qPortal: A platform for data-driven biomedical research.

    PubMed

    Mohr, Christopher; Friedrich, Andreas; Wojnar, David; Kenar, Erhan; Polatkan, Aydin Can; Codrea, Marius Cosmin; Czemmel, Stefan; Kohlbacher, Oliver; Nahnsen, Sven

    2018-01-01

    Modern biomedical research aims at drawing biological conclusions from large, highly complex biological datasets. It has become common practice to make extensive use of high-throughput technologies that produce large amounts of heterogeneous data. In addition to the ever-improving accuracy, methods are getting faster and cheaper, resulting in a steadily increasing need for scalable data management and easily accessible means of analysis. We present qPortal, a platform providing users with an intuitive way to manage and analyze quantitative biological data. The backend leverages a variety of concepts and technologies, such as relational databases, data stores, data models and means of data transfer, as well as front-end solutions to give users access to data management and easy-to-use analysis options. Users are empowered to conduct their experiments from the experimental design to the visualization of their results through the platform. Here, we illustrate the feature-rich portal by simulating a biomedical study based on publicly available data. We demonstrate the software's strength in supporting the entire project life cycle. The software supports project design and registration, empowers users to do all-digital project management, and finally provides means to perform analysis. We compare our approach to Galaxy, one of the most widely used scientific workflow and analysis platforms in computational biology. Application of both systems to a small case study shows the differences between a data-driven approach (qPortal) and a workflow-driven approach (Galaxy). qPortal, a one-stop-shop solution for biomedical projects, offers up-to-date analysis pipelines, quality control workflows, and visualization tools. Through intensive user interactions, appropriate data models have been developed. 
These models form the foundation of our biological data management system and make it possible to annotate data, query metadata for statistics, and re-analyze data on high-performance computing systems via coupled workflow management systems. Integrating project management, data management, and workflow resources in one place presents clear advantages over existing solutions.

  17. A framework for evaluating and utilizing medical terminology mappings.

    PubMed

    Hussain, Sajjad; Sun, Hong; Sinaci, Anil; Erturkmen, Gokce Banu Laleci; Mead, Charles; Gray, Alasdair J G; McGuinness, Deborah L; Prud'Hommeaux, Eric; Daniel, Christel; Forsberg, Kerstin

    2014-01-01

    The use of medical terminologies, and of mappings across them, is considered a crucial prerequisite for achieving interoperable eHealth applications. Building upon the outcomes of several research projects, we introduce a framework for evaluating and utilizing terminology mappings that offers a platform for i) performing various mapping strategies, ii) representing terminology mappings together with their provenance information, and iii) enabling terminology reasoning to infer new mappings and to flag erroneous ones. We present results from applying the framework in the SALUS project, where we evaluated the quality of both existing and inferred terminology mappings among standard terminologies.

  18. A comparative analysis of the density of the SNOMED CT conceptual content for semantic harmonization

    PubMed Central

    He, Zhe; Geller, James; Chen, Yan

    2015-01-01

    Objectives Medical terminologies vary in the amount of concept information (the “density”) represented, even in the same sub-domains. This causes problems in terminology mapping, semantic harmonization and terminology integration. Moreover, complex clinical scenarios need to be encoded by a medical terminology with comprehensive content. SNOMED Clinical Terms (SNOMED CT), a leading clinical terminology, was reported to lack concepts and synonyms, problems that cannot be fully alleviated by using post-coordination. Therefore, a scalable solution is needed to enrich the conceptual content of SNOMED CT. We are developing a structure-based, algorithmic method to identify potential concepts for enriching the conceptual content of SNOMED CT and to support semantic harmonization of SNOMED CT with selected other Unified Medical Language System (UMLS) terminologies. Methods We first identified a subset of English terminologies in the UMLS that have ‘PAR’ relationship labeled with ‘IS_A’ and over 10% overlap with one or more of the 19 hierarchies of SNOMED CT. We call these “reference terminologies” and we note that our use of this name is different from the standard use. Next, we defined a set of topological patterns across pairs of terminologies, with SNOMED CT being one terminology in each pair and the other being one of the reference terminologies. We then explored how often these topological patterns appear between SNOMED CT and each reference terminology, and how to interpret them. Results Four viable reference terminologies were identified. Large density differences between terminologies were found. Expected interpretations of these differences were indeed observed, as follows. A random sample of 299 instances of special topological patterns (“2:3 and 3:2 trapezoids”) showed that 39.1% and 59.5% of analyzed concepts in SNOMED CT and in a reference terminology, respectively, were deemed to be alternative classifications of the same conceptual content. 
In 30.5% and 17.6% of the cases, it was found that intermediate concepts could be imported into SNOMED CT or into the reference terminology, respectively, to enhance their conceptual content, if approved by a human curator. Other cases included synonymy and errors in one of the terminologies. Conclusion These results show that structure-based algorithmic methods can be used to identify potential concepts to enrich SNOMED CT and the four reference terminologies. The comparative analysis has the future potential of supporting terminology authoring by suggesting new content to improve content coverage and semantic harmonization between terminologies. PMID:25890688

  19. The role of local terminologies in electronic health records. The HEGP experience.

    PubMed

    Daniel-Le Bozec, Christel; Steichen, Olivier; Dart, Thierry; Jaulent, Marie-Christine

    2007-01-01

    Despite decades of work, there is no universally accepted standard medical terminology, and no generally usable terminological tools have yet emerged. The local dictionary of concepts of the Georges Pompidou European Hospital (HEGP) is a Terminological System (TS) designed to support clinical data entry. It covers 93 data entry forms and contains definitions and synonyms of more than 5000 concepts, sometimes linked to reference terminologies such as ICD-10. In this article, we evaluate to what extent SNOMED CT could fully replace, or rather be mapped to, the local terminology system. We first describe the local dictionary of concepts of HEGP according to a published TS characterization framework. Then we discuss the specific role that a local terminology system plays with regard to reference terminologies.

  20. SQLGEN: a framework for rapid client-server database application development.

    PubMed

    Nadkarni, P M; Cheung, K H

    1995-12-01

    SQLGEN is a framework for rapid client-server relational database application development. It relies on an active data dictionary on the client machine that stores metadata on one or more database servers to which the client may be connected. The dictionary generates dynamic Structured Query Language (SQL) to perform common database operations; it also stores information about the access rights of the user at log-in time, which is used to partially self-configure the behavior of the client to disable inappropriate user actions. SQLGEN uses a microcomputer database as the client to store metadata in relational form, to transiently capture server data in tables, and to allow rapid application prototyping followed by porting to client-server mode with modest effort. SQLGEN is currently used in several production biomedical databases.
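    The dictionary-driven SQL generation SQLGEN performs can be sketched as follows; the dictionary layout, table, and column names here are hypothetical, since the abstract does not specify SQLGEN's actual metadata schema:

```python
def select_stmt(dictionary, table, columns=None, where=None):
    """Generate a SELECT statement from data-dictionary metadata,
    validating requested columns against the dictionary before emitting
    SQL (hypothetical dictionary shape: {table: {"columns": [...]}})."""
    meta = dictionary[table]
    cols = columns or list(meta["columns"])
    unknown = [c for c in cols if c not in meta["columns"]]
    if unknown:
        raise ValueError(f"unknown columns: {unknown}")
    sql = f"SELECT {', '.join(cols)} FROM {table}"
    if where:
        # qmark placeholders keep the generated SQL parameterized
        sql += " WHERE " + " AND ".join(f"{c} = ?" for c in where)
        return sql, [where[c] for c in where]
    return sql, []
```

    Validating names against client-side metadata, rather than trusting free-form input, is what lets such a dictionary disable inappropriate user actions before a statement ever reaches the server.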

  1. CrowdMapping: A Crowdsourcing-Based Terminology Mapping Method for Medical Data Standardization.

    PubMed

    Mao, Huajian; Chi, Chenyang; Huang, Boyu; Meng, Haibin; Yu, Jinghui; Zhao, Dongsheng

    2017-01-01

    Standardized terminology is the prerequisite for data exchange in the analysis of clinical processes. However, data from different electronic health record systems are based on idiosyncratic terminology systems, especially when the data come from different hospitals and healthcare organizations. Terminology standardization is necessary for medical data analysis. We propose a crowdsourcing-based terminology mapping method, CrowdMapping, to standardize the terminology in medical data. CrowdMapping uses a confidence model to determine how terminologies are mapped to a standard system such as ICD-10. The model uses mappings from different health care organizations and evaluates the diversity of the mappings to derive a more sophisticated mapping rule. Further, the CrowdMapping model enables users to rate the mapping results and interact with the model evaluation. CrowdMapping is a work-in-progress system; we present initial terminology-mapping results.
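    The paper's confidence model is not published, so the sketch below substitutes a plain majority vote with an agreement score over mappings submitted by several organizations; the organization names, local terms, and ICD-10 codes are invented for illustration:

```python
from collections import Counter

def crowd_map(submissions):
    """Combine terminology mappings from multiple organizations.
    submissions: {org: {local_term: standard_code}}. Returns, per local
    term, the majority code and an agreement score in [0, 1] -- a simple
    stand-in for a crowdsourced confidence model."""
    votes = {}
    for mapping in submissions.values():
        for term, code in mapping.items():
            votes.setdefault(term, Counter())[code] += 1
    result = {}
    for term, counter in votes.items():
        code, n = counter.most_common(1)[0]
        result[term] = (code, n / sum(counter.values()))
    return result
```

    Low agreement scores flag exactly the diverse mappings the abstract says need a more sophisticated rule or human review.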

  2. Information retrieval and terminology extraction in online resources for patients with diabetes.

    PubMed

    Seljan, Sanja; Baretić, Maja; Kucis, Vlasta

    2014-06-01

    Terminology use, as a means of information retrieval or document indexing, plays an important role in health literacy. Specific types of users, such as patients with diabetes, need access to various online resources (in a foreign and/or native language) when searching for information on self-education in basic diabetic knowledge, on self-care activities regarding the importance of dietetic food, medications and physical exercise, and on self-management of insulin pumps. Automatic extraction of corpus-based terminology from online texts, manuals or professional papers can help in building terminology lists or lists of "browsing phrases" useful in information retrieval or in document indexing. Specific terminology lists represent an intermediate step between free text search and controlled vocabulary, between users' demands and existing online resources in native and foreign languages. The research, aiming to detect the role of terminology in online resources, is conducted on English and Croatian manuals and Croatian online texts, and is divided into three interrelated parts: i) comparison of professional and popular terminology use; ii) evaluation of automatic statistically-based terminology extraction on English and Croatian texts; iii) comparison and evaluation of terminology extracted from an English manual using statistical and hybrid approaches. Extracted terminology candidates are evaluated by comparison with three types of reference lists: a list created by a professional medical person, a list of highly professional vocabulary contained in MeSH, and a list created by non-medical persons, made as the intersection of 15 lists. Results report on the use of popular and professional terminology in online diabetes resources, on the evaluation of automatically extracted terminology candidates in English and Croatian texts, and on the comparison of statistical and hybrid extraction methods in English text. 
Evaluation of automatic and semi-automatic terminology extraction methods is performed by recall, precision and f-measure.
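    The evaluation step above uses the standard set-based measures; a minimal implementation (the term lists in the test are invented for illustration):

```python
def evaluate_terms(extracted, reference):
    """Score automatically extracted term candidates against a reference
    list: precision, recall and F-measure over the two term sets."""
    ex, ref = set(extracted), set(reference)
    tp = len(ex & ref)  # candidates that appear in the reference list
    precision = tp / len(ex) if ex else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

    Exact set matching is the simplest variant; real evaluations often also credit near matches (e.g. inflectional variants), which matters for a morphologically rich language like Croatian.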

  3. A usability evaluation of a SNOMED CT based compositional interface terminology for intensive care.

    PubMed

    Bakhshi-Raiez, F; de Keizer, N F; Cornet, R; Dorrepaal, M; Dongelmans, D; Jaspers, M W M

    2012-05-01

    To evaluate the usability of a large compositional interface terminology based on SNOMED CT and the terminology application for registration of the reasons for intensive care admission in a Patient Data Management System. Observational study with user-based usability evaluations before and 3 months after the system was implemented and routinely used. Usability was defined by five aspects: effectiveness, efficiency, learnability, overall user satisfaction, and experienced usability problems. Qualitative (the Think-Aloud user testing method) and quantitative (the System Usability Scale questionnaire and Time-on-Task analyses) methods were used to examine these usability aspects. The results of the evaluation study revealed that the usability of the interface terminology fell short (SUS scores of 47.2 and 37.5 out of 100 before and after implementation, respectively). The qualitative measurements revealed a high number (n=35) of distinct usability problems, leading to ineffective and inefficient registration of reasons for admission. The effectiveness and efficiency of the system did not change over time. About 14% (n=5) of the revealed usability problems were related to the terminology content based on SNOMED CT, while the remaining 86% (n=30) were related to the terminology application. The problems related to the terminology content were more severe than the problems related to the terminology application. This study provides a detailed insight into how clinicians interact with a controlled compositional terminology through a terminology application. The extensiveness, the complexity of the hierarchy, and the language usage of an interface terminology are decisive for its usability. Carefully crafted domain-specific subsets and a well-designed terminology application are needed to facilitate the use of a complex compositional interface terminology based on SNOMED CT. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
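    The SUS scores quoted above come from the standard System Usability Scale computation: ten 1-5 Likert items, with odd-numbered items scored as (response - 1), even-numbered items as (5 - response), and the sum scaled by 2.5 to a 0-100 range. A sketch:

```python
def sus_score(responses):
    """Compute a System Usability Scale score (0-100) from ten 1-5
    Likert responses, using the standard SUS scoring rule."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    contrib = [(r - 1) if i % 2 == 0 else (5 - r)  # i=0 is item 1 (odd)
               for i, r in enumerate(responses)]
    return sum(contrib) * 2.5
```

    A score around 68 is commonly cited as average, which puts the reported 47.2 and 37.5 well below typical usability.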

  4. MOPED enables discoveries through consistently processed proteomics data

    PubMed Central

    Higdon, Roger; Stewart, Elizabeth; Stanberry, Larissa; Haynes, Winston; Choiniere, John; Montague, Elizabeth; Anderson, Nathaniel; Yandl, Gregory; Janko, Imre; Broomall, William; Fishilevich, Simon; Lancet, Doron; Kolker, Natali; Kolker, Eugene

    2014-01-01

    The Model Organism Protein Expression Database (MOPED, http://moped.proteinspire.org) is an expanding proteomics resource to enable biological and biomedical discoveries. MOPED aggregates simple, standardized and consistently processed summaries of protein expression and metadata from proteomics (mass spectrometry) experiments from human and model organisms (mouse, worm and yeast). The latest version of MOPED adds new estimates of protein abundance and concentration, as well as relative (differential) expression data. MOPED provides a new updated query interface that allows users to explore information by organism, tissue, localization, condition, experiment, or keyword. MOPED supports the Human Proteome Project’s efforts to generate chromosome- and disease-specific proteomes by providing links from proteins to chromosome and disease information, as well as many complementary resources. MOPED supports a new omics metadata checklist in order to harmonize data integration, analysis and use. MOPED’s development is driven by the user community, which spans 90 countries, guiding future development that will transform MOPED into a multi-omics resource. MOPED encourages users to submit data in a simple format. They can use the metadata checklist to generate a data publication for the submission. As a result, MOPED will provide even greater insights into complex biological processes and systems and enable deeper and more comprehensive biological and biomedical discoveries. PMID:24350770

  5. Ontology patterns for tabular representations of biomedical knowledge on neglected tropical diseases

    PubMed Central

    Santana, Filipe; Schober, Daniel; Medeiros, Zulma; Freitas, Fred; Schulz, Stefan

    2011-01-01

    Motivation: Ontology-like domain knowledge is frequently published in a tabular format embedded in scientific publications. We explore the re-use of such tabular content in the process of building NTDO, an ontology of neglected tropical diseases (NTDs), where the representation of the interdependencies between hosts, pathogens and vectors plays a crucial role. Results: As a proof of concept we analyzed a tabular compilation of knowledge about pathogens, vectors and geographic locations involved in the transmission of NTDs. After a thorough ontological analysis of the domain of interest, we formulated a comprehensive design pattern, rooted in the biomedical domain upper level ontology BioTop. This pattern was implemented in a VBA script which takes cell contents of an Excel spreadsheet and transforms them into OWL-DL. After minor manual post-processing, the correctness and completeness of the ontology was tested using pre-formulated competence questions as description logics (DL) queries. The expected results could be reproduced by the ontology. The proposed approach is recommended for optimizing the acquisition of ontological domain knowledge from tabular representations. Availability and implementation: Domain examples, source code and ontology are freely available on the web at http://www.cin.ufpe.br/~ntdo. Contact: fss3@cin.ufpe.br PMID:21685092

  6. The Function Biomedical Informatics Research Network Data Repository.

    PubMed

    Keator, David B; van Erp, Theo G M; Turner, Jessica A; Glover, Gary H; Mueller, Bryon A; Liu, Thomas T; Voyvodic, James T; Rasmussen, Jerod; Calhoun, Vince D; Lee, Hyo Jong; Toga, Arthur W; McEwen, Sarah; Ford, Judith M; Mathalon, Daniel H; Diaz, Michele; O'Leary, Daniel S; Jeremy Bockholt, H; Gadde, Syam; Preda, Adrian; Wible, Cynthia G; Stern, Hal S; Belger, Aysenil; McCarthy, Gregory; Ozyurt, Burak; Potkin, Steven G

    2016-01-01

    The Function Biomedical Informatics Research Network (FBIRN) developed methods and tools for conducting multi-scanner functional magnetic resonance imaging (fMRI) studies. Method and tool development were based on two major goals: 1) to assess the major sources of variation in fMRI studies conducted across scanners, including instrumentation, acquisition protocols, challenge tasks, and analysis methods, and 2) to provide a distributed network infrastructure and an associated federated database to host and query large, multi-site, fMRI and clinical data sets. In the process of achieving these goals the FBIRN test bed generated several multi-scanner brain imaging data sets to be shared with the wider scientific community via the BIRN Data Repository (BDR). The FBIRN Phase 1 data set consists of a traveling subject study of 5 healthy subjects, each scanned on 10 different 1.5 to 4 T scanners. The FBIRN Phase 2 and Phase 3 data sets consist of subjects with schizophrenia or schizoaffective disorder along with healthy comparison subjects scanned at multiple sites. In this paper, we provide concise descriptions of FBIRN's multi-scanner brain imaging data sets and details about the BIRN Data Repository instance of the Human Imaging Database (HID) used to publicly share the data. Copyright © 2015 Elsevier Inc. All rights reserved.

  7. An Ebola virus-centered knowledge base

    PubMed Central

    Kamdar, Maulik R.; Dumontier, Michel

    2015-01-01

    Ebola virus (EBOV), of the family Filoviridae, is a NIAID category A, lethal human pathogen. It causes Ebola virus disease (EVD), a severe hemorrhagic fever, and has a cumulative death rate of 41% in the ongoing epidemic in West Africa. There is an ever-increasing need to consolidate and make available all the knowledge that we possess on EBOV, even if it is conflicting or incomplete. This would enable biomedical researchers to understand the molecular mechanisms underlying this disease and help develop tools for efficient diagnosis and effective treatment. In this article, we present our approach for the development of an Ebola virus-centered Knowledge Base (Ebola-KB) using Linked Data and Semantic Web Technologies. We retrieve and aggregate knowledge from several open data sources, web services and biomedical ontologies. This knowledge is transformed to RDF, linked to the Bio2RDF datasets and made available through a SPARQL 1.1 Endpoint. Ebola-KB can also be explored using an interactive Dashboard visualizing the different perspectives of this integrated knowledge. We showcase how different competency questions, asked by domain users researching the druggability of EBOV, can be formulated as SPARQL Queries or answered using the Ebola-KB Dashboard. Database URL: http://ebola.semanticscience.org. PMID:26055098
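
    As a hedged illustration of how such a competency question might be phrased, the sketch below builds a SPARQL 1.1 query as a string. The prefix and predicates (ekb:targets, ekb:approvalStatus, ekb:partOf) are invented placeholders, not Ebola-KB's actual vocabulary.

```python
# Sketch of a druggability competency question ("which approved drugs target
# an EBOV protein?") as SPARQL 1.1 against a hypothetical graph layout.

def druggability_query(status="approved", limit=20):
    """Build a SPARQL query for drugs targeting EBOV proteins."""
    return f"""PREFIX ekb: <http://example.org/ebola-kb#>
SELECT ?drug ?protein WHERE {{
    ?drug ekb:targets ?protein .
    ?protein ekb:partOf ekb:EBOV .
    ?drug ekb:approvalStatus "{status}" .
}} LIMIT {limit}"""

query = druggability_query()
print(query)
```

    In practice such a string would be submitted to the knowledge base's SPARQL 1.1 endpoint over HTTP.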

  8. An Ebola virus-centered knowledge base.

    PubMed

    Kamdar, Maulik R; Dumontier, Michel

    2015-01-01

    Ebola virus (EBOV), of the family Filoviridae, is a NIAID category A, lethal human pathogen. It causes Ebola virus disease (EVD), a severe hemorrhagic fever, and has a cumulative death rate of 41% in the ongoing epidemic in West Africa. There is an ever-increasing need to consolidate and make available all the knowledge that we possess on EBOV, even if it is conflicting or incomplete. This would enable biomedical researchers to understand the molecular mechanisms underlying this disease and help develop tools for efficient diagnosis and effective treatment. In this article, we present our approach for the development of an Ebola virus-centered Knowledge Base (Ebola-KB) using Linked Data and Semantic Web Technologies. We retrieve and aggregate knowledge from several open data sources, web services and biomedical ontologies. This knowledge is transformed to RDF, linked to the Bio2RDF datasets and made available through a SPARQL 1.1 Endpoint. Ebola-KB can also be explored using an interactive Dashboard visualizing the different perspectives of this integrated knowledge. We showcase how different competency questions, asked by domain users researching the druggability of EBOV, can be formulated as SPARQL Queries or answered using the Ebola-KB Dashboard. © The Author(s) 2015. Published by Oxford University Press.

  9. DEVELOPING THE TRANSDISCIPLINARY AGING RESEARCH AGENDA: NEW DEVELOPMENTS IN BIG DATA.

    PubMed

    Callaghan, Christian William

    2017-07-19

    In light of dramatic advances in big data analytics and the application of these advances in certain scientific fields, new potentialities exist for breakthroughs in aging research. Translating these new potentialities to research outcomes for aging populations, however, remains a challenge, as the underlying technologies that have enabled exponential increases in 'big data' have not yet enabled a commensurate era of 'big knowledge,' or similarly exponential increases in biomedical breakthroughs. Debates also reveal differences in the literature, with some arguing that big data analytics heralds a new era associated with the 'end of theory,' one that makes the scientific method obsolete because correlation supersedes causation and science can advance without theory and hypothesis testing. Others argue that theory cannot be subordinate to data, no matter how comprehensive data coverage ultimately becomes. Given these two tensions, namely between exponential increases in data absent exponential increases in biomedical research outputs, and between the promise of comprehensive data coverage and data-driven inductive versus theory-driven deductive modes of enquiry, this paper provides a critical review of theory and literature that offers useful perspectives on developments in big data analytics and their theoretical implications for aging research. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  10. Authors report lack of time as main reason for unpublished research presented at biomedical conferences: a systematic review.

    PubMed

    Scherer, Roberta W; Ugarte-Gil, Cesar; Schmucker, Christine; Meerpohl, Joerg J

    2015-07-01

    To systematically review reports that queried abstract authors about reasons for not subsequently publishing abstract results as full-length articles. Systematic review of MEDLINE, EMBASE, The Cochrane Library, ISI Web of Science, and study bibliographies for empirical studies in which investigators examined subsequent full publication of results presented at a biomedical conference and reasons for nonpublication. The mean full publication rate was 55.9% [95% confidence interval (CI): 54.8%, 56.9%] for 24 of 27 eligible reports providing this information and 73.0% (95% CI: 71.2%, 74.7%) for seven reports of abstracts describing clinical trials. Twenty-four studies itemized 1,831 reasons for nonpublication, and six itemized 428 reasons considered the most important reason. "Lack of time" was the most frequently reported reason [weighted average = 30.2% (95% CI: 27.9%, 32.4%)] and the most important reason [weighted average = 38.4% (95% CI: 33.7%, 43.2%)]. Other commonly stated reasons were "lack of time and/or resources," "publication not an aim," "low priority," "incomplete study," and "trouble with co-authors." Across medical specialties, the main reasons for not subsequently publishing an abstract in full lie with factors related to the abstract author rather than with journals. Copyright © 2015 Elsevier Inc. All rights reserved.

  11. Biotea: RDFizing PubMed Central in support for the paper as an interface to the Web of Data

    PubMed Central

    2013-01-01

    Background The World Wide Web has become a dissemination platform for scientific and non-scientific publications. However, most of the information remains locked up in discrete documents that are not always interconnected or machine-readable. The connectivity tissue provided by RDF technology has not yet been widely used to support the generation of self-describing, machine-readable documents. Results In this paper, we present our approach to the generation of self-describing machine-readable scholarly documents. We understand the scientific document as an entry point and interface to the Web of Data. We have semantically processed the full-text, open-access subset of PubMed Central. Our RDF model and resulting dataset make extensive use of existing ontologies and semantic enrichment services. We expose our model, services, prototype, and datasets at http://biotea.idiginfo.org/ Conclusions The semantic processing of biomedical literature presented in this paper embeds documents within the Web of Data and facilitates the execution of concept-based queries against the entire digital library. Our approach delivers a flexible and adaptable set of tools for metadata enrichment and semantic processing of biomedical documents. Our model delivers a semantically rich and highly interconnected dataset with self-describing content so that software can make effective use of it. PMID:23734622

  12. GEOGLE: context mining tool for the correlation between gene expression and the phenotypic distinction.

    PubMed

    Yu, Yao; Tu, Kang; Zheng, Siyuan; Li, Yun; Ding, Guohui; Ping, Jie; Hao, Pei; Li, Yixue

    2009-08-25

    In the post-genomic era, the development of high-throughput gene expression detection technology provides huge amounts of experimental data, which challenges the traditional pipelines for data processing and analysis in scientific research. In our work, we integrated gene expression information from the Gene Expression Omnibus (GEO), biomedical ontology from Medical Subject Headings (MeSH), and signaling pathway knowledge from sigPathway entries to develop a context mining tool for gene expression analysis - GEOGLE. GEOGLE offers a rapid and convenient way to search relevant experimental datasets, pathways and biological terms according to multiple types of queries: biomedical vocabularies, GDS IDs, gene IDs, pathway names and signature lists. Moreover, GEOGLE summarizes the signature genes from a subset of GDSes and estimates the correlation between gene expression and the phenotypic distinction with an integrated p value. This approach, which performs a global search of expression data, may expand the traditional way of collecting heterogeneous gene expression experiment data. GEOGLE is a novel tool that provides researchers a quantitative way to understand the correlation between gene expression and phenotypic distinction through meta-analysis of gene expression datasets from different experiments, as well as the biological meaning behind it. The web site and user guide of GEOGLE are available at: http://omics.biosino.org:14000/kweb/workflow.jsp?id=00020.
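
    The abstract does not specify how GEOGLE's integrated p value is computed; one standard way to combine independent per-dataset p values is Fisher's method, sketched below with only the standard library (the closed-form chi-square survival function applies because the degrees of freedom, 2k, are even).

```python
import math

def fisher_combined_p(pvalues):
    """Combine independent p values with Fisher's method.

    X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees
    of freedom under the null; for even df the survival function has the
    closed form exp(-x/2) * sum_{j<k} (x/2)**j / j!.
    """
    k = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total

# Three datasets that individually give only weak evidence combine into a
# stronger overall signal (the combined p is below each individual p here).
print(fisher_combined_p([0.04, 0.10, 0.07]))
```

    Whether GEOGLE uses Fisher's method or another combination rule is not stated; this is only a plausible stand-in for "integrated p value."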

  13. Standardization of terminology in dermoscopy/dermatoscopy: Results of the third consensus conference of the International Society of Dermoscopy.

    PubMed

    Kittler, Harald; Marghoob, Ashfaq A; Argenziano, Giuseppe; Carrera, Cristina; Curiel-Lewandrowski, Clara; Hofmann-Wellenhof, Rainer; Malvehy, Josep; Menzies, Scott; Puig, Susana; Rabinovitz, Harold; Stolz, Wilhelm; Saida, Toshiaki; Soyer, H Peter; Siegel, Eliot; Stoecker, William V; Scope, Alon; Tanaka, Masaru; Thomas, Luc; Tschandl, Philipp; Zalaudek, Iris; Halpern, Allan

    2016-06-01

    Evolving dermoscopic terminology motivated us to initiate a new consensus. We sought to establish a dictionary of standardized terms. We reviewed the medical literature, conducted a survey, and convened a discussion among experts. Two competing terminologies exist: a more metaphoric terminology that includes numerous terms and a descriptive terminology based on 5 basic terms. In a survey among members of the International Society of Dermoscopy (IDS), 23.5% (n = 201) of participants preferentially use descriptive terminology, 20.1% (n = 172) use metaphoric terminology, and 56.5% (n = 484) use both. More participants who had been initially trained by metaphoric terminology prefer using descriptive terminology than vice versa (9.7% vs 2.6%, P < .001). Most new terms published since the last consensus conference in 2003 were unknown to the majority of the participants. There was uniform consensus that both terminologies are suitable, that metaphoric terms need definitions, that synonyms should be avoided, and that the creation of new metaphoric terms should be discouraged. The expert panel proposed a dictionary of standardized terms taking account of metaphoric and descriptive terms. A consensus seeks a workable compromise but does not guarantee its implementation. The new consensus provides a revised framework of standardized terms to enhance the consistent use of dermoscopic terminology. Copyright © 2015 American Academy of Dermatology, Inc. Published by Elsevier Inc. All rights reserved.

  14. Standardization of terminology in dermoscopy/dermatoscopy: Results of the third consensus conference of the International Society of Dermoscopy

    PubMed Central

    Kittler, Harald; Marghoob, Ashfaq A.; Argenziano, Giuseppe; Carrera, Cristina; Curiel-Lewandrowski, Clara; Hofmann-Wellenhof, Rainer; Malvehy, Josep; Menzies, Scott; Puig, Susana; Rabinovitz, Harold; Stolz, Wilhelm; Saida, Toshiaki; Soyer, H. Peter; Siegel, Eliot; Stoecker, William V.; Scope, Alon; Tanaka, Masaru; Thomas, Luc; Tschandl, Philipp; Zalaudek, Iris; Halpern, Allan

    2017-01-01

    Background Evolving dermoscopic terminology motivated us to initiate a new consensus. Objective We sought to establish a dictionary of standardized terms. Methods We reviewed the medical literature, conducted a survey, and convened a discussion among experts. Results Two competing terminologies exist: a more metaphoric terminology that includes numerous terms and a descriptive terminology based on 5 basic terms. In a survey among members of the International Society of Dermoscopy (IDS), 23.5% (n = 201) of participants preferentially use descriptive terminology, 20.1% (n = 172) use metaphoric terminology, and 56.5% (n = 484) use both. More participants who had been initially trained by metaphoric terminology prefer using descriptive terminology than vice versa (9.7% vs 2.6%, P < .001). Most new terms published since the last consensus conference in 2003 were unknown to the majority of the participants. There was uniform consensus that both terminologies are suitable, that metaphoric terms need definitions, that synonyms should be avoided, and that the creation of new metaphoric terms should be discouraged. The expert panel proposed a dictionary of standardized terms taking account of metaphoric and descriptive terms. Limitations A consensus seeks a workable compromise but does not guarantee its implementation. Conclusion The new consensus provides a revised framework of standardized terms to enhance the consistent use of dermoscopic terminology. PMID:26896294

  15. TERMTrial--terminology-based documentation systems for cooperative clinical trials.

    PubMed

    Merzweiler, A; Weber, R; Garde, S; Haux, R; Knaup-Gregori, P

    2005-04-01

    Within cooperative groups of multi-center clinical trials, standardized documentation is a prerequisite for communication and sharing of data. Standardizing documentation systems means standardizing the underlying terminology. The management and consistent application of terminology systems is a difficult and fault-prone task, which should be supported by appropriate software tools. Today, documentation systems for clinical trials are often implemented as so-called Remote-Data-Entry-Systems (RDE-systems). Although there are many commercial systems that support the development of RDE-systems, none offers comprehensive terminological support. Therefore, we developed the software system TERMTrial, which consists of a component for the definition and management of terminology systems for cooperative groups of clinical trials and two components for the terminology-based automatic generation of trial databases and terminology-based interactive design of electronic case report forms (eCRFs). TERMTrial combines the advantages of remote data entry with comprehensive terminological control.

  16. Using multi-terminology indexing for the assignment of MeSH descriptors to health resources in a French online catalogue.

    PubMed

    Pereira, Suzanne; Névéol, Aurélie; Kerdelhué, Gaétan; Serrot, Elisabeth; Joubert, Michel; Darmoni, Stéfan J

    2008-11-06

    To assist with the development of a French online quality-controlled health gateway (CISMeF), an automatic indexing tool assigning MeSH descriptors to medical text in French was created. The French Multi-Terminology Indexer (F-MTI) relies on a multi-terminology approach involving four prominent medical terminologies and the mappings between them. In this paper, we compare lemmatization and stemming as methods to process French medical text for indexing. We also evaluate the multi-terminology approach implemented in F-MTI. The indexing strategies were assessed on a corpus of 18,814 resources indexed manually. There is little difference in the indexing performance when lemmatization or stemming is used. However, the multi-terminology approach outperforms indexing relying on a single terminology in terms of recall. F-MTI will soon be used in the CISMeF production environment and in a Health Multi-Terminology Server in French.
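
    A toy sketch of multi-terminology concept extraction with a simple count-based vote, in which concepts found in more terminologies rank higher. The tiny dictionaries and crude suffix-stripping stemmer are illustrative stand-ins, not F-MTI's actual resources or algorithm.

```python
from collections import Counter

# Toy dictionaries standing in for MeSH / SNOMED / ICD-10 entries; each maps
# a stemmed surface form to a preferred concept. Contents are invented.
TERMINOLOGIES = {
    "MeSH":   {"fever": "Fever", "malaria": "Malaria"},
    "SNOMED": {"fever": "Fever (finding)", "headach": "Headache"},
    "ICD10":  {"malaria": "B54 Malaria"},
}

def stem(token):
    """Crude suffix stripping (a stand-in for a real stemmer/lemmatizer)."""
    for suffix in ("es", "s", "e"):
        if token.endswith(suffix):
            return token[: -len(suffix)]
    return token

def extract_concepts(text):
    """Match stemmed tokens against every terminology; vote by count."""
    votes = Counter()
    for token in text.lower().split():
        s = stem(token)
        for entries in TERMINOLOGIES.values():
            if s in entries:
                votes[s] += 1
    # Concepts supported by more terminologies rank first.
    return [concept for concept, _ in votes.most_common()]

print(extract_concepts("Patient presents fevers and malaria with headaches"))
```

    Real systems score candidates with more refined voting models (and map stems back to preferred terms per terminology), but the ranking-by-agreement idea is the same.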

  17. Assessment of incidental learning of medical terminology in a veterinary curriculum.

    PubMed

    Ainsworth, A Jerald; Hardin, Laura; Robertson, Stanley

    2007-01-01

    The objective of this study was to determine whether students in a veterinary curriculum at Mississippi State University would gain an understanding of medical terminology, as they matriculate through their courses, comparable to that obtained during a focused medical terminology unit of study. Evaluation of students' incidental learning related to medical terminology during the 2004/2005 and 2005/2006 academic years indicated that 88.7% and 81.9% of students, respectively, scored above 70% on a medical terminology exam by the end of the first year of the curriculum. For the 2004/2005 academic year, 67.6% of students increased their score above 70% from the first medical terminology exam to the third. For the 2005/2006 academic year, 61.1% of students increased their score above 70% from the first to the third exam. Our data indicate that students can achieve comprehension of medical terminology in the absence of a formal terminology course.

  18. SchizConnect: Mediating neuroimaging databases on schizophrenia and related disorders for large-scale integration.

    PubMed

    Wang, Lei; Alpert, Kathryn I; Calhoun, Vince D; Cobia, Derin J; Keator, David B; King, Margaret D; Kogan, Alexandr; Landis, Drew; Tallis, Marcelo; Turner, Matthew D; Potkin, Steven G; Turner, Jessica A; Ambite, Jose Luis

    2016-01-01

    SchizConnect (www.schizconnect.org) is built to address the issues of multiple data repositories in schizophrenia neuroimaging studies. It includes a level of mediation--translating across data sources--so that the user can place one query, e.g. for diffusion images from male individuals with schizophrenia, and find out from across participating data sources how many datasets there are, as well as downloading the imaging and related data. The current version handles the Data Usage Agreements across different studies, as well as interpreting database-specific terminologies into a common framework. New data repositories can also be mediated to bring immediate access to existing datasets. Compared with centralized, upload data sharing models, SchizConnect is a unique, virtual database with a focus on schizophrenia and related disorders that can mediate live data as information is being updated at each data source. It is our hope that SchizConnect can facilitate testing new hypotheses through aggregated datasets, promoting discovery related to the mechanisms underlying schizophrenic dysfunction. Copyright © 2015 Elsevier Inc. All rights reserved.

  19. Hippocampome.org: a knowledge base of neuron types in the rodent hippocampus

    PubMed Central

    Wheeler, Diek W; White, Charise M; Rees, Christopher L; Komendantov, Alexander O; Hamilton, David J; Ascoli, Giorgio A

    2015-01-01

    Hippocampome.org is a comprehensive knowledge base of neuron types in the rodent hippocampal formation (dentate gyrus, CA3, CA2, CA1, subiculum, and entorhinal cortex). Although the hippocampal literature is remarkably information-rich, neuron properties are often reported with incompletely defined and notoriously inconsistent terminology, creating a formidable challenge for data integration. Our extensive literature mining and data reconciliation identified 122 neuron types based on neurotransmitter, axonal and dendritic patterns, synaptic specificity, electrophysiology, and molecular biomarkers. All ∼3700 annotated properties are individually supported by specific evidence (∼14,000 pieces) in peer-reviewed publications. Systematic analysis of this unprecedented amount of machine-readable information reveals novel correlations among neuron types and properties, the potential connectivity of the full hippocampal circuitry, and outstanding knowledge gaps. User-friendly browsing and online querying of Hippocampome.org may aid design and interpretation of both experiments and simulations. This powerful, simple, and extensible neuron classification endeavor is unique in its detail, utility, and completeness. DOI: http://dx.doi.org/10.7554/eLife.09960.001 PMID:26402459

  20. A semantic-web oriented representation of the clinical element model for secondary use of electronic health records data.

    PubMed

    Tao, Cui; Jiang, Guoqian; Oniki, Thomas A; Freimuth, Robert R; Zhu, Qian; Sharma, Deepak; Pathak, Jyotishman; Huff, Stanley M; Chute, Christopher G

    2013-05-01

    The clinical element model (CEM) is an information model designed for representing clinical information in electronic health records (EHR) systems across organizations. The current representation of CEMs does not support formal semantic definitions and therefore it is not possible to perform reasoning and consistency checking on derived models. This paper introduces our efforts to represent the CEM specification using the Web Ontology Language (OWL). The CEM-OWL representation connects the CEM content with the Semantic Web environment, which provides authoring, reasoning, and querying tools. This work may also facilitate the harmonization of the CEMs with domain knowledge represented in terminology models as well as other clinical information models such as the openEHR archetype model. We have created the CEM-OWL meta ontology based on the CEM specification. A convertor has been implemented in Java to automatically translate detailed CEMs from XML to OWL. A panel evaluation has been conducted, and the results show that the OWL modeling can faithfully represent the CEM specification and represent patient data.
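
    The paper's converter is written in Java and follows the CEM-OWL meta ontology; the sketch below only illustrates the general XML-to-OWL translation idea in Python. The input fragment and the emitted class names are invented for illustration, not the actual CEM XML schema or CEM-OWL output.

```python
import xml.etree.ElementTree as ET

# A hypothetical detailed-CEM fragment; element and attribute names are
# illustrative assumptions, not the real CEM XML schema.
CEM_XML = """<cem name="SystolicBP">
  <data type="PhysicalQuantity"/>
  <qualifier name="BodyPosition" type="CD"/>
</cem>"""

def cem_to_owl(xml_text):
    """Translate one CEM definition into OWL classes serialized as Turtle."""
    root = ET.fromstring(xml_text)
    name = root.get("name")
    lines = [
        "@prefix cem: <http://example.org/cem#> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        f"cem:{name} a owl:Class .",
    ]
    for q in root.findall("qualifier"):
        lines.append(
            f"cem:{name}_{q.get('name')} a owl:Class ; "
            f"rdfs:subClassOf cem:{q.get('type')} ."
        )
    return "\n".join(lines)

print(cem_to_owl(CEM_XML))
```

    With the models in OWL, a reasoner can then perform the consistency checking over derived models that the plain XML representation cannot support.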

  1. A semantic-web oriented representation of the clinical element model for secondary use of electronic health records data

    PubMed Central

    Tao, Cui; Jiang, Guoqian; Oniki, Thomas A; Freimuth, Robert R; Zhu, Qian; Sharma, Deepak; Pathak, Jyotishman; Huff, Stanley M; Chute, Christopher G

    2013-01-01

    The clinical element model (CEM) is an information model designed for representing clinical information in electronic health records (EHR) systems across organizations. The current representation of CEMs does not support formal semantic definitions and therefore it is not possible to perform reasoning and consistency checking on derived models. This paper introduces our efforts to represent the CEM specification using the Web Ontology Language (OWL). The CEM-OWL representation connects the CEM content with the Semantic Web environment, which provides authoring, reasoning, and querying tools. This work may also facilitate the harmonization of the CEMs with domain knowledge represented in terminology models as well as other clinical information models such as the openEHR archetype model. We have created the CEM-OWL meta ontology based on the CEM specification. A convertor has been implemented in Java to automatically translate detailed CEMs from XML to OWL. A panel evaluation has been conducted, and the results show that the OWL modeling can faithfully represent the CEM specification and represent patient data. PMID:23268487

  2. Standard terminology in the laboratory and classroom

    NASA Technical Reports Server (NTRS)

    Strehlow, Richard A.

    1992-01-01

    Each of the materials produced by modern technologists is associated with a family of immaterials--all the concepts of substance, process, and purpose. It is concepts that are essential to transfer knowledge. It is concepts that are the stuff of terminology. Terminology is standardized today by companies, standards organizations, governments, and other groups. Simply described, it is the pre-negotiation of the meanings of terms. Terminology has become a key issue in businesses, and terminology knowledge is essential in understanding the modern world. The following is an introductory workshop discussing the concepts of terminology and methods of its standardization.

  3. Standard Anatomic Terminologies: Comparison for Use in a Health Information Exchange–Based Prior Computed Tomography (CT) Alerting System

    PubMed Central

    Lowry, Tina; Vreeman, Daniel J; Loo, George T; Delman, Bradley N; Thum, Frederick L; Slovis, Benjamin H; Shapiro, Jason S

    2017-01-01

    Background A health information exchange (HIE)–based prior computed tomography (CT) alerting system may reduce avoidable CT imaging by notifying ordering clinicians of prior relevant studies when a study is ordered. For maximal effectiveness, a system would alert not only for prior same CTs (exams mapped to the same code from an exam name terminology) but also for similar CTs (exams mapped to different exam name terminology codes but in the same anatomic region) and anatomically proximate CTs (exams in adjacent anatomic regions). Notification of previous same studies across an HIE requires mapping of local site CT codes to a standard terminology for exam names (such as Logical Observation Identifiers Names and Codes [LOINC]) to show that two studies with different local codes and descriptions are equivalent. Notifying of prior similar or proximate CTs requires an additional mapping of exam codes to anatomic regions, ideally coded by an anatomic terminology. Several anatomic terminologies exist, but no prior studies have evaluated how well they would support an alerting use case. Objective The aim of this study was to evaluate the fitness of five existing standard anatomic terminologies to support similar or proximate alerts of an HIE-based prior CT alerting system. Methods We compared five standard anatomic terminologies (Foundational Model of Anatomy, Systematized Nomenclature of Medicine Clinical Terms, RadLex, LOINC, and LOINC/Radiological Society of North America [RSNA] Radiology Playbook) to an anatomic framework created specifically for our use case (Simple ANatomic Ontology for Proximity or Similarity [SANOPS]), to determine whether the existing terminologies could support our use case without modification. On the basis of an assessment of optimal terminology features for our purpose, we developed an ordinal anatomic terminology utility classification. We mapped samples of 100 random and the 100 most frequent LOINC CT codes to anatomic regions in each terminology, assigned utility classes for each mapping, and statistically compared each terminology’s utility class rankings. We also constructed seven hypothetical alerting scenarios to illustrate the terminologies’ differences. Results Both RadLex and the LOINC/RSNA Radiology Playbook anatomic terminologies ranked significantly better (P<.001) than the other standard terminologies for the 100 most frequent CTs, but no terminology ranked significantly better than any other for 100 random CTs. Hypothetical scenarios illustrated instances where no standard terminology would support appropriate proximate or similar alerts, without modification. Conclusions LOINC/RSNA Radiology Playbook and RadLex’s anatomic terminologies appear well suited to support proximate or similar alerts for commonly ordered CTs, but for less commonly ordered tests, modification of the existing terminologies with concepts and relations from SANOPS would likely be required. Our findings suggest SANOPS may serve as a framework for enhancing anatomic terminologies in support of other similar use cases. PMID:29242174

  4. Kaiser Permanente's Convergent Medical Terminology.

    PubMed

    Dolin, Robert H; Mattison, John E; Cohn, Simon; Campbell, Keith E; Wiesenthal, Andrew M; Hochhalter, Brad; LaBerge, Diane; Barsoum, Rita; Shalaby, James; Abilla, Alan; Clements, Robert J; Correia, Carol M; Esteva, Diane; Fedack, John M; Goldberg, Bruce J; Gopalarao, Sridhar; Hafeza, Eza; Hendler, Peter; Hernandez, Enrique; Kamangar, Ron; Kahn, Rafique A; Kurtovich, Georgina; Lazzareschi, Gerry; Lee, Moon H; Lee, Tracy; Levy, David; Lukoff, Jonathan Y; Lundberg, Cyndie; Madden, Michael P; Ngo, Trongtu L; Nguyen, Ben T; Patel, Nikhilkumar P; Resneck, Jim; Ross, David E; Schwarz, Kathleen M; Selhorst, Charles C; Snyder, Aaron; Umarji, Mohamed I; Vilner, Max; Zer-Chen, Roy; Zingo, Chris

    2004-01-01

    This paper describes Kaiser Permanente's (KP) enterprise-wide medical terminology solution, referred to as our Convergent Medical Terminology (CMT). Initially developed to serve the needs of a regional electronic health record, CMT has evolved into a core KP asset, serving as the common terminology across all applications. CMT serves as the definitive source of concept definitions for the organization, provides a consistent structure and access method to all codes used by the organization, and is KP's language of interoperability, with cross-mappings to regional ancillary systems and administrative billing codes. The core of CMT comprises SNOMED CT, laboratory LOINC, and First DataBank drug terminology. These are integrated into a single poly-hierarchically structured knowledge base. Cross map sets provide bi-directional translations between CMT and ancillary applications and administrative billing codes. Context sets provide subsets of CMT for use in specific contexts. Our experience with CMT has led us to conclude that a successful terminology solution requires that: (1) usability considerations are an organizational priority; (2) "interface" terminology is differentiated from "reference" terminology; (3) it be easy for clinicians to find the concepts they need; (4) the immediate value of coded data be apparent to the clinician user; (5) there be a well-defined approach to terminology extensions. Over the past several years, there has been substantial progress made in the domain coverage and standardization of medical terminology. KP has learned to exploit that terminology in ways that are clinician-acceptable and that provide powerful options for data analysis and reporting.

  5. Ontology-Based Search of Genomic Metadata.

    PubMed

    Fernandez, Javier D; Lenzerini, Maurizio; Masseroli, Marco; Venco, Francesco; Ceri, Stefano

    2016-01-01

    The Encyclopedia of DNA Elements (ENCODE) is a huge and still expanding public repository of more than 4,000 experiments and 25,000 data files, assembled by a large international consortium since 2007; unknown biological knowledge can be extracted from these huge and largely unexplored data, leading to data-driven genomic, transcriptomic, and epigenomic discoveries. Yet, search of relevant datasets for knowledge discovery is only partially supported: metadata describing ENCODE datasets are quite simple and incomplete, and are not described by a coherent underlying ontology. Here, we show how to overcome this limitation, by adopting an ENCODE metadata searching approach that uses high-quality ontological knowledge and state-of-the-art indexing technologies. Specifically, we developed S.O.S. GeM (http://www.bioinformatics.deib.polimi.it/SOSGeM/), a system supporting effective semantic search and retrieval of ENCODE datasets. First, we constructed a Semantic Knowledge Base by starting with concepts extracted from ENCODE metadata, matched to and expanded on biomedical ontologies integrated in the well-established Unified Medical Language System. We prove that this inference method is sound and complete. Then, we leveraged the Semantic Knowledge Base to semantically search ENCODE data from arbitrary biologists' queries. This allows correctly finding more datasets than those extracted by a purely syntactic search, as supported by the other available systems. We empirically show the relevance of the found datasets to the biologists' queries.

  6. Recent Trends in Nanotechnology-Based Drugs and Formulations for Targeted Therapeutic Delivery.

    PubMed

    Iqbal, Hafiz M N; Rodriguez, Angel M V; Khandia, Rekha; Munjal, Ashok; Dhama, Kuldeep

    2017-01-01

    In the recent past, a wide spectrum of nanotechnology-based drugs and drug-loaded devices and systems has been engineered and investigated with high interest. The key objective is to improve patients' quality of life safely by avoiding or limiting drug abuse and the severe adverse effects of some traditional therapies in practice. Various methodological approaches, including in vitro, in vivo, and ex vivo techniques, have been exploited so far. Among them, nanoparticle-based therapeutic agents are of supreme interest for enhanced and efficient delivery in the biomedical sector. The development of new types of novel, effective and highly reliable therapeutic drug delivery systems (DDS) for multipurpose applications is essential and a core demand to tackle many human health-related diseases. In this context, several advanced nanotechnology-based DDS have been engineered with novel characteristics for biomedical, pharmaceutical and cosmeceutical applications that include, but are not limited to, enhanced bioactivity, bioavailability, drug efficacy, targeted delivery, and improved therapeutic safety, with the added advantage of overcoming the demerits of traditional drug formulations/designs. This review focuses on recent trends and advances in nanotechnology-based drugs and formulations designed for targeted therapeutic delivery. Information from recent patents covering various nanotechnology-based approaches for several applications is also reviewed and summarized or illustrated diagrammatically for better understanding. Drug-loaded nanoparticles are versatile candidates with multifunctional characteristics for potential applications in the biomedical and tissue engineering sectors. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  7. CLO: The cell line ontology

    PubMed Central

    2014-01-01

Background Cell lines have been widely used in biomedical research. The community-based Cell Line Ontology (CLO) is a member of the OBO Foundry library that covers the domain of cell lines. Since its publication two years ago, significant updates have been made, including new groups joining the CLO consortium, new cell line cells, upper level alignment with the Cell Ontology (CL) and the Ontology for Biomedical Investigations (OBI), and logical extensions. Construction and content Collaboration among the CLO, CL, and OBI has established consensus definitions of cell line-specific terms such as ‘cell line’, ‘cell line cell’, ‘cell line culturing’, and ‘mortal’ vs. ‘immortal cell line cell’. A cell line is a genetically stable cultured cell population that contains individual cell line cells. The hierarchical structure of the CLO is built based on the hierarchy of the in vivo cell types defined in CL and tissue types (from which cell line cells are derived) defined in the UBERON cross-species anatomy ontology. The new hierarchical structure makes it easier to browse, query, and perform automated classification. We have recently added classes representing more than 2,000 cell line cells from the RIKEN BRC Cell Bank to CLO. Overall, the CLO now contains ~38,000 classes of specific cell line cells derived from over 200 in vivo cell types from various organisms. Utility and discussion The CLO has been applied to different biomedical research studies. Example case studies include annotation and analysis of EBI ArrayExpress data, bioassays, and host-vaccine/pathogen interaction. CLO’s utility goes beyond a catalogue of cell line types. The alignment of the CLO with related ontologies combined with the use of ontological reasoners will support sophisticated inferencing to advance translational informatics development. PMID:25852852
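The browsing and querying that the CLO's hierarchical structure enables amounts to walking subclass links transitively. A minimal sketch, with illustrative class names rather than actual CLO identifiers:

```python
# Toy subclass hierarchy in the spirit of the CLO: each class has one
# named parent. The labels below are illustrative, not real CLO terms.
parents = {
    "HeLa cell": "immortal cervix-derived cell line cell",
    "immortal cervix-derived cell line cell": "immortal cell line cell",
    "immortal cell line cell": "cell line cell",
}

def ancestors(cls):
    """Return all superclasses of cls by following parent links."""
    out = []
    while cls in parents:
        cls = parents[cls]
        out.append(cls)
    return out

# A reasoner-style query: every HeLa cell is a cell line cell.
assert "cell line cell" in ancestors("HeLa cell")
```

A real reasoner would handle multiple parents and defined-class axioms; the transitive walk above is only the simplest case.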

  8. Architecture for biomedical multimedia information delivery on the World Wide Web

    NASA Astrophysics Data System (ADS)

    Long, L. Rodney; Goh, Gin-Hua; Neve, Leif; Thoma, George R.

    1997-10-01

Research engineers at the National Library of Medicine are building a prototype system for the delivery of multimedia biomedical information on the World Wide Web. This paper discusses the architecture and design considerations for the system, which will be used initially to make images and text from the third National Health and Nutrition Examination Survey (NHANES) publicly available. We categorized our analysis as follows: (1) fundamental software tools: we analyzed trade-offs among use of conventional HTML/CGI, X Window Broadway, and Java; (2) image delivery: we examined the use of unconventional TCP transmission methods; (3) database manager and database design: we discuss the capabilities and planned use of the Informix object-relational database manager and the planned schema for the NHANES database; (4) storage requirements for our Sun server; (5) user interface considerations; (6) the compatibility of the system with other standard research and analysis tools; (7) image display: we discuss considerations for consistent image display for end users. Finally, we discuss the scalability of the system in terms of incorporating larger or more databases of similar data, and the extendibility of the system for supporting content-based retrieval of biomedical images. The system prototype is called the Web-based Medical Information Retrieval System. An early version was built as a Java applet and tested on Unix, PC, and Macintosh platforms. This prototype used the MiniSQL database manager to do text queries on a small database of records of participants in the second NHANES survey. The full records and associated x-ray images were retrievable and displayable on a standard Web browser. A second version has now been built, also a Java applet, using the MySQL database manager.

  9. Marine-Derived Bioactive Peptides for Biomedical Sectors: A Review.

    PubMed

    Ruiz-Ruiz, Federico; Mancera-Andrade, Elena I; Iqbal, Hafiz M N

    2017-01-01

Marine resources such as algae and other marine by-products are recognized as rich sources of structurally diverse bioactive peptides. Their structural characteristics, including unique amino acid residues, are responsible for their biological activity. Several of these marine-origin species show multifunctional bioactivities useful for the discovery or reinvention of biologically active ingredients, nutraceuticals, and pharmaceuticals. In recent years, marine-derived bioactive peptides have therefore gained considerable attention for their high-value biomedical and pharmaceutical potential. Furthermore, a wide spectrum of bioactive peptides can be produced through proteolytic hydrolysis of various marine resources under controlled physicochemical conditions (pH and temperature of the reaction medium). Owing to their numerous health benefits and therapeutic potential in the treatment and prevention of many diseases, such marine-derived peptides exhibit a wide spectrum of biological activities, including anti-cancer, anti-proliferative, anti-coagulant, antibacterial, antifungal, and anti-tumor activities, among many others. Emerging evidence from marine-derived peptide mining shows that these resources contain noteworthy levels of high-value protein. This review summarizes marine-derived bioactive peptides and emphasizes their potential applications in the biomedical and pharmaceutical sectors of the modern world. In conclusion, recent literature provides evidence that marine-derived bioactive peptides play a critical role in human health and open many possibilities for designing new functional nutraceuticals and pharmaceuticals, clarifying potent mechanisms of action for a wide spectrum of diseases. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  10. The "Terminology Market."

    ERIC Educational Resources Information Center

    Galinski, Christian

    This paper examines needs, resources, and trends in the computer-based development of field-specific terminologies in varied languages. The range of special terminologies, their users, and their producers is noted, and the kinds of resources produced (data and tools) are outlined. Data types include: terminological information proper (information…

  11. Terminology Manual.

    ERIC Educational Resources Information Center

    Felber, Helmut

    A product of the International Information Center for Terminology (Infoterm), this manual is designed to serve as a reference tool for practitioners active in terminology work and documentation. The manual explores the basic ideas of the Vienna School of Terminology and explains developments in the area of applied computer aided terminography…

  12. Using multi-terminology indexing for the assignment of MeSH descriptors to health resources in a French online catalogue

    PubMed Central

    Pereira, Suzanne; Névéol, Aurélie; Kerdelhué, Gaétan; Serrot, Elisabeth; Joubert, Michel; Darmoni, Stéfan J.

    2008-01-01

    Background: To assist with the development of a French online quality-controlled health gateway (CISMeF), an automatic indexing tool assigning MeSH descriptors to medical text in French was created. The French Multi-Terminology Indexer (F-MTI) relies on a multi-terminology approach involving four prominent medical terminologies and the mappings between them. Objective: In this paper, we compare lemmatization and stemming as methods to process French medical text for indexing. We also evaluate the multi-terminology approach implemented in F-MTI. Methods: The indexing strategies were assessed on a corpus of 18,814 resources indexed manually. Results: There is little difference in the indexing performance when lemmatization or stemming is used. However, the multi-terminology approach outperforms indexing relying on a single terminology in terms of recall. Conclusion: F-MTI will soon be used in the CISMeF production environment and in a Health MultiTerminology Server in French. PMID:18998933
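The multi-terminology indexing idea that F-MTI implements — normalize both text and terminology entries, match, and take the union of concepts across terminologies — can be sketched as follows. The stemmer, term lists, and concept IDs below are toy stand-ins, not F-MTI's actual resources:

```python
# Toy sketch of multi-terminology indexing (not the actual F-MTI code):
# terms from several terminologies are stem-normalized, matched against a
# document, and the union of the matched concepts is returned.

def stem(word):
    # Crude suffix stripper standing in for a real stemmer (e.g. Snowball).
    for suffix in ("ation", "ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# Hypothetical mini-terminologies mapping preferred terms to concept IDs.
MESH = {"neoplasm": "D009369", "gene expression": "D015870"}
SNOMED = {"tumor": "108369006", "gene": "67271001"}

def index(text, terminologies):
    tokens = {stem(t) for t in text.lower().split()}
    concepts = set()
    for terminology in terminologies:
        for term, cid in terminology.items():
            # A term matches if every one of its (stemmed) words appears.
            if all(stem(w) in tokens for w in term.split()):
                concepts.add(cid)
    return concepts

doc = "Tumors show altered gene expression"
single = index(doc, [MESH])
multi = index(doc, [MESH, SNOMED])
assert single <= multi  # adding terminologies can only add concepts
```

This illustrates why the multi-terminology approach improves recall: each terminology contributes concepts the others miss, at the cost of needing mappings to deduplicate equivalent ones.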

  13. Partitioning an object-oriented terminology schema.

    PubMed

    Gu, H; Perl, Y; Halper, M; Geller, J; Kuo, F; Cimino, J J

    2001-07-01

    Controlled medical terminologies are increasingly becoming strategic components of various healthcare enterprises. However, the typical medical terminology can be difficult to exploit due to its extensive size and high density. The schema of a medical terminology offered by an object-oriented representation is a valuable tool in providing an abstract view of the terminology, enhancing comprehensibility and making it more usable. However, schemas themselves can be large and unwieldy. We present a methodology for partitioning a medical terminology schema into manageably sized fragments that promote increased comprehension. Our methodology has a refinement process for the subclass hierarchy of the terminology schema. The methodology is carried out by a medical domain expert in conjunction with a computer. The expert is guided by a set of three modeling rules, which guarantee that the resulting partitioned schema consists of a forest of trees. This makes it easier to understand and consequently use the medical terminology. The application of our methodology to the schema of the Medical Entities Dictionary (MED) is presented.
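The partitioning described above turns a subclass hierarchy with multiple inheritance (a DAG) into a forest of trees. A minimal sketch of that transformation, in which keeping the first listed parent is a placeholder for the paper's expert-guided modeling rules, and the schema fragment is invented:

```python
# Hypothetical schema fragment: class -> list of superclasses.
subclass_of = {
    "Finding": [],
    "Procedure": [],
    "LabFinding": ["Finding"],
    "ImagingFinding": ["Finding", "Procedure"],  # multiple parents
}

def to_forest(dag):
    """Keep a single primary parent per class, yielding a forest.

    Picking parents[0] is an arbitrary toy rule; in the paper the choice
    is made by a domain expert following three modeling rules.
    """
    return {cls: (parents[0] if parents else None)
            for cls, parents in dag.items()}

forest = to_forest(subclass_of)
# Every class now has at most one parent, so each root spans a tree.
assert forest["ImagingFinding"] == "Finding"
```

With single parents, each fragment of the schema can be displayed and comprehended as a tree rooted at a top-level class.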

  14. Leveraging Terminologies for Retrieval of Radiology Reports with Critical Imaging Findings

    PubMed Central

    Warden, Graham I.; Lacson, Ronilda; Khorasani, Ramin

    2011-01-01

Introduction: Communication of critical imaging findings is an important component of medical quality and safety. A fundamental challenge includes retrieval of radiology reports that contain these findings. This study describes the expressiveness and coverage of existing medical terminologies for critical imaging findings and evaluates radiology report retrieval using each terminology. Methods: Four terminologies were evaluated: National Cancer Institute Thesaurus (NCIT), Radiology Lexicon (RadLex), Systematized Nomenclature of Medicine (SNOMED-CT), and International Classification of Diseases (ICD-9-CM). Concepts in each terminology were identified for 10 critical imaging findings. Three findings were subsequently selected to evaluate document retrieval. Results: SNOMED-CT consistently demonstrated the highest number of overall terms (mean=22) for each of the ten critical findings. However, retrieval rate and precision varied between terminologies for the three findings evaluated. Conclusion: No single terminology is optimal for retrieving radiology reports with critical findings. The expressiveness of a terminology does not consistently correlate with radiology report retrieval. PMID:22195212

  15. Use artificial neural network to align biological ontologies.

    PubMed

    Huang, Jingshan; Dang, Jiangbo; Huhns, Michael N; Zheng, W Jim

    2008-09-16

Being formal, declarative knowledge representation models, ontologies help to address the problem of imprecise terminologies in biological and biomedical research. However, ontologies constructed under the auspices of the Open Biomedical Ontologies (OBO) group have exhibited a great deal of variety, because different parties can design ontologies according to their own conceptual views of the world. It is therefore becoming critical to align ontologies from different parties. During automated/semi-automated alignment across biological ontologies, different semantic aspects, i.e., concept name, concept properties, and concept relationships, contribute to different degrees to alignment results. Therefore, a vector of weights must be assigned to these semantic aspects. It is not trivial to determine what those weights should be, and current methodologies depend heavily on human heuristics. In this paper, we take an artificial neural network approach to learn and adjust these weights, and thereby support a new ontology alignment algorithm, customized for biological ontologies, with the purpose of avoiding some disadvantages of both rule-based and learning-based aligning algorithms. This approach has been evaluated by aligning two real-world biological ontologies, whose features include large file size, very few instances, concept names in numerical strings, and others. The promising experimental results verify our proposed hypothesis, i.e., three weights for semantic aspects learned from a subset of concepts are representative of all concepts in the same ontology. Therefore, our method represents a large leap forward towards automating biological ontology alignment.
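The core idea — learn the three weights that combine name, property, and relationship similarities into a match score — can be sketched with plain logistic regression standing in for the paper's neural network. The similarity scores and labels below are invented for illustration:

```python
import math

# Each row: (name_sim, property_sim, relationship_sim), label 1 = same concept.
# These training pairs are made up; real ones would come from labeled mappings.
pairs = [((0.9, 0.8, 0.7), 1), ((0.8, 0.9, 0.6), 1),
         ((0.2, 0.3, 0.4), 0), ((0.1, 0.2, 0.5), 0)]

w, b, lr = [0.0, 0.0, 0.0], 0.0, 0.5

def score(x):
    """Weighted combination of the three aspect similarities, squashed to (0,1)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Stochastic gradient descent on log-loss learns the aspect weights.
for _ in range(2000):
    for x, y in pairs:
        err = score(x) - y
        for i in range(3):
            w[i] -= lr * err * x[i]
        b -= lr * err

# The learned weights now separate matching from non-matching concept pairs.
assert score((0.9, 0.8, 0.7)) > 0.5 > score((0.1, 0.2, 0.4))
```

The point of learning the weights, rather than hand-tuning them, is exactly the paper's hypothesis: weights fitted on a labeled subset of concept pairs generalize to the rest of the ontology.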

  16. Translational research: understanding the continuum from bench to bedside.

    PubMed

    Drolet, Brian C; Lorenzi, Nancy M

    2011-01-01

The process of translating basic scientific discoveries to clinical applications, and ultimately to public health improvements, has emerged as an important, but difficult, objective in biomedical research. The process is best described as a "translation continuum" because various resources and actions are involved in this progression of knowledge, which advances discoveries from the bench to the bedside. The current model of this continuum focuses primarily on translational research, which is merely one component of the overall translation process. This approach is ineffective. A revised model to address the entire continuum would provide a methodology to identify and describe all translational activities (e.g., implementation, adoption, translational research) as well as their place within the continuum. This manuscript reviews and synthesizes the literature to provide an overview of the current terminology and model for translation. A modification of the existing model is proposed to create a framework called the Biomedical Research Translation Continuum, which defines the translation process and describes the progression of knowledge from laboratory to health gains. This framework clarifies translation for readers who have not followed the evolving and complicated models currently described. Authors and researchers may use the continuum to understand and describe their research better as well as the translational activities within a conceptual framework. Additionally, the framework may increase the advancement of knowledge by refining discussions of translation and allowing more precise identification of barriers to progress. Copyright © 2011 Mosby, Inc. All rights reserved.

  17. Revised terminology for cervical histopathology and its implications for management of high-grade squamous intraepithelial lesions of the cervix.

    PubMed

    Waxman, Alan G; Chelmow, David; Darragh, Teresa M; Lawson, Herschel; Moscicki, Anna-Barbara

    2012-12-01

    In March 2012, the College of American Pathologists and American Society for Colposcopy and Cervical Pathology, in collaboration with 35 stakeholder organizations, convened a consensus conference called the Lower Anogenital Squamous Terminology (LAST) Project. The recommendations of this project include using a uniform, two-tiered terminology to describe the histology of human papillomavirus-associated squamous disease across all anogenital tract tissues: vulva, vagina, cervix, penis, perianus, and anus. The recommended terminology is "low-grade" or "high-grade squamous intraepithelial lesion (SIL)." This terminology is familiar to clinicians, because it parallels the terminology of the Bethesda System cytologic reports. Biopsy results using SIL terminology may be further qualified using "intraepithelial neoplasia" (IN) terminology in parentheses. Laboratory p16 tissue immunostaining is recommended to better classify histopathology lesions that morphologically would earlier have been diagnosed as IN 2. p16 is also recommended for differentiating between high-grade squamous intraepithelial lesions and benign mimics. The LAST Project recommendations potentially affect the application of current guidelines for managing cervical squamous intraepithelial lesions. The authors offer interim guidance for managing cervical lesions diagnosed using this new terminology with special attention paid to managing young women with cervical high-grade squamous intraepithelial lesions on biopsy. Clinicians should be aware of the LAST Project recommendations, which include important changes from prior terminology.

  19. Automated extraction and semantic analysis of mutation impacts from the biomedical literature

    PubMed Central

    2012-01-01

    Background Mutations as sources of evolution have long been the focus of attention in the biomedical literature. Accessing the mutational information and their impacts on protein properties facilitates research in various domains, such as enzymology and pharmacology. However, manually curating the rich and fast growing repository of biomedical literature is expensive and time-consuming. As a solution, text mining approaches have increasingly been deployed in the biomedical domain. While the detection of single-point mutations is well covered by existing systems, challenges still exist in grounding impacts to their respective mutations and recognizing the affected protein properties, in particular kinetic and stability properties together with physical quantities. Results We present an ontology model for mutation impacts, together with a comprehensive text mining system for extracting and analysing mutation impact information from full-text articles. Organisms, as sources of proteins, are extracted to help disambiguation of genes and proteins. Our system then detects mutation series to correctly ground detected impacts using novel heuristics. It also extracts the affected protein properties, in particular kinetic and stability properties, as well as the magnitude of the effects and validates these relations against the domain ontology. The output of our system can be provided in various formats, in particular by populating an OWL-DL ontology, which can then be queried to provide structured information. The performance of the system is evaluated on our manually annotated corpora. In the impact detection task, our system achieves a precision of 70.4%-71.1%, a recall of 71.3%-71.5%, and grounds the detected impacts with an accuracy of 76.5%-77%. The developed system, including resources, evaluation data and end-user and developer documentation is freely available under an open source license at http://www.semanticsoftware.info/open-mutation-miner. 
Conclusion We present Open Mutation Miner (OMM), the first comprehensive, fully open-source approach to automatically extract impacts and related relevant information from the biomedical literature. We assessed the performance of our work on manually annotated corpora and the results show the reliability of our approach. The representation of the extracted information into a structured format facilitates knowledge management and aids in database curation and correction. Furthermore, access to the analysis results is provided through multiple interfaces, including web services for automated data integration and desktop-based solutions for end user interactions. PMID:22759648
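Once mutation impacts are extracted into a structured, ontology-backed format, they can be queried as described above. A minimal sketch of that querying step, with a tiny in-memory triple store standing in for the populated OWL-DL ontology; the predicates, mutations, and values are invented, not OMM's actual schema:

```python
# Hypothetical extracted facts as (subject, predicate, object) triples.
triples = {
    ("E118D", "affects_protein", "xylanase"),
    ("E118D", "impact_on", "thermostability"),
    ("E118D", "direction", "increase"),
    ("W24F", "impact_on", "kcat"),
    ("W24F", "direction", "decrease"),
}

def query(s=None, p=None, o=None):
    """Pattern-match over the triples; None acts as a wildcard."""
    return [(ts, tp, to) for ts, tp, to in triples
            if s in (None, ts) and p in (None, tp) and o in (None, to)]

# Structured question: which mutations increase some property?
increased = {s for s, _, _ in query(p="direction", o="increase")}
assert increased == {"E118D"}
```

In the real system the same question would be posed as a SPARQL or DL query against the populated ontology, which additionally supports reasoning over the class hierarchy of impact types.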

  20. Application of Fused Deposition Modelling (FDM) Method of 3D Printing in Drug Delivery.

    PubMed

    Long, Jingjunjiao; Gholizadeh, Hamideh; Lu, Jun; Bunt, Craig; Seyfoddin, Ali

    2017-01-01

Three-dimensional (3D) printing is an emerging manufacturing technology for biomedical and pharmaceutical applications. Fused deposition modelling (FDM) is a low-cost, extrusion-based 3D printing technique that can deposit materials layer-by-layer to create solid geometries. This review article aims to provide an overview of FDM-based 3D printing applications in developing new drug delivery systems. The principal methodology, suitable polymers, and important parameters of FDM technology and its applications in the fabrication of personalised tablets and drug delivery devices are discussed in this review. FDM-based 3D printing is a novel and versatile manufacturing technique for creating customised drug delivery devices that contain an accurate dose of medicine(s) and provide controlled drug release profiles. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
