Intertwining thesauri and dictionaries
NASA Technical Reports Server (NTRS)
Buchan, R. L.
1989-01-01
The use of dictionaries and thesauri in information retrieval is discussed. The structure and functions of thesauri and dictionaries are described. Particular attention is given to the format of the NASA Thesaurus. The relationship between thesauri and dictionaries, the need to regularize terminology, and the capitalization of words are examined.
Multilingual Thesauri for the Modern World - No Ideal Solution?
ERIC Educational Resources Information Center
Jorna, Kerstin; Davies, Sylvie
2001-01-01
Discusses thesauri as tools for multilingual information retrieval and cross-cultural communication. Considers the need for multilingual thesauri and the importance of explicit conceptual structures, and introduces a pilot thesaurus, InfoDEFT (Information Deutsch-English-Francais Thesaurus), as a possible model for new online thesauri which are…
NASA Astrophysics Data System (ADS)
Mortier, S.; Van Daele, K.; Meganck, L.
2017-08-01
Heritage organizations in Flanders started using thesauri fairly recently compared to other countries. This paper begins by examining the historical use of thesauri and controlled vocabularies in computer systems by the Flemish Government dealing with immovable cultural heritage. Their evolution from simple, flat, controlled lists to actual thesauri with scope notes, hierarchical and equivalence relations, and links to other thesauri will be discussed. An explanation will be provided for the evolution in our approach to controlled vocabularies, and how they radically changed querying and the way data is indexed in our systems. Technical challenges inherent to complex thesauri, and how to overcome them, will be outlined. With these issues solved, thesauri have become an essential feature of the Flanders Heritage inventory management system. The number of vocabularies rose over the years, and they became an essential tool for integrating heritage from different disciplines. As a final improvement, thesauri went from being a core part of one application (the inventory management system) to forming an essential part of a new general resource-oriented system architecture for Flanders Heritage influenced by Linked Data. For this purpose, a generic SKOS-based editor was created. Because the SKOS model is generic enough to be used outside Flanders Heritage, the decision was made early on to develop this editor as an open-source project, called Atramhasis, and to share it with the larger heritage world.
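The SKOS data model mentioned above centers on concepts with preferred/alternative labels, scope notes, and hierarchical or matching relations. As a minimal sketch (the `Concept` class, URIs, and labels below are invented for illustration and are not the Atramhasis data model):

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    """A minimal SKOS-style concept: labels per language, a scope note,
    and hierarchical/matching relations expressed as URIs."""
    uri: str
    pref_label: dict = field(default_factory=dict)   # language tag -> label
    alt_labels: list = field(default_factory=list)
    scope_note: str = ""
    broader: list = field(default_factory=list)      # URIs of broader concepts
    narrower: list = field(default_factory=list)
    exact_match: list = field(default_factory=list)  # skos:exactMatch links

# Hypothetical record, loosely in the style of a heritage vocabulary entry.
castle = Concept(
    uri="https://id.example.org/concept/castle",
    pref_label={"nl": "kasteel", "en": "castle"},
    scope_note="A fortified residence of a lord or noble.",
    broader=["https://id.example.org/concept/defensive-structure"],
    exact_match=["https://vocab.example.org/aat/placeholder"],
)

print(castle.pref_label["en"])  # castle
```

Because every relation is just a URI, such concepts can link across vocabularies, which is what makes the model reusable outside any one application.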
Content Classification: Leveraging New Tools and Librarians' Expertise.
ERIC Educational Resources Information Center
Starr, Jennie
1999-01-01
Presents factors for librarians to consider when making decisions about information retrieval. Discusses indexing theory; thesauri aids; controlled vocabulary or thesauri to increase access; humans versus machines; automated tools; product evaluations and evaluation criteria; automated classification tools; content server products; and document…
Thesauri Used by SLA Documentation Division Members.
ERIC Educational Resources Information Center
Pope, Nolan F.; And Others
This bibliography lists 115 citations for thesauri most frequently used by members of the Special Libraries Association (SLA) Documentation Division. Entries are arranged alphabetically by author/corporate author, followed by title, imprint and/or alternate source of availability if known, date, pagination, and subject index terms assigned…
ERIC Educational Resources Information Center
Bartol, Tomaz
2012-01-01
Purpose: The paper aims to assess the utility of non-agriculture-specific information systems, databases, and respective controlled vocabularies (thesauri) in organising and retrieving agricultural information. The purpose is to identify thesaurus-linked tree structures, controlled subject headings/terms (heading words, descriptors), and principal…
ERIC Educational Resources Information Center
Gaffuri, Ann
A practicum was developed to expand written vocabulary for third graders through training in using a data base and brainstorming strategies. Individual thesauri were written and published to demonstrate the results of collecting vocabulary and applying it to specific topics. Daily process writing became an integral part of the curriculum. Class…
Guidelines for the Establishment and Development of Monolingual Thesauri. Second Revised Edition.
ERIC Educational Resources Information Center
Austin, Derek; Dale, Peter
Intended to facilitate the preparation of thesauri for the indexing of documents in any subject field, this report presents a set of rules pertaining to the complex issues and questions involved in indexing vocabulary development and construction. After a brief discussion of the nature, functions, and components of indexing vocabularies, a set of…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schatz, B.R.; Johnson, E.H.; Cochrane, P.A.
The basic problem in information retrieval is that large-scale searches can only match terms specified by the user to terms appearing in documents in the digital library collection. Intermediate sources that support term suggestion can thus enhance retrieval by providing alternative search terms for the user. Term suggestion increases recall, while interaction enables the user to attempt not to decrease precision. We are building a prototype user interface that will become the Web interface for the University of Illinois Digital Library Initiative (DLI) testbed. It supports the principle of multiple views, where different kinds of term suggestors can be used to complement search and each other. This paper discusses its operation with two complementary term suggestors, subject thesauri and co-occurrence lists, and compares their utility. Thesauri are generated by human indexers and place selected terms in a subject hierarchy. Co-occurrence lists are generated by computer and place all terms in frequency order of occurrence together. This paper concludes with a discussion of how multiple views can help provide good-quality search for the Net. This is a paper about the design of a retrieval system prototype that allows users to simultaneously combine terms offered by different suggestion techniques, not about comparing the merits of each in a systematic and controlled way; it offers no experimental results.
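A co-occurrence suggestor of the kind described can be sketched in a few lines: rank the terms that most often appear in the same documents as the query term. This is an illustrative sketch only (the toy corpus, whitespace tokenization, and ranking are assumptions, not the DLI implementation):

```python
from collections import Counter

def cooccurrence_suggestions(docs, query_term, top_n=3):
    """Suggest alternative search terms: terms that most frequently
    co-occur (same document) with query_term, by descending frequency."""
    counts = Counter()
    for doc in docs:
        terms = set(doc.lower().split())
        if query_term in terms:
            counts.update(terms - {query_term})
    return [term for term, _ in counts.most_common(top_n)]

docs = [
    "thesaurus indexing retrieval",
    "thesaurus vocabulary indexing",
    "ontology reasoning logic",
]
print(cooccurrence_suggestions(docs, "thesaurus"))
```

Here "indexing" ranks first because it co-occurs with "thesaurus" in two documents; a thesaurus-based suggestor would instead return curated broader/narrower/related terms.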
Computer-Based Indexing on a Small Scale: Bibliography.
ERIC Educational Resources Information Center
Douglas, Kimberly; Wismer, Don
The 131 references on small scale computer-based indexing cited in this bibliography are subdivided as follows: general, general (computer), index structure, microforms, specific systems, KWIC/KWAC/KWOC, and thesauri. (RAA)
Benchmarking Ontologies: Bigger or Better?
Yao, Lixia; Divoli, Anna; Mayzus, Ilya; Evans, James A.; Rzhetsky, Andrey
2011-01-01
A scientific ontology is a formal representation of knowledge within a domain, typically including central concepts, their properties, and relations. With the rise of computers and high-throughput data collection, ontologies have become essential to data mining and sharing across communities in the biomedical sciences. Powerful approaches exist for testing the internal consistency of an ontology, but not for assessing the fidelity of its domain representation. We introduce a family of metrics that describe the breadth and depth with which an ontology represents its knowledge domain. We then test these metrics using (1) four of the most common medical ontologies with respect to a corpus of medical documents and (2) seven of the most popular English thesauri with respect to three corpora that sample language from medicine, news, and novels. Here we show that our approach captures the quality of ontological representation and guides efforts to narrow the breach between ontology and collective discourse within a domain. Our results also demonstrate key features of medical ontologies, English thesauri, and discourse from different domains. Medical ontologies have a small intersection, as do English thesauri. Moreover, dialects characteristic of distinct domains vary strikingly as many of the same words are used quite differently in medicine, news, and novels. As ontologies are intended to mirror the state of knowledge, our methods to tighten the fit between ontology and domain will increase their relevance for new areas of biomedical science and improve the accuracy and power of inferences computed across them. PMID:21249231
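One simple proxy for the "breadth" the abstract describes is corpus coverage: what fraction of a domain's most frequent terms the ontology contains. This is a crude stand-in sketched under assumptions (whitespace tokens, a toy corpus), not the metric family the paper actually defines:

```python
from collections import Counter

def coverage(ontology_terms, corpus_tokens, top_k=100):
    """Fraction of the corpus's top_k most frequent tokens that appear
    in the ontology -- a rough proxy for representational breadth."""
    common = [t for t, _ in Counter(corpus_tokens).most_common(top_k)]
    hits = sum(1 for t in common if t in ontology_terms)
    return hits / len(common)

# Toy example: an ontology that covers 2 of the 3 most frequent tokens.
corpus = "fever cough fever rash cough fever headache".split()
onto = {"fever", "cough", "nausea"}
print(coverage(onto, corpus, top_k=3))
```

A low score suggests the vocabulary lags the discourse of its domain, which is the gap the paper's metrics are designed to expose.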
ERIC Educational Resources Information Center
Proceedings of the ASIS Annual Meeting, 1993
1993-01-01
Presents abstracts of 34 special interest group (SIG) sessions. Highlights include humanities scholars and electronic texts; information retrieval and indexing systems design; automated indexing; domain analysis; query expansion in document retrieval systems; thesauri; business intelligence; Americans with Disabilities Act; management;…
Multi-Agent Information Classification Using Dynamic Acquaintance Lists.
ERIC Educational Resources Information Center
Mukhopadhyay, Snehasis; Peng, Shengquan; Raje, Rajeev; Palakal, Mathew; Mostafa, Javed
2003-01-01
Discussion of automated information services focuses on information classification and collaborative agents, i.e. intelligent computer programs. Highlights include multi-agent systems; distributed artificial intelligence; thesauri; document representation and classification; agent modeling; acquaintances, or remote agents discovered through…
Thesaurus-Enhanced Search Interfaces.
ERIC Educational Resources Information Center
Shiri, Ali Asghar; Revie, Crawford; Chowdhury, Gobinda
2002-01-01
Discussion of user interfaces to information retrieval systems focuses on interfaces that incorporate thesauri as part of their searching and browsing facilities. Discusses research literature related to information searching behavior, information retrieval interface evaluation, search term selection, and query expansion; and compares thesaurus…
ERIC Educational Resources Information Center
Berry, John N., III
1998-01-01
Interviews with three CEOs--UMI (University Microfilms International), OCLC (Online Computer Library Center), and Gale Research--focus on outlooks for information and libraries. Discusses expanded educational Web services/courseware, library role in delivery, electronic dissertation publishing, digital data conversion, thesauri/indexing, union…
Bibliographic Databases Outside of the United States.
ERIC Educational Resources Information Center
McGinn, Thomas P.; And Others
1988-01-01
Eight articles describe the development, content, and structure of databases outside of the United States. Features discussed include library involvement, authority control, shared cataloging services, union catalogs, thesauri, abstracts, and distribution methods. Countries and areas represented are Latin America, Australia, the United Kingdom,…
Baneyx, Audrey; Charlet, Jean; Jaulent, Marie-Christine
2007-01-01
Pathologies and acts are classified in thesauri to help physicians code their activity. In practice, the use of thesauri is not sufficient to reduce variability in coding, and thesauri are not suitable for computer processing. We think the automation of the coding task requires a conceptual modeling of medical items: an ontology. Our task is to help lung specialists code acts and diagnoses with software that represents the medical knowledge of the specialty as an ontology. The objective of the reported work was to build an ontology of pulmonary diseases dedicated to the coding process. To carry out this objective, we developed a precise methodological process for the knowledge engineer in order to build various types of medical ontologies. This process is based on the need to express precisely in natural language the meaning of each concept using differential semantics principles. A differential ontology is a hierarchy of concepts and relationships organized according to their similarities and differences. Our main research hypothesis is to apply natural language processing tools to corpora to develop the resources needed to build the ontology. We consider two corpora, one composed of patient discharge summaries and the other a teaching book. We propose to combine two approaches to enrich the ontology building: (i) a method which consists of building terminological resources through distributional analysis and (ii) a method based on the observation of corpus sequences in order to reveal semantic relationships. Our ontology currently includes 1550 concepts, and the software implementing the coding process is still under development. Results show that the proposed approach is operational and indicate that the combination of these methods and the comparison of the resulting terminological structures give interesting clues to a knowledge engineer for the building of an ontology.
Efthimiadis, E N; Afifi, M
1996-01-01
OBJECTIVES: This study examined methods of accessing (for indexing and retrieval purposes) medical research on population groups in the major abstracting and indexing services of the health sciences literature. DESIGN: The study of diseases in specific population groups is facilitated by the indexing of both diseases and populations in a database. The MEDLINE, PsycINFO, and Embase databases were selected for the study. The published thesauri for these databases were examined to establish the vocabulary in use. Indexing terms were identified and examined as to their representation in the current literature. Terms were clustered further into groups thought to reflect an end user's perspective and to facilitate subsequent analysis. The medical literature contained in the three online databases was searched with both controlled vocabulary and natural language terms. RESULTS: The three thesauri revealed shallow pre-coordinated hierarchical structures, rather difficult-to-use terms for post-coordination, and a blurring of cultural, genetic, and racial facets of populations. Post-coordination is difficult because of the system-oriented terminology, which is intended mostly for information professionals. The terminology unintentionally restricts access by the end users who lack the knowledge needed to use the thesauri effectively for information retrieval. CONCLUSIONS: Population groups are not represented adequately in the index languages of health sciences databases. Users of these databases need to be alerted to the difficulties that may be encountered in searching for information on population groups. Information and health professionals may not be able to access the literature if they are not familiar with the indexing policies on population groups. Consequently, the study points to a problem that needs to be addressed, through either the redesign of existing systems or the design of new ones to meet the goals of Healthy People 2000 and beyond. PMID:8883987
Specifications for Thesaurus Software.
ERIC Educational Resources Information Center
Milstead, Jessica L.
1991-01-01
Presents specifications for software that is designed to support manual development and maintenance of information retrieval thesauri. Evaluation of existing software and design of custom software is discussed, requirements for integration with larger systems and for the user interface are described, and relationships among terms are discussed.…
Linked Forests: Semantic similarity of geographical concepts "forest"
NASA Astrophysics Data System (ADS)
Čerba, Otakar; Jedlička, Karel
2016-01-01
Linked Data represents a new trend in geoinformatics and geomatics. It produces a structure of objects (in the form of concepts or terms) interconnected by relations expressing a type of semantic relationship between various concepts. The research published in this article studies whether objects connected by the above-mentioned relations are more similar than objects representing the same phenomenon but standing alone. The phenomenon "forest" and relevant geographical concepts were chosen as the domain of the research. Concept similarity (the Tanimoto coefficient, a specification of the Tversky index) was computed on the basis of explicit information provided by the thesauri containing the particular concepts. Across seven thesauri (AGROVOC, EuroVoc, GEMET, LusTRE/EARTh, NAL, OECD and STW), it was tested whether "forest" concepts interconnected by the skos:exactMatch relation are more similar than other, non-interlinked concepts. The results of the research are important for the sharing and combining of geographical data, information and knowledge. The proposed methodology can be reused for the comparison of other geographical concepts.
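For two concepts described by sets of properties, the Tanimoto coefficient reduces to intersection over union (the Tversky index with alpha = beta = 1). A small sketch, with hypothetical property sets for two "forest" concepts (the property strings are invented for illustration, not taken from the thesauri named above):

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two property sets: |A ∩ B| / |A ∪ B|.
    Equivalent to the Tversky index with alpha = beta = 1 on sets."""
    a, b = set(a), set(b)
    if not (a | b):          # both empty: define similarity as 0
        return 0.0
    return len(a & b) / len(a | b)

forest_a = {"broader:vegetation", "related:tree", "related:woodland"}
forest_b = {"broader:vegetation", "related:tree", "related:timber"}
print(tanimoto(forest_a, forest_b))  # 0.5
```

The study's question then becomes: do pairs of concepts linked by skos:exactMatch score systematically higher on this measure than unlinked pairs?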
Vocabulary Control and the Humanities: A Case Study of the "MLA International Bibliography."
ERIC Educational Resources Information Center
Stebelman, Scott
1994-01-01
Discussion of research in the humanities focuses on the "MLA International Bibliography," the primary database for literary research. Highlights include comparisons to research in the sciences; humanities vocabulary; database search techniques; contextual indexing; examples of searches; thesauri; and software. (43 references) (LRW)
Science and Technology Text Mining: Hypersonic and Supersonic Flow
2003-11-17
[De Saussure, 1949]. A summary of co-word origins, and the evolution of co-word analysis into computational linguistics, can be found in Kostoff [1993b]. Co-word… Global Thesauri. Information Processing and Management. 26:5. 1990. De Saussure, F. (1949). Cours de Linguistique Generale. 4eme Edition
Accessing Federal Government Documents Online.
ERIC Educational Resources Information Center
Hunt, Deborah S.
1982-01-01
Describes and presents in a table the contents, coverage, updating frequency, system availability, and thesauri available for 24 databases which provide business, medical, legal, criminal justice, and statistical information, as well as information from/on the U.S. Congress, the Federal Register, U.S. Government procurement, technical reports, and…
Integrating Borrowed Records into a Database: Impact on Thesaurus Development and Retrieval.
ERIC Educational Resources Information Center
Kirtland, Monika; And Others
1980-01-01
Discusses three approaches to thesaurus and indexing/retrieval language maintenance for combined databases: reindexing, merging, and initial standardization. Two thesauri for a combined database are evaluated in terms of their compatibility, and indexing practices are compared. Tables and figures help illustrate aspects of the comparison. (SW)
ERIC Educational Resources Information Center
Janke, Richard V.; And Others
1988-01-01
The first article describes SPORT, a database providing international coverage of athletics and physical education, and compares it to other online services in terms of coverage, thesauri, possible search strategies, and actual usage. The second article reviews available online information on sports medicine. (CLB)
Automatic Thesaurus Generation for an Electronic Community System.
ERIC Educational Resources Information Center
Chen, Hsinchun; And Others
1995-01-01
This research reports an algorithmic approach to the automatic generation of thesauri for electronic community systems. The techniques used include term filtering, automatic indexing, and cluster analysis. The Worm Community System, used by molecular biologists studying the nematode worm C. elegans, was used as the testbed for this research.…
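The pipeline named in the abstract (term filtering, indexing, cluster analysis) can be sketched at toy scale. This is a hedged illustration under assumptions (whitespace tokens, Jaccard similarity over document sets, single-link grouping), not the algorithm actually used for the Worm Community System:

```python
from collections import defaultdict

def generate_clusters(docs, min_freq=2, threshold=0.5):
    """Sketch of automatic thesaurus generation: filter rare terms,
    then single-link cluster terms by document co-occurrence (Jaccard)."""
    occ = defaultdict(set)                  # term -> set of doc ids
    for i, doc in enumerate(docs):
        for term in set(doc.lower().split()):
            occ[term].add(i)
    terms = [t for t, d in occ.items() if len(d) >= min_freq]  # term filtering

    def sim(a, b):                          # Jaccard over document sets
        return len(occ[a] & occ[b]) / len(occ[a] | occ[b])

    clusters = []
    for t in terms:                         # single-link grouping
        for c in clusters:
            if any(sim(t, u) >= threshold for u in c):
                c.add(t)
                break
        else:
            clusters.append({t})
    return clusters

docs = [
    "worm gene sequence",
    "worm gene mutation",
    "cell culture assay",
    "cell culture protocol",
]
print(generate_clusters(docs))
```

Each resulting cluster is a candidate entry of related terms; a human editor would still vet the clusters before treating them as thesaurus relations.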
ERIC Educational Resources Information Center
McIlwaine, I. C.
1997-01-01
Discusses the history and development of the Universal Decimal Classification (UDC). Topics include the relationship with Dewey Decimal Classification; revision process; structure; facet analysis; lack of standard rules for application; application in automated systems; influence of UDC on classification development; links with thesauri; and use…
Techniques of Document Management: A Review of Text Retrieval and Related Technologies.
ERIC Educational Resources Information Center
Veal, D. C.
2001-01-01
Reviews present and possible future developments in the techniques of electronic document management, the major ones being text retrieval and scanning and OCR (optical character recognition). Also addresses document acquisition, indexing and thesauri, publishing and dissemination standards, impact of the Internet, and the document management…
Language Pedagogy and Non-Transience in the Flipped Classroom
ERIC Educational Resources Information Center
Cunningham, Una
2016-01-01
High connectivity at tertiary institutions, and students who are often equipped with laptops and/or tablets as well as smartphones, have resulted in language learners being able to freely access technology and the internet. Reference tools such as dictionaries, concordancers, translators, and thesauri, with pronunciation and usage tips, are…
Term Based Comparison Metrics for Controlled and Uncontrolled Indexing Languages
ERIC Educational Resources Information Center
Good, B. M.; Tennis, J. T.
2009-01-01
Introduction: We define a collection of metrics for describing and comparing sets of terms in controlled and uncontrolled indexing languages and then show how these metrics can be used to characterize a set of languages spanning folksonomies, ontologies and thesauri. Method: Metrics for term set characterization and comparison were identified and…
ERIC Educational Resources Information Center
Chen, Hsinchun; Martinez, Joanne; Kirchhoff, Amy; Ng, Tobun D.; Schatz, Bruce R.
1998-01-01
Grounded on object filtering, automatic indexing, and co-occurrence analysis, an experiment was performed using a parallel supercomputer to analyze over 400,000 abstracts in an INSPEC computer engineering collection. A user evaluation revealed that system-generated thesauri were better than the human-generated INSPEC subject thesaurus in concept…
Development and Evaluation of Thesauri-Based Bibliographic Biomedical Search Engine
ERIC Educational Resources Information Center
Alghoson, Abdullah
2017-01-01
Due to the large volume and exponential growth of biomedical documents (e.g., books, journal articles), it has become increasingly challenging for biomedical search engines to retrieve relevant documents based on users' search queries. Part of the challenge is the matching mechanism of free-text indexing that performs matching based on…
A Thesaurus for Use in a Computer-Aided Abstracting Tool Kit.
ERIC Educational Resources Information Center
Craven, Timothy C.
1993-01-01
Discusses the use of thesauri in automatic indexing and describes the development of a prototype computerized abstractor's assistant. Topics addressed include TEXNET, a text network management system; the use of TEXNET for abstracting; the structure and use of a thesaurus for abstracting in TEXNET; and weighted terms. (Contains 26 references.)…
Subject Access to "Pornography" for Serious Research Purposes.
ERIC Educational Resources Information Center
Moya, Cynde
2001-01-01
Examines some of the research needs in academic disciplines for access to pornographic materials, and looks at tools, such as thesauri and Web directories, which have been built to help searchers find materials. Discusses research needs for access to materials; tools built by librarians to subject analyze material; and Internet words for indexing…
ERIC Educational Resources Information Center
Ortiz, Eduardo; Basile, Anne
Based on educational administration textbooks and on thesauri and dictionaries published by the United Nations Educational, Scientific, and Cultural Organization (UNESCO), the International Bureau of Education (IBE), and other institutions, this document presents a trilingual (English, French, and Spanish) glossary of approximately 2,500 terms or…
ERIC Educational Resources Information Center
Pastor-Sanchez, Juan-Antonio; Martinez Mendez, Francisco Javier; Rodriguez-Munoz, Jose Vicente
2009-01-01
Introduction: This paper presents an analysis of the Simple Knowledge Organization System (SKOS) compared with other alternatives for thesaurus representation in the Semantic Web. Method: Based on functional and structural changes of thesauri, the paper provides an overview of the current context, in which the lexical paradigm is abandoned in favour of the…
Mind Maps: Hot New Tools Proposed for Cyberspace Librarians.
ERIC Educational Resources Information Center
Humphreys, Nancy K.
1999-01-01
Describes how online searchers can use a software tool based on back-of-the-book indexes to assist in dealing with search engine databases compiled by spiders that crawl across the entire Internet or through large Web sites. Discusses human versus machine knowledge, conversion of indexes to mind maps or mini-thesauri, middleware, eXtensible Markup…
ERIC Educational Resources Information Center
Kirtland, Monika
1981-01-01
Outlines a methodology for standardizing word lists of subject-related fields using a macrothesaurus which provides a basic classification structure and terminology for the subject at large and which adapts to the specific needs of its subfields. The example of the Cancer Information Thesaurus (CIT) is detailed. Six references are listed. (FM)
ERIC Educational Resources Information Center
Crouch, Dora; And Others
Funded by the Council on Library Resources, this project surveyed thesauri in the fields of art and architecture to seek out existing projects and analyze their content and form. It found that no comprehensive or standardized thesaurus presently exists for art and architecture; rather, individual subjects are tailored to meet the needs of a…
DoD Net-Centric Services Strategy Implementation in the C2 Domain
2010-02-01
those for monolingual thesauri indicated in ANSI/NISO Z39.19-2005 and ISO 2788-1986. Also, the versioning regimen in the KOS must be robust, a… Metadata Registry: Repository of all metadata related to data structures, models, dictionaries, taxonomies, schema, and other engineering artifacts that… access information, schemas, style sheets, controlled vocabularies, dictionaries, and other work products. It would normally be discovered via a
Consumer language, patient language, and thesauri: a review of the literature
Smith, Catherine A
2011-01-01
Objective: Online social networking sites are web services in which users create public or semipublic profiles and connect to build online communities, finding like-minded people through self-labeled personal attributes including ethnicity, leisure interests, political beliefs, and, increasingly, health status. Thirty-nine percent of patients in the United States identified themselves as users of social networks in a recent survey. “Tags,” user-generated descriptors functioning as labels for user-generated content, are increasingly important to social networking, and the language used by patients is thus becoming important for knowledge representation in these systems. However, patient language poses considerable challenges for health communication and networking. How have information systems traditionally incorporated these languages in their controlled vocabularies and thesauri? How do system builders know what consumers and patients say? Methods: This comprehensive review of the literature of health care (PubMed MEDLINE, CINAHL), library science, and information science (Library and Information Science and Technology Abstracts, Library and Information Science Abstracts, and Library Literature) examines the research domains in which consumer and patient language has been explored. Results: Consumer contributions to controlled vocabulary appear to be seriously under-researched inside and outside of health care. Conclusion: The author reflects on the implications of these findings for online social networks devoted to patients and the patient experience. PMID:21464851
ERIC Educational Resources Information Center
Cochrane, Pauline A.; Kirtland, Monika
A comprehensive guide to the literature published between World War II and 1979 which critically evaluates the Library of Congress list of Subject Headings (LCSH), this bibliography has been prepared for information personnel involved with subject authority files, thesauri, or vocabulary control. A brief bibliometric analysis of the literature…
NASA Technical Reports Server (NTRS)
Krueger, Jonathan
1990-01-01
This document describes functionality to be developed to support the NATO technical thesaurus. Described are the specifics of the thesaurus structure and function; the distinction between the thesaurus information and its representation in a given online, machine-readable, or printed form; the enhancement of the thesaurus with the assignment of COSATI codes (fields and groups) to posting terms; the integration of terminology related to the DTIC DRIT and NASA thesauri; the translation of posting terms into French; and the provision of a basis for system design.
Creating, Using and Updating Thesauri Files for AutoMap and ORA
2012-07-26
occurrences or phenomena that happen. An Event could be 9-11, the JFK assassination, the Super Bowl, a wedding, a funeral, or an inauguration. Specific events… a better place without Caesar (Belief). To kill Caesar (Task) they form a group of assassins (Organization). To accomplish their task they need to… know about Caesar’s daily routine (Knowledge) and how to get their knives (Resources) into the senate. Finally, the assassination (Event) takes place
Semantic-Based Information Retrieval of Biomedical Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jiao, Yu; Potok, Thomas E; Hurson, Ali R.
In this paper, we propose to improve the effectiveness of biomedical information retrieval via a medical thesaurus. We analyzed the deficiencies of existing medical thesauri and constructed a new thesaurus, called MEDTHES, which follows the ANSI/NISO Z39.19-2003 standard. MEDTHES also gives users fine-grained control of information retrieval by providing functions to calculate the semantic similarity between words. We demonstrate the usage of MEDTHES through an existing data search engine.
The Development of a Ram Air Decelerator for the Recovery of Artillery Shells
1993-01-01
A number of specialized services help round out the program’s diverse offerings, including creating custom thesauri, translating material to or from… the Internet to help… sti.nasa.gov… Write to: NASA Access Help Desk, NASA Center for AeroSpace Information… radial lines projecting from the center outwards midway… of each of the lobes. When spun, the sea
Toward intelligent information system
NASA Astrophysics Data System (ADS)
Takano, Fumio; Hinatsu, Ken'ichi
This article describes indexing aid systems and projects at JICST, API, NLM, and BIOSIS. These organizations deal with the very broad domains of scientific, medical, and technological literature, where indexing is done with controlled terms and is routinely performed by highly skilled indexers. Because of the high cost of controlled indexing of bibliographic information, they have designed automated indexing systems and/or expert-like systems that take advantage of many years of indexing experience using knowledge bases and/or thesauri.
Merrill, Gary H
2008-11-06
MedDRA (the Medical Dictionary for Regulatory Activities Terminology) is a controlled vocabulary widely used as a medical coding scheme. However, MedDRA's characterization of its structural hierarchy exhibits some confusing and paradoxical features. The goal of this paper is to examine these features, determine whether there is a coherent view of the MedDRA hierarchy that emerges, and explore what lessons are to be learned from this for using MedDRA and similar terminologies in a broad medical informatics context that includes relations among multiple disparate terminologies, thesauri, and ontologies.
Basic Lessons in ORA and AutoMap 2012
2012-06-11
boy named Dave. He has 2 balls. 1 ball is red. 1 ball is blue. milkAndCookies.txt: Dave wants milk and cookies. He drives to the store. He then buys... milk and cookies. 2. Create Concept List: From the Pull Down Menu select Generate => Concept List => Concept List (per text). Navigate to where you...the thesaurus. Using the ThesauriContentOnly option, you create a Meta-Network (Carley, 200) with the one-grams dog, cow, and farm. If you are going
Coordinating Council. Tenth Meeting: Information retrieval: The role of controlled vocabularies
NASA Technical Reports Server (NTRS)
1993-01-01
The theme of this NASA Scientific and Technical Information Program Coordinating Council meeting was the role of controlled vocabularies (thesauri) in information retrieval. Included are summaries of the presentations and the accompanying visuals. Dr. Raya Fidel addressed 'Retrieval: Free Text, Full Text, and Controlled Vocabularies.' Dr. Bella Hass Weinberg spoke on 'Controlled Vocabularies and Thesaurus Standards.' The presentations were followed by a panel discussion with participation from NASA, the National Library of Medicine, the Defense Technical Information Center, and the Department of Energy; this discussion, however, is not summarized in any detail in this document.
Parsaei-Mohammadi, Parastoo; Ghasemi, Ali Hossein; Hassanzadeh-Beheshtabad, Raziyeh
2017-01-01
Introduction: In the present era, thesauri as indexing tools play an effective role in integrating retrieval, preventing fragmentation and a multiplicity of terminologies, and representing the information content of documents. Goals: This study aimed to investigate the keywords of articles indexed in IranMedex in terms of origin, structure, and indexing situation, and their compliance with the Persian Medical Thesaurus and Medical Subject Headings (MeSH). Materials and Methods: This study is an applied research conducted as a survey. The statistical population includes 32,850 Persian articles indexed in IranMedex during the years 1385–1391; 379 cases were selected as the study sample. Data collection was done using a checklist, and SPSS software was used to analyze the findings. Findings: Although there was no significant difference in indexing origin between the proportions of the different types of Persian and English keywords of articles indexed in IranMedex, the compliance rates of the Persian and English keywords with the Persian Medical Thesaurus and MeSH differed across years. The structure of the keywords leans more toward phrases than single words, and the majority of keywords are selected from titles and abstracts. Conclusion: Authors’ familiarity with thesauri and controlled tools produces homogeneity in assigning keywords and provides more precise, faster, and easier retrieval. It is suggested that a mixture of natural and controlled languages be used in this database in order to reach more comprehensive results. PMID:28546967
Synonym set extraction from the biomedical literature by lexical pattern discovery.
McCrae, John; Collier, Nigel
2008-03-24
Although there are a large number of thesauri for the biomedical domain, many of them lack coverage of terms and their variant forms. Automatic thesaurus construction based on patterns was first suggested by Hearst, but it is still not clear how to automatically construct such patterns for different semantic relations and domains. In particular, it is not certain which patterns are useful for capturing synonymy. The assumption that resources such as parsers are available is also a limiting factor for many languages, so it is desirable to find patterns that do not require syntactic analysis. Finally, to give a more consistent and applicable result, it is desirable to use these patterns to form synonym sets in a sound way. We present a method that automatically generates regular expression patterns by expanding seed patterns in a heuristic search and then develops a feature vector based on the occurrence of term pairs in each developed pattern. This allows for a binary classification of term pairs as synonymous or non-synonymous. We then model this result as a probability graph to find synonym sets, which is equivalent to the well-studied problem of finding an optimal set cover. Our method achieved 73.2% precision and 29.7% recall, outperforming hand-made resources such as MeSH and Wikipedia. We conclude that automatic methods can play a practical role in developing new thesauri or expanding existing ones, and that this can be done with only a small amount of training data and no need for resources such as parsers. We also conclude that accuracy can be improved by grouping into synonym sets.
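The pattern-based extraction step can be sketched as follows; the two regular expressions, the toy corpus, and the acceptance threshold are illustrative stand-ins for the automatically discovered patterns and trained classifier of the paper:

```python
import re

# Illustrative only: two hand-written patterns of the kind the heuristic
# search in the paper discovers automatically; corpus and threshold are
# made up for the example.
PATTERNS = [
    re.compile(r"(\w+), also known as (\w+)"),
    re.compile(r"(\w+) \((\w+)\)"),
]

corpus = (
    "Acetaminophen, also known as paracetamol, reduces fever. "
    "Patients received acetaminophen (paracetamol) twice daily. "
    "Aspirin (acetylsalicylic) acid was withheld."
)

def candidate_pairs(text):
    """Feature vector per term pair: how many pattern hits it received."""
    counts = {}
    for pat in PATTERNS:
        for a, b in pat.findall(text):
            key = tuple(sorted((a.lower(), b.lower())))
            counts[key] = counts.get(key, 0) + 1
    return counts

# A pair supported by >= 2 pattern hits is accepted as synonymous here;
# the paper instead classifies the full feature vector.
synonyms = {p for p, n in candidate_pairs(corpus).items() if n >= 2}
print(synonyms)
```

Pairs supported by several independent patterns survive, while one-off matches (here, a spurious pair from a parenthesized adjective) are filtered out.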
Methodology to build medical ontology from textual resources.
Baneyx, Audrey; Charlet, Jean; Jaulent, Marie-Christine
2006-01-01
In the medical field, it is now established that maintaining unambiguous thesauri requires ontologies. Our research task is to help pneumologists code acts and diagnoses with software that represents medical knowledge through a domain ontology. In this paper, we describe our general methodology, aimed at knowledge engineers, for building various types of medical ontologies based on terminology extraction from texts. The hypothesis is to apply natural language processing tools to textual patient discharge summaries to develop the resources needed to build an ontology in pneumology. Results indicate that the joint use of distributional analysis and lexico-syntactic patterns performs satisfactorily for building such ontologies.
Ontology Matching with Semantic Verification.
Jean-Mary, Yves R; Shironoshita, E Patrick; Kabuka, Mansur R
2009-09-01
ASMOV (Automated Semantic Matching of Ontologies with Verification) is a novel algorithm that uses lexical and structural characteristics of two ontologies to iteratively calculate a similarity measure between them, derives an alignment, and then verifies it to ensure that it does not contain semantic inconsistencies. In this paper, we describe the ASMOV algorithm, and then present experimental results that measure its accuracy using the OAEI 2008 tests, and that evaluate its use with two different thesauri: WordNet, and the Unified Medical Language System (UMLS). These results show the increased accuracy obtained by combining lexical, structural and extensional matchers with semantic verification, and demonstrate the advantage of using a domain-specific thesaurus for the alignment of specialized ontologies.
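The combination of matchers can be illustrated with a minimal sketch; the measures and weights below are generic placeholders, not ASMOV's actual ones:

```python
from difflib import SequenceMatcher

# Sketch of blending a lexical and a structural similarity into one score,
# in the spirit of combined ontology matchers; the weight 0.6 and both
# measures are illustrative assumptions.
def lexical_sim(a, b):
    """String similarity between two entity labels."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def structural_sim(parents_a, parents_b):
    """Jaccard overlap of the two entities' parent-concept sets."""
    if not parents_a and not parents_b:
        return 1.0
    return len(parents_a & parents_b) / len(parents_a | parents_b)

def combined_sim(label_a, label_b, parents_a, parents_b, w_lex=0.6):
    return (w_lex * lexical_sim(label_a, label_b)
            + (1 - w_lex) * structural_sim(parents_a, parents_b))

score = combined_sim("Myocardial Infarction", "Heart Attack",
                     {"heart disease"}, {"heart disease"})
print(round(score, 3))
```

Even when labels disagree lexically, shared structure (here, a common parent) keeps the combined score up; a semantic-verification pass like ASMOV's would then discard alignments that introduce inconsistencies.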
Hoelzer, Simon; Schweiger, Ralf K; Liu, Raymond; Rudolf, Dirk; Rieger, Joerg; Dudeck, Joachim
2005-01-01
With the introduction of the ICD-10 as the standard for diagnosis, the development of an electronic representation of its complete content, inherent semantics and coding rules is necessary. Our concept refers to current efforts of the CEN/TC 251 to establish a European standard for hierarchical classification systems in healthcare. We have developed an electronic representation of the ICD-10 with the Extensible Markup Language (XML) that facilitates integration into current information systems or coding software, taking into account different languages and versions. In this context, XML offers a complete framework of related technologies and standard tools for processing that helps to develop interoperable applications.
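A hierarchical XML encoding of classification content of this kind can be sketched with Python's standard library; the element and attribute names are illustrative, not the CEN/TC 251-based schema the authors used:

```python
import xml.etree.ElementTree as ET

# Illustrative element names only; the actual schema used by the authors
# is not reproduced in the abstract.
root = ET.Element("classification", name="ICD-10", version="2005", lang="en")
chapter = ET.SubElement(root, "class", code="X", kind="chapter")
ET.SubElement(chapter, "rubric").text = "Diseases of the respiratory system"
block = ET.SubElement(chapter, "class", code="J40-J47", kind="block")
ET.SubElement(block, "rubric").text = "Chronic lower respiratory diseases"
cat = ET.SubElement(block, "class", code="J45", kind="category")
ET.SubElement(cat, "rubric").text = "Asthma"

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Nesting `class` elements mirrors the chapter/block/category hierarchy, so coding software can walk or query the tree instead of parsing a printed tabular list.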
An introduction to information retrieval: applications in genomics
Nadkarni, P M
2011-01-01
Information retrieval (IR) is the field of computer science that deals with the processing of documents containing free text, so that they can be rapidly retrieved based on keywords specified in a user’s query. IR technology is the basis of Web-based search engines, and plays a vital role in biomedical research, because it is the foundation of software that supports literature search. Documents can be indexed by both the words they contain, as well as the concepts that can be matched to domain-specific thesauri; concept matching, however, poses several practical difficulties that make it unsuitable for use by itself. This article provides an introduction to IR and summarizes various applications of IR and related technologies to genomics. PMID:12049181
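The word-based indexing the article describes rests on the inverted index; a minimal sketch over an invented three-document collection:

```python
import re
from collections import defaultdict

# Minimal inverted-index sketch of word-based indexing; real IR engines
# add stemming, ranking, and thesaurus-based concept matching on top.
docs = {
    1: "BRCA1 mutations and breast cancer risk",
    2: "Gene expression profiling in breast tumors",
    3: "BRCA1 protein interactions",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for word in re.findall(r"\w+", text.lower()):
        index[word].add(doc_id)

def search(query):
    """AND-query: documents containing every query keyword."""
    words = query.lower().split()
    if not words:
        return set()
    result = index.get(words[0], set()).copy()
    for w in words[1:]:
        result &= index.get(w, set())
    return result

print(search("brca1"))          # {1, 3}
print(search("brca1 breast"))   # {1}
```

The index maps each word to the set of documents containing it, so a multi-keyword query reduces to set intersection rather than a scan of every document.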
Lopetegui, Marcelo A; Lara, Barbara A; Yen, Po-Yin; Çatalyürek, Ümit V; Payne, Philip R O
2015-01-01
Multiple choice questions play an important role in training and evaluating biomedical science students. However, the resource-intensive nature of question generation limits their open availability, restricting their use mainly to evaluation. Although applied-knowledge questions require a complex formulation process, the creation of concrete-knowledge questions (i.e., definitions, associations) could be assisted by informatics methods. We envisioned a novel and simple algorithm that exploits validated knowledge repositories and generates concrete-knowledge questions by leveraging concepts' relationships. In this manuscript we present the development and validation of a prototype which successfully produced meaningful concrete-knowledge questions, opening new applications for existing knowledge repositories and potentially benefiting students of all biomedical sciences disciplines.
CoPub: a literature-based keyword enrichment tool for microarray data analysis.
Frijters, Raoul; Heupers, Bart; van Beek, Pieter; Bouwhuis, Maurice; van Schaik, René; de Vlieg, Jacob; Polman, Jan; Alkema, Wynand
2008-07-01
Medline is a rich information source, from which links between genes and keywords describing biological processes, pathways, drugs, pathologies and diseases can be extracted. We developed a publicly available tool called CoPub that uses the information in the Medline database for the biological interpretation of microarray data. CoPub allows batch input of multiple human, mouse or rat genes and produces lists of keywords from several biomedical thesauri that are significantly correlated with the set of input genes. These lists link to Medline abstracts in which the co-occurring input genes and correlated keywords are highlighted. Furthermore, CoPub can graphically visualize differentially expressed genes and over-represented keywords in a network, providing detailed insight in the relationships between genes and keywords, and revealing the most influential genes as highly connected hubs. CoPub is freely accessible at http://services.nbic.nl/cgi-bin/copub/CoPub.pl.
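The co-occurrence counting at the heart of such keyword enrichment can be sketched as follows; the toy "abstracts" and the plain counting stand in for Medline and CoPub's literature-wide statistics:

```python
from collections import Counter

# Toy stand-in for Medline: each "abstract" lists the genes and keywords
# detected in it. CoPub's actual scoring uses literature-wide statistics,
# not raw counts.
abstracts = [
    ({"TP53", "BAX"}, {"apoptosis"}),
    ({"TP53"}, {"apoptosis", "cell cycle"}),
    ({"INS"}, {"glucose metabolism"}),
    ({"BAX"}, {"apoptosis"}),
    ({"INS", "GCK"}, {"glucose metabolism"}),
]

def keyword_enrichment(input_genes):
    """Count, per keyword, the abstracts where it co-occurs with input genes."""
    hits = Counter()
    for genes, keywords in abstracts:
        if genes & input_genes:
            hits.update(keywords)
    return hits.most_common()

print(keyword_enrichment({"TP53", "BAX"}))
# apoptosis co-occurs with this input set most often
```

Ranking keywords by how often they co-occur with the input gene set surfaces the biological processes most associated with, say, a set of differentially expressed genes.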
The Pitfalls of Thesaurus Ontologization – the Case of the NCI Thesaurus
Schulz, Stefan; Schober, Daniel; Tudose, Ilinca; Stenzhorn, Holger
2010-01-01
Thesauri that are “ontologized” into OWL-DL semantics are prone to modeling errors resulting from falsely interpreting existential restrictions. We investigated the OWL-DL representation of the NCI Thesaurus (NCIT) in order to assess the correctness of existential restrictions. A random sample of 354 axioms using the someValuesFrom operator was taken. According to a rating performed by two domain experts, roughly half of these examples, and in consequence more than 76,000 axioms in the OWL-DL version, make incorrect assertions if interpreted according to description logics semantics. These axioms therefore constitute a huge source for unintended models, rendering most logic-based reasoning unreliable. After identifying typical error patterns we discuss some possible improvements. Our recommendation is to either amend the problematic axioms in the OWL-DL formalization or to consider some less strict representational format. PMID:21347074
Ontologies in medicinal chemistry: current status and future challenges.
Gómez-Pérez, Asunción; Martínez-Romero, Marcos; Rodríguez-González, Alejandro; Vázquez, Guillermo; Vázquez-Naya, José M
2013-01-01
Recent years have seen a dramatic increase in the amount and availability of data in the diverse areas of medicinal chemistry, making it possible to achieve significant advances in fields such as the design, synthesis and biological evaluation of compounds. However, with this data explosion, the storage, management and analysis of available data to extract relevant information has become an even more complex task that offers challenging research issues to Artificial Intelligence (AI) scientists. Ontologies have emerged in AI as a key tool to formally represent and semantically organize aspects of the real world. Beyond glossaries or thesauri, ontologies facilitate communication between experts and allow the application of computational techniques to extract useful information from available data. In medicinal chemistry, multiple ontologies have been developed during the last years which contain knowledge about chemical compounds and processes of synthesis of pharmaceutical products. This article reviews the principal standards and ontologies in medicinal chemistry, analyzes their main applications and suggests future directions.
Altmann, U.; Tafazzoli, A. G.; Noelle, G.; Huybrechts, T.; Schweiger, R.; Wächter, W.; Dudeck, J. W.
1999-01-01
In oncology various international and national standards exist for the documentation of different aspects of a disease. Since elements of these standards are repeated in different contexts, a common data dictionary could support consistent representation in any context. For the construction of such a dictionary existing documents have to be worked up in a complex procedure, that considers aspects of hierarchical decomposition of documents and of domain control as well as aspects of user presentation and models of the underlying model of patient data. In contrast to other thesauri, text chunks like definitions or explanations are very important and have to be preserved, since oncologic documentation often means coding and classification on an aggregate level and the safe use of coding systems is an important precondition for comparability of data. This paper discusses the potentials of the use of XML in combination with a dictionary for the promotion and development of standard conformable applications for tumor documentation. PMID:10566311
Experiments in automatic word class and word sense identification for information retrieval
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gauch, S.; Futrelle, R.P.
Automatic identification of related words and automatic detection of word senses are two long-standing goals of researchers in natural language processing. Word class information and word sense identification may enhance the performance of information retrieval systems. Large online corpora and increased computational capabilities make new techniques based on corpus linguistics feasible. Corpus-based analysis is especially needed for corpora from specialized fields for which no electronic dictionaries or thesauri exist. The methods described here use a combination of mutual information and word context to establish word similarities. Then, unsupervised classification is done using clustering in the word space, identifying word classes without pretagging. We also describe an extension of the method to handle the difficult problems of disambiguation and of determining part-of-speech and semantic information for low-frequency words. The method is powerful enough to produce high-quality results on a small corpus of 200,000 words from abstracts in a field of molecular biology.
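The combination of mutual information and word context can be sketched on a toy corpus; the window size, normalization, and corpus below are illustrative choices, not necessarily those of the study:

```python
import math
from collections import Counter, defaultdict

# Tiny illustrative corpus; the study used 200,000 words of molecular
# biology abstracts and ran unsupervised clustering on top of this step.
corpus = (
    "the enzyme cleaves the substrate . "
    "the protein binds the substrate . "
    "the enzyme binds the inhibitor . "
    "the protein cleaves the peptide ."
).split()

WINDOW = 2
word_count = Counter(corpus)
pair_count = defaultdict(Counter)
for i, w in enumerate(corpus):
    for j in range(max(0, i - WINDOW), min(len(corpus), i + WINDOW + 1)):
        if i != j:
            pair_count[w][corpus[j]] += 1

total = len(corpus)

def pmi_vector(w):
    """Pointwise mutual information of w with each context word
    (counts normalized loosely, for illustration only)."""
    vec = {}
    for c, n in pair_count[w].items():
        p_wc = n / total
        p_w, p_c = word_count[w] / total, word_count[c] / total
        vec[c] = max(0.0, math.log(p_wc / (p_w * p_c)))
    return vec

def cosine(u, v):
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

sim = cosine(pmi_vector("enzyme"), pmi_vector("protein"))
print(round(sim, 3))  # "enzyme" and "protein" share contexts, so sim > 0
```

Words that occur in similar contexts ("enzyme" and "protein" both cleave and bind things here) end up with similar PMI vectors; clustering those vectors yields word classes without pretagging.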
Approaches to the construction of a medical informatics glossary and thesaurus.
Rada, R; Ghaoui, C; Russell, J; Taylor, M
1993-01-01
In a project concerned with establishing a glossary and thesaurus for the medical informatics domain, various approaches to the task have been investigated. The developers take the view that a glossary should be a coherent system of terms, reflecting a coherent system of concepts that underlies a body of knowledge about a domain. A framework for the conceptual analysis of the concepts/terms underlying the domain has been developed. The emphasis of this framework is on how the concepts relate together. This work has given an important insight into how the practical task of establishing well-structured vocabularies for a field can be better achieved. An eclectic approach to term selection was adopted. Criteria for assessing what constitutes good definitions for concepts in a field were examined. Using all these approaches glossaries, thesauri and domain models of the medical informatics field are being developed. Another aspect of our work of particular interest is the development of attributed definitions from which inheritance patterns can be defined.
NASA Astrophysics Data System (ADS)
Ebner, M.; Schiegl, M.; Stöckl, W.; Heger, H.
2012-04-01
The Geological Survey of Austria is legally obligated by the INSPIRE directive to provide data that fall under this directive (geology, mineral resources and natural risk zones) to the European Commission in a semantically harmonized and technically interoperable way. Until recently the focus was entirely on the publication of high-quality printed cartographic products. These have a complex (carto-)graphic data-model, which allows visualizing several thematic aspects, such as lithology, stratigraphy, tectonics, geologic age, mineral resources, mass movements, geomorphology etc. in a single planar map/product. Nonetheless these graphic data-models do not allow retrieving individual thematic aspects, since these were coded in a complex portrayal scheme. Automatic information retrieval is thus impossible, and domain knowledge is necessary to interpret these "encrypted" datasets. With INSPIRE becoming effective and a variety of conceptual models (e.g. GeoSciML), built around a semantic framework (i.e. controlled vocabularies), being available, it is necessary to develop a strategy and workflow for semantic harmonization of such datasets. In this contribution we demonstrate the development of a multistage workflow which will allow us to transform our printed maps to semantically enabled datasets and services, and discuss some prerequisites, foundations and problems. In a first step of our workflow we analyzed our maps and developed controlled vocabularies that describe the thematic content of our data. We then developed a physical data-model which we use to attribute our spatial data with thematic information from our controlled vocabularies to form core thematic datasets. This physical data-model is geared towards use on an organizational level but builds upon existing standards (INSPIRE, GeoSciML) to allow transformation to international standards.
In a final step we will develop a standardized mapping scheme to publish INSPIRE-conformant services from our core datasets. This two-step transformation is necessary since a direct mapping to international standards is not possible for traditional map-based data. Controlled vocabularies provide the foundation of semantic harmonization. For encoding the vocabularies we build upon the W3C standard SKOS (Simple Knowledge Organisation System), a thesaurus specification for the semantic web based on the Resource Description Framework (RDF) and RDF Schema, adding some DublinCore and VoID for the metadata of our vocabularies and resources. For the development of these thesauri we use the commercial software PoolParty, a tool specially built to develop, manage and publish multilingual thesauri. The corporate thesauri of the Austrian Geological Survey are exposed via a web service conformant with the linked data principles. This web service gives access to (1) an RDF/HTML representation of the resources via simple, robust and thus persistent HTTP URIs, (2) a download of the complete vocabularies in RDF format, and (3) a full-fledged SPARQL endpoint to query the thesaurus. With the development of physical data-models (based on preexisting conceptual models) one must dismiss the classical schemes of map-based portrayal of data. For example, for individual geological units on traditional geological maps usually a single age range is given (e.g. formation age), but one might want to attribute several geologic ages (formation age, metamorphic age, cooling ages etc.) to individual units. Such issues have to be taken into account when developing robust physical data-models. Based on our experience we are convinced that individual institutions need to develop their own controlled vocabularies and individual data-models that fit their specific needs on an organizational level.
If externally developed vocabularies and data-models are introduced into established workflows, newly generated and existing data may diverge, and it will be hard to achieve or maintain a common standard. We thus suggest that institutions keep (or develop) their own organizational standards and vocabularies and map them to generally agreed international standards such as INSPIRE or GeoSciML, in the fashion suggested by the linked data principles.
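A SKOS concept scheme of the kind described can be serialized to Turtle even without an RDF library; the namespace and concepts below are hypothetical:

```python
# Minimal SKOS/Turtle serialization sketch in pure Python; real projects
# (like the PoolParty-managed thesauri described above) use RDF tooling.
concepts = {
    "limestone": {"prefLabel": ("Kalkstein", "de"), "broader": "sedimentary_rock"},
    "sedimentary_rock": {"prefLabel": ("Sedimentgestein", "de"), "broader": None},
}

BASE = "https://example.org/thesaurus/"  # hypothetical namespace

def to_turtle(concepts):
    """Emit each concept as a skos:Concept with label and optional broader link."""
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."]
    for cid, c in concepts.items():
        lines.append(f"<{BASE}{cid}> a skos:Concept ;")
        label, lang = c["prefLabel"]
        sep = " ;" if c["broader"] else " ."
        lines.append(f'    skos:prefLabel "{label}"@{lang}{sep}')
        if c["broader"]:
            lines.append(f"    skos:broader <{BASE}{c['broader']}> .")
    return "\n".join(lines)

print(to_turtle(concepts))
```

Persistent HTTP URIs per concept, language-tagged labels, and `skos:broader` hierarchy links are exactly the features that make such vocabularies publishable as linked data and queryable via SPARQL.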
NASA Astrophysics Data System (ADS)
Gray, A. J. G.; Gray, N.; Ounis, I.
2009-09-01
There are multiple vocabularies and thesauri within astronomy, of which the best known are the 1993 IAU Thesaurus and the keyword list maintained by A&A, ApJ and MNRAS. The IVOA has agreed on a standard for publishing vocabularies, based on the W3C SKOS standard, to allow greater automated interaction with them, in particular on the Web. This allows links with the Semantic Web and looks forward to richer applications using the technologies of that domain. Vocabulary-aware applications can benefit from improvements in both precision and recall when searching for bibliographic or science data, and lightweight intelligent filtering for services such as VOEvent streams. In this paper we present two applications, the Vocabulary Explorer and its companion the Mapping Editor, which have been developed to support the use of vocabularies in the Virtual Observatory. These combine Semantic Web and Information Retrieval technologies to illustrate the way in which formal vocabularies might be used in a practical application, provide an online service which will allow astronomers to explore and relate existing vocabularies, and provide a service which translates free text user queries into vocabulary terms.
Specialist Bibliographic Databases
Gasparyan, Armen Yuri; Yessirkepov, Marlen; Voronov, Alexander A; Trukhachev, Vladimir I; Kostyukova, Elena I; Gerasimov, Alexey N; Kitas, George D
2016-05-01
Specialist bibliographic databases offer essential online tools for researchers and authors who work on specific subjects and perform comprehensive and systematic syntheses of evidence. This article presents examples of the established specialist databases, which may be of interest to those engaged in multidisciplinary science communication. Access to most specialist databases is through subscription schemes and membership in professional associations. Several aggregators of information and database vendors, such as EBSCOhost and ProQuest, facilitate advanced searches supported by specialist keyword thesauri. Searches of items through specialist databases are complementary to those through multidisciplinary research platforms, such as PubMed, Web of Science, and Google Scholar. Familiarizing with the functional characteristics of biomedical and nonbiomedical bibliographic search tools is mandatory for researchers, authors, editors, and publishers. The database users are offered updates of the indexed journal lists, abstracts, author profiles, and links to other metadata. Editors and publishers may find particularly useful source selection criteria and apply for coverage of their peer-reviewed journals and grey literature sources. These criteria are aimed at accepting relevant sources with established editorial policies and quality controls. PMID:27134485
Environmental/Biomedical Terminology Index
DOE Office of Scientific and Technical Information (OSTI.GOV)
Huffstetler, J.K.; Dailey, N.S.; Rickert, L.W.
1976-12-01
The Information Center Complex (ICC), a centrally administered group of information centers, provides information support to environmental and biomedical research groups and others within and outside Oak Ridge National Laboratory. In-house data base building and development of specialized document collections are important elements of the ongoing activities of these centers. ICC groups must be concerned with language which will adequately classify and insure retrievability of document records. Language control problems are compounded when the complexity of modern scientific problem solving demands an interdisciplinary approach. Although there are several word lists, indexes, and thesauri specific to various scientific disciplines usually grouped as Environmental Sciences, no single generally recognized authority can be used as a guide to the terminology of all environmental science. If biomedical terminology for the description of research on environmental effects is also needed, the problem becomes even more complex. The building of a word list which can be used as a general guide to the environmental/biomedical sciences has been a continuing activity of the Information Center Complex. This activity resulted in the publication of the Environmental Biomedical Terminology Index (EBTI).
NASA Astrophysics Data System (ADS)
Cuttler, R. T. H.; Tonner, T. W. W.; Al-Naimi, F. A.; Dingwall, L. M.; Al-Hemaidi, N.
2013-07-01
The development of the Qatar National Historic Environment Record (QNHER) by the Qatar Museums Authority and the University of Birmingham in 2008 was based on a customised, bilingual Access database and ArcGIS. While both platforms are stable and well supported, neither was designed for the documentation and retrieval of cultural heritage data. As a result it was decided to develop a custom application using Open Source code. The core module of this application is now completed and is orientated towards the storage and retrieval of geospatial heritage data for the curation of heritage assets. Based on MIDAS Heritage data standards and regionally relevant thesauri, it is a truly bilingual system. Significant attention has been paid to the user interface, which is user-friendly and intuitive. Based on a suite of web services and accessed through a web browser, the system makes full use of internet resources such as Google Maps and Bing Maps. The application avoids long-term vendor "tie-ins" and, as a fully integrated data management system, is now an important tool for both cultural resource managers and heritage researchers in Qatar.
Challenges of interoperability using HL7 v3 in Czech healthcare.
Nagy, Miroslav; Preckova, Petra; Seidl, Libor; Zvarova, Jana
2010-01-01
The paper describes several classification systems that could improve patient safety through semantic interoperability among contemporary electronic health record systems (EHR-Ss) with support of the HL7 v3 standard. We describe a proposal and a pilot implementation of a semantic interoperability platform (SIP) interconnecting current EHR-Ss by using HL7 v3 messages and concept mappings onto the most widely used classification systems. The increasing number of classification systems and nomenclatures requires the design of various conversion tools for transfer between the main classification systems. We present the so-called LIM filler module and the HL7 broker, which are parts of the SIP, playing the role of such conversion tools. The analysis of suitability and usability of individual terminological thesauri has been started by mapping the clinical contents of the Minimal Data Model for Cardiology (MDMC) to various terminological classification systems. A nationwide implementation of the SIP would include adopting and translating international coding systems and nomenclatures, and developing implementation guidelines facilitating the migration from national standards to international ones. Our research showed that creation of such a platform is feasible; however, it will require a huge effort to adapt fully the Czech healthcare system to the European environment.
[Big data, medical language and biomedical terminology systems].
Schulz, Stefan; López-García, Pablo
2015-08-01
A variety of rich terminology systems, such as thesauri, classifications, nomenclatures and ontologies support information and knowledge processing in health care and biomedical research. Nevertheless, human language, manifested as individually written texts, persists as the primary carrier of information, in the description of disease courses or treatment episodes in electronic medical records, and in the description of biomedical research in scientific publications. In the context of the discussion about big data in biomedicine, we hypothesize that the abstraction of the individuality of natural language utterances into structured and semantically normalized information facilitates the use of statistical data analytics to distil new knowledge out of textual data from biomedical research and clinical routine. Computerized human language technologies are constantly evolving and are increasingly ready to annotate narratives with codes from biomedical terminology. However, this depends heavily on linguistic and terminological resources. The creation and maintenance of such resources is labor-intensive. Nevertheless, it is sensible to assume that big data methods can be used to support this process. Examples include the learning of hierarchical relationships, the grouping of synonymous terms into concepts and the disambiguation of homonyms. Although clear evidence is still lacking, the combination of natural language technologies, semantic resources, and big data analytics is promising.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zamora, Antonio
Advanced Natural Language Processing Tools for Web Information Retrieval, Content Analysis, and Synthesis. The goal of this SBIR was to implement and evaluate several advanced Natural Language Processing (NLP) tools and techniques to enhance the precision and relevance of search results by analyzing and augmenting search queries and by helping to organize the search output obtained from heterogeneous databases and web pages containing textual information of interest to DOE and the scientific-technical user communities in general. The SBIR investigated 1) the incorporation of spelling checkers in search applications, 2) identification of significant phrases and concepts using a combination of linguistic and statistical techniques, and 3) enhancement of the query interface and search retrieval results through the use of semantic resources, such as thesauri. A search program with a flexible query interface was developed to search reference databases with the objective of enhancing search results from web queries or queries of specialized search systems such as DOE's Information Bridge. The DOE ETDE/INIS Joint Thesaurus was processed to create a searchable database. Term frequencies and term co-occurrences were used to enhance the web information retrieval by providing algorithmically derived objective criteria to organize relevant documents into clusters containing significant terms. A thesaurus provides an authoritative overview and classification of a field of knowledge. By organizing the results of a search using the thesaurus terminology, the output is more meaningful than when the results are just organized based on the terms that co-occur in the retrieved documents, some of which may not be significant. An attempt was made to take advantage of the hierarchy provided by broader and narrower terms, as well as other field-specific information in the thesauri.
The search program uses linguistic morphological routines to find relevant entries regardless of whether terms are stored in singular or plural form. Implementation of additional inflectional morphology processes for verbs can enhance retrieval further, but this has to be balanced against the possibility of broadening the results too much. In addition to the DOE energy thesaurus, other sources of specialized organized knowledge such as the Medical Subject Headings (MeSH), the Unified Medical Language System (UMLS), and Wikipedia were investigated. The supporting role of the NLP thesaurus search program was enhanced by incorporating spelling aid and a part-of-speech tagger to cope with misspellings in the queries, to determine the grammatical roles of the query words, and to identify nouns for special processing. To improve precision, multiple modes of searching were implemented, including Boolean operators and field-specific searches. Programs to convert a thesaurus or reference file into searchable support files can be deployed easily, and the resulting files are immediately searchable to produce relevance-ranked results with built-in spelling aid, morphological processing, and advanced search logic. Demonstration systems were built for several databases, including the DOE energy thesaurus.
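The singular/plural normalization described in this record can be sketched with a few suffix rules. This is an illustrative simplification, not the program's actual routines; a production system would use a full morphological lexicon with an exception table for irregular forms such as "thesauri" → "thesaurus".

```python
# Illustrative plural-to-singular normalization for thesaurus index lookup.
# The suffix rules are a hedged simplification of English inflection.
PLURAL_RULES = [
    ("ies", "y"),     # technologies -> technology
    ("sses", "ss"),   # classes -> class
    ("ches", "ch"),   # branches -> branch
    ("shes", "sh"),   # flashes -> flash
    ("xes", "x"),     # indexes -> index
    ("s", ""),        # terms -> term, databases -> database
]

def normalize(word: str) -> str:
    """Reduce a word to a singular base form using the first matching rule."""
    w = word.lower()
    for suffix, repl in PLURAL_RULES:
        if w.endswith(suffix) and len(w) > len(suffix) + 2:
            return w[: len(w) - len(suffix)] + repl
    return w

def matches(query: str, index_terms) -> bool:
    """True if the normalized query hits a normalized index entry,
    so plural queries still find entries stored in singular form."""
    return normalize(query) in {normalize(t) for t in index_terms}
```

Normalizing both the query and the stored terms, rather than expanding the query with every inflected variant, keeps the index small at the cost of occasional over-merging.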
Literature Mining for the Discovery of Hidden Connections between Drugs, Genes and Diseases
Frijters, Raoul; van Vugt, Marianne; Smeets, Ruben; van Schaik, René; de Vlieg, Jacob; Alkema, Wynand
2010-01-01
The scientific literature represents a rich source for retrieval of knowledge on associations between biomedical concepts such as genes, diseases and cellular processes. A commonly used method to establish relationships between biomedical concepts from literature is co-occurrence. Apart from its use in knowledge retrieval, the co-occurrence method is also well-suited to discover new, hidden relationships between biomedical concepts following a simple ABC-principle, in which A and C have no direct relationship, but are connected via shared B-intermediates. In this paper we describe CoPub Discovery, a tool that mines the literature for new relationships between biomedical concepts. Statistical analysis using ROC curves showed that CoPub Discovery performed well over a wide range of settings and keyword thesauri. We subsequently used CoPub Discovery to search for new relationships between genes, drugs, pathways and diseases. Several of the newly found relationships were validated using independent literature sources. In addition, new predicted relationships between compounds and cell proliferation were validated and confirmed experimentally in an in vitro cell proliferation assay. The results show that CoPub Discovery is able to identify novel associations between genes, drugs, pathways and diseases that have a high probability of being biologically valid. This makes CoPub Discovery a useful tool to unravel the mechanisms behind disease, to find novel drug targets, or to find novel applications for existing drugs. PMID:20885778
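The ABC principle described in this record can be sketched directly from co-occurrence sets: A and C are candidate hidden relations when they never co-occur but share enough B-intermediates. The toy documents and threshold below are illustrative only and do not reproduce CoPub Discovery's actual keyword thesauri or scoring.

```python
# Hedged sketch of ABC-style hidden-relation discovery from co-occurrence.
from collections import defaultdict

def build_cooccurrence(documents):
    """Map each concept to the set of concepts it co-occurs with in any document."""
    cooc = defaultdict(set)
    for doc in documents:
        for a in doc:
            for b in doc:
                if a != b:
                    cooc[a].add(b)
    return cooc

def hidden_links(cooc, min_shared=2):
    """Yield (A, C, shared_Bs) where A and C never co-occur directly
    but share at least min_shared intermediate concepts B."""
    concepts = sorted(cooc)
    for i, a in enumerate(concepts):
        for c in concepts[i + 1:]:
            if c in cooc[a]:
                continue  # direct relation already known, not "hidden"
            shared = cooc[a] & cooc[c]
            if len(shared) >= min_shared:
                yield a, c, shared

# Toy corpus: a drug and a disease linked only via shared genes.
docs = [
    {"drugX", "geneB1"}, {"drugX", "geneB2"},
    {"geneB1", "diseaseC"}, {"geneB2", "diseaseC"},
]
links = list(hidden_links(build_cooccurrence(docs)))
```

Here "drugX" and "diseaseC" never appear in the same document, yet both co-occur with geneB1 and geneB2, so the pair surfaces as a hidden-link candidate for validation.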
Logic-based assessment of the compatibility of UMLS ontology sources
2011-01-01
Background The UMLS Metathesaurus (UMLS-Meta) is currently the most comprehensive effort for integrating independently-developed medical thesauri and ontologies. UMLS-Meta is being used in many applications, including PubMed and ClinicalTrials.gov. The integration of new sources combines automatic techniques, expert assessment, and auditing protocols. The automatic techniques currently in use, however, are mostly based on lexical algorithms and often disregard the semantics of the sources being integrated. Results In this paper, we argue that UMLS-Meta’s current design and auditing methodologies could be significantly enhanced by taking into account the logic-based semantics of the ontology sources. We provide empirical evidence suggesting that UMLS-Meta in its 2009AA version contains a significant number of errors; these errors become immediately apparent if the rich semantics of the ontology sources is taken into account, manifesting themselves as unintended logical consequences that follow from the ontology sources together with the information in UMLS-Meta. We then propose general principles and specific logic-based techniques to effectively detect and repair such errors. Conclusions Our results suggest that the methodologies employed in the design of UMLS-Meta are not only very costly in terms of human effort, but also error-prone. The techniques presented here can be useful for both reducing human effort in the design and maintenance of UMLS-Meta and improving the quality of its contents. PMID:21388571
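One class of error described in this record can be illustrated very simply: if a source ontology asserts that A is strictly narrower than B, but the Metathesaurus maps A and B to the same concept, an unintended equivalence follows. The check below is a hedged toy, not the paper's logic-based method, and the example terms are invented.

```python
# Illustrative check for unintended equivalences introduced by merging
# subclass axioms from a source ontology with "same concept" mappings.
def flag_unintended_equivalences(subclass_pairs, same_concept_pairs):
    """Return (child, parent) pairs where the source asserts a strict
    subclass relation but the mapping collapses both into one concept."""
    same = {frozenset(p) for p in same_concept_pairs}
    return [
        (child, parent)
        for child, parent in subclass_pairs
        if frozenset((child, parent)) in same
    ]

# Hypothetical source axioms and a hypothetical bad Metathesaurus merge.
source_axioms = [("Viral pneumonia", "Pneumonia"), ("Pneumonia", "Lung disease")]
meta_links = [("Viral pneumonia", "Pneumonia")]
errors = flag_unintended_equivalences(source_axioms, meta_links)
```

A real implementation would reason over the full transitive closure of the combined sources; the point of the sketch is only that such errors "become immediately apparent" once source semantics are taken into account.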
Linking Publications to Instruments, Field Campaigns, Sites and Working Groups: The ARM Experience
NASA Astrophysics Data System (ADS)
Lehnert, K.; Parsons, M. A.; Ramachandran, R.; Fils, D.; Narock, T.; Fox, P. A.; Troyan, D.; Cialella, A. T.; Gregory, L.; Lazar, K.; Liang, M.; Ma, L.; Tilp, A.; Wagener, R.
2017-12-01
For the past 25 years, the ARM Climate Research Facility (a US Department of Energy scientific user facility) has been collecting atmospheric data in different climatic regimes using both in situ and remote instrumentation. Configuration of the facility's components has been designed to improve the understanding and representation, in climate and earth system models, of clouds and aerosols. Placing a premium on long-term continuous data collection has resulted in terabytes of data being collected, stored, and made accessible to any interested person. All data are accessible via the ARM.gov website and the ARM Data Discovery Tool. A team of metadata professionals assigns appropriate tags to help facilitate searching the databases for desired data. The knowledge organization tools and concepts used to create connections between data, instruments, field campaigns, sites, and measurements are familiar to informatics professionals. Ontology, taxonomy, classification, and thesauri are among the customized concepts put into practice for ARM's purposes. In addition to the multitude of data available, approximately 3,000 journal articles that utilize ARM data have been linked to specific ARM web pages. Searches of the complete ARM publication database can be done using a separate interface. This presentation describes how ARM data are linked to instruments, sites, field campaigns, and publications through the application of standard knowledge organization tools and concepts.
OLIVER: an online library of images for veterinary education and research.
McGreevy, Paul; Shaw, Tim; Burn, Daniel; Miller, Nick
2007-01-01
As part of a strategic move by the University of Sydney toward increased flexibility in learning, the Faculty of Veterinary Science undertook a number of developments involving Web-based teaching and assessment. OLIVER underpins them by providing a rich, durable repository for learning objects. To integrate Web-based learning, case studies, and didactic presentations for veterinary and animal science students, we established an online library of images and other learning objects for use by academics in the Faculties of Veterinary Science and Agriculture. The objectives of OLIVER were to maximize the use of the faculty's teaching resources by providing a stable archiving facility for graphic images and other multimedia learning objects that allows flexible and precise searching, integrating indexing standards, thesauri, pull-down lists of preferred terms, and linking of objects within cases. OLIVER offers a portable and expandable Web-based shell that facilitates ongoing storage of learning objects in a range of media. Learning objects can be downloaded in common, standardized formats so that they can be easily imported for use in a range of applications, including Microsoft PowerPoint, WebCT, and Microsoft Word. OLIVER now contains more than 9,000 images relating to many facets of veterinary science; these are annotated and supported by search engines that allow rapid access to both images and relevant information. The Web site is easily updated and adapted as required.
Toward a common language for biobanking.
Fransson, Martin N; Rial-Sebbag, Emmanuelle; Brochhausen, Mathias; Litton, Jan-Eric
2015-01-01
To encourage the process of harmonization, the biobank community should support and use a common terminology. Relevant terms may be found in general thesauri for medicine, in legal instruments, or in specific glossaries for biobanking. A comparison of the use of these sources has so far not been conducted and would be a useful instrument to further promote harmonization and data sharing. Thus, the purpose of the present study was to investigate the preference of definitions important for sharing biological samples and data. Definitions for 10 terms ([human] biobank, sample/specimen, sample collection, study, aliquot, coded, identifying information, anonymised, personal data, and informed consent) were collected from several sources. A web-based questionnaire was sent to 560 European individuals working with biobanks, asking them to select their preferred definition for each term. A total of 123 people participated in the survey, giving a response rate of 23%. The result was evaluated from four aspects: scope of definitions, potential regional differences, differences in semantics, and definitions in the context of ontologies, guided by comments from responders. Indicative from the survey is the risk of focusing only on the research aspect of biobanking in definitions. Hence, it is recommended that important terms be formulated in such a way that all areas of biobanking are covered, to improve the bridges between research and clinical application. Since several of the terms investigated herein can also be found in a legal context, which may differ between countries, establishing how a definition adheres to the relevant law is also crucial.
Widmayer, Sonja; Sowislo, Julia F; Jungfer, Hermann A; Borgwardt, Stefan; Lang, Undine E; Stieglitz, Rolf D; Huber, Christian G
2018-01-01
Background: Aggression in psychoses is of high clinical importance, and volumetric MRI techniques have been used to explore its structural brain correlates. Methods: We conducted a systematic review searching EMBASE, ScienceDirect, and PsycINFO through September 2017 using thesauri representing aggression, psychosis, and brain imaging. We calculated effect sizes for each study and mean Hedges' g for whole brain (WB) volume. Methodological quality was established using the PRISMA checklist (PROSPERO: CRD42014014461). Results: Our sample consisted of 12 studies with 470 patients and 155 healthy controls (HC). After subtracting subjects due to cohort overlaps, 314 patients and 96 HC remained. Qualitative analyses showed lower volumes of WB, prefrontal regions, temporal lobe, hippocampus, thalamus and cerebellum, and higher volumes of lateral ventricles, amygdala, and putamen in violent vs. non-violent people with schizophrenia. In quantitative analyses, violent persons with schizophrenia exhibited a significantly lower WB volume than HC (p = 0.004), and also lower than non-violent persons with schizophrenia (p = 0.007). Conclusions: We reviewed evidence for differences in brain volume correlates of aggression in persons with schizophrenia. Our results point toward a reduced whole brain volume in violent as opposed to non-violent persons with schizophrenia. However, considerable sample overlap in the literature, lack of reporting of potential confounding variables, and missing research on affective psychoses limit our explanatory power. To permit stronger conclusions, further studies evaluating structural correlates of aggression in psychotic disorders are needed.
Consensus methods: review of original methods and their main alternatives used in public health
Bourrée, Fanny; Michel, Philippe; Salmi, Louis Rachid
2008-01-01
Summary Background Consensus-based studies are increasingly used as decision-making methods, for they have lower production costs than other methods (observation, experimentation, modelling) and provide results more rapidly. The objective of this paper is to describe the principles and procedures of the four main consensus methods (Delphi, nominal group, consensus development conference, and RAND/UCLA), their use as it appears in peer-reviewed publications, and validation studies published in the healthcare literature. Methods A bibliographic search was performed in Pubmed/MEDLINE, Banque de Données Santé Publique (BDSP), The Cochrane Library, Pascal and Francis. Keywords, headings and qualifiers corresponding to a list of terms and expressions related to the consensus methods were searched in the thesauri and used in the literature search. A search with the same terms and expressions was performed on the Internet using Google Scholar. Results All methods, precisely described in the literature, are based on common basic principles such as definition of subject, selection of experts, and direct or remote interaction processes. They sometimes use quantitative assessment for ranking items. Numerous variants of these methods have been described. Few validation studies have been implemented. Not implementing these basic principles and failing to describe the methods used to reach consensus were both frequent shortcomings that raise suspicion regarding the validity of consensus methods. Conclusion When a consensus method is applied to a new domain with important consequences for decision making, it should first be validated. PMID:19013039
Global POPIN Advisory Committee meeting considers broad range of information issues.
1995-09-01
The Global Population Information Network (POPIN) Advisory Committee met in Bangkok in June 1995. The Bali Declaration on Population and Sustainable Development adopted in 1992 and the Program of Action of the International Conference on Population and Development in 1994 recognized the importance of information in achieving information, education, and communication (IEC) goals in the population field. The information revolution would help achieve the population goals of individual countries, as more regional information networks are being created. POPIN's primary objective is to increase the awareness, knowledge, and understanding of population-related issues. There has been discussion about merging the POPLINE and POPIN thesauri and translating the Population Multilingual Thesaurus into several more languages. At this meeting, a set of recommendations was adopted about POPIN's future operations: the continued growth of the network as a decentralized organization; the continued functioning of the Global POPIN Coordinating Unit as the central coordinating node; active participation by the specialized agencies of the United Nations in POPIN activities; and information sharing among the networks in each region. In the production and dissemination of information, various media should be used, including print, CD-ROM, e-mail, and the Internet; documents should be created in electronic format following developed guidelines; meetings should be held on topics in information technology to provide training opportunities and exchange experience; and presentations at conferences and symposia should be made available in electronic format through the Internet via the POPIN Gopher/Web Server. Follow-up activities should include the diversification of funding, the avoidance of duplication, and the achievement of cost-effectiveness.
[RadLex - German version: a radiological lexicon for indexing image and report information].
Marwede, D; Daumke, P; Marko, K; Lobsien, D; Schulz, S; Kahn, T
2009-01-01
Since 2003 the Radiological Society of North America (RSNA) has been developing a lexicon of standardized radiological terms (RadLex) intended to support the structured reporting of imaging observations and the indexing of teaching cases. The aim of this study was to translate the first version of the lexicon (1-2007) into German and to implement a language-independent online term browser. RadLex version 1-2007 contains 6303 terms in nine main categories. Two radiologists independently translated the lexicon using medical dictionaries. Terms translated differently were revised and translated by consensus. For the development of an online term browser, a text processing algorithm called morphosemantic indexing was used, which splits up words into small semantic units and compares those units to language-specific subword thesauri. In total, 6240 of 6303 terms (99%) were translated. Of those terms, 3965 were German, 1893 were Latin, 359 were multilingual, and 23 were English terms that are also used in German and were therefore maintained. The online term browser supports a language-independent term search in RadLex (German/English) and other common medical terminology (e.g., ICD-10). The term browser displays term hierarchies and translations in different frames, and the complexity of the result lists can be adapted by the user. RadLex version 1-2007 developed by the RSNA is now available in German and can be accessed online through a term browser with an efficient search function. This is an important precondition for the future comparison of national and international indexed radiological examination results and the interoperability between digital teaching resources.
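The morphosemantic indexing idea described in this record, splitting a word into small semantic units matched against language-specific subword thesauri, can be sketched as greedy longest-match segmentation. The tiny subword lexicon and unit codes below are invented for illustration and are far simpler than the actual subword thesauri used.

```python
# Hedged sketch of morphosemantic segmentation: map a word to a sequence
# of language-independent semantic unit codes via a subword lexicon.
SUBWORDS = {
    # German subwords (illustrative)
    "nieren": "KIDNEY", "niere": "KIDNEY", "entzündung": "INFLAMMATION",
    # English subwords mapping to the same unit codes
    "kidney": "KIDNEY", "inflammation": "INFLAMMATION",
}

def morphosemantic_units(word, lexicon=SUBWORDS):
    """Greedy longest-match segmentation into semantic unit codes.
    Returns None if the word cannot be fully segmented."""
    word = word.lower()
    units, i = [], 0
    while i < len(word):
        # try the longest remaining substring first
        for j in range(len(word), i, -1):
            code = lexicon.get(word[i:j])
            if code:
                units.append(code)
                i = j
                break
        else:
            return None  # no subword matches at position i
    return units
```

Because German "Nierenentzündung" and English "kidney" + "inflammation" resolve to the same unit codes, a query in either language can hit the same indexed entries.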
Data-driven systems and system-driven data: the story of the Flanders Heritage Inventory (1995-2015)
NASA Astrophysics Data System (ADS)
Van Daele, K.; Meganck, L.; Mortier, S.
2015-08-01
Over the past 20 years, heritage inventories in Flanders (Belgium) have evolved from printed books to digital inventories. It is obvious that a system that publishes a digital inventory needs to adapt to user requirements. But, after years of working with a digital inventory system, it has become apparent not only that the system has been developed to meet users' needs, but also that user practice and the resulting data have been shaped by the system. Thinking about domain models and thesauri influenced our thinking about our methodology of surveying. Seeing our data projected on a common basemap led us to realise how intertwined and interdependent different types of heritage can be. The need for structured metadata has impressed upon us the need for good quality data, guaranteed by data entry standards, validation tools, and a strict editing workflow. Just as the researchers have transitioned from seeing their respective inventories as being significantly different to actually seeing the similarities between them, the information specialists have come to the realisation that there are synergies that can be achieved with other systems, both within and outside of our organisation. Deploying our inventories on the web has also changed how we communicate with the general public. Newer channels such as email and social media have enabled a more interactive way of communicating. But throughout the years, one constant has remained. While we do not expect the systems to live on, we do want the data in them to be available to future generations.
The difficulties of systematic reviews.
Westgate, Martin J; Lindenmayer, David B
2017-10-01
The need for robust evidence to support conservation actions has driven the adoption of systematic approaches to research synthesis in ecology. However, applying systematic review to complex or open questions remains challenging, and this task is becoming more difficult as the quantity of scientific literature increases. We drew on the science of linguistics for guidance as to why the process of identifying and sorting information during systematic review remains so labor intensive, and to provide potential solutions. Several linguistic properties of peer-reviewed corpora (including nonrandom selection of review topics, small-world properties of semantic networks, and spatiotemporal variation in word meaning) greatly increase the effort needed to complete the systematic review process. Conversely, the resolution of these semantic complexities is a common motivation for narrative reviews, but this process is rarely enacted with the rigor applied during linguistic analysis. Therefore, linguistics provides a unifying framework for understanding some key challenges of systematic review and highlights two useful directions for future research. First, in cases where semantic complexity generates barriers to synthesis, ecologists should consider drawing on existing methods, such as natural language processing or the construction of research thesauri and ontologies, that provide tools for mapping and resolving that complexity. These tools could help individual researchers classify research material in a more robust manner and provide valuable guidance for future researchers on that topic. Second, a linguistic perspective highlights that scientific writing is a rich resource worthy of detailed study, an observation that can sometimes be lost during the search for data during systematic review or meta-analysis. For example, mapping semantic networks can reveal redundancy and complementarity among scientific concepts, leading to new insights and research questions.
Consequently, wider adoption of linguistic approaches may facilitate improved rigor and richness in research synthesis. © 2017 Society for Conservation Biology.
Ng, Reuben; Allore, Heather G.; Trentalange, Mark; Monin, Joan K.; Levy, Becca R.
2015-01-01
Scholars argue about whether age stereotypes (beliefs about old people) are becoming more negative or positive over time. No previous study has systematically tested the trend of age stereotypes over more than 20 years, due to lack of suitable data. Our aim was to fill this gap by investigating whether age stereotypes have changed over the last two centuries and, if so, what may be associated with this change. We hypothesized that age stereotypes have increased in negativity due, in part, to the increasing medicalization of aging. This study applied computational linguistics to the recently compiled Corpus of Historical American English (COHA), a database of 400 million words that includes a range of printed sources from 1810 to 2009. After generating a comprehensive list of synonyms for the term elderly for these years from two historical thesauri, we identified 100 collocates (words that co-occurred most frequently with these synonyms) for each of the 20 decades. Inclusion criteria for the collocates were: (1) appeared within four words of the elderly synonym, (2) referred to an old person, and (3) had a stronger association with the elderly synonym than other words appearing in the database for that decade. This yielded 13,100 collocates that were rated for negativity and medicalization. We found that age stereotypes have become more negative in a linear way over 200 years. In 1880, age stereotypes switched from being positive to being negative. In addition, support was found for two potential explanations. Medicalization of aging and the growing proportion of the population over the age of 65 were both significantly associated with the increase in negative age stereotypes. The upward trajectory of age-stereotype negativity makes a case for remedial action on a societal level. PMID:25675438
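The collocate-extraction step described in this record, collecting words that appear within four words of a synonym for "elderly" and ranking them by co-occurrence frequency, can be sketched as a windowed count over a tokenized corpus. The sentence and synonym set below are toy examples, not COHA data, and the sketch omits the study's association-strength and rating criteria.

```python
# Hedged sketch of window-based collocate extraction from a token stream.
from collections import Counter

def collocates(tokens, synonyms, window=4):
    """Count words occurring within `window` tokens of any synonym
    (the synonym occurrences themselves are excluded)."""
    counts = Counter()
    positions = [i for i, t in enumerate(tokens) if t in synonyms]
    for i in positions:
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] not in synonyms:
                counts[tokens[j]] += 1
    return counts

text = "the frail elderly man walked while the wise elderly woman spoke".split()
counts = collocates(text, {"elderly"})
```

In the full study, each candidate collocate would additionally be filtered by whether it refers to an old person and whether its association with the synonym exceeds its background frequency for that decade.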
Discovering Beaten Paths in Collaborative Ontology-Engineering Projects using Markov Chains
Walk, Simon; Singer, Philipp; Strohmaier, Markus; Tudorache, Tania; Musen, Mark A.; Noy, Natalya F.
2014-01-01
Biomedical taxonomies, thesauri and ontologies in the form of the International Classification of Diseases as a taxonomy or the National Cancer Institute Thesaurus as an OWL-based ontology, play a critical role in acquiring, representing and processing information about human health. With increasing adoption and relevance, biomedical ontologies have also significantly increased in size. For example, the 11th revision of the International Classification of Diseases, which is currently under active development by the World Health Organization, contains nearly 50,000 classes representing a vast variety of different diseases and causes of death. This evolution in terms of size was accompanied by an evolution in the way ontologies are engineered. Because no single individual has the expertise to develop such large-scale ontologies, ontology-engineering projects have evolved from small-scale efforts involving just a few domain experts to large-scale projects that require effective collaboration between dozens or even hundreds of experts, practitioners and other stakeholders. Understanding the way these different stakeholders collaborate will enable us to improve editing environments that support such collaborations. In this paper, we uncover how large ontology-engineering projects, such as the International Classification of Diseases in its 11th revision, unfold by analyzing usage logs of five different biomedical ontology-engineering projects of varying sizes and scopes using Markov chains. We discover intriguing interaction patterns (e.g., which properties users frequently change after specific given ones) that suggest that large collaborative ontology-engineering projects are governed by a few general principles that determine and drive development.
From our analysis, we identify commonalities and differences between different projects that have implications for project managers, ontology editors, developers and contributors working on collaborative ontology-engineering projects and tools in the biomedical domain. PMID:24953242
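The usage-log analysis described in this record rests on fitting a first-order Markov chain over editing actions, i.e., estimating the probability of the next action given the current one from normalized bigram counts. The action names and log below are invented for illustration; the paper's chains are fitted over far larger logs with its own state definitions.

```python
# Hedged sketch of fitting a first-order Markov chain to an action log:
# maximum-likelihood transition probabilities P(next | current).
from collections import Counter, defaultdict

def transition_probs(log):
    """Estimate P(b | a) as count(a -> b) / count(a as a non-final state)."""
    bigrams = Counter(zip(log, log[1:]))
    totals = Counter(log[:-1])
    probs = defaultdict(dict)
    for (a, b), n in bigrams.items():
        probs[a][b] = n / totals[a]
    return probs

# Hypothetical ontology-editing session.
log = ["create_class", "edit_label", "edit_definition",
       "edit_label", "edit_definition", "move_class"]
P = transition_probs(log)
```

Patterns such as "which properties users frequently change after specific given ones" then correspond to the high-probability rows of this transition matrix.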
Semantics in NETMAR (open service NETwork for MARine environmental data)
NASA Astrophysics Data System (ADS)
Leadbetter, Adam; Lowry, Roy; Clements, Oliver
2010-05-01
Over recent years, there has been a proliferation of environmental data portals utilising a wide range of systems and services, many of which cannot interoperate. The European Union Framework 7 project NETMAR (which commenced in February 2010) aims to provide a toolkit for building such portals in a coherent manner through the use of chained Open Geospatial Consortium Web Services (WxS), OPeNDAP file access and W3C standards controlled by a Business Process Execution Language workflow. As such, the end product will be configurable by user communities interested in developing a portal for marine environmental data, and will offer search, download and integration tools for a range of satellite, model and observed data from open ocean and coastal areas. Further processing of these data will also be available in order to provide statistics and derived products suitable for decision making in the chosen environmental domain. In order to make the resulting portals truly interoperable, the NETMAR programme requires a detailed definition of the semantics of the services being called and the data which are being requested. A key goal of the NETMAR programme is, therefore, to develop a multi-domain and multilingual ontology of marine data and services. This will allow searches across both human languages and across scientific domains. The approach taken will be to analyse existing semantic resources and provide mappings between them, gluing together the definitions, semantics and workflows of the WxS services. The mappings between terms aim to be more general than the standard "narrower than", "broader than" type seen in the thesauri or simple ontologies implemented by previous programmes. Tools for the development and population of ontologies will also be provided by NETMAR, as there will be instances in which existing resources cannot sufficiently describe newly encountered data or services.
Discovering beaten paths in collaborative ontology-engineering projects using Markov chains.
Walk, Simon; Singer, Philipp; Strohmaier, Markus; Tudorache, Tania; Musen, Mark A; Noy, Natalya F
2014-10-01
Biomedical taxonomies, thesauri and ontologies in the form of the International Classification of Diseases as a taxonomy or the National Cancer Institute Thesaurus as an OWL-based ontology, play a critical role in acquiring, representing and processing information about human health. With increasing adoption and relevance, biomedical ontologies have also significantly increased in size. For example, the 11th revision of the International Classification of Diseases, which is currently under active development by the World Health Organization contains nearly 50,000 classes representing a vast variety of different diseases and causes of death. This evolution in terms of size was accompanied by an evolution in the way ontologies are engineered. Because no single individual has the expertise to develop such large-scale ontologies, ontology-engineering projects have evolved from small-scale efforts involving just a few domain experts to large-scale projects that require effective collaboration between dozens or even hundreds of experts, practitioners and other stakeholders. Understanding the way these different stakeholders collaborate will enable us to improve editing environments that support such collaborations. In this paper, we uncover how large ontology-engineering projects, such as the International Classification of Diseases in its 11th revision, unfold by analyzing usage logs of five different biomedical ontology-engineering projects of varying sizes and scopes using Markov chains. We discover intriguing interaction patterns (e.g., which properties users frequently change after specific given ones) that suggest that large collaborative ontology-engineering projects are governed by a few general principles that determine and drive development. 
From our analysis, we identify commonalities and differences between different projects that have implications for project managers, ontology editors, developers and contributors working on collaborative ontology-engineering projects and tools in the biomedical domain. Copyright © 2014 Elsevier Inc. All rights reserved.
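The kind of Markov-chain analysis described above can be sketched by estimating first-order transition probabilities from a sequence of editing actions. The log below is an invented toy sequence of property edits, not data from the paper.

```python
from collections import Counter, defaultdict

# Toy editing log: which property a contributor changed, in order.
# The property names are invented for illustration only.
log = ["label", "definition", "parent", "label", "definition",
       "synonym", "parent", "label", "definition"]

# Count consecutive pairs (current action, next action)
pair_counts = Counter(zip(log, log[1:]))
row_totals = defaultdict(int)
for (a, _), n in pair_counts.items():
    row_totals[a] += n

# Maximum-likelihood estimate of P(next action | current action)
P = {(a, b): n / row_totals[a] for (a, b), n in pair_counts.items()}
print(P[("label", "definition")])  # 1.0: every "label" edit is followed by "definition"
```

Inspecting the largest entries of such a transition matrix surfaces the "beaten paths" (e.g., which properties users frequently change after specific given ones) that the study reports.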
Hoehndorf, Robert; Alshahrani, Mona; Gkoutos, Georgios V; Gosline, George; Groom, Quentin; Hamann, Thomas; Kattge, Jens; de Oliveira, Sylvia Mota; Schmidt, Marco; Sierra, Soraya; Smets, Erik; Vos, Rutger A; Weiland, Claus
2016-11-14
The systematic analysis of a large number of comparable plant trait data can support investigations into phylogenetics and ecological adaptation, with broad applications in evolutionary biology, agriculture, conservation, and the functioning of ecosystems. Floras, i.e., books collecting the information on all known plant species found within a region, are a potentially rich source of such plant trait data. Floras describe plant traits with a focus on morphology and other traits relevant for species identification in addition to other characteristics of plant species, such as ecological affinities, distribution, economic value, health applications, traditional uses, and so on. However, a key limitation in systematically analyzing information in Floras is the lack of a standardized vocabulary for the described traits as well as the difficulties in extracting structured information from free text. We have developed the Flora Phenotype Ontology (FLOPO), an ontology for describing traits of plant species found in Floras. We used the Plant Ontology (PO) and the Phenotype And Trait Ontology (PATO) to extract entity-quality relationships from digitized taxon descriptions in Floras, and used a formal ontological approach based on phenotype description patterns and automated reasoning to generate the FLOPO. The resulting ontology consists of 25,407 classes and is based on the PO and PATO. The classified ontology closely follows the structure of Plant Ontology in that the primary axis of classification is the observed plant anatomical structure, and more specific traits are then classified based on parthood and subclass relations between anatomical structures as well as subclass relations between phenotypic qualities. The FLOPO is primarily intended as a framework based on which plant traits can be integrated computationally across all species and higher taxa of flowering plants. 
Importantly, it is not intended to replace established vocabularies or ontologies, but rather serve as an overarching framework based on which different application- and domain-specific ontologies, thesauri and vocabularies of phenotypes observed in flowering plants can be integrated.
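The entity-quality (EQ) pattern underlying FLOPO-style classes, an anatomical entity from PO paired with a quality from PATO, can be sketched as follows. The IDs and labels are illustrative placeholders, not actual PO/PATO records.

```python
# Entity-quality (EQ) composition: each trait pairs an anatomical
# entity (Plant Ontology) with a quality (PATO). The identifiers and
# labels below are invented for illustration.

entities = {"PO:0025034": "leaf", "PO:0009046": "flower"}
qualities = {"PATO:0000587": "decreased size", "PATO:0000014": "color"}

def eq_phenotype(entity_id, quality_id):
    """Compose a human-readable phenotype label from an EQ pair."""
    return f"{qualities[quality_id]} of {entities[entity_id]}"

print(eq_phenotype("PO:0025034", "PATO:0000587"))  # decreased size of leaf
```

Classifying such composed phenotypes by the parthood and subclass relations of their entity and quality components is what gives the resulting ontology its anatomy-first hierarchy.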
Behzadifar, Masoud; Gorji, Hasan Abolghasem; Rezapour, Aziz; Bragazzi, Nicola Luigi
2018-05-09
Hepatitis C virus (HCV) is one of the major public health problems in both developed and developing countries. Prison represents a high-risk environment for prisoners, in that it is characterized by high-risk behaviors such as injecting drug use (IDU), tattooing, unprotected sexual intercourse, and sharing syringes. The aim of this study was to quantitatively evaluate the prevalence of HCV among Iranian prisoners by conducting a systematic review and meta-analysis. We searched different scholarly databases, including Embase, PubMed/MEDLINE, ISI/Web of Science, the Cochrane Library, Scopus, CINAHL, and PsycINFO, as well as Iranian bibliographic thesauri (namely, Barakatns, MagIran, and SID), up to December 2017. The Newcastle-Ottawa Scale (NOS) was used to assess the quality of the included studies. The HCV prevalence rate with its 95% confidence interval (CI) was estimated using the DerSimonian-Laird random-effects model with Freeman-Tukey double arcsine transformation. Egger's regression test was used to evaluate publication bias. Finally, 17 articles were selected based on inclusion and exclusion criteria. Overall, 18,693 prisoners were tested. Based on the random-effects model, the prevalence of HCV among Iranian prisoners was 28% (95% CI: 21-36) with heterogeneity of I² = 99.3% (p = 0.00). All studies used an ELISA test for the evaluation of HCV antibodies. The findings showed that the highest prevalence rate (53%) was among prisoners who inject drugs. The findings of our study showed that the prevalence of HCV among Iranian prisoners is dramatically high. Managing this issue in Iran's prisons requires careful attention to the availability of health facilities and instruments, such as screening, and to harm reduction policies, such as providing sterile syringes and needles to prisoners. An integrated program of training for prisoners, prison personnel and medical staff is also needed to improve health conditions in prisons.
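The pooling procedure named in this abstract, a Freeman-Tukey double arcsine transform combined with DerSimonian-Laird random-effects weights, can be sketched as follows. The event counts and sample sizes are invented toy data, not the study's results, and the simple sin² back-transform is used here rather than the exact Miller back-transformation (which additionally uses a harmonic-mean sample size).

```python
import math

events = [30, 45, 12]    # HCV-positive prisoners per study (toy data)
sizes  = [100, 150, 80]  # prisoners tested per study (toy data)

# Freeman-Tukey double-arcsine transform of each proportion, with
# its (approximately constant) variance 1 / (4n + 2)
t = [0.5 * (math.asin(math.sqrt(x / (n + 1)))
          + math.asin(math.sqrt((x + 1) / (n + 1))))
     for x, n in zip(events, sizes)]
v = [1.0 / (4 * n + 2) for n in sizes]

# DerSimonian-Laird estimate of the between-study variance tau^2
w = [1.0 / vi for vi in v]
fixed = sum(wi * ti for wi, ti in zip(w, t)) / sum(w)
Q = sum(wi * (ti - fixed) ** 2 for wi, ti in zip(w, t))
C = sum(w) - sum(wi * wi for wi in w) / sum(w)
tau2 = max(0.0, (Q - (len(t) - 1)) / C)

# Random-effects pooled estimate, back-transformed to a proportion
w_re = [1.0 / (vi + tau2) for vi in v]
pooled_t = sum(wi * ti for wi, ti in zip(w_re, t)) / sum(w_re)
pooled_p = math.sin(pooled_t) ** 2
print(round(pooled_p, 3))
```

The transform stabilises the variance of proportions near 0 or 1, which is why it is preferred over pooling raw prevalences when study prevalences are extreme or heterogeneous.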
Practical solutions to implementing "Born Semantic" data systems
NASA Astrophysics Data System (ADS)
Leadbetter, A.; Buck, J. J. H.; Stacey, P.
2015-12-01
The concept of data being "Born Semantic" has been proposed in recent years as a Semantic Web analogue to the idea of data being "born digital" [1], [2]. Within the "Born Semantic" concept, data are captured digitally and, at a point close to the time of creation, are annotated with markup terms from Semantic Web resources (controlled vocabularies, thesauri or ontologies). This allows heterogeneous data to be more easily ingested and amalgamated in near real-time due to the standards-compliant annotation of the data. In taking the "Born Semantic" proposal from concept to operation, a number of difficulties have been encountered. For example, although there are recognised methods such as Header, Dictionary, Triples [3] for the compression, publication and dissemination of large volumes of triples, these systems are not practical to deploy in the field on low-powered (both electrically and computationally) devices. Similarly, it is not practical for instruments to output fully formed, semantically annotated data files if they are designed to be plugged into a modular system with the data logged centrally in the field, as is the case on Argo floats and oceanographic gliders, where internal bandwidth becomes an issue [2]. In light of these issues, this presentation will concentrate on pragmatic solutions being developed to the problem of generating Linked Data in near real-time systems. Specific examples will be highlighted from the European Commission SenseOCEAN project, where Linked Data systems are being developed for autonomous underwater platforms, and from work being undertaken in the streaming of data from the Irish Galway Bay Cable Observatory initiative. Further, developments of a set of tools for the Logstash-Elasticsearch software ecosystem to allow the storing and retrieval of Linked Data will be introduced.
References: [1] A. Leadbetter & J. Fredericks, "We have 'born digital' - now what about 'born semantic'?", European Geosciences Union General Assembly, 2014. [2] J. Buck & A. Leadbetter, "Born semantic: linking data from sensors to users and balancing hardware limitations with data standards", European Geosciences Union General Assembly, 2015. [3] J. Fernández et al., "Binary RDF Representation for Publication and Exchange (HDT)", Web Semantics 19:22-41, 2013.
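One pragmatic "Born Semantic" pattern consistent with the constraints above is to keep the on-device work tiny: a raw reading is tagged with a controlled-vocabulary URI at the point of acquisition rather than serialised as full triples. This is a hedged sketch; the example.org URI is a placeholder, where a real deployment would use a term from a vocabulary server such as the NVS.

```python
import json

# Raw reading as a low-powered sensor might emit it (toy data)
raw = {"sensor": "ctd-01", "param": "temp",
       "value": 8.42, "time": "2015-06-01T12:00:00Z"}

# One small lookup table at the edge maps local parameter codes to
# controlled-vocabulary URIs. The URI below is a placeholder, not a
# real vocabulary term.
VOCAB = {"temp": "http://example.org/vocab/sea_water_temperature"}

annotated = dict(raw, param=VOCAB[raw["param"]])
print(json.dumps(annotated, sort_keys=True))
```

Because the annotation is just a dictionary lookup, it fits on electrically and computationally constrained platforms, while downstream systems can still resolve the URI to full semantics.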
Castaneda, Christian; Nalley, Kip; Mannion, Ciaran; Bhattacharyya, Pritish; Blake, Patrick; Pecora, Andrew; Goy, Andre; Suh, K Stephen
2015-01-01
As research laboratories and clinics collaborate to achieve precision medicine, both communities are required to understand mandated electronic health/medical record (EHR/EMR) initiatives that will be fully implemented in all clinics in the United States by 2015. Stakeholders will need to evaluate current record keeping practices and optimize and standardize methodologies to capture nearly all information in digital format. Collaborative efforts from academic and industry sectors are crucial to achieving higher efficacy in patient care while minimizing costs. Currently existing digitized data and information are present in multiple formats and are largely unstructured. In the absence of a universally accepted management system, departments and institutions continue to generate silos of information. As a result, invaluable and newly discovered knowledge is difficult to access. To accelerate biomedical research and reduce healthcare costs, clinical and bioinformatics systems must employ common data elements to create structured annotation forms enabling laboratories and clinics to capture sharable data in real time. Conversion of these datasets to knowable information should be a routine institutionalized process. New scientific knowledge and clinical discoveries can be shared via integrated knowledge environments defined by flexible data models and extensive use of standards, ontologies, vocabularies, and thesauri. In the clinical setting, aggregated knowledge must be displayed in user-friendly formats so that physicians, non-technical laboratory personnel, nurses, data/research coordinators, and end-users can enter data, access information, and understand the output. The effort to connect astronomical numbers of data points, including '-omics'-based molecular data, individual genome sequences, experimental data, patient clinical phenotypes, and follow-up data is a monumental task. 
Roadblocks to this vision of integration and interoperability include ethical, legal, and logistical concerns. Ensuring data security and protection of patient rights while simultaneously facilitating standardization is paramount to maintaining public support. The capabilities of supercomputing need to be applied strategically. A standardized, methodological implementation must be applied to developed artificial intelligence systems with the ability to integrate data and information into clinically relevant knowledge. Ultimately, the integration of bioinformatics and clinical data in a clinical decision support system promises precision medicine and cost effective and personalized patient care.
Harmonised information exchange between decentralised food composition database systems.
Pakkala, H; Christensen, T; de Victoria, I Martínez; Presser, K; Kadvan, A
2010-11-01
The main aim of the European Food Information Resource (EuroFIR) project is to develop and disseminate a comprehensive, coherent and validated data bank for the distribution of food composition data (FCD). This can only be accomplished by harmonising food description and data documentation and by the use of standardised thesauri. The data bank is implemented through a network of local FCD storages (usually national) under the control and responsibility of the local (national) EuroFIR partner. The implementation of the system based on the EuroFIR specifications is under development. The data interchange happens through the EuroFIR Web Services interface, allowing the partners to implement their system using methods and software suitable for the local computer environment. The implementation uses common international standards, such as Simple Object Access Protocol, Web Service Description Language and Extensible Markup Language (XML). A specifically constructed EuroFIR search facility (eSearch) was designed for end users. The EuroFIR eSearch facility compiles queries using a specifically designed Food Data Query Language and sends a request to those network nodes linked to the EuroFIR Web Services that will most likely have the requested information. The retrieved FCD are compiled into a specifically designed data interchange format (the EuroFIR Food Data Transport Package) in XML, which is sent back to the EuroFIR eSearch facility as the query response. The same request-response operation happens in all the nodes that have been selected in the EuroFIR eSearch facility for a certain task. Finally, the FCD are combined by the EuroFIR eSearch facility and delivered to the food compiler. The implementation of FCD interchange using decentralised computer systems instead of traditional data-centre models has several advantages. First of all, the local partners have more control over their FCD, which will increase commitment and improve quality. 
Second, a multicentred solution is more economically viable than the creation of a centralised data bank, because of the lack of national political support for multinational systems.
Centrality-based Selection of Semantic Resources for Geosciences
NASA Astrophysics Data System (ADS)
Cerba, Otakar; Jedlicka, Karel
2017-04-01
Semantic questions arise in almost all disciplines dealing with geographic data and information, because relevant semantics is crucial for any form of communication and interaction among humans as well as among machines. However, the existence of such a large number of different semantic resources (such as various thesauri, controlled vocabularies, knowledge bases or ontologies) makes the process of implementing semantics much more difficult and complicates the use of its advantages, because in many cases users are not able to find the most suitable resource for their purposes. The research presented in this paper introduces a methodology, based on an analysis of identity relations in the Linked Data space (which covers a majority of semantic resources), for finding a suitable source of semantic information. Identity links interconnect representations of an object or concept across various semantic resources. This type of relation is therefore considered crucial from the Linked Data perspective, because these links provide new additional information, including different views of one concept based on different cultural or regional aspects (the so-called social role of Linked Data). For these reasons, one reasonable criterion for assessing semantic resources for almost all domains, including the geosciences, is their position in the network of interconnected semantic resources and their level of linking to other knowledge bases and similar products. The presented methodology searches for mutual connections between various instances of one concept using the "follow your nose" approach. The extracted data on interconnections between semantic resources are arranged into directed graphs and processed with various metrics patterned on centrality computation (degree, closeness or betweenness centrality).
Semantic resources recommended by the research could be used for providing semantically described keywords for metadata records or as names of items in data models. Such an approach enables much more efficient data harmonization, integration, sharing and exploitation. * * * * This publication was supported by the project LO1506 of the Czech Ministry of Education, Youth and Sports. This publication was supported by project Data-Driven Bioeconomy (DataBio) from the ICT-15-2016-2017, Big Data PPP call.
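The centrality-based ranking described above can be sketched with a plain in-degree computation over a directed graph of identity links. The edges below are invented toy data; a real analysis would harvest owl:sameAs/skos:exactMatch links from the Linked Data cloud.

```python
# Directed identity links between semantic resources (toy data):
# an edge (A, B) means resource A links a concept to resource B.
edges = [("GeoNames", "DBpedia"), ("Wikidata", "DBpedia"),
         ("GEMET", "DBpedia"), ("GeoNames", "Wikidata")]

nodes = {n for edge in edges for n in edge}
in_degree = {n: sum(1 for _, tgt in edges if tgt == n) for n in nodes}

# Normalised in-degree centrality: fraction of other nodes linking in
centrality = {n: d / (len(nodes) - 1) for n, d in in_degree.items()}
ranked = sorted(centrality, key=centrality.get, reverse=True)
print(ranked[0])  # the resource receiving the most identity links
```

Closeness and betweenness centrality, also mentioned in the abstract, would be computed over the same graph; in-degree is shown here because it is the simplest of the three and already separates hub resources from peripheral ones.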
NASA Astrophysics Data System (ADS)
Cialone, Claudia; Stock, Kristin
2010-05-01
EuroGEOSS is a European Commission funded project. It aims to improve scientific understanding of the complex mechanisms driving changes affecting our planet, and to identify and establish interoperable arrangements between environmental information systems. These systems would be sustained and operated by organizations with a clear mandate and resources, and made available following the specifications of existing frameworks such as GEOSS (the Global Earth Observation System of Systems)1 and INSPIRE (the Infrastructure for Spatial Information in the European Community)2. The EuroGEOSS project's infrastructure focuses on three thematic areas: forestry, drought and biodiversity. One of the important activities in the project is the retrieval, parsing and harmonization of the large amount of heterogeneous environmental data available at local, regional and global levels across these strategic areas. The challenge is to render these data semantically and technically interoperable in a simple way. An initial step in achieving this semantic and technical interoperability involves the selection of appropriate classification schemes (for example, thesauri, ontologies and controlled vocabularies) to describe the resources in the EuroGEOSS framework. These classifications become a crucial part of the interoperable framework scaffolding because they allow data providers to describe their resources, and thus support resource discovery, execution and orchestration of varying levels of complexity. However, given the diverse range of environmental thesauri, controlled vocabularies and ontologies, and the large number of resources provided by project participants, the selection of appropriate classification schemes involves a number of considerations. First of all, there is the semantic difficulty of selecting classification schemes that contain concepts relevant to each thematic area.
Secondly, EuroGEOSS is intended to accommodate a number of existing environmental projects (for example, GEOSS and INSPIRE). This requirement imposes constraints on the selection. Thirdly, the selected classification scheme or group of schemes (if more than one) must be capable of alignment (establishing different kinds of mappings between concepts, hence preserving the original knowledge schemes intact) or merging (the creation of another unique ontology from the original ontological sources) (Gómez-Pérez et al., 2004). Last but not least, there is the issue of including multilingual schemes that are based on free, open (non-proprietary) standards. Using these selection criteria, we aim to support open and convenient data discovery and exchange for users who speak different languages (particularly European ones, given the broad scope of EuroGEOSS). In order to support the project, we have developed a solution that employs two classification schemes: the Societal Benefit Areas (SBAs)3, the upper-level environmental categorization developed for the GEOSS project, and the GEneral Multilingual Environmental Thesaurus (GEMET)4, a general environmental thesaurus whose conceptual structure has already been integrated with the spatial data themes proposed by the INSPIRE project. The latter seems to provide the spatial data keywords relevant to the INSPIRE Directive (JRC, 2008). In this way, we provide users with a basic set of concepts to support resource description and discovery in the thematic areas while supporting the requirements of INSPIRE and GEOSS. Furthermore, the use of only two classification schemes, together with the fact that the SBAs are very general categories while GEMET includes much more detailed, yet still top-level, concepts, makes alignment an achievable task. Alignment was selected over merging because it leaves the existing classification schemes intact and requires only a simple activity of defining mappings from GEMET to the SBAs.
In order to accomplish this task, we are developing a simple, automated, open-source application to assist thematic experts in defining the mappings between concepts in the two classification schemes. The application will then generate SKOS mappings (exactMatch, closeMatch, broadMatch, narrowMatch, relatedMatch) based on the thematic experts' selections between the concepts in GEMET and the SBAs (including both the general Societal Benefit Areas and their subcategories). Once these mappings are defined and the SKOS files generated, resource providers will be able to select concepts from either GEMET or the SBAs (or a mixture) to describe their resources, and discovery approaches will support selection of concepts from either classification scheme, also returning results classified using the other scheme. While the focus of our work has been on the SBAs and GEMET, we also plan to provide a method for resource providers to further extend the semantic infrastructure by defining alignments to new classification schemes if these are required to support particular specialized thematic areas that are not covered by GEMET. In this way, the approach is flexible and suited to the general scope of EuroGEOSS, allowing specialists to increase at will the level of semantic quality and specificity of data in the initial infrastructural skeleton of the project. References: Joint Research Centre (JRC), 2008. INSPIRE Metadata Editor User Guide. Gómez-Pérez A., Fernández-López M., Corcho O. Ontological Engineering: With Examples from the Areas of Knowledge Management, e-Commerce and the Semantic Web. Springer: London, 2004.
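The mapping-generation step described above, turning expert selections into SKOS mapping triples, can be sketched as follows. The concept URIs are illustrative placeholders, not real GEMET or SBA identifiers, and the expert selections are invented.

```python
# Emit SKOS mapping triples (N-Triples style) from expert selections.
# URIs below are placeholders for real GEMET and SBA concept URIs.
SKOS = "http://www.w3.org/2004/02/skos/core#"

# (source concept, SKOS mapping property, target concept)
selections = [
    ("http://example.org/gemet/forest", "broadMatch",
     "http://example.org/sba/ecosystems"),
    ("http://example.org/gemet/drought", "relatedMatch",
     "http://example.org/sba/disasters"),
]

triples = [f"<{s}> <{SKOS}{p}> <{o}> ." for s, p, o in selections]
print("\n".join(triples))
```

Because the output is plain SKOS, either classification scheme can be left intact (alignment rather than merging), and discovery tools can follow the mappings in both directions.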
The XML Metadata Editor of GFZ Data Services
NASA Astrophysics Data System (ADS)
Ulbricht, Damian; Elger, Kirsten; Tesei, Telemaco; Trippanera, Daniele
2017-04-01
Following the FAIR data principles, research data should be Findable, Accessible, Interoperable and Reusable. Publishing data under these principles requires assigning persistent identifiers to the data and generating rich machine-actionable metadata. To increase interoperability, metadata should include shared vocabularies and crosslink the newly published (meta)data and related material. However, structured metadata formats tend to be complex and are not intended to be generated by individual scientists. Software solutions are needed that support scientists in providing metadata describing their data. To facilitate the data publication activities of 'GFZ Data Services', we programmed an XML metadata editor that assists scientists in creating metadata in different schemata popular in the earth sciences (ISO19115, DIF, DataCite), while at the same time being usable by and understandable for scientists. Emphasis is placed on removing barriers: in particular, the editor is publicly available on the internet without registration [1], and scientists are not asked to provide information that may be generated automatically (e.g. the URL of a specific licence or the contact information of the metadata distributor). Metadata are stored in browser cookies and a copy can be saved to the local hard disk. To improve usability, form fields are translated into the language of scientists, e.g. the 'creators' of the DataCite schema are called 'authors'. To assist in filling in the form, we make use of drop-down menus for small vocabulary lists and offer a search facility for large thesauri. Explanations of form fields and definitions of vocabulary terms are provided in pop-up windows, and full documentation is available for download via the help menu. In addition, multiple geospatial references can be entered via an interactive mapping tool, which helps to minimize problems with different conventions for providing latitudes and longitudes.
Currently, we are extending the metadata editor to be reused to generate metadata for data discovery and contextual metadata developed by the 'Multi-scale Laboratories' Thematic Core Service of the European Plate Observing System (EPOS-IP). The Editor will be used to build a common repository of a large variety of geological and geophysical datasets produced by multidisciplinary laboratories throughout Europe, thus contributing to a significant step toward the integration and accessibility of earth science data. This presentation will introduce the metadata editor and show the adjustments made for EPOS-IP. [1] http://dataservices.gfz-potsdam.de/panmetaworks/metaedit
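The translation between the scientist-facing form and a schema-conformant record, for example "authors" in the form becoming "creators" in DataCite, can be sketched as follows. This is a simplified illustration: only a few kernel elements are shown, and the namespaces and mandatory fields of the full DataCite schema are omitted.

```python
import xml.etree.ElementTree as ET

# Form input as the editor's user-facing fields might collect it
form = {"authors": ["Ulbricht, Damian"],
        "title": "Example dataset",
        "year": "2017"}

# Serialise into a (heavily simplified) DataCite-style record:
# the form says "authors", the schema says "creators".
res = ET.Element("resource")
creators = ET.SubElement(res, "creators")
for name in form["authors"]:
    creator = ET.SubElement(creators, "creator")
    ET.SubElement(creator, "creatorName").text = name
titles = ET.SubElement(res, "titles")
ET.SubElement(titles, "title").text = form["title"]
ET.SubElement(res, "publicationYear").text = form["year"]

xml_out = ET.tostring(res, encoding="unicode")
print(xml_out)
```

Keeping the form model separate from the serialisation like this is also what lets one editor emit several target schemata (ISO19115, DIF, DataCite) from the same user input.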
LAILAPS-QSM: A RESTful API and JAVA library for semantic query suggestions.
Chen, Jinbo; Scholz, Uwe; Zhou, Ruonan; Lange, Matthias
2018-03-01
In order to access and filter the content of life-science databases, full text search is a widely applied query interface, but its high flexibility and intuitiveness are paid for with potentially imprecise and incomplete query results. To reduce this drawback, query assistance systems suggest those combinations of keywords with the highest potential to match most of the relevant data records. Widespread approaches are syntactic query corrections that avoid misspellings and support expansion of words by suffixes and prefixes. Synonym expansion approaches apply thesauri, ontologies, and query logs; all need laborious curation and maintenance, and access to query logs is in general restricted. Approaches that infer related queries from a user's profile (research field, geographic location, co-authorship, affiliation, etc.) require the user's registration and public accessibility, which contradicts privacy concerns. To overcome these drawbacks, we implemented LAILAPS-QSM, a machine learning approach that reconstructs possible linguistic contexts of a given keyword query. The context is inferred from the text records stored in the databases to be queried, or extracted from PubMed abstracts and UniProt data for general-purpose query suggestion. The supplied tool suite enables the pre-processing of these text records and the computation of customized distributed word vectors, which are then used to suggest alternative keyword queries. An evaluation of query suggestion quality was performed for plant science use cases. Locally present experts enabled a cost-efficient quality assessment in the categories trait, biological entity, taxonomy, affiliation, and metabolic function, performed using ontology term similarities. The mean information content similarity of LAILAPS-QSM suggestions for 15 representative queries is 0.70, with 34% scoring above 0.80.
In comparison, the information content similarity for human expert made query suggestions is 0.90. The software is either available as tool set to build and train dedicated query suggestion services or as already trained general purpose RESTful web service. The service uses open interfaces to be seamless embeddable into database frontends. The JAVA implementation uses highly optimized data structures and streamlined code to provide fast and scalable response for web service calls. The source code of LAILAPS-QSM is available under GNU General Public License version 2 in Bitbucket GIT repository: https://bitbucket.org/ipk_bit_team/bioescorte-suggestion.
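Suggesting alternative keywords from distributed word vectors, as described above, amounts to ranking candidate terms by cosine similarity to the query term. The 3-dimensional vectors below are toy values; a real system derives high-dimensional vectors from corpus text such as PubMed abstracts.

```python
import math

# Toy word vectors; in practice these come from training on text
# records of the target databases.
vectors = {
    "drought":   [0.9, 0.1, 0.0],
    "stress":    [0.8, 0.2, 0.1],
    "yield":     [0.1, 0.9, 0.2],
    "tolerance": [0.7, 0.3, 0.2],
}

def norm(a):
    return math.sqrt(sum(x * x for x in a))

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))

def suggest(term, k=2):
    """Return the k vocabulary terms most similar to `term`."""
    q = vectors[term]
    others = [t for t in vectors if t != term]
    return sorted(others, key=lambda t: cosine(q, vectors[t]), reverse=True)[:k]

print(suggest("drought"))  # ['stress', 'tolerance']
```

Unlike thesaurus-based synonym expansion, this approach needs no curated resource or query log, only the text of the databases themselves, which is the design point the abstract emphasises.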
Su, Yan; Andrews, James; Huang, Hong; Wang, Yue; Kong, Liangliang; Cannon, Peter; Xu, Ping
2016-05-23
PubMed is a widely used database for scientists to find biomedical-related literature. Due to the complexity of the selected research subject and its interdisciplinary nature, as well as the exponential growth in the number of disparate pieces of biomedical literature, it is an overwhelming challenge for scientists to define the right search strategies and quickly locate all related information. Specialized subsets and groupings of controlled vocabularies, such as Medical Subject Headings (MeSH), can enhance information retrieval in specialized domains, such as stem cell research. There is a need to develop effective search strategies and convenient solutions for knowledge organization in stem cell research. The understanding of the interrelationships between these MeSH terms also facilitates the building of knowledge organization systems in related subject fields. This study collected empirical data for MeSH-related terms from stem cell literature and developed a novel approach that uses both automation and expert-selection to create a set of terms that supports enhanced retrieval. The selected MeSH terms were reconstructed into a classified thesaurus that can guide researchers towards a successful search and knowledge organization of stem cell literature. First, 4253 MeSH terms were harvested from a sample of 5527 stem cell related research papers from the PubMed database. Next, unrelated terms were filtered out based on term frequency and specificity. Precision and recall measures were used to help identify additional valuable terms, which were mostly non-MeSH terms. The study identified 15 terms that specifically referred to stem cell research for information retrieval, which would yield a higher precision (97.7 %) and recall (94.4 %) rates in comparison to other approaches. In addition, 128 root MeSH terms were selected to conduct knowledge organization of stem cell research in categories of anatomy, disease, and others. 
This study presented a novel strategy and procedure to reengineer term selections of the MeSH thesaurus for literature retrieval and knowledge organization using stem cell research as a case. It could help scientists to select their own search terms and build up a thesaurus-based knowledge organization system in interested and interdisciplinary research subject areas.
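The precision and recall screening used above to validate candidate retrieval terms can be sketched as a set comparison. The document sets are invented toy data, not the study's corpus.

```python
# Gold-standard relevant papers vs. papers retrieved by a candidate
# search term (toy identifiers).
relevant = {"d1", "d2", "d3", "d4"}
retrieved = {"d1", "d2", "d3", "d5"}

tp = len(relevant & retrieved)          # true positives
precision = tp / len(retrieved)         # fraction of retrieved that is relevant
recall = tp / len(relevant)             # fraction of relevant that is retrieved
print(precision, recall)  # 0.75 0.75
```

Scoring each candidate term this way against a reference set is how a selection like the study's 15 high-precision retrieval terms can be justified quantitatively.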
The NERC Vocabulary Server: Version 2.0
NASA Astrophysics Data System (ADS)
Leadbetter, A.; Lowry, R.; Clements, O.
2012-04-01
The NERC Vocabulary Server (NVS) has been used to publish controlled vocabularies of terms relevant to the marine environmental sciences domain since 2006 (version 0), with version 1 being introduced in 2007. It has been used for:
• metadata mark-up with verifiable content
• populating dynamic drop-down lists
• semantic cross-walk between metadata schemata
• so-called smart search
• the semantic enablement of Open Geospatial Consortium Web Processing Services
in projects including the NERC Data Grid, SeaDataNet, Geo-Seas, and the European Marine Observation and Data Network (EMODnet). The NVS is based on the Simple Knowledge Organization System (SKOS) model, and following a version change for SKOS in 2009 there was a desire to upgrade the NVS to incorporate the changes in this standard. SKOS is based on the "concept", which it defines as a "unit of thought", that is, an idea or notion such as "oil spill". The latest version of SKOS introduces the ability to aggregate concepts in both collections and schemes. The design of version 2 of the NVS uses both types of aggregation: schemes for the discovery of content through hierarchical thesauri, and collections for the publication and addressing of content.
Other desired changes from version 1 of the NVS included:
• the removal of the potential for multiple Uniform Resource Names for the same concept, to ensure consistent identification of concepts
• the addition of content and technical governance information in the payload documents, to provide an audit trail to users of NVS content
• the removal of XML snippets from concept definitions, in order to correctly validate XML serializations of the SKOS
• the addition of the ability to map into external knowledge organization systems, in order to extend the knowledge base
• a more truly RESTful approach to URL access to the NVS, to make the development of applications on top of the NVS easier
• support for multiple human languages, to increase the user base of the NVS
Version 2 of the NVS underpins the semantic layer for the Open Service Network for Marine Environmental Data (NETMAR) project, funded by the European Commission under the Seventh Framework Programme. Here we present the results of upgrading the NVS from version 1 to version 2, and show applications which have been built on top of the NVS using its Application Programming Interface, including a demonstration version of a SPARQL interface.
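A client of a SKOS vocabulary service with a SPARQL interface, such as the demonstration interface mentioned above, might issue a query like the following to list the concepts in one collection. This is a hedged sketch: the collection URI is a placeholder, not a real NVS identifier, and the query is shown as a string rather than sent to a live endpoint.

```python
# Example SPARQL query against a SKOS collection. The collection URI
# below is a placeholder; a real NVS collection would have its own
# persistent URI.
query = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label
WHERE {
  <http://example.org/collection/EXX/> skos:member ?concept .
  ?concept skos:prefLabel ?label .
}
"""
print(query.strip().splitlines()[0])
```

The same SKOS properties support the two aggregation styles described in the abstract: skos:member for collections (addressing and publication) and skos:inScheme with broader/narrower links for schemes (hierarchical discovery).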
Konstantinidis, S; Fernandez-Luque, L; Bamidis, P; Karlsen, R
2013-01-01
An increasing amount of health education resources for patients and professionals are distributed via social media channels. For example, thousands of health education videos are disseminated via YouTube. Often, tags are assigned by the disseminator. However, the lack of standardized terminologies in those tags and the presence of misleading videos make it particularly hard to retrieve relevant videos. The objectives of this study were to: i) identify the use of a standardized medical thesaurus (SNOMED CT) in the tags of YouTube health videos from preselected YouTube channels, and demonstrate an information technology (IT) architecture for treating the tags of these health (video) resources; ii) investigate the relative percentage of the tags used that relate to SNOMED CT terms (as such resources may play a key role in educating professionals and patients, the use of standardized vocabularies may facilitate their sharing); and iii) demonstrate how such resources may be properly exploited within the new generation of semantically enriched content or learning management systems that allow for knowledge expansion through the use of linked medical data and numerous literature resources also described through the same vocabularies. We implemented a video portal integrating videos from 500 US hospital channels. The portal integrated 4,307 YouTube videos regarding surgery, as described by 64,367 tags. BioPortal REST services were used within our portal to match SNOMED CT terms with YouTube tags by both exact and non-exact matching. The whole architecture was complemented with a mechanism to enrich the retrieved video resources with other educational material residing in other repositories, following contemporary semantic web advances in the form of Linked Open Data (LOD) principles.
The average percentage of YouTube tags that were expressed using SNOMED CT terms was about 22.5%; one third of the YouTube tags per video contained a SNOMED CT term under a loose search, a proportion that dropped to one tenth in the case of exact match. Retrieved videos were then linked further to other resources by using LOD-compliant systems. Such results were exemplified with the systems and technologies used in the mEducator EC-funded project. YouTube health videos can be searched for and retrieved using SNOMED CT terms, with a high likelihood of identifying the health videos that users want based on their search criteria. Although tagging of this information with SNOMED CT terms may vary, its availability and linked data capacity opens the door to new studies on personalized retrieval of content and on linking with other knowledge through linked medical data and semantic advances in (learning) content management systems.
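The exact versus loose (non-exact) tag-matching idea can be sketched in a few lines. This toy version stands in for the BioPortal REST services the portal relied on, using a small in-memory term set; all terms and tags below are invented examples, not real SNOMED CT content.

```python
# Hypothetical stand-in for a SNOMED CT term lookup (the real portal
# queried BioPortal REST services instead of a local set).
SNOMED_TERMS = {"appendectomy", "laparoscopy", "hernia repair"}

def exact_matches(tags):
    """Tags that are SNOMED CT terms verbatim (case-insensitive)."""
    return {t for t in tags if t.lower() in SNOMED_TERMS}

def loose_matches(tags):
    """Tags that contain, or are contained in, a SNOMED CT term."""
    hits = set()
    for t in tags:
        tl = t.lower()
        if any(tl in term or term in tl for term in SNOMED_TERMS):
            hits.add(t)
    return hits

# Invented tag list for a single video.
video_tags = ["Appendectomy", "surgery video", "laparoscopy training", "hospital"]
exact = exact_matches(video_tags)
loose = loose_matches(video_tags)
coverage = len(loose) / len(video_tags)  # per-video loose-match proportion
```

Aggregating `coverage` over all videos would yield the kind of per-video proportions reported above, with the loose figure necessarily at least as high as the exact one.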
Life Sciences Data Archives (LSDA) in the Post-Shuttle Era
NASA Technical Reports Server (NTRS)
Fitts, Mary A.; Johnson-Throop, Kathy; Havelka, Jacque; Thomas, Diedre
2010-01-01
Now, more than ever before, NASA is realizing the value and importance of its intellectual assets. Principles of knowledge management (the systematic use and reuse of information, experience, and expertise to achieve a specific goal) are being applied throughout the agency. LSDA is also applying these solutions, which rely on a combination of content and collaboration technologies, to enable research teams to create, capture, share, and harness knowledge to do the things they do well, even better. In the early days of spaceflight, space life sciences data were collected and stored in numerous databases, formats, media types and geographical locations. These data were largely unknown or unavailable to the research community. The Biomedical Informatics and Health Care Systems Branch of the Space Life Sciences Directorate at JSC and the Data Archive Project at ARC, with funding from the Human Research Program through the Exploration Medical Capability Element, are addressing this need through the systematic population of the Life Sciences Data Archive. This project constitutes a formal system for the acquisition, archival and distribution of data for HRP-related experiments and investigations. The general goal of the archive is to acquire, preserve, and distribute these data and to be responsive to inquiries from the science communities. Information about experiments and data, as well as non-attributable human data and data from other species, is available on our public Web site http://lsda.jsc.nasa.gov. The Web site also includes a repository for biospecimens and a utilization process. NASA has undertaken an initiative to develop a Shuttle Data Archive repository. The Shuttle program is nearing its end in 2010, and it is critical that the medical and research data related to the program be captured, retained, and usable for research, lessons learned, and future mission planning.
Communities of practice are groups of people who share a concern or a passion for something they do, and who learn how to do it better as they interact regularly. LSDA works with the HRP community of practice to ensure that the relevant research and data they need are preserved in the LSDA repository. An evidence-based approach to risk management is required in space life sciences, and evidence changes over time. LSDA has a pilot project with Collexis, a new type of Web-based search engine. Collexis differentiates itself from full-text search engines by making use of thesauri for information retrieval. Its high-quality search is based on semantics defined in a life sciences ontology. Additionally, Collexis' matching technology is unique, allowing discovery of partially matching documents: users do not have to construct a complicated (Boolean) search query, but can simply enter a free-text search without the risk of getting "no results". Collexis may address these issues by virtue of its retrieval and discovery capabilities across multiple repositories.
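The thesaurus-driven, partial-match retrieval described above can be illustrated with a minimal sketch. This is not Collexis' proprietary technology; it is a toy concept-based ranker over an invented two-concept thesaurus and invented document titles, showing why a free-text query still returns partially matching documents instead of "no results".

```python
# Hypothetical miniature thesaurus: concept -> synonym set.
THESAURUS = {
    "microgravity": {"microgravity", "weightlessness", "zero gravity"},
    "bone loss": {"bone loss", "bone demineralization", "osteopenia"},
}

def concepts_in(text):
    """Map free text onto thesaurus concepts via synonym lookup."""
    text = text.lower()
    return {c for c, syns in THESAURUS.items() if any(s in text for s in syns)}

def rank(query, docs):
    """Rank documents by shared concepts; partial overlaps still score."""
    q = concepts_in(query)
    scored = [(len(q & concepts_in(d)), d) for d in docs]
    return [d for score, d in sorted(scored, reverse=True) if score > 0]

docs = [
    "Osteopenia observed after long-duration flight",
    "Weightlessness and bone demineralization in crew members",
    "Shuttle launch schedule",
]
hits = rank("effects of microgravity on bone loss", docs)
```

The second document matches both query concepts through synonyms alone, the first matches one concept partially, and the unrelated document is dropped rather than forcing an empty result set.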
The NERC Vocabulary Server: Version 2.0
NASA Astrophysics Data System (ADS)
Leadbetter, A. M.; Lowry, R. K.
2012-12-01
The Natural Environment Research Council (NERC) Vocabulary Server (NVS) has been used to publish controlled vocabularies of terms relevant to marine environmental sciences since 2006 (version 0) with version 1 being introduced in 2007. It has been used for - metadata mark-up with verifiable content - populating dynamic drop down lists - semantic cross-walk between metadata schemata - so-called smart search - and the semantic enablement of Open Geospatial Consortium (OGC) Web Processing Services in the NERC Data Grid and the European Commission SeaDataNet, Geo-Seas, and European Marine Observation and Data Network (EMODnet) projects. The NVS is based on the Simple Knowledge Organization System (SKOS) model. SKOS is based on the "concept", which it defines as a "unit of thought", that is an idea or notion such as "oil spill". Following a version change for SKOS in 2009 there was a desire to upgrade the NVS to incorporate the changes. This version of SKOS introduces the ability to aggregate concepts in both collections and schemes. The design of version 2 of the NVS uses both types of aggregation: schemes for the discovery of content through hierarchical thesauri and collections for the publication and addressing of content. 
Other desired changes from version 1 of the NVS included: - the removal of the potential for multiple identifiers for the same concept to ensure consistent addressing of concepts - the addition of content and technical governance information in the payload documents to provide an audit trail to users of NVS content - the removal of XML snippets from concept definitions in order to correctly validate XML serializations of the SKOS - the addition of the ability to map into external knowledge organization systems in order to extend the knowledge base - a more truly RESTful approach to URL access to the NVS to make the development of applications on top of the NVS easier - and support for multiple human languages to increase the user base of the NVS. Version 2 of the NVS (NVS2.0) underpins the semantic layer for the Open Service Network for Marine Environmental Data (NETMAR) project, funded by the European Commission under the Seventh Framework Programme. Within NETMAR, NVS2.0 has been used for: - semantic validation of inputs to chained OGC Web Processing Services - smart discovery of data and services - integration of data from distributed nodes of the International Coastal Atlas Network. Since its deployment, NVS2.0 has been adopted within the European SeaDataNet community's software products, which has significantly increased the usage of the NVS2.0 Application Programming Interface (API), as illustrated in Table 1. Here we present the results of upgrading the NVS to version 2 and show applications which have been built on top of the NVS2.0 API, including a SPARQL endpoint and a hierarchical catalogue of oceanographic hardware. Table 1. NVS2.0 API usage by month from 467 unique IP addresses.
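The SKOS aggregation design described above, single-identifier concepts, collections for publication and addressing, schemes for hierarchical discovery, can be sketched as plain data structures. The URIs and labels below are illustrative placeholders, not real NVS identifiers.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    uri: str                           # one identifier per concept: the
    pref_label: str                    # consistent-addressing goal above
    broader: list = field(default_factory=list)   # hierarchy links

@dataclass
class Collection:                      # SKOS collection: aggregation used
    uri: str                           # for publication and addressing
    members: list = field(default_factory=list)

@dataclass
class Scheme:                          # SKOS concept scheme: aggregation
    uri: str                           # used for hierarchical discovery
    top_concepts: list = field(default_factory=list)

# Illustrative content, echoing the "oil spill" unit-of-thought example.
spill = Concept("http://example.org/concept/1", "oil spill")
hazard = Concept("http://example.org/concept/2", "marine hazard")
spill.broader.append(hazard)           # "oil spill" narrower than "marine hazard"

coll = Collection("http://example.org/collection/hazards", [spill, hazard])
scheme = Scheme("http://example.org/scheme/environment", [hazard])
```

A thesaurus browser would walk `top_concepts` and `broader`/narrower links within a scheme, while resolvable collection URIs serve the same concepts for addressing, the two aggregation roles SKOS 2009 made available.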
Different Categories of Astronomical Heritage: Issues and Challenges
NASA Astrophysics Data System (ADS)
Ruggles, Clive
2012-09-01
Since 2008 the AWHWG has, on behalf of the IAU, been working with UNESCO and its advisory bodies to help identify, safeguard and promote cultural properties relating to astronomy and, where possible, to try to facilitate the eventual nomination of key astronomical heritage sites onto the World Heritage List. Unfortunately, the World Heritage Convention only covers fixed sites (i.e., the tangible immovable heritage of astronomy), and a key question for the UNESCO-IAU Astronomy and World Heritage Initiative (AWHI) is the extent to which the tangible moveable and intangible heritage of astronomy (e.g. moveable instruments; ideas and theories) influences the assessment of the tangible immovable heritage. Clearly, in an ideal world we should be concerned not only with tangible immovable heritage but, to quote the AWHWG's own Terms of Reference, ``to help ensure that cultural properties and artefacts significant in the development of astronomy, together with the intangible heritage of astronomy, are duly studied, protected and maintained, both for the greater benefit of humankind and to the potential benefit of future historical research''. With this in mind, the IAU/INAF symposium on ``Astronomy and its Instruments before and after Galileo'' held in Venice in Sep-Oct 2009 recommended that urgent steps should be taken 1. to sensitise astronomers and the general public, and particularly observatory directors and others with direct influence and control over astronomical resources, to the importance of identifying, protecting and preserving the various material products of astronomical research and discovery that already have, or have significant potential to acquire, universal value (N.B. national or regional interests and concerns have no relevance in the assessment of ``universal value'', which, by definition, extends beyond cultural boundaries and, by reasonable expectation, down the generations into the future);
2. to identify modes of interconnectivity between different forms of astronomical heritage, including its intangible aspects, that will help in the development of more integrated approaches to identification and cataloguing, protection and preservation; and 3. to increase global awareness of regional, national and local initiatives relating to astronomical heritage in all its forms. In pursuance of these aims, the meeting also recommended that the AWHWG, working in collaboration with the WGs on Astronomical Instruments and Archives, and other bodies as appropriate, should develop the following additional projects: 1. to establish guidelines to help in the identification and safeguarding of tangible and intangible astronomical heritage in all its forms; 2. to gather examples of existing best practice, and to make these available as case studies on their website; and 3. to develop the website of the Astronomy and World Heritage Initiative (AWHI) as a portal to existing on-line catalogues and thesauri. It also recommended that the WGs should work together to: 1. formulate recommendations about the ways in which links and common approaches should be developed in the future; and 2. organise a meeting of international experts in the historical and heritage aspects of astronomical structures, instruments, and archives, focussed specifically upon the task of developing more integrated approaches to identification and cataloguing, protection and preservation. This joint session will attempt to make headway on as many as possible of these issues. In this opening talk I will attempt to lay out some of the main challenges that we face, and outline what we hope to achieve in this session.
Modelling and approaching pragmatic interoperability of distributed geoscience data
NASA Astrophysics Data System (ADS)
Ma, Xiaogang
2010-05-01
Interoperability of geodata, which is essential for sharing information and discovering insights within a cyberinfrastructure, is receiving increasing attention. A key requirement of interoperability in the context of geodata sharing is that data provided by local sources can be accessed, decoded, understood and appropriately used by external users. Various researchers have identified four levels of data interoperability issues: system, syntax, schematics and semantics, which relate respectively to the platform, encoding, structure and meaning of geodata. Ontology-driven approaches to the schematic and semantic interoperability of geodata have been studied extensively in the last decade. Ontologies come in different types (e.g. top-level ontologies, domain ontologies and application ontologies) and display forms (e.g. glossaries, thesauri, conceptual schemas and logical theories). Many geodata providers maintain their own local application ontologies in order to drive standardization in local databases. However, semantic heterogeneities often exist between these local ontologies, even when they are derived from equivalent disciplines. In contrast, common ontologies are being studied in different geoscience disciplines (e.g., NAMD, SWEET, etc.) as a standardization procedure to coordinate diverse local ontologies. Semantic mediation, e.g. mapping between local ontologies, or mapping local ontologies to common ontologies, has been studied as an effective way of achieving semantic interoperability between local ontologies and thus reconciling semantic heterogeneities in multi-source geodata. Nevertheless, confusion still exists in the research field of semantic interoperability. One problem is caused by eliminating elements of local pragmatic contexts in semantic mediation.
Compared to the context-independent nature of a common domain ontology, local application ontologies are closely related to elements (e.g., people, time, location, intention, procedure, consequence) of local pragmatic contexts and are thus context-dependent. Eliminating these elements inevitably leads to information loss in semantic mediation between local ontologies; correspondingly, the understanding and effect of exchanged data in a new context may differ from those in its original context. Another problem is the dilemma of how to find a balance between flexibility and standardization of local ontologies, because ontologies are not fixed but continuously evolving. It is commonly recognized that we cannot use a unified ontology to replace all local ontologies, because they are context-dependent and need flexibility. However, without the coordination of standards, freely developed local ontologies and databases will require an enormous amount of mediation work between them. Finding a balance between standardization and flexibility for evolving ontologies, in a practical sense, requires negotiations (i.e. conversations, agreements and collaborations) between different local pragmatic contexts. The purpose of this work is to set up a computer-friendly model representing local pragmatic contexts (i.e. geodata sources), and to propose a practical semantic negotiation procedure for approaching pragmatic interoperability between local pragmatic contexts. Information agents, objective facts and subjective dimensions are reviewed as elements of a conceptual model for representing pragmatic contexts. The author uses them to draw up a practical semantic negotiation procedure approaching pragmatic interoperability of distributed geodata.
The proposed conceptual model and semantic negotiation procedure were encoded with Description Logic, and then applied to analyze and manipulate semantic negotiations between different local ontologies within the National Mineral Resources Assessment (NMRA) project of China, which involves multi-source and multi-subject geodata sharing.
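The mediation step at the heart of the approach, mapping local terms through a shared common ontology, can be sketched minimally. This is not the author's Description Logic encoding; it is a toy translation between two invented local vocabularies (the geological terms below are illustrative, not drawn from the NMRA project).

```python
# Hypothetical mappings: local term -> common-ontology concept.
SOURCE_A = {"granodiorite body": "igneous_intrusion", "ore grade": "grade"}
SOURCE_B = {"intrusive unit": "igneous_intrusion", "assay value": "grade"}

def mediate(term, from_map, to_map):
    """Translate a local term via the common concept both sources map to."""
    concept = from_map.get(term)
    if concept is None:
        return None                    # no shared concept: mediation fails,
    for local_term, c in to_map.items():  # the information-loss case above
        if c == concept:
            return local_term
    return None

translated = mediate("granodiorite body", SOURCE_A, SOURCE_B)
```

The `None` branch is the crux of the pragmatic-interoperability argument: where a local term carries context that the common ontology does not capture, the mapping drops it, which is why the paper argues for negotiation between contexts rather than a single unified ontology.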