SPARQLGraph: a web-based platform for graphically querying biological Semantic Web databases.
Schweiger, Dominik; Trajanoski, Zlatko; Pabinger, Stephan
2014-08-15
The Semantic Web has established itself as a framework for using and sharing data across applications and database boundaries. Here, we present a web-based platform for graphically querying biological Semantic Web databases. SPARQLGraph offers an intuitive drag & drop query builder that converts the visual graph into a query and executes it on a public endpoint. The tool integrates several publicly available Semantic Web databases, including those of the recently released EBI RDF platform. Furthermore, it provides several predefined template queries for answering biological questions. Users can easily create and save new query graphs, which can also be shared with other researchers. This graphical way of creating queries for biological Semantic Web databases considerably improves usability, as it removes the requirement of knowing specific query languages and database structures. The system is freely available at http://sparqlgraph.i-med.ac.at.
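The core idea of a visual query builder can be sketched in a few lines: the drawn graph is a set of (subject, predicate, object) edges, and terms beginning with `?` are variables. This is an illustrative sketch only, not SPARQLGraph's actual code; the UniProt-style prefixes and edge labels are invented for the example.

```python
# Sketch: turn a visual node/edge graph into a SPARQL SELECT query.
# Terms starting with '?' are treated as query variables.

def graph_to_sparql(edges, prefixes=None):
    """edges: list of (subject, predicate, object) taken from the drawn graph."""
    lines = []
    for pfx, uri in (prefixes or {}).items():
        lines.append(f"PREFIX {pfx}: <{uri}>")
    # every '?'-prefixed node or object becomes a projected variable
    variables = sorted({t for e in edges for t in (e[0], e[2]) if t.startswith("?")})
    lines.append("SELECT " + " ".join(variables) + " WHERE {")
    for s, p, o in edges:
        lines.append(f"  {s} {p} {o} .")
    lines.append("}")
    return "\n".join(lines)

query = graph_to_sparql(
    [("?protein", "up:organism", "taxon:9606"),
     ("?protein", "up:encodedBy", "?gene")],
    prefixes={"up": "http://purl.uniprot.org/core/"},
)
print(query)
```

The resulting string can then be posted to any public SPARQL endpoint, which is all a tool like this needs from the server side.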
A novel adaptive Cuckoo search for optimal query plan generation.
Gomathi, Ramalingam; Sharmila, Dhandapani
2014-01-01
The day-by-day emergence of new web pages drives the development of semantic web technology. The World Wide Web Consortium (W3C) standard for storing semantic web data is the Resource Description Framework (RDF). To reduce the execution time of queries over large RDF graphs, evolving metaheuristic algorithms have become an alternative to traditional query optimization methods. This paper focuses on the problem of query optimization for semantic web data. We design an efficient algorithm, adaptive Cuckoo search (ACS), for querying large RDF graphs and generating optimal query plans. Experiments were conducted on different datasets with varying numbers of predicates. The experimental results show that the proposed approach yields significant improvements in query execution time. The efficiency of the algorithm was tested and the results are documented.
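The general shape of a cuckoo-search join-order optimizer can be illustrated with a toy version. This is not the paper's ACS algorithm: real cuckoo search uses Lévy flights over continuous encodings, whereas this sketch treats each "nest" as a permutation of triple patterns, a "flight" as a random swap, and uses an invented cost model (running intermediate result size with a fixed join shrink factor).

```python
import random

def plan_cost(order, sizes, join_factor=0.1):
    # toy cost model: sum of intermediate result sizes along the join order
    inter = sizes[order[0]]
    cost = inter
    for idx in order[1:]:
        inter = inter * sizes[idx] * join_factor
        cost += inter
    return cost

def cuckoo_join_order(sizes, nests=8, iters=200, abandon=0.25, seed=1):
    rng = random.Random(seed)
    n = len(sizes)
    pop = [rng.sample(range(n), n) for _ in range(nests)]
    for _ in range(iters):
        # one cuckoo lays a mutated egg in a random nest
        i = rng.randrange(nests)
        new = pop[i][:]
        a, b = rng.sample(range(n), 2)
        new[a], new[b] = new[b], new[a]          # "flight": one swap
        j = rng.randrange(nests)
        if plan_cost(new, sizes) < plan_cost(pop[j], sizes):
            pop[j] = new
        # abandon a fraction of the worst nests, keeping the elites
        pop.sort(key=lambda p: plan_cost(p, sizes))
        for k in range(int(nests * abandon)):
            pop[-1 - k] = rng.sample(range(n), n)
    return min(pop, key=lambda p: plan_cost(p, sizes))

sizes = [1000, 10, 500, 50]                      # estimated pattern cardinalities
best = cuckoo_join_order(sizes)
print(best, plan_cost(best, sizes))
```

Under this cost model the optimal plan joins patterns in ascending cardinality, which is what the search converges to on this tiny instance.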
A journey to Semantic Web query federation in the life sciences.
Cheung, Kei-Hoi; Frost, H Robert; Marshall, M Scott; Prud'hommeaux, Eric; Samwald, Matthias; Zhao, Jun; Paschke, Adrian
2009-10-01
As interest in adopting the Semantic Web in the biomedical domain continues to grow, Semantic Web technology has been evolving and maturing. A variety of technological approaches including triplestore technologies, SPARQL endpoints, Linked Data, and the Vocabulary of Interlinked Datasets have emerged in recent years. In addition to data warehouse construction, these technological approaches can be used to support dynamic query federation. As a community effort, the BioRDF task force, within the Semantic Web for Health Care and Life Sciences Interest Group, is exploring how these emerging approaches can be utilized to execute distributed queries across different neuroscience data sources. We have created two health care and life science knowledge bases. We have explored a variety of Semantic Web approaches to describe, map, and dynamically query multiple datasets. We have demonstrated several federation approaches that integrate diverse types of information about neurons and receptors that play an important role in basic, clinical, and translational neuroscience research. In particular, we have created a prototype receptor explorer which uses OWL mappings to provide an integrated list of receptors and executes individual queries against different SPARQL endpoints. We have also employed the AIDA Toolkit, which is directed at groups of knowledge workers who cooperatively search, annotate, interpret, and enrich large collections of heterogeneous documents from diverse locations. We have explored a tool called "FeDeRate", which enables a global SPARQL query to be decomposed into subqueries against remote databases offering either SPARQL or SQL query interfaces. Finally, we have explored how to use the Vocabulary of Interlinked Datasets (voiD) to create metadata for describing datasets exposed as Linked Data URIs or SPARQL endpoints.
We have demonstrated the use of a set of novel and state-of-the-art Semantic Web technologies in support of a neuroscience query federation scenario. We have identified both the strengths and weaknesses of these technologies. While the Semantic Web offers a global data model including the use of Uniform Resource Identifiers (URIs), the proliferation of semantically equivalent URIs hinders large-scale data integration. Our work helps direct research and tool development, which will benefit this community. PMID:19796394
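The FeDeRate-style decomposition described above reduces, at its core, to running one subquery per source and joining the returned bindings locally. The sketch below simulates that with two stand-in "endpoints" (plain lists of binding dicts with invented neuron/receptor data) and a local hash join; a real federator would issue the subqueries over HTTP to SPARQL or SQL interfaces.

```python
# Hedged sketch of query federation: per-source subqueries, local hash join.

def hash_join(left, right, var):
    """Join two lists of binding dicts on a shared variable."""
    index = {}
    for row in left:
        index.setdefault(row[var], []).append(row)
    joined = []
    for row in right:
        for match in index.get(row[var], []):
            joined.append({**match, **row})
    return joined

# Subquery 1 (e.g. against a neuron endpoint): ?neuron expresses ?receptor
neuron_bindings = [
    {"neuron": "PurkinjeCell", "receptor": "GABRA1"},
    {"neuron": "PyramidalCell", "receptor": "GRIN1"},
]
# Subquery 2 (e.g. against a receptor endpoint): ?receptor binds ?ligand
receptor_bindings = [
    {"receptor": "GABRA1", "ligand": "GABA"},
    {"receptor": "GRIN1", "ligand": "glutamate"},
]

results = hash_join(neuron_bindings, receptor_bindings, "receptor")
for r in results:
    print(r["neuron"], r["ligand"])
```

The join variable plays the same role as the shared SPARQL variable that links the decomposed graph patterns across endpoints.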
Semantator: semantic annotator for converting biomedical text to linked data.
Tao, Cui; Song, Dezhao; Sharma, Deepak; Chute, Christopher G
2013-10-01
More than 80% of biomedical data is embedded in plain text. The unstructured nature of these text-based documents makes it challenging to easily browse and query the data of interest in them. One approach to facilitate browsing and querying biomedical text is to convert the plain text to a linked web of data, i.e., converting data originally in free text to structured formats with defined meta-level semantics. In this paper, we introduce Semantator (Semantic Annotator), a semantic-web-based environment for annotating data of interest in biomedical documents, browsing and querying the annotated data, and interactively refining annotation results if needed. Through Semantator, information of interest can be annotated either manually or semi-automatically using plug-in information extraction tools. The annotation results are stored in RDF and can be queried using the SPARQL query language. In addition, semantic reasoners can be applied directly to the annotated data for consistency checking and knowledge inference. Semantator has been released online and has been used by the biomedical ontology community, which provided positive feedback. Our evaluation results indicate that (1) Semantator performs the annotation functionalities as designed; (2) Semantator can be adopted in real applications in clinical and translational research; and (3) the annotation results produced with Semantator can easily be used in Semantic-Web-based reasoning tools for further inference.
Semantic Annotations and Querying of Web Data Sources
NASA Astrophysics Data System (ADS)
Hornung, Thomas; May, Wolfgang
A large part of the Web, actually holding a significant portion of the useful information throughout the Web, consists of views on hidden databases, provided by numerous heterogeneous interfaces that are partly human-oriented via Web forms ("Deep Web"), and partly based on Web Services (only machine accessible). In this paper we present an approach for annotating these sources in a way that makes them citizens of the Semantic Web. We illustrate how queries can be stated in terms of the ontology, and how the annotations are used to select and access appropriate sources and to answer the queries.
2013-01-01
Background Clinical Intelligence, as a research and engineering discipline, is dedicated to the development of tools for data analysis for the purposes of clinical research, surveillance, and effective health care management. Self-service ad hoc querying of clinical data is one desirable type of functionality. Since most of the data are currently stored in relational or similar form, ad hoc querying is problematic as it requires specialised technical skills and the knowledge of particular data schemas. Results A possible solution is semantic querying where the user formulates queries in terms of domain ontologies that are much easier to navigate and comprehend than data schemas. In this article, we are exploring the possibility of using SADI Semantic Web services for semantic querying of clinical data. We have developed a prototype of a semantic querying infrastructure for the surveillance of, and research on, hospital-acquired infections. Conclusions Our results suggest that SADI can support ad-hoc, self-service, semantic queries of relational data in a Clinical Intelligence context. The use of SADI compares favourably with approaches based on declarative semantic mappings from data schemas to ontologies, such as query rewriting and RDFizing by materialisation, because it can easily cope with situations when (i) some computation is required to turn relational data into RDF or OWL, e.g., to implement temporal reasoning, or (ii) integration with external data sources is necessary. PMID:23497556
A Query Integrator and Manager for the Query Web
Brinkley, James F.; Detwiler, Landon T.
2012-01-01
We introduce two concepts: the Query Web as a layer of interconnected queries over the document web and the semantic web, and a Query Web Integrator and Manager (QI) that enables the Query Web to evolve. QI permits users to write, save and reuse queries over any web accessible source, including other queries saved in other installations of QI. The saved queries may be in any language (e.g. SPARQL, XQuery); the only condition for interconnection is that the queries return their results in some form of XML. This condition allows queries to chain off each other, and to be written in whatever language is appropriate for the task. We illustrate the potential use of QI for several biomedical use cases, including ontology view generation using a combination of graph-based and logical approaches, value set generation for clinical data management, image annotation using terminology obtained from an ontology web service, ontology-driven brain imaging data integration, small-scale clinical data integration, and wider-scale clinical data integration. Such use cases illustrate the current range of applications of QI and lead us to speculate about the potential evolution from smaller groups of interconnected queries into a larger query network that layers over the document and semantic web. The resulting Query Web could greatly aid researchers and others who now have to manually navigate through multiple information sources in order to answer specific questions. PMID:22531831
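QI's single interconnection condition, that every saved query return XML, is what lets queries chain off each other regardless of their query language. The sketch below imitates that with two local stub "queries" (invented region and image names): the first returns terminology as XML, and the second parses that XML to drive an image lookup, returning XML in turn.

```python
import xml.etree.ElementTree as ET

# Sketch of Query Web chaining: any query may consume another's XML output.

def query_terms():
    # stand-in for, e.g., an ontology web service query saved in QI
    return "<terms><term id='T1'>hippocampus</term><term id='T2'>amygdala</term></terms>"

def query_images(terms_xml):
    # chains off the XML output of query_terms
    wanted = {t.text for t in ET.fromstring(terms_xml).iter("term")}
    images = {"hippocampus": ["img_017.nii"], "amygdala": ["img_042.nii"]}
    root = ET.Element("images")
    for name in sorted(wanted):
        for img in images.get(name, []):
            ET.SubElement(root, "image", region=name).text = img
    return ET.tostring(root, encoding="unicode")

chained = query_images(query_terms())
print(chained)
```

Because both ends speak XML, the first query could just as well be SPARQL and the second XQuery; the chain only sees the serialized results.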
Taboada, María; Martínez, Diego; Pilo, Belén; Jiménez-Escrig, Adriano; Robinson, Peter N; Sobrido, María J
2012-07-31
Semantic Web technology can considerably catalyze translational genetics and genomics research in medicine, where the interchange of information between basic research and clinical levels becomes crucial. This exchange involves mapping abstract phenotype descriptions from research resources, such as knowledge databases and catalogs, to unstructured datasets produced through experimental methods and clinical practice. This is especially true for the construction of mutation databases. This paper presents a way of harmonizing abstract phenotype descriptions with patient data from clinical practice, and querying this dataset about relationships between phenotypes and genetic variants, at different levels of abstraction. Due to the current availability of ontological and terminological resources that have already reached some consensus in biomedicine, a reuse-based ontology engineering approach was followed. The proposed approach uses the Web Ontology Language (OWL) to represent the phenotype ontology and the patient model, the Semantic Web Rule Language (SWRL) to bridge the gap between phenotype descriptions and clinical data, and the Semantic Query-Enhanced Web Rule Language (SQWRL) to query relevant phenotype-genotype bidirectional relationships. The work tests the use of semantic web technology in the biomedical research domain of cerebrotendinous xanthomatosis (CTX), using a real dataset and ontologies. A framework to query relevant phenotype-genotype bidirectional relationships is provided. Phenotype descriptions and patient data were harmonized by defining 28 Horn-like rules in terms of the OWL concepts. In total, 24 patterns of SQWRL queries were designed following the initial list of competency questions. As the approach is based on OWL, the semantics of the framework adopt the standard logical model of an open-world assumption.
This work demonstrates how Semantic Web technologies can be used to support the flexible representation and computational inference mechanisms required to query patient datasets at different levels of abstraction. The open-world assumption is especially well suited to describing only partially known phenotype-genotype relationships in a way that is easily extensible. In the future, this type of approach could offer researchers a valuable resource to infer new data from patient data for statistical analysis in translational research. In conclusion, phenotype description formalization and mapping to clinical data are two key elements for interchanging knowledge between basic and clinical research. PMID:22849591
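The Horn-like rules described above can be pictured with a miniature forward-chaining sketch. The paper uses SWRL over OWL; this plain-Python stand-in uses invented predicates and findings, and only illustrates the rule shape (subclass propagation plus a phenotype-to-genotype suggestion rule).

```python
# Toy forward chaining over Horn-like rules bridging phenotype and genotype.

facts = {
    ("hasFinding", "patient1", "JuvenileCataract"),
    ("hasFinding", "patient1", "TendonXanthoma"),
    ("subClassOf", "JuvenileCataract", "OcularAbnormality"),
}

# rule 1: hasFinding(p, f) & subClassOf(f, c) -> hasFinding(p, c)
# rule 2: hasFinding(p, TendonXanthoma) & hasFinding(p, OcularAbnormality)
#         -> suggestsGenotype(p, CYP27A1_mutation)
def forward_chain(facts):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        new = set()
        for (_, p, f) in [t for t in facts if t[0] == "hasFinding"]:
            for (_, f2, c) in [t for t in facts if t[0] == "subClassOf"]:
                if f == f2:
                    new.add(("hasFinding", p, c))
        for t in [t for t in facts if t[0] == "hasFinding"]:
            p = t[1]
            if ("hasFinding", p, "TendonXanthoma") in facts and \
               ("hasFinding", p, "OcularAbnormality") in facts | new:
                new.add(("suggestsGenotype", p, "CYP27A1_mutation"))
        if not new <= facts:
            facts |= new
            changed = True
    return facts

inferred = forward_chain(facts)
print(("suggestsGenotype", "patient1", "CYP27A1_mutation") in inferred)
```

Rule 1 is what lets clinical findings recorded at a specific level match phenotype descriptions stated at a more abstract level, which is the harmonization the abstract describes.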
Hybrid Filtering in Semantic Query Processing
ERIC Educational Resources Information Center
Jeong, Hanjo
2011-01-01
This dissertation presents a hybrid filtering method and a case-based reasoning framework for enhancing the effectiveness of Web search. Web search may not reflect user needs, intent, context, and preferences, because today's keyword-based search lacks the semantic information needed to capture the user's context and intent in posing the search query.…
Wollbrett, Julien; Larmande, Pierre; de Lamotte, Frédéric; Ruiz, Manuel
2013-04-15
In recent years, a large amount of "-omics" data have been produced. However, these data are stored in many different species-specific databases that are managed by different institutes and laboratories. Biologists often need to find and assemble data from disparate sources to perform certain analyses. Searching for these data and assembling them is a time-consuming task. The Semantic Web helps to facilitate interoperability across databases. A common approach involves the development of wrapper systems that map a relational database schema onto existing domain ontologies. However, few attempts have been made to automate the creation of such wrappers. We developed a framework, named BioSemantic, for the creation of Semantic Web Services that are applicable to relational biological databases. This framework makes use of both Semantic Web and Web Services technologies and can be divided into two main parts: (i) the generation and semi-automatic annotation of an RDF view; and (ii) the automatic generation of SPARQL queries and their integration into Semantic Web Services backbones. We have used our framework to integrate genomic data from different plant databases. BioSemantic is a framework that was designed to speed integration of relational databases. We present how it can be used to speed the development of Semantic Web Services for existing relational biological databases. Currently, it creates and annotates RDF views that enable the automatic generation of SPARQL queries. Web Services are also created and deployed automatically, and the semantic annotations of our Web Services are added automatically using SAWSDL attributes. BioSemantic is downloadable at http://southgreen.cirad.fr/?q=content/Biosemantic. PMID:23586394
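The automatic SPARQL generation that BioSemantic performs over its RDF views can be pictured with a minimal sketch: given a mapping from tables to classes and columns to properties, a query for selected columns falls out mechanically. This is an assumed illustration, not BioSemantic's real code; the table, class, and property names are invented.

```python
# Sketch: derive a SPARQL query from a relational-to-RDF mapping.

def sparql_for_table(table, mapping, columns):
    cls = mapping[table]["class"]
    props = mapping[table]["columns"]
    var = f"?{table}"                    # one variable per table row
    lines = [f"SELECT {' '.join('?' + c for c in columns)} WHERE {{",
             f"  {var} a {cls} ."]
    for c in columns:                    # one triple pattern per column
        lines.append(f"  {var} {props[c]} ?{c} .")
    lines.append("}")
    return "\n".join(lines)

mapping = {"gene": {"class": "obo:SO_0000704",
                    "columns": {"name": "rdfs:label", "chrom": "ex:chromosome"}}}
q = sparql_for_table("gene", mapping, ["name", "chrom"])
print(q)
```

In the framework described above, the mapping itself comes from the semi-automatically annotated RDF view, so the query generation step needs no manual input per database.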
SPARK: Adapting Keyword Query to Semantic Search
NASA Astrophysics Data System (ADS)
Zhou, Qi; Wang, Chong; Xiong, Miao; Wang, Haofen; Yu, Yong
Semantic search promises to provide more accurate results than present-day keyword search. However, progress with semantic search has been delayed by the complexity of its query languages. In this paper, we explore a novel approach of adapting keywords to querying the Semantic Web: the approach automatically translates keyword queries into formal logic queries so that end users can use familiar keywords to perform semantic search. A prototype system named 'SPARK' has been implemented in light of this approach. Given a keyword query, SPARK outputs a ranked list of SPARQL queries as the translation result. The translation in SPARK consists of three major steps: term mapping, query graph construction, and query ranking. Specifically, a probabilistic query ranking model is proposed to select the most likely SPARQL query. In our experiments, SPARK achieved encouraging translation results.
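The ranking step can be sketched very simply: each candidate translation carries the probabilities of its term mappings, and candidates are ordered by their joint log-probability. This is a deliberately simplified stand-in for SPARK's probabilistic ranking model, with invented queries and numbers.

```python
import math

# Sketch: rank candidate SPARQL translations of a keyword query by the
# joint probability of their term mappings.

def rank(candidates):
    """candidates: list of (query_string, [term-mapping probabilities])."""
    scored = []
    for query, probs in candidates:
        log_score = sum(math.log(p) for p in probs)   # independence assumption
        scored.append((log_score, query))
    scored.sort(reverse=True)
    return [q for _, q in scored]

candidates = [
    ("SELECT ?p WHERE { ?p a :Protein ; :name 'kinase' }", [0.9, 0.8]),
    ("SELECT ?p WHERE { ?p a :Pathway ; :name 'kinase' }", [0.4, 0.8]),
]
ranked = rank(candidates)
print(ranked[0])
```

A fuller model would also score the query graph's structure, but the mapping probabilities alone already order these two candidates sensibly.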
SPARQL Assist language-neutral query composer.
McCarthy, Luke; Vandervalk, Ben; Wilkinson, Mark
2012-01-25
SPARQL query composition is difficult for the lay-person, and even for the experienced bioinformatician in cases where the data model is unfamiliar. Moreover, established best practices and internationalization concerns dictate that identifiers for ontological terms should be opaque rather than human-readable, which further complicates the task of composing queries manually. We present SPARQL Assist: a web application that addresses these issues by providing context-sensitive type-ahead completion during SPARQL query construction. Ontological terms are suggested using their multilingual labels and descriptions, leveraging existing support for internationalization and language-neutrality. Moreover, the system utilizes the semantics embedded in ontologies, and within the query itself, to help prioritize the most likely suggestions. To ensure success, the Semantic Web must be easily available to all users, regardless of locale, training, or preferred language. By enhancing support for internationalization, and by simplifying the manual construction of SPARQL queries through controlled-natural-language interfaces, we believe we have taken some early steps towards simplifying access to Semantic Web resources. PMID:22373327
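The pairing of human-readable multilingual labels with opaque identifiers is the heart of the idea, and can be sketched in a few lines. This is an illustration in the spirit of SPARQL Assist, not its actual implementation; the GO terms' Spanish labels here are invented for the example.

```python
# Sketch: type-ahead completion over multilingual ontology labels.
# The user types a label fragment; the opaque URI is what enters the query.

terms = {
    "GO:0008150": {"en": "biological process", "es": "proceso biológico"},
    "GO:0003674": {"en": "molecular function", "es": "función molecular"},
    "GO:0005575": {"en": "cellular component", "es": "componente celular"},
}

def complete(prefix, lang="en"):
    prefix = prefix.lower()
    hits = [(labels[lang], uri) for uri, labels in terms.items()
            if labels.get(lang, "").lower().startswith(prefix)]
    return sorted(hits)

print(complete("mol"))
print(complete("proceso", lang="es"))
```

A production system would additionally rank suggestions using the ontology's semantics and the partially built query, as the abstract describes; this sketch only shows the language-neutral lookup.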
Graph Mining Meets the Semantic Web
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Sangkeun; Sukumar, Sreenivas R; Lim, Seung-Hwan
The Resource Description Framework (RDF) and SPARQL Protocol and RDF Query Language (SPARQL) were introduced about a decade ago to enable flexible schema-free data interchange on the Semantic Web. Today, data scientists use the framework as a scalable graph representation for integrating, querying, exploring and analyzing data sets hosted at different sources. With increasing adoption, the need for graph mining capabilities for the Semantic Web has emerged. We address that need through implementation of three popular iterative Graph Mining algorithms (Triangle count, Connected component analysis, and PageRank). We implement these algorithms as SPARQL queries, wrapped within Python scripts. We evaluate the performance of our implementation on 6 real-world data sets and show that graph mining algorithms (that have a linear-algebra formulation) can indeed be unleashed on data represented as RDF graphs using the SPARQL query interface.
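Triangle counting is the most compact of the three algorithms to illustrate. In SPARQL it is a self-join of three edge patterns `{ ?a ?p ?b . ?b ?p ?c . ?c ?p ?a }`; the pure-Python analogue below performs the same join over a toy edge set (the edge labels are invented) and divides out the sixfold symmetry.

```python
# Pure-Python analogue of triangle counting expressed as a triple self-join.

edges = {("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")}

def triangle_count(edges):
    und = set()
    for s, o in edges:                       # treat the graph as undirected
        und.add((s, o))
        und.add((o, s))
    count = 0
    for (a, b) in und:                       # join: ?a-?b . ?b-?c . ?c-?a
        for (b2, c) in und:
            if b2 == b and (c, a) in und and len({a, b, c}) == 3:
                count += 1
    return count // 6                        # each triangle matched 6 times

print(triangle_count(edges))
```

The division by 6 mirrors the deduplication a SPARQL formulation needs, since the three-pattern join matches each triangle once per rotation and direction.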
Graph-Based Semantic Web Service Composition for Healthcare Data Integration.
Arch-Int, Ngamnij; Arch-Int, Somjit; Sonsilphong, Suphachoke; Wanchai, Paweena
2017-01-01
Among the numerous and heterogeneous web services offered by different sources, automatic web service composition is the most convenient method for building complex business processes that invoke multiple existing atomic services. Current solutions for functional web service composition lack autonomous querying of semantic matches between the parameters of web services, which is necessary for composing large numbers of related services. In this paper, we propose a graph-based Semantic Web Service composition system consisting of two subsystems: management time and run time. The management-time subsystem is responsible for dependency graph preparation, in which a dependency graph of related services is generated automatically according to the proposed semantic matchmaking rules. The run-time subsystem is responsible for discovering potential web services and a non-redundant web service composition for a user's query using a graph-based search algorithm. The proposed approach was applied to healthcare data integration across different health organizations and was evaluated on two aspects: execution time and correctness. PMID:29065602
Samwald, Matthias; Lim, Ernest; Masiar, Peter; Marenco, Luis; Chen, Huajun; Morse, Thomas; Mutalik, Pradeep; Shepherd, Gordon; Miller, Perry; Cheung, Kei-Hoi
2009-01-01
The amount of biomedical data available in Semantic Web formats has been rapidly growing in recent years. While these formats are machine-friendly, user-friendly web interfaces allowing easy querying of these data are typically lacking. We present "Entrez Neuron", a pilot neuron-centric interface that allows for keyword-based queries against a coherent repository of OWL ontologies. These ontologies describe neuronal structures, physiology, mathematical models and microscopy images. The returned query results are organized hierarchically according to brain architecture. Where possible, the application makes use of entities from the Open Biomedical Ontologies (OBO) and the 'HCLS knowledgebase' developed by the W3C Interest Group for Health Care and Life Science. It makes use of the emerging RDFa standard to embed ontology fragments and semantic annotations within its HTML-based user interface. The application and underlying ontologies demonstrate how Semantic Web technologies can be used for information integration within a curated information repository and between curated information repositories. It also demonstrates how information integration can be accomplished on the client side, through simple copying and pasting of portions of documents that contain RDFa markup.
Storage and Retrieval of Large RDF Graph Using Hadoop and MapReduce
NASA Astrophysics Data System (ADS)
Farhan Husain, Mohammad; Doshi, Pankil; Khan, Latifur; Thuraisingham, Bhavani
Scalably handling huge amounts of data has long been a concern, and the same is true for semantic web data. Current semantic web frameworks lack this ability. In this paper, we describe a framework built with Hadoop to store and retrieve large numbers of RDF triples. We describe our schema for storing RDF data in the Hadoop Distributed File System, and we present our algorithms for answering a SPARQL query, using Hadoop's MapReduce framework to actually answer the queries. Our results reveal that we can store huge amounts of semantic web data in Hadoop clusters built mostly from cheap commodity-class hardware and still answer queries quickly. We conclude that ours is a scalable framework, able to handle large amounts of RDF data efficiently.
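The MapReduce join that answers a two-pattern SPARQL query can be simulated locally: mappers emit (join-key, tagged row) pairs keyed on the shared variable's binding, and the reducer joins rows that share a key. The triples, file layout, and pattern below are invented for illustration; a real deployment would run this over triple files in HDFS.

```python
from collections import defaultdict

# Local simulation of a MapReduce join answering:
#   SELECT ?g ?p ?loc WHERE { ?g encodes ?p . ?p locatedIn ?loc }

triples = [
    ("gene1", "encodes", "protA"),
    ("gene2", "encodes", "protB"),
    ("protA", "locatedIn", "nucleus"),
]

def map_phase(triples):
    for s, p, o in triples:
        if p == "encodes":
            yield (o, ("left", s))      # key = binding of join variable ?p
        elif p == "locatedIn":
            yield (s, ("right", o))

def reduce_phase(pairs):
    groups = defaultdict(list)          # shuffle: group by join key
    for key, tagged in pairs:
        groups[key].append(tagged)
    for key, rows in groups.items():
        lefts = [v for tag, v in rows if tag == "left"]
        rights = [v for tag, v in rows if tag == "right"]
        for g in lefts:
            for loc in rights:
                yield (g, key, loc)     # (?g, ?p, ?loc)

results = list(reduce_phase(map_phase(triples)))
print(results)
```

Multi-pattern queries chain further MapReduce rounds, one join variable at a time, which is why the storage schema (how triples are partitioned into files) matters so much for query speed.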
A Research on E-learning Resources Construction Based on Semantic Web
NASA Astrophysics Data System (ADS)
Rui, Liu; Maode, Deng
Traditional e-learning platforms have the flaws that resources are usually difficult to query or locate, and that cross-platform sharing and interoperability are hard to realize. In this paper, the Semantic Web and metadata standards are discussed, and an e-learning system framework based on the Semantic Web is put forward to try to solve the flaws of traditional e-learning platforms.
Spatial information semantic query based on SPARQL
NASA Astrophysics Data System (ADS)
Xiao, Zhifeng; Huang, Lei; Zhai, Xiaofang
2009-10-01
How can the efficiency of spatial information queries be enhanced in today's fast-growing information age? We are rich in geospatial data but poor in up-to-date geospatial information and knowledge that is ready to be accessed by public users. This paper adopts an approach for querying spatial semantics by building an ontology in the Web Ontology Language (OWL) and introducing the SPARQL Protocol and RDF Query Language (SPARQL) to search spatial semantic relations. Establishing spatial semantics that support effective spatial reasoning is important for performing semantic queries. Compared to earlier keyword-based information retrieval techniques that rely on syntax, our spatial query system uses semantic approaches. Semantic approaches need to be grounded in an ontology, so we use OWL to describe spatial information extracted from the large-scale map of Wuhan. Spatial information expressed in an ontology with formal semantics is available to machines for processing and to people for understanding. The approach is illustrated by a case study that uses SPARQL to query geospatial ontology instances of Wuhan. The paper shows that using SPARQL to search OWL ontology instances can ensure the accuracy and applicability of the results. The results also indicate that constructing a geospatial semantic query system has a positive effect on spatial query formulation and retrieval.
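The kind of query the paper describes is a SPARQL triple-pattern match over spatial facts derived from an OWL ontology. A minimal sketch, with invented Wuhan place names and relation names (not the paper's actual ontology) and a hand-rolled matcher standing in for a SPARQL engine:

```python
# Hypothetical OWL-derived fact base about Wuhan; names are illustrative only.
facts = [
    ("ex:YangtzeRiver", "geo:crosses",   "ex:WuchangDistrict"),
    ("ex:YangtzeRiver", "geo:crosses",   "ex:HanyangDistrict"),
    ("ex:EastLake",     "geo:locatedIn", "ex:WuchangDistrict"),
]

def query(subject, predicate, obj):
    """Match a single SPARQL-like triple pattern; None plays the role of a variable."""
    return [(s, p, o) for (s, p, o) in facts
            if (subject   is None or s == subject)
            and (predicate is None or p == predicate)
            and (obj       is None or o == obj)]

# "Which districts does the Yangtze cross?"
# SPARQL equivalent: SELECT ?d WHERE { ex:YangtzeRiver geo:crosses ?d }
districts = [o for (_s, _p, o) in query("ex:YangtzeRiver", "geo:crosses", None)]
```

A real system would issue the SPARQL text against a triple store holding the OWL instances; the pattern-matching semantics are the same.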
The semantic web and computer vision: old AI meets new AI
NASA Astrophysics Data System (ADS)
Mundy, J. L.; Dong, Y.; Gilliam, A.; Wagner, R.
2018-04-01
There has been vast progress in linking semantic information across billions of web pages through the use of ontologies encoded in the Web Ontology Language (OWL), based on the Resource Description Framework (RDF). A prime example is Wikipedia, where the knowledge contained in its more than four million pages is encoded in an ontological database called DBpedia (http://wiki.dbpedia.org/). Web-based query tools can retrieve semantic information from DBpedia, encoded in interlinked ontologies that can be accessed using natural language. This paper will show how this vast context can be used to automate the process of querying images and other geospatial data in support of reporting changes in structures and activities. Computer vision algorithms are selected and provided with context based on natural language requests for monitoring and analysis. The resulting reports provide semantically linked observations from images and 3D surface models.
NASA Astrophysics Data System (ADS)
Bikakis, Nikos; Gioldasis, Nektarios; Tsinaraki, Chrisa; Christodoulakis, Stavros
SPARQL is today the standard access language for Semantic Web data. In recent years XML databases have also acquired industrial importance due to the widespread applicability of XML on the Web. In this paper we present a framework that bridges the heterogeneity gap and creates an interoperable environment where SPARQL queries are used to access XML databases. Our approach assumes that fairly generic mappings between ontology constructs and XML Schema constructs have been automatically derived or manually specified. The mappings are used to automatically translate SPARQL queries into semantically equivalent XQuery queries, which are used to access the XML databases. We present the algorithms and the implementation of the SPARQL2XQuery framework, which is used for answering SPARQL queries over XML databases.
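The translation idea can be illustrated on its simplest case: given a mapping from ontology terms to XML paths, a SPARQL pattern of the form { ?x rdf:type <Class> . ?x <prop> ?v } becomes an XQuery FLWOR expression. The mapping table and query shapes below are invented for illustration; the real SPARQL2XQuery framework handles far richer mappings and query forms.

```python
# Hypothetical mapping from ontology constructs to XML Schema constructs.
mapping = {
    "ex:Book":  "/library/book",  # OWL class  -> XML element path
    "ex:title": "title",          # property   -> child element name
}

def translate(var, rdf_type, prop):
    """Translate { ?var rdf:type <rdf_type> . ?var <prop> ?v } into XQuery."""
    base  = mapping[rdf_type]
    child = mapping[prop]
    return (f"for ${var} in doc('data.xml'){base} "
            f"return ${var}/{child}")

xq = translate("x", "ex:Book", "ex:title")
```

The resulting string is an ordinary FLWOR expression that any XQuery processor could evaluate against the mapped XML database.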
Mining Genotype-Phenotype Associations from Public Knowledge Sources via Semantic Web Querying.
Kiefer, Richard C; Freimuth, Robert R; Chute, Christopher G; Pathak, Jyotishman
2013-01-01
Gene Wiki Plus (GeneWiki+) and the Online Mendelian Inheritance in Man (OMIM) are publicly available resources for sharing information about disease-gene and gene-SNP associations in humans. While immensely useful to the scientific community, both resources are manually curated, thereby making the data entry and publication process time-consuming, and to some degree, error-prone. To this end, this study investigates Semantic Web technologies to validate existing and potentially discover new genotype-phenotype associations in GWP and OMIM. In particular, we demonstrate the applicability of SPARQL queries for identifying associations not explicitly stated for commonly occurring chronic diseases in GWP and OMIM, and report our preliminary findings for coverage, completeness, and validity of the associations. Our results highlight the benefits of Semantic Web querying technology to validate existing disease-gene associations as well as identify novel associations although further evaluation and analysis is required before such information can be applied and used effectively.
Introducing glycomics data into the Semantic Web
2013-01-01
Background Glycoscience is a research field focusing on complex carbohydrates (otherwise known as glycans), which can, for example, serve as “switches” that toggle between different functions of a glycoprotein or glycolipid. Due to the advancement of glycomics technologies that are used to characterize glycan structures, many glycomics databases are now publicly available and provide useful information for glycoscience research. However, these databases have almost no link to other life science databases. Results In order to implement support for the Semantic Web most efficiently for glycomics research, the developers of major glycomics databases agreed on a minimal standard for representing glycan structure and annotation information using RDF (Resource Description Framework). Moreover, all of the participants implemented this standard prototype and generated preliminary RDF versions of their data. To test the utility of the converted data, all of the data sets were uploaded into a Virtuoso triple store, and several SPARQL queries were tested as “proofs-of-concept” to illustrate the utility of the Semantic Web in querying across databases which were originally difficult to implement. Conclusions We were able to successfully retrieve information by linking UniCarbKB, GlycomeDB and JCGGDB in a single SPARQL query to obtain our target information. We also tested queries linking UniProt with GlycoEpitope as well as lectin data with GlycomeDB through PDB. As a result, we have been able to link proteomics data with glycomics data through the implementation of Semantic Web technologies, allowing for more flexible queries across these domains. PMID:24280648
Introducing glycomics data into the Semantic Web.
Aoki-Kinoshita, Kiyoko F; Bolleman, Jerven; Campbell, Matthew P; Kawano, Shin; Kim, Jin-Dong; Lütteke, Thomas; Matsubara, Masaaki; Okuda, Shujiro; Ranzinger, Rene; Sawaki, Hiromichi; Shikanai, Toshihide; Shinmachi, Daisuke; Suzuki, Yoshinori; Toukach, Philip; Yamada, Issaku; Packer, Nicolle H; Narimatsu, Hisashi
2013-11-26
Glycoscience is a research field focusing on complex carbohydrates (otherwise known as glycans), which can, for example, serve as "switches" that toggle between different functions of a glycoprotein or glycolipid. Due to the advancement of glycomics technologies that are used to characterize glycan structures, many glycomics databases are now publicly available and provide useful information for glycoscience research. However, these databases have almost no link to other life science databases. In order to implement support for the Semantic Web most efficiently for glycomics research, the developers of major glycomics databases agreed on a minimal standard for representing glycan structure and annotation information using RDF (Resource Description Framework). Moreover, all of the participants implemented this standard prototype and generated preliminary RDF versions of their data. To test the utility of the converted data, all of the data sets were uploaded into a Virtuoso triple store, and several SPARQL queries were tested as "proofs-of-concept" to illustrate the utility of the Semantic Web in querying across databases which were originally difficult to implement. We were able to successfully retrieve information by linking UniCarbKB, GlycomeDB and JCGGDB in a single SPARQL query to obtain our target information. We also tested queries linking UniProt with GlycoEpitope as well as lectin data with GlycomeDB through PDB. As a result, we have been able to link proteomics data with glycomics data through the implementation of Semantic Web technologies, allowing for more flexible queries across these domains.
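The cross-database queries described above work because separate RDF data sets, once loaded into one triple store, can be joined on shared identifiers. A minimal sketch of that pattern, with two toy triple sets standing in for distinct glycomics stores; the identifiers and predicates are invented, not the standards' actual terms:

```python
# Two hypothetical sources, merged into a single store ("uploaded to Virtuoso").
glycomedb = [("glycomedb:G1", "glycan:hasStructure", "structure:NeuAc-Gal-Glc")]
jcggdb    = [("jcggdb:E7",    "glycan:recognizes",   "glycomedb:G1"),
             ("jcggdb:E7",    "rdfs:label",          "anti-GM3 epitope")]

store = glycomedb + jcggdb

# A single cross-source join, as one SPARQL query would express it:
# find labels of entries that recognize a glycan whose structure is recorded.
labels = [lab
          for (g,  p1, _st)    in store if p1 == "glycan:hasStructure"
          for (e,  p2, target) in store if p2 == "glycan:recognizes" and target == g
          for (e2, p3, lab)    in store if p3 == "rdfs:label" and e2 == e]
```

The join variable (the glycan identifier shared between the two sources) is precisely the "link to other life science databases" that the minimal RDF standard makes possible.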
A novel visualization model for web search results.
Nguyen, Tien N; Zhang, Jin
2006-01-01
This paper presents an interactive visualization system, named WebSearchViz, for visualizing Web search results and facilitating users' navigation and exploration. The metaphor in our model is the solar system with its planets and asteroids revolving around the sun. Location, color, movement, and spatial distance of objects in the visual space are used to represent the semantic relationships between a query and relevant Web pages. In particular, the movement of objects and their speeds add a new dimension to the visual space, illustrating the degree of relevance between a query and Web search results in the context of users' subjects of interest. By interacting with the visual space, users are able to observe the semantic relevance between a query and a resulting Web page with respect to their subjects of interest, context information, or concern. Users' subjects of interest can be dynamically changed, redefined, added, or deleted from the visual space.
Progress toward a Semantic eScience Framework; building on advanced cyberinfrastructure
NASA Astrophysics Data System (ADS)
McGuinness, D. L.; Fox, P. A.; West, P.; Rozell, E.; Zednik, S.; Chang, C.
2010-12-01
The configurable and extensible semantic eScience framework (SESF) has begun development and implementation of several semantic application components. Extensions and improvements to several ontologies have been made based on distinct interdisciplinary use cases ranging from solar physics to biological and chemical oceanography. Importantly, these semantic representations mediate access to a diverse set of existing and emerging cyberinfrastructure. Among the advances is the population of triple stores with web-accessible query services. A triple store is akin to a relational data store where the basic stored unit is a subject-predicate-object tuple. Access via a query is provided by SPARQL, a W3C Recommendation language specification. Upon this middle tier of semantic cyberinfrastructure, we have developed several forms of semantic faceted search, including provenance-awareness. We report on the rapid advances in semantic technologies and tools and how we are sustaining the software path for the required technical advances, as well as the ontology improvements and increased functionality of the semantic applications, including how they are integrated into web-based portals (e.g. Drupal) and web services. Lastly, we indicate future work directions and opportunities for collaboration.
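Faceted search over a triple store amounts to grouping query results by the object of a chosen predicate and showing the counts as clickable facets. A minimal sketch with invented dataset metadata (the real SESF facets and vocabularies differ):

```python
from collections import Counter

# Hypothetical triple store of dataset descriptions (subject, predicate, object).
store = [
    ("ex:ds1", "ex:discipline", "oceanography"),
    ("ex:ds2", "ex:discipline", "oceanography"),
    ("ex:ds3", "ex:discipline", "solar-physics"),
    ("ex:ds1", "ex:year",       "2009"),
]

def facet_counts(predicate):
    """Count subjects per object value of one predicate, as a faceted UI would.

    Equivalent SPARQL: SELECT ?v (COUNT(?s) AS ?n) WHERE { ?s <p> ?v } GROUP BY ?v
    """
    return Counter(o for (_s, p, o) in store if p == predicate)

counts = facet_counts("ex:discipline")
```

Selecting a facet value then simply adds another triple pattern to the query, narrowing the result set.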
Samwald, Matthias; Lim, Ernest; Masiar, Peter; Marenco, Luis; Chen, Huajun; Morse, Thomas; Mutalik, Pradeep; Shepherd, Gordon; Miller, Perry; Cheung, Kei-Hoi
2013-01-01
The amount of biomedical data available in Semantic Web formats has been rapidly growing in recent years. While these formats are machine-friendly, user-friendly web interfaces allowing easy querying of these data are typically lacking. We present “Entrez Neuron”, a pilot neuron-centric interface that allows for keyword-based queries against a coherent repository of OWL ontologies. These ontologies describe neuronal structures, physiology, mathematical models and microscopy images. The returned query results are organized hierarchically according to brain architecture. Where possible, the application makes use of entities from the Open Biomedical Ontologies (OBO) and the ‘HCLS knowledgebase’ developed by the W3C Interest Group for Health Care and Life Science. It makes use of the emerging RDFa standard to embed ontology fragments and semantic annotations within its HTML-based user interface. The application and underlying ontologies demonstrate how Semantic Web technologies can be used for information integration within a curated information repository and between curated information repositories. It also demonstrates how information integration can be accomplished on the client side, through simple copying and pasting of portions of documents that contain RDFa markup. PMID:19745321
Waagmeester, Andra; Pico, Alexander R.
2016-01-01
The diversity of online resources storing biological data in different formats provides a challenge for bioinformaticians to integrate and analyse their biological data. The semantic web provides a standard to facilitate knowledge integration using statements built as triples describing a relation between two objects. WikiPathways, an online collaborative pathway resource, is now available in the semantic web through a SPARQL endpoint at http://sparql.wikipathways.org. Having biological pathways in the semantic web allows rapid integration with data from other resources that contain information about elements present in pathways using SPARQL queries. In order to convert WikiPathways content into meaningful triples we developed two new vocabularies that capture the graphical representation and the pathway logic, respectively. Each gene, protein, and metabolite in a given pathway is defined with a standard set of identifiers to support linking to several other biological resources in the semantic web. WikiPathways triples were loaded into the Open PHACTS discovery platform and are available through its Web API (https://dev.openphacts.org/docs) to be used in various tools for drug development. We combined various semantic web resources with the newly converted WikiPathways content using a variety of SPARQL query types and third-party resources, such as the Open PHACTS API. The ability to use pathway information to form new links across diverse biological data highlights the utility of integrating WikiPathways in the semantic web. PMID:27336457
Waagmeester, Andra; Kutmon, Martina; Riutta, Anders; Miller, Ryan; Willighagen, Egon L; Evelo, Chris T; Pico, Alexander R
2016-06-01
The diversity of online resources storing biological data in different formats provides a challenge for bioinformaticians to integrate and analyse their biological data. The semantic web provides a standard to facilitate knowledge integration using statements built as triples describing a relation between two objects. WikiPathways, an online collaborative pathway resource, is now available in the semantic web through a SPARQL endpoint at http://sparql.wikipathways.org. Having biological pathways in the semantic web allows rapid integration with data from other resources that contain information about elements present in pathways using SPARQL queries. In order to convert WikiPathways content into meaningful triples we developed two new vocabularies that capture the graphical representation and the pathway logic, respectively. Each gene, protein, and metabolite in a given pathway is defined with a standard set of identifiers to support linking to several other biological resources in the semantic web. WikiPathways triples were loaded into the Open PHACTS discovery platform and are available through its Web API (https://dev.openphacts.org/docs) to be used in various tools for drug development. We combined various semantic web resources with the newly converted WikiPathways content using a variety of SPARQL query types and third-party resources, such as the Open PHACTS API. The ability to use pathway information to form new links across diverse biological data highlights the utility of integrating WikiPathways in the semantic web.
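The value of the WikiPathways triples lies in joins like "which pathways contain a participant mapped to this gene identifier?" A minimal sketch over a toy triple set; the predicates imitate but do not reproduce the actual WikiPathways vocabularies:

```python
# Hypothetical pathway triples (subject, predicate, object).
store = [
    ("wp:WP1", "dcterms:title",    "Glycolysis"),
    ("wp:g1",  "dcterms:isPartOf", "wp:WP1"),
    ("wp:g1",  "wp:bdbEnsembl",    "ensembl:ENSG00000111640"),
]

def pathways_containing(gene_id):
    """Titles of pathways with a participant mapped to the given identifier.

    Equivalent SPARQL: SELECT ?t WHERE { ?n <bdbEnsembl> <id> .
                                         ?n dcterms:isPartOf ?pw .
                                         ?pw dcterms:title ?t }
    """
    nodes = [s for (s, p, o) in store if p == "wp:bdbEnsembl" and o == gene_id]
    pws   = [o for (s, p, o) in store if p == "dcterms:isPartOf" and s in nodes]
    return [o for (s, p, o) in store if p == "dcterms:title" and s in pws]

titles = pathways_containing("ensembl:ENSG00000111640")
```

Against the live endpoint at http://sparql.wikipathways.org the same join is written as a single SPARQL query, optionally federated with other resources such as Open PHACTS.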
Regular paths in SparQL: querying the NCI Thesaurus.
Detwiler, Landon T; Suciu, Dan; Brinkley, James F
2008-11-06
OWL, the Web Ontology Language, provides syntax and semantics for representing knowledge for the semantic web. Many of the constructs of OWL have a basis in the field of description logics. While the formal underpinnings of description logics have led to a highly computable language, this has come at a cognitive cost. OWL ontologies are often unintuitive to readers lacking a strong logic background. In this work we describe GLEEN, a regular path expression library, which extends the RDF query language SparQL to support complex path expressions over OWL and other RDF-based ontologies. We illustrate the utility of GLEEN by showing how it can be used in a query-based approach to defining simpler, more intuitive views of OWL ontologies. In particular we show how relatively simple GLEEN-enhanced SparQL queries can create views of the OWL version of the NCI Thesaurus that match the views generated by the web-based NCI browser.
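The most common regular path expression is the one-or-more repetition of a property (later standardized in SPARQL 1.1 as the property path p+), which reduces to a reachability computation over the triple graph. A minimal sketch, using an invented class hierarchy rather than the actual NCI Thesaurus, and plain graph traversal rather than the GLEEN library:

```python
# Hypothetical subclass edges (subject, predicate, object).
edges = [
    ("nci:Carcinoma",          "rdfs:subClassOf", "nci:MalignantNeoplasm"),
    ("nci:MalignantNeoplasm",  "rdfs:subClassOf", "nci:Neoplasm"),
]

def closure(start, predicate):
    """All nodes reachable from start via one-or-more predicate steps.

    Equivalent path expression: start rdfs:subClassOf+ ?ancestor
    """
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for s, p, o in edges:
            if s == node and p == predicate and o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen

ancestors = closure("nci:Carcinoma", "rdfs:subClassOf")
```

View definitions of the kind the paper describes combine several such path expressions, e.g. following subclass edges and then role restrictions, in one query.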
Hybrid ontology for semantic information retrieval model using keyword matching indexing system.
Uthayan, K R; Mala, G S Anandha
2015-01-01
An ontology captures the growth and elucidation of the concepts of an information domain shared by a group of users. Incorporating ontologies into information retrieval is a natural way to improve the retrieval of the relevant information users require. Matching keywords against a historical or domain-specific corpus is significant in recent approaches for finding the best match for a given input query. This research presents an improved querying mechanism for information retrieval which integrates ontology queries with keyword search. The ontology-based query is converted into a first-order predicate logic form which is used for routing the query to the appropriate servers. Matching algorithms are an active area of research in computer science and artificial intelligence. In text matching, it is more reliable to use a semantic model and to query on conditions of semantic matching. This research develops semantic matching between input queries and information in the ontology domain. The contributed algorithm is a hybrid method based on matching instances extracted from the queries against the information domain, in order to discover the best match and to improve the retrieval process. In conclusion, the hybrid ontology approach in the semantic web retrieves documents more effectively than standard ontology-based retrieval.
Hybrid Ontology for Semantic Information Retrieval Model Using Keyword Matching Indexing System
Uthayan, K. R.; Anandha Mala, G. S.
2015-01-01
An ontology captures the growth and elucidation of the concepts of an information domain shared by a group of users. Incorporating ontologies into information retrieval is a natural way to improve the retrieval of the relevant information users require. Matching keywords against a historical or domain-specific corpus is significant in recent approaches for finding the best match for a given input query. This research presents an improved querying mechanism for information retrieval which integrates ontology queries with keyword search. The ontology-based query is converted into a first-order predicate logic form which is used for routing the query to the appropriate servers. Matching algorithms are an active area of research in computer science and artificial intelligence. In text matching, it is more reliable to use a semantic model and to query on conditions of semantic matching. This research develops semantic matching between input queries and information in the ontology domain. The contributed algorithm is a hybrid method based on matching instances extracted from the queries against the information domain, in order to discover the best match and to improve the retrieval process. In conclusion, the hybrid ontology approach in the semantic web retrieves documents more effectively than standard ontology-based retrieval. PMID:25922851
EquiX-A Search and Query Language for XML.
ERIC Educational Resources Information Center
Cohen, Sara; Kanza, Yaron; Kogan, Yakov; Sagiv, Yehoshua; Nutt, Werner; Serebrenik, Alexander
2002-01-01
Describes EquiX, a search language for XML that combines querying with searching to query the data and the meta-data content of Web pages. Topics include search engines; a data model for XML documents; search query syntax; search query semantics; an algorithm for evaluating a query on a document; and indexing EquiX queries. (LRW)
Mining Genotype-Phenotype Associations from Public Knowledge Sources via Semantic Web Querying
Kiefer, Richard C.; Freimuth, Robert R.; Chute, Christopher G; Pathak, Jyotishman
Gene Wiki Plus (GeneWiki+) and the Online Mendelian Inheritance in Man (OMIM) are publicly available resources for sharing information about disease-gene and gene-SNP associations in humans. While immensely useful to the scientific community, both resources are manually curated, thereby making the data entry and publication process time-consuming, and to some degree, error-prone. To this end, this study investigates Semantic Web technologies to validate existing and potentially discover new genotype-phenotype associations in GWP and OMIM. In particular, we demonstrate the applicability of SPARQL queries for identifying associations not explicitly stated for commonly occurring chronic diseases in GWP and OMIM, and report our preliminary findings for coverage, completeness, and validity of the associations. Our results highlight the benefits of Semantic Web querying technology to validate existing disease-gene associations as well as identify novel associations although further evaluation and analysis is required before such information can be applied and used effectively. PMID:24303249
Automatically exposing OpenLifeData via SADI semantic Web Services.
González, Alejandro Rodríguez; Callahan, Alison; Cruz-Toledo, José; Garcia, Adrian; Egaña Aranguren, Mikel; Dumontier, Michel; Wilkinson, Mark D
2014-01-01
Two distinct trends are emerging with respect to how data is shared, collected, and analyzed within the bioinformatics community. First, Linked Data, exposed as SPARQL endpoints, promises to make data easier to collect and integrate by moving towards the harmonization of data syntax, descriptive vocabularies, and identifiers, as well as providing a standardized mechanism for data access. Second, Web Services, often linked together into workflows, normalize data access and create transparent, reproducible scientific methodologies that can, in principle, be re-used and customized to suit new scientific questions. Constructing queries that traverse semantically-rich Linked Data requires substantial expertise, yet traditional RESTful or SOAP Web Services cannot adequately describe the content of a SPARQL endpoint. We propose that content-driven Semantic Web Services can enable facile discovery of Linked Data, independent of their location. We use a well-curated Linked Dataset - OpenLifeData - and utilize its descriptive metadata to automatically configure a series of more than 22,000 Semantic Web Services that expose all of its content via the SADI set of design principles. The OpenLifeData SADI services are discoverable via queries to the SHARE registry and easy to integrate into new or existing bioinformatics workflows and analytical pipelines. We demonstrate the utility of this system through comparison of Web Service-mediated data access with traditional SPARQL, and note that this approach not only simplifies data retrieval, but simultaneously provides protection against resource-intensive queries. We show, through a variety of different clients and examples of varying complexity, that data from the myriad OpenLifeData services can be recovered without any need for prior knowledge of the content or structure of the SPARQL endpoints. We also demonstrate that, via clients such as SHARE, the complexity of federated SPARQL queries is dramatically reduced.
Developing A Web-based User Interface for Semantic Information Retrieval
NASA Technical Reports Server (NTRS)
Berrios, Daniel C.; Keller, Richard M.
2003-01-01
While there are now a number of languages and frameworks that enable computer-based systems to search stored data semantically, the optimal design for effective user interfaces for such systems is still unclear. Such interfaces should mask unnecessary query detail from users, yet still allow them to build queries of arbitrary complexity without significant restrictions. We developed a user interface supporting semantic query generation for SemanticOrganizer, a tool used by scientists and engineers at NASA to construct networks of knowledge and data. Through this interface users can select node types, node attributes and node links to build ad-hoc semantic queries for searching the SemanticOrganizer network.
Improving Concept-Based Web Image Retrieval by Mixing Semantically Similar Greek Queries
ERIC Educational Resources Information Center
Lazarinis, Fotis
2008-01-01
Purpose: Image searching is a common activity for web users. Search engines offer image retrieval services based on textual queries. Previous studies have shown that web searching is more demanding when the search is not in English and does not use a Latin-based language. The aim of this paper is to explore the behaviour of the major search…
Query Results Clustering by Extending SPARQL with CLUSTER BY
NASA Astrophysics Data System (ADS)
Ławrynowicz, Agnieszka
The task of dynamically clustering search results has proved useful in the Web context, where the user often does not know the granularity of the search results in advance. The goal of this paper is to provide a declarative way of invoking dynamic clustering of the results of queries submitted over Semantic Web data. To achieve this goal the paper proposes an approach that extends SPARQL with clustering abilities. The approach introduces a new statement, CLUSTER BY, into the SPARQL grammar and proposes semantics for this extension.
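What a CLUSTER BY clause adds on top of plain SPARQL can be sketched as a post-processing step over the solution sequence of a SELECT query. In the toy version below the rows are hard-coded stand-ins for query solutions and the "clustering" is a trivial grouping on one variable; the paper's actual semantics allow plugging in a real clustering algorithm over the bound values.

```python
from collections import defaultdict

# Stand-ins for SPARQL SELECT solutions (variable -> bound value).
solutions = [
    {"paper": "ex:p1", "topic": "rdf"},
    {"paper": "ex:p2", "topic": "xml"},
    {"paper": "ex:p3", "topic": "rdf"},
]

def cluster_by(rows, var):
    """Group query solutions by the value bound to one variable,
    as a declarative 'CLUSTER BY ?var' clause would."""
    clusters = defaultdict(list)
    for row in rows:
        clusters[row[var]].append(row["paper"])
    return dict(clusters)

clusters = cluster_by(solutions, "topic")
```

The declarative form keeps this grouping inside the query language itself, so the engine can choose when and how to cluster rather than leaving it to client code.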
Moby and Moby 2: creatures of the deep (web).
Vandervalk, Ben P; McCarthy, E Luke; Wilkinson, Mark D
2009-03-01
Facile and meaningful integration of data from disparate resources is the 'holy grail' of bioinformatics. Some resources have begun to address this problem by providing their data using Semantic Web standards, specifically the Resource Description Framework (RDF) and the Web Ontology Language (OWL). Unfortunately, adoption of Semantic Web standards has been slow overall, and even in cases where the standards are being utilized, interconnectivity between resources is rare. In response, we have seen the emergence of centralized 'semantic warehouses' that collect public data from third parties, integrate it, translate it into OWL/RDF and provide it to the community as a unified and queryable resource. One limitation of the warehouse approach is that queries are confined to the resources that have been selected for inclusion. A related problem, perhaps of greater concern, is that the majority of bioinformatics data exists in the 'Deep Web'-that is, the data does not exist until an application or analytical tool is invoked, and therefore does not have a predictable Web address. The inability to utilize Uniform Resource Identifiers (URIs) to address this data is a barrier to its accessibility via URI-centric Semantic Web technologies. Here we examine 'The State of the Union' for the adoption of Semantic Web standards in the health care and life sciences domain by key bioinformatics resources, explore the nature and connectivity of several community-driven semantic warehousing projects, and report on our own progress with the CardioSHARE/Moby-2 project, which aims to make the resources of the Deep Web transparently accessible through SPARQL queries.
Collaborative E-Learning Using Semantic Course Blog
ERIC Educational Resources Information Center
Lu, Lai-Chen; Yeh, Ching-Long
2008-01-01
Collaborative e-learning delivers many enhancements to e-learning technology; it enables students to collaborate with each other and improves their learning efficiency. Semantic blog combines semantic Web and blog technology that users can import, export, view, navigate, and query the blog. We developed a semantic course blog for collaborative…
NASA Astrophysics Data System (ADS)
Auer, M.; Agugiaro, G.; Billen, N.; Loos, L.; Zipf, A.
2014-05-01
Many important Cultural Heritage sites have been studied over long periods of time, by different researchers, with different technical equipment, methods and intentions. This has led to huge amounts of heterogeneous "traditional" datasets and formats. The rising popularity of 3D models in the field of Cultural Heritage in recent years has brought additional data formats and makes it even more necessary to find solutions to manage, publish and study these data in an integrated way. The MayaArch3D project aims to realize such an integrative approach by establishing a web-based research platform that brings spatial and non-spatial databases together and provides visualization and analysis tools. In particular, the 3D components of the platform use hierarchical segmentation concepts to structure the data and to perform queries on semantic entities. This paper presents a database schema that organizes not only segmented models but also different Levels-of-Detail and other representations of the same entity. It is implemented in a spatial database which allows the storing of georeferenced 3D data. This enables organization and queries by semantic, geometric and spatial properties. As a service for the delivery of the segmented models, a standardization candidate of the Open Geospatial Consortium (OGC), the Web 3D Service (W3DS), has been extended to cope with the new database schema and deliver a web-friendly format for WebGL rendering. Finally, a generic user interface is presented which uses the segments as a navigation metaphor to browse and query the semantic segmentation levels and retrieve information from an external database of the German Archaeological Institute (DAI).
AlzPharm: integration of neurodegeneration data using RDF.
Lam, Hugo Y K; Marenco, Luis; Clark, Tim; Gao, Yong; Kinoshita, June; Shepherd, Gordon; Miller, Perry; Wu, Elizabeth; Wong, Gwendolyn T; Liu, Nian; Crasto, Chiquito; Morse, Thomas; Stephens, Susie; Cheung, Kei-Hoi
2007-05-09
Neuroscientists often need to access a wide range of data sets distributed over the Internet. These data sets, however, are typically neither integrated nor interoperable, resulting in a barrier to answering complex neuroscience research questions. Domain ontologies can enable the querying of heterogeneous data sets, but they are not sufficient for neuroscience since the data of interest commonly span multiple research domains. To this end, e-Neuroscience seeks to provide an integrated platform for neuroscientists to discover new knowledge through seamless integration of the very diverse types of neuroscience data. Here we present a Semantic Web approach to building this e-Neuroscience framework by using the Resource Description Framework (RDF) and its vocabulary description language, RDF Schema (RDFS), as a standard data model to facilitate both representation and integration of the data. We have constructed a pilot ontology for BrainPharm (a subset of SenseLab) using RDFS and then converted a subset of the BrainPharm data into RDF according to the ontological structure. We have also integrated the converted BrainPharm data with existing RDF hypothesis and publication data from a pilot version of SWAN (Semantic Web Applications in Neuromedicine). Our implementation uses the RDF Data Model in Oracle Database 10g release 2 for data integration, query, and inference, while our Web interface allows users to query the data and retrieve the results in a convenient fashion. Accessing and integrating biomedical data which cuts across multiple disciplines will be increasingly indispensable and beneficial to neuroscience researchers. The Semantic Web approach we undertook has demonstrated a promising way to semantically integrate data sets created independently. It also shows how advanced queries and inferences can be performed over the integrated data, which are hard to achieve using traditional data integration approaches.
Our pilot results suggest that our Semantic Web approach is suitable for realizing e-Neuroscience and generic enough to be applied in other biomedical fields.
AlzPharm: integration of neurodegeneration data using RDF
Lam, Hugo YK; Marenco, Luis; Clark, Tim; Gao, Yong; Kinoshita, June; Shepherd, Gordon; Miller, Perry; Wu, Elizabeth; Wong, Gwendolyn T; Liu, Nian; Crasto, Chiquito; Morse, Thomas; Stephens, Susie; Cheung, Kei-Hoi
2007-01-01
Background Neuroscientists often need to access a wide range of data sets distributed over the Internet. These data sets, however, are typically neither integrated nor interoperable, resulting in a barrier to answering complex neuroscience research questions. Domain ontologies can enable the querying of heterogeneous data sets, but they are not sufficient for neuroscience since the data of interest commonly span multiple research domains. To this end, e-Neuroscience seeks to provide an integrated platform for neuroscientists to discover new knowledge through seamless integration of the very diverse types of neuroscience data. Here we present a Semantic Web approach to building this e-Neuroscience framework by using the Resource Description Framework (RDF) and its vocabulary description language, RDF Schema (RDFS), as a standard data model to facilitate both representation and integration of the data. Results We have constructed a pilot ontology for BrainPharm (a subset of SenseLab) using RDFS and then converted a subset of the BrainPharm data into RDF according to the ontological structure. We have also integrated the converted BrainPharm data with existing RDF hypothesis and publication data from a pilot version of SWAN (Semantic Web Applications in Neuromedicine). Our implementation uses the RDF Data Model in Oracle Database 10g release 2 for data integration, query, and inference, while our Web interface allows users to query the data and retrieve the results in a convenient fashion. Conclusion Accessing and integrating biomedical data that cut across multiple disciplines will be increasingly indispensable and beneficial to neuroscience researchers. The Semantic Web approach we undertook has demonstrated a promising way to semantically integrate data sets created independently. It also shows how advanced queries and inferences can be performed over the integrated data, which are hard to achieve using traditional data integration approaches.
Our pilot results suggest that our Semantic Web approach is suitable for realizing e-Neuroscience and generic enough to be applied in other biomedical fields. PMID:17493287
CNTRO: A Semantic Web Ontology for Temporal Relation Inferencing in Clinical Narratives.
Tao, Cui; Wei, Wei-Qi; Solbrig, Harold R; Savova, Guergana; Chute, Christopher G
2010-11-13
Using Semantic-Web specifications to represent temporal information in clinical narratives is an important step for temporal reasoning and answering time-oriented queries. Existing temporal models are either not compatible with the powerful reasoning tools developed for the Semantic Web, or designed only for structured clinical data, and therefore are not ready to be applied directly to natural-language-based clinical narrative reports. We have developed a Semantic-Web ontology called the Clinical Narrative Temporal Relation Ontology. Using this ontology, temporal information in clinical narratives can be represented as RDF (Resource Description Framework) triples. More temporal information and relations can then be inferred by Semantic-Web based reasoning tools. Experimental results show that this ontology can successfully represent temporal information in real clinical narratives.
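To make the inference step above concrete, here is a minimal sketch (not the CNTRO implementation) of representing clinical events as RDF-style triples and inferring additional temporal relations by transitive closure over a "before" predicate. The event names and the `:before` predicate are illustrative assumptions only.

```python
# Clinical events as (subject, predicate, object) triples; the facts
# below are invented for illustration.
triples = {
    ("admission", ":before", "biopsy"),
    ("biopsy", ":before", "chemotherapy"),
}

def infer_before(triples):
    """Apply transitivity of :before until no new triples are derived."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for (a, _, b) in list(inferred):
            for (c, _, d) in list(inferred):
                if b == c and (a, ":before", d) not in inferred:
                    inferred.add((a, ":before", d))
                    changed = True
    return inferred

closed = infer_before(triples)
# The relation admission :before chemotherapy is now derivable even
# though it was never asserted explicitly.
```

A Semantic Web reasoner performs this kind of closure (and much more) from ontology axioms; the loop here just shows the shape of the inference.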
Linked Registries: Connecting Rare Diseases Patient Registries through a Semantic Web Layer
González-Castro, Lorena; Carta, Claudio; van der Horst, Eelke; Lopes, Pedro; Kaliyaperumal, Rajaram; Thompson, Mark; Thompson, Rachel; Queralt-Rosinach, Núria; Lopez, Estrella; Wood, Libby; Robertson, Agata; Lamanna, Claudia; Gilling, Mette; Orth, Michael; Merino-Martinez, Roxana; Taruscio, Domenica; Lochmüller, Hanns
2017-01-01
Patient registries are an essential tool to increase current knowledge regarding rare diseases. Understanding these data is a vital step to improve patient treatments and to create the most adequate tools for personalized medicine. However, the growing number of disease-specific patient registries also brings new technical challenges. Usually, these systems are developed as closed data silos, with independent formats and models, lacking comprehensive mechanisms to enable data sharing. To tackle these challenges, we developed a Semantic Web based solution that allows connecting distributed and heterogeneous registries, enabling the federation of knowledge between multiple independent environments. This semantic layer creates a holistic view over a set of anonymised registries, supporting semantic data representation, integrated access, and querying. The implemented system gave us the opportunity to answer challenging questions across dispersed rare disease patient registries. The interconnection between those registries using Semantic Web technologies benefits our final solution in that we can query single or multiple instances according to our needs. The outcome is a unique semantic layer, connecting miscellaneous registries and delivering a lightweight holistic perspective over the wealth of knowledge stemming from linked rare disease patient registries. PMID:29214177
Linked Registries: Connecting Rare Diseases Patient Registries through a Semantic Web Layer.
Sernadela, Pedro; González-Castro, Lorena; Carta, Claudio; van der Horst, Eelke; Lopes, Pedro; Kaliyaperumal, Rajaram; Thompson, Mark; Thompson, Rachel; Queralt-Rosinach, Núria; Lopez, Estrella; Wood, Libby; Robertson, Agata; Lamanna, Claudia; Gilling, Mette; Orth, Michael; Merino-Martinez, Roxana; Posada, Manuel; Taruscio, Domenica; Lochmüller, Hanns; Robinson, Peter; Roos, Marco; Oliveira, José Luís
2017-01-01
Patient registries are an essential tool to increase current knowledge regarding rare diseases. Understanding these data is a vital step to improve patient treatments and to create the most adequate tools for personalized medicine. However, the growing number of disease-specific patient registries also brings new technical challenges. Usually, these systems are developed as closed data silos, with independent formats and models, lacking comprehensive mechanisms to enable data sharing. To tackle these challenges, we developed a Semantic Web based solution that allows connecting distributed and heterogeneous registries, enabling the federation of knowledge between multiple independent environments. This semantic layer creates a holistic view over a set of anonymised registries, supporting semantic data representation, integrated access, and querying. The implemented system gave us the opportunity to answer challenging questions across dispersed rare disease patient registries. The interconnection between those registries using Semantic Web technologies benefits our final solution in that we can query single or multiple instances according to our needs. The outcome is a unique semantic layer, connecting miscellaneous registries and delivering a lightweight holistic perspective over the wealth of knowledge stemming from linked rare disease patient registries.
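The federation idea described in this abstract can be sketched very roughly: each anonymised registry keeps its own data, and a query is answered by visiting every registry rather than by centralising the records. The registry contents and the shared "disease" field below are invented assumptions, not the paper's actual data model.

```python
# Two independent registry silos, modelled as lists of records.
registry_a = [{"patient": "a1", "disease": "DMD"},
              {"patient": "a2", "disease": "HD"}]
registry_b = [{"patient": "b1", "disease": "DMD"}]

def federated_count(registries, disease):
    """Answer one question across all registries without moving the data
    into a central store: each silo is queried in place."""
    return sum(1 for reg in registries
               for rec in reg if rec["disease"] == disease)

federated_count([registry_a, registry_b], "DMD")  # counts across both silos
```

A real semantic layer would mediate this through shared ontologies and SPARQL endpoints; the point of the sketch is only that the query, not the data, travels.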
vSPARQL: A View Definition Language for the Semantic Web
Shaw, Marianne; Detwiler, Landon T.; Noy, Natalya; Brinkley, James; Suciu, Dan
2010-01-01
Translational medicine applications would like to leverage the biological and biomedical ontologies, vocabularies, and data sets available on the semantic web. We present a general solution for RDF information set reuse inspired by database views. Our view definition language, vSPARQL, allows applications to specify the exact content that they are interested in and how that content should be restructured or modified. Applications can access relevant content by querying against these view definitions. We evaluate the expressivity of our approach by defining views for practical use cases and comparing our view definition language to existing query languages. PMID:20800106
Designing learning management system interoperability in semantic web
NASA Astrophysics Data System (ADS)
Anistyasari, Y.; Sarno, R.; Rochmawati, N.
2018-01-01
The extensive adoption of learning management systems (LMS) has set the focus on the interoperability requirement. Interoperability is the ability of different computer systems, applications, or services to communicate, share, and exchange data, information, and knowledge in a precise, effective, and consistent way. Semantic web technology and the use of ontologies are able to provide the required computational semantics and interoperability for the automation of tasks in an LMS. The purpose of this study is to design learning management system interoperability in the semantic web, which currently has not been investigated deeply. Moodle is utilized to design the interoperability. Several database tables of Moodle are enhanced and some features are added. Semantic web interoperability is provided by exploiting an ontology in the content materials. The ontology is further utilized as a search tool to match users' queries with available courses. It is concluded that LMS interoperability in the Semantic Web is feasible.
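The course-search step described above (matching user queries against ontology concepts attached to course materials) can be sketched as follows. The course identifiers and concept links are invented for illustration; the paper's actual Moodle schema is not reproduced here.

```python
# Hypothetical mapping from courses to ontology concepts annotated on
# their content materials.
course_concepts = {
    "CS101": {"programming", "python", "algorithms"},
    "DB201": {"sql", "databases", "normalisation"},
}

def find_courses(query_terms):
    """Return courses whose annotated concepts overlap the query terms."""
    terms = {t.lower() for t in query_terms}
    return sorted(c for c, concepts in course_concepts.items()
                  if terms & concepts)

find_courses(["Databases"])  # matches via the concept annotation
```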
Noesis: Ontology based Scoped Search Engine and Resource Aggregator for Atmospheric Science
NASA Astrophysics Data System (ADS)
Ramachandran, R.; Movva, S.; Li, X.; Cherukuri, P.; Graves, S.
2006-12-01
The goal for search engines is to return results that are both accurate and complete: a search engine should find only what you really want, and find everything you really want. Search engines (even meta search engines) lack semantics. Search is based simply on string matching between the user's query term and the resource database; the semantics associated with the search string are not captured. For example, if an atmospheric scientist is searching for "pressure" related web resources, most search engines return inaccurate results such as web resources related to blood pressure. This presentation describes Noesis, a meta-search engine and resource aggregator that uses domain ontologies to provide scoped search capabilities. Noesis uses domain ontologies to help the user scope the search query to ensure that the search results are both accurate and complete. The domain ontologies guide the user to refine their search query and thereby reduce the user's burden of experimenting with different search strings. Semantics are captured by refining the query terms to cover synonyms, specializations, generalizations, and related concepts. Noesis also serves as a resource aggregator. It categorizes the search results from different online resources, such as education materials, publications, datasets, and web search engines, that might be of interest to the user.
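The query-scoping step this abstract describes can be sketched in a few lines: expand a search term with synonyms and specializations drawn from a domain ontology before issuing it to the underlying engines. The ontology content below is invented for illustration and is not Noesis's actual vocabulary.

```python
# Toy domain ontology fragment for the "pressure" example above.
ontology = {
    "pressure": {
        "synonyms": ["atmospheric pressure"],
        "specializations": ["sea-level pressure", "vapor pressure"],
    }
}

def scoped_query(term):
    """Expand a query term with related ontology concepts; unknown
    terms pass through unchanged."""
    entry = ontology.get(term, {})
    return ([term]
            + entry.get("synonyms", [])
            + entry.get("specializations", []))
```

Issuing the expanded term list instead of the bare string is what keeps "pressure" scoped to atmospheric science rather than blood pressure.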
Standard biological parts knowledgebase.
Galdzicki, Michal; Rodriguez, Cesar; Chandran, Deepak; Sauro, Herbert M; Gennari, John H
2011-02-24
We have created the Knowledgebase of Standard Biological Parts (SBPkb) as a publicly accessible Semantic Web resource for synthetic biology (sbolstandard.org). The SBPkb allows researchers to query and retrieve standard biological parts for research and use in synthetic biology. Its initial version includes all of the information about parts stored in the Registry of Standard Biological Parts (partsregistry.org). SBPkb transforms this information so that it is computable, using our semantic framework for synthetic biology parts. This framework, known as SBOL-semantic, was built as part of the Synthetic Biology Open Language (SBOL), a project of the Synthetic Biology Data Exchange Group. SBOL-semantic represents commonly used synthetic biology entities, and its purpose is to improve the distribution and exchange of descriptions of biological parts. In this paper, we describe the data, our methods for transformation to SBPkb, and finally, we demonstrate the value of our knowledgebase with a set of sample queries. We use RDF technology and SPARQL queries to retrieve candidate "promoter" parts that are known to be both negatively and positively regulated. This method provides new web-based data access to perform searches for parts that are not currently possible.
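The sample query described above (promoters that are both negatively and positively regulated) can be sketched over a toy triple set. The part names, predicates, and values below are illustrative assumptions, not SBOL-semantic's actual terms.

```python
# Invented triples standing in for SBPkb content.
parts = {
    ("BBa_R0051", "type", "promoter"),
    ("BBa_R0051", "regulation", "negative"),
    ("BBa_R0051", "regulation", "positive"),
    ("BBa_J23100", "type", "promoter"),
    ("BBa_J23100", "regulation", "positive"),
}

def dual_regulated_promoters(triples):
    """Select promoter parts carrying both regulation annotations,
    mirroring the conjunctive pattern a SPARQL query would express."""
    subjects = {s for (s, p, o) in triples if p == "type" and o == "promoter"}
    return sorted(
        s for s in subjects
        if (s, "regulation", "negative") in triples
        and (s, "regulation", "positive") in triples
    )
```

In SPARQL the same selection would be a basic graph pattern with three triple patterns sharing one subject variable.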
Web information retrieval based on ontology
NASA Astrophysics Data System (ADS)
Zhang, Jian
2013-03-01
The purpose of Information Retrieval (IR) is to find a set of documents that are relevant to a specific information need of a user. The traditional Information Retrieval model commonly used in commercial search engines is based on keyword indexing and Boolean logic queries. One big drawback of traditional information retrieval systems is that they typically retrieve information without an explicitly defined domain of interest to the users, so a lot of irrelevant information is returned, burdening users with picking useful answers out of the irrelevant results. In order to tackle this issue, many semantic web information retrieval models have been proposed recently. The main advantage of the Semantic Web is to enhance search mechanisms with the use of ontology mechanisms. In this paper, we present our approach to personalizing a web search engine based on ontology. In addition, key techniques are also discussed in our paper. Compared to previous research, our work concentrates on semantic similarity and the whole process, including query submission and information annotation.
On2broker: Semantic-Based Access to Information Sources at the WWW.
ERIC Educational Resources Information Center
Fensel, Dieter; Angele, Jurgen; Decker, Stefan; Erdmann, Michael; Schnurr, Hans-Peter; Staab, Steffen; Studer, Rudi; Witt, Andreas
On2broker provides brokering services to improve access to heterogeneous, distributed, and semistructured information sources as they are presented in the World Wide Web. It relies on the use of ontologies to make explicit the semantics of Web pages. This paper discusses the general architecture and main components (i.e., query engine, information…
Pathak, Jyotishman; Kiefer, Richard C.; Chute, Christopher G.
2012-01-01
The ability to conduct genome-wide association studies (GWAS) has enabled new exploration of how genetic variations contribute to health and disease etiology. One of the key requirements to perform GWAS is the identification of subject cohorts with accurate classification of disease phenotypes. In this work, we study how emerging Semantic Web technologies can be applied in conjunction with clinical data stored in electronic health records (EHRs) to accurately identify subjects with specific diseases for inclusion in cohort studies. In particular, we demonstrate the role of using Resource Description Framework (RDF) for representing EHR data and enabling federated querying and inferencing via standardized Web protocols for identifying subjects with Diabetes Mellitus. Our study highlights the potential of using Web-scale data federation approaches to execute complex queries. PMID:22779040
Incremental Query Rewriting with Resolution
NASA Astrophysics Data System (ADS)
Riazanov, Alexandre; Aragão, Marcelo A. T.
We address the problem of semantic querying of relational databases (RDB) modulo knowledge bases using very expressive knowledge representation formalisms, such as full first-order logic or its various fragments. We propose to use a resolution-based first-order logic (FOL) reasoner for computing schematic answers to deductive queries, with the subsequent translation of these schematic answers to SQL queries which are evaluated using a conventional relational DBMS. We call our method incremental query rewriting, because an original semantic query is rewritten into a (potentially infinite) series of SQL queries. In this chapter, we outline the main idea of our technique - using abstractions of databases and constrained clauses for deriving schematic answers, and provide completeness and soundness proofs to justify the applicability of this technique to the case of resolution for FOL without equality. The proposed method can be directly used with regular RDBs, including legacy databases. Moreover, we propose it as a potential basis for an efficient Web-scale semantic search technology.
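A very rough sketch of the incremental-rewriting idea above: a semantic query is turned into a (potentially unbounded) stream of concrete SQL queries, each derived from one schematic answer. In the real method the schematic answers come from a resolution-based FOL reasoner; the answers and SQL templates below are invented placeholders.

```python
def schematic_answers():
    """Stand-in for the reasoner: yield schematic answers one at a time.
    These two answers are invented for illustration."""
    yield {"table": "employee", "column": "name"}
    yield {"table": "contractor", "column": "name"}

def rewrite(condition):
    """Translate each schematic answer into a concrete SQL query; the
    series could in principle be infinite, so this is a generator."""
    for ans in schematic_answers():
        yield f"SELECT {ans['column']} FROM {ans['table']} WHERE {condition};"

queries = list(rewrite("dept = 'R&D'"))
# Each query in the series would be evaluated by a conventional RDBMS.
```

The key design point preserved here is laziness: answers are produced and evaluated incrementally rather than computed up front.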
Using Web Ontology Language to Integrate Heterogeneous Databases in the Neurosciences
Lam, Hugo Y.K.; Marenco, Luis; Shepherd, Gordon M.; Miller, Perry L.; Cheung, Kei-Hoi
2006-01-01
Integrative neuroscience involves the integration and analysis of diverse types of neuroscience data involving many different experimental techniques. This data will increasingly be distributed across many heterogeneous databases that are web-accessible. Currently, these databases do not expose their schemas (database structures) and their contents to web applications/agents in a standardized, machine-friendly way. This limits database interoperation. To address this problem, we describe a pilot project that illustrates how neuroscience databases can be expressed using the Web Ontology Language, which is a semantically-rich ontological language, as a common data representation language to facilitate complex cross-database queries. In this pilot project, an existing tool called “D2RQ” was used to translate two neuroscience databases (NeuronDB and CoCoDat) into OWL, and the resulting OWL ontologies were then merged. An OWL-based reasoner (Racer) was then used to provide a sophisticated query language (nRQL) to perform integrated queries across the two databases based on the merged ontology. This pilot project is one step toward exploring the use of semantic web technologies in the neurosciences. PMID:17238384
Parikh, Priti P; Minning, Todd A; Nguyen, Vinh; Lalithsena, Sarasi; Asiaee, Amir H; Sahoo, Satya S; Doshi, Prashant; Tarleton, Rick; Sheth, Amit P
2012-01-01
Research on the biology of parasites requires a sophisticated and integrated computational platform to query and analyze large volumes of data, representing both unpublished (internal) and public (external) data sources. Effective analysis of an integrated data resource using knowledge discovery tools would significantly aid biologists in conducting their research, for example, through identifying various intervention targets in parasites and in deciding the future direction of ongoing as well as planned projects. A key challenge in achieving this objective is the heterogeneity between the internal lab data, usually stored as flat files, Excel spreadsheets or custom-built databases, and the external databases. Reconciling the different forms of heterogeneity and effectively integrating data from disparate sources is a nontrivial task for biologists and requires a dedicated informatics infrastructure. Thus, we developed an integrated environment using Semantic Web technologies that may provide biologists the tools for managing and analyzing their data, without the need for acquiring in-depth computer science knowledge. We developed a semantic problem-solving environment (SPSE) that uses ontologies to integrate internal lab data with external resources in a Parasite Knowledge Base (PKB), which has the ability to query across these resources in a unified manner. The SPSE includes Web Ontology Language (OWL)-based ontologies, experimental data with its provenance information represented using the Resource Description Framework (RDF), and a visual querying tool, Cuebee, that features integrated use of Web services. We demonstrate the use and benefit of SPSE using example queries for identifying gene knockout targets of Trypanosoma cruzi for vaccine development. Answers to these queries involve looking up multiple sources of data, linking them together and presenting the results.
The SPSE facilitates parasitologists in leveraging the growing, but disparate, parasite data resources by offering an integrative platform that utilizes Semantic Web techniques, while keeping their workload increase minimal.
NASA Astrophysics Data System (ADS)
Arenas, Marcelo; Gutierrez, Claudio; Pérez, Jorge
The Resource Description Framework (RDF) is the standard data model for representing information about World Wide Web resources. In January 2008, the W3C released its recommendation for querying RDF data: a query language called SPARQL. In this chapter, we give a detailed description of the semantics of this language. We start by focusing on the definition of a formal semantics for the core part of SPARQL, and then move to the definition for the entire language, including all the features in the specification of SPARQL by the W3C, such as blank nodes in graph patterns and bag semantics for solutions.
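The core of the SPARQL semantics the chapter formalizes is the evaluation of a basic graph pattern against an RDF graph, producing solution mappings (variable bindings). Here is an illustrative miniature evaluator, not the chapter's formal definition: triples are plain tuples, and names beginning with "?" are variables.

```python
# A tiny RDF graph; resources are plain strings for illustration.
graph = {
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
}

def match(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`;
    return the extended binding or None on failure."""
    b = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if p in b and b[p] != t:
                return None        # variable already bound differently
            b[p] = t
        elif p != t:
            return None            # constant mismatch
    return b

def evaluate(bgp, graph):
    """Evaluate a basic graph pattern: join the solutions of each
    triple pattern, keeping compatible bindings (bag semantics)."""
    solutions = [{}]
    for pattern in bgp:
        solutions = [b for s in solutions for t in graph
                     if (b := match(pattern, t, s)) is not None]
    return solutions

# Whom does alice know, and whom do they know in turn?
evaluate([("alice", "knows", "?x"), ("?x", "knows", "?y")], graph)
```

Solutions are kept in a list rather than a set to echo SPARQL's bag semantics, where duplicate mappings are preserved.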
vSPARQL: a view definition language for the semantic web.
Shaw, Marianne; Detwiler, Landon T; Noy, Natalya; Brinkley, James; Suciu, Dan
2011-02-01
Translational medicine applications would like to leverage the biological and biomedical ontologies, vocabularies, and data sets available on the semantic web. We present a general solution for RDF information set reuse inspired by database views. Our view definition language, vSPARQL, allows applications to specify the exact content that they are interested in and how that content should be restructured or modified. Applications can access relevant content by querying against these view definitions. We evaluate the expressivity of our approach by defining views for practical use cases and comparing our view definition language to existing query languages. Copyright © 2010 Elsevier Inc. All rights reserved.
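The database-view analogy behind vSPARQL can be sketched loosely: a "view" is a stored query over an RDF information set, and applications query the view's content rather than the full source. The predicates and triples below are illustrative assumptions, not vSPARQL syntax.

```python
# Invented source triples.
source = {
    ("aspirin", "treats", "headache"),
    ("aspirin", "hasCost", "low"),
    ("ibuprofen", "treats", "inflammation"),
}

def define_view(graph, predicate):
    """Restrict the graph to one predicate, like a projection view:
    the exact content an application is interested in."""
    return {t for t in graph if t[1] == predicate}

treats_view = define_view(source, "treats")
# Queries against treats_view never see the cost information.
```

vSPARQL's actual views can also restructure and modify content, not just filter it; this sketch shows only the simplest case.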
Auditing the NCI Thesaurus with Semantic Web Technologies
Mougin, Fleur; Bodenreider, Olivier
2008-01-01
Auditing biomedical terminologies often results in the identification of inconsistencies and thus helps to improve their quality. In this paper, we present a method based on Semantic Web technologies for auditing biomedical terminologies and apply it to the NCI thesaurus. We stored the NCI thesaurus concepts and their properties in an RDF triple store. By querying this store, we assessed the consistency of both hierarchical and associative relations from the NCI thesaurus among themselves and with corresponding relations in the UMLS Semantic Network. We show that the consistency is better for associative relations than for hierarchical relations. Causes for inconsistency and benefits from using Semantic Web technologies for auditing purposes are discussed. PMID:18999265
Auditing the NCI thesaurus with semantic web technologies.
Mougin, Fleur; Bodenreider, Olivier
2008-11-06
Auditing biomedical terminologies often results in the identification of inconsistencies and thus helps to improve their quality. In this paper, we present a method based on Semantic Web technologies for auditing biomedical terminologies and apply it to the NCI thesaurus. We stored the NCI thesaurus concepts and their properties in an RDF triple store. By querying this store, we assessed the consistency of both hierarchical and associative relations from the NCI thesaurus among themselves and with corresponding relations in the UMLS Semantic Network. We show that the consistency is better for associative relations than for hierarchical relations. Causes for inconsistency and benefits from using Semantic Web technologies for auditing purposes are discussed.
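The auditing idea above can be sketched as a consistency check: an asserted relation between two concepts is flagged when the relation is not permitted between their semantic types in a reference network. All concepts, types, and relations below are invented for illustration and are not NCI Thesaurus or UMLS content.

```python
# (source type, relation, target type) combinations permitted by a
# toy reference network.
allowed = {
    ("Disease", "associated_with", "Gene"),
    ("Disease", "isa", "Disease"),
}

# Assertions: (subject, subject type, relation, object, object type).
assertions = [
    ("melanoma", "Disease", "isa", "neoplasm", "Disease"),
    ("melanoma", "Disease", "isa", "BRAF", "Gene"),  # type-inconsistent
]

def audit(assertions, allowed):
    """Return the assertions whose typed relation is not sanctioned."""
    return [(s, r, o) for (s, st, r, o, ot) in assertions
            if (st, r, ot) not in allowed]

audit(assertions, allowed)  # flags the Disease-isa-Gene assertion
```

In the paper this comparison is done by querying an RDF triple store holding both the thesaurus and the reference relations.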
Standard Biological Parts Knowledgebase
Galdzicki, Michal; Rodriguez, Cesar; Chandran, Deepak; Sauro, Herbert M.; Gennari, John H.
2011-01-01
We have created the Knowledgebase of Standard Biological Parts (SBPkb) as a publicly accessible Semantic Web resource for synthetic biology (sbolstandard.org). The SBPkb allows researchers to query and retrieve standard biological parts for research and use in synthetic biology. Its initial version includes all of the information about parts stored in the Registry of Standard Biological Parts (partsregistry.org). SBPkb transforms this information so that it is computable, using our semantic framework for synthetic biology parts. This framework, known as SBOL-semantic, was built as part of the Synthetic Biology Open Language (SBOL), a project of the Synthetic Biology Data Exchange Group. SBOL-semantic represents commonly used synthetic biology entities, and its purpose is to improve the distribution and exchange of descriptions of biological parts. In this paper, we describe the data, our methods for transformation to SBPkb, and finally, we demonstrate the value of our knowledgebase with a set of sample queries. We use RDF technology and SPARQL queries to retrieve candidate "promoter" parts that are known to be both negatively and positively regulated. This method provides new web-based data access to perform searches for parts that are not currently possible. PMID:21390321
Hybrid Schema Matching for Deep Web
NASA Astrophysics Data System (ADS)
Chen, Kerui; Zuo, Wanli; He, Fengling; Chen, Yongheng
Schema matching is the process of identifying semantic mappings, or correspondences, between two or more schemas. Schema matching is a first step and critical part of data integration. For schema matching of the deep web, most research is interested only in the query interface and rarely pays attention to the abundant schema information contained in query result pages. This paper proposes a hybrid schema matching technique that combines attributes appearing in the query structures and query results of different data sources and mines the matching schemas from them. Experimental results demonstrate the effectiveness of this method in improving the accuracy of schema matching.
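The hybrid idea described above can be sketched by pooling attribute labels from a source's query interface and its result pages, then matching across sources by normalised label equality. This equality matcher is a deliberately simple stand-in for the paper's actual technique, and the attribute names are invented.

```python
def normalise(label):
    """Reduce a label to a comparable form."""
    return label.strip().lower().replace("_", " ")

def match_schemas(source_a, source_b):
    """Pool interface and result-page attributes per source, then
    intersect the normalised label sets across sources."""
    a = {normalise(x) for x in source_a["interface"] + source_a["results"]}
    b = {normalise(x) for x in source_b["interface"] + source_b["results"]}
    return sorted(a & b)

site1 = {"interface": ["Title", "Author"], "results": ["price"]}
site2 = {"interface": ["title"], "results": ["Price", "ISBN"]}
match_schemas(site1, site2)  # attributes shared by the two sources
```

Note that "price" matches only because result-page attributes are included: it never appears in site1's query interface, which is exactly the information interface-only matchers miss.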
BioSWR – Semantic Web Services Registry for Bioinformatics
Repchevsky, Dmitry; Gelpi, Josep Ll.
2014-01-01
Despite the variety of available Web services registries specially aimed at Life Sciences, their scope is usually restricted to a limited set of well-defined types of services. While dedicated registries are generally tied to a particular format, general-purpose ones are more adherent to standards and usually rely on the Web Service Definition Language (WSDL). Although WSDL is quite flexible to support common Web services types, its lack of semantic expressiveness led to various initiatives to describe Web services via ontology languages. Nevertheless, WSDL 2.0 descriptions gained a standard representation based on the Web Ontology Language (OWL). BioSWR is a novel Web services registry that provides standard Resource Description Framework (RDF) based Web services descriptions along with the traditional WSDL based ones. The registry provides a Web-based interface for Web services registration, querying, and annotation, and is also accessible programmatically via a Representational State Transfer (REST) API or using the SPARQL Protocol and RDF Query Language. The BioSWR server is located at http://inb.bsc.es/BioSWR/ and its code is available at https://sourceforge.net/projects/bioswr/ under the LGPL license. PMID:25233118
BioSWR--semantic web services registry for bioinformatics.
Repchevsky, Dmitry; Gelpi, Josep Ll
2014-01-01
Despite the variety of available Web services registries specially aimed at Life Sciences, their scope is usually restricted to a limited set of well-defined types of services. While dedicated registries are generally tied to a particular format, general-purpose ones are more adherent to standards and usually rely on the Web Service Definition Language (WSDL). Although WSDL is quite flexible to support common Web services types, its lack of semantic expressiveness led to various initiatives to describe Web services via ontology languages. Nevertheless, WSDL 2.0 descriptions gained a standard representation based on the Web Ontology Language (OWL). BioSWR is a novel Web services registry that provides standard Resource Description Framework (RDF) based Web services descriptions along with the traditional WSDL based ones. The registry provides a Web-based interface for Web services registration, querying, and annotation, and is also accessible programmatically via a Representational State Transfer (REST) API or using the SPARQL Protocol and RDF Query Language. The BioSWR server is located at http://inb.bsc.es/BioSWR/ and its code is available at https://sourceforge.net/projects/bioswr/ under the LGPL license.
Semantic integration of information about orthologs and diseases: the OGO system.
Miñarro-Gimenez, Jose Antonio; Egaña Aranguren, Mikel; Martínez Béjar, Rodrigo; Fernández-Breis, Jesualdo Tomás; Madrid, Marisa
2011-12-01
Semantic Web technologies like RDF and OWL are currently applied in the life sciences to improve knowledge management by integrating disparate information. Many of the systems that perform such a task, however, only offer a SPARQL query interface, which is difficult to use for life scientists. We present the OGO system, which consists of a knowledge base that integrates information about orthologous sequences and genetic diseases, providing an easy-to-use, ontology-constraint-driven query interface. Such an interface allows users to define SPARQL queries through a graphical process, therefore not requiring SPARQL expertise. Copyright © 2011 Elsevier Inc. All rights reserved.
Sahoo, Satya S.; Bodenreider, Olivier; Rutter, Joni L.; Skinner, Karen J.; Sheth, Amit P.
2008-01-01
Objectives This paper illustrates how Semantic Web technologies (especially RDF, OWL, and SPARQL) can support information integration and make it easy to create semantic mashups (semantically integrated resources). In the context of understanding the genetic basis of nicotine dependence, we integrate gene and pathway information and show how three complex biological queries can be answered by the integrated knowledge base. Methods We use an ontology-driven approach to integrate two gene resources (Entrez Gene and HomoloGene) and three pathway resources (KEGG, Reactome and BioCyc), for five organisms, including humans. We created the Entrez Knowledge Model (EKoM), an information model in OWL for the gene resources, and integrated it with the extant BioPAX ontology designed for pathway resources. The integrated schema is populated with data from the pathway resources, publicly available in BioPAX-compatible format, and gene resources for which a population procedure was created. The SPARQL query language is used to formulate queries over the integrated knowledge base to answer the three biological queries. Results Simple SPARQL queries could easily identify hub genes, i.e., those genes whose gene products participate in many pathways or interact with many other gene products. The identification of the genes expressed in the brain turned out to be more difficult, due to the lack of a common identification scheme for proteins. Conclusion Semantic Web technologies provide a valid framework for information integration in the life sciences. Ontology-driven integration represents a flexible, sustainable and extensible solution to the integration of large volumes of information. Additional resources, which enable the creation of mappings between information sources, are required to compensate for heterogeneity across namespaces. Resource page http://knoesis.wright.edu/research/lifesci/integration/structured_data/JBI-2008/ PMID:18395495
Sahoo, Satya S; Bodenreider, Olivier; Rutter, Joni L; Skinner, Karen J; Sheth, Amit P
2008-10-01
This paper illustrates how Semantic Web technologies (especially RDF, OWL, and SPARQL) can support information integration and make it easy to create semantic mashups (semantically integrated resources). In the context of understanding the genetic basis of nicotine dependence, we integrate gene and pathway information and show how three complex biological queries can be answered by the integrated knowledge base. We use an ontology-driven approach to integrate two gene resources (Entrez Gene and HomoloGene) and three pathway resources (KEGG, Reactome and BioCyc), for five organisms, including humans. We created the Entrez Knowledge Model (EKoM), an information model in OWL for the gene resources, and integrated it with the extant BioPAX ontology designed for pathway resources. The integrated schema is populated with data from the pathway resources, publicly available in BioPAX-compatible format, and gene resources for which a population procedure was created. The SPARQL query language is used to formulate queries over the integrated knowledge base to answer the three biological queries. Simple SPARQL queries could easily identify hub genes, i.e., those genes whose gene products participate in many pathways or interact with many other gene products. The identification of the genes expressed in the brain turned out to be more difficult, due to the lack of a common identification scheme for proteins. Semantic Web technologies provide a valid framework for information integration in the life sciences. Ontology-driven integration represents a flexible, sustainable and extensible solution to the integration of large volumes of information. Additional resources, which enable the creation of mappings between information sources, are required to compensate for heterogeneity across namespaces. RESOURCE PAGE: http://knoesis.wright.edu/research/lifesci/integration/structured_data/JBI-2008/
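The hub-gene query described in the abstract above is easy to picture outside SPARQL as well. The sketch below counts distinct pathways per gene product over a toy triple list; the `participatesIn` predicate and all identifiers are invented for illustration and are not the actual EKoM/BioPAX terms.

```python
# Hub genes: gene products that participate in many pathways.
# Predicate and entity names are hypothetical placeholders.
from collections import defaultdict

def hub_genes(triples, min_pathways=2):
    """Return {gene: pathway count} for genes in at least min_pathways pathways."""
    pathways = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "participatesIn":
            pathways[subject].add(obj)
    return {g: len(p) for g, p in pathways.items() if len(p) >= min_pathways}

triples = [
    ("geneA", "participatesIn", "pathway1"),
    ("geneA", "participatesIn", "pathway2"),
    ("geneA", "participatesIn", "pathway3"),
    ("geneB", "participatesIn", "pathway1"),
]
print(hub_genes(triples))  # geneA qualifies with 3 pathways; geneB does not
```

In SPARQL the same idea is a `GROUP BY` over the gene with a `COUNT(DISTINCT ?pathway)` aggregate.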
An ontology-driven tool for structured data acquisition using Web forms.
Gonçalves, Rafael S; Tu, Samson W; Nyulas, Csongor I; Tierney, Michael J; Musen, Mark A
2017-08-01
Structured data acquisition is a common task that is widely performed in biomedicine. However, current solutions for this task are far from providing a means to structure data in such a way that it can be automatically employed in decision making (e.g., in our example application domain of clinical functional assessment, for determining eligibility for disability benefits) based on conclusions derived from acquired data (e.g., assessment of impaired motor function). To use data in these settings, we need it structured in a way that can be exploited by automated reasoning systems, for instance, in the Web Ontology Language (OWL), the de facto ontology language for the Web. We tackle the problem of generating Web-based assessment forms from OWL ontologies, and aggregating input gathered through these forms as an ontology of "semantically-enriched" form data that can be queried using an RDF query language, such as SPARQL. We developed an ontology-based structured data acquisition system, which we present through its specific application to the clinical functional assessment domain. We found that data gathered through our system is highly amenable to automatic analysis using queries. We demonstrated how ontologies can be used to help structure Web-based forms and to semantically enrich the data elements of the acquired structured data. The ontologies associated with the enriched data elements enable automated inferences and provide a rich vocabulary for performing queries.
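The form-generation step the abstract describes can be pictured as a mapping from property descriptions to form widgets. The sketch below is a loose illustration under invented names; the widget table and the (name, range, required) tuples are assumptions, not the paper's actual OWL-to-form rules.

```python
# Sketch of form generation from an ontology-like class description.
# Each property is a (name, range, required) tuple; the widget mapping
# below is invented for this example.
WIDGETS = {"xsd:string": "text", "xsd:integer": "number", "xsd:boolean": "checkbox"}

def form_fields(properties):
    """Map ontology-style property descriptions to form-field specs."""
    return [{"name": name, "widget": WIDGETS.get(rng, "text"), "required": required}
            for name, rng, required in properties]

assessment = [("patientId", "xsd:string", True),
              ("canClimbStairs", "xsd:boolean", False)]
for field in form_fields(assessment):
    print(field)
```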
Semantic Web repositories for genomics data using the eXframe platform.
Merrill, Emily; Corlosquet, Stéphane; Ciccarese, Paolo; Clark, Tim; Das, Sudeshna
2014-01-01
With the advent of inexpensive assay technologies, there has been an unprecedented growth in genomics data as well as the number of databases in which it is stored. In these databases, sample annotation using ontologies and controlled vocabularies is becoming more common. However, the annotation is rarely available as Linked Data, in a machine-readable format, or for standardized queries using SPARQL. This makes large-scale reuse, or integration with other knowledge bases very difficult. To address this challenge, we have developed the second generation of our eXframe platform, a reusable framework for creating online repositories of genomics experiments. This second generation model now publishes Semantic Web data. To accomplish this, we created an experiment model that covers provenance, citations, external links, assays, biomaterials used in the experiment, and the data collected during the process. The elements of our model are mapped to classes and properties from various established biomedical ontologies. Resource Description Framework (RDF) data is automatically produced using these mappings and indexed in an RDF store with a built-in SPARQL Protocol and RDF Query Language (SPARQL) endpoint. Using the open-source eXframe software, institutions and laboratories can create Semantic Web repositories of their experiments, integrate them with heterogeneous resources and make them interoperable with the vast Semantic Web of biomedical knowledge.
EAGLE: "EAGLE 'Is an' Algorithmic Graph Library for Exploration"
DOE Office of Scientific and Technical Information (OSTI.GOV)
2015-01-16
The Resource Description Framework (RDF) and SPARQL Protocol and RDF Query Language (SPARQL) were introduced about a decade ago to enable flexible schema-free data interchange on the Semantic Web. Today data scientists use the framework as a scalable graph representation for integrating, querying, exploring and analyzing data sets hosted at different sources. With increasing adoption, the need for graph mining capabilities for the Semantic Web has emerged. Today there are no tools to conduct "graph mining" on RDF standard data sets. We address that need through implementation of popular iterative graph mining algorithms (triangle count, connected component analysis, degree distribution, diversity degree, PageRank, etc.). We implement these algorithms as SPARQL queries, wrapped within Python scripts, and call our software tool EAGLE. In RDF style, EAGLE stands for "EAGLE 'Is an' algorithmic graph library for exploration." EAGLE is like 'MATLAB' for 'Linked Data.'
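Triangle counting, one of the algorithms the abstract expresses as SPARQL, makes the "graph mining as queries" idea concrete. Below is a plain-Python equivalent over an undirected edge list, with an illustrative SPARQL-style join in a comment; this is a sketch, not EAGLE's actual query.

```python
# A SPARQL sketch of the triangle join (directed form, illustrative only):
#   SELECT (COUNT(*) AS ?n) WHERE { ?a ?p ?b . ?b ?p ?c . ?c ?p ?a }
# Plain-Python equivalent for an undirected edge list:
from itertools import combinations

def count_triangles(edges):
    """Count triangles in an undirected graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return sum(1 for a, b, c in combinations(sorted(adj), 3)
               if b in adj[a] and c in adj[a] and c in adj[b])

print(count_triangles([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]))  # 1
```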
Semantic Services in e-Learning: An Argumentation Case Study
ERIC Educational Resources Information Center
Moreale, Emanuela; Vargas-Vera, Maria
2004-01-01
This paper outlines an e-Learning services architecture offering semantic-based services to students and tutors, in particular ways to browse and obtain information through web services. Services could include registration, authentication, tutoring systems, smart question answering for students' queries, automated marking systems and a student…
A novel architecture for information retrieval system based on semantic web
NASA Astrophysics Data System (ADS)
Zhang, Hui
2011-12-01
Nowadays, the web has enabled an explosive growth of information sharing (there are currently over 4 billion pages covering most areas of human endeavor), so that the web now faces a new challenge of information overload. The challenge before us is not only to help people locate relevant information precisely but also to access and aggregate a variety of information from different resources automatically. Current web documents are in human-oriented formats suitable for presentation, but machines cannot understand their meaning. To address this issue, Berners-Lee proposed the concept of the semantic web. With semantic web technology, web information can be understood and processed by machines, which provides new possibilities for automatic web information processing. A main problem of semantic web information retrieval is that when the retrieval system lacks sufficient knowledge, it returns a large number of meaningless results to users because of the huge volume of information. In this paper, we present the architecture of an information retrieval system based on the semantic web. In addition, our system employs an inference engine to check whether a query should be posed to the keyword-based search engine or to the semantic search engine.
HyQue: evaluating hypotheses using Semantic Web technologies.
Callahan, Alison; Dumontier, Michel; Shah, Nigam H
2011-05-17
Key to the success of e-Science is the ability to computationally evaluate expert-composed hypotheses for validity against experimental data. Researchers face the challenge of collecting, evaluating and integrating large amounts of diverse information to compose and evaluate a hypothesis. Confronted with rapidly accumulating data, researchers currently do not have the software tools to undertake the required information integration tasks. We present HyQue, a Semantic Web tool for querying scientific knowledge bases with the purpose of evaluating user submitted hypotheses. HyQue features a knowledge model to accommodate diverse hypotheses structured as events and represented using Semantic Web languages (RDF/OWL). Hypothesis validity is evaluated against experimental and literature-sourced evidence through a combination of SPARQL queries and evaluation rules. Inference over OWL ontologies (for type specifications, subclass assertions and parthood relations) and retrieval of facts stored as Bio2RDF linked data provide support for a given hypothesis. We evaluate hypotheses of varying levels of detail about the genetic network controlling galactose metabolism in Saccharomyces cerevisiae to demonstrate the feasibility of deploying such semantic computing tools over a growing body of structured knowledge in Bio2RDF. HyQue is a query-based hypothesis evaluation system that can currently evaluate hypotheses about the galactose metabolism in S. cerevisiae. Hypotheses as well as the supporting or refuting data are represented in RDF and directly linked to one another allowing scientists to browse from data to hypothesis and vice versa. HyQue hypotheses and data are available at http://semanticscience.org/projects/hyque.
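A toy version of the rule-based hypothesis scoring described above: each hypothesised event is checked against a store of asserted facts. The facts, the relation names, and the 0/1 scoring are invented for illustration; HyQue's real evaluation combines SPARQL queries over Bio2RDF with richer rules.

```python
# Each event and fact is a (subject, relation, object) triple.
# Scoring: 1 if the event is asserted in the fact store, else 0 (a toy rule).
def evaluate(hypothesis_events, facts):
    """Score each hypothesised event against the fact store."""
    fact_set = set(facts)
    return {event: int(event in fact_set) for event in hypothesis_events}

facts = [("GAL4", "activates", "GAL1"), ("GAL80", "inhibits", "GAL4")]
hypothesis = [("GAL4", "activates", "GAL1"), ("GAL3", "activates", "GAL1")]
print(evaluate(hypothesis, facts))
```

A real system would also let rules refute an event (e.g. when a contradictory fact is asserted) rather than simply score absence as zero.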
Analysis and visualization of disease courses in a semantically-enabled cancer registry.
Esteban-Gil, Angel; Fernández-Breis, Jesualdo Tomás; Boeker, Martin
2017-09-29
Regional and epidemiological cancer registries are important for cancer research and the quality management of cancer treatment. Many technological solutions are available to collect and analyse data for cancer registries nowadays. However, the lack of a well-defined common semantic model is a problem when user-defined analyses and data linking to external resources are required. The objectives of this study are: (1) design of a semantic model for local cancer registries; (2) development of a semantically-enabled cancer registry based on this model; and (3) semantic exploitation of the cancer registry for analysing and visualising disease courses. Our proposal is based on our previous results and experience working with semantic technologies. Data stored in a cancer registry database were transformed into RDF employing a process driven by OWL ontologies. The semantic representation of the data was then processed to extract semantic patient profiles, which were exploited by means of SPARQL queries to identify groups of similar patients and to analyse the disease timelines of patients. Based on the requirements analysis, we have produced a draft of an ontology that models the semantics of a local cancer registry in a pragmatic extensible way. We have implemented a Semantic Web platform that allows transforming and storing data from cancer registries in RDF. This platform also permits users to formulate incremental user-defined queries through a graphical user interface. The query results can be displayed in several customisable ways. The complex disease timelines of individual patients can be clearly represented. Different events, e.g. different therapies and disease courses, are presented according to their temporal and causal relations. The presented platform is an example of the parallel development of ontologies and applications that take advantage of semantic web technologies in the medical field. The semantic structure of the representation renders it easy to analyse key figures of the patients and their evolution at different granularity levels.
Semantic web data warehousing for caGrid.
McCusker, James P; Phillips, Joshua A; González Beltrán, Alejandra; Finkelstein, Anthony; Krauthammer, Michael
2009-10-01
The National Cancer Institute (NCI) is developing caGrid as a means for sharing cancer-related data and services. As more data sets become available on caGrid, we need effective ways of accessing and integrating this information. Although the data models exposed on caGrid are semantically well annotated, it is currently up to the caGrid client to infer relationships between the different models and their classes. In this paper, we present a Semantic Web-based data warehouse (Corvus) for creating relationships among caGrid models. This is accomplished through the transformation of semantically-annotated caBIG Unified Modeling Language (UML) information models into Web Ontology Language (OWL) ontologies that preserve those semantics. We demonstrate the validity of the approach by Semantic Extraction, Transformation and Loading (SETL) of data from two caGrid data sources, caTissue and caArray, as well as alignment and query of those sources in Corvus. We argue that semantic integration is necessary for integration of data from distributed web services and that Corvus is a useful way of accomplishing this. Our approach is generalizable and of broad utility to researchers facing similar integration challenges.
Active Wiki Knowledge Repository
2012-10-01
data using SPARQL queries or RESTful web-services; ‘gardening’ tools for examining the semantically tagged content in the wiki; high-level language tool...Tagging & RDF triple-store Fusion and inferences for collaboration Tools for Consuming Data SPARQL queries or RESTful WS Inference & Gardening tools...other stores using AW SPARQL queries and rendering templates; and 4) Interactively share maps and other content using annotation tools to post notes
Matos, Ely Edison; Campos, Fernanda; Braga, Regina; Palazzi, Daniele
2010-02-01
The amount of information generated by biological research has led to an intensive use of models. Mathematical and computational modeling needs accurate descriptions so that models can be shared, reused and simulated as formulated by their original authors. In this paper, we introduce the Cell Component Ontology (CelO), expressed in OWL-DL. This ontology captures both the structure of a cell model and the properties of functional components. We use this ontology in a Web project (CelOWS) to describe, query and compose CellML models, using semantic web services. It aims to improve reuse and composition of existing components and to allow semantic validation of new models.
Semantic Web repositories for genomics data using the eXframe platform
2014-01-01
Background With the advent of inexpensive assay technologies, there has been an unprecedented growth in genomics data as well as the number of databases in which it is stored. In these databases, sample annotation using ontologies and controlled vocabularies is becoming more common. However, the annotation is rarely available as Linked Data, in a machine-readable format, or for standardized queries using SPARQL. This makes large-scale reuse, or integration with other knowledge bases very difficult. Methods To address this challenge, we have developed the second generation of our eXframe platform, a reusable framework for creating online repositories of genomics experiments. This second generation model now publishes Semantic Web data. To accomplish this, we created an experiment model that covers provenance, citations, external links, assays, biomaterials used in the experiment, and the data collected during the process. The elements of our model are mapped to classes and properties from various established biomedical ontologies. Resource Description Framework (RDF) data is automatically produced using these mappings and indexed in an RDF store with a built-in Sparql Protocol and RDF Query Language (SPARQL) endpoint. Conclusions Using the open-source eXframe software, institutions and laboratories can create Semantic Web repositories of their experiments, integrate it with heterogeneous resources and make it interoperable with the vast Semantic Web of biomedical knowledge. PMID:25093072
COEUS: “semantic web in a box” for biomedical applications
2012-01-01
Background As the “omics” revolution unfolds, the growth in data quantity and diversity is bringing about the need for pioneering bioinformatics software, capable of significantly improving the research workflow. To cope with these computer science demands, biomedical software engineers are adopting emerging semantic web technologies that better suit the life sciences domain. The latter’s complex relationships are easily mapped into semantic web graphs, enabling a superior understanding of collected knowledge. Despite increased awareness of semantic web technologies in bioinformatics, their use is still limited. Results COEUS is a new semantic web framework, aiming at a streamlined application development cycle and following a “semantic web in a box” approach. The framework provides a single package including advanced data integration and triplification tools, base ontologies, a web-oriented engine and a flexible exploration API. Resources can be integrated from heterogeneous sources, including CSV and XML files or SQL and SPARQL query results, and mapped directly to one or more ontologies. Advanced interoperability features include REST services, a SPARQL endpoint and LinkedData publication. These enable the creation of multiple applications for web, desktop or mobile environments, and empower a new knowledge federation layer. Conclusions The platform, targeted at biomedical application developers, provides a complete skeleton ready for rapid application deployment, enhancing the creation of new semantic information systems. COEUS is available as open source at http://bioinformatics.ua.pt/coeus/. PMID:23244467
COEUS: "semantic web in a box" for biomedical applications.
Lopes, Pedro; Oliveira, José Luís
2012-12-17
As the "omics" revolution unfolds, the growth in data quantity and diversity is bringing about the need for pioneering bioinformatics software, capable of significantly improving the research workflow. To cope with these computer science demands, biomedical software engineers are adopting emerging semantic web technologies that better suit the life sciences domain. The latter's complex relationships are easily mapped into semantic web graphs, enabling a superior understanding of collected knowledge. Despite increased awareness of semantic web technologies in bioinformatics, their use is still limited. COEUS is a new semantic web framework, aiming at a streamlined application development cycle and following a "semantic web in a box" approach. The framework provides a single package including advanced data integration and triplification tools, base ontologies, a web-oriented engine and a flexible exploration API. Resources can be integrated from heterogeneous sources, including CSV and XML files or SQL and SPARQL query results, and mapped directly to one or more ontologies. Advanced interoperability features include REST services, a SPARQL endpoint and LinkedData publication. These enable the creation of multiple applications for web, desktop or mobile environments, and empower a new knowledge federation layer. The platform, targeted at biomedical application developers, provides a complete skeleton ready for rapid application deployment, enhancing the creation of new semantic information systems. COEUS is available as open source at http://bioinformatics.ua.pt/coeus/.
Jiang, Guoqian; Solbrig, Harold R; Chute, Christopher G
2011-01-01
A source of semantically coded Adverse Drug Event (ADE) data can be useful for identifying common phenotypes related to ADEs. We proposed a comprehensive framework for building a standardized ADE knowledge base (called ADEpedia) by combining an ontology-based approach with semantic web technology. The framework comprises four primary modules: 1) an XML2RDF transformation module; 2) a data normalization module based on the NCBO Open Biomedical Annotator; 3) an RDF-store-based persistence module; and 4) a front-end module based on a Semantic Wiki for review and curation. A prototype is successfully implemented to demonstrate the capability of the system to integrate multiple drug data and ontology resources and open web services for ADE data standardization. A preliminary evaluation is performed to demonstrate the usefulness of the system, including the performance of the NCBO annotator. In conclusion, semantic web technology provides a highly scalable framework for ADE data source integration and standard query services.
HyQue: evaluating hypotheses using Semantic Web technologies
2011-01-01
Background Key to the success of e-Science is the ability to computationally evaluate expert-composed hypotheses for validity against experimental data. Researchers face the challenge of collecting, evaluating and integrating large amounts of diverse information to compose and evaluate a hypothesis. Confronted with rapidly accumulating data, researchers currently do not have the software tools to undertake the required information integration tasks. Results We present HyQue, a Semantic Web tool for querying scientific knowledge bases with the purpose of evaluating user submitted hypotheses. HyQue features a knowledge model to accommodate diverse hypotheses structured as events and represented using Semantic Web languages (RDF/OWL). Hypothesis validity is evaluated against experimental and literature-sourced evidence through a combination of SPARQL queries and evaluation rules. Inference over OWL ontologies (for type specifications, subclass assertions and parthood relations) and retrieval of facts stored as Bio2RDF linked data provide support for a given hypothesis. We evaluate hypotheses of varying levels of detail about the genetic network controlling galactose metabolism in Saccharomyces cerevisiae to demonstrate the feasibility of deploying such semantic computing tools over a growing body of structured knowledge in Bio2RDF. Conclusions HyQue is a query-based hypothesis evaluation system that can currently evaluate hypotheses about the galactose metabolism in S. cerevisiae. Hypotheses as well as the supporting or refuting data are represented in RDF and directly linked to one another allowing scientists to browse from data to hypothesis and vice versa. HyQue hypotheses and data are available at http://semanticscience.org/projects/hyque. PMID:21624158
SCALEUS: Semantic Web Services Integration for Biomedical Applications.
Sernadela, Pedro; González-Castro, Lorena; Oliveira, José Luís
2017-04-01
In recent years, we have witnessed an explosion of biological data resulting largely from the demands of life science research. The vast majority of these data are freely available via diverse bioinformatics platforms, including relational databases and conventional keyword search applications. This type of approach has achieved great results in the last few years, but proved to be unfeasible when information needs to be combined or shared among different and scattered sources. During recent years, many of these data distribution challenges have been solved with the adoption of semantic web. Despite the evident benefits of this technology, its adoption introduced new challenges related with the migration process, from existent systems to the semantic level. To facilitate this transition, we have developed Scaleus, a semantic web migration tool that can be deployed on top of traditional systems in order to bring knowledge, inference rules, and query federation to the existent data. Targeted at the biomedical domain, this web-based platform offers, in a single package, straightforward data integration and semantic web services that help developers and researchers in the creation process of new semantically enhanced information systems. SCALEUS is available as open source at http://bioinformatics-ua.github.io/scaleus/ .
Semantic similarity measure in biomedical domain leverage web search engine.
Chen, Chi-Huang; Hsieh, Sheau-Ling; Weng, Yung-Ching; Chang, Wen-Yung; Lai, Feipei
2010-01-01
Semantic similarity measures play an essential role in Information Retrieval and Natural Language Processing. In this paper we propose a page-count-based semantic similarity measure and apply it in biomedical domains. Previous research in semantic-web-related applications has deployed various semantic similarity measures. Despite the usefulness of these measures in those applications, measuring semantic similarity between two terms remains a challenging task. The proposed method exploits page counts returned by the Web Search Engine. We define various similarity scores for two given terms P and Q, using the page counts for querying P, Q and P AND Q. Moreover, we propose a novel approach to compute semantic similarity using lexico-syntactic patterns with page counts. These different similarity scores are integrated using support vector machines, to leverage the robustness of semantic similarity measures. Experimental results on two datasets achieve correlation coefficients of 0.798 on the dataset provided by A. Hliaoutakis, 0.705 on the dataset provided by T. Pedersen with physician scores and 0.496 on the dataset provided by T. Pedersen et al. with expert scores.
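One well-known member of the page-count similarity family is a Jaccard coefficient over hit counts. The sketch below is a generic formulation of that idea, not necessarily the exact measure proposed in the paper, and the counts are made up.

```python
# Page-count similarity of terms P and Q from three hit counts:
# count(P), count(Q), and count("P AND Q"). Generic WebJaccard-style formula.
def web_jaccard(count_p, count_q, count_pq, threshold=0):
    """Similarity in [0, 1]; small co-occurrence counts are treated as noise."""
    if count_pq <= threshold:
        return 0.0
    return count_pq / (count_p + count_q - count_pq)

print(web_jaccard(1000, 2000, 500))  # 500 / 2500 = 0.2
```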
A semantically-aided architecture for a web-based monitoring system for carotid atherosclerosis.
Kolias, Vassileios D; Stamou, Giorgos; Golemati, Spyretta; Stoitsis, Giannis; Gkekas, Christos D; Liapis, Christos D; Nikita, Konstantina S
2015-08-01
Carotid atherosclerosis is a multifactorial disease and its clinical diagnosis depends on the evaluation of heterogeneous clinical data, such as imaging exams, biochemical tests and the patient's clinical history. The lack of interoperability between Health Information Systems (HIS) does not allow the physicians to acquire all the necessary data for the diagnostic process. In this paper, a semantically-aided architecture is proposed for a web-based monitoring system for carotid atherosclerosis that is able to gather and unify heterogeneous data with the use of an ontology and to create a common interface for data access enhancing the interoperability of HIS. The architecture is based on an application ontology of carotid atherosclerosis that is used to (a) integrate heterogeneous data sources on the basis of semantic representation and ontological reasoning and (b) access the critical information using SPARQL query rewriting and ontology-based data access services. The architecture was tested over a carotid atherosclerosis dataset consisting of the imaging exams and the clinical profile of 233 patients, using a set of complex queries constructed by the physicians. The proposed architecture was evaluated with respect to the complexity of the queries that the physicians could make and the retrieval speed. The proposed architecture gave promising results in terms of interoperability, data integration of heterogeneous sources in an ontological way, and expanded query and retrieval capabilities in HIS.
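The SPARQL query-rewriting step mentioned above typically expands a class term with its subclasses before the query reaches the data layer. A minimal sketch of that expansion, with an invented class hierarchy (the class names below are illustrative, not taken from the paper's ontology):

```python
# Ontology-based query rewriting (toy): expand a queried class with all of
# its transitive subclasses, so instances typed with any subclass match.
def expand_class(cls, subclasses):
    """Return cls together with every transitive subclass from the map."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(subclasses.get(c, []))
    return seen

hierarchy = {"Plaque": ["CalcifiedPlaque", "SoftPlaque"],
             "SoftPlaque": ["LipidRichPlaque"]}
print(sorted(expand_class("Plaque", hierarchy)))
```

In a rewritten SPARQL query the expansion becomes a `UNION` (or a `VALUES` clause) over the returned class set.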
The BiSciCol Triplifier: bringing biodiversity data to the Semantic Web.
Stucky, Brian J; Deck, John; Conlin, Tom; Ziemba, Lukasz; Cellinese, Nico; Guralnick, Robert
2014-07-29
Recent years have brought great progress in efforts to digitize the world's biodiversity data, but integrating data from many different providers, and across research domains, remains challenging. Semantic Web technologies have been widely recognized by biodiversity scientists for their potential to help solve this problem, yet these technologies have so far seen little use for biodiversity data. Such slow uptake has been due, in part, to the relative complexity of Semantic Web technologies along with a lack of domain-specific software tools to help non-experts publish their data to the Semantic Web. The BiSciCol Triplifier is new software that greatly simplifies the process of converting biodiversity data in standard, tabular formats, such as Darwin Core-Archives, into Semantic Web-ready Resource Description Framework (RDF) representations. The Triplifier uses a vocabulary based on the popular Darwin Core standard, includes both Web-based and command-line interfaces, and is fully open-source software. Unlike most other RDF conversion tools, the Triplifier does not require detailed familiarity with core Semantic Web technologies, and it is tailored to a widely popular biodiversity data format and vocabulary standard. As a result, the Triplifier can often fully automate the conversion of biodiversity data to RDF, thereby making the Semantic Web much more accessible to biodiversity scientists who might otherwise have relatively little knowledge of Semantic Web technologies. Easy availability of biodiversity data as RDF will allow researchers to combine data from disparate sources and analyze them with powerful linked data querying tools. However, before software like the Triplifier, and Semantic Web technologies in general, can reach their full potential for biodiversity science, the biodiversity informatics community must address several critical challenges, such as the widespread failure to use robust, globally unique identifiers for biodiversity data.
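The core "triplification" step the abstract describes is turning a tabular record into RDF. Below is a minimal sketch producing N-Triples: the base URI is a placeholder, and while the Darwin Core namespace is real, the direct column-to-term mapping is a simplification of what the Triplifier actually does.

```python
# Minimal triplification: one tabular record -> N-Triples lines.
# Base URI is hypothetical; columns are mapped 1:1 to Darwin Core term names.
def row_to_ntriples(row_id, row,
                    base="http://example.org/occurrence/",
                    dwc="http://rs.tdwg.org/dwc/terms/"):
    subject = f"<{base}{row_id}>"
    return [f'{subject} <{dwc}{col}> "{val}" .' for col, val in row.items()]

record = {"scientificName": "Puma concolor", "country": "Brazil"}
for triple in row_to_ntriples("occ1", record):
    print(triple)
```

A production converter would also escape literal values and emit typed literals where the term's range calls for them.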
Query optimization for graph analytics on linked data using SPARQL
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hong, Seokyong; Lee, Sangkeun; Lim, Seung -Hwan
2015-07-01
Triplestores that support query languages such as SPARQL are emerging as the preferred and scalable solution to represent data and meta-data as massive heterogeneous graphs using Semantic Web standards. With increasing adoption, the desire to conduct graph-theoretic mining and exploratory analysis has also increased. Addressing that desire, this paper presents a solution that is the marriage of Graph Theory and the Semantic Web. We present software that can analyze Linked Data using graph operations such as counting triangles, finding eccentricity, testing connectedness, and computing PageRank directly on triple stores via the SPARQL interface. We describe the process of optimizing performance of the SPARQL-based implementation of such popular graph algorithms by reducing the space-overhead, simplifying iterative complexity and removing redundant computations by understanding query plans. Our optimized approach shows significant performance gains on triplestores hosted on stand-alone workstations as well as hardware-optimized scalable supercomputers such as the Cray XMT.
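For reference alongside the SPARQL-based implementations discussed above, this is the underlying PageRank iteration in plain Python over a directed edge list. It is a simplification (uniform teleportation, fixed iteration count), and the three-node graph is a made-up example.

```python
# Iterative PageRank over a directed edge list (plain-Python reference,
# not the paper's SPARQL formulation).
def pagerank(edges, damping=0.85, iterations=50):
    nodes = sorted({n for edge in edges for n in edge})
    out = {n: [] for n in nodes}
    for u, v in edges:
        out[u].append(v)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for u in nodes:
            targets = out[u] or nodes        # dangling nodes spread evenly
            for v in targets:
                new[v] += damping * rank[u] / len(targets)
        rank = new
    return rank

ranks = pagerank([("a", "b"), ("b", "c"), ("c", "a")])
print(ranks)  # a 3-cycle is symmetric, so each rank is 1/3
```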
SSWAP: A Simple Semantic Web Architecture and Protocol for semantic web services
Gessler, Damian DG; Schiltz, Gary S; May, Greg D; Avraham, Shulamit; Town, Christopher D; Grant, David; Nelson, Rex T
2009-01-01
Background SSWAP (Simple Semantic Web Architecture and Protocol; pronounced "swap") is an architecture, protocol, and platform for using reasoning to semantically integrate heterogeneous disparate data and services on the web. SSWAP was developed as a hybrid semantic web services technology to overcome limitations found in both pure web service technologies and pure semantic web technologies. Results There are currently over 2400 resources published in SSWAP. Approximately two dozen are custom-written services for QTL (Quantitative Trait Loci) and mapping data for legumes and grasses (grains). The remaining are wrappers to Nucleic Acids Research Database and Web Server entries. As an architecture, SSWAP establishes how clients (users of data, services, and ontologies), providers (suppliers of data, services, and ontologies), and discovery servers (semantic search engines) interact to allow for the description, querying, discovery, invocation, and response of semantic web services. As a protocol, SSWAP provides the vocabulary and semantics to allow clients, providers, and discovery servers to engage in semantic web services. The protocol is based on the W3C-sanctioned first-order description logic language OWL DL. As an open source platform, a discovery server running at (as in to "swap info") uses the description logic reasoner Pellet to integrate semantic resources. The platform hosts an interactive guide to the protocol at , developer tools at , and a portal to third-party ontologies at (a "swap meet"). Conclusion SSWAP addresses the three basic requirements of a semantic web services architecture (i.e., a common syntax, shared semantic, and semantic discovery) while addressing three technology limitations common in distributed service systems: i.e., i) the fatal mutability of traditional interfaces, ii) the rigidity and fragility of static subsumption hierarchies, and iii) the confounding of content, structure, and presentation. 
SSWAP is novel by establishing the concept of a canonical yet mutable OWL DL graph that allows data and service providers to describe their resources, to allow discovery servers to offer semantically rich search engines, to allow clients to discover and invoke those resources, and to allow providers to respond with semantically tagged data. SSWAP allows for a mix-and-match of terms from both new and legacy third-party ontologies in these graphs. PMID:19775460
Semantic web data warehousing for caGrid
McCusker, James P; Phillips, Joshua A; Beltrán, Alejandra González; Finkelstein, Anthony; Krauthammer, Michael
2009-01-01
The National Cancer Institute (NCI) is developing caGrid as a means for sharing cancer-related data and services. As more data sets become available on caGrid, we need effective ways of accessing and integrating this information. Although the data models exposed on caGrid are semantically well annotated, it is currently up to the caGrid client to infer relationships between the different models and their classes. In this paper, we present a Semantic Web-based data warehouse (Corvus) for creating relationships among caGrid models. This is accomplished through the transformation of semantically-annotated caBIG® Unified Modeling Language (UML) information models into Web Ontology Language (OWL) ontologies that preserve those semantics. We demonstrate the validity of the approach by Semantic Extraction, Transformation and Loading (SETL) of data from two caGrid data sources, caTissue and caArray, as well as alignment and query of those sources in Corvus. We argue that semantic integration is necessary for integration of data from distributed web services and that Corvus is a useful way of accomplishing this. Our approach is generalizable and of broad utility to researchers facing similar integration challenges. PMID:19796399
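The warehouse pattern Corvus describes (lifting records from separate sources into one shared triple graph so that a single pattern query can join across them) can be sketched in plain Python. All identifiers, predicate names, and sample records below are invented for illustration; Corvus itself works with caBIG UML models transformed to OWL ontologies.

```python
# Hypothetical sketch of semantic warehousing: flat records from two
# sources are lifted into one triple graph with shared predicates,
# after which a single pattern query joins across both sources.
# Predicates and sample records are illustrative, not caGrid's.

def lift(source, records, predicate_map):
    """Turn flat records into (subject, predicate, object) triples."""
    triples = set()
    for rid, fields in records.items():
        subj = f"{source}:{rid}"
        for field, value in fields.items():
            triples.add((subj, predicate_map[field], value))
    return triples

# Two toy sources whose 'patient' fields align to one predicate.
catissue = {"s1": {"patient": "p42", "tissue": "liver"}}
caarray = {"a7": {"patient": "p42", "assay": "expression"}}

graph = lift("caTissue", catissue, {"patient": "hasPatient", "tissue": "hasTissue"}) \
      | lift("caArray", caarray, {"patient": "hasPatient", "assay": "hasAssay"})

def match(graph, pattern):
    """Return triples matching a single (s, p, o) pattern; None = wildcard."""
    s, p, o = pattern
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Join: which records from either source refer to patient p42?
subjects = sorted(t[0] for t in match(graph, (None, "hasPatient", "p42")))
```

The point of the sketch is that once both sources share predicates, the cross-source join needs no knowledge of which system a record came from.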
A Ubiquitous Sensor Network Platform for Integrating Smart Devices into the Semantic Sensor Web
de Vera, David Díaz Pardo; Izquierdo, Álvaro Sigüenza; Vercher, Jesús Bernat; Gómez, Luis Alfonso Hernández
2014-06-18
Ongoing Sensor Web developments make a growing amount of heterogeneous sensor data available to smart devices. This is generating an increasing demand for homogeneous mechanisms to access, publish and share real-world information. This paper discusses, first, an architectural solution based on Next Generation Networks: a pilot Telco Ubiquitous Sensor Network (USN) Platform that embeds several OGC® Sensor Web services. This platform has already been deployed in large scale projects. Second, the USN-Platform is extended to explore a first approach to Semantic Sensor Web principles and technologies, so that smart devices can access Sensor Web data, allowing them also to share richer (semantically interpreted) information. An experimental scenario is presented: a smart car that consumes and produces real-world information which is integrated into the Semantic Sensor Web through a Telco USN-Platform. Performance tests revealed that observation publishing times with our experimental system were well within limits compatible with the adequate operation of smart safety assistance systems in vehicles. On the other hand, response times for complex queries on large repositories may be inappropriate for rapid reaction needs. PMID:24945678
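The publish-then-query flow described for the USN-Platform can be illustrated with a toy observation model: each observation is published as a handful of semantically tagged triples, and a smart device queries the shared graph. The predicates below are invented stand-ins, not actual OGC or SSN vocabulary terms.

```python
# Illustrative only: a minimal shape for semantically tagged sensor
# observations, loosely inspired by Semantic Sensor Web vocabularies.

def observation_triples(obs_id, sensor, prop, value, time):
    """Represent one published observation as a small set of triples."""
    return {
        (obs_id, "observedBy", sensor),
        (obs_id, "observedProperty", prop),
        (obs_id, "hasValue", value),
        (obs_id, "hasTime", time),
    }

# The smart-car scenario: the car both produces and consumes observations.
graph = observation_triples("o1", "car42", "speed", 63, "t0") \
      | observation_triples("o2", "car42", "fuel", 40, "t1")

# A consuming device asking the graph for speed readings:
speed_obs = {s for (s, p, o) in graph if p == "observedProperty" and o == "speed"}
values = [o for (s, p, o) in graph if s in speed_obs and p == "hasValue"]
```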
2012-01-01
Background The ability to conduct genome-wide association studies (GWAS) has enabled new exploration of how genetic variations contribute to health and disease etiology. However, historically GWAS have been limited by inadequate sample size due to associated costs for genotyping and phenotyping of study subjects. This has prompted several academic medical centers to form “biobanks” where biospecimens linked to personal health information, typically in electronic health records (EHRs), are collected and stored on a large number of subjects. This provides tremendous opportunities to discover novel genotype-phenotype associations and foster hypothesis generation. Results In this work, we study how emerging Semantic Web technologies can be applied in conjunction with clinical and genotype data stored at the Mayo Clinic Biobank to mine the phenotype data for genetic associations. In particular, we demonstrate the role of using Resource Description Framework (RDF) for representing EHR diagnoses and procedure data, and enable federated querying via standardized Web protocols to identify subjects genotyped for Type 2 Diabetes and Hypothyroidism to discover gene-disease associations. Our study highlights the potential of Web-scale data federation techniques to execute complex queries. Conclusions This study demonstrates how Semantic Web technologies can be applied in conjunction with clinical data stored in EHRs to accurately identify subjects with specific diseases and phenotypes, and identify genotype-phenotype associations. PMID:23244446
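A minimal sketch of the federation pattern this abstract describes: diagnoses live in one store, genotypes in another, and a federated query intersects the two to find subjects with both a target diagnosis and a genotype on file. The two in-memory dictionaries stand in for RDF endpoints, and the subject IDs and diagnosis codes are fabricated, not Mayo data.

```python
# Toy federated query over two "endpoints": an EHR phenotype store and
# a genotype store. All data is invented for illustration.

ehr_store = {   # subject -> set of diagnosis codes (stand-in for RDF triples)
    "subj1": {"T2D"}, "subj2": {"Hypothyroidism"}, "subj3": {"T2D"},
}
genotype_store = {"subj1": "AA", "subj3": "AG", "subj4": "GG"}

def federated_query(diagnosis):
    """Subjects with the diagnosis in the EHR store AND a genotype on file."""
    with_dx = {s for s, codes in ehr_store.items() if diagnosis in codes}
    return sorted(with_dx & genotype_store.keys())

t2d_genotyped = federated_query("T2D")
```

In the real setting the intersection is computed by a federated SPARQL query across Web endpoints rather than by set operations in one process, but the join logic is the same.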
Semantic Web Applications and Tools for the Life Sciences: SWAT4LS 2010
Burger, Albert; Paschke, Adrian; Romano, Paolo; Marshall, M Scott; Splendiani, Andrea
2012-01-25
As Semantic Web technologies mature and new releases of key elements, such as SPARQL 1.1 and OWL 2.0, become available, the Life Sciences continue to push the boundaries of these technologies with ever more sophisticated tools and applications. Unsurprisingly, therefore, interest in the SWAT4LS (Semantic Web Applications and Tools for the Life Sciences) activities has remained high, as was evident during the third international SWAT4LS workshop held in Berlin in December 2010. Contributors to this workshop were invited to submit extended versions of their papers, the best of which are now made available in this special supplement of BMC Bioinformatics. The papers reflect the wide range of work in this area, covering the storage and querying of Life Sciences data in RDF triple stores, tools for the development of biomedical ontologies, and the semantics-based integration of Life Sciences and clinical data. PMID:22373274
Developing a kidney and urinary pathway knowledge base
2011-01-01
Background Chronic renal disease is a global health problem. The identification of suitable biomarkers could facilitate early detection and diagnosis and allow better understanding of the underlying pathology. One of the challenges in meeting this goal is the necessary integration of experimental results from multiple biological levels for further analysis by data mining. Data integration in the life science is still a struggle, and many groups are looking to the benefits promised by the Semantic Web for data integration. Results We present a Semantic Web approach to developing a knowledge base that integrates data from high-throughput experiments on kidney and urine. A specialised KUP ontology is used to tie the various layers together, whilst background knowledge from external databases is incorporated by conversion into RDF. Using SPARQL as a query mechanism, we are able to query for proteins expressed in urine and place these back into the context of genes expressed in regions of the kidney. Conclusions The KUPKB gives KUP biologists the means to ask queries across many resources in order to aggregate knowledge that is necessary for answering biological questions. The Semantic Web technologies we use, together with the background knowledge from the domain’s ontologies, allows both rapid conversion and integration of this knowledge base. The KUPKB is still relatively small, but questions remain about scalability, maintenance and availability of the knowledge itself. Availability The KUPKB may be accessed via http://www.e-lico.eu/kupkb. PMID:21624162
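The kind of cross-resource query the KUPKB answers (proteins expressed in urine, placed back in the context of genes expressed in kidney regions) can be mimicked with triple patterns over plain tuples. All proteins, genes, and predicate names here are invented, not KUPKB content.

```python
# A toy KUPKB-style query: find proteins expressed in urine, then map
# them back to genes expressed in a kidney region. Fabricated data.

triples = {
    ("P1", "expressedIn", "urine"),
    ("P2", "expressedIn", "plasma"),
    ("P1", "encodedBy", "GENE_A"),
    ("GENE_A", "expressedIn", "glomerulus"),
}

# Step 1: proteins expressed in urine.
urine_proteins = {s for (s, p, o) in triples
                  if p == "expressedIn" and o == "urine"}
# Step 2: the genes encoding them.
genes = {o for (s, p, o) in triples
         if p == "encodedBy" and s in urine_proteins}
# Step 3: keep only genes expressed in the kidney region of interest.
kidney_genes = sorted(g for g in genes
                      if (g, "expressedIn", "glomerulus") in triples)
```

In the KUPKB this is a single SPARQL query over the integrated RDF graph; the three steps above correspond to its chained triple patterns.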
Ontology-based geospatial data query and integration
Zhao, T.; Zhang, C.; Wei, M.; Peng, Z.-R.
2008-01-01
Geospatial data sharing is an increasingly important subject as large amounts of data are produced by a variety of sources, stored in incompatible formats, and accessed through different GIS applications. Past efforts to enable sharing have produced standardized data formats such as GML and data access protocols such as the Web Feature Service (WFS). While these standards help client applications gain access to heterogeneous data stored in different formats from diverse sources, the usability of that access is limited due to the lack of data semantics encoded in the WFS feature types. Past research has used ontology languages to describe the semantics of geospatial data, but ontology-based queries cannot be applied directly to legacy data stored in databases or shapefiles, or to feature data in WFS services. This paper presents a method to enable ontology-based queries on spatial data available from WFS services and on data stored in databases. We do not create ontology instances explicitly and thus avoid the problems of data replication. Instead, user queries are rewritten as WFS getFeature requests and SQL queries to databases. The method also has the benefit of utilizing existing database, WFS, and GML tools while enabling queries based on ontology semantics.
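The rewriting step can be sketched as a mapping from an ontology-level concept to a WFS getFeature request and an SQL statement, without ever materializing ontology instances. The mapping table, endpoint URL, and filter syntax below are simplified assumptions for illustration, not the paper's actual rewriting rules.

```python
# Sketch of ontology-to-source query rewriting. A query over the
# ontology concept "Road" is translated into (a) a WFS getFeature URL
# and (b) an SQL query, instead of creating RDF/OWL instances.
# All mappings and the endpoint are hypothetical.

mappings = {
    "Road": {"wfs_type": "tiger:roads", "table": "roads", "column": "name"},
}

def rewrite(concept, prop_value, endpoint="http://example.org/wfs"):
    """Rewrite an ontology query into a WFS request and an SQL statement."""
    m = mappings[concept]
    wfs = (f"{endpoint}?service=WFS&request=GetFeature"
           f"&typeName={m['wfs_type']}&filter={m['column']}={prop_value}")
    sql = f"SELECT * FROM {m['table']} WHERE {m['column']} = '{prop_value}'"
    return wfs, sql

wfs_req, sql_query = rewrite("Road", "Main St")
```

Because only the query is rewritten, the legacy data stays where it is and existing WFS and database tooling keeps doing the heavy lifting.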
A semantic web ontology for small molecules and their biological targets.
Choi, Jooyoung; Davis, Melissa J; Newman, Andrew F; Ragan, Mark A
2010-05-24
A wide range of data on sequences, structures, pathways, and networks of genes and gene products is available for hypothesis testing and discovery in biological and biomedical research. However, data describing the physical, chemical, and biological properties of small molecules have not been well-integrated with these resources. Semantically rich representations of chemical data, combined with Semantic Web technologies, have the potential to enable the integration of small molecule and biomolecular data resources, expanding the scope and power of biomedical and pharmacological research. We employed the Semantic Web technologies Resource Description Framework (RDF) and Web Ontology Language (OWL) to generate a Small Molecule Ontology (SMO) that represents concepts and provides unique identifiers for biologically relevant properties of small molecules and their interactions with biomolecules, such as proteins. We instantiated SMO using data from three public data sources (DrugBank, PubChem, and UniProt) converted to RDF triples. Evaluation of SMO by use of predetermined competency questions implemented as SPARQL queries demonstrated that data from chemical and biomolecular data sources were effectively represented and that useful knowledge can be extracted. These results illustrate the potential of Semantic Web technologies in chemical, biological, and pharmacological research and in drug discovery.
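A competency question of the kind used to evaluate SMO ("which small molecules target protein X?") can be reduced to a pure-Python triple match. The drug-target triples below are fabricated examples rather than SMO data, and the prefixes are informal.

```python
# Toy competency question over drug-target triples, standing in for the
# SPARQL queries used to evaluate SMO. Data is fabricated.

triples = {
    ("drug:aspirin", "hasTarget", "protein:PTGS2"),
    ("drug:ibuprofen", "hasTarget", "protein:PTGS2"),
    ("drug:metformin", "hasTarget", "protein:PRKAB1"),
}

def targets_of(protein):
    """Small molecules asserted to target the given protein."""
    return sorted(s for (s, p, o) in triples
                  if p == "hasTarget" and o == protein)

ptgs2_drugs = targets_of("protein:PTGS2")
```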
Mining the Human Phenome using Semantic Web Technologies: A Case Study for Type 2 Diabetes
Pathak, Jyotishman; Kiefer, Richard C.; Bielinski, Suzette J.; Chute, Christopher G.
2012-01-01
The ability to conduct genome-wide association studies (GWAS) has enabled new exploration of how genetic variations contribute to health and disease etiology. However, historically GWAS have been limited by inadequate sample size due to associated costs for genotyping and phenotyping of study subjects. This has prompted several academic medical centers to form “biobanks” where biospecimens linked to personal health information, typically in electronic health records (EHRs), are collected and stored on a large number of subjects. This provides tremendous opportunities to discover novel genotype-phenotype associations and foster hypothesis generation. In this work, we study how emerging Semantic Web technologies can be applied in conjunction with clinical and genotype data stored at the Mayo Clinic Biobank to mine the phenotype data for genetic associations. In particular, we demonstrate the role of using Resource Description Framework (RDF) for representing EHR diagnoses and procedure data, and enable federated querying via standardized Web protocols to identify genotyped subjects with Type 2 Diabetes for discovering gene-disease associations. Our study highlights the potential of Web-scale data federation techniques to execute complex queries. PMID:23304343
A Semantic Sensor Web for Environmental Decision Support Applications
Gray, Alasdair J. G.; Sadler, Jason; Kit, Oles; Kyzirakos, Kostis; Karpathiotakis, Manos; Calbimonte, Jean-Paul; Page, Kevin; García-Castro, Raúl; Frazer, Alex; Galpin, Ixent; Fernandes, Alvaro A. A.; Paton, Norman W.; Corcho, Oscar; Koubarakis, Manolis; De Roure, David; Martinez, Kirk; Gómez-Pérez, Asunción
2011-01-01
Sensing devices are increasingly being deployed to monitor the physical world around us. One class of application for which sensor data is pertinent is environmental decision support systems, e.g., flood emergency response. For these applications, the sensor readings need to be put in context by integrating them with other sources of data about the surrounding environment. Traditional systems for predicting and detecting floods rely on methods that need significant human resources. In this paper we describe a semantic sensor web architecture for integrating multiple heterogeneous datasets, including live and historic sensor data, databases, and map layers. The architecture provides mechanisms for discovering datasets, defining integrated views over them, continuously receiving data in real-time, and visualising on screen and interacting with the data. Our approach makes extensive use of web service standards for querying and accessing data, and semantic technologies to discover and integrate datasets. We demonstrate the use of our semantic sensor web architecture in the context of a flood response planning web application that uses data from sensor networks monitoring the sea-state around the coast of England. PMID:22164110
SSWAP: A Simple Semantic Web Architecture and Protocol for semantic web services.
Gessler, Damian D G; Schiltz, Gary S; May, Greg D; Avraham, Shulamit; Town, Christopher D; Grant, David; Nelson, Rex T
2009-09-23
SSWAP (Simple Semantic Web Architecture and Protocol; pronounced "swap") is an architecture, protocol, and platform for using reasoning to semantically integrate heterogeneous disparate data and services on the web. SSWAP was developed as a hybrid semantic web services technology to overcome limitations found in both pure web service technologies and pure semantic web technologies. There are currently over 2400 resources published in SSWAP. Approximately two dozen are custom-written services for QTL (Quantitative Trait Loci) and mapping data for legumes and grasses (grains). The remaining are wrappers to Nucleic Acids Research Database and Web Server entries. As an architecture, SSWAP establishes how clients (users of data, services, and ontologies), providers (suppliers of data, services, and ontologies), and discovery servers (semantic search engines) interact to allow for the description, querying, discovery, invocation, and response of semantic web services. As a protocol, SSWAP provides the vocabulary and semantics to allow clients, providers, and discovery servers to engage in semantic web services. The protocol is based on the W3C-sanctioned first-order description logic language OWL DL. As an open source platform, a discovery server running at http://sswap.info (as in to "swap info") uses the description logic reasoner Pellet to integrate semantic resources. The platform hosts an interactive guide to the protocol at http://sswap.info/protocol.jsp, developer tools at http://sswap.info/developer.jsp, and a portal to third-party ontologies at http://sswapmeet.sswap.info (a "swap meet"). 
SSWAP addresses the three basic requirements of a semantic web services architecture (i.e., a common syntax, shared semantic, and semantic discovery) while addressing three technology limitations common in distributed service systems: i.e., i) the fatal mutability of traditional interfaces, ii) the rigidity and fragility of static subsumption hierarchies, and iii) the confounding of content, structure, and presentation. SSWAP is novel by establishing the concept of a canonical yet mutable OWL DL graph that allows data and service providers to describe their resources, to allow discovery servers to offer semantically rich search engines, to allow clients to discover and invoke those resources, and to allow providers to respond with semantically tagged data. SSWAP allows for a mix-and-match of terms from both new and legacy third-party ontologies in these graphs.
Don’t Like RDF Reification? Making Statements about Statements Using Singleton Property
Nguyen, Vinh; Bodenreider, Olivier; Sheth, Amit
2015-01-01
Statements about RDF statements, or meta triples, provide additional information about individual triples, such as the source, the occurring time or place, or the certainty. Integrating such meta triples into semantic knowledge bases would enable the querying and reasoning mechanisms to be aware of provenance, time, location, or certainty of triples. However, an efficient RDF representation for such meta knowledge of triples remains challenging. The existing standard reification approach allows such meta knowledge of RDF triples to be expressed using RDF by two steps. The first step is representing the triple by a Statement instance which has subject, predicate, and object indicated separately in three different triples. The second step is creating assertions about that instance as if it is a statement. While reification is simple and intuitive, this approach does not have formal semantics and is not commonly used in practice as described in the RDF Primer. In this paper, we propose a novel approach called Singleton Property for representing statements about statements and provide a formal semantics for it. We explain how this singleton property approach fits well with the existing syntax and formal semantics of RDF, and the syntax of SPARQL query language. We also demonstrate the use of singleton property in the representation and querying of meta knowledge in two examples of Semantic Web knowledge bases: YAGO2 and BKR. Our experiments on the BKR show that the singleton property approach gives a decent performance in terms of number of triples, query length and query execution time compared to existing approaches. This approach, which is also simple and intuitive, can be easily adopted for representing and querying statements about statements in other knowledge bases. PMID:25750938
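The two representations can be contrasted directly with triples written as plain tuples: standard reification spends four triples describing the statement before any meta assertion can be attached, while a singleton property carries the meta data on a unique predicate instance. The marriage fact and time interval follow the style of the paper's examples; URIs are omitted and all names are informal.

```python
# Contrasting standard RDF reification with the singleton-property
# approach, using plain tuples as triples. Names are informal.

fact = ("BobDylan", "isMarriedTo", "SaraLownds")

# Standard reification: a Statement instance with subject, predicate,
# and object asserted separately, plus one meta triple (5 triples).
reified = {
    ("stmt1", "rdf:type", "rdf:Statement"),
    ("stmt1", "rdf:subject", fact[0]),
    ("stmt1", "rdf:predicate", fact[1]),
    ("stmt1", "rdf:object", fact[2]),
    ("stmt1", "validDuring", "1965-1977"),
}

# Singleton property: a unique predicate instance links subject and
# object directly and carries the meta data (3 triples), so the base
# fact remains queryable as an ordinary triple pattern.
singleton = {
    (fact[0], "isMarriedTo#1", fact[2]),
    ("isMarriedTo#1", "singletonPropertyOf", "isMarriedTo"),
    ("isMarriedTo#1", "validDuring", "1965-1977"),
}
```

The triple-count difference above is one of the effects the paper's BKR experiments measure at scale.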
SADI, SHARE, and the in silico scientific method
2010-01-01
Background The emergence and uptake of Semantic Web technologies by the Life Sciences provides exciting opportunities for exploring novel ways to conduct in silico science. Web Service Workflows are already becoming first-class objects in “the new way”, and serve as explicit, shareable, referenceable representations of how an experiment was done. In turn, Semantic Web Service projects aim to facilitate workflow construction by biological domain-experts such that workflows can be edited, re-purposed, and re-published by non-informaticians. However the aspects of the scientific method relating to explicit discourse, disagreement, and hypothesis generation have remained relatively impervious to new technologies. Results Here we present SADI and SHARE - a novel Semantic Web Service framework, and a reference implementation of its client libraries. Together, SADI and SHARE allow the semi- or fully-automatic discovery and pipelining of Semantic Web Services in response to ad hoc user queries. Conclusions The semantic behaviours exhibited by SADI and SHARE extend the functionalities provided by Description Logic Reasoners such that novel assertions can be automatically added to a data-set without logical reasoning, but rather by analytical or annotative services. This behaviour might be applied to achieve the “semantification” of those aspects of the in silico scientific method that are not yet supported by Semantic Web technologies. We support this suggestion using an example in the clinical research space. PMID:21210986
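The semi-automatic discovery and pipelining behaviour described for SADI and SHARE can be caricatured as chaining services by their declared input and output classes until the desired output type is reached. Service names and classes below are invented; real SADI services are typed with OWL and discovered through a registry.

```python
# Toy service-chaining sketch: each service declares an input and an
# output class, and a chain is assembled from what the user has to
# what the query needs. Services and classes are invented.

services = [
    {"name": "getSequence", "input": "GeneID", "output": "Sequence"},
    {"name": "blast", "input": "Sequence", "output": "Homologs"},
]

def plan(have, want):
    """Greedy chain assembly; raises StopIteration if no service matches."""
    chain, current = [], have
    while current != want:
        step = next(s for s in services if s["input"] == current)
        chain.append(step["name"])
        current = step["output"]
    return chain

pipeline = plan("GeneID", "Homologs")
```

A real resolver would search over alternative chains and use reasoning over OWL class hierarchies rather than exact string matches, but the type-driven composition idea is the same.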
UMass at TREC WEB 2014: Entity Query Feature Expansion using Knowledge Base Links
2014-11-01
[Extraction residue omitted: a table of best and worst TREC Web track query numbers and titles, e.g. "(b) Worst Query Title".] In this work we leverage the rich semantic knowledge available through these links to understand the relevance of documents for a query. We focus on the ad hoc
Addressing the Challenges of Multi-Domain Data Integration with the SemantEco Framework
NASA Astrophysics Data System (ADS)
Patton, E. W.; Seyed, P.; McGuinness, D. L.
2013-12-01
Data integration across multiple domains will continue to be a challenge with the proliferation of big data in the sciences. Data origination issues and how data are manipulated are critical to enable scientists to understand and consume disparate datasets as research becomes more multidisciplinary. We present the SemantEco framework as an exemplar for designing an integrative portal for data discovery, exploration, and interpretation that uses best practice W3C Recommendations. We use the Resource Description Framework (RDF) with extensible ontologies described in the Web Ontology Language (OWL) to provide graph-based data representation. Furthermore, SemantEco ingests data via the software package csv2rdf4lod, which generates data provenance using the W3C provenance recommendation (PROV). Our presentation will discuss benefits and challenges of semantic integration, their effect on runtime performance, and how the SemantEco framework assisted in identifying performance issues and improved query performance across multiple domains by an order of magnitude. SemantEco benefits from a semantic approach that provides an 'open world', which allows data to incrementally change just as it does in the real world. SemantEco modules may load new ontologies and data using the W3C's SPARQL Protocol and RDF Query Language via HTTP. Modules may also provide user interface elements for applications and query capabilities to support new use cases. Modules can associate with domains, which are first-class objects in SemantEco. This enables SemantEco to perform integration and reasoning both within and across domains on module-provided data. The SemantEco framework has been used to construct a web portal for environmental and ecological data. The portal includes water and air quality data from the U.S. Geological Survey (USGS) and Environmental Protection Agency (EPA) and species observation counts for birds and fish from the Avian Knowledge Network and the Santa Barbara Long Term Ecological Research, respectively. We provide regulation ontologies using OWL2 datatype facets to detect out-of-range measurements for environmental standards set by the EPA, among others. Users adjust queries using module-defined facets and a map presents the resulting measurement sites. Custom icons identify sites that violate regulations, making them easy to locate. Selecting a site gives the option of charting spatially proximate data from different domains over time. Our portal currently provides 1.6 billion triples of scientific data in RDF. We segment data by ZIP code, and reasoning over 2157 measurements with our EPA regulation ontology that contains 131 regulations takes 2.5 seconds on a 2.4 GHz Intel Core 2 Quad with 8 GB of RAM. SemantEco's modular design and reasoning capabilities make it an exemplar for building multidisciplinary data integration tools that provide data access to scientists and the general population alike. Its provenance tracking provides accountability and its reasoning services can assist users in interpreting data. Future work includes support for geographical queries using the Open Geospatial Consortium's GeoSPARQL standard.
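The facet-based range checking described above can be mimicked in plain Python: each regulation defines an allowed interval, and measurements outside it are flagged as violations. The thresholds below are invented for illustration, not actual EPA limits, and the OWL2 datatype-facet machinery is replaced by ordinary comparisons.

```python
# Toy regulation checker standing in for OWL2 datatype-facet reasoning:
# each regulation gives an allowed (low, high) range and out-of-range
# measurements are flagged. Thresholds are invented.

regulations = {"pH": (6.5, 8.5), "arsenic_ug_per_L": (0.0, 10.0)}

def violations(measurements):
    """Return (site, property, value) tuples outside the allowed range."""
    out = []
    for site, prop, value in measurements:
        lo, hi = regulations[prop]
        if not (lo <= value <= hi):
            out.append((site, prop, value))
    return out

flags = violations([
    ("site1", "pH", 9.1),
    ("site2", "arsenic_ug_per_L", 4.2),
])
```

Encoding the ranges as OWL2 facets instead lets a reasoner classify violating measurements declaratively, which is what enables the portal's custom violation icons.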
Design and development of linked data from the National Map
Usery, E. Lynn; Varanka, Dalia E.
2012-01-01
The development of linked data on the World-Wide Web provides the opportunity for the U.S. Geological Survey (USGS) to supply its extensive volumes of geospatial data, information, and knowledge in a machine interpretable form and reach users and applications that heretofore have been unavailable. To pilot a process to take advantage of this opportunity, the USGS is developing an ontology for The National Map and converting selected data from nine research test areas to a Semantic Web format to support machine processing and linked data access. In a case study, the USGS has developed initial methods for legacy vector and raster formatted geometry, attributes, and spatial relationships to be accessed in a linked data environment maintaining the capability to generate graphic or image output from semantic queries. The description of an initial USGS approach to developing ontology, linked data, and initial query capability from The National Map databases is presented.
BIM-GIS Integrated Geospatial Information Model Using Semantic Web and RDF Graphs
NASA Astrophysics Data System (ADS)
Hor, A.-H.; Jadidi, A.; Sohn, G.
2016-06-01
In recent years, 3D virtual indoor/outdoor urban modelling has become a key spatial information framework for many civil and engineering applications such as evacuation planning, emergency and facility management. Accomplishing such sophisticated decision tasks creates large demands for building multi-scale and multi-sourced 3D urban models. Currently, Building Information Models (BIM) and Geographical Information Systems (GIS) are broadly used as the modelling sources. However, sharing data and exchanging information between the two modelling domains is still a huge challenge: existing syntactic and semantic approaches do not fully support the exchange of rich semantic and geometric information from BIM into GIS or vice-versa. This paper proposes a novel approach for integrating BIM and GIS using semantic web technologies and Resource Description Framework (RDF) graphs. The novelty of the proposed solution comes from the benefits of integrating BIM and GIS technologies into one unified model, the so-called Integrated Geospatial Information Model (IGIM). The proposed approach consists of three main modules: construction of BIM-RDF and GIS-RDF graphs, integration of the two RDF graphs, and querying of information through the IGIM-RDF graph using SPARQL. The IGIM generates queries from both the BIM and GIS RDF graphs, resulting in a semantically integrated model with entities representing both BIM classes and GIS feature objects with respect to the target-client application. The linkage between BIM-RDF and GIS-RDF is achieved through SPARQL endpoints and defined by a query using a set of datasets and entity classes with complementary properties, relationships and geometries. To validate the proposed approach and its performance, a case study was conducted using the IGIM system design.
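The IGIM linkage can be sketched as a union of two triple sets joined through a shared-feature predicate, after which a single query spans both domains. The predicate 'sameFeatureAs' and all entities below are illustrative assumptions, not the paper's actual schema.

```python
# Minimal sketch of the IGIM idea: BIM-RDF and GIS-RDF graphs are
# unioned and queried together via a linking predicate. All names
# are invented for illustration.

bim_rdf = {
    ("bim:Wall_12", "hasMaterial", "concrete"),
    ("bim:Wall_12", "sameFeatureAs", "gis:Footprint_7"),
}
gis_rdf = {
    ("gis:Footprint_7", "withinZone", "floodplain"),
}
igim = bim_rdf | gis_rdf

# Cross-domain query: materials of BIM elements located in the floodplain.
linked = {s for (s, p, o) in igim if p == "sameFeatureAs"
          and (o, "withinZone", "floodplain") in igim}
materials = sorted(o for (s, p, o) in igim
                   if p == "hasMaterial" and s in linked)
```

In the paper this join is expressed as a SPARQL query across endpoints; the set comprehension above plays the role of its basic graph pattern.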
Development of a web-based video management and application processing system
NASA Astrophysics Data System (ADS)
Chan, Shermann S.; Wu, Yi; Li, Qing; Zhuang, Yueting
2001-07-01
Facilitating efficient video manipulation and access in a web-based environment is becoming a popular trend for video applications. In this paper, we present a web-oriented video management and application processing system, based on our previous work on multimedia databases and content-based retrieval. In particular, we extend the VideoMAP architecture with specific web-oriented mechanisms, which include: (1) Concurrency control facilities for the editing of video data among different types of users, such as Video Administrator, Video Producer, Video Editor, and Video Query Client; different users are assigned various priority levels for different operations on the database. (2) A versatile video retrieval mechanism which employs a hybrid approach by integrating a query-based (database) mechanism with content-based retrieval (CBR) functions; its specific language (CAROL/ST with CBR) supports spatio-temporal semantics of video objects, and also offers an improved mechanism to describe the visual content of videos by a content-based analysis method. (3) A query profiling database which records the 'histories' of various clients' query activities; such profiles can be used to provide the default query template when a similar query is encountered by the same kind of users. An experimental prototype system is being developed based on the existing VideoMAP prototype system, using Java and VC++ on the PC platform.
SAS- Semantic Annotation Service for Geoscience resources on the web
NASA Astrophysics Data System (ADS)
Elag, M.; Kumar, P.; Marini, L.; Li, R.; Jiang, P.
2015-12-01
There is a growing need for increased integration across the data and model resources that are disseminated on the web to advance their reuse across different earth science applications. Meaningful reuse of resources requires semantic metadata to realize the semantic web vision of allowing pragmatic linkage and integration among resources. Semantic metadata associates standard metadata with resources to turn them into semantically-enabled resources on the web. However, the lack of a common standardized metadata framework, as well as the uncoordinated use of metadata fields across different geo-information systems, has led to a situation in which standards and related Standard Names abound. To address this need, we have designed SAS to provide a bridge between the core ontologies required to annotate resources and information systems, in order to enable queries and analysis over annotations from a single environment (the web). SAS is one of the services provided by the Geosemantic framework, a decentralized semantic framework that supports integration between models and data and allows semantically heterogeneous resources to interact with minimum human intervention. Here we present the design of SAS and demonstrate its application for annotating data and models. First we describe how predicates and their attributes are extracted from standards and ingested into the knowledge base of the Geosemantic framework. Then we illustrate the application of SAS in annotating data managed by SEAD and annotating simulation models that have a web interface. SAS is a step in a broader approach to raise the quality of geoscience data and models published on the web and to allow users to better search, access, and use existing resources based on standard vocabularies that are encoded and published using semantic technologies.
Component Models for Semantic Web Languages
NASA Astrophysics Data System (ADS)
Henriksson, Jakob; Aßmann, Uwe
Intelligent applications and agents on the Semantic Web typically need to be specified with, or interact with specifications written in, many different kinds of formal languages. Such languages include ontology languages, data and metadata query languages, as well as transformation languages. As learnt from years of experience in development of complex software systems, languages need to support some form of component-based development. Components enable higher software quality, better understanding and reusability of already developed artifacts. Any component approach contains an underlying component model, a description detailing what valid components are and how components can interact. With the multitude of languages developed for the Semantic Web, what are their underlying component models? Do we need to develop one for each language, or is a more general and reusable approach achievable? We present a language-driven component model specification approach. This means that a component model can be (automatically) generated from a given base language (actually, its specification, e.g. its grammar). As a consequence, we can provide components for different languages and simplify the development of software artifacts used on the Semantic Web.
Time-related patient data retrieval for the case studies from the pharmacogenomics research network
Zhu, Qian; Tao, Cui; Ding, Ying; Chute, Christopher G.
2012-01-01
There are many question-based data elements from the pharmacogenomics research network (PGRN) studies, and many of these data elements contain temporal information. Representing these elements semantically, so that they are machine-processable, is a challenging problem for the following reasons: (1) the designers of these studies usually do not have knowledge of computer modeling and query languages, so the original data elements are usually represented in spreadsheets in natural language; and (2) the time aspects in these data elements can be too complex to be represented faithfully in a machine-understandable way. In this paper, we introduce our efforts to represent these data elements using semantic web technologies. We have developed an ontology, CNTRO, for representing clinical events and their temporal relations in the Web Ontology Language (OWL). Here we use CNTRO to represent the time aspects of the data elements. We have evaluated 720 time-related data elements from PGRN studies. We adapted and extended the knowledge representation requirements of EliXR-TIME to categorize our data elements. A CNTRO-based SPARQL query builder has been developed so that users can customize their own SPARQL queries for each knowledge representation requirement. The SPARQL query builder has been evaluated with a simulated EHR triple store to ensure its functionality. PMID:23076712
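A query builder of the kind described typically maps each knowledge-representation requirement to a parameterized SPARQL template that users fill in. The sketch below illustrates that pattern only; the prefix, property names and requirement key are hypothetical, not the actual CNTRO terms.

```python
from string import Template

# One template per knowledge-representation requirement (illustrative terms).
TEMPLATES = {
    "event_before_date": Template(
        "PREFIX cntro: <http://example.org/cntro#>\n"
        "SELECT ?event WHERE {\n"
        "  ?event cntro:hasTime ?t .\n"
        "  FILTER (?t < \"$date\"^^xsd:date)\n"
        "}"
    ),
}

def build_query(requirement, **params):
    """Fill a requirement's SPARQL template with user-supplied values."""
    return TEMPLATES[requirement].substitute(**params)

q = build_query("event_before_date", date="2010-01-01")
print(q)
```

The generated string would then be run against a triple store, e.g. the simulated EHR store mentioned in the abstract.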
A Story of a Crashed Plane in US-Mexican border
NASA Astrophysics Data System (ADS)
Bermudez, Luis; Hobona, Gobe; Vretanos, Peter; Peterson, Perry
2013-04-01
A plane has crashed on the US-Mexican border. The search and rescue command center planner needs to find information about the crash site, a mountain, nearby mountains for the establishment of a communications tower, as well as ranches for setting up a local incident center. Events like this occur all over the world, and exchanging information seamlessly is key to saving lives and preventing further disasters. This abstract describes an interoperability testbed that applied this scenario using technologies based on Open Geospatial Consortium (OGC) standards. The OGC, which has about 500 members, serves as a global forum for the collaboration of developers and users of spatial data products and services, and advances the development of international standards for geospatial interoperability. The OGC Interoperability Program conducts international interoperability testbeds, such as the OGC Web Services Phase 9 (OWS-9), which encourage rapid development, testing, validation, demonstration and adoption of open, consensus-based standards and best practices. The Cross-Community Interoperability (CCI) thread in OWS-9 advanced the Web Feature Service for Gazetteers (WFS-G) by providing a Single Point of Entry Global Gazetteer (SPEGG), where a user can submit a single query and access global geographic names data across multiple Federal names databases. Currently users must make two queries with differing input parameters against two separate databases to obtain authoritative cross-border geographic names data. The gazetteers in this scenario included GNIS and GNS. GNIS, the Geographic Names Information System, is managed by the USGS. It was first developed in 1964 and contains information about domestic and Antarctic names. GNS, the GeoNET Names Server, provides the Geographic Names Data Base (GNDB) and is managed by the National Geospatial-Intelligence Agency (NGA).
GNS has been in service since 1994, and serves names for areas outside the United States and its dependent areas, as well as names for undersea features. The following challenges were advanced: cascaded WFS-G servers (allowing a 'parent' WFS to query multiple WFSs), query name filters (e.g. fuzzy search, text search), handling of multilingualism and diacritics, advanced spatial constraints (e.g. radial search and nearest-neighbor search), and semantically mediated feature types (e.g. mountain vs. hill). To enable semantic mediation, a series of semantic mappings were defined between the NGA GNS, the USGS GNIS and the Alexandria Digital Library (ADL) Gazetteer. The mappings were encoded in the Web Ontology Language (OWL) to enable their use by semantic web technologies. The semantic mappings were then published for ingestion into a semantic mediator that used the mappings to associate location types from one gazetteer with location types in another. The semantic mediator was then able to transform requests on the fly, providing a single-point-of-entry WFS-G to multiple gazetteers. The presentation will provide a live demonstration of the work performed, highlight the main developments, and discuss future development.
Vandervalk, Ben; McCarthy, E Luke; Cruz-Toledo, José; Klein, Artjom; Baker, Christopher J O; Dumontier, Michel; Wilkinson, Mark D
2013-04-05
The Web provides widespread access to vast quantities of health-related information that can improve quality-of-life through better understanding of personal symptoms, medical conditions, and available treatments. Unfortunately, identifying a credible and personally relevant subset of information can be a time-consuming and challenging task for users without a medical background. The objective of the Personal Health Lens system is to aid users when reading health-related webpages by providing warnings about personally relevant drug interactions. More broadly, we wish to present a prototype for a novel, generalizable approach to facilitating interactions between a patient, their practitioner(s), and the Web. We utilized a distributed, Semantic Web-based architecture for recognizing personally dangerous drugs consisting of: (1) a private, local triple store of personal health information, (2) Semantic Web services, following the Semantic Automated Discovery and Integration (SADI) design pattern, for text mining and identifying substance interactions, (3) a bookmarklet to trigger analysis of a webpage and annotate it with personalized warnings, and (4) a semantic query that acts as an abstract template of the analytical workflow to be enacted by the system. A prototype implementation of the system is provided in the form of a Java standalone executable JAR file. The JAR file bundles all components of the system: the personal health database, locally-running versions of the SADI services, and a javascript bookmarklet that triggers analysis of a webpage. In addition, the demonstration includes a hypothetical personal health profile, allowing the system to be used immediately without configuration. Usage instructions are provided. The main strength of the Personal Health Lens system is its ability to organize medical information and to present it to the user in a personalized and contextually relevant manner. 
While this prototype was limited to a single knowledge domain (drug/drug interactions), the proposed architecture is generalizable, and could act as the foundation for much richer personalized-health-Web clients, while importantly providing a novel and personalizable mechanism for clinical experts to inject their expertise into the browsing experience of their patients in the form of customized semantic queries and ontologies.
Semantically Enriching the Search System of a Music Digital Library
NASA Astrophysics Data System (ADS)
de Juan, Paloma; Iglesias, Carlos
Traditional search systems are usually based on keywords, a very simple and convenient mechanism to express a need for information. This is the most popular way of searching the Web, although it is not always an easy task to accurately summarize a natural language query in a few keywords. Working with keywords means losing the context, which is the only thing that can help us deal with ambiguity. This is the biggest problem of keyword-based systems. Semantic Web technologies seem a perfect solution to this problem, since they make it possible to represent the semantics of a given domain. In this chapter, we present three projects, Harmos, Semusici and Cantiga, whose aim is to provide access to a music digital library. We will describe two search systems, a traditional one and a semantic one, developed in the context of these projects and compare them in terms of usability and effectiveness.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ostlund, Neil
This research showed the feasibility of applying the concepts of the Semantic Web to Computational Chemistry. We have created the first web portal (www.chemsem.com) that allows data created in the calculations of quantum chemistry, and other such chemistry calculations, to be placed on the web in a way that makes the data accessible to scientists in a semantic form never before possible. The semantic web nature of the portal allows data to be searched, found, and used as an advance over the usual approach of a relational database. The semantic data on our portal has the nature of a Giant Global Graph (GGG) that can be easily merged with related data and searched globally via the SPARQL Protocol and RDF Query Language (SPARQL), which makes global searches for data easier than with traditional methods. Our Semantic Web Portal requires that the data be understood by a computer and hence defined by an ontology (vocabulary). This ontology is used by the computer in understanding the data. We have created such an ontology for computational chemistry (purl.org/gc) that encapsulates a broad knowledge of the field of computational chemistry. We refer to this ontology as the Gainesville Core. While it is perhaps the first ontology for computational chemistry and is used by our portal, it is only a start of what must be a long multi-partner effort to define computational chemistry. In conjunction with the above efforts we have defined a new potential file standard (Common Standard for eXchange, CSX) for computational chemistry data. This CSX file is the precursor of data in the Resource Description Framework (RDF) form that the semantic web requires. Our portal translates CSX files (as well as other computational chemistry data files) into RDF files that are part of the graph database that the semantic web employs. We propose the CSX file as a convenient way to encapsulate computational chemistry data.
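The CSX-to-RDF translation step amounts to flattening a structured calculation record into subject-predicate-object triples. The toy sketch below shows that shape only; the field names, the `gc:` prefix and the subject URI are invented for illustration and are not the actual CSX schema or Gainesville Core terms.

```python
# A (made-up) CSX-like record of a quantum chemistry calculation.
csx_record = {
    "molecule": "H2O",
    "method": "B3LYP",
    "basisSet": "6-31G*",
    "totalEnergy": -76.4089,
}

def csx_to_triples(record, subject="calc:0001", prefix="gc:"):
    """Emit one (subject, predicate, object) triple per record field."""
    return [(subject, prefix + key, str(value)) for key, value in record.items()]

triples = csx_to_triples(csx_record)
for t in triples:
    print(t)
```

Once in triple form, the calculation can be loaded into a graph database and queried alongside related data via SPARQL, which is the portal's stated goal.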
A Semantic Graph Query Language
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kaplan, I L
2006-10-16
Semantic graphs can be used to organize large amounts of information from a number of sources into one unified structure. A semantic query language provides a foundation for extracting information from the semantic graph. The graph query language described here provides a simple, powerful method for querying semantic graphs.
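To make the "simple, powerful" triple-pattern idea concrete, here is a toy query evaluator in Python, loosely in the spirit of SPARQL basic graph patterns. The graph data and the `?var` pattern syntax are invented for illustration; the actual language described in the report differs.

```python
# A tiny semantic graph: (subject, predicate, object) triples.
GRAPH = [
    ("alice", "worksAt", "LLNL"),
    ("bob", "worksAt", "LLNL"),
    ("alice", "authored", "report42"),
]

def query(graph, patterns):
    """Evaluate triple patterns left to right, joining variable bindings."""
    bindings = [{}]
    for pat in patterns:
        new = []
        for b in bindings:
            for triple in graph:
                b2 = dict(b)
                ok = True
                for term, value in zip(pat, triple):
                    if term.startswith("?"):
                        if b2.setdefault(term, value) != value:
                            ok = False  # variable already bound to something else
                    elif term != value:
                        ok = False      # constant term does not match
                if ok:
                    new.append(b2)
        bindings = new
    return bindings

# Who works at LLNL and authored something?
result = query(GRAPH, [("?p", "worksAt", "LLNL"), ("?p", "authored", "?doc")])
print(result)  # [{'?p': 'alice', '?doc': 'report42'}]
```

The join across patterns (the shared `?p`) is what lets a graph query language extract relationships that span multiple triples.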
Biotea: semantics for Pubmed Central.
Garcia, Alexander; Lopez, Federico; Garcia, Leyla; Giraldo, Olga; Bucheli, Victor; Dumontier, Michel
2018-01-01
A significant portion of the biomedical literature is represented in a manner that makes it difficult for consumers to find or aggregate content through a computational query. One approach to facilitating reuse of the scientific literature is to structure this information as linked data using standardized web technologies. In this paper we present the second version of Biotea, a semantic, linked-data version of the open-access subset of PubMed Central that has been enhanced with specialized annotation pipelines that use existing infrastructure from the National Center for Biomedical Ontology. We expose our models, services, software and datasets. Our infrastructure enables manual and semi-automatic annotation; the resulting data are represented as RDF-based linked data and can be readily queried using the SPARQL query language. We illustrate the utility of our system with several use cases. Our datasets, methods and techniques are available at http://biotea.github.io.
Biotea: RDFizing PubMed Central in support for the paper as an interface to the Web of Data
2013-01-01
Background The World Wide Web has become a dissemination platform for scientific and non-scientific publications. However, most of the information remains locked up in discrete documents that are not always interconnected or machine-readable. The connectivity tissue provided by RDF technology has not yet been widely used to support the generation of self-describing, machine-readable documents. Results In this paper, we present our approach to the generation of self-describing machine-readable scholarly documents. We understand the scientific document as an entry point and interface to the Web of Data. We have semantically processed the full-text, open-access subset of PubMed Central. Our RDF model and resulting dataset make extensive use of existing ontologies and semantic enrichment services. We expose our model, services, prototype, and datasets at http://biotea.idiginfo.org/ Conclusions The semantic processing of biomedical literature presented in this paper embeds documents within the Web of Data and facilitates the execution of concept-based queries against the entire digital library. Our approach delivers a flexible and adaptable set of tools for metadata enrichment and semantic processing of biomedical documents. Our model delivers a semantically rich and highly interconnected dataset with self-describing content so that software can make effective use of it. PMID:23734622
Spatial Knowledge Infrastructures - Creating Value for Policy Makers and Benefits the Community
NASA Astrophysics Data System (ADS)
Arnold, L. M.
2016-12-01
The spatial data infrastructure is arguably one of the most significant advancements in the spatial sector. It has been a game changer for governments, providing for the coordination and sharing of spatial data across organisations and the provision of accessible information to the broader community of users. Today, however, end-users such as policy-makers require far more from these spatial data infrastructures. They want more than just data; they want the knowledge that can be extracted from data, and they don't want to have to download, manipulate and process data in order to get the knowledge they seek. It's time for the spatial sector to reduce its focus on data in spatial data infrastructures and take a more proactive step in emphasising and delivering the knowledge value. Nowadays, decision-makers want to be able to query the data at will to meet their immediate need for knowledge. This is a new value proposal for the decision-making consumer and will require a shift in thinking. This paper presents a model for a Spatial Knowledge Infrastructure and the underpinning methods that will realise a new real-time approach to delivering knowledge. The methods embrace the new capabilities afforded through the semantic web, domain and process ontologies, and natural language query processing. Semantic Web technologies today have the potential to transform the spatial industry into more than just a distribution channel for data. The Semantic Web RDF (Resource Description Framework) enables meaning to be drawn from data automatically. While pushing data out to end-users will remain a central role for data producers, the power of the semantic web is that end-users have the ability to marshal a broad range of spatial resources via a query to extract knowledge from available data. This can be done without actually having to configure systems specifically for the end-user. All data producers need do is make data accessible in RDF, and the spatial analytics does the rest.
BioCarian: search engine for exploratory searches in heterogeneous biological databases.
Zaki, Nazar; Tennakoon, Chandana
2017-10-02
There are a large number of biological databases publicly available to scientists on the web, as well as many private databases generated in the course of research projects. These databases come in a wide variety of formats. Web standards have evolved in recent times, and semantic web technologies are now available to interconnect diverse and heterogeneous sources of data. Therefore, the integration and querying of biological databases can be facilitated by techniques used in the semantic web. Heterogeneous databases can be converted into the Resource Description Framework (RDF) and queried using the SPARQL language. Searching for exact queries in these databases is trivial. However, exploratory searches need customized solutions, especially when multiple databases are involved. This process is cumbersome and time-consuming for those without a sufficient background in computer science. In this context, a search engine facilitating exploratory searches of databases would be of great help to the scientific community. We present BioCarian, an efficient and user-friendly search engine for performing exploratory searches on biological databases. The search engine is an interface for SPARQL queries over RDF databases. We note that many of the databases can be converted to tabular form. We first convert the tabular databases to RDF. The search engine provides a graphical interface based on facets to explore the converted databases. The facet interface is more advanced than conventional facets: it allows complex queries to be constructed, and has additional features such as ranking of facet values based on several criteria, visually indicating the relevance of a facet value, and presenting the most important facet values when a large number of choices are available. For advanced users, SPARQL queries can be run directly on the databases. Using this feature, users are able to incorporate federated searches of SPARQL endpoints.
We used the search engine to do an exploratory search on previously published viral integration data and were able to deduce the main conclusions of the original publication. BioCarian is accessible via http://www.biocarian.com. We have developed a search engine to explore RDF databases that can be used by both novice and advanced users.
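The facet-ranking feature described above can be sketched as frequency counting over the current result set, presenting only the top values when there are many choices. The records and field names below are made up for illustration; BioCarian's actual ranking uses several criteria, not frequency alone.

```python
from collections import Counter

# Hypothetical result set from an RDF-backed search.
records = [
    {"organism": "human", "type": "virus_integration"},
    {"organism": "human", "type": "snp"},
    {"organism": "mouse", "type": "snp"},
    {"organism": "human", "type": "snp"},
]

def facet_values(records, facet, top=2):
    """Rank a facet's values by frequency and keep only the top choices."""
    counts = Counter(r[facet] for r in records)
    return counts.most_common(top)

print(facet_values(records, "organism"))  # [('human', 3), ('mouse', 1)]
```

Selecting a facet value would then narrow `records` and the counts would be recomputed, which is the interaction loop a faceted interface provides.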
Group Centric Information Sharing Using Hierarchical Models
2011-01-01
enable people to create data using RDF, build vocabularies using the Web Ontology Language (OWL), write rules and query data stores using SPARQL [8 ... a strict joined and the document was added with a strict add. In order to represent the fact that an action is allowed (or not), we have created a ... greatly improve the system's readiness to handle any number of access decision queries. a. The pair is tested against the gSIS Join and Add semantics
Kobayashi, Norio; Ishii, Manabu; Takahashi, Satoshi; Mochizuki, Yoshiki; Matsushima, Akihiro; Toyoda, Tetsuro
2011-07-01
Global cloud frameworks for bioinformatics research databases have become huge and heterogeneous, and solutions face diametric challenges comprising cross-integration, retrieval, security and openness. To address this, as of March 2011 organizations including RIKEN had published 192 mammalian, plant and protein life sciences databases containing 8.2 million data records, integrated as Linked Open or Private Data (LOD/LPD) using SciNetS.org, the Scientists' Networking System. The huge quantity of linked data covered by this database integration framework is based on the Semantic Web, where researchers collaborate by managing metadata across public and private databases in a secured data space. This outstripped the data query capacity of existing interface tools such as SPARQL. Actual research also requires specialized tools for data analysis using raw original data. To solve these challenges, in December 2009 we developed the lightweight Semantic-JSON interface to access each fragment of linked and raw life sciences data securely, under the control of programming languages popularly used by bioinformaticians such as Perl and Ruby. Researchers have successfully used the interface across 28 million semantic relationships for biological applications including genome design, sequence processing, inference over phenotype databases, full-text search indexing and human-readable content such as ontology and LOD tree viewers. Semantic-JSON services of SciNetS.org are provided at http://semanticjson.org.
Gazetteer Brokering through Semantic Mediation
NASA Astrophysics Data System (ADS)
Hobona, G.; Bermudez, L. E.; Brackin, R.
2013-12-01
A gazetteer is a geographical directory containing some information regarding places. It provides names, locations and other attributes for places, which may include points of interest (e.g. buildings, oilfields and boreholes) and other features. These features can be published via web services conforming to the Gazetteer Application Profile of the Web Feature Service (WFS) standard of the Open Geospatial Consortium (OGC). Against the backdrop of advances in geophysical surveys, there has been a significant increase in the amount of data referenced to locations. Gazetteer services have played a significant role in facilitating access to such data, including through the provision of specialized queries such as text, spatial and fuzzy search. Recent developments in the OGC have led to advances in gazetteers such as support for multilingualism, diacritics, and querying via advanced spatial constraints (e.g. radial search and nearest-neighbor search). A challenge remaining, however, is that gazetteers produced by different organizations have typically been modeled differently. Inconsistencies between gazetteers produced by different organizations may include naming the same feature in a different way, naming the attributes differently, locating the feature in a different location, and providing fewer or more attributes than the other services. The Gazetteer application profile of the WFS is a starting point for addressing such inconsistencies by providing a standardized interface based on rules specified in ISO 19112, the international standard for spatial referencing by geographic identifiers. The profile, however, does not provide rules to deal with semantic inconsistencies. The USGS and NGA commissioned research into the potential for a Single Point of Entry Global Gazetteer (SPEGG). The research was conducted by the Cross Community Interoperability thread of the OGC testbed referred to as OWS-9.
The testbed prototyped approaches for brokering gazetteers through the use of semantic web technologies, including ontologies and a semantic mediator. The semantically-enhanced SPEGG allowed a client to submit a single query (e.g. 'hills') and to retrieve data from two separate gazetteers with different vocabularies (e.g. where one refers to 'summits', another refers to 'hills'). Supporting the SPEGG was a SPARQL server that held the ontologies and processed queries on them. Earth science surveys and forecasts always have a place on Earth. Being able to share information about a place and resolve inconsistencies about that place across different sources will enable geoscientists to better conduct their research. With the advent of mobile geo-computing and location-based services (LBS), brokering gazetteers will provide geoscientists with access to gazetteer services rich with information and functionality beyond that offered by current generic gazetteers.
RCQ-GA: RDF Chain Query Optimization Using Genetic Algorithms
NASA Astrophysics Data System (ADS)
Hogenboom, Alexander; Milea, Viorel; Frasincar, Flavius; Kaymak, Uzay
The application of Semantic Web technologies in an Electronic Commerce environment implies a need for good support tools. Fast query engines are needed for efficient querying of large amounts of data, usually represented using RDF. We focus on optimizing a special class of SPARQL queries, the so-called RDF chain queries. For this purpose, we devise a genetic algorithm called RCQ-GA that determines the order in which joins need to be performed for an efficient evaluation of RDF chain queries. The approach is benchmarked against a two-phase optimization algorithm, previously proposed in literature. The more complex a query is, the more RCQ-GA outperforms the benchmark in solution quality, execution time needed, and consistency of solution quality. When the algorithms are constrained by a time limit, the overall performance of RCQ-GA compared to the benchmark further improves.
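The essence of RCQ-GA (evolving a join order for an RDF chain query under a cost model) can be sketched with a small evolutionary loop. Everything below is a toy illustration: the selectivities and cost model are invented, and the real algorithm's encoding, crossover and benchmark differ from this mutation-only sketch.

```python
import random

random.seed(0)
SELECTIVITY = [0.9, 0.1, 0.5, 0.3, 0.7]  # hypothetical per-join selectivities

def cost(order):
    """Sum of estimated intermediate result sizes for a given join order."""
    size, total = 1000.0, 0.0
    for j in order:
        size *= SELECTIVITY[j]
        total += size
    return total

def evolve(n_joins=5, pop=20, generations=50, mutation=0.3):
    # Each individual is a permutation of the joins, i.e. a candidate query plan.
    population = [random.sample(range(n_joins), n_joins) for _ in range(pop)]
    for _ in range(generations):
        population.sort(key=cost)
        survivors = population[: pop // 2]        # selection: keep the cheapest half
        children = []
        for parent in survivors:
            child = parent[:]
            if random.random() < mutation:        # mutation: swap two joins
                i, k = random.sample(range(n_joins), 2)
                child[i], child[k] = child[k], child[i]
            children.append(child)
        population = survivors + children
    return min(population, key=cost)

best = evolve()
print(best, round(cost(best), 2))
```

Under this cost model, good plans apply the most selective joins first; the evolutionary search converges toward such orderings without enumerating all permutations, which is the motivation for using a GA on complex queries.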
A semantically rich and standardised approach enhancing discovery of sensor data and metadata
NASA Astrophysics Data System (ADS)
Kokkinaki, Alexandra; Buck, Justin; Darroch, Louise
2016-04-01
The marine environment plays an essential role in the earth's climate. To enhance the ability to monitor the health of this important system, innovative sensors are being produced and combined with state of the art sensor technology. As the number of sensors deployed is continually increasing,, it is a challenge for data users to find the data that meet their specific needs. Furthermore, users need to integrate diverse ocean datasets originating from the same or even different systems. Standards provide a solution to the above mentioned challenges. The Open Geospatial Consortium (OGC) has created Sensor Web Enablement (SWE) standards that enable different sensor networks to establish syntactic interoperability. When combined with widely accepted controlled vocabularies, they become semantically rich and semantic interoperability is achievable. In addition, Linked Data is the recommended best practice for exposing, sharing and connecting information on the Semantic Web using Uniform Resource Identifiers (URIs), Resource Description Framework (RDF) and RDF Query Language (SPARQL). As part of the EU-funded SenseOCEAN project, the British Oceanographic Data Centre (BODC) is working on the standardisation of sensor metadata enabling 'plug and play' sensor integration. Our approach combines standards, controlled vocabularies and persistent URIs to publish sensor descriptions, their data and associated metadata as 5 star Linked Data and OGC SWE (SensorML, Observations & Measurements) standard. Thus sensors become readily discoverable, accessible and useable via the web. Content and context based searching is also enabled since sensors descriptions are understood by machines. Additionally, sensor data can be combined with other sensor or Linked Data datasets to form knowledge. This presentation will describe the work done in BODC to achieve syntactic and semantic interoperability in the sensor domain. 
It will illustrate the reuse and extension of the Semantic Sensor Network (SSN) ontology to Linked Sensor Ontology (LSO) and the steps taken to combine OGC SWE with the Linked Data approach through alignment and embodiment of other ontologies. It will then explain how data and models were annotated with controlled vocabularies to establish unambiguous semantics and interconnect them with data from different sources. Finally, it will introduce the RDF triple store where the sensor descriptions and metadata are stored and can be queried through the standard query language SPARQL. Providing different flavours of machine readable interpretations of sensors, sensor data and metadata enhances discoverability but most importantly allows seamless aggregation of information from different networks that will finally produce knowledge.
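As a concrete illustration of how such a SPARQL endpoint is queried over the web, the sketch below prepares an HTTP GET request for a SELECT query over SOSA/SSN-style sensor descriptions. The endpoint URL is a placeholder, not the actual BODC service, and the query is a hypothetical example of the kind of content-based sensor search the presentation describes.

```python
# Sketch only: "https://example.org/sparql" stands in for a real endpoint.
from urllib.parse import urlencode
from urllib.request import Request

def build_sparql_request(endpoint, query):
    """Prepare an HTTP GET request for a SPARQL endpoint, asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})

# Hypothetical query: find each sensor and the property it observes.
QUERY = """
PREFIX sosa: <http://www.w3.org/ns/sosa/>
SELECT ?sensor ?property WHERE {
  ?sensor a sosa:Sensor ;
          sosa:observes ?property .
}
"""

req = build_sparql_request("https://example.org/sparql", QUERY)
```

Sending `req` with `urllib.request.urlopen` would return the result bindings as JSON, which is what makes the sensor descriptions machine-consumable.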
A semantic web framework to integrate cancer omics data with biological knowledge.
Holford, Matthew E; McCusker, James P; Cheung, Kei-Hoi; Krauthammer, Michael
2012-01-25
The RDF triple provides a simple linguistic means of describing limitless types of information. Triples can be flexibly combined into a unified data source we call a semantic model. Semantic models open new possibilities for the integration of variegated biological data. We use Semantic Web technology to explicate high throughput clinical data in the context of fundamental biological knowledge. We have extended Corvus, a data warehouse which provides a uniform interface to various forms of Omics data, by providing a SPARQL endpoint. With the querying and reasoning tools made possible by the Semantic Web, we were able to explore quantitative semantic models retrieved from Corvus in the light of systematic biological knowledge. For this paper, we merged semantic models containing genomic, transcriptomic and epigenomic data from melanoma samples with two semantic models of functional data - one containing Gene Ontology (GO) data, the other, regulatory networks constructed from transcription factor binding information. These two semantic models were created in an ad hoc manner but support a common interface for integration with the quantitative semantic models. Such combined semantic models allow us to pose significant translational medicine questions. Here, we study the interplay between a cell's molecular state and its response to anti-cancer therapy by exploring the resistance of cancer cells to Decitabine, a demethylating agent. We were able to generate a testable hypothesis to explain how Decitabine fights cancer - namely, that it targets apoptosis-related gene promoters predominantly in Decitabine-sensitive cell lines, thus conveying its cytotoxic effect by activating the apoptosis pathway. Our research provides a framework whereby similar hypotheses can be developed easily.
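The abstract's central idea, that independently produced triple sets merge into one queryable semantic model, can be sketched in a few lines of plain Python. Real systems use an RDF store and SPARQL; the URIs and values below are illustrative only.

```python
# Minimal sketch: merging two "semantic models" (sets of triples) and
# querying the union with a wildcard pattern, as SPARQL variables would.

def match(triples, s=None, p=None, o=None):
    """Return triples matching a (subject, predicate, object) pattern;
    None acts as a wildcard, like a variable in SPARQL."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# One model with (made-up) expression data, another with a GO annotation.
expression = {("gene:BAX", "ex:log2fc", 1.8)}
ontology   = {("gene:BAX", "go:annotatedWith", "GO:0006915")}  # apoptosis

model = expression | ontology          # merging is just set union
hits = match(model, s="gene:BAX")      # everything known about one gene
```

The point of the sketch is that nothing special is needed to combine the two sources: because both are triples about shared identifiers, the union is immediately queryable.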
Towards the novel reasoning among particles in PSO by the use of RDF and SPARQL.
Fister, Iztok; Yang, Xin-She; Ljubič, Karin; Fister, Dušan; Brest, Janez; Fister, Iztok
2014-01-01
The significant development of the Internet has posed new challenges, and many new programming tools have been developed to address them. Today, the Semantic Web is a modern paradigm for representing and accessing knowledge data on the Internet. This paper applies Semantic Web tools such as the Resource Description Framework (RDF) and the SPARQL query language to optimization. These tools are combined with particle swarm optimization (PSO), where the selection of the best solutions depends on their fitness. Instead of the local best solution, a neighborhood of solutions for each particle can be defined and used for the calculation of the new position, based on key ideas from the Semantic Web domain. Preliminary results on ten benchmark functions are promising, and thus this method warrants further investigation.
Earth-Base: A Free And Open Source, RESTful Earth Sciences Platform
NASA Astrophysics Data System (ADS)
Kishor, P.; Heim, N. A.; Peters, S. E.; McClennen, M.
2012-12-01
This presentation describes the motivation, concept, and architecture behind Earth-Base, a web-based, RESTful data-management, analysis and visualization platform for earth sciences data. Traditionally, web applications have been built to access data directly from a database using a scripting language. While such applications are great at bringing results to a wide audience, they are limited in scope to the imagination and capabilities of the application developer. Earth-Base decouples the data store from the web application by introducing an intermediate "data application" tier. The data application's job is to query the data store using self-documented, RESTful URIs, and send the results back formatted as JavaScript Object Notation (JSON). Decoupling the data store from the application allows virtually limitless flexibility in developing applications, whether web-based for human consumption or programmatic for machine consumption. It also allows outside developers to use the data in their own applications, potentially creating applications that the original data creator and app developer may not even have thought of. Standardized specifications for URI-based querying and JSON-formatted results make querying and developing applications easy. URI-based querying also makes it easy to work with distributed datasets. Companion mechanisms for querying data snapshots ("time travel"), usage tracking and license management, and verification of semantic equivalence of data are also described. The latter promotes the "What You Expect Is What You Get" (WYEIWYG) principle, which can aid in data citation and verification.
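The "data application" tier described above can be sketched as a function that turns a RESTful URI into a query over the data store and returns JSON. The path, fields, and in-memory table below are hypothetical stand-ins, not Earth-Base's actual API.

```python
# Sketch of a data-application tier: RESTful URI in, JSON out.
import json
from urllib.parse import urlparse, parse_qs

# A tiny in-memory table standing in for the real data store.
ROCKS = [
    {"id": 1, "period": "Cambrian", "lithology": "shale"},
    {"id": 2, "period": "Ordovician", "lithology": "limestone"},
]

def handle(uri):
    """Answer a self-describing GET such as /api/rocks?period=Cambrian."""
    parts = urlparse(uri)
    filters = {k: v[0] for k, v in parse_qs(parts.query).items()}
    rows = [r for r in ROCKS
            if all(str(r.get(k)) == v for k, v in filters.items())]
    return json.dumps({"uri": uri, "count": len(rows), "records": rows})

payload = handle("/api/rocks?period=Cambrian")
```

Because the response is plain JSON keyed by the request URI, any outside developer (or another web application) can consume it without knowing anything about the underlying database, which is exactly the decoupling the abstract argues for.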
Validation and discovery of genotype-phenotype associations in chronic diseases using linked data.
Pathak, Jyotishman; Kiefer, Richard; Freimuth, Robert; Chute, Christopher
2012-01-01
This study investigates federated SPARQL queries over Linked Open Data (LOD) in the Semantic Web to validate existing, and potentially discover new genotype-phenotype associations from public datasets. In particular, we report our preliminary findings for identifying such associations for commonly occurring chronic diseases using the Online Mendelian Inheritance in Man (OMIM) and Database for SNPs (dbSNP) within the LOD knowledgebase and compare them with Gene Wiki for coverage and completeness. Our results indicate that Semantic Web technologies can play an important role for in-silico identification of novel disease-gene-SNP associations, although additional verification is required before such information can be applied and used effectively.
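The federated-query mechanism this study relies on is SPARQL 1.1's SERVICE keyword, which delegates part of one query to a remote endpoint. The sketch below composes such a query as a string; the endpoint URLs and predicates are placeholders, not the real OMIM or dbSNP graphs in the LOD cloud.

```python
# Sketch: composing a federated SPARQL query joining a disease-gene graph
# with a SNP-gene graph on the shared ?gene variable. Endpoints and
# predicates are illustrative placeholders.

def federated_query(omim_endpoint, dbsnp_endpoint):
    return f"""
SELECT ?disease ?gene ?snp WHERE {{
  SERVICE <{omim_endpoint}> {{
    ?disease <ex:associatedGene> ?gene .
  }}
  SERVICE <{dbsnp_endpoint}> {{
    ?snp <ex:locatedInGene> ?gene .
  }}
}}
"""

q = federated_query("https://omim.example/sparql", "https://dbsnp.example/sparql")
```

Submitting `q` to any SPARQL 1.1 endpoint that supports federation would join the two remote datasets on `?gene`, which is the pattern that lets disease-gene-SNP associations be validated across sources without local data integration.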
Semantic Enhancement for Enterprise Data Management
NASA Astrophysics Data System (ADS)
Ma, Li; Sun, Xingzhi; Cao, Feng; Wang, Chen; Wang, Xiaoyuan; Kanellos, Nick; Wolfson, Dan; Pan, Yue
Taking customer data as an example, this paper presents an approach to enhance the management of enterprise data by using Semantic Web technologies. Customer data is the most important kind of core business entity a company uses repeatedly across many business processes and systems, and customer data management (CDM) is becoming critical for enterprises because it keeps a single, complete and accurate record of customers across the enterprise. Existing CDM systems focus on integrating customer data from all customer-facing channels and front- and back-office systems through multiple interfaces, as well as publishing customer data to different applications. To make effective use of the CDM system, this paper investigates semantic query and analysis over the integrated and centralized customer data, enabling automatic classification and relationship discovery. We have implemented these features over IBM WebSphere Customer Center and demonstrated the prototype to our clients. We believe that our study and experiences are valuable for both the Semantic Web and data management communities.
Managing biomedical image metadata for search and retrieval of similar images.
Korenblum, Daniel; Rubin, Daniel; Napel, Sandy; Rodriguez, Cesar; Beaulieu, Chris
2011-08-01
Radiology images are generally disconnected from the metadata describing their contents, such as imaging observations ("semantic" metadata), which are usually described in text reports that are not directly linked to the images. We developed a system, the Biomedical Image Metadata Manager (BIMM) to (1) address the problem of managing biomedical image metadata and (2) facilitate the retrieval of similar images using semantic feature metadata. Our approach allows radiologists, researchers, and students to take advantage of the vast and growing repositories of medical image data by explicitly linking images to their associated metadata in a relational database that is globally accessible through a Web application. BIMM receives input in the form of standard-based metadata files using Web service and parses and stores the metadata in a relational database allowing efficient data query and maintenance capabilities. Upon querying BIMM for images, 2D regions of interest (ROIs) stored as metadata are automatically rendered onto preview images included in search results. The system's "match observations" function retrieves images with similar ROIs based on specific semantic features describing imaging observation characteristics (IOCs). We demonstrate that the system, using IOCs alone, can accurately retrieve images with diagnoses matching the query images, and we evaluate its performance on a set of annotated liver lesion images. BIMM has several potential applications, e.g., computer-aided detection and diagnosis, content-based image retrieval, automating medical analysis protocols, and gathering population statistics like disease prevalences. The system provides a framework for decision support systems, potentially improving their diagnostic accuracy and selection of appropriate therapies.
CDAO-Store: Ontology-driven Data Integration for Phylogenetic Analysis
2011-01-01
Background The Comparative Data Analysis Ontology (CDAO) is an ontology developed, as part of the EvoInfo and EvoIO groups supported by the National Evolutionary Synthesis Center, to provide semantic descriptions of data and transformations commonly found in the domain of phylogenetic analysis. The core concepts of the ontology enable the description of phylogenetic trees and associated character data matrices. Results Using CDAO as the semantic back-end, we developed a triple-store, named CDAO-Store. CDAO-Store is an RDF-based store of phylogenetic data, including a complete import of TreeBASE. CDAO-Store provides a programmatic interface, in the form of web services, and a web-based front-end, to perform both user-defined and domain-specific queries; domain-specific queries include searching for nearest common ancestors and minimum spanning clades, and filtering trees in the store by size, author, taxa, tree identifier, algorithm, or method. In addition, CDAO-Store provides a visualization front-end, called CDAO-Explorer, which can be used to view both character data matrices and trees extracted from the CDAO-Store. CDAO-Store provides import capabilities, enabling the addition of new data to the triple-store; files in PHYLIP, MEGA, NeXML, and NEXUS formats can be imported and their CDAO representations added to the triple-store. Conclusions CDAO-Store is made up of a versatile and integrated set of tools to support phylogenetic analysis. To the best of our knowledge, CDAO-Store is the first semantically-aware repository of phylogenetic data with domain-specific querying capabilities. The portal to CDAO-Store is available at http://www.cs.nmsu.edu/~cdaostore. PMID:21496247
CDAO-store: ontology-driven data integration for phylogenetic analysis.
Chisham, Brandon; Wright, Ben; Le, Trung; Son, Tran Cao; Pontelli, Enrico
2011-04-15
The Comparative Data Analysis Ontology (CDAO) is an ontology developed, as part of the EvoInfo and EvoIO groups supported by the National Evolutionary Synthesis Center, to provide semantic descriptions of data and transformations commonly found in the domain of phylogenetic analysis. The core concepts of the ontology enable the description of phylogenetic trees and associated character data matrices. Using CDAO as the semantic back-end, we developed a triple-store, named CDAO-Store. CDAO-Store is an RDF-based store of phylogenetic data, including a complete import of TreeBASE. CDAO-Store provides a programmatic interface, in the form of web services, and a web-based front-end, to perform both user-defined and domain-specific queries; domain-specific queries include searching for nearest common ancestors and minimum spanning clades, and filtering trees in the store by size, author, taxa, tree identifier, algorithm, or method. In addition, CDAO-Store provides a visualization front-end, called CDAO-Explorer, which can be used to view both character data matrices and trees extracted from the CDAO-Store. CDAO-Store provides import capabilities, enabling the addition of new data to the triple-store; files in PHYLIP, MEGA, NeXML, and NEXUS formats can be imported and their CDAO representations added to the triple-store. CDAO-Store is made up of a versatile and integrated set of tools to support phylogenetic analysis. To the best of our knowledge, CDAO-Store is the first semantically-aware repository of phylogenetic data with domain-specific querying capabilities. The portal to CDAO-Store is available at http://www.cs.nmsu.edu/~cdaostore.
NASA Astrophysics Data System (ADS)
Gross, M. B.; Mayernik, M. S.; Rowan, L. R.; Khan, H.; Boler, F. M.; Maull, K. E.; Stott, D.; Williams, S.; Corson-Rikert, J.; Johns, E. M.; Daniels, M. D.; Krafft, D. B.
2015-12-01
UNAVCO, UCAR, and Cornell University are working together to leverage semantic web technologies to enable discovery of people, datasets, publications and other research products, as well as the connections between them. The EarthCollab project, an EarthCube Building Block, is enhancing an existing open-source semantic web application, VIVO, to address connectivity gaps across distributed networks of researchers and resources related to the following two geoscience-based communities: (1) the Bering Sea Project, an interdisciplinary field program whose data archive is hosted by NCAR's Earth Observing Laboratory (EOL), and (2) UNAVCO, a geodetic facility and consortium that supports diverse research projects informed by geodesy. People, publications, datasets and grant information have been mapped to an extended version of the VIVO-ISF ontology and ingested into VIVO's database. Data is ingested using a custom set of scripts that include the ability to perform basic automated and curated disambiguation. VIVO can display a page for every object ingested, including connections to other objects in the VIVO database. A dataset page, for example, includes the dataset type, time interval, DOI, related publications, and authors. The dataset type field provides a connection to all other datasets of the same type. The author's page will show, among other information, related datasets and co-authors. Information previously spread across several unconnected databases is now stored in a single location. In addition to VIVO's default display, the new database can also be queried using SPARQL, a query language for semantic data. EarthCollab will also extend the VIVO web application. One such extension is the ability to cross-link separate VIVO instances across institutions, allowing local display of externally curated information. 
For example, Cornell's VIVO faculty pages will display UNAVCO's dataset information and UNAVCO's VIVO will display Cornell faculty member contact and position information. Additional extensions, including enhanced geospatial capabilities, will be developed following task-centered usability testing.
An Ontology-Based Approach to Incorporate User-Generated Geo-Content Into Sdi
NASA Astrophysics Data System (ADS)
Deng, D.-P.; Lemmens, R.
2011-08-01
The Web is changing the way people share and communicate information due to the emergence of various Web technologies that enable people to contribute information on the Web. User-Generated Geo-Content (UGGC) is a potential resource of geographic information. Due to its different production methods, UGGC often cannot fit into formal geographic information models; there is a semantic gap between UGGC and formal geographic information. To integrate UGGC into geographic information, this study conducts an ontology-based process to bridge this semantic gap. This ontology-based process includes five steps: Collection, Extraction, Formalization, Mapping, and Deployment. In addition, this study applies the process to Twitter messages relevant to the Japan earthquake disaster. Using this process, we extract disaster relief information from Twitter messages and develop a knowledge base for GeoSPARQL queries over disaster relief information.
A technological infrastructure to sustain Internetworked Enterprises
NASA Astrophysics Data System (ADS)
La Mattina, Ernesto; Savarino, Vincenzo; Vicari, Claudia; Storelli, Davide; Bianchini, Devis
In the Web 3.0 scenario, where information and services are connected by means of their semantics, organizations can improve their competitive advantage by publishing their business and service descriptions. In this scenario, Semantic Peer-to-Peer (P2P) can play a key role in defining dynamic and highly reconfigurable infrastructures. Organizations can share knowledge and services, using this infrastructure to move towards value networks, an emerging organizational model characterized by fluid boundaries and complex relationships. This chapter collects and defines the technological requirements and architecture of a modular, multi-layer Peer-to-Peer infrastructure for SOA-based applications. This technological infrastructure, based on the combination of Semantic Web and P2P technologies, is intended to sustain Internetworked Enterprise configurations, defining a distributed registry and enabling more expressive queries and efficient routing mechanisms. The following sections focus on the overall architecture, while describing the layers that form it.
Dynamic User Interfaces for Service Oriented Architectures in Healthcare.
Schweitzer, Marco; Hoerbst, Alexander
2016-01-01
Electronic Health Records (EHRs) play a crucial role in healthcare today. From a data-centric view, EHRs are very advanced, as they provide and share healthcare data in a cross-institutional and patient-centered way adhering to high syntactic and semantic interoperability. However, the EHR functionalities available to end users are scarce and hence often limited to basic document query functions. Future EHR use necessitates letting users define the data they need in a given situation and how those data should be processed. Workflow and semantic modelling approaches, as well as Web services, provide means to fulfil this goal. This thesis develops concepts for dynamic interfaces between EHR end users and a service-oriented eHealth infrastructure, which allow users to model their flexible EHR needs in a dynamic and formal way. These models are used to discover, compose and execute the right Semantic Web services.
Kobayashi, Norio; Ishii, Manabu; Takahashi, Satoshi; Mochizuki, Yoshiki; Matsushima, Akihiro; Toyoda, Tetsuro
2011-01-01
Global cloud frameworks for bioinformatics research databases become huge and heterogeneous; solutions face various diametric challenges comprising cross-integration, retrieval, security and openness. To address this, as of March 2011 organizations including RIKEN published 192 mammalian, plant and protein life sciences databases having 8.2 million data records, integrated as Linked Open or Private Data (LOD/LPD) using SciNetS.org, the Scientists' Networking System. The huge quantity of linked data this database integration framework covers is based on the Semantic Web, where researchers collaborate by managing metadata across public and private databases in a secured data space. This outstripped the data query capacity of existing interface tools like SPARQL. Actual research also requires specialized tools for data analysis using raw original data. To solve these challenges, in December 2009 we developed the lightweight Semantic-JSON interface to access each fragment of linked and raw life sciences data securely under the control of programming languages popularly used by bioinformaticians such as Perl and Ruby. Researchers successfully used the interface across 28 million semantic relationships for biological applications including genome design, sequence processing, inference over phenotype databases, full-text search indexing and human-readable contents like ontology and LOD tree viewers. Semantic-JSON services of SciNetS.org are provided at http://semanticjson.org. PMID:21632604
A Practical Ontology Query Expansion Algorithm for Semantic-Aware Learning Objects Retrieval
ERIC Educational Resources Information Center
Lee, Ming-Che; Tsai, Kun Hua; Wang, Tzone I.
2008-01-01
Following the rapid development of Internet, particularly web page interaction technology, distant e-learning has become increasingly realistic and popular. To solve the problems associated with sharing and reusing teaching materials in different e-learning systems, several standard formats, including SCORM, IMS, LOM, and AICC, etc., recently have…
A semantic web framework to integrate cancer omics data with biological knowledge
2012-01-01
Background The RDF triple provides a simple linguistic means of describing limitless types of information. Triples can be flexibly combined into a unified data source we call a semantic model. Semantic models open new possibilities for the integration of variegated biological data. We use Semantic Web technology to explicate high throughput clinical data in the context of fundamental biological knowledge. We have extended Corvus, a data warehouse which provides a uniform interface to various forms of Omics data, by providing a SPARQL endpoint. With the querying and reasoning tools made possible by the Semantic Web, we were able to explore quantitative semantic models retrieved from Corvus in the light of systematic biological knowledge. Results For this paper, we merged semantic models containing genomic, transcriptomic and epigenomic data from melanoma samples with two semantic models of functional data - one containing Gene Ontology (GO) data, the other, regulatory networks constructed from transcription factor binding information. These two semantic models were created in an ad hoc manner but support a common interface for integration with the quantitative semantic models. Such combined semantic models allow us to pose significant translational medicine questions. Here, we study the interplay between a cell's molecular state and its response to anti-cancer therapy by exploring the resistance of cancer cells to Decitabine, a demethylating agent. Conclusions We were able to generate a testable hypothesis to explain how Decitabine fights cancer - namely, that it targets apoptosis-related gene promoters predominantly in Decitabine-sensitive cell lines, thus conveying its cytotoxic effect by activating the apoptosis pathway. Our research provides a framework whereby similar hypotheses can be developed easily. PMID:22373303
linkedISA: semantic representation of ISA-Tab experimental metadata.
González-Beltrán, Alejandra; Maguire, Eamonn; Sansone, Susanna-Assunta; Rocca-Serra, Philippe
2014-01-01
Reporting and sharing experimental metadata, such as the experimental design, characteristics of the samples, and procedures applied, along with the analysis results, in a standardised manner ensures that datasets are comprehensible and, in principle, reproducible, comparable and reusable. Furthermore, sharing datasets in formats designed for consumption by both humans and machines maximizes their use. The Investigation/Study/Assay (ISA) open source metadata tracking framework facilitates standards-compliant collection, curation, visualization, storage and sharing of datasets, leveraging other platforms to enable analysis and publication. The ISA software suite includes several components used in an increasingly diverse set of life science and biomedical domains; it is underpinned by a general-purpose format, ISA-Tab, and conversions exist into formats required by public repositories. While ISA-Tab works well mainly as a human-readable format, we have also implemented a linked data approach to semantically define the ISA-Tab syntax. We present a semantic web representation of the ISA-Tab syntax that complements ISA-Tab's syntactic interoperability with semantic interoperability. We introduce the linkedISA conversion tool from ISA-Tab to the Resource Description Framework (RDF), supporting mappings from the ISA syntax to multiple community-defined, open ontologies and capitalising on user-provided ontology annotations in the experimental metadata. We describe insights from the implementation and how annotations can be expanded driven by the metadata. We applied the conversion tool as part of Bio-GraphIIn, a web-based application supporting integration of the semantically-rich experimental descriptions.
Designed in a user-friendly manner, the Bio-GraphIIn interface hides most of the complexity from users, exposing a familiar tabular view of the experimental description to allow seamless interaction with the RDF representation, and visualising descriptors to drive queries over the semantic representation of the experimental design. In addition, we defined queries over the linkedISA RDF representation and demonstrated its use on the linkedISA conversion of datasets from the Nature Scientific Data online publication. Our linked data approach has allowed us to: 1) make the ISA-Tab semantics explicit and machine-processable, 2) exploit the existing ontology-based annotations in the ISA-Tab experimental descriptions, 3) augment the ISA-Tab syntax with new descriptive elements, and 4) visualise and query elements related to the experimental design. Reasoning over ISA-Tab metadata and associated data will facilitate data integration and knowledge discovery.
Semantic similarity measures in the biomedical domain by leveraging a web search engine.
Hsieh, Sheau-Ling; Chang, Wen-Yung; Chen, Chi-Huang; Weng, Yung-Ching
2013-07-01
Various studies of web-related semantic similarity measures have been conducted. However, measuring semantic similarity between two terms remains a challenging task. Traditional ontology-based methodologies have the limitation that both concepts must reside in the same ontology tree(s); in practice, this assumption is not always applicable. Corpus-based methodologies, on the other hand, can overcome this limitation if the corpus is sufficiently large, and the web is a continuously and enormously growing corpus. Therefore, a method of estimating semantic similarity is proposed that exploits the page counts of two biomedical concepts returned by the Google AJAX web search engine. The features are extracted as the co-occurrence patterns of two given terms P and Q, by querying P, Q, as well as P AND Q, and the web search hit counts of defined lexico-syntactic patterns. These similarity scores of different patterns are evaluated, by adapting support vector machines for classification, to leverage the robustness of semantic similarity measures. Experimental results validated against two datasets, dataset 1 provided by A. Hliaoutakis and dataset 2 provided by T. Pedersen, are presented and discussed. In dataset 1, the proposed approach achieves the best correlation coefficient (0.802) under SNOMED-CT. In dataset 2, the proposed method obtains the best correlation coefficient (SNOMED-CT: 0.705; MeSH: 0.723) with physician scores compared with other methods; however, the correlation coefficients (SNOMED-CT: 0.496; MeSH: 0.539) with coder scores showed the opposite outcome. In conclusion, the semantic similarity findings of the proposed method are close to physicians' ratings. Furthermore, the study provides a cornerstone investigation for extracting fully relevant information from digitized, free-text medical records in the National Taiwan University Hospital database.
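One widely used page-count similarity of the kind the abstract describes is the WebJaccard coefficient over hit counts for P, Q, and "P AND Q". The sketch below shows that measure as an illustration of the general idea; the study combines several such co-occurrence patterns with an SVM, so this is not the authors' exact feature set, and the hit counts are made up.

```python
# Hedged sketch: WebJaccard similarity from search-engine page counts.

def web_jaccard(hits_p, hits_q, hits_pq, threshold=5):
    """Jaccard coefficient over hit counts for P, Q, and 'P AND Q'.
    Very small co-occurrence counts are treated as noise and scored 0."""
    if hits_pq < threshold:
        return 0.0
    return hits_pq / (hits_p + hits_q - hits_pq)

# e.g. P = "myocardial infarction", Q = "heart attack" (illustrative counts)
score = web_jaccard(hits_p=12000, hits_q=15000, hits_pq=9000)
```

Each co-occurrence pattern yields one such score, and the classifier then learns how to weight the patterns against expert ratings.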
Query Auto-Completion Based on Word2vec Semantic Similarity
NASA Astrophysics Data System (ADS)
Shao, Taihua; Chen, Honghui; Chen, Wanyu
2018-04-01
Query auto-completion (QAC) is the first step of information retrieval, which helps users formulate the entire query after inputting only a few prefixes. Regarding the models of QAC, the traditional method ignores the contribution from the semantic relevance between queries. However, similar queries always express extremely similar search intention. In this paper, we propose a hybrid model FS-QAC based on query semantic similarity as well as the query frequency. We choose word2vec method to measure the semantic similarity between intended queries and pre-submitted queries. By combining both features, our experiments show that FS-QAC model improves the performance when predicting the user’s query intention and helping formulate the right query. Our experimental results show that the optimal hybrid model contributes to a 7.54% improvement in terms of MRR against a state-of-the-art baseline using the public AOL query logs.
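A frequency-plus-similarity hybrid ranker in the spirit of FS-QAC can be sketched as below. The toy 2-d "embeddings", the candidate queries, and the 0.5 mixing weight are all assumptions for illustration; the paper tunes this trade-off and uses real word2vec vectors.

```python
# Minimal sketch of a hybrid QAC score: lam * normalized frequency
# + (1 - lam) * embedding similarity to the intended query.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def hybrid_score(candidate, intent_vec, vectors, freq, lam=0.5):
    """Blend popularity with semantic closeness to the user's intent."""
    max_f = max(freq.values())
    return lam * (freq[candidate] / max_f) + (1 - lam) * cosine(vectors[candidate], intent_vec)

vectors = {"new york weather": (1.0, 0.1), "new year songs": (0.1, 1.0)}
freq    = {"new york weather": 80,         "new year songs": 100}
intent  = (0.9, 0.2)   # toy vector for what the user means by "new y..."

ranked = sorted(freq, key=lambda q: hybrid_score(q, intent, vectors, freq), reverse=True)
```

With these toy numbers the semantically closer completion outranks the more popular one, which is exactly the effect the frequency-only baseline misses.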
Modeling and formal representation of geospatial knowledge for the Geospatial Semantic Web
NASA Astrophysics Data System (ADS)
Huang, Hong; Gong, Jianya
2008-12-01
GML can achieve geospatial interoperation only at the syntactic level. However, in most cases it is necessary to first resolve differences in spatial cognition, so ontologies were introduced to describe geospatial information and services. But it is obviously difficult and improper to let users find, match and compose services themselves, especially when complicated business logic is involved. Currently, with the gradual introduction of Semantic Web technology (e.g., OWL, SWRL), the focus of geospatial information interoperation has shifted from the syntactic level to the semantic, and even the automatic, intelligent level. In this way, the Geospatial Semantic Web (GSM) can be put forward as an augmentation to the Semantic Web that additionally includes geospatial abstractions as well as related reasoning, representation and query mechanisms. To advance the implementation of GSM, we first attempt to construct a mechanism for modeling and formally representing geospatial knowledge, which are also the two most foundational phases in knowledge engineering (KE). Our attitude in this paper is quite pragmatic: we argue that geospatial context is a formal model of the discriminating environmental characteristics of geospatial knowledge, and that the derivation, understanding and use of geospatial knowledge are located in geospatial context. Therefore, first, we put forward a primitive hierarchy of geospatial knowledge referencing first-order logic, formal ontologies, rules and GML. Second, a metamodel of geospatial context is proposed, and we use the modeling methods and representation languages of formal ontologies to process geospatial context. Third, we extend the Web Processing Service (WPS) to be compatible with local DLLs for geoprocessing and to possess inference capabilities based on OWL.
Chiba, Hirokazu; Nishide, Hiroyo; Uchiyama, Ikuo
2015-01-01
Recently, various types of biological data, including genomic sequences, have been rapidly accumulating. To discover biological knowledge from such growing heterogeneous data, a flexible framework for data integration is necessary. Ortholog information is a central resource for interlinking corresponding genes among different organisms, and the Semantic Web provides a key technology for the flexible integration of heterogeneous data. We have constructed an ortholog database using Semantic Web technology, aiming at the integration of numerous genomic data and various types of biological information. To formalize the structure of the ortholog information in the Semantic Web, we have constructed the Ortholog Ontology (OrthO). While OrthO is a compact ontology for general use, it is designed to be extended to the description of database-specific concepts. On the basis of OrthO, we described the ortholog information from our Microbial Genome Database for Comparative Analysis (MBGD) in the form of the Resource Description Framework (RDF) and made it available through a SPARQL endpoint, which accepts arbitrary queries specified by users. In this framework based on OrthO, the biological data of different organisms can be integrated using the ortholog information as a hub. Moreover, the ortholog information from different data sources can be compared with each other using OrthO as a shared ontology. Here we show some examples demonstrating that ortholog information described in RDF can be used to link various biological data such as taxonomy information and the Gene Ontology. Thus, an ortholog database using Semantic Web technology can contribute to biological knowledge discovery through integrative data analysis.
Content-based image retrieval with ontological ranking
NASA Astrophysics Data System (ADS)
Tsai, Shen-Fu; Tsai, Min-Hsuan; Huang, Thomas S.
2010-02-01
Images are a much more powerful medium of expression than text, as the adage says: "One picture is worth a thousand words." Compared with text consisting of an array of words, an image has more degrees of freedom and therefore a more complicated structure. However, the less constrained structure of images presents researchers in the computer vision community with the tough task of teaching machines to understand and organize images, especially when only a limited number of learning examples and limited background knowledge are given. The advance of internet and web technology in the past decade has changed the way humans gain knowledge. People can now exchange knowledge with others by discussing and contributing information on the web. As a result, web pages on the internet have become a living and growing source of information. One is therefore tempted to wonder whether machines can learn from this web knowledge base as well. Indeed, it is possible to make computers learn from the internet and provide humans with more meaningful knowledge. In this work, we explore this novel possibility on image understanding applied to semantic image search. We exploit web resources to obtain links from images to keywords and a semantic ontology constituting humans' general knowledge. The former maps visual content to related text, in contrast to the traditional way of associating images with surrounding text; the latter provides relations between concepts so that machines can understand to what extent and in what sense an image is close to the image search query. With the aid of these two tools, the resulting image search system is thus content-based and, moreover, organized. The returned images are ranked and organized such that semantically similar images are grouped together and given a rank based on their semantic closeness to the input query.
The novelty of the system is twofold: first, images are retrieved not only based on text cues but their actual contents as well; second, the grouping is different from pure visual similarity clustering. More specifically, the inferred concepts of each image in the group are examined in the context of a huge concept ontology to determine their true relations with what people have in mind when doing image search.
Isosemantic rendering of clinical information using formal ontologies and RDF.
Martínez-Costa, Catalina; Bosca, Diego; Legaz-García, Mari Carmen; Tao, Cui; Fernández Breis, Jesualdo Tomás; Schulz, Stefan; Chute, Christopher G
2013-01-01
The generation of a semantic clinical infostructure requires linking ontologies, clinical models and terminologies [1]. Here we describe an approach that would permit data coming from different sources and represented in different standards to be queried in a homogeneous and integrated way. Our assumption is that data providers should be able to agree and share the meaning of the data they want to exchange and to exploit. We will describe how Clinical Element Model (CEM) and OpenEHR datasets can be jointly exploited in Semantic Web environments.
Ontology for Transforming Geo-Spatial Data for Discovery and Integration of Scientific Data
NASA Astrophysics Data System (ADS)
Nguyen, L.; Chee, T.; Minnis, P.
2013-12-01
Discovery of and access to geo-spatial scientific data across heterogeneous repositories and multi-discipline datasets can present challenges for scientists. We propose to build a workflow for transforming geo-spatial datasets into a semantic environment by using relationships to describe resources with the OWL Web Ontology Language, RDF, and a proposed geo-spatial vocabulary. We will present methods for transforming traditional scientific datasets, the use of a semantic repository, and querying with SPARQL to integrate and access datasets. This unique repository will enable discovery of scientific data by geospatial bounds or other criteria.
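The transformation step the abstract proposes, describing a dataset resource with RDF relationships, can be sketched as a small serializer that turns one record into N-Triples. The subject URI, predicate URIs, and WKT footprint below are hypothetical placeholders for the proposed geo-spatial vocabulary:

```python
# Minimal sketch: serialize one geo-spatial record as N-Triples lines.
# URI-valued objects are wrapped in <...>, everything else is a literal.
def record_to_ntriples(record_uri, props):
    lines = []
    for pred, value in props.items():
        if isinstance(value, str) and value.startswith("http"):
            obj = f"<{value}>"   # object is itself a resource
        else:
            obj = f'"{value}"'   # plain literal
        lines.append(f"<{record_uri}> <{pred}> {obj} .")
    return "\n".join(lines)

nt = record_to_ntriples(
    "http://example.org/data/cloud_product_42",   # hypothetical dataset URI
    {
        "http://example.org/geo#boundingBox":
            "POLYGON((-95 30, -90 30, -90 35, -95 35, -95 30))",
        "http://purl.org/dc/terms/creator": "http://example.org/team/langley",
    },
)
print(nt)
```

Loading such triples into a semantic repository then makes the dataset discoverable through SPARQL filters over the footprint and other properties.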
WebGIS based on semantic grid model and web services
NASA Astrophysics Data System (ADS)
Zhang, WangFei; Yue, CaiRong; Gao, JianGuo
2009-10-01
As the meeting point of network technology and GIS technology, WebGIS has developed rapidly in recent years. Constrained by the Web on one hand and by the characteristics of GIS on the other, traditional WebGIS faces some prominent development problems: for example, it cannot achieve interoperability across heterogeneous spatial databases, nor cross-platform data access. The appearance of Web Services and Grid technology has brought great change to the WebGIS field. Web Services provide interfaces that give different sites the ability to share data and intercommunicate. The goal of Grid technology is to turn the internet into one large supercomputer with which computing resources, storage resources, data resources, information resources, knowledge resources, and expert resources can be shared efficiently. For WebGIS, however, this achieves only the physical connection of data and information, which is far from enough. Because of different understandings of the world, different professional regulations, different policies, and different habits, experts in different fields reach different conclusions when observing the same geographic phenomenon, and semantic heterogeneity arises; as a result, the same concept can differ greatly between fields. A WebGIS that ignores this semantic heterogeneity will answer users' questions wrongly, or not at all. To solve this problem, this paper puts forward, and reports experience with, an effective method of combining the semantic grid and Web Services technology to develop WebGIS.
In this paper, we study methods for constructing ontologies and for combining Grid technology with Web Services and, through a detailed analysis of the computing characteristics and application model of distributed data, we design an ontology-driven WebGIS query system based on Grid technology and Web Services.
Semantic orchestration of image processing services for environmental analysis
NASA Astrophysics Data System (ADS)
Ranisavljević, Élisabeth; Devin, Florent; Laffly, Dominique; Le Nir, Yannick
2013-09-01
In order to analyze environmental dynamics, a major process is the classification of the different phenomena of the site (e.g., ice and snow for a glacier). When using in situ pictures, this classification requires data pre-processing. Not all pictures need the same sequence of processes, depending on the disturbances present. Until now, these sequences have been assembled manually, which restricts the processing of large amounts of data. In this paper, we present how to realize semantic orchestration to automate this sequencing for the analysis. It combines two advantages: it solves the problem of processing volume and diversifies the possibilities in data processing. We define a BPEL description to express the sequences. This BPEL uses web services to run the data processing, and each web service is semantically annotated using an ontology of image processing. The dynamic modification of the BPEL is done using SPARQL queries on these annotated web services. The results obtained by a prototype implementing this method validate the construction of the different workflows, which can be applied to a large number of pictures.
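The selection step at the heart of this orchestration, matching detected disturbances against semantically annotated services, can be sketched with a toy registry. The service names and annotation keys are invented stand-ins for the real ontology-backed annotations:

```python
# Toy registry of image-processing services, each annotated with the
# disturbance it corrects. Names and annotations are illustrative only.
SERVICES = {
    "deblur_svc":   {"corrects": "blur"},
    "dehaze_svc":   {"corrects": "fog"},
    "reexpose_svc": {"corrects": "overexposure"},
}

def build_sequence(disturbances):
    """Return the service names handling each detected disturbance, in order."""
    return [name
            for d in disturbances
            for name, ann in SERVICES.items()
            if ann["corrects"] == d]

print(build_sequence(["fog", "blur"]))  # -> ['dehaze_svc', 'deblur_svc']
```

In the actual system this lookup would be a SPARQL query over the annotations, and the resulting sequence would be spliced into the BPEL workflow.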
Tao, Cui; Jiang, Guoqian; Oniki, Thomas A; Freimuth, Robert R; Zhu, Qian; Sharma, Deepak; Pathak, Jyotishman; Huff, Stanley M; Chute, Christopher G
2013-05-01
The clinical element model (CEM) is an information model designed for representing clinical information in electronic health records (EHR) systems across organizations. The current representation of CEMs does not support formal semantic definitions and therefore it is not possible to perform reasoning and consistency checking on derived models. This paper introduces our efforts to represent the CEM specification using the Web Ontology Language (OWL). The CEM-OWL representation connects the CEM content with the Semantic Web environment, which provides authoring, reasoning, and querying tools. This work may also facilitate the harmonization of the CEMs with domain knowledge represented in terminology models as well as other clinical information models such as the openEHR archetype model. We have created the CEM-OWL meta ontology based on the CEM specification. A convertor has been implemented in Java to automatically translate detailed CEMs from XML to OWL. A panel evaluation has been conducted, and the results show that the OWL modeling can faithfully represent the CEM specification and represent patient data.
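The XML-to-OWL translation step described above (implemented as a Java convertor in the paper) can be sketched in miniature: read a simplified CEM-style XML fragment and emit OWL-flavored triples. The element layout, attribute names, and `cem:` prefix below are hypothetical simplifications, not the actual CEM schema:

```python
# Hedged sketch of XML-to-OWL conversion for a toy CEM-like fragment.
import xml.etree.ElementTree as ET

CEM_XML = """
<cetype name="SystolicBloodPressureMeas">
  <key code="SystolicBloodPressure_KEY"/>
  <data type="PhysicalQuantity"/>
</cetype>
"""

def cem_to_triples(xml_text):
    root = ET.fromstring(xml_text)
    cls = root.get("name")
    # The model element becomes an OWL class...
    triples = [(cls, "rdf:type", "owl:Class")]
    # ...and each child attribute becomes a property assertion.
    for child in root:
        for attr, value in child.attrib.items():
            triples.append((cls, f"cem:{child.tag}.{attr}", value))
    return triples

for t in cem_to_triples(CEM_XML):
    print(t)
```

Once in OWL, an off-the-shelf reasoner can perform the consistency checking on derived models that the plain XML representation could not support.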
NASA Astrophysics Data System (ADS)
Poux, F.; Neuville, R.; Hallot, P.; Van Wersch, L.; Luczfalvy Jancsó, A.; Billen, R.
2017-05-01
While virtual copies of the real world tend to be created faster than ever through point clouds and their derivatives, working with them proficiently across all professions demands adapted tools that facilitate knowledge dissemination. Digital investigations are changing the way cultural heritage researchers, archaeologists, and curators work and collaborate, progressively aggregating expertise through one common platform. In this paper, we present a web application in a WebGL framework accessible on any HTML5-compatible browser. It allows real-time point cloud exploration of the mosaics in the Oratory of Germigny-des-Prés, and emphasises ease of use as well as performance. Our reasoning engine is constructed over a semantically rich point cloud data structure, where metadata has been injected a priori. We developed a tool that directly allows semantic extraction and visualisation of pertinent information for the end users. It leads to efficient communication between actors by proposing optimal 3D viewpoints as a basis on which interactions can grow.
2006-08-01
…effective for describing taxonomic categories and properties of things, the structures found in SWRL and SPARQL are better suited to describing conditions… up the query processing time, which may occur many times and furthermore is time critical. In order to maintain information about the… the time spent during this phase does not depend linearly on the number of concepts present in the data structure, but on the order of the log of the number of concepts.
A Statistical Ontology-Based Approach to Ranking for Multiword Search
ERIC Educational Resources Information Center
Kim, Jinwoo
2013-01-01
Keyword search is a prominent data retrieval method for the Web, largely because the simple and efficient nature of keyword processing allows a large amount of information to be searched with fast response. However, keyword search approaches do not formally capture the clear meaning of a keyword query and fail to address the semantic relationships…
Chen, Xi; Chen, Huajun; Bi, Xuan; Gu, Peiqin; Chen, Jiaoyan; Wu, Zhaohui
2014-01-01
Understanding the functional mechanisms of the complex biological system as a whole is drawing more and more attention in global health care management. Traditional Chinese Medicine (TCM), essentially different from Western Medicine (WM), is gaining increasing attention due to its emphasis on individual wellness and natural herbal medicine, which satisfies the goal of integrative medicine. However, with the explosive growth of biomedical data on the Web, biomedical researchers are now confronted with the problem of large-scale data analysis and data query. Moreover, biomedical data have wide coverage, usually coming from multiple heterogeneous data sources with different taxonomies, which makes the big biomedical data hard to integrate and query. Embedded with domain knowledge from different disciplines, all regarding human biological systems, the heterogeneous data repositories are implicitly connected by human expert knowledge. Traditional search engines cannot provide accurate and comprehensive search results for semantically associated knowledge since they only support keyword-based searches. In this paper, we present BioTCM-SE, a semantic search engine for the information retrieval of modern biology and TCM, which provides biologists with a comprehensive and accurate associated-knowledge query platform to greatly facilitate implicit knowledge discovery between WM and TCM.
Clinical Diagnostics in Human Genetics with Semantic Similarity Searches in Ontologies
Köhler, Sebastian; Schulz, Marcel H.; Krawitz, Peter; Bauer, Sebastian; Dölken, Sandra; Ott, Claus E.; Mundlos, Christine; Horn, Denise; Mundlos, Stefan; Robinson, Peter N.
2009-01-01
The differential diagnostic process attempts to identify candidate diseases that best explain a set of clinical features. This process can be complicated by the fact that the features can have varying degrees of specificity, as well as by the presence of features unrelated to the disease itself. Depending on the experience of the physician and the availability of laboratory tests, clinical abnormalities may be described in greater or lesser detail. We have adapted semantic similarity metrics to measure phenotypic similarity between queries and hereditary diseases annotated with the use of the Human Phenotype Ontology (HPO) and have developed a statistical model to assign p values to the resulting similarity scores, which can be used to rank the candidate diseases. We show that our approach outperforms simpler term-matching approaches that do not take the semantic interrelationships between terms into account. The advantage of our approach was greater for queries containing phenotypic noise or imprecise clinical descriptions. The semantic network defined by the HPO can be used to refine the differential diagnosis by suggesting clinical features that, if present, best differentiate among the candidate diagnoses. Thus, semantic similarity searches in ontologies represent a useful way of harnessing the semantic structure of human phenotypic abnormalities to help with the differential diagnosis. We have implemented our methods in a freely available web application for the field of human Mendelian disorders. PMID:19800049
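The semantic similarity idea underlying this ranking can be sketched with a Resnik-style measure: the similarity of two terms is the information content (negative log frequency) of their most informative common ancestor, so sibling terms that share only broad ancestors score lower than closely related ones. The tiny is-a hierarchy and annotation frequencies below are invented for illustration, not real HPO data:

```python
# Resnik-style semantic similarity on a toy phenotype ontology.
import math

PARENTS = {              # child -> parents (a small is-a DAG)
    "arachnodactyly": {"long_fingers"},
    "long_fingers": {"abnormal_fingers"},
    "abnormal_fingers": {"phenotypic_abnormality"},
    "short_fingers": {"abnormal_fingers"},
    "phenotypic_abnormality": set(),
}
ANNOT_FREQ = {           # fraction of diseases annotated to each term
    "arachnodactyly": 0.05, "long_fingers": 0.1,
    "abnormal_fingers": 0.3, "short_fingers": 0.1,
    "phenotypic_abnormality": 1.0,
}

def ancestors(term):
    """A term together with all terms reachable via is-a links."""
    result = {term}
    for p in PARENTS[term]:
        result |= ancestors(p)
    return result

def resnik_sim(t1, t2):
    """Information content of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(-math.log(ANNOT_FREQ[a]) for a in common)

# Sibling terms meet only at the broader "abnormal_fingers" ancestor,
# so their similarity is lower than a term compared with itself.
print(resnik_sim("arachnodactyly", "short_fingers"))
print(resnik_sim("arachnodactyly", "arachnodactyly"))
```

The paper's contribution goes further by assigning p values to query-vs-disease similarity scores; the measure above is only the building block.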
TOPSAN: a dynamic web database for structural genomics.
Ellrott, Kyle; Zmasek, Christian M; Weekes, Dana; Sri Krishna, S; Bakolitsa, Constantina; Godzik, Adam; Wooley, John
2011-01-01
The Open Protein Structure Annotation Network (TOPSAN) is a web-based collaboration platform for exploring and annotating structures determined by structural genomics efforts. Characterization of those structures presents a challenge since the majority of the proteins themselves have not yet been characterized. Responding to this challenge, the TOPSAN platform facilitates collaborative annotation and investigation via a user-friendly web-based interface pre-populated with automatically generated information. Semantic web technologies expand and enrich TOPSAN's content through links to larger sets of related databases, and thus, enable data integration from disparate sources and data mining via conventional query languages. TOPSAN can be found at http://www.topsan.org.
Ontology Based Quality Evaluation for Spatial Data
NASA Astrophysics Data System (ADS)
Yılmaz, C.; Cömert, Ç.
2015-08-01
Many institutions will be providing data to the National Spatial Data Infrastructure (NSDI). The current technical background of the NSDI is based on syntactic web services; it is expected that this will be replaced by semantic web services. The quality of the data provided is important for the decision-making process and the accuracy of transactions, so data quality needs to be tested. This topic has been neglected in Turkey. Data quality control for the NSDI may be done by private or public "data accreditation" institutions, and a methodology is required for data quality evaluation. There are studies on data quality, including ISO standards, academic studies, and software for evaluating spatial data quality. The ISO 19157 standard defines the data quality elements. Proprietary software such as 1Spatial's 1Validate and ESRI's Data Reviewer offer quality evaluation based on their own classifications of rules. Commonly, rule-based approaches are used for geospatial data quality checks. In this study, we look for the technical components needed to devise and implement a rule-based approach with ontologies, using free and open-source software in a Semantic Web context. The Semantic Web uses ontologies to deliver well-defined web resources and make them accessible to end users and processes. We have created an ontology conforming to the geospatial data and defined some sample rules to show how to test data with respect to data quality elements, including attribute, topo-semantic, and geometrical consistency, using free and open-source software. To test data against the rules, sample GeoSPARQL queries associated with the specifications are created.
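A rule-based quality check of the kind described, attribute and consistency rules evaluated against each feature, can be sketched without the ontology machinery. In the real setup each rule would be a GeoSPARQL query; here each rule is just a named predicate over a feature record, with invented attribute names:

```python
# Toy rule-based quality evaluation over feature records.
# Each rule is (description, predicate); attribute names are illustrative.
RULES = [
    ("building height must be positive",
     lambda f: f["type"] != "building" or f["height"] > 0),
    ("road must carry a name attribute",
     lambda f: f["type"] != "road" or bool(f.get("name"))),
]

def evaluate(features):
    """Return a list of (feature id, violated rule description) pairs."""
    violations = []
    for f in features:
        for desc, ok in RULES:
            if not ok(f):
                violations.append((f["id"], desc))
    return violations

sample = [
    {"id": 1, "type": "building", "height": 12.5},
    {"id": 2, "type": "building", "height": 0},     # violates height rule
    {"id": 3, "type": "road", "name": ""},          # violates name rule
]
print(evaluate(sample))
```

Encoding the rules as queries over an ontology, as the study does, additionally lets a reasoner catch violations that follow only from inferred class membership.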
Bulen, Andrew; Carter, Jonathan J.; Varanka, Dalia E.
2011-01-01
To expand data functionality and capabilities for users of The National Map of the U.S. Geological Survey, data sets for six watersheds and three urban areas were converted from the Best Practices vector data model formats to Semantic Web data formats. This report describes and documents the conversion process. The report begins with an introduction to basic Semantic Web standards and the background of The National Map. Data were converted from a proprietary format to Geography Markup Language to capture the geometric footprint of topographic data features. Configuration files were designed to eliminate redundancy and make the conversion more efficient. A SPARQL endpoint was established for data validation and queries. The report concludes by describing the results of the conversion.
A Semantic Approach for Geospatial Information Extraction from Unstructured Documents
NASA Astrophysics Data System (ADS)
Sallaberry, Christian; Gaio, Mauro; Lesbegueries, Julien; Loustau, Pierre
Local cultural heritage document collections are characterized by their content, which is strongly attached to a territory and its land history (i.e., geographical references). Our contribution aims at making the content retrieval process more efficient whenever a query includes geographic criteria. We propose a core model for a formal representation of geographic information. It takes into account characteristics of different modes of expression, such as written language, captures of drawings, maps, photographs, etc. We have developed a prototype that fully implements geographic information extraction (IE) and geographic information retrieval (IR) processes. All PIV prototype processing resources are designed as Web Services. We propose a geographic IE process based on semantic treatment as a supplement to classical IE approaches. We implement geographic IR by using intersection computing algorithms that seek out any intersection between formal geocoded representations of geographic information in a user query and similar representations in document collection indexes.
BioSearch: a semantic search engine for Bio2RDF
Qiu, Honglei; Huang, Jiacheng
2017-01-01
Abstract Biomedical data are growing at an incredible pace and require substantial expertise to organize data in a manner that makes them easily findable, accessible, interoperable and reusable. Massive effort has been devoted to using Semantic Web standards and technologies to create a network of Linked Data for the life sciences, among others. However, while these data are accessible through programmatic means, effective user interfaces for non-experts to SPARQL endpoints are few and far between. Contributing to user frustrations is that data are not necessarily described using common vocabularies, thereby making it difficult to aggregate results, especially when distributed across multiple SPARQL endpoints. We propose BioSearch — a semantic search engine that uses ontologies to enhance federated query construction and organize search results. BioSearch also features a simplified query interface that allows users to optionally filter their keywords according to classes, properties and datasets. User evaluation demonstrated that BioSearch is more effective and usable than two state of the art search and browsing solutions. Database URL: http://ws.nju.edu.cn/biosearch/ PMID:29220451
NASA Astrophysics Data System (ADS)
Li, C.; Zhu, X.; Guo, W.; Liu, Y.; Huang, H.
2015-05-01
A method suitable for complex indoor semantic queries, taking into account the computation of indoor spatial relations, is provided according to the characteristics of indoor space. This paper designs an ontology model describing the space-related information of humans, events, and indoor space objects (e.g., storeys and rooms) as well as their relations, to support indoor semantic queries. The ontology concepts are used in IndoorSPARQL, a query language that extends SPARQL syntax for representing and querying indoor space. Four specific primitives for indoor queries, "Adjacent", "Opposite", "Vertical" and "Contain", are defined as query functions in IndoorSPARQL to support quantitative spatial computations, and a method is proposed to analyze the query language. Finally, this paper applies the method to realize indoor semantic queries over the study area by constructing an ontology model of the study building. The experimental results show that the proposed method can effectively support complex indoor semantic queries.
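One way a primitive like "Adjacent" could be grounded in geometry is sketched below: two axis-aligned rooms count as adjacent when their rectangles share a wall segment of positive length. The room coordinates and the adjacency rule are simplifications for illustration, not the paper's actual quantitative definitions:

```python
# Toy grounding of an "Adjacent"-style query function over room footprints.
ROOMS = {  # room -> (xmin, ymin, xmax, ymax)
    "R101": (0, 0, 5, 4),
    "R102": (5, 0, 9, 4),    # shares the x=5 wall with R101
    "R201": (0, 10, 5, 14),  # detached from both
}

def _interval_overlap(a1, a2, b1, b2):
    """True if intervals [a1,a2] and [b1,b2] overlap with positive length."""
    return min(a2, b2) - max(a1, b1) > 0

def adjacent(r1, r2):
    x1a, y1a, x2a, y2a = ROOMS[r1]
    x1b, y1b, x2b, y2b = ROOMS[r2]
    share_vertical = ((x2a == x1b or x2b == x1a)
                      and _interval_overlap(y1a, y2a, y1b, y2b))
    share_horizontal = ((y2a == y1b or y2b == y1a)
                        and _interval_overlap(x1a, x2a, x1b, x2b))
    return share_vertical or share_horizontal

print(adjacent("R101", "R102"))  # True: common wall at x=5
print(adjacent("R101", "R201"))  # False: no shared wall
```

Registered as a SPARQL extension function, such a predicate lets a query filter room pairs by adjacency instead of materializing all spatial relations in advance.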
Unifying Access to National Hydrologic Data Repositories via Web Services
NASA Astrophysics Data System (ADS)
Valentine, D. W.; Jennings, B.; Zaslavsky, I.; Maidment, D. R.
2006-12-01
The CUAHSI hydrologic information system (HIS) is designed to be a live, multiscale web portal system for accessing, querying, visualizing, and publishing distributed hydrologic observation data and models for any location or region in the United States. The HIS design follows the principles of open service-oriented architecture, i.e., system components are represented as web services with well-defined standard service APIs. WaterOneFlow web services are the main component of the design. The currently available services have been completely re-written compared to the previous version, and provide programmatic access to USGS NWIS (stream flow, groundwater, and water quality repositories), DAYMET daily observations, NASA MODIS, and Unidata NAM streams, with several additional web service wrappers being added (EPA STORET, NCDC, and others). Different repositories of hydrologic data use different vocabularies and support different types of query access. Resolving semantic and structural heterogeneities across different hydrologic observation archives and distilling a generic set of service signatures is one of the main scalability challenges in this project, and a requirement in our web service design. To accomplish the uniformity of the web services API, data repositories are modeled following the CUAHSI Observation Data Model. The web service responses are document-based and use an XML schema to express the semantics in a standard format. Access to station metadata is provided via the web service methods GetSites, GetSiteInfo, and GetVariableInfo. These methods form the foundation of the CUAHSI HIS discovery interface and may execute over locally stored metadata or request the information from remote repositories directly. Observation values are retrieved via a generic GetValues method which is executed against national data repositories. The service is implemented in ASP.NET, and other providers are implementing WaterOneFlow services in Java.
Reference implementation of WaterOneFlow web services is available. More information about the ongoing development of CUAHSI HIS is available from http://www.cuahsi.org/his/.
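Consuming a document-based response like a GetValues reply comes down to parsing the XML into (timestamp, value) pairs. The fragment below is a simplified stand-in for the actual WaterML response schema, with invented element names:

```python
# Sketch: parsing a simplified GetValues-style XML response.
import xml.etree.ElementTree as ET

RESPONSE = """
<timeSeriesResponse>
  <timeSeries>
    <variable>streamflow</variable>
    <values>
      <value dateTime="2006-10-01T00:00:00">3.2</value>
      <value dateTime="2006-10-02T00:00:00">2.9</value>
    </values>
  </timeSeries>
</timeSeriesResponse>
"""

def parse_values(xml_text):
    """Return (variable name, list of (dateTime, float value) pairs)."""
    root = ET.fromstring(xml_text)
    variable = root.findtext(".//variable")
    values = [(v.get("dateTime"), float(v.text))
              for v in root.findall(".//value")]
    return variable, values

var, vals = parse_values(RESPONSE)
print(var, vals)
```

Because every repository is wrapped behind the same response schema, a single parser like this serves all of them, which is exactly the uniformity the Observation Data Model is meant to buy.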
NASA Astrophysics Data System (ADS)
Wright, D. J.; Lassoued, Y.; Dwyer, N.; Haddad, T.; Bermudez, L. E.; Dunne, D.
2009-12-01
Coastal mapping plays an important role in informing marine spatial planning, resource management, maritime safety, hazard assessment, and even national sovereignty. As such, there is now a plethora of data/metadata catalogs, pre-made maps, tabular and text information on resource availability and exploitation, and decision-making tools. A recent trend has been to encapsulate these in a special class of web-enabled geographic information systems called a coastal web atlas (CWA). While multiple benefits are derived from tailor-made atlases, there is great value added from the integration of disparate CWAs. CWAs linked to one another can be queried more successfully to optimize planning and decision-making. If a dataset is missing in one atlas, it may be immediately located in another; similar datasets in two atlases may be combined to enhance study in either region. But how best to achieve semantic interoperability, mitigating vague data queries, concepts, or natural-language semantics when retrieving and integrating data and information? We report on the development of a new prototype seeking to interoperate between two initial CWAs: the Marine Irish Digital Atlas (MIDA) and the Oregon Coastal Atlas (OCA). These two mature atlases are used as a testbed for more regional connections, with the intent for the OCA to use lessons learned to develop a regional network of CWAs along the west coast, and for MIDA to do the same in building and strengthening atlas networks with the UK, Belgium, and other parts of Europe. Our prototype achieves semantic interoperability via services harmonization and ontology mediation, allowing local atlases to use their own data structures and vocabularies (ontologies). We use standard technologies such as OGC Web Map Services (WMS) for delivering maps, and the OGC Catalogue Service for the Web (CSW) for delivering and querying ISO 19139 metadata. The metadata records of a given CWA use a given ontology of terms, called its local ontology.
Human or machine users formulate their requests using a common ontology of metadata terms, called global ontology. A CSW mediator rewrites the user’s request into CSW requests over local CSWs using their own (local) ontologies, collects the results and sends them back to the user. To extend the system, we have recently added global maritime boundaries and are also considering nearshore ocean observing system data. Ongoing work includes adding WFS, error management, and exception handling, enabling Smart Searches, and writing full documentation. This prototype is a central research project of the new International Coastal Atlas Network (ICAN), a group of 30+ organizations from 14 nations (and growing) dedicated to seeking interoperability approaches to CWAs in support of coastal zone management and the translation of coastal science to coastal decision-making.
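The mediator's rewriting step, translating a request phrased in the global ontology into each atlas's local vocabulary before querying its CSW, can be sketched with plain term mappings. The atlas keys and term pairs below are invented for illustration:

```python
# Toy global-to-local term rewriting for a CSW mediator.
LOCAL_VOCAB = {
    "mida": {"shoreline": "coastline", "bathymetry": "depth_soundings"},
    "oca":  {"shoreline": "shore_boundary"},  # no local bathymetry term
}

def rewrite_query(global_terms, atlas):
    """Map global terms to an atlas's local terms, dropping unmapped ones."""
    vocab = LOCAL_VOCAB[atlas]
    return [vocab[t] for t in global_terms if t in vocab]

query = ["shoreline", "bathymetry"]
print(rewrite_query(query, "mida"))  # -> ['coastline', 'depth_soundings']
print(rewrite_query(query, "oca"))   # -> ['shore_boundary']
```

The mediator then issues one rewritten CSW request per atlas and merges the results back under the global terms, so each atlas keeps its own vocabulary untouched.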
iSMART: Ontology-based Semantic Query of CDA Documents
Liu, Shengping; Ni, Yuan; Mei, Jing; Li, Hanyu; Xie, Guotong; Hu, Gang; Liu, Haifeng; Hou, Xueqiao; Pan, Yue
2009-01-01
The Health Level 7 Clinical Document Architecture (CDA) is widely accepted as the format for electronic clinical document. With the rich ontological references in CDA documents, the ontology-based semantic query could be performed to retrieve CDA documents. In this paper, we present iSMART (interactive Semantic MedicAl Record reTrieval), a prototype system designed for ontology-based semantic query of CDA documents. The clinical information in CDA documents will be extracted into RDF triples by a declarative XML to RDF transformer. An ontology reasoner is developed to infer additional information by combining the background knowledge from SNOMED CT ontology. Then an RDF query engine is leveraged to enable the semantic queries. This system has been evaluated using the real clinical documents collected from a large hospital in southern China. PMID:20351883
Semantic-based surveillance video retrieval.
Hu, Weiming; Xie, Dan; Fu, Zhouyu; Zeng, Wenrong; Maybank, Steve
2007-04-01
Visual surveillance produces large amounts of video data. Effective indexing and retrieval from surveillance video databases are very important. Although there are many ways to represent the content of video clips in current video retrieval algorithms, there still exists a semantic gap between users and retrieval systems. Visual surveillance systems supply a platform for investigating semantic-based video retrieval. In this paper, a semantic-based video retrieval framework for visual surveillance is proposed. A cluster-based tracking algorithm is developed to acquire motion trajectories. The trajectories are then clustered hierarchically using the spatial and temporal information, to learn activity models. A hierarchical structure of semantic indexing and retrieval of object activities, where each individual activity automatically inherits all the semantic descriptions of the activity model to which it belongs, is proposed for accessing video clips and individual objects at the semantic level. The proposed retrieval framework supports various queries including queries by keywords, multiple object queries, and queries by sketch. For multiple object queries, succession and simultaneity restrictions, together with depth and breadth first orders, are considered. For sketch-based queries, a method for matching trajectories drawn by users to spatial trajectories is proposed. The effectiveness and efficiency of our framework are tested in a crowded traffic scene.
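The sketch-based query idea, matching a user-drawn trajectory against stored spatial trajectories, can be sketched with a simple measure: resample both trajectories to the same number of points and average the point-to-point distances. This distance and the sample data are simplifications for illustration, not the paper's actual matching method:

```python
# Toy sketch-to-trajectory matching by resampled point-wise distance.
import math

def resample(traj, n):
    """Pick n points spread evenly along the trajectory's point list."""
    step = (len(traj) - 1) / (n - 1)
    return [traj[round(i * step)] for i in range(n)]

def sketch_distance(a, b, n=10):
    ra, rb = resample(a, n), resample(b, n)
    return sum(math.dist(p, q) for p, q in zip(ra, rb)) / n

stored = {
    "left_to_right": [(float(x), 0.0) for x in range(20)],
    "bottom_to_top": [(0.0, float(y)) for y in range(20)],
}
sketch = [(float(x), 0.5) for x in range(10)]  # roughly horizontal drawing

best = min(stored, key=lambda k: sketch_distance(sketch, stored[k]))
print(best)  # the horizontal stored trajectory matches best
```

A real system would also normalize for scale and translation, but the core retrieval loop, scoring every stored trajectory against the sketch and ranking, has this shape.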
Persistent Identifiers for Improved Accessibility for Linked Data Querying
NASA Astrophysics Data System (ADS)
Shepherd, A.; Chandler, C. L.; Arko, R. A.; Fils, D.; Jones, M. B.; Krisnadhi, A.; Mecum, B.
2016-12-01
The adoption of linked open data principles within the geosciences has increased the amount of accessible information available on the Web. However, this data is difficult to consume for those who are unfamiliar with Semantic Web technologies such as the Web Ontology Language (OWL), the Resource Description Framework (RDF) and SPARQL, the RDF query language. Consumers need to understand the structure of the data and how to query it efficiently. Furthermore, understanding how to query doesn't solve problems of poor precision and recall in search results. For consumers unfamiliar with the data, full-text searches are the most accessible, but not ideal, as they forfeit the advantages of data disambiguation and co-reference resolution efforts. Conversely, URI searches across linked data can deliver improved search results, but knowledge of these exact URIs may remain difficult to obtain. The increased adoption of Persistent Identifiers (PIDs) can lead to improved linked data querying by a wide variety of consumers. Because PIDs resolve to a single entity, they are an excellent data point for disambiguating content. At the same time, PIDs are more accessible and prominent than a single data provider's linked data URI. When present in linked open datasets, PIDs provide a balance between the technical and social hurdles of linked data querying, as evidenced by the NSF EarthCube GeoLink project. The GeoLink project, funded by NSF's EarthCube initiative, has brought together data repositories that include content from field expeditions, laboratory analyses, journal publications, conference presentations, theses/reports, and funding awards spanning scientific studies from marine geology to marine ecosystems, biogeochemistry and paleoclimatology.
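The co-reference role of PIDs described above could be sketched as a simple mapping from provider-local URIs to one persistent identifier, so that a single PID query retrieves all records for the same entity. All identifiers below are invented for illustration.

```python
# Several provider-local URIs resolve to one persistent identifier (PID).
URI_TO_PID = {
    "http://repo-a.example/cruise/123": "doi:10.0000/cruise.abc",
    "http://repo-b.example/record/xyz": "doi:10.0000/cruise.abc",
    "http://repo-a.example/cruise/456": "doi:10.0000/cruise.def",
}

def records_for_pid(pid):
    """Return every provider URI that resolves to the given PID,
    i.e. co-reference resolution by persistent identifier."""
    return sorted(u for u, p in URI_TO_PID.items() if p == pid)

print(records_for_pid("doi:10.0000/cruise.abc"))
```

A keyword search would have to guess each repository's local naming; the PID lookup disambiguates across providers with one well-known key.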
Optimizing a Query by Transformation and Expansion.
Glocker, Katrin; Knurr, Alexander; Dieter, Julia; Dominick, Friederike; Forche, Melanie; Koch, Christian; Pascoe Pérez, Analie; Roth, Benjamin; Ückert, Frank
2017-01-01
In the biomedical sector, not only is the amount of information produced and uploaded to the web enormous, but so is the number of sources where these data can be found. Clinicians and researchers spend huge amounts of time trying to access this information and to filter the most important answers to a given question. As the formulation of these queries is crucial, automated query expansion is an effective tool for optimizing a query and receiving the best possible results. In this paper we introduce the concept of a workflow for optimizing queries in the medical and biological sector using a series of tools for expansion and transformation of the query. After the definition of attributes by the user, the query string is compared to previous queries in order to add semantically co-occurring terms to the query. Additionally, the query is enlarged by the inclusion of synonyms. The translation into database-specific ontologies ensures the optimal query formulation for the chosen database(s). As this process can be performed in various databases at once, the results are ranked and normalized in order to achieve a comparable list of answers for a question.
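The expansion steps described above (co-occurring terms, then synonyms) could be sketched as below; the synonym and co-occurrence tables are invented stand-ins for what a real system would draw from ontologies such as MeSH and from prior query logs.

```python
# Hypothetical lookup tables; a production workflow would populate these
# from ontologies and from previously issued queries.
SYNONYMS = {"heart attack": ["myocardial infarction", "MI"]}
CO_OCCURRING = {"myocardial infarction": ["troponin"]}

def expand_query(query):
    """Expand a query string with synonyms, then with co-occurring terms."""
    terms = [query.lower()]
    for t in list(terms):            # snapshot: expand only original terms
        terms += SYNONYMS.get(t, [])
    for t in list(terms):            # snapshot: expand originals + synonyms
        terms += CO_OCCURRING.get(t, [])
    seen, expanded = set(), []       # deduplicate, preserving order
    for t in terms:
        if t not in seen:
            seen.add(t)
            expanded.append(t)
    return expanded

print(expand_query("Heart attack"))
```

Each expanded term would then be translated into the target database's ontology before the federated search is issued.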
Cognitive search model and a new query paradigm
NASA Astrophysics Data System (ADS)
Xu, Zhonghui
2001-06-01
This paper proposes a cognitive model in which people begin to search for pictures by using semantic content and find the right picture by judging whether its visual content is a proper visualization of the semantics desired. It is essential that human search is not just a process of matching computation on visual features but rather a process of visualization of the known semantic content. For people to search electronic images in the way they manually do in the model, we suggest that querying be a semantic-driven process like design. A query-by-design paradigm is proposed in the sense that what you design is what you find. Unlike query-by-example, query-by-design allows users to specify the semantic content through an iterative and incremental interaction process, so that a retrieval can start with association and identification of the given semantic content and get refined as further visual cues become available. An experimental image retrieval system, Kuafu, has been under development using the query-by-design paradigm, and an iconic language is adopted.
PIBAS FedSPARQL: a web-based platform for integration and exploration of bioinformatics datasets.
Djokic-Petrovic, Marija; Cvjetkovic, Vladimir; Yang, Jeremy; Zivanovic, Marko; Wild, David J
2017-09-20
There is a huge variety of data sources relevant to chemical, biological and pharmacological research, but these data sources are highly siloed and cannot be queried together in a straightforward way. Semantic technologies offer the ability to create links and mappings across datasets and manage them as a single, linked network so that searching can be carried out across datasets, independently of the source. We have developed an application called PIBAS FedSPARQL that uses semantic technologies to allow researchers to carry out such searching across a vast array of data sources. PIBAS FedSPARQL is a web-based query builder and result set visualizer of bioinformatics data. As an advanced feature, our system can detect similar data items identified by different Uniform Resource Identifiers (URIs), using a text-mining algorithm based on the processing of named entities for use in a vector space model with cosine similarity measures. To our knowledge, PIBAS FedSPARQL is unique among the systems that we found in that it allows detection of similar data items. As a query builder, our system allows researchers to intuitively construct and run Federated SPARQL queries across multiple data sources, including global initiatives, such as Bio2RDF, Chem2Bio2RDF and EMBL-EBI, one local initiative called CPCTAS, as well as additional user-specified data sources. From the input topic, subtopic, template and keyword, a corresponding initial Federated SPARQL query is created and executed. Based on the data obtained, end users can choose the most appropriate data sources in their area of interest and exploit their Resource Description Framework (RDF) structure, which allows users to select certain properties of the data to enhance query results. The developed system is flexible and allows intuitive creation and execution of queries for an extensive range of bioinformatics topics.
Also, the novel "similar data items detection" algorithm can be particularly useful for suggesting new data sources and cost optimization for new experiments. PIBAS FedSPARQL can be expanded with new topics, subtopics and templates on demand, rendering information retrieval more robust.
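The similarity measure underlying the "similar data items detection" feature could be sketched as below; this is a plain bag-of-words cosine similarity over stdlib Python, standing in for the named-entity vector space model, and the labels are invented.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words term-frequency vectors."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

# Two labels that different URIs might carry for the same compound (hypothetical).
a = "cisplatin anticancer agent"
b = "cisplatin platinum anticancer agent"
print(round(cosine_similarity(a, b), 3))
```

A high score between labels attached to distinct URIs is the signal that the two data items may denote the same entity across data sources.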
Translating standards into practice - one Semantic Web API for Gene Expression.
Deus, Helena F; Prud'hommeaux, Eric; Miller, Michael; Zhao, Jun; Malone, James; Adamusiak, Tomasz; McCusker, Jim; Das, Sudeshna; Rocca Serra, Philippe; Fox, Ronan; Marshall, M Scott
2012-08-01
Sharing and describing experimental results unambiguously with sufficient detail to enable replication of results is a fundamental tenet of scientific research. In today's cluttered world of "-omics" sciences, data standards and standardized use of terminologies and ontologies for biomedical informatics play an important role in reporting high-throughput experiment results in formats that can be interpreted by both researchers and analytical tools. Increasing adoption of Semantic Web and Linked Data technologies for the integration of heterogeneous and distributed health care and life sciences (HCLS) datasets has made the reuse of standards even more pressing; dynamic semantic query federation can be used for integrative bioinformatics when ontologies and identifiers are reused across data instances. We present here a methodology to integrate the results and experimental context of three different representations of microarray-based transcriptomic experiments: the Gene Expression Atlas, the W3C BioRDF task force approach to reporting Provenance of Microarray Experiments, and the HSCI blood genomics project. Our approach does not attempt to improve the expressivity of existing standards for genomics but, instead, to enable integration of existing datasets published from microarray-based transcriptomic experiments. SPARQL Construct is used to create a posteriori mappings of concepts and properties and linking rules that match entities based on query constraints. We discuss how our integrative approach can encourage reuse of the Experimental Factor Ontology (EFO) and the Ontology for Biomedical Investigations (OBI) for the reporting of experimental context and results of gene expression studies. Copyright © 2012 Elsevier Inc. All rights reserved.
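The a posteriori mapping idea could be sketched as a rule that rewrites triples from one vocabulary into a shared one, which is what a SPARQL CONSTRUCT query does declaratively. This stdlib-Python sketch is not the paper's actual mapping, and all property URIs and identifiers below are invented.

```python
# Source triples in a provider-specific vocabulary (invented URIs).
SOURCE = [
    ("ex:exp1", "atlas:testedGene", "ensembl:ENSG00000139618"),
    ("ex:exp1", "atlas:usesFactor", "efo:EFO_0000311"),
]

# CONSTRUCT-style rule: rewrite each recognized source predicate
# onto a shared target predicate.
MAPPING = {
    "atlas:testedGene": "obi:has_specified_input",
    "atlas:usesFactor": "efo:has_factor_value",
}

def construct(triples, mapping):
    """Emit new triples under the target vocabulary, dropping unmapped ones."""
    return [(s, mapping[p], o) for s, p, o in triples if p in mapping]

for t in construct(SOURCE, MAPPING):
    print(t)
```

Because the mapping is applied after publication, the source datasets remain untouched; only the integrated view uses the shared vocabulary.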
Conservation-Oriented Hbim. The Bimexplorer Web Tool
NASA Astrophysics Data System (ADS)
Quattrini, R.; Pierdicca, R.; Morbidoni, C.; Malinverni, E. S.
2017-05-01
The application of (H)BIM within the domain of Architectural Historical Heritage has huge potential that can even be exploited within the restoration domain. This work presents a novel approach to solving the widespread interoperability issue related to data enrichment in the BIM environment, by developing and testing a web tool based on a specific workflow, with a Romanesque church in Portonovo, Ancona, Italy chosen as the case study. Following the need to make the data, organized in a BIM environment, usable for the different actors involved in the restoration phase, we have created a pipeline that takes advantage of existing BIM platforms and semantic-web technologies, enabling the end user to query a repository composed of semantically structured data. The pipeline consists of four major steps: i) modelling an ontology with the main information needs for the domain of interest, providing a data structure that can be leveraged to inform the data-enrichment phase and, later, to meaningfully query the data; ii) data enrichment, by creating a set of shared parameters reflecting the properties in our domain ontology; iii) structuring data in a machine-readable format (through a data conversion) to represent the domain (ontology) and to analyse data of specific buildings, respectively; iv) development of a demonstrative data exploration web application based on the faceted browsing paradigm, allowing the user to exploit both structured metadata and 3D visualization. The application can be configured by a domain expert to reflect a given domain ontology, and used by an operator to query and explore the data in a more efficient and reliable way. With the proposed solution the analysis of data can be reused together with the 3D model, providing the end user with a non-proprietary tool; in this way, planned maintenance or the restoration project becomes more collaborative and interactive, optimizing the whole process of HBIM data collection.
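The faceted-browsing step in the pipeline above could be sketched as intersecting property filters over semantically enriched elements; the property names and values below are invented, not the project's actual ontology.

```python
# Model elements carrying properties from a (hypothetical) restoration ontology.
ELEMENTS = [
    {"id": "wall_01", "material": "brick", "state_of_conservation": "decayed"},
    {"id": "wall_02", "material": "stone", "state_of_conservation": "good"},
    {"id": "vault_01", "material": "brick", "state_of_conservation": "good"},
]

def facet_filter(elements, **facets):
    """Return ids of elements matching every selected facet value
    (facets intersect, as in a faceted-browsing UI)."""
    return [e["id"] for e in elements
            if all(e.get(k) == v for k, v in facets.items())]

print(facet_filter(ELEMENTS, material="brick", state_of_conservation="good"))
```

In the web tool the matching ids would also drive the 3D view, highlighting the filtered elements in the model.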
SIDD: A Semantically Integrated Database towards a Global View of Human Disease
Cheng, Liang; Wang, Guohua; Li, Jie; Zhang, Tianjiao; Xu, Peigang; Wang, Yadong
2013-01-01
Background A number of databases have been developed to collect disease-related molecular, phenotypic and environmental features (DR-MPEs), such as genes, non-coding RNAs, genetic variations, drugs, phenotypes and environmental factors. However, each of the current databases focuses on only one or two DR-MPEs. There is an urgent demand for an integrated database that can establish semantic associations among disease-related databases and link them to provide a global view of human disease at the biological level. Such a database, once developed, will enable researchers to query various DR-MPEs through disease, and to investigate disease mechanisms from different types of data. Methodology To establish an integrated disease-associated database, disease vocabularies used in different databases are mapped to Disease Ontology (DO) through semantic matching: 4,284 and 4,186 disease terms from Medical Subject Headings (MeSH) and Online Mendelian Inheritance in Man (OMIM), respectively, are mapped to DO. Then, the relationships between DR-MPEs and diseases are extracted and merged from the different source databases to reduce data redundancy. Conclusions A semantically integrated disease-associated database (SIDD) is developed, which integrates 18 disease-associated databases and lets researchers browse multiple types of DR-MPEs in a single view. A web interface allows easy navigation for querying information by browsing a disease ontology tree or searching for a disease term. Furthermore, a network visualization tool using the Cytoscape Web plugin has been implemented in SIDD, enhancing its usability when viewing the relationships between diseases and DR-MPEs. The current version of SIDD (Jul 2013) documents 4,465,131 entries relating to 139,365 DR-MPEs and 3,824 human diseases. The database can be freely accessed from: http://mlg.hit.edu.cn/SIDD. PMID:24146757
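The vocabulary-mapping step could be sketched as matching normalized disease labels against DO labels; the normalization rules here are a toy approximation of semantic matching, and the DOID assignments are illustrative only.

```python
# Toy Disease Ontology label -> id table (illustrative entries).
DO_TERMS = {"alzheimer disease": "DOID:10652", "asthma": "DOID:2841"}

def normalize(label):
    """Crude label normalization: lowercase, drop possessives and commas."""
    return label.lower().replace("'s", "").replace(",", "").strip()

def map_to_do(source_terms):
    """Map MeSH/OMIM-style disease terms onto DO ids where labels match."""
    mapped = {}
    for term in source_terms:
        doid = DO_TERMS.get(normalize(term))
        if doid:
            mapped[term] = doid
    return mapped

print(map_to_do(["Alzheimer's Disease", "Asthma", "Unmapped Syndrome"]))
```

Unmatched terms simply drop out, which mirrors why only a subset of MeSH and OMIM terms end up aligned to DO.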
NASA Astrophysics Data System (ADS)
Albeke, S. E.; Perkins, D. G.; Ewers, S. L.; Ewers, B. E.; Holbrook, W. S.; Miller, S. N.
2015-12-01
The sharing of data and results is paramount for advancing scientific research. The Wyoming Center for Environmental Hydrology and Geophysics (WyCEHG) is a multidisciplinary group that is driving scientific breakthroughs to help manage water resources in the Western United States. WyCEHG is mandated by the National Science Foundation (NSF) to share its data. However, the infrastructure from which to share such diverse, complex and massive amounts of data did not exist within the University of Wyoming. We developed an innovative framework to meet the data organization, sharing, and discovery requirements of WyCEHG by integrating both open and closed source software, embedded metadata tags, semantic web technologies, and a web-mapping application. The infrastructure uses a Relational Database Management System as the foundation, providing a versatile platform to store, organize, and query myriad datasets, taking advantage of both structured and unstructured formats. Detailed metadata are fundamental to the utility of datasets. We tag data with Uniform Resource Identifiers (URIs) to specify concepts with formal descriptions (i.e. semantic ontologies), thus allowing users the ability to search metadata based on the intended context rather than conventional keyword searches. Additionally, WyCEHG data are geographically referenced. Using the ArcGIS API for JavaScript, we developed a web mapping application leveraging database-linked spatial data services, providing a means to visualize and spatially query available data in an intuitive map environment. Using server-side scripting (PHP), the mapping application, in conjunction with semantic search modules, dynamically communicates with the database and file system, providing access to available datasets. Our approach provides a flexible, comprehensive infrastructure from which to store and serve WyCEHG's highly diverse research-based data.
This framework has not only allowed WyCEHG to meet its data stewardship requirements, but can also provide a template for others to follow.
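The URI-tagged metadata search described above could be sketched as below; the dataset names, concept URIs, and synonym table are invented, and a real deployment would resolve terms through the ontology rather than a flat lookup.

```python
# Datasets tagged with concept URIs (all identifiers are hypothetical).
DATASETS = {
    "well_logs_2014.csv": ["http://ontology.example/prop/Conductivity"],
    "soil_moisture.nc": ["http://ontology.example/matr/SoilMoisture"],
}

# A thin synonym layer maps free-text search terms to formal concept URIs.
TERM_TO_URI = {
    "soil water content": "http://ontology.example/matr/SoilMoisture",
    "soil moisture": "http://ontology.example/matr/SoilMoisture",
}

def semantic_search(term):
    """Match on the tagged concept, so any synonym of it finds the dataset."""
    uri = TERM_TO_URI.get(term.lower())
    return [name for name, tags in DATASETS.items() if uri in tags]

print(semantic_search("Soil water content"))
```

A conventional keyword search for "soil water content" would miss `soil_moisture.nc`; the concept tag makes both phrasings equivalent.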
User centered and ontology based information retrieval system for life sciences.
Sy, Mohameth-François; Ranwez, Sylvie; Montmain, Jacky; Regnault, Armelle; Crampes, Michel; Ranwez, Vincent
2012-01-25
Because of the increasing number of electronic resources, designing efficient tools to retrieve and exploit them is a major challenge. Some improvements have been offered by semantic Web technologies and applications based on domain ontologies. In life science, for instance, the Gene Ontology is widely exploited in genomic applications, and the Medical Subject Headings vocabulary is the basis of the biomedical publication indexing and information retrieval process offered by PubMed. However, current search engines suffer from two main drawbacks: there is limited user interaction with the list of retrieved resources, and no explanation of their adequacy to the query is provided. Users may thus be confused by the selection and have no idea of how to adapt their queries so that the results match their expectations. This paper describes an information retrieval system that relies on a domain ontology to widen the set of relevant documents retrieved and that uses a graphical rendering of query results to favor user interaction. Semantic proximities between ontology concepts and aggregating models are used to assess the adequacy of documents with respect to a query. The selection of documents is displayed in a semantic map providing graphical indications that make explicit to what extent they match the user's query; this man/machine interface favors a more interactive and iterative exploration of the data corpus by facilitating query concept weighting and visual explanation. We illustrate the benefit of this information retrieval system on two case studies, one of which aims at collecting human genes related to transcription factors involved in the hemopoiesis pathway. The ontology based information retrieval system described in this paper (OBIRS) is freely available at: http://www.ontotoolkit.mines-ales.fr/ObirsClient/. This environment is a first step towards a user centred application in which the system highlights relevant information to provide decision help. PMID:22373375
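Scoring documents by semantic proximity between ontology concepts could be sketched as below: concept distance is the shortest path in a tiny is-a hierarchy, and document adequacy aggregates the proximity of each query concept to the document's nearest annotation. The hierarchy and the aggregation formula are illustrative inventions, not OBIRS's actual proximity measure.

```python
from collections import deque

# Tiny is-a hierarchy (child -> parents); concept names are made up.
ONTOLOGY = {"granulocyte": ["leukocyte"], "lymphocyte": ["leukocyte"],
            "leukocyte": ["cell"], "cell": []}

def distance(a, b):
    """Shortest undirected path between two concepts (breadth-first search)."""
    graph = {}
    for child, parents in ONTOLOGY.items():
        for p in parents:
            graph.setdefault(child, set()).add(p)
            graph.setdefault(p, set()).add(child)
    seen, frontier = {a}, deque([(a, 0)])
    while frontier:
        node, d = frontier.popleft()
        if node == b:
            return d
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return None

def adequacy(query_concepts, doc_concepts):
    """Mean proximity of the document's nearest annotation to each query
    concept; 1.0 for exact matches, decreasing with ontological distance."""
    dists = [min(distance(q, c) for c in doc_concepts) for q in query_concepts]
    return sum(1.0 / (1 + d) for d in dists) / len(dists)

print(adequacy(["granulocyte"], ["lymphocyte"]))
```

A document annotated with a sibling concept still scores above zero, which is how the ontology widens the retrieved set beyond exact keyword matches.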
Linked Data: what does it offer Earth Sciences?
NASA Astrophysics Data System (ADS)
Cox, Simon; Schade, Sven
2010-05-01
'Linked Data' is a current buzz-phrase promoting access to various forms of data on the internet. It starts from the two principles that have underpinned the architecture and scalability of the World Wide Web: 1. Universal Resource Identifiers - using the http protocol which is supported by the DNS system. 2. Hypertext - in which URIs of related resources are embedded within a document. Browsing is the key mode of interaction, with traversal of links between resources under control of the client. Linked Data also adds, or re-emphasizes: • Content negotiation - whereby the client uses http headers to tell the service what representation of a resource is acceptable, • Semantic Web principles - formal semantics for links, following the RDF data model and encoding, and • The 'mashup' effect - in which original and unexpected value may emerge from reuse of data, even if published in raw or unpolished form. Linked Data promotes typed links to all kinds of data, so is where the semantic web meets the 'deep web', i.e. resources which may be accessed using web protocols, but are in representations not indexed by search engines. Earth sciences are data rich, but with a strong legacy of specialized formats managed and processed by disconnected applications. However, most contemporary research problems require a cross-disciplinary approach, in which the heterogeneity resulting from that legacy is a significant challenge. In this context, Linked Data clearly has much to offer the earth sciences. But, there are some important questions to answer. What is a resource? Most earth science data is organized in arrays and databases. A subset useful for a particular study is usually identified by a parameterized query. The Linked Data paradigm emerged from the world of documents, and will often only resolve data-sets. It is impractical to create even nested navigation resources containing links to all potentially useful objects or subsets. 
From the viewpoint of human user interfaces, the browse metaphor, which has been such an important part of the success of the web, must be augmented with other interaction mechanisms, including query. What are the impacts on search and metadata? Hypertext provides links selected by the page provider. However, science should endeavor to be exhaustive in its use of data. Resource discovery through links must be supplemented by more systematic data discovery through search. Conversely, the crawlers that generate search indexes must be fed by resource providers (a) serving navigation pages with links to every dataset (b) adding enough 'metadata' (semantics) on each link to effectively populate the indexes. Linked Data makes this easier due to its integration with semantic web technologies, including structured vocabularies. What is the relation between structured data and Linked Data? Linked Data has focused on web-pages (primarily HTML) for human browsing, and RDF for semantics, assuming that other representations are opaque. However, this overlooks the wealth of XML data on the web, some of which is structured according to XML Schemas that provide semantics. Technical applications can use content-negotiation to get a structured representation, and exploit its semantics. Particularly relevant for earth sciences are data representations based on OGC Geography Markup Language (GML), such as GeoSciML, O&M and MOLES. GML was strongly influenced by RDF, and typed links are intrinsic: xlink:href plays the role that rdf:resource does in RDF representations. Services which expose GML-formatted resources (such as OGC Web Feature Service) are a prototype of Linked Data. Giving credit where it is due. Organizations investing in data collection may be reluctant to publish the raw data prior to completing an initial analysis. 
To encourage early data publication the system must provide suitable incentives, and citation analysis must recognize the increasing diversity of publication routes and forms. Linked Data makes it easier to include rich citation information when data is both published and used.
Building a semi-automatic ontology learning and construction system for geosciences
NASA Astrophysics Data System (ADS)
Babaie, H. A.; Sunderraman, R.; Zhu, Y.
2013-12-01
We are developing an ontology learning and construction framework that allows continuous, semi-automatic knowledge extraction, verification, validation, and maintenance by a potentially very large group of collaborating domain experts in any geosciences field. The system brings geoscientists from the sidelines to the center stage of ontology building, allowing them to collaboratively construct and enrich new ontologies, and to merge, align, and integrate existing ontologies and tools. These constantly evolving ontologies can more effectively address the community's interests, purposes, tools, and change. The goal is to minimize the cost and time of building ontologies, and to maximize the quality, usability, and adoption of ontologies by the community. Our system will be a domain-independent ontology learning framework that applies natural language processing, allowing users to enter their ontology in a semi-structured form, and a combined Semantic Web and Social Web approach that enables direct participation by geoscientists who have no skill in the design and development of their domain ontologies. A controlled natural language (CNL) interface and an integrated authoring and editing tool automatically convert syntactically correct CNL text into formal OWL constructs. The WebProtege-based system will allow a potentially large group of geoscientists, from multiple domains, to crowdsource and participate in the structuring of their knowledge model by sharing their knowledge through critiquing, testing, verifying, adopting, and updating of the concept models (ontologies). We will use cloud storage for all data and knowledge base components of the system, such as users, domain ontologies, discussion forums, and semantic wikis that can be accessed and queried by geoscientists in each domain. We will use NoSQL databases such as MongoDB as a service in the cloud environment.
MongoDB uses the lightweight JSON format, which makes it convenient and easy to build Web applications using just HTML5 and JavaScript, thereby avoiding the cumbersome server-side coding present in traditional approaches. The JSON format used in MongoDB is also suitable for storing and querying RDF data. We will store the domain ontologies and associated linked data in JSON/RDF formats. Our Web interface will be built upon the open source and configurable WebProtege ontology editor. We will develop a simplified mobile version of our user interface which will automatically detect the hosting device and adjust the user interface layout to accommodate different screen sizes. We will also use Semantic MediaWiki, which allows the user to store and query the data within the wiki pages. By using HTML5, JavaScript, and WebGL, we aim to create an interactive, dynamic, and multi-dimensional user interface that presents various geosciences data sets in a natural and intuitive way.
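Storing RDF triples as JSON documents and querying them in a document-store style could be sketched as below; this uses only stdlib `json` and a MongoDB-like `find()` stand-in, and the geoscience identifiers are invented.

```python
import json

# RDF triples serialized as JSON documents, in the spirit of keeping
# ontology data in a document store such as MongoDB (field names illustrative).
docs_json = json.dumps([
    {"s": "geo:Basalt", "p": "rdf:type", "o": "geo:IgneousRock"},
    {"s": "geo:Granite", "p": "rdf:type", "o": "geo:IgneousRock"},
    {"s": "geo:Basalt", "p": "geo:texture", "o": "geo:FineGrained"},
])

def find(documents, **pattern):
    """MongoDB-like find(): return docs whose fields match every pattern key."""
    return [d for d in documents if all(d.get(k) == v for k, v in pattern.items())]

docs = json.loads(docs_json)
igneous = find(docs, p="rdf:type", o="geo:IgneousRock")
print([d["s"] for d in igneous])
```

Because each triple is one flat JSON document, the same pattern-matching query shape serves both triple-pattern and ordinary field lookups.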
CartograTree: connecting tree genomes, phenotypes and environment.
Vasquez-Gross, Hans A; Yu, John J; Figueroa, Ben; Gessler, Damian D G; Neale, David B; Wegrzyn, Jill L
2013-05-01
Today, researchers spend a tremendous amount of time gathering, formatting, filtering and visualizing data collected from disparate sources. Under the umbrella of forest tree biology, we seek to provide a platform and leverage modern technologies to connect biotic and abiotic data. Our goal is to provide an integrated web-based workspace that connects environmental, genomic and phenotypic data via geo-referenced coordinates. Here, we connect the genomic query web-based workspace, DiversiTree and a novel geographical interface called CartograTree to data housed on the TreeGenes database. To accomplish this goal, we implemented Simple Semantic Web Architecture and Protocol to enable the primary genomics database, TreeGenes, to communicate with semantic web services regardless of platform or back-end technologies. The novelty of CartograTree lies in the interactive workspace that allows for geographical visualization and engagement of high performance computing (HPC) resources. The application provides a unique tool set to facilitate research on the ecology, physiology and evolution of forest tree species. CartograTree can be accessed at: http://dendrome.ucdavis.edu/cartogratree. © 2013 Blackwell Publishing Ltd.
Creating personalised clinical pathways by semantic interoperability with electronic health records.
Wang, Hua-Qiong; Li, Jing-Song; Zhang, Yi-Fan; Suzuki, Muneou; Araki, Kenji
2013-06-01
There is a growing realisation that clinical pathways (CPs) are vital for improving the treatment quality of healthcare organisations. However, treatment personalisation is one of the main challenges when implementing CPs, and their inadequate dynamic adaptability restricts their practicality. The purpose of this study is to improve the practicality of CPs through semantic interoperability between knowledge-based CPs and semantic electronic health records (EHRs). The SPARQL Protocol and RDF Query Language (SPARQL) is used to gather patient information from semantic EHRs. The gathered patient information is entered into the CP ontology, represented in the Web Ontology Language (OWL). Then, after reasoning over rules described in the Semantic Web Rule Language (SWRL) within the Jena semantic framework, we adjust the standardised CPs to meet different patients' practical needs. A CP for acute appendicitis is used as an example to illustrate how to achieve CP customisation based on the semantic interoperability between knowledge-based CPs and semantic EHRs. A personalised care plan is generated by comprehensively analysing the patient's personal allergy history and past medical history, which are stored in semantic EHRs. Additionally, by monitoring the patient's clinical information, an exception is recorded and handled during CP execution. Execution results for this example show that the solutions we present are technically feasible. This study contributes towards improving the personalised clinical practicality of standardised CPs. In addition, this study establishes the foundation for future work on the research and development of an independent CP system. Copyright © 2013 Elsevier B.V. All rights reserved.
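The rule-based adjustment of a standardised pathway using EHR facts could be sketched as below; the plan steps, drug names, and the single allergy rule are invented stand-ins for the SWRL rules over the CP ontology.

```python
# A standardized care-plan skeleton (steps and drug names are hypothetical).
STANDARD_PLAN = ["admit", "administer penicillin", "appendectomy", "discharge"]

RULES = [
    # (condition on the patient record, step to replace, substituted step)
    (lambda rec: "penicillin" in rec.get("allergies", []),
     "administer penicillin", "administer erythromycin"),
]

def personalize(plan, record):
    """Apply each rule whose condition holds for the patient record,
    substituting the matching plan step."""
    plan = list(plan)
    for condition, old, new in RULES:
        if condition(record) and old in plan:
            plan[plan.index(old)] = new
    return plan

patient = {"allergies": ["penicillin"]}  # facts gathered from the semantic EHR
print(personalize(STANDARD_PLAN, patient))
```

In the actual system the patient facts arrive via SPARQL queries against the semantic EHR and the substitutions are inferred by the reasoner rather than hard-coded.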
Yokochi, Masashi; Kobayashi, Naohiro; Ulrich, Eldon L; Kinjo, Akira R; Iwata, Takeshi; Ioannidis, Yannis E; Livny, Miron; Markley, John L; Nakamura, Haruki; Kojima, Chojiro; Fujiwara, Toshimichi
2016-05-05
The nuclear magnetic resonance (NMR) spectroscopic data for biological macromolecules archived at the BioMagResBank (BMRB) provide a rich resource of biophysical information at atomic resolution. The NMR data archived in NMR-STAR ASCII format have been implemented in a relational database. However, it is still fairly difficult for users to retrieve data from the NMR-STAR files or the relational database in association with data from other biological databases. To enhance the interoperability of the BMRB database, we present a full conversion of BMRB entries to two standard structured data formats, XML and RDF, as common open representations of the NMR-STAR data. Moreover, a SPARQL endpoint has been deployed. The described case study demonstrates that a simple query of the SPARQL endpoints of the BMRB, UniProt, and Online Mendelian Inheritance in Man (OMIM), can be used in NMR and structure-based analysis of proteins combined with information of single nucleotide polymorphisms (SNPs) and their phenotypes. We have developed BMRB/XML and BMRB/RDF and demonstrate their use in performing a federated SPARQL query linking the BMRB to other databases through standard semantic web technologies. This will facilitate data exchange across diverse information resources.
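The federated query idea, joining BMRB, UniProt, and OMIM on a shared identifier, could be sketched with mock endpoints as below; all accession numbers and result fields are invented for illustration, and a real federation would use SPARQL SERVICE clauses against the live endpoints.

```python
# Mock per-endpoint result sets keyed by a shared accession (invented data).
BMRB_ENDPOINT = {"P00001": {"chemical_shifts": 1532}}
UNIPROT_ENDPOINT = {"P00001": {"protein": "EXAMPLE_HUMAN"}}
OMIM_ENDPOINT = {"P00001": {"phenotype": "example phenotype"}}

def federated_query(accession, endpoints):
    """Join each endpoint's answer on the shared identifier, as a
    SPARQL SERVICE-clause federation would."""
    merged = {"accession": accession}
    for ep in endpoints:
        merged.update(ep.get(accession, {}))
    return merged

result = federated_query("P00001", [BMRB_ENDPOINT, UNIPROT_ENDPOINT, OMIM_ENDPOINT])
print(result)
```

The merged record combines NMR, protein, and phenotype facets in one answer, which is the interoperability payoff of exposing BMRB as RDF with a SPARQL endpoint.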
Cameron, Delroy; Sheth, Amit P; Jaykumar, Nishita; Thirunarayan, Krishnaprasad; Anand, Gaurish; Smith, Gary A
2014-12-01
While contemporary semantic search systems offer to improve classical keyword-based search, they are not always adequate for complex domain specific information needs. The domain of prescription drug abuse, for example, requires knowledge of both ontological concepts and "intelligible constructs" not typically modeled in ontologies. These intelligible constructs convey essential information that include notions of intensity, frequency, interval, dosage and sentiments, which could be important to the holistic needs of the information seeker. In this paper, we present a hybrid approach to domain specific information retrieval that integrates ontology-driven query interpretation with synonym-based query expansion and domain specific rules, to facilitate search in social media on prescription drug abuse. Our framework is based on a context-free grammar (CFG) that defines the query language of constructs interpretable by the search system. The grammar provides two levels of semantic interpretation: 1) a top-level CFG that facilitates retrieval of diverse textual patterns, which belong to broad templates and 2) a low-level CFG that enables interpretation of specific expressions belonging to such textual patterns. These low-level expressions occur as concepts from four different categories of data: 1) ontological concepts, 2) concepts in lexicons (such as emotions and sentiments), 3) concepts in lexicons with only partial ontology representation, called lexico-ontology concepts (such as side effects and routes of administration (ROA)), and 4) domain specific expressions (such as date, time, interval, frequency and dosage) derived solely through rules. Our approach is embodied in a novel Semantic Web platform called PREDOSE, which provides search support for complex domain specific information needs in prescription drug abuse epidemiology. 
When applied to a corpus of over 1 million drug abuse-related web forum posts, our search framework proved effective in retrieving relevant documents when compared with three existing search systems. PMID:25814917
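The two-level grammar idea above, a top-level template over broad constructs plus low-level rules that interpret specific expressions such as dosages and frequencies, can be illustrated with a toy interpreter. The lexicon entries, patterns, and query forms below are invented for illustration and are far simpler than the actual PREDOSE grammar.

```python
import re

# Toy two-level grammar: the top level recognizes the template
# DRUG EXPR, while low-level rules interpret EXPR as a dosage,
# a frequency, or a sentiment term. Rules/lexicons are invented.

LEXICON = {"drug": {"oxycodone", "loperamide"},
           "sentiment": {"love", "hate"}}

LOW_LEVEL = [
    ("dosage", re.compile(r"^\d+\s*mg$")),
    ("frequency", re.compile(r"^\d+x/day$")),
]

def classify_token(tok):
    """Low-level CFG: map a token to its expression category."""
    for name, pat in LOW_LEVEL:
        if pat.match(tok):
            return name
    for cat, words in LEXICON.items():
        if tok in words:
            return cat
    return None

def interpret(query):
    """Top-level CFG: accept queries of the form DRUG EXPR."""
    toks = query.lower().split(maxsplit=1)
    if len(toks) != 2:
        return None
    head, rest = classify_token(toks[0]), classify_token(toks[1])
    if head == "drug" and rest in ("dosage", "frequency", "sentiment"):
        return {"drug": toks[0], rest: toks[1]}
    return None

print(interpret("oxycodone 30 mg"))  # -> {'drug': 'oxycodone', 'dosage': '30 mg'}
```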
Content-Based Discovery for Web Map Service using Support Vector Machine and User Relevance Feedback
Hu, Kai; Gui, Zhipeng; Cheng, Xiaoqiang; Qi, Kunlun; Zheng, Jie; You, Lan; Wu, Huayi
2016-01-01
Many discovery methods for geographic information services have been proposed, including approaches for finding and matching geographic information services, methods for constructing geographic information service classification schemes, and automatic geographic information discovery. Overall, the efficiency of geographic information discovery keeps improving. There are, however, still two problems in Web Map Service (WMS) discovery that must be solved. Mismatches between the graphic contents of a WMS and the semantic descriptions in its metadata make discovery difficult for human users, and end-users and computers comprehend WMSs differently, creating semantic gaps in human-computer interaction. To address these problems, we propose an improved query process for WMSs based on the graphic contents of WMS layers, combining a Support Vector Machine (SVM) with user relevance feedback. Our experiments demonstrate that the proposed method can improve the accuracy and efficiency of WMS discovery. PMID:27861505
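The paper above trains an SVM on the graphic contents of WMS layers; as a dependency-free stand-in for the feedback loop, the sketch below applies the classic Rocchio relevance-feedback update to toy feature vectors (all numbers invented). The shared idea is that user relevance judgments reshape the query representation between retrieval rounds.

```python
# Rocchio relevance feedback as an illustrative stand-in for the
# SVM-based feedback loop described above. Vectors are invented
# 3-dimensional toy features, not real image descriptors.

def rocchio(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query vector toward the centroid of relevant results
    and away from the centroid of irrelevant ones (Rocchio 1971)."""
    dims = len(query)
    def centroid(vecs):
        if not vecs:
            return [0.0] * dims
        return [sum(v[i] for v in vecs) / len(vecs) for i in range(dims)]
    rel_c, irr_c = centroid(relevant), centroid(irrelevant)
    return [alpha * query[i] + beta * rel_c[i] - gamma * irr_c[i]
            for i in range(dims)]

q = [1.0, 0.0, 0.0]
updated = rocchio(q, relevant=[[0.0, 1.0, 0.0]], irrelevant=[[0.0, 0.0, 1.0]])
print(updated)  # -> [1.0, 0.75, -0.15]
```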
Jadhav, Ashutosh; Sheth, Amit; Pathak, Jyotishman
2014-01-01
Since the early 2000s, Internet usage for health information searching has increased significantly. Studying search queries can help us understand users' “information need” and how they formulate search queries (“expression of information need”). Although cardiovascular diseases (CVD) affect a large percentage of the population, few studies have investigated how and what users search for regarding CVD. We address this knowledge gap in the community by analyzing a large corpus of 10 million CVD-related search queries from MayoClinic.com. Using UMLS MetaMap and UMLS semantic types/concepts, we developed a rule-based approach to categorize the queries into 14 health categories. We analyzed structural properties, types (keyword-based/Wh-questions/Yes-No questions) and linguistic structure of the queries. Our results show that the most searched health categories are ‘Diseases/Conditions’, ‘Vital-Signs’, ‘Symptoms’ and ‘Living-with’. CVD queries are longer and are predominantly keyword-based. This study extends our knowledge about online health information searching and provides useful insights for Web search engines and health websites. PMID:25954380
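A rule-based categorization step like the one described, minus the UMLS MetaMap machinery, reduces to dictionary lookups plus a coarse query-type test. The cue phrases and category names below are invented stand-ins for the UMLS semantic-type mappings used in the study.

```python
# Hedged sketch of rule-based health-query categorization and
# query-type classification. Cue phrases/categories are invented.

CATEGORY_RULES = {
    "Symptoms": {"chest pain", "palpitations", "shortness of breath"},
    "Diseases/Conditions": {"atrial fibrillation", "heart failure"},
    "Living-with": {"diet", "exercise"},
}

def categorize(query):
    """Return every category whose cue phrase occurs in the query."""
    q = query.lower()
    return sorted(cat for cat, cues in CATEGORY_RULES.items()
                  if any(cue in q for cue in cues))

def query_type(query):
    """Coarse type: Wh-question, Yes-No question, or keyword-based."""
    first = query.strip().lower().split()[0]
    if first in {"what", "how", "why", "when", "which", "who"}:
        return "Wh-question"
    if first in {"is", "are", "can", "does", "do", "should"}:
        return "Yes-No question"
    return "keyword-based"

print(categorize("exercise after heart failure"))  # -> ['Diseases/Conditions', 'Living-with']
print(query_type("can I exercise after heart failure"))  # -> Yes-No question
```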
Digital Workflows for a 3D Semantic Representation of an Ancient Mining Landscape
NASA Astrophysics Data System (ADS)
Hiebel, G.; Hanke, K.
2017-08-01
The ancient mining landscape of Schwaz/Brixlegg in the Tyrol, Austria, witnessed mining from prehistoric to modern times, creating a first-order cultural landscape associated with one of the most important inventions in human history: the production of metal. In 1991, part of this landscape was lost to an enormous landslide that reshaped part of the mountain. With our work we propose a digital workflow to create a 3D semantic representation of this ancient mining landscape with its mining structures, to preserve it for posterity. First, we define a conceptual model to integrate the data. It is based on the CIDOC CRM ontology and CRMgeo for geometric data. To transform our information sources into a formal representation of the classes and properties of the ontology, we applied semantic web technologies and created a knowledge graph in RDF (Resource Description Framework). Through the CRMgeo extension, coordinate information of mining features can be integrated into the RDF graph and thus related to the detailed digital elevation model, which may be visualized together with the mining structures using Geoinformation systems or 3D visualization tools. The RDF network of the triple store can be queried using the SPARQL query language. We created a snapshot of mining, settlement and burial sites in the Bronze Age. The results of the query were loaded into a Geoinformation system, and a visualization of known Bronze Age sites related to mining, settlement and burial activities was created.
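The "snapshot" query described above, selecting Bronze Age mining, settlement, and burial sites together with their coordinates, can be mimicked over an in-memory triple set. The identifiers, coordinates, and mini-schema below are invented and do not follow the actual CIDOC CRM/CRMgeo property names.

```python
# Sketch of a period/activity snapshot over a toy site knowledge
# graph. All site IDs, coordinates, and predicates are invented.

triples = {
    ("site:1", "activity", "mining"), ("site:1", "period", "Bronze Age"),
    ("site:1", "coords", (47.39, 11.88)),
    ("site:2", "activity", "burial"), ("site:2", "period", "Bronze Age"),
    ("site:2", "coords", (47.41, 11.90)),
    ("site:3", "activity", "mining"), ("site:3", "period", "Medieval"),
    ("site:3", "coords", (47.35, 11.85)),
}

def snapshot(period, activities):
    """Select coordinates of sites matching a period and activity set,
    mimicking a SPARQL basic-graph-pattern join."""
    coords = {s: o for s, p, o in triples if p == "coords"}
    acts = {s: o for s, p, o in triples if p == "activity"}
    pers = {s: o for s, p, o in triples if p == "period"}
    return {s: coords[s] for s in coords
            if pers.get(s) == period and acts.get(s) in activities}

print(snapshot("Bronze Age", {"mining", "burial"}))
```

The result of such a query (site IDs plus coordinates) is exactly what gets loaded into a Geoinformation system for visualization.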
Sharing and executing linked data queries in a collaborative environment.
García Godoy, María Jesús; López-Camacho, Esteban; Navas-Delgado, Ismael; Aldana-Montes, José F
2013-07-01
Life Sciences have emerged as a key domain in the Linked Data community because of the diversity of data semantics and formats available through a great variety of databases and web technologies, making it an ideal domain for applications in the web of data. Unfortunately, bioinformaticians are not exploiting the full potential of this already available technology, and experts in Life Sciences have real difficulty discovering, understanding and devising how to take advantage of these interlinked (integrated) data. In this article, we present Bioqueries, a wiki-based portal that is aimed at community building around biological Linked Data. This tool has been designed to aid bioinformaticians in developing SPARQL queries to access biological databases exposed as Linked Data, and also to help biologists gain a deeper insight into the potential use of this technology. This public space offers several services and a collaborative infrastructure to stimulate the consumption of biological Linked Data and, therefore, contribute to implementing the benefits of the web of data in this domain. Bioqueries currently contains 215 query entries grouped by database and theme, 230 registered users and 44 endpoints that contain biological Resource Description Framework information. The Bioqueries portal is freely accessible at http://bioqueries.uma.es. Supplementary data are available at Bioinformatics online.
Chepelev, Leonid L; Dumontier, Michel
2011-05-19
Over the past several centuries, chemistry has permeated virtually every facet of human lifestyle, enriching fields as diverse as medicine, agriculture, manufacturing, warfare, and electronics, among numerous others. Unfortunately, application-specific, incompatible chemical information formats and representation strategies have emerged as a result of such diverse adoption of chemistry. Although a number of efforts have been dedicated to unifying the computational representation of chemical information, disparities between the various chemical databases still persist and stand in the way of cross-domain, interdisciplinary investigations. Through a common syntax and formal semantics, Semantic Web technology offers the ability to accurately represent, integrate, reason about and query across diverse chemical information. Here we specify and implement the Chemical Entity Semantic Specification (CHESS) for the representation of polyatomic chemical entities, their substructures, bonds, atoms, and reactions using Semantic Web technologies. CHESS provides means to capture aspects of their corresponding chemical descriptors, connectivity, functional composition, and geometric structure while specifying mechanisms for data provenance. We demonstrate that using our readily extensible specification, it is possible to efficiently integrate multiple disparate chemical data sources, while retaining appropriate correspondence of chemical descriptors, with very little additional effort. We demonstrate the impact of some of our representational decisions on the performance of chemically-aware knowledgebase searching and rudimentary reaction candidate selection. Finally, we provide access to the tools necessary to carry out chemical entity encoding in CHESS, along with a sample knowledgebase.
By harnessing the power of Semantic Web technologies with CHESS, it is possible to provide a means of facile cross-domain chemical knowledge integration with full preservation of data correspondence and provenance. Our representation builds on existing cheminformatics technologies and, by the virtue of RDF specification, remains flexible and amenable to application- and domain-specific annotations without compromising chemical data integration. We conclude that the adoption of a consistent and semantically-enabled chemical specification is imperative for surviving the coming chemical data deluge and supporting systems science research. PMID:21595881
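One design point above, integration that retains correspondence and provenance, can be sketched independently of RDF: each merged field keeps a pointer to its source database. The field names, source labels, and values below are invented examples, not CHESS terms.

```python
# Illustrative sketch of provenance-preserving integration: two mock
# chemical records are merged into one entity while every asserted
# value keeps the database it came from. Names/sources are invented.

def integrate(records):
    """Merge records about the same entity, tagging each value with
    its source database so provenance survives integration."""
    merged = {}
    for source, data in records:
        for field, value in data.items():
            merged.setdefault(field, []).append({"value": value,
                                                 "source": source})
    return merged

entity = integrate([
    ("db:alpha", {"inchi": "InChI=1S/CH4/h1H4", "formula": "CH4"}),
    ("db:beta",  {"formula": "CH4", "boiling_point_K": 111.65}),
])
print(entity["formula"])
# both sources assert the formula, and each assertion keeps its origin
```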
Publishing high-quality climate data on the semantic web
NASA Astrophysics Data System (ADS)
Woolf, Andrew; Haller, Armin; Lefort, Laurent; Taylor, Kerry
2013-04-01
The effort over more than a decade to establish the semantic web [Berners-Lee et al., 2001] has received a major boost in recent years through the Open Government movement. Governments around the world are seeking technical solutions to enable more open and transparent access to the Public Sector Information (PSI) they hold. Existing technical protocols and data standards tend to be domain specific, and so limit the ability to publish and integrate data across domains (health, environment, statistics, education, etc.). The web provides a domain-neutral platform for information publishing, and has proven itself beyond expectations for publishing and linking human-readable electronic documents. Extending the web pattern to data (often called Web 3.0) offers enormous potential. The semantic web applies the basic web principles to data [Berners-Lee, 2006]: using URIs as identifiers (for data objects and real-world 'things', instead of documents); making the URIs actionable by providing useful information via HTTP; using a common exchange standard (serialised RDF for data instead of HTML for documents); and establishing typed links between information objects to enable linking and integration. Leading examples of 'linked data' for publishing PSI may be found in both the UK (http://data.gov.uk/linked-data) and US (http://www.data.gov/page/semantic-web). The Bureau of Meteorology (BoM) is Australia's national meteorological agency, and has a new mandate to establish a national environmental information infrastructure (under the National Plan for Environmental Information, NPEI [BoM, 2012a]). While the initial approach is based on the existing best-practice Spatial Data Infrastructure (SDI) architecture, linked data is being explored as a technological alternative that shows great promise for the future. We report here the first trial of government linked data in Australia under data.gov.au.
In this initial pilot study, we have taken BoM's new high-quality reference surface temperature dataset, the Australian Climate Observations Reference Network - Surface Air Temperature (ACORN-SAT) [BoM, 2012b]. This dataset contains daily homogenised surface temperature observations for 112 locations around Australia, dating back to 1910. An ontology for the dataset was developed [Lefort et al., 2012], based on the existing Semantic Sensor Network ontology [Compton et al., 2012] and the W3C RDF Data Cube vocabulary [W3C, 2012]. Additional vocabularies were developed, e.g. for BoM weather stations and rainfall districts. The dataset was converted to RDF and loaded into an RDF triplestore. The Linked-Data API (http://code.google.com/p/linked-data-api) was used to configure specific URI query patterns (e.g. for observation timeseries slices by station), and a SPARQL endpoint was provided for direct querying. In addition, some demonstration 'mash-ups' were developed, providing an interactive browser-based interface to the temperature timeseries. References: [Berners-Lee et al., 2001] Tim Berners-Lee, James Hendler and Ora Lassila (2001), "The Semantic Web", Scientific American, May 2001. [Berners-Lee, 2006] Tim Berners-Lee (2006), "Linked Data - Design Issues", W3C [http://www.w3.org/DesignIssues/LinkedData.html]. [BoM, 2012a] Bureau of Meteorology (2012), "Environmental information" [http://www.bom.gov.au/environment/]. [BoM, 2012b] Bureau of Meteorology (2012), "Australian Climate Observations Reference Network - Surface Air Temperature" [http://www.bom.gov.au/climate/change/acorn-sat/]. [Compton et al., 2012] Michael Compton, Payam Barnaghi, Luis Bermudez, Raul Garcia-Castro, Oscar Corcho, Simon Cox, John Graybeal, Manfred Hauswirth, Cory Henson, Arthur Herzog, Vincent Huang, Krzysztof Janowicz, W. David Kelsey, Danh Le Phuoc, Laurent Lefort, Myriam Leggieri, Holger Neuhaus, Andriy Nikolov, Kevin Page, Alexandre Passant, Amit Sheth, Kerry Taylor (2012), "The SSN Ontology of the W3C Semantic Sensor Network Incubator Group", J. Web Semantics, 17 (2012) [http://dx.doi.org/10.1016/j.websem.2012.05.003]. [Lefort et al., 2012] Laurent Lefort, Josh Bobruk, Armin Haller, Kerry Taylor and Andrew Woolf (2012), "A Linked Sensor Data Cube for a 100 Year Homogenised daily temperature dataset", Proc. Semantic Sensor Networks 2012 [http://ceur-ws.org/Vol-904/paper10.pdf]. [W3C, 2012] W3C (2012), "The RDF Data Cube Vocabulary" [http://www.w3.org/TR/vocab-data-cube/].
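The data-cube access pattern mentioned above, a URI that fixes the station dimension and returns a timeseries slice, reduces to a filter-and-sort over dimension-keyed observations. The station identifiers and temperature values below are invented, not actual ACORN-SAT records.

```python
# Sketch of the RDF Data Cube "slice by station" access pattern:
# each observation is keyed by its dimensions (station, date) with
# the temperature as the measure. Stations/values are invented.

observations = [
    {"station": "086338", "date": "1910-01-01", "tmax_C": 31.2},
    {"station": "086338", "date": "1910-01-02", "tmax_C": 29.8},
    {"station": "066062", "date": "1910-01-01", "tmax_C": 25.4},
]

def station_slice(station):
    """Fix the station dimension and return the date-ordered series,
    the way the Linked-Data API URI patterns expose timeseries slices."""
    series = [(o["date"], o["tmax_C"]) for o in observations
              if o["station"] == station]
    return sorted(series)

print(station_slice("086338"))  # -> [('1910-01-01', 31.2), ('1910-01-02', 29.8)]
```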
Can social semantic web techniques foster collaborative curriculum mapping in medicine?
Spreckelsen, Cord; Finsterer, Sonja; Cremer, Jan; Schenkat, Hennig
2013-08-15
Curriculum mapping, which is aimed at the systematic realignment of the planned, taught, and learned curriculum, is considered a challenging and ongoing effort in medical education. Second-generation curriculum managing systems foster knowledge management processes including curriculum mapping in order to give comprehensive support to learners, teachers, and administrators. The large quantity of custom-built software in this field indicates a shortcoming of available IT tools and standards. The project reported here aims at the systematic adoption of techniques and standards of the Social Semantic Web to implement collaborative curriculum mapping for a complete medical model curriculum. A Semantic MediaWiki (SMW)-based Web application has been introduced as a platform for the elicitation and revision process of the Aachen Catalogue of Learning Objectives (ACLO). The semantic wiki uses a domain model of the curricular context and offers structured (form-based) data entry, multiple views, structured querying, semantic indexing, and commenting for learning objectives ("LOs"). Semantic indexing of learning objectives relies on both a controlled vocabulary of international medical classifications (ICD, MeSH) and a folksonomy maintained by the users. An additional module supporting the global checking of consistency complements the semantic wiki. Statements of the Object Constraint Language define the consistency criteria. We evaluated the application by a scenario-based formative usability study, where the participants solved tasks in the (fictional) context of 7 typical situations and answered a questionnaire containing Likert-scaled items and free-text questions. At present, ACLO contains roughly 5350 operational (i.e., specific and measurable) objectives acquired during the last 25 months. The wiki-based user interface uses 13 online forms for data entry and 4 online forms for flexible searches of LOs, and all the forms are accessible by standard Web browsers.
The formative usability study yielded positive results (median rating of 2 ("good") in all 7 general usability items) and produced valuable qualitative feedback, especially concerning navigation and comprehensibility. Although not asked to, the participants (n=5) detected critical aspects of the curriculum (similar learning objectives addressed repeatedly and missing objectives), thus proving the system's ability to support curriculum revision. The SMW-based approach enabled an agile implementation of computer-supported knowledge management. The approach, based on standard Social Semantic Web formats and technology, represents a feasible and effectively applicable compromise between answering to the individual requirements of curriculum management at a particular medical school and using proprietary systems.
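A global consistency check of the kind the OCL-based module performs can be sketched as invariants over objective records. The rules, verb list, and records below are invented examples, not the actual ACLO constraints.

```python
# Hedged sketch of global consistency checking over learning
# objectives, in the spirit of the OCL constraints described above:
# every objective must carry an index term and a measurable verb.
# Verb list and objective records are invented examples.

MEASURABLE_VERBS = {"describe", "demonstrate", "perform", "interpret"}

objectives = [
    {"id": "LO-1", "text": "describe the cardiac cycle", "tags": ["MeSH:D006321"]},
    {"id": "LO-2", "text": "understand pharmacology", "tags": []},
]

def violations(objs):
    """Return (objective id, rule) pairs for every broken invariant."""
    out = []
    for o in objs:
        if not o["tags"]:
            out.append((o["id"], "missing index term"))
        if o["text"].split()[0] not in MEASURABLE_VERBS:
            out.append((o["id"], "no measurable verb"))
    return out

print(violations(objectives))
# -> [('LO-2', 'missing index term'), ('LO-2', 'no measurable verb')]
```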
Case Studies in Describing Scientific Research Efforts as Linked Data
NASA Astrophysics Data System (ADS)
Gandara, A.; Villanueva-Rosales, N.; Gates, A.
2013-12-01
The Web is growing with numerous scientific resources, prompting increased efforts in information management to consider integration and exchange of scientific resources. Scientists have many options to share scientific resources on the Web; however, existing options provide limited support to scientists in annotating and relating research resources resulting from a scientific research effort. Moreover, there is no systematic approach to documenting scientific research and sharing it on the Web. This research proposes the Collect-Annotate-Refine-Publish (CARP) Methodology as an approach for guiding documentation of scientific research on the Semantic Web as scientific collections. Scientific collections are structured descriptions about scientific research that make scientific results accessible based on context. In addition, scientific collections enhance the Linked Data data space and can be queried by machines. Three case studies were conducted on research efforts at the Cyber-ShARE Research Center of Excellence in order to assess the effectiveness of the methodology to create scientific collections. The case studies exposed the challenges and benefits of leveraging the Semantic Web and Linked Data data space to facilitate access, integration and processing of Web-accessible scientific resources and research documentation. As such, we present the case study findings and lessons learned in documenting scientific research using CARP.
The EuroGEOSS Advanced Operating Capacity
NASA Astrophysics Data System (ADS)
Nativi, S.; Vaccari, L.; Stock, K.; Diaz, L.; Santoro, M.
2012-04-01
The concept of multidisciplinary interoperability for managing societal issues is a major challenge presently faced by the Earth and Space Science Informatics community. With this in mind, the EuroGEOSS project was launched on May 1st, 2009 for a three-year period, aiming to demonstrate to the scientific community and society the added value of providing existing earth observing systems and applications in an interoperable manner within the GEOSS and INSPIRE frameworks. In the first period, the project built an Initial Operating Capability (IOC) in the three strategic areas of Drought, Forestry and Biodiversity; this was then enhanced into an Advanced Operating Capacity (AOC) for multidisciplinary interoperability. Finally, the project extended the infrastructure to other scientific domains (geology, hydrology, etc.). The EuroGEOSS multidisciplinary AOC is based on the Brokering Approach. This approach aims to achieve multidisciplinary interoperability by developing an extended SOA (Service Oriented Architecture) in which a new type of "expert" component is introduced: the Broker. Brokers implement all the mediation and distribution functionalities needed to interconnect the distributed and heterogeneous resources characterizing a System of Systems (SoS) environment.
The EuroGEOSS AOC comprises the following components: • EuroGEOSS Discovery Broker: providing harmonized discovery functionalities by mediating and distributing user queries against tens of heterogeneous services; • EuroGEOSS Access Broker: enabling users to seamlessly access and use heterogeneous remote resources via a unique and standard service; • EuroGEOSS Web 2.0 Broker: enhancing the capabilities of the Discovery Broker with queries towards the new Web 2.0 services; • EuroGEOSS Semantic Discovery Broker: enhancing the capabilities of the Discovery Broker with semantic query-expansion; • EuroGEOSS Natural Language Search Component: providing users with the possibility to search for resources using natural language queries; • Service Composition Broker: allowing users to compose and execute complex Business Processes, based on the technology developed by the FP7 UncertWeb project. Recently, the EuroGEOSS Brokering framework was presented at the GEO-VIII Plenary and Exhibition in Istanbul and introduced into the GEOSS Common Infrastructure.
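The broker pattern at the heart of this architecture, one mediator translating a single client query into each service's own protocol, can be sketched with two mock service adapters. The class names, methods, and records below are invented illustrations, not the EuroGEOSS interfaces.

```python
# Minimal sketch of a discovery broker: one mediator fans a query
# out to heterogeneous services and normalizes their answers.
# Service interfaces and records are invented for illustration.

class CSWService:
    def get_records(self, text):
        return [{"title": "Drought index map", "fmt": "csw"}]

class OpenSearchService:
    def search(self, q):
        return [{"name": "Forest cover 2010", "fmt": "opensearch"}]

class DiscoveryBroker:
    """Mediates between a single client query model and per-service
    protocols, the core idea of the brokered SOA described above."""
    def __init__(self, services):
        self.services = services

    def discover(self, text):
        results = []
        for svc in self.services:
            if hasattr(svc, "get_records"):   # CSW-style adapter
                results += [h["title"] for h in svc.get_records(text)]
            elif hasattr(svc, "search"):      # OpenSearch-style adapter
                results += [h["name"] for h in svc.search(text)]
        return results

broker = DiscoveryBroker([CSWService(), OpenSearchService()])
print(broker.discover("vegetation"))  # -> ['Drought index map', 'Forest cover 2010']
```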
Semantic Web-based Vocabulary Broker for Open Science
NASA Astrophysics Data System (ADS)
Ritschel, B.; Neher, G.; Iyemori, T.; Murayama, Y.; Kondo, Y.; Koyama, Y.; King, T. A.; Galkin, I. A.; Fung, S. F.; Wharton, S.; Cecconi, B.
2016-12-01
Keyword vocabularies are used to tag and to identify data of science data repositories. Such vocabularies consist of controlled terms and the appropriate concepts, such as GCMD [1] keywords or the ESPAS [2] keyword ontology. The Semantic Web-based mash-up of domain-specific, cross- or even trans-domain vocabularies provides unique capabilities in the network of appropriate data resources. Based on a collaboration between GFZ [3], the FHP [4], the WDC for Geomagnetism [5] and the NICT [6], we developed the concept of a vocabulary broker for inter- and trans-disciplinary data detection and integration. Our prototype of the Semantic Web-based vocabulary broker uses OSF [7] for the mash-up of geo and space research vocabularies, such as GCMD keywords, the ESPAS keyword ontology and the SPASE [8] keyword vocabulary. The vocabulary broker starts the search with "free" keywords or terms of a specific vocabulary scheme. The vocabulary broker almost automatically connects the different science data repositories which are tagged by terms of the aforementioned vocabularies. Therefore the mash-up of the SKOS [9] based vocabularies with appropriate metadata from different domains can be realized by addressing LOD [10] resources or virtual SPARQL [11] endpoints which map relational structures into the RDF format [12]. In order to demonstrate such a mash-up approach in real life, we installed and use a D2RQ [13] server for the integration of IUGONET [14] data, which are managed by a relational database. The OSF-based vocabulary broker and the D2RQ platform are installed on virtual LINUX machines at Kyoto University. The vocabulary broker meets the standard of a main component of the WDS [15] knowledge network.
The Web address of the vocabulary broker is http://wdcosf.kugi.kyoto-u.ac.jp. Footnotes: [1] Global Change Master Directory; [2] Near earth space data infrastructure for e-science; [3] German Research Centre for Geosciences; [4] University of Applied Sciences Potsdam; [5] World Data Center for Geomagnetism, Kyoto; [6] National Institute of Information and Communications Technology, Tokyo; [7] Open Semantic Framework; [8] Space Physics Archive Search and Extract; [9] Simple Knowledge Organization System; [10] Linked Open Data; [11] SPARQL Protocol and RDF Query Language; [12] Resource Description Framework; [13] Database to RDF Query; [14] Inter-university Upper atmosphere Global Observation NETwork; [15] World Data System.
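The mash-up mechanics can be sketched with SKOS-style exactMatch links: a free keyword resolves to a term in one scheme, expands across mapped schemes, and collects everything tagged with any mapped term. All term IDs, mappings, and dataset names below are invented, not actual GCMD/ESPAS/SPASE identifiers.

```python
# Sketch of the vocabulary-broker mash-up via skos:exactMatch-style
# links. Every term ID, mapping, and dataset name is invented.

exact_match = {  # cross-scheme mappings, stored symmetrically
    "gcmd:ionosphere": {"espas:ionosphere", "spase:Ionosphere"},
    "espas:ionosphere": {"gcmd:ionosphere", "spase:Ionosphere"},
    "spase:Ionosphere": {"gcmd:ionosphere", "espas:ionosphere"},
}

labels = {"ionosphere": "gcmd:ionosphere"}  # free keyword -> term

tagged_datasets = {  # repositories tagged per vocabulary term
    "espas:ionosphere": ["ds:iono-sounding-01"],
    "spase:Ionosphere": ["ds:magnetometer-07"],
}

def broker_search(keyword):
    """Resolve a free keyword to a vocabulary term, expand it across
    schemes, and collect datasets tagged with any mapped term."""
    term = labels.get(keyword.lower())
    if term is None:
        return []
    terms = {term} | exact_match.get(term, set())
    return sorted(ds for t in terms for ds in tagged_datasets.get(t, []))

print(broker_search("Ionosphere"))  # -> ['ds:iono-sounding-01', 'ds:magnetometer-07']
```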
Enhancing Geoscience Research Discovery Through the Semantic Web
NASA Astrophysics Data System (ADS)
Rowan, Linda R.; Gross, M. Benjamin; Mayernik, Matthew; Khan, Huda; Boler, Frances; Maull, Keith; Stott, Don; Williams, Steve; Corson-Rikert, Jon; Johns, Erica M.; Daniels, Michael; Krafft, Dean B.; Meertens, Charles
2016-04-01
UNAVCO, UCAR, and Cornell University are working together to leverage semantic web technologies to enable discovery of people, datasets, publications and other research products, as well as the connections between them. The EarthCollab project, a U.S. National Science Foundation EarthCube Building Block, is enhancing an existing open-source semantic web application, VIVO, to enhance connectivity across distributed networks of researchers and resources related to the following two geoscience-based communities: (1) the Bering Sea Project, an interdisciplinary field program whose data archive is hosted by NCAR's Earth Observing Laboratory (EOL), and (2) UNAVCO, a geodetic facility and consortium that supports diverse research projects informed by geodesy. People, publications, datasets and grant information have been mapped to an extended version of the VIVO-ISF ontology and ingested into VIVO's database. Much of the VIVO ontology was built for the life sciences, so we have added some components of existing geoscience-based ontologies and a few terms from a local ontology that we created. The UNAVCO VIVO instance, connect.unavco.org, utilizes persistent identifiers whenever possible; for example using ORCIDs for people, publication DOIs, data DOIs and unique NSF grant numbers. Data is ingested using a custom set of scripts that include the ability to perform basic automated and curated disambiguation. VIVO can display a page for every object ingested, including connections to other objects in the VIVO database. A dataset page, for example, includes the dataset type, time interval, DOI, related publications, and authors. The dataset type field provides a connection to all other datasets of the same type. The author's page shows, among other information, related datasets and co-authors. Information previously spread across several unconnected databases is now stored in a single location. 
In addition to VIVO's default display, the new database can be queried using SPARQL, a query language for semantic data. EarthCollab is extending the VIVO web application. One such extension is the ability to cross-link separate VIVO instances across institutions, allowing local display of externally curated information. For example, Cornell's VIVO faculty pages will display UNAVCO's dataset information and UNAVCO's VIVO will display Cornell faculty member contact and position information. About half of UNAVCO's membership is international and we hope to connect our data to institutions in other countries with a similar approach. Additional extensions, including enhanced geospatial capabilities, will be developed based on task-centered usability testing.
Exposing SAMOS Data and Vocabularies within the Semantic Web
NASA Astrophysics Data System (ADS)
Dockery, Nkemdirim; Elya, Jocelyn; Smith, Shawn
2014-05-01
As part of the Ocean Data Interoperability Platform (ODIP), we at the Center for Ocean-Atmospheric Prediction Studies (COAPS) will present the development process for the exposure of quality-controlled data and core vocabularies managed by the Shipboard Automated Meteorological Oceanographic System (SAMOS) initiative using Semantic Web technologies. Participants in the SAMOS initiative collect continuous navigational (position, course, heading, speed), meteorological (winds, pressure, temperature, humidity, radiation), and near-surface oceanographic (sea temperature, salinity) parameters while at sea. One-minute interval observations are packaged and transmitted back to COAPS via daily emails, where they undergo standardized formatting and quality control. The authors will present methods used to expose these daily datasets. The Semantic Web, a vision of the World Wide Web Consortium, focuses on extending the principles of the web from connecting documents to connecting data. The creation of a web of Linked Data that can be used across different applications in a machine-readable way is the ultimate goal. The Resource Description Framework (RDF) is the standard language and format used in the Semantic Web. RDF pages may be queried using the SPARQL Protocol and RDF Query Language (SPARQL). The authors will showcase the development of RDF resources that map SAMOS vocabularies to internationally served vocabularies such as those found in the Natural Environment Research Council (NERC) Vocabulary Server. Each individual SAMOS vocabulary term (data parameter and quality control flag) will be described in an RDF resource page. These RDF resources will define each SAMOS vocabulary term and provide a link to the mapped vocabulary term (or multiple terms) served externally. Along with enhanced retrieval by parameter, time, and location, we will be able to add additional parameters with the confidence that they follow an international standard. 
The production of RDF resources that link daily SAMOS data to descriptors such as parameters, time and location information, quality assurance reports, and cruise tracks will also be described. The data is housed on a Thematic Real-time Environmental Distributed Data Services (THREDDS) data server, so these RDF resources will enable enhanced retrieval by any of the linked descriptors. We will showcase our collaboration with the Rolling Deck to Repository (R2R) program to develop SPARQL endpoints that distribute SAMOS content. R2R packages and transmits data on a per cruise basis, so an immediate result of the SAMOS exposure will be the narrowing of the gap between expedition type data (e.g. R2R cruises) and SAMOS observatory type data. The authors will present the development of RDF resources that will collectively expose shipboard data, vocabularies, and quality assurance reports in an overall structure which will serve as the basis for a COAPS SPARQL endpoint, enabling easier programmatic access to SAMOS data.
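The vocabulary mapping described above can be sketched in a few lines of plain Python as RDF-style triples. All URIs below are illustrative placeholders, not the actual SAMOS or NERC identifiers:

```python
# Minimal sketch of a vocabulary mapping expressed as RDF-style triples.
# The subject and object URIs are invented placeholders for illustration only.

SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

triples = [
    ("http://example.org/samos/vocab/AT", RDFS_LABEL, "air temperature"),
    ("http://example.org/samos/vocab/AT", SKOS_EXACT_MATCH,
     "http://vocab.example.org/nerc/P01/CTMPZZ01"),
]

def mapped_terms(subject, triples):
    """Return the external vocabulary URIs a local term is mapped to."""
    return [o for s, p, o in triples if s == subject and p == SKOS_EXACT_MATCH]

print(mapped_terms("http://example.org/samos/vocab/AT", triples))
```

A real RDF resource page would serialize such triples in RDF/XML or Turtle; the list-of-tuples form is just the smallest runnable stand-in.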
The MMI Device Ontology: Enabling Sensor Integration
NASA Astrophysics Data System (ADS)
Rueda, C.; Galbraith, N.; Morris, R. A.; Bermudez, L. E.; Graybeal, J.; Arko, R. A.; Mmi Device Ontology Working Group
2010-12-01
The Marine Metadata Interoperability (MMI) project has developed an ontology for devices to describe sensors and sensor networks. This ontology is implemented in the W3C Web Ontology Language (OWL) and provides an extensible conceptual model and controlled vocabularies for describing heterogeneous instrument types, with different data characteristics, and their attributes. It can help users populate metadata records for sensors; associate devices with their platforms, deployments, measurement capabilities and restrictions; aid in discovery of sensor data, both historic and real-time; and improve the interoperability of observational oceanographic data sets. We developed the MMI Device Ontology following a community-based approach. By building on and integrating other models and ontologies from related disciplines, we sought to facilitate semantic interoperability while avoiding duplication. Key concepts and insights from various communities, including the Open Geospatial Consortium (e.g., SensorML and Observations and Measurements specifications), Semantic Web for Earth and Environmental Terminology (SWEET), and W3C Semantic Sensor Network Incubator Group, have significantly enriched the development of the ontology. Individuals ranging from instrument designers, science data producers and consumers to ontology specialists and other technologists contributed to the work. Applications of the MMI Device Ontology are underway for several community use cases. These include vessel-mounted multibeam mapping sonars for the Rolling Deck to Repository (R2R) program and description of diverse instruments on deepwater Ocean Reference Stations for the OceanSITES program. These trials involve creation of records completely describing instruments, either by individual instances or by manufacturer and model.
Individual terms in the MMI Device Ontology can be referenced with their corresponding Uniform Resource Identifiers (URIs) in sensor-related metadata specifications (e.g., SensorML, NetCDF). These identifiers can be resolved through a web browser, or other client applications via HTTP against the MMI Ontology Registry and Repository (ORR), where the ontology is maintained. SPARQL-based query capabilities, which are enhanced with reasoning, along with several supported output formats, allow the effective interaction of diverse client applications with the semantic information associated with the device ontology. In this presentation we describe the process for the development of the MMI Device Ontology and illustrate extensions and applications that demonstrate the benefits of adopting this semantic approach, including example queries involving inference. We also highlight the issues encountered and future work.
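As a rough illustration of what "SPARQL-based query capabilities enhanced with reasoning" buys, the sketch below answers a class-membership question that only succeeds once the transitive closure of subClassOf is computed. The class names are hypothetical, not terms from the actual MMI Device Ontology:

```python
# Sketch of a query with lightweight reasoning: a triple match plus the
# transitive closure of rdfs:subClassOf. All class names are hypothetical.

SUBCLASS = "rdfs:subClassOf"

triples = {
    ("ex:MultibeamSonar", SUBCLASS, "ex:Sonar"),
    ("ex:Sonar", SUBCLASS, "ex:AcousticDevice"),
    ("ex:AcousticDevice", SUBCLASS, "ex:Device"),
}

def superclasses(cls, triples):
    """All superclasses of cls, inferred via transitive subClassOf."""
    found, frontier = set(), {cls}
    while frontier:
        nxt = {o for s, p, o in triples if p == SUBCLASS and s in frontier}
        frontier = nxt - found
        found |= nxt
    return found

# "Is a multibeam sonar a Device?" holds only after inference:
print("ex:Device" in superclasses("ex:MultibeamSonar", triples))  # True
```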
Centrality based Document Ranking
2014-11-01
clinical domain and very uncommon elsewhere. A regular IR system may fail to rank documents from such a domain, dealing with symptoms, diagnosis and...description). We prepared a hand-crafted list of synonyms for each of the query types, viz. diagnosis , test and treatment. This list was used to expand the...Miller. Semantic search. In INTERNATIONAL WORLD WIDE WEB CONFERENCE, pages 700–709. ACM, 2003. 8. A. Hanbury and M. Lupu . Toward a Model of Domain
Dugan, J M; Berrios, D C; Liu, X; Kim, D K; Kaizer, H; Fagan, L M
1999-01-01
Our group has built an information retrieval system based on a complex semantic markup of medical textbooks. We describe the construction of a set of web-based knowledge-acquisition tools that expedites the collection and maintenance of the concepts required for text markup and the search interface required for information retrieval from the marked text. In the text markup system, domain experts (DEs) identify sections of text that contain one or more elements from a finite set of concepts. End users can then query the text using a predefined set of questions, each of which identifies a subset of complementary concepts. The search process matches that subset of concepts to relevant points in the text. The current process requires that the DE invest significant time to generate the required concepts and questions. We propose a new system--called ACQUIRE (Acquisition of Concepts and Queries in an Integrated Retrieval Environment)--that assists a DE in two essential tasks in the text-markup process. First, it helps her to develop, edit, and maintain the concept model: the set of concepts with which she marks the text. Second, ACQUIRE helps her to develop a query model: the set of specific questions that end users can later use to search the marked text. The DE incorporates concepts from the concept model when she creates the questions in the query model. The major benefit of the ACQUIRE system is a reduction in the time and effort required for the text-markup process. We compared the process of concept- and query-model creation using ACQUIRE to the process used in previous work by rebuilding two existing models that we previously constructed manually. We observed a significant decrease in the time required to build and maintain the concept and query models.
NASA Astrophysics Data System (ADS)
Willmes, C.
2017-12-01
Within the Collaborative Research Centre 806 (CRC 806), an interdisciplinary research project that must manage data, information, and knowledge from heterogeneous domains such as archeology, the cultural sciences, and the geosciences, a collaborative internal knowledge base system was developed. The system is based on the open source MediaWiki software, best known as the platform behind Wikipedia, which provides a web-based environment for collaborative knowledge and information management. The software is enhanced with the Semantic MediaWiki (SMW) extension, which allows structured data to be stored and managed within the wiki platform and provides complex query and API interfaces to the structured data held in the SMW database. An additional open source tool called mobo improves the data model development process as well as automated data imports, from small spreadsheets to large relational databases. Mobo is a command line tool that helps build and deploy SMW structures in an agile, schema-driven development manner, and allows the data model formalizations, expressed in JSON Schema format, to be managed and collaboratively developed using version control systems such as git. The combination of a well-equipped collaborative web platform (MediaWiki), the ability to store and query structured data in this collaborative database (SMW), and automated data import and data model development (mobo) results in a powerful yet flexible system for building and developing a collaborative knowledge base. Furthermore, SMW supports Semantic Web technology: the structured data can be exported to RDF, so a triple store with a SPARQL endpoint can be set up on top of the database. The JSON Schema based data models can also be enhanced into JSON-LD to profit from the possibilities of Linked Data technology.
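The JSON-to-JSON-LD enhancement mentioned above amounts to attaching an @context that maps local keys to vocabulary URIs. The record, subject URI, and key-to-property mappings below are invented for illustration, not taken from the CRC 806 data model:

```python
import json

# Hedged sketch: turning a structured wiki record (as an SMW/mobo setup might
# export it) into JSON-LD by attaching an @context. The record and subject URI
# are invented; the lat/lon properties use the public W3C WGS84 vocabulary.

record = {"label": "Site 12, Ardennes", "lat": 50.25, "lon": 5.66}

context = {
    "label": "http://www.w3.org/2000/01/rdf-schema#label",
    "lat": "http://www.w3.org/2003/01/geo/wgs84_pos#lat",
    "lon": "http://www.w3.org/2003/01/geo/wgs84_pos#long",
}

def to_jsonld(record, context, subject):
    """Wrap a flat record as a JSON-LD document about the given subject."""
    doc = {"@context": context, "@id": subject}
    doc.update(record)
    return doc

doc = to_jsonld(record, context, "http://example.org/crc806/site12")
print(json.dumps(doc, indent=2))
```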
NASA Astrophysics Data System (ADS)
Gray, A. J. G.; Gray, N.; Ounis, I.
2009-09-01
There are multiple vocabularies and thesauri within astronomy, of which the best known are the 1993 IAU Thesaurus and the keyword list maintained by A&A, ApJ and MNRAS. The IVOA has agreed on a standard for publishing vocabularies, based on the W3C SKOS standard, to allow greater automated interaction with them, in particular on the Web. This allows links with the Semantic Web and looks forward to richer applications using the technologies of that domain. Vocabulary-aware applications can benefit from improvements in both precision and recall when searching for bibliographic or science data, and lightweight intelligent filtering for services such as VOEvent streams. In this paper we present two applications, the Vocabulary Explorer and its companion the Mapping Editor, which have been developed to support the use of vocabularies in the Virtual Observatory. These combine Semantic Web and Information Retrieval technologies to illustrate the way in which formal vocabularies might be used in a practical application, provide an online service which will allow astronomers to explore and relate existing vocabularies, and provide a service which translates free text user queries into vocabulary terms.
What Can Pictures Tell Us About Web Pages? Improving Document Search Using Images.
Rodriguez-Vaamonde, Sergio; Torresani, Lorenzo; Fitzgibbon, Andrew W
2015-06-01
Traditional Web search engines do not use the images in the HTML pages to find relevant documents for a given query. Instead, they typically operate by computing a measure of agreement between the keywords provided by the user and only the text portion of each page. In this paper we study whether the content of the pictures appearing in a Web page can be used to enrich the semantic description of an HTML document and consequently boost the performance of a keyword-based search engine. We present a Web-scalable system that exploits a pure text-based search engine to find an initial set of candidate documents for a given query. Then, the candidate set is reranked using visual information extracted from the images contained in the pages. The resulting system retains the computational efficiency of traditional text-based search engines with only a small additional storage cost needed to encode the visual information. We test our approach on one of the TREC Million Query Track benchmarks where we show that the exploitation of visual content yields improvement in accuracies for two distinct text-based search engines, including the system with the best reported performance on this benchmark. We further validate our approach by collecting document relevance judgements on our search results using Amazon Mechanical Turk. The results of this experiment confirm the improvement in accuracy produced by our image-based reranker over a pure text-based system.
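The reranking step described above can be sketched as a blend of the text engine's score with a visual score per candidate document. The linear blending rule, weight value, and scores below are illustrative; they are not the paper's exact model:

```python
# Sketch of image-based reranking: a text engine returns candidates with text
# scores; visual scores extracted from page images are blended in with weight
# alpha. Scores and the blending rule are illustrative, not the paper's model.

def rerank(candidates, visual_scores, alpha=0.3):
    """candidates: list of (doc_id, text_score); returns doc_ids re-sorted
    by (1 - alpha) * text_score + alpha * visual_score."""
    blended = [
        (doc_id, (1 - alpha) * text + alpha * visual_scores.get(doc_id, 0.0))
        for doc_id, text in candidates
    ]
    return [doc_id for doc_id, _ in sorted(blended, key=lambda x: -x[1])]

candidates = [("d1", 0.9), ("d2", 0.8), ("d3", 0.5)]
visual = {"d1": 0.1, "d2": 0.95, "d3": 0.2}
print(rerank(candidates, visual))  # visual evidence promotes d2 above d1
```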
Sun, Shulei; Chen, Jing; Li, Weizhong; Altintas, Ilkay; Lin, Abel; Peltier, Steve; Stocks, Karen; Allen, Eric E.; Ellisman, Mark; Grethe, Jeffrey; Wooley, John
2011-01-01
The Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis (CAMERA, http://camera.calit2.net/) is a database and associated computational infrastructure that provides a single system for depositing, locating, analyzing, visualizing and sharing data about microbial biology through an advanced web-based analysis portal. CAMERA collects and links metadata relevant to environmental metagenome data sets with annotation in a semantically-aware environment allowing users to write expressive semantic queries against the database. To meet the needs of the research community, users are able to query metadata categories such as habitat, sample type, time, location and other environmental physicochemical parameters. CAMERA is compliant with the standards promulgated by the Genomic Standards Consortium (GSC), and sustains a role within the GSC in extending standards for content and format of the metagenomic data and metadata and its submission to the CAMERA repository. To ensure wide, ready access to data and annotation, CAMERA also provides data submission tools to allow researchers to share and forward data to other metagenomics sites and community data archives such as GenBank. It has multiple interfaces for easy submission of large or complex data sets, and supports pre-registration of samples for sequencing. CAMERA integrates a growing list of tools and viewers for querying, analyzing, annotating and comparing metagenome and genome data. PMID:21045053
Visual Exploratory Search of Relationship Graphs on Smartphones
Ouyang, Jianquan; Zheng, Hao; Kong, Fanbin; Liu, Tianming
2013-01-01
This paper presents a novel framework for Visual Exploratory Search of Relationship Graphs on Smartphones (VESRGS) that is composed of three major components: inference and representation of semantic relationship graphs on the Web via meta-search, visual exploratory search of relationship graphs through both querying and browsing strategies, and human-computer interaction via the multi-touch interface and mobile Internet on smartphones. In comparison with traditional lookup search methodologies, the proposed VESRGS system has the following perceived advantages. 1) It infers rich semantic relationships between the querying keywords and other related concepts from large-scale meta-search results from the Google, Yahoo! and Bing search engines, and represents semantic relationships via graphs; 2) the exploratory search approach empowers users to naturally and effectively explore and discover knowledge in a rich information world of interlinked relationship graphs in a personalized fashion; 3) it effectively takes advantage of smartphones’ user-friendly interfaces, ubiquitous Internet connectivity, and portability. Our extensive experimental results have demonstrated that the VESRGS framework can significantly improve users’ capability of seeking the relationship information most relevant to their specific needs. We envision that the VESRGS framework can be a starting point for future exploration of novel, effective search strategies in the mobile Internet era. PMID:24223936
EarthCube GeoLink: Semantics and Linked Data for the Geosciences
NASA Astrophysics Data System (ADS)
Arko, R. A.; Carbotte, S. M.; Chandler, C. L.; Cheatham, M.; Fils, D.; Hitzler, P.; Janowicz, K.; Ji, P.; Jones, M. B.; Krisnadhi, A.; Lehnert, K. A.; Mickle, A.; Narock, T.; O'Brien, M.; Raymond, L. M.; Schildhauer, M.; Shepherd, A.; Wiebe, P. H.
2015-12-01
The NSF EarthCube initiative is building next-generation cyberinfrastructure to aid geoscientists in collecting, accessing, analyzing, sharing, and visualizing their data and knowledge. The EarthCube GeoLink Building Block project focuses on a specific set of software protocols and vocabularies, often characterized as the Semantic Web and "Linked Data", to publish data online in a way that is easily discoverable, accessible, and interoperable. GeoLink brings together specialists from the computer science, geoscience, and library science domains, and includes data from a network of NSF-funded repositories that support scientific studies in marine geology, marine ecosystems, biogeochemistry, and paleoclimatology. We are working collaboratively with closely-related Building Block projects including EarthCollab and CINERGI, and solicit feedback from RCN projects including Cyberinfrastructure for Paleogeosciences (C4P) and iSamples. GeoLink has developed a modular ontology that describes essential geoscience research concepts; published data from seven collections (to date) on the Web as geospatially-enabled Linked Data using this ontology; matched and mapped data between collections using shared identifiers for investigators, repositories, datasets, funding awards, platforms, research cruises, physical specimens, and gazetteer features; and aggregated the results in a shared knowledgebase that can be queried via a standard SPARQL endpoint. Client applications have been built around the knowledgebase, including a Web/map-based data browser using the Leaflet JavaScript library and a simple query service using the OpenSearch format. Future development will include extending and refining the GeoLink ontology, adding content from additional repositories, developing semi-automated algorithms to enhance metadata, and further work on client applications.
Semantic Metadata for Heterogeneous Spatial Planning Documents
NASA Astrophysics Data System (ADS)
Iwaniak, A.; Kaczmarek, I.; Łukowicz, J.; Strzelecki, M.; Coetzee, S.; Paluszyński, W.
2016-09-01
Spatial planning documents contain information about the principles and rights of land use in different zones of a local authority. They are the basis for administrative decision making in support of sustainable development. In Poland these documents are published on the Web according to a prescribed non-extendable XML schema, designed for optimum presentation to humans in HTML web pages. There is no document standard, and limited functionality exists for adding references to external resources. The text in these documents is discoverable and searchable by general-purpose web search engines, but the semantics of the content cannot be discovered or queried. The spatial information in these documents is geographically referenced but not machine-readable. Major manual efforts are required to integrate such heterogeneous spatial planning documents from various local authorities for analysis, scenario planning and decision support. This article presents results of an implementation using machine-readable semantic metadata to identify relationships among regulations in the text, spatial objects in the drawings and links to external resources. A spatial planning ontology was used to annotate different sections of spatial planning documents with semantic metadata in the Resource Description Framework in Attributes (RDFa). The semantic interpretation of the content, links between document elements and links to external resources were embedded in XHTML pages. An example and use case from the spatial planning domain in Poland is presented to evaluate its efficiency and applicability. The solution enables the automated integration of spatial planning documents from multiple local authorities to assist decision makers with understanding and interpreting spatial planning information. The approach is equally applicable to legal documents from other countries and domains, such as cultural heritage and environmental management.
Generating Personalized Web Search Using Semantic Context
Xu, Zheng; Chen, Hai-Yan; Yu, Jie
2015-01-01
The “one size fits all” criticism of search engines is that when the same query is submitted by different users, the same results are returned. Personalized search addresses this problem by returning different results based upon the preferences of each user. Existing methods, however, concentrate on a long-term, independent user profile, which reduces the effectiveness of personalization. In this paper, we capture the user context to provide accurate user preferences for effective personalized search. First, a short-term query context is generated to identify concepts related to the query. Second, the user context is generated from the user's click-through data. Finally, a forgetting factor is introduced to merge the independent user contexts within a session, tracking the evolution of user preferences. Experimental results confirm that our approach can successfully represent user context according to individual user information needs. PMID:26000335
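The forgetting-factor merge can be sketched as exponential down-weighting of older per-query contexts within a session. The concept weights and the factor value are hypothetical, since the abstract does not give the paper's exact formulation:

```python
# Illustrative sketch of merging per-query user contexts with a forgetting
# factor: contexts earlier in the session are down-weighted exponentially.

def merge_contexts(contexts, forgetting=0.5):
    """contexts: ordered list (oldest first) of {concept: weight} dicts.
    Returns one merged context; the newest context keeps full weight."""
    merged = {}
    n = len(contexts)
    for i, ctx in enumerate(contexts):
        decay = forgetting ** (n - 1 - i)
        for concept, w in ctx.items():
            merged[concept] = merged.get(concept, 0.0) + decay * w
    return merged

# Two queries in one session: the recurring concept accumulates weight,
# while a one-off concept from the latest query enters at full weight.
session = [{"jaguar_car": 1.0}, {"jaguar_car": 0.5, "jaguar_cat": 0.5}]
print(merge_contexts(session))
```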
Exposing the cancer genome atlas as a SPARQL endpoint
Deus, Helena F.; Veiga, Diogo F.; Freire, Pablo R.; Weinstein, John N.; Mills, Gordon B.; Almeida, Jonas S.
2011-01-01
The Cancer Genome Atlas (TCGA) is a multidisciplinary, multi-institutional effort to characterize several types of cancer. Datasets from biomedical domains such as TCGA present a particularly challenging task for those interested in dynamically aggregating its results because the data sources are typically both heterogeneous and distributed. The Linked Data best practices offer a solution to integrate and discover data with those characteristics, namely through exposure of data as Web services supporting SPARQL, the Resource Description Framework query language. Most SPARQL endpoints, however, cannot easily be queried by data experts. Furthermore, exposing experimental data as SPARQL endpoints remains a challenging task because, in most cases, data must first be converted to Resource Description Framework triples. In line with those requirements, we have developed an infrastructure to expose clinical, demographic and molecular data elements generated by TCGA as a SPARQL endpoint by assigning elements to entities of the Simple Sloppy Semantic Database (S3DB) management model. All components of the infrastructure are available as independent Representational State Transfer (REST) Web services to encourage reusability, and a simple interface was developed to automatically assemble SPARQL queries by navigating a representation of the TCGA domain. A key feature of the proposed solution that greatly facilitates assembly of SPARQL queries is the distinction between the TCGA domain descriptors and data elements. Furthermore, the use of the S3DB management model as a mediator enables queries to both public and protected data without the need for prior submission to a single data source. PMID:20851208
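The automatic assembly of SPARQL queries from a navigated domain representation can be sketched as string construction from chosen triple patterns. The prefix, property names, and patterns below are illustrative; they do not reproduce the actual TCGA/S3DB schema:

```python
# Sketch of automatic SPARQL assembly: a query is built from triple patterns
# selected while navigating a domain representation. Names are invented.

def build_sparql(patterns, select_vars, prefixes=None):
    """Assemble a SELECT query from (subject, predicate, object) patterns."""
    lines = [f"PREFIX {p}: <{uri}>" for p, uri in (prefixes or {}).items()]
    lines.append("SELECT " + " ".join(select_vars) + " WHERE {")
    lines += [f"  {s} {p} {o} ." for s, p, o in patterns]
    lines.append("}")
    return "\n".join(lines)

query = build_sparql(
    patterns=[("?patient", "ex:hasDiagnosis", "?dx"),
              ("?patient", "ex:hasAge", "?age")],
    select_vars=["?patient", "?dx"],
    prefixes={"ex": "http://example.org/tcga/"},
)
print(query)
```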
An ontology design pattern for surface water features
Sinha, Gaurav; Mark, David; Kolas, Dave; Varanka, Dalia; Romero, Boleslo E.; Feng, Chen-Chieh; Usery, E. Lynn; Liebermann, Joshua; Sorokine, Alexandre
2014-01-01
Surface water is a primary concept of human experience, but such concepts are captured in cultures and languages in many different ways. Still, many commonalities exist due to the physical basis of many of the properties and categories. An abstract ontology of surface water features based only on the physical properties of landscape features has the best potential to serve as a foundational domain ontology for other, more context-dependent ontologies. The Surface Water ontology design pattern was developed both to distill domain knowledge and to serve as a conceptual building block for more complex or specialized surface water ontologies. A fundamental distinction is made in this ontology between landscape features that act as containers (e.g., stream channels, basins) and the bodies of water (e.g., rivers, lakes) that occupy those containers. The semantics of concave (container) landforms are specified in a Dry module, and the semantics of contained bodies of water in a Wet module. The pattern is implemented in OWL, but Description Logic axioms and a detailed explanation are provided in this paper. The OWL ontology will be an important contribution to the Semantic Web vocabulary for annotating surface water feature datasets. Also provided is a discussion of why the pattern needs to be complemented with other ontologies, especially the previously developed Surface Network pattern. Finally, the practical value of the pattern for semantic querying of surface water datasets is illustrated through an annotated geospatial dataset and sample queries using the classes of the Surface Water pattern.
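The container-versus-body distinction at the heart of the pattern can be paraphrased in a few lines of plain Python; these classes only echo the idea and are not the pattern's OWL classes or IRIs:

```python
# Minimal paraphrase (plain classes, not OWL) of the pattern's core
# distinction: container landforms vs. the water bodies occupying them.

from dataclasses import dataclass

@dataclass
class Container:
    """A 'Dry'-module landform that can hold water: channel, basin, ..."""
    name: str

@dataclass
class WaterBody:
    """A 'Wet'-module body of water: river, lake, ..."""
    name: str
    occupies: Container  # simplification: exactly one container here

channel = Container("stream channel")
river = WaterBody("river", occupies=channel)
print(river.occupies.name)
```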
A Semantic Basis for Proof Queries and Transformations
NASA Technical Reports Server (NTRS)
Aspinall, David; Denney, Ewen W.; Luth, Christoph
2013-01-01
We extend the query language PrQL, designed for inspecting machine representations of proofs, to also allow transformation of proofs. PrQL natively supports hiproofs which express proof structure using hierarchically nested labelled trees, which we claim is a natural way of taming the complexity of huge proofs. Query-driven transformations enable manipulation of this structure, in particular, to transform proofs produced by interactive theorem provers into forms that assist their understanding, or that could be consumed by other tools. In this paper we motivate and define basic transformation operations, using an abstract denotational semantics of hiproofs and queries. This extends our previous semantics for queries based on syntactic tree representations. We define update operations that add and remove sub-proofs, and manipulate the hierarchy to group and ungroup nodes. We show that
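The group and ungroup operations on hierarchically nested labelled trees can be sketched as follows; the tuple encoding and operation names are a much-simplified illustration of our own devising, not PrQL's actual semantics:

```python
# Sketch of hierarchy-manipulating transformations on a labelled proof tree.
# A node is (label, children). group/ungroup echo the operations described
# above in a deliberately simplified form.

def group(children, label):
    """Wrap a list of sibling subtrees under a new labelled node."""
    return (label, list(children))

def ungroup(node):
    """Remove a node's label, exposing its children to the parent level."""
    _, children = node
    return list(children)

leaf_a, leaf_b = ("apply_lemma", []), ("simplify", [])
grouped = group([leaf_a, leaf_b], "tactic_block")
print(grouped)   # ('tactic_block', [('apply_lemma', []), ('simplify', [])])
print(ungroup(grouped))  # [('apply_lemma', []), ('simplify', [])]
```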
An RDF/OWL knowledge base for query answering and decision support in clinical pharmacogenetics.
Samwald, Matthias; Freimuth, Robert; Luciano, Joanne S; Lin, Simon; Powers, Robert L; Marshall, M Scott; Adlassnig, Klaus-Peter; Dumontier, Michel; Boyce, Richard D
2013-01-01
Genetic testing for personalizing pharmacotherapy is bound to become an important part of clinical routine. To address the associated issues of data management and quality, we are creating a semantic knowledge base for clinical pharmacogenetics. The knowledge base is made up of three components: an expressive ontology formalized in the Web Ontology Language (OWL 2 DL), a Resource Description Framework (RDF) model for capturing detailed results of manual annotation of pharmacogenomic information in drug product labels, and an RDF conversion of relevant biomedical datasets. Our work goes beyond the state of the art in that it makes both automated reasoning and query answering as simple as possible, and its reasoning capabilities exceed those of previously described ontologies.
Can Social Semantic Web Techniques Foster Collaborative Curriculum Mapping In Medicine?
Finsterer, Sonja; Cremer, Jan; Schenkat, Hennig
2013-01-01
Background Curriculum mapping, which is aimed at the systematic realignment of the planned, taught, and learned curriculum, is considered a challenging and ongoing effort in medical education. Second-generation curriculum managing systems foster knowledge management processes including curriculum mapping in order to give comprehensive support to learners, teachers, and administrators. The large quantity of custom-built software in this field indicates a shortcoming of available IT tools and standards. Objective The project reported here aims at the systematic adoption of techniques and standards of the Social Semantic Web to implement collaborative curriculum mapping for a complete medical model curriculum. Methods A semantic MediaWiki (SMW)-based Web application has been introduced as a platform for the elicitation and revision process of the Aachen Catalogue of Learning Objectives (ACLO). The semantic wiki uses a domain model of the curricular context and offers structured (form-based) data entry, multiple views, structured querying, semantic indexing, and commenting for learning objectives (“LOs”). Semantic indexing of learning objectives relies on both a controlled vocabulary of international medical classifications (ICD, MeSH) and a folksonomy maintained by the users. An additional module supporting the global checking of consistency complements the semantic wiki. Statements of the Object Constraint Language define the consistency criteria. We evaluated the application by a scenario-based formative usability study, where the participants solved tasks in the (fictional) context of 7 typical situations and answered a questionnaire containing Likert-scaled items and free-text questions. Results At present, ACLO contains roughly 5350 operational (ie, specific and measurable) objectives acquired during the last 25 months. 
The wiki-based user interface uses 13 online forms for data entry and 4 online forms for flexible searches of LOs, and all the forms are accessible by standard Web browsers. The formative usability study yielded positive results (median rating of 2 (“good”) in all 7 general usability items) and produced valuable qualitative feedback, especially concerning navigation and comprehensibility. Although not asked to, the participants (n=5) detected critical aspects of the curriculum (similar learning objectives addressed repeatedly and missing objectives), thus proving the system’s ability to support curriculum revision. Conclusions The SMW-based approach enabled an agile implementation of computer-supported knowledge management. The approach, based on standard Social Semantic Web formats and technology, represents a feasible and effectively applicable compromise between answering to the individual requirements of curriculum management at a particular medical school and using proprietary systems. PMID:23948519
RelFinder: Revealing Relationships in RDF Knowledge Bases
NASA Astrophysics Data System (ADS)
Heim, Philipp; Hellmann, Sebastian; Lehmann, Jens; Lohmann, Steffen; Stegemann, Timo
The Semantic Web has recently seen a rise of large knowledge bases (such as DBpedia) that are freely accessible via SPARQL endpoints. The structured representation of the contained information opens up new possibilities in the way it can be accessed and queried. In this paper, we present an approach that extracts a graph covering relationships between two objects of interest. We show an interactive visualization of this graph that supports the systematic analysis of the found relationships by providing highlighting, previewing, and filtering features.
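The relationship-extraction step, finding chains of triples that connect two objects of interest, can be sketched as a breadth-first search over an undirected view of the RDF graph. The toy triples stand in for a DBpedia-scale knowledge base, and this simplification ignores RelFinder's actual query strategy:

```python
from collections import deque

# Sketch: shortest chain of properties connecting two objects of interest,
# via BFS over triples traversed in both directions. Toy data, not DBpedia.

triples = [
    ("Leipzig", "locatedIn", "Saxony"),
    ("Saxony", "partOf", "Germany"),
    ("Berlin", "capitalOf", "Germany"),
]

def find_path(start, goal, triples):
    """Shortest undirected property path from start to goal, or None."""
    edges = {}
    for s, p, o in triples:
        edges.setdefault(s, []).append((p, o))
        edges.setdefault(o, []).append((f"inverse({p})", s))  # reverse hop
    queue, seen = deque([(start, [])]), {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for prop, nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, prop, nxt)]))
    return None

print(find_path("Leipzig", "Berlin", triples))
```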
Semantic Integration for Marine Science Interoperability Using Web Technologies
NASA Astrophysics Data System (ADS)
Rueda, C.; Bermudez, L.; Graybeal, J.; Isenor, A. W.
2008-12-01
The Marine Metadata Interoperability Project, MMI (http://marinemetadata.org) promotes the exchange, integration, and use of marine data through enhanced data publishing, discovery, documentation, and accessibility. A key effort is the definition of an Architectural Framework and Operational Concept for Semantic Interoperability (http://marinemetadata.org/sfc), which is complemented with the development of tools that realize critical use cases in semantic interoperability. In this presentation, we describe a set of such Semantic Web tools that allow performing important interoperability tasks, ranging from the creation of controlled vocabularies and the mapping of terms across multiple ontologies, to the online registration, storage, and search services needed to work with the ontologies (http://mmisw.org). This set of services uses Web standards and technologies, including Resource Description Framework (RDF), Web Ontology Language (OWL), Web services, and toolkits for Rich Internet Application development. We will describe the following components: MMI Ontology Registry: The MMI Ontology Registry and Repository provides registry and storage services for ontologies. Entries in the registry are associated with projects defined by the registered users. Also, sophisticated search functions, for example according to metadata items and vocabulary terms, are provided. Client applications can submit search requests using the W3C SPARQL Query Language for RDF. Voc2RDF: This component converts an ASCII comma-delimited set of terms and definitions into an RDF file. Voc2RDF facilitates the creation of controlled vocabularies by using a simple form-based user interface. Created vocabularies and their descriptive metadata can be submitted to the MMI Ontology Registry for versioning and community access. VINE: The Vocabulary Integration Environment component allows the user to map vocabulary terms across multiple ontologies.
Various relationships can be established, for example exactMatch, narrowerThan, and subClassOf. VINE can compute inferred mappings based on the given associations. Attributes about each mapping, like comments and a confidence level, can also be included. VINE also supports registering and storing resulting mapping files in the Ontology Registry. The presentation will describe the application of semantic technologies in general, and our planned applications in particular, to solve data management problems in the marine and environmental sciences.
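The core transformation behind a Voc2RDF-style tool, turning a comma-delimited term list into RDF, can be sketched as follows. The base URI and the choice of SKOS properties are illustrative assumptions, not the actual output of the MMI service:

```python
import csv, io

def voc2rdf(csv_text, base_uri="http://example.org/voc#"):
    """Convert comma-delimited 'term,definition' rows into Turtle triples.
    Base URI and property names are illustrative, not those of the real tool."""
    reader = csv.reader(io.StringIO(csv_text))
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."]
    for term, definition in reader:
        uri = base_uri + term.strip().replace(" ", "_")
        lines.append(f'<{uri}> skos:prefLabel "{term.strip()}" ;')
        lines.append(f'    skos:definition "{definition.strip()}" .')
    return "\n".join(lines)

print(voc2rdf("salinity,Measure of dissolved salt content\n"
              "sea surface temperature,Water temperature near the surface"))
```

The resulting Turtle file could then be registered in an ontology repository for versioning and community access, as the abstract describes.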
Connecting geoscience systems and data using Linked Open Data in the Web of Data
NASA Astrophysics Data System (ADS)
Ritschel, Bernd; Neher, Günther; Iyemori, Toshihiko; Koyama, Yukinobu; Yatagai, Akiyo; Murayama, Yasuhiro; Galkin, Ivan; King, Todd; Fung, Shing F.; Hughes, Steve; Habermann, Ted; Hapgood, Mike; Belehaki, Anna
2014-05-01
Linked Data or Linked Open Data (LOD), in the realm of free and publicly accessible data, is one of the most promising and most used semantic Web frameworks for connecting various types of data and vocabularies, including those of geoscience and related domains. The semantic Web extension to the commonly existing and used World Wide Web is based on the meaning of entities and relationships, or in other words classes and properties, used for data in a global data and information space, the Web of Data. LOD data is referenced and mashed up via URIs and is retrievable using simple parameter-controlled HTTP requests, leading to a result which is human-understandable or machine-readable. Furthermore, the publishing and mash-up of data in the semantic Web realm is realized by specific Web standards, such as RDF, RDFS, OWL and SPARQL, defined for the Web of Data. Semantic Web based mash-up is the Web method to aggregate and reuse various contents from different sources, such as using FOAF as a model and vocabulary for the description of persons and organizations, in our case related to geoscience projects, instruments, observations, data and so on. Using the example of three different geoscience data and information management systems, ESPAS, IUGONET and GFZ ISDC, and the associated science data and related metadata, or better called context data, the concept of the mash-up of systems and data using the semantic Web approach and the Linked Open Data framework is described in this publication. Because the three systems are based on different data models, data storage structures and technical implementations, an extra semantic Web layer upon the existing interfaces is used for mash-up solutions. In order to satisfy the semantic Web standards, data transition processes, such as the transfer of content stored in relational databases or mapped in XML documents into SPARQL-capable databases or endpoints using D2R or XSLT, are necessary.
In addition, the use of mapped and/or merged domain-specific and cross-domain vocabularies, in the sense of terminological ontologies, is the foundation for a virtually unified data retrieval and access in the IUGONET, ESPAS and GFZ ISDC data management systems. SPARQL endpoints, realized either by native RDF databases, e.g. Virtuoso, or by virtual SPARQL endpoints, e.g. D2R services, enable a purely Web standard-based mash-up of domain-specific systems and data, in this case the space weather and geomagnetic domains, but also cross-domain connections to data and vocabularies, e.g. related to NASA's VxOs, particularly VWO, or NASA's PDS data system within LOD. LOD - Linked Open Data; RDF - Resource Description Framework; RDFS - RDF Schema; OWL - Web Ontology Language; SPARQL - SPARQL Protocol and RDF Query Language; FOAF - Friend of a Friend ontology; ESPAS - Near Earth Space Data Infrastructure for e-Science (Project); IUGONET - Inter-university Upper Atmosphere Global Observation Network (Project); GFZ ISDC - German Research Centre for Geosciences Information System and Data Center; XML - Extensible Markup Language; D2R - (Relational) Database to RDF (Transformation); XSLT - Extensible Stylesheet Language Transformation; Virtuoso - OpenLink Virtuoso Universal Server (including RDF data management); NASA - National Aeronautics and Space Administration; VxO - Virtual Observatories; VWO - Virtual Wave Observatory; PDS - Planetary Data System
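The "simple parameter-controlled HTTP requests" used to retrieve LOD data follow the SPARQL Protocol, which passes the query as a URL parameter. A minimal sketch of building such a request (the endpoint URL is a placeholder; no network call is made here):

```python
from urllib.parse import urlencode

def sparql_get_url(endpoint, query):
    """Build a SPARQL Protocol GET request URL for a given endpoint and query."""
    return endpoint + "?" + urlencode(
        {"query": query, "format": "application/sparql-results+json"})

query = """PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?dataset ?title WHERE {
  ?dataset dcterms:title ?title .
} LIMIT 10"""

url = sparql_get_url("http://example.org/sparql", query)
print(url)
```

Sending this URL with any HTTP client returns results in a machine-readable serialization such as SPARQL JSON, which is what makes cross-system mash-ups possible without bespoke APIs.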
Personalized query suggestion based on user behavior
NASA Astrophysics Data System (ADS)
Chen, Wanyu; Hao, Zepeng; Shao, Taihua; Chen, Honghui
Query suggestions help users refine their queries after they input an initial query. Previous work mainly concentrated on similarity-based and context-based query suggestion approaches. However, models that adapt to a specific user (personalization) can help to improve the probability of the user being satisfied. In this paper, we propose a personalized query suggestion model based on users' search behavior (UB model), where we inject relevance between queries and users' search behavior into a basic probabilistic model. For the relevance between queries, we consider their semantic similarity and co-occurrence, which reflects the behavior information from other users in web search. Regarding the current user's preference for a query, we combine the user's short-term and long-term search behavior in a linear fashion and address the data sparsity problem with Bayesian probabilistic matrix factorization (BPMF). In particular, we also investigate the impact of different personalization strategies (the combination of the user's short-term and long-term search behavior) on the performance of query suggestion reranking. We quantify the improvement of our proposed UB model against a state-of-the-art baseline using the public AOL query logs and show that it beats the baseline in terms of metrics used in query suggestion reranking. The experimental results show that: (i) for personalized ranking, users' behavioral information helps to improve query suggestion effectiveness; and (ii) given a query, merging information inferred from the short-term and long-term search behavior of a particular user can result in better performance than either approach alone.
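The linear combination of short-term and long-term preference described above can be sketched as a simple scoring rule. The trade-off parameter and the toy candidates are invented for illustration; the paper's model additionally uses BPMF to fill in missing preference values:

```python
def suggestion_score(query_rel, short_pref, long_pref, lam=0.6):
    """Score a candidate suggestion: query relevance weighted by a linear
    mix of the user's short-term and long-term preference.
    lam is a hypothetical trade-off parameter, not the paper's tuned value."""
    return query_rel * (lam * short_pref + (1 - lam) * long_pref)

# (relevance to initial query, short-term preference, long-term preference)
candidates = {
    "python tutorial": (0.9, 0.8, 0.3),
    "python snake":    (0.7, 0.1, 0.2),
}
ranked = sorted(candidates,
                key=lambda q: suggestion_score(*candidates[q]), reverse=True)
print(ranked)
```

Varying `lam` corresponds to the different personalization strategies the paper compares: `lam=1` uses only the current session, `lam=0` only the user's history.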
Mohammadhassanzadeh, Hossein; Van Woensel, William; Abidi, Samina Raza; Abidi, Syed Sibte Raza
2017-01-01
Capturing complete medical knowledge is challenging, often due to incomplete patient Electronic Health Records (EHR), but also because of valuable, tacit medical knowledge hidden away in physicians' experiences. To extend the coverage of incomplete medical knowledge-based systems beyond their deductive closure, and thus enhance their decision-support capabilities, we argue that innovative, multi-strategy reasoning approaches should be applied. In particular, plausible reasoning mechanisms apply patterns from human thought processes, such as generalization, similarity and interpolation, based on attributional, hierarchical, and relational knowledge. Plausible reasoning mechanisms include inductive reasoning, which generalizes the commonalities among the data to induce new rules, and analogical reasoning, which is guided by data similarities to infer new facts. By further leveraging rich, biomedical Semantic Web ontologies to represent medical knowledge, both known and tentative, we increase the accuracy and expressivity of plausible reasoning, and cope with issues such as data heterogeneity, inconsistency and interoperability. In this paper, we present a Semantic Web-based, multi-strategy reasoning approach, which integrates deductive and plausible reasoning and exploits Semantic Web technology to solve complex clinical decision support queries. We evaluated our system using a real-world medical dataset of patients with hepatitis, from which we randomly removed different percentages of data (5%, 10%, 15%, and 20%) to reflect scenarios with increasing amounts of incomplete medical knowledge. To increase the reliability of the results, we generated 5 independent datasets for each percentage of missing values, which resulted in 20 experimental datasets (in addition to the original dataset).
The results show that plausibly inferred knowledge extends the coverage of the knowledge base by, on average, 2%, 7%, 12%, and 16% for datasets with, respectively, 5%, 10%, 15%, and 20% of missing values. This expansion in the KB coverage allowed solving complex disease diagnostic queries that were previously unresolvable, without losing the correctness of the answers. However, compared to deductive reasoning, data-intensive plausible reasoning mechanisms yield a significant performance overhead. First, we observed that plausible reasoning approaches, by generating tentative inferences and leveraging domain knowledge of experts, allow us to extend the coverage of medical knowledge bases, resulting in improved clinical decision support. Second, by leveraging OWL ontological knowledge, we are able to increase the expressivity and accuracy of plausible reasoning methods. Third, our approach is applicable to clinical decision support systems for a range of chronic diseases.
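The analogical reasoning step, inferring tentative facts for one patient from similar patients, can be sketched with a toy knowledge base. The Jaccard similarity and threshold are stand-ins; the paper's system uses ontology-based similarity over richer relational knowledge, and the patient facts below are invented:

```python
def analogical_inference(kb, patient, peers, threshold=0.7):
    """Tentatively infer facts for `patient` from sufficiently similar peers.
    Similarity here is Jaccard overlap of known facts, a simplification of
    the ontology-driven similarity used in the paper."""
    known = kb[patient]
    inferred = set()
    for peer in peers:
        overlap = len(known & kb[peer]) / len(known | kb[peer])
        if overlap >= threshold:
            inferred |= kb[peer] - known  # plausible, not deductive, facts
    return inferred

kb = {
    "p1": {"fever", "fatigue", "elevated_ALT"},
    "p2": {"fever", "fatigue", "elevated_ALT", "hepatitis_B"},
}
print(analogical_inference(kb, "p1", ["p2"]))
```

Such inferred facts are exactly what "extends the coverage of the knowledge base" in the evaluation, at the cost of being tentative rather than deductively certain.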
Image Retrieval by Color Semantics with Incomplete Knowledge.
ERIC Educational Resources Information Center
Corridoni, Jacopo M.; Del Bimbo, Alberto; Vicario, Enrico
1998-01-01
Presents a system which supports image retrieval by high-level chromatic contents, the sensations that color accordances generate on the observer. Surveys Itten's theory of color semantics and discusses image description and query specification. Presents examples of visual querying. (AEF)
LAILAPS-QSM: A RESTful API and JAVA library for semantic query suggestions.
Chen, Jinbo; Scholz, Uwe; Zhou, Ruonan; Lange, Matthias
2018-03-01
In order to access and filter content of life-science databases, full text search is a widely applied query interface. But its high flexibility and intuitiveness are paid for with potentially imprecise and incomplete query results. To reduce this drawback, query assistance systems suggest those combinations of keywords with the highest potential to match most of the relevant data records. Widespread approaches are syntactic query corrections that avoid misspelling and support expansion of words by suffixes and prefixes. Synonym expansion approaches apply thesauri, ontologies, and query logs. All need laborious curation and maintenance. Furthermore, access to query logs is in general restricted. Approaches that infer related queries from a query profile, such as research field, geographic location, co-authorship, affiliation etc., require the user's registration and its public accessibility, which contradicts privacy concerns. To overcome these drawbacks, we implemented LAILAPS-QSM, a machine learning approach that reconstructs possible linguistic contexts of a given keyword query. The context is inferred from the text records stored in the databases that are going to be queried, or, for a general-purpose query suggestion service, extracted from PubMed abstracts and UniProt data. The supplied tool suite enables the pre-processing of these text records and the further computation of customized distributed word vectors. The latter are used to suggest alternative keyword queries. An evaluation of the query suggestion quality was performed for plant science use cases. Local experts enabled a cost-efficient quality assessment in the categories trait, biological entity, taxonomy, affiliation, and metabolic function, which was performed using ontology term similarities. The mean information content similarity of LAILAPS-QSM for 15 representative queries is 0.70, with 34% scoring above 0.80.
In comparison, the information content similarity for human-expert-made query suggestions is 0.90. The software is available either as a tool set to build and train dedicated query suggestion services or as an already trained general-purpose RESTful web service. The service uses open interfaces to be seamlessly embeddable into database frontends. The JAVA implementation uses highly optimized data structures and streamlined code to provide fast and scalable responses for web service calls. The source code of LAILAPS-QSM is available under GNU General Public License version 2 in the Bitbucket Git repository: https://bitbucket.org/ipk_bit_team/bioescorte-suggestion.
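Suggesting alternative keywords from distributed word vectors boils down to a nearest-neighbor search under cosine similarity. The three-dimensional toy vectors below are invented; real vectors would be trained on the PubMed/UniProt text records the abstract mentions:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

# Toy distributed word vectors (invented for illustration).
vectors = {
    "drought tolerance": [0.9, 0.1, 0.2],
    "water stress":      [0.8, 0.2, 0.3],
    "seed color":        [0.1, 0.9, 0.1],
}

def suggest(query, k=1):
    """Return the k terms closest to the query term in vector space."""
    others = [t for t in vectors if t != query]
    return sorted(others, key=lambda t: cosine(vectors[query], vectors[t]),
                  reverse=True)[:k]

print(suggest("drought tolerance"))
```

A query suggestion service would precompute the vectors offline and serve such lookups behind the RESTful API.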
Sinaci, A Anil; Laleci Erturkmen, Gokce B
2013-10-01
In order to enable secondary use of Electronic Health Records (EHRs) by bridging the interoperability gap between clinical care and research domains, in this paper, a unified methodology and the supporting framework are introduced which bring together the power of metadata registries (MDR) and semantic web technologies. We introduce a federated semantic metadata registry framework by extending the ISO/IEC 11179 standard, and enable integration of data element registries through Linked Open Data (LOD) principles, where each Common Data Element (CDE) can be uniquely referenced, queried and processed to enable syntactic and semantic interoperability. Each CDE and its components are maintained as LOD resources enabling semantic links with other CDEs, terminology systems and implementation-dependent content models, hence facilitating semantic search, more effective reuse and semantic interoperability across different application domains. There are several important efforts addressing semantic interoperability in the healthcare domain, such as the IHE DEX profile proposal, CDISC SHARE and CDISC2RDF. Our architecture complements these by providing a framework to interlink existing data element registries and repositories, multiplying their potential for semantic interoperability to a greater extent. The open source implementation of the federated semantic MDR framework presented in this paper is the core of the semantic interoperability layer of the SALUS project, which enables the execution of post-marketing safety analysis studies on top of existing EHR systems. Copyright © 2013 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Patton, E. W.; West, P.; Greer, R.; Jin, B.
2011-12-01
Following on work presented at the 2010 AGU Fall Meeting, we present a number of real-world collections of semantically-enabled scientific metadata ingested into the Tetherless World RDF2HTML system as structured data and presented and edited using that system. Two separate datasets from two different domains (oceanography and solar sciences) are made available using existing web standards and services, e.g. encoded using ontologies represented with the Web Ontology Language (OWL) and stored in a SPARQL endpoint for querying. These datasets are deployed for use in three different web environments, i.e. Drupal, MediaWiki, and a custom web portal written in Java, to highlight the cross-platform nature of the data presentation. Stylesheets used to transform concepts in each domain as well as shared terms into HTML will be presented to show the power of using common ontologies to publish data and support reuse of existing terminologies. In addition, a single domain dataset is shared between two separate portal instances to demonstrate the ability for this system to offer distributed access and modification of content across the Internet. Lastly, we will highlight challenges that arose in the software engineering process, outline the design choices we made in solving those issues, and discuss how future improvements to this and other systems will enable the evolution of distributed, decentralized collaborations for scientific data sharing across multiple research groups.
Assisting Consumer Health Information Retrieval with Query Recommendations
Zeng, Qing T.; Crowell, Jonathan; Plovnick, Robert M.; Kim, Eunjung; Ngo, Long; Dibble, Emily
2006-01-01
Objective: Health information retrieval (HIR) on the Internet has become an important practice for millions of people, many of whom have problems forming effective queries. We have developed and evaluated a tool to assist people in health-related query formation. Design: We developed the Health Information Query Assistant (HIQuA) system. The system suggests alternative/additional query terms related to the user's initial query that can be used as building blocks to construct a better, more specific query. The recommended terms are selected according to their semantic distance from the original query, which is calculated on the basis of concept co-occurrences in medical literature and log data as well as semantic relations in medical vocabularies. Measurements: An evaluation of the HIQuA system was conducted and a total of 213 subjects participated in the study. The subjects were randomized into 2 groups. One group was given query recommendations and the other was not. Each subject performed HIR for both a predefined and a self-defined task. Results: The study showed that providing HIQuA recommendations resulted in statistically significantly higher rates of successful queries (odds ratio = 1.66, 95% confidence interval = 1.16–2.38), although no statistically significant impact on user satisfaction or the users' ability to accomplish the predefined retrieval task was found. Conclusion: Providing semantic-distance-based query recommendations can help consumers with query formation during HIR. PMID:16221944
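A semantic distance derived from concept co-occurrence, as HIQuA computes from medical literature and log data, can be sketched as a normalized co-occurrence measure. The frequencies below are invented, and the real system additionally folds in semantic relations from medical vocabularies:

```python
import math

def cooccurrence_distance(a, b, freq, co):
    """Distance in [0, 1]: 0 when two concepts always co-occur, 1 when they
    never do. A stand-in for HIQuA's combined semantic distance."""
    together = co.get(frozenset((a, b)), 0)
    return 1 - together / math.sqrt(freq[a] * freq[b])

# Invented document frequencies and co-occurrence counts.
freq = {"diabetes": 100, "insulin": 80, "fracture": 90}
co = {frozenset(("diabetes", "insulin")): 60,
      frozenset(("diabetes", "fracture")): 5}

print(cooccurrence_distance("diabetes", "insulin", freq, co))
print(cooccurrence_distance("diabetes", "fracture", freq, co))
```

Terms at small distance from the initial query are the ones offered as building blocks for a more specific query.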
A study of medical and health queries to web search engines.
Spink, Amanda; Yang, Yin; Jansen, Jim; Nykanen, Pirrko; Lorence, Daniel P; Ozmutlu, Seda; Ozmutlu, H Cenk
2004-03-01
This paper reports findings from an analysis of medical or health queries to different web search engines. We report results: (i) comparing samples of 10000 web queries taken randomly from 1.2 million query logs from the AlltheWeb.com and Excite.com commercial web search engines in 2001 for medical or health queries, (ii) comparing the 2001 findings from Excite and AlltheWeb.com users with results from a previous analysis of medical and health related queries from the Excite Web search engine for 1997 and 1999, and (iii) medical or health advice-seeking queries beginning with the word 'should'. Findings suggest: (i) a small percentage of web queries are medical or health related, (ii) the top five categories of medical or health queries were: general health, weight issues, reproductive health and puberty, pregnancy/obstetrics, and human relationships, and (iii) over time, the medical and health queries may have declined as a proportion of all web queries, as the use of specialized medical/health websites and e-commerce-related queries has increased. Findings provide insights into medical and health-related web querying and suggest some implications for the use of the general web search engines when seeking medical/health information.
Exposing the cancer genome atlas as a SPARQL endpoint.
Deus, Helena F; Veiga, Diogo F; Freire, Pablo R; Weinstein, John N; Mills, Gordon B; Almeida, Jonas S
2010-12-01
The Cancer Genome Atlas (TCGA) is a multidisciplinary, multi-institutional effort to characterize several types of cancer. Datasets from biomedical domains such as TCGA present a particularly challenging task for those interested in dynamically aggregating their results because the data sources are typically both heterogeneous and distributed. The Linked Data best practices offer a solution to integrate and discover data with those characteristics, namely through exposure of data as Web services supporting SPARQL, the Resource Description Framework query language. Most SPARQL endpoints, however, cannot easily be queried by data experts. Furthermore, exposing experimental data as SPARQL endpoints remains a challenging task because, in most cases, data must first be converted to Resource Description Framework triples. In line with those requirements, we have developed an infrastructure to expose clinical, demographic and molecular data elements generated by TCGA as a SPARQL endpoint by assigning elements to entities of the Simple Sloppy Semantic Database (S3DB) management model. All components of the infrastructure are available as independent Representational State Transfer (REST) Web services to encourage reusability, and a simple interface was developed to automatically assemble SPARQL queries by navigating a representation of the TCGA domain. A key feature of the proposed solution that greatly facilitates assembly of SPARQL queries is the distinction between the TCGA domain descriptors and data elements. Furthermore, the use of the S3DB management model as a mediator enables queries to both public and protected data without the need for prior submission to a single data source. Copyright © 2010 Elsevier Inc. All rights reserved.
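Automatically assembling a SPARQL query from navigated domain descriptors can be sketched as string composition over a selected entity, its attributes, and optional filters. The `tcga:` prefix and predicate names are placeholders, not the actual S3DB/TCGA vocabulary:

```python
def assemble_sparql(entity, attributes, filters=None):
    """Assemble a SPARQL SELECT query from navigated domain descriptors.
    Prefix and predicate names are hypothetical."""
    vars_ = " ".join(f"?{a}" for a in attributes)
    patterns = "\n  ".join(f"?s tcga:{a} ?{a} ." for a in attributes)
    where = f"?s a tcga:{entity} .\n  {patterns}"
    if filters:
        where += "\n  " + "\n  ".join(f"FILTER({f})" for f in filters)
    return (f"PREFIX tcga: <http://example.org/tcga#>\n"
            f"SELECT {vars_} WHERE {{\n  {where}\n}}")

print(assemble_sparql("Patient", ["age", "tumorType"], ["?age > 50"]))
```

The interface described in the abstract effectively performs this composition for the user, so that navigating the domain representation replaces hand-writing SPARQL.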
Semantic based man-machine interface for real-time communication
NASA Technical Reports Server (NTRS)
Ali, M.; Ai, C.-S.
1988-01-01
A flight expert system (FLES) was developed to assist pilots in monitoring, diagnosing and recovering from in-flight faults. To provide a communications interface between the flight crew and FLES, a natural language interface (NALI) was implemented. Input to NALI is processed by three processors: (1) the semantic parser; (2) the knowledge retriever; and (3) the response generator. First, the semantic parser extracts meaningful words and phrases to generate an internal representation of the query. At this point, the semantic parser has the ability to map different input forms related to the same concept into the same internal representation. Then, the knowledge retriever analyzes and stores the context of the query to aid in resolving ellipses and pronoun references. At the end of this process, a sequence of retrieval functions is created as a first step in generating the proper response. Finally, the response generator generates the natural language response to the query. The architecture of NALI was designed to process both temporal and nontemporal queries. The architecture and implementation of NALI are described.
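The three-stage parse/retrieve/respond pipeline, including the mapping of different surface forms to one internal representation, can be sketched as follows. The vocabulary and the fault knowledge base are invented for illustration; NALI's actual parser and retrieval functions are far richer:

```python
# Surface phrases mapped to one internal concept (invented examples).
CANONICAL = {"rpm": "engine_speed", "revs": "engine_speed",
             "engine speed": "engine_speed"}
KB = {"engine_speed": "Engine speed is 2150 RPM, within normal range."}

def semantic_parse(query):
    """Stage 1: map different input forms to the same internal representation."""
    for phrase, concept in CANONICAL.items():
        if phrase in query.lower():
            return {"concept": concept}
    return {"concept": None}

def retrieve(rep):
    """Stage 2: look up the internal representation in the knowledge base."""
    return KB.get(rep["concept"])

def respond(fact):
    """Stage 3: generate the natural language response."""
    return fact or "I have no information on that."

for q in ("What are the revs?", "Show engine speed"):
    print(respond(retrieve(semantic_parse(q))))
```

Both queries produce the same answer because the parser normalizes them to the same concept, which is the key property the abstract attributes to the semantic parser.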
Dugan, J. M.; Berrios, D. C.; Liu, X.; Kim, D. K.; Kaizer, H.; Fagan, L. M.
1999-01-01
Our group has built an information retrieval system based on a complex semantic markup of medical textbooks. We describe the construction of a set of web-based knowledge-acquisition tools that expedites the collection and maintenance of the concepts required for text markup and the search interface required for information retrieval from the marked text. In the text markup system, domain experts (DEs) identify sections of text that contain one or more elements from a finite set of concepts. End users can then query the text using a predefined set of questions, each of which identifies a subset of complementary concepts. The search process matches that subset of concepts to relevant points in the text. The current process requires that the DE invest significant time to generate the required concepts and questions. We propose a new system--called ACQUIRE (Acquisition of Concepts and Queries in an Integrated Retrieval Environment)--that assists a DE in two essential tasks in the text-markup process. First, it helps her to develop, edit, and maintain the concept model: the set of concepts with which she marks the text. Second, ACQUIRE helps her to develop a query model: the set of specific questions that end users can later use to search the marked text. The DE incorporates concepts from the concept model when she creates the questions in the query model. The major benefit of the ACQUIRE system is a reduction in the time and effort required for the text-markup process. We compared the process of concept- and query-model creation using ACQUIRE to the process used in previous work by rebuilding two existing models that we previously constructed manually. We observed a significant decrease in the time required to build and maintain the concept and query models. PMID:10566457
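The retrieval step, matching a question's concept subset to sections of marked text, is essentially a set-containment test. The section identifiers and concept names below are invented for illustration:

```python
def search(marked_sections, question_concepts):
    """Return the sections whose concept markup covers all of the
    question's concepts (a sketch of the concept-matching retrieval step)."""
    return [sid for sid, concepts in marked_sections.items()
            if question_concepts <= concepts]

# Sections of a textbook marked with concepts by a domain expert (invented).
sections = {
    "ch3.2": {"myocardial_infarction", "treatment"},
    "ch3.5": {"myocardial_infarction", "diagnosis"},
}
print(search(sections, {"myocardial_infarction", "treatment"}))
```

ACQUIRE's contribution is upstream of this step: it reduces the expert's effort in building the concept sets and the question-to-concept mappings that this match relies on.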
Semantics based approach for analyzing disease-target associations.
Kaalia, Rama; Ghosh, Indira
2016-08-01
A complex disease is caused by heterogeneous biological interactions between genes and their products along with the influence of environmental factors. There have been many attempts at understanding the cause of these diseases using experimental, statistical and computational methods. In the present work, the objective is to address the challenge of representation and integration of information from heterogeneous biomedical aspects of a complex disease using a semantics based approach. Semantic web technology is used to design the Disease Association Ontology (DAO-db) for representation and integration of disease associated information, with diabetes as the case study. The functional associations of disease genes are integrated using RDF graphs of DAO-db. Three semantic web based scoring algorithms (PageRank, HITS (Hyperlink Induced Topic Search) and HITS with semantic weights) are used to score the gene nodes on the basis of their functional interactions in the graph. The Disease Association Ontology for Diabetes (DAO-db) provides a standard ontology-driven platform for describing genes, proteins and pathways involved in diabetes and for integrating functional associations from various interaction levels (gene-disease, gene-pathway, gene-function, gene-cellular component and protein-protein interactions). An automatic instance loader module is also developed in the present work that helps in adding instances to DAO-db on a large scale. Our ontology provides a framework for querying and analyzing the disease associated information in the form of RDF graphs. The above developed methodology is used to predict novel potential targets involved in diabetes disease from the long list of loose (statistically associated) gene-disease associations. Copyright © 2016 Elsevier Inc. All rights reserved.
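Scoring gene nodes by their position in the association graph, the role PageRank plays above, can be sketched with a plain power-iteration implementation. The toy gene graph is invented, and the paper's variants additionally weight edges semantically:

```python
def pagerank(graph, damping=0.85, iters=50):
    """Plain PageRank over an adjacency dict {node: [out-neighbors]}.
    The paper's HITS-with-semantic-weights variant is omitted for brevity."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1 / n for v in nodes}
    for _ in range(iters):
        new = {}
        for v in nodes:
            incoming = sum(rank[u] / len(graph[u]) for u in nodes if v in graph[u])
            new[v] = (1 - damping) / n + damping * incoming
        rank = new
    return rank

# Toy gene-association graph (invented): edges are functional links.
graph = {"INS": ["IRS1"], "IRS1": ["INS", "AKT2"], "AKT2": ["IRS1"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))
```

Highly ranked genes in the functional-interaction graph are the candidates promoted from the long list of loose statistical gene-disease associations.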
A semantic proteomics dashboard (SemPoD) for data management in translational research.
Jayapandian, Catherine P; Zhao, Meng; Ewing, Rob M; Zhang, Guo-Qiang; Sahoo, Satya S
2012-01-01
One of the primary challenges in translational research data management is breaking down the barriers between the multiple data silos and the integration of 'omics data with clinical information to complete the cycle from the bench to the bedside. The role of contextual metadata, also called provenance information, is a key factor in effective data integration, reproducibility of results, correct attribution of original source, and answering research queries involving "What", "Where", "When", "Which", "Who", "How", and "Why" (also known as the W7 model). But, at present there is limited or no effective approach to managing and leveraging provenance information for integrating data across studies or projects. Hence, there is an urgent need for a paradigm shift in creating a "provenance-aware" informatics platform to address this challenge. We introduce an ontology-driven, intuitive Semantic Proteomics Dashboard (SemPoD) that uses provenance together with domain information (semantic provenance) to enable researchers to query, compare, and correlate different types of data across multiple projects, and allow integration with legacy data to support their ongoing research. The SemPoD platform, currently in use at the Case Center for Proteomics and Bioinformatics (CPB), consists of three components: (a) Ontology-driven Visual Query Composer, (b) Result Explorer, and (c) Query Manager. Currently, SemPoD allows provenance-aware querying of 1153 mass-spectrometry experiments from 20 different projects. SemPoD uses the systems molecular biology provenance ontology (SysPro) to support a dynamic query composition interface, which automatically updates the components of the query interface based on previous user selections and efficiently prunes the result set using a "smart filtering" approach.
The SysPro ontology re-uses terms from the PROV ontology (PROV-O) being developed by the World Wide Web Consortium (W3C) provenance working group, the minimum information required for reporting a molecular interaction experiment (MIMIx), and the minimum information about a proteomics experiment (MIAPE) guidelines. SemPoD was evaluated both in terms of user feedback and system scalability. SemPoD is an intuitive and powerful provenance ontology-driven data access and query platform that uses the MIAPE and MIMIx metadata guidelines to create an integrated view over large-scale systems molecular biology datasets. SemPoD leverages the SysPro ontology to create an intuitive dashboard for biologists to compose queries, explore the results, and use a query manager for storing queries for later use. SemPoD can be deployed over many existing database applications storing 'omics data, including, as illustrated here, the LabKey data-management system. The initial user feedback evaluating the usability and functionality of SemPoD has been very positive, and it is being considered for wider deployment beyond the proteomics domain and in other 'omics centers.
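The "smart filtering" behavior, where each user selection prunes the options offered for the next facet of the query composer, can be sketched over a toy experiment catalog. The facet names and records are invented:

```python
def remaining_options(experiments, selections, facet):
    """After each selection, offer for `facet` only the values that still
    yield results: a sketch of SemPoD-style smart filtering."""
    survivors = [e for e in experiments
                 if all(e.get(f) == v for f, v in selections.items())]
    return sorted({e[facet] for e in survivors if facet in e})

# Invented experiment metadata records.
experiments = [
    {"project": "renal", "instrument": "LTQ", "organism": "human"},
    {"project": "renal", "instrument": "Orbitrap", "organism": "mouse"},
    {"project": "cardiac", "instrument": "LTQ", "organism": "human"},
]
print(remaining_options(experiments, {"project": "renal"}, "instrument"))
```

Each narrowing selection both updates the interface and guarantees the composed query cannot return an empty result set, which is the usability point of the approach.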
Towards linked open gene mutations data
2012-01-01
Background: With the advent of high-throughput technologies, a great wealth of variation data is being produced. Such information may constitute the basis for correlation analyses between genotypes and phenotypes and, in the future, for personalized medicine. Several databases on gene variation exist, but this kind of information is still scarce in the Semantic Web framework. In this paper, we discuss issues related to the integration of mutation data in the Linked Open Data infrastructure, part of the Semantic Web framework. We present the development of a mapping from the IARC TP53 Mutation database to RDF and the implementation of servers publishing this data. Methods: A version of the IARC TP53 Mutation database implemented in a relational database was used as the first test set. Automatic mappings to RDF were first created by using D2RQ and later manually refined by introducing concepts and properties from domain vocabularies and ontologies, as well as links to Linked Open Data implementations of various systems of biomedical interest. Since D2RQ query performances are lower than those that can be achieved by using an RDF archive, generated data was also loaded into a dedicated system based on tools from the Jena software suite. Results: We have implemented a D2RQ Server for TP53 mutation data, providing data on a subset of the IARC database, including gene variations, somatic mutations, and bibliographic references. The server allows browsing the RDF graph using links both between classes and to external systems. An alternative interface offers improved performance for SPARQL queries. The resulting data can be explored by using any Semantic Web browser or application. Conclusions: This has been the first case of a mutation database exposed as Linked Data. A revised version of our prototype, including further concepts and IARC TP53 Mutation database data sets, is under development.
The publication of variation information as Linked Data opens new perspectives: the exploitation of SPARQL searches on mutation data and other biological databases may support data retrieval which is presently not possible. Moreover, reasoning on integrated variation data may support discoveries towards personalized medicine. PMID:22536974
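The D2RQ-style step of mapping relational mutation records to RDF triples can be sketched as follows. The URIs and predicate names are placeholders, not the actual IARC TP53 mapping:

```python
def rows_to_triples(rows, base="http://example.org/tp53/"):
    """Map relational mutation records to (subject, predicate, object)
    triples, in the spirit of a D2RQ mapping. Vocabulary is hypothetical."""
    triples = []
    for row in rows:
        s = f"{base}mutation/{row['id']}"
        triples.append((s, "rdf:type", base + "SomaticMutation"))
        triples.append((s, base + "codon", str(row["codon"])))
        triples.append((s, base + "effect", row["effect"]))
    return triples

rows = [{"id": 1, "codon": 175, "effect": "missense"}]
for t in rows_to_triples(rows):
    print(t)
```

D2RQ performs this translation on the fly at query time; loading the generated triples into a native RDF store, as the paper did with Jena, trades freshness for faster SPARQL evaluation.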
Towards linked open gene mutations data.
Zappa, Achille; Splendiani, Andrea; Romano, Paolo
2012-03-28
With the advent of high-throughput technologies, a great wealth of variation data is being produced. Such information may constitute the basis for correlation analyses between genotypes and phenotypes and, in the future, for personalized medicine. Several databases on gene variation exist, but this kind of information is still scarce in the Semantic Web framework. In this paper, we discuss issues related to the integration of mutation data in the Linked Open Data infrastructure, part of the Semantic Web framework. We present the development of a mapping from the IARC TP53 Mutation database to RDF and the implementation of servers publishing this data. A version of the IARC TP53 Mutation database implemented in a relational database was used as the first test set. Automatic mappings to RDF were first created by using D2RQ and later manually refined by introducing concepts and properties from domain vocabularies and ontologies, as well as links to Linked Open Data implementations of various systems of biomedical interest. Since D2RQ query performances are lower than those that can be achieved by using an RDF archive, generated data was also loaded into a dedicated system based on tools from the Jena software suite. We have implemented a D2RQ Server for TP53 mutation data, providing data on a subset of the IARC database, including gene variations, somatic mutations, and bibliographic references. The server allows browsing the RDF graph using links both between classes and to external systems. An alternative interface offers improved performance for SPARQL queries. The resulting data can be explored by using any Semantic Web browser or application. This has been the first case of a mutation database exposed as Linked Data.
A revised version of our prototype, including further concepts and IARC TP53 Mutation database data sets, is under development.
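As a hedged illustration of the kind of SPARQL search the abstract envisions, the following sketch retrieves somatic mutations together with their bibliographic references; every prefix and property name here (e.g. tp53:SomaticMutation) is an invented placeholder, not the actual vocabulary served by the IARC TP53 servers.

```sparql
# Illustrative only: the tp53: prefix and its terms are hypothetical
# placeholders; dcterms: is the standard Dublin Core namespace.
PREFIX tp53:    <http://example.org/tp53/>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?mutation ?effect ?reference
WHERE {
  ?mutation a tp53:SomaticMutation ;        # mutation records
            tp53:hasEffect ?effect ;        # e.g. functional effect
            dcterms:references ?reference . # linked bibliography
}
LIMIT 50
```

A query of this shape could be issued against the D2RQ endpoint or the Jena-backed store with any SPARQL client.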
WikiHyperGlossary (WHG): an information literacy technology for chemistry documents.
Bauer, Michael A; Berleant, Daniel; Cornell, Andrew P; Belford, Robert E
2015-01-01
The WikiHyperGlossary is an information literacy technology that was created to enhance reading comprehension of documents by connecting them to socially generated multimedia definitions as well as semantically relevant data. The WikiHyperGlossary enhances reading comprehension by using the lexicon of a discipline to generate dynamic links in a document to external resources that can provide implicit information the document did not explicitly provide. Currently, the most common method to acquire additional information when reading a document is to access a search engine and browse the web. This may lead to skimming of multiple documents, with the novice never actually returning to the original document of interest. The WikiHyperGlossary automatically brings information to users within the document they are currently reading, enhancing the potential for deeper document understanding. The WikiHyperGlossary allows users to submit a web URL or text to be processed against a chosen lexicon, returning the document with tagged terms. The selection of a tagged term results in the appearance of the WikiHyperGlossary Portlet containing a definition and, depending on the type of word, tabs to additional information and resources. Current types of content include multimedia-enhanced definitions, ChemSpider query results, 3D molecular structures, and 2D editable structures connected to ChemSpider queries. Existing glossaries can be bulk uploaded, locked for editing, and associated with multiple socially generated definitions. The WikiHyperGlossary leverages both social and semantic web technologies to bring relevant information to a document. This not only aids reading comprehension, but also increases users' ability to obtain additional information within the document.
We have demonstrated a molecular editor-enabled knowledge framework that can result in a semantic web inductive reasoning process, and the integration of the WikiHyperGlossary into other software technologies, such as the Jikitou Biomedical Question and Answer system. Although this work was developed in the chemical sciences and took advantage of open science resources and initiatives, the technology is extensible to other knowledge domains. Through the DeepLit (Deeper Literacy: Connecting Documents to Data and Discourse) startup, we seek to extend WikiHyperGlossary technologies to other knowledge domains and integrate them into other knowledge acquisition workflows.
Kawazoe, Yoshimasa; Imai, Takeshi; Ohe, Kazuhiko
2016-04-05
Health level seven version 2.5 (HL7 v2.5) is a widespread messaging standard for information exchange between clinical information systems. By applying Semantic Web technologies for handling HL7 v2.5 messages, it is possible to integrate large-scale clinical data with life science knowledge resources. We show the feasibility of a querying method over large-scale resource description framework (RDF)-ized HL7 v2.5 messages using publicly available drug databases. We developed a method to convert HL7 v2.5 messages into RDF. We also converted five kinds of drug databases into RDF and provided explicit links between the corresponding items among them. With those linked drug data, we then developed a method for query expansion to search the clinical data using semantic information on drug classes along with four types of temporal patterns. For evaluation purposes, medication orders and laboratory test results for a 3-year period at the University of Tokyo Hospital were used, and the query execution times were measured. Approximately 650 million RDF triples for medication orders and 790 million RDF triples for laboratory test results were converted. Taking three types of queries in use cases for detecting adverse drug events as an example, we confirmed that these queries could be represented in SPARQL Protocol and RDF Query Language (SPARQL) using our methods, and a comparison with conventional query expressions was performed. The measurement results confirm that the query time is feasible and increases logarithmically or linearly with the amount of data, without diverging. The proposed methods enabled query expressions that separate knowledge resources from clinical data, suggesting the feasibility of improving the usability of clinical data by enhancing the knowledge resources. We also demonstrate that when HL7 v2.5 messages are automatically converted into RDF, searches are still possible through SPARQL without modifying the structure.
As such, the proposed method benefits not only our hospital, but also the numerous hospitals that handle HL7 v2.5 messages. Our approach highlights the potential of large-scale data federation techniques for retrieving clinical information, which could be applied in clinical intelligence applications to improve clinical practice, such as adverse drug event monitoring and cohort selection for clinical studies, as well as for discovering new knowledge from clinical information.
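The drug-class query expansion described above can be pictured as a SPARQL 1.1 property-path query: rather than enumerating drugs, the query walks the class hierarchy in the linked drug data. All IRIs and properties below are assumptions for illustration, not the authors' actual schema.

```sparql
# Sketch of query expansion via a drug-class hierarchy (hypothetical terms).
PREFIX ex: <http://example.org/clinical/>

SELECT ?patient ?order ?drug
WHERE {
  # any drug that is (transitively) a member of the target class
  ?drug  ex:memberOfClass/ex:subClassOf* ex:Anticoagulant .
  ?order ex:prescribes ?drug ;
         ex:forPatient ?patient .
}
```

The property path ex:subClassOf* is what keeps the knowledge resource (the class hierarchy) separate from the clinical data, as the abstract suggests.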
FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation.
Bolleman, Jerven T; Mungall, Christopher J; Strozzi, Francesco; Baran, Joachim; Dumontier, Michel; Bonnal, Raoul J P; Buels, Robert; Hoehndorf, Robert; Fujisawa, Takatomo; Katayama, Toshiaki; Cock, Peter J A
2016-06-13
Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. When using Semantic Web technologies to query biological annotations, however, there was no standard that described this potentially complex location information as subject-predicate-object triples. We have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned "omics" areas. Using the same data format to represent sequence positions independently of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Our ontology allows users to uniformly describe - and potentially merge - sequence annotations from multiple sources. Data sources using FALDO can prospectively be retrieved using federated SPARQL queries against public SPARQL endpoints and/or local private triple stores.
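A minimal Turtle sketch of a FALDO location may make the model concrete; the faldo: terms below are from the published ontology, while the feature and reference-sequence IRIs are invented for illustration.

```turtle
@prefix faldo: <http://biohackathon.org/resource/faldo#> .
@prefix ex:    <http://example.org/genome/> .   # hypothetical namespace

# A hypothetical gene feature spanning positions 1000-2000 on the
# forward strand of an invented reference sequence ex:chr1.
ex:gene42 faldo:location [
    a faldo:Region ;
    faldo:begin [
        a faldo:ExactPosition, faldo:ForwardStrandPosition ;
        faldo:position 1000 ;
        faldo:reference ex:chr1
    ] ;
    faldo:end [
        a faldo:ExactPosition, faldo:ForwardStrandPosition ;
        faldo:position 2000 ;
        faldo:reference ex:chr1
    ]
] .
```

Because begin and end are typed positions with their own reference and strand, the same pattern extends to circular sequences and mixed coordinate systems.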
Tilahun, Binyam; Kauppinen, Tomi; Keßler, Carsten; Fritz, Fleur
2014-01-01
Background: Healthcare organizations around the world are challenged by pressures to reduce cost, improve coordination and outcomes, and provide more with less. This requires effective planning and evidence-based practice by generating important information from available data. Thus, flexible and user-friendly ways to represent, query, and visualize health data become increasingly important. International organizations such as the World Health Organization (WHO) regularly publish vital data on priority health topics that can be utilized for public health policy and health service development. However, the data in most portals is displayed in either Excel or PDF formats, which makes information discovery and reuse difficult. Linked Open Data (LOD), a set of Semantic Web best-practice standards for publishing and linking heterogeneous data, can be applied to the representation and management of public-level health data to alleviate such challenges. However, the technologies behind building LOD systems and their effectiveness for health data are yet to be assessed. Objective: The objective of this study is to evaluate whether Linked Data technologies are potential options for the development of health information representation, visualization, and retrieval systems, and to identify the available tools and methodologies to build Linked Data-based health information systems. Methods: We used the Resource Description Framework (RDF) for data representation, the Fuseki triple store for data storage, and Sgvizler for information visualization. Additionally, we integrated a SPARQL query interface for interacting with the data. We primarily used the WHO health observatory dataset to test the system. All the data were represented using RDF and interlinked with other related datasets on the Web of Data using Silk, a link discovery framework for the Web of Data. A preliminary usability assessment was conducted following the System Usability Scale (SUS) method.
Results: We developed an LOD-based health information representation, querying, and visualization system by using Linked Data tools. We imported more than 20,000 HIV-related data elements on mortality, prevalence, incidence, and related variables, which are freely available from the WHO global health observatory database. Additionally, we automatically linked 5312 data elements from DBpedia, Bio2RDF, and LinkedCT using the Silk framework. The system users can retrieve and visualize health information according to their interests. For users who are not familiar with SPARQL queries, we integrated a Linked Data search engine interface to search and browse the data. We used the system to represent and store the data, facilitating flexible queries and different kinds of visualizations. The preliminary user evaluation score by public health data managers and users was 82 on the SUS usability measurement scale. The need to write queries in the interface was the main difficulty of LOD-based systems reported by end users. Conclusions: The system introduced in this article shows that current LOD technologies are a promising alternative for representing heterogeneous health data in a flexible and reusable manner so that they can serve intelligent queries and ultimately support decision-making. However, the development of advanced text-based search engines is necessary to increase its usability, especially for nontechnical users. Further research with large datasets is recommended to unfold the potential of Linked Data and the Semantic Web for future health information systems development. PMID:25601195
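The "intelligent queries" the study describes might look like the following sketch, selecting HIV mortality figures per country and year; the vocabulary is a hypothetical placeholder, not the actual RDF schema used for the WHO data.

```sparql
# Illustrative only: the ex: terms are invented placeholders.
PREFIX ex: <http://example.org/gho/>

SELECT ?country ?year ?deaths
WHERE {
  ?obs a ex:HIVMortalityObservation ;
       ex:country ?country ;
       ex:year    ?year ;
       ex:deaths  ?deaths .
}
ORDER BY ?country ?year
```

A search-engine front end, as the authors note, would generate queries of this shape on behalf of users who do not write SPARQL themselves.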
Towards Semantic e-Science for Traditional Chinese Medicine
Chen, Huajun; Mao, Yuxin; Zheng, Xiaoqing; Cui, Meng; Feng, Yi; Deng, Shuiguang; Yin, Aining; Zhou, Chunying; Tang, Jinming; Jiang, Xiaohong; Wu, Zhaohui
2007-01-01
Background: Recent advances in Web and information technologies, together with the increasing decentralization of organizational structures, have resulted in massive amounts of information resources and domain-specific services in Traditional Chinese Medicine (TCM). The massive volume and diversity of information and services available have made it difficult to achieve seamless and interoperable e-Science for knowledge-intensive disciplines like TCM. Therefore, information integration and service coordination are two major challenges in e-Science for TCM. We still lack sophisticated approaches to integrate scientific data and services for TCM e-Science. Results: We present a comprehensive approach to building dynamic and extendable e-Science applications for knowledge-intensive disciplines like TCM based on semantic and knowledge-based techniques. The semantic e-Science infrastructure for TCM supports large-scale database integration and service coordination in a virtual organization. We use domain ontologies to integrate TCM database resources and services in a semantic cyberspace and deliver a semantically superior experience, including browsing, searching, querying, and knowledge discovery, to users. We have developed a collection of semantic-based toolkits to facilitate information sharing and collaborative research among TCM scientists and researchers. Conclusion: Semantic and knowledge-based techniques are well suited to knowledge-intensive disciplines like TCM, and it is possible to build an on-demand e-Science system for TCM based on existing semantic and knowledge-based techniques. The approach presented in this paper integrates heterogeneous distributed TCM databases and services, and provides scientists with a semantically superior experience to support collaborative research in the TCM discipline. PMID:17493289
Huang, Chung-Chi; Lu, Zhiyong
2016-01-01
Identifying relevant papers from the literature is a common task in biocuration. Most current biomedical literature search systems primarily rely on matching user keywords. Semantic search, on the other hand, seeks to improve search accuracy by understanding the entities and contextual relations in user keywords. However, past research has mostly focused on semantically identifying biological entities (e.g. chemicals, diseases and genes) with little effort on discovering semantic relations. In this work, we aim to discover biomedical semantic relations in PubMed queries in an automated and unsupervised fashion. Specifically, we focus on extracting and understanding the contextual information (or context patterns) that is used by PubMed users to represent semantic relations between entities such as ‘CHEMICAL-1 compared to CHEMICAL-2.’ With the advances in automatic named entity recognition, we first tag entities in PubMed queries and then use tagged entities as knowledge to recognize pattern semantics. More specifically, we transform PubMed queries into context patterns involving participating entities, which are subsequently projected to latent topics via latent semantic analysis (LSA) to avoid the data sparseness and specificity issues. Finally, we mine semantically similar contextual patterns or semantic relations based on LSA topic distributions. Our two separate evaluation experiments of chemical-chemical (CC) and chemical–disease (CD) relations show that the proposed approach significantly outperforms a baseline method, which simply measures pattern semantics by similarity in participating entities. The highest performance achieved by our approach is nearly 0.9 and 0.85 respectively for the CC and CD task when compared against the ground truth in terms of normalized discounted cumulative gain (nDCG), a standard measure of ranking quality. 
These results suggest that our approach can effectively identify and return related semantic patterns in a ranked order covering diverse bio-entity relations. To assess the potential utility of our automated top-ranked patterns of a given relation in semantic search, we performed a pilot study on frequently sought semantic relations in PubMed and observed improved literature retrieval effectiveness based on post-hoc human relevance evaluation. Further investigation in larger tests and in real-world scenarios is warranted. PMID:27016698
2007-08-01
In this domain, queries typically show a deeply nested structure, which makes the semantic parsing task rather challenging, e.g.: What states border... only 80% of the GEOQUERY queries are semantically tractable, which shows that GEOQUERY is indeed a more challenging domain than ATIS. Note that none... a particularly challenging task, because of the inherent ambiguity of natural languages on both sides. It has inspired a large body of research.
ReVeaLD: a user-driven domain-specific interactive search platform for biomedical research.
Kamdar, Maulik R; Zeginis, Dimitris; Hasnain, Ali; Decker, Stefan; Deus, Helena F
2014-02-01
Bioinformatics research relies heavily on the ability to discover and correlate data from various sources. The specialization of life sciences over the past decade, coupled with an increasing number of biomedical datasets available through standardized interfaces, has created opportunities towards new methods in biomedical discovery. Despite the popularity of semantic web technologies in tackling the integrative bioinformatics challenge, there are many obstacles towards their usage by non-technical research audiences. In particular, fully exploiting integrated information requires improved interactive methods that are intuitive to biomedical experts. In this report we present ReVeaLD (a Real-time Visual Explorer and Aggregator of Linked Data), a user-centered visual analytics platform devised to increase intuitive interaction with data from distributed sources. ReVeaLD facilitates query formulation using a domain-specific language (DSL) identified by biomedical experts and mapped to a self-updated catalogue of elements from external sources. ReVeaLD was implemented in a cancer research setting; queries included retrieving data from in silico experiments, protein modeling, and gene expression. ReVeaLD was developed using Scalable Vector Graphics and JavaScript, and a demo with an explanatory video is available at http://www.srvgal78.deri.ie:8080/explorer. A set of user-defined graphic rules controls the display of information through media-rich user interfaces. Evaluation of ReVeaLD was carried out as a game: biomedical researchers were asked to assemble a set of 5 challenge questions, and time and interactions with the platform were recorded. Preliminary results indicate that complex queries could be formulated in less than two minutes by unskilled researchers. The results also indicate that supporting the identification of the elements of a DSL significantly increased the intuitiveness of the platform and the usability of semantic web technologies by domain users.
Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.
Tiede, Dirk; Baraldi, Andrea; Sudmanns, Martin; Belgiu, Mariana; Lang, Stefan
2017-01-01
Spatiotemporal analytics of multi-source Earth observation (EO) big data is a pre-condition for semantic content-based image retrieval (SCBIR). As a proof of concept, an innovative EO semantic querying (EO-SQ) subsystem was designed and prototypically implemented in series with an EO image understanding (EO-IU) subsystem. The EO-IU subsystem is automatically generating ESA Level 2 products (scene classification map, up to basic land cover units) from optical satellite data. The EO-SQ subsystem comprises a graphical user interface (GUI) and an array database embedded in a client server model. In the array database, all EO images are stored as a space-time data cube together with their Level 2 products generated by the EO-IU subsystem. The GUI allows users to (a) develop a conceptual world model based on a graphically supported query pipeline as a combination of spatial and temporal operators and/or standard algorithms and (b) create, save and share within the client-server architecture complex semantic queries/decision rules, suitable for SCBIR and/or spatiotemporal EO image analytics, consistent with the conceptual world model. PMID:29098143
Improving integrative searching of systems chemical biology data using semantic annotation.
Chen, Bin; Ding, Ying; Wild, David J
2012-03-08
Systems chemical biology and chemogenomics are considered critical, integrative disciplines in modern biomedical research, but require data mining of large, integrated, heterogeneous datasets from chemistry and biology. We previously developed an RDF-based resource called Chem2Bio2RDF that enabled querying of such data using the SPARQL query language. Whilst this work has proved useful in its own right as one of the first major resources in these disciplines, its utility could be greatly improved by the application of an ontology for annotation of the nodes and edges in the RDF graph, enabling a much richer range of semantic queries to be issued. We developed a generalized chemogenomics and systems chemical biology OWL ontology called Chem2Bio2OWL that describes the semantics of chemical compounds, drugs, protein targets, pathways, genes, diseases and side-effects, and the relationships between them. The ontology also includes data provenance. We used it to annotate our Chem2Bio2RDF dataset, making it a rich semantic resource. Through a series of scientific case studies we demonstrate how this (i) simplifies the process of building SPARQL queries, (ii) enables useful new kinds of queries on the data and (iii) makes possible intelligent reasoning and semantic graph mining in chemogenomics and systems chemical biology. Chem2Bio2OWL is available at http://chem2bio2rdf.org/owl. The document is available at http://chem2bio2owl.wikispaces.com.
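In the spirit of the case studies mentioned, an ontology-annotated graph supports concise path queries such as the sketch below, which looks for compounds binding targets implicated in a disease; the class and property names are invented placeholders, not actual Chem2Bio2OWL terms.

```sparql
# Illustrative chemogenomics path query (hypothetical vocabulary).
PREFIX c2b: <http://example.org/chem2bio2owl/>

SELECT ?compound ?target
WHERE {
  ?compound a c2b:Compound ;
            c2b:binds ?target .
  ?target   c2b:implicatedIn c2b:Alzheimers_disease .
}
```

Annotating nodes and edges with ontology classes is what lets a single triple pattern like the above stand in for joins across the chemistry and biology portions of the dataset.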
Time series patterns and language support in DBMS
NASA Astrophysics Data System (ADS)
Telnarova, Zdenka
2017-07-01
This contribution focuses on the pattern type Time Series as a semantically rich representation of data. Some examples of implementing this pattern type in traditional database management systems are briefly presented. There are many approaches to manipulating and querying patterns. A crucial issue is the need for a systematic approach to pattern management and a specific pattern query language that takes the semantics of patterns into consideration. The query language SQL-TS for manipulating patterns is demonstrated on Time Series data.
A topic clustering approach to finding similar questions from large question and answer archives.
Zhang, Wei-Nan; Liu, Ting; Yang, Yang; Cao, Liujuan; Zhang, Yu; Ji, Rongrong
2014-01-01
With the blooming of Web 2.0, Community Question Answering (CQA) services such as Yahoo! Answers (http://answers.yahoo.com), WikiAnswer (http://wiki.answers.com), and Baidu Zhidao (http://zhidao.baidu.com) have emerged as alternatives for knowledge and information acquisition. Over time, a large number of high-quality question and answer (Q&A) pairs contributed by human intelligence have accumulated into a comprehensive knowledge base. Unlike search engines, which return long lists of results, searching in CQA services can obtain correct answers to question queries by automatically finding similar questions that have already been answered by other users. Hence, it greatly improves the efficiency of online information retrieval. However, given a question query, finding similar and well-answered questions is a non-trivial task. The main challenge is the word mismatch between the question query (query) and the candidate question for retrieval (question). To investigate this problem, in this study we capture the word semantic similarity between query and question by introducing a topic modeling approach. We then propose an unsupervised machine-learning approach to finding similar questions in CQA Q&A archives. The experimental results show that our proposed approach significantly outperforms state-of-the-art methods.
Discovering, Indexing and Interlinking Information Resources
Celli, Fabrizio; Keizer, Johannes; Jaques, Yves; Konstantopoulos, Stasinos; Vudragović, Dušan
2015-01-01
The social media revolution is having a dramatic effect on the world of scientific publication. Scientists now publish their research interests, theories and outcomes across numerous channels, including personal blogs and other thematic web spaces where ideas, activities and partial results are discussed. Accordingly, information systems that facilitate access to scientific literature must learn to cope with this valuable and varied data, evolving to make this research easily discoverable and available to end users. In this paper we describe the incremental process of discovering web resources in the domain of agricultural science and technology. Making use of Linked Open Data methodologies, we interlink a wide array of custom-crawled resources with the AGRIS bibliographic database in order to enrich the user experience of the AGRIS website. We also discuss the SemaGrow Stack, a query federation and data integration infrastructure used to estimate the semantic distance between crawled web resources and AGRIS. PMID:26834982
Semantic concept-enriched dependence model for medical information retrieval.
Choi, Sungbin; Choi, Jinwook; Yoo, Sooyoung; Kim, Heechun; Lee, Youngho
2014-02-01
In medical information retrieval research, semantic resources have been mostly used by expanding the original query terms or estimating the concept importance weight. However, implicit term-dependency information contained in semantic concept terms has been overlooked or at least underused in most previous studies. In this study, we incorporate a semantic concept-based term-dependence feature into a formal retrieval model to improve its ranking performance. Standardized medical concept terms used by medical professionals were assumed to have implicit dependency within the same concept. We hypothesized that, by elaborately revising the ranking algorithms to favor documents that preserve those implicit dependencies, the ranking performance could be improved. The implicit dependence features are harvested from the original query using MetaMap. These semantic concept-based dependence features were incorporated into a semantic concept-enriched dependence model (SCDM). We designed four different variants of the model, with each variant having distinct characteristics in the feature formulation method. We performed leave-one-out cross validations on both a clinical document corpus (TREC Medical records track) and a medical literature corpus (OHSUMED), which are representative test collections in medical information retrieval research. Our semantic concept-enriched dependence model consistently outperformed other state-of-the-art retrieval methods. Analysis shows that the performance gain has occurred independently of the concept's explicit importance in the query. By capturing implicit knowledge with regard to the query term relationships and incorporating them into a ranking model, we could build a more robust and effective retrieval model, independent of the concept importance. Copyright © 2013 Elsevier Inc. All rights reserved.
SSWAP: A Simple Semantic Web Architecture and Protocol for Semantic Web Services
USDA-ARS?s Scientific Manuscript database
SSWAP (Simple Semantic Web Architecture and Protocol) is an architecture, protocol, and platform for using reasoning to semantically integrate heterogeneous disparate data and services on the web. SSWAP is the driving technology behind the Virtual Plant Information Network, an NSF-funded semantic w...
Briache, Abdelaali; Marrakchi, Kamar; Kerzazi, Amine; Navas-Delgado, Ismael; Rossi Hassani, Badr D; Lairini, Khalid; Aldana-Montes, José F
2012-01-25
Saccharomyces cerevisiae is recognized as a model system representing a simple eukaryote whose genome can be easily manipulated. Information solicited by scientists on its biological entities (Proteins, Genes, RNAs...) is scattered within several data sources like SGD, Yeastract, CYGD-MIPS, BioGrid, PhosphoGrid, etc. Because of the heterogeneity of these sources, querying them separately and then manually combining the returned results is a complex and time-consuming task for biologists, most of whom are not bioinformatics experts. It also reduces and limits the use that can be made of the available data. To provide transparent and simultaneous access to yeast sources, we have developed YeastMed: an XML and mediator-based system. In this paper, we present our approach in developing this system, which takes advantage of SB-KOM to perform the query transformation needed and a set of Data Services to reach the integrated data sources. The system is composed of a set of modules that depend heavily on XML and Semantic Web technologies. User queries are expressed in terms of a domain ontology through a simple form-based web interface. YeastMed is the first mediation-based system specific for integrating yeast data sources. It was conceived mainly to help biologists find relevant data simultaneously from multiple data sources. It has a biologist-friendly interface that is easy to use. The system is available at http://www.khaos.uma.es/yeastmed/.
NASA Astrophysics Data System (ADS)
Lyapin, Sergey; Kukovyakin, Alexey
Within the framework of the research program "Textaurus", an operational prototype of the multifunctional library T-Libra v.4.1 has been created, which makes it possible to carry out flexible, parametrizable search within a full-text database. The information system is realized in the architecture Web-browser / Web-server / SQL-server. This makes it possible to achieve an optimal combination of universality and efficiency of text processing, on the one hand, and convenience and minimal expense for the end user (owing to the use of a standard Web-browser as the client application), on the other. The following principles underlie the information system: a) multifunctionality, b) intelligence, c) multilingual primary texts and full-text searching, d) development of the digital library (DL) by a user ("administrative client"), e) multi-platform operation. A "library of concepts", i.e. a block of functional models of semantic (concept-oriented) searching, together with a closely connected subsystem of parametrizable queries to the full-text database, serves as the conceptual basis of the multifunctionality and "intelligence" of the DL T-Libra v.4.1. An author's paragraph is the unit of full-text searching in the suggested technology. Moreover, the "logic" of an educational or scientific topic or problem can be built into a multilevel, flexible query structure and into the "library of concepts", which is replenishable by developers and experts. About 10 queries of various levels of complexity and conceptuality are realized in the current version of the information system: from simple terminological searching (taking into account the lexical and grammatical paradigms of Russian) to several kinds of explication of terminological fields and adjustable two-parameter thematic searching (the parameters being a [set of terms] and a [distance between terms] within the limits of an author's paragraph).
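The adjustable two-parameter thematic search described above (a set of terms plus a maximum distance between them, within an author's paragraph) can be sketched roughly as follows; the tokenization and distance measure are illustrative assumptions.

```python
# Minimal sketch of a two-parameter thematic search: a paragraph matches when
# every query term occurs and some occurrence of each term fits inside a span
# of at most max_distance token positions.
from itertools import product

def thematic_match(paragraph, terms, max_distance):
    tokens = paragraph.lower().split()
    positions = []
    for term in terms:
        occ = [i for i, t in enumerate(tokens) if t == term.lower()]
        if not occ:
            return False  # a required term is missing entirely
        positions.append(occ)
    # Minimal covering span over one occurrence of each term.
    best = min(max(combo) - min(combo) for combo in product(*positions))
    return best <= max_distance
```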
NASA Astrophysics Data System (ADS)
Hoebelheinrich, N. J.; Lynnes, C.; West, P.; Ferritto, M.
2014-12-01
Two problems common to many geoscience domains are the difficulties in finding tools to work with a given dataset collection, and conversely, the difficulties in finding data for a known tool. A collaborative team from the Earth Science Information Partnership (ESIP) has come together to design and create a web service, called ToolMatch, to address these problems. The team began their efforts by defining an initial, relatively simple conceptual model that addressed the two use cases briefly described above. The conceptual model is expressed as an ontology using OWL (Web Ontology Language) and DCterms (Dublin Core Terms), and utilizing standard ontologies such as DOAP (Description of a Project), FOAF (Friend of a Friend), SKOS (Simple Knowledge Organization System) and DCAT (Data Catalog Vocabulary). The ToolMatch service will be taking advantage of various Semantic Web and Web standards, such as OpenSearch, RESTful web services, SWRL (Semantic Web Rule Language) and SPARQL (Simple Protocol and RDF Query Language). The first version of the ToolMatch service was deployed in early fall 2014. While more complete testing is required, a number of communities besides ESIP member organizations have expressed interest in collaborating to create, test and use the service and incorporate it into their own web pages, tools and/or services, including the USGS Data Catalog service, DataONE, the Deep Carbon Observatory, Virtual Solar Terrestrial Observatory (VSTO), and the U.S. Global Change Research Program. In this session, presenters will discuss the inception and development of the ToolMatch service, the collaborative process used to design, refine, and test the service, and future plans for the service.
A natural language interface plug-in for cooperative query answering in biological databases.
Jamil, Hasan M
2012-06-11
One of the many unique features of biological databases is that the mere existence of a ground data item is not always a precondition for a query response. It may be argued that from a biologist's standpoint, queries are not always best posed using a structured language. By this we mean that approximate and flexible responses to natural language like queries are well suited for this domain. This is partly due to biologists' tendency to seek simpler interfaces and partly due to the fact that questions in biology involve high level concepts that are open to interpretations computed using sophisticated tools. In such highly interpretive environments, rigidly structured databases do not always perform well. In this paper, our goal is to propose a semantic correspondence plug-in to aid natural language query processing over arbitrary biological database schema, with the aim of providing cooperative responses to queries tailored to users' interpretations. Natural language interfaces for databases are generally effective when they are tuned to the underlying database schema and its semantics. Therefore, changes in database schema become impossible to support, or a substantial reorganization cost must be absorbed to reflect any change. We leverage developments in natural language parsing, rule languages and ontologies, and data integration technologies to assemble a prototype query processor that is able to transform a natural language query into a semantically equivalent structured query over the database. We allow knowledge rules and their frequent modifications as part of the underlying database schema. The approach we adopt in our plug-in overcomes some of the serious limitations of many contemporary natural language interfaces, including support for schema modifications and independence from underlying database schema.
The plug-in introduced in this paper is generic and facilitates connecting user selected natural language interfaces to arbitrary databases using a semantic description of the intended application. We demonstrate the feasibility of our approach with a practical example.
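The kind of rule-driven translation from a natural language question to a structured query described above can be illustrated with a toy rule set; the regular-expression patterns, the relational schema, and the SQL templates below are invented for the sketch and are not the paper's actual rule language.

```python
# Hypothetical sketch: each rule pairs a question pattern with a structured
# query template; matched groups fill the template slots.
import re

RULES = [
    (re.compile(r"which genes regulate (\w+)", re.I),
     "SELECT g.name FROM genes g JOIN regulates r ON g.id = r.gene_id "
     "JOIN genes t ON r.target_id = t.id WHERE t.name = '{0}'"),
    (re.compile(r"what is the function of (\w+)", re.I),
     "SELECT function FROM genes WHERE name = '{0}'"),
]

def translate(question):
    """Return a structured query for the first matching rule, else None."""
    for pattern, template in RULES:
        m = pattern.search(question)
        if m:
            return template.format(*m.groups())
    return None
```

Because the rules are data rather than code, the mapping can in principle be swapped out when the underlying schema changes, which is the flexibility the plug-in aims to provide.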
An advanced web query interface for biological databases
Latendresse, Mario; Karp, Peter D.
2010-01-01
Although most web-based biological databases (DBs) offer some type of web-based form to allow users to author DB queries, these query forms are quite restricted in the complexity of DB queries that they can formulate. They can typically query only one DB, and can query only a single type of object at a time (e.g. genes) with no possible interaction between the objects; that is, in SQL parlance, no joins are allowed between DB objects. Writing precise queries against biological DBs is usually left to a programmer skilled in complex DB query languages such as SQL. We present a web interface for building precise queries for biological DBs that can construct much more precise queries than most web-based query forms, yet that is user-friendly enough to be used by biologists. It supports queries containing multiple conditions, and connecting multiple object types without using the join concept, which is unintuitive to biologists. This interactive web interface is called the Structured Advanced Query Page (SAQP). Users interactively build up a wide range of query constructs. Interactive documentation within the SAQP describes the schema of the queried DBs. The SAQP is based on BioVelo, a query language based on list comprehension. The SAQP is part of the Pathway Tools software and is available as part of several bioinformatics web sites powered by Pathway Tools, including the BioCyc.org site that contains more than 500 Pathway/Genome DBs. PMID:20624715
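BioVelo's list-comprehension basis can be illustrated with an ordinary Python comprehension that connects two object types through a shared attribute, with no explicit join keyword; the toy gene/pathway records below are invented for the sketch and are not the actual BioCyc schema.

```python
# Toy data: two object types linked by a pathway identifier.
genes = [{"id": "g1", "name": "trpA", "pathway": "p1"},
         {"id": "g2", "name": "lacZ", "pathway": "p2"}]
pathways = [{"id": "p1", "name": "tryptophan biosynthesis"},
            {"id": "p2", "name": "lactose degradation"}]

# "Names of genes belonging to a pathway whose name mentions 'tryptophan'":
# the linkage is expressed as a condition, not as an SQL-style join.
result = [g["name"]
          for g in genes
          for p in pathways
          if g["pathway"] == p["id"] and "tryptophan" in p["name"]]
```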
Huang, Chung-Chi; Lu, Zhiyong
2016-01-01
Identifying relevant papers from the literature is a common task in biocuration. Most current biomedical literature search systems primarily rely on matching user keywords. Semantic search, on the other hand, seeks to improve search accuracy by understanding the entities and contextual relations in user keywords. However, past research has mostly focused on semantically identifying biological entities (e.g. chemicals, diseases and genes) with little effort on discovering semantic relations. In this work, we aim to discover biomedical semantic relations in PubMed queries in an automated and unsupervised fashion. Specifically, we focus on extracting and understanding the contextual information (or context patterns) that is used by PubMed users to represent semantic relations between entities, such as 'CHEMICAL-1 compared to CHEMICAL-2'. With the advances in automatic named entity recognition, we first tag entities in PubMed queries and then use tagged entities as knowledge to recognize pattern semantics. More specifically, we transform PubMed queries into context patterns involving participating entities, which are subsequently projected to latent topics via latent semantic analysis (LSA) to avoid the data sparseness and specificity issues. Finally, we mine semantically similar contextual patterns or semantic relations based on LSA topic distributions. Our two separate evaluation experiments of chemical-chemical (CC) and chemical-disease (CD) relations show that the proposed approach significantly outperforms a baseline method, which simply measures pattern semantics by similarity in participating entities. The highest performance achieved by our approach is nearly 0.9 and 0.85, respectively, for the CC and CD tasks when compared against the ground truth in terms of normalized discounted cumulative gain (nDCG), a standard measure of ranking quality.
These results suggest that our approach can effectively identify and return related semantic patterns in a ranked order covering diverse bio-entity relations. To assess the potential utility of our automated top-ranked patterns of a given relation in semantic search, we performed a pilot study on frequently sought semantic relations in PubMed and observed improved literature retrieval effectiveness based on post-hoc human relevance evaluation. Further investigation in larger tests and in real-world scenarios is warranted. Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the US.
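The first step described above, turning a tagged query into a context pattern, can be sketched as follows; the entity list is assumed to come from an upstream named entity recognizer, and the slot-naming convention is an illustrative assumption.

```python
# Hedged sketch: replace recognized entity mentions in a PubMed query with
# typed, numbered slots, leaving the contextual words as the pattern.

def to_context_pattern(query, entities):
    """entities: list of (surface_string, entity_type) pairs found in the query."""
    pattern = query.lower()
    for i, (surface, etype) in enumerate(entities, 1):
        pattern = pattern.replace(surface.lower(), f"{etype}-{i}")
    return pattern
```

Patterns produced this way ("CHEMICAL-1 compared to CHEMICAL-2", "CHEMICAL-1 induced DISEASE-1", ...) are what the paper projects to latent topics with LSA.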
An index-based algorithm for fast on-line query processing of latent semantic analysis
Zhang, Mingxi; Li, Pohan; Wang, Wei
2017-01-01
Latent Semantic Analysis (LSA) is widely used for finding the documents whose semantics are similar to a query of keywords. Although LSA yields promising results, existing LSA algorithms involve many unnecessary operations in similarity computation and candidate checking during on-line query processing, which is expensive in terms of time cost and cannot efficiently respond to query requests, especially when the dataset becomes large. In this paper, we study the efficiency problem of on-line query processing for LSA, towards efficiently searching for the documents similar to a given query. We rewrite the similarity equation of LSA in terms of an intermediate value called partial similarity, which is stored in a designed index called the partial index. To reduce the search space, we give an approximate form of the similarity equation, and then develop an efficient algorithm for building the partial index, which skips the partial similarities lower than a given threshold θ. Based on the partial index, we develop an efficient algorithm called ILSA for supporting fast on-line query processing. The given query is transformed into a pseudo document vector, and the similarities between the query and candidate documents are computed by accumulating the partial similarities obtained from the index nodes corresponding to non-zero entries in the pseudo document vector. Compared to the LSA algorithm, ILSA reduces the time cost of on-line query processing by pruning candidate documents that are not promising and skipping the operations that make little contribution to similarity scores. Extensive experiments through comparison with LSA demonstrate the efficiency and effectiveness of our proposed algorithm. PMID:28520747
An index-based algorithm for fast on-line query processing of latent semantic analysis.
Zhang, Mingxi; Li, Pohan; Wang, Wei
2017-01-01
Latent Semantic Analysis (LSA) is widely used for finding the documents whose semantics are similar to a query of keywords. Although LSA yields promising results, existing LSA algorithms involve many unnecessary operations in similarity computation and candidate checking during on-line query processing, which is expensive in terms of time cost and cannot efficiently respond to query requests, especially when the dataset becomes large. In this paper, we study the efficiency problem of on-line query processing for LSA, towards efficiently searching for the documents similar to a given query. We rewrite the similarity equation of LSA in terms of an intermediate value called partial similarity, which is stored in a designed index called the partial index. To reduce the search space, we give an approximate form of the similarity equation, and then develop an efficient algorithm for building the partial index, which skips the partial similarities lower than a given threshold θ. Based on the partial index, we develop an efficient algorithm called ILSA for supporting fast on-line query processing. The given query is transformed into a pseudo document vector, and the similarities between the query and candidate documents are computed by accumulating the partial similarities obtained from the index nodes corresponding to non-zero entries in the pseudo document vector. Compared to the LSA algorithm, ILSA reduces the time cost of on-line query processing by pruning candidate documents that are not promising and skipping the operations that make little contribution to similarity scores. Extensive experiments through comparison with LSA demonstrate the efficiency and effectiveness of our proposed algorithm.
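The partial-index idea in the abstract above can be sketched in pure Python; the data layout and function names are assumptions, and real LSA document vectors would come from a truncated SVD rather than the toy vectors used here.

```python
# Sketch of a partial index: for each latent dimension, store the documents
# whose contribution ("partial similarity") exceeds a threshold theta; at
# query time, accumulate only over the query's non-zero dimensions.

def build_partial_index(doc_vectors, theta=0.0):
    """index[dim] -> list of (doc_id, partial_value); small entries skipped."""
    index = {}
    for doc_id, vec in doc_vectors.items():
        for dim, value in enumerate(vec):
            if abs(value) > theta:
                index.setdefault(dim, []).append((doc_id, value))
    return index

def ilsa_query(index, query_vec):
    """Score documents by accumulating partials for non-zero query dims."""
    scores = {}
    for dim, qv in enumerate(query_vec):
        if qv == 0:
            continue  # skip operations that contribute nothing
        for doc_id, partial in index.get(dim, []):
            scores[doc_id] = scores.get(doc_id, 0.0) + qv * partial
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

The pruning happens twice: at build time (entries below theta never enter the index) and at query time (zero query dimensions are never scanned), which is the source of the reported speed-up over plain LSA scoring.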
Algorithms and semantic infrastructure for mutation impact extraction and grounding.
Laurila, Jonas B; Naderi, Nona; Witte, René; Riazanov, Alexandre; Kouznetsov, Alexandre; Baker, Christopher J O
2010-12-02
Mutation impact extraction is a hitherto unaccomplished task in state of the art mutation extraction systems. Protein mutations and their impacts on protein properties are hidden in scientific literature, making them poorly accessible for protein engineers and inaccessible for phenotype-prediction systems that currently depend on manually curated genomic variation databases. We present the first rule-based approach for the extraction of mutation impacts on protein properties, categorizing their directionality as positive, negative or neutral. Furthermore, protein and mutation mentions are grounded to their respective UniProtKB IDs, and selected protein properties, namely protein functions, are grounded to concepts found in the Gene Ontology. The extracted entities are populated into an OWL-DL Mutation Impact ontology, facilitating complex querying for mutation impacts using SPARQL. We illustrate retrieval of proteins and mutant sequences for a given direction of impact on specific protein properties. Moreover we provide programmatic access to the data through semantic web services using the SADI (Semantic Automated Discovery and Integration) framework. We address the problem of access to legacy mutation data in unstructured form through the creation of novel mutation impact extraction methods which are evaluated on a corpus of full-text articles on haloalkane dehalogenases, tagged by domain experts. Our approaches show state of the art levels of precision and recall for Mutation Grounding and a respectable level of precision but lower recall for the task of Mutant-Impact relation extraction. The system is deployed using text mining and semantic web technologies with the goal of publishing to a broad spectrum of consumers.
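The rule-based categorization of impact directionality (positive, negative or neutral) might be sketched, in heavily simplified form, as a cue-word lookup; the cue lists below are invented for illustration and are far cruder than the paper's actual extraction rules.

```python
# Toy direction classifier for impact sentences; real rules would also need
# negation handling, scoping, and grounding of the affected property.
POSITIVE_CUES = {"increased", "enhanced", "improved"}
NEGATIVE_CUES = {"decreased", "reduced", "abolished", "lost"}

def impact_direction(sentence):
    words = set(sentence.lower().split())
    if words & POSITIVE_CUES:
        return "positive"
    if words & NEGATIVE_CUES:
        return "negative"
    return "neutral"
```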
Towards ontology-driven navigation of the lipid bibliosphere
Baker, Christopher JO; Kanagasabai, Rajaraman; Ang, Wee Tiong; Veeramani, Anitha; Low, Hong-Sang; Wenk, Markus R
2008-01-01
Background The indexing of scientific literature and content is a relevant and contemporary requirement within life science information systems. Navigating information available in legacy formats continues to be a challenge both in enterprise and academic domains. The emergence of semantic web technologies and their fusion with artificial intelligence techniques has provided a new toolkit with which to address these data integration challenges. In the emerging field of lipidomics such navigation challenges are barriers to the translation of scientific results into actionable knowledge, critical to the treatment of diseases such as Alzheimer's syndrome, Mycobacterium infections and cancer. Results We present a literature-driven workflow involving document delivery and natural language processing steps generating tagged sentences containing lipid, protein and disease names, which are instantiated into a custom-designed lipid ontology. We describe the design challenges in capturing lipid nomenclature, the mandate of the ontology and its role as query model in the navigation of the lipid bibliosphere. We illustrate the extent of the description logic-based A-box query capability provided by the instantiated ontology using a graphical query composer to query sentences describing lipid-protein and lipid-disease correlations. Conclusion As scientists accept the need to readjust the manner in which we search for information and derive knowledge we illustrate a system that can constrain the literature explosion and knowledge navigation problems. Specifically we have focussed on solving this challenge for lipidomics researchers who have to deal with the lack of standardized vocabulary, differing classification schemes, and a wide array of synonyms before being able to derive scientific insights.
The use of the OWL-DL variant of the Web Ontology Language (OWL) and description logic reasoning is pivotal in this regard, providing the lipid scientist with advanced query access to the results of text mining algorithms instantiated into the ontology. The visual query paradigm assists in the adoption of this technology. PMID:18315858
Towards ontology-driven navigation of the lipid bibliosphere.
Baker, Christopher Jo; Kanagasabai, Rajaraman; Ang, Wee Tiong; Veeramani, Anitha; Low, Hong-Sang; Wenk, Markus R
2008-01-01
The indexing of scientific literature and content is a relevant and contemporary requirement within life science information systems. Navigating information available in legacy formats continues to be a challenge both in enterprise and academic domains. The emergence of semantic web technologies and their fusion with artificial intelligence techniques has provided a new toolkit with which to address these data integration challenges. In the emerging field of lipidomics such navigation challenges are barriers to the translation of scientific results into actionable knowledge, critical to the treatment of diseases such as Alzheimer's syndrome, Mycobacterium infections and cancer. We present a literature-driven workflow involving document delivery and natural language processing steps generating tagged sentences containing lipid, protein and disease names, which are instantiated into a custom-designed lipid ontology. We describe the design challenges in capturing lipid nomenclature, the mandate of the ontology and its role as query model in the navigation of the lipid bibliosphere. We illustrate the extent of the description logic-based A-box query capability provided by the instantiated ontology using a graphical query composer to query sentences describing lipid-protein and lipid-disease correlations. As scientists accept the need to readjust the manner in which we search for information and derive knowledge we illustrate a system that can constrain the literature explosion and knowledge navigation problems. Specifically we have focussed on solving this challenge for lipidomics researchers who have to deal with the lack of standardized vocabulary, differing classification schemes, and a wide array of synonyms before being able to derive scientific insights.
The use of the OWL-DL variant of the Web Ontology Language (OWL) and description logic reasoning is pivotal in this regard, providing the lipid scientist with advanced query access to the results of text mining algorithms instantiated into the ontology. The visual query paradigm assists in the adoption of this technology.
SPARQL-enabled identifier conversion with Identifiers.org
Wimalaratne, Sarala M.; Bolleman, Jerven; Juty, Nick; Katayama, Toshiaki; Dumontier, Michel; Redaschi, Nicole; Le Novère, Nicolas; Hermjakob, Henning; Laibe, Camille
2015-01-01
Motivation: On the semantic web, in life sciences in particular, data is often distributed via multiple resources. Each of these sources is likely to use their own International Resource Identifier for conceptually the same resource or database record. The lack of correspondence between identifiers introduces a barrier when executing federated SPARQL queries across life science data. Results: We introduce a novel SPARQL-based service to enable on-the-fly integration of life science data. This service uses the identifier patterns defined in the Identifiers.org Registry to generate a plurality of identifier variants, which can then be used to match source identifiers with target identifiers. We demonstrate the utility of this identifier integration approach by answering queries across major producers of life science Linked Data. Availability and implementation: The SPARQL-based identifier conversion service is available without restriction at http://identifiers.org/services/sparql. Contact: sarala@ebi.ac.uk PMID:25638809
SPARQL-enabled identifier conversion with Identifiers.org.
Wimalaratne, Sarala M; Bolleman, Jerven; Juty, Nick; Katayama, Toshiaki; Dumontier, Michel; Redaschi, Nicole; Le Novère, Nicolas; Hermjakob, Henning; Laibe, Camille
2015-06-01
On the semantic web, in life sciences in particular, data is often distributed via multiple resources. Each of these sources is likely to use their own International Resource Identifier for conceptually the same resource or database record. The lack of correspondence between identifiers introduces a barrier when executing federated SPARQL queries across life science data. We introduce a novel SPARQL-based service to enable on-the-fly integration of life science data. This service uses the identifier patterns defined in the Identifiers.org Registry to generate a plurality of identifier variants, which can then be used to match source identifiers with target identifiers. We demonstrate the utility of this identifier integration approach by answering queries across major producers of life science Linked Data. The SPARQL-based identifier conversion service is available without restriction at http://identifiers.org/services/sparql. © The Author 2015. Published by Oxford University Press.
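The pattern-based generation of identifier variants described above can be sketched as follows; the prefix list is a small illustrative sample, not the full set of patterns in the Identifiers.org Registry.

```python
# Sketch: expand one accession into the plurality of identifier forms that
# different Linked Data resources may use, so a federated SPARQL query can
# match any of them. The prefixes shown are illustrative examples only.
UNIPROT_PREFIXES = [
    "http://identifiers.org/uniprot/",
    "http://purl.uniprot.org/uniprot/",
    "urn:miriam:uniprot:",
]

def identifier_variants(accession, prefixes=UNIPROT_PREFIXES):
    """Return all known identifier renderings of a single database record."""
    return [prefix + accession for prefix in prefixes]
```

In the actual service this expansion happens inside a SPARQL endpoint, so the variant list can be joined on the fly against the source and target datasets of a federated query.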
An Educational Tool for Browsing the Semantic Web
ERIC Educational Resources Information Center
Yoo, Sujin; Kim, Younghwan; Park, Seongbin
2013-01-01
The Semantic Web is an extension of the current Web where information is represented in a machine processable way. It is not separate from the current Web and one of the confusions that novice users might have is where the Semantic Web is. In fact, users can easily encounter RDF documents that are components of the Semantic Web while they navigate…
Towards a Consistent and Scientifically Accurate Drug Ontology.
Hogan, William R; Hanna, Josh; Joseph, Eric; Brochhausen, Mathias
2013-01-01
Our use case for comparative effectiveness research requires an ontology of drugs that enables querying National Drug Codes (NDCs) by active ingredient, mechanism of action, physiological effect, and therapeutic class of the drug products they represent. We conducted an ontological analysis of drugs from the realist perspective, and evaluated existing drug terminology, ontology, and database artifacts from (1) the technical perspective, (2) the perspective of pharmacology and medical science, (3) the perspective of description logic semantics (if they were available in Web Ontology Language or OWL), and (4) the perspective of our realism-based analysis of the domain. No existing resource was sufficient. Therefore, we built the Drug Ontology (DrOn) in OWL, which we populated with NDCs and other classes from RxNorm using only content created by the National Library of Medicine. We also built an application that uses DrOn to query for NDCs as outlined above, available at: http://ingarden.uams.edu/ingredients. The application uses an OWL-based description logic reasoner to execute end-user queries. DrOn is available at http://code.google.com/p/dr-on.
A Services-Oriented Architecture for Water Observations Data
NASA Astrophysics Data System (ADS)
Maidment, D. R.; Zaslavsky, I.; Valentine, D.; Tarboton, D. G.; Whitenack, T.; Whiteaker, T.; Hooper, R.; Kirschtel, D.
2009-04-01
Water observations data are time series of measurements made at point locations of water level, flow, and quality and corresponding data for climatic observations at point locations such as gaged precipitation and weather variables. A services-oriented architecture has been built for such information for the United States that has three components: hydrologic information servers, hydrologic information clients, and a centralized metadata cataloging system. These are connected using web services for observations data and metadata defined by an XML-based language called WaterML. A Hydrologic Information Server can be built by storing observations data in a relational database schema in the CUAHSI Observations Data Model, in which case web services access to the data and metadata is automatically provided by query functions for WaterML that are wrapped around the relational database within a web server. A Hydrologic Information Server can also be constructed by custom-programming an interface to an existing water agency web site so that it responds to the same queries by producing data in WaterML as do the CUAHSI Observations Data Model based servers. A Hydrologic Information Client is one which can interpret and ingest WaterML metadata and data. We have two client applications for Excel and ArcGIS and have shown how WaterML web services can be ingested into programming environments such as Matlab and Visual Basic. HIS Central, maintained at the San Diego Supercomputer Center, is a repository of observational metadata for WaterML web services which presently indexes 342 million data values measured at 1.75 million locations. This is the largest catalog of water observational data for the United States presently in existence.
As more observation networks join what we term "CUAHSI Water Data Federation", and the system accommodates a growing number of sites, measured parameters, applications, and users, rapid and reliable access to large heterogeneous hydrologic data repositories becomes critical. The CUAHSI HIS solution to the scalability and heterogeneity challenges has several components. Structural differences across the data repositories are addressed by building a standard services foundation for the exchange of hydrologic data, as derived from a common information model for observational data measured at stationary points and its implementation as a relational schema (ODM) and an XML schema (WaterML). Semantic heterogeneity is managed by mapping water quantity, water quality, and other parameters collected by government agencies and academic projects to a common ontology. The WaterML-compliant web services are indexed in a community services registry called HIS Central (hiscentral.cuahsi.org). Once a web service is registered in HIS Central, its metadata (site and variable characteristics, period of record for each variable at each site, etc.) is harvested and appended to the central catalog. The catalog is further updated as the service publisher associates the variables in the published service with ontology concepts. After this, the newly published service becomes available for spatial and semantics-based queries from online and desktop client applications developed by the project. Hydrologic system server software is now deployed at more than a dozen locations in the United States and Australia. To provide rapid access to data summaries, in particular for several nation-wide data repositories including EPA STORET, USGS NWIS, and USDA SNOTEL, we convert the observation data catalogs and databases with harvested data values into special representations that support high-performance analysis and visualization. 
The construction of OLAP (Online Analytical Processing) cubes, often called data cubes, is an approach to organizing and querying large multi-dimensional data collections. We have applied the OLAP techniques, as implemented in Microsoft SQL Server 2005/2008, to the analysis of the catalogs from several agencies. OLAP analysis results reflect geography and history of observation data availability from USGS NWIS, EPA STORET, and USDA SNOTEL repositories, and spatial and temporal dynamics of the available measurements for several key nutrient-related parameters. Our experience developing the CUAHSI HIS cyberinfrastructure demonstrated that efficient integration of hydrologic observations from multiple government and academic sources requires a range of technical approaches focused on managing different components of data heterogeneity and system scalability. While this submission addresses technical aspects of developing a national-scale information system for hydrologic observations, the challenges of explicating shared semantics of hydrologic observations and building a community of HIS users and developers remain critical in constructing a nation-wide federation of water data services.
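The data-cube style of catalog summarization described above can be illustrated with a minimal in-memory analogue; the observation tuples are invented, and a real deployment would use an OLAP engine such as the SQL Server Analysis Services mentioned in the abstract.

```python
# Toy data cube over an observation catalog: count available values along the
# (site, variable, year) dimensions, then roll up along the site dimension.
from collections import Counter

observations = [
    ("site1", "discharge", 2007), ("site1", "discharge", 2007),
    ("site1", "nitrate", 2008), ("site2", "discharge", 2008),
]

# Full-resolution cell counts, as a cube cell lookup:
cube = Counter(observations)

# Roll-up: aggregate away site and year to see availability per variable.
by_variable = Counter(var for _, var, _ in observations)
```

Precomputing such aggregates is what lets a portal answer "where and when is nitrate data available?" without scanning the underlying repositories on every request.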
NASA Astrophysics Data System (ADS)
McWhirter, J.; Boler, F. M.; Bock, Y.; Jamason, P.; Squibb, M. B.; Noll, C. E.; Blewitt, G.; Kreemer, C. W.
2010-12-01
Three geodesy archive centers, the Scripps Orbit and Permanent Array Center (SOPAC), NASA's Crustal Dynamics Data Information System (CDDIS), and UNAVCO, are engaged in a joint effort to define and develop a common Web Service Application Programming Interface (API) for accessing geodetic data holdings. This effort is funded by the NASA ROSES ACCESS Program to modernize the original GPS Seamless Archive Centers (GSAC) technology, which was developed in the 1990s. A new web service interface, the GSAC-WS, is being developed to provide uniform and expanded mechanisms through which users can access our data repositories. In total, our respective archives hold tens of millions of files and contain a rich collection of site/station metadata. Though we serve similar user communities, we currently provide a range of different access methods, query services, and metadata formats. This leads to a lack of consistency in the user's experience and a duplication of engineering efforts. The GSAC-WS API and its reference implementation in an underlying Java-based GSAC Service Layer (GSL) support metadata and data queries into site/station-oriented data archives. The general nature of this API makes it applicable to a broad range of data systems. The overall goals of this project include providing consistent and rich query interfaces for end users and client programs, developing enabling technology that helps third-party repositories add these web service capabilities, and enabling data queries across a collection of federated GSAC-WS-enabled repositories. A fundamental challenge faced in this project is to provide a common suite of query services across a heterogeneous collection of data while enabling each repository to expose its specific metadata holdings. To address this challenge we are developing a "capabilities"-based service in which a repository can describe its specific query and metadata capabilities.
Furthermore, the architecture of the GSL is based on a model-view paradigm that decouples the underlying data model semantics from particular representations of the data model. This allows GSAC-WS-enabled repositories to evolve their service offerings to incorporate new metadata definition formats (e.g., ISO-19115, FGDC, JSON) and new techniques for accessing their holdings. Building on the core GSAC-WS implementations, the project is also developing a federated/distributed query service. This service will integrate seamlessly with the GSAC Service Layer and will support data and metadata queries across a collection of federated GSAC repositories.
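A capabilities-based federation of the kind described can be sketched as follows. The repository names echo the archives above, but the capability sets, constraint names, and site records are hypothetical:

```python
# Each repository advertises which query constraints it supports; the
# federator forwards only the constraints a repository understands and
# merges the per-repository results.

REPOSITORIES = {
    "SOPAC": {"capabilities": {"site", "bbox"},
              "sites": [{"site": "P123", "bbox": "west"}]},
    "CDDIS": {"capabilities": {"site"},
              "sites": [{"site": "GODE", "bbox": "east"}]},
}

def query_repository(repo, query):
    # Drop any constraint the repository has not declared in its capabilities.
    supported = {k: v for k, v in query.items() if k in repo["capabilities"]}
    return [s for s in repo["sites"]
            if all(s.get(k) == v for k, v in supported.items())]

def federated_query(query):
    """Fan a query out to every repository and collect results per archive."""
    return {name: query_repository(repo, query)
            for name, repo in REPOSITORIES.items()}

hits = federated_query({"site": "P123", "bbox": "west"})
```

The design choice here mirrors the text: the common API stays uniform while each archive opts into only the query features it can actually serve.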
SAFOD Brittle Microstructure and Mechanics Knowledge Base (BM2KB)
NASA Astrophysics Data System (ADS)
Babaie, Hassan A.; Broda, Cindi M.; Hadizadeh, Jafar; Kumar, Anuj
2013-07-01
Scientific drilling near Parkfield, California has established the San Andreas Fault Observatory at Depth (SAFOD), which provides the solid earth community with short range geophysical and fault zone material data. The BM2KB ontology was developed in order to formalize the knowledge about brittle microstructures in the fault rocks sampled from the SAFOD cores. A knowledge base, instantiated from this domain ontology, stores and presents the observed microstructural and analytical data with respect to implications for brittle deformation and mechanics of faulting. These data can be searched on the knowledge base's Web interface by selecting a set of terms (classes, properties) from different drop-down lists that are dynamically populated from the ontology. In addition to this general search, a query can also be conducted to view data contributed by a specific investigator. A search by sample is done using the EarthScope SAFOD Core Viewer, which allows a user to locate samples on high resolution images of core sections belonging to different runs and holes. The class hierarchy of the BM2KB ontology was initially designed using the Unified Modeling Language (UML), which served as a visual guide for developing the ontology in OWL with the Protégé ontology editor. Various Semantic Web technologies, such as the RDF, RDFS, and OWL ontology languages, the SPARQL query language, and the Pellet reasoning engine, were used to develop the ontology. An interactive Web application interface was developed with Jena, a Java-based framework, using AJAX technology, JSP pages, and Java servlets, and deployed via an Apache Tomcat server. The interface allows the registered user to submit data related to their research on a sample of the SAFOD core. The submitted data, after initial review by the knowledge base administrator, are added to the extensible knowledge base and become available in subsequent queries to all types of users.
The interface facilitates inference capabilities in the ontology, supports SPARQL queries, allows for modifications based on successive discoveries, and provides an accessible knowledge base on the Web.
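The kind of SPARQL query such a knowledge base supports ultimately reduces to triple-pattern matching, which a toy matcher can illustrate. The sample subjects and properties below are invented, not actual BM2KB terms:

```python
# A minimal RDF-style triple store and a single-pattern matcher, sketching
# the basic graph pattern matching that a SPARQL engine performs.

TRIPLES = [
    ("sample42", "hasMicrostructure", "cataclasite"),
    ("sample42", "collectedBy", "investigatorA"),
    ("sample7",  "hasMicrostructure", "gouge"),
]

def match(pattern, triples):
    """Match a (subject, predicate, object) pattern against the store.
    Strings starting with '?' are variables; return one binding per hit."""
    out = []
    for triple in triples:
        binding = {}
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                binding[term] = value
            elif term != value:
                ok = False
                break
        if ok:
            out.append(binding)
    return out

# "Which samples show a cataclasite microstructure?"
rows = match(("?s", "hasMicrostructure", "cataclasite"), TRIPLES)
```

A real SPARQL engine joins many such patterns and, with a reasoner like Pellet, also matches triples that are only entailed rather than stated.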
Conceptual mapping of user's queries to medical subject headings.
Zieman, Y. L.; Bleich, H. L.
1997-01-01
This paper describes a way to map users' queries to relevant Medical Subject Headings (MeSH terms) used by the National Library of Medicine to index the biomedical literature. The method, called SENSE (SEarch with New SEmantics), transforms words and phrases in the users' queries into primary conceptual components and compares these components with those of the MeSH vocabulary. Similar to the way in which most numbers can be split into numerical factors and expressed as their product--for example, 42 can be expressed as 2*21, 6*7, 3*14, or 2*3*7--most medical concepts can be split into "semantic factors" and expressed as their juxtaposition. Note that if we split 42 into its primary factors, the breakdown is unique: 2*3*7. Similarly, when we split medical concepts into their "primary semantic factors" the breakdown is also unique. For example, the MeSH term 'renovascular hypertension' can be split morphologically into reno, vascular, hyper, and tension--morphemes that can then be translated into their primary semantic factors--kidney, blood vessel, high, and pressure. By "factoring" each MeSH term in this way, and by similarly factoring the user's query, we can match query to MeSH term by searching for combinations of common factors. Unlike UMLS and other methods that match at the level of words or phrases, SENSE matches at the level of concepts; in this way, a wide variety of words and phrases that have the same meaning produce the same match. Now used in PaperChase, the method is surprisingly powerful in matching users' queries to Medical Subject Headings. PMID:9357680
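The factoring-and-matching scheme can be sketched directly. The morpheme-to-factor table below is a tiny illustrative stand-in for the paper's semantic lexicon:

```python
# Toy SENSE-style matcher: split both MeSH terms and query phrases into
# primary "semantic factors" and match on equal factor sets.

FACTORS = {
    "reno": "kidney", "renal": "kidney", "kidney": "kidney",
    "vascular": "blood vessel", "hyper": "high", "high": "high",
    "tension": "pressure", "pressure": "pressure",
}

def factor(phrase):
    """Map each known morpheme/word to its primary semantic factor;
    unknown words (stopwords etc.) are simply ignored in this sketch."""
    return frozenset(FACTORS[w] for w in phrase.split() if w in FACTORS)

# Pre-factored MeSH vocabulary (one entry for the running example).
MESH = {"renovascular hypertension": factor("reno vascular hyper tension")}

def map_query(query):
    q = factor(query)
    return [term for term, f in MESH.items() if f == q]

matches = map_query("high blood pressure in the kidney vascular system")
```

Because matching happens on factor sets rather than surface strings, the differently worded query above lands on the same MeSH term as the morphemes of 'renovascular hypertension' itself.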
Information Retrieval Using UMLS-based Structured Queries
Fagan, Lawrence M.; Berrios, Daniel C.; Chan, Albert; Cucina, Russell; Datta, Anupam; Shah, Maulik; Surendran, Sujith
2001-01-01
During the last three years, we have developed and described components of ELBook, a semantically based information-retrieval system [1-4]. Using these components, domain experts can specify a query model, indexers can use the query model to index documents, and end-users can search these documents for instances of indexed queries.
Ontology based heterogeneous materials database integration and semantic query
NASA Astrophysics Data System (ADS)
Zhao, Shuai; Qian, Quan
2017-10-01
Materials digital data, high-throughput experiments, and high-throughput computations are regarded as three key pillars of materials genome initiatives. With the fast growth of materials data, the integration and sharing of data have become urgent and a central topic of materials informatics. Because conventional heterogeneous database integration approaches, such as federated databases or data warehouses, lack semantic descriptions, they make deep integration at the semantic level difficult. In this paper, a semantic integration method is proposed that creates a semantic ontology by extracting the database schema semi-automatically. Other heterogeneous databases are integrated into the ontology by means of relational algebra and the rooted graph. Based on the integrated ontology, semantic queries can be posed using SPARQL. In the experiments, two well-known first-principles computation databases, OQMD and the Materials Project, are used as the integration targets, demonstrating the feasibility and effectiveness of our method.
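The first step, lifting relational content into ontology-level triples so heterogeneous databases can be queried uniformly, can be sketched as follows. The table and column names are hypothetical stand-ins for the OQMD/Materials Project schemas:

```python
# Sketch: convert relational rows into subject-property-value triples,
# so a SPARQL-style engine can query them alongside other sources.

def rows_to_triples(table, pk, rows):
    """Each row becomes a subject URI fragment; each non-key column
    becomes one property triple on that subject."""
    triples = []
    for row in rows:
        subject = f"{table}/{row[pk]}"
        for col, val in row.items():
            if col != pk:
                triples.append((subject, f"{table}#{col}", val))
    return triples

rows = [{"id": 1, "formula": "Fe2O3", "band_gap": 2.0}]
triples = rows_to_triples("material", "id", rows)
```

Once both databases are expressed as triples against one ontology, a single SPARQL query can span them, which is the point of the integration described above.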
Pervez, Zeeshan; Ahmad, Mahmood; Khattak, Asad Masood; Lee, Sungyoung; Chung, Tae Choong
2016-01-01
Privacy-aware search of outsourced data ensures relevant data access in the untrusted domain of a public cloud service provider. A subscriber of a public cloud storage service can determine the presence or absence of a particular keyword by submitting a search query in the form of a trapdoor. However, these trapdoor-based search queries are limited in functionality and cannot be used to identify secure outsourced data that contains semantically equivalent information. In addition, trapdoor-based methodologies are confined to pre-defined trapdoors and prevent subscribers from searching outsourced data with arbitrarily defined search criteria. To solve the problem of relevant data access, we have proposed an index-based privacy-aware search methodology that ensures semantic retrieval of data from an untrusted domain. This method ensures oblivious execution of a search query and enables authorized subscribers to model conjunctive search queries without relying on predefined trapdoors. A security analysis of our proposed methodology shows that, in a collusion attack, unauthorized subscribers and untrusted cloud service providers cannot deduce any information that could lead to a loss of data privacy. A computational time analysis on commodity hardware demonstrates that our proposed methodology requires moderate computational resources to model a privacy-aware search query and for its oblivious evaluation on a cloud service provider. PMID:27571421
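One simple way to realize a searchable index that hides keywords from the provider, a sketch in the spirit of (but not identical to) the paper's construction, is to index documents under keyed hashes and evaluate conjunctive queries by intersecting posting sets:

```python
import hmac
import hashlib

# The provider stores only opaque tokens; without KEY it cannot learn
# which keywords the index or the queries refer to.

KEY = b"subscriber-secret"  # invented key, held by the subscriber only

def token(keyword):
    return hmac.new(KEY, keyword.encode(), hashlib.sha256).hexdigest()

def build_index(docs):
    """Map each keyword token to the set of document ids containing it."""
    index = {}
    for doc_id, words in docs.items():
        for w in words:
            index.setdefault(token(w), set()).add(doc_id)
    return index

def conjunctive_search(index, keywords):
    """AND-query: documents whose posting sets contain every keyword token."""
    sets = [index.get(token(w), set()) for w in keywords]
    return set.intersection(*sets) if sets else set()

index = build_index({"d1": {"cloud", "privacy"}, "d2": {"cloud"}})
hits = conjunctive_search(index, ["cloud", "privacy"])
```

Note this toy version still leaks access patterns (which postings a query touches); the oblivious-evaluation property claimed above requires additional machinery beyond this sketch.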
A web-based system architecture for ontology-based data integration in the domain of IT benchmarking
NASA Astrophysics Data System (ADS)
Pfaff, Matthias; Krcmar, Helmut
2018-03-01
In the domain of IT benchmarking (ITBM), a variety of data and information are collected. Although these data serve as the basis for business analyses, no unified semantic representation of such data yet exists. Consequently, data analysis across different distributed data sets and different benchmarks is almost impossible. This paper presents a system architecture and prototypical implementation for integrated data management of distributed databases based on a domain-specific ontology. To preserve the semantic meaning of the data, the ITBM ontology is linked to data sources and functions as the central concept for database access. Thus, additional databases can be integrated by linking them to this domain-specific ontology and are directly available for further business analyses. Moreover, the web-based system supports the process of mapping ontology concepts to external databases by introducing a semi-automatic mapping recommender and by visualizing possible mapping candidates. The system also provides a natural language interface to easily query linked databases. The expected result of this ontology-based approach to knowledge representation and data access is an increase in knowledge and data sharing in this domain, which will enhance existing business analysis methods.
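A semi-automatic mapping recommender of the kind described can be approximated with plain string similarity; the ontology concept and database column names below are invented:

```python
import difflib

# Suggest, for each ontology concept, the database columns whose names are
# most similar; a human then confirms or rejects the candidates.

ONTOLOGY_CONCEPTS = ["serverCount", "storageCapacity"]
DB_COLUMNS = ["num_servers", "server_count", "storage_cap_gb"]

def recommend(concept, columns, cutoff=0.5):
    """Rank candidate columns by string similarity to the concept name,
    keeping only those above the cutoff."""
    scored = [
        (difflib.SequenceMatcher(None, concept.lower(), c.lower()).ratio(), c)
        for c in columns
    ]
    return [c for score, c in sorted(scored, reverse=True) if score >= cutoff]

suggestions = recommend("serverCount", DB_COLUMNS)
```

A production recommender would also weigh data types, value distributions, and synonyms, but the confirm-or-reject workflow around it stays the same.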
Semantic annotation of consumer health questions.
Kilicoglu, Halil; Ben Abacha, Asma; Mrabet, Yassine; Shooshan, Sonya E; Rodriguez, Laritza; Masterton, Kate; Demner-Fushman, Dina
2018-02-06
Consumers increasingly use online resources for their health information needs. While current search engines can address these needs to some extent, they generally do not take into account that most health information needs are complex and can only fully be expressed in natural language. Consumer health question answering (QA) systems aim to fill this gap. A major challenge in developing consumer health QA systems is extracting relevant semantic content from the natural language questions (question understanding). To develop effective question understanding tools, question corpora semantically annotated for relevant question elements are needed. In this paper, we present a two-part consumer health question corpus annotated with several semantic categories: named entities, question triggers/types, question frames, and question topic. The first part (CHQA-email) consists of relatively long email requests received by the U.S. National Library of Medicine (NLM) customer service, while the second part (CHQA-web) consists of shorter questions posed to MedlinePlus search engine as queries. Each question has been annotated by two annotators. The annotation methodology is largely the same between the two parts of the corpus; however, we also explain and justify the differences between them. Additionally, we provide information about corpus characteristics, inter-annotator agreement, and our attempts to measure annotation confidence in the absence of adjudication of annotations. The resulting corpus consists of 2614 questions (CHQA-email: 1740, CHQA-web: 874). Problems are the most frequent named entities, while treatment and general information questions are the most common question types. Inter-annotator agreement was generally modest: question types and topics yielded highest agreement, while the agreement for more complex frame annotations was lower. Agreement in CHQA-web was consistently higher than that in CHQA-email. 
Pairwise inter-annotator agreement proved most useful in estimating annotation confidence. To our knowledge, our corpus is the first focusing on annotation of uncurated consumer health questions. It is currently used to develop machine learning-based methods for question understanding. We make the corpus publicly available to stimulate further research on consumer health QA.
DOORS to the semantic web and grid with a PORTAL for biomedical computing.
Taswell, Carl
2008-03-01
The semantic web remains in the early stages of development. It has not yet achieved the goals envisioned by its founders as a pervasive web of distributed knowledge and intelligence. Success will be attained when a dynamic synergism can be created between people and a sufficient number of infrastructure systems and tools for the semantic web in analogy with those for the original web. The domain name system (DNS), web browsers, and the benefits of publishing web pages motivated many people to register domain names and publish web sites on the original web. An analogous resource label system, semantic search applications, and the benefits of collaborative semantic networks will motivate people to register resource labels and publish resource descriptions on the semantic web. The Domain Ontology Oriented Resource System (DOORS) and Problem Oriented Registry of Tags and Labels (PORTAL) are proposed as infrastructure systems for resource metadata within a paradigm that can serve as a bridge between the original web and the semantic web. The Internet Registry Information Service (IRIS) registers domain names while DNS publishes domain addresses with mapping of names to addresses for the original web. Analogously, PORTAL registers resource labels and tags while DOORS publishes resource locations and descriptions with mapping of labels to locations for the semantic web. BioPORT is proposed as a prototype PORTAL registry specific for the problem domain of biomedical computing.
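The PORTAL/DOORS analogy to IRIS/DNS can be made concrete with a toy registry and resolver; the label and URL below are invented:

```python
# PORTAL plays the IRIS role (registering that a label exists);
# DOORS plays the DNS role (mapping a label to a location and description).

PORTAL_REGISTRY = set()   # registered resource labels
DOORS_RECORDS = {}        # label -> (location, description)

def register(label):
    """Register a resource label (the IRIS-like step)."""
    PORTAL_REGISTRY.add(label)

def publish(label, location, description):
    """Publish location and description for a registered label (DNS-like)."""
    if label not in PORTAL_REGISTRY:
        raise KeyError(f"label {label!r} is not registered")
    DOORS_RECORDS[label] = (location, description)

def resolve(label):
    return DOORS_RECORDS[label]

register("bioport:example-tool")
publish("bioport:example-tool", "http://example.org/tool", "a biomedical resource")
location, description = resolve("bioport:example-tool")
```

The separation matters for the same reason it does in DNS: registration establishes ownership of a name once, while the published record behind it can change freely.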
Hanauer, David A; Wu, Danny T Y; Yang, Lei; Mei, Qiaozhu; Murkowski-Steffy, Katherine B; Vydiswaran, V G Vinod; Zheng, Kai
2017-03-01
The utility of biomedical information retrieval environments can be severely limited when users lack expertise in constructing effective search queries. To address this issue, we developed a computer-based query recommendation algorithm that suggests semantically interchangeable terms based on an initial user-entered query. In this study, we assessed the value of this approach, which has broad applicability in biomedical information retrieval, by demonstrating its application as part of a search engine that facilitates retrieval of information from electronic health records (EHRs). The query recommendation algorithm utilizes MetaMap to identify medical concepts from search queries and indexed EHR documents. Synonym variants from UMLS are used to expand the concepts, along with a synonym set curated from historical EHR search logs. The empirical study involved 33 clinicians and staff who evaluated the system through a set of simulated EHR search tasks. User acceptance was assessed using the widely used technology acceptance model. The search engine's performance was rated consistently higher with the query recommendation feature turned on than with it off. The relevance of computer-recommended search terms was also rated highly, and in most cases the participants had not thought of these terms on their own. The questions on perceived usefulness and perceived ease of use received overwhelmingly positive responses. A vast majority of the participants wanted the query recommendation feature to be available to assist in their day-to-day EHR search tasks. Challenges persist for users constructing effective search queries when retrieving information from biomedical documents, including those from EHRs. This study demonstrates that semantically based query recommendation is a viable solution to addressing this challenge.
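The synonym-expansion step can be sketched as follows; the hand-made synonym table stands in for the UMLS variants and the log-derived synonym set, and the phrases are invented:

```python
# Given a user query, suggest semantically interchangeable terms by
# looking up known phrases in a synonym table.

SYNONYMS = {
    "heart attack": ["myocardial infarction", "mi"],
    "high blood pressure": ["hypertension"],
}

def recommend_terms(query):
    """Return alternative terms for every known phrase found in the query."""
    suggestions = []
    q = query.lower()
    for phrase, alternatives in SYNONYMS.items():
        if phrase in q:
            suggestions.extend(alternatives)
    return suggestions

terms = recommend_terms("history of heart attack")
```

In the system described above, MetaMap does the concept spotting that this naive substring check approximates, which is what makes the expansion robust to inflection and word order.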
Representations for Semantic Learning Webs: Semantic Web Technology in Learning Support
ERIC Educational Resources Information Center
Dzbor, M.; Stutt, A.; Motta, E.; Collins, T.
2007-01-01
Recent work on applying semantic technologies to learning has concentrated on providing novel means of accessing and making use of learning objects. However, this is unnecessarily limiting: semantic technologies will make it possible to develop a range of educational Semantic Web services, such as interpretation, structure-visualization, support…
Virtual Patients on the Semantic Web: A Proof-of-Application Study
Dafli, Eleni; Antoniou, Panagiotis; Ioannidis, Lazaros; Dombros, Nicholas; Topps, David
2015-01-01
Background Virtual patients are interactive computer simulations that are increasingly used as learning activities in modern health care education, especially in teaching clinical decision making. A key challenge is how to retrieve and repurpose virtual patients as unique types of educational resources between different platforms because of the lack of standardized content-retrieving and repurposing mechanisms. Semantic Web technologies provide the capability, through structured information, for easy retrieval, reuse, repurposing, and exchange of virtual patients between different systems. Objective An attempt to address this challenge has been made through the mEducator Best Practice Network, which provisioned frameworks for the discovery, retrieval, sharing, and reuse of medical educational resources. We have extended the OpenLabyrinth virtual patient authoring and deployment platform to facilitate the repurposing and retrieval of existing virtual patient material. Methods A standalone Web distribution and Web interface, which contains an extension for the OpenLabyrinth virtual patient authoring system, was implemented. This extension was designed to semantically annotate virtual patients to facilitate intelligent searches, complex queries, and easy exchange between institutions. The OpenLabyrinth extension enables OpenLabyrinth authors to integrate and share virtual patient case metadata within the mEducator3.0 network. Evaluation included 3 successive steps: (1) expert reviews; (2) evaluation of the ability of health care professionals and medical students to create, share, and exchange virtual patients through specific scenarios in extended OpenLabyrinth (OLabX); and (3) evaluation of the repurposed learning objects that emerged from the procedure. Results We evaluated 30 repurposed virtual patient cases. The evaluation, with a total of 98 participants, demonstrated the system’s main strength: the core repurposing capacity. 
The extensive metadata schema presentation facilitated user exploration and filtering of resources. Usability weaknesses were primarily related to standard computer applications’ ease of use provisions. Most evaluators provided positive feedback regarding educational experiences on both content and system usability. Evaluation results replicated across several independent evaluation events. Conclusions The OpenLabyrinth extension, as part of the semantic mEducator3.0 approach, is a virtual patient sharing approach that builds on a collection of Semantic Web services and federates existing sources of clinical and educational data. It is an effective sharing tool for virtual patients and has been merged into the next version of the app (OpenLabyrinth 3.3). Such tool extensions may enhance the medical education arsenal with capacities of creating simulation/game-based learning episodes, massive open online courses, curricular transformations, and a future robust infrastructure for enabling mobile learning. PMID:25616272
Virtual patients on the semantic Web: a proof-of-application study.
Dafli, Eleni; Antoniou, Panagiotis; Ioannidis, Lazaros; Dombros, Nicholas; Topps, David; Bamidis, Panagiotis D
2015-01-22
Virtual patients are interactive computer simulations that are increasingly used as learning activities in modern health care education, especially in teaching clinical decision making. A key challenge is how to retrieve and repurpose virtual patients as unique types of educational resources between different platforms because of the lack of standardized content-retrieving and repurposing mechanisms. Semantic Web technologies provide the capability, through structured information, for easy retrieval, reuse, repurposing, and exchange of virtual patients between different systems. An attempt to address this challenge has been made through the mEducator Best Practice Network, which provisioned frameworks for the discovery, retrieval, sharing, and reuse of medical educational resources. We have extended the OpenLabyrinth virtual patient authoring and deployment platform to facilitate the repurposing and retrieval of existing virtual patient material. A standalone Web distribution and Web interface, which contains an extension for the OpenLabyrinth virtual patient authoring system, was implemented. This extension was designed to semantically annotate virtual patients to facilitate intelligent searches, complex queries, and easy exchange between institutions. The OpenLabyrinth extension enables OpenLabyrinth authors to integrate and share virtual patient case metadata within the mEducator3.0 network. Evaluation included 3 successive steps: (1) expert reviews; (2) evaluation of the ability of health care professionals and medical students to create, share, and exchange virtual patients through specific scenarios in extended OpenLabyrinth (OLabX); and (3) evaluation of the repurposed learning objects that emerged from the procedure. We evaluated 30 repurposed virtual patient cases. The evaluation, with a total of 98 participants, demonstrated the system's main strength: the core repurposing capacity. 
The extensive metadata schema presentation facilitated user exploration and filtering of resources. Usability weaknesses were primarily related to standard computer applications' ease of use provisions. Most evaluators provided positive feedback regarding educational experiences on both content and system usability. Evaluation results replicated across several independent evaluation events. The OpenLabyrinth extension, as part of the semantic mEducator3.0 approach, is a virtual patient sharing approach that builds on a collection of Semantic Web services and federates existing sources of clinical and educational data. It is an effective sharing tool for virtual patients and has been merged into the next version of the app (OpenLabyrinth 3.3). Such tool extensions may enhance the medical education arsenal with capacities of creating simulation/game-based learning episodes, massive open online courses, curricular transformations, and a future robust infrastructure for enabling mobile learning.
Semantics Enabled Queries in EuroGEOSS: a Discovery Augmentation Approach
NASA Astrophysics Data System (ADS)
Santoro, M.; Mazzetti, P.; Fugazza, C.; Nativi, S.; Craglia, M.
2010-12-01
One of the main challenges in Earth Science Informatics is to build interoperability frameworks which allow users to discover, evaluate, and use information from different scientific domains. This needs to address multidisciplinary interoperability challenges concerning both technological and scientific aspects. From the technological point of view, it is necessary to provide a set of special interoperability arrangements in order to develop flexible frameworks that allow a variety of loosely-coupled services to interact with each other. From a scientific point of view, it is necessary to document clearly the theoretical and methodological assumptions underpinning applications in different scientific domains, and to develop cross-domain ontologies to facilitate interdisciplinary dialogue and understanding. In this presentation we discuss a brokering approach that extends the traditional Service Oriented Architecture (SOA) adopted by most Spatial Data Infrastructures (SDIs) to provide the necessary special interoperability arrangements. In the EC-funded EuroGEOSS (A European approach to GEOSS) project, we distinguish among three possible functional brokering components: discovery, access, and semantics brokers. This presentation focuses on the semantics broker, the Discovery Augmentation Component (DAC), which was specifically developed to address the three thematic areas covered by the EuroGEOSS project: biodiversity, forestry, and drought. The EuroGEOSS DAC federates both semantic services (e.g., SKOS repositories) and ISO-compliant geospatial catalog services. The DAC can be queried using common geospatial constraints (i.e., what, where, when, etc.). Two different augmented discovery styles are supported: a) automatic query expansion; b) user-assisted query expansion. In the first case, the main discovery steps are: i. the query keywords (the what constraint) are “expanded” with related concepts/terms retrieved from the set of federated semantic services.
A default expansion covers multilingual term equivalents; ii. the resulting queries are submitted to the federated catalog services; iii. the DAC performs a “smart” aggregation of the query results and provides them back to the client. In the second case, the main discovery steps are: i. the user browses the federated semantic repositories and selects the concepts/terms of interest; ii. the DAC creates the set of geospatial queries based on the selected concepts/terms and submits them to the federated catalog services; iii. the DAC performs a “smart” aggregation of the query results and provides them back to the client. A Graphical User Interface (GUI) was also developed for testing and interacting with the DAC. The entire brokering framework is deployed in the context of the EuroGEOSS infrastructure and is used in two GEOSS AIP-3 use scenarios: the “e-Habitat Use Scenario” for the Biodiversity and Climate Change topic, and the “Comprehensive Drought Index Use Scenario” for the Water/Drought topic.
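The automatic-expansion flow (expand the keyword, fan the queries out to catalogs, aggregate with de-duplication) can be sketched as follows; the vocabulary and catalog contents are invented:

```python
# Step i: expand the "what" keyword with related terms (incl. multilingual).
RELATED = {"drought": ["aridity", "sequía"]}

# Federated catalogs, each mapping a search term to matching record ids.
CATALOGS = {
    "catalogA": {"drought": ["rec1"], "aridity": ["rec2"]},
    "catalogB": {"sequía": ["rec1", "rec3"]},
}

def expand(keyword):
    return [keyword] + RELATED.get(keyword, [])

def discover(keyword):
    """Steps ii-iii: submit expanded queries to every catalog and merge
    the results, dropping duplicate records (the 'smart' aggregation)."""
    queries = expand(keyword)
    merged = []
    for catalog in CATALOGS.values():
        for q in queries:
            for rec in catalog.get(q, []):
                if rec not in merged:
                    merged.append(rec)
    return merged

records = discover("drought")
```

The payoff is that a user searching only "drought" also retrieves records indexed under related and foreign-language terms, exactly once each.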
Towards Hybrid Online On-Demand Querying of Realtime Data with Stateful Complex Event Processing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhou, Qunzhi; Simmhan, Yogesh; Prasanna, Viktor K.
Emerging Big Data applications in areas like e-commerce and energy industry require both online and on-demand queries to be performed over vast and fast data arriving as streams. These present novel challenges to Big Data management systems. Complex Event Processing (CEP) is recognized as a high performance online query scheme which in particular deals with the velocity aspect of the 3-V’s of Big Data. However, traditional CEP systems do not consider data variety and lack the capability to embed ad hoc queries over the volume of data streams. In this paper, we propose H2O, a stateful complex event processing framework, to support hybrid online and on-demand queries over realtime data. We propose a semantically enriched event and query model to address data variety. A formal query algebra is developed to precisely capture the stateful and containment semantics of online and on-demand queries. We describe techniques to achieve the interactive query processing over realtime data featured by efficient online querying, dynamic stream data persistence and on-demand access. The system architecture is presented and the current implementation status reported.
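A stateful CEP operator that serves both an online pattern query and on-demand inspection of its retained state can be sketched in a few lines; the threshold pattern is an invented example, not H2O's actual query algebra:

```python
from collections import deque

class ThresholdPattern:
    """Toy stateful CEP operator: keep a bounded sliding window over the
    stream and fire when three consecutive readings exceed a threshold.
    The same retained window also answers ad hoc (on-demand) queries."""

    def __init__(self, threshold, size=3):
        self.threshold = threshold
        self.window = deque(maxlen=size)   # bounded state, updated online
        self.alerts = []

    def on_event(self, value):
        # Online path: update state and test the pattern per arriving event.
        self.window.append(value)
        if (len(self.window) == self.window.maxlen
                and all(v > self.threshold for v in self.window)):
            self.alerts.append(tuple(self.window))

    def on_demand(self):
        # On-demand path: an ad hoc query over the operator's current state.
        return list(self.window)

cep = ThresholdPattern(threshold=10)
for v in [5, 12, 13, 14, 9]:
    cep.on_event(v)
```

The point of the hybrid design is visible even in this sketch: online detection and on-demand access share one copy of the state instead of maintaining a stream engine and a separate store.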
Biomedical semantics in the Semantic Web.
Splendiani, Andrea; Burger, Albert; Paschke, Adrian; Romano, Paolo; Marshall, M Scott
2011-03-07
The Semantic Web offers an ideal platform for representing and linking biomedical information, which is a prerequisite for the development and application of analytical tools to address problems in data-intensive areas such as systems biology and translational medicine. As for any new paradigm, the adoption of the Semantic Web offers opportunities and poses questions and challenges to the life sciences scientific community: which technologies in the Semantic Web stack will be more beneficial for the life sciences? Is biomedical information too complex to benefit from simple interlinked representations? What are the implications of adopting a new paradigm for knowledge representation? What are the incentives for the adoption of the Semantic Web, and who are the facilitators? Is there going to be a Semantic Web revolution in the life sciences? We report here a few reflections on these questions, following discussions at the SWAT4LS (Semantic Web Applications and Tools for Life Sciences) workshop series, of which this Journal of Biomedical Semantics special issue presents selected papers from the 2009 edition, held in Amsterdam on November 20th. PMID:21388570
Finding gene regulatory network candidates using the gene expression knowledge base.
Venkatesan, Aravind; Tripathi, Sushil; Sanz de Galdeano, Alejandro; Blondé, Ward; Lægreid, Astrid; Mironov, Vladimir; Kuiper, Martin
2014-12-10
Network-based approaches for the analysis of large-scale genomics data have become well established. Biological networks provide a knowledge scaffold against which the patterns and dynamics of 'omics' data can be interpreted. The background information required for the construction of such networks is often dispersed across a multitude of knowledge bases in a variety of formats. The seamless integration of this information is one of the main challenges in bioinformatics. The Semantic Web offers powerful technologies for the assembly of integrated knowledge bases that are computationally comprehensible, thereby providing a potentially powerful resource for constructing biological networks and network-based analysis. We have developed the Gene eXpression Knowledge Base (GeXKB), a semantic web technology based resource that contains integrated knowledge about gene expression regulation. To affirm the utility of GeXKB we demonstrate how this resource can be exploited for the identification of candidate regulatory network proteins. We present four use cases that were designed from a biological perspective in order to find candidate members relevant for the gastrin hormone signaling network model. We show how a combination of specific query definitions and additional selection criteria derived from gene expression data and prior knowledge concerning candidate proteins can be used to retrieve a set of proteins that constitute valid candidates for regulatory network extensions. Semantic web technologies provide the means for processing and integrating various heterogeneous information sources. The GeXKB offers biologists such an integrated knowledge resource, allowing them to address complex biological questions pertaining to gene expression. This work illustrates how GeXKB can be used in combination with gene expression results and literature information to identify new potential candidates that may be considered for extending a gene regulatory network.
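The candidate-selection step described for GeXKB can be caricatured as simple set logic. This is a hedged sketch, not GeXKB code: the gene symbols and the three input lists are invented, standing in for knowledge-base query hits, expression evidence, and the existing network model.

```python
# Illustrative sketch: shortlist regulatory-network candidates by combining
# knowledge-base query hits with expression evidence, excluding genes already
# in the network. All gene symbols are invented examples.
def shortlist_candidates(kb_hits, differentially_expressed, known_members):
    """Keep KB hits with expression evidence that are not already network members."""
    return sorted((set(kb_hits) & set(differentially_expressed)) - set(known_members))

hits = ["AKT1", "CREB1", "EGFR", "GAST"]      # from a semantic query
de_genes = ["CREB1", "EGFR", "TP53"]          # expression-data filter
network = ["GAST", "EGFR"]                    # current model members
print(shortlist_candidates(hits, de_genes, network))  # ['CREB1']
```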
Visual analytics for semantic queries of TerraSAR-X image content
NASA Astrophysics Data System (ADS)
Espinoza-Molina, Daniela; Alonso, Kevin; Datcu, Mihai
2015-10-01
With the continuous image product acquisition of satellite missions, the size of the image archives is considerably increasing every day, as well as the variety and complexity of their content, surpassing the end-user capacity to analyse and exploit them. Advances in the image retrieval field have contributed to the development of tools for interactive exploration and extraction of images from huge archives using different parameters like metadata, keywords, and basic image descriptors. Even though we count on more powerful tools for automated image retrieval and data analysis, we still face the problem of understanding and analyzing the results. Thus, a systematic computational analysis of these results is required in order to provide the end-user with a summary of the archive content in comprehensible terms. In this context, visual analytics combines automated analysis with interactive visualization techniques for effective understanding, reasoning and decision making on the basis of very large and complex datasets. Moreover, several current research efforts focus on associating the content of the images with semantic definitions that describe the data in a format easily understood by the end-user. In this paper, we present our approach for computing visual analytics and semantically querying the TerraSAR-X archive. Our approach is composed of four main steps: 1) the generation of a data model that explains the information contained in a TerraSAR-X product, formed by primitive descriptors and metadata entries; 2) the storage of this model in a database system; 3) the semantic definition of the image content based on machine learning algorithms and relevance feedback; and 4) querying the image archive using semantic descriptors as query parameters and computing the statistical analysis of the query results.
The experimental results show that, with the help of visual analytics and semantic definitions, we are able to explain the image content using semantic terms and the relations between them, answering questions such as "What is the percentage of urban area in a region?" or "What is the distribution of water bodies in a city?"
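Step 4 of the pipeline above, querying by semantic label and summarising the results statistically, can be sketched in a few lines. The patch records and labels below are invented placeholders, not TerraSAR-X data.

```python
# Toy illustration of semantic querying plus statistics over an image archive.
# Patch identifiers and labels are invented examples.
records = [
    {"patch": "p1", "label": "urban"},
    {"patch": "p2", "label": "water"},
    {"patch": "p3", "label": "urban"},
    {"patch": "p4", "label": "forest"},
]

def label_fraction(label):
    """Fraction of archive patches carrying a given semantic label."""
    hits = [r for r in records if r["label"] == label]
    return len(hits) / len(records)

print(label_fraction("urban"))  # 0.5
```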
Meeting medical terminology needs--the Ontology-Enhanced Medical Concept Mapper.
Leroy, G; Chen, H
2001-12-01
This paper describes the development and testing of the Medical Concept Mapper, a tool designed to facilitate access to online medical information sources by providing users with appropriate medical search terms for their personal queries. Our system is valuable for patients whose knowledge of medical vocabularies is inadequate to find the desired information, and for medical experts who search for information outside their field of expertise. The Medical Concept Mapper maps synonyms and semantically related concepts to a user's query. The system is unique because it integrates our natural language processing tool, i.e., the Arizona (AZ) Noun Phraser, with human-created ontologies, the Unified Medical Language System (UMLS) and WordNet, and our computer generated Concept Space, into one system. Our unique contribution results from combining the UMLS Semantic Net with Concept Space in our deep semantic parsing (DSP) algorithm. This algorithm establishes a medical query context based on the UMLS Semantic Net, which allows Concept Space terms to be filtered so as to isolate related terms relevant to the query. We performed two user studies in which Medical Concept Mapper terms were compared against human experts' terms. We conclude that the AZ Noun Phraser is well suited to extract medical phrases from user queries, that WordNet is not well suited to provide strictly medical synonyms, that the UMLS Metathesaurus is well suited to provide medical synonyms, and that Concept Space is well suited to provide related medical terms, especially when these terms are limited by our DSP algorithm.
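The deep semantic parsing (DSP) idea, using the semantic type of the query concept as a context that filters co-occurrence-derived related terms, can be sketched as follows. The type assignments and the allowed-type table are invented stand-ins, not actual UMLS Semantic Net content.

```python
# Sketch of DSP-style filtering: the query term's semantic type defines which
# related-term types are relevant. All type assignments here are invented.
SEMANTIC_TYPE = {
    "aspirin": "Pharmacologic Substance",
    "ibuprofen": "Pharmacologic Substance",
    "headache": "Sign or Symptom",
    "france": "Geographic Area",
}
ALLOWED_RELATED = {
    "Pharmacologic Substance": {"Pharmacologic Substance", "Sign or Symptom"},
}

def filter_related(query_term, candidates):
    """Keep only candidates whose semantic type fits the query context."""
    allowed = ALLOWED_RELATED.get(SEMANTIC_TYPE.get(query_term, ""), set())
    return [c for c in candidates if SEMANTIC_TYPE.get(c) in allowed]

print(filter_related("aspirin", ["ibuprofen", "france", "headache"]))
```

The filtering step is what isolates medically relevant related terms ("ibuprofen", "headache") while discarding terms that merely co-occur ("france").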
Taking Open Innovation to the Molecular Level - Strengths and Limitations.
Zdrazil, Barbara; Blomberg, Niklas; Ecker, Gerhard F
2012-08-01
The ever-growing availability of large-scale open data and its maturation is having a significant impact on industrial drug-discovery, as well as on academic and non-profit research. As industry is changing to an 'open innovation' business concept, precompetitive initiatives and strong public-private partnerships including academic research cooperation partners are gaining more and more importance. Now, the bioinformatics and cheminformatics communities are seeking web tools that allow the integration of the large volume of life science datasets available in the public domain. Such a data exploitation tool would ideally be able to answer complex biological questions by formulating only one search query. In this short review/perspective, we outline the use of semantic web approaches for data and knowledge integration. Further, we discuss strengths and current limitations of publicly available data retrieval tools and integrated platforms.
Albin, Aaron; Ji, Xiaonan; Borlawsky, Tara B; Ye, Zhan; Lin, Simon; Payne, Philip Ro; Huang, Kun; Xiang, Yang
2014-10-07
The Unified Medical Language System (UMLS) contains many important ontologies in which terms are connected by semantic relations. For many studies on the relationships between biomedical concepts, the use of transitively associated information from ontologies and the UMLS has been shown to be effective. Although there are a few tools and methods available for extracting transitive relationships from the UMLS, they usually have major restrictions on the length of transitive relations or on the number of data sources. Our goal was to design an online platform that enables efficient studies of the conceptual relationships between any medical terms. To overcome the restrictions of available methods and to facilitate studies on the conceptual relationships between medical terms, we developed a Web platform, onGrid, that supports efficient transitive queries and conceptual relationship studies using the UMLS. This framework uses the latest technique in converting natural language queries into UMLS concepts, performs efficient transitive queries, and visualizes the result paths. It also dynamically builds a relationship matrix for two sets of input biomedical terms. We are thus able to perform effective studies on conceptual relationships between medical terms based on their relationship matrix. The advantage of onGrid is that it can be applied to study any two sets of biomedical concept relations and the relations within one set of biomedical concepts. We use onGrid to study the disease-disease relationships in the Online Mendelian Inheritance in Man (OMIM). By cross-validating our results with an external database, the Comparative Toxicogenomics Database (CTD), we demonstrated that onGrid is effective for the study of conceptual relationships between medical terms. onGrid is an efficient tool for querying the UMLS for transitive relations, studying the relationship between medical terms, and generating hypotheses.
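A transitive query of the kind onGrid performs amounts to finding a relation path between two concepts in a graph. The sketch below uses breadth-first search over an invented toy graph; it illustrates the traversal only, not onGrid's implementation or the UMLS schema.

```python
from collections import deque

# Minimal sketch of a transitive query: find a relation path between two
# concepts, up to a maximum length. The concept graph here is invented.
def transitive_path(graph, start, goal, max_len=5):
    """Breadth-first search for a shortest relation path start -> goal."""
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        if len(path) <= max_len:
            for nxt in graph.get(path[-1], []):
                if nxt not in path:  # avoid cycles
                    queue.append(path + [nxt])
    return None  # no path within max_len

g = {"diabetes": ["hyperglycemia"], "hyperglycemia": ["polyuria"], "polyuria": []}
print(transitive_path(g, "diabetes", "polyuria"))
```

Bounding the path length (here `max_len`) is exactly the kind of restriction the abstract says existing tools impose; BFS makes the shortest path come out first.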
Rapid Deployment of a RESTful Service for Oceanographic Research Cruises
NASA Astrophysics Data System (ADS)
Fu, Linyun; Arko, Robert; Leadbetter, Adam
2014-05-01
The Ocean Data Interoperability Platform (ODIP) seeks to increase data sharing across scientific domains and international boundaries, by providing a forum to harmonize diverse regional data systems. ODIP participants from the US include the Rolling Deck to Repository (R2R) program, whose mission is to capture, catalog, and describe the underway/environmental sensor data from US oceanographic research vessels and submit the data to public long-term archives. R2R publishes information online as Linked Open Data, making it widely available using Semantic Web standards. Each vessel, sensor, cruise, dataset, person, organization, funding award, log, report, etc., has a Uniform Resource Identifier (URI). Complex queries that federate results from other data providers are supported, using the SPARQL query language. To facilitate interoperability, R2R uses controlled vocabularies developed collaboratively by the science community (e.g., SeaDataNet device categories) and published online by the NERC Vocabulary Server (NVS). In response to user feedback, we are developing an application programming interface (API) and Web portal for R2R's Linked Open Data. The API provides a set of simple REST-type URLs that are translated on-the-fly into SPARQL queries, and supports common output formats (e.g., JSON). We will demonstrate an implementation based on the Epimorphics Linked Data API (ELDA) open-source Java package. Our experience shows that constructing a simple portal with limited schema elements in this way can significantly reduce development time and maintenance complexity.
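The on-the-fly translation of REST-type URLs into SPARQL queries can be sketched as template substitution. This is a hedged illustration only: the URL pattern, base URI, and the `r2r:hasDataset` predicate are placeholders, not the actual R2R vocabulary or the ELDA mechanism.

```python
# Sketch of REST-to-SPARQL translation via URL templates.
# The pattern and predicate names are invented placeholders.
TEMPLATES = {
    "/cruise/{id}/datasets":
        "SELECT ?dataset WHERE {{ <{base}/cruise/{id}> r2r:hasDataset ?dataset }}",
}

def rest_to_sparql(path, base="http://example.org"):
    """Match a REST path against known templates and emit a SPARQL query."""
    for pattern, template in TEMPLATES.items():
        prefix, _, suffix = pattern.partition("{id}")
        if path.startswith(prefix) and path.endswith(suffix):
            cruise_id = path[len(prefix):len(path) - len(suffix)]
            return template.format(base=base, id=cruise_id)
    raise ValueError("no template for " + path)

print(rest_to_sparql("/cruise/AT26-13/datasets"))
```

A production proxy would also negotiate output formats (JSON, etc.) from the request headers; here only the query-generation half is shown.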
Teodoro, Douglas; Pasche, Emilie; Gobeill, Julien; Emonet, Stéphane; Ruch, Patrick; Lovis, Christian
2012-05-29
Antimicrobial resistance has reached globally alarming levels and is becoming a major public health threat. Lack of efficacious antimicrobial resistance surveillance systems was identified as one of the causes of increasing resistance, due to the lag time between new resistances and alerts to care providers. Several initiatives to track drug resistance evolution have been developed. However, no effective real-time and source-independent antimicrobial resistance monitoring system is available publicly. To design and implement an architecture that can provide real-time and source-independent antimicrobial resistance monitoring to support transnational resistance surveillance. In particular, we investigated the use of a Semantic Web-based model to foster integration and interoperability of interinstitutional and cross-border microbiology laboratory databases. Following the agile software development methodology, we derived the main requirements needed for effective antimicrobial resistance monitoring, from which we proposed a decentralized monitoring architecture based on the Semantic Web stack. The architecture uses an ontology-driven approach to promote the integration of a network of sentinel hospitals or laboratories. Local databases are wrapped into semantic data repositories that automatically expose local computing-formalized laboratory information in the Web. A central source mediator, based on local reasoning, coordinates the access to the semantic end points. On the user side, a user-friendly Web interface provides access and graphical visualization to the integrated views. We designed and implemented the online Antimicrobial Resistance Trend Monitoring System (ARTEMIS) in a pilot network of seven European health care institutions sharing 70+ million triples of information about drug resistance and consumption. 
Evaluation of the computing performance of the mediator demonstrated that, on average, query response time was a few seconds (mean 4.3, SD 0.1 × 10² seconds). Clinical pertinence assessment showed that resistance trends automatically calculated by ARTEMIS had a strong positive correlation with the European Antimicrobial Resistance Surveillance Network (EARS-Net) (ρ = .86, P < .001) and the Sentinel Surveillance of Antibiotic Resistance in Switzerland (SEARCH) (ρ = .84, P < .001) systems. Furthermore, mean resistance rates extracted by ARTEMIS were not significantly different from those of either EARS-Net (∆ = ±0.130; 95% confidence interval -0 to 0.030; P < .001) or SEARCH (∆ = ±0.042; 95% confidence interval -0.004 to 0.028; P = .004). We introduce a distributed monitoring architecture that can be used to build transnational antimicrobial resistance surveillance networks. Results indicated that the Semantic Web-based approach provided an efficient and reliable solution for development of eHealth architectures that enable online antimicrobial resistance monitoring from heterogeneous data sources. In future, we expect that more health care institutions can join the ARTEMIS network so that it can provide a large European and wider biosurveillance network that can be used to detect emerging bacterial resistance in a multinational context and support public health actions. PMID:22642960
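The trend comparison reported above uses Spearman rank correlation (ρ). A pure-Python version, without tie handling, is sketched below; the two resistance-rate series are invented for illustration, not ARTEMIS or EARS-Net data.

```python
# Pure-Python Spearman rank correlation (no tie handling), as used to compare
# resistance-trend series. The data values below are invented.
def rank(values):
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    for r, i in enumerate(order):
        ranks[i] = float(r + 1)
    return ranks

def spearman(x, y):
    """Spearman rho via the classic 1 - 6*sum(d^2)/(n(n^2-1)) formula."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

a = [0.12, 0.18, 0.25, 0.31, 0.40]  # e.g. one system's yearly resistance rates
b = [0.10, 0.20, 0.24, 0.33, 0.38]  # the other system's rates
print(spearman(a, b))  # 1.0: identically ordered series
```

For real data with tied ranks one would use a tie-corrected formula (e.g. `scipy.stats.spearmanr`).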
Producing approximate answers to database queries
NASA Technical Reports Server (NTRS)
Vrbsky, Susan V.; Liu, Jane W. S.
1993-01-01
We have designed and implemented a query processor, called APPROXIMATE, that makes approximate answers available if part of the database is unavailable or if there is not enough time to produce an exact answer. The accuracy of the approximate answers produced improves monotonically with the amount of data retrieved to produce the result. The exact answer is produced if all of the needed data are available and query processing is allowed to continue until completion. The monotone query processing algorithm of APPROXIMATE works within the standard relational algebra framework and can be implemented on a relational database system with little change to the relational architecture. We describe here the approximation semantics of APPROXIMATE that serves as the basis for meaningful approximations of both set-valued and single-valued queries. We show how APPROXIMATE is implemented to make effective use of semantic information, provided by an object-oriented view of the database, and describe the additional overhead required by APPROXIMATE.
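The monotone behaviour described above can be caricatured as follows: the certain answer set only grows as more fragments of the relation become available, converging to the exact answer once all data are in. The schema, fragments, and predicate are invented; this is not the APPROXIMATE implementation.

```python
# Sketch of monotone approximate query answering: answers computed from the
# available fragments are a subset of the exact answer. Data are invented.
def approximate_answer(fragments_available, predicate):
    """Certain answers derivable from the fragments retrieved so far."""
    certain = set()
    for fragment in fragments_available:
        certain |= {row for row in fragment if predicate(row)}
    return certain

frag1 = [("sat-1", 450), ("sat-2", 900)]   # first partition retrieved
frag2 = [("sat-3", 700)]                   # later partition
low_orbit = lambda row: row[1] < 800       # selection predicate

print(approximate_answer([frag1], low_orbit))         # partial answer
print(approximate_answer([frag1, frag2], low_orbit))  # refined answer
```

Because selection distributes over union, every partial answer is contained in the next, which is the accuracy-improves-monotonically property the abstract describes.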
Semantics-Based Intelligent Indexing and Retrieval of Digital Images - A Case Study
NASA Astrophysics Data System (ADS)
Osman, Taha; Thakker, Dhavalkumar; Schaefer, Gerald
The proliferation of digital media has led to a huge interest in classifying and indexing media objects for generic search and usage. In particular, we are witnessing colossal growth in digital image repositories that are difficult to navigate using free-text search mechanisms, which often return inaccurate matches as they typically rely on statistical analysis of query keyword recurrence in the image annotation or surrounding text. In this chapter we present a semantically enabled image annotation and retrieval engine that is designed to satisfy the requirements of the commercial image collections market in terms of both accuracy and efficiency of the retrieval process. Our search engine relies on methodically structured ontologies for image annotation, thus allowing for more intelligent reasoning about the image content and subsequently obtaining a more accurate set of results and a richer set of alternatives matching the original query. We also show how our well-analysed and designed domain ontology contributes to the implicit expansion of user queries as well as presenting our initial thoughts on exploiting lexical databases for explicit semantic-based query expansion.
Semantic Document Library: A Virtual Research Environment for Documents, Data and Workflows Sharing
NASA Astrophysics Data System (ADS)
Kotwani, K.; Liu, Y.; Myers, J.; Futrelle, J.
2008-12-01
The Semantic Document Library (SDL) was driven by use cases from the environmental observatory communities and is designed to provide conventional document repository features of uploading, downloading, editing and versioning of documents as well as value adding features of tagging, querying, sharing, annotating, ranking, provenance, social networking and geo-spatial mapping services. It allows users to organize a catalogue of watershed observation data, model output, workflows, as well as publications and documents related to the same watershed study through the tagging capability. Users can tag all relevant materials using the same watershed name and find all of them easily later using this tag. The underpinning semantic content repository can store materials from other cyberenvironments such as workflow or simulation tools, and SDL provides an effective interface to query and organize materials from various sources. Advanced features of the SDL allow users to visualize the provenance of the materials, such as the source and how the output data is derived. Other novel features include visualizing all geo-referenced materials on a geospatial map. SDL, as a component of a cyberenvironment portal (the NCSA Cybercollaboratory), has the goal of efficient management of information and relationships between published artifacts (validated models, vetted data, workflows, annotations, best practices, reviews and papers) produced from raw research artifacts (data, notes, plans etc.) through agents (people, sensors etc.). Tremendous scientific potential of artifacts is achieved through mechanisms of sharing, reuse and collaboration - empowering scientists to spread their knowledge and protocols and to benefit from the knowledge of others. SDL successfully implements web 2.0 technologies and design patterns along with a semantic content management approach that enables use of multiple ontologies and dynamic evolution (e.g. folksonomies) of terminology.
Scientific documents involving many interconnected entities (artifacts or agents) are represented as RDF triples using the semantic content repository middleware Tupelo in one or more data/metadata RDF stores. Queries over the RDF enable discovery of relations among data, processes and people, surfacing valuable aspects and making recommendations to users, such as which tools are typically used to answer certain kinds of questions or work with certain types of dataset. This innovative concept brings out coherent information about entities from four different perspectives: the social context (Who: human relations and interactions), the causal context (Why: provenance and history), the geo-spatial context (Where: location or spatially referenced information) and the conceptual context (What: domain-specific relations, ontologies, etc.).
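Triple-pattern matching of the kind described, asking "which resources stand in a given relation", can be sketched over a toy in-memory store. The subjects and predicates below are invented examples, not Tupelo's API or SDL's actual vocabulary.

```python
# Toy RDF-style triple store with pattern matching; None acts as a wildcard.
# Subjects, predicates, and objects are invented examples.
triples = [
    ("dataset42", "derivedFrom", "sensorFeed7"),
    ("dataset42", "taggedWith", "watershed:Sangamon"),
    ("paper9", "cites", "dataset42"),
]

def match(s=None, p=None, o=None):
    """Return all triples matching the given (subject, predicate, object) pattern."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

print(match(p="derivedFrom"))          # provenance relations
print(match(s="dataset42"))            # everything known about dataset42
```

Real stores index each position (SPO/POS/OSP) so these patterns are lookups rather than scans, but the query model is the same.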
Facilitating Semantic Interoperability Among Ocean Data Systems: ODIP-R2R Student Outcomes
NASA Astrophysics Data System (ADS)
Stocks, K. I.; Chen, Y.; Shepherd, A.; Chandler, C. L.; Dockery, N.; Elya, J. L.; Smith, S. R.; Ferreira, R.; Fu, L.; Arko, R. A.
2014-12-01
With informatics providing an increasingly important set of tools for geoscientists, it is critical to train the next generation of scientists in information and data techniques. The NSF-supported Rolling Deck to Repository (R2R) Program works with the academic fleet community to routinely document, assess, and preserve the underway sensor data from U.S. research vessels. The Ocean Data Interoperability Platform (ODIP) is an EU-US-Australian collaboration fostering interoperability among regional e-infrastructures through workshops and joint prototype development. The need to align terminology between systems is a common challenge across all of the ODIP prototypes. Five R2R students were supported to address aspects of semantic interoperability within ODIP:
- Developing a vocabulary matching service that links terms from different vocabularies with similar concepts. The service implements the Google Refine reconciliation service interface, so users can leverage the Google Refine application as a friendly user interface while linking different vocabulary terms.
- Developing Resource Description Framework (RDF) resources that map Shipboard Automated Meteorological Oceanographic System (SAMOS) vocabularies to internationally served vocabularies. Each SAMOS vocabulary term (data parameter and quality control flag) will be described as an RDF resource page. These RDF resources allow for enhanced discoverability and retrieval of SAMOS data by enabling data searches based on parameter.
- Improving data retrieval and interoperability by exposing data and mapped vocabularies using Semantic Web technologies. We have collaborated with ODIP participating organizations to build a generalized data model that will be used to populate a SPARQL endpoint, providing expressive querying over our data files.
- Mapping local and regional vocabularies used by R2R to those used by ODIP partners. This work is described more fully in a companion poster.
- Making published Linked Data Web developer-friendly with a RESTful service. This goal was achieved by defining a proxy layer on top of the existing SPARQL endpoint that 1) translates HTTP requests into SPARQL queries, and 2) renders the returned results as required by the request sender using content negotiation, suffixes and parameters.
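The vocabulary matching task can be sketched as reconciliation by normalised label, which is roughly what a Google Refine (OpenRefine) reconciliation endpoint does at its simplest. The terms below are invented examples, not SAMOS or NVS vocabulary entries, and real matching would also use fuzzy scoring.

```python
# Minimal sketch of vocabulary reconciliation by normalised label.
# Terms are invented examples; real services return scored candidate lists.
def normalise(term):
    """Lowercase and strip non-alphanumerics so spelling variants collide."""
    return "".join(ch for ch in term.lower() if ch.isalnum())

def reconcile(local_terms, served_vocabulary):
    """Map each local term to the served term with the same normalised form, or None."""
    index = {normalise(t): t for t in served_vocabulary}
    return {t: index.get(normalise(t)) for t in local_terms}

result = reconcile(["Air Temperature", "wind-speed"],
                   ["air temperature", "Wind Speed", "salinity"])
print(result)
```

Note that "wind-speed" and "Wind Speed" normalise to the same key, so hyphenation and case differences no longer block the match.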
A Semantic Approach for Knowledge Discovery to Help Mitigate Habitat Loss in the Gulf of Mexico
NASA Astrophysics Data System (ADS)
Ramachandran, R.; Maskey, M.; Graves, S.; Hardin, D.
2008-12-01
Noesis is a meta-search engine and a resource aggregator that uses domain ontologies to provide scoped search capabilities. Ontologies enable Noesis to help users refine their searches for information on the open web and in hidden web locations such as data catalogues with standardized, but discipline specific vocabularies. Through its ontologies Noesis provides a guided refinement of search queries which produces complete and accurate searches while reducing the user's burden to experiment with different search strings. All search results are organized by categories (e.g., all results from Google are grouped together) which may be selected or omitted according to the desire of the user. During the past two years ontologies were developed for sea grasses in the Gulf of Mexico and were used to support a habitat restoration demonstration project. Currently these ontologies are being augmented to address the special characteristics of mangroves. These new ontologies will extend the demonstration project to broader regions of the Gulf including protected mangrove locations in coastal Mexico. Noesis contributes to the decision making process by producing a comprehensive list of relevant resources based on the semantic information contained in the ontologies. Ontologies are organized in tree-like taxonomies, where the child nodes represent the Specializations and the parent nodes represent the Generalizations of a node or concept. Specializations can be used to provide more detailed search, while generalizations are used to make the search broader. Ontologies are also used to link two syntactically different terms to one semantic concept (synonyms). Appending a synonym to the query expands the search, thus providing better search coverage. Every concept has a set of properties that are neither in the same inheritance hierarchy (Specializations / Generalizations) nor equivalent (synonyms).
These are called Related Concepts and they are captured in the ontology through property relationships. By using Related Concepts users can search for resources with respect to a particular property. Noesis automatically generates searches that include all of these capabilities, removing the burden from the user and producing broader and more accurate search results. This presentation will demonstrate the features of Noesis and describe its application to habitat studies in the Gulf of Mexico.
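The ontology-driven query expansion described here, appending synonyms for coverage and specializations for depth, can be sketched over a toy ontology. The entries below are invented, not the actual Gulf of Mexico sea-grass ontology.

```python
# Sketch of ontology-driven query expansion: a term is expanded with its
# synonyms and specializations. The tiny ontology below is invented.
ONTOLOGY = {
    "seagrass": {
        "synonyms": ["sea grass"],
        "specializations": ["turtle grass", "shoal grass"],
    },
}

def expand_query(term):
    """Return the term plus its synonyms and specializations for a broader search."""
    entry = ONTOLOGY.get(term, {})
    return [term] + entry.get("synonyms", []) + entry.get("specializations", [])

print(expand_query("seagrass"))
```

Generalizations and related-concept properties would be added the same way, each widening or redirecting the search along a different ontology relation.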
The Semantic Web in Teacher Education
ERIC Educational Resources Information Center
Czerkawski, Betül Özkan
2014-01-01
The Semantic Web enables increased collaboration among computers and people by organizing unstructured data on the World Wide Web. Rather than a separate body, the Semantic Web is a functional extension of the current Web made possible by defining relationships among websites and other online content. When explicitly defined, these relationships…
NASA Astrophysics Data System (ADS)
Chmiel, P.; Ganzha, M.; Jaworska, T.; Paprzycki, M.
2017-10-01
Nowadays, as part of the systematic growth in the volume and variety of information that can be found on the Internet, we also observe a dramatic increase in the size of available image collections. There are many ways to help users browse and select images of interest. One popular approach is Content-Based Image Retrieval (CBIR) systems, which allow users to search for images that match their interests, expressed in the form of images (query by example). However, we believe that image search and retrieval could take advantage of semantic technologies. We have decided to test this hypothesis. Specifically, on the basis of knowledge captured in the CBIR, we have developed a domain ontology of residential real estate (detached houses, in particular). This allows us to semantically represent each image (and its constitutive architectural elements) represented within the CBIR. The proposed ontology was extended to capture not only the elements resulting from image segmentation, but also the "spatial relations" between them. As a result, a new approach to querying the image database (semantic querying) has materialized, extending the capabilities of the developed system.
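Semantic querying over segmented image content with spatial relations can be sketched as matching relation triples against per-image annotations. The element names, relations, and image files below are invented, not the paper's real-estate ontology.

```python
# Toy sketch of semantic querying by spatial relation between segmented
# elements. Image names, elements, and relations are invented examples.
images = {
    "house1.jpg": {("window", "above", "door"), ("roof", "above", "window")},
    "house2.jpg": {("garage", "leftOf", "door")},
}

def semantic_query(relation_triple):
    """Return images whose annotations contain the given (elem, relation, elem) triple."""
    return [img for img, rels in images.items() if relation_triple in rels]

print(semantic_query(("window", "above", "door")))  # ['house1.jpg']
```

An ontology-backed version would additionally reason over the relation hierarchy (e.g. inferring `above` from `directlyAbove`) instead of requiring exact triple matches.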
Trust estimation of the semantic web using semantic web clustering
NASA Astrophysics Data System (ADS)
Shirgahi, Hossein; Mohsenzadeh, Mehran; Haj Seyyed Javadi, Hamid
2017-05-01
Development of the semantic web and social networks is undeniable in today's Internet world. The widespread nature of the semantic web makes assessing trust in this field very challenging. In recent years, extensive research has been done to estimate the trust of the semantic web. Since trust of the semantic web is a multidimensional problem, in this paper we use the parameters of social network authority, the authority value of page links, and semantic authority to assess trust. Due to the large space of the semantic network, we restrict the problem scope to clusters of semantic subnetworks, obtain the trust of each cluster's elements locally, and calculate the trust of outside resources according to their local trusts and the trust of clusters in each other. According to the experimental results, the proposed method achieves an F-score of more than 79%, on average about 11.9% higher than the Eigen, Tidal, and centralised trust methods. The mean error of the proposed method is 12.936, on average 9.75% lower than that of the Eigen and Tidal trust methods.
Neuroimaging, Genetics, and Clinical Data Sharing in Python Using the CubicWeb Framework
Grigis, Antoine; Goyard, David; Cherbonnier, Robin; Gareau, Thomas; Papadopoulos Orfanos, Dimitri; Chauvat, Nicolas; Di Mascio, Adrien; Schumann, Gunter; Spooren, Will; Murphy, Declan; Frouin, Vincent
2017-01-01
In neurosciences or psychiatry, the emergence of large multi-center population imaging studies raises numerous technological challenges. From distributed data collection, across different institutions and countries, to final data publication service, one must handle the massive, heterogeneous, and complex data from genetics, imaging, demographics, or clinical scores. These data must be both efficiently obtained and downloadable. We present a Python solution, based on the CubicWeb open-source semantic framework, aimed at building population imaging study repositories. In addition, we focus on the tools developed around this framework to overcome the challenges associated with data sharing and collaborative requirements. We describe a set of three highly adaptive web services that transform the CubicWeb framework into a (1) multi-center upload platform, (2) collaborative quality assessment platform, and (3) publication platform endowed with massive-download capabilities. Two major European projects, IMAGEN and EU-AIMS, are currently supported by the described framework. We also present a Python package that enables end users to remotely query neuroimaging, genetics, and clinical data from scripts. PMID:28360851
The Fusion Model of Intelligent Transportation Systems Based on the Urban Traffic Ontology
NASA Astrophysics Data System (ADS)
Yang, Wang-Dong; Wang, Tao
To address these issues, urban transport information is uniformly represented using an urban traffic ontology, which defines the statutes and algebraic operations of semantic fusion at the ontology level in order to achieve the fusion of urban traffic information with semantic completeness and consistency. The paper thus takes advantage of the semantic completeness of ontologies to build an urban traffic ontology model, with which we resolve problems such as ontology merging and equivalence verification in the semantic fusion of integrated traffic information. Information integration in urban transport can thereby gain semantic fusion capabilities, reduce the amount of urban traffic data to be integrated, and enhance the efficiency and integrity of traffic information queries. Through its practical application in the intelligent traffic information integration platform of Changde city, the paper shows that ontology-based semantic fusion increases the effectiveness and efficiency of urban traffic information integration, reduces storage requirements, and improves query efficiency and information completeness.
Sexual information seeking on web search engines.
Spink, Amanda; Koricich, Andrew; Jansen, B J; Cole, Charles
2004-02-01
Sexual information seeking is an important element within human information behavior. Seeking sexually related information on the Internet takes many forms and channels, including chat rooms discussions, accessing Websites or searching Web search engines for sexual materials. The study of sexual Web queries provides insight into sexually-related information-seeking behavior, of value to Web users and providers alike. We qualitatively analyzed queries from logs of 1,025,910 Alta Vista and AlltheWeb.com Web user queries from 2001. We compared the differences in sexually-related Web searching between Alta Vista and AlltheWeb.com users. Differences were found in session duration, query outcomes, and search term choices. Implications of the findings for sexual information seeking are discussed.
Putting semantics into the semantic web: how well can it capture biology?
Kazic, Toni
2006-01-01
Could the Semantic Web work for computations of biological interest in the way it's intended to work for movie reviews and commercial transactions? It would be wonderful if it could, so it's worth looking to see if its infrastructure is adequate to the job. The technologies of the Semantic Web make several crucial assumptions. I examine those assumptions; argue that they create significant problems; and suggest some alternative ways of achieving the Semantic Web's goals for biology.
A Match Method Based on Latent Semantic Analysis for Earthquake Hazard Emergency Plans
NASA Astrophysics Data System (ADS)
Sun, D.; Zhao, S.; Zhang, Z.; Shi, X.
2017-09-01
The structure of earthquake emergency plans is complex, and it is difficult for decision makers to reach a decision in a short time. To solve this problem, this paper presents a match method based on Latent Semantic Analysis (LSA). After word-segmentation preprocessing of the emergency plans, we extract keywords according to part-of-speech and word frequency. Then, through LSA, we map the documents and query information into a semantic space and calculate the correlation between documents and queries from the relation between their vectors. The experimental results indicate that LSA can efficiently improve the accuracy of emergency plan retrieval.
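The LSA pipeline sketched in the abstract (term-document matrix, truncated SVD, cosine similarity in the smoothed space) can be illustrated with NumPy. The tiny term-document matrix and its counts are invented for illustration; a real system would build it from the segmented plan text:

```python
import numpy as np

# rows = terms, columns = emergency-plan documents (toy counts, invented)
X = np.array([
    [2.0, 0.0, 1.0],   # "earthquake"
    [1.0, 1.0, 0.0],   # "evacuation"
    [0.0, 2.0, 1.0],   # "flood"
])

# Rank-k reconstruction smooths the term-document matrix (the LSA step).
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Xk = (U[:, :k] * s[:k]) @ Vt[:k]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

q = np.array([1.0, 1.0, 0.0])           # query: "earthquake evacuation"
scores = [cosine(q, Xk[:, j]) for j in range(Xk.shape[1])]
print(int(np.argmax(scores)))            # 0 — the earthquake-related plan
```

Ranking against the rank-k reconstruction rather than the raw counts is what lets LSA match documents that share latent topics with the query even when exact terms differ.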
Web Image Search Re-ranking with Click-based Similarity and Typicality.
Yang, Xiaopeng; Mei, Tao; Zhang, Yong Dong; Liu, Jie; Satoh, Shin'ichi
2016-07-20
In image search re-ranking, besides the well-known semantic gap, the intent gap, i.e., the gap between the representation of users' query/demand and the users' real intent, is becoming a major problem restricting the development of image retrieval. To reduce human effort, in this paper we use image click-through data, which can be viewed as "implicit feedback" from users, to help bridge the intent gap and further improve image search performance. Generally, the hypothesis that visually similar images should be close in a ranking list and the strategy that images with higher relevance should be ranked higher than others are widely accepted. To obtain satisfying search results, image similarity and the level of relevance typicality are therefore the determining factors. However, when measuring image similarity and typicality, conventional re-ranking approaches consider only visual information and the initial ranks of images, while overlooking the influence of click-through data. This paper presents a novel re-ranking approach, named spectral clustering re-ranking with click-based similarity and typicality (SCCST). First, to learn an appropriate similarity measurement, we propose a click-based multi-feature similarity learning algorithm (CMSL), which conducts metric learning based on click-based triplet selection and integrates multiple features into a unified similarity space via multiple kernel learning. Then, based on the learnt click-based image similarity measure, we conduct spectral clustering to group visually and semantically similar images into the same clusters, and obtain the final re-ranked list by computing click-based cluster typicality and within-cluster click-based image typicality in descending order. Our experiments, conducted on two real-world query-image datasets with diverse representative queries, show that our proposed re-ranking approach can significantly improve initial search results and outperforms several existing re-ranking approaches.
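The final ranking step (cluster, order clusters by click-based typicality, then order images within each cluster) can be illustrated with a greatly simplified sketch. Thresholded connected-component grouping stands in for spectral clustering, and total click counts stand in for the learned typicality scores; all similarities and clicks are invented:

```python
# Simplified stand-in for SCCST's cluster-then-rank stage.

sim = {  # symmetric pairwise image similarities (invented)
    ("a", "b"): 0.9, ("a", "c"): 0.2, ("b", "c"): 0.1,
    ("c", "d"): 0.8, ("a", "d"): 0.1, ("b", "d"): 0.2,
}
clicks = {"a": 5, "b": 40, "c": 3, "d": 10}

def clusters(images, threshold=0.5):
    """Group images whose similarity to some group member exceeds threshold."""
    groups = []
    for img in images:
        placed = False
        for g in groups:
            if any(sim.get(tuple(sorted((img, other))), 0) > threshold
                   for other in g):
                g.append(img)
                placed = True
                break
        if not placed:
            groups.append([img])
    return groups

def rerank(images):
    groups = clusters(images)
    # clusters ranked by total clicks, images within a cluster by their clicks
    groups.sort(key=lambda g: sum(clicks[i] for i in g), reverse=True)
    return [i for g in groups
              for i in sorted(g, key=lambda i: clicks[i], reverse=True)]

print(rerank(["a", "b", "c", "d"]))  # ['b', 'a', 'd', 'c']
```

The real method learns the similarity via metric learning over click-based triplets and uses spectral clustering; the sketch only shows how click signals reorder both clusters and their members.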
Exploiting Recurring Structure in a Semantic Network
NASA Technical Reports Server (NTRS)
Wolfe, Shawn R.; Keller, Richard M.
2004-01-01
With the growing popularity of the Semantic Web, an increasing amount of information is becoming available in machine interpretable, semantically structured networks. Within these semantic networks are recurring structures that could be mined by existing or novel knowledge discovery methods. The mining of these semantic structures represents an interesting area that focuses on mining both for and from the Semantic Web, with surprising applicability to problems confronting the developers of Semantic Web applications. In this paper, we present representative examples of recurring structures and show how these structures could be used to increase the utility of a semantic repository deployed at NASA.
Semantic SenseLab: implementing the vision of the Semantic Web in neuroscience
Samwald, Matthias; Chen, Huajun; Ruttenberg, Alan; Lim, Ernest; Marenco, Luis; Miller, Perry; Shepherd, Gordon; Cheung, Kei-Hoi
2011-01-01
Objective: Integrative neuroscience research needs a scalable informatics framework that enables semantic integration of diverse types of neuroscience data. This paper describes the use of the Web Ontology Language (OWL) and other Semantic Web technologies for the representation and integration of molecular-level data provided by several databases of the SenseLab suite of neuroscience databases. Methods: Based on the original database structure, we semi-automatically translated the databases into OWL ontologies with manual addition of semantic enrichment. The SenseLab ontologies are extensively linked to other biomedical Semantic Web resources, including the Subcellular Anatomy Ontology, the Brain Architecture Management System, the Gene Ontology, BIRNLex and UniProt. The SenseLab ontologies have also been mapped to the Basic Formal Ontology and the Relation Ontology, which helps ease interoperability with many other existing and future biomedical ontologies for the Semantic Web. In addition, approaches to representing contradictory research statements are described. The SenseLab ontologies are designed for use on the Semantic Web, which enables their integration into a growing collection of biomedical information resources. Conclusion: We demonstrate that our approach can yield significant potential benefits and that the Semantic Web is rapidly becoming mature enough to realize its anticipated promises. The ontologies are available online at http://neuroweb.med.yale.edu/senselab/ PMID:20006477
Deus, Helena F; Correa, Miriã C; Stanislaus, Romesh; Miragaia, Maria; Maass, Wolfgang; de Lencastre, Hermínia; Fox, Ronan; Almeida, Jonas S
2011-07-14
The value and usefulness of data increases when it is explicitly interlinked with related data. This is the core principle of Linked Data. For life sciences researchers, harnessing the power of Linked Data to improve biological discovery is still challenged by a need to keep pace with rapidly evolving domains and requirements for collaboration and control as well as with the reference semantic web ontologies and standards. Knowledge organization systems (KOSs) can provide an abstraction for publishing biological discoveries as Linked Data without complicating transactions with contextual minutia such as provenance and access control. We have previously described the Simple Sloppy Semantic Database (S3DB) as an efficient model for creating knowledge organization systems using Linked Data best practices with explicit distinction between domain and instantiation and support for a permission control mechanism that automatically migrates between the two. In this report we present a domain specific language, the S3DB query language (S3QL), to operate on its underlying core model and facilitate management of Linked Data. Reflecting the data driven nature of our approach, S3QL has been implemented as an application programming interface for S3DB systems hosting biomedical data, and its syntax was subsequently generalized beyond the S3DB core model. This achievement is illustrated with the assembly of an S3QL query to manage entities from the Simple Knowledge Organization System. The illustrative use cases include gastrointestinal clinical trials, genomic characterization of cancer by The Cancer Genome Atlas (TCGA) and molecular epidemiology of infectious diseases. S3QL was found to provide a convenient mechanism to represent context for interoperation between public and private datasets hosted at biomedical research institutions and linked data formalisms.
Web 3.0: Implications for Online Learning
ERIC Educational Resources Information Center
Morris, Robin D.
2010-01-01
The impact of Web 3.0, also known as the Semantic Web, on online learning is yet to be determined as the Semantic Web and its technologies continue to develop. Online instructors must have a rudimentary understanding of Web 3.0 to prepare for the next phase of online learning. This paper provides an understandable definition of the Semantic Web…
Semantic Web and Contextual Information: Semantic Network Analysis of Online Journalistic Texts
NASA Astrophysics Data System (ADS)
Lim, Yon Soo
This study examines why contextual information is important for actualizing the idea of the semantic web, based on a case study of a socio-political issue in South Korea. For this study, semantic network analyses were conducted on 62 English-language blog posts and 101 news stories on the web. The results indicated differences in meaning structures between blog posts and professional journalism, as well as between conservative and progressive journalism. From the results, this study ascertains the empirical validity of current concerns about the practical application of the new web technology, and discusses how the semantic web should be developed.
NASA Astrophysics Data System (ADS)
Brambilla, Marco; Ceri, Stefano; Valle, Emanuele Della; Facca, Federico M.; Tziviskou, Christina
Although Semantic Web Services are expected to produce a revolution in the development of Web-based systems, very few enterprise-wide design experiences are available; one of the main reasons is the lack of sound Software Engineering methods and tools for the deployment of Semantic Web applications. In this chapter, we present an approach to software development for the Semantic Web based on classical Software Engineering methods (i.e., formal business process development, computer-aided and component-based software design, and automatic code generation) and on semantic methods and tools (i.e., ontology engineering, semantic service annotation and discovery).
Similarity Based Semantic Web Service Match
NASA Astrophysics Data System (ADS)
Peng, Hui; Niu, Wenjia; Huang, Ronghuai
Semantic web service discovery aims at returning the most closely matching advertised services to the service requester by comparing the semantics of the requested service with those of each advertised service. The semantics of a web service are described in terms of inputs, outputs, preconditions and results in the Ontology Web Language for Services (OWL-S), formalized by the W3C. In this paper we propose an algorithm to calculate the semantic similarity of two services by taking a weighted average of their input and output similarities. A case study and applications show the effectiveness of our algorithm in service matching.
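The weighted-average match score described above can be sketched in a few lines. The per-parameter concept similarities and the 0.5/0.5 weights are invented for illustration; the paper's actual similarity computation over OWL-S concepts is not reproduced here:

```python
# Hedged sketch: combine input-side and output-side similarities of a
# requested vs. advertised service into one match score.

def match_score(input_sims, output_sims, w_in=0.5, w_out=0.5):
    """Weighted average of mean input similarity and mean output similarity."""
    avg = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return w_in * avg(input_sims) + w_out * avg(output_sims)

# per-parameter concept similarities between request and advertisement
score = match_score(input_sims=[0.8, 0.6], output_sims=[1.0])
print(round(score, 2))  # 0.85
```

Services would then be ranked by this score, with the weights letting a matchmaker emphasize output compatibility over input compatibility or vice versa.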
Semantic web for integrated network analysis in biomedicine.
Chen, Huajun; Ding, Li; Wu, Zhaohui; Yu, Tong; Dhanapalan, Lavanya; Chen, Jake Y
2009-03-01
The Semantic Web technology enables integration of heterogeneous data on the World Wide Web by making the semantics of data explicit through formal ontologies. In this article, we survey the feasibility and state of the art of utilizing the Semantic Web technology to represent, integrate and analyze the knowledge in various biomedical networks. We introduce a new conceptual framework, semantic graph mining, to enable researchers to integrate graph mining with ontology reasoning in network data analysis. Through four case studies, we demonstrate how semantic graph mining can be applied to the analysis of disease-causal genes, Gene Ontology category cross-talks, drug efficacy analysis and herb-drug interactions analysis.
Tracking changes in search behaviour at a health web site.
Eklund, Ann-Marie
2012-01-01
Nowadays, the internet is used as a means to provide the public with official information on many different topics, including health related matters and care providers. In this work we have studied a search log from the official Swedish health web site 1177.se for patterns of search behaviour over time. To improve the analysis, we mapped the queries to UMLS semantic types and MeSH categories. Our analysis shows that, as expected, diseases and health care activities are the ones of most interest, but also a clear increased interest in geographical locations in the setting of health care providers. We also note a change over time in which kinds of diseases are of interest. Finally, we conclude that this type of analysis may be useful in studies of what health related topics matter to the public, but also for design and follow-up of public information campaigns.
Kondylakis, Haridimos; Spanakis, Emmanouil G; Sfakianakis, Stelios; Sakkalis, Vangelis; Tsiknakis, Manolis; Marias, Kostas; Xia Zhao; Hong Qing Yu; Feng Dong
2015-08-01
The advancements in healthcare practice have brought to the fore the need for flexible access to health-related information and created an ever-growing demand for the design and the development of data management infrastructures for translational and personalized medicine. In this paper, we present the data management solution implemented for the MyHealthAvatar EU research project, a project that attempts to create a digital representation of a patient's health status. The platform is capable of aggregating several knowledge sources relevant for the provision of individualized personal services. To this end, state of the art technologies are exploited, such as ontologies to model all available information, semantic integration to enable data and query translation and a variety of linking services to allow connecting to external sources. All original information is stored in a NoSQL database for reasons of efficiency and fault tolerance. Then it is semantically uplifted through a semantic warehouse which enables efficient access to it. All different technologies are combined to create a novel web-based platform allowing seamless user interaction through APIs that support personalized, granular and secure access to the relevant information.
Ontologies as integrative tools for plant science
Walls, Ramona L.; Athreya, Balaji; Cooper, Laurel; Elser, Justin; Gandolfo, Maria A.; Jaiswal, Pankaj; Mungall, Christopher J.; Preece, Justin; Rensing, Stefan; Smith, Barry; Stevenson, Dennis W.
2012-01-01
Premise of the study: Bio-ontologies are essential tools for accessing and analyzing the rapidly growing pool of plant genomic and phenomic data. Ontologies provide structured vocabularies to support consistent aggregation of data and a semantic framework for automated analyses and reasoning. They are a key component of the semantic web. Methods: This paper provides background on what bio-ontologies are, why they are relevant to botany, and the principles of ontology development. It includes an overview of ontologies and related resources that are relevant to plant science, with a detailed description of the Plant Ontology (PO). We discuss the challenges of building an ontology that covers all green plants (Viridiplantae). Key results: Ontologies can advance plant science in four key areas: (1) comparative genetics, genomics, phenomics, and development; (2) taxonomy and systematics; (3) semantic applications; and (4) education. Conclusions: Bio-ontologies offer a flexible framework for comparative plant biology, based on common botanical understanding. As genomic and phenomic data become available for more species, we anticipate that the annotation of data with ontology terms will become less centralized, while at the same time the need for cross-species queries will become more common, causing more researchers in plant science to turn to ontologies. PMID:22847540
Remembering the Important Things: Semantic Importance in Stream Reasoning
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yan, Rui; Greaves, Mark T.; Smith, William P.
Reasoning and querying over data streams rely on the ability to deliver a sequence of stream snapshots to the processing algorithms. These snapshots are typically provided using windows as views into streams, together with associated window management strategies. Generally, the goal of any window management strategy is to preserve the most important data in the current window and preferentially evict the rest, so that the retained data can continue to be exploited. A simple timestamp-based strategy is first-in-first-out (FIFO), in which items are replaced in strict order of arrival. All timestamp-based strategies implicitly assume that a temporal ordering reliably reflects importance to the processing task at hand, and thus that window management using timestamps will maximize the ability of the processing algorithms to deliver accurate interpretations of the stream. In this work, we explore a general notion of semantic importance that can be used for window management for streams of RDF data using semantically-aware processing algorithms like deduction or semantic query. Semantic importance exploits the information carried in RDF and surrounding ontologies to rank window data in terms of its likely contribution to the processing algorithms. We explore the general semantic categories of query contribution, provenance, and trustworthiness, as well as the contribution of domain-specific ontologies. We describe how these categories behave using several concrete examples. Finally, we consider how a stream window management strategy based on semantic importance could improve overall processing performance, especially as available window sizes decrease.
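The contrast between FIFO and importance-based eviction can be sketched with a fixed-size window that evicts its least important item on overflow. The importance scores are assumed to come from some semantic ranking (query contribution, provenance, trustworthiness); the values here are invented:

```python
import heapq

class SemanticWindow:
    """Fixed-capacity stream window that evicts by importance, not arrival order."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.heap = []            # min-heap: least important item on top

    def push(self, timestamp, item, importance):
        heapq.heappush(self.heap, (importance, timestamp, item))
        if len(self.heap) > self.capacity:
            heapq.heappop(self.heap)   # evict the lowest-importance item

    def contents(self):
        return sorted(item for _, _, item in self.heap)

w = SemanticWindow(capacity=2)
w.push(1, "t1", importance=0.9)
w.push(2, "t2", importance=0.1)
w.push(3, "t3", importance=0.5)   # overflow: t2 evicted despite being newer
print(w.contents())  # ['t1', 't3'] — FIFO would instead have kept t2 and t3
```

A FIFO window of the same capacity would have evicted `t1`, the item most likely to contribute to downstream deduction, which is exactly the failure mode semantic importance is meant to avoid.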
Mining Longitudinal Web Queries: Trends and Patterns.
ERIC Educational Resources Information Center
Wang, Peiling; Berry, Michael W.; Yang, Yiheng
2003-01-01
Analyzed user queries submitted to an academic Web site during a four-year period, using a relational database, to examine users' query behavior, to identify problems they encounter, and to develop techniques for optimizing query analysis and mining. Linguistic analyses focus on query structures, lexicon, and word associations using statistical…
The health care and life sciences community profile for dataset descriptions
Alexiev, Vladimir; Ansell, Peter; Bader, Gary; Baran, Joachim; Bolleman, Jerven T.; Callahan, Alison; Cruz-Toledo, José; Gaudet, Pascale; Gombocz, Erich A.; Gonzalez-Beltran, Alejandra N.; Groth, Paul; Haendel, Melissa; Ito, Maori; Jupp, Simon; Juty, Nick; Katayama, Toshiaki; Kobayashi, Norio; Krishnaswami, Kalpana; Laibe, Camille; Le Novère, Nicolas; Lin, Simon; Malone, James; Miller, Michael; Mungall, Christopher J.; Rietveld, Laurens; Wimalaratne, Sarala M.; Yamaguchi, Atsuko
2016-01-01
Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets. PMID:27602295
SoyBase Simple Semantic Web Architecture and Protocol (SSWAP) Services
USDA-ARS?s Scientific Manuscript database
Semantic web technologies offer the potential to link internet resources and data by shared concepts without having to rely on absolute lexical matches. Thus two web sites or web resources which are concerned with similar data types could be identified based on similar semantics. In the biological...
Linked data scientometrics in semantic e-Science
NASA Astrophysics Data System (ADS)
Narock, Tom; Wimmer, Hayden
2017-03-01
The Semantic Web is inherently multi-disciplinary and many domains have taken advantage of semantic technologies, yet the geosciences are one of the fields leading the way in Semantic Web adoption and validation. Astronomy, Earth science, hydrology, and solar-terrestrial physics have seen a noteworthy amount of semantic integration. The geoscience community has been a willing early adopter of semantic technologies and has provided essential feedback to the broader Semantic Web community. However, there has been no systematic study of the community as a whole, and no quantitative data exist on the impact and status of semantic technologies in the geosciences. We explore the applicability of Linked Data to scientometrics in the geosciences. In doing so, we gain an initial understanding of the breadth and depth of the Semantic Web in the geosciences, and identify what appears to be a transitionary period in the applicability of these technologies.
Provenance Usage in the OceanLink Project
NASA Astrophysics Data System (ADS)
Narock, T.; Arko, R. A.; Carbotte, S. M.; Chandler, C. L.; Cheatham, M.; Fils, D.; Finin, T.; Hitzler, P.; Janowicz, K.; Jones, M.; Krisnadhi, A.; Lehnert, K. A.; Mickle, A.; Raymond, L. M.; Schildhauer, M.; Shepherd, A.; Wiebe, P. H.
2014-12-01
A wide spectrum of maturing methods and tools, collectively characterized as the Semantic Web, is helping to vastly improve the dissemination of scientific research. The OceanLink project, an NSF EarthCube Building Block, is utilizing semantic technologies to integrate geoscience data repositories, library holdings, conference abstracts, and funded research awards. Provenance is a vital component in meeting both the scientific and engineering requirements of OceanLink. Provenance plays a key role in justification and understanding when presenting users with results aggregated from multiple sources. In the engineering sense, provenance enables the identification of new data and the ability to determine which data sources to query. Additionally, OceanLink will leverage human and machine computation for crowdsourcing, text mining, and co-reference resolution. The results of these computations, and their associated provenance, will be folded back into the constituent systems to continually enhance precision and utility. We will touch on the various roles provenance is playing in OceanLink as well as present our use of the PROV Ontology and associated Ontology Design Patterns.
Semantic integration of data on transcriptional regulation.
Baitaluk, Michael; Ponomarenko, Julia
2010-07-01
Experimental and predicted data concerning gene transcriptional regulation are distributed among many heterogeneous sources. However, there are no resources to integrate these data automatically or to provide a 'one-stop shop' experience for users seeking information essential for deciphering and modeling gene regulatory networks. IntegromeDB, a semantic graph-based 'deep-web' data integration system that automatically captures, integrates and manages publicly available data concerning transcriptional regulation, as well as other relevant biological information, is proposed in this article. The problems associated with data integration are addressed by ontology-driven data mapping, multiple data annotation and heterogeneous data querying, also enabling integration of the user's data. IntegromeDB integrates over 100 experimental and computational data sources relating to genomics, transcriptomics, genetics, and functional and interaction data concerning gene transcriptional regulation in eukaryotes and prokaryotes. IntegromeDB is accessible through the integrated research environment BiologicalNetworks at http://www.BiologicalNetworks.org. Contact: baitaluk@sdsc.edu. Supplementary data are available at Bioinformatics online.
SATORI: a system for ontology-guided visual exploration of biomedical data repositories.
Lekschas, Fritz; Gehlenborg, Nils
2018-04-01
The ever-increasing number of biomedical datasets provides tremendous opportunities for re-use but current data repositories provide limited means of exploration apart from text-based search. Ontological metadata annotations provide context by semantically relating datasets. Visualizing this rich network of relationships can improve the explorability of large data repositories and help researchers find datasets of interest. We developed SATORI, an integrative search and visual exploration interface for the exploration of biomedical data repositories. The design is informed by a requirements analysis through a series of semi-structured interviews. We evaluated the implementation of SATORI in a field study on a real-world data collection. SATORI enables researchers to seamlessly search, browse and semantically query data repositories via two visualizations that are highly interconnected with a powerful search interface. SATORI is an open-source web application, which is freely available at http://satori.refinery-platform.org and integrated into the Refinery Platform. Contact: nils@hms.harvard.edu. Supplementary data are available at Bioinformatics online.
Temporal Representation in Semantic Graphs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Levandoski, J J; Abdulla, G M
2007-08-07
A wide range of knowledge discovery and analysis applications, from business to biology, make use of semantic graphs when modeling relationships and concepts. Most of the semantic graphs used in these applications are assumed to be static pieces of information, meaning that the temporal evolution of concepts and relationships is not taken into account. Guided by the need for more advanced semantic graph queries involving temporal concepts, this paper surveys the existing work on temporal representations in semantic graphs.
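The core idea the survey motivates can be illustrated with a minimal sketch: a semantic graph whose edges carry validity intervals, supporting queries at a point in time. The edge tuples, node names, and the query function below are invented for illustration and are not drawn from any of the surveyed systems.

```python
# Each edge: (subject, predicate, object, valid_from, valid_to)
edges = [
    ("geneA", "regulates", "geneB", 2001, 2005),
    ("geneA", "regulates", "geneC", 2004, 2010),
    ("geneB", "interacts_with", "geneC", 1998, 2003),
]

def neighbors_at(graph, node, year):
    """Return (predicate, object) pairs valid for `node` in `year`."""
    return [(p, o) for (s, p, o, t0, t1) in graph
            if s == node and t0 <= year <= t1]

# Both regulation edges are valid in 2004; the interaction edge has expired.
print(neighbors_at(edges, "geneA", 2004))
```

A static-graph query would return every edge regardless of time; adding the interval check is the smallest step toward the temporal queries the survey discusses.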
NASA Technical Reports Server (NTRS)
Aspinall, David; Denney, Ewen; Lueth, Christoph
2012-01-01
We motivate and introduce a query language PrQL designed for inspecting machine representations of proofs. PrQL natively supports hiproofs which express proof structure using hierarchical nested labelled trees. The core language presented in this paper is locally structured (first-order), with queries built using recursion and patterns over proof structure and rule names. We define the syntax and semantics of locally structured queries, demonstrate their power, and sketch some implementation experiments.
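The notion of hiproofs as hierarchical nested labelled trees, queried recursively by rule name, can be sketched as follows. This is an illustrative Python model, not PrQL itself; the tree shape and rule names are hypothetical.

```python
# A proof is modeled as ("rule name", [list of subproofs]).
def find_rule(proof, rule):
    """Yield the path of rule labels to every node labelled `rule`."""
    label, children = proof
    if label == rule:
        yield [label]
    for child in children:
        for path in find_rule(child, rule):
            yield [label] + path

proof = ("induction", [
    ("base_case", [("simp", [])]),
    ("step_case", [("intro", []), ("simp", [])]),
])

# Recursively locate every application of the "simp" rule.
print(list(find_rule(proof, "simp")))
```

A query language like PrQL expresses such recursive patterns declaratively; this sketch only shows the underlying tree traversal.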
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zamora, Antonio
Advanced Natural Language Processing Tools for Web Information Retrieval, Content Analysis, and Synthesis. The goal of this SBIR was to implement and evaluate several advanced Natural Language Processing (NLP) tools and techniques to enhance the precision and relevance of search results by analyzing and augmenting search queries and by helping to organize the search output obtained from heterogeneous databases and web pages containing textual information of interest to DOE and the scientific-technical user communities in general. The SBIR investigated 1) the incorporation of spelling checkers in search applications, 2) identification of significant phrases and concepts using a combination of linguistic and statistical techniques, and 3) enhancement of the query interface and search retrieval results through the use of semantic resources, such as thesauri. A search program with a flexible query interface was developed to search reference databases with the objective of enhancing search results from web queries or queries of specialized search systems such as DOE's Information Bridge. The DOE ETDE/INIS Joint Thesaurus was processed to create a searchable database. Term frequencies and term co-occurrences were used to enhance web information retrieval by providing algorithmically derived, objective criteria to organize relevant documents into clusters containing significant terms. A thesaurus provides an authoritative overview and classification of a field of knowledge. By organizing the results of a search using the thesaurus terminology, the output is more meaningful than when the results are organized only by the terms that co-occur in the retrieved documents, some of which may not be significant. An attempt was made to take advantage of the hierarchy provided by broader and narrower terms, as well as other field-specific information in the thesauri.
The search program uses linguistic morphological routines to find relevant entries regardless of whether terms are stored in singular or plural form. Implementation of additional inflectional morphology processes for verbs could enhance retrieval further, but this has to be balanced against the possibility of broadening the results too much. In addition to the DOE energy thesaurus, other sources of specialized organized knowledge, such as the Medical Subject Headings (MeSH), the Unified Medical Language System (UMLS), and Wikipedia, were investigated. The supporting role of the NLP thesaurus search program was enhanced by incorporating spelling aid and a part-of-speech tagger to cope with misspellings in the queries and to determine the grammatical roles of the query words and identify nouns for special processing. To improve precision, multiple modes of searching were implemented, including Boolean operators and field-specific searches. Programs to convert a thesaurus or reference file into searchable support files can be deployed easily, and the resulting files are immediately searchable to produce relevance-ranked results with built-in spelling aid, morphological processing, and advanced search logic. Demonstration systems were built for several databases, including the DOE energy thesaurus.
A service-oriented distributed semantic mediator: integrating multiscale biomedical information.
Mora, Oscar; Engelbrecht, Gerhard; Bisbal, Jesus
2012-11-01
Biomedical research continuously generates large amounts of heterogeneous and multimodal data spread over multiple data sources. These data, if appropriately shared and exploited, could dramatically improve the research practice itself, and ultimately the quality of health care delivered. This paper presents DISMED (DIstributed Semantic MEDiator), an open source semantic mediator that provides a unified view of a federated environment of multiscale biomedical data sources. DISMED is a Web-based software application to query and retrieve information distributed over a set of registered data sources, using semantic technologies. It also offers a user-friendly interface specifically designed to simplify the usage of these technologies by non-expert users. Although the architecture of the software mediator is generic and domain independent, in the context of this paper, DISMED has been evaluated for managing biomedical environments and facilitating research with respect to the handling of scientific data distributed in multiple heterogeneous data sources. As part of this contribution, a quantitative evaluation framework has been developed. It consists of a benchmarking scenario and the definition of five realistic use-cases. This framework, created entirely with public datasets, has been used to compare the performance of DISMED against other available mediators. It is also available to the scientific community in order to evaluate progress in the domain of semantic mediation, in a systematic and comparable manner. The results show an average improvement in the execution time by DISMED of 55% compared to the second best alternative in four out of the five use-cases of the experimental evaluation.
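The mediator pattern behind such a system can be sketched in a few lines: a query is fanned out to registered source adapters and the results are merged into one unified view. This is an illustrative sketch, not DISMED's actual code; the adapter names and record fields are invented.

```python
# Two hypothetical source adapters, each answering the same query interface.
def genomics_source(query):
    return [{"id": "g1", "source": "genomics", "match": query}]

def imaging_source(query):
    return [{"id": "i7", "source": "imaging", "match": query}]

class Mediator:
    """Fans a query out to every registered source and merges the results."""
    def __init__(self):
        self.sources = []

    def register(self, adapter):
        self.sources.append(adapter)

    def query(self, q):
        merged = []
        for adapter in self.sources:  # one query, many heterogeneous sources
            merged.extend(adapter(q))
        return merged

m = Mediator()
m.register(genomics_source)
m.register(imaging_source)
print(len(m.query("BRCA1")))  # records from both registered sources
```

A semantic mediator adds ontology-driven mapping on top of this skeleton, so that the merged records share one vocabulary rather than each source's own schema.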
Developing a Domain Ontology: the Case of Water Cycle and Hydrology
NASA Astrophysics Data System (ADS)
Gupta, H.; Pozzi, W.; Piasecki, M.; Imam, B.; Houser, P.; Raskin, R.; Ramachandran, R.; Martinez Baquero, G.
2008-12-01
A semantic web ontology enables semantic data integration and semantic smart searching. Several organizations have attempted to implement smart registration and integration or searching using ontologies. These are the NOESIS (NSF project: LEAD) and HydroSeek (NSF project: CUAHSI HIS) data discovery engines and the NSF project GEON. All three applications use ontologies to discover data from multiple sources and projects. The NASA WaterNet project was established to identify creative, innovative ways to bridge NASA research results to real world applications, linking decision support needs to available data, observations, and modeling capability. WaterNet utilized the smart query tool NOESIS as a testbed to test whether different ontologies (and different catalog searches) could be combined to match resources with user needs. NOESIS contains the upper-level SWEET ontology that accepts plug-in domain ontologies to refine user search queries, reducing the burden of multiple keyword searches. Another smart search interface is HydroSeek, developed for CUAHSI, which uses a multi-layered concept search ontology, tagging variable names from any number of data sources to specific leaf and higher-level concepts on which the search is executed. This approach has proven to be quite successful in mitigating semantic heterogeneity, as the user does not need to know the semantic specifics of each data source system but just uses a set of common keywords to discover the data for a specific temporal and geospatial domain. This presentation will show that tests with NOESIS and HydroSeek lead to the conclusion that the construction of a complex and highly heterogeneous water cycle ontology requires multiple ontology modules. To illustrate the complexity and heterogeneity of a water cycle ontology, HydroSeek successfully utilizes WaterOneFlow to integrate data across multiple different data collections, such as USGS NWIS.
However, different methodologies are employed by the Earth science, hydrological, and hydraulic engineering communities, and each community employs models that require different input data. If a sub-domain ontology describing water balance calculations is created for each of these, the resulting structure of the semantic network describing these various terms can be rather complex, heterogeneous, and overlapping, and will require "mapping" between equivalent terms in the ontologies, along with the development of an upper-level conceptual or domain ontology to utilize and link to those already in existence.
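The term-mapping requirement above can be illustrated with a toy example. The two vocabularies and the equivalence table are invented; a real upper-level ontology would assert such equivalences formally (e.g., in OWL) rather than in a lookup table.

```python
# Two hypothetical sub-domain vocabularies with different term URIs.
hydrology_terms = {"streamflow": "H:streamflow", "precipitation": "H:precip"}
earth_science_terms = {"river discharge": "E:discharge", "rainfall": "E:rain"}

# An upper-level mapping asserting equivalences between the vocabularies.
equivalences = {
    "H:streamflow": "E:discharge",
    "H:precip": "E:rain",
}

def translate(term):
    """Map a hydrology term to its Earth-science equivalent, if one exists."""
    uri = hydrology_terms.get(term)
    return equivalences.get(uri)

print(translate("streamflow"))  # → E:discharge
```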
NASA Astrophysics Data System (ADS)
Sauermann, Leo; Kiesel, Malte; Schumacher, Kinga; Bernardi, Ansgar
This contribution shows what the workplace of the future could look like and where the Semantic Web opens up new possibilities. To this end, approaches from the fields of the Semantic Web, knowledge representation, desktop applications, and visualization are presented that make it possible to reinterpret and reuse a user's existing data. The combination of Semantic Web and desktop computing brings particular advantages, a paradigm known as the Semantic Desktop. The described possibilities for application integration are not limited to the desktop, however, but can equally be used in web applications.
Linked Metadata - lightweight semantics for data integration (Invited)
NASA Astrophysics Data System (ADS)
Hendler, J. A.
2013-12-01
The "Linked Open Data" cloud (http://linkeddata.org) is currently used to show how the linking of datasets, supported by SPARQL endpoints, is creating a growing set of linked data assets. This linked data space has been growing rapidly, and the last version collected is estimated to have had over 35 billion 'triples.' As impressive as this may sound, there is an inherent flaw in the way the linked data story is conceived. The idea is that all of the data is represented in a linked format (generally RDF) and applications will essentially query this cloud and provide mashup capabilities between the various kinds of data that are found. The view of linking in the cloud is fairly simple: links are provided by either shared URIs or by URIs that are asserted to be owl:sameAs. This view of the linking, which primarily focuses on shared objects and subjects in RDF's subject-predicate-object representation, misses a critical aspect of Semantic Web technology. Given triples such as (A:person1 foaf:knows A:person2), (B:person3 foaf:knows B:person4), and (C:person5 foaf:name 'John Doe'), this view would not consider them linked (barring other assertions) even though they share a common vocabulary. In fact, we get significant clues that there are commonalities in these data items from the shared namespaces and predicates, even if the traditional 'graph' view of RDF doesn't appear to join on these. Thus, it is the linking of the data descriptions, whether as metadata or other vocabularies, that provides the linking in these cases. This observation is crucial to scientific data integration where the size of the datasets, or even the individual relationships within them, can be quite large. (Note that this is not restricted to scientific data - search engines, social networks, and massive multiuser games also create huge amounts of data.) To convert all the triples into RDF and provide individual links is often unnecessary, and is both time and space intensive.
Those looking to do on-the-fly integration may prefer to do more traditional data queries and then convert and link the 'views' returned at retrieval time, providing another means of using the linked data infrastructure without having to convert whole datasets to triples to provide linking. Web companies have been taking advantage of 'lightweight' semantic metadata for search quality and optimization (cf. schema.org), linking networks within and without web sites (cf. Facebook's Open Graph Protocol), and in doing various kinds of advertisement and user modeling across datasets. Scientific metadata, on the other hand, has traditionally been geared at being large-scale and highly descriptive, and scientific ontologies have been aimed at high expressivity, essentially providing complex reasoning services rather than the less expressive vocabularies needed for data discovery and the simple mappings that can assist humans (or more complex systems) when full-scale integration is needed. Although this work is just the beginning for providing integration, as the community creates more and more datasets, discovery of these data resources on the Web becomes a crucial starting place. Simple descriptors, which can be combined with textual fields and/or common community vocabularies, can be a great starting place for bringing scientific data into the Web of Data that is growing in other communities. References: [1] Pouchard, Line C., et al. "A Linked Science investigation: enhancing climate change data discovery with semantic technologies." Earth Science Informatics 6.3 (2013): 175-185.
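The central observation of this abstract, that triples sharing no subjects or objects can still be related through a common vocabulary, can be sketched concretely. Triples are modeled as plain tuples, following the foaf examples given in the abstract.

```python
# The three triples from the abstract: no shared URIs, but a shared namespace.
triples = [
    ("A:person1", "foaf:knows", "A:person2"),
    ("B:person3", "foaf:knows", "B:person4"),
    ("C:person5", "foaf:name", "John Doe"),
]

def group_by_namespace(ts):
    """Cluster triples by the namespace prefix of their predicate."""
    groups = {}
    for s, p, o in ts:
        ns = p.split(":", 1)[0]
        groups.setdefault(ns, []).append((s, p, o))
    return groups

# All three triples fall into one 'foaf' group despite sharing no URIs,
# which a purely subject/object-based graph join would miss.
print(len(group_by_namespace(triples)["foaf"]))
```

This is the "linking of the data descriptions" the abstract describes: the predicate vocabulary, not the node identifiers, supplies the integration clue.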
The STP (Solar-Terrestrial Physics) Semantic Web based on the RSS1.0 and the RDF
NASA Astrophysics Data System (ADS)
Kubo, T.; Murata, K. T.; Kimura, E.; Ishikura, S.; Shinohara, I.; Kasaba, Y.; Watari, S.; Matsuoka, D.
2006-12-01
In Solar-Terrestrial Physics (STP), it has been pointed out that the circulation and utilization of observation data among researchers are insufficient. To achieve interdisciplinary research, we need to overcome these circulation and utilization problems. Against this background, the authors' group has developed a world-wide database that manages meta-data of satellite and ground-based observation data files. It is noted that retrieving meta-data from the observation data and registering them to the database have so far been carried out by hand. Our goal is to establish the STP Semantic Web. The Semantic Web provides a common framework that allows a variety of data to be shared and reused across applications, enterprises, and communities. We also expect that secondary information related to observations, such as event information and associated news, will be shared over the networks. The most fundamental issue in this establishment is who generates, manages and provides meta-data in the Semantic Web. We developed an automatic meta-data collection system for the observation data using RSS (RDF Site Summary) 1.0. RSS1.0 is one of the XML-based markup languages based on the RDF (Resource Description Framework), designed for syndicating news and the contents of news-like sites. RSS1.0 is used to describe STP meta-data, such as data file name, file server address and observation date. To describe STP meta-data beyond the RSS1.0 vocabulary, we defined original vocabularies for STP resources using the RDF Schema. The RDF describes technical terms on the STP along with the Dublin Core Metadata Element Set, the standard for cross-domain information resource descriptions. Researchers' information on the STP is described with FOAF, an RDF/XML vocabulary for creating machine-readable metadata describing people. Using RSS1.0 as a meta-data distribution method, the workflow from retrieving meta-data to registering them into the database is automated.
This technique has been applied to several database systems, such as the DARTS database system and the NICT Space Weather Report Service. DARTS is a science database managed by ISAS/JAXA in Japan. We succeeded in automatically generating and collecting meta-data for CDF (Common Data Format) data, such as Reimei satellite data, provided by DARTS. We also created an RDF service for space weather reports and real-time global MHD simulation 3D data provided by NICT. Our Semantic Web system works as follows: the RSS1.0 documents generated on the data sites (ISAS and NICT) are automatically collected by a meta-data collection agent. The RDF documents are registered, and the agent extracts meta-data to store them in Sesame, an open source RDF database with support for RDF Schema inferencing and querying. The RDF database provides advanced retrieval processing that takes properties and relations into account. Finally, the STP Semantic Web provides automatic processing and high-level search not only for observation data but also for space weather news, physical events, technical terms and researcher information related to the STP.
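The meta-data distribution step above, describing an observation data file as an RSS1.0 item with a Dublin Core date element, can be sketched roughly. The element choices, file name, and server address below are illustrative, not the actual STP vocabulary or DARTS layout.

```python
# Minimal RSS1.0-style item template (illustrative subset of the format).
ITEM_TEMPLATE = """<item rdf:about="{url}">
  <title>{title}</title>
  <link>{url}</link>
  <dc:date>{date}</dc:date>
</item>"""

def make_item(file_name, server, date):
    """Render one observation data file as an RSS1.0 item fragment."""
    url = "http://{}/{}".format(server, file_name)
    return ITEM_TEMPLATE.format(url=url, title=file_name, date=date)

item = make_item("reimei_20061201.cdf", "darts.isas.jaxa.jp", "2006-12-01")
print("dc:date" in item and "reimei_20061201.cdf" in item)
```

A collection agent like the one described would poll such documents from each data site and load the extracted fields into the RDF store.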
Bratsas, Charalampos; Koutkias, Vassilis; Kaimakamis, Evangelos; Bamidis, Panagiotis; Maglaveras, Nicos
2007-01-01
Medical Computational Problem (MCP) solving is related to medical problems and their computerized algorithmic solutions. In this paper, an extension of an ontology-based model to fuzzy logic is presented, as a means to enhance the information retrieval (IR) procedure in semantic management of MCPs. We present herein the methodology followed for the fuzzy expansion of the ontology model, the fuzzy query expansion procedure, as well as an appropriate ontology-based Vector Space Model (VSM) that was constructed for efficient mapping of user-defined MCP search criteria and MCP acquired knowledge. The relevant fuzzy thesaurus is constructed by calculating the simultaneous occurrences of terms and the term-to-term similarities derived from the ontology that utilizes UMLS (Unified Medical Language System) concepts by using Concept Unique Identifiers (CUI), synonyms, semantic types, and broader-narrower relationships for fuzzy query expansion. The current approach constitutes a sophisticated advance for effective, semantics-based MCP-related IR.
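The fuzzy query expansion and vector-space matching described above can be sketched minimally: each query term is expanded with thesaurus terms weighted by a term-to-term similarity in [0, 1], then matched against documents by cosine similarity. The terms and weights below are invented, not UMLS-derived values.

```python
import math

# Fuzzy term-to-term similarities, as derived from an ontology (invented).
similarity = {
    "myocardial infarction": {"heart attack": 0.9, "chest pain": 0.4},
}

def fuzzy_expand(term, threshold=0.5):
    """Expand a term with related terms whose similarity meets a threshold."""
    expanded = {term: 1.0}
    for other, sim in similarity.get(term, {}).items():
        if sim >= threshold:
            expanded[other] = sim
    return expanded

def cosine(u, v):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in set(u) | set(v))
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)

query = fuzzy_expand("myocardial infarction")
doc = {"heart attack": 1.0}
print(round(cosine(query, doc), 3))
```

Without the fuzzy expansion the query and document share no terms and the cosine would be zero; the weighted expansion is what lets the VSM retrieve the document.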
Semantic Web technologies for the big data in life sciences.
Wu, Hongyan; Yamaguchi, Atsuko
2014-08-01
The life sciences field is entering an era of big data with the breakthroughs of science and technology. More and more big data-related projects and activities are being performed in the world. Life sciences data generated by new technologies are continuing to grow in not only size but also variety and complexity, with great speed. To ensure that big data has a major influence in the life sciences, comprehensive data analysis across multiple data sources and even across disciplines is indispensable. The increasing volume of data and the heterogeneous, complex varieties of data are two principal issues mainly discussed in life science informatics. The ever-evolving next-generation Web, characterized as the Semantic Web, is an extension of the current Web, aiming to provide information for not only humans but also computers to semantically process large-scale data. The paper presents a survey of big data in life sciences, big data related projects and Semantic Web technologies. The paper introduces the main Semantic Web technologies and their current situation, and provides a detailed analysis of how Semantic Web technologies address the heterogeneous variety of life sciences big data. The paper helps to understand the role of Semantic Web technologies in the big data era and how they provide a promising solution for the big data in life sciences.
The Semantic Web and Educational Technology
ERIC Educational Resources Information Center
Maddux, Cleborne D., Ed.
2008-01-01
The "Semantic Web" is an idea proposed by Tim Berners-Lee, the inventor of the "World Wide Web." The topic has been generating a great deal of interest and enthusiasm, and there is a rapidly growing body of literature dealing with it. This article attempts to explain how the Semantic Web would work, and explores short-term and long-term…
ERIC Educational Resources Information Center
Fast, Karl V.; Campbell, D. Grant
2001-01-01
Compares the implied ontological frameworks of the Open Archives Initiative Protocol for Metadata Harvesting and the World Wide Web Consortium's Semantic Web. Discusses current search engine technology, semantic markup, indexing principles of special libraries and online databases, and componentization and the distinction between data and…
PREDOSE: A Semantic Web Platform for Drug Abuse Epidemiology using Social Media
Cameron, Delroy; Smith, Gary A.; Daniulaityte, Raminta; Sheth, Amit P.; Dave, Drashti; Chen, Lu; Anand, Gaurish; Carlson, Robert; Watkins, Kera Z.; Falck, Russel
2013-01-01
Objectives The role of social media in biomedical knowledge mining, including clinical, medical and healthcare informatics, prescription drug abuse epidemiology and drug pharmacology, has become increasingly significant in recent years. Social media offers opportunities for people to share opinions and experiences freely in online communities, which may contribute information beyond the knowledge of domain professionals. This paper describes the development of a novel Semantic Web platform called PREDOSE (PREscription Drug abuse Online Surveillance and Epidemiology), which is designed to facilitate the epidemiologic study of prescription (and related) drug abuse practices using social media. PREDOSE uses web forum posts and domain knowledge, modeled in a manually created Drug Abuse Ontology (DAO) (pronounced dow), to facilitate the extraction of semantic information from User Generated Content (UGC). A combination of lexical, pattern-based and semantics-based techniques is used together with the domain knowledge to extract fine-grained semantic information from UGC. In a previous study, PREDOSE was used to obtain the datasets from which new knowledge in drug abuse research was derived. Here, we report on various platform enhancements, including an updated DAO, new components for relationship and triple extraction, and tools for content analysis, trend detection and emerging patterns exploration, which enhance the capabilities of the PREDOSE platform. Given these enhancements, PREDOSE is now more equipped to impact drug abuse research by alleviating traditional labor-intensive content analysis tasks. Methods Using custom web crawlers that scrape UGC from publicly available web forums, PREDOSE first automates the collection of web-based social media content for subsequent semantic annotation. 
The annotation scheme is modeled in the DAO, and includes domain specific knowledge such as prescription (and related) drugs, methods of preparation, side effects, routes of administration, etc. The DAO is also used to help recognize three types of data, namely: 1) entities, 2) relationships and 3) triples. PREDOSE then uses a combination of lexical and semantic-based techniques to extract entities and relationships from the scraped content, and a top-down approach for triple extraction that uses patterns expressed in the DAO. In addition, PREDOSE uses publicly available lexicons to identify initial sentiment expressions in text, and then a probabilistic optimization algorithm (from related research) to extract the final sentiment expressions. Together, these techniques enable the capture of fine-grained semantic information from UGC, and querying, search, trend analysis and overall content analysis of social media related to prescription drug abuse. Moreover, extracted data are also made available to domain experts for the creation of training and test sets for use in evaluation and refinements in information extraction techniques. Results A recent evaluation of the information extraction techniques applied in the PREDOSE platform indicates 85% precision and 72% recall in entity identification, on a manually created gold standard dataset. In another study, PREDOSE achieved 36% precision in relationship identification and 33% precision in triple extraction, through manual evaluation by domain experts. Given the complexity of the relationship and triple extraction tasks and the abstruse nature of social media texts, we interpret these as favorable initial results. Extracted semantic information is currently in use in an online discovery support system, by prescription drug abuse researchers at the Center for Interventions, Treatment and Addictions Research (CITAR) at Wright State University. 
Conclusion A comprehensive platform for entity, relationship, triple and sentiment extraction from such abstruse texts has never been developed for drug abuse research. PREDOSE has already demonstrated the importance of mining social media by providing data from which new findings in drug abuse research were uncovered. Given the recent platform enhancements, including the refined DAO, components for relationship and triple extraction, and tools for content, trend and emerging pattern analysis, it is expected that PREDOSE will play a significant role in advancing drug abuse epidemiology in future. PMID:23892295
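The top-down, pattern-based triple extraction PREDOSE applies can be sketched in simplified form. The regex pattern and the entity list below are toy assumptions, not the Drug Abuse Ontology's actual patterns or contents.

```python
import re

# Known drug entities (hypothetical stand-ins for DAO entries).
entities = {"loperamide", "kratom"}

# A toy surface pattern: <entity> (helps with|used for) <object>.
pattern = re.compile(r"(\w+)\s+(helps with|used for)\s+(\w+)")

def extract_triples(post):
    """Extract (subject, relation, object) triples anchored on known entities."""
    triples = []
    for m in pattern.finditer(post.lower()):
        subj, rel, obj = m.groups()
        if subj in entities:  # only keep triples whose subject is in the ontology
            triples.append((subj, rel, obj))
    return triples

print(extract_triples("Kratom helps with withdrawal, he said."))
```

Real forum text is far noisier than this, which is why the reported triple-extraction precision (33%) is modest despite the domain knowledge anchoring.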
A model-driven approach for representing clinical archetypes for Semantic Web environments.
Martínez-Costa, Catalina; Menárguez-Tortosa, Marcos; Fernández-Breis, Jesualdo Tomás; Maldonado, José Alberto
2009-02-01
The life-long clinical information of any person supported by electronic means configures his Electronic Health Record (EHR). This information is usually distributed among several independent and heterogeneous systems that may be syntactically or semantically incompatible. There are currently different standards for representing and exchanging EHR information among different systems. In advanced EHR approaches, clinical information is represented by means of archetypes. Most of these approaches use the Archetype Definition Language (ADL) to specify archetypes. However, ADL has some drawbacks when attempting to perform semantic activities in Semantic Web environments. In this work, Semantic Web technologies are used to specify clinical archetypes for advanced EHR architectures. The advantages of using the Ontology Web Language (OWL) instead of ADL are described and discussed in this work. Moreover, a solution combining Semantic Web and Model-driven Engineering technologies is proposed to transform ADL into OWL for the CEN EN13606 EHR architecture.
SAFOD Brittle Microstructure and Mechanics Knowledge Base (SAFOD BM2KB)
NASA Astrophysics Data System (ADS)
Babaie, H. A.; Hadizadeh, J.; di Toro, G.; Mair, K.; Kumar, A.
2008-12-01
We have developed a knowledge base to store and present the data collected by a group of investigators studying the microstructures and mechanics of brittle faulting using core samples from the SAFOD (San Andreas Fault Observatory at Depth) project. The investigations are carried out with a variety of analytical and experimental methods, primarily to better understand the physics of strain localization in fault gouge. The knowledge base instantiates a specially designed brittle rock deformation ontology developed at Georgia State University. The inference rules embedded in the Semantic Web languages used in our ontology, such as OWL, RDF, and RDFS, allow the Pellet reasoner used in this application to derive additional truths about the ontology and knowledge of this domain. Access to the knowledge base is via a public website, which is designed to provide the knowledge acquired by all the investigators involved in the project. The stored data will be products of studies such as: experiments (e.g., high-velocity friction experiments), analyses (e.g., microstructural, chemical, mass transfer, mineralogical, surface, image, texture), microscopy (optical, HRSEM, FESEM, HRTEM), tomography, porosity measurement, microprobe, and cathodoluminescence. Data about laboratories, experimental conditions, methods, assumptions, equipment, and the mechanical properties and lithology of the studied samples will also be presented on the website per investigation. The ontology was modeled using UML (Unified Modeling Language) in Rational Rose, and implemented in OWL-DL (Ontology Web Language) using the Protégé ontology editor. The UML model was converted to OWL-DL by first mapping it to Ecore (.ecore) and Generator model (.genmodel) files with the help of the EMF (Eclipse Modeling Framework) plugin in Eclipse. The Ecore model was then mapped to a .uml file, which was later converted into an .owl file and subsequently imported into the Protégé ontology editing environment.
The web interface was developed in Java using Eclipse as the IDE. The web interfaces to query and submit data were implemented using JSP, servlets, JavaScript, and AJAX. The Jena API, a Java framework for building Semantic Web applications, was used to develop the web interface. Jena provides a programmatic environment for RDF, RDFS, and OWL, and a SPARQL query engine. Building web applications with AJAX helps retrieve data from the server asynchronously in the background without interfering with the display and behavior of the existing page. The application was deployed on an Apache Tomcat server at GSU. The SAFOD BM2KB website provides user-friendly search, submit, feedback, and other services. The General Search option allows users to search the knowledge base by selecting classes (e.g., Experiment, Surface Analysis), their respective attributes (e.g., apparatus, date performed), and the relationships to other classes (e.g., Sample, Laboratory). The Search by Sample option allows users to search the knowledge base by sample number. The Search by Investigator option lets users search the knowledge base by choosing an investigator involved in the project. The website also allows users to submit new data. The Submit Data option opens a page where users can submit SAFOD data to our knowledge base by selecting specific classes and attributes. The submitted data then become available for query as part of the knowledge base. The SAFOD BM2KB can be accessed from the main SAFOD website.
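The "General Search" idea, filtering knowledge base instances by class and attribute values, can be sketched with plain data structures. The class names, attributes, and records below are invented examples, not the actual SAFOD BM2KB schema, and a real implementation would issue SPARQL queries via Jena rather than filter in memory.

```python
# Hypothetical instance records, each tagged with its ontology class.
records = [
    {"class": "Experiment", "apparatus": "rotary shear", "sample": "S-12"},
    {"class": "SurfaceAnalysis", "apparatus": "FESEM", "sample": "S-12"},
    {"class": "Experiment", "apparatus": "triaxial press", "sample": "S-07"},
]

def general_search(cls=None, **attrs):
    """Return records matching an optional class and attribute filters."""
    hits = []
    for r in records:
        if cls and r["class"] != cls:
            continue
        if all(r.get(k) == v for k, v in attrs.items()):
            hits.append(r)
    return hits

# Search by class plus sample number, mirroring the two search options above.
print(len(general_search(cls="Experiment", sample="S-12")))
```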
Rassinoux, A-M
2011-01-01
To summarize excellent current research in the field of knowledge representation and management (KRM). A synopsis of the articles selected for the IMIA Yearbook 2011 is provided, and an attempt is made to highlight the current trends in the field. Over the last decade, with the extension of the text-based web towards a semantically structured web, NLP techniques have experienced renewed interest for knowledge extraction. This trend is corroborated by the five papers selected for the KRM section of the Yearbook 2011. They all depict outstanding studies that exploit NLP technologies wherever possible in order to accurately extract meaningful information from various biomedical textual sources. Bringing semantic structure to the meaningful content of textual web pages affords the user cooperative sharing and intelligent retrieval of electronic data. As exemplified by the best paper selection, more and more advanced biomedical applications aim at exploiting the meaningful richness of free-text documents in order to generate semantic metadata and, more recently, to learn and populate domain ontologies. The latter are becoming a key component, as they portray the semantics of the Semantic Web content. Maintaining their consistency with the documents and semantic annotations that refer to them is a crucial challenge for the Semantic Web in the coming years.
ERIC Educational Resources Information Center
Lytras, Miltiadis, Ed.; Naeve, Ambjorn, Ed.
2005-01-01
In the context of Knowledge Society, the convergence of knowledge and learning management is a critical milestone. "Intelligent Learning Infrastructure for Knowledge Intensive Organizations: A Semantic Web Perspective" provides state-of-the art knowledge through a balanced theoretical and technological discussion. The semantic web perspective…
Social Networking on the Semantic Web
ERIC Educational Resources Information Center
Finin, Tim; Ding, Li; Zhou, Lina; Joshi, Anupam
2005-01-01
Purpose: Aims to investigate the way that the semantic web is being used to represent and process social network information. Design/methodology/approach: The Swoogle semantic web search engine was used to construct several large data sets of Resource Description Framework (RDF) documents with social network information that were encoded using the…
Lifting Events in RDF from Interactions with Annotated Web Pages
NASA Astrophysics Data System (ADS)
Stühmer, Roland; Anicic, Darko; Sen, Sinan; Ma, Jun; Schmidt, Kay-Uwe; Stojanovic, Nenad
In this paper we present a method and an implementation for creating and processing semantic events from interactions with Web pages, which opens possibilities to build event-driven applications for the (Semantic) Web. Events, simple or complex, are models for things that happen, e.g., when a user interacts with a Web page. Events are consumed in some meaningful way, e.g., for monitoring or to trigger actions such as responses. In order for receiving parties to understand events, e.g., to comprehend what has led to an event, we propose a general event schema using RDFS. In this schema we cover the composition of complex events and event-to-event relationships. These events can then be used to route semantic information about an occurrence to different recipients, helping to make the Semantic Web active. Additionally, we present an architecture for detecting and composing events in Web clients. For the contents of events, we show how they are enriched with semantic information about the context in which they occurred. The paper is presented in conjunction with the use case of Semantic Advertising, which extends traditional clickstream analysis by introducing semantic short-term profiling, enabling discovery of the current interest of a Web user and thereby supporting advertisement providers in responding with more relevant advertisements.
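The composition of a complex event from simple interaction events can be sketched as follows. This is a minimal illustration of the idea only; the event fields, type names, and grouping rule are assumptions, not the authors' actual RDFS schema.

```python
# Hedged sketch: build a complex event whose members are matching simple
# events, preserving the event-to-event (member) relationship. All names
# and fields here are hypothetical.

simple_events = [
    {"type": "click", "target": "#ad-banner", "time": 1},
    {"type": "scroll", "target": "#article", "time": 2},
    {"type": "click", "target": "#buy-button", "time": 3},
]

def compose(events, member_types, complex_type):
    """Group matching simple events into one complex event."""
    members = [e for e in events if e["type"] in member_types]
    return {
        "type": complex_type,
        "members": members,                        # event-to-event links
        "time": max(e["time"] for e in members),   # time of completion
    }

purchase_intent = compose(simple_events, {"click"}, "PurchaseIntent")
print(len(purchase_intent["members"]))  # 2
```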
SciFlo: Semantically-Enabled Grid Workflow for Collaborative Science
NASA Astrophysics Data System (ADS)
Yunck, T.; Wilson, B. D.; Raskin, R.; Manipon, G.
2005-12-01
SciFlo is a system for Scientific Knowledge Creation on the Grid using a Semantically-Enabled Dataflow Execution Environment. SciFlo leverages Simple Object Access Protocol (SOAP) Web Services and Grid Computing standards (WS-* standards and the Globus Alliance toolkits), and enables scientists to do multi-instrument Earth science by assembling reusable SOAP services, native executables, local command-line scripts, and Python code into a distributed computing flow (a graph of operators). SciFlo's XML dataflow documents can be a mixture of concrete operators (fully bound operations) and abstract template operators (late binding via semantic lookup). All data objects and operators can be both simply typed (simple and complex types in XML Schema) and semantically typed using controlled vocabularies (linked to OWL ontologies such as SWEET). By exploiting ontology-enhanced search and inference, one can discover (and automatically invoke) Web Services and operators that have been semantically labeled as performing the desired transformation, and adapt a particular invocation to the proper interface (number, types, and meaning of inputs and outputs). The SciFlo client and server engines optimize the execution of such distributed data flows and allow the user to transparently find and use datasets and operators without worrying about the actual location of the Grid resources. The scientist injects a distributed computation into the Grid by simply filling out an HTML form or directly authoring the underlying XML dataflow document, and results are returned directly to the scientist's desktop. A visual programming tool is also being developed, but it is not required. Once an analysis has been specified for a granule or day of data, it can be easily repeated with different control parameters and over months or years of data. SciFlo uses and preserves semantics, and also generates and infers new semantic annotations.
Specifically, the SciFlo engine uses semantic metadata to understand (infer) what it is doing and potentially improve the data flow; preserves semantics by saving links to the semantics of (metadata describing) the input datasets, related datasets, and the data transformations (algorithms) used to generate downstream products; generates new metadata by allowing the user to add semantic annotations to the generated data products (or simply accept automatically generated provenance annotations); and infers new semantic metadata by understanding and applying logic to the semantics of the data and the transformations performed. Much ontology development still needs to be done but, nevertheless, SciFlo documents provide a substrate for using and preserving more semantics as ontologies develop. We will give a live demonstration of the growing SciFlo network using an example dataflow in which atmospheric temperature and water vapor profiles from three Earth Observing System (EOS) instruments are retrieved using SOAP (geo-location query & data access) services, co-registered, and visually & statistically compared on demand (see http://sciflo.jpl.nasa.gov for more information).
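The mixture of concrete and abstract operators described above can be sketched as a graph walked by an engine that resolves unbound steps through a lookup at execution time. This is a toy illustration of the late-binding idea, not SciFlo's XML dataflow format; the operator names and registry are invented.

```python
# Sketch of a dataflow with concrete operators (bound functions) and one
# abstract operator resolved by "semantic lookup" before execution.
# Names and data are hypothetical.

registry = {"subset-by-region": lambda data: [x for x in data if x["lat"] > 0]}

workflow = [
    {"op": "load", "impl": lambda _: [{"lat": 10, "t": 290}, {"lat": -5, "t": 300}]},
    {"op": "subset-by-region", "impl": None},  # abstract: late binding
    {"op": "mean-temperature",
     "impl": lambda data: sum(x["t"] for x in data) / len(data)},
]

def run(workflow, registry):
    data = None
    for step in workflow:
        impl = step["impl"] or registry[step["op"]]  # resolve if unbound
        data = impl(data)
    return data

print(run(workflow, registry))  # 290.0
```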
Searching the Web: The Public and Their Queries.
ERIC Educational Resources Information Center
Spink, Amanda; Wolfram, Dietmar; Jansen, Major B. J.; Saracevic, Tefko
2001-01-01
Reports findings from a study of searching behavior by over 200,000 users of the Excite search engine. Analysis of over one million queries revealed most people use few search terms, few modified queries, view few Web pages, and rarely use advanced search features. Concludes that Web searching by the public differs significantly from searching of…
A Generic Evaluation Model for Semantic Web Services
NASA Astrophysics Data System (ADS)
Shafiq, Omair
Semantic Web Services research has gained momentum over the last few years, and by now several realizations exist. They are being used in a number of industrial use cases. Soon software developers will be expected to use this infrastructure to build their B2B applications requiring dynamic integration. However, there is still a lack of guidelines for the evaluation of tools developed to realize Semantic Web Services and of the applications built on top of them. In normal software engineering practice such guidelines can already be found for traditional component-based systems, and some efforts are being made to build performance models for service-based systems. Drawing on these related efforts in component-oriented and service-based systems, we identified the need for a generic evaluation model for Semantic Web Services applicable to any realization. The generic evaluation model will help users and customers to orient their systems and solutions towards using Semantic Web Services. In this chapter, we present the requirements for the generic evaluation model for Semantic Web Services and discuss the initial steps that we took to sketch such a model. Finally, we discuss related activities for evaluating semantic technologies.
Mashup of Geo and Space Science Data Provided via Relational Databases in the Semantic Web
NASA Astrophysics Data System (ADS)
Ritschel, B.; Seelus, C.; Neher, G.; Iyemori, T.; Koyama, Y.; Yatagai, A. I.; Murayama, Y.; King, T. A.; Hughes, J. S.; Fung, S. F.; Galkin, I. A.; Hapgood, M. A.; Belehaki, A.
2014-12-01
The use of RDBMS for the storage and management of geo and space science data and/or metadata is very common. Although the information stored in tables is based on a data model and is therefore well organized and structured, a direct mashup with RDF-based data stored in triple stores is not possible. One solution to the problem consists of transforming the whole content into RDF structures and storing it in triple stores. Another interesting approach is the use of a specific system/service, such as D2RQ, for access to relational database content as virtual, read-only RDF graphs. The Semantic Web based proof-of-concept GFZ ISDC uses the triple store Virtuoso for the storage of general context information/metadata on geo and space science satellite and ground station data. Information about projects, platforms, instruments, persons, product types, etc. is available, but no detailed metadata about the data granules themselves. Such important information, e.g., the start or end time or the detailed spatial coverage of a single measurement, is stored only in RDBMS tables of the ISDC catalog system. In order to provide seamless access to all available information about the granules/data products, a mashup of the different data resources (triple store and RDBMS) is necessary. This paper describes the use of D2RQ for a Semantic Web/SPARQL based mashup of the relational databases used for the ISDC data server, but also for access to IUGONET and/or ESPAS and further geo and space science data resources.
Abbreviations: RDBMS, Relational Database Management System; RDF, Resource Description Framework; SPARQL, SPARQL Protocol And RDF Query Language; D2RQ, Accessing Relational Databases as Virtual RDF Graphs; GFZ ISDC, German Research Centre for Geosciences Information System and Data Center; IUGONET, Inter-university Upper Atmosphere Global Observation Network (a Japanese project); ESPAS, Near-Earth space data infrastructure for e-science (a European Union funded project).
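The D2RQ idea, exposing relational rows as a virtual, read-only RDF graph rather than materializing triples, can be sketched with a lazy generator. The table columns, URI scheme, and predicate names below are invented for illustration and are not the actual ISDC catalog schema.

```python
# Minimal sketch of a D2RQ-style mapping: relational rows become RDF
# triples on demand (a read-only virtual graph). Schema is hypothetical.

catalog_rows = [
    {"granule_id": "g1", "start_time": "2010-01-01T00:00Z", "product": "CH-OG-1-RSO"},
    {"granule_id": "g2", "start_time": "2010-01-02T00:00Z", "product": "CH-OG-1-RSO"},
]

def virtual_triples(rows):
    """Yield triples lazily from relational rows; nothing is materialized."""
    for row in rows:
        subject = "isdc:granule/" + row["granule_id"]
        yield (subject, "isdc:startTime", row["start_time"])
        yield (subject, "isdc:productType", row["product"])

triples = list(virtual_triples(catalog_rows))
print(len(triples))  # 4
```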
An Analysis of Web Image Queries for Search.
ERIC Educational Resources Information Center
Pu, Hsiao-Tieh
2003-01-01
Examines the differences between Web image and textual queries, and attempts to develop an analytic model to investigate their implications for Web image retrieval systems. Provides results that give insight into Web image searching behavior and suggests implications for improvement of current Web image search engines. (AEF)
Information integration from heterogeneous data sources: a Semantic Web approach.
Kunapareddy, Narendra; Mirhaji, Parsa; Richards, David; Casscells, S Ward
2006-01-01
Although the decentralized and autonomous implementation of health information systems has made it possible to extend the reach of surveillance systems to a variety of contextually disparate domains, public health use of data from these systems was not primarily anticipated. The Semantic Web has been proposed to address both representational and semantic heterogeneity in distributed and collaborative environments. We introduce a semantic approach for the integration of health data using the Resource Description Framework (RDF) and the Simple Knowledge Organization System (SKOS) developed by the Semantic Web community.
Discovering Central Practitioners in a Medical Discussion Forum Using Semantic Web Analytics.
Rajabi, Enayat; Abidi, Syed Sibte Raza
2017-01-01
The aim of this paper is to investigate Semantic Web based methods to enrich and transform a medical discussion forum in order to perform semantics-driven social network analysis (SNA). We use centrality measures as well as semantic similarity metrics to identify the most influential practitioners within a discussion forum. The centrality results of our approach are in line with centrality measures produced by traditional SNA methods, thus validating the applicability of Semantic Web based methods for SNA, particularly for analyzing social networks in specialized discussion forums.
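One of the simplest centrality measures used in such validations is degree centrality over the reply graph. The sketch below illustrates that baseline computation only; the forum edges are fabricated, and the paper's semantics-driven enrichment is not reproduced here.

```python
# Sketch of in-degree centrality over a reply graph: how often each
# practitioner is replied to. Edges are invented for illustration.
from collections import Counter

# (author, replied_to) edges in a discussion thread
edges = [("drA", "drB"), ("drC", "drB"), ("drB", "drA"), ("drD", "drB")]

def degree_centrality(edges):
    """Count how many replies each practitioner receives."""
    return Counter(target for _, target in edges)

central = degree_centrality(edges)
print(central.most_common(1)[0][0])  # drB
```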
Design and Implementation of e-Health System Based on Semantic Sensor Network Using IETF YANG.
Jin, Wenquan; Kim, Do Hyeun
2018-02-20
e-Health systems now allow healthcare services to be delivered effectively to patients anytime and anywhere. These systems are developed using Information and Communication Technologies (ICT) and involve sensors, mobile devices, and web-based applications for the delivery of healthcare services and information. Remote healthcare is an important purpose of the e-Health system. Usually, an e-Health system includes heterogeneous sensors from diverse manufacturers producing data in different formats. Device interoperability and data normalization are challenging tasks that need research attention. Several solutions proposed in the literature are based on manual interpretation through explicit programming. However, programmatically implementing the interpretation of the data sender and data receiver for data transmission in an e-Health system is counterproductive, as modification will be required for each new device added to the system. In this paper, an e-Health system with a Semantic Sensor Network (SSN) is proposed to address the device interoperability issue. In the proposed system, we have used IETF YANG for modeling the semantic e-Health data to represent the information of e-Health sensors. This modeling scheme helps in provisioning semantic interoperability between devices and expressing the sensing data in a user-friendly manner. For this purpose, we have developed an ontology for e-Health data that supports different styles of data formats. The ontology is defined in YANG to provision semantic interpretation of sensing data in the system by constructing meta-models of e-Health sensors. The proposed approach assists in the auto-configuration of e-Health sensors and querying the sensor network with semantic interoperability support for the e-Health system.
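The normalization idea, a per-device meta-model that maps heterogeneous vendor payloads onto one semantic representation so the receiver needs no per-device code, can be sketched as follows. This is a conceptual illustration only; the field names and vendor payloads are assumptions, not the paper's YANG models.

```python
# Sketch of meta-model-driven normalization: each device type declares how
# its raw payload maps to a shared concept. All names are hypothetical.

metamodels = {
    "vendorA-hr": {"path": "hr", "unit": "bpm", "concept": "HeartRate"},
    "vendorB-pulse": {"path": "pulse_bpm", "unit": "bpm", "concept": "HeartRate"},
}

def normalize(device_type, payload):
    """Interpret a raw payload through the device's meta-model."""
    mm = metamodels[device_type]
    return {"concept": mm["concept"], "value": payload[mm["path"]], "unit": mm["unit"]}

a = normalize("vendorA-hr", {"hr": 72})
b = normalize("vendorB-pulse", {"pulse_bpm": 68})
# Different wire formats, one semantic representation:
print(a["concept"] == b["concept"])  # True
```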
A Semantically Enabled Metadata Repository for Solar Irradiance Data Products
NASA Astrophysics Data System (ADS)
Wilson, A.; Cox, M.; Lindholm, D. M.; Nadiadi, I.; Traver, T.
2014-12-01
The Laboratory for Atmospheric and Space Physics, LASP, has been conducting research in atmospheric and space science for over 60 years, and providing the associated data products to the public. LASP has a long history, in particular, of making space-based measurements of the solar irradiance, which serves as crucial input to several areas of scientific research, including solar-terrestrial interactions, atmospheric science, and climate. LISIRD, the LASP Interactive Solar Irradiance Data Center, serves these datasets to the public, including solar spectral irradiance (SSI) and total solar irradiance (TSI) data. The LASP extended metadata repository, LEMR, is a database of information about the datasets served by LASP, such as parameters, uncertainties, temporal and spectral ranges, current version, alerts, etc. It serves as the definitive, single source of truth for that information. The database is populated with information gathered via web forms and automated processes. Dataset owners keep the information current and verified for datasets under their purview. This information can be pulled dynamically for many purposes. Websites such as LISIRD can include this information in web page content as it is rendered, ensuring users get current, accurate information. It can also be pulled to create metadata records in various metadata formats, such as SPASE (for heliophysics) and ISO 19115. Once these records are made available to the appropriate registries, our data will be discoverable by users coming in via those organizations. The database is implemented as an RDF triplestore, a collection of subject-predicate-object data entities identifiable with URIs. This capability, coupled with SPARQL-over-HTTP read access, enables semantic queries over the repository contents. To create the repository we leveraged VIVO, an open source Semantic Web application, to manage and create new ontologies and populate repository content.
A variety of ontologies were used in creating the triplestore, including ontologies that came with VIVO such as FOAF. Also, the W3C DCAT ontology was integrated and extended to describe properties of our data products that we needed to capture, such as spectral range. The presentation will describe the architecture, ontology issues, and tools used to create LEMR and plans for its evolution.
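The single-source-of-truth pattern described above, dataset facts stored once as triples and rendered into both web pages and exported metadata records, can be sketched as follows. The dataset URI, predicates, and values are illustrative, not actual LEMR content.

```python
# Sketch of pulling one dataset's facts from a triplestore and rendering
# them for different consumers. Names and predicates are hypothetical.

store = [
    ("lisird:tsi", "dcat:title", "Total Solar Irradiance"),
    ("lisird:tsi", "lemr:version", "v19"),
    ("lisird:tsi", "lemr:spectralRange", "total"),
]

def describe(subject, store):
    """Collect all predicate/object pairs for one dataset URI."""
    return {p: o for s, p, o in store if s == subject}

meta = describe("lisird:tsi", store)
# The same facts feed a web page and an exported metadata record:
html_snippet = "<h1>%s (version %s)</h1>" % (meta["dcat:title"], meta["lemr:version"])
print(html_snippet)
```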
NASA Astrophysics Data System (ADS)
Petrie, C.; Margaria, T.; Lausen, H.; Zaremba, M.
Explores trade-offs among existing approaches. Reveals strengths and weaknesses of proposed approaches, as well as which aspects of the problem are not yet covered. Introduces a software engineering approach to evaluating Semantic Web services. Service-Oriented Computing is one of the most promising software engineering trends because of its potential to reduce the programming effort for future distributed industrial systems. However, only a small part of this potential rests on the standardization of tools offered by the web services stack. The larger part rests upon the development of sufficient semantics to automate service orchestration. Currently there are many different approaches to semantic web service descriptions and many frameworks built around them. A common understanding, evaluation scheme, and test bed to compare and classify these frameworks in terms of their capabilities and shortcomings are necessary to make progress in developing the full potential of Service-Oriented Computing. The Semantic Web Services Challenge is an open source initiative that provides a public evaluation and certification of multiple frameworks on common, industrially relevant problem sets. This edited volume reports on the first results in developing a common understanding of the various technologies intended to facilitate the automation of mediation, choreography, and discovery for Web Services using semantic annotations. Semantic Web Services Challenge: Results from the First Year is designed for a professional audience composed of practitioners and researchers in industry. Professionals can use this book to evaluate SWS technology for potential practical use. The book is also suitable for advanced-level students in computer science.
ERIC Educational Resources Information Center
Olaniran, Bolanle A.
2010-01-01
The semantic web describes the process whereby information content is made available for machine consumption. With increased reliance on information communication technologies, the semantic web promises effective and efficient information acquisition and dissemination of products and services in the global economy, in particular, e-learning.…
Practical Experiences for the Development of Educational Systems in the Semantic Web
ERIC Educational Resources Information Center
Sánchez Vera, Ma. del Mar; Tomás Fernández Breis, Jesualdo; Serrano Sánchez, José Luis; Prendes Espinosa, Ma. Paz
2013-01-01
Semantic Web technologies have been applied in educational settings for different purposes in recent years, with the type of application being mainly defined by the way in which knowledge is represented and exploited. The basic technology for knowledge representation in Semantic Web settings is the ontology, which represents a common, shareable…
Comprehensive Analysis of Semantic Web Reasoners and Tools: A Survey
ERIC Educational Resources Information Center
Khamparia, Aditya; Pandey, Babita
2017-01-01
Ontologies are emerging as the best representation techniques for knowledge-based context domains. The continuing need for interoperation, collaboration, and effective information retrieval has led to the creation of the Semantic Web with the help of tools and reasoners which manage personalized information. The future of the Semantic Web lies in an ontology…
Semantic Search of Web Services
ERIC Educational Resources Information Center
Hao, Ke
2013-01-01
This dissertation addresses semantic search of Web services using natural language processing. We first survey various existing approaches, focusing on the fact that the expensive costs of current semantic annotation frameworks result in limited use of semantic search for large scale applications. We then propose a vector space model based service…
An Intelligent Semantic E-Learning Framework Using Context-Aware Semantic Web Technologies
ERIC Educational Resources Information Center
Huang, Weihong; Webster, David; Wood, Dawn; Ishaya, Tanko
2006-01-01
Recent developments of e-learning specifications such as Learning Object Metadata (LOM), Sharable Content Object Reference Model (SCORM), Learning Design and other pedagogy research in semantic e-learning have shown a trend of applying innovative computational techniques, especially Semantic Web technologies, to promote existing content-focused…
GO2PUB: Querying PubMed with semantic expansion of gene ontology terms
2012-01-01
Background: With the development of high-throughput methods of gene analysis, there is a growing need for mining tools to retrieve relevant articles in PubMed. As PubMed grows, literature searches become more complex and time-consuming, so automated search tools with good precision and recall are necessary. We developed GO2PUB to automatically enrich PubMed queries with gene names, symbols, and synonyms annotated by a GO term of interest or one of its descendants. Results: GO2PUB enriches PubMed queries based on selected GO terms and keywords. It processes the result and displays the PMID, title, authors, abstract, and bibliographic references of the articles. Gene names, symbols, and synonyms that have been generated as extra keywords from the GO terms are also highlighted. GO2PUB is based on a semantic expansion of PubMed queries using the semantic inheritance between terms through the GO graph. Two experts manually assessed the relevance of GO2PUB, GoPubMed, and PubMed on three queries about lipid metabolism. Expert agreement was high (kappa = 0.88). GO2PUB returned 69% of the relevant articles, GoPubMed 40%, and PubMed 29%. GO2PUB and GoPubMed have 17% of their results in common, corresponding to 24% of the total number of relevant results. 70% of the articles returned by more than one tool were relevant. 36% of the relevant articles were returned only by GO2PUB, 17% only by GoPubMed, and 14% only by PubMed. To determine whether these results can be generalized, we generated twenty queries based on random GO terms with a granularity similar to that of the first three queries and compared the proportions of GO2PUB and GoPubMed results. These were 77% and 40%, respectively, for the first queries, and 70% and 38% for the random queries. The two experts also assessed the relevance of seven of the twenty queries (the three related to lipid metabolism and four related to other domains). Expert agreement was high (kappa = 0.93 and 0.8).
GO2PUB and GoPubMed performances were similar to those for the first queries. Conclusions: We demonstrated that the use of genes annotated by either GO terms of interest or a descendant of these GO terms yields relevant articles ignored by other tools. The comparison of GO2PUB, based on semantic expansion, with GoPubMed, based on text-mining techniques, showed that the two tools are complementary. The analysis of the randomly generated queries suggests that the results obtained for lipid metabolism can be generalized to other biological processes. GO2PUB is available at http://go2pub.genouest.org. PMID:22958570
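The semantic expansion step, walking from a GO term of interest down to all its descendants and collecting the genes annotated to any of them as extra query keywords, can be sketched as a graph traversal. The toy GO fragment and annotations below are invented for illustration.

```python
# Sketch of GO2PUB-style expansion: descendants of a GO term via the GO
# graph, then annotated genes as extra keywords. Data is illustrative.

go_children = {
    "GO:lipid_metabolism": ["GO:lipid_catabolism", "GO:lipid_biosynthesis"],
    "GO:lipid_catabolism": [],
    "GO:lipid_biosynthesis": [],
}
annotations = {"GO:lipid_catabolism": ["LIPC"], "GO:lipid_biosynthesis": ["FASN"]}

def descendants(term, children):
    """All terms reachable from `term` through the GO graph, including itself."""
    found, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in found:
            found.add(t)
            stack.extend(children.get(t, []))
    return found

terms = descendants("GO:lipid_metabolism", go_children)
extra_keywords = sorted(g for t in terms for g in annotations.get(t, []))
print(extra_keywords)  # ['FASN', 'LIPC']
```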
Neuro-symbolic representation learning on biological knowledge graphs.
Alshahrani, Mona; Khan, Mohammad Asif; Maddouri, Omar; Kinjo, Akira R; Queralt-Rosinach, Núria; Hoehndorf, Robert
2017-09-01
Biological data and knowledge bases increasingly rely on Semantic Web technologies and on knowledge graphs for data integration, retrieval, and federated queries. In recent years, feature learning methods applicable to graph-structured data have become available, but they have not yet been widely applied and evaluated on structured biological knowledge. Results: We develop a novel method for feature learning on biological knowledge graphs. Our method combines symbolic methods, in particular knowledge representation using symbolic logic and automated reasoning, with neural networks to generate embeddings of nodes that encode related information within knowledge graphs. Through the use of symbolic logic, these embeddings contain both explicit and implicit information. We apply these embeddings to the prediction of edges in the knowledge graph, representing problems of function prediction, finding candidate genes of diseases, protein-protein interactions, or drug-target relations, and demonstrate performance that matches and sometimes outperforms traditional approaches based on manually crafted features. Our method can be applied to any biological knowledge graph and thereby opens up the increasing number of Semantic Web based knowledge bases in biology for use in machine learning and data analytics. Availability: https://github.com/bio-ontology-research-group/walking-rdf-and-owl. Contact: robert.hoehndorf@kaust.edu.sa. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
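A common corpus-generation step for such graph embedding methods is to take random walks over the (possibly reasoner-enriched) knowledge graph, interleaving node and edge labels, so each walk reads like a "sentence" for a neural embedding model. The sketch below shows that step only, over an invented toy graph; it is not the authors' implementation and omits the reasoning and neural stages.

```python
# Sketch of random-walk corpus generation over a labeled knowledge graph.
# Graph, nodes, and edge labels are hypothetical.
import random

graph = {
    "GeneA": [("associatedWith", "Disease1")],
    "Disease1": [("hasPhenotype", "Phenotype1")],
    "Phenotype1": [],
}

def random_walk(graph, start, length, rng):
    walk, node = [start], start
    for _ in range(length):
        edges = graph.get(node, [])
        if not edges:
            break
        label, node = rng.choice(edges)
        walk.extend([label, node])  # interleave edge labels and nodes
    return walk

rng = random.Random(0)
corpus = [random_walk(graph, "GeneA", 2, rng) for _ in range(3)]
print(corpus[0])
```

Each walk would then be fed to a word-embedding model (e.g., a skip-gram variant) to learn node vectors.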
The value of the Semantic Web in the laboratory.
Frey, Jeremy G
2009-06-01
The Semantic Web is beginning to have an impact on the wider chemical and physical sciences, beyond the earlier-adopting field of bioinformatics. While useful in large-scale, data-driven science with automated processing, these technologies can also help integrate the work of smaller-scale laboratories producing diverse data. The semantics aid discovery and reliable re-use of data, provide improved provenance, and facilitate automated processing through increased resilience to changes in presentation and reduced ambiguity. The Semantic Web, its tools, and its collections are not yet competitive with well-established solutions to current problems. It is in the reduced cost of instituting solutions to new problems that the versatility of Semantic Web-enabled data and resources will make its mark, once the more general-purpose tools become more widely available.
Soualmia, L F; Charlet, J
2016-11-10
To summarize excellent current research in the field of Knowledge Representation and Management (KRM) within the health and medical care domain. We provide a synopsis of the 2016 IMIA selected articles as well as a related synthetic overview of current and future field activities. The first step of the selection was performed through MEDLINE querying with a list of MeSH descriptors completed by a list of terms adapted to the KRM section. The second step of the selection was completed by the two section editors, who separately evaluated the set of 1,432 articles. The third step of the selection consisted of a collective work that merged the evaluation results to retain 15 articles for peer review. The selection and evaluation process of this Yearbook's section on Knowledge Representation and Management yielded four excellent and interesting articles regarding semantic interoperability for health care, achieved by gathering heterogeneous sources (knowledge and data) and auditing ontologies. In the first article, the authors present a solution based on standards and Semantic Web technologies to access distributed and heterogeneous datasets in the domain of breast cancer clinical trials. The second article describes a knowledge-based recommendation system that relies on ontologies and Semantic Web rules in the context of dietary management of chronic diseases. The third article is related to concept recognition and text mining to derive a common human disease model and a phenotypic network of common diseases. In the fourth article, the authors highlight the need for auditing SNOMED CT and propose a crowd-based method for ontology engineering. The current research activities further illustrate the continuous convergence of Knowledge Representation and Medical Informatics, with a focus this year on dedicated tools and methods to advance clinical care by proposing solutions to the problem of semantic interoperability.
Indeed, there is a need for powerful tools able to manage and interpret complex, large-scale and distributed datasets and knowledge bases, but also a need for user-friendly tools developed for the clinicians in their daily practice.
Provenance-Based Approaches to Semantic Web Service Discovery and Usage
ERIC Educational Resources Information Center
Narock, Thomas William
2012-01-01
The World Wide Web Consortium defines a Web Service as "a software system designed to support interoperable machine-to-machine interaction over a network." Web Services have become increasingly important both within and across organizational boundaries. With the recent advent of the Semantic Web, web services have evolved into semantic…
Ontology Reuse in Geoscience Semantic Applications
NASA Astrophysics Data System (ADS)
Mayernik, M. S.; Gross, M. B.; Daniels, M. D.; Rowan, L. R.; Stott, D.; Maull, K. E.; Khan, H.; Corson-Rikert, J.
2015-12-01
The tension between local ontology development and wider ontology connections is fundamental to the Semantic Web. It is often unclear, however, what the key decision points should be for new Semantic Web applications in deciding when to reuse existing ontologies and when to develop original ontologies. In addition, with the growth of Semantic Web ontologies and applications, new Semantic Web applications can struggle to efficiently and effectively identify and select ontologies to reuse. This presentation will describe the ontology comparison, selection, and consolidation effort within the EarthCollab project. UCAR, Cornell University, and UNAVCO are collaborating on the EarthCollab project to use Semantic Web technologies to enable the discovery of the research output from a diverse array of projects. The EarthCollab project is using the VIVO Semantic Web software suite to increase discoverability of research information and data related to the following two geoscience-based communities: (1) the Bering Sea Project, an interdisciplinary field program whose data archive is hosted by NCAR's Earth Observing Laboratory (EOL), and (2) diverse research projects informed by geodesy through the UNAVCO geodetic facility and consortium. This presentation will outline EarthCollab use cases and provide an overview of key ontologies being used, including the VIVO-Integrated Semantic Framework (VIVO-ISF), Global Change Information System (GCIS), and Data Catalog (DCAT) ontologies. We will discuss issues related to bringing these ontologies together to provide a robust ontological structure to support the EarthCollab use cases. It is rare that a single pre-existing ontology meets all of a new application's needs. New projects need to stitch ontologies together in ways that fit into the broader Semantic Web ecosystem.
NASA Astrophysics Data System (ADS)
Piasecki, M.; Beran, B.
2007-12-01
Search engines have changed the way we see the Internet. The ability to find information by just typing in keywords was a big contribution to the overall web experience. While conventional search engine methodology works well for textual documents, locating scientific data remains a problem, since such data are stored in databases not readily accessible by search engine bots. Considering the different temporal, spatial, and thematic coverage of different databases, it is typically necessary, especially for interdisciplinary research, to work with multiple data sources. These sources can be federal agencies, which generally offer national coverage, or regional sources, which cover a smaller area with higher detail. For a given geographic area of interest there often exists more than one database with relevant data, so being able to query multiple databases simultaneously would be tremendously useful for scientists. Development of such a search engine requires dealing with various heterogeneity issues. Scientific database systems often impose controlled vocabularies, which ensure that they are generally homogeneous within themselves but semantically heterogeneous when moving between databases. This bounds the space of possible semantics-related problems, making them easier to solve than in conventional search engines that deal with free text. We have developed a search engine that enables querying multiple data sources simultaneously and returns data in a standardized output despite the aforementioned heterogeneity issues between the underlying systems. The application relies mainly on metadata catalogs or indexing databases, ontologies, and web services, with virtual globe and AJAX technologies for the graphical user interface. Users can trigger a search of dozens of different parameters over hundreds of thousands of stations from multiple agencies by providing a keyword, a spatial extent, i.e. 
a bounding box, and a temporal bracket. As part of this development we have also added an environment that allows users to do some of the semantic tagging, i.e. the linkage of a variable name (which can be anything they desire) to defined concepts in the ontology structure which in turn provides the backbone of the search engine.
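The keyword-plus-bounding-box-plus-time-bracket search described above can be sketched in a few lines. This is a minimal illustration, not the actual system: the agency names, variable names, and the concept mapping (the "semantic tagging" of local variable names to shared concepts) are all invented.

```python
# Local variable names, which "can be anything", tagged to shared concepts.
CONCEPTS = {
    "streamflow": "discharge",
    "Q_cfs": "discharge",
    "water_temp": "temperature",
}

# Toy station records from multiple agencies.
STATIONS = [
    {"agency": "USGS",  "var": "Q_cfs",      "lat": 40.1, "lon": -105.2,
     "start": 1990, "end": 2007},
    {"agency": "State", "var": "streamflow", "lat": 39.9, "lon": -105.1,
     "start": 2000, "end": 2006},
    {"agency": "USGS",  "var": "water_temp", "lat": 40.0, "lon": -105.0,
     "start": 1995, "end": 2007},
]

def search(concept, bbox, start, end):
    """Return stations from all agencies matching one concept, a
    (lat_min, lat_max, lon_min, lon_max) box, and a time bracket."""
    lat_min, lat_max, lon_min, lon_max = bbox
    return [s for s in STATIONS
            if CONCEPTS.get(s["var"]) == concept
            and lat_min <= s["lat"] <= lat_max
            and lon_min <= s["lon"] <= lon_max
            and s["start"] <= end and start <= s["end"]]

hits = search("discharge", (39.5, 40.5, -105.5, -104.5), 2001, 2005)
```

Because both agencies' variable names map to the same concept, one query retrieves matching stations from both sources.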
Web queries as a source for syndromic surveillance.
Hulth, Anette; Rydevik, Gustaf; Linde, Annika
2009-01-01
In the field of syndromic surveillance, various sources are exploited for outbreak detection, monitoring and prediction. This paper describes a study on queries submitted to a medical web site, with influenza as a case study. The hypothesis of the work was that queries on influenza and influenza-like illness would provide a basis for estimating the timing of the peak and the intensity of the yearly influenza outbreaks that would be as good as the existing laboratory and sentinel surveillance. We calculated the occurrence of various queries related to influenza from search logs submitted to a Swedish medical web site for two influenza seasons. These figures were subsequently used to generate two models, one to estimate the number of laboratory-verified influenza cases and one to estimate the proportion of patients with influenza-like illness reported by selected General Practitioners in Sweden. We applied an approach designed for highly correlated data, partial least squares regression. We found that certain web queries on influenza follow the same pattern as that obtained by the two other surveillance systems for influenza epidemics, and that they have equal power for estimating the influenza burden on society. Web queries give unique access to ill individuals who are not (yet) seeking care. This paper shows the potential of web queries as an accurate, inexpensive, and labour-saving source for syndromic surveillance.
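The core idea, estimating case counts from query counts, can be illustrated with a deliberately simplified model. The study used partial least squares regression over many correlated query series; the sketch below swaps that for ordinary least squares on a single invented series, just to show how a query signal is calibrated against a surveillance signal.

```python
# Invented weekly data: counts of one influenza-related query and
# lab-verified influenza cases over the same weeks.
queries = [12, 30, 55, 80, 60, 25]
cases   = [5, 14, 26, 38, 29, 11]

def ols_fit(x, y):
    """Slope and intercept of the least-squares line y = a*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return a, my - a * mx

a, b = ols_fit(queries, cases)
# Estimated case count for a future week with 70 influenza queries.
estimate = a * 70 + b
```

With several correlated query terms instead of one, this is where PLS regression replaces plain least squares.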
Federated ontology-based queries over cancer data
2012-01-01
Background Personalised medicine provides patients with treatments that are specific to their genetic profiles. It requires efficient data sharing of disparate data types across a variety of scientific disciplines, such as molecular biology, pathology, radiology and clinical practice. Personalised medicine aims to offer the safest and most effective therapeutic strategy based on the gene variations of each subject. In particular, this is valid in oncology, where knowledge about genetic mutations has already led to new therapies. Current molecular biology techniques (microarrays, proteomics, epigenetic technology and improved DNA sequencing technology) enable better characterisation of cancer tumours. The vast amounts of data, however, coupled with the use of different terms - or semantic heterogeneity - in each discipline makes the retrieval and integration of information difficult. Results Existing software infrastructures for data-sharing in the cancer domain, such as caGrid, support access to distributed information. caGrid follows a service-oriented model-driven architecture. Each data source in caGrid is associated with metadata at increasing levels of abstraction, including syntactic, structural, reference and domain metadata. The domain metadata consists of ontology-based annotations associated with the structural information of each data source. However, caGrid's current querying functionality is given at the structural metadata level, without capitalising on the ontology-based annotations. This paper presents the design of and theoretical foundations for distributed ontology-based queries over cancer research data. Concept-based queries are reformulated to the target query language, where join conditions between multiple data sources are found by exploiting the semantic annotations. The system has been implemented, as a proof of concept, over the caGrid infrastructure. The approach is applicable to other model-driven architectures. 
A graphical user interface has been developed, supporting ontology-based queries over caGrid data sources. An extensive evaluation of the query reformulation technique is included. Conclusions To support personalised medicine in oncology, it is crucial to retrieve and integrate molecular, pathology, radiology and clinical data in an efficient manner. The semantic heterogeneity of the data makes this a challenging task. Ontologies provide a formal framework to support querying and integration. This paper provides an ontology-based solution for querying distributed databases over service-oriented, model-driven infrastructures. PMID:22373043
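The reformulation step described above, deriving join conditions between data sources from shared semantic annotations, can be sketched as follows. The schema, concept names, and target SQL are invented for illustration and are not caGrid's actual metadata or API.

```python
# Structural metadata (table.column) annotated with ontology concepts.
ANNOTATIONS = {
    "Patient.id":        "PatientIdentifier",
    "Patient.diagnosis": "Diagnosis",
    "Genomics.patient":  "PatientIdentifier",
    "Genomics.mutation": "GeneMutation",
}

def reformulate(select_concepts):
    """Turn a list of ontology concepts into a SQL query, joining
    tables that share an annotation for the same concept."""
    cols = [col for c in select_concepts
            for col, concept in ANNOTATIONS.items() if concept == c]
    tables = sorted({col.split(".")[0] for col in cols})
    joins = []
    if len(tables) > 1:
        by_concept = {}
        for col, concept in ANNOTATIONS.items():
            by_concept.setdefault(concept, []).append(col)
        # A concept annotated in every selected table yields a join key.
        for concept, cs in by_concept.items():
            if set(tables) <= {c.split(".")[0] for c in cs}:
                joins.append(" = ".join(sorted(cs)))
    sql = "SELECT " + ", ".join(cols) + " FROM " + ", ".join(tables)
    if joins:
        sql += " WHERE " + joins[0]
    return sql

query = reformulate(["Diagnosis", "GeneMutation"])
```

Here the shared PatientIdentifier annotation supplies the join condition the user never had to specify.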
ER2OWL: Generating OWL Ontology from ER Diagram
NASA Astrophysics Data System (ADS)
Fahad, Muhammad
Ontology is the fundamental part of the Semantic Web. The goal of the W3C is to bring the Web to its full potential as a Semantic Web while reusing previous systems and artifacts. Most legacy systems have been documented using structured analysis and structured design (SASD), especially simple or Extended ER Diagrams (ERD). Such systems need to be upgraded to become part of the Semantic Web. In this paper, we present ERD to OWL-DL ontology transformation rules at the concrete level. These rules facilitate an easy and understandable transformation from ERD to OWL. The set of transformation rules is tested on a structured analysis and design example. The framework produces OWL ontologies, the fundamental building blocks of the Semantic Web, and helps software engineers upgrade the structured analysis and design artifact, the ERD, into components of the Semantic Web. Moreover, our transformation tool, ER2OWL, reduces the cost and time of building OWL ontologies by reusing existing entity relationship models.
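One such transformation rule can be sketched concretely: an ER entity becomes an owl:Class and each of its attributes a datatype property with the entity as its domain. The rule below is an illustrative simplification, not ER2OWL's actual rule set, and the entity and attribute names are invented.

```python
def erd_to_turtle(entities):
    """entities: {entity_name: [attribute, ...]} -> OWL ontology
    serialized in Turtle syntax."""
    lines = [
        "@prefix : <http://example.org/onto#> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
    ]
    for entity, attrs in entities.items():
        # Rule 1: entity -> owl:Class
        lines.append(f":{entity} a owl:Class .")
        for attr in attrs:
            # Rule 2: attribute -> DatatypeProperty with the entity as domain
            lines.append(f":{attr} a owl:DatatypeProperty ; "
                         f"rdfs:domain :{entity} .")
    return "\n".join(lines)

ttl = erd_to_turtle({"Student": ["name", "regNo"]})
```

Relationships between entities would analogously map to object properties, with the participating entities as domain and range.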
Miles, Alistair; Zhao, Jun; Klyne, Graham; White-Cooper, Helen; Shotton, David
2010-10-01
Integrating heterogeneous data across distributed sources is a major requirement for in silico bioinformatics supporting translational research. For example, genome-scale data on patterns of gene expression in the fruit fly Drosophila melanogaster are widely used in functional genomic studies in many organisms to inform candidate gene selection and validate experimental results. However, current data integration solutions tend to be heavyweight and require significant initial and ongoing investment of effort. Development of a common Web-based data integration infrastructure (a.k.a. data web), using Semantic Web standards, promises to alleviate these difficulties, but little is known about the feasibility, costs, risks or practical means of migrating to such an infrastructure. We describe the development of OpenFlyData, a proof-of-concept system integrating gene expression data on D. melanogaster, combining Semantic Web standards with lightweight approaches to Web programming based on Web 2.0 design patterns. To support researchers designing and validating functional genomic studies, OpenFlyData includes user-facing search applications providing intuitive access to and comparison of gene expression data from FlyAtlas, the BDGP in situ database, and FlyTED, using data from FlyBase to expand and disambiguate gene names. OpenFlyData's services are also openly accessible and available for reuse by other bioinformaticians and application developers. Semi-automated methods and tools were developed to support the labour- and knowledge-intensive tasks involved in deploying SPARQL services. These include methods for generating ontologies and relational-to-RDF mappings for relational databases, which we illustrate using the FlyBase Chado database schema, and methods for mapping gene identifiers between databases. The advantages of using Semantic Web standards for biomedical data integration are discussed, as are open issues. 
In particular, although the performance of open source SPARQL implementations is sufficient to query gene expression data directly from user-facing applications such as Web-based data fusions (a.k.a. mashups), we found open SPARQL endpoints to be vulnerable to denial-of-service-type problems, which must be mitigated to ensure reliability of services based on this standard. These results are relevant to data integration activities in translational bioinformatics. The gene expression search applications and SPARQL endpoints developed for OpenFlyData are deployed at http://openflydata.org. FlyUI, a library of JavaScript widgets providing re-usable user-interface components for Drosophila gene expression data, is available at http://flyui.googlecode.com. Software and ontologies to support transformation of data from FlyBase, FlyAtlas, BDGP and FlyTED to RDF are available at http://openflydata.googlecode.com. SPARQLite, an implementation of the SPARQL protocol, is available at http://sparqlite.googlecode.com. All software is provided under the GPL version 3 open source license.
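A mashup of the kind described queries a SPARQL endpoint directly from the user-facing application. The sketch below builds such a query for a gene symbol; the `ex:` vocabulary is invented for illustration and is not the actual FlyAtlas/FlyBase schema, and the gene identifier is a FlyBase-style placeholder.

```python
def expression_query(gene_symbol):
    """Build a SPARQL query retrieving expression observations
    (tissue, level) for a gene, using a toy vocabulary."""
    return f"""PREFIX ex: <http://example.org/expression#>
SELECT ?tissue ?level WHERE {{
  ?gene ex:symbol "{gene_symbol}" .
  ?obs ex:gene ?gene ;
       ex:tissue ?tissue ;
       ex:level ?level .
}}"""

q = expression_query("CG1234")
```

In a deployment this string would be POSTed to a SPARQL endpoint; as the abstract notes, such open endpoints need protection against expensive queries to stay reliable.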
Incremental Ontology-Based Extraction and Alignment in Semi-structured Documents
NASA Astrophysics Data System (ADS)
Thiam, Mouhamadou; Bennacer, Nacéra; Pernelle, Nathalie; Lô, Moussa
SHIRI is an ontology-based system for the integration of semi-structured documents related to a specific domain. The system's purpose is to allow users to access relevant parts of documents as answers to their queries. SHIRI uses RDF/OWL for the representation of resources and SPARQL for their querying. It relies on an automatic, unsupervised and ontology-driven approach for extraction, alignment and semantic annotation of tagged elements of documents. In this paper, we focus on the Extract-Align algorithm, which exploits a set of named-entity and term patterns to extract term candidates to be aligned with the ontology. It proceeds in an incremental manner in order to populate the ontology with terms describing instances of the domain and to reduce access to external resources such as the Web. We experimented with it on an HTML corpus related to calls for papers in computer science, and the results we obtained are very promising. They show how the incremental behaviour of the Extract-Align algorithm enriches the ontology and increases the number of terms (or named entities) aligned directly with the ontology.
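The extract-then-align idea can be sketched with toy resources: known ontology labels are matched directly, and remaining capitalized tokens become candidates for enriching the ontology. The label set, pattern, and filtering here are invented simplifications, not SHIRI's actual patterns.

```python
import re

# Toy ontology labels for a call-for-papers domain.
ONTOLOGY_LABELS = {"conference", "deadline", "program committee"}

def extract_align(text):
    """Return (terms aligned with the ontology, new term candidates)."""
    low = text.lower()
    aligned = sorted(label for label in ONTOLOGY_LABELS if label in low)
    # Remaining capitalized tokens are candidates to enrich the ontology.
    tokens = re.findall(r"\b[A-Z][a-z]+\b", text)
    candidates = [t for t in tokens
                  if len(t) > 3
                  and t.lower() not in ONTOLOGY_LABELS
                  and not any(t.lower() in label
                              for label in ONTOLOGY_LABELS)]
    return aligned, candidates

aligned, candidates = extract_align(
    "The Program Committee set a Deadline for the Conference in Paris.")
```

In the incremental scheme, validated candidates ("Paris" here) would be added to the ontology, so later documents align directly without consulting external resources.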
Multimedia Web Searching Trends.
ERIC Educational Resources Information Center
Ozmutlu, Seda; Spink, Amanda; Ozmutlu, H. Cenk
2002-01-01
Examines and compares multimedia Web searching by Excite and FAST search engine users in 2001. Highlights include audio and video queries; time spent on searches; terms per query; ranking of the most frequently used terms; and differences in Web search behaviors of U.S. and European Web users. (Author/LRW)
Analyzing Medical Image Search Behavior: Semantics and Prediction of Query Results.
De-Arteaga, Maria; Eggel, Ivan; Kahn, Charles E; Müller, Henning
2015-10-01
Log files of information retrieval systems that record user behavior have been used to improve the outcomes of retrieval systems, understand user behavior, and predict events. In this article, a log file of the ARRS GoldMiner search engine containing 222,005 consecutive queries is analyzed. Time stamps are available for each query, as well as masked IP addresses, which makes it possible to identify queries from the same person. This article describes the ways in which physicians (or Internet searchers interested in medical images) search and proposes potential improvements by suggesting query modifications. For example, many queries contain only a few terms and therefore are not specific; others contain spelling mistakes or non-medical terms that likely lead to poor or empty results. One of the goals of this report is to predict the number of results a query will have, since such a model allows search engines to automatically propose query modifications in order to avoid result lists that are empty or too large. This prediction is made based on characteristics of the query terms themselves. Prediction of empty results has an accuracy above 88%, and thus can be used to automatically modify the query to avoid empty result sets for a user. The semantic analysis and data on reformulations done by users in the past can aid the development of better search systems, particularly to improve results for novice users. This paper therefore offers important insights into how people search and how to use this knowledge to improve the performance of specialized medical search engines.
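A toy version of the empty-result predictor makes the idea concrete: simple features of the query terms feed a rule that flags queries likely to return nothing. The features, thresholds, and vocabulary below are invented; the paper's model is learned from the 222,005 logged queries rather than hand-set.

```python
# Invented in-vocabulary medical terms the engine is assumed to index.
VOCAB = {"lung", "nodule", "ct", "fracture", "mri"}

def looks_empty(query, vocabulary):
    """Predict whether a query is likely to return an empty result,
    from query-term features only (term count, out-of-vocabulary terms).
    Long, over-specific queries with unknown terms tend to match nothing."""
    terms = query.lower().split()
    unknown = sum(t not in vocabulary for t in terms)
    return len(terms) >= 4 and unknown >= 2

risky = looks_empty("lung noddule xyzzy on ct scna", VOCAB)
safe = looks_empty("lung nodule", VOCAB)
```

A search engine using such a predictor could preemptively suggest dropping or respelling the unknown terms instead of returning an empty list.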
Context-Aware Online Commercial Intention Detection
NASA Astrophysics Data System (ADS)
Hu, Derek Hao; Shen, Dou; Sun, Jian-Tao; Yang, Qiang; Chen, Zheng
With more and more commercial activities moving onto the Internet, people tend to purchase what they need through the Internet or conduct some online research before the actual transactions happen. For many Web users, their online commercial activities start from submitting a search query to search engines. Just like common Web search queries, queries with commercial intention are usually very short. Distinguishing queries with commercial intention from common queries will help search engines provide proper search results and advertisements, help Web users obtain the information they desire, and help advertisers benefit from the potential transactions. However, the intentions behind a query vary a lot for users with different backgrounds and interests. The intentions can even be different for the same user when the query is issued in different contexts. In this paper, we present a new algorithm framework based on skip-chain conditional random fields (SCCRF) for automatically classifying Web queries according to context-based online commercial intention. We analyze our algorithm's performance both theoretically and empirically. Extensive experiments on several real search engine log datasets show that our algorithm improves the F1 score by more than 10% over previous commercial intention detection algorithms.
VisGets: coordinated visualizations for web-based information exploration and discovery.
Dörk, Marian; Carpendale, Sheelagh; Collins, Christopher; Williamson, Carey
2008-01-01
In common Web-based search interfaces, it can be difficult to formulate queries that simultaneously combine temporal, spatial, and topical data filters. We investigate how coordinated visualizations can enhance search and exploration of information on the World Wide Web by easing the formulation of these types of queries. Drawing from visual information seeking and exploratory search, we introduce VisGets--interactive query visualizations of Web-based information that operate with online information within a Web browser. VisGets provide the information seeker with visual overviews of Web resources and offer a way to visually filter the data. Our goal is to facilitate the construction of dynamic search queries that combine filters from more than one data dimension. We present a prototype information exploration system featuring three linked VisGets (temporal, spatial, and topical), and used it to visually explore news items from online RSS feeds.
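The coordinated-filter idea behind VisGets reduces, at its core, to applying whichever of the three dimension filters are active to one shared result set. The sketch below uses invented news records and field names purely to illustrate that conjunction; the real system drives the same logic from interactive visual widgets.

```python
# Toy news items with temporal, spatial, and topical attributes.
ITEMS = [
    {"year": 2008, "place": "Calgary", "topic": "flood"},
    {"year": 2007, "place": "Calgary", "topic": "election"},
    {"year": 2008, "place": "Toronto", "topic": "flood"},
]

def visget_filter(items, year=None, place=None, topic=None):
    """Apply only the active dimension filters, like linked VisGets:
    each widget narrows the same result set."""
    return [it for it in items
            if (year is None or it["year"] == year)
            and (place is None or it["place"] == place)
            and (topic is None or it["topic"] == topic)]

hits = visget_filter(ITEMS, year=2008, topic="flood")
```

Selecting a year in the temporal widget and a term in the topical one corresponds to passing both keyword arguments at once.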
Contextual advertisement placement in printed media
NASA Astrophysics Data System (ADS)
Liu, Sam; Joshi, Parag
2010-02-01
Advertisements today provide the necessary revenue model supporting the WWW ecosystem. Targeted or contextual ad insertion plays an important role in optimizing the financial return of this model. Nearly all the current ads that appear on web sites are geared for display purposes such as banners and "pay-per-click". Little attention, however, is focused on deriving additional ad revenue when the content is repurposed for an alternative means of presentation, e.g. being printed. Although more and more content is moving to the Web, there are still many occasions where printed output of web content is desirable, such as maps and articles; thus printed ad insertion can potentially be lucrative. In this paper, we describe a contextual ad insertion network aimed at realizing new revenue for print service providers for web printing. We introduce a cloud print service that enables contextual ad insertion, with respect to the main web page content, when a printout of the page is requested. To encourage service utilization, it would provide higher-quality printouts than what is possible from current browser print drivers, which generally produce poor outputs, e.g. ill-formatted pages. At this juncture we limit the scope to article-related web pages, although the concept can be extended to arbitrary web pages. The key components of this system include (1) the extraction of the article from web pages, (2) the extraction of semantics from the article, (3) querying the ad database for matching advertisements or coupons, and (4) joint content and ad layout for print outputs.
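Component (3), matching an article against an ad database, can be sketched as simple keyword overlap. The ad inventory, keyword sets, and scoring rule here are invented for illustration; a production system would use richer semantics than set intersection.

```python
# Toy ad database: ad id -> keyword set.
ADS = {
    "hiking-boots": {"hiking", "trail", "outdoor"},
    "city-hotel":   {"travel", "hotel", "city"},
}

def best_ad(article_keywords):
    """Return the ad id with the largest keyword overlap with the
    article's extracted keywords, or None if nothing matches."""
    scored = [(len(ADS[ad] & article_keywords), ad) for ad in ADS]
    score, ad = max(scored)
    return ad if score > 0 else None

ad = best_ad({"trail", "hiking", "map"})
```

The selected ad would then be passed to component (4), which lays it out jointly with the article for print.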
NASA Technical Reports Server (NTRS)
Ashish, Naveen
2005-01-01
We provide an overview of several ongoing NASA endeavors based on concepts, systems, and technology from the Semantic Web arena. Indeed NASA has been one of the early adopters of Semantic Web Technology and we describe ongoing and completed R&D efforts for several applications ranging from collaborative systems to airspace information management to enterprise search to scientific information gathering and discovery systems at NASA.
F-OWL: An Inference Engine for Semantic Web
NASA Technical Reports Server (NTRS)
Zou, Youyong; Finin, Tim; Chen, Harry
2004-01-01
Understanding and using the data and knowledge encoded in Semantic Web documents requires an inference engine. F-OWL is an inference engine for the Semantic Web language OWL, based on F-logic, an approach to defining frame-based systems in logic. F-OWL is implemented using XSB and Flora-2 and takes full advantage of their features. We describe how F-OWL computes ontology entailment and compare it with other description logic based approaches. We also describe TAGA, a trading agent environment that we have used as a test bed for F-OWL and to explore how multiagent systems can use Semantic Web concepts and technology.
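One flavour of the entailment such an engine computes can be shown in plain Python: the transitive closure of rdfs:subClassOf, under which membership in a class entails membership in all its superclasses. This is only an illustration of a single entailment rule with an invented hierarchy, not F-OWL's F-logic machinery.

```python
# Direct subclass -> superclass edges (invented hierarchy).
SUBCLASS = {
    "Student": "Person",
    "Person": "Agent",
    "Robot": "Agent",
}

def superclasses(cls):
    """All entailed superclasses of cls (transitive closure of
    the direct subclass edges)."""
    out = set()
    while cls in SUBCLASS:
        cls = SUBCLASS[cls]
        out.add(cls)
    return out

def instance_types(direct_type):
    """An instance of direct_type is entailed to be an instance of
    every superclass as well."""
    return {direct_type} | superclasses(direct_type)

types = instance_types("Student")
```

A full OWL reasoner applies many such rules (property domains and ranges, transitivity, inverses) to closure, which is what distinguishes querying entailed knowledge from querying only asserted triples.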
NASA Astrophysics Data System (ADS)
Colomo-Palacios, Ricardo; Jiménez-López, Diego; García-Crespo, Ángel; Blanco-Iglesias, Borja
eLearning educative processes are a challenge for educational institutions and education professionals. In an environment in which learning resources are being produced, catalogued and stored in innovative ways, SOLE provides a platform in which exam questions can be produced supported by Web 2.0 tools, catalogued and labeled via the Semantic Web, and stored and distributed using eLearning standards. This paper presents SOLE, a social network for sharing exam questions, particularized for the software engineering domain, based on semantics and built using Semantic Web and eLearning standards, such as the IMS Question and Test Interoperability specification 2.1.
Experimenting with semantic web services to understand the role of NLP technologies in healthcare.
Jagannathan, V
2006-01-01
NLP technologies can play a significant role in healthcare where a predominant segment of the clinical documentation is in text form. In a graduate course focused on understanding semantic web services at West Virginia University, a class project was designed with the purpose of exploring potential use for NLP-based abstraction of clinical documentation. The role of NLP-technology was simulated using human abstractors and various workflows were investigated using public domain workflow and semantic web service technologies. This poster explores the potential use of NLP and the role of workflow and semantic web technologies in developing healthcare IT environments.
Introduction to geospatial semantics and technology workshop handbook
Varanka, Dalia E.
2012-01-01
The workshop is a tutorial on introductory geospatial semantics with hands-on exercises using standard Web browsers. The workshop is divided into two sections, general semantics on the Web and specific examples of geospatial semantics using data from The National Map of the U.S. Geological Survey and the Open Ontology Repository. The general semantics section includes information and access to publicly available semantic archives. The specific session includes information on geospatial semantics with access to semantically enhanced data for hydrography, transportation, boundaries, and names. The Open Ontology Repository offers open-source ontologies for public use.
The BCube Crawler: Web Scale Data and Service Discovery for EarthCube.
NASA Astrophysics Data System (ADS)
Lopez, L. A.; Khalsa, S. J. S.; Duerr, R.; Tayachow, A.; Mingo, E.
2014-12-01
Web-crawling, a core component of the NSF-funded BCube project, is researching and applying the use of big data technologies to find and characterize different types of web services, catalog interfaces, and data feeds such as ESIP OpenSearch, OGC W*S, THREDDS, and OAI-PMH that describe or provide access to scientific datasets. Given the scale of the Internet, which challenges even large search providers such as Google, the BCube plan for discovering these web-accessible services is to subdivide the problem into three smaller, more tractable issues. The first is to discover likely sites where relevant data and data services might be found; the second is to deeply crawl the discovered sites to find any data and services present; the last is to leverage semantic technologies to characterize the services and data found, and to filter out everything but those relevant to the geosciences. To address the first two challenges BCube uses an adapted version of Apache Nutch (which originated Hadoop), a web-scale crawler, and Amazon's ElasticMapReduce service for flexibility and cost effectiveness. For characterization of the services found, BCube is examining existing web service ontologies for their applicability to our needs, and will reuse and/or extend these in order to query for services with specific well-defined characteristics of scientific datasets, such as the use of geospatial namespaces. The original proposal for the crawler won a grant from Amazon's academic program, which allowed us to become operational; we successfully tested the BCube Crawler at web scale, obtaining a corpus sizeable enough to enable work on characterization of the services and data found. There is still plenty of work to be done: doing "smart crawls" by managing the frontier, developing and enhancing our scoring algorithms, and fully implementing the semantic characterization technologies. 
We describe the current status of the project, our successes and issues encountered. The final goal of the BCube crawler project is to provide relevant data services to other projects on the EarthCube stack and third party partners so they can be brokered and used by a wider scientific community.
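The characterization step, deciding which kind of service a crawled URL or response represents, can be sketched with signature matching. The signatures below are rough illustrations of how the listed service types commonly announce themselves, not BCube's actual classification rules.

```python
# Rough, illustrative signatures for the service types mentioned above.
SIGNATURES = {
    "OpenSearch": "opensearchdescription",
    "THREDDS":    "thredds/catalog",
    "OAI-PMH":    "verb=identify",
    "OGC W*S":    "service=wms",
}

def classify(url, body=""):
    """Return the service types whose signature appears in the URL
    or response body (case-insensitive)."""
    text = (url + " " + body).lower()
    return [svc for svc, sig in SIGNATURES.items() if sig in text]

kinds = classify("http://example.org/thredds/catalog.xml")
```

In practice this stage would parse the response rather than substring-match, and the semantic layer would then describe the matched service against a web service ontology.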
2011-01-01
Background The complexity and inter-related nature of biological data poses a difficult challenge for data and tool integration. There has been a proliferation of interoperability standards and projects over the past decade, none of which has been widely adopted by the bioinformatics community. Recent attempts have focused on the use of semantics to assist integration, and Semantic Web technologies are being welcomed by this community. Description SADI - Semantic Automated Discovery and Integration - is a lightweight set of fully standards-compliant Semantic Web service design patterns that simplify the publication of services of the type commonly found in bioinformatics and other scientific domains. Using Semantic Web technologies at every level of the Web services "stack", SADI services consume and produce instances of OWL Classes following a small number of very straightforward best-practices. In addition, we provide codebases that support these best-practices, and plug-in tools to popular developer and client software that dramatically simplify deployment of services by providers, and the discovery and utilization of those services by their consumers. Conclusions SADI Services are fully compliant with, and utilize only foundational Web standards; are simple to create and maintain for service providers; and can be discovered and utilized in a very intuitive way by biologist end-users. In addition, the SADI design patterns significantly improve the ability of software to automatically discover appropriate services based on user-needs, and automatically chain these into complex analytical workflows. We show that, when resources are exposed through SADI, data compliant with a given ontological model can be automatically gathered, or generated, from these distributed, non-coordinating resources - a behaviour we have not observed in any other Semantic system. 
Finally, we show that, using SADI, data dynamically generated from Web services can be explored in a manner very similar to data housed in static triple-stores, thus facilitating the intersection of Web services and Semantic Web technologies. PMID:22024447
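The SADI pattern sketched below captures the abstract's central convention: a service consumes an instance of an input OWL class and returns the same node decorated with new properties, so discovery and chaining can be driven by the input and output class descriptions. The class names, properties, and the stand-in computation are all invented; real SADI services exchange RDF, not Python dicts.

```python
def gene_length_service(instance):
    """Toy SADI-style service: consume an instance of ex:NamedGene,
    return the same node decorated with ex:length, now an instance
    of ex:AnnotatedGene."""
    assert instance["type"] == "ex:NamedGene", "wrong input class"
    out = dict(instance)                       # same node, new properties
    out["ex:length"] = len(instance["ex:name"]) * 100  # stand-in computation
    out["type"] = "ex:AnnotatedGene"
    return out

result = gene_length_service({"type": "ex:NamedGene", "ex:name": "BRCA1"})
```

Because the output class is declared, a client wanting ex:AnnotatedGene instances can automatically discover this service and chain it after any service producing ex:NamedGene.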
Wilkinson, Mark D; Vandervalk, Benjamin; McCarthy, Luke
2011-10-24
Semi-automatic semantic annotation of PubMed Queries: a study on quality, efficiency, satisfaction
Névéol, Aurélie; Islamaj-Doğan, Rezarta; Lu, Zhiyong
2010-01-01
Information processing algorithms require significant amounts of annotated data for training and testing. The availability of such data is often hindered by the complexity and high cost of production. In this paper, we investigate the benefits of a state-of-the-art tool to help with the semantic annotation of a large set of biomedical information queries. Seven annotators were recruited to annotate a set of 10,000 PubMed® queries with 16 biomedical and bibliographic categories. About half of the queries were annotated from scratch, while the other half were automatically pre-annotated and manually corrected. The impact of the automatic pre-annotations was assessed on several aspects of the task: time, number of actions, annotator satisfaction, inter-annotator agreement, and the quality and number of the resulting annotations. The analysis of annotation results showed that 28.9% fewer hand annotations were required when pre-annotated results from automatic tools were used. As a result, the overall annotation time was substantially lower when pre-annotations were used, while inter-annotator agreement was significantly higher. In addition, there was no statistically significant difference in the semantic distribution or number of annotations produced when pre-annotations were used. The annotated query corpus is freely available to the research community. This study shows that automatic pre-annotations are found helpful by most annotators. Our experience suggests using an automatic tool to assist large-scale manual annotation projects, as this speeds up annotation and improves consistency while maintaining high quality in the final annotations. PMID:21094696
AQBE — QBE Style Queries for Archetyped Data
NASA Astrophysics Data System (ADS)
Sachdeva, Shelly; Yaginuma, Daigo; Chu, Wanming; Bhalla, Subhash
Large-scale adoption of electronic healthcare applications requires semantic interoperability. Recent proposals advance a multi-level DBMS architecture for repository services for patients' health records. These also require query interfaces at multiple levels, including the level of semi-skilled users. In this regard, this study examines a high-level user interface for querying the new form of standardized Electronic Health Records system. It proposes a step-by-step graphical query interface that allows semi-skilled users to write queries. Its aim is to decrease user effort and communication ambiguities, and to increase user friendliness.
A user-centred evaluation framework for the Sealife semantic web browsers
Oliver, Helen; Diallo, Gayo; de Quincey, Ed; Alexopoulou, Dimitra; Habermann, Bianca; Kostkova, Patty; Schroeder, Michael; Jupp, Simon; Khelif, Khaled; Stevens, Robert; Jawaheer, Gawesh; Madle, Gemma
2009-01-01
Background Semantically-enriched browsing has enhanced the browsing experience by providing contextualised dynamically generated Web content, and quicker access to searched-for information. However, adoption of Semantic Web technologies is limited, and user perception outside the IT domain remains sceptical. Furthermore, little attention has been given to evaluating semantic browsers with real users to demonstrate the enhancements and obtain valuable feedback. The Sealife project investigates semantic browsing and its application to the life science domain. Sealife's main objective is to develop the notion of context-based information integration by extending three existing Semantic Web browsers (SWBs) to link the existing Web to the eScience infrastructure. Methods This paper describes a user-centred evaluation framework that was developed to evaluate the Sealife SWBs and that elicited feedback on users' perceptions of ease of use and information findability. Three sources of data: i) web server logs; ii) user questionnaires; and iii) semi-structured interviews were analysed and comparisons made between each browser and a control system. Results It was found that the evaluation framework successfully elicited users' perceptions of the three distinct SWBs. The results indicate that the browser with the most mature and polished interface was rated higher for usability, and semantic links were used by the users of all three browsers. Conclusion Confirmation or contradiction of our original hypotheses in relation to SWBs is detailed along with observations of implementation issues. PMID:19796398
A user-centred evaluation framework for the Sealife semantic web browsers.
Oliver, Helen; Diallo, Gayo; de Quincey, Ed; Alexopoulou, Dimitra; Habermann, Bianca; Kostkova, Patty; Schroeder, Michael; Jupp, Simon; Khelif, Khaled; Stevens, Robert; Jawaheer, Gawesh; Madle, Gemma
2009-10-01
Semantically-enriched browsing has enhanced the browsing experience by providing contextualized dynamically generated Web content, and quicker access to searched-for information. However, adoption of Semantic Web technologies is limited, and user perception outside the IT domain remains sceptical. Furthermore, little attention has been given to evaluating semantic browsers with real users to demonstrate the enhancements and obtain valuable feedback. The Sealife project investigates semantic browsing and its application to the life science domain. Sealife's main objective is to develop the notion of context-based information integration by extending three existing Semantic Web browsers (SWBs) to link the existing Web to the eScience infrastructure. This paper describes a user-centred evaluation framework that was developed to evaluate the Sealife SWBs and that elicited feedback on users' perceptions of ease of use and information findability. Three sources of data: i) web server logs; ii) user questionnaires; and iii) semi-structured interviews were analysed and comparisons made between each browser and a control system. It was found that the evaluation framework successfully elicited users' perceptions of the three distinct SWBs. The results indicate that the browser with the most mature and polished interface was rated higher for usability, and semantic links were used by the users of all three browsers. Confirmation or contradiction of our original hypotheses in relation to SWBs is detailed along with observations of implementation issues.
Semantic Web Compatible Names and Descriptions for Organisms
NASA Astrophysics Data System (ADS)
Wang, H.; Wilson, N.; McGuinness, D. L.
2012-12-01
Modern scientific names are critical for understanding the biological literature and provide a valuable way to understand evolutionary relationships. To validly publish a name, a description is required to separate the described group of organisms from those described by other names at the same level of the taxonomic hierarchy. The frequent revision of descriptions due to new evolutionary evidence has led to situations where a single given scientific name may over time have multiple descriptions associated with it and a given published description may apply to multiple scientific names. Because of these many-to-many relationships between scientific names and descriptions, the usage of scientific names as a proxy for descriptions is inevitably ambiguous. Another issue lies in the fact that the precise application of scientific names often requires careful microscopic work, or increasingly, genetic sequencing, as scientific names are focused on the evolutionary relatedness between and within named groups such as species, genera, families, etc. This is problematic for many audiences, especially field biologists, who often do not have access to the instruments and tools required to make identifications on a microscopic or genetic basis. To better connect scientific names to descriptions and find a more convenient way to support computer-assisted identification, we proposed the Semantic Vernacular System, a novel naming system that creates named, machine-interpretable descriptions for groups of organisms, and is compatible with the Semantic Web. Unlike the evolutionary relationship based scientific naming system, it emphasizes the observable features of organisms. By independently naming the descriptions composed of sets of observational features, as well as maintaining connections to scientific names, it preserves the observational data used to identify organisms.
The system is designed to support a peer-review mechanism for creating new names, and uses a controlled vocabulary encoded in the Web Ontology Language to represent the observational features. A prototype of the system is currently under development in collaboration with the Mushroom Observer website. It allows users to propose new names and descriptions for fungi, provide feedback on those proposals, and ultimately have them formally approved. It relies on SPARQL queries and semantic reasoning for data management. This effort will offer the mycology community a knowledge base of fungal observational features and a tool for identifying fungal observations. It will also serve as an operational specification of how the Semantic Vernacular System can be used in practice in one scientific community (in this case mycology).
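The identification idea behind the Semantic Vernacular System described above, matching an observation against named, feature-based descriptions, can be illustrated with a small sketch. All names and features below are invented for illustration and are not taken from the Mushroom Observer system:

```python
# Hypothetical sketch: named descriptions are sets of observable features,
# and an observation is identified by finding every description whose
# required features the observation satisfies.

def matching_descriptions(observed, descriptions):
    """Return names of descriptions fully supported by the observed features."""
    return sorted(
        name for name, features in descriptions.items()
        if features <= observed  # every required feature was observed
    )

# Illustrative vernacular names and feature sets (not real system content).
descriptions = {
    "red-capped white-stemmed agaric": {"red cap", "white stem", "gills"},
    "yellow polypore": {"yellow cap", "pores"},
}

observed = {"red cap", "white stem", "gills", "ring on stem"}
print(matching_descriptions(observed, descriptions))
# → ['red-capped white-stemmed agaric']
```

In the real system such matching would be expressed with SPARQL queries and OWL reasoning over the controlled vocabulary rather than Python set operations.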
NASA Astrophysics Data System (ADS)
Hornung, Thomas; Simon, Kai; Lausen, Georg
Combining information from different Web sources is often a tedious and repetitive process; even simple information requests might require iterating over the result list of one Web query and using each single result as input for a subsequent query. One approach to such chained queries is data-centric mashups, which allow users to visually model the data flow as a graph, where the nodes represent the data sources and the edges the data flow.
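The chained-query pattern described above can be sketched in a few lines: each result of a first query is fed as input into a second query. The two query functions below are stubbed stand-ins for real Web calls, not part of any described mashup system:

```python
# Sketch of chained queries: iterate over the result list of one "web
# query" and use each single result as input for a subsequent query.

def query_authors(topic):
    """First query: topic -> list of authors (stubbed, illustrative data)."""
    return {"semantic web": ["Berners-Lee", "Hendler"]}.get(topic, [])

def query_papers(author):
    """Second query: author -> list of papers (stubbed, illustrative data)."""
    papers = {
        "Berners-Lee": ["Weaving the Web"],
        "Hendler": ["Spinning the Semantic Web"],
    }
    return papers.get(author, [])

def chain(topic):
    # Each result of the first query becomes the input of the second;
    # a data-centric mashup models exactly this edge in its dataflow graph.
    return [paper for author in query_authors(topic)
                  for paper in query_papers(author)]

print(chain("semantic web"))
# → ['Weaving the Web', 'Spinning the Semantic Web']
```

A mashup tool would represent `query_authors` and `query_papers` as graph nodes and the list comprehension as the edge connecting them.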
E-Government Goes Semantic Web: How Administrations Can Transform Their Information Processes
NASA Astrophysics Data System (ADS)
Klischewski, Ralf; Ukena, Stefan
E-government applications and services are built mainly on access to, retrieval of, integration of, and delivery of relevant information to citizens, businesses, and administrative users. In order to perform such information processing automatically through the Semantic Web, machine-readable enhancements of web resources are needed, based on the understanding of the content and context of the information in focus. While these enhancements are far from trivial to produce, administrations in their role of information and service providers so far find little guidance on how to migrate their web resources and enable a new quality of information processing; even research is still seeking best practices. Therefore, the underlying research question of this chapter is: what are the appropriate approaches which guide administrations in transforming their information processes toward the Semantic Web? In search for answers, this chapter analyzes the challenges and possible solutions from the perspective of administrations: (a) the information processing in e-government is reconstructed in terms of how semantic technologies must be employed to support information provision and consumption through the Semantic Web; (b) the required contribution to the transformation is compared to the capabilities and expectations of administrations; and (c) available experience with the steps of transformation is reviewed and discussed as to what extent it can be expected to successfully drive e-government to the Semantic Web. This research builds on studying the case of Schleswig-Holstein, Germany, where semantic technologies have been used within the frame of the Access-eGov project in order to semantically enhance electronic service interfaces with the aim of providing a new way of accessing and combining e-government services.
Social Semantics for an Effective Enterprise
NASA Technical Reports Server (NTRS)
Berndt, Sarah; Doane, Mike
2012-01-01
An evolution of the Semantic Web, the Social Semantic Web (s2w), facilitates knowledge sharing with "useful information based on human contributions, which gets better as more people participate." The s2w reaches beyond the search box to move us from a collection of hyperlinked facts, to meaningful, real time context. When focused through the lens of Enterprise Search, the Social Semantic Web facilitates the fluid transition of meaningful business information from the source to the user. It is the confluence of human thought and computer processing structured with the iterative application of taxonomies, folksonomies, ontologies, and metadata schemas. The importance and nuances of human interaction are often deemphasized when focusing on automatic generation of semantic markup, which results in dissatisfied users and unrealized return on investment. Users consistently qualify the value of information sets through the act of selection, making them the de facto stakeholders of the Social Semantic Web. Employers are the ultimate beneficiaries of s2w utilization with a better informed, more decisive workforce; one not achieved with an IT miracle technology, but by improved human-computer interactions. Johnson Space Center Taxonomist Sarah Berndt and Mike Doane, principal owner of Term Management, LLC discuss the planning, development, and maintenance stages for components of a semantic system while emphasizing the necessity of a Social Semantic Web for the Enterprise. Identification of risks and variables associated with layering the successful implementation of a semantic system are also modeled.
A semantic medical multimedia retrieval approach using ontology information hiding.
Guo, Kehua; Zhang, Shigeng
2013-01-01
Searching useful information from unstructured medical multimedia data has been a difficult problem in information retrieval. This paper reports an effective semantic medical multimedia retrieval approach which can reflect the users' query intent. Firstly, semantic annotations will be given to the multimedia documents in the medical multimedia database. Secondly, the ontology that represented semantic information will be hidden in the head of the multimedia documents. The main innovations of this approach are cross-type retrieval support and semantic information preservation. Experimental results indicate a good precision and efficiency of our approach for medical multimedia retrieval in comparison with some traditional approaches.
A Semantic Web-based System for Managing Clinical Archetypes.
Fernandez-Breis, Jesualdo Tomas; Menarguez-Tortosa, Marcos; Martinez-Costa, Catalina; Fernandez-Breis, Eneko; Herrero-Sempere, Jose; Moner, David; Sanchez, Jesus; Valencia-Garcia, Rafael; Robles, Montserrat
2008-01-01
Archetypes facilitate the sharing of clinical knowledge and therefore are a basic tool for achieving interoperability between healthcare information systems. In this paper, a Semantic Web System for Managing Archetypes is presented. This system allows for the semantic annotation of archetypes, as well for performing semantic searches. The current system is capable of working with both ISO13606 and OpenEHR archetypes.
Informatics in radiology: radiology gamuts ontology: differential diagnosis for the Semantic Web.
Budovec, Joseph J; Lam, Cesar A; Kahn, Charles E
2014-01-01
The Semantic Web is an effort to add semantics, or "meaning," to empower automated searching and processing of Web-based information. The overarching goal of the Semantic Web is to enable users to more easily find, share, and combine information. Critical to this vision are knowledge models called ontologies, which define a set of concepts and formalize the relations between them. Ontologies have been developed to manage and exploit the large and rapidly growing volume of information in biomedical domains. In diagnostic radiology, lists of differential diagnoses of imaging observations, called gamuts, provide an important source of knowledge. The Radiology Gamuts Ontology (RGO) is a formal knowledge model of differential diagnoses in radiology that includes 1674 differential diagnoses, 19,017 terms, and 52,976 links between terms. Its knowledge is used to provide an interactive, freely available online reference of radiology gamuts ( www.gamuts.net ). A Web service allows its content to be discovered and consumed by other information systems. The RGO integrates radiologic knowledge with other biomedical ontologies as part of the Semantic Web. © RSNA, 2014.
Bat-Inspired Algorithm Based Query Expansion for Medical Web Information Retrieval.
Khennak, Ilyes; Drias, Habiba
2017-02-01
With the increasing amount of medical data available on the Web, looking for health information has become one of the most widely searched topics on the Internet. Patients and people of several backgrounds are now using Web search engines to acquire medical information, including information about a specific disease, medical treatment or professional advice. Nonetheless, due to a lack of medical knowledge, many laypeople have difficulties in forming appropriate queries to articulate their inquiries, which renders their search queries imprecise due to the use of unclear keywords. The use of these ambiguous and vague queries to describe the patients' needs has resulted in a failure of Web search engines to retrieve accurate and relevant information. One of the most natural and promising methods to overcome this drawback is Query Expansion. In this paper, an original approach based on the Bat Algorithm is proposed to improve the retrieval effectiveness of query expansion in the medical field. In contrast to the existing literature, the proposed approach uses the Bat Algorithm to find the best expanded query among a set of expanded query candidates, while maintaining low computational complexity. Moreover, this new approach allows the length of the expanded query to be determined empirically. Numerical results on MEDLINE, the on-line medical information database, show that the proposed approach is more effective and efficient compared to the baseline.
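The core selection problem the abstract describes, choosing the best expanded query among candidates, can be reduced to a toy sketch. This deliberately omits the actual Bat Algorithm dynamics (velocities, loudness, pulse rates) and replaces the retrieval-effectiveness objective with a made-up scoring function; every query and term below is illustrative:

```python
# Illustrative sketch only: score candidate expanded queries against a toy
# relevance model and keep the best. The real approach searches this
# candidate space with the Bat Algorithm metaheuristic.

def score(query_terms, relevant_terms):
    """Toy objective: reward overlap with known relevant terms,
    penalise terms that add noise."""
    q = set(query_terms)
    return len(q & relevant_terms) - 0.5 * len(q - relevant_terms)

# Pretend these terms are known to occur in relevant MEDLINE documents.
relevant = {"myocardial", "infarction", "heart", "attack", "treatment"}

candidates = [
    ["heart", "attack"],                               # original query
    ["heart", "attack", "myocardial", "infarction"],   # expansion 1
    ["heart", "attack", "weather"],                    # expansion 2
]

best = max(candidates, key=lambda q: score(q, relevant))
print(best)
# → ['heart', 'attack', 'myocardial', 'infarction']
```

In the paper's setting, evaluating this objective exactly is infeasible, which is why a metaheuristic explores the candidate space instead of enumerating it.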
ERIC Educational Resources Information Center
Ohler, Jason
2008-01-01
The semantic web or Web 3.0 makes information more meaningful to people by making it more understandable to machines. In this article, the author examines the implications of Web 3.0 for education. The author considers three areas of impact: knowledge construction, personal learning network maintenance, and personal educational administration.…
Moving Controlled Vocabularies into the Semantic Web
NASA Astrophysics Data System (ADS)
Thomas, R.; Lowry, R. K.; Kokkinaki, A.
2015-12-01
One of the issues with legacy oceanographic data formats is that the only tool available for describing what a measurement is and how it was made is a single metadata tag known as the parameter code. The British Oceanographic Data Centre (BODC) has been helping the international oceanographic community gain maximum benefit from this through a controlled vocabulary known as the BODC Parameter Usage Vocabulary (PUV). Over time this has grown to over 34,000 entries, some of which have preferred labels with over 400 bytes of descriptive information detailing what was measured and how. A decade ago the BODC pioneered making this information available in a more useful form with the implementation of a prototype vocabulary server (NVS) that referenced each 'parameter code' as a URL. This developed into the current server (NVS V2), in which the parameter URL resolves into an RDF document based on the SKOS data model that includes a list of resource URLs mapped to the 'parameter'. For example, the parameter code for a contaminant in biota, such as 'cadmium in Mytilus edulis', carries RDF triples leading to the entry for Mytilus edulis in the WoRMS ontology and for cadmium in the ChEBI ontology. By providing links into these external ontologies, the information captured in a 1980s parameter code now conforms to the Linked Data paradigm of the Semantic Web, vastly increasing the descriptive information accessible to a user. This presentation will describe the next steps along the road to the Semantic Web with the development of a SPARQL end point [1] to expose the PUV plus the 190 other controlled vocabularies held in NVS. Whilst this is ideal for those fluent in SPARQL, most users require something a little more user-friendly, and so the NVS browser [2] was developed over the end point to allow less technical users to query the vocabularies and navigate the NVS ontology. This tool integrates into an editor that allows vocabulary content to be manipulated by authorised users outside BODC.
Having placed Linked Data tooling over a single SPARQL end point, the obvious future development for this system is to support semantic interoperability outside NVS by the incorporation of federated SPARQL end points in the USA and Australia during the ODIP II project.
[1] https://vocab.nerc.ac.uk/sparql
[2] https://www.bodc.ac.uk/data/codes_and_formats/vocabulary_search/
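The Linked Data mechanism the NVS abstract describes, a parameter-code URI dereferencing into SKOS-style triples that link out to external ontologies, can be sketched with an in-memory triple set. The URIs and triples below are illustrative stand-ins, not actual NVS content:

```python
# Hedged sketch of the NVS idea: a parameter code, referenced as a URI,
# resolves to SKOS-style triples including mappings to external ontology
# entries (here WoRMS for the species, ChEBI for the chemical).

TRIPLES = [
    ("nvs:P01/CDBIOTA1", "skos:prefLabel", "Cadmium in Mytilus edulis"),
    ("nvs:P01/CDBIOTA1", "skos:broader",   "nvs:P01/contaminants"),
    ("nvs:P01/CDBIOTA1", "skos:related",   "worms:Mytilus_edulis"),
    ("nvs:P01/CDBIOTA1", "skos:related",   "chebi:CHEBI_22977"),  # cadmium
]

def describe(subject, triples=TRIPLES):
    """Gather every (predicate, object) pair for a concept URI, the way
    dereferencing the URI would return its RDF document."""
    return {(p, o) for s, p, o in triples if s == subject}

links = describe("nvs:P01/CDBIOTA1")
print(sorted(o for p, o in links if p == "skos:related"))
# → ['chebi:CHEBI_22977', 'worms:Mytilus_edulis']
```

The real server answers the same question via HTTP content negotiation and, with the SPARQL end point, via graph-pattern queries rather than Python filtering.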
Martínez-Costa, Catalina; Cornet, Ronald; Karlsson, Daniel; Schulz, Stefan; Kalra, Dipak
2015-05-01
To improve semantic interoperability of electronic health records (EHRs) by ontology-based mediation across syntactically heterogeneous representations of the same or similar clinical information. Our approach is based on a semantic layer that consists of: (1) a set of ontologies supported by (2) a set of semantic patterns. The first aspect of the semantic layer helps standardize the clinical information modeling task and the second shields modelers from the complexity of ontology modeling. We applied this approach to heterogeneous representations of an excerpt of a heart failure summary. Using a set of finite top-level patterns to derive semantic patterns, we demonstrate that those patterns, or compositions thereof, can be used to represent information from clinical models. Homogeneous querying of the same or similar information, when represented according to heterogeneous clinical models, is feasible. Our approach focuses on the meaning embedded in EHRs, regardless of their structure. This complex task requires a clear ontological commitment (ie, agreement to consistently use the shared vocabulary within some context), together with formalization rules. These requirements are supported by semantic patterns. Other potential uses of this approach, such as clinical models validation, require further investigation. We show how an ontology-based representation of a clinical summary, guided by semantic patterns, allows homogeneous querying of heterogeneous information structures. Whether there are a finite number of top-level patterns is an open question. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Not Fade Away? Commentary to Paper "Education and The Semantic Web" ("IJAIED" Vol.14, 2004)
ERIC Educational Resources Information Center
Devedzic, Vladan
2016-01-01
If you ask me "Will Semantic Web 'ever' happen, in general, and specifically in education?", the best answer I can give you is "I don't know," but I know that today we are still far away from the hopes that I had when I wrote my paper "Education and The Semantic Web" (Devedzic 2004) more than 10 years ago. Much of the…
Privacy Preservation in Context-Aware Systems
2011-01-01
Policies and the Semantic Web: The Semantic Web refers to both a vision and a set of technologies. The vision was first articulated by Tim Berners-Lee ... (Berners-Lee 2005) is a distributed framework for describing and reasoning over policies in the Semantic Web. It supports N3 rules (Berners-Lee, Connolly 2008), (Berners-Lee et al. 2005) for representing interconnections between policies and resources and uses the CWM forward-chaining reasoner
From Science to e-Science to Semantic e-Science: A Heliophysics Case Study
NASA Technical Reports Server (NTRS)
Narock, Thomas; Fox, Peter
2011-01-01
The past few years have witnessed unparalleled efforts to make scientific data web accessible. The Semantic Web has proven invaluable in this effort; however, much of the literature is devoted to system design, ontology creation, and the trials and tribulations of current technologies. In order to fully develop the nascent field of Semantic e-Science we must also evaluate systems in real-world settings. We describe a case study within the field of Heliophysics and provide a comparison of the evolutionary stages of data discovery, from manual to semantically enabled. We describe the socio-technical implications of moving toward automated and intelligent data discovery. In doing so, we highlight how this process enhances what is currently being done manually in various scientific disciplines. Our case study illustrates that Semantic e-Science is more than just semantic search. The integration of search with web services, relational databases, and other cyberinfrastructure is a central tenet of our case study and one that we believe has applicability as a generalized research area within Semantic e-Science. This case study illustrates a specific example of the benefits, and limitations, of semantically replicating data discovery. We show examples of significant reductions in time and effort enabled by Semantic e-Science; yet, we argue that a "complete" solution requires integrating semantic search with other research areas such as data provenance and web services.
Semantic e-Learning: Next Generation of e-Learning?
NASA Astrophysics Data System (ADS)
Konstantinos, Markellos; Penelope, Markellou; Giannis, Koutsonikos; Aglaia, Liopa-Tsakalidi
Semantic e-learning aspires to be the next generation of e-learning, since the understanding of learning materials and knowledge semantics allows their advanced representation, manipulation, sharing, exchange and reuse, and ultimately promotes efficient online experiences for users. In this context, the paper first explores some fundamental Semantic Web technologies and then discusses current and potential applications of these technologies in the e-learning domain, namely Semantic portals, Semantic search, personalization, recommendation systems, social software and Web 2.0 tools. Finally, it highlights future research directions and open issues of the field.
The semantic web in translational medicine: current applications and future directions
Machado, Catia M.; Rebholz-Schuhmann, Dietrich; Freitas, Ana T.; Couto, Francisco M.
2015-01-01
Semantic web technologies offer an approach to data integration and sharing, even for resources developed independently or broadly distributed across the web. This approach is particularly suitable for scientific domains that profit from large amounts of data that reside in the public domain and that have to be exploited in combination. Translational medicine is such a domain, which in addition has to integrate private data from the clinical domain with proprietary data from the pharmaceutical domain. In this survey, we present the results of our analysis of translational medicine solutions that follow a semantic web approach. We assessed these solutions in terms of their target medical use case; the resources covered to achieve their objectives; and their use of existing semantic web resources for the purposes of data sharing, data interoperability and knowledge discovery. The semantic web technologies seem to fulfill their role in facilitating the integration and exploration of data from disparate sources, but it is also clear that simply using them is not enough. It is fundamental to reuse resources, to define mappings between resources, to share data and knowledge. All these aspects allow the instantiation of translational medicine at the semantic web-scale, thus resulting in a network of solutions that can share resources for a faster transfer of new scientific results into the clinical practice. The envisioned network of translational medicine solutions is on its way, but it still requires resolving the challenges of sharing protected data and of integrating semantic-driven technologies into the clinical practice. PMID:24197933
The semantic web in translational medicine: current applications and future directions.
Machado, Catia M; Rebholz-Schuhmann, Dietrich; Freitas, Ana T; Couto, Francisco M
2015-01-01
Semantic web technologies offer an approach to data integration and sharing, even for resources developed independently or broadly distributed across the web. This approach is particularly suitable for scientific domains that profit from large amounts of data that reside in the public domain and that have to be exploited in combination. Translational medicine is such a domain, which in addition has to integrate private data from the clinical domain with proprietary data from the pharmaceutical domain. In this survey, we present the results of our analysis of translational medicine solutions that follow a semantic web approach. We assessed these solutions in terms of their target medical use case; the resources covered to achieve their objectives; and their use of existing semantic web resources for the purposes of data sharing, data interoperability and knowledge discovery. The semantic web technologies seem to fulfill their role in facilitating the integration and exploration of data from disparate sources, but it is also clear that simply using them is not enough. It is fundamental to reuse resources, to define mappings between resources, to share data and knowledge. All these aspects allow the instantiation of translational medicine at the semantic web-scale, thus resulting in a network of solutions that can share resources for a faster transfer of new scientific results into the clinical practice. The envisioned network of translational medicine solutions is on its way, but it still requires resolving the challenges of sharing protected data and of integrating semantic-driven technologies into the clinical practice. © The Author 2013. Published by Oxford University Press.
A Framework for Building and Reasoning with Adaptive and Interoperable PMESII Models
2007-11-01
Description Logic; SOA: Service Oriented Architecture; SPARQL: Simple Protocol and RDF Query Language; SQL: Structured Query Language; SROM: Stability and ... another by providing a more expressive ontological structure for one of the models, e.g., semantic networks can be mapped to first-order logical ... Pellet is an open-source reasoner that works with OWL-DL. It accepts the SPARQL Protocol and RDF Query Language (SPARQL) and provides a Java API to
Analysis of Technique to Extract Data from the Web for Improved Performance
NASA Astrophysics Data System (ADS)
Gupta, Neena; Singh, Manish
2010-11-01
The World Wide Web is rapidly guiding the world into an amazing new electronic world, where everyone can publish anything in electronic form and extract almost all the information. Extraction of information from semi-structured or unstructured documents, such as web pages, is a useful yet complex task. Data extraction, which is important for many applications, extracts records from HTML files automatically. Ontologies can achieve a high degree of accuracy in data extraction. We analyze a method for data extraction, OBDE (Ontology-Based Data Extraction), which automatically extracts the query result records from the web with the help of agents. OBDE first constructs an ontology for a domain according to information matching between the query interfaces and query result pages from different web sites within the same domain. Then, the constructed domain ontology is used during data extraction to identify the query result section in a query result page and to align and label the data values in the extracted records. The ontology-assisted data extraction method is fully automatic and overcomes many of the deficiencies of current automatic data extraction methods.
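The labelling step the OBDE abstract describes, aligning extracted data values with ontology attributes, can be illustrated with a toy sketch. The tiny "ontology" below, mapping attribute names to recognisers, is invented for illustration and is not OBDE's actual representation:

```python
# Rough sketch: align extracted values in a query-result record with
# ontology attributes by testing each value against attribute recognisers.
import re

ONTOLOGY = {
    "price": lambda v: bool(re.fullmatch(r"\$\d+(\.\d{2})?", v)),
    "year":  lambda v: v.isdigit() and 1900 <= int(v) <= 2100,
    "title": lambda v: True,  # fallback: any free text
}

def label_record(values):
    """Assign the first matching ontology attribute to each value."""
    labelled = {}
    for v in values:
        for attr, matches in ONTOLOGY.items():
            if matches(v):
                labelled[v] = attr
                break
    return labelled

print(label_record(["$19.99", "2008", "Semantic Web Primer"]))
# → {'$19.99': 'price', '2008': 'year', 'Semantic Web Primer': 'title'}
```

In the full method the recognisers come from an ontology learned by matching query interfaces against result pages, rather than being hand-written.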
Semantic Web-based digital, field and virtual geological
NASA Astrophysics Data System (ADS)
Babaie, H. A.
2012-12-01
Digital, field and virtual Semantic Web-based education (SWBE) of geological mapping requires the construction of a set of searchable, reusable, and interoperable digital learning objects (LO) for learners, teachers, and authors. These self-contained units of learning may be text, image, or audio, describing, for example, how to calculate the true dip of a layer from two structural contours or find the apparent dip along a line of section. A collection of multi-media LOs can be integrated, through domain and task ontologies, with mapping-related learning activities and Web services, for example, to search for the description of lithostratigraphic units in an area, or plotting orientation data on stereonet. Domain ontologies (e.g., GeologicStructure, Lithostratigraphy, Rock) represent knowledge in formal languages (RDF, OWL) by explicitly specifying concepts, relations, and theories involved in geological mapping. These ontologies are used by task ontologies that formalize the semantics of computational tasks (e.g., measuring the true thickness of a formation) and activities (e.g., construction of cross section) for all actors to solve specific problems (making map, instruction, learning support, authoring). A SWBE system for geological mapping should also involve ontologies to formalize teaching strategy (pedagogical styles), learner model (e.g., for student performance, personalization of learning), interface (entry points for activities of all actors), communication (exchange of messages among different components and actors), and educational Web services (for interoperability). In this ontology-based environment, actors interact with the LOs through educational servers, that manage (reuse, edit, delete, store) ontologies, and through tools which communicate with Web services to collect resources and links to other tools. Digital geological mapping involves a location-based, spatial organization of geological elements in a set of GIS thematic layers. 
Each layer in the stack assembles a set of polygonal (e.g., formation, member, intrusion), linear (e.g., fault, contact), and/or point (e.g., sample or measurement site) geological elements. These feature classes, represented in domain ontologies by classes, have their own sets of property (attribute, association relation) and topological (e.g., overlap, adjacency, containment), and network (cross-cuttings; connectivity) relationships. Since geological mapping involves describing and depicting different aspects of each feature class (e.g., contact, formation, structure), the same geographic region may be investigated by different communities, for example, for its stratigraphy, rock type, structure, soil type, and isotopic and paleontological age, using sets of ontologies. These data can become interconnected applying the Semantic Web technologies, on the Linked Open Data Cloud, based on their underlying common geographic coordinates. Sets of geological data published on the Cloud will include multiple RDF links to Cloud's geospatial nodes such as GeoNames and Linked GeoData. During mapping, a device such as smartphone, laptop, or iPad, with GPS and GIS capability and a DBpedia Mobile client, can use the current position to discover and query all the geological linked data, and add new data to the thematic layers and publish them to the Cloud.
The ChEMBL database as linked open data
2013-01-01
Background: Making data available as Linked Data using the Resource Description Framework (RDF) promotes integration with other web resources. RDF documents can natively link to related data, and others can link back using Uniform Resource Identifiers (URIs). RDF makes the data machine-readable and uses extensible vocabularies for additional information, making it easier to scale up inference and data analysis. Results: This paper describes recent developments in an ongoing project converting data from the ChEMBL database into RDF triples. Relative to earlier versions, this updated version of ChEMBL-RDF uses recently introduced ontologies, including CHEMINF and CiTO; exposes more information from the database; and is now available as dereferenceable, linked data. To demonstrate these new features, we present novel use cases showing further integration with other web resources, including Bio2RDF, Chem2Bio2RDF, and ChemSpider, and showing the use of standard ontologies for querying. Conclusions: We have illustrated the advantages of using open standards and ontologies to link the ChEMBL database to other databases. Using those links and the knowledge encoded in standards and ontologies, the ChEMBL-RDF resource creates a foundation for integrated semantic web cheminformatics applications, such as the presented decision support. PMID:23657106
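The linking idea in the abstract above can be sketched in a few lines: RDF data is just a set of (subject, predicate, object) triples, and a cross-resource link is a triple whose object is a URI in another dataset. The URIs, predicate names, and the tiny in-memory "store" below are invented for illustration; real ChEMBL-RDF is served by a proper triple store.

```python
# Illustrative sketch: RDF as (subject, predicate, object) triples, with
# links to other resources expressed as ordinary triples. All URIs and
# predicate names here are hypothetical examples, not ChEMBL's actual ones.

CHEMBL = "http://example.org/chembl/"
CITO = "http://example.org/cito/"

triples = {
    (CHEMBL + "CHEMBL25", CHEMBL + "label", "aspirin"),
    (CHEMBL + "CHEMBL25", CITO + "citesAsDataSource", "http://example.org/chemspider/2157"),
    (CHEMBL + "CHEMBL25", CHEMBL + "targets", CHEMBL + "CHEMBL230"),
}

def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard,
    mirroring a basic SPARQL triple pattern."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Which external resources does this compound link out to?
links = match(triples, s=CHEMBL + "CHEMBL25", p=CITO + "citesAsDataSource")
```

The pattern-with-wildcards query is the same shape a SPARQL engine evaluates over a dereferenceable RDF graph.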
Web service discovery among large service pools utilising semantic similarity and clustering
NASA Astrophysics Data System (ADS)
Chen, Fuzan; Li, Minqiang; Wu, Harris; Xie, Lingli
2017-03-01
With the rapid development of electronic business, Web services have attracted much attention in recent years. Enterprises can combine individual Web services to provide new value-added services. An emerging challenge is the timely discovery of close matches to service requests among large service pools. In this study, we first define a new semantic similarity measure combining functional similarity and process similarity. We then present a service discovery mechanism that utilises the new semantic similarity measure for service matching. All published Web services are pre-grouped into functional clusters prior to the matching process. For a user's service request, the discovery mechanism first identifies matching service clusters and then identifies the best matching Web services within those clusters. Experimental results show that the proposed semantic discovery mechanism performs better than a conventional lexical similarity-based mechanism.
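The two-stage discovery described above can be sketched as: match a request to the closest cluster representative first, then rank only that cluster's members. The similarity here is a simple weighted mix of term overlaps standing in for the paper's functional/process measure; the weights, service names, and data are all invented.

```python
# Sketch of cluster-then-rank service discovery. The weighted Jaccard
# combination is a stand-in for the paper's semantic similarity measure;
# all services, weights, and terms are hypothetical.

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def similarity(req, svc, w_func=0.6, w_proc=0.4):
    # Weighted combination of functional-term and process-step overlap.
    return (w_func * jaccard(req["functions"], svc["functions"])
            + w_proc * jaccard(req["process"], svc["process"]))

def discover(request, clusters):
    # Stage 1: pick the cluster whose representative best matches.
    best = max(clusters, key=lambda c: similarity(request, c["centroid"]))
    # Stage 2: rank services inside that cluster only.
    return sorted(best["services"],
                  key=lambda s: similarity(request, s), reverse=True)

clusters = [
    {"centroid": {"functions": ["pay", "invoice"], "process": ["auth", "charge"]},
     "services": [
         {"name": "PayFast", "functions": ["pay", "invoice"], "process": ["auth", "charge"]},
         {"name": "BillCo", "functions": ["invoice"], "process": ["charge"]}]},
    {"centroid": {"functions": ["ship", "track"], "process": ["label", "dispatch"]},
     "services": [
         {"name": "ShipIt", "functions": ["ship"], "process": ["dispatch"]}]},
]
request = {"functions": ["pay"], "process": ["auth", "charge"]}
ranked = discover(request, clusters)
```

Pre-clustering means each request is scored against cluster representatives plus one cluster's members, rather than the whole pool.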
Computational toxicology using the OpenTox application programming interface and Bioclipse
2011-01-01
Background: Toxicity is a complex phenomenon involving the potential adverse effect on a range of biological functions. Predicting toxicity involves using a combination of experimental data (endpoints) and computational methods to generate a set of predictive models. Such models rely strongly on being able to integrate information from many sources. The required integration of biological and chemical information sources requires, however, a common language to express our knowledge ontologically, and interoperating services to build reliable predictive toxicology applications. Findings: This article describes progress in extending the integrative bio- and cheminformatics platform Bioclipse to interoperate with OpenTox, a semantic web framework which supports open data exchange and toxicology model building. The Bioclipse workbench environment enables functionality from OpenTox web services and easy access to OpenTox resources for evaluating toxicity properties of query molecules. Relevant cases and interfaces based on ten neurotoxins are described to demonstrate the capabilities provided to the user. The integration takes advantage of semantic web technologies, thereby providing an open and simplifying communication standard. Additionally, the use of ontologies ensures proper interoperation and reliable integration of toxicity information from both experimental and computational sources. Conclusions: A novel computational toxicity assessment platform was generated from integration of two open science platforms related to toxicology: Bioclipse, which combines a rich scriptable and graphical workbench environment for integration of diverse sets of information sources, and OpenTox, a platform for interoperable toxicology data and computational services. The combination provides improved reliability and operability for handling large data sets by the use of the Open Standards from the OpenTox Application Programming Interface.
This enables simultaneous access to a variety of distributed predictive toxicology databases, and algorithm and model resources, taking advantage of the Bioclipse workbench handling the technical layers. PMID:22075173
The Analysis of RDF Semantic Data Storage Optimization in Large Data Era
NASA Astrophysics Data System (ADS)
He, Dandan; Wang, Lijuan; Wang, Can
2018-03-01
With the continuous development of information technology and network technology in China, the Internet has ushered in the era of large data. To acquire information effectively in this era, it is necessary to optimize existing RDF semantic data storage and enable efficient querying of various data. This paper discusses the storage optimization of RDF semantic data in the large data era.
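One widely used storage optimization of the kind this abstract alludes to is keeping each triple in several permutation indexes (SPO, POS, OSP), so that any access pattern becomes a dictionary lookup instead of a full scan. The sketch below shows the structure only; production stores add dictionary-encoded IDs and compression, and the example data is invented.

```python
# Sketch of a permutation-indexed triple store: SPO, POS and OSP indexes
# let the common lookup patterns be answered without scanning.
from collections import defaultdict

class TripleStore:
    def __init__(self):
        self.spo = defaultdict(lambda: defaultdict(set))
        self.pos = defaultdict(lambda: defaultdict(set))
        self.osp = defaultdict(lambda: defaultdict(set))

    def add(self, s, p, o):
        # Every triple is written into all three permutations.
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.osp[o][s].add(p)

    def objects(self, s, p):
        return self.spo[s][p]   # answers pattern (s, p, ?)

    def subjects(self, p, o):
        return self.pos[p][o]   # answers pattern (?, p, o)

store = TripleStore()
store.add("ex:aspirin", "ex:treats", "ex:pain")
store.add("ex:aspirin", "ex:type", "ex:Drug")
store.add("ex:ibuprofen", "ex:type", "ex:Drug")
```

The trade-off is classic: roughly triple the write cost and memory in exchange for constant-time pattern lookups on the query side.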
The neural and computational bases of semantic cognition.
Ralph, Matthew A Lambon; Jefferies, Elizabeth; Patterson, Karalyn; Rogers, Timothy T
2017-01-01
Semantic cognition refers to our ability to use, manipulate and generalize knowledge that is acquired over the lifespan to support innumerable verbal and non-verbal behaviours. This Review summarizes key findings and issues arising from a decade of research into the neurocognitive and neurocomputational underpinnings of this ability, leading to a new framework that we term controlled semantic cognition (CSC). CSC offers solutions to long-standing queries in philosophy and cognitive science, and yields a convergent framework for understanding the neural and computational bases of healthy semantic cognition and its dysfunction in brain disorders.
Exploiting salient semantic analysis for information retrieval
NASA Astrophysics Data System (ADS)
Luo, Jing; Meng, Bo; Quan, Changqin; Tu, Xinhui
2016-11-01
Recently, many Wikipedia-based methods have been proposed to improve the performance of different natural language processing (NLP) tasks, such as semantic relatedness computation, text classification and information retrieval. Among these methods, salient semantic analysis (SSA) has been proven to be an effective way to generate conceptual representations for words or documents. However, its feasibility and effectiveness in information retrieval are mostly unknown. In this paper, we study how to efficiently use SSA to improve information retrieval performance, and propose an SSA-based retrieval method under the language model framework. First, the SSA model is adopted to build conceptual representations for documents and queries. Then, these conceptual representations and the bag-of-words (BOW) representations are used in combination to estimate the language models of queries and documents. The proposed method is evaluated on several standard Text REtrieval Conference (TREC) collections, and the experimental results show that the proposed models consistently outperform existing Wikipedia-based retrieval methods.
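The combination step described above is typically a linear interpolation of two language models, P(t|d) = λ·P_bow(t|d) + (1−λ)·P_concept(t|d), scored by query likelihood. The sketch below shows that mechanism only; the probabilities, mixing weight, and smoothing constant are invented, not values from the paper.

```python
# Sketch of interpolating a bag-of-words language model with a
# concept-based one, then scoring by query log-likelihood.
# All probabilities and lambda are illustrative.
import math

def interpolate(p_bow, p_concept, lam=0.7):
    return {t: lam * p_bow.get(t, 0.0) + (1 - lam) * p_concept.get(t, 0.0)
            for t in set(p_bow) | set(p_concept)}

def query_log_likelihood(query_terms, p_doc, eps=1e-9):
    # Query-likelihood retrieval: sum of log probabilities of query terms.
    return sum(math.log(p_doc.get(t, 0.0) + eps) for t in query_terms)

p_bow = {"heart": 0.2, "attack": 0.1}        # surface-term model
p_concept = {"heart": 0.1, "cardiac": 0.3}   # SSA-style concept model
p_mixed = interpolate(p_bow, p_concept)
```

The benefit is visible when a query uses a concept word ("cardiac") absent from the document's surface text: the mixed model still assigns it mass.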
Semantic-Web Technology: Applications at NASA
NASA Technical Reports Server (NTRS)
Ashish, Naveen
2004-01-01
We provide a description of work at the National Aeronautics and Space Administration (NASA) on building systems based on semantic-web concepts and technologies. NASA has been one of the early adopters of semantic-web technologies for practical applications. Indeed, there are several ongoing endeavors on building semantics-based systems for use in diverse NASA domains, ranging from collaborative scientific activity to accident and mishap investigation, enterprise search, scientific information gathering and integration, and aviation safety decision support. We provide a brief overview of many applications and ongoing work with the goal of informing the external community of these NASA endeavors.
Workspaces in the Semantic Web
NASA Technical Reports Server (NTRS)
Wolfe, Shawn R.; Keller, RIchard M.
2005-01-01
Due to the recency and relatively limited adoption of Semantic Web technologies, practical issues related to technology scaling have received less attention than foundational issues. Nonetheless, these issues must be addressed if the Semantic Web is to realize its full potential. In particular, we concentrate on the lack of scoping methods that reduce the size of semantic information spaces so they are more efficient to work with and more relevant to an agent's needs. We provide some intuition to motivate the need for such reduced information spaces, called workspaces, give a formal definition, and suggest possible methods of deriving them.
Legaz-García, María del Carmen; Martínez-Costa, Catalina; Menárguez-Tortosa, Marcos; Fernández-Breis, Jesualdo Tomás
2012-01-01
Linking Electronic Healthcare Records (EHR) content to educational materials has been considered a key international recommendation to enable clinical engagement and to promote patient safety. This would help citizens access reliable information available on the web and guide them to it properly. In this paper, we describe an approach in that direction, based on the use of dual-model EHR standards and standardized educational contents. The recommendation method will be based on the semantic coverage of the learning content repository for a particular archetype, which will be calculated by applying semantic web technologies such as ontologies and semantic annotations.
Chan, Emily H; Sahai, Vikram; Conrad, Corrie; Brownstein, John S
2011-05-01
A variety of obstacles including bureaucracy and lack of resources have interfered with timely detection and reporting of dengue cases in many endemic countries. Surveillance efforts have turned to modern data sources, such as Internet search queries, which have been shown to be effective for monitoring influenza-like illnesses. However, few have evaluated the utility of web search query data for other diseases, especially those of high morbidity and mortality or where a vaccine may not exist. In this study, we aimed to assess whether web search queries are a viable data source for the early detection and monitoring of dengue epidemics. Bolivia, Brazil, India, Indonesia and Singapore were chosen for analysis based on available data and adequate search volume. For each country, a univariate linear model was then built by fitting a time series of the fraction of Google search query volume for specific dengue-related queries from that country against a time series of official dengue case counts for a time-frame within 2003-2010. The specific combination of queries used was chosen to maximize model fit. Spurious spikes in the data were also removed prior to model fitting. The final models, fit using a training subset of the data, were cross-validated against both the overall dataset and a holdout subset of the data. All models were found to fit the data quite well, with validation correlations ranging from 0.82 to 0.99. Web search query data were found to be capable of tracking dengue activity in Bolivia, Brazil, India, Indonesia and Singapore. Whereas traditional dengue data from official sources are often not available until after some substantial delay, web search query data are available in near real-time. These data represent a valuable complement to assist with traditional dengue surveillance.
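The modelling step above, a univariate linear fit of case counts against query volume, validated by correlation, can be sketched directly. The numbers below are invented for illustration; the study fits real per-country Google query fractions against official counts.

```python
# Sketch of fitting a univariate linear model of dengue case counts
# against search-query volume and checking fit with Pearson's r.
# The data points are made up for illustration.
import math

search_volume = [0.10, 0.15, 0.30, 0.55, 0.40, 0.20]  # query fraction
case_counts = [120, 180, 350, 640, 470, 240]          # official cases

def ols(xs, ys):
    # Ordinary least squares for y = slope * x + intercept.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

slope, intercept = ols(search_volume, case_counts)
r = pearson(search_volume, case_counts)
```

Once fitted on a training window, `slope * volume + intercept` gives a near-real-time case estimate from the latest query volume.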
Comparing image search behaviour in the ARRS GoldMiner search engine and a clinical PACS/RIS.
De-Arteaga, Maria; Eggel, Ivan; Do, Bao; Rubin, Daniel; Kahn, Charles E; Müller, Henning
2015-08-01
Information search has changed the way we manage knowledge and the ubiquity of information access has made search a frequent activity, whether via Internet search engines or increasingly via mobile devices. Medical information search is in this respect no different and much research has been devoted to analyzing the way in which physicians aim to access information. Medical image search is a much smaller domain but has gained much attention as it has different characteristics than search for text documents. While web search log files have been analysed many times to better understand user behaviour, the log files of hospital internal systems for search in a PACS/RIS (Picture Archival and Communication System, Radiology Information System) have rarely been analysed. Such a comparison between a hospital PACS/RIS search and a web system for searching images of the biomedical literature is the goal of this paper. Objectives are to identify similarities and differences in search behaviour of the two systems, which could then be used to optimize existing systems and build new search engines. Log files of the ARRS GoldMiner medical image search engine (freely accessible on the Internet) containing 222,005 queries, and log files of Stanford's internal PACS/RIS search called radTF containing 18,068 queries were analysed. Each query was preprocessed and all query terms were mapped to the RadLex (Radiology Lexicon) terminology, a comprehensive lexicon of radiology terms created and maintained by the Radiological Society of North America, so the semantic content in the queries and the links between terms could be analysed, and synonyms for the same concept could be detected. RadLex was mainly created for use in radiology reports, to aid structured reporting and the preparation of educational material (Langlotz, 2006) [1].
In standard medical vocabularies such as MeSH (Medical Subject Headings) and UMLS (Unified Medical Language System), specific terms of radiology are often underrepresented; therefore, RadLex was considered to be the best option for this task. The results show a surprising similarity between the usage behaviour in the two systems, but several subtle differences can also be noted. The average number of terms per query is 2.21 for GoldMiner and 2.07 for radTF; the RadLex axes used (anatomy, pathology, findings, …) have almost the same distribution, with clinical findings being the most frequent and the anatomical entity the second; also, combinations of RadLex axes are extremely similar between the two systems. Differences include longer sessions in radTF than in GoldMiner (3.4 and 1.9 queries per session on average). Several frequent search terms overlap but some strong differences exist in the details. In radTF the term "normal" is frequent, whereas in GoldMiner it is not. This makes intuitive sense, as in the literature normal cases are rarely described, whereas in clinical work the comparison with normal cases is often a first step. The general similarity in many points is likely due to the fact that users of the two systems are influenced by their daily behaviour in using standard web search engines and follow this behaviour in their professional search. This means that many results and insights gained from standard web search can likely be transferred to more specialized search systems. Still, specialized log files can be used to learn more about reformulations and the detailed strategies users employ to find the right content. Copyright © 2015 Elsevier Inc. All rights reserved.
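The core log-analysis step above, normalizing queries, mapping tokens to a controlled vocabulary, and computing statistics such as average terms per query, is simple to sketch. The tiny lexicon below stands in for RadLex and the queries are invented.

```python
# Sketch of query-log analysis: tokenize queries, map tokens to a
# controlled vocabulary (a stand-in for RadLex axes), and compute
# average terms per query. Lexicon and log lines are invented.

LEXICON = {"pneumonia": "finding", "chest": "anatomy",
           "fracture": "finding", "femur": "anatomy", "normal": "finding"}

def analyze(queries):
    mapped, total_terms = [], 0
    for q in queries:
        terms = q.lower().split()
        total_terms += len(terms)
        # Pair each token with its vocabulary axis (None if unmapped).
        mapped.append([(t, LEXICON.get(t)) for t in terms])
    avg = total_terms / len(queries)
    return avg, mapped

avg_terms, mapped = analyze(["chest pneumonia", "femur fracture", "normal chest"])
```

With both logs run through the same pipeline, statistics such as the 2.21 vs 2.07 terms-per-query figures become directly comparable.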
Versioning System for Distributed Ontology Development
2016-02-02
Semantic Web community. For example, the distributed and isolated development requirement may apply to non-cyber range communities of public ontology… semantic web.” However, we observe that the maintenance of an ontology and its reuse is not a high priority for the majority of the publicly available… (Semantic) Web. AAAI Spring Symposium: Symbiotic Relationships between Semantic Web and Knowledge Engineering. 2008. [LHK09] Matthias Loskyll
The SBOL Stack: A Platform for Storing, Publishing, and Sharing Synthetic Biology Designs.
Madsen, Curtis; McLaughlin, James Alastair; Mısırlı, Göksel; Pocock, Matthew; Flanagan, Keith; Hallinan, Jennifer; Wipat, Anil
2016-06-17
Recently, synthetic biologists have developed the Synthetic Biology Open Language (SBOL), a data exchange standard for descriptions of genetic parts, devices, modules, and systems. The goals of this standard are to allow scientists to exchange designs of biological parts and systems, to facilitate the storage of genetic designs in repositories, and to facilitate the description of genetic designs in publications. In order to achieve these goals, the development of an infrastructure to store, retrieve, and exchange SBOL data is necessary. To address this problem, we have developed the SBOL Stack, a Resource Description Framework (RDF) database specifically designed for the storage, integration, and publication of SBOL data. This database allows users to define a library of synthetic parts and designs as a service, to share SBOL data with collaborators, and to store designs of biological systems locally. The database also allows external data sources to be integrated by mapping them to the SBOL data model. The SBOL Stack includes two Web interfaces: the SBOL Stack API and SynBioHub. While the former is designed for developers, the latter allows users to upload new SBOL biological designs, download SBOL documents, search by keyword, and visualize SBOL data. Since the SBOL Stack is based on semantic Web technology, the inherent distributed querying functionality of RDF databases can be used to allow different SBOL stack databases to be queried simultaneously, and therefore, data can be shared between different institutes, centers, or other users.
Multidatabase Query Processing with Uncertainty in Global Keys and Attribute Values.
ERIC Educational Resources Information Center
Scheuermann, Peter; Li, Wen-Syan; Clifton, Chris
1998-01-01
Presents an approach for dynamic database integration and query processing in the absence of information about attribute correspondences and global IDs. Defines different types of equivalence conditions for the construction of global IDs. Proposes a strategy based on ranked role-sets that makes use of an automated semantic integration procedure…
SkyQuery - A Prototype Distributed Query and Cross-Matching Web Service for the Virtual Observatory
NASA Astrophysics Data System (ADS)
Thakar, A. R.; Budavari, T.; Malik, T.; Szalay, A. S.; Fekete, G.; Nieto-Santisteban, M.; Haridas, V.; Gray, J.
2002-12-01
We have developed a prototype distributed query and cross-matching service for the VO community, called SkyQuery, which is implemented with hierarchical Web Services. SkyQuery enables astronomers to run combined queries on existing distributed heterogeneous astronomy archives. SkyQuery provides a simple, user-friendly interface to run distributed queries over the federation of registered astronomical archives in the VO. The SkyQuery client connects to the portal Web Service, which farms the query out to the individual archives, which are also Web Services called SkyNodes. The cross-matching algorithm is run recursively on each SkyNode. Each archive is a relational DBMS with an HTM index for fast spatial lookups. The results of the distributed query are returned as an XML DataSet that is automatically rendered by the client. SkyQuery also returns the image cutout corresponding to the query result. SkyQuery finds not only matches between the various catalogs, but also dropouts - objects that exist in some of the catalogs but not in others. This is often as important as finding matches. We demonstrate the utility of SkyQuery with a brown-dwarf search between SDSS and 2MASS, and a search for radio-quiet quasars in SDSS, 2MASS and FIRST. The importance of a service like SkyQuery for the worldwide astronomical community cannot be overstated: data on the same objects in various archives are mapped in different wavelength ranges and look very different due to different errors, instrument sensitivities and other peculiarities of each archive. Our cross-matching algorithm performs a fuzzy spatial join across multiple catalogs. This type of cross-matching is currently often done by eye, one object at a time. A static cross-identification table for a set of archives would become obsolete by the time it was built - the exponential growth of astronomical data means that a dynamic cross-identification mechanism like SkyQuery is the only viable option.
SkyQuery was funded by a grant from the NASA AISR program.
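The fuzzy spatial join described above can be sketched as pairing objects from two catalogs whose angular separation falls below a tolerance, with unmatched objects reported as dropouts. The coordinates, catalog names, and the 1-arcsecond tolerance below are illustrative, not SkyQuery's actual algorithm or data.

```python
# Sketch of a fuzzy spatial cross-match between two small catalogs:
# objects within an angular tolerance are matched; the rest are
# "dropouts". Tolerance and coordinates are invented.
import math

def ang_sep_deg(ra1, dec1, ra2, dec2):
    # Spherical angular separation (degrees) via the haversine formula.
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h)))

def cross_match(cat_a, cat_b, tol_deg=1 / 3600):  # 1 arcsecond
    matches, dropouts = [], []
    for ida, ra, dec in cat_a:
        near = [(idb, ang_sep_deg(ra, dec, rb, db))
                for idb, rb, db in cat_b
                if ang_sep_deg(ra, dec, rb, db) <= tol_deg]
        if near:
            matches.append((ida, min(near, key=lambda x: x[1])[0]))
        else:
            dropouts.append(ida)
    return matches, dropouts

sdss = [("s1", 180.00000, 0.00000), ("s2", 180.10000, 0.10000)]
twomass = [("t1", 180.00001, 0.00001)]
matches, dropouts = cross_match(sdss, twomass)
```

A real service replaces the quadratic scan with a spatial index (such as the HTM index the abstract mentions) so each object probes only nearby candidates.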
A Semantic Parsing Method for Mapping Clinical Questions to Logical Forms
Roberts, Kirk; Patra, Braja Gopal
2017-01-01
This paper presents a method for converting natural language questions about structured data in the electronic health record (EHR) into logical forms. The logical forms can then subsequently be converted to EHR-dependent structured queries. The natural language processing task, known as semantic parsing, has the potential to convert questions to logical forms with extremely high precision, resulting in a system that is usable and trusted by clinicians for real-time use in clinical settings. We propose a hybrid semantic parsing method, combining rule-based methods with a machine learning-based classifier. The overall semantic parsing precision on a set of 212 questions is 95.6%. The parser’s rules furthermore allow it to “know what it does not know”, enabling the system to indicate when unknown terms prevent it from understanding the question’s full logical structure. When combined with a module for converting a logical form into an EHR-dependent query, this high-precision approach allows for a question answering system to provide a user with a single, verifiably correct answer. PMID:29854217
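The "know what it does not know" behaviour above can be sketched with a rule-based core: patterns map questions to logical forms, and the parser declines (rather than guesses) when a term is outside its lexicon. The patterns, predicates, and lexicon below are invented stand-ins, not the paper's actual grammar.

```python
# Sketch of high-precision rule-based semantic parsing with an explicit
# refusal path for unknown terms. Rules and lexicon are hypothetical.
import re

RULES = [
    (re.compile(r"what is the latest (\w+)\?"), "latest(measurement={0})"),
    (re.compile(r"when was the last (\w+)\?"), "time(latest(event={0}))"),
]
KNOWN_TERMS = {"potassium", "x-ray", "dose"}

def parse(question):
    q = question.lower().strip()
    for pattern, template in RULES:
        m = pattern.fullmatch(q)
        if m:
            term = m.group(1)
            if term not in KNOWN_TERMS:
                return None  # unknown term: decline rather than guess
            return template.format(term)
    return None              # no rule matched: also decline
```

Returning `None` in both failure modes is what lets a downstream question-answering system either answer verifiably or say it cannot, instead of emitting a plausible but wrong query.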
A Geospatial Semantic Enrichment and Query Service for Geotagged Photographs
Ennis, Andrew; Nugent, Chris; Morrow, Philip; Chen, Liming; Ioannidis, George; Stan, Alexandru; Rachev, Preslav
2015-01-01
With the increasing abundance of technologies and smart devices, equipped with a multitude of sensors for sensing the environment around them, information creation and consumption have now become effortless. This, in particular, is the case for photographs, with vast amounts being created and shared every day. For example, at the time of this writing, Instagram users upload 70 million photographs a day. Nevertheless, it still remains a challenge to discover the "right" information for the appropriate purpose. This paper describes an approach to create semantic geospatial metadata for photographs, which can facilitate photograph search and discovery. To achieve this we have developed and implemented a semantic geospatial data model by which a photograph can be enriched with geospatial metadata extracted from several geospatial data sources, based on the raw low-level geo-metadata from a smartphone photograph. We present the details of our method and implementation for searching and querying the semantic geospatial metadata repository to enable a user or third-party system to find the information they are looking for. PMID:26205265
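The enrichment step described above, starting from a photograph's raw latitude/longitude and attaching higher-level metadata from geospatial sources, can be sketched as a nearest-place lookup against a gazetteer-like table. The places, coordinates, and distance cut-off below are invented stand-ins for real geospatial services.

```python
# Sketch of geospatial enrichment: attach the nearest gazetteer entry
# (within a cut-off) to a photo's raw GPS metadata. All data invented.
import math

PLACES = [
    {"name": "City Park", "type": "park", "lat": 54.597, "lon": -5.930},
    {"name": "Harbour Museum", "type": "museum", "lat": 54.602, "lon": -5.910},
]

def dist_km(lat1, lon1, lat2, lon2):
    # Equirectangular approximation: adequate at city scale.
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return 6371 * math.hypot(x, y)

def enrich(photo, places=PLACES, max_km=1.0):
    nearest = min(places, key=lambda p: dist_km(photo["lat"], photo["lon"],
                                                p["lat"], p["lon"]))
    if dist_km(photo["lat"], photo["lon"], nearest["lat"], nearest["lon"]) <= max_km:
        return {**photo, "near": nearest["name"], "place_type": nearest["type"]}
    return photo  # nothing close enough: leave the raw metadata alone

photo = enrich({"lat": 54.598, "lon": -5.929})
```

Once the semantic fields are attached, queries like "photos taken near parks" become simple filters over the enriched metadata rather than raw-coordinate arithmetic.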
Query Classification and Study of University Students' Search Trends
ERIC Educational Resources Information Center
Maabreh, Majdi A.; Al-Kabi, Mohammed N.; Alsmadi, Izzat M.
2012-01-01
Purpose: This study is an attempt to develop an automatic identification method for Arabic web queries and divide them into several query types using data mining. In addition, it seeks to evaluate the impact of the academic environment on using the internet. Design/methodology/approach: The web log files were collected from one of the higher…
Ontology-Driven Provenance Management in eScience: An Application in Parasite Research
NASA Astrophysics Data System (ADS)
Sahoo, Satya S.; Weatherly, D. Brent; Mutharaju, Raghava; Anantharam, Pramod; Sheth, Amit; Tarleton, Rick L.
Provenance, from the French word "provenir", describes the lineage or history of a data entity. Provenance is critical information in scientific applications to verify experiment process, validate data quality and associate trust values with scientific results. Current industrial scale eScience projects require an end-to-end provenance management infrastructure. This infrastructure needs to be underpinned by formal semantics to enable analysis of large scale provenance information by software applications. Further, effective analysis of provenance information requires well-defined query mechanisms to support complex queries over large datasets. This paper introduces an ontology-driven provenance management infrastructure for biology experiment data, as part of the Semantic Problem Solving Environment (SPSE) for Trypanosoma cruzi (T.cruzi). This provenance infrastructure, called T.cruzi Provenance Management System (PMS), is underpinned by (a) a domain-specific provenance ontology called Parasite Experiment ontology, (b) specialized query operators for provenance analysis, and (c) a provenance query engine. The query engine uses a novel optimization technique based on materialized views called materialized provenance views (MPV) to scale with increasing data size and query complexity. This comprehensive ontology-driven provenance infrastructure not only allows effective tracking and management of ongoing experiments in the Tarleton Research Group at the Center for Tropical and Emerging Global Diseases (CTEGD), but also enables researchers to retrieve the complete provenance information of scientific results for publication in literature.
A Semantic Medical Multimedia Retrieval Approach Using Ontology Information Hiding
Guo, Kehua; Zhang, Shigeng
2013-01-01
Searching useful information from unstructured medical multimedia data has been a difficult problem in information retrieval. This paper reports an effective semantic medical multimedia retrieval approach which can reflect the users' query intent. Firstly, semantic annotations will be given to the multimedia documents in the medical multimedia database. Secondly, the ontology that represented semantic information will be hidden in the head of the multimedia documents. The main innovations of this approach are cross-type retrieval support and semantic information preservation. Experimental results indicate a good precision and efficiency of our approach for medical multimedia retrieval in comparison with some traditional approaches. PMID:24082915
TopFed: TCGA tailored federated query processing and linking to LOD.
Saleem, Muhammad; Padmanabhuni, Shanmukha S; Ngomo, Axel-Cyrille Ngonga; Iqbal, Aftab; Almeida, Jonas S; Decker, Stefan; Deus, Helena F
2014-01-01
The Cancer Genome Atlas (TCGA) is a multidisciplinary, multi-institutional effort to catalogue genetic mutations responsible for cancer using genome analysis techniques. One of the aims of this project is to create a comprehensive and open repository of cancer related molecular analysis, to be exploited by bioinformaticians towards advancing cancer knowledge. However, devising bioinformatics applications to analyse such a large dataset is still challenging, as it often requires downloading large archives and parsing the relevant text files. This makes it difficult to enable virtual data integration in order to collect the critical co-variates necessary for analysis. We address these issues by transforming the TCGA data into the Semantic Web standard Resource Description Framework (RDF), linking it to relevant datasets in the Linked Open Data (LOD) cloud, and further proposing an efficient data distribution strategy to host the resulting 20.4 billion triples via several SPARQL endpoints. Having the TCGA data distributed across multiple SPARQL endpoints, we enable biomedical scientists to query and retrieve information from these endpoints by proposing a TCGA-tailored federated SPARQL query processing engine named TopFed. We compare TopFed with the well-established federation engine FedX in terms of source selection and query execution time by using 10 different federated SPARQL queries with varying requirements. Our evaluation results show that TopFed selects on average less than half of the sources (with 100% recall) with query execution time equal to one third of that of FedX. With TopFed, we aim to offer biomedical scientists a single point of access through which distributed TCGA data can be accessed in unison.
We believe the proposed system can greatly help researchers in the biomedical domain to carry out their research effectively with TCGA as the amount and diversity of data exceeds the ability of local resources to handle its retrieval and parsing.
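The source-selection idea the evaluation above measures can be sketched as an index-assisted router: each endpoint advertises which predicates it can answer, and a triple pattern is sent only to endpoints covering its predicate. The endpoint names and predicates below are invented, not TopFed's actual catalog.

```python
# Sketch of index-assisted source selection for federated SPARQL:
# route each triple pattern only to endpoints whose predicate index
# covers it. Endpoints and predicates are hypothetical.

SOURCE_INDEX = {
    "endpoint_clinical": {"tcga:vital_status", "tcga:days_to_death"},
    "endpoint_expression": {"tcga:rpkm", "tcga:gene"},
    "endpoint_methylation": {"tcga:beta_value"},
}

def select_sources(triple_patterns):
    plan = {}
    for s, p, o in triple_patterns:
        candidates = [ep for ep, preds in SOURCE_INDEX.items() if p in preds]
        # Unknown predicate: fall back to asking every endpoint.
        plan[(s, p, o)] = candidates or list(SOURCE_INDEX)
    return plan

plan = select_sources([("?p", "tcga:vital_status", "?v"),
                       ("?p", "tcga:rpkm", "?e")])
```

Pruning sources this way is what drives down both the number of endpoints contacted and, in turn, query execution time, the two metrics on which TopFed is compared with FedX.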
Ontology Alignment Architecture for Semantic Sensor Web Integration
Fernandez, Susel; Marsa-Maestre, Ivan; Velasco, Juan R.; Alarcos, Bernardo
2013-01-01
Sensor networks are a concept that has become very popular in data acquisition and processing for multiple applications in different fields such as industrial, medicine, home automation, environmental detection, etc. Today, with the proliferation of small communication devices with sensors that collect environmental data, semantic Web technologies are becoming closely related with sensor networks. The linking of elements from Semantic Web technologies with sensor networks has been called Semantic Sensor Web and has among its main features the use of ontologies. One of the key challenges of using ontologies in sensor networks is to provide mechanisms to integrate and exchange knowledge from heterogeneous sources (that is, dealing with semantic heterogeneity). Ontology alignment is the process of bringing ontologies into mutual agreement by the automatic discovery of mappings between related concepts. This paper presents a system for ontology alignment in the Semantic Sensor Web which uses fuzzy logic techniques to combine similarity measures between entities of different ontologies. The proposed approach focuses on two key elements: the terminological similarity, which takes into account the linguistic and semantic information of the context of the entity's names, and the structural similarity, based on both the internal and relational structure of the concepts. This work has been validated using sensor network ontologies and the Ontology Alignment Evaluation Initiative (OAEI) tests. The results show that the proposed techniques outperform previous approaches in terms of precision and recall. PMID:24051523
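The aggregation step above, combining a terminological score and a structural score and keeping pairs above a threshold, can be sketched with a weighted mean standing in for the paper's fuzzy combination. The entity names, scores, weights, and threshold below are all invented.

```python
# Sketch of similarity aggregation in ontology alignment: combine
# terminological and structural similarity (weighted mean as a stand-in
# for the fuzzy combination) and keep pairs above a threshold.

def combine(term_sim, struct_sim, w_term=0.5, w_struct=0.5):
    return w_term * term_sim + w_struct * struct_sim

def align(pairs, threshold=0.6):
    # pairs: {(entity_a, entity_b): (term_sim, struct_sim)}
    return {p: combine(*sims) for p, sims in pairs.items()
            if combine(*sims) >= threshold}

mappings = align({
    ("sn:Temperature", "ssn:TemperatureSensor"): (0.8, 0.6),
    ("sn:Humidity", "ssn:WindSpeed"): (0.1, 0.3),
})
```

Tuning the weights and threshold trades precision against recall, which is exactly the axis on which the OAEI benchmark scores alignment systems.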
Mapping between the OBO and OWL ontology languages.
Tirmizi, Syed Hamid; Aitken, Stuart; Moreira, Dilvan A; Mungall, Chris; Sequeda, Juan; Shah, Nigam H; Miranker, Daniel P
2011-03-07
Ontologies are commonly used in biomedicine to organize concepts to describe domains such as anatomies, environments, experiments, taxonomies, etc. NCBO BioPortal currently hosts about 180 different biomedical ontologies. These ontologies have been mainly expressed in either the Open Biomedical Ontology (OBO) format or the Web Ontology Language (OWL). OBO emerged from the Gene Ontology, and supports most of the biomedical ontology content. In comparison, OWL is a Semantic Web language, and is supported by the World Wide Web Consortium together with integral query languages, rule languages and distributed infrastructure for information interchange. These features are highly desirable for the OBO content as well. A convenient method for leveraging these features for OBO ontologies is to transform OBO ontologies to OWL. We have developed a methodology for translating OBO ontologies to OWL using the organization of the Semantic Web itself to guide the work. The approach reveals that the constructs of OBO can be grouped together to form a similar layer cake. Thus we were able to decompose the problem into two parts. Most OBO constructs have an easy and obvious equivalence to a construct in OWL; a small subset of OBO constructs requires deeper consideration. We have defined transformations for all constructs in an effort to foster a standard common mapping between OBO and OWL. Our mapping produces OWL-DL, a Description Logics based subset of OWL with desirable computational properties for efficiency and correctness. Our Java implementation of the mapping is part of the official Gene Ontology project source. Our transformation system provides a lossless round-trip mapping for OBO ontologies, i.e., an OBO ontology may be translated to OWL and back without loss of knowledge. In addition, it provides a roadmap for bridging the gap between the two ontology languages in order to enable the use of ontology content in a language-independent manner.
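As a rough illustration of the kind of translation involved, the sketch below maps a single OBO `[Term]` stanza to Turtle-like triples. It covers only the `id`, `name`, and `is_a` tags; the prefixes and rendering are invented for illustration, and the paper's full mapping handles far more constructs (and produces proper OWL-DL, not strings):

```python
def obo_term_to_owl(stanza: str) -> list:
    """Translate one OBO [Term] stanza into OWL axioms, rendered here as
    Turtle-like triples. A toy subset of the full OBO-to-OWL mapping."""
    fields, is_a = {}, []
    for line in stanza.strip().splitlines():
        if ":" not in line or line.strip() == "[Term]":
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if key == "is_a":
            is_a.append(value.split("!")[0].strip())  # drop trailing comment
        else:
            fields[key] = value
    iri = "obo:" + fields["id"].replace(":", "_")
    triples = [(iri, "rdf:type", "owl:Class")]
    if "name" in fields:
        triples.append((iri, "rdfs:label", '"%s"' % fields["name"]))
    for parent in is_a:  # OBO is_a maps to rdfs:subClassOf
        triples.append((iri, "rdfs:subClassOf", "obo:" + parent.replace(":", "_")))
    return triples

stanza = """
[Term]
id: GO:0009056
name: catabolic process
is_a: GO:0008152 ! metabolic process
"""
triples = obo_term_to_owl(stanza)
```

A lossless round trip, as the paper requires, would also need the inverse function and careful handling of constructs (e.g. `intersection_of`, `relationship`) that have no one-triple equivalent.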
Semantic encoding of relational databases in wireless networks
NASA Astrophysics Data System (ADS)
Benjamin, David P.; Walker, Adrian
2005-03-01
Semantic Encoding is a new, patented technology that greatly increases the speed of transmission of distributed databases over networks, especially over ad hoc wireless networks, while providing a novel method of data security. It reduces bandwidth consumption and storage requirements, while speeding up query processing, encryption and computation of digital signatures. We describe the application of Semantic Encoding in a wireless setting and provide an example of its operation in which a compression of 290:1 would be achieved.
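The abstract does not disclose the patented algorithm, but the general idea of shrinking a relational table for transmission can be illustrated with plain column-wise dictionary encoding: each distinct value is shipped once, and rows become tuples of small integers. This is a generic stand-in, not the actual Semantic Encoding scheme, and all data is invented:

```python
def encode(rows):
    """Column-wise dictionary encoding of a relational table."""
    columns = list(zip(*rows))
    dictionaries, encoded_cols = [], []
    for col in columns:
        values = sorted(set(col))          # per-column value dictionary
        index = {v: i for i, v in enumerate(values)}
        dictionaries.append(values)
        encoded_cols.append([index[v] for v in col])
    return dictionaries, list(zip(*encoded_cols))

def decode(dictionaries, encoded_rows):
    """Invert encode(): look each index back up in its column dictionary."""
    return [tuple(dictionaries[c][i] for c, i in enumerate(row))
            for row in encoded_rows]

rows = [("sensor-7", "temperature", 21),
        ("sensor-7", "humidity", 40),
        ("sensor-9", "temperature", 21)]
dicts, packed = encode(rows)
```

The redundancy in repeated attribute values is what makes large compression ratios possible on highly regular relational data; the round trip is lossless.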
The 3rd DBCLS BioHackathon: improving life science data integration with Semantic Web technologies.
Katayama, Toshiaki; Wilkinson, Mark D; Micklem, Gos; Kawashima, Shuichi; Yamaguchi, Atsuko; Nakao, Mitsuteru; Yamamoto, Yasunori; Okamoto, Shinobu; Oouchida, Kenta; Chun, Hong-Woo; Aerts, Jan; Afzal, Hammad; Antezana, Erick; Arakawa, Kazuharu; Aranda, Bruno; Belleau, Francois; Bolleman, Jerven; Bonnal, Raoul Jp; Chapman, Brad; Cock, Peter Ja; Eriksson, Tore; Gordon, Paul Mk; Goto, Naohisa; Hayashi, Kazuhiro; Horn, Heiko; Ishiwata, Ryosuke; Kaminuma, Eli; Kasprzyk, Arek; Kawaji, Hideya; Kido, Nobuhiro; Kim, Young Joo; Kinjo, Akira R; Konishi, Fumikazu; Kwon, Kyung-Hoon; Labarga, Alberto; Lamprecht, Anna-Lena; Lin, Yu; Lindenbaum, Pierre; McCarthy, Luke; Morita, Hideyuki; Murakami, Katsuhiko; Nagao, Koji; Nishida, Kozo; Nishimura, Kunihiro; Nishizawa, Tatsuya; Ogishima, Soichi; Ono, Keiichiro; Oshita, Kazuki; Park, Keun-Joon; Prins, Pjotr; Saito, Taro L; Samwald, Matthias; Satagopam, Venkata P; Shigemoto, Yasumasa; Smith, Richard; Splendiani, Andrea; Sugawara, Hideaki; Taylor, James; Vos, Rutger A; Withers, David; Yamasaki, Chisato; Zmasek, Christian M; Kawamoto, Shoko; Okubo, Kosaku; Asai, Kiyoshi; Takagi, Toshihisa
2013-02-11
BioHackathon 2010 was the third in a series of meetings hosted by the Database Center for Life Sciences (DBCLS) in Tokyo, Japan. The overall goal of the BioHackathon series is to improve the quality and accessibility of life science research data on the Web by bringing together representatives from public databases, analytical tool providers, and cyber-infrastructure researchers to jointly tackle important challenges in the area of in silico biological research. The theme of BioHackathon 2010 was the 'Semantic Web', and all attendees gathered with the shared goal of producing Semantic Web data from their respective resources, and/or consuming or interacting with those data using their tools and interfaces. We discussed topics including guidelines for designing semantic data and interoperability of resources, and we developed tools and clients for analysis and visualization. We provide a meeting report from BioHackathon 2010, in which we describe the discussions, decisions, and breakthroughs made as we moved towards compliance with Semantic Web technologies - from source provider, through middleware, to the end-consumer.
NASA Astrophysics Data System (ADS)
Paulraj, D.; Swamynathan, S.; Madhaiyan, M.
2012-11-01
Web Service composition has become indispensable as a single web service cannot satisfy complex functional requirements. Composition of services has received much interest to support business-to-business (B2B) or enterprise application integration. An important component of the service composition is the discovery of relevant services. In Semantic Web Services (SWS), service discovery is generally achieved by using service profile of Ontology Web Languages for Services (OWL-S). The profile of the service is a derived and concise description but not a functional part of the service. The information contained in the service profile is sufficient for atomic service discovery, but it is not sufficient for the discovery of composite semantic web services (CSWS). The purpose of this article is two-fold: first to prove that the process model is a better choice than the service profile for service discovery. Second, to facilitate the composition of inter-organisational CSWS by proposing a new composition method which uses process ontology. The proposed service composition approach uses an algorithm which performs a fine grained match at the level of atomic process rather than at the level of the entire service in a composite semantic web service. Many works carried out in this area have proposed solutions only for the composition of atomic services and this article proposes a solution for the composition of composite semantic web services.
Semantic eScience for Ecosystem Understanding and Monitoring: The Jefferson Project Case Study
NASA Astrophysics Data System (ADS)
McGuinness, D. L.; Pinheiro da Silva, P.; Patton, E. W.; Chastain, K.
2014-12-01
Monitoring and understanding ecosystems such as lakes and their watersheds is becoming increasingly important. Accelerated eutrophication threatens our drinking water sources. Many believe that the use of nutrients (e.g., road salts, fertilizers, etc.) near these sources may have negative impacts on animal and plant populations and water quality although it is unclear how to best balance broad community needs. The Jefferson Project is a joint effort between RPI, IBM and the Fund for Lake George aimed at creating an instrumented water ecosystem along with an appropriate cyberinfrastructure that can serve as a global model for ecosystem monitoring, exploration, understanding, and prediction. One goal is to help communities understand the potential impacts of actions such as road salting strategies so that they can make appropriate informed recommendations that serve broad community needs. Our semantic eScience team is creating a semantic infrastructure to support data integration and analysis to help trained scientists as well as the general public to better understand the lake today, and explore potential future scenarios. We are leveraging our RPI Tetherless World Semantic Web methodology that provides an agile process for describing use cases, identification of appropriate background ontologies and technologies, implementation, and evaluation. IBM is providing a state-of-the-art sensor network infrastructure along with a collection of tools to share, maintain, analyze and visualize the network data. 
In the context of this sensor infrastructure, we will discuss our semantic approach's contributions in three knowledge representation and reasoning areas: (a) human interventions on the deployment and maintenance of local sensor networks including the scientific knowledge to decide how and where sensors are deployed; (b) integration, interpretation and management of data coming from external sources used to complement the project's models; and (c) knowledge about simulation results including parameters, interpretation of results, and comparison of results against external data. We will also demonstrate some example queries highlighting the benefits of our semantic approach and will also identify reusable components.
Effective Multi-Query Expansions: Collaborative Deep Networks for Robust Landmark Retrieval.
Wang, Yang; Lin, Xuemin; Wu, Lin; Zhang, Wenjie
2017-03-01
Given a query photo issued by a user (q-user), landmark retrieval returns a set of photos whose landmarks are similar to those of the query. Existing studies on landmark retrieval focus on exploiting the geometries of landmarks for similarity matches between candidate photos and a query photo. We observe that the same landmarks provided by different users over a social media community may convey different geometry information depending on the viewpoints and/or angles, and may subsequently yield very different results. In fact, dealing with landmarks whose shapes are of low quality due to the photography of q-users is often nontrivial and has seldom been studied. In this paper, we propose a novel framework, namely multi-query expansions, to retrieve semantically robust landmarks in two steps. First, we identify the top-k photos regarding the latent topics of a query landmark to construct a multi-query set, so as to remedy its possibly low-quality shape. For this purpose, we significantly extend the techniques of Latent Dirichlet Allocation. Then, motivated by typical collaborative filtering methods, we learn collaborative deep-network-based semantic, nonlinear, high-level features over the latent factors of landmark photos, with the training set formed by matrix factorization over the collaborative user-photo matrix for the multi-query set. The learned deep network is further applied to generate the features for all the other photos, which also yields a compact multi-query set within this feature space. The final ranking scores are then calculated in the high-level feature space between the multi-query set and all other photos, which are ranked to serve as the final result list of landmark retrieval. 
Extensive experiments are conducted on real-world social media data, with landmark photos together with their user information, to show the superior performance over existing methods, especially our recently proposed multi-query-based mid-level pattern representation method [1].
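The two-step framework can be caricatured with simple topic vectors standing in for both the LDA output and the learned deep features. All names and numbers below are invented; the paper's extended LDA and collaborative deep network are not reproduced, only the expand-then-rank structure:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def multi_query_set(query_topics, photo_topics, k=2):
    """Step 1: expand the single query photo into the top-k photos whose
    latent-topic distributions are most similar to the query's."""
    ranked = sorted(photo_topics,
                    key=lambda p: cosine(photo_topics[p], query_topics),
                    reverse=True)
    return ranked[:k]

def rank_candidates(mq_set, photo_topics, candidates):
    """Step 2 (simplified): score each candidate by its best similarity to
    any member of the multi-query set; raw topic vectors stand in for the
    learned deep features."""
    scores = {c: max(cosine(v, photo_topics[q]) for q in mq_set)
              for c, v in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)

photos = {"p1": (0.85, 0.15), "p2": (0.1, 0.9), "p3": (0.8, 0.2)}
mq = multi_query_set((0.9, 0.1), photos, k=2)
ranking = rank_candidates(mq, photos, {"c1": (0.9, 0.1), "c2": (0.0, 1.0)})
```

The point of the expansion is visible even here: a candidate need only resemble one member of the multi-query set, so a query photo with a poor-quality shape no longer dominates the ranking.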
Software for Studying and Enhancing Educational Uses of Geospatial Semantics and Data
ERIC Educational Resources Information Center
Nodenot, Thierry; Sallaberry, Christian; Gaio, Mauro
2010-01-01
Geographically related queries form nearly one-fifth of all queries submitted to the Excite search engine and the most frequently occurring terms are names of places. This paper focuses on digital libraries and extends the basic services of existing library management systems to include new ones that are dedicated to geographic information…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Czejdo, Bogdan; Bhattacharya, Sambit; Ferragut, Erik M
2012-01-01
This paper describes the syntax and semantics of multi-level state diagrams to support probabilistic behavior of cooperating robots. Techniques are presented to analyze these diagrams by querying combined robot behaviors. It is shown how to use state abstraction and transition abstraction to create, verify, and process large probabilistic state diagrams.
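A minimal sketch of querying a combined behavior, assuming the two robots act independently so that the joint diagram is the product of the individual transition matrices. The states and probabilities are invented, and the paper's multi-level notation and abstraction techniques are not reproduced:

```python
def joint_transitions(p_a, p_b):
    """Transition matrix of two independently acting robots: the joint
    state (i, j) steps to (k, l) with probability p_a[i][k] * p_b[j][l]."""
    n, m = len(p_a), len(p_b)
    joint = [[0.0] * (n * m) for _ in range(n * m)]
    for i in range(n):
        for j in range(m):
            for k in range(n):
                for l in range(m):
                    joint[i * m + j][k * m + l] = p_a[i][k] * p_b[j][l]
    return joint

# Robot A: idle <-> moving; Robot B: searching <-> carrying
P_A = [[0.7, 0.3], [0.4, 0.6]]
P_B = [[0.9, 0.1], [0.2, 0.8]]
J = joint_transitions(P_A, P_B)

# Query: probability that both robots change state in one step,
# from (idle, searching) to (moving, carrying):
p = J[0][3]
```

State abstraction, in this picture, amounts to merging rows and columns of the joint matrix, which is what keeps such products tractable as the number of robots grows.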
Stratification-Based Outlier Detection over the Deep Web.
Xian, Xuefeng; Zhao, Pengpeng; Sheng, Victor S; Fang, Ligang; Gu, Caidong; Yang, Yuanfeng; Cui, Zhiming
2016-01-01
For many applications, finding rare instances or outliers can be more interesting than finding common patterns. Existing work in outlier detection never considers the context of the deep web. In this paper, we argue that, for many scenarios, it is more meaningful to detect outliers over the deep web. In the context of the deep web, users must submit queries through a query interface to retrieve corresponding data, so traditional data mining methods cannot be directly applied. The primary contribution of this paper is to develop a new data mining method for outlier detection over the deep web. In our approach, the query space of a deep web data source is stratified based on a pilot sample. Neighborhood sampling and uncertainty sampling are developed in this paper with the goal of improving recall and precision based on stratification. Finally, a careful performance evaluation of our algorithm confirms that our approach can effectively detect outliers in the deep web.
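A much-simplified sketch of the stratification idea: group sampled records by a query-interface attribute, then flag values far from their stratum's center. The paper's pilot sampling, neighborhood sampling, and uncertainty sampling are replaced here by a robust per-stratum modified z-score; record IDs, strata, and values are invented:

```python
from statistics import median

def stratified_outliers(records, threshold=3.5):
    """records: (record_id, stratum, value) triples sampled from a deep
    web source. Flags records whose value is far from their stratum
    median, using the modified z-score 0.6745 * |v - median| / MAD."""
    strata = {}
    for rid, stratum, value in records:
        strata.setdefault(stratum, []).append((rid, value))
    flagged = []
    for members in strata.values():
        values = [v for _, v in members]
        if len(values) < 3:
            continue  # stratum too small to estimate spread
        med = median(values)
        mad = median(abs(v - med) for v in values)  # median absolute deviation
        if mad == 0:
            continue
        flagged.extend(rid for rid, v in members
                       if 0.6745 * abs(v - med) / mad > threshold)
    return flagged

listings = [("a1", "NY", 100), ("a2", "NY", 102), ("a3", "NY", 98),
            ("a4", "NY", 500), ("b1", "SF", 200), ("b2", "SF", 205),
            ("b3", "SF", 195)]
```

Stratifying first matters: a $500 listing is unremarkable globally but is an outlier within its own stratum, which is the kind of contextual rarity the paper targets.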
DBPQL: A view-oriented query language for the Intel Data Base Processor
NASA Technical Reports Server (NTRS)
Fishwick, P. A.
1983-01-01
An interactive query language (DBPQL) for the Intel Data Base Processor (DBP) is defined. DBPQL includes a parser generator package which permits the analyst to easily create and manipulate the query statement syntax and semantics. The prototype language, DBPQL, includes trace and performance commands to aid the analyst when implementing new commands and analyzing the execution characteristics of the DBP. The DBPQL grammar file and associated key procedures are included as an appendix to this report.
Latent Semantic Analysis as a Method of Content-Based Image Retrieval in Medical Applications
ERIC Educational Resources Information Center
Makovoz, Gennadiy
2010-01-01
The research investigated whether a Latent Semantic Analysis (LSA)-based approach to image retrieval can map pixel intensity into a smaller concept space with good accuracy and reasonable computational cost. From a large set of M computed tomography (CT) images, a retrieval query found all images for a particular patient based on semantic…
Executing SPARQL Queries over the Web of Linked Data
NASA Astrophysics Data System (ADS)
Hartig, Olaf; Bizer, Christian; Freytag, Johann-Christoph
The Web of Linked Data forms a single, globally distributed dataspace. Due to the openness of this dataspace, it is not possible to know in advance all data sources that might be relevant for query answering. This openness poses a new challenge that is not addressed by traditional research on federated query processing. In this paper we present an approach to execute SPARQL queries over the Web of Linked Data. The main idea of our approach is to discover data that might be relevant for answering a query during the query execution itself. This discovery is driven by following RDF links between data sources based on URIs in the query and in partial results. The URIs are resolved over the HTTP protocol into RDF data which is continuously added to the queried dataset. This paper describes concepts and algorithms to implement our approach using an iterator-based pipeline. We introduce a formalization of the pipelining approach and show that classical iterators may cause blocking due to the latency of HTTP requests. To avoid blocking, we propose an extension of the iterator paradigm. The evaluation of our approach shows its strengths as well as the still existing challenges.
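The "follow your nose" execution model described above can be sketched against a toy in-memory web, where dereferencing a URI yields the RDF triples served there. The URIs, data, and single-pattern matcher below are invented for illustration; the paper's iterator pipeline and its non-blocking extension are not shown:

```python
# A tiny in-memory "Web of Linked Data": dereferencing a URI yields the
# triples served at that URI. All URIs and data are invented.
WEB = {
    "ex:alice": [("ex:alice", "foaf:knows", "ex:bob")],
    "ex:bob":   [("ex:bob", "foaf:knows", "ex:carol"),
                 ("ex:bob", "foaf:name", '"Bob"')],
    "ex:carol": [("ex:carol", "foaf:name", '"Carol"')],
}

def is_var(term):
    return term.startswith("?")

def traverse_query(pattern, seeds):
    """Link-traversal evaluation of one triple pattern: start from the
    URIs mentioned in the query, dereference them, and keep following
    URIs found in retrieved triples until no new data is discovered."""
    dataset, frontier, seen = [], list(seeds), set()
    while frontier:
        uri = frontier.pop()
        if uri in seen or uri not in WEB:
            continue
        seen.add(uri)
        for triple in WEB[uri]:
            dataset.append(triple)
            for term in triple:  # follow RDF links in newly fetched data
                if not is_var(term) and not term.startswith('"'):
                    frontier.append(term)
    qs, qp, qo = pattern
    results = []
    for s, p, o in dataset:
        if ((is_var(qs) or qs == s) and (is_var(qp) or qp == p)
                and (is_var(qo) or qo == o)):
            results.append({v: t for v, t in zip((qs, qp, qo), (s, p, o))
                            if is_var(v)})
    return results

# Names reachable from ex:alice, discovered purely by traversal:
names = traverse_query(("?person", "foaf:name", "?name"), seeds=["ex:alice"])
```

Note that `ex:carol` was never mentioned in the query; it became part of the queried dataset only because a link from `ex:bob` was followed during execution, which is exactly the openness the paper exploits.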
Automatic Semantic Generation and Arabic Translation of Mathematical Expressions on the Web
ERIC Educational Resources Information Center
Doush, Iyad Abu; Al-Bdarneh, Sondos
2013-01-01
Automatic processing of mathematical information on the web imposes some difficulties. This paper presents a novel technique for automatic generation of mathematical equations semantic and Arabic translation on the web. The proposed system facilitates unambiguous representation of mathematical equations by correlating equations to their known…
Secure and Efficient k-NN Queries⋆
Asif, Hafiz; Vaidya, Jaideep; Shafiq, Basit; Adam, Nabil
2017-01-01
Given the morass of available data, ranking and best-match queries are often used to find records of interest. As such, k-NN queries, which give the k closest matches to a query point, are of particular interest and have many applications. We study this problem in the context of the financial sector, wherein an investment portfolio database is queried for matching portfolios. Given the sensitivity of the information involved, our key contribution is to develop a secure k-NN computation protocol that can enable the computation of k-NN queries in a distributed multi-party environment while taking domain semantics into account. The experimental results show that the proposed protocols are extremely efficient. PMID:29218333
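Stripped of the cryptography, the underlying query is ordinary k-NN over portfolio vectors. The sketch below shows only that plaintext baseline (portfolio names and allocations are invented); the paper's contribution is computing exactly this answer securely across parties, which is not reproduced here:

```python
from math import sqrt

def knn(query, portfolios, k=2):
    """Plain (non-private) k-NN over portfolio weight vectors using
    Euclidean distance; returns the names of the k closest portfolios."""
    def dist(a, b):
        return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    ranked = sorted(portfolios.items(), key=lambda kv: dist(kv[1], query))
    return [name for name, _ in ranked[:k]]

# Asset-allocation vectors (stocks, bonds, cash) -- invented example data:
portfolios = {
    "growth":   (0.8, 0.15, 0.05),
    "balanced": (0.5, 0.4, 0.1),
    "income":   (0.2, 0.6, 0.2),
}
closest = knn((0.75, 0.2, 0.05), portfolios, k=2)
```

In the secure setting, neither the query vector nor the stored portfolios may be revealed to the other party, so each distance comparison above would be replaced by a cryptographic sub-protocol.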
UltiMatch-NL: A Web Service Matchmaker Based on Multiple Semantic Filters
Mohebbi, Keyvan; Ibrahim, Suhaimi; Zamani, Mazdak; Khezrian, Mojtaba
2014-01-01
In this paper, a Semantic Web service matchmaker called UltiMatch-NL is presented. UltiMatch-NL applies two filters, namely Signature-based and Description-based, on different abstraction levels of a service profile to achieve more accurate results. More specifically, the proposed filters rely on semantic knowledge to extract the similarity between a given pair of service descriptions. It is thus a further step towards fully automated Web service discovery, making this process more semantics-aware. In addition, a new technique is proposed to automatically weight and combine the results of UltiMatch-NL's different filters. Moreover, an innovative approach is introduced to predict the relevance of requests and Web services and eliminate the need for setting a similarity threshold. To evaluate UltiMatch-NL, the OWLS-TC repository is used. The performance evaluation, based on standard measures from the information retrieval field, shows that semantic matching of OWL-S services can be significantly improved by incorporating the designed matching filters. PMID:25157872
Recipes for Semantic Web Dog Food — The ESWC and ISWC Metadata Projects
NASA Astrophysics Data System (ADS)
Möller, Knud; Heath, Tom; Handschuh, Siegfried; Domingue, John
Semantic Web conferences such as ESWC and ISWC offer prime opportunities to test and showcase semantic technologies. Conference metadata about people, papers and talks is diverse in nature and neither so small as to be uninteresting nor so big as to be unmanageable. Many metadata-related challenges that may arise in the Semantic Web at large are also present here. Metadata must be generated from sources which are often unstructured and hard to process, and may originate from many different players, so suitable workflows must be established. Moreover, the generated metadata must use appropriate formats and vocabularies, and be served in a way that is consistent with the principles of linked data. This paper reports on the metadata efforts from ESWC and ISWC, identifies specific issues and barriers encountered during the projects, and discusses how these were approached. Recommendations are made as to how these may be addressed in the future, and we discuss how these solutions may generalize to metadata production for the Semantic Web at large.
Annotations of Mexican bullfighting videos for semantic index
NASA Astrophysics Data System (ADS)
Montoya Obeso, Abraham; Oropesa Morales, Lester Arturo; Fernando Vázquez, Luis; Cocolán Almeda, Sara Ivonne; Stoian, Andrei; García Vázquez, Mireya Saraí; Zamudio Fuentes, Luis Miguel; Montiel Perez, Jesús Yalja; de la O Torres, Saul; Ramírez Acosta, Alejandro Alvaro
2015-09-01
Video annotation is important for web indexing and browsing systems. Indeed, in order to evaluate the performance of video query and mining techniques, databases with concept annotations are required. Therefore, it is necessary to generate a database with a semantic index that represents the digital content of the Mexican bullfighting atmosphere. This paper proposes a scheme for making complex annotations in a video within the framework of a multimedia search engine project. Each video is partitioned using our segmentation algorithm, which creates shots of different lengths and numbers of frames. To make complex annotations on the video, we use the ELAN software. The annotations are done in two steps: first, we take notes on the whole content of each shot; second, we describe the actions as camera parameters such as direction, position, and depth. As a consequence, we obtain a more complete descriptor of every action. In both cases we use the concepts of the TRECVid 2014 dataset, and we also propose new concepts. This methodology allows us to generate a database with the information necessary to create descriptors and algorithms capable of detecting actions, in order to automatically index and classify new bullfighting multimedia content.
Navigation as a New Form of Search for Agricultural Learning Resources in Semantic Repositories
NASA Astrophysics Data System (ADS)
Cano, Ramiro; Abián, Alberto; Mena, Elena
Education is essential when it comes to raise public awareness on the environmental and economic benefits of organic agriculture and agroecology (OA & AE). Organic.Edunet, an EU funded project, aims at providing a freely-available portal where learning contents on OA & AE can be published and accessed through specialized technologies. This paper describes a novel mechanism for providing semantic capabilities (such as semantic navigational queries) to an arbitrary set of agricultural learning resources, in the context of the Organic.Edunet initiative.
Guardia, Gabriela D A; Ferreira Pires, Luís; da Silva, Eduardo G; de Farias, Cléver R G
2017-02-01
Gene expression studies often require the combined use of a number of analysis tools. However, manual integration of analysis tools can be cumbersome and error prone. To support a higher level of automation in the integration process, efforts have been made in the biomedical domain towards the development of semantic web services and supporting composition environments. Yet, most environments consider only the execution of simple service behaviours and require users to focus on technical details of the composition process. We propose a novel approach to the semantic composition of gene expression analysis services that addresses the shortcomings of the existing solutions. Our approach includes an architecture designed to support the service composition process for gene expression analysis, and a flexible strategy for the (semi-)automatic composition of semantic web services. Finally, we implement a supporting platform called SemanticSCo to realize the proposed composition approach and demonstrate its functionality by successfully reproducing a microarray study documented in the literature. The SemanticSCo platform provides support for the composition of RESTful web services semantically annotated using SAWSDL. Our platform also supports the definition of constraints/conditions regarding the order in which service operations should be invoked, thus enabling the definition of complex service behaviours. Our proposed solution for semantic web service composition takes into account the requirements of different stakeholders and addresses all phases of the service composition process. It also provides support for the definition of analysis workflows at a high level of abstraction, thus enabling users to focus on biological research issues rather than on the technical details of the composition process. The SemanticSCo source code is available at https://github.com/usplssb/SemanticSCo. Copyright © 2017 Elsevier Inc. All rights reserved.
Improving life sciences information retrieval using semantic web technology.
Quan, Dennis
2007-05-01
The ability to retrieve relevant information is at the heart of every aspect of research and development in the life sciences industry. Information is often distributed across multiple systems and recorded in a way that makes it difficult to piece together the complete picture. Differences in data formats, naming schemes and network protocols amongst information sources, both public and private, must be overcome, and user interfaces not only need to be able to tap into these diverse information sources but must also assist users in filtering out extraneous information and highlighting the key relationships hidden within an aggregated set of information. The Semantic Web community has made great strides in proposing solutions to these problems, and many efforts are underway to apply Semantic Web techniques to the problem of information retrieval in the life sciences space. This article gives an overview of the principles underlying a Semantic Web-enabled information retrieval system: creating a unified abstraction for knowledge using the RDF semantic network model; designing semantic lenses that extract contextually relevant subsets of information; and assembling semantic lenses into powerful information displays. Furthermore, concrete examples of how these principles can be applied to life science problems including a scenario involving a drug discovery dashboard prototype called BioDash are provided.
Accelerating Cancer Systems Biology Research through Semantic Web Technology
Wang, Zhihui; Sagotsky, Jonathan; Taylor, Thomas; Shironoshita, Patrick; Deisboeck, Thomas S.
2012-01-01
Cancer systems biology is an interdisciplinary, rapidly expanding research field in which collaborations are a critical means to advance the field. Yet the prevalent database technologies often isolate data rather than making it easily accessible. The Semantic Web has the potential to help facilitate web-based collaborative cancer research by presenting data in a manner that is self-descriptive, human and machine readable, and easily sharable. We have created a semantically linked online Digital Model Repository (DMR) for storing, managing, executing, annotating, and sharing computational cancer models. Within the DMR, distributed, multidisciplinary, and inter-organizational teams can collaborate on projects, without forfeiting intellectual property. This is achieved by the introduction of a new stakeholder to the collaboration workflow, the institutional licensing officer, part of the Technology Transfer Office. Furthermore, the DMR has achieved silver level compatibility with the National Cancer Institute’s caBIG®, so users can not only interact with the DMR through a web browser but also through a semantically annotated and secure web service. We also discuss the technology behind the DMR leveraging the Semantic Web, ontologies, and grid computing to provide secure inter-institutional collaboration on cancer modeling projects, online grid-based execution of shared models, and the collaboration workflow protecting researchers’ intellectual property. PMID:23188758
Accelerating cancer systems biology research through Semantic Web technology.
Wang, Zhihui; Sagotsky, Jonathan; Taylor, Thomas; Shironoshita, Patrick; Deisboeck, Thomas S
2013-01-01
Cancer systems biology is an interdisciplinary, rapidly expanding research field in which collaborations are a critical means to advance the field. Yet the prevalent database technologies often isolate data rather than making it easily accessible. The Semantic Web has the potential to help facilitate web-based collaborative cancer research by presenting data in a manner that is self-descriptive, human and machine readable, and easily sharable. We have created a semantically linked online Digital Model Repository (DMR) for storing, managing, executing, annotating, and sharing computational cancer models. Within the DMR, distributed, multidisciplinary, and inter-organizational teams can collaborate on projects, without forfeiting intellectual property. This is achieved by the introduction of a new stakeholder to the collaboration workflow, the institutional licensing officer, part of the Technology Transfer Office. Furthermore, the DMR has achieved silver level compatibility with the National Cancer Institute's caBIG, so users can interact with the DMR not only through a web browser but also through a semantically annotated and secure web service. We also discuss the technology behind the DMR leveraging the Semantic Web, ontologies, and grid computing to provide secure inter-institutional collaboration on cancer modeling projects, online grid-based execution of shared models, and the collaboration workflow protecting researchers' intellectual property. Copyright © 2012 Wiley Periodicals, Inc.
A unified framework for managing provenance information in translational research
2011-01-01
Background A critical aspect of the NIH Translational Research roadmap, which seeks to accelerate the delivery of "bench-side" discoveries to the patient's "bedside," is the management of the provenance metadata that keeps track of the origin and history of data resources as they traverse the path from the bench to the bedside and back. A comprehensive provenance framework is essential for researchers to verify the quality of data, reproduce scientific results published in peer-reviewed literature, validate scientific process, and associate trust value with data and results. Traditional approaches to provenance management have focused on only partial sections of the translational research life cycle and they do not incorporate "domain semantics", which is essential to support domain-specific querying and analysis by scientists. Results We identify a common set of challenges in managing provenance information across the pre-publication and post-publication phases of data in the translational research lifecycle. We define the semantic provenance framework (SPF), underpinned by the Provenir upper-level provenance ontology, to address these challenges in the four stages of provenance metadata: (a) provenance collection, during data generation; (b) provenance representation, to support interoperability, reasoning, and the incorporation of domain semantics; (c) provenance storage and propagation, to allow efficient storage and seamless propagation of provenance as the data is transferred across applications; and (d) provenance query, to support queries of increasing complexity over large data sizes and also support knowledge discovery applications. We apply the SPF to two exemplar translational research projects, namely the Semantic Problem Solving Environment for Trypanosoma cruzi (T.cruzi SPSE) and the Biomedical Knowledge Repository (BKR) project, to demonstrate its effectiveness.
Conclusions The SPF provides a unified framework to effectively manage provenance of translational research data during pre and post-publication phases. This framework is underpinned by an upper-level provenance ontology called Provenir that is extended to create domain-specific provenance ontologies to facilitate provenance interoperability, seamless propagation of provenance, automated querying, and analysis. PMID:22126369
Theodosiou, T; Vizirianakis, I S; Angelis, L; Tsaftaris, A; Darzentas, N
2011-12-01
PubMed is the most widely used database of biomedical literature. To the detriment of the user though, the ranking of the documents retrieved for a query is not content-based, and important semantic information in the form of assigned Medical Subject Headings (MeSH) terms is not readily presented or productively utilized. The motivation behind this work was the discovery of unanticipated information through the appropriate ranking of MeSH term pairs and, indirectly, documents. Such information can be useful in guiding novel research and following promising trends. A web-based tool, called MeSHy, was developed implementing a mainly statistical algorithm. The algorithm takes into account the frequencies of occurrences, concurrences, and the semantic similarities of MeSH terms in retrieved PubMed documents to create MeSH term pairs. These are then scored and ranked, focusing on their unexpectedly frequent or infrequent occurrences. MeSHy presents results through an online interactive interface facilitating further manipulation through filtering and sorting. The results themselves include the MeSH term pairs, along with MeSH categories, the score, and document IDs, all of which are hyperlinked for convenience. To highlight the applicability of the tool, we report the findings of an expert in the pharmacology field on querying the molecularly-targeted drug imatinib and nutrition-related flavonoids. To the best of our knowledge, MeSHy is the first publicly available tool able to directly provide such a different perspective on the complex nature of published work. Implemented in Perl and served by Apache2 at http://bat.ina.certh.gr/tools/meshy/ with all major browsers supported. Copyright © 2011 Elsevier Inc. All rights reserved.
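MeSHy's published algorithm combines occurrence frequencies, concurrences, and semantic similarities; as a rough illustration of the "unexpectedly frequent or infrequent pair" idea only, here is a minimal sketch in which all counts, the smoothing, and the score formula are hypothetical, not the tool's actual method:

```python
import math

def unexpectedness(n_docs, n_a, n_b, n_ab):
    """Score a MeSH term pair by how far its observed co-occurrence count
    deviates from the count expected if the two terms occurred
    independently; positive = unexpectedly frequent, negative = rare."""
    expected = n_a * n_b / n_docs                  # independence baseline
    return math.log((n_ab + 1) / (expected + 1))   # +1 smoothing

# hypothetical counts per pair: (docs with A, docs with B, docs with both)
pairs = {("imatinib", "flavonoids"): (120, 80, 40),
         ("imatinib", "leukemia"): (120, 500, 110)}
n_docs = 1000

# rank pairs so that both surprising directions (frequent and rare) surface
ranked = sorted(pairs, key=lambda p: abs(unexpectedness(n_docs, *pairs[p])),
                reverse=True)
```

Under these toy counts the imatinib/flavonoids pair scores as more unexpected than the well-established imatinib/leukemia pair, which is the kind of ranking the tool uses to surface unanticipated information.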
An Information Infrastructure for Coastal Models and Data
NASA Astrophysics Data System (ADS)
Hardin, D.; Keiser, K.; Conover, H.; Graves, S.
2007-12-01
Advances in semantics and visualization have given rise to new capabilities for the location, manipulation, integration, management and display of data and information in and across domains. An example of these capabilities is illustrated by a coastal restoration project that utilizes satellite, in-situ data and hydrodynamic model output to address seagrass habitat restoration in the Northern Gulf of Mexico. In this project a standard stressor conceptual model was implemented as an ontology in addition to the typical CMAP diagram. The ontology captures the elements of the seagrass conceptual model as well as the relationships between them. Noesis, developed by the University of Alabama in Huntsville, is an application that provides a simple but powerful way to search and organize data and information represented by ontologies. Noesis uses domain ontologies to help scope search queries to ensure that search results are both accurate and complete. Semantics are captured by refining the query terms to cover synonyms, specializations, generalizations and related concepts. As a resource aggregator Noesis categorizes search results returned from multiple, concurrent search engines such as Google, Yahoo, and Ask.com. Search results are further directed by accessing domain specific catalogs that include outputs from hydrodynamic and other models. Embedded within the search results are links that invoke applications such as web map displays, animation tools and virtual globe applications such as Google Earth. In the seagrass prioritization project Noesis is used to locate information that is vital to understanding the impact of stressors on the habitat. This presentation will show how the intelligent search capabilities of Noesis are coupled with visualization tools and model output to investigate the restoration of seagrass habitat.
ESTminer: a Web interface for mining EST contig and cluster databases.
Huang, Yecheng; Pumphrey, Janie; Gingle, Alan R
2005-03-01
ESTminer is a Web application and database schema for interactive mining of expressed sequence tag (EST) contig and cluster datasets. The Web interface contains a query frame that allows the selection of contigs/clusters with specific cDNA library makeup or a threshold number of members. The results are displayed as color-coded tree nodes, where the color indicates the fractional size of each cDNA library component. The nodes are expandable, revealing library statistics as well as EST or contig members, with links to sequence data, GenBank records or user configurable links. Also, the interface allows 'queries within queries' where the result set of a query is further filtered by the subsequent query. ESTminer is implemented in Java/JSP and the package, including MySQL and Oracle schema creation scripts, is available from http://cggc.agtec.uga.edu/Data/download.asp agingle@uga.edu.
High-performance web services for querying gene and variant annotation.
Xin, Jiwen; Mark, Adam; Afrasiabi, Cyrus; Tsueng, Ginger; Juchler, Moritz; Gopal, Nikhil; Stupp, Gregory S; Putman, Timothy E; Ainscough, Benjamin J; Griffith, Obi L; Torkamani, Ali; Whetzel, Patricia L; Mungall, Christopher J; Mooney, Sean D; Su, Andrew I; Wu, Chunlei
2016-05-06
Efficient tools for data management and integration are essential for many aspects of high-throughput biology. In particular, annotations of genes and human genetic variants are commonly used but highly fragmented across many resources. Here, we describe MyGene.info and MyVariant.info, high-performance web services for querying gene and variant annotation information. These web services are currently accessed more than three million times per month. They also demonstrate a generalizable cloud-based model for organizing and querying biological annotation information. MyGene.info and MyVariant.info are provided as high-performance web services, accessible at http://mygene.info and http://myvariant.info. Both are offered free of charge to the research community.
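As a sketch of how such an annotation web service is queried, the snippet below only composes the request URL (no network call); the `/v3/query` path and the `q`/`fields`/`size` parameter names are stated here from memory of the public MyGene.info documentation and should be verified against the service before use:

```python
from urllib.parse import urlencode

BASE = "http://mygene.info/v3/query"   # MyGene.info query endpoint (assumed v3)

def build_query(q, fields="symbol,name,taxid", size=10):
    """Compose a MyGene.info query URL; any HTTP client can then fetch it
    and parse the JSON 'hits' list in the response."""
    return BASE + "?" + urlencode({"q": q, "fields": fields, "size": size})

# hypothetical example: look up the human CDK2 gene by symbol
url = build_query("symbol:CDK2 AND species:human")
```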
Query Language for Location-Based Services: A Model Checking Approach
NASA Astrophysics Data System (ADS)
Hoareau, Christian; Satoh, Ichiro
We present a model checking approach to the rationale, implementation, and applications of a query language for location-based services. Such query mechanisms are necessary so that users, objects, and/or services can effectively benefit from the location-awareness of their surrounding environment. The underlying data model is founded on a symbolic model of space organized in a tree structure. Once extended to a semantic model for modal logic, we regard location query processing as a model checking problem, and thus define location queries as hybrid logic-based formulas. Our approach is unique among existing research because it explores the connection between location models and query processing in ubiquitous computing systems, relies on a sound theoretical basis, and provides modal logic-based query mechanisms for expressive searches over a decentralized data structure. A prototype implementation is also presented and will be discussed.
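The core idea of treating a location query as model checking over a symbolic containment tree can be sketched as follows; the two connectives and the `space` tree are invented for illustration and are far simpler than the paper's hybrid logic:

```python
def check(tree, node, formula):
    """Evaluate a tiny modal formula over a symbolic location tree.
    ("prop", name)  holds at a node exactly when name == node;
    ("down", f)     holds when f holds at some strict descendant."""
    kind = formula[0]
    if kind == "prop":
        return node == formula[1]
    if kind == "down":
        sub = formula[1]
        return any(check(tree, child, sub) or check(tree, child, formula)
                   for child in tree.get(node, []))
    raise ValueError("unknown connective: %r" % kind)

# hypothetical symbolic space: a containment tree of named locations
space = {"building": ["floor1", "floor2"],
         "floor1": ["room101"],
         "floor2": []}

# query: "is room101 located somewhere inside the building?"
inside = check(space, "building", ("down", ("prop", "room101")))
```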
Syndromic surveillance models using Web data: the case of scarlet fever in the UK.
Samaras, Loukas; García-Barriocanal, Elena; Sicilia, Miguel-Angel
2012-03-01
Recent research has shown the potential of Web queries as a source for syndromic surveillance, and existing studies show that these queries can be used as a basis for estimation and prediction of the development of a syndromic disease, such as influenza, using log linear (logit) statistical models. Two alternative models are applied to the relationship between cases and Web queries in this paper. We examine the applicability of using statistical methods to relate search engine queries with scarlet fever cases in the UK, taking advantage of tools to acquire the appropriate data from Google, and using an alternative statistical method based on gamma distributions. The results show that using logit models, the Pearson correlation factor between Web queries and the data obtained from the official agencies must be over 0.90, otherwise the prediction of the peak and the spread of the distributions gives significant deviations. In this paper, we describe the gamma distribution model and show that we can obtain better results in all cases using gamma transformations, and especially in those with a smaller correlation factor.
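One simple way to fit a gamma model to a syndromic time series is the method of moments; the sketch below uses an invented weekly case series and is only an illustration of the gamma-fitting idea, not the paper's exact estimation procedure:

```python
def gamma_moments(xs):
    """Method-of-moments estimates for a gamma distribution:
    shape k = mean^2 / var, scale theta = var / mean,
    so that k * theta reproduces the sample mean."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return mean * mean / var, var / mean

# hypothetical weekly scarlet fever counts forming an epidemic curve
weekly_cases = [2, 5, 9, 14, 21, 25, 22, 15, 8, 4, 2, 1]
k, theta = gamma_moments(weekly_cases)
```

The skewed, peaked shape of an epidemic curve is what makes a gamma transformation a better fit than a symmetric model when the query/case correlation is weak.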
A future Outlook: Web based Simulation of Hydrodynamic models
NASA Astrophysics Data System (ADS)
Islam, A. S.; Piasecki, M.
2003-12-01
Despite recent advances to present simulation results as 3D graphs or animation contours, the modeling user community still faces some shortcomings when trying to move around and analyze data. Typical problems include the lack of common platforms with a standard vocabulary to exchange simulation results from different numerical models, insufficient descriptions about data (metadata), a lack of robust search and retrieval tools for data, and difficulties in reusing simulation domain knowledge. This research demonstrates how to create a shared simulation domain in the WWW and run a number of models through multi-user interfaces. Firstly, meta-datasets have been developed to describe hydrodynamic model data based on the geographic metadata standard (ISO 19115), which has been extended to satisfy the needs of the hydrodynamic modeling community. The Extensible Markup Language (XML) is used to publish this metadata via the Resource Description Framework (RDF). A specific domain ontology for Web Based Simulation (WBS) has been developed to explicitly define the vocabulary for the knowledge based simulation system. Subsequently, this knowledge based system is converted into an object model using the Meta Object Facility (MOF). The knowledge based system acts as a meta model for the object oriented system, which aids in reusing the domain knowledge. Specific simulation software has been developed based on the object oriented model. Finally, all model data is stored in an object relational database. Database back-ends help store, retrieve and query information efficiently. This research uses open source software and technology such as Java Servlet and JSP, the Apache web server, the Tomcat Servlet Engine, PostgreSQL databases, the Protégé ontology editor, RDQL and RQL for querying RDF at the semantic level, and the Jena Java API for RDF. Also, we use international standards such as the ISO 19115 metadata standard, and specifications such as XML, RDF, OWL, XMI, and UML.
The final web based simulation product is deployed as Web Archive (WAR) files, which are platform and OS independent and can be used on Windows, UNIX, or Linux. Keywords: Apache, ISO 19115, Java Servlet, Jena, JSP, Metadata, MOF, Linux, Ontology, OWL, PostgreSQL, Protégé, RDF, RDQL, RQL, Tomcat, UML, UNIX, Windows, WAR, XML
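The RDQL/RQL-style semantic querying mentioned above boils down to matching triple patterns against an RDF graph; a minimal in-memory sketch follows (graph contents are invented, and a real system would use an RDF store such as Jena rather than this single-pattern matcher):

```python
def match(triples, pattern):
    """Return variable bindings for a single (s, p, o) triple pattern,
    RDQL/SPARQL style: pattern items starting with '?' are variables,
    anything else must match the triple exactly."""
    results = []
    for triple in triples:
        binding = {}
        for pat, val in zip(pattern, triple):
            if pat.startswith("?"):
                binding[pat] = val
            elif pat != val:
                break                      # constant mismatch: reject triple
        else:
            results.append(binding)
    return results

# hypothetical metadata graph about simulation models
graph = [("modelA", "rdf:type", "HydrodynamicModel"),
         ("modelA", "iso:abstract", "Tidal flow model"),
         ("modelB", "rdf:type", "WaveModel")]

hits = match(graph, ("?m", "rdf:type", "HydrodynamicModel"))
```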
Chan, Emily H.; Sahai, Vikram; Conrad, Corrie; Brownstein, John S.
2011-01-01
Background A variety of obstacles including bureaucracy and lack of resources have interfered with timely detection and reporting of dengue cases in many endemic countries. Surveillance efforts have turned to modern data sources, such as Internet search queries, which have been shown to be effective for monitoring influenza-like illnesses. However, few have evaluated the utility of web search query data for other diseases, especially those of high morbidity and mortality or where a vaccine may not exist. In this study, we aimed to assess whether web search queries are a viable data source for the early detection and monitoring of dengue epidemics. Methodology/Principal Findings Bolivia, Brazil, India, Indonesia and Singapore were chosen for analysis based on available data and adequate search volume. For each country, a univariate linear model was then built by fitting a time series of the fraction of Google search query volume for specific dengue-related queries from that country against a time series of official dengue case counts for a time-frame within 2003–2010. The specific combination of queries used was chosen to maximize model fit. Spurious spikes in the data were also removed prior to model fitting. The final models, fit using a training subset of the data, were cross-validated against both the overall dataset and a holdout subset of the data. All models were found to fit the data quite well, with validation correlations ranging from 0.82 to 0.99. Conclusions/Significance Web search query data were found to be capable of tracking dengue activity in Bolivia, Brazil, India, Indonesia and Singapore. Whereas traditional dengue data from official sources are often not available until after some substantial delay, web search query data are available in near real-time. These data represent a valuable complement to traditional dengue surveillance. PMID:21647308
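The country-level univariate linear models can be illustrated by an ordinary least-squares fit of case counts against query volume, with the Pearson correlation used for validation; the data values below are invented, not from the study:

```python
def fit_and_correlate(queries, cases):
    """OLS fit of cases ~ a + b * queries, plus the Pearson correlation r
    used to validate the fitted model against observed counts."""
    n = len(queries)
    mx = sum(queries) / n
    my = sum(cases) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(queries, cases))
    sxx = sum((x - mx) ** 2 for x in queries)
    syy = sum((y - my) ** 2 for y in cases)
    b = sxy / sxx                       # slope
    a = my - b * mx                     # intercept
    r = sxy / (sxx ** 0.5 * syy ** 0.5) # Pearson correlation
    return a, b, r

q = [10, 20, 30, 40, 50]   # hypothetical search-volume fractions
c = [12, 24, 33, 41, 55]   # hypothetical official case counts
a, b, r = fit_and_correlate(q, c)
```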
XSemantic: An Extension of LCA Based XML Semantic Search
NASA Astrophysics Data System (ADS)
Supasitthimethee, Umaporn; Shimizu, Toshiyuki; Yoshikawa, Masatoshi; Porkaew, Kriengkrai
One of the most convenient ways to query XML data is a keyword search because it does not require any knowledge of XML structure or learning a new user interface. However, the keyword search is ambiguous. The users may use different terms to search for the same information. Furthermore, it is difficult for a system to decide which node is likely to be chosen as a return node and how much information should be included in the result. To address these challenges, we propose a keyword-based XML semantic search called XSemantic. On the one hand, we give three definitions to capture the semantics. First, through semantic term expansion, our system is made robust to ambiguous keywords by using the domain ontology. Second, to return semantically meaningful answers, we automatically infer the return information from the user queries and take advantage of the shortest path to return meaningful connections between keywords. Third, we present a semantic ranking that reflects the degree of similarity as well as the semantic relationship, so that the search results with the higher relevance are presented to the users first. On the other hand, following the LCA and the proximity search approaches, we investigated the problem of how much information should be included in the search results. Therefore, we introduce the notion of the Lowest Common Element Ancestor (LCEA) and define our simple rule without any requirement on schema information such as the DTD or XML Schema. The first experiment indicated that XSemantic not only properly infers the return information but also generates compact meaningful results. Additionally, the benefits of our proposed semantics are demonstrated by the second experiment.
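The lowest-common-ancestor idea behind return-node inference can be sketched with Python's standard XML library; the document is invented, and the paper's LCEA rule is more refined than the plain LCA computed here:

```python
import xml.etree.ElementTree as ET

def lca_of_keywords(xml_text, keywords):
    """Return the tag of the lowest common ancestor of the first element
    matching each keyword -- the node a keyword search would return."""
    root = ET.fromstring(xml_text)
    parent = {c: p for p in root.iter() for c in p}   # child -> parent map

    def path(elem):                                   # root -> elem chain
        chain = [elem]
        while chain[-1] in parent:
            chain.append(parent[chain[-1]])
        return list(reversed(chain))

    paths = []
    for kw in keywords:
        # naive match: first element whose text contains the keyword
        hit = next(e for e in root.iter() if e.text and kw in e.text)
        paths.append(path(hit))

    lca = None
    for nodes in zip(*paths):          # walk down while all paths agree
        if all(n is nodes[0] for n in nodes):
            lca = nodes[0]
    return lca.tag

doc = "<bib><book><title>XML Search</title><author>Kim</author></book></bib>"
common = lca_of_keywords(doc, ["XML", "Kim"])
```

For the keywords "XML" and "Kim" the deepest element containing both matches is `book`, which is the compact answer a user would expect rather than the whole bibliography.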
An architecture for diversity-aware search for medical web content.
Denecke, K
2012-01-01
The Web provides a huge source of information, also on medical and health-related issues. In particular, the content of medical social media data can be diverse due to the background of an author, the source or the topic. Diversity in this context means that a document covers different aspects of a topic or that a topic is described in different ways. In this paper, we introduce an approach that considers the diverse aspects of a search query when providing retrieval results to a user. We introduce a system architecture for a diversity-aware search engine for retrieving medical information from the web. The diversity of retrieval results is assessed by calculating diversity measures that rely upon semantic information derived from a mapping to concepts of a medical terminology. Considering these measures, the result set is diversified by ranking more diverse texts higher. The methods and system architecture are implemented in a retrieval engine for medical web content. The diversity measures reflect the diversity of aspects considered in a text and its type of information content. They are used for result presentation, filtering and ranking. In a user evaluation we assess user satisfaction with an ordering of retrieval results that considers the diversity measures. The evaluation shows that diversity-aware retrieval, considering diversity measures in ranking, can increase user satisfaction with retrieval results.
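A toy version of diversity-aware ranking, assuming documents have already been mapped to terminology concepts; the published diversity measures are richer than this distinct-concept ratio, and the documents below are invented:

```python
def diversity(concepts):
    """Simple diversity score: fraction of distinct semantic concepts
    among all concept mentions mapped from a document (0..1]."""
    return len(set(concepts)) / len(concepts)

def diversify_rank(docs):
    """Order retrieval results so that more diverse documents come first."""
    return sorted(docs, key=lambda d: diversity(docs[d]), reverse=True)

# hypothetical documents, each mapped to medical terminology concepts
docs = {"d1": ["diabetes", "insulin", "diet", "exercise"],
        "d2": ["diabetes", "diabetes", "insulin", "diabetes"]}
```

Document d1 touches four distinct aspects of the topic while d2 repeats one, so a diversity-aware ranking surfaces d1 first.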
A Digital Knowledge Preservation Platform for Environmental Sciences
NASA Astrophysics Data System (ADS)
Aguilar Gómez, Fernando; de Lucas, Jesús Marco; Pertinez, Esther; Palacio, Aida; Perez, David
2017-04-01
The Digital Knowledge Preservation Platform is the evolution of a pilot project for Open Data supporting the full research data life cycle. It is currently being evolved at IFCA (Instituto de Física de Cantabria) as a combination of different open tools that have been extended: DMPTool (https://dmptool.org/) with pilot semantics features (RDF export, parameters definition), a customized version of INVENIO (http://invenio-software.org/) to integrate the entire research data life cycle, and Jupyter (http://jupyter.org/) as a processing tool and reproducibility environment. This complete platform aims to provide an integrated environment for research data management following the FAIR+R principles: - Findable: the Web portal based on Invenio provides a search engine, and all elements include metadata to make them easily findable. - Accessible: both data and software are available online with internal PIDs and DOIs (provided by DataCite). - Interoperable: datasets can be combined to perform new analyses; the OAI-PMH standard is also integrated. - Re-usable: different licence types and embargo periods can be defined. - +Reproducible: directly integrated with cloud computing resources. The deployment of the entire system over a Cloud framework helps to build a dynamic and scalable solution, not only for managing open datasets but also as a useful tool for the final user, who is able to directly process and analyse the open data. In parallel, the direct use of semantics and metadata is being explored and integrated in the framework. Ontologies, being a knowledge representation, can contribute to define the elements and relationships of the research data life cycle, including DMP, datasets, software, etc. The first advantage of developing an ontology of a knowledge domain is that it provides a common vocabulary hierarchy (i.e. a conceptual schema) that can be used and standardized by all the agents interested in the domain (either humans or machines).
This way of using ontologies is one of the bases of the Semantic Web, where ontologies are set to play a key role in establishing a common terminology between agents. To develop the ontology we are using a graphical tool called Protégé. Protégé is a graphical ontology-development tool which supports a rich knowledge model and is open-source and freely available. However, in order to process and manage the ontology from the web framework, we are using Semantic MediaWiki, which is able to process queries. Semantic MediaWiki is an extension of MediaWiki with which we can do semantic search and export data in RDF and CSV format. This system is used as a testbed for the potential use of semantics in a more general environment. This Digital Knowledge Preservation Platform is closely related to the INDIGO-DataCloud project (https://www.indigo-datacloud.eu), since the same data life cycle approach is taken into account (Planning, Collect, Curate, Analyze, Publish, Preserve). INDIGO-DataCloud solutions will be able to support all the different elements in the system, as we showed in the last Research Data Alliance Plenary. This presentation will show the different elements of the system and how they work, as well as the roadmap of their continuous integration.
Raising the IQ in full-text searching via intelligent querying
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kero, R.; Russell, L.; Swietlik, C.
1994-11-01
Current Information Retrieval (IR) technologies allow for efficient access to relevant information, provided that user selected query terms coincide with the specific linguistic choices made by the authors whose works constitute the text-base. Therefore, the challenge is to enhance the limited searching capability of state-of-the-practice IR. This can be done either with augmented clients that overcome current server searching deficiencies, or with added capabilities that can augment searching algorithms on the servers. The technology being investigated is that of deductive databases, with a set of new techniques called cooperative answering. This technology utilizes semantic networks to allow for navigation between possible query search term alternatives. The augmented search terms are passed to an IR engine and the results can be compared. The project utilizes the OSTI Environment, Safety and Health Thesaurus to populate the domain specific semantic network, and the text base of ES&H related documents from the Facility Profile Information Management System as the domain specific search space.
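The cooperative-answering navigation over a thesaurus-derived semantic network amounts to a bounded expansion of the user's query terms; a minimal breadth-first sketch follows, with a hypothetical network standing in for the OSTI thesaurus:

```python
from collections import deque

def expand(term, thesaurus, max_hops=2):
    """Expand a query term by walking a semantic network of
    synonym/broader/narrower links, up to max_hops links away."""
    seen = {term}
    frontier = deque([(term, 0)])
    while frontier:
        t, d = frontier.popleft()
        if d == max_hops:
            continue                       # don't expand beyond the bound
        for nbr in thesaurus.get(t, []):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, d + 1))
    return seen

# hypothetical fragment of an ES&H-style semantic network
net = {"contamination": ["pollution"], "pollution": ["waste", "effluent"]}
terms = expand("contamination", net)
```

The expanded term set is then handed to the IR engine, so documents that never use the user's original wording can still be retrieved.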
Flexible querying of Web data to simulate bacterial growth in food.
Buche, Patrice; Couvert, Olivier; Dibie-Barthélemy, Juliette; Hignette, Gaëlle; Mettler, Eric; Soler, Lydie
2011-06-01
A preliminary step in microbial risk assessment in foods is the gathering of experimental data. In the framework of the Sym'Previus project, we have designed a complete data integration system opened on the Web which allows a local database to be complemented by data extracted from the Web and annotated using a domain ontology. We focus on the Web data tables as they contain, in general, a synthesis of data published in the documents. We propose in this paper a flexible querying system using the domain ontology to scan local and Web data simultaneously, in order to feed the predictive modeling tools available on the Sym'Previus platform. Special attention is paid to the way fuzzy annotations associated with Web data are taken into account in the querying process, which is an important and original contribution of the proposed system. Copyright © 2010 Elsevier Ltd. All rights reserved.
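The handling of fuzzy annotations can be illustrated with a max-min matching degree between a fuzzy query and a fuzzy-annotated table row; the term names and membership degrees below are invented, and the paper's mechanism is more elaborate than this single score:

```python
def fuzzy_match(query, annotation):
    """Degree to which a fuzzy annotation satisfies a fuzzy query:
    the best membership shared by both sets (max-min composition)."""
    return max((min(q, annotation.get(term, 0.0))
                for term, q in query.items()), default=0.0)

# fuzzy sets over ontology terms: term -> membership degree in [0, 1]
query = {"chicken": 1.0, "poultry": 0.6}       # what the user is after
web_table_row = {"poultry": 0.8, "beef": 0.2}  # fuzzy annotation of a row
score = fuzzy_match(query, web_table_row)
```

A crisp match on "chicken" would reject this row outright; the fuzzy score of 0.6 lets the broader "poultry" annotation contribute, which is the point of flexible querying.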
Semantic Networks and Social Networks
ERIC Educational Resources Information Center
Downes, Stephen
2005-01-01
Purpose: To illustrate the need for social network metadata within semantic metadata. Design/methodology/approach: Surveys properties of social networks and the semantic web, suggests that social network analysis applies to semantic content, argues that semantic content is more searchable if social network metadata is merged with semantic web…
Advancing translational research with the Semantic Web.
Ruttenberg, Alan; Clark, Tim; Bug, William; Samwald, Matthias; Bodenreider, Olivier; Chen, Helen; Doherty, Donald; Forsberg, Kerstin; Gao, Yong; Kashyap, Vipul; Kinoshita, June; Luciano, Joanne; Marshall, M Scott; Ogbuji, Chimezie; Rees, Jonathan; Stephens, Susie; Wong, Gwendolyn T; Wu, Elizabeth; Zaccagnini, Davide; Hongsermeier, Tonya; Neumann, Eric; Herman, Ivan; Cheung, Kei-Hoi
2007-05-09
A fundamental goal of the U.S. National Institute of Health (NIH) "Roadmap" is to strengthen Translational Research, defined as the movement of discoveries in basic research to application at the clinical level. A significant barrier to translational research is the lack of uniformly structured data across related biomedical domains. The Semantic Web is an extension of the current Web that enables navigation and meaningful use of digital resources by automatic processes. It is based on common formats that support aggregation and integration of data drawn from diverse sources. A variety of technologies have been built on this foundation that, together, support identifying, representing, and reasoning across a wide range of biomedical data. The Semantic Web Health Care and Life Sciences Interest Group (HCLSIG), set up within the framework of the World Wide Web Consortium, was launched to explore the application of these technologies in a variety of areas. Subgroups focus on making biomedical data available in RDF, working with biomedical ontologies, prototyping clinical decision support systems, working on drug safety and efficacy communication, and supporting disease researchers navigating and annotating the large amount of potentially relevant literature. We present a scenario that shows the value of the information environment the Semantic Web can support for aiding neuroscience researchers. We then report on several projects by members of the HCLSIG, in the process illustrating the range of Semantic Web technologies that have applications in areas of biomedicine. Semantic Web technologies present both promise and challenges. Current tools and standards are already adequate to implement components of the bench-to-bedside vision. On the other hand, these technologies are young. 
Gaps in standards and implementations still exist and adoption is limited by typical problems with early technology, such as the need for a critical mass of practitioners and installed base, and growing pains as the technology is scaled up. Still, the potential of interoperable knowledge sources for biomedicine, at the scale of the World Wide Web, merits continued work.
Advancing translational research with the Semantic Web
Ruttenberg, Alan; Clark, Tim; Bug, William; Samwald, Matthias; Bodenreider, Olivier; Chen, Helen; Doherty, Donald; Forsberg, Kerstin; Gao, Yong; Kashyap, Vipul; Kinoshita, June; Luciano, Joanne; Marshall, M Scott; Ogbuji, Chimezie; Rees, Jonathan; Stephens, Susie; Wong, Gwendolyn T; Wu, Elizabeth; Zaccagnini, Davide; Hongsermeier, Tonya; Neumann, Eric; Herman, Ivan; Cheung, Kei-Hoi
2007-01-01
Background A fundamental goal of the U.S. National Institute of Health (NIH) "Roadmap" is to strengthen Translational Research, defined as the movement of discoveries in basic research to application at the clinical level. A significant barrier to translational research is the lack of uniformly structured data across related biomedical domains. The Semantic Web is an extension of the current Web that enables navigation and meaningful use of digital resources by automatic processes. It is based on common formats that support aggregation and integration of data drawn from diverse sources. A variety of technologies have been built on this foundation that, together, support identifying, representing, and reasoning across a wide range of biomedical data. The Semantic Web Health Care and Life Sciences Interest Group (HCLSIG), set up within the framework of the World Wide Web Consortium, was launched to explore the application of these technologies in a variety of areas. Subgroups focus on making biomedical data available in RDF, working with biomedical ontologies, prototyping clinical decision support systems, working on drug safety and efficacy communication, and supporting disease researchers navigating and annotating the large amount of potentially relevant literature. Results We present a scenario that shows the value of the information environment the Semantic Web can support for aiding neuroscience researchers. We then report on several projects by members of the HCLSIG, in the process illustrating the range of Semantic Web technologies that have applications in areas of biomedicine. Conclusion Semantic Web technologies present both promise and challenges. Current tools and standards are already adequate to implement components of the bench-to-bedside vision. On the other hand, these technologies are young. 
Gaps in standards and implementations still exist and adoption is limited by typical problems with early technology, such as the need for a critical mass of practitioners and installed base, and growing pains as the technology is scaled up. Still, the potential of interoperable knowledge sources for biomedicine, at the scale of the World Wide Web, merits continued work. PMID:17493285
NASA Astrophysics Data System (ADS)
Siegel, Z.; Siegel, Edward Carl-Ludwig
2011-03-01
RANDOMNESS of Numbers cognitive-semantics DEFINITION VIA Cognition QUERY: (WHAT???, NOT HOW???) VS. computer-"science" mindLESS number-crunching (Harrel-Sipser-...) algorithmics Goldreich "PSEUDO-randomness" [Not.AMS(02)] mea-culpa is ONLY via MAXWELL-BOLTZMANN CLASSICAL-STATISTICS (NOT FDQS!!!) "hot-plasma" REPULSION VERSUS Newcomb(1881)-Weyl(1914;1916)-Benford(1938) "NeWBe" logarithmic-law digit-CLUMPING/CLUSTERING NON-Randomness simple Siegel [AMS Joint.Mtg.(02)-Abs.#973-60-124] algebraic-inversion to THE QUANTUM and ONLY BEQS preferentially SEQUENTIALLY lower-DIGITS CLUMPING/CLUSTERING with d = 0 BEC, is ONLY VIA Siegel-Baez FUZZYICS=CATEGORYICS (SON OF TRIZ)/"Category-Semantics" (C-S), the latter the intersection/union of Lawvere(1964)-Siegel(1964) category-theory (matrix: MORPHISMS V FUNCTORS) "+" cognitive-semantics (matrix: ANTONYMS V SYNONYMS), which yields Siegel-Baez FUZZYICS=CATEGORYICS/C-S tabular list-format matrix truth-table analytics: MBCS RANDOMNESS TRUTH/EMET!!!
Constructing a Graph Database for Semantic Literature-Based Discovery.
Hristovski, Dimitar; Kastrin, Andrej; Dinevski, Dejan; Rindflesch, Thomas C
2015-01-01
Literature-based discovery (LBD) generates discoveries, or hypotheses, by combining what is already known in the literature. Potential discoveries have the form of relations between biomedical concepts; for example, a drug may be determined to treat a disease other than the one for which it was intended. LBD views the knowledge in a domain as a network; a set of concepts along with the relations between them. As a starting point, we used SemMedDB, a database of semantic relations between biomedical concepts extracted with SemRep from Medline. SemMedDB is distributed as a MySQL relational database, which has some problems when dealing with network data. We transformed and uploaded SemMedDB into the Neo4j graph database, and implemented the basic LBD discovery algorithms with the Cypher query language. We conclude that storing the data needed for semantic LBD is more natural in a graph database. Also, implementing LBD discovery algorithms is conceptually simpler with a graph query language when compared with standard SQL.
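The basic open-discovery step described above (combine known A→B and B→C relations to hypothesize a new A→C relation) can be sketched in a few lines. This is a toy illustration only: the triples and predicate names below are invented, not drawn from SemMedDB, and the paper's actual implementation expresses the same pattern as a Cypher query over Neo4j.

```python
def open_discovery(triples, start):
    """Return (b, c) pairs where start->b and b->c are known relations
    but start->c is not, i.e., candidate literature-based discoveries."""
    known = {(s, o) for s, _, o in triples}
    out_edges = {}
    for s, _, o in triples:
        out_edges.setdefault(s, set()).add(o)
    hypotheses = []
    for b in out_edges.get(start, set()):
        for c in out_edges.get(b, set()):
            if c != start and (start, c) not in known:
                hypotheses.append((b, c))
    return hypotheses

# Invented example triples in the style of (subject, predicate, object):
triples = [
    ("fish_oil", "AFFECTS", "blood_viscosity"),
    ("blood_viscosity", "ASSOCIATED_WITH", "raynaud_disease"),
    ("fish_oil", "TREATS", "hyperlipidemia"),
]
print(open_discovery(triples, "fish_oil"))
```

In a graph query language the same pattern is a single two-hop path match with a negated direct edge, which is the conceptual simplicity the authors point to.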
Usage and applications of Semantic Web techniques and technologies to support chemistry research
2014-01-01
Background The drug discovery process is now highly dependent on the management, curation and integration of large amounts of potentially useful data. Semantics are necessary in order to interpret the information and derive knowledge. Advances in recent years have mitigated concerns that the lack of robust, usable tools has inhibited the adoption of methodologies based on semantics. Results This paper presents three examples of how Semantic Web techniques and technologies can be used in order to support chemistry research: a controlled vocabulary for quantities, units and symbols in physical chemistry; a controlled vocabulary for the classification and labelling of chemical substances and mixtures; and, a database of chemical identifiers. This paper also presents a Web-based service that uses the datasets in order to assist with the completion of risk assessment forms, along with a discussion of the legal implications and value-proposition for the use of such a service. Conclusions We have introduced the Semantic Web concepts, technologies, and methodologies that can be used to support chemistry research, and have demonstrated the application of those techniques in three areas very relevant to modern chemistry research, generating three new datasets that we offer as exemplars of an extensible portfolio of advanced data integration facilities. We have thereby established the importance of Semantic Web techniques and technologies for meeting Wild’s fourth “grand challenge”. PMID:24855494
Usage and applications of Semantic Web techniques and technologies to support chemistry research.
Borkum, Mark I; Frey, Jeremy G
2014-01-01
The drug discovery process is now highly dependent on the management, curation and integration of large amounts of potentially useful data. Semantics are necessary in order to interpret the information and derive knowledge. Advances in recent years have mitigated concerns that the lack of robust, usable tools has inhibited the adoption of methodologies based on semantics. This paper presents three examples of how Semantic Web techniques and technologies can be used in order to support chemistry research: a controlled vocabulary for quantities, units and symbols in physical chemistry; a controlled vocabulary for the classification and labelling of chemical substances and mixtures; and a database of chemical identifiers. This paper also presents a Web-based service that uses the datasets in order to assist with the completion of risk assessment forms, along with a discussion of the legal implications and value-proposition for the use of such a service. We have introduced the Semantic Web concepts, technologies, and methodologies that can be used to support chemistry research, and have demonstrated the application of those techniques in three areas very relevant to modern chemistry research, generating three new datasets that we offer as exemplars of an extensible portfolio of advanced data integration facilities. We have thereby established the importance of Semantic Web techniques and technologies for meeting Wild's fourth "grand challenge".
Effective Filtering of Query Results on Updated User Behavioral Profiles in Web Mining
Sadesh, S.; Suganthe, R. C.
2015-01-01
The Web, with its tremendous volume of information, retrieves results for user queries. Despite the rapid growth of web page recommendation, results retrieved with existing data mining techniques achieve a low filtering rate because the relationships between user profiles and queries are not analyzed extensively. At the same time, existing user-profile-based prediction in web data mining is not exhaustive in producing personalized results. To improve the query result rate under the dynamics of user behavior over time, the Hamilton Filtered Regime Switching User Query Probability (HFRS-UQP) framework is proposed. HFRS-UQP is split into two processes, filtering and switching. The filtering process uses the Hamilton filtering framework to filter user results based on personalized information from profiles updated automatically through the search engine; the maximized result set is filtered with respect to user behavior profiles. The switching process performs accurate filtering of updated profiles using regime switching: a switch in regime on profile change identifies second- and higher-order associations of query results with the updated profiles. Experiments were conducted on factors such as personalized information search retrieval rate, filtering efficiency, and precision ratio. PMID:26221626
Stracuzzi, David John; Brost, Randolph C.; Phillips, Cynthia A.; ...
2015-09-26
Geospatial semantic graphs provide a robust foundation for representing and analyzing remote sensor data. In particular, they support a variety of pattern search operations that capture the spatial and temporal relationships among the objects and events in the data. However, in the presence of large data corpora, even a carefully constructed search query may return a large number of unintended matches. This work considers the problem of calculating a quality score for each match to the query, given that the underlying data are uncertain. We present a preliminary evaluation of three methods for determining both match quality scores and associated uncertainty bounds, illustrated in the context of an example based on overhead imagery data.
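The abstract does not specify the three scoring methods, but the core idea of attaching a quality score and uncertainty bounds to a pattern match can be illustrated with one simple scheme: treat each matched edge as having a confidence with an uncertainty interval, and combine them multiplicatively. Everything below (the scheme, the numbers) is an invented illustration, not the paper's method.

```python
import math

def match_quality(edge_confidences):
    """Score a graph-pattern match as the product of per-edge confidences,
    with naive interval bounds from per-edge uncertainties.

    edge_confidences: list of (confidence, uncertainty) pairs, one per
    matched edge. Illustrative only; not one of the paper's three methods.
    """
    score = math.prod(p for p, _ in edge_confidences)
    lo = math.prod(max(0.0, p - u) for p, u in edge_confidences)
    hi = math.prod(min(1.0, p + u) for p, u in edge_confidences)
    return score, (lo, hi)

# A hypothetical three-edge match, each edge with (confidence, uncertainty):
score, bounds = match_quality([(0.9, 0.05), (0.8, 0.1), (0.95, 0.02)])
```

A multiplicative score penalizes matches that rely on any single low-confidence edge, which is one natural way to rank unintended matches below intended ones.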
Usage of the Jess Engine, Rules and Ontology to Query a Relational Database
NASA Astrophysics Data System (ADS)
Bak, Jaroslaw; Jedrzejek, Czeslaw; Falkowski, Maciej
We present a prototypical implementation of a library tool, the Semantic Data Library (SDL), which integrates the Jess (Java Expert System Shell) engine, rules, and an ontology to query a relational database. The tool extends the functionality of previous OWL2Jess with SWRL implementations and takes full advantage of the Jess engine by separating forward and backward reasoning. The optimized integration of these technologies is an advance over previous tools. We discuss the complexity of the query algorithm. As a demonstration of the capability of the SDL library, we execute queries using a crime ontology being developed in the Polish PPBW project.
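Forward reasoning of the kind a production-rule engine like Jess performs can be sketched as a fixpoint loop: apply every rule to the current fact base, add any newly derived facts, and repeat until nothing changes. The toy rule and fact names below are invented for illustration and do not come from the SDL library or the crime ontology.

```python
def forward_chain(facts, rules):
    """Apply rules to the fact base until a fixpoint is reached.
    Each rule is a function mapping the current set of facts to a set
    of derived facts (a minimal stand-in for Jess forward chaining)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            for new_fact in rule(facts):
                if new_fact not in facts:
                    facts.add(new_fact)
                    changed = True
    return facts

# Invented rule: transitivity of a "controls" relation.
def controls_transitive(facts):
    return {("controls", a, c)
            for (p1, a, b) in facts if p1 == "controls"
            for (p2, b2, c) in facts if p2 == "controls" and b2 == b}

facts = {("controls", "companyA", "companyB"),
         ("controls", "companyB", "companyC")}
result = forward_chain(facts, [controls_transitive])
```

Backward reasoning, by contrast, starts from the query goal and works toward known facts; separating the two, as the SDL does, lets each query use whichever direction terminates faster.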
Protecting count queries in study design
Sarwate, Anand D; Boxwala, Aziz A
2012-01-01
Objective Today's clinical research institutions provide tools for researchers to query their data warehouses for counts of patients. To protect patient privacy, counts are perturbed before reporting; this compromises their utility for increased privacy. The goal of this study is to extend current query answer systems to guarantee a quantifiable level of privacy and allow users to tailor perturbations to maximize the usefulness according to their needs. Methods A perturbation mechanism was designed in which users are given options with respect to scale and direction of the perturbation. The mechanism translates the true count, user preferences, and a privacy level within administrator-specified bounds into a probability distribution from which the perturbed count is drawn. Results Users can significantly impact the scale and direction of the count perturbation and can receive more accurate final cohort estimates. Strong and semantically meaningful differential privacy is guaranteed, providing for a unified privacy accounting system that can support role-based trust levels. This study provides an open source web-enabled tool to investigate visually and numerically the interaction between system parameters, including required privacy level and user preference settings. Conclusions Quantifying privacy allows system administrators to provide users with a privacy budget and to monitor its expenditure, enabling users to control the inevitable loss of utility. While current measures of privacy are conservative, this system can take advantage of future advances in privacy measurement. The system provides new ways of trading off privacy and utility that are not provided in current study design systems. PMID:22511018
Protecting count queries in study design.
Vinterbo, Staal A; Sarwate, Anand D; Boxwala, Aziz A
2012-01-01
Today's clinical research institutions provide tools for researchers to query their data warehouses for counts of patients. To protect patient privacy, counts are perturbed before reporting; this compromises their utility for increased privacy. The goal of this study is to extend current query answer systems to guarantee a quantifiable level of privacy and allow users to tailor perturbations to maximize the usefulness according to their needs. A perturbation mechanism was designed in which users are given options with respect to scale and direction of the perturbation. The mechanism translates the true count, user preferences, and a privacy level within administrator-specified bounds into a probability distribution from which the perturbed count is drawn. Users can significantly impact the scale and direction of the count perturbation and can receive more accurate final cohort estimates. Strong and semantically meaningful differential privacy is guaranteed, providing for a unified privacy accounting system that can support role-based trust levels. This study provides an open source web-enabled tool to investigate visually and numerically the interaction between system parameters, including required privacy level and user preference settings. Quantifying privacy allows system administrators to provide users with a privacy budget and to monitor its expenditure, enabling users to control the inevitable loss of utility. While current measures of privacy are conservative, this system can take advantage of future advances in privacy measurement. The system provides new ways of trading off privacy and utility that are not provided in current study design systems.
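The perturbation mechanism described above can be sketched with the standard Laplace mechanism for differentially private counts, plus a user-chosen shift standing in for the paper's direction preference. This is a simplification under stated assumptions: the function name, the use of plain Laplace noise, and the `direction` parameter are illustrative, not the paper's administrator-bounded mechanism.

```python
import math
import random

def perturbed_count(true_count, epsilon, direction=0.0, rng=None):
    """Return a differentially private patient count.

    Laplace noise with scale 1/epsilon (a count query has sensitivity 1)
    is added to the true count; `direction` lets the user bias the
    reported value up or down. Hypothetical sketch, not the paper's
    exact mechanism.
    """
    rng = rng or random.Random()
    u = rng.random() - 0.5
    # Inverse-CDF sampling of Laplace(0, 1/epsilon):
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return max(0, round(true_count + noise + direction))
```

Smaller `epsilon` means stronger privacy but noisier counts, which is exactly the privacy/utility trade-off the administrator's privacy budget is meant to meter.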
A Semantic Web-based System for Mining Genetic Mutations in Cancer Clinical Trials.
Priya, Sambhawa; Jiang, Guoqian; Dasari, Surendra; Zimmermann, Michael T; Wang, Chen; Heflin, Jeff; Chute, Christopher G
2015-01-01
Textual eligibility criteria in clinical trial protocols contain important information about potential clinically relevant pharmacogenomic events. Manual curation for harvesting this evidence is intractable as it is error prone and time consuming. In this paper, we develop and evaluate a Semantic Web-based system that captures and manages mutation evidences and related contextual information from cancer clinical trials. The system has 2 main components: an NLP-based annotator and a Semantic Web ontology-based annotation manager. We evaluated the performance of the annotator in terms of precision and recall. We demonstrated the usefulness of the system by conducting case studies in retrieving relevant clinical trials using a collection of mutations identified from TCGA Leukemia patients and Atlas of Genetics and Cytogenetics in Oncology and Haematology. In conclusion, our system using Semantic Web technologies provides an effective framework for extraction, annotation, standardization and management of genetic mutations in cancer clinical trials.
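The precision/recall evaluation of the annotator mentioned above reduces to comparing the set of extracted mutations against a gold standard. The mutation strings below are invented examples, not results from the paper.

```python
def precision_recall(predicted, gold):
    """Precision and recall of predicted annotations against a gold set."""
    tp = len(predicted & gold)  # true positives: annotations in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical annotator output vs. manually curated gold standard:
predicted = {"BRAF V600E", "KRAS G12D", "EGFR T790M"}
gold = {"BRAF V600E", "KRAS G12D", "JAK2 V617F"}
p, r = precision_recall(predicted, gold)
```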
Framework for Building Collaborative Research Environment
Devarakonda, Ranjeet; Palanisamy, Giriprakash; San Gil, Inigo
2014-10-25
A wide range of expertise and technologies is the key to solving some global problems. Semantic web technology can revolutionize the nature of how scientific knowledge is produced and shared. The semantic web is all about enabling machine-machine readability instead of routine human-human interaction, and carefully structured, machine-readable data are the key to enabling these interactions. Drupal is an example of one such toolset that can render all the functionalities of Semantic Web technology right out of the box: its content management system automatically stores data in a structured format, enabling it to be machine readable. Within this paper, we discuss how Drupal promotes collaboration in research settings such as Oak Ridge National Laboratory (ORNL) and the Long Term Ecological Research Center (LTER) and how it effectively uses the Semantic Web in achieving this.
A Web-Based Interactive Tool for Multi-Resolution 3D Models of a Maya Archaeological Site
NASA Astrophysics Data System (ADS)
Agugiaro, G.; Remondino, F.; Girardi, G.; von Schwerin, J.; Richards-Rissetto, H.; De Amicis, R.
2011-09-01
Continuous technological advances in surveying, computing and digital-content delivery are strongly contributing to a change in the way Cultural Heritage is "perceived": new tools and methodologies for documentation, reconstruction and research are being created to assist not only scholars, but also to reach more potential users (e.g. students and tourists) willing to access more detailed information about art history and archaeology. 3D computer-simulated models, sometimes set in virtual landscapes, offer for example the chance to explore possible hypothetical reconstructions, while on-line GIS resources can help interactive analyses of relationships and change over space and time. While for some research purposes a traditional 2D approach may suffice, this is not the case for more complex analyses concerning spatial and temporal features of architecture, for example the relationship of architecture and landscape, visibility studies, etc. The project therefore aims at creating a tool, called "QueryArch3D", which enables the web-based visualisation and querying of an interactive, multi-resolution 3D model in the framework of Cultural Heritage. More specifically, a complete Maya archaeological site, located in Copan (Honduras), has been chosen as a case study to test and demonstrate the platform's capabilities. Much of the site has been surveyed and modelled at different levels of detail (LoD) and the geometric model has been semantically segmented and integrated with attribute data gathered from several external data sources. The paper describes the characteristics of the research work, along with its implementation issues and the initial results of the developed prototype.
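The two core ideas the abstract describes, selecting a level of detail (LoD) for display and querying attribute data attached to semantically segmented model parts, can be sketched as follows. The distance thresholds, building names, and attribute fields are all invented for illustration; they are not taken from QueryArch3D or the Copan dataset.

```python
# (max viewing distance, LoD) pairs; closer views get higher detail.
LOD_THRESHOLDS = [(50.0, 3), (200.0, 2), (float("inf"), 1)]

def select_lod(distance):
    """Pick a level of detail from the camera's distance to the object."""
    for max_dist, lod in LOD_THRESHOLDS:
        if distance <= max_dist:
            return lod

# Hypothetical semantically segmented structures with attribute data:
buildings = [
    {"id": "Temple22", "period": "Late Classic",
     "lod_models": {1: "t22_low", 3: "t22_high"}},
    {"id": "Structure10L-26", "period": "Late Classic",
     "lod_models": {1: "s26_low"}},
]

def query(buildings, **attrs):
    """Return ids of buildings whose attributes match all given values."""
    return [b["id"] for b in buildings
            if all(b.get(k) == v for k, v in attrs.items())]
```

Linking each semantic segment to its multi-resolution geometry is what lets an attribute query drive the 3D view, and vice versa.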