Sample records for mining semantics modeling

  1. Modeling Spatial Dependencies and Semantic Concepts in Data Mining

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vatsavai, Raju

    Data mining is the process of discovering new patterns and relationships in large datasets. However, several studies have shown that general data mining techniques often fail to extract meaningful patterns and relationships from spatial data owing to the violation of fundamental geospatial principles. In this tutorial, we introduce basic principles behind explicit modeling of spatial and semantic concepts in data mining. In particular, we focus on modeling these concepts in the widely used classification, clustering, and prediction algorithms. Classification is the process of learning a structure or model (from user-given inputs) and applying the known model to the new data. Clustering is the process of discovering groups and structures in the data that are "similar," without applying any known structures in the data. Prediction is the process of finding a function that models (explains) the data with least error. One common assumption among all these methods is that the data is independent and identically distributed. Such assumptions do not hold well in spatial data, where spatial dependency and spatial heterogeneity are the norm. In addition, spatial semantics are often ignored by data mining algorithms. In this tutorial we cover recent advances in explicit modeling of spatial dependencies and semantic concepts in data mining.
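The i.i.d. caveat can be made concrete with Moran's I, a standard statistic for the spatial autocorrelation that classical methods ignore (a common diagnostic, not something taken from the tutorial itself). A minimal NumPy sketch on an invented 1-D grid of four cells:

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I: spatial autocorrelation of `values` under a
    neighbour-weight matrix `weights` (values near 0 suggest no dependency)."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = x.size
    z = x - x.mean()                      # deviations from the mean
    num = (w * np.outer(z, z)).sum()      # weighted cross-products of neighbours
    den = (z ** 2).sum()
    return (n / w.sum()) * (num / den)

# Four cells on a line with rook (adjacent-cell) neighbours; values rise
# smoothly along the line, so positive autocorrelation is expected.
vals = [1.0, 2.0, 3.0, 4.0]
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
I = morans_i(vals, W)   # positive: neighbours have similar values
```

A clearly positive I on data like this is exactly the situation in which the i.i.d. assumption behind standard classifiers breaks down.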

  2. A semantic model for multimodal data mining in healthcare information systems.

    PubMed

    Iakovidis, Dimitris; Smailis, Christos

    2012-01-01

    Electronic health records (EHRs) are representative examples of multimodal/multisource data collections, including measurements, images and free texts. The diversity of such information sources and the increasing amounts of medical data produced by healthcare institutes annually pose significant challenges in data mining. In this paper we present a novel semantic model that describes knowledge extracted from the lowest level of a data mining process, where information is represented by multiple features, i.e. measurements or numerical descriptors extracted from measurements, images, texts or other medical data, forming multidimensional feature spaces. Knowledge collected by manual annotation or extracted by unsupervised data mining from one or more feature spaces is modeled through generalized qualitative spatial semantics. This model enables a unified representation of knowledge across multimodal data repositories. It contributes to bridging the semantic gap by enabling direct links between low-level features and higher-level concepts, e.g. those describing body parts, anatomies and pathological findings. The proposed model has been developed in the Web Ontology Language based on description logics (OWL-DL) and can be applied to a variety of data mining tasks in medical informatics. Its utility is demonstrated for automatic annotation of medical data.

  3. Assessing semantic similarity of texts - Methods and algorithms

    NASA Astrophysics Data System (ADS)

    Rozeva, Anna; Zerkova, Silvia

    2017-12-01

    Assessing the semantic similarity of texts is an important part of different text-related applications like educational systems, information retrieval, text summarization, etc. This task is performed by sophisticated analysis, which implements text-mining techniques. Text mining involves several pre-processing steps that produce a structured, representative model of the documents in a corpus by extracting and selecting the features characterizing their content. Generally the model is vector-based and enables further analysis with knowledge discovery approaches. Algorithms and measures are used for assessing texts at the syntactic and semantic levels. An important text-mining method and similarity measure is latent semantic analysis (LSA). It reduces the dimensionality of the document vector space and better captures the text semantics. The mathematical background of LSA for deriving the meaning of the words in a given text by exploring their co-occurrence is examined. The algorithm for obtaining the vector representation of words and their corresponding latent concepts in a reduced multidimensional space, as well as the similarity calculation, is presented.
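The LSA pipeline described — truncated SVD of a term-document matrix followed by similarity in the reduced concept space — can be sketched in a few lines of NumPy. The toy matrix below is invented: two pairs of documents built from disjoint vocabularies.

```python
import numpy as np

# Toy term-document matrix (rows = terms, cols = documents); in practice
# this would come from tf-idf vectorisation of a real corpus.
A = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)

# LSA: truncated SVD keeps the k strongest latent concepts.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T    # documents in concept space

def cos_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_same = cos_sim(doc_vecs[0], doc_vecs[1])   # docs sharing all terms
sim_diff = cos_sim(doc_vecs[0], doc_vecs[2])   # docs with disjoint terms
```

Documents built from the same vocabulary end up with cosine similarity 1 in the concept space, while documents from the other vocabulary block score 0 — the dimensionality reduction has recovered the two latent topics.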

  4. Exploiting Recurring Structure in a Semantic Network

    NASA Technical Reports Server (NTRS)

    Wolfe, Shawn R.; Keller, Richard M.

    2004-01-01

    With the growing popularity of the Semantic Web, an increasing amount of information is becoming available in machine interpretable, semantically structured networks. Within these semantic networks are recurring structures that could be mined by existing or novel knowledge discovery methods. The mining of these semantic structures represents an interesting area that focuses on mining both for and from the Semantic Web, with surprising applicability to problems confronting the developers of Semantic Web applications. In this paper, we present representative examples of recurring structures and show how these structures could be used to increase the utility of a semantic repository deployed at NASA.

  5. Digital Workflows for a 3D Semantic Representation of an Ancient Mining Landscape

    NASA Astrophysics Data System (ADS)

    Hiebel, G.; Hanke, K.

    2017-08-01

    The ancient mining landscape of Schwaz/Brixlegg in the Tyrol, Austria, witnessed mining from prehistoric to modern times, creating a cultural landscape of the first order with respect to one of the most important inventions in human history: the production of metal. In 1991 a part of this landscape was lost to an enormous landslide that reshaped part of the mountain. With our work we propose a digital workflow to create a 3D semantic representation of this ancient mining landscape, with its mining structures, in order to preserve it for posterity. First, we define a conceptual model to integrate the data. It is based on the CIDOC CRM ontology and CRMgeo for geometric data. To transform our information sources into a formal representation of the classes and properties of the ontology, we applied Semantic Web technologies and created a knowledge graph in RDF (Resource Description Framework). Through the CRMgeo extension, coordinate information of mining features can be integrated into the RDF graph and thus related to the detailed digital elevation model, which may be visualized together with the mining structures using geoinformation systems or 3D visualization tools. The RDF network in the triple store can be queried using the SPARQL query language. We created a snapshot of mining, settlement and burial sites in the Bronze Age. The results of the query were loaded into a geoinformation system, and a visualization of known Bronze Age sites related to mining, settlement and burial activities was created.
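The RDF/SPARQL workflow can be miniaturized as triple-pattern matching. The toy store below uses invented identifiers, not real CIDOC CRM/CRMgeo URIs, and `match` stands in for a SPARQL basic graph pattern:

```python
# Minimal triple store: (subject, predicate, object) tuples, queried by
# pattern matching with None as a wildcard -- a toy stand-in for SPARQL
# over an RDF graph. All identifiers below are illustrative only.
triples = {
    ("site:MiningSite1", "rdf:type",        "crm:E27_Site"),
    ("site:MiningSite1", "crm:P2_has_type", "type:BronzeAgeMining"),
    ("site:Settlement1", "rdf:type",        "crm:E27_Site"),
    ("site:Settlement1", "crm:P2_has_type", "type:BronzeAgeSettlement"),
    ("site:MiningSite1", "geo:hasCoords",   "(47.34, 11.86)"),
}

def match(s=None, p=None, o=None):
    """Return all triples consistent with the given pattern, sorted."""
    return sorted(t for t in triples
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))

# All sites, then the coordinates of the mining site:
sites = [t[0] for t in match(p="rdf:type", o="crm:E27_Site")]
coords = match(s="site:MiningSite1", p="geo:hasCoords")[0][2]
```

The query for `sites` mirrors the paper's "snapshot" queries: select every entity of a given type, then join on its geometry for display in a GIS.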

  6. Semantic Context Detection Using Audio Event Fusion

    NASA Astrophysics Data System (ADS)

    Chu, Wei-Ta; Cheng, Wen-Huang; Wu, Ja-Ling

    2006-12-01

    Semantic-level content analysis is a crucial issue in achieving efficient content retrieval and management. We propose a hierarchical approach that models audio events over a time series in order to accomplish semantic context detection. Two levels of modeling, audio event and semantic context modeling, are devised to bridge the gap between physical audio features and semantic concepts. In this work, hidden Markov models (HMMs) are used to model four representative audio events, that is, gunshot, explosion, engine, and car braking, in action movies. At the semantic context level, generative (ergodic hidden Markov model) and discriminative (support vector machine (SVM)) approaches are investigated to fuse the characteristics and correlations among audio events, which provide cues for detecting gunplay and car-chasing scenes. The experimental results demonstrate the effectiveness of the proposed approaches and provide a preliminary framework for information mining by using audio characteristics.
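The two-level scheme can be sketched at the event level: each audio event class gets its own HMM, and a new clip is scored against every model with the forward algorithm, the highest likelihood winning. A toy NumPy sketch — the transition and emission numbers are invented, not from the paper:

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM
    (forward algorithm); obs entries index columns of emission matrix B."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return float(np.log(alpha.sum()))

# Two toy event models: a "steady" event (e.g. engine) stays in one state,
# a "bursty" event (e.g. gunshot) alternates states often.
pi = np.array([0.5, 0.5])
A_steady = np.array([[0.9, 0.1], [0.1, 0.9]])
A_burst  = np.array([[0.2, 0.8], [0.8, 0.2]])
B = np.array([[0.9, 0.1],    # state 0 mostly emits symbol 0
              [0.1, 0.9]])   # state 1 mostly emits symbol 1

obs = [0, 0, 0, 0, 1]        # a mostly-steady observation sequence
ll_steady = forward_loglik(obs, pi, A_steady, B)
ll_burst  = forward_loglik(obs, pi, A_burst,  B)
picked = "steady" if ll_steady > ll_burst else "burst"
```

The semantic-context layer in the paper then fuses such per-event likelihood streams (via an ergodic HMM or an SVM) into scene-level decisions like "gunplay".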

  7. Semantic web for integrated network analysis in biomedicine.

    PubMed

    Chen, Huajun; Ding, Li; Wu, Zhaohui; Yu, Tong; Dhanapalan, Lavanya; Chen, Jake Y

    2009-03-01

    The Semantic Web technology enables integration of heterogeneous data on the World Wide Web by making the semantics of data explicit through formal ontologies. In this article, we survey the feasibility and state of the art of utilizing the Semantic Web technology to represent, integrate and analyze the knowledge in various biomedical networks. We introduce a new conceptual framework, semantic graph mining, to enable researchers to integrate graph mining with ontology reasoning in network data analysis. Through four case studies, we demonstrate how semantic graph mining can be applied to the analysis of disease-causal genes, Gene Ontology category cross-talks, drug efficacy analysis and herb-drug interactions analysis.

  8. EXACT2: the semantics of biomedical protocols

    PubMed Central

    2014-01-01

    Background The reliability and reproducibility of experimental procedures is a cornerstone of scientific practice. There is a pressing technological need for the better representation of biomedical protocols to enable other agents (human or machine) to better reproduce results. A framework ensuring that all information required for the replication of experimental protocols is captured is essential to achieving reproducibility. Methods We have developed the ontology EXACT2 (EXperimental ACTions), designed to capture the full semantics of biomedical protocols required for their reproducibility. To construct EXACT2 we manually inspected hundreds of published and commercial biomedical protocols from several areas of biomedicine. After establishing a clear pattern for extracting the required information, we utilized text-mining tools to translate the protocols into a machine-amenable format. We have verified the utility of EXACT2 through the successful processing of previously 'unseen' protocols (i.e. not used for the construction of EXACT2). Results The paper reports on a fundamentally new version of EXACT2 that supports the semantically defined representation of biomedical protocols. The ability of EXACT2 to capture the semantics of biomedical procedures was verified through a text-mining use case, in which EXACT2 is used as a reference model for text-mining tools to identify terms pertinent to experimental actions, and their properties, in biomedical protocols expressed in natural language. An EXACT2-based framework for the translation of biomedical protocols to a machine-amenable format is proposed. Conclusions The EXACT2 ontology is sufficient to record, in a machine-processable form, the essential information about biomedical protocols. EXACT2 defines explicit semantics of experimental actions and can be used by various computer applications. It can serve as a reference model for the translation of biomedical protocols in natural language into a semantically defined format. PMID:25472549

  9. Structure Discovery in Large Semantic Graphs Using Extant Ontological Scaling and Descriptive Statistics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    al-Saffar, Sinan; Joslyn, Cliff A.; Chappell, Alan R.

    As semantic datasets grow to be very large and divergent, there is a need to identify and exploit their inherent semantic structure for discovery and optimization. Towards that end, we present here a novel methodology to identify the semantic structures inherent in an arbitrary semantic graph dataset. We first present the concept of an extant ontology as a statistical description of the semantic relations present amongst the typed entities modeled in the graph. This serves as a model of the underlying semantic structure to aid in discovery and visualization. We then describe a method of ontological scaling in which the ontology is employed as a hierarchical scaling filter to infer different resolution levels at which the graph structures are to be viewed or analyzed. We illustrate these methods on three large and publicly available semantic datasets containing more than one billion edges each. Keywords: Semantic Web; Visualization; Ontology; Multi-resolution Data Mining.

  10. Proactive Supply Chain Performance Management with Predictive Analytics

    PubMed Central

    Stefanovic, Nenad

    2014-01-01

    Today's business climate requires supply chains to be proactive rather than reactive, which demands a new approach that incorporates data mining predictive analytics. This paper introduces a predictive supply chain performance management model which combines process modelling, performance measurement, data mining models, and web portal technologies into a unique model. It presents the supply chain modelling approach based on a specialized metamodel which allows modelling of any supply chain configuration at different levels of detail. The paper also presents the supply chain semantic business intelligence (BI) model which encapsulates data sources and business rules and includes the data warehouse model with specific supply chain dimensions, measures, and KPIs (key performance indicators). Next, the paper describes two generic approaches for designing the KPI predictive data mining models based on the BI semantic model. KPI predictive models were trained and tested with a real-world data set. Finally, a specialized analytical web portal which offers collaborative performance monitoring and decision making is presented. The results show that these models give very accurate KPI projections and provide valuable insights into newly emerging trends, opportunities, and problems. This should lead to more intelligent, predictive, and responsive supply chains capable of adapting to the future business environment. PMID:25386605
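A KPI predictive model of the kind described can be sketched, in miniature, as a regression from supply chain measures to a KPI. All numbers and feature names below are invented for illustration; the paper's models are built on a full BI semantic layer, not a six-row table:

```python
import numpy as np

# Hypothetical history: weekly order volume and supplier lead time (days)
# versus an on-time-delivery KPI. Higher load tends to depress the KPI.
X = np.array([[100, 2.0], [120, 2.5], [ 90, 1.8],
              [150, 3.0], [110, 2.2], [130, 2.7]], dtype=float)
y = np.array([0.95, 0.90, 0.97, 0.84, 0.93, 0.88])

# Fit y ~ X @ b + c by ordinary least squares (intercept via a ones column).
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_kpi(orders, lead_time):
    """Project the KPI for a planned load level."""
    return float(np.array([orders, lead_time, 1.0]) @ coef)

kpi_low_load  = predict_kpi( 95, 1.9)   # light week: KPI projected high
kpi_high_load = predict_kpi(145, 2.9)   # heavy week: KPI projected lower
```

The portal layer described in the paper would surface such projections as early warnings; here the point is only the shape of the pipeline: historical measures in, KPI forecast out.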

  12. Developing a semantic web model for medical differential diagnosis recommendation.

    PubMed

    Mohammed, Osama; Benlamri, Rachid

    2014-10-01

    In this paper we describe a novel model for differential diagnosis designed to make recommendations by utilizing semantic web technologies. The model is a response to a number of requirements, ranging from incorporating essential clinical diagnostic semantics to the integration of data mining for the process of identifying candidate diseases that best explain a set of clinical features. We introduce two major components, which we find essential to the construction of an integral differential diagnosis recommendation model: the evidence-based recommender component and the proximity-based recommender component. Both approaches are driven by disease diagnosis ontologies designed specifically to enable the process of generating diagnostic recommendations. These ontologies are the disease symptom ontology and the patient ontology. The evidence-based diagnosis process develops dynamic rules based on standardized clinical pathways. The proximity-based component employs data mining to provide clinicians with diagnosis predictions, as well as generates new diagnosis rules from provided training datasets. This article describes the integration between these two components along with the developed diagnosis ontologies to form a novel medical differential diagnosis recommendation model. This article also provides test cases from the implementation of the overall model, which shows quite promising diagnostic recommendation results.
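The evidence/proximity idea can be sketched as scoring candidate diseases by how many of their characteristic findings appear in the patient record. The disease and finding names below are invented toy data, not drawn from the paper's ontologies:

```python
# Toy knowledge base: disease -> characteristic findings (illustrative only).
disease_findings = {
    "influenza": {"fever", "cough", "myalgia"},
    "strep":     {"fever", "sore throat"},
    "migraine":  {"headache", "photophobia"},
}

def rank_candidates(patient_findings):
    """Score each disease by the fraction of its findings present,
    then rank candidates best-first."""
    scores = {d: len(f & patient_findings) / len(f)
              for d, f in disease_findings.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])

ranking = rank_candidates({"fever", "cough", "headache"})
best = ranking[0][0]   # top-ranked candidate diagnosis
```

The paper's model replaces this flat lookup with ontology-driven rules (evidence-based component) and data-mining-derived predictions (proximity-based component), but the recommendation output has the same ranked-candidates shape.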

  13. Science and Technology Text Mining: Text Mining of the Journal Cortex

    DTIC Science & Technology

    2004-01-01

    Amnesia; Retrograde Amnesia; GENERAL; Semantic Memory; Episodic Memory; Working Memory; TEST; Serial Position Curve ... in Cortex can be reasonably divided into four categories (papers in each category in parentheses): Semantic Memory (151); Handedness (145); Amnesia ... Semantic Memory (151) is divided into Verbal/Numerical (76) and Visual/Spatial (75). Amnesia (119) is divided into Amnesia Symptoms (50) and

  14. Graph Mining Meets the Semantic Web

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, Sangkeun; Sukumar, Sreenivas R; Lim, Seung-Hwan

    The Resource Description Framework (RDF) and the SPARQL Protocol and RDF Query Language (SPARQL) were introduced about a decade ago to enable flexible, schema-free data interchange on the Semantic Web. Today, data scientists use the framework as a scalable graph representation for integrating, querying, exploring and analyzing data sets hosted at different sources. With increasing adoption, the need for graph mining capabilities for the Semantic Web has emerged. We address that need through the implementation of three popular iterative graph mining algorithms (triangle counting, connected component analysis, and PageRank). We implement these algorithms as SPARQL queries, wrapped within Python scripts. We evaluate the performance of our implementation on six real-world data sets and show that graph mining algorithms (those with a linear-algebra formulation) can indeed be unleashed on data represented as RDF graphs using the SPARQL query interface.
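Of the three algorithms, triangle counting is the easiest to sketch outside SPARQL; a plain-Python version over an edge list shows what the paper's queries compute (the graph is invented):

```python
# Triangle counting on an undirected graph via neighbour-set intersection.
edges = [("a", "b"), ("b", "c"), ("a", "c"),   # one triangle: a-b-c
         ("c", "d"), ("d", "e")]               # tail with no triangles

# Build the adjacency map.
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

# Summing |N(u) & N(v)| over all ordered adjacent pairs counts each
# triangle 6 times (3 vertices x 2 edge directions), hence the division.
count = sum(len(adj[u] & adj[v]) for u in adj for v in adj[u]) // 6
```

In the paper's setting, the same neighbour-intersection join is expressed as a three-way triple-pattern join in SPARQL over the RDF edge predicates.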

  15. Industrial application of semantic process mining

    NASA Astrophysics Data System (ADS)

    Espen Ingvaldsen, Jon; Atle Gulla, Jon

    2012-05-01

    Process mining relates to the extraction of non-trivial and useful information from information system event logs. It is a new research discipline that has evolved significantly since the early work on idealistic process logs. Over the last years, process mining prototypes have incorporated elements from semantics and data mining and targeted visualisation techniques that are more user-friendly to business experts and process owners. In this article, we present a framework for evaluating different aspects of enterprise process flows and address practical challenges of state-of-the-art industrial process mining. We also explore the inherent strengths of the technology for more efficient process optimisation.
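The core of process mining — extracting structure from event logs — can be illustrated with the directly-follows relation, the starting point of most process-discovery algorithms. The log below is invented:

```python
from collections import Counter

# Event log: case id -> ordered activities (illustrative order-handling process).
log = {
    "case1": ["receive", "check", "approve", "ship"],
    "case2": ["receive", "check", "reject"],
    "case3": ["receive", "check", "approve", "ship"],
}

# Directly-follows graph: how often activity a is immediately followed by b.
dfg = Counter((a, b) for trace in log.values()
              for a, b in zip(trace, trace[1:]))

top_edge = dfg.most_common(1)[0]   # the most frequent hand-over of work
```

Edge frequencies like these feed the flow visualisations and bottleneck analyses that the article's framework presents to process owners; the semantic layer then attaches business-level labels to the raw activity names.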

  16. Attitudinal Modeling of Affect, Behavior and Cognition: Semantic Mining of Disaster Text Corpus

    DTIC Science & Technology

    2010-10-01

    Abelson, R. P., Kinder, D. R., Peters, M. D., and Fiske, S. T. (1982). Affective and semantic components in political person perception. Journal of ... servlet/Satellite Orasanu, J. and Connolly, T. (1993). The reinvention of decision making. In: G. A. Klein, J. Orasanu, R. Calderwood, and C. E. ... of resilience in natural disasters. Disaster Prevention and Management, 15, 5, 793-798. Reiman, T., and Oedewald, P. (2007). Assessment of complex

  17. Adaptive semantic tag mining from heterogeneous clinical research texts.

    PubMed

    Hao, T; Weng, C

    2015-01-01

    To develop an adaptive approach to mine frequent semantic tags (FSTs) from heterogeneous clinical research texts. We develop a "plug-n-play" framework that integrates replaceable unsupervised kernel algorithms with formatting, functional, and utility wrappers for FST mining. Temporal information identification and semantic equivalence detection were two example functional wrappers. We first compared this approach's recall and efficiency for mining FSTs from ClinicalTrials.gov to those of a recently published tag-mining algorithm. Then we assessed this approach's adaptability to two other types of clinical research texts, clinical data requests and clinical trial protocols, by comparing the prevalence trends of FSTs across the three texts. Our approach increased the average recall and speed by 12.8% and 47.02%, respectively, over the baseline when mining FSTs from ClinicalTrials.gov, and maintained an overlap in relevant FSTs with the baseline ranging between 76.9% and 100% for varying FST frequency thresholds. The FSTs saturated when the data size reached 200 documents. Consistent trends in the prevalence of FSTs were observed across the three texts as the data size or frequency threshold changed. This paper contributes an adaptive tag-mining framework that is scalable and adaptable without sacrificing recall. Its component-based architectural design can potentially be generalized to improve the adaptability of other clinical text-mining methods.
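The frequency-threshold idea behind FST mining can be sketched with a document-frequency filter. The tags below are invented toy data, not actual ClinicalTrials.gov output, and the real framework adds the kernel algorithms and wrappers on top:

```python
from collections import Counter

# Toy documents, each already reduced to a set of semantic tags.
tagged_docs = [
    ["age", "diagnosis", "medication"],
    ["age", "diagnosis", "lab-result"],
    ["age", "consent"],
    ["diagnosis", "medication"],
]

def frequent_tags(docs, min_support):
    """Tags whose document frequency meets the support threshold."""
    df = Counter(tag for doc in docs for tag in set(doc))
    return sorted(t for t, n in df.items() if n >= min_support)

fsts = frequent_tags(tagged_docs, min_support=3)   # -> ['age', 'diagnosis']
```

Varying `min_support` reproduces, in miniature, the frequency-threshold sweep the paper uses when comparing FST overlap against the baseline.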

  18. Mining semantic networks of bioinformatics e-resources from the literature

    PubMed Central

    2011-01-01

    Background There have been a number of recent efforts (e.g. BioCatalogue, BioMoby) to systematically catalogue bioinformatics tools, services and datasets. These efforts rely on manual curation, making it difficult to cope with the huge influx of various electronic resources that have been provided by the bioinformatics community. We present a text mining approach that utilises the literature to automatically extract descriptions and semantically profile bioinformatics resources to make them available for resource discovery and exploration through semantic networks that contain related resources. Results The method identifies the mentions of resources in the literature and assigns a set of co-occurring terminological entities (descriptors) to represent them. We have processed 2,691 full-text bioinformatics articles and extracted profiles of 12,452 resources containing associated descriptors with binary and tf*idf weights. Since such representations are typically sparse (on average 13.77 features per resource), we used lexical kernel metrics to identify semantically related resources via descriptor smoothing. Resources are then clustered or linked into semantic networks, providing users (bioinformaticians, curators and service/tool crawlers) with the ability to explore algorithms, tools, services and datasets based on their relatedness. Manual exploration of links between a set of 18 well-known bioinformatics resources suggests that the method was able to identify and group semantically related entities. Conclusions The results have shown that the method can reconstruct interesting functional links between resources (e.g. linking data types and algorithms), in particular when tf*idf-like weights are used for profiling.
This demonstrates the potential of combining literature mining and simple lexical kernel methods to model relatedness between resource descriptors in particular when there are few features, thus potentially improving the resource description, discovery and exploration process. The resource profiles are available at http://gnode1.mib.man.ac.uk/bioinf/semnets.html PMID:21388573
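The tf*idf descriptor profiles and relatedness scoring the record describes can be sketched as follows. Resource names and descriptor counts are invented, and the real system adds kernel-based descriptor smoothing on top of the raw cosine:

```python
import math

# Resource profiles as descriptor -> raw count maps (illustrative data).
profiles = {
    "BLAST":    {"alignment": 4, "sequence": 3, "database": 1},
    "ClustalW": {"alignment": 3, "sequence": 2, "phylogeny": 1},
    "GenBank":  {"database": 5, "sequence": 2},
}

def tfidf(profiles):
    """Reweight counts by log(N / document frequency). A descriptor used by
    every resource (here 'sequence') gets idf 0 and stops discriminating."""
    n = len(profiles)
    df = {}
    for p in profiles.values():
        for d in p:
            df[d] = df.get(d, 0) + 1
    return {name: {d: tf * math.log(n / df[d]) for d, tf in p.items()}
            for name, p in profiles.items()}

def cosine(a, b):
    dot = sum(a[d] * b[d] for d in a if d in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

w = tfidf(profiles)
sim_tools = cosine(w["BLAST"], w["ClustalW"])   # both alignment tools
sim_cross = cosine(w["BLAST"], w["GenBank"])    # tool vs. database
```

As in the record's findings, tf*idf-like weighting is what separates genuinely related resources (the two alignment tools) from pairs that merely share ubiquitous descriptors.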

  19. Knowledge acquisition, semantic text mining, and security risks in health and biomedical informatics

    PubMed Central

    Huang, Jingshan; Dou, Dejing; Dang, Jiangbo; Pardue, J Harold; Qin, Xiao; Huan, Jun; Gerthoffer, William T; Tan, Ming

    2012-01-01

    Computational techniques have been adopted in medical and biological systems for a long time. There is no doubt that the development and application of computational methods will render great help in better understanding biomedical and biological functions. Large amounts of datasets have been produced by biomedical and biological experiments and simulations. In order for researchers to gain knowledge from original data, nontrivial transformation is necessary, which is regarded as a critical link in the chain of knowledge acquisition, sharing, and reuse. Challenges that have been encountered include: how to efficiently and effectively represent human knowledge in formal computing models, how to take advantage of semantic text mining techniques rather than traditional syntactic text mining, and how to handle security issues during the knowledge sharing and reuse. This paper summarizes the state-of-the-art in these research directions. We aim to provide readers with an introduction of major computing themes to be applied to the medical and biological research. PMID:22371823

  20. Exploring context and content links in social media: a latent space method.

    PubMed

    Qi, Guo-Jun; Aggarwal, Charu; Tian, Qi; Ji, Heng; Huang, Thomas S

    2012-05-01

    Social media networks contain both content and context-specific information. Most existing methods work with either of the two for the purpose of multimedia mining and retrieval. In reality, both content and context information are rich sources of information for mining, and the full power of mining and processing algorithms can be realized only with the use of a combination of the two. This paper proposes a new algorithm which mines both context and content links in social media networks to discover the underlying latent semantic space. This mapping of the multimedia objects into latent feature vectors enables the use of any off-the-shelf multimedia retrieval algorithms. Compared to the state-of-the-art latent methods in multimedia analysis, this algorithm effectively solves the problem of sparse context links by mining the geometric structure underlying the content links between multimedia objects. Specifically for multimedia annotation, we show that an effective algorithm can be developed to directly construct annotation models by simultaneously leveraging both context and content information based on latent structure between correlated semantic concepts. We conduct experiments on the Flickr data set, which contains user tags linked with images. We illustrate the advantages of our approach over the state-of-the-art multimedia retrieval techniques.

  1. Mining integrated semantic networks for drug repositioning opportunities

    PubMed Central

    Mullen, Joseph; Tipney, Hannah

    2016-01-01

    Current research and development approaches to drug discovery have become less fruitful and more costly. One alternative paradigm is that of drug repositioning. Many marketed examples of repositioned drugs have been identified through serendipitous or rational observations, highlighting the need for more systematic methodologies to tackle the problem. Systems-level approaches have the potential to enable the development of novel methods to understand the action of therapeutic compounds, but this requires an integrative approach to biological data. Integrated networks can facilitate systems-level analyses by combining multiple sources of evidence to provide a rich description of drugs, their targets and their interactions. Classically, such networks can be mined manually, where a skilled person is able to identify portions of the graph (semantic subgraphs) that are indicative of relationships between drugs and highlight possible repositioning opportunities. However, this approach is not scalable. Automated approaches are required to systematically mine integrated networks for these subgraphs and bring them to the attention of the user. We introduce a formal framework for the definition of integrated networks and their associated semantic subgraphs for drug interaction analysis and describe DReSMin, an algorithm for mining semantically rich networks for occurrences of a given semantic subgraph. This algorithm allows instances of complex semantic subgraphs that contain data about putative drug repositioning opportunities to be identified in a computationally tractable fashion, scaling close to linearly with network data. We demonstrate the utility of our approach by mining an integrated drug interaction network built from 11 sources. This work identified and ranked 9,643,061 putative drug-target interactions, showing a strong correlation between highly scored associations and those supported by literature.
We discuss the 20 top ranked associations in more detail, of which 14 are novel and 6 are supported by the literature. We also show that our approach better prioritizes known drug-target interactions, than other state-of-the art approaches for predicting such interactions. PMID:26844016
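A semantic subgraph of the kind DReSMin mines — two drugs sharing a protein target, a shape suggestive of repositioning — can be sketched as typed-pattern matching over a toy graph. All names are illustrative, and the real algorithm handles far richer subgraph templates:

```python
# Toy typed graph: node -> type, plus typed edges.
nodes = {"d1": "Drug", "d2": "Drug", "p1": "Protein", "p2": "Protein"}
edges = {("d1", "targets", "p1"),
         ("d2", "targets", "p1"),
         ("d2", "targets", "p2")}

def shared_target_pairs(nodes, edges):
    """Match the semantic subgraph Drug-[targets]->Protein<-[targets]-Drug,
    returning (drug_a, drug_b, protein) triples."""
    targets = {}
    for s, r, o in edges:
        if nodes[s] == "Drug" and r == "targets" and nodes[o] == "Protein":
            targets.setdefault(o, set()).add(s)
    return sorted((a, b, p) for p, ds in targets.items()
                  for a in ds for b in ds if a < b)

pairs = shared_target_pairs(nodes, edges)   # -> [('d1', 'd2', 'p1')]
```

Scaling this pattern match to an 11-source network with millions of edges is precisely the tractability problem the paper's algorithm addresses.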

  2. Kernel Methods for Mining Instance Data in Ontologies

    NASA Astrophysics Data System (ADS)

    Bloehdorn, Stephan; Sure, York

    The amount of ontologies and metadata available on the Web is constantly growing. The successful application of machine learning techniques for learning ontologies from textual data, i.e. mining for the Semantic Web, contributes to this trend. However, no principled approaches exist so far for mining from the Semantic Web. We investigate how machine learning algorithms can be made amenable to directly taking advantage of the rich knowledge expressed in ontologies and associated instance data. Kernel methods have been successfully employed in various learning tasks and provide a clean framework for interfacing between non-vectorial data and machine learning algorithms. In this spirit, we express the problem of mining instances in ontologies as the problem of defining valid corresponding kernels. We present a principled framework for designing such kernels by decomposing the kernel computation into specialized kernels for selected characteristics of an ontology, which can be flexibly assembled and tuned. Initial experiments on real-world Semantic Web data show promising results and demonstrate the usefulness of our approach.
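One of the "specialized kernels" such a framework could compose can be sketched as a set-intersection kernel over the classes an instance instantiates — a known valid (positive semi-definite) kernel. The class names are invented, and this is only one ingredient of the decomposition the paper describes:

```python
def class_kernel(classes_a, classes_b):
    """Set-intersection kernel: k(a, b) = |classes(a) & classes(b)|.
    Counts how many ontology classes two instances share."""
    return len(set(classes_a) & set(classes_b))

# Toy instances described by the ontology classes they instantiate.
inst_a = {"Person", "Researcher", "Employee"}
inst_b = {"Person", "Student"}
inst_c = {"Organization"}

k_ab = class_kernel(inst_a, inst_b)   # -> 1 (share 'Person')
k_ac = class_kernel(inst_a, inst_c)   # -> 0 (nothing in common)
```

Analogous kernels over properties and datatype values can then be summed or multiplied, since valid kernels are closed under these operations — which is what makes the "flexibly assembled and tuned" claim work.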

  3. Structuring and extracting knowledge for the support of hypothesis generation in molecular biology

    PubMed Central

    Roos, Marco; Marshall, M Scott; Gibson, Andrew P; Schuemie, Martijn; Meij, Edgar; Katrenko, Sophia; van Hage, Willem Robert; Krommydas, Konstantinos; Adriaans, Pieter W

    2009-01-01

    Background Hypothesis generation in molecular and cellular biology is an empirical process in which knowledge derived from prior experiments is distilled into a comprehensible model. The requirement of automated support is exemplified by the difficulty of considering all relevant facts that are contained in the millions of documents available from PubMed. Semantic Web provides tools for sharing prior knowledge, while information retrieval and information extraction techniques enable its extraction from literature. Their combination makes prior knowledge available for computational analysis and inference. While some tools provide complete solutions that limit the control over the modeling and extraction processes, we seek a methodology that supports control by the experimenter over these critical processes. Results We describe progress towards automated support for the generation of biomolecular hypotheses. Semantic Web technologies are used to structure and store knowledge, while a workflow extracts knowledge from text. We designed minimal proto-ontologies in OWL for capturing different aspects of a text mining experiment: the biological hypothesis, text and documents, text mining, and workflow provenance. The models fit a methodology that allows focus on the requirements of a single experiment while supporting reuse and posterior analysis of extracted knowledge from multiple experiments. Our workflow is composed of services from the 'Adaptive Information Disclosure Application' (AIDA) toolkit as well as a few others. The output is a semantic model with putative biological relations, with each relation linked to the corresponding evidence. Conclusion We demonstrated a 'do-it-yourself' approach for structuring and extracting knowledge in the context of experimental research on biomolecular mechanisms. The methodology can be used to bootstrap the construction of semantically rich biological models using the results of knowledge extraction processes. 
Models specific to particular experiments can be constructed that, in turn, link with other semantic models, creating a web of knowledge that spans experiments. Mapping mechanisms can link to other knowledge resources such as OBO ontologies or SKOS vocabularies. AIDA Web Services can be used to design personalized knowledge extraction procedures. In our example experiment, we found three proteins (NF-Kappa B, p21, and Bax) potentially playing a role in the interplay between nutrients and epigenetic gene regulation. PMID:19796406

  4. Mining large heterogeneous data sets in drug discovery.

    PubMed

    Wild, David J

    2009-10-01

    Increasingly, effective drug discovery involves the searching and data mining of large volumes of information from many sources covering the domains of chemistry, biology and pharmacology, amongst others. This has led to a proliferation of databases and data sources relevant to drug discovery. This paper provides a review of the publicly available large-scale databases relevant to drug discovery, describes the kinds of data mining approaches that can be applied to them, and discusses recent work in integrative data mining that looks for associations that span multiple sources, including the use of Semantic Web techniques. The future of mining large data sets for drug discovery requires intelligent, semantic aggregation of information from all of the data sources described in this review, along with the application of advanced methods such as intelligent agents and inference engines in client applications.

  5. Redundancy in electronic health record corpora: analysis, impact on text mining performance and mitigation strategies.

    PubMed

    Cohen, Raphael; Elhadad, Michael; Elhadad, Noémie

    2013-01-16

    The increasing availability of Electronic Health Record (EHR) data and specifically free-text patient notes presents opportunities for phenotype extraction. Text-mining methods in particular can help disease modeling by mapping named-entity mentions to terminologies and clustering semantically related terms. EHR corpora, however, exhibit specific statistical and linguistic characteristics when compared with corpora in the biomedical literature domain. We focus on copy-and-paste redundancy: clinicians typically copy and paste information from previous notes when documenting a current patient encounter. Thus, within a longitudinal patient record, one expects to observe heavy redundancy. In this paper, we ask three research questions: (i) How can redundancy be quantified in large-scale text corpora? (ii) Conventional wisdom is that larger corpora yield better results in text mining. But how does the observed EHR redundancy affect text mining? Does such redundancy introduce a bias that distorts learned models? Or does the redundancy introduce benefits by highlighting stable and important subsets of the corpus? (iii) How can one mitigate the impact of redundancy on text mining? We analyze a large-scale EHR corpus and quantify redundancy both in terms of word and semantic concept repetition. We observe redundancy levels of about 30% and non-standard distributions of both words and concepts. We measure the impact of redundancy on two standard text-mining applications: collocation identification and topic modeling. We compare the results of these methods on synthetic data with controlled levels of redundancy and observe significant performance variation. Finally, we compare two mitigation strategies to avoid redundancy-induced bias: (i) a baseline strategy, keeping only the last note for each patient in the corpus; (ii) removing redundant notes with an efficient fingerprinting-based algorithm. For text mining, preprocessing the EHR corpus with fingerprinting yields significantly better results. Before applying text-mining techniques, one must pay careful attention to the structure of the analyzed corpora. While the importance of data cleaning is well recognized for low-level text characteristics (e.g., encoding and spelling), high-level and difficult-to-quantify corpus characteristics, such as naturally occurring redundancy, can also hurt text mining. Fingerprinting enables text-mining techniques to leverage the available data in the EHR corpus while avoiding the bias introduced by redundancy.
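The abstract does not specify the fingerprinting algorithm; the sketch below shows one plausible approach, hashing overlapping word n-grams and dropping notes whose fingerprints largely overlap an earlier note in the same record. The n-gram size and overlap threshold are illustrative assumptions, not values from the paper.

```python
import hashlib

def fingerprints(text, n=8):
    """Hash every overlapping word n-gram to a short fingerprint."""
    words = text.lower().split()
    grams = [" ".join(words[i:i + n])
             for i in range(max(len(words) - n + 1, 1))]
    return {hashlib.md5(g.encode()).hexdigest()[:8] for g in grams}

def is_redundant(note, kept_notes, n=8, threshold=0.8):
    """True if the note's fingerprints largely overlap an earlier note."""
    fp = fingerprints(note, n)
    return any(
        len(fp & fingerprints(prev, n)) / max(len(fp), 1) >= threshold
        for prev in kept_notes
    )

def deduplicate(notes, n=8, threshold=0.8):
    """Keep only notes that are not near-copies of earlier kept notes."""
    kept = []
    for note in notes:
        if not is_redundant(note, kept, n, threshold):
            kept.append(note)
    return kept
```

On a longitudinal record, a copy-pasted follow-up note shares most of its n-grams with its source and is filtered out, while genuinely new notes survive.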

  6. Semantic Theme Analysis of Pilot Incident Reports

    NASA Technical Reports Server (NTRS)

    Thirumalainambi, Rajkumar

    2009-01-01

    Pilots report accidents or incidents during take-off, in flight, and on landing to airline authorities as well as to the Federal Aviation Administration. The description of pilot reports for an incident contains technical terms related to flight instruments and operations. Conventional text-mining approaches collect keywords from text documents and relate them among documents stored in a database. The present approach extracts a specific theme analysis of incident reports and semantically relates a hierarchy of terms, assigning weights to themes. Once theme extraction has been performed for a given document, a unique key can be assigned to that document to cross-link the documents. Semantic linking is then used to categorize the documents based on specific rules that can help an end-user analyze certain types of accidents. This presentation outlines the architecture of a text-mining system for the autonomous categorization of pilot incident reports using semantic theme analysis.
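The abstract does not give the theme hierarchy or the weighting scheme; as a hypothetical illustration, themes can be scored by the fraction of their terms found in a report, with the dominant themes concatenated into a cross-linking key. The theme dictionary and the naive substring matching below are assumptions for illustration only.

```python
# Hypothetical two-level theme hierarchy -- illustrative terms only.
THEMES = {
    "engine": {"engine", "thrust", "turbine", "oil pressure"},
    "weather": {"turbulence", "icing", "wind shear", "visibility"},
    "landing": {"flare", "runway", "touchdown", "gear"},
}

def theme_weights(report):
    """Score each theme by the fraction of its terms found in the report.

    Naive substring matching; a real system would tokenize the text and
    use the full term hierarchy."""
    text = report.lower()
    weights = {}
    for theme, terms in THEMES.items():
        hits = sum(1 for term in terms if term in text)
        if hits:
            weights[theme] = hits / len(terms)
    return weights

def document_key(report):
    """Cross-linking key built from the dominant themes."""
    weights = theme_weights(report)
    ranked = sorted(weights, key=weights.get, reverse=True)
    return "-".join(ranked) if ranked else "uncategorized"
```

Reports sharing a key (e.g. the same dominant theme pair) can then be grouped for end-user analysis of one accident type.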

  7. Mining Predictors of Success in Air Force Flight Training Regiments via Semantic Analysis of Instructor Evaluations

    DTIC Science & Technology

    2018-03-01

    We apply our methodology to the criticism text written in the flight-training program student evaluations in order to construct a model that ... factors.

  8. Effective use of latent semantic indexing and computational linguistics in biological and biomedical applications.

    PubMed

    Chen, Hongyu; Martin, Bronwen; Daimon, Caitlin M; Maudsley, Stuart

    2013-01-01

    Text mining is rapidly becoming an essential technique for the annotation and analysis of large biological data sets. Biomedical literature currently increases at a rate of several thousand papers per week, making automated information retrieval the only feasible means of managing this expanding corpus. With the increasing prevalence of open-access journals and constant growth of publicly available repositories of biomedical literature, literature mining has become much more effective with respect to the extraction of biomedically relevant data. In recent years, text mining of popular databases such as MEDLINE has evolved from basic term searches to more sophisticated natural language processing techniques, indexing and retrieval methods, structural analysis, and integration of literature with associated metadata. In this review, we focus on Latent Semantic Indexing (LSI), a computational linguistics technique increasingly used for a variety of biological purposes. It is noted for its ability to consistently outperform benchmark Boolean text searches and co-occurrence models at information retrieval, and for its power to extract indirect relationships within a data set. LSI has been used successfully to formulate new hypotheses, generate novel connections from existing data, and validate empirical data.
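A minimal sketch of the LSI core: a truncated SVD of the term-document matrix projects documents into a low-rank latent space, where indirect term overlap shows up as geometric proximity. The toy matrix used below is an assumption for illustration, not data from the review.

```python
import numpy as np

def lsi_doc_coords(term_doc, k=2):
    """Project documents into a k-dimensional latent semantic space.

    term_doc: terms x documents count matrix. Truncated SVD gives
    A ~= U_k diag(s_k) Vt_k; document coordinates are the rows of
    (diag(s_k) @ Vt_k).T, one row per document."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return (np.diag(s[:k]) @ Vt[:k]).T

def cosine(a, b):
    """Cosine similarity between two coordinate vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Two documents sharing vocabulary land close together in the latent space, while a document with disjoint vocabulary stays nearly orthogonal, which is the property that lets LSI outperform Boolean matching.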

  9. BioHackathon series in 2011 and 2012: penetration of ontology and linked data in life science domains

    PubMed Central

    2014-01-01

    The application of semantic technologies to the integration of biological data and the interoperability of bioinformatics analysis and visualization tools has been the common theme of a series of annual BioHackathons hosted in Japan for the past five years. Here we provide a review of the activities and outcomes from the BioHackathons held in 2011 in Kyoto and 2012 in Toyama. In order to efficiently implement semantic technologies in the life sciences, participants formed various sub-groups and worked on the following topics: Resource Description Framework (RDF) models for specific domains, text mining of the literature, ontology development, essential metadata for biological databases, platforms to enable efficient Semantic Web technology development and interoperability, and the development of applications for Semantic Web data. In this review, we briefly introduce the themes covered by these sub-groups. The observations made, conclusions drawn, and software development projects that emerged from these activities are discussed. PMID:24495517

  10. Incorporating linguistic knowledge for learning distributed word representations.

    PubMed

    Wang, Yan; Liu, Zhiyuan; Sun, Maosong

    2015-01-01

    Combined with neural language models, distributed word representations achieve significant advantages in computational linguistics and text mining. Most existing models estimate distributed word vectors from large-scale data in an unsupervised fashion, which, however, does not take rich linguistic knowledge into consideration. Linguistic knowledge can be represented as either link-based knowledge or preference-based knowledge, and we propose knowledge-regularized word representation models (KRWR) to incorporate this prior knowledge into the learning of distributed word representations. Experimental results demonstrate that our estimated word representations achieve better performance on the task of semantic relatedness ranking. This indicates that our methods can efficiently encode both prior knowledge from knowledge bases and statistical knowledge from large-scale text corpora into a unified word representation model, which will benefit many tasks in text mining.

  11. Incorporating Linguistic Knowledge for Learning Distributed Word Representations

    PubMed Central

    Wang, Yan; Liu, Zhiyuan; Sun, Maosong

    2015-01-01

    Combined with neural language models, distributed word representations achieve significant advantages in computational linguistics and text mining. Most existing models estimate distributed word vectors from large-scale data in an unsupervised fashion, which, however, does not take rich linguistic knowledge into consideration. Linguistic knowledge can be represented as either link-based knowledge or preference-based knowledge, and we propose knowledge-regularized word representation models (KRWR) to incorporate this prior knowledge into the learning of distributed word representations. Experimental results demonstrate that our estimated word representations achieve better performance on the task of semantic relatedness ranking. This indicates that our methods can efficiently encode both prior knowledge from knowledge bases and statistical knowledge from large-scale text corpora into a unified word representation model, which will benefit many tasks in text mining. PMID:25874581

  12. Wide coverage biomedical event extraction using multiple partially overlapping corpora

    PubMed Central

    2013-01-01

    Background Biomedical events are key to understanding physiological processes and disease, and wide coverage extraction is required for comprehensive automatic analysis of statements describing biomedical systems in the literature. In turn, the training and evaluation of extraction methods requires manually annotated corpora. However, as manual annotation is time-consuming and expensive, any single event-annotated corpus can only cover a limited number of semantic types. Although combined use of several such corpora could potentially allow an extraction system to achieve broad semantic coverage, there has been little research into learning from multiple corpora with partially overlapping semantic annotation scopes. Results We propose a method for learning from multiple corpora with partial semantic annotation overlap, and implement this method to improve our existing event extraction system, EventMine. An evaluation using seven event annotated corpora, including 65 event types in total, shows that learning from overlapping corpora can produce a single, corpus-independent, wide coverage extraction system that outperforms systems trained on single corpora and exceeds previously reported results on two established event extraction tasks from the BioNLP Shared Task 2011. Conclusions The proposed method allows the training of a wide-coverage, state-of-the-art event extraction system from multiple corpora with partial semantic annotation overlap. The resulting single model makes broad-coverage extraction straightforward in practice by removing the need to either select a subset of compatible corpora or semantic types, or to merge results from several models trained on different individual corpora. Multi-corpus learning also allows annotation efforts to focus on covering additional semantic types, rather than aiming for exhaustive coverage in any single annotation effort, or extending the coverage of semantic types annotated in existing corpora. PMID:23731785

  13. The effects of different representations on static structure analysis of computer malware signatures.

    PubMed

    Narayanan, Ajit; Chen, Yi; Pang, Shaoning; Tao, Ban

    2013-01-01

    The continuous growth of malware presents a problem for internet computing due to increasingly sophisticated techniques for disguising malicious code through mutation and the time required to identify signatures for use by antiviral software systems (AVS). Malware modelling has focused primarily on semantics, that is, the intended actions and behaviours of viral and worm code. The aim of this paper is to evaluate a static structure approach to malware modelling using the growing malware signature databases now available. We show that, if malware signatures are represented as artificial protein sequences, it is possible to apply standard sequence alignment techniques from bioinformatics to improve the accuracy of distinguishing between worm and virus signatures. Moreover, aligned signature sequences can be mined through traditional data mining techniques to extract metasignatures that help to distinguish between viral and worm signatures. All bioinformatics and data mining analyses were performed using publicly available tools and Weka.
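A hedged sketch of the representation idea: each signature byte is mapped onto the 20-letter amino-acid alphabet, after which a standard global alignment scores signature similarity. Needleman-Wunsch with unit match/mismatch/gap scores is used here as a generic stand-in; the paper's exact mapping and alignment parameters are not stated in the abstract.

```python
AMINO = "ACDEFGHIKLMNPQRSTVWY"  # the 20-letter protein alphabet

def bytes_to_protein(sig_hex):
    """Map each signature byte (two hex digits) onto an amino-acid letter."""
    return "".join(AMINO[int(sig_hex[i:i + 2], 16) % 20]
                   for i in range(0, len(sig_hex), 2))

def align_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score between two sequences."""
    dp = [[j * gap for j in range(len(b) + 1)]]
    for i in range(1, len(a) + 1):
        dp.append([i * gap] + [0] * len(b))
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + sub,
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)
    return dp[-1][-1]
```

With signatures in this form, any off-the-shelf alignment or motif-mining tool from bioinformatics can be applied directly, which is the point of the representation.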

  14. The Effects of Different Representations on Static Structure Analysis of Computer Malware Signatures

    PubMed Central

    Narayanan, Ajit; Chen, Yi; Pang, Shaoning; Tao, Ban

    2013-01-01

    The continuous growth of malware presents a problem for internet computing due to increasingly sophisticated techniques for disguising malicious code through mutation and the time required to identify signatures for use by antiviral software systems (AVS). Malware modelling has focused primarily on semantics, that is, the intended actions and behaviours of viral and worm code. The aim of this paper is to evaluate a static structure approach to malware modelling using the growing malware signature databases now available. We show that, if malware signatures are represented as artificial protein sequences, it is possible to apply standard sequence alignment techniques from bioinformatics to improve the accuracy of distinguishing between worm and virus signatures. Moreover, aligned signature sequences can be mined through traditional data mining techniques to extract metasignatures that help to distinguish between viral and worm signatures. All bioinformatics and data mining analyses were performed using publicly available tools and Weka. PMID:23983644

  15. Toward semantic-based retrieval of visual information: a model-based approach

    NASA Astrophysics Data System (ADS)

    Park, Youngchoon; Golshani, Forouzan; Panchanathan, Sethuraman

    2002-07-01

    This paper centers on the problem of automated visual content classification. To enable classification-based image or visual-object retrieval, we propose a new image representation scheme called the visual context descriptor (VCD), a multidimensional vector in which each element represents the frequency of a unique visual property of an image or a region. VCD utilizes predetermined quality dimensions (i.e., types of features and quantization levels) and semantic model templates mined a priori. Not only observed visual cues, but also contextually relevant visual features, are proportionally incorporated in the VCD. The contextual relevance of a visual cue to a semantic class is determined by correlation analysis of ground-truth samples. Such co-occurrence analysis of visual cues requires transformation of a real-valued visual feature vector (e.g., color histogram, Gabor texture) into a discrete event (e.g., a term in text). Good-features-to-track, the rule of thirds, iterative k-means clustering, and TSVQ are involved in the transformation of feature vectors into unified symbolic representations called visual terms. Similarity-based visual-cue frequency estimation is also proposed and used to ensure the correctness of model learning and matching, since the sparseness of sample data otherwise makes frequency estimates of visual cues unstable. The proposed method naturally allows integration of heterogeneous visual, temporal, or spatial cues in a single classification or matching framework, and can be easily integrated into a semantic knowledge base such as a thesaurus or an ontology. Robust semantic visual model template creation and object-based image retrieval are demonstrated based on the proposed content description scheme.
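The quantization step can be sketched as follows: given centroids learned offline (e.g., by k-means), each region feature is assigned to its nearest centroid, its "visual term," and the descriptor is the resulting term-frequency vector. This omits the paper's contextual weighting and similarity-based frequency estimation, so it is an illustration of the basic mechanism only.

```python
def nearest_term(feature, centroids):
    """Index of the closest centroid: the feature's 'visual term'."""
    sq = lambda u, v: sum((x - y) ** 2 for x, y in zip(u, v))
    return min(range(len(centroids)), key=lambda i: sq(feature, centroids[i]))

def visual_context_descriptor(features, centroids):
    """Term-frequency vector of visual terms over one image's features."""
    vcd = [0] * len(centroids)
    for f in features:
        vcd[nearest_term(f, centroids)] += 1
    return vcd
```

Once images are reduced to such term-frequency vectors, text-style co-occurrence analysis over visual terms becomes straightforward, which is what enables the paper's correlation analysis.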

  16. Semantic Framework of Internet of Things for Smart Cities: Case Studies.

    PubMed

    Zhang, Ningyu; Chen, Huajun; Chen, Xi; Chen, Jiaoyan

    2016-09-14

    In recent years, the advancement of sensor technology has led to the generation of heterogeneous Internet-of-Things (IoT) data by smart cities. Thus, the development and deployment of various aspects of IoT-based applications are necessary to mine the potential value of data to the benefit of people and their lives. However, the variety, volume, heterogeneity, and real-time nature of data obtained from smart cities pose considerable challenges. In this paper, we propose a semantic framework that integrates the IoT with machine learning for smart cities. The proposed framework retrieves and models urban data for certain kinds of IoT applications based on semantic and machine-learning technologies. Moreover, we propose two case studies: pollution detection from vehicles and traffic pattern detection. The experimental results show that our system is scalable and capable of accommodating a large number of urban regions with different types of IoT applications.

  17. Semantic Framework of Internet of Things for Smart Cities: Case Studies

    PubMed Central

    Zhang, Ningyu; Chen, Huajun; Chen, Xi; Chen, Jiaoyan

    2016-01-01

    In recent years, the advancement of sensor technology has led to the generation of heterogeneous Internet-of-Things (IoT) data by smart cities. Thus, the development and deployment of various aspects of IoT-based applications are necessary to mine the potential value of data to the benefit of people and their lives. However, the variety, volume, heterogeneity, and real-time nature of data obtained from smart cities pose considerable challenges. In this paper, we propose a semantic framework that integrates the IoT with machine learning for smart cities. The proposed framework retrieves and models urban data for certain kinds of IoT applications based on semantic and machine-learning technologies. Moreover, we propose two case studies: pollution detection from vehicles and traffic pattern detection. The experimental results show that our system is scalable and capable of accommodating a large number of urban regions with different types of IoT applications. PMID:27649185

  18. TOPTRAC: Topical Trajectory Pattern Mining

    PubMed Central

    Kim, Younghoon; Han, Jiawei; Yuan, Cangzhou

    2015-01-01

    With the increasing use of GPS-enabled mobile phones, geo-tagging, which refers to adding GPS information to media such as micro-blogging messages or photos, has seen a surge in popularity recently. This enables us to not only browse information based on locations, but also discover patterns in the location-based behaviors of users. Many techniques have been developed to find the patterns of people's movements using GPS data, but latent topics in text messages posted with local contexts have not been utilized effectively. In this paper, we present a latent topic-based clustering algorithm to discover patterns in the trajectories of geo-tagged text messages. We propose a novel probabilistic model to capture the semantic regions where people post messages with a coherent topic as well as the patterns of movement between the semantic regions. Based on the model, we develop an efficient inference algorithm to calculate model parameters. By exploiting the estimated model, we next devise a clustering algorithm to find the significant movement patterns that appear frequently in data. Our experiments on real-life data sets show that the proposed algorithm finds diverse and interesting trajectory patterns and identifies the semantic regions in a finer granularity than the traditional geographical clustering methods. PMID:26709365
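As a rough stand-in for the paper's latent semantic regions, the sketch below snaps geo-tagged messages to coarse grid cells and counts transitions between consecutive cells per user. The actual model infers regions probabilistically from both location and message topic; the grid size here is an arbitrary assumption.

```python
from collections import Counter

def region_of(lat, lon, cell=0.1):
    """Snap a coordinate to a coarse grid cell (a crude stand-in for a
    latent semantic region inferred by the paper's model)."""
    return (round(lat / cell), round(lon / cell))

def transition_counts(trajectories, cell=0.1):
    """Count movements between regions over all users' message sequences.

    trajectories: dict user -> list of (lat, lon) points in time order."""
    moves = Counter()
    for points in trajectories.values():
        regions = [region_of(lat, lon, cell) for lat, lon in points]
        for a, b in zip(regions, regions[1:]):
            if a != b:
                moves[(a, b)] += 1
    return moves
```

Frequent (region, region) pairs in the counter correspond to the significant movement patterns the clustering algorithm is designed to surface.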

  19. A Text-Mining Framework for Supporting Systematic Reviews.

    PubMed

    Li, Dingcheng; Wang, Zhen; Wang, Liwei; Sohn, Sunghwan; Shen, Feichen; Murad, Mohammad Hassan; Liu, Hongfang

    2016-11-01

    Systematic reviews (SRs) involve the identification, appraisal, and synthesis of all relevant studies for focused questions in a structured, reproducible manner. High-quality SRs follow strict procedures and require significant resources and time. We investigated advanced text-mining approaches to reduce the burden associated with abstract screening in SRs and provide a high-level information summary. A text-mining SR supporting framework consisting of three self-defined semantics-based ranking metrics was proposed, including keyword relevance, indexed-term relevance and topic relevance. Keyword relevance is based on the user-defined keyword list used in the search strategy. Indexed-term relevance is derived from indexed vocabulary developed by domain experts used for indexing journal articles and books. Topic relevance is defined as the semantic similarity among retrieved abstracts in terms of topics generated by latent Dirichlet allocation, a Bayesian-based model for discovering topics. We tested the proposed framework using three published SRs addressing a variety of topics (Mass Media Interventions, Rectal Cancer and Influenza Vaccine). The results showed that 91.8%, 85.7%, and 49.3% of the abstract-screening labor was saved for the three cases, respectively, while recall remained as high as 100% in each case. Relevant studies identified manually showed strong topic similarity through topic analysis, which supported the inclusion of topic analysis as a relevance metric. It was demonstrated that advanced text-mining approaches can significantly reduce the abstract-screening labor of SRs and provide an informative summary of relevant studies.
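Of the three metrics, keyword relevance is the simplest to sketch: score each abstract by the fraction of search-strategy keywords it contains and screen the highest-scoring abstracts first. The scoring function below is an assumed simplification of the paper's metric, not its exact definition.

```python
def keyword_relevance(abstract, keywords):
    """Fraction of search-strategy keywords appearing in the abstract."""
    text = abstract.lower()
    return sum(1 for k in keywords if k.lower() in text) / len(keywords)

def rank_abstracts(abstracts, keywords):
    """Order abstracts so the most relevant are screened first."""
    return sorted(abstracts,
                  key=lambda a: keyword_relevance(a, keywords),
                  reverse=True)
```

Ranking lets reviewers stop once relevance drops off, which is where the reported labor savings come from; the indexed-term and LDA topic metrics refine the same ranking idea.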

  20. Leveraging Semantic Knowledge in IRB Databases to Improve Translation Science

    PubMed Central

    Hurdle, John F.; Botkin, Jeffery; Rindflesch, Thomas C.

    2007-01-01

    We introduce the notion that research administrative databases (RADs), such as those increasingly used to manage information flow in the Institutional Review Board (IRB), offer a novel, useful, and mine-able data source overlooked by informaticists. As a proof of concept, using an IRB database we extracted all titles and abstracts from system startup through January 2007 (n=1,876); formatted these in a pseudo-MEDLINE format; and processed them through the SemRep semantic knowledge extraction system. Even though SemRep is tuned to find semantic relations in MEDLINE citations, we found that it performed comparably well on the IRB texts. When adjusted to eliminate non-healthcare IRB submissions (e.g., economic and education studies), SemRep extracted an average of 7.3 semantic relations per IRB abstract (compared to an average of 11.1 for MEDLINE citations) with a precision of 70% (compared to 78% for MEDLINE). We conclude that RADs, as represented by IRB data, are mine-able with existing tools, but that performance will improve as these tools are tuned for RAD structures. PMID:18693856

  1. A way toward analyzing high-content bioimage data by means of semantic annotation and visual data mining

    NASA Astrophysics Data System (ADS)

    Herold, Julia; Abouna, Sylvie; Zhou, Luxian; Pelengaris, Stella; Epstein, David B. A.; Khan, Michael; Nattkemper, Tim W.

    2009-02-01

    In recent years, bioimaging has turned from qualitative measurements towards a high-throughput and high-content modality, providing multiple variables for each biological sample analyzed. We present a system which combines machine learning based semantic image annotation and visual data mining to analyze such new multivariate bioimage data. Machine learning is employed for automatic semantic annotation of regions of interest. The annotation is the prerequisite for a biological object-oriented exploration of the feature space derived from the image variables. With the aid of visual data mining, the obtained data can be explored simultaneously in the image as well as in the feature domain. Especially when little is known of the underlying data, for example when exploring the effects of a drug treatment, visual data mining can greatly aid the process of data evaluation. We demonstrate how our system is used for image evaluation to obtain information relevant to diabetes study and to the screening of new anti-diabetes treatments. Cells of the Islets of Langerhans and whole pancreas in pancreas tissue samples are annotated, and object-specific molecular features are extracted from aligned multichannel fluorescence images. These are interactively evaluated for cell type classification in order to determine cell number and mass. Only a few parameters need to be specified, which makes the system usable for non-experts in computing and allows for high-throughput analysis.

  2. Utilizing the Structure and Content Information for XML Document Clustering

    NASA Astrophysics Data System (ADS)

    Tran, Tien; Kutty, Sangeetha; Nayak, Richi

    This paper reports on the experiments and results of a clustering approach used in the INEX 2008 document mining challenge. The clustering approach utilizes both the structure and content information of the Wikipedia XML document collection. A latent semantic kernel (LSK) is used to measure the semantic similarity between XML documents based on their content features. The construction of a latent semantic kernel involves computing a singular value decomposition (SVD). On a large feature-space matrix, the computation of SVD is very expensive in terms of time and memory requirements. Thus, in this clustering approach, the dimension of the document space of the term-document matrix is reduced before performing SVD. The document-space reduction is based on the common structural information of the Wikipedia XML document collection. The proposed clustering approach has been shown to be effective on the Wikipedia collection in the INEX 2008 document mining challenge.

  3. Socio-contextual Network Mining for User Assistance in Web-based Knowledge Gathering Tasks

    NASA Astrophysics Data System (ADS)

    Rajendran, Balaji; Kombiah, Iyakutti

    Web-based Knowledge Gathering (WKG) is a specialized and complex information seeking task carried out by many users on the web for their various learning and decision-making requirements. We construct a contextual semantic structure by observing the actions of the users involved in a WKG task, in order to gain an understanding of their task and requirements. We also build a knowledge warehouse in the form of a master Semantic Link Network (SLX) that accommodates and assimilates all the contextual semantic structures. This master SLX, which is a socio-contextual network, is then mined to provide contextual inputs to the current users through their agents. We validated our approach through experiments and analyzed the benefits to the users in terms of resources explored and time saved. The results are positive enough to motivate us to implement the approach on a larger scale.

  4. Collaborative mining and transfer learning for relational data

    NASA Astrophysics Data System (ADS)

    Levchuk, Georgiy; Eslami, Mohammed

    2015-06-01

    Many real-world problems, including human knowledge, communication, biological, and cyber network analysis, deal with data entities for which the essential information is contained in the relations among those entities. Such data must be modeled and analyzed as graphs, with attributes on both objects and relations encoding and differentiating their semantics. Traditional data mining algorithms were originally designed for analyzing discrete objects for which a set of features can be defined, and thus cannot be easily adapted to graph data. This gave rise to the field of relational data mining, of which graph pattern learning is a key sub-domain [11]. In this paper, we describe a model for learning graph patterns in a collaborative, distributed manner. Distributed pattern learning is challenging due to dependencies between the nodes and relations in the graph, and variability across graph instances. We present three algorithms that trade off the benefits of parallelization and data aggregation, compare their performance to centralized graph learning, and discuss the individual benefits and weaknesses of each model. The presented algorithms are designed for linear speedup in distributed computing environments, and learn graph patterns that are both closer to ground truth and provide higher detection rates than a centralized mining algorithm.

  5. Data Integration and Mining for Synthetic Biology Design.

    PubMed

    Mısırlı, Göksel; Hallinan, Jennifer; Pocock, Matthew; Lord, Phillip; McLaughlin, James Alastair; Sauro, Herbert; Wipat, Anil

    2016-10-21

    One aim of synthetic biologists is to create novel and predictable biological systems from simpler modular parts. This approach is currently hampered by a lack of well-defined and characterized parts and devices. However, there is a wealth of existing biological information, which can be used to identify and characterize biological parts and their design constraints, in the literature and in numerous biological databases. Unfortunately, this information is spread among these databases in many different formats. New computational approaches are required to make this information available in an integrated format that is more amenable to data mining. A tried and tested approach to this problem is to map disparate data sources into a single data set, with common syntax and semantics, to produce a data warehouse or knowledge base. Ontologies have been used extensively in the life sciences, providing this common syntax and semantics as a model for a given biological domain, in a fashion that is amenable to computational analysis and reasoning. Here, we present an ontology for applications in synthetic biology design, SyBiOnt, which facilitates the modeling of information about biological parts and their relationships. SyBiOnt was used to create the SyBiOntKB knowledge base, incorporating and building upon existing life sciences ontologies and standards. The reasoning capabilities of ontologies were then applied to automate the mining of biological parts from this knowledge base. We propose that this approach will be useful to speed up synthetic biology design and ultimately help facilitate the automation of the biological engineering life cycle.

  6. EAGLE: "EAGLE 'Is an' Algorithmic Graph Library for Exploration"

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2015-01-16

    The Resource Description Framework (RDF) and the SPARQL Protocol and RDF Query Language (SPARQL) were introduced about a decade ago to enable flexible, schema-free data interchange on the Semantic Web. Today data scientists use the framework as a scalable graph representation for integrating, querying, exploring and analyzing data sets hosted at different sources. With increasing adoption, the need for graph mining capabilities for the Semantic Web has emerged, yet there are no tools for conducting graph mining on standard RDF data sets. We address that need through an implementation of popular iterative graph mining algorithms (triangle counting, connected component analysis, degree distribution, diversity degree, PageRank, etc.). We implement these algorithms as SPARQL queries wrapped within Python scripts, and we call the resulting software tool EAGLE. In RDF style, EAGLE stands for "EAGLE 'Is an' Algorithmic Graph Library for Exploration". EAGLE is like MATLAB for Linked Data.
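
A small sketch can make one of the listed algorithms concrete. Triangle counting, which EAGLE expresses as a SPARQL query over RDF triples, reduces to intersecting adjacency sets; the underlying computation can be sketched in plain Python over a hypothetical undirected edge list:

```python
from itertools import combinations
from collections import defaultdict

# Hypothetical undirected edge list (toy graph for illustration).
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]

adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

# A triangle is a pair of neighbors of some node that are themselves
# adjacent; frozensets deduplicate the three discoveries of each triangle.
triangles = set()
for u in adj:
    for v, w in combinations(sorted(adj[u]), 2):
        if w in adj[v]:
            triangles.add(frozenset((u, v, w)))

print(len(triangles))  # 1 triangle: {a, b, c}
```

In EAGLE the same logic would be a SPARQL pattern joining three triples, but the set-intersection view above is the computation being performed.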

  7. Coupling a regional warning system to a semantic engine on online news for enhancing landslide prediction

    NASA Astrophysics Data System (ADS)

    Battistini, Alessandro; Rosi, Ascanio; Segoni, Samuele; Catani, Filippo; Casagli, Nicola

    2017-04-01

    Landslide inventories are basic data for large-scale landslide modelling; e.g., they are needed to calibrate and validate rainfall thresholds, physically based models and early warning systems. Setting up landslide inventories with traditional methods (e.g. remote sensing, field surveys and manual retrieval of data from technical reports and local newspapers) is time-consuming. The objective of this work is to automatically set up a landslide inventory using a state-of-the-art semantic engine based on data mining of online news (Battistini et al., 2013) and to evaluate whether the automatically generated inventory can be used to validate a regional-scale landslide warning system based on rainfall thresholds. The semantic engine scanned internet news in real time over a 50-month test period. At the end of the process, an inventory of approximately 900 landslides was set up for the Tuscany region (23,000 km2, Italy). The inventory was compared with the outputs of the regional landslide early warning system based on rainfall thresholds, and a good correspondence was found: e.g., 84% of the events reported in the news are correctly identified by the model. In addition, the cases of non-correspondence were forwarded to the rainfall-threshold developers, who used these inputs to update some of the thresholds. On the basis of the results obtained, we conclude that automatic validation of landslide models using geolocalized landslide-event feedback is possible. The source of validation data can be obtained directly from the internet using an appropriate semantic engine. We also automated the validation procedure, which is based on a comparison between forecasts and reported events. We verified that our approach can be used for near-real-time validation of the warning system and for a semi-automatic update of the rainfall thresholds, which could improve the forecasting effectiveness of the warning system.
In the near future, the proposed procedure could operate continuously and allow periodic updates of landslide hazard models and landslide early warning systems with minimal human intervention. References: Battistini, A., Segoni, S., Manzo, G., Catani, F., Casagli, N. (2013). Web data mining for automatic inventory of geohazards at national scale. Applied Geography, 43, 147-158.
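
The validation step described above, matching news-mined landslide events against warning-system forecasts, amounts to computing a hit rate over time windows. A minimal sketch with hypothetical event and warning dates (day-level matching with a one-day tolerance; the real system also matches on location):

```python
from datetime import date

# Hypothetical geolocated events mined from news, and days on which the
# rainfall-threshold system issued a warning for the same area.
reported_events = [date(2016, 1, 5), date(2016, 2, 10), date(2016, 3, 1)]
warning_days = {date(2016, 1, 5), date(2016, 2, 9), date(2016, 4, 2)}

def hit(event, warnings, tolerance_days=1):
    """An event counts as forecast if a warning fell within the tolerance."""
    return any(abs((event - w).days) <= tolerance_days for w in warnings)

hits = sum(hit(e, warning_days) for e in reported_events)
hit_rate = hits / len(reported_events)
print(f"{hit_rate:.0%}")  # 67%: 2 of 3 events matched a warning
```

The non-matching events (the third date above) are exactly the cases that would be forwarded to the threshold developers for review.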

  8. Context-Aware Adaptive Hybrid Semantic Relatedness in Biomedical Science

    NASA Astrophysics Data System (ADS)

    Emadzadeh, Ehsan

    Text mining of biomedical literature and clinical notes is a very active field of research in biomedical science. Semantic analysis is one of the core modules of many Natural Language Processing (NLP) solutions. Methods for calculating the semantic relatedness of two concepts are useful in solving problems such as relationship extraction, ontology creation and question answering [1--6]. Several techniques exist for calculating the semantic relatedness of two concepts, utilizing different knowledge sources and corpora. So far, researchers have attempted to find the best hybrid method for each domain by manually combining semantic relatedness techniques and data sources. This work attempts to eliminate the need for manual combination when targeting new contexts or resources by proposing an automated method that finds the combination of semantic relatedness techniques and resources yielding the best semantic relatedness score in each context. This may help the research community find the best hybrid method for each context given the available algorithms and resources.
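
The automated selection the abstract proposes can be sketched as a search over combinations of relatedness scorers, keeping whichever combination best matches a small set of reference-rated concept pairs for the target context. Everything below (the scorer names, scores, ratings, and the simple averaging scheme) is hypothetical:

```python
from itertools import combinations

# Hypothetical relatedness scorers (in practice: path-based, corpus-based,
# gloss-overlap, ...), each giving a score in [0, 1] for a concept pair.
scorers = {
    "path":   {("heart", "artery"): 0.8, ("heart", "book"): 0.3},
    "corpus": {("heart", "artery"): 0.9, ("heart", "book"): 0.1},
    "gloss":  {("heart", "artery"): 0.5, ("heart", "book"): 0.4},
}
# Hypothetical human reference ratings for the current context.
gold = {("heart", "artery"): 0.95, ("heart", "book"): 0.05}

def error(combo):
    """Mean absolute error of the averaged scores against the gold ratings."""
    return sum(
        abs(sum(scorers[s][p] for s in combo) / len(combo) - g)
        for p, g in gold.items()
    ) / len(gold)

# Try every non-empty subset of scorers; keep the best-fitting combination.
subsets = [c for r in range(1, len(scorers) + 1)
           for c in combinations(sorted(scorers), r)]
best = min(subsets, key=error)
print(best)  # → ('corpus',)
```

A real system would use many more rated pairs and richer combination rules than plain averaging, but the search-over-combinations structure is the same.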

  9. Ontology-guided data preparation for discovering genotype-phenotype relationships.

    PubMed

    Coulet, Adrien; Smaïl-Tabbone, Malika; Benlian, Pascale; Napoli, Amedeo; Devignes, Marie-Dominique

    2008-04-25

    The complexity and volume of post-genomic data are two major factors limiting the application of Knowledge Discovery in Databases (KDD) methods in the life sciences. Bio-ontologies can now play a key role in knowledge discovery in the life sciences by providing semantics to data and to extracted units, taking advantage of progress in Semantic Web technologies for knowledge representation, extraction, and reasoning. This paper presents a method that exploits bio-ontologies for guiding data selection within the preparation step of the KDD process. We propose three scenarios in which domain knowledge and ontology elements such as subsumption, properties, and class descriptions are taken into account for data selection before the data mining step. Each scenario is illustrated within a case study on the search for genotype-phenotype relationships in a familial hypercholesterolemia dataset. The guiding of data selection based on domain knowledge is analysed and shows a direct influence on the volume and significance of the data mining results. The method proposed in this paper is an efficient alternative to numerical methods for data selection based on domain knowledge. In turn, the results of this study may be reused in ontology modelling and data integration.

  10. Mining heart disease risk factors in clinical text with named entity recognition and distributional semantic models.

    PubMed

    Urbain, Jay

    2015-12-01

    We present the design and analyze the performance of a multi-stage natural language processing system that employs named entity recognition, Bayesian statistics, and rule logic to identify and characterize heart disease risk factor events in diabetic patients over time. The system was originally developed for the 2014 i2b2 Challenges in Natural Language in Clinical Data. The system's strengths included a high level of accuracy in identifying named entities associated with heart disease risk factor events. Its primary weakness was inaccuracy in characterizing the attributes of some events: for example, determining the relative time of an event with respect to the record date, determining whether an event is attributable to the patient's history or the patient's family history, and differentiating between current and prior smoking status. We believe these inaccuracies were due in large part to the lack of an effective approach for integrating context into our event detection model. To address them, we explore the addition of a distributional semantic model for characterizing contextual evidence of heart disease risk factor events. Using this semantic model, we raise our initial 2014 i2b2 challenge F1 score of 0.838 to 0.890 and increase precision by 10.3% without use of any lexicons that might bias our results.

  11. Improving integrative searching of systems chemical biology data using semantic annotation.

    PubMed

    Chen, Bin; Ding, Ying; Wild, David J

    2012-03-08

    Systems chemical biology and chemogenomics are considered critical, integrative disciplines in modern biomedical research, but require data mining of large, integrated, heterogeneous datasets from chemistry and biology. We previously developed an RDF-based resource called Chem2Bio2RDF that enabled querying of such data using the SPARQL query language. Whilst this work has proved useful in its own right as one of the first major resources in these disciplines, its utility could be greatly improved by the application of an ontology for annotation of the nodes and edges in the RDF graph, enabling a much richer range of semantic queries to be issued. We developed a generalized chemogenomics and systems chemical biology OWL ontology called Chem2Bio2OWL that describes the semantics of chemical compounds, drugs, protein targets, pathways, genes, diseases and side-effects, and the relationships between them. The ontology also includes data provenance. We used it to annotate our Chem2Bio2RDF dataset, making it a rich semantic resource. Through a series of scientific case studies we demonstrate how this (i) simplifies the process of building SPARQL queries, (ii) enables useful new kinds of queries on the data and (iii) makes possible intelligent reasoning and semantic graph mining in chemogenomics and systems chemical biology. Chem2Bio2OWL is available at http://chem2bio2rdf.org/owl. The document is available at http://chem2bio2owl.wikispaces.com.

  12. QTLTableMiner++: semantic mining of QTL tables in scientific articles.

    PubMed

    Singh, Gurnoor; Kuzniar, Arnold; van Mulligen, Erik M; Gavai, Anand; Bachem, Christian W; Visser, Richard G F; Finkers, Richard

    2018-05-25

    A quantitative trait locus (QTL) is a genomic region that correlates with a phenotype. Most of the experimental information about QTL mapping studies is described in tables of scientific publications, yet traditional text mining techniques aim to extract information from unstructured text rather than from tables. We present QTLTableMiner++ (QTM), a table mining tool that extracts and semantically annotates QTL information buried in the (heterogeneous) tables of plant science literature. QTM is a command-line tool written in the Java programming language. It takes scientific articles from the Europe PMC repository as input and extracts QTL tables using keyword matching and ontology-based concept identification. The tables are further normalized using rules derived from table properties such as captions, column headers and table footers. Furthermore, table columns are classified into three categories, namely column descriptors, properties and values, based on column headers and the data types of cell entries. Abbreviations found in the tables are expanded using the Schwartz and Hearst algorithm. Finally, the content of QTL tables is semantically enriched with domain-specific ontologies (e.g. Crop Ontology, Plant Ontology and Trait Ontology) using the Apache Solr search platform, and the results are stored in a relational database and a text file. The performance of QTM was assessed by precision and recall based on the information retrieved from two manually annotated corpora of open access articles, i.e. QTL mapping studies in tomato (Solanum lycopersicum) and in potato (S. tuberosum). In summary, QTM detected QTL statements in tomato with 74.53% precision and 92.56% recall, and in potato with 82.82% precision and 98.94% recall. QTM is a unique tool that aids in providing QTL information in machine-readable and semantically interoperable formats.
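
The Schwartz and Hearst algorithm mentioned above matches a parenthesized short form against the words preceding it by scanning the short form right-to-left through the candidate long form. Below is a simplified sketch of that core matching step (not the full published algorithm, which adds candidate extraction and length heuristics); the example strings are illustrative:

```python
def long_form(short, candidate):
    """Match `short` right-to-left inside `candidate`; the first character
    of the short form must start a word. Returns the matched long form."""
    s, c_low = short.lower(), candidate.lower()
    si, ci = len(s) - 1, len(c_low) - 1
    while si >= 0:
        ch = s[si]
        if not ch.isalnum():          # skip hyphens etc. in the short form
            si -= 1
            continue
        # Scan left for ch; for the first short-form character, require
        # that the match sits at the start of a word.
        while ci >= 0 and (c_low[ci] != ch or
                           (si == 0 and ci > 0 and c_low[ci - 1].isalnum())):
            ci -= 1
        if ci < 0:
            return None               # no consistent match
        si -= 1
        ci -= 1
    return candidate[ci + 1:]

print(long_form("QTL", "a quantitative trait locus"))
# → 'quantitative trait locus'
```

Once the long form is found, every later table occurrence of the short form can be expanded to it.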

  13. Mining Quality Phrases from Massive Text Corpora

    PubMed Central

    Liu, Jialu; Shang, Jingbo; Wang, Chi; Ren, Xiang; Han, Jiawei

    2015-01-01

    Text data are ubiquitous and play an essential role in big data applications. However, text data are mostly unstructured. Transforming unstructured text into structured units (e.g., semantically meaningful phrases) will substantially reduce semantic ambiguity and enhance the power and efficiency of manipulating such data using database technology. Thus, mining quality phrases is a critical research problem in the field of databases. In this paper, we propose a new framework that extracts quality phrases from text corpora, integrated with phrasal segmentation. The framework requires only limited training, yet the quality of the phrases it generates is close to human judgment. Moreover, the method is scalable: both computation time and required space grow linearly as corpus size increases. Our experiments on large text corpora demonstrate the quality and efficiency of the new method. PMID:26705375

  14. Knowledge-guided mutation in classification rules for autism treatment efficacy.

    PubMed

    Engle, Kelley; Rada, Roy

    2017-03-01

    Data mining methods in biomedical research might benefit from combining genetic algorithms with domain-specific knowledge. The objective of this research is to show how the evolution of treatment rules for autism might be guided. The semantic distance between two concepts in a taxonomy is measured by the number of relationships separating them. The hypothesis is that replacing a concept in a treatment rule will change the accuracy of the rule in direct proportion to the semantic distance between the concepts. The method uses a patient database and autism taxonomies. Treatment rules are developed with an algorithm that exploits the taxonomies, and the results support the hypothesis. This research should advance both the understanding of autism data mining in particular and of knowledge-guided evolutionary search in biomedicine in general.
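
The semantic distance used here, the number of relationships separating two concepts in the taxonomy, is a shortest-path length and can be computed with a breadth-first search over the taxonomy's edges. A minimal sketch with a hypothetical fragment of a treatment taxonomy (the concept names are invented for illustration):

```python
from collections import deque

# Hypothetical taxonomy edges (child, parent); treated as undirected
# when measuring distance.
edges = [
    ("behavioral therapy", "therapy"),
    ("speech therapy", "therapy"),
    ("ABA", "behavioral therapy"),
    ("therapy", "treatment"),
    ("medication", "treatment"),
]

adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

def semantic_distance(src, dst):
    """Smallest number of taxonomy relationships separating two concepts."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, d = queue.popleft()
        if node == dst:
            return d
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None  # concepts not connected

print(semantic_distance("ABA", "speech therapy"))  # 3 hops via therapy
```

Under the paper's hypothesis, mutating "ABA" to "speech therapy" in a rule (distance 3) should shift accuracy more than mutating it to "behavioral therapy" (distance 1).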

  15. Brain model of text animation as a data mining strategy.

    PubMed

    Astakhova, Tamara; Astakhov, Vadim

    2009-01-01

    Imagination is a critical point in developing realistic artificial intelligence (AI) systems. One way to approach imagination is to simulate its properties and operations. We developed two models, "Brain Network Hierarchy of Languages" and "Semantical Holographic Calculus," and a simulation system, ScriptWriter, that emulates the process of imagination through automatic animation of English texts. The purpose of this paper is to demonstrate the model and present the "ScriptWriter" system (http://nvo.sdsc.edu/NVO/JCSG/get_SRB_mime_file2.cgi//home/tamara.sdsc/test/demo.zip?F=/home/tamara.sdsc/test/demo.zip&M=application/x-gtar) for simulation of the imagination.

  16. Knowledge Discovery from Biomedical Ontologies in Cross Domains.

    PubMed

    Shen, Feichen; Lee, Yugyung

    2016-01-01

    In recent years, there has been an increasing demand for sharing and integration of medical data in biomedical research. Improving a health care system requires supporting data integration through semantically interoperable systems and practices. Semantic interoperability is difficult to achieve in these systems because the conceptual models underlying datasets are not fully exploited. In this paper, we propose a semantic framework, called Medical Knowledge Discovery and Data Mining (MedKDD), that aims to build a topic hierarchy and serve semantic interoperability between different ontologies. For this purpose, we focus on discovering semantic patterns about the association of relations in heterogeneous information networks, which represent different types of objects and relationships in multiple biological ontologies, and on creating a topic hierarchy through analysis of the discovered patterns. These patterns are used to cluster heterogeneous information networks into a set of smaller topic graphs in a hierarchical manner and then to conduct cross-domain knowledge discovery from the multiple biological ontologies; the patterns thus make a central contribution to knowledge discovery across ontologies. We have demonstrated cross-domain knowledge discovery in the MedKDD framework using a case study with 9 primary biological ontologies from Bio2RDF and compared it with a cross-domain query processing approach, namely SLAP. We have confirmed the effectiveness of the MedKDD framework in knowledge discovery from multiple medical ontologies.

  17. Knowledge Discovery from Biomedical Ontologies in Cross Domains

    PubMed Central

    Shen, Feichen; Lee, Yugyung

    2016-01-01

    In recent years, there has been an increasing demand for sharing and integration of medical data in biomedical research. Improving a health care system requires supporting data integration through semantically interoperable systems and practices. Semantic interoperability is difficult to achieve in these systems because the conceptual models underlying datasets are not fully exploited. In this paper, we propose a semantic framework, called Medical Knowledge Discovery and Data Mining (MedKDD), that aims to build a topic hierarchy and serve semantic interoperability between different ontologies. For this purpose, we focus on discovering semantic patterns about the association of relations in heterogeneous information networks, which represent different types of objects and relationships in multiple biological ontologies, and on creating a topic hierarchy through analysis of the discovered patterns. These patterns are used to cluster heterogeneous information networks into a set of smaller topic graphs in a hierarchical manner and then to conduct cross-domain knowledge discovery from the multiple biological ontologies; the patterns thus make a central contribution to knowledge discovery across ontologies. We have demonstrated cross-domain knowledge discovery in the MedKDD framework using a case study with 9 primary biological ontologies from Bio2RDF and compared it with a cross-domain query processing approach, namely SLAP. We have confirmed the effectiveness of the MedKDD framework in knowledge discovery from multiple medical ontologies. PMID:27548262

  18. Web Video Event Recognition by Semantic Analysis From Ubiquitous Documents.

    PubMed

    Yu, Litao; Yang, Yang; Huang, Zi; Wang, Peng; Song, Jingkuan; Shen, Heng Tao

    2016-12-01

    In recent years, the task of event recognition from videos has attracted increasing interest in the multimedia area. While most existing research has focused on exploring visual cues to handle relatively small-granular events, it is difficult to directly analyze video content without any prior knowledge. Synthesizing both visual and semantic analysis is therefore a natural way to approach video event understanding. In this paper, we study the problem of Web video event recognition, where Web videos often describe large-granular events and carry limited textual information. Key challenges include how to accurately represent event semantics from incomplete textual information and how to effectively explore the correlation between visual and textual cues for video event understanding. We propose a novel framework to perform complex event recognition from Web videos. To compensate for the insufficient expressive power of visual cues, we construct an event knowledge base by deeply mining semantic information from ubiquitous Web documents; this knowledge base is capable of describing each event with comprehensive semantics. By utilizing it, the textual cues for a video can be significantly enriched. Furthermore, we introduce a two-view adaptive regression model that explores the intrinsic correlation between the visual and textual cues of videos to learn reliable classifiers. Extensive experiments on two real-world video data sets show the effectiveness of our proposed framework and prove that the event knowledge base indeed helps improve the performance of Web video event recognition.

  19. Mining patterns in persistent surveillance systems with smart query and visual analytics

    NASA Astrophysics Data System (ADS)

    Habibi, Mohammad S.; Shirkhodaie, Amir

    2013-05-01

    In Persistent Surveillance Systems (PSS), the ability to detect and characterize events geospatially helps in taking pre-emptive steps to counter an adversary's actions. An interactive Visual Analytics (VA) model offers a platform for pattern investigation and reasoning to comprehend and/or predict such occurrences. Identifying and offsetting these threats requires collecting information from diverse sources, which brings increasingly abstract data. These abstract semantic data have a degree of inherent uncertainty and imprecision, and require filtering before being processed further. In this paper, we introduce an approach based on the Vector Space Modeling (VSM) technique for classification of spatiotemporal sequential patterns of group activities. The feature vectors consist of arrays of attributes extracted from semantically annotated messages generated by sensors. To facilitate proper similarity matching and detection of time-varying spatiotemporal patterns, a temporal Dynamic Time Warping (DTW) method with a Gaussian Mixture Model (GMM) fitted by Expectation Maximization (EM) is introduced. DTW is intended for detection of event patterns from neighborhood-proximity semantic frames derived from an established ontology. GMM with EM, on the other hand, is employed as a Bayesian probabilistic model to estimate the probability of events associated with a detected spatiotemporal pattern. We also present a new visual analytic tool for testing and evaluating group activities detected under this scheme. Experimental results demonstrate the effectiveness of the proposed approach for discovery and matching of subsequences within the sequentially generated pattern space of our experiments.
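
Dynamic Time Warping, which the authors use to align time-varying patterns, can be sketched with the standard dynamic-programming recurrence. The scalar sequences and absolute-difference cost below are a simplification; the paper's system operates on semantic feature vectors rather than scalars:

```python
def dtw_distance(a, b):
    """Classic DTW: minimum cumulative alignment cost of two sequences."""
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = best alignment cost of prefixes a[:i] and b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])          # local cost of pairing
            cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                 cost[i][j - 1],      # stretch b
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0: warping absorbs the repeat
```

The zero distance for the pair above is the point of DTW: sequences that differ only in pacing compare as identical, which is what makes it suitable for time-varying activity patterns.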

  20. Enriching semantic knowledge bases for opinion mining in big data applications.

    PubMed

    Weichselbraun, A; Gindl, S; Scharl, A

    2014-10-01

    This paper presents a novel method for contextualizing and enriching large semantic knowledge bases for opinion mining with a focus on Web intelligence platforms and other high-throughput big data applications. The method is not only applicable to traditional sentiment lexicons, but also to more comprehensive, multi-dimensional affective resources such as SenticNet. It comprises the following steps: (i) identify ambiguous sentiment terms, (ii) provide context information extracted from a domain-specific training corpus, and (iii) ground this contextual information to structured background knowledge sources such as ConceptNet and WordNet. A quantitative evaluation shows a significant improvement when using an enriched version of SenticNet for polarity classification. Crowdsourced gold standard data in conjunction with a qualitative evaluation sheds light on the strengths and weaknesses of the concept grounding, and on the quality of the enrichment process.
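
Steps (i) through (iii) amount to attaching context-specific polarities to ambiguous terms and falling back to the generic lexicon entry otherwise. A toy sketch of that lookup structure (the lexicon entries, domains, and polarity values are invented for illustration):

```python
# Generic prior polarities (hypothetical sentiment-lexicon fragment).
lexicon = {"cheap": -0.2, "fast": 0.6, "crash": -0.8}

# Context-specific overrides for ambiguous terms, keyed by (term, domain),
# as produced by grounding corpus context to background knowledge.
contextualized = {
    ("cheap", "budget travel"): 0.5,   # "cheap flights" reads positively here
    ("crash", "computing"): -0.9,
}

def polarity(term, domain):
    """Context-aware lookup with fallback to the generic lexicon."""
    return contextualized.get((term, domain), lexicon.get(term, 0.0))

print(polarity("cheap", "budget travel"))  # 0.5 (contextual override)
print(polarity("cheap", "electronics"))    # -0.2 (generic prior)
```

The enrichment process described in the abstract is what populates the override table; at classification time the lookup itself stays this cheap, which matters for high-throughput platforms.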

  1. DebugIT for patient safety - improving the treatment with antibiotics through multimedia data mining of heterogeneous clinical data.

    PubMed

    Lovis, Christian; Colaert, Dirk; Stroetmann, Veli N

    2008-01-01

    The concepts and architecture underlying a large-scale integrating project funded within the 7th EU Framework Programme (FP7) are discussed. The main objective of the project is to build a tool that will have a significant impact on the monitoring and control of infectious diseases and antimicrobial resistance in Europe. This will be realized by building a technical and semantic infrastructure able to share heterogeneous clinical data sets from different hospitals in different countries, with different languages and legislations; to analyze large amounts of this clinical data with advanced multimedia data mining; and finally to apply the obtained knowledge to clinical decisions and outcome monitoring. The project faces numerous technical, semantic, legal and ethical challenges at all levels that will have to be addressed.

  2. Classifying patents based on their semantic content.

    PubMed

    Bergeaud, Antonin; Potiron, Yoann; Raimbault, Juste

    2017-01-01

    In this paper, we extend some usual classification techniques based on a large-scale data-mining and network approach. This new technology, designed in particular to be suitable for big data, is used to construct an open consolidated database from raw data on 4 million patents taken from the US patent office from 1976 onward. To build the patent network, we look not only at each patent's title but also at its full abstract, extracting the relevant keywords accordingly. We refer to this classification as the semantic approach, in contrast with the more common technological approach, which relies on the topology induced by US Patent Office technological classes. Moreover, we document that the two approaches yield highly different topological measures, with strong statistical evidence that they follow different models. This suggests that our method is a useful tool for extracting endogenous information.

  3. Classifying patents based on their semantic content

    PubMed Central

    2017-01-01

    In this paper, we extend some usual classification techniques based on a large-scale data-mining and network approach. This new technology, designed in particular to be suitable for big data, is used to construct an open consolidated database from raw data on 4 million patents taken from the US patent office from 1976 onward. To build the patent network, we look not only at each patent's title but also at its full abstract, extracting the relevant keywords accordingly. We refer to this classification as the semantic approach, in contrast with the more common technological approach, which relies on the topology induced by US Patent Office technological classes. Moreover, we document that the two approaches yield highly different topological measures, with strong statistical evidence that they follow different models. This suggests that our method is a useful tool for extracting endogenous information. PMID:28445550

  4. Vaccine adverse event text mining system for extracting features from vaccine safety reports.

    PubMed

    Botsis, Taxiarchis; Buttolph, Thomas; Nguyen, Michael D; Winiecki, Scott; Woo, Emily Jane; Ball, Robert

    2012-01-01

    To develop and evaluate a text mining system for extracting key clinical features from vaccine adverse event reporting system (VAERS) narratives to aid in the automated review of adverse event reports. Based upon clinical significance to VAERS reviewing physicians, we defined primary (diagnosis and cause of death) and secondary (e.g., symptoms) features for extraction. We built a novel vaccine adverse event text mining (VaeTM) system based on a semantic text mining strategy. The performance of VaeTM was evaluated using a total of 300 VAERS reports in three sequential evaluations of 100 reports each. Moreover, we evaluated the VaeTM contribution to case classification; an information retrieval-based approach was used for the identification of anaphylaxis cases in a set of reports and was compared with two other methods: a dedicated text classifier and an online tool. The performance metrics of VaeTM were text mining metrics: recall, precision and F-measure. We also conducted a qualitative difference analysis and calculated sensitivity and specificity for classification of anaphylaxis cases based on the above three approaches. VaeTM performed best in extracting diagnosis, second-level diagnosis, drug, vaccine, and lot number features (lenient F-measure in the third evaluation: 0.897, 0.817, 0.858, 0.874, and 0.914, respectively). In terms of case classification, high sensitivity was achieved (83.1%); this was equal to that of the text classifier (83.1%) and better than that of the online tool (40.7%). Our VaeTM implementation of a semantic text mining strategy shows promise in providing accurate and efficient extraction of key features from VAERS narratives.

  5. Benchmarking infrastructure for mutation text mining

    PubMed Central

    2014-01-01

    Background Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. Results We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. Conclusion We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption. PMID:24568600

  6. Benchmarking infrastructure for mutation text mining.

    PubMed

    Klein, Artjom; Riazanov, Alexandre; Hindle, Matthew M; Baker, Christopher Jo

    2014-02-25

    Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption.

  7. Inferring the semantic relationships of words within an ontology using random indexing: applications to pharmacogenomics.

    PubMed

    Percha, Bethany; Altman, Russ B

    2013-01-01

    The biomedical literature presents a uniquely challenging text mining problem. Sentences are long and complex, the subject matter is highly specialized with a distinct vocabulary, and producing annotated training data for this domain is time consuming and expensive. In this environment, unsupervised text mining methods that do not rely on annotated training data are valuable. Here we investigate the use of random indexing, an automated method for producing vector-space semantic representations of words from large, unlabeled corpora, to address the problem of term normalization in sentences describing drugs and genes. We show that random indexing produces similarity scores that capture some of the structure of PHARE, a manually curated ontology of pharmacogenomics concepts. We further show that random indexing can be used to identify likely word candidates for inclusion in the ontology, and can help localize these new labels among classes and roles within the ontology.
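To make the random-indexing idea concrete, here is a minimal, assumption-laden sketch (the dimensionality, sparsity and toy contexts are invented; the authors' actual setup differs): each context word receives a fixed sparse random "index vector", and a target word's semantic vector is the sum of the index vectors of the contexts it occurs in, so words sharing contexts end up with similar vectors.

```python
import math
import random

DIM = 64
rng = random.Random(0)          # fixed seed for reproducibility
_index = {}

def index_vector(context_word):
    """Fixed sparse ternary random vector, created once per context word."""
    if context_word not in _index:
        v = [0] * DIM
        for pos in rng.sample(range(DIM), 8):   # 8 non-zero entries
            v[pos] = rng.choice((-1, 1))
        _index[context_word] = v
    return _index[context_word]

def semantic_vector(contexts):
    """Sum the index vectors of every context the target word occurred in."""
    v = [0] * DIM
    for ctx in contexts:
        v = [a + b for a, b in zip(v, index_vector(ctx))]
    return v

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

# Two words seen in identical contexts end up with identical vectors.
gene1 = semantic_vector(["expression", "mutation", "pathway"])
gene2 = semantic_vector(["expression", "mutation", "pathway"])
drug  = semantic_vector(["dose", "toxicity"])
sim_genes = cosine(gene1, gene2)   # identical contexts -> cosine 1.0
sim_cross = cosine(gene1, drug)
```

No training labels are needed, which is the point made above: the similarity structure emerges from unlabeled co-occurrence alone.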

  8. Inferring the semantic relationships of words within an ontology using random indexing: applications to pharmacogenomics

    PubMed Central

    Percha, Bethany; Altman, Russ B.

    2013-01-01

    The biomedical literature presents a uniquely challenging text mining problem. Sentences are long and complex, the subject matter is highly specialized with a distinct vocabulary, and producing annotated training data for this domain is time consuming and expensive. In this environment, unsupervised text mining methods that do not rely on annotated training data are valuable. Here we investigate the use of random indexing, an automated method for producing vector-space semantic representations of words from large, unlabeled corpora, to address the problem of term normalization in sentences describing drugs and genes. We show that random indexing produces similarity scores that capture some of the structure of PHARE, a manually curated ontology of pharmacogenomics concepts. We further show that random indexing can be used to identify likely word candidates for inclusion in the ontology, and can help localize these new labels among classes and roles within the ontology. PMID:24551397

  9. The Topological Field Theory of Data: a program towards a novel strategy for data mining through data language

    NASA Astrophysics Data System (ADS)

    Rasetti, M.; Merelli, E.

    2015-07-01

    This paper aims to challenge the current thinking in IT for the 'Big Data' question, proposing - almost verbatim, with no formulas - a program aiming to construct an innovative methodology to perform data analytics in a way that returns an automaton as a recognizer of the data language: a Field Theory of Data. We suggest to build, directly out of probing data space, a theoretical framework enabling us to extract the manifold hidden relations (patterns) that exist among data, as correlations depending on the semantics generated by the mining context. The program, that is grounded in the recent innovative ways of integrating data into a topological setting, proposes the realization of a Topological Field Theory of Data, transferring and generalizing to the space of data notions inspired by physical (topological) field theories and harnesses the theory of formal languages to define the potential semantics necessary to understand the emerging patterns.

  10. Enriching semantic knowledge bases for opinion mining in big data applications

    PubMed Central

    Weichselbraun, A.; Gindl, S.; Scharl, A.

    2014-01-01

    This paper presents a novel method for contextualizing and enriching large semantic knowledge bases for opinion mining with a focus on Web intelligence platforms and other high-throughput big data applications. The method is not only applicable to traditional sentiment lexicons, but also to more comprehensive, multi-dimensional affective resources such as SenticNet. It comprises the following steps: (i) identify ambiguous sentiment terms, (ii) provide context information extracted from a domain-specific training corpus, and (iii) ground this contextual information to structured background knowledge sources such as ConceptNet and WordNet. A quantitative evaluation shows a significant improvement when using an enriched version of SenticNet for polarity classification. Crowdsourced gold standard data in conjunction with a qualitative evaluation sheds light on the strengths and weaknesses of the concept grounding, and on the quality of the enrichment process. PMID:25431524
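A toy sketch of steps (i) and (ii) may help: an ambiguous sentiment term keeps a neutral prior polarity, and context information mined from a domain-specific corpus overrides it per domain (the terms, domains and polarity values below are invented for illustration, not taken from SenticNet):

```python
# Base lexicon: "small" is ambiguous (neutral prior), "excellent" is not.
base_lexicon = {"excellent": 0.8, "small": 0.0}

# Context information extracted from domain-specific training corpora.
contextualized = {
    ("small", "hotel-rooms"): -0.5,   # "small room" reads as negative
    ("small", "cameras"): 0.4,        # "small camera" reads as positive
}

def polarity(term, domain):
    """Domain-aware polarity: contextualized value if known, else the prior."""
    return contextualized.get((term, domain), base_lexicon.get(term, 0.0))

polarity("small", "hotel-rooms")   # -0.5
polarity("small", "cameras")       # 0.4
polarity("excellent", "cameras")   # falls back to the prior, 0.8
```

Step (iii), grounding the contextual information to ConceptNet or WordNet, is omitted here; this sketch only shows why domain context changes the polarity lookup.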

  11. Corpus annotation for mining biomedical events from literature

    PubMed Central

    Kim, Jin-Dong; Ohta, Tomoko; Tsujii, Jun'ichi

    2008-01-01

    Background Advanced Text Mining (TM) such as semantic enrichment of papers, event or relation extraction, and intelligent Question Answering have increasingly attracted attention in the bio-medical domain. For such attempts to succeed, text annotation from the biological point of view is indispensable. However, due to the complexity of the task, semantic annotation has never been tried on a large scale, apart from relatively simple term annotation. Results We have completed a new type of semantic annotation, event annotation, which is an addition to the existing annotations in the GENIA corpus. The corpus has already been annotated with POS (Parts of Speech), syntactic trees, terms, etc. The new annotation was made on half of the GENIA corpus, consisting of 1,000 Medline abstracts. It contains 9,372 sentences in which 36,114 events are identified. The major challenges during event annotation were (1) to design a scheme of annotation which meets specific requirements of text annotation, (2) to achieve biology-oriented annotation which reflect biologists' interpretation of text, and (3) to ensure the homogeneity of annotation quality across annotators. To meet these challenges, we introduced new concepts such as Single-facet Annotation and Semantic Typing, which have collectively contributed to successful completion of a large scale annotation. Conclusion The resulting event-annotated corpus is the largest and one of the best in quality among similar annotation efforts. We expect it to become a valuable resource for NLP (Natural Language Processing)-based TM in the bio-medical domain. PMID:18182099

  12. A Framework of Knowledge Integration and Discovery for Supporting Pharmacogenomics Target Predication of Adverse Drug Events: A Case Study of Drug-Induced Long QT Syndrome.

    PubMed

    Jiang, Guoqian; Wang, Chen; Zhu, Qian; Chute, Christopher G

    2013-01-01

Knowledge-driven text mining is becoming an important research area for identifying pharmacogenomics target genes. However, few such studies have focused on the pharmacogenomics targets of adverse drug events (ADEs). The objective of the present study is to build a framework of knowledge integration and discovery that supports pharmacogenomics target prediction for ADEs. We integrate a semantically annotated literature corpus, Semantic MEDLINE, with a semantically coded ADE knowledgebase known as ADEpedia using a Semantic Web-based framework. We developed a knowledge discovery approach combining network analysis of a protein-protein interaction (PPI) network with a gene functional classification approach. We performed a case study of drug-induced long QT syndrome to demonstrate the usefulness of the framework in predicting potential pharmacogenomics targets of ADEs.
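As a hedged illustration of the network-analysis step, candidate genes can be ranked by degree centrality over a PPI edge list (the edge list below is invented for illustration; the study's actual pipeline combines network analysis with gene functional classification):

```python
from collections import Counter

# Toy PPI edge list; gene symbols are illustrative, the interactions invented.
ppi_edges = [
    ("KCNH2", "KCNQ1"), ("KCNH2", "SCN5A"),
    ("KCNH2", "KCNE1"), ("KCNQ1", "KCNE1"),
]

degree = Counter()
for a, b in ppi_edges:
    degree[a] += 1
    degree[b] += 1

# Genes with the most interaction partners are prioritized as candidates.
ranked = [g for g, _ in degree.most_common()]   # "KCNH2" first (degree 3)
```

Degree centrality is only one of many centrality measures one could plug in here; the point is that the PPI graph supplies a ranking signal independent of the literature annotations.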

  13. A Semantic Web-based System for Mining Genetic Mutations in Cancer Clinical Trials.

    PubMed

    Priya, Sambhawa; Jiang, Guoqian; Dasari, Surendra; Zimmermann, Michael T; Wang, Chen; Heflin, Jeff; Chute, Christopher G

    2015-01-01

    Textual eligibility criteria in clinical trial protocols contain important information about potential clinically relevant pharmacogenomic events. Manual curation for harvesting this evidence is intractable as it is error prone and time consuming. In this paper, we develop and evaluate a Semantic Web-based system that captures and manages mutation evidences and related contextual information from cancer clinical trials. The system has 2 main components: an NLP-based annotator and a Semantic Web ontology-based annotation manager. We evaluated the performance of the annotator in terms of precision and recall. We demonstrated the usefulness of the system by conducting case studies in retrieving relevant clinical trials using a collection of mutations identified from TCGA Leukemia patients and Atlas of Genetics and Cytogenetics in Oncology and Haematology. In conclusion, our system using Semantic Web technologies provides an effective framework for extraction, annotation, standardization and management of genetic mutations in cancer clinical trials.

  14. Mediator infrastructure for information integration and semantic data integration environment for biomedical research.

    PubMed

    Grethe, Jeffrey S; Ross, Edward; Little, David; Sanders, Brian; Gupta, Amarnath; Astakhov, Vadim

    2009-01-01

This paper presents current progress in the development of a semantic data integration environment that is part of the Biomedical Informatics Research Network (BIRN; http://www.nbirn.net) project. BIRN is sponsored by the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH). A goal is the development of a cyberinfrastructure for biomedical research that supports advanced data acquisition, data storage, data management, data integration, data mining, data visualization, and other computing and information processing services over the Internet. Each participating institution maintains storage of its experimental or computationally derived data. A mediator-based data integration system performs semantic integration over the databases to enable researchers to perform analyses based on larger and broader datasets than would be available from any single institution's data. This paper describes a recent revision of the system architecture, implementation, and capabilities of the semantically based data integration environment for BIRN.

  15. The structural and content aspects of abstracts versus bodies of full text journal articles are different

    PubMed Central

    2010-01-01

Background An increase in work on the full text of journal articles and the growth of PubMedCentral could create a major paradigm shift in how biomedical text mining is done. However, until now there has been no comprehensive characterization of how the bodies of full text journal articles differ from the abstracts that have been the subject of most biomedical text mining research. Results We examined the structural and linguistic aspects of abstracts and bodies of full text articles, the performance of text mining tools on both, and the distribution of a variety of semantic classes of named entities between them. We found marked structural differences, with longer sentences in the article bodies and much heavier use of parenthesized material in the bodies than in the abstracts. We found content differences with respect to linguistic features. Three out of four of the linguistic features that we examined were statistically significantly differently distributed between the two genres. We also found content differences with respect to the distribution of semantic features. There were significantly different densities per thousand words for three out of four semantic classes, and clear differences in the extent to which they appeared in the two genres. With respect to the performance of text mining tools, we found that a mutation finder performed equally well in both genres, but that a wide variety of gene mention systems performed much worse on article bodies than they did on abstracts. POS tagging was also more accurate in abstracts than in article bodies. Conclusions Aspects of structure and content differ markedly between article abstracts and article bodies. A number of these differences may pose problems as the text mining field moves more into the area of processing full-text articles.
However, these differences also present a number of opportunities for the extraction of data types, particularly those found in parenthesized text, that are present in article bodies but not in article abstracts. PMID:20920264

  16. Processable English: The Theory Behind the PENG System

    DTIC Science & Technology

    2009-06-01

implicit - is often buried amongst masses of irrelevant data. Heralding from unstructured sources such as natural language documents, email, audio ... estimation and prediction, data-mining, social network analysis, and semantic search and visualisation. This report describes the theoretical

  17. Handling Dynamic Weights in Weighted Frequent Pattern Mining

    NASA Astrophysics Data System (ADS)

    Ahmed, Chowdhury Farhan; Tanbeer, Syed Khairuzzaman; Jeong, Byeong-Soo; Lee, Young-Koo

    Even though weighted frequent pattern (WFP) mining is more effective than traditional frequent pattern mining because it can consider different semantic significances (weights) of items, existing WFP algorithms assume that each item has a fixed weight. But in real world scenarios, the weight (price or significance) of an item can vary with time. Reflecting these changes in item weight is necessary in several mining applications, such as retail market data analysis and web click stream analysis. In this paper, we introduce the concept of a dynamic weight for each item, and propose an algorithm, DWFPM (dynamic weighted frequent pattern mining), that makes use of this concept. Our algorithm can address situations where the weight (price or significance) of an item varies dynamically. It exploits a pattern growth mining technique to avoid the level-wise candidate set generation-and-test methodology. Furthermore, it requires only one database scan, so it is eligible for use in stream data mining. An extensive performance analysis shows that our algorithm is efficient and scalable for WFP mining using dynamic weights.
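The notion of a dynamic weight can be illustrated with a small sketch (the transactions, weight table and averaging scheme below are invented; this shows the measure being maintained, not the DWFPM pattern-growth algorithm itself):

```python
# (timestamp, transaction) pairs; the weight of an item depends on the time
# of the transaction, modelling e.g. a price that changes over time.
transactions = [
    (1, {"usb", "mouse"}),
    (1, {"usb", "ssd"}),
    (2, {"usb", "mouse"}),
]

# Dynamic weight table: item -> {timestamp: weight}.
weights = {
    "usb":   {1: 0.2, 2: 0.4},   # price rose at time 2
    "mouse": {1: 0.5, 2: 0.5},
    "ssd":   {1: 0.9, 2: 0.9},
}

def weighted_support(pattern, db):
    """Mean, over the database, of the pattern's average item weight at the
    time of each supporting transaction (0 for non-supporting ones)."""
    total = 0.0
    for ts, items in db:
        if pattern <= items:
            total += sum(weights[i][ts] for i in pattern) / len(pattern)
    return total / len(db)

ws = weighted_support({"usb", "mouse"}, transactions)   # (0.35 + 0.45) / 3
```

Because the weight lookup is keyed by timestamp, the same pattern contributes differently at different times, which is exactly the situation a fixed-weight WFP algorithm cannot express.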

  18. Text Mining to inform construction of Earth and Environmental Science Ontologies

    NASA Astrophysics Data System (ADS)

    Schildhauer, M.; Adams, B.; Rebich Hespanha, S.

    2013-12-01

    There is a clear need for better semantic representation of Earth and environmental concepts, to facilitate more effective discovery and re-use of information resources relevant to scientists doing integrative research. In order to develop general-purpose Earth and environmental science ontologies, however, it is necessary to represent concepts and relationships that span usage across multiple disciplines and scientific specialties. Traditional knowledge modeling through ontologies utilizes expert knowledge but inevitably favors the particular perspectives of the ontology engineers, as well as the domain experts who interacted with them. This often leads to ontologies that lack robust coverage of synonymy, while also missing important relationships among concepts that can be extremely useful for working scientists to be aware of. In this presentation we will discuss methods we have developed that utilize statistical topic modeling on a large corpus of Earth and environmental science articles, to expand coverage and disclose relationships among concepts in the Earth sciences. For our work we collected a corpus of over 121,000 abstracts from many of the top Earth and environmental science journals. We performed latent Dirichlet allocation topic modeling on this corpus to discover a set of latent topics, which consist of terms that commonly co-occur in abstracts. We match terms in the topics to concept labels in existing ontologies to reveal gaps, and we examine which terms are commonly associated in natural language discourse, to identify relationships that are important to formally model in ontologies. Our text mining methodology uncovers significant gaps in the content of some popular existing ontologies, and we show how, through a workflow involving human interpretation of topic models, we can bootstrap ontologies to have much better coverage and richer semantics. 
Because we base our methods directly on what working scientists are communicating about their research, they offer an alternative, bottom-up approach to populating and enriching ontologies that complements more traditional knowledge modeling endeavors.
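The gap-analysis step lends itself to a compact sketch: compare the terms appearing in latent topics against the concept labels of an existing ontology (the topics and labels below are invented; real topics would come from LDA over the abstract corpus):

```python
# Topics as lists of commonly co-occurring terms, e.g. from LDA.
topics = [
    ["permafrost", "thaw", "carbon", "soil"],
    ["aerosol", "radiative", "forcing", "cloud"],
]

# Concept labels already present in an existing ontology (toy set).
ontology_labels = {"soil", "carbon", "cloud", "aerosol"}

# Terms that co-occur in scientific discourse but lack an ontology concept:
# candidates for new classes, or for synonym links to existing ones.
gaps = sorted({t for topic in topics for t in topic} - ontology_labels)
# -> ['forcing', 'permafrost', 'radiative', 'thaw']
```

In practice label matching would need normalization (case, plurals, synonyms) before the set difference, but the set difference is the core of the gap analysis described above.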

  19. Mining Adverse Drug Reactions in Social Media with Named Entity Recognition and Semantic Methods.

    PubMed

    Chen, Xiaoyi; Deldossi, Myrtille; Aboukhamis, Rim; Faviez, Carole; Dahamna, Badisse; Karapetiantz, Pierre; Guenegou-Arnoux, Armelle; Girardeau, Yannick; Guillemin-Lanne, Sylvie; Lillo-Le-Louët, Agnès; Texier, Nathalie; Burgun, Anita; Katsahian, Sandrine

    2017-01-01

Suspected adverse drug reactions (ADRs) reported by patients through social media can be a complementary source to current pharmacovigilance systems. However, the performance of text mining tools applied to social media text data to discover ADRs needs to be evaluated. In this paper, we introduce an approach developed to mine ADRs from French social media. An evaluation protocol is described, which includes a detailed sample size determination and the constitution of an evaluation corpus. Our text mining approach provided very encouraging preliminary results, with F-measures of 0.94 and 0.81 for recognition of drugs and symptoms respectively, and an F-measure of 0.70 for ADR detection. Therefore, this approach is promising for downstream pharmacovigilance analysis.

  20. Biomedical data mining in clinical routine: expanding the impact of hospital information systems.

    PubMed

    Müller, Marcel; Markó, Kornel; Daumke, Philipp; Paetzold, Jan; Roesner, Arnold; Klar, Rüdiger

    2007-01-01

In this paper we describe how the promising technology of biomedical data mining can improve the use of hospital information systems: a large set of unstructured, narrative clinical documents from a dermatological university hospital, such as discharge letters and other dermatological reports, was processed through a morpho-semantic text retrieval engine ("MorphoSaurus"), integrated with other clinical data using a web-based interface, and brought into daily clinical routine. The user evaluation showed very high user acceptance; the system appears to meet the clinicians' requirements for vertical data mining in electronic patient records. What emerges is the need to integrate biomedical data mining into hospital information systems for clinical, scientific, educational and economic reasons.

  1. Building a Semantic Framework for eScience

    NASA Astrophysics Data System (ADS)

    Movva, S.; Ramachandran, R.; Maskey, M.; Li, X.

    2009-12-01

The e-Science vision focuses on the use of advanced computing technologies to support scientists. Recent research efforts in this area have focused primarily on “enabling” use of infrastructure resources for both data and computational access, especially in the Geosciences. One gap in existing e-Science efforts has been the failure to incorporate stable semantic technologies within the design process itself. In this presentation, we describe our effort in designing a framework for e-Science built using a Service Oriented Architecture. Our framework provides users with capabilities to create science workflows and mine distributed data. It is being designed around a mass-market tool to promote reusability across many projects. Semantics is an integral part of this framework, and our design goal is to leverage the latest stable semantic technologies. These technologies provide the users of our framework with useful features, such as allowing search engines to find their content via RDFa tags, creating an RDF triple store for their content, creating RDF endpoints to share with others, and semantically mashing their content with other online content available as RDF endpoints.

  2. GoWeb: a semantic search engine for the life science web.

    PubMed

    Dietze, Heiko; Schroeder, Michael

    2009-10-01

    Current search engines are keyword-based. Semantic technologies promise a next generation of semantic search engines, which will be able to answer questions. Current approaches either apply natural language processing to unstructured text or they assume the existence of structured statements over which they can reason. Here, we introduce a third approach, GoWeb, which combines classical keyword-based Web search with text-mining and ontologies to navigate large results sets and facilitate question answering. We evaluate GoWeb on three benchmarks of questions on genes and functions, on symptoms and diseases, and on proteins and diseases. The first benchmark is based on the BioCreAtivE 1 Task 2 and links 457 gene names with 1352 functions. GoWeb finds 58% of the functional GeneOntology annotations. The second benchmark is based on 26 case reports and links symptoms with diseases. GoWeb achieves 77% success rate improving an existing approach by nearly 20%. The third benchmark is based on 28 questions in the TREC genomics challenge and links proteins to diseases. GoWeb achieves a success rate of 79%. GoWeb's combination of classical Web search with text-mining and ontologies is a first step towards answering questions in the biomedical domain. GoWeb is online at: http://www.gopubmed.org/goweb.

  3. Survey of Knowledge Representation and Reasoning Systems

    DTIC Science & Technology

    2009-07-01

processing large volumes of unstructured information such as natural language documents, email, audio, images and video [Ferrucci et al. 2006]. Using this ... information we hope to obtain improved estimation and prediction, data-mining, social network analysis, and semantic search and visualisation. Knowledge

  4. SPECTRa-T: machine-based data extraction and semantic searching of chemistry e-theses.

    PubMed

    Downing, Jim; Harvey, Matt J; Morgan, Peter B; Murray-Rust, Peter; Rzepa, Henry S; Stewart, Diana C; Tonge, Alan P; Townsend, Joe A

    2010-02-22

The SPECTRa-T project has developed text-mining tools to extract named chemical entities (NCEs), such as chemical names and terms, and chemical objects (COs), e.g., experimental spectral assignments and physical chemistry properties, from electronic theses (e-theses). Although NCEs were readily identified within the two major document formats studied, only the use of structured documents enabled identification of chemical objects and their association with the relevant chemical entity (e.g., systematic chemical name). A corpus of theses was analyzed and it is shown that a high degree of semantic information can be extracted from structured documents. This integrated information has been deposited in a persistent Resource Description Framework (RDF) triple-store that allows users to conduct semantic searches. The strengths and weaknesses of several document formats are reviewed.

  5. SemaTyP: a knowledge graph based literature mining method for drug discovery.

    PubMed

    Sang, Shengtian; Yang, Zhihao; Wang, Lei; Liu, Xiaoxia; Lin, Hongfei; Wang, Jian

    2018-05-30

Drug discovery is the process through which potential new medicines are identified. High-throughput screening and computer-aided drug discovery/design are currently the two main drug discovery methods, and they have successfully discovered a series of drugs. However, development of new drugs is still an extremely time-consuming and expensive process. Biomedical literature contains important clues for the identification of potential treatments. It could support experts in biomedicine on their way towards new discoveries. Here, we propose a biomedical knowledge graph-based drug discovery method called SemaTyP, which discovers candidate drugs for diseases by mining published biomedical literature. We first construct a biomedical knowledge graph with the relations extracted from biomedical abstracts; then a logistic regression model is trained on the semantic types of the paths that known drug therapies form in the biomedical knowledge graph; finally, the learned model is used to discover drug therapies for new diseases. The experimental results show that our method could not only effectively discover new drug therapies for new diseases, but could also provide the potential mechanism of action of the candidate drugs. In this paper we propose a novel knowledge graph-based literature mining method for drug discovery. It could be a supplementary method for current drug discovery methods.
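The path-feature idea can be sketched as follows (the miniature knowledge graph, semantic types and relation names are invented, and the logistic regression itself is omitted): a drug-disease path is summarized by the sequence of semantic types and relations along it, and those sequences become the features a classifier learns from.

```python
# Knowledge graph as adjacency lists: node -> [(relation, neighbour), ...].
kg = {
    "aspirin":      [("inhibits", "COX2")],
    "COX2":         [("associated_with", "inflammation")],
    "inflammation": [],
}
semantic_type = {"aspirin": "Drug", "COX2": "Gene", "inflammation": "Disease"}

def typed_paths(src, dst, path=None):
    """Enumerate src -> dst paths as sequences of semantic types and relations."""
    path = path or [semantic_type[src]]
    if src == dst:
        yield tuple(path)
        return
    for rel, nxt in kg.get(src, []):
        yield from typed_paths(nxt, dst, path + [rel, semantic_type[nxt]])

features = list(typed_paths("aspirin", "inflammation"))
# [('Drug', 'inhibits', 'Gene', 'associated_with', 'Disease')]
```

In a full pipeline these typed-path sequences, extracted for known therapies, would form the positive training examples of a logistic regression, and the same extraction applied to new drug-disease pairs would yield candidates to score.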

  6. Data Mining and Knowledge Discovery tools for exploiting big Earth-Observation data

    NASA Astrophysics Data System (ADS)

    Espinoza Molina, D.; Datcu, M.

    2015-04-01

The continuous increase in the size of the archives and in the variety and complexity of Earth-Observation (EO) sensors requires new methodologies and tools that allow the end-user to access a large image repository, to extract and to infer knowledge about the patterns hidden in the images, to retrieve dynamically a collection of relevant images, and to support the creation of emerging applications (e.g.: change detection, global monitoring, disaster and risk management, image time series, etc.). In this context, we are concerned with providing a platform for data mining and knowledge discovery from EO archives. The platform's goal is to implement a communication channel between Payload Ground Segments and the end-user who receives the content of the data coded in an understandable format associated with semantics that is ready for immediate exploitation. It will provide the user with automated tools to explore and understand the content of highly complex image archives. The challenge lies in the extraction of meaningful information and understanding observations of large extended areas, over long periods of time, with a broad variety of EO imaging sensors in synergy with other related measurements and data. The platform is composed of several components such as 1.) ingestion of EO images and related data providing basic features for image analysis, 2.) query engine based on metadata, semantics and image content, 3.) data mining and knowledge discovery tools for supporting the interpretation and understanding of image content, 4.) semantic definition of the image content via machine learning methods. All these components are integrated and supported by a relational database management system, ensuring the integrity and consistency of Terabytes of Earth Observation data.

  7. PREDOSE: A Semantic Web Platform for Drug Abuse Epidemiology using Social Media

    PubMed Central

    Cameron, Delroy; Smith, Gary A.; Daniulaityte, Raminta; Sheth, Amit P.; Dave, Drashti; Chen, Lu; Anand, Gaurish; Carlson, Robert; Watkins, Kera Z.; Falck, Russel

    2013-01-01

    Objectives The role of social media in biomedical knowledge mining, including clinical, medical and healthcare informatics, prescription drug abuse epidemiology and drug pharmacology, has become increasingly significant in recent years. Social media offers opportunities for people to share opinions and experiences freely in online communities, which may contribute information beyond the knowledge of domain professionals. This paper describes the development of a novel Semantic Web platform called PREDOSE (PREscription Drug abuse Online Surveillance and Epidemiology), which is designed to facilitate the epidemiologic study of prescription (and related) drug abuse practices using social media. PREDOSE uses web forum posts and domain knowledge, modeled in a manually created Drug Abuse Ontology (DAO) (pronounced dow), to facilitate the extraction of semantic information from User Generated Content (UGC). A combination of lexical, pattern-based and semantics-based techniques is used together with the domain knowledge to extract fine-grained semantic information from UGC. In a previous study, PREDOSE was used to obtain the datasets from which new knowledge in drug abuse research was derived. Here, we report on various platform enhancements, including an updated DAO, new components for relationship and triple extraction, and tools for content analysis, trend detection and emerging patterns exploration, which enhance the capabilities of the PREDOSE platform. Given these enhancements, PREDOSE is now more equipped to impact drug abuse research by alleviating traditional labor-intensive content analysis tasks. Methods Using custom web crawlers that scrape UGC from publicly available web forums, PREDOSE first automates the collection of web-based social media content for subsequent semantic annotation. 
The annotation scheme is modeled in the DAO, and includes domain specific knowledge such as prescription (and related) drugs, methods of preparation, side effects, routes of administration, etc. The DAO is also used to help recognize three types of data, namely: 1) entities, 2) relationships and 3) triples. PREDOSE then uses a combination of lexical and semantic-based techniques to extract entities and relationships from the scraped content, and a top-down approach for triple extraction that uses patterns expressed in the DAO. In addition, PREDOSE uses publicly available lexicons to identify initial sentiment expressions in text, and then a probabilistic optimization algorithm (from related research) to extract the final sentiment expressions. Together, these techniques enable the capture of fine-grained semantic information from UGC, and querying, search, trend analysis and overall content analysis of social media related to prescription drug abuse. Moreover, extracted data are also made available to domain experts for the creation of training and test sets for use in evaluation and refinements in information extraction techniques. Results A recent evaluation of the information extraction techniques applied in the PREDOSE platform indicates 85% precision and 72% recall in entity identification, on a manually created gold standard dataset. In another study, PREDOSE achieved 36% precision in relationship identification and 33% precision in triple extraction, through manual evaluation by domain experts. Given the complexity of the relationship and triple extraction tasks and the abstruse nature of social media texts, we interpret these as favorable initial results. Extracted semantic information is currently in use in an online discovery support system, by prescription drug abuse researchers at the Center for Interventions, Treatment and Addictions Research (CITAR) at Wright State University. 
Conclusion A comprehensive platform for entity, relationship, triple and sentiment extraction from such abstruse texts has never been developed for drug abuse research. PREDOSE has already demonstrated the importance of mining social media by providing data from which new findings in drug abuse research were uncovered. Given the recent platform enhancements, including the refined DAO, components for relationship and triple extraction, and tools for content, trend and emerging pattern analysis, it is expected that PREDOSE will play a significant role in advancing drug abuse epidemiology in future. PMID:23892295

  8. Leveraging Semantic Labels for Multi-level Abstraction in Medical Process Mining and Trace Comparison.

    PubMed

    Leonardi, Giorgio; Striani, Manuel; Quaglini, Silvana; Cavallini, Anna; Montani, Stefania

    2018-05-21

Many medical information systems record data about the executed process instances in the form of an event log. In this paper, we present a framework able to convert actions in the event log into higher-level concepts, at different levels of abstraction, on the basis of domain knowledge. Abstracted traces are then provided as an input to trace comparison and semantic process discovery. Our abstraction mechanism is able to manage non-trivial situations, such as interleaved actions or delays between two actions that abstract to the same concept. Trace comparison resorts to a similarity metric able to take into account abstraction phase penalties, and to deal with quantitative and qualitative temporal constraints in abstracted traces. As for process discovery, we rely on classical algorithms embedded in the framework ProM, made semantic by the capability of abstracting the actions on the basis of their conceptual meaning. The approach has been tested in stroke care, where we adopted abstraction and trace comparison to cluster event logs of different stroke units, to highlight (in)correct behavior, abstracting from details. We also provide process discovery results, showing how the abstraction mechanism allows us to obtain stroke process models that are more easily interpretable by neurologists.
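A minimal sketch of the abstraction step (the action-to-concept mapping below is invented, not the paper's stroke ontology, and real traces carry timestamps and interleavings that this toy ignores): low-level log actions are mapped to higher-level concepts via domain knowledge, and consecutive actions abstracting to the same concept are merged.

```python
# Domain knowledge: low-level action -> higher-level concept (illustrative).
concept_of = {
    "ct_scan": "Imaging", "mri": "Imaging",
    "aspirin_given": "AntithromboticTherapy",
    "heparin_given": "AntithromboticTherapy",
}

def abstract_trace(trace):
    """Replace actions by their concepts and collapse adjacent repeats."""
    out = []
    for action in trace:
        concept = concept_of.get(action, action)   # unknown actions pass through
        if not out or out[-1] != concept:
            out.append(concept)
    return out

trace = ["ct_scan", "mri", "aspirin_given", "heparin_given", "ct_scan"]
abstract_trace(trace)
# -> ['Imaging', 'AntithromboticTherapy', 'Imaging']
```

Note how the two imaging actions and the two drug administrations each collapse into a single abstracted step, while the later ct_scan starts a new Imaging interval, which is the behavior trace comparison then operates on.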

  9. Thai Language Sentence Similarity Computation Based on Syntactic Structure and Semantic Vector

    NASA Astrophysics Data System (ADS)

    Wang, Hongbin; Feng, Yinhan; Cheng, Liang

    2018-03-01

    Sentence similarity computation plays an increasingly important role in text mining, Web page retrieval, machine translation, speech recognition and question answering systems. Thai is a resource-scarce language: unlike Chinese, it lacks lexical resources such as HowNet and CiLin, so research on Thai sentence similarity faces particular challenges. To address this problem, this paper proposes a novel method to compute the similarity of Thai sentences based on syntactic structure and semantic vectors. The method first uses Part-of-Speech (POS) dependency to calculate the syntactic structure similarity of two sentences, and then uses word vectors to calculate their semantic similarity. Finally, the two measures are combined into an overall similarity score for the two Thai sentences. The proposed method considers not only semantics but also sentence syntactic structure. The experimental results show that the method is feasible for Thai sentence similarity computation.
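
    The two-part measure described in this abstract can be sketched as follows. This is a toy illustration, not the authors' code: the POS-dependency measure is approximated here by Jaccard overlap of POS bigrams, and the equal weighting (alpha=0.5) is an assumption.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def syntactic_similarity(pos1, pos2):
    # Jaccard overlap of POS bigrams, a stand-in for the paper's
    # POS-dependency measure (not fully specified in the abstract).
    bigrams = lambda tags: set(zip(tags, tags[1:]))
    b1, b2 = bigrams(pos1), bigrams(pos2)
    return len(b1 & b2) / len(b1 | b2) if b1 | b2 else 0.0

def semantic_similarity(vecs1, vecs2):
    # Cosine similarity of averaged word vectors.
    avg = lambda vs: [sum(xs) / len(vs) for xs in zip(*vs)]
    return cosine(avg(vecs1), avg(vecs2))

def sentence_similarity(pos1, vecs1, pos2, vecs2, alpha=0.5):
    # alpha balances the two components; the paper's actual
    # combination scheme is an assumption here.
    return (alpha * syntactic_similarity(pos1, pos2)
            + (1 - alpha) * semantic_similarity(vecs1, vecs2))
```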

  10. Learning Semantic Tags from Big Data for Clinical Text Representation.

    PubMed

    Li, Yanpeng; Liu, Hongfang

    2015-01-01

    In clinical text mining, one of the biggest challenges is to represent medical terminologies and n-gram terms in sparse medical reports using either supervised or unsupervised methods. Addressing this issue, we propose a novel method for word and n-gram representation at the semantic level. We first represent each word by its distance to a set of reference features calculated by a reference distance estimator (RDE) learned from labeled and unlabeled data, and then generate new features using simple techniques of discretization, random sampling and merging. The new features are a set of binary rules that can be interpreted as semantic tags derived from words and n-grams. We show that the new features significantly outperform classical bag-of-words and n-grams in the task of heart disease risk factor extraction in the i2b2 2014 challenge. It is promising to see that semantic tags can replace the original text entirely, with even better prediction performance, and can also derive new rules beyond the lexical level.
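
    A minimal sketch of the representation idea in this abstract, with loudly labeled simplifications: the learned reference distance estimator (RDE) is replaced here by a plain document co-occurrence rate, and simple thresholding stands in for the discretization step.

```python
from collections import Counter

def reference_distance_features(docs, references):
    # Simplified stand-in for the paper's reference distance estimator
    # (RDE): represent each word by its document co-occurrence rate with
    # a set of reference features. The real RDE is learned from labeled
    # and unlabeled data; this toy version just counts.
    cooc, count = Counter(), Counter()
    for doc in docs:
        words = set(doc)
        for w in words:
            count[w] += 1
            for r in references:
                if r in words and r != w:
                    cooc[(w, r)] += 1
    return {w: [cooc[(w, r)] / count[w] for r in references] for w in count}

def discretize(features, threshold=0.4):
    # Turn real-valued distances into binary "semantic tags"; simple
    # thresholding is one of several discretization choices the abstract
    # mentions (alongside random sampling and merging).
    return {w: [1 if v > threshold else 0 for v in vec]
            for w, vec in features.items()}
```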

  11. Computable visually observed phenotype ontological framework for plants

    PubMed Central

    2011-01-01

    Background The ability to search for and precisely compare similar phenotypic appearances within and across species has vast potential in plant science and genetic research. The difficulty in doing so lies in the fact that many visual phenotypic data, especially visually observed phenotypes that often times cannot be directly measured quantitatively, are in the form of text annotations, and these descriptions are plagued by semantic ambiguity, heterogeneity, and low granularity. Though several bio-ontologies have been developed to standardize phenotypic (and genotypic) information and permit comparisons across species, these semantic issues persist and prevent precise analysis and retrieval of information. A framework suitable for the modeling and analysis of precise computable representations of such phenotypic appearances is needed. Results We have developed a new framework called the Computable Visually Observed Phenotype Ontological Framework for plants. This work provides a novel quantitative view of descriptions of plant phenotypes that leverages existing bio-ontologies and utilizes a computational approach to capture and represent domain knowledge in a machine-interpretable form. This is accomplished by means of a robust and accurate semantic mapping module that automatically maps high-level semantics to low-level measurements computed from phenotype imagery. The framework was applied to two different plant species with semantic rules mined and an ontology constructed. Rule quality was evaluated and showed high quality rules for most semantics. This framework also facilitates automatic annotation of phenotype images and can be adopted by different plant communities to aid in their research. Conclusions The Computable Visually Observed Phenotype Ontological Framework for plants has been developed for more efficient and accurate management of visually observed phenotypes, which play a significant role in plant genomics research. 
The uniqueness of this framework is its ability to bridge the knowledge of informaticians and plant science researchers by translating descriptions of visually observed phenotypes into standardized, machine-understandable representations, thus enabling the development of advanced information retrieval and phenotype annotation analysis tools for the plant science community. PMID:21702966

  12. Link-topic model for biomedical abbreviation disambiguation.

    PubMed

    Kim, Seonho; Yoon, Juntae

    2015-02-01

    The ambiguity of biomedical abbreviations is one of the challenges in biomedical text mining systems. In particular, the handling of term variants and abbreviations without nearby definitions is a critical issue. In this study, we adopt the concepts of document topic and word link to disambiguate biomedical abbreviations. We propose the link topic model, inspired by the latent Dirichlet allocation model, in which each document is perceived as a random mixture of topics, where each topic is characterized by a distribution over words. Thus, the most probable expansions with respect to abbreviations of a given abstract are determined by word-topic, document-topic, and word-link distributions estimated from a document collection through the link topic model. The model allows two distinct modes of word generation to incorporate semantic dependencies among words, particularly long form words of abbreviations and their sentential co-occurring words; a word can be generated either dependently on the long form of the abbreviation or independently. The semantic dependency between two words is defined as a link and a new random parameter for the link is assigned to each word as well as a topic parameter. Because the link status indicates whether the word constitutes a link with a given specific long form, it has the effect of determining whether a word forms a unigram or a skipping/consecutive bigram with respect to the long form. Furthermore, we place a constraint on the model so that a word has the same topic as a specific long form if it is generated in reference to the long form. Consequently, documents are generated from the two hidden parameters, i.e. topic and link, and the most probable expansion of a specific abbreviation is estimated from the parameters. 
Our model relaxes the bag-of-words assumption of the standard topic model in which the word order is neglected, and it captures a richer structure of text than does the standard topic model by considering unigrams and semantically associated bigrams simultaneously. The addition of semantic links improves the disambiguation accuracy without removing irrelevant contextual words and reduces the parameter space of massive skipping or consecutive bigrams. The link topic model achieves 98.42% disambiguation accuracy on 73,505 MEDLINE abstracts with respect to 21 three-letter abbreviations and their 139 distinct long forms. Copyright © 2014 Elsevier Inc. All rights reserved.

  13. Vaccine Hesitancy in Discussion Forums: Computer-Assisted Argument Mining with Topic Models.

    PubMed

    Skeppstedt, Maria; Kerren, Andreas; Stede, Manfred

    2018-01-01

    Arguments used when vaccination is debated on Internet discussion forums might give us valuable insights into reasons behind vaccine hesitancy. In this study, we applied automatic topic modelling to a collection of 943 discussion posts in which vaccination was debated, and the algorithm detected six distinct discussion topics. When manually coding the posts ranked as most typical for these six topics, a set of semantically coherent arguments was identified for each extracted topic. This indicates that topic modelling is a useful method for automatically identifying vaccine-related discussion topics and for identifying debate posts where these topics are discussed. This functionality could facilitate manual coding of salient arguments, and thereby form an important component in a system for computer-assisted coding of vaccine-related discussions.
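
    The topic-modelling step in this abstract can be illustrated with a minimal collapsed Gibbs sampler for LDA. The study's actual tool is not named in the abstract; this toy sampler only shows how document-topic distributions, used to rank posts as "most typical" for a topic, are obtained.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    # Minimal collapsed Gibbs sampler for LDA; a toy stand-in for the
    # (unnamed) topic-modelling tool used in the study.
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    V = len(vocab)
    wid = {w: i for i, w in enumerate(vocab)}
    ndk = [[0] * n_topics for _ in docs]      # document-topic counts
    nkw = [[0] * V for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                       # topic totals
    z = []                                    # topic assignment per token
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            ndk[d][k] += 1
            nkw[k][wid[w]] += 1
            nk[k] += 1
        z.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d][k] -= 1; nkw[k][wid[w]] -= 1; nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][wid[w]] + beta)
                           / (nk[t] + V * beta) for t in range(n_topics)]
                r = rng.random() * sum(weights)
                for t, wt in enumerate(weights):
                    r -= wt
                    if r <= 0:
                        k = t
                        break
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][wid[w]] += 1; nk[k] += 1
    # Posts ranked "most typical" for topic t are those with the
    # largest theta[d][t].
    theta = [[(ndk[d][t] + alpha) / (len(docs[d]) + n_topics * alpha)
              for t in range(n_topics)] for d in range(len(docs))]
    return theta, vocab, nkw
```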

  14. A review of EO image information mining

    NASA Astrophysics Data System (ADS)

    Quartulli, Marco; Olaizola, Igor G.

    2013-01-01

    We analyze the state of the art of content-based retrieval in Earth observation image archives, focusing on complete systems that show promise for operational implementation. The different paradigms at the basis of the main system families are introduced. The approaches taken are considered, focusing in particular on the phases after primitive feature extraction. The solutions envisaged for the issues related to feature simplification and synthesis, indexing, and semantic labeling are reviewed. The methodologies for query specification and execution are evaluated. Conclusions are drawn on the state of published research in Earth observation (EO) mining.

  15. BioStar models of clinical and genomic data for biomedical data warehouse design

    PubMed Central

    Wang, Liangjiang; Ramanathan, Murali

    2008-01-01

    Biomedical research is now generating large amounts of data, ranging from clinical test results to microarray gene expression profiles. The scale and complexity of these datasets give rise to substantial challenges in data management and analysis. It is highly desirable that data warehousing and online analytical processing technologies can be applied to biomedical data integration and mining. The major difficulty probably lies in the task of capturing and modelling diverse biological objects and their complex relationships. This paper describes multidimensional data modelling for biomedical data warehouse design. Since the conventional models such as star schema appear to be insufficient for modelling clinical and genomic data, we develop a new model called BioStar schema. The new model can capture the rich semantics of biomedical data and provide greater extensibility for the fast evolution of biological research methodologies. PMID:18048122

  16. Linking descriptive geology and quantitative machine learning through an ontology of lithological concepts

    NASA Astrophysics Data System (ADS)

    Klump, J. F.; Huber, R.; Robertson, J.; Cox, S. J. D.; Woodcock, R.

    2014-12-01

    Despite the recent explosion of quantitative geological data, geology remains a fundamentally qualitative science. Numerical data constitute only part of data collection in the geosciences. In many cases, geological observations are compiled as text into reports and annotations on drill cores, thin sections or drawings of outcrops. The observations are classified into concepts such as lithology, stratigraphy, geological structure, etc. These descriptions are semantically rich and are generally supported by more quantitative observations using geochemical analyses, XRD, hyperspectral scanning, etc.; the goal, however, is geological semantics. In practice, it has been difficult to bring the different observations together due to differing perception or granularity of classification in human observation, or the partial observation of only some characteristics using quantitative sensors. In recent years, many geological classification schemes have been transferred into ontologies and vocabularies, formalized using RDF and OWL, and published through SPARQL endpoints. Several lithological ontologies were compiled by stratigraphy.net and published through a SPARQL endpoint. This work is complemented by the development of a Python API to integrate this vocabulary into Python-based text mining applications. The applications for the lithological vocabulary and Python API are automated semantic tagging of geochemical data and descriptions of drill cores, machine learning of geochemical compositions that are diagnostic for lithological classifications, and text mining for lithological concepts in reports and geological literature. This combination of applications can be used to identify anomalies in databases, where composition and lithological classification do not match. It can also be used to identify lithological concepts in the literature and infer quantitative values. The resulting semantic tagging opens new possibilities for linking these diverse sources of data.

  17. OrganismTagger: detection, normalization and grounding of organism entities in biomedical documents.

    PubMed

    Naderi, Nona; Kappler, Thomas; Baker, Christopher J O; Witte, René

    2011-10-01

    Semantic tagging of organism mentions in full-text articles is an important part of literature mining and semantic enrichment solutions. Tagged organism mentions also play a pivotal role in disambiguating other entities in a text, such as proteins. A high-precision organism tagging system must be able to detect the numerous forms of organism mentions, including common names as well as the traditional taxonomic groups: genus, species and strains. In addition, such a system must resolve abbreviations and acronyms, assign the scientific name and if possible link the detected mention to the NCBI Taxonomy database for further semantic queries and literature navigation. We present the OrganismTagger, a hybrid rule-based/machine learning system to extract organism mentions from the literature. It includes tools for automatically generating lexical and ontological resources from a copy of the NCBI Taxonomy database, thereby facilitating system updates by end users. Its novel ontology-based resources can also be reused in other semantic mining and linked data tasks. Each detected organism mention is normalized to a canonical name through the resolution of acronyms and abbreviations and subsequently grounded with an NCBI Taxonomy database ID. In particular, our system combines a novel machine-learning approach with rule-based and lexical methods for detecting strain mentions in documents. On our manually annotated OT corpus, the OrganismTagger achieves a precision of 95%, a recall of 94% and a grounding accuracy of 97.5%. On the manually annotated corpus of Linnaeus-100, the results show a precision of 99%, recall of 97% and grounding accuracy of 97.4%. The OrganismTagger, including supporting tools, resources, training data and manual annotations, as well as end user and developer documentation, is freely available under an open-source license at http://www.semanticsoftware.info/organism-tagger. witte@semanticsoftware.info.
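
    The rule-based side of a tagger like this can be illustrated with a toy binomial-name matcher. This is not OrganismTagger's code: the regular expression and the tiny genus lexicon below are hypothetical stand-ins for its NCBI-derived lexical and ontological resources.

```python
import re

# "Genus species" pattern: capitalized genus followed by a lowercase
# epithet of at least three letters.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+)\s([a-z]{3,})\b")

def tag_organisms(text, genus_lexicon):
    # Toy rule-based pass of the kind OrganismTagger combines with
    # machine learning: match binomials and keep those whose genus
    # appears in a lexicon (here a hand-made stand-in).
    return [(m.group(0), m.start()) for m in BINOMIAL.finditer(text)
            if m.group(1) in genus_lexicon]
```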

  18. A semantic-based method for extracting concept definitions from scientific publications: evaluation in the autism phenotype domain.

    PubMed

    Hassanpour, Saeed; O'Connor, Martin J; Das, Amar K

    2013-08-12

    A variety of informatics approaches have been developed that use information retrieval, NLP and text-mining techniques to identify biomedical concepts and relations within scientific publications or their sentences. These approaches have not typically addressed the challenge of extracting more complex knowledge such as biomedical definitions. In our efforts to facilitate knowledge acquisition of rule-based definitions of autism phenotypes, we have developed a novel semantic-based text-mining approach that can automatically identify such definitions within text. Using an existing knowledge base of 156 autism phenotype definitions and an annotated corpus of 26 source articles containing such definitions, we evaluated and compared the average rank of the correctly identified rule definition or corresponding rule template using both our semantic-based approach and a standard term-based approach. We examined three separate scenarios: (1) the snippet of text contained a definition already in the knowledge base; (2) the snippet contained an alternative definition for a concept in the knowledge base; and (3) the snippet contained a definition not in the knowledge base. Our semantic-based approach achieved a better (numerically lower) average rank than the term-based approach for each of the three scenarios (scenario 1: 3.8 vs. 5.0; scenario 2: 2.8 vs. 4.9; and scenario 3: 4.5 vs. 6.2), with each comparison significant at the 0.05 level using the Wilcoxon signed-rank test. Our work shows that leveraging existing domain knowledge in the information extraction of biomedical definitions significantly improves the correct identification of such knowledge within sentences. Our method can thus help researchers rapidly acquire knowledge about biomedical definitions that are specified and evolving within an ever-growing corpus of scientific publications.

  19. Discovery of novel biomarkers and phenotypes by semantic technologies

    PubMed Central

    2013-01-01

    Background Biomarkers and target-specific phenotypes are important to targeted drug design and individualized medicine, thus constituting an important aspect of modern pharmaceutical research and development. More and more, the discovery of relevant biomarkers is aided by in silico techniques based on applying data mining and computational chemistry on large molecular databases. However, there is an even larger source of valuable information available that can potentially be tapped for such discoveries: repositories constituted by research documents. Results This paper reports on a pilot experiment to discover potential novel biomarkers and phenotypes for diabetes and obesity by self-organized text mining of about 120,000 PubMed abstracts, public clinical trial summaries, and internal Merck research documents. These documents were directly analyzed by the InfoCodex semantic engine, without prior human manipulations such as parsing. Recall and precision against established, but different, benchmarks lie in ranges up to 30% and 50%, respectively. Retrieval of known entities missed by other traditional approaches could be demonstrated. Finally, the InfoCodex semantic engine was shown to discover new diabetes and obesity biomarkers and phenotypes. Amongst these were many interesting candidates with a high potential, although noticeable noise (uninteresting or obvious terms) was generated. Conclusions The reported approach of employing autonomous self-organising semantic engines to aid biomarker discovery, supplemented by appropriate manual curation processes, shows promise. Conservatively, it offers a faster alternative to vocabulary curation processes that depend on humans having to read and analyze all the texts. More optimistically, it could impact pharmaceutical research, for example by shortening the time-to-market of novel drugs or speeding up the early recognition of dead ends and adverse reactions. PMID:23402646

  20. Signal Detection Framework Using Semantic Text Mining Techniques

    ERIC Educational Resources Information Center

    Sudarsan, Sithu D.

    2009-01-01

    Signal detection is a challenging task for regulatory and intelligence agencies. Subject matter experts in those agencies analyze documents, generally containing narrative text in a time bound manner for signals by identification, evaluation and confirmation, leading to follow-up action e.g., recalling a defective product or public advisory for…

  1. Discovering biomedical semantic relations in PubMed queries for information retrieval and database curation

    PubMed Central

    Huang, Chung-Chi; Lu, Zhiyong

    2016-01-01

    Identifying relevant papers from the literature is a common task in biocuration. Most current biomedical literature search systems primarily rely on matching user keywords. Semantic search, on the other hand, seeks to improve search accuracy by understanding the entities and contextual relations in user keywords. However, past research has mostly focused on semantically identifying biological entities (e.g. chemicals, diseases and genes) with little effort on discovering semantic relations. In this work, we aim to discover biomedical semantic relations in PubMed queries in an automated and unsupervised fashion. Specifically, we focus on extracting and understanding the contextual information (or context patterns) that is used by PubMed users to represent semantic relations between entities such as ‘CHEMICAL-1 compared to CHEMICAL-2.’ With the advances in automatic named entity recognition, we first tag entities in PubMed queries and then use tagged entities as knowledge to recognize pattern semantics. More specifically, we transform PubMed queries into context patterns involving participating entities, which are subsequently projected to latent topics via latent semantic analysis (LSA) to avoid the data sparseness and specificity issues. Finally, we mine semantically similar contextual patterns or semantic relations based on LSA topic distributions. Our two separate evaluation experiments of chemical-chemical (CC) and chemical-disease (CD) relations show that the proposed approach significantly outperforms a baseline method, which simply measures pattern semantics by similarity in participating entities. The highest performance achieved by our approach is nearly 0.90 and 0.85, respectively, for the CC and CD tasks when compared against the ground truth in terms of normalized discounted cumulative gain (nDCG), a standard measure of ranking quality. 
These results suggest that our approach can effectively identify and return related semantic patterns in a ranked order covering diverse bio-entity relations. To assess the potential utility of our automated top-ranked patterns of a given relation in semantic search, we performed a pilot study on frequently sought semantic relations in PubMed and observed improved literature retrieval effectiveness based on post-hoc human relevance evaluation. Further investigation in larger tests and in real-world scenarios is warranted. PMID:27016698
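
    The pattern-similarity pipeline described in this abstract (context patterns projected to latent topics via LSA, then compared) can be sketched as follows. Whitespace tokenization, raw counts instead of weighted features, and k=2 latent dimensions are assumptions of this toy version.

```python
import numpy as np

def lsa_pattern_similarity(patterns, k=2):
    # Sketch of the abstract's pipeline: context patterns are turned
    # into a term-count matrix, projected to k latent topics via
    # truncated SVD (LSA), and compared by cosine similarity.
    vocab = sorted({w for p in patterns for w in p.split()})
    idx = {w: j for j, w in enumerate(vocab)}
    X = np.zeros((len(patterns), len(vocab)))
    for i, p in enumerate(patterns):
        for w in p.split():
            X[i, idx[w]] += 1.0
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    Z = U[:, :k] * S[:k]          # patterns in latent-topic space
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    Z = Z / np.where(norms == 0.0, 1.0, norms)
    return Z @ Z.T                # pairwise cosine similarities
```

Patterns that differ only in near-synonymous context words end up close in the latent space even when their surface forms differ.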

  2. PREDOSE: a semantic web platform for drug abuse epidemiology using social media.

    PubMed

    Cameron, Delroy; Smith, Gary A; Daniulaityte, Raminta; Sheth, Amit P; Dave, Drashti; Chen, Lu; Anand, Gaurish; Carlson, Robert; Watkins, Kera Z; Falck, Russel

    2013-12-01

    The role of social media in biomedical knowledge mining, including clinical, medical and healthcare informatics, prescription drug abuse epidemiology and drug pharmacology, has become increasingly significant in recent years. Social media offers opportunities for people to share opinions and experiences freely in online communities, which may contribute information beyond the knowledge of domain professionals. This paper describes the development of a novel semantic web platform called PREDOSE (PREscription Drug abuse Online Surveillance and Epidemiology), which is designed to facilitate the epidemiologic study of prescription (and related) drug abuse practices using social media. PREDOSE uses web forum posts and domain knowledge, modeled in a manually created Drug Abuse Ontology (DAO--pronounced dow), to facilitate the extraction of semantic information from User Generated Content (UGC), through combination of lexical, pattern-based and semantics-based techniques. In a previous study, PREDOSE was used to obtain the datasets from which new knowledge in drug abuse research was derived. Here, we report on various platform enhancements, including an updated DAO, new components for relationship and triple extraction, and tools for content analysis, trend detection and emerging patterns exploration, which enhance the capabilities of the PREDOSE platform. Given these enhancements, PREDOSE is now more equipped to impact drug abuse research by alleviating traditional labor-intensive content analysis tasks. Using custom web crawlers that scrape UGC from publicly available web forums, PREDOSE first automates the collection of web-based social media content for subsequent semantic annotation. The annotation scheme is modeled in the DAO, and includes domain specific knowledge such as prescription (and related) drugs, methods of preparation, side effects, and routes of administration. 
The DAO is also used to help recognize three types of data, namely: (1) entities, (2) relationships and (3) triples. PREDOSE then uses a combination of lexical and semantic-based techniques to extract entities and relationships from the scraped content, and a top-down approach for triple extraction that uses patterns expressed in the DAO. In addition, PREDOSE uses publicly available lexicons to identify initial sentiment expressions in text, and then a probabilistic optimization algorithm (from related research) to extract the final sentiment expressions. Together, these techniques enable the capture of fine-grained semantic information, which facilitate search, trend analysis and overall content analysis using social media on prescription drug abuse. Moreover, extracted data are also made available to domain experts for the creation of training and test sets for use in evaluation and refinements in information extraction techniques. A recent evaluation of the information extraction techniques applied in the PREDOSE platform indicates 85% precision and 72% recall in entity identification, on a manually created gold standard dataset. In another study, PREDOSE achieved 36% precision in relationship identification and 33% precision in triple extraction, through manual evaluation by domain experts. Given the complexity of the relationship and triple extraction tasks and the abstruse nature of social media texts, we interpret these as favorable initial results. Extracted semantic information is currently in use in an online discovery support system, by prescription drug abuse researchers at the Center for Interventions, Treatment and Addictions Research (CITAR) at Wright State University. A comprehensive platform for entity, relationship, triple and sentiment extraction from such abstruse texts has never been developed for drug abuse research. 
PREDOSE has already demonstrated the importance of mining social media by providing data from which new findings in drug abuse research were uncovered. Given the recent platform enhancements, including the refined DAO, components for relationship and triple extraction, and tools for content, trend and emerging pattern analysis, it is expected that PREDOSE will play a significant role in advancing drug abuse epidemiology in the future. Copyright © 2013 Elsevier Inc. All rights reserved.

  3. Graph-Theoretic Properties of Networks Based on Word Association Norms: Implications for Models of Lexical Semantic Memory

    ERIC Educational Resources Information Center

    Gruenenfelder, Thomas M.; Recchia, Gabriel; Rubin, Tim; Jones, Michael N.

    2016-01-01

    We compared the ability of three different contextual models of lexical semantic memory (BEAGLE, Latent Semantic Analysis, and the Topic model) and of a simple associative model (POC) to predict the properties of semantic networks derived from word association norms. None of the semantic models were able to accurately predict all of the network…

  4. DeepMeSH: deep semantic representation for improving large-scale MeSH indexing.

    PubMed

    Peng, Shengwen; You, Ronghui; Wang, Hongning; Zhai, Chengxiang; Mamitsuka, Hiroshi; Zhu, Shanfeng

    2016-06-15

    Medical Subject Headings (MeSH) indexing, which is to assign a set of MeSH main headings to citations, is crucial for many important tasks in biomedical text mining and information retrieval. Large-scale MeSH indexing has two challenging aspects: the citation side and MeSH side. For the citation side, all existing methods, including Medical Text Indexer (MTI) by National Library of Medicine and the state-of-the-art method, MeSHLabeler, deal with text by bag-of-words, which cannot capture semantic and context-dependent information well. We propose DeepMeSH that incorporates deep semantic information for large-scale MeSH indexing. It addresses the two challenges in both citation and MeSH sides. The citation side challenge is solved by a new deep semantic representation, D2V-TFIDF, which concatenates both sparse and dense semantic representations. The MeSH side challenge is solved by using the 'learning to rank' framework of MeSHLabeler, which integrates various types of evidence generated from the new semantic representation. DeepMeSH achieved a Micro F-measure of 0.6323, 2% higher than 0.6218 of MeSHLabeler and 12% higher than 0.5637 of MTI, for BioASQ3 challenge data with 6000 citations. The software is available upon request. zhusf@fudan.edu.cn Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
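
    The citation-side idea in this abstract, concatenating sparse and dense semantic representations into D2V-TFIDF, can be sketched as follows. The per-part L2 normalization is an assumption of this sketch; the paper's exact construction may differ.

```python
import numpy as np

def d2v_tfidf(dense_vec, sparse_vec):
    # Sketch of the D2V-TFIDF representation: concatenate a dense
    # document embedding (e.g. from doc2vec) with a sparse TF-IDF
    # vector. Each part is L2-normalized first so that neither
    # representation dominates the concatenation (an assumption).
    def l2(v):
        v = np.asarray(v, dtype=float)
        n = np.linalg.norm(v)
        return v / n if n else v
    return np.concatenate([l2(dense_vec), l2(sparse_vec)])
```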

  5. Mining Genotype-Phenotype Associations from Public Knowledge Sources via Semantic Web Querying.

    PubMed

    Kiefer, Richard C; Freimuth, Robert R; Chute, Christopher G; Pathak, Jyotishman

    2013-01-01

    Gene Wiki Plus (GeneWiki+) and the Online Mendelian Inheritance in Man (OMIM) are publicly available resources for sharing information about disease-gene and gene-SNP associations in humans. While immensely useful to the scientific community, both resources are manually curated, thereby making the data entry and publication process time-consuming, and to some degree, error-prone. To this end, this study investigates Semantic Web technologies to validate existing and potentially discover new genotype-phenotype associations in GeneWiki+ and OMIM. In particular, we demonstrate the applicability of SPARQL queries for identifying associations not explicitly stated for commonly occurring chronic diseases in GeneWiki+ and OMIM, and report our preliminary findings for coverage, completeness, and validity of the associations. Our results highlight the benefits of Semantic Web querying technology to validate existing disease-gene associations as well as identify novel associations, although further evaluation and analysis are required before such information can be applied effectively.
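
    The kind of SPARQL triple-pattern query this study relies on can be illustrated with a minimal in-memory matcher. The predicate names and the triples below are illustrative examples, not data extracted from GeneWiki+ or OMIM.

```python
def query(triples, pattern):
    # Minimal triple-pattern matcher in the spirit of a SPARQL basic
    # graph pattern: items starting with "?" are variables, everything
    # else must match exactly; each match yields a variable binding.
    results = []
    for triple in triples:
        binding = {}
        for part, value in zip(pattern, triple):
            if part.startswith("?"):
                if part in binding and binding[part] != value:
                    break
                binding[part] = value
            elif part != value:
                break
        else:
            results.append(binding)
    return results

# Hypothetical triple store of gene-disease and SNP-gene assertions.
triples = [
    ("TCF7L2", "associatedWith", "type_2_diabetes"),
    ("FTO", "associatedWith", "obesity"),
    ("rs7903146", "locatedIn", "TCF7L2"),
]

# Analogue of: SELECT ?gene WHERE { ?gene :associatedWith :type_2_diabetes }
genes = query(triples, ("?gene", "associatedWith", "type_2_diabetes"))
```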

  6. A case study: semantic integration of gene-disease associations for type 2 diabetes mellitus from literature and biomedical data resources.

    PubMed

    Rebholz-Schuhmann, Dietrich; Grabmüller, Christoph; Kavaliauskas, Silvestras; Croset, Samuel; Woollard, Peter; Backofen, Rolf; Filsell, Wendy; Clark, Dominic

    2014-07-01

    In the Semantic Enrichment of the Scientific Literature (SESL) project, researchers from academia and from life science and publishing companies collaborated in a pre-competitive way to integrate and share information for type 2 diabetes mellitus (T2DM) in adults. This case study exposes benefits from semantic interoperability after integrating the scientific literature with biomedical data resources, such as UniProt Knowledgebase (UniProtKB) and the Gene Expression Atlas (GXA). We annotated scientific documents in a standardized way, by applying public terminological resources for diseases and proteins, and other text-mining approaches. Eventually, we compared the genetic causes of T2DM across the data resources to demonstrate the benefits from the SESL triple store. Our solution enables publishers to distribute their content with little overhead into remote data infrastructures, such as into any Virtual Knowledge Broker. Copyright © 2013. Published by Elsevier Ltd.

  7. Semantic-Web Architecture for Electronic Discharge Summary Based on OWL 2.0 Standard.

    PubMed

    Tahmasebian, Shahram; Langarizadeh, Mostafa; Ghazisaeidi, Marjan; Safdari, Reza

    2016-06-01

    A patient's electronic medical record contains all information related to treatment processes during hospitalization. One of the most important documents in this record is the record summary, which presents a summary of the whole treatment process and is used for subsequent treatments and other issues pertaining to the treatment. With a suitable architecture for this document, beyond the aforementioned uses, it can also serve other fields such as data mining or case-based decision making. In this study, a model for the patient's medical record summary was first proposed using a semantic web-based architecture. Then, based on service-oriented architecture and the Java programming language, a software solution was designed and run to generate medical record summaries with this structure, and finally new uses of this structure were explained. The result is a structure for medical record summaries, with corrective points grounded in the semantic web, together with software running in Java and accompanying ontologies. After discussing the project with experts in medical/health data management and medical informatics as well as clinical experts, it became clear that the suggested design for the medical record summary, apart from covering many issues currently faced with medical records, also has many advantages, including its use in research projects, case-based decision making, etc.

  8. Mining the Human Phenome using Semantic Web Technologies: A Case Study for Type 2 Diabetes

    PubMed Central

    Pathak, Jyotishman; Kiefer, Richard C.; Bielinski, Suzette J.; Chute, Christopher G.

    2012-01-01

The ability to conduct genome-wide association studies (GWAS) has enabled new exploration of how genetic variations contribute to health and disease etiology. However, historically GWAS have been limited by inadequate sample sizes due to the associated costs of genotyping and phenotyping study subjects. This has prompted several academic medical centers to form “biobanks” where biospecimens linked to personal health information, typically in electronic health records (EHRs), are collected and stored on a large number of subjects. This provides tremendous opportunities to discover novel genotype-phenotype associations and foster hypothesis generation. In this work, we study how emerging Semantic Web technologies can be applied in conjunction with clinical and genotype data stored at the Mayo Clinic Biobank to mine the phenotype data for genetic associations. In particular, we demonstrate the role of the Resource Description Framework (RDF) in representing EHR diagnosis and procedure data, and enable federated querying via standardized Web protocols to identify subjects genotyped with Type 2 Diabetes for discovering gene-disease associations. Our study highlights the potential of Web-scale data federation techniques to execute complex queries. PMID:23304343
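The triple-based representation at the heart of this approach can be illustrated with a minimal, pure-Python sketch. The subject IDs, predicates, and codes below are invented for illustration and are not those of the Mayo Clinic study; `None` plays the role of a SPARQL variable:

```python
# Minimal illustration of the RDF triple model: facts are
# (subject, predicate, object) tuples, and a query is a pattern
# in which None acts as a wildcard, like a SPARQL variable.

triples = {
    ("patient:1", "hasDiagnosis", "icd9:250.00"),   # Type 2 Diabetes
    ("patient:1", "hasProcedure", "cpt:82947"),     # glucose test
    ("patient:2", "hasDiagnosis", "icd9:401.9"),    # hypertension
    ("icd9:250.00", "label", "Type 2 Diabetes"),
}

def match(pattern, store):
    """Return all triples matching an (s, p, o) pattern; None = wildcard."""
    s, p, o = pattern
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "Which subjects carry a Type 2 Diabetes diagnosis?"
hits = [s for s, _, _ in match((None, "hasDiagnosis", "icd9:250.00"), triples)]
print(hits)  # ['patient:1']
```

A real deployment would store such triples in a triple store and expose them through a SPARQL endpoint, but the pattern-matching semantics are the same.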

  9. Mechanism-based Pharmacovigilance over the Life Sciences Linked Open Data Cloud.

    PubMed

    Kamdar, Maulik R; Musen, Mark A

    2017-01-01

    Adverse drug reactions (ADR) result in significant morbidity and mortality in patients, and a substantial proportion of these ADRs are caused by drug-drug interactions (DDIs). Pharmacovigilance methods are used to detect unanticipated DDIs and ADRs by mining Spontaneous Reporting Systems, such as the US FDA Adverse Event Reporting System (FAERS). However, these methods do not provide mechanistic explanations for the discovered drug-ADR associations in a systematic manner. In this paper, we present a systems pharmacology-based approach to perform mechanism-based pharmacovigilance. We integrate data and knowledge from four different sources using Semantic Web Technologies and Linked Data principles to generate a systems network. We present a network-based Apriori algorithm for association mining in FAERS reports. We evaluate our method against existing pharmacovigilance methods for three different validation sets. Our method has AUROC statistics of 0.7-0.8, similar to current methods, and event-specific thresholds generate AUROC statistics greater than 0.75 for certain ADRs. Finally, we discuss the benefits of using Semantic Web technologies to attain the objectives for mechanism-based pharmacovigilance.
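The Apriori-style support counting underlying such association mining can be sketched as follows; the toy reports and threshold are invented, and the paper's network-based extension over the systems graph is not shown:

```python
from itertools import combinations
from collections import Counter

# Toy "reports": each is the set of drugs and events mentioned together,
# standing in for FAERS-like spontaneous reports.
reports = [
    {"warfarin", "aspirin", "bleeding"},
    {"warfarin", "bleeding"},
    {"aspirin", "headache"},
    {"warfarin", "aspirin", "bleeding"},
]

def frequent_itemsets(transactions, min_support, k):
    """Count all k-item combinations and keep those meeting min_support."""
    counts = Counter()
    for t in transactions:
        for combo in combinations(sorted(t), k):
            counts[combo] += 1
    return {c: n for c, n in counts.items() if n >= min_support}

pairs = frequent_itemsets(reports, min_support=2, k=2)
print(pairs)
# (aspirin, bleeding): 2, (aspirin, warfarin): 2, (bleeding, warfarin): 3;
# (aspirin, headache) falls below the support threshold.
```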

  10. Mining the human phenome using semantic web technologies: a case study for Type 2 Diabetes.

    PubMed

    Pathak, Jyotishman; Kiefer, Richard C; Bielinski, Suzette J; Chute, Christopher G

    2012-01-01

The ability to conduct genome-wide association studies (GWAS) has enabled new exploration of how genetic variations contribute to health and disease etiology. However, historically GWAS have been limited by inadequate sample sizes due to the associated costs of genotyping and phenotyping study subjects. This has prompted several academic medical centers to form "biobanks" where biospecimens linked to personal health information, typically in electronic health records (EHRs), are collected and stored on a large number of subjects. This provides tremendous opportunities to discover novel genotype-phenotype associations and foster hypothesis generation. In this work, we study how emerging Semantic Web technologies can be applied in conjunction with clinical and genotype data stored at the Mayo Clinic Biobank to mine the phenotype data for genetic associations. In particular, we demonstrate the role of the Resource Description Framework (RDF) in representing EHR diagnosis and procedure data, and enable federated querying via standardized Web protocols to identify subjects genotyped with Type 2 Diabetes for discovering gene-disease associations. Our study highlights the potential of Web-scale data federation techniques to execute complex queries.

  11. BiSet: Semantic Edge Bundling with Biclusters for Sensemaking.

    PubMed

    Sun, Maoyuan; Mi, Peng; North, Chris; Ramakrishnan, Naren

    2016-01-01

Identifying coordinated relationships is an important task in data analytics. For example, an intelligence analyst might want to discover three suspicious people who all visited the same four cities. Existing techniques that display individual relationships, such as between lists of entities, require repetitious manual selection and significant mental aggregation in cluttered visualizations to find coordinated relationships. In this paper, we present BiSet, a visual analytics technique to support interactive exploration of coordinated relationships. In BiSet, we model coordinated relationships as biclusters and algorithmically mine them from a dataset. Then, we visualize the biclusters in context as bundled edges between sets of related entities. Thus, bundles enable analysts to infer task-oriented semantic insights about potentially coordinated activities. We treat bundles as first-class objects and add a new layer, "in-between", to contain them. On this basis, bundles serve to organize entities represented in lists and visually reveal their membership. Users can interact with edge bundles to organize related entities, and vice versa, for sensemaking purposes. With a usage scenario, we demonstrate how BiSet supports the exploration of coordinated relationships in text analytics.

  12. Leveraging Pattern Semantics for Extracting Entities in Enterprises

    PubMed Central

    Tao, Fangbo; Zhao, Bo; Fuxman, Ariel; Li, Yang; Han, Jiawei

    2015-01-01

Entity Extraction is the process of identifying meaningful entities from text documents. In enterprises, extracting entities improves enterprise efficiency by facilitating numerous applications, including search, recommendation, etc. However, the problem is particularly challenging in enterprise domains for several reasons. First, the lack of redundancy of enterprise entities makes previous web-based systems like NELL and OpenIE ineffective, since using only high-precision/low-recall patterns like those systems would miss the majority of sparse enterprise entities, while using more low-precision patterns in a sparse setting also introduces noise drastically. Second, semantic drift is common in enterprises (“Blue” refers to “Windows Blue”), such that public signals from the web cannot be directly applied to entities. Moreover, many internal entities never appear on the web; sparse internal signals are the only source for discovering them. To address these challenges, we propose an end-to-end framework for extracting entities in enterprises, taking as input an enterprise corpus and limited seeds to generate a high-quality entity collection as output. We introduce the novel concept of a Semantic Pattern Graph to leverage public signals to understand the underlying semantics of lexical patterns, reinforce pattern evaluation using mined semantics, and yield more accurate and complete entities. Experiments on Microsoft enterprise data show the effectiveness of our approach. PMID:26705540

  13. Leveraging Pattern Semantics for Extracting Entities in Enterprises.

    PubMed

    Tao, Fangbo; Zhao, Bo; Fuxman, Ariel; Li, Yang; Han, Jiawei

    2015-05-01

Entity Extraction is the process of identifying meaningful entities from text documents. In enterprises, extracting entities improves enterprise efficiency by facilitating numerous applications, including search, recommendation, etc. However, the problem is particularly challenging in enterprise domains for several reasons. First, the lack of redundancy of enterprise entities makes previous web-based systems like NELL and OpenIE ineffective, since using only high-precision/low-recall patterns like those systems would miss the majority of sparse enterprise entities, while using more low-precision patterns in a sparse setting also introduces noise drastically. Second, semantic drift is common in enterprises ("Blue" refers to "Windows Blue"), such that public signals from the web cannot be directly applied to entities. Moreover, many internal entities never appear on the web; sparse internal signals are the only source for discovering them. To address these challenges, we propose an end-to-end framework for extracting entities in enterprises, taking as input an enterprise corpus and limited seeds to generate a high-quality entity collection as output. We introduce the novel concept of a Semantic Pattern Graph to leverage public signals to understand the underlying semantics of lexical patterns, reinforce pattern evaluation using mined semantics, and yield more accurate and complete entities. Experiments on Microsoft enterprise data show the effectiveness of our approach.

  14. LEARNING SEMANTICS-ENHANCED LANGUAGE MODELS APPLIED TO UNSUPERVISED WSD

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    VERSPOOR, KARIN; LIN, SHOU-DE

    An N-gram language model aims at capturing statistical syntactic word order information from corpora. Although the concept of language models has been applied extensively to handle a variety of NLP problems with reasonable success, the standard model does not incorporate semantic information, and consequently limits its applicability to semantic problems such as word sense disambiguation. We propose a framework that integrates semantic information into the language model schema, allowing a system to exploit both syntactic and semantic information to address NLP problems. Furthermore, acknowledging the limited availability of semantically annotated data, we discuss how the proposed model can be learned without annotated training examples. Finally, we report on a case study showing how the semantics-enhanced language model can be applied to unsupervised word sense disambiguation with promising results.
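For context, a minimal maximum-likelihood bigram model, the kind of standard N-gram model that this work extends with semantics, looks like this (the toy corpus is invented for illustration):

```python
from collections import Counter

# Toy corpus; a real language model is trained on millions of tokens.
corpus = "the bank raised the rate the bank of the river".split()

# Count bigrams and the contexts (previous words) they condition on.
bigrams = Counter(zip(corpus, corpus[1:]))
contexts = Counter(corpus[:-1])

def p(word, prev):
    """Maximum-likelihood bigram probability P(word | prev)."""
    return bigrams[(prev, word)] / contexts[prev] if contexts[prev] else 0.0

print(p("bank", "the"))  # 0.5: two of the four "the" contexts are followed by "bank"
```

Such a model captures word order but treats both senses of "bank" identically, which is exactly the gap a semantics-enhanced model targets.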

  15. Discovering biomedical semantic relations in PubMed queries for information retrieval and database curation.

    PubMed

    Huang, Chung-Chi; Lu, Zhiyong

    2016-01-01

    Identifying relevant papers from the literature is a common task in biocuration. Most current biomedical literature search systems primarily rely on matching user keywords. Semantic search, on the other hand, seeks to improve search accuracy by understanding the entities and contextual relations in user keywords. However, past research has mostly focused on semantically identifying biological entities (e.g. chemicals, diseases and genes) with little effort on discovering semantic relations. In this work, we aim to discover biomedical semantic relations in PubMed queries in an automated and unsupervised fashion. Specifically, we focus on extracting and understanding the contextual information (or context patterns) that is used by PubMed users to represent semantic relations between entities, such as 'CHEMICAL-1 compared to CHEMICAL-2'. With the advances in automatic named entity recognition, we first tag entities in PubMed queries and then use tagged entities as knowledge to recognize pattern semantics. More specifically, we transform PubMed queries into context patterns involving participating entities, which are subsequently projected to latent topics via latent semantic analysis (LSA) to avoid the data sparseness and specificity issues. Finally, we mine semantically similar contextual patterns or semantic relations based on LSA topic distributions. Our two separate evaluation experiments of chemical-chemical (CC) and chemical-disease (CD) relations show that the proposed approach significantly outperforms a baseline method, which simply measures pattern semantics by similarity in participating entities. The highest performance achieved by our approach is nearly 0.9 and 0.85 respectively for the CC and CD task when compared against the ground truth in terms of normalized discounted cumulative gain (nDCG), a standard measure of ranking quality.
These results suggest that our approach can effectively identify and return related semantic patterns in a ranked order covering diverse bio-entity relations. To assess the potential utility of our automated top-ranked patterns of a given relation in semantic search, we performed a pilot study on frequently sought semantic relations in PubMed and observed improved literature retrieval effectiveness based on post-hoc human relevance evaluation. Further investigation in larger tests and in real-world scenarios is warranted. Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the US.
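The pattern-similarity step can be sketched without the LSA projection: represent each context pattern by counts of the entity types filling its slots, and compare patterns by cosine similarity. The patterns and counts below are invented, and the paper additionally projects such vectors onto latent topics before comparison:

```python
from math import sqrt

# Invented counts of how often each pattern co-occurs with entity types,
# over hypothetical [CHEMICAL, DISEASE, GENE] slots.
patterns = {
    "X compared to Y": [9, 1, 0],
    "X versus Y":      [8, 2, 0],
    "X causes Y":      [1, 9, 1],
}

def cosine(u, v):
    """Cosine similarity between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

sim = cosine(patterns["X compared to Y"], patterns["X versus Y"])
print(sim)  # high: both patterns mostly relate chemicals to chemicals
```

Projecting the vectors through LSA before this comparison smooths over sparse, overly specific patterns, which is the motivation given in the abstract.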

  16. Semantic Annotation of Complex Text Structures in Problem Reports

    NASA Technical Reports Server (NTRS)

    Malin, Jane T.; Throop, David R.; Fleming, Land D.

    2011-01-01

    Text analysis is important for effective information retrieval from databases where the critical information is embedded in text fields. Aerospace safety depends on effective retrieval of relevant and related problem reports for the purpose of trend analysis. The complex text syntax in problem descriptions has limited statistical text mining of problem reports. The presentation describes an intelligent tagging approach that applies syntactic and then semantic analysis to overcome this problem. The tags identify types of problems and equipment that are embedded in the text descriptions. The power of these tags is illustrated in a faceted searching and browsing interface for problem report trending that combines automatically generated tags with database code fields and temporal information.

  17. EEG Decoding of Semantic Category Reveals Distributed Representations for Single Concepts

    ERIC Educational Resources Information Center

    Murphy, Brian; Poesio, Massimo; Bovolo, Francesca; Bruzzone, Lorenzo; Dalponte, Michele; Lakany, Heba

    2011-01-01

    Achieving a clearer picture of categorial distinctions in the brain is essential for our understanding of the conceptual lexicon, but much more fine-grained investigations are required in order for this evidence to contribute to lexical research. Here we present a collection of advanced data-mining techniques that allows the category of individual…

  18. DeepMeSH: deep semantic representation for improving large-scale MeSH indexing

    PubMed Central

    Peng, Shengwen; You, Ronghui; Wang, Hongning; Zhai, Chengxiang; Mamitsuka, Hiroshi; Zhu, Shanfeng

    2016-01-01

    Motivation: Medical Subject Headings (MeSH) indexing, which is to assign a set of MeSH main headings to citations, is crucial for many important tasks in biomedical text mining and information retrieval. Large-scale MeSH indexing has two challenging aspects: the citation side and MeSH side. For the citation side, all existing methods, including Medical Text Indexer (MTI) by National Library of Medicine and the state-of-the-art method, MeSHLabeler, deal with text by bag-of-words, which cannot capture semantic and context-dependent information well. Methods: We propose DeepMeSH that incorporates deep semantic information for large-scale MeSH indexing. It addresses the two challenges in both citation and MeSH sides. The citation side challenge is solved by a new deep semantic representation, D2V-TFIDF, which concatenates both sparse and dense semantic representations. The MeSH side challenge is solved by using the ‘learning to rank’ framework of MeSHLabeler, which integrates various types of evidence generated from the new semantic representation. Results: DeepMeSH achieved a Micro F-measure of 0.6323, 2% higher than 0.6218 of MeSHLabeler and 12% higher than 0.5637 of MTI, for BioASQ3 challenge data with 6000 citations. Availability and Implementation: The software is available upon request. Contact: zhusf@fudan.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307646
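The sparse half of a D2V-TFIDF-style representation can be sketched with a small TF-IDF computation; the dense doc2vec half is mocked here with a placeholder vector, since embedding training is out of scope, and the documents are invented:

```python
from math import log
from collections import Counter

# Toy citation texts, invented for illustration.
docs = [
    "insulin receptor signaling in diabetes",
    "deep learning for text indexing",
    "diabetes mellitus and insulin therapy",
]

def tfidf(doc, corpus):
    """Sparse TF-IDF weights for one document over a small corpus."""
    tokens = doc.split()
    tf = Counter(tokens)
    n = len(corpus)
    return {
        t: (tf[t] / len(tokens)) * log(n / sum(t in d.split() for d in corpus))
        for t in tf
    }

sparse = tfidf(docs[0], docs)
dense = [0.1, -0.3, 0.7]        # stand-in for a learned doc2vec vector
combined = (sparse, dense)      # keep both sparse and dense views together
print(sparse["receptor"] > sparse["diabetes"])  # True: rarer terms weigh more
```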

  19. A Complex Network Approach to Distributional Semantic Models

    PubMed Central

    Utsumi, Akira

    2015-01-01

    A number of studies on network analysis have focused on language networks based on free word association, which reflects human lexical knowledge, and have demonstrated the small-world and scale-free properties in the word association network. Nevertheless, there have been very few attempts at applying network analysis to distributional semantic models, despite the fact that these models have been studied extensively as computational or cognitive models of human lexical knowledge. In this paper, we analyze three network properties, namely, small-world, scale-free, and hierarchical properties, of semantic networks created by distributional semantic models. We demonstrate that the created networks generally exhibit the same properties as word association networks. In particular, we show that the distribution of the number of connections in these networks follows the truncated power law, which is also observed in an association network. This indicates that distributional semantic models can provide a plausible model of lexical knowledge. Additionally, the observed differences in the network properties of various implementations of distributional semantic models are consistently explained or predicted by considering the intrinsic semantic features of a word-context matrix and the functions of matrix weighting and smoothing. Furthermore, to simulate a semantic network with the observed network properties, we propose a new growing network model based on the model of Steyvers and Tenenbaum. The idea underlying the proposed model is that both preferential and random attachments are required to reflect different types of semantic relations in network growth process. We demonstrate that this model provides a better explanation of network behaviors generated by distributional semantic models. PMID:26295940
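The growth process described above, mixing preferential and random attachment, can be sketched roughly as follows; the parameters and seed graph are invented, and this is not the exact Steyvers-Tenenbaum variant:

```python
import random

def grow_network(n_nodes, p_random=0.3, seed=42):
    """Grow a graph one node at a time: each new node attaches either to a
    uniformly random node (probability p_random) or preferentially to a node
    chosen in proportion to its current degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]                 # seed graph: a single edge
    degree = {0: 1, 1: 1}
    for new in range(2, n_nodes):
        if rng.random() < p_random:
            target = rng.choice(list(degree))        # random attachment
        else:
            # Preferential attachment: sampling an endpoint of a random
            # existing edge picks nodes in proportion to their degree.
            target = rng.choice(rng.choice(edges))
        edges.append((new, target))
        degree[new] = 1
        degree[target] += 1
    return degree

deg = grow_network(200)
print(max(deg.values()))  # the preferential component typically produces hubs
```

Mixing the two attachment rules is what lets such a model reproduce both the heavy-tailed and the more uniform parts of the degree distribution.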

  20. Using a Search Engine-Based Mutually Reinforcing Approach to Assess the Semantic Relatedness of Biomedical Terms

    PubMed Central

    Hsu, Yi-Yu; Chen, Hung-Yu; Kao, Hung-Yu

    2013-01-01

    Background Determining the semantic relatedness of two biomedical terms is an important task for many text-mining applications in the biomedical field. Previous studies, such as those using ontology-based and corpus-based approaches, measured semantic relatedness by using information from the structure of biomedical literature, but these methods are limited by the small size of training resources. To increase the size of training datasets, the outputs of search engines have been used extensively to analyze the lexical patterns of biomedical terms. Methodology/Principal Findings In this work, we propose the Mutually Reinforcing Lexical Pattern Ranking (ReLPR) algorithm for learning and exploring the lexical patterns of synonym pairs in biomedical text. ReLPR employs lexical patterns and their pattern containers to assess the semantic relatedness of biomedical terms. By combining sentence structures and the linking activities between containers and lexical patterns, our algorithm can explore the correlation between two biomedical terms. Conclusions/Significance The average correlation coefficient of the ReLPR algorithm was 0.82 for various datasets. The results of the ReLPR algorithm were significantly superior to those of previous methods. PMID:24348899
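The mutually reinforcing idea, lexical patterns and their containers lending each other scores, resembles a HITS-style power iteration, sketched here on an invented bipartite toy graph rather than the actual ReLPR formulation:

```python
# Bipartite links: which containers (e.g. sentence structures) hold which
# lexical patterns. Scores reinforce each other: a pattern is good if it
# appears in good containers, and a container is good if it holds good patterns.
links = {
    "container1": ["X, also known as Y", "X (Y)"],
    "container2": ["X, also known as Y"],
    "container3": ["X near Y"],
}

patterns = sorted({p for ps in links.values() for p in ps})
p_score = {p: 1.0 for p in patterns}

for _ in range(20):                  # power iteration until stable
    c_score = {c: sum(p_score[p] for p in ps) for c, ps in links.items()}
    p_score = {p: sum(s for c, s in c_score.items() if p in links[c])
               for p in patterns}
    norm = sum(p_score.values())     # L1-normalise to avoid blow-up
    p_score = {p: s / norm for p, s in p_score.items()}

best = max(p_score, key=p_score.get)
print(best)  # "X, also known as Y": it sits in the most (and best) containers
```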

  1. A semantic web framework to integrate cancer omics data with biological knowledge.

    PubMed

    Holford, Matthew E; McCusker, James P; Cheung, Kei-Hoi; Krauthammer, Michael

    2012-01-25

    The RDF triple provides a simple linguistic means of describing limitless types of information. Triples can be flexibly combined into a unified data source we call a semantic model. Semantic models open new possibilities for the integration of variegated biological data. We use Semantic Web technology to explicate high throughput clinical data in the context of fundamental biological knowledge. We have extended Corvus, a data warehouse which provides a uniform interface to various forms of Omics data, by providing a SPARQL endpoint. With the querying and reasoning tools made possible by the Semantic Web, we were able to explore quantitative semantic models retrieved from Corvus in the light of systematic biological knowledge. For this paper, we merged semantic models containing genomic, transcriptomic and epigenomic data from melanoma samples with two semantic models of functional data - one containing Gene Ontology (GO) data, the other, regulatory networks constructed from transcription factor binding information. These two semantic models were created in an ad hoc manner but support a common interface for integration with the quantitative semantic models. Such combined semantic models allow us to pose significant translational medicine questions. Here, we study the interplay between a cell's molecular state and its response to anti-cancer therapy by exploring the resistance of cancer cells to Decitabine, a demethylating agent. We were able to generate a testable hypothesis to explain how Decitabine fights cancer - namely, that it targets apoptosis-related gene promoters predominantly in Decitabine-sensitive cell lines, thus conveying its cytotoxic effect by activating the apoptosis pathway. Our research provides a framework whereby similar hypotheses can be developed easily.

  2. Text Mining the History of Medicine.

    PubMed

    Thompson, Paul; Batista-Navarro, Riza Theresa; Kontonatsios, Georgios; Carter, Jacob; Toon, Elizabeth; McNaught, John; Timmermann, Carsten; Worboys, Michael; Ananiadou, Sophia

    2016-01-01

    Historical text archives constitute a rich and diverse source of information, which is becoming increasingly readily accessible, due to large-scale digitisation efforts. However, it can be difficult for researchers to explore and search such large volumes of data in an efficient manner. Text mining (TM) methods can help, through their ability to recognise various types of semantic information automatically, e.g., instances of concepts (places, medical conditions, drugs, etc.), synonyms/variant forms of concepts, and relationships holding between concepts (which drugs are used to treat which medical conditions, etc.). TM analysis allows search systems to incorporate functionality such as automatic suggestions of synonyms of user-entered query terms, exploration of different concepts mentioned within search results or isolation of documents in which concepts are related in specific ways. However, applying TM methods to historical text can be challenging, owing to differences and evolution in vocabulary, terminology, language structure and style compared to more modern text. In this article, we present our efforts to overcome the various challenges faced in the semantic analysis of published historical medical text dating back to the mid 19th century. Firstly, we used evidence from diverse historical medical documents from different periods to develop new resources that provide accounts of the multiple, evolving ways in which concepts, their variants and relationships amongst them may be expressed. These resources were employed to support the development of a modular processing pipeline of TM tools for the robust detection of semantic information in historical medical documents with varying characteristics. We applied the pipeline to two large-scale medical document archives covering wide temporal ranges as the basis for the development of a publicly accessible semantically-oriented search system.
The novel resources are available for research purposes, while the processing pipeline and its modules may be used and configured within the Argo TM platform.

  3. Text Mining the History of Medicine

    PubMed Central

    Thompson, Paul; Batista-Navarro, Riza Theresa; Kontonatsios, Georgios; Carter, Jacob; Toon, Elizabeth; McNaught, John; Timmermann, Carsten; Worboys, Michael; Ananiadou, Sophia

    2016-01-01

    Historical text archives constitute a rich and diverse source of information, which is becoming increasingly readily accessible, due to large-scale digitisation efforts. However, it can be difficult for researchers to explore and search such large volumes of data in an efficient manner. Text mining (TM) methods can help, through their ability to recognise various types of semantic information automatically, e.g., instances of concepts (places, medical conditions, drugs, etc.), synonyms/variant forms of concepts, and relationships holding between concepts (which drugs are used to treat which medical conditions, etc.). TM analysis allows search systems to incorporate functionality such as automatic suggestions of synonyms of user-entered query terms, exploration of different concepts mentioned within search results or isolation of documents in which concepts are related in specific ways. However, applying TM methods to historical text can be challenging, owing to differences and evolution in vocabulary, terminology, language structure and style compared to more modern text. In this article, we present our efforts to overcome the various challenges faced in the semantic analysis of published historical medical text dating back to the mid 19th century. Firstly, we used evidence from diverse historical medical documents from different periods to develop new resources that provide accounts of the multiple, evolving ways in which concepts, their variants and relationships amongst them may be expressed. These resources were employed to support the development of a modular processing pipeline of TM tools for the robust detection of semantic information in historical medical documents with varying characteristics. We applied the pipeline to two large-scale medical document archives covering wide temporal ranges as the basis for the development of a publicly accessible semantically-oriented search system.
The novel resources are available for research purposes, while the processing pipeline and its modules may be used and configured within the Argo TM platform. PMID:26734936

  4. Construction of an annotated corpus to support biomedical information extraction

    PubMed Central

    Thompson, Paul; Iqbal, Syed A; McNaught, John; Ananiadou, Sophia

    2009-01-01

    Background Information Extraction (IE) is a component of text mining that facilitates knowledge discovery by automatically locating instances of interesting biomedical events from huge document collections. As events are usually centred on verbs and nominalised verbs, understanding the syntactic and semantic behaviour of these words is highly important. Corpora annotated with information concerning this behaviour can constitute a valuable resource in the training of IE components and resources. Results We have defined a new scheme for annotating sentence-bound gene regulation events, centred on both verbs and nominalised verbs. For each event instance, all participants (arguments) in the same sentence are identified and assigned a semantic role from a rich set of 13 roles tailored to biomedical research articles, together with a biological concept type linked to the Gene Regulation Ontology. To our knowledge, our scheme is unique within the biomedical field in terms of the range of event arguments identified. Using the scheme, we have created the Gene Regulation Event Corpus (GREC), consisting of 240 MEDLINE abstracts, in which events relating to gene regulation and expression have been annotated by biologists. A novel method of evaluating various different facets of the annotation task showed that average inter-annotator agreement rates fall within the range of 66% - 90%. Conclusion The GREC is a unique resource within the biomedical field, in that it annotates not only core relationships between entities, but also a range of other important details about these relationships, e.g., location, temporal, manner and environmental conditions. As such, it is specifically designed to support bio-specific tool and resource development. It has already been used to acquire semantic frames for inclusion within the BioLexicon (a lexical, terminological resource to aid biomedical text mining). 
Initial experiments have also shown that the corpus may viably be used to train IE components, such as semantic role labellers. The corpus and annotation guidelines are freely available for academic purposes. PMID:19852798

  5. The Function of Semantics in Automated Language Processing.

    ERIC Educational Resources Information Center

    Pacak, Milos; Pratt, Arnold W.

    This paper is a survey of some of the major semantic models that have been developed for automated semantic analysis of natural language. Current approaches to semantic analysis and logical inference are based mainly on models of human cognitive processes such as Quillian's semantic memory, Simmons' Protosynthex III and others. All existing…

  6. A new compact representation of morphological profiles: report on first massive VHR image processing at the JRC

    NASA Astrophysics Data System (ADS)

    Pesaresi, Martino; Ouzounis, Georgios K.; Gueguen, Lionel

    2012-06-01

    A new compact representation of differential morphological profile (DMP) vector fields is presented. It is referred to as the CSL model and is conceived to radically reduce the dimensionality of the DMP descriptors. The model maps three characteristic parameters, namely scale, saliency and level, into the RGB space through an HSV transform. The result is a medium-abstraction semantic layer used for visual exploration, image information mining and pattern classification. Fused with the PANTEX built-up presence index, the CSL model converges to an approximate building footprint representation layer in which color represents building class labels. This process is demonstrated on the first high resolution (HR) global human settlement layer (GHSL) computed from multi-modal HR and VHR satellite images. Results of the first massive processing exercise involving several thousand scenes around the globe are reported along with validation figures.
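The CSL-to-RGB mapping through an HSV transform can be sketched with the standard library; the normalisation ranges and the parameter-to-channel assignment below are assumptions for illustration, not necessarily the paper's exact mapping:

```python
import colorsys

def csl_to_rgb(scale, saliency, level, max_scale=10):
    """Map the three CSL parameters into an RGB pixel via HSV:
    scale drives hue, saliency drives saturation, level drives value
    (an assumed assignment, for illustration only)."""
    h = (scale / max_scale) % 1.0
    s = min(max(saliency, 0.0), 1.0)
    v = min(max(level, 0.0), 1.0)
    return tuple(round(c * 255) for c in colorsys.hsv_to_rgb(h, s, v))

# A small-scale, highly salient, bright structure maps to a saturated red pixel.
print(csl_to_rgb(scale=0, saliency=1.0, level=1.0))  # (255, 0, 0)
```

Collapsing a high-dimensional DMP into three HSV channels is what makes the layer both compact and directly viewable as an image.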

  7. GEM at 10: a decade's experience with the Guideline Elements Model.

    PubMed

    Hajizadeh, Negin; Kashyap, Nitu; Michel, George; Shiffman, Richard N

    2011-01-01

    The Guideline Elements Model (GEM) was developed in 2000 to organize the information contained in clinical practice guidelines using XML and to represent guideline content in a form that can be understood by human readers and processed by computers. In this work, we systematically reviewed the literature to better understand how GEM was being used, potential barriers to its use, and suggestions for improvement. Fifty external and twelve internally produced publications were identified and analyzed. GEM was used most commonly for modeling and ontology creation. Other investigators applied GEM for knowledge extraction and data mining, for clinical decision support, and for guideline generation. The GEM Cutter software, used to mark up guidelines for translation into XML, has been downloaded 563 times since 2000. Although many investigators found GEM to be valuable, others critiqued its failure to clarify guideline semantics, difficulties in markup, and the fact that GEM files are not usually executable.

  8. Using semantic data modeling techniques to organize an object-oriented database for extending the mass storage model

    NASA Technical Reports Server (NTRS)

    Campbell, William J.; Short, Nicholas M., Jr.; Roelofs, Larry H.; Dorfman, Erik

    1991-01-01

    A methodology for optimizing organization of data obtained by NASA earth and space missions is discussed. The methodology uses a concept based on semantic data modeling techniques implemented in a hierarchical storage model. The modeling is used to organize objects in mass storage devices, relational database systems, and object-oriented databases. The semantic data modeling at the metadata record level is examined, including the simulation of a knowledge base and semantic metadata storage issues. The semantic data model hierarchy and its application for efficient data storage is addressed, as is the mapping of the application structure to the mass storage.

  9. Mining Genotype-Phenotype Associations from Public Knowledge Sources via Semantic Web Querying

    PubMed Central

    Kiefer, Richard C.; Freimuth, Robert R.; Chute, Christopher G; Pathak, Jyotishman

    Gene Wiki Plus (GeneWiki+) and the Online Mendelian Inheritance in Man (OMIM) are publicly available resources for sharing information about disease-gene and gene-SNP associations in humans. While immensely useful to the scientific community, both resources are manually curated, which makes the data entry and publication process time-consuming and, to some degree, error-prone. To this end, this study investigates Semantic Web technologies to validate existing genotype-phenotype associations in GWP and OMIM and to potentially discover new ones. In particular, we demonstrate the applicability of SPARQL queries for identifying associations not explicitly stated for commonly occurring chronic diseases in GWP and OMIM, and report our preliminary findings on the coverage, completeness, and validity of the associations. Our results highlight the benefits of Semantic Web querying technology for validating existing disease-gene associations as well as identifying novel ones, although further evaluation and analysis are required before such information can be applied and used effectively. PMID:24303249
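    The kind of SPARQL querying described above boils down to joining triple patterns over a graph of associations. The following self-contained Python sketch implements that join over a toy triple store; all gene, SNP, and disease identifiers are invented for illustration and are not actual GeneWiki+ or OMIM terms.

```python
# Toy triple store of gene-SNP and gene-disease associations
# (invented identifiers, illustrative only).
triples = {
    ("rs7903146", "variantOf", "TCF7L2"),
    ("TCF7L2", "associatedWith", "type 2 diabetes"),
    ("APOE", "associatedWith", "Alzheimer disease"),
}

def query(patterns, triples):
    """Join triple patterns SPARQL-style; terms starting with '?' are variables."""
    bindings = [{}]
    for s, p, o in patterns:
        new = []
        for b in bindings:
            for ts, tp, to in triples:
                bb = dict(b)
                ok = True
                for var, val in ((s, ts), (p, tp), (o, to)):
                    if var.startswith("?"):
                        if bb.get(var, val) != val:
                            ok = False
                            break
                        bb[var] = val
                    elif var != val:
                        ok = False
                        break
                if ok:
                    new.append(bb)
        bindings = new
    return bindings

# Analogue of: SELECT ?snp ?disease WHERE
#   { ?snp variantOf ?gene . ?gene associatedWith ?disease }
rows = query([("?snp", "variantOf", "?gene"),
              ("?gene", "associatedWith", "?disease")], triples)
```

The two patterns share the `?gene` variable, so the join links a SNP to a disease only through a common gene, which is the shape of the associations mined in the study.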

  10. Semantic retrieval and navigation in clinical document collections.

    PubMed

    Kreuzthaler, Markus; Daumke, Philipp; Schulz, Stefan

    2015-01-01

    Patients with chronic diseases undergo numerous in- and outpatient treatment periods, and therefore many documents accumulate in their electronic records. We report on an on-going project focussing on the semantic enrichment of medical texts, in order to support recall-oriented navigation across a patient's complete documentation. A document pool of 1,696 de-identified discharge summaries was used for prototyping. A natural language processing toolset for document annotation (based on the text-mining framework UIMA) and indexing (Solr) was used to support a browser-based platform for document import, search and navigation. The integrated search engine combines free text and concept-based querying, supported by dynamically generated facets (diagnoses, procedures, medications, lab values, and body parts). The prototype demonstrates the feasibility of semantic document enrichment within document collections of a single patient. Originally conceived as an add-on for the clinical workplace, this technology could also be adapted to support personalised health record platforms, as well as cross-patient search for cohort building and other secondary use scenarios.

  11. Qualitative dynamics semantics for SBGN process description.

    PubMed

    Rougny, Adrien; Froidevaux, Christine; Calzone, Laurence; Paulevé, Loïc

    2016-06-16

    Qualitative dynamics semantics provide coarse-grained modeling of network dynamics by abstracting away kinetic parameters. They make it possible to capture general features of system dynamics, such as attractors or reachability properties, for which scalable analyses exist. The Systems Biology Graphical Notation Process Description language (SBGN-PD) has become a standard for representing reaction networks. However, no qualitative dynamics semantics taking into account all the main features available in SBGN-PD had been proposed so far. We propose two qualitative dynamics semantics for SBGN-PD reaction networks, namely the general semantics and the stories semantics, which we formalize using asynchronous automata networks. While the general semantics extends the standard Boolean semantics of reaction networks by taking into account all the main features of SBGN-PD, the stories semantics makes it possible to model several molecules of a network with a single variable. The resulting qualitative models can be checked against dynamical properties and therefore validated with respect to biological knowledge. We apply our framework to reason on the qualitative dynamics of a large network (more than 200 nodes) modeling the regulation of the cell cycle by RB/E2F. The proposed semantics provide a direct formalization of SBGN-PD networks as qualitative dynamical models that can be further analyzed using standard tools for discrete models. The dynamics under the stories semantics have a lower dimension than under the general semantics and prune multiple behaviors (which can be considered spurious) by enforcing mutual exclusiveness between the activity of different nodes of the same story. Overall, the qualitative semantics for SBGN-PD efficiently capture important dynamical features of reaction network models and can be exploited to further refine them.
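    To make the asynchronous-automata idea concrete, here is a minimal Python sketch of an asynchronous Boolean semantics for a toy three-node network. The update rules are invented and far simpler than anything SBGN-PD expresses, but they show the core mechanism: one node is updated at a time, and the reachable state space is explored breadth-first.

```python
# Toy Boolean network (invented rules): each node's next value is a
# function of the current global state.
rules = {
    "A": lambda s: s["C"],                 # A is produced when C is active
    "B": lambda s: s["A"] and not s["C"],  # B needs A and absence of C
    "C": lambda s: not s["B"],             # B inhibits C
}

def successors(state):
    """All states reachable in one asynchronous step (one node updated)."""
    out = []
    for node, rule in rules.items():
        new = dict(state)
        new[node] = rule(state)
        if new != state:
            out.append(new)
    return out

def reachable(init):
    """Breadth-first exploration of the asynchronous state space."""
    seen = {tuple(sorted(init.items()))}
    frontier = [init]
    while frontier:
        nxt = []
        for s in frontier:
            for t in successors(s):
                key = tuple(sorted(t.items()))
                if key not in seen:
                    seen.add(key)
                    nxt.append(t)
        frontier = nxt
    return seen

states = reachable({"A": False, "B": False, "C": True})
```

Reachability properties and attractors of the kind discussed above can be read off such an exploration: here the initial state leads to a single fixed point in which A and C are active.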

  12. City model enrichment

    NASA Astrophysics Data System (ADS)

    Smart, Philip D.; Quinn, Jonathan A.; Jones, Christopher B.

    The combination of mobile communication technology with location and orientation aware digital cameras has introduced increasing interest in the exploitation of 3D city models for applications such as augmented reality and automated image captioning. The effectiveness of such applications is, at present, severely limited by the often poor quality of semantic annotation of the 3D models. In this paper, we show how freely available sources of georeferenced Web 2.0 information can be used for automated enrichment of 3D city models. Point referenced names of prominent buildings and landmarks mined from Wikipedia articles and from the OpenStreetMaps digital map and Geonames gazetteer have been matched to the 2D ground plan geometry of a 3D city model. In order to address the ambiguities that arise in the associations between these sources and the city model, we present procedures to merge potentially related buildings and implement fuzzy matching between reference points and building polygons. An experimental evaluation demonstrates the effectiveness of the presented methods.

  13. A computational modeling of semantic knowledge in reading comprehension: Integrating the landscape model with latent semantic analysis.

    PubMed

    Yeari, Menahem; van den Broek, Paul

    2016-09-01

    It is a well-accepted view that the prior semantic (general) knowledge that readers possess plays a central role in reading comprehension. Nevertheless, computational models of reading comprehension have not integrated the simulation of semantic knowledge and online comprehension processes under a unified mathematical algorithm. The present article introduces a computational model that integrates the landscape model of comprehension processes with latent semantic analysis representation of semantic knowledge. In three sets of simulations of previous behavioral findings, the integrated model successfully simulated the activation and attenuation of predictive and bridging inferences during reading, as well as centrality estimations and recall of textual information after reading. Analyses of the computational results revealed new theoretical insights regarding the underlying mechanisms of the various comprehension phenomena.

  14. Assimilating Text-Mining & Bio-Informatics Tools to Analyze Cellulase structures

    NASA Astrophysics Data System (ADS)

    Satyasree, K. P. N. V., Dr; Lalitha Kumari, B., Dr; Jyotsna Devi, K. S. N. V.; Choudri, S. M. Roy; Pratap Joshi, K.

    2017-08-01

    Text mining is one of the most promising ways of automatically extracting information from the huge biological literature. To exploit its potential, the knowledge encoded in the text should be converted to some semantic representation, such as entities and relations, which can be analyzed by machines; however, large-scale practical systems for this purpose are rare. Text mining can nevertheless be helpful for generating or validating predictions. Cellulases, the enzymes that degrade cellulose, have abundant applications in various industries. Cellulase-producing organisms, the bacterium Bacillus subtilis and the fungus Pseudomonas putida, were isolated from the topsoil of Guntur District, A.P., India, and pure cultures were conserved on potato dextrose agar medium for molecular studies. In this paper, we present how text-mining concepts can be used to analyze cellulase-producing bacteria and fungi; their structures are also compared with the aid of well-established, high-quality standard bioinformatics tools such as Bioedit, Swissport, Protparam, and EMBOSSwin, from which complete data on cellulase structure and enzyme constituents were obtained.

  15. Unsupervised user similarity mining in GSM sensor networks.

    PubMed

    Shad, Shafqat Ali; Chen, Enhong

    2013-01-01

    Mobility data has attracted researchers in the past few years because of its rich context and spatiotemporal nature; this information can be used for potential applications such as early warning systems, route prediction, traffic management, advertisement, social networking, and community finding. All of these applications are based on mobility profile building and user trend analysis, where mobility profile building is done through significant-place extraction, prediction of the user's actual movement, and context awareness. However, significant-place extraction and movement prediction for mobility profile building are non-trivial tasks. In this paper, we present a user similarity mining methodology based on an unsupervised clustering approach, which builds user mobility profiles from the semantic tagging information provided by users and basic GSM network architecture properties. As the mobility information is in low-level raw form, our proposed methodology successfully converts it to high-level meaningful information by using cell-Id location information, rather than the previously used location capturing methods such as GPS, infrared, and Wi-Fi, for profile mining and user similarity mining.

  16. A semantic web framework to integrate cancer omics data with biological knowledge

    PubMed Central

    2012-01-01

    Background The RDF triple provides a simple linguistic means of describing limitless types of information. Triples can be flexibly combined into a unified data source we call a semantic model. Semantic models open new possibilities for the integration of variegated biological data. We use Semantic Web technology to explicate high throughput clinical data in the context of fundamental biological knowledge. We have extended Corvus, a data warehouse which provides a uniform interface to various forms of Omics data, by providing a SPARQL endpoint. With the querying and reasoning tools made possible by the Semantic Web, we were able to explore quantitative semantic models retrieved from Corvus in the light of systematic biological knowledge. Results For this paper, we merged semantic models containing genomic, transcriptomic and epigenomic data from melanoma samples with two semantic models of functional data - one containing Gene Ontology (GO) data, the other, regulatory networks constructed from transcription factor binding information. These two semantic models were created in an ad hoc manner but support a common interface for integration with the quantitative semantic models. Such combined semantic models allow us to pose significant translational medicine questions. Here, we study the interplay between a cell's molecular state and its response to anti-cancer therapy by exploring the resistance of cancer cells to Decitabine, a demethylating agent. Conclusions We were able to generate a testable hypothesis to explain how Decitabine fights cancer - namely, that it targets apoptosis-related gene promoters predominantly in Decitabine-sensitive cell lines, thus conveying its cytotoxic effect by activating the apoptosis pathway. Our research provides a framework whereby similar hypotheses can be developed easily. PMID:22373303
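    The merging of "semantic models" described above amounts to taking the union of triple sets and querying across them. The sketch below illustrates that idea in plain Python, combining a quantitative model with a functional (GO-style) model; all gene names, properties, and values are invented and do not reflect the actual Corvus schema.

```python
# Quantitative semantic model (invented identifiers): methylation and
# expression observations as subject-predicate-object triples.
expression_model = {
    ("geneX", "expressionIn", "melanoma_sample_1"),
    ("geneX", "hasPromoterMethylation", "high"),
}

# Functional semantic model: GO-style annotations.
go_model = {
    ("geneX", "annotatedWith", "GO:apoptotic_process"),
}

# Merging two semantic models is a set union of their triples.
merged = expression_model | go_model

# Cross-model query: genes with high promoter methylation that are
# annotated to the apoptotic process.
hits = sorted(
    s for (s, p, o) in merged
    if p == "hasPromoterMethylation" and o == "high"
    and (s, "annotatedWith", "GO:apoptotic_process") in merged
)
```

In the actual study this kind of cross-model query is posed in SPARQL against the Corvus endpoint; the point here is only that triples from heterogeneous sources compose into one queryable model with no schema alignment beyond shared identifiers.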

  17. Evidence for the contribution of a threshold retrieval process to semantic memory.

    PubMed

    Kempnich, Maria; Urquhart, Josephine A; O'Connor, Akira R; Moulin, Chris J A

    2017-10-01

    It is widely held that episodic retrieval can recruit two processes: a threshold context retrieval process (recollection) and a continuous signal strength process (familiarity). Conversely the processes recruited during semantic retrieval are less well specified. We developed a semantic task analogous to single-item episodic recognition to interrogate semantic recognition receiver-operating characteristics (ROCs) for a marker of a threshold retrieval process. We fitted observed ROC points to three signal detection models: two models typically used in episodic recognition (unequal variance and dual-process signal detection models) and a novel dual-process recollect-to-reject (DP-RR) signal detection model that allows a threshold recollection process to aid both target identification and lure rejection. Given the nature of most semantic questions, we anticipated the DP-RR model would best fit the semantic task data. Experiment 1 (506 participants) provided evidence for a threshold retrieval process in semantic memory, with overall best fits to the DP-RR model. Experiment 2 (316 participants) found within-subjects estimates of episodic and semantic threshold retrieval to be uncorrelated. Our findings add weight to the proposal that semantic and episodic memory are served by similar dual-process retrieval systems, though the relationship between the two threshold processes needs to be more fully elucidated.

  18. Analysis of the human diseasome using phenotype similarity between common, genetic, and infectious diseases

    NASA Astrophysics Data System (ADS)

    Hoehndorf, Robert; Schofield, Paul N.; Gkoutos, Georgios V.

    2015-06-01

    Phenotypes are the observable characteristics of an organism arising from its response to the environment. Phenotypes associated with engineered and natural genetic variation are widely recorded using phenotype ontologies in model organisms, as are signs and symptoms of human Mendelian diseases in databases such as OMIM and Orphanet. Exploiting these resources, several computational methods have been developed for integration and analysis of phenotype data to identify the genetic etiology of diseases or suggest plausible interventions. A similar resource would be highly useful not only for rare and Mendelian diseases, but also for common, complex and infectious diseases. We apply a semantic text-mining approach to identify the phenotypes (signs and symptoms) associated with over 6,000 diseases. We evaluate our text-mined phenotypes by demonstrating that they can correctly identify known disease-associated genes in mice and humans with high accuracy. Using a phenotypic similarity measure, we generate a human disease network in which diseases that have similar signs and symptoms cluster together, and we use this network to identify closely related diseases based on common etiological, anatomical as well as physiological underpinnings.

  19. A predictive framework for evaluating models of semantic organization in free recall

    PubMed Central

    Morton, Neal W; Polyn, Sean M.

    2016-01-01

    Research in free recall has demonstrated that semantic associations reliably influence the organization of search through episodic memory. However, the specific structure of these associations and the mechanisms by which they influence memory search remain unclear. We introduce a likelihood-based model-comparison technique, which embeds a model of semantic structure within the context maintenance and retrieval (CMR) model of human memory search. Within this framework, model variants are evaluated in terms of their ability to predict the specific sequence in which items are recalled. We compare three models of semantic structure: latent semantic analysis (LSA), global vectors (GloVe), and word association spaces (WAS), and find that models using WAS have the greatest predictive power. Furthermore, we find evidence that semantic and temporal organization is driven by distinct item and context cues, rather than a single context cue. This finding provides an important constraint for theories of memory search. PMID:28331243

  20. Empirical study using network of semantically related associations in bridging the knowledge gap.

    PubMed

    Abedi, Vida; Yeasin, Mohammed; Zand, Ramin

    2014-11-27

    The data overload has created a new set of challenges in finding meaningful and relevant information with minimal cognitive effort. However, designing robust and scalable knowledge discovery systems remains a challenge. Recent innovations in (biological) literature mining tools have opened new avenues for understanding the confluence of various diseases, genes, risk factors, and biological processes, and for bridging the gaps between the massive amounts of scientific data and harvesting useful knowledge. In this paper, we highlight some of the findings using a text analytics tool called ARIANA (Adaptive Robust and Integrative Analysis for finding Novel Associations). An empirical study using ARIANA reveals knowledge discovery instances that illustrate the efficacy of such a tool. For example, ARIANA can capture the connection between the drug hexamethonium and pulmonary inflammation and fibrosis that caused the tragic death of a healthy volunteer in a 2001 Johns Hopkins asthma study, even though the abstract of the study was not part of the semantic model. An integrated system such as ARIANA could assist the human expert in exploratory literature search by bringing forward hidden associations, promoting data reuse and knowledge discovery, and stimulating interdisciplinary projects by connecting information across disciplines.

  1. Modelling Metamorphism by Abstract Interpretation

    NASA Astrophysics Data System (ADS)

    Dalla Preda, Mila; Giacobazzi, Roberto; Debray, Saumya; Coogan, Kevin; Townsend, Gregg M.

    Metamorphic malware applies semantics-preserving transformations to its own code in order to foil detection systems based on signature matching. In this paper we consider the problem of automatically extracting metamorphic signatures from such malware. We introduce a semantics for self-modifying code, called phase semantics, and prove its correctness by showing that it is an abstract interpretation of the standard trace semantics. Phase semantics precisely models metamorphic code behavior by providing the set of traces of programs that correspond to the possible evolutions of the metamorphic code during execution. We show that metamorphic signatures can be automatically extracted by abstract interpretation of the phase semantics, and that regular metamorphism can be modelled as a finite-state-automata abstraction of the phase semantics.

  2. A Tri-network Model of Human Semantic Processing

    PubMed Central

    Xu, Yangwen; He, Yong; Bi, Yanchao

    2017-01-01

    Humans process the meaning of the world via both verbal and nonverbal modalities. It has been established that widely distributed cortical regions are involved in semantic processing, yet the global wiring pattern of this brain system has not been considered in the current neurocognitive semantic models. We review evidence from the brain-network perspective, which shows that the semantic system is topologically segregated into three brain modules. Revisiting previous region-based evidence in light of these new network findings, we postulate that these three modules support multimodal experiential representation, language-supported representation, and semantic control. A tri-network neurocognitive model of semantic processing is proposed, which generates new hypotheses regarding the network basis of different types of semantic processes. PMID:28955266

  3. Performance impact of stop lists and morphological decomposition on word-word corpus-based semantic space models.

    PubMed

    Keith, Jeff; Westbury, Chris; Goldman, James

    2015-09-01

    Corpus-based semantic space models, which primarily rely on lexical co-occurrence statistics, have proven effective in modeling and predicting human behavior in a number of experimental paradigms that explore semantic memory representation. The most widely studied extant models, however, are strongly influenced by orthographic word frequency (e.g., Shaoul & Westbury, Behavior Research Methods, 38, 190-195, 2006). This has the implication that high-frequency closed-class words can potentially bias co-occurrence statistics. Because these closed-class words are purported to carry primarily syntactic, rather than semantic, information, the performance of corpus-based semantic space models may be improved by excluding closed-class words (using stop lists) from co-occurrence statistics, while retaining their syntactic information through other means (e.g., part-of-speech tagging and/or affixes from inflected word forms). Additionally, very little work has been done to explore the effect of employing morphological decomposition on the inflected forms of words in corpora prior to compiling co-occurrence statistics, despite (controversial) evidence that humans perform early morphological decomposition in semantic processing. In this study, we explored the impact of these factors on corpus-based semantic space models. From this study, morphological decomposition appears to significantly improve performance in word-word co-occurrence semantic space models, providing some support for the claim that sublexical information (specifically, word morphology) plays a role in lexical semantic processing. An overall decrease in performance was observed in models employing stop lists (e.g., excluding closed-class words). Furthermore, we found some evidence that weakens the claim that closed-class words supply primarily syntactic information in word-word co-occurrence semantic space models.
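    The manipulation described above, compiling word-word co-occurrence counts with and without a stop list, can be sketched in a few lines of Python. The corpus, stop list, and window size below are toy examples, not the materials used in the study.

```python
from collections import Counter

# Toy stop list of high-frequency closed-class words.
STOP = {"the", "a", "of", "and", "is"}

def cooccurrence(sentences, window=2, stop=None):
    """Count word-word co-occurrences within a forward window,
    optionally excluding stop-list words before counting."""
    counts = Counter()
    for sent in sentences:
        tokens = [w for w in sent.lower().split()
                  if not stop or w not in stop]
        for i, w in enumerate(tokens):
            for v in tokens[i + 1 : i + 1 + window]:
                counts[tuple(sorted((w, v)))] += 1  # unordered pair
    return counts

corpus = ["the dog chased the cat", "a cat and a dog played"]
with_stop = cooccurrence(corpus, stop=STOP)
without = cooccurrence(corpus)
```

With the stop list, closed-class words such as "the" no longer contribute pairs, so content-word pairs carry all the statistical weight, which is exactly the trade-off the study evaluates.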

  4. The Use of a Context-Based Information Retrieval Technique

    DTIC Science & Technology

    2009-07-01

    provided in context. Latent Semantic Analysis (LSA) is a statistical technique for inferring contextual and structural information, and previous studies ... LSA, which is also known as latent semantic indexing (LSI), uses a statistical ... In contrast, natural language models apply algorithms that combine statistical information with semantic information.

  5. Interconnected growing self-organizing maps for auditory and semantic acquisition modeling.

    PubMed

    Cao, Mengxue; Li, Aijun; Fang, Qiang; Kaufmann, Emily; Kröger, Bernd J

    2014-01-01

    Based on the incremental nature of knowledge acquisition, in this study we propose a growing self-organizing neural network approach for modeling the acquisition of auditory and semantic categories. We introduce an Interconnected Growing Self-Organizing Maps (I-GSOM) algorithm, which takes associations between auditory information and semantic information into consideration. Direct phonetic-semantic association is simulated in order to model language acquisition in its early phases, such as the babbling and imitation stages, in which no phonological representations exist. Based on the I-GSOM algorithm, we conducted experiments using paired acoustic and semantic training data. We use a cyclical reinforcing and reviewing training procedure to model the teaching and learning process between children and their communication partners. A reinforcing-by-link training procedure and a link-forgetting procedure are introduced to model the acquisition of associative relations between auditory and semantic information. Experimental results indicate that (1) I-GSOM learns well the auditory and semantic categories presented in the training data; (2) clear auditory and semantic boundaries can be found in the network representation; (3) cyclical reinforcing and reviewing training leads to a detailed categorization as well as to a detailed clustering, while keeping the clusters that have already been learned and the network structure that has already been developed stable; and (4) reinforcing-by-link training leads to well-perceived auditory-semantic associations. Our I-GSOM model suggests that it is important to associate auditory information with semantic information during language acquisition. Despite its high level of abstraction, our I-GSOM approach can be interpreted as a biologically inspired neurocomputational model.
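    At the core of any GSOM-style model is the basic self-organizing-map update: find the best-matching node and pull it and its lattice neighbors toward the input. The pure-Python sketch below shows only that core step; the growing mechanism and the auditory-semantic association links of I-GSOM are omitted, and the map layout and data are invented for illustration.

```python
# A 1-D map of 4 nodes, initialized deterministically along the diagonal
# of the 2-D input space (illustrative setup, not the I-GSOM topology).
nodes = [[i / 3, i / 3] for i in range(4)]

def winner(x):
    """Index of the node closest to input x (the best-matching unit)."""
    return min(range(len(nodes)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(nodes[i], x)))

def train(data, epochs=50, lr=0.3, radius=1):
    """Classic SOM training: move the winner and its lattice
    neighbors a fraction lr of the way toward each input."""
    for _ in range(epochs):
        for x in data:
            w = winner(x)
            for i, node in enumerate(nodes):
                if abs(i - w) <= radius:  # neighborhood on the 1-D lattice
                    for d in range(len(node)):
                        node[d] += lr * (x[d] - node[d])

# Two well-separated input "categories".
train([[0.05, 0.05], [0.1, 0.0], [0.9, 0.95], [1.0, 0.9]])
```

After training, inputs from the two categories activate different ends of the map, which is the sense in which "clear boundaries can be found in the network representation" above.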

  6. Concept-oriented indexing of video databases: toward semantic sensitive retrieval and browsing.

    PubMed

    Fan, Jianping; Luo, Hangzai; Elmagarmid, Ahmed K

    2004-07-01

    Digital video now plays an important role in medical education, health care, telemedicine and other medical applications. Several content-based video retrieval (CBVR) systems have been proposed in the past, but they still suffer from the following challenging problems: semantic gap, semantic video concept modeling, semantic video classification, and concept-oriented video database indexing and access. In this paper, we propose a novel framework that makes progress toward solving these problems. Specifically, the framework includes: 1) a semantic-sensitive video content representation framework by using principal video shots to enhance the quality of features; 2) semantic video concept interpretation by using flexible mixture model to bridge the semantic gap; 3) a novel semantic video-classifier training framework by integrating feature selection, parameter estimation, and model selection seamlessly in a single algorithm; and 4) a concept-oriented video database organization technique through a certain domain-dependent concept hierarchy to enable semantic-sensitive video retrieval and browsing.

  7. Graph-Theoretic Properties of Networks Based on Word Association Norms: Implications for Models of Lexical Semantic Memory.

    PubMed

    Gruenenfelder, Thomas M; Recchia, Gabriel; Rubin, Tim; Jones, Michael N

    2016-08-01

    We compared the ability of three different contextual models of lexical semantic memory (BEAGLE, Latent Semantic Analysis, and the Topic model) and of a simple associative model (POC) to predict the properties of semantic networks derived from word association norms. None of the semantic models were able to accurately predict all of the network properties. All three contextual models over-predicted clustering in the norms, whereas the associative model under-predicted clustering. Only a hybrid model that assumed that some of the responses were based on a contextual model and others on an associative network (POC) successfully predicted all of the network properties and predicted a word's top five associates as well as or better than the better of the two constituent models. The results suggest that participants switch between a contextual representation and an associative network when generating free associations. We discuss the role that each of these representations may play in lexical semantic memory. Concordant with recent multicomponent theories of semantic memory, the associative network may encode coordinate relations between concepts (e.g., the relation between pea and bean, or between sparrow and robin), and contextual representations may be used to process information about more abstract concepts.

  8. Efficient Results in Semantic Interoperability for Health Care. Findings from the Section on Knowledge Representation and Management.

    PubMed

    Soualmia, L F; Charlet, J

    2016-11-10

    To summarize excellent current research in the field of Knowledge Representation and Management (KRM) within the health and medical care domain. We provide a synopsis of the 2016 IMIA selected articles as well as a related synthetic overview of current and future field activities. A first step of the selection was performed through MEDLINE querying with a list of MeSH descriptors completed by a list of terms adapted to the KRM section. The second step of the selection was completed by the two section editors, who separately evaluated the set of 1,432 articles. The third step of the selection consisted of a collective work that merged the evaluation results to retain 15 articles for peer review. The selection and evaluation process of this Yearbook's section on Knowledge Representation and Management has yielded four excellent and interesting articles regarding semantic interoperability for health care, achieved by gathering heterogeneous sources (knowledge and data) and auditing ontologies. In the first article, the authors present a solution based on standards and Semantic Web technologies to access distributed and heterogeneous datasets in the domain of breast cancer clinical trials. The second article describes a knowledge-based recommendation system that relies on ontologies and Semantic Web rules in the context of dietary management of chronic diseases. The third article is related to concept recognition and text mining to derive a common model of human diseases and a phenotypic network of common diseases. In the fourth article, the authors highlight the need for auditing SNOMED CT and propose to use a crowd-based method for ontology engineering. The current research activities further illustrate the continuous convergence of Knowledge Representation and Medical Informatics, with a focus this year on dedicated tools and methods to advance clinical care by proposing solutions to cope with the problem of semantic interoperability. Indeed, there is a need for powerful tools able to manage and interpret complex, large-scale, and distributed datasets and knowledge bases, but also a need for user-friendly tools developed for clinicians in their daily practice.

  9. Phrase Mining of Textual Data to Analyze Extracellular Matrix Protein Patterns Across Cardiovascular Disease.

    PubMed

    Liem, David Alexandre; Murali, Sanjana; Sigdel, Dibakar; Shi, Yu; Wang, Xuan; Shen, Jiaming; Choi, Howard; Caufield, J Harry; Wang, Wei; Ping, Peipei; Han, Jiawei

    2018-05-18

    Extracellular matrix (ECM) proteins have been shown to play important roles regulating multiple biological processes in an array of organ systems, including the cardiovascular system. By using a novel bioinformatics text-mining tool, we studied six categories of cardiovascular disease (CVD), namely ischemic heart disease (IHD), cardiomyopathies (CM), cerebrovascular accident (CVA), congenital heart disease (CHD), arrhythmias (ARR), and valve disease (VD), anticipating novel ECM protein-disease and protein-protein relationships hidden within vast quantities of textual data. We conducted a phrase-mining analysis, delineating the relationships of 709 ECM proteins with the six groups of CVDs reported in 1,099,254 abstracts. The technology pipeline known as Context-aware Semantic Online Analytical Processing (CaseOLAP) was applied to semantically rank the association of proteins with each of the six CVDs individually and with all six jointly, performing analyses to quantify each protein-disease relationship. We performed principal component analysis and hierarchical clustering of the data, where each protein is visualized as a six-dimensional vector. We found that ECM proteins display variable degrees of association with the six CVDs; certain CVDs share groups of associated proteins whereas others have divergent protein associations. We identified 82 ECM proteins sharing associations with all six CVDs. Our bioinformatics analysis ascribed distinct ECM pathways (via Reactome) from this subset of proteins, namely insulin-like growth factor regulation and interleukin-4 and interleukin-13 signaling, suggesting their contribution to the pathogenesis of all six CVDs. Finally, we performed hierarchical clustering analysis and identified protein clusters associated with a targeted CVD; these analyses revealed unexpected insights into the role of the ECM in the pathogenesis of CVDs.
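    The representation described above, each protein as a six-dimensional vector of association scores over the six CVD categories, can be illustrated with a small sketch. The protein names and scores below are invented for illustration; the sketch ranks proteins by cosine similarity of their disease profiles, the kind of comparison that underlies the clustering analyses.

```python
import math

# Invented association scores over (IHD, CM, CVA, CHD, ARR, VD);
# each protein is a point in a six-dimensional space.
proteins = {
    "collagen_I":  [0.9, 0.8, 0.2, 0.3, 0.2, 0.7],
    "fibronectin": [0.8, 0.9, 0.3, 0.2, 0.3, 0.6],
    "laminin":     [0.1, 0.2, 0.9, 0.1, 0.8, 0.1],
}

def cosine(u, v):
    """Cosine similarity between two score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Rank proteins by similarity of disease profile to collagen_I.
ref = proteins["collagen_I"]
ranked = sorted((p for p in proteins if p != "collagen_I"),
                key=lambda p: cosine(proteins[p], ref), reverse=True)
```

Proteins with similar profiles across the six diseases end up adjacent under such a metric, which is what hierarchical clustering then groups into disease-associated clusters.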

  10. A health analytics semantic ETL service for obesity surveillance.

    PubMed

    Poulymenopoulou, M; Papakonstantinou, D; Malamateniou, F; Vassilacopoulos, G

    2015-01-01

    The increasingly large amount of data produced in healthcare (e.g. collected through health information systems such as electronic medical records - EMRs or collected through novel data sources such as personal health records - PHRs, social media, web resources) enables the creation of detailed records about people's health, sentiments and activities (e.g. physical activity, diet, sleep quality) that can be used in the public health area among others. However, despite the transformative potential of big data in public health surveillance, there are several challenges in integrating big data. In this paper, the interoperability challenge is tackled and a semantic Extract Transform Load (ETL) service is proposed that seeks to semantically annotate big data, turning them into valuable data for analysis. This service is considered as part of a health analytics engine on the cloud that interacts with existing healthcare information exchange networks, like the Integrating the Healthcare Enterprise (IHE), PHRs, sensors, mobile applications, and other web resources to retrieve patient health, behavioral and daily activity data. The semantic ETL service aims at semantically integrating big data for use by analytic mechanisms. An illustrative implementation of the service on big data potentially relevant to human obesity enables the use of appropriate analytic techniques (e.g. machine learning, text mining) that are expected to assist in identifying patterns and contributing factors (e.g. genetic background, social, environmental) for this social phenomenon and, hence, drive health policy changes and promote healthy behaviors where residents live, work, learn, shop and play.
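
    A minimal sketch of the transform stage of such a semantic ETL service, assuming a made-up vocabulary mapping (the `obo:`-style terms below are illustrative placeholders, not real ontology identifiers):

```python
# Toy extract-transform-load step in the spirit of the semantic ETL
# service: raw activity records are annotated with terms from a small
# made-up vocabulary before being loaded for analysis.
VOCAB = {"steps": "obo:PhysicalActivity", "sleep_hours": "obo:SleepQuality"}

def transform(record):
    """Rename raw field names to (hypothetical) ontology terms."""
    return {VOCAB.get(key, key): value for key, value in record.items()}

raw = [{"steps": 8000, "sleep_hours": 7}]
annotated = [transform(r) for r in raw]
```

    Downstream analytics can then query by ontology term rather than by each source's private field names, which is the interoperability gain the paper targets.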

  11. Interconnected growing self-organizing maps for auditory and semantic acquisition modeling

    PubMed Central

    Cao, Mengxue; Li, Aijun; Fang, Qiang; Kaufmann, Emily; Kröger, Bernd J.

    2014-01-01

    Based on the incremental nature of knowledge acquisition, in this study we propose a growing self-organizing neural network approach for modeling the acquisition of auditory and semantic categories. In this paper, we introduce an Interconnected Growing Self-Organizing Maps (I-GSOM) algorithm that takes associations between auditory information and semantic information into consideration. Direct phonetic–semantic association is simulated in order to model language acquisition in its early phases, such as the babbling and imitation stages, in which no phonological representations exist. Based on the I-GSOM algorithm, we conducted experiments using paired acoustic and semantic training data. We use a cyclical reinforcing and reviewing training procedure to model the teaching and learning process between children and their communication partners. A reinforcing-by-link training procedure and a link-forgetting procedure are introduced to model the acquisition of associative relations between auditory and semantic information. Experimental results indicate that (1) I-GSOM has good ability to learn auditory and semantic categories presented within the training data; (2) clear auditory and semantic boundaries can be found in the network representation; (3) cyclical reinforcing and reviewing training leads to a detailed categorization as well as to a detailed clustering, while keeping the clusters that have already been learned and the network structure that has already been developed stable; and (4) reinforcing-by-link training leads to well-perceived auditory–semantic associations. Our I-GSOM model suggests that it is important to associate auditory information with semantic information during language acquisition. Despite its high level of abstraction, our I-GSOM approach can be interpreted as a biologically-inspired neurocomputational model. PMID:24688478
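
    The core competitive-learning update shared by self-organizing-map variants such as I-GSOM can be sketched as below; this is a plain SOM step and deliberately omits the growing and map-interconnection mechanisms that distinguish the paper's algorithm.

```python
import numpy as np

# One self-organizing-map training step: find the best-matching unit
# (BMU) for an input, then pull the BMU and its grid neighbours toward
# the input, weighted by a Gaussian neighbourhood function.
rng = np.random.default_rng(1)
weights = rng.random((5, 5, 3))   # 5x5 map of 3-dim prototype vectors
x = rng.random(3)                 # one input (e.g. an acoustic feature)

dists = np.linalg.norm(weights - x, axis=2)
bmu = np.unravel_index(np.argmin(dists), dists.shape)

lr, sigma = 0.5, 1.0
rows, cols = np.indices((5, 5))
grid_dist2 = (rows - bmu[0]) ** 2 + (cols - bmu[1]) ** 2
h = np.exp(-grid_dist2 / (2 * sigma ** 2))
weights += lr * h[..., None] * (x - weights)
```

    In the paper's setting, one such map is trained on acoustic inputs, another on semantic inputs, and learned links between their units carry the auditory-semantic associations.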

  12. Ontology driven modeling for the knowledge of genetic susceptibility to disease.

    PubMed

    Lin, Yu; Sakamoto, Norihiro

    2009-05-12

    For machine-assisted exploration of the relationships between genetic factors and complex diseases, a well-structured conceptual framework of the background knowledge is needed. However, because of the complexity of determining a genetic susceptibility factor, there is no formalization of the knowledge of genetic susceptibility to disease, which makes interoperability between systems impossible. Thus, the ontology modeling language OWL was used for formalization in this paper. After introducing the Semantic Web and the OWL language promoted by the W3C, we applied text-mining technology combined with competency questions to specify the classes of the ontology. Then, an N-ary pattern was adopted to describe the relationships among these defined classes. Based on the former work on OGSF-DM (Ontology of Genetic Susceptibility Factors to Diabetes Mellitus), we formalized the definitions of "Genetic Susceptibility", "Genetic Susceptibility Factor" and other classes using the OWL-DL modeling language, and a reasoner automatically performed the classification of the class "Genetic Susceptibility Factor". Ontology-driven modeling is used to formalize the knowledge of genetic susceptibility to complex diseases. More importantly, once a class has been completely formalized in an ontology, an OWL reasoner can automatically compute the classification of the class, in our case the class "Genetic Susceptibility Factor". As more types of genetic susceptibility factors are obtained from laboratory research, our ontologies will always need to be refined, and many new classes must be taken into account to harmonize with the ontologies. Applying these ontologies to build Semantic Web applications remains future work.

  13. Semantic attributes are encoded in human electrocorticographic signals during visual object recognition.

    PubMed

    Rupp, Kyle; Roos, Matthew; Milsap, Griffin; Caceres, Carlos; Ratto, Christopher; Chevillet, Mark; Crone, Nathan E; Wolmetz, Michael

    2017-03-01

    Non-invasive neuroimaging studies have shown that semantic category and attribute information are encoded in neural population activity. Electrocorticography (ECoG) offers several advantages over non-invasive approaches, but the degree to which semantic attribute information is encoded in ECoG responses is not known. We recorded ECoG while patients named objects from 12 semantic categories and then trained high-dimensional encoding models to map semantic attributes to spectral-temporal features of the task-related neural responses. Using these semantic attribute encoding models, untrained objects were decoded with accuracies comparable to whole-brain functional Magnetic Resonance Imaging (fMRI), and we observed that high-gamma activity (70-110Hz) at basal occipitotemporal electrodes was associated with specific semantic dimensions (manmade-animate, canonically large-small, and places-tools). Individual patient results were in close agreement with reports from other imaging modalities on the time course and functional organization of semantic processing along the ventral visual pathway during object recognition. The semantic attribute encoding model approach is critical for decoding objects absent from a training set, as well as for studying complex semantic encodings without artificially restricting stimuli to a small number of semantic categories. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
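
    A toy version of the encoding-model-plus-decoding procedure described above, with synthetic data standing in for the semantic attribute vectors and the spectral-temporal neural features; the sizes, noise level, and regularization strength are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Learn a linear map from semantic attributes to (synthetic) neural
# features, then decode a held-out object by matching its observed
# features to the features predicted for each candidate object.
rng = np.random.default_rng(2)
n_objects, n_attrs, n_feats = 40, 8, 20
attrs = rng.random((n_objects, n_attrs))            # semantic attributes
true_map = rng.random((n_attrs, n_feats))
neural = attrs @ true_map + 0.01 * rng.standard_normal((n_objects, n_feats))

# Train the encoding model on all objects except the last one.
model = Ridge(alpha=0.1).fit(attrs[:-1], neural[:-1])

# Decode the held-out object: among all candidates, the one whose
# predicted features lie closest to the observed response wins.
pred = model.predict(attrs)
dists = np.linalg.norm(pred - neural[-1], axis=1)
decoded = int(np.argmin(dists))
```

    This is why the approach generalizes to untrained objects: the model maps attributes, not object identities, so any object with known attributes yields a predicted response to match against.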

  14. Semantic web data warehousing for caGrid.

    PubMed

    McCusker, James P; Phillips, Joshua A; González Beltrán, Alejandra; Finkelstein, Anthony; Krauthammer, Michael

    2009-10-01

    The National Cancer Institute (NCI) is developing caGrid as a means for sharing cancer-related data and services. As more data sets become available on caGrid, we need effective ways of accessing and integrating this information. Although the data models exposed on caGrid are semantically well annotated, it is currently up to the caGrid client to infer relationships between the different models and their classes. In this paper, we present a Semantic Web-based data warehouse (Corvus) for creating relationships among caGrid models. This is accomplished through the transformation of semantically-annotated caBIG Unified Modeling Language (UML) information models into Web Ontology Language (OWL) ontologies that preserve those semantics. We demonstrate the validity of the approach by Semantic Extraction, Transformation and Loading (SETL) of data from two caGrid data sources, caTissue and caArray, as well as alignment and query of those sources in Corvus. We argue that semantic integration is necessary for integration of data from distributed web services and that Corvus is a useful way of accomplishing this. Our approach is generalizable and of broad utility to researchers facing similar integration challenges.

  15. Semantic enrichment of clinical models towards semantic interoperability. The heart failure summary use case.

    PubMed

    Martínez-Costa, Catalina; Cornet, Ronald; Karlsson, Daniel; Schulz, Stefan; Kalra, Dipak

    2015-05-01

    To improve semantic interoperability of electronic health records (EHRs) by ontology-based mediation across syntactically heterogeneous representations of the same or similar clinical information. Our approach is based on a semantic layer that consists of: (1) a set of ontologies supported by (2) a set of semantic patterns. The first aspect of the semantic layer helps standardize the clinical information modeling task and the second shields modelers from the complexity of ontology modeling. We applied this approach to heterogeneous representations of an excerpt of a heart failure summary. Using a set of finite top-level patterns to derive semantic patterns, we demonstrate that those patterns, or compositions thereof, can be used to represent information from clinical models. Homogeneous querying of the same or similar information, when represented according to heterogeneous clinical models, is feasible. Our approach focuses on the meaning embedded in EHRs, regardless of their structure. This complex task requires a clear ontological commitment (ie, agreement to consistently use the shared vocabulary within some context), together with formalization rules. These requirements are supported by semantic patterns. Other potential uses of this approach, such as clinical models validation, require further investigation. We show how an ontology-based representation of a clinical summary, guided by semantic patterns, allows homogeneous querying of heterogeneous information structures. Whether there are a finite number of top-level patterns is an open question. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  16. Understanding human activity patterns based on space-time-semantics

    NASA Astrophysics Data System (ADS)

    Huang, Wei; Li, Songnian

    2016-11-01

    Understanding human activity patterns plays a key role in various applications in an urban environment, such as transportation planning and traffic forecasting, urban planning, public health and safety, and emergency response. Most existing studies in modeling human activity patterns mainly focus on spatiotemporal dimensions, which lack consideration of the underlying semantic context. In fact, what people do and discuss at some places, indicating what is happening at those places, cannot be simply neglected because it is the root of human mobility patterns. We believe that the geo-tagged semantic context, representing what individuals do and discuss at a place and a specific time, drives the formation of specific human activity patterns. In this paper, we aim to model human activity patterns not only based on space and time but also with consideration of associated semantics, and attempt to prove the hypothesis that similar mobility patterns may have different motivations. We develop a spatiotemporal-semantic model to quantitatively express human activity patterns based on topic models, leading to an analysis of space, time and semantics. A case study is conducted using Twitter data in Toronto based on our model. Through computing the similarities between users in terms of spatiotemporal pattern, semantic pattern and spatiotemporal-semantic pattern, we find that only a small number of users (2.72%) have very similar activity patterns, while the majority (87.14%) show different activity patterns (i.e., similar spatiotemporal patterns and different semantic patterns, similar semantic patterns and different spatiotemporal patterns, or different in both). The population of users that have very similar activity patterns decreases by 56.41% after incorporating semantic information into the corresponding spatiotemporal patterns, which quantitatively supports the hypothesis.
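
    One simple way to realize the pattern comparison is to represent each user as a probability distribution over (region, hour, topic) cells and compare users with cosine similarity; the random distributions below are crude stand-ins for the paper's topic-model output.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two nonnegative pattern vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(3)
n_regions, n_hours, n_topics = 4, 24, 5

def user_pattern():
    p = rng.random((n_regions, n_hours, n_topics)).ravel()
    return p / p.sum()   # normalize to a probability distribution

u1, u2 = user_pattern(), user_pattern()

# Marginalising out the topic axis leaves the purely spatiotemporal
# pattern; two users can look alike spatiotemporally yet differ once
# semantics are kept, which is the paper's central observation.
st1 = u1.reshape(n_regions, n_hours, n_topics).sum(axis=2).ravel()
st2 = u2.reshape(n_regions, n_hours, n_topics).sum(axis=2).ravel()
sim_full = cosine(u1, u2)
sim_st = cosine(st1, st2)
```

    Comparing `sim_st` with `sim_full` for every user pair is, in spirit, how incorporating semantics shrinks the set of "very similar" users.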

  17. Point Cloud Classification of Tesserae from Terrestrial Laser Data Combined with Dense Image Matching for Archaeological Information Extraction

    NASA Astrophysics Data System (ADS)

    Poux, F.; Neuville, R.; Billen, R.

    2017-08-01

    Reasoning from information extraction given by point cloud data mining allows contextual adaptation and fast decision making. However, to achieve this perceptive level, a point cloud must be semantically rich, retaining relevant information for the end user. This paper presents an automatic knowledge-based method for pre-processing multi-sensory data and classifying a hybrid point cloud from both terrestrial laser scanning and dense image matching. Using 18 features, including the sensors' bias data, each tessera in the high-density point cloud from the 3D-captured complex mosaics of Germigny-des-prés (France) is segmented via a colour-based multi-scale abstraction that extracts connectivity. A 2D surface and outline polygon of each tessera is generated by RANSAC plane extraction and convex hull fitting. Knowledge is then used to classify every tessera based on its size, surface, shape, material properties and its neighbours' classes. The detection and semantic enrichment method shows promising results of 94% correct semantization, a first step toward the creation of an archaeological smart point cloud.
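
    The per-tessera geometry step (plane extraction followed by outline fitting) can be sketched with a bare-bones RANSAC and a convex hull; the point patch below is synthetic, and the distance threshold and iteration count are arbitrary illustrative choices.

```python
import numpy as np
from scipy.spatial import ConvexHull

# Synthetic patch of points lying roughly on the z = 0 plane.
rng = np.random.default_rng(4)
pts = np.column_stack([rng.random(200), rng.random(200),
                       0.02 * rng.standard_normal(200)])

# Bare-bones RANSAC plane fit: sample 3 points, build the plane normal,
# count points within a distance threshold, keep the best consensus set.
best_inliers = None
for _ in range(50):
    sample = pts[rng.choice(len(pts), 3, replace=False)]
    normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
    norm = np.linalg.norm(normal)
    if norm < 1e-9:
        continue  # degenerate (collinear) sample
    normal /= norm
    dist = np.abs((pts - sample[0]) @ normal)
    inliers = dist < 0.05
    if best_inliers is None or inliers.sum() > best_inliers.sum():
        best_inliers = inliers

# 2D outline polygon: convex hull of the inlier x, y coordinates.
hull = ConvexHull(pts[best_inliers][:, :2])
```

    A production pipeline would additionally project inliers onto the fitted plane before hulling; the sketch keeps only the two operations the abstract names.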

  18. Semantic Document Model to Enhance Data and Knowledge Interoperability

    NASA Astrophysics Data System (ADS)

    Nešić, Saša

    To enable document data and knowledge to be efficiently shared and reused across application, enterprise, and community boundaries, desktop documents should be completely open and queryable resources, whose data and knowledge are represented in a form understandable to both humans and machines. At the same time, these are the requirements that desktop documents need to satisfy in order to contribute to the visions of the Semantic Web. With the aim of achieving this goal, we have developed the Semantic Document Model (SDM), which turns desktop documents into Semantic Documents as uniquely identified and semantically annotated composite resources that can be instantiated into human-readable (HR) and machine-processable (MP) forms. In this paper, we present the SDM along with an RDF and ontology-based solution for the MP document instance. Moreover, on top of the proposed model, we have built the Semantic Document Management System (SDMS), which provides a set of services that exploit the model. As an application example that takes advantage of SDMS services, we have extended MS Office with a set of tools that enables users to transform MS Office documents (e.g., MS Word and MS PowerPoint) into Semantic Documents, and to search local and distant semantic document repositories for document content units (CUs) over Semantic Web protocols.

  19. A Generic Evaluation Model for Semantic Web Services

    NASA Astrophysics Data System (ADS)

    Shafiq, Omair

    Semantic Web Services research has gained momentum over the last few years and by now several realizations exist. They are being used in a number of industrial use-cases. Soon software developers will be expected to use this infrastructure to build their B2B applications requiring dynamic integration. However, there is still a lack of guidelines for the evaluation of tools developed to realize Semantic Web Services and applications built on top of them. In normal software engineering practice such guidelines can already be found for traditional component-based systems. Also some efforts are being made to build performance models for service-based systems. Drawing on these related efforts in component-oriented and service-based systems, we identified the need for a generic evaluation model for Semantic Web Services applicable to any realization. The generic evaluation model will help users and customers to orient their systems and solutions towards using Semantic Web Services. In this chapter, we present the requirements for the generic evaluation model for Semantic Web Services and further discuss the initial steps that we took to sketch such a model. Finally, we discuss related activities for evaluating semantic technologies.

  20. Natural speech reveals the semantic maps that tile human cerebral cortex

    PubMed Central

    Huth, Alexander G.; de Heer, Wendy A.; Griffiths, Thomas L.; Theunissen, Frédéric E.; Gallant, Jack L.

    2016-01-01

    The meaning of language is represented in regions of the cerebral cortex collectively known as the “semantic system”. However, little of the semantic system has been mapped comprehensively, and the semantic selectivity of most regions is unknown. Here we systematically map semantic selectivity across the cortex using voxel-wise modeling of fMRI data collected while subjects listened to hours of narrative stories. We show that the semantic system is organized into intricate patterns that appear consistent across individuals. We then use a novel generative model to create a detailed semantic atlas. Our results suggest that most areas within the semantic system represent information about specific semantic domains, or groups of related concepts, and our atlas shows which domains are represented in each area. This study demonstrates that data-driven methods—commonplace in studies of human neuroanatomy and functional connectivity—provide a powerful and efficient means for mapping functional representations in the brain. PMID:27121839

  1. Linguistic Preprocessing and Tagging for Problem Report Trend Analysis

    NASA Technical Reports Server (NTRS)

    Beil, Robert J.; Malin, Jane T.

    2012-01-01

    Mr. Robert Beil, Systems Engineer at Kennedy Space Center (KSC), requested the NASA Engineering and Safety Center (NESC) develop a prototype tool suite that combines complementary software technology used at Johnson Space Center (JSC) and KSC for problem report preprocessing and semantic tag extraction, to improve input to data mining and trend analysis. This document contains the outcome of the assessment and the Findings, Observations and NESC Recommendations.

  2. Integrating semantic dimension into openEHR archetypes for the management of cerebral palsy electronic medical records.

    PubMed

    Ellouze, Afef Samet; Bouaziz, Rafik; Ghorbel, Hanen

    2016-10-01

    Integrating the semantic dimension into clinical archetypes is necessary when modeling medical records. First, it enables semantic interoperability; it also allows semantic operations on clinical data and provides a higher design quality of Electronic Medical Record (EMR) systems. However, to obtain these advantages, designers need to use archetypes that cover the semantic features of the clinical concepts involved in their specific applications. In fact, most archetypes filed within open repositories are expressed in the Archetype Definition Language (ADL), which defines only the syntactic structure of clinical concepts and thus weakens semantic processing of the EMR content in the semantic web environment. This paper focuses on the modeling of an EMR prototype for infants affected by Cerebral Palsy (CP), using the dual model approach and integrating semantic web technologies. Such modeling provides a better delivery of quality of care and ensures semantic interoperability between all involved therapies' information systems. First, data to be documented are identified and collected from the involved therapies. Subsequently, data are analyzed and arranged into archetypes expressed in accordance with ADL. During this step, open archetype repositories are explored in order to find suitable archetypes. Then, ADL archetypes are transformed into archetypes expressed in OWL-DL (Web Ontology Language - Description Logic). Finally, we construct an ontological source related to these archetypes, enabling their annotation to facilitate data extraction and making it possible to perform semantic operations on such archetypes. The result is the integration of the semantic dimension into an EMR modeled in accordance with the archetype approach. The feasibility of our solution is shown through the development of a prototype, baptized "CP-SMS", which ensures semantic exploitation of the CP EMR.
This prototype provides the following features: (i) creation of CP EMR instances and their checking against a knowledge base that we constructed through interviews with domain experts, (ii) translation of the initial CP ADL archetypes into CP OWL-DL archetypes, (iii) creation of an ontological source that can be used to annotate the obtained archetypes, and (iv) enrichment of the ontological source and integration of semantic relations, hence fueling the ontology with new concepts, ensuring consistency and eliminating ambiguity between concepts. The degree of semantic interoperability that can be reached between EMR systems depends strongly on the quality of the archetypes used. Thus, the integration of the semantic dimension into the archetype modeling process is crucial. By creating an ontological source and annotating archetypes, we create a supportive platform ensuring semantic interoperability between archetype-based EMR systems. Copyright © 2016. Published by Elsevier Inc.

  3. The Distribution of the Informative Intensity of the Text in Terms of its Structure (On Materials of the English Texts in the Mining Sphere)

    NASA Astrophysics Data System (ADS)

    Znikina, Ludmila; Rozhneva, Elena

    2017-11-01

    The article deals with the distribution of informative intensity of the English-language scientific text based on its structural features, which contribute to the formalization of the scientific text and to preserving the adequacy of the text with derived semantic information in relation to the primary text. Discourse analysis is built on specific compositional and meaningful examples of scientific texts taken from the mining field. The article also analyzes the adequacy of the translation of foreign texts into another language, the relationships between elements of linguistic systems, the degree of formal conformance, and the alignment of the translation with the specific objectives and information needs of the recipient. Key words and ideas are emphasized in the paragraphs of the English-language mining scientific texts. The article gives the characteristic features of the structure of paragraphs of technical text and examples of constructions in English scientific texts based on a mining theme, with the aim of explaining possible ways of their adequate translation.

  4. Unsupervised User Similarity Mining in GSM Sensor Networks

    PubMed Central

    Shad, Shafqat Ali; Chen, Enhong

    2013-01-01

    Mobility data has attracted researchers for the past few years because of its rich context and spatiotemporal nature, where this information can be used for potential applications like early warning systems, route prediction, traffic management, advertisement, social networking, and community finding. All the mentioned applications are based on mobility profile building and user trend analysis, where mobility profile building is done through significant places extraction, prediction of users' actual movement, and context awareness. However, significant places extraction and prediction of users' actual movement for mobility profile building are non-trivial tasks. In this paper, we present a user similarity mining-based methodology built on user mobility profiles, using the semantic tagging information provided by the user and basic GSM network architecture properties, based on an unsupervised clustering approach. As the mobility information is in low-level raw form, our proposed methodology successfully converts it into high-level meaningful information by using cell-ID location information, rather than previously used location capturing methods like GPS, infrared, and Wi-Fi, for profile mining and user similarity mining. PMID:23576905
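
    Significant-place extraction from cell-ID traces can be sketched as a simple dwell-frequency filter; the trace and the 10% threshold below are illustrative choices, not the paper's actual method.

```python
from collections import Counter

# Synthetic cell-ID trace: one entry per observation interval. Cells the
# user dwells in most often become candidate significant places.
trace = ["cellA"] * 50 + ["cellB"] * 30 + ["cellC"] * 2 + ["cellA"] * 18

counts = Counter(trace)
threshold = 0.1 * len(trace)   # keep cells covering >= 10% of the trace
significant = {cell for cell, n in counts.items() if n >= threshold}
```

    Once each user is reduced to a set (or distribution) of significant places with semantic tags, user similarity mining becomes a comparison over those profiles.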

  5. Using RDF to Model the Structure and Process of Systems

    NASA Astrophysics Data System (ADS)

    Rodriguez, Marko A.; Watkins, Jennifer H.; Bollen, Johan; Gershenson, Carlos

    Many systems can be described in terms of networks of discrete elements and their various relationships to one another. A semantic network, or multi-relational network, is a directed labeled graph consisting of a heterogeneous set of entities connected by a heterogeneous set of relationships. Semantic networks serve as a promising general-purpose modeling substrate for complex systems. Various standardized formats and tools are now available to support practical, large-scale semantic network models. First, the Resource Description Framework (RDF) offers a standardized semantic network data model that can be further formalized by ontology modeling languages such as RDF Schema (RDFS) and the Web Ontology Language (OWL). Second, the recent introduction of highly performant triple-stores (i.e. semantic network databases) allows semantic network models on the order of 10^9 edges to be efficiently stored and manipulated. RDF and its related technologies are currently used extensively in the domains of computer science, digital library science, and the biological sciences. This article will provide an introduction to RDF/RDFS/OWL and an examination of its suitability to model discrete element complex systems.
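
    The triple data model at the heart of RDF can be illustrated without any library: a semantic network is just a set of (subject, predicate, object) statements plus pattern matching over them. The URIs below are made up for the example; a real system would use an RDF serialization and a triple-store or a library such as rdflib.

```python
# Minimal in-memory triple store modeling a two-relation semantic network.
EX = "http://example.org/"

triples = {
    (EX + "node1", EX + "connectedTo", EX + "node2"),
    (EX + "node2", EX + "connectedTo", EX + "node3"),
    (EX + "node1", EX + "type",        EX + "Element"),
    (EX + "node2", EX + "type",        EX + "Element"),
}

def match(s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [(ts, tp, to) for ts, tp, to in triples
            if (s is None or ts == s)
            and (p is None or tp == p)
            and (o is None or to == o)]

# Query: which resources are typed as Element?
elements = match(p=EX + "type", o=EX + "Element")
```

    The heterogeneity the abstract emphasizes falls out naturally: entities and relationship labels are both just URIs, so new element and relation types need no schema change.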

  6. Modelling the Effects of Semantic Ambiguity in Word Recognition

    ERIC Educational Resources Information Center

    Rodd, Jennifer M.; Gaskell, M. Gareth; Marslen-Wilson, William D.

    2004-01-01

    Most words in English are ambiguous between different interpretations; words can mean different things in different contexts. We investigate the implications of different types of semantic ambiguity for connectionist models of word recognition. We present a model in which there is competition to activate distributed semantic representations. The…

  7. Complex Topographic Feature Ontology Patterns

    USGS Publications Warehouse

    Varanka, Dalia E.; Jerris, Thomas J.

    2015-01-01

    Semantic ontologies are examined as effective data models for the representation of complex topographic feature types. Complex feature types are viewed as integrated relations between basic features for a basic purpose. In the context of topographic science, such component assemblages are supported by resource systems and found on the local landscape. Ontologies are organized within six thematic modules of a domain ontology called Topography that includes within its sphere basic feature types, resource systems, and landscape types. Context is constructed not only as a spatial and temporal setting, but a setting also based on environmental processes. Types of spatial relations that exist between components include location, generative processes, and description. An example is offered in a complex feature type ‘mine.’ The identification and extraction of complex feature types are an area for future research.

  8. A Formal Theory for Modular ERDF Ontologies

    NASA Astrophysics Data System (ADS)

    Analyti, Anastasia; Antoniou, Grigoris; Damásio, Carlos Viegas

    The success of the Semantic Web is impossible without any form of modularity, encapsulation, and access control. In an earlier paper, we extended RDF graphs with weak and strong negation, as well as derivation rules. The ERDF #n-stable model semantics of the extended RDF framework (ERDF) is defined, extending RDF(S) semantics. In this paper, we propose a framework for modular ERDF ontologies, called modular ERDF framework, which enables collaborative reasoning over a set of ERDF ontologies, while support for hidden knowledge is also provided. In particular, the modular ERDF stable model semantics of modular ERDF ontologies is defined, extending the ERDF #n-stable model semantics. Our proposed framework supports local semantics and different points of view, local closed-world and open-world assumptions, and scoped negation-as-failure. Several complexity results are provided.

  9. Overlap in the functional neural systems involved in semantic and episodic memory retrieval.

    PubMed

    Rajah, M N; McIntosh, A R

    2005-03-01

    Neuroimaging and neuropsychological data suggest that episodic and semantic memory may be mediated by distinct neural systems. However, an alternative perspective is that episodic and semantic memory represent different modes of processing within a single declarative memory system. To examine whether the multiple or the unitary system view better represents the data, we conducted a network analysis using multivariate partial least squares (PLS) activation analysis followed by covariance structural equation modeling (SEM) of positron emission tomography data obtained while healthy adults performed episodic and semantic verbal retrieval tasks. It is argued that if performance of episodic and semantic retrieval tasks is mediated by different memory systems, then there should be differences in both regional activations and interregional correlations related to each type of retrieval task, respectively. The PLS results identified brain regions that were differentially active during episodic retrieval versus semantic retrieval. Regions that showed maximal differences in regional activity between tasks were used to construct separate functional models for episodic and semantic retrieval. Omnibus tests of these functional models failed to find a significant difference across tasks for both functional models. The pattern of path coefficients for the episodic retrieval model was not different across tasks, nor were the path coefficients for the semantic retrieval model. The SEM results suggest that the same memory network/system was engaged across tasks, given the similarities in path coefficients. Therefore, activation differences between episodic and semantic retrieval may reflect variation along a continuum of processing during task performance within the context of a single memory system.

  10. Semantic Service Design for Collaborative Business Processes in Internetworked Enterprises

    NASA Astrophysics Data System (ADS)

    Bianchini, Devis; Cappiello, Cinzia; de Antonellis, Valeria; Pernici, Barbara

    Modern collaborating enterprises can be seen as borderless organizations whose processes are dynamically transformed and integrated with the ones of their partners (Internetworked Enterprises, IE), thus enabling the design of collaborative business processes. The adoption of Semantic Web and service-oriented technologies for implementing collaboration in such distributed and heterogeneous environments promises significant benefits. IE can model their own processes independently by using the Software as a Service paradigm (SaaS). Each enterprise maintains a catalog of available services and these can be shared across IE and reused to build up complex collaborative processes. Moreover, each enterprise can adopt its own terminology and concepts to describe business processes and component services. This brings requirements to manage semantic heterogeneity in process descriptions which are distributed across different enterprise systems. To enable effective service-based collaboration, IEs have to standardize their process descriptions and model them through component services using the same approach and principles. For enabling collaborative business processes across IE, services should be designed following a homogeneous approach, possibly maintaining a uniform level of granularity. In this paper, we propose an ontology-based semantic modeling approach designed to enrich and reconcile the semantics of process descriptions, to facilitate process knowledge management and to enable semantic service design (by discovery, reuse and integration of process elements/constructs). The approach brings together Semantic Web technologies, techniques in process modeling, ontology building and semantic matching in order to provide a comprehensive semantic modeling framework.

  11. Design and Applications of a GeoSemantic Framework for Integration of Data and Model Resources in Hydrologic Systems

    NASA Astrophysics Data System (ADS)

    Elag, M.; Kumar, P.

    2016-12-01

    Hydrologists today have to integrate resources such as data and models, which originate and reside in multiple autonomous and heterogeneous repositories over the Web. Several resource management systems have emerged within geoscience communities for sharing long-tail data, which are collected by individuals or small research groups, and long-tail models, which are developed by scientists or small modeling communities. While these systems have increased the availability of resources within geoscience domains, deficiencies remain due to the heterogeneity of the methods used to describe, encode, and publish information about resources over the Web. This heterogeneity limits our ability to access the right information in the right context so that it can be efficiently retrieved and understood without the hydrologist's mediation. A primary challenge of the Web today is the lack of semantic interoperability among the massive number of resources that already exist and are continually being generated at rapid rates. To address this challenge, we have developed a decentralized GeoSemantic (GS) framework, which provides three sets of micro-web services to support (i) semantic annotation of resources, (ii) semantic alignment between the metadata of two resources, and (iii) semantic mediation among Standard Names. Here we present the design of the framework and demonstrate its application for semantic integration between data and models used in the IML-CZO. First, we show how the IML-CZO data are annotated using the Semantic Annotation Services. Then we illustrate how the Resource Alignment Services and Knowledge Integration Services are used to create a semantic workflow between the TopoFlow model, a spatially distributed hydrologic model, and the annotated data. The results of this work are (i) a demonstration of how the GS framework advances the integration of heterogeneous data and models of water-related disciplines by seamlessly handling their semantic heterogeneity, (ii) the introduction of a new paradigm for reusing existing and new standards, tools, and models without having to implement them in the cyberinfrastructures of water-related disciplines, and (iii) an investigation of a methodology by which distributed models can be coupled in a workflow using the GS services.
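
    The mediation step this record describes can be sketched in a few lines; the local variable names and Standard Names below are invented placeholders rather than the GS framework's actual vocabulary.

```python
# Hypothetical sketch of semantic mediation: two resources use different
# local variable names, and a shared table of Standard Names reconciles them.
# All names below are invented for illustration.
STANDARD_NAMES = {
    "precip_mm": "atmosphere_water__precipitation_depth",
    "rainfall":  "atmosphere_water__precipitation_depth",
    "q_cms":     "channel_water__volume_flow_rate",
    "discharge": "channel_water__volume_flow_rate",
}

def align(dataset_vars, model_vars):
    """Pair dataset and model variables that share a Standard Name."""
    pairs = []
    for d in dataset_vars:
        std = STANDARD_NAMES.get(d)
        if std is None:
            continue  # unannotated variable: a semantic gap, not a match
        for m in model_vars:
            if STANDARD_NAMES.get(m) == std:
                pairs.append((d, m))
    return pairs

# A sensor dataset and a hydrologic model expose different local names.
matches = align(["precip_mm", "air_temp"], ["rainfall", "discharge"])
```

    The point of the sketch is that alignment happens through the shared Standard Name, never through string similarity of the local names.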

  12. Semantic web data warehousing for caGrid

    PubMed Central

    McCusker, James P; Phillips, Joshua A; Beltrán, Alejandra González; Finkelstein, Anthony; Krauthammer, Michael

    2009-01-01

    The National Cancer Institute (NCI) is developing caGrid as a means for sharing cancer-related data and services. As more data sets become available on caGrid, we need effective ways of accessing and integrating this information. Although the data models exposed on caGrid are semantically well annotated, it is currently up to the caGrid client to infer relationships between the different models and their classes. In this paper, we present a Semantic Web-based data warehouse (Corvus) for creating relationships among caGrid models. This is accomplished through the transformation of semantically-annotated caBIG® Unified Modeling Language (UML) information models into Web Ontology Language (OWL) ontologies that preserve those semantics. We demonstrate the validity of the approach by Semantic Extraction, Transformation and Loading (SETL) of data from two caGrid data sources, caTissue and caArray, as well as alignment and query of those sources in Corvus. We argue that semantic integration is necessary for integration of data from distributed web services and that Corvus is a useful way of accomplishing this. Our approach is generalizable and of broad utility to researchers facing similar integration challenges. PMID:19796399
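
    The transformation this record describes can be caricatured with plain triples; the class name, attributes, and concept code below are invented placeholders, not actual caBIG® identifiers.

```python
# Illustrative sketch of the Corvus idea: a semantically annotated information
# model is flattened into (subject, predicate, object) triples so that classes
# from different sources can be linked through shared concept annotations.

def model_to_triples(classes):
    """Turn an annotated model {class: (concept_code, attributes)} into triples."""
    triples = []
    for cls, (concept, attrs) in classes.items():
        triples.append((cls, "rdf:type", "owl:Class"))
        triples.append((cls, "annotatedWith", concept))
        for attr in attrs:
            triples.append((cls, "hasProperty", attr))
    return triples

# A fragment of a caTissue-like model annotated with a thesaurus concept code.
caTissue_model = {"Specimen": ("concept:C0000", ["specimenType", "quantity"])}
triples = model_to_triples(caTissue_model)

# Query: which classes carry the shared concept annotation?
annotated = [s for (s, p, o) in triples
             if p == "annotatedWith" and o == "concept:C0000"]
```

    Because two models annotated with the same concept code produce triples sharing an object, cross-source joins reduce to matching on that code.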

  13. Classification with an edge: Improving semantic image segmentation with boundary detection

    NASA Astrophysics Data System (ADS)

    Marmanis, D.; Schindler, K.; Wegner, J. D.; Galliani, S.; Datcu, M.; Stilla, U.

    2018-01-01

    We present an end-to-end trainable deep convolutional neural network (DCNN) for semantic segmentation with built-in awareness of semantically meaningful boundaries. Semantic segmentation is a fundamental remote sensing task, and most state-of-the-art methods rely on DCNNs as their workhorse. A major reason for their success is that deep networks learn to accumulate contextual information over very large receptive fields. However, this success comes at a cost, since the associated loss of effective spatial resolution washes out high-frequency details and leads to blurry object boundaries. Here, we propose to counter this effect by combining semantic segmentation with semantically informed edge detection, thus making class boundaries explicit in the model. First, we construct a comparatively simple, memory-efficient model by adding boundary detection to the SEGNET encoder-decoder architecture. Second, we also include boundary detection in FCN-type models and set up a high-end classifier ensemble. We show that boundary detection significantly improves semantic segmentation with CNNs in an end-to-end training scheme. Our best model achieves >90% overall accuracy on the ISPRS Vaihingen benchmark.
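
    The joint objective behind such models can be sketched as a weighted sum of a segmentation loss and a boundary-detection loss; the toy probability arrays and the weighting factor below are invented and merely stand in for DCNN outputs.

```python
# Schematic of joint training: sum the per-pixel segmentation loss and the
# per-pixel boundary loss. Values are toy stand-ins for network outputs.
import math

def cross_entropy(pred, target):
    """Mean negative log-likelihood over 'pixels' (lists of class probabilities)."""
    eps = 1e-9
    return -sum(math.log(p[t] + eps) for p, t in zip(pred, target)) / len(pred)

# Per-pixel class probabilities from a segmentation head (2 classes)...
seg_pred = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
seg_true = [0, 1, 0]
# ...and boundary probabilities from an edge-detection head (not-boundary / boundary).
edge_pred = [[0.8, 0.2], [0.1, 0.9], [0.6, 0.4]]
edge_true = [0, 1, 0]

LAMBDA = 0.5  # relative weight of the boundary term (a hyperparameter)
loss = cross_entropy(seg_pred, seg_true) + LAMBDA * cross_entropy(edge_pred, edge_true)
```

    In an end-to-end scheme both terms backpropagate through shared encoder features, which is what makes the class boundaries explicit to the segmentation branch.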

  14. Brain connections of words, perceptions and actions: A neurobiological model of spatio-temporal semantic activation in the human cortex.

    PubMed

    Tomasello, Rosario; Garagnani, Max; Wennekers, Thomas; Pulvermüller, Friedemann

    2017-04-01

    Neuroimaging and patient studies show that different areas of cortex respectively specialize for general and selective, or category-specific, semantic processing. Why are there both semantic hubs and category-specificity, and why do they emerge in different cortical regions? Can the activation time-course of these areas be predicted and explained by brain-like network models? In the present work, we extend a neurocomputational model of human cortical function to simulate the time-course of cortical processes of understanding meaningful concrete words. The model implements frontal and temporal cortical areas for language, perception, and action along with their connectivity. It uses Hebbian learning to semantically ground words in aspects of their referential object- and action-related meaning. Compared with earlier proposals, the present model incorporates additional neuroanatomical links supported by connectivity studies and downscaled synaptic weights in order to control for functional between-area differences purely due to the number of in- or output links of an area. We show that learning of semantic relationships between words and the objects and actions these symbols are used to speak about leads to the formation of distributed circuits, which all include neuronal material in connector hub areas bridging between sensory and motor cortical systems. These connector hub areas therefore acquire a role as semantic hubs. By differentially reaching into motor or visual areas, the cortical distributions of the emergent 'semantic circuits' reflect aspects of the represented symbols' meaning, thus explaining category-specificity. The improved connectivity structure of our model entails a degree of category-specificity even in the 'semantic hubs' of the model. The relative time-course of activation of these areas is typically fast and near-simultaneous, with semantic hubs central to the network structure activating before modality-preferential areas carrying semantic information. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.
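
    The Hebbian grounding mechanism the model relies on can be reduced to a toy sketch; the units, learning rate, and threshold below are invented for illustration and do not reproduce the model's architecture.

```python
# Toy Hebbian sketch: co-activation of a word unit and a motor unit strengthens
# their link, so the word alone later reactivates its action-related "meaning".
units = ["word_grasp", "motor_hand", "visual_shape"]
w = {(a, b): 0.0 for a in units for b in units if a != b}

ETA = 0.1  # Hebbian learning rate

def learn(active, steps=20):
    """Cells that fire together wire together: dw = eta * x * y (here x = y = 1)."""
    for _ in range(steps):
        for a in active:
            for b in active:
                if a != b:
                    w[(a, b)] += ETA

# Ground the action word "grasp" in hand-motor activity (not in visual input).
learn({"word_grasp", "motor_hand"})

def recall(cue, threshold=1.0):
    """Units whose connection from the cue exceeds threshold get re-ignited."""
    return {b for (a, b), weight in w.items() if a == cue and weight > threshold}
```

    The distributed circuit the paper describes corresponds, in this caricature, to the set of units `recall` ignites: an action word reaches into motor units but not visual ones, which is the seed of category-specificity.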

  15. Spreading Activation in an Attractor Network with Latching Dynamics: Automatic Semantic Priming Revisited

    ERIC Educational Resources Information Center

    Lerner, Itamar; Bentin, Shlomo; Shriki, Oren

    2012-01-01

    Localist models of spreading activation (SA) and models assuming distributed representations offer very different takes on semantic priming, a widely investigated paradigm in word recognition and semantic memory research. In this study, we implemented SA in an attractor neural network model with distributed representations and created a unified…

  16. Wernicke's Aphasia Reflects a Combination of Acoustic-Phonological and Semantic Control Deficits: A Case-Series Comparison of Wernicke's Aphasia, Semantic Dementia and Semantic Aphasia

    ERIC Educational Resources Information Center

    Robson, Holly; Sage, Karen; Lambon Ralph, Matthew A.

    2012-01-01

    Wernicke's aphasia (WA) is the classical neurological model of comprehension impairment and, as a result, the posterior temporal lobe is assumed to be critical to semantic cognition. This conclusion is potentially confused by (a) the existence of patient groups with semantic impairment following damage to other brain regions (semantic dementia and…

  17. Mimicking aphasic semantic errors in normal speech production: evidence from a novel experimental paradigm.

    PubMed

    Hodgson, Catherine; Lambon Ralph, Matthew A

    2008-01-01

    Semantic errors are commonly found in semantic dementia (SD) and some forms of stroke aphasia and provide insights into semantic processing and speech production. Low error rates are found in standard picture naming tasks in normal controls. In order to increase error rates and thus provide an experimental model of aphasic performance, this study utilised a novel method: tempo picture naming. Experiment 1 showed that, compared to standard deadline naming tasks, participants made more errors on the tempo picture naming tasks. Further, RTs were longer and more errors were produced for living items than non-living items, a pattern seen in both semantic dementia and semantically-impaired stroke aphasic patients. Experiment 2 showed that providing the initial phoneme as a cue enhanced performance, whereas providing an incorrect phonemic cue further reduced performance. These results support the contention that the tempo picture naming paradigm reduces the time allowed for controlled semantic processing, causing increased error rates. This experimental procedure would, therefore, appear to mimic the performance of aphasic patients with multi-modal semantic impairment that results from poor semantic control rather than the degradation of semantic representations observed in semantic dementia [Jefferies, E. A., & Lambon Ralph, M. A. (2006). Semantic impairment in stroke aphasia vs. semantic dementia: A case-series comparison. Brain, 129, 2132-2147]. Further implications for theories of semantic cognition and models of speech processing are discussed.

  18. Personal semantics: at the crossroads of semantic and episodic memory.

    PubMed

    Renoult, Louis; Davidson, Patrick S R; Palombo, Daniela J; Moscovitch, Morris; Levine, Brian

    2012-11-01

    Declarative memory is usually described as consisting of two systems: semantic and episodic memory. Between these two poles, however, may lie a third entity: personal semantics (PS). PS concerns knowledge of one's past. Although typically assumed to be an aspect of semantic memory, it is essentially absent from existing models of knowledge. Furthermore, like episodic memory (EM), PS is idiosyncratically personal (i.e., not culturally-shared). We show that, depending on how it is operationalized, the neural correlates of PS can look more similar to semantic memory, more similar to EM, or dissimilar to both. We consider three different perspectives to better integrate PS into existing models of declarative memory and suggest experimental strategies for disentangling PS from semantic and episodic memory. Copyright © 2012 Elsevier Ltd. All rights reserved.

  19. Interpreting semantic clustering effects in free recall.

    PubMed

    Manning, Jeremy R; Kahana, Michael J

    2012-07-01

    The order in which participants choose to recall words from a studied list of randomly selected words provides insights into how memories of the words are represented, organised, and retrieved. One pervasive finding is that when a pair of semantically related words (e.g., "cat" and "dog") is embedded in the studied list, the related words are often recalled successively. This tendency to successively recall semantically related words is termed semantic clustering (Bousfield, 1953; Bousfield & Sedgewick, 1944; Cofer, Bruce, & Reicher, 1966). Measuring semantic clustering effects requires making assumptions about which words participants consider to be similar in meaning. However, it is often difficult to gain insights into individual participants' internal semantic models, and for this reason researchers typically rely on standardised semantic similarity metrics. Here we use simulations to gain insights into the expected magnitudes of semantic clustering effects given systematic differences between participants' internal similarity models and the similarity metric used to quantify the degree of semantic clustering. Our results provide a number of useful insights into the interpretation of semantic clustering effects in free recall.
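
    The clustering measurement the abstract describes can be sketched by comparing the similarity of successively recalled pairs against a shuffled baseline; the word list and similarity values below are invented stand-ins for a norm-based metric.

```python
# Quantifying a semantic clustering effect: mean similarity of adjacent recalls
# minus the expectation under random recall orders. Similarities are toy values.
import random

random.seed(42)

words = ["cat", "dog", "pen", "ink"]
# Symmetric pairwise similarities (stand-ins for a standardised metric).
sim = {frozenset(p): s for p, s in {
    ("cat", "dog"): 0.9, ("pen", "ink"): 0.8,
    ("cat", "pen"): 0.1, ("cat", "ink"): 0.1,
    ("dog", "pen"): 0.1, ("dog", "ink"): 0.2,
}.items()}

def adjacent_similarity(order):
    """Mean similarity of successively recalled word pairs."""
    return sum(sim[frozenset(p)] for p in zip(order, order[1:])) / (len(order) - 1)

recall_order = ["cat", "dog", "pen", "ink"]   # related words recalled together
observed = adjacent_similarity(recall_order)

# Permutation baseline: expected adjacent similarity under random ordering.
baseline = sum(adjacent_similarity(random.sample(words, len(words)))
               for _ in range(2000)) / 2000

clustering_effect = observed - baseline  # positive => semantic clustering
```

    The paper's point maps onto the `sim` table: if the participant's internal similarities differ from the table used here, `observed` (and hence the measured effect) changes even though recall behaviour does not.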

  20. Knowledge management for systems biology: a general and visually driven framework applied to translational medicine.

    PubMed

    Maier, Dieter; Kalus, Wenzel; Wolff, Martin; Kalko, Susana G; Roca, Josep; Marin de Mas, Igor; Turan, Nil; Cascante, Marta; Falciani, Francesco; Hernandez, Miguel; Villà-Freixa, Jordi; Losko, Sascha

    2011-03-05

    To enhance our understanding of complex biological systems like diseases we need to put all of the available data into context and use this to detect relations, patterns and rules which allow predictive hypotheses to be defined. Life science has become a data-rich science, with information about the behaviour of millions of entities like genes, chemical compounds, diseases, cell types and organs, which are organised in many different databases and/or spread throughout the literature. Existing knowledge such as genotype-phenotype relations or signal transduction pathways must be semantically integrated and dynamically organised into structured networks that are connected with clinical and experimental data. Different approaches to this challenge exist but so far none has proven entirely satisfactory. To address this challenge we previously developed a generic knowledge management framework, BioXM™, which allows the dynamic, graphic generation of domain-specific knowledge representation models based on specific objects and their relations, supporting annotations and ontologies. Here we demonstrate the utility of BioXM for knowledge management in systems biology as part of the EU FP6 BioBridge project on translational approaches to chronic diseases. From clinical and experimental data, text-mining results and public databases we generate a chronic obstructive pulmonary disease (COPD) knowledge base and demonstrate its use by mining specific molecular networks together with integrated clinical and experimental data. We generate the first semantically integrated COPD-specific public knowledge base and find that, for the integration of clinical and experimental data with pre-existing knowledge, the configuration-based set-up enabled by BioXM reduced implementation time and effort for the knowledge base compared to similar systems implemented as classical software development projects. The knowledge base enables the retrieval of sub-networks including protein-protein interaction, pathway, gene-disease and gene-compound data, which are used for subsequent data analysis, modelling and simulation. Pre-structured queries and reports enhance usability; establishing their use in everyday clinical settings requires further simplification with a browser-based interface, which is currently under development.

  1. Knowledge management for systems biology: a general and visually driven framework applied to translational medicine

    PubMed Central

    2011-01-01

    Background To enhance our understanding of complex biological systems like diseases we need to put all of the available data into context and use this to detect relations, patterns and rules which allow predictive hypotheses to be defined. Life science has become a data-rich science, with information about the behaviour of millions of entities like genes, chemical compounds, diseases, cell types and organs, which are organised in many different databases and/or spread throughout the literature. Existing knowledge such as genotype-phenotype relations or signal transduction pathways must be semantically integrated and dynamically organised into structured networks that are connected with clinical and experimental data. Different approaches to this challenge exist but so far none has proven entirely satisfactory. Results To address this challenge we previously developed a generic knowledge management framework, BioXM™, which allows the dynamic, graphic generation of domain-specific knowledge representation models based on specific objects and their relations, supporting annotations and ontologies. Here we demonstrate the utility of BioXM for knowledge management in systems biology as part of the EU FP6 BioBridge project on translational approaches to chronic diseases. From clinical and experimental data, text-mining results and public databases we generate a chronic obstructive pulmonary disease (COPD) knowledge base and demonstrate its use by mining specific molecular networks together with integrated clinical and experimental data. Conclusions We generate the first semantically integrated COPD-specific public knowledge base and find that, for the integration of clinical and experimental data with pre-existing knowledge, the configuration-based set-up enabled by BioXM reduced implementation time and effort for the knowledge base compared to similar systems implemented as classical software development projects. The knowledge base enables the retrieval of sub-networks including protein-protein interaction, pathway, gene-disease and gene-compound data, which are used for subsequent data analysis, modelling and simulation. Pre-structured queries and reports enhance usability; establishing their use in everyday clinical settings requires further simplification with a browser-based interface, which is currently under development. PMID:21375767

  2. Annotating images by mining image search results.

    PubMed

    Wang, Xin-Jing; Zhang, Lei; Li, Xirong; Ma, Wei-Ying

    2008-11-01

    Although it has been studied for years by the computer vision and machine learning communities, image annotation is still far from practical. In this paper, we propose a novel attempt at model-free image annotation: a data-driven approach that annotates images by mining their search results. Some 2.4 million images with their surrounding text are collected from a few photo forums to support this approach. The entire process is formulated in a divide-and-conquer framework, where a query keyword is provided along with the uncaptioned image to improve both effectiveness and efficiency. This is helpful when the collected data set is not dense everywhere. Our approach contains three steps: 1) the search process, to discover visually and semantically similar search results; 2) the mining process, to identify salient terms from textual descriptions of the search results; and 3) the annotation rejection process, to filter out noisy terms yielded by step 2. To ensure real-time annotation, two key techniques are leveraged: one is to map the high-dimensional image visual features into hash codes, and the other is to implement the process as a distributed system, in which the search and mining processes are provided as Web services. As a typical result, the entire process finishes in less than 1 second. Since no training data set is required, our approach enables annotation with an unlimited vocabulary and is highly scalable and robust to outliers. Experimental results on both real Web images and a benchmark image data set show the effectiveness and efficiency of the proposed algorithm. It is also worth noting that, although the entire approach is illustrated within the divide-and-conquer framework, a query keyword is not crucial to our current implementation. We provide experimental results to prove this.
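
    Steps 2 and 3 of the pipeline can be caricatured with simple term counting; the surrounding texts and the generic-term list below are invented for illustration.

```python
# Sketch of salient-term mining plus annotation rejection: count terms in the
# texts surrounding visually similar search results, then drop generic noise.
from collections import Counter

search_result_texts = [
    "sunset over the ocean with a sail boat",
    "beautiful ocean sunset photo",
    "sunset at the beach, ocean waves",
]
GENERIC = {"photo", "beautiful", "the", "a", "at", "with", "over"}

def mine_annotations(texts, top_k=3):
    counts = Counter(w.strip(",.") for t in texts for w in t.lower().split())
    # Annotation rejection: filter out terms too generic to describe the image.
    for term in list(counts):
        if term in GENERIC:
            del counts[term]
    return [term for term, _ in counts.most_common(top_k)]

annotations = mine_annotations(search_result_texts)
```

    Real systems score salience with more than raw frequency, but the two-stage shape (mine, then reject) is the same.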

  3. Mining Available Data from the United States Environmental Protection Agency to Support Rapid Life Cycle Inventory Modeling of Chemical Manufacturing.

    PubMed

    Cashman, Sarah A; Meyer, David E; Edelen, Ashley N; Ingwersen, Wesley W; Abraham, John P; Barrett, William M; Gonzalez, Michael A; Randall, Paul M; Ruiz-Mercado, Gerardo; Smith, Raymond L

    2016-09-06

    Demands for quick and accurate life cycle assessments create a need for methods to rapidly generate reliable life cycle inventories (LCI). Data mining is a suitable tool for this purpose, especially given the large amount of available governmental data. These data are typically applied to LCIs on a case-by-case basis. As linked open data becomes more prevalent, it may be possible to automate LCI using data mining by establishing a reproducible approach for identifying, extracting, and processing the data. This work proposes a method for standardizing and eventually automating the discovery and use of publicly available data at the United States Environmental Protection Agency for chemical-manufacturing LCI. The method is developed using a case study of acetic acid. The data quality and gap analyses for the generated inventory found that the selected data sources can provide information with equal or better reliability and representativeness on air, water, hazardous waste, on-site energy usage, and production volumes but with key data gaps including material inputs, water usage, purchased electricity, and transportation requirements. A comparison of the generated LCI with existing data revealed that the data mining inventory is in reasonable agreement with existing data and may provide a more comprehensive inventory of air emissions and water discharges. The case study highlighted challenges for current data management practices that must be overcome to successfully automate the method using semantic technology. Benefits of the method are that the openly available data can be compiled in a standardized and transparent approach that supports potential automation with flexibility to incorporate new data sources as needed.
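
    The inventory-compilation step can be sketched as merging per-source flow records and flagging the gaps; the source names, flows, and numbers below are invented placeholders, not actual EPA data.

```python
# Sketch of LCI compilation: merge flow records from several public sources
# into one inventory and record which required flows remain data gaps.

REQUIRED_FLOWS = ["air_emissions_kg", "water_discharge_kg",
                  "material_inputs_kg", "purchased_electricity_kwh"]

# Hypothetical per-source records for one chemical process.
sources = {
    "emissions_db": {"air_emissions_kg": 12.5},
    "discharge_db": {"water_discharge_kg": 3.1},
}

def build_inventory(sources, required):
    """Merge all source records; list required flows no source covers."""
    inventory, gaps = {}, []
    for flows in sources.values():
        inventory.update(flows)
    for flow in required:
        if flow not in inventory:
            gaps.append(flow)
    return inventory, gaps

inventory, gaps = build_inventory(sources, REQUIRED_FLOWS)
```

    The explicit gap list mirrors the paper's gap analysis: material inputs and purchased electricity are exactly the kinds of flows the selected sources did not cover.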

  4. Analysis and visualization of disease courses in a semantically-enabled cancer registry.

    PubMed

    Esteban-Gil, Angel; Fernández-Breis, Jesualdo Tomás; Boeker, Martin

    2017-09-29

    Regional and epidemiological cancer registries are important for cancer research and the quality management of cancer treatment. Many technological solutions are available to collect and analyse data for cancer registries nowadays. However, the lack of a well-defined common semantic model is a problem when user-defined analyses and data linking to external resources are required. The objectives of this study are: (1) design of a semantic model for local cancer registries; (2) development of a semantically-enabled cancer registry based on this model; and (3) semantic exploitation of the cancer registry for analysing and visualising disease courses. Our proposal is based on our previous results and experience working with semantic technologies. Data stored in a cancer registry database were transformed into RDF employing a process driven by OWL ontologies. The semantic representation of the data was then processed to extract semantic patient profiles, which were exploited by means of SPARQL queries to identify groups of similar patients and to analyse the disease timelines of patients. Based on the requirements analysis, we have produced a draft of an ontology that models the semantics of a local cancer registry in a pragmatic, extensible way. We have implemented a Semantic Web platform that allows transforming and storing data from cancer registries in RDF. This platform also permits users to formulate incremental user-defined queries through a graphical user interface. The query results can be displayed in several customisable ways. The complex disease timelines of individual patients can be clearly represented. Different events, e.g. different therapies and disease courses, are presented according to their temporal and causal relations. The presented platform is an example of the parallel development of ontologies and applications that take advantage of semantic web technologies in the medical field. 
The semantic structure of the representation renders it easy to analyse key figures of the patients and their evolution at different granularity levels.
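
    The timeline analysis the platform performs can be caricatured with a SPARQL-like pattern match over plain statements; the patient identifiers, events, and dates below are invented.

```python
# Sketch of disease-timeline extraction: patient events stored as
# (subject, predicate, object) statements are filtered by a pattern and
# ordered by date to reconstruct the disease course.
from datetime import date

statements = [
    ("patient1", "hasEvent", ("diagnosis",    date(2015, 3, 2))),
    ("patient1", "hasEvent", ("chemotherapy", date(2015, 4, 10))),
    ("patient1", "hasEvent", ("remission",    date(2016, 1, 5))),
    ("patient2", "hasEvent", ("diagnosis",    date(2016, 7, 1))),
]

def timeline(patient):
    """SPARQL-like pattern match on (patient, hasEvent, ?e), sorted by date."""
    events = [o for s, p, o in statements if s == patient and p == "hasEvent"]
    return [name for name, when in sorted(events, key=lambda e: e[1])]
```

    In the actual platform the pattern is a SPARQL query over RDF, but the shape is the same: match, then order temporally.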

  5. Lexical decision with pseudohomophones and reading in the semantic variant of primary progressive aphasia: A double dissociation.

    PubMed

    Boukadi, Mariem; Potvin, Karel; Macoir, Joël; Jr Laforce, Robert; Poulin, Stéphane; Brambati, Simona M; Wilson, Maximiliano A

    2016-06-01

    The co-occurrence of semantic impairment and surface dyslexia in the semantic variant of primary progressive aphasia (svPPA) has often been taken as supporting evidence for the central role of semantics in visual word processing. According to connectionist models, semantic access is needed to accurately read irregular words. They also postulate that reliance on semantics is necessary to perform the lexical decision task under certain circumstances (for example, when the stimulus list comprises pseudohomophones). In the present study, we report two svPPA cases: M.F. who presented with surface dyslexia but performed accurately on the lexical decision task with pseudohomophones, and R.L. who showed no surface dyslexia but performed below the normal range on the lexical decision task with pseudohomophones. This double dissociation between reading and lexical decision with pseudohomophones is in line with the dual-route cascaded (DRC) model of reading. According to this model, impairments in visual word processing in svPPA are not necessarily associated with the semantic deficits characterizing this disease. Our findings also call into question the central role given to semantics in visual word processing within the connectionist account. Copyright © 2016 Elsevier Ltd. All rights reserved.

  6. LDRD final report:

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brost, Randolph C.; McLendon, William Clarence

    2013-01-01

    Modeling geospatial information with semantic graphs enables search for sites of interest based on relationships between features, without requiring strong a priori models of feature shape or other intrinsic properties. Geospatial semantic graphs can be constructed from raw sensor data with suitable preprocessing to obtain a discretized representation. This report describes initial work toward extending geospatial semantic graphs to include temporal information, and initial results applying semantic graph techniques to SAR image data. We describe an efficient graph structure that includes geospatial and temporal information, which is designed to support simultaneous spatial and temporal search queries. We also report a preliminary implementation of feature recognition, semantic graph modeling, and graph search based on input SAR data. The report concludes with lessons learned and suggestions for future improvements.
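
    The combined spatial-temporal search the report describes can be sketched over a toy graph; the feature names, coordinates, and observation intervals below are invented.

```python
# Sketch of a geospatial-temporal semantic graph: nodes carry location and an
# observation interval, edges carry relationships, and a query constrains both.

nodes = {
    "building_7": {"kind": "building", "xy": (2.0, 3.0), "seen": (2010, 2013)},
    "road_a":     {"kind": "road",     "xy": (2.1, 3.2), "seen": (2008, 2013)},
    "lake_1":     {"kind": "water",    "xy": (9.0, 9.0), "seen": (2008, 2013)},
}
edges = [("building_7", "adjacentTo", "road_a")]

def query(kind, relation, other_kind, year):
    """Nodes of `kind` related to an `other_kind` node, both observed in `year`."""
    hits = []
    for a, rel, b in edges:
        if (rel == relation
                and nodes[a]["kind"] == kind
                and nodes[b]["kind"] == other_kind
                and nodes[a]["seen"][0] <= year <= nodes[a]["seen"][1]
                and nodes[b]["seen"][0] <= year <= nodes[b]["seen"][1]):
            hits.append(a)
    return hits
```

    Searching by relationship ("a building adjacent to a road, both present in 2012") is exactly the kind of query that needs no a priori shape model of the features.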

  7. Model-based semantic dictionaries for medical language understanding.

    PubMed Central

    Rassinoux, A. M.; Baud, R. H.; Ruch, P.; Trombert-Paviot, B.; Rodrigues, J. M.

    1999-01-01

    Semantic dictionaries are emerging as a major cornerstone towards achieving sound natural language understanding. Indeed, they constitute the main bridge between words and conceptual entities that reflect their meanings. Nowadays, more and more wide-coverage lexical dictionaries are electronically available in the public domain. However, associating a semantic content with lexical entries is not a straightforward task as it is subordinate to the existence of a fine-grained concept model of the treated domain. This paper presents the benefits and pitfalls in building and maintaining multilingual dictionaries, the semantics of which is directly established on an existing concept model. Concrete cases, handled through the GALEN-IN-USE project, illustrate the use of such semantic dictionaries for the analysis and generation of multilingual surgical procedures. PMID:10566333

  8. Towards Semantic Modelling of Business Processes for Networked Enterprises

    NASA Astrophysics Data System (ADS)

    Furdík, Karol; Mach, Marián; Sabol, Tomáš

    The paper presents an approach to the semantic modelling and annotation of business processes and information resources, as it was designed within the FP7 ICT EU project SPIKE to support creation and maintenance of short-term business alliances and networked enterprises. A methodology for the development of the resource ontology, as a shareable knowledge model for semantic description of business processes, is proposed. Systematically collected user requirements, conceptual models implied by the selected implementation platform as well as available ontology resources and standards are employed in the ontology creation. The process of semantic annotation is described and illustrated using an example taken from a real application case.

  9. Towards a Theory of Semantic Communication (Extended Technical Report)

    DTIC Science & Technology

    2011-03-01

    counting models of a sentence, when interpretations have different probabilities, what matters is the total probability of models of the sentence, not...of classic logics still hold in the LP semantics, e.g., De Morgan's laws. However, modus ponens does not hold in the LP semantics...

  10. Biologically Inspired Model for Visual Cognition Achieving Unsupervised Episodic and Semantic Feature Learning.

    PubMed

    Qiao, Hong; Li, Yinlin; Li, Fengfu; Xi, Xuanyang; Wu, Wei

    2016-10-01

    Recently, many biologically inspired visual computational models have been proposed. The design of these models follows the related biological mechanisms and structures, and these models provide new solutions for visual recognition tasks. In this paper, based on recent biological evidence, we propose a framework to mimic the active and dynamic learning and recognition process of the primate visual cortex. From the point of view of principles, the main contributions are that the framework can achieve unsupervised learning of episodic features (including key components and their spatial relations) and semantic features (semantic descriptions of the key components), which support higher-level cognition of an object. From the point of view of performance, the advantages of the framework are as follows: 1) learning episodic features without supervision: for a class of objects without prior knowledge, the key components, their spatial relations and cover regions can be learned automatically through a deep neural network (DNN); 2) learning semantic features based on episodic features: within the cover regions of the key components, the semantic geometrical values of these components can be computed based on contour detection; 3) forming the general knowledge of a class of objects: the general knowledge of a class of objects can be formed, mainly including the key components, their spatial relations and average semantic values, which is a concise description of the class; and 4) achieving higher-level cognition and dynamic updating: for a test image, the model can achieve classification and subclass semantic descriptions, and the test samples with high confidence are selected to dynamically update the whole model. Experiments are conducted on face images, and a good performance is achieved in each layer of the DNN and the semantic description learning process. Furthermore, the model can be generalized to recognition tasks of other objects with learning ability.

  11. Towards a Semantically-Enabled Control Strategy for Building Simulations: Integration of Semantic Technologies and Model Predictive Control

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Delgoshaei, Parastoo; Austin, Mark A.; Pertzborn, Amanda J.

    State-of-the-art building simulation control methods incorporate physical constraints into their mathematical models, but omit implicit constraints associated with policies of operation and dependency relationships among rules representing those constraints. To overcome these shortcomings, there is a recent trend toward enabling control strategies with inference-based rule-checking capabilities. One solution is to exploit semantic web technologies in building simulation control. Such approaches provide the tools for semantic modeling of domains, and the ability to deduce new information based on the models through use of Description Logic (DL). In a step toward enabling this capability, this paper presents a cross-disciplinary data-driven control strategy for building energy management simulation that integrates semantic modeling and formal rule-checking mechanisms into a Model Predictive Control (MPC) formulation. The results show that MPC provides superior levels of performance when initial conditions and inputs are derived from inference-based rules.

  12. Affixation in semantic space: Modeling morpheme meanings with compositional distributional semantics.

    PubMed

    Marelli, Marco; Baroni, Marco

    2015-07-01

    The present work proposes a computational model of morpheme combination at the meaning level. The model builds on the tenets of distributional semantics, and assumes that word meanings can be effectively represented by vectors recording their co-occurrence with other words in a large text corpus. Given this assumption, affixes are modeled as functions (matrices) mapping stems onto derived forms. Derived-form meanings can be thought of as the result of a combinatorial procedure that transforms the stem vector on the basis of the affix matrix (e.g., the meaning of nameless is obtained by multiplying the vector of name with the matrix of -less). We show that this architecture accounts for the remarkable human capacity for generating new words that denote novel meanings, correctly predicting semantic intuitions about novel derived forms. Moreover, the proposed compositional approach, once paired with a whole-word route, provides a new interpretative framework for semantic transparency, which is here partially explained in terms of the ease of the combinatorial procedure and the strength of the transformation brought about by the affix. Model-based predictions are in line with the modulation of semantic transparency on explicit intuitions about existing words, response times in lexical decision, and morphological priming. In conclusion, we introduce a computational model to account for morpheme combination at the meaning level. The model is data-driven, theoretically sound, and empirically supported, and it makes predictions that open new research avenues in the domain of semantic processing.
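    The affix-as-matrix idea lends itself to a compact sketch. The toy below learns a "-less" matrix from synthetic stem/derived vector pairs by least squares and applies it to a held-out stem. All vectors here are randomly generated stand-ins rather than corpus-derived representations, and the training procedure is only a simplified analogue of the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 20  # toy dimensionality; real distributional spaces use hundreds

# Synthetic "stem" vectors standing in for corpus-derived representations
# (e.g., hope, harm, use, ...), plus a ground-truth affix transformation.
train_stems = rng.normal(size=(30, DIM))
M_true = rng.normal(size=(DIM, DIM)) / np.sqrt(DIM)
train_derived = train_stems @ M_true + 0.01 * rng.normal(size=(30, DIM))

# Learn the "-less" matrix from (stem, derived) training pairs by least squares.
M_less, *_ = np.linalg.lstsq(train_stems, train_derived, rcond=None)

# Compose the meaning of a novel derived form: v("nameless") ~= v("name") @ M_less
name = rng.normal(size=DIM)
predicted = name @ M_less
target = name @ M_true
cos = float(predicted @ target / (np.linalg.norm(predicted) * np.linalg.norm(target)))
```

    With enough training pairs and low noise, the learned matrix generalizes to stems it has never seen, which is the property the abstract ascribes to novel derived forms.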

  13. ODMedit: uniform semantic annotation for data integration in medicine based on a public metadata repository.

    PubMed

    Dugas, Martin; Meidt, Alexandra; Neuhaus, Philipp; Storck, Michael; Varghese, Julian

    2016-06-01

    The volume and complexity of patient data - especially in personalised medicine - is steadily increasing, both regarding clinical data and genomic profiles: typically more than 1,000 items (e.g., laboratory values, vital signs, diagnostic tests, etc.) are collected per patient in clinical trials, and in oncology hundreds of mutations can potentially be detected for each patient by genomic profiling. Data integration from multiple sources therefore constitutes a key challenge for medical research and healthcare. Semantic annotation of data elements can facilitate the identification of matching data elements in different sources and thereby supports data integration. Millions of different annotations are required due to the semantic richness of patient data. These annotations should be uniform, i.e., two matching data elements shall contain the same annotations. However, large terminologies like SNOMED CT or UMLS do not provide uniform coding. It is proposed to develop semantic annotations of medical data elements based on a large-scale public metadata repository. To achieve uniform codes, semantic annotations shall be re-used if a matching data element is available in the metadata repository. A web-based tool called ODMedit ( https://odmeditor.uni-muenster.de/ ) was developed to create data models with uniform semantic annotations. It contains ~800,000 terms with semantic annotations which were derived from ~5,800 models from the portal of medical data models (MDM). The tool was successfully applied to manually annotate 22 forms with 292 data items from CDISC and to update 1,495 data models of the MDM portal. Uniform manual semantic annotation of data models is feasible in principle, but requires a large-scale collaborative effort due to the semantic richness of patient data. A web-based tool for these annotations is available, which is linked to a public metadata repository.

  14. No one way ticket from orthography to semantics in recognition memory: N400 and P200 effects of associations.

    PubMed

    Stuellein, Nicole; Radach, Ralph R; Jacobs, Arthur M; Hofmann, Markus J

    2016-05-15

    Computational models of word recognition have already successfully used associative spreading from orthographic to semantic levels to account for false memories. But can they also account for semantic effects on event-related potentials in a recognition memory task? To address this question, target words in the present study had either many or few semantic associates in the stimulus set. We found larger P200 amplitudes and smaller N400 amplitudes for old words in comparison to new words. Words with many semantic associates led to larger P200 amplitudes and a smaller N400 in comparison to words with a smaller number of semantic associations. We also obtained inverted response time and accuracy effects for old and new words: faster response times and fewer errors were found for old words that had many semantic associates, whereas new words with a large number of semantic associates produced slower response times and more errors. Both behavioral and electrophysiological results indicate that semantic associations between words can facilitate top-down driven lexical access and semantic integration in recognition memory. Our results support neurophysiologically plausible predictions of the Associative Read-Out Model, which suggests top-down connections from semantic to orthographic layers.

  15. An XML-Based Mission Command Language for Autonomous Underwater Vehicles (AUVs)

    DTIC Science & Technology

    2003-06-01

    P. XML: How To Program. Prentice Hall, Inc., Upper Saddle River, New Jersey, 2001. Digital Signature Activity Statement, W3C, www.w3.org/Signature ... languages because it does not directly specify how information is to be presented, but rather defines the structure (and thus semantics) of the ... command and control (C2) aspects of using XML to increase the utility of AUVs. XML programming will be addressed. Current mine warfare doctrine will be

  16. Medical Concept Normalization in Social Media Posts with Recurrent Neural Networks.

    PubMed

    Tutubalina, Elena; Miftahutdinov, Zulfat; Nikolenko, Sergey; Malykh, Valentin

    2018-06-12

    Text mining of scientific libraries and social media has already proven itself as a reliable tool for drug repurposing and hypothesis generation. The task of mapping a disease mention to a concept in a controlled vocabulary, typically to the standard thesaurus in the Unified Medical Language System (UMLS), is known as medical concept normalization. This task is challenging due to the differences in the use of medical terminology between health care professionals and social media texts coming from the lay public. To bridge this gap, we use sequence learning with recurrent neural networks and semantic representation of one- or multi-word expressions: we develop end-to-end architectures directly tailored to the task, including bidirectional Long Short-Term Memory, Gated Recurrent Units with an attention mechanism, and additional semantic similarity features based on UMLS. Our evaluation against a standard benchmark shows that recurrent neural networks improve results over an effective baseline for classification based on convolutional neural networks. A qualitative examination of mentions discovered in a dataset of user reviews collected from popular online health information platforms as well as a quantitative evaluation both show improvements in the semantic representation of health-related expressions in social media.

  17. Formal Semantics and Implementation of BPMN 2.0 Inclusive Gateways

    NASA Astrophysics Data System (ADS)

    Christiansen, David Raymond; Carbone, Marco; Hildebrandt, Thomas

    We present the first direct formalization of the semantics of inclusive gateways as described in the Business Process Modeling Notation (BPMN) 2.0 Beta 1 specification. The formal semantics is given for a minimal subset of BPMN 2.0 containing just the inclusive and exclusive gateways and the start and stop events. By focusing on this subset we achieve a simple graph model that highlights the particular non-local features of the inclusive gateway semantics. We sketch two ways of implementing the semantics using algorithms based on incrementally updated data structures and also discuss distributed communication-based implementations of the two algorithms.
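    The non-local character of the inclusive gateway can be illustrated with a small sketch. The function below is a simplified reading of the BPMN 2.0 enablement rule, not the paper's formalization, and its names are hypothetical: a process is a set of sequence flows with tokens on edges, and the gateway may fire only when no token elsewhere can still reach an empty incoming flow.

```python
from collections import defaultdict

def reachable(succ, start, goal):
    """DFS over sequence flows: can a token at node `start` reach node `goal`?"""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(succ.get(node, []))
    return False

def inclusive_gateway_enabled(flows, tokens, gateway):
    """flows: iterable of (src, dst) sequence flows; tokens: set of marked flows.

    The gateway is enabled when at least one incoming flow carries a token and
    no token elsewhere in the graph can still reach a token-free incoming flow
    (this upstream check is the non-local part of the semantics)."""
    succ = defaultdict(list)
    for u, v in flows:
        succ[u].append(v)
    incoming = [(u, v) for (u, v) in flows if v == gateway]
    if not any(e in tokens for e in incoming):
        return False
    for u, v in incoming:
        if (u, v) in tokens:
            continue
        for (x, y) in tokens:
            # a token on flow (x, y) will arrive at y; from there it might
            # still reach u and hence this empty incoming flow, so wait
            if y == u or reachable(succ, y, u):
                return False
    return True
```

    With two branches joining at a gateway, the gateway fires once one branch has arrived and the other branch provably carries no token, but it must wait while a token is still travelling toward the empty branch.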

  18. Document cards: a top trumps visualization for documents.

    PubMed

    Strobelt, Hendrik; Oelke, Daniela; Rohrdantz, Christian; Stoffel, Andreas; Keim, Daniel A; Deussen, Oliver

    2009-01-01

    Finding suitable, less space-consuming views for a document's main content is crucial to provide convenient access to large document collections on display devices of different sizes. We present a novel compact visualization which represents a document's key semantics as a mixture of images and important key terms, similar to cards in a top trumps game. The key terms are extracted using an advanced text mining approach based on fully automatic document structure extraction. The images and their captions are extracted using a graphical heuristic, and the captions are used for a semi-semantic image weighting. Furthermore, we use the image color histogram for classification and show at least one representative from each non-empty image class. The approach is demonstrated for the IEEE InfoVis publications of a complete year. The method can easily be applied to other publication collections and sets of documents which contain images.

  19. A "Semantic" View of Scientific Models for Science Education

    ERIC Educational Resources Information Center

    Adúriz-Bravo, Agustín

    2013-01-01

    In this paper I inspect a "semantic" view of scientific models taken from contemporary philosophy of science-I draw upon the so-called "semanticist family", which frontally challenges the received, syntactic conception of scientific theories. I argue that a semantic view may be of use both for science education in the…

  20. Learning the Semantics of Structured Data Sources

    ERIC Educational Resources Information Center

    Taheriyan, Mohsen

    2015-01-01

    Information sources such as relational databases, spreadsheets, XML, JSON, and Web APIs contain a tremendous amount of structured data, however, they rarely provide a semantic model to describe their contents. Semantic models of data sources capture the intended meaning of data sources by mapping them to the concepts and relationships defined by a…

  1. A DNA-based semantic fusion model for remote sensing data.

    PubMed

    Sun, Heng; Weng, Jian; Yu, Guangchuang; Massawe, Richard H

    2013-01-01

    Semantic technology plays a key role in various domains, from conversation understanding to algorithm analysis. As the most efficient semantic tool, ontology can represent, process and manage widespread knowledge. Nowadays, many researchers use ontology to collect and organize data's semantic information in order to maximize research productivity. In this paper, we first describe our work on the development of a remote sensing data ontology, with a primary focus on semantic fusion-driven research for big data. Our ontology is made up of 1,264 concepts and 2,030 semantic relationships. However, the growth of big data is straining the capacities of current semantic fusion and reasoning practices. Considering the massive parallelism of DNA strands, we propose a novel DNA-based semantic fusion model. In this model, a parallel strategy is developed to encode the semantic information in DNA for a large volume of remote sensing data. The semantic information is read in a parallel and bit-wise manner, and each individual bit is converted to a base. By doing so, a considerable amount of conversion time can be saved: the cluster-based multi-process program reduces the conversion time from 81,536 seconds to 4,937 seconds for 4.34 GB of source data files. Moreover, the size of the result file recording the DNA sequences is 54.51 GB for the parallel C program, compared with 57.89 GB for the sequential Perl program, showing that our parallel method can also reduce the DNA synthesis cost. In addition, data types are encoded in our model, which is a basis for building a type system in our future DNA computer. Finally, we describe theoretically an algorithm for DNA-based semantic fusion. This algorithm enables the integration of knowledge from disparate remote sensing data sources into a consistent, accurate, and complete representation. The process depends solely on ligation reactions and screening operations instead of the ontology.
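    The bit-wise conversion step can be sketched as follows. The abstract does not spell out the coding scheme, so the two-bits-per-base mapping below is an assumption for illustration only.

```python
BASES = "ACGT"  # hypothetical coding: 00->A, 01->C, 10->G, 11->T

def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, two bits per base, high bits first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)

def dna_to_bytes(seq: str) -> bytes:
    """Invert the mapping: every four bases reconstruct one byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)
```

    Because the mapping is fixed and byte-local, independent chunks of the input can be converted in parallel and concatenated, which matches the cluster-based parallel strategy described above.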

  2. A DNA-Based Semantic Fusion Model for Remote Sensing Data

    PubMed Central

    Sun, Heng; Weng, Jian; Yu, Guangchuang; Massawe, Richard H.

    2013-01-01

    Semantic technology plays a key role in various domains, from conversation understanding to algorithm analysis. As the most efficient semantic tool, ontology can represent, process and manage widespread knowledge. Nowadays, many researchers use ontology to collect and organize data's semantic information in order to maximize research productivity. In this paper, we first describe our work on the development of a remote sensing data ontology, with a primary focus on semantic fusion-driven research for big data. Our ontology is made up of 1,264 concepts and 2,030 semantic relationships. However, the growth of big data is straining the capacities of current semantic fusion and reasoning practices. Considering the massive parallelism of DNA strands, we propose a novel DNA-based semantic fusion model. In this model, a parallel strategy is developed to encode the semantic information in DNA for a large volume of remote sensing data. The semantic information is read in a parallel and bit-wise manner, and each individual bit is converted to a base. By doing so, a considerable amount of conversion time can be saved: the cluster-based multi-process program reduces the conversion time from 81,536 seconds to 4,937 seconds for 4.34 GB of source data files. Moreover, the size of the result file recording the DNA sequences is 54.51 GB for the parallel C program, compared with 57.89 GB for the sequential Perl program, showing that our parallel method can also reduce the DNA synthesis cost. In addition, data types are encoded in our model, which is a basis for building a type system in our future DNA computer. Finally, we describe theoretically an algorithm for DNA-based semantic fusion. This algorithm enables the integration of knowledge from disparate remote sensing data sources into a consistent, accurate, and complete representation. The process depends solely on ligation reactions and screening operations instead of the ontology. PMID:24116207

  3. Modeling the N400 ERP component as transient semantic over-activation within a neural network model of word comprehension

    PubMed Central

    Cheyette, Samuel J.; Plaut, David C.

    2016-01-01

    The study of the N400 event-related brain potential has provided fundamental insights into the nature of real-time comprehension processes, and its amplitude is modulated by a wide variety of stimulus and context factors. It is generally thought to reflect the difficulty of semantic access, but formulating a precise characterization of this process has proved difficult. Laszlo and colleagues (Laszlo & Plaut, 2012, Brain and Language, 120, 271-281; Laszlo & Armstrong, 2014, Brain and Language, 132, 22-27) used physiologically constrained neural networks to model the N400 as transient over-activation within semantic representations, arising as a consequence of the distribution of excitation and inhibition within and between cortical areas. The current work extends this approach to successfully model effects on both N400 amplitudes and behavior of word frequency, semantic richness, repetition, semantic and associative priming, and orthographic neighborhood size. The account is argued to be preferable to one based on “implicit semantic prediction error” (Rabovsky & McRae, 2014, Cognition, 132, 68-98) for a number of reasons, the most fundamental of which is that the current model actually produces N400-like waveforms in its real-time activation dynamics. PMID:27871623

  4. Modeling the N400 ERP component as transient semantic over-activation within a neural network model of word comprehension.

    PubMed

    Cheyette, Samuel J; Plaut, David C

    2017-05-01

    The study of the N400 event-related brain potential has provided fundamental insights into the nature of real-time comprehension processes, and its amplitude is modulated by a wide variety of stimulus and context factors. It is generally thought to reflect the difficulty of semantic access, but formulating a precise characterization of this process has proved difficult. Laszlo and colleagues (Laszlo & Plaut, 2012; Laszlo & Armstrong, 2014) used physiologically constrained neural networks to model the N400 as transient over-activation within semantic representations, arising as a consequence of the distribution of excitation and inhibition within and between cortical areas. The current work extends this approach to successfully model effects on both N400 amplitudes and behavior of word frequency, semantic richness, repetition, semantic and associative priming, and orthographic neighborhood size. The account is argued to be preferable to one based on "implicit semantic prediction error" (Rabovsky & McRae, 2014) for a number of reasons, the most fundamental of which is that the current model actually produces N400-like waveforms in its real-time activation dynamics.

  5. Predication-based semantic indexing: permutations as a means to encode predications in semantic space.

    PubMed

    Cohen, Trevor; Schvaneveldt, Roger W; Rindflesch, Thomas C

    2009-11-14

    Corpus-derived distributional models of semantic distance between terms have proved useful in a number of applications. For both theoretical and practical reasons, it is desirable to extend these models to encode discrete concepts and the ways in which they are related to one another. In this paper, we present a novel vector space model that encodes semantic predications derived from MEDLINE by the SemRep system into a compact spatial representation. The associations captured by this method are of a different and complementary nature to those derived by traditional vector space models, and the encoding of predication types presents new possibilities for knowledge discovery and information retrieval.
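    A minimal sketch of the permutation idea, in the spirit of random indexing: each predicate type is assigned a fixed coordinate permutation, a predication is stored by adding the permuted object vector to the subject's semantic vector, and querying applies the inverse permutation. The vocabulary and predicate here are hypothetical, and the code is a toy analogue rather than the system described above.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 1024

# Hypothetical vocabulary; elemental vectors are fixed random index vectors.
elemental = {t: rng.choice([-1.0, 1.0], size=DIM) for t in ("aspirin", "headache", "fever")}
perm = {"TREATS": rng.permutation(DIM)}           # one permutation per predicate type
semantic = {t: np.zeros(DIM) for t in elemental}  # learned semantic vectors

def encode(subj, pred, obj):
    """Store the predication subj-PRED-obj by adding the permuted object vector."""
    semantic[subj] += elemental[obj][perm[pred]]

def query(subj, pred):
    """Reverse the predicate permutation and return the best-matching concept."""
    inv = np.argsort(perm[pred])  # inverse permutation
    probe = semantic[subj][inv]
    sims = {t: float(probe @ v) / (np.linalg.norm(probe) * np.linalg.norm(v) + 1e-12)
            for t, v in elemental.items()}
    return max(sims, key=sims.get)

encode("aspirin", "TREATS", "headache")
```

    Because unrelated random index vectors are nearly orthogonal in high dimensions, the inverse-permuted probe matches the stored object far more strongly than any other concept, so the predicate type itself is recoverable from the spatial representation.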

  6. Semantic Boost on Episodic Associations: An Empirically-Based Computational Model

    ERIC Educational Resources Information Center

    Silberman, Yaron; Bentin, Shlomo; Miikkulainen, Risto

    2007-01-01

    Words become associated following repeated co-occurrence episodes. This process might be further determined by the semantic characteristics of the words. The present study focused on how semantic and episodic factors interact in incidental formation of word associations. First, we found that human participants associate semantically related words…

  7. Semantic Search of Web Services

    ERIC Educational Resources Information Center

    Hao, Ke

    2013-01-01

    This dissertation addresses semantic search of Web services using natural language processing. We first survey various existing approaches, focusing on the fact that the expensive costs of current semantic annotation frameworks result in limited use of semantic search for large scale applications. We then propose a vector space model based service…

  8. Semantic Weight and Verb Retrieval in Aphasia

    ERIC Educational Resources Information Center

    Barde, Laura H. F.; Schwartz, Myrna F.; Boronat, Consuelo B.

    2006-01-01

    Individuals with agrammatic aphasia may have difficulty with verb production in comparison to nouns. Additionally, they may have greater difficulty producing verbs that have fewer semantic components (i.e., are semantically "light") compared to verbs that have greater semantic weight. A connectionist verb-production model proposed by Gordon and…

  9. An Intelligent Semantic E-Learning Framework Using Context-Aware Semantic Web Technologies

    ERIC Educational Resources Information Center

    Huang, Weihong; Webster, David; Wood, Dawn; Ishaya, Tanko

    2006-01-01

    Recent developments of e-learning specifications such as Learning Object Metadata (LOM), Sharable Content Object Reference Model (SCORM), Learning Design and other pedagogy research in semantic e-learning have shown a trend of applying innovative computational techniques, especially Semantic Web technologies, to promote existing content-focused…

  10. Dual Sticky Hierarchical Dirichlet Process Hidden Markov Model and Its Application to Natural Language Description of Motions.

    PubMed

    Hu, Weiming; Tian, Guodong; Kang, Yongxin; Yuan, Chunfeng; Maybank, Stephen

    2017-09-25

    In this paper, a new nonparametric Bayesian model called the dual sticky hierarchical Dirichlet process hidden Markov model (HDP-HMM) is proposed for mining activities from a collection of time series data such as trajectories. All the time series data are clustered. Each cluster of time series data, corresponding to a motion pattern, is modeled by an HMM. Our model postulates a set of HMMs that share a common set of states (topics in an analogy with topic models for document processing), but have unique transition distributions. For the application to motion trajectory modeling, topics correspond to motion activities. The learnt topics are clustered into atomic activities which are assigned predicates. We propose a Bayesian inference method to decompose a given trajectory into a sequence of atomic activities. On combining the learnt sources and sinks, semantic motion regions, and the learnt sequence of atomic activities, the action represented by the trajectory can be described in natural language in as automatic a way as possible. The effectiveness of our dual sticky HDP-HMM is validated on several trajectory datasets. The effectiveness of the natural language descriptions for motions is demonstrated on the vehicle trajectories extracted from a traffic scene.

  11. Spreading Activation in an Attractor Network with Latching Dynamics: Automatic Semantic Priming Revisited

    PubMed Central

    Lerner, Itamar; Bentin, Shlomo; Shriki, Oren

    2012-01-01

    Localist models of spreading activation (SA) and models assuming distributed representations offer very different takes on semantic priming, a widely investigated paradigm in word recognition and semantic memory research. In the present study we implemented SA in an attractor neural network model with distributed representations and created a unified framework for the two approaches. Our model assumes a synaptic depression mechanism leading to autonomous transitions between encoded memory patterns (latching dynamics), which accounts for the major characteristics of automatic semantic priming in humans. Using computer simulations we demonstrated how findings that challenged attractor-based networks in the past, such as mediated and asymmetric priming, are a natural consequence of our present model's dynamics. Puzzling results regarding backward priming were also given a straightforward explanation. In addition, the current model addresses some of the differences between semantic and associative relatedness and explains how these differences interact with stimulus onset asynchrony in priming experiments. PMID:23094718

  12. Establishing semantic interoperability of biomedical metadata registries using extended semantic relationships.

    PubMed

    Park, Yu Rang; Yoon, Young Jo; Kim, Hye Hyeon; Kim, Ju Han

    2013-01-01

    Achieving semantic interoperability is critical for biomedical data sharing between individuals, organizations and systems. The ISO/IEC 11179 MetaData Registry (MDR) standard has been recognized as one of the solutions for this purpose. The standard model, however, is limited: representing concepts that consist of two or more values, for instance blood pressure with its systolic and diastolic components, is not allowed. We addressed the structural limitations of ISO/IEC 11179 with an integrated metadata object model in our previous research. In the present study, we introduce semantic extensions for the model by defining three new types of semantic relationships: dependency, composite and variable relationships. To evaluate our extensions in a real-world setting, we measured the efficiency of metadata reduction achieved by mapping new elements to existing ones. We extracted metadata from the College of American Pathologists Cancer Protocols and then evaluated our extensions. With no semantic loss, one third of the extracted metadata could be successfully eliminated, suggesting a better strategy for implementing clinical MDRs with improved efficiency and utility.

  13. Semantic concept-enriched dependence model for medical information retrieval.

    PubMed

    Choi, Sungbin; Choi, Jinwook; Yoo, Sooyoung; Kim, Heechun; Lee, Youngho

    2014-02-01

    In medical information retrieval research, semantic resources have been mostly used by expanding the original query terms or estimating the concept importance weight. However, implicit term-dependency information contained in semantic concept terms has been overlooked or at least underused in most previous studies. In this study, we incorporate a semantic concept-based term-dependence feature into a formal retrieval model to improve its ranking performance. Standardized medical concept terms used by medical professionals were assumed to have implicit dependency within the same concept. We hypothesized that, by elaborately revising the ranking algorithms to favor documents that preserve those implicit dependencies, the ranking performance could be improved. The implicit dependence features are harvested from the original query using MetaMap. These semantic concept-based dependence features were incorporated into a semantic concept-enriched dependence model (SCDM). We designed four different variants of the model, with each variant having distinct characteristics in the feature formulation method. We performed leave-one-out cross validations on both a clinical document corpus (TREC Medical records track) and a medical literature corpus (OHSUMED), which are representative test collections in medical information retrieval research. Our semantic concept-enriched dependence model consistently outperformed other state-of-the-art retrieval methods. Analysis shows that the performance gain has occurred independently of the concept's explicit importance in the query. By capturing implicit knowledge with regard to the query term relationships and incorporating them into a ranking model, we could build a more robust and effective retrieval model, independent of the concept importance.

  14. DynGO: a tool for visualizing and mining of Gene Ontology and its associations

    PubMed Central

    Liu, Hongfang; Hu, Zhang-Zhi; Wu, Cathy H

    2005-01-01

    Background A large volume of data and information about genes and gene products has been stored in various molecular biology databases. A major challenge for knowledge discovery using these databases is to identify related genes and gene products in disparate databases. The development of Gene Ontology (GO) as a common vocabulary for annotation allows integrated queries across multiple databases and identification of semantically related genes and gene products (i.e., genes and gene products that have similar GO annotations). Meanwhile, dozens of tools have been developed for browsing, mining or editing GO terms, their hierarchical relationships, or their "associated" genes and gene products (i.e., genes and gene products annotated with GO terms). Tools that allow users to directly search and inspect relations among all GO terms and their associated genes and gene products from multiple databases are needed. Results We present a standalone package called DynGO, which provides several advanced functionalities in addition to the standard browsing capability of the official GO browsing tool (AmiGO). DynGO allows users to conduct batch retrieval of GO annotations for a list of genes and gene products, and semantic retrieval of genes and gene products sharing similar GO annotations. The results are shown in an association tree organized according to GO hierarchies and supported with many dynamic display options such as sorting tree nodes or changing the orientation of the tree. For GO curators and frequent GO users, DynGO provides fast and convenient access to GO annotation data. DynGO is generally applicable to any data set where the records are annotated with GO terms, as illustrated by two examples. Conclusion We have presented a standalone package, DynGO, that provides functionalities to search and browse GO and its association databases as well as several additional functions such as batch retrieval and semantic retrieval. The complete documentation and software are freely available for download from the website. PMID:16091147
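    The semantic-retrieval notion above (genes sharing similar GO annotations) can be approximated with a simple set-overlap measure. DynGO's actual similarity computation is not described here, so the Jaccard score below is only an illustrative stand-in, and the GO IDs in the example are arbitrary.

```python
def go_similarity(annots_a, annots_b):
    """Jaccard overlap between two genes' sets of GO term IDs:
    |A ∩ B| / |A ∪ B|, ranging from 0 (disjoint) to 1 (identical)."""
    a, b = set(annots_a), set(annots_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0
```

    Ranking all genes in a collection by this score against a query gene's annotation set gives a crude semantic retrieval; richer measures would also weight shared terms by their depth in the GO hierarchy.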

  15. Generic Information Can Retrieve Known Biological Associations: Implications for Biomedical Knowledge Discovery

    PubMed Central

    van Haagen, Herman H. H. B. M.; 't Hoen, Peter A. C.; Mons, Barend; Schultes, Erik A.

    2013-01-01

    Motivation: Weighted semantic networks built from text-mined literature can be used to retrieve known protein-protein or gene-disease associations, and have been shown to anticipate associations years before they are explicitly stated in the literature. Our text-mining system recognizes over 640,000 biomedical concepts: some are specific (e.g., names of genes or proteins), others generic (e.g., ‘Homo sapiens’). Generic concepts may play important roles in automated information retrieval, extraction, and inference, but may also result in concept overload and confound retrieval and reasoning with low-relevance or even spurious links. Here, we attempted to optimize the retrieval performance for protein-protein interactions (PPI) by filtering generic concepts (node filtering) or links to generic concepts (edge filtering) from a weighted semantic network. First, we defined metrics based on network properties that quantify the specificity of concepts. Then, using these metrics, we systematically filtered generic information from the network while monitoring retrieval performance of known protein-protein interactions. We also systematically filtered specific information from the network (inverse filtering) and assessed the retrieval performance of networks composed of generic information alone. Results: Filtering generic or specific information induced a two-phase response in retrieval performance: initially the effects of filtering were minimal, but beyond a critical threshold network performance dropped suddenly. Contrary to expectations, networks composed exclusively of generic information demonstrated retrieval performance comparable to unfiltered networks that also contain specific concepts. Furthermore, an analysis using individual generic concepts demonstrated that they can effectively support the retrieval of known protein-protein interactions. For instance, the concept “binding” is indicative for PPI retrieval and the concept “mutation abnormality” is indicative for gene-disease associations. Conclusion: Generic concepts are important for information retrieval and cannot be removed from semantic networks without negative impact on retrieval performance. PMID:24260124
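
    As an illustration of the node-filtering procedure this record describes, the sketch below builds a tiny weighted network and removes low-specificity (hub-like) concepts. The inverse-degree metric, the example concepts, and the threshold are assumptions for illustration only; the paper defines its own network-property metrics.

    ```python
    from collections import defaultdict

    def build_graph(edges):
        """Adjacency map for an undirected weighted semantic network."""
        g = defaultdict(dict)
        for a, b, w in edges:
            g[a][b] = w
            g[b][a] = w
        return g

    def specificity(graph, node):
        """Toy specificity metric: inverse degree. Hub-like generic concepts
        connect to many others and therefore score low."""
        return 1.0 / len(graph[node])

    def filter_generic(graph, threshold):
        """Node filtering: drop concepts whose specificity is below threshold."""
        keep = {n for n in graph if specificity(graph, n) >= threshold}
        return {n: {m: w for m, w in nbrs.items() if m in keep}
                for n, nbrs in graph.items() if n in keep}

    edges = [
        ("TP53", "MDM2", 0.9),            # specific protein-protein link
        ("TP53", "binding", 0.4),         # links to generic concepts
        ("MDM2", "binding", 0.4),
        ("BRCA1", "binding", 0.3),
        ("binding", "Homo sapiens", 0.2),
        ("Homo sapiens", "TP53", 0.1),
        ("Homo sapiens", "MDM2", 0.1),
        ("Homo sapiens", "BRCA1", 0.1),
    ]
    g = build_graph(edges)
    # 'binding' and 'Homo sapiens' are hubs (degree 4), specificity 0.25: dropped.
    filtered = filter_generic(g, threshold=0.3)
    print(sorted(filtered))
    ```

    Edge filtering would instead keep all nodes and drop only the links incident to low-specificity concepts.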

  16. The semantic-similarity effect in children: influence of long-term knowledge on verbal short-term memory.

    PubMed

    Monnier, Catherine; Bonthoux, Françoise

    2011-11-01

    The present research was designed to highlight the relation between children's categorical knowledge and their verbal short-term memory (STM) performance. To do this, we manipulated the categorical organization of the words composing lists to be memorized by 5- and 9-year-old children. Three types of word list were drawn up: semantically similar context-dependent (CD) lists, semantically similar context-independent (CI) lists, and semantically dissimilar lists. In line with the procedure used by Poirier and Saint-Aubin (1995), the dissimilar lists were produced using words from the semantically similar lists. Both 5- and 9-year-old children showed better recall for the semantically similar CD lists than they did for the unrelated lists. In the semantically similar CI condition, semantic similarity enhanced immediate serial recall only at age 9 but contributed to item information memory at both ages 5 and 9. These results, which indicate a semantic influence of long-term memory (LTM) on serial recall from age 5, are discussed in the light of current models of STM. Moreover, we suggest that differences between results at 5 and 9 years are compatible with pluralist models of development. ©2011 The British Psychological Society.

  17. Jointly learning word embeddings using a corpus and a knowledge base

    PubMed Central

    Bollegala, Danushka; Maehara, Takanori; Kawarabayashi, Ken-ichi

    2018-01-01

    Methods for representing the meaning of words in vector spaces purely using the information distributed in text corpora have proved to be very valuable in various text mining and natural language processing (NLP) tasks. However, these methods still disregard the valuable semantic relational structure between words in co-occurring contexts. These beneficial semantic relational structures are contained in manually-created knowledge bases (KBs) such as ontologies and semantic lexicons, where the meanings of words are represented by defining the various relationships that exist among those words. We combine the knowledge in both a corpus and a KB to learn better word embeddings. Specifically, we propose a joint word representation learning method that uses the knowledge in the KBs, and simultaneously predicts the co-occurrences of two words in a corpus context. In particular, we use the corpus to define our objective function subject to the relational constraints derived from the KB. We further utilise the corpus co-occurrence statistics to propose two novel approaches, Nearest Neighbour Expansion (NNE) and Hedged Nearest Neighbour Expansion (HNE), that dynamically expand the KB and therefore derive more constraints that guide the optimisation process. Our experimental results over a wide range of benchmark tasks demonstrate that the proposed method statistically significantly improves the accuracy of the word embeddings learnt. It outperforms a corpus-only baseline and improves on a number of previously proposed methods that incorporate corpora and KBs in both semantic similarity prediction and word analogy detection tasks. PMID:29529052
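
    The joint learning idea — fitting corpus co-occurrence statistics subject to KB-derived constraints — can be sketched as a toy gradient-descent loop. The GloVe-style squared loss, the vocabulary, the counts, and the quadratic constraint term below are illustrative assumptions, not the authors' actual objective.

    ```python
    import math, random

    random.seed(0)
    DIM = 8
    vocab = ["postman", "mailman", "truck", "deliver"]
    vec = {w: [random.uniform(-0.5, 0.5) for _ in range(DIM)] for w in vocab}

    # Toy co-occurrence counts (corpus signal) and a KB synonym pair (constraint).
    cooc = {("postman", "deliver"): 20, ("mailman", "deliver"): 18,
            ("postman", "truck"): 5, ("mailman", "truck"): 6}
    kb_pairs = [("postman", "mailman")]  # e.g. from a semantic lexicon

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def step(lr=0.02, lam=0.5):
        # Corpus term: push u_i . u_j toward log(count), GloVe-style.
        for (wi, wj), c in cooc.items():
            err = dot(vec[wi], vec[wj]) - math.log(c)
            for k in range(DIM):
                gi, gj = err * vec[wj][k], err * vec[wi][k]
                vec[wi][k] -= lr * gi
                vec[wj][k] -= lr * gj
        # KB term: pull related words' vectors together (the relational constraint).
        for wa, wb in kb_pairs:
            for k in range(DIM):
                d = vec[wa][k] - vec[wb][k]
                vec[wa][k] -= lr * lam * d
                vec[wb][k] += lr * lam * d

    def cosine(u, v):
        return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))

    before = cosine(vec["postman"], vec["mailman"])
    for _ in range(300):
        step()
    after = cosine(vec["postman"], vec["mailman"])
    print(round(before, 3), round(after, 3))
    ```

    After training, the KB pair's vectors end up close in cosine terms, which is the behavior the constraint is meant to enforce; NNE/HNE would additionally add constraints for corpus-derived neighbours.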

  18. Word Recognition is Affected by the Meaning of Orthographic Neighbours: Evidence from Semantic Decision Tasks

    ERIC Educational Resources Information Center

    Boot, Inge; Pecher, Diane

    2008-01-01

    Many models of word recognition predict that neighbours of target words will be activated during word processing. Cascaded models can make the additional prediction that semantic features of those neighbours get activated before the target has been uniquely identified. In two semantic decision tasks neighbours that were congruent (i.e., from the…

  19. Determinants of Multiple Semantic Priming: A Meta-Analysis and Spike Frequency Adaptive Model of a Cortical Network

    ERIC Educational Resources Information Center

    Lavigne, Frederic; Dumercy, Laurent; Darmon, Nelly

    2011-01-01

    Recall and language comprehension while processing sequences of words involves multiple semantic priming between several related and/or unrelated words. Accounting for multiple and interacting priming effects in terms of underlying neuronal structure and dynamics is a challenge for current models of semantic priming. Further elaboration of current…

  20. When Sufficiently Processed, Semantically Related Distractor Pictures Hamper Picture Naming.

    PubMed

    Matushanskaya, Asya; Mädebach, Andreas; Müller, Matthias M; Jescheniak, Jörg D

    2016-11-01

    Prominent speech production models view lexical access as a competitive process. According to these models, a semantically related distractor picture should interfere with target picture naming more strongly than an unrelated one. However, several studies failed to obtain such an effect. Here, we demonstrate that semantic interference is obtained when the distractor picture is sufficiently processed. Participants named one of two pictures presented in close temporal succession, with color cueing the target. Experiment 1 induced the prediction that the target appears first. When this prediction was violated (distractor first), semantic interference was observed. Experiment 2 ruled out that the time available for distractor processing was the driving force. These results show that semantically related distractor pictures interfere with the naming response when they are sufficiently processed. The data thus provide further support for models viewing lexical access as a competitive process.

  1. Three-Mode Models and Individual Differences in Semantic Differential Data.

    ERIC Educational Resources Information Center

    Murakami, Takashi; Kroonenberg, Pieter M.

    2003-01-01

    Demonstrated how individual differences in semantic differential data can be modeled and assessed using three-mode models by studying the characterization of Chopin's "Preludes" by 38 Japanese college students. (SLD)

  2. The COPD Knowledge Base: enabling data analysis and computational simulation in translational COPD research.

    PubMed

    Cano, Isaac; Tényi, Ákos; Schueller, Christine; Wolff, Martin; Huertas Migueláñez, M Mercedes; Gomez-Cabrero, David; Antczak, Philipp; Roca, Josep; Cascante, Marta; Falciani, Francesco; Maier, Dieter

    2014-11-28

    Previously we generated a chronic obstructive pulmonary disease (COPD) specific knowledge base (http://www.copdknowledgebase.eu) from clinical and experimental data, text-mining results and public databases. This knowledge base allowed the retrieval of specific molecular networks together with integrated clinical and experimental data. The COPDKB has now been extended to integrate over 40 public data sources on functional interaction (e.g. signal transduction, transcriptional regulation, protein-protein interaction, gene-disease association). In addition we integrated COPD-specific expression and co-morbidity networks connecting over 6 000 genes/proteins with physiological parameters and disease states. Three mathematical models describing different aspects of systemic effects of COPD were connected to clinical and experimental data. We have completely redesigned the technical architecture of the user interface and now provide HTML and web browser-based access and form-based searches. A network search enables the use of interconnecting information and the generation of disease-specific sub-networks from general knowledge. Integration with the Synergy-COPD Simulation Environment enables multi-scale integrated simulation of individual computational models while integration with a Clinical Decision Support System allows delivery into clinical practice. The COPD Knowledge Base is the only publicly available knowledge resource dedicated to COPD and combining genetic information with molecular, physiological and clinical data as well as mathematical modelling. Its integrated analysis functions provide overviews about clinical trends and connections while its semantically mapped content enables complex analysis approaches. We plan to further extend the COPDKB by offering it as a repository to publish and semantically integrate data from relevant clinical trials. The COPDKB is freely available after registration at http://www.copdknowledgebase.eu.

  3. A Domain Independent Framework for Extracting Linked Semantic Data from Tables

    DTIC Science & Technology

    2012-01-01

    Evidence-based medicine, the essential role of systematic reviews, and the need for automated text mining tools. In: Proc. 1st ACM Int. Health... based medicine: what it is and what it isn't. BMJ 312(7023), 71 (1996) 23. Sahoo, S.S., Halb, W., Hellmann, S., Idehen, K., Thibodeau Jr, T., Auer, S., Se... Journal of Artificial Intelligence Research 11(1), 95–130 (1999) 22. Sackett, D., Rosenberg, W., Gray, J., Haynes, R., Richardson, W.: Evidence

  4. Temporal Representation in Semantic Graphs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Levandoski, J J; Abdulla, G M

    2007-08-07

    A wide range of knowledge discovery and analysis applications, ranging from business to biological, make use of semantic graphs when modeling relationships and concepts. Most of the semantic graphs used in these applications are assumed to be static pieces of information, meaning that the temporal evolution of concepts and relationships is not taken into account. Guided by the need for more advanced semantic graph queries involving temporal concepts, this paper surveys the existing work involving temporal representations in semantic graphs.

  5. Semantic Processing Persists despite Anomalous Syntactic Category: ERP Evidence from Chinese Passive Sentences.

    PubMed

    Yang, Yang; Wu, Fuyun; Zhou, Xiaolin

    2015-01-01

    The syntax-first model and the parallel/interactive models make different predictions regarding whether syntactic category processing has a temporal and functional primacy over semantic processing. To further resolve this issue, an event-related potential experiment was conducted on 24 Chinese speakers reading Chinese passive sentences with the passive marker BEI (NP1 + BEI + NP2 + Verb). This construction was selected because it is the most commonly used Chinese passive and very much resembles German passives, upon which the syntax-first hypothesis was primarily based. We manipulated semantic consistency (consistent vs. inconsistent) and syntactic category (noun vs. verb) of the critical verb, yielding four conditions: CORRECT (correct sentences), SEMANTIC (semantic anomaly), SYNTACTIC (syntactic category anomaly), and COMBINED (combined anomalies). Results showed both N400 and P600 effects for sentences with semantic anomaly, with syntactic category anomaly, or with combined anomalies. Converging with recent findings of Chinese ERP studies on various constructions, our study provides further evidence that syntactic category processing does not precede semantic processing in reading Chinese.

  6. SAS- Semantic Annotation Service for Geoscience resources on the web

    NASA Astrophysics Data System (ADS)

    Elag, M.; Kumar, P.; Marini, L.; Li, R.; Jiang, P.

    2015-12-01

    There is a growing need for increased integration across the data and model resources that are disseminated on the web to advance their reuse across different earth science applications. Meaningful reuse of resources requires semantic metadata to realize the semantic web vision of allowing pragmatic linkage and integration among resources. Semantic metadata associates standard metadata with resources to turn them into semantically-enabled resources on the web. However, the lack of a common standardized metadata framework, as well as the uncoordinated use of metadata fields across different geo-information systems, has led to a situation in which standards and related Standard Names abound. To address this need, we have designed SAS to provide a bridge between the core ontologies required to annotate resources and information systems, in order to enable queries and analysis over annotations from a single environment (the web). SAS is one of the services provided by the Geosemantic framework, a decentralized semantic framework that supports integration between models and data and allows semantically heterogeneous resources to interact with minimal human intervention. Here we present the design of SAS and demonstrate its application for annotating data and models. First we describe how predicates and their attributes are extracted from standards and ingested into the knowledge base of the Geosemantic framework. Then we illustrate the application of SAS in annotating data managed by SEAD and annotating simulation models that have a web interface. SAS is a step in a broader approach to raise the quality of geoscience data and models published on the web and to allow users to better search, access, and use existing resources based on standard vocabularies that are encoded and published using semantic technologies.

  7. Application of the Financial Industry Business Ontology (FIBO) for development of a financial organization ontology

    NASA Astrophysics Data System (ADS)

    Petrova, G. G.; Tuzovsky, A. F.; Aksenova, N. V.

    2017-01-01

    The article considers an approach to formalized description and meaning harmonization of financial terms by means of semantic modeling. Ontologies for the semantic models are described with the help of special languages developed for the Semantic Web. Results of applying FIBO to different tasks in the Russian financial sector are given.

  8. Semantic attributes based texture generation

    NASA Astrophysics Data System (ADS)

    Chi, Huifang; Gan, Yanhai; Qi, Lin; Dong, Junyu; Madessa, Amanuel Hirpa

    2018-04-01

    Semantic attributes are commonly used for texture description. They can be used to describe the information of a texture, such as patterns, textons, distributions, brightness, and so on. Generally speaking, semantic attributes are more concrete descriptors than perceptual features. Therefore, it is practical to generate texture images from semantic attributes. In this paper, we propose to generate high-quality texture images from semantic attributes. Over the last two decades, several works have been done on texture synthesis and generation. Most of them focus on example-based texture synthesis and procedural texture generation. Semantic attributes based texture generation still deserves more attention. Gan et al. proposed a useful joint model for perception-driven texture generation. However, perceptual features are nonobjective spatial statistics used by humans to distinguish different textures in pre-attentive situations. To provide more descriptive information about texture appearance, semantic attributes, which are more in line with human description habits, are desired. In this paper, we use a sigmoid cross-entropy loss in an auxiliary model to provide enough information for a generator. Consequently, the discriminator is released from the relatively intractable mission of figuring out the joint distribution of condition vectors and samples. To demonstrate the validity of our method, we compare it with Gan et al.'s method on generating textures in experiments on PTD and DTD. All experimental results show that our model can generate textures from semantic attributes.

  9. Children and adolescents' performance on a medium-length/nonsemantic word-list test.

    PubMed

    Flores-Lázaro, Julio César; Salgado Soruco, María Alejandra; Stepanov, Igor I

    2017-01-01

    Word-list learning tasks are among the most important and frequently used tests for declarative memory evaluation. For example, the California Verbal Learning Test-Children's Version (CVLT-C) and Rey Auditory Verbal Learning Test provide important information about different cognitive-neuropsychological processes. However, the impact of test length (i.e., number of words) and semantic organization (i.e., type of words) on children's and adolescents' memory performance remains to be clarified, especially during this developmental stage. To explore whether a medium-length non-semantically organized test can produce the typical curvilinear performance that semantically organized tests produce, reflecting executive control, we studied and compared the cognitive performance of normal children and adolescents by utilizing mathematical modeling. The model is based on the first-order system transfer function and has been successfully applied to learning curves for the CVLT-C (15 words, semantically organized paradigm). Results indicate that learning nine semantically unrelated words produces typical curvilinear (executive function) performance in children and younger adolescents and that performance could be effectively analyzed with the mathematical model. This indicates that the exponential increase (curvilinear performance) of correctly learned words does not solely depend on semantic and/or length features. This type of test controls for semantic and length effects and may represent a complementary tool for executive function evaluation in clinical populations in which semantic and/or length processing is affected.
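
    A minimal sketch of the curve-fitting idea, assuming a simple first-order form y(n) = A(1 − e^(−kn)) and hypothetical recall scores; the published model is derived from the first-order system transfer function and may be parameterized differently.

    ```python
    import math

    trials = [1, 2, 3, 4, 5]
    recall = [4.1, 6.3, 7.5, 8.2, 8.6]   # hypothetical words recalled per trial

    def predict(A, k, n):
        # First-order response: performance rises exponentially toward asymptote A.
        return A * (1.0 - math.exp(-k * n))

    def sse(A, k):
        return sum((predict(A, k, n) - y) ** 2 for n, y in zip(trials, recall))

    # Crude grid search over asymptote A and rate k (stand-in for least squares).
    best = min(((sse(A / 10, k / 100), A / 10, k / 100)
                for A in range(50, 151)        # asymptote 5.0 .. 15.0
                for k in range(5, 201)),       # rate 0.05 .. 2.00
               key=lambda t: t[0])
    err, A_hat, k_hat = best
    print(round(A_hat, 1), round(k_hat, 2), round(err, 4))
    ```

    The fitted rate parameter summarizes how quickly recall approaches its asymptote, which is the aspect of the learning curve linked to executive control in this line of work.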

  10. Integrating the automatic and the controlled: Strategies in Semantic Priming in an Attractor Network with Latching Dynamics

    PubMed Central

    Lerner, Itamar; Bentin, Shlomo; Shriki, Oren

    2014-01-01

    Semantic priming has long been recognized to reflect, along with automatic semantic mechanisms, the contribution of controlled strategies. However, previous theories of controlled priming were mostly qualitative, lacking common grounds with modern mathematical models of automatic priming based on neural networks. Recently, we have introduced a novel attractor network model of automatic semantic priming with latching dynamics. Here, we extend this work to show how the same model can also account for important findings regarding controlled processes. Assuming the rate of semantic transitions in the network can be adapted using simple reinforcement learning, we show how basic findings attributed to controlled processes in priming can be achieved, including their dependency on stimulus onset asynchrony and relatedness proportion and their unique effect on associative, category-exemplar, mediated and backward prime-target relations. We discuss how our mechanism relates to the classic expectancy theory and how it can be further extended in future developments of the model. PMID:24890261

  11. Quality models for audiovisual streaming

    NASA Astrophysics Data System (ADS)

    Thang, Truong Cong; Kim, Young Suk; Kim, Cheon Seog; Ro, Yong Man

    2006-01-01

    Quality is an essential factor in multimedia communication, especially in compression and adaptation. Quality metrics can be divided into three categories: within-modality quality, cross-modality quality, and multi-modality quality. Most research has so far focused on within-modality quality. Moreover, quality is normally considered only from the perceptual perspective. In practice, content may be drastically adapted, even converted to another modality. In this case, we should consider quality from the semantic perspective as well. In this work, we investigate multi-modality quality from the semantic perspective. To model the semantic quality, we apply the concept of the "conceptual graph", which consists of semantic nodes and relations between the nodes. As a typical multi-modality example, we focus on an audiovisual streaming service. Specifically, we evaluate the amount of information conveyed by audiovisual content where both video and audio channels may be strongly degraded, or audio may even be converted to text. In the experiments, we also consider the perceptual quality model of audiovisual content, so as to see the difference with the semantic quality model.
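
    The conceptual-graph notion of semantic quality can be sketched as the fraction of concept nodes and relations that survive adaptation. The example graphs and the scoring rule below are illustrative assumptions, not the authors' metric.

    ```python
    def semantic_quality(original, adapted):
        """Toy semantic quality: fraction of the original conceptual graph
        (concept nodes plus labeled relations) preserved after adaptation."""
        nodes_o, rels_o = original
        nodes_a, rels_a = adapted
        total = len(nodes_o) + len(rels_o)
        kept = len(nodes_o & nodes_a) + len(rels_o & rels_a)
        return kept / total

    # Hypothetical conceptual graph of a news clip: concepts plus
    # (subject, relation, object) edges.
    full = ({"reporter", "flood", "city"},
            {("reporter", "describes", "flood"), ("flood", "hits", "city")})
    # Video dropped, audio converted to text: one visual concept and its edge lost.
    text_only = ({"reporter", "flood"},
                 {("reporter", "describes", "flood")})
    print(semantic_quality(full, text_only))  # 3 of 5 elements survive -> 0.6
    ```

    A perceptual quality model would instead score signal degradation (e.g. blur, bitrate), so the two can diverge sharply under modality conversion.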

  12. Predicting Word Maturity from Frequency and Semantic Diversity: A Computational Study

    ERIC Educational Resources Information Center

    Jorge-Botana, Guillermo; Olmos, Ricardo; Sanjosé, Vicente

    2017-01-01

    Semantic word representation changes over different ages of childhood until it reaches its adult form. One method to formally model this change is the word maturity paradigm. This method uses a text sample for each age, including adult age, and transforms the samples into a semantic space by means of Latent Semantic Analysis. The representation of…

  13. Multiple Meanings Are Not Necessarily a Disadvantage in Semantic Processing: Evidence from Homophone Effects in Semantic Categorisation

    ERIC Educational Resources Information Center

    Siakaluk, Paul D.; Pexman, Penny M.; Sears, Christopher R.; Owen, William J.

    2007-01-01

    The ambiguity disadvantage (slower processing of ambiguous words relative to unambiguous words) has been taken as evidence for a distributed semantic representational system like that embodied in parallel distributed processing (PDP) models. In the present study, we investigated whether semantic ambiguity slows meaning activation, as PDP models…

  14. Ambiguity and Relatedness Effects in Semantic Tasks: Are They Due to Semantic Coding?

    ERIC Educational Resources Information Center

    Hino, Yasushi; Pexman, Penny M.; Lupker, Stephen J.

    2006-01-01

    According to parallel distributed processing (PDP) models of visual word recognition, the speed of semantic coding is modulated by the nature of the orthographic-to-semantic mappings. Consistent with this idea, an ambiguity disadvantage and a relatedness-of-meaning (ROM) advantage have been reported in some word recognition tasks in which semantic…

  15. Visual cues for data mining

    NASA Astrophysics Data System (ADS)

    Rogowitz, Bernice E.; Rabenhorst, David A.; Gerth, John A.; Kalin, Edward B.

    1996-04-01

    This paper describes a set of visual techniques, based on principles of human perception and cognition, which can help users analyze and develop intuitions about tabular data. Collections of tabular data are widely available, including, for example, multivariate time series data, customer satisfaction data, stock market performance data, multivariate profiles of companies and individuals, and scientific measurements. In our approach, we show how visual cues can help users perform a number of data mining tasks, including identifying correlations and interaction effects, finding clusters and understanding the semantics of cluster membership, identifying anomalies and outliers, and discovering multivariate relationships among variables. These cues are derived from psychological studies on perceptual organization, visual search, perceptual scaling, and color perception. These visual techniques are presented as a complement to the statistical and algorithmic methods more commonly associated with these tasks, and provide an interactive interface for the human analyst.

  16. LSST Astroinformatics And Astrostatistics: Data-oriented Astronomical Research

    NASA Astrophysics Data System (ADS)

    Borne, Kirk D.; Stassun, K.; Brunner, R. J.; Djorgovski, S. G.; Graham, M.; Hakkila, J.; Mahabal, A.; Paegert, M.; Pesenson, M.; Ptak, A.; Scargle, J.; Informatics, LSST; Statistics Team

    2011-01-01

    The LSST Informatics and Statistics Science Collaboration (ISSC) focuses on research and scientific discovery challenges posed by the very large and complex data collection that LSST will generate. Application areas include astroinformatics, machine learning, data mining, astrostatistics, visualization, scientific data semantics, time series analysis, and advanced signal processing. Research problems to be addressed with these methodologies include transient event characterization and classification, rare class discovery, correlation mining, outlier/anomaly/surprise detection, improved estimators (e.g., for photometric redshift or early onset supernova classification), exploration of highly dimensional (multivariate) data catalogs, and more. We present sample science results from these data-oriented approaches to large-data astronomical research, drawn from the work of LSST ISSC team members, including the EB (Eclipsing Binary) Factory, the environmental variations in the fundamental plane of elliptical galaxies, and outlier detection in multivariate catalogs.

  17. Mining and Utilizing Dataset Relevancy from Oceanographic Dataset (MUDROD) Metadata, Usage Metrics, and User Feedback to Improve Data Discovery and Access

    NASA Astrophysics Data System (ADS)

    Li, Y.; Jiang, Y.; Yang, C. P.; Armstrong, E. M.; Huang, T.; Moroni, D. F.; McGibbney, L. J.

    2016-12-01

    Big oceanographic data have been produced, archived and made available online, but finding the right data for scientific research and application development is still a significant challenge. A long-standing problem in data discovery is how to find the interrelationships between keywords and data, as well as the intrarelationships of the two individually. Most previous research attempted to solve this problem by building domain-specific ontology either manually or through automatic machine learning techniques. The former is costly, labor intensive and hard to keep up-to-date, while the latter is prone to noise and may be difficult for humans to understand. Large-scale user behavior data modelling represents a largely untapped, unique, and valuable source for discovering semantic relationships among domain-specific vocabulary. In this article, we propose a search engine framework for mining and utilizing dataset relevancy from oceanographic dataset metadata, user behaviors, and existing ontology. The objective is to improve discovery accuracy of oceanographic data and reduce the time for scientists to discover, download and reformat data for their projects. Experiments and a search example show that the proposed search engine helps both scientists and general users search with better ranking results, recommendation, and ontology navigation.
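
    One simple way to mine dataset relevancy from user behavior, in the spirit of this record, is session-level co-occurrence counting: datasets retrieved together in the same session are likely related. The session data and dataset names below are hypothetical.

    ```python
    from collections import defaultdict
    from itertools import combinations

    # Hypothetical user sessions: datasets downloaded together in one session.
    sessions = [
        ["MODIS_SST", "AVHRR_SST", "GHRSST_L4"],
        ["MODIS_SST", "GHRSST_L4"],
        ["QuikSCAT_wind", "ASCAT_wind"],
        ["MODIS_SST", "AVHRR_SST"],
    ]

    def cooccurrence(sessions):
        """Count how often two datasets appear in the same session."""
        counts = defaultdict(int)
        for s in sessions:
            for a, b in combinations(sorted(set(s)), 2):
                counts[(a, b)] += 1
        return counts

    def recommend(dataset, counts, k=2):
        """Rank datasets by co-occurrence with the query dataset (toy relevancy)."""
        scores = defaultdict(int)
        for (a, b), c in counts.items():
            if a == dataset:
                scores[b] += c
            elif b == dataset:
                scores[a] += c
        return [d for d, _ in sorted(scores.items(),
                                     key=lambda t: (-t[1], t[0]))][:k]

    counts = cooccurrence(sessions)
    print(recommend("MODIS_SST", counts))
    ```

    A production framework would combine such behavioral signals with metadata similarity and ontology links before ranking.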

  18. A Practical Approach to Implementing Real-Time Semantics

    NASA Technical Reports Server (NTRS)

    Luettgen, Gerald; Bhat, Girish; Cleaveland, Rance

    1999-01-01

    This paper investigates implementations of process algebras which are suitable for modeling concurrent real-time systems. It suggests an approach for efficiently implementing real-time semantics using dynamic priorities. For this purpose a process algebra with dynamic priority is defined, whose semantics corresponds one-to-one to traditional real-time semantics. The advantage of the dynamic-priority approach is that it drastically reduces the state-space sizes of the systems in question while preserving all properties of their functional and real-time behavior. The utility of the technique is demonstrated by a case study which deals with the formal modeling and verification of the SCSI-2 bus-protocol. The case study is carried out in the Concurrency Workbench of North Carolina, an automated verification tool in which the process algebra with dynamic priority is implemented. It turns out that the state space of the bus-protocol model is about an order of magnitude smaller than the one resulting from real-time semantics. The accuracy of the model is proved by applying model checking for verifying several mandatory properties of the bus protocol.

  19. Model for Semantically Rich Point Cloud Data

    NASA Astrophysics Data System (ADS)

    Poux, F.; Neuville, R.; Hallot, P.; Billen, R.

    2017-10-01

    This paper proposes an interoperable model for managing high dimensional point clouds while integrating semantics. Point clouds from sensors are a direct source of information physically describing a 3D state of the recorded environment. As such, they are an exhaustive representation of the real world at every scale: 3D reality-based spatial data. Their generation is increasingly fast, but processing routines and data models lack the knowledge needed to reason from information extraction rather than interpretation. The enhanced Smart Point Cloud model developed here brings intelligence to point clouds via three connected meta-models, linking available knowledge and classification procedures that permit semantic injection. Interoperability drives the model's adaptation to potentially many applications through specialized domain ontologies. A first prototype, implemented in Python and a PostgreSQL database, allows semantic and spatial concepts to be combined for basic hybrid queries on different point clouds.
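
    A basic hybrid semantic/spatial query of the kind described can be sketched in a few lines. The point data, class labels, and bounding-box predicate below are illustrative assumptions (the prototype itself combines Python with a PostgreSQL database).

    ```python
    # Hypothetical point cloud: (x, y, z, semantic_class) tuples.
    points = [
        (1.0, 2.0, 0.1, "ground"),
        (1.2, 2.1, 2.5, "wall"),
        (1.3, 2.2, 2.8, "wall"),
        (5.0, 5.0, 3.0, "roof"),
        (5.1, 4.9, 0.0, "ground"),
    ]

    def hybrid_query(points, semantic_class, bbox):
        """Hybrid query: filter by semantic label AND a spatial bounding box,
        mimicking a combined semantic/spatial predicate in SQL."""
        (xmin, ymin, zmin), (xmax, ymax, zmax) = bbox
        return [p for p in points
                if p[3] == semantic_class
                and xmin <= p[0] <= xmax
                and ymin <= p[1] <= ymax
                and zmin <= p[2] <= zmax]

    # All 'wall' points inside a 4 x 4 x 4 box near the origin.
    hits = hybrid_query(points, "wall", ((0, 0, 0), (4, 4, 4)))
    print(len(hits))  # -> 2
    ```

    In a database-backed version the semantic predicate would join against the ontology-driven meta-models rather than a flat label column.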

  20. SemanticFind: Locating What You Want in a Patient Record, Not Just What You Ask For

    PubMed Central

    Prager, John M.; Liang, Jennifer J.; Devarakonda, Murthy V.

    2017-01-01

    We present a new model of patient record search, called SemanticFind, which goes beyond traditional textual and medical synonym matches by locating patient data that a clinician would want to see rather than just what they ask for. The new model is implemented by making extensive use of the UMLS semantic network, distributional semantics, and NLP, to match query terms along several dimensions in a patient record with the returned matches organized accordingly. The new approach finds all clinically related concepts without the user having to ask for them. An evaluation of the accuracy of SemanticFind shows that it found twice as many relevant matches compared to those found by literal (traditional) search alone, along with very high precision and recall. These results suggest potential uses for SemanticFind in clinical practice, retrospective chart reviews, and in automated extraction of quality metrics. PMID:28815139

  1. Semantic Coherence Facilitates Distributional Learning.

    PubMed

    Ouyang, Long; Boroditsky, Lera; Frank, Michael C

    2017-04-01

    Computational models have shown that purely statistical knowledge about words' linguistic contexts is sufficient to learn many properties of words, including syntactic and semantic category. For example, models can infer that "postman" and "mailman" are semantically similar because they have quantitatively similar patterns of association with other words (e.g., they both tend to occur with words like "deliver," "truck," "package"). In contrast to these computational results, artificial language learning experiments suggest that distributional statistics alone do not facilitate learning of linguistic categories. However, experiments in this paradigm expose participants to entirely novel words, whereas real language learners encounter input that contains some known words that are semantically organized. In three experiments, we show that (a) the presence of familiar semantic reference points facilitates distributional learning and (b) this effect crucially depends both on the presence of known words and the adherence of these known words to some semantic organization. Copyright © 2016 Cognitive Science Society, Inc.

  2. A model-driven approach for representing clinical archetypes for Semantic Web environments.

    PubMed

    Martínez-Costa, Catalina; Menárguez-Tortosa, Marcos; Fernández-Breis, Jesualdo Tomás; Maldonado, José Alberto

    2009-02-01

    The life-long clinical information of any person, supported by electronic means, constitutes his or her Electronic Health Record (EHR). This information is usually distributed among several independent and heterogeneous systems that may be syntactically or semantically incompatible. There are currently different standards for representing and exchanging EHR information among different systems. In advanced EHR approaches, clinical information is represented by means of archetypes. Most of these approaches use the Archetype Definition Language (ADL) to specify archetypes. However, ADL has some drawbacks when attempting to perform semantic activities in Semantic Web environments. In this work, Semantic Web technologies are used to specify clinical archetypes for advanced EHR architectures. The advantages of using the Web Ontology Language (OWL) instead of ADL are described and discussed in this work. Moreover, a solution combining Semantic Web and Model-driven Engineering technologies is proposed to transform ADL into OWL for the CEN EN13606 EHR architecture.

  3. Attractor Dynamics and Semantic Neighborhood Density: Processing Is Slowed by Near Neighbors and Speeded by Distant Neighbors

    PubMed Central

    Mirman, Daniel; Magnuson, James S.

    2008-01-01

    The authors investigated semantic neighborhood density effects on visual word processing to examine the dynamics of activation and competition among semantic representations. Experiment 1 validated feature-based semantic representations as a basis for computing semantic neighborhood density and suggested that near and distant neighbors have opposite effects on word processing. Experiment 2 confirmed these results: Word processing was slower for dense near neighborhoods and faster for dense distant neighborhoods. Analysis of a computational model showed that attractor dynamics can produce this pattern of neighborhood effects. The authors argue for reconsideration of traditional models of neighborhood effects in terms of attractor dynamics, which allow both inhibitory and facilitative effects to emerge. PMID:18194055

  4. Cognitive search model and a new query paradigm

    NASA Astrophysics Data System (ADS)

    Xu, Zhonghui

    2001-06-01

This paper proposes a cognitive model in which people begin to search for pictures by using semantic content and find the right picture by judging whether its visual content is a proper visualization of the desired semantics. Essentially, human search is not just a process of matching computation on visual features but rather a process of visualization of the known semantic content. For people to search electronic images in the same way as they do manually in the model, we suggest that querying be a semantics-driven process like design. A query-by-design paradigm is proposed, in the sense that what you design is what you find. Unlike query-by-example, query-by-design allows users to specify the semantic content through an iterative and incremental interaction process, so that a retrieval can start with association and identification of the given semantic content and be refined as further visual cues become available. An experimental image retrieval system, Kuafu, has been under development using the query-by-design paradigm, and an iconic language is adopted.

  5. Constraint-Based Abstract Semantics for Temporal Logic: A Direct Approach to Design and Implementation

    NASA Astrophysics Data System (ADS)

    Banda, Gourinath; Gallagher, John P.

Abstract interpretation provides a practical approach to verifying properties of infinite-state systems. We apply the framework of abstract interpretation to derive an abstract semantic function for the modal μ-calculus, which is the basis for abstract model checking. The abstract semantic function is constructed directly from the standard concrete semantics together with a Galois connection between the concrete state-space and an abstract domain. There is no need for mixed or modal transition systems to abstract arbitrary temporal properties, as in previous work in the area of abstract model checking. Using the modal μ-calculus to implement CTL, the abstract semantics gives an over-approximation of the set of states in which an arbitrary CTL formula holds. Then we show that this leads directly to an effective implementation of an abstract model checking algorithm for CTL using abstract domains based on linear constraints. The implementation of the abstract semantic function makes use of an SMT solver. We describe an implemented system for proving properties of linear hybrid automata and give some experimental results.
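The over-approximation idea can be sketched in miniature. The paper's abstract domains are built from linear constraints and an SMT solver; the sketch below instead uses a much simpler stand-in, a fixed partition of a finite state space, but it shows the same structure: an abstract transition relation induced by the abstraction, and a least-fixpoint computation whose result covers every concrete state satisfying a reachability (EF) property. All names and the toy system are illustrative assumptions.

```python
# Concrete system: counter over 0..5, stepping x -> x+1 (mod 6).
states = range(6)
def step(x):
    return (x + 1) % 6

# Abstraction: map each concrete state to a block of a partition.
# (This plays the role of the Galois connection's abstraction function.)
def alpha(x):
    return "low" if x < 3 else "high"

# Abstract transition: block A steps to block B if ANY concrete state in A
# steps into B -- this existential lift is what makes the analysis an
# over-approximation.
abs_step = {(alpha(x), alpha(step(x))) for x in states}

def abstract_EF(target_blocks):
    """Least fixpoint: blocks from which `target_blocks` may be reachable."""
    reach = set(target_blocks)
    changed = True
    while changed:
        changed = False
        for a, b in abs_step:
            if b in reach and a not in reach:
                reach.add(a)
                changed = True
    return reach

# Over-approximation of the states satisfying EF(x == 5): every concrete
# state satisfying the property lies in some returned block, though extra
# states may be included.
print(abstract_EF({alpha(5)}))
```

Soundness here means every concrete state from which x == 5 is reachable maps into the returned block set; precision is whatever the chosen partition allows.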

  6. Data Mining and Knowledge Discovery - IBM Cognitive Alternatives for NASA KSC

    NASA Technical Reports Server (NTRS)

    Velez, Victor Hugo

    2016-01-01

Cognitive computing tools for transforming industries have proven favorable and profitable for different Directorates at NASA KSC. This study shows how cognitive computing systems can be useful to NASA when computers are trained, in the same way humans are, to gain knowledge over time. Increasing knowledge through senses, learning, and an accumulation of events is how the applications created by IBM empower artificial intelligence in a cognitive computing system. Over recent decades NASA has explored and applied artificial intelligence, specifically cognitive computing, in a few projects adopting models similar to those proposed for IBM Watson. However, the semantic technologies used by IBM's dedicated business unit allow these cognitive computing applications to outperform in-house tools and deliver outstanding analyses that facilitate decision making for managers and leads in a management information system.

  7. Situation Tracking in Large Data Streams

    DTIC Science & Technology

    2015-02-01

Pike, R. 1989: Different Ways to Cue a Coherent Memory System: A Theory for Episodic, Semantic, and Procedural Tasks. Psychological Review, 96(2), 208 … to extract general concept models, such as Latent Semantic Indexing. This technique generally extracts a single topic model, and does not extract a … 2001. Semantic leaps: frame-shifting and conceptual blending in meaning construction. Cambridge University Press. Humphreys, M. S., Bain, J. D.

  8. Using Semantic Web technologies for the generation of domain-specific templates to support clinical study metadata standards.

    PubMed

    Jiang, Guoqian; Evans, Julie; Endle, Cory M; Solbrig, Harold R; Chute, Christopher G

    2016-01-01

The Biomedical Research Integrated Domain Group (BRIDG) model is a formal domain analysis model for protocol-driven biomedical research, and serves as a semantic foundation for application and message development in the standards developing organizations (SDOs). The increasing sophistication and complexity of the BRIDG model requires new approaches to the management and utilization of the underlying semantics to harmonize domain-specific standards. The objective of this study is to develop and evaluate a Semantic Web-based approach that integrates the BRIDG model with ISO 21090 data types to generate domain-specific templates to support clinical study metadata standards development. We developed a template generation and visualization system based on an open source Resource Description Framework (RDF) store backend, a SmartGWT-based web user interface, and a "mind map" based tool for the visualization of generated domain-specific templates. We also developed a RESTful Web Service informed by the Clinical Information Modeling Initiative (CIMI) reference model for access to the generated domain-specific templates. A preliminary usability study was performed, and all reviewers (n = 3) gave very positive responses to the evaluation questions on usability and on the capability of meeting the system requirements (average score 4.6). Semantic Web technologies provide a scalable infrastructure and have great potential to enable computable semantic interoperability of models in the intersection of health care and clinical research.

  9. The Syntax and Semantics of ERICA. Technical Report No. 185, Psychology and Education Series.

    ERIC Educational Resources Information Center

    Smith, Robert Lawrence, Jr.

    This report is a detailed empirical examination of Suppes' ideas about the syntax and semantics of natural language, and an attempt at supporting the proposal that model-theoretic semantics of the type first proposed by Tarski is a useful tool for understanding the semantics of natural language. Child speech was selected as the best place to find…

  10. Mimicking Aphasic Semantic Errors in Normal Speech Production: Evidence from a Novel Experimental Paradigm

    ERIC Educational Resources Information Center

    Hodgson, Catherine; Lambon Ralph, Matthew A.

    2008-01-01

    Semantic errors are commonly found in semantic dementia (SD) and some forms of stroke aphasia and provide insights into semantic processing and speech production. Low error rates are found in standard picture naming tasks in normal controls. In order to increase error rates and thus provide an experimental model of aphasic performance, this study…

  11. A logical approach to semantic interoperability in healthcare.

    PubMed

    Bird, Linda; Brooks, Colleen; Cheong, Yu Chye; Tun, Nwe Ni

    2011-01-01

    Singapore is in the process of rolling out a number of national e-health initiatives, including the National Electronic Health Record (NEHR). A critical enabler in the journey towards semantic interoperability is a Logical Information Model (LIM) that harmonises the semantics of the information structure with the terminology. The Singapore LIM uses a combination of international standards, including ISO 13606-1 (a reference model for electronic health record communication), ISO 21090 (healthcare datatypes), and SNOMED CT (healthcare terminology). The LIM is accompanied by a logical design approach, used to generate interoperability artifacts, and incorporates mechanisms for achieving unidirectional and bidirectional semantic interoperability.

  12. Multiple Regions of a Cortical Network Commonly Encode the Meaning of Words in Multiple Grammatical Positions of Read Sentences.

    PubMed

    Anderson, Andrew James; Lalor, Edmund C; Lin, Feng; Binder, Jeffrey R; Fernandino, Leonardo; Humphries, Colin J; Conant, Lisa L; Raizada, Rajeev D S; Grimm, Scott; Wang, Xixi

    2018-05-16

Deciphering how sentence meaning is represented in the brain remains a major challenge to science. Semantically related neural activity has recently been shown to arise concurrently in distributed brain regions as successive words in a sentence are read. However, what semantic content is represented by different regions, what is common across them, and how this relates to words in different grammatical positions of sentences remains poorly understood. To address these questions, we apply a semantic model of word meaning to interpret brain activation patterns elicited in sentence reading. The model is based on human ratings of 65 sensory/motor/emotional and cognitive features of experience with words (and their referents). Through a process of mapping functional Magnetic Resonance Imaging activation back into model space, we test: which brain regions semantically encode content words in different grammatical positions (e.g., subject/verb/object); and what semantic features are encoded by different regions. In left temporal, inferior parietal, and inferior/superior frontal regions we detect the semantic encoding of words in all grammatical positions tested and reveal multiple common components of semantic representation. This suggests that sentence comprehension involves a common core representation of multiple words' meaning being encoded in a network of regions distributed across the brain.

  13. SATware: A Semantic Approach for Building Sentient Spaces

    NASA Astrophysics Data System (ADS)

    Massaguer, Daniel; Mehrotra, Sharad; Vaisenberg, Ronen; Venkatasubramanian, Nalini

This chapter describes the architecture of a semantic-based middleware environment for building sensor-driven sentient spaces. The proposed middleware explicitly models sentient space semantics (i.e., entities, spaces, activities) and supports mechanisms to map sensor observations to the state of the sentient space. We argue that such a semantic approach provides a powerful programming environment for building sentient spaces. In addition, the approach provides natural ways to exploit semantics for a variety of purposes, including scheduling under resource constraints and sensor recalibration.

  14. Knowledge retrieval from PubMed abstracts and electronic medical records with the Multiple Sclerosis Ontology.

    PubMed

    Malhotra, Ashutosh; Gündel, Michaela; Rajput, Abdul Mateen; Mevissen, Heinz-Theodor; Saiz, Albert; Pastor, Xavier; Lozano-Rubi, Raimundo; Martinez-Lapiscina, Elena H; Zubizarreta, Irati; Mueller, Bernd; Kotelnikova, Ekaterina; Toldo, Luca; Hofmann-Apitius, Martin; Villoslada, Pablo

    2015-01-01

In order to retrieve useful information from the scientific literature and electronic medical records (EMR), we developed an ontology specific to Multiple Sclerosis (MS). The MS Ontology was created using the scientific literature and expert review under the Protégé OWL environment. We developed a dictionary with semantic synonyms and translations to different languages for mining EMR. The MS Ontology was integrated with other ontologies and dictionaries (diseases/comorbidities, gene/protein, pathways, drug) into the text-mining tool SCAIView. We analyzed the EMRs from 624 patients with MS using the MS Ontology dictionary in order to identify drug usage and comorbidities in MS. Testing competency questions and functional evaluation using F statistics further validated the usefulness of the MS Ontology. Validation of the lexicalized ontology by means of named entity recognition-based methods showed adequate performance (F score = 0.73). The MS Ontology retrieved 80% of the genes associated with MS from scientific abstracts and identified additional pathways targeted by approved disease-modifying drugs (e.g. apoptosis pathways associated with mitoxantrone, rituximab and fingolimod). The analysis of the EMR from patients with MS identified current usage of disease-modifying drugs and symptomatic therapy as well as comorbidities, which are in agreement with recent reports. The MS Ontology provides a semantic framework that is able to automatically extract information from both the scientific literature and the EMR of patients with MS, revealing new pathogenesis insights as well as new clinical information.

  15. Modeling and mining term association for improving biomedical information retrieval performance.

    PubMed

    Hu, Qinmin; Huang, Jimmy Xiangji; Hu, Xiaohua

    2012-06-11

The growth of biomedical information requires most information retrieval systems to provide short, specific answers in response to complex user queries. Semantic information in the form of free text is structured in a way that makes it straightforward for humans to read but more difficult for computers to interpret automatically and search efficiently. One of the reasons is that most traditional information retrieval models assume terms are conditionally independent given a document/passage. We are therefore motivated to consider term associations within different contexts to help the models capture semantic information and use it to improve biomedical information retrieval performance. We propose a term association approach to discover associations among the keywords of a query. The experiments are conducted on the TREC 2004-2007 Genomics data sets and the TREC 2004 HARD data set. The proposed approach is promising and outperforms the baselines and the GSP results. Investigation of the parameter settings and of different indices shows that the sentence-based index produces the best document-level results, the word-based index the best passage-level results, and the paragraph-based index the best passage2-level results. Furthermore, the best term association results always come from the best baseline. The tuning number k in the proposed recursive re-ranking algorithm is discussed and locally optimized to 10. First, modelling term association for improving biomedical information retrieval using factor analysis is one of the major contributions of our work. Second, the experiments confirm that term association accounting for co-occurrence and dependency among the keywords produces better results than baselines that treat the keywords independently. Third, the baselines are re-ranked according to the importance and reliance of latent factors behind term associations; these latent factors are determined by the proposed model and the terms' appearances in the first-round retrieved passages.
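The core idea of re-ranking first-round results by query-term association can be sketched as follows. The PMI-style scoring function, the toy passages, and the small k are illustrative assumptions; the paper's actual model is based on factor analysis over the first-round retrieved passages.

```python
from collections import Counter
from itertools import combinations
from math import log

def association_score(passage, query_terms):
    """Reward passages in which query keywords co-occur rather than
    appearing independently (a crude PMI-style association)."""
    words = passage.lower().split()
    counts = Counter(words)
    n = len(words) or 1
    score = 0.0
    for a, b in combinations(query_terms, 2):
        pa, pb = counts[a] / n, counts[b] / n
        pab = min(counts[a], counts[b]) / n  # crude joint estimate
        if pa and pb and pab:
            score += log(pab / (pa * pb))
    return score

def rerank(passages, query_terms, k=10):
    """Re-rank only the top-k first-round results by term association."""
    top = passages[:k]
    top.sort(key=lambda p: association_score(p, query_terms), reverse=True)
    return top + passages[k:]

passages = [
    "gene expression measured in tissue",
    "the gene regulates protein expression in cancer cells",
    "protein folding dynamics",
]
print(rerank(passages, ["gene", "expression"], k=3))
```

Passages where the keywords actually associate are promoted; those containing neither keyword fall to the bottom of the re-ranked top-k.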

  17. LinkEHR-Ed: a multi-reference model archetype editor based on formal semantics.

    PubMed

    Maldonado, José A; Moner, David; Boscá, Diego; Fernández-Breis, Jesualdo T; Angulo, Carlos; Robles, Montserrat

    2009-08-01

To develop a powerful archetype editing framework capable of handling multiple reference models and oriented towards the semantic description and standardization of legacy data. The main prerequisite for implementing tools providing enhanced support for archetypes is a clear specification of archetype semantics. We propose a formalization of the definition section of archetypes based on types over tree-structured data. It covers the specialization of archetypes, the relationship between reference models and archetypes, and the conformance of data instances to archetypes. We develop LinkEHR-Ed, a visual archetype editor based on this formalization, with advanced processing capabilities: it supports multiple reference models, the editing and semantic validation of archetypes, the specification of mappings to data sources, and the automatic generation of data transformation scripts. LinkEHR-Ed is a useful tool for building, processing and validating archetypes based on any reference model.

  18. The construction of meaning.

    PubMed

    Kintsch, Walter; Mangalath, Praful

    2011-04-01

    We argue that word meanings are not stored in a mental lexicon but are generated in the context of working memory from long-term memory traces that record our experience with words. Current statistical models of semantics, such as latent semantic analysis and the Topic model, describe what is stored in long-term memory. The CI-2 model describes how this information is used to construct sentence meanings. This model is a dual-memory model, in that it distinguishes between a gist level and an explicit level. It also incorporates syntactic information about how words are used, derived from dependency grammar. The construction of meaning is conceptualized as feature sampling from the explicit memory traces, with the constraint that the sampling must be contextually relevant both semantically and syntactically. Semantic relevance is achieved by sampling topically relevant features; local syntactic constraints as expressed by dependency relations ensure syntactic relevance. Copyright © 2010 Cognitive Science Society, Inc.

  19. Semantic modeling and structural synthesis of onboard electronics protection means as open information system

    NASA Astrophysics Data System (ADS)

    Zhevnerchuk, D. V.; Surkova, A. S.; Lomakina, L. S.; Golubev, A. S.

    2018-05-01

The article describes a component-based representation approach and semantic models of the protection of onboard electronics from ionizing radiation of various kinds. Semantic models are constructed whose distinctive feature is the representation of electronic elements, protection modules, and sources of impact as blocks with interfaces. Rules of logical inference and algorithms are developed for synthesizing the object properties of the semantic network, modeling the interfaces between the components of the protection system and the radiation sources. The algorithm's results are illustrated using the radiation-resistant microcircuits 1645RU5U and 1645RT2U and a computational and experimental method for estimating the durability of onboard electronics.

  20. Neural correlates of lexical-semantic memory: A voxel-based morphometry study in mild AD, aMCI and normal aging

    PubMed Central

    Balthazar, Marcio L.F.; Yasuda, Clarissa L.; Lopes, Tátila M.; Pereira, Fabrício R.S.; Damasceno, Benito Pereira; Cendes, Fernando

    2011-01-01

Neuroanatomical correlates of naming and lexical-semantic memory are not yet fully understood. The most influential approaches share the view that semantic representations reflect the manner in which information has been acquired through perception and action, and that each brain area processes different modalities of semantic representations. Despite these anatomical differences in semantic processing, generalization across different features that have similar semantic significance is one of the main characteristics of human cognition. Methods: We evaluated the brain regions related to naming, and to the semantic generalization, of visually presented drawings of objects from the Boston Naming Test (BNT), which comprises different categories such as animals, vegetables, tools, food, and furniture. To approximate a lesion model, a sample of 48 subjects, spanning normal aging, amnestic mild cognitive impairment (aMCI) and mild Alzheimer's disease (AD), and presenting a continuous decline both in cognitive functions (including naming skills) and in grey matter density (GMD), was evaluated. Semantic errors on the BNT, as well as naming performance, were correlated with whole-brain GMD as measured by voxel-based morphometry (VBM). Results: The areas most strongly related to naming and to semantic errors were the medial temporal structures, thalami, and superior and inferior temporal gyri, especially their anterior parts, as well as prefrontal cortices (inferior and superior frontal gyri). Conclusion: The possible role of each of these areas in lexical-semantic networks is discussed, along with their contribution to models of semantic memory organization. PMID:29213726

  1. Investigating the capabilities of semantic enrichment of 3D CityEngine data

    NASA Astrophysics Data System (ADS)

    Solou, Dimitra; Dimopoulou, Efi

    2016-08-01

In recent years the development of technology and the lifting of several technical limitations have brought the third dimension to the fore. The complexity of urban environments and the strong need for land administration intensify the need for a three-dimensional cadastral system. Despite progress in the field of geographic information systems and 3D modeling techniques, there is no fully digital 3D cadastre. Existing geographic information systems and the different methods of three-dimensional modeling allow for better management, visualization and dissemination of information. Nevertheless, these opportunities cannot be fully exploited because of deficiencies in standardization and interoperability in these systems. Within this context, CityGML was developed as an international standard of the Open Geospatial Consortium (OGC) for the representation and exchange of 3D city models. CityGML defines geometry and topology for city modeling, also focusing on the semantic aspects of 3D city information. The scope of CityGML is to reach a common terminology, also addressing the imperative need for interoperability and data integration, given the number of available geographic information systems and modeling techniques. The aim of this paper is to develop an application for managing the semantic information of a model generated by procedural modeling. The model was initially implemented in ESRI's CityEngine software and then imported into the ArcGIS environment. The final goal was the semantic enrichment of the original model and its conversion to CityGML format. Semantic information management and interoperability proved feasible using ESRI's 3DCities Project tools, whose database structure supports adding semantic information to the CityEngine model and automatically converting it to CityGML for advanced analysis and visualization in different application areas.

  2. Software analysis in the semantic web

    NASA Astrophysics Data System (ADS)

    Taylor, Joshua; Hall, Robert T.

    2013-05-01

Many approaches in software analysis, particularly dynamic malware analysis, benefit greatly from the use of linked data and other Semantic Web technologies. In this paper, we describe AIS, Inc.'s Semantic Extractor (SemEx) component from the Malware Analysis and Attribution through Genetic Information (MAAGI) effort, funded under DARPA's Cyber Genome program. The SemEx generates OWL-based semantic models of high- and low-level behaviors in malware samples from system call traces generated by AIS's introspective hypervisor, IntroVirtTM. Within MAAGI, these semantic models were used by modules that cluster malware samples by functionality and construct "genealogical" malware lineages. Herein, we describe the design, implementation, and use of the SemEx, as well as the C2DB, an OWL ontology used for representing software behavior and cyber-environments.

  3. Quantization, Frobenius and Bi algebras from the Categorical Framework of Quantum Mechanics to Natural Language Semantics

    NASA Astrophysics Data System (ADS)

    Sadrzadeh, Mehrnoosh

    2017-07-01

Compact closed categories and Frobenius and bialgebras have been applied to model and reason about quantum protocols. The same constructions have also been applied to reason about natural language semantics under the name "categorical distributional compositional" semantics, or, in short, the "DisCoCat" model. This model combines the statistical vector models of word meaning with the compositional models of grammatical structure. It has been applied to natural language tasks such as disambiguation, paraphrasing and entailment of phrases and sentences. The passage from the grammatical structure to vectors is provided by a functor, similar to the quantization functor of Quantum Field Theory. The original DisCoCat model used only compact closed categories. Later, Frobenius algebras were added to it to model long-distance dependencies such as relative pronouns. Recently, bialgebras have been added to reason about quantifiers. This paper reviews these constructions and their application to natural language semantics. We go over the theory and present some of the core experimental results.

  4. Contextually guided very-high-resolution imagery classification with semantic segments

    NASA Astrophysics Data System (ADS)

    Zhao, Wenzhi; Du, Shihong; Wang, Qiao; Emery, William J.

    2017-10-01

Contextual information, revealing relationships and dependencies between image objects, is among the most important information for the successful interpretation of very-high-resolution (VHR) remote sensing imagery. Over the last decade, the geographic object-based image analysis (GEOBIA) technique has been widely used to first divide images into homogeneous parts and then assign semantic labels according to the properties of image segments. However, due to the complexity and heterogeneity of VHR images, segments without semantic labels (i.e., semantic-free segments) generated with low-level features often fail to represent geographic entities (e.g., building roofs are often partitioned into chimney/antenna/shadow parts). As a result, it is hard to capture contextual information across geographic entities when using semantic-free segments. In contrast to low-level features, "deep" features can be used to build robust segments with accurate labels (i.e., semantic segments) that represent geographic entities at higher levels. Based on these semantic segments, semantic graphs can be constructed to capture contextual information in VHR images. In this paper, semantic segments were first explored with convolutional neural networks (CNN), and a conditional random field (CRF) model was then applied to model the contextual information between semantic segments. Experimental results on two challenging VHR datasets (the Vaihingen and Beijing scenes) indicate that the proposed method improves on existing image classification techniques in classification performance (overall accuracy ranges from 82% to 96%).

  5. Semantic-gap-oriented active learning for multilabel image annotation.

    PubMed

    Tang, Jinhui; Zha, Zheng-Jun; Tao, Dacheng; Chua, Tat-Seng

    2012-04-01

User interaction is an effective way to handle the semantic gap problem in image annotation. To minimize user effort in these interactions, many active learning methods have been proposed. These methods treat the semantic concepts individually or correlatively. However, they still neglect the key motivation of user feedback: to tackle the semantic gap. The size of the semantic gap of each concept is an important factor that affects the performance of user feedback. Users should devote more effort to concepts with large semantic gaps, and vice versa. In this paper, we propose a semantic-gap-oriented active learning method, which incorporates the semantic gap measure into the information-minimization-based sample selection strategy. The basic learning model used in the active learning framework is an extended multilabel version of the sparse-graph-based semisupervised learning method that incorporates semantic correlation. Extensive experiments conducted on two benchmark image data sets demonstrate the importance of bringing the semantic gap measure into the active learning process.
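The gap-weighted selection idea can be sketched as follows: candidates are scored by classifier uncertainty scaled by a per-concept semantic-gap measure, so feedback is requested where it helps most. The combination rule, the concepts, and all numbers below are illustrative assumptions; the paper's information-minimization criterion is more involved.

```python
def select_samples(candidates, gap, budget=2):
    """candidates: list of (sample_id, concept, uncertainty).
    gap: per-concept semantic-gap measure in [0, 1]; a larger gap means
    user feedback on that concept is expected to help more."""
    scored = [(unc * gap[concept], sid) for sid, concept, unc in candidates]
    scored.sort(reverse=True)  # highest gap-weighted uncertainty first
    return [sid for _, sid in scored[:budget]]

# Hypothetical gap measures: abstract scenes tend to have larger gaps
# between low-level features and their semantics than rigid objects.
gap = {"sunset": 0.9, "car": 0.2}
candidates = [
    ("img1", "car", 0.95),     # very uncertain, but the concept is easy to model
    ("img2", "sunset", 0.60),  # less uncertain, but the gap is large
    ("img3", "sunset", 0.30),
]
print(select_samples(candidates, gap))  # ['img2', 'img3']
```

Plain uncertainty sampling would pick img1 first; weighting by the semantic gap redirects the labeling budget to the concept where user feedback matters most.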

  6. Algorithms and semantic infrastructure for mutation impact extraction and grounding.

    PubMed

    Laurila, Jonas B; Naderi, Nona; Witte, René; Riazanov, Alexandre; Kouznetsov, Alexandre; Baker, Christopher J O

    2010-12-02

Mutation impact extraction is a hitherto unaccomplished task in state-of-the-art mutation extraction systems. Protein mutations and their impacts on protein properties are hidden in the scientific literature, making them poorly accessible to protein engineers and inaccessible to phenotype-prediction systems that currently depend on manually curated genomic variation databases. We present the first rule-based approach for the extraction of mutation impacts on protein properties, categorizing their directionality as positive, negative or neutral. Furthermore, protein and mutation mentions are grounded to their respective UniProtKB IDs, and selected protein properties, namely protein functions, to concepts in the Gene Ontology. The extracted entities are populated into an OWL-DL Mutation Impact ontology, facilitating complex querying for mutation impacts using SPARQL. We illustrate the retrieval of proteins and mutant sequences for a given direction of impact on specific protein properties. Moreover, we provide programmatic access to the data through semantic web services using the SADI (Semantic Automated Discovery and Integration) framework. We address the problem of access to legacy mutation data in unstructured form through the creation of novel mutation impact extraction methods, which are evaluated on a corpus of full-text articles on haloalkane dehalogenases tagged by domain experts. Our approaches show state-of-the-art levels of precision and recall for mutation grounding, and a respectable level of precision but lower recall for the task of mutant-impact relation extraction. The system is deployed using text mining and semantic web technologies with the goal of publishing to a broad spectrum of consumers.
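The kind of query the paper issues in SPARQL, retrieving mutations with a given direction of impact on a specific protein property, can be sketched with a toy in-memory triple store rather than a real RDF engine. All identifiers, properties, and the matcher below are invented for illustration; they are not the paper's ontology terms.

```python
# Hypothetical grounded triples: mutation -> direction / GO property / protein.
triples = {
    ("mut:W125F", "impact:direction", "negative"),
    ("mut:W125F", "impact:onProperty", "GO:0018786"),
    ("mut:W125F", "impact:inProtein", "uniprot:EXAMPLE1"),
    ("mut:D260N", "impact:direction", "positive"),
    ("mut:D260N", "impact:onProperty", "GO:0018786"),
}

def match(pattern):
    """Match one (s, p, o) pattern against the store; None acts as a variable."""
    s, p, o = pattern
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Analogue of:
#   SELECT ?mut WHERE { ?mut impact:direction "negative" ;
#                            impact:onProperty GO:0018786 }
negative = {s for s, _, _ in match((None, "impact:direction", "negative"))}
on_prop = {s for s, _, _ in match((None, "impact:onProperty", "GO:0018786"))}
print(sorted(negative & on_prop))  # ['mut:W125F']
```

The conjunction of two triple patterns over a shared subject variable is exactly what a SPARQL basic graph pattern expresses over the OWL-DL ontology.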

  7. CelOWS: an ontology based framework for the provision of semantic web services related to biological models.

    PubMed

    Matos, Ely Edison; Campos, Fernanda; Braga, Regina; Palazzi, Daniele

    2010-02-01

    The amount of information generated by biological research has led to an intensive use of models. Mathematical and computational modeling requires accurate descriptions so that models can be shared, reused and simulated as formulated by their original authors. In this paper, we introduce the Cell Component Ontology (CelO), expressed in OWL-DL. This ontology captures both the structure of a cell model and the properties of functional components. We use this ontology in a Web project (CelOWS) to describe, query and compose CellML models using semantic web services. It aims to improve the reuse and composition of existing components and to allow semantic validation of new models.

  8. Semantically Interoperable XML Data

    PubMed Central

    Vergara-Niedermayr, Cristobal; Wang, Fusheng; Pan, Tony; Kurc, Tahsin; Saltz, Joel

    2013-01-01

    XML is ubiquitously used as an information exchange platform for web-based applications in healthcare, life sciences, and many other domains. Proliferating XML data are now managed through the latest native XML database technologies. XML data sources conforming to common XML schemas can be shared and integrated with syntactic interoperability. Semantic interoperability can be achieved through semantic annotations of data models using common data elements linked to concepts from ontologies. In this paper, we present a framework and software system to support the development of semantically interoperable XML-based data sources that can be shared through a Grid infrastructure. We also present our work on supporting semantically validated XML data through semantic annotations for XML Schema, semantic validation and semantic authoring of XML data. We demonstrate the use of the system for a biomedical database of medical image annotations and markups. PMID:25298789

  9. Representation of Semantic Similarity in the Left Intraparietal Sulcus: Functional Magnetic Resonance Imaging Evidence

    PubMed Central

    Neyens, Veerle; Bruffaerts, Rose; Liuzzi, Antonietta G.; Kalfas, Ioannis; Peeters, Ronald; Keuleers, Emmanuel; Vogels, Rufin; De Deyne, Simon; Storms, Gert; Dupont, Patrick; Vandenberghe, Rik

    2017-01-01

    According to a recent study, semantic similarity between concrete entities correlates with the similarity of activity patterns in left middle IPS during category naming. We examined the replicability of this effect under passive viewing conditions, the potential role of visuoperceptual similarity, where the effect is situated compared to regions that have been previously implicated in visuospatial attention, and how it compares to effects of object identity and location. Forty-six subjects participated. Subjects passively viewed pictures from two categories, musical instruments and vehicles. Semantic similarity between entities was estimated based on a concept-feature matrix obtained in more than 1,000 subjects. Visuoperceptual similarity was modeled based on the HMAX model, the AlexNet deep convolutional learning model, and thirdly, based on subjective visuoperceptual similarity ratings. Among the IPS regions examined, only left middle IPS showed a semantic similarity effect. The effect was significant in hIP1, hIP2, and hIP3. Visuoperceptual similarity did not correlate with similarity of activity patterns in left middle IPS. The semantic similarity effect in left middle IPS was significantly stronger than in the right middle IPS and also stronger than in the left or right posterior IPS. The semantic similarity effect was similar to that seen in the angular gyrus. Object identity effects were much more widespread across nearly all parietal areas examined. Location effects were relatively specific for posterior IPS and area 7 bilaterally. To conclude, the current findings replicate the semantic similarity effect in left middle IPS under passive viewing conditions, and demonstrate its anatomical specificity within a cytoarchitectonic reference frame. We propose that the semantic similarity effect in left middle IPS reflects the transient uploading of semantic representations in working memory. PMID:28824405

  10. Leveraging graph topology and semantic context for pharmacovigilance through twitter-streams.

    PubMed

    Eshleman, Ryan; Singh, Rahul

    2016-10-06

    Adverse drug events (ADEs) constitute one of the leading causes of post-therapeutic death, and their identification is an important challenge of modern precision medicine. Unfortunately, the onset and effects of ADEs are often underreported, complicating timely intervention. At over 500 million posts per day, Twitter is a commonly used social media platform. The ubiquity of day-to-day personal information exchange on Twitter makes it a promising target for data mining for ADE identification and intervention. Three technical challenges are central to this problem: (1) identification of salient medical keywords in (noisy) tweets, (2) mapping drug-effect relationships, and (3) classification of such relationships as adverse or non-adverse. We use a bipartite graph-theoretic representation called a drug-effect graph (DEG) to model drug and side-effect relationships, representing the drugs and side effects as vertices. We construct individual DEGs from two data sources. The first DEG is constructed from the drug-effect relationships found in FDA package inserts as recorded in the SIDER database. The second DEG is constructed by mining the history of Twitter users. We use dictionary-based information extraction to identify medically relevant concepts in tweets. Drugs and co-occurring symptoms are connected with edges weighted by temporal distance and frequency. Finally, information from the SIDER DEG is integrated with the Twitter DEG, and edges are classified as either adverse or non-adverse using supervised machine learning. We examine both graph-theoretic and semantic features for the classification task. The proposed approach identifies adverse drug effects with high accuracy, with precision exceeding 85% and F1 exceeding 81%. Compared with leading state-of-the-art methods, which employ un-enriched graph-theoretic analysis alone, our method yields improvements of between 5% and 8% on these measures. Additionally, we employ our method to discover several ADEs which, though present in the medical literature and Twitter streams, are not represented in the SIDER database. We present the DEG integration model as a powerful formalism for the analysis of drug-effect relationships, general enough to accommodate diverse data sources yet rigorous enough to provide a strong mechanism for ADE identification.
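    The DEG construction step can be sketched in miniature. The data layout and function below are invented for illustration, and this toy weights edges by co-occurrence frequency only, omitting the temporal-distance weighting the abstract also mentions.

```python
from collections import Counter
from itertools import product

def build_deg(user_histories):
    """Build a toy bipartite drug-effect graph from per-user timelines.

    Each timeline is a list of (drug_mentions, symptom_mentions) pairs;
    every co-occurring drug/symptom pair gets an edge whose weight is
    its co-occurrence frequency across all timelines.
    """
    edges = Counter()
    for timeline in user_histories:
        for drugs, symptoms in timeline:
            for d, s in product(drugs, symptoms):
                edges[(d, s)] += 1
    return edges

timelines = [
    [({"drugX"}, {"nausea"}), ({"drugX"}, {"nausea", "headache"})],
    [({"drugX"}, {"nausea"})],
]
deg = build_deg(timelines)
print(deg[("drugX", "nausea")])  # → 3
```

    In the full pipeline such Twitter-derived edges would then be matched against SIDER-derived edges and classified by a supervised model.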

  11. Using a high-dimensional graph of semantic space to model relationships among words

    PubMed Central

    Jackson, Alice F.; Bolger, Donald J.

    2014-01-01

    The GOLD model (Graph Of Language Distribution) is a network model constructed based on co-occurrence in a large corpus of natural language that may be used to explore what information may be present in a graph-structured model of language, and what information may be extracted through theoretically-driven algorithms as well as standard graph analysis methods. The present study employs GOLD to examine two types of relationship between words: semantic similarity and associative relatedness. Semantic similarity refers to the degree of overlap in meaning between words, while associative relatedness refers to the degree to which two words occur in the same schematic context. It is expected that a graph-structured model of language constructed based on co-occurrence should easily capture associative relatedness, because this type of relationship is thought to be present directly in lexical co-occurrence. However, it is hypothesized that semantic similarity may be extracted from the intersection of the set of first-order connections, because two words that are semantically similar may occupy similar thematic or syntactic roles across contexts and thus would co-occur lexically with the same set of nodes. Two versions of the GOLD model that differed in terms of the co-occurrence window, bigGOLD at the paragraph level and smallGOLD at the adjacent-word level, were directly compared to the performance of a well-established distributional model, Latent Semantic Analysis (LSA). The superior performance of the GOLD models (big and small) suggests that a single acquisition and storage mechanism, namely co-occurrence, can account for associative and conceptual relationships between words and is more psychologically plausible than models using singular value decomposition (SVD). PMID:24860525
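    The intersection-of-first-order-connections idea can be sketched as follows. The graph construction, window size, and Jaccard measure are simplifying assumptions for illustration, not the GOLD implementation itself.

```python
def cooccurrence_graph(corpus, window=1):
    """Adjacency sets from co-occurrence within +/- window words."""
    graph = {}
    for sent in corpus:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if i != j:
                    graph.setdefault(w, set()).add(sent[j])
    return graph

def similarity(graph, a, b):
    """Jaccard overlap of first-order neighbours: a toy proxy for
    semantic similarity via shared co-occurrence partners."""
    na, nb = graph.get(a, set()), graph.get(b, set())
    return len(na & nb) / len(na | nb) if na | nb else 0.0

corpus = [["the", "cat", "sleeps"], ["the", "dog", "sleeps"], ["a", "cat", "meows"]]
g = cooccurrence_graph(corpus)
# "cat" and "dog" never co-occur directly, yet they share the
# neighbours "the" and "sleeps", so their similarity is nonzero.
print(similarity(g, "cat", "dog"))  # → 0.5
```

    Direct edge weight would instead capture associative relatedness, which this abstract argues lives directly in lexical co-occurrence.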


  13. Extracting Useful Semantic Information from Large Scale Corpora of Text

    ERIC Educational Resources Information Center

    Mendoza, Ray Padilla, Jr.

    2012-01-01

    Extracting and representing semantic information from large scale corpora is at the crux of computer-assisted knowledge generation. Semantic information depends on collocation extraction methods, mathematical models used to represent distributional information, and weighting functions which transform the space. This dissertation provides a…

  14. The Fusion Model of Intelligent Transportation Systems Based on the Urban Traffic Ontology

    NASA Astrophysics Data System (ADS)

    Yang, Wang-Dong; Wang, Tao

    To give urban transport information a unified representation, this paper uses an urban traffic ontology and defines rules and algebraic operations for semantic fusion at the ontology level, so that urban traffic information can be fused with semantic completeness and consistency. Exploiting the semantic completeness of ontologies, we build an urban traffic ontology model that resolves problems such as ontology merging and equivalence verification in the semantic fusion of integrated traffic information. Adding semantic fusion to urban traffic information integration reduces the amount of data to be integrated and improves the efficiency and completeness of traffic information queries. Through its practical application in the intelligent traffic information integration platform of Changde city, we show that ontology-based semantic fusion increases the effectiveness and efficiency of urban traffic information integration, reduces storage requirements, and improves query efficiency and information completeness.

  15. Facilitation and Interference in Identification of Pictures and Words

    DTIC Science & Technology

    1994-10-05

    semantic activation and episodic memory encoding. Journal of Verbal Learning and Verbal Behavior, 22, 88-104. Becker, C. A. (1979). Semantic context...set of items, such as pictures of common objects or known words, which have representations in semantic memory. To test this, we compared the...activation model in particular because nonwords have no memorial representation in semantic memory and thus cannot interfere with one another. 2. Long-term

  16. Effect of perceptual load on semantic access by speech in children.

    PubMed

    Jerger, Susan; Damian, Markus F; Mills, Candice; Bartlett, James; Tye-Murray, Nancy; Abdi, Hervé

    2013-04-01

    To examine whether semantic access by speech requires attention in children. Children (N = 200) named pictures and ignored distractors on a cross-modal (distractors: auditory-no face) or multimodal (distractors: auditory-static face and audiovisual-dynamic face) picture word task. The cross-modal task had a low load, and the multimodal task had a high load (i.e., respectively naming pictures displayed on a blank screen vs. below the talker's face on his T-shirt). Semantic content of distractors was manipulated to be related vs. unrelated to the picture (e.g., picture "dog" with distractors "bear" vs. "cheese"). If irrelevant semantic content manipulation influences naming times on both tasks despite variations in loads, Lavie's (2005) perceptual load model proposes that semantic access is independent of capacity-limited attentional resources; if, however, irrelevant content influences naming only on the cross-modal task (low load), the perceptual load model proposes that semantic access is dependent on attentional resources exhausted by the higher load task. Irrelevant semantic content affected performance for both tasks in 6- to 9-year-olds but only on the cross-modal task in 4- to 5-year-olds. The addition of visual speech did not influence results on the multimodal task. Younger and older children differ in dependence on attentional resources for semantic access by speech.

  17. Provenance Usage in the OceanLink Project

    NASA Astrophysics Data System (ADS)

    Narock, T.; Arko, R. A.; Carbotte, S. M.; Chandler, C. L.; Cheatham, M.; Fils, D.; Finin, T.; Hitzler, P.; Janowicz, K.; Jones, M.; Krisnadhi, A.; Lehnert, K. A.; Mickle, A.; Raymond, L. M.; Schildhauer, M.; Shepherd, A.; Wiebe, P. H.

    2014-12-01

    A wide spectrum of maturing methods and tools, collectively characterized as the Semantic Web, is helping to vastly improve the dissemination of scientific research. The OceanLink project, an NSF EarthCube Building Block, is utilizing semantic technologies to integrate geoscience data repositories, library holdings, conference abstracts, and funded research awards. Provenance is a vital component in meeting both the scientific and engineering requirements of OceanLink. Provenance plays a key role in justification and understanding when presenting users with results aggregated from multiple sources. In the engineering sense, provenance enables the identification of new data and the ability to determine which data sources to query. Additionally, OceanLink will leverage human and machine computation for crowdsourcing, text mining, and co-reference resolution. The results of these computations, and their associated provenance, will be folded back into the constituent systems to continually enhance precision and utility. We will touch on the various roles provenance is playing in OceanLink as well as present our use of the PROV Ontology and associated Ontology Design Patterns.

  18. Query optimization for graph analytics on linked data using SPARQL

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hong, Seokyong; Lee, Sangkeun; Lim, Seung -Hwan

    2015-07-01

    Triplestores that support query languages such as SPARQL are emerging as the preferred and scalable solution to represent data and meta-data as massive heterogeneous graphs using Semantic Web standards. With increasing adoption, the desire to conduct graph-theoretic mining and exploratory analysis has also increased. Addressing that desire, this paper presents a solution that is the marriage of Graph Theory and the Semantic Web. We present software that can analyze Linked Data using graph operations such as counting triangles, finding eccentricity, testing connectedness, and computing PageRank directly on triple stores via the SPARQL interface. We describe the process of optimizing performance of the SPARQL-based implementation of such popular graph algorithms by reducing the space overhead, simplifying iterative complexity, and removing redundant computations by understanding query plans. Our optimized approach shows significant performance gains on triplestores hosted on stand-alone workstations as well as hardware-optimized scalable supercomputers such as the Cray XMT.
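    As an illustration of one of the named operations, triangle counting over the graph induced by (subject, predicate, object) triples can be sketched in plain Python. The paper performs this via SPARQL on a triplestore; the code below is only a reference computation on assumed toy data.

```python
from itertools import combinations

def triangles(triples):
    """Count triangles in the undirected graph induced by (s, p, o)
    triples, ignoring predicates and self-loops."""
    adj = {}
    for s, _, o in triples:
        if s != o:
            adj.setdefault(s, set()).add(o)
            adj.setdefault(o, set()).add(s)
    count = 0
    for v, nbrs in adj.items():
        for a, b in combinations(nbrs, 2):
            if b in adj.get(a, set()):
                count += 1
    return count // 3  # each triangle is found once per vertex

data = [("a", "knows", "b"), ("b", "knows", "c"),
        ("c", "knows", "a"), ("c", "knows", "d")]
print(triangles(data))  # → 1
```

    The equivalent SPARQL would match a cyclic pattern of three connected variables; optimizing such queries on a triplestore is precisely what the abstract describes.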

  19. Testing the attentional boundary conditions of subliminal semantic priming: the influence of semantic and phonological task sets

    PubMed Central

    Adams, Sarah C.; Kiefer, Markus

    2012-01-01

    Recent studies challenged the classical notion of automaticity and indicated that even unconscious automatic semantic processing is under attentional control to some extent. In line with our attentional sensitization model, these data suggest that a sensitization of semantic pathways by a semantic task set is necessary for subliminal semantic priming to occur while non-semantic task sets attenuate priming. In the present study, we tested whether masked semantic priming is also reduced by phonological task sets using the previously developed induction task paradigm. This would substantiate the notion that attention to semantics is necessary for eliciting unconscious semantic priming. Participants first performed semantic and phonological induction tasks that should either activate a semantic or a phonological task set. Subsequent to the induction task, a masked prime word, either associated or non-associated with the following lexical decision target word, was presented. Across two experiments, we varied the nature of the phonological induction task (word phonology vs. letter phonology) to assess whether the attentional focus on the entire word vs. single letters modulates subsequent masked semantic priming. In both experiments, subliminal semantic priming was only found subsequent to the semantic induction task, but was attenuated following either phonological induction task. These results indicate that attention to phonology attenuates subsequent semantic processing of unconsciously presented primes whether or not attention is directed to the entire word or to single letters. The present findings therefore substantiate earlier evidence that an attentional orientation toward semantics is necessary for subliminal semantic priming to be elicited. PMID:22952461

  20. Model-based document categorization employing semantic pattern analysis and local structure clustering

    NASA Astrophysics Data System (ADS)

    Fume, Kosei; Ishitani, Yasuto

    2008-01-01

    We propose a document categorization method based on a document model that can be defined externally for each task and that categorizes Web content or business documents into a target category in accordance with the similarity of the model. The main feature of the proposed method consists of two aspects of semantics extraction from an input document. The semantics of terms are extracted by the semantic pattern analysis and implicit meanings of document substructure are specified by a bottom-up text clustering technique focusing on the similarity of text line attributes. We have constructed a system based on the proposed method for trial purposes. The experimental results show that the system achieves more than 80% classification accuracy in categorizing Web content and business documents into 15 or 70 categories.

  1. A pool of pairs of related objects (POPORO) for investigating visual semantic integration: behavioral and electrophysiological validation.

    PubMed

    Kovalenko, Lyudmyla Y; Chaumon, Maximilien; Busch, Niko A

    2012-07-01

    Semantic processing of verbal and visual stimuli has been investigated in semantic violation or semantic priming paradigms in which a stimulus is either related or unrelated to a previously established semantic context. A hallmark of semantic priming is the N400 event-related potential (ERP)--a deflection of the ERP that is more negative for semantically unrelated target stimuli. The majority of studies investigating the N400 and semantic integration have used verbal material (words or sentences), and standardized stimulus sets with norms for semantic relatedness have been published for verbal but not for visual material. However, semantic processing of visual objects (as opposed to words) is an important issue in research on visual cognition. In this study, we present a set of 800 pairs of semantically related and unrelated visual objects. The images were rated for semantic relatedness by a sample of 132 participants. Furthermore, we analyzed low-level image properties and matched the two semantic categories according to these features. An ERP study confirmed the suitability of this image set for evoking a robust N400 effect of semantic integration. Additionally, using a general linear modeling approach of single-trial data, we also demonstrate that low-level visual image properties and semantic relatedness are in fact only minimally overlapping. The image set is available for download from the authors' website. We expect that the image set will facilitate studies investigating mechanisms of semantic and contextual processing of visual stimuli.

  2. Query Auto-Completion Based on Word2vec Semantic Similarity

    NASA Astrophysics Data System (ADS)

    Shao, Taihua; Chen, Honghui; Chen, Wanyu

    2018-04-01

    Query auto-completion (QAC) is the first step of information retrieval: it helps users formulate an entire query after they have typed only a short prefix. Traditional QAC models ignore the contribution of semantic relevance between queries, even though similar queries often express very similar search intentions. In this paper, we propose a hybrid model, FS-QAC, based on query semantic similarity as well as query frequency. We use the word2vec method to measure the semantic similarity between intended queries and previously submitted queries. By combining both features, our experiments show that the FS-QAC model improves performance when predicting the user's query intention and helping formulate the right query. Our experimental results show that the optimal hybrid model yields a 7.54% improvement in MRR over a state-of-the-art baseline on the public AOL query logs.
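    A hedged sketch of the frequency-plus-similarity hybrid: the combination weight, max-normalisation, and two-dimensional toy embeddings are assumptions for illustration, and the paper's exact scoring may differ.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def rank_completions(candidates, intent_vec, lam=0.5):
    """Rank completions by a frequency/semantic hybrid score.

    candidates maps query -> (frequency, embedding). Frequencies are
    max-normalised; `lam` trades off popularity against word2vec-style
    similarity to the inferred intent vector.
    """
    max_f = max(f for f, _ in candidates.values()) or 1
    score = lambda q: (lam * candidates[q][0] / max_f
                       + (1 - lam) * cosine(candidates[q][1], intent_vec))
    return sorted(candidates, key=score, reverse=True)

cands = {
    "news today": (100, [1.0, 0.0]),  # popular, but off-intent
    "news on AI": (40, [0.0, 1.0]),   # rarer, but semantically on-intent
}
print(rank_completions(cands, intent_vec=[0.0, 1.0], lam=0.3))
```

    With a low `lam`, the semantically matching completion outranks the merely popular one, which is the behaviour the hybrid model is designed to produce.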

  3. Toward Automated Inventory Modeling in Life Cycle Assessment: The Utility of Semantic Data Modeling to Predict Real-WorldChemical Production

    EPA Science Inventory

    A set of coupled semantic data models, i.e., ontologies, are presented to advance a methodology towards automated inventory modeling of chemical manufacturing in life cycle assessment. The cradle-to-gate life cycle inventory for chemical manufacturing is a detailed collection of ...

  4. Localising semantic and syntactic processing in spoken and written language comprehension: an Activation Likelihood Estimation meta-analysis.

    PubMed

    Rodd, Jennifer M; Vitello, Sylvia; Woollams, Anna M; Adank, Patti

    2015-02-01

    We conducted an Activation Likelihood Estimation (ALE) meta-analysis to identify brain regions that are recruited by linguistic stimuli requiring relatively demanding semantic or syntactic processing. We included 54 functional MRI studies that explicitly varied the semantic or syntactic processing load, while holding constant demands on earlier stages of processing. We included studies that introduced a syntactic/semantic ambiguity or anomaly, used a priming manipulation that specifically reduced the load on semantic/syntactic processing, or varied the level of syntactic complexity. The results confirmed the critical role of the posterior left Inferior Frontal Gyrus (LIFG) in semantic and syntactic processing. These results challenge models of sentence comprehension highlighting the role of anterior LIFG for semantic processing. In addition, the results emphasise the posterior (but not anterior) temporal lobe for both semantic and syntactic processing. Crown Copyright © 2014. Published by Elsevier Inc. All rights reserved.

  5. The organization and dissolution of semantic-conceptual knowledge: is the 'amodal hub' the only plausible model?

    PubMed

    Gainotti, Guido

    2011-04-01

    In recent years, the anatomical and functional bases of conceptual activity have attracted growing interest. In particular, Patterson and Lambon-Ralph have proposed the existence, in the anterior parts of the temporal lobes, of a mechanism (the 'amodal semantic hub') supporting the interactive activation of semantic representations in all modalities and for all semantic categories. The aim of the present paper is to discuss this model. We argue against the notion of an 'amodal' semantic hub because we maintain, in agreement with Damasio's construct of a 'higher-order convergence zone', that a continuum exists between perceptual information and conceptual representations, whereas the 'amodal' account views perceptual information only as a channel through which abstract semantic knowledge can be activated. According to our model, semantic organization can be better explained by two orthogonal higher-order convergence systems, concerning, on the one hand, the right vs. left hemisphere and, on the other, the ventral vs. dorsal processing pathways. This model posits that conceptual representations may be based mainly upon perceptual activities in the right hemisphere and upon verbal mediation in the left side of the brain. It also assumes that conceptual knowledge based on the convergence of highly processed visual information with other perceptual data (mainly concerning living categories) may be bilaterally represented in the anterior parts of the temporal lobes, whereas knowledge based on the integration of visual data with action schemata (namely knowledge of actions, body parts and artefacts) may be more represented in the left fronto-temporo-parietal areas. Copyright © 2010 Elsevier Inc. All rights reserved.

  6. Co-occurrence frequency evaluated with large language corpora boosts semantic priming effects.

    PubMed

    Brunellière, Angèle; Perre, Laetitia; Tran, ThiMai; Bonnotte, Isabelle

    2017-09-01

    In recent decades, many computational techniques have been developed to analyse the contextual usage of words in large language corpora. The present study examined whether the co-occurrence frequency obtained from large language corpora might boost purely semantic priming effects. Two experiments were conducted: one with conscious semantic priming, the other with subliminal semantic priming. Both experiments contrasted three semantic priming contexts: an unrelated priming context and two related priming contexts with word pairs that are semantically related and that co-occur either frequently or infrequently. In the conscious priming presentation (166-ms stimulus-onset asynchrony, SOA), a semantic priming effect was recorded in both related priming contexts, and it was greater with higher co-occurrence frequency. In the subliminal priming presentation (66-ms SOA), no significant priming effect was found, regardless of the related priming context. These results show that co-occurrence frequency boosts purely semantic priming effects; they are discussed with reference to models of the semantic network.

  7. Dissociating the semantic function of two neighbouring subregions in the left lateral anterior temporal lobe

    PubMed Central

    Sanjuán, Ana; Hope, Thomas M.H.; Parker Jones, 'Ōiwi; Prejawa, Susan; Oberhuber, Marion; Guerin, Julie; Seghier, Mohamed L.; Green, David W.; Price, Cathy J.

    2015-01-01

    We used fMRI in 35 healthy participants to investigate how two neighbouring subregions in the lateral anterior temporal lobe (LATL) contribute to semantic matching and object naming. Four different levels of processing were considered: (A) recognition of the object concepts; (B) search for semantic associations related to object stimuli; (C) retrieval of semantic concepts of interest; and (D) retrieval of stimulus specific concepts as required for naming. During semantic association matching on picture stimuli or heard object names, we found that activation in both subregions was higher when the objects were semantically related (mug–kettle) than unrelated (car–teapot). This is consistent with both LATL subregions playing a role in (C), the successful retrieval of amodal semantic concepts. In addition, one subregion was more activated for object naming than matching semantically related objects, consistent with (D), the retrieval of a specific concept for naming. We discuss the implications of these novel findings for cognitive models of semantic processing and left anterior temporal lobe function. PMID:25496810

  8. Reading visually embodied meaning from the brain: Visually grounded computational models decode visual-object mental imagery induced by written text.

    PubMed

    Anderson, Andrew James; Bruni, Elia; Lopopolo, Alessandro; Poesio, Massimo; Baroni, Marco

    2015-10-15

    Embodiment theory predicts that mental imagery of object words recruits neural circuits involved in object perception. The degree of visual imagery present in routine thought and how it is encoded in the brain is largely unknown. We test whether fMRI activity patterns elicited by participants reading objects' names include embodied visual-object representations, and whether we can decode the representations using novel computational image-based semantic models. We first apply the image models in conjunction with text-based semantic models to test predictions of visual-specificity of semantic representations in different brain regions. Representational similarity analysis confirms that fMRI structure within ventral-temporal and lateral-occipital regions correlates most strongly with the image models and conversely text models correlate better with posterior-parietal/lateral-temporal/inferior-frontal regions. We use an unsupervised decoding algorithm that exploits commonalities in representational similarity structure found within both image model and brain data sets to classify embodied visual representations with high accuracy (8/10) and then extend it to exploit model combinations to robustly decode different brain regions in parallel. By capturing latent visual-semantic structure our models provide a route into analyzing neural representations derived from past perceptual experience rather than stimulus-driven brain activity. Our results also verify the benefit of combining multimodal data to model human-like semantic representations. Copyright © 2015 Elsevier Inc. All rights reserved.

  9. Activation of semantic information at the sublexical level during handwriting production: Evidence from inhibition effects of Chinese semantic radicals in the picture-word interference paradigm.

    PubMed

    Chen, Xuqian; Liao, Yuanlan; Chen, Xianzhe

    2017-08-01

Using a non-alphabetic language (e.g., Chinese), the present study tested a novel view that semantic information at the sublexical level should be activated during handwriting production. Over 80% of Chinese characters are phonograms, in which semantic radicals represent category information (e.g., 'chair,' 'peach,' 'orange' are related to plants) while phonetic radicals represent phonetic information (e.g., 'wolf,' 'brightness,' 'male,' are all pronounced /lang/). Under different semantic category conditions at the lexical level (semantically related in Experiment 1; semantically unrelated in Experiment 2), the orthographic relatedness and semantic relatedness of semantic radicals in the picture name and its distractor were manipulated under different SOAs (i.e., stimulus onset asynchrony, the interval between the onset of the picture and the onset of the interference word). Two questions were addressed: (1) Is it possible that semantic information could be activated at the sublexical level? (2) How are semantic and orthographic information dynamically accessed in word production? Results showed that both orthographic and semantic information were activated under the present picture-word interference paradigm, dynamically across different SOAs, which supported our view that discussions of semantic processes in the writing modality should be extended to the sublexical level. The current findings point to the possibility of building new orthography-phonology-semantics models of writing. © 2017 Scandinavian Psychological Associations and John Wiley & Sons Ltd.

  10. Refractory Access Disorders and the Organization of Concrete and Abstract Semantics: Do they Differ?

    PubMed Central

    Hamilton, A. Cris; Coslett, H. Branch

    2010-01-01

Patients with “refractory semantic access deficits” demonstrate several unique features that make them important sources of insight into the organization of semantic representations. Here we attempt to replicate several novel findings from single-case studies reported in the literature. Patient UM–103 displays the cardinal features of a “refractory semantic access deficit” and showed many of the same effects of semantic relatedness reported in the literature. However, when probing concrete and abstract words, this patient revealed very different patterns of performance compared to two previously reported patients. We discuss the implications of our data for models of semantic organization of abstract and concrete words. PMID:18569737

  11. A neotropical Miocene pollen database employing image-based search and semantic modeling.

    PubMed

    Han, Jing Ginger; Cao, Hongfei; Barb, Adrian; Punyasena, Surangi W; Jaramillo, Carlos; Shyu, Chi-Ren

    2014-08-01

    Digital microscopic pollen images are being generated with increasing speed and volume, producing opportunities to develop new computational methods that increase the consistency and efficiency of pollen analysis and provide the palynological community a computational framework for information sharing and knowledge transfer. • Mathematical methods were used to assign trait semantics (abstract morphological representations) of the images of neotropical Miocene pollen and spores. Advanced database-indexing structures were built to compare and retrieve similar images based on their visual content. A Web-based system was developed to provide novel tools for automatic trait semantic annotation and image retrieval by trait semantics and visual content. • Mathematical models that map visual features to trait semantics can be used to annotate images with morphology semantics and to search image databases with improved reliability and productivity. Images can also be searched by visual content, providing users with customized emphases on traits such as color, shape, and texture. • Content- and semantic-based image searches provide a powerful computational platform for pollen and spore identification. The infrastructure outlined provides a framework for building a community-wide palynological resource, streamlining the process of manual identification, analysis, and species discovery.

  12. Semantic-based surveillance video retrieval.

    PubMed

    Hu, Weiming; Xie, Dan; Fu, Zhouyu; Zeng, Wenrong; Maybank, Steve

    2007-04-01

    Visual surveillance produces large amounts of video data. Effective indexing and retrieval from surveillance video databases are very important. Although there are many ways to represent the content of video clips in current video retrieval algorithms, there still exists a semantic gap between users and retrieval systems. Visual surveillance systems supply a platform for investigating semantic-based video retrieval. In this paper, a semantic-based video retrieval framework for visual surveillance is proposed. A cluster-based tracking algorithm is developed to acquire motion trajectories. The trajectories are then clustered hierarchically using the spatial and temporal information, to learn activity models. A hierarchical structure of semantic indexing and retrieval of object activities, where each individual activity automatically inherits all the semantic descriptions of the activity model to which it belongs, is proposed for accessing video clips and individual objects at the semantic level. The proposed retrieval framework supports various queries including queries by keywords, multiple object queries, and queries by sketch. For multiple object queries, succession and simultaneity restrictions, together with depth and breadth first orders, are considered. For sketch-based queries, a method for matching trajectories drawn by users to spatial trajectories is proposed. The effectiveness and efficiency of our framework are tested in a crowded traffic scene.

  13. Semantic interoperability--HL7 Version 3 compared to advanced architecture standards.

    PubMed

    Blobel, B G M E; Engel, K; Pharow, P

    2006-01-01

To meet the challenge of high-quality and efficient care, highly specialized and distributed healthcare establishments have to communicate and co-operate in a semantically interoperable way. Information and communication technology must be open, flexible, scalable, knowledge-based and service-oriented as well as secure and safe. For enabling semantic interoperability, a unified process for defining and implementing the architecture, i.e. the structure and functions of the cooperating systems' components, as well as the approach for knowledge representation, i.e. the information used and its interpretation, algorithms, etc., have to be defined in a harmonized way. Deploying the Generic Component Model, systems and their components, underlying concepts and applied constraints must be formally modeled, strictly separating platform-independent from platform-specific models. As HL7 Version 3 claims to represent the most successful standard for semantic interoperability, HL7 has been analyzed regarding the requirements for model-driven, service-oriented design of semantically interoperable information systems, thereby moving from a communication to an architecture paradigm. The approach is compared with advanced architectural approaches for information systems such as OMG's CORBA 3 or EHR systems such as GEHR/openEHR and CEN EN 13606 Electronic Health Record Communication. HL7 Version 3 is maturing towards an architectural approach for semantic interoperability. Despite current differences, there is a close collaboration between the teams involved, guaranteeing a convergence between competing approaches.

  14. Semantic graphs and associative memories

    NASA Astrophysics Data System (ADS)

    Pomi, Andrés; Mizraji, Eduardo

    2004-12-01

    Graphs have been increasingly utilized in the characterization of complex networks from diverse origins, including different kinds of semantic networks. Human memories are associative and are known to support complex semantic nets; these nets are represented by graphs. However, it is not known how the brain can sustain these semantic graphs. The vision of cognitive brain activities, shown by modern functional imaging techniques, assigns renewed value to classical distributed associative memory models. Here we show that these neural network models, also known as correlation matrix memories, naturally support a graph representation of the stored semantic structure. We demonstrate that the adjacency matrix of this graph of associations is just the memory coded with the standard basis of the concept vector space, and that the spectrum of the graph is a code invariant of the memory. As long as the assumptions of the model remain valid this result provides a practical method to predict and modify the evolution of the cognitive dynamics. Also, it could provide us with a way to comprehend how individual brains that map the external reality, almost surely with different particular vector representations, are nevertheless able to communicate and share a common knowledge of the world. We finish presenting adaptive association graphs, an extension of the model that makes use of the tensor product, which provides a solution to the known problem of branching in semantic nets.
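The central result of this record — that a correlation-matrix memory trained with standard-basis concept vectors is exactly the adjacency matrix of the association graph — can be checked numerically. The sketch below is a toy illustration of that claim, not the authors' code; the concept count and edge list are arbitrary assumptions.

```python
import numpy as np

# A linear correlation-matrix memory trained on pairwise associations.
# When each concept i is coded with the standard basis vector e_i, the
# memory matrix reduces exactly to the adjacency matrix of the graph.

n = 4                                # number of concepts (toy choice)
E = np.eye(n)                        # standard basis: concept i -> e_i

# directed associations (cue -> target), i.e. graph edges
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]

# memory = sum of outer products: target vector ⊗ cue vector
M = np.zeros((n, n))
for cue, target in edges:
    M += np.outer(E[target], E[cue])

# adjacency matrix built directly from the edge list
A = np.zeros((n, n))
for cue, target in edges:
    A[target, cue] = 1.0

assert np.array_equal(M, A)          # memory == adjacency matrix

# recalling with a cue retrieves its associated concepts
recall = M @ E[2]                    # present concept 2 as the cue
print(np.flatnonzero(recall))        # indices of concepts associated with 2
```

With any other (non-standard) basis the same associations are stored, but the matrix is the adjacency matrix conjugated by the change of basis — which is why the graph spectrum, not the matrix itself, is the code-invariant the abstract mentions.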

  15. Concepts, Control, and Context: A Connectionist Account of Normal and Disordered Semantic Cognition

    PubMed Central

    2018-01-01

    Semantic cognition requires conceptual representations shaped by verbal and nonverbal experience and executive control processes that regulate activation of knowledge to meet current situational demands. A complete model must also account for the representation of concrete and abstract words, of taxonomic and associative relationships, and for the role of context in shaping meaning. We present the first major attempt to assimilate all of these elements within a unified, implemented computational framework. Our model combines a hub-and-spoke architecture with a buffer that allows its state to be influenced by prior context. This hybrid structure integrates the view, from cognitive neuroscience, that concepts are grounded in sensory-motor representation with the view, from computational linguistics, that knowledge is shaped by patterns of lexical co-occurrence. The model successfully codes knowledge for abstract and concrete words, associative and taxonomic relationships, and the multiple meanings of homonyms, within a single representational space. Knowledge of abstract words is acquired through (a) their patterns of co-occurrence with other words and (b) acquired embodiment, whereby they become indirectly associated with the perceptual features of co-occurring concrete words. The model accounts for executive influences on semantics by including a controlled retrieval mechanism that provides top-down input to amplify weak semantic relationships. The representational and control elements of the model can be damaged independently, and the consequences of such damage closely replicate effects seen in neuropsychological patients with loss of semantic representation versus control processes. Thus, the model provides a wide-ranging and neurally plausible account of normal and impaired semantic cognition. PMID:29733663

  16. User centered and ontology based information retrieval system for life sciences.

    PubMed

    Sy, Mohameth-François; Ranwez, Sylvie; Montmain, Jacky; Regnault, Armelle; Crampes, Michel; Ranwez, Vincent

    2012-01-25

Because of the increasing number of electronic resources, designing efficient tools to retrieve and exploit them is a major challenge. Some improvements have been offered by semantic Web technologies and applications based on domain ontologies. In life science, for instance, the Gene Ontology is widely exploited in genomic applications and the Medical Subject Headings is the basis of the biomedical publication indexing and information retrieval process proposed by PubMed. However, current search engines suffer from two main drawbacks: there is limited user interaction with the list of retrieved resources and no explanation for their adequacy to the query is provided. Users may thus be confused by the selection and have no idea how to adapt their queries so that the results match their expectations. This paper describes an information retrieval system that relies on a domain ontology to widen the set of relevant documents that is retrieved and that uses a graphical rendering of query results to favor user interactions. Semantic proximities between ontology concepts and aggregating models are used to assess documents' adequacy with respect to a query. The selection of documents is displayed in a semantic map to provide graphical indications that make explicit to what extent they match the user's query; this man/machine interface favors a more interactive and iterative exploration of the data corpus, by facilitating query concept weighting and visual explanation. We illustrate the benefit of using this information retrieval system on two case studies, one of which aims at collecting human genes related to transcription factors involved in the hemopoiesis pathway. The ontology-based information retrieval system described in this paper (OBIRS) is freely available at: http://www.ontotoolkit.mines-ales.fr/ObirsClient/. This environment is a first step towards a user centred application in which the system highlights relevant information to provide decision help.

  17. User centered and ontology based information retrieval system for life sciences

    PubMed Central

    2012-01-01

Background Because of the increasing number of electronic resources, designing efficient tools to retrieve and exploit them is a major challenge. Some improvements have been offered by semantic Web technologies and applications based on domain ontologies. In life science, for instance, the Gene Ontology is widely exploited in genomic applications and the Medical Subject Headings is the basis of the biomedical publication indexing and information retrieval process proposed by PubMed. However, current search engines suffer from two main drawbacks: there is limited user interaction with the list of retrieved resources and no explanation for their adequacy to the query is provided. Users may thus be confused by the selection and have no idea how to adapt their queries so that the results match their expectations. Results This paper describes an information retrieval system that relies on a domain ontology to widen the set of relevant documents that is retrieved and that uses a graphical rendering of query results to favor user interactions. Semantic proximities between ontology concepts and aggregating models are used to assess documents' adequacy with respect to a query. The selection of documents is displayed in a semantic map to provide graphical indications that make explicit to what extent they match the user's query; this man/machine interface favors a more interactive and iterative exploration of the data corpus, by facilitating query concept weighting and visual explanation. We illustrate the benefit of using this information retrieval system on two case studies, one of which aims at collecting human genes related to transcription factors involved in the hemopoiesis pathway. Conclusions The ontology-based information retrieval system described in this paper (OBIRS) is freely available at: http://www.ontotoolkit.mines-ales.fr/ObirsClient/. 
This environment is a first step towards a user centred application in which the system highlights relevant information to provide decision help. PMID:22373375

  18. Synonym extraction and abbreviation expansion with ensembles of semantic spaces.

    PubMed

    Henriksson, Aron; Moen, Hans; Skeppstedt, Maria; Daudaravičius, Vidas; Duneld, Martin

    2014-02-05

    Terminologies that account for variation in language use by linking synonyms and abbreviations to their corresponding concept are important enablers of high-quality information extraction from medical texts. Due to the use of specialized sub-languages in the medical domain, manual construction of semantic resources that accurately reflect language use is both costly and challenging, often resulting in low coverage. Although models of distributional semantics applied to large corpora provide a potential means of supporting development of such resources, their ability to isolate synonymy from other semantic relations is limited. Their application in the clinical domain has also only recently begun to be explored. Combining distributional models and applying them to different types of corpora may lead to enhanced performance on the tasks of automatically extracting synonyms and abbreviation-expansion pairs. A combination of two distributional models - Random Indexing and Random Permutation - employed in conjunction with a single corpus outperforms using either of the models in isolation. Furthermore, combining semantic spaces induced from different types of corpora - a corpus of clinical text and a corpus of medical journal articles - further improves results, outperforming a combination of semantic spaces induced from a single source, as well as a single semantic space induced from the conjoint corpus. A combination strategy that simply sums the cosine similarity scores of candidate terms is generally the most profitable out of the ones explored. Finally, applying simple post-processing filtering rules yields substantial performance gains on the tasks of extracting abbreviation-expansion pairs, but not synonyms. The best results, measured as recall in a list of ten candidate terms, for the three tasks are: 0.39 for abbreviations to long forms, 0.33 for long forms to abbreviations, and 0.47 for synonyms. 
This study demonstrates that ensembles of semantic spaces can yield improved performance on the tasks of automatically extracting synonyms and abbreviation-expansion pairs. This notion, which merits further exploration, allows different distributional models - with different model parameters - and different types of corpora to be combined, potentially allowing enhanced performance to be obtained on a wide range of natural language processing tasks.
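The score-summing ensemble strategy that the study found most profitable can be sketched as follows. This is a toy illustration under stated assumptions, not the study's implementation: the vocabulary, vector dimensionality, and random vectors stand in for semantic spaces actually induced by Random Indexing/Random Permutation from clinical and journal corpora.

```python
import numpy as np

# Each semantic space ranks candidate terms by cosine similarity to a
# query term; the ensemble sums the per-space cosines and re-ranks.

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def candidates(space, query, k=10):
    """Top-k candidate terms with cosine scores in one semantic space."""
    q = space[query]
    scored = {t: cosine(q, v) for t, v in space.items() if t != query}
    return dict(sorted(scored.items(), key=lambda kv: -kv[1])[:k])

def ensemble(spaces, query, k=10):
    """Sum cosine scores across spaces; a term absent from a space adds 0."""
    totals = {}
    for space in spaces:
        for term, score in candidates(space, query, k).items():
            totals[term] = totals.get(term, 0.0) + score
    return sorted(totals, key=lambda t: -totals[t])[:k]

# Hypothetical vocabulary and random stand-in vectors for two corpora
rng = np.random.default_rng(0)
vocab = ["mi", "myocardial infarction", "heart attack", "stroke"]
clinical = {t: rng.normal(size=8) for t in vocab}   # e.g. clinical notes
journal = {t: rng.normal(size=8) for t in vocab}    # e.g. journal articles

top = ensemble([clinical, journal], "mi")
print(top)  # ranked candidate expansions for "mi"
```

In the study, recall was then measured over such a ten-candidate list; the post-processing filtering rules it mentions would be applied to `top` before scoring.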

  19. Synonym extraction and abbreviation expansion with ensembles of semantic spaces

    PubMed Central

    2014-01-01

    Background Terminologies that account for variation in language use by linking synonyms and abbreviations to their corresponding concept are important enablers of high-quality information extraction from medical texts. Due to the use of specialized sub-languages in the medical domain, manual construction of semantic resources that accurately reflect language use is both costly and challenging, often resulting in low coverage. Although models of distributional semantics applied to large corpora provide a potential means of supporting development of such resources, their ability to isolate synonymy from other semantic relations is limited. Their application in the clinical domain has also only recently begun to be explored. Combining distributional models and applying them to different types of corpora may lead to enhanced performance on the tasks of automatically extracting synonyms and abbreviation-expansion pairs. Results A combination of two distributional models – Random Indexing and Random Permutation – employed in conjunction with a single corpus outperforms using either of the models in isolation. Furthermore, combining semantic spaces induced from different types of corpora – a corpus of clinical text and a corpus of medical journal articles – further improves results, outperforming a combination of semantic spaces induced from a single source, as well as a single semantic space induced from the conjoint corpus. A combination strategy that simply sums the cosine similarity scores of candidate terms is generally the most profitable out of the ones explored. Finally, applying simple post-processing filtering rules yields substantial performance gains on the tasks of extracting abbreviation-expansion pairs, but not synonyms. The best results, measured as recall in a list of ten candidate terms, for the three tasks are: 0.39 for abbreviations to long forms, 0.33 for long forms to abbreviations, and 0.47 for synonyms. 
Conclusions This study demonstrates that ensembles of semantic spaces can yield improved performance on the tasks of automatically extracting synonyms and abbreviation-expansion pairs. This notion, which merits further exploration, allows different distributional models – with different model parameters – and different types of corpora to be combined, potentially allowing enhanced performance to be obtained on a wide range of natural language processing tasks. PMID:24499679

  20. Levels of Processing and the Cue-Dependent Nature of Recollection

    ERIC Educational Resources Information Center

    Mulligan, Neil W.; Picklesimer, Milton

    2012-01-01

    Dual-process models differentiate between two bases of memory, recollection and familiarity. It is routinely claimed that deeper, semantic encoding enhances recollection relative to shallow, non-semantic encoding, and that recollection is largely a product of semantic, elaborative rehearsal. The present experiments show that this is not always the…

  1. Typicality Shift in Semantic Categories as a Result of Cultural Transition: Evidence from Jewish Argentine Immigrants in Israel.

    ERIC Educational Resources Information Center

    Shimron, Joseph; Chernitsky, Roberto

    1995-01-01

    Investigates changes in the internal structure of semantic categories as a result of cultural transition. Examines typicality shifts in semantic categories of Jewish Argentine immigrants in Israel. Presents a model mapping typicality shift patterns onto acculturation patterns. (HB)

  2. Speed and Accuracy in the Processing of False Statements About Semantic Information.

    ERIC Educational Resources Information Center

    Ratcliff, Roger

    1982-01-01

    A standard reaction time procedure and a response signal procedure were used on data from eight experiments on semantic verifications. Results suggest that simple models of the semantic verification task that assume a single yes/no dimension on which discrimination is made are not correct. (Author/PN)

  3. The semantic planetary data system

    NASA Technical Reports Server (NTRS)

    Hughes, J. Steven; Crichton, Daniel; Kelly, Sean; Mattmann, Chris

    2005-01-01

This paper will provide a brief overview of the PDS data model and the PDS catalog. It will then describe the implementation of the Semantic PDS, including the development of the formal ontology, the generation of RDFS/XML and RDF/XML data sets, and the building of the semantic search application.

  4. The Relevance Aura of Bibliographic Records.

    ERIC Educational Resources Information Center

    Brooks, Terrence A.

    1997-01-01

    Analyzes relevance assessments of topical descriptors for bibliographic records for two dimensions: (1) a vertical conceptual hierarchy of broad to narrow descriptors, and (2) a horizontal linkage of related terms. The data were analyzed for a semantic distance and semantic direction effect as postulated by the Semantic Distance Model. (Author/LRW)

  5. A computational language approach to modeling prose recall in schizophrenia

    PubMed Central

    Rosenstein, Mark; Diaz-Asper, Catherine; Foltz, Peter W.; Elvevåg, Brita

    2014-01-01

    Many cortical disorders are associated with memory problems. In schizophrenia, verbal memory deficits are a hallmark feature. However, the exact nature of this deficit remains elusive. Modeling aspects of language features used in memory recall have the potential to provide means for measuring these verbal processes. We employ computational language approaches to assess time-varying semantic and sequential properties of prose recall at various retrieval intervals (immediate, 30 min and 24 h later) in patients with schizophrenia, unaffected siblings and healthy unrelated control participants. First, we model the recall data to quantify the degradation of performance with increasing retrieval interval and the effect of diagnosis (i.e., group membership) on performance. Next we model the human scoring of recall performance using an n-gram language sequence technique, and then with a semantic feature based on Latent Semantic Analysis. These models show that automated analyses of the recalls can produce scores that accurately mimic human scoring. The final analysis addresses the validity of this approach by ascertaining the ability to predict group membership from models built on the two classes of language features. Taken individually, the semantic feature is most predictive, while a model combining the features improves accuracy of group membership prediction slightly above the semantic feature alone as well as over the human rating approach. We discuss the implications for cognitive neuroscience of such a computational approach in exploring the mechanisms of prose recall. PMID:24709122
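The semantic scoring idea in this record can be sketched in miniature. The study used Latent Semantic Analysis (an SVD over a large term-document corpus); as a hedged stand-in, the toy version below scores a participant's recall against the source passage by cosine similarity over bag-of-words vectors, which illustrates the vector-space comparison without the corpus-scale SVD. The passage and recall strings are invented examples.

```python
import re
from collections import Counter
from math import sqrt

def bow(text):
    """Bag-of-words term counts (stand-in for an LSA vector)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    shared = set(a) & set(b)
    num = sum(a[t] * b[t] for t in shared)
    den = (sqrt(sum(v * v for v in a.values()))
           * sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

# Invented prose passage and two recalls at increasing retrieval intervals
passage = "the farmer drove his truck to the market before dawn"
recall_30min = "a farmer drove a truck to the market"
recall_24h = "someone went to a market"

s1 = cosine(bow(passage), bow(recall_30min))
s2 = cosine(bow(passage), bow(recall_24h))
assert s1 > s2   # scores degrade with retrieval interval, as in the study
print(round(s1, 2), round(s2, 2))
```

LSA proper would replace `bow` with a projection of each text onto the latent dimensions of a corpus-derived SVD, so that semantically related but non-overlapping wordings still score as similar.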

  6. Temporal data representation, normalization, extraction, and reasoning: A review from clinical domain

    PubMed Central

    Madkour, Mohcine; Benhaddou, Driss; Tao, Cui

    2016-01-01

Background and Objective We live our lives by the calendar and the clock, but time is also an abstraction, even an illusion. The sense of time can be both domain-specific and complex, and is often left implicit, requiring significant domain knowledge to accurately recognize and harness. In the clinical domain, the momentum gained from recent advances in infrastructure and governance practices has enabled the collection of tremendous amounts of data at each moment in time. Electronic Health Records (EHRs) have paved the way to making these data available for practitioners and researchers. However, temporal data representation, normalization, extraction and reasoning are very important for mining such massive data and therefore for constructing the clinical timeline. The objective of this work is to provide an overview of the problem of constructing a timeline at the clinical point of care and to summarize the state-of-the-art in processing temporal information in clinical narratives. Methods This review surveys the methods used in three important areas: modeling and representation of time, medical NLP methods for extracting time, and methods of time reasoning and processing. The review emphasizes the existing gap between present methods and semantic web technologies and examines possible combinations. Results The main finding of this review is the importance of time processing not only in constructing timelines and clinical decision support systems but also as a vital component of EHR data models and operations. Conclusions Extracting temporal information from clinical narratives is a challenging task. The inclusion of ontologies and the semantic web will lead to better assessment of the annotation task and, together with medical NLP techniques, will help resolve granularity and co-reference resolution problems. PMID:27040831

  7. Linguistic multi-criteria decision-making with representing semantics by programming

    NASA Astrophysics Data System (ADS)

    Yang, Wu-E.; Ma, Chao-Qun; Han, Zhi-Qiu

    2017-01-01

A linguistic multi-criteria decision-making method is introduced. In this method, a discrimination-maximising programming model assigns semanteme values to linguistic variables to represent their semantics. Incomplete preferences arising from the use of linguistic information are expressed as constraints of the model. Such assignment can amplify the differences between alternatives, increasing the discrimination of the decision model and helping the decision-maker rank or order the alternatives for making a decision. We also discuss the parameter setting and its influence, and use an application example to illustrate the proposed method. Further, the results with three types of semantic structure highlight the ability of the method to handle different semantic structures.

  8. ADO: a disease ontology representing the domain knowledge specific to Alzheimer's disease.

    PubMed

    Malhotra, Ashutosh; Younesi, Erfan; Gündel, Michaela; Müller, Bernd; Heneka, Michael T; Hofmann-Apitius, Martin

    2014-03-01

Biomedical ontologies offer the capability to structure and represent domain-specific knowledge semantically. Disease-specific ontologies can facilitate knowledge exchange across multiple disciplines, and ontology-driven mining approaches can generate great value for modeling disease mechanisms. However, in the case of neurodegenerative diseases such as Alzheimer's disease, there is a lack of formal representation of the relevant knowledge domain. Alzheimer's disease ontology (ADO) is constructed in accordance with the ontology building life cycle. The Protégé OWL editor was used as a tool for building ADO in Ontology Web Language format. ADO was developed with the purpose of containing information relevant to four main biological views (preclinical, clinical, etiological, and molecular/cellular mechanisms) and was enriched by adding synonyms and references. Validation of the lexicalized ontology by means of named entity recognition-based methods showed a satisfactory performance (F score = 72%). In addition to structural and functional evaluation, a clinical expert in the field performed a manual evaluation and curation of ADO. Through integration of ADO into an information retrieval environment, we show that the ontology supports semantic search in scientific text. The usefulness of ADO is authenticated by dedicated use case scenarios. Development of ADO as an open resource is a first attempt to organize information related to Alzheimer's disease in a formalized, structured manner. We demonstrate that ADO is able to capture both established and scattered knowledge existing in scientific text. Copyright © 2014 The Alzheimer's Association. Published by Elsevier Inc. All rights reserved.

  9. DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes.

    PubMed

    Piñero, Janet; Queralt-Rosinach, Núria; Bravo, Àlex; Deu-Pons, Jordi; Bauer-Mehren, Anna; Baron, Martin; Sanz, Ferran; Furlong, Laura I

    2015-01-01

    DisGeNET is a comprehensive discovery platform designed to address a variety of questions concerning the genetic underpinning of human diseases. DisGeNET contains over 380,000 associations between >16,000 genes and 13,000 diseases, which makes it one of the largest repositories currently available of its kind. DisGeNET integrates expert-curated databases with text-mined data, covers information on Mendelian and complex diseases, and includes data from animal disease models. It features a score based on the supporting evidence to prioritize gene-disease associations. It is an open access resource available through a web interface, a Cytoscape plugin and as a Semantic Web resource. The web interface supports user-friendly data exploration and navigation. DisGeNET data can also be analysed via the DisGeNET Cytoscape plugin, and enriched with the annotations of other plugins of this popular network analysis software suite. Finally, the information contained in DisGeNET can be expanded and complemented using Semantic Web technologies and linked to a variety of resources already present in the Linked Data cloud. Hence, DisGeNET offers one of the most comprehensive collections of human gene-disease associations and a valuable set of tools for investigating the molecular mechanisms underlying diseases of genetic origin, designed to fulfill the needs of different user profiles, including bioinformaticians, biologists and health-care practitioners. Database URL: http://www.disgenet.org/ © The Author(s) 2015. Published by Oxford University Press.

  10. Wernicke's aphasia reflects a combination of acoustic-phonological and semantic control deficits: a case-series comparison of Wernicke's aphasia, semantic dementia and semantic aphasia.

    PubMed

    Robson, Holly; Sage, Karen; Ralph, Matthew A Lambon

    2012-01-01

    Wernicke's aphasia (WA) is the classical neurological model of comprehension impairment and, as a result, the posterior temporal lobe is assumed to be critical to semantic cognition. This conclusion is potentially confused by (a) the existence of patient groups with semantic impairment following damage to other brain regions (semantic dementia and semantic aphasia) and (b) an ongoing debate about the underlying causes of comprehension impairment in WA. By directly comparing these three patient groups for the first time, we demonstrate that the comprehension impairment in Wernicke's aphasia is best accounted for by dual deficits in acoustic-phonological analysis (associated with pSTG) and semantic cognition (associated with pMTG and angular gyrus). The WA group were impaired on both nonverbal and verbal comprehension assessments consistent with a generalised semantic impairment. This semantic deficit was most similar in nature to that of the semantic aphasia group suggestive of a disruption to semantic control processes. In addition, only the WA group showed a strong effect of input modality on comprehension, with accuracy decreasing considerably as acoustic-phonological requirements increased. These results deviate from traditional accounts which emphasise a single impairment and, instead, implicate two deficits underlying the comprehension disorder in WA. Copyright © 2011 Elsevier Ltd. All rights reserved.

  11. Topic categorisation of statements in suicide notes with integrated rules and machine learning.

    PubMed

    Kovačević, Aleksandar; Dehghan, Azad; Keane, John A; Nenadic, Goran

    2012-01-01

    We describe and evaluate an automated approach used as part of the i2b2 2011 challenge to identify and categorise statements in suicide notes into one of 15 topics, including Love, Guilt, Thankfulness, Hopelessness and Instructions. The approach combines a set of lexico-syntactic rules with a set of models derived by machine learning from a training dataset. The machine learning models rely on named entities, lexical, lexico-semantic and presentation features, as well as the rules that are applicable to a given statement. On a testing set of 300 suicide notes, the approach showed the overall best micro F-measure of up to 53.36%. The best precision achieved was 67.17% when only rules are used, whereas best recall of 50.57% was with integrated rules and machine learning. While some topics (eg, Sorrow, Anger, Blame) prove challenging, the performance for relatively frequent (eg, Love) and well-scoped categories (eg, Thankfulness) was comparatively higher (precision between 68% and 79%), suggesting that automated text mining approaches can be effective in topic categorisation of suicide notes.
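    The combination of lexico-syntactic rules with learned models described above can be illustrated with a minimal sketch. The rule patterns, topic names, and precedence scheme below are hypothetical simplifications, not the actual i2b2 system.

```python
import re

# Hypothetical lexico-syntactic rules: topic name -> trigger pattern.
RULES = {
    "Love": re.compile(r"\blove\b|\bdear(est)?\b", re.I),
    "Instructions": re.compile(r"\bplease (give|take|tell)\b", re.I),
    "Thankfulness": re.compile(r"\bthank(s| you)\b", re.I),
}

def rule_topics(statement):
    """Fire the rules; the matched rule names can also serve as
    features for a downstream machine-learned classifier."""
    return [topic for topic, pat in RULES.items() if pat.search(statement)]

def categorise(statement, classifier=None):
    """Rules take precedence when they fire; otherwise fall back to
    the learned model (here an optional callable)."""
    topics = rule_topics(statement)
    if topics:
        return topics
    if classifier:
        return [classifier(statement)]
    return []

print(categorise("Please give my watch to John. Thank you."))
```

    In the actual system the rule outputs were integrated as features alongside lexical and named-entity features rather than used as a hard override; the hard precedence here is only to keep the sketch short.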

  12. A Visual Analytics Framework for Identifying Topic Drivers in Media Events.

    PubMed

    Lu, Yafeng; Wang, Hong; Landis, Steven; Maciejewski, Ross

    2017-09-14

    Media data has been the subject of large-scale analysis, with text mining applied to provide overviews of media themes and information flows. Information extracted from media articles has also shown contextual value when integrated with other data, such as criminal records and stock market pricing. In this work, we explore linking textual media data with curated secondary textual data sources through user-guided semantic lexical matching for identifying relationships and data links. In this manner, critical information can be identified and used to annotate media timelines in order to provide a more detailed overview of events that may be driving media topics and frames. These linked events are further analyzed through an application of causality modeling to model temporal drivers between the data series. Such causal links are then annotated through automatic entity extraction which enables the analyst to explore persons, locations, and organizations that may be pertinent to the media topic of interest. To demonstrate the proposed framework, two media datasets and an armed conflict event dataset are explored.

  13. Semantic Likelihood Models for Bayesian Inference in Human-Robot Interaction

    NASA Astrophysics Data System (ADS)

    Sweet, Nicholas

    Autonomous systems, particularly unmanned aerial systems (UAS), remain limited in autonomous capabilities largely due to a poor understanding of their environment. Current sensors simply do not match human perceptive capabilities, impeding progress towards full autonomy. Recent work has shown the value of humans as sources of information within a human-robot team; in target applications, communicating human-generated 'soft data' to autonomous systems enables higher levels of autonomy through large, efficient information gains. This requires development of a 'human sensor model' that allows soft data fusion through Bayesian inference to update the probabilistic belief representations maintained by autonomous systems. Current human sensor models that capture linguistic inputs as semantic information are limited in their ability to generalize likelihood functions for semantic statements: they must be learned from dense data; they do not exploit the contextual information embedded within groundings; and they often limit human input to restrictive and simplistic interfaces. This work provides mechanisms to synthesize human sensor models from constraints based on easily attainable a priori knowledge, develops compression techniques to capture information-dense semantics, and investigates the problem of capturing and fusing semantic information contained within unstructured natural language. A robotic experimental testbed is also developed to validate the above contributions.

  14. A Diffusive-Particle Theory of Free Recall

    PubMed Central

    Fumarola, Francesco

    2017-01-01

    Diffusive models of free recall have been recently introduced in the memory literature, but their potential remains largely unexplored. In this paper, a diffusive model of short-term verbal memory is considered, in which the psychological state of the subject is encoded as the instantaneous position of a particle diffusing over a semantic graph. The model is particularly suitable for studying the dependence of free-recall observables on the semantic properties of the words to be recalled. Besides predicting some well-known experimental features (forward asymmetry, semantic clustering, word-length effect), a novel prediction is obtained on the relationship between the contiguity effect and the syllabic length of words; shorter words, by way of their wider semantic range, are predicted to be characterized by stronger forward contiguity. A fresh analysis of archival free-recall data allows us to confirm this prediction. PMID:29085521
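    The diffusive-recall idea lends itself to a compact simulation. The sketch below is illustrative only (the toy graph and function names are not the paper's actual model): recall order is encoded as the sequence of first visits of a random walker diffusing over a semantic graph.

```python
import random

def free_recall(graph, start, n_items, rng):
    """Simulate free recall as a random walk over a semantic graph.

    The recalled sequence is the order in which distinct words are
    first visited by a particle diffusing along graph edges.
    """
    recalled = [start]
    seen = {start}
    current = start
    while len(recalled) < n_items:
        current = rng.choice(graph[current])  # diffuse to a random neighbour
        if current not in seen:
            seen.add(current)
            recalled.append(current)
    return recalled

# Toy semantic graph: edges link semantically related words.
graph = {
    "dog": ["cat", "bone", "wolf"],
    "cat": ["dog", "mouse"],
    "mouse": ["cat", "cheese"],
    "cheese": ["mouse", "bread"],
    "bread": ["cheese"],
    "bone": ["dog"],
    "wolf": ["dog"],
}
rng = random.Random(0)
sequence = free_recall(graph, "dog", 5, rng)
print(sequence)
```

    Semantic clustering falls out of the dynamics: neighbours of the current word are recalled before distant words, so related items tend to appear adjacently in the output.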

  15. Pure Misallocation of "0" in Number Transcoding: A New Symptom of Right Cerebral Dysfunction

    ERIC Educational Resources Information Center

    Furumoto, Hideharu

    2006-01-01

    To account for the mechanism of number transcoding, many authors have proposed various models, for example, semantic-abstract model, lexical-semantic model, triple-code model, and so on. However, almost all of them are based on the symptoms of patients with left cerebral damage. Previously, I reported two Japanese patients with right posterior…

  16. Information Extraction for Clinical Data Mining: A Mammography Case Study

    PubMed Central

    Nassif, Houssam; Woods, Ryan; Burnside, Elizabeth; Ayvaci, Mehmet; Shavlik, Jude; Page, David

    2013-01-01

    Breast cancer is the leading cause of cancer mortality in women between the ages of 15 and 54. During mammography screening, radiologists use a strict lexicon (BI-RADS) to describe and report their findings. Mammography records are then stored in a well-defined database format (NMD). Lately, researchers have applied data mining and machine learning techniques to these databases. They successfully built breast cancer classifiers that can help in early detection of malignancy. However, the validity of these models depends on the quality of the underlying databases. Unfortunately, most databases suffer from inconsistencies, missing data, inter-observer variability and inappropriate term usage. In addition, many databases are not compliant with the NMD format and/or solely consist of text reports. BI-RADS feature extraction from free text and consistency checks between recorded predictive variables and text reports are crucial to addressing this problem. We describe a general scheme for concept information retrieval from free text given a lexicon, and present a BI-RADS features extraction algorithm for clinical data mining. It consists of a syntax analyzer, a concept finder and a negation detector. The syntax analyzer preprocesses the input into individual sentences. The concept finder uses a semantic grammar based on the BI-RADS lexicon and the experts’ input. It parses sentences detecting BI-RADS concepts. Once a concept is located, a lexical scanner checks for negation. Our method can handle multiple latent concepts within the text, filtering out ultrasound concepts. On our dataset, our algorithm achieves 97.7% precision, 95.5% recall and an F1-score of 0.97. It outperforms manual feature extraction at the 5% statistical significance level. PMID:23765123

  17. Information Extraction for Clinical Data Mining: A Mammography Case Study.

    PubMed

    Nassif, Houssam; Woods, Ryan; Burnside, Elizabeth; Ayvaci, Mehmet; Shavlik, Jude; Page, David

    2009-01-01

    Breast cancer is the leading cause of cancer mortality in women between the ages of 15 and 54. During mammography screening, radiologists use a strict lexicon (BI-RADS) to describe and report their findings. Mammography records are then stored in a well-defined database format (NMD). Lately, researchers have applied data mining and machine learning techniques to these databases. They successfully built breast cancer classifiers that can help in early detection of malignancy. However, the validity of these models depends on the quality of the underlying databases. Unfortunately, most databases suffer from inconsistencies, missing data, inter-observer variability and inappropriate term usage. In addition, many databases are not compliant with the NMD format and/or solely consist of text reports. BI-RADS feature extraction from free text and consistency checks between recorded predictive variables and text reports are crucial to addressing this problem. We describe a general scheme for concept information retrieval from free text given a lexicon, and present a BI-RADS features extraction algorithm for clinical data mining. It consists of a syntax analyzer, a concept finder and a negation detector. The syntax analyzer preprocesses the input into individual sentences. The concept finder uses a semantic grammar based on the BI-RADS lexicon and the experts' input. It parses sentences detecting BI-RADS concepts. Once a concept is located, a lexical scanner checks for negation. Our method can handle multiple latent concepts within the text, filtering out ultrasound concepts. On our dataset, our algorithm achieves 97.7% precision, 95.5% recall and an F1-score of 0.97. It outperforms manual feature extraction at the 5% statistical significance level.
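    The three-stage pipeline described in the abstract (syntax analyzer, concept finder, negation detector) can be sketched as below. The mini-lexicon and negation list are hypothetical stand-ins for the BI-RADS lexicon and the experts' semantic grammar, not the authors' actual rules.

```python
import re

# Hypothetical mini-lexicon standing in for BI-RADS terms.
LEXICON = {"mass", "calcification", "asymmetry"}
NEGATIONS = {"no", "without", "absent"}

def split_sentences(report):
    """Syntax analyzer: preprocess the report into individual sentences."""
    return [s.strip() for s in re.split(r"[.!?]", report) if s.strip()]

def extract_concepts(report):
    """Concept finder + negation detector: locate lexicon concepts per
    sentence and flag those preceded by a negation cue."""
    findings = []
    for sentence in split_sentences(report):
        words = sentence.lower().split()
        for concept in LEXICON:
            if concept in words:
                negated = any(w in NEGATIONS
                              for w in words[:words.index(concept)])
                findings.append((concept, negated))
    return findings

report = "There is a spiculated mass. No calcification is seen."
print(extract_concepts(report))
```

    Scanning only the words before the concept for a negation cue is the simplest possible scope rule; real negation detectors bound the scope more carefully.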

  18. Virtual Observatories, Data Mining, and Astroinformatics

    NASA Astrophysics Data System (ADS)

    Borne, Kirk

    The historical, current, and future trends in knowledge discovery from data in astronomy are presented here. The story begins with a brief history of data gathering and data organization. A description of the development of new information science technologies for astronomical discovery is then presented. Among these are e-Science and the virtual observatory, with its data discovery, access, display, and integration protocols; astroinformatics and data mining for exploratory data analysis, information extraction, and knowledge discovery from distributed data collections; new sky surveys' databases, including rich multivariate observational parameter sets for large numbers of objects; and the emerging discipline of data-oriented astronomical research, called astroinformatics. Astroinformatics is described as the fourth paradigm of astronomical research, following the three traditional research methodologies: observation, theory, and computation/modeling. Astroinformatics research areas include machine learning, data mining, visualization, statistics, semantic science, and scientific data management. Each of these areas is now an active research discipline, with significant science-enabling applications in astronomy. Research challenges and sample research scenarios are presented in these areas, in addition to sample algorithms for data-oriented research. These information science technologies enable scientific knowledge discovery from the increasingly large and complex data collections in astronomy. The education and training of the modern astronomy student must consequently include skill development in these areas, whose practitioners have traditionally been limited to applied mathematicians, computer scientists, and statisticians. Modern astronomical researchers must cross these traditional discipline boundaries, thereby borrowing the best-of-breed methodologies from multiple disciplines. In the era of large sky surveys and numerous large telescopes, the potential for astronomical discovery is equally large, and so the data-oriented research methods, algorithms, and techniques that are presented here will enable the greatest discovery potential from the ever-growing data and information resources in astronomy.

  19. Spatio-Temporal Change Modeling of LULC: A Semantic Kriging Approach

    NASA Astrophysics Data System (ADS)

    Bhattacharjee, S.; Ghosh, S. K.

    2015-07-01

    Spatio-temporal land-use/ land-cover (LULC) change modeling is important to forecast the future LULC distribution, which may facilitate natural resource management, urban planning, etc. The spatio-temporal change in LULC trend often exhibits non-linear behavior, due to various dynamic factors, such as human intervention (e.g., urbanization) and environmental factors. Hence, proper forecasting of LULC distribution should involve the study and trend modeling of historical data. Existing literature has reported that the meteorological attributes (e.g., NDVI, LST, MSI) are semantically related to the terrain. Being influenced by the terrestrial dynamics, the temporal changes of these attributes depend on the LULC properties. Hence, incorporating meteorological knowledge into the temporal prediction process may help in developing an accurate forecasting model. This work attempts to study the change in inter-annual LULC pattern and the distribution of different meteorological attributes of a region in Kolkata (a metropolitan city in India) during the years 2000-2010 and forecast the future spread of LULC using the semantic kriging (SemK) approach. A new variant of time-series SemK is proposed, namely Rev-SemKts, to capture the multivariate semantic associations between different attributes. From empirical analysis, it may be observed that the augmentation of semantic knowledge in spatio-temporal modeling of meteorological attributes facilitates more precise forecasting of LULC pattern.

  20. A Robust Geometric Model for Argument Classification

    NASA Astrophysics Data System (ADS)

    Giannone, Cristina; Croce, Danilo; Basili, Roberto; de Cao, Diego

    Argument classification is the task of assigning semantic roles to syntactic structures in natural language sentences. Supervised learning techniques for frame semantics have been recently shown to benefit from rich sets of syntactic features. However argument classification is also highly dependent on the semantics of the involved lexicals. Empirical studies have shown that domain dependence of lexical information causes large performance drops in outside domain tests. In this paper a distributional approach is proposed to improve the robustness of the learning model against out-of-domain lexical phenomena.

  1. TaggerOne: joint named entity recognition and normalization with semi-Markov Models

    PubMed Central

    Leaman, Robert; Lu, Zhiyong

    2016-01-01

    Motivation: Text mining is increasingly used to manage the accelerating pace of the biomedical literature. Many text mining applications depend on accurate named entity recognition (NER) and normalization (grounding). While high performing machine learning methods trainable for many entity types exist for NER, normalization methods are usually specialized to a single entity type. NER and normalization systems are also typically used in a serial pipeline, causing cascading errors and limiting the ability of the NER system to directly exploit the lexical information provided by the normalization. Methods: We propose the first machine learning model for joint NER and normalization during both training and prediction. The model is trainable for arbitrary entity types and consists of a semi-Markov structured linear classifier, with a rich feature approach for NER and supervised semantic indexing for normalization. We also introduce TaggerOne, a Java implementation of our model as a general toolkit for joint NER and normalization. TaggerOne is not specific to any entity type, requiring only annotated training data and a corresponding lexicon, and has been optimized for high throughput. Results: We validated TaggerOne with multiple gold-standard corpora containing both mention- and concept-level annotations. Benchmarking results show that TaggerOne achieves high performance on diseases (NCBI Disease corpus, NER f-score: 0.829, normalization f-score: 0.807) and chemicals (BioCreative 5 CDR corpus, NER f-score: 0.914, normalization f-score 0.895). These results compare favorably to the previous state of the art, notwithstanding the greater flexibility of the model. We conclude that jointly modeling NER and normalization greatly improves performance. 
Availability and Implementation: The TaggerOne source code and an online demonstration are available at: http://www.ncbi.nlm.nih.gov/bionlp/taggerone Contact: zhiyong.lu@nih.gov Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27283952

  2. TaggerOne: joint named entity recognition and normalization with semi-Markov Models.

    PubMed

    Leaman, Robert; Lu, Zhiyong

    2016-09-15

    Text mining is increasingly used to manage the accelerating pace of the biomedical literature. Many text mining applications depend on accurate named entity recognition (NER) and normalization (grounding). While high performing machine learning methods trainable for many entity types exist for NER, normalization methods are usually specialized to a single entity type. NER and normalization systems are also typically used in a serial pipeline, causing cascading errors and limiting the ability of the NER system to directly exploit the lexical information provided by the normalization. We propose the first machine learning model for joint NER and normalization during both training and prediction. The model is trainable for arbitrary entity types and consists of a semi-Markov structured linear classifier, with a rich feature approach for NER and supervised semantic indexing for normalization. We also introduce TaggerOne, a Java implementation of our model as a general toolkit for joint NER and normalization. TaggerOne is not specific to any entity type, requiring only annotated training data and a corresponding lexicon, and has been optimized for high throughput. We validated TaggerOne with multiple gold-standard corpora containing both mention- and concept-level annotations. Benchmarking results show that TaggerOne achieves high performance on diseases (NCBI Disease corpus, NER f-score: 0.829, normalization f-score: 0.807) and chemicals (BioCreative 5 CDR corpus, NER f-score: 0.914, normalization f-score 0.895). These results compare favorably to the previous state of the art, notwithstanding the greater flexibility of the model. We conclude that jointly modeling NER and normalization greatly improves performance. The TaggerOne source code and an online demonstration are available at: http://www.ncbi.nlm.nih.gov/bionlp/taggerone zhiyong.lu@nih.gov Supplementary data are available at Bioinformatics online. Published by Oxford University Press 2016. 
This work is written by US Government employees and is in the public domain in the US.
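    TaggerOne's normalization step learns a supervised semantic-indexing projection between mention text and lexicon names. A far simpler stand-in, sketched below, grounds a mention to the lexicon entry with the highest bag-of-words cosine similarity; the concept IDs and names form a hypothetical mini-lexicon, and token overlap is only a crude proxy for the learned mapping.

```python
import math
from collections import Counter

def vec(text):
    """Bag-of-words vector as a token-count mapping."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def normalize(mention, lexicon):
    """Ground a recognized mention to the closest lexicon concept."""
    m = vec(mention)
    return max(lexicon, key=lambda cid: cosine(m, vec(lexicon[cid])))

# Hypothetical mini disease lexicon (concept id -> preferred name).
lexicon = {
    "D003920": "diabetes mellitus",
    "D006973": "high blood pressure",
    "D003866": "depressive disorder",
}
print(normalize("blood pressure high", lexicon))
```

    The point of joint modeling in TaggerOne is that this similarity is computed inside the semi-Markov decoder, so segmentation and grounding constrain each other instead of running as a serial pipeline.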

  3. Integrating semantic web technologies and geospatial catalog services for geospatial information discovery and processing in cyberinfrastructure

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yue, Peng; Gong, Jianya; Di, Liping

    A geospatial catalogue service provides a network-based meta-information repository and interface for advertising and discovering shared geospatial data and services. Descriptive information (i.e., metadata) for geospatial data and services is structured and organized in catalogue services. The approaches currently available for searching and using that information are often inadequate. Semantic Web technologies show promise for better discovery methods by exploiting the underlying semantics. Such development needs special attention from the Cyberinfrastructure perspective, so that the traditional focus on discovery of and access to geospatial data can be expanded to support the increased demand for processing of geospatial information and discovery of knowledge. Semantic descriptions for geospatial data, services, and geoprocessing service chains are structured, organized, and registered through extending elements in the ebXML Registry Information Model (ebRIM) of a geospatial catalogue service, which follows the interface specifications of the Open Geospatial Consortium (OGC) Catalogue Services for the Web (CSW). The process models for geoprocessing service chains, as a type of geospatial knowledge, are captured, registered, and discoverable. Semantics-enhanced discovery for geospatial data, services/service chains, and process models is described. Semantic search middleware that can support virtual data product materialization is developed for the geospatial catalogue service. The creation of such a semantics-enhanced geospatial catalogue service is important in meeting the demands for geospatial information discovery and analysis in Cyberinfrastructure.

  4. Semantics of Context-Free Fragments of Natural Languages.

    ERIC Educational Resources Information Center

    Suppes, Patrick

    The objective of this paper is to combine the viewpoint of model-theoretic semantics and generative grammar, to define semantics for context-free languages, and to apply the results to some fragments of natural language. Following the introduction in the first section, Section 2 describes a simple artificial example to illustrate how a semantic…

  5. High-Dimensional Semantic Space Accounts of Priming

    ERIC Educational Resources Information Center

    Jones, Michael N.; Kintsch, Walter; Mewhort, Douglas J. K.

    2006-01-01

    A broad range of priming data has been used to explore the structure of semantic memory and to test between models of word representation. In this paper, we examine the computational mechanisms required to learn distributed semantic representations for words directly from unsupervised experience with language. To best account for the variety of…

  6. Warfighter IT Interoperability Standards Study

    DTIC Science & Technology

    2012-07-22

    data (e.g. messages) between systems? ii) What process did you use to validate and certify semantic interoperability between your...other systems at this time? There was no requirement to validate and certify semantic interoperability. The DLS program exchanges data with... semantics. Testing for System Compliance with Data Models. Verify and Certify Interoperability Using Data

  7. 'I remember therefore I am, and I am therefore I remember': exploring the contributions of episodic and semantic self-knowledge to strength of identity.

    PubMed

    Haslam, Catherine; Jetten, Jolanda; Haslam, S Alexander; Pugliese, Cara; Tonks, James

    2011-05-01

    The present research explores the relationship between the two components of autobiographical memory (episodic and semantic self-knowledge) and identity strength in older adults living in the community and residential care. Participants (N = 32) completed the autobiographical memory interview and measures of personal identity strength and multiple group memberships. Contrary to previous research, autobiographical memory for all time periods (childhood, early adulthood, and recent life) in the semantic domain was associated with greater strength in personal identity. Further, we obtained support for the hypothesis that the relationship between episodic self-knowledge and identity strength would be mediated by knowledge of personal semantic facts. However, there was also support for a reverse mediation model indicating that a strong sense of identity is associated with semantic self-knowledge and through this may enhance self-relevant recollection. The discussion elaborates on these findings and we propose a self-knowledge and identity model (SKIM) whereby semantic self-knowledge mediates a bidirectional relationship between episodic self-knowledge and identity. ©2010 The British Psychological Society.

  8. A novel co-occurrence-based approach to predict pure associative and semantic priming.

    PubMed

    Roelke, Andre; Franke, Nicole; Biemann, Chris; Radach, Ralph; Jacobs, Arthur M; Hofmann, Markus J

    2018-03-15

    The theoretical "difficulty in separating association strength from [semantic] feature overlap" has resulted in inconsistent findings of either the presence or absence of "pure" associative priming in recent literature (Hutchison, 2003, Psychonomic Bulletin & Review, 10(4), p. 787). The present study used co-occurrence statistics of words in sentences to provide a full factorial manipulation of direct association (strong/no) and the number of common associates (many/no) of the prime and target words. These common associates were proposed to serve as semantic features for a recent interactive activation model of semantic processing (i.e., the associative read-out model; Hofmann & Jacobs, 2014). With stimulus onset asynchrony (SOA) as an additional factor, our findings indicate that associative and semantic priming are indeed dissociable. Moreover, the effect of direct association was strongest at a long SOA (1,000 ms), while many common associates facilitated lexical decisions primarily at a short SOA (200 ms). This response pattern is consistent with previous performance-based accounts and suggests that associative and semantic priming can be evoked by computationally determined direct and common associations.
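    The study's core manipulation, direct association versus common associates derived from sentence-level co-occurrence, can be sketched with a toy corpus. The sentences and word pairs below are illustrative only, not the study's stimuli or its exact counting scheme.

```python
from collections import defaultdict

def cooccurrence(sentences):
    """Count sentence-level co-occurrences between word pairs."""
    counts = defaultdict(int)
    for sentence in sentences:
        words = set(sentence.lower().split())
        for a in words:
            for b in words:
                if a < b:  # each unordered pair counted once
                    counts[(a, b)] += 1
    return counts

def common_associates(counts, prime, target):
    """Words that co-occur with both prime and target: shared
    associates that can act as overlapping semantic 'features'."""
    def partners(w):
        return {a if b == w else b for (a, b) in counts if w in (a, b)}
    return (partners(prime) & partners(target)) - {prime, target}

sentences = [
    "the doctor met the nurse",
    "the doctor visited the hospital",
    "the nurse works at the hospital",
]
counts = cooccurrence(sentences)
direct = counts.get(tuple(sorted(("doctor", "nurse"))), 0)
shared = common_associates(counts, "doctor", "nurse")
print(direct, shared)
```

    The factorial design crosses these two quantities: a prime-target pair can have a direct co-occurrence link, many shared associates, both, or neither.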

  9. Semantic Agent-Based Service Middleware and Simulation for Smart Cities

    PubMed Central

    Liu, Ming; Xu, Yang; Hu, Haixiao; Mohammed, Abdul-Wahid

    2016-01-01

    With the development of Machine-to-Machine (M2M) technology, a variety of embedded and mobile devices are integrated to interact via the platform of the Internet of Things, especially in the domain of smart cities. One of the primary challenges is that selecting the appropriate services or service combination for upper layer applications is hard, due to the absence of a unified semantic service description pattern as well as a service selection mechanism. In this paper, we define a semantic service representation model from four key properties: Capability (C), Deployment (D), Resource (R) and IOData (IO). Based on this model, an agent-based middleware is built to support semantic service enablement. In this middleware, we present an efficient semantic service discovery and matching approach for a service combination process, which calculates the semantic similarity between services, and a heuristic algorithm to search the service candidates for a specific service request. Based on this design, we propose a simulation of virtual urban fire fighting, and the experimental results demonstrate the feasibility and efficiency of our design. PMID:28009818

  10. Semantic Agent-Based Service Middleware and Simulation for Smart Cities.

    PubMed

    Liu, Ming; Xu, Yang; Hu, Haixiao; Mohammed, Abdul-Wahid

    2016-12-21

    With the development of Machine-to-Machine (M2M) technology, a variety of embedded and mobile devices are integrated to interact via the platform of the Internet of Things, especially in the domain of smart cities. One of the primary challenges is that selecting the appropriate services or service combination for upper layer applications is hard, due to the absence of a unified semantic service description pattern as well as a service selection mechanism. In this paper, we define a semantic service representation model from four key properties: Capability (C), Deployment (D), Resource (R) and IOData (IO). Based on this model, an agent-based middleware is built to support semantic service enablement. In this middleware, we present an efficient semantic service discovery and matching approach for a service combination process, which calculates the semantic similarity between services, and a heuristic algorithm to search the service candidates for a specific service request. Based on this design, we propose a simulation of virtual urban fire fighting, and the experimental results demonstrate the feasibility and efficiency of our design.
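    A minimal sketch of matching a service request against the C/D/R/IO representation might look as follows. The property values, weights, and toy services are assumptions for illustration; the paper's actual similarity calculation and heuristic search are richer.

```python
def jaccard(a, b):
    """Set overlap as a crude semantic similarity for one property."""
    return len(a & b) / len(a | b) if a | b else 1.0

def service_similarity(s1, s2, weights=None):
    """Weighted similarity over the C/D/R/IO property sets of two services."""
    weights = weights or {"C": 0.4, "D": 0.2, "R": 0.2, "IO": 0.2}
    return sum(w * jaccard(s1[k], s2[k]) for k, w in weights.items())

# A service request and two hypothetical registered services.
request = {"C": {"detect_fire"}, "D": {"district_3"},
           "R": {"camera"}, "IO": {"video"}}
services = {
    "cam_svc": {"C": {"detect_fire", "detect_smoke"}, "D": {"district_3"},
                "R": {"camera"}, "IO": {"video"}},
    "gps_svc": {"C": {"locate"}, "D": {"district_1"},
                "R": {"gps"}, "IO": {"coords"}},
}
best = max(services, key=lambda name: service_similarity(request, services[name]))
print(best)
```

    Ranking candidates by such a score is the building block; the middleware's heuristic search then composes ranked candidates into service chains.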

  11. Mining the pharmacogenomics literature—a survey of the state of the art

    PubMed Central

    Cohen, K. Bretonnel; Garten, Yael; Shah, Nigam H.

    2012-01-01

    This article surveys efforts on text mining of the pharmacogenomics literature, mainly from the period 2008 to 2011. Pharmacogenomics (or pharmacogenetics) is the field that studies how human genetic variation impacts drug response. Therefore, publications span the intersection of research in genotypes, phenotypes and pharmacology, a topic that has increasingly become a focus of active research in recent years. This survey covers efforts dealing with the automatic recognition of relevant named entities (e.g. genes, gene variants and proteins, diseases and other pathological phenomena, drugs and other chemicals relevant for medical treatment), as well as various forms of relations between them. A wide range of text genres is considered, such as scientific publications (abstracts, as well as full texts), patent texts and clinical narratives. We also discuss infrastructure and resources needed for advanced text analytics, e.g. document corpora annotated with corresponding semantic metadata (gold standards and training data), biomedical terminologies and ontologies providing domain-specific background knowledge at different levels of formality and specificity, software architectures for building complex and scalable text analytics pipelines and Web services grounded to them, as well as comprehensive ways to disseminate and interact with the typically huge amounts of semiformal knowledge structures extracted by text mining tools. Finally, we consider some of the novel applications that have already been developed in the field of pharmacogenomic text mining and point out perspectives for future research. PMID:22833496

  12. Mining the pharmacogenomics literature--a survey of the state of the art.

    PubMed

    Hahn, Udo; Cohen, K Bretonnel; Garten, Yael; Shah, Nigam H

    2012-07-01

    This article surveys efforts on text mining of the pharmacogenomics literature, mainly from the period 2008 to 2011. Pharmacogenomics (or pharmacogenetics) is the field that studies how human genetic variation impacts drug response. Therefore, publications span the intersection of research in genotypes, phenotypes and pharmacology, a topic that has increasingly become a focus of active research in recent years. This survey covers efforts dealing with the automatic recognition of relevant named entities (e.g. genes, gene variants and proteins, diseases and other pathological phenomena, drugs and other chemicals relevant for medical treatment), as well as various forms of relations between them. A wide range of text genres is considered, such as scientific publications (abstracts, as well as full texts), patent texts and clinical narratives. We also discuss infrastructure and resources needed for advanced text analytics, e.g. document corpora annotated with corresponding semantic metadata (gold standards and training data), biomedical terminologies and ontologies providing domain-specific background knowledge at different levels of formality and specificity, software architectures for building complex and scalable text analytics pipelines and Web services grounded to them, as well as comprehensive ways to disseminate and interact with the typically huge amounts of semiformal knowledge structures extracted by text mining tools. Finally, we consider some of the novel applications that have already been developed in the field of pharmacogenomic text mining and point out perspectives for future research.

  13. Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features.

    PubMed

    Nikfarjam, Azadeh; Sarker, Abeed; O'Connor, Karen; Ginn, Rachel; Gonzalez, Graciela

    2015-05-01

    Social media is becoming increasingly popular as a platform for sharing personal health-related information. This information can be utilized for public health monitoring tasks, particularly for pharmacovigilance, via the use of natural language processing (NLP) techniques. However, the language in social media is highly informal, and user-expressed medical concepts are often nontechnical, descriptive, and challenging to extract. There has been limited progress in addressing these challenges, and thus far, advanced machine learning-based NLP techniques have been underutilized. Our objective is to design a machine learning-based approach to extract mentions of adverse drug reactions (ADRs) from highly informal text in social media. We introduce ADRMine, a machine learning-based concept extraction system that uses conditional random fields (CRFs). ADRMine utilizes a variety of features, including a novel feature for modeling words' semantic similarities. The similarities are modeled by clustering words based on unsupervised, pretrained word representation vectors (embeddings) generated from unlabeled user posts in social media using a deep learning technique. ADRMine outperforms several strong baseline systems in the ADR extraction task by achieving an F-measure of 0.82. Feature analysis demonstrates that the proposed word cluster features significantly improve extraction performance. It is possible to extract complex medical concepts, with relatively high performance, from informal, user-generated content. Our approach is particularly scalable, suitable for social media mining, as it relies on large volumes of unlabeled data, thus diminishing the need for large, annotated training data sets. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.
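The word-cluster feature idea behind ADRMine can be sketched as follows: cluster embedding vectors, then emit each token's cluster ID as a generalising feature for the sequence labeller. The vectors and vocabulary below are made up for illustration (a real system clusters embeddings pretrained on large volumes of user posts, and the features feed a CRF):

```python
# Illustrative sketch of word-cluster features: toy 2-d "embeddings" are
# clustered with a tiny deterministic k-means, and each token's cluster ID
# becomes a feature alongside its surface form.
import math

def kmeans(vectors, k, iters=20):
    """Tiny k-means (deterministic init: the first k points)."""
    centroids = [vectors[i] for i in range(k)]
    assign = [0] * len(vectors)
    for _ in range(iters):
        for i, v in enumerate(vectors):
            assign[i] = min(range(k), key=lambda c: math.dist(v, centroids[c]))
        for c in range(k):
            members = [v for i, v in enumerate(vectors) if assign[i] == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return assign

embeddings = {          # invented 2-d vectors: symptoms vs dosing terms
    "headache": (0.9, 0.1), "nausea": (0.8, 0.2), "dizzy": (0.85, 0.15),
    "tablet": (0.1, 0.9), "dose": (0.2, 0.8), "mg": (0.15, 0.85),
}
words = list(embeddings)
cluster_of = dict(zip(words, kmeans([embeddings[w] for w in words], k=2)))

def token_features(tokens, i):
    """CRF-style feature dict for token i: surface form plus cluster ID."""
    w = tokens[i].lower()
    return {"lower": w, "cluster": cluster_of.get(w, -1)}

print(token_features(["took", "dose", "got", "nausea"], 3))
```

The point of the cluster feature is that an unseen symptom word falling near "nausea" in embedding space inherits the same cluster ID, letting the labeller generalise beyond its training vocabulary.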

  14. Internally- and externally-driven network transitions as a basis for automatic and strategic processes in semantic priming: theory and experimental validation

    PubMed Central

    Lerner, Itamar; Shriki, Oren

    2014-01-01

    For the last four decades, semantic priming—the facilitation in recognition of a target word when it follows the presentation of a semantically related prime word—has been a central topic in research of human cognitive processing. Studies have drawn a complex picture of findings which demonstrated the sensitivity of this priming effect to a unique combination of variables, including, but not limited to, the type of relatedness between primes and targets, the prime-target Stimulus Onset Asynchrony (SOA), the relatedness proportion (RP) in the stimuli list and the specific task subjects are required to perform. Automatic processes depending on the activation patterns of semantic representations in memory and controlled strategies adapted by individuals when attempting to maximize their recognition performance have both been implicated in contributing to the results. Lately, we have published a new model of semantic priming that addresses the majority of these findings within one conceptual framework. In our model, semantic memory is depicted as an attractor neural network in which stochastic transitions from one stored pattern to another are continually taking place due to synaptic depression mechanisms. We have shown how such transitions, in combination with a reinforcement-learning rule that adjusts their pace, resemble the classic automatic and controlled processes involved in semantic priming and account for a great number of the findings in the literature. Here, we review the core findings of our model and present new simulations that show how similar principles of parameter-adjustments could account for additional data not addressed in our previous studies, such as the relation between expectancy and inhibition in priming, target frequency and target degradation effects. Finally, we describe two human experiments that validate several key predictions of the model. PMID:24795670

  15. [Artificial intelligence meeting neuropsychology. Semantic memory in normal and pathological aging].

    PubMed

    Aimé, Xavier; Charlet, Jean; Maillet, Didier; Belin, Catherine

    2015-03-01

    Artificial intelligence (AI) is the subject of much research, but also of many fantasies. It aims to reproduce human intelligence in its capacities for learning, knowledge storage and computation. In 2014, the Defense Advanced Research Projects Agency (DARPA) started the Restoring Active Memory (RAM) program, which attempts to develop implantable technology to bridge gaps in the injured brain and restore normal memory function to people with memory loss caused by injury or disease. In another field of AI, computational ontologies (formal, shared conceptualizations) model knowledge in order to represent the concepts of a target domain in a structured and unambiguous way. The aim of these structures is to ensure a consensual understanding of their meaning and a univocal use (the same concept is used by all to categorize the same individuals). The first knowledge representations in AI were largely based on models of semantic memory. Semantic memory, a component of long-term memory, is the memory of words, ideas and concepts; it is the declarative memory system that most remarkably resists the effects of age. By contrast, non-specific cognitive changes may decrease the performance of elderly people on various tasks, reflecting difficulties of access to semantic representations rather than degradation of the semantic store itself. Some dementias, such as semantic dementia and Alzheimer's disease, involve an alteration of semantic memory. Using computational ontologies, we propose in this paper a formal and relatively lightweight model in the service of neuropsychology: 1) for the practitioner, with decision-support systems; 2) for the patient, as an outsourced cognitive prosthesis; and 3) for the researcher, to study semantic memory.

  16. @Note: a workbench for biomedical text mining.

    PubMed

    Lourenço, Anália; Carreira, Rafael; Carneiro, Sónia; Maia, Paulo; Glez-Peña, Daniel; Fdez-Riverola, Florentino; Ferreira, Eugénio C; Rocha, Isabel; Rocha, Miguel

    2009-08-01

    Biomedical Text Mining (BioTM) is providing valuable approaches to the automated curation of scientific literature. However, most efforts have addressed the benchmarking of new algorithms rather than user operational needs. Bridging the gap between BioTM researchers and biologists' needs is crucial to solve real-world problems and promote further research. We present @Note, a platform for BioTM that aims at the effective translation of advances between three distinct classes of users: biologists, text miners and software developers. Its main functional contributions are the ability to process abstracts and full texts; an information retrieval module enabling PubMed search and journal crawling; a pre-processing module with PDF-to-text conversion, tokenisation and stopword removal; a semantic annotation schema; a lexicon-based annotator; a user-friendly annotation view that allows users to correct annotations; and a Text Mining Module supporting dataset preparation and algorithm evaluation. @Note improves interoperability, modularity and flexibility when integrating in-house and open-source third-party components. Its component-based architecture allows the rapid development of new applications, emphasizing the principles of transparency and simplicity of use. Although development is still ongoing, the platform has already enabled applications that are currently in use.
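The pre-processing steps listed above (tokenisation, stopword removal) can be sketched minimally; the regex and stopword list below are illustrative assumptions, not @Note's actual implementation:

```python
# Minimal pre-processing sketch in the spirit of @Note's module: lowercase,
# tokenise on alphanumeric runs, drop a (toy) stopword list.
import re

STOPWORDS = {"the", "of", "and", "in", "a", "is", "to"}

def preprocess(text):
    tokens = re.findall(r"[A-Za-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

abstract = "Expression of the lac operon is induced in E. coli."
print(preprocess(abstract))
# ['expression', 'lac', 'operon', 'induced', 'e', 'coli']
```

Real biomedical pipelines add PDF-to-text conversion and lexicon lookup on top of this, but the token stream they hand to annotators has this basic shape.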

  17. Towards a Semantic E-Learning Theory by Using a Modelling Approach

    ERIC Educational Resources Information Center

    Yli-Luoma, Pertti V. J.; Naeve, Ambjorn

    2006-01-01

    In the present study, a semantic perspective on e-learning theory is advanced and a modelling approach is used. This modelling approach towards the new learning theory is based on the four SECI phases of knowledge conversion: Socialisation, Externalisation, Combination and Internalisation, introduced by Nonaka in 1994, and involving two levels of…

  18. A Schema Theory Account of Some Cognitive Processes in Complex Learning. Technical Report No. 81.

    ERIC Educational Resources Information Center

    Munro, Allen; Rigney, Joseph W.

    Procedural semantics models have diminished the distinction between data structures and procedures in computer simulations of human intelligence. This development has theoretical consequences for models of cognition. One type of procedural semantics model, called schema theory, is presented, and a variety of cognitive processes are explained in…

  19. Applying semantic web technologies for phenome-wide scan using an electronic health record linked Biobank

    PubMed Central

    2012-01-01

    Background The ability to conduct genome-wide association studies (GWAS) has enabled new exploration of how genetic variations contribute to health and disease etiology. However, GWAS have historically been limited by inadequate sample sizes due to the costs of genotyping and phenotyping study subjects. This has prompted several academic medical centers to form “biobanks,” where biospecimens linked to personal health information, typically in electronic health records (EHRs), are collected and stored for a large number of subjects. This provides tremendous opportunities to discover novel genotype-phenotype associations and foster hypothesis generation. Results In this work, we study how emerging Semantic Web technologies can be applied in conjunction with clinical and genotype data stored at the Mayo Clinic Biobank to mine the phenotype data for genetic associations. In particular, we demonstrate the role of the Resource Description Framework (RDF) in representing EHR diagnosis and procedure data, and enable federated querying via standardized Web protocols to identify subjects genotyped for Type 2 Diabetes and Hypothyroidism and to discover gene-disease associations. Our study highlights the potential of Web-scale data federation techniques to execute complex queries. Conclusions This study demonstrates how Semantic Web technologies can be applied in conjunction with clinical data stored in EHRs to accurately identify subjects with specific diseases and phenotypes, and to identify genotype-phenotype associations. PMID:23244446
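The RDF representation and cohort querying described here can be sketched with a naive in-memory triple store; the pattern matcher stands in for a SPARQL endpoint, and the URIs and codes are invented examples:

```python
# Sketch: patient diagnoses as subject-predicate-object triples, queried with
# a simple pattern matcher standing in for SPARQL. All identifiers invented.

triples = {
    ("patient:1", "ehr:hasDiagnosis", "icd9:250.00"),   # type 2 diabetes
    ("patient:2", "ehr:hasDiagnosis", "icd9:244.9"),    # hypothyroidism
    ("patient:3", "ehr:hasDiagnosis", "icd9:250.00"),
    ("icd9:250.00", "rdfs:label", "Type 2 Diabetes"),
    ("icd9:244.9", "rdfs:label", "Hypothyroidism"),
}

def match(s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [(ts, tp, to) for (ts, tp, to) in triples
            if s in (None, ts) and p in (None, tp) and o in (None, to)]

# "Which subjects carry a Type 2 Diabetes diagnosis?" -- the kind of cohort
# query the study federates across the biobank and the EHR.
code = next(s for s, p, o in match(p="rdfs:label", o="Type 2 Diabetes"))
cohort = sorted(s for s, p, o in match(p="ehr:hasDiagnosis", o=code))
print(cohort)  # ['patient:1', 'patient:3']
```

The value of the RDF form is exactly this uniformity: diagnoses, procedures, and genotype flags all query the same way, so cohorts can be assembled across repositories with one protocol.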

  20. Predicting Raters’ Transparency Judgments of English and Chinese Morphological Constituents using Latent Semantic Analysis

    PubMed Central

    Wang, Hsueh-Cheng; Hsu, Li-Chuan; Tien, Yi-Min; Pomplun, Marc

    2013-01-01

    The morphological constituents of English compounds (e.g., “butter” and “fly” for “butterfly”) and two-character Chinese compounds may differ in meaning from the whole word. Subjective differences and ambiguity of transparency make the judgments difficult, and a computational alternative based on a general model may be a way to average across subjective differences. The current study proposes two approaches based on Latent Semantic Analysis (Landauer & Dumais, 1997): Model 1 compares the semantic similarity between a compound word and each of its constituents, and Model 2 derives the dominant meaning of a constituent based on a clustering analysis of morphological family members (e.g., “butterfingers” or “buttermilk” for “butter”). The proposed models successfully predicted participants’ transparency ratings, and we recommend that experimenters use Model 1 for English compounds and Model 2 for Chinese compounds, due to raters’ morphological processing in different writing systems. The dominance of lexical meaning, semantic transparency, and the average similarity between all pairs within a morphological family are provided, and practical applications for future studies are discussed. PMID:23784009
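Model 1 above reduces to a cosine similarity between the compound's vector and each constituent's vector. The tiny hand-made vectors below stand in for LSA dimensions and are purely illustrative:

```python
# Sketch of Model 1: semantic transparency of a constituent is its cosine
# similarity to the whole compound in a vector space. Toy 3-d vectors only.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

vectors = {                      # invented "LSA" vectors
    "butterfly": (0.1, 0.9, 0.2),
    "butter":    (0.9, 0.1, 0.1),   # food sense: far from the insect
    "fly":       (0.2, 0.8, 0.3),   # insect/flight sense: close to it
}

t_butter = cosine(vectors["butterfly"], vectors["butter"])
t_fly = cosine(vectors["butterfly"], vectors["fly"])
print(round(t_butter, 2), round(t_fly, 2))
```

With vectors like these, "fly" comes out far more transparent than "butter" with respect to "butterfly", mirroring the opaque/transparent contrast the paper's raters judge.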

  1. Enhancement of Chemical Entity Identification in Text Using Semantic Similarity Validation

    PubMed Central

    Grego, Tiago; Couto, Francisco M.

    2013-01-01

    With the amount of chemical data being produced and reported in the literature growing at a fast pace, it is increasingly important to retrieve this information efficiently. To tackle this issue, text mining tools have been applied; despite their good performance, they still produce many errors that we believe can be filtered by using semantic similarity. Thus, this paper proposes a novel method that receives the results of chemical entity identification systems, such as Whatizit, and exploits the semantic relationships in ChEBI to measure the similarity between the entities found in the text. The method assigns a single validation score to each entity based on its similarities with the other entities also identified in the text. Then, by using a given threshold, the method selects a set of validated entities and a set of outlier entities. We evaluated our method using the results of two state-of-the-art chemical entity identification tools, three semantic similarity measures and two text window sizes. The method was able to increase precision without filtering a significant number of correctly identified entities. This means that the method can effectively discriminate the correctly identified chemical entities while discarding a significant number of identification errors. For example, selecting a validation set with 75% of all identified entities, we were able to increase precision by 28% for one of the chemical entity identification tools (Whatizit), maintaining in that subset 97% of the correctly identified entities. Our method can be directly used as an add-on by any state-of-the-art entity identification tool that provides mappings to a database, in order to improve their results. The proposed method is included in a freely accessible web tool at www.lasige.di.fc.ul.pt/webtools/ice/. PMID:23658791
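The validation idea reduces to a simple computation: each entity's score is its average similarity to the other entities in the same text, and a threshold splits validated entities from outliers. The similarity table below is invented; the paper derives similarities from the ChEBI ontology:

```python
# Sketch of the validation scoring: average pairwise similarity, then a
# threshold split. Similarity values are toy numbers, not ChEBI-derived.

SIM = {  # symmetric toy similarities between identified "entities"
    frozenset({"glucose", "fructose"}): 0.9,
    frozenset({"glucose", "ethanol"}): 0.4,
    frozenset({"fructose", "ethanol"}): 0.4,
    frozenset({"glucose", "protein"}): 0.05,   # likely identification error
    frozenset({"fructose", "protein"}): 0.05,
    frozenset({"ethanol", "protein"}): 0.05,
}

def validation_score(entity, entities):
    """Average semantic similarity to the other entities in the text."""
    others = [e for e in entities if e != entity]
    return sum(SIM[frozenset({entity, e})] for e in others) / len(others)

def split(entities, threshold):
    """Partition entities into (validated, outliers) by score threshold."""
    scores = {e: validation_score(e, entities) for e in entities}
    validated = {e for e, s in scores.items() if s >= threshold}
    return validated, set(entities) - validated

found = ["glucose", "fructose", "ethanol", "protein"]
validated, outliers = split(found, threshold=0.2)
print(sorted(validated), sorted(outliers))
```

The intuition is that genuine chemical mentions in one text tend to be ontologically close, so a spurious hit ("protein" here) gets a low average similarity and falls below the threshold.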

  2. Discovering semantic features in the literature: a foundation for building functional associations

    PubMed Central

    Chagoyen, Monica; Carmona-Saez, Pedro; Shatkay, Hagit; Carazo, Jose M; Pascual-Montano, Alberto

    2006-01-01

    Background Experimental techniques such as DNA microarray, serial analysis of gene expression (SAGE) and mass spectrometry proteomics, among others, are generating large amounts of data related to genes and proteins at different levels. As in any other experimental approach, it is necessary to analyze these data in the context of previously known information about the biological entities under study. The literature is a particularly valuable source of information for experiment validation and interpretation. Therefore, the development of automated text mining tools to assist in such interpretation is one of the main challenges in current bioinformatics research. Results We present a method to create literature profiles for large sets of genes or proteins based on common semantic features extracted from a corpus of relevant documents. These profiles can be used to establish pair-wise similarities among genes, utilized in gene/protein classification, or can even be combined with experimental measurements. Semantic features can be used by researchers to facilitate the understanding of the commonalities indicated by experimental results. Our approach is based on non-negative matrix factorization (NMF), a machine-learning algorithm for data analysis, capable of identifying local patterns that characterize a subset of the data. The literature is thus used to establish putative relationships among subsets of genes or proteins and to provide coherent justification for this clustering into subsets. We demonstrate the utility of the method by applying it to two independent and vastly different sets of genes. Conclusion The presented method can create literature profiles from documents relevant to sets of genes. The representation of genes as additive linear combinations of semantic features allows for the exploration of functional associations as well as for clustering, suggesting a valuable methodology for the validation and interpretation of high-throughput experimental data. PMID:16438716
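The NMF step behind these literature profiles can be sketched on a toy gene-by-term matrix: factor a nonnegative matrix V into W (gene loadings) and H (semantic features) with the standard Lee-Seung multiplicative updates. The matrix and its labels are invented; real pipelines run an optimised solver on large document-term matrices:

```python
# Toy NMF via multiplicative updates, pure Python. V is a tiny nonnegative
# "gene x term" matrix; W holds gene loadings, H the semantic features.
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def nmf(V, rank, iters=500, eps=1e-9):
    random.seed(0)
    n, m = len(V), len(V[0])
    W = [[random.random() for _ in range(rank)] for _ in range(n)]
    H = [[random.random() for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        Wt = transpose(W)                       # update H, then W
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return W, H

def error(V, W, H):
    R = matmul(W, H)
    return sum((V[i][j] - R[i][j]) ** 2
               for i in range(len(V)) for j in range(len(V[0])))

V = [[5, 4, 0, 0],   # gene A: apoptosis-related terms (invented)
     [5, 4, 0, 0],   # gene B: same literature profile as A
     [0, 0, 5, 4]]   # gene C: metabolism-related terms (invented)
W, H = nmf(V, rank=2)
print(round(error(V, W, H), 3))
```

Because genes A and B share a term profile, their rows of W load on the same semantic feature, which is exactly the "additive linear combination" representation the conclusion describes.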

  3. Closed-Loop Lifecycle Management of Service and Product in the Internet of Things: Semantic Framework for Knowledge Integration.

    PubMed

    Yoo, Min-Jung; Grozel, Clément; Kiritsis, Dimitris

    2016-07-08

    This paper describes our conceptual framework of closed-loop lifecycle information sharing for product-service in the Internet of Things (IoT). The framework is based on the ontology model of product-service and on a pair of IoT messaging standards, Open Messaging Interface (O-MI) and Open Data Format (O-DF), which ensure data communication. (1) BACKGROUND: Starting from an existing product lifecycle management (PLM) methodology, we enhanced the ontology model to efficiently integrate the newly developed product-service ontology model; (2) METHODS: The IoT message transfer layer is vertically integrated into a semantic knowledge framework inside which a Semantic Info-Node Agent (SINA) uses the message format as a common protocol for product-service lifecycle data transfer; (3) RESULTS: The product-service ontology model facilitates information retrieval and knowledge extraction during the product lifecycle, while making more information available for service business creation. The vertical integration of IoT message transfer, encompassing all semantic layers, helps achieve a more flexible and modular approach to knowledge sharing in an IoT environment; (4) CONTRIBUTION: Semantic data annotation applied to IoT can enrich the collected data types, which entails richer knowledge extraction. The ontology-based PLM model also enables the horizontal integration of heterogeneous PLM data while breaking traditional vertical information silos; (5) CONCLUSION: The framework was applied, for demonstration, to a fictive case study of an electric car service. To show the feasibility of the approach, the semantic model is implemented in Sesame APIs, which play the role of an Internet-connected Resource Description Framework (RDF) database.

  4. Closed-Loop Lifecycle Management of Service and Product in the Internet of Things: Semantic Framework for Knowledge Integration

    PubMed Central

    Yoo, Min-Jung; Grozel, Clément; Kiritsis, Dimitris

    2016-01-01

    This paper describes our conceptual framework of closed-loop lifecycle information sharing for product-service in the Internet of Things (IoT). The framework is based on the ontology model of product-service and on a pair of IoT messaging standards, Open Messaging Interface (O-MI) and Open Data Format (O-DF), which ensure data communication. (1) Background: Starting from an existing product lifecycle management (PLM) methodology, we enhanced the ontology model to efficiently integrate the newly developed product-service ontology model; (2) Methods: The IoT message transfer layer is vertically integrated into a semantic knowledge framework inside which a Semantic Info-Node Agent (SINA) uses the message format as a common protocol for product-service lifecycle data transfer; (3) Results: The product-service ontology model facilitates information retrieval and knowledge extraction during the product lifecycle, while making more information available for service business creation. The vertical integration of IoT message transfer, encompassing all semantic layers, helps achieve a more flexible and modular approach to knowledge sharing in an IoT environment; (4) Contribution: Semantic data annotation applied to IoT can enrich the collected data types, which entails richer knowledge extraction. The ontology-based PLM model also enables the horizontal integration of heterogeneous PLM data while breaking traditional vertical information silos; (5) Conclusion: The framework was applied, for demonstration, to a fictive case study of an electric car service. To show the feasibility of the approach, the semantic model is implemented in Sesame APIs, which play the role of an Internet-connected Resource Description Framework (RDF) database. PMID:27399717

  5. Insights from child development on the relationship between episodic and semantic memory.

    PubMed

    Robertson, Erin K; Köhler, Stefan

    2007-11-05

    The present study was motivated by a recent controversy in the neuropsychological literature on semantic dementia as to whether episodic encoding requires semantic processing or whether it can proceed solely based on perceptual processing. We addressed this issue by examining the effect of age-related limitations in semantic competency on episodic memory in 4-6-year-old children (n=67). We administered three different forced-choice recognition memory tests for pictures previously encountered in a single study episode. The tests varied in the degree to which access to semantically encoded information was required at retrieval. Semantic competency predicted recognition performance regardless of whether access to semantic information was required. A direct relation between picture naming at encoding and subsequent recognition was also found for all tests. Our findings emphasize the importance of semantic encoding processes even in retrieval situations that purportedly do not require access to semantic information. They also highlight the importance of testing neuropsychological models of memory in different populations, healthy and brain damaged, at both ends of the developmental continuum.

  6. The effects of associative and semantic priming in the lexical decision task.

    PubMed

    Perea, Manuel; Rosa, Eva

    2002-08-01

    Four lexical decision experiments were conducted to examine under which conditions automatic semantic priming effects can be obtained. Experiments 1 and 2 analyzed associative/semantic effects at several very short stimulus-onset asynchronies (SOAs), whereas Experiments 3 and 4 used a single-presentation paradigm at two response-stimulus intervals (RSIs). Experiment 1 tested associatively related pairs from three semantic categories (synonyms, antonyms, and category coordinates). The results showed reliable associative priming effects at all SOAs. In addition, the correlation between associative strength and magnitude of priming was significant only at the shortest SOA (66 ms). When prime-target pairs were semantically but not associatively related (Experiment 2), reliable priming effects were obtained at SOAs of 83 ms and longer. Using the single-presentation paradigm with a short RSI (200 ms, Experiment 3), the priming effect was equal in size for associative + semantic and for semantic-only pairs (a 21-ms effect). When the RSI was set much longer (1,750 ms, Experiment 4), only the associative + semantic pairs showed a reliable priming effect (23 ms). The results are interpreted in the context of models of semantic memory.

  7. Cross border semantic interoperability for clinical research: the EHR4CR semantic resources and services

    PubMed Central

    Daniel, Christel; Ouagne, David; Sadou, Eric; Forsberg, Kerstin; McGilchrist, Mark; Zapletal, Eric; Paris, Nicolas; Hussain, Sajjad; Jaulent, Marie-Christine; Kalra, Dipka

    2016-01-01

    With the development of platforms enabling the use of routinely collected clinical data in the context of international clinical research, scalable solutions for cross border semantic interoperability need to be developed. Within the context of the IMI EHR4CR project, we first defined the requirements and evaluation criteria of the EHR4CR semantic interoperability platform and then developed the semantic resources, supportive services and tooling to assist hospital sites in standardizing their data for the execution of the project use cases. The experience gained from evaluating the EHR4CR platform, accessing semantically equivalent data elements across 11 participating European EHR systems from 5 countries, demonstrated how far the mediation model and mapping efforts met the expected requirements of the project. Developers of semantic interoperability platforms are beginning to address a core set of requirements in order to reach the goal of cross border semantic integration of data. PMID:27570649

  8. Knowledge of the human body: a distinct semantic domain.

    PubMed

    Coslett, H Branch; Saffran, Eleanor M; Schwoebel, John

    2002-08-13

    Patients with selective deficits in the naming and comprehension of animals, plants, and artifacts have been reported. These descriptions of specific semantic category deficits have contributed substantially to the understanding of the architecture of semantic representations. This study sought to further understanding of the organization of the semantic system by demonstrating that another semantic category, knowledge of the human body, may be selectively preserved. The performance of a patient with semantic dementia was compared with the performance of healthy controls on a variety of tasks assessing distinct types of body representations, including the body schema, body image, and body structural description. Despite substantial deficits on tasks involving language and knowledge of the world generally, the patient performed normally on all tests of body knowledge except body part naming; even in this naming task, however, her performance with body parts was significantly better than on artifacts. The demonstration that body knowledge may be preserved despite substantial semantic deficits involving other types of semantic information argues that body knowledge is a distinct and dissociable semantic category. These data are interpreted as support for a model of semantics that proposes that knowledge is distributed across different cortical regions reflecting the manner in which the information was acquired.

  9. Triangulation of the neurocomputational architecture underpinning reading aloud

    PubMed Central

    Hoffman, Paul; Lambon Ralph, Matthew A.; Woollams, Anna M.

    2015-01-01

    The goal of cognitive neuroscience is to integrate cognitive models with knowledge about underlying neural machinery. This significant challenge was explored in relation to word reading, where sophisticated computational-cognitive models exist but have made limited contact with neural data. Using distortion-corrected functional MRI and dynamic causal modeling, we investigated the interactions between brain regions dedicated to orthographic, semantic, and phonological processing while participants read words aloud. We found that the lateral anterior temporal lobe exhibited increased activation when participants read words with irregular spellings. This area is implicated in semantic processing but has not previously been considered part of the reading network. We also found meaningful individual differences in the activation of this region: Activity was predicted by an independent measure of the degree to which participants use semantic knowledge to read. These characteristics are predicted by the connectionist Triangle Model of reading and indicate a key role for semantic knowledge in reading aloud. Premotor regions associated with phonological processing displayed the reverse characteristics. Changes in the functional connectivity of the reading network during irregular word reading also were consistent with semantic recruitment. These data support the view that reading aloud is underpinned by the joint operation of two neural pathways. They reveal that (i) the ATL is an important element of the ventral semantic pathway and (ii) the division of labor between the two routes varies according to both the properties of the words being read and individual differences in the degree to which participants rely on each route. PMID:26124121

  10. Knowledge represented using RDF semantic network in the concept of semantic web

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lukasova, A., E-mail: alena.lukasova@osu.cz; Vajgl, M., E-mail: marek.vajgl@osu.cz; Zacek, M., E-mail: martin.zacek@osu.cz

    The RDF(S) model has been declared the basic model for capturing knowledge on the semantic web. It provides a common and flexible way to decompose composite knowledge into elementary statements, which can be represented by RDF triples or by RDF graph vectors. From the logical point of view, elements of knowledge can be expressed using at most binary predicates, which can be converted to RDF triples or graph vectors. However, RDF alone is not able to capture implicit knowledge representable by logical formulas. This contribution shows how existing approaches (semantic networks and clausal form logic) can be combined with RDF to obtain an RDF-compatible system with the ability to represent implicit knowledge and to perform inference over a knowledge base.
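The combination this record aims at can be sketched as elementary statements stored as RDF triples plus clausal rules applied on top of them by naive forward chaining. The two rules below are the usual RDFS subclass semantics; the vocabulary and individuals are illustrative:

```python
# Sketch: explicit facts as RDF triples; implicit knowledge derived by
# forward-chaining two clausal rules:
#   (X subClassOf Y) & (Y subClassOf Z) -> (X subClassOf Z)
#   (a type X)       & (X subClassOf Y) -> (a type Y)

facts = {
    ("Dog", "rdfs:subClassOf", "Mammal"),
    ("Mammal", "rdfs:subClassOf", "Animal"),
    ("rex", "rdf:type", "Dog"),
}

def forward_chain(kb):
    """Apply the two rules until no new triples appear; return the closure."""
    kb = set(kb)
    while True:
        derived = set()
        for s1, p1, o1 in kb:
            for s2, p2, o2 in kb:
                if p1 == p2 == "rdfs:subClassOf" and o1 == s2:
                    derived.add((s1, "rdfs:subClassOf", o2))
                if p1 == "rdf:type" and p2 == "rdfs:subClassOf" and o1 == s2:
                    derived.add((s1, "rdf:type", o2))
        if derived <= kb:
            return kb
        kb |= derived

closure = forward_chain(facts)
print(("rex", "rdf:type", "Animal") in closure)  # True
```

The derived triple never appears in the explicit statements; it is exactly the kind of implicit knowledge that RDF storage alone cannot express but a rule layer can recover.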

  11. Semantic acquisition without memories: evidence from transient global amnesia.

    PubMed

    Guillery, B; Desgranges, B; Katis, S; de la Sayette, V; Viader, F; Eustache, F

    2001-12-04

    Transient global amnesia (TGA), characterised by a profound anterograde amnesia, is a model of interest to study the acquisition of novel meanings independent of episodic functioning. Three patients were tested during a TGA attack, two in the early recovery phase and the third during the acute phase of TGA, with a semantic priming task involving a restructuring process of conceptual knowledge. During TGA, all patients demonstrated priming effects. Results obtained the day after the episode with the same task showed that these effects persisted at least one day. Episodic memory seems not to be critical for the formation of novel connections among unrelated semantic representations, in accordance with Tulving's model of memory, i.e. episodic memory is not necessary for the acquisition of semantic information.

  12. A neotropical Miocene pollen database employing image-based search and semantic modeling

    PubMed Central

    Han, Jing Ginger; Cao, Hongfei; Barb, Adrian; Punyasena, Surangi W.; Jaramillo, Carlos; Shyu, Chi-Ren

    2014-01-01

    • Premise of the study: Digital microscopic pollen images are being generated with increasing speed and volume, producing opportunities to develop new computational methods that increase the consistency and efficiency of pollen analysis and provide the palynological community a computational framework for information sharing and knowledge transfer. • Methods: Mathematical methods were used to assign trait semantics (abstract morphological representations) of the images of neotropical Miocene pollen and spores. Advanced database-indexing structures were built to compare and retrieve similar images based on their visual content. A Web-based system was developed to provide novel tools for automatic trait semantic annotation and image retrieval by trait semantics and visual content. • Results: Mathematical models that map visual features to trait semantics can be used to annotate images with morphology semantics and to search image databases with improved reliability and productivity. Images can also be searched by visual content, providing users with customized emphases on traits such as color, shape, and texture. • Discussion: Content- and semantic-based image searches provide a powerful computational platform for pollen and spore identification. The infrastructure outlined provides a framework for building a community-wide palynological resource, streamlining the process of manual identification, analysis, and species discovery. PMID:25202648
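    Visual-content search of the kind described can be sketched as ranking feature vectors by cosine similarity. The feature values and image names below are hypothetical; a production system would use the indexing structures the abstract mentions rather than a linear scan.

```python
# Sketch of content-based retrieval: rank images by cosine similarity of
# their (hypothetical) visual feature vectors to a query vector.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Tiny illustrative database: image id -> (shape, texture, color) features.
db = {
    "pollen_A": (0.9, 0.1, 0.3),
    "pollen_B": (0.2, 0.8, 0.7),
    "pollen_C": (0.85, 0.15, 0.35),
}

def retrieve(query, k=2):
    ranked = sorted(db, key=lambda img: cosine(db[img], query), reverse=True)
    return ranked[:k]

print(retrieve((0.9, 0.1, 0.3)))  # -> ['pollen_A', 'pollen_C']
```

    Weighting the query's shape, texture, and color components differently would give the customized trait emphases the abstract describes.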

  13. Semantic Segmentation of Indoor Point Clouds Using Convolutional Neural Network

    NASA Astrophysics Data System (ADS)

    Babacan, K.; Chen, L.; Sohn, G.

    2017-11-01

    As Building Information Modelling (BIM) thrives, geometry alone is no longer sufficient; an ever-increasing variety of semantic information is needed to express an indoor model adequately. For existing buildings, however, automatically generating semantically enriched BIM from point cloud data is in its infancy. Previous research to enhance semantic content relies on frameworks in which specific rules and/or features are hand-coded by specialists. These methods inherently lack generalization and easily break in different circumstances. On this account, a generalized framework is urgently needed to automatically and accurately generate semantic information. We therefore propose to employ deep learning techniques for the semantic segmentation of point clouds into meaningful parts. More specifically, we build a volumetric data representation in order to efficiently generate the large number of training samples needed to train a convolutional neural network architecture. Feedforward propagation is used to perform classification at the voxel level, achieving semantic segmentation. The method is tested both on a mobile laser scanner point cloud and on larger-scale synthetically generated data. We also demonstrate a case study in which our method can be effectively used to leverage the extraction of planar surfaces in challenging, cluttered indoor environments.
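    The volumetric representation step can be sketched as binning points into an occupancy grid. The voxel size and grid dimensions below are illustrative assumptions, not the paper's actual parameters.

```python
# Sketch: map an unordered (x, y, z) point cloud into a binary occupancy
# voxel grid of the kind a volumetric CNN consumes. Parameters are illustrative.

def voxelize(points, voxel_size, grid_dim):
    """Bin points into a sparse dict of occupied voxel indices."""
    grid = {}
    for x, y, z in points:
        i = int(x // voxel_size)
        j = int(y // voxel_size)
        k = int(z // voxel_size)
        if 0 <= i < grid_dim and 0 <= j < grid_dim and 0 <= k < grid_dim:
            grid[(i, j, k)] = 1  # binary occupancy
    return grid

points = [(0.1, 0.2, 0.05), (0.12, 0.21, 0.07), (0.9, 0.9, 0.9)]
grid = voxelize(points, voxel_size=0.25, grid_dim=4)
print(sorted(grid))  # nearby points collapse into one occupied voxel
```

    The CNN then classifies each occupied voxel, which is what makes the per-voxel semantic segmentation possible.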

  14. The structure of semantic person memory: evidence from semantic priming in person recognition.

    PubMed

    Wiese, Holger

    2011-11-01

    This paper reviews research on the structure of semantic person memory as examined with semantic priming. In this experimental paradigm, a familiarity decision on a target face or written name is usually faster when it is preceded by a related as compared to an unrelated prime. This effect has been shown to be relatively short lived and susceptible to interfering items. Moreover, semantic priming can cross stimulus domains, such that a written name can prime a target face and vice versa. However, it remains controversial whether representations of people are stored in associative networks based on co-occurrence, or in more abstract semantic categories. In line with prominent cognitive models of face recognition, which explain semantic priming by shared semantic information between prime and target, recent research demonstrated that priming could be obtained from purely categorically related, non-associated prime/target pairs. Although strategic processes, such as expectancy and retrospective matching likely contribute, there is also evidence for a non-strategic contribution to priming, presumably related to spreading activation. Finally, a semantic priming effect has been demonstrated in the N400 event-related potential (ERP) component, which may reflect facilitated access to semantic information. It is concluded that categorical relatedness is one organizing principle of semantic person memory. ©2011 The British Psychological Society.

  15. Neural Substrates of Processing Anger in Language: Contributions of Prosody and Semantics.

    PubMed

    Castelluccio, Brian C; Myers, Emily B; Schuh, Jillian M; Eigsti, Inge-Marie

    2016-12-01

    Emotions are conveyed primarily through two channels in language: semantics and prosody. While many studies confirm the role of a left hemisphere network in processing semantic emotion, there has been debate over the role of the right hemisphere in processing prosodic emotion. Some evidence suggests a preferential role for the right hemisphere, and other evidence supports a bilateral model. The relative contributions of semantics and prosody to the overall processing of affect in language are largely unexplored. The present work used functional magnetic resonance imaging to elucidate the neural bases of processing anger conveyed by prosody or semantic content. Results showed a robust, distributed, bilateral network for processing angry prosody and a more modest left hemisphere network for processing angry semantics when compared to emotionally neutral stimuli. Findings suggest the nervous system may be more responsive to prosodic cues in speech than to the semantic content of speech.

  16. Hemispheric specialization in selective attention and short-term memory: a fine-coarse model of left- and right-ear disadvantages

    PubMed Central

    Marsh, John E.; Pilgrim, Lea K.; Sörqvist, Patrik

    2013-01-01

    Serial short-term memory is impaired by irrelevant sound, particularly when the sound changes acoustically. This acoustic effect is larger when the sound is presented to the left compared to the right ear (a left-ear disadvantage). Serial memory appears relatively insensitive to distraction from the semantic properties of a background sound. In contrast, short-term free recall of semantic-category exemplars is impaired by the semantic properties of background speech and is relatively insensitive to the sound's acoustic properties. This semantic effect is larger when the sound is presented to the right compared to the left ear (a right-ear disadvantage). In this paper, we outline a speculative neurocognitive fine-coarse model of these hemispheric differences in relation to short-term memory and selective attention, and explicate empirical directions in which this model can be critically evaluated. PMID:24399988

  17. Integration of nursing assessment concepts into the medical entities dictionary using the LOINC semantic structure as a terminology model.

    PubMed

    Cieslowski, B J; Wajngurt, D; Cimino, J J; Bakken, S

    2001-01-01

    Recent investigations have tested the applicability of various terminology models for representing nursing concepts, including those related to nursing diagnoses, nursing interventions, and standardized nursing assessments, as a prerequisite for building a reference terminology that supports the nursing domain. We used the semantic structure of Clinical LOINC (Logical Observations, Identifiers, Names, and Codes) as a reference terminology model to support the integration of standardized assessment terms from two nursing terminologies into the Medical Entities Dictionary (MED), the concept-oriented, metadata dictionary at New York Presbyterian Hospital. Although the LOINC semantic structure was used previously to represent laboratory terms in the MED, selected hierarchies and semantic slots required revisions in order to incorporate the nursing assessment concepts. This project was an initial step in integrating nursing assessment concepts into the MED in a manner consistent with evolving standards for reference terminology models. Moreover, the revisions provide the foundation for adding other types of standardized assessments to the MED.

  18. Integration of nursing assessment concepts into the medical entities dictionary using the LOINC semantic structure as a terminology model.

    PubMed Central

    Cieslowski, B. J.; Wajngurt, D.; Cimino, J. J.; Bakken, S.

    2001-01-01

    Recent investigations have tested the applicability of various terminology models for representing nursing concepts, including those related to nursing diagnoses, nursing interventions, and standardized nursing assessments, as a prerequisite for building a reference terminology that supports the nursing domain. We used the semantic structure of Clinical LOINC (Logical Observations, Identifiers, Names, and Codes) as a reference terminology model to support the integration of standardized assessment terms from two nursing terminologies into the Medical Entities Dictionary (MED), the concept-oriented, metadata dictionary at New York Presbyterian Hospital. Although the LOINC semantic structure was used previously to represent laboratory terms in the MED, selected hierarchies and semantic slots required revisions in order to incorporate the nursing assessment concepts. This project was an initial step in integrating nursing assessment concepts into the MED in a manner consistent with evolving standards for reference terminology models. Moreover, the revisions provide the foundation for adding other types of standardized assessments to the MED. PMID:11825165

  19. The dark side of incremental learning: a model of cumulative semantic interference during lexical access in speech production.

    PubMed

    Oppenheim, Gary M; Dell, Gary S; Schwartz, Myrna F

    2010-02-01

    Naming a picture of a dog primes the subsequent naming of a picture of a dog (repetition priming) and interferes with the subsequent naming of a picture of a cat (semantic interference). Behavioral studies suggest that these effects derive from persistent changes in the way that words are activated and selected for production, and some have claimed that the findings are only understandable by positing a competitive mechanism for lexical selection. We present a simple model of lexical retrieval in speech production that applies error-driven learning to its lexical activation network. This model naturally produces repetition priming and semantic interference effects. It predicts the major findings from several published experiments, demonstrating that these effects may arise from incremental learning. Furthermore, analysis of the model suggests that competition during lexical selection is not necessary for semantic interference if the learning process is itself competitive. Copyright 2009 Elsevier B.V. All rights reserved.
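    The error-driven learning idea can be illustrated with a speculative toy network, not the published model: semantic features activate word nodes, and a delta-rule update after each naming trial strengthens feature-to-target weights while weakening feature-to-competitor weights. All feature names, weights, and the learning rate are illustrative assumptions.

```python
# Toy sketch of error-driven ("dark side") learning: naming "dog" weakens
# "cat" through their shared semantic feature, producing semantic
# interference without competitive selection. Parameters are illustrative.

features = ["animal", "barks", "meows"]
words = ["dog", "cat"]
semantics = {"dog": {"animal", "barks"}, "cat": {"animal", "meows"}}

# feature -> word connection weights, uniform starting point
w = {(f, word): 0.5 for f in features for word in words}

def activation(word, cue_features):
    return sum(w[(f, word)] for f in cue_features)

def name_picture(target, rate=0.1):
    cues = semantics[target]
    for word in words:
        desired = 1.0 if word == target else 0.0
        err = desired - activation(word, cues)
        for f in cues:
            w[(f, word)] += rate * err  # delta rule on the cued features

base_cat = activation("cat", semantics["cat"])
name_picture("dog")          # naming "dog"...
after_cat = activation("cat", semantics["cat"])
print(after_cat < base_cat)  # ...lowers "cat" via the shared "animal" feature
```

    Repeating `name_picture("dog")` would also raise the activation of "dog" itself, giving the repetition-priming side of the same incremental-learning mechanism.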

  20. Semantics-Based Composition of Integrated Cardiomyocyte Models Motivated by Real-World Use Cases.

    PubMed

    Neal, Maxwell L; Carlson, Brian E; Thompson, Christopher T; James, Ryan C; Kim, Karam G; Tran, Kenneth; Crampin, Edmund J; Cook, Daniel L; Gennari, John H

    2015-01-01

    Semantics-based model composition is an approach for generating complex biosimulation models from existing components that relies on capturing the biological meaning of model elements in a machine-readable fashion. This approach allows the user to work at the biological rather than computational level of abstraction and helps minimize the amount of manual effort required for model composition. To support this compositional approach, we have developed the SemGen software, and here report on SemGen's semantics-based merging capabilities using real-world modeling use cases. We successfully reproduced a large, manually-encoded, multi-model merge: the "Pandit-Hinch-Niederer" (PHN) cardiomyocyte excitation-contraction model, previously developed using CellML. We describe our approach for annotating the three component models used in the PHN composition and for merging them at the biological level of abstraction within SemGen. We demonstrate that we were able to reproduce the original PHN model results in a semi-automated, semantics-based fashion and also rapidly generate a second, novel cardiomyocyte model composed using an alternative, independently-developed tension generation component. We discuss the time-saving features of our compositional approach in the context of these merging exercises, the limitations we encountered, and potential solutions for enhancing the approach.

  1. Semantics-Based Composition of Integrated Cardiomyocyte Models Motivated by Real-World Use Cases

    PubMed Central

    Neal, Maxwell L.; Carlson, Brian E.; Thompson, Christopher T.; James, Ryan C.; Kim, Karam G.; Tran, Kenneth; Crampin, Edmund J.; Cook, Daniel L.; Gennari, John H.

    2015-01-01

    Semantics-based model composition is an approach for generating complex biosimulation models from existing components that relies on capturing the biological meaning of model elements in a machine-readable fashion. This approach allows the user to work at the biological rather than computational level of abstraction and helps minimize the amount of manual effort required for model composition. To support this compositional approach, we have developed the SemGen software, and here report on SemGen’s semantics-based merging capabilities using real-world modeling use cases. We successfully reproduced a large, manually-encoded, multi-model merge: the “Pandit-Hinch-Niederer” (PHN) cardiomyocyte excitation-contraction model, previously developed using CellML. We describe our approach for annotating the three component models used in the PHN composition and for merging them at the biological level of abstraction within SemGen. We demonstrate that we were able to reproduce the original PHN model results in a semi-automated, semantics-based fashion and also rapidly generate a second, novel cardiomyocyte model composed using an alternative, independently-developed tension generation component. We discuss the time-saving features of our compositional approach in the context of these merging exercises, the limitations we encountered, and potential solutions for enhancing the approach. PMID:26716837

  2. Semantic and Phonological Loop Effects on Verbal Working Memory in Middle-Age Adults with Mental Retardation

    ERIC Educational Resources Information Center

    Kittler, Phyllis; Krinsky-McHale, Sharon J.; Devenny, Darlynne A.

    2004-01-01

    Semantic and phonological loop effects on verbal working memory were examined among middle-age adults with Down syndrome and those with unspecified mental retardation in the context of Baddeley's working memory model. Recall was poorer for phonologically similar, semantically similar, and long words compared to recall of dissimilar short words.…

  3. The Organization and Dissolution of Semantic-Conceptual Knowledge: Is the "Amodal Hub" the Only Plausible Model?

    ERIC Educational Resources Information Center

    Gainotti, Guido

    2011-01-01

    In recent years, the anatomical and functional bases of conceptual activity have attracted a growing interest. In particular, Patterson and Lambon-Ralph have proposed the existence, in the anterior parts of the temporal lobes, of a mechanism (the "amodal semantic hub") supporting the interactive activation of semantic representations in all…

  4. The Semantic Mapping of Archival Metadata to the CIDOC CRM Ontology

    ERIC Educational Resources Information Center

    Bountouri, Lina; Gergatsoulis, Manolis

    2011-01-01

    In this article we analyze the main semantics of archival description, expressed through Encoded Archival Description (EAD). Our main target is to map the semantics of EAD to the CIDOC Conceptual Reference Model (CIDOC CRM) ontology as part of a wider integration architecture of cultural heritage metadata. Through this analysis, it is concluded…

  5. Masked Associative/Semantic Priming Effects across Languages with Highly Proficient Bilinguals

    ERIC Educational Resources Information Center

    Perea, Manuel; Dunabeitia, Jon Andoni; Carreiras, Manuel

    2008-01-01

    One key issue for models of bilingual memory is to what degree the semantic representation from one of the languages is shared with the other language. In the present paper, we examine whether there is an early, automatic semantic priming effect across languages for noncognates with highly proficient (Basque/Spanish) bilinguals. Experiment 1 was a…

  6. A Neurocomputational Model of the N400 and the P600 in Language Processing

    ERIC Educational Resources Information Center

    Brouwer, Harm; Crocker, Matthew W.; Venhuizen, Noortje J.; Hoeks, John C. J.

    2017-01-01

    Ten years ago, researchers using event-related brain potentials (ERPs) to study language comprehension were puzzled by what looked like a "Semantic Illusion": Semantically anomalous, but structurally well-formed sentences did not affect the N400 component--traditionally taken to reflect semantic integration--but instead produced a P600…

  7. Maintenance and Generalization Effects of Semantic and Phonological Treatments of Anomia: A Case Study

    ERIC Educational Resources Information Center

    Macoir, Joel; Routhier, Sonia; Simard, Anne; Picard, Josee

    2012-01-01

    Anomia is one of the most frequent manifestations in aphasia. Model-based treatments for anomia usually focus on semantic and/or phonological levels of processing. This study reports treatment of anomia in an individual with chronic aphasia. After baseline testing, she received a training program in which semantic and phonological treatments were…

  8. Designing Collaborative E-Learning Environments Based upon Semantic Wiki: From Design Models to Application Scenarios

    ERIC Educational Resources Information Center

    Li, Yanyan; Dong, Mingkai; Huang, Ronghuai

    2011-01-01

    The knowledge society requires life-long learning and flexible learning environment that enables fast, just-in-time and relevant learning, aiding the development of communities of knowledge, linking learners and practitioners with experts. Based upon semantic wiki, a combination of wiki and Semantic Web technology, this paper designs and develops…

  9. Emotional modelling and classification of a large-scale collection of scene images in a cluster environment

    PubMed Central

    Li, Yanfei; Tian, Yun

    2018-01-01

    The development of network technology and the popularization of image capturing devices have led to a rapid increase in the number of digital images available, and it is becoming increasingly difficult to identify a desired image from among the massive number of possible images. Images usually contain rich semantic information, and people usually understand images at a high semantic level. Therefore, achieving the ability to use advanced technology to identify the emotional semantics contained in images to enable emotional semantic image classification remains an urgent issue in various industries. To this end, this study proposes an improved OCC emotion model that integrates personality and mood factors for emotional modelling to describe the emotional semantic information contained in an image. The proposed classification system integrates the k-Nearest Neighbour (KNN) algorithm with the Support Vector Machine (SVM) algorithm. The MapReduce parallel programming model was used to adapt the KNN-SVM algorithm for parallel implementation in the Hadoop cluster environment, thereby achieving emotional semantic understanding for the classification of a massive collection of images. For training and testing, 70,000 scene images were randomly selected from the SUN Database. The experimental results indicate that users with different personalities show overall consistency in their emotional understanding of the same image. For a training sample size of 50,000, the classification accuracies for different emotional categories targeted at users with different personalities were approximately 95%, and the training time was only 1/5 of that required for the corresponding algorithm with a single-node architecture. Furthermore, the speedup of the system also showed a linearly increasing tendency. Thus, the experiments achieved a good classification effect and can lay a foundation for classification in terms of additional types of emotional image semantics, thereby demonstrating the practical significance of the proposed model. PMID:29320579
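    The abstract does not specify how KNN and SVM are combined, so the sketch below shows one common hybrid rule under stated assumptions: accept the KNN label when the k nearest neighbours agree, and otherwise defer to an SVM (stubbed here by the hypothetical `svm_predict`). The training points and labels are invented for illustration.

```python
# Hedged sketch of one possible KNN-SVM hybrid (not the paper's exact scheme):
# unanimous nearest neighbours decide directly; ambiguous cases fall back to
# a finer-grained classifier, stubbed by `svm_predict`.
from collections import Counter
import math

train = [((1.0, 1.0), "joy"), ((1.1, 0.9), "joy"), ((5.0, 5.0), "sad"),
         ((5.2, 4.8), "sad"), ((3.0, 3.0), "joy")]

def svm_predict(x):               # placeholder for a trained SVM decision rule
    return "joy" if x[0] + x[1] < 6 else "sad"

def knn_svm(x, k=3):
    nearest = sorted(train, key=lambda tv: math.dist(x, tv[0]))[:k]
    labels = [label for _, label in nearest]
    top, count = Counter(labels).most_common(1)[0]
    if count == k:                # unanimous neighbourhood: trust KNN
        return top
    return svm_predict(x)         # ambiguous: defer to the SVM

print(knn_svm((1.05, 1.0)))  # -> joy (all three neighbours agree)
```

    In a MapReduce setting, the distance computations over the training set would be distributed across mappers and the top-k selection done in the reducer, which is the parallelization the abstract attributes to Hadoop.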

  10. Emotional modelling and classification of a large-scale collection of scene images in a cluster environment.

    PubMed

    Cao, Jianfang; Li, Yanfei; Tian, Yun

    2018-01-01

    The development of network technology and the popularization of image capturing devices have led to a rapid increase in the number of digital images available, and it is becoming increasingly difficult to identify a desired image from among the massive number of possible images. Images usually contain rich semantic information, and people usually understand images at a high semantic level. Therefore, achieving the ability to use advanced technology to identify the emotional semantics contained in images to enable emotional semantic image classification remains an urgent issue in various industries. To this end, this study proposes an improved OCC emotion model that integrates personality and mood factors for emotional modelling to describe the emotional semantic information contained in an image. The proposed classification system integrates the k-Nearest Neighbour (KNN) algorithm with the Support Vector Machine (SVM) algorithm. The MapReduce parallel programming model was used to adapt the KNN-SVM algorithm for parallel implementation in the Hadoop cluster environment, thereby achieving emotional semantic understanding for the classification of a massive collection of images. For training and testing, 70,000 scene images were randomly selected from the SUN Database. The experimental results indicate that users with different personalities show overall consistency in their emotional understanding of the same image. For a training sample size of 50,000, the classification accuracies for different emotional categories targeted at users with different personalities were approximately 95%, and the training time was only 1/5 of that required for the corresponding algorithm with a single-node architecture. Furthermore, the speedup of the system also showed a linearly increasing tendency. Thus, the experiments achieved a good classification effect and can lay a foundation for classification in terms of additional types of emotional image semantics, thereby demonstrating the practical significance of the proposed model.

  11. Text mining to decipher free-response consumer complaints: insights from the NHTSA vehicle owner's complaint database.

    PubMed

    Ghazizadeh, Mahtab; McDonald, Anthony D; Lee, John D

    2014-09-01

    This study applies text mining to extract clusters of vehicle problems and associated trends from free-response data in the National Highway Traffic Safety Administration's vehicle owner's complaint database. As the automotive industry adopts new technologies, it is important to systematically assess the effect of these changes on traffic safety. Driving simulators, naturalistic driving data, and crash databases all contribute to a better understanding of how drivers respond to changing vehicle technology, but other approaches, such as automated analysis of incident reports, are needed. Free-response data from incidents representing two severity levels (fatal incidents and incidents involving injury) were analyzed using a text mining approach: latent semantic analysis (LSA). LSA and hierarchical clustering identified clusters of complaints for each severity level, which were compared and analyzed across time. Cluster analysis identified eight clusters of fatal incidents and six clusters of incidents involving injury. Comparisons showed that although the airbag clusters across the two severity levels have the same most frequent terms, the circumstances around the incidents differ. The time trends show clear increases in complaints surrounding the Ford/Firestone tire recall and the Toyota unintended acceleration recall. Increases in complaints may be partially driven by these recall announcements and the associated media attention. Text mining can reveal useful information from free-response databases that would otherwise be prohibitively time-consuming and difficult to summarize manually. Text mining can extend human analysis capabilities for large free-response databases to support earlier detection of problems and more timely safety interventions.
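    The LSA step the abstract describes amounts to a truncated SVD of a term-document matrix. The sketch below uses NumPy and a four-complaint toy corpus (not NHTSA data); the subsequent hierarchical clustering stage is omitted for brevity.

```python
# Minimal LSA sketch: count matrix -> truncated SVD -> compare complaints in
# the latent semantic space. Corpus is illustrative, not NHTSA data.
import numpy as np

docs = ["airbag failed to deploy", "airbag did not deploy in crash",
        "tire tread separated", "tire blew out on highway"]
vocab = sorted({w for d in docs for w in d.split()})
X = np.array([[d.split().count(w) for w in vocab] for d in docs], float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2                              # keep 2 latent dimensions
doc_vecs = U[:, :k] * s[:k]        # documents in LSA space

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The two airbag complaints end up closer to each other than to tire complaints,
# which is the structure hierarchical clustering would then pick up.
print(cos(doc_vecs[0], doc_vecs[1]) > cos(doc_vecs[0], doc_vecs[2]))
```

    Running hierarchical clustering on `doc_vecs` (rather than on raw counts) is what lets the analysis group complaints that share meaning but not exact wording.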

  12. Anterior temporal lobes mediate semantic representation: Mimicking semantic dementia by using rTMS in normal participants

    PubMed Central

    Pobric, Gorana; Jefferies, Elizabeth; Ralph, Matthew A. Lambon

    2007-01-01

    Studies of semantic dementia and PET neuroimaging investigations suggest that the anterior temporal lobes (ATL) are a critical substrate for semantic representation. In stark contrast, classical neurological models of comprehension do not include ATL, and likewise functional MRI studies often fail to show activations in the ATL, reinforcing the classical view. Using a novel application of low-frequency, repetitive transcranial magnetic stimulation (rTMS) over the ATL, we demonstrate that the behavioral pattern of semantic dementia can be mirrored in neurologically intact participants: Specifically, we show that temporary disruption to neural processing in the ATL produces a selective semantic impairment leading to significant slowing in both picture naming and word comprehension but not to other equally demanding, nonsemantic cognitive tasks. PMID:18056637

  13. Must analysis of meaning follow analysis of form? A time course analysis

    PubMed Central

    Feldman, Laurie B.; Milin, Petar; Cho, Kit W.; Moscoso del Prado Martín, Fermín; O’Connor, Patrick A.

    2015-01-01

    Many models of word recognition assume that processing proceeds sequentially from analysis of form to analysis of meaning. In the context of morphological processing, this implies that morphemes are processed as units of form prior to any influence of their meanings. Some interpret the apparent absence of differences in recognition latencies to targets (SNEAK) in form and semantically similar (sneaky-SNEAK) and in form similar and semantically dissimilar (sneaker-SNEAK) prime contexts at a stimulus onset asynchrony (SOA) of 48 ms as consistent with this claim. To determine the time course over which degree of semantic similarity between morphologically structured primes and their targets influences recognition in the forward masked priming variant of the lexical decision paradigm, we compared facilitation for the same targets after semantically similar and dissimilar primes across a range of SOAs (34–100 ms). The effect of shared semantics on recognition latency increased linearly with SOA when long SOAs were intermixed (Experiments 1A and 1B) and latencies were significantly faster after semantically similar than dissimilar primes at homogeneous SOAs of 48 ms (Experiment 2) and 34 ms (Experiment 3). Results limit the scope of form-then-semantics models of recognition and demonstrate that semantics influences even the very early stages of recognition. Finally, once general performance across trials has been accounted for, we fail to provide evidence for individual differences in morphological processing that can be linked to measures of reading proficiency. PMID:25852512

  14. Must analysis of meaning follow analysis of form? A time course analysis.

    PubMed

    Feldman, Laurie B; Milin, Petar; Cho, Kit W; Moscoso Del Prado Martín, Fermín; O'Connor, Patrick A

    2015-01-01

    Many models of word recognition assume that processing proceeds sequentially from analysis of form to analysis of meaning. In the context of morphological processing, this implies that morphemes are processed as units of form prior to any influence of their meanings. Some interpret the apparent absence of differences in recognition latencies to targets (SNEAK) in form and semantically similar (sneaky-SNEAK) and in form similar and semantically dissimilar (sneaker-SNEAK) prime contexts at a stimulus onset asynchrony (SOA) of 48 ms as consistent with this claim. To determine the time course over which degree of semantic similarity between morphologically structured primes and their targets influences recognition in the forward masked priming variant of the lexical decision paradigm, we compared facilitation for the same targets after semantically similar and dissimilar primes across a range of SOAs (34-100 ms). The effect of shared semantics on recognition latency increased linearly with SOA when long SOAs were intermixed (Experiments 1A and 1B) and latencies were significantly faster after semantically similar than dissimilar primes at homogeneous SOAs of 48 ms (Experiment 2) and 34 ms (Experiment 3). Results limit the scope of form-then-semantics models of recognition and demonstrate that semantics influences even the very early stages of recognition. Finally, once general performance across trials has been accounted for, we fail to provide evidence for individual differences in morphological processing that can be linked to measures of reading proficiency.

  15. Interaction between Phonological and Semantic Representations: Time Matters

    ERIC Educational Resources Information Center

    Chen, Qi; Mirman, Daniel

    2015-01-01

    Computational modeling and eye-tracking were used to investigate how phonological and semantic information interact to influence the time course of spoken word recognition. We extended our recent models (Chen & Mirman, 2012; Mirman, Britt, & Chen, 2013) to account for new evidence that competition among phonological neighbors influences…

  16. Component Models for Semantic Web Languages

    NASA Astrophysics Data System (ADS)

    Henriksson, Jakob; Aßmann, Uwe

    Intelligent applications and agents on the Semantic Web typically need to be specified with, or interact with specifications written in, many different kinds of formal languages. Such languages include ontology languages, data and metadata query languages, as well as transformation languages. As learnt from years of experience in development of complex software systems, languages need to support some form of component-based development. Components enable higher software quality, better understanding and reusability of already developed artifacts. Any component approach contains an underlying component model, a description detailing what valid components are and how components can interact. With the multitude of languages developed for the Semantic Web, what are their underlying component models? Do we need to develop one for each language, or is a more general and reusable approach achievable? We present a language-driven component model specification approach. This means that a component model can be (automatically) generated from a given base language (actually, its specification, e.g. its grammar). As a consequence, we can provide components for different languages and simplify the development of software artifacts used on the Semantic Web.

  17. Generating Poetry Title Based on Semantic Relevance with Convolutional Neural Network

    NASA Astrophysics Data System (ADS)

    Li, Z.; Niu, K.; He, Z. Q.

    2017-09-01

    Several approaches have been proposed to automatically generate Chinese classical poetry (CCP) in the past few years, but automatically generating the title of CCP is still a difficult problem. The difficulties are mainly reflected in two aspects. First, the words used in CCP are very different from modern Chinese words and there are no valid word segmentation tools. Second, the semantic relevance of characters in CCP not only exists within one sentence but also between the same positions of adjacent sentences, which is hard for traditional text summarization models to grasp. In this paper, we propose an encoder-decoder model for generating the title of CCP. The encoder of our model is a convolutional neural network (CNN) with two kinds of filters. To capture the words commonly used within one sentence, one kind of filter covers two characters horizontally at each step. The other covers two characters vertically at each step and can grasp the semantic relevance of characters between adjacent sentences. Experimental results show that our model outperforms several related models and captures the semantic relevance of CCP more accurately.
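
    A minimal pure-Python sketch can illustrate the two filter orientations described above: one filter spans two adjacent characters within a sentence, the other spans the same character position in two adjacent sentences. The toy embedding grid, the averaging weights, and the global max pooling are assumptions of this sketch, not the paper's actual architecture.

```python
# Toy illustration of the two CNN filter orientations: rows of the grid
# are sentences, columns are character positions (scalar "embeddings").

def conv_horizontal(grid, w):
    """Filter covering two adjacent characters within one sentence."""
    return [[w[0] * row[j] + w[1] * row[j + 1] for j in range(len(row) - 1)]
            for row in grid]

def conv_vertical(grid, w):
    """Filter covering the same position in two adjacent sentences."""
    return [[w[0] * grid[i][j] + w[1] * grid[i + 1][j]
             for j in range(len(grid[i]))]
            for i in range(len(grid) - 1)]

def max_pool(feature_map):
    """Global max pooling over a 2-D feature map."""
    return max(max(row) for row in feature_map)

# 3 sentences x 4 character positions of made-up embedding values.
poem = [[0.1, 0.4, 0.2, 0.9],
        [0.3, 0.4, 0.1, 0.8],
        [0.5, 0.2, 0.6, 0.7]]

h = max_pool(conv_horizontal(poem, [0.5, 0.5]))  # within-sentence feature
v = max_pool(conv_vertical(poem, [0.5, 0.5]))    # cross-sentence feature
print(h, v)
```

    In the real model each filter would operate on multi-dimensional character embeddings and learn its weights; the sketch only shows how the two sliding directions capture within-sentence versus cross-sentence structure.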

  18. Accelerating Cancer Systems Biology Research through Semantic Web Technology

    PubMed Central

    Wang, Zhihui; Sagotsky, Jonathan; Taylor, Thomas; Shironoshita, Patrick; Deisboeck, Thomas S.

    2012-01-01

    Cancer systems biology is an interdisciplinary, rapidly expanding research field in which collaborations are a critical means to advance the field. Yet the prevalent database technologies often isolate data rather than making it easily accessible. The Semantic Web has the potential to help facilitate web-based collaborative cancer research by presenting data in a manner that is self-descriptive, human and machine readable, and easily sharable. We have created a semantically linked online Digital Model Repository (DMR) for storing, managing, executing, annotating, and sharing computational cancer models. Within the DMR, distributed, multidisciplinary, and inter-organizational teams can collaborate on projects, without forfeiting intellectual property. This is achieved by the introduction of a new stakeholder to the collaboration workflow, the institutional licensing officer, part of the Technology Transfer Office. Furthermore, the DMR has achieved silver level compatibility with the National Cancer Institute’s caBIG®, so users can interact with the DMR not only through a web browser but also through a semantically annotated and secure web service. We also discuss the technology behind the DMR leveraging the Semantic Web, ontologies, and grid computing to provide secure inter-institutional collaboration on cancer modeling projects, online grid-based execution of shared models, and the collaboration workflow protecting researchers’ intellectual property. PMID:23188758

  19. Degree versus direction: a comparison of four handedness classification schemes through the investigation of lateralised semantic priming.

    PubMed

    Kaploun, Kristen A; Abeare, Christopher A

    2010-09-01

    Four classification systems were examined using lateralised semantic priming in order to investigate whether degree or direction of handedness better captures the pattern of lateralised semantic priming. A total of 85 participants completed a lateralised semantic priming task and three handedness questionnaires. The classification systems tested were: (1) the traditional right- vs left-handed (RHs vs LHs); (2) a four-factor model of strong and weak right- and left-handers (SRHs, WRHs, SLHs, WLHs); (3) strong- vs mixed-handed (SHs vs MHs); and (4) a three-factor model of consistent left- (CLHs), inconsistent left- (ILHs), and consistent right-handers (CRHs). Mixed-factorial ANOVAs demonstrated significant visual field (VF) by handedness interactions for all but the third model. Results show that LHs, SLHs, CLHs, and ILHs responded faster to LVF targets, whereas RHs, SRHs, and CRHs responded faster to RVF targets; no significant VF by handedness interaction was found between SHs and MHs. The three-factor model better captures handedness group divergence on lateralised semantic priming by incorporating the direction of handedness as well as the degree. These findings help explain some of the variance in language lateralisation, demonstrating that direction of handedness is as important as degree. The need for greater consideration of handedness subgroups in laterality research is highlighted.

  20. MPEG-7-based description infrastructure for an audiovisual content analysis and retrieval system

    NASA Astrophysics Data System (ADS)

    Bailer, Werner; Schallauer, Peter; Hausenblas, Michael; Thallinger, Georg

    2005-01-01

    We present a case study of establishing a description infrastructure for an audiovisual content-analysis and retrieval system. The description infrastructure consists of an internal metadata model and tools for accessing it. Based on an analysis of requirements, we selected MPEG-7 from a set of candidates as the basis of our metadata model. The openness and generality of MPEG-7 allow it to be used in a broad range of applications, but they also increase complexity and hinder interoperability. Profiling has been proposed as a solution, with the focus on selecting and constraining description tools. Semantic constraints are currently described only in textual form; conformance in terms of semantics thus cannot be evaluated automatically, and mappings between different profiles can only be defined manually. As a solution, we propose an approach to formalize the semantic constraints of an MPEG-7 profile using a formal vocabulary expressed in OWL, which allows automated processing of semantic constraints. We have defined the Detailed Audiovisual Profile as the profile to be used in our metadata model and we show how some of the semantic constraints of this profile can be formulated using ontologies. To work practically with the metadata model, we have implemented an MPEG-7 library and a client/server document access infrastructure.

  1. Accelerating cancer systems biology research through Semantic Web technology.

    PubMed

    Wang, Zhihui; Sagotsky, Jonathan; Taylor, Thomas; Shironoshita, Patrick; Deisboeck, Thomas S

    2013-01-01

    Cancer systems biology is an interdisciplinary, rapidly expanding research field in which collaborations are a critical means to advance the field. Yet the prevalent database technologies often isolate data rather than making it easily accessible. The Semantic Web has the potential to help facilitate web-based collaborative cancer research by presenting data in a manner that is self-descriptive, human and machine readable, and easily sharable. We have created a semantically linked online Digital Model Repository (DMR) for storing, managing, executing, annotating, and sharing computational cancer models. Within the DMR, distributed, multidisciplinary, and inter-organizational teams can collaborate on projects, without forfeiting intellectual property. This is achieved by the introduction of a new stakeholder to the collaboration workflow, the institutional licensing officer, part of the Technology Transfer Office. Furthermore, the DMR has achieved silver level compatibility with the National Cancer Institute's caBIG, so users can interact with the DMR not only through a web browser but also through a semantically annotated and secure web service. We also discuss the technology behind the DMR leveraging the Semantic Web, ontologies, and grid computing to provide secure inter-institutional collaboration on cancer modeling projects, online grid-based execution of shared models, and the collaboration workflow protecting researchers' intellectual property. Copyright © 2012 Wiley Periodicals, Inc.

  2. Automated Predictive Big Data Analytics Using Ontology Based Semantics.

    PubMed

    Nural, Mustafa V; Cotterell, Michael E; Peng, Hao; Xie, Rui; Ma, Ping; Miller, John A

    2015-10-01

    Predictive analytics in the big data era is taking on an increasingly important role. Issues related to the choice of modeling technique, estimation procedure (or algorithm), and efficient execution can present significant challenges. For example, selection of appropriate and optimal models for big data analytics often requires careful investigation and considerable expertise which might not always be readily available. In this paper, we propose to use semantic technology to assist data analysts and data scientists in selecting appropriate modeling techniques and building specific models, as well as in documenting the rationale for the techniques and models selected. To formally describe the modeling techniques, models and results, we developed the Analytics Ontology, which supports inferencing for semi-automated model selection. The SCALATION framework, which currently supports over thirty modeling techniques for predictive big data analytics, is used as a testbed for evaluating the use of semantic technology.
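
    The kind of semi-automated model selection the Analytics Ontology supports can be sketched, in a much-simplified rule-based form, as a mapping from dataset properties to candidate techniques. The rules and technique names below are illustrative assumptions, not the contents of the actual ontology or the SCALATION technique list.

```python
# Hedged sketch of semi-automated model selection: simple rules map
# dataset properties to candidate modeling techniques with a rationale.

def suggest_techniques(response_type, n_samples, n_features):
    """Return (technique, rationale) candidates for a dataset description."""
    suggestions = []
    if response_type == "continuous":
        if n_features > n_samples:
            suggestions.append(("ridge regression",
                                "p > n: regularization needed"))
        else:
            suggestions.append(("multiple linear regression",
                                "continuous response, n > p"))
    elif response_type == "binary":
        suggestions.append(("logistic regression",
                            "binary response variable"))
    return suggestions

picks = suggest_techniques("continuous", n_samples=100, n_features=500)
print(picks[0])  # technique plus the rationale for selecting it
```

    An ontology-backed system would derive such rules by inferencing over formal descriptions of techniques rather than hard-coding them, but the input/output contract is the same.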

  3. Automated Predictive Big Data Analytics Using Ontology Based Semantics

    PubMed Central

    Nural, Mustafa V.; Cotterell, Michael E.; Peng, Hao; Xie, Rui; Ma, Ping; Miller, John A.

    2017-01-01

    Predictive analytics in the big data era is taking on an increasingly important role. Issues related to the choice of modeling technique, estimation procedure (or algorithm), and efficient execution can present significant challenges. For example, selection of appropriate and optimal models for big data analytics often requires careful investigation and considerable expertise which might not always be readily available. In this paper, we propose to use semantic technology to assist data analysts and data scientists in selecting appropriate modeling techniques and building specific models, as well as in documenting the rationale for the techniques and models selected. To formally describe the modeling techniques, models and results, we developed the Analytics Ontology, which supports inferencing for semi-automated model selection. The SCALATION framework, which currently supports over thirty modeling techniques for predictive big data analytics, is used as a testbed for evaluating the use of semantic technology. PMID:29657954

  4. Hybrid Schema Matching for Deep Web

    NASA Astrophysics Data System (ADS)

    Chen, Kerui; Zuo, Wanli; He, Fengling; Chen, Yongheng

    Schema matching is the process of identifying semantic mappings, or correspondences, between two or more schemas. It is a first and critical step in data integration. For schema matching of the deep web, most research considers only the query interface, while rarely paying attention to the abundant schema information contained in query result pages. This paper proposes a hybrid schema matching technique, which combines attributes that appear in the query interfaces and query results of different data sources, and mines the matched schemas from them. Experimental results demonstrate the effectiveness of this method in improving the accuracy of schema matching.
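
    One simple way to combine the two kinds of evidence the paper describes, attribute labels from query interfaces and attribute headers mined from result pages, is token-overlap matching. The Jaccard scoring and the 0.5 threshold below are assumptions of this sketch, not the paper's actual matcher.

```python
# Sketch: match attributes across two deep-web sources by token-level
# Jaccard similarity between their labels.

def tokens(label):
    return set(label.lower().replace("_", " ").split())

def jaccard(a, b):
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)

def match_schemas(source_a, source_b, threshold=0.5):
    """Pair each attribute with its best match if similarity clears the threshold."""
    matches = []
    for attr_a in source_a:
        best = max(source_b, key=lambda attr_b: jaccard(attr_a, attr_b))
        if jaccard(attr_a, best) >= threshold:
            matches.append((attr_a, best))
    return matches

# Interface labels from one source, result-page headers from another.
interface_attrs = ["book title", "author name", "price"]
result_attrs = ["title", "author", "list price"]
print(match_schemas(interface_attrs, result_attrs))
```

    A production matcher would add linguistic resources, data-type compatibility, and instance-level evidence on top of name similarity; the sketch only shows how interface and result-page attributes can be fed into the same matching step.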

  5. The role of semantic self-perceptions in temporal distance perceptions toward autobiographical events: the semantic congruence model.

    PubMed

    Gebauer, Jochen E; Haddock, Geoffrey; Broemer, Philip; von Hecker, Ulrich

    2013-11-01

    Why do some autobiographical events feel as if they happened yesterday, whereas others feel like ancient history? Such temporal distance perceptions have surprisingly little to do with actual calendar time distance. Instead, psychologists have found that people typically perceive positive autobiographical events as overly recent, while perceiving negative events as overly distant. The origins of this temporal distance bias have been sought in self-enhancement strivings and mood congruence between autobiographical events and chronic mood. As such, past research exclusively focused on the evaluative features of autobiographical events, while neglecting semantic features. To close this gap, we introduce a semantic congruence model. Capitalizing on the Big Two self-perception dimensions, Study 1 showed that high semantic congruence between recalled autobiographical events and trait self-perceptions renders the recalled events subjectively recent. Specifically, interpersonally warm (competent) individuals perceived autobiographical events reflecting warmth (competence) as relatively recent, but warm (competent) individuals did not perceive events reflecting competence (warmth) as relatively recent. Study 2 found that conscious perceptions of congruence mediate these effects. Studies 3 and 4 showed that neither mood congruence nor self-enhancement account for these results. Study 5 extended the results from the Big Two to the Big Five self-perception dimensions, while affirming the independence of the semantic congruence model from evaluative influences. PsycINFO Database Record (c) 2013 APA, all rights reserved.

  6. Predicting Visual Semantic Descriptive Terms from Radiological Image Data: Preliminary Results with Liver Lesions in CT

    PubMed Central

    Depeursinge, Adrien; Kurtz, Camille; Beaulieu, Christopher F.; Napel, Sandy; Rubin, Daniel L.

    2014-01-01

    We describe a framework to model visual semantics of liver lesions in CT images in order to predict the visual semantic terms (VST) reported by radiologists in describing these lesions. Computational models of VST are learned from image data using high-order steerable Riesz wavelets and support vector machines (SVM). The organization of scales and directions that are specific to every VST are modeled as linear combinations of directional Riesz wavelets. The models obtained are steerable, which means that any orientation of the model can be synthesized from linear combinations of the basis filters. The latter property is leveraged to model VST independently from their local orientation. In a first step, these models are used to predict the presence of each semantic term that describes liver lesions. In a second step, the distances between all VST models are calculated to establish a non-hierarchical computationally-derived ontology of VST containing inter-term synonymy and complementarity. A preliminary evaluation of the proposed framework was carried out using 74 liver lesions annotated with a set of 18 VSTs from the RadLex ontology. A leave-one-patient-out cross-validation resulted in an average area under the ROC curve of 0.853 for predicting the presence of each VST when using SVMs in a feature space combining the magnitudes of the steered models with CT intensities. Likelihood maps are created for each VST, which enables high transparency of the information modeled. The computationally-derived ontology obtained from the VST models was found to be consistent with the underlying semantics of the visual terms. It was found to be complementary to the RadLex ontology, and constitutes a potential method to link the image content to visual semantics. The proposed framework is expected to foster human-computer synergies for the interpretation of radiological images while using rotation-covariant computational models of VSTs to (1) quantify their local likelihood and (2) explicitly link them with pixel-based image content in the context of a given imaging domain. PMID:24808406

  7. First Steps to Automated Interior Reconstruction from Semantically Enriched Point Clouds and Imagery

    NASA Astrophysics Data System (ADS)

    Obrock, L. S.; Gülch, E.

    2018-05-01

    The automated generation of a BIM model from sensor data is a huge challenge for the modeling of existing buildings. Currently the measurements and analyses are time consuming, allow little automation, and require expensive equipment. Automated acquisition of semantic information about objects in a building is still lacking. We present first results of our approach, based on imagery and derived products, aiming at a more automated modeling of interiors for a BIM building model. We examine the building parts and objects visible in the collected images using deep learning methods based on convolutional neural networks. For localization and classification of building parts we apply the FCN8s model for pixel-wise semantic segmentation. So far, we reach a pixel accuracy of 77.2% and a mean intersection over union of 44.2%. We then use the network for further reasoning on the images of the interior room. We combine the segmented images with the original images and use photogrammetric methods to produce a three-dimensional point cloud. We encode the extracted object types as colours of the 3D points, and are thus able to uniquely classify the points in three-dimensional space. We also present a preliminary, simple method for extracting the colour and material of building parts. The combined images prove well suited to extracting further semantic information for the BIM model. The presented methods provide a sound basis for further automation of the acquisition and modeling of semantic and geometric information of interior rooms for a BIM model.

  8. Does the Sound of a Barking Dog Activate its Corresponding Visual Form? An fMRI Investigation of Modality-Specific Semantic Access

    PubMed Central

    Reilly, Jamie; Garcia, Amanda; Binney, Richard J.

    2016-01-01

    Much remains to be learned about the neural architecture underlying word meaning. Fully distributed models of semantic memory predict that the sound of a barking dog will conjointly engage a network of distributed sensorimotor spokes. An alternative framework holds that modality-specific features additionally converge within transmodal hubs. Participants underwent functional MRI while covertly naming familiar objects versus newly learned novel objects from only one of their constituent semantic features (visual form, characteristic sound, or point-light motion representation). Relative to the novel object baseline, familiar concepts elicited greater activation within association regions specific to that presentation modality. Furthermore, visual form elicited activation within high-level auditory association cortex. Conversely, environmental sounds elicited activation in regions proximal to visual association cortex. Both conditions commonly engaged a putative hub region within lateral anterior temporal cortex. These results support hybrid semantic models in which local hubs and distributed spokes are dually engaged in service of semantic memory. PMID:27289210

  9. Nouns, verbs, objects, actions, and abstractions: Local fMRI activity indexes semantics, not lexical categories

    PubMed Central

    Moseley, Rachel L.; Pulvermüller, Friedemann

    2014-01-01

    Noun/verb dissociations in the literature defy interpretation due to the confound between lexical category and semantic meaning; nouns and verbs typically describe concrete objects and actions. Abstract words, pertaining to neither, are a critical test case: dissociations along lexical-grammatical lines would support models purporting lexical category as the principle governing brain organisation, whilst semantic models predict dissociation between concrete words but not abstract items. During fMRI scanning, participants read orthogonalised word categories of nouns and verbs, with or without concrete, sensorimotor meaning. Analysis of inferior frontal/insula, precentral and central areas revealed an interaction between lexical class and semantic factors with clear category differences between concrete nouns and verbs but not abstract ones. Though the brain stores the combinatorial and lexical-grammatical properties of words, our data show that topographical differences in brain activation, especially in the motor system and inferior frontal cortex, are driven by semantics and not by lexical class. PMID:24727103

  10. Nouns, verbs, objects, actions, and abstractions: local fMRI activity indexes semantics, not lexical categories.

    PubMed

    Moseley, Rachel L; Pulvermüller, Friedemann

    2014-05-01

    Noun/verb dissociations in the literature defy interpretation due to the confound between lexical category and semantic meaning; nouns and verbs typically describe concrete objects and actions. Abstract words, pertaining to neither, are a critical test case: dissociations along lexical-grammatical lines would support models purporting lexical category as the principle governing brain organisation, whilst semantic models predict dissociation between concrete words but not abstract items. During fMRI scanning, participants read orthogonalised word categories of nouns and verbs, with or without concrete, sensorimotor meaning. Analysis of inferior frontal/insula, precentral and central areas revealed an interaction between lexical class and semantic factors with clear category differences between concrete nouns and verbs but not abstract ones. Though the brain stores the combinatorial and lexical-grammatical properties of words, our data show that topographical differences in brain activation, especially in the motor system and inferior frontal cortex, are driven by semantics and not by lexical class. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.

  11. Semantically optiMize the dAta seRvice operaTion (SMART) system for better data discovery and access

    NASA Astrophysics Data System (ADS)

    Yang, C.; Huang, T.; Armstrong, E. M.; Moroni, D. F.; Liu, K.; Gui, Z.

    2013-12-01

    We present the Semantically optiMize the dAta seRvice operaTion (SMART) system for better data discovery and access across the NASA data systems, the Global Earth Observation System of Systems (GEOSS) Clearinghouse, and Data.gov, helping scientists select Earth observation data that better fit their needs in the following aspects: 1. Integrating and interfacing the SMART system to include the functionality of a) semantic reasoning based on Jena, an open source semantic reasoning engine, b) semantic similarity calculation, c) recommendation based on spatiotemporal, semantic, and user workflow patterns, and d) ranking results based on the similarity between search terms and data ontology. 2. Collaborating with data user communities to a) capture science data ontology and record relevant ontology triple stores, b) analyze and mine user search and download patterns, c) integrate SMART into a metadata-centric discovery system for community-wide usage and feedback, and d) customize the data discovery, search, and access user interface to include ranked results, recommendation components, and semantics-based navigation. 3. Laying the groundwork to interface the SMART system with other data search and discovery systems as an open source data search and discovery solution. The SMART system leverages NASA, GEO, and FGDC data discovery, search, and access for the Earth science community by enabling scientists to readily discover and access data appropriate to their endeavors, increasing the efficiency of data exploration and decreasing the time that scientists must spend on searching, downloading, and processing the datasets most applicable to their research. By incorporating the SMART system, the time devoted to discovering the most applicable dataset will likely be substantially reduced, thereby reducing the number of user inquiries and likewise the time and resources expended by a data center in addressing them.
    Keywords: EarthCube; ECHO; DAACs; GeoPlatform; Geospatial Cyberinfrastructure
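
    The ranking step described above (similarity between search terms and data ontology) can be sketched as cosine similarity between a user's query terms and each dataset's ontology keywords. The catalog entries and keyword lists below are hypothetical.

```python
# Sketch: rank datasets by cosine similarity of term-count vectors
# between the query and each dataset's keywords.
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two bags of terms."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (sqrt(sum(v * v for v in ca.values()))
            * sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0

def rank_datasets(query_terms, datasets):
    """Return dataset names sorted by descending similarity to the query."""
    scored = [(cosine(query_terms, kw), name) for name, kw in datasets.items()]
    return [name for score, name in sorted(scored, reverse=True)]

catalog = {  # hypothetical dataset names and ontology keywords
    "sea_surface_temp": ["ocean", "temperature", "satellite"],
    "land_cover":       ["land", "vegetation", "classification"],
}
ranking = rank_datasets(["ocean", "temperature"], catalog)
print(ranking)
```

    The SMART system additionally uses semantic reasoning and usage patterns; this sketch covers only the similarity-ranking component.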

  12. Extraction of CYP chemical interactions from biomedical literature using natural language processing methods.

    PubMed

    Jiao, Dazhi; Wild, David J

    2009-02-01

    This paper proposes a system that automatically extracts CYP protein and chemical interactions from journal article abstracts, using natural language processing (NLP) and text mining methods. In our system, we employ a maximum entropy based learning method, using results from syntactic, semantic, and lexical analysis of texts. We first present our system architecture and then discuss the data set for training our machine learning based models and the methods used in building the components of our system, such as part of speech (POS) tagging, named entity recognition (NER), dependency parsing, and relation extraction. An evaluation of the system is conducted at the end, yielding very promising results: the POS tagging, dependency parsing, and NER components achieve a very high level of accuracy as measured by precision, ranging from 85.9% to 98.5%; the precision and recall of the interaction extraction component are 76.0% and 82.6%, and those of the overall system are 68.4% and 72.2%, respectively.
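
    The precision and recall figures quoted above follow the standard definitions over true positives (TP), false positives (FP), and false negatives (FN); the counts in this sketch are made up for illustration.

```python
# Standard extraction metrics from confusion counts.

def precision(tp, fp):
    """Fraction of extracted interactions that are correct."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of true interactions that were extracted."""
    return tp / (tp + fn)

# Hypothetical interaction-extraction tally: 76 correct extractions,
# 24 spurious ones, 16 missed interactions.
tp, fp, fn = 76, 24, 16
print(f"precision={precision(tp, fp):.3f} recall={recall(tp, fn):.3f}")
```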

  13. BacillOndex: an integrated data resource for systems and synthetic biology.

    PubMed

    Misirli, Goksel; Wipat, Anil; Mullen, Joseph; James, Katherine; Pocock, Matthew; Smith, Wendy; Allenby, Nick; Hallinan, Jennifer S

    2013-04-10

    BacillOndex is an extension of the Ondex data integration system, providing a semantically annotated, integrated knowledge base for the model Gram-positive bacterium Bacillus subtilis. This application allows a user to mine a variety of B. subtilis data sources, and analyse the resulting integrated dataset, which contains data about genes, gene products and their interactions. The data can be analysed either manually, by browsing using Ondex, or computationally via a Web services interface. We describe the process of creating a BacillOndex instance, and describe the use of the system for the analysis of single nucleotide polymorphisms in B. subtilis Marburg. The Marburg strain is the progenitor of the widely-used laboratory strain B. subtilis 168. We identified 27 SNPs with predictable phenotypic effects, including genetic traits for known phenotypes. We conclude that BacillOndex is a valuable tool for the systems-level investigation of, and hypothesis generation about, this important biotechnology workhorse. Such understanding contributes to our ability to construct synthetic genetic circuits in this organism.

  14. BacillOndex: An Integrated Data Resource for Systems and Synthetic Biology.

    PubMed

    Misirli, Goksel; Wipat, Anil; Mullen, Joseph; James, Katherine; Pocock, Matthew; Smith, Wendy; Allenby, Nick; Hallinan, Jennifer S

    2013-06-01

    BacillOndex is an extension of the Ondex data integration system, providing a semantically annotated, integrated knowledge base for the model Gram-positive bacterium Bacillus subtilis. This application allows a user to mine a variety of B. subtilis data sources, and analyse the resulting integrated dataset, which contains data about genes, gene products and their interactions. The data can be analysed either manually, by browsing using Ondex, or computationally via a Web services interface. We describe the process of creating a BacillOndex instance, and describe the use of the system for the analysis of single nucleotide polymorphisms in B. subtilis Marburg. The Marburg strain is the progenitor of the widely-used laboratory strain B. subtilis 168. We identified 27 SNPs with predictable phenotypic effects, including genetic traits for known phenotypes. We conclude that BacillOndex is a valuable tool for the systems-level investigation of, and hypothesis generation about, this important biotechnology workhorse. Such understanding contributes to our ability to construct synthetic genetic circuits in this organism.

  15. Semantically-Sensitive Macroprocessing

    DTIC Science & Technology

    1989-12-15

    construct for protecting critical regions. Given the synchronization primitives P and V, we might implement the following transformation, where... By this we mean that the semantic model for the base language provides a primitive set of concepts, represented by data types and operations... the generation of a (dynamic-) semantically equivalent program fragment ultimately expressible in terms of built-in primitives. Note that static

  16. The Large-Scale Structure of Semantic Networks: Statistical Analyses and a Model of Semantic Growth

    ERIC Educational Resources Information Center

    Steyvers, Mark; Tenenbaum, Joshua B.

    2005-01-01

    We present statistical analyses of the large-scale structure of 3 types of semantic networks: word associations, WordNet, and Roget's Thesaurus. We show that they have a small-world structure, characterized by sparse connectivity, short average path lengths between words, and strong local clustering. In addition, the distributions of the number of…
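
    The two small-world statistics mentioned above, strong local clustering and short path lengths between words, can be computed with their standard definitions; the toy word-association graph below is an illustration, not data from the study.

```python
# Clustering coefficient and BFS path length on a tiny undirected graph.
from collections import deque
from itertools import combinations

graph = {  # toy word-association graph (symmetric adjacency sets)
    "dog": {"cat", "bone", "bark"},
    "cat": {"dog", "mouse", "bark"},
    "bone": {"dog"},
    "bark": {"dog", "tree", "cat"},
    "mouse": {"cat"},
    "tree": {"bark"},
}

def clustering(node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    nbrs = graph[node]
    if len(nbrs) < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
    return links / (len(nbrs) * (len(nbrs) - 1) / 2)

def shortest_path(src, dst):
    """Breadth-first search distance between two words."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nbr in graph[node] - seen:
            seen.add(nbr)
            queue.append((nbr, dist + 1))
    return None

print(clustering("dog"), shortest_path("mouse", "tree"))
```

    On a real semantic network one would average these quantities over all nodes and node pairs to characterize the small-world structure.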

  17. Exploiting semantics for sensor re-calibration in event detection systems

    NASA Astrophysics Data System (ADS)

    Vaisenberg, Ronen; Ji, Shengyue; Hore, Bijit; Mehrotra, Sharad; Venkatasubramanian, Nalini

    2008-01-01

    Event detection from a video stream is becoming an important and challenging task in surveillance and sentient systems. While computer vision has been extensively studied to solve different kinds of detection problems over time, it remains a hard problem, and even in a controlled environment only simple events can be detected with a high degree of accuracy. Instead of struggling to improve event detection using image processing alone, we bring in semantics to direct traditional image processing. Semantics are the underlying facts that hide beneath video frames, which cannot be "seen" directly by image processing. In this work we demonstrate that time sequence semantics can be exploited to guide unsupervised re-calibration of the event detection system. We present an instantiation of our ideas using an appliance as an example, coffee pot level detection based on video data, to show that semantics can guide the re-calibration of the detection model. This work exploits time sequence semantics to detect when re-calibration is required, to automatically relearn a new detection model for the newly evolved system state, and to resume monitoring with a higher rate of accuracy.

  18. Parallel State Space Construction for a Model Checking Based on Maximality Semantics

    NASA Astrophysics Data System (ADS)

    El Abidine Bouneb, Zine; Saīdouni, Djamel Eddine

    2009-03-01

    The main limiting factor of the model checker integrated in the concurrency verification environment FOCOVE [1, 2], which uses the maximality-based labeled transition system (MLTS) as a true concurrency model [3, 4], is currently the amount of available physical memory. Many techniques have been developed to reduce the size of a state space; an interesting one among them is alpha equivalence reduction. A distributed memory execution environment offers yet another choice. The main contribution of this paper is to show that the parallel state space construction algorithm proposed in [5], which is based on interleaving semantics using LTS as the semantic model, can easily be adapted to a distributed implementation of alpha equivalence reduction for maximality-based labeled transition systems.
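
    State-space construction under interleaving semantics, the baseline the adapted algorithm starts from, can be sketched as breadth-first exploration of the global states of two independent processes. The two three-step processes below are a toy example, unrelated to FOCOVE's MLTS model.

```python
# Breadth-first state-space construction for two interleaved processes.
from collections import deque

STEPS = 3  # each process performs this many local steps

def successors(state):
    """Advance either process by one step (interleaving semantics)."""
    p, q = state
    if p < STEPS:
        yield (p + 1, q)
    if q < STEPS:
        yield (p, q + 1)

def build_state_space(initial=(0, 0)):
    """Explore all reachable global states and record the transitions."""
    seen, queue = {initial}, deque([initial])
    transitions = []
    while queue:
        state = queue.popleft()
        for nxt in successors(state):
            transitions.append((state, nxt))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen, transitions

states, transitions = build_state_space()
print(len(states), len(transitions))  # (STEPS + 1)**2 states
```

    The parallel versions of this algorithm partition `seen` across workers (typically by hashing states); reductions such as alpha equivalence shrink `seen` by storing one representative per equivalence class.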

  19. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rios Velazquez, E; Parmar, C; Narayan, V

    Purpose: To compare the complementary value of quantitative radiomic features to that of radiologist-annotated semantic features in predicting EGFR mutations in lung adenocarcinomas. Methods: Pre-operative CT images of 258 lung adenocarcinoma patients were available. Tumors were segmented using the single-click ensemble segmentation algorithm. A set of radiomic features was extracted using 3D-Slicer. Test-retest reproducibility and unsupervised dimensionality reduction were applied to select a subset of reproducible and independent radiomic features. Twenty semantic annotations were scored by an expert radiologist, describing the tumor, surrounding tissue and associated findings. Minimum-redundancy-maximum-relevance (MRMR) was used to identify the most informative radiomic and semantic features in 172 patients (training set, temporal split). Radiomic, semantic and combined radiomic-semantic logistic regression models to predict EGFR mutations were evaluated in an independent validation dataset of 86 patients using the area under the receiver operating characteristic curve (AUC). Results: EGFR mutations were found in 77/172 (45%) and 39/86 (45%) of the training and validation sets, respectively. Univariate AUCs showed a similar range for both feature types: radiomics median AUC = 0.57 (range: 0.50–0.62); semantic median AUC = 0.53 (range: 0.50–0.64, Wilcoxon p = 0.55). After MRMR feature selection, the best-performing radiomic, semantic, and radiomic-semantic logistic regression models for EGFR mutations showed validation AUCs of 0.56 (p = 0.29), 0.63 (p = 0.063) and 0.67 (p = 0.004), respectively. Conclusion: Quantitative volumetric and textural radiomic features complement the qualitative and semi-quantitative radiologist annotations. The prognostic value of informative qualitative semantic features such as cavitation and lobulation is increased with the addition of quantitative textural features from the tumor region.
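    The modeling pipeline described (MRMR feature selection followed by a logistic regression model evaluated by validation AUC) can be sketched as follows. This is a hypothetical illustration on synthetic data, not the study's code: the greedy MRMR here approximates redundancy by mean absolute correlation with already-selected features, and all sizes, seeds, and names are placeholders.

```python
# Sketch of an MRMR + logistic regression workflow on synthetic data.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def mrmr_select(X, y, k=3, random_state=0):
    """Greedy MRMR: pick features maximizing relevance to y minus redundancy,
    with redundancy approximated by mean |correlation| to picks so far."""
    relevance = mutual_info_classif(X, y, random_state=random_state)
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k and remaining:
        def score(j):
            if not selected:
                return relevance[j]
            red = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                           for s in selected])
            return relevance[j] - red
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Synthetic stand-in for the 172-patient training / 86-patient validation split.
rng = np.random.default_rng(0)
X_train, X_val = rng.normal(size=(172, 10)), rng.normal(size=(86, 10))
y_train = (X_train[:, 0] + 0.5 * X_train[:, 1] + rng.normal(size=172) > 0).astype(int)
y_val = (X_val[:, 0] + 0.5 * X_val[:, 1] + rng.normal(size=86) > 0).astype(int)

features = mrmr_select(X_train, y_train, k=3)
model = LogisticRegression().fit(X_train[:, features], y_train)
auc = roc_auc_score(y_val, model.predict_proba(X_val[:, features])[:, 1])
print(f"selected features: {features}, validation AUC: {auc:.2f}")
```

    Evaluating on a held-out split, as in the study's temporal split, is what makes the reported AUC an estimate of generalization rather than of fit.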

  20. Convergence of semantics and emotional expression within the IFG pars orbitalis.

    PubMed

    Belyk, Michel; Brown, Steven; Lim, Jessica; Kotz, Sonja A

    2017-08-01

    Humans communicate through a combination of linguistic and emotional channels, including propositional speech, writing, sign language, and music, as well as prosodic, facial, and gestural expression. These channels can be interpreted separately or they can be integrated to multimodally convey complex meanings. Neural models of the perception of semantics and emotion include nodes for both functions in the inferior frontal gyrus pars orbitalis (IFGorb). However, it is not known whether this convergence involves a common functional zone or instead specialized subregions that process semantics and emotion separately. To address this, we performed Kernel Density Estimation meta-analyses of published neuroimaging studies of the perception of semantics or emotion that reported activation in the IFGorb. The results demonstrated that the IFGorb contains two zones with distinct functional profiles. A lateral zone, situated immediately ventral to Broca's area, was implicated in both semantics and emotion. Another zone, deep within the ventral frontal operculum, was engaged almost exclusively by studies of emotion. Follow-up analysis using Meta-Analytic Connectivity Modeling demonstrated that both zones were frequently co-activated with a common network of sensory, motor, and limbic structures, although the lateral zone had a greater association with prefrontal cortical areas involved in executive function. The status of the lateral IFGorb as a point of convergence between the networks for processing semantic and emotional content across modalities of communication is intriguing since this structure is preserved across primates with limited semantic abilities. Hence, the IFGorb may have initially evolved to support the comprehension of emotional signals, being later co-opted to support semantic communication in humans by forming new connections with the brain regions that formed the human semantic network. Copyright © 2017 Elsevier Inc. All rights reserved.

  1. Lexicality Effects in Word and Nonword Recall of Semantic Dementia and Progressive Nonfluent Aphasia

    PubMed Central

    Reilly, Jamie; Troche, Joshua; Chatel, Alison; Park, Hyejin; Kalinyak-Fliszar, Michelene; Antonucci, Sharon M.; Martin, Nadine

    2012-01-01

    Background Verbal working memory is an essential component of many language functions, including sentence comprehension and word learning. As such, working memory has emerged as a domain of intense research interest both in aphasiology and in the broader field of cognitive neuroscience. The integrity of verbal working memory encoding relies on a fluid interaction between semantic and phonological processes. That is, we encode verbal detail using many cues related to both the sound and meaning of words. Lesion models can provide an effective means of parsing the contributions of phonological or semantic impairment to recall performance. Methods and Procedures We employed the lesion model approach here by contrasting the nature of lexicality errors incurred during recall of word and nonword sequences by 3 individuals with progressive nonfluent aphasia (PNFA; a phonological dominant impairment) compared to that of 2 individuals with semantic dementia (a semantic dominant impairment). We focused on psycholinguistic attributes of correctly recalled stimuli relative to those that elicited a lexicality error (i.e., nonword → word OR word → nonword). Outcomes and Results Patients with semantic dementia showed greater sensitivity to phonological attributes (e.g., phoneme length, wordlikeness) of the target items relative to semantic attributes (e.g., familiarity). Patients with PNFA showed the opposite pattern, marked by sensitivity to word frequency, age of acquisition, familiarity, and imageability. Conclusions We interpret these results in favor of a processing strategy such that, in the context of a focal phonological impairment, patients revert to an over-reliance on preserved semantic processing abilities. In contrast, a focal semantic impairment forces both reliance upon and hypersensitivity to phonological attributes of target words. We relate this interpretation to previous hypotheses about the nature of verbal short-term memory in progressive aphasia. PMID:23486736

  2. Interests diffusion on a semantic multiplex. Comparing Computer Science and American Physical Society communities

    NASA Astrophysics Data System (ADS)

    D'Agostino, Gregorio; De Nicola, Antonio

    2016-10-01

    Exploiting the information about members of a Social Network (SN) represents one of the most attractive subjects for both academic and applied scientists. The community of Complexity Science, and especially those researchers working on multiplex social systems, are devoting increasing efforts to outlining general laws, models, and theories for the purpose of predicting emergent phenomena in SNs (e.g. the success of a product). On the other hand, the semantic web community aims at engineering a new generation of advanced services tailored to specific people's needs. This implies defining constructs, models and methods for handling the semantic layer of SNs. We combined models and techniques from both fields to provide a hybrid approach to understanding a basic (yet complex) phenomenon: the propagation of individual interests along social networks. Since information may move along different social networks, one should take into account a multiplex structure. Therefore we introduced the notion of a "Semantic Multiplex". In this paper we analyse two different semantic social networks, represented by authors publishing in Computer Science and those publishing in American Physical Society journals. The comparison allows us to outline common and specific features.

  3. A development framework for semantically interoperable health information systems.

    PubMed

    Lopez, Diego M; Blobel, Bernd G M E

    2009-02-01

    Semantic interoperability is a basic challenge to be met for new generations of distributed, communicating and co-operating health information systems (HIS) enabling shared care and e-Health. Analysis, design, implementation and maintenance of such systems and their intrinsic architectures have to follow a unified development methodology. The Generic Component Model (GCM) is used as a framework for modeling any system, to evaluate and harmonize state-of-the-art architecture development approaches and standards for health information systems, as well as to derive a coherent architecture development framework for sustainable, semantically interoperable HIS and their components. The proposed methodology is based on the Rational Unified Process (RUP), taking advantage of its flexibility to be configured for integrating other architectural approaches such as Service-Oriented Architecture (SOA), Model-Driven Architecture (MDA), ISO 10746, and the HL7 Development Framework (HDF). Existing architectural approaches have been analyzed, compared and finally harmonized towards an architecture development framework for advanced health information systems. Starting with the requirements for semantic interoperability derived from paradigm changes for health information systems, and supported by formal software process engineering methods, an appropriate development framework for semantically interoperable HIS has been provided. The usability of the framework has been exemplified in a public health scenario.

  4. Distributed semantic networks and CLIPS

    NASA Technical Reports Server (NTRS)

    Snyder, James; Rodriguez, Tony

    1991-01-01

    Semantic networks of frames are commonly used as a method of reasoning in many problems. In most of these applications the semantic network exists as a single entity in a single process environment. Advances in workstation hardware provide support for more sophisticated applications involving multiple processes, interacting in a distributed environment. In these applications the semantic network may well be distributed over several concurrently executing tasks. This paper describes the design and implementation of a frame based, distributed semantic network in which frames are accessed both through C Language Integrated Production System (CLIPS) expert systems and procedural C++ language programs. The application area is a knowledge based, cooperative decision making model utilizing both rule based and procedural experts.

  5. Integrated Semantics Service Platform for the Internet of Things: A Case Study of a Smart Office

    PubMed Central

    Ryu, Minwoo; Kim, Jaeho; Yun, Jaeseok

    2015-01-01

    The Internet of Things (IoT) allows machines and devices in the world to connect with each other and generate a huge amount of data, which has a great potential to provide useful knowledge across service domains. Combining the context of IoT with semantic technologies, we can build integrated semantic systems to support semantic interoperability. In this paper, we propose an integrated semantic service platform (ISSP) to support ontological models in various IoT-based service domains of a smart city. In particular, we address three main problems for providing integrated semantic services together with IoT systems: semantic discovery, dynamic semantic representation, and semantic data repository for IoT resources. To show the feasibility of the ISSP, we develop a prototype service for a smart office using the ISSP, which can provide a preset, personalized office environment by interpreting user text input via a smartphone. We also discuss a scenario to show how the ISSP-based method would help build a smart city, where services in each service domain can discover and exploit IoT resources that are wanted across domains. We expect that our method could eventually contribute to providing people in a smart city with more integrated, comprehensive services based on semantic interoperability. PMID:25608216

  6. Integrated semantics service platform for the Internet of Things: a case study of a smart office.

    PubMed

    Ryu, Minwoo; Kim, Jaeho; Yun, Jaeseok

    2015-01-19

    The Internet of Things (IoT) allows machines and devices in the world to connect with each other and generate a huge amount of data, which has a great potential to provide useful knowledge across service domains. Combining the context of IoT with semantic technologies, we can build integrated semantic systems to support semantic interoperability. In this paper, we propose an integrated semantic service platform (ISSP) to support ontological models in various IoT-based service domains of a smart city. In particular, we address three main problems for providing integrated semantic services together with IoT systems: semantic discovery, dynamic semantic representation, and semantic data repository for IoT resources. To show the feasibility of the ISSP, we develop a prototype service for a smart office using the ISSP, which can provide a preset, personalized office environment by interpreting user text input via a smartphone. We also discuss a scenario to show how the ISSP-based method would help build a smart city, where services in each service domain can discover and exploit IoT resources that are wanted across domains. We expect that our method could eventually contribute to providing people in a smart city with more integrated, comprehensive services based on semantic interoperability.

  7. Modeling of cell signaling pathways in macrophages by semantic networks

    PubMed Central

    Hsing, Michael; Bellenson, Joel L; Shankey, Conor; Cherkasov, Artem

    2004-01-01

    Background Substantial amounts of data on cell signaling, metabolic, gene regulatory and other biological pathways have been accumulated in the literature and in electronic databases. Conventionally, this information is stored in the form of pathway diagrams and can be characterized as highly "compartmental" (i.e. individual pathways are not connected into more general networks). Current approaches for representing pathways are limited in their capacity to model molecular interactions in their spatial and temporal context. Moreover, the critical knowledge of cause-effect relationships among signaling events is not reflected by most conventional approaches for manipulating pathways. Results We have applied a semantic network (SN) approach to develop and implement a model for cell signaling pathways. The semantic model maps biological concepts to a set of semantic agents and relationships, and characterizes cell signaling events and their participants in a hierarchical and spatial context. In particular, the available information on the behaviors and interactions of the PI3K enzyme family has been integrated into the SN environment and a cell signaling network in human macrophages has been constructed. An SN application has been developed to manipulate the locations and states of molecules and to observe their actions under different biological scenarios. The approach allowed qualitative simulation of cell signaling events involving PI3Ks and identified pathways of molecular interactions that led to known cellular responses, as well as other potential responses, during bacterial invasions in macrophages. Conclusions We conclude from our results that the semantic network is an effective method to model cell signaling pathways. The semantic model allows proper representation and integration of information on biological structures and their interactions at different levels.
The reconstruction of the cell signaling network in the macrophage allowed detailed investigation of connections among various essential molecules and reflected the cause-effect relationships among signaling events. The simulation demonstrated the dynamics of the semantic network, where a change of states on a molecule can alter its function and potentially cause a chain-reaction effect in the system. PMID:15494071
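    The chain-reaction behavior described in the conclusions, where a change of state on one molecule propagates through typed relationships, can be illustrated with a minimal sketch. The molecule names and "activates" relations below are hypothetical placeholders, not the paper's macrophage network:

```python
# Toy semantic network: molecules as nodes with states and typed "activates"
# relations; changing one node's state triggers a chain reaction downstream.
activates = {
    "Signal": ["Receptor"],   # hypothetical upstream trigger
    "Receptor": ["PI3K"],
    "PI3K": ["Effector"],
    "Effector": [],
}
state = {molecule: "inactive" for molecule in activates}

def set_active(molecule):
    """Activate a molecule and propagate along its 'activates' relations."""
    if state[molecule] == "active":
        return  # already active: stop, so cycles would also terminate
    state[molecule] = "active"
    for downstream in activates[molecule]:
        set_active(downstream)

set_active("Signal")  # simulate an incoming stimulus
print(state)
```

    Qualitative simulation of this kind asks only which states become reachable, not how fast, which matches the paper's use of the network to identify potential cellular responses.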

  8. Supporting Semantic Annotation of Educational Content by Automatic Extraction of Hierarchical Domain Relationships

    ERIC Educational Resources Information Center

    Vrablecová, Petra; Šimko, Marián

    2016-01-01

    The domain model is an essential part of an adaptive learning system. For each educational course, it involves educational content and semantics, which is also viewed as a form of conceptual metadata about educational content. Due to the size of a domain model, manual domain model creation is a challenging and demanding task for teachers or…

  9. Impact of Machine-Translated Text on Entity and Relationship Extraction

    DTIC Science & Technology

    2014-12-01

    Using social network analysis tools is an important asset in... semantic modeling software to automatically build detailed network models from unstructured text. Contour imports unstructured text and then maps the text... onto an existing ontology of frames at the sentence level, using FrameNet, a structured language model, and through Semantic Role Labeling (SRL…

  10. A Metadata Model for E-Learning Coordination through Semantic Web Languages

    ERIC Educational Resources Information Center

    Elci, Atilla

    2005-01-01

    This paper reports on a study aiming to develop a metadata model for e-learning coordination based on semantic web languages. A survey of e-learning modes is conducted initially in order to identify content such as phases, activities, data schema, rules and relations, etc. relevant for a coordination model. In this respect, the study looks into the…

  11. Computational Modeling of Reading in Semantic Dementia: Comment on Woollams, Lambon Ralph, Plaut, and Patterson (2007)

    ERIC Educational Resources Information Center

    Coltheart, Max; Tree, Jeremy J.; Saunders, Steven J.

    2010-01-01

    Woollams, Lambon Ralph, Plaut, and Patterson (see record 2007-05396-004) reported detailed data on reading in 51 cases of semantic dementia. They simulated some aspects of these data using a connectionist parallel distributed processing (PDP) triangle model of reading. We argue here that a different model of reading, the dual route cascaded (DRC)…

  12. SemantEco: a semantically powered modular architecture for integrating distributed environmental and ecological data

    USGS Publications Warehouse

    Patton, Evan W.; Seyed, Patrice; Wang, Ping; Fu, Linyun; Dein, F. Joshua; Bristol, R. Sky; McGuinness, Deborah L.

    2014-01-01

    We aim to inform the development of decision support tools for resource managers who need to examine large complex ecosystems and make recommendations in the face of many tradeoffs and conflicting drivers. We take a semantic technology approach, leveraging background ontologies and the growing body of linked open data. In previous work, we designed and implemented a semantically enabled environmental monitoring framework called SemantEco and used it to build a water quality portal named SemantAqua. Our previous system included foundational ontologies to support environmental regulation violations and relevant human health effects. In this work, we discuss SemantEco’s new architecture that supports modular extensions and makes it easier to support additional domains. Our enhanced framework includes foundational ontologies to support modeling of wildlife observation and wildlife health impacts, thereby enabling deeper and broader support for more holistically examining the effects of environmental pollution on ecosystems. We conclude with a discussion of how, through the application of semantic technologies, modular designs will make it easier for resource managers to bring in new sources of data to support more complex use cases.

  13. Combinatorial semantics strengthens angular-anterior temporal coupling.

    PubMed

    Molinaro, Nicola; Paz-Alonso, Pedro M; Duñabeitia, Jon Andoni; Carreiras, Manuel

    2015-04-01

    The human semantic combinatorial system allows us to create a vast number of new meanings from a finite number of existing representations. The present study investigates the neural dynamics underlying the semantic processing of different conceptual constructions, based on predictions from previous neuroanatomical models of the semantic processing network. In two experiments, participants read sentences for comprehension containing noun-adjective pairs in three different conditions: prototypical (Redundant), nonsense (Anomalous) and low-typical but composable (Contrastive). In Experiment 1 we examined the processing costs associated with reading these sentences and found a processing dissociation between Anomalous and Contrastive word pairs, compared to prototypical (Redundant) stimuli. In Experiment 2, functional connectivity results showed strong co-activation across conditions between inferior frontal gyrus (IFG) and posterior middle temporal gyrus (MTG), as well as between these two regions and middle frontal gyrus (MFG), anterior temporal cortex (ATC) and fusiform gyrus (FG), consistent with previous neuroanatomical models. Importantly, processing of low-typical (but composable) meanings relative to prototypical and anomalous constructions was associated with a stronger positive coupling between ATC and angular gyrus (AG). Our results underscore the critical role of IFG-MTG co-activation during semantic processing and how other relevant nodes within the semantic processing network come into play to handle visual-orthographic information, to maintain multiple lexical-semantic representations in working memory, and to combine existing representations while creatively constructing meaning. Copyright © 2015 Elsevier Ltd. All rights reserved.

  14. Convergence of Health Level Seven Version 2 Messages to Semantic Web Technologies for Software-Intensive Systems in Telemedicine Trauma Care.

    PubMed

    Menezes, Pedro Monteiro; Cook, Timothy Wayne; Cavalini, Luciana Tricai

    2016-01-01

    To present the technical background and the development of a procedure that enriches the semantics of Health Level Seven version 2 (HL7v2) messages for software-intensive systems in telemedicine trauma care. This study followed a multilevel model-driven approach for the development of semantically interoperable health information systems. The Pre-Hospital Trauma Life Support (PHTLS) ABCDE protocol was adopted as the use case. A prototype application embedded the semantics into an HL7v2 message as an eXtensible Markup Language (XML) file, which was validated against an XML schema that defines constraints on a common reference model. This message was exchanged with a second prototype application, developed on the Mirth middleware, which was also used to parse and validate both the original and the hybrid messages. Both versions of the data instance (one pure XML, one embedded in the HL7v2 message) were equally validated, and the RDF-based semantics were recovered by the receiving side of the prototype from the shared XML schema. This study demonstrated the semantic enrichment of HL7v2 messages for software-intensive telemedicine systems for trauma care, by validating components of extracts generated in various computing environments. The adoption of the method proposed in this study ensures compliance of the HL7v2 standard with Semantic Web technologies.
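    A minimal sketch of the embed-and-validate round trip described above. The payload and segment layout are illustrative only, and since the Python standard library has no XSD support, a simple structural check stands in for the XML Schema validation that the study performed (with Mirth middleware handling message exchange):

```python
# Hypothetical sketch: an XML payload carrying the semantics is embedded in an
# HL7v2-style pipe-delimited segment, then extracted and checked on receipt.
import xml.etree.ElementTree as ET

payload = "<observation><code>8867-4</code><value>72</value></observation>"

# Sender: embed the XML payload as a field of an OBX-like segment.
segment = "|".join(["OBX", "1", "ED", "VITALS", payload])

# Receiver: split the segment, recover the payload, and validate its structure.
fields = segment.split("|")
root = ET.fromstring(fields[4])

def structurally_valid(elem):
    """Stand-in for XSD validation: require <code> and a decimal <value>."""
    code, value = elem.find("code"), elem.find("value")
    if code is None or value is None:
        return False
    try:
        float(value.text)
        return True
    except (TypeError, ValueError):
        return False

print(root.tag, structurally_valid(root))  # observation True
```

    The point of validating the payload against a shared schema on both sides is that sender and receiver agree on the reference model without coordinating code.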

  15. Lessons learned in detailed clinical modeling at Intermountain Healthcare

    PubMed Central

    Oniki, Thomas A; Coyle, Joseph F; Parker, Craig G; Huff, Stanley M

    2014-01-01

    Background and objective Intermountain Healthcare has a long history of using coded terminology and detailed clinical models (DCMs) to govern storage of clinical data to facilitate decision support and semantic interoperability. The latest iteration of DCMs at Intermountain is called the clinical element model (CEM). We describe the lessons learned from our CEM efforts with regard to subjective decisions a modeler frequently needs to make in creating a CEM. We present insights and guidelines, but also describe situations in which use cases conflict with the guidelines. We propose strategies that can help reconcile the conflicts. The hope is that these lessons will be helpful to others who are developing and maintaining DCMs in order to promote sharing and interoperability. Methods We have used the Clinical Element Modeling Language (CEML) to author approximately 5000 CEMs. Results Based on our experience, we have formulated guidelines to lead our modelers through the subjective decisions they need to make when authoring models. Reported here are guidelines regarding precoordination/postcoordination, dividing content between the model and the terminology, modeling logical attributes, and creating iso-semantic models. We place our lessons in context, exploring the potential benefits of an implementation layer, an iso-semantic modeling framework, and ontologic technologies. Conclusions We assert that detailed clinical models can advance interoperability and sharing, and that our guidelines, an implementation layer, and an iso-semantic framework will support our progress toward that goal. PMID:24993546

  16. Semantic Data Integration and Knowledge Management to Represent Biological Network Associations.

    PubMed

    Losko, Sascha; Heumann, Klaus

    2017-01-01

    The vast quantities of information generated by academic and industrial research groups are reflected in a rapidly growing body of scientific literature and exponentially expanding resources of formalized data, including experimental data, originating from a multitude of "-omics" platforms, phenotype information, and clinical data. For bioinformatics, the challenge remains to structure this information so that scientists can identify relevant information, to integrate this information as specific "knowledge bases," and to formalize this knowledge across multiple scientific domains to facilitate hypothesis generation and validation. Here we report on progress made in building a generic knowledge management environment capable of representing and mining both explicit and implicit knowledge and, thus, generating new knowledge. Risk management in drug discovery and clinical research is used as a typical example to illustrate this approach. In this chapter we introduce techniques and concepts (such as ontologies, semantic objects, typed relationships, contexts, graphs, and information layers) that are used to represent complex biomedical networks. The BioXM™ Knowledge Management Environment is used as an example to demonstrate how a domain such as oncology is represented and how this representation is utilized for research.

  17. NCBO Resource Index: Ontology-Based Search and Mining of Biomedical Resources

    PubMed Central

    Jonquet, Clement; LePendu, Paea; Falconer, Sean; Coulet, Adrien; Noy, Natalya F.; Musen, Mark A.; Shah, Nigam H.

    2011-01-01

    The volume of publicly available data in biomedicine is constantly increasing. However, these data are stored in different formats and on different platforms. Integrating these data will enable us to facilitate the pace of medical discoveries by providing scientists with a unified view of this diverse information. Under the auspices of the National Center for Biomedical Ontology (NCBO), we have developed the Resource Index—a growing, large-scale ontology-based index of more than twenty heterogeneous biomedical resources. The resources come from a variety of repositories maintained by organizations from around the world. We use a set of over 200 publicly available ontologies contributed by researchers in various domains to annotate the elements in these resources. We use the semantics that the ontologies encode, such as different properties of classes, the class hierarchies, and the mappings between ontologies, in order to improve the search experience for the Resource Index user. Our user interface enables scientists to search the multiple resources quickly and efficiently using domain terms, without even being aware that there is semantics “under the hood.” PMID:21918645

  18. NCBO Resource Index: Ontology-Based Search and Mining of Biomedical Resources.

    PubMed

    Jonquet, Clement; Lependu, Paea; Falconer, Sean; Coulet, Adrien; Noy, Natalya F; Musen, Mark A; Shah, Nigam H

    2011-09-01

    The volume of publicly available data in biomedicine is constantly increasing. However, these data are stored in different formats and on different platforms. Integrating these data will enable us to facilitate the pace of medical discoveries by providing scientists with a unified view of this diverse information. Under the auspices of the National Center for Biomedical Ontology (NCBO), we have developed the Resource Index-a growing, large-scale ontology-based index of more than twenty heterogeneous biomedical resources. The resources come from a variety of repositories maintained by organizations from around the world. We use a set of over 200 publicly available ontologies contributed by researchers in various domains to annotate the elements in these resources. We use the semantics that the ontologies encode, such as different properties of classes, the class hierarchies, and the mappings between ontologies, in order to improve the search experience for the Resource Index user. Our user interface enables scientists to search the multiple resources quickly and efficiently using domain terms, without even being aware that there is semantics "under the hood."
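    The annotation-and-expansion idea described in these two records (annotating resource elements with ontology classes, then using class hierarchies so a search for a broader term also finds the element) can be sketched in a few lines. The tiny ontology below is hypothetical and not drawn from the actual NCBO ontologies:

```python
# Sketch of ontology-based annotation with hierarchy expansion: direct term
# matches are expanded to all ancestor classes, so searching for a parent
# concept retrieves elements annotated with any of its descendants.
ontology_parents = {
    "melanoma": "skin neoplasm",   # hypothetical is-a hierarchy
    "skin neoplasm": "neoplasm",
    "neoplasm": None,
}

def annotate(text):
    """Return direct annotations plus their hierarchy-expanded ancestors."""
    found = {term for term in ontology_parents if term in text.lower()}
    closed = set(found)
    for term in found:
        parent = ontology_parents[term]
        while parent:
            closed.add(parent)
            parent = ontology_parents.get(parent)
    return closed

print(sorted(annotate("Case report: metastatic melanoma")))
# ['melanoma', 'neoplasm', 'skin neoplasm']
```

    The expanded annotations are what let a user query with a domain term and retrieve matching elements "without even being aware that there is semantics under the hood."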

  19. Bim-Gis Integrated Geospatial Information Model Using Semantic Web and Rdf Graphs

    NASA Astrophysics Data System (ADS)

    Hor, A.-H.; Jadidi, A.; Sohn, G.

    2016-06-01

    In recent years, 3D virtual indoor/outdoor urban modelling has become a key spatial information framework for many civil and engineering applications such as evacuation planning, emergency and facility management. Accomplishing such sophisticated decision tasks creates a large demand for building multi-scale and multi-sourced 3D urban models. Currently, Building Information Models (BIM) and Geographical Information Systems (GIS) are broadly used as the modelling sources. However, sharing data and exchanging information between the two modelling domains is still a huge challenge, as purely syntactic or semantic approaches do not fully support the exchange of rich semantic and geometric information from BIM into GIS or vice versa. This paper proposes a novel approach for integrating BIM and GIS using semantic web technologies and Resource Description Framework (RDF) graphs. The novelty of the proposed solution comes from the benefits of integrating BIM and GIS technologies into one unified model, the so-called Integrated Geospatial Information Model (IGIM). The proposed approach consists of three main modules: construction of BIM-RDF and GIS-RDF graphs, integration of the two RDF graphs, and querying of information through the IGIM-RDF graph using SPARQL. The IGIM generates queries from both the BIM and GIS RDF graphs, resulting in a semantically integrated model with entities representing both BIM classes and GIS feature objects with respect to the target client application. The linkage between BIM-RDF and GIS-RDF is achieved through SPARQL endpoints and defined by a query using a set of datasets and entity classes with complementary properties, relationships and geometries. To validate the proposed approach and its performance, a case study was also conducted using the IGIM system design.
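    The integration idea can be made concrete with a dependency-free toy, using hypothetical URIs and classes: BIM and GIS facts are held as RDF-style triples, merged into one graph, and joined by a query that mimics the SPARQL pattern "find each door and the address of the building that contains it":

```python
# Toy IGIM: BIM and GIS facts as (subject, predicate, object) triples merged
# into one graph, then joined across the two vocabularies. All names invented.
bim_graph = {
    ("bim:Door_12", "rdf:type", "bim:Door"),
    ("bim:Door_12", "bim:inBuilding", "bim:Building_A"),
}
gis_graph = {
    ("bim:Building_A", "rdf:type", "gis:Building"),
    ("bim:Building_A", "gis:address", '"12 Main St"'),
}

igim = bim_graph | gis_graph  # integrated model: union of both graphs

def match(graph, pred, obj=None):
    """Return (subject, object) pairs matching a triple pattern."""
    return [(s, o) for s, p, o in graph if p == pred and (obj is None or o == obj)]

# Join: ?door a bim:Door ; bim:inBuilding ?b . ?b gis:address ?address .
results = []
for door, _ in match(igim, "rdf:type", "bim:Door"):
    for s, building in match(igim, "bim:inBuilding"):
        if s == door:
            for s2, address in match(igim, "gis:address"):
                if s2 == building:
                    results.append((door, address))

print(results)  # [('bim:Door_12', '"12 Main St"')]
```

    In the actual IGIM approach the same join would be expressed in SPARQL over real RDF graphs exposed via endpoints; the toy only shows why merging the two graphs makes cross-domain queries possible at all.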

  20. CityGML - Interoperable semantic 3D city models

    NASA Astrophysics Data System (ADS)

    Gröger, Gerhard; Plümer, Lutz

    2012-07-01

    CityGML is the international standard of the Open Geospatial Consortium (OGC) for the representation and exchange of 3D city models. It defines the three-dimensional geometry, topology, semantics and appearance of the most relevant topographic objects in urban or regional contexts. These definitions are provided in different, well-defined Levels-of-Detail (multiresolution model). The focus of CityGML is on the semantic aspects of 3D city models, their structures, taxonomies and aggregations, allowing users to employ virtual 3D city models for advanced analysis and visualization tasks in a variety of application domains such as urban planning, indoor/outdoor pedestrian navigation, environmental simulations, cultural heritage, or facility management. This is in contrast to purely geometrical/graphical models such as KML, VRML, or X3D, which do not provide sufficient semantics. CityGML is based on the Geography Markup Language (GML), which provides a standardized geometry model. Due to this model and its well-defined semantics and structures, CityGML facilitates interoperable data exchange in the context of geo web services and spatial data infrastructures. Since its standardization in 2008, CityGML has come into worldwide use: tools from notable companies in the geospatial field provide CityGML interfaces, and many applications and projects use the standard. CityGML is also having a strong impact on science: numerous approaches use CityGML, particularly its semantics, for disaster management, emergency response, or energy-related applications as well as for visualizations; contribute to CityGML by improving its consistency and validity; or use CityGML, particularly its different Levels-of-Detail, as a source or target for generalizations. This paper gives an overview of CityGML, its underlying concepts, its Levels-of-Detail, how to extend it, its applications, its likely future development, and the role it plays in scientific research.
Furthermore, its relationship to other standards from the fields of computer graphics and computer-aided architectural design and to the prospective INSPIRE model are discussed, as well as the impact CityGML has and is having on the software industry, on applications of 3D city models, and on science generally.
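    Because CityGML is XML-based (a GML application schema), its semantic structure can be inspected with ordinary XML tooling. The sketch below parses a deliberately simplified, hypothetical building fragment (real CityGML files carry full geometry and further namespaces) and extracts its semantic feature types:

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical CityGML 2.0 fragment: one building carrying
# semantic surface types (RoofSurface, WallSurface) rather than bare geometry.
DOC = """<core:CityModel
    xmlns:core="http://www.opengis.net/citygml/2.0"
    xmlns:bldg="http://www.opengis.net/citygml/building/2.0">
  <core:cityObjectMember>
    <bldg:Building>
      <bldg:function>residential</bldg:function>
      <bldg:boundedBy><bldg:RoofSurface/></bldg:boundedBy>
      <bldg:boundedBy><bldg:WallSurface/></bldg:boundedBy>
    </bldg:Building>
  </core:cityObjectMember>
</core:CityModel>"""

NS = {"core": "http://www.opengis.net/citygml/2.0",
      "bldg": "http://www.opengis.net/citygml/building/2.0"}

def semantic_summary(xml_text):
    """Return the building's function and its bounding surface types."""
    root = ET.fromstring(xml_text)
    building = root.find(".//bldg:Building", NS)
    function = building.findtext("bldg:function", namespaces=NS)
    surfaces = [child.tag.split("}")[1]          # strip the namespace URI
                for bb in building.findall("bldg:boundedBy", NS)
                for child in bb]
    return function, surfaces

print(semantic_summary(DOC))
```

    The same traversal pattern applies to full CityGML datasets, where the semantic tags are what distinguish the format from purely graphical models.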

  1. Modulation of the inter-hemispheric processing of semantic information during normal aging. A divided visual field experiment.

    PubMed

    Hoyau, E; Cousin, E; Jaillard, A; Baciu, M

    2016-12-01

    We evaluated the effect of normal aging on the inter-hemispheric processing of semantic information by using the divided visual field (DVF) method, with words and pictures. Two main theoretical models have been considered: (a) the HAROLD model, which posits that aging is associated with supplementary recruitment of the right hemisphere (RH) and decreased hemispheric specialization, and (b) the RH decline theory, which assumes that the RH becomes less efficient with aging, associated with increased left hemisphere (LH) specialization. Two groups of subjects, a Young Group (YG) and an Old Group (OG), were examined while performing a semantic categorization task (living vs. non-living) on words and pictures. The DVF was realized in two steps: (a) unilateral DVF presentation, with stimuli presented separately in each visual field, left or right, allowing for their initial processing by only one hemisphere, right or left, respectively; (b) bilateral DVF presentation (BVF), with stimuli presented simultaneously in both visual fields, followed by their processing by both hemispheres. These two types of presentation permitted the evaluation of two main characteristics of the inter-hemispheric processing of information: hemispheric specialization (HS) and inter-hemispheric cooperation (IHC). Moreover, the BVF made it possible to determine which hemisphere drives the processing of information presented in BVF. Results obtained in OG indicated that: (a) semantic categorization was performed as accurately as in YG, although more slowly; (b) a non-semantic RH decline was observed; and (c) the LH controlled semantic processing during the BVF, suggesting an increased role of the LH in aging. However, despite the stronger involvement of the LH in OG, the RH is not completely devoid of semantic abilities. As discussed in the paper, neither the HAROLD model nor the RH decline theory fully explains this pattern of results.
We suggest instead that the effect of aging on hemispheric specialization and inter-hemispheric cooperation during semantic processing is explained not by one model alone but by an interaction between several complementary mechanisms and models. Copyright © 2015 Elsevier Ltd. All rights reserved.

  2. An approach to development of ontological knowledge base in the field of scientific and research activity in Russia

    NASA Astrophysics Data System (ADS)

    Murtazina, M. Sh; Avdeenko, T. V.

    2018-05-01

    The state of the art and recent progress in the application of semantic technologies to scientific and research activity have been analyzed. Even an elementary empirical comparison has shown that semantic search engines are superior in all respects to conventional search technologies. However, semantic information technologies are insufficiently used in the field of scientific and research activity in Russia. In the present paper an approach to the construction of an ontological model of a knowledge base is proposed. The ontological model is based on an upper-level ontology and the RDF mechanism for linking several domain ontologies, and is implemented in the Protégé environment.

  3. A Case Study on Sepsis Using PubMed and Deep Learning for Ontology Learning.

    PubMed

    Arguello Casteleiro, Mercedes; Maseda Fernandez, Diego; Demetriou, George; Read, Warren; Fernandez Prieto, Maria Jesus; Des Diz, Julio; Nenadic, Goran; Keane, John; Stevens, Robert

    2017-01-01

    We investigate the application of distributional semantics models for facilitating unsupervised extraction of biomedical terms from unannotated corpora. Term extraction is used as the first step of an ontology learning process that aims at the (semi-)automatic annotation of biomedical concepts and relations from more than 300K PubMed titles and abstracts. We experimented with traditional distributional semantics methods such as Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) as well as with the neural language models CBOW and Skip-gram from Deep Learning. The evaluation concentrates on sepsis, a major life-threatening condition, and shows that the Deep Learning models outperform LSA and LDA with much higher precision.
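    The models compared here require large corpora and dedicated libraries; as a minimal stand-in, the sketch below builds count-based distributional vectors from a tiny hypothetical corpus and compares terms by cosine similarity. It illustrates only the underlying distributional idea (words with similar contexts get similar vectors), not LSA, LDA, CBOW, or Skip-gram themselves:

```python
from collections import Counter
from math import sqrt

# Tiny hypothetical corpus standing in for PubMed titles/abstracts.
corpus = [
    "sepsis follows severe infection",
    "septic shock follows severe sepsis",
    "antibiotics treat severe infection",
    "ontology learning extracts terms",
]

def cooccurrence_vectors(sentences, window=2):
    """Represent each word by the counts of words within +/- `window` positions."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            ctx = vectors.setdefault(word, Counter())
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    ctx[tokens[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b.get(k, 0) for k in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

vecs = cooccurrence_vectors(corpus)
# Terms sharing contexts ("follows", "severe") score higher than unrelated ones.
print(cosine(vecs["sepsis"], vecs["infection"]) > cosine(vecs["sepsis"], vecs["ontology"]))
```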

  4. Process mining routinely collected electronic health records to define real-life clinical pathways during chemotherapy.

    PubMed

    Baker, Karl; Dunwoodie, Elaine; Jones, Richard G; Newsham, Alex; Johnson, Owen; Price, Christopher P; Wolstenholme, Jane; Leal, Jose; McGinley, Patrick; Twelves, Chris; Hall, Geoff

    2017-07-01

    There is growing interest in the use of routinely collected electronic health records to enhance service delivery and facilitate clinical research. It should be possible to detect and measure patterns of care and use the data to monitor improvements, but there are methodological and data quality challenges. Driven by the desire to model the impact of a patient self-test blood count monitoring service for patients on chemotherapy, we aimed to (i) establish reproducible methods of process-mining electronic health records, (ii) use the outputs derived to define and quantify patient pathways during chemotherapy, and (iii) gather robust, structured data able to inform a cost-effectiveness decision model of home monitoring of neutropenic status during chemotherapy. Electronic health records at a UK oncology centre were included if the patient (i) had a diagnosis of metastatic breast cancer and received adjuvant epirubicin and cyclophosphamide chemotherapy or (ii) had colorectal cancer and received palliative oxaliplatin and infusional 5-fluorouracil chemotherapy, and (iii) was first diagnosed with cancer between January 2004 and February 2013. Software and a Markov model were developed, producing a schematic of patient pathways during chemotherapy. Significant variance from the assumed care pathway was evident in the data. Among the 535 patients with breast cancer and 420 with colorectal cancer there were 474 and 329 pathway variants respectively. Only 27 (5%) and 26 (6%) completed the planned six cycles of chemotherapy without unplanned hospital contact. Over the six cycles, 169 (31.6%) patients with breast cancer and 190 (45.2%) patients with colorectal cancer were admitted to hospital. The pathways of patients on chemotherapy are complex.
An iterative approach to addressing semantic and data quality issues enabled the effective use of routinely collected patient records to produce accurate models of the real-life experiences of chemotherapy patients and to generate clinically useful information. Very few patients experience the idealised patient pathway that is used to plan their care. A better understanding of real-life clinical pathways through process mining can contribute to care and data quality assurance, identifying unmet needs, facilitating quantification of innovation impact, communicating with stakeholders, and ultimately improving patient care and outcomes. Copyright © 2017 Elsevier B.V. All rights reserved.
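    The pathway-variant counts reported above reduce, at their core, to grouping a patient-level event log into distinct event sequences. A minimal sketch of that step over a hypothetical three-patient log (the study itself used Markov modelling over full EHR data):

```python
from collections import Counter

# Hypothetical event log: (patient_id, event) rows in chronological order,
# a simplified stand-in for routinely collected EHR entries.
events = [
    ("p1", "cycle1"), ("p1", "cycle2"), ("p1", "cycle3"),
    ("p2", "cycle1"), ("p2", "admission"), ("p2", "cycle2"),
    ("p3", "cycle1"), ("p3", "cycle2"), ("p3", "cycle3"),
]

def pathway_variants(log):
    """Group events by patient, then count each distinct event sequence."""
    traces = {}
    for pid, event in log:
        traces.setdefault(pid, []).append(event)
    return Counter(tuple(trace) for trace in traces.values())

variants = pathway_variants(events)
# Two patients follow the planned pathway; one deviates via an admission.
for path, n in variants.most_common():
    print(n, "->", " > ".join(path))
```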

  5. Fine-coarse semantic processing in schizophrenia: a reversed pattern of hemispheric dominance.

    PubMed

    Zeev-Wolf, Maor; Goldstein, Abraham; Levkovitz, Yechiel; Faust, Miriam

    2014-04-01

    Left lateralization for language processing is a feature of neurotypical brains. In individuals with schizophrenia, lack of left lateralization is associated with the language impairments manifested in this population. Beeman's fine-coarse semantic coding model asserts left hemisphere specialization in fine (i.e., conventionalized) semantic coding and right hemisphere specialization in coarse (i.e., non-conventionalized) semantic coding. Applying this model to schizophrenia would suggest that language impairments in this population result from greater reliance on coarse semantic coding. We investigated this hypothesis and examined whether a reversed pattern of hemispheric involvement in fine-coarse semantic coding along the time course of activation could be detected in individuals with schizophrenia. Seventeen individuals with schizophrenia and 30 neurotypical participants were presented with two-word expressions of four types: literal, conventional metaphoric, unrelated (exemplars of fine semantic coding) and novel metaphoric (an exemplar of coarse semantic coding). Expressions were separated by either a short (250 ms) or long (750 ms) delay. Findings indicate that whereas controls displayed a left hemisphere advantage for novel metaphor processing at the 250 ms delay and a right hemisphere advantage at 750 ms, individuals with schizophrenia displayed the opposite. For conventional metaphoric and unrelated expressions, controls showed a left hemisphere advantage at both delays, while individuals with schizophrenia showed a right hemisphere advantage. Furthermore, whereas individuals with schizophrenia were less accurate than controls at judging literal, conventional metaphoric and unrelated expressions, they were more accurate when judging novel metaphors. Results suggest that individuals with schizophrenia display a reversed pattern of lateralization for semantic coding, which causes them to rely more heavily on coarse semantic coding.
Thus, for individuals with schizophrenia, speech situations are always non-conventional, compelling them to constantly search for meaning and biasing them toward novel or atypical speech acts. This, in turn, may disadvantage them in conventionalized communication and result in language impairment. Copyright © 2014 Elsevier Ltd. All rights reserved.

  6. Semi-automated ontology generation and evolution

    NASA Astrophysics Data System (ADS)

    Stirtzinger, Anthony P.; Anken, Craig S.

    2009-05-01

    Extending the notion of data models or object models, an ontology can provide rich semantic definitions not only for the meta-data but also for the instance data of domain knowledge, making these semantic definitions available in machine-readable form. However, the generation of an effective ontology is a difficult task involving considerable labor and skill. This paper discusses an Ontology Generation and Evolution Processor (OGEP) aimed at automating this process, requesting user input only when unresolvable ambiguous situations occur. OGEP directly attacks the main barrier that prevents automated (or self-learning) ontology generation: the ability to understand the meaning of artifacts and the relationships the artifacts have to the domain space. OGEP leverages existing lexical-to-ontological mappings in the form of WordNet and the Suggested Upper Merged Ontology (SUMO), integrated with a semantic pattern-based structure referred to as the Semantic Grounding Mechanism (SGM) and implemented as a Corpus Reasoner. OGEP processing is initiated by a Corpus Parser performing a lexical analysis of the corpus, reading in a document (or corpus) and preparing it for processing by annotating words and phrases. After the Corpus Parser is done, the Corpus Reasoner uses the part-of-speech output to determine the semantic meaning of a word or phrase. The Corpus Reasoner is the crux of the OGEP system, analyzing, extrapolating, and evolving data from free text into cohesive semantic relationships. The Semantic Grounding Mechanism provides a basis for identifying and mapping semantic relationships. By blending together the WordNet lexicon and the SUMO ontological layout, the SGM is given breadth and depth in its ability to extrapolate semantic relationships between domain entities. The combination of all these components results in an innovative approach to user-assisted semantic-based ontology generation.
This paper will describe the OGEP technology in the context of the architectural components referenced above and identify a potential technology transition path to Scott AFB's Tanker Airlift Control Center (TACC) which serves as the Air Operations Center (AOC) for the Air Mobility Command (AMC).

  7. BioProspecting: novel marker discovery obtained by mining the bibleome.

    PubMed

    Elkin, Peter L; Tuttle, Mark S; Trusko, Brett E; Brown, Steven H

    2009-02-05

    BioProspecting is a novel approach that enabled our team to mine data related to genetic markers from the New England Journal of Medicine (NEJM) utilizing SNOMED CT and the Human Gene Ontology (HUGO). The Biomedical Informatics Research Collaborative was able to link genes and disorders using the Multi-threaded Clinical Vocabulary Server (MCVS) natural language processing engine, whose output creates an ontology network from the semantic encodings of the literature, organized by these two terminologies. We identified relationships between genes or proteins and diseases or drugs as linked by metabolic functions, and identified potentially novel functional relationships between, for example, genes and diseases (e.g. Article #1 ([Gene - IL27] => {Enzyme - Dipeptidyl Carboxypeptidase 1}) and Article #2 ({Enzyme - Dipeptidyl Carboxypeptidase 1} <= [Disorder - Type II DM]), showing a metabolic link between IL27 and Type II DM). In this manuscript we describe our method for developing the database and its content, as well as its potential to assist in the discovery of novel markers and drugs.
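    The IL27 / Type II DM example above is, in essence, a join of two literature-derived relations on a shared enzyme. A minimal sketch of that linking step, using only the assertions quoted in the abstract (the data structures are illustrative, not the MCVS output format):

```python
# Toy literature-derived assertions from the abstract's worked example:
# article 1 links a gene to an enzyme; article 2 links the enzyme to a disorder.
gene_to_enzyme = {("IL27", "Dipeptidyl Carboxypeptidase 1")}
disorder_to_enzyme = {("Type II DM", "Dipeptidyl Carboxypeptidase 1")}

def metabolic_links(gene_enzyme, disorder_enzyme):
    """Join the two relations on the shared enzyme to hypothesise gene-disorder links."""
    links = set()
    for gene, enzyme in gene_enzyme:
        for disorder, enzyme2 in disorder_enzyme:
            if enzyme == enzyme2:
                links.add((gene, disorder, enzyme))
    return sorted(links)

print(metabolic_links(gene_to_enzyme, disorder_to_enzyme))
```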

  8. The Effect of Normalization in Violence Video Classification Performance

    NASA Astrophysics Data System (ADS)

    Ali, Ashikin; Senan, Norhalina

    2017-08-01

    Data pre-processing is an important part of data mining, and normalization is a pre-processing stage for nearly any problem, especially in video classification. Video classification is challenging because of heterogeneous content, large variations in video quality, and the complex semantic meanings of the concepts involved. A thorough pre-processing stage, including normalization, therefore aids the robustness of classification performance. Normalization scales all numeric variables into a given range to make them more meaningful for the subsequent phases of the data mining process. This paper examines the effect of two normalization techniques, Min-Max normalization and Z-score, on violence video classification, measured by the classification rate of a Multi-layer Perceptron (MLP) classifier. With Min-Max normalization to the range [0,1], accuracy is almost 98%; with Min-Max normalization to [-1,1], accuracy is 59%; and with Z-score, accuracy is 50%.
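    The two techniques compared are standard formulas: Min-Max rescales each value linearly into a target range, while Z-score centres values on the mean and divides by the standard deviation. A minimal sketch (the feature values are hypothetical):

```python
from statistics import mean, pstdev

def min_max(xs, lo=0.0, hi=1.0):
    """Scale values linearly into [lo, hi]."""
    mn, mx = min(xs), max(xs)
    return [lo + (x - mn) * (hi - lo) / (mx - mn) for x in xs]

def z_score(xs):
    """Centre on the mean and scale by the (population) standard deviation."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]

features = [2.0, 4.0, 6.0, 10.0]     # hypothetical feature values
print(min_max(features))             # scaled into [0, 1]
print(min_max(features, -1.0, 1.0))  # scaled into [-1, 1]
print(z_score(features))             # zero mean, unit variance
```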

  9. Distinct loci of lexical and semantic access deficits in aphasia: Evidence from voxel-based lesion-symptom mapping and diffusion tensor imaging.

    PubMed

    Harvey, Denise Y; Schnur, Tatiana T

    2015-06-01

    Naming pictures and matching words to pictures belonging to the same semantic category negatively affects language production and comprehension. By most accounts, semantic interference arises when accessing lexical representations in naming (e.g., Damian, Vigliocco, & Levelt, 2001) and semantic representations in comprehension (e.g., Forde & Humphreys, 1997). Further, damage to the left inferior frontal gyrus (LIFG), a region implicated in cognitive control, results in increasing semantic interference when items repeat across cycles in both language production and comprehension (Jefferies, Baker, Doran, & Lambon Ralph, 2007). This generates the prediction that the LIFG via white matter connections supports resolution of semantic interference arising from different loci (lexical vs semantic) in the temporal lobe. However, it remains unclear whether the cognitive and neural mechanisms that resolve semantic interference are the same across tasks. Thus, we examined which gray matter structures [using whole brain and region of interest (ROI) approaches] and white matter connections (using deterministic tractography) when damaged impact semantic interference and its increase across cycles when repeatedly producing and understanding words in 15 speakers with varying lexical-semantic deficits from left hemisphere stroke. We found that damage to distinct brain regions, the posterior versus anterior temporal lobe, was associated with semantic interference (collapsed across cycles) in naming and comprehension, respectively. Further, those with LIFG damage compared to those without exhibited marginally larger increases in semantic interference across cycles in naming but not comprehension. 
Lastly, the inferior fronto-occipital fasciculus, connecting the LIFG with posterior temporal lobe, related to semantic interference in naming, whereas the inferior longitudinal fasciculus (ILF), connecting posterior with anterior temporal regions related to semantic interference in comprehension. These neuroanatomical-behavioral findings have implications for models of the lexical-semantic language network by demonstrating that semantic interference in language production and comprehension involves different representations which differentially recruit a cognitive control mechanism for interference resolution. Copyright © 2015 Elsevier Ltd. All rights reserved.

  10. Structural Similarities between Brain and Linguistic Data Provide Evidence of Semantic Relations in the Brain

    PubMed Central

    Crangle, Colleen E.; Perreau-Guimaraes, Marcos; Suppes, Patrick

    2013-01-01

    This paper presents a new method of analysis by which structural similarities between brain data and linguistic data can be assessed at the semantic level. It shows how to measure the strength of these structural similarities and so determine the relatively better fit of the brain data with one semantic model over another. The first model is derived from WordNet, a lexical database of English compiled by language experts. The second is given by the corpus-based statistical technique of latent semantic analysis (LSA), which detects relations between words that are latent or hidden in text. The brain data are drawn from experiments in which statements about the geography of Europe were presented auditorily to participants who were asked to determine their truth or falsity while electroencephalographic (EEG) recordings were made. The theoretical framework for the analysis of the brain and semantic data derives from axiomatizations of theories such as the theory of differences in utility preference. Using brain-data samples from individual trials time-locked to the presentation of each word, ordinal relations of similarity differences are computed for the brain data and for the linguistic data. In each case those relations that are invariant with respect to the brain and linguistic data, and are correlated with sufficient statistical strength, amount to structural similarities between the brain and linguistic data. Results show that many more statistically significant structural similarities can be found between the brain data and the WordNet-derived data than the LSA-derived data. The work reported here is placed within the context of other recent studies of semantics and the brain. The main contribution of this paper is the new method it presents for the study of semantics and the brain and the focus it permits on networks of relations detected in brain data and represented by a semantic model. PMID:23799009

  11. Improving Semantic Updating Method on 3d City Models Using Hybrid Semantic-Geometric 3d Segmentation Technique

    NASA Astrophysics Data System (ADS)

    Sharkawi, K.-H.; Abdul-Rahman, A.

    2013-09-01

    Entities in cities and urban areas, such as building structures, are becoming more complex as modern human civilization continues to evolve. The ability to plan and manage every territory, especially urban areas, is very important to every government in the world. Planning and managing cities and urban areas based on printed maps and 2D data is becoming insufficient and inefficient to cope with the complexity of new developments in big cities. The emergence of 3D city models has boosted the efficiency of analysing and managing urban areas, as 3D data are proven to represent real-world objects more accurately. They have since been adopted as the new trend in building and urban management and planning applications. Nowadays, many countries around the world generate virtual 3D representations of their major cities. The growing interest in improving the usability of 3D city models has resulted in the development of various analysis tools based on them. Today, 3D city models are generated for various purposes such as tourism, location-based services, disaster management and urban planning. Meanwhile, modelling 3D objects is getting easier with the emergence of user-friendly 3D modelling tools on the market, and generating 3D buildings with high accuracy has become easier with the availability of airborne Lidar and terrestrial laser scanning equipment. The availability of and accessibility to this technology make it more sensible to analyse buildings in urban areas using 3D data, as they accurately represent real-world objects. The Open Geospatial Consortium (OGC) has accepted the CityGML specification as one of the international standards for representing and exchanging spatial data, making it easier to visualize, store and manage 3D city model data efficiently.
CityGML is able to represent the semantics, geometry, topology and appearance of 3D city models at five well-defined Levels-of-Detail (LoD), namely LoD0 to LoD4. The accuracy and structural complexity of the 3D objects increase with the LoD, where LoD0 is the simplest (2.5D; Digital Terrain Model (DTM) plus building or roof print) while LoD4 is the most complex (architectural details with interior structures). Semantic information is one of the main components of CityGML and 3D city models, and provides important information for any analysis. However, more often than not, semantic information is not available for a 3D city model due to an unstandardized modelling process. One example is when a building is generated as a single object, without specific feature layers such as Roof, Ground floor, Level 1, Level 2, Block A, or Block B. This research attempts to develop a method to improve the semantic data updating process by segmenting the 3D building into simpler parts, which makes it easier for users to select and update the semantic information. The methodology is implemented for 3D buildings in LoD2, where buildings are generated without architectural details but with distinct roof structures. This paper also introduces a hybrid semantic-geometric 3D segmentation method that performs hierarchical segmentation of a 3D building based on its semantic value and surface characteristics, fitted by one of the predefined primitives. For future work, the segmentation method will be implemented as part of a change detection module that can detect changes on the 3D buildings, store and retrieve semantic information of the changed structure, automatically update the 3D models, and visualize the results in a user-friendly graphical user interface (GUI).

  12. Information Warfare: Evaluation of Operator Information Processing Models

    DTIC Science & Technology

    1997-10-01

    that people can describe or report, including both episodic and semantic information. Declarative memory contains a network of knowledge represented... second dimension corresponds roughly to the distinction between episodic and semantic memory that is commonly made in cognitive psychology. Episodic... 3 is long-term memory for the discourse, a subset of episodic memory. Partition 4 is long-term semantic memory, or the knowledge-base. According to

  13. Towards Automatic Semantic Labelling of 3D City Models

    NASA Astrophysics Data System (ADS)

    Rook, M.; Biljecki, F.; Diakité, A. A.

    2016-10-01

    The lack of semantic information in many 3D city models is a considerable limiting factor in their use, as many applications rely on semantics. Such information is not always available: it may not have been collected, it may be lost in data transformation, or its absence may be caused by non-interoperability when integrating data from other sources. This research is a first step towards an automatic workflow that labels a plain 3D city model, represented by a soup of polygons, with semantic and thematic information as defined in the CityGML standard. The first step involves the reconstruction of the topology, which is used in a region growing algorithm that clusters upward-facing adjacent triangles. Heuristic rules, embedded in a decision tree, are used to compute a likeliness score for regions that represent either the ground (terrain) or a RoofSurface. Regions with a high likeliness score for one of the two classes are used to create a decision space, which is used in a support vector machine (SVM). Next, topological relations are utilised to select seeds that start a region growing algorithm, creating regions of triangles of other semantic classes. The topological relationships of the regions are used in the aggregation of the thematic building features. Finally, the level of detail is detected to generate the correct output in CityGML. The results show an accuracy between 85% and 99% for automatic semantic labelling on four different test datasets. The paper concludes by indicating problems and difficulties that imply the next steps in the research.
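    The clustering of upward-facing adjacent triangles can be sketched as breadth-first region growing over a triangle soup. The fragment below uses a toy three-triangle model (two horizontal roof-like triangles, one vertical wall) and a simple shared-edge adjacency test; the paper's full pipeline (decision-tree scoring, SVM, LoD detection) is not reproduced:

```python
from collections import deque

# Hypothetical triangle soup: each triangle is three (x, y, z) vertices.
triangles = [
    ((0, 0, 1), (1, 0, 1), (0, 1, 1)),  # horizontal (roof-like)
    ((1, 0, 1), (1, 1, 1), (0, 1, 1)),  # horizontal, shares an edge with #0
    ((0, 0, 0), (1, 0, 0), (1, 0, 1)),  # vertical (wall-like)
]

def normal_z(tri):
    """z-component of the (unnormalised) face normal via the cross product."""
    (ax, ay, _), (bx, by, _), (cx, cy, _) = tri
    return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)

def shared_edge(t1, t2):
    """Adjacent if the triangles share at least two vertices."""
    return len(set(t1) & set(t2)) >= 2

def grow_upward_regions(tris, z_min=1e-9):
    """Cluster adjacent, upward-facing triangles (candidate roof/terrain regions)."""
    upward = [i for i, t in enumerate(tris) if normal_z(t) > z_min]
    seen, regions = set(), []
    for seed in upward:
        if seed in seen:
            continue
        region, queue = [], deque([seed])
        seen.add(seed)
        while queue:
            i = queue.popleft()
            region.append(i)
            for j in upward:
                if j not in seen and shared_edge(tris[i], tris[j]):
                    seen.add(j)
                    queue.append(j)
        regions.append(sorted(region))
    return regions

print(grow_upward_regions(triangles))
```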

  14. Semantic policy and adversarial modeling for cyber threat identification and avoidance

    NASA Astrophysics Data System (ADS)

    DeFrancesco, Anton; McQueary, Bruce

    2009-05-01

    Today's enterprise networks undergo a relentless barrage of attacks from foreign and domestic adversaries. These attacks may be perpetrated with little to no funding, yet may wreak incalculable damage upon an enterprise's security, network infrastructure, and services. As more services come online, systems that were once in isolation now provide information that may be combined dynamically with information from other systems to create new meaning on the fly. Security issues are compounded by the potential to aggregate individual pieces of information and infer knowledge at a higher classification than any of its constituent parts. To help alleviate these challenges, in this paper we introduce the notion of semantic policy and discuss how its use is evolving from a robust approach to access control towards preempting and combating attacks in the cyber domain. The introduction of semantic policy and adversarial modeling to network security aims to ask where the network is most vulnerable, how the network is being attacked, and why it is being attacked. The first aspect of our approach is the integration of semantic policy into enterprise security to augment traditional network security with an overall awareness of policy access and violations. This awareness allows the semantic policy to look at the big picture, analyzing trends and identifying critical relations in system-wide data access. The second aspect of our approach is to couple adversarial modeling with semantic policy to move beyond reactive security measures and into proactive identification of system weaknesses and areas of vulnerability. By utilizing Bayesian-based methodologies, the enterprise-wide meaning of data and semantic policy is applied to probability and high-level risk identification. This risk identification will help mitigate potential harm to enterprise networks by enabling resources to proactively isolate, lock down, and secure the systems that are most vulnerable.
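    In its simplest form, Bayesian risk identification of this kind amounts to updating a prior probability of compromise when an alert is observed. A minimal sketch with hypothetical numbers (the abstract does not specify the actual model):

```python
def posterior(prior, p_alert_given_compromise, p_alert_given_clean):
    """Bayes' rule: P(compromised | alert)."""
    numerator = p_alert_given_compromise * prior
    denominator = numerator + p_alert_given_clean * (1 - prior)
    return numerator / denominator

# Hypothetical numbers: a host with a 2% base rate of compromise, and an alert
# that fires for 90% of compromised hosts but also for 5% of clean ones.
risk = posterior(0.02, 0.90, 0.05)
print(round(risk, 3))  # → 0.269
```

    Even a strong alert raises a rare event's probability only modestly, which is why chaining such updates across many signals is central to proactive risk ranking.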

  15. Collocational Processing in Light of the Phraseological Continuum Model: Does Semantic Transparency Matter?

    ERIC Educational Resources Information Center

    Gyllstad, Henrik; Wolter, Brent

    2016-01-01

    The present study investigates whether two types of word combinations (free combinations and collocations) differ in terms of processing by testing Howarth's Continuum Model based on word combination typologies from a phraseological tradition. A visual semantic judgment task was administered to advanced Swedish learners of English (n = 27) and…

  16. An Interactive Multimedia Learning Environment for VLSI Built with COSMOS

    ERIC Educational Resources Information Center

    Angelides, Marios C.; Agius, Harry W.

    2002-01-01

    This paper presents Bigger Bits, an interactive multimedia learning environment that teaches students about VLSI within the context of computer electronics. The system was built with COSMOS (Content Oriented semantic Modelling Overlay Scheme), which is a modelling scheme that we developed for enabling the semantic content of multimedia to be used…

  17. Reciprocal Effects of Self-Regulation, Semantic Knowledge, and Reading Comprehension in Early Elementary School

    ERIC Educational Resources Information Center

    Connor, Carol McDonald; Day, Stephanie L.; Phillips, Beth; Sparapani, Nicole; Ingebrand, Sarah W.; McLean, Leigh; Barrus, Angela; Kaschak, Michael P.

    2016-01-01

    Many assume that cognitive and linguistic processes, such as semantic knowledge (SK) and self-regulation (SR), subserve learned skills like reading. However, complex models of interacting and bootstrapping effects of SK, SR, instruction, and reading hypothesize reciprocal effects. Testing this "lattice" model with children (n = 852)…

  18. Lexical Selection Is Competitive: Evidence from Indirectly Activated Semantic Associates during Picture Naming

    ERIC Educational Resources Information Center

    Melinger, Alissa; Rahman, Rasha Abdel

    2013-01-01

    In this study, we present 3 picture-word interference (PWI) experiments designed to investigate whether lexical selection processes are competitive. We focus on semantic associative relations, which should interfere according to competitive models but not according to certain noncompetitive models. In a modified version of the PWI paradigm,…

  19. Meta-Theoretical Contributions to the Constitution of a Model-Based Didactics of Science

    NASA Astrophysics Data System (ADS)

    Ariza, Yefrin; Lorenzano, Pablo; Adúriz-Bravo, Agustín

    2016-10-01

    There is nowadays consensus in the community of didactics of science (i.e. science education understood as an academic discipline) regarding the need to include the philosophy of science in didactical research, science teacher education, curriculum design, and the practice of science education at all educational levels. Some authors have identified an ever-increasing use of the concept of `theoretical model', stemming from the so-called semantic view of scientific theories. However, it can be recognised that, in didactics of science, there are over-simplified transpositions of the idea of model (and of other meta-theoretical ideas). In this sense, contemporary philosophy of science is often blurred or distorted in the science education literature. In this paper, we address the discussion around some meta-theoretical concepts that are introduced into didactics of science due to their perceived educational value. We argue for the existence of a `semantic family', and we characterise four different versions of semantic views existing within the family. In particular, we seek to contribute to establishing a model-based didactics of science mainly supported by this semantic family.

  20. A Bayesian generative model for learning semantic hierarchies

    PubMed Central

    Mittelman, Roni; Sun, Min; Kuipers, Benjamin; Savarese, Silvio

    2014-01-01

    Building fine-grained visual recognition systems that are capable of recognizing tens of thousands of categories has received much attention in recent years. The well-known semantic hierarchical structure of categories and concepts has been shown to provide a key prior that allows for optimal predictions. The hierarchical organization of various domains and concepts has been subject to extensive research, and led to the development of the WordNet domains hierarchy (Fellbaum, 1998), which was also used to organize the images in the ImageNet (Deng et al., 2009) dataset, in which the category count approaches the human capacity. Still, for the human visual system, the form of the hierarchy must be discovered with minimal use of supervision or innate knowledge. In this work, we propose a new Bayesian generative model for learning such domain hierarchies, based on semantic input. Our model is motivated by the super-subordinate organization of domain labels and concepts that characterizes WordNet, and accounts for several important challenges: maintaining context information when progressing deeper into the hierarchy, learning a coherent semantic concept for each node, and modeling uncertainty in the perception process. PMID:24904452

  1. Topographic mapping data semantics through data conversion and enhancement: Chapter 7

    USGS Publications Warehouse

    Varanka, Dalia; Carter, Jonathan; Usery, E. Lynn; Shoberg, Thomas; Edited by Ashish, Naveen; Sheth, Amit P.

    2011-01-01

    This paper presents research on the semantics of topographic data for triples and ontologies to blend the capabilities of the Semantic Web and The National Map of the U.S. Geological Survey. Automated conversion of relational topographic data of several geographic sample areas to the triple data model standard resulted in relatively poor semantic associations. Further research employed vocabularies of feature type and spatial relation terms. A user interface was designed to model the capture of non-standard terms relevant to public users and to map those terms to existing data models of The National Map through the use of ontology. Server access for the study area triple stores was made publicly available, illustrating how the development of linked data may transform institutional policies to open government data resources to the public. This paper presents these data conversion and research techniques that were tested as open linked data concepts leveraged through a user-centered interface and open USGS server access to the public.
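    The flat, column-to-predicate style of automated conversion described above can be sketched in a few lines. This is a minimal, hypothetical illustration (the base URI and column names are invented, not USGS vocabulary); it also hints at why naive conversion yields relatively poor semantic associations: every attribute becomes an opaque literal-valued predicate, with no feature-type or spatial-relation semantics attached.

```python
# Sketch: naive conversion of a relational topographic record to N-Triples.
# BASE and the column names are hypothetical, not actual USGS vocabulary.
BASE = "http://example.org/topo/"

def row_to_triples(table, row_id, row):
    subject = f"<{BASE}{table}/{row_id}>"
    triples = []
    for column, value in row.items():
        # Each column becomes a flat predicate with a plain literal object;
        # no feature-type or spatial-relation semantics survive the mapping.
        predicate = f"<{BASE}vocab#{column}>"
        triples.append(f'{subject} {predicate} "{value}" .')
    return triples

triples = row_to_triples("stream", 42, {"name": "Bear Creek", "ftype": "460"})
```

    Mapping the non-standard user terms onto a feature-type vocabulary, as the paper's later research does, is precisely what this naive column-per-predicate scheme lacks.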

  2. Semantic Interaction for Sensemaking: Inferring Analytical Reasoning for Model Steering.

    PubMed

    Endert, A; Fiaux, P; North, C

    2012-12-01

    Visual analytic tools aim to support the cognitively demanding task of sensemaking. Their success often depends on the ability to leverage capabilities of mathematical models, visualization, and human intuition through flexible, usable, and expressive interactions. Spatially clustering data is one effective metaphor for users to explore similarity and relationships between information, adjusting the weighting of dimensions or characteristics of the dataset to observe the change in the spatial layout. Semantic interaction is an approach to user interaction in such spatializations that couples these parametric modifications of the clustering model with users' analytic operations on the data (e.g., direct document movement in the spatialization, highlighting text, search, etc.). In this paper, we present results of a user study exploring the ability of semantic interaction in a visual analytic prototype, ForceSPIRE, to support sensemaking. We found that semantic interaction captures the analytical reasoning of the user through keyword weighting, and aids the user in co-creating a spatialization based on the user's reasoning and intuition.
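    The keyword-weighting idea admits a compact sketch: when the analyst drags two documents together in the spatialization, the terms those documents share gain weight in the underlying similarity model. The following is a minimal, hypothetical illustration; ForceSPIRE's actual update rule and feature model are richer than this.

```python
def reinforce_shared_terms(weights, doc_a, doc_b, step=0.1):
    """Up-weight the terms two documents share after the analyst drags
    them together, then renormalize. A toy sketch of semantic
    interaction's keyword weighting, not ForceSPIRE's actual rule."""
    updated = dict(weights)
    for term in set(doc_a) & set(doc_b):
        updated[term] = updated.get(term, 0.0) + step
    total = sum(updated.values())
    return {t: w / total for t, w in updated.items()}

weights = {"mining": 0.5, "semantics": 0.5}
new_w = reinforce_shared_terms(weights, ["mining", "model"], ["mining", "data"])
```

    After the interaction, the shared term carries more weight, so the clustering model's next layout pulls such documents closer, which is the co-creation loop the study describes.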

  3. Using semantic technologies and the OSU ontology for modelling context and activities in multi-sensory surveillance systems

    NASA Astrophysics Data System (ADS)

    Gómez A, Héctor F.; Martínez-Tomás, Rafael; Arias Tapia, Susana A.; Rincón Zamorano, Mariano

    2014-04-01

    Automatic systems that monitor human behaviour to detect security problems are a challenge today. Previously, our group defined the Horus framework, a modular architecture for the integration of multi-sensor monitoring stages. In this work, the structure and technologies required for the high-level semantic stages of Horus are proposed, and the associated methodological principles are established with the aim of recognising specific behaviours and situations. Our methodology distinguishes three semantic levels of events: low level (tied to sensors), medium level (tied to context), and high level (target behaviours). The ontology for surveillance and ubiquitous computing has been used to integrate ontologies from specific domains and, together with semantic technologies, has facilitated the modelling and implementation of scenes and situations by reusing components. A home context and a supermarket context were modelled following this approach, and three suspicious activities were monitored via different virtual sensors. The experiments demonstrate that our proposals facilitate the rapid prototyping of systems of this kind.

  4. Determining similarity of scientific entities in annotation datasets

    PubMed Central

    Palma, Guillermo; Vidal, Maria-Esther; Haag, Eric; Raschid, Louiqa; Thor, Andreas

    2015-01-01

    Linked Open Data initiatives have made available a diversity of scientific collections where scientists have annotated entities in the datasets with controlled vocabulary terms from ontologies. Annotations encode scientific knowledge, which is captured in annotation datasets. Determining relatedness between annotated entities becomes a building block for pattern mining, e.g. identifying drug–drug relationships may depend on the similarity of the targets that interact with each drug. A diversity of similarity measures has been proposed in the literature to compute relatedness between a pair of entities. Each measure exploits some knowledge including the name, function, relationships with other entities, taxonomic neighborhood and semantic knowledge. We propose a novel general-purpose annotation similarity measure called ‘AnnSim’ that measures the relatedness between two entities based on the similarity of their annotations. We model AnnSim as a 1–1 maximum weight bipartite match and exploit properties of existing solvers to provide an efficient solution. We empirically study the performance of AnnSim on real-world datasets of drugs and disease associations from clinical trials and relationships between drugs and (genomic) targets. Using baselines that include a variety of measures, we identify where AnnSim can provide a deeper understanding of the semantics underlying the relatedness of a pair of entities or where it could lead to predicting new links or identifying potential novel patterns. Although AnnSim does not exploit knowledge or properties of a particular domain, its performance compares well with a variety of state-of-the-art domain-specific measures. Database URL: http://www.yeastgenome.org/ PMID:25725057

  5. Determining similarity of scientific entities in annotation datasets.

    PubMed

    Palma, Guillermo; Vidal, Maria-Esther; Haag, Eric; Raschid, Louiqa; Thor, Andreas

    2015-01-01

    Linked Open Data initiatives have made available a diversity of scientific collections where scientists have annotated entities in the datasets with controlled vocabulary terms from ontologies. Annotations encode scientific knowledge, which is captured in annotation datasets. Determining relatedness between annotated entities becomes a building block for pattern mining, e.g. identifying drug-drug relationships may depend on the similarity of the targets that interact with each drug. A diversity of similarity measures has been proposed in the literature to compute relatedness between a pair of entities. Each measure exploits some knowledge including the name, function, relationships with other entities, taxonomic neighborhood and semantic knowledge. We propose a novel general-purpose annotation similarity measure called 'AnnSim' that measures the relatedness between two entities based on the similarity of their annotations. We model AnnSim as a 1-1 maximum weight bipartite match and exploit properties of existing solvers to provide an efficient solution. We empirically study the performance of AnnSim on real-world datasets of drugs and disease associations from clinical trials and relationships between drugs and (genomic) targets. Using baselines that include a variety of measures, we identify where AnnSim can provide a deeper understanding of the semantics underlying the relatedness of a pair of entities or where it could lead to predicting new links or identifying potential novel patterns. Although AnnSim does not exploit knowledge or properties of a particular domain, its performance compares well with a variety of state-of-the-art domain-specific measures. Database URL: http://www.yeastgenome.org/ © The Author(s) 2015. Published by Oxford University Press.
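    The 1-1 maximum-weight bipartite matching at the core of AnnSim can be illustrated with a brute-force toy. The exact-match term similarity below is a stand-in for the ontology-based measures the paper uses, and the exhaustive search is only viable for tiny annotation sets; the paper exploits properties of efficient matching solvers instead.

```python
from itertools import permutations

def ann_sim(annots_a, annots_b, term_sim):
    """Toy AnnSim-style score: maximum-weight 1-1 bipartite matching
    between two annotation sets, normalized by their combined size.
    Brute force over assignments -- illustrative only."""
    small, large = sorted([annots_a, annots_b], key=len)
    small_list = list(small)
    best = 0.0
    # Try every injective assignment of the smaller set into the larger one.
    for perm in permutations(large, len(small_list)):
        weight = sum(term_sim(s, t) for s, t in zip(small_list, perm))
        best = max(best, weight)
    return 2.0 * best / (len(annots_a) + len(annots_b))

def exact(s, t):
    # Hypothetical term-level similarity: identical terms only.
    return 1.0 if s == t else 0.0

score = ann_sim({"GO:0001", "GO:0002"}, {"GO:0002", "GO:0003"}, exact)
```

    Here the two entities share one of their two annotations, so the matching weight is 1 and the normalized score is 0.5.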

  6. Informatics Support for Basic Research in Biomedicine

    PubMed Central

    Rindflesch, Thomas C.; Blake, Catherine L.; Fiszman, Marcelo; Kilicoglu, Halil; Rosemblat, Graciela; Schneider, Jodi; Zeiss, Caroline J.

    2017-01-01

    Informatics methodologies exploit computer-assisted techniques to help biomedical researchers manage large amounts of information. In this paper, we focus on the biomedical research literature (MEDLINE). We first provide an overview of some text mining techniques that offer assistance in research by identifying biomedical entities (e.g., genes, substances, and diseases) and relations between them in text. We then discuss Semantic MEDLINE, an application that integrates PubMed document retrieval, concept and relation identification, and visualization, thus enabling a user to explore concepts and relations from within a set of retrieved citations. Semantic MEDLINE provides a roadmap through content and helps users discern patterns in large numbers of retrieved citations. We illustrate its use with an informatics method we call “discovery browsing,” which provides a principled way of navigating through selected aspects of some biomedical research area. The method supports an iterative process that accommodates learning and hypothesis formation in which a user is provided with high level connections before delving into details. As a use case, we examine current developments in basic research on mechanisms of Alzheimer’s disease. Out of the nearly 90,000 citations returned by the PubMed query “Alzheimer’s disease,” discovery browsing led us to 73 citations on sortilin and that disorder. We provide a synopsis of the basic research reported in 15 of these. There is widespread consensus among researchers working with a range of animal models and human cells that increased sortilin expression and decreased receptor expression are associated with amyloid beta and/or amyloid precursor protein. PMID:28838071

  7. Archetype-based semantic integration and standardization of clinical data.

    PubMed

    Moner, David; Maldonado, Jose A; Bosca, Diego; Fernandez, Jesualdo T; Angulo, Carlos; Crespo, Pere; Vivancos, Pedro J; Robles, Montserrat

    2006-01-01

    One of the basic needs of any healthcare professional is to be able to access clinical information about patients in an understandable and normalized way. The lifelong clinical information of any person supported by electronic means configures his/her Electronic Health Record (EHR). This information is usually distributed among several independent and heterogeneous systems that may be syntactically or semantically incompatible. The Dual Model architecture has appeared as a new proposal for maintaining a homogeneous representation of the EHR with a clear separation between information and knowledge. Information is represented by a Reference Model, which describes common data structures with minimal semantics. Knowledge is specified by archetypes, which are formal representations of clinical concepts built upon a particular Reference Model. This kind of architecture was originally intended for the implementation of new clinical information systems, but archetypes can also be used for integrating data from existing, non-normalized systems, adding semantic meaning to the integrated data at the same time. In this paper we explain the possible use of a Dual Model approach for semantic integration and standardization of heterogeneous clinical data sources and present LinkEHR-Ed, a tool for developing archetypes as elements for integration purposes. LinkEHR-Ed has been designed to be easily used by the two main participants in the creation of archetypes for clinical data integration: the health domain expert and the information technologies domain expert.

  8. Extracting semantically enriched events from biomedical literature

    PubMed Central

    2012-01-01

    Background: Research into event-based text mining from the biomedical literature has been growing in popularity to facilitate the development of advanced biomedical text mining systems. Such technology permits advanced search, which goes beyond document or sentence-based retrieval. However, existing event-based systems typically ignore additional information within the textual context of events that can determine, amongst other things, whether an event represents a fact, hypothesis, experimental result or analysis of results, whether it describes new or previously reported knowledge, and whether it is speculated or negated. We refer to such contextual information as meta-knowledge. The automatic recognition of such information can permit the training of systems allowing finer-grained searching of events according to the meta-knowledge that is associated with them. Results: Based on a corpus of 1,000 MEDLINE abstracts, fully manually annotated with both events and associated meta-knowledge, we have constructed a machine learning-based system that automatically assigns meta-knowledge information to events. This system has been integrated into EventMine, a state-of-the-art event extraction system, in order to create a more advanced system (EventMine-MK) that not only extracts events from text automatically, but also assigns five different types of meta-knowledge to these events. The meta-knowledge assignment module of EventMine-MK performs with macro-averaged F-scores in the range of 57-87% on the BioNLP’09 Shared Task corpus. EventMine-MK has been evaluated on the BioNLP’09 Shared Task subtask of detecting negated and speculated events. Our results show that EventMine-MK can outperform other state-of-the-art systems that participated in this task. Conclusions: We have constructed the first practical system that extracts both events and associated, detailed meta-knowledge information from biomedical literature.
The automatically assigned meta-knowledge information can be used to refine search systems, in order to provide an extra search layer beyond entities and assertions, dealing with phenomena such as rhetorical intent, speculations, contradictions and negations. This finer grained search functionality can assist in several important tasks, e.g., database curation (by locating new experimental knowledge) and pathway enrichment (by providing information for inference). To allow easy integration into text mining systems, EventMine-MK is provided as a UIMA component that can be used in the interoperable text mining infrastructure, U-Compare. PMID:22621266

  9. Extracting semantically enriched events from biomedical literature.

    PubMed

    Miwa, Makoto; Thompson, Paul; McNaught, John; Kell, Douglas B; Ananiadou, Sophia

    2012-05-23

    Research into event-based text mining from the biomedical literature has been growing in popularity to facilitate the development of advanced biomedical text mining systems. Such technology permits advanced search, which goes beyond document or sentence-based retrieval. However, existing event-based systems typically ignore additional information within the textual context of events that can determine, amongst other things, whether an event represents a fact, hypothesis, experimental result or analysis of results, whether it describes new or previously reported knowledge, and whether it is speculated or negated. We refer to such contextual information as meta-knowledge. The automatic recognition of such information can permit the training of systems allowing finer-grained searching of events according to the meta-knowledge that is associated with them. Based on a corpus of 1,000 MEDLINE abstracts, fully manually annotated with both events and associated meta-knowledge, we have constructed a machine learning-based system that automatically assigns meta-knowledge information to events. This system has been integrated into EventMine, a state-of-the-art event extraction system, in order to create a more advanced system (EventMine-MK) that not only extracts events from text automatically, but also assigns five different types of meta-knowledge to these events. The meta-knowledge assignment module of EventMine-MK performs with macro-averaged F-scores in the range of 57-87% on the BioNLP'09 Shared Task corpus. EventMine-MK has been evaluated on the BioNLP'09 Shared Task subtask of detecting negated and speculated events. Our results show that EventMine-MK can outperform other state-of-the-art systems that participated in this task. We have constructed the first practical system that extracts both events and associated, detailed meta-knowledge information from biomedical literature. 
The automatically assigned meta-knowledge information can be used to refine search systems, in order to provide an extra search layer beyond entities and assertions, dealing with phenomena such as rhetorical intent, speculations, contradictions and negations. This finer grained search functionality can assist in several important tasks, e.g., database curation (by locating new experimental knowledge) and pathway enrichment (by providing information for inference). To allow easy integration into text mining systems, EventMine-MK is provided as a UIMA component that can be used in the interoperable text mining infrastructure, U-Compare.
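    As a toy stand-in for the learned classifiers, a cue-word heuristic conveys the idea of assigning negation and speculation meta-knowledge to an event's textual context. The cue lists below are illustrative only; EventMine-MK assigns five meta-knowledge types with trained models over much richer features.

```python
# Toy cue-based assignment of two meta-knowledge dimensions (negation,
# speculation) to an event's sentence. Illustrative cue lists only --
# not EventMine-MK's actual feature set or classifiers.
NEGATION_CUES = {"not", "no", "failed", "absence", "lack"}
SPECULATION_CUES = {"may", "might", "suggest", "possibly", "appear"}

def meta_knowledge(sentence):
    tokens = {t.strip(".,;:").lower() for t in sentence.split()}
    return {
        "negated": bool(tokens & NEGATION_CUES),
        "speculated": bool(tokens & SPECULATION_CUES),
    }

mk = meta_knowledge("These results suggest that p53 may not bind DNA.")
```

    A search layer can then filter retrieved events on such labels, e.g. keeping only non-negated, non-speculated statements when curating a database of established facts.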

  10. The influence of speech rate and accent on access and use of semantic information.

    PubMed

    Sajin, Stanislav M; Connine, Cynthia M

    2017-04-01

    Circumstances in which the speech input is presented in sub-optimal conditions generally lead to processing costs affecting spoken word recognition. The current study indicates that some processing demands imposed by listening to difficult speech can be mitigated by feedback from semantic knowledge. A set of lexical decision experiments examined how foreign accented speech and word duration impact access to semantic knowledge in spoken word recognition. Results indicate that when listeners process accented speech, the reliance on semantic information increases. Speech rate was not observed to influence semantic access, except in the setting in which unusually slow accented speech was presented. These findings support interactive activation models of spoken word recognition in which attention is modulated based on speech demands.

  11. Semantic transparency affects morphological priming . . . eventually.

    PubMed

    Heyer, Vera; Kornishova, Dana

    2018-05-01

    Semantic transparency has been a focus of psycholinguistic research for decades, with the controversy about the time course of the application of morpho-semantic information during the processing of morphologically complex words not yet resolved. This study reports two masked priming studies with English -ness and Russian -ost' nominalisations, investigating how semantic transparency modulates native speakers' morphological priming effects at short and long stimulus onset asynchronies (SOAs). In both languages, we found increased morphological priming for nominalisations at the transparent end of the scale (e.g. paleness - pale) in comparison to items at the opaque end of the scale (e.g. business - busy) but only at longer prime durations. The present findings are in line with models that posit an initial phase of morpho-orthographic (semantically blind) decomposition.

  12. What is in a contour map? A region-based logical formalization of contour semantics

    USGS Publications Warehouse

    Usery, E. Lynn; Hahmann, Torsten

    2015-01-01

    This paper analyses and formalizes contour semantics in a first-order logic ontology that forms the basis for enabling computational common sense reasoning about contour information. The elicited contour semantics comprises four key concepts – contour regions, contour lines, contour values, and contour sets – and their subclasses and associated relations, which are grounded in an existing qualitative spatial ontology. All concepts and relations are illustrated and motivated by physical-geographic features identifiable on topographic contour maps. The encoding of the semantics of contour concepts in first-order logic and a derived conceptual model as basis for an OWL ontology lay the foundation for fully automated, semantically-aware qualitative and quantitative reasoning about contours.

  13. Semantics-Based Interoperability Framework for the Geosciences

    NASA Astrophysics Data System (ADS)

    Sinha, A.; Malik, Z.; Raskin, R.; Barnes, C.; Fox, P.; McGuinness, D.; Lin, K.

    2008-12-01

    Interoperability between heterogeneous data, tools and services is required to transform data to knowledge. To meet geoscience-oriented societal challenges such as forcing of climate change induced by volcanic eruptions, we suggest the need to develop semantic interoperability for data, services, and processes. Because such scientific endeavors require integration of multiple databases associated with global enterprises, implicit semantic-based integration is impossible. Instead, explicit semantics are needed to facilitate interoperability and integration. Although different types of integration models are available (syntactic or semantic), we suggest that semantic interoperability is likely to be the most successful pathway. Clearly, the geoscience community would benefit from utilization of existing XML-based data models, such as GeoSciML, WaterML, etc., to rapidly advance semantic interoperability and integration. We recognize that such integration will require a "meanings-based search, reasoning and information brokering", which will be facilitated through inter-ontology relationships (ontologies defined for each discipline). We suggest that Markup languages (MLs) and ontologies can be seen as "data integration facilitators", working at different abstraction levels. Therefore, we propose to use an ontology-based data registration and discovery approach to complement mark-up languages through semantic data enrichment. Ontologies allow the use of formal and descriptive logic statements, which permit expressive query capabilities for data integration through reasoning. We have developed domain ontologies (EPONT) to capture the concept behind data. EPONT ontologies are associated with existing ontologies such as SUMO, DOLCE and SWEET. Although significant efforts have gone into developing data (object) ontologies, we advance the idea of developing semantic frameworks for additional ontologies that deal with processes and services.
This evolutionary step will facilitate the integrative capabilities of scientists as we examine the relationships between data and external factors such as processes that may influence our understanding of "why" certain events happen. We emphasize the need to go from analysis of data to concepts related to scientific principles of thermodynamics, kinetics, heat flow, mass transfer, etc. Towards meeting these objectives, we report on a pair of related service engines: DIA (Discovery, integration and analysis), and SEDRE (Semantically-Enabled Data Registration Engine) that utilize ontologies for semantic interoperability and integration.

  14. Understanding Nomophobia: Structural Equation Modeling and Semantic Network Analysis of Smartphone Separation Anxiety.

    PubMed

    Han, Seunghee; Kim, Ki Joon; Kim, Jang Hyun

    2017-07-01

    This study explicates nomophobia by developing a research model that identifies several determinants of smartphone separation anxiety and by conducting semantic network analyses on smartphone users' verbal descriptions of the meaning of their smartphones. Structural equation modeling of the proposed model indicates that personal memories evoked by smartphones encourage users to extend their identity onto their devices. When users perceive smartphones as their extended selves, they are more likely to get attached to the devices, which, in turn, leads to nomophobia by heightening the phone proximity-seeking tendency. This finding is also supplemented by the results of the semantic network analyses revealing that the words related to memory, self, and proximity-seeking are indeed more frequently used in the high, compared with low, nomophobia group.

  15. Development of Semantic Description for Multiscale Models of Thermo-Mechanical Treatment of Metal Alloys

    NASA Astrophysics Data System (ADS)

    Macioł, Piotr; Regulski, Krzysztof

    2016-08-01

    We present a process of semantic meta-model development for data management in an adaptable multiscale modeling framework. The main problems in ontology design are discussed, and a solution achieved as a result of the research is presented. The main concepts concerning the application and data management background for multiscale modeling were derived from the AM3 approach—object-oriented Agile multiscale modeling methodology. The ontological description of multiscale models enables validation of semantic correctness of data interchange between submodels. We also present a possibility of using the ontological model as a supervisor in conjunction with a multiscale model controller and a knowledge base system. Multiscale modeling formal ontology (MMFO), designed for describing multiscale models' data and structures, is presented. A need for applying meta-ontology in the MMFO development process is discussed. Examples of MMFO application in describing thermo-mechanical treatment of metal alloys are discussed. Present and future applications of MMFO are described.

  16. Novel metaphor comprehension: Semantic neighbourhood density interacts with concreteness.

    PubMed

    Al-Azary, Hamad; Buchanan, Lori

    2017-02-01

    Previous research suggests that metaphor comprehension is affected both by the concreteness of the topic and vehicle and their semantic neighbours (Kintsch, 2000; Xu, 2010). However, studies have yet to manipulate these 2 variables simultaneously. To that end, we composed novel metaphors manipulated on topic concreteness and semantic neighbourhood density (SND) of topic and vehicle. In Experiment 1, participants rated the metaphors on the suitability (e.g. sensibility) of their topic-vehicle pairings. Topic concreteness interacted with SND such that participants rated metaphors from sparse semantic spaces to be more sensible than those from dense semantic spaces and preferred abstract topics over concrete topics only for metaphors from dense semantic spaces. In Experiments 2 and 3, we used presentation deadlines and found that topic concreteness and SND affect the online processing stages associated with metaphor comprehension. We discuss how the results are aligned with established psycholinguistic models of metaphor comprehension.

  17. Sealife: a semantic grid browser for the life sciences applied to the study of infectious diseases.

    PubMed

    Schroeder, Michael; Burger, Albert; Kostkova, Patty; Stevens, Robert; Habermann, Bianca; Dieng-Kuntz, Rose

    2006-01-01

    The objective of Sealife is the conception and realisation of a semantic Grid browser for the life sciences, which will link the existing Web to the currently emerging eScience infrastructure. The SeaLife Browser will allow users to automatically link a host of Web servers and Web/Grid services to the Web content they are visiting. This will be accomplished using eScience's growing number of Web/Grid Services and its XML-based standards and ontologies. The browser will identify terms in the pages being browsed through the background knowledge held in ontologies. Through the use of Semantic Hyperlinks, which link identified ontology terms to servers and services, the SeaLife Browser will offer a new dimension of context-based information integration. In this paper, we give an overview of the different components of the browser and their interplay. This SeaLife Browser will be demonstrated within three application scenarios in evidence-based medicine, literature & patent mining, and molecular biology, all relating to the study of infectious diseases. The three applications vertically integrate the molecule/cell, the tissue/organ and the patient/population level by covering the analysis of high-throughput screening data for endocytosis (the molecular entry pathway into the cell), the expression of proteins in the spatial context of tissue and organs, and a high-level library on infectious diseases designed for clinicians and their patients. For more information see http://www.biote.ctu-dresden.de/sealife.

  18. Developing a kidney and urinary pathway knowledge base

    PubMed Central

    2011-01-01

    Background: Chronic renal disease is a global health problem. The identification of suitable biomarkers could facilitate early detection and diagnosis and allow better understanding of the underlying pathology. One of the challenges in meeting this goal is the necessary integration of experimental results from multiple biological levels for further analysis by data mining. Data integration in the life sciences is still a struggle, and many groups are looking to the benefits promised by the Semantic Web for data integration. Results: We present a Semantic Web approach to developing a knowledge base that integrates data from high-throughput experiments on kidney and urine. A specialised KUP ontology is used to tie the various layers together, whilst background knowledge from external databases is incorporated by conversion into RDF. Using SPARQL as a query mechanism, we are able to query for proteins expressed in urine and place these back into the context of genes expressed in regions of the kidney. Conclusions: The KUPKB gives KUP biologists the means to ask queries across many resources in order to aggregate knowledge that is necessary for answering biological questions. The Semantic Web technologies we use, together with the background knowledge from the domain’s ontologies, allow both rapid conversion and integration of this knowledge base. The KUPKB is still relatively small, but questions remain about scalability, maintenance and availability of the knowledge itself. Availability: The KUPKB may be accessed via http://www.e-lico.eu/kupkb. PMID:21624162
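    The urine-to-kidney query pattern described above might look roughly like the following SPARQL, held here as a Python string. The prefix and predicate names are invented for illustration; they are not the actual KUP ontology terms.

```python
# Hypothetical SPARQL in the spirit of the KUPKB use case: proteins
# annotated as expressed in urine, joined to the genes encoding them.
# The kup: namespace and predicates are illustrative, not the real ontology.
QUERY = """
PREFIX kup: <http://example.org/kup#>
SELECT ?protein ?gene
WHERE {
  ?protein kup:expressedIn kup:Urine .
  ?protein kup:encodedBy   ?gene .
}
"""
```

    The join on `?protein` is what places a urine-derived protein back into the context of its gene, which can then be linked onward to kidney-region expression data in the same store.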

  19. Dissociation of lexical syntax and semantics: evidence from focal cortical degeneration.

    PubMed

    Garrard, P; Carroll, E; Vinson, D; Vigliocco, G

    2004-10-01

    The question of whether information relevant to meaning (semantics) and structure (syntax) relies on a common language processor or on separate subsystems has proved difficult to address definitively because of the confounds involved in comparing the two types of information. At the sentence level syntactic and semantic judgments make different cognitive demands, while at the single word level, the most commonly used syntactic distinction (between nouns and verbs) is confounded with a fundamental semantic difference (between objects and actions). The present study employs a different syntactic contrast (between count nouns and mass nouns), which is crossed with a semantic difference (between naturally occurring and man-made substances) applying to words within a circumscribed semantic field (foodstuffs). We show, first, that grammaticality judgments of a patient with semantic dementia are indistinguishable from those of a group of age-matched controls, and are similar regardless of the status of his semantic knowledge about the item. In a second experiment we use the triadic task in a group of age-matched controls to show that similarity judgments are influenced not only by meaning (natural vs. man-made), but also implicitly by syntactic information (count vs. mass). Using the same task in a patient with semantic dementia we show that the semantic influences on the syntactic dimension are unlikely to account for this pattern in normals. These data are discussed in relation to modular vs. nonmodular models of language processing, and in particular to the semantic-syntactic distinction.

  20. A federated semantic metadata registry framework for enabling interoperability across clinical research and care domains.

    PubMed

    Sinaci, A Anil; Laleci Erturkmen, Gokce B

    2013-10-01

    In order to enable secondary use of Electronic Health Records (EHRs) by bridging the interoperability gap between clinical care and research domains, in this paper, a unified methodology and the supporting framework is introduced which brings together the power of metadata registries (MDR) and semantic web technologies. We introduce a federated semantic metadata registry framework by extending the ISO/IEC 11179 standard, and enable integration of data element registries through Linked Open Data (LOD) principles where each Common Data Element (CDE) can be uniquely referenced, queried and processed to enable syntactic and semantic interoperability. Each CDE and their components are maintained as LOD resources enabling semantic links with other CDEs, terminology systems and with implementation dependent content models; hence facilitating semantic search, more effective reuse and semantic interoperability across different application domains. There are several important efforts addressing semantic interoperability in the healthcare domain such as the IHE DEX profile proposal, CDISC SHARE and CDISC2RDF. Our architecture complements these by providing a framework to interlink existing data element registries and repositories, multiplying their potential for semantic interoperability to a greater extent. Open source implementation of the federated semantic MDR framework presented in this paper is the core of the semantic interoperability layer of the SALUS project which enables the execution of the post marketing safety analysis studies on top of existing EHR systems. Copyright © 2013 Elsevier Inc. All rights reserved.
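
    The linking idea can be illustrated roughly as follows (URIs and link names are invented; the actual framework extends the ISO/IEC 11179 metamodel): each CDE is a uniquely referenceable resource, and equivalence links are resolved across registries in the federation.

```python
# Two hypothetical metadata registries; each CDE is keyed by a dereferenceable URI.
registry_a = {
    "http://mdr-a.example/cde/systolic-bp": {
        "label": "Systolic blood pressure",
        "boundTo": "http://loinc.example/8480-6",   # link to a terminology system
        "sameAs": ["http://mdr-b.example/cde/sbp"],  # link to an equivalent CDE
    },
}
registry_b = {
    "http://mdr-b.example/cde/sbp": {
        "label": "SBP measurement",
        "boundTo": "http://loinc.example/8480-6",
        "sameAs": [],
    },
}
federation = {**registry_a, **registry_b}  # the federated view of all registries

def resolve(uri, hops=2):
    """Follow sameAs links across registries, collecting equivalent CDEs."""
    seen, frontier = set(), [uri]
    while frontier and hops >= 0:
        nxt = []
        for u in frontier:
            if u in seen or u not in federation:
                continue
            seen.add(u)
            nxt.extend(federation[u]["sameAs"])
        frontier, hops = nxt, hops - 1
    return seen

equivalents = resolve("http://mdr-a.example/cde/systolic-bp")
```

    In the real framework these links are RDF properties resolved over HTTP, which is what lets CDEs in different registries be queried and reused uniformly.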

  1. Facilitation and interference in naming: A consequence of the same learning process?

    PubMed

    Hughes, Julie W; Schnur, Tatiana T

    2017-08-01

    Our success with naming depends on what we have named previously, a phenomenon thought to reflect learning processes. Repeatedly producing the same name facilitates language production (i.e., repetition priming), whereas producing semantically related names hinders subsequent performance (i.e., semantic interference). Semantic interference is found whether naming categorically related items once (continuous naming) or multiple times (blocked cyclic naming). A computational model suggests that the same learning mechanism responsible for facilitation in repetition creates semantic interference in categorical naming (Oppenheim, Dell, & Schwartz, 2010). Accordingly, we tested the predictions that variability in semantic interference is correlated across categorical naming tasks and is caused by learning, as measured by two repetition priming tasks (picture-picture repetition priming, Exp. 1; definition-picture repetition priming, Exp. 2, e.g., Wheeldon & Monsell, 1992). In Experiment 1 (77 subjects) semantic interference and repetition priming effects were robust, but the results revealed no relationship between semantic interference effects across contexts. Critically, learning (picture-picture repetition priming) did not predict semantic interference effects in either task. We replicated these results in Experiment 2 (81 subjects), finding no relationship between semantic interference effects across tasks or between semantic interference effects and learning (definition-picture repetition priming). We conclude that the changes underlying facilitatory and interfering effects inherent to lexical access are the result of distinct learning processes where multiple mechanisms contribute to semantic interference in naming. Copyright © 2017 Elsevier B.V. All rights reserved.

  2. Semantic framework for mapping object-oriented model to semantic web languages

    PubMed Central

    Ježek, Petr; Mouček, Roman

    2015-01-01

    The article deals with and discusses two main approaches in building semantic structures for electrophysiological metadata. It is the use of conventional data structures, repositories, and programming languages on one hand and the use of formal representations of ontologies, known from knowledge representation, such as description logics or semantic web languages on the other hand. Although knowledge engineering offers languages supporting richer semantic means of expression and technologically advanced approaches, conventional data structures and repositories are still popular among developers, administrators and users because of their simplicity, overall intelligibility, and lower demands on technical equipment. The choice of conventional data resources and repositories, however, raises the question of how and where to add semantics that cannot be naturally expressed using them. As one of the possible solutions, this semantics can be added into the structures of the programming language that accesses and processes the underlying data. To support this idea we introduced a software prototype that enables its users to add semantically richer expressions into a Java object-oriented code. This approach does not burden users with additional demands on the programming environment since reflective Java annotations were used as an entry for these expressions. Moreover, additional semantics need not be written by the programmer directly to the code, but it can be collected from non-programmers using a graphical user interface. The mapping that allows the transformation of the semantically enriched Java code into the Semantic Web language OWL was proposed and implemented in a library named the Semantic Framework. This approach was validated by the integration of the Semantic Framework in the EEG/ERP Portal and by the subsequent registration of the EEG/ERP Portal in the Neuroscience Information Framework. PMID:25762923
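
    The annotation-to-ontology mapping can be loosely mimicked in Python (the actual Semantic Framework uses Java reflective annotations and emits OWL; every name below is hypothetical): metadata is attached to an ordinary class and later harvested into OWL-style statements.

```python
# Registry of semantic metadata collected from decorated classes.
SEMANTICS = {}

def semantic(owl_class, **properties):
    """Class decorator standing in for a reflective Java annotation."""
    def wrap(cls):
        SEMANTICS[cls.__name__] = {"owlClass": owl_class, "props": properties}
        return cls
    return wrap

@semantic("eeg:Experiment", hasSubject="eeg:Person", usesDevice="eeg:Amplifier")
class Experiment:
    """Plain domain class; its semantics live in the decorator, not the code body."""
    pass

def to_axioms(name):
    """Render the collected metadata as simple OWL-style statements."""
    meta = SEMANTICS[name]
    axioms = [f"{name} rdf:type owl:Class",
              f"{name} owl:equivalentClass {meta['owlClass']}"]
    for prop, rng in meta["props"].items():
        axioms.append(f"{name} {prop} some {rng}")
    return axioms
```

    The point of the pattern is that the domain class itself stays plain; the semantics ride alongside in metadata that a mapping library can serialise to OWL.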

  3. Semantic framework for mapping object-oriented model to semantic web languages.

    PubMed

    Ježek, Petr; Mouček, Roman

    2015-01-01

    The article deals with and discusses two main approaches in building semantic structures for electrophysiological metadata. It is the use of conventional data structures, repositories, and programming languages on one hand and the use of formal representations of ontologies, known from knowledge representation, such as description logics or semantic web languages on the other hand. Although knowledge engineering offers languages supporting richer semantic means of expression and technologically advanced approaches, conventional data structures and repositories are still popular among developers, administrators and users because of their simplicity, overall intelligibility, and lower demands on technical equipment. The choice of conventional data resources and repositories, however, raises the question of how and where to add semantics that cannot be naturally expressed using them. As one of the possible solutions, this semantics can be added into the structures of the programming language that accesses and processes the underlying data. To support this idea we introduced a software prototype that enables its users to add semantically richer expressions into a Java object-oriented code. This approach does not burden users with additional demands on the programming environment since reflective Java annotations were used as an entry for these expressions. Moreover, additional semantics need not be written by the programmer directly to the code, but it can be collected from non-programmers using a graphical user interface. The mapping that allows the transformation of the semantically enriched Java code into the Semantic Web language OWL was proposed and implemented in a library named the Semantic Framework. This approach was validated by the integration of the Semantic Framework in the EEG/ERP Portal and by the subsequent registration of the EEG/ERP Portal in the Neuroscience Information Framework.

  4. Cognitive modeling as an interface between brain and behavior: Measuring the semantic decline in mild cognitive impairment.

    PubMed

    Johns, Brendan T; Taler, Vanessa; Pisoni, David B; Farlow, Martin R; Hake, Ann Marie; Kareken, David A; Unverzagt, Frederick W; Jones, Michael N

    2018-06-01

    Mild cognitive impairment (MCI) is characterised by subjective and objective memory impairment in the absence of dementia. MCI is a strong predictor for the development of Alzheimer's disease, and may represent an early stage in the disease course in many cases. A standard task used in the diagnosis of MCI is verbal fluency, where participants produce as many items from a specific category (e.g., animals) as possible. Verbal fluency performance is typically analysed by counting the number of items produced. However, analysis of the semantic path of the items produced can provide valuable additional information. We introduce a cognitive model that uses multiple types of lexical information in conjunction with a standard memory search process. The model used a semantic representation derived from a standard semantic space model in conjunction with a memory searching mechanism derived from the Luce choice rule (Luce, 1977). The model was able to detect differences in the memory searching process of patients who were developing MCI, suggesting that the formal analysis of verbal fluency data is a promising avenue to examine the underlying changes occurring in the development of cognitive impairment. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  5. The environment ontology in 2016: bridging domains with increased scope, semantic density, and interoperation

    DOE PAGES

    Buttigieg, Pier Luigi; Pafilis, Evangelos; Lewis, Suzanna E.; ...

    2016-09-23

    Background: The Environment Ontology (ENVO; http://www.environmentontology.org/), first described in 2013, is a resource and research target for the semantically controlled description of environmental entities. The ontology's initial aim was the representation of the biomes, environmental features, and environmental materials pertinent to genomic and microbiome-related investigations. However, the need for environmental semantics is common to a multitude of fields, and ENVO's use has steadily grown since its initial description. We have thus expanded, enhanced, and generalised the ontology to support its increasingly diverse applications. Methods: We have updated our development suite to promote expressivity, consistency, and speed: we now develop ENVO in the Web Ontology Language (OWL) and employ templating methods to accelerate class creation. We have also taken steps to better align ENVO with the Open Biological and Biomedical Ontologies (OBO) Foundry principles and interoperate with existing OBO ontologies. Further, we applied text-mining approaches to extract habitat information from the Encyclopedia of Life and automatically create experimental habitat classes within ENVO. Results: Relative to its state in 2013, ENVO's content, scope, and implementation have been enhanced and much of its existing content revised for improved semantic representation. ENVO now offers representations of habitats, environmental processes, anthropogenic environments, and entities relevant to environmental health initiatives and the global Sustainable Development Agenda for 2030. Several branches of ENVO have been used to incubate and seed new ontologies in previously unrepresented domains such as food and agronomy. The current release version of the ontology, in OWL format, is available at http://purl.obolibrary.org/obo/envo.owl.
Conclusions: ENVO has been shaped into an ontology which bridges multiple domains including biomedicine, natural and anthropogenic ecology, 'omics, and socioeconomic development. Through continued interactions with our users and partners, particularly those performing data archiving and synthesis, we anticipate that ENVO's growth will accelerate in 2017. As always, we invite further contributions and collaboration to advance the semantic representation of the environment, ranging from geographic features and environmental materials, across habitats and ecosystems, to everyday objects in household settings.

  6. The environment ontology in 2016: bridging domains with increased scope, semantic density, and interoperation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Buttigieg, Pier Luigi; Pafilis, Evangelos; Lewis, Suzanna E.

    Background: The Environment Ontology (ENVO; http://www.environmentontology.org/), first described in 2013, is a resource and research target for the semantically controlled description of environmental entities. The ontology's initial aim was the representation of the biomes, environmental features, and environmental materials pertinent to genomic and microbiome-related investigations. However, the need for environmental semantics is common to a multitude of fields, and ENVO's use has steadily grown since its initial description. We have thus expanded, enhanced, and generalised the ontology to support its increasingly diverse applications. Methods: We have updated our development suite to promote expressivity, consistency, and speed: we now develop ENVO in the Web Ontology Language (OWL) and employ templating methods to accelerate class creation. We have also taken steps to better align ENVO with the Open Biological and Biomedical Ontologies (OBO) Foundry principles and interoperate with existing OBO ontologies. Further, we applied text-mining approaches to extract habitat information from the Encyclopedia of Life and automatically create experimental habitat classes within ENVO. Results: Relative to its state in 2013, ENVO's content, scope, and implementation have been enhanced and much of its existing content revised for improved semantic representation. ENVO now offers representations of habitats, environmental processes, anthropogenic environments, and entities relevant to environmental health initiatives and the global Sustainable Development Agenda for 2030. Several branches of ENVO have been used to incubate and seed new ontologies in previously unrepresented domains such as food and agronomy. The current release version of the ontology, in OWL format, is available at http://purl.obolibrary.org/obo/envo.owl.
Conclusions: ENVO has been shaped into an ontology which bridges multiple domains including biomedicine, natural and anthropogenic ecology, 'omics, and socioeconomic development. Through continued interactions with our users and partners, particularly those performing data archiving and synthesis, we anticipate that ENVO's growth will accelerate in 2017. As always, we invite further contributions and collaboration to advance the semantic representation of the environment, ranging from geographic features and environmental materials, across habitats and ecosystems, to everyday objects in household settings.

  7. The SADI Personal Health Lens: A Web Browser-Based System for Identifying Personally Relevant Drug Interactions.

    PubMed

    Vandervalk, Ben; McCarthy, E Luke; Cruz-Toledo, José; Klein, Artjom; Baker, Christopher J O; Dumontier, Michel; Wilkinson, Mark D

    2013-04-05

    The Web provides widespread access to vast quantities of health-related information that can improve quality-of-life through better understanding of personal symptoms, medical conditions, and available treatments. Unfortunately, identifying a credible and personally relevant subset of information can be a time-consuming and challenging task for users without a medical background. The objective of the Personal Health Lens system is to aid users when reading health-related webpages by providing warnings about personally relevant drug interactions. More broadly, we wish to present a prototype for a novel, generalizable approach to facilitating interactions between a patient, their practitioner(s), and the Web. We utilized a distributed, Semantic Web-based architecture for recognizing personally dangerous drugs consisting of: (1) a private, local triple store of personal health information, (2) Semantic Web services, following the Semantic Automated Discovery and Integration (SADI) design pattern, for text mining and identifying substance interactions, (3) a bookmarklet to trigger analysis of a webpage and annotate it with personalized warnings, and (4) a semantic query that acts as an abstract template of the analytical workflow to be enacted by the system. A prototype implementation of the system is provided in the form of a Java standalone executable JAR file. The JAR file bundles all components of the system: the personal health database, locally-running versions of the SADI services, and a javascript bookmarklet that triggers analysis of a webpage. In addition, the demonstration includes a hypothetical personal health profile, allowing the system to be used immediately without configuration. Usage instructions are provided. The main strength of the Personal Health Lens system is its ability to organize medical information and to present it to the user in a personalized and contextually relevant manner. 
While this prototype was limited to a single knowledge domain (drug/drug interactions), the proposed architecture is generalizable, and could act as the foundation for much richer personalized-health-Web clients, while importantly providing a novel and personalizable mechanism for clinical experts to inject their expertise into the browsing experience of their patients in the form of customized semantic queries and ontologies.

  8. The environment ontology in 2016: bridging domains with increased scope, semantic density, and interoperation.

    PubMed

    Buttigieg, Pier Luigi; Pafilis, Evangelos; Lewis, Suzanna E; Schildhauer, Mark P; Walls, Ramona L; Mungall, Christopher J

    2016-09-23

    The Environment Ontology (ENVO; http://www.environmentontology.org/), first described in 2013, is a resource and research target for the semantically controlled description of environmental entities. The ontology's initial aim was the representation of the biomes, environmental features, and environmental materials pertinent to genomic and microbiome-related investigations. However, the need for environmental semantics is common to a multitude of fields, and ENVO's use has steadily grown since its initial description. We have thus expanded, enhanced, and generalised the ontology to support its increasingly diverse applications. We have updated our development suite to promote expressivity, consistency, and speed: we now develop ENVO in the Web Ontology Language (OWL) and employ templating methods to accelerate class creation. We have also taken steps to better align ENVO with the Open Biological and Biomedical Ontologies (OBO) Foundry principles and interoperate with existing OBO ontologies. Further, we applied text-mining approaches to extract habitat information from the Encyclopedia of Life and automatically create experimental habitat classes within ENVO. Relative to its state in 2013, ENVO's content, scope, and implementation have been enhanced and much of its existing content revised for improved semantic representation. ENVO now offers representations of habitats, environmental processes, anthropogenic environments, and entities relevant to environmental health initiatives and the global Sustainable Development Agenda for 2030. Several branches of ENVO have been used to incubate and seed new ontologies in previously unrepresented domains such as food and agronomy. The current release version of the ontology, in OWL format, is available at http://purl.obolibrary.org/obo/envo.owl. ENVO has been shaped into an ontology which bridges multiple domains including biomedicine, natural and anthropogenic ecology, 'omics, and socioeconomic development.
Through continued interactions with our users and partners, particularly those performing data archiving and synthesis, we anticipate that ENVO's growth will accelerate in 2017. As always, we invite further contributions and collaboration to advance the semantic representation of the environment, ranging from geographic features and environmental materials, across habitats and ecosystems, to everyday objects in household settings.

  9. The SADI Personal Health Lens: A Web Browser-Based System for Identifying Personally Relevant Drug Interactions

    PubMed Central

    Vandervalk, Ben; McCarthy, E Luke; Cruz-Toledo, José; Klein, Artjom; Baker, Christopher J O; Dumontier, Michel

    2013-01-01

    Background The Web provides widespread access to vast quantities of health-related information that can improve quality-of-life through better understanding of personal symptoms, medical conditions, and available treatments. Unfortunately, identifying a credible and personally relevant subset of information can be a time-consuming and challenging task for users without a medical background. Objective The objective of the Personal Health Lens system is to aid users when reading health-related webpages by providing warnings about personally relevant drug interactions. More broadly, we wish to present a prototype for a novel, generalizable approach to facilitating interactions between a patient, their practitioner(s), and the Web. Methods We utilized a distributed, Semantic Web-based architecture for recognizing personally dangerous drugs consisting of: (1) a private, local triple store of personal health information, (2) Semantic Web services, following the Semantic Automated Discovery and Integration (SADI) design pattern, for text mining and identifying substance interactions, (3) a bookmarklet to trigger analysis of a webpage and annotate it with personalized warnings, and (4) a semantic query that acts as an abstract template of the analytical workflow to be enacted by the system. Results A prototype implementation of the system is provided in the form of a Java standalone executable JAR file. The JAR file bundles all components of the system: the personal health database, locally-running versions of the SADI services, and a javascript bookmarklet that triggers analysis of a webpage. In addition, the demonstration includes a hypothetical personal health profile, allowing the system to be used immediately without configuration. Usage instructions are provided. Conclusions The main strength of the Personal Health Lens system is its ability to organize medical information and to present it to the user in a personalized and contextually relevant manner. 
While this prototype was limited to a single knowledge domain (drug/drug interactions), the proposed architecture is generalizable, and could act as the foundation for much richer personalized-health-Web clients, while importantly providing a novel and personalizable mechanism for clinical experts to inject their expertise into the browsing experience of their patients in the form of customized semantic queries and ontologies. PMID:23612187

  10. Modeling semantic aspects for cross-media image indexing.

    PubMed

    Monay, Florent; Gatica-Perez, Daniel

    2007-10-01

    To go beyond the query-by-example paradigm in image retrieval, there is a need for semantic indexing of large image collections for intuitive text-based image search. Different models have been proposed to learn the dependencies between the visual content of an image set and the associated text captions, then allowing for the automatic creation of semantic indices for unannotated images. The task, however, remains unsolved. In this paper, we present three alternatives to learn a Probabilistic Latent Semantic Analysis model (PLSA) for annotated images, and evaluate their respective performance for automatic image indexing. Under the PLSA assumptions, an image is modeled as a mixture of latent aspects that generates both image features and text captions, and we investigate three ways to learn the mixture of aspects. We also propose a more discriminative image representation than the traditional Blob histogram, concatenating quantized local color information and quantized local texture descriptors. The first learning procedure of a PLSA model for annotated images is a standard EM algorithm, which implicitly assumes that the visual and the textual modalities can be treated equivalently. The other two models are based on an asymmetric PLSA learning, allowing us to constrain the definition of the latent space on the visual or on the textual modality. We demonstrate that the textual modality is more appropriate to learn a semantically meaningful latent space, which translates into improved annotation performance. A comparison of our learning algorithms with respect to recent methods on a standard dataset is presented, and a detailed evaluation of the performance shows the validity of our framework.
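
    The symmetric variant (standard EM, the first of the three procedures above) can be sketched on toy counts in pure Python; the asymmetric variants additionally anchor the latent space to one modality. This is a generic PLSA illustration, not the authors' image-annotation code:

```python
import random

def plsa(counts, n_topics, iters=50, seed=0):
    """EM for P(w|z) and P(z|d) on a small term-by-document count matrix."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    norm = lambda v: [x / sum(v) for x in v]
    # Random positive initialisation, normalised to probability rows.
    p_z_d = [norm([rng.random() for _ in range(n_topics)]) for _ in range(n_docs)]
    p_w_z = [norm([rng.random() for _ in range(n_words)]) for _ in range(n_topics)]
    for _ in range(iters):
        nz_d = [[0.0] * n_topics for _ in range(n_docs)]
        nw_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: responsibilities P(z|d,w).
                post = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                s = sum(post) or 1.0
                for z in range(n_topics):
                    r = counts[d][w] * post[z] / s
                    nz_d[d][z] += r
                    nw_z[z][w] += r
        # M-step: renormalise expected counts.
        p_z_d = [norm(row) for row in nz_d]
        p_w_z = [norm(row) for row in nw_z]
    return p_z_d, p_w_z

# Two clearly separated "aspects": docs 0-1 use words 0-1, docs 2-3 use words 2-3.
counts = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 5, 4], [0, 0, 4, 5]]
p_z_d, p_w_z = plsa(counts, n_topics=2)
```

    In the paper's setting the "words" would be the concatenation of caption words and quantised visual features, and the asymmetric learners fix P(z|d) from one modality while re-estimating the other.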

  11. Recommendation of standardized health learning contents using archetypes and semantic web technologies.

    PubMed

    Legaz-García, María del Carmen; Martínez-Costa, Catalina; Menárguez-Tortosa, Marcos; Fernández-Breis, Jesualdo Tomás

    2012-01-01

    Linking Electronic Healthcare Records (EHR) content to educational materials has been considered a key international recommendation to enable clinical engagement and to promote patient safety. This would guide citizens to reliable information available on the web. In this paper, we describe an approach in that direction, based on the use of dual model EHR standards and standardized educational contents. The recommendation method will be based on the semantic coverage of the learning content repository for a particular archetype, which will be calculated by applying semantic web technologies like ontologies and semantic annotations.

  12. Diagnosis of Cognitive Impairment Compatible with Early Diagnosis of Alzheimer's Disease. A Bayesian Network Model based on the Analysis of Oral Definitions of Semantic Categories.

    PubMed

    Guerrero, J M; Martínez-Tomás, R; Rincón, M; Peraita, H

    2016-01-01

    Early detection of Alzheimer's disease (AD) has become one of the principal focuses of research in medicine, particularly when the disease is incipient or even prodromic, because treatments are more effective in these stages. Lexical-semantic-conceptual deficit (LSCD) in the oral definitions of semantic categories for basic objects is an important early indicator in the evaluation of the cognitive state of patients. The objective of this research is to define an economic procedure for cognitive impairment (CI) diagnosis, which may be associated with early stages of AD, by analysing cognitive alterations affecting declarative semantic memory. Because of its low cost, it could be used for routine clinical evaluations or screenings, leading to more expensive and selective tests that confirm or rule out the disease accurately. It should necessarily be an explanatory procedure, which would allow us to study the evolution of the disease in relation to CI, the irregularities in different semantic categories, and other neurodegenerative diseases. On the basis of these requirements, we hypothesise that Bayesian networks (BNs) are the most appropriate tool for this purpose. We have developed a BN for CI diagnosis in mild and moderate AD patients by analysing the oral production of semantic features. The BN causal model represents LSCD in certain semantic categories, both of living things (dog, pine, and apple) and non-living things (chair, car, and trousers), as symptoms of CI. The model structure, the qualitative part of the model, uses domain knowledge obtained from psychology experts and epidemiological studies. Further, the model parameters, the quantitative part of the model, are learnt automatically from epidemiological studies and Peraita and Grasso's linguistic corpus of oral definitions. 
This corpus was prepared with an incidental sampling and included the analysis of the oral linguistic production of 81 participants (42 cognitively healthy elderly people and 39 mild and moderate AD patients) from Madrid region's hospitals. Experienced neurologists diagnosed these cases following the National Institute of Neurological and Communicative Disorders and Stroke/Alzheimer's Disease and Related Disorders Association (NINCDS-ADRDA)'s Alzheimer's criteria, performing, among other explorations and tests, a minimum neuropsychological exploration that included the Mini-Mental State Examination test. BN's classification performance is remarkable compared with other machine learning methods, achieving 91% accuracy and 94% precision in mild and moderate AD patients. Apart from this, the BN model facilitates the explanation of the reasoning process and the validation of the conclusions and allows the study of uncommon declarative semantic memory impairments. Our method is able to analyse LSCD in a wide set of semantic categories throughout the progression of CI, being a valuable first screening method in AD diagnosis in its early stages. Because of its low cost, it can be used for routine clinical evaluations or screenings to detect AD in its early stages. Besides, due to its knowledge-based structure, it can be easily extended to provide an explanation of the diagnosis and to the study of other neurodegenerative diseases. Further, this is a key advantage of BNs over other machine learning methods with similar performance: it is a recognisable and explanatory model that allows one to study irregularities in different semantic categories.
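
    A heavily simplified, naive-Bayes reduction of such a network (all probabilities below are invented; the paper learns its parameters from the oral-definition corpus and epidemiological studies) conveys how category-wise deficits combine into a diagnosis:

```python
# P(deficit observed in the oral definition | CI status), per semantic category.
# Categories echo the paper's living / non-living split; numbers are hypothetical.
p_deficit = {
    "dog":      {"CI": 0.70, "healthy": 0.10},
    "apple":    {"CI": 0.60, "healthy": 0.15},
    "chair":    {"CI": 0.55, "healthy": 0.12},
    "trousers": {"CI": 0.50, "healthy": 0.10},
}
p_ci = 0.3  # assumed prior probability of cognitive impairment

def posterior_ci(observed_deficits):
    """P(CI | evidence) by Bayes' rule, assuming conditional independence."""
    like_ci, like_h = p_ci, 1 - p_ci
    for cat, tables in p_deficit.items():
        hit = cat in observed_deficits
        like_ci *= tables["CI"] if hit else 1 - tables["CI"]
        like_h *= tables["healthy"] if hit else 1 - tables["healthy"]
    return like_ci / (like_ci + like_h)
```

    A full Bayesian network relaxes the independence assumption and, as the abstract stresses, keeps the reasoning inspectable: one can read off which categories drove the posterior.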

  13. Connecting long distance: semantic distance in analogical reasoning modulates frontopolar cortex activity.

    PubMed

    Green, Adam E; Kraemer, David J M; Fugelsang, Jonathan A; Gray, Jeremy R; Dunbar, Kevin N

    2010-01-01

    Solving problems often requires seeing new connections between concepts or events that seemed unrelated at first. Innovative solutions of this kind depend on analogical reasoning, a relational reasoning process that involves mapping similarities between concepts. Brain-based evidence has implicated the frontal pole of the brain as important for analogical mapping. Separately, cognitive research has identified semantic distance as a key characteristic of the kind of analogical mapping that can support innovation (i.e., identifying similarities across greater semantic distance reveals connections that support more innovative solutions and models). However, the neural substrates of semantically distant analogical mapping are not well understood. Here, we used functional magnetic resonance imaging (fMRI) to measure brain activity during an analogical reasoning task, in which we parametrically varied the semantic distance between the items in the analogies. Semantic distance was derived quantitatively from latent semantic analysis. Across 23 participants, activity in an a priori region of interest (ROI) in left frontopolar cortex covaried parametrically with increasing semantic distance, even after removing effects of task difficulty. This ROI was centered on a functional peak that we previously associated with analogical mapping. To our knowledge, these data represent a first empirical characterization of how the brain mediates semantically distant analogical mapping.
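
    The distance measure can be illustrated with toy vectors (the study derives its values from latent semantic analysis over a large corpus; the vectors below are invented): semantic distance between analogy terms as one minus cosine similarity.

```python
import math

# Hypothetical context-count vectors for each concept.
vectors = {
    "nose":    [8, 1, 0, 2],
    "scent":   [7, 2, 1, 2],
    "antenna": [1, 6, 5, 0],
    "signal":  [0, 5, 7, 1],
}

def cosine(u, v):
    """Cosine similarity between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def semantic_distance(a, b):
    return 1 - cosine(vectors[a], vectors[b])

near = semantic_distance("nose", "scent")    # within-domain pair
far = semantic_distance("nose", "antenna")   # cross-domain, more "analogical" pair
```

    In the fMRI design, this scalar distance is the parametric regressor: analogies built from `far`-style pairs drove frontopolar activity more than `near`-style ones.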

  14. Semantic e-Science: From Microformats to Models

    NASA Astrophysics Data System (ADS)

    Lumb, L. I.; Freemantle, J. R.; Aldridge, K. D.

    2009-05-01

    A platform has been developed to transform semi-structured ASCII data into a representation based on the eXtensible Markup Language (XML). A subsequent transformation allows the XML-based representation to be rendered in the Resource Description Framework (RDF). Editorial metadata, expressed as external annotations (via XML Pointer Language), also survives this transformation process (e.g., Lumb et al., http://dx.doi.org/10.1016/j.cageo.2008.03.009). Because the XML-to-RDF transformation uses XSLT (eXtensible Stylesheet Language Transformations), semantic microformats ultimately encode the scientific data (Lumb & Aldridge, http://dx.doi.org/10.1109/HPCS.2006.26). In building the relationship-centric representation in RDF, a Semantic Model of the scientific data is extracted. The systematic enhancement in the expressivity and richness of the scientific data results in representations of knowledge that are readily understood and manipulated by intelligent software agents. Thus scientists are able to draw upon various resources within and beyond their discipline to use in their scientific applications. Since the resulting Semantic Models are independent conceptualizations of the science itself, the representation of scientific knowledge and interaction with the same can stimulate insight from different perspectives. Using the Global Geodynamics Project (GGP) for the purpose of illustration, the introduction of GGP microformats enables a Semantic Model for the GGP that can be semantically queried (e.g., via SPARQL, http://www.w3.org/TR/rdf-sparql-query). Although the present implementation uses the Open Source Redland RDF Libraries (http://librdf.org/), the approach is generalizable to other platforms and to projects other than the GGP (e.g., Baker et al., Informatics and the 2007-2008 Electronic Geophysical Year, Eos Trans. Am. Geophys. Un., 89(48), 485-486, 2008).
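
    The XML-to-RDF step can be approximated schematically (the cited platform performs it with XSLT over GGP records; the element names and vocabulary below are hypothetical): walk the XML tree and emit one triple per leaf element.

```python
import xml.etree.ElementTree as ET

# A toy station record already lifted from semi-structured ASCII into XML.
xml_doc = """
<ggp>
  <station id="CA">
    <name>Cantley</name>
    <instrument>GWR-T020</instrument>
  </station>
</ggp>
"""

def to_triples(xml_text):
    """Emit (subject, predicate, object) triples from station elements."""
    triples = []
    root = ET.fromstring(xml_text)
    for station in root.findall("station"):
        subject = f"ggp:station/{station.get('id')}"
        for child in station:
            triples.append((subject, f"ggp:{child.tag}", child.text))
    return triples

rdf_triples = to_triples(xml_doc)
```

    The resulting relationship-centric triples are what a SPARQL endpoint (Redland, in the implementation above) would then serve for semantic querying.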

  15. Hybrid ontology for semantic information retrieval model using keyword matching indexing system.

    PubMed

    Uthayan, K R; Mala, G S Anandha

    2015-01-01

    Ontology development is the process of growing and elucidating the concepts of an information domain shared by a group of users. Incorporating an ontology into information retrieval is a common way to improve the retrieval of the relevant information users require. Matching keywords against historical or domain information is important for finding the best match for specific input queries. This research presents an improved querying mechanism for information retrieval that integrates ontology queries with keyword search. The ontology-based query is translated into first-order predicate logic, which is used for routing the query to the appropriate servers. Matching algorithms are an active area of research in computer science and artificial intelligence. In text matching, it is more reliable to consider the semantics of both the model and the query when performing semantic matching. This research develops semantic matching between input queries and information in the ontology field. The contributed algorithm is a hybrid method based on matching instances extracted from the queries and the information field. Semantic matching between the queries and the information domain is used to discover the best match and to speed up the retrieval process. In conclusion, the hybrid ontology in the semantic web retrieves documents more effectively than a standard ontology.
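
    The general idea of blending keyword search with ontology-based matching can be sketched in a few lines. The toy ontology, the single-word concepts, and the equal weighting below are all assumptions for illustration, not the paper's actual algorithm:

```python
# Hybrid retrieval score: plain keyword overlap blended with overlap
# after ontology-based query expansion. Ontology and weights invented.

ONTOLOGY = {  # concept -> synonyms/related terms (single words for simplicity)
    "auto": {"car", "vehicle"},
    "physician": {"doctor"},
}

def expand(terms):
    """Add every ontology concept (and its synonyms) touched by the terms."""
    expanded = set(terms)
    for concept, synonyms in ONTOLOGY.items():
        if concept in terms or synonyms & terms:
            expanded.add(concept)
            expanded |= synonyms
    return expanded

def hybrid_score(query, document, alpha=0.5):
    q = set(query.lower().split())
    d = set(document.lower().split())
    keyword = len(q & d) / len(q)                    # exact keyword match
    semantic = len(expand(q) & d) / len(expand(q))   # ontology-expanded match
    return alpha * keyword + (1 - alpha) * semantic

# "physician" matches "doctor" only through the ontology, so the hybrid
# score exceeds the keyword-only score (alpha=1.0) for this pair:
print(hybrid_score("doctor visit", "the physician scheduled a visit"))
```

    The hybrid score rewards documents that a pure keyword match would under-rank, which is the core claim of the hybrid approach.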

  16. Hybrid Ontology for Semantic Information Retrieval Model Using Keyword Matching Indexing System

    PubMed Central

    Uthayan, K. R.; Anandha Mala, G. S.

    2015-01-01

    Ontology development is the process of growing and elucidating the concepts of an information domain shared by a group of users. Incorporating an ontology into information retrieval is a common way to improve the retrieval of the relevant information users require. Matching keywords against historical or domain information is important for finding the best match for specific input queries. This research presents an improved querying mechanism for information retrieval that integrates ontology queries with keyword search. The ontology-based query is translated into first-order predicate logic, which is used for routing the query to the appropriate servers. Matching algorithms are an active area of research in computer science and artificial intelligence. In text matching, it is more reliable to consider the semantics of both the model and the query when performing semantic matching. This research develops semantic matching between input queries and information in the ontology field. The contributed algorithm is a hybrid method based on matching instances extracted from the queries and the information field. Semantic matching between the queries and the information domain is used to discover the best match and to speed up the retrieval process. In conclusion, the hybrid ontology in the semantic web retrieves documents more effectively than a standard ontology. PMID:25922851

  17. Processing of visual semantic information to concrete words: temporal dynamics and neural mechanisms indicated by event-related brain potentials.

    PubMed

    van Schie, Hein T; Wijers, Albertus A; Mars, Rogier B; Benjamins, Jeroen S; Stowe, Laurie A

    2005-05-01

    Event-related brain potentials were used to study the retrieval of visual semantic information to concrete words, and to investigate possible structural overlap between visual object working memory and concreteness effects in word processing. Subjects performed an object working memory task that involved 5 s retention of simple 4-angled polygons (load 1), complex 10-angled polygons (load 2), and a no-load baseline condition. During the polygon retention interval, subjects performed a lexical decision task on auditorily presented concrete (imageable) and abstract (nonimageable) words, and pseudowords. ERP results are consistent with the use of object working memory for the visualisation of concrete words. Our data indicate a two-step processing model of visual semantics in which visual descriptive information of concrete words is first encoded in semantic memory (indicated by an anterior N400 and a posterior occipital positivity), and is subsequently visualised via the network for object working memory (reflected by a left frontal positive slow wave and a bilateral occipital slow wave negativity). Results are discussed in the light of contemporary models of semantic memory.

  18. Semantic Enrichment of Movement Behavior with Foursquare--A Visual Analytics Approach.

    PubMed

    Krueger, Robert; Thom, Dennis; Ertl, Thomas

    2015-08-01

    In recent years, many approaches have been developed that efficiently and effectively visualize movement data, e.g., by providing suitable aggregation strategies to reduce visual clutter. Analysts can use them to identify distinct movement patterns, such as trajectories with similar direction, form, length, and speed. However, less effort has been spent on finding the semantics behind movements, i.e., why somebody or something is moving. This can be of great value for different applications, such as product usage and consumer analysis, better understanding urban dynamics, and improving situational awareness. Unfortunately, semantic information often gets lost when data is recorded. Thus, we suggest enriching trajectory data with POI information using social media services and show how semantic insights can be gained. Furthermore, we show how to handle semantic uncertainties in time and space, which result from noisy, imprecise, and missing data, by introducing a POI decision model in combination with highly interactive visualizations. Finally, we evaluate our approach with two case studies on a large electric scooter data set and test our model on data with known ground truth.
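
    The core of a POI decision model can be sketched as a likelihood that trades off spatial proximity against POI popularity. The Gaussian distance decay, the candidate POIs, and the check-in counts below are illustrative assumptions, not the paper's actual model:

```python
# Toy POI decision: rank candidate POIs for a trajectory stop by
# combining spatial proximity with POI popularity (check-in counts).
# All POIs, coordinates, and weights are invented for illustration.

import math

def poi_likelihood(stop, poi, sigma=50.0):
    """Gaussian decay with distance (metres), scaled by popularity."""
    dist = math.hypot(stop[0] - poi["x"], stop[1] - poi["y"])
    return poi["checkins"] * math.exp(-(dist ** 2) / (2 * sigma ** 2))

pois = [
    {"name": "cafe", "x": 10.0, "y": 0.0, "checkins": 120},
    {"name": "park", "x": 80.0, "y": 60.0, "checkins": 300},
]
stop = (12.0, 5.0)  # recorded stop location, possibly imprecise

best = max(pois, key=lambda p: poi_likelihood(stop, p))
print(best["name"])  # 'cafe': nearby, despite the park's higher popularity
```

    Keeping the full likelihood ranking, rather than only the winner, is what lets an interactive visualization surface the uncertainty to the analyst.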

  19. Face (and Nose) Priming for Book: The Malleability of Semantic Memory

    PubMed Central

    Coane, Jennifer H.; Balota, David A.

    2010-01-01

    There are two general classes of models of semantic structure that support semantic priming effects. Feature-overlap models of semantic priming assume that shared features between primes and targets are critical (e.g., cat-DOG). Associative accounts assume that contextual co-occurrence is critical and that the system is organized along associations independent of featural overlap (e.g., leash-DOG). If unrelated concepts can become related as a result of contextual co-occurrence, this would be more supportive of associative accounts and provide insight into the nature of the network underlying “semantic” priming effects. Naturally co-occurring recent associations (e.g., face-BOOK) were tested under conditions that minimize strategic influences (i.e., short stimulus onset asynchrony, low relatedness proportion) in a semantic priming paradigm. Priming for new associations did not differ from the priming found for pre-existing relations (e.g., library-BOOK). Mediated priming (e.g., nose-BOOK) was also found. These results suggest that contextual associations can result in the reorganization of the network that subserves “semantic” priming effects. PMID:20494866

  20. A fusion network for semantic segmentation using RGB-D data

    NASA Astrophysics Data System (ADS)

    Yuan, Jiahui; Zhang, Kun; Xia, Yifan; Qi, Lin; Dong, Junyu

    2018-04-01

    Semantic scene parsing matters in many intelligent fields, including perceptual robotics. Over the past few years, pixel-wise prediction tasks such as semantic segmentation with RGB images have been extensively studied and have reached remarkable parsing levels, thanks to convolutional neural networks (CNNs) and large scene datasets. With the development of stereo cameras and RGB-D sensors, it is expected that additional depth information will help improve accuracy. In this paper, we propose a semantic segmentation framework incorporating RGB and complementary depth information. Motivated by the success of fully convolutional networks (FCN) in the semantic segmentation field, we design a fully convolutional network consisting of two branches that extract features from both RGB and depth data simultaneously and fuse them as the network goes deeper. Instead of aggregating multiple models, our goal is to utilize RGB data and depth data more effectively in a single model. We evaluate our approach on the NYU-Depth V2 dataset, which consists of 1449 cluttered indoor scenes, and achieve results competitive with state-of-the-art methods.

  1. Informatics in radiology: radiology gamuts ontology: differential diagnosis for the Semantic Web.

    PubMed

    Budovec, Joseph J; Lam, Cesar A; Kahn, Charles E

    2014-01-01

    The Semantic Web is an effort to add semantics, or "meaning," to empower automated searching and processing of Web-based information. The overarching goal of the Semantic Web is to enable users to more easily find, share, and combine information. Critical to this vision are knowledge models called ontologies, which define a set of concepts and formalize the relations between them. Ontologies have been developed to manage and exploit the large and rapidly growing volume of information in biomedical domains. In diagnostic radiology, lists of differential diagnoses of imaging observations, called gamuts, provide an important source of knowledge. The Radiology Gamuts Ontology (RGO) is a formal knowledge model of differential diagnoses in radiology that includes 1674 differential diagnoses, 19,017 terms, and 52,976 links between terms. Its knowledge is used to provide an interactive, freely available online reference of radiology gamuts (www.gamuts.net). A Web service allows its content to be discovered and consumed by other information systems. The RGO integrates radiologic knowledge with other biomedical ontologies as part of the Semantic Web. © RSNA, 2014.

  2. Dynamic information processing states revealed through neurocognitive models of object semantics

    PubMed Central

    Clarke, Alex

    2015-01-01

    Recognising objects relies on highly dynamic, interactive brain networks to process multiple aspects of object information. To fully understand how different forms of information about objects are represented and processed in the brain requires a neurocognitive account of visual object recognition that combines a detailed cognitive model of semantic knowledge with a neurobiological model of visual object processing. Here we ask how specific cognitive factors are instantiated in our mental processes and how they dynamically evolve over time. We suggest that coarse semantic information, based on generic shared semantic knowledge, is rapidly extracted from visual inputs and is sufficient to drive rapid category decisions. Subsequent recurrent neural activity between the anterior temporal lobe and posterior fusiform supports the formation of object-specific semantic representations – a conjunctive process primarily driven by the perirhinal cortex. These object-specific representations require the integration of shared and distinguishing object properties and support the unique recognition of objects. We conclude that a valuable way of understanding the cognitive activity of the brain is through testing the relationship between specific cognitive measures and dynamic neural activity. This kind of approach allows us to move towards uncovering the information processing states of the brain and how they evolve over time. PMID:25745632

  3. Tableau Calculus for the Logic of Comparative Similarity over Arbitrary Distance Spaces

    NASA Astrophysics Data System (ADS)

    Alenda, Régis; Olivetti, Nicola

    The logic CSL (first introduced by Sheremet, Tishkovsky, Wolter and Zakharyaschev in 2005) allows one to reason about distance comparison and similarity comparison within a modal language. The logic can express assertions of the kind "A is closer/more similar to B than to C" and has a natural application to spatial reasoning, as well as to reasoning about concept similarity in ontologies. The semantics of CSL is defined in terms of models based on different classes of distance spaces and it generalizes the logic S4u of topological spaces. In this paper we consider CSL defined over arbitrary distance spaces. The logic comprises a binary modality to represent comparative similarity and a unary modality to express the existence of the minimum of a set of distances. We first show that the semantics of CSL can be equivalently defined in terms of preferential models. As a consequence we obtain the finite model property of the logic with respect to its preferential semantics, a property that does not hold with respect to the original distance-space semantics. Next we present an analytic tableau calculus based on its preferential semantics. The calculus provides a decision procedure for the logic; its termination is obtained by imposing suitable blocking restrictions.

  4. Biotea: RDFizing PubMed Central in support for the paper as an interface to the Web of Data

    PubMed Central

    2013-01-01

    Background The World Wide Web has become a dissemination platform for scientific and non-scientific publications. However, most of the information remains locked up in discrete documents that are not always interconnected or machine-readable. The connectivity tissue provided by RDF technology has not yet been widely used to support the generation of self-describing, machine-readable documents. Results In this paper, we present our approach to the generation of self-describing machine-readable scholarly documents. We understand the scientific document as an entry point and interface to the Web of Data. We have semantically processed the full-text, open-access subset of PubMed Central. Our RDF model and resulting dataset make extensive use of existing ontologies and semantic enrichment services. We expose our model, services, prototype, and datasets at http://biotea.idiginfo.org/ Conclusions The semantic processing of biomedical literature presented in this paper embeds documents within the Web of Data and facilitates the execution of concept-based queries against the entire digital library. Our approach delivers a flexible and adaptable set of tools for metadata enrichment and semantic processing of biomedical documents. Our model delivers a semantically rich and highly interconnected dataset with self-describing content so that software can make effective use of it. PMID:23734622

  5. Convergence of Health Level Seven Version 2 Messages to Semantic Web Technologies for Software-Intensive Systems in Telemedicine Trauma Care

    PubMed Central

    Cook, Timothy Wayne; Cavalini, Luciana Tricai

    2016-01-01

    Objectives To present the technical background and the development of a procedure that enriches the semantics of Health Level Seven version 2 (HL7v2) messages for software-intensive systems in telemedicine trauma care. Methods This study followed a multilevel model-driven approach for the development of semantically interoperable health information systems. The Pre-Hospital Trauma Life Support (PHTLS) ABCDE protocol was adopted as the use case. A prototype application embedded the semantics into an HL7v2 message as an eXtensible Markup Language (XML) file, which was validated against an XML schema that defines constraints on a common reference model. This message was exchanged with a second prototype application, developed on the Mirth middleware, which was also used to parse and validate both the original and the hybrid messages. Results Both versions of the data instance (one pure XML, one embedded in the HL7v2 message) were equally validated and the RDF-based semantics recovered by the receiving side of the prototype from the shared XML schema. Conclusions This study demonstrated the semantic enrichment of HL7v2 messages for intensive-software telemedicine systems for trauma care, by validating components of extracts generated in various computing environments. The adoption of the method proposed in this study ensures the compliance of the HL7v2 standard in Semantic Web technologies. PMID:26893947
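
    The embedding step can be sketched schematically: an XML payload carried inside a pipe-delimited, HL7v2-style segment and recovered on the receiving side. The segment layout, field values, and payload below are invented for illustration and do not constitute a conformant HL7v2 message:

```python
# Schematic only: an XML payload embedded in an HL7v2-style message
# (pipe-delimited segments separated by carriage returns) and parsed
# back out by the receiver. Not a conformant HL7v2 message.

import xml.etree.ElementTree as ET

payload = "<assessment><airway status='clear'/></assessment>"
message = "MSH|^~\\&|SENDER|RECEIVER\rOBX|1|ED|PHTLS^ABCDE|" + payload

# Receiving side: locate the OBX-style segment, take its last field,
# and parse the embedded XML payload.
obx = next(seg for seg in message.split("\r") if seg.startswith("OBX"))
xml_text = obx.split("|")[-1]
root = ET.fromstring(xml_text)
print(root.find("airway").get("status"))  # 'clear'
```

    In the study itself the recovered XML instance is additionally validated against a shared XML schema, which is what carries the RDF-based semantics between sender and receiver.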

  6. The Semantic Network Model of Creativity: Analysis of Online Social Media Data

    ERIC Educational Resources Information Center

    Yu, Feng; Peng, Theodore; Peng, Kaiping; Zheng, Sam Xianjun; Liu, Zhiyuan

    2016-01-01

    The central hypothesis of Semantic Network Model of Creativity is that creative people, who are exposed to more information that are both novel and useful, will have more interconnections between event schemas in their associations. The networks of event schemas in creative people's minds were expected to be wider and denser than those in less…

  7. A Semantic-Oriented Approach for Organizing and Developing Annotation for E-Learning

    ERIC Educational Resources Information Center

    Brut, Mihaela M.; Sedes, Florence; Dumitrescu, Stefan D.

    2011-01-01

    This paper presents a solution to extend the IEEE LOM standard with ontology-based semantic annotations for efficient use of learning objects outside Learning Management Systems. The data model corresponding to this approach is first presented. The proposed indexing technique for this model development in order to acquire a better annotation of…

  8. Lexico-Semantic Structure and the Word-Frequency Effect in Recognition Memory

    ERIC Educational Resources Information Center

    Monaco, Joseph D.; Abbott, L. F.; Kahana, Michael J.

    2007-01-01

    The word-frequency effect (WFE) in recognition memory refers to the finding that more rare words are better recognized than more common words. We demonstrate that a familiarity-discrimination model operating on data from a semantic word-association space yields a robust WFE in data on both hit rates and false-alarm rates. Our modeling results…

  9. Overcoming an obstacle in expanding a UMLS semantic type extent.

    PubMed

    Chen, Yan; Gu, Huanying; Perl, Yehoshua; Geller, James

    2012-02-01

    This paper strives to overcome a major problem encountered by a previous expansion methodology for discovering concepts highly likely to be missing a specific semantic type assignment in the UMLS. This methodology is the basis for an algorithm that presents the discovered concepts to a human auditor for review and possible correction. We analyzed the problem of the previous expansion methodology and discovered that it was due to an obstacle constituted by one or more concepts assigned the UMLS Semantic Network semantic type Classification. A new methodology was designed that bypasses such an obstacle without a combinatorial explosion in the number of concepts presented to the human auditor for review. The new expansion methodology with obstacle avoidance was tested with the semantic type Experimental Model of Disease and found over 500 concepts missed by the previous methodology that are in need of this semantic type assignment. Furthermore, other semantic types suffering from the same major problem were discovered, indicating that the methodology is of more general applicability. The algorithmic discovery of concepts that are likely missing a semantic type assignment is possible even in the face of obstacles, without an explosion in the number of processed concepts. Copyright © 2011 Elsevier Inc. All rights reserved.
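
    The obstacle-bypassing idea can be sketched as a breadth-first traversal that passes through, but does not report, concepts assigned the Classification semantic type. The concept graph and type assignments below are invented for illustration; the actual methodology operates on the UMLS:

```python
# Toy obstacle-avoiding expansion: traverse a concept hierarchy
# breadth-first, expanding through "obstacle" concepts (semantic type
# Classification) without reporting them, so candidate concepts behind
# the obstacle are still reached. Graph and types are hypothetical.

from collections import deque

children = {
    "Disease": ["Model Index", "Induced Disease"],
    "Model Index": ["Transgenic Model", "Knockout Model"],  # obstacle
    "Induced Disease": [],
    "Transgenic Model": [],
    "Knockout Model": [],
}
semantic_type = {"Model Index": "Classification"}

def expand_with_obstacles(root):
    found, queue = [], deque(children.get(root, []))
    while queue:
        node = queue.popleft()
        if semantic_type.get(node) == "Classification":
            queue.extend(children.get(node, []))  # bypass, don't report
        else:
            found.append(node)  # candidate presented to the auditor
    return found

print(expand_with_obstacles("Disease"))
# ['Induced Disease', 'Transgenic Model', 'Knockout Model']
```

    Because obstacle nodes are expanded rather than enumerated, the concepts behind them are reached without a combinatorial blow-up in what the human auditor must review.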

  10. Overcoming an Obstacle in Expanding a UMLS Semantic Type Extent

    PubMed Central

    Chen, Yan; Gu, Huanying; Perl, Yehoshua; Geller, James

    2011-01-01

    This paper strives to overcome a major problem encountered by a previous expansion methodology for discovering concepts highly likely to be missing a specific semantic type assignment in the UMLS. This methodology is the basis for an algorithm that presents the discovered concepts to a human auditor for review and possible correction. We analyzed the problem of the previous expansion methodology and discovered that it was due to an obstacle constituted by one or more concepts assigned the UMLS Semantic Network semantic type Classification. A new methodology was designed that bypasses such an obstacle without a combinatorial explosion in the number of concepts presented to the human auditor for review. The new expansion methodology with obstacle avoidance was tested with the semantic type Experimental Model of Disease and found over 500 concepts missed by the previous methodology that are in need of this semantic type assignment. Furthermore, other semantic types suffering from the same major problem were discovered, indicating that the methodology is of more general applicability. The algorithmic discovery of concepts that are likely missing a semantic type assignment is possible even in the face of obstacles, without an explosion in the number of processed concepts. PMID:21925287

  11. The effects of bilingual language proficiency on recall accuracy and semantic clustering in free recall output: evidence for shared semantic associations across languages.

    PubMed

    Francis, Wendy S; Taylor, Randolph S; Gutiérrez, Marisela; Liaño, Mary K; Manzanera, Diana G; Penalver, Renee M

    2018-05-19

    Two experiments investigated how well bilinguals utilise long-standing semantic associations to encode and retrieve semantic clusters in verbal episodic memory. In Experiment 1, Spanish-English bilinguals (N = 128) studied and recalled word and picture sets. Word recall was equivalent in L1 and L2, picture recall was better in L1 than in L2, and the picture superiority effect was stronger in L1 than in L2. Semantic clustering in word and picture recall was equivalent in L1 and L2. In Experiment 2, Spanish-English bilinguals (N = 128) and English-speaking monolinguals (N = 128) studied and recalled word sequences that contained semantically related pairs. Data were analyzed using a multinomial processing tree approach, the pair-clustering model. Cluster formation was more likely for semantically organised than for randomly ordered word sequences. Probabilities of cluster formation, cluster retrieval, and retrieval of unclustered items did not differ across languages or language groups. Language proficiency has little if any impact on the utilisation of long-standing semantic associations, which are language-general.

  12. The relationship between semantic short-term memory and immediate serial recall of known and unknown words and nonwords: data from two Chinese individuals with aphasia.

    PubMed

    Wong, Winsy; Low, Sam-Po

    2008-07-01

    The present study investigated verbal recall of semantically preserved and degraded words and nonwords by taking into consideration the status of one's semantic short-term memory (STM). Two experiments were conducted on 2 Chinese individuals with aphasia. The first experiment showed that they had largely preserved phonological processing abilities accompanied by mild but comparable semantic processing deficits; however, their performance on STM tasks revealed a double dissociation. The second experiment found that the participant with more preserved semantic STM had better recall of known words and nonwords than of their unknown counterparts, whereas such effects were absent in the patient with severe semantic STM deficit. The results are compatible with models that assume separate phonological and semantic STM components, such as that of R. C. Martin, M. Lesch, and M. Bartha (1999). In addition, the distribution of error types was different from previous studies. This is discussed in terms of the methodology of the authors' experiments and current views regarding the nature of semantic STM and representations in the Chinese mental lexicon. (c) 2008 APA

  13. Supervised guiding long-short term memory for image caption generation based on object classes

    NASA Astrophysics Data System (ADS)

    Wang, Jian; Cao, Zhiguo; Xiao, Yang; Qi, Xinyuan

    2018-03-01

    The present models of image caption generation have the problems of image visual semantic information attenuation and errors in guidance information. In order to solve these problems, we propose a supervised guiding Long Short Term Memory model based on object classes, named S-gLSTM for short. It uses the object detection results from R-FCN as supervisory information with high confidence, and updates the guidance word set by judging whether the last output matches the supervisory information. S-gLSTM learns how to extract the currently relevant information from the image visual semantic information based on the guidance word set. This information is fed into the S-gLSTM at each iteration as guidance information, to guide the caption generation. To acquire the text-related visual semantic information, the S-gLSTM fine-tunes the weights of the network through the back-propagation of the guiding loss. Complementing guidance information at each iteration solves the problem of visual semantic information attenuation in the traditional LSTM model. Besides, the supervised guidance information in our model can reduce the impact of mismatched words on the caption generation. We test our model on the MSCOCO2014 dataset and obtain better performance than the state-of-the-art models.

  14. Linking Somatic and Symbolic Representation in Semantic Memory: The Dynamic Multilevel Reactivation Framework

    PubMed Central

    Reilly, Jamie; Peelle, Jonathan E; Garcia, Amanda; Crutch, Sebastian J

    2016-01-01

    Biological plausibility is an essential constraint for any viable model of semantic memory. Yet, we have only the most rudimentary understanding of how the human brain conducts abstract symbolic transformations that underlie word and object meaning. Neuroscience has evolved a sophisticated arsenal of techniques for elucidating the architecture of conceptual representation. Nevertheless, theoretical convergence remains elusive. Here we describe several contrastive approaches to the organization of semantic knowledge, and in turn we offer our own perspective on two recurring questions in semantic memory research: 1) to what extent are conceptual representations mediated by sensorimotor knowledge (i.e., to what degree is semantic memory embodied)? 2) How might an embodied semantic system represent abstract concepts such as modularity, symbol, or proposition? To address these questions, we review the merits of sensorimotor (i.e., embodied) and amodal (i.e., disembodied) semantic theories and address the neurobiological constraints underlying each. We conclude that the shortcomings of both perspectives in their extreme forms necessitate a hybrid middle ground. We accordingly propose the Dynamic Multilevel Reactivation Framework, an integrative model premised upon flexible interplay between sensorimotor and amodal symbolic representations mediated by multiple cortical hubs. We discuss applications of the Dynamic Multilevel Reactivation Framework to abstract and concrete concept representation and describe how a multidimensional conceptual topography based on emotion, sensation, and magnitude can successfully frame a semantic space containing meanings for both abstract and concrete words. The consideration of ‘abstract conceptual features’ does not diminish the role of logical and/or executive processing in activating, manipulating and using information stored in conceptual representations. 
Rather, it proposes that the material on which these processes operate necessarily combines pure sensorimotor information and higher-order cognitive dimensions involved in symbolic representation. PMID:27294419

  15. Actively learning human gaze shifting paths for semantics-aware photo cropping.

    PubMed

    Zhang, Luming; Gao, Yue; Ji, Rongrong; Xia, Yingjie; Dai, Qionghai; Li, Xuelong

    2014-05-01

    Photo cropping is a widely used tool in the printing industry, photography, and cinematography. Conventional cropping models suffer from three challenges. First, they deemphasize semantic content, which is many times more important than low-level features in photo aesthetics. Second, existing models lack a sequential ordering; in contrast, humans look at semantically important regions sequentially when viewing a photo. Third, it is difficult to leverage inputs from multiple users, whose experience is particularly critical in cropping because photo assessment is quite a subjective task. To address these challenges, this paper proposes semantics-aware photo cropping, which crops a photo by simulating the process of humans sequentially perceiving semantically important regions of a photo. We first project the local features (graphlets in this paper) onto the semantic space, which is constructed based on the category information of the training photos. An efficient learning algorithm is then derived to sequentially select semantically representative graphlets of a photo, and the selection process can be interpreted as a path that simulates humans actively perceiving semantics in a photo. Furthermore, we learn a prior distribution of such active graphlet paths from training photos that are marked as aesthetically pleasing by multiple users. The learned priors enforce the corresponding active graphlet path of a test photo to be maximally similar to those from the training photos. Experimental results show that: 1) the active graphlet path accurately predicts human gaze shifting, and thus is more indicative of photo aesthetics than conventional saliency maps; and 2) the cropped photos produced by our approach outperform its competitors in both qualitative and quantitative comparisons.

  16. Linking somatic and symbolic representation in semantic memory: the dynamic multilevel reactivation framework.

    PubMed

    Reilly, Jamie; Peelle, Jonathan E; Garcia, Amanda; Crutch, Sebastian J

    2016-08-01

Biological plausibility is an essential constraint for any viable model of semantic memory. Yet, we have only the most rudimentary understanding of how the human brain conducts abstract symbolic transformations that underlie word and object meaning. Neuroscience has evolved a sophisticated arsenal of techniques for elucidating the architecture of conceptual representation. Nevertheless, theoretical convergence remains elusive. Here we describe several contrastive approaches to the organization of semantic knowledge, and in turn we offer our own perspective on two recurring questions in semantic memory research: (1) To what extent are conceptual representations mediated by sensorimotor knowledge (i.e., to what degree is semantic memory embodied)? (2) How might an embodied semantic system represent abstract concepts such as modularity, symbol, or proposition? To address these questions, we review the merits of sensorimotor (i.e., embodied) and amodal (i.e., disembodied) semantic theories and address the neurobiological constraints underlying each. We conclude that the shortcomings of both perspectives in their extreme forms necessitate a hybrid middle ground. We accordingly propose the Dynamic Multilevel Reactivation Framework, an integrative model predicated upon flexible interplay between sensorimotor and amodal symbolic representations mediated by multiple cortical hubs. We discuss applications of the Dynamic Multilevel Reactivation Framework to abstract and concrete concept representation and describe how a multidimensional conceptual topography based on emotion, sensation, and magnitude can successfully frame a semantic space containing meanings for both abstract and concrete words. The consideration of 'abstract conceptual features' does not diminish the role of logical and/or executive processing in activating, manipulating and using information stored in conceptual representations. Rather, it proposes that the materials upon which these processes operate necessarily combine sensorimotor information and higher-order cognitive dimensions involved in symbolic representation.

  17. Systematic identification of latent disease-gene associations from PubMed articles.

    PubMed

    Zhang, Yuji; Shen, Feichen; Mojarad, Majid Rastegar; Li, Dingcheng; Liu, Sijia; Tao, Cui; Yu, Yue; Liu, Hongfang

    2018-01-01

Recent scientific advances have accumulated a tremendous amount of biomedical knowledge, providing novel insights into the relationships between molecular and cellular processes and diseases. Literature mining is one of the methods commonly used to retrieve and extract information from scientific publications for understanding these associations. However, the large data volume and complicated, noisy associations make such association data challenging to interpret for semantic knowledge discovery. In this study, we describe an integrative computational framework aiming to expedite the discovery of latent disease mechanisms by dissecting 146,245 disease-gene associations from over 25 million PubMed-indexed articles. We take advantage of both Latent Dirichlet Allocation (LDA) modeling and network-based analysis for their respective capabilities of detecting latent associations and reducing noise in large data volumes. Our results demonstrate that (1) LDA-based modeling is able to group similar diseases into disease topics; (2) the disease-specific association networks follow the scale-free network property; (3) certain subnetwork patterns are enriched in the disease-specific association networks; and (4) genes are enriched in topic-specific biological processes. Our approach offers promising opportunities for latent disease-gene knowledge discovery in biomedical research.
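The topic-modeling step described above can be illustrated with a small, self-contained sketch. The toy corpus, the fixed topic count, and all parameter choices below are invented for demonstration and are not the study's actual pipeline; scikit-learn's `LatentDirichletAllocation` merely stands in for whatever LDA implementation the authors used.

```python
# Hedged sketch: grouping disease-related documents into latent topics
# with LDA, in the spirit of step (1) of the framework. Toy corpus only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "insulin glucose pancreas beta cell diabetes",
    "glucose insulin receptor metabolism diabetes",
    "amyloid plaque tau neurodegeneration alzheimer",
    "tau protein cognitive decline alzheimer brain",
]

# Bag-of-words counts, then a 2-topic LDA model.
X = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)  # one row per document; each row sums to 1

# Each document's dominant topic; documents about the same disease
# should tend to share a topic once the corpus is large enough.
dominant = doc_topics.argmax(axis=1)
print(dominant)
```

In the paper's setting the corpus is tens of millions of PubMed abstracts and the number of topics is tuned rather than fixed at two; the mechanics of fitting and reading off document-topic weights are the same.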

  18. Systematic identification of latent disease-gene associations from PubMed articles

    PubMed Central

    Mojarad, Majid Rastegar; Li, Dingcheng; Liu, Sijia; Tao, Cui; Yu, Yue; Liu, Hongfang

    2018-01-01

Recent scientific advances have accumulated a tremendous amount of biomedical knowledge, providing novel insights into the relationships between molecular and cellular processes and diseases. Literature mining is one of the methods commonly used to retrieve and extract information from scientific publications for understanding these associations. However, the large data volume and complicated, noisy associations make such association data challenging to interpret for semantic knowledge discovery. In this study, we describe an integrative computational framework aiming to expedite the discovery of latent disease mechanisms by dissecting 146,245 disease-gene associations from over 25 million PubMed-indexed articles. We take advantage of both Latent Dirichlet Allocation (LDA) modeling and network-based analysis for their respective capabilities of detecting latent associations and reducing noise in large data volumes. Our results demonstrate that (1) LDA-based modeling is able to group similar diseases into disease topics; (2) the disease-specific association networks follow the scale-free network property; (3) certain subnetwork patterns are enriched in the disease-specific association networks; and (4) genes are enriched in topic-specific biological processes. Our approach offers promising opportunities for latent disease-gene knowledge discovery in biomedical research. PMID:29373609

  19. Modelling and approaching pragmatic interoperability of distributed geoscience data

    NASA Astrophysics Data System (ADS)

    Ma, Xiaogang

    2010-05-01

Interoperability of geodata, which is essential for sharing information and discovering insights within a cyberinfrastructure, is receiving increasing attention. A key requirement of interoperability in the context of geodata sharing is that data provided by local sources can be accessed, decoded, understood and appropriately used by external users. Researchers have identified four levels of data interoperability issues: system, syntax, schematics and semantics, which relate respectively to the platform, encoding, structure and meaning of geodata. Ontology-driven approaches to the schematic and semantic interoperability of geodata have been studied extensively over the last decade. Ontologies come in different types (e.g. top-level, domain and application ontologies) and display forms (e.g. glossaries, thesauri, conceptual schemas and logical theories). Many geodata providers maintain their own local application ontologies in order to drive standardization of their local databases. However, semantic heterogeneities often exist between these local ontologies, even when they derive from equivalent disciplines. In response, common ontologies (e.g. NADM, SWEET) are being studied in different geoscience disciplines as a standardization procedure to coordinate diverse local ontologies. Semantic mediation, e.g. mapping between local ontologies or mapping local ontologies to common ontologies, has been studied as an effective way of achieving semantic interoperability between local ontologies and thus reconciling semantic heterogeneities in multi-source geodata. Nevertheless, confusion still exists in the research field of semantic interoperability. One problem is caused by eliminating elements of local pragmatic contexts in semantic mediation. In contrast to the context-independence of a common domain ontology, local application ontologies are closely tied to elements (e.g. people, time, location, intention, procedure, consequence) of local pragmatic contexts and are thus context-dependent. Eliminating these elements inevitably leads to information loss in semantic mediation between local ontologies; correspondingly, the understanding and effect of exchanged data in a new context may differ from those in its original context. Another problem is the dilemma of how to balance flexibility against standardization of local ontologies, because ontologies are not fixed but continuously evolving. It is commonly recognized that a single unified ontology cannot replace all local ontologies, because they are context-dependent and need flexibility. Without coordination through standards, however, freely developed local ontologies and databases require enormous mediation effort. Finding a balance between standardization and flexibility for evolving ontologies, in a practical sense, requires negotiations (i.e. conversations, agreements and collaborations) between different local pragmatic contexts. The purpose of this work is to set up a computer-friendly model representing local pragmatic contexts (i.e. geodata sources) and to propose a practical semantic negotiation procedure for approaching pragmatic interoperability between them. Information agents, objective facts and subjective dimensions are reviewed as elements of a conceptual model for representing pragmatic contexts, and the author uses them to outline a practical semantic negotiation procedure for approaching pragmatic interoperability of distributed geodata. The proposed conceptual model and semantic negotiation procedure were encoded in Description Logic and then applied to analyze and manipulate semantic negotiations between different local ontologies within the National Mineral Resources Assessment (NMRA) project of China, which involves multi-source, multi-subject geodata sharing.
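The semantic-mediation step that the abstract critiques can be sketched minimally. The local terms, common concepts and dict-based mappings below are all hypothetical stand-ins; a real system would express such mappings in OWL (e.g. with owl:equivalentClass) rather than Python dicts.

```python
# Hypothetical sketch: two local application ontologies mapped onto a
# common ontology. Bare term mappings reconcile vocabulary, but the
# local pragmatic context (who, when, where, why) is dropped -- exactly
# the information loss the abstract argues against.
MAPPING_SURVEY_A = {
    "OreBody": "common:MineralDeposit",
    "DrillHole": "common:Borehole",
}
MAPPING_SURVEY_B = {
    "Deposit": "common:MineralDeposit",
    "Bore": "common:Borehole",
}

def mediate(local_term, mapping):
    """Resolve a local term to its common-ontology concept, if mapped."""
    return mapping.get(local_term)

# Heterogeneous local terms resolve to the same shared concept:
a = mediate("OreBody", MAPPING_SURVEY_A)
b = mediate("Deposit", MAPPING_SURVEY_B)
assert a == b == "common:MineralDeposit"
```

The paper's point is that such mappings alone are not enough: the negotiation procedure it proposes would additionally carry context elements (people, time, location, intention) alongside each mapping, rather than discarding them.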

  20. Reference architecture and interoperability model for data mining and fusion in scientific cross-domain infrastructures

    NASA Astrophysics Data System (ADS)

    Haener, Rainer; Waechter, Joachim; Grellet, Sylvain; Robida, Francois

    2017-04-01

Interoperability is the key factor in establishing scientific research environments and infrastructures, as well as in bringing together heterogeneous, geographically distributed risk management, monitoring and early warning systems. Based on developments within the European Plate Observing System (EPOS), a reference architecture has been devised that comprises architectural blueprints and interoperability models for the specification of business processes and logic as well as the encoding of data, metadata and semantics. The architectural blueprint follows the so-called service-oriented architecture (SOA) 2.0 paradigm, which combines the intelligence and proactiveness of event-driven architectures with service-oriented architectures. SOA 2.0 supports analysing (Data Mining) both static and real-time data in order to find correlations between disparate pieces of information that are not intuitively obvious at first: analysed data (e.g. seismological monitoring) can be enhanced with relationships discovered by associating them (Data Fusion) with other data (e.g. creepmeter monitoring), with digital models of geological structures, or with simulations of geological processes. The interoperability model describes the information, communication (conversations) and interactions (choreographies) of all participants involved, as well as the processes for registering, providing and retrieving information. It is based on the principle of functional integration, implemented via dedicated services communicating over service-oriented and message-driven infrastructures. The services expose their functionality via standardised interfaces: instead of requesting data directly, users share data via services built upon specific adapters. This approach replaces tight coupling at the data level with a flexible dependency on loosely coupled services. The main component of the interoperability model is the comprehensive semantic description of the information, business logic and processes on the basis of a minimal set of well-known, established standards. It represents knowledge by applying domain-controlled vocabularies to statements about resources, information, facts and complex matters (ontologies). Seismic experts, for example, would be interested in geological models or borehole measurements at a certain depth, on the basis of which seismic profiles can be correlated and verified. The entire model is built upon standards from the Open Geospatial Consortium (dictionaries, service layer), the International Organisation for Standardisation (registries, metadata) and the World Wide Web Consortium (Resource Description Framework, Spatial Data on the Web Best Practices). Notably, this approach is scalable to the greatest possible extent: all information necessary in the context of cross-domain infrastructures is referenced via vocabularies and knowledge bases containing statements that provide either the information itself or resources (service endpoints) from which it can be retrieved. The entire infrastructure communication is subject to a broker-based business-logic integration platform in which the information exchanged between the participants is managed on the basis of standardised dictionaries, repositories and registries. This approach also enables the development of Systems-of-Systems (SoS), which allow autonomous, large-scale, concurrent and distributed systems to collaborate, cooperatively interacting as a collective in a common environment.
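The loosely coupled, broker-mediated communication described above can be sketched with a minimal publish/subscribe broker. The topic name and message payload are invented for illustration; EPOS itself builds on standardised vocabularies and message infrastructure rather than this toy class.

```python
# Minimal publish/subscribe broker: services exchange messages via named
# topics instead of calling each other directly, so producers and
# consumers stay loosely coupled.
from collections import defaultdict

class Broker:
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        """Register a handler for all messages published on a topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        """Deliver a message to every subscriber of the topic."""
        for handler in self._subscribers[topic]:
            handler(message)

broker = Broker()
fused = []

# A data-fusion service subscribes without knowing the producer...
broker.subscribe("epos/seismology/event", fused.append)

# ...and a monitoring service publishes without knowing the consumers.
broker.publish("epos/seismology/event", {"magnitude": 4.2, "depth_km": 10.0})
assert fused == [{"magnitude": 4.2, "depth_km": 10.0}]
```

Replacing the direct call with a topic is what makes the dependency "flexible": either side can be swapped out or scaled without the other noticing, which is the design choice the reference architecture relies on.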
