Fuzzy Emotional Semantic Analysis and Automated Annotation of Scene Images
Cao, Jianfang; Chen, Lichao
2015-01-01
With the advances in electronic and imaging techniques, the production of digital images has rapidly increased, and the extraction and automated annotation of emotional semantics implied by images have become issues that must be urgently addressed. To better simulate human subjectivity and ambiguity for understanding scene images, the current study proposes an emotional semantic annotation method for scene images based on fuzzy set theory. A fuzzy membership degree was calculated to describe the emotional degree of a scene image and was implemented using the Adaboost algorithm and a back-propagation (BP) neural network. The automated annotation method was trained and tested using scene images from the SUN Database. The annotation results were then compared with those based on artificial annotation. Our method showed an annotation accuracy rate of 91.2% for basic emotional values and 82.4% after extended emotional values were added, which correspond to increases of 5.5% and 8.9%, respectively, compared with the results from using a single BP neural network algorithm. Furthermore, the retrieval accuracy rate based on our method reached approximately 89%. This study attempts to lay a solid foundation for the automated emotional semantic annotation of more types of images and therefore is of practical significance. PMID:25838818
Modeling loosely annotated images using both given and imagined annotations
NASA Astrophysics Data System (ADS)
Tang, Hong; Boujemaa, Nozha; Chen, Yunhao; Deng, Lei
2011-12-01
In this paper, we present an approach to learn latent semantic analysis models from loosely annotated images for automatic image annotation and indexing. The given annotation in training images is loose due to: 1. ambiguous correspondences between visual features and annotated keywords; 2. incomplete lists of annotated keywords. The second reason motivates us to enrich the incomplete annotation in a simple way before learning a topic model. In particular, some ``imagined'' keywords are poured into the incomplete annotation through measuring similarity between keywords in terms of their co-occurrence. Then, both given and imagined annotations are employed to learn probabilistic topic models for automatically annotating new images. We conduct experiments on two image databases (i.e., Corel and ESP) coupled with their loose annotations, and compare the proposed method with state-of-the-art discrete annotation methods. The proposed method improves word-driven probability latent semantic analysis (PLSA-words) up to a comparable performance with the best discrete annotation method, while a merit of PLSA-words is still kept, i.e., a wider semantic range.
Real-time image annotation by manifold-based biased Fisher discriminant analysis
NASA Astrophysics Data System (ADS)
Ji, Rongrong; Yao, Hongxun; Wang, Jicheng; Sun, Xiaoshuai; Liu, Xianming
2008-01-01
Automatic Linguistic Annotation is a promising solution to bridge the semantic gap in content-based image retrieval. However, two crucial issues are not well addressed in state-of-art annotation algorithms: 1. The Small Sample Size (3S) problem in keyword classifier/model learning; 2. Most of annotation algorithms can not extend to real-time online usage due to their low computational efficiencies. This paper presents a novel Manifold-based Biased Fisher Discriminant Analysis (MBFDA) algorithm to address these two issues by transductive semantic learning and keyword filtering. To address the 3S problem, Co-Training based Manifold learning is adopted for keyword model construction. To achieve real-time annotation, a Bias Fisher Discriminant Analysis (BFDA) based semantic feature reduction algorithm is presented for keyword confidence discrimination and semantic feature reduction. Different from all existing annotation methods, MBFDA views image annotation from a novel Eigen semantic feature (which corresponds to keywords) selection aspect. As demonstrated in experiments, our manifold-based biased Fisher discriminant analysis annotation algorithm outperforms classical and state-of-art annotation methods (1.K-NN Expansion; 2.One-to-All SVM; 3.PWC-SVM) in both computational time and annotation accuracy with a large margin.
Ontology modularization to improve semantic medical image annotation.
Wennerberg, Pinar; Schulz, Klaus; Buitelaar, Paul
2011-02-01
Searching for medical images and patient reports is a significant challenge in a clinical setting. The contents of such documents are often not described in sufficient detail thus making it difficult to utilize the inherent wealth of information contained within them. Semantic image annotation addresses this problem by describing the contents of images and reports using medical ontologies. Medical images and patient reports are then linked to each other through common annotations. Subsequently, search algorithms can more effectively find related sets of documents on the basis of these semantic descriptions. A prerequisite to realizing such a semantic search engine is that the data contained within should have been previously annotated with concepts from medical ontologies. One major challenge in this regard is the size and complexity of medical ontologies as annotation sources. Manual annotation is particularly time consuming labor intensive in a clinical environment. In this article we propose an approach to reducing the size of clinical ontologies for more efficient manual image and text annotation. More precisely, our goal is to identify smaller fragments of a large anatomy ontology that are relevant for annotating medical images from patients suffering from lymphoma. Our work is in the area of ontology modularization, which is a recent and active field of research. We describe our approach, methods and data set in detail and we discuss our results. Copyright © 2010 Elsevier Inc. All rights reserved.
iPad: Semantic annotation and markup of radiological images.
Rubin, Daniel L; Rodriguez, Cesar; Shah, Priyanka; Beaulieu, Chris
2008-11-06
Radiological images contain a wealth of information,such as anatomy and pathology, which is often not explicit and computationally accessible. Information schemes are being developed to describe the semantic content of images, but such schemes can be unwieldy to operationalize because there are few tools to enable users to capture structured information easily as part of the routine research workflow. We have created iPad, an open source tool enabling researchers and clinicians to create semantic annotations on radiological images. iPad hides the complexity of the underlying image annotation information model from users, permitting them to describe images and image regions using a graphical interface that maps their descriptions to structured ontologies semi-automatically. Image annotations are saved in a variety of formats,enabling interoperability among medical records systems, image archives in hospitals, and the Semantic Web. Tools such as iPad can help reduce the burden of collecting structured information from images, and it could ultimately enable researchers and physicians to exploit images on a very large scale and glean the biological and physiological significance of image content.
SemVisM: semantic visualizer for medical image
NASA Astrophysics Data System (ADS)
Landaeta, Luis; La Cruz, Alexandra; Baranya, Alexander; Vidal, María.-Esther
2015-01-01
SemVisM is a toolbox that combines medical informatics and computer graphics tools for reducing the semantic gap between low-level features and high-level semantic concepts/terms in the images. This paper presents a novel strategy for visualizing medical data annotated semantically, combining rendering techniques, and segmentation algorithms. SemVisM comprises two main components: i) AMORE (A Modest vOlume REgister) to handle input data (RAW, DAT or DICOM) and to initially annotate the images using terms defined on medical ontologies (e.g., MesH, FMA or RadLex), and ii) VOLPROB (VOlume PRObability Builder) for generating the annotated volumetric data containing the classified voxels that belong to a particular tissue. SemVisM is built on top of the semantic visualizer ANISE.1
Kurtz, Camille; Depeursinge, Adrien; Napel, Sandy; Beaulieu, Christopher F.; Rubin, Daniel L.
2014-01-01
Computer-assisted image retrieval applications can assist radiologists by identifying similar images in archives as a means to providing decision support. In the classical case, images are described using low-level features extracted from their contents, and an appropriate distance is used to find the best matches in the feature space. However, using low-level image features to fully capture the visual appearance of diseases is challenging and the semantic gap between these features and the high-level visual concepts in radiology may impair the system performance. To deal with this issue, the use of semantic terms to provide high-level descriptions of radiological image contents has recently been advocated. Nevertheless, most of the existing semantic image retrieval strategies are limited by two factors: they require manual annotation of the images using semantic terms and they ignore the intrinsic visual and semantic relationships between these annotations during the comparison of the images. Based on these considerations, we propose an image retrieval framework based on semantic features that relies on two main strategies: (1) automatic “soft” prediction of ontological terms that describe the image contents from multi-scale Riesz wavelets and (2) retrieval of similar images by evaluating the similarity between their annotations using a new term dissimilarity measure, which takes into account both image-based and ontological term relations. The combination of these strategies provides a means of accurately retrieving similar images in databases based on image annotations and can be considered as a potential solution to the semantic gap problem. We validated this approach in the context of the retrieval of liver lesions from computed tomographic (CT) images and annotated with semantic terms of the RadLex ontology. The relevance of the retrieval results was assessed using two protocols: evaluation relative to a dissimilarity reference standard defined for pairs of images on a 25-images dataset, and evaluation relative to the diagnoses of the retrieved images on a 72-images dataset. A normalized discounted cumulative gain (NDCG) score of more than 0.92 was obtained with the first protocol, while AUC scores of more than 0.77 were obtained with the second protocol. This automatical approach could provide real-time decision support to radiologists by showing them similar images with associated diagnoses and, where available, responses to therapies. PMID:25036769
Comparative analysis of semantic localization accuracies between adult and pediatric DICOM CT images
NASA Astrophysics Data System (ADS)
Robertson, Duncan; Pathak, Sayan D.; Criminisi, Antonio; White, Steve; Haynor, David; Chen, Oliver; Siddiqui, Khan
2012-02-01
Existing literature describes a variety of techniques for semantic annotation of DICOM CT images, i.e. the automatic detection and localization of anatomical structures. Semantic annotation facilitates enhanced image navigation, linkage of DICOM image content and non-image clinical data, content-based image retrieval, and image registration. A key challenge for semantic annotation algorithms is inter-patient variability. However, while the algorithms described in published literature have been shown to cope adequately with the variability in test sets comprising adult CT scans, the problem presented by the even greater variability in pediatric anatomy has received very little attention. Most existing semantic annotation algorithms can only be extended to work on scans of both adult and pediatric patients by adapting parameters heuristically in light of patient size. In contrast, our approach, which uses random regression forests ('RRF'), learns an implicit model of scale variation automatically using training data. In consequence, anatomical structures can be localized accurately in both adult and pediatric CT studies without the need for parameter adaptation or additional information about patient scale. We show how the RRF algorithm is able to learn scale invariance from a combined training set containing a mixture of pediatric and adult scans. Resulting localization accuracy for both adult and pediatric data remains comparable with that obtained using RRFs trained and tested using only adult data.
Linking DICOM pixel data with radiology reports using automatic semantic annotation
NASA Astrophysics Data System (ADS)
Pathak, Sayan D.; Kim, Woojin; Munasinghe, Indeera; Criminisi, Antonio; White, Steve; Siddiqui, Khan
2012-02-01
Improved access to DICOM studies to both physicians and patients is changing the ways medical imaging studies are visualized and interpreted beyond the confines of radiologists' PACS workstations. While radiologists are trained for viewing and image interpretation, a non-radiologist physician relies on the radiologists' reports. Consequently, patients historically have been typically informed about their imaging findings via oral communication with their physicians, even though clinical studies have shown that patients respond to physician's advice significantly better when the individual patients are shown their own actual data. Our previous work on automated semantic annotation of DICOM Computed Tomography (CT) images allows us to further link radiology report with the corresponding images, enabling us to bridge the gap between image data with the human interpreted textual description of the corresponding imaging studies. The mapping of radiology text is facilitated by natural language processing (NLP) based search application. When combined with our automated semantic annotation of images, it enables navigation in large DICOM studies by clicking hyperlinked text in the radiology reports. An added advantage of using semantic annotation is the ability to render the organs to their default window level setting thus eliminating another barrier to image sharing and distribution. We believe such approaches would potentially enable the consumer to have access to their imaging data and navigate them in an informed manner.
Semantic-gap-oriented active learning for multilabel image annotation.
Tang, Jinhui; Zha, Zheng-Jun; Tao, Dacheng; Chua, Tat-Seng
2012-04-01
User interaction is an effective way to handle the semantic gap problem in image annotation. To minimize user effort in the interactions, many active learning methods were proposed. These methods treat the semantic concepts individually or correlatively. However, they still neglect the key motivation of user feedback: to tackle the semantic gap. The size of the semantic gap of each concept is an important factor that affects the performance of user feedback. User should pay more efforts to the concepts with large semantic gaps, and vice versa. In this paper, we propose a semantic-gap-oriented active learning method, which incorporates the semantic gap measure into the information-minimization-based sample selection strategy. The basic learning model used in the active learning framework is an extended multilabel version of the sparse-graph-based semisupervised learning method that incorporates the semantic correlation. Extensive experiments conducted on two benchmark image data sets demonstrated the importance of bringing the semantic gap measure into the active learning process.
A neotropical Miocene pollen database employing image-based search and semantic modeling.
Han, Jing Ginger; Cao, Hongfei; Barb, Adrian; Punyasena, Surangi W; Jaramillo, Carlos; Shyu, Chi-Ren
2014-08-01
Digital microscopic pollen images are being generated with increasing speed and volume, producing opportunities to develop new computational methods that increase the consistency and efficiency of pollen analysis and provide the palynological community a computational framework for information sharing and knowledge transfer. • Mathematical methods were used to assign trait semantics (abstract morphological representations) of the images of neotropical Miocene pollen and spores. Advanced database-indexing structures were built to compare and retrieve similar images based on their visual content. A Web-based system was developed to provide novel tools for automatic trait semantic annotation and image retrieval by trait semantics and visual content. • Mathematical models that map visual features to trait semantics can be used to annotate images with morphology semantics and to search image databases with improved reliability and productivity. Images can also be searched by visual content, providing users with customized emphases on traits such as color, shape, and texture. • Content- and semantic-based image searches provide a powerful computational platform for pollen and spore identification. The infrastructure outlined provides a framework for building a community-wide palynological resource, streamlining the process of manual identification, analysis, and species discovery.
Semantically Interoperable XML Data
Vergara-Niedermayr, Cristobal; Wang, Fusheng; Pan, Tony; Kurc, Tahsin; Saltz, Joel
2013-01-01
XML is ubiquitously used as an information exchange platform for web-based applications in healthcare, life sciences, and many other domains. Proliferating XML data are now managed through latest native XML database technologies. XML data sources conforming to common XML schemas could be shared and integrated with syntactic interoperability. Semantic interoperability can be achieved through semantic annotations of data models using common data elements linked to concepts from ontologies. In this paper, we present a framework and software system to support the development of semantic interoperable XML based data sources that can be shared through a Grid infrastructure. We also present our work on supporting semantic validated XML data through semantic annotations for XML Schema, semantic validation and semantic authoring of XML data. We demonstrate the use of the system for a biomedical database of medical image annotations and markups. PMID:25298789
A memory learning framework for effective image retrieval.
Han, Junwei; Ngan, King N; Li, Mingjing; Zhang, Hong-Jiang
2005-04-01
Most current content-based image retrieval systems are still incapable of providing users with their desired results. The major difficulty lies in the gap between low-level image features and high-level image semantics. To address the problem, this study reports a framework for effective image retrieval by employing a novel idea of memory learning. It forms a knowledge memory model to store the semantic information by simply accumulating user-provided interactions. A learning strategy is then applied to predict the semantic relationships among images according to the memorized knowledge. Image queries are finally performed based on a seamless combination of low-level features and learned semantics. One important advantage of our framework is its ability to efficiently annotate images and also propagate the keyword annotation from the labeled images to unlabeled images. The presented algorithm has been integrated into a practical image retrieval system. Experiments on a collection of 10,000 general-purpose images demonstrate the effectiveness of the proposed framework.
A neotropical Miocene pollen database employing image-based search and semantic modeling1
Han, Jing Ginger; Cao, Hongfei; Barb, Adrian; Punyasena, Surangi W.; Jaramillo, Carlos; Shyu, Chi-Ren
2014-01-01
• Premise of the study: Digital microscopic pollen images are being generated with increasing speed and volume, producing opportunities to develop new computational methods that increase the consistency and efficiency of pollen analysis and provide the palynological community a computational framework for information sharing and knowledge transfer. • Methods: Mathematical methods were used to assign trait semantics (abstract morphological representations) of the images of neotropical Miocene pollen and spores. Advanced database-indexing structures were built to compare and retrieve similar images based on their visual content. A Web-based system was developed to provide novel tools for automatic trait semantic annotation and image retrieval by trait semantics and visual content. • Results: Mathematical models that map visual features to trait semantics can be used to annotate images with morphology semantics and to search image databases with improved reliability and productivity. Images can also be searched by visual content, providing users with customized emphases on traits such as color, shape, and texture. • Discussion: Content- and semantic-based image searches provide a powerful computational platform for pollen and spore identification. The infrastructure outlined provides a framework for building a community-wide palynological resource, streamlining the process of manual identification, analysis, and species discovery. PMID:25202648
Semantics-Based Intelligent Indexing and Retrieval of Digital Images - A Case Study
NASA Astrophysics Data System (ADS)
Osman, Taha; Thakker, Dhavalkumar; Schaefer, Gerald
The proliferation of digital media has led to a huge interest in classifying and indexing media objects for generic search and usage. In particular, we are witnessing colossal growth in digital image repositories that are difficult to navigate using free-text search mechanisms, which often return inaccurate matches as they typically rely on statistical analysis of query keyword recurrence in the image annotation or surrounding text. In this chapter we present a semantically enabled image annotation and retrieval engine that is designed to satisfy the requirements of commercial image collections market in terms of both accuracy and efficiency of the retrieval process. Our search engine relies on methodically structured ontologies for image annotation, thus allowing for more intelligent reasoning about the image content and subsequently obtaining a more accurate set of results and a richer set of alternatives matchmaking the original query. We also show how our well-analysed and designed domain ontology contributes to the implicit expansion of user queries as well as presenting our initial thoughts on exploiting lexical databases for explicit semantic-based query expansion.
Modeling semantic aspects for cross-media image indexing.
Monay, Florent; Gatica-Perez, Daniel
2007-10-01
To go beyond the query-by-example paradigm in image retrieval, there is a need for semantic indexing of large image collections for intuitive text-based image search. Different models have been proposed to learn the dependencies between the visual content of an image set and the associated text captions, then allowing for the automatic creation of semantic indices for unannotated images. The task, however, remains unsolved. In this paper, we present three alternatives to learn a Probabilistic Latent Semantic Analysis model (PLSA) for annotated images, and evaluate their respective performance for automatic image indexing. Under the PLSA assumptions, an image is modeled as a mixture of latent aspects that generates both image features and text captions, and we investigate three ways to learn the mixture of aspects. We also propose a more discriminative image representation than the traditional Blob histogram, concatenating quantized local color information and quantized local texture descriptors. The first learning procedure of a PLSA model for annotated images is a standard EM algorithm, which implicitly assumes that the visual and the textual modalities can be treated equivalently. The other two models are based on an asymmetric PLSA learning, allowing to constrain the definition of the latent space on the visual or on the textual modality. We demonstrate that the textual modality is more appropriate to learn a semantically meaningful latent space, which translates into improved annotation performance. A comparison of our learning algorithms with respect to recent methods on a standard dataset is presented, and a detailed evaluation of the performance shows the validity of our framework.
NASA Astrophysics Data System (ADS)
Herold, Julia; Abouna, Sylvie; Zhou, Luxian; Pelengaris, Stella; Epstein, David B. A.; Khan, Michael; Nattkemper, Tim W.
2009-02-01
In the last years, bioimaging has turned from qualitative measurements towards a high-throughput and highcontent modality, providing multiple variables for each biological sample analyzed. We present a system which combines machine learning based semantic image annotation and visual data mining to analyze such new multivariate bioimage data. Machine learning is employed for automatic semantic annotation of regions of interest. The annotation is the prerequisite for a biological object-oriented exploration of the feature space derived from the image variables. With the aid of visual data mining, the obtained data can be explored simultaneously in the image as well as in the feature domain. Especially when little is known of the underlying data, for example in the case of exploring the effects of a drug treatment, visual data mining can greatly aid the process of data evaluation. We demonstrate how our system is used for image evaluation to obtain information relevant to diabetes study and screening of new anti-diabetes treatments. Cells of the Islet of Langerhans and whole pancreas in pancreas tissue samples are annotated and object specific molecular features are extracted from aligned multichannel fluorescence images. These are interactively evaluated for cell type classification in order to determine the cell number and mass. Only few parameters need to be specified which makes it usable also for non computer experts and allows for high-throughput analysis.
Automatic medical image annotation and keyword-based image retrieval using relevance feedback.
Ko, Byoung Chul; Lee, JiHyeon; Nam, Jae-Yeal
2012-08-01
This paper presents novel multiple keywords annotation for medical images, keyword-based medical image retrieval, and relevance feedback method for image retrieval for enhancing image retrieval performance. For semantic keyword annotation, this study proposes a novel medical image classification method combining local wavelet-based center symmetric-local binary patterns with random forests. For keyword-based image retrieval, our retrieval system use the confidence score that is assigned to each annotated keyword by combining probabilities of random forests with predefined body relation graph. To overcome the limitation of keyword-based image retrieval, we combine our image retrieval system with relevance feedback mechanism based on visual feature and pattern classifier. Compared with other annotation and relevance feedback algorithms, the proposed method shows both improved annotation performance and accurate retrieval results.
Hierarchy-associated semantic-rule inference framework for classifying indoor scenes
NASA Astrophysics Data System (ADS)
Yu, Dan; Liu, Peng; Ye, Zhipeng; Tang, Xianglong; Zhao, Wei
2016-03-01
Typically, the initial task of classifying indoor scenes is challenging, because the spatial layout and decoration of a scene can vary considerably. Recent efforts at classifying object relationships commonly depend on the results of scene annotation and predefined rules, making classification inflexible. Furthermore, annotation results are easily affected by external factors. Inspired by human cognition, a scene-classification framework was proposed using the empirically based annotation (EBA) and a match-over rule-based (MRB) inference system. The semantic hierarchy of images is exploited by EBA to construct rules empirically for MRB classification. The problem of scene classification is divided into low-level annotation and high-level inference from a macro perspective. Low-level annotation involves detecting the semantic hierarchy and annotating the scene with a deformable-parts model and a bag-of-visual-words model. In high-level inference, hierarchical rules are extracted to train the decision tree for classification. The categories of testing samples are generated from the parts to the whole. Compared with traditional classification strategies, the proposed semantic hierarchy and corresponding rules reduce the effect of a variable background and improve the classification performance. The proposed framework was evaluated on a popular indoor scene dataset, and the experimental results demonstrate its effectiveness.
Automatic textual annotation of video news based on semantic visual object extraction
NASA Astrophysics Data System (ADS)
Boujemaa, Nozha; Fleuret, Francois; Gouet, Valerie; Sahbi, Hichem
2003-12-01
In this paper, we present our work for automatic generation of textual metadata based on visual content analysis of video news. We present two methods for semantic object detection and recognition from a cross modal image-text thesaurus. These thesaurus represent a supervised association between models and semantic labels. This paper is concerned with two semantic objects: faces and Tv logos. In the first part, we present our work for efficient face detection and recogniton with automatic name generation. This method allows us also to suggest the textual annotation of shots close-up estimation. On the other hand, we were interested to automatically detect and recognize different Tv logos present on incoming different news from different Tv Channels. This work was done jointly with the French Tv Channel TF1 within the "MediaWorks" project that consists on an hybrid text-image indexing and retrieval plateform for video news.
NASA Astrophysics Data System (ADS)
Frikha, Mayssa; Fendri, Emna; Hammami, Mohamed
2017-09-01
Using semantic attributes such as gender, clothes, and accessories to describe people's appearance is an appealing modeling method for video surveillance applications. We proposed a midlevel appearance signature based on extracting a list of nameable semantic attributes describing the body in uncontrolled acquisition conditions. Conventional approaches extract the same set of low-level features to learn the semantic classifiers uniformly. Their critical limitation is the inability to capture the dominant visual characteristics for each trait separately. The proposed approach consists of extracting low-level features in an attribute-adaptive way by automatically selecting the most relevant features for each attribute separately. Furthermore, relying on a small training-dataset would easily lead to poor performance due to the large intraclass and interclass variations. We annotated large scale people images collected from different person reidentification benchmarks covering a large attribute sample and reflecting the challenges of uncontrolled acquisition conditions. These annotations were gathered into an appearance semantic attribute dataset that contains 3590 images annotated with 14 attributes. Various experiments prove that carefully designed features for learning the visual characteristics for an attribute provide an improvement of the correct classification accuracy and a reduction of both spatial and temporal complexities against state-of-the-art approaches.
Kurtz, Camille; Beaulieu, Christopher F.; Napel, Sandy; Rubin, Daniel L.
2014-01-01
Computer-assisted image retrieval applications could assist radiologist interpretations by identifying similar images in large archives as a means to providing decision support. However, the semantic gap between low-level image features and their high level semantics may impair the system performances. Indeed, it can be challenging to comprehensively characterize the images using low-level imaging features to fully capture the visual appearance of diseases on images, and recently the use of semantic terms has been advocated to provide semantic descriptions of the visual contents of images. However, most of the existing image retrieval strategies do not consider the intrinsic properties of these terms during the comparison of the images beyond treating them as simple binary (presence/absence) features. We propose a new framework that includes semantic features in images and that enables retrieval of similar images in large databases based on their semantic relations. It is based on two main steps: (1) annotation of the images with semantic terms extracted from an ontology, and (2) evaluation of the similarity of image pairs by computing the similarity between the terms using the Hierarchical Semantic-Based Distance (HSBD) coupled to an ontological measure. The combination of these two steps provides a means of capturing the semantic correlations among the terms used to characterize the images that can be considered as a potential solution to deal with the semantic gap problem. We validate this approach in the context of the retrieval and the classification of 2D regions of interest (ROIs) extracted from computed tomographic (CT) images of the liver. Under this framework, retrieval accuracy of more than 0.96 was obtained on a 30-images dataset using the Normalized Discounted Cumulative Gain (NDCG) index that is a standard technique used to measure the effectiveness of information retrieval algorithms when a separate reference standard is available. Classification results of more than 95% were obtained on a 77-images dataset. For comparison purpose, the use of the Earth Mover's Distance (EMD), which is an alternative distance metric that considers all the existing relations among the terms, led to results retrieval accuracy of 0.95 and classification results of 93% with a higher computational cost. The results provided by the presented framework are competitive with the state-of-the-art and emphasize the usefulness of the proposed methodology for radiology image retrieval and classification. PMID:24632078
Semantator: semantic annotator for converting biomedical text to linked data.
Tao, Cui; Song, Dezhao; Sharma, Deepak; Chute, Christopher G
2013-10-01
More than 80% of biomedical data is embedded in plain text. The unstructured nature of these text-based documents makes it challenging to easily browse and query the data of interest in them. One approach to facilitate browsing and querying biomedical text is to convert the plain text to a linked web of data, i.e., converting data originally in free text to structured formats with defined meta-level semantics. In this paper, we introduce Semantator (Semantic Annotator), a semantic-web-based environment for annotating data of interest in biomedical documents, browsing and querying the annotated data, and interactively refining annotation results if needed. Through Semantator, information of interest can be either annotated manually or semi-automatically using plug-in information extraction tools. The annotated results will be stored in RDF and can be queried using the SPARQL query language. In addition, semantic reasoners can be directly applied to the annotated data for consistency checking and knowledge inference. Semantator has been released online and was used by the biomedical ontology community who provided positive feedbacks. Our evaluation results indicated that (1) Semantator can perform the annotation functionalities as designed; (2) Semantator can be adopted in real applications in clinical and transactional research; and (3) the annotated results using Semantator can be easily used in Semantic-web-based reasoning tools for further inference. Copyright © 2013 Elsevier Inc. All rights reserved.
Semantic annotation in biomedicine: the current landscape.
Jovanović, Jelena; Bagheri, Ebrahim
2017-09-22
The abundance and unstructured nature of biomedical texts, be it clinical or research content, impose significant challenges for the effective and efficient use of information and knowledge stored in such texts. Annotation of biomedical documents with machine intelligible semantics facilitates advanced, semantics-based text management, curation, indexing, and search. This paper focuses on annotation of biomedical entity mentions with concepts from relevant biomedical knowledge bases such as UMLS. As a result, the meaning of those mentions is unambiguously and explicitly defined, and thus made readily available for automated processing. This process is widely known as semantic annotation, and the tools that perform it are known as semantic annotators.Over the last dozen years, the biomedical research community has invested significant efforts in the development of biomedical semantic annotation technology. Aiming to establish grounds for further developments in this area, we review a selected set of state of the art biomedical semantic annotators, focusing particularly on general purpose annotators, that is, semantic annotation tools that can be customized to work with texts from any area of biomedicine. We also examine potential directions for further improvements of today's annotators which could make them even more capable of meeting the needs of real-world applications. To motivate and encourage further developments in this area, along the suggested and/or related directions, we review existing and potential practical applications and benefits of semantic annotators.
Semantic labeling of digital photos by classification
NASA Astrophysics Data System (ADS)
Ciocca, Gianluigi; Cusano, Claudio; Schettini, Raimondo; Brambilla, Carla
2003-01-01
The paper addresses the problem of annotating photographs with broad semantic labels. To cope with the great variety of photos available on the WEB we have designed a hierarchical classification strategy which first classifies images as pornographic or not-pornographic. Not-pornographic images are then classified as indoor, outdoor, or close-up. On a database of over 9000 images, mostly downloaded from the web, our method achieves an average accuracy of close to 90%.
Dugas, Martin; Meidt, Alexandra; Neuhaus, Philipp; Storck, Michael; Varghese, Julian
2016-06-01
The volume and complexity of patient data - especially in personalised medicine - is steadily increasing, both regarding clinical data and genomic profiles: Typically more than 1,000 items (e.g., laboratory values, vital signs, diagnostic tests etc.) are collected per patient in clinical trials. In oncology hundreds of mutations can potentially be detected for each patient by genomic profiling. Therefore data integration from multiple sources constitutes a key challenge for medical research and healthcare. Semantic annotation of data elements can facilitate to identify matching data elements in different sources and thereby supports data integration. Millions of different annotations are required due to the semantic richness of patient data. These annotations should be uniform, i.e., two matching data elements shall contain the same annotations. However, large terminologies like SNOMED CT or UMLS don't provide uniform coding. It is proposed to develop semantic annotations of medical data elements based on a large-scale public metadata repository. To achieve uniform codes, semantic annotations shall be re-used if a matching data element is available in the metadata repository. A web-based tool called ODMedit ( https://odmeditor.uni-muenster.de/ ) was developed to create data models with uniform semantic annotations. It contains ~800,000 terms with semantic annotations which were derived from ~5,800 models from the portal of medical data models (MDM). The tool was successfully applied to manually annotate 22 forms with 292 data items from CDISC and to update 1,495 data models of the MDM portal. Uniform manual semantic annotation of data models is feasible in principle, but requires a large-scale collaborative effort due to the semantic richness of patient data. A web-based tool for these annotations is available, which is linked to a public metadata repository.
RysannMD: A biomedical semantic annotator balancing speed and accuracy.
Cuzzola, John; Jovanović, Jelena; Bagheri, Ebrahim
2017-07-01
Recently, both researchers and practitioners have explored the possibility of semantically annotating large and continuously evolving collections of biomedical texts such as research papers, medical reports, and physician notes in order to enable their efficient and effective management and use in clinical practice or research laboratories. Such annotations can be automatically generated by biomedical semantic annotators - tools that are specifically designed for detecting and disambiguating biomedical concepts mentioned in text. The biomedical community has already presented several solid automated semantic annotators. However, the existing tools are either strong in their disambiguation capacity, i.e., the ability to identify the correct biomedical concept for a given piece of text among several candidate concepts, or they excel in their processing time, i.e., work very efficiently, but none of the semantic annotation tools reported in the literature has both of these qualities. In this paper, we present RysannMD (Ryerson Semantic Annotator for Medical Domain), a biomedical semantic annotation tool that strikes a balance between processing time and performance while disambiguating biomedical terms. In other words, RysannMD provides reasonable disambiguation performance when choosing the right sense for a biomedical term in a given context, and does that in a reasonable time. To examine how RysannMD stands with respect to the state of the art biomedical semantic annotators, we have conducted a series of experiments using standard benchmarking corpora, including both gold and silver standards, and four modern biomedical semantic annotators, namely cTAKES, MetaMap, NOBLE Coder, and Neji. The annotators were compared with respect to the quality of the produced annotations measured against gold and silver standards using precision, recall, and F 1 measure and speed, i.e., processing time. In the experiments, RysannMD achieved the best median F 1 measure across the benchmarking corpora, independent of the standard used (silver/gold), biomedical subdomain, and document size. In terms of the annotation speed, RysannMD scored the second best median processing time across all the experiments. The obtained results indicate that RysannMD offers the best performance among the examined semantic annotators when both quality of annotation and speed are considered simultaneously. Copyright © 2017 Elsevier Inc. All rights reserved.
Barrington, Luke; Turnbull, Douglas; Lanckriet, Gert
2012-01-01
Searching for relevant content in a massive amount of multimedia information is facilitated by accurately annotating each image, video, or song with a large number of relevant semantic keywords, or tags. We introduce game-powered machine learning, an integrated approach to annotating multimedia content that combines the effectiveness of human computation, through online games, with the scalability of machine learning. We investigate this framework for labeling music. First, a socially-oriented music annotation game called Herd It collects reliable music annotations based on the “wisdom of the crowds.” Second, these annotated examples are used to train a supervised machine learning system. Third, the machine learning system actively directs the annotation games to collect new data that will most benefit future model iterations. Once trained, the system can automatically annotate a corpus of music much larger than what could be labeled using human computation alone. Automatically annotated songs can be retrieved based on their semantic relevance to text-based queries (e.g., “funky jazz with saxophone,” “spooky electronica,” etc.). Based on the results presented in this paper, we find that actively coupling annotation games with machine learning provides a reliable and scalable approach to making searchable massive amounts of multimedia data. PMID:22460786
Game-powered machine learning.
Barrington, Luke; Turnbull, Douglas; Lanckriet, Gert
2012-04-24
Searching for relevant content in a massive amount of multimedia information is facilitated by accurately annotating each image, video, or song with a large number of relevant semantic keywords, or tags. We introduce game-powered machine learning, an integrated approach to annotating multimedia content that combines the effectiveness of human computation, through online games, with the scalability of machine learning. We investigate this framework for labeling music. First, a socially-oriented music annotation game called Herd It collects reliable music annotations based on the "wisdom of the crowds." Second, these annotated examples are used to train a supervised machine learning system. Third, the machine learning system actively directs the annotation games to collect new data that will most benefit future model iterations. Once trained, the system can automatically annotate a corpus of music much larger than what could be labeled using human computation alone. Automatically annotated songs can be retrieved based on their semantic relevance to text-based queries (e.g., "funky jazz with saxophone," "spooky electronica," etc.). Based on the results presented in this paper, we find that actively coupling annotation games with machine learning provides a reliable and scalable approach to making searchable massive amounts of multimedia data.
NASA Astrophysics Data System (ADS)
Johnson, Matthew; Brostow, G. J.; Shotton, J.; Kwatra, V.; Cipolla, R.
2007-02-01
Composite images are synthesized from existing photographs by artists who make concept art, e.g. storyboards for movies or architectural planning. Current techniques allow an artist to fabricate such an image by digitally splicing parts of stock photographs. While these images serve mainly to "quickly" convey how a scene should look, their production is laborious. We propose a technique that allows a person to design a new photograph with substantially less effort. This paper presents a method that generates a composite image when a user types in nouns, such as "boat" and "sand." The artist can optionally design an intended image by specifying other constraints. Our algorithm formulates the constraints as queries to search an automatically annotated image database. The desired photograph, not a collage, is then synthesized using graph-cut optimization, optionally allowing for further user interaction to edit or choose among alternative generated photos. Our results demonstrate our contributions of (1) a method of creating specific images with minimal human effort, and (2) a combined algorithm for automatically building an image library with semantic annotations from any photo collection.
Wide coverage biomedical event extraction using multiple partially overlapping corpora
2013-01-01
Background Biomedical events are key to understanding physiological processes and disease, and wide coverage extraction is required for comprehensive automatic analysis of statements describing biomedical systems in the literature. In turn, the training and evaluation of extraction methods requires manually annotated corpora. However, as manual annotation is time-consuming and expensive, any single event-annotated corpus can only cover a limited number of semantic types. Although combined use of several such corpora could potentially allow an extraction system to achieve broad semantic coverage, there has been little research into learning from multiple corpora with partially overlapping semantic annotation scopes. Results We propose a method for learning from multiple corpora with partial semantic annotation overlap, and implement this method to improve our existing event extraction system, EventMine. An evaluation using seven event annotated corpora, including 65 event types in total, shows that learning from overlapping corpora can produce a single, corpus-independent, wide coverage extraction system that outperforms systems trained on single corpora and exceeds previously reported results on two established event extraction tasks from the BioNLP Shared Task 2011. Conclusions The proposed method allows the training of a wide-coverage, state-of-the-art event extraction system from multiple corpora with partial semantic annotation overlap. The resulting single model makes broad-coverage extraction straightforward in practice by removing the need to either select a subset of compatible corpora or semantic types, or to merge results from several models trained on different individual corpora. Multi-corpus learning also allows annotation efforts to focus on covering additional semantic types, rather than aiming for exhaustive coverage in any single annotation effort, or extending the coverage of semantic types annotated in existing corpora. PMID:23731785
NASA Astrophysics Data System (ADS)
Vega, Francisco; Pérez, Wilson; Tello, Andrés.; Saquicela, Victor; Espinoza, Mauricio; Solano-Quinde, Lizandro; Vidal, Maria-Esther; La Cruz, Alexandra
2015-12-01
Advances in medical imaging have fostered medical diagnosis based on digital images. Consequently, the number of studies by medical images diagnosis increases, thus, collaborative work and tele-radiology systems are required to effectively scale up to this diagnosis trend. We tackle the problem of the collaborative access of medical images, and present WebMedSA, a framework to manage large datasets of medical images. WebMedSA relies on a PACS and supports the ontological annotation, as well as segmentation and visualization of the images based on their semantic description. Ontological annotations can be performed directly on the volumetric image or at different image planes (e.g., axial, coronal, or sagittal); furthermore, annotations can be complemented after applying a segmentation technique. WebMedSA is based on three main steps: (1) RDF-ization process for extracting, anonymizing, and serializing metadata comprised in DICOM medical images into RDF/XML; (2) Integration of different biomedical ontologies (using L-MOM library), making this approach ontology independent; and (3) segmentation and visualization of annotated data which is further used to generate new annotations according to expert knowledge, and validation. Initial user evaluations suggest that WebMedSA facilitates the exchange of knowledge between radiologists, and provides the basis for collaborative work among them.
A semantic model for multimodal data mining in healthcare information systems.
Iakovidis, Dimitris; Smailis, Christos
2012-01-01
Electronic health records (EHRs) are representative examples of multimodal/multisource data collections; including measurements, images and free texts. The diversity of such information sources and the increasing amounts of medical data produced by healthcare institutes annually, pose significant challenges in data mining. In this paper we present a novel semantic model that describes knowledge extracted from the lowest-level of a data mining process, where information is represented by multiple features i.e. measurements or numerical descriptors extracted from measurements, images, texts or other medical data, forming multidimensional feature spaces. Knowledge collected by manual annotation or extracted by unsupervised data mining from one or more feature spaces is modeled through generalized qualitative spatial semantics. This model enables a unified representation of knowledge across multimodal data repositories. It contributes to bridging the semantic gap, by enabling direct links between low-level features and higher-level concepts e.g. describing body parts, anatomies and pathological findings. The proposed model has been developed in web ontology language based on description logics (OWL-DL) and can be applied to a variety of data mining tasks in medical informatics. It utility is demonstrated for automatic annotation of medical data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rios Velazquez, E; Parmar, C; Narayan, V
Purpose: To compare the complementary value of quantitative radiomic features to that of radiologist-annotated semantic features in predicting EGFR mutations in lung adenocarcinomas. Methods: Pre-operative CT images of 258 lung adenocarcinoma patients were available. Tumors were segmented using the sing-click ensemble segmentation algorithm. A set of radiomic features was extracted using 3D-Slicer. Test-retest reproducibility and unsupervised dimensionality reduction were applied to select a subset of reproducible and independent radiomic features. Twenty semantic annotations were scored by an expert radiologist, describing the tumor, surrounding tissue and associated findings. Minimum-redundancy-maximum-relevance (MRMR) was used to identify the most informative radiomic and semantic featuresmore » in 172 patients (training-set, temporal split). Radiomic, semantic and combined radiomic-semantic logistic regression models to predict EGFR mutations were evaluated in and independent validation dataset of 86 patients using the area under the receiver operating curve (AUC). Results: EGFR mutations were found in 77/172 (45%) and 39/86 (45%) of the training and validation sets, respectively. Univariate AUCs showed a similar range for both feature types: radiomics median AUC = 0.57 (range: 0.50 – 0.62); semantic median AUC = 0.53 (range: 0.50 – 0.64, Wilcoxon p = 0.55). After MRMR feature selection, the best-performing radiomic, semantic, and radiomic-semantic logistic regression models, for EGFR mutations, showed a validation AUC of 0.56 (p = 0.29), 0.63 (p = 0.063) and 0.67 (p = 0.004), respectively. Conclusion: Quantitative volumetric and textural Radiomic features complement the qualitative and semi-quantitative radiologist annotations. The prognostic value of informative qualitative semantic features such as cavitation and lobulation is increased with the addition of quantitative textural features from the tumor region.« less
A graph-based semantic similarity measure for the gene ontology.
Alvarez, Marco A; Yan, Changhui
2011-12-01
Existing methods for calculating semantic similarities between pairs of Gene Ontology (GO) terms and gene products often rely on external databases like Gene Ontology Annotation (GOA) that annotate gene products using the GO terms. This dependency leads to some limitations in real applications. Here, we present a semantic similarity algorithm (SSA), that relies exclusively on the GO. When calculating the semantic similarity between a pair of input GO terms, SSA takes into account the shortest path between them, the depth of their nearest common ancestor, and a novel similarity score calculated between the definitions of the involved GO terms. In our work, we use SSA to calculate semantic similarities between pairs of proteins by combining pairwise semantic similarities between the GO terms that annotate the involved proteins. The reliability of SSA was evaluated by comparing the resulting semantic similarities between proteins with the functional similarities between proteins derived from expert annotations or sequence similarity. Comparisons with existing state-of-the-art methods showed that SSA is highly competitive with the other methods. SSA provides a reliable measure for semantics similarity independent of external databases of functional-annotation observations.
Linking Disparate Datasets of the Earth Sciences with the SemantEco Annotator
NASA Astrophysics Data System (ADS)
Seyed, P.; Chastain, K.; McGuinness, D. L.
2013-12-01
Use of Semantic Web technologies for data management in the Earth sciences (and beyond) has great potential but is still in its early stages, since the challenges of translating data into a more explicit or semantic form for immediate use within applications has not been fully addressed. In this abstract we help address this challenge by introducing the SemantEco Annotator, which enables anyone, regardless of expertise, to semantically annotate tabular Earth Science data and translate it into linked data format, while applying the logic inherent in community-standard vocabularies to guide the process. The Annotator was conceived under a desire to unify dataset content from a variety of sources under common vocabularies, for use in semantically-enabled web applications. Our current use case employs linked data generated by the Annotator for use in the SemantEco environment, which utilizes semantics to help users explore, search, and visualize water or air quality measurement and species occurrence data through a map-based interface. The generated data can also be used immediately to facilitate discovery and search capabilities within 'big data' environments. The Annotator provides a method for taking information about a dataset, that may only be known to its maintainers, and making it explicit, in a uniform and machine-readable fashion, such that a person or information system can more easily interpret the underlying structure and meaning. Its primary mechanism is to enable a user to formally describe how columns of a tabular dataset relate and/or describe entities. For example, if a user identifies columns for latitude and longitude coordinates, we can infer the data refers to a point that can be plotted on a map. Further, it can be made explicit that measurements of 'nitrate' and 'NO3-' are of the same entity through vocabulary assignments, thus more easily utilizing data sets that use different nomenclatures. The Annotator provides an extensive and searchable library of vocabularies to assist the user in locating terms to describe observed entities, their properties, and relationships. The Annotator leverages vocabulary definitions of these concepts to guide the user in describing data in a logically consistent manner. The vocabularies made available through the Annotator are open, as is the Annotator itself. We have taken a step towards making semantic annotation/translation of data more accessible. Our vision for the Annotator is as a tool that can be integrated into a semantic data 'workbench' environment, which would allow semantic annotation of a variety of data formats, using standard vocabularies. These vocabularies involved enable search for similar datasets, and integration with any semantically-enabled applications for analysis and visualization.
Semantic orchestration of image processing services for environmental analysis
NASA Astrophysics Data System (ADS)
Ranisavljević, Élisabeth; Devin, Florent; Laffly, Dominique; Le Nir, Yannick
2013-09-01
In order to analyze environmental dynamics, a major process is the classification of the different phenomena of the site (e.g. ice and snow for a glacier). When using in situ pictures, this classification requires data pre-processing. Not all the pictures need the same sequence of processes depending on the disturbances. Until now, these sequences have been done manually, which restricts the processing of large amount of data. In this paper, we present how to realize a semantic orchestration to automate the sequencing for the analysis. It combines two advantages: solving the problem of the amount of processing, and diversifying the possibilities in the data processing. We define a BPEL description to express the sequences. This BPEL uses some web services to run the data processing. Each web service is semantically annotated using an ontology of image processing. The dynamic modification of the BPEL is done using SPARQL queries on these annotated web services. The results obtained by a prototype implementing this method validate the construction of the different workflows that can be applied to a large number of pictures.
Towards comprehensive syntactic and semantic annotations of the clinical narrative
Albright, Daniel; Lanfranchi, Arrick; Fredriksen, Anwen; Styler, William F; Warner, Colin; Hwang, Jena D; Choi, Jinho D; Dligach, Dmitriy; Nielsen, Rodney D; Martin, James; Ward, Wayne; Palmer, Martha; Savova, Guergana K
2013-01-01
Objective To create annotated clinical narratives with layers of syntactic and semantic labels to facilitate advances in clinical natural language processing (NLP). To develop NLP algorithms and open source components. Methods Manual annotation of a clinical narrative corpus of 127 606 tokens following the Treebank schema for syntactic information, PropBank schema for predicate-argument structures, and the Unified Medical Language System (UMLS) schema for semantic information. NLP components were developed. Results The final corpus consists of 13 091 sentences containing 1772 distinct predicate lemmas. Of the 766 newly created PropBank frames, 74 are verbs. There are 28 539 named entity (NE) annotations spread over 15 UMLS semantic groups, one UMLS semantic type, and the Person semantic category. The most frequent annotations belong to the UMLS semantic groups of Procedures (15.71%), Disorders (14.74%), Concepts and Ideas (15.10%), Anatomy (12.80%), Chemicals and Drugs (7.49%), and the UMLS semantic type of Sign or Symptom (12.46%). Inter-annotator agreement results: Treebank (0.926), PropBank (0.891–0.931), NE (0.697–0.750). The part-of-speech tagger, constituency parser, dependency parser, and semantic role labeler are built from the corpus and released open source. A significant limitation uncovered by this project is the need for the NLP community to develop a widely agreed-upon schema for the annotation of clinical concepts and their relations. Conclusions This project takes a foundational step towards bringing the field of clinical NLP up to par with NLP in the general domain. The corpus creation and NLP components provide a resource for research and application development that would have been previously impossible. PMID:23355458
Corpus annotation for mining biomedical events from literature
Kim, Jin-Dong; Ohta, Tomoko; Tsujii, Jun'ichi
2008-01-01
Background Advanced Text Mining (TM) such as semantic enrichment of papers, event or relation extraction, and intelligent Question Answering have increasingly attracted attention in the bio-medical domain. For such attempts to succeed, text annotation from the biological point of view is indispensable. However, due to the complexity of the task, semantic annotation has never been tried on a large scale, apart from relatively simple term annotation. Results We have completed a new type of semantic annotation, event annotation, which is an addition to the existing annotations in the GENIA corpus. The corpus has already been annotated with POS (Parts of Speech), syntactic trees, terms, etc. The new annotation was made on half of the GENIA corpus, consisting of 1,000 Medline abstracts. It contains 9,372 sentences in which 36,114 events are identified. The major challenges during event annotation were (1) to design a scheme of annotation which meets specific requirements of text annotation, (2) to achieve biology-oriented annotation which reflect biologists' interpretation of text, and (3) to ensure the homogeneity of annotation quality across annotators. To meet these challenges, we introduced new concepts such as Single-facet Annotation and Semantic Typing, which have collectively contributed to successful completion of a large scale annotation. Conclusion The resulting event-annotated corpus is the largest and one of the best in quality among similar annotation efforts. We expect it to become a valuable resource for NLP (Natural Language Processing)-based TM in the bio-medical domain. PMID:18182099
Validating automatic semantic annotation of anatomy in DICOM CT images
NASA Astrophysics Data System (ADS)
Pathak, Sayan D.; Criminisi, Antonio; Shotton, Jamie; White, Steve; Robertson, Duncan; Sparks, Bobbi; Munasinghe, Indeera; Siddiqui, Khan
2011-03-01
In the current health-care environment, the time available for physicians to browse patients' scans is shrinking due to the rapid increase in the sheer number of images. This is further aggravated by mounting pressure to become more productive in the face of decreasing reimbursement. Hence, there is an urgent need to deliver technology which enables faster and effortless navigation through sub-volume image visualizations. Annotating image regions with semantic labels such as those derived from the RADLEX ontology can vastly enhance image navigation and sub-volume visualization. This paper uses random regression forests for efficient, automatic detection and localization of anatomical structures within DICOM 3D CT scans. A regression forest is a collection of decision trees which are trained to achieve direct mapping from voxels to organ location and size in a single pass. This paper focuses on comparing automated labeling with expert-annotated ground-truth results on a database of 50 highly variable CT scans. Initial investigations show that regression forest derived localization errors are smaller and more robust than those achieved by state-of-the-art global registration approaches. The simplicity of the algorithm's context-rich visual features yield typical runtimes of less than 10 seconds for a 5123 voxel DICOM CT series on a single-threaded, single-core machine running multiple trees; each tree taking less than a second. Furthermore, qualitative evaluation demonstrates that using the detected organs' locations as index into the image volume improves the efficiency of the navigational workflow in all the CT studies.
Yang, Liu; Jin, Rong; Mummert, Lily; Sukthankar, Rahul; Goode, Adam; Zheng, Bin; Hoi, Steven C H; Satyanarayanan, Mahadev
2010-01-01
Similarity measurement is a critical component in content-based image retrieval systems, and learning a good distance metric can significantly improve retrieval performance. However, despite extensive study, there are several major shortcomings with the existing approaches for distance metric learning that can significantly affect their application to medical image retrieval. In particular, "similarity" can mean very different things in image retrieval: resemblance in visual appearance (e.g., two images that look like one another) or similarity in semantic annotation (e.g., two images of tumors that look quite different yet are both malignant). Current approaches for distance metric learning typically address only one goal without consideration of the other. This is problematic for medical image retrieval where the goal is to assist doctors in decision making. In these applications, given a query image, the goal is to retrieve similar images from a reference library whose semantic annotations could provide the medical professional with greater insight into the possible interpretations of the query image. If the system were to retrieve images that did not look like the query, then users would be less likely to trust the system; on the other hand, retrieving images that appear superficially similar to the query but are semantically unrelated is undesirable because that could lead users toward an incorrect diagnosis. Hence, learning a distance metric that preserves both visual resemblance and semantic similarity is important. We emphasize that, although our study is focused on medical image retrieval, the problem addressed in this work is critical to many image retrieval systems. We present a boosting framework for distance metric learning that aims to preserve both visual and semantic similarities. The boosting framework first learns a binary representation using side information, in the form of labeled pairs, and then computes the distance as a weighted Hamming distance using the learned binary representation. A boosting algorithm is presented to efficiently learn the distance function. We evaluate the proposed algorithm on a mammographic image reference library with an Interactive Search-Assisted Decision Support (ISADS) system and on the medical image data set from ImageCLEF. Our results show that the boosting framework compares favorably to state-of-the-art approaches for distance metric learning in retrieval accuracy, with much lower computational cost. Additional evaluation with the COREL collection shows that our algorithm works well for regular image data sets.
Transductive multi-view zero-shot learning.
Fu, Yanwei; Hospedales, Timothy M; Xiang, Tao; Gong, Shaogang
2015-11-01
Most existing zero-shot learning approaches exploit transfer learning via an intermediate semantic representation shared between an annotated auxiliary dataset and a target dataset with different classes and no annotation. A projection from a low-level feature space to the semantic representation space is learned from the auxiliary dataset and applied without adaptation to the target dataset. In this paper we identify two inherent limitations with these approaches. First, due to having disjoint and potentially unrelated classes, the projection functions learned from the auxiliary dataset/domain are biased when applied directly to the target dataset/domain. We call this problem the projection domain shift problem and propose a novel framework, transductive multi-view embedding, to solve it. The second limitation is the prototype sparsity problem which refers to the fact that for each target class, only a single prototype is available for zero-shot learning given a semantic representation. To overcome this problem, a novel heterogeneous multi-view hypergraph label propagation method is formulated for zero-shot learning in the transductive embedding space. It effectively exploits the complementary information offered by different semantic representations and takes advantage of the manifold structures of multiple representation spaces in a coherent manner. We demonstrate through extensive experiments that the proposed approach (1) rectifies the projection shift between the auxiliary and target domains, (2) exploits the complementarity of multiple semantic representations, (3) significantly outperforms existing methods for both zero-shot and N-shot recognition on three image and video benchmark datasets, and (4) enables novel cross-view annotation tasks.
Current and future trends in marine image annotation software
NASA Astrophysics Data System (ADS)
Gomes-Pereira, Jose Nuno; Auger, Vincent; Beisiegel, Kolja; Benjamin, Robert; Bergmann, Melanie; Bowden, David; Buhl-Mortensen, Pal; De Leo, Fabio C.; Dionísio, Gisela; Durden, Jennifer M.; Edwards, Luke; Friedman, Ariell; Greinert, Jens; Jacobsen-Stout, Nancy; Lerner, Steve; Leslie, Murray; Nattkemper, Tim W.; Sameoto, Jessica A.; Schoening, Timm; Schouten, Ronald; Seager, James; Singh, Hanumant; Soubigou, Olivier; Tojeira, Inês; van den Beld, Inge; Dias, Frederico; Tempera, Fernando; Santos, Ricardo S.
2016-12-01
Given the need to describe, analyze and index large quantities of marine imagery data for exploration and monitoring activities, a range of specialized image annotation tools have been developed worldwide. Image annotation - the process of transposing objects or events represented in a video or still image to the semantic level, may involve human interactions and computer-assisted solutions. Marine image annotation software (MIAS) have enabled over 500 publications to date. We review the functioning, application trends and developments, by comparing general and advanced features of 23 different tools utilized in underwater image analysis. MIAS requiring human input are basically a graphical user interface, with a video player or image browser that recognizes a specific time code or image code, allowing to log events in a time-stamped (and/or geo-referenced) manner. MIAS differ from similar software by the capability of integrating data associated to video collection, the most simple being the position coordinates of the video recording platform. MIAS have three main characteristics: annotating events in real time, posteriorly to annotation and interact with a database. These range from simple annotation interfaces, to full onboard data management systems, with a variety of toolboxes. Advanced packages allow to input and display data from multiple sensors or multiple annotators via intranet or internet. Posterior human-mediated annotation often include tools for data display and image analysis, e.g. length, area, image segmentation, point count; and in a few cases the possibility of browsing and editing previous dive logs or to analyze the annotations. The interaction with a database allows the automatic integration of annotations from different surveys, repeated annotation and collaborative annotation of shared datasets, browsing and querying of data. Progress in the field of automated annotation is mostly in post processing, for stable platforms or still images. Integration into available MIAS is currently limited to semi-automated processes of pixel recognition through computer-vision modules that compile expert-based knowledge. Important topics aiding the choice of a specific software are outlined, the ideal software is discussed and future trends are presented.
Image annotation by deep neural networks with attention shaping
NASA Astrophysics Data System (ADS)
Zheng, Kexin; Lv, Shaohe; Ma, Fang; Chen, Fei; Jin, Chi; Dou, Yong
2017-07-01
Image annotation is a task of assigning semantic labels to an image. Recently, deep neural networks with visual attention have been utilized successfully in many computer vision tasks. In this paper, we show that conventional attention mechanism is easily misled by the salient class, i.e., the attended region always contains part of the image area describing the content of salient class at different attention iterations. To this end, we propose a novel attention shaping mechanism, which aims to maximize the non-overlapping area between consecutive attention processes by taking into account the history of previous attention vectors. Several weighting polices are studied to utilize the history information in different manners. In two benchmark datasets, i.e., PASCAL VOC2012 and MIRFlickr-25k, the average precision is improved by up to 10% in comparison with the state-of-the-art annotation methods.
Reliability in content analysis: The case of semantic feature norms classification.
Bolognesi, Marianna; Pilgram, Roosmaryn; van den Heerik, Romy
2017-12-01
Semantic feature norms (e.g., STIMULUS: car → RESPONSE:
Atmosphere-based image classification through luminance and hue
NASA Astrophysics Data System (ADS)
Xu, Feng; Zhang, Yujin
2005-07-01
In this paper a novel image classification system is proposed. Atmosphere serves an important role in generating the scene"s topic or in conveying the message behind the scene"s story, which belongs to abstract attribute level in semantic levels. At first, five atmosphere semantic categories are defined according to rules of photo and film grammar, followed by global luminance and hue features. Then the hierarchical SVM classifiers are applied. In each classification stage, corresponding features are extracted and the trained linear SVM is implemented, resulting in two classes. After three stages of classification, five atmosphere categories are obtained. At last, the text annotation of the atmosphere semantics and the corresponding features by Extensible Markup Language (XML) in MPEG-7 is defined, which can be integrated into more multimedia applications (such as searching, indexing and accessing of multimedia content). The experiment is performed on Corel images and film frames. The classification results prove the effectiveness of the definition of atmosphere semantic classes and the corresponding features.
Annotating images by mining image search results.
Wang, Xin-Jing; Zhang, Lei; Li, Xirong; Ma, Wei-Ying
2008-11-01
Although it has been studied for years by the computer vision and machine learning communities, image annotation is still far from practical. In this paper, we propose a novel attempt at model-free image annotation, which is a data-driven approach that annotates images by mining their search results. Some 2.4 million images with their surrounding text are collected from a few photo forums to support this approach. The entire process is formulated in a divide-and-conquer framework where a query keyword is provided along with the uncaptioned image to improve both the effectiveness and efficiency. This is helpful when the collected data set is not dense everywhere. In this sense, our approach contains three steps: 1) the search process to discover visually and semantically similar search results, 2) the mining process to identify salient terms from textual descriptions of the search results, and 3) the annotation rejection process to filter out noisy terms yielded by Step 2. To ensure real-time annotation, two key techniques are leveraged-one is to map the high-dimensional image visual features into hash codes, the other is to implement it as a distributed system, of which the search and mining processes are provided as Web services. As a typical result, the entire process finishes in less than 1 second. Since no training data set is required, our approach enables annotating with unlimited vocabulary and is highly scalable and robust to outliers. Experimental results on both real Web images and a benchmark image data set show the effectiveness and efficiency of the proposed algorithm. It is also worth noting that, although the entire approach is illustrated within the divide-and conquer framework, a query keyword is not crucial to our current implementation. We provide experimental results to prove this.
Nonlinear Deep Kernel Learning for Image Annotation.
Jiu, Mingyuan; Sahbi, Hichem
2017-02-08
Multiple kernel learning (MKL) is a widely used technique for kernel design. Its principle consists in learning, for a given support vector classifier, the most suitable convex (or sparse) linear combination of standard elementary kernels. However, these combinations are shallow and often powerless to capture the actual similarity between highly semantic data, especially for challenging classification tasks such as image annotation. In this paper, we redefine multiple kernels using deep multi-layer networks. In this new contribution, a deep multiple kernel is recursively defined as a multi-layered combination of nonlinear activation functions, each one involves a combination of several elementary or intermediate kernels, and results into a positive semi-definite deep kernel. We propose four different frameworks in order to learn the weights of these networks: supervised, unsupervised, kernel-based semisupervised and Laplacian-based semi-supervised. When plugged into support vector machines (SVMs), the resulting deep kernel networks show clear gain, compared to several shallow kernels for the task of image annotation. Extensive experiments and analysis on the challenging ImageCLEF photo annotation benchmark, the COREL5k database and the Banana dataset validate the effectiveness of the proposed method.
SAS- Semantic Annotation Service for Geoscience resources on the web
NASA Astrophysics Data System (ADS)
Elag, M.; Kumar, P.; Marini, L.; Li, R.; Jiang, P.
2015-12-01
There is a growing need for increased integration across the data and model resources that are disseminated on the web to advance their reuse across different earth science applications. Meaningful reuse of resources requires semantic metadata to realize the semantic web vision for allowing pragmatic linkage and integration among resources. Semantic metadata associates standard metadata with resources to turn them into semantically-enabled resources on the web. However, the lack of a common standardized metadata framework as well as the uncoordinated use of metadata fields across different geo-information systems, has led to a situation in which standards and related Standard Names abound. To address this need, we have designed SAS to provide a bridge between the core ontologies required to annotate resources and information systems in order to enable queries and analysis over annotation from a single environment (web). SAS is one of the services that are provided by the Geosematnic framework, which is a decentralized semantic framework to support the integration between models and data and allow semantically heterogeneous to interact with minimum human intervention. Here we present the design of SAS and demonstrate its application for annotating data and models. First we describe how predicates and their attributes are extracted from standards and ingested in the knowledge-base of the Geosemantic framework. Then we illustrate the application of SAS in annotating data managed by SEAD and annotating simulation models that have web interface. SAS is a step in a broader approach to raise the quality of geoscience data and models that are published on the web and allow users to better search, access, and use of the existing resources based on standard vocabularies that are encoded and published using semantic technologies.
Effective user guidance in online interactive semantic segmentation
NASA Astrophysics Data System (ADS)
Petersen, Jens; Bendszus, Martin; Debus, Jürgen; Heiland, Sabine; Maier-Hein, Klaus H.
2017-03-01
With the recent success of machine learning based solutions for automatic image parsing, the availability of reference image annotations for algorithm training is one of the major bottlenecks in medical image segmentation. We are interested in interactive semantic segmentation methods that can be used in an online fashion to generate expert segmentations. These can be used to train automated segmentation techniques or, from an application perspective, for quick and accurate tumor progression monitoring. Using simulated user interactions in a MRI glioblastoma segmentation task, we show that if the user possesses knowledge of the correct segmentation it is significantly (p <= 0.009) better to present data and current segmentation to the user in such a manner that they can easily identify falsely classified regions compared to guiding the user to regions where the classifier exhibits high uncertainty, resulting in differences of mean Dice scores between +0.070 (Whole tumor) and +0.136 (Tumor Core) after 20 iterations. The annotation process should cover all classes equally, which results in a significant (p <= 0.002) improvement compared to completely random annotations anywhere in falsely classified regions for small tumor regions such as the necrotic tumor core (mean Dice +0.151 after 20 it.) and non-enhancing abnormalities (mean Dice +0.069 after 20 it.). These findings provide important insights for the development of efficient interactive segmentation systems and user interfaces.
Ontology design patterns to disambiguate relations between genes and gene products in GENIA
2011-01-01
Motivation Annotated reference corpora play an important role in biomedical information extraction. A semantic annotation of the natural language texts in these reference corpora using formal ontologies is challenging due to the inherent ambiguity of natural language. The provision of formal definitions and axioms for semantic annotations offers the means for ensuring consistency as well as enables the development of verifiable annotation guidelines. Consistent semantic annotations facilitate the automatic discovery of new information through deductive inferences. Results We provide a formal characterization of the relations used in the recent GENIA corpus annotations. For this purpose, we both select existing axiom systems based on the desired properties of the relations within the domain and develop new axioms for several relations. To apply this ontology of relations to the semantic annotation of text corpora, we implement two ontology design patterns. In addition, we provide a software application to convert annotated GENIA abstracts into OWL ontologies by combining both the ontology of relations and the design patterns. As a result, the GENIA abstracts become available as OWL ontologies and are amenable for automated verification, deductive inferences and other knowledge-based applications. Availability Documentation, implementation and examples are available from http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/. PMID:22166341
Attribute-based classification for zero-shot visual object categorization.
Lampert, Christoph H; Nickisch, Hannes; Harmeling, Stefan
2014-03-01
We study the problem of object recognition for categories for which we have no training examples, a task also called zero--data or zero-shot learning. This situation has hardly been studied in computer vision research, even though it occurs frequently; the world contains tens of thousands of different object classes, and image collections have been formed and suitably annotated for only a few of them. To tackle the problem, we introduce attribute-based classification: Objects are identified based on a high-level description that is phrased in terms of semantic attributes, such as the object's color or shape. Because the identification of each such property transcends the specific learning task at hand, the attribute classifiers can be prelearned independently, for example, from existing image data sets unrelated to the current task. Afterward, new classes can be detected based on their attribute representation, without the need for a new training phase. In this paper, we also introduce a new data set, Animals with Attributes, of over 30,000 images of 50 animal classes, annotated with 85 semantic attributes. Extensive experiments on this and two more data sets show that attribute-based classification indeed is able to categorize images without access to any training images of the target classes.
Semi-automatic semantic annotation of PubMed Queries: a study on quality, efficiency, satisfaction
Névéol, Aurélie; Islamaj-Doğan, Rezarta; Lu, Zhiyong
2010-01-01
Information processing algorithms require significant amounts of annotated data for training and testing. The availability of such data is often hindered by the complexity and high cost of production. In this paper, we investigate the benefits of a state-of-the-art tool to help with the semantic annotation of a large set of biomedical information queries. Seven annotators were recruited to annotate a set of 10,000 PubMed® queries with 16 biomedical and bibliographic categories. About half of the queries were annotated from scratch, while the other half were automatically pre-annotated and manually corrected. The impact of the automatic pre-annotations was assessed on several aspects of the task: time, number of actions, annotator satisfaction, inter-annotator agreement, quality and number of the resulting annotations. The analysis of annotation results showed that the number of required hand annotations is 28.9% less when using pre-annotated results from automatic tools. As a result, the overall annotation time was substantially lower when pre-annotations were used, while inter-annotator agreement was significantly higher. In addition, there was no statistically significant difference in the semantic distribution or number of annotations produced when pre-annotations were used. The annotated query corpus is freely available to the research community. This study shows that automatic pre-annotations are found helpful by most annotators. Our experience suggests using an automatic tool to assist large-scale manual annotation projects. This helps speed-up the annotation time and improve annotation consistency while maintaining high quality of the final annotations. PMID:21094696
Zhao, Jian; Glueck, Michael; Breslav, Simon; Chevalier, Fanny; Khan, Azam
2017-01-01
User-authored annotations of data can support analysts in the activity of hypothesis generation and sensemaking, where it is not only critical to document key observations, but also to communicate insights between analysts. We present annotation graphs, a dynamic graph visualization that enables meta-analysis of data based on user-authored annotations. The annotation graph topology encodes annotation semantics, which describe the content of and relations between data selections, comments, and tags. We present a mixed-initiative approach to graph layout that integrates an analyst's manual manipulations with an automatic method based on similarity inferred from the annotation semantics. Various visual graph layout styles reveal different perspectives on the annotation semantics. Annotation graphs are implemented within C8, a system that supports authoring annotations during exploratory analysis of a dataset. We apply principles of Exploratory Sequential Data Analysis (ESDA) in designing C8, and further link these to an existing task typology in the visualization literature. We develop and evaluate the system through an iterative user-centered design process with three experts, situated in the domain of analyzing HCI experiment data. The results suggest that annotation graphs are effective as a method of visually extending user-authored annotations to data meta-analysis for discovery and organization of ideas.
A Semantic-Oriented Approach for Organizing and Developing Annotation for E-Learning
ERIC Educational Resources Information Center
Brut, Mihaela M.; Sedes, Florence; Dumitrescu, Stefan D.
2011-01-01
This paper presents a solution to extend the IEEE LOM standard with ontology-based semantic annotations for efficient use of learning objects outside Learning Management Systems. The data model corresponding to this approach is first presented. The proposed indexing technique for this model development in order to acquire a better annotation of…
Depeursinge, Adrien; Kurtz, Camille; Beaulieu, Christopher; Napel, Sandy; Rubin, Daniel
2014-08-01
We describe a framework to model visual semantics of liver lesions in CT images in order to predict the visual semantic terms (VST) reported by radiologists in describing these lesions. Computational models of VST are learned from image data using linear combinations of high-order steerable Riesz wavelets and support vector machines (SVM). In a first step, these models are used to predict the presence of each semantic term that describes liver lesions. In a second step, the distances between all VST models are calculated to establish a nonhierarchical computationally-derived ontology of VST containing inter-term synonymy and complementarity. A preliminary evaluation of the proposed framework was carried out using 74 liver lesions annotated with a set of 18 VSTs from the RadLex ontology. A leave-one-patient-out cross-validation resulted in an average area under the ROC curve of 0.853 for predicting the presence of each VST. The proposed framework is expected to foster human-computer synergies for the interpretation of radiological images while using rotation-covariant computational models of VSTs to 1) quantify their local likelihood and 2) explicitly link them with pixel-based image content in the context of a given imaging domain.
DEVA: An extensible ontology-based annotation model for visual document collections
NASA Astrophysics Data System (ADS)
Jelmini, Carlo; Marchand-Maillet, Stephane
2003-01-01
The description of visual documents is a fundamental aspect of any efficient information management system, but the process of manually annotating large collections of documents is tedious and far from being perfect. The need for a generic and extensible annotation model therefore arises. In this paper, we present DEVA, an open, generic and expressive multimedia annotation framework. DEVA is an extension of the Dublin Core specification. The model can represent the semantic content of any visual document. It is described in the ontology language DAML+OIL and can easily be extended with external specialized ontologies, adapting the vocabulary to the given application domain. In parallel, we present the Magritte annotation tool, which is an early prototype that validates the DEVA features. Magritte allows to manually annotating image collections. It is designed with a modular and extensible architecture, which enables the user to dynamically adapt the user interface to specialized ontologies merged into DEVA.
A Novel Approach to Semantic and Coreference Annotation at LLNL
DOE Office of Scientific and Technical Information (OSTI.GOV)
Firpo, M
A case is made for the importance of high quality semantic and coreference annotation. The challenges of providing such annotation are described. Asperger's Syndrome is introduced, and the connections are drawn between the needs of text annotation and the abilities of persons with Asperger's Syndrome to meet those needs. Finally, a pilot program is recommended wherein semantic annotation is performed by people with Asperger's Syndrome. The primary points embodied in this paper are as follows: (1) Document annotation is essential to the Natural Language Processing (NLP) projects at Lawrence Livermore National Laboratory (LLNL); (2) LLNL does not currently have amore » system in place to meet its need for text annotation; (3) Text annotation is challenging for a variety of reasons, many related to its very rote nature; (4) Persons with Asperger's Syndrome are particularly skilled at rote verbal tasks, and behavioral experts agree that they would excel at text annotation; and (6) A pilot study is recommend in which two to three people with Asperger's Syndrome annotate documents and then the quality and throughput of their work is evaluated relative to that of their neuro-typical peers.« less
A Semantic Web-based System for Mining Genetic Mutations in Cancer Clinical Trials.
Priya, Sambhawa; Jiang, Guoqian; Dasari, Surendra; Zimmermann, Michael T; Wang, Chen; Heflin, Jeff; Chute, Christopher G
2015-01-01
Textual eligibility criteria in clinical trial protocols contain important information about potential clinically relevant pharmacogenomic events. Manual curation for harvesting this evidence is intractable as it is error prone and time consuming. In this paper, we develop and evaluate a Semantic Web-based system that captures and manages mutation evidences and related contextual information from cancer clinical trials. The system has 2 main components: an NLP-based annotator and a Semantic Web ontology-based annotation manager. We evaluated the performance of the annotator in terms of precision and recall. We demonstrated the usefulness of the system by conducting case studies in retrieving relevant clinical trials using a collection of mutations identified from TCGA Leukemia patients and Atlas of Genetics and Cytogenetics in Oncology and Haematology. In conclusion, our system using Semantic Web technologies provides an effective framework for extraction, annotation, standardization and management of genetic mutations in cancer clinical trials.
Combining rules, background knowledge and change patterns to maintain semantic annotations.
Cardoso, Silvio Domingos; Chantal, Reynaud-Delaître; Da Silveira, Marcos; Pruski, Cédric
2017-01-01
Knowledge Organization Systems (KOS) play a key role in enriching biomedical information in order to make it machine-understandable and shareable. This is done by annotating medical documents, or more specifically, associating concept labels from KOS with pieces of digital information, e.g., images or texts. However, the dynamic nature of KOS may impact the annotations, thus creating a mismatch between the evolved concept and the associated information. To solve this problem, methods to maintain the quality of the annotations are required. In this paper, we define a framework based on rules, background knowledge and change patterns to drive the annotation adaption process. We evaluate experimentally the proposed approach in realistic cases-studies and demonstrate the overall performance of our approach in different KOS considering the precision, recall, F1-score and AUC value of the system.
Combining rules, background knowledge and change patterns to maintain semantic annotations
Cardoso, Silvio Domingos; Chantal, Reynaud-Delaître; Da Silveira, Marcos; Pruski, Cédric
2017-01-01
Knowledge Organization Systems (KOS) play a key role in enriching biomedical information in order to make it machine-understandable and shareable. This is done by annotating medical documents, or more specifically, associating concept labels from KOS with pieces of digital information, e.g., images or texts. However, the dynamic nature of KOS may impact the annotations, thus creating a mismatch between the evolved concept and the associated information. To solve this problem, methods to maintain the quality of the annotations are required. In this paper, we define a framework based on rules, background knowledge and change patterns to drive the annotation adaption process. We evaluate experimentally the proposed approach in realistic cases-studies and demonstrate the overall performance of our approach in different KOS considering the precision, recall, F1-score and AUC value of the system. PMID:29854115
BDVC (Bimodal Database of Violent Content): A database of violent audio and video
NASA Astrophysics Data System (ADS)
Rivera Martínez, Jose Luis; Mijes Cruz, Mario Humberto; Rodríguez Vázqu, Manuel Antonio; Rodríguez Espejo, Luis; Montoya Obeso, Abraham; García Vázquez, Mireya Saraí; Ramírez Acosta, Alejandro Álvaro
2017-09-01
Nowadays there is a trend towards the use of unimodal databases for multimedia content description, organization and retrieval applications of a single type of content like text, voice and images, instead bimodal databases allow to associate semantically two different types of content like audio-video, image-text, among others. The generation of a bimodal database of audio-video implies the creation of a connection between the multimedia content through the semantic relation that associates the actions of both types of information. This paper describes in detail the used characteristics and methodology for the creation of the bimodal database of violent content; the semantic relationship is stablished by the proposed concepts that describe the audiovisual information. The use of bimodal databases in applications related to the audiovisual content processing allows an increase in the semantic performance only and only if these applications process both type of content. This bimodal database counts with 580 audiovisual annotated segments, with a duration of 28 minutes, divided in 41 classes. Bimodal databases are a tool in the generation of applications for the semantic web.
Sihong Chen; Jing Qin; Xing Ji; Baiying Lei; Tianfu Wang; Dong Ni; Jie-Zhi Cheng
2017-03-01
The gap between the computational and semantic features is the one of major factors that bottlenecks the computer-aided diagnosis (CAD) performance from clinical usage. To bridge this gap, we exploit three multi-task learning (MTL) schemes to leverage heterogeneous computational features derived from deep learning models of stacked denoising autoencoder (SDAE) and convolutional neural network (CNN), as well as hand-crafted Haar-like and HoG features, for the description of 9 semantic features for lung nodules in CT images. We regard that there may exist relations among the semantic features of "spiculation", "texture", "margin", etc., that can be explored with the MTL. The Lung Image Database Consortium (LIDC) data is adopted in this study for the rich annotation resources. The LIDC nodules were quantitatively scored w.r.t. 9 semantic features from 12 radiologists of several institutes in U.S.A. By treating each semantic feature as an individual task, the MTL schemes select and map the heterogeneous computational features toward the radiologists' ratings with cross validation evaluation schemes on the randomly selected 2400 nodules from the LIDC dataset. The experimental results suggest that the predicted semantic scores from the three MTL schemes are closer to the radiologists' ratings than the scores from single-task LASSO and elastic net regression methods. The proposed semantic attribute scoring scheme may provide richer quantitative assessments of nodules for better support of diagnostic decision and management. Meanwhile, the capability of the automatic association of medical image contents with the clinical semantic terms by our method may also assist the development of medical search engine.
ANALYTiC: An Active Learning System for Trajectory Classification.
Soares Junior, Amilcar; Renso, Chiara; Matwin, Stan
2017-01-01
The increasing availability and use of positioning devices has resulted in large volumes of trajectory data. However, semantic annotations for such data are typically added by domain experts, which is a time-consuming task. Machine-learning algorithms can help infer semantic annotations from trajectory data by learning from sets of labeled data. Specifically, active learning approaches can minimize the set of trajectories to be annotated while preserving good performance measures. The ANALYTiC web-based interactive tool visually guides users through this annotation process.
Computable visually observed phenotype ontological framework for plants
2011-01-01
Background The ability to search for and precisely compare similar phenotypic appearances within and across species has vast potential in plant science and genetic research. The difficulty in doing so lies in the fact that many visual phenotypic data, especially visually observed phenotypes that often times cannot be directly measured quantitatively, are in the form of text annotations, and these descriptions are plagued by semantic ambiguity, heterogeneity, and low granularity. Though several bio-ontologies have been developed to standardize phenotypic (and genotypic) information and permit comparisons across species, these semantic issues persist and prevent precise analysis and retrieval of information. A framework suitable for the modeling and analysis of precise computable representations of such phenotypic appearances is needed. Results We have developed a new framework called the Computable Visually Observed Phenotype Ontological Framework for plants. This work provides a novel quantitative view of descriptions of plant phenotypes that leverages existing bio-ontologies and utilizes a computational approach to capture and represent domain knowledge in a machine-interpretable form. This is accomplished by means of a robust and accurate semantic mapping module that automatically maps high-level semantics to low-level measurements computed from phenotype imagery. The framework was applied to two different plant species with semantic rules mined and an ontology constructed. Rule quality was evaluated and showed high quality rules for most semantics. This framework also facilitates automatic annotation of phenotype images and can be adopted by different plant communities to aid in their research. Conclusions The Computable Visually Observed Phenotype Ontological Framework for plants has been developed for more efficient and accurate management of visually observed phenotypes, which play a significant role in plant genomics research. The uniqueness of this framework is its ability to bridge the knowledge of informaticians and plant science researchers by translating descriptions of visually observed phenotypes into standardized, machine-understandable representations, thus enabling the development of advanced information retrieval and phenotype annotation analysis tools for the plant science community. PMID:21702966
Semantic annotation of consumer health questions.
Kilicoglu, Halil; Ben Abacha, Asma; Mrabet, Yassine; Shooshan, Sonya E; Rodriguez, Laritza; Masterton, Kate; Demner-Fushman, Dina
2018-02-06
Consumers increasingly use online resources for their health information needs. While current search engines can address these needs to some extent, they generally do not take into account that most health information needs are complex and can only fully be expressed in natural language. Consumer health question answering (QA) systems aim to fill this gap. A major challenge in developing consumer health QA systems is extracting relevant semantic content from the natural language questions (question understanding). To develop effective question understanding tools, question corpora semantically annotated for relevant question elements are needed. In this paper, we present a two-part consumer health question corpus annotated with several semantic categories: named entities, question triggers/types, question frames, and question topic. The first part (CHQA-email) consists of relatively long email requests received by the U.S. National Library of Medicine (NLM) customer service, while the second part (CHQA-web) consists of shorter questions posed to MedlinePlus search engine as queries. Each question has been annotated by two annotators. The annotation methodology is largely the same between the two parts of the corpus; however, we also explain and justify the differences between them. Additionally, we provide information about corpus characteristics, inter-annotator agreement, and our attempts to measure annotation confidence in the absence of adjudication of annotations. The resulting corpus consists of 2614 questions (CHQA-email: 1740, CHQA-web: 874). Problems are the most frequent named entities, while treatment and general information questions are the most common question types. Inter-annotator agreement was generally modest: question types and topics yielded highest agreement, while the agreement for more complex frame annotations was lower. Agreement in CHQA-web was consistently higher than that in CHQA-email. Pairwise inter-annotator agreement proved most useful in estimating annotation confidence. To our knowledge, our corpus is the first focusing on annotation of uncurated consumer health questions. It is currently used to develop machine learning-based methods for question understanding. We make the corpus publicly available to stimulate further research on consumer health QA.
Wollbrett, Julien; Larmande, Pierre; de Lamotte, Frédéric; Ruiz, Manuel
2013-04-15
In recent years, a large amount of "-omics" data have been produced. However, these data are stored in many different species-specific databases that are managed by different institutes and laboratories. Biologists often need to find and assemble data from disparate sources to perform certain analyses. Searching for these data and assembling them is a time-consuming task. The Semantic Web helps to facilitate interoperability across databases. A common approach involves the development of wrapper systems that map a relational database schema onto existing domain ontologies. However, few attempts have been made to automate the creation of such wrappers. We developed a framework, named BioSemantic, for the creation of Semantic Web Services that are applicable to relational biological databases. This framework makes use of both Semantic Web and Web Services technologies and can be divided into two main parts: (i) the generation and semi-automatic annotation of an RDF view; and (ii) the automatic generation of SPARQL queries and their integration into Semantic Web Services backbones. We have used our framework to integrate genomic data from different plant databases. BioSemantic is a framework that was designed to speed integration of relational databases. We present how it can be used to speed the development of Semantic Web Services for existing relational biological databases. Currently, it creates and annotates RDF views that enable the automatic generation of SPARQL queries. Web Services are also created and deployed automatically, and the semantic annotations of our Web Services are added automatically using SAWSDL attributes. BioSemantic is downloadable at http://southgreen.cirad.fr/?q=content/Biosemantic.
2013-01-01
Background In recent years, a large amount of “-omics” data have been produced. However, these data are stored in many different species-specific databases that are managed by different institutes and laboratories. Biologists often need to find and assemble data from disparate sources to perform certain analyses. Searching for these data and assembling them is a time-consuming task. The Semantic Web helps to facilitate interoperability across databases. A common approach involves the development of wrapper systems that map a relational database schema onto existing domain ontologies. However, few attempts have been made to automate the creation of such wrappers. Results We developed a framework, named BioSemantic, for the creation of Semantic Web Services that are applicable to relational biological databases. This framework makes use of both Semantic Web and Web Services technologies and can be divided into two main parts: (i) the generation and semi-automatic annotation of an RDF view; and (ii) the automatic generation of SPARQL queries and their integration into Semantic Web Services backbones. We have used our framework to integrate genomic data from different plant databases. Conclusions BioSemantic is a framework that was designed to speed integration of relational databases. We present how it can be used to speed the development of Semantic Web Services for existing relational biological databases. Currently, it creates and annotates RDF views that enable the automatic generation of SPARQL queries. Web Services are also created and deployed automatically, and the semantic annotations of our Web Services are added automatically using SAWSDL attributes. BioSemantic is downloadable at http://southgreen.cirad.fr/?q=content/Biosemantic. PMID:23586394
Basic level scene understanding: categories, attributes and structures
Xiao, Jianxiong; Hays, James; Russell, Bryan C.; Patterson, Genevieve; Ehinger, Krista A.; Torralba, Antonio; Oliva, Aude
2013-01-01
A longstanding goal of computer vision is to build a system that can automatically understand a 3D scene from a single image. This requires extracting semantic concepts and 3D information from 2D images which can depict an enormous variety of environments that comprise our visual world. This paper summarizes our recent efforts toward these goals. First, we describe the richly annotated SUN database which is a collection of annotated images spanning 908 different scene categories with object, attribute, and geometric labels for many scenes. This database allows us to systematically study the space of scenes and to establish a benchmark for scene and object recognition. We augment the categorical SUN database with 102 scene attributes for every image and explore attribute recognition. Finally, we present an integrated system to extract the 3D structure of the scene and objects depicted in an image. PMID:24009590
Discovering gene annotations in biomedical text databases
Cakmak, Ali; Ozsoyoglu, Gultekin
2008-01-01
Background Genes and gene products are frequently annotated with Gene Ontology concepts based on the evidence provided in genomics articles. Manually locating and curating information about a genomic entity from the biomedical literature requires vast amounts of human effort. Hence, there is clearly a need forautomated computational tools to annotate the genes and gene products with Gene Ontology concepts by computationally capturing the related knowledge embedded in textual data. Results In this article, we present an automated genomic entity annotation system, GEANN, which extracts information about the characteristics of genes and gene products in article abstracts from PubMed, and translates the discoveredknowledge into Gene Ontology (GO) concepts, a widely-used standardized vocabulary of genomic traits. GEANN utilizes textual "extraction patterns", and a semantic matching framework to locate phrases matching to a pattern and produce Gene Ontology annotations for genes and gene products. In our experiments, GEANN has reached to the precision level of 78% at therecall level of 61%. On a select set of Gene Ontology concepts, GEANN either outperforms or is comparable to two other automated annotation studies. Use of WordNet for semantic pattern matching improves the precision and recall by 24% and 15%, respectively, and the improvement due to semantic pattern matching becomes more apparent as the Gene Ontology terms become more general. Conclusion GEANN is useful for two distinct purposes: (i) automating the annotation of genomic entities with Gene Ontology concepts, and (ii) providing existing annotations with additional "evidence articles" from the literature. The use of textual extraction patterns that are constructed based on the existing annotations achieve high precision. The semantic pattern matching framework provides a more flexible pattern matching scheme with respect to "exactmatching" with the advantage of locating approximate pattern occurrences with similar semantics. Relatively low recall performance of our pattern-based approach may be enhanced either by employing a probabilistic annotation framework based on the annotation neighbourhoods in textual data, or, alternatively, the statistical enrichment threshold may be adjusted to lower values for applications that put more value on achieving higher recall values. PMID:18325104
Discovering gene annotations in biomedical text databases.
Cakmak, Ali; Ozsoyoglu, Gultekin
2008-03-06
Genes and gene products are frequently annotated with Gene Ontology concepts based on the evidence provided in genomics articles. Manually locating and curating information about a genomic entity from the biomedical literature requires vast amounts of human effort. Hence, there is clearly a need forautomated computational tools to annotate the genes and gene products with Gene Ontology concepts by computationally capturing the related knowledge embedded in textual data. In this article, we present an automated genomic entity annotation system, GEANN, which extracts information about the characteristics of genes and gene products in article abstracts from PubMed, and translates the discoveredknowledge into Gene Ontology (GO) concepts, a widely-used standardized vocabulary of genomic traits. GEANN utilizes textual "extraction patterns", and a semantic matching framework to locate phrases matching to a pattern and produce Gene Ontology annotations for genes and gene products. In our experiments, GEANN has reached to the precision level of 78% at therecall level of 61%. On a select set of Gene Ontology concepts, GEANN either outperforms or is comparable to two other automated annotation studies. Use of WordNet for semantic pattern matching improves the precision and recall by 24% and 15%, respectively, and the improvement due to semantic pattern matching becomes more apparent as the Gene Ontology terms become more general. GEANN is useful for two distinct purposes: (i) automating the annotation of genomic entities with Gene Ontology concepts, and (ii) providing existing annotations with additional "evidence articles" from the literature. The use of textual extraction patterns that are constructed based on the existing annotations achieve high precision. The semantic pattern matching framework provides a more flexible pattern matching scheme with respect to "exactmatching" with the advantage of locating approximate pattern occurrences with similar semantics. Relatively low recall performance of our pattern-based approach may be enhanced either by employing a probabilistic annotation framework based on the annotation neighbourhoods in textual data, or, alternatively, the statistical enrichment threshold may be adjusted to lower values for applications that put more value on achieving higher recall values.
Building a comprehensive syntactic and semantic corpus of Chinese clinical texts.
He, Bin; Dong, Bin; Guan, Yi; Yang, Jinfeng; Jiang, Zhipeng; Yu, Qiubin; Cheng, Jianyi; Qu, Chunyan
2017-05-01
To build a comprehensive corpus covering syntactic and semantic annotations of Chinese clinical texts with corresponding annotation guidelines and methods as well as to develop tools trained on the annotated corpus, which supplies baselines for research on Chinese texts in the clinical domain. An iterative annotation method was proposed to train annotators and to develop annotation guidelines. Then, by using annotation quality assurance measures, a comprehensive corpus was built, containing annotations of part-of-speech (POS) tags, syntactic tags, entities, assertions, and relations. Inter-annotator agreement (IAA) was calculated to evaluate the annotation quality and a Chinese clinical text processing and information extraction system (CCTPIES) was developed based on our annotated corpus. The syntactic corpus consists of 138 Chinese clinical documents with 47,426 tokens and 2612 full parsing trees, while the semantic corpus includes 992 documents that annotated 39,511 entities with their assertions and 7693 relations. IAA evaluation shows that this comprehensive corpus is of good quality, and the system modules are effective. The annotated corpus makes a considerable contribution to natural language processing (NLP) research into Chinese texts in the clinical domain. However, this corpus has a number of limitations. Some additional types of clinical text should be introduced to improve corpus coverage and active learning methods should be utilized to promote annotation efficiency. In this study, several annotation guidelines and an annotation method for Chinese clinical texts were proposed, and a comprehensive corpus with its NLP modules were constructed, providing a foundation for further study of applying NLP techniques to Chinese texts in the clinical domain. Copyright © 2017. Published by Elsevier Inc.
Learning multiple relative attributes with humans in the loop.
Qian, Buyue; Wang, Xiang; Cao, Nan; Jiang, Yu-Gang; Davidson, Ian
2014-12-01
Semantic attributes have been recognized as a more spontaneous manner to describe and annotate image content. It is widely accepted that image annotation using semantic attributes is a significant improvement to the traditional binary or multiclass annotation due to its naturally continuous and relative properties. Though useful, existing approaches rely on an abundant supervision and high-quality training data, which limit their applicability. Two standard methods to overcome small amounts of guidance and low-quality training data are transfer and active learning. In the context of relative attributes, this would entail learning multiple relative attributes simultaneously and actively querying a human for additional information. This paper addresses the two main limitations in existing work: 1) it actively adds humans to the learning loop so that minimal additional guidance can be given and 2) it learns multiple relative attributes simultaneously and thereby leverages dependence amongst them. In this paper, we formulate a joint active learning to rank framework with pairwise supervision to achieve these two aims, which also has other benefits such as the ability to be kernelized. The proposed framework optimizes over a set of ranking functions (measuring the strength of the presence of attributes) simultaneously and dependently on each other. The proposed pairwise queries take the form of which one of these two pictures is more natural? These queries can be easily answered by humans. Extensive empirical study on real image data sets shows that our proposed method, compared with several state-of-the-art methods, achieves superior retrieval performance while requires significantly less human inputs.
LEARNING SEMANTICS-ENHANCED LANGUAGE MODELS APPLIED TO UNSUEPRVISED WSD
DOE Office of Scientific and Technical Information (OSTI.GOV)
VERSPOOR, KARIN; LIN, SHOU-DE
An N-gram language model aims at capturing statistical syntactic word order information from corpora. Although the concept of language models has been applied extensively to handle a variety of NLP problems with reasonable success, the standard model does not incorporate semantic information, and consequently limits its applicability to semantic problems such as word sense disambiguation. We propose a framework that integrates semantic information into the language model schema, allowing a system to exploit both syntactic and semantic information to address NLP problems. Furthermore, acknowledging the limited availability of semantically annotated data, we discuss how the proposed model can be learnedmore » without annotated training examples. Finally, we report on a case study showing how the semantics-enhanced language model can be applied to unsupervised word sense disambiguation with promising results.« less
Semantic Similarity in Biomedical Ontologies
Pesquita, Catia; Faria, Daniel; Falcão, André O.; Lord, Phillip; Couto, Francisco M.
2009-01-01
In recent years, ontologies have become a mainstream topic in biomedical research. When biological entities are described using a common schema, such as an ontology, they can be compared by means of their annotations. This type of comparison is called semantic similarity, since it assesses the degree of relatedness between two entities by the similarity in meaning of their annotations. The application of semantic similarity to biomedical ontologies is recent; nevertheless, several studies have been published in the last few years describing and evaluating diverse approaches. Semantic similarity has become a valuable tool for validating the results drawn from biomedical studies such as gene clustering, gene expression data analysis, prediction and validation of molecular interactions, and disease gene prioritization. We review semantic similarity measures applied to biomedical ontologies and propose their classification according to the strategies they employ: node-based versus edge-based and pairwise versus groupwise. We also present comparative assessment studies and discuss the implications of their results. We survey the existing implementations of semantic similarity measures, and we describe examples of applications to biomedical research. This will clarify how biomedical researchers can benefit from semantic similarity measures and help them choose the approach most suitable for their studies. Biomedical ontologies are evolving toward increased coverage, formality, and integration, and their use for annotation is increasingly becoming a focus of both effort by biomedical experts and application of automated annotation procedures to create corpora of higher quality and completeness than are currently available. Given that semantic similarity measures are directly dependent on these evolutions, we can expect to see them gaining more relevance and even becoming as essential as sequence similarity is today in biomedical research. PMID:19649320
Knowledge Extraction and Semantic Annotation of Text from the Encyclopedia of Life
Thessen, Anne E.; Parr, Cynthia Sims
2014-01-01
Numerous digitization and ontological initiatives have focused on translating biological knowledge from narrative text to machine-readable formats. In this paper, we describe two workflows for knowledge extraction and semantic annotation of text data objects featured in an online biodiversity aggregator, the Encyclopedia of Life. One workflow tags text with DBpedia URIs based on keywords. Another workflow finds taxon names in text using GNRD for the purpose of building a species association network. Both workflows work well: the annotation workflow has an F1 Score of 0.941 and the association algorithm has an F1 Score of 0.885. Existing text annotators such as Terminizer and DBpedia Spotlight performed well, but require some optimization to be useful in the ecology and evolution domain. Important future work includes scaling up and improving accuracy through the use of distributional semantics. PMID:24594988
Developing a knowledge base to support the annotation of ultrasound images of ectopic pregnancy.
Dhombres, Ferdinand; Maurice, Paul; Friszer, Stéphanie; Guilbaud, Lucie; Lelong, Nathalie; Khoshnood, Babak; Charlet, Jean; Perrot, Nicolas; Jauniaux, Eric; Jurkovic, Davor; Jouannic, Jean-Marie
2017-01-31
Ectopic pregnancy is a frequent early complication of pregnancy associated with significant rates of morbidly and mortality. The positive diagnosis of this condition is established through transvaginal ultrasound scanning. The timing of diagnosis depends on the operator expertise in identifying the signs of ectopic pregnancy, which varies dramatically among medical staff with heterogeneous training. Developing decision support systems in this context is expected to improve the identification of these signs and subsequently improve the quality of care. In this article, we present a new knowledge base for ectopic pregnancy, and we demonstrate its use on the annotation of clinical images. The knowledge base is supported by an application ontology, which provides the taxonomy, the vocabulary and definitions for 24 types and 81 signs of ectopic pregnancy, 484 anatomical structures and 32 technical elements for image acquisition. The knowledge base provides a sign-centric model of the domain, with the relations of signs to ectopic pregnancy types, anatomical structures and the technical elements. The evaluation of the ontology and knowledge base demonstrated a positive feedback from a panel of 17 medical users. Leveraging these semantic resources, we developed an application for the annotation of ultrasound images. Using this application, 6 operators achieved a precision of 0.83 for the identification of signs in 208 ultrasound images corresponding to 35 clinical cases of ectopic pregnancy. We developed a new ectopic pregnancy knowledge base for the annotation of ultrasound images. The use of this knowledge base for the annotation of ultrasound images of ectopic pregnancy showed promising results from the perspective of clinical decision support system development. Other gynecological disorders and fetal anomalies may benefit from our approach.
Towards Semantic Modelling of Business Processes for Networked Enterprises
NASA Astrophysics Data System (ADS)
Furdík, Karol; Mach, Marián; Sabol, Tomáš
The paper presents an approach to the semantic modelling and annotation of business processes and information resources, as it was designed within the FP7 ICT EU project SPIKE to support creation and maintenance of short-term business alliances and networked enterprises. A methodology for the development of the resource ontology, as a shareable knowledge model for semantic description of business processes, is proposed. Systematically collected user requirements, conceptual models implied by the selected implementation platform as well as available ontology resources and standards are employed in the ontology creation. The process of semantic annotation is described and illustrated using an example taken from a real application case.
Semantic Search of Web Services
ERIC Educational Resources Information Center
Hao, Ke
2013-01-01
This dissertation addresses semantic search of Web services using natural language processing. We first survey various existing approaches, focusing on the fact that the expensive costs of current semantic annotation frameworks result in limited use of semantic search for large scale applications. We then propose a vector space model based service…
Managing biomedical image metadata for search and retrieval of similar images.
Korenblum, Daniel; Rubin, Daniel; Napel, Sandy; Rodriguez, Cesar; Beaulieu, Chris
2011-08-01
Radiology images are generally disconnected from the metadata describing their contents, such as imaging observations ("semantic" metadata), which are usually described in text reports that are not directly linked to the images. We developed a system, the Biomedical Image Metadata Manager (BIMM) to (1) address the problem of managing biomedical image metadata and (2) facilitate the retrieval of similar images using semantic feature metadata. Our approach allows radiologists, researchers, and students to take advantage of the vast and growing repositories of medical image data by explicitly linking images to their associated metadata in a relational database that is globally accessible through a Web application. BIMM receives input in the form of standard-based metadata files using Web service and parses and stores the metadata in a relational database allowing efficient data query and maintenance capabilities. Upon querying BIMM for images, 2D regions of interest (ROIs) stored as metadata are automatically rendered onto preview images included in search results. The system's "match observations" function retrieves images with similar ROIs based on specific semantic features describing imaging observation characteristics (IOCs). We demonstrate that the system, using IOCs alone, can accurately retrieve images with diagnoses matching the query images, and we evaluate its performance on a set of annotated liver lesion images. BIMM has several potential applications, e.g., computer-aided detection and diagnosis, content-based image retrieval, automating medical analysis protocols, and gathering population statistics like disease prevalences. The system provides a framework for decision support systems, potentially improving their diagnostic accuracy and selection of appropriate therapies.
NASA Astrophysics Data System (ADS)
Elag, M.; Kumar, P.
2016-12-01
Hydrologists today have to integrate resources such as data and models, which originate and reside in multiple autonomous and heterogeneous repositories over the Web. Several resource management systems have emerged within geoscience communities for sharing long-tail data, which are collected by individual or small research groups, and long-tail models, which are developed by scientists or small modeling communities. While these systems have increased the availability of resources within geoscience domains, deficiencies remain due to the heterogeneity in the methods, which are used to describe, encode, and publish information about resources over the Web. This heterogeneity limits our ability to access the right information in the right context so that it can be efficiently retrieved and understood without the Hydrologist's mediation. A primary challenge of the Web today is the lack of the semantic interoperability among the massive number of resources, which already exist and are continually being generated at rapid rates. To address this challenge, we have developed a decentralized GeoSemantic (GS) framework, which provides three sets of micro-web services to support (i) semantic annotation of resources, (ii) semantic alignment between the metadata of two resources, and (iii) semantic mediation among Standard Names. Here we present the design of the framework and demonstrate its application for semantic integration between data and models used in the IML-CZO. First we show how the IML-CZO data are annotated using the Semantic Annotation Services. Then we illustrate how the Resource Alignment Services and Knowledge Integration Services are used to create a semantic workflow among TopoFlow model, which is a spatially-distributed hydrologic model and the annotated data. Results of this work are (i) a demonstration of how the GS framework advances the integration of heterogeneous data and models of water-related disciplines by seamless handling of their semantic heterogeneity, (ii) an introduction of new paradigm for reusing existing and new standards as well as tools and models without the need of their implementation in the Cyberinfrastructures of water-related disciplines, and (iii) an investigation of a methodology by which distributed models can be coupled in a workflow using the GS services.
Measuring semantic similarities by combining gene ontology annotations and gene co-function networks
Peng, Jiajie; Uygun, Sahra; Kim, Taehyong; ...
2015-02-14
Background: Gene Ontology (GO) has been used widely to study functional relationships between genes. The current semantic similarity measures rely only on GO annotations and GO structure. This limits the power of GO-based similarity because of the limited proportion of genes that are annotated to GO in most organisms. Results: We introduce a novel approach called NETSIM (network-based similarity measure) that incorporates information from gene co-function networks in addition to using the GO structure and annotations. Using metabolic reaction maps of yeast, Arabidopsis, and human, we demonstrate that NETSIM can improve the accuracy of GO term similarities. We also demonstratemore » that NETSIM works well even for genomes with sparser gene annotation data. We applied NETSIM on large Arabidopsis gene families such as cytochrome P450 monooxygenases to group the members functionally and show that this grouping could facilitate functional characterization of genes in these families. Conclusions: Using NETSIM as an example, we demonstrated that the performance of a semantic similarity measure could be significantly improved after incorporating genome-specific information. NETSIM incorporates both GO annotations and gene co-function network data as a priori knowledge in the model. Therefore, functional similarities of GO terms that are not explicitly encoded in GO but are relevant in a taxon-specific manner become measurable when GO annotations are limited.« less
Construction of an annotated corpus to support biomedical information extraction
Thompson, Paul; Iqbal, Syed A; McNaught, John; Ananiadou, Sophia
2009-01-01
Background Information Extraction (IE) is a component of text mining that facilitates knowledge discovery by automatically locating instances of interesting biomedical events from huge document collections. As events are usually centred on verbs and nominalised verbs, understanding the syntactic and semantic behaviour of these words is highly important. Corpora annotated with information concerning this behaviour can constitute a valuable resource in the training of IE components and resources. Results We have defined a new scheme for annotating sentence-bound gene regulation events, centred on both verbs and nominalised verbs. For each event instance, all participants (arguments) in the same sentence are identified and assigned a semantic role from a rich set of 13 roles tailored to biomedical research articles, together with a biological concept type linked to the Gene Regulation Ontology. To our knowledge, our scheme is unique within the biomedical field in terms of the range of event arguments identified. Using the scheme, we have created the Gene Regulation Event Corpus (GREC), consisting of 240 MEDLINE abstracts, in which events relating to gene regulation and expression have been annotated by biologists. A novel method of evaluating various different facets of the annotation task showed that average inter-annotator agreement rates fall within the range of 66% - 90%. Conclusion The GREC is a unique resource within the biomedical field, in that it annotates not only core relationships between entities, but also a range of other important details about these relationships, e.g., location, temporal, manner and environmental conditions. As such, it is specifically designed to support bio-specific tool and resource development. It has already been used to acquire semantic frames for inclusion within the BioLexicon (a lexical, terminological resource to aid biomedical text mining). Initial experiments have also shown that the corpus may viably be used to train IE components, such as semantic role labellers. The corpus and annotation guidelines are freely available for academic purposes. PMID:19852798
Semantic guidance of eye movements in real-world scenes
Hwang, Alex D.; Wang, Hsueh-Cheng; Pomplun, Marc
2011-01-01
The perception of objects in our visual world is influenced by not only their low-level visual features such as shape and color, but also their high-level features such as meaning and semantic relations among them. While it has been shown that low-level features in real-world scenes guide eye movements during scene inspection and search, the influence of semantic similarity among scene objects on eye movements in such situations has not been investigated. Here we study guidance of eye movements by semantic similarity among objects during real-world scene inspection and search. By selecting scenes from the LabelMe object-annotated image database and applying Latent Semantic Analysis (LSA) to the object labels, we generated semantic saliency maps of real-world scenes based on the semantic similarity of scene objects to the currently fixated object or the search target. An ROC analysis of these maps as predictors of subjects’ gaze transitions between objects during scene inspection revealed a preference for transitions to objects that were semantically similar to the currently inspected one. Furthermore, during the course of a scene search, subjects’ eye movements were progressively guided toward objects that were semantically similar to the search target. These findings demonstrate substantial semantic guidance of eye movements in real-world scenes and show its importance for understanding real-world attentional control. PMID:21426914
Semantic guidance of eye movements in real-world scenes.
Hwang, Alex D; Wang, Hsueh-Cheng; Pomplun, Marc
2011-05-25
The perception of objects in our visual world is influenced by not only their low-level visual features such as shape and color, but also their high-level features such as meaning and semantic relations among them. While it has been shown that low-level features in real-world scenes guide eye movements during scene inspection and search, the influence of semantic similarity among scene objects on eye movements in such situations has not been investigated. Here we study guidance of eye movements by semantic similarity among objects during real-world scene inspection and search. By selecting scenes from the LabelMe object-annotated image database and applying latent semantic analysis (LSA) to the object labels, we generated semantic saliency maps of real-world scenes based on the semantic similarity of scene objects to the currently fixated object or the search target. An ROC analysis of these maps as predictors of subjects' gaze transitions between objects during scene inspection revealed a preference for transitions to objects that were semantically similar to the currently inspected one. Furthermore, during the course of a scene search, subjects' eye movements were progressively guided toward objects that were semantically similar to the search target. These findings demonstrate substantial semantic guidance of eye movements in real-world scenes and show its importance for understanding real-world attentional control. Copyright © 2011 Elsevier Ltd. All rights reserved.
Semantic Annotations and Querying of Web Data Sources
NASA Astrophysics Data System (ADS)
Hornung, Thomas; May, Wolfgang
A large part of the Web, actually holding a significant portion of the useful information throughout the Web, consists of views on hidden databases, provided by numerous heterogeneous interfaces that are partly human-oriented via Web forms ("Deep Web"), and partly based on Web Services (only machine accessible). In this paper we present an approach for annotating these sources in a way that makes them citizens of the Semantic Web. We illustrate how queries can be stated in terms of the ontology, and how the annotations are used to selected and access appropriate sources and to answer the queries.
Determining Semantically Related Significant Genes.
Taha, Kamal
2014-01-01
GO relation embodies some aspects of existence dependency. If GO term xis existence-dependent on GO term y, the presence of y implies the presence of x. Therefore, the genes annotated with the function of the GO term y are usually functionally and semantically related to the genes annotated with the function of the GO term x. A large number of gene set enrichment analysis methods have been developed in recent years for analyzing gene sets enrichment. However, most of these methods overlook the structural dependencies between GO terms in GO graph by not considering the concept of existence dependency. We propose in this paper a biological search engine called RSGSearch that identifies enriched sets of genes annotated with different functions using the concept of existence dependency. We observe that GO term xcannot be existence-dependent on GO term y, if x- and y- have the same specificity (biological characteristics). After encoding into a numeric format the contributions of GO terms annotating target genes to the semantics of their lowest common ancestors (LCAs), RSGSearch uses microarray experiment to identify the most significant LCA that annotates the result genes. We evaluated RSGSearch experimentally and compared it with five gene set enrichment systems. Results showed marked improvement.
ERIC Educational Resources Information Center
Nesic, Sasa; Gasevic, Dragan; Jazayeri, Mehdi; Landoni, Monica
2011-01-01
Semantic web technologies have been applied to many aspects of learning content authoring including semantic annotation, semantic search, dynamic assembly, and personalization of learning content. At the same time, social networking services have started to play an important role in the authoring process by supporting authors' collaborative…
USI: a fast and accurate approach for conceptual document annotation.
Fiorini, Nicolas; Ranwez, Sylvie; Montmain, Jacky; Ranwez, Vincent
2015-03-14
Semantic approaches such as concept-based information retrieval rely on a corpus in which resources are indexed by concepts belonging to a domain ontology. In order to keep such applications up-to-date, new entities need to be frequently annotated to enrich the corpus. However, this task is time-consuming and requires a high-level of expertise in both the domain and the related ontology. Different strategies have thus been proposed to ease this indexing process, each one taking advantage from the features of the document. In this paper we present USI (User-oriented Semantic Indexer), a fast and intuitive method for indexing tasks. We introduce a solution to suggest a conceptual annotation for new entities based on related already indexed documents. Our results, compared to those obtained by previous authors using the MeSH thesaurus and a dataset of biomedical papers, show that the method surpasses text-specific methods in terms of both quality and speed. Evaluations are done via usual metrics and semantic similarity. By only relying on neighbor documents, the User-oriented Semantic Indexer does not need a representative learning set. Yet, it provides better results than the other approaches by giving a consistent annotation scored with a global criterion - instead of one score per concept.
A multilingual gold-standard corpus for biomedical concept recognition: the Mantra GSC
Clematide, Simon; Akhondi, Saber A; van Mulligen, Erik M; Rebholz-Schuhmann, Dietrich
2015-01-01
Objective To create a multilingual gold-standard corpus for biomedical concept recognition. Materials and methods We selected text units from different parallel corpora (Medline abstract titles, drug labels, biomedical patent claims) in English, French, German, Spanish, and Dutch. Three annotators per language independently annotated the biomedical concepts, based on a subset of the Unified Medical Language System and covering a wide range of semantic groups. To reduce the annotation workload, automatically generated preannotations were provided. Individual annotations were automatically harmonized and then adjudicated, and cross-language consistency checks were carried out to arrive at the final annotations. Results The number of final annotations was 5530. Inter-annotator agreement scores indicate good agreement (median F-score 0.79), and are similar to those between individual annotators and the gold standard. The automatically generated harmonized annotation set for each language performed equally well as the best annotator for that language. Discussion The use of automatic preannotations, harmonized annotations, and parallel corpora helped to keep the manual annotation efforts manageable. The inter-annotator agreement scores provide a reference standard for gauging the performance of automatic annotation techniques. Conclusion To our knowledge, this is the first gold-standard corpus for biomedical concept recognition in languages other than English. Other distinguishing features are the wide variety of semantic groups that are being covered, and the diversity of text genres that were annotated. PMID:25948699
Teixeira, Marlon Amaro Coelho; Belloze, Kele Teixeira; Cavalcanti, Maria Cláudia; Silva-Junior, Floriano P
2018-04-01
Semantic text annotation enables the association of semantic information (ontology concepts) to text expressions (terms), which are readable by software agents. In the scientific scenario, this is particularly useful because it reveals a lot of scientific discoveries that are hidden within academic articles. The Biomedical area has more than 300 ontologies, most of them composed of over 500 concepts. These ontologies can be used to annotate scientific papers and thus, facilitate data extraction. However, in the context of a scientific research, a simple keyword-based query using the interface of a digital scientific texts library can return more than a thousand hits. The analysis of such a large set of texts, annotated with such numerous and large ontologies, is not an easy task. Therefore, the main objective of this work is to provide a method that could facilitate this task. This work describes a method called Text and Ontology ETL (TOETL), to build an analytical view over such texts. First, a corpus of selected papers is semantically annotated using distinct ontologies. Then, the annotation data is extracted, organized and aggregated into the dimensional schema of a data mart. Besides the TOETL method, this work illustrates its application through the development of the TaP DM (Target Prioritization data mart). This data mart has focus on the research of gene essentiality, a key concept to be considered when searching for genes showing potential as anti-infective drug targets. This work reveals that the proposed approach is a relevant tool to support decision making in the prioritization of new drug targets, being more efficient than the keyword-based traditional tools. Copyright © 2018 Elsevier B.V. All rights reserved.
Depeursinge, Adrien; Kurtz, Camille; Beaulieu, Christopher F.; Napel, Sandy; Rubin, Daniel L.
2014-01-01
We describe a framework to model visual semantics of liver lesions in CT images in order to predict the visual semantic terms (VST) reported by radiologists in describing these lesions. Computational models of VST are learned from image data using high–order steerable Riesz wavelets and support vector machines (SVM). The organization of scales and directions that are specific to every VST are modeled as linear combinations of directional Riesz wavelets. The models obtained are steerable, which means that any orientation of the model can be synthesized from linear combinations of the basis filters. The latter property is leveraged to model VST independently from their local orientation. In a first step, these models are used to predict the presence of each semantic term that describes liver lesions. In a second step, the distances between all VST models are calculated to establish a non–hierarchical computationally–derived ontology of VST containing inter–term synonymy and complementarity. A preliminary evaluation of the proposed framework was carried out using 74 liver lesions annotated with a set of 18 VSTs from the RadLex ontology. A leave–one–patient–out cross–validation resulted in an average area under the ROC curve of 0.853 for predicting the presence of each VST when using SVMs in a feature space combining the magnitudes of the steered models with CT intensities. Likelihood maps are created for each VST, which enables high transparency of the information modeled. The computationally–derived ontology obtained from the VST models was found to be consistent with the underlying semantics of the visual terms. It was found to be complementary to the RadLex ontology, and constitutes a potential method to link the image content to visual semantics. The proposed framework is expected to foster human–computer synergies for the interpretation of radiological images while using rotation–covariant computational models of VSTs to (1) quantify their local likelihood and (2) explicitly link them with pixel–based image content in the context of a given imaging domain. PMID:24808406
Ontology based decision system for breast cancer diagnosis
NASA Astrophysics Data System (ADS)
Trabelsi Ben Ameur, Soumaya; Cloppet, Florence; Wendling, Laurent; Sellami, Dorra
2018-04-01
In this paper, we focus on analysis and diagnosis of breast masses inspired by expert concepts and rules. Accordingly, a Bag of Words is built based on the ontology of breast cancer diagnosis, accurately described in the Breast Imaging Reporting and Data System. To fill the gap between low level knowledge and expert concepts, a semantic annotation is developed using a machine learning tool. Then, breast masses are classified into benign or malignant according to expert rules implicitly modeled with a set of classifiers (KNN, ANN, SVM and Decision Tree). This semantic context of analysis offers a frame where we can include external factors and other meta-knowledge such as patient risk factors as well as exploiting more than one modality. Based on MRI and DECEDM modalities, our developed system leads a recognition rate of 99.7% with Decision Tree where an improvement of 24.7 % is obtained owing to semantic analysis.
Practical life log video indexing based on content and context
NASA Astrophysics Data System (ADS)
Tancharoen, Datchakorn; Yamasaki, Toshihiko; Aizawa, Kiyoharu
2006-01-01
Today, multimedia information has gained an important role in daily life and people can use imaging devices to capture their visual experiences. In this paper, we present our personal Life Log system to record personal experiences in form of wearable video and environmental data; in addition, an efficient retrieval system is demonstrated to recall the desirable media. We summarize the practical video indexing techniques based on Life Log content and context to detect talking scenes by using audio/visual cues and semantic key frames from GPS data. Voice annotation is also demonstrated as a practical indexing method. Moreover, we apply body media sensors to record continuous life style and use body media data to index the semantic key frames. In the experiments, we demonstrated various video indexing results which provided their semantic contents and showed Life Log visualizations to examine personal life effectively.
FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation.
Bolleman, Jerven T; Mungall, Christopher J; Strozzi, Francesco; Baran, Joachim; Dumontier, Michel; Bonnal, Raoul J P; Buels, Robert; Hoehndorf, Robert; Fujisawa, Takatomo; Katayama, Toshiaki; Cock, Peter J A
2016-06-13
Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. Using Semantic Web technologies to query biological annotations, there was no standard that described this potentially complex location information as subject-predicate-object triples. We have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned "omics" areas. Using the same data format to represent sequence positions that are independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Our ontology allows users to uniformly describe - and potentially merge - sequence annotations from multiple sources. Data sources using FALDO can prospectively be retrieved using federalised SPARQL queries against public SPARQL endpoints and/or local private triple stores.
FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation
Bolleman, Jerven T.; Mungall, Christopher J.; Strozzi, Francesco; ...
2016-06-13
Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. Using Semantic Web technologies to query biological annotations, there was no standard that described this potentially complex location information as subject-predicate-object triples. In this paper, we have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned “omics” areas. Using the same data formatmore » to represent sequence positions that are independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Our ontology allows users to uniformly describe – and potentially merge – sequence annotations from multiple sources. Finally, data sources using FALDO can prospectively be retrieved using federalised SPARQL queries against public SPARQL endpoints and/or local private triple stores.« less
FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bolleman, Jerven T.; Mungall, Christopher J.; Strozzi, Francesco
Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. Using Semantic Web technologies to query biological annotations, there was no standard that described this potentially complex location information as subject-predicate-object triples. In this paper, we have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned “omics” areas. Using the same data formatmore » to represent sequence positions that are independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Our ontology allows users to uniformly describe – and potentially merge – sequence annotations from multiple sources. Finally, data sources using FALDO can prospectively be retrieved using federalised SPARQL queries against public SPARQL endpoints and/or local private triple stores.« less
Recommendations for recognizing video events by concept vocabularies
2014-06-01
authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright anno - tation thereon. Disclaimer: The views and...detection and recognition for semantic annotation of video, Multimedia Tools Appl. 51 (1) (2011) 279–302. [6] A. Berg, J . Deng, S. Satheesh, H. Su, F.-F...Li, Imagenet Large Scale Visual Recognition Challenge 2011. <http://www.image-net.org/challenges/LSVRC/ 2011/>. [7] J . Deng, W. Dong, R. Socher, L
Semantic web data warehousing for caGrid.
McCusker, James P; Phillips, Joshua A; González Beltrán, Alejandra; Finkelstein, Anthony; Krauthammer, Michael
2009-10-01
The National Cancer Institute (NCI) is developing caGrid as a means for sharing cancer-related data and services. As more data sets become available on caGrid, we need effective ways of accessing and integrating this information. Although the data models exposed on caGrid are semantically well annotated, it is currently up to the caGrid client to infer relationships between the different models and their classes. In this paper, we present a Semantic Web-based data warehouse (Corvus) for creating relationships among caGrid models. This is accomplished through the transformation of semantically-annotated caBIG Unified Modeling Language (UML) information models into Web Ontology Language (OWL) ontologies that preserve those semantics. We demonstrate the validity of the approach by Semantic Extraction, Transformation and Loading (SETL) of data from two caGrid data sources, caTissue and caArray, as well as alignment and query of those sources in Corvus. We argue that semantic integration is necessary for integration of data from distributed web services and that Corvus is a useful way of accomplishing this. Our approach is generalizable and of broad utility to researchers facing similar integration challenges.
Generative Adversarial Networks: An Overview
NASA Astrophysics Data System (ADS)
Creswell, Antonia; White, Tom; Dumoulin, Vincent; Arulkumaran, Kai; Sengupta, Biswa; Bharath, Anil A.
2018-01-01
Generative adversarial networks (GANs) provide a way to learn deep representations without extensively annotated training data. They achieve this through deriving backpropagation signals through a competitive process involving a pair of networks. The representations that can be learned by GANs may be used in a variety of applications, including image synthesis, semantic image editing, style transfer, image super-resolution and classification. The aim of this review paper is to provide an overview of GANs for the signal processing community, drawing on familiar analogies and concepts where possible. In addition to identifying different methods for training and constructing GANs, we also point to remaining challenges in their theory and application.
A multilingual gold-standard corpus for biomedical concept recognition: the Mantra GSC.
Kors, Jan A; Clematide, Simon; Akhondi, Saber A; van Mulligen, Erik M; Rebholz-Schuhmann, Dietrich
2015-09-01
To create a multilingual gold-standard corpus for biomedical concept recognition. We selected text units from different parallel corpora (Medline abstract titles, drug labels, biomedical patent claims) in English, French, German, Spanish, and Dutch. Three annotators per language independently annotated the biomedical concepts, based on a subset of the Unified Medical Language System and covering a wide range of semantic groups. To reduce the annotation workload, automatically generated preannotations were provided. Individual annotations were automatically harmonized and then adjudicated, and cross-language consistency checks were carried out to arrive at the final annotations. The number of final annotations was 5530. Inter-annotator agreement scores indicate good agreement (median F-score 0.79), and are similar to those between individual annotators and the gold standard. The automatically generated harmonized annotation set for each language performed equally well as the best annotator for that language. The use of automatic preannotations, harmonized annotations, and parallel corpora helped to keep the manual annotation efforts manageable. The inter-annotator agreement scores provide a reference standard for gauging the performance of automatic annotation techniques. To our knowledge, this is the first gold-standard corpus for biomedical concept recognition in languages other than English. Other distinguishing features are the wide variety of semantic groups that are being covered, and the diversity of text genres that were annotated. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.
A Semantic Web-based System for Managing Clinical Archetypes.
Fernandez-Breis, Jesualdo Tomas; Menarguez-Tortosa, Marcos; Martinez-Costa, Catalina; Fernandez-Breis, Eneko; Herrero-Sempere, Jose; Moner, David; Sanchez, Jesus; Valencia-Garcia, Rafael; Robles, Montserrat
2008-01-01
Archetypes facilitate the sharing of clinical knowledge and therefore are a basic tool for achieving interoperability between healthcare information systems. In this paper, a Semantic Web System for Managing Archetypes is presented. This system allows for the semantic annotation of archetypes, as well for performing semantic searches. The current system is capable of working with both ISO13606 and OpenEHR archetypes.
Semantic-Aware Components and Services of ActiveMath
ERIC Educational Resources Information Center
Melis, Erica; Goguadze, Giorgi; Homik, Martin; Libbrecht, Paul; Ullrich, Carsten; Winterstein, Stefan
2006-01-01
ActiveMath is a complex web-based adaptive learning environment with a number of components and interactive learning tools. The basis for handling semantics of learning content is provided by its semantic (mathematics) content markup, which is additionally annotated with educational metadata. Several components, tools and external services can…
Open semantic annotation of scientific publications using DOMEO.
Ciccarese, Paolo; Ocana, Marco; Clark, Tim
2012-04-24
Our group has developed a useful shared software framework for performing, versioning, sharing and viewing Web annotations of a number of kinds, using an open representation model. The Domeo Annotation Tool was developed in tandem with this open model, the Annotation Ontology (AO). Development of both the Annotation Framework and the open model was driven by requirements of several different types of alpha users, including bench scientists and biomedical curators from university research labs, online scientific communities, publishing and pharmaceutical companies.Several use cases were incrementally implemented by the toolkit. These use cases in biomedical communications include personal note-taking, group document annotation, semantic tagging, claim-evidence-context extraction, reagent tagging, and curation of textmining results from entity extraction algorithms. We report on the Domeo user interface here. Domeo has been deployed in beta release as part of the NIH Neuroscience Information Framework (NIF, http://www.neuinfo.org) and is scheduled for production deployment in the NIF's next full release.Future papers will describe other aspects of this work in detail, including Annotation Framework Services and components for integrating with external textmining services, such as the NCBO Annotator web service, and with other textmining applications using the Apache UIMA framework.
Open semantic annotation of scientific publications using DOMEO
2012-01-01
Background Our group has developed a useful shared software framework for performing, versioning, sharing and viewing Web annotations of a number of kinds, using an open representation model. Methods The Domeo Annotation Tool was developed in tandem with this open model, the Annotation Ontology (AO). Development of both the Annotation Framework and the open model was driven by requirements of several different types of alpha users, including bench scientists and biomedical curators from university research labs, online scientific communities, publishing and pharmaceutical companies. Several use cases were incrementally implemented by the toolkit. These use cases in biomedical communications include personal note-taking, group document annotation, semantic tagging, claim-evidence-context extraction, reagent tagging, and curation of textmining results from entity extraction algorithms. Results We report on the Domeo user interface here. Domeo has been deployed in beta release as part of the NIH Neuroscience Information Framework (NIF, http://www.neuinfo.org) and is scheduled for production deployment in the NIF’s next full release. Future papers will describe other aspects of this work in detail, including Annotation Framework Services and components for integrating with external textmining services, such as the NCBO Annotator web service, and with other textmining applications using the Apache UIMA framework. PMID:22541592
Clark, Alex M; Bunin, Barry A; Litterman, Nadia K; Schürer, Stephan C; Visser, Ubbo
2014-01-01
Bioinformatics and computer aided drug design rely on the curation of a large number of protocols for biological assays that measure the ability of potential drugs to achieve a therapeutic effect. These assay protocols are generally published by scientists in the form of plain text, which needs to be more precisely annotated in order to be useful to software methods. We have developed a pragmatic approach to describing assays according to the semantic definitions of the BioAssay Ontology (BAO) project, using a hybrid of machine learning based on natural language processing, and a simplified user interface designed to help scientists curate their data with minimum effort. We have carried out this work based on the premise that pure machine learning is insufficiently accurate, and that expecting scientists to find the time to annotate their protocols manually is unrealistic. By combining these approaches, we have created an effective prototype for which annotation of bioassay text within the domain of the training set can be accomplished very quickly. Well-trained annotations require single-click user approval, while annotations from outside the training set domain can be identified using the search feature of a well-designed user interface, and subsequently used to improve the underlying models. By drastically reducing the time required for scientists to annotate their assays, we can realistically advocate for semantic annotation to become a standard part of the publication process. Once even a small proportion of the public body of bioassay data is marked up, bioinformatics researchers can begin to construct sophisticated and useful searching and analysis algorithms that will provide a diverse and powerful set of tools for drug discovery researchers.
Bunin, Barry A.; Litterman, Nadia K.; Schürer, Stephan C.; Visser, Ubbo
2014-01-01
Bioinformatics and computer aided drug design rely on the curation of a large number of protocols for biological assays that measure the ability of potential drugs to achieve a therapeutic effect. These assay protocols are generally published by scientists in the form of plain text, which needs to be more precisely annotated in order to be useful to software methods. We have developed a pragmatic approach to describing assays according to the semantic definitions of the BioAssay Ontology (BAO) project, using a hybrid of machine learning based on natural language processing, and a simplified user interface designed to help scientists curate their data with minimum effort. We have carried out this work based on the premise that pure machine learning is insufficiently accurate, and that expecting scientists to find the time to annotate their protocols manually is unrealistic. By combining these approaches, we have created an effective prototype for which annotation of bioassay text within the domain of the training set can be accomplished very quickly. Well-trained annotations require single-click user approval, while annotations from outside the training set domain can be identified using the search feature of a well-designed user interface, and subsequently used to improve the underlying models. By drastically reducing the time required for scientists to annotate their assays, we can realistically advocate for semantic annotation to become a standard part of the publication process. Once even a small proportion of the public body of bioassay data is marked up, bioinformatics researchers can begin to construct sophisticated and useful searching and analysis algorithms that will provide a diverse and powerful set of tools for drug discovery researchers. PMID:25165633
Representing annotation compositionality and provenance for the Semantic Web
2013-01-01
Background Though the annotation of digital artifacts with metadata has a long history, the bulk of that work focuses on the association of single terms or concepts to single targets. As annotation efforts expand to capture more complex information, annotations will need to be able to refer to knowledge structures formally defined in terms of more atomic knowledge structures. Existing provenance efforts in the Semantic Web domain primarily focus on tracking provenance at the level of whole triples and do not provide enough detail to track how individual triple elements of annotations were derived from triple elements of other annotations. Results We present a task- and domain-independent ontological model for capturing annotations and their linkage to their denoted knowledge representations, which can be singular concepts or more complex sets of assertions. We have implemented this model as an extension of the Information Artifact Ontology in OWL and made it freely available, and we show how it can be integrated with several prominent annotation and provenance models. We present several application areas for the model, ranging from linguistic annotation of text to the annotation of disease-associations in genome sequences. Conclusions With this model, progressively more complex annotations can be composed from other annotations, and the provenance of compositional annotations can be represented at the annotation level or at the level of individual elements of the RDF triples composing the annotations. This in turn allows for progressively richer annotations to be constructed from previous annotation efforts, the precise provenance recording of which facilitates evidence-based inference and error tracking. PMID:24268021
Samwald, Matthias; Lim, Ernest; Masiar, Peter; Marenco, Luis; Chen, Huajun; Morse, Thomas; Mutalik, Pradeep; Shepherd, Gordon; Miller, Perry; Cheung, Kei-Hoi
2009-01-01
The amount of biomedical data available in Semantic Web formats has been rapidly growing in recent years. While these formats are machine-friendly, user-friendly web interfaces allowing easy querying of these data are typically lacking. We present "Entrez Neuron", a pilot neuron-centric interface that allows for keyword-based queries against a coherent repository of OWL ontologies. These ontologies describe neuronal structures, physiology, mathematical models and microscopy images. The returned query results are organized hierarchically according to brain architecture. Where possible, the application makes use of entities from the Open Biomedical Ontologies (OBO) and the 'HCLS knowledgebase' developed by the W3C Interest Group for Health Care and Life Science. It makes use of the emerging RDFa standard to embed ontology fragments and semantic annotations within its HTML-based user interface. The application and underlying ontologies demonstrate how Semantic Web technologies can be used for information integration within a curated information repository and between curated information repositories. It also demonstrates how information integration can be accomplished on the client side, through simple copying and pasting of portions of documents that contain RDFa markup.
Jiang, Guoqian; Solbrig, Harold R; Chute, Christopher G
2011-01-01
A source of semantically coded Adverse Drug Event (ADE) data can be useful for identifying common phenotypes related to ADEs. We proposed a comprehensive framework for building a standardized ADE knowledge base (called ADEpedia) through combining ontology-based approach with semantic web technology. The framework comprises four primary modules: 1) an XML2RDF transformation module; 2) a data normalization module based on NCBO Open Biomedical Annotator; 3) a RDF store based persistence module; and 4) a front-end module based on a Semantic Wiki for the review and curation. A prototype is successfully implemented to demonstrate the capability of the system to integrate multiple drug data and ontology resources and open web services for the ADE data standardization. A preliminary evaluation is performed to demonstrate the usefulness of the system, including the performance of the NCBO annotator. In conclusion, the semantic web technology provides a highly scalable framework for ADE data source integration and standard query service.
Semantic web data warehousing for caGrid
McCusker, James P; Phillips, Joshua A; Beltrán, Alejandra González; Finkelstein, Anthony; Krauthammer, Michael
2009-01-01
The National Cancer Institute (NCI) is developing caGrid as a means for sharing cancer-related data and services. As more data sets become available on caGrid, we need effective ways of accessing and integrating this information. Although the data models exposed on caGrid are semantically well annotated, it is currently up to the caGrid client to infer relationships between the different models and their classes. In this paper, we present a Semantic Web-based data warehouse (Corvus) for creating relationships among caGrid models. This is accomplished through the transformation of semantically-annotated caBIG® Unified Modeling Language (UML) information models into Web Ontology Language (OWL) ontologies that preserve those semantics. We demonstrate the validity of the approach by Semantic Extraction, Transformation and Loading (SETL) of data from two caGrid data sources, caTissue and caArray, as well as alignment and query of those sources in Corvus. We argue that semantic integration is necessary for integration of data from distributed web services and that Corvus is a useful way of accomplishing this. Our approach is generalizable and of broad utility to researchers facing similar integration challenges. PMID:19796399
Collective dynamics of social annotation
Cattuto, Ciro; Barrat, Alain; Baldassarri, Andrea; Schehr, Gregory; Loreto, Vittorio
2009-01-01
The enormous increase of popularity and use of the worldwide web has led in the recent years to important changes in the ways people communicate. An interesting example of this fact is provided by the now very popular social annotation systems, through which users annotate resources (such as web pages or digital photographs) with keywords known as “tags.” Understanding the rich emergent structures resulting from the uncoordinated actions of users calls for an interdisciplinary effort. In particular concepts borrowed from statistical physics, such as random walks (RWs), and complex networks theory, can effectively contribute to the mathematical modeling of social annotation systems. Here, we show that the process of social annotation can be seen as a collective but uncoordinated exploration of an underlying semantic space, pictured as a graph, through a series of RWs. This modeling framework reproduces several aspects, thus far unexplained, of social annotation, among which are the peculiar growth of the size of the vocabulary used by the community and its complex network structure that represents an externalization of semantic structures grounded in cognition and that are typically hard to access. PMID:19506244
EliXR-TIME: A Temporal Knowledge Representation for Clinical Research Eligibility Criteria.
Boland, Mary Regina; Tu, Samson W; Carini, Simona; Sim, Ida; Weng, Chunhua
2012-01-01
Effective clinical text processing requires accurate extraction and representation of temporal expressions. Multiple temporal information extraction models were developed but a similar need for extracting temporal expressions in eligibility criteria (e.g., for eligibility determination) remains. We identified the temporal knowledge representation requirements of eligibility criteria by reviewing 100 temporal criteria. We developed EliXR-TIME, a frame-based representation designed to support semantic annotation for temporal expressions in eligibility criteria by reusing applicable classes from well-known clinical temporal knowledge representations. We used EliXR-TIME to analyze a training set of 50 new temporal eligibility criteria. We evaluated EliXR-TIME using an additional random sample of 20 eligibility criteria with temporal expressions that have no overlap with the training data, yielding 92.7% (76 / 82) inter-coder agreement on sentence chunking and 72% (72 / 100) agreement on semantic annotation. We conclude that this knowledge representation can facilitate semantic annotation of the temporal expressions in eligibility criteria.
Semantic markup of sensor capabilities: how simple it too simple?
NASA Astrophysics Data System (ADS)
Rueda-Velasquez, C. A.; Janowicz, K.; Fredericks, J.
2016-12-01
Semantics plays a key role for the publication, retrieval, integration, and reuse of observational data across the geosciences. In most cases, one can safely assume that the providers of such data, e.g., individual scientists, understand the observation context in which their data are collected,e.g., the used observation procedure, the sampling strategy, the feature of interest being studied, and so forth. However, can we expect that the same is true for the technical details of the used sensors and especially the nuanced changes that can impact observations in often unpredictable ways? Should the burden of annotating the sensor capabilities, firmware, operation ranges, and so forth be really part of a scientist's responsibility? Ideally, semantic annotations should be provided by the parties that understand these details and have a vested interest in maintaining these data. With manufactures providing semantically-enabled metadata for their sensors and instruments, observations could more easily be annotated and thereby enriched using this information. Unfortunately, today's sensor ontologies and tool chains developed for the Semantic Web community require expertise beyond the knowledge and interest of most manufacturers. Consequently, knowledge engineers need to better understand the sweet spot between simple ontologies/vocabularies and sufficient expressivity as well as the tools required to enable manufacturers to share data about their sensors. Here, we report on the current results of EarthCube's X-Domes project that aims to address the questions outlined above.
Semantic Annotation of Computational Components
NASA Technical Reports Server (NTRS)
Vanderbilt, Peter; Mehrotra, Piyush
2004-01-01
This paper describes a methodology to specify machine-processable semantic descriptions of computational components to enable them to be shared and reused. A particular focus of this scheme is to enable automatic compositon of such components into simple work-flows.
NoGOA: predicting noisy GO annotations using evidences and sparse representation.
Yu, Guoxian; Lu, Chang; Wang, Jun
2017-07-21
Gene Ontology (GO) is a community effort to represent functional features of gene products. GO annotations (GOA) provide functional associations between GO terms and gene products. Due to resources limitation, only a small portion of annotations are manually checked by curators, and the others are electronically inferred. Although quality control techniques have been applied to ensure the quality of annotations, the community consistently report that there are still considerable noisy (or incorrect) annotations. Given the wide application of annotations, however, how to identify noisy annotations is an important but yet seldom studied open problem. We introduce a novel approach called NoGOA to predict noisy annotations. NoGOA applies sparse representation on the gene-term association matrix to reduce the impact of noisy annotations, and takes advantage of sparse representation coefficients to measure the semantic similarity between genes. Secondly, it preliminarily predicts noisy annotations of a gene based on aggregated votes from semantic neighborhood genes of that gene. Next, NoGOA estimates the ratio of noisy annotations for each evidence code based on direct annotations in GOA files archived on different periods, and then weights entries of the association matrix via estimated ratios and propagates weights to ancestors of direct annotations using GO hierarchy. Finally, it integrates evidence-weighted association matrix and aggregated votes to predict noisy annotations. Experiments on archived GOA files of six model species (H. sapiens, A. thaliana, S. cerevisiae, G. gallus, B. Taurus and M. musculus) demonstrate that NoGOA achieves significantly better results than other related methods and removing noisy annotations improves the performance of gene function prediction. The comparative study justifies the effectiveness of integrating evidence codes with sparse representation for predicting noisy GO annotations. Codes and datasets are available at http://mlda.swu.edu.cn/codes.php?name=NoGOA .
Transformation of an uncertain video search pipeline to a sketch-based visual analytics loop.
Legg, Philip A; Chung, David H S; Parry, Matthew L; Bown, Rhodri; Jones, Mark W; Griffiths, Iwan W; Chen, Min
2013-12-01
Traditional sketch-based image or video search systems rely on machine learning concepts as their core technology. However, in many applications, machine learning alone is impractical since videos may not be semantically annotated sufficiently, there may be a lack of suitable training data, and the search requirements of the user may frequently change for different tasks. In this work, we develop a visual analytics systems that overcomes the shortcomings of the traditional approach. We make use of a sketch-based interface to enable users to specify search requirement in a flexible manner without depending on semantic annotation. We employ active machine learning to train different analytical models for different types of search requirements. We use visualization to facilitate knowledge discovery at the different stages of visual analytics. This includes visualizing the parameter space of the trained model, visualizing the search space to support interactive browsing, visualizing candidature search results to support rapid interaction for active learning while minimizing watching videos, and visualizing aggregated information of the search results. We demonstrate the system for searching spatiotemporal attributes from sports video to identify key instances of the team and player performance.
A Linked Data-Based Collaborative Annotation System for Increasing Learning Achievements
ERIC Educational Resources Information Center
Zarzour, Hafed; Sellami, Mokhtar
2017-01-01
With the emergence of the Web 2.0, collaborative annotation practices have become more mature in the field of learning. In this context, several recent studies have shown the powerful effects of the integration of annotation mechanism in learning process. However, most of these studies provide poor support for semantically structured resources,…
Ellouze, Afef Samet; Bouaziz, Rafik; Ghorbel, Hanen
2016-10-01
Integrating semantic dimension into clinical archetypes is necessary once modeling medical records. First, it enables semantic interoperability and, it offers applying semantic activities on clinical data and provides a higher design quality of Electronic Medical Record (EMR) systems. However, to obtain these advantages, designers need to use archetypes that cover semantic features of clinical concepts involved in their specific applications. In fact, most of archetypes filed within open repositories are expressed in the Archetype Definition Language (ALD) which allows defining only the syntactic structure of clinical concepts weakening semantic activities on the EMR content in the semantic web environment. This paper focuses on the modeling of an EMR prototype for infants affected by Cerebral Palsy (CP), using the dual model approach and integrating semantic web technologies. Such a modeling provides a better delivery of quality of care and ensures semantic interoperability between all involved therapies' information systems. First, data to be documented are identified and collected from the involved therapies. Subsequently, data are analyzed and arranged into archetypes expressed in accordance of ADL. During this step, open archetype repositories are explored, in order to find the suitable archetypes. Then, ADL archetypes are transformed into archetypes expressed in OWL-DL (Ontology Web Language - Description Language). Finally, we construct an ontological source related to these archetypes enabling hence their annotation to facilitate data extraction and providing possibility to exercise semantic activities on such archetypes. Semantic dimension integration into EMR modeled in accordance to the archetype approach. The feasibility of our solution is shown through the development of a prototype, baptized "CP-SMS", which ensures semantic exploitation of CP EMR. This prototype provides the following features: (i) creation of CP EMR instances and their checking by using a knowledge base which we have constructed by interviews with domain experts, (ii) translation of initially CP ADL archetypes into CP OWL-DL archetypes, (iii) creation of an ontological source which we can use to annotate obtained archetypes and (vi) enrichment and supply of the ontological source and integration of semantic relations by providing hence fueling the ontology with new concepts, ensuring consistency and eliminating ambiguity between concepts. The degree of semantic interoperability that could be reached between EMR systems depends strongly on the quality of the used archetypes. Thus, the integration of semantic dimension in archetypes modeling process is crucial. By creating an ontological source and annotating archetypes, we create a supportive platform ensuring semantic interoperability between archetypes-based EMR-systems. Copyright © 2016. Published by Elsevier Inc.
The development of non-coding RNA ontology.
Huang, Jingshan; Eilbeck, Karen; Smith, Barry; Blake, Judith A; Dou, Dejing; Huang, Weili; Natale, Darren A; Ruttenberg, Alan; Huan, Jun; Zimmermann, Michael T; Jiang, Guoqian; Lin, Yu; Wu, Bin; Strachan, Harrison J; de Silva, Nisansa; Kasukurthi, Mohan Vamsi; Jha, Vikash Kumar; He, Yongqun; Zhang, Shaojie; Wang, Xiaowei; Liu, Zixing; Borchert, Glen M; Tan, Ming
2016-01-01
Identification of non-coding RNAs (ncRNAs) has been significantly improved over the past decade. On the other hand, semantic annotation of ncRNA data is facing critical challenges due to the lack of a comprehensive ontology to serve as common data elements and data exchange standards in the field. We developed the Non-Coding RNA Ontology (NCRO) to handle this situation. By providing a formally defined ncRNA controlled vocabulary, the NCRO aims to fill a specific and highly needed niche in semantic annotation of large amounts of ncRNA biological and clinical data.
Measurement of tag confidence in user generated contents retrieval
NASA Astrophysics Data System (ADS)
Lee, Sihyoung; Min, Hyun-Seok; Lee, Young Bok; Ro, Yong Man
2009-01-01
As online image sharing services are becoming popular, the importance of correctly annotated tags is being emphasized for precise search and retrieval. Tags created by user along with user-generated contents (UGC) are often ambiguous due to the fact that some tags are highly subjective and visually unrelated to the image. They cause unwanted results to users when image search engines rely on tags. In this paper, we propose a method of measuring tag confidence so that one can differentiate confidence tags from noisy tags. The proposed tag confidence is measured from visual semantics of the image. To verify the usefulness of the proposed method, experiments were performed with UGC database from social network sites. Experimental results showed that the image retrieval performance with confidence tags was increased.
Practical solutions to implementing "Born Semantic" data systems
NASA Astrophysics Data System (ADS)
Leadbetter, A.; Buck, J. J. H.; Stacey, P.
2015-12-01
The concept of data being "Born Semantic" has been proposed in recent years as a Semantic Web analogue to the idea of data being "born digital"[1], [2]. Within the "Born Semantic" concept, data are captured digitally and at a point close to the time of creation are annotated with markup terms from semantic web resources (controlled vocabularies, thesauri or ontologies). This allows heterogeneous data to be more easily ingested and amalgamated in near real-time due to the standards compliant annotation of the data. In taking the "Born Semantic" proposal from concept to operation, a number of difficulties have been encountered. For example, although there are recognised methods such as Header, Dictionary, Triples [3] for the compression, publication and dissemination of large volumes of triples these systems are not practical to deploy in the field on low-powered (both electrically and computationally) devices. Similarly, it is not practical for instruments to output fully formed semantically annotated data files if they are designed to be plugged into a modular system and the data to be centrally logged in the field as is the case on Argo floats and oceanographic gliders where internal bandwidth becomes an issue [2]. In light of these issues, this presentation will concentrate on pragmatic solutions being developed to the problem of generating Linked Data in near real-time systems. Specific examples from the European Commission SenseOCEAN project where Linked Data systems are being developed for autonomous underwater platforms, and from work being undertaken in the streaming of data from the Irish Galway Bay Cable Observatory initiative will be highlighted. Further, developments of a set of tools for the LogStash-ElasticSearch software ecosystem to allow the storing and retrieval of Linked Data will be introduced. References[1] A. Leadbetter & J. Fredericks, We have "born digital" - now what about "born semantic"?, European Geophysical Union General Assembly, 2014.[2] J. Buck & A. Leadbetter, Born semantic: linking data from sensors to users and balancing hardware limitations with data standards, European Geophysical Union General Assembly, 2015.[3] J. Fernandez et al., Binary RDF Representation for Publication and Exchange (HDT), Web Semantics 19:22-41, 2013.
Peng, Jiajie; Zhang, Xuanshuo; Hui, Weiwei; Lu, Junya; Li, Qianqian; Liu, Shuhui; Shang, Xuequn
2018-03-19
Gene Ontology (GO) is one of the most popular bioinformatics resources. In the past decade, Gene Ontology-based gene semantic similarity has been effectively used to model gene-to-gene interactions in multiple research areas. However, most existing semantic similarity approaches rely only on GO annotations and structure, or incorporate only local interactions in the co-functional network. This may lead to inaccurate GO-based similarity resulting from the incomplete GO topology structure and gene annotations. We present NETSIM2, a new network-based method that allows researchers to measure GO-based gene functional similarities by considering the global structure of the co-functional network with a random walk with restart (RWR)-based method, and by selecting the significant term pairs to decrease the noise information. Based on the EC number (Enzyme Commission)-based groups of yeast and Arabidopsis, evaluation test shows that NETSIM2 can enhance the accuracy of Gene Ontology-based gene functional similarity. Using NETSIM2 as an example, we found that the accuracy of semantic similarities can be significantly improved after effectively incorporating the global gene-to-gene interactions in the co-functional network, especially on the species that gene annotations in GO are far from complete.
Improving integrative searching of systems chemical biology data using semantic annotation.
Chen, Bin; Ding, Ying; Wild, David J
2012-03-08
Systems chemical biology and chemogenomics are considered critical, integrative disciplines in modern biomedical research, but require data mining of large, integrated, heterogeneous datasets from chemistry and biology. We previously developed an RDF-based resource called Chem2Bio2RDF that enabled querying of such data using the SPARQL query language. Whilst this work has proved useful in its own right as one of the first major resources in these disciplines, its utility could be greatly improved by the application of an ontology for annotation of the nodes and edges in the RDF graph, enabling a much richer range of semantic queries to be issued. We developed a generalized chemogenomics and systems chemical biology OWL ontology called Chem2Bio2OWL that describes the semantics of chemical compounds, drugs, protein targets, pathways, genes, diseases and side-effects, and the relationships between them. The ontology also includes data provenance. We used it to annotate our Chem2Bio2RDF dataset, making it a rich semantic resource. Through a series of scientific case studies we demonstrate how this (i) simplifies the process of building SPARQL queries, (ii) enables useful new kinds of queries on the data and (iii) makes possible intelligent reasoning and semantic graph mining in chemogenomics and systems chemical biology. Chem2Bio2OWL is available at http://chem2bio2rdf.org/owl. The document is available at http://chem2bio2owl.wikispaces.com.
Annotations of Mexican bullfighting videos for semantic index
NASA Astrophysics Data System (ADS)
Montoya Obeso, Abraham; Oropesa Morales, Lester Arturo; Fernando Vázquez, Luis; Cocolán Almeda, Sara Ivonne; Stoian, Andrei; García Vázquez, Mireya Saraí; Zamudio Fuentes, Luis Miguel; Montiel Perez, Jesús Yalja; de la O Torres, Saul; Ramírez Acosta, Alejandro Alvaro
2015-09-01
The video annotation is important for web indexing and browsing systems. Indeed, in order to evaluate the performance of video query and mining techniques, databases with concept annotations are required. Therefore, it is necessary generate a database with a semantic indexing that represents the digital content of the Mexican bullfighting atmosphere. This paper proposes a scheme to make complex annotations in a video in the frame of multimedia search engine project. Each video is partitioned using our segmentation algorithm that creates shots of different length and different number of frames. In order to make complex annotations about the video, we use ELAN software. The annotations are done in two steps: First, we take note about the whole content in each shot. Second, we describe the actions as parameters of the camera like direction, position and deepness. As a consequence, we obtain a more complete descriptor of every action. In both cases we use the concepts of the TRECVid 2014 dataset. We also propose new concepts. This methodology allows to generate a database with the necessary information to create descriptors and algorithms capable to detect actions to automatically index and classify new bullfighting multimedia content.
CUILESS2016: a clinical corpus applying compositional normalization of text mentions.
Osborne, John D; Neu, Matthew B; Danila, Maria I; Solorio, Thamar; Bethard, Steven J
2018-01-10
Traditionally text mention normalization corpora have normalized concepts to single ontology identifiers ("pre-coordinated concepts"). Less frequently, normalization corpora have used concepts with multiple identifiers ("post-coordinated concepts") but the additional identifiers have been restricted to a defined set of relationships to the core concept. This approach limits the ability of the normalization process to express semantic meaning. We generated a freely available corpus using post-coordinated concepts without a defined set of relationships that we term "compositional concepts" to evaluate their use in clinical text. We annotated 5397 disorder mentions from the ShARe corpus to SNOMED CT that were previously normalized as "CUI-less" in the "SemEval-2015 Task 14" shared task because they lacked a pre-coordinated mapping. Unlike the previous normalization method, we do not restrict concept mappings to a particular set of the Unified Medical Language System (UMLS) semantic types and allow normalization to occur to multiple UMLS Concept Unique Identifiers (CUIs). We computed annotator agreement and assessed semantic coverage with this method. We generated the largest clinical text normalization corpus to date with mappings to multiple identifiers and made it freely available. All but 8 of the 5397 disorder mentions were normalized using this methodology. Annotator agreement ranged from 52.4% using the strictest metric (exact matching) to 78.2% using a hierarchical agreement that measures the overlap of shared ancestral nodes. Our results provide evidence that compositional concepts can increase semantic coverage in clinical text. To our knowledge we provide the first freely available corpus of compositional concept annotation in clinical text.
Identifying image preferences based on demographic attributes
NASA Astrophysics Data System (ADS)
Fedorovskaya, Elena A.; Lawrence, Daniel R.
2014-02-01
The intent of this study is to determine what sorts of images are considered more interesting by which demographic groups. Specifically, we attempt to identify images whose interestingness ratings are influenced by the demographic attribute of the viewer's gender. To that end, we use the data from an experiment where 18 participants (9 women and 9 men) rated several hundred images based on "visual interest" or preferences in viewing images. The images were selected to represent the consumer "photo-space" - typical categories of subject matter found in consumer photo collections. They were annotated using perceptual and semantic descriptors. In analyzing the image interestingness ratings, we apply a multivariate procedure known as forced classification, a feature of dual scaling, a discrete analogue of principal components analysis (similar to correspondence analysis). This particular analysis of ratings (i.e., ordered-choice or Likert) data enables the investigator to emphasize the effect of a specific item or collection of items. We focus on the influence of the demographic item of gender on the analysis, so that the solutions are essentially confined to subspaces spanned by the emphasized item. Using this technique, we can know definitively which images' ratings have been influenced by the demographic item of choice. Subsequently, images can be evaluated and linked, on one hand, to their perceptual and semantic descriptors, and, on the other hand, to the preferences associated with viewers' demographic attributes.
Gestural cue analysis in automated semantic miscommunication annotation
Inoue, Masashi; Ogihara, Mitsunori; Hanada, Ryoko; Furuyama, Nobuhiro
2011-01-01
The automated annotation of conversational video by semantic miscommunication labels is a challenging topic. Although miscommunications are often obvious to the speakers as well as the observers, it is difficult for machines to detect them from the low-level features. We investigate the utility of gestural cues in this paper among various non-verbal features. Compared with gesture recognition tasks in human-computer interaction, this process is difficult due to the lack of understanding on which cues contribute to miscommunications and the implicitness of gestures. Nine simple gestural features are taken from gesture data, and both simple and complex classifiers are constructed using machine learning. The experimental results suggest that there is no single gestural feature that can predict or explain the occurrence of semantic miscommunication in our setting. PMID:23585724
Enriched Video Semantic Metadata: Authorization, Integration, and Presentation.
ERIC Educational Resources Information Center
Mu, Xiangming; Marchionini, Gary
2003-01-01
Presents an enriched video metadata framework including video authorization using the Video Annotation and Summarization Tool (VAST)-a video metadata authorization system that integrates both semantic and visual metadata-- metadata integration, and user level applications. Results demonstrated that the enriched metadata were seamlessly…
BIOSSES: a semantic sentence similarity estimation system for the biomedical domain.
Sogancioglu, Gizem; Öztürk, Hakime; Özgür, Arzucan
2017-07-15
The amount of information available in textual format is rapidly increasing in the biomedical domain. Therefore, natural language processing (NLP) applications are becoming increasingly important to facilitate the retrieval and analysis of these data. Computing the semantic similarity between sentences is an important component in many NLP tasks including text retrieval and summarization. A number of approaches have been proposed for semantic sentence similarity estimation for generic English. However, our experiments showed that such approaches do not effectively cover biomedical knowledge and produce poor results for biomedical text. We propose several approaches for sentence-level semantic similarity computation in the biomedical domain, including string similarity measures and measures based on the distributed vector representations of sentences learned in an unsupervised manner from a large biomedical corpus. In addition, ontology-based approaches are presented that utilize general and domain-specific ontologies. Finally, a supervised regression based model is developed that effectively combines the different similarity computation metrics. A benchmark data set consisting of 100 sentence pairs from the biomedical literature is manually annotated by five human experts and used for evaluating the proposed methods. The experiments showed that the supervised semantic sentence similarity computation approach obtained the best performance (0.836 correlation with gold standard human annotations) and improved over the state-of-the-art domain-independent systems up to 42.6% in terms of the Pearson correlation metric. A web-based system for biomedical semantic sentence similarity computation, the source code, and the annotated benchmark data set are available at: http://tabilab.cmpe.boun.edu.tr/BIOSSES/ . gizemsogancioglu@gmail.com or arzucan.ozgur@boun.edu.tr. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Samwald, Matthias; Lim, Ernest; Masiar, Peter; Marenco, Luis; Chen, Huajun; Morse, Thomas; Mutalik, Pradeep; Shepherd, Gordon; Miller, Perry; Cheung, Kei-Hoi
2013-01-01
The amount of biomedical data available in Semantic Web formats has been rapidly growing in recent years. While these formats are machine-friendly, user-friendly web interfaces allowing easy querying of these data are typically lacking. We present “Entrez Neuron”, a pilot neuron-centric interface that allows for keyword-based queries against a coherent repository of OWL ontologies. These ontologies describe neuronal structures, physiology, mathematical models and microscopy images. The returned query results are organized hierarchically according to brain architecture. Where possible, the application makes use of entities from the Open Biomedical Ontologies (OBO) and the ‘HCLS knowledgebase’ developed by the W3C Interest Group for Health Care and Life Science. It makes use of the emerging RDFa standard to embed ontology fragments and semantic annotations within its HTML-based user interface. The application and underlying ontologies demonstrates how Semantic Web technologies can be used for information integration within a curated information repository and between curated information repositories. It also demonstrates how information integration can be accomplished on the client side, through simple copying and pasting of portions of documents that contain RDFa markup. PMID:19745321
Constructive Ontology Engineering
ERIC Educational Resources Information Center
Sousan, William L.
2010-01-01
The proliferation of the Semantic Web depends on ontologies for knowledge sharing, semantic annotation, data fusion, and descriptions of data for machine interpretation. However, ontologies are difficult to create and maintain. In addition, their structure and content may vary depending on the application and domain. Several methods described in…
Driving Under the Influence (of Language).
Barrett, Daniel Paul; Bronikowski, Scott Alan; Yu, Haonan; Siskind, Jeffrey Mark
2017-06-09
We present a unified framework which supports grounding natural-language semantics in robotic driving. This framework supports acquisition (learning grounded meanings of nouns and prepositions from human sentential annotation of robotic driving paths), generation (using such acquired meanings to generate sentential description of new robotic driving paths), and comprehension (using such acquired meanings to support automated driving to accomplish navigational goals specified in natural language). We evaluate the performance of these three tasks by having independent human judges rate the semantic fidelity of the sentences associated with paths. Overall, machine performance is 74.9%, while the performance of human annotators is 83.8%.
Progress in The Semantic Analysis of Scientific Code
NASA Technical Reports Server (NTRS)
Stewart, Mark
2000-01-01
This paper concerns a procedure that analyzes aspects of the meaning or semantics of scientific and engineering code. This procedure involves taking a user's existing code, adding semantic declarations for some primitive variables, and parsing this annotated code using multiple, independent expert parsers. These semantic parsers encode domain knowledge and recognize formulae in different disciplines including physics, numerical methods, mathematics, and geometry. The parsers will automatically recognize and document some static, semantic concepts and help locate some program semantic errors. These techniques may apply to a wider range of scientific codes. If so, the techniques could reduce the time, risk, and effort required to develop and modify scientific codes.
An Experiment in Scientific Code Semantic Analysis
NASA Technical Reports Server (NTRS)
Stewart, Mark E. M.
1998-01-01
This paper concerns a procedure that analyzes aspects of the meaning or semantics of scientific and engineering code. This procedure involves taking a user's existing code, adding semantic declarations for some primitive variables, and parsing this annotated code using multiple, distributed expert parsers. These semantic parser are designed to recognize formulae in different disciplines including physical and mathematical formulae and geometrical position in a numerical scheme. The parsers will automatically recognize and document some static, semantic concepts and locate some program semantic errors. Results are shown for a subroutine test case and a collection of combustion code routines. This ability to locate some semantic errors and document semantic concepts in scientific and engineering code should reduce the time, risk, and effort of developing and using these codes.
Incorporating Semantics into Data Driven Workflows for Content Based Analysis
NASA Astrophysics Data System (ADS)
Argüello, M.; Fernandez-Prieto, M. J.
Finding meaningful associations between text elements and knowledge structures within clinical narratives in a highly verbal domain, such as psychiatry, is a challenging goal. The research presented here uses a small corpus of case histories and brings into play pre-existing knowledge, and therefore, complements other approaches that use large corpus (millions of words) and no pre-existing knowledge. The paper describes a variety of experiments for content-based analysis: Linguistic Analysis using NLP-oriented approaches, Sentiment Analysis, and Semantically Meaningful Analysis. Although it is not standard practice, the paper advocates providing automatic support to annotate the functionality as well as the data for each experiment by performing semantic annotation that uses OWL and OWL-S. Lessons learnt can be transmitted to legacy clinical databases facing the conversion of clinical narratives according to prominent Electronic Health Records standards.
DynGO: a tool for visualizing and mining of Gene Ontology and its associations
Liu, Hongfang; Hu, Zhang-Zhi; Wu, Cathy H
2005-01-01
Background A large volume of data and information about genes and gene products has been stored in various molecular biology databases. A major challenge for knowledge discovery using these databases is to identify related genes and gene products in disparate databases. The development of Gene Ontology (GO) as a common vocabulary for annotation allows integrated queries across multiple databases and identification of semantically related genes and gene products (i.e., genes and gene products that have similar GO annotations). Meanwhile, dozens of tools have been developed for browsing, mining or editing GO terms, their hierarchical relationships, or their "associated" genes and gene products (i.e., genes and gene products annotated with GO terms). Tools that allow users to directly search and inspect relations among all GO terms and their associated genes and gene products from multiple databases are needed. Results We present a standalone package called DynGO, which provides several advanced functionalities in addition to the standard browsing capability of the official GO browsing tool (AmiGO). DynGO allows users to conduct batch retrieval of GO annotations for a list of genes and gene products, and semantic retrieval of genes and gene products sharing similar GO annotations. The result are shown in an association tree organized according to GO hierarchies and supported with many dynamic display options such as sorting tree nodes or changing orientation of the tree. For GO curators and frequent GO users, DynGO provides fast and convenient access to GO annotation data. DynGO is generally applicable to any data set where the records are annotated with GO terms, as illustrated by two examples. Conclusion We have presented a standalone package DynGO that provides functionalities to search and browse GO and its association databases as well as several additional functions such as batch retrieval and semantic retrieval. The complete documentation and software are freely available for download from the website . PMID:16091147
A semantic medical multimedia retrieval approach using ontology information hiding.
Guo, Kehua; Zhang, Shigeng
2013-01-01
Searching useful information from unstructured medical multimedia data has been a difficult problem in information retrieval. This paper reports an effective semantic medical multimedia retrieval approach which can reflect the users' query intent. Firstly, semantic annotations will be given to the multimedia documents in the medical multimedia database. Secondly, the ontology that represented semantic information will be hidden in the head of the multimedia documents. The main innovations of this approach are cross-type retrieval support and semantic information preservation. Experimental results indicate a good precision and efficiency of our approach for medical multimedia retrieval in comparison with some traditional approaches.
Exploring context and content links in social media: a latent space method.
Qi, Guo-Jun; Aggarwal, Charu; Tian, Qi; Ji, Heng; Huang, Thomas S
2012-05-01
Social media networks contain both content and context-specific information. Most existing methods work with either of the two for the purpose of multimedia mining and retrieval. In reality, both content and context information are rich sources of information for mining, and the full power of mining and processing algorithms can be realized only with the use of a combination of the two. This paper proposes a new algorithm which mines both context and content links in social media networks to discover the underlying latent semantic space. This mapping of the multimedia objects into latent feature vectors enables the use of any off-the-shelf multimedia retrieval algorithms. Compared to the state-of-the-art latent methods in multimedia analysis, this algorithm effectively solves the problem of sparse context links by mining the geometric structure underlying the content links between multimedia objects. Specifically for multimedia annotation, we show that an effective algorithm can be developed to directly construct annotation models by simultaneously leveraging both context and content information based on latent structure between correlated semantic concepts. We conduct experiments on the Flickr data set, which contains user tags linked with images. We illustrate the advantages of our approach over the state-of-the-art multimedia retrieval techniques.
Desiderata for ontologies to be used in semantic annotation of biomedical documents.
Bada, Michael; Hunter, Lawrence
2011-02-01
A wealth of knowledge valuable to the translational research scientist is contained within the vast biomedical literature, but this knowledge is typically in the form of natural language. Sophisticated natural-language-processing systems are needed to translate text into unambiguous formal representations grounded in high-quality consensus ontologies, and these systems in turn rely on gold-standard corpora of annotated documents for training and testing. To this end, we are constructing the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of 97 full-text biomedical journal articles that are being manually annotated with the entire sets of terms from select vocabularies, predominantly from the Open Biomedical Ontologies (OBO) library. Our efforts in building this corpus has illuminated infelicities of these ontologies with respect to the semantic annotation of biomedical documents, and we propose desiderata whose implementation could substantially improve their utility in this task; these include the integration of overlapping terms across OBOs, the resolution of OBO-specific ambiguities, the integration of the BFO with the OBOs and the use of mid-level ontologies, the inclusion of noncanonical instances, and the expansion of relations and realizable entities. Copyright © 2010 Elsevier Inc. All rights reserved.
Misirli, Goksel; Cavaliere, Matteo; Waites, William; Pocock, Matthew; Madsen, Curtis; Gilfellon, Owen; Honorato-Zimmer, Ricardo; Zuliani, Paolo; Danos, Vincent; Wipat, Anil
2016-03-15
Biological systems are complex and challenging to model and therefore model reuse is highly desirable. To promote model reuse, models should include both information about the specifics of simulations and the underlying biology in the form of metadata. The availability of computationally tractable metadata is especially important for the effective automated interpretation and processing of models. Metadata are typically represented as machine-readable annotations which enhance programmatic access to information about models. Rule-based languages have emerged as a modelling framework to represent the complexity of biological systems. Annotation approaches have been widely used for reaction-based formalisms such as SBML. However, rule-based languages still lack a rich annotation framework to add semantic information, such as machine-readable descriptions, to the components of a model. We present an annotation framework and guidelines for annotating rule-based models, encoded in the commonly used Kappa and BioNetGen languages. We adapt widely adopted annotation approaches to rule-based models. We initially propose a syntax to store machine-readable annotations and describe a mapping between rule-based modelling entities, such as agents and rules, and their annotations. We then describe an ontology to both annotate these models and capture the information contained therein, and demonstrate annotating these models using examples. Finally, we present a proof of concept tool for extracting annotations from a model that can be queried and analyzed in a uniform way. The uniform representation of the annotations can be used to facilitate the creation, analysis, reuse and visualization of rule-based models. Although examples are given, using specific implementations the proposed techniques can be applied to rule-based models in general. The annotation ontology for rule-based models can be found at http://purl.org/rbm/rbmo The krdf tool and associated executable examples are available at http://purl.org/rbm/rbmo/krdf anil.wipat@newcastle.ac.uk or vdanos@inf.ed.ac.uk. © The Author 2015. Published by Oxford University Press.
A Semantic Medical Multimedia Retrieval Approach Using Ontology Information Hiding
Guo, Kehua; Zhang, Shigeng
2013-01-01
Searching useful information from unstructured medical multimedia data has been a difficult problem in information retrieval. This paper reports an effective semantic medical multimedia retrieval approach which can reflect the users' query intent. Firstly, semantic annotations will be given to the multimedia documents in the medical multimedia database. Secondly, the ontology that represented semantic information will be hidden in the head of the multimedia documents. The main innovations of this approach are cross-type retrieval support and semantic information preservation. Experimental results indicate a good precision and efficiency of our approach for medical multimedia retrieval in comparison with some traditional approaches. PMID:24082915
NASA Astrophysics Data System (ADS)
Brambilla, Marco; Ceri, Stefano; Valle, Emanuele Della; Facca, Federico M.; Tziviskou, Christina
Although Semantic Web Services are expected to produce a revolution in the development of Web-based systems, very few enterprise-wide design experiences are available; one of the main reasons is the lack of sound Software Engineering methods and tools for the deployment of Semantic Web applications. In this chapter, we present an approach to software development for the Semantic Web based on classical Software Engineering methods (i.e., formal business process development, computer-aided and component-based software design, and automatic code generation) and on semantic methods and tools (i.e., ontology engineering, semantic service annotation and discovery).
2013-01-01
Background Open metadata registries are a fundamental tool for researchers in the Life Sciences trying to locate resources. While most current registries assume that resources are annotated with well-structured metadata, evidence shows that most of the resource annotations simply consists of informal free text. This reality must be taken into account in order to develop effective techniques for resource discovery in Life Sciences. Results BioUSeR is a semantic-based tool aimed at retrieving Life Sciences resources described in free text. The retrieval process is driven by the user requirements, which consist of a target task and a set of facets of interest, both expressed in free text. BioUSeR is able to effectively exploit the available textual descriptions to find relevant resources by using semantic-aware techniques. Conclusions BioUSeR overcomes the limitations of the current registries thanks to: (i) rich specification of user information needs, (ii) use of semantics to manage textual descriptions, (iii) retrieval and ranking of resources based on user requirements. PMID:23635042
Culto: AN Ontology-Based Annotation Tool for Data Curation in Cultural Heritage
NASA Astrophysics Data System (ADS)
Garozzo, R.; Murabito, F.; Santagati, C.; Pino, C.; Spampinato, C.
2017-08-01
This paper proposes CulTO, a software tool relying on a computational ontology for Cultural Heritage domain modelling, with a specific focus on religious historical buildings, for supporting cultural heritage experts in their investigations. It is specifically thought to support annotation, automatic indexing, classification and curation of photographic data and text documents of historical buildings. CULTO also serves as a useful tool for Historical Building Information Modeling (H-BIM) by enabling semantic 3D data modeling and further enrichment with non-geometrical information of historical buildings through the inclusion of new concepts about historical documents, images, decay or deformation evidence as well as decorative elements into BIM platforms. CulTO is the result of a joint research effort between the Laboratory of Surveying and Architectural Photogrammetry "Luigi Andreozzi" and the PeRCeiVe Lab (Pattern Recognition and Computer Vision Lab) of the University of Catania,
Hasan, Mehedi; Kotov, Alexander; Carcone, April; Dong, Ming; Naar, Sylvie; Hartlieb, Kathryn Brogan
2016-08-01
This study examines the effectiveness of state-of-the-art supervised machine learning methods in conjunction with different feature types for the task of automatic annotation of fragments of clinical text based on codebooks with a large number of categories. We used a collection of motivational interview transcripts consisting of 11,353 utterances, which were manually annotated by two human coders as the gold standard, and experimented with state-of-art classifiers, including Naïve Bayes, J48 Decision Tree, Support Vector Machine (SVM), Random Forest (RF), AdaBoost, DiscLDA, Conditional Random Fields (CRF) and Convolutional Neural Network (CNN) in conjunction with lexical, contextual (label of the previous utterance) and semantic (distribution of words in the utterance across the Linguistic Inquiry and Word Count dictionaries) features. We found out that, when the number of classes is large, the performance of CNN and CRF is inferior to SVM. When only lexical features were used, interview transcripts were automatically annotated by SVM with the highest classification accuracy among all classifiers of 70.8%, 61% and 53.7% based on the codebooks consisting of 17, 20 and 41 codes, respectively. Using contextual and semantic features, as well as their combination, in addition to lexical ones, improved the accuracy of SVM for annotation of utterances in motivational interview transcripts with a codebook consisting of 17 classes to 71.5%, 74.2%, and 75.1%, respectively. Our results demonstrate the potential of using machine learning methods in conjunction with lexical, semantic and contextual features for automatic annotation of clinical interview transcripts with near-human accuracy. Copyright © 2016 Elsevier Inc. All rights reserved.
Next Generation Models for Storage and Representation of Microbial Biological Annotation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Quest, Daniel J; Land, Miriam L; Brettin, Thomas S
2010-01-01
Background Traditional genome annotation systems were developed in a very different computing era, one where the World Wide Web was just emerging. Consequently, these systems are built as centralized black boxes focused on generating high quality annotation submissions to GenBank/EMBL supported by expert manual curation. The exponential growth of sequence data drives a growing need for increasingly higher quality and automatically generated annotation. Typical annotation pipelines utilize traditional database technologies, clustered computing resources, Perl, C, and UNIX file systems to process raw sequence data, identify genes, and predict and categorize gene function. These technologies tightly couple the annotation software systemmore » to hardware and third party software (e.g. relational database systems and schemas). This makes annotation systems hard to reproduce, inflexible to modification over time, difficult to assess, difficult to partition across multiple geographic sites, and difficult to understand for those who are not domain experts. These systems are not readily open to scrutiny and therefore not scientifically tractable. The advent of Semantic Web standards such as Resource Description Framework (RDF) and OWL Web Ontology Language (OWL) enables us to construct systems that address these challenges in a new comprehensive way. Results Here, we develop a framework for linking traditional data to OWL-based ontologies in genome annotation. We show how data standards can decouple hardware and third party software tools from annotation pipelines, thereby making annotation pipelines easier to reproduce and assess. An illustrative example shows how TURTLE (Terse RDF Triple Language) can be used as a human readable, but also semantically-aware, equivalent to GenBank/EMBL files. Conclusions The power of this approach lies in its ability to assemble annotation data from multiple databases across multiple locations into a representation that is understandable to researchers. In this way, all researchers, experimental and computational, will more easily understand the informatics processes constructing genome annotation and ultimately be able to help improve the systems that produce them.« less
Management of Dynamic Biomedical Terminologies: Current Status and Future Challenges
Dos Reis, J. C.; Pruski, C.
2015-01-01
Summary Objectives Controlled terminologies and their dependent artefacts provide a consensual understanding of a domain while reducing ambiguities and enabling reasoning. However, the evolution of a domain’s knowledge directly impacts these terminologies and generates inconsistencies in the underlying biomedical information systems. In this article, we review existing work addressing the dynamic aspect of terminologies as well as their effects on mappings and semantic annotations. Methods We investigate approaches related to the identification, characterization and propagation of changes in terminologies, mappings and semantic annotations including techniques to update their content. Results and conclusion Based on the explored issues and existing methods, we outline open research challenges requiring investigation in the near future. PMID:26293859
Accelerating Cancer Systems Biology Research through Semantic Web Technology
Wang, Zhihui; Sagotsky, Jonathan; Taylor, Thomas; Shironoshita, Patrick; Deisboeck, Thomas S.
2012-01-01
Cancer systems biology is an interdisciplinary, rapidly expanding research field in which collaborations are a critical means to advance the field. Yet the prevalent database technologies often isolate data rather than making it easily accessible. The Semantic Web has the potential to help facilitate web-based collaborative cancer research by presenting data in a manner that is self-descriptive, human and machine readable, and easily sharable. We have created a semantically linked online Digital Model Repository (DMR) for storing, managing, executing, annotating, and sharing computational cancer models. Within the DMR, distributed, multidisciplinary, and inter-organizational teams can collaborate on projects, without forfeiting intellectual property. This is achieved by the introduction of a new stakeholder to the collaboration workflow, the institutional licensing officer, part of the Technology Transfer Office. Furthermore, the DMR has achieved silver level compatibility with the National Cancer Institute’s caBIG®, so users can not only interact with the DMR through a web browser but also through a semantically annotated and secure web service. We also discuss the technology behind the DMR leveraging the Semantic Web, ontologies, and grid computing to provide secure inter-institutional collaboration on cancer modeling projects, online grid-based execution of shared models, and the collaboration workflow protecting researchers’ intellectual property. PMID:23188758
Accelerating cancer systems biology research through Semantic Web technology.
Wang, Zhihui; Sagotsky, Jonathan; Taylor, Thomas; Shironoshita, Patrick; Deisboeck, Thomas S
2013-01-01
Cancer systems biology is an interdisciplinary, rapidly expanding research field in which collaborations are a critical means to advance the field. Yet the prevalent database technologies often isolate data rather than making it easily accessible. The Semantic Web has the potential to help facilitate web-based collaborative cancer research by presenting data in a manner that is self-descriptive, human and machine readable, and easily sharable. We have created a semantically linked online Digital Model Repository (DMR) for storing, managing, executing, annotating, and sharing computational cancer models. Within the DMR, distributed, multidisciplinary, and inter-organizational teams can collaborate on projects, without forfeiting intellectual property. This is achieved by the introduction of a new stakeholder to the collaboration workflow, the institutional licensing officer, part of the Technology Transfer Office. Furthermore, the DMR has achieved silver level compatibility with the National Cancer Institute's caBIG, so users can interact with the DMR not only through a web browser but also through a semantically annotated and secure web service. We also discuss the technology behind the DMR leveraging the Semantic Web, ontologies, and grid computing to provide secure inter-institutional collaboration on cancer modeling projects, online grid-based execution of shared models, and the collaboration workflow protecting researchers' intellectual property. Copyright © 2012 Wiley Periodicals, Inc.
Legaz-García, María del Carmen; Martínez-Costa, Catalina; Menárguez-Tortosa, Marcos; Fernández-Breis, Jesualdo Tomás
2012-01-01
Linking Electronic Healthcare Records (EHR) content to educational materials has been considered a key international recommendation to enable clinical engagement and to promote patient safety. This would suggest citizens to access reliable information available on the web and to guide them properly. In this paper, we describe an approach in that direction, based on the use of dual model EHR standards and standardized educational contents. The recommendation method will be based on the semantic coverage of the learning content repository for a particular archetype, which will be calculated by applying semantic web technologies like ontologies and semantic annotations.
SciFlo: Semantically-Enabled Grid Workflow for Collaborative Science
NASA Astrophysics Data System (ADS)
Yunck, T.; Wilson, B. D.; Raskin, R.; Manipon, G.
2005-12-01
SciFlo is a system for Scientific Knowledge Creation on the Grid using a Semantically-Enabled Dataflow Execution Environment. SciFlo leverages Simple Object Access Protocol (SOAP) Web Services and the Grid Computing standards (WS-* standards and the Globus Alliance toolkits), and enables scientists to do multi-instrument Earth Science by assembling reusable SOAP Services, native executables, local command-line scripts, and python codes into a distributed computing flow (a graph of operators). SciFlo's XML dataflow documents can be a mixture of concrete operators (fully bound operations) and abstract template operators (late binding via semantic lookup). All data objects and operators can be both simply typed (simple and complex types in XML schema) and semantically typed using controlled vocabularies (linked to OWL ontologies such as SWEET). By exploiting ontology-enhanced search and inference, one can discover (and automatically invoke) Web Services and operators that have been semantically labeled as performing the desired transformation, and adapt a particular invocation to the proper interface (number, types, and meaning of inputs and outputs). The SciFlo client & server engines optimize the execution of such distributed data flows and allow the user to transparently find and use datasets and operators without worrying about the actual location of the Grid resources. The scientist injects a distributed computation into the Grid by simply filling out an HTML form or directly authoring the underlying XML dataflow document, and results are returned directly to the scientist's desktop. A Visual Programming tool is also being developed, but it is not required. Once an analysis has been specified for a granule or day of data, it can be easily repeated with different control parameters and over months or years of data. SciFlo uses and preserves semantics, and also generates and infers new semantic annotations. Specifically, the SciFlo engine uses semantic metadata to understand (infer) what it is doing and potentially improve the data flow; preserves semantics by saving links to the semantics of (metadata describing) the input datasets, related datasets, and the data transformations (algorithms) used to generate downstream products; generates new metadata by allowing the user to add semantic annotations to the generated data products (or simply accept automatically generated provenance annotations); and infers new semantic metadata by understanding and applying logic to the semantics of the data and the transformations performed. Much ontology development still needs to be done but, nevertheless, SciFlo documents provide a substrate for using and preserving more semantics as ontologies develop. We will give a live demonstration of the growing SciFlo network using an example dataflow in which atmospheric temperature and water vapor profiles from three Earth Observing System (EOS) instruments are retrieved using SOAP (geo-location query & data access) services, co-registered, and visually & statistically compared on demand (see http://sciflo.jpl.nasa.gov for more information).
Keynote Talk: Mining the Web 2.0 for Improved Image Search
NASA Astrophysics Data System (ADS)
Baeza-Yates, Ricardo
There are several semantic sources that can be found in the Web that are either explicit, e.g. Wikipedia, or implicit, e.g. derived from Web usage data. Most of them are related to user generated content (UGC) or what is called today the Web 2.0. In this talk we show how to use these sources of evidence in Flickr, such as tags, visual annotations or clicks, which represent the the wisdom of crowds behind UGC, to improve image search. These results are the work of the multimedia retrieval team at Yahoo! Research Barcelona and they are already being used in Yahoo! image search. This work is part of a larger effort to produce a virtuous data feedback circuit based on the right combination many different technologies to leverage the Web itself.
Determining similarity of scientific entities in annotation datasets
Palma, Guillermo; Vidal, Maria-Esther; Haag, Eric; Raschid, Louiqa; Thor, Andreas
2015-01-01
Linked Open Data initiatives have made available a diversity of scientific collections where scientists have annotated entities in the datasets with controlled vocabulary terms from ontologies. Annotations encode scientific knowledge, which is captured in annotation datasets. Determining relatedness between annotated entities becomes a building block for pattern mining, e.g. identifying drug–drug relationships may depend on the similarity of the targets that interact with each drug. A diversity of similarity measures has been proposed in the literature to compute relatedness between a pair of entities. Each measure exploits some knowledge including the name, function, relationships with other entities, taxonomic neighborhood and semantic knowledge. We propose a novel general-purpose annotation similarity measure called ‘AnnSim’ that measures the relatedness between two entities based on the similarity of their annotations. We model AnnSim as a 1–1 maximum weight bipartite match and exploit properties of existing solvers to provide an efficient solution. We empirically study the performance of AnnSim on real-world datasets of drugs and disease associations from clinical trials and relationships between drugs and (genomic) targets. Using baselines that include a variety of measures, we identify where AnnSim can provide a deeper understanding of the semantics underlying the relatedness of a pair of entities or where it could lead to predicting new links or identifying potential novel patterns. Although AnnSim does not exploit knowledge or properties of a particular domain, its performance compares well with a variety of state-of-the-art domain-specific measures. Database URL: http://www.yeastgenome.org/ PMID:25725057
Determining similarity of scientific entities in annotation datasets.
Palma, Guillermo; Vidal, Maria-Esther; Haag, Eric; Raschid, Louiqa; Thor, Andreas
2015-01-01
Linked Open Data initiatives have made available a diversity of scientific collections where scientists have annotated entities in the datasets with controlled vocabulary terms from ontologies. Annotations encode scientific knowledge, which is captured in annotation datasets. Determining relatedness between annotated entities becomes a building block for pattern mining, e.g. identifying drug-drug relationships may depend on the similarity of the targets that interact with each drug. A diversity of similarity measures has been proposed in the literature to compute relatedness between a pair of entities. Each measure exploits some knowledge including the name, function, relationships with other entities, taxonomic neighborhood and semantic knowledge. We propose a novel general-purpose annotation similarity measure called 'AnnSim' that measures the relatedness between two entities based on the similarity of their annotations. We model AnnSim as a 1-1 maximum weight bipartite match and exploit properties of existing solvers to provide an efficient solution. We empirically study the performance of AnnSim on real-world datasets of drugs and disease associations from clinical trials and relationships between drugs and (genomic) targets. Using baselines that include a variety of measures, we identify where AnnSim can provide a deeper understanding of the semantics underlying the relatedness of a pair of entities or where it could lead to predicting new links or identifying potential novel patterns. Although AnnSim does not exploit knowledge or properties of a particular domain, its performance compares well with a variety of state-of-the-art domain-specific measures. Database URL: http://www.yeastgenome.org/ © The Author(s) 2015. Published by Oxford University Press.
ERIC Educational Resources Information Center
Vrablecová, Petra; Šimko, Marián
2016-01-01
The domain model is an essential part of an adaptive learning system. For each educational course, it involves educational content and semantics, which is also viewed as a form of conceptual metadata about educational content. Due to the size of a domain model, manual domain model creation is a challenging and demanding task for teachers or…
Fast gene ontology based clustering for microarray experiments.
Ovaska, Kristian; Laakso, Marko; Hautaniemi, Sampsa
2008-11-21
Analysis of a microarray experiment often results in a list of hundreds of disease-associated genes. In order to suggest common biological processes and functions for these genes, Gene Ontology annotations with statistical testing are widely used. However, these analyses can produce a very large number of significantly altered biological processes. Thus, it is often challenging to interpret GO results and identify novel testable biological hypotheses. We present fast software for advanced gene annotation using semantic similarity for Gene Ontology terms combined with clustering and heat map visualisation. The methodology allows rapid identification of genes sharing the same Gene Ontology cluster. Our R based semantic similarity open-source package has a speed advantage of over 2000-fold compared to existing implementations. From the resulting hierarchical clustering dendrogram genes sharing a GO term can be identified, and their differences in the gene expression patterns can be seen from the heat map. These methods facilitate advanced annotation of genes resulting from data analysis.
The expansion of chemical-bioassay data in the public domain is a boon to science; however, the difficulty in establishing accurate linkages from CAS registry number (CASRN) to structure, or for properly annotating names and synonyms for a particular structure is well known. DSS...
Sharing and reusing multimedia multilingual educational resources in medicine.
Zdrahal, Zdenek; Knoth, Petr; Mulholland, Paul; Collins, Trevor
2013-01-01
The paper describes the Eurogene portal for sharing and reusing multilingual multimedia educational resources in human genetics. The content is annotated using concepts of two ontologies and a topic hierarchy. The ontology annotation is used to guide search and for calculating semantically similar content. Educational resources can be aggregated into learning packages. The system is in routine use since 2009.
Semantic integration to identify overlapping functional modules in protein interaction networks
Cho, Young-Rae; Hwang, Woochang; Ramanathan, Murali; Zhang, Aidong
2007-01-01
Background The systematic analysis of protein-protein interactions can enable a better understanding of cellular organization, processes and functions. Functional modules can be identified from the protein interaction networks derived from experimental data sets. However, these analyses are challenging because of the presence of unreliable interactions and the complex connectivity of the network. The integration of protein-protein interactions with the data from other sources can be leveraged for improving the effectiveness of functional module detection algorithms. Results We have developed novel metrics, called semantic similarity and semantic interactivity, which use Gene Ontology (GO) annotations to measure the reliability of protein-protein interactions. The protein interaction networks can be converted into a weighted graph representation by assigning the reliability values to each interaction as a weight. We presented a flow-based modularization algorithm to efficiently identify overlapping modules in the weighted interaction networks. The experimental results show that the semantic similarity and semantic interactivity of interacting pairs were positively correlated with functional co-occurrence. The effectiveness of the algorithm for identifying modules was evaluated using functional categories from the MIPS database. We demonstrated that our algorithm had higher accuracy compared to other competing approaches. Conclusion The integration of protein interaction networks with GO annotation data and the capability of detecting overlapping modules substantially improve the accuracy of module identification. PMID:17650343
Harispe, Sébastien; Ranwez, Sylvie; Janaqi, Stefan; Montmain, Jacky
2014-03-01
The semantic measures library and toolkit are robust open-source and easy to use software solutions dedicated to semantic measures. They can be used for large-scale computations and analyses of semantic similarities between terms/concepts defined in terminologies and ontologies. The comparison of entities (e.g. genes) annotated by concepts is also supported. A large collection of measures is available. Not limited to a specific application context, the library and the toolkit can be used with various controlled vocabularies and ontology specifications (e.g. Open Biomedical Ontology, Resource Description Framework). The project targets both designers and practitioners of semantic measures providing a JAVA library, as well as a command-line tool that can be used on personal computers or computer clusters. Downloads, documentation, tutorials, evaluation and support are available at http://www.semantic-measures-library.org.
A Metadata based Knowledge Discovery Methodology for Seeding Translational Research.
Kothari, Cartik R; Payne, Philip R O
2015-01-01
In this paper, we present a semantic, metadata based knowledge discovery methodology for identifying teams of researchers from diverse backgrounds who can collaborate on interdisciplinary research projects: projects in areas that have been identified as high-impact areas at The Ohio State University. This methodology involves the semantic annotation of keywords and the postulation of semantic metrics to improve the efficiency of the path exploration algorithm as well as to rank the results. Results indicate that our methodology can discover groups of experts from diverse areas who can collaborate on translational research projects.
A Weighted Multipath Measurement Based on Gene Ontology for Estimating Gene Products Similarity
Liu, Lizhen; Dai, Xuemin; Song, Wei; Lu, Jingli
2014-01-01
Abstract Many different methods have been proposed for calculating the semantic similarity of term pairs based on gene ontology (GO). Most existing methods are based on information content (IC), and the methods based on IC are used more commonly than those based on the structure of GO. However, most IC-based methods not only fail to handle identical annotations but also show a strong bias toward well-annotated proteins. We propose a new method called weighted multipath measurement (WMM) for estimating the semantic similarity of gene products based on the structure of the GO. We not only considered the contribution of every path between two GO terms but also took the depth of the lowest common ancestors into account. We assigned different weights for different kinds of edges in GO graph. The similarity values calculated by WMM can be reused because they are only relative to the characteristics of GO terms. Experimental results showed that the similarity values obtained by WMM have a higher accuracy. We compared the performance of WMM with that of other methods using GO data and gene annotation datasets for yeast and humans downloaded from the GO database. We found that WMM is more suited for prediction of gene function than most existing IC-based methods and that it can distinguish proteins with identical annotations (two proteins are annotated with the same terms) from each other. PMID:25229994
Collaborative Movie Annotation
NASA Astrophysics Data System (ADS)
Zad, Damon Daylamani; Agius, Harry
In this paper, we focus on metadata for self-created movies like those found on YouTube and Google Video, the duration of which are increasing in line with falling upload restrictions. While simple tags may have been sufficient for most purposes for traditionally very short video footage that contains a relatively small amount of semantic content, this is not the case for movies of longer duration which embody more intricate semantics. Creating metadata is a time-consuming process that takes a great deal of individual effort; however, this effort can be greatly reduced by harnessing the power of Web 2.0 communities to create, update and maintain it. Consequently, we consider the annotation of movies within Web 2.0 environments, such that users create and share that metadata collaboratively and propose an architecture for collaborative movie annotation. This architecture arises from the results of an empirical experiment where metadata creation tools, YouTube and an MPEG-7 modelling tool, were used by users to create movie metadata. The next section discusses related work in the areas of collaborative retrieval and tagging. Then, we describe the experiments that were undertaken on a sample of 50 users. Next, the results are presented which provide some insight into how users interact with existing tools and systems for annotating movies. Based on these results, the paper then develops an architecture for collaborative movie annotation.
A Methodology for the Development of RESTful Semantic Web Services for Gene Expression Analysis
Guardia, Gabriela D. A.; Pires, Luís Ferreira; Vêncio, Ricardo Z. N.; Malmegrim, Kelen C. R.; de Farias, Cléver R. G.
2015-01-01
Gene expression studies are generally performed through multi-step analysis processes, which require the integrated use of a number of analysis tools. In order to facilitate tool/data integration, an increasing number of analysis tools have been developed as or adapted to semantic web services. In recent years, some approaches have been defined for the development and semantic annotation of web services created from legacy software tools, but these approaches still present many limitations. In addition, to the best of our knowledge, no suitable approach has been defined for the functional genomics domain. Therefore, this paper aims at defining an integrated methodology for the implementation of RESTful semantic web services created from gene expression analysis tools and the semantic annotation of such services. We have applied our methodology to the development of a number of services to support the analysis of different types of gene expression data, including microarray and RNASeq. All developed services are publicly available in the Gene Expression Analysis Services (GEAS) Repository at http://dcm.ffclrp.usp.br/lssb/geas. Additionally, we have used a number of the developed services to create different integrated analysis scenarios to reproduce parts of two gene expression studies documented in the literature. The first study involves the analysis of one-color microarray data obtained from multiple sclerosis patients and healthy donors. The second study comprises the analysis of RNA-Seq data obtained from melanoma cells to investigate the role of the remodeller BRG1 in the proliferation and morphology of these cells. Our methodology provides concrete guidelines and technical details in order to facilitate the systematic development of semantic web services. Moreover, it encourages the development and reuse of these services for the creation of semantically integrated solutions for gene expression analysis. PMID:26207740
Recent Advances in Clinical Natural Language Processing in Support of Semantic Analysis.
Velupillai, S; Mowery, D; South, B R; Kvist, M; Dalianis, H
2015-08-13
We present a review of recent advances in clinical Natural Language Processing (NLP), with a focus on semantic analysis and key subtasks that support such analysis. We conducted a literature review of clinical NLP research from 2008 to 2014, emphasizing recent publications (2012-2014), based on PubMed and ACL proceedings as well as relevant referenced publications from the included papers. Significant articles published within this time-span were included and are discussed from the perspective of semantic analysis. Three key clinical NLP subtasks that enable such analysis were identified: 1) developing more efficient methods for corpus creation (annotation and de-identification), 2) generating building blocks for extracting meaning (morphological, syntactic, and semantic subtasks), and 3) leveraging NLP for clinical utility (NLP applications and infrastructure for clinical use cases). Finally, we provide a reflection upon most recent developments and potential areas of future NLP development and applications. There has been an increase of advances within key NLP subtasks that support semantic analysis. Performance of NLP semantic analysis is, in many cases, close to that of agreement between humans. The creation and release of corpora annotated with complex semantic information models has greatly supported the development of new tools and approaches. Research on non-English languages is continuously growing. NLP methods have sometimes been successfully employed in real-world clinical tasks. However, there is still a gap between the development of advanced resources and their utilization in clinical settings. A plethora of new clinical use cases are emerging due to established health care initiatives and additional patient-generated sources through the extensive use of social media and other devices.
A Methodology for the Development of RESTful Semantic Web Services for Gene Expression Analysis.
Guardia, Gabriela D A; Pires, Luís Ferreira; Vêncio, Ricardo Z N; Malmegrim, Kelen C R; de Farias, Cléver R G
2015-01-01
Gene expression studies are generally performed through multi-step analysis processes, which require the integrated use of a number of analysis tools. In order to facilitate tool/data integration, an increasing number of analysis tools have been developed as or adapted to semantic web services. In recent years, some approaches have been defined for the development and semantic annotation of web services created from legacy software tools, but these approaches still present many limitations. In addition, to the best of our knowledge, no suitable approach has been defined for the functional genomics domain. Therefore, this paper aims at defining an integrated methodology for the implementation of RESTful semantic web services created from gene expression analysis tools and the semantic annotation of such services. We have applied our methodology to the development of a number of services to support the analysis of different types of gene expression data, including microarray and RNASeq. All developed services are publicly available in the Gene Expression Analysis Services (GEAS) Repository at http://dcm.ffclrp.usp.br/lssb/geas. Additionally, we have used a number of the developed services to create different integrated analysis scenarios to reproduce parts of two gene expression studies documented in the literature. The first study involves the analysis of one-color microarray data obtained from multiple sclerosis patients and healthy donors. The second study comprises the analysis of RNA-Seq data obtained from melanoma cells to investigate the role of the remodeller BRG1 in the proliferation and morphology of these cells. Our methodology provides concrete guidelines and technical details in order to facilitate the systematic development of semantic web services. Moreover, it encourages the development and reuse of these services for the creation of semantically integrated solutions for gene expression analysis.
Recent Advances in Clinical Natural Language Processing in Support of Semantic Analysis
Mowery, D.; South, B. R.; Kvist, M.; Dalianis, H.
2015-01-01
Summary Objectives We present a review of recent advances in clinical Natural Language Processing (NLP), with a focus on semantic analysis and key subtasks that support such analysis. Methods We conducted a literature review of clinical NLP research from 2008 to 2014, emphasizing recent publications (2012-2014), based on PubMed and ACL proceedings as well as relevant referenced publications from the included papers. Results Significant articles published within this time-span were included and are discussed from the perspective of semantic analysis. Three key clinical NLP subtasks that enable such analysis were identified: 1) developing more efficient methods for corpus creation (annotation and de-identification), 2) generating building blocks for extracting meaning (morphological, syntactic, and semantic subtasks), and 3) leveraging NLP for clinical utility (NLP applications and infrastructure for clinical use cases). Finally, we provide a reflection upon most recent developments and potential areas of future NLP development and applications. Conclusions There has been an increase of advances within key NLP subtasks that support semantic analysis. Performance of NLP semantic analysis is, in many cases, close to that of agreement between humans. The creation and release of corpora annotated with complex semantic information models has greatly supported the development of new tools and approaches. Research on non-English languages is continuously growing. NLP methods have sometimes been successfully employed in real-world clinical tasks. However, there is still a gap between the development of advanced resources and their utilization in clinical settings. A plethora of new clinical use cases are emerging due to established health care initiatives and additional patient-generated sources through the extensive use of social media and other devices. PMID:26293867
Arend, Daniel; Lange, Matthias; Pape, Jean-Michel; Weigelt-Fischer, Kathleen; Arana-Ceballos, Fernando; Mücke, Ingo; Klukas, Christian; Altmann, Thomas; Scholz, Uwe; Junker, Astrid
2016-01-01
With the implementation of novel automated, high throughput methods and facilities in the last years, plant phenomics has developed into a highly interdisciplinary research domain integrating biology, engineering and bioinformatics. Here we present a dataset of a non-invasive high throughput plant phenotyping experiment, which uses image- and image analysis- based approaches to monitor the growth and development of 484 Arabidopsis thaliana plants (thale cress). The result is a comprehensive dataset of images and extracted phenotypical features. Such datasets require detailed documentation, standardized description of experimental metadata as well as sustainable data storage and publication in order to ensure the reproducibility of experiments, data reuse and comparability among the scientific community. Therefore the here presented dataset has been annotated using the standardized ISA-Tab format and considering the recently published recommendations for the semantical description of plant phenotyping experiments. PMID:27529152
Arend, Daniel; Lange, Matthias; Pape, Jean-Michel; Weigelt-Fischer, Kathleen; Arana-Ceballos, Fernando; Mücke, Ingo; Klukas, Christian; Altmann, Thomas; Scholz, Uwe; Junker, Astrid
2016-08-16
With the implementation of novel automated, high throughput methods and facilities in the last years, plant phenomics has developed into a highly interdisciplinary research domain integrating biology, engineering and bioinformatics. Here we present a dataset of a non-invasive high throughput plant phenotyping experiment, which uses image- and image analysis- based approaches to monitor the growth and development of 484 Arabidopsis thaliana plants (thale cress). The result is a comprehensive dataset of images and extracted phenotypical features. Such datasets require detailed documentation, standardized description of experimental metadata as well as sustainable data storage and publication in order to ensure the reproducibility of experiments, data reuse and comparability among the scientific community. Therefore the here presented dataset has been annotated using the standardized ISA-Tab format and considering the recently published recommendations for the semantical description of plant phenotyping experiments.
linkedISA: semantic representation of ISA-Tab experimental metadata.
González-Beltrán, Alejandra; Maguire, Eamonn; Sansone, Susanna-Assunta; Rocca-Serra, Philippe
2014-01-01
Reporting and sharing experimental metadata- such as the experimental design, characteristics of the samples, and procedures applied, along with the analysis results, in a standardised manner ensures that datasets are comprehensible and, in principle, reproducible, comparable and reusable. Furthermore, sharing datasets in formats designed for consumption by humans and machines will also maximize their use. The Investigation/Study/Assay (ISA) open source metadata tracking framework facilitates standards-compliant collection, curation, visualization, storage and sharing of datasets, leveraging on other platforms to enable analysis and publication. The ISA software suite includes several components used in increasingly diverse set of life science and biomedical domains; it is underpinned by a general-purpose format, ISA-Tab, and conversions exist into formats required by public repositories. While ISA-Tab works well mainly as a human readable format, we have also implemented a linked data approach to semantically define the ISA-Tab syntax. We present a semantic web representation of the ISA-Tab syntax that complements ISA-Tab's syntactic interoperability with semantic interoperability. We introduce the linkedISA conversion tool from ISA-Tab to the Resource Description Framework (RDF), supporting mappings from the ISA syntax to multiple community-defined, open ontologies and capitalising on user-provided ontology annotations in the experimental metadata. We describe insights of the implementation and how annotations can be expanded driven by the metadata. We applied the conversion tool as part of Bio-GraphIIn, a web-based application supporting integration of the semantically-rich experimental descriptions. Designed in a user-friendly manner, the Bio-GraphIIn interface hides most of the complexities to the users, exposing a familiar tabular view of the experimental description to allow seamless interaction with the RDF representation, and visualising descriptors to drive the query over the semantic representation of the experimental design. In addition, we defined queries over the linkedISA RDF representation and demonstrated its use over the linkedISA conversion of datasets from Nature' Scientific Data online publication. Our linked data approach has allowed us to: 1) make the ISA-Tab semantics explicit and machine-processable, 2) exploit the existing ontology-based annotations in the ISA-Tab experimental descriptions, 3) augment the ISA-Tab syntax with new descriptive elements, 4) visualise and query elements related to the experimental design. Reasoning over ISA-Tab metadata and associated data will facilitate data integration and knowledge discovery.
Kalpathy-Cramer, Jayashree; Hersh, William
2008-01-01
In 2006 and 2007, Oregon Health & Science University (OHSU) participated in the automatic image annotation task for medical images at ImageCLEF, an annual international benchmarking event that is part of the Cross Language Evaluation Forum (CLEF). The goal of the automatic annotation task was to classify 1000 test images based on the Image Retrieval in Medical Applications (IRMA) code, given a set of 10,000 training images. There were 116 distinct classes in 2006 and 2007. We evaluated the efficacy of a variety of primarily global features for this classification task. These included features based on histograms, gray level correlation matrices and the gist technique. A multitude of classifiers including k-nearest neighbors, two-level neural networks, support vector machines, and maximum likelihood classifiers were evaluated. Our official error rates for the 1000 test images were 26% in 2006 using the flat classification structure. The error count in 2007 was 67.8 using the hierarchical classification error computation based on the IRMA code in 2007. Confusion matrices as well as clustering experiments were used to identify visually similar classes. The use of the IRMA code did not help us in the classification task as the semantic hierarchy of the IRMA classes did not correspond well with the hierarchy based on clustering of image features that we used. Our most frequent misclassification errors were along the view axis. Subsequent experiments based on a two-stage classification system decreased our error rate to 19.8% for the 2006 dataset and our error count to 55.4 for the 2007 data. PMID:19884953
OrganismTagger: detection, normalization and grounding of organism entities in biomedical documents.
Naderi, Nona; Kappler, Thomas; Baker, Christopher J O; Witte, René
2011-10-01
Semantic tagging of organism mentions in full-text articles is an important part of literature mining and semantic enrichment solutions. Tagged organism mentions also play a pivotal role in disambiguating other entities in a text, such as proteins. A high-precision organism tagging system must be able to detect the numerous forms of organism mentions, including common names as well as the traditional taxonomic groups: genus, species and strains. In addition, such a system must resolve abbreviations and acronyms, assign the scientific name and if possible link the detected mention to the NCBI Taxonomy database for further semantic queries and literature navigation. We present the OrganismTagger, a hybrid rule-based/machine learning system to extract organism mentions from the literature. It includes tools for automatically generating lexical and ontological resources from a copy of the NCBI Taxonomy database, thereby facilitating system updates by end users. Its novel ontology-based resources can also be reused in other semantic mining and linked data tasks. Each detected organism mention is normalized to a canonical name through the resolution of acronyms and abbreviations and subsequently grounded with an NCBI Taxonomy database ID. In particular, our system combines a novel machine-learning approach with rule-based and lexical methods for detecting strain mentions in documents. On our manually annotated OT corpus, the OrganismTagger achieves a precision of 95%, a recall of 94% and a grounding accuracy of 97.5%. On the manually annotated corpus of Linnaeus-100, the results show a precision of 99%, recall of 97% and grounding accuracy of 97.4%. The OrganismTagger, including supporting tools, resources, training data and manual annotations, as well as end user and developer documentation, is freely available under an open-source license at http://www.semanticsoftware.info/organism-tagger. witte@semanticsoftware.info.
Vempati, Uma D.; Przydzial, Magdalena J.; Chung, Caty; Abeyruwan, Saminda; Mir, Ahsan; Sakurai, Kunie; Visser, Ubbo; Lemmon, Vance P.; Schürer, Stephan C.
2012-01-01
Huge amounts of high-throughput screening (HTS) data for probe and drug development projects are being generated in the pharmaceutical industry and more recently in the public sector. The resulting experimental datasets are increasingly being disseminated via publically accessible repositories. However, existing repositories lack sufficient metadata to describe the experiments and are often difficult to navigate by non-experts. The lack of standardized descriptions and semantics of biological assays and screening results hinder targeted data retrieval, integration, aggregation, and analyses across different HTS datasets, for example to infer mechanisms of action of small molecule perturbagens. To address these limitations, we created the BioAssay Ontology (BAO). BAO has been developed with a focus on data integration and analysis enabling the classification of assays and screening results by concepts that relate to format, assay design, technology, target, and endpoint. Previously, we reported on the higher-level design of BAO and on the semantic querying capabilities offered by the ontology-indexed triple store of HTS data. Here, we report on our detailed design, annotation pipeline, substantially enlarged annotation knowledgebase, and analysis results. We used BAO to annotate assays from the largest public HTS data repository, PubChem, and demonstrate its utility to categorize and analyze diverse HTS results from numerous experiments. BAO is publically available from the NCBO BioPortal at http://bioportal.bioontology.org/ontologies/1533. BAO provides controlled terminology and uniform scope to report probe and drug discovery screening assays and results. BAO leverages description logic to formalize the domain knowledge and facilitate the semantic integration with diverse other resources. As a consequence, BAO offers the potential to infer new knowledge from a corpus of assay results, for example molecular mechanisms of action of perturbagens. PMID:23155465
Dugas, Martin; Dugas-Breit, Susanne
2014-01-01
Design, execution and analysis of clinical studies involves several stakeholders with different professional backgrounds. Typically, principle investigators are familiar with standard office tools, data managers apply electronic data capture (EDC) systems and statisticians work with statistics software. Case report forms (CRFs) specify the data model of study subjects, evolve over time and consist of hundreds to thousands of data items per study. To avoid erroneous manual transformation work, a converting tool for different representations of study data models was designed. It can convert between office format, EDC and statistics format. In addition, it supports semantic annotations, which enable precise definitions for data items. A reference implementation is available as open source package ODMconverter at http://cran.r-project.org.
An Experiment in Scientific Program Understanding
NASA Technical Reports Server (NTRS)
Stewart, Mark E. M.; Owen, Karl (Technical Monitor)
2000-01-01
This paper concerns a procedure that analyzes aspects of the meaning or semantics of scientific and engineering code. This procedure involves taking a user's existing code, adding semantic declarations for some primitive variables, and parsing this annotated code using multiple, independent expert parsers. These semantic parsers encode domain knowledge and recognize formulae in different disciplines including physics, numerical methods, mathematics, and geometry. The parsers will automatically recognize and document some static, semantic concepts and help locate some program semantic errors. Results are shown for three intensively studied codes and seven blind test cases; all test cases are state of the art scientific codes. These techniques may apply to a wider range of scientific codes. If so, the techniques could reduce the time, risk, and effort required to develop and modify scientific codes.
A data model and database for high-resolution pathology analytical image informatics.
Wang, Fusheng; Kong, Jun; Cooper, Lee; Pan, Tony; Kurc, Tahsin; Chen, Wenjin; Sharma, Ashish; Niedermayr, Cristobal; Oh, Tae W; Brat, Daniel; Farris, Alton B; Foran, David J; Saltz, Joel
2011-01-01
The systematic analysis of imaged pathology specimens often results in a vast amount of morphological information at both the cellular and sub-cellular scales. While microscopy scanners and computerized analysis are capable of capturing and analyzing data rapidly, microscopy image data remain underutilized in research and clinical settings. One major obstacle which tends to reduce wider adoption of these new technologies throughout the clinical and scientific communities is the challenge of managing, querying, and integrating the vast amounts of data resulting from the analysis of large digital pathology datasets. This paper presents a data model, which addresses these challenges, and demonstrates its implementation in a relational database system. This paper describes a data model, referred to as Pathology Analytic Imaging Standards (PAIS), and a database implementation, which are designed to support the data management and query requirements of detailed characterization of micro-anatomic morphology through many interrelated analysis pipelines on whole-slide images and tissue microarrays (TMAs). (1) Development of a data model capable of efficiently representing and storing virtual slide related image, annotation, markup, and feature information. (2) Development of a database, based on the data model, capable of supporting queries for data retrieval based on analysis and image metadata, queries for comparison of results from different analyses, and spatial queries on segmented regions, features, and classified objects. The work described in this paper is motivated by the challenges associated with characterization of micro-scale features for comparative and correlative analyses involving whole-slides tissue images and TMAs. Technologies for digitizing tissues have advanced significantly in the past decade. Slide scanners are capable of producing high-magnification, high-resolution images from whole slides and TMAs within several minutes. Hence, it is becoming increasingly feasible for basic, clinical, and translational research studies to produce thousands of whole-slide images. Systematic analysis of these large datasets requires efficient data management support for representing and indexing results from hundreds of interrelated analyses generating very large volumes of quantifications such as shape and texture and of classifications of the quantified features. We have designed a data model and a database to address the data management requirements of detailed characterization of micro-anatomic morphology through many interrelated analysis pipelines. The data model represents virtual slide related image, annotation, markup and feature information. The database supports a wide range of metadata and spatial queries on images, annotations, markups, and features. We currently have three databases running on a Dell PowerEdge T410 server with CentOS 5.5 Linux operating system. The database server is IBM DB2 Enterprise Edition 9.7.2. The set of databases consists of 1) a TMA database containing image analysis results from 4740 cases of breast cancer, with 641 MB storage size; 2) an algorithm validation database, which stores markups and annotations from two segmentation algorithms and two parameter sets on 18 selected slides, with 66 GB storage size; and 3) an in silico brain tumor study database comprising results from 307 TCGA slides, with 365 GB storage size. The latter two databases also contain human-generated annotations and markups for regions and nuclei. Modeling and managing pathology image analysis results in a database provide immediate benefits on the value and usability of data in a research study. The database provides powerful query capabilities, which are otherwise difficult or cumbersome to support by other approaches such as programming languages. Standardized, semantic annotated data representation and interfaces also make it possible to more efficiently share image data and analysis results.
Synergistic Instance-Level Subspace Alignment for Fine-Grained Sketch-Based Image Retrieval.
Li, Ke; Pang, Kaiyue; Song, Yi-Zhe; Hospedales, Timothy M; Xiang, Tao; Zhang, Honggang
2017-08-25
We study the problem of fine-grained sketch-based image retrieval. By performing instance-level (rather than category-level) retrieval, it embodies a timely and practical application, particularly with the ubiquitous availability of touchscreens. Three factors contribute to the challenging nature of the problem: (i) free-hand sketches are inherently abstract and iconic, making visual comparisons with photos difficult, (ii) sketches and photos are in two different visual domains, i.e. black and white lines vs. color pixels, and (iii) fine-grained distinctions are especially challenging when executed across domain and abstraction-level. To address these challenges, we propose to bridge the image-sketch gap both at the high-level via parts and attributes, as well as at the low-level, via introducing a new domain alignment method. More specifically, (i) we contribute a dataset with 304 photos and 912 sketches, where each sketch and image is annotated with its semantic parts and associated part-level attributes. With the help of this dataset, we investigate (ii) how strongly-supervised deformable part-based models can be learned that subsequently enable automatic detection of part-level attributes, and provide pose-aligned sketch-image comparisons. To reduce the sketch-image gap when comparing low-level features, we also (iii) propose a novel method for instance-level domain-alignment, that exploits both subspace and instance-level cues to better align the domains. Finally (iv) these are combined in a matching framework integrating aligned low-level features, mid-level geometric structure and high-level semantic attributes. Extensive experiments conducted on our new dataset demonstrate effectiveness of the proposed method.
Semantic Web repositories for genomics data using the eXframe platform.
Merrill, Emily; Corlosquet, Stéphane; Ciccarese, Paolo; Clark, Tim; Das, Sudeshna
2014-01-01
With the advent of inexpensive assay technologies, there has been an unprecedented growth in genomics data as well as the number of databases in which it is stored. In these databases, sample annotation using ontologies and controlled vocabularies is becoming more common. However, the annotation is rarely available as Linked Data, in a machine-readable format, or for standardized queries using SPARQL. This makes large-scale reuse, or integration with other knowledge bases very difficult. To address this challenge, we have developed the second generation of our eXframe platform, a reusable framework for creating online repositories of genomics experiments. This second generation model now publishes Semantic Web data. To accomplish this, we created an experiment model that covers provenance, citations, external links, assays, biomaterials used in the experiment, and the data collected during the process. The elements of our model are mapped to classes and properties from various established biomedical ontologies. Resource Description Framework (RDF) data is automatically produced using these mappings and indexed in an RDF store with a built-in Sparql Protocol and RDF Query Language (SPARQL) endpoint. Using the open-source eXframe software, institutions and laboratories can create Semantic Web repositories of their experiments, integrate it with heterogeneous resources and make it interoperable with the vast Semantic Web of biomedical knowledge.
Semi-Supervised Learning to Identify UMLS Semantic Relations.
Luo, Yuan; Uzuner, Ozlem
2014-01-01
The UMLS Semantic Network is constructed by experts and requires periodic expert review to update. We propose and implement a semi-supervised approach for automatically identifying UMLS semantic relations from narrative text in PubMed. Our method analyzes biomedical narrative text to collect semantic entity pairs, and extracts multiple semantic, syntactic and orthographic features for the collected pairs. We experiment with seeded k-means clustering with various distance metrics. We create and annotate a ground truth corpus according to the top two levels of the UMLS semantic relation hierarchy. We evaluate our system on this corpus and characterize the learning curves of different clustering configuration. Using KL divergence consistently performs the best on the held-out test data. With full seeding, we obtain macro-averaged F-measures above 70% for clustering the top level UMLS relations (2-way), and above 50% for clustering the second level relations (7-way).
Geographical topic learning for social images with a deep neural network
NASA Astrophysics Data System (ADS)
Feng, Jiangfan; Xu, Xin
2017-03-01
The use of geographical tagging in social-media images is becoming a part of image metadata and a great interest for geographical information science. It is well recognized that geographical topic learning is crucial for geographical annotation. Existing methods usually exploit geographical characteristics using image preprocessing, pixel-based classification, and feature recognition. How to effectively exploit the high-level semantic feature and underlying correlation among different types of contents is a crucial task for geographical topic learning. Deep learning (DL) has recently demonstrated robust capabilities for image tagging and has been introduced into geoscience. It extracts high-level features computed from a whole image component, where the cluttered background may dominate spatial features in the deep representation. Therefore, a method of spatial-attentional DL for geographical topic learning is provided and we can regard it as a special case of DL combined with various deep networks and tuning tricks. Results demonstrated that the method is discriminative for different types of geographical topic learning. In addition, it outperforms other sequential processing models in a tagging task for a geographical image dataset.
eClims: An Extensible and Dynamic Integration Framework for Biomedical Information Systems.
Savonnet, Marinette; Leclercq, Eric; Naubourg, Pierre
2016-11-01
Biomedical information systems (BIS) require consideration of three types of variability: data variability induced by new high throughput technologies, schema or model variability induced by large scale studies or new fields of research, and knowledge variability resulting from new discoveries. Beyond data heterogeneity, managing variabilities in the context of BIS requires extensible and dynamic integration process. In this paper, we focus on data and schema variabilities and we propose an integration framework based on ontologies, master data, and semantic annotations. The framework addresses issues related to: 1) collaborative work through a dynamic integration process; 2) variability among studies using an annotation mechanism; and 3) quality control over data and semantic annotations. Our approach relies on two levels of knowledge: BIS-related knowledge is modeled using an application ontology coupled with UML models that allow controlling data completeness and consistency, and domain knowledge is described by a domain ontology, which ensures data coherence. A system build with the eClims framework has been implemented and evaluated in the context of a proteomic platform.
Combining computational models, semantic annotations and simulation experiments in a graph database
Henkel, Ron; Wolkenhauer, Olaf; Waltemath, Dagmar
2015-01-01
Model repositories such as the BioModels Database, the CellML Model Repository or JWS Online are frequently accessed to retrieve computational models of biological systems. However, their storage concepts support only restricted types of queries and not all data inside the repositories can be retrieved. In this article we present a storage concept that meets this challenge. It grounds on a graph database, reflects the models’ structure, incorporates semantic annotations and simulation descriptions and ultimately connects different types of model-related data. The connections between heterogeneous model-related data and bio-ontologies enable efficient search via biological facts and grant access to new model features. The introduced concept notably improves the access of computational models and associated simulations in a model repository. This has positive effects on tasks such as model search, retrieval, ranking, matching and filtering. Furthermore, our work for the first time enables CellML- and Systems Biology Markup Language-encoded models to be effectively maintained in one database. We show how these models can be linked via annotations and queried. Database URL: https://sems.uni-rostock.de/projects/masymos/ PMID:25754863
Ikram, Najmul; Qadir, Muhammad Abdul; Afzal, Muhammad Tanvir
2018-01-01
Sequence similarity is a commonly used measure to compare proteins. With the increasing use of ontologies, semantic (function) similarity is getting importance. The correlation between these measures has been applied in the evaluation of new semantic similarity methods, and in protein function prediction. In this research, we investigate the relationship between the two similarity methods. The results suggest absence of a strong correlation between sequence and semantic similarities. There is a large number of proteins with low sequence similarity and high semantic similarity. We observe that Pearson's correlation coefficient is not sufficient to explain the nature of this relationship. Interestingly, the term semantic similarity values above 0 and below 1 do not seem to play a role in improving the correlation. That is, the correlation coefficient depends only on the number of common GO terms in proteins under comparison, and the semantic similarity measurement method does not influence it. Semantic similarity and sequence similarity have a distinct behavior. These findings are of significant effect for future works on protein comparison, and will help understand the semantic similarity between proteins in a better way.
Cross-Modal Retrieval With CNN Visual Features: A New Baseline.
Wei, Yunchao; Zhao, Yao; Lu, Canyi; Wei, Shikui; Liu, Luoqi; Zhu, Zhenfeng; Yan, Shuicheng
2017-02-01
Recently, convolutional neural network (CNN) visual features have demonstrated their powerful ability as a universal representation for various recognition tasks. In this paper, cross-modal retrieval with CNN visual features is implemented with several classic methods. Specifically, off-the-shelf CNN visual features are extracted from the CNN model, which is pretrained on ImageNet with more than one million images from 1000 object categories, as a generic image representation to tackle cross-modal retrieval. To further enhance the representational ability of CNN visual features, based on the pretrained CNN model on ImageNet, a fine-tuning step is performed by using the open source Caffe CNN library for each target data set. Besides, we propose a deep semantic matching method to address the cross-modal retrieval problem with respect to samples which are annotated with one or multiple labels. Extensive experiments on five popular publicly available data sets well demonstrate the superiority of CNN visual features for cross-modal retrieval.
Calling on a million minds for community annotation in WikiProteins
Mons, Barend; Ashburner, Michael; Chichester, Christine; van Mulligen, Erik; Weeber, Marc; den Dunnen, Johan; van Ommen, Gert-Jan; Musen, Mark; Cockerill, Matthew; Hermjakob, Henning; Mons, Albert; Packer, Abel; Pacheco, Roberto; Lewis, Suzanna; Berkeley, Alfred; Melton, William; Barris, Nickolas; Wales, Jimmy; Meijssen, Gerard; Moeller, Erik; Roes, Peter Jan; Borner, Katy; Bairoch, Amos
2008-01-01
WikiProteins enables community annotation in a Wiki-based system. Extracts of major data sources have been fused into an editable environment that links out to the original sources. Data from community edits create automatic copies of the original data. Semantic technology captures concepts co-occurring in one sentence and thus potential factual statements. In addition, indirect associations between concepts have been calculated. We call on a 'million minds' to annotate a 'million concepts' and to collect facts from the literature with the reward of collaborative knowledge discovery. The system is available for beta testing at . PMID:18507872
Folksonomies and clustering in the collaborative system CiteULike
NASA Astrophysics Data System (ADS)
Capocci, Andrea; Caldarelli, Guido
2008-06-01
We analyze CiteULike, an online collaborative tagging system where users bookmark and annotate scientific papers. Such a system can be naturally represented as a tri-partite graph whose nodes represent papers, users and tags connected by individual tag assignments. The semantics of tags is studied here, in order to uncover the hidden relationships between tags. We find that the clustering coefficient can be used to analyze the semantical patterns among tags.
The MGED Ontology: a resource for semantics-based description of microarray experiments.
Whetzel, Patricia L; Parkinson, Helen; Causton, Helen C; Fan, Liju; Fostel, Jennifer; Fragoso, Gilberto; Game, Laurence; Heiskanen, Mervi; Morrison, Norman; Rocca-Serra, Philippe; Sansone, Susanna-Assunta; Taylor, Chris; White, Joseph; Stoeckert, Christian J
2006-04-01
The generation of large amounts of microarray data and the need to share these data bring challenges for both data management and annotation and highlights the need for standards. MIAME specifies the minimum information needed to describe a microarray experiment and the Microarray Gene Expression Object Model (MAGE-OM) and resulting MAGE-ML provide a mechanism to standardize data representation for data exchange, however a common terminology for data annotation is needed to support these standards. Here we describe the MGED Ontology (MO) developed by the Ontology Working Group of the Microarray Gene Expression Data (MGED) Society. The MO provides terms for annotating all aspects of a microarray experiment from the design of the experiment and array layout, through to the preparation of the biological sample and the protocols used to hybridize the RNA and analyze the data. The MO was developed to provide terms for annotating experiments in line with the MIAME guidelines, i.e. to provide the semantics to describe a microarray experiment according to the concepts specified in MIAME. The MO does not attempt to incorporate terms from existing ontologies, e.g. those that deal with anatomical parts or developmental stages terms, but provides a framework to reference terms in other ontologies and therefore facilitates the use of ontologies in microarray data annotation. The MGED Ontology version.1.2.0 is available as a file in both DAML and OWL formats at http://mged.sourceforge.net/ontologies/index.php. Release notes and annotation examples are provided. The MO is also provided via the NCICB's Enterprise Vocabulary System (http://nciterms.nci.nih.gov/NCIBrowser/Dictionary.do). Stoeckrt@pcbi.upenn.edu Supplementary data are available at Bioinformatics online.
Generating region proposals for histopathological whole slide image retrieval.
Ma, Yibing; Jiang, Zhiguo; Zhang, Haopeng; Xie, Fengying; Zheng, Yushan; Shi, Huaqiang; Zhao, Yu; Shi, Jun
2018-06-01
Content-based image retrieval is an effective method for histopathological image analysis. However, given a database of huge whole slide images (WSIs), acquiring appropriate region-of-interests (ROIs) for training is significant and difficult. Moreover, histopathological images can only be annotated by pathologists, resulting in the lack of labeling information. Therefore, it is an important and challenging task to generate ROIs from WSI and retrieve image with few labels. This paper presents a novel unsupervised region proposing method for histopathological WSI based on Selective Search. Specifically, the WSI is over-segmented into regions which are hierarchically merged until the WSI becomes a single region. Nucleus-oriented similarity measures for region mergence and Nucleus-Cytoplasm color space for histopathological image are specially defined to generate accurate region proposals. Additionally, we propose a new semi-supervised hashing method for image retrieval. The semantic features of images are extracted with Latent Dirichlet Allocation and transformed into binary hashing codes with Supervised Hashing. The methods are tested on a large-scale multi-class database of breast histopathological WSIs. The results demonstrate that for one WSI, our region proposing method can generate 7.3 thousand contoured regions which fit well with 95.8% of the ROIs annotated by pathologists. The proposed hashing method can retrieve a query image among 136 thousand images in 0.29 s and reach precision of 91% with only 10% of images labeled. The unsupervised region proposing method can generate regions as predictions of lesions in histopathological WSI. The region proposals can also serve as the training samples to train machine-learning models for image retrieval. The proposed hashing method can achieve fast and precise image retrieval with small amount of labels. Furthermore, the proposed methods can be potentially applied in online computer-aided-diagnosis systems. Copyright © 2018 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Graham, Matthew; Gray, N.; Burke, D.
2010-01-01
Many activities in the era of data-intensive astronomy are predicated upon some transference of domain knowledge and expertise from human to machine. The semantic infrastructure required to support this is no longer a pipe dream of computer science but a set of practical engineering challenges, more concerned with deployment and performance details than AI abstractions. The application of such ideas promises to help in such areas as contextual data access, exploiting distributed annotation and heterogeneous sources, and intelligent data dissemination and discovery. In this talk, we will review the status and use of semantic technologies in astronomy, particularly to address current problems in astroinformatics, with such projects as SKUA and AstroCollation.
Semantic Web repositories for genomics data using the eXframe platform
2014-01-01
Background With the advent of inexpensive assay technologies, there has been an unprecedented growth in genomics data as well as the number of databases in which it is stored. In these databases, sample annotation using ontologies and controlled vocabularies is becoming more common. However, the annotation is rarely available as Linked Data, in a machine-readable format, or for standardized queries using SPARQL. This makes large-scale reuse, or integration with other knowledge bases very difficult. Methods To address this challenge, we have developed the second generation of our eXframe platform, a reusable framework for creating online repositories of genomics experiments. This second generation model now publishes Semantic Web data. To accomplish this, we created an experiment model that covers provenance, citations, external links, assays, biomaterials used in the experiment, and the data collected during the process. The elements of our model are mapped to classes and properties from various established biomedical ontologies. Resource Description Framework (RDF) data is automatically produced using these mappings and indexed in an RDF store with a built-in Sparql Protocol and RDF Query Language (SPARQL) endpoint. Conclusions Using the open-source eXframe software, institutions and laboratories can create Semantic Web repositories of their experiments, integrate it with heterogeneous resources and make it interoperable with the vast Semantic Web of biomedical knowledge. PMID:25093072
AnnotCompute: annotation-based exploration and meta-analysis of genomics experiments
Zheng, Jie; Stoyanovich, Julia; Manduchi, Elisabetta; Liu, Junmin; Stoeckert, Christian J.
2011-01-01
The ever-increasing scale of biological data sets, particularly those arising in the context of high-throughput technologies, requires the development of rich data exploration tools. In this article, we present AnnotCompute, an information discovery platform for repositories of functional genomics experiments such as ArrayExpress. Our system leverages semantic annotations of functional genomics experiments with controlled vocabulary and ontology terms, such as those from the MGED Ontology, to compute conceptual dissimilarities between pairs of experiments. These dissimilarities are then used to support two types of exploratory analysis—clustering and query-by-example. We show that our proposed dissimilarity measures correspond to a user's intuition about conceptual dissimilarity, and can be used to support effective query-by-example. We also evaluate the quality of clustering based on these measures. While AnnotCompute can support a richer data exploration experience, its effectiveness is limited in some cases, due to the quality of available annotations. Nonetheless, tools such as AnnotCompute may provide an incentive for richer annotations of experiments. Code is available for download at http://www.cbil.upenn.edu/downloads/AnnotCompute. Database URL: http://www.cbil.upenn.edu/annotCompute/ PMID:22190598
Large-Scale medical image analytics: Recent methodologies, applications and Future directions.
Zhang, Shaoting; Metaxas, Dimitris
2016-10-01
Despite the ever-increasing amount and complexity of annotated medical image data, the development of large-scale medical image analysis algorithms has not kept pace with the need for methods that bridge the semantic gap between images and diagnoses. The goal of this position paper is to discuss and explore innovative and large-scale data science techniques in medical image analytics, which will benefit clinical decision-making and facilitate efficient medical data management. Particularly, we advocate that the scale of image retrieval systems should be significantly increased at which interactive systems can be effective for knowledge discovery in potentially large databases of medical images. For clinical relevance, such systems should return results in real-time, incorporate expert feedback, and be able to cope with the size, quality, and variety of the medical images and their associated metadata for a particular domain. The design, development, and testing of the such framework can significantly impact interactive mining in medical image databases that are growing rapidly in size and complexity and enable novel methods of analysis at much larger scales in an efficient, integrated fashion. Copyright © 2016. Published by Elsevier B.V.
DaGO-Fun: tool for Gene Ontology-based functional analysis using term information content measures.
Mazandu, Gaston K; Mulder, Nicola J
2013-09-25
The use of Gene Ontology (GO) data in protein analyses have largely contributed to the improved outcomes of these analyses. Several GO semantic similarity measures have been proposed in recent years and provide tools that allow the integration of biological knowledge embedded in the GO structure into different biological analyses. There is a need for a unified tool that provides the scientific community with the opportunity to explore these different GO similarity measure approaches and their biological applications. We have developed DaGO-Fun, an online tool available at http://web.cbio.uct.ac.za/ITGOM, which incorporates many different GO similarity measures for exploring, analyzing and comparing GO terms and proteins within the context of GO. It uses GO data and UniProt proteins with their GO annotations as provided by the Gene Ontology Annotation (GOA) project to precompute GO term information content (IC), enabling rapid response to user queries. The DaGO-Fun online tool presents the advantage of integrating all the relevant IC-based GO similarity measures, including topology- and annotation-based approaches to facilitate effective exploration of these measures, thus enabling users to choose the most relevant approach for their application. Furthermore, this tool includes several biological applications related to GO semantic similarity scores, including the retrieval of genes based on their GO annotations, the clustering of functionally related genes within a set, and term enrichment analysis.
A transversal approach to predict gene product networks from ontology-based similarity
Chabalier, Julie; Mosser, Jean; Burgun, Anita
2007-01-01
Background Interpretation of transcriptomic data is usually made through a "standard" approach which consists in clustering the genes according to their expression patterns and exploiting Gene Ontology (GO) annotations within each expression cluster. This approach makes it difficult to underline functional relationships between gene products that belong to different expression clusters. To address this issue, we propose a transversal analysis that aims to predict functional networks based on a combination of GO processes and data expression. Results The transversal approach presented in this paper consists in computing the semantic similarity between gene products in a Vector Space Model. Through a weighting scheme over the annotations, we take into account the representativity of the terms that annotate a gene product. Comparing annotation vectors results in a matrix of gene product similarities. Combined with expression data, the matrix is displayed as a set of functional gene networks. The transversal approach was applied to 186 genes related to the enterocyte differentiation stages. This approach resulted in 18 functional networks proved to be biologically relevant. These results were compared with those obtained through a standard approach and with an approach based on information content similarity. Conclusion Complementary to the standard approach, the transversal approach offers new insight into the cellular mechanisms and reveals new research hypotheses by combining gene product networks based on semantic similarity, and data expression. PMID:17605807
Enhancing acronym/abbreviation knowledge bases with semantic information.
Torii, Manabu; Liu, Hongfang
2007-10-11
In the biomedical domain, a terminology knowledge base that associates acronyms/abbreviations (denoted as SFs) with the definitions (denoted as LFs) is highly needed. For the construction such terminology knowledge base, we investigate the feasibility to build a system automatically assigning semantic categories to LFs extracted from text. Given a collection of pairs (SF,LF) derived from text, we i) assess the coverage of LFs and pairs (SF,LF) in the UMLS and justify the need of a semantic category assignment system; and ii) automatically derive name phrases annotated with semantic category and construct a system using machine learning. Utilizing ADAM, an existing collection of (SF,LF) pairs extracted from MEDLINE, our system achieved an f-measure of 87% when assigning eight UMLS-based semantic groups to LFs. The system has been incorporated into a web interface which integrates SF knowledge from multiple SF knowledge bases. Web site: http://gauss.dbb.georgetown.edu/liblab/SFThesurus.
Charlet, J; Darmoni, S J
2015-08-13
To summarize the best papers in the field of Knowledge Representation and Management (KRM). A comprehensive review of medical informatics literature was performed to select some of the most interesting papers of KRM published in 2014. Four articles were selected, two focused on annotation and information retrieval using an ontology. The two others focused mainly on ontologies, one dealing with the usage of a temporal ontology in order to analyze the content of narrative document, one describing a methodology for building multilingual ontologies. Semantic models began to show their efficiency, coupled with annotation tools.
TOPSAN: a dynamic web database for structural genomics.
Ellrott, Kyle; Zmasek, Christian M; Weekes, Dana; Sri Krishna, S; Bakolitsa, Constantina; Godzik, Adam; Wooley, John
2011-01-01
The Open Protein Structure Annotation Network (TOPSAN) is a web-based collaboration platform for exploring and annotating structures determined by structural genomics efforts. Characterization of those structures presents a challenge since the majority of the proteins themselves have not yet been characterized. Responding to this challenge, the TOPSAN platform facilitates collaborative annotation and investigation via a user-friendly web-based interface pre-populated with automatically generated information. Semantic web technologies expand and enrich TOPSAN's content through links to larger sets of related databases, and thus, enable data integration from disparate sources and data mining via conventional query languages. TOPSAN can be found at http://www.topsan.org.
a Clustering-Based Approach for Evaluation of EO Image Indexing
NASA Astrophysics Data System (ADS)
Bahmanyar, R.; Rigoll, G.; Datcu, M.
2013-09-01
The volume of Earth Observation data is increasing immensely in order of several Terabytes a day. Therefore, to explore and investigate the content of this huge amount of data, developing more sophisticated Content-Based Information Retrieval (CBIR) systems are highly demanded. These systems should be able to not only discover unknown structures behind the data, but also provide relevant results to the users' queries. Since in any retrieval system the images are processed based on a discrete set of their features (i.e., feature descriptors), study and assessment of the structure of feature space, build by different feature descriptors, is of high importance. In this paper, we introduce a clustering-based approach to study the content of image collections. In our approach, we claim that using both internal and external evaluation of clusters for different feature descriptors, helps to understand the structure of feature space. Moreover, the semantic understanding of users about the images also can be assessed. To validate the performance of our approach, we used an annotated Synthetic Aperture Radar (SAR) image collection. Quantitative results besides the visualization of feature space demonstrate the applicability of our approach.
Biotea: semantics for Pubmed Central.
Garcia, Alexander; Lopez, Federico; Garcia, Leyla; Giraldo, Olga; Bucheli, Victor; Dumontier, Michel
2018-01-01
A significant portion of biomedical literature is represented in a manner that makes it difficult for consumers to find or aggregate content through a computational query. One approach to facilitate reuse of the scientific literature is to structure this information as linked data using standardized web technologies. In this paper we present the second version of Biotea, a semantic, linked data version of the open-access subset of PubMed Central that has been enhanced with specialized annotation pipelines that uses existing infrastructure from the National Center for Biomedical Ontology. We expose our models, services, software and datasets. Our infrastructure enables manual and semi-automatic annotation, resulting data are represented as RDF-based linked data and can be readily queried using the SPARQL query language. We illustrate the utility of our system with several use cases. Our datasets, methods and techniques are available at http://biotea.github.io.
Percha, Bethany; Altman, Russ B
2013-01-01
The biomedical literature presents a uniquely challenging text mining problem. Sentences are long and complex, the subject matter is highly specialized with a distinct vocabulary, and producing annotated training data for this domain is time consuming and expensive. In this environment, unsupervised text mining methods that do not rely on annotated training data are valuable. Here we investigate the use of random indexing, an automated method for producing vector-space semantic representations of words from large, unlabeled corpora, to address the problem of term normalization in sentences describing drugs and genes. We show that random indexing produces similarity scores that capture some of the structure of PHARE, a manually curated ontology of pharmacogenomics concepts. We further show that random indexing can be used to identify likely word candidates for inclusion in the ontology, and can help localize these new labels among classes and roles within the ontology.
Percha, Bethany; Altman, Russ B.
2013-01-01
The biomedical literature presents a uniquely challenging text mining problem. Sentences are long and complex, the subject matter is highly specialized with a distinct vocabulary, and producing annotated training data for this domain is time consuming and expensive. In this environment, unsupervised text mining methods that do not rely on annotated training data are valuable. Here we investigate the use of random indexing, an automated method for producing vector-space semantic representations of words from large, unlabeled corpora, to address the problem of term normalization in sentences describing drugs and genes. We show that random indexing produces similarity scores that capture some of the structure of PHARE, a manually curated ontology of pharmacogenomics concepts. We further show that random indexing can be used to identify likely word candidates for inclusion in the ontology, and can help localize these new labels among classes and roles within the ontology. PMID:24551397
Kovalenko, Lyudmyla Y; Chaumon, Maximilien; Busch, Niko A
2012-07-01
Semantic processing of verbal and visual stimuli has been investigated in semantic violation or semantic priming paradigms in which a stimulus is either related or unrelated to a previously established semantic context. A hallmark of semantic priming is the N400 event-related potential (ERP)--a deflection of the ERP that is more negative for semantically unrelated target stimuli. The majority of studies investigating the N400 and semantic integration have used verbal material (words or sentences), and standardized stimulus sets with norms for semantic relatedness have been published for verbal but not for visual material. However, semantic processing of visual objects (as opposed to words) is an important issue in research on visual cognition. In this study, we present a set of 800 pairs of semantically related and unrelated visual objects. The images were rated for semantic relatedness by a sample of 132 participants. Furthermore, we analyzed low-level image properties and matched the two semantic categories according to these features. An ERP study confirmed the suitability of this image set for evoking a robust N400 effect of semantic integration. Additionally, using a general linear modeling approach of single-trial data, we also demonstrate that low-level visual image properties and semantic relatedness are in fact only minimally overlapping. The image set is available for download from the authors' website. We expect that the image set will facilitate studies investigating mechanisms of semantic and contextual processing of visual stimuli.
Methodology for the inference of gene function from phenotype data.
Ascensao, Joao A; Dolan, Mary E; Hill, David P; Blake, Judith A
2014-12-12
Biomedical ontologies are increasingly instrumental in the advancement of biological research primarily through their use to efficiently consolidate large amounts of data into structured, accessible sets. However, ontology development and usage can be hampered by the segregation of knowledge by domain that occurs due to independent development and use of the ontologies. The ability to infer data associated with one ontology to data associated with another ontology would prove useful in expanding information content and scope. We here focus on relating two ontologies: the Gene Ontology (GO), which encodes canonical gene function, and the Mammalian Phenotype Ontology (MP), which describes non-canonical phenotypes, using statistical methods to suggest GO functional annotations from existing MP phenotype annotations. This work is in contrast to previous studies that have focused on inferring gene function from phenotype primarily through lexical or semantic similarity measures. We have designed and tested a set of algorithms that represents a novel methodology to define rules for predicting gene function by examining the emergent structure and relationships between the gene functions and phenotypes rather than inspecting the terms semantically. The algorithms inspect relationships among multiple phenotype terms to deduce if there are cases where they all arise from a single gene function. We apply this methodology to data about genes in the laboratory mouse that are formally represented in the Mouse Genome Informatics (MGI) resource. From the data, 7444 rule instances were generated from five generalized rules, resulting in 4818 unique GO functional predictions for 1796 genes. We show that our method is capable of inferring high-quality functional annotations from curated phenotype data. As well as creating inferred annotations, our method has the potential to allow for the elucidation of unforeseen, biologically significant associations between gene function and phenotypes that would be overlooked by a semantics-based approach. Future work will include the implementation of the described algorithms for a variety of other model organism databases, taking full advantage of the abundance of available high quality curated data.
Evaluation of web-based annotation of ophthalmic images for multicentric clinical trials.
Chalam, K V; Jain, P; Shah, V A; Shah, Gaurav Y
2006-06-01
An Internet browser-based annotation system can be used to identify and describe features in digitalized retinal images, in multicentric clinical trials, in real time. In this web-based annotation system, the user employs a mouse to draw and create annotations on a transparent layer, that encapsulates the observations and interpretations of a specific image. Multiple annotation layers may be overlaid on a single image. These layers may correspond to annotations by different users on the same image or annotations of a temporal sequence of images of a disease process, over a period of time. In addition, geometrical properties of annotated figures may be computed and measured. The annotations are stored in a central repository database on a server, which can be retrieved by multiple users in real time. This system facilitates objective evaluation of digital images and comparison of double-blind readings of digital photographs, with an identifiable audit trail. Annotation of ophthalmic images allowed clinically feasible and useful interpretation to track properties of an area of fundus pathology. This provided an objective method to monitor properties of pathologies over time, an essential component of multicentric clinical trials. The annotation system also allowed users to view stereoscopic images that are stereo pairs. This web-based annotation system is useful and valuable in monitoring patient care, in multicentric clinical trials, telemedicine, teaching and routine clinical settings.
Storck, Michael; Krumm, Rainer; Dugas, Martin
2016-01-01
Medical documentation is applied in various settings including patient care and clinical research. Since procedures of medical documentation are heterogeneous and developed further, secondary use of medical data is complicated. Development of medical forms, merging of data from different sources and meta-analyses of different data sets are currently a predominantly manual process and therefore difficult and cumbersome. Available applications to automate these processes are limited. In particular, tools to compare multiple documentation forms are missing. The objective of this work is to design, implement and evaluate the new system ODMSummary for comparison of multiple forms with a high number of semantically annotated data elements and a high level of usability. System requirements are the capability to summarize and compare a set of forms, enable to estimate the documentation effort, track changes in different versions of forms and find comparable items in different forms. Forms are provided in Operational Data Model format with semantic annotations from the Unified Medical Language System. 12 medical experts were invited to participate in a 3-phase evaluation of the tool regarding usability. ODMSummary (available at https://odmtoolbox.uni-muenster.de/summary/summary.html) provides a structured overview of multiple forms and their documentation fields. This comparison enables medical experts to assess multiple forms or whole datasets for secondary use. System usability was optimized based on expert feedback. The evaluation demonstrates that feedback from domain experts is needed to identify usability issues. In conclusion, this work shows that automatic comparison of multiple forms is feasible and the results are usable for medical experts.
NASA Astrophysics Data System (ADS)
Borne, K. D.
2009-12-01
The emergence of e-Science over the past decade as a paradigm for Internet-based science was an inevitable evolution of science that built upon the web protocols and access patterns that were prevalent at that time, including Web Services, XML-based information exchange, machine-to-machine communication, service registries, the Grid, and distributed data. We now see a major shift in web behavior patterns to social networks, user-provided content (e.g., tags and annotations), ubiquitous devices, user-centric experiences, and user-led activities. The inevitable accrual of these social networking patterns and protocols by scientists and science projects leads to U-Science as a new paradigm for online scientific research (i.e., ubiquitous, user-led, untethered, You-centered science). U-Science applications include components from semantic e-science (ontologies, taxonomies, folksonomies, tagging, annotations, and classification systems), which is much more than Web 2.0-based science (Wikis, blogs, and online environments like Second Life). Among the best examples of U-Science are Citizen Science projects, including Galaxy Zoo, Stardust@Home, Project Budburst, Volksdata, CoCoRaHS (the Community Collaborative Rain, Hail and Snow network), and projects utilizing Volunteer Geographic Information (VGI). There are also scientist-led projects for scientists that engage a wider community in building knowledge through user-provided content. Among the semantic-based U-Science projects for scientists are those that specifically enable user-based annotation of scientific results in databases. These include the Heliophysics Knowledgebase, BioDAS, WikiProteins, The Entity Describer, and eventually AstroDAS. Such collaborative tagging of scientific data addresses several petascale data challenges for scientists: how to find the most relevant data, how to reuse those data, how to integrate data from multiple sources, how to mine and discover new knowledge in large databases, how to represent and encode the new knowledge, and how to curate the discovered knowledge. This talk will address the emergence of U-Science as a type of Semantic e-Science, and will explore challenges, implementations, and results. Semantic e-Science and U-Science applications and concepts will be discussed within the context of one particular implementation (AstroDAS: Astronomy Distributed Annotation System) and its applicability to petascale science projects such as the LSST (Large Synoptic Survey Telescope), coming online within the next few years.
Cohen, K Bretonnel; Lanfranchi, Arrick; Choi, Miji Joo-Young; Bada, Michael; Baumgartner, William A; Panteleyeva, Natalya; Verspoor, Karin; Palmer, Martha; Hunter, Lawrence E
2017-08-17
Coreference resolution is the task of finding strings in text that have the same referent as other strings. Failures of coreference resolution are a common cause of false negatives in information extraction from the scientific literature. In order to better understand the nature of the phenomenon of coreference in biomedical publications and to increase performance on the task, we annotated the Colorado Richly Annotated Full Text (CRAFT) corpus with coreference relations. The corpus was manually annotated with coreference relations, including identity and appositives for all coreferring base noun phrases. The OntoNotes annotation guidelines, with minor adaptations, were used. Interannotator agreement ranges from 0.480 (entity-based CEAF) to 0.858 (Class-B3), depending on the metric that is used to assess it. The resulting corpus adds nearly 30,000 annotations to the previous release of the CRAFT corpus. Differences from related projects include a much broader definition of markables, connection to extensive annotation of several domain-relevant semantic classes, and connection to complete syntactic annotation. Tool performance was benchmarked on the data. A publicly available out-of-the-box, general-domain coreference resolution system achieved an F-measure of 0.14 (B3), while a simple domain-adapted rule-based system achieved an F-measure of 0.42. An ensemble of the two reached F of 0.46. Following the IDENTITY chains in the data would add 106,263 additional named entities in the full 97-paper corpus, for an increase of 76% percent in the semantic classes of the eight ontologies that have been annotated in earlier versions of the CRAFT corpus. The project produced a large data set for further investigation of coreference and coreference resolution in the scientific literature. The work raised issues in the phenomenon of reference in this domain and genre, and the paper proposes that many mentions that would be considered generic in the general domain are not generic in the biomedical domain due to their referents to specific classes in domain-specific ontologies. The comparison of the performance of a publicly available and well-understood coreference resolution system with a domain-adapted system produced results that are consistent with the notion that the requirements for successful coreference resolution in this genre are quite different from those of the general domain, and also suggest that the baseline performance difference is quite large.
A unified framework for image retrieval using keyword and visual features.
Jing, Feng; Li, Mingling; Zhang, Hong-Jiang; Zhang, Bo
2005-07-01
In this paper, a unified image retrieval framework based on both keyword annotations and visual features is proposed. In this framework, a set of statistical models are built based on visual features of a small set of manually labeled images to represent semantic concepts and used to propagate keywords to other unlabeled images. These models are updated periodically when more images implicitly labeled by users become available through relevance feedback. In this sense, the keyword models serve the function of accumulation and memorization of knowledge learned from user-provided relevance feedback. Furthermore, two sets of effective and efficient similarity measures and relevance feedback schemes are proposed for query by keyword scenario and query by image example scenario, respectively. Keyword models are combined with visual features in these schemes. In particular, a new, entropy-based active learning strategy is introduced to improve the efficiency of relevance feedback for query by keyword. Furthermore, a new algorithm is proposed to estimate the keyword features of the search concept for query by image example. It is shown to be more appropriate than two existing relevance feedback algorithms. Experimental results demonstrate the effectiveness of the proposed framework.
Chapman, Wendy W.; Dowling, John N.
2006-01-01
Evaluating automated indexing applications requires comparing automatically indexed terms against manual reference standard annotations. However, there are no standard guidelines for determining which words from a textual document to include in manual annotations, and the vague task can result in substantial variation among manual indexers. We applied grounded theory to emergency department reports to create an annotation schema representing syntactic and semantic variables that could be annotated when indexing clinical conditions. We describe the annotation schema, which includes variables representing medical concepts (e.g., symptom, demographics), linguistic form (e.g., noun, adjective), and modifier types (e.g., anatomic location, severity). We measured the schema’s quality and found: (1) the schema was comprehensive enough to be applied to 20 unseen reports without changes to the schema; (2) agreement between author annotators applying the schema was high, with an F measure of 93%; and (3) an error analysis showed that the authors made complementary errors when applying the schema, demonstrating that the schema incorporates both linguistic and medical expertise. PMID:16230050
A Case Study on Sepsis Using PubMed and Deep Learning for Ontology Learning.
Arguello Casteleiro, Mercedes; Maseda Fernandez, Diego; Demetriou, George; Read, Warren; Fernandez Prieto, Maria Jesus; Des Diz, Julio; Nenadic, Goran; Keane, John; Stevens, Robert
2017-01-01
We investigate the application of distributional semantics models for facilitating unsupervised extraction of biomedical terms from unannotated corpora. Term extraction is used as the first step of an ontology learning process that aims to (semi-)automatic annotation of biomedical concepts and relations from more than 300K PubMed titles and abstracts. We experimented with both traditional distributional semantics methods such as Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) as well as the neural language models CBOW and Skip-gram from Deep Learning. The evaluation conducted concentrates on sepsis, a major life-threatening condition, and shows that Deep Learning models outperform LSA and LDA with much higher precision.
Zhou, Xiangrong; Takayama, Ryosuke; Wang, Song; Hara, Takeshi; Fujita, Hiroshi
2017-10-01
We propose a single network trained by pixel-to-label deep learning to address the general issue of automatic multiple organ segmentation in three-dimensional (3D) computed tomography (CT) images. Our method can be described as a voxel-wise multiple-class classification scheme for automatically assigning labels to each pixel/voxel in a 2D/3D CT image. We simplify the segmentation algorithms of anatomical structures (including multiple organs) in a CT image (generally in 3D) to a majority voting scheme over the semantic segmentation of multiple 2D slices drawn from different viewpoints with redundancy. The proposed method inherits the spirit of fully convolutional networks (FCNs) that consist of "convolution" and "deconvolution" layers for 2D semantic image segmentation, and expands the core structure with 3D-2D-3D transformations to adapt to 3D CT image segmentation. All parameters in the proposed network are trained pixel-to-label from a small number of CT cases with human annotations as the ground truth. The proposed network naturally fulfills the requirements of multiple organ segmentations in CT cases of different sizes that cover arbitrary scan regions without any adjustment. The proposed network was trained and validated using the simultaneous segmentation of 19 anatomical structures in the human torso, including 17 major organs and two special regions (lumen and content inside of stomach). Some of these structures have never been reported in previous research on CT segmentation. A database consisting of 240 (95% for training and 5% for testing) 3D CT scans, together with their manually annotated ground-truth segmentations, was used in our experiments. The results show that the 19 structures of interest were segmented with acceptable accuracy (88.1% and 87.9% voxels in the training and testing datasets, respectively, were labeled correctly) against the ground truth. We propose a single network based on pixel-to-label deep learning to address the challenging issue of anatomical structure segmentation in 3D CT cases. The novelty of this work is the policy of deep learning of the different 2D sectional appearances of 3D anatomical structures for CT cases and the majority voting of the 3D segmentation results from multiple crossed 2D sections to achieve availability and reliability with better efficiency, generality, and flexibility than conventional segmentation methods, which must be guided by human expertise. © 2017 The Authors. Medical Physics published by Wiley Periodicals, Inc. on behalf of American Association of Physicists in Medicine.
Clinical applications of textural analysis in non-small cell lung cancer.
Phillips, Iain; Ajaz, Mazhar; Ezhil, Veni; Prakash, Vineet; Alobaidli, Sheaka; McQuaid, Sarah J; South, Christopher; Scuffham, James; Nisbet, Andrew; Evans, Philip
2018-01-01
Lung cancer is the leading cause of cancer mortality worldwide. Treatment pathways include regular cross-sectional imaging, generating large data sets which present intriguing possibilities for exploitation beyond standard visual interpretation. This additional data mining has been termed "radiomics" and includes semantic and agnostic approaches. Textural analysis (TA) is an example of the latter, and uses a range of mathematically derived features to describe an image or region of an image. Often TA is used to describe a suspected or known tumour. TA is an attractive tool as large existing image sets can be submitted to diverse techniques for data processing, presentation, interpretation and hypothesis testing with annotated clinical outcomes. There is a growing anthology of published data using different TA techniques to differentiate between benign and malignant lung nodules, differentiate tissue subtypes of lung cancer, prognosticate and predict outcome and treatment response, as well as predict treatment side effects and potentially aid radiotherapy planning. The aim of this systematic review is to summarize the current published data and understand the potential future role of TA in managing lung cancer.
An ontology design pattern for surface water features
Sinha, Gaurav; Mark, David; Kolas, Dave; Varanka, Dalia; Romero, Boleslo E.; Feng, Chen-Chieh; Usery, E. Lynn; Liebermann, Joshua; Sorokine, Alexandre
2014-01-01
Surface water is a primary concept of human experience but concepts are captured in cultures and languages in many different ways. Still, many commonalities exist due to the physical basis of many of the properties and categories. An abstract ontology of surface water features based only on those physical properties of landscape features has the best potential for serving as a foundational domain ontology for other more context-dependent ontologies. The Surface Water ontology design pattern was developed both for domain knowledge distillation and to serve as a conceptual building-block for more complex or specialized surface water ontologies. A fundamental distinction is made in this ontology between landscape features that act as containers (e.g., stream channels, basins) and the bodies of water (e.g., rivers, lakes) that occupy those containers. Concave (container) landforms semantics are specified in a Dry module and the semantics of contained bodies of water in a Wet module. The pattern is implemented in OWL, but Description Logic axioms and a detailed explanation is provided in this paper. The OWL ontology will be an important contribution to Semantic Web vocabulary for annotating surface water feature datasets. Also provided is a discussion of why there is a need to complement the pattern with other ontologies, especially the previously developed Surface Network pattern. Finally, the practical value of the pattern in semantic querying of surface water datasets is illustrated through an annotated geospatial dataset and sample queries using the classes of the Surface Water pattern.
Cruz-Roa, Angel; Díaz, Gloria; Romero, Eduardo; González, Fabio A.
2011-01-01
Histopathological images are an important resource for clinical diagnosis and biomedical research. From an image understanding point of view, the automatic annotation of these images is a challenging problem. This paper presents a new method for automatic histopathological image annotation based on three complementary strategies, first, a part-based image representation, called the bag of features, which takes advantage of the natural redundancy of histopathological images for capturing the fundamental patterns of biological structures, second, a latent topic model, based on non-negative matrix factorization, which captures the high-level visual patterns hidden in the image, and, third, a probabilistic annotation model that links visual appearance of morphological and architectural features associated to 10 histopathological image annotations. The method was evaluated using 1,604 annotated images of skin tissues, which included normal and pathological architectural and morphological features, obtaining a recall of 74% and a precision of 50%, which improved a baseline annotation method based on support vector machines in a 64% and 24%, respectively. PMID:22811960
Bigdely-Shamlo, Nima; Cockfield, Jeremy; Makeig, Scott; Rognon, Thomas; La Valle, Chris; Miyakoshi, Makoto; Robbins, Kay A.
2016-01-01
Real-world brain imaging by EEG requires accurate annotation of complex subject-environment interactions in event-rich tasks and paradigms. This paper describes the evolution of the Hierarchical Event Descriptor (HED) system for systematically describing both laboratory and real-world events. HED version 2, first described here, provides the semantic capability of describing a variety of subject and environmental states. HED descriptions can include stimulus presentation events on screen or in virtual worlds, experimental or spontaneous events occurring in the real world environment, and events experienced via one or multiple sensory modalities. Furthermore, HED 2 can distinguish between the mere presence of an object and its actual (or putative) perception by a subject. Although the HED framework has implicit ontological and linked data representations, the user-interface for HED annotation is more intuitive than traditional ontological annotation. We believe that hiding the formal representations allows for a more user-friendly interface, making consistent, detailed tagging of experimental, and real-world events possible for research users. HED is extensible while retaining the advantages of having an enforced common core vocabulary. We have developed a collection of tools to support HED tag assignment and validation; these are available at hedtags.org. A plug-in for EEGLAB (sccn.ucsd.edu/eeglab), CTAGGER, is also available to speed the process of tagging existing studies. PMID:27799907
Semantic Document Model to Enhance Data and Knowledge Interoperability
NASA Astrophysics Data System (ADS)
Nešić, Saša
To enable document data and knowledge to be efficiently shared and reused across application, enterprise, and community boundaries, desktop documents should be completely open and queryable resources, whose data and knowledge are represented in a form understandable to both humans and machines. At the same time, these are the requirements that desktop documents need to satisfy in order to contribute to the visions of the Semantic Web. With the aim of achieving this goal, we have developed the Semantic Document Model (SDM), which turns desktop documents into Semantic Documents as uniquely identified and semantically annotated composite resources, that can be instantiated into human-readable (HR) and machine-processable (MP) forms. In this paper, we present the SDM along with an RDF and ontology-based solution for the MP document instance. Moreover, on top of the proposed model, we have built the Semantic Document Management System (SDMS), which provides a set of services that exploit the model. As an application example that takes advantage of SDMS services, we have extended MS Office with a set of tools that enables users to transform MS Office documents (e.g., MS Word and MS PowerPoint) into Semantic Documents, and to search local and distant semantic document repositories for document content units (CUs) over Semantic Web protocols.
DaGO-Fun: tool for Gene Ontology-based functional analysis using term information content measures
2013-01-01
Background The use of Gene Ontology (GO) data in protein analyses have largely contributed to the improved outcomes of these analyses. Several GO semantic similarity measures have been proposed in recent years and provide tools that allow the integration of biological knowledge embedded in the GO structure into different biological analyses. There is a need for a unified tool that provides the scientific community with the opportunity to explore these different GO similarity measure approaches and their biological applications. Results We have developed DaGO-Fun, an online tool available at http://web.cbio.uct.ac.za/ITGOM, which incorporates many different GO similarity measures for exploring, analyzing and comparing GO terms and proteins within the context of GO. It uses GO data and UniProt proteins with their GO annotations as provided by the Gene Ontology Annotation (GOA) project to precompute GO term information content (IC), enabling rapid response to user queries. Conclusions The DaGO-Fun online tool presents the advantage of integrating all the relevant IC-based GO similarity measures, including topology- and annotation-based approaches to facilitate effective exploration of these measures, thus enabling users to choose the most relevant approach for their application. Furthermore, this tool includes several biological applications related to GO semantic similarity scores, including the retrieval of genes based on their GO annotations, the clustering of functionally related genes within a set, and term enrichment analysis. PMID:24067102
The caBIG annotation and image Markup project.
Channin, David S; Mongkolwat, Pattanasak; Kleper, Vladimir; Sepukar, Kastubh; Rubin, Daniel L
2010-04-01
Image annotation and markup are at the core of medical interpretation in both the clinical and the research setting. Digital medical images are managed with the DICOM standard format. While DICOM contains a large amount of meta-data about whom, where, and how the image was acquired, DICOM says little about the content or meaning of the pixel data. An image annotation is the explanatory or descriptive information about the pixel data of an image that is generated by a human or machine observer. An image markup is the graphical symbols placed over the image to depict an annotation. While DICOM is the standard for medical image acquisition, manipulation, transmission, storage, and display, there are no standards for image annotation and markup. Many systems expect annotation to be reported verbally, while markups are stored in graphical overlays or proprietary formats. This makes it difficult to extract and compute with both of them. The goal of the Annotation and Image Markup (AIM) project is to develop a mechanism, for modeling, capturing, and serializing image annotation and markup data that can be adopted as a standard by the medical imaging community. The AIM project produces both human- and machine-readable artifacts. This paper describes the AIM information model, schemas, software libraries, and tools so as to prepare researchers and developers for their use of AIM.
Large-scale weakly supervised object localization via latent category learning.
Chong Wang; Kaiqi Huang; Weiqiang Ren; Junge Zhang; Maybank, Steve
2015-04-01
Localizing objects in cluttered backgrounds is challenging under large-scale weakly supervised conditions. Due to the cluttered image condition, objects usually have large ambiguity with backgrounds. Besides, there is also a lack of effective algorithm for large-scale weakly supervised localization in cluttered backgrounds. However, backgrounds contain useful latent information, e.g., the sky in the aeroplane class. If this latent information can be learned, object-background ambiguity can be largely reduced and background can be suppressed effectively. In this paper, we propose the latent category learning (LCL) in large-scale cluttered conditions. LCL is an unsupervised learning method which requires only image-level class labels. First, we use the latent semantic analysis with semantic object representation to learn the latent categories, which represent objects, object parts or backgrounds. Second, to determine which category contains the target object, we propose a category selection strategy by evaluating each category's discrimination. Finally, we propose the online LCL for use in large-scale conditions. Evaluation on the challenging PASCAL Visual Object Class (VOC) 2007 and the large-scale imagenet large-scale visual recognition challenge 2013 detection data sets shows that the method can improve the annotation precision by 10% over previous methods. More importantly, we achieve the detection precision which outperforms previous results by a large margin and can be competitive to the supervised deformable part model 5.0 baseline on both data sets.
A Query Integrator and Manager for the Query Web
Brinkley, James F.; Detwiler, Landon T.
2012-01-01
We introduce two concepts: the Query Web as a layer of interconnected queries over the document web and the semantic web, and a Query Web Integrator and Manager (QI) that enables the Query Web to evolve. QI permits users to write, save and reuse queries over any web accessible source, including other queries saved in other installations of QI. The saved queries may be in any language (e.g. SPARQL, XQuery); the only condition for interconnection is that the queries return their results in some form of XML. This condition allows queries to chain off each other, and to be written in whatever language is appropriate for the task. We illustrate the potential use of QI for several biomedical use cases, including ontology view generation using a combination of graph-based and logical approaches, value set generation for clinical data management, image annotation using terminology obtained from an ontology web service, ontology-driven brain imaging data integration, small-scale clinical data integration, and wider-scale clinical data integration. Such use cases illustrate the current range of applications of QI and lead us to speculate about the potential evolution from smaller groups of interconnected queries into a larger query network that layers over the document and semantic web. The resulting Query Web could greatly aid researchers and others who now have to manually navigate through multiple information sources in order to answer specific questions. PMID:22531831
Zhang, Jiongmin; Jia, Ke; Jia, Jinmeng; Qian, Ying
2018-04-27
Comparing and classifying functions of gene products are important in today's biomedical research. The semantic similarity derived from the Gene Ontology (GO) annotation has been regarded as one of the most widely used indicators for protein interaction. Among the various approaches proposed, those based on the vector space model are relatively simple, but their effectiveness is far from satisfying. We propose a Hierarchical Vector Space Model (HVSM) for computing semantic similarity between different genes or their products, which enhances the basic vector space model by introducing the relation between GO terms. Besides the directly annotated terms, HVSM also takes their ancestors and descendants related by "is_a" and "part_of" relations into account. Moreover, HVSM introduces the concept of a Certainty Factor to calibrate the semantic similarity based on the number of terms annotated to genes. To assess the performance of our method, we applied HVSM to Homo sapiens and Saccharomyces cerevisiae protein-protein interaction datasets. Compared with TCSS, Resnik, and other classic similarity measures, HVSM achieved significant improvement for distinguishing positive from negative protein interactions. We also tested its correlation with sequence, EC, and Pfam similarity using online tool CESSM. HVSM showed an improvement of up to 4% compared to TCSS, 8% compared to IntelliGO, 12% compared to basic VSM, 6% compared to Resnik, 8% compared to Lin, 11% compared to Jiang, 8% compared to Schlicker, and 11% compared to SimGIC using AUC scores. CESSM test showed HVSM was comparable to SimGIC, and superior to all other similarity measures in CESSM as well as TCSS. Supplementary information and the software are available at https://github.com/kejia1215/HVSM .
Krumm, Rainer; Dugas, Martin
2016-01-01
Introduction Medical documentation is applied in various settings including patient care and clinical research. Since procedures of medical documentation are heterogeneous and developed further, secondary use of medical data is complicated. Development of medical forms, merging of data from different sources and meta-analyses of different data sets are currently a predominantly manual process and therefore difficult and cumbersome. Available applications to automate these processes are limited. In particular, tools to compare multiple documentation forms are missing. The objective of this work is to design, implement and evaluate the new system ODMSummary for comparison of multiple forms with a high number of semantically annotated data elements and a high level of usability. Methods System requirements are the capability to summarize and compare a set of forms, enable to estimate the documentation effort, track changes in different versions of forms and find comparable items in different forms. Forms are provided in Operational Data Model format with semantic annotations from the Unified Medical Language System. 12 medical experts were invited to participate in a 3-phase evaluation of the tool regarding usability. Results ODMSummary (available at https://odmtoolbox.uni-muenster.de/summary/summary.html) provides a structured overview of multiple forms and their documentation fields. This comparison enables medical experts to assess multiple forms or whole datasets for secondary use. System usability was optimized based on expert feedback. Discussion The evaluation demonstrates that feedback from domain experts is needed to identify usability issues. In conclusion, this work shows that automatic comparison of multiple forms is feasible and the results are usable for medical experts. PMID:27736972
2014-01-06
products derived from this funding. This includes two proposed activities for Summer 2014: • Deep Semantic Annotation with Shallow Methods; James... process that we need to ensure that words are unambiguous before we read them (present in just the semantic field that is presently active). Publication...Technical Report). MIT Artificial Intelligence Laboratory. Allen, J., Manshadi, M., Dzikovska, M., & Swift, M. (2007). Deep linguistic processing for
GeneView: a comprehensive semantic search engine for PubMed.
Thomas, Philippe; Starlinger, Johannes; Vowinkel, Alexander; Arzt, Sebastian; Leser, Ulf
2012-07-01
Research results are primarily published in scientific literature and curation efforts cannot keep up with the rapid growth of published literature. The plethora of knowledge remains hidden in large text repositories like MEDLINE. Consequently, life scientists have to spend a great amount of time searching for specific information. The enormous ambiguity among most names of biomedical objects such as genes, chemicals and diseases often produces too large and unspecific search results. We present GeneView, a semantic search engine for biomedical knowledge. GeneView is built upon a comprehensively annotated version of PubMed abstracts and openly available PubMed Central full texts. This semi-structured representation of biomedical texts enables a number of features extending classical search engines. For instance, users may search for entities using unique database identifiers or they may rank documents by the number of specific mentions they contain. Annotation is performed by a multitude of state-of-the-art text-mining tools for recognizing mentions from 10 entity classes and for identifying protein-protein interactions. GeneView currently contains annotations for >194 million entities from 10 classes for ∼21 million citations with 271,000 full text bodies. GeneView can be searched at http://bc3.informatik.hu-berlin.de/.
Sun, Shulei; Chen, Jing; Li, Weizhong; Altintas, Ilkay; Lin, Abel; Peltier, Steve; Stocks, Karen; Allen, Eric E.; Ellisman, Mark; Grethe, Jeffrey; Wooley, John
2011-01-01
The Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis (CAMERA, http://camera.calit2.net/) is a database and associated computational infrastructure that provides a single system for depositing, locating, analyzing, visualizing and sharing data about microbial biology through an advanced web-based analysis portal. CAMERA collects and links metadata relevant to environmental metagenome data sets with annotation in a semantically-aware environment allowing users to write expressive semantic queries against the database. To meet the needs of the research community, users are able to query metadata categories such as habitat, sample type, time, location and other environmental physicochemical parameters. CAMERA is compliant with the standards promulgated by the Genomic Standards Consortium (GSC), and sustains a role within the GSC in extending standards for content and format of the metagenomic data and metadata and its submission to the CAMERA repository. To ensure wide, ready access to data and annotation, CAMERA also provides data submission tools to allow researchers to share and forward data to other metagenomics sites and community data archives such as GenBank. It has multiple interfaces for easy submission of large or complex data sets, and supports pre-registration of samples for sequencing. CAMERA integrates a growing list of tools and viewers for querying, analyzing, annotating and comparing metagenome and genome data. PMID:21045053
Sun, Shulei; Chen, Jing; Li, Weizhong; Altintas, Ilkay; Lin, Abel; Peltier, Steve; Stocks, Karen; Allen, Eric E; Ellisman, Mark; Grethe, Jeffrey; Wooley, John
2011-01-01
The Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis (CAMERA, http://camera.calit2.net/) is a database and associated computational infrastructure that provides a single system for depositing, locating, analyzing, visualizing and sharing data about microbial biology through an advanced web-based analysis portal. CAMERA collects and links metadata relevant to environmental metagenome data sets with annotation in a semantically-aware environment allowing users to write expressive semantic queries against the database. To meet the needs of the research community, users are able to query metadata categories such as habitat, sample type, time, location and other environmental physicochemical parameters. CAMERA is compliant with the standards promulgated by the Genomic Standards Consortium (GSC), and sustains a role within the GSC in extending standards for content and format of the metagenomic data and metadata and its submission to the CAMERA repository. To ensure wide, ready access to data and annotation, CAMERA also provides data submission tools to allow researchers to share and forward data to other metagenomics sites and community data archives such as GenBank. It has multiple interfaces for easy submission of large or complex data sets, and supports pre-registration of samples for sequencing. CAMERA integrates a growing list of tools and viewers for querying, analyzing, annotating and comparing metagenome and genome data.
Automated UMLS-Based Comparison of Medical Forms
Dugas, Martin; Fritz, Fleur; Krumm, Rainer; Breil, Bernhard
2013-01-01
Medical forms are very heterogeneous: on a European scale there are thousands of data items in several hundred different systems. To enable data exchange for clinical care and research purposes there is a need to develop interoperable documentation systems with harmonized forms for data capture. A prerequisite in this harmonization process is comparison of forms. So far – to our knowledge – an automated method for comparison of medical forms is not available. A form contains a list of data items with corresponding medical concepts. An automatic comparison needs data types, item names and especially item with these unique concept codes from medical terminologies. The scope of the proposed method is a comparison of these items by comparing their concept codes (coded in UMLS). Each data item is represented by item name, concept code and value domain. Two items are called identical, if item name, concept code and value domain are the same. Two items are called matching, if only concept code and value domain are the same. Two items are called similar, if their concept codes are the same, but the value domains are different. Based on these definitions an open-source implementation for automated comparison of medical forms in ODM format with UMLS-based semantic annotations was developed. It is available as package compareODM from http://cran.r-project.org. To evaluate this method, it was applied to a set of 7 real medical forms with 285 data items from a large public ODM repository with forms for different medical purposes (research, quality management, routine care). Comparison results were visualized with grid images and dendrograms. Automated comparison of semantically annotated medical forms is feasible. Dendrograms allow a view on clustered similar forms. The approach is scalable for a large set of real medical forms. PMID:23861827
Semantics-enabled service discovery framework in the SIMDAT pharma grid.
Qu, Cangtao; Zimmermann, Falk; Kumpf, Kai; Kamuzinzi, Richard; Ledent, Valérie; Herzog, Robert
2008-03-01
We present the design and implementation of a semantics-enabled service discovery framework in the data Grids for process and product development using numerical simulation and knowledge discovery (SIMDAT) Pharma Grid, an industry-oriented Grid environment for integrating thousands of Grid-enabled biological data services and analysis services. The framework consists of three major components: the Web ontology language (OWL)-description logic (DL)-based biological domain ontology, OWL Web service ontology (OWL-S)-based service annotation, and semantic matchmaker based on the ontology reasoning. Built upon the framework, workflow technologies are extensively exploited in the SIMDAT to assist biologists in (semi)automatically performing in silico experiments. We present a typical usage scenario through the case study of a biological workflow: IXodus.
Jiang, Guoqian; Wang, Chen; Zhu, Qian; Chute, Christopher G
2013-01-01
Knowledge-driven text mining is becoming an important research area for identifying pharmacogenomics target genes. However, few of such studies have been focused on the pharmacogenomics targets of adverse drug events (ADEs). The objective of the present study is to build a framework of knowledge integration and discovery that aims to support pharmacogenomics target predication of ADEs. We integrate a semantically annotated literature corpus Semantic MEDLINE with a semantically coded ADE knowledgebase known as ADEpedia using a semantic web based framework. We developed a knowledge discovery approach combining a network analysis of a protein-protein interaction (PPI) network and a gene functional classification approach. We performed a case study of drug-induced long QT syndrome for demonstrating the usefulness of the framework in predicting potential pharmacogenomics targets of ADEs.
Lekschas, Fritz; Stachelscheid, Harald; Seltmann, Stefanie; Kurtz, Andreas
2015-03-01
Advancing technologies generate large amounts of molecular and phenotypic data on cells, tissues and organisms, leading to an ever-growing detail and complexity while information retrieval and analysis becomes increasingly time-consuming. The Semantic Body Browser is a web application for intuitively exploring the body of an organism from the organ to the subcellular level and visualising expression profiles by means of semantically annotated anatomical illustrations. It is used to comprehend biological and medical data related to the different body structures while relying on the strong pattern recognition capabilities of human users. The Semantic Body Browser is a JavaScript web application that is freely available at http://sbb.cellfinder.org. The source code is provided on https://github.com/flekschas/sbb. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
A System for the Semantic Multimodal Analysis of News Audio-Visual Content
NASA Astrophysics Data System (ADS)
Mezaris, Vasileios; Gidaros, Spyros; Papadopoulos, GeorgiosTh; Kasper, Walter; Steffen, Jörg; Ordelman, Roeland; Huijbregts, Marijn; de Jong, Franciska; Kompatsiaris, Ioannis; Strintzis, MichaelG
2010-12-01
News-related content is nowadays among the most popular types of content for users in everyday applications. Although the generation and distribution of news content has become commonplace, due to the availability of inexpensive media capturing devices and the development of media sharing services targeting both professional and user-generated news content, the automatic analysis and annotation that is required for supporting intelligent search and delivery of this content remains an open issue. In this paper, a complete architecture for knowledge-assisted multimodal analysis of news-related multimedia content is presented, along with its constituent components. The proposed analysis architecture employs state-of-the-art methods for the analysis of each individual modality (visual, audio, text) separately and proposes a novel fusion technique based on the particular characteristics of news-related content for the combination of the individual modality analysis results. Experimental results on news broadcast video illustrate the usefulness of the proposed techniques in the automatic generation of semantic annotations.
DOSE: an R/Bioconductor package for disease ontology semantic and enrichment analysis.
Yu, Guangchuang; Wang, Li-Gen; Yan, Guang-Rong; He, Qing-Yu
2015-02-15
Disease ontology (DO) annotates human genes in the context of disease. DO is important annotation in translating molecular findings from high-throughput data to clinical relevance. DOSE is an R package providing semantic similarity computations among DO terms and genes which allows biologists to explore the similarities of diseases and of gene functions in disease perspective. Enrichment analyses including hypergeometric model and gene set enrichment analysis are also implemented to support discovering disease associations of high-throughput biological data. This allows biologists to verify disease relevance in a biological experiment and identify unexpected disease associations. Comparison among gene clusters is also supported. DOSE is released under Artistic-2.0 License. The source code and documents are freely available through Bioconductor (http://www.bioconductor.org/packages/release/bioc/html/DOSE.html). Supplementary data are available at Bioinformatics online. gcyu@connect.hku.hk or tqyhe@jnu.edu.cn. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Rassinoux, A-M
2011-01-01
To summarize excellent current research in the field of knowledge representation and management (KRM). A synopsis of the articles selected for the IMIA Yearbook 2011 is provided and an attempt to highlight the current trends in the field is sketched. This last decade, with the extension of the text-based web towards a semantic-structured web, NLP techniques have experienced a renewed interest in knowledge extraction. This trend is corroborated through the five papers selected for the KRM section of the Yearbook 2011. They all depict outstanding studies that exploit NLP technologies whenever possible in order to accurately extract meaningful information from various biomedical textual sources. Bringing semantic structure to the meaningful content of textual web pages affords the user with cooperative sharing and intelligent finding of electronic data. As exemplified by the best paper selection, more and more advanced biomedical applications aim at exploiting the meaningful richness of free-text documents in order to generate semantic metadata and recently to learn and populate domain ontologies. These later are becoming a key piece as they allow portraying the semantics of the Semantic Web content. Maintaining their consistency with documents and semantic annotations that refer to them is a crucial challenge of the Semantic Web for the coming years.
Contextually guided very-high-resolution imagery classification with semantic segments
NASA Astrophysics Data System (ADS)
Zhao, Wenzhi; Du, Shihong; Wang, Qiao; Emery, William J.
2017-10-01
Contextual information, revealing relationships and dependencies between image objects, is one of the most important information for the successful interpretation of very-high-resolution (VHR) remote sensing imagery. Over the last decade, geographic object-based image analysis (GEOBIA) technique has been widely used to first divide images into homogeneous parts, and then to assign semantic labels according to the properties of image segments. However, due to the complexity and heterogeneity of VHR images, segments without semantic labels (i.e., semantic-free segments) generated with low-level features often fail to represent geographic entities (such as building roofs usually be partitioned into chimney/antenna/shadow parts). As a result, it is hard to capture contextual information across geographic entities when using semantic-free segments. In contrast to low-level features, "deep" features can be used to build robust segments with accurate labels (i.e., semantic segments) in order to represent geographic entities at higher levels. Based on these semantic segments, semantic graphs can be constructed to capture contextual information in VHR images. In this paper, semantic segments were first explored with convolutional neural networks (CNN) and a conditional random field (CRF) model was then applied to model the contextual information between semantic segments. Experimental results on two challenging VHR datasets (i.e., the Vaihingen and Beijing scenes) indicate that the proposed method is an improvement over existing image classification techniques in classification performance (overall accuracy ranges from 82% to 96%).
The Human Phenotype Ontology: Semantic Unification of Common and Rare Disease
Groza, Tudor; Köhler, Sebastian; Moldenhauer, Dawid; Vasilevsky, Nicole; Baynam, Gareth; Zemojtel, Tomasz; Schriml, Lynn Marie; Kibbe, Warren Alden; Schofield, Paul N.; Beck, Tim; Vasant, Drashtti; Brookes, Anthony J.; Zankl, Andreas; Washington, Nicole L.; Mungall, Christopher J.; Lewis, Suzanna E.; Haendel, Melissa A.; Parkinson, Helen; Robinson, Peter N.
2015-01-01
The Human Phenotype Ontology (HPO) is widely used in the rare disease community for differential diagnostics, phenotype-driven analysis of next-generation sequence-variation data, and translational research, but a comparable resource has not been available for common disease. Here, we have developed a concept-recognition procedure that analyzes the frequencies of HPO disease annotations as identified in over five million PubMed abstracts by employing an iterative procedure to optimize precision and recall of the identified terms. We derived disease models for 3,145 common human diseases comprising a total of 132,006 HPO annotations. The HPO now comprises over 250,000 phenotypic annotations for over 10,000 rare and common diseases and can be used for examining the phenotypic overlap among common diseases that share risk alleles, as well as between Mendelian diseases and common diseases linked by genomic location. The annotations, as well as the HPO itself, are freely available. PMID:26119816
Impact of ontology evolution on functional analyses.
Groß, Anika; Hartung, Michael; Prüfer, Kay; Kelso, Janet; Rahm, Erhard
2012-10-15
Ontologies are used in the annotation and analysis of biological data. As knowledge accumulates, ontologies and annotation undergo constant modifications to reflect this new knowledge. These modifications may influence the results of statistical applications such as functional enrichment analyses that describe experimental data in terms of ontological groupings. Here, we investigate to what degree modifications of the Gene Ontology (GO) impact these statistical analyses for both experimental and simulated data. The analysis is based on new measures for the stability of result sets and considers different ontology and annotation changes. Our results show that past changes in the GO are non-uniformly distributed over different branches of the ontology. Considering the semantic relatedness of significant categories in analysis results allows a more realistic stability assessment for functional enrichment studies. We observe that the results of term-enrichment analyses tend to be surprisingly stable despite changes in ontology and annotation.
Computer systems for annotation of single molecule fragments
Schwartz, David Charles; Severin, Jessica
2016-07-19
There are provided computer systems for visualizing and annotating single molecule images. Annotation systems in accordance with this disclosure allow a user to mark and annotate single molecules of interest and their restriction enzyme cut sites thereby determining the restriction fragments of single nucleic acid molecules. The markings and annotations may be automatically generated by the system in certain embodiments and they may be overlaid translucently onto the single molecule images. An image caching system may be implemented in the computer annotation systems to reduce image processing time. The annotation systems include one or more connectors connecting to one or more databases capable of storing single molecule data as well as other biomedical data. Such diverse array of data can be retrieved and used to validate the markings and annotations. The annotation systems may be implemented and deployed over a computer network. They may be ergonomically optimized to facilitate user interactions.
Image segmentation via foreground and background semantic descriptors
NASA Astrophysics Data System (ADS)
Yuan, Ding; Qiang, Jingjing; Yin, Jihao
2017-09-01
In the field of image processing, it has been a challenging task to obtain a complete foreground that is not uniform in color or texture. Unlike other methods, which segment the image by only using low-level features, we present a segmentation framework, in which high-level visual features, such as semantic information, are used. First, the initial semantic labels were obtained by using the nonparametric method. Then, a subset of the training images, with a similar foreground to the input image, was selected. Consequently, the semantic labels could be further refined according to the subset. Finally, the input image was segmented by integrating the object affinity and refined semantic labels. State-of-the-art performance was achieved in experiments with the challenging MSRC 21 dataset.
A Bloom Filter-Powered Technique Supporting Scalable Semantic Discovery in Data Service Networks
NASA Astrophysics Data System (ADS)
Zhang, J.; Shi, R.; Bao, Q.; Lee, T. J.; Ramachandran, R.
2016-12-01
More and more Earth data analytics software products are published onto the Internet as a service, in the format of either heavyweight WSDL service or lightweight RESTful API. Such reusable data analytics services form a data service network, which allows Earth scientists to compose (mashup) services into value-added ones. Therefore, it is important to have a technique that is capable of helping Earth scientists quickly identify appropriate candidate datasets and services in the global data service network. Most existing services discovery techniques, however, mainly rely on syntax or semantics-based service matchmaking between service requests and available services. Since the scale of the data service network is increasing rapidly, the run-time computational cost will soon become a bottleneck. To address this issue, this project presents a way of applying network routing mechanism to facilitate data service discovery in a service network, featuring scalability and performance. Earth data services are automatically annotated in Web Ontology Language for Services (OWL-S) based on their metadata, semantic information, and usage history. Deterministic Annealing (DA) technique is applied to dynamically organize annotated data services into a hierarchical network, where virtual routers are created to represent semantic local network featuring leading terms. Afterwards Bloom Filters are generated over virtual routers. A data service search request is transformed into a network routing problem in order to quickly locate candidate services through network hierarchy. A neural network-powered technique is applied to assure network address encoding and routing performance. A series of empirical study has been conducted to evaluate the applicability and effectiveness of the proposed approach.
Beijbom, Oscar; Edmunds, Peter J.; Roelfsema, Chris; Smith, Jennifer; Kline, David I.; Neal, Benjamin P.; Dunlap, Matthew J.; Moriarty, Vincent; Fan, Tung-Yung; Tan, Chih-Jui; Chan, Stephen; Treibitz, Tali; Gamst, Anthony; Mitchell, B. Greg; Kriegman, David
2015-01-01
Global climate change and other anthropogenic stressors have heightened the need to rapidly characterize ecological changes in marine benthic communities across large scales. Digital photography enables rapid collection of survey images to meet this need, but the subsequent image annotation is typically a time consuming, manual task. We investigated the feasibility of using automated point-annotation to expedite cover estimation of the 17 dominant benthic categories from survey-images captured at four Pacific coral reefs. Inter- and intra- annotator variability among six human experts was quantified and compared to semi- and fully- automated annotation methods, which are made available at coralnet.ucsd.edu. Our results indicate high expert agreement for identification of coral genera, but lower agreement for algal functional groups, in particular between turf algae and crustose coralline algae. This indicates the need for unequivocal definitions of algal groups, careful training of multiple annotators, and enhanced imaging technology. Semi-automated annotation, where 50% of the annotation decisions were performed automatically, yielded cover estimate errors comparable to those of the human experts. Furthermore, fully-automated annotation yielded rapid, unbiased cover estimates but with increased variance. These results show that automated annotation can increase spatial coverage and decrease time and financial outlay for image-based reef surveys. PMID:26154157
Learning to merge: a new tool for interactive mapping
NASA Astrophysics Data System (ADS)
Porter, Reid B.; Lundquist, Sheng; Ruggiero, Christy
2013-05-01
The task of turning raw imagery into semantically meaningful maps and overlays is a key area of remote sensing activity. Image analysts, in applications ranging from environmental monitoring to intelligence, use imagery to generate and update maps of terrain, vegetation, road networks, buildings and other relevant features. Often these tasks can be cast as a pixel labeling problem, and several interactive pixel labeling tools have been developed. These tools exploit training data, which is generated by analysts using simple and intuitive paint-program annotation tools, in order to tailor the labeling algorithm for the particular dataset and task. In other cases, the task is best cast as a pixel segmentation problem. Interactive pixel segmentation tools have also been developed, but these tools typically do not learn from training data like the pixel labeling tools do. In this paper we investigate tools for interactive pixel segmentation that also learn from user input. The input has the form of segment merging (or grouping). Merging examples are 1) easily obtained from analysts using vector annotation tools, and 2) more challenging to exploit than traditional labels. We outline the key issues in developing these interactive merging tools, and describe their application to remote sensing.
Young, Nelson; Chang, Zhan; Wishart, David S
2004-04-12
GelScape is a web-based tool that permits facile, interactive annotation, comparison, manipulation and storage of protein gel images. It uses Java applet-servlet technology to allow rapid, remote image handling and image processing in a platform-independent manner. It supports many of the features found in commercial, stand-alone gel analysis software including spot annotation, spot integration, gel warping, image resizing, HTML image mapping, image overlaying as well as the storage of gel image and gel annotation data in compliance with Federated Gel Database requirements.
Multi-Atlas Segmentation using Partially Annotated Data: Methods and Annotation Strategies.
Koch, Lisa M; Rajchl, Martin; Bai, Wenjia; Baumgartner, Christian F; Tong, Tong; Passerat-Palmbach, Jonathan; Aljabar, Paul; Rueckert, Daniel
2017-08-22
Multi-atlas segmentation is a widely used tool in medical image analysis, providing robust and accurate results by learning from annotated atlas datasets. However, the availability of fully annotated atlas images for training is limited due to the time required for the labelling task. Segmentation methods requiring only a proportion of each atlas image to be labelled could therefore reduce the workload on expert raters tasked with annotating atlas images. To address this issue, we first re-examine the labelling problem common in many existing approaches and formulate its solution in terms of a Markov Random Field energy minimisation problem on a graph connecting atlases and the target image. This provides a unifying framework for multi-atlas segmentation. We then show how modifications in the graph configuration of the proposed framework enable the use of partially annotated atlas images and investigate different partial annotation strategies. The proposed method was evaluated on two Magnetic Resonance Imaging (MRI) datasets for hippocampal and cardiac segmentation. Experiments were performed aimed at (1) recreating existing segmentation techniques with the proposed framework and (2) demonstrating the potential of employing sparsely annotated atlas data for multi-atlas segmentation.
Recognizable or Not: Towards Image Semantic Quality Assessment for Compression
NASA Astrophysics Data System (ADS)
Liu, Dong; Wang, Dandan; Li, Houqiang
2017-12-01
Traditionally, image compression was optimized for the pixel-wise fidelity or the perceptual quality of the compressed images given a bit-rate budget. But recently, compressed images are more and more utilized for automatic semantic analysis tasks such as recognition and retrieval. For these tasks, we argue that the optimization target of compression is no longer perceptual quality, but the utility of the compressed images in the given automatic semantic analysis task. Accordingly, we propose to evaluate the quality of the compressed images neither at pixel level nor at perceptual level, but at semantic level. In this paper, we make preliminary efforts towards image semantic quality assessment (ISQA), focusing on the task of optical character recognition (OCR) from compressed images. We propose a full-reference ISQA measure by comparing the features extracted from text regions of original and compressed images. We then propose to integrate the ISQA measure into an image compression scheme. Experimental results show that our proposed ISQA measure is much better than PSNR and SSIM in evaluating the semantic quality of compressed images; accordingly, adopting our ISQA measure to optimize compression for OCR leads to significant bit-rate saving compared to using PSNR or SSIM. Moreover, we perform subjective test about text recognition from compressed images, and observe that our ISQA measure has high consistency with subjective recognizability. Our work explores new dimensions in image quality assessment, and demonstrates promising direction to achieve higher compression ratio for specific semantic analysis tasks.
Managing and Querying Image Annotation and Markup in XML.
Wang, Fusheng; Pan, Tony; Sharma, Ashish; Saltz, Joel
2010-01-01
Proprietary approaches for representing annotations and image markup are serious barriers for researchers to share image data and knowledge. The Annotation and Image Markup (AIM) project is developing a standard based information model for image annotation and markup in health care and clinical trial environments. The complex hierarchical structures of AIM data model pose new challenges for managing such data in terms of performance and support of complex queries. In this paper, we present our work on managing AIM data through a native XML approach, and supporting complex image and annotation queries through native extension of XQuery language. Through integration with xService, AIM databases can now be conveniently shared through caGrid.
Managing and Querying Image Annotation and Markup in XML
Wang, Fusheng; Pan, Tony; Sharma, Ashish; Saltz, Joel
2010-01-01
Proprietary approaches for representing annotations and image markup are serious barriers for researchers to share image data and knowledge. The Annotation and Image Markup (AIM) project is developing a standard based information model for image annotation and markup in health care and clinical trial environments. The complex hierarchical structures of AIM data model pose new challenges for managing such data in terms of performance and support of complex queries. In this paper, we present our work on managing AIM data through a native XML approach, and supporting complex image and annotation queries through native extension of XQuery language. Through integration with xService, AIM databases can now be conveniently shared through caGrid. PMID:21218167
Palmiero, Massimiliano; Di Matteo, Rosalia; Belardinelli, Marta Olivetti
2014-05-01
Two experiments comparing imaginative processing in different modalities and semantic processing were carried out to investigate the issue of whether conceptual knowledge can be represented in different format. Participants were asked to judge the similarity between visual images, auditory images, and olfactory images in the imaginative block, if two items belonged to the same category in the semantic block. Items were verbally cued in both experiments. The degree of similarity between the imaginative and semantic items was changed across experiments. Experiment 1 showed that the semantic processing was faster than the visual and the auditory imaginative processing, whereas no differentiation was possible between the semantic processing and the olfactory imaginative processing. Experiment 2 revealed that only the visual imaginative processing could be differentiated from the semantic processing in terms of accuracy. These results showed that the visual and auditory imaginative processing can be differentiated from the semantic processing, although both visual and auditory images strongly rely on semantic representations. On the contrary, no differentiation is possible within the olfactory domain. Results are discussed in the frame of the imagery debate.
A Gene Ontology Tutorial in Python.
Vesztrocy, Alex Warwick; Dessimoz, Christophe
2017-01-01
This chapter is a tutorial on using Gene Ontology resources in the Python programming language. This entails querying the Gene Ontology graph, retrieving Gene Ontology annotations, performing gene enrichment analyses, and computing basic semantic similarity between GO terms. An interactive version of the tutorial, including solutions, is available at http://gohandbook.org .
Semantic Annotation of Ubiquitous Learning Environments
ERIC Educational Resources Information Center
Weal, M. J.; Michaelides, D. T.; Page, K.; De Roure, D. C.; Monger, E.; Gobbi, M.
2012-01-01
Skills-based learning environments are used to promote the acquisition of practical skills as well as decision making, communication, and problem solving. It is important to provide feedback to the students from these sessions and observations of their actions may inform the assessment process and help researchers to better understand the learning…
Target-Oriented High-Resolution SAR Image Formation via Semantic Information Guided Regularizations
NASA Astrophysics Data System (ADS)
Hou, Biao; Wen, Zaidao; Jiao, Licheng; Wu, Qian
2018-04-01
Sparsity-regularized synthetic aperture radar (SAR) imaging framework has shown its remarkable performance to generate a feature enhanced high resolution image, in which a sparsity-inducing regularizer is involved by exploiting the sparsity priors of some visual features in the underlying image. However, since the simple prior of low level features are insufficient to describe different semantic contents in the image, this type of regularizer will be incapable of distinguishing between the target of interest and unconcerned background clutters. As a consequence, the features belonging to the target and clutters are simultaneously affected in the generated image without concerning their underlying semantic labels. To address this problem, we propose a novel semantic information guided framework for target oriented SAR image formation, which aims at enhancing the interested target scatters while suppressing the background clutters. Firstly, we develop a new semantics-specific regularizer for image formation by exploiting the statistical properties of different semantic categories in a target scene SAR image. In order to infer the semantic label for each pixel in an unsupervised way, we moreover induce a novel high-level prior-driven regularizer and some semantic causal rules from the prior knowledge. Finally, our regularized framework for image formation is further derived as a simple iteratively reweighted $\\ell_1$ minimization problem which can be conveniently solved by many off-the-shelf solvers. Experimental results demonstrate the effectiveness and superiority of our framework for SAR image formation in terms of target enhancement and clutters suppression, compared with the state of the arts. Additionally, the proposed framework opens a new direction of devoting some machine learning strategies to image formation, which can benefit the subsequent decision making tasks.
Assessment of disease named entity recognition on a corpus of annotated sentences.
Jimeno, Antonio; Jimenez-Ruiz, Ernesto; Lee, Vivian; Gaudan, Sylvain; Berlanga, Rafael; Rebholz-Schuhmann, Dietrich
2008-04-11
In recent years, the recognition of semantic types from the biomedical scientific literature has been focused on named entities like protein and gene names (PGNs) and gene ontology terms (GO terms). Other semantic types like diseases have not received the same level of attention. Different solutions have been proposed to identify disease named entities in the scientific literature. While matching the terminology with language patterns suffers from low recall (e.g., Whatizit) other solutions make use of morpho-syntactic features to better cover the full scope of terminological variability (e.g., MetaMap). Currently, MetaMap that is provided from the National Library of Medicine (NLM) is the state of the art solution for the annotation of concepts from UMLS (Unified Medical Language System) in the literature. Nonetheless, its performance has not yet been assessed on an annotated corpus. In addition, little effort has been invested so far to generate an annotated dataset that links disease entities in text to disease entries in a database, thesaurus or ontology and that could serve as a gold standard to benchmark text mining solutions. As part of our research work, we have taken a corpus that has been delivered in the past for the identification of associations of genes to diseases based on the UMLS Metathesaurus and we have reprocessed and re-annotated the corpus. We have gathered annotations for disease entities from two curators, analyzed their disagreement (0.51 in the kappa-statistic) and composed a single annotated corpus for public use. Thereafter, three solutions for disease named entity recognition including MetaMap have been applied to the corpus to automatically annotate it with UMLS Metathesaurus concepts. The resulting annotations have been benchmarked to compare their performance. The annotated corpus is publicly available at ftp://ftp.ebi.ac.uk/pub/software/textmining/corpora/diseases and can serve as a benchmark to other systems. In addition, we found that dictionary look-up already provides competitive results indicating that the use of disease terminology is highly standardized throughout the terminologies and the literature. MetaMap generates precise results at the expense of insufficient recall while our statistical method obtains better recall at a lower precision rate. Even better results in terms of precision are achieved by combining at least two of the three methods leading, but this approach again lowers recall. Altogether, our analysis gives a better understanding of the complexity of disease annotations in the literature. MetaMap and the dictionary based approach are available through the Whatizit web service infrastructure (Rebholz-Schuhmann D, Arregui M, Gaudan S, Kirsch H, Jimeno A: Text processing through Web services: Calling Whatizit. Bioinformatics 2008, 24:296-298).
ERIC Educational Resources Information Center
Kurtz, Michael J.; Eichorn, Guenther; Accomazzi, Alberto; Grant, Carolyn S.; Demleitner, Markus; Murray, Stephen S.; Jones, Michael L. W.; Gay, Geri K.; Rieger, Robert H.; Millman, David; Bruggemann-Klein, Anne; Klein, Rolf; Landgraf, Britta; Wang, James Ze; Li, Jia; Chan, Desmond; Wiederhold, Gio; Pitti, Daniel V.
1999-01-01
Includes six articles that discuss a digital library for astronomy; comparing evaluations of digital collection efforts; cross-organizational access management of Web-based resources; searching scientific bibliographic databases based on content-based relations between documents; semantics-sensitive retrieval for digital picture libraries; and…
Lifting Events in RDF from Interactions with Annotated Web Pages
NASA Astrophysics Data System (ADS)
Stühmer, Roland; Anicic, Darko; Sen, Sinan; Ma, Jun; Schmidt, Kay-Uwe; Stojanovic, Nenad
In this paper we present a method and an implementation for creating and processing semantic events from interaction with Web pages which opens possibilities to build event-driven applications for the (Semantic) Web. Events, simple or complex, are models for things that happen e.g., when a user interacts with a Web page. Events are consumed in some meaningful way e.g., for monitoring reasons or to trigger actions such as responses. In order for receiving parties to understand events e.g., comprehend what has led to an event, we propose a general event schema using RDFS. In this schema we cover the composition of complex events and event-to-event relationships. These events can then be used to route semantic information about an occurrence to different recipients helping in making the Semantic Web active. Additionally, we present an architecture for detecting and composing events in Web clients. For the contents of events we show a way of how they are enriched with semantic information about the context in which they occurred. The paper is presented in conjunction with the use case of Semantic Advertising, which extends traditional clickstream analysis by introducing semantic short-term profiling, enabling discovery of the current interest of a Web user and therefore supporting advertisement providers in responding with more relevant advertisements.
Goede, Patricia A.; Lauman, Jason R.; Cochella, Christopher; Katzman, Gregory L.; Morton, David A.; Albertine, Kurt H.
2004-01-01
Use of digital medical images has become common over the last several years, coincident with the release of inexpensive, mega-pixel quality digital cameras and the transition to digital radiology operation by hospitals. One problem that clinicians, medical educators, and basic scientists encounter when handling images is the difficulty of using business and graphic arts commercial-off-the-shelf (COTS) software in multicontext authoring and interactive teaching environments. The authors investigated and developed software-supported methodologies to help clinicians, medical educators, and basic scientists become more efficient and effective in their digital imaging environments. The software that the authors developed provides the ability to annotate images based on a multispecialty methodology for annotation and visual knowledge representation. This annotation methodology is designed by consensus, with contributions from the authors and physicians, medical educators, and basic scientists in the Departments of Radiology, Neurobiology and Anatomy, Dermatology, and Ophthalmology at the University of Utah. The annotation methodology functions as a foundation for creating, using, reusing, and extending dynamic annotations in a context-appropriate, interactive digital environment. The annotation methodology supports the authoring process as well as output and presentation mechanisms. The annotation methodology is the foundation for a Windows implementation that allows annotated elements to be represented as structured eXtensible Markup Language and stored separate from the image(s). PMID:14527971
Semantic framework for mapping object-oriented model to semantic web languages
Ježek, Petr; Mouček, Roman
2015-01-01
The article deals with and discusses two main approaches in building semantic structures for electrophysiological metadata. It is the use of conventional data structures, repositories, and programming languages on one hand and the use of formal representations of ontologies, known from knowledge representation, such as description logics or semantic web languages on the other hand. Although knowledge engineering offers languages supporting richer semantic means of expression and technological advanced approaches, conventional data structures and repositories are still popular among developers, administrators and users because of their simplicity, overall intelligibility, and lower demands on technical equipment. The choice of conventional data resources and repositories, however, raises the question of how and where to add semantics that cannot be naturally expressed using them. As one of the possible solutions, this semantics can be added into the structures of the programming language that accesses and processes the underlying data. To support this idea we introduced a software prototype that enables its users to add semantically richer expressions into a Java object-oriented code. This approach does not burden users with additional demands on programming environment since reflective Java annotations were used as an entry for these expressions. Moreover, additional semantics need not to be written by the programmer directly to the code, but it can be collected from non-programmers using a graphic user interface. The mapping that allows the transformation of the semantically enriched Java code into the Semantic Web language OWL was proposed and implemented in a library named the Semantic Framework. This approach was validated by the integration of the Semantic Framework in the EEG/ERP Portal and by the subsequent registration of the EEG/ERP Portal in the Neuroscience Information Framework. PMID:25762923
Semantic framework for mapping object-oriented model to semantic web languages.
Ježek, Petr; Mouček, Roman
2015-01-01
The article deals with and discusses two main approaches in building semantic structures for electrophysiological metadata. It is the use of conventional data structures, repositories, and programming languages on one hand and the use of formal representations of ontologies, known from knowledge representation, such as description logics or semantic web languages on the other hand. Although knowledge engineering offers languages supporting richer semantic means of expression and technological advanced approaches, conventional data structures and repositories are still popular among developers, administrators and users because of their simplicity, overall intelligibility, and lower demands on technical equipment. The choice of conventional data resources and repositories, however, raises the question of how and where to add semantics that cannot be naturally expressed using them. As one of the possible solutions, this semantics can be added into the structures of the programming language that accesses and processes the underlying data. To support this idea we introduced a software prototype that enables its users to add semantically richer expressions into a Java object-oriented code. This approach does not burden users with additional demands on programming environment since reflective Java annotations were used as an entry for these expressions. Moreover, additional semantics need not to be written by the programmer directly to the code, but it can be collected from non-programmers using a graphic user interface. The mapping that allows the transformation of the semantically enriched Java code into the Semantic Web language OWL was proposed and implemented in a library named the Semantic Framework. This approach was validated by the integration of the Semantic Framework in the EEG/ERP Portal and by the subsequent registration of the EEG/ERP Portal in the Neuroscience Information Framework.
Noppeney, Uta; Price, Cathy J
2003-01-01
This paper considers how functional neuro-imaging can be used to investigate the organization of the semantic system and the limitations associated with this technique. The majority of the functional imaging studies of the semantic system have looked for divisions by varying stimulus category. These studies have led to divergent results and no clear anatomical hypotheses have emerged to account for the dissociations seen in behavioral studies. Only a few functional imaging studies have used task as a variable to differentiate the neural correlates of semantic features more directly. We extend these findings by presenting a new study that contrasts tasks that differentially weight sensory (color and taste) and verbally learned (origin) semantic features. Irrespective of the type of semantic feature retrieved, a common semantic system was activated as demonstrated in many previous studies. In addition, the retrieval of verbally learned, but not sensory-experienced, features enhanced activation in medial and lateral posterior parietal areas. We attribute these "verbally learned" effects to differences in retrieval strategy and conclude that evidence for segregation of semantic features at an anatomical level remains weak. We believe that functional imaging has the potential to increase our understanding of the neuronal infrastructure that sustains semantic processing but progress may require multiple experiments until a consistent explanatory framework emerges.
Detecting Surgical Tools by Modelling Local Appearance and Global Shape.
Bouget, David; Benenson, Rodrigo; Omran, Mohamed; Riffaud, Laurent; Schiele, Bernt; Jannin, Pierre
2015-12-01
Detecting tools in surgical videos is an important ingredient for context-aware computer-assisted surgical systems. To this end, we present a new surgical tool detection dataset and a method for joint tool detection and pose estimation in 2d images. Our two-stage pipeline is data-driven and relaxes strong assumptions made by previous works regarding the geometry, number, and position of tools in the image. The first stage classifies each pixel based on local appearance only, while the second stage evaluates a tool-specific shape template to enforce global shape. Both local appearance and global shape are learned from training data. Our method is validated on a new surgical tool dataset of 2 476 images from neurosurgical microscopes, which is made freely available. It improves over existing datasets in size, diversity and detail of annotation. We show that our method significantly improves over competitive baselines from the computer vision field. We achieve 15% detection miss-rate at 10(-1) false positives per image (for the suction tube) over our surgical tool dataset. Results indicate that performing semantic labelling as an intermediate task is key for high quality detection.
2011-01-01
Background The practice and research of medicine generates considerable quantities of data and model resources (DMRs). Although in principle biomedical resources are re-usable, in practice few can currently be shared. In particular, the clinical communities in physiology and pharmacology research, as well as medical education, (i.e. PPME communities) are facing considerable operational and technical obstacles in sharing data and models. Findings We outline the efforts of the PPME communities to achieve automated semantic interoperability for clinical resource documentation in collaboration with the RICORDO project. Current community practices in resource documentation and knowledge management are overviewed. Furthermore, requirements and improvements sought by the PPME communities to current documentation practices are discussed. The RICORDO plan and effort in creating a representational framework and associated open software toolkit for the automated management of PPME metadata resources is also described. Conclusions RICORDO is providing the PPME community with tools to effect, share and reason over clinical resource annotations. This work is contributing to the semantic interoperability of DMRs through ontology-based annotation by (i) supporting more effective navigation and re-use of clinical DMRs, as well as (ii) sustaining interoperability operations based on the criterion of biological similarity. Operations facilitated by RICORDO will range from automated dataset matching to model merging and managing complex simulation workflows. In effect, RICORDO is contributing to community standards for resource sharing and interoperability. PMID:21878109
Advancing data reuse in phyloinformatics using an ontology-driven Semantic Web approach.
Panahiazar, Maryam; Sheth, Amit P; Ranabahu, Ajith; Vos, Rutger A; Leebens-Mack, Jim
2013-01-01
Phylogenetic analyses can resolve historical relationships among genes, organisms or higher taxa. Understanding such relationships can elucidate a wide range of biological phenomena, including, for example, the importance of gene and genome duplications in the evolution of gene function, the role of adaptation as a driver of diversification, or the evolutionary consequences of biogeographic shifts. Phyloinformaticists are developing data standards, databases and communication protocols (e.g. Application Programming Interfaces, APIs) to extend the accessibility of gene trees, species trees, and the metadata necessary to interpret these trees, thus enabling researchers across the life sciences to reuse phylogenetic knowledge. Specifically, Semantic Web technologies are being developed to make phylogenetic knowledge interpretable by web agents, thereby enabling intelligently automated, high-throughput reuse of results generated by phylogenetic research. This manuscript describes an ontology-driven, semantic problem-solving environment for phylogenetic analyses and introduces artefacts that can promote phyloinformatic efforts to promote accessibility of trees and underlying metadata. PhylOnt is an extensible ontology with concepts describing tree types and tree building methodologies including estimation methods, models and programs. In addition we present the PhylAnt platform for annotating scientific articles and NeXML files with PhylOnt concepts. The novelty of this work is the annotation of NeXML files and phylogenetic related documents with PhylOnt Ontology. This approach advances data reuse in phyloinformatics.
An integrative approach to inferring biologically meaningful gene modules.
Cho, Ji-Hoon; Wang, Kai; Galas, David J
2011-07-26
The ability to construct biologically meaningful gene networks and modules is critical for contemporary systems biology. Though recent studies have demonstrated the power of using gene modules to shed light on the functioning of complex biological systems, most modules in these networks have shown little association with meaningful biological function. We have devised a method which directly incorporates gene ontology (GO) annotation in construction of gene modules in order to gain better functional association. We have devised a method, Semantic Similarity-Integrated approach for Modularization (SSIM) that integrates various gene-gene pairwise similarity values, including information obtained from gene expression, protein-protein interactions and GO annotations, in the construction of modules using affinity propagation clustering. We demonstrated the performance of the proposed method using data from two complex biological responses: 1. the osmotic shock response in Saccharomyces cerevisiae, and 2. the prion-induced pathogenic mouse model. In comparison with two previously reported algorithms, modules identified by SSIM showed significantly stronger association with biological functions. The incorporation of semantic similarity based on GO annotation with gene expression and protein-protein interaction data can greatly enhance the functional relevance of inferred gene modules. In addition, the SSIM approach can also reveal the hierarchical structure of gene modules to gain a broader functional view of the biological system. Hence, the proposed method can facilitate comprehensive and in-depth analysis of high throughput experimental data at the gene network level.
NASA Astrophysics Data System (ADS)
Xu, Y.; Sun, Z.; Boerner, R.; Koch, T.; Hoegner, L.; Stilla, U.
2018-04-01
In this work, we report a novel way of generating ground truth dataset for analyzing point cloud from different sensors and the validation of algorithms. Instead of directly labeling large amount of 3D points requiring time consuming manual work, a multi-resolution 3D voxel grid for the testing site is generated. Then, with the help of a set of basic labeled points from the reference dataset, we can generate a 3D labeled space of the entire testing site with different resolutions. Specifically, an octree-based voxel structure is applied to voxelize the annotated reference point cloud, by which all the points are organized by 3D grids of multi-resolutions. When automatically annotating the new testing point clouds, a voting based approach is adopted to the labeled points within multiple resolution voxels, in order to assign a semantic label to the 3D space represented by the voxel. Lastly, robust line- and plane-based fast registration methods are developed for aligning point clouds obtained via various sensors. Benefiting from the labeled 3D spatial information, we can easily create new annotated 3D point clouds of different sensors of the same scene directly by considering the corresponding labels of 3D space the points located, which would be convenient for the validation and evaluation of algorithms related to point cloud interpretation and semantic segmentation.
Bettembourg, Charles; Diot, Christian; Dameron, Olivier
2015-01-01
Background The analysis of gene annotations referencing back to Gene Ontology plays an important role in the interpretation of high-throughput experiments results. This analysis typically involves semantic similarity and particularity measures that quantify the importance of the Gene Ontology annotations. However, there is currently no sound method supporting the interpretation of the similarity and particularity values in order to determine whether two genes are similar or whether one gene has some significant particular function. Interpretation is frequently based either on an implicit threshold, or an arbitrary one (typically 0.5). Here we investigate a method for determining thresholds supporting the interpretation of the results of a semantic comparison. Results We propose a method for determining the optimal similarity threshold by minimizing the proportions of false-positive and false-negative similarity matches. We compared the distributions of the similarity values of pairs of similar genes and pairs of non-similar genes. These comparisons were performed separately for all three branches of the Gene Ontology. In all situations, we found overlap between the similar and the non-similar distributions, indicating that some similar genes had a similarity value lower than the similarity value of some non-similar genes. We then extend this method to the semantic particularity measure and to a similarity measure applied to the ChEBI ontology. Thresholds were evaluated over the whole HomoloGene database. For each group of homologous genes, we computed all the similarity and particularity values between pairs of genes. Finally, we focused on the PPAR multigene family to show that the similarity and particularity patterns obtained with our thresholds were better at discriminating orthologs and paralogs than those obtained using default thresholds. Conclusion We developed a method for determining optimal semantic similarity and particularity thresholds. We applied this method on the GO and ChEBI ontologies. Qualitative analysis using the thresholds on the PPAR multigene family yielded biologically-relevant patterns. PMID:26230274
Leveraging the crowd for annotation of retinal images.
Leifman, George; Swedish, Tristan; Roesch, Karin; Raskar, Ramesh
2015-01-01
Medical data presents a number of challenges. It tends to be unstructured, noisy and protected. To train algorithms to understand medical images, doctors can label the condition associated with a particular image, but obtaining enough labels can be difficult. We propose an annotation approach which starts with a small pool of expertly annotated images and uses their expertise to rate the performance of crowd-sourced annotations. In this paper we demonstrate how to apply our approach for annotation of large-scale datasets of retinal images. We introduce a novel data validation procedure which is designed to cope with noisy ground-truth data and with non-consistent input from both experts and crowd-workers.
Application of whole slide image markup and annotation for pathologist knowledge capture.
Campbell, Walter S; Foster, Kirk W; Hinrichs, Steven H
2013-01-01
The ability to transfer image markup and annotation data from one scanned image of a slide to a newly acquired image of the same slide within a single vendor platform was investigated. The goal was to study the ability to use image markup and annotation data files as a mechanism to capture and retain pathologist knowledge without retaining the entire whole slide image (WSI) file. Accepted mathematical principles were investigated as a method to overcome variations in scans of the same glass slide and to accurately associate image markup and annotation data across different WSI of the same glass slide. Trilateration was used to link fixed points within the image and slide to the placement of markups and annotations of the image in a metadata file. Variation in markup and annotation placement between WSI of the same glass slide was reduced from over 80 μ to less than 4 μ in the x-axis and from 17 μ to 6 μ in the y-axis (P < 0.025). This methodology allows for the creation of a highly reproducible image library of histopathology images and interpretations for educational and research use.
Application of whole slide image markup and annotation for pathologist knowledge capture
Campbell, Walter S.; Foster, Kirk W.; Hinrichs, Steven H.
2013-01-01
Objective: The ability to transfer image markup and annotation data from one scanned image of a slide to a newly acquired image of the same slide within a single vendor platform was investigated. The goal was to study the ability to use image markup and annotation data files as a mechanism to capture and retain pathologist knowledge without retaining the entire whole slide image (WSI) file. Methods: Accepted mathematical principles were investigated as a method to overcome variations in scans of the same glass slide and to accurately associate image markup and annotation data across different WSI of the same glass slide. Trilateration was used to link fixed points within the image and slide to the placement of markups and annotations of the image in a metadata file. Results: Variation in markup and annotation placement between WSI of the same glass slide was reduced from over 80 μ to less than 4 μ in the x-axis and from 17 μ to 6 μ in the y-axis (P < 0.025). Conclusion: This methodology allows for the creation of a highly reproducible image library of histopathology images and interpretations for educational and research use. PMID:23599902
Enhanced functionalities for annotating and indexing clinical text with the NCBO Annotator.
Tchechmedjiev, Andon; Abdaoui, Amine; Emonet, Vincent; Melzi, Soumia; Jonnagaddala, Jitendra; Jonquet, Clement
2018-06-01
Second use of clinical data commonly involves annotating biomedical text with terminologies and ontologies. The National Center for Biomedical Ontology Annotator is a frequently used annotation service, originally designed for biomedical data, but not very suitable for clinical text annotation. In order to add new functionalities to the NCBO Annotator without hosting or modifying the original Web service, we have designed a proxy architecture that enables seamless extensions by pre-processing of the input text and parameters, and post processing of the annotations. We have then implemented enhanced functionalities for annotating and indexing free text such as: scoring, detection of context (negation, experiencer, temporality), new output formats and coarse-grained concept recognition (with UMLS Semantic Groups). In this paper, we present the NCBO Annotator+, a Web service which incorporates these new functionalities as well as a small set of evaluation results for concept recognition and clinical context detection on two standard evaluation tasks (Clef eHealth 2017, SemEval 2014). The Annotator+ has been successfully integrated into the SIFR BioPortal platform-an implementation of NCBO BioPortal for French biomedical terminologies and ontologies-to annotate English text. A Web user interface is available for testing and ontology selection (http://bioportal.lirmm.fr/ncbo_annotatorplus); however the Annotator+ is meant to be used through the Web service application programming interface (http://services.bioportal.lirmm.fr/ncbo_annotatorplus). The code is openly available, and we also provide a Docker packaging to enable easy local deployment to process sensitive (e.g. clinical) data in-house (https://github.com/sifrproject). andon.tchechmedjiev@lirmm.fr. Supplementary data are available at Bioinformatics online.
Cohen, Trevor; Blatter, Brett; Patel, Vimla
2008-01-01
Cognitive studies reveal that less-than-expert clinicians are less able to recognize meaningful patterns of data in clinical narratives. Accordingly, psychiatric residents early in training fail to attend to information that is relevant to diagnosis and the assessment of dangerousness. This manuscript presents cognitively motivated methodology for the simulation of expert ability to organize relevant findings supporting intermediate diagnostic hypotheses. Latent Semantic Analysis is used to generate a semantic space from which meaningful associations between psychiatric terms are derived. Diagnostically meaningful clusters are modeled as geometric structures within this space and compared to elements of psychiatric narrative text using semantic distance measures. A learning algorithm is defined that alters components of these geometric structures in response to labeled training data. Extraction and classification of relevant text segments is evaluated against expert annotation, with system-rater agreement approximating rater-rater agreement. A range of biomedical informatics applications for these methods are suggested. PMID:18455483
Integrating Semantic Information in Metadata Descriptions for a Geoscience-wide Resource Inventory.
NASA Astrophysics Data System (ADS)
Zaslavsky, I.; Richard, S. M.; Gupta, A.; Valentine, D.; Whitenack, T.; Ozyurt, I. B.; Grethe, J. S.; Schachne, A.
2016-12-01
Integrating semantic information into legacy metadata catalogs is a challenging issue and so far has been mostly done on a limited scale. We present experience of CINERGI (Community Inventory of Earthcube Resources for Geoscience Interoperability), an NSF Earthcube Building Block project, in creating a large cross-disciplinary catalog of geoscience information resources to enable cross-domain discovery. The project developed a pipeline for automatically augmenting resource metadata, in particular generating keywords that describe metadata documents harvested from multiple geoscience information repositories or contributed by geoscientists through various channels including surveys and domain resource inventories. The pipeline examines available metadata descriptions using text parsing, vocabulary management and semantic annotation and graph navigation services of GeoSciGraph. GeoSciGraph, in turn, relies on a large cross-domain ontology of geoscience terms, which bridges several independently developed ontologies or taxonomies including SWEET, ENVO, YAGO, GeoSciML, GCMD, SWO, and CHEBI. The ontology content enables automatic extraction of keywords reflecting science domains, equipment used, geospatial features, measured properties, methods, processes, etc. We specifically focus on issues of cross-domain geoscience ontology creation, resolving several types of semantic conflicts among component ontologies or vocabularies, and constructing and managing facets for improved data discovery and navigation. The ontology and keyword generation rules are iteratively improved as pipeline results are presented to data managers for selective manual curation via a CINERGI Annotator user interface. We present lessons learned from applying CINERGI metadata augmentation pipeline to a number of federal agency and academic data registries, in the context of several use cases that require data discovery and integration across multiple earth science data catalogs of varying quality and completeness. The inventory is accessible at http://cinergi.sdsc.edu, and the CINERGI project web page is http://earthcube.org/group/cinergi
QTLTableMiner++: semantic mining of QTL tables in scientific articles.
Singh, Gurnoor; Kuzniar, Arnold; van Mulligen, Erik M; Gavai, Anand; Bachem, Christian W; Visser, Richard G F; Finkers, Richard
2018-05-25
A quantitative trait locus (QTL) is a genomic region that correlates with a phenotype. Most of the experimental information about QTL mapping studies is described in tables of scientific publications. Traditional text mining techniques aim to extract information from unstructured text rather than from tables. We present QTLTableMiner ++ (QTM), a table mining tool that extracts and semantically annotates QTL information buried in (heterogeneous) tables of plant science literature. QTM is a command line tool written in the Java programming language. This tool takes scientific articles from the Europe PMC repository as input, extracts QTL tables using keyword matching and ontology-based concept identification. The tables are further normalized using rules derived from table properties such as captions, column headers and table footers. Furthermore, table columns are classified into three categories namely column descriptors, properties and values based on column headers and data types of cell entries. Abbreviations found in the tables are expanded using the Schwartz and Hearst algorithm. Finally, the content of QTL tables is semantically enriched with domain-specific ontologies (e.g. Crop Ontology, Plant Ontology and Trait Ontology) using the Apache Solr search platform and the results are stored in a relational database and a text file. The performance of the QTM tool was assessed by precision and recall based on the information retrieved from two manually annotated corpora of open access articles, i.e. QTL mapping studies in tomato (Solanum lycopersicum) and in potato (S. tuberosum). In summary, QTM detected QTL statements in tomato with 74.53% precision and 92.56% recall and in potato with 82.82% precision and 98.94% recall. QTM is a unique tool that aids in providing QTL information in machine-readable and semantically interoperable formats.
Annotation of phenotypic diversity: decoupling data curation and ontology curation using Phenex.
Balhoff, James P; Dahdul, Wasila M; Dececchi, T Alexander; Lapp, Hilmar; Mabee, Paula M; Vision, Todd J
2014-01-01
Phenex (http://phenex.phenoscape.org/) is a desktop application for semantically annotating the phenotypic character matrix datasets common in evolutionary biology. Since its initial publication, we have added new features that address several major bottlenecks in the efficiency of the phenotype curation process: allowing curators during the data curation phase to provisionally request terms that are not yet available from a relevant ontology; supporting quality control against annotation guidelines to reduce later manual review and revision; and enabling the sharing of files for collaboration among curators. We decoupled data annotation from ontology development by creating an Ontology Request Broker (ORB) within Phenex. Curators can use the ORB to request a provisional term for use in data annotation; the provisional term can be automatically replaced with a permanent identifier once the term is added to an ontology. We added a set of annotation consistency checks to prevent common curation errors, reducing the need for later correction. We facilitated collaborative editing by improving the reliability of Phenex when used with online folder sharing services, via file change monitoring and continual autosave. With the addition of these new features, and in particular the Ontology Request Broker, Phenex users have been able to focus more effectively on data annotation. Phenoscape curators using Phenex have reported a smoother annotation workflow, with much reduced interruptions from ontology maintenance and file management issues.
Ontological interpretation of biomedical database content.
Santana da Silva, Filipe; Jansen, Ludger; Freitas, Fred; Schulz, Stefan
2017-06-26
Biological databases store data about laboratory experiments, together with semantic annotations, in order to support data aggregation and retrieval. The exact meaning of such annotations in the context of a database record is often ambiguous. We address this problem by grounding implicit and explicit database content in a formal-ontological framework. By using a typical extract from the databases UniProt and Ensembl, annotated with content from GO, PR, ChEBI and NCBI Taxonomy, we created four ontological models (in OWL), which generate explicit, distinct interpretations under the BioTopLite2 (BTL2) upper-level ontology. The first three models interpret database entries as individuals (IND), defined classes (SUBC), and classes with dispositions (DISP), respectively; the fourth model (HYBR) is a combination of SUBC and DISP. For the evaluation of these four models, we consider (i) database content retrieval, using ontologies as query vocabulary; (ii) information completeness; and, (iii) DL complexity and decidability. The models were tested under these criteria against four competency questions (CQs). IND does not raise any ontological claim, besides asserting the existence of sample individuals and relations among them. Modelling patterns have to be created for each type of annotation referent. SUBC is interpreted regarding maximally fine-grained defined subclasses under the classes referred to by the data. DISP attempts to extract truly ontological statements from the database records, claiming the existence of dispositions. HYBR is a hybrid of SUBC and DISP and is more parsimonious regarding expressiveness and query answering complexity. For each of the four models, the four CQs were submitted as DL queries. This shows the ability to retrieve individuals with IND, and classes in SUBC and HYBR. DISP does not retrieve anything because the axioms with disposition are embedded in General Class Inclusion (GCI) statements. Ambiguity of biological database content is addressed by a method that identifies implicit knowledge behind semantic annotations in biological databases and grounds it in an expressive upper-level ontology. The result is a seamless representation of database structure, content and annotations as OWL models.
Measurement of Learning Process by Semantic Annotation Technique on Bloom's Taxonomy Vocabulary
ERIC Educational Resources Information Center
Yanchinda, Jirawit; Yodmongkol, Pitipong; Chakpitak, Nopasit
2016-01-01
A lack of science and technology knowledge understanding of most rural people who had the highest education at elementary education level more than others level is unsuccessfully transferred appropriate technology knowledge for rural sustainable development. This study provides the measurement of the learning process by on Bloom's Taxonomy…
Student Query Trend Assessment with Semantical Annotation and Artificial Intelligent Multi-Agents
ERIC Educational Resources Information Center
Malik, Kaleem Razzaq; Mir, Rizwan Riaz; Farhan, Muhammad; Rafiq, Tariq; Aslam, Muhammad
2017-01-01
Research in era of data representation to contribute and improve key data policy involving the assessment of learning, training and English language competency. Students are required to communicate in English with high level impact using language and influence. The electronic technology works to assess students' questions positively enabling…
Global Journal of Computer Science and Technology. Volume 1.2
ERIC Educational Resources Information Center
Dixit, R. K.
2009-01-01
Articles in this issue of "Global Journal of Computer Science and Technology" include: (1) Input Data Processing Techniques in Intrusion Detection Systems--Short Review (Suhair H. Amer and John A. Hamilton, Jr.); (2) Semantic Annotation of Stock Photography for CBIR Using MPEG-7 standards (R. Balasubramani and V. Kannan); (3) An Experimental Study…
Harvesting Intelligence in Multimedia Social Tagging Systems
NASA Astrophysics Data System (ADS)
Giannakidou, Eirini; Kaklidou, Foteini; Chatzilari, Elisavet; Kompatsiaris, Ioannis; Vakali, Athena
As more people adopt tagging practices, social tagging systems tend to form rich knowledge repositories that enable the extraction of patterns reflecting the way content semantics is perceived by the web users. This is of particular importance, especially in the case of multimedia content, since the availability of such content in the web is very high and its efficient retrieval using textual annotations or content-based automatically extracted metadata still remains a challenge. It is argued that complementing multimedia analysis techniques with knowledge drawn from web social annotations may facilitate multimedia content management. This chapter focuses on analyzing tagging patterns and combining them with content feature extraction methods, generating, thus, intelligence from multimedia social tagging systems. Emphasis is placed on using all available "tracks" of knowledge, that is tag co-occurrence together with semantic relations among tags and low-level features of the content. Towards this direction, a survey on the theoretical background and the adopted practices for analysis of multimedia social content are presented. A case study from Flickr illustrates the efficiency of the proposed approach.
The Human Phenotype Ontology: Semantic Unification of Common and Rare Disease
Groza, Tudor; Köhler, Sebastian; Moldenhauer, Dawid; ...
2015-06-25
The Human Phenotype Ontology (HPO) is widely used in the rare disease community for differential diagnostics, phenotype-driven analysis of next-generation sequence-variation data, and translational research, but a comparable resource has not been available for common disease. Here, we have developed a concept-recognition procedure that analyzes the frequencies of HPO disease annotations as identified in over five million PubMed abstracts by employing an iterative procedure to optimize precision and recall of the identified terms. We derived disease models for 3,145 common human diseases comprising a total of 132,006 HPO annotations. The HPO now comprises over 250,000 phenotypic annotations for over 10,000more » rare and common diseases and can be used for examining the phenotypic overlap among common diseases that share risk alleles, as well as between Mendelian diseases and common diseases linked by genomic location. The annotations, as well as the HPO itself, are freely available.« less
Imageability and semantic association in the representation and processing of event verbs.
Xu, Xu; Kang, Chunyan; Guo, Taomei
2016-05-01
This study examined the relative salience of imageability (the degree to which a word evokes mental imagery) versus semantic association (the density of semantic network in which a word is embedded) in the representation and processing of four types of event verbs: sensory, cognitive, speech, and motor verbs. ERP responses were recorded, while 34 university students performed on a lexical decision task. Analysis focused primarily on amplitude differences across verb conditions within the N400 time window where activities are considered representing meaning activation. Variation in N400 amplitude across four types of verbs was found significantly associated with the level of imageability, but not the level of semantic association. The findings suggest imageability as a more salient factor relative to semantic association in the processing of these verbs. The role of semantic association and the representation of speech verbs are also discussed.
K-Nearest Neighbors Relevance Annotation Model for Distance Education
ERIC Educational Resources Information Center
Ke, Xiao; Li, Shaozi; Cao, Donglin
2011-01-01
With the rapid development of Internet technologies, distance education has become a popular educational mode. In this paper, the authors propose an online image automatic annotation distance education system, which could effectively help children learn interrelations between image content and corresponding keywords. Image automatic annotation is…
An infrastructure for ontology-based information systems in biomedicine: RICORDO case study.
Wimalaratne, Sarala M; Grenon, Pierre; Hoehndorf, Robert; Gkoutos, Georgios V; de Bono, Bernard
2012-02-01
The article presents an infrastructure for supporting the semantic interoperability of biomedical resources based on the management (storing and inference-based querying) of their ontology-based annotations. This infrastructure consists of: (i) a repository to store and query ontology-based annotations; (ii) a knowledge base server with an inference engine to support the storage of and reasoning over ontologies used in the annotation of resources; (iii) a set of applications and services allowing interaction with the integrated repository and knowledge base. The infrastructure is being prototyped and developed and evaluated by the RICORDO project in support of the knowledge management of biomedical resources, including physiology and pharmacology models and associated clinical data. The RICORDO toolkit and its source code are freely available from http://ricordo.eu/relevant-resources. sarala@ebi.ac.uk.
Guardia, Gabriela D A; Ferreira Pires, Luís; da Silva, Eduardo G; de Farias, Cléver R G
2017-02-01
Gene expression studies often require the combined use of a number of analysis tools. However, manual integration of analysis tools can be cumbersome and error prone. To support a higher level of automation in the integration process, efforts have been made in the biomedical domain towards the development of semantic web services and supporting composition environments. Yet, most environments consider only the execution of simple service behaviours and requires users to focus on technical details of the composition process. We propose a novel approach to the semantic composition of gene expression analysis services that addresses the shortcomings of the existing solutions. Our approach includes an architecture designed to support the service composition process for gene expression analysis, and a flexible strategy for the (semi) automatic composition of semantic web services. Finally, we implement a supporting platform called SemanticSCo to realize the proposed composition approach and demonstrate its functionality by successfully reproducing a microarray study documented in the literature. The SemanticSCo platform provides support for the composition of RESTful web services semantically annotated using SAWSDL. Our platform also supports the definition of constraints/conditions regarding the order in which service operations should be invoked, thus enabling the definition of complex service behaviours. Our proposed solution for semantic web service composition takes into account the requirements of different stakeholders and addresses all phases of the service composition process. It also provides support for the definition of analysis workflows at a high-level of abstraction, thus enabling users to focus on biological research issues rather than on the technical details of the composition process. The SemanticSCo source code is available at https://github.com/usplssb/SemanticSCo. Copyright © 2017 Elsevier Inc. All rights reserved.
Li, Jieyue; Newberg, Justin Y; Uhlén, Mathias; Lundberg, Emma; Murphy, Robert F
2012-01-01
The Human Protein Atlas contains immunofluorescence images showing subcellular locations for thousands of proteins. These are currently annotated by visual inspection. In this paper, we describe automated approaches to analyze the images and their use to improve annotation. We began by training classifiers to recognize the annotated patterns. By ranking proteins according to the confidence of the classifier, we generated a list of proteins that were strong candidates for reexamination. In parallel, we applied hierarchical clustering to group proteins and identified proteins whose annotations were inconsistent with the remainder of the proteins in their cluster. These proteins were reexamined by the original annotators, and a significant fraction had their annotations changed. The results demonstrate that automated approaches can provide an important complement to visual annotation.
Hierarchical layered and semantic-based image segmentation using ergodicity map
NASA Astrophysics Data System (ADS)
Yadegar, Jacob; Liu, Xiaoqing
2010-04-01
Image segmentation plays a foundational role in image understanding and computer vision. Although great strides have been made and progress achieved on automatic/semi-automatic image segmentation algorithms, designing a generic, robust, and efficient image segmentation algorithm is still challenging. Human vision is still far superior compared to computer vision, especially in interpreting semantic meanings/objects in images. We present a hierarchical/layered semantic image segmentation algorithm that can automatically and efficiently segment images into hierarchical layered/multi-scaled semantic regions/objects with contextual topological relationships. The proposed algorithm bridges the gap between high-level semantics and low-level visual features/cues (such as color, intensity, edge, etc.) through utilizing a layered/hierarchical ergodicity map, where ergodicity is computed based on a space filling fractal concept and used as a region dissimilarity measurement. The algorithm applies a highly scalable, efficient, and adaptive Peano- Cesaro triangulation/tiling technique to decompose the given image into a set of similar/homogenous regions based on low-level visual cues in a top-down manner. The layered/hierarchical ergodicity map is built through a bottom-up region dissimilarity analysis. The recursive fractal sweep associated with the Peano-Cesaro triangulation provides efficient local multi-resolution refinement to any level of detail. The generated binary decomposition tree also provides efficient neighbor retrieval mechanisms for contextual topological object/region relationship generation. Experiments have been conducted within the maritime image environment where the segmented layered semantic objects include the basic level objects (i.e. sky/land/water) and deeper level objects in the sky/land/water surfaces. Experimental results demonstrate the proposed algorithm has the capability to robustly and efficiently segment images into layered semantic objects/regions with contextual topological relationships.
AggNet: Deep Learning From Crowds for Mitosis Detection in Breast Cancer Histology Images.
Albarqouni, Shadi; Baur, Christoph; Achilles, Felix; Belagiannis, Vasileios; Demirci, Stefanie; Navab, Nassir
2016-05-01
The lack of publicly available ground-truth data has been identified as the major challenge for transferring recent developments in deep learning to the biomedical imaging domain. Though crowdsourcing has enabled annotation of large scale databases for real world images, its application for biomedical purposes requires a deeper understanding and hence, more precise definition of the actual annotation task. The fact that expert tasks are being outsourced to non-expert users may lead to noisy annotations introducing disagreement between users. Despite being a valuable resource for learning annotation models from crowdsourcing, conventional machine-learning methods may have difficulties dealing with noisy annotations during training. In this manuscript, we present a new concept for learning from crowds that handle data aggregation directly as part of the learning process of the convolutional neural network (CNN) via additional crowdsourcing layer (AggNet). Besides, we present an experimental study on learning from crowds designed to answer the following questions. 1) Can deep CNN be trained with data collected from crowdsourcing? 2) How to adapt the CNN to train on multiple types of annotation datasets (ground truth and crowd-based)? 3) How does the choice of annotation and aggregation affect the accuracy? Our experimental setup involved Annot8, a self-implemented web-platform based on Crowdflower API realizing image annotation tasks for a publicly available biomedical image database. Our results give valuable insights into the functionality of deep CNN learning from crowd annotations and prove the necessity of data aggregation integration.
Sahota, Michael; Leung, Betty; Dowdell, Stephanie; Velan, Gary M
2016-12-12
Students in biomedical disciplines require understanding of normal and abnormal microscopic appearances of human tissues (histology and histopathology). For this purpose, practical classes in these disciplines typically use virtual microscopy, viewing digitised whole slide images in web browsers. To enhance engagement, tools have been developed to enable individual or collaborative annotation of whole slide images within web browsers. To date, there have been no studies that have critically compared the impact on learning of individual and collaborative annotations on whole slide images. Junior and senior students engaged in Pathology practical classes within Medical Science and Medicine programs participated in cross-over trials of individual and collaborative annotation activities. Students' understanding of microscopic morphology was compared using timed online quizzes, while students' perceptions of learning were evaluated using an online questionnaire. For senior medical students, collaborative annotation of whole slide images was superior for understanding key microscopic features when compared to individual annotation; whilst being at least equivalent to individual annotation for junior medical science students. Across cohorts, students agreed that the annotation activities provided a user-friendly learning environment that met their flexible learning needs, improved efficiency, provided useful feedback, and helped them to set learning priorities. Importantly, these activities were also perceived to enhance motivation and improve understanding. Collaborative annotation improves understanding of microscopic morphology for students with sufficient background understanding of the discipline. These findings have implications for the deployment of annotation activities in biomedical curricula, and potentially for postgraduate training in Anatomical Pathology.
Solbrig, Harold R; Chute, Christopher G
2012-01-01
Objective The objective of this study is to develop an approach to evaluate the quality of terminological annotations on the value set (ie, enumerated value domain) components of the common data elements (CDEs) in the context of clinical research using both unified medical language system (UMLS) semantic types and groups. Materials and methods The CDEs of the National Cancer Institute (NCI) Cancer Data Standards Repository, the NCI Thesaurus (NCIt) concepts and the UMLS semantic network were integrated using a semantic web-based framework for a SPARQL-enabled evaluation. First, the set of CDE-permissible values with corresponding meanings in external controlled terminologies were isolated. The corresponding value meanings were then evaluated against their NCI- or UMLS-generated semantic network mapping to determine whether all of the meanings fell within the same semantic group. Results Of the enumerated CDEs in the Cancer Data Standards Repository, 3093 (26.2%) had elements drawn from more than one UMLS semantic group. A random sample (n=100) of this set of elements indicated that 17% of them were likely to have been misclassified. Discussion The use of existing semantic web tools can support a high-throughput mechanism for evaluating the quality of large CDE collections. This study demonstrates that the involvement of multiple semantic groups in an enumerated value domain of a CDE is an effective anchor to trigger an auditing point for quality evaluation activities. Conclusion This approach produces a useful quality assurance mechanism for a clinical study CDE repository. PMID:22511016
A-DaGO-Fun: an adaptable Gene Ontology semantic similarity-based functional analysis tool.
Mazandu, Gaston K; Chimusa, Emile R; Mbiyavanga, Mamana; Mulder, Nicola J
2016-02-01
Gene Ontology (GO) semantic similarity measures are being used for biological knowledge discovery based on GO annotations by integrating biological information contained in the GO structure into data analyses. To empower users to quickly compute, manipulate and explore these measures, we introduce A-DaGO-Fun (ADaptable Gene Ontology semantic similarity-based Functional analysis). It is a portable software package integrating all known GO information content-based semantic similarity measures and relevant biological applications associated with these measures. A-DaGO-Fun has the advantage not only of handling datasets from the current high-throughput genome-wide applications, but also allowing users to choose the most relevant semantic similarity approach for their biological applications and to adapt a given module to their needs. A-DaGO-Fun is freely available to the research community at http://web.cbio.uct.ac.za/ITGOM/adagofun. It is implemented in Linux using Python under free software (GNU General Public Licence). gmazandu@cbio.uct.ac.za or Nicola.Mulder@uct.ac.za Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Building a biomedical semantic network in Wikipedia with Semantic Wiki Links
Good, Benjamin M.; Clarke, Erik L.; Loguercio, Salvatore; Su, Andrew I.
2012-01-01
Wikipedia is increasingly used as a platform for collaborative data curation, but its current technical implementation has significant limitations that hinder its use in biocuration applications. Specifically, while editors can easily link between two articles in Wikipedia to indicate a relationship, there is no way to indicate the nature of that relationship in a way that is computationally accessible to the system or to external developers. For example, in addition to noting a relationship between a gene and a disease, it would be useful to differentiate the cases where genetic mutation or altered expression causes the disease. Here, we introduce a straightforward method that allows Wikipedia editors to embed computable semantic relations directly in the context of current Wikipedia articles. In addition, we demonstrate two novel applications enabled by the presence of these new relationships. The first is a dynamically generated information box that can be rendered on all semantically enhanced Wikipedia articles. The second is a prototype gene annotation system that draws its content from the gene-centric articles on Wikipedia and exposes the new semantic relationships to enable previously impossible, user-defined queries. Database URL: http://en.wikipedia.org/wiki/Portal:Gene_Wiki PMID:22434829
Building a biomedical semantic network in Wikipedia with Semantic Wiki Links.
Good, Benjamin M; Clarke, Erik L; Loguercio, Salvatore; Su, Andrew I
2012-01-01
Wikipedia is increasingly used as a platform for collaborative data curation, but its current technical implementation has significant limitations that hinder its use in biocuration applications. Specifically, while editors can easily link between two articles in Wikipedia to indicate a relationship, there is no way to indicate the nature of that relationship in a way that is computationally accessible to the system or to external developers. For example, in addition to noting a relationship between a gene and a disease, it would be useful to differentiate the cases where genetic mutation or altered expression causes the disease. Here, we introduce a straightforward method that allows Wikipedia editors to embed computable semantic relations directly in the context of current Wikipedia articles. In addition, we demonstrate two novel applications enabled by the presence of these new relationships. The first is a dynamically generated information box that can be rendered on all semantically enhanced Wikipedia articles. The second is a prototype gene annotation system that draws its content from the gene-centric articles on Wikipedia and exposes the new semantic relationships to enable previously impossible, user-defined queries. DATABASE URL: http://en.wikipedia.org/wiki/Portal:Gene_Wiki.
NASA Astrophysics Data System (ADS)
Frommholz, D.; Linkiewicz, M.; Poznanska, A. M.
2016-06-01
This paper proposes an in-line method for the simplified reconstruction of city buildings from nadir and oblique aerial images that at the same time are being used for multi-source texture mapping with minimal resampling. Further, the resulting unrectified texture atlases are analyzed for façade elements like windows to be reintegrated into the original 3D models. Tests on real-world data of Heligoland/ Germany comprising more than 800 buildings exposed a median positional deviation of 0.31 m at the façades compared to the cadastral map, a correctness of 67% for the detected windows and good visual quality when being rendered with GPU-based perspective correction. As part of the process building reconstruction takes the oriented input images and transforms them into dense point clouds by semi-global matching (SGM). The point sets undergo local RANSAC-based regression and topology analysis to detect adjacent planar surfaces and determine their semantics. Based on this information the roof, wall and ground surfaces found get intersected and limited in their extension to form a closed 3D building hull. For texture mapping the hull polygons are projected into each possible input bitmap to find suitable color sources regarding the coverage and resolution. Occlusions are detected by ray-casting a full-scale digital surface model (DSM) of the scene and stored in pixel-precise visibility maps. These maps are used to derive overlap statistics and radiometric adjustment coefficients to be applied when the visible image parts for each building polygon are being copied into a compact texture atlas without resampling whenever possible. The atlas bitmap is passed to a commercial object-based image analysis (OBIA) tool running a custom rule set to identify windows on the contained façade patches. Following multi-resolution segmentation and classification based on brightness and contrast differences potential window objects are evaluated against geometric constraints and conditionally grown, fused and filtered morphologically. The output polygons are vectorized and reintegrated into the previously reconstructed buildings by sparsely ray-tracing their vertices. Finally the enhanced 3D models get stored as textured geometry for visualization and semantically annotated "LOD-2.5" CityGML objects for GIS applications.
A Revised Semantic Differential Scale Distinguishing between Negative and Positive God Images
ERIC Educational Resources Information Center
Francis, Leslie J.; Robbins, Mandy; Gibson, Harry M.
2006-01-01
A sample of 755 school pupils between the ages of 11 and 18 years completed the Benson and Spilka semantic differential measure of God images. Factor analysis indicated the advantages of re-scoring the measure as an eight item unidimensional index, defining semantic space relating to God images ranging from negative affect to positive affect.…
Matching Alternative Addresses: a Semantic Web Approach
NASA Astrophysics Data System (ADS)
Ariannamazi, S.; Karimipour, F.; Hakimpour, F.
2015-12-01
Rapid development of crowd-sourcing or volunteered geographic information (VGI) provides opportunities for authoritatives that deal with geospatial information. Heterogeneity of multiple data sources and inconsistency of data types is a key characteristics of VGI datasets. The expansion of cities resulted in the growing number of POIs in the OpenStreetMap, a well-known VGI source, which causes the datasets to outdate in short periods of time. These changes made to spatial and aspatial attributes of features such as names and addresses might cause confusion or ambiguity in the processes that require feature's literal information like addressing and geocoding. VGI sources neither will conform specific vocabularies nor will remain in a specific schema for a long period of time. As a result, the integration of VGI sources is crucial and inevitable in order to avoid duplication and the waste of resources. Information integration can be used to match features and qualify different annotation alternatives for disambiguation. This study enhances the search capabilities of geospatial tools with applications able to understand user terminology to pursuit an efficient way for finding desired results. Semantic web is a capable tool for developing technologies that deal with lexical and numerical calculations and estimations. There are a vast amount of literal-spatial data representing the capability of linguistic information in knowledge modeling, but these resources need to be harmonized based on Semantic Web standards. The process of making addresses homogenous generates a helpful tool based on spatial data integration and lexical annotation matching and disambiguating.
Participation, Deliberate Learning and Discourses of Learning Online
ERIC Educational Resources Information Center
Barton, David
2012-01-01
This paper uses a study of the photo-sharing website Flickr to examine new online spaces for writing. On this site, people write titles and descriptions for their photos, they annotate their photos with semantic tags, they provide profiles of themselves and they comment on other people's photos. In these activities, people are engaging in new…
USDA-ARS?s Scientific Manuscript database
Such Biomedical vocabularies and ontologies aid in recapitulating biological knowledge. The annotation of gene products is mainly accelerated by Gene Ontology (GO) and more recently by Medical Subject Headings (MeSH). MeSH is the National Library of Medicine's controlled vocabulary and it is making ...
Semantic Annotation of Resources to Learn with Connected Things
ERIC Educational Resources Information Center
Bouchereau, Aymeric; Roxin, Ioan
2017-01-01
Computer systems tend to be ubiquitous as they become more integrated in our everyday activities, embedded in tables, shoes, watch and plenty of others connected things (CT). In the e-learning field, the transformations induced by the Internet of Things (IoT) allow individuals to learn whenever they want, accessing a quantity of diverse digital…
Roberts, Kirk; Shooshan, Sonya E; Rodriguez, Laritza; Abhyankar, Swapna; Kilicoglu, Halil; Demner-Fushman, Dina
2015-12-01
This paper describes a supervised machine learning approach for identifying heart disease risk factors in clinical text, and assessing the impact of annotation granularity and quality on the system's ability to recognize these risk factors. We utilize a series of support vector machine models in conjunction with manually built lexicons to classify triggers specific to each risk factor. The features used for classification were quite simple, utilizing only lexical information and ignoring higher-level linguistic information such as syntax and semantics. Instead, we incorporated high-quality data to train the models by annotating additional information on top of a standard corpus. Despite the relative simplicity of the system, it achieves the highest scores (micro- and macro-F1, and micro- and macro-recall) out of the 20 participants in the 2014 i2b2/UTHealth Shared Task. This system obtains a micro- (macro-) precision of 0.8951 (0.8965), recall of 0.9625 (0.9611), and F1-measure of 0.9276 (0.9277). Additionally, we perform a series of experiments to assess the value of the annotated data we created. These experiments show how manually-labeled negative annotations can improve information extraction performance, demonstrating the importance of high-quality, fine-grained natural language annotations. Copyright © 2015 Elsevier Inc. All rights reserved.
Li, Yanfei; Tian, Yun
2018-01-01
The development of network technology and the popularization of image capturing devices have led to a rapid increase in the number of digital images available, and it is becoming increasingly difficult to identify a desired image from among the massive number of possible images. Images usually contain rich semantic information, and people usually understand images at a high semantic level. Therefore, achieving the ability to use advanced technology to identify the emotional semantics contained in images to enable emotional semantic image classification remains an urgent issue in various industries. To this end, this study proposes an improved OCC emotion model that integrates personality and mood factors for emotional modelling to describe the emotional semantic information contained in an image. The proposed classification system integrates the k-Nearest Neighbour (KNN) algorithm with the Support Vector Machine (SVM) algorithm. The MapReduce parallel programming model was used to adapt the KNN-SVM algorithm for parallel implementation in the Hadoop cluster environment, thereby achieving emotional semantic understanding for the classification of a massive collection of images. For training and testing, 70,000 scene images were randomly selected from the SUN Database. The experimental results indicate that users with different personalities show overall consistency in their emotional understanding of the same image. For a training sample size of 50,000, the classification accuracies for different emotional categories targeted at users with different personalities were approximately 95%, and the training time was only 1/5 of that required for the corresponding algorithm with a single-node architecture. Furthermore, the speedup of the system also showed a linearly increasing tendency. Thus, the experiments achieved a good classification effect and can lay a foundation for classification in terms of additional types of emotional image semantics, thereby demonstrating the practical significance of the proposed model. PMID:29320579
Cao, Jianfang; Li, Yanfei; Tian, Yun
2018-01-01
The development of network technology and the popularization of image capturing devices have led to a rapid increase in the number of digital images available, and it is becoming increasingly difficult to identify a desired image from among the massive number of possible images. Images usually contain rich semantic information, and people usually understand images at a high semantic level. Therefore, achieving the ability to use advanced technology to identify the emotional semantics contained in images to enable emotional semantic image classification remains an urgent issue in various industries. To this end, this study proposes an improved OCC emotion model that integrates personality and mood factors for emotional modelling to describe the emotional semantic information contained in an image. The proposed classification system integrates the k-Nearest Neighbour (KNN) algorithm with the Support Vector Machine (SVM) algorithm. The MapReduce parallel programming model was used to adapt the KNN-SVM algorithm for parallel implementation in the Hadoop cluster environment, thereby achieving emotional semantic understanding for the classification of a massive collection of images. For training and testing, 70,000 scene images were randomly selected from the SUN Database. The experimental results indicate that users with different personalities show overall consistency in their emotional understanding of the same image. For a training sample size of 50,000, the classification accuracies for different emotional categories targeted at users with different personalities were approximately 95%, and the training time was only 1/5 of that required for the corresponding algorithm with a single-node architecture. Furthermore, the speedup of the system also showed a linearly increasing tendency. Thus, the experiments achieved a good classification effect and can lay a foundation for classification in terms of additional types of emotional image semantics, thereby demonstrating the practical significance of the proposed model.
Vempati, Uma D; Chung, Caty; Mader, Chris; Koleti, Amar; Datar, Nakul; Vidović, Dušica; Wrobel, David; Erickson, Sean; Muhlich, Jeremy L; Berriz, Gabriel; Benes, Cyril H; Subramanian, Aravind; Pillai, Ajay; Shamu, Caroline E; Schürer, Stephan C
2014-06-01
The National Institutes of Health Library of Integrated Network-based Cellular Signatures (LINCS) program is generating extensive multidimensional data sets, including biochemical, genome-wide transcriptional, and phenotypic cellular response signatures to a variety of small-molecule and genetic perturbations with the goal of creating a sustainable, widely applicable, and readily accessible systems biology knowledge resource. Integration and analysis of diverse LINCS data sets depend on the availability of sufficient metadata to describe the assays and screening results and on their syntactic, structural, and semantic consistency. Here we report metadata specifications for the most important molecular and cellular components and recommend them for adoption beyond the LINCS project. We focus on the minimum required information to model LINCS assays and results based on a number of use cases, and we recommend controlled terminologies and ontologies to annotate assays with syntactic consistency and semantic integrity. We also report specifications for a simple annotation format (SAF) to describe assays and screening results based on our metadata specifications with explicit controlled vocabularies. SAF specifically serves to programmatically access and exchange LINCS data as a prerequisite for a distributed information management infrastructure. We applied the metadata specifications to annotate large numbers of LINCS cell lines, proteins, and small molecules. The resources generated and presented here are freely available. © 2014 Society for Laboratory Automation and Screening.
An integrative approach to inferring biologically meaningful gene modules
2011-01-01
Background The ability to construct biologically meaningful gene networks and modules is critical for contemporary systems biology. Though recent studies have demonstrated the power of using gene modules to shed light on the functioning of complex biological systems, most modules in these networks have shown little association with meaningful biological function. We have devised a method which directly incorporates gene ontology (GO) annotation in construction of gene modules in order to gain better functional association. Results We have devised a method, Semantic Similarity-Integrated approach for Modularization (SSIM) that integrates various gene-gene pairwise similarity values, including information obtained from gene expression, protein-protein interactions and GO annotations, in the construction of modules using affinity propagation clustering. We demonstrated the performance of the proposed method using data from two complex biological responses: 1. the osmotic shock response in Saccharomyces cerevisiae, and 2. the prion-induced pathogenic mouse model. In comparison with two previously reported algorithms, modules identified by SSIM showed significantly stronger association with biological functions. Conclusions The incorporation of semantic similarity based on GO annotation with gene expression and protein-protein interaction data can greatly enhance the functional relevance of inferred gene modules. In addition, the SSIM approach can also reveal the hierarchical structure of gene modules to gain a broader functional view of the biological system. Hence, the proposed method can facilitate comprehensive and in-depth analysis of high throughput experimental data at the gene network level. PMID:21791051
MachineProse: an Ontological Framework for Scientific Assertions
Dinakarpandian, Deendayal; Lee, Yugyung; Vishwanath, Kartik; Lingambhotla, Rohini
2006-01-01
Objective: The idea of testing a hypothesis is central to the practice of biomedical research. However, the results of testing a hypothesis are published mainly in the form of prose articles. Encoding the results as scientific assertions that are both human and machine readable would greatly enhance the synergistic growth and dissemination of knowledge. Design: We have developed MachineProse (MP), an ontological framework for the concise specification of scientific assertions. MP is based on the idea of an assertion constituting a fundamental unit of knowledge. This is in contrast to current approaches that use discrete concept terms from domain ontologies for annotation and assertions are only inferred heuristically. Measurements: We use illustrative examples to highlight the advantages of MP over the use of the Medical Subject Headings (MeSH) system and keywords in indexing scientific articles. Results: We show how MP makes it possible to carry out semantic annotation of publications that is machine readable and allows for precise search capabilities. In addition, when used by itself, MP serves as a knowledge repository for emerging discoveries. A prototype for proof of concept has been developed that demonstrates the feasibility and novel benefits of MP. As part of the MP framework, we have created an ontology of relationship types with about 100 terms optimized for the representation of scientific assertions. Conclusion: MachineProse is a novel semantic framework that we believe may be used to summarize research findings, annotate biomedical publications, and support sophisticated searches. PMID:16357355
The Accuracy and Reliability of Crowdsource Annotations of Digital Retinal Images
Mitry, Danny; Zutis, Kris; Dhillon, Baljean; Peto, Tunde; Hayat, Shabina; Khaw, Kay-Tee; Morgan, James E.; Moncur, Wendy; Trucco, Emanuele; Foster, Paul J.
2016-01-01
Purpose Crowdsourcing is based on outsourcing computationally intensive tasks to numerous individuals in the online community who have no formal training. Our aim was to develop a novel online tool designed to facilitate large-scale annotation of digital retinal images, and to assess the accuracy of crowdsource grading using this tool, comparing it to expert classification. Methods We used 100 retinal fundus photograph images with predetermined disease criteria selected by two experts from a large cohort study. The Amazon Mechanical Turk Web platform was used to drive traffic to our site so anonymous workers could perform a classification and annotation task of the fundus photographs in our dataset after a short training exercise. Three groups were assessed: masters only, nonmasters only and nonmasters with compulsory training. We calculated the sensitivity, specificity, and area under the curve (AUC) of receiver operating characteristic (ROC) plots for all classifications compared to expert grading, and used the Dice coefficient and consensus threshold to assess annotation accuracy. Results In total, we received 5389 annotations for 84 images (excluding 16 training images) in 2 weeks. A specificity and sensitivity of 71% (95% confidence interval [CI], 69%–74%) and 87% (95% CI, 86%–88%) was achieved for all classifications. The AUC in this study for all classifications combined was 0.93 (95% CI, 0.91–0.96). For image annotation, a maximal Dice coefficient (∼0.6) was achieved with a consensus threshold of 0.25. Conclusions This study supports the hypothesis that annotation of abnormalities in retinal images by ophthalmologically naive individuals is comparable to expert annotation. The highest AUC and agreement with expert annotation was achieved in the nonmasters with compulsory training group. Translational Relevance The use of crowdsourcing as a technique for retinal image analysis may be comparable to expert graders and has the potential to deliver timely, accurate, and cost-effective image analysis. PMID:27668130
The Accuracy and Reliability of Crowdsource Annotations of Digital Retinal Images.
Mitry, Danny; Zutis, Kris; Dhillon, Baljean; Peto, Tunde; Hayat, Shabina; Khaw, Kay-Tee; Morgan, James E; Moncur, Wendy; Trucco, Emanuele; Foster, Paul J
2016-09-01
Crowdsourcing is based on outsourcing computationally intensive tasks to numerous individuals in the online community who have no formal training. Our aim was to develop a novel online tool designed to facilitate large-scale annotation of digital retinal images, and to assess the accuracy of crowdsource grading using this tool, comparing it to expert classification. We used 100 retinal fundus photograph images with predetermined disease criteria selected by two experts from a large cohort study. The Amazon Mechanical Turk Web platform was used to drive traffic to our site so anonymous workers could perform a classification and annotation task of the fundus photographs in our dataset after a short training exercise. Three groups were assessed: masters only, nonmasters only and nonmasters with compulsory training. We calculated the sensitivity, specificity, and area under the curve (AUC) of receiver operating characteristic (ROC) plots for all classifications compared to expert grading, and used the Dice coefficient and consensus threshold to assess annotation accuracy. In total, we received 5389 annotations for 84 images (excluding 16 training images) in 2 weeks. A specificity and sensitivity of 71% (95% confidence interval [CI], 69%-74%) and 87% (95% CI, 86%-88%) was achieved for all classifications. The AUC in this study for all classifications combined was 0.93 (95% CI, 0.91-0.96). For image annotation, a maximal Dice coefficient (∼0.6) was achieved with a consensus threshold of 0.25. This study supports the hypothesis that annotation of abnormalities in retinal images by ophthalmologically naive individuals is comparable to expert annotation. The highest AUC and agreement with expert annotation was achieved in the nonmasters with compulsory training group. The use of crowdsourcing as a technique for retinal image analysis may be comparable to expert graders and has the potential to deliver timely, accurate, and cost-effective image analysis.
Rebholz-Schuhmann, Dietrich; Grabmüller, Christoph; Kavaliauskas, Silvestras; Croset, Samuel; Woollard, Peter; Backofen, Rolf; Filsell, Wendy; Clark, Dominic
2014-07-01
In the Semantic Enrichment of the Scientific Literature (SESL) project, researchers from academia and from life science and publishing companies collaborated in a pre-competitive way to integrate and share information for type 2 diabetes mellitus (T2DM) in adults. This case study exposes benefits from semantic interoperability after integrating the scientific literature with biomedical data resources, such as UniProt Knowledgebase (UniProtKB) and the Gene Expression Atlas (GXA). We annotated scientific documents in a standardized way, by applying public terminological resources for diseases and proteins, and other text-mining approaches. Eventually, we compared the genetic causes of T2DM across the data resources to demonstrate the benefits from the SESL triple store. Our solution enables publishers to distribute their content with little overhead into remote data infrastructures, such as into any Virtual Knowledge Broker. Copyright © 2013. Published by Elsevier Ltd.
Ontology-aided Data Fusion (Invited)
NASA Astrophysics Data System (ADS)
Raskin, R.
2009-12-01
An ontology provides semantic descriptions that are analogous to those in a dictionary, but are readable by both computers and humans. A data or service is semantically annotated when it is formally associated with elements of an ontology. The ESIP Federation Semantic Web Cluster has developed a set of ontologies to describe datatypes and data services that can be used to support automated data fusion. The service ontology includes descriptors of the service function, its inputs/outputs, and its invocation method. The datatype descriptors resemble typical metadata fields (data format, data model, data structure, originator, etc.) augmented with descriptions of the meaning of the data. These ontologies, in combination with the SWEET science ontology, enable a registered data fusion service to be chained together and implemented that is scientifically meaningful based on machine understanding of the associated data and services. This presentation describes initial results and experiences in automated data fusion.
How Chinese Semantics Capability Improves Interpretation in Visual Communication
ERIC Educational Resources Information Center
Cheng, Chu-Yu; Ou, Yang-Kun; Kin, Ching-Lung
2017-01-01
A visual representation involves delivering messages through visually communicated images. The study assumed that semantic recognition can affect visual interpretation ability, and the result showed that students graduating from a general high school achieve satisfactory results in semantic recognition and image interpretation tasks than students…
Knowledge Annotations in Scientific Workflows: An Implementation in Kepler
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gandara, Aida G.; Chin, George; Pinheiro Da Silva, Paulo
2011-07-20
Abstract. Scientic research products are the result of long-term collaborations between teams. Scientic workfows are capable of helping scientists in many ways including the collection of information as to howresearch was conducted, e.g. scientic workfow tools often collect and manage information about datasets used and data transformations. However,knowledge about why data was collected is rarely documented in scientic workflows. In this paper we describe a prototype system built to support the collection of scientic expertise that infuences scientic analysis. Through evaluating a scientic research eort underway at Pacific Northwest National Laboratory, we identied features that would most benefit PNNL scientistsmore » in documenting how and why they conduct their research making this information available to the entire team. The prototype system was built by enhancing the Kepler Scientic Work-flow System to create knowledge-annotated scientic workfows and topublish them as semantic annotations.« less
Supervised learning of tools for content-based search of image databases
NASA Astrophysics Data System (ADS)
Delanoy, Richard L.
1996-03-01
A computer environment, called the Toolkit for Image Mining (TIM), is being developed with the goal of enabling users with diverse interests and varied computer skills to create search tools for content-based image retrieval and other pattern matching tasks. Search tools are generated using a simple paradigm of supervised learning that is based on the user pointing at mistakes of classification made by the current search tool. As mistakes are identified, a learning algorithm uses the identified mistakes to build up a model of the user's intentions, construct a new search tool, apply the search tool to a test image, display the match results as feedback to the user, and accept new inputs from the user. Search tools are constructed in the form of functional templates, which are generalized matched filters capable of knowledge- based image processing. The ability of this system to learn the user's intentions from experience contrasts with other existing approaches to content-based image retrieval that base searches on the characteristics of a single input example or on a predefined and semantically- constrained textual query. Currently, TIM is capable of learning spectral and textural patterns, but should be adaptable to the learning of shapes, as well. Possible applications of TIM include not only content-based image retrieval, but also quantitative image analysis, the generation of metadata for annotating images, data prioritization or data reduction in bandwidth-limited situations, and the construction of components for larger, more complex computer vision algorithms.
Minimizing the semantic gap in biomedical content-based image retrieval
NASA Astrophysics Data System (ADS)
Guan, Haiying; Antani, Sameer; Long, L. Rodney; Thoma, George R.
2010-03-01
A major challenge in biomedical Content-Based Image Retrieval (CBIR) is to achieve meaningful mappings that minimize the semantic gap between the high-level biomedical semantic concepts and the low-level visual features in images. This paper presents a comprehensive learning-based scheme toward meeting this challenge and improving retrieval quality. The article presents two algorithms: a learning-based feature selection and fusion algorithm and the Ranking Support Vector Machine (Ranking SVM) algorithm. The feature selection algorithm aims to select 'good' features and fuse them using different similarity measurements to provide a better representation of the high-level concepts with the low-level image features. Ranking SVM is applied to learn the retrieval rank function and associate the selected low-level features with query concepts, given the ground-truth ranking of the training samples. The proposed scheme addresses four major issues in CBIR to improve the retrieval accuracy: image feature extraction, selection and fusion, similarity measurements, the association of the low-level features with high-level concepts, and the generation of the rank function to support high-level semantic image retrieval. It models the relationship between semantic concepts and image features, and enables retrieval at the semantic level. We apply it to the problem of vertebra shape retrieval from a digitized spine x-ray image set collected by the second National Health and Nutrition Examination Survey (NHANES II). The experimental results show an improvement of up to 41.92% in the mean average precision (MAP) over conventional image similarity computation methods.
Wang, Jian; Xie, Dong; Lin, Hongfei; Yang, Zhihao; Zhang, Yijia
2012-06-21
Many biological processes recognize in particular the importance of protein complexes, and various computational approaches have been developed to identify complexes from protein-protein interaction (PPI) networks. However, high false-positive rate of PPIs leads to challenging identification. A protein semantic similarity measure is proposed in this study, based on the ontology structure of Gene Ontology (GO) terms and GO annotations to estimate the reliability of interactions in PPI networks. Interaction pairs with low GO semantic similarity are removed from the network as unreliable interactions. Then, a cluster-expanding algorithm is used to detect complexes with core-attachment structure on filtered network. Our method is applied to three different yeast PPI networks. The effectiveness of our method is examined on two benchmark complex datasets. Experimental results show that our method performed better than other state-of-the-art approaches in most evaluation metrics. The method detects protein complexes from large scale PPI networks by filtering GO semantic similarity. Removing interactions with low GO similarity significantly improves the performance of complex identification. The expanding strategy is also effective to identify attachment proteins of complexes.
Automatic multi-label annotation of abdominal CT images using CBIR
NASA Astrophysics Data System (ADS)
Xue, Zhiyun; Antani, Sameer; Long, L. Rodney; Thoma, George R.
2017-03-01
We present a technique to annotate multiple organs shown in 2-D abdominal/pelvic CT images using CBIR. This annotation task is motivated by our research interests in visual question-answering (VQA). We aim to apply results from this effort in Open-iSM, a multimodal biomedical search engine developed by the National Library of Medicine (NLM). Understanding visual content of biomedical images is a necessary step for VQA. Though sufficient annotational information about an image may be available in related textual metadata, not all may be useful as descriptive tags, particularly for anatomy on the image. In this paper, we develop and evaluate a multi-label image annotation method using CBIR. We evaluate our method on two 2-D CT image datasets we generated from 3-D volumetric data obtained from a multi-organ segmentation challenge hosted in MICCAI 2015. Shape and spatial layout information is used to encode visual characteristics of the anatomy. We adapt a weighted voting scheme to assign multiple labels to the query image by combining the labels of the images identified as similar by the method. Key parameters that may affect the annotation performance, such as the number of images used in the label voting and the threshold for excluding labels that have low weights, are studied. The method proposes a coarse-to-fine retrieval strategy which integrates the classification with the nearest-neighbor search. Results from our evaluation (using the MICCAI CT image datasets as well as figures from Open-i) are presented.
(Pea)nuts and bolts of visual narrative: Structure and meaning in sequential image comprehension
Cohn, Neil; Paczynski, Martin; Jackendoff, Ray; Holcomb, Phillip J.; Kuperberg, Gina R.
2012-01-01
Just as syntax differentiates coherent sentences from scrambled word strings, the comprehension of sequential images must also use a cognitive system to distinguish coherent narrative sequences from random strings of images. We conducted experiments analogous to two classic studies of language processing to examine the contributions of narrative structure and semantic relatedness to processing sequential images. We compared four types of comic strips: 1) Normal sequences with both structure and meaning, 2) Semantic Only sequences (in which the panels were related to a common semantic theme, but had no narrative structure), 3) Structural Only sequences (narrative structure but no semantic relatedness), and 4) Scrambled sequences of randomly-ordered panels. In Experiment 1, participants monitored for target panels in sequences presented panel-by-panel. Reaction times were slowest to panels in Scrambled sequences, intermediate in both Structural Only and Semantic Only sequences, and fastest in Normal sequences. This suggests that both semantic relatedness and narrative structure offer advantages to processing. Experiment 2 measured ERPs to all panels across the whole sequence. The N300/N400 was largest to panels in both the Scrambled and Structural Only sequences, intermediate in Semantic Only sequences and smallest in the Normal sequences. This implies that a combination of narrative structure and semantic relatedness can facilitate semantic processing of upcoming panels (as reflected by the N300/N400). Also, panels in the Scrambled sequences evoked a larger left-lateralized anterior negativity than panels in the Structural Only sequences. This localized effect was distinct from the N300/N400, and appeared despite the fact that these two sequence types were matched on local semantic relatedness between individual panels. These findings suggest that sequential image comprehension uses a narrative structure that may be independent of semantic relatedness. Altogether, we argue that the comprehension of visual narrative is guided by an interaction between structure and meaning. PMID:22387723
Comprehension of concrete and abstract words in semantic dementia
Jefferies, Elizabeth; Patterson, Karalyn; Jones, Roy W.; Lambon Ralph, Matthew A.
2009-01-01
The vast majority of brain-injured patients with semantic impairment have better comprehension of concrete than abstract words. In contrast, several patients with semantic dementia (SD), who show circumscribed atrophy of the anterior temporal lobes bilaterally, have been reported to show reverse imageability effects, i.e., relative preservation of abstract knowledge. Although these reports largely concern individual patients, some researchers have recently proposed that superior comprehension of abstract concepts is a characteristic feature of SD. This would imply that the anterior temporal lobes are particularly crucial for processing sensory aspects of semantic knowledge, which are associated with concrete not abstract concepts. However, functional neuroimaging studies of healthy participants do not unequivocally predict reverse imageability effects in SD because the temporal poles sometimes show greater activation for more abstract concepts. We examined a case-series of eleven SD patients on a synonym judgement test that orthogonally varied the frequency and imageability of the items. All patients had higher success rates for more imageable as well as more frequent words, suggesting that (a) the anterior temporal lobes underpin semantic knowledge for both concrete and abstract concepts, (b) more imageable items – perhaps due to their richer multimodal representations – are typically more robust in the face of global semantic degradation and (c) reverse imageability effects are not a characteristic feature of SD. PMID:19586212
Garcia Castro, Leyla Jael; Berlanga, Rafael; Garcia, Alexander
2015-10-01
Although full-text articles are provided by the publishers in electronic formats, it remains a challenge to find related work beyond the title and abstract context. Identifying related articles based on their abstract is indeed a good starting point; this process is straightforward and does not consume as many resources as full-text based similarity would require. However, further analyses may require in-depth understanding of the full content. Two articles with highly related abstracts can be substantially different regarding the full content. How similarity differs when considering title-and-abstract versus full-text and which semantic similarity metric provides better results when dealing with full-text articles are the main issues addressed in this manuscript. We have benchmarked three similarity metrics - BM25, PMRA, and Cosine, in order to determine which one performs best when using concept-based annotations on full-text documents. We also evaluated variations in similarity values based on title-and-abstract against those relying on full-text. Our test dataset comprises the Genomics track article collection from the 2005 Text Retrieval Conference. Initially, we used an entity recognition software to semantically annotate titles and abstracts as well as full-text with concepts defined in the Unified Medical Language System (UMLS®). For each article, we created a document profile, i.e., a set of identified concepts, term frequency, and inverse document frequency; we then applied various similarity metrics to those document profiles. We considered correlation, precision, recall, and F1 in order to determine which similarity metric performs best with concept-based annotations. For those full-text articles available in PubMed Central Open Access (PMC-OA), we also performed dispersion analyses in order to understand how similarity varies when considering full-text articles. We have found that the PubMed Related Articles similarity metric is the most suitable for full-text articles annotated with UMLS concepts. For similarity values above 0.8, all metrics exhibited an F1 around 0.2 and a recall around 0.1; BM25 showed the highest precision close to 1; in all cases the concept-based metrics performed better than the word-stem-based one. Our experiments show that similarity values vary when considering only title-and-abstract versus full-text similarity. Therefore, analyses based on full-text become useful when a given research requires going beyond title and abstract, particularly regarding connectivity across articles. Visualization available at ljgarcia.github.io/semsim.benchmark/, data available at http://dx.doi.org/10.5281/zenodo.13323. Copyright © 2015 Elsevier Inc. All rights reserved.
Semantic Annotation of Complex Text Structures in Problem Reports
NASA Technical Reports Server (NTRS)
Malin, Jane T.; Throop, David R.; Fleming, Land D.
2011-01-01
Text analysis is important for effective information retrieval from databases where the critical information is embedded in text fields. Aerospace safety depends on effective retrieval of relevant and related problem reports for the purpose of trend analysis. The complex text syntax in problem descriptions has limited statistical text mining of problem reports. The presentation describes an intelligent tagging approach that applies syntactic and then semantic analysis to overcome this problem. The tags identify types of problems and equipment that are embedded in the text descriptions. The power of these tags is illustrated in a faceted searching and browsing interface for problem report trending that combines automatically generated tags with database code fields and temporal information.
HTML5 microdata as a semantic container for medical information exchange.
Kimura, Eizen; Kobayashi, Shinji; Ishihara, Ken
2014-01-01
Achieving interoperability between clinical electronic medical records (EMR) systems and cloud computing systems is challenging because of the lack of a universal reference method as a standard for information exchange with a secure connection. Here we describe an information exchange scheme using HTML5 microdata, where the standard semantic container is an HTML document. We embed HL7 messages describing laboratory test results in the microdata. We also annotate items in the clinical research report with the microdata. We mapped the laboratory test result data into the clinical research report using an HL7 selector specified in the microdata. This scheme can provide secure cooperation between the cloud-based service and the EMR system.
Alignment of the UMLS semantic network with BioTop: methodology and assessment.
Schulz, Stefan; Beisswanger, Elena; van den Hoek, László; Bodenreider, Olivier; van Mulligen, Erik M
2009-06-15
For many years, the Unified Medical Language System (UMLS) semantic network (SN) has been used as an upper-level semantic framework for the categorization of terms from terminological resources in biomedicine. BioTop has recently been developed as an upper-level ontology for the biomedical domain. In contrast to the SN, it is founded upon strict ontological principles, using OWL DL as a formal representation language, which has become standard in the semantic Web. In order to make logic-based reasoning available for the resources annotated or categorized with the SN, a mapping ontology was developed aligning the SN with BioTop. The theoretical foundations and the practical realization of the alignment are being described, with a focus on the design decisions taken, the problems encountered and the adaptations of BioTop that became necessary. For evaluation purposes, UMLS concept pairs obtained from MEDLINE abstracts by a named entity recognition system were tested for possible semantic relationships. Furthermore, all semantic-type combinations that occur in the UMLS Metathesaurus were checked for satisfiability. The effort-intensive alignment process required major design changes and enhancements of BioTop and brought up several design errors that could be fixed. A comparison between a human curator and the ontology yielded only a low agreement. Ontology reasoning was also used to successfully identify 133 inconsistent semantic-type combinations. BioTop, the OWL DL representation of the UMLS SN, and the mapping ontology are available at http://www.purl.org/biotop/.
Rogers, Timothy T; Hocking, Julia; Noppeney, Uta; Mechelli, Andrea; Gorno-Tempini, Maria Luisa; Patterson, Karalyn; Price, Cathy J
2006-09-01
Studies of semantic impairment arising from brain disease suggest that the anterior temporal lobes are critical for semantic abilities in humans; yet activation of these regions is rarely reported in functional imaging studies of healthy controls performing semantic tasks. Here, we combined neuropsychological and PET functional imaging data to show that when healthy subjects identify concepts at a specific level, the regions activated correspond to the site of maximal atrophy in patients with relatively pure semantic impairment. The stimuli were color photographs of common animals or vehicles, and the task was category verification at specific (e.g., robin), intermediate (e.g., bird), or general (e.g., animal) levels. Specific, relative to general, categorization activated the antero-lateral temporal cortices bilaterally, despite matching of these experimental conditions for difficulty. Critically, in patients with atrophy in precisely these areas, the most pronounced deficit was in the retrieval of specific semantic information.
Jumping across biomedical contexts using compressive data fusion
Zitnik, Marinka; Zupan, Blaz
2016-01-01
Motivation: The rapid growth of diverse biological data allows us to consider interactions between a variety of objects, such as genes, chemicals, molecular signatures, diseases, pathways and environmental exposures. Often, any pair of objects—such as a gene and a disease—can be related in different ways, for example, directly via gene–disease associations or indirectly via functional annotations, chemicals and pathways. Different ways of relating these objects carry different semantic meanings. However, traditional methods disregard these semantics and thus cannot fully exploit their value in data modeling. Results: We present Medusa, an approach to detect size-k modules of objects that, taken together, appear most significant to another set of objects. Medusa operates on large-scale collections of heterogeneous datasets and explicitly distinguishes between diverse data semantics. It advances research along two dimensions: it builds on collective matrix factorization to derive different semantics, and it formulates the growing of the modules as a submodular optimization program. Medusa is flexible in choosing or combining semantic meanings and provides theoretical guarantees about detection quality. In a systematic study on 310 complex diseases, we show the effectiveness of Medusa in associating genes with diseases and detecting disease modules. We demonstrate that in predicting gene–disease associations Medusa compares favorably to methods that ignore diverse semantic meanings. We find that the utility of different semantics depends on disease categories and that, overall, Medusa recovers disease modules more accurately when combining different semantics. Availability and implementation: Source code is at http://github.com/marinkaz/medusa Contact: marinka@cs.stanford.edu, blaz.zupan@fri.uni-lj.si PMID:27307649
NASA Astrophysics Data System (ADS)
Petrie, C.; Margaria, T.; Lausen, H.; Zaremba, M.
Explores trade-offs among existing approaches. Reveals strengths and weaknesses of proposed approaches, as well as which aspects of the problem are not yet covered. Introduces software engineering approach to evaluating semantic web services. Service-Oriented Computing is one of the most promising software engineering trends because of the potential to reduce the programming effort for future distributed industrial systems. However, only a small part of this potential rests on the standardization of tools offered by the web services stack. The larger part of this potential rests upon the development of sufficient semantics to automate service orchestration. Currently there are many different approaches to semantic web service descriptions and many frameworks built around them. A common understanding, evaluation scheme, and test bed to compare and classify these frameworks in terms of their capabilities and shortcomings, is necessary to make progress in developing the full potential of Service-Oriented Computing. The Semantic Web Services Challenge is an open source initiative that provides a public evaluation and certification of multiple frameworks on common industrially-relevant problem sets. This edited volume reports on the first results in developing common understanding of the various technologies intended to facilitate the automation of mediation, choreography and discovery for Web Services using semantic annotations. Semantic Web Services Challenge: Results from the First Year is designed for a professional audience composed of practitioners and researchers in industry. Professionals can use this book to evaluate SWS technology for their potential practical use. The book is also suitable for advanced-level students in computer science.
BiOSS: A system for biomedical ontology selection.
Martínez-Romero, Marcos; Vázquez-Naya, José M; Pereira, Javier; Pazos, Alejandro
2014-04-01
In biomedical informatics, ontologies are considered a key technology for annotating, retrieving and sharing the huge volume of publicly available data. Due to the increasing amount, complexity and variety of existing biomedical ontologies, choosing the ones to be used in a semantic annotation problem or to design a specific application is a difficult task. As a consequence, the design of approaches and tools addressed to facilitate the selection of biomedical ontologies is becoming a priority. In this paper we present BiOSS, a novel system for the selection of biomedical ontologies. BiOSS evaluates the adequacy of an ontology to a given domain according to three different criteria: (1) the extent to which the ontology covers the domain; (2) the semantic richness of the ontology in the domain; (3) the popularity of the ontology in the biomedical community. BiOSS has been applied to 5 representative problems of ontology selection. It also has been compared to existing methods and tools. Results are promising and show the usefulness of BiOSS to solve real-world ontology selection problems. BiOSS is openly available both as a web tool and a web service. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Deep learning based beat event detection in action movie franchises
NASA Astrophysics Data System (ADS)
Ejaz, N.; Khan, U. A.; Martínez-del-Amor, M. A.; Sparenberg, H.
2018-04-01
Automatic understanding and interpretation of movies can be used in a variety of ways to semantically manage the massive volumes of movies data. "Action Movie Franchises" dataset is a collection of twenty Hollywood action movies from five famous franchises with ground truth annotations at shot and beat level of each movie. In this dataset, the annotations are provided for eleven semantic beat categories. In this work, we propose a deep learning based method to classify shots and beat-events on this dataset. The training dataset for each of the eleven beat categories is developed and then a Convolution Neural Network is trained. After finding the shot boundaries, key frames are extracted for each shot and then three classification labels are assigned to each key frame. The classification labels for each of the key frames in a particular shot are then used to assign a unique label to each shot. A simple sliding window based method is then used to group adjacent shots having the same label in order to find a particular beat event. The results of beat event classification are presented based on criteria of precision, recall, and F-measure. The results are compared with the existing technique and significant improvements are recorded.
Masanz, James J; Ogren, Philip V; Zheng, Jiaping; Sohn, Sunghwan; Kipper-Schuler, Karin C; Chute, Christopher G
2010-01-01
We aim to build and evaluate an open-source natural language processing system for information extraction from electronic medical record clinical free-text. We describe and evaluate our system, the clinical Text Analysis and Knowledge Extraction System (cTAKES), released open-source at http://www.ohnlp.org. The cTAKES builds on existing open-source technologies—the Unstructured Information Management Architecture framework and OpenNLP natural language processing toolkit. Its components, specifically trained for the clinical domain, create rich linguistic and semantic annotations. Performance of individual components: sentence boundary detector accuracy=0.949; tokenizer accuracy=0.949; part-of-speech tagger accuracy=0.936; shallow parser F-score=0.924; named entity recognizer and system-level evaluation F-score=0.715 for exact and 0.824 for overlapping spans, and accuracy for concept mapping, negation, and status attributes for exact and overlapping spans of 0.957, 0.943, 0.859, and 0.580, 0.939, and 0.839, respectively. Overall performance is discussed against five applications. The cTAKES annotations are the foundation for methods and modules for higher-level semantic processing of clinical free-text. PMID:20819853
ChemicalTagger: A tool for semantic text-mining in chemistry.
Hawizy, Lezan; Jessop, David M; Adams, Nico; Murray-Rust, Peter
2011-05-16
The primary method for scientific communication is in the form of published scientific articles and theses which use natural language combined with domain-specific terminology. As such, they contain free owing unstructured text. Given the usefulness of data extraction from unstructured literature, we aim to show how this can be achieved for the discipline of chemistry. The highly formulaic style of writing most chemists adopt make their contributions well suited to high-throughput Natural Language Processing (NLP) approaches. We have developed the ChemicalTagger parser as a medium-depth, phrase-based semantic NLP tool for the language of chemical experiments. Tagging is based on a modular architecture and uses a combination of OSCAR, domain-specific regex and English taggers to identify parts-of-speech. The ANTLR grammar is used to structure this into tree-based phrases. Using a metric that allows for overlapping annotations, we achieved machine-annotator agreements of 88.9% for phrase recognition and 91.9% for phrase-type identification (Action names). It is possible parse to chemical experimental text using rule-based techniques in conjunction with a formal grammar parser. ChemicalTagger has been deployed for over 10,000 patents and has identified solvents from their linguistic context with >99.5% precision.
Development Issues on Linked Data Weblog Enrichment
NASA Astrophysics Data System (ADS)
Ruiz-Rube, Iván; Cornejo, Carlos M.; Dodero, Juan Manuel; García, Vicente M.
In this paper, we describe the issues found during the development of LinkedBlog, a Linked Data extension for WordPress blogs. This extension enables to enrich text-based and video information contained in blog entries with RDF triples that are suitable to be stored, managed and exploited by other web-based applications. The issues have to do with the generality, usability, tracking, depth, security, trustiness and performance of the linked data enrichment process. The presented annotation approach aims at maintaining web-based contents independent from the underlying ontological model, by providing a loosely coupled RDFa-based approach in the linked data application. Finally, we detail how the performance of annotations can be improved through a semantic reasoner.
OntoMaton: a bioportal powered ontology widget for Google Spreadsheets.
Maguire, Eamonn; González-Beltrán, Alejandra; Whetzel, Patricia L; Sansone, Susanna-Assunta; Rocca-Serra, Philippe
2013-02-15
Data collection in spreadsheets is ubiquitous, but current solutions lack support for collaborative semantic annotation that would promote shared and interdisciplinary annotation practices, supporting geographically distributed players. OntoMaton is an open source solution that brings ontology lookup and tagging capabilities into a cloud-based collaborative editing environment, harnessing Google Spreadsheets and the NCBO Web services. It is a general purpose, format-agnostic tool that may serve as a component of the ISA software suite. OntoMaton can also be used to assist the ontology development process. OntoMaton is freely available from Google widgets under the CPAL open source license; documentation and examples at: https://github.com/ISA-tools/OntoMaton.
BioTextQuest: a web-based biomedical text mining suite for concept discovery.
Papanikolaou, Nikolas; Pafilis, Evangelos; Nikolaou, Stavros; Ouzounis, Christos A; Iliopoulos, Ioannis; Promponas, Vasilis J
2011-12-01
BioTextQuest combines automated discovery of significant terms in article clusters with structured knowledge annotation, via Named Entity Recognition services, offering interactive user-friendly visualization. A tag-cloud-based illustration of terms labeling each document cluster are semantically annotated according to the biological entity, and a list of document titles enable users to simultaneously compare terms and documents of each cluster, facilitating concept association and hypothesis generation. BioTextQuest allows customization of analysis parameters, e.g. clustering/stemming algorithms, exclusion of documents/significant terms, to better match the biological question addressed. http://biotextquest.biol.ucy.ac.cy vprobon@ucy.ac.cy; iliopj@med.uoc.gr Supplementary data are available at Bioinformatics online.
Use of Co-occurrences for Temporal Expressions Annotation
NASA Astrophysics Data System (ADS)
Craveiro, Olga; Macedo, Joaquim; Madeira, Henrique
The annotation or extraction of temporal information from text documents is becoming increasingly important in many natural language processing applications such as text summarization, information retrieval, question answering, etc.. This paper presents an original method for easy recognition of temporal expressions in text documents. The method creates semantically classified temporal patterns, using word co-occurrences obtained from training corpora and a pre-defined seed keywords set, derived from the used language temporal references. A participation on a Portuguese named entity evaluation contest showed promising effectiveness and efficiency results. This approach can be adapted to recognize other type of expressions or languages, within other contexts, by defining the suitable word sets and training corpora.
SADI, SHARE, and the in silico scientific method
2010-01-01
Background The emergence and uptake of Semantic Web technologies by the Life Sciences provides exciting opportunities for exploring novel ways to conduct in silico science. Web Service Workflows are already becoming first-class objects in “the new way”, and serve as explicit, shareable, referenceable representations of how an experiment was done. In turn, Semantic Web Service projects aim to facilitate workflow construction by biological domain-experts such that workflows can be edited, re-purposed, and re-published by non-informaticians. However the aspects of the scientific method relating to explicit discourse, disagreement, and hypothesis generation have remained relatively impervious to new technologies. Results Here we present SADI and SHARE - a novel Semantic Web Service framework, and a reference implementation of its client libraries. Together, SADI and SHARE allow the semi- or fully-automatic discovery and pipelining of Semantic Web Services in response to ad hoc user queries. Conclusions The semantic behaviours exhibited by SADI and SHARE extend the functionalities provided by Description Logic Reasoners such that novel assertions can be automatically added to a data-set without logical reasoning, but rather by analytical or annotative services. This behaviour might be applied to achieve the “semantification” of those aspects of the in silico scientific method that are not yet supported by Semantic Web technologies. We support this suggestion using an example in the clinical research space. PMID:21210986
Clinical Diagnostics in Human Genetics with Semantic Similarity Searches in Ontologies
Köhler, Sebastian; Schulz, Marcel H.; Krawitz, Peter; Bauer, Sebastian; Dölken, Sandra; Ott, Claus E.; Mundlos, Christine; Horn, Denise; Mundlos, Stefan; Robinson, Peter N.
2009-01-01
The differential diagnostic process attempts to identify candidate diseases that best explain a set of clinical features. This process can be complicated by the fact that the features can have varying degrees of specificity, as well as by the presence of features unrelated to the disease itself. Depending on the experience of the physician and the availability of laboratory tests, clinical abnormalities may be described in greater or lesser detail. We have adapted semantic similarity metrics to measure phenotypic similarity between queries and hereditary diseases annotated with the use of the Human Phenotype Ontology (HPO) and have developed a statistical model to assign p values to the resulting similarity scores, which can be used to rank the candidate diseases. We show that our approach outperforms simpler term-matching approaches that do not take the semantic interrelationships between terms into account. The advantage of our approach was greater for queries containing phenotypic noise or imprecise clinical descriptions. The semantic network defined by the HPO can be used to refine the differential diagnosis by suggesting clinical features that, if present, best differentiate among the candidate diagnoses. Thus, semantic similarity searches in ontologies represent a useful way of harnessing the semantic structure of human phenotypic abnormalities to help with the differential diagnosis. We have implemented our methods in a freely available web application for the field of human Mendelian disorders. PMID:19800049
Medical Image Analysis by Cognitive Information Systems - a Review.
Ogiela, Lidia; Takizawa, Makoto
2016-10-01
This publication presents a review of medical image analysis systems. The paradigms of cognitive information systems will be presented by examples of medical image analysis systems. The semantic processes present as it is applied to different types of medical images. Cognitive information systems were defined on the basis of methods for the semantic analysis and interpretation of information - medical images - applied to cognitive meaning of medical images contained in analyzed data sets. Semantic analysis was proposed to analyzed the meaning of data. Meaning is included in information, for example in medical images. Medical image analysis will be presented and discussed as they are applied to various types of medical images, presented selected human organs, with different pathologies. Those images were analyzed using different classes of cognitive information systems. Cognitive information systems dedicated to medical image analysis was also defined for the decision supporting tasks. This process is very important for example in diagnostic and therapy processes, in the selection of semantic aspects/features, from analyzed data sets. Those features allow to create a new way of analysis.
NASA Astrophysics Data System (ADS)
Wang, Z.; Li, T.; Pan, L.; Kang, Z.
2017-09-01
With increasing attention for the indoor environment and the development of low-cost RGB-D sensors, indoor RGB-D images are easily acquired. However, scene semantic segmentation is still an open area, which restricts indoor applications. The depth information can help to distinguish the regions which are difficult to be segmented out from the RGB images with similar color or texture in the indoor scenes. How to utilize the depth information is the key problem of semantic segmentation for RGB-D images. In this paper, we propose an Encode-Decoder Fully Convolutional Networks for RGB-D image classification. We use Multiple Kernel Maximum Mean Discrepancy (MK-MMD) as a distance measure to find common and special features of RGB and D images in the network to enhance performance of classification automatically. To explore better methods of applying MMD, we designed two strategies; the first calculates MMD for each feature map, and the other calculates MMD for whole batch features. Based on the result of classification, we use the full connect CRFs for the semantic segmentation. The experimental results show that our method can achieve a good performance on indoor RGB-D image semantic segmentation.
The National Cancer Informatics Program (NCIP) Annotation and Image Markup (AIM) Foundation model.
Mongkolwat, Pattanasak; Kleper, Vladimir; Talbot, Skip; Rubin, Daniel
2014-12-01
Knowledge contained within in vivo imaging annotated by human experts or computer programs is typically stored as unstructured text and separated from other associated information. The National Cancer Informatics Program (NCIP) Annotation and Image Markup (AIM) Foundation information model is an evolution of the National Institute of Health's (NIH) National Cancer Institute's (NCI) Cancer Bioinformatics Grid (caBIG®) AIM model. The model applies to various image types created by various techniques and disciplines. It has evolved in response to the feedback and changing demands from the imaging community at NCI. The foundation model serves as a base for other imaging disciplines that want to extend the type of information the model collects. The model captures physical entities and their characteristics, imaging observation entities and their characteristics, markups (two- and three-dimensional), AIM statements, calculations, image source, inferences, annotation role, task context or workflow, audit trail, AIM creator details, equipment used to create AIM instances, subject demographics, and adjudication observations. An AIM instance can be stored as a Digital Imaging and Communications in Medicine (DICOM) structured reporting (SR) object or Extensible Markup Language (XML) document for further processing and analysis. An AIM instance consists of one or more annotations and associated markups of a single finding along with other ancillary information in the AIM model. An annotation describes information about the meaning of pixel data in an image. A markup is a graphical drawing placed on the image that depicts a region of interest. This paper describes fundamental AIM concepts and how to use and extend AIM for various imaging disciplines.
Document image retrieval through word shape coding.
Lu, Shijian; Li, Linlin; Tan, Chew Lim
2008-11-01
This paper presents a document retrieval technique that is capable of searching document images without OCR (optical character recognition). The proposed technique retrieves document images by a new word shape coding scheme, which captures the document content through annotating each word image by a word shape code. In particular, we annotate word images by using a set of topological shape features including character ascenders/descenders, character holes, and character water reservoirs. With the annotated word shape codes, document images can be retrieved by either query keywords or a query document image. Experimental results show that the proposed document image retrieval technique is fast, efficient, and tolerant to various types of document degradation.
Developing national on-line services to annotate and analyse underwater imagery in a research cloud
NASA Astrophysics Data System (ADS)
Proctor, R.; Langlois, T.; Friedman, A.; Davey, B.
2017-12-01
Fish image annotation data is currently collected by various research, management and academic institutions globally (+100,000's hours of deployments) with varying degrees of standardisation and limited formal collaboration or data synthesis. We present a case study of how national on-line services, developed within a domain-oriented research cloud, have been used to annotate habitat images and synthesise fish annotation data sets collected using Autonomous Underwater Vehicles (AUVs) and baited remote underwater stereo-video (stereo-BRUV). Two developing software tools have been brought together in the marine science cloud to provide marine biologists with a powerful service for image annotation. SQUIDLE+ is an online platform designed for exploration, management and annotation of georeferenced images & video data. It provides a flexible annotation framework allowing users to work with their preferred annotation schemes. We have used SQUIDLE+ to sample the habitat composition and complexity of images of the benthos collected using stereo-BRUV. GlobalArchive is designed to be a centralised repository of aquatic ecological survey data with design principles including ease of use, secure user access, flexible data import, and the collection of any sampling and image analysis information. To easily share and synthesise data we have implemented data sharing protocols, including Open Data and synthesis Collaborations, and a spatial map to explore global datasets and filter to create a synthesis. These tools in the science cloud, together with a virtual desktop analysis suite offering python and R environments offer an unprecedented capability to deliver marine biodiversity information of value to marine managers and scientists alike.
NASA Astrophysics Data System (ADS)
Chmiel, P.; Ganzha, M.; Jaworska, T.; Paprzycki, M.
2017-10-01
Nowadays, as a part of systematic growth of volume, and variety, of information that can be found on the Internet, we observe also dramatic increase in sizes of available image collections. There are many ways to help users browsing / selecting images of interest. One of popular approaches are Content-Based Image Retrieval (CBIR) systems, which allow users to search for images that match their interests, expressed in the form of images (query by example). However, we believe that image search and retrieval could take advantage of semantic technologies. We have decided to test this hypothesis. Specifically, on the basis of knowledge captured in the CBIR, we have developed a domain ontology of residential real estate (detached houses, in particular). This allows us to semantically represent each image (and its constitutive architectural elements) represented within the CBIR. The proposed ontology was extended to capture not only the elements resulting from image segmentation, but also "spatial relations" between them. As a result, a new approach to querying the image database (semantic querying) has materialized, thus extending capabilities of the developed system.
Wu, Zhenyu; Xu, Yuan; Yang, Yunong; Zhang, Chunhong; Zhu, Xinning; Ji, Yang
2017-02-20
Web of Things (WoT) facilitates the discovery and interoperability of Internet of Things (IoT) devices in a cyber-physical system (CPS). Moreover, a uniform knowledge representation of physical resources is quite necessary for further composition, collaboration, and decision-making process in CPS. Though several efforts have integrated semantics with WoT, such as knowledge engineering methods based on semantic sensor networks (SSN), it still could not represent the complex relationships between devices when dynamic composition and collaboration occur, and it totally depends on manual construction of a knowledge base with low scalability. In this paper, to addresses these limitations, we propose the semantic Web of Things (SWoT) framework for CPS (SWoT4CPS). SWoT4CPS provides a hybrid solution with both ontological engineering methods by extending SSN and machine learning methods based on an entity linking (EL) model. To testify to the feasibility and performance, we demonstrate the framework by implementing a temperature anomaly diagnosis and automatic control use case in a building automation system. Evaluation results on the EL method show that linking domain knowledge to DBpedia has a relative high accuracy and the time complexity is at a tolerant level. Advantages and disadvantages of SWoT4CPS with future work are also discussed.
Image Annotation and Topic Extraction Using Super-Word Latent Dirichlet Allocation
2013-09-01
an image can be used to improve automated image annotation performance over existing generalized annotators. Second, image anno - 3 tations can be used...the other variables. The first ratio in the sampling Equation 2.18 uses word frequency by total words, φ̂ (w) j . The second ratio divides word...topics by total words in that document θ̂ (d) j . Both leave out the current assignment of zi and the results are used to randomly choose a new topic
Image annotation based on positive-negative instances learning
NASA Astrophysics Data System (ADS)
Zhang, Kai; Hu, Jiwei; Liu, Quan; Lou, Ping
2017-07-01
Automatic image annotation is now a tough task in computer vision, the main sense of this tech is to deal with managing the massive image on the Internet and assisting intelligent retrieval. This paper designs a new image annotation model based on visual bag of words, using the low level features like color and texture information as well as mid-level feature as SIFT, and mixture the pic2pic, label2pic and label2label correlation to measure the correlation degree of labels and images. We aim to prune the specific features for each single label and formalize the annotation task as a learning process base on Positive-Negative Instances Learning. Experiments are performed using the Corel5K Dataset, and provide a quite promising result when comparing with other existing methods.
Image Retrieval by Color Semantics with Incomplete Knowledge.
ERIC Educational Resources Information Center
Corridoni, Jacopo M.; Del Bimbo, Alberto; Vicario, Enrico
1998-01-01
Presents a system which supports image retrieval by high-level chromatic contents, the sensations that color accordances generate on the observer. Surveys Itten's theory of color semantics and discusses image description and query specification. Presents examples of visual querying. (AEF)
Semantic Information Extraction of Lanes Based on Onboard Camera Videos
NASA Astrophysics Data System (ADS)
Tang, L.; Deng, T.; Ren, C.
2018-04-01
In the field of autonomous driving, semantic information of lanes is very important. This paper proposes a method of automatic detection of lanes and extraction of semantic information from onboard camera videos. The proposed method firstly detects the edges of lanes by the grayscale gradient direction, and improves the Probabilistic Hough transform to fit them; then, it uses the vanishing point principle to calculate the lane geometrical position, and uses lane characteristics to extract lane semantic information by the classification of decision trees. In the experiment, 216 road video images captured by a camera mounted onboard a moving vehicle were used to detect lanes and extract lane semantic information. The results show that the proposed method can accurately identify lane semantics from video images.
Rutllant, Josep
2016-01-01
Comparative genomics approaches provide a means of leveraging functional genomics information from a highly annotated model organism's genome (such as the mouse genome) in order to make physiological inferences about the role of genes and proteins in a less characterized organism's genome (such as the Burmese python). We employed a comparative genomics approach to produce the functional annotation of Python bivittatus genes encoding proteins associated with sperm phenotypes. We identify 129 gene-phenotype relationships in the python which are implicated in 10 specific sperm phenotypes. Results obtained through our systematic analysis identified subsets of python genes exhibiting associations with gene ontology annotation terms. Functional annotation data was represented in a semantic scatter plot. Together, these newly annotated Python bivittatus genome resources provide a high resolution framework from which the biology relating to reptile spermatogenesis, fertility, and reproduction can be further investigated. Applications of our research include (1) production of genetic diagnostics for assessing fertility in domestic and wild reptiles; (2) enhanced assisted reproduction technology for endangered and captive reptiles; and (3) novel molecular targets for biotechnology-based approaches aimed at reducing fertility and reproduction of invasive reptiles. Additional enhancements to reptile genomic resources will further enhance their value. PMID:27200191
Irizarry, Kristopher J L; Rutllant, Josep
2016-01-01
Comparative genomics approaches provide a means of leveraging functional genomics information from a highly annotated model organism's genome (such as the mouse genome) in order to make physiological inferences about the role of genes and proteins in a less characterized organism's genome (such as the Burmese python). We employed a comparative genomics approach to produce the functional annotation of Python bivittatus genes encoding proteins associated with sperm phenotypes. We identify 129 gene-phenotype relationships in the python which are implicated in 10 specific sperm phenotypes. Results obtained through our systematic analysis identified subsets of python genes exhibiting associations with gene ontology annotation terms. Functional annotation data was represented in a semantic scatter plot. Together, these newly annotated Python bivittatus genome resources provide a high resolution framework from which the biology relating to reptile spermatogenesis, fertility, and reproduction can be further investigated. Applications of our research include (1) production of genetic diagnostics for assessing fertility in domestic and wild reptiles; (2) enhanced assisted reproduction technology for endangered and captive reptiles; and (3) novel molecular targets for biotechnology-based approaches aimed at reducing fertility and reproduction of invasive reptiles. Additional enhancements to reptile genomic resources will further enhance their value.
ERIC Educational Resources Information Center
Garcia-Barriocanal, Elena; Sicilia, Miguel-Angel; Sanchez-Alonso, Salvador; Lytras, Miltiadis
2011-01-01
Web 2.0 technologies can be considered a loosely defined set of Web application styles that foster a kind of media consumer more engaged, and usually active in creating and maintaining Internet contents. Thus, Web 2.0 applications have resulted in increased user participation and massive user-generated (or user-published) open multimedia content,…
Active Wiki Knowledge Repository
2012-10-01
data using SPARQL queries or RESTful web-services; ‘gardening’ tools for examining the semantically tagged content in the wiki; high-level language tool...Tagging & RDF triple-store Fusion and inferences for collaboration Tools for Consuming Data SPARQL queries or RESTful WS Inference & Gardening tools...other stores using AW SPARQL queries and rendering templates; and 4) Interactively share maps and other content using annotation tools to post notes
Semantics and technologies in modern design of interior stairs
NASA Astrophysics Data System (ADS)
Kukhta, M.; Sokolov, A.; Pelevin, E.
2015-10-01
Use of metal in the design of interior stairs presents new features for shaping, and can be implemented using different technologies. The article discusses the features of design and production technologies of forged metal spiral staircase considering the image semantics based on the historical and cultural heritage. To achieve the objective was applied structural- semantic method (to identify the organization of structure and semantic features of the artistic image), engineering methods (to justify the construction of the object), anthropometry method and ergonomics (to provide usability), methods of comparative analysis (to reveale the features of the way the ladder in different periods of culture). According to the research results are as follows. Was revealed the semantics influence on the design of interior staircase that is based on the World Tree image. Also was suggested rational calculation of steps to ensure the required strength. And finally was presented technology, providing the realization of the artistic image. In the practical part of the work is presented version of forged staircase.
Federated ontology-based queries over cancer data
2012-01-01
Background Personalised medicine provides patients with treatments that are specific to their genetic profiles. It requires efficient data sharing of disparate data types across a variety of scientific disciplines, such as molecular biology, pathology, radiology and clinical practice. Personalised medicine aims to offer the safest and most effective therapeutic strategy based on the gene variations of each subject. In particular, this is valid in oncology, where knowledge about genetic mutations has already led to new therapies. Current molecular biology techniques (microarrays, proteomics, epigenetic technology and improved DNA sequencing technology) enable better characterisation of cancer tumours. The vast amounts of data, however, coupled with the use of different terms - or semantic heterogeneity - in each discipline makes the retrieval and integration of information difficult. Results Existing software infrastructures for data-sharing in the cancer domain, such as caGrid, support access to distributed information. caGrid follows a service-oriented model-driven architecture. Each data source in caGrid is associated with metadata at increasing levels of abstraction, including syntactic, structural, reference and domain metadata. The domain metadata consists of ontology-based annotations associated with the structural information of each data source. However, caGrid's current querying functionality is given at the structural metadata level, without capitalising on the ontology-based annotations. This paper presents the design of and theoretical foundations for distributed ontology-based queries over cancer research data. Concept-based queries are reformulated to the target query language, where join conditions between multiple data sources are found by exploiting the semantic annotations. The system has been implemented, as a proof of concept, over the caGrid infrastructure. The approach is applicable to other model-driven architectures. A graphical user interface has been developed, supporting ontology-based queries over caGrid data sources. An extensive evaluation of the query reformulation technique is included. Conclusions To support personalised medicine in oncology, it is crucial to retrieve and integrate molecular, pathology, radiology and clinical data in an efficient manner. The semantic heterogeneity of the data makes this a challenging task. Ontologies provide a formal framework to support querying and integration. This paper provides an ontology-based solution for querying distributed databases over service-oriented, model-driven infrastructures. PMID:22373043
Concept annotation in the CRAFT corpus.
Bada, Michael; Eckert, Miriam; Evans, Donald; Garcia, Kristin; Shipley, Krista; Sitnikov, Dmitry; Baumgartner, William A; Cohen, K Bretonnel; Verspoor, Karin; Blake, Judith A; Hunter, Lawrence E
2012-07-09
Manually annotated corpora are critical for the training and evaluation of automated methods to identify concepts in biomedical text. This paper presents the concept annotations of the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of 97 full-length, open-access biomedical journal articles that have been annotated both semantically and syntactically to serve as a research resource for the biomedical natural-language-processing (NLP) community. CRAFT identifies all mentions of nearly all concepts from nine prominent biomedical ontologies and terminologies: the Cell Type Ontology, the Chemical Entities of Biological Interest ontology, the NCBI Taxonomy, the Protein Ontology, the Sequence Ontology, the entries of the Entrez Gene database, and the three subontologies of the Gene Ontology. The first public release includes the annotations for 67 of the 97 articles, reserving two sets of 15 articles for future text-mining competitions (after which these too will be released). Concept annotations were created based on a single set of guidelines, which has enabled us to achieve consistently high interannotator agreement. As the initial 67-article release contains more than 560,000 tokens (and the full set more than 790,000 tokens), our corpus is among the largest gold-standard annotated biomedical corpora. Unlike most others, the journal articles that comprise the corpus are drawn from diverse biomedical disciplines and are marked up in their entirety. Additionally, with a concept-annotation count of nearly 100,000 in the 67-article subset (and more than 140,000 in the full collection), the scale of conceptual markup is also among the largest of comparable corpora. The concept annotations of the CRAFT Corpus have the potential to significantly advance biomedical text mining by providing a high-quality gold standard for NLP systems. The corpus, annotation guidelines, and other associated resources are freely available at http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml.
Concept annotation in the CRAFT corpus
2012-01-01
Background Manually annotated corpora are critical for the training and evaluation of automated methods to identify concepts in biomedical text. Results This paper presents the concept annotations of the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of 97 full-length, open-access biomedical journal articles that have been annotated both semantically and syntactically to serve as a research resource for the biomedical natural-language-processing (NLP) community. CRAFT identifies all mentions of nearly all concepts from nine prominent biomedical ontologies and terminologies: the Cell Type Ontology, the Chemical Entities of Biological Interest ontology, the NCBI Taxonomy, the Protein Ontology, the Sequence Ontology, the entries of the Entrez Gene database, and the three subontologies of the Gene Ontology. The first public release includes the annotations for 67 of the 97 articles, reserving two sets of 15 articles for future text-mining competitions (after which these too will be released). Concept annotations were created based on a single set of guidelines, which has enabled us to achieve consistently high interannotator agreement. Conclusions As the initial 67-article release contains more than 560,000 tokens (and the full set more than 790,000 tokens), our corpus is among the largest gold-standard annotated biomedical corpora. Unlike most others, the journal articles that comprise the corpus are drawn from diverse biomedical disciplines and are marked up in their entirety. Additionally, with a concept-annotation count of nearly 100,000 in the 67-article subset (and more than 140,000 in the full collection), the scale of conceptual markup is also among the largest of comparable corpora. The concept annotations of the CRAFT Corpus have the potential to significantly advance biomedical text mining by providing a high-quality gold standard for NLP systems. The corpus, annotation guidelines, and other associated resources are freely available at http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml. PMID:22776079
Crowdtruth validation: a new paradigm for validating algorithms that rely on image correspondences.
Maier-Hein, Lena; Kondermann, Daniel; Roß, Tobias; Mersmann, Sven; Heim, Eric; Bodenstedt, Sebastian; Kenngott, Hannes Götz; Sanchez, Alexandro; Wagner, Martin; Preukschas, Anas; Wekerle, Anna-Laura; Helfert, Stefanie; März, Keno; Mehrabi, Arianeb; Speidel, Stefanie; Stock, Christian
2015-08-01
Feature tracking and 3D surface reconstruction are key enabling techniques to computer-assisted minimally invasive surgery. One of the major bottlenecks related to training and validation of new algorithms is the lack of large amounts of annotated images that fully capture the wide range of anatomical/scene variance in clinical practice. To address this issue, we propose a novel approach to obtaining large numbers of high-quality reference image annotations at low cost in an extremely short period of time. The concept is based on outsourcing the correspondence search to a crowd of anonymous users from an online community (crowdsourcing) and comprises four stages: (1) feature detection, (2) correspondence search via crowdsourcing, (3) merging multiple annotations per feature by fitting Gaussian finite mixture models, (4) outlier removal using the result of the clustering as input for a second annotation task. On average, 10,000 annotations were obtained within 24 h at a cost of $100. The annotation of the crowd after clustering and before outlier removal was of expert quality with a median distance of about 1 pixel to a publically available reference annotation. The threshold for the outlier removal task directly determines the maximum annotation error, but also the number of points removed. Our concept is a novel and effective method for fast, low-cost and highly accurate correspondence generation that could be adapted to various other applications related to large-scale data annotation in medical image computing and computer-assisted interventions.
Visual analytics for semantic queries of TerraSAR-X image content
NASA Astrophysics Data System (ADS)
Espinoza-Molina, Daniela; Alonso, Kevin; Datcu, Mihai
2015-10-01
With the continuous image product acquisition of satellite missions, the size of the image archives is considerably increasing every day as well as the variety and complexity of their content, surpassing the end-user capacity to analyse and exploit them. Advances in the image retrieval field have contributed to the development of tools for interactive exploration and extraction of the images from huge archives using different parameters like metadata, key-words, and basic image descriptors. Even though we count on more powerful tools for automated image retrieval and data analysis, we still face the problem of understanding and analyzing the results. Thus, a systematic computational analysis of these results is required in order to provide to the end-user a summary of the archive content in comprehensible terms. In this context, visual analytics combines automated analysis with interactive visualizations analysis techniques for an effective understanding, reasoning and decision making on the basis of very large and complex datasets. Moreover, currently several researches are focused on associating the content of the images with semantic definitions for describing the data in a format to be easily understood by the end-user. In this paper, we present our approach for computing visual analytics and semantically querying the TerraSAR-X archive. Our approach is mainly composed of four steps: 1) the generation of a data model that explains the information contained in a TerraSAR-X product. The model is formed by primitive descriptors and metadata entries, 2) the storage of this model in a database system, 3) the semantic definition of the image content based on machine learning algorithms and relevance feedback, and 4) querying the image archive using semantic descriptors as query parameters and computing the statistical analysis of the query results. The experimental results shows that with the help of visual analytics and semantic definitions we are able to explain the image content using semantic terms and the relations between them answering questions such as what is the percentage of urban area in a region? or what is the distribution of water bodies in a city?
Semi-automated ontology generation and evolution
NASA Astrophysics Data System (ADS)
Stirtzinger, Anthony P.; Anken, Craig S.
2009-05-01
Extending the notion of data models or object models, ontology can provide rich semantic definition not only to the meta-data but also to the instance data of domain knowledge, making these semantic definitions available in machine readable form. However, the generation of an effective ontology is a difficult task involving considerable labor and skill. This paper discusses an Ontology Generation and Evolution Processor (OGEP) aimed at automating this process, only requesting user input when un-resolvable ambiguous situations occur. OGEP directly attacks the main barrier which prevents automated (or self learning) ontology generation: the ability to understand the meaning of artifacts and the relationships the artifacts have to the domain space. OGEP leverages existing lexical to ontological mappings in the form of WordNet, and Suggested Upper Merged Ontology (SUMO) integrated with a semantic pattern-based structure referred to as the Semantic Grounding Mechanism (SGM) and implemented as a Corpus Reasoner. The OGEP processing is initiated by a Corpus Parser performing a lexical analysis of the corpus, reading in a document (or corpus) and preparing it for processing by annotating words and phrases. After the Corpus Parser is done, the Corpus Reasoner uses the parts of speech output to determine the semantic meaning of a word or phrase. The Corpus Reasoner is the crux of the OGEP system, analyzing, extrapolating, and evolving data from free text into cohesive semantic relationships. The Semantic Grounding Mechanism provides a basis for identifying and mapping semantic relationships. By blending together the WordNet lexicon and SUMO ontological layout, the SGM is given breadth and depth in its ability to extrapolate semantic relationships between domain entities. The combination of all these components results in an innovative approach to user assisted semantic-based ontology generation. This paper will describe the OGEP technology in the context of the architectural components referenced above and identify a potential technology transition path to Scott AFB's Tanker Airlift Control Center (TACC) which serves as the Air Operations Center (AOC) for the Air Mobility Command (AMC).
Towards a semantic PACS: Using Semantic Web technology to represent imaging data.
Van Soest, Johan; Lustberg, Tim; Grittner, Detlef; Marshall, M Scott; Persoon, Lucas; Nijsten, Bas; Feltens, Peter; Dekker, Andre
2014-01-01
The DICOM standard is ubiquitous within medicine. However, improved DICOM semantics would significantly enhance search operations. Furthermore, databases of current PACS systems are not flexible enough for the demands within image analysis research. In this paper, we investigated if we can use Semantic Web technology, to store and represent metadata of DICOM image files, as well as linking additional computational results to image metadata. Therefore, we developed a proof of concept containing two applications: one to store commonly used DICOM metadata in an RDF repository, and one to calculate imaging biomarkers based on DICOM images, and store the biomarker values in an RDF repository. This enabled us to search for all patients with a gross tumor volume calculated to be larger than 50 cc. We have shown that we can successfully store the DICOM metadata in an RDF repository and are refining our proof of concept with regards to volume naming, value representation, and the applications themselves.
Automated geospatial Web Services composition based on geodata quality requirements
NASA Astrophysics Data System (ADS)
Cruz, Sérgio A. B.; Monteiro, Antonio M. V.; Santos, Rafael
2012-10-01
Service-Oriented Architecture and Web Services technologies improve the performance of activities involved in geospatial analysis with a distributed computing architecture. However, the design of the geospatial analysis process on this platform, by combining component Web Services, presents some open issues. The automated construction of these compositions represents an important research topic. Some approaches to solving this problem are based on AI planning methods coupled with semantic service descriptions. This work presents a new approach using AI planning methods to improve the robustness of the produced geospatial Web Services composition. For this purpose, we use semantic descriptions of geospatial data quality requirements in a rule-based form. These rules allow the semantic annotation of geospatial data and, coupled with the conditional planning method, this approach represents more precisely the situations of nonconformities with geodata quality that may occur during the execution of the Web Service composition. The service compositions produced by this method are more robust, thus improving process reliability when working with a composition of chained geospatial Web Services.
Semantic retrieval and navigation in clinical document collections.
Kreuzthaler, Markus; Daumke, Philipp; Schulz, Stefan
2015-01-01
Patients with chronic diseases undergo numerous in- and outpatient treatment periods, and therefore many documents accumulate in their electronic records. We report on an on-going project focussing on the semantic enrichment of medical texts, in order to support recall-oriented navigation across a patient's complete documentation. A document pool of 1,696 de-identified discharge summaries was used for prototyping. A natural language processing toolset for document annotation (based on the text-mining framework UIMA) and indexing (Solr) was used to support a browser-based platform for document import, search and navigation. The integrated search engine combines free text and concept-based querying, supported by dynamically generated facets (diagnoses, procedures, medications, lab values, and body parts). The prototype demonstrates the feasibility of semantic document enrichment within document collections of a single patient. Originally conceived as an add-on for the clinical workplace, this technology could also be adapted to support personalised health record platforms, as well as cross-patient search for cohort building and other secondary use scenarios.
Web information retrieval based on ontology
NASA Astrophysics Data System (ADS)
Zhang, Jian
2013-03-01
The purpose of the Information Retrieval (IR) is to find a set of documents that are relevant for a specific information need of a user. Traditional Information Retrieval model commonly used in commercial search engine is based on keyword indexing system and Boolean logic queries. One big drawback of traditional information retrieval is that they typically retrieve information without an explicitly defined domain of interest to the users so that a lot of no relevance information returns to users, which burden the user to pick up useful answer from these no relevance results. In order to tackle this issue, many semantic web information retrieval models have been proposed recently. The main advantage of Semantic Web is to enhance search mechanisms with the use of Ontology's mechanisms. In this paper, we present our approach to personalize web search engine based on ontology. In addition, key techniques are also discussed in our paper. Compared to previous research, our works concentrate on the semantic similarity and the whole process including query submission and information annotation.
NASA Astrophysics Data System (ADS)
Lin, Po-Chuan; Chen, Bo-Wei; Chang, Hangbae
2016-07-01
This study presents a human-centric technique for social video expansion based on semantic processing and graph analysis. The objective is to increase metadata of an online video and to explore related information, thereby facilitating user browsing activities. To analyze the semantic meaning of a video, shots and scenes are firstly extracted from the video on the server side. Subsequently, this study uses annotations along with ConceptNet to establish the underlying framework. Detailed metadata, including visual objects and audio events among the predefined categories, are indexed by using the proposed method. Furthermore, relevant online media associated with each category are also analyzed to enrich the existing content. With the above-mentioned information, users can easily browse and search the content according to the link analysis and its complementary knowledge. Experiments on a video dataset are conducted for evaluation. The results show that our system can achieve satisfactory performance, thereby demonstrating the feasibility of the proposed idea.
Dutta, Pritha; Basu, Subhadip; Kundu, Mahantapas
2017-03-31
The semantic similarity between two interacting proteins can be estimated by combining the similarity scores of the GO terms associated with the proteins. Greater number of similar GO annotations between two proteins indicates greater interaction affinity. Existing semantic similarity measures make use of the GO graph structure, the information content of GO terms, or a combination of both. In this paper, we present a hybrid approach which utilizes both the topological features of the GO graph and information contents of the GO terms. More specifically, we 1) consider a fuzzy clustering of the GO graph based on the level of association of the GO terms, 2) estimate the GO term memberships to each cluster center based on the respective shortest path lengths, and 3) assign weightage to GO term pairs on the basis of their dissimilarity with respect to the cluster centers. We test the performance of our semantic similarity measure against seven other previously published similarity measures using benchmark protein-protein interaction datasets of Homo sapiens and Saccharomyces cerevisiae based on sequence similarity, Pfam similarity, area under ROC curve and F1 measure.
Marsch, Amanda F; Espiritu, Baltazar; Groth, John; Hutchens, Kelli A
2014-06-01
With today's technology, paraffin-embedded, hematoxylin & eosin-stained pathology slides can be scanned to generate high quality virtual slides. Using proprietary software, digital images can also be annotated with arrows, circles and boxes to highlight certain diagnostic features. Previous studies assessing digital microscopy as a teaching tool did not involve the annotation of digital images. The objective of this study was to compare the effectiveness of annotated digital pathology slides versus non-annotated digital pathology slides as a teaching tool during dermatology and pathology residencies. A study group composed of 31 dermatology and pathology residents was asked to complete an online pre-quiz consisting of 20 multiple choice style questions, each associated with a static digital pathology image. After completion, participants were given access to an online tutorial composed of digitally annotated pathology slides and subsequently asked to complete a post-quiz. A control group of 12 residents completed a non-annotated version of the tutorial. Nearly all participants in the study group improved their quiz score, with an average improvement of 17%, versus only 3% (P = 0.005) in the control group. These results support the notion that annotated digital pathology slides are superior to non-annotated slides for the purpose of resident education. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
The KIT Motion-Language Dataset.
Plappert, Matthias; Mandery, Christian; Asfour, Tamim
2016-12-01
Linking human motion and natural language is of great interest for the generation of semantic representations of human activities as well as for the generation of robot activities based on natural language input. However, although there have been years of research in this area, no standardized and openly available data set exists to support the development and evaluation of such systems. We, therefore, propose the Karlsruhe Institute of Technology (KIT) Motion-Language Dataset, which is large, open, and extensible. We aggregate data from multiple motion capture databases and include them in our data set using a unified representation that is independent of the capture system or marker set, making it easy to work with the data regardless of its origin. To obtain motion annotations in natural language, we apply a crowd-sourcing approach and a web-based tool that was specifically build for this purpose, the Motion Annotation Tool. We thoroughly document the annotation process itself and discuss gamification methods that we used to keep annotators motivated. We further propose a novel method, perplexity-based selection, which systematically selects motions for further annotation that are either under-represented in our data set or that have erroneous annotations. We show that our method mitigates the two aforementioned problems and ensures a systematic annotation process. We provide an in-depth analysis of the structure and contents of our resulting data set, which, as of October 10, 2016, contains 3911 motions with a total duration of 11.23 hours and 6278 annotations in natural language that contain 52,903 words. We believe this makes our data set an excellent choice that enables more transparent and comparable research in this important area.
TwiMed: Twitter and PubMed Comparable Corpus of Drugs, Diseases, Symptoms, and Their Relations
Miyao, Yusuke; Collier, Nigel
2017-01-01
Background Work on pharmacovigilance systems using texts from PubMed and Twitter typically target at different elements and use different annotation guidelines resulting in a scenario where there is no comparable set of documents from both Twitter and PubMed annotated in the same manner. Objective This study aimed to provide a comparable corpus of texts from PubMed and Twitter that can be used to study drug reports from these two sources of information, allowing researchers in the area of pharmacovigilance using natural language processing (NLP) to perform experiments to better understand the similarities and differences between drug reports in Twitter and PubMed. Methods We produced a corpus comprising 1000 tweets and 1000 PubMed sentences selected using the same strategy and annotated at entity level by the same experts (pharmacists) using the same set of guidelines. Results The resulting corpus, annotated by two pharmacists, comprises semantically correct annotations for a set of drugs, diseases, and symptoms. This corpus contains the annotations for 3144 entities, 2749 relations, and 5003 attributes. Conclusions We present a corpus that is unique in its characteristics as this is the first corpus for pharmacovigilance curated from Twitter messages and PubMed sentences using the same data selection and annotation strategies. We believe this corpus will be of particular interest for researchers willing to compare results from pharmacovigilance systems (eg, classifiers and named entity recognition systems) when using data from Twitter and from PubMed. We hope that given the comprehensive set of drug names and the annotated entities and relations, this corpus becomes a standard resource to compare results from different pharmacovigilance studies in the area of NLP. PMID:28468748
TwiMed: Twitter and PubMed Comparable Corpus of Drugs, Diseases, Symptoms, and Their Relations.
Alvaro, Nestor; Miyao, Yusuke; Collier, Nigel
2017-05-03
Work on pharmacovigilance systems using texts from PubMed and Twitter typically target at different elements and use different annotation guidelines resulting in a scenario where there is no comparable set of documents from both Twitter and PubMed annotated in the same manner. This study aimed to provide a comparable corpus of texts from PubMed and Twitter that can be used to study drug reports from these two sources of information, allowing researchers in the area of pharmacovigilance using natural language processing (NLP) to perform experiments to better understand the similarities and differences between drug reports in Twitter and PubMed. We produced a corpus comprising 1000 tweets and 1000 PubMed sentences selected using the same strategy and annotated at entity level by the same experts (pharmacists) using the same set of guidelines. The resulting corpus, annotated by two pharmacists, comprises semantically correct annotations for a set of drugs, diseases, and symptoms. This corpus contains the annotations for 3144 entities, 2749 relations, and 5003 attributes. We present a corpus that is unique in its characteristics as this is the first corpus for pharmacovigilance curated from Twitter messages and PubMed sentences using the same data selection and annotation strategies. We believe this corpus will be of particular interest for researchers willing to compare results from pharmacovigilance systems (eg, classifiers and named entity recognition systems) when using data from Twitter and from PubMed. We hope that given the comprehensive set of drug names and the annotated entities and relations, this corpus becomes a standard resource to compare results from different pharmacovigilance studies in the area of NLP. ©Nestor Alvaro, Yusuke Miyao, Nigel Collier. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 03.05.2017.
CHEMICAL STRUCTURE INDEXING OF TOXICITY DATA ON ...
Standardized chemical structure annotation of public toxicity databases and information resources is playing an increasingly important role in the 'flattening' and integration of diverse sets of biological activity data on the Internet. This review discusses public initiatives that are accelerating the pace of this transformation, with particular reference to toxicology-related chemical information. Chemical content annotators, structure locator services, large structure/data aggregator web sites, structure browsers, International Union of Pure and Applied Chemistry (IUPAC) International Chemical Identifier (InChI) codes, toxicity data models and public chemical/biological activity profiling initiatives are all playing a role in overcoming barriers to the integration of toxicity data, and are bringing researchers closer to the reality of a mineable chemical Semantic Web. An example of this integration of data is provided by the collaboration among researchers involved with the Distributed Structure-Searchable Toxicity (DSSTox) project, the Carcinogenic Potency Project, projects at the National Cancer Institute and the PubChem database. Standardizing chemical structure annotation of public toxicity databases
NASA Astrophysics Data System (ADS)
van Noort, Thomas; Achten, Peter; Plasmeijer, Rinus
We present a typical synergy between dynamic types (dynamics) and generalised algebraic datatypes (GADTs). The former provides a clean approach to integrating dynamic typing in a statically typed language. It allows values to be wrapped together with their type in a uniform package, deferring type unification until run time using a pattern match annotated with the desired type. The latter allows for the explicit specification of constructor types, as to enforce their structural validity. In contrast to ADTs, GADTs are heterogeneous structures since each constructor type is implicitly universally quantified. Unfortunately, pattern matching only enforces structural validity and does not provide instantiation information on polymorphic types. Consequently, functions that manipulate such values, such as a type-safe update function, are cumbersome due to boilerplate type representation administration. In this paper we focus on improving such functions by providing a new GADT annotation via a natural synergy with dynamics. We formally define the semantics of the annotation and touch on novel other applications of this technique such as type dispatching and enforcing type equality invariants on GADT values.
Latent Semantic Analysis as a Method of Content-Based Image Retrieval in Medical Applications
ERIC Educational Resources Information Center
Makovoz, Gennadiy
2010-01-01
The research investigated whether a Latent Semantic Analysis (LSA)-based approach to image retrieval can map pixel intensity into a smaller concept space with good accuracy and reasonable computational cost. From a large set of M computed tomography (CT) images, a retrieval query found all images for a particular patient based on semantic…
Visual Pattern Analysis in Histopathology Images Using Bag of Features
NASA Astrophysics Data System (ADS)
Cruz-Roa, Angel; Caicedo, Juan C.; González, Fabio A.
This paper presents a framework to analyse visual patterns in a collection of medical images in a two stage procedure. First, a set of representative visual patterns from the image collection is obtained by constructing a visual-word dictionary under a bag-of-features approach. Second, an analysis of the relationships between visual patterns and semantic concepts in the image collection is performed. The most important visual patterns for each semantic concept are identified using correlation analysis. A matrix visualization of the structure and organization of the image collection is generated using a cluster analysis. The experimental evaluation was conducted on a histopathology image collection and results showed clear relationships between visual patterns and semantic concepts, that in addition, are of easy interpretation and understanding.
Peters, Frédéric; Majerus, Steve; De Baerdemaeker, Julie; Salmon, Eric; Collette, Fabienne
2009-12-01
A decrease in verbal short-term memory (STM) capacity is consistently observed in patients with Alzheimer's disease (AD). Although this impairment has been mainly attributed to attentional deficits during encoding and maintenance, the progressive deterioration of semantic knowledge in early stages of AD may also be an important determinant of poor STM performance. The aim of this study was to examine the influence of semantic knowledge on verbal short-term memory storage capacity in normal aging and in AD by exploring the impact of word imageability on STM performance. Sixteen patients suffering from mild AD, 16 healthy elderly subjects and 16 young subjects performed an immediate serial recall task using word lists containing high or low imageability words. All participant groups recalled more high imageability words than low imageability words, but the effect of word imageability on verbal STM was greater in AD patients than in both the young and the elderly control groups. More precisely, AD patients showed a marked decrease in STM performance when presented with lists of low imageability words, whereas recall of high imageability words was relatively well preserved. Furthermore, AD patients displayed an abnormal proportion of phonological errors in the low imageability condition. Overall, these results indicate that the support of semantic knowledge on STM performance was impaired for lists of low imageability words in AD patients. More generally, these findings suggest that the deterioration of semantic knowledge is partly responsible for the poor verbal short-term storage capacity observed in AD.
Annotation of UAV surveillance video
NASA Astrophysics Data System (ADS)
Howlett, Todd; Robertson, Mark A.; Manthey, Dan; Krol, John
2004-08-01
Significant progress toward the development of a video annotation capability is presented in this paper. Research and development of an object tracking algorithm applicable for UAV video is described. Object tracking is necessary for attaching the annotations to the objects of interest. A methodology and format is defined for encoding video annotations using the SMPTE Key-Length-Value encoding standard. This provides the following benefits: a non-destructive annotation, compliance with existing standards, video playback in systems that are not annotation enabled and support for a real-time implementation. A model real-time video annotation system is also presented, at a high level, using the MPEG-2 Transport Stream as the transmission medium. This work was accomplished to meet the Department of Defense"s (DoD"s) need for a video annotation capability. Current practices for creating annotated products are to capture a still image frame, annotate it using an Electric Light Table application, and then pass the annotated image on as a product. That is not adequate for reporting or downstream cueing. It is too slow and there is a severe loss of information. This paper describes a capability for annotating directly on the video.
Cooper, Laurel; Meier, Austin; Laporte, Marie-Angélique; Elser, Justin L; Mungall, Chris; Sinn, Brandon T; Cavaliere, Dario; Carbon, Seth; Dunn, Nathan A; Smith, Barry; Qu, Botong; Preece, Justin; Zhang, Eugene; Todorovic, Sinisa; Gkoutos, Georgios; Doonan, John H; Stevenson, Dennis W; Arnaud, Elizabeth
2018-01-01
Abstract The Planteome project (http://www.planteome.org) provides a suite of reference and species-specific ontologies for plants and annotations to genes and phenotypes. Ontologies serve as common standards for semantic integration of a large and growing corpus of plant genomics, phenomics and genetics data. The reference ontologies include the Plant Ontology, Plant Trait Ontology and the Plant Experimental Conditions Ontology developed by the Planteome project, along with the Gene Ontology, Chemical Entities of Biological Interest, Phenotype and Attribute Ontology, and others. The project also provides access to species-specific Crop Ontologies developed by various plant breeding and research communities from around the world. We provide integrated data on plant traits, phenotypes, and gene function and expression from 95 plant taxa, annotated with reference ontology terms. The Planteome project is developing a plant gene annotation platform; Planteome Noctua, to facilitate community engagement. All the Planteome ontologies are publicly available and are maintained at the Planteome GitHub site (https://github.com/Planteome) for sharing, tracking revisions and new requests. The annotated data are freely accessible from the ontology browser (http://browser.planteome.org/amigo) and our data repository. PMID:29186578
Semantic similarity analysis of protein data: assessment with biological features and issues.
Guzzi, Pietro H; Mina, Marco; Guerra, Concettina; Cannataro, Mario
2012-09-01
The integration of proteomics data with biological knowledge is a recent trend in bioinformatics. A lot of biological information is available and is spread on different sources and encoded in different ontologies (e.g. Gene Ontology). Annotating existing protein data with biological information may enable the use (and the development) of algorithms that use biological ontologies as framework to mine annotated data. Recently many methodologies and algorithms that use ontologies to extract knowledge from data, as well as to analyse ontologies themselves have been proposed and applied to other fields. Conversely, the use of such annotations for the analysis of protein data is a relatively novel research area that is currently becoming more and more central in research. Existing approaches span from the definition of the similarity among genes and proteins on the basis of the annotating terms, to the definition of novel algorithms that use such similarities for mining protein data on a proteome-wide scale. This work, after the definition of main concept of such analysis, presents a systematic discussion and comparison of main approaches. Finally, remaining challenges, as well as possible future directions of research are presented.
@Note: a workbench for biomedical text mining.
Lourenço, Anália; Carreira, Rafael; Carneiro, Sónia; Maia, Paulo; Glez-Peña, Daniel; Fdez-Riverola, Florentino; Ferreira, Eugénio C; Rocha, Isabel; Rocha, Miguel
2009-08-01
Biomedical Text Mining (BioTM) is providing valuable approaches to the automated curation of scientific literature. However, most efforts have addressed the benchmarking of new algorithms rather than user operational needs. Bridging the gap between BioTM researchers and biologists' needs is crucial to solve real-world problems and promote further research. We present @Note, a platform for BioTM that aims at the effective translation of the advances between three distinct classes of users: biologists, text miners and software developers. Its main functional contributions are the ability to process abstracts and full-texts; an information retrieval module enabling PubMed search and journal crawling; a pre-processing module with PDF-to-text conversion, tokenisation and stopword removal; a semantic annotation schema; a lexicon-based annotator; a user-friendly annotation view that allows to correct annotations and a Text Mining Module supporting dataset preparation and algorithm evaluation. @Note improves the interoperability, modularity and flexibility when integrating in-home and open-source third-party components. Its component-based architecture allows the rapid development of new applications, emphasizing the principles of transparency and simplicity of use. Although it is still on-going, it has already allowed the development of applications that are currently being used.
Semantic annotation of Web data applied to risk in food.
Hignette, Gaëlle; Buche, Patrice; Couvert, Olivier; Dibie-Barthélemy, Juliette; Doussot, David; Haemmerlé, Ollivier; Mettler, Eric; Soler, Lydie
2008-11-30
A preliminary step to risk in food assessment is the gathering of experimental data. In the framework of the Sym'Previus project (http://www.symprevius.org), a complete data integration system has been designed, grouping data provided by industrial partners and data extracted from papers published in the main scientific journals of the domain. Those data have been classified by means of a predefined vocabulary, called ontology. Our aim is to complement the database with data extracted from the Web. In the framework of the WebContent project (www.webcontent.fr), we have designed a semi-automatic acquisition tool, called @WEB, which retrieves scientific documents from the Web. During the @WEB process, data tables are extracted from the documents and then annotated with the ontology. We focus on the data tables as they contain, in general, a synthesis of data published in the documents. In this paper, we explain how the columns of the data tables are automatically annotated with data types of the ontology and how the relations represented by the table are recognised. We also give the results of our experimentation to assess the quality of such an annotation.
ChemicalTagger: A tool for semantic text-mining in chemistry
2011-01-01
Background The primary method for scientific communication is in the form of published scientific articles and theses which use natural language combined with domain-specific terminology. As such, they contain free owing unstructured text. Given the usefulness of data extraction from unstructured literature, we aim to show how this can be achieved for the discipline of chemistry. The highly formulaic style of writing most chemists adopt make their contributions well suited to high-throughput Natural Language Processing (NLP) approaches. Results We have developed the ChemicalTagger parser as a medium-depth, phrase-based semantic NLP tool for the language of chemical experiments. Tagging is based on a modular architecture and uses a combination of OSCAR, domain-specific regex and English taggers to identify parts-of-speech. The ANTLR grammar is used to structure this into tree-based phrases. Using a metric that allows for overlapping annotations, we achieved machine-annotator agreements of 88.9% for phrase recognition and 91.9% for phrase-type identification (Action names). Conclusions It is possible parse to chemical experimental text using rule-based techniques in conjunction with a formal grammar parser. ChemicalTagger has been deployed for over 10,000 patents and has identified solvents from their linguistic context with >99.5% precision. PMID:21575201
Case Studies in Describing Scientific Research Efforts as Linked Data
NASA Astrophysics Data System (ADS)
Gandara, A.; Villanueva-Rosales, N.; Gates, A.
2013-12-01
The Web is growing with numerous scientific resources, prompting increased efforts in information management to consider integration and exchange of scientific resources. Scientists have many options to share scientific resources on the Web; however, existing options provide limited support to scientists in annotating and relating research resources resulting from a scientific research effort. Moreover, there is no systematic approach to documenting scientific research and sharing it on the Web. This research proposes the Collect-Annotate-Refine-Publish (CARP) Methodology as an approach for guiding documentation of scientific research on the Semantic Web as scientific collections. Scientific collections are structured descriptions about scientific research that make scientific results accessible based on context. In addition, scientific collections enhance the Linked Data data space and can be queried by machines. Three case studies were conducted on research efforts at the Cyber-ShARE Research Center of Excellence in order to assess the effectiveness of the methodology to create scientific collections. The case studies exposed the challenges and benefits of leveraging the Semantic Web and Linked Data data space to facilitate access, integration and processing of Web-accessible scientific resources and research documentation. As such, we present the case study findings and lessons learned in documenting scientific research using CARP.
Enhancing clinical concept extraction with distributional semantics
Cohen, Trevor; Wu, Stephen; Gonzalez, Graciela
2011-01-01
Extracting concepts (such as drugs, symptoms, and diagnoses) from clinical narratives constitutes a basic enabling technology to unlock the knowledge within and support more advanced reasoning applications such as diagnosis explanation, disease progression modeling, and intelligent analysis of the effectiveness of treatment. The recent release of annotated training sets of de-identified clinical narratives has contributed to the development and refinement of concept extraction methods. However, as the annotation process is labor-intensive, training data are necessarily limited in the concepts and concept patterns covered, which impacts the performance of supervised machine learning applications trained with these data. This paper proposes an approach to minimize this limitation by combining supervised machine learning with empirical learning of semantic relatedness from the distribution of the relevant words in additional unannotated text. The approach uses a sequential discriminative classifier (Conditional Random Fields) to extract the mentions of medical problems, treatments and tests from clinical narratives. It takes advantage of all Medline abstracts indexed as being of the publication type “clinical trials” to estimate the relatedness between words in the i2b2/VA training and testing corpora. In addition to the traditional features such as dictionary matching, pattern matching and part-of-speech tags, we also used as a feature words that appear in similar contexts to the word in question (that is, words that have a similar vector representation measured with the commonly used cosine metric, where vector representations are derived using methods of distributional semantics). To the best of our knowledge, this is the first effort exploring the use of distributional semantics, the semantics derived empirically from unannotated text often using vector space models, for a sequence classification task such as concept extraction. Therefore, we first experimented with different sliding window models and found the model with parameters that led to best performance in a preliminary sequence labeling task. The evaluation of this approach, performed against the i2b2/VA concept extraction corpus, showed that incorporating features based on the distribution of words across a large unannotated corpus significantly aids concept extraction. Compared to a supervised-only approach as a baseline, the micro-averaged f-measure for exact match increased from 80.3% to 82.3% and the micro-averaged f-measure based on inexact match increased from 89.7% to 91.3%. These improvements are highly significant according to the bootstrap resampling method and also considering the performance of other systems. Thus, distributional semantic features significantly improve the performance of concept extraction from clinical narratives by taking advantage of word distribution information obtained from unannotated data. PMID:22085698
Synergetic computer and holonics - information dynamics of a semantic computer
NASA Astrophysics Data System (ADS)
Shimizu, H.; Yamaguchi, Y.
1987-12-01
The dynamics of semantic information in biosystem is studied based on holons, generators of mutual relations. Any biosystem has an internal world, a so-called "self", which has an intrinsic purpose rendering the system continuously alive and developed as much as possible against a fluctuating external world. External signals to the system through sensory organs are classified by the self into two basic categories, semantic information with some meaning and value for the purpose and inputs from background and noise sources. Due to this breaking of semantic symmetry, any input signals are transformed into a figure and background, respectively. As a typical example, the visual perception of vertebrates is studied. For such semantic transformation the external signal is first decomposed and converted into a number of elementary signs named "syntons" which are then transmitted into a sensory area of cortex corresponding to an image synthesizer. The synthesizer is a sort of autonomic parallel processor composed of autonomic units, "holons", which are characterized by many internal modes. Syntons are fed into the holons one by one. A set of the elementary meanings, the so-called "semons", provided to the synton are encoded in the internal modes of the holon; that is, each internal mode encodes a semon. A dynamic information theory for the transformation of external signals to semantic information is developed based on our model which we call holovision. Holovision is a dynamic model of visual perception that processes an autonomic ability to self-organize visual images. Autonomous oscillators are utilized as the line processors to encode line elements with specific orientations in their phases as semons. An information space is defined according to the assembly of holons; the spatial plane on which holons are arranged is a syntactic subspace while the internal modes of the holons span a semantic subspace in the orthogonal direction. In this information space, the image of a figure is self-organized - as a sort of spatiotemporal pattern - by autonomic coordinations of the holons that select relevant internal modes, accompanied with compression of irrelevant syntons that correspond to the background. Holons coded by a synton are relevantly connected by means of coherent relations, i.e., dynamic connections with time-coherence, in order to represent the image that varies in time depending on the instantaneous state of the external object. These connections depend on the internal modes that are cooperatively selectively selected by the holons. The image is regarded as a bridge between the external and internal world that has both external and internal consistency. The meaning of the image, i.e., transformed semantic information, is spontaneously transferred from semantic items that have a coherent relation with the image, and the external signal is perceived by the self through the image. We demonstrate that images are indeed self-organized in holovision in the previously described sense. Simulated processes of the creation of semantic information in holovision are shown to display typical features of the forgoing steps of information compression. Based on these results, we propose quantitative indices that represent the value of semantic information in the image processor as well as in the memory.
Wu, Zhenyu; Xu, Yuan; Yang, Yunong; Zhang, Chunhong; Zhu, Xinning; Ji, Yang
2017-01-01
Web of Things (WoT) facilitates the discovery and interoperability of Internet of Things (IoT) devices in a cyber-physical system (CPS). Moreover, a uniform knowledge representation of physical resources is quite necessary for further composition, collaboration, and decision-making process in CPS. Though several efforts have integrated semantics with WoT, such as knowledge engineering methods based on semantic sensor networks (SSN), it still could not represent the complex relationships between devices when dynamic composition and collaboration occur, and it totally depends on manual construction of a knowledge base with low scalability. In this paper, to addresses these limitations, we propose the semantic Web of Things (SWoT) framework for CPS (SWoT4CPS). SWoT4CPS provides a hybrid solution with both ontological engineering methods by extending SSN and machine learning methods based on an entity linking (EL) model. To testify to the feasibility and performance, we demonstrate the framework by implementing a temperature anomaly diagnosis and automatic control use case in a building automation system. Evaluation results on the EL method show that linking domain knowledge to DBpedia has a relative high accuracy and the time complexity is at a tolerant level. Advantages and disadvantages of SWoT4CPS with future work are also discussed. PMID:28230725
NASA Astrophysics Data System (ADS)
Amit, S. N. K.; Saito, S.; Sasaki, S.; Kiyoki, Y.; Aoki, Y.
2015-04-01
Google earth with high-resolution imagery basically takes months to process new images before online updates. It is a time consuming and slow process especially for post-disaster application. The objective of this research is to develop a fast and effective method of updating maps by detecting local differences occurred over different time series; where only region with differences will be updated. In our system, aerial images from Massachusetts's road and building open datasets, Saitama district datasets are used as input images. Semantic segmentation is then applied to input images. Semantic segmentation is a pixel-wise classification of images by implementing deep neural network technique. Deep neural network technique is implemented due to being not only efficient in learning highly discriminative image features such as road, buildings etc., but also partially robust to incomplete and poorly registered target maps. Then, aerial images which contain semantic information are stored as database in 5D world map is set as ground truth images. This system is developed to visualise multimedia data in 5 dimensions; 3 dimensions as spatial dimensions, 1 dimension as temporal dimension, and 1 dimension as degenerated dimensions of semantic and colour combination dimension. Next, ground truth images chosen from database in 5D world map and a new aerial image with same spatial information but different time series are compared via difference extraction method. The map will only update where local changes had occurred. Hence, map updating will be cheaper, faster and more effective especially post-disaster application, by leaving unchanged region and only update changed region.
Autobiographical memory and well-being in aging: The central role of semantic self-images.
Rathbone, Clare J; Holmes, Emily A; Murphy, Susannah E; Ellis, Judi A
2015-05-01
Higher levels of well-being are associated with longer life expectancies and better physical health. Previous studies suggest that processes involving the self and autobiographical memory are related to well-being, yet these relationships are poorly understood. The present study tested 32 older and 32 younger adults using scales measuring well-being and the affective valence of two types of autobiographical memory: episodic autobiographical memories and semantic self-images. Results showed that valence of semantic self-images, but not episodic autobiographical memories, was highly correlated with well-being, particularly in older adults. In contrast, well-being in older adults was unrelated to performance across a range of standardised memory tasks. These results highlight the role of semantic self-images in well-being, and have implications for the development of therapeutic interventions for well-being in aging. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
Luo, Jiebo; Boutell, Matthew
2005-05-01
Automatic image orientation detection for natural images is a useful, yet challenging research topic. Humans use scene context and semantic object recognition to identify the correct image orientation. However, it is difficult for a computer to perform the task in the same way because current object recognition algorithms are extremely limited in their scope and robustness. As a result, existing orientation detection methods were built upon low-level vision features such as spatial distributions of color and texture. Discrepant detection rates have been reported for these methods in the literature. We have developed a probabilistic approach to image orientation detection via confidence-based integration of low-level and semantic cues within a Bayesian framework. Our current accuracy is 90 percent for unconstrained consumer photos, impressive given the findings of a psychophysical study conducted recently. The proposed framework is an attempt to bridge the gap between computer and human vision systems and is applicable to other problems involving semantic scene content understanding.
Mohanty, Sambit K; Mistry, Amita T; Amin, Waqas; Parwani, Anil V; Pople, Andrew K; Schmandt, Linda; Winters, Sharon B; Milliken, Erin; Kim, Paula; Whelan, Nancy B; Farhat, Ghada; Melamed, Jonathan; Taioli, Emanuela; Dhir, Rajiv; Pass, Harvey I; Becich, Michael J
2008-04-08
Recent advances in genomics, proteomics, and the increasing demands for biomarker validation studies have catalyzed changes in the landscape of cancer research, fueling the development of tissue banks for translational research. A result of this transformation is the need for sufficient quantities of clinically annotated and well-characterized biospecimens to support the growing needs of the cancer research community. Clinical annotation allows samples to be better matched to the research question at hand and ensures that experimental results are better understood and can be verified. To facilitate and standardize such annotation in bio-repositories, we have combined three accepted and complementary sets of data standards: the College of American Pathologists (CAP) Cancer Checklists, the protocols recommended by the Association of Directors of Anatomic and Surgical Pathology (ADASP) for pathology data, and the North American Association of Central Cancer Registry (NAACCR) elements for epidemiology, therapy and follow-up data. Combining these approaches creates a set of International Standards Organization (ISO) - compliant Common Data Elements (CDEs) for the mesothelioma tissue banking initiative supported by the National Institute for Occupational Safety and Health (NIOSH) of the Center for Disease Control and Prevention (CDC). The purpose of the project is to develop a core set of data elements for annotating mesothelioma specimens, following standards established by the CAP checklist, ADASP cancer protocols, and the NAACCR elements. We have associated these elements with modeling architecture to enhance both syntactic and semantic interoperability. The system has a Java-based multi-tiered architecture based on Unified Modeling Language (UML). Common Data Elements were developed using controlled vocabulary, ontology and semantic modeling methodology. The CDEs for each case are of different types: demographic, epidemiologic data, clinical history, pathology data including block level annotation, and follow-up data including treatment, recurrence and vital status. The end result of such an effort would eventually provide an increased sample set to the researchers, and makes the system interoperable between institutions. The CAP, ADASP and the NAACCR elements represent widely established data elements that are utilized in many cancer centers. Herein, we have shown these representations can be combined and formalized to create a core set of annotations for banked mesothelioma specimens. Because these data elements are collected as part of the normal workflow of a medical center, data sets developed on the basis of these elements can be easily implemented and maintained.
Triage by ranking to support the curation of protein interactions
Pasche, Emilie; Gobeill, Julien; Rech de Laval, Valentine; Gleizes, Anne; Michel, Pierre-André; Bairoch, Amos
2017-01-01
Abstract Today, molecular biology databases are the cornerstone of knowledge sharing for life and health sciences. The curation and maintenance of these resources are labour intensive. Although text mining is gaining impetus among curators, its integration in curation workflow has not yet been widely adopted. The Swiss Institute of Bioinformatics Text Mining and CALIPHO groups joined forces to design a new curation support system named nextA5. In this report, we explore the integration of novel triage services to support the curation of two types of biological data: protein–protein interactions (PPIs) and post-translational modifications (PTMs). The recognition of PPIs and PTMs poses a special challenge, as it not only requires the identification of biological entities (proteins or residues), but also that of particular relationships (e.g. binding or position). These relationships cannot be described with onto-terminological descriptors such as the Gene Ontology for molecular functions, which makes the triage task more challenging. Prioritizing papers for these tasks thus requires the development of different approaches. In this report, we propose a new method to prioritize articles containing information specific to PPIs and PTMs. The new resources (RESTful APIs, semantically annotated MEDLINE library) enrich the neXtA5 platform. We tuned the article prioritization model on a set of 100 proteins previously annotated by the CALIPHO group. The effectiveness of the triage service was tested with a dataset of 200 annotated proteins. We defined two sets of descriptors to support automatic triage: the first set to enrich for papers with PPI data, and the second for PTMs. All occurrences of these descriptors were marked-up in MEDLINE and indexed, thus constituting a semantically annotated version of MEDLINE. These annotations were then used to estimate the relevance of a particular article with respect to the chosen annotation type. This relevance score was combined with a local vector-space search engine to generate a ranked list of PMIDs. We also evaluated a query refinement strategy, which adds specific keywords (such as ‘binds’ or ‘interacts’) to the original query. Compared to PubMed, the search effectiveness of the nextA5 triage service is improved by 190% for the prioritization of papers with PPIs information and by 260% for papers with PTMs information. Combining advanced retrieval and query refinement strategies with automatically enriched MEDLINE contents is effective to improve triage in complex curation tasks such as the curation of protein PPIs and PTMs. Database URL: http://candy.hesge.ch/nextA5 PMID:29220432
Mohanty, Sambit K; Mistry, Amita T; Amin, Waqas; Parwani, Anil V; Pople, Andrew K; Schmandt, Linda; Winters, Sharon B; Milliken, Erin; Kim, Paula; Whelan, Nancy B; Farhat, Ghada; Melamed, Jonathan; Taioli, Emanuela; Dhir, Rajiv; Pass, Harvey I; Becich, Michael J
2008-01-01
Background Recent advances in genomics, proteomics, and the increasing demands for biomarker validation studies have catalyzed changes in the landscape of cancer research, fueling the development of tissue banks for translational research. A result of this transformation is the need for sufficient quantities of clinically annotated and well-characterized biospecimens to support the growing needs of the cancer research community. Clinical annotation allows samples to be better matched to the research question at hand and ensures that experimental results are better understood and can be verified. To facilitate and standardize such annotation in bio-repositories, we have combined three accepted and complementary sets of data standards: the College of American Pathologists (CAP) Cancer Checklists, the protocols recommended by the Association of Directors of Anatomic and Surgical Pathology (ADASP) for pathology data, and the North American Association of Central Cancer Registry (NAACCR) elements for epidemiology, therapy and follow-up data. Combining these approaches creates a set of International Standards Organization (ISO) – compliant Common Data Elements (CDEs) for the mesothelioma tissue banking initiative supported by the National Institute for Occupational Safety and Health (NIOSH) of the Center for Disease Control and Prevention (CDC). Methods The purpose of the project is to develop a core set of data elements for annotating mesothelioma specimens, following standards established by the CAP checklist, ADASP cancer protocols, and the NAACCR elements. We have associated these elements with modeling architecture to enhance both syntactic and semantic interoperability. The system has a Java-based multi-tiered architecture based on Unified Modeling Language (UML). Results Common Data Elements were developed using controlled vocabulary, ontology and semantic modeling methodology. The CDEs for each case are of different types: demographic, epidemiologic data, clinical history, pathology data including block level annotation, and follow-up data including treatment, recurrence and vital status. The end result of such an effort would eventually provide an increased sample set to the researchers, and makes the system interoperable between institutions. Conclusion The CAP, ADASP and the NAACCR elements represent widely established data elements that are utilized in many cancer centers. Herein, we have shown these representations can be combined and formalized to create a core set of annotations for banked mesothelioma specimens. Because these data elements are collected as part of the normal workflow of a medical center, data sets developed on the basis of these elements can be easily implemented and maintained. PMID:18397527
Rupp, Kyle; Roos, Matthew; Milsap, Griffin; Caceres, Carlos; Ratto, Christopher; Chevillet, Mark; Crone, Nathan E; Wolmetz, Michael
2017-03-01
Non-invasive neuroimaging studies have shown that semantic category and attribute information are encoded in neural population activity. Electrocorticography (ECoG) offers several advantages over non-invasive approaches, but the degree to which semantic attribute information is encoded in ECoG responses is not known. We recorded ECoG while patients named objects from 12 semantic categories and then trained high-dimensional encoding models to map semantic attributes to spectral-temporal features of the task-related neural responses. Using these semantic attribute encoding models, untrained objects were decoded with accuracies comparable to whole-brain functional Magnetic Resonance Imaging (fMRI), and we observed that high-gamma activity (70-110Hz) at basal occipitotemporal electrodes was associated with specific semantic dimensions (manmade-animate, canonically large-small, and places-tools). Individual patient results were in close agreement with reports from other imaging modalities on the time course and functional organization of semantic processing along the ventral visual pathway during object recognition. The semantic attribute encoding model approach is critical for decoding objects absent from a training set, as well as for studying complex semantic encodings without artificially restricting stimuli to a small number of semantic categories. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
The Semantic Web: From Representation to Realization
NASA Astrophysics Data System (ADS)
Thórisson, Kristinn R.; Spivack, Nova; Wissner, James M.
A semantically-linked web of electronic information - the Semantic Web - promises numerous benefits including increased precision in automated information sorting, searching, organizing and summarizing. Realizing this requires significantly more reliable meta-information than is readily available today. It also requires a better way to represent information that supports unified management of diverse data and diverse Manipulation methods: from basic keywords to various types of artificial intelligence, to the highest level of intelligent manipulation - the human mind. How this is best done is far from obvious. Relying solely on hand-crafted annotation and ontologies, or solely on artificial intelligence techniques, seems less likely for success than a combination of the two. In this paper describe an integrated, complete solution to these challenges that has already been implemented and tested with hundreds of thousands of users. It is based on an ontological representational level we call SemCards that combines ontological rigour with flexible user interface constructs. SemCards are machine- and human-readable digital entities that allow non-experts to create and use semantic content, while empowering machines to better assist and participate in the process. SemCards enable users to easily create semantically-grounded data that in turn acts as examples for automation processes, creating a positive iterative feedback loop of metadata creation and refinement between user and machine. They provide a holistic solution to the Semantic Web, supporting powerful management of the full lifecycle of data, including its creation, retrieval, classification, sorting and sharing. We have implemented the SemCard technology on the semantic Web site Twine.com, showing that the technology is indeed versatile and scalable. Here we present the key ideas behind SemCards and describe the initial implementation of the technology.
Automatic Semantic Orientation of Adjectives for Indonesian Language Using PMI-IR and Clustering
NASA Astrophysics Data System (ADS)
Riyanti, Dewi; Arif Bijaksana, M.; Adiwijaya
2018-03-01
We present our work in the area of sentiment analysis for Indonesian language. We focus on bulding automatic semantic orientation using available resources in Indonesian. In this research we used Indonesian corpus that contains 9 million words from kompas.txt and tempo.txt that manually tagged and annotated with of part-of-speech tagset. And then we construct a dataset by taking all the adjectives from the corpus, removing the adjective with no orientation. The set contained 923 adjective words. This systems will include several steps such as text pre-processing and clustering. The text pre-processing aims to increase the accuracy. And finally clustering method will classify each word to related sentiment which is positive or negative. With improvements to the text preprocessing, can be achieved 72% of accuracy.
Borlawsky, Tara B.; Dhaval, Rakesh; Hastings, Shannon L.; Payne, Philip R. O.
2009-01-01
In October 2006, the National Institutes of Health launched a new national consortium, funded through Clinical and Translational Science Awards (CTSA), with the primary objective of improving the conduct and efficiency of the inherently multi-disciplinary field of translational research. To help meet this goal, the Ohio State University Center for Clinical and Translational Science has launched a knowledge management initiative that is focused on facilitating widespread semantic interoperability among administrative, basic science, clinical and research computing systems, both internally and among the translational research community at-large, through the integration of domain-specific standard terminologies and ontologies with local annotations. This manuscript describes an agile framework that builds upon prevailing knowledge engineering and semantic interoperability methods, and will be implemented as part this initiative. PMID:21347164
Borlawsky, Tara B; Dhaval, Rakesh; Hastings, Shannon L; Payne, Philip R O
2009-03-01
In October 2006, the National Institutes of Health launched a new national consortium, funded through Clinical and Translational Science Awards (CTSA), with the primary objective of improving the conduct and efficiency of the inherently multi-disciplinary field of translational research. To help meet this goal, the Ohio State University Center for Clinical and Translational Science has launched a knowledge management initiative that is focused on facilitating widespread semantic interoperability among administrative, basic science, clinical and research computing systems, both internally and among the translational research community at-large, through the integration of domain-specific standard terminologies and ontologies with local annotations. This manuscript describes an agile framework that builds upon prevailing knowledge engineering and semantic interoperability methods, and will be implemented as part this initiative.
NASA Astrophysics Data System (ADS)
Smart, Philip D.; Quinn, Jonathan A.; Jones, Christopher B.
The combination of mobile communication technology with location and orientation aware digital cameras has introduced increasing interest in the exploitation of 3D city models for applications such as augmented reality and automated image captioning. The effectiveness of such applications is, at present, severely limited by the often poor quality of semantic annotation of the 3D models. In this paper, we show how freely available sources of georeferenced Web 2.0 information can be used for automated enrichment of 3D city models. Point referenced names of prominent buildings and landmarks mined from Wikipedia articles and from the OpenStreetMaps digital map and Geonames gazetteer have been matched to the 2D ground plan geometry of a 3D city model. In order to address the ambiguities that arise in the associations between these sources and the city model, we present procedures to merge potentially related buildings and implement fuzzy matching between reference points and building polygons. An experimental evaluation demonstrates the effectiveness of the presented methods.
Automatic Identification of Critical Follow-Up Recommendation Sentences in Radiology Reports
Yetisgen-Yildiz, Meliha; Gunn, Martin L.; Xia, Fei; Payne, Thomas H.
2011-01-01
Communication of follow-up recommendations when abnormalities are identified on imaging studies is prone to error. When recommendations are not systematically identified and promptly communicated to referrers, poor patient outcomes can result. Using information technology can improve communication and improve patient safety. In this paper, we describe a text processing approach that uses natural language processing (NLP) and supervised text classification methods to automatically identify critical recommendation sentences in radiology reports. To increase the classification performance we enhanced the simple unigram token representation approach with lexical, semantic, knowledge-base, and structural features. We tested different combinations of those features with the Maximum Entropy (MaxEnt) classification algorithm. Classifiers were trained and tested with a gold standard corpus annotated by a domain expert. We applied 5-fold cross validation and our best performing classifier achieved 95.60% precision, 79.82% recall, 87.0% F-score, and 99.59% classification accuracy in identifying the critical recommendation sentences in radiology reports. PMID:22195225
Automatic identification of critical follow-up recommendation sentences in radiology reports.
Yetisgen-Yildiz, Meliha; Gunn, Martin L; Xia, Fei; Payne, Thomas H
2011-01-01
Communication of follow-up recommendations when abnormalities are identified on imaging studies is prone to error. When recommendations are not systematically identified and promptly communicated to referrers, poor patient outcomes can result. Using information technology can improve communication and improve patient safety. In this paper, we describe a text processing approach that uses natural language processing (NLP) and supervised text classification methods to automatically identify critical recommendation sentences in radiology reports. To increase the classification performance we enhanced the simple unigram token representation approach with lexical, semantic, knowledge-base, and structural features. We tested different combinations of those features with the Maximum Entropy (MaxEnt) classification algorithm. Classifiers were trained and tested with a gold standard corpus annotated by a domain expert. We applied 5-fold cross validation and our best performing classifier achieved 95.60% precision, 79.82% recall, 87.0% F-score, and 99.59% classification accuracy in identifying the critical recommendation sentences in radiology reports.
Serial and semantic encoding of lists of words in schizophrenia patients with visual hallucinations.
Brébion, Gildas; Ohlsen, Ruth I; Pilowsky, Lyn S; David, Anthony S
2011-03-30
Previous research has suggested that visual hallucinations in schizophrenia are associated with abnormal salience of visual mental images. Since visual imagery is used as a mnemonic strategy to learn lists of words, increased visual imagery might impede the other commonly used strategies of serial and semantic encoding. We had previously published data on the serial and semantic strategies implemented by patients when learning lists of concrete words with different levels of semantic organisation (Brébion et al., 2004). In this paper we present a re-analysis of these data, aiming at investigating the associations between learning strategies and visual hallucinations. Results show that the patients with visual hallucinations presented less serial clustering in the non-organisable list than the other patients. In the semantically organisable list with typical instances, they presented both less serial and less semantic clustering than the other patients. Thus, patients with visual hallucinations demonstrate reduced use of serial and semantic encoding in the lists made up of fairly familiar concrete words, which enable the formation of mental images. Although these results are preliminary, we propose that this different processing of the lists stems from the abnormal salience of the mental images such patients experience from the word stimuli. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Zhou, Xiangrong; Takayama, Ryosuke; Wang, Song; Zhou, Xinxin; Hara, Takeshi; Fujita, Hiroshi
2017-02-01
We have proposed an end-to-end learning approach that trained a deep convolutional neural network (CNN) for automatic CT image segmentation, which accomplished a voxel-wised multiple classification to directly map each voxel on 3D CT images to an anatomical label automatically. The novelties of our proposed method were (1) transforming the anatomical structures segmentation on 3D CT images into a majority voting of the results of 2D semantic image segmentation on a number of 2D-slices from different image orientations, and (2) using "convolution" and "deconvolution" networks to achieve the conventional "coarse recognition" and "fine extraction" functions which were integrated into a compact all-in-one deep CNN for CT image segmentation. The advantage comparing to previous works was its capability to accomplish real-time image segmentations on 2D slices of arbitrary CT-scan-range (e.g. body, chest, abdomen) and produced correspondingly-sized output. In this paper, we propose an improvement of our proposed approach by adding an organ localization module to limit CT image range for training and testing deep CNNs. A database consisting of 240 3D CT scans and a human annotated ground truth was used for training (228 cases) and testing (the remaining 12 cases). We applied the improved method to segment pancreas and left kidney regions, respectively. The preliminary results showed that the accuracies of the segmentation results were improved significantly (pancreas was 34% and kidney was 8% increased in Jaccard index from our previous results). The effectiveness and usefulness of proposed improvement for CT image segmentations were confirmed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sanfilippo, Antonio P.; Franklin, Lyndsey; Tratz, Stephen C.
2008-04-01
Frame Analysis has come to play an increasingly stronger role in the study of social movements in Sociology and Political Science. While significant steps have been made in providing a theory of frames and framing, a systematic characterization of the frame concept is still largely lacking and there are no rec-ognized criteria and methods that can be used to identify and marshal frame evi-dence reliably and in a time and cost effective manner. Consequently, current Frame Analysis work is still too reliant on manual annotation and subjective inter-pretation. The goal of this paper is to present an approach to themore » representation, acquisition and analysis of frame evidence which leverages Content Analysis, In-formation Extraction and Semantic Search methods to provide a systematic treat-ment of a Frame Analysis and automate frame annotation.« less
Data Acquisition and Linguistic Resources
NASA Astrophysics Data System (ADS)
Strassel, Stephanie; Christianson, Caitlin; McCary, John; Staderman, William; Olive, Joseph
All human language technology demands substantial quantities of data for system training and development, plus stable benchmark data to measure ongoing progress. While creation of high quality linguistic resources is both costly and time consuming, such data has the potential to profoundly impact not just a single evaluation program but language technology research in general. GALE's challenging performance targets demand linguistic data on a scale and complexity never before encountered. Resources cover multiple languages (Arabic, Chinese, and English) and multiple genres -- both structured (newswire and broadcast news) and unstructured (web text, including blogs and newsgroups, and broadcast conversation). These resources include significant volumes of monolingual text and speech, parallel text, and transcribed audio combined with multiple layers of linguistic annotation, ranging from word aligned parallel text and Treebanks to rich semantic annotation.
Individual differences in white matter microstructure predict semantic control.
Nugiel, Tehila; Alm, Kylie H; Olson, Ingrid R
2016-12-01
In everyday conversation, we make many rapid choices between competing concepts and words in order to convey our intent. This process is termed semantic control, and it is thought to rely on information transmission between a distributed semantic store in the temporal lobes and a more discrete region, optimized for retrieval and selection, in the left inferior frontal gyrus. Here, we used diffusion tensor imaging in a group of neurologically normal young adults to investigate the relationship between semantic control and white matter tracts that have been implicated in semantic memory retrieval. Participants completed a verb generation task that taps semantic control (Snyder & Munakata, 2008; Snyder et al., 2010) and underwent a diffusion imaging scan. Deterministic tractography was performed to compute indices representing the microstructural properties of the inferior fronto-occipital fasciculus (IFOF), the uncinate fasciculus (UF), and the inferior longitudinal fasciculus (ILF). Microstructural measures of the UF failed to predict semantic control performance. However, there was a significant relationship between microstructure of the left IFOF and ILF and individual differences in semantic control. Our findings support the view put forth by Duffau (2013) that the IFOF is a key structural pathway in semantic retrieval.
Extracting BI-RADS Features from Portuguese Clinical Texts.
Nassif, Houssam; Cunha, Filipe; Moreira, Inês C; Cruz-Correia, Ricardo; Sousa, Eliana; Page, David; Burnside, Elizabeth; Dutra, Inês
2012-01-01
In this work we build the first BI-RADS parser for Portuguese free texts, modeled after existing approaches to extract BI-RADS features from English medical records. Our concept finder uses a semantic grammar based on the BIRADS lexicon and on iterative transferred expert knowledge. We compare the performance of our algorithm to manual annotation by a specialist in mammography. Our results show that our parser's performance is comparable to the manual method.
Geo-Coding for the Mapping of Documents and Social Media Messages
2013-08-22
O.L. (2007). UBC-ALM: Combining KNN with SVD for WSD. Proceedings of the 4th International Workshop on Semantic Evaluations (SemEval-2007), Prague...and Yarowsky, D. (1992). One sense per discourse. In Proceedings of the 4th DARPA Speech and Natural Language Workshop. pp. 233-237, 1992. Retrieved...Part-of- Speech Tagging for Twitter: Annotation, Features, and Experiments. Proceedings of the Annual Meeting of the Association for Computational
Fedorov, Andriy; Clunie, David; Ulrich, Ethan; Bauer, Christian; Wahle, Andreas; Brown, Bartley; Onken, Michael; Riesmeier, Jörg; Pieper, Steve; Kikinis, Ron; Buatti, John; Beichel, Reinhard R
2016-01-01
Background. Imaging biomarkers hold tremendous promise for precision medicine clinical applications. Development of such biomarkers relies heavily on image post-processing tools for automated image quantitation. Their deployment in the context of clinical research necessitates interoperability with the clinical systems. Comparison with the established outcomes and evaluation tasks motivate integration of the clinical and imaging data, and the use of standardized approaches to support annotation and sharing of the analysis results and semantics. We developed the methodology and tools to support these tasks in Positron Emission Tomography and Computed Tomography (PET/CT) quantitative imaging (QI) biomarker development applied to head and neck cancer (HNC) treatment response assessment, using the Digital Imaging and Communications in Medicine (DICOM(®)) international standard and free open-source software. Methods. Quantitative analysis of PET/CT imaging data collected on patients undergoing treatment for HNC was conducted. Processing steps included Standardized Uptake Value (SUV) normalization of the images, segmentation of the tumor using manual and semi-automatic approaches, automatic segmentation of the reference regions, and extraction of the volumetric segmentation-based measurements. Suitable components of the DICOM standard were identified to model the various types of data produced by the analysis. A developer toolkit of conversion routines and an Application Programming Interface (API) were contributed and applied to create a standards-based representation of the data. Results. DICOM Real World Value Mapping, Segmentation and Structured Reporting objects were utilized for standards-compliant representation of the PET/CT QI analysis results and relevant clinical data. A number of correction proposals to the standard were developed. The open-source DICOM toolkit (DCMTK) was improved to simplify the task of DICOM encoding by introducing new API abstractions. Conversion and visualization tools utilizing this toolkit were developed. The encoded objects were validated for consistency and interoperability. The resulting dataset was deposited in the QIN-HEADNECK collection of The Cancer Imaging Archive (TCIA). Supporting tools for data analysis and DICOM conversion were made available as free open-source software. Discussion. We presented a detailed investigation of the development and application of the DICOM model, as well as the supporting open-source tools and toolkits, to accommodate representation of the research data in QI biomarker development. We demonstrated that the DICOM standard can be used to represent the types of data relevant in HNC QI biomarker development, and encode their complex relationships. The resulting annotated objects are amenable to data mining applications, and are interoperable with a variety of systems that support the DICOM standard.
AutoBD: Automated Bi-Level Description for Scalable Fine-Grained Visual Categorization.
Yao, Hantao; Zhang, Shiliang; Yan, Chenggang; Zhang, Yongdong; Li, Jintao; Tian, Qi
Compared with traditional image classification, fine-grained visual categorization is a more challenging task, because it targets to classify objects belonging to the same species, e.g. , classify hundreds of birds or cars. In the past several years, researchers have made many achievements on this topic. However, most of them are heavily dependent on the artificial annotations, e.g., bounding boxes, part annotations, and so on . The requirement of artificial annotations largely hinders the scalability and application. Motivated to release such dependence, this paper proposes a robust and discriminative visual description named Automated Bi-level Description (AutoBD). "Bi-level" denotes two complementary part-level and object-level visual descriptions, respectively. AutoBD is "automated," because it only requires the image-level labels of training images and does not need any annotations for testing images. Compared with the part annotations labeled by the human, the image-level labels can be easily acquired, which thus makes AutoBD suitable for large-scale visual categorization. Specifically, the part-level description is extracted by identifying the local region saliently representing the visual distinctiveness. The object-level description is extracted from object bounding boxes generated with a co-localization algorithm. Although only using the image-level labels, AutoBD outperforms the recent studies on two public benchmark, i.e. , classification accuracy achieves 81.6% on CUB-200-2011 and 88.9% on Car-196, respectively. On the large-scale Birdsnap data set, AutoBD achieves the accuracy of 68%, which is currently the best performance to the best of our knowledge.Compared with traditional image classification, fine-grained visual categorization is a more challenging task, because it targets to classify objects belonging to the same species, e.g. , classify hundreds of birds or cars. In the past several years, researchers have made many achievements on this topic. However, most of them are heavily dependent on the artificial annotations, e.g., bounding boxes, part annotations, and so on . The requirement of artificial annotations largely hinders the scalability and application. Motivated to release such dependence, this paper proposes a robust and discriminative visual description named Automated Bi-level Description (AutoBD). "Bi-level" denotes two complementary part-level and object-level visual descriptions, respectively. AutoBD is "automated," because it only requires the image-level labels of training images and does not need any annotations for testing images. Compared with the part annotations labeled by the human, the image-level labels can be easily acquired, which thus makes AutoBD suitable for large-scale visual categorization. Specifically, the part-level description is extracted by identifying the local region saliently representing the visual distinctiveness. The object-level description is extracted from object bounding boxes generated with a co-localization algorithm. Although only using the image-level labels, AutoBD outperforms the recent studies on two public benchmark, i.e. , classification accuracy achieves 81.6% on CUB-200-2011 and 88.9% on Car-196, respectively. On the large-scale Birdsnap data set, AutoBD achieves the accuracy of 68%, which is currently the best performance to the best of our knowledge.
Liu, Dan; Liu, Xuejun; Wu, Yiguang
2018-04-24
This paper presents an effective approach for depth reconstruction from a single image through the incorporation of semantic information and local details from the image. A unified framework for depth acquisition is constructed by joining a deep Convolutional Neural Network (CNN) and a continuous pairwise Conditional Random Field (CRF) model. Semantic information and relative depth trends of local regions inside the image are integrated into the framework. A deep CNN network is firstly used to automatically learn a hierarchical feature representation of the image. To get more local details in the image, the relative depth trends of local regions are incorporated into the network. Combined with semantic information of the image, a continuous pairwise CRF is then established and is used as the loss function of the unified model. Experiments on real scenes demonstrate that the proposed approach is effective and that the approach obtains satisfactory results.
CellCognition: time-resolved phenotype annotation in high-throughput live cell imaging.
Held, Michael; Schmitz, Michael H A; Fischer, Bernd; Walter, Thomas; Neumann, Beate; Olma, Michael H; Peter, Matthias; Ellenberg, Jan; Gerlich, Daniel W
2010-09-01
Fluorescence time-lapse imaging has become a powerful tool to investigate complex dynamic processes such as cell division or intracellular trafficking. Automated microscopes generate time-resolved imaging data at high throughput, yet tools for quantification of large-scale movie data are largely missing. Here we present CellCognition, a computational framework to annotate complex cellular dynamics. We developed a machine-learning method that combines state-of-the-art classification with hidden Markov modeling for annotation of the progression through morphologically distinct biological states. Incorporation of time information into the annotation scheme was essential to suppress classification noise at state transitions and confusion between different functional states with similar morphology. We demonstrate generic applicability in different assays and perturbation conditions, including a candidate-based RNA interference screen for regulators of mitotic exit in human cells. CellCognition is published as open source software, enabling live-cell imaging-based screening with assays that directly score cellular dynamics.
Picture grammars in classification and semantic interpretation of 3D coronary vessels visualisations
NASA Astrophysics Data System (ADS)
Ogiela, M. R.; Tadeusiewicz, R.; Trzupek, M.
2009-09-01
The work presents the new opportunity for making semantic descriptions and analysis of medical structures, especially coronary vessels CT spatial reconstructions, with the use of AI graph-based linguistic formalisms. In the paper there will be discussed the manners of applying methods of computational intelligence to the development of a syntactic semantic description of spatial visualisations of the heart's coronary vessels. Such descriptions may be used for both smart ordering of images while archiving them and for their semantic searches in medical multimedia databases. Presented methodology of analysis can furthermore be used for attaining other goals related performance of computer-assisted semantic interpretation of selected elements and/or the entire 3D structure of the coronary vascular tree. These goals are achieved through the use of graph-based image formalisms based on IE graphs generating grammars that allow discovering and automatic semantic interpretation of irregularities visualised on the images obtained during diagnostic examinations of the heart muscle. The basis for the construction of 3D reconstructions of biological objects used in this work are visualisations obtained from helical CT scans, yet the method itself may be applied also for other methods of medical 3D images acquisition. The obtained semantic information makes it possible to make a description of the structure focused on the semantics of various morphological forms of the visualised vessels from the point of view of the operation of coronary circulation and the blood supply of the heart muscle. Thanks to these, the analysis conducted allows fast and — to a great degree — automated interpretation of the semantics of various morphological changes in the coronary vascular tree, and especially makes it possible to detect these stenoses in the lumen of the vessels that can cause critical decrease in blood supply to extensive or especially important fragments of the heart muscle.
Exploiting semantics for sensor re-calibration in event detection systems
NASA Astrophysics Data System (ADS)
Vaisenberg, Ronen; Ji, Shengyue; Hore, Bijit; Mehrotra, Sharad; Venkatasubramanian, Nalini
2008-01-01
Event detection from a video stream is becoming an important and challenging task in surveillance and sentient systems. While computer vision has been extensively studied to solve different kinds of detection problems over time, it is still a hard problem and even in a controlled environment only simple events can be detected with a high degree of accuracy. Instead of struggling to improve event detection using image processing only, we bring in semantics to direct traditional image processing. Semantics are the underlying facts that hide beneath video frames, which can not be "seen" directly by image processing. In this work we demonstrate that time sequence semantics can be exploited to guide unsupervised re-calibration of the event detection system. We present an instantiation of our ideas by using an appliance as an example--Coffee Pot level detection based on video data--to show that semantics can guide the re-calibration of the detection model. This work exploits time sequence semantics to detect when re-calibration is required to automatically relearn a new detection model for the newly evolved system state and to resume monitoring with a higher rate of accuracy.
Lesion Detection in CT Images Using Deep Learning Semantic Segmentation Technique
NASA Astrophysics Data System (ADS)
Kalinovsky, A.; Liauchuk, V.; Tarasau, A.
2017-05-01
In this paper, the problem of automatic detection of tuberculosis lesion on 3D lung CT images is considered as a benchmark for testing out algorithms based on a modern concept of Deep Learning. For training and testing of the algorithms a domestic dataset of 338 3D CT scans of tuberculosis patients with manually labelled lesions was used. The algorithms which are based on using Deep Convolutional Networks were implemented and applied in three different ways including slice-wise lesion detection in 2D images using semantic segmentation, slice-wise lesion detection in 2D images using sliding window technique as well as straightforward detection of lesions via semantic segmentation in whole 3D CT scans. The algorithms demonstrate superior performance compared to algorithms based on conventional image analysis methods.
Learning to rank image tags with limited training examples.
Songhe Feng; Zheyun Feng; Rong Jin
2015-04-01
With an increasing number of images that are available in social media, image annotation has emerged as an important research topic due to its application in image matching and retrieval. Most studies cast image annotation into a multilabel classification problem. The main shortcoming of this approach is that it requires a large number of training images with clean and complete annotations in order to learn a reliable model for tag prediction. We address this limitation by developing a novel approach that combines the strength of tag ranking with the power of matrix recovery. Instead of having to make a binary decision for each tag, our approach ranks tags in the descending order of their relevance to the given image, significantly simplifying the problem. In addition, the proposed method aggregates the prediction models for different tags into a matrix, and casts tag ranking into a matrix recovery problem. It introduces the matrix trace norm to explicitly control the model complexity, so that a reliable prediction model can be learned for tag ranking even when the tag space is large and the number of training images is limited. Experiments on multiple well-known image data sets demonstrate the effectiveness of the proposed framework for tag ranking compared with the state-of-the-art approaches for image annotation and tag ranking.
Semantics-Based Composition of Integrated Cardiomyocyte Models Motivated by Real-World Use Cases.
Neal, Maxwell L; Carlson, Brian E; Thompson, Christopher T; James, Ryan C; Kim, Karam G; Tran, Kenneth; Crampin, Edmund J; Cook, Daniel L; Gennari, John H
2015-01-01
Semantics-based model composition is an approach for generating complex biosimulation models from existing components that relies on capturing the biological meaning of model elements in a machine-readable fashion. This approach allows the user to work at the biological rather than computational level of abstraction and helps minimize the amount of manual effort required for model composition. To support this compositional approach, we have developed the SemGen software, and here report on SemGen's semantics-based merging capabilities using real-world modeling use cases. We successfully reproduced a large, manually-encoded, multi-model merge: the "Pandit-Hinch-Niederer" (PHN) cardiomyocyte excitation-contraction model, previously developed using CellML. We describe our approach for annotating the three component models used in the PHN composition and for merging them at the biological level of abstraction within SemGen. We demonstrate that we were able to reproduce the original PHN model results in a semi-automated, semantics-based fashion and also rapidly generate a second, novel cardiomyocyte model composed using an alternative, independently-developed tension generation component. We discuss the time-saving features of our compositional approach in the context of these merging exercises, the limitations we encountered, and potential solutions for enhancing the approach.
Semantics-Based Composition of Integrated Cardiomyocyte Models Motivated by Real-World Use Cases
Neal, Maxwell L.; Carlson, Brian E.; Thompson, Christopher T.; James, Ryan C.; Kim, Karam G.; Tran, Kenneth; Crampin, Edmund J.; Cook, Daniel L.; Gennari, John H.
2015-01-01
Semantics-based model composition is an approach for generating complex biosimulation models from existing components that relies on capturing the biological meaning of model elements in a machine-readable fashion. This approach allows the user to work at the biological rather than computational level of abstraction and helps minimize the amount of manual effort required for model composition. To support this compositional approach, we have developed the SemGen software, and here report on SemGen’s semantics-based merging capabilities using real-world modeling use cases. We successfully reproduced a large, manually-encoded, multi-model merge: the “Pandit-Hinch-Niederer” (PHN) cardiomyocyte excitation-contraction model, previously developed using CellML. We describe our approach for annotating the three component models used in the PHN composition and for merging them at the biological level of abstraction within SemGen. We demonstrate that we were able to reproduce the original PHN model results in a semi-automated, semantics-based fashion and also rapidly generate a second, novel cardiomyocyte model composed using an alternative, independently-developed tension generation component. We discuss the time-saving features of our compositional approach in the context of these merging exercises, the limitations we encountered, and potential solutions for enhancing the approach. PMID:26716837
Constructing Adverse Outcome Pathways: a Demonstration of ...
Adverse outcome pathway (AOP) provides a conceptual framework to evaluate and integrate chemical toxicity and its effects across the levels of biological organization. As such, it is essential to develop a resource-efficient and effective approach to extend molecular initiating events (MIEs) of chemicals to their downstream phenotypes of a greater regulatory relevance. A number of ongoing public phenomics (high throughput phenotyping) efforts have been generating abundant phenotypic data annotated with ontology terms. These phenotypes can be analyzed semantically and linked to MIEs of interest, all in the context of a knowledge base integrated from a variety of ontologies for various species and knowledge domains. In such analyses, two phenotypic profiles (PPs; anchored by genes or diseases) each characterized by multiple ontology terms are compared for their semantic similarities within a common ontology graph, but across boundaries of species and knowledge domains. Taking advantage of publicly available ontologies and software tool kits, we have implemented an OS-Mapping (Ontology-based Semantics Mapping) approach as a Java application, and constructed a network of 19383 PPs as nodes with edges weighed by their pairwise semantic similarity scores. Individual PPs were assembled from public phenomics data. Out of possible 1.87×108 pairwise connections among these nodes, about 71% of them have similarity scores between 0.2 and the maximum possible of 1.0.
Semantic classification of business images
NASA Astrophysics Data System (ADS)
Erol, Berna; Hull, Jonathan J.
2006-01-01
Digital cameras are becoming increasingly common for capturing information in business settings. In this paper, we describe a novel method for classifying images into the following semantic classes: document, whiteboard, business card, slide, and regular images. Our method is based on combining low-level image features, such as text color, layout, and handwriting features with high-level OCR output analysis. Several Support Vector Machine Classifiers are combined for multi-class classification of input images. The system yields 95% accuracy in classification.
Exemplar-Based Image and Video Stylization Using Fully Convolutional Semantic Features.
Zhu, Feida; Yan, Zhicheng; Bu, Jiajun; Yu, Yizhou
2017-05-10
Color and tone stylization in images and videos strives to enhance unique themes with artistic color and tone adjustments. It has a broad range of applications from professional image postprocessing to photo sharing over social networks. Mainstream photo enhancement softwares, such as Adobe Lightroom and Instagram, provide users with predefined styles, which are often hand-crafted through a trial-and-error process. Such photo adjustment tools lack a semantic understanding of image contents and the resulting global color transform limits the range of artistic styles it can represent. On the other hand, stylistic enhancement needs to apply distinct adjustments to various semantic regions. Such an ability enables a broader range of visual styles. In this paper, we first propose a novel deep learning architecture for exemplar-based image stylization, which learns local enhancement styles from image pairs. Our deep learning architecture consists of fully convolutional networks (FCNs) for automatic semantics-aware feature extraction and fully connected neural layers for adjustment prediction. Image stylization can be efficiently accomplished with a single forward pass through our deep network. To extend our deep network from image stylization to video stylization, we exploit temporal superpixels (TSPs) to facilitate the transfer of artistic styles from image exemplars to videos. Experiments on a number of datasets for image stylization as well as a diverse set of video clips demonstrate the effectiveness of our deep learning architecture.
Content-Based Management of Image Databases in the Internet Age
ERIC Educational Resources Information Center
Kleban, James Theodore
2010-01-01
The Internet Age has seen the emergence of richly annotated image data collections numbering in the billions of items. This work makes contributions in three primary areas which aid the management of this data: image representation, efficient retrieval, and annotation based on content and metadata. The contributions are as follows. First,…
Tiede, Dirk; Baraldi, Andrea; Sudmanns, Martin; Belgiu, Mariana; Lang, Stefan
2017-01-01
ABSTRACT Spatiotemporal analytics of multi-source Earth observation (EO) big data is a pre-condition for semantic content-based image retrieval (SCBIR). As a proof of concept, an innovative EO semantic querying (EO-SQ) subsystem was designed and prototypically implemented in series with an EO image understanding (EO-IU) subsystem. The EO-IU subsystem is automatically generating ESA Level 2 products (scene classification map, up to basic land cover units) from optical satellite data. The EO-SQ subsystem comprises a graphical user interface (GUI) and an array database embedded in a client server model. In the array database, all EO images are stored as a space-time data cube together with their Level 2 products generated by the EO-IU subsystem. The GUI allows users to (a) develop a conceptual world model based on a graphically supported query pipeline as a combination of spatial and temporal operators and/or standard algorithms and (b) create, save and share within the client-server architecture complex semantic queries/decision rules, suitable for SCBIR and/or spatiotemporal EO image analytics, consistent with the conceptual world model. PMID:29098143
Lahiani, Amal; Klaiman, Eldad; Grimm, Oliver
2018-01-01
Context: Medical diagnosis and clinical decisions rely heavily on the histopathological evaluation of tissue samples, especially in oncology. Historically, classical histopathology has been the gold standard for tissue evaluation and assessment by pathologists. The most widely and commonly used dyes in histopathology are hematoxylin and eosin (H&E) as most malignancies diagnosis is largely based on this protocol. H&E staining has been used for more than a century to identify tissue characteristics and structures morphologies that are needed for tumor diagnosis. In many cases, as tissue is scarce in clinical studies, fluorescence imaging is necessary to allow staining of the same specimen with multiple biomarkers simultaneously. Since fluorescence imaging is a relatively new technology in the pathology landscape, histopathologists are not used to or trained in annotating or interpreting these images. Aims, Settings and Design: To allow pathologists to annotate these images without the need for additional training, we designed an algorithm for the conversion of fluorescence images to brightfield H&E images. Subjects and Methods: In this algorithm, we use fluorescent nuclei staining to reproduce the hematoxylin information and natural tissue autofluorescence to reproduce the eosin information avoiding the necessity to specifically stain the proteins or intracellular structures with an additional fluorescence stain. Statistical Analysis Used: Our method is based on optimizing a transform function from fluorescence to H&E images using least mean square optimization. Results: It results in high quality virtual H&E digital images that can easily and efficiently be analyzed by pathologists. We validated our results with pathologists by making them annotate tumor in real and virtual H&E whole slide images and we obtained promising results. Conclusions: Hence, we provide a solution that enables pathologists to assess tissue and annotate specific structures based on multiplexed fluorescence images. PMID:29531846
Lahiani, Amal; Klaiman, Eldad; Grimm, Oliver
2018-01-01
Medical diagnosis and clinical decisions rely heavily on the histopathological evaluation of tissue samples, especially in oncology. Historically, classical histopathology has been the gold standard for tissue evaluation and assessment by pathologists. The most widely and commonly used dyes in histopathology are hematoxylin and eosin (H&E) as most malignancies diagnosis is largely based on this protocol. H&E staining has been used for more than a century to identify tissue characteristics and structures morphologies that are needed for tumor diagnosis. In many cases, as tissue is scarce in clinical studies, fluorescence imaging is necessary to allow staining of the same specimen with multiple biomarkers simultaneously. Since fluorescence imaging is a relatively new technology in the pathology landscape, histopathologists are not used to or trained in annotating or interpreting these images. To allow pathologists to annotate these images without the need for additional training, we designed an algorithm for the conversion of fluorescence images to brightfield H&E images. In this algorithm, we use fluorescent nuclei staining to reproduce the hematoxylin information and natural tissue autofluorescence to reproduce the eosin information avoiding the necessity to specifically stain the proteins or intracellular structures with an additional fluorescence stain. Our method is based on optimizing a transform function from fluorescence to H&E images using least mean square optimization. It results in high quality virtual H&E digital images that can easily and efficiently be analyzed by pathologists. We validated our results with pathologists by making them annotate tumor in real and virtual H&E whole slide images and we obtained promising results. Hence, we provide a solution that enables pathologists to assess tissue and annotate specific structures based on multiplexed fluorescence images.
2011-01-01
Background Over the past several centuries, chemistry has permeated virtually every facet of human lifestyle, enriching fields as diverse as medicine, agriculture, manufacturing, warfare, and electronics, among numerous others. Unfortunately, application-specific, incompatible chemical information formats and representation strategies have emerged as a result of such diverse adoption of chemistry. Although a number of efforts have been dedicated to unifying the computational representation of chemical information, disparities between the various chemical databases still persist and stand in the way of cross-domain, interdisciplinary investigations. Through a common syntax and formal semantics, Semantic Web technology offers the ability to accurately represent, integrate, reason about and query across diverse chemical information. Results Here we specify and implement the Chemical Entity Semantic Specification (CHESS) for the representation of polyatomic chemical entities, their substructures, bonds, atoms, and reactions using Semantic Web technologies. CHESS provides means to capture aspects of their corresponding chemical descriptors, connectivity, functional composition, and geometric structure while specifying mechanisms for data provenance. We demonstrate that using our readily extensible specification, it is possible to efficiently integrate multiple disparate chemical data sources, while retaining appropriate correspondence of chemical descriptors, with very little additional effort. We demonstrate the impact of some of our representational decisions on the performance of chemically-aware knowledgebase searching and rudimentary reaction candidate selection. Finally, we provide access to the tools necessary to carry out chemical entity encoding in CHESS, along with a sample knowledgebase. Conclusions By harnessing the power of Semantic Web technologies with CHESS, it is possible to provide a means of facile cross-domain chemical knowledge integration with full preservation of data correspondence and provenance. Our representation builds on existing cheminformatics technologies and, by the virtue of RDF specification, remains flexible and amenable to application- and domain-specific annotations without compromising chemical data integration. We conclude that the adoption of a consistent and semantically-enabled chemical specification is imperative for surviving the coming chemical data deluge and supporting systems science research. PMID:21595881
Chepelev, Leonid L; Dumontier, Michel
2011-05-19
Over the past several centuries, chemistry has permeated virtually every facet of human lifestyle, enriching fields as diverse as medicine, agriculture, manufacturing, warfare, and electronics, among numerous others. Unfortunately, application-specific, incompatible chemical information formats and representation strategies have emerged as a result of such diverse adoption of chemistry. Although a number of efforts have been dedicated to unifying the computational representation of chemical information, disparities between the various chemical databases still persist and stand in the way of cross-domain, interdisciplinary investigations. Through a common syntax and formal semantics, Semantic Web technology offers the ability to accurately represent, integrate, reason about and query across diverse chemical information. Here we specify and implement the Chemical Entity Semantic Specification (CHESS) for the representation of polyatomic chemical entities, their substructures, bonds, atoms, and reactions using Semantic Web technologies. CHESS provides means to capture aspects of their corresponding chemical descriptors, connectivity, functional composition, and geometric structure while specifying mechanisms for data provenance. We demonstrate that using our readily extensible specification, it is possible to efficiently integrate multiple disparate chemical data sources, while retaining appropriate correspondence of chemical descriptors, with very little additional effort. We demonstrate the impact of some of our representational decisions on the performance of chemically-aware knowledgebase searching and rudimentary reaction candidate selection. Finally, we provide access to the tools necessary to carry out chemical entity encoding in CHESS, along with a sample knowledgebase. By harnessing the power of Semantic Web technologies with CHESS, it is possible to provide a means of facile cross-domain chemical knowledge integration with full preservation of data correspondence and provenance. Our representation builds on existing cheminformatics technologies and, by the virtue of RDF specification, remains flexible and amenable to application- and domain-specific annotations without compromising chemical data integration. We conclude that the adoption of a consistent and semantically-enabled chemical specification is imperative for surviving the coming chemical data deluge and supporting systems science research.
Translation-aware semantic segmentation via conditional least-square generative adversarial networks
NASA Astrophysics Data System (ADS)
Zhang, Mi; Hu, Xiangyun; Zhao, Like; Pang, Shiyan; Gong, Jinqi; Luo, Min
2017-10-01
Semantic segmentation has recently made rapid progress in the field of remote sensing and computer vision. However, many leading approaches cannot simultaneously translate label maps to possible source images with a limited number of training images. The core issue is insufficient adversarial information to interpret the inverse process and proper objective loss function to overcome the vanishing gradient problem. We propose the use of conditional least squares generative adversarial networks (CLS-GAN) to delineate visual objects and solve these problems. We trained the CLS-GAN network for semantic segmentation to discriminate dense prediction information either from training images or generative networks. We show that the optimal objective function of CLS-GAN is a special class of f-divergence and yields a generator that lies on the decision boundary of discriminator that reduces possible vanished gradient. We also demonstrate the effectiveness of the proposed architecture at translating images from label maps in the learning process. Experiments on a limited number of high resolution images, including close-range and remote sensing datasets, indicate that the proposed method leads to the improved semantic segmentation accuracy and can simultaneously generate high quality images from label maps.
Combined semantic and similarity search in medical image databases
NASA Astrophysics Data System (ADS)
Seifert, Sascha; Thoma, Marisa; Stegmaier, Florian; Hammon, Matthias; Kramer, Martin; Huber, Martin; Kriegel, Hans-Peter; Cavallaro, Alexander; Comaniciu, Dorin
2011-03-01
The current diagnostic process at hospitals is mainly based on reviewing and comparing images coming from multiple time points and modalities in order to monitor disease progression over a period of time. However, for ambiguous cases the radiologist deeply relies on reference literature or second opinion. Although there is a vast amount of acquired images stored in PACS systems which could be reused for decision support, these data sets suffer from weak search capabilities. Thus, we present a search methodology which enables the physician to fulfill intelligent search scenarios on medical image databases combining ontology-based semantic and appearance-based similarity search. It enabled the elimination of 12% of the top ten hits which would arise without taking the semantic context into account.
Fully convolutional network with cluster for semantic segmentation
NASA Astrophysics Data System (ADS)
Ma, Xiao; Chen, Zhongbi; Zhang, Jianlin
2018-04-01
At present, image semantic segmentation technology has been an active research topic for scientists in the field of computer vision and artificial intelligence. Especially, the extensive research of deep neural network in image recognition greatly promotes the development of semantic segmentation. This paper puts forward a method based on fully convolutional network, by cluster algorithm k-means. The cluster algorithm using the image's low-level features and initializing the cluster centers by the super-pixel segmentation is proposed to correct the set of points with low reliability, which are mistakenly classified in great probability, by the set of points with high reliability in each clustering regions. This method refines the segmentation of the target contour and improves the accuracy of the image segmentation.
Towards the VWO Annotation Service: a Success Story of the IMAGE RPI Expert Rating System
NASA Astrophysics Data System (ADS)
Reinisch, B. W.; Galkin, I. A.; Fung, S. F.; Benson, R. F.; Kozlov, A. V.; Khmyrov, G. M.; Garcia, L. N.
2010-12-01
Interpretation of Heliophysics wave data requires specialized knowledge of wave phenomena. Users of the virtual wave observatory (VWO) will greatly benefit from a data annotation service that will allow querying of data by phenomenon type, thus helping accomplish the VWO goal to make Heliophysics wave data searchable, understandable, and usable by the scientific community. Individual annotations can be sorted by phenomenon type and reduced into event lists (catalogs). However, in contrast to the event lists, annotation records allow a greater flexibility of collaborative management by more easily admitting operations of addition, revision, or deletion. They can therefore become the building blocks for an interactive Annotation Service with a suitable graphic user interface to the VWO middleware. The VWO Annotation Service vision is an interactive, collaborative sharing of domain expert knowledge with fellow scientists and students alike. An effective prototype of the VWO Annotation Service has been in operation at the University of Massachusetts Lowell since 2001. An expert rating system (ERS) was developed for annotating the IMAGE radio plasma imager (RPI) active sounding data containing 1.2 million plasmagrams. The RPI data analysts can use ERS to submit expert ratings of plasmagram features, such as presence of echo traces resulted from reflected RPI signals from distant plasma structures. Since its inception in 2001, the RPI ERS has accumulated 7351 expert plasmagram ratings in 16 phenomenon categories, together with free-text descriptions and other metadata. In addition to human expert ratings, the system holds 225,125 ratings submitted by the CORPRAL data prospecting software that employs a model of the human pre-attentive vision to select images potentially containing interesting features. The annotation records proved to be instrumental in a number of investigations where manual data exploration would have been prohibitively tedious and expensive. Especially useful are queries of the annotation database for successive plasmagrams containing echo traces. Several success stories of the RPI ERS using this capability will be discussed, particularly in terms of how they may be extended to develop the VWO Annotation Service.
A shortest-path graph kernel for estimating gene product semantic similarity.
Alvarez, Marco A; Qi, Xiaojun; Yan, Changhui
2011-07-29
Existing methods for calculating semantic similarity between gene products using the Gene Ontology (GO) often rely on external resources, which are not part of the ontology. Consequently, changes in these external resources like biased term distribution caused by shifting of hot research topics, will affect the calculation of semantic similarity. One way to avoid this problem is to use semantic methods that are "intrinsic" to the ontology, i.e. independent of external knowledge. We present a shortest-path graph kernel (spgk) method that relies exclusively on the GO and its structure. In spgk, a gene product is represented by an induced subgraph of the GO, which consists of all the GO terms annotating it. Then a shortest-path graph kernel is used to compute the similarity between two graphs. In a comprehensive evaluation using a benchmark dataset, spgk compares favorably with other methods that depend on external resources. Compared with simUI, a method that is also intrinsic to GO, spgk achieves slightly better results on the benchmark dataset. Statistical tests show that the improvement is significant when the resolution and EC similarity correlation coefficient are used to measure the performance, but is insignificant when the Pfam similarity correlation coefficient is used. Spgk uses a graph kernel method in polynomial time to exploit the structure of the GO to calculate semantic similarity between gene products. It provides an alternative to both methods that use external resources and "intrinsic" methods with comparable performance.
Huang, Jingshan; Gutierrez, Fernando; Strachan, Harrison J; Dou, Dejing; Huang, Weili; Smith, Barry; Blake, Judith A; Eilbeck, Karen; Natale, Darren A; Lin, Yu; Wu, Bin; Silva, Nisansa de; Wang, Xiaowei; Liu, Zixing; Borchert, Glen M; Tan, Ming; Ruttenberg, Alan
2016-01-01
As a special class of non-coding RNAs (ncRNAs), microRNAs (miRNAs) perform important roles in numerous biological and pathological processes. The realization of miRNA functions depends largely on how miRNAs regulate specific target genes. It is therefore critical to identify, analyze, and cross-reference miRNA-target interactions to better explore and delineate miRNA functions. Semantic technologies can help in this regard. We previously developed a miRNA domain-specific application ontology, Ontology for MIcroRNA Target (OMIT), whose goal was to serve as a foundation for semantic annotation, data integration, and semantic search in the miRNA field. In this paper we describe our continuing effort to develop the OMIT, and demonstrate its use within a semantic search system, OmniSearch, designed to facilitate knowledge capture of miRNA-target interaction data. Important changes in the current version OMIT are summarized as: (1) following a modularized ontology design (with 2559 terms imported from the NCRO ontology); (2) encoding all 1884 human miRNAs (vs. 300 in previous versions); and (3) setting up a GitHub project site along with an issue tracker for more effective community collaboration on the ontology development. The OMIT ontology is free and open to all users, accessible at: http://purl.obolibrary.org/obo/omit.owl. The OmniSearch system is also free and open to all users, accessible at: http://omnisearch.soc.southalabama.edu/index.php/Software.
A health analytics semantic ETL service for obesity surveillance.
Poulymenopoulou, M; Papakonstantinou, D; Malamateniou, F; Vassilacopoulos, G
2015-01-01
The increasingly large amount of data produced in healthcare (e.g. collected through health information systems such as electronic medical records - EMRs or collected through novel data sources such as personal health records - PHRs, social media, web resources) enable the creation of detailed records about people's health, sentiments and activities (e.g. physical activity, diet, sleep quality) that can be used in the public health area among others. However, despite the transformative potential of big data in public health surveillance there are several challenges in integrating big data. In this paper, the interoperability challenge is tackled and a semantic Extract Transform Load (ETL) service is proposed that seeks to semantically annotate big data to result into valuable data for analysis. This service is considered as part of a health analytics engine on the cloud that interacts with existing healthcare information exchange networks, like the Integrating the Healthcare Enterprise (IHE), PHRs, sensors, mobile applications, and other web resources to retrieve patient health, behavioral and daily activity data. The semantic ETL service aims at semantically integrating big data for use by analytic mechanisms. An illustrative implementation of the service on big data which is potentially relevant to human obesity, enables using appropriate analytic techniques (e.g. machine learning, text mining) that are expected to assist in identifying patterns and contributing factors (e.g. genetic background, social, environmental) for this social phenomenon and, hence, drive health policy changes and promote healthy behaviors where residents live, work, learn, shop and play.
Spanier, A B; Caplan, N; Sosna, J; Acar, B; Joskowicz, L
2018-01-01
The goal of medical content-based image retrieval (M-CBIR) is to assist radiologists in the decision-making process by retrieving medical cases similar to a given image. One of the key interests of radiologists is lesions and their annotations, since the patient treatment depends on the lesion diagnosis. Therefore, a key feature of M-CBIR systems is the retrieval of scans with the most similar lesion annotations. To be of value, M-CBIR systems should be fully automatic to handle large case databases. We present a fully automatic end-to-end method for the retrieval of CT scans with similar liver lesion annotations. The input is a database of abdominal CT scans labeled with liver lesions, a query CT scan, and optionally one radiologist-specified lesion annotation of interest. The output is an ordered list of the database CT scans with the most similar liver lesion annotations. The method starts by automatically segmenting the liver in the scan. It then extracts a histogram-based features vector from the segmented region, learns the features' relative importance, and ranks the database scans according to the relative importance measure. The main advantages of our method are that it fully automates the end-to-end querying process, that it uses simple and efficient techniques that are scalable to large datasets, and that it produces quality retrieval results using an unannotated CT scan. Our experimental results on 9 CT queries on a dataset of 41 volumetric CT scans from the 2014 Image CLEF Liver Annotation Task yield an average retrieval accuracy (Normalized Discounted Cumulative Gain index) of 0.77 and 0.84 without/with annotation, respectively. Fully automatic end-to-end retrieval of similar cases based on image information alone, rather that on disease diagnosis, may help radiologists to better diagnose liver lesions.
Semantic and topological classification of images in magnetically guided capsule endoscopy
NASA Astrophysics Data System (ADS)
Mewes, P. W.; Rennert, P.; Juloski, A. L.; Lalande, A.; Angelopoulou, E.; Kuth, R.; Hornegger, J.
2012-03-01
Magnetically-guided capsule endoscopy (MGCE) is a nascent technology with the goal to allow the steering of a capsule endoscope inside a water filled stomach through an external magnetic field. We developed a classification cascade for MGCE images with groups images in semantic and topological categories. Results can be used in a post-procedure review or as a starting point for algorithms classifying pathologies. The first semantic classification step discards over-/under-exposed images as well as images with a large amount of debris. The second topological classification step groups images with respect to their position in the upper gastrointestinal tract (mouth, esophagus, stomach, duodenum). In the third stage two parallel classifications steps distinguish topologically different regions inside the stomach (cardia, fundus, pylorus, antrum, peristaltic view). For image classification, global image features and local texture features were applied and their performance was evaluated. We show that the third classification step can be improved by a bubble and debris segmentation because it limits feature extraction to discriminative areas only. We also investigated the impact of segmenting intestinal folds on the identification of different semantic camera positions. The results of classifications with a support-vector-machine show the significance of color histogram features for the classification of corrupted images (97%). Features extracted from intestinal fold segmentation lead only to a minor improvement (3%) in discriminating different camera positions.
McCarthy, Laura Mary; Kalinyak-Fliszar, Michelene; Kohen, Francine; Martin, Nadine
2017-01-01
Deep dysphasia is a relatively rare subcategory of aphasia, characterised by word repetition impairment and a profound auditory-verbal short-term memory (STM) limitation. Repetition of words is better than nonwords (lexicality effect) and better for high-image than low-image words (imageability effect). Another related language impairment profile is phonological dysphasia, which includes all of the characteristics of deep dysphasia except for the occurrence of semantic errors in single word repetition. The overlap in symptoms of deep and phonological dysphasia has led to the hypothesis that they share the same root cause, impaired maintenance of activated representation of words, but that they differ in severity of that impairment, with deep dysphasia being more severe. We report a single-subject multiple baseline, multiple probe treatment study of a person who presented with a pattern of repetition that was consistent with the continuum of deep-phonological dysphasia: imageability and lexicality effects in repetition of single and multiple words and semantic errors in repetition of multiple-word utterances. The aim of this treatment study was to improve access to and repetition of low-imageability words by embedding them in modifier-noun phrases that enhanced their imageability. The treatment involved repetition of abstract noun pairs. We created modifier-abstract noun phrases that increased the semantic and syntactic cohesiveness of the words in the pair. For example, the phrases "long distance" and "social exclusion" were developed to improve repetition of the abstract pair "distance-exclusion". The goal of this manipulation was to increase the probability of accessing lexical and semantic representations of abstract words in repetition by enriching their semantic -syntactic context. We predicted that this increase in accessibility would be maintained when the words were repeated as pairs, but without the contextual phrase. Treatment outcomes indicated that increasing the semantic and syntactic cohesiveness of low-imageability and low-frequency words later improved this participant's ability to repeat those words when presented in isolation. This treatment approach to improving access to abstract word pairs for repetition was successful for our participant with phonological dysphasia. The approach exemplifies the potential value in manipulating linguistic characteristics of stimuli in ways that improve access between phonological and lexical-semantic levels of representation. Additionally, this study demonstrates how principles of a cognitive model of word processing can be used to guide treatment of word processing impairments in aphasia.
Suggestion-Induced Modulation of Semantic Priming during Functional Magnetic Resonance Imaging
Ulrich, Martin; Kiefer, Markus; Bongartz, Walter; Grön, Georg; Hoenig, Klaus
2015-01-01
Using functional magnetic resonance imaging during a primed visual lexical decision task, we investigated the neural and functional mechanisms underlying modulations of semantic word processing through hypnotic suggestions aimed at altering lexical processing of primes. The priming task was to discriminate between target words and pseudowords presented 200 ms after the prime word which was semantically related or unrelated to the target. In a counterbalanced study design, each participant performed the task once at normal wakefulness and once after the administration of hypnotic suggestions to perceive the prime as a meaningless symbol of a foreign language. Neural correlates of priming were defined as significantly lower activations upon semantically related compared to unrelated trials. We found significant suggestive treatment-induced reductions in neural priming, albeit irrespective of the degree of suggestibility. Neural priming was attenuated upon suggestive treatment compared with normal wakefulness in brain regions supporting automatic (fusiform gyrus) and controlled semantic processing (superior and middle temporal gyri, pre- and postcentral gyri, and supplementary motor area). Hence, suggestions reduced semantic word processing by conjointly dampening both automatic and strategic semantic processes. PMID:25923740
Extracting BI-RADS Features from Portuguese Clinical Texts
Nassif, Houssam; Cunha, Filipe; Moreira, Inês C.; Cruz-Correia, Ricardo; Sousa, Eliana; Page, David; Burnside, Elizabeth; Dutra, Inês
2013-01-01
In this work we build the first BI-RADS parser for Portuguese free texts, modeled after existing approaches to extract BI-RADS features from English medical records. Our concept finder uses a semantic grammar based on the BIRADS lexicon and on iterative transferred expert knowledge. We compare the performance of our algorithm to manual annotation by a specialist in mammography. Our results show that our parser’s performance is comparable to the manual method. PMID:23797461
Supporting in- and off-Hospital Patient Management Using a Web-based Integrated Software Platform.
Spyropoulos, Basile; Botsivali, Maria; Tzavaras, Aris; Pierros, Vasileios
2015-01-01
In this paper, a Web-based software platform appropriately designed to support the continuity of health care information and management for both in and out of hospital care is presented. The system has some additional features as it is the formation of continuity of care records and the transmission of referral letters with a semantically annotated web service. The platform's Web-orientation provides significant advantages, allowing for easily accomplished remote access.
New approach for cognitive analysis and understanding of medical patterns and visualizations
NASA Astrophysics Data System (ADS)
Ogiela, Marek R.; Tadeusiewicz, Ryszard
2003-11-01
This paper presents new opportunities for applying linguistic description of the picture merit content and AI methods to undertake tasks of the automatic understanding of images semantics in intelligent medical information systems. A successful obtaining of the crucial semantic content of the medical image may contribute considerably to the creation of new intelligent multimedia cognitive medical systems. Thanks to the new idea of cognitive resonance between stream of the data extracted from the image using linguistic methods and expectations taken from the representaion of the medical knowledge, it is possible to understand the merit content of the image even if teh form of the image is very different from any known pattern. This article proves that structural techniques of artificial intelligence may be applied in the case of tasks related to automatic classification and machine perception based on semantic pattern content in order to determine the semantic meaning of the patterns. In the paper are described some examples presenting ways of applying such techniques in the creation of cognitive vision systems for selected classes of medical images. On the base of scientific research described in the paper we try to build some new systems for collecting, storing, retrieving and intelligent interpreting selected medical images especially obtained in radiological and MRI examinations.
Semantic segmentation of mFISH images using convolutional networks.
Pardo, Esteban; Morgado, José Mário T; Malpica, Norberto
2018-04-30
Multicolor in situ hybridization (mFISH) is a karyotyping technique used to detect major chromosomal alterations using fluorescent probes and imaging techniques. Manual interpretation of mFISH images is a time consuming step that can be automated using machine learning; in previous works, pixel or patch wise classification was employed, overlooking spatial information which can help identify chromosomes. In this work, we propose a fully convolutional semantic segmentation network for the interpretation of mFISH images, which uses both spatial and spectral information to classify each pixel in an end-to-end fashion. The semantic segmentation network developed was tested on samples extracted from a public dataset using cross validation. Despite having no labeling information of the image it was tested on, our algorithm yielded an average correct classification ratio (CCR) of 87.41%. Previously, this level of accuracy was only achieved with state of the art algorithms when classifying pixels from the same image in which the classifier has been trained. These results provide evidence that fully convolutional semantic segmentation networks may be employed in the computer aided diagnosis of genetic diseases with improved performance over the current image analysis methods. © 2018 International Society for Advancement of Cytometry. © 2018 International Society for Advancement of Cytometry.
Deformably registering and annotating whole CLARITY brains to an atlas via masked LDDMM
NASA Astrophysics Data System (ADS)
Kutten, Kwame S.; Vogelstein, Joshua T.; Charon, Nicolas; Ye, Li; Deisseroth, Karl; Miller, Michael I.
2016-04-01
The CLARITY method renders brains optically transparent to enable high-resolution imaging in the structurally intact brain. Anatomically annotating CLARITY brains is necessary for discovering which regions contain signals of interest. Manually annotating whole-brain, terabyte CLARITY images is difficult, time-consuming, subjective, and error-prone. Automatically registering CLARITY images to a pre-annotated brain atlas offers a solution, but is difficult for several reasons. Removal of the brain from the skull and subsequent storage and processing cause variable non-rigid deformations, thus compounding inter-subject anatomical variability. Additionally, the signal in CLARITY images arises from various biochemical contrast agents which only sparsely label brain structures. This sparse labeling challenges the most commonly used registration algorithms that need to match image histogram statistics to the more densely labeled histological brain atlases. The standard method is a multiscale Mutual Information B-spline algorithm that dynamically generates an average template as an intermediate registration target. We determined that this method performs poorly when registering CLARITY brains to the Allen Institute's Mouse Reference Atlas (ARA), because the image histogram statistics are poorly matched. Therefore, we developed a method (Mask-LDDMM) for registering CLARITY images, that automatically finds the brain boundary and learns the optimal deformation between the brain and atlas masks. Using Mask-LDDMM without an average template provided better results than the standard approach when registering CLARITY brains to the ARA. The LDDMM pipelines developed here provide a fast automated way to anatomically annotate CLARITY images; our code is available as open source software at http://NeuroData.io.
TRECVID: the utility of a content-based video retrieval evaluation
NASA Astrophysics Data System (ADS)
Hauptmann, Alexander G.
2006-01-01
TRECVID, an annual retrieval evaluation benchmark organized by NIST, encourages research in information retrieval from digital video. TRECVID benchmarking covers both interactive and manual searching by end users, as well as the benchmarking of some supporting technologies including shot boundary detection, extraction of semantic features, and the automatic segmentation of TV news broadcasts. Evaluations done in the context of the TRECVID benchmarks show that generally, speech transcripts and annotations provide the single most important clue for successful retrieval. However, automatically finding the individual images is still a tremendous and unsolved challenge. The evaluations repeatedly found that none of the multimedia analysis and retrieval techniques provide a significant benefit over retrieval using only textual information such as from automatic speech recognition transcripts or closed captions. In interactive systems, we do find significant differences among the top systems, indicating that interfaces can make a huge difference for effective video/image search. For interactive tasks efficient interfaces require few key clicks, but display large numbers of images for visual inspection by the user. The text search finds the right context region in the video in general, but to select specific relevant images we need good interfaces to easily browse the storyboard pictures. In general, TRECVID has motivated the video retrieval community to be honest about what we don't know how to do well (sometimes through painful failures), and has focused us to work on the actual task of video retrieval, as opposed to flashy demos based on technological capabilities.
Performance and Architecture Lab Modeling Tool
DOE Office of Scientific and Technical Information (OSTI.GOV)
2014-06-19
Analytical application performance models are critical for diagnosing performance-limiting resources, optimizing systems, and designing machines. Creating models, however, is difficult. Furthermore, models are frequently expressed in forms that are hard to distribute and validate. The Performance and Architecture Lab Modeling tool, or Palm, is a modeling tool designed to make application modeling easier. Palm provides a source code modeling annotation language. Not only does the modeling language divide the modeling task into sub problems, it formally links an application's source code with its model. This link is important because a model's purpose is to capture application behavior. Furthermore, this linkmore » makes it possible to define rules for generating models according to source code organization. Palm generates hierarchical models according to well-defined rules. Given an application, a set of annotations, and a representative execution environment, Palm will generate the same model. A generated model is a an executable program whose constituent parts directly correspond to the modeled application. Palm generates models by combining top-down (human-provided) semantic insight with bottom-up static and dynamic analysis. A model's hierarchy is defined by static and dynamic source code structure. Because Palm coordinates models and source code, Palm's models are 'first-class' and reproducible. Palm automates common modeling tasks. For instance, Palm incorporates measurements to focus attention, represent constant behavior, and validate models. Palm's workflow is as follows. The workflow's input is source code annotated with Palm modeling annotations. The most important annotation models an instance of a block of code. Given annotated source code, the Palm Compiler produces executables and the Palm Monitor collects a representative performance profile. The Palm Generator synthesizes a model based on the static and dynamic mapping of annotations to program behavior. The model -- an executable program -- is a hierarchical composition of annotation functions, synthesized functions, statistics for runtime values, and performance measurements.« less
Beyond Linear Syntax: An Image-Oriented Communication Aid
ERIC Educational Resources Information Center
Patel, Rupal; Pilato, Sam; Roy, Deb
2004-01-01
This article presents a novel AAC communication aid based on semantic rather than syntactic schema, leading to more natural message construction. Users interact with a two-dimensional spatially organized image schema, which depicts the semantic structure and contents of the message. An overview of the interface design is presented followed by…
Automatic Semantic Facilitation in Anterior Temporal Cortex Revealed through Multimodal Neuroimaging
Gramfort, Alexandre; Hämäläinen, Matti S.; Kuperberg, Gina R.
2013-01-01
A core property of human semantic processing is the rapid, facilitatory influence of prior input on extracting the meaning of what comes next, even under conditions of minimal awareness. Previous work has shown a number of neurophysiological indices of this facilitation, but the mapping between time course and localization—critical for separating automatic semantic facilitation from other mechanisms—has thus far been unclear. In the current study, we used a multimodal imaging approach to isolate early, bottom-up effects of context on semantic memory, acquiring a combination of electroencephalography (EEG), magnetoencephalography (MEG), and functional magnetic resonance imaging (fMRI) measurements in the same individuals with a masked semantic priming paradigm. Across techniques, the results provide a strikingly convergent picture of early automatic semantic facilitation. Event-related potentials demonstrated early sensitivity to semantic association between 300 and 500 ms; MEG localized the differential neural response within this time window to the left anterior temporal cortex, and fMRI localized the effect more precisely to the left anterior superior temporal gyrus, a region previously implicated in semantic associative processing. However, fMRI diverged from early EEG/MEG measures in revealing semantic enhancement effects within frontal and parietal regions, perhaps reflecting downstream attempts to consciously access the semantic features of the masked prime. Together, these results provide strong evidence that automatic associative semantic facilitation is realized as reduced activity within the left anterior superior temporal cortex between 300 and 500 ms after a word is presented, and emphasize the importance of multimodal neuroimaging approaches in distinguishing the contributions of multiple regions to semantic processing. PMID:24155321
Coggan, David D; Baker, Daniel H; Andrews, Timothy J
2016-01-01
Brain-imaging studies have found distinct spatial and temporal patterns of response to different object categories across the brain. However, the extent to which these categorical patterns of response reflect higher-level semantic or lower-level visual properties of the stimulus remains unclear. To address this question, we measured patterns of EEG response to intact and scrambled images in the human brain. Our rationale for using scrambled images is that they have many of the visual properties found in intact images, but do not convey any semantic information. Images from different object categories (bottle, face, house) were briefly presented (400 ms) in an event-related design. A multivariate pattern analysis revealed categorical patterns of response to intact images emerged ∼80-100 ms after stimulus onset and were still evident when the stimulus was no longer present (∼800 ms). Next, we measured the patterns of response to scrambled images. Categorical patterns of response to scrambled images also emerged ∼80-100 ms after stimulus onset. However, in contrast to the intact images, distinct patterns of response to scrambled images were mostly evident while the stimulus was present (∼400 ms). Moreover, scrambled images were able to account only for all the variance in the intact images at early stages of processing. This direct manipulation of visual and semantic content provides new insights into the temporal dynamics of object perception and the extent to which different stages of processing are dependent on lower-level or higher-level properties of the image.
Enrichment and Ranking of the YouTube Tag Space and Integration with the Linked Data Cloud
NASA Astrophysics Data System (ADS)
Choudhury, Smitashree; Breslin, John G.; Passant, Alexandre
The increase of personal digital cameras with video functionality and video-enabled camera phones has increased the amount of user-generated videos on the Web. People are spending more and more time viewing online videos as a major source of entertainment and "infotainment". Social websites allow users to assign shared free-form tags to user-generated multimedia resources, thus generating annotations for objects with a minimum amount of effort. Tagging allows communities to organise their multimedia items into browseable sets, but these tags may be poorly chosen and related tags may be omitted. Current techniques to retrieve, integrate and present this media to users are deficient and could do with improvement. In this paper, we describe a framework for semantic enrichment, ranking and integration of web video tags using Semantic Web technologies. Semantic enrichment of folksonomies can bridge the gap between the uncontrolled and flat structures typically found in user-generated content and structures provided by the Semantic Web. The enhancement of tag spaces with semantics has been accomplished through two major tasks: (1) a tag space expansion and ranking step; and (2) through concept matching and integration with the Linked Data cloud. We have explored social, temporal and spatial contexts to enrich and extend the existing tag space. The resulting semantic tag space is modelled via a local graph based on co-occurrence distances for ranking. A ranked tag list is mapped and integrated with the Linked Data cloud through the DBpedia resource repository. Multi-dimensional context filtering for tag expansion means that tag ranking is much easier and it provides less ambiguous tag to concept matching.
Semantic e-Science: From Microformats to Models
NASA Astrophysics Data System (ADS)
Lumb, L. I.; Freemantle, J. R.; Aldridge, K. D.
2009-05-01
A platform has been developed to transform semi-structured ASCII data into a representation based on the eXtensible Markup Language (XML). A subsequent transformation allows the XML-based representation to be rendered in the Resource Description Format (RDF). Editorial metadata, expressed as external annotations (via XML Pointer Language), also survives this transformation process (e.g., Lumb et al., http://dx.doi.org/10.1016/j.cageo.2008.03.009). Because the XML-to-RDF transformation uses XSLT (eXtensible Stylesheet Language Transformations), semantic microformats ultimately encode the scientific data (Lumb & Aldridge, http://dx.doi.org/10.1109/HPCS.2006.26). In building the relationship-centric representation in RDF, a Semantic Model of the scientific data is extracted. The systematic enhancement in the expressivity and richness of the scientific data results in representations of knowledge that are readily understood and manipulated by intelligent software agents. Thus scientists are able to draw upon various resources within and beyond their discipline to use in their scientific applications. Since the resulting Semantic Models are independent conceptualizations of the science itself, the representation of scientific knowledge and interaction with the same can stimulate insight from different perspectives. Using the Global Geodynamics Project (GGP) for the purpose of illustration, the introduction of GGP microformats enable a Semantic Model for the GGP that can be semantically queried (e.g., via SPARQL, http://www.w3.org/TR/rdf-sparql-query). Although the present implementation uses the Open Source Redland RDF Libraries (http://librdf.org/), the approach is generalizable to other platforms and to projects other than the GGP (e.g., Baker et al., Informatics and the 2007-2008 Electronic Geophysical Year, Eos Trans. Am. Geophys. Un., 89(48), 485-486, 2008).
Semantic Indexing of Medical Learning Objects: Medical Students' Usage of a Semantic Network
Gießler, Paul; Ohnesorge-Radtke, Ursula; Spreckelsen, Cord
2015-01-01
Background The Semantically Annotated Media (SAM) project aims to provide a flexible platform for searching, browsing, and indexing medical learning objects (MLOs) based on a semantic network derived from established classification systems. Primarily, SAM supports the Aachen emedia skills lab, but SAM is ready for indexing distributed content and the Simple Knowledge Organizing System standard provides a means for easily upgrading or even exchanging SAM’s semantic network. There is a lack of research addressing the usability of MLO indexes or search portals like SAM and the user behavior with such platforms. Objective The purpose of this study was to assess the usability of SAM by investigating characteristic user behavior of medical students accessing MLOs via SAM. Methods In this study, we chose a mixed-methods approach. Lean usability testing was combined with usability inspection by having the participants complete four typical usage scenarios before filling out a questionnaire. The questionnaire was based on the IsoMetrics usability inventory. Direct user interaction with SAM (mouse clicks and pages accessed) was logged. Results The study analyzed the typical usage patterns and habits of students using a semantic network for accessing MLOs. Four scenarios capturing characteristics of typical tasks to be solved by using SAM yielded high ratings of usability items and showed good results concerning the consistency of indexing by different users. Long-tail phenomena emerge as they are typical for a collaborative Web 2.0 platform. Suitable but nonetheless rarely used keywords were assigned to MLOs by some users. Conclusions It is possible to develop a Web-based tool with high usability and acceptance for indexing and retrieval of MLOs. SAM can be applied to indexing multicentered repositories of MLOs collaboratively. PMID:27731860
Yoo, Min-Jung; Grozel, Clément; Kiritsis, Dimitris
2016-07-08
This paper describes our conceptual framework of closed-loop lifecycle information sharing for product-service in the Internet of Things (IoT). The framework is based on the ontology model of product-service and a type of IoT message standard, Open Messaging Interface (O-MI) and Open Data Format (O-DF), which ensures data communication. (1) BACKGROUND: Based on an existing product lifecycle management (PLM) methodology, we enhanced the ontology model for the purpose of integrating efficiently the product-service ontology model that was newly developed; (2) METHODS: The IoT message transfer layer is vertically integrated into a semantic knowledge framework inside which a Semantic Info-Node Agent (SINA) uses the message format as a common protocol of product-service lifecycle data transfer; (3) RESULTS: The product-service ontology model facilitates information retrieval and knowledge extraction during the product lifecycle, while making more information available for the sake of service business creation. The vertical integration of IoT message transfer, encompassing all semantic layers, helps achieve a more flexible and modular approach to knowledge sharing in an IoT environment; (4) Contribution: A semantic data annotation applied to IoT can contribute to enhancing collected data types, which entails a richer knowledge extraction. The ontology-based PLM model enables as well the horizontal integration of heterogeneous PLM data while breaking traditional vertical information silos; (5) CONCLUSION: The framework was applied to a fictive case study with an electric car service for the purpose of demonstration. For the purpose of demonstrating the feasibility of the approach, the semantic model is implemented in Sesame APIs, which play the role of an Internet-connected Resource Description Framework (RDF) database.
Yoo, Min-Jung; Grozel, Clément; Kiritsis, Dimitris
2016-01-01
This paper describes our conceptual framework of closed-loop lifecycle information sharing for product-service in the Internet of Things (IoT). The framework is based on the ontology model of product-service and a type of IoT message standard, Open Messaging Interface (O-MI) and Open Data Format (O-DF), which ensures data communication. (1) Background: Based on an existing product lifecycle management (PLM) methodology, we enhanced the ontology model for the purpose of integrating efficiently the product-service ontology model that was newly developed; (2) Methods: The IoT message transfer layer is vertically integrated into a semantic knowledge framework inside which a Semantic Info-Node Agent (SINA) uses the message format as a common protocol of product-service lifecycle data transfer; (3) Results: The product-service ontology model facilitates information retrieval and knowledge extraction during the product lifecycle, while making more information available for the sake of service business creation. The vertical integration of IoT message transfer, encompassing all semantic layers, helps achieve a more flexible and modular approach to knowledge sharing in an IoT environment; (4) Contribution: A semantic data annotation applied to IoT can contribute to enhancing collected data types, which entails a richer knowledge extraction. The ontology-based PLM model enables as well the horizontal integration of heterogeneous PLM data while breaking traditional vertical information silos; (5) Conclusion: The framework was applied to a fictive case study with an electric car service for the purpose of demonstration. For the purpose of demonstrating the feasibility of the approach, the semantic model is implemented in Sesame APIs, which play the role of an Internet-connected Resource Description Framework (RDF) database. PMID:27399717
Semantic Indexing of Medical Learning Objects: Medical Students' Usage of a Semantic Network.
Tix, Nadine; Gießler, Paul; Ohnesorge-Radtke, Ursula; Spreckelsen, Cord
2015-11-11
The Semantically Annotated Media (SAM) project aims to provide a flexible platform for searching, browsing, and indexing medical learning objects (MLOs) based on a semantic network derived from established classification systems. Primarily, SAM supports the Aachen emedia skills lab, but SAM is ready for indexing distributed content and the Simple Knowledge Organizing System standard provides a means for easily upgrading or even exchanging SAM's semantic network. There is a lack of research addressing the usability of MLO indexes or search portals like SAM and the user behavior with such platforms. The purpose of this study was to assess the usability of SAM by investigating characteristic user behavior of medical students accessing MLOs via SAM. In this study, we chose a mixed-methods approach. Lean usability testing was combined with usability inspection by having the participants complete four typical usage scenarios before filling out a questionnaire. The questionnaire was based on the IsoMetrics usability inventory. Direct user interaction with SAM (mouse clicks and pages accessed) was logged. The study analyzed the typical usage patterns and habits of students using a semantic network for accessing MLOs. Four scenarios capturing characteristics of typical tasks to be solved by using SAM yielded high ratings of usability items and showed good results concerning the consistency of indexing by different users. Long-tail phenomena emerge as they are typical for a collaborative Web 2.0 platform. Suitable but nonetheless rarely used keywords were assigned to MLOs by some users. It is possible to develop a Web-based tool with high usability and acceptance for indexing and retrieval of MLOs. SAM can be applied to indexing multicentered repositories of MLOs collaboratively.
NASA Astrophysics Data System (ADS)
Duerr, R.; Thessen, A.; Jenkins, C. J.; Palmer, M.; Myers, S.; Ramdeen, S.
2016-12-01
The ability to quickly find, easily use and effortlessly integrate data from a variety of sources is a grand challenge in Earth sciences, one around which entire research programs have been built. A myriad of approaches to tackling components of this challenge have been demonstrated, often with some success. Yet finding, assessing, accessing, using and integrating data remains a major challenge for many researchers. A technology that has shown promise in nearly every aspect of the challenge is semantics. Semantics has been shown to improve data discovery, facilitate assessment of a data set, and through adoption of the W3C's Linked Data Platform to have improved data integration and use at least for data amenable to that paradigm. Yet the creation of semantic resources has been slow. Why? Amongst a plethora of other reasons, it is because semantic expertise is rare in the Earth and Space sciences; the creation of semantic resources for even a single discipline is labor intensive and requires agreement within the discipline; best practices, methods and tools for supporting the creation and maintenance of the resources generated are in flux; and the human and financial capital needed are rarely available in the Earth sciences. However, other fields, such as biomedicine, have made considerable progress in these areas. The NSF-funded ClearEarth project is adapting the methods and tools from these communities for the Earth sciences in the expectation that doing so will enhance progress and the rate at which the needed semantic resources are created. We discuss progress and results to date, lessons learned from this adaptation process, and describe our upcoming efforts to extend this knowledge to the next generation of Earth and data scientists.
Supervised Semantic Classification for Nuclear Proliferation Monitoring
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vatsavai, Raju; Cheriyadat, Anil M; Gleason, Shaun Scott
2010-01-01
Existing feature extraction and classification approaches are not suitable for monitoring proliferation activity using high-resolution multi-temporal remote sensing imagery. In this paper we present a supervised semantic labeling framework based on the Latent Dirichlet Allocation method. This framework is used to analyze over 120 images collected under different spatial and temporal settings over the globe representing three major semantic categories: airports, nuclear, and coal power plants. Initial experimental results show a reasonable discrimination of these three categories even though coal and nuclear images share highly common and overlapping objects. This research also identified several research challenges associated with nuclear proliferationmore » monitoring using high resolution remote sensing images.« less
A selective deficit in imageable concepts: a window to the organization of the conceptual system
Gvion, Aviah; Friedmann, Naama
2013-01-01
Nissim, a 64 years old Hebrew-speaking man who sustained an ischemic infarct in the left occipital lobe, exhibited an intriguing pattern. He could hold a deep and fluent conversation about abstract and complex issues, such as the social risks in unemployment, but failed to retrieve imageable words such as ball, spoon, carrot, or giraffe. A detailed study of the words he could and could not retrieve, in tasks of picture naming, tactile naming, and naming to definition, indicated that whereas he was able to retrieve abstract words, he had severe difficulties when trying to retrieve imageable words. The same dissociation also applied for proper names—he could retrieve names of people who have no visual image attached to their representation (such as the son of the biblical Abraham), but could not name people who had a visual image (such as his own son, or Barack Obama). When he tried to produce imageable words, he mainly produced perseverations and empty speech, and some semantic paraphasias. He did not produce perseverations when he tried to retrieve abstract words. This suggests that perseverations may occur when the phonological production system produces a word without proper activation in the semantic lexicon. Nissim evinced a similar dissociation in comprehension—he could understand abstract words and sentences but failed to understand sentences with imageable words, and to match spoken imageable words to pictures or to semantically related imageable words. He was able to understand proverbs with imageable literal meaning but abstract figurative meaning. His comprehension was impaired also in tasks of semantic associations of pictures, pointing to a conceptual, rather than lexical source of the deficit. His visual perception as well as his phonological input and output lexicons and buffers (assessed by auditory lexical decision, word and sentence repetition, and writing to dictation) were intact, supporting a selective conceptual system impairment. He was able to retrieve gestures for objects and pictures he saw, indicating that his access to concepts often sufficed for the activation of the motoric information but did not suffice for access to the entry in the semantic lexicon. These results show that imageable concepts can be selectively impaired, and shed light on the organization of conceptual-semantic system. PMID:23785321
A selective deficit in imageable concepts: a window to the organization of the conceptual system.
Gvion, Aviah; Friedmann, Naama
2013-01-01
Nissim, a 64 years old Hebrew-speaking man who sustained an ischemic infarct in the left occipital lobe, exhibited an intriguing pattern. He could hold a deep and fluent conversation about abstract and complex issues, such as the social risks in unemployment, but failed to retrieve imageable words such as ball, spoon, carrot, or giraffe. A detailed study of the words he could and could not retrieve, in tasks of picture naming, tactile naming, and naming to definition, indicated that whereas he was able to retrieve abstract words, he had severe difficulties when trying to retrieve imageable words. The same dissociation also applied for proper names-he could retrieve names of people who have no visual image attached to their representation (such as the son of the biblical Abraham), but could not name people who had a visual image (such as his own son, or Barack Obama). When he tried to produce imageable words, he mainly produced perseverations and empty speech, and some semantic paraphasias. He did not produce perseverations when he tried to retrieve abstract words. This suggests that perseverations may occur when the phonological production system produces a word without proper activation in the semantic lexicon. Nissim evinced a similar dissociation in comprehension-he could understand abstract words and sentences but failed to understand sentences with imageable words, and to match spoken imageable words to pictures or to semantically related imageable words. He was able to understand proverbs with imageable literal meaning but abstract figurative meaning. His comprehension was impaired also in tasks of semantic associations of pictures, pointing to a conceptual, rather than lexical source of the deficit. His visual perception as well as his phonological input and output lexicons and buffers (assessed by auditory lexical decision, word and sentence repetition, and writing to dictation) were intact, supporting a selective conceptual system impairment. He was able to retrieve gestures for objects and pictures he saw, indicating that his access to concepts often sufficed for the activation of the motoric information but did not suffice for access to the entry in the semantic lexicon. These results show that imageable concepts can be selectively impaired, and shed light on the organization of conceptual-semantic system.
Semantic Image Segmentation with Contextual Hierarchical Models.
Seyedhosseini, Mojtaba; Tasdizen, Tolga
2016-05-01
Semantic segmentation is the problem of assigning an object label to each pixel. It unifies the image segmentation and object recognition problems. The importance of using contextual information in semantic segmentation frameworks has been widely realized in the field. We propose a contextual framework, called contextual hierarchical model (CHM), which learns contextual information in a hierarchical framework for semantic segmentation. At each level of the hierarchy, a classifier is trained based on downsampled input images and outputs of previous levels. Our model then incorporates the resulting multi-resolution contextual information into a classifier to segment the input image at original resolution. This training strategy allows for optimization of a joint posterior probability at multiple resolutions through the hierarchy. Contextual hierarchical model is purely based on the input image patches and does not make use of any fragments or shape examples. Hence, it is applicable to a variety of problems such as object segmentation and edge detection. We demonstrate that CHM performs at par with state-of-the-art on Stanford background and Weizmann horse datasets. It also outperforms state-of-the-art edge detection methods on NYU depth dataset and achieves state-of-the-art on Berkeley segmentation dataset (BSDS 500).
Fonseca, Carissa G; Backhaus, Michael; Bluemke, David A; Britten, Randall D; Chung, Jae Do; Cowan, Brett R; Dinov, Ivo D; Finn, J Paul; Hunter, Peter J; Kadish, Alan H; Lee, Daniel C; Lima, Joao A C; Medrano-Gracia, Pau; Shivkumar, Kalyanam; Suinesiaputra, Avan; Tao, Wenchao; Young, Alistair A
2011-08-15
Integrative mathematical and statistical models of cardiac anatomy and physiology can play a vital role in understanding cardiac disease phenotype and planning therapeutic strategies. However, the accuracy and predictive power of such models is dependent upon the breadth and depth of noninvasive imaging datasets. The Cardiac Atlas Project (CAP) has established a large-scale database of cardiac imaging examinations and associated clinical data in order to develop a shareable, web-accessible, structural and functional atlas of the normal and pathological heart for clinical, research and educational purposes. A goal of CAP is to facilitate collaborative statistical analysis of regional heart shape and wall motion and characterize cardiac function among and within population groups. Three main open-source software components were developed: (i) a database with web-interface; (ii) a modeling client for 3D + time visualization and parametric description of shape and motion; and (iii) open data formats for semantic characterization of models and annotations. The database was implemented using a three-tier architecture utilizing MySQL, JBoss and Dcm4chee, in compliance with the DICOM standard to provide compatibility with existing clinical networks and devices. Parts of Dcm4chee were extended to access image specific attributes as search parameters. To date, approximately 3000 de-identified cardiac imaging examinations are available in the database. All software components developed by the CAP are open source and are freely available under the Mozilla Public License Version 1.1 (http://www.mozilla.org/MPL/MPL-1.1.txt). http://www.cardiacatlas.org a.young@auckland.ac.nz Supplementary data are available at Bioinformatics online.
Semantic Document Library: A Virtual Research Environment for Documents, Data and Workflows Sharing
NASA Astrophysics Data System (ADS)
Kotwani, K.; Liu, Y.; Myers, J.; Futrelle, J.
2008-12-01
The Semantic Document Library (SDL) was driven by use cases from the environmental observatory communities and is designed to provide conventional document repository features of uploading, downloading, editing and versioning of documents as well as value adding features of tagging, querying, sharing, annotating, ranking, provenance, social networking and geo-spatial mapping services. It allows users to organize a catalogue of watershed observation data, model output, workflows, as well publications and documents related to the same watershed study through the tagging capability. Users can tag all relevant materials using the same watershed name and find all of them easily later using this tag. The underpinning semantic content repository can store materials from other cyberenvironments such as workflow or simulation tools and SDL provides an effective interface to query and organize materials from various sources. Advanced features of the SDL allow users to visualize the provenance of the materials such as the source and how the output data is derived. Other novel features include visualizing all geo-referenced materials on a geospatial map. SDL as a component of a cyberenvironment portal (the NCSA Cybercollaboratory) has goal of efficient management of information and relationships between published artifacts (Validated models, vetted data, workflows, annotations, best practices, reviews and papers) produced from raw research artifacts (data, notes, plans etc.) through agents (people, sensors etc.). Tremendous scientific potential of artifacts is achieved through mechanisms of sharing, reuse and collaboration - empowering scientists to spread their knowledge and protocols and to benefit from the knowledge of others. SDL successfully implements web 2.0 technologies and design patterns along with semantic content management approach that enables use of multiple ontologies and dynamic evolution (e.g. folksonomies) of terminology. Scientific documents involved with many interconnected entities (artifacts or agents) are represented as RDF triples using semantic content repository middleware Tupelo in one or many data/metadata RDF stores. Queries to the RDF enables discovery of relations among data, process and people, digging out valuable aspects, making recommendations to users, such as what tools are typically used to answer certain kinds of questions or with certain types of dataset. This innovative concept brings out coherent information about entities from four different perspectives of the social context (Who-human relations and interactions), the casual context (Why - provenance and history), the geo-spatial context (Where - location or spatially referenced information) and the conceptual context (What - domain specific relations, ontologies etc.).
High-fidelity data embedding for image annotation.
He, Shan; Kirovski, Darko; Wu, Min
2009-02-01
High fidelity is a demanding requirement for data hiding, especially for images with artistic or medical value. This correspondence proposes a high-fidelity image watermarking for annotation with robustness to moderate distortion. To achieve the high fidelity of the embedded image, we introduce a visual perception model that aims at quantifying the local tolerance to noise for arbitrary imagery. Based on this model, we embed two kinds of watermarks: a pilot watermark that indicates the existence of the watermark and an information watermark that conveys a payload of several dozen bits. The objective is to embed 32 bits of metadata into a single image in such a way that it is robust to JPEG compression and cropping. We demonstrate the effectiveness of the visual model and the application of the proposed annotation technology using a database of challenging photographic and medical images that contain a large amount of smooth regions.
The Influence of Semantic Property and Grammatical Class on Semantic Selection
ERIC Educational Resources Information Center
Yang, Fan-pei Gloria; Khodaparast, Navid; Bradley, Kailyn; Fang, Min-Chieh; Bernstein, Ari; Krawczyk, Daniel C.
2013-01-01
Research to-date has not successfully demonstrated consistent neural distinctions for different types of ambiguity or explored the effect of grammatical class on semantic selection. We conducted a relatedness judgment task using event-related functional magnetic resonance imaging (fMRI) to further explore these topics. Participants judged…
Semantics of User Interface for Image Retrieval: Possibility Theory and Learning Techniques.
ERIC Educational Resources Information Center
Crehange, M.; And Others
1989-01-01
Discusses the need for a rich semantics for the user interface in interactive image retrieval and presents two methods for building such interfaces: possibility theory applied to fuzzy data retrieval, and a machine learning technique applied to learning the user's deep need. Prototypes developed using videodisks and knowledge-based software are…
2010-05-27
programming language, threads can only communicate through fields and this assertion prohibits an alias to the object under construction from being writ- ten...1.9. We call this type of reporting “compiler-like” in the sense that the descriptive message output by the tool has to communicate the semantics of...way to communicate a “need” for further annotation to the tool user because a precise expression of both the location and content of the needed
Developing a disease outbreak event corpus.
Conway, Mike; Kawazoe, Ai; Chanlekha, Hutchatai; Collier, Nigel
2010-09-28
In recent years, there has been a growth in work on the use of information extraction technologies for tracking disease outbreaks from online news texts, yet publicly available evaluation standards (and associated resources) for this new area of research have been noticeably lacking. This study seeks to create a "gold standard" data set against which to test how accurately disease outbreak information extraction systems can identify the semantics of disease outbreak events. Additionally, we hope that the provision of an annotation scheme (and associated corpus) to the community will encourage open evaluation in this new and growing application area. We developed an annotation scheme for identifying infectious disease outbreak events in news texts. An event--in the context of our annotation scheme--consists minimally of geographical (eg, country and province) and disease name information. However, the scheme also allows for the rich encoding of other domain salient concepts (eg, international travel, species, and food contamination). The work resulted in a 200-document corpus of event-annotated disease outbreak reports that can be used to evaluate the accuracy of event detection algorithms (in this case, for the BioCaster biosurveillance online news information extraction system). In the 200 documents, 394 distinct events were identified (mean 1.97 events per document, range 0-25 events per document). We also provide a download script and graphical user interface (GUI)-based event browsing software to facilitate corpus exploration. In summary, we present an annotation scheme and corpus that can be used in the evaluation of disease outbreak event extraction algorithms. The annotation scheme and corpus were designed both with the particular evaluation requirements of the BioCaster system in mind as well as the wider need for further evaluation resources in this growing research area.
Medical image classification based on multi-scale non-negative sparse coding.
Zhang, Ruijie; Shen, Jian; Wei, Fushan; Li, Xiong; Sangaiah, Arun Kumar
2017-11-01
With the rapid development of modern medical imaging technology, medical image classification has become more and more important in medical diagnosis and clinical practice. Conventional medical image classification algorithms usually neglect the semantic gap problem between low-level features and high-level image semantic, which will largely degrade the classification performance. To solve this problem, we propose a multi-scale non-negative sparse coding based medical image classification algorithm. Firstly, Medical images are decomposed into multiple scale layers, thus diverse visual details can be extracted from different scale layers. Secondly, for each scale layer, the non-negative sparse coding model with fisher discriminative analysis is constructed to obtain the discriminative sparse representation of medical images. Then, the obtained multi-scale non-negative sparse coding features are combined to form a multi-scale feature histogram as the final representation for a medical image. Finally, SVM classifier is combined to conduct medical image classification. The experimental results demonstrate that our proposed algorithm can effectively utilize multi-scale and contextual spatial information of medical images, reduce the semantic gap in a large degree and improve medical image classification performance. Copyright © 2017 Elsevier B.V. All rights reserved.
Structural and functional correlates for language efficiency in auditory word processing.
Jung, JeYoung; Kim, Sunmi; Cho, Hyesuk; Nam, Kichun
2017-01-01
This study aims to provide convergent understanding of the neural basis of auditory word processing efficiency using a multimodal imaging. We investigated the structural and functional correlates of word processing efficiency in healthy individuals. We acquired two structural imaging (T1-weighted imaging and diffusion tensor imaging) and functional magnetic resonance imaging (fMRI) during auditory word processing (phonological and semantic tasks). Our results showed that better phonological performance was predicted by the greater thalamus activity. In contrary, better semantic performance was associated with the less activation in the left posterior middle temporal gyrus (pMTG), supporting the neural efficiency hypothesis that better task performance requires less brain activation. Furthermore, our network analysis revealed the semantic network including the left anterior temporal lobe (ATL), dorsolateral prefrontal cortex (DLPFC) and pMTG was correlated with the semantic efficiency. Especially, this network acted as a neural efficient manner during auditory word processing. Structurally, DLPFC and cingulum contributed to the word processing efficiency. Also, the parietal cortex showed a significate association with the word processing efficiency. Our results demonstrated that two features of word processing efficiency, phonology and semantics, can be supported in different brain regions and, importantly, the way serving it in each region was different according to the feature of word processing. Our findings suggest that word processing efficiency can be achieved by in collaboration of multiple brain regions involved in language and general cognitive function structurally and functionally.
Structural and functional correlates for language efficiency in auditory word processing
Kim, Sunmi; Cho, Hyesuk; Nam, Kichun
2017-01-01
This study aims to provide convergent understanding of the neural basis of auditory word processing efficiency using a multimodal imaging. We investigated the structural and functional correlates of word processing efficiency in healthy individuals. We acquired two structural imaging (T1-weighted imaging and diffusion tensor imaging) and functional magnetic resonance imaging (fMRI) during auditory word processing (phonological and semantic tasks). Our results showed that better phonological performance was predicted by the greater thalamus activity. In contrary, better semantic performance was associated with the less activation in the left posterior middle temporal gyrus (pMTG), supporting the neural efficiency hypothesis that better task performance requires less brain activation. Furthermore, our network analysis revealed the semantic network including the left anterior temporal lobe (ATL), dorsolateral prefrontal cortex (DLPFC) and pMTG was correlated with the semantic efficiency. Especially, this network acted as a neural efficient manner during auditory word processing. Structurally, DLPFC and cingulum contributed to the word processing efficiency. Also, the parietal cortex showed a significate association with the word processing efficiency. Our results demonstrated that two features of word processing efficiency, phonology and semantics, can be supported in different brain regions and, importantly, the way serving it in each region was different according to the feature of word processing. Our findings suggest that word processing efficiency can be achieved by in collaboration of multiple brain regions involved in language and general cognitive function structurally and functionally. PMID:28892503
NASA Astrophysics Data System (ADS)
Guo, H., II
2016-12-01
Spatial distribution information of mountainous area settlement place is of great significance to the earthquake emergency work because most of the key earthquake hazardous areas of china are located in the mountainous area. Remote sensing has the advantages of large coverage and low cost, it is an important way to obtain the spatial distribution information of mountainous area settlement place. At present, fully considering the geometric information, spectral information and texture information, most studies have applied object-oriented methods to extract settlement place information, In this article, semantic constraints is to be added on the basis of object-oriented methods. The experimental data is one scene remote sensing image of domestic high resolution satellite (simply as GF-1), with a resolution of 2 meters. The main processing consists of 3 steps, the first is pretreatment, including ortho rectification and image fusion, the second is Object oriented information extraction, including Image segmentation and information extraction, the last step is removing the error elements under semantic constraints, in order to formulate these semantic constraints, the distribution characteristics of mountainous area settlement place must be analyzed and the spatial logic relation between settlement place and other objects must be considered. The extraction accuracy calculation result shows that the extraction accuracy of object oriented method is 49% and rise up to 86% after the use of semantic constraints. As can be seen from the extraction accuracy, the extract method under semantic constraints can effectively improve the accuracy of mountainous area settlement place information extraction. The result shows that it is feasible to extract mountainous area settlement place information form GF-1 image, so the article proves that it has a certain practicality to use domestic high resolution optical remote sensing image in earthquake emergency preparedness.
Assaf, Michal; Rivkin, Paul R; Kuzu, Cheedem H; Calhoun, Vince D; Kraut, Michael A; Groth, Karyn M; Yassa, Michael A; Hart, John; Pearlson, Godfrey D
2006-03-01
The neural basis of formal thought disorder (FTD) is unknown. An influential theory is that FTD results from impaired semantic memory processing. We explored the neural correlates of semantic memory retrieval in schizophrenia using an imaging task assessing semantic object recall. Sixteen healthy control subjects and sixteen schizophrenia patients whose FTD symptoms were measured with the Thought Disorder Index completed a verbal object-recall task during functional magnetic resonance imaging. Participants viewed two words describing object features that either evoked (object recall) or did not evoke a semantic concept. Schizophrenia patients tended to overrecall objects for feature pairs that did not describe the same object. Functionally, rostral anterior cingulate cortex (ACC) activation in patients positively correlated with FTD severity during both correct recalled and overrecalled trials. Compared with control subjects, during object recalling, patients overactivated bilateral ACC, temporooccipital junctions, temporal poles and parahippocampi, right inferior frontal gyrus, and dorsolateral prefrontal cortex, but underactivated inferior parietal lobules. Our results support impaired semantic memory retrieval as underlying FTD pathophysiology. Schizophrenia patients showed abnormal activations of brain areas involved in semantic memory, verbal working memory, and initiation and suppression of conflicting responses, which were associated with semantic overrecall and FTD.
Automatic image enhancement based on multi-scale image decomposition
NASA Astrophysics Data System (ADS)
Feng, Lu; Wu, Zhuangzhi; Pei, Luo; Long, Xiong
2014-01-01
In image processing and computational photography, automatic image enhancement is one of the long-range objectives. Recently the automatic image enhancement methods not only take account of the globe semantics, like correct color hue and brightness imbalances, but also the local content of the image, such as human face and sky of landscape. In this paper we describe a new scheme for automatic image enhancement that considers both global semantics and local content of image. Our automatic image enhancement method employs the multi-scale edge-aware image decomposition approach to detect the underexposure regions and enhance the detail of the salient content. The experiment results demonstrate the effectiveness of our approach compared to existing automatic enhancement methods.
Multiple Semantic Matching on Augmented N-partite Graph for Object Co-segmentation.
Wang, Chuan; Zhang, Hua; Yang, Liang; Cao, Xiaochun; Xiong, Hongkai
2017-09-08
Recent methods for object co-segmentation focus on discovering single co-occurring relation of candidate regions representing the foreground of multiple images. However, region extraction based only on low and middle level information often occupies a large area of background without the help of semantic context. In addition, seeking single matching solution very likely leads to discover local parts of common objects. To cope with these deficiencies, we present a new object cosegmentation framework, which takes advantages of semantic information and globally explores multiple co-occurring matching cliques based on an N-partite graph structure. To this end, we first propose to incorporate candidate generation with semantic context. Based on the regions extracted from semantic segmentation of each image, we design a merging mechanism to hierarchically generate candidates with high semantic responses. Secondly, all candidates are taken into consideration to globally formulate multiple maximum weighted matching cliques, which complements the discovery of part of the common objects induced by a single clique. To facilitate the discovery of multiple matching cliques, an N-partite graph, which inherently excludes intralinks between candidates from the same image, is constructed to separate multiple cliques without additional constraints. Further, we augment the graph with an additional virtual node in each part to handle irrelevant matches when the similarity between two candidates is too small. Finally, with the explored multiple cliques, we statistically compute pixel-wise co-occurrence map for each image. Experimental results on two benchmark datasets, i.e., iCoseg and MSRC datasets, achieve desirable performance and demonstrate the effectiveness of our proposed framework.
Cross-Domain Shoe Retrieval with a Semantic Hierarchy of Attribute Classification Network.
Zhan, Huijing; Shi, Boxin; Kot, Alex C
2017-08-04
Cross-domain shoe image retrieval is a challenging problem, because the query photo from the street domain (daily life scenario) and the reference photo in the online domain (online shop images) have significant visual differences due to the viewpoint and scale variation, self-occlusion, and cluttered background. This paper proposes the Semantic Hierarchy Of attributE Convolutional Neural Network (SHOE-CNN) with a three-level feature representation for discriminative shoe feature expression and efficient retrieval. The SHOE-CNN with its newly designed loss function systematically merges semantic attributes of closer visual appearances to prevent shoe images with the obvious visual differences being confused with each other; the features extracted from image, region, and part levels effectively match the shoe images across different domains. We collect a large-scale shoe dataset composed of 14341 street domain and 12652 corresponding online domain images with fine-grained attributes to train our network and evaluate our system. The top-20 retrieval accuracy improves significantly over the solution with the pre-trained CNN features.
Tenenbaum, Jessica D.; Whetzel, Patricia L.; Anderson, Kent; Borromeo, Charles D.; Dinov, Ivo D.; Gabriel, Davera; Kirschner, Beth; Mirel, Barbara; Morris, Tim; Noy, Natasha; Nyulas, Csongor; Rubenson, David; Saxman, Paul R.; Singh, Harpreet; Whelan, Nancy; Wright, Zach; Athey, Brian D.; Becich, Michael J.; Ginsburg, Geoffrey S.; Musen, Mark A.; Smith, Kevin A.; Tarantal, Alice F.; Rubin, Daniel L; Lyster, Peter
2010-01-01
The biomedical research community relies on a diverse set of resources, both within their own institutions and at other research centers. In addition, an increasing number of shared electronic resources have been developed. Without effective means to locate and query these resources, it is challenging, if not impossible, for investigators to be aware of the myriad resources available, or to effectively perform resource discovery when the need arises. In this paper, we describe the development and use of the Biomedical Resource Ontology (BRO) to enable semantic annotation and discovery of biomedical resources. We also describe the Resource Discovery System (RDS) which is a federated, inter-institutional pilot project that uses the BRO to facilitate resource discovery on the Internet. Through the RDS framework and its associated Biositemaps infrastructure, the BRO facilitates semantic search and discovery of biomedical resources, breaking down barriers and streamlining scientific research that will improve human health. PMID:20955817
Applied and implied semantics in crystallographic publishing
2012-01-01
Background Crystallography is a data-rich, software-intensive scientific discipline with a community that has undertaken direct responsibility for publishing its own scientific journals. That community has worked actively to develop information exchange standards allowing readers of structure reports to access directly, and interact with, the scientific content of the articles. Results Structure reports submitted to some journals of the International Union of Crystallography (IUCr) can be automatically validated and published through an efficient and cost-effective workflow. Readers can view and interact with the structures in three-dimensional visualization applications, and can access the experimental data should they wish to perform their own independent structure solution and refinement. The journals also layer on top of this facility a number of automated annotations and interpretations to add further scientific value. Conclusions The benefits of semantically rich information exchange standards have revolutionised the scholarly publishing process for crystallography, and establish a model relevant to many other physical science disciplines. PMID:22932420
GeoSciGraph: An Ontological Framework for EarthCube Semantic Infrastructure
NASA Astrophysics Data System (ADS)
Gupta, A.; Schachne, A.; Condit, C.; Valentine, D.; Richard, S.; Zaslavsky, I.
2015-12-01
The CINERGI (Community Inventory of EarthCube Resources for Geosciences Interoperability) project compiles an inventory of a wide variety of earth science resources including documents, catalogs, vocabularies, data models, data services, process models, information repositories, domain-specific ontologies etc. developed by research groups and data practitioners. We have developed a multidisciplinary semantic framework called GeoSciGraph semantic ingration of earth science resources. An integrated ontology is constructed with Basic Formal Ontology (BFO) as its upper ontology and currently ingests multiple component ontologies including the SWEET ontology, GeoSciML's lithology ontology, Tematres controlled vocabulary server, GeoNames, GCMD vocabularies on equipment, platforms and institutions, software ontology, CUAHSI hydrology vocabulary, the environmental ontology (ENVO) and several more. These ontologies are connected through bridging axioms; GeoSciGraph identifies lexically close terms and creates equivalence class or subclass relationships between them after human verification. GeoSciGraph allows a community to create community-specific customizations of the integrated ontology. GeoSciGraph uses the Neo4J,a graph database that can hold several billion concepts and relationships. GeoSciGraph provides a number of REST services that can be called by other software modules like the CINERGI information augmentation pipeline. 1) Vocabulary services are used to find exact and approximate terms, term categories (community-provided clusters of terms e.g., measurement-related terms or environmental material related terms), synonyms, term definitions and annotations. 2) Lexical services are used for text parsing to find entities, which can then be included into the ontology by a domain expert. 3) Graph services provide the ability to perform traversal centric operations e.g., finding paths and neighborhoods which can be used to perform ontological operations like computing transitive closure (e.g., finding all subclasses of rocks). 4) Annotation services are used to adorn an arbitrary block of text (e.g., from a NOAA catalog record) with ontology terms. The system has been used to ontologically integrate diverse sources like Science-base, NOAA records, PETDB.
Automatic segmentation of MR brain images of preterm infants using supervised classification.
Moeskops, Pim; Benders, Manon J N L; Chiţ, Sabina M; Kersbergen, Karina J; Groenendaal, Floris; de Vries, Linda S; Viergever, Max A; Išgum, Ivana
2015-09-01
Preterm birth is often associated with impaired brain development. The state and expected progression of preterm brain development can be evaluated using quantitative assessment of MR images. Such measurements require accurate segmentation of different tissue types in those images. This paper presents an algorithm for the automatic segmentation of unmyelinated white matter (WM), cortical grey matter (GM), and cerebrospinal fluid in the extracerebral space (CSF). The algorithm uses supervised voxel classification in three subsequent stages. In the first stage, voxels that can easily be assigned to one of the three tissue types are labelled. In the second stage, dedicated analysis of the remaining voxels is performed. The first and the second stages both use two-class classification for each tissue type separately. Possible inconsistencies that could result from these tissue-specific segmentation stages are resolved in the third stage, which performs multi-class classification. A set of T1- and T2-weighted images was analysed, but the optimised system performs automatic segmentation using a T2-weighted image only. We have investigated the performance of the algorithm when using training data randomly selected from completely annotated images as well as when using training data from only partially annotated images. The method was evaluated on images of preterm infants acquired at 30 and 40weeks postmenstrual age (PMA). When the method was trained using random selection from the completely annotated images, the average Dice coefficients were 0.95 for WM, 0.81 for GM, and 0.89 for CSF on an independent set of images acquired at 30weeks PMA. When the method was trained using only the partially annotated images, the average Dice coefficients were 0.95 for WM, 0.78 for GM and 0.87 for CSF for the images acquired at 30weeks PMA, and 0.92 for WM, 0.80 for GM and 0.85 for CSF for the images acquired at 40weeks PMA. Even though the segmentations obtained using training data from the partially annotated images resulted in slightly lower Dice coefficients, the performance in all experiments was close to that of a second human expert (0.93 for WM, 0.79 for GM and 0.86 for CSF for the images acquired at 30weeks, and 0.94 for WM, 0.76 for GM and 0.87 for CSF for the images acquired at 40weeks). These results show that the presented method is robust to age and acquisition protocol and that it performs accurate segmentation of WM, GM, and CSF when the training data is extracted from complete annotations as well as when the training data is extracted from partial annotations only. This extends the applicability of the method by reducing the time and effort necessary to create training data in a population with different characteristics. Copyright © 2015 Elsevier Inc. All rights reserved.
Cognitive search model and a new query paradigm
NASA Astrophysics Data System (ADS)
Xu, Zhonghui
2001-06-01
This paper proposes a cognitive model in which people begin to search pictures by using semantic content and find a right picture by judging whether its visual content is a proper visualization of the semantics desired. It is essential that human search is not just a process of matching computation on visual feature but rather a process of visualization of the semantic content known. For people to search electronic images in the way as they manually do in the model, we suggest that querying be a semantic-driven process like design. A query-by-design paradigm is prosed in the sense that what you design is what you find. Unlike query-by-example, query-by-design allows users to specify the semantic content through an iterative and incremental interaction process so that a retrieval can start with association and identification of the given semantic content and get refined while further visual cues are available. An experimental image retrieval system, Kuafu, has been under development using the query-by-design paradigm and an iconic language is adopted.
2013-01-01
Background Protein-protein interactions (PPIs) play a key role in understanding the mechanisms of cellular processes. The availability of interactome data has catalyzed the development of computational approaches to elucidate functional behaviors of proteins on a system level. Gene Ontology (GO) and its annotations are a significant resource for functional characterization of proteins. Because of wide coverage, GO data have often been adopted as a benchmark for protein function prediction on the genomic scale. Results We propose a computational approach, called M-Finder, for functional association pattern mining. This method employs semantic analytics to integrate the genome-wide PPIs with GO data. We also introduce an interactive web application tool that visualizes a functional association network linked to a protein specified by a user. The proposed approach comprises two major components. First, the PPIs that have been generated by high-throughput methods are weighted in terms of their functional consistency using GO and its annotations. We assess two advanced semantic similarity metrics which quantify the functional association level of each interacting protein pair. We demonstrate that these measures outperform the other existing methods by evaluating their agreement to other biological features, such as sequence similarity, the presence of common Pfam domains, and core PPIs. Second, the information flow-based algorithm is employed to discover a set of proteins functionally associated with the protein in a query and their links efficiently. This algorithm reconstructs a functional association network of the query protein. The output network size can be flexibly determined by parameters. Conclusions M-Finder provides a useful framework to investigate functional association patterns with any protein. This software will also allow users to perform further systematic analysis of a set of proteins for any specific function. It is available online at http://bionet.ecs.baylor.edu/mfinder PMID:24565382
Feature engineering for MEDLINE citation categorization with MeSH.
Jimeno Yepes, Antonio Jose; Plaza, Laura; Carrillo-de-Albornoz, Jorge; Mork, James G; Aronson, Alan R
2015-04-08
Research in biomedical text categorization has mostly used the bag-of-words representation. Other more sophisticated representations of text based on syntactic, semantic and argumentative properties have been less studied. In this paper, we evaluate the impact of different text representations of biomedical texts as features for reproducing the MeSH annotations of some of the most frequent MeSH headings. In addition to unigrams and bigrams, these features include noun phrases, citation meta-data, citation structure, and semantic annotation of the citations. Traditional features like unigrams and bigrams exhibit strong performance compared to other feature sets. Little or no improvement is obtained when using meta-data or citation structure. Noun phrases are too sparse and thus have lower performance compared to more traditional features. Conceptual annotation of the texts by MetaMap shows similar performance compared to unigrams, but adding concepts from the UMLS taxonomy does not improve the performance of using only mapped concepts. The combination of all the features performs largely better than any individual feature set considered. In addition, this combination improves the performance of a state-of-the-art MeSH indexer. Concerning the machine learning algorithms, we find that those that are more resilient to class imbalance largely obtain better performance. We conclude that even though traditional features such as unigrams and bigrams have strong performance compared to other features, it is possible to combine them to effectively improve the performance of the bag-of-words representation. We have also found that the combination of the learning algorithm and feature sets has an influence in the overall performance of the system. Moreover, using learning algorithms resilient to class imbalance largely improves performance. However, when using a large set of features, consideration needs to be taken with algorithms due to the risk of over-fitting. Specific combinations of learning algorithms and features for individual MeSH headings could further increase the performance of an indexing system.
ERIC Educational Resources Information Center
Harris, Gordon J.; Chabris, Christopher F.; Clark, Jill; Urban, Trinity; Aharon, Itzhak; Steele, Shelley; McGrath, Lauren; Condouris, Karen; Tager-Flusberg, Helen
2006-01-01
Language and communication deficits are core features of autism spectrum disorders (ASD), even in high-functioning adults with ASD. This study investigated brain activation patterns using functional magnetic resonance imaging in right-handed adult males with ASD and a control group, matched on age, handedness, and verbal IQ. Semantic processing in…
Optimizing Search and Ranking in Folksonomy Systems by Exploiting Context Information
NASA Astrophysics Data System (ADS)
Abel, Fabian; Henze, Nicola; Krause, Daniel
Tagging systems enable users to annotate resources with freely chosen keywords. The evolving bunch of tag assignments is called folksonomy and there exist already some approaches that exploit folksonomies to improve resource retrieval. In this paper, we analyze and compare graph-based ranking algorithms: FolkRank and SocialPageRank. We enhance these algorithms by exploiting the context of tags, and evaluate the results on the GroupMe! dataset. In GroupMe!, users can organize and maintain arbitrary Web resources in self-defined groups. When users annotate resources in GroupMe!, this can be interpreted in context of a certain group. The grouping activity itself is easy for users to perform. However, it delivers valuable semantic information about resources and their context. We present GRank that uses the context information to improve and optimize the detection of relevant search results, and compare different strategies for ranking result lists in folksonomy systems.
GOGrapher: A Python library for GO graph representation and analysis.
Muller, Brian; Richards, Adam J; Jin, Bo; Lu, Xinghua
2009-07-07
The Gene Ontology is the most commonly used controlled vocabulary for annotating proteins. The concepts in the ontology are organized as a directed acyclic graph, in which a node corresponds to a biological concept and a directed edge denotes the parent-child semantic relationship between a pair of terms. A large number of protein annotations further create links between proteins and their functional annotations, reflecting the contemporary knowledge about proteins and their functional relationships. This leads to a complex graph consisting of interleaved biological concepts and their associated proteins. What is needed is a simple, open source library that provides tools to not only create and view the Gene Ontology graph, but to analyze and manipulate it as well. Here we describe the development and use of GOGrapher, a Python library that can be used for the creation, analysis, manipulation, and visualization of Gene Ontology related graphs. An object-oriented approach was adopted to organize the hierarchy of the graphs types and associated classes. An Application Programming Interface is provided through which different types of graphs can be pragmatically created, manipulated, and visualized. GOGrapher has been successfully utilized in multiple research projects, e.g., a graph-based multi-label text classifier for protein annotation. The GOGrapher project provides a reusable programming library designed for the manipulation and analysis of Gene Ontology graphs. The library is freely available for the scientific community to use and improve.
Clunie, David; Ulrich, Ethan; Bauer, Christian; Wahle, Andreas; Brown, Bartley; Onken, Michael; Riesmeier, Jörg; Pieper, Steve; Kikinis, Ron; Buatti, John; Beichel, Reinhard R.
2016-01-01
Background. Imaging biomarkers hold tremendous promise for precision medicine clinical applications. Development of such biomarkers relies heavily on image post-processing tools for automated image quantitation. Their deployment in the context of clinical research necessitates interoperability with the clinical systems. Comparison with the established outcomes and evaluation tasks motivate integration of the clinical and imaging data, and the use of standardized approaches to support annotation and sharing of the analysis results and semantics. We developed the methodology and tools to support these tasks in Positron Emission Tomography and Computed Tomography (PET/CT) quantitative imaging (QI) biomarker development applied to head and neck cancer (HNC) treatment response assessment, using the Digital Imaging and Communications in Medicine (DICOM®) international standard and free open-source software. Methods. Quantitative analysis of PET/CT imaging data collected on patients undergoing treatment for HNC was conducted. Processing steps included Standardized Uptake Value (SUV) normalization of the images, segmentation of the tumor using manual and semi-automatic approaches, automatic segmentation of the reference regions, and extraction of the volumetric segmentation-based measurements. Suitable components of the DICOM standard were identified to model the various types of data produced by the analysis. A developer toolkit of conversion routines and an Application Programming Interface (API) were contributed and applied to create a standards-based representation of the data. Results. DICOM Real World Value Mapping, Segmentation and Structured Reporting objects were utilized for standards-compliant representation of the PET/CT QI analysis results and relevant clinical data. A number of correction proposals to the standard were developed. The open-source DICOM toolkit (DCMTK) was improved to simplify the task of DICOM encoding by introducing new API abstractions. Conversion and visualization tools utilizing this toolkit were developed. The encoded objects were validated for consistency and interoperability. The resulting dataset was deposited in the QIN-HEADNECK collection of The Cancer Imaging Archive (TCIA). Supporting tools for data analysis and DICOM conversion were made available as free open-source software. Discussion. We presented a detailed investigation of the development and application of the DICOM model, as well as the supporting open-source tools and toolkits, to accommodate representation of the research data in QI biomarker development. We demonstrated that the DICOM standard can be used to represent the types of data relevant in HNC QI biomarker development, and encode their complex relationships. The resulting annotated objects are amenable to data mining applications, and are interoperable with a variety of systems that support the DICOM standard. PMID:27257542
Image-based diagnostic aid for interstitial lung disease with secondary data integration
NASA Astrophysics Data System (ADS)
Depeursinge, Adrien; Müller, Henning; Hidki, Asmâa; Poletti, Pierre-Alexandre; Platon, Alexandra; Geissbuhler, Antoine
2007-03-01
Interstitial lung diseases (ILDs) are a relatively heterogeneous group of around 150 illnesses with often very unspecific symptoms. The most complete imaging method for the characterisation of ILDs is the high-resolution computed tomography (HRCT) of the chest but a correct interpretation of these images is difficult even for specialists as many diseases are rare and thus little experience exists. Moreover, interpreting HRCT images requires knowledge of the context defined by clinical data of the studied case. A computerised diagnostic aid tool based on HRCT images with associated medical data to retrieve similar cases of ILDs from a dedicated database can bring quick and precious information for example for emergency radiologists. The experience from a pilot project highlighted the need for detailed database containing high-quality annotations in addition to clinical data. The state of the art is studied to identify requirements for image-based diagnostic aid for interstitial lung disease with secondary data integration. The data acquisition steps are detailed. The selection of the most relevant clinical parameters is done in collaboration with lung specialists from current literature, along with knowledge bases of computer-based diagnostic decision support systems. In order to perform high-quality annotations of the interstitial lung tissue in the HRCT images an annotation software and its own file format is implemented for DICOM images. A multimedia database is implemented to store ILD cases with clinical data and annotated image series. Cases from the University & University Hospitals of Geneva (HUG) are retrospectively and prospectively collected to populate the database. Currently, 59 cases with certified diagnosis and their clinical parameters are stored in the database as well as 254 image series of which 26 have their regions of interest annotated. The available data was used to test primary visual features for the classification of lung tissue patterns. These features show good discriminative properties for the separation of five classes of visual observations.
Annotating image ROIs with text descriptions for multimodal biomedical document retrieval
NASA Astrophysics Data System (ADS)
You, Daekeun; Simpson, Matthew; Antani, Sameer; Demner-Fushman, Dina; Thoma, George R.
2013-01-01
Regions of interest (ROIs) that are pointed to by overlaid markers (arrows, asterisks, etc.) in biomedical images are expected to contain more important and relevant information than other regions for biomedical article indexing and retrieval. We have developed several algorithms that localize and extract the ROIs by recognizing markers on images. Cropped ROIs then need to be annotated with contents describing them best. In most cases accurate textual descriptions of the ROIs can be found from figure captions, and these need to be combined with image ROIs for annotation. The annotated ROIs can then be used to, for example, train classifiers that separate ROIs into known categories (medical concepts), or to build visual ontologies, for indexing and retrieval of biomedical articles. We propose an algorithm that pairs visual and textual ROIs that are extracted from images and figure captions, respectively. This algorithm based on dynamic time warping (DTW) clusters recognized pointers into groups, each of which contains pointers with identical visual properties (shape, size, color, etc.). Then a rule-based matching algorithm finds the best matching group for each textual ROI mention. Our method yields a precision and recall of 96% and 79%, respectively, when ground truth textual ROI data is used.
2016-10-18
Scientists from NASA's New Horizons mission have spotted signs of long run-out landslides on Pluto's largest moon, Charon. This image of Charon's informally named "Serenity Chasma" was taken by New Horizons' Long Range Reconnaissance Imager (LORRI) on July 14, 2015, from a distance of 48,912 miles (78,717 kilometers). An annotated image shows arrows in the annotated figure mark indications of landslide activity at http://photojournal.jpl.nasa.gov/catalog/PIA21128
NASA Astrophysics Data System (ADS)
García Castro, Alexander; García-Castro, Leyla Jael; Labarga, Alberto; Giraldo, Olga; Montaña, César; O'Neil, Kieran; Bateman, John A.
Rather than a document that is being constantly re-written as in the wiki approach, the Living Document (LD) is one that acts as a document router, operating by means of structured and organized social tagging and existing ontologies. It offers an environment where users can manage papers and related information, share their knowledge with their peers and discover hidden associations among the shared knowledge. The LD builds upon both the Semantic Web, which values the integration of well-structured data, and the Social Web, which aims to facilitate interaction amongst people by means of user-generated content. In this vein, the LD is similar to a social networking system, with users as central nodes in the network, with the difference that interaction is focused on papers rather than people. Papers, with their ability to represent research interests, expertise, affiliations, and links to web based tools and databanks, represent a central axis for interaction amongst users. To begin to show the potential of this vision, we have implemented a novel web prototype that enables researchers to accomplish three activities central to the Semantic Web vision: organizing, sharing and discovering. Availability: http://www.scientifik.info/
Ontologies as integrative tools for plant science
Walls, Ramona L.; Athreya, Balaji; Cooper, Laurel; Elser, Justin; Gandolfo, Maria A.; Jaiswal, Pankaj; Mungall, Christopher J.; Preece, Justin; Rensing, Stefan; Smith, Barry; Stevenson, Dennis W.
2012-01-01
Premise of the study Bio-ontologies are essential tools for accessing and analyzing the rapidly growing pool of plant genomic and phenomic data. Ontologies provide structured vocabularies to support consistent aggregation of data and a semantic framework for automated analyses and reasoning. They are a key component of the semantic web. Methods This paper provides background on what bio-ontologies are, why they are relevant to botany, and the principles of ontology development. It includes an overview of ontologies and related resources that are relevant to plant science, with a detailed description of the Plant Ontology (PO). We discuss the challenges of building an ontology that covers all green plants (Viridiplantae). Key results Ontologies can advance plant science in four keys areas: (1) comparative genetics, genomics, phenomics, and development; (2) taxonomy and systematics; (3) semantic applications; and (4) education. Conclusions Bio-ontologies offer a flexible framework for comparative plant biology, based on common botanical understanding. As genomic and phenomic data become available for more species, we anticipate that the annotation of data with ontology terms will become less centralized, while at the same time, the need for cross-species queries will become more common, causing more researchers in plant science to turn to ontologies. PMID:22847540
Supervised pixel classification using a feature space derived from an artificial visual system
NASA Technical Reports Server (NTRS)
Baxter, Lisa C.; Coggins, James M.
1991-01-01
Image segmentation involves labelling pixels according to their membership in image regions. This requires the understanding of what a region is. Using supervised pixel classification, the paper investigates how groups of pixels labelled manually according to perceived image semantics map onto the feature space created by an Artificial Visual System. Multiscale structure of regions are investigated and it is shown that pixels form clusters based on their geometric roles in the image intensity function, not by image semantics. A tentative abstract definition of a 'region' is proposed based on this behavior.
A new image representation for compact and secure communication
DOE Office of Scientific and Technical Information (OSTI.GOV)
Prasad, Lakshman; Skourikhine, A. N.
In many areas of nuclear materials management there is a need for communication, archival, and retrieval of annotated image data between heterogeneous platforms and devices to effectively implement safety, security, and safeguards of nuclear materials. Current image formats such as JPEG are not ideally suited in such scenarios as they are not scalable to different viewing formats, and do not provide a high-level representation of images that facilitate automatic object/change detection or annotation. The new Scalable Vector Graphics (SVG) open standard for representing graphical information, recommended by the World Wide Web Consortium (W3C) is designed to address issues of imagemore » scalability, portability, and annotation. However, until now there has been no viable technology to efficiently field images of high visual quality under this standard. Recently, LANL has developed a vectorized image representation that is compatible with the SVG standard and preserves visual quality. This is based on a new geometric framework for characterizing complex features in real-world imagery that incorporates perceptual principles of processing visual information known from cognitive psychology and vision science, to obtain a polygonal image representation of high fidelity. This representation can take advantage of all textual compression and encryption routines unavailable to other image formats. Moreover, this vectorized image representation can be exploited to facilitate automated object recognition that can reduce time required for data review. The objects/features of interest in these vectorized images can be annotated via animated graphics to facilitate quick and easy display and comprehension of processed image content.« less
Gene function prediction based on Gene Ontology Hierarchy Preserving Hashing.
Zhao, Yingwen; Fu, Guangyuan; Wang, Jun; Guo, Maozu; Yu, Guoxian
2018-02-23
Gene Ontology (GO) uses structured vocabularies (or terms) to describe the molecular functions, biological roles, and cellular locations of gene products in a hierarchical ontology. GO annotations associate genes with GO terms and indicate the given gene products carrying out the biological functions described by the relevant terms. However, predicting correct GO annotations for genes from a massive set of GO terms as defined by GO is a difficult challenge. To combat with this challenge, we introduce a Gene Ontology Hierarchy Preserving Hashing (HPHash) based semantic method for gene function prediction. HPHash firstly measures the taxonomic similarity between GO terms. It then uses a hierarchy preserving hashing technique to keep the hierarchical order between GO terms, and to optimize a series of hashing functions to encode massive GO terms via compact binary codes. After that, HPHash utilizes these hashing functions to project the gene-term association matrix into a low-dimensional one and performs semantic similarity based gene function prediction in the low-dimensional space. Experimental results on three model species (Homo sapiens, Mus musculus and Rattus norvegicus) for interspecies gene function prediction show that HPHash performs better than other related approaches and it is robust to the number of hash functions. In addition, we also take HPHash as a plugin for BLAST based gene function prediction. From the experimental results, HPHash again significantly improves the prediction performance. The codes of HPHash are available at: http://mlda.swu.edu.cn/codes.php?name=HPHash. Copyright © 2018 Elsevier Inc. All rights reserved.
BioXSD: the common data-exchange format for everyday bioinformatics web services.
Kalas, Matús; Puntervoll, Pål; Joseph, Alexandre; Bartaseviciūte, Edita; Töpfer, Armin; Venkataraman, Prabakar; Pettifer, Steve; Bryne, Jan Christian; Ison, Jon; Blanchet, Christophe; Rapacki, Kristoffer; Jonassen, Inge
2010-09-15
The world-wide community of life scientists has access to a large number of public bioinformatics databases and tools, which are developed and deployed using diverse technologies and designs. More and more of the resources offer programmatic web-service interface. However, efficient use of the resources is hampered by the lack of widely used, standard data-exchange formats for the basic, everyday bioinformatics data types. BioXSD has been developed as a candidate for standard, canonical exchange format for basic bioinformatics data. BioXSD is represented by a dedicated XML Schema and defines syntax for biological sequences, sequence annotations, alignments and references to resources. We have adapted a set of web services to use BioXSD as the input and output format, and implemented a test-case workflow. This demonstrates that the approach is feasible and provides smooth interoperability. Semantics for BioXSD is provided by annotation with the EDAM ontology. We discuss in a separate section how BioXSD relates to other initiatives and approaches, including existing standards and the Semantic Web. The BioXSD 1.0 XML Schema is freely available at http://www.bioxsd.org/BioXSD-1.0.xsd under the Creative Commons BY-ND 3.0 license. The http://bioxsd.org web page offers documentation, examples of data in BioXSD format, example workflows with source codes in common programming languages, an updated list of compatible web services and tools and a repository of feature requests from the community.
The benefits of sensorimotor knowledge: body-object interaction facilitates semantic processing.
Siakaluk, Paul D; Pexman, Penny M; Sears, Christopher R; Wilson, Kim; Locheed, Keri; Owen, William J
2008-04-05
This article examined the effects of body-object interaction (BOI) on semantic processing. BOI measures perceptions of the ease with which a human body can physically interact with a word's referent. In Experiment 1, BOI effects were examined in 2 semantic categorization tasks (SCT) in which participants decided if words are easily imageable. Responses were faster and more accurate for high BOI words (e.g., mask) than for low BOI words (e.g., ship). In Experiment 2, BOI effects were examined in a semantic lexical decision task (SLDT), which taps both semantic feedback and semantic processing. The BOI effect was larger in the SLDT than in the SCT, suggesting that BOI facilitates both semantic feedback and semantic processing. The findings are consistent with the embodied cognition perspective (e.g., Barsalou's, 1999, Perceptual Symbols Theory), which proposes that sensorimotor interactions with the environment are incorporated in semantic knowledge. 2008 Cognitive Science Society, Inc.
Semantic attributes based texture generation
NASA Astrophysics Data System (ADS)
Chi, Huifang; Gan, Yanhai; Qi, Lin; Dong, Junyu; Madessa, Amanuel Hirpa
2018-04-01
Semantic attributes are commonly used for texture description. They can be used to describe the information of a texture, such as patterns, textons, distributions, brightness, and so on. Generally speaking, semantic attributes are more concrete descriptors than perceptual features. Therefore, it is practical to generate texture images from semantic attributes. In this paper, we propose to generate high-quality texture images from semantic attributes. Over the last two decades, several works have been done on texture synthesis and generation. Most of them focusing on example-based texture synthesis and procedural texture generation. Semantic attributes based texture generation still deserves more devotion. Gan et al. proposed a useful joint model for perception driven texture generation. However, perceptual features are nonobjective spatial statistics used by humans to distinguish different textures in pre-attentive situations. To give more describing information about texture appearance, semantic attributes which are more in line with human description habits are desired. In this paper, we use sigmoid cross entropy loss in an auxiliary model to provide enough information for a generator. Consequently, the discriminator is released from the relatively intractable mission of figuring out the joint distribution of condition vectors and samples. To demonstrate the validity of our method, we compare our method to Gan et al.'s method on generating textures by designing experiments on PTD and DTD. All experimental results show that our model can generate textures from semantic attributes.
A journey to Semantic Web query federation in the life sciences.
Cheung, Kei-Hoi; Frost, H Robert; Marshall, M Scott; Prud'hommeaux, Eric; Samwald, Matthias; Zhao, Jun; Paschke, Adrian
2009-10-01
As interest in adopting the Semantic Web in the biomedical domain continues to grow, Semantic Web technology has been evolving and maturing. A variety of technological approaches including triplestore technologies, SPARQL endpoints, Linked Data, and Vocabulary of Interlinked Datasets have emerged in recent years. In addition to the data warehouse construction, these technological approaches can be used to support dynamic query federation. As a community effort, the BioRDF task force, within the Semantic Web for Health Care and Life Sciences Interest Group, is exploring how these emerging approaches can be utilized to execute distributed queries across different neuroscience data sources. We have created two health care and life science knowledge bases. We have explored a variety of Semantic Web approaches to describe, map, and dynamically query multiple datasets. We have demonstrated several federation approaches that integrate diverse types of information about neurons and receptors that play an important role in basic, clinical, and translational neuroscience research. Particularly, we have created a prototype receptor explorer which uses OWL mappings to provide an integrated list of receptors and executes individual queries against different SPARQL endpoints. We have also employed the AIDA Toolkit, which is directed at groups of knowledge workers who cooperatively search, annotate, interpret, and enrich large collections of heterogeneous documents from diverse locations. We have explored a tool called "FeDeRate", which enables a global SPARQL query to be decomposed into subqueries against the remote databases offering either SPARQL or SQL query interfaces. Finally, we have explored how to use the vocabulary of interlinked Datasets (voiD) to create metadata for describing datasets exposed as Linked Data URIs or SPARQL endpoints. We have demonstrated the use of a set of novel and state-of-the-art Semantic Web technologies in support of a neuroscience query federation scenario. We have identified both the strengths and weaknesses of these technologies. While Semantic Web offers a global data model including the use of Uniform Resource Identifiers (URI's), the proliferation of semantically-equivalent URI's hinders large scale data integration. Our work helps direct research and tool development, which will be of benefit to this community.
A journey to Semantic Web query federation in the life sciences
Cheung, Kei-Hoi; Frost, H Robert; Marshall, M Scott; Prud'hommeaux, Eric; Samwald, Matthias; Zhao, Jun; Paschke, Adrian
2009-01-01
Background As interest in adopting the Semantic Web in the biomedical domain continues to grow, Semantic Web technology has been evolving and maturing. A variety of technological approaches including triplestore technologies, SPARQL endpoints, Linked Data, and Vocabulary of Interlinked Datasets have emerged in recent years. In addition to the data warehouse construction, these technological approaches can be used to support dynamic query federation. As a community effort, the BioRDF task force, within the Semantic Web for Health Care and Life Sciences Interest Group, is exploring how these emerging approaches can be utilized to execute distributed queries across different neuroscience data sources. Methods and results We have created two health care and life science knowledge bases. We have explored a variety of Semantic Web approaches to describe, map, and dynamically query multiple datasets. We have demonstrated several federation approaches that integrate diverse types of information about neurons and receptors that play an important role in basic, clinical, and translational neuroscience research. Particularly, we have created a prototype receptor explorer which uses OWL mappings to provide an integrated list of receptors and executes individual queries against different SPARQL endpoints. We have also employed the AIDA Toolkit, which is directed at groups of knowledge workers who cooperatively search, annotate, interpret, and enrich large collections of heterogeneous documents from diverse locations. We have explored a tool called "FeDeRate", which enables a global SPARQL query to be decomposed into subqueries against the remote databases offering either SPARQL or SQL query interfaces. Finally, we have explored how to use the vocabulary of interlinked Datasets (voiD) to create metadata for describing datasets exposed as Linked Data URIs or SPARQL endpoints. Conclusion We have demonstrated the use of a set of novel and state-of-the-art Semantic Web technologies in support of a neuroscience query federation scenario. We have identified both the strengths and weaknesses of these technologies. While Semantic Web offers a global data model including the use of Uniform Resource Identifiers (URI's), the proliferation of semantically-equivalent URI's hinders large scale data integration. Our work helps direct research and tool development, which will be of benefit to this community. PMID:19796394
Automatic detection of regions of interest in mammographic images
NASA Astrophysics Data System (ADS)
Cheng, Erkang; Ling, Haibin; Bakic, Predrag R.; Maidment, Andrew D. A.; Megalooikonomou, Vasileios
2011-03-01
This work is a part of our ongoing study aimed at comparing the topology of anatomical branching structures with the underlying image texture. Detection of regions of interest (ROIs) in clinical breast images serves as the first step in development of an automated system for image analysis and breast cancer diagnosis. In this paper, we have investigated machine learning approaches for the task of identifying ROIs with visible breast ductal trees in a given galactographic image. Specifically, we have developed boosting based framework using the AdaBoost algorithm in combination with Haar wavelet features for the ROI detection. Twenty-eight clinical galactograms with expert annotated ROIs were used for training. Positive samples were generated by resampling near the annotated ROIs, and negative samples were generated randomly by image decomposition. Each detected ROI candidate was given a confidences core. Candidate ROIs with spatial overlap were merged and their confidence scores combined. We have compared three strategies for elimination of false positives. The strategies differed in their approach to combining confidence scores by summation, averaging, or selecting the maximum score.. The strategies were compared based upon the spatial overlap with annotated ROIs. Using a 4-fold cross-validation with the annotated clinical galactographic images, the summation strategy showed the best performance with 75% detection rate. When combining the top two candidates, the selection of maximum score showed the best performance with 96% detection rate.
First Steps to Automated Interior Reconstruction from Semantically Enriched Point Clouds and Imagery
NASA Astrophysics Data System (ADS)
Obrock, L. S.; Gülch, E.
2018-05-01
The automated generation of a BIM-Model from sensor data is a huge challenge for the modeling of existing buildings. Currently the measurements and analyses are time consuming, allow little automation and require expensive equipment. We do lack an automated acquisition of semantical information of objects in a building. We are presenting first results of our approach based on imagery and derived products aiming at a more automated modeling of interior for a BIM building model. We examine the building parts and objects visible in the collected images using Deep Learning Methods based on Convolutional Neural Networks. For localization and classification of building parts we apply the FCN8s-Model for pixel-wise Semantic Segmentation. We, so far, reach a Pixel Accuracy of 77.2 % and a mean Intersection over Union of 44.2 %. We finally use the network for further reasoning on the images of the interior room. We combine the segmented images with the original images and use photogrammetric methods to produce a three-dimensional point cloud. We code the extracted object types as colours of the 3D-points. We thus are able to uniquely classify the points in three-dimensional space. We preliminary investigate a simple extraction method for colour and material of building parts. It is shown, that the combined images are very well suited to further extract more semantic information for the BIM-Model. With the presented methods we see a sound basis for further automation of acquisition and modeling of semantic and geometric information of interior rooms for a BIM-Model.
Coiera, Enrico
2014-01-01
Background and objective Annotations to physical workspaces such as signs and notes are ubiquitous. When densely annotated, work areas become communication spaces. This study aims to characterize the types and purpose of such annotations. Methods A qualitative observational study was undertaken in two wards and the radiology department of a 440-bed metropolitan teaching hospital. Images were purposefully sampled; 39 were analyzed after excluding inferior images. Results Annotation functions included signaling identity, location, capability, status, availability, and operation. They encoded data, rules or procedural descriptions. Most aggregated into groups that either created a workflow by referencing each other, supported a common workflow without reference to each other, or were heterogeneous, referring to many workflows. Higher-level assemblies of such groupings were also observed. Discussion Annotations make visible the gap between work done and the capability of a space to support work. Annotations are repairs of an environment, improving fitness for purpose, fixing inadequacy in design, or meeting emergent needs. Annotations thus record the missing information needed to undertake tasks, typically added post-implemented. Measuring annotation levels post-implementation could help assess the fit of technology to task. Physical and digital spaces could meet broader user needs by formally supporting user customization, ‘programming through annotation’. Augmented reality systems could also directly support annotation, addressing existing information gaps, and enhancing work with context sensitive annotation. Conclusions Communication spaces offer a model of how work unfolds. Annotations make visible local adaptation that makes technology fit for purpose post-implementation and suggest an important role for annotatable information systems and digital augmentation of the physical environment. PMID:24005797
ERIC Educational Resources Information Center
Kuchinke, Lars; van der Meer, Elke; Krueger, Frank
2009-01-01
Conceptual knowledge of our world is represented in semantic memory in terms of concepts and semantic relations between concepts. We used functional magnetic resonance imaging (fMRI) to examine the cortical regions underlying the processing of sequential and taxonomic relations. Participants were presented verbal cues and performed three tasks:…
The Benefits of Sensorimotor Knowledge: Body-Object Interaction Facilitates Semantic Processing
ERIC Educational Resources Information Center
Siakaluk, Paul D.; Pexman, Penny M.; Sears, Christopher R.; Wilson, Kim; Locheed, Keri; Owen, William J.
2008-01-01
This article examined the effects of body-object interaction (BOI) on semantic processing. BOI measures perceptions of the ease with which a human body can physically interact with a word's referent. In Experiment 1, BOI effects were examined in 2 semantic categorization tasks (SCT) in which participants decided if words are easily imageable.…
SCEGRAM: An image database for semantic and syntactic inconsistencies in scenes.
Öhlschläger, Sabine; Võ, Melissa Le-Hoa
2017-10-01
Our visual environment is not random, but follows compositional rules according to what objects are usually found where. Despite the growing interest in how such semantic and syntactic rules - a scene grammar - enable effective attentional guidance and object perception, no common image database containing highly-controlled object-scene modifications has been publically available. Such a database is essential in minimizing the risk that low-level features drive high-level effects of interest, which is being discussed as possible source of controversial study results. To generate the first database of this kind - SCEGRAM - we took photographs of 62 real-world indoor scenes in six consistency conditions that contain semantic and syntactic (both mild and extreme) violations as well as their combinations. Importantly, always two scenes were paired, so that an object was semantically consistent in one scene (e.g., ketchup in kitchen) and inconsistent in the other (e.g., ketchup in bathroom). Low-level salience did not differ between object-scene conditions and was generally moderate. Additionally, SCEGRAM contains consistency ratings for every object-scene condition, as well as object-absent scenes and object-only images. Finally, a cross-validation using eye-movements replicated previous results of longer dwell times for both semantic and syntactic inconsistencies compared to consistent controls. In sum, the SCEGRAM image database is the first to contain well-controlled semantic and syntactic object-scene inconsistencies that can be used in a broad range of cognitive paradigms (e.g., verbal and pictorial priming, change detection, object identification, etc.) including paradigms addressing developmental aspects of scene grammar. SCEGRAM can be retrieved for research purposes from http://www.scenegrammarlab.com/research/scegram-database/ .
Olson, Ingrid R.
2012-01-01
Famous people and artifacts are referred to as “unique entities” (UEs) due to the unique nature of the knowledge we have about them. Past imaging and lesion experiments have indicated that the anterior temporal lobes (ATLs) as having a special role in the processing of UEs. It has remained unclear which attributes of UEs were responsible for the observed effects in imaging experiments. In this study, we investigated what factors of UEs influence brain activity. In a training paradigm, we systematically varied the uniqueness of semantic associations, the presence/absence of a proper name, and the number of semantic associations to determine factors modulating activity in regions subserving the processing of UEs. We found that a conjunction of unique semantic information and proper names modulated activity within a section of the left ATL. Overall, the processing of UEs involved a wider left-hemispheric cortical network. Within these regions, brain activity was significantly affected by the unique semantic attributes especially in the presence of a proper name, but we could not find evidence for an effect of the number of semantic associations. Findings are discussed in regard to current models of ATL function, the neurophysiology of semantics, and social cognitive processing. PMID:22021913
Semantic biomedical resource discovery: a Natural Language Processing framework.
Sfakianaki, Pepi; Koumakis, Lefteris; Sfakianakis, Stelios; Iatraki, Galatia; Zacharioudakis, Giorgos; Graf, Norbert; Marias, Kostas; Tsiknakis, Manolis
2015-09-30
A plethora of publicly available biomedical resources do currently exist and are constantly increasing at a fast rate. In parallel, specialized repositories are been developed, indexing numerous clinical and biomedical tools. The main drawback of such repositories is the difficulty in locating appropriate resources for a clinical or biomedical decision task, especially for non-Information Technology expert users. In parallel, although NLP research in the clinical domain has been active since the 1960s, progress in the development of NLP applications has been slow and lags behind progress in the general NLP domain. The aim of the present study is to investigate the use of semantics for biomedical resources annotation with domain specific ontologies and exploit Natural Language Processing methods in empowering the non-Information Technology expert users to efficiently search for biomedical resources using natural language. A Natural Language Processing engine which can "translate" free text into targeted queries, automatically transforming a clinical research question into a request description that contains only terms of ontologies, has been implemented. The implementation is based on information extraction techniques for text in natural language, guided by integrated ontologies. Furthermore, knowledge from robust text mining methods has been incorporated to map descriptions into suitable domain ontologies in order to ensure that the biomedical resources descriptions are domain oriented and enhance the accuracy of services discovery. The framework is freely available as a web application at ( http://calchas.ics.forth.gr/ ). For our experiments, a range of clinical questions were established based on descriptions of clinical trials from the ClinicalTrials.gov registry as well as recommendations from clinicians. Domain experts manually identified the available tools in a tools repository which are suitable for addressing the clinical questions at hand, either individually or as a set of tools forming a computational pipeline. The results were compared with those obtained from an automated discovery of candidate biomedical tools. For the evaluation of the results, precision and recall measurements were used. Our results indicate that the proposed framework has a high precision and low recall, implying that the system returns essentially more relevant results than irrelevant. There are adequate biomedical ontologies already available, sufficiency of existing NLP tools and quality of biomedical annotation systems for the implementation of a biomedical resources discovery framework, based on the semantic annotation of resources and the use on NLP techniques. The results of the present study demonstrate the clinical utility of the application of the proposed framework which aims to bridge the gap between clinical question in natural language and efficient dynamic biomedical resources discovery.
Tagare, Hemant D.; Jaffe, C. Carl; Duncan, James
1997-01-01
Abstract Information contained in medical images differs considerably from that residing in alphanumeric format. The difference can be attributed to four characteristics: (1) the semantics of medical knowledge extractable from images is imprecise; (2) image information contains form and spatial data, which are not expressible in conventional language; (3) a large part of image information is geometric; (4) diagnostic inferences derived from images rest on an incomplete, continuously evolving model of normality. This paper explores the differentiating characteristics of text versus images and their impact on design of a medical image database intended to allow content-based indexing and retrieval. One strategy for implementing medical image databases is presented, which employs object-oriented iconic queries, semantics by association with prototypes, and a generic schema. PMID:9147338
The semantic anatomical network: Evidence from healthy and brain-damaged patient populations.
Fang, Yuxing; Han, Zaizhu; Zhong, Suyu; Gong, Gaolang; Song, Luping; Liu, Fangsong; Huang, Ruiwang; Du, Xiaoxia; Sun, Rong; Wang, Qiang; He, Yong; Bi, Yanchao
2015-09-01
Semantic processing is central to cognition and is supported by widely distributed gray matter (GM) regions and white matter (WM) tracts. The exact manner in which GM regions are anatomically connected to process semantics remains unknown. We mapped the semantic anatomical network (connectome) by conducting diffusion imaging tractography in 48 healthy participants across 90 GM "nodes," and correlating the integrity of each obtained WM edge and semantic performance across 80 brain-damaged patients. Fifty-three WM edges were obtained whose lower integrity associated with semantic deficits and together with their linked GM nodes constitute a semantic WM network. Graph analyses of this network revealed three structurally segregated modules that point to distinct semantic processing components and identified network hubs and connectors that are central in the communication across the subnetworks. Together, our results provide an anatomical framework of human semantic network, advancing the understanding of the structural substrates supporting semantic processing. © 2015 Wiley Periodicals, Inc.
Assessing Strength of Evidence of Appropriate Use Criteria for Diagnostic Imaging Examinations.
Lacson, Ronilda; Raja, Ali S; Osterbur, David; Ip, Ivan; Schneider, Louise; Bain, Paul; Mita, Carol; Whelan, Julia; Silveira, Patricia; Dement, David; Khorasani, Ramin
2016-05-01
For health information technology tools to fully inform evidence-based decisions, recommendations must be reliably assessed for quality and strength of evidence. We aimed to create an annotation framework for grading recommendations regarding appropriate use of diagnostic imaging examinations. The annotation framework was created by an expert panel (clinicians in three medical specialties, medical librarians, and biomedical scientists) who developed a process for achieving consensus in assessing recommendations, and evaluated by measuring agreement in grading the strength of evidence for 120 empirically selected recommendations using the Oxford Levels of Evidence. Eighty-two percent of recommendations were assigned to Level 5 (expert opinion). Inter-annotator agreement was 0.70 on initial grading (κ = 0.35, 95% CI, 0.23-0.48). After systematic discussion utilizing the annotation framework, agreement increased significantly to 0.97 (κ = 0.88, 95% CI, 0.77-0.99). A novel annotation framework was effective for grading the strength of evidence supporting appropriate use criteria for diagnostic imaging exams. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.