interactive semi-automatic annotation: Topics by Science.gov

Sample records for interactive semi-automatic annotation

A semi-automatic annotation tool for cooking video

NASA Astrophysics Data System (ADS)

Bianco, Simone; Ciocca, Gianluigi; Napoletano, Paolo; Schettini, Raimondo; Margherita, Roberto; Marini, Gianluca; Gianforme, Giorgio; Pantaleo, Giuseppe

2013-03-01

In order to create a cooking assistant application to guide the users in the preparation of the dishes relevant to their profile diets and food preferences, it is necessary to accurately annotate the video recipes, identifying and tracking the foods of the cook. These videos present particular annotation challenges such as frequent occlusions, food appearance changes, etc. Manually annotate the videos is a time-consuming, tedious and error-prone task. Fully automatic tools that integrate computer vision algorithms to extract and identify the elements of interest are not error free, and false positive and false negative detections need to be corrected in a post-processing stage. We present an interactive, semi-automatic tool for the annotation of cooking videos that integrates computer vision techniques under the supervision of the user. The annotation accuracy is increased with respect to completely automatic tools and the human effort is reduced with respect to completely manual ones. The performance and usability of the proposed tool are evaluated on the basis of the time and effort required to annotate the same video sequences.
Argo: enabling the development of bespoke workflows and services for disease annotation.

PubMed

Batista-Navarro, Riza; Carter, Jacob; Ananiadou, Sophia

2016-01-01

Argo (http://argo.nactem.ac.uk) is a generic text mining workbench that can cater to a variety of use cases, including the semi-automatic annotation of literature. It enables its technical users to build their own customised text mining solutions by providing a wide array of interoperable and configurable elementary components that can be seamlessly integrated into processing workflows. With Argo's graphical annotation interface, domain experts can then make use of the workflows' automatically generated output to curate information of interest.With the continuously rising need to understand the aetiology of diseases as well as the demand for their informed diagnosis and personalised treatment, the curation of disease-relevant information from medical and clinical documents has become an indispensable scientific activity. In the Fifth BioCreative Challenge Evaluation Workshop (BioCreative V), there was substantial interest in the mining of literature for disease-relevant information. Apart from a panel discussion focussed on disease annotations, the chemical-disease relations (CDR) track was also organised to foster the sharing and advancement of disease annotation tools and resources.This article presents the application of Argo's capabilities to the literature-based annotation of diseases. As part of our participation in BioCreative V's User Interactive Track (IAT), we demonstrated and evaluated Argo's suitability to the semi-automatic curation of chronic obstructive pulmonary disease (COPD) phenotypes. Furthermore, the workbench facilitated the development of some of the CDR track's top-performing web services for normalising disease mentions against the Medical Subject Headings (MeSH) database. In this work, we highlight Argo's support for developing various types of bespoke workflows ranging from ones which enabled us to easily incorporate information from various databases, to those which train and apply machine learning-based concept recognition models, through to user-interactive ones which allow human curators to manually provide their corrections to automatically generated annotations. Our participation in the BioCreative V challenges shows Argo's potential as an enabling technology for curating disease and phenotypic information from literature.Database URL: http://argo.nactem.ac.uk. © The Author(s) 2016. Published by Oxford University Press.
Argo: enabling the development of bespoke workflows and services for disease annotation

PubMed Central

Batista-Navarro, Riza; Carter, Jacob; Ananiadou, Sophia

2016-01-01

Argo (http://argo.nactem.ac.uk) is a generic text mining workbench that can cater to a variety of use cases, including the semi-automatic annotation of literature. It enables its technical users to build their own customised text mining solutions by providing a wide array of interoperable and configurable elementary components that can be seamlessly integrated into processing workflows. With Argo's graphical annotation interface, domain experts can then make use of the workflows' automatically generated output to curate information of interest. With the continuously rising need to understand the aetiology of diseases as well as the demand for their informed diagnosis and personalised treatment, the curation of disease-relevant information from medical and clinical documents has become an indispensable scientific activity. In the Fifth BioCreative Challenge Evaluation Workshop (BioCreative V), there was substantial interest in the mining of literature for disease-relevant information. Apart from a panel discussion focussed on disease annotations, the chemical-disease relations (CDR) track was also organised to foster the sharing and advancement of disease annotation tools and resources. This article presents the application of Argo’s capabilities to the literature-based annotation of diseases. As part of our participation in BioCreative V’s User Interactive Track (IAT), we demonstrated and evaluated Argo’s suitability to the semi-automatic curation of chronic obstructive pulmonary disease (COPD) phenotypes. Furthermore, the workbench facilitated the development of some of the CDR track’s top-performing web services for normalising disease mentions against the Medical Subject Headings (MeSH) database. In this work, we highlight Argo’s support for developing various types of bespoke workflows ranging from ones which enabled us to easily incorporate information from various databases, to those which train and apply machine learning-based concept recognition models, through to user-interactive ones which allow human curators to manually provide their corrections to automatically generated annotations. Our participation in the BioCreative V challenges shows Argo’s potential as an enabling technology for curating disease and phenotypic information from literature. Database URL: http://argo.nactem.ac.uk PMID:27189607
A semi-supervised learning framework for biomedical event extraction based on hidden topics.

PubMed

Zhou, Deyu; Zhong, Dayou

2015-05-01

Scientists have devoted decades of efforts to understanding the interaction between proteins or RNA production. The information might empower the current knowledge on drug reactions or the development of certain diseases. Nevertheless, due to the lack of explicit structure, literature in life science, one of the most important sources of this information, prevents computer-based systems from accessing. Therefore, biomedical event extraction, automatically acquiring knowledge of molecular events in research articles, has attracted community-wide efforts recently. Most approaches are based on statistical models, requiring large-scale annotated corpora to precisely estimate models' parameters. However, it is usually difficult to obtain in practice. Therefore, employing un-annotated data based on semi-supervised learning for biomedical event extraction is a feasible solution and attracts more interests. In this paper, a semi-supervised learning framework based on hidden topics for biomedical event extraction is presented. In this framework, sentences in the un-annotated corpus are elaborately and automatically assigned with event annotations based on their distances to these sentences in the annotated corpus. More specifically, not only the structures of the sentences, but also the hidden topics embedded in the sentences are used for describing the distance. The sentences and newly assigned event annotations, together with the annotated corpus, are employed for training. Experiments were conducted on the multi-level event extraction corpus, a golden standard corpus. Experimental results show that more than 2.2% improvement on F-score on biomedical event extraction is achieved by the proposed framework when compared to the state-of-the-art approach. The results suggest that by incorporating un-annotated data, the proposed framework indeed improves the performance of the state-of-the-art event extraction system and the similarity between sentences might be precisely described by hidden topics and structures of the sentences. Copyright © 2015 Elsevier B.V. All rights reserved.
Semi-automatic semantic annotation of PubMed Queries: a study on quality, efficiency, satisfaction

PubMed Central

Névéol, Aurélie; Islamaj-Doğan, Rezarta; Lu, Zhiyong

2010-01-01

Information processing algorithms require significant amounts of annotated data for training and testing. The availability of such data is often hindered by the complexity and high cost of production. In this paper, we investigate the benefits of a state-of-the-art tool to help with the semantic annotation of a large set of biomedical information queries. Seven annotators were recruited to annotate a set of 10,000 PubMed® queries with 16 biomedical and bibliographic categories. About half of the queries were annotated from scratch, while the other half were automatically pre-annotated and manually corrected. The impact of the automatic pre-annotations was assessed on several aspects of the task: time, number of actions, annotator satisfaction, inter-annotator agreement, quality and number of the resulting annotations. The analysis of annotation results showed that the number of required hand annotations is 28.9% less when using pre-annotated results from automatic tools. As a result, the overall annotation time was substantially lower when pre-annotations were used, while inter-annotator agreement was significantly higher. In addition, there was no statistically significant difference in the semantic distribution or number of annotations produced when pre-annotations were used. The annotated query corpus is freely available to the research community. This study shows that automatic pre-annotations are found helpful by most annotators. Our experience suggests using an automatic tool to assist large-scale manual annotation projects. This helps speed-up the annotation time and improve annotation consistency while maintaining high quality of the final annotations. PMID:21094696
Utilization of Automatic Tagging Using Web Information to Datamining

NASA Astrophysics Data System (ADS)

Sugimura, Hiroshi; Matsumoto, Kazunori

This paper proposes a data annotation system using the automatic tagging approach. Although annotations of data are useful for deep analysis and mining of it, the cost of providing them becomes huge in most of the cases. In order to solve this problem, we develop a semi-automatic method that consists of two stages. In the first stage, it searches the Web space for relating information, and discovers candidates of effective annotations. The second stage uses knowledge of a human user. The candidates are investigated and refined by the user, and then they become annotations. We in this paper focus on time-series data, and show effectiveness of a GUI tool that supports the above process.
Supporting the annotation of chronic obstructive pulmonary disease (COPD) phenotypes with text mining workflows.

PubMed

Fu, Xiao; Batista-Navarro, Riza; Rak, Rafal; Ananiadou, Sophia

2015-01-01

Chronic obstructive pulmonary disease (COPD) is a life-threatening lung disorder whose recent prevalence has led to an increasing burden on public healthcare. Phenotypic information in electronic clinical records is essential in providing suitable personalised treatment to patients with COPD. However, as phenotypes are often "hidden" within free text in clinical records, clinicians could benefit from text mining systems that facilitate their prompt recognition. This paper reports on a semi-automatic methodology for producing a corpus that can ultimately support the development of text mining tools that, in turn, will expedite the process of identifying groups of COPD patients. A corpus of 30 full-text papers was formed based on selection criteria informed by the expertise of COPD specialists. We developed an annotation scheme that is aimed at producing fine-grained, expressive and computable COPD annotations without burdening our curators with a highly complicated task. This was implemented in the Argo platform by means of a semi-automatic annotation workflow that integrates several text mining tools, including a graphical user interface for marking up documents. When evaluated using gold standard (i.e., manually validated) annotations, the semi-automatic workflow was shown to obtain a micro-averaged F-score of 45.70% (with relaxed matching). Utilising the gold standard data to train new concept recognisers, we demonstrated that our corpus, although still a work in progress, can foster the development of significantly better performing COPD phenotype extractors. We describe in this work the means by which we aim to eventually support the process of COPD phenotype curation, i.e., by the application of various text mining tools integrated into an annotation workflow. Although the corpus being described is still under development, our results thus far are encouraging and show great potential in stimulating the development of further automatic COPD phenotype extractors.
Semantator: semantic annotator for converting biomedical text to linked data.

PubMed

Tao, Cui; Song, Dezhao; Sharma, Deepak; Chute, Christopher G

2013-10-01

More than 80% of biomedical data is embedded in plain text. The unstructured nature of these text-based documents makes it challenging to easily browse and query the data of interest in them. One approach to facilitate browsing and querying biomedical text is to convert the plain text to a linked web of data, i.e., converting data originally in free text to structured formats with defined meta-level semantics. In this paper, we introduce Semantator (Semantic Annotator), a semantic-web-based environment for annotating data of interest in biomedical documents, browsing and querying the annotated data, and interactively refining annotation results if needed. Through Semantator, information of interest can be either annotated manually or semi-automatically using plug-in information extraction tools. The annotated results will be stored in RDF and can be queried using the SPARQL query language. In addition, semantic reasoners can be directly applied to the annotated data for consistency checking and knowledge inference. Semantator has been released online and was used by the biomedical ontology community who provided positive feedbacks. Our evaluation results indicated that (1) Semantator can perform the annotation functionalities as designed; (2) Semantator can be adopted in real applications in clinical and transactional research; and (3) the annotated results using Semantator can be easily used in Semantic-web-based reasoning tools for further inference. Copyright © 2013 Elsevier Inc. All rights reserved.
Semi Automatic Ontology Instantiation in the domain of Risk Management

NASA Astrophysics Data System (ADS)

Makki, Jawad; Alquier, Anne-Marie; Prince, Violaine

One of the challenging tasks in the context of Ontological Engineering is to automatically or semi-automatically support the process of Ontology Learning and Ontology Population from semi-structured documents (texts). In this paper we describe a Semi-Automatic Ontology Instantiation method from natural language text, in the domain of Risk Management. This method is composed from three steps 1 ) Annotation with part-of-speech tags, 2) Semantic Relation Instances Extraction, 3) Ontology instantiation process. It's based on combined NLP techniques using human intervention between steps 2 and 3 for control and validation. Since it heavily relies on linguistic knowledge it is not domain dependent which is a good feature for portability between the different fields of risk management application. The proposed methodology uses the ontology of the PRIMA1 project (supported by the European community) as a Generic Domain Ontology and populates it via an available corpus. A first validation of the approach is done through an experiment with Chemical Fact Sheets from Environmental Protection Agency2.
Comparison of a semi-automatic annotation tool and a natural language processing application for the generation of clinical statement entries.

PubMed

Lin, Ching-Heng; Wu, Nai-Yuan; Lai, Wei-Shao; Liou, Der-Ming

2015-01-01

Electronic medical records with encoded entries should enhance the semantic interoperability of document exchange. However, it remains a challenge to encode the narrative concept and to transform the coded concepts into a standard entry-level document. This study aimed to use a novel approach for the generation of entry-level interoperable clinical documents. Using HL7 clinical document architecture (CDA) as the example, we developed three pipelines to generate entry-level CDA documents. The first approach was a semi-automatic annotation pipeline (SAAP), the second was a natural language processing (NLP) pipeline, and the third merged the above two pipelines. We randomly selected 50 test documents from the i2b2 corpora to evaluate the performance of the three pipelines. The 50 randomly selected test documents contained 9365 words, including 588 Observation terms and 123 Procedure terms. For the Observation terms, the merged pipeline had a significantly higher F-measure than the NLP pipeline (0.89 vs 0.80, p<0.0001), but a similar F-measure to that of the SAAP (0.89 vs 0.87). For the Procedure terms, the F-measure was not significantly different among the three pipelines. The combination of a semi-automatic annotation approach and the NLP application seems to be a solution for generating entry-level interoperable clinical documents. © The Author 2014. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.comFor numbered affiliation see end of article.
Current and future trends in marine image annotation software

NASA Astrophysics Data System (ADS)

Gomes-Pereira, Jose Nuno; Auger, Vincent; Beisiegel, Kolja; Benjamin, Robert; Bergmann, Melanie; Bowden, David; Buhl-Mortensen, Pal; De Leo, Fabio C.; Dionísio, Gisela; Durden, Jennifer M.; Edwards, Luke; Friedman, Ariell; Greinert, Jens; Jacobsen-Stout, Nancy; Lerner, Steve; Leslie, Murray; Nattkemper, Tim W.; Sameoto, Jessica A.; Schoening, Timm; Schouten, Ronald; Seager, James; Singh, Hanumant; Soubigou, Olivier; Tojeira, Inês; van den Beld, Inge; Dias, Frederico; Tempera, Fernando; Santos, Ricardo S.

2016-12-01

Given the need to describe, analyze and index large quantities of marine imagery data for exploration and monitoring activities, a range of specialized image annotation tools have been developed worldwide. Image annotation - the process of transposing objects or events represented in a video or still image to the semantic level, may involve human interactions and computer-assisted solutions. Marine image annotation software (MIAS) have enabled over 500 publications to date. We review the functioning, application trends and developments, by comparing general and advanced features of 23 different tools utilized in underwater image analysis. MIAS requiring human input are basically a graphical user interface, with a video player or image browser that recognizes a specific time code or image code, allowing to log events in a time-stamped (and/or geo-referenced) manner. MIAS differ from similar software by the capability of integrating data associated to video collection, the most simple being the position coordinates of the video recording platform. MIAS have three main characteristics: annotating events in real time, posteriorly to annotation and interact with a database. These range from simple annotation interfaces, to full onboard data management systems, with a variety of toolboxes. Advanced packages allow to input and display data from multiple sensors or multiple annotators via intranet or internet. Posterior human-mediated annotation often include tools for data display and image analysis, e.g. length, area, image segmentation, point count; and in a few cases the possibility of browsing and editing previous dive logs or to analyze the annotations. The interaction with a database allows the automatic integration of annotations from different surveys, repeated annotation and collaborative annotation of shared datasets, browsing and querying of data. Progress in the field of automated annotation is mostly in post processing, for stable platforms or still images. Integration into available MIAS is currently limited to semi-automated processes of pixel recognition through computer-vision modules that compile expert-based knowledge. Important topics aiding the choice of a specific software are outlined, the ideal software is discussed and future trends are presented.
Towards Automated Annotation of Benthic Survey Images: Variability of Human Experts and Operational Modes of Automation

PubMed Central

Beijbom, Oscar; Edmunds, Peter J.; Roelfsema, Chris; Smith, Jennifer; Kline, David I.; Neal, Benjamin P.; Dunlap, Matthew J.; Moriarty, Vincent; Fan, Tung-Yung; Tan, Chih-Jui; Chan, Stephen; Treibitz, Tali; Gamst, Anthony; Mitchell, B. Greg; Kriegman, David

2015-01-01

Global climate change and other anthropogenic stressors have heightened the need to rapidly characterize ecological changes in marine benthic communities across large scales. Digital photography enables rapid collection of survey images to meet this need, but the subsequent image annotation is typically a time consuming, manual task. We investigated the feasibility of using automated point-annotation to expedite cover estimation of the 17 dominant benthic categories from survey-images captured at four Pacific coral reefs. Inter- and intra- annotator variability among six human experts was quantified and compared to semi- and fully- automated annotation methods, which are made available at coralnet.ucsd.edu. Our results indicate high expert agreement for identification of coral genera, but lower agreement for algal functional groups, in particular between turf algae and crustose coralline algae. This indicates the need for unequivocal definitions of algal groups, careful training of multiple annotators, and enhanced imaging technology. Semi-automated annotation, where 50% of the annotation decisions were performed automatically, yielded cover estimate errors comparable to those of the human experts. Furthermore, fully-automated annotation yielded rapid, unbiased cover estimates but with increased variance. These results show that automated annotation can increase spatial coverage and decrease time and financial outlay for image-based reef surveys. PMID:26154157
Plexiform neurofibroma tissue classification

NASA Astrophysics Data System (ADS)

Weizman, L.; Hoch, L.; Ben Sira, L.; Joskowicz, L.; Pratt, L.; Constantini, S.; Ben Bashat, D.

2011-03-01

Plexiform Neurofibroma (PN) is a major complication of NeuroFibromatosis-1 (NF1), a common genetic disease that involving the nervous system. PNs are peripheral nerve sheath tumors extending along the length of the nerve in various parts of the body. Treatment decision is based on tumor volume assessment using MRI, which is currently time consuming and error prone, with limited semi-automatic segmentation support. We present in this paper a new method for the segmentation and tumor mass quantification of PN from STIR MRI scans. The method starts with a user-based delineation of the tumor area in a single slice and automatically detects the PN lesions in the entire image based on the tumor connectivity. Experimental results on seven datasets yield a mean volume overlap difference of 25% as compared to manual segmentation by expert radiologist with a mean computation and interaction time of 12 minutes vs. over an hour for manual annotation. Since the user interaction in the segmentation process is minimal, our method has the potential to successfully become part of the clinical workflow.
ExAtlas: An interactive online tool for meta-analysis of gene expression data.

PubMed

Sharov, Alexei A; Schlessinger, David; Ko, Minoru S H

2015-12-01

We have developed ExAtlas, an on-line software tool for meta-analysis and visualization of gene expression data. In contrast to existing software tools, ExAtlas compares multi-component data sets and generates results for all combinations (e.g. all gene expression profiles versus all Gene Ontology annotations). ExAtlas handles both users' own data and data extracted semi-automatically from the public repository (GEO/NCBI database). ExAtlas provides a variety of tools for meta-analyses: (1) standard meta-analysis (fixed effects, random effects, z-score, and Fisher's methods); (2) analyses of global correlations between gene expression data sets; (3) gene set enrichment; (4) gene set overlap; (5) gene association by expression profile; (6) gene specificity; and (7) statistical analysis (ANOVA, pairwise comparison, and PCA). ExAtlas produces graphical outputs, including heatmaps, scatter-plots, bar-charts, and three-dimensional images. Some of the most widely used public data sets (e.g. GNF/BioGPS, Gene Ontology, KEGG, GAD phenotypes, BrainScan, ENCODE ChIP-seq, and protein-protein interaction) are pre-loaded and can be used for functional annotations.
MIPS: analysis and annotation of genome information in 2007

PubMed Central

Mewes, H. W.; Dietmann, S.; Frishman, D.; Gregory, R.; Mannhaupt, G.; Mayer, K. F. X.; Münsterkötter, M.; Ruepp, A.; Spannagl, M.; Stümpflen, V.; Rattei, T.

2008-01-01

The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) combines automatic processing of large amounts of sequences with manual annotation of selected model genomes. Due to the massive growth of the available data, the depth of annotation varies widely between independent databases. Also, the criteria for the transfer of information from known to orthologous sequences are diverse. To cope with the task of global in-depth genome annotation has become unfeasible. Therefore, our efforts are dedicated to three levels of annotation: (i) the curation of selected genomes, in particular from fungal and plant taxa (e.g. CYGD, MNCDB, MatDB), (ii) the comprehensive, consistent, automatic annotation employing exhaustive methods for the computation of sequence similarities and sequence-related attributes as well as the classification of individual sequences (SIMAP, PEDANT and FunCat) and (iii) the compilation of manually curated databases for protein interactions based on scrutinized information from the literature to serve as an accepted set of reliable annotated interaction data (MPACT, MPPI, CORUM). All databases and tools described as well as the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de). PMID:18158298
MIPS: analysis and annotation of genome information in 2007.

PubMed

Mewes, H W; Dietmann, S; Frishman, D; Gregory, R; Mannhaupt, G; Mayer, K F X; Münsterkötter, M; Ruepp, A; Spannagl, M; Stümpflen, V; Rattei, T

2008-01-01

The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) combines automatic processing of large amounts of sequences with manual annotation of selected model genomes. Due to the massive growth of the available data, the depth of annotation varies widely between independent databases. Also, the criteria for the transfer of information from known to orthologous sequences are diverse. To cope with the task of global in-depth genome annotation has become unfeasible. Therefore, our efforts are dedicated to three levels of annotation: (i) the curation of selected genomes, in particular from fungal and plant taxa (e.g. CYGD, MNCDB, MatDB), (ii) the comprehensive, consistent, automatic annotation employing exhaustive methods for the computation of sequence similarities and sequence-related attributes as well as the classification of individual sequences (SIMAP, PEDANT and FunCat) and (iii) the compilation of manually curated databases for protein interactions based on scrutinized information from the literature to serve as an accepted set of reliable annotated interaction data (MPACT, MPPI, CORUM). All databases and tools described as well as the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).
LTRsift: a graphical user interface for semi-automatic classification and postprocessing of de novo detected LTR retrotransposons

PubMed Central

2012-01-01

Background Long terminal repeat (LTR) retrotransposons are a class of eukaryotic mobile elements characterized by a distinctive sequence similarity-based structure. Hence they are well suited for computational identification. Current software allows for a comprehensive genome-wide de novo detection of such elements. The obvious next step is the classification of newly detected candidates resulting in (super-)families. Such a de novo classification approach based on sequence-based clustering of transposon features has been proposed before, resulting in a preliminary assignment of candidates to families as a basis for subsequent manual refinement. However, such a classification workflow is typically split across a heterogeneous set of glue scripts and generic software (for example, spreadsheets), making it tedious for a human expert to inspect, curate and export the putative families produced by the workflow. Results We have developed LTRsift, an interactive graphical software tool for semi-automatic postprocessing of de novo predicted LTR retrotransposon annotations. Its user-friendly interface offers customizable filtering and classification functionality, displaying the putative candidate groups, their members and their internal structure in a hierarchical fashion. To ease manual work, it also supports graphical user interface-driven reassignment, splitting and further annotation of candidates. Export of grouped candidate sets in standard formats is possible. In two case studies, we demonstrate how LTRsift can be employed in the context of a genome-wide LTR retrotransposon survey effort. Conclusions LTRsift is a useful and convenient tool for semi-automated classification of newly detected LTR retrotransposons based on their internal features. Its efficient implementation allows for convenient and seamless filtering and classification in an integrated environment. Developed for life scientists, it is helpful in postprocessing and refining the output of software for predicting LTR retrotransposons up to the stage of preparing full-length reference sequence libraries. The LTRsift software is freely available at http://www.zbh.uni-hamburg.de/LTRsift under an open-source license. PMID:23131050
LTRsift: a graphical user interface for semi-automatic classification and postprocessing of de novo detected LTR retrotransposons.

PubMed

Steinbiss, Sascha; Kastens, Sascha; Kurtz, Stefan

2012-11-07

Long terminal repeat (LTR) retrotransposons are a class of eukaryotic mobile elements characterized by a distinctive sequence similarity-based structure. Hence they are well suited for computational identification. Current software allows for a comprehensive genome-wide de novo detection of such elements. The obvious next step is the classification of newly detected candidates resulting in (super-)families. Such a de novo classification approach based on sequence-based clustering of transposon features has been proposed before, resulting in a preliminary assignment of candidates to families as a basis for subsequent manual refinement. However, such a classification workflow is typically split across a heterogeneous set of glue scripts and generic software (for example, spreadsheets), making it tedious for a human expert to inspect, curate and export the putative families produced by the workflow. We have developed LTRsift, an interactive graphical software tool for semi-automatic postprocessing of de novo predicted LTR retrotransposon annotations. Its user-friendly interface offers customizable filtering and classification functionality, displaying the putative candidate groups, their members and their internal structure in a hierarchical fashion. To ease manual work, it also supports graphical user interface-driven reassignment, splitting and further annotation of candidates. Export of grouped candidate sets in standard formats is possible. In two case studies, we demonstrate how LTRsift can be employed in the context of a genome-wide LTR retrotransposon survey effort. LTRsift is a useful and convenient tool for semi-automated classification of newly detected LTR retrotransposons based on their internal features. Its efficient implementation allows for convenient and seamless filtering and classification in an integrated environment. Developed for life scientists, it is helpful in postprocessing and refining the output of software for predicting LTR retrotransposons up to the stage of preparing full-length reference sequence libraries. The LTRsift software is freely available at http://www.zbh.uni-hamburg.de/LTRsift under an open-source license.
Clever generation of rich SPARQL queries from annotated relational schema: application to Semantic Web Service creation for biological databases.

PubMed

Wollbrett, Julien; Larmande, Pierre; de Lamotte, Frédéric; Ruiz, Manuel

2013-04-15

In recent years, a large amount of "-omics" data have been produced. However, these data are stored in many different species-specific databases that are managed by different institutes and laboratories. Biologists often need to find and assemble data from disparate sources to perform certain analyses. Searching for these data and assembling them is a time-consuming task. The Semantic Web helps to facilitate interoperability across databases. A common approach involves the development of wrapper systems that map a relational database schema onto existing domain ontologies. However, few attempts have been made to automate the creation of such wrappers. We developed a framework, named BioSemantic, for the creation of Semantic Web Services that are applicable to relational biological databases. This framework makes use of both Semantic Web and Web Services technologies and can be divided into two main parts: (i) the generation and semi-automatic annotation of an RDF view; and (ii) the automatic generation of SPARQL queries and their integration into Semantic Web Services backbones. We have used our framework to integrate genomic data from different plant databases. BioSemantic is a framework that was designed to speed integration of relational databases. We present how it can be used to speed the development of Semantic Web Services for existing relational biological databases. Currently, it creates and annotates RDF views that enable the automatic generation of SPARQL queries. Web Services are also created and deployed automatically, and the semantic annotations of our Web Services are added automatically using SAWSDL attributes. BioSemantic is downloadable at http://southgreen.cirad.fr/?q=content/Biosemantic.
Clever generation of rich SPARQL queries from annotated relational schema: application to Semantic Web Service creation for biological databases

PubMed Central

2013-01-01

Background In recent years, a large amount of “-omics” data have been produced. However, these data are stored in many different species-specific databases that are managed by different institutes and laboratories. Biologists often need to find and assemble data from disparate sources to perform certain analyses. Searching for these data and assembling them is a time-consuming task. The Semantic Web helps to facilitate interoperability across databases. A common approach involves the development of wrapper systems that map a relational database schema onto existing domain ontologies. However, few attempts have been made to automate the creation of such wrappers. Results We developed a framework, named BioSemantic, for the creation of Semantic Web Services that are applicable to relational biological databases. This framework makes use of both Semantic Web and Web Services technologies and can be divided into two main parts: (i) the generation and semi-automatic annotation of an RDF view; and (ii) the automatic generation of SPARQL queries and their integration into Semantic Web Services backbones. We have used our framework to integrate genomic data from different plant databases. Conclusions BioSemantic is a framework that was designed to speed integration of relational databases. We present how it can be used to speed the development of Semantic Web Services for existing relational biological databases. Currently, it creates and annotates RDF views that enable the automatic generation of SPARQL queries. Web Services are also created and deployed automatically, and the semantic annotations of our Web Services are added automatically using SAWSDL attributes. BioSemantic is downloadable at http://southgreen.cirad.fr/?q=content/Biosemantic. PMID:23586394

Semantic annotation of Web data applied to risk in food.

PubMed

Hignette, Gaëlle; Buche, Patrice; Couvert, Olivier; Dibie-Barthélemy, Juliette; Doussot, David; Haemmerlé, Ollivier; Mettler, Eric; Soler, Lydie

2008-11-30

A preliminary step to risk in food assessment is the gathering of experimental data. In the framework of the Sym'Previus project (http://www.symprevius.org), a complete data integration system has been designed, grouping data provided by industrial partners and data extracted from papers published in the main scientific journals of the domain. Those data have been classified by means of a predefined vocabulary, called ontology. Our aim is to complement the database with data extracted from the Web. In the framework of the WebContent project (www.webcontent.fr), we have designed a semi-automatic acquisition tool, called @WEB, which retrieves scientific documents from the Web. During the @WEB process, data tables are extracted from the documents and then annotated with the ontology. We focus on the data tables as they contain, in general, a synthesis of data published in the documents. In this paper, we explain how the columns of the data tables are automatically annotated with data types of the ontology and how the relations represented by the table are recognised. We also give the results of our experimentation to assess the quality of such an annotation.
Support patient search on pathology reports with interactive online learning based data extraction.

PubMed

Zheng, Shuai; Lu, James J; Appin, Christina; Brat, Daniel; Wang, Fusheng

2015-01-01

Structural reporting enables semantic understanding and prompt retrieval of clinical findings about patients. While synoptic pathology reporting provides templates for data entries, information in pathology reports remains primarily in narrative free text form. Extracting data of interest from narrative pathology reports could significantly improve the representation of the information and enable complex structured queries. However, manual extraction is tedious and error-prone, and automated tools are often constructed with a fixed training dataset and not easily adaptable. Our goal is to extract data from pathology reports to support advanced patient search with a highly adaptable semi-automated data extraction system, which can adjust and self-improve by learning from a user's interaction with minimal human effort. We have developed an online machine learning based information extraction system called IDEAL-X. With its graphical user interface, the system's data extraction engine automatically annotates values for users to review upon loading each report text. The system analyzes users' corrections regarding these annotations with online machine learning, and incrementally enhances and refines the learning model as reports are processed. The system also takes advantage of customized controlled vocabularies, which can be adaptively refined during the online learning process to further assist the data extraction. As the accuracy of automatic annotation improves overtime, the effort of human annotation is gradually reduced. After all reports are processed, a built-in query engine can be applied to conveniently define queries based on extracted structured data. We have evaluated the system with a dataset of anatomic pathology reports from 50 patients. Extracted data elements include demographical data, diagnosis, genetic marker, and procedure. The system achieves F-1 scores of around 95% for the majority of tests. Extracting data from pathology reports could enable more accurate knowledge to support biomedical research and clinical diagnosis. IDEAL-X provides a bridge that takes advantage of online machine learning based data extraction and the knowledge from human's feedback. By combining iterative online learning and adaptive controlled vocabularies, IDEAL-X can deliver highly adaptive and accurate data extraction to support patient search.
Watch-and-Comment as an Approach to Collaboratively Annotate Points of Interest in Video and Interactive-TV Programs

NASA Astrophysics Data System (ADS)

Pimentel, Maria Da Graça C.; Cattelan, Renan G.; Melo, Erick L.; Freitas, Giliard B.; Teixeira, Cesar A.

In earlier work we proposed the Watch-and-Comment (WaC) paradigm as the seamless capture of multimodal comments made by one or more users while watching a video, resulting in the automatic generation of multimedia documents specifying annotated interactive videos. The aim is to allow services to be offered by applying document engineering techniques to the multimedia document generated automatically. The WaC paradigm was demonstrated with a WaCTool prototype application which supports multimodal annotation over video frames and segments, producing a corresponding interactive video. In this chapter, we extend the WaC paradigm to consider contexts in which several viewers may use their own mobile devices while watching and commenting on an interactive-TV program. We first review our previous work. Next, we discuss scenarios in which mobile users can collaborate via the WaC paradigm. We then present a new prototype application which allows users to employ their mobile devices to collaboratively annotate points of interest in video and interactive-TV programs. We also detail the current software infrastructure which supports our new prototype; the infrastructure extends the Ginga middleware for the Brazilian Digital TV with an implementation of the UPnP protocol - the aim is to provide the seamless integration of the users' mobile devices into the TV environment. As a result, the work reported in this chapter defines the WaC paradigm for the mobile-user as an approach to allow the collaborative annotation of the points of interest in video and interactive-TV programs.
Validation of semi-automatic segmentation of the left atrium

NASA Astrophysics Data System (ADS)

Rettmann, M. E.; Holmes, D. R., III; Camp, J. J.; Packer, D. L.; Robb, R. A.

2008-03-01

Catheter ablation therapy has become increasingly popular for the treatment of left atrial fibrillation. The effect of this treatment on left atrial morphology, however, has not yet been completely quantified. Initial studies have indicated a decrease in left atrial size with a concomitant decrease in pulmonary vein diameter. In order to effectively study if catheter based therapies affect left atrial geometry, robust segmentations with minimal user interaction are required. In this work, we validate a method to semi-automatically segment the left atrium from computed-tomography scans. The first step of the technique utilizes seeded region growing to extract the entire blood pool including the four chambers of the heart, the pulmonary veins, aorta, superior vena cava, inferior vena cava, and other surrounding structures. Next, the left atrium and pulmonary veins are separated from the rest of the blood pool using an algorithm that searches for thin connections between user defined points in the volumetric data or on a surface rendering. Finally, pulmonary veins are separated from the left atrium using a three dimensional tracing tool. A single user segmented three datasets three times using both the semi-automatic technique as well as manual tracing. The user interaction time for the semi-automatic technique was approximately forty-five minutes per dataset and the manual tracing required between four and eight hours per dataset depending on the number of slices. A truth model was generated using a simple voting scheme on the repeated manual segmentations. A second user segmented each of the nine datasets using the semi-automatic technique only. Several metrics were computed to assess the agreement between the semi-automatic technique and the truth model including percent differences in left atrial volume, DICE overlap, and mean distance between the boundaries of the segmented left atria. Overall, the semi-automatic approach was demonstrated to be repeatable within and between raters, and accurate when compared to the truth model. Finally, we generated a visualization to assess the spatial variability in the segmentation errors between the semi-automatic approach and the truth model. The visualization demonstrates the highest errors occur at the boundaries between the left atium and pulmonary veins as well as the left atrium and left atrial appendage. In conclusion, we describe a semi-automatic approach for left atrial segmentation that demonstrates repeatability and accuracy, with the advantage of significant time reduction in user interaction time.
Introducing meta-services for biomedical information extraction

PubMed Central

Leitner, Florian; Krallinger, Martin; Rodriguez-Penagos, Carlos; Hakenberg, Jörg; Plake, Conrad; Kuo, Cheng-Ju; Hsu, Chun-Nan; Tsai, Richard Tzong-Han; Hung, Hsi-Chuan; Lau, William W; Johnson, Calvin A; Sætre, Rune; Yoshida, Kazuhiro; Chen, Yan Hua; Kim, Sun; Shin, Soo-Yong; Zhang, Byoung-Tak; Baumgartner, William A; Hunter, Lawrence; Haddow, Barry; Matthews, Michael; Wang, Xinglong; Ruch, Patrick; Ehrler, Frédéric; Özgür, Arzucan; Erkan, Güneş; Radev, Dragomir R; Krauthammer, Michael; Luong, ThaiBinh; Hoffmann, Robert; Sander, Chris; Valencia, Alfonso

2008-01-01

We introduce the first meta-service for information extraction in molecular biology, the BioCreative MetaServer (BCMS; ). This prototype platform is a joint effort of 13 research groups and provides automatically generated annotations for PubMed/Medline abstracts. Annotation types cover gene names, gene IDs, species, and protein-protein interactions. The annotations are distributed by the meta-server in both human and machine readable formats (HTML/XML). This service is intended to be used by biomedical researchers and database annotators, and in biomedical language processing. The platform allows direct comparison, unified access, and result aggregation of the annotations. PMID:18834497
Text-mining-assisted biocuration workflows in Argo

PubMed Central

Rak, Rafal; Batista-Navarro, Riza Theresa; Rowley, Andrew; Carter, Jacob; Ananiadou, Sophia

2014-01-01

Biocuration activities have been broadly categorized into the selection of relevant documents, the annotation of biological concepts of interest and identification of interactions between the concepts. Text mining has been shown to have a potential to significantly reduce the effort of biocurators in all the three activities, and various semi-automatic methodologies have been integrated into curation pipelines to support them. We investigate the suitability of Argo, a workbench for building text-mining solutions with the use of a rich graphical user interface, for the process of biocuration. Central to Argo are customizable workflows that users compose by arranging available elementary analytics to form task-specific processing units. A built-in manual annotation editor is the single most used biocuration tool of the workbench, as it allows users to create annotations directly in text, as well as modify or delete annotations created by automatic processing components. Apart from syntactic and semantic analytics, the ever-growing library of components includes several data readers and consumers that support well-established as well as emerging data interchange formats such as XMI, RDF and BioC, which facilitate the interoperability of Argo with other platforms or resources. To validate the suitability of Argo for curation activities, we participated in the BioCreative IV challenge whose purpose was to evaluate Web-based systems addressing user-defined biocuration tasks. Argo proved to have the edge over other systems in terms of flexibility of defining biocuration tasks. As expected, the versatility of the workbench inevitably lengthened the time the curators spent on learning the system before taking on the task, which may have affected the usability of Argo. The participation in the challenge gave us an opportunity to gather valuable feedback and identify areas of improvement, some of which have already been introduced. Database URL: http://argo.nactem.ac.uk PMID:25037308
User Interaction in Semi-Automatic Segmentation of Organs at Risk: a Case Study in Radiotherapy.

PubMed

Ramkumar, Anjana; Dolz, Jose; Kirisli, Hortense A; Adebahr, Sonja; Schimek-Jasch, Tanja; Nestle, Ursula; Massoptier, Laurent; Varga, Edit; Stappers, Pieter Jan; Niessen, Wiro J; Song, Yu

2016-04-01

Accurate segmentation of organs at risk is an important step in radiotherapy planning. Manual segmentation being a tedious procedure and prone to inter- and intra-observer variability, there is a growing interest in automated segmentation methods. However, automatic methods frequently fail to provide satisfactory result, and post-processing corrections are often needed. Semi-automatic segmentation methods are designed to overcome these problems by combining physicians' expertise and computers' potential. This study evaluates two semi-automatic segmentation methods with different types of user interactions, named the "strokes" and the "contour", to provide insights into the role and impact of human-computer interaction. Two physicians participated in the experiment. In total, 42 case studies were carried out on five different types of organs at risk. For each case study, both the human-computer interaction process and quality of the segmentation results were measured subjectively and objectively. Furthermore, different measures of the process and the results were correlated. A total of 36 quantifiable and ten non-quantifiable correlations were identified for each type of interaction. Among those pairs of measures, 20 of the contour method and 22 of the strokes method were strongly or moderately correlated, either directly or inversely. Based on those correlated measures, it is concluded that: (1) in the design of semi-automatic segmentation methods, user interactions need to be less cognitively challenging; (2) based on the observed workflows and preferences of physicians, there is a need for flexibility in the interface design; (3) the correlated measures provide insights that can be used in improving user interaction design.
The TREC Interactive Track: An Annotated Bibliography.

ERIC Educational Resources Information Center

Over, Paul

2001-01-01

Discussion of the study of interactive information retrieval (IR) at the Text Retrieval Conferences (TREC) focuses on summaries of the Interactive Track at each conference. Describes evolution of the track, which has changed from comparing human-machine systems with fully automatic systems to comparing interactive systems that focus on the search…
Automatic acquisition of motion trajectories: tracking hockey players

NASA Astrophysics Data System (ADS)

Okuma, Kenji; Little, James J.; Lowe, David

2003-12-01

Computer systems that have the capability of analyzing complex and dynamic scenes play an essential role in video annotation. Scenes can be complex in such a way that there are many cluttered objects with different colors, shapes and sizes, and can be dynamic with multiple interacting moving objects and a constantly changing background. In reality, there are many scenes that are complex, dynamic, and challenging enough for computers to describe. These scenes include games of sports, air traffic, car traffic, street intersections, and cloud transformations. Our research is about the challenge of inventing a descriptive computer system that analyzes scenes of hockey games where multiple moving players interact with each other on a constantly moving background due to camera motions. Ultimately, such a computer system should be able to acquire reliable data by extracting the players" motion as their trajectories, querying them by analyzing the descriptive information of data, and predict the motions of some hockey players based on the result of the query. Among these three major aspects of the system, we primarily focus on visual information of the scenes, that is, how to automatically acquire motion trajectories of hockey players from video. More accurately, we automatically analyze the hockey scenes by estimating parameters (i.e., pan, tilt, and zoom) of the broadcast cameras, tracking hockey players in those scenes, and constructing a visual description of the data by displaying trajectories of those players. Many technical problems in vision such as fast and unpredictable players' motions and rapid camera motions make our challenge worth tackling. To the best of our knowledge, there have not been any automatic video annotation systems for hockey developed in the past. Although there are many obstacles to overcome, our efforts and accomplishments would hopefully establish the infrastructure of the automatic hockey annotation system and become a milestone for research in automatic video annotation in this domain.
FOCIH: Form-Based Ontology Creation and Information Harvesting

NASA Astrophysics Data System (ADS)

Tao, Cui; Embley, David W.; Liddle, Stephen W.

Creating an ontology and populating it with data are both labor-intensive tasks requiring a high degree of expertise. Thus, scaling ontology creation and population to the size of the web in an effort to create a web of data—which some see as Web 3.0—is prohibitive. Can we find ways to streamline these tasks and lower the barrier enough to enable Web 3.0? Toward this end we offer a form-based approach to ontology creation that provides a way to create Web 3.0 ontologies without the need for specialized training. And we offer a way to semi-automatically harvest data from the current web of pages for a Web 3.0 ontology. In addition to harvesting information with respect to an ontology, the approach also annotates web pages and links facts in web pages to ontological concepts, resulting in a web of data superimposed over the web of pages. Experience with our prototype system shows that mappings between conceptual-model-based ontologies and forms are sufficient for creating the kind of ontologies needed for Web 3.0, and experiments with our prototype system show that automatic harvesting, automatic annotation, and automatic superimposition of a web of data over a web of pages work well.
KUTE-BASE: storing, downloading and exporting MIAME-compliant microarray experiments in minutes rather than hours.

PubMed

Draghici, Sorin; Tarca, Adi L; Yu, Longfei; Ethier, Stephen; Romero, Roberto

2008-03-01

The BioArray Software Environment (BASE) is a very popular MIAME-compliant, web-based microarray data repository. However in BASE, like in most other microarray data repositories, the experiment annotation and raw data uploading can be very timeconsuming, especially for large microarray experiments. We developed KUTE (Karmanos Universal daTabase for microarray Experiments), as a plug-in for BASE 2.0 that addresses these issues. KUTE provides an automatic experiment annotation feature and a completely redesigned data work-flow that dramatically reduce the human-computer interaction time. For instance, in BASE 2.0 a typical Affymetrix experiment involving 100 arrays required 4 h 30 min of user interaction time forexperiment annotation, and 45 min for data upload/download. In contrast, for the same experiment, KUTE required only 28 min of user interaction time for experiment annotation, and 3.3 min for data upload/download. http://vortex.cs.wayne.edu/kute/index.html.
A multilingual gold-standard corpus for biomedical concept recognition: the Mantra GSC

PubMed Central

Clematide, Simon; Akhondi, Saber A; van Mulligen, Erik M; Rebholz-Schuhmann, Dietrich

2015-01-01

Objective To create a multilingual gold-standard corpus for biomedical concept recognition. Materials and methods We selected text units from different parallel corpora (Medline abstract titles, drug labels, biomedical patent claims) in English, French, German, Spanish, and Dutch. Three annotators per language independently annotated the biomedical concepts, based on a subset of the Unified Medical Language System and covering a wide range of semantic groups. To reduce the annotation workload, automatically generated preannotations were provided. Individual annotations were automatically harmonized and then adjudicated, and cross-language consistency checks were carried out to arrive at the final annotations. Results The number of final annotations was 5530. Inter-annotator agreement scores indicate good agreement (median F-score 0.79), and are similar to those between individual annotators and the gold standard. The automatically generated harmonized annotation set for each language performed equally well as the best annotator for that language. Discussion The use of automatic preannotations, harmonized annotations, and parallel corpora helped to keep the manual annotation efforts manageable. The inter-annotator agreement scores provide a reference standard for gauging the performance of automatic annotation techniques. Conclusion To our knowledge, this is the first gold-standard corpus for biomedical concept recognition in languages other than English. Other distinguishing features are the wide variety of semantic groups that are being covered, and the diversity of text genres that were annotated. PMID:25948699
A multilingual gold-standard corpus for biomedical concept recognition: the Mantra GSC.

PubMed

Kors, Jan A; Clematide, Simon; Akhondi, Saber A; van Mulligen, Erik M; Rebholz-Schuhmann, Dietrich

2015-09-01

To create a multilingual gold-standard corpus for biomedical concept recognition. We selected text units from different parallel corpora (Medline abstract titles, drug labels, biomedical patent claims) in English, French, German, Spanish, and Dutch. Three annotators per language independently annotated the biomedical concepts, based on a subset of the Unified Medical Language System and covering a wide range of semantic groups. To reduce the annotation workload, automatically generated preannotations were provided. Individual annotations were automatically harmonized and then adjudicated, and cross-language consistency checks were carried out to arrive at the final annotations. The number of final annotations was 5530. Inter-annotator agreement scores indicate good agreement (median F-score 0.79), and are similar to those between individual annotators and the gold standard. The automatically generated harmonized annotation set for each language performed equally well as the best annotator for that language. The use of automatic preannotations, harmonized annotations, and parallel corpora helped to keep the manual annotation efforts manageable. The inter-annotator agreement scores provide a reference standard for gauging the performance of automatic annotation techniques. To our knowledge, this is the first gold-standard corpus for biomedical concept recognition in languages other than English. Other distinguishing features are the wide variety of semantic groups that are being covered, and the diversity of text genres that were annotated. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.
Surface smoothness: cartilage biomarkers for knee OA beyond the radiologist

NASA Astrophysics Data System (ADS)

Tummala, Sudhakar; Dam, Erik B.

2010-03-01

Fully automatic imaging biomarkers may allow quantification of patho-physiological processes that a radiologist would not be able to assess reliably. This can introduce new insight but is problematic to validate due to lack of meaningful ground truth expert measurements. Rather than quantification accuracy, such novel markers must therefore be validated against clinically meaningful end-goals such as the ability to allow correct diagnosis. We present a method for automatic cartilage surface smoothness quantification in the knee joint. The quantification is based on a curvature flow method used on tibial and femoral cartilage compartments resulting from an automatic segmentation scheme. These smoothness estimates are validated for their ability to diagnose osteoarthritis and compared to smoothness estimates based on manual expert segmentations and to conventional cartilage volume quantification. We demonstrate that the fully automatic markers eliminate the time required for radiologist annotations, and in addition provide a diagnostic marker superior to the evaluated semi-manual markers.
Computer systems for annotation of single molecule fragments

DOEpatents

Schwartz, David Charles; Severin, Jessica

2016-07-19

There are provided computer systems for visualizing and annotating single molecule images. Annotation systems in accordance with this disclosure allow a user to mark and annotate single molecules of interest and their restriction enzyme cut sites thereby determining the restriction fragments of single nucleic acid molecules. The markings and annotations may be automatically generated by the system in certain embodiments and they may be overlaid translucently onto the single molecule images. An image caching system may be implemented in the computer annotation systems to reduce image processing time. The annotation systems include one or more connectors connecting to one or more databases capable of storing single molecule data as well as other biomedical data. Such diverse array of data can be retrieved and used to validate the markings and annotations. The annotation systems may be implemented and deployed over a computer network. They may be ergonomically optimized to facilitate user interactions.
Ensembl 2002: accommodating comparative genomics.

PubMed

Clamp, M; Andrews, D; Barker, D; Bevan, P; Cameron, G; Chen, Y; Clark, L; Cox, T; Cuff, J; Curwen, V; Down, T; Durbin, R; Eyras, E; Gilbert, J; Hammond, M; Hubbard, T; Kasprzyk, A; Keefe, D; Lehvaslaiho, H; Iyer, V; Melsopp, C; Mongin, E; Pettett, R; Potter, S; Rust, A; Schmidt, E; Searle, S; Slater, G; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Stupka, E; Ureta-Vidal, A; Vastrik, I; Birney, E

2003-01-01

The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of human, mouse and other genome sequences, available as either an interactive web site or as flat files. Ensembl also integrates manually annotated gene structures from external sources where available. As well as being one of the leading sources of genome annotation, Ensembl is an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements. These range from sequence analysis to data storage and visualisation and installations exist around the world in both companies and at academic sites. With both human and mouse genome sequences available and more vertebrate sequences to follow, many of the recent developments in Ensembl have focusing on developing automatic comparative genome analysis and visualisation.
Automatic reconstruction of a bacterial regulatory network using Natural Language Processing

PubMed Central

Rodríguez-Penagos, Carlos; Salgado, Heladia; Martínez-Flores, Irma; Collado-Vides, Julio

2007-01-01

Background Manual curation of biological databases, an expensive and labor-intensive process, is essential for high quality integrated data. In this paper we report the implementation of a state-of-the-art Natural Language Processing system that creates computer-readable networks of regulatory interactions directly from different collections of abstracts and full-text papers. Our major aim is to understand how automatic annotation using Text-Mining techniques can complement manual curation of biological databases. We implemented a rule-based system to generate networks from different sets of documents dealing with regulation in Escherichia coli K-12. Results Performance evaluation is based on the most comprehensive transcriptional regulation database for any organism, the manually-curated RegulonDB, 45% of which we were able to recreate automatically. From our automated analysis we were also able to find some new interactions from papers not already curated, or that were missed in the manual filtering and review of the literature. We also put forward a novel Regulatory Interaction Markup Language better suited than SBML for simultaneously representing data of interest for biologists and text miners. Conclusion Manual curation of the output of automatic processing of text is a good way to complement a more detailed review of the literature, either for validating the results of what has been already annotated, or for discovering facts and information that might have been overlooked at the triage or curation stages. PMID:17683642
iPad: Semantic annotation and markup of radiological images.

PubMed

Rubin, Daniel L; Rodriguez, Cesar; Shah, Priyanka; Beaulieu, Chris

2008-11-06

Radiological images contain a wealth of information,such as anatomy and pathology, which is often not explicit and computationally accessible. Information schemes are being developed to describe the semantic content of images, but such schemes can be unwieldy to operationalize because there are few tools to enable users to capture structured information easily as part of the routine research workflow. We have created iPad, an open source tool enabling researchers and clinicians to create semantic annotations on radiological images. iPad hides the complexity of the underlying image annotation information model from users, permitting them to describe images and image regions using a graphical interface that maps their descriptions to structured ontologies semi-automatically. Image annotations are saved in a variety of formats,enabling interoperability among medical records systems, image archives in hospitals, and the Semantic Web. Tools such as iPad can help reduce the burden of collecting structured information from images, and it could ultimately enable researchers and physicians to exploit images on a very large scale and glean the biological and physiological significance of image content.
Automatic vs. manual curation of a multi-source chemical dictionary: the impact on text mining.

PubMed

Hettne, Kristina M; Williams, Antony J; van Mulligen, Erik M; Kleinjans, Jos; Tkachenko, Valery; Kors, Jan A

2010-03-23

Previously, we developed a combined dictionary dubbed Chemlist for the identification of small molecules and drugs in text based on a number of publicly available databases and tested it on an annotated corpus. To achieve an acceptable recall and precision we used a number of automatic and semi-automatic processing steps together with disambiguation rules. However, it remained to be investigated which impact an extensive manual curation of a multi-source chemical dictionary would have on chemical term identification in text. ChemSpider is a chemical database that has undergone extensive manual curation aimed at establishing valid chemical name-to-structure relationships. We acquired the component of ChemSpider containing only manually curated names and synonyms. Rule-based term filtering, semi-automatic manual curation, and disambiguation rules were applied. We tested the dictionary from ChemSpider on an annotated corpus and compared the results with those for the Chemlist dictionary. The ChemSpider dictionary of ca. 80 k names was only a 1/3 to a 1/4 the size of Chemlist at around 300 k. The ChemSpider dictionary had a precision of 0.43 and a recall of 0.19 before the application of filtering and disambiguation and a precision of 0.87 and a recall of 0.19 after filtering and disambiguation. The Chemlist dictionary had a precision of 0.20 and a recall of 0.47 before the application of filtering and disambiguation and a precision of 0.67 and a recall of 0.40 after filtering and disambiguation. We conclude the following: (1) The ChemSpider dictionary achieved the best precision but the Chemlist dictionary had a higher recall and the best F-score; (2) Rule-based filtering and disambiguation is necessary to achieve a high precision for both the automatically generated and the manually curated dictionary. ChemSpider is available as a web service at http://www.chemspider.com/ and the Chemlist dictionary is freely available as an XML file in Simple Knowledge Organization System format on the web at http://www.biosemantics.org/chemlist.
Automatic vs. manual curation of a multi-source chemical dictionary: the impact on text mining

PubMed Central

2010-01-01

Background Previously, we developed a combined dictionary dubbed Chemlist for the identification of small molecules and drugs in text based on a number of publicly available databases and tested it on an annotated corpus. To achieve an acceptable recall and precision we used a number of automatic and semi-automatic processing steps together with disambiguation rules. However, it remained to be investigated which impact an extensive manual curation of a multi-source chemical dictionary would have on chemical term identification in text. ChemSpider is a chemical database that has undergone extensive manual curation aimed at establishing valid chemical name-to-structure relationships. Results We acquired the component of ChemSpider containing only manually curated names and synonyms. Rule-based term filtering, semi-automatic manual curation, and disambiguation rules were applied. We tested the dictionary from ChemSpider on an annotated corpus and compared the results with those for the Chemlist dictionary. The ChemSpider dictionary of ca. 80 k names was only a 1/3 to a 1/4 the size of Chemlist at around 300 k. The ChemSpider dictionary had a precision of 0.43 and a recall of 0.19 before the application of filtering and disambiguation and a precision of 0.87 and a recall of 0.19 after filtering and disambiguation. The Chemlist dictionary had a precision of 0.20 and a recall of 0.47 before the application of filtering and disambiguation and a precision of 0.67 and a recall of 0.40 after filtering and disambiguation. Conclusions We conclude the following: (1) The ChemSpider dictionary achieved the best precision but the Chemlist dictionary had a higher recall and the best F-score; (2) Rule-based filtering and disambiguation is necessary to achieve a high precision for both the automatically generated and the manually curated dictionary. ChemSpider is available as a web service at http://www.chemspider.com/ and the Chemlist dictionary is freely available as an XML file in Simple Knowledge Organization System format on the web at http://www.biosemantics.org/chemlist. PMID:20331846

DoctorEye: A clinically driven multifunctional platform, for accurate processing of tumors in medical images.

PubMed

Skounakis, Emmanouil; Farmaki, Christina; Sakkalis, Vangelis; Roniotis, Alexandros; Banitsas, Konstantinos; Graf, Norbert; Marias, Konstantinos

2010-01-01

This paper presents a novel, open access interactive platform for 3D medical image analysis, simulation and visualization, focusing in oncology images. The platform was developed through constant interaction and feedback from expert clinicians integrating a thorough analysis of their requirements while having an ultimate goal of assisting in accurately delineating tumors. It allows clinicians not only to work with a large number of 3D tomographic datasets but also to efficiently annotate multiple regions of interest in the same session. Manual and semi-automatic segmentation techniques combined with integrated correction tools assist in the quick and refined delineation of tumors while different users can add different components related to oncology such as tumor growth and simulation algorithms for improving therapy planning. The platform has been tested by different users and over large number of heterogeneous tomographic datasets to ensure stability, usability, extensibility and robustness with promising results. the platform, a manual and tutorial videos are available at: http://biomodeling.ics.forth.gr. it is free to use under the GNU General Public License.
An evaluation of automatic coronary artery calcium scoring methods with cardiac CT using the orCaScore framework.

PubMed

Wolterink, Jelmer M; Leiner, Tim; de Vos, Bob D; Coatrieux, Jean-Louis; Kelm, B Michael; Kondo, Satoshi; Salgado, Rodrigo A; Shahzad, Rahil; Shu, Huazhong; Snoeren, Miranda; Takx, Richard A P; van Vliet, Lucas J; van Walsum, Theo; Willems, Tineke P; Yang, Guanyu; Zheng, Yefeng; Viergever, Max A; Išgum, Ivana

2016-05-01

The amount of coronary artery calcification (CAC) is a strong and independent predictor of cardiovascular disease (CVD) events. In clinical practice, CAC is manually identified and automatically quantified in cardiac CT using commercially available software. This is a tedious and time-consuming process in large-scale studies. Therefore, a number of automatic methods that require no interaction and semiautomatic methods that require very limited interaction for the identification of CAC in cardiac CT have been proposed. Thus far, a comparison of their performance has been lacking. The objective of this study was to perform an independent evaluation of (semi)automatic methods for CAC scoring in cardiac CT using a publicly available standardized framework. Cardiac CT exams of 72 patients distributed over four CVD risk categories were provided for (semi)automatic CAC scoring. Each exam consisted of a noncontrast-enhanced calcium scoring CT (CSCT) and a corresponding coronary CT angiography (CCTA) scan. The exams were acquired in four different hospitals using state-of-the-art equipment from four major CT scanner vendors. The data were divided into 32 training exams and 40 test exams. A reference standard for CAC in CSCT was defined by consensus of two experts following a clinical protocol. The framework organizers evaluated the performance of (semi)automatic methods on test CSCT scans, per lesion, artery, and patient. Five (semi)automatic methods were evaluated. Four methods used both CSCT and CCTA to identify CAC, and one method used only CSCT. The evaluated methods correctly detected between 52% and 94% of CAC lesions with positive predictive values between 65% and 96%. Lesions in distal coronary arteries were most commonly missed and aortic calcifications close to the coronary ostia were the most common false positive errors. The majority (between 88% and 98%) of correctly identified CAC lesions were assigned to the correct artery. Linearly weighted Cohen's kappa for patient CVD risk categorization by the evaluated methods ranged from 0.80 to 1.00. A publicly available standardized framework for the evaluation of (semi)automatic methods for CAC identification in cardiac CT is described. An evaluation of five (semi)automatic methods within this framework shows that automatic per patient CVD risk categorization is feasible. CAC lesions at ambiguous locations such as the coronary ostia remain challenging, but their detection had limited impact on CVD risk determination.
Combining evidence, biomedical literature and statistical dependence: new insights for functional annotation of gene sets

PubMed Central

Aubry, Marc; Monnier, Annabelle; Chicault, Celine; de Tayrac, Marie; Galibert, Marie-Dominique; Burgun, Anita; Mosser, Jean

2006-01-01

Background Large-scale genomic studies based on transcriptome technologies provide clusters of genes that need to be functionally annotated. The Gene Ontology (GO) implements a controlled vocabulary organised into three hierarchies: cellular components, molecular functions and biological processes. This terminology allows a coherent and consistent description of the knowledge about gene functions. The GO terms related to genes come primarily from semi-automatic annotations made by trained biologists (annotation based on evidence) or text-mining of the published scientific literature (literature profiling). Results We report an original functional annotation method based on a combination of evidence and literature that overcomes the weaknesses and the limitations of each approach. It relies on the Gene Ontology Annotation database (GOA Human) and the PubGene biomedical literature index. We support these annotations with statistically associated GO terms and retrieve associative relations across the three GO hierarchies to emphasise the major pathways involved by a gene cluster. Both annotation methods and associative relations were quantitatively evaluated with a reference set of 7397 genes and a multi-cluster study of 14 clusters. We also validated the biological appropriateness of our hybrid method with the annotation of a single gene (cdc2) and that of a down-regulated cluster of 37 genes identified by a transcriptome study of an in vitro enterocyte differentiation model (CaCo-2 cells). Conclusion The combination of both approaches is more informative than either separate approach: literature mining can enrich an annotation based only on evidence. Text-mining of the literature can also find valuable associated MEDLINE references that confirm the relevance of the annotation. Eventually, GO terms networks can be built with associative relations in order to highlight cooperative and competitive pathways and their connected molecular functions. PMID:16674810
K-Nearest Neighbors Relevance Annotation Model for Distance Education

ERIC Educational Resources Information Center

Ke, Xiao; Li, Shaozi; Cao, Donglin

2011-01-01

With the rapid development of Internet technologies, distance education has become a popular educational mode. In this paper, the authors propose an online image automatic annotation distance education system, which could effectively help children learn interrelations between image content and corresponding keywords. Image automatic annotation is…
Semi-Automatic Segmentation Software for Quantitative Clinical Brain Glioblastoma Evaluation

PubMed Central

Zhu, Y; Young, G; Xue, Z; Huang, R; You, H; Setayesh, K; Hatabu, H; Cao, F; Wong, S.T.

2012-01-01

Rationale and Objectives Quantitative measurement provides essential information about disease progression and treatment response in patients with Glioblastoma multiforme (GBM). The goal of this paper is to present and validate a software pipeline for semi-automatic GBM segmentation, called AFINITI (Assisted Follow-up in NeuroImaging of Therapeutic Intervention), using clinical data from GBM patients. Materials and Methods Our software adopts the current state-of-the-art tumor segmentation algorithms and combines them into one clinically usable pipeline. Both the advantages of the traditional voxel-based and the deformable shape-based segmentation are embedded into the software pipeline. The former provides an automatic tumor segmentation scheme based on T1- and T2-weighted MR brain data, and the latter refines the segmentation results with minimal manual input. Results Twenty six clinical MR brain images of GBM patients were processed and compared with manual results. The results can be visualized using the embedded graphic user interface (GUI). Conclusion Validation results using clinical GBM data showed high correlation between the AFINITI results and manual annotation. Compared to the voxel-wise segmentation, AFINITI yielded more accurate results in segmenting the enhanced GBM from multimodality MRI data. The proposed pipeline could be used as additional information to interpret MR brain images in neuroradiology. PMID:22591720
Challenges for automatically extracting molecular interactions from full-text articles.

PubMed

McIntosh, Tara; Curran, James R

2009-09-24

The increasing availability of full-text biomedical articles will allow more biomedical knowledge to be extracted automatically with greater reliability. However, most Information Retrieval (IR) and Extraction (IE) tools currently process only abstracts. The lack of corpora has limited the development of tools that are capable of exploiting the knowledge in full-text articles. As a result, there has been little investigation into the advantages of full-text document structure, and the challenges developers will face in processing full-text articles. We manually annotated passages from full-text articles that describe interactions summarised in a Molecular Interaction Map (MIM). Our corpus tracks the process of identifying facts to form the MIM summaries and captures any factual dependencies that must be resolved to extract the fact completely. For example, a fact in the results section may require a synonym defined in the introduction. The passages are also annotated with negated and coreference expressions that must be resolved.We describe the guidelines for identifying relevant passages and possible dependencies. The corpus includes 2162 sentences from 78 full-text articles. Our corpus analysis demonstrates the necessity of full-text processing; identifies the article sections where interactions are most commonly stated; and quantifies the proportion of interaction statements requiring coherent dependencies. Further, it allows us to report on the relative importance of identifying synonyms and resolving negated expressions. We also experiment with an oracle sentence retrieval system using the corpus as a gold-standard evaluation set. We introduce the MIM corpus, a unique resource that maps interaction facts in a MIM to annotated passages within full-text articles. It is an invaluable case study providing guidance to developers of biomedical IR and IE systems, and can be used as a gold-standard evaluation set for full-text IR tasks.
Automatic annotation of histopathological images using a latent topic model based on non-negative matrix factorization

PubMed Central

Cruz-Roa, Angel; Díaz, Gloria; Romero, Eduardo; González, Fabio A.

2011-01-01

Histopathological images are an important resource for clinical diagnosis and biomedical research. From an image understanding point of view, the automatic annotation of these images is a challenging problem. This paper presents a new method for automatic histopathological image annotation based on three complementary strategies, first, a part-based image representation, called the bag of features, which takes advantage of the natural redundancy of histopathological images for capturing the fundamental patterns of biological structures, second, a latent topic model, based on non-negative matrix factorization, which captures the high-level visual patterns hidden in the image, and, third, a probabilistic annotation model that links visual appearance of morphological and architectural features associated to 10 histopathological image annotations. The method was evaluated using 1,604 annotated images of skin tissues, which included normal and pathological architectural and morphological features, obtaining a recall of 74% and a precision of 50%, which improved a baseline annotation method based on support vector machines in a 64% and 24%, respectively. PMID:22811960
Automatic Annotation Method on Learners' Opinions in Case Method Discussion

ERIC Educational Resources Information Center

Samejima, Masaki; Hisakane, Daichi; Komoda, Norihisa

2015-01-01

Purpose: The purpose of this paper is to annotate an attribute of a problem, a solution or no annotation on learners' opinions automatically for supporting the learners' discussion without a facilitator. The case method aims at discussing problems and solutions in a target case. However, the learners miss discussing some of problems and solutions.…
Biotea: semantics for Pubmed Central.

PubMed

Garcia, Alexander; Lopez, Federico; Garcia, Leyla; Giraldo, Olga; Bucheli, Victor; Dumontier, Michel

2018-01-01

A significant portion of biomedical literature is represented in a manner that makes it difficult for consumers to find or aggregate content through a computational query. One approach to facilitate reuse of the scientific literature is to structure this information as linked data using standardized web technologies. In this paper we present the second version of Biotea, a semantic, linked data version of the open-access subset of PubMed Central that has been enhanced with specialized annotation pipelines that uses existing infrastructure from the National Center for Biomedical Ontology. We expose our models, services, software and datasets. Our infrastructure enables manual and semi-automatic annotation, resulting data are represented as RDF-based linked data and can be readily queried using the SPARQL query language. We illustrate the utility of our system with several use cases. Our datasets, methods and techniques are available at http://biotea.github.io.
ALLocator: an interactive web platform for the analysis of metabolomic LC-ESI-MS datasets, enabling semi-automated, user-revised compound annotation and mass isotopomer ratio analysis.

PubMed

Kessler, Nikolas; Walter, Frederik; Persicke, Marcus; Albaum, Stefan P; Kalinowski, Jörn; Goesmann, Alexander; Niehaus, Karsten; Nattkemper, Tim W

2014-01-01

Adduct formation, fragmentation events and matrix effects impose special challenges to the identification and quantitation of metabolites in LC-ESI-MS datasets. An important step in compound identification is the deconvolution of mass signals. During this processing step, peaks representing adducts, fragments, and isotopologues of the same analyte are allocated to a distinct group, in order to separate peaks from coeluting compounds. From these peak groups, neutral masses and pseudo spectra are derived and used for metabolite identification via mass decomposition and database matching. Quantitation of metabolites is hampered by matrix effects and nonlinear responses in LC-ESI-MS measurements. A common approach to correct for these effects is the addition of a U-13C-labeled internal standard and the calculation of mass isotopomer ratios for each metabolite. Here we present a new web-platform for the analysis of LC-ESI-MS experiments. ALLocator covers the workflow from raw data processing to metabolite identification and mass isotopomer ratio analysis. The integrated processing pipeline for spectra deconvolution "ALLocatorSD" generates pseudo spectra and automatically identifies peaks emerging from the U-13C-labeled internal standard. Information from the latter improves mass decomposition and annotation of neutral losses. ALLocator provides an interactive and dynamic interface to explore and enhance the results in depth. Pseudo spectra of identified metabolites can be stored in user- and method-specific reference lists that can be applied on succeeding datasets. The potential of the software is exemplified in an experiment, in which abundance fold-changes of metabolites of the l-arginine biosynthesis in C. glutamicum type strain ATCC 13032 and l-arginine producing strain ATCC 21831 are compared. Furthermore, the capability for detection and annotation of uncommon large neutral losses is shown by the identification of (γ-)glutamyl dipeptides in the same strains. ALLocator is available online at: https://allocator.cebitec.uni-bielefeld.de. A login is required, but freely available.
Interactive approach to segment organs at risk in radiotherapy treatment planning

NASA Astrophysics Data System (ADS)

Dolz, Jose; Kirisli, Hortense A.; Viard, Romain; Massoptier, Laurent

2014-03-01

Accurate delineation of organs at risk (OAR) is required for radiation treatment planning (RTP). However, it is a very time consuming and tedious task. The use in clinic of image guided radiation therapy (IGRT) becomes more and more popular, thus increasing the need of (semi-)automatic methods for delineation of the OAR. In this work, an interactive segmentation approach to delineate OAR is proposed and validated. The method is based on the combination of watershed transformation, which groups small areas of similar intensities in homogeneous labels, and graph cuts approach, which uses these labels to create the graph. Segmentation information can be added in any view - axial, sagittal or coronal -, making the interaction with the algorithm easy and fast. Subsequently, this information is propagated within the whole volume, providing a spatially coherent result. Manual delineations made by experts of 6 OAR - lungs, kidneys, liver, spleen, heart and aorta - over a set of 9 computed tomography (CT) scans were used as reference standard to validate the proposed approach. With a maximum of 4 interactions, a Dice similarity coefficient (DSC) higher than 0.87 was obtained, which demonstrates that, with the proposed segmentation approach, only few interactions are required to achieve similar results as the ones obtained manually. The integration of this method in the RTP process may save a considerable amount of time, and reduce the annotation complexity.
Semi-automatic knee cartilage segmentation

NASA Astrophysics Data System (ADS)

Dam, Erik B.; Folkesson, Jenny; Pettersen, Paola C.; Christiansen, Claus

2006-03-01

Osteo-Arthritis (OA) is a very common age-related cause of pain and reduced range of motion. A central effect of OA is wear-down of the articular cartilage that otherwise ensures smooth joint motion. Quantification of the cartilage breakdown is central in monitoring disease progression and therefore cartilage segmentation is required. Recent advances allow automatic cartilage segmentation with high accuracy in most cases. However, the automatic methods still fail in some problematic cases. For clinical studies, even if a few failing cases will be averaged out in the overall results, this reduces the mean accuracy and precision and thereby necessitates larger/longer studies. Since the severe OA cases are often most problematic for the automatic methods, there is even a risk that the quantification will introduce a bias in the results. Therefore, interactive inspection and correction of these problematic cases is desirable. For diagnosis on individuals, this is even more crucial since the diagnosis will otherwise simply fail. We introduce and evaluate a semi-automatic cartilage segmentation method combining an automatic pre-segmentation with an interactive step that allows inspection and correction. The automatic step consists of voxel classification based on supervised learning. The interactive step combines a watershed transformation of the original scan with the posterior probability map from the classification step at sub-voxel precision. We evaluate the method for the task of segmenting the tibial cartilage sheet from low-field magnetic resonance imaging (MRI) of knees. The evaluation shows that the combined method allows accurate and highly reproducible correction of the segmentation of even the worst cases in approximately ten minutes of interaction.
Automatic ultrasound image enhancement for 2D semi-automatic breast-lesion segmentation

NASA Astrophysics Data System (ADS)

Lu, Kongkuo; Hall, Christopher S.

2014-03-01

Breast cancer is the fastest growing cancer, accounting for 29%, of new cases in 2012, and second leading cause of cancer death among women in the United States and worldwide. Ultrasound (US) has been used as an indispensable tool for breast cancer detection/diagnosis and treatment. In computer-aided assistance, lesion segmentation is a preliminary but vital step, but the task is quite challenging in US images, due to imaging artifacts that complicate detection and measurement of the suspect lesions. The lesions usually present with poor boundary features and vary significantly in size, shape, and intensity distribution between cases. Automatic methods are highly application dependent while manual tracing methods are extremely time consuming and have a great deal of intra- and inter- observer variability. Semi-automatic approaches are designed to counterbalance the advantage and drawbacks of the automatic and manual methods. However, considerable user interaction might be necessary to ensure reasonable segmentation for a wide range of lesions. This work proposes an automatic enhancement approach to improve the boundary searching ability of the live wire method to reduce necessary user interaction while keeping the segmentation performance. Based on the results of segmentation of 50 2D breast lesions in US images, less user interaction is required to achieve desired accuracy, i.e. < 80%, when auto-enhancement is applied for live-wire segmentation.
CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences

PubMed Central

2012-01-01

Background The complete sequences of chloroplast genomes provide wealthy information regarding the evolutionary history of species. With the advance of next-generation sequencing technology, the number of completely sequenced chloroplast genomes is expected to increase exponentially, powerful computational tools annotating the genome sequences are in urgent need. Results We have developed a web server CPGAVAS. The server accepts a complete chloroplast genome sequence as input. First, it predicts protein-coding and rRNA genes based on the identification and mapping of the most similar, full-length protein, cDNA and rRNA sequences by integrating results from Blastx, Blastn, protein2genome and est2genome programs. Second, tRNA genes and inverted repeats (IR) are identified using tRNAscan, ARAGORN and vmatch respectively. Third, it calculates the summary statistics for the annotated genome. Fourth, it generates a circular map ready for publication. Fifth, it can create a Sequin file for GenBank submission. Last, it allows the extractions of protein and mRNA sequences for given list of genes and species. The annotation results in GFF3 format can be edited using any compatible annotation editing tools. The edited annotations can then be uploaded to CPGAVAS for update and re-analyses repeatedly. Using known chloroplast genome sequences as test set, we show that CPGAVAS performs comparably to another application DOGMA, while having several superior functionalities. Conclusions CPGAVAS allows the semi-automatic and complete annotation of a chloroplast genome sequence, and the visualization, editing and analysis of the annotation results. It will become an indispensible tool for researchers studying chloroplast genomes. The software is freely accessible from http://www.herbalgenomics.org/cpgavas. PMID:23256920
Multiple sclerosis lesion segmentation using an automatic multimodal graph cuts.

PubMed

García-Lorenzo, Daniel; Lecoeur, Jeremy; Arnold, Douglas L; Collins, D Louis; Barillot, Christian

2009-01-01

Graph Cuts have been shown as a powerful interactive segmentation technique in several medical domains. We propose to automate the Graph Cuts in order to automatically segment Multiple Sclerosis (MS) lesions in MRI. We replace the manual interaction with a robust EM-based approach in order to discriminate between MS lesions and the Normal Appearing Brain Tissues (NABT). Evaluation is performed in synthetic and real images showing good agreement between the automatic segmentation and the target segmentation. We compare our algorithm with the state of the art techniques and with several manual segmentations. An advantage of our algorithm over previously published ones is the possibility to semi-automatically improve the segmentation due to the Graph Cuts interactive feature.
Standardized evaluation framework for evaluating coronary artery stenosis detection, stenosis quantification and lumen segmentation algorithms in computed tomography angiography.

PubMed

Kirişli, H A; Schaap, M; Metz, C T; Dharampal, A S; Meijboom, W B; Papadopoulou, S L; Dedic, A; Nieman, K; de Graaf, M A; Meijs, M F L; Cramer, M J; Broersen, A; Cetin, S; Eslami, A; Flórez-Valencia, L; Lor, K L; Matuszewski, B; Melki, I; Mohr, B; Oksüz, I; Shahzad, R; Wang, C; Kitslaar, P H; Unal, G; Katouzian, A; Örkisz, M; Chen, C M; Precioso, F; Najman, L; Masood, S; Ünay, D; van Vliet, L; Moreno, R; Goldenberg, R; Vuçini, E; Krestin, G P; Niessen, W J; van Walsum, T

2013-12-01

Though conventional coronary angiography (CCA) has been the standard of reference for diagnosing coronary artery disease in the past decades, computed tomography angiography (CTA) has rapidly emerged, and is nowadays widely used in clinical practice. Here, we introduce a standardized evaluation framework to reliably evaluate and compare the performance of the algorithms devised to detect and quantify the coronary artery stenoses, and to segment the coronary artery lumen in CTA data. The objective of this evaluation framework is to demonstrate the feasibility of dedicated algorithms to: (1) (semi-)automatically detect and quantify stenosis on CTA, in comparison with quantitative coronary angiography (QCA) and CTA consensus reading, and (2) (semi-)automatically segment the coronary lumen on CTA, in comparison with expert's manual annotation. A database consisting of 48 multicenter multivendor cardiac CTA datasets with corresponding reference standards are described and made available. The algorithms from 11 research groups were quantitatively evaluated and compared. The results show that (1) some of the current stenosis detection/quantification algorithms may be used for triage or as a second-reader in clinical practice, and that (2) automatic lumen segmentation is possible with a precision similar to that obtained by experts. The framework is open for new submissions through the website, at http://coronary.bigr.nl/stenoses/. Copyright © 2013 Elsevier B.V. All rights reserved.
Modeling loosely annotated images using both given and imagined annotations

NASA Astrophysics Data System (ADS)

Tang, Hong; Boujemaa, Nozha; Chen, Yunhao; Deng, Lei

2011-12-01

In this paper, we present an approach to learn latent semantic analysis models from loosely annotated images for automatic image annotation and indexing. The given annotation in training images is loose due to: 1. ambiguous correspondences between visual features and annotated keywords; 2. incomplete lists of annotated keywords. The second reason motivates us to enrich the incomplete annotation in a simple way before learning a topic model. In particular, some ``imagined'' keywords are poured into the incomplete annotation through measuring similarity between keywords in terms of their co-occurrence. Then, both given and imagined annotations are employed to learn probabilistic topic models for automatically annotating new images. We conduct experiments on two image databases (i.e., Corel and ESP) coupled with their loose annotations, and compare the proposed method with state-of-the-art discrete annotation methods. The proposed method improves word-driven probability latent semantic analysis (PLSA-words) up to a comparable performance with the best discrete annotation method, while a merit of PLSA-words is still kept, i.e., a wider semantic range.
An ontology-based framework for bioinformatics workflows.

PubMed

Digiampietri, Luciano A; Perez-Alcazar, Jose de J; Medeiros, Claudia Bauzer

2007-01-01

The proliferation of bioinformatics activities brings new challenges - how to understand and organise these resources, how to exchange and reuse successful experimental procedures, and to provide interoperability among data and tools. This paper describes an effort toward these directions. It is based on combining research on ontology management, AI and scientific workflows to design, reuse and annotate bioinformatics experiments. The resulting framework supports automatic or interactive composition of tasks based on AI planning techniques and takes advantage of ontologies to support the specification and annotation of bioinformatics workflows. We validate our proposal with a prototype running on real data.
Preclinical Biokinetic Modelling of Tc-99m Radiophamaceuticals Obtained from Semi-Automatic Image Processing.

PubMed

Cornejo-Aragón, Luz G; Santos-Cuevas, Clara L; Ocampo-García, Blanca E; Chairez-Oria, Isaac; Diaz-Nieto, Lorenza; García-Quiroz, Janice

2017-01-01

The aim of this study was to develop a semi automatic image processing algorithm (AIPA) based on the simultaneous information provided by X-ray and radioisotopic images to determine the biokinetic models of Tc-99m radiopharmaceuticals from quantification of image radiation activity in murine models. These radioisotopic images were obtained by a CCD (charge couple device) camera coupled to an ultrathin phosphorous screen in a preclinical multimodal imaging system (Xtreme, Bruker). The AIPA consisted of different image processing methods for background, scattering and attenuation correction on the activity quantification. A set of parametric identification algorithms was used to obtain the biokinetic models that characterize the interaction between different tissues and the radiopharmaceuticals considered in the study. The set of biokinetic models corresponded to the Tc-99m biodistribution observed in different ex vivo studies. This fact confirmed the contribution of the semi-automatic image processing technique developed in this study.
Sma3s: a three-step modular annotator for large sequence datasets.

PubMed

Muñoz-Mérida, Antonio; Viguera, Enrique; Claros, M Gonzalo; Trelles, Oswaldo; Pérez-Pulido, Antonio J

2014-08-01

Automatic sequence annotation is an essential component of modern 'omics' studies, which aim to extract information from large collections of sequence data. Most existing tools use sequence homology to establish evolutionary relationships and assign putative functions to sequences. However, it can be difficult to define a similarity threshold that achieves sufficient coverage without sacrificing annotation quality. Defining the correct configuration is critical and can be challenging for non-specialist users. Thus, the development of robust automatic annotation techniques that generate high-quality annotations without needing expert knowledge would be very valuable for the research community. We present Sma3s, a tool for automatically annotating very large collections of biological sequences from any kind of gene library or genome. Sma3s is composed of three modules that progressively annotate query sequences using either: (i) very similar homologues, (ii) orthologous sequences or (iii) terms enriched in groups of homologous sequences. We trained the system using several random sets of known sequences, demonstrating average sensitivity and specificity values of ~85%. In conclusion, Sma3s is a versatile tool for high-throughput annotation of a wide variety of sequence datasets that outperforms the accuracy of other well-established annotation algorithms, and it can enrich existing database annotations and uncover previously hidden features. Importantly, Sma3s has already been used in the functional annotation of two published transcriptomes. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

larvalign: Aligning Gene Expression Patterns from the Larval Brain of Drosophila melanogaster.

PubMed

Muenzing, Sascha E A; Strauch, Martin; Truman, James W; Bühler, Katja; Thum, Andreas S; Merhof, Dorit

2018-01-01

The larval brain of the fruit fly Drosophila melanogaster is a small, tractable model system for neuroscience. Genes for fluorescent marker proteins can be expressed in defined, spatially restricted neuron populations. Here, we introduce the methods for 1) generating a standard template of the larval central nervous system (CNS), 2) spatial mapping of expression patterns from different larvae into a reference space defined by the standard template. We provide a manually annotated gold standard that serves for evaluation of the registration framework involved in template generation and mapping. A method for registration quality assessment enables the automatic detection of registration errors, and a semi-automatic registration method allows one to correct registrations, which is a prerequisite for a high-quality, curated database of expression patterns. All computational methods are available within the larvalign software package: https://github.com/larvalign/larvalign/releases/tag/v1.0.
Real-time pulse oximetry artifact annotation on computerized anaesthetic records.

PubMed

Gostt, Richard Karl; Rathbone, Graeme Dennis; Tucker, Adam Paul

2002-01-01

Adoption of computerised anaesthesia record keeping systems has been limited by the concern that they record artifactual data and accurate data indiscriminately. Data resulting from artifacts does not reflect the patient's true condition and presents a problem in later analysis of the record, with associated medico-legal implications. This study developed an algorithm to automatically annotate pulse oximetry artifacts and sought to evaluate the algorithm's accuracy in routine surgical procedures. MacAnaesthetist is a semi-automatic anaesthetic record keeping system developed for the Apple Macintosh computer, which incorporated an algorithm designed to automatically detect pulse oximetry artifacts. The algorithm labeled artifactual oxygen saturation values < 90%. This was done in real-time by analyzing physiological data captured from a Datex AS/3 Anaesthesia Monitor. An observational study was conducted to evaluate the accuracy of the algorithm during routine surgical procedures (n = 20). An anaesthetic record was made by an anaesthetist using the Datex AS/3 record keeper, while a second anaesthetic record was produced in parallel using MacAnaesthetist. A copy of the Datex AS/3 record was kept for later review by a group of anaesthetists (n = 20), who judged oxygen saturation values < 90% to be either genuine or artifact. MacAnaesthetist correctly labeled 12 out of 13 oxygen saturations < 90% (92.3% accuracy). A post-operative review of the Datex AS/3 anaesthetic records (n = 8) by twenty anaesthetists resulted in 127 correct responses out of total of 200 (63.5% accuracy). The remaining Datex AS/3 records (n = 12) were not reviewed, as they did not contain any oxygen saturations <90%. The real-time artifact detection algorithm developed in this study was more accurate than anaesthetists who post-operatively reviewed records produced by an existing computerised anaesthesia record keeping system. Algorithms have the potential to more accurately identify and annotate artifacts on computerised anaesthetic records, assisting clinicians to more correctly interpret abnormal data.
Haptic exploratory behavior during object discrimination: a novel automatic annotation method.

PubMed

Jansen, Sander E M; Bergmann Tiest, Wouter M; Kappers, Astrid M L

2015-01-01

In order to acquire information concerning the geometry and material of handheld objects, people tend to execute stereotypical hand movement patterns called haptic Exploratory Procedures (EPs). Manual annotation of haptic exploration trials with these EPs is a laborious task that is affected by subjectivity, attentional lapses, and viewing angle limitations. In this paper we propose an automatic EP annotation method based on position and orientation data from motion tracking sensors placed on both hands and inside a stimulus. A set of kinematic variables is computed from these data and compared to sets of predefined criteria for each of four EPs. Whenever all criteria for a specific EP are met, it is assumed that that particular hand movement pattern was performed. This method is applied to data from an experiment where blindfolded participants haptically discriminated between objects differing in hardness, roughness, volume, and weight. In order to validate the method, its output is compared to manual annotation based on video recordings of the same trials. Although mean pairwise agreement is less between human-automatic pairs than between human-human pairs (55.7% vs 74.5%), the proposed method performs much better than random annotation (2.4%). Furthermore, each EP is linked to a specific object property for which it is optimal (e.g., Lateral Motion for roughness). We found that the percentage of trials where the expected EP was found does not differ between manual and automatic annotation. For now, this method cannot yet completely replace a manual annotation procedure. However, it could be used as a starting point that can be supplemented by manual annotation.
Semi-Supervised Learning to Identify UMLS Semantic Relations.

PubMed

Luo, Yuan; Uzuner, Ozlem

2014-01-01

The UMLS Semantic Network is constructed by experts and requires periodic expert review to update. We propose and implement a semi-supervised approach for automatically identifying UMLS semantic relations from narrative text in PubMed. Our method analyzes biomedical narrative text to collect semantic entity pairs, and extracts multiple semantic, syntactic and orthographic features for the collected pairs. We experiment with seeded k-means clustering with various distance metrics. We create and annotate a ground truth corpus according to the top two levels of the UMLS semantic relation hierarchy. We evaluate our system on this corpus and characterize the learning curves of different clustering configuration. Using KL divergence consistently performs the best on the held-out test data. With full seeding, we obtain macro-averaged F-measures above 70% for clustering the top level UMLS relations (2-way), and above 50% for clustering the second level relations (7-way).
Cataloging the biomedical world of pain through semi-automated curation of molecular interactions

PubMed Central

Jamieson, Daniel G.; Roberts, Phoebe M.; Robertson, David L.; Sidders, Ben; Nenadic, Goran

2013-01-01

The vast collection of biomedical literature and its continued expansion has presented a number of challenges to researchers who require structured findings to stay abreast of and analyze molecular mechanisms relevant to their domain of interest. By structuring literature content into topic-specific machine-readable databases, the aggregate data from multiple articles can be used to infer trends that can be compared and contrasted with similar findings from topic-independent resources. Our study presents a generalized procedure for semi-automatically creating a custom topic-specific molecular interaction database through the use of text mining to assist manual curation. We apply the procedure to capture molecular events that underlie ‘pain’, a complex phenomenon with a large societal burden and unmet medical need. We describe how existing text mining solutions are used to build a pain-specific corpus, extract molecular events from it, add context to the extracted events and assess their relevance. The pain-specific corpus contains 765 692 documents from Medline and PubMed Central, from which we extracted 356 499 unique normalized molecular events, with 261 438 single protein events and 93 271 molecular interactions supplied by BioContext. Event chains are annotated with negation, speculation, anatomy, Gene Ontology terms, mutations, pain and disease relevance, which collectively provide detailed insight into how that event chain is associated with pain. The extracted relations are visualized in a wiki platform (wiki-pain.org) that enables efficient manual curation and exploration of the molecular mechanisms that underlie pain. Curation of 1500 grouped event chains ranked by pain relevance revealed 613 accurately extracted unique molecular interactions that in the future can be used to study the underlying mechanisms involved in pain. Our approach demonstrates that combining existing text mining tools with domain-specific terms and wiki-based visualization can facilitate rapid curation of molecular interactions to create a custom database. Database URL: ••• PMID:23707966
Cataloging the biomedical world of pain through semi-automated curation of molecular interactions.

PubMed

Jamieson, Daniel G; Roberts, Phoebe M; Robertson, David L; Sidders, Ben; Nenadic, Goran

2013-01-01

The vast collection of biomedical literature and its continued expansion has presented a number of challenges to researchers who require structured findings to stay abreast of and analyze molecular mechanisms relevant to their domain of interest. By structuring literature content into topic-specific machine-readable databases, the aggregate data from multiple articles can be used to infer trends that can be compared and contrasted with similar findings from topic-independent resources. Our study presents a generalized procedure for semi-automatically creating a custom topic-specific molecular interaction database through the use of text mining to assist manual curation. We apply the procedure to capture molecular events that underlie 'pain', a complex phenomenon with a large societal burden and unmet medical need. We describe how existing text mining solutions are used to build a pain-specific corpus, extract molecular events from it, add context to the extracted events and assess their relevance. The pain-specific corpus contains 765 692 documents from Medline and PubMed Central, from which we extracted 356 499 unique normalized molecular events, with 261 438 single protein events and 93 271 molecular interactions supplied by BioContext. Event chains are annotated with negation, speculation, anatomy, Gene Ontology terms, mutations, pain and disease relevance, which collectively provide detailed insight into how that event chain is associated with pain. The extracted relations are visualized in a wiki platform (wiki-pain.org) that enables efficient manual curation and exploration of the molecular mechanisms that underlie pain. Curation of 1500 grouped event chains ranked by pain relevance revealed 613 accurately extracted unique molecular interactions that in the future can be used to study the underlying mechanisms involved in pain. Our approach demonstrates that combining existing text mining tools with domain-specific terms and wiki-based visualization can facilitate rapid curation of molecular interactions to create a custom database. Database URL: •••
A fully automatic end-to-end method for content-based image retrieval of CT scans with similar liver lesion annotations.

PubMed

Spanier, A B; Caplan, N; Sosna, J; Acar, B; Joskowicz, L

2018-01-01

The goal of medical content-based image retrieval (M-CBIR) is to assist radiologists in the decision-making process by retrieving medical cases similar to a given image. One of the key interests of radiologists is lesions and their annotations, since the patient treatment depends on the lesion diagnosis. Therefore, a key feature of M-CBIR systems is the retrieval of scans with the most similar lesion annotations. To be of value, M-CBIR systems should be fully automatic to handle large case databases. We present a fully automatic end-to-end method for the retrieval of CT scans with similar liver lesion annotations. The input is a database of abdominal CT scans labeled with liver lesions, a query CT scan, and optionally one radiologist-specified lesion annotation of interest. The output is an ordered list of the database CT scans with the most similar liver lesion annotations. The method starts by automatically segmenting the liver in the scan. It then extracts a histogram-based features vector from the segmented region, learns the features' relative importance, and ranks the database scans according to the relative importance measure. The main advantages of our method are that it fully automates the end-to-end querying process, that it uses simple and efficient techniques that are scalable to large datasets, and that it produces quality retrieval results using an unannotated CT scan. Our experimental results on 9 CT queries on a dataset of 41 volumetric CT scans from the 2014 Image CLEF Liver Annotation Task yield an average retrieval accuracy (Normalized Discounted Cumulative Gain index) of 0.77 and 0.84 without/with annotation, respectively. Fully automatic end-to-end retrieval of similar cases based on image information alone, rather that on disease diagnosis, may help radiologists to better diagnose liver lesions.
GARNET--gene set analysis with exploration of annotation relations.

PubMed

Rho, Kyoohyoung; Kim, Bumjin; Jang, Youngjun; Lee, Sanghyun; Bae, Taejeong; Seo, Jihae; Seo, Chaehwa; Lee, Jihyun; Kang, Hyunjung; Yu, Ungsik; Kim, Sunghoon; Lee, Sanghyuk; Kim, Wan Kyu

2011-02-15

Gene set analysis is a powerful method of deducing biological meaning for an a priori defined set of genes. Numerous tools have been developed to test statistical enrichment or depletion in specific pathways or gene ontology (GO) terms. Major difficulties towards biological interpretation are integrating diverse types of annotation categories and exploring the relationships between annotation terms of similar information. GARNET (Gene Annotation Relationship NEtwork Tools) is an integrative platform for gene set analysis with many novel features. It includes tools for retrieval of genes from annotation database, statistical analysis & visualization of annotation relationships, and managing gene sets. In an effort to allow access to a full spectrum of amassed biological knowledge, we have integrated a variety of annotation data that include the GO, domain, disease, drug, chromosomal location, and custom-defined annotations. Diverse types of molecular networks (pathways, transcription and microRNA regulations, protein-protein interaction) are also included. The pair-wise relationship between annotation gene sets was calculated using kappa statistics. GARNET consists of three modules--gene set manager, gene set analysis and gene set retrieval, which are tightly integrated to provide virtually automatic analysis for gene sets. A dedicated viewer for annotation network has been developed to facilitate exploration of the related annotations. GARNET (gene annotation relationship network tools) is an integrative platform for diverse types of gene set analysis, where complex relationships among gene annotations can be easily explored with an intuitive network visualization tool (http://garnet.isysbio.org/ or http://ercsb.ewha.ac.kr/garnet/).
Exogean: a framework for annotating protein-coding genes in eukaryotic genomic DNA

PubMed Central

Djebali, Sarah; Delaplace, Franck; Crollius, Hugues Roest

2006-01-01

Background Accurate and automatic gene identification in eukaryotic genomic DNA is more than ever of crucial importance to efficiently exploit the large volume of assembled genome sequences available to the community. Automatic methods have always been considered less reliable than human expertise. This is illustrated in the EGASP project, where reference annotations against which all automatic methods are measured are generated by human annotators and experimentally verified. We hypothesized that replicating the accuracy of human annotators in an automatic method could be achieved by formalizing the rules and decisions that they use, in a mathematical formalism. Results We have developed Exogean, a flexible framework based on directed acyclic colored multigraphs (DACMs) that can represent biological objects (for example, mRNA, ESTs, protein alignments, exons) and relationships between them. Graphs are analyzed to process the information according to rules that replicate those used by human annotators. Simple individual starting objects given as input to Exogean are thus combined and synthesized into complex objects such as protein coding transcripts. Conclusion We show here, in the context of the EGASP project, that Exogean is currently the method that best reproduces protein coding gene annotations from human experts, in terms of identifying at least one exact coding sequence per gene. We discuss current limitations of the method and several avenues for improvement. PMID:16925841
Response Evaluation of Malignant Liver Lesions After TACE/SIRT: Comparison of Manual and Semi-Automatic Measurement of Different Response Criteria in Multislice CT.

PubMed

Höink, Anna Janina; Schülke, Christoph; Koch, Raphael; Löhnert, Annika; Kammerer, Sara; Fortkamp, Rasmus; Heindel, Walter; Buerke, Boris

2017-11-01

Purpose To compare measurement precision and interobserver variability in the evaluation of hepatocellular carcinoma (HCC) and liver metastases in MSCT before and after transarterial local ablative therapies. Materials and Methods Retrospective study of 72 patients with malignant liver lesions (42 metastases; 30 HCCs) before and after therapy (43 SIRT procedures; 29 TACE procedures). Established (LAD; SAD; WHO) and vitality-based parameters (mRECIST; mLAD; mSAD; EASL) were assessed manually and semi-automatically by two readers. The relative interobserver difference (RID) and intraclass correlation coefficient (ICC) were calculated. Results The median RID for vitality-based parameters was lower from semi-automatic than from manual measurement of mLAD (manual 12.5 %; semi-automatic 3.4 %), mSAD (manual 12.7 %; semi-automatic 5.7 %) and EASL (manual 10.4 %; semi-automatic 1.8 %). The difference in established parameters was not statistically noticeable (p > 0.05). The ICCs of LAD (manual 0.984; semi-automatic 0.982), SAD (manual 0.975; semi-automatic 0.958) and WHO (manual 0.984; semi-automatic 0.978) are high, both in manual and semi-automatic measurements. The ICCs of manual measurements of mLAD (0.897), mSAD (0.844) and EASL (0.875) are lower. This decrease cannot be found in semi-automatic measurements of mLAD (0.997), mSAD (0.992) and EASL (0.998). Conclusion Vitality-based tumor measurements of HCC and metastases after transarterial local therapies should be performed semi-automatically due to greater measurement precision, thus increasing the reproducibility and in turn the reliability of therapeutic decisions. Key points · Liver lesion measurements according to EASL and mRECIST are more precise when performed semi-automatically.. · The higher reproducibility may facilitate a more reliable classification of therapy response.. · Measurements according to RECIST and WHO offer equivalent precision semi-automatically and manually.. Citation Format · Höink AJ, Schülke C, Koch R et al. Response Evaluation of Malignant Liver Lesions After TACE/SIRT: Comparison of Manual and Semi-Automatic Measurement of Different Response Criteria in Multislice CT. Fortschr Röntgenstr 2017; 189: 1067 - 1075. © Georg Thieme Verlag KG Stuttgart · New York.
Morphosyntactic annotation of CHILDES transcripts*

PubMed Central

SAGAE, KENJI; DAVIS, ERIC; LAVIE, ALON; MACWHINNEY, BRIAN; WINTNER, SHULY

2014-01-01

Corpora of child language are essential for research in child language acquisition and psycholinguistics. Linguistic annotation of the corpora provides researchers with better means for exploring the development of grammatical constructions and their usage. We describe a project whose goal is to annotate the English section of the CHILDES database with grammatical relations in the form of labeled dependency structures. We have produced a corpus of over 18,800 utterances (approximately 65,000 words) with manually curated gold-standard grammatical relation annotations. Using this corpus, we have developed a highly accurate data-driven parser for the English CHILDES data, which we used to automatically annotate the remainder of the English section of CHILDES. We have also extended the parser to Spanish, and are currently working on supporting more languages. The parser and the manually and automatically annotated data are freely available for research purposes. PMID:20334720
Critical Assessment of Small Molecule Identification 2016: automated methods.

PubMed

Schymanski, Emma L; Ruttkies, Christoph; Krauss, Martin; Brouard, Céline; Kind, Tobias; Dührkop, Kai; Allen, Felicity; Vaniya, Arpana; Verdegem, Dries; Böcker, Sebastian; Rousu, Juho; Shen, Huibin; Tsugawa, Hiroshi; Sajed, Tanvir; Fiehn, Oliver; Ghesquière, Bart; Neumann, Steffen

2017-03-27

The fourth round of the Critical Assessment of Small Molecule Identification (CASMI) Contest ( www.casmi-contest.org ) was held in 2016, with two new categories for automated methods. This article covers the 208 challenges in Categories 2 and 3, without and with metadata, from organization, participation, results and post-contest evaluation of CASMI 2016 through to perspectives for future contests and small molecule annotation/identification. The Input Output Kernel Regression (CSI:IOKR) machine learning approach performed best in "Category 2: Best Automatic Structural Identification-In Silico Fragmentation Only", won by Team Brouard with 41% challenge wins. The winner of "Category 3: Best Automatic Structural Identification-Full Information" was Team Kind (MS-FINDER), with 76% challenge wins. The best methods were able to achieve over 30% Top 1 ranks in Category 2, with all methods ranking the correct candidate in the Top 10 in around 50% of challenges. This success rate rose to 70% Top 1 ranks in Category 3, with candidates in the Top 10 in over 80% of the challenges. The machine learning and chemistry-based approaches are shown to perform in complementary ways. The improvement in (semi-)automated fragmentation methods for small molecule identification has been substantial. The achieved high rates of correct candidates in the Top 1 and Top 10, despite large candidate numbers, open up great possibilities for high-throughput annotation of untargeted analysis for "known unknowns". As more high quality training data becomes available, the improvements in machine learning methods will likely continue, but the alternative approaches still provide valuable complementary information. Improved integration of experimental context will also improve identification success further for "real life" annotations. The true "unknown unknowns" remain to be evaluated in future CASMI contests. Graphical abstract .
Automatic recognition of conceptualization zones in scientific articles and two life science applications.

PubMed

Liakata, Maria; Saha, Shyamasree; Dobnik, Simon; Batchelor, Colin; Rebholz-Schuhmann, Dietrich

2012-04-01

Scholarly biomedical publications report on the findings of a research investigation. Scientists use a well-established discourse structure to relate their work to the state of the art, express their own motivation and hypotheses and report on their methods, results and conclusions. In previous work, we have proposed ways to explicitly annotate the structure of scientific investigations in scholarly publications. Here we present the means to facilitate automatic access to the scientific discourse of articles by automating the recognition of 11 categories at the sentence level, which we call Core Scientific Concepts (CoreSCs). These include: Hypothesis, Motivation, Goal, Object, Background, Method, Experiment, Model, Observation, Result and Conclusion. CoreSCs provide the structure and context to all statements and relations within an article and their automatic recognition can greatly facilitate biomedical information extraction by characterizing the different types of facts, hypotheses and evidence available in a scientific publication. We have trained and compared machine learning classifiers (support vector machines and conditional random fields) on a corpus of 265 full articles in biochemistry and chemistry to automatically recognize CoreSCs. We have evaluated our automatic classifications against a manually annotated gold standard, and have achieved promising accuracies with 'Experiment', 'Background' and 'Model' being the categories with the highest F1-scores (76%, 62% and 53%, respectively). We have analysed the task of CoreSC annotation both from a sentence classification as well as sequence labelling perspective and we present a detailed feature evaluation. The most discriminative features are local sentence features such as unigrams, bigrams and grammatical dependencies while features encoding the document structure, such as section headings, also play an important role for some of the categories. We discuss the usefulness of automatically generated CoreSCs in two biomedical applications as well as work in progress. A web-based tool for the automatic annotation of articles with CoreSCs and corresponding documentation is available online at http://www.sapientaproject.com/software http://www.sapientaproject.com also contains detailed information pertaining to CoreSC annotation and links to annotation guidelines as well as a corpus of manually annotated articles, which served as our training data. liakata@ebi.ac.uk Supplementary data are available at Bioinformatics online.
Extraction of sandy bedforms features through geodesic morphometry

NASA Astrophysics Data System (ADS)

Debese, Nathalie; Jacq, Jean-José; Garlan, Thierry

2016-09-01

State-of-art echosounders reveal fine-scale details of mobile sandy bedforms, which are commonly found on continental shelfs. At present, their dynamics are still far from being completely understood. These bedforms are a serious threat to navigation security, anthropic structures and activities, placing emphasis on research breakthroughs. Bedform geometries and their dynamics are closely linked; therefore, one approach is to develop semi-automatic tools aiming at extracting their structural features from bathymetric datasets. Current approaches mimic manual processes or rely on morphological simplification of bedforms. The 1D and 2D approaches cannot address the wide ranges of both types and complexities of bedforms. In contrast, this work attempts to follow a 3D global semi-automatic approach based on a bathymetric TIN. The currently extracted primitives are the salient ridge and valley lines of the sand structures, i.e., waves and mega-ripples. The main difficulty is eliminating the ripples that are found to heavily overprint any observations. To this end, an anisotropic filter that is able to discard these structures while still enhancing the wave ridges is proposed. The second part of the work addresses the semi-automatic interactive extraction and 3D augmented display of the main lines structures. The proposed protocol also allows geoscientists to interactively insert topological constraints.
Quantitative analysis of the patellofemoral motion pattern using semi-automatic processing of 4D CT data.

PubMed

Forsberg, Daniel; Lindblom, Maria; Quick, Petter; Gauffin, Håkan

2016-09-01

To present a semi-automatic method with minimal user interaction for quantitative analysis of the patellofemoral motion pattern. 4D CT data capturing the patellofemoral motion pattern of a continuous flexion and extension were collected for five patients prone to patellar luxation both pre- and post-surgically. For the proposed method, an observer would place landmarks in a single 3D volume, which then are automatically propagated to the other volumes in a time sequence. From the landmarks in each volume, the measures patellar displacement, patellar tilt and angle between femur and tibia were computed. Evaluation of the observer variability showed the proposed semi-automatic method to be favorable over a fully manual counterpart, with an observer variability of approximately 1.5[Formula: see text] for the angle between femur and tibia, 1.5 mm for the patellar displacement, and 4.0[Formula: see text]-5.0[Formula: see text] for the patellar tilt. The proposed method showed that surgery reduced the patellar displacement and tilt at maximum extension with approximately 10-15 mm and 15[Formula: see text]-20[Formula: see text] for three patients but with less evident differences for two of the patients. A semi-automatic method suitable for quantification of the patellofemoral motion pattern as captured by 4D CT data has been presented. Its observer variability is on par with that of other methods but with the distinct advantage to support continuous motions during the image acquisition.
A semi-automatic 2D-to-3D video conversion with adaptive key-frame selection

NASA Astrophysics Data System (ADS)

Ju, Kuanyu; Xiong, Hongkai

2014-11-01

To compensate the deficit of 3D content, 2D to 3D video conversion (2D-to-3D) has recently attracted more attention from both industrial and academic communities. The semi-automatic 2D-to-3D conversion which estimates corresponding depth of non-key-frames through key-frames is more desirable owing to its advantage of balancing labor cost and 3D effects. The location of key-frames plays a role on quality of depth propagation. This paper proposes a semi-automatic 2D-to-3D scheme with adaptive key-frame selection to keep temporal continuity more reliable and reduce the depth propagation errors caused by occlusion. The potential key-frames would be localized in terms of clustered color variation and motion intensity. The distance of key-frame interval is also taken into account to keep the accumulated propagation errors under control and guarantee minimal user interaction. Once their depth maps are aligned with user interaction, the non-key-frames depth maps would be automatically propagated by shifted bilateral filtering. Considering that depth of objects may change due to the objects motion or camera zoom in/out effect, a bi-directional depth propagation scheme is adopted where a non-key frame is interpolated from two adjacent key frames. The experimental results show that the proposed scheme has better performance than existing 2D-to-3D scheme with fixed key-frame interval.
A Case Study on Sepsis Using PubMed and Deep Learning for Ontology Learning.

PubMed

Arguello Casteleiro, Mercedes; Maseda Fernandez, Diego; Demetriou, George; Read, Warren; Fernandez Prieto, Maria Jesus; Des Diz, Julio; Nenadic, Goran; Keane, John; Stevens, Robert

2017-01-01

We investigate the application of distributional semantics models for facilitating unsupervised extraction of biomedical terms from unannotated corpora. Term extraction is used as the first step of an ontology learning process that aims to (semi-)automatic annotation of biomedical concepts and relations from more than 300K PubMed titles and abstracts. We experimented with both traditional distributional semantics methods such as Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) as well as the neural language models CBOW and Skip-gram from Deep Learning. The evaluation conducted concentrates on sepsis, a major life-threatening condition, and shows that Deep Learning models outperform LSA and LDA with much higher precision.
Generating Customized Verifiers for Automatically Generated Code

NASA Technical Reports Server (NTRS)

Denney, Ewen; Fischer, Bernd

2008-01-01

Program verification using Hoare-style techniques requires many logical annotations. We have previously developed a generic annotation inference algorithm that weaves in all annotations required to certify safety properties for automatically generated code. It uses patterns to capture generator- and property-specific code idioms and property-specific meta-program fragments to construct the annotations. The algorithm is customized by specifying the code patterns and integrating them with the meta-program fragments for annotation construction. However, this is difficult since it involves tedious and error-prone low-level term manipulations. Here, we describe an annotation schema compiler that largely automates this customization task using generative techniques. It takes a collection of high-level declarative annotation schemas tailored towards a specific code generator and safety property, and generates all customized analysis functions and glue code required for interfacing with the generic algorithm core, thus effectively creating a customized annotation inference algorithm. The compiler raises the level of abstraction and simplifies schema development and maintenance. It also takes care of some more routine aspects of formulating patterns and schemas, in particular handling of irrelevant program fragments and irrelevant variance in the program structure, which reduces the size, complexity, and number of different patterns and annotation schemas that are required. The improvements described here make it easier and faster to customize the system to a new safety property or a new generator, and we demonstrate this by customizing it to certify frame safety of space flight navigation code that was automatically generated from Simulink models by MathWorks' Real-Time Workshop.
Automatic 2.5-D Facial Landmarking and Emotion Annotation for Social Interaction Assistance.

PubMed

Zhao, Xi; Zou, Jianhua; Li, Huibin; Dellandrea, Emmanuel; Kakadiaris, Ioannis A; Chen, Liming

2016-09-01

People with low vision, Alzheimer's disease, and autism spectrum disorder experience difficulties in perceiving or interpreting facial expression of emotion in their social lives. Though automatic facial expression recognition (FER) methods on 2-D videos have been extensively investigated, their performance was constrained by challenges in head pose and lighting conditions. The shape information in 3-D facial data can reduce or even overcome these challenges. However, high expenses of 3-D cameras prevent their widespread use. Fortunately, 2.5-D facial data from emerging portable RGB-D cameras provide a good balance for this dilemma. In this paper, we propose an automatic emotion annotation solution on 2.5-D facial data collected from RGB-D cameras. The solution consists of a facial landmarking method and a FER method. Specifically, we propose building a deformable partial face model and fit the model to a 2.5-D face for localizing facial landmarks automatically. In FER, a novel action unit (AU) space-based FER method has been proposed. Facial features are extracted using landmarks and further represented as coordinates in the AU space, which are classified into facial expressions. Evaluated on three publicly accessible facial databases, namely EURECOM, FRGC, and Bosphorus databases, the proposed facial landmarking and expression recognition methods have achieved satisfactory results. Possible real-world applications using our algorithms have also been discussed.
Overview of the gene ontology task at BioCreative IV.

PubMed

Mao, Yuqing; Van Auken, Kimberly; Li, Donghui; Arighi, Cecilia N; McQuilton, Peter; Hayman, G Thomas; Tweedie, Susan; Schaeffer, Mary L; Laulederkind, Stanley J F; Wang, Shur-Jen; Gobeill, Julien; Ruch, Patrick; Luu, Anh Tuan; Kim, Jung-Jae; Chiang, Jung-Hsien; Chen, Yu-De; Yang, Chia-Jung; Liu, Hongfang; Zhu, Dongqing; Li, Yanpeng; Yu, Hong; Emadzadeh, Ehsan; Gonzalez, Graciela; Chen, Jian-Ming; Dai, Hong-Jie; Lu, Zhiyong

2014-01-01

Gene ontology (GO) annotation is a common task among model organism databases (MODs) for capturing gene function data from journal articles. It is a time-consuming and labor-intensive task, and is thus often considered as one of the bottlenecks in literature curation. There is a growing need for semiautomated or fully automated GO curation techniques that will help database curators to rapidly and accurately identify gene function information in full-length articles. Despite multiple attempts in the past, few studies have proven to be useful with regard to assisting real-world GO curation. The shortage of sentence-level training data and opportunities for interaction between text-mining developers and GO curators has limited the advances in algorithm development and corresponding use in practical circumstances. To this end, we organized a text-mining challenge task for literature-based GO annotation in BioCreative IV. More specifically, we developed two subtasks: (i) to automatically locate text passages that contain GO-relevant information (a text retrieval task) and (ii) to automatically identify relevant GO terms for the genes in a given article (a concept-recognition task). With the support from five MODs, we provided teams with >4000 unique text passages that served as the basis for each GO annotation in our task data. Such evidence text information has long been recognized as critical for text-mining algorithm development but was never made available because of the high cost of curation. In total, seven teams participated in the challenge task. From the team results, we conclude that the state of the art in automatically mining GO terms from literature has improved over the past decade while much progress is still needed for computer-assisted GO curation. Future work should focus on addressing remaining technical challenges for improved performance of automatic GO concept recognition and incorporating practical benefits of text-mining tools into real-world GO annotation. http://www.biocreative.org/tasks/biocreative-iv/track-4-GO/. Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.

Interoperability between biomedical ontologies through relation expansion, upper-level ontologies and automatic reasoning.

PubMed

Hoehndorf, Robert; Dumontier, Michel; Oellrich, Anika; Rebholz-Schuhmann, Dietrich; Schofield, Paul N; Gkoutos, Georgios V

2011-01-01

Researchers design ontologies as a means to accurately annotate and integrate experimental data across heterogeneous and disparate data- and knowledge bases. Formal ontologies make the semantics of terms and relations explicit such that automated reasoning can be used to verify the consistency of knowledge. However, many biomedical ontologies do not sufficiently formalize the semantics of their relations and are therefore limited with respect to automated reasoning for large scale data integration and knowledge discovery. We describe a method to improve automated reasoning over biomedical ontologies and identify several thousand contradictory class definitions. Our approach aligns terms in biomedical ontologies with foundational classes in a top-level ontology and formalizes composite relations as class expressions. We describe the semi-automated repair of contradictions and demonstrate expressive queries over interoperable ontologies. Our work forms an important cornerstone for data integration, automatic inference and knowledge discovery based on formal representations of knowledge. Our results and analysis software are available at http://bioonto.de/pmwiki.php/Main/ReasonableOntologies.
Game-powered machine learning

PubMed Central

Barrington, Luke; Turnbull, Douglas; Lanckriet, Gert

2012-01-01

Searching for relevant content in a massive amount of multimedia information is facilitated by accurately annotating each image, video, or song with a large number of relevant semantic keywords, or tags. We introduce game-powered machine learning, an integrated approach to annotating multimedia content that combines the effectiveness of human computation, through online games, with the scalability of machine learning. We investigate this framework for labeling music. First, a socially-oriented music annotation game called Herd It collects reliable music annotations based on the “wisdom of the crowds.” Second, these annotated examples are used to train a supervised machine learning system. Third, the machine learning system actively directs the annotation games to collect new data that will most benefit future model iterations. Once trained, the system can automatically annotate a corpus of music much larger than what could be labeled using human computation alone. Automatically annotated songs can be retrieved based on their semantic relevance to text-based queries (e.g., “funky jazz with saxophone,” “spooky electronica,” etc.). Based on the results presented in this paper, we find that actively coupling annotation games with machine learning provides a reliable and scalable approach to making searchable massive amounts of multimedia data. PMID:22460786
Game-powered machine learning.

PubMed

Barrington, Luke; Turnbull, Douglas; Lanckriet, Gert

2012-04-24

Searching for relevant content in a massive amount of multimedia information is facilitated by accurately annotating each image, video, or song with a large number of relevant semantic keywords, or tags. We introduce game-powered machine learning, an integrated approach to annotating multimedia content that combines the effectiveness of human computation, through online games, with the scalability of machine learning. We investigate this framework for labeling music. First, a socially-oriented music annotation game called Herd It collects reliable music annotations based on the "wisdom of the crowds." Second, these annotated examples are used to train a supervised machine learning system. Third, the machine learning system actively directs the annotation games to collect new data that will most benefit future model iterations. Once trained, the system can automatically annotate a corpus of music much larger than what could be labeled using human computation alone. Automatically annotated songs can be retrieved based on their semantic relevance to text-based queries (e.g., "funky jazz with saxophone," "spooky electronica," etc.). Based on the results presented in this paper, we find that actively coupling annotation games with machine learning provides a reliable and scalable approach to making searchable massive amounts of multimedia data.
BioCreative V CDR task corpus: a resource for chemical disease relation extraction.

PubMed

Li, Jiao; Sun, Yueping; Johnson, Robin J; Sciaky, Daniela; Wei, Chih-Hsuan; Leaman, Robert; Davis, Allan Peter; Mattingly, Carolyn J; Wiegers, Thomas C; Lu, Zhiyong

2016-01-01

Community-run, formal evaluations and manually annotated text corpora are critically important for advancing biomedical text-mining research. Recently in BioCreative V, a new challenge was organized for the tasks of disease named entity recognition (DNER) and chemical-induced disease (CID) relation extraction. Given the nature of both tasks, a test collection is required to contain both disease/chemical annotations and relation annotations in the same set of articles. Despite previous efforts in biomedical corpus construction, none was found to be sufficient for the task. Thus, we developed our own corpus called BC5CDR during the challenge by inviting a team of Medical Subject Headings (MeSH) indexers for disease/chemical entity annotation and Comparative Toxicogenomics Database (CTD) curators for CID relation annotation. To ensure high annotation quality and productivity, detailed annotation guidelines and automatic annotation tools were provided. The resulting BC5CDR corpus consists of 1500 PubMed articles with 4409 annotated chemicals, 5818 diseases and 3116 chemical-disease interactions. Each entity annotation includes both the mention text spans and normalized concept identifiers, using MeSH as the controlled vocabulary. To ensure accuracy, the entities were first captured independently by two annotators followed by a consensus annotation: The average inter-annotator agreement (IAA) scores were 87.49% and 96.05% for the disease and chemicals, respectively, in the test set according to the Jaccard similarity coefficient. Our corpus was successfully used for the BioCreative V challenge tasks and should serve as a valuable resource for the text-mining research community.Database URL: http://www.biocreative.org/tasks/biocreative-v/track-3-cdr/. Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the United States.
Supporting Handoff in Asynchronous Collaborative Sensemaking Using Knowledge-Transfer Graphs.

PubMed

Zhao, Jian; Glueck, Michael; Isenberg, Petra; Chevalier, Fanny; Khan, Azam

2018-01-01

During asynchronous collaborative analysis, handoff of partial findings is challenging because externalizations produced by analysts may not adequately communicate their investigative process. To address this challenge, we developed techniques to automatically capture and help encode tacit aspects of the investigative process based on an analyst's interactions, and streamline explicit authoring of handoff annotations. We designed our techniques to mediate awareness of analysis coverage, support explicit communication of progress and uncertainty with annotation, and implicit communication through playback of investigation histories. To evaluate our techniques, we developed an interactive visual analysis system, KTGraph, that supports an asynchronous investigative document analysis task. We conducted a two-phase user study to characterize a set of handoff strategies and to compare investigative performance with and without our techniques. The results suggest that our techniques promote the use of more effective handoff strategies, help increase an awareness of prior investigative process and insights, as well as improve final investigative outcomes.
Automated Detection of Microaneurysms Using Scale-Adapted Blob Analysis and Semi-Supervised Learning

DOE Office of Scientific and Technical Information (OSTI.GOV)

Adal, Kedir M.; Sidebe, Desire; Ali, Sharib

2014-01-07

Despite several attempts, automated detection of microaneurysm (MA) from digital fundus images still remains to be an open issue. This is due to the subtle nature of MAs against the surrounding tissues. In this paper, the microaneurysm detection problem is modeled as finding interest regions or blobs from an image and an automatic local-scale selection technique is presented. Several scale-adapted region descriptors are then introduced to characterize these blob regions. A semi-supervised based learning approach, which requires few manually annotated learning examples, is also proposed to train a classifier to detect true MAs. The developed system is built using onlymore » few manually labeled and a large number of unlabeled retinal color fundus images. The performance of the overall system is evaluated on Retinopathy Online Challenge (ROC) competition database. A competition performance measure (CPM) of 0.364 shows the competitiveness of the proposed system against state-of-the art techniques as well as the applicability of the proposed features to analyze fundus images.« less
Pulmonary lobar volumetry using novel volumetric computer-aided diagnosis and computed tomography

PubMed Central

Iwano, Shingo; Kitano, Mariko; Matsuo, Keiji; Kawakami, Kenichi; Koike, Wataru; Kishimoto, Mariko; Inoue, Tsutomu; Li, Yuanzhong; Naganawa, Shinji

2013-01-01

OBJECTIVES To compare the accuracy of pulmonary lobar volumetry using the conventional number of segments method and novel volumetric computer-aided diagnosis using 3D computed tomography images. METHODS We acquired 50 consecutive preoperative 3D computed tomography examinations for lung tumours reconstructed at 1-mm slice thicknesses. We calculated the lobar volume and the emphysematous lobar volume < −950 HU of each lobe using (i) the slice-by-slice method (reference standard), (ii) number of segments method, and (iii) semi-automatic and (iv) automatic computer-aided diagnosis. We determined Pearson correlation coefficients between the reference standard and the three other methods for lobar volumes and emphysematous lobar volumes. We also compared the relative errors among the three measurement methods. RESULTS Both semi-automatic and automatic computer-aided diagnosis results were more strongly correlated with the reference standard than the number of segments method. The correlation coefficients for automatic computer-aided diagnosis were slightly lower than those for semi-automatic computer-aided diagnosis because there was one outlier among 50 cases (2%) in the right upper lobe and two outliers among 50 cases (4%) in the other lobes. The number of segments method relative error was significantly greater than those for semi-automatic and automatic computer-aided diagnosis (P < 0.001). The computational time for automatic computer-aided diagnosis was 1/2 to 2/3 than that of semi-automatic computer-aided diagnosis. CONCLUSIONS A novel lobar volumetry computer-aided diagnosis system could more precisely measure lobar volumes than the conventional number of segments method. Because semi-automatic computer-aided diagnosis and automatic computer-aided diagnosis were complementary, in clinical use, it would be more practical to first measure volumes by automatic computer-aided diagnosis, and then use semi-automatic measurements if automatic computer-aided diagnosis failed. PMID:23526418
Association algorithm to mine the rules that govern enzyme definition and to classify protein sequences.

PubMed

Chiu, Shih-Hau; Chen, Chien-Chi; Yuan, Gwo-Fang; Lin, Thy-Hou

2006-06-15

The number of sequences compiled in many genome projects is growing exponentially, but most of them have not been characterized experimentally. An automatic annotation scheme must be in an urgent need to reduce the gap between the amount of new sequences produced and reliable functional annotation. This work proposes rules for automatically classifying the fungus genes. The approach involves elucidating the enzyme classifying rule that is hidden in UniProt protein knowledgebase and then applying it for classification. The association algorithm, Apriori, is utilized to mine the relationship between the enzyme class and significant InterPro entries. The candidate rules are evaluated for their classificatory capacity. There were five datasets collected from the Swiss-Prot for establishing the annotation rules. These were treated as the training sets. The TrEMBL entries were treated as the testing set. A correct enzyme classification rate of 70% was obtained for the prokaryote datasets and a similar rate of about 80% was obtained for the eukaryote datasets. The fungus training dataset which lacks an enzyme class description was also used to evaluate the fungus candidate rules. A total of 88 out of 5085 test entries were matched with the fungus rule set. These were otherwise poorly annotated using their functional descriptions. The feasibility of using the method presented here to classify enzyme classes based on the enzyme domain rules is evident. The rules may be also employed by the protein annotators in manual annotation or implemented in an automatic annotation flowchart.
MIPS: analysis and annotation of proteins from whole genomes in 2005

PubMed Central

Mewes, H. W.; Frishman, D.; Mayer, K. F. X.; Münsterkötter, M.; Noubibou, O.; Pagel, P.; Rattei, T.; Oesterheld, M.; Ruepp, A.; Stümpflen, V.

2006-01-01

The Munich Information Center for Protein Sequences (MIPS at the GSF), Neuherberg, Germany, provides resources related to genome information. Manually curated databases for several reference organisms are maintained. Several of these databases are described elsewhere in this and other recent NAR database issues. In a complementary effort, a comprehensive set of >400 genomes automatically annotated with the PEDANT system are maintained. The main goal of our current work on creating and maintaining genome databases is to extend gene centered information to information on interactions within a generic comprehensive framework. We have concentrated our efforts along three lines (i) the development of suitable comprehensive data structures and database technology, communication and query tools to include a wide range of different types of information enabling the representation of complex information such as functional modules or networks Genome Research Environment System, (ii) the development of databases covering computable information such as the basic evolutionary relations among all genes, namely SIMAP, the sequence similarity matrix and the CABiNet network analysis framework and (iii) the compilation and manual annotation of information related to interactions such as protein–protein interactions or other types of relations (e.g. MPCDB, MPPI, CYGD). All databases described and the detailed descriptions of our projects can be accessed through the MIPS WWW server (). PMID:16381839
MIPS: analysis and annotation of proteins from whole genomes in 2005.

PubMed

Mewes, H W; Frishman, D; Mayer, K F X; Münsterkötter, M; Noubibou, O; Pagel, P; Rattei, T; Oesterheld, M; Ruepp, A; Stümpflen, V

2006-01-01

The Munich Information Center for Protein Sequences (MIPS at the GSF), Neuherberg, Germany, provides resources related to genome information. Manually curated databases for several reference organisms are maintained. Several of these databases are described elsewhere in this and other recent NAR database issues. In a complementary effort, a comprehensive set of >400 genomes automatically annotated with the PEDANT system are maintained. The main goal of our current work on creating and maintaining genome databases is to extend gene centered information to information on interactions within a generic comprehensive framework. We have concentrated our efforts along three lines (i) the development of suitable comprehensive data structures and database technology, communication and query tools to include a wide range of different types of information enabling the representation of complex information such as functional modules or networks Genome Research Environment System, (ii) the development of databases covering computable information such as the basic evolutionary relations among all genes, namely SIMAP, the sequence similarity matrix and the CABiNet network analysis framework and (iii) the compilation and manual annotation of information related to interactions such as protein-protein interactions or other types of relations (e.g. MPCDB, MPPI, CYGD). All databases described and the detailed descriptions of our projects can be accessed through the MIPS WWW server (http://mips.gsf.de).
MPEG-7 based video annotation and browsing

NASA Astrophysics Data System (ADS)

Hoeynck, Michael; Auweiler, Thorsten; Wellhausen, Jens

2003-11-01

The huge amount of multimedia data produced worldwide requires annotation in order to enable universal content access and to provide content-based search-and-retrieval functionalities. Since manual video annotation can be time consuming, automatic annotation systems are required. We review recent approaches to content-based indexing and annotation of videos for different kind of sports and describe our approach to automatic annotation of equestrian sports videos. We especially concentrate on MPEG-7 based feature extraction and content description, where we apply different visual descriptors for cut detection. Further, we extract the temporal positions of single obstacles on the course by analyzing MPEG-7 edge information. Having determined single shot positions as well as the visual highlights, the information is jointly stored with meta-textual information in an MPEG-7 description scheme. Based on this information, we generate content summaries which can be utilized in a user-interface in order to provide content-based access to the video stream, but further for media browsing on a streaming server.
Effective user guidance in online interactive semantic segmentation

NASA Astrophysics Data System (ADS)

Petersen, Jens; Bendszus, Martin; Debus, Jürgen; Heiland, Sabine; Maier-Hein, Klaus H.

2017-03-01

With the recent success of machine learning based solutions for automatic image parsing, the availability of reference image annotations for algorithm training is one of the major bottlenecks in medical image segmentation. We are interested in interactive semantic segmentation methods that can be used in an online fashion to generate expert segmentations. These can be used to train automated segmentation techniques or, from an application perspective, for quick and accurate tumor progression monitoring. Using simulated user interactions in a MRI glioblastoma segmentation task, we show that if the user possesses knowledge of the correct segmentation it is significantly (p <= 0.009) better to present data and current segmentation to the user in such a manner that they can easily identify falsely classified regions compared to guiding the user to regions where the classifier exhibits high uncertainty, resulting in differences of mean Dice scores between +0.070 (Whole tumor) and +0.136 (Tumor Core) after 20 iterations. The annotation process should cover all classes equally, which results in a significant (p <= 0.002) improvement compared to completely random annotations anywhere in falsely classified regions for small tumor regions such as the necrotic tumor core (mean Dice +0.151 after 20 it.) and non-enhancing abnormalities (mean Dice +0.069 after 20 it.). These findings provide important insights for the development of efficient interactive segmentation systems and user interfaces.
Semi-Automated Annotation of Biobank Data Using Standard Medical Terminologies in a Graph Database.

PubMed

Hofer, Philipp; Neururer, Sabrina; Goebel, Georg

2016-01-01

Data describing biobank resources frequently contains unstructured free-text information or insufficient coding standards. (Bio-) medical ontologies like Orphanet Rare Diseases Ontology (ORDO) or the Human Disease Ontology (DOID) provide a high number of concepts, synonyms and entity relationship properties. Such standard terminologies increase quality and granularity of input data by adding comprehensive semantic background knowledge from validated entity relationships. Moreover, cross-references between terminology concepts facilitate data integration across databases using different coding standards. In order to encourage the use of standard terminologies, our aim is to identify and link relevant concepts with free-text diagnosis inputs within a biobank registry. Relevant concepts are selected automatically by lexical matching and SPARQL queries against a RDF triplestore. To ensure correctness of annotations, proposed concepts have to be confirmed by medical data administration experts before they are entered into the registry database. Relevant (bio-) medical terminologies describing diseases and phenotypes were identified and stored in a graph database which was tied to a local biobank registry. Concept recommendations during data input trigger a structured description of medical data and facilitate data linkage between heterogeneous systems.
Integrated protocol for reliable and fast quantification and documentation of electrophoresis gels.

PubMed

Rehbein, Peter; Schwalbe, Harald

2015-06-01

Quantitative analysis of electrophoresis gels is an important part in molecular cloning, as well as in protein expression and purification. Parallel quantifications in yield and purity can be most conveniently obtained from densitometric analysis. This communication reports a comprehensive, reliable and simple protocol for gel quantification and documentation, applicable for single samples and with special features for protein expression screens. As major component of the protocol, the fully annotated code of a proprietary open source computer program for semi-automatic densitometric quantification of digitized electrophoresis gels is disclosed. The program ("GelQuant") is implemented for the C-based macro-language of the widespread integrated development environment of IGOR Pro. Copyright © 2014 Elsevier Inc. All rights reserved.
Semantics-enabled service discovery framework in the SIMDAT pharma grid.

PubMed

Qu, Cangtao; Zimmermann, Falk; Kumpf, Kai; Kamuzinzi, Richard; Ledent, Valérie; Herzog, Robert

2008-03-01

We present the design and implementation of a semantics-enabled service discovery framework in the data Grids for process and product development using numerical simulation and knowledge discovery (SIMDAT) Pharma Grid, an industry-oriented Grid environment for integrating thousands of Grid-enabled biological data services and analysis services. The framework consists of three major components: the Web ontology language (OWL)-description logic (DL)-based biological domain ontology, OWL Web service ontology (OWL-S)-based service annotation, and semantic matchmaker based on the ontology reasoning. Built upon the framework, workflow technologies are extensively exploited in the SIMDAT to assist biologists in (semi)automatically performing in silico experiments. We present a typical usage scenario through the case study of a biological workflow: IXodus.
A Tutorial on Parallel and Concurrent Programming in Haskell

NASA Astrophysics Data System (ADS)

Peyton Jones, Simon; Singh, Satnam

This practical tutorial introduces the features available in Haskell for writing parallel and concurrent programs. We first describe how to write semi-explicit parallel programs by using annotations to express opportunities for parallelism and to help control the granularity of parallelism for effective execution on modern operating systems and processors. We then describe the mechanisms provided by Haskell for writing explicitly parallel programs with a focus on the use of software transactional memory to help share information between threads. Finally, we show how nested data parallelism can be used to write deterministically parallel programs which allows programmers to use rich data types in data parallel programs which are automatically transformed into flat data parallel versions for efficient execution on multi-core processors.
Crowdsourcing for error detection in cortical surface delineations.

PubMed

Ganz, Melanie; Kondermann, Daniel; Andrulis, Jonas; Knudsen, Gitte Moos; Maier-Hein, Lena

2017-01-01

With the recent trend toward big data analysis, neuroimaging datasets have grown substantially in the past years. While larger datasets potentially offer important insights for medical research, one major bottleneck is the requirement for resources of medical experts needed to validate automatic processing results. To address this issue, the goal of this paper was to assess whether anonymous nonexperts from an online community can perform quality control of MR-based cortical surface delineations derived by an automatic algorithm. So-called knowledge workers from an online crowdsourcing platform were asked to annotate errors in automatic cortical surface delineations on 100 central, coronal slices of MR images. On average, annotations for 100 images were obtained in less than an hour. When using expert annotations as reference, the crowd on average achieves a sensitivity of 82 % and a precision of 42 %. Merging multiple annotations per image significantly improves the sensitivity of the crowd (up to 95 %), but leads to a decrease in precision (as low as 22 %). Our experiments show that the detection of errors in automatic cortical surface delineations generated by anonymous untrained workers is feasible. Future work will focus on increasing the sensitivity of our method further, such that the error detection tasks can be handled exclusively by the crowd and expert resources can be focused on error correction.
Semantic photo synthesis

NASA Astrophysics Data System (ADS)

Johnson, Matthew; Brostow, G. J.; Shotton, J.; Kwatra, V.; Cipolla, R.

2007-02-01

Composite images are synthesized from existing photographs by artists who make concept art, e.g. storyboards for movies or architectural planning. Current techniques allow an artist to fabricate such an image by digitally splicing parts of stock photographs. While these images serve mainly to "quickly" convey how a scene should look, their production is laborious. We propose a technique that allows a person to design a new photograph with substantially less effort. This paper presents a method that generates a composite image when a user types in nouns, such as "boat" and "sand." The artist can optionally design an intended image by specifying other constraints. Our algorithm formulates the constraints as queries to search an automatically annotated image database. The desired photograph, not a collage, is then synthesized using graph-cut optimization, optionally allowing for further user interaction to edit or choose among alternative generated photos. Our results demonstrate our contributions of (1) a method of creating specific images with minimal human effort, and (2) a combined algorithm for automatically building an image library with semantic annotations from any photo collection.
Characterizing artifacts in RR stress test time series.

PubMed

Astudillo-Salinas, Fabian; Palacio-Baus, Kenneth; Solano-Quinde, Lizandro; Medina, Ruben; Wong, Sara

2016-08-01

Electrocardiographic stress test records have a lot of artifacts. In this paper we explore a simple method to characterize the amount of artifacts present in unprocessed RR stress test time series. Four time series classes were defined: Very good lead, Good lead, Low quality lead and Useless lead. 65 ECG, 8 lead, records of stress test series were analyzed. Firstly, RR-time series were annotated by two experts. The automatic methodology is based on dividing the RR-time series in non-overlapping windows. Each window is marked as noisy whenever it exceeds an established standard deviation threshold (SDT). Series are classified according to the percentage of windows that exceeds a given value, based upon the first manual annotation. Different SDT were explored. Results show that SDT close to 20% (as a percentage of the mean) provides the best results. The coincidence between annotators classification is 70.77% whereas, the coincidence between the second annotator and the automatic method providing the best matches is larger than 63%. Leads classified as Very good leads and Good leads could be combined to improve automatic heartbeat labeling.
Natural-Annotation-based Unsupervised Construction of Korean-Chinese Domain Dictionary

NASA Astrophysics Data System (ADS)

Liu, Wuying; Wang, Lin

2018-03-01

The large-scale bilingual parallel resource is significant to statistical learning and deep learning in natural language processing. This paper addresses the automatic construction issue of the Korean-Chinese domain dictionary, and presents a novel unsupervised construction method based on the natural annotation in the raw corpus. We firstly extract all Korean-Chinese word pairs from Korean texts according to natural annotations, secondly transform the traditional Chinese characters into the simplified ones, and finally distill out a bilingual domain dictionary after retrieving the simplified Chinese words in an extra Chinese domain dictionary. The experimental results show that our method can automatically build multiple Korean-Chinese domain dictionaries efficiently.

Density estimation in aerial images of large crowds for automatic people counting

NASA Astrophysics Data System (ADS)

Herrmann, Christian; Metzler, Juergen

2013-05-01

Counting people is a common topic in the area of visual surveillance and crowd analysis. While many image-based solutions are designed to count only a few persons at the same time, like pedestrians entering a shop or watching an advertisement, there is hardly any solution for counting large crowds of several hundred persons or more. We addressed this problem previously by designing a semi-automatic system being able to count crowds consisting of hundreds or thousands of people based on aerial images of demonstrations or similar events. This system requires major user interaction to segment the image. Our principle aim is to reduce this manual interaction. To achieve this, we propose a new and automatic system. Besides counting the people in large crowds, the system yields the positions of people allowing a plausibility check by a human operator. In order to automatize the people counting system, we use crowd density estimation. The determination of crowd density is based on several features like edge intensity or spatial frequency. They indicate the density and discriminate between a crowd and other image regions like buildings, bushes or trees. We compare the performance of our automatic system to the previous semi-automatic system and to manual counting in images. By counting a test set of aerial images showing large crowds containing up to 12,000 people, the performance gain of our new system will be measured. By improving our previous system, we will increase the benefit of an image-based solution for counting people in large crowds.
ChemBrowser: a flexible framework for mining chemical documents.

PubMed

Wu, Xian; Zhang, Li; Chen, Ying; Rhodes, James; Griffin, Thomas D; Boyer, Stephen K; Alba, Alfredo; Cai, Keke

2010-01-01

The ability to extract chemical and biological entities and relations from text documents automatically has great value to biochemical research and development activities. The growing maturity of text mining and artificial intelligence technologies shows promise in enabling such automatic chemical entity extraction capabilities (called "Chemical Annotation" in this paper). Many techniques have been reported in the literature, ranging from dictionary and rule-based techniques to machine learning approaches. In practice, we found that no single technique works well in all cases. A combinatorial approach that allows one to quickly compose different annotation techniques together for a given situation is most effective. In this paper, we describe the key challenges we face in real-world chemical annotation scenarios. We then present a solution called ChemBrowser which has a flexible framework for chemical annotation. ChemBrowser includes a suite of customizable processing units that might be utilized in a chemical annotator, a high-level language that describes the composition of various processing units that would form a chemical annotator, and an execution engine that translates the composition language to an actual annotator that can generate annotation results for a given set of documents. We demonstrate the impact of this approach by tailoring an annotator for extracting chemical names from patent documents and show how this annotator can be easily modified with simple configuration alone.
Argo: an integrative, interactive, text mining-based workbench supporting curation

PubMed Central

Rak, Rafal; Rowley, Andrew; Black, William; Ananiadou, Sophia

2012-01-01

Curation of biomedical literature is often supported by the automatic analysis of textual content that generally involves a sequence of individual processing components. Text mining (TM) has been used to enhance the process of manual biocuration, but has been focused on specific databases and tasks rather than an environment integrating TM tools into the curation pipeline, catering for a variety of tasks, types of information and applications. Processing components usually come from different sources and often lack interoperability. The well established Unstructured Information Management Architecture is a framework that addresses interoperability by defining common data structures and interfaces. However, most of the efforts are targeted towards software developers and are not suitable for curators, or are otherwise inconvenient to use on a higher level of abstraction. To overcome these issues we introduce Argo, an interoperable, integrative, interactive and collaborative system for text analysis with a convenient graphic user interface to ease the development of processing workflows and boost productivity in labour-intensive manual curation. Robust, scalable text analytics follow a modular approach, adopting component modules for distinct levels of text analysis. The user interface is available entirely through a web browser that saves the user from going through often complicated and platform-dependent installation procedures. Argo comes with a predefined set of processing components commonly used in text analysis, while giving the users the ability to deposit their own components. The system accommodates various areas and levels of user expertise, from TM and computational linguistics to ontology-based curation. One of the key functionalities of Argo is its ability to seamlessly incorporate user-interactive components, such as manual annotation editors, into otherwise completely automatic pipelines. As a use case, we demonstrate the functionality of an in-built manual annotation editor that is well suited for in-text corpus annotation tasks. Database URL: http://www.nactem.ac.uk/Argo PMID:22434844
Association algorithm to mine the rules that govern enzyme definition and to classify protein sequences

PubMed Central

Chiu, Shih-Hau; Chen, Chien-Chi; Yuan, Gwo-Fang; Lin, Thy-Hou

2006-01-01

Background The number of sequences compiled in many genome projects is growing exponentially, but most of them have not been characterized experimentally. An automatic annotation scheme must be in an urgent need to reduce the gap between the amount of new sequences produced and reliable functional annotation. This work proposes rules for automatically classifying the fungus genes. The approach involves elucidating the enzyme classifying rule that is hidden in UniProt protein knowledgebase and then applying it for classification. The association algorithm, Apriori, is utilized to mine the relationship between the enzyme class and significant InterPro entries. The candidate rules are evaluated for their classificatory capacity. Results There were five datasets collected from the Swiss-Prot for establishing the annotation rules. These were treated as the training sets. The TrEMBL entries were treated as the testing set. A correct enzyme classification rate of 70% was obtained for the prokaryote datasets and a similar rate of about 80% was obtained for the eukaryote datasets. The fungus training dataset which lacks an enzyme class description was also used to evaluate the fungus candidate rules. A total of 88 out of 5085 test entries were matched with the fungus rule set. These were otherwise poorly annotated using their functional descriptions. Conclusion The feasibility of using the method presented here to classify enzyme classes based on the enzyme domain rules is evident. The rules may be also employed by the protein annotators in manual annotation or implemented in an automatic annotation flowchart. PMID:16776838
ISEScan: automated identification of insertion sequence elements in prokaryotic genomes.

PubMed

Xie, Zhiqun; Tang, Haixu

2017-11-01

The insertion sequence (IS) elements are the smallest but most abundant autonomous transposable elements in prokaryotic genomes, which play a key role in prokaryotic genome organization and evolution. With the fast growing genomic data, it is becoming increasingly critical for biology researchers to be able to accurately and automatically annotate ISs in prokaryotic genome sequences. The available automatic IS annotation systems are either providing only incomplete IS annotation or relying on the availability of existing genome annotations. Here, we present a new IS elements annotation pipeline to address these issues. ISEScan is a highly sensitive software pipeline based on profile hidden Markov models constructed from manually curated IS elements. ISEScan performs better than existing IS annotation systems when tested on prokaryotic genomes with curated annotations of IS elements. Applying it to 2784 prokaryotic genomes, we report the global distribution of IS families across taxonomic clades in Archaea and Bacteria. ISEScan is implemented in Python and released as an open source software at https://github.com/xiezhq/ISEScan. hatang@indiana.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
On the creation of a clinical gold standard corpus in Spanish: Mining adverse drug reactions.

PubMed

Oronoz, Maite; Gojenola, Koldo; Pérez, Alicia; de Ilarraza, Arantza Díaz; Casillas, Arantza

2015-08-01

The advances achieved in Natural Language Processing make it possible to automatically mine information from electronically created documents. Many Natural Language Processing methods that extract information from texts make use of annotated corpora, but these are scarce in the clinical domain due to legal and ethical issues. In this paper we present the creation of the IxaMed-GS gold standard composed of real electronic health records written in Spanish and manually annotated by experts in pharmacology and pharmacovigilance. The experts mainly annotated entities related to diseases and drugs, but also relationships between entities indicating adverse drug reaction events. To help the experts in the annotation task, we adapted a general corpus linguistic analyzer to the medical domain. The quality of the annotation process in the IxaMed-GS corpus has been assessed by measuring the inter-annotator agreement, which was 90.53% for entities and 82.86% for events. In addition, the corpus has been used for the automatic extraction of adverse drug reaction events using machine learning. Copyright © 2015 Elsevier Inc. All rights reserved.
Towards objective and reproducible study of patient-doctor interaction: Automatic text analysis based VR-CoDES annotation of consultation transcripts.

PubMed

Birkett, Charlotte; Arandjelovic, Ognjen; Humphris, Gerald

2017-07-01

While increasingly appreciated for its importance, the interaction between health care professionals (HCP) and patients is notoriously difficult to study, with both methodological and practical challenges. The former has been addressed by the so-called Verona coding definitions of emotional sequences (VR-CoDES) - a system for identifying and coding patient emotions and the corresponding HCP responses - shown to be reliable and informative in a number of independent studies in different health care delivery contexts. In the preset work we focus on the practical challenge of the scalability of this coding system, namely on making it easily usable more widely and on applying it on larger patient cohorts. In particular, VR-CoDES is inherently complex and training is required to ensure consistent annotation of audio recordings or textual transcripts of consultations. Following up on our previous pilot investigation, in the the present paper we describe the first automatic, computer based algorithm capable of providing coarse level coding of textual transcripts. We investigate different representations of patient utterances and classification methodologies, and label each utterance as either containing an explicit expression of emotional distress (a `concern'), an implicit one (a `cue'), or neither. Using a data corpus comprising 200 consultations between radiotherapists and adult female breast cancer patients we demonstrate excellent labelling performance.
BrEPS 2.0: Optimization of sequence pattern prediction for enzyme annotation.

PubMed

Dudek, Christian-Alexander; Dannheim, Henning; Schomburg, Dietmar

2017-01-01

The prediction of gene functions is crucial for a large number of different life science areas. Faster high throughput sequencing techniques generate more and larger datasets. The manual annotation by classical wet-lab experiments is not suitable for these large amounts of data. We showed earlier that the automatic sequence pattern-based BrEPS protocol, based on manually curated sequences, can be used for the prediction of enzymatic functions of genes. The growing sequence databases provide the opportunity for more reliable patterns, but are also a challenge for the implementation of automatic protocols. We reimplemented and optimized the BrEPS pattern generation to be applicable for larger datasets in an acceptable timescale. Primary improvement of the new BrEPS protocol is the enhanced data selection step. Manually curated annotations from Swiss-Prot are used as reliable source for function prediction of enzymes observed on protein level. The pool of sequences is extended by highly similar sequences from TrEMBL and SwissProt. This allows us to restrict the selection of Swiss-Prot entries, without losing the diversity of sequences needed to generate significant patterns. Additionally, a supporting pattern type was introduced by extending the patterns at semi-conserved positions with highly similar amino acids. Extended patterns have an increased complexity, increasing the chance to match more sequences, without losing the essential structural information of the pattern. To enhance the usability of the database, we introduced enzyme function prediction based on consensus EC numbers and IUBMB enzyme nomenclature. BrEPS is part of the Braunschweig Enzyme Database (BRENDA) and is available on a completely redesigned website and as download. The database can be downloaded and used with the BrEPScmd command line tool for large scale sequence analysis. The BrEPS website and downloads for the database creation tool, command line tool and database are freely accessible at http://breps.tu-bs.de.
BrEPS 2.0: Optimization of sequence pattern prediction for enzyme annotation

PubMed Central

Schomburg, Dietmar

2017-01-01

The prediction of gene functions is crucial for a large number of different life science areas. Faster high throughput sequencing techniques generate more and larger datasets. The manual annotation by classical wet-lab experiments is not suitable for these large amounts of data. We showed earlier that the automatic sequence pattern-based BrEPS protocol, based on manually curated sequences, can be used for the prediction of enzymatic functions of genes. The growing sequence databases provide the opportunity for more reliable patterns, but are also a challenge for the implementation of automatic protocols. We reimplemented and optimized the BrEPS pattern generation to be applicable for larger datasets in an acceptable timescale. Primary improvement of the new BrEPS protocol is the enhanced data selection step. Manually curated annotations from Swiss-Prot are used as reliable source for function prediction of enzymes observed on protein level. The pool of sequences is extended by highly similar sequences from TrEMBL and SwissProt. This allows us to restrict the selection of Swiss-Prot entries, without losing the diversity of sequences needed to generate significant patterns. Additionally, a supporting pattern type was introduced by extending the patterns at semi-conserved positions with highly similar amino acids. Extended patterns have an increased complexity, increasing the chance to match more sequences, without losing the essential structural information of the pattern. To enhance the usability of the database, we introduced enzyme function prediction based on consensus EC numbers and IUBMB enzyme nomenclature. BrEPS is part of the Braunschweig Enzyme Database (BRENDA) and is available on a completely redesigned website and as download. The database can be downloaded and used with the BrEPScmd command line tool for large scale sequence analysis. The BrEPS website and downloads for the database creation tool, command line tool and database are freely accessible at http://breps.tu-bs.de. PMID:28750104
Compositional mining of multiple object API protocols through state abstraction.

PubMed

Dai, Ziying; Mao, Xiaoguang; Lei, Yan; Qi, Yuhua; Wang, Rui; Gu, Bin

2013-01-01

API protocols specify correct sequences of method invocations. Despite their usefulness, API protocols are often unavailable in practice because writing them is cumbersome and error prone. Multiple object API protocols are more expressive than single object API protocols. However, the huge number of objects of typical object-oriented programs poses a major challenge to the automatic mining of multiple object API protocols: besides maintaining scalability, it is important to capture various object interactions. Current approaches utilize various heuristics to focus on small sets of methods. In this paper, we present a general, scalable, multiple object API protocols mining approach that can capture all object interactions. Our approach uses abstract field values to label object states during the mining process. We first mine single object typestates as finite state automata whose transitions are annotated with states of interacting objects before and after the execution of the corresponding method and then construct multiple object API protocols by composing these annotated single object typestates. We implement our approach for Java and evaluate it through a series of experiments.
Compositional Mining of Multiple Object API Protocols through State Abstraction

PubMed Central

Mao, Xiaoguang; Qi, Yuhua; Wang, Rui; Gu, Bin

2013-01-01

API protocols specify correct sequences of method invocations. Despite their usefulness, API protocols are often unavailable in practice because writing them is cumbersome and error prone. Multiple object API protocols are more expressive than single object API protocols. However, the huge number of objects of typical object-oriented programs poses a major challenge to the automatic mining of multiple object API protocols: besides maintaining scalability, it is important to capture various object interactions. Current approaches utilize various heuristics to focus on small sets of methods. In this paper, we present a general, scalable, multiple object API protocols mining approach that can capture all object interactions. Our approach uses abstract field values to label object states during the mining process. We first mine single object typestates as finite state automata whose transitions are annotated with states of interacting objects before and after the execution of the corresponding method and then construct multiple object API protocols by composing these annotated single object typestates. We implement our approach for Java and evaluate it through a series of experiments. PMID:23844378
The CHEMDNER corpus of chemicals and drugs and its annotation principles.

PubMed

Krallinger, Martin; Rabal, Obdulia; Leitner, Florian; Vazquez, Miguel; Salgado, David; Lu, Zhiyong; Leaman, Robert; Lu, Yanan; Ji, Donghong; Lowe, Daniel M; Sayle, Roger A; Batista-Navarro, Riza Theresa; Rak, Rafal; Huber, Torsten; Rocktäschel, Tim; Matos, Sérgio; Campos, David; Tang, Buzhou; Xu, Hua; Munkhdalai, Tsendsuren; Ryu, Keun Ho; Ramanan, S V; Nathan, Senthil; Žitnik, Slavko; Bajec, Marko; Weber, Lutz; Irmer, Matthias; Akhondi, Saber A; Kors, Jan A; Xu, Shuo; An, Xin; Sikdar, Utpal Kumar; Ekbal, Asif; Yoshioka, Masaharu; Dieb, Thaer M; Choi, Miji; Verspoor, Karin; Khabsa, Madian; Giles, C Lee; Liu, Hongfang; Ravikumar, Komandur Elayavilli; Lamurias, Andre; Couto, Francisco M; Dai, Hong-Jie; Tsai, Richard Tzong-Han; Ata, Caglar; Can, Tolga; Usié, Anabel; Alves, Rui; Segura-Bedmar, Isabel; Martínez, Paloma; Oyarzabal, Julen; Valencia, Alfonso

2015-01-01

The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. The abstracts of the CHEMDNER corpus were selected to be representative for all major chemical disciplines. Each of the chemical entity mentions was manually labeled according to its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic and trivial. The difficulty and consistency of tagging chemicals in text was measured using an agreement study between annotators, obtaining a percentage agreement of 91. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts) we provide not only the Gold Standard manual annotations, but also mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has been generated as well. We propose a standard for required minimum information about entity annotations for the construction of domain specific corpora on chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus/.
The CHEMDNER corpus of chemicals and drugs and its annotation principles

PubMed Central

2015-01-01

The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. The abstracts of the CHEMDNER corpus were selected to be representative for all major chemical disciplines. Each of the chemical entity mentions was manually labeled according to its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic and trivial. The difficulty and consistency of tagging chemicals in text was measured using an agreement study between annotators, obtaining a percentage agreement of 91. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts) we provide not only the Gold Standard manual annotations, but also mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has been generated as well. We propose a standard for required minimum information about entity annotations for the construction of domain specific corpora on chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus/ PMID:25810773
MitoFish and MitoAnnotator: A Mitochondrial Genome Database of Fish with an Accurate and Automatic Annotation Pipeline

PubMed Central

Iwasaki, Wataru; Fukunaga, Tsukasa; Isagozawa, Ryota; Yamada, Koichiro; Maeda, Yasunobu; Satoh, Takashi P.; Sado, Tetsuya; Mabuchi, Kohji; Takeshima, Hirohiko; Miya, Masaki; Nishida, Mutsumi

2013-01-01

Mitofish is a database of fish mitochondrial genomes (mitogenomes) that includes powerful and precise de novo annotations for mitogenome sequences. Fish occupy an important position in the evolution of vertebrates and the ecology of the hydrosphere, and mitogenomic sequence data have served as a rich source of information for resolving fish phylogenies and identifying new fish species. The importance of a mitogenomic database continues to grow at a rapid pace as massive amounts of mitogenomic data are generated with the advent of new sequencing technologies. A severe bottleneck seems likely to occur with regard to mitogenome annotation because of the overwhelming pace of data accumulation and the intrinsic difficulties in annotating sequences with degenerating transfer RNA structures, divergent start/stop codons of the coding elements, and the overlapping of adjacent elements. To ease this data backlog, we developed an annotation pipeline named MitoAnnotator. MitoAnnotator automatically annotates a fish mitogenome with a high degree of accuracy in approximately 5 min; thus, it is readily applicable to data sets of dozens of sequences. MitoFish also contains re-annotations of previously sequenced fish mitogenomes, enabling researchers to refer to them when they find annotations that are likely to be erroneous or while conducting comparative mitogenomic analyses. For users who need more information on the taxonomy, habitats, phenotypes, or life cycles of fish, MitoFish provides links to related databases. MitoFish and MitoAnnotator are freely available at http://mitofish.aori.u-tokyo.ac.jp/ (last accessed August 28, 2013); all of the data can be batch downloaded, and the annotation pipeline can be used via a web interface. PMID:23955518
Triage by ranking to support the curation of protein interactions

PubMed Central

Pasche, Emilie; Gobeill, Julien; Rech de Laval, Valentine; Gleizes, Anne; Michel, Pierre-André; Bairoch, Amos

2017-01-01

Abstract Today, molecular biology databases are the cornerstone of knowledge sharing for life and health sciences. The curation and maintenance of these resources are labour intensive. Although text mining is gaining impetus among curators, its integration in curation workflow has not yet been widely adopted. The Swiss Institute of Bioinformatics Text Mining and CALIPHO groups joined forces to design a new curation support system named nextA5. In this report, we explore the integration of novel triage services to support the curation of two types of biological data: protein–protein interactions (PPIs) and post-translational modifications (PTMs). The recognition of PPIs and PTMs poses a special challenge, as it not only requires the identification of biological entities (proteins or residues), but also that of particular relationships (e.g. binding or position). These relationships cannot be described with onto-terminological descriptors such as the Gene Ontology for molecular functions, which makes the triage task more challenging. Prioritizing papers for these tasks thus requires the development of different approaches. In this report, we propose a new method to prioritize articles containing information specific to PPIs and PTMs. The new resources (RESTful APIs, semantically annotated MEDLINE library) enrich the neXtA5 platform. We tuned the article prioritization model on a set of 100 proteins previously annotated by the CALIPHO group. The effectiveness of the triage service was tested with a dataset of 200 annotated proteins. We defined two sets of descriptors to support automatic triage: the first set to enrich for papers with PPI data, and the second for PTMs. All occurrences of these descriptors were marked-up in MEDLINE and indexed, thus constituting a semantically annotated version of MEDLINE. These annotations were then used to estimate the relevance of a particular article with respect to the chosen annotation type. This relevance score was combined with a local vector-space search engine to generate a ranked list of PMIDs. We also evaluated a query refinement strategy, which adds specific keywords (such as ‘binds’ or ‘interacts’) to the original query. Compared to PubMed, the search effectiveness of the nextA5 triage service is improved by 190% for the prioritization of papers with PPIs information and by 260% for papers with PTMs information. Combining advanced retrieval and query refinement strategies with automatically enriched MEDLINE contents is effective to improve triage in complex curation tasks such as the curation of protein PPIs and PTMs. Database URL: http://candy.hesge.ch/nextA5 PMID:29220432
MetMSLine: an automated and fully integrated pipeline for rapid processing of high-resolution LC-MS metabolomic datasets.

PubMed

Edmands, William M B; Barupal, Dinesh K; Scalbert, Augustin

2015-03-01

MetMSLine represents a complete collection of functions in the R programming language as an accessible GUI for biomarker discovery in large-scale liquid-chromatography high-resolution mass spectral datasets from acquisition through to final metabolite identification forming a backend to output from any peak-picking software such as XCMS. MetMSLine automatically creates subdirectories, data tables and relevant figures at the following steps: (i) signal smoothing, normalization, filtration and noise transformation (PreProc.QC.LSC.R); (ii) PCA and automatic outlier removal (Auto.PCA.R); (iii) automatic regression, biomarker selection, hierarchical clustering and cluster ion/artefact identification (Auto.MV.Regress.R); (iv) Biomarker-MS/MS fragmentation spectra matching and fragment/neutral loss annotation (Auto.MS.MS.match.R) and (v) semi-targeted metabolite identification based on a list of theoretical masses obtained from public databases (DBAnnotate.R). All source code and suggested parameters are available in an un-encapsulated layout on http://wmbedmands.github.io/MetMSLine/. Readme files and a synthetic dataset of both X-variables (simulated LC-MS data), Y-variables (simulated continuous variables) and metabolite theoretical masses are also available on our GitHub repository. © The Author 2014. Published by Oxford University Press.
MetMSLine: an automated and fully integrated pipeline for rapid processing of high-resolution LC–MS metabolomic datasets

PubMed Central

Edmands, William M. B.; Barupal, Dinesh K.; Scalbert, Augustin

2015-01-01

Summary: MetMSLine represents a complete collection of functions in the R programming language as an accessible GUI for biomarker discovery in large-scale liquid-chromatography high-resolution mass spectral datasets from acquisition through to final metabolite identification forming a backend to output from any peak-picking software such as XCMS. MetMSLine automatically creates subdirectories, data tables and relevant figures at the following steps: (i) signal smoothing, normalization, filtration and noise transformation (PreProc.QC.LSC.R); (ii) PCA and automatic outlier removal (Auto.PCA.R); (iii) automatic regression, biomarker selection, hierarchical clustering and cluster ion/artefact identification (Auto.MV.Regress.R); (iv) Biomarker—MS/MS fragmentation spectra matching and fragment/neutral loss annotation (Auto.MS.MS.match.R) and (v) semi-targeted metabolite identification based on a list of theoretical masses obtained from public databases (DBAnnotate.R). Availability and implementation: All source code and suggested parameters are available in an un-encapsulated layout on http://wmbedmands.github.io/MetMSLine/. Readme files and a synthetic dataset of both X-variables (simulated LC–MS data), Y-variables (simulated continuous variables) and metabolite theoretical masses are also available on our GitHub repository. Contact: ScalbertA@iarc.fr PMID:25348215
Rule Mining Techniques to Predict Prokaryotic Metabolic Pathways.

PubMed

Saidi, Rabie; Boudellioua, Imane; Martin, Maria J; Solovyev, Victor

2017-01-01

It is becoming more evident that computational methods are needed for the identification and the mapping of pathways in new genomes. We introduce an automatic annotation system (ARBA4Path Association Rule-Based Annotator for Pathways) that utilizes rule mining techniques to predict metabolic pathways across wide range of prokaryotes. It was demonstrated that specific combinations of protein domains (recorded in our rules) strongly determine pathways in which proteins are involved and thus provide information that let us very accurately assign pathway membership (with precision of 0.999 and recall of 0.966) to proteins of a given prokaryotic taxon. Our system can be used to enhance the quality of automatically generated annotations as well as annotating proteins with unknown function. The prediction models are represented in the form of human-readable rules, and they can be used effectively to add absent pathway information to many proteins in UniProtKB/TrEMBL database.
Incremental Ontology-Based Extraction and Alignment in Semi-structured Documents

NASA Astrophysics Data System (ADS)

Thiam, Mouhamadou; Bennacer, Nacéra; Pernelle, Nathalie; Lô, Moussa

SHIRIis an ontology-based system for integration of semi-structured documents related to a specific domain. The system’s purpose is to allow users to access to relevant parts of documents as answers to their queries. SHIRI uses RDF/OWL for representation of resources and SPARQL for their querying. It relies on an automatic, unsupervised and ontology-driven approach for extraction, alignment and semantic annotation of tagged elements of documents. In this paper, we focus on the Extract-Align algorithm which exploits a set of named entity and term patterns to extract term candidates to be aligned with the ontology. It proceeds in an incremental manner in order to populate the ontology with terms describing instances of the domain and to reduce the access to extern resources such as Web. We experiment it on a HTML corpus related to call for papers in computer science and the results that we obtain are very promising. These results show how the incremental behaviour of Extract-Align algorithm enriches the ontology and the number of terms (or named entities) aligned directly with the ontology increases.
Automated detection of microaneurysms using scale-adapted blob analysis and semi-supervised learning.

PubMed

Adal, Kedir M; Sidibé, Désiré; Ali, Sharib; Chaum, Edward; Karnowski, Thomas P; Mériaudeau, Fabrice

2014-04-01

Despite several attempts, automated detection of microaneurysm (MA) from digital fundus images still remains to be an open issue. This is due to the subtle nature of MAs against the surrounding tissues. In this paper, the microaneurysm detection problem is modeled as finding interest regions or blobs from an image and an automatic local-scale selection technique is presented. Several scale-adapted region descriptors are introduced to characterize these blob regions. A semi-supervised based learning approach, which requires few manually annotated learning examples, is also proposed to train a classifier which can detect true MAs. The developed system is built using only few manually labeled and a large number of unlabeled retinal color fundus images. The performance of the overall system is evaluated on Retinopathy Online Challenge (ROC) competition database. A competition performance measure (CPM) of 0.364 shows the competitiveness of the proposed system against state-of-the art techniques as well as the applicability of the proposed features to analyze fundus images. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

GFam: a platform for automatic annotation of gene families.

PubMed

Sasidharan, Rajkumar; Nepusz, Tamás; Swarbreck, David; Huala, Eva; Paccanaro, Alberto

2012-10-01

We have developed GFam, a platform for automatic annotation of gene/protein families. GFam provides a framework for genome initiatives and model organism resources to build domain-based families, derive meaningful functional labels and offers a seamless approach to propagate functional annotation across periodic genome updates. GFam is a hybrid approach that uses a greedy algorithm to chain component domains from InterPro annotation provided by its 12 member resources followed by a sequence-based connected component analysis of un-annotated sequence regions to derive consensus domain architecture for each sequence and subsequently generate families based on common architectures. Our integrated approach increases sequence coverage by 7.2 percentage points and residue coverage by 14.6 percentage points higher than the coverage relative to the best single-constituent database within InterPro for the proteome of Arabidopsis. The true power of GFam lies in maximizing annotation provided by the different InterPro data sources that offer resource-specific coverage for different regions of a sequence. GFam's capability to capture higher sequence and residue coverage can be useful for genome annotation, comparative genomics and functional studies. GFam is a general-purpose software and can be used for any collection of protein sequences. The software is open source and can be obtained from http://www.paccanarolab.org/software/gfam/.
Semi-Supervised Active Learning for Sound Classification in Hybrid Learning Environments.

PubMed

Han, Wenjing; Coutinho, Eduardo; Ruan, Huabin; Li, Haifeng; Schuller, Björn; Yu, Xiaojie; Zhu, Xuan

2016-01-01

Coping with scarcity of labeled data is a common problem in sound classification tasks. Approaches for classifying sounds are commonly based on supervised learning algorithms, which require labeled data which is often scarce and leads to models that do not generalize well. In this paper, we make an efficient combination of confidence-based Active Learning and Self-Training with the aim of minimizing the need for human annotation for sound classification model training. The proposed method pre-processes the instances that are ready for labeling by calculating their classifier confidence scores, and then delivers the candidates with lower scores to human annotators, and those with high scores are automatically labeled by the machine. We demonstrate the feasibility and efficacy of this method in two practical scenarios: pool-based and stream-based processing. Extensive experimental results indicate that our approach requires significantly less labeled instances to reach the same performance in both scenarios compared to Passive Learning, Active Learning and Self-Training. A reduction of 52.2% in human labeled instances is achieved in both of the pool-based and stream-based scenarios on a sound classification task considering 16,930 sound instances.
A Simple and Practical Dictionary-based Approach for Identification of Proteins in Medline Abstracts

PubMed Central

Egorov, Sergei; Yuryev, Anton; Daraselia, Nikolai

2004-01-01

Objective: The aim of this study was to develop a practical and efficient protein identification system for biomedical corpora. Design: The developed system, called ProtScan, utilizes a carefully constructed dictionary of mammalian proteins in conjunction with a specialized tokenization algorithm to identify and tag protein name occurrences in biomedical texts and also takes advantage of Medline “Name-of-Substance” (NOS) annotation. The dictionaries for ProtScan were constructed in a semi-automatic way from various public-domain sequence databases followed by an intensive expert curation step. Measurements: The recall and precision of the system have been determined using 1,000 randomly selected and hand-tagged Medline abstracts. Results: The developed system is capable of identifying protein occurrences in Medline abstracts with a 98% precision and 88% recall. It was also found to be capable of processing approximately 300 abstracts per second. Without utilization of NOS annotation, precision and recall were found to be 98.5% and 84%, respectively. Conclusion: The developed system appears to be well suited for protein-based Medline indexing and can help to improve biomedical information retrieval. Further approaches to ProtScan's recall improvement also are discussed. PMID:14764613
Semi-Supervised Active Learning for Sound Classification in Hybrid Learning Environments

PubMed Central

Han, Wenjing; Coutinho, Eduardo; Li, Haifeng; Schuller, Björn; Yu, Xiaojie; Zhu, Xuan

2016-01-01

Coping with scarcity of labeled data is a common problem in sound classification tasks. Approaches for classifying sounds are commonly based on supervised learning algorithms, which require labeled data which is often scarce and leads to models that do not generalize well. In this paper, we make an efficient combination of confidence-based Active Learning and Self-Training with the aim of minimizing the need for human annotation for sound classification model training. The proposed method pre-processes the instances that are ready for labeling by calculating their classifier confidence scores, and then delivers the candidates with lower scores to human annotators, and those with high scores are automatically labeled by the machine. We demonstrate the feasibility and efficacy of this method in two practical scenarios: pool-based and stream-based processing. Extensive experimental results indicate that our approach requires significantly less labeled instances to reach the same performance in both scenarios compared to Passive Learning, Active Learning and Self-Training. A reduction of 52.2% in human labeled instances is achieved in both of the pool-based and stream-based scenarios on a sound classification task considering 16,930 sound instances. PMID:27627768
Workflow and web application for annotating NCBI BioProject transcriptome data

PubMed Central

Vera Alvarez, Roberto; Medeiros Vidal, Newton; Garzón-Martínez, Gina A.; Barrero, Luz S.; Landsman, David

2017-01-01

Abstract The volume of transcriptome data is growing exponentially due to rapid improvement of experimental technologies. In response, large central resources such as those of the National Center for Biotechnology Information (NCBI) are continually adapting their computational infrastructure to accommodate this large influx of data. New and specialized databases, such as Transcriptome Shotgun Assembly Sequence Database (TSA) and Sequence Read Archive (SRA), have been created to aid the development and expansion of centralized repositories. Although the central resource databases are under continual development, they do not include automatic pipelines to increase annotation of newly deposited data. Therefore, third-party applications are required to achieve that aim. Here, we present an automatic workflow and web application for the annotation of transcriptome data. The workflow creates secondary data such as sequencing reads and BLAST alignments, which are available through the web application. They are based on freely available bioinformatics tools and scripts developed in-house. The interactive web application provides a search engine and several browser utilities. Graphical views of transcript alignments are available through SeqViewer, an embedded tool developed by NCBI for viewing biological sequence data. The web application is tightly integrated with other NCBI web applications and tools to extend the functionality of data processing and interconnectivity. We present a case study for the species Physalis peruviana with data generated from BioProject ID 67621. Database URL: http://www.ncbi.nlm.nih.gov/projects/physalis/ PMID:28605765
Automatically classifying sentences in full-text biomedical articles into Introduction, Methods, Results and Discussion.

PubMed

Agarwal, Shashank; Yu, Hong

2009-12-01

Biomedical texts can be typically represented by four rhetorical categories: Introduction, Methods, Results and Discussion (IMRAD). Classifying sentences into these categories can benefit many other text-mining tasks. Although many studies have applied different approaches for automatically classifying sentences in MEDLINE abstracts into the IMRAD categories, few have explored the classification of sentences that appear in full-text biomedical articles. We first evaluated whether sentences in full-text biomedical articles could be reliably annotated into the IMRAD format and then explored different approaches for automatically classifying these sentences into the IMRAD categories. Our results show an overall annotation agreement of 82.14% with a Kappa score of 0.756. The best classification system is a multinomial naïve Bayes classifier trained on manually annotated data that achieved 91.95% accuracy and an average F-score of 91.55%, which is significantly higher than baseline systems. A web version of this system is available online at-http://wood.ims.uwm.edu/full_text_classifier/.
Functional-to-form mapping for assembly design automation

NASA Astrophysics Data System (ADS)

Xu, Z. G.; Liu, W. M.; Shen, W. D.; Yang, D. Y.; Liu, T. T.

2017-11-01

Assembly-level function-to-form mapping is the most effective procedure towards design automation. The research work mainly includes: the assembly-level function definitions, product network model and the two-step mapping mechanisms. The function-to-form mapping is divided into two steps, i.e. mapping of function-to-behavior, called the first-step mapping, and the second-step mapping, i.e. mapping of behavior-to-structure. After the first step mapping, the three dimensional transmission chain (or 3D sketch) is studied, and the feasible design computing tools are developed. The mapping procedure is relatively easy to be implemented interactively, but, it is quite difficult to finish it automatically. So manual, semi-automatic, automatic and interactive modification of the mapping model are studied. A mechanical hand F-F mapping process is illustrated to verify the design methodologies.
AISLE: an automatic volumetric segmentation method for the study of lung allometry.

PubMed

Ren, Hongliang; Kazanzides, Peter

2011-01-01

We developed a fully automatic segmentation method for volumetric CT (computer tomography) datasets to support construction of a statistical atlas for the study of allometric laws of the lung. The proposed segmentation method, AISLE (Automated ITK-Snap based on Level-set), is based on the level-set implementation from an existing semi-automatic segmentation program, ITK-Snap. AISLE can segment the lung field without human interaction and provide intermediate graphical results as desired. The preliminary experimental results show that the proposed method can achieve accurate segmentation, in terms of volumetric overlap metric, by comparing with the ground-truth segmentation performed by a radiologist.
Comparison between manual and semi-automatic segmentation of nasal cavity and paranasal sinuses from CT images.

PubMed

Tingelhoff, K; Moral, A I; Kunkel, M E; Rilk, M; Wagner, I; Eichhorn, K G; Wahl, F M; Bootz, F

2007-01-01

Segmentation of medical image data is getting more and more important over the last years. The results are used for diagnosis, surgical planning or workspace definition of robot-assisted systems. The purpose of this paper is to find out whether manual or semi-automatic segmentation is adequate for ENT surgical workflow or whether fully automatic segmentation of paranasal sinuses and nasal cavity is needed. We present a comparison of manual and semi-automatic segmentation of paranasal sinuses and the nasal cavity. Manual segmentation is performed by custom software whereas semi-automatic segmentation is realized by a commercial product (Amira). For this study we used a CT dataset of the paranasal sinuses which consists of 98 transversal slices, each 1.0 mm thick, with a resolution of 512 x 512 pixels. For the analysis of both segmentation procedures we used volume, extension (width, length and height), segmentation time and 3D-reconstruction. The segmentation time was reduced from 960 minutes with manual to 215 minutes with semi-automatic segmentation. We found highest variances segmenting nasal cavity. For the paranasal sinuses manual and semi-automatic volume differences are not significant. Dependent on the segmentation accuracy both approaches deliver useful results and could be used for e.g. robot-assisted systems. Nevertheless both procedures are not useful for everyday surgical workflow, because they take too much time. Fully automatic and reproducible segmentation algorithms are needed for segmentation of paranasal sinuses and nasal cavity.
Comparison and assessment of semi-automatic image segmentation in computed tomography scans for image-guided kidney surgery.

PubMed

Glisson, Courtenay L; Altamar, Hernan O; Herrell, S Duke; Clark, Peter; Galloway, Robert L

2011-11-01

Image segmentation is integral to implementing intraoperative guidance for kidney tumor resection. Results seen in computed tomography (CT) data are affected by target organ physiology as well as by the segmentation algorithm used. This work studies variables involved in using level set methods found in the Insight Toolkit to segment kidneys from CT scans and applies the results to an image guidance setting. A composite algorithm drawing on the strengths of multiple level set approaches was built using the Insight Toolkit. This algorithm requires image contrast state and seed points to be identified as input, and functions independently thereafter, selecting and altering method and variable choice as needed. Semi-automatic results were compared to expert hand segmentation results directly and by the use of the resultant surfaces for registration of intraoperative data. Direct comparison using the Dice metric showed average agreement of 0.93 between semi-automatic and hand segmentation results. Use of the segmented surfaces in closest point registration of intraoperative laser range scan data yielded average closest point distances of approximately 1 mm. Application of both inverse registration transforms from the previous step to all hand segmented image space points revealed that the distance variability introduced by registering to the semi-automatically segmented surface versus the hand segmented surface was typically less than 3 mm both near the tumor target and at distal points, including subsurface points. Use of the algorithm shortened user interaction time and provided results which were comparable to the gold standard of hand segmentation. Further, the use of the algorithm's resultant surfaces in image registration provided comparable transformations to surfaces produced by hand segmentation. These data support the applicability and utility of such an algorithm as part of an image guidance workflow.
The Ensembl genome database project.

PubMed

Hubbard, T; Barker, D; Birney, E; Cameron, G; Chen, Y; Clark, L; Cox, T; Cuff, J; Curwen, V; Down, T; Durbin, R; Eyras, E; Gilbert, J; Hammond, M; Huminiecki, L; Kasprzyk, A; Lehvaslaiho, H; Lijnzaad, P; Melsopp, C; Mongin, E; Pettett, R; Pocock, M; Potter, S; Rust, A; Schmidt, E; Searle, S; Slater, G; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Stupka, E; Ureta-Vidal, A; Vastrik, I; Clamp, M

2002-01-01

The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of the human genome sequence, with confirmed gene predictions that have been integrated with external data sources, and is available as either an interactive web site or as flat files. It is also an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements from sequence analysis to data storage and visualisation. The Ensembl site is one of the leading sources of human genome sequence annotation and provided much of the analysis for publication by the international human genome project of the draft genome. The Ensembl system is being installed around the world in both companies and academic sites on machines ranging from supercomputers to laptops.
Preliminary clinical evaluation of semi-automated nailfold capillaroscopy in the assessment of patients with Raynaud's phenomenon.

PubMed

Murray, Andrea K; Feng, Kaiyan; Moore, Tonia L; Allen, Phillip D; Taylor, Christopher J; Herrick, Ariane L

2011-08-01

Nailfold capillaroscopy is well established in screening patients with Raynaud's phenomenon for underlying SSc-spectrum disorders, by identifying abnormal capillaries. Our aim was to compare semi-automatic feature measurement from newly developed software with manual measurements, and determine the degree to which semi-automated data allows disease group classification. Images from 46 healthy controls, 21 patients with PRP and 49 with SSc were preprocessed, and semi-automated measurements of intercapillary distance and capillary width, tortuosity, and derangement were performed. These were compared with manual measurements. Features were used to classify images into the three subject groups. Comparison of automatic and manual measures for distance, width, tortuosity, and derangement had correlations of r=0.583, 0.624, 0.495 (p<0.001), and 0.195 (p=0.040). For automatic measures, correlations were found between width and intercapillary distance, r=0.374, and width and tortuosity, r=0.573 (p<0.001). Significant differences between subject groups were found for all features (p<0.002). Overall, 75% of images correctly matched clinical classification using semi-automated features, compared with 71% for manual measurements. Semi-automatic and manual measurements of distance, width, and tortuosity showed moderate (but statistically significant) correlations. Correlation for derangement was weaker. Semi-automatic measurements are faster than manual measurements. Semi-automatic parameters identify differences between groups, and are as good as manual measurements for between-group classification. © 2011 John Wiley & Sons Ltd.
Automatically Recognizing Medication and Adverse Event Information From Food and Drug Administration’s Adverse Event Reporting System Narratives

PubMed Central

Polepalli Ramesh, Balaji; Belknap, Steven M; Li, Zuofeng; Frid, Nadya; West, Dennis P

2014-01-01

Background The Food and Drug Administration’s (FDA) Adverse Event Reporting System (FAERS) is a repository of spontaneously-reported adverse drug events (ADEs) for FDA-approved prescription drugs. FAERS reports include both structured reports and unstructured narratives. The narratives often include essential information for evaluation of the severity, causality, and description of ADEs that are not present in the structured data. The timely identification of unknown toxicities of prescription drugs is an important, unsolved problem. Objective The objective of this study was to develop an annotated corpus of FAERS narratives and biomedical named entity tagger to automatically identify ADE related information in the FAERS narratives. Methods We developed an annotation guideline and annotate medication information and adverse event related entities on 122 FAERS narratives comprising approximately 23,000 word tokens. A named entity tagger using supervised machine learning approaches was built for detecting medication information and adverse event entities using various categories of features. Results The annotated corpus had an agreement of over .9 Cohen’s kappa for medication and adverse event entities. The best performing tagger achieves an overall performance of 0.73 F1 score for detection of medication, adverse event and other named entities. Conclusions In this study, we developed an annotated corpus of FAERS narratives and machine learning based models for automatically extracting medication and adverse event information from the FAERS narratives. Our study is an important step towards enriching the FAERS data for postmarketing pharmacovigilance. PMID:25600332
New directions in biomedical text annotation: definitions, guidelines and corpus construction

PubMed Central

Wilbur, W John; Rzhetsky, Andrey; Shatkay, Hagit

2006-01-01

Background While biomedical text mining is emerging as an important research area, practical results have proven difficult to achieve. We believe that an important first step towards more accurate text-mining lies in the ability to identify and characterize text that satisfies various types of information needs. We report here the results of our inquiry into properties of scientific text that have sufficient generality to transcend the confines of a narrow subject area, while supporting practical mining of text for factual information. Our ultimate goal is to annotate a significant corpus of biomedical text and train machine learning methods to automatically categorize such text along certain dimensions that we have defined. Results We have identified five qualitative dimensions that we believe characterize a broad range of scientific sentences, and are therefore useful for supporting a general approach to text-mining: focus, polarity, certainty, evidence, and directionality. We define these dimensions and describe the guidelines we have developed for annotating text with regard to them. To examine the effectiveness of the guidelines, twelve annotators independently annotated the same set of 101 sentences that were randomly selected from current biomedical periodicals. Analysis of these annotations shows 70–80% inter-annotator agreement, suggesting that our guidelines indeed present a well-defined, executable and reproducible task. Conclusion We present our guidelines defining a text annotation task, along with annotation results from multiple independently produced annotations, demonstrating the feasibility of the task. The annotation of a very large corpus of documents along these guidelines is currently ongoing. These annotations form the basis for the categorization of text along multiple dimensions, to support viable text mining for experimental results, methodology statements, and other forms of information. We are currently developing machine learning methods, to be trained and tested on the annotated corpus, that would allow for the automatic categorization of biomedical text along the general dimensions that we have presented. The guidelines in full detail, along with annotated examples, are publicly available. PMID:16867190
A way toward analyzing high-content bioimage data by means of semantic annotation and visual data mining

NASA Astrophysics Data System (ADS)

Herold, Julia; Abouna, Sylvie; Zhou, Luxian; Pelengaris, Stella; Epstein, David B. A.; Khan, Michael; Nattkemper, Tim W.

2009-02-01

In the last years, bioimaging has turned from qualitative measurements towards a high-throughput and highcontent modality, providing multiple variables for each biological sample analyzed. We present a system which combines machine learning based semantic image annotation and visual data mining to analyze such new multivariate bioimage data. Machine learning is employed for automatic semantic annotation of regions of interest. The annotation is the prerequisite for a biological object-oriented exploration of the feature space derived from the image variables. With the aid of visual data mining, the obtained data can be explored simultaneously in the image as well as in the feature domain. Especially when little is known of the underlying data, for example in the case of exploring the effects of a drug treatment, visual data mining can greatly aid the process of data evaluation. We demonstrate how our system is used for image evaluation to obtain information relevant to diabetes study and screening of new anti-diabetes treatments. Cells of the Islet of Langerhans and whole pancreas in pancreas tissue samples are annotated and object specific molecular features are extracted from aligned multichannel fluorescence images. These are interactively evaluated for cell type classification in order to determine the cell number and mass. Only few parameters need to be specified which makes it usable also for non computer experts and allows for high-throughput analysis.
The identification of complete domains within protein sequences using accurate E-values for semi-global alignment

PubMed Central

Kann, Maricel G.; Sheetlin, Sergey L.; Park, Yonil; Bryant, Stephen H.; Spouge, John L.

2007-01-01

The sequencing of complete genomes has created a pressing need for automated annotation of gene function. Because domains are the basic units of protein function and evolution, a gene can be annotated from a domain database by aligning domains to the corresponding protein sequence. Ideally, complete domains are aligned to protein subsequences, in a ‘semi-global alignment’. Local alignment, which aligns pieces of domains to subsequences, is common in high-throughput annotation applications, however. It is a mature technique, with the heuristics and accurate E-values required for screening large databases and evaluating the screening results. Hidden Markov models (HMMs) provide an alternative theoretical framework for semi-global alignment, but their use is limited because they lack heuristic acceleration and accurate E-values. Our new tool, GLOBAL, overcomes some limitations of previous semi-global HMMs: it has accurate E-values and the possibility of the heuristic acceleration required for high-throughput applications. Moreover, according to a standard of truth based on protein structure, two semi-global HMM alignment tools (GLOBAL and HMMer) had comparable performance in identifying complete domains, but distinctly outperformed two tools based on local alignment. When searching for complete protein domains, therefore, GLOBAL avoids disadvantages commonly associated with HMMs, yet maintains their superior retrieval performance. PMID:17596268
Multi-dimensional classification of biomedical text: Toward automated, practical provision of high-utility text to diverse users

PubMed Central

Shatkay, Hagit; Pan, Fengxia; Rzhetsky, Andrey; Wilbur, W. John

2008-01-01

Motivation: Much current research in biomedical text mining is concerned with serving biologists by extracting certain information from scientific text. We note that there is no ‘average biologist’ client; different users have distinct needs. For instance, as noted in past evaluation efforts (BioCreative, TREC, KDD) database curators are often interested in sentences showing experimental evidence and methods. Conversely, lab scientists searching for known information about a protein may seek facts, typically stated with high confidence. Text-mining systems can target specific end-users and become more effective, if the system can first identify text regions rich in the type of scientific content that is of interest to the user, retrieve documents that have many such regions, and focus on fact extraction from these regions. Here, we study the ability to characterize and classify such text automatically. We have recently introduced a multi-dimensional categorization and annotation scheme, developed to be applicable to a wide variety of biomedical documents and scientific statements, while intended to support specific biomedical retrieval and extraction tasks. Results: The annotation scheme was applied to a large corpus in a controlled effort by eight independent annotators, where three individual annotators independently tagged each sentence. We then trained and tested machine learning classifiers to automatically categorize sentence fragments based on the annotation. We discuss here the issues involved in this task, and present an overview of the results. The latter strongly suggest that automatic annotation along most of the dimensions is highly feasible, and that this new framework for scientific sentence categorization is applicable in practice. Contact: shatkay@cs.queensu.ca PMID:18718948
Confidence-based ensemble for GBM brain tumor segmentation

NASA Astrophysics Data System (ADS)

Huo, Jing; van Rikxoort, Eva M.; Okada, Kazunori; Kim, Hyun J.; Pope, Whitney; Goldin, Jonathan; Brown, Matthew

2011-03-01

It is a challenging task to automatically segment glioblastoma multiforme (GBM) brain tumors on T1w post-contrast isotropic MR images. A semi-automated system using fuzzy connectedness has recently been developed for computing the tumor volume that reduces the cost of manual annotation. In this study, we propose a an ensemble method that combines multiple segmentation results into a final ensemble one. The method is evaluated on a dataset of 20 cases from a multi-center pharmaceutical drug trial and compared to the fuzzy connectedness method. Three individual methods were used in the framework: fuzzy connectedness, GrowCut, and voxel classification. The combination method is a confidence map averaging (CMA) method. The CMA method shows an improved ROC curve compared to the fuzzy connectedness method (p < 0.001). The CMA ensemble result is more robust compared to the three individual methods.
Automatic textual annotation of video news based on semantic visual object extraction

NASA Astrophysics Data System (ADS)

Boujemaa, Nozha; Fleuret, Francois; Gouet, Valerie; Sahbi, Hichem

2003-12-01

In this paper, we present our work for automatic generation of textual metadata based on visual content analysis of video news. We present two methods for semantic object detection and recognition from a cross modal image-text thesaurus. These thesaurus represent a supervised association between models and semantic labels. This paper is concerned with two semantic objects: faces and Tv logos. In the first part, we present our work for efficient face detection and recogniton with automatic name generation. This method allows us also to suggest the textual annotation of shots close-up estimation. On the other hand, we were interested to automatically detect and recognize different Tv logos present on incoming different news from different Tv Channels. This work was done jointly with the French Tv Channel TF1 within the "MediaWorks" project that consists on an hybrid text-image indexing and retrieval plateform for video news.
Automatic blood vessel based-liver segmentation using the portal phase abdominal CT

NASA Astrophysics Data System (ADS)

Maklad, Ahmed S.; Matsuhiro, Mikio; Suzuki, Hidenobu; Kawata, Yoshiki; Niki, Noboru; Shimada, Mitsuo; Iinuma, Gen

2018-02-01

Liver segmentation is the basis for computer-based planning of hepatic surgical interventions. In diagnosis and analysis of hepatic diseases and surgery planning, automatic segmentation of liver has high importance. Blood vessel (BV) has showed high performance at liver segmentation. In our previous work, we developed a semi-automatic method that segments the liver through the portal phase abdominal CT images in two stages. First stage was interactive segmentation of abdominal blood vessels (ABVs) and subsequent classification into hepatic (HBVs) and non-hepatic (non-HBVs). This stage had 5 interactions that include selective threshold for bone segmentation, selecting two seed points for kidneys segmentation, selection of inferior vena cava (IVC) entrance for starting ABVs segmentation, identification of the portal vein (PV) entrance to the liver and the IVC-exit for classifying HBVs from other ABVs (non-HBVs). Second stage is automatic segmentation of the liver based on segmented ABVs as described in [4]. For full automation of our method we developed a method [5] that segments ABVs automatically tackling the first three interactions. In this paper, we propose full automation of classifying ABVs into HBVs and non- HBVs and consequently full automation of liver segmentation that we proposed in [4]. Results illustrate that the method is effective at segmentation of the liver through the portal abdominal CT images.

Common data model for natural language processing based on two existing standard information models: CDA+GrAF.

PubMed

Meystre, Stéphane M; Lee, Sanghoon; Jung, Chai Young; Chevrier, Raphaël D

2012-08-01

An increasing need for collaboration and resources sharing in the Natural Language Processing (NLP) research and development community motivates efforts to create and share a common data model and a common terminology for all information annotated and extracted from clinical text. We have combined two existing standards: the HL7 Clinical Document Architecture (CDA), and the ISO Graph Annotation Format (GrAF; in development), to develop such a data model entitled "CDA+GrAF". We experimented with several methods to combine these existing standards, and eventually selected a method wrapping separate CDA and GrAF parts in a common standoff annotation (i.e., separate from the annotated text) XML document. Two use cases, clinical document sections, and the 2010 i2b2/VA NLP Challenge (i.e., problems, tests, and treatments, with their assertions and relations), were used to create examples of such standoff annotation documents, and were successfully validated with the XML schemata provided with both standards. We developed a tool to automatically translate annotation documents from the 2010 i2b2/VA NLP Challenge format to GrAF, and automatically generated 50 annotation documents using this tool, all successfully validated. Finally, we adapted the XSL stylesheet provided with HL7 CDA to allow viewing annotation XML documents in a web browser, and plan to adapt existing tools for translating annotation documents between CDA+GrAF and the UIMA and GATE frameworks. This common data model may ease directly comparing NLP tools and applications, combining their output, transforming and "translating" annotations between different NLP applications, and eventually "plug-and-play" of different modules in NLP applications. Copyright © 2011 Elsevier Inc. All rights reserved.
NaviGO: interactive tool for visualization and functional similarity and coherence analysis with gene ontology.

PubMed

Wei, Qing; Khan, Ishita K; Ding, Ziyun; Yerneni, Satwica; Kihara, Daisuke

2017-03-20

The number of genomics and proteomics experiments is growing rapidly, producing an ever-increasing amount of data that are awaiting functional interpretation. A number of function prediction algorithms were developed and improved to enable fast and automatic function annotation. With the well-defined structure and manual curation, Gene Ontology (GO) is the most frequently used vocabulary for representing gene functions. To understand relationship and similarity between GO annotations of genes, it is important to have a convenient pipeline that quantifies and visualizes the GO function analyses in a systematic fashion. NaviGO is a web-based tool for interactive visualization, retrieval, and computation of functional similarity and associations of GO terms and genes. Similarity of GO terms and gene functions is quantified with six different scores including protein-protein interaction and context based association scores we have developed in our previous works. Interactive navigation of the GO function space provides intuitive and effective real-time visualization of functional groupings of GO terms and genes as well as statistical analysis of enriched functions. We developed NaviGO, which visualizes and analyses functional similarity and associations of GO terms and genes. The NaviGO webserver is freely available at: http://kiharalab.org/web/navigo .
Combining active learning and semi-supervised learning techniques to extract protein interaction sentences.

PubMed

Song, Min; Yu, Hwanjo; Han, Wook-Shin

2011-11-24

Protein-protein interaction (PPI) extraction has been a focal point of many biomedical research and database curation tools. Both Active Learning and Semi-supervised SVMs have recently been applied to extract PPI automatically. In this paper, we explore combining the AL with the SSL to improve the performance of the PPI task. We propose a novel PPI extraction technique called PPISpotter by combining Deterministic Annealing-based SSL and an AL technique to extract protein-protein interaction. In addition, we extract a comprehensive set of features from MEDLINE records by Natural Language Processing (NLP) techniques, which further improve the SVM classifiers. In our feature selection technique, syntactic, semantic, and lexical properties of text are incorporated into feature selection that boosts the system performance significantly. By conducting experiments with three different PPI corpuses, we show that PPISpotter is superior to the other techniques incorporated into semi-supervised SVMs such as Random Sampling, Clustering, and Transductive SVMs by precision, recall, and F-measure. Our system is a novel, state-of-the-art technique for efficiently extracting protein-protein interaction pairs.
Active Deep Learning-Based Annotation of Electroencephalography Reports for Cohort Identification

PubMed Central

Maldonado, Ramon; Goodwin, Travis R; Harabagiu, Sanda M

2017-01-01

The annotation of a large corpus of Electroencephalography (EEG) reports is a crucial step in the development of an EEG-specific patient cohort retrieval system. The annotation of multiple types of EEG-specific medical concepts, along with their polarity and modality, is challenging, especially when automatically performed on Big Data. To address this challenge, we present a novel framework which combines the advantages of active and deep learning while producing annotations that capture a variety of attributes of medical concepts. Results obtained through our novel framework show great promise. PMID:28815135
Workflow and web application for annotating NCBI BioProject transcriptome data.

PubMed

Vera Alvarez, Roberto; Medeiros Vidal, Newton; Garzón-Martínez, Gina A; Barrero, Luz S; Landsman, David; Mariño-Ramírez, Leonardo

2017-01-01

The volume of transcriptome data is growing exponentially due to rapid improvement of experimental technologies. In response, large central resources such as those of the National Center for Biotechnology Information (NCBI) are continually adapting their computational infrastructure to accommodate this large influx of data. New and specialized databases, such as Transcriptome Shotgun Assembly Sequence Database (TSA) and Sequence Read Archive (SRA), have been created to aid the development and expansion of centralized repositories. Although the central resource databases are under continual development, they do not include automatic pipelines to increase annotation of newly deposited data. Therefore, third-party applications are required to achieve that aim. Here, we present an automatic workflow and web application for the annotation of transcriptome data. The workflow creates secondary data such as sequencing reads and BLAST alignments, which are available through the web application. They are based on freely available bioinformatics tools and scripts developed in-house. The interactive web application provides a search engine and several browser utilities. Graphical views of transcript alignments are available through SeqViewer, an embedded tool developed by NCBI for viewing biological sequence data. The web application is tightly integrated with other NCBI web applications and tools to extend the functionality of data processing and interconnectivity. We present a case study for the species Physalis peruviana with data generated from BioProject ID 67621. URL: http://www.ncbi.nlm.nih.gov/projects/physalis/. Published by Oxford University Press 2017. This work is written by US Government employees and is in the public domain in the US.
MetaRNA-Seq: An Interactive Tool to Browse and Annotate Metadata from RNA-Seq Studies.

PubMed

Kumar, Pankaj; Halama, Anna; Hayat, Shahina; Billing, Anja M; Gupta, Manish; Yousri, Noha A; Smith, Gregory M; Suhre, Karsten

2015-01-01

The number of RNA-Seq studies has grown in recent years. The design of RNA-Seq studies varies from very simple (e.g., two-condition case-control) to very complicated (e.g., time series involving multiple samples at each time point with separate drug treatments). Most of these publically available RNA-Seq studies are deposited in NCBI databases, but their metadata are scattered throughout four different databases: Sequence Read Archive (SRA), Biosample, Bioprojects, and Gene Expression Omnibus (GEO). Although the NCBI web interface is able to provide all of the metadata information, it often requires significant effort to retrieve study- or project-level information by traversing through multiple hyperlinks and going to another page. Moreover, project- and study-level metadata lack manual or automatic curation by categories, such as disease type, time series, case-control, or replicate type, which are vital to comprehending any RNA-Seq study. Here we describe "MetaRNA-Seq," a new tool for interactively browsing, searching, and annotating RNA-Seq metadata with the capability of semiautomatic curation at the study level.
MicroScope: a platform for microbial genome annotation and comparative genomics

PubMed Central

Vallenet, D.; Engelen, S.; Mornico, D.; Cruveiller, S.; Fleury, L.; Lajus, A.; Rouy, Z.; Roche, D.; Salvignol, G.; Scarpelli, C.; Médigue, C.

2009-01-01

The initial outcome of genome sequencing is the creation of long text strings written in a four letter alphabet. The role of in silico sequence analysis is to assist biologists in the act of associating biological knowledge with these sequences, allowing investigators to make inferences and predictions that can be tested experimentally. A wide variety of software is available to the scientific community, and can be used to identify genomic objects, before predicting their biological functions. However, only a limited number of biologically interesting features can be revealed from an isolated sequence. Comparative genomics tools, on the other hand, by bringing together the information contained in numerous genomes simultaneously, allow annotators to make inferences based on the idea that evolution and natural selection are central to the definition of all biological processes. We have developed the MicroScope platform in order to offer a web-based framework for the systematic and efficient revision of microbial genome annotation and comparative analysis (http://www.genoscope.cns.fr/agc/microscope). Starting with the description of the flow chart of the annotation processes implemented in the MicroScope pipeline, and the development of traditional and novel microbial annotation and comparative analysis tools, this article emphasizes the essential role of expert annotation as a complement of automatic annotation. Several examples illustrate the use of implemented tools for the review and curation of annotations of both new and publicly available microbial genomes within MicroScope’s rich integrated genome framework. The platform is used as a viewer in order to browse updated annotation information of available microbial genomes (more than 440 organisms to date), and in the context of new annotation projects (117 bacterial genomes). The human expertise gathered in the MicroScope database (about 280,000 independent annotations) contributes to improve the quality of microbial genome annotation, especially for genomes initially analyzed by automatic procedures alone. Database URLs: http://www.genoscope.cns.fr/agc/mage and http://www.genoscope.cns.fr/agc/microcyc PMID:20157493
MicroScope: a platform for microbial genome annotation and comparative genomics.

PubMed

Vallenet, D; Engelen, S; Mornico, D; Cruveiller, S; Fleury, L; Lajus, A; Rouy, Z; Roche, D; Salvignol, G; Scarpelli, C; Médigue, C

2009-01-01

The initial outcome of genome sequencing is the creation of long text strings written in a four letter alphabet. The role of in silico sequence analysis is to assist biologists in the act of associating biological knowledge with these sequences, allowing investigators to make inferences and predictions that can be tested experimentally. A wide variety of software is available to the scientific community, and can be used to identify genomic objects, before predicting their biological functions. However, only a limited number of biologically interesting features can be revealed from an isolated sequence. Comparative genomics tools, on the other hand, by bringing together the information contained in numerous genomes simultaneously, allow annotators to make inferences based on the idea that evolution and natural selection are central to the definition of all biological processes. We have developed the MicroScope platform in order to offer a web-based framework for the systematic and efficient revision of microbial genome annotation and comparative analysis (http://www.genoscope.cns.fr/agc/microscope). Starting with the description of the flow chart of the annotation processes implemented in the MicroScope pipeline, and the development of traditional and novel microbial annotation and comparative analysis tools, this article emphasizes the essential role of expert annotation as a complement of automatic annotation. Several examples illustrate the use of implemented tools for the review and curation of annotations of both new and publicly available microbial genomes within MicroScope's rich integrated genome framework. The platform is used as a viewer in order to browse updated annotation information of available microbial genomes (more than 440 organisms to date), and in the context of new annotation projects (117 bacterial genomes). The human expertise gathered in the MicroScope database (about 280,000 independent annotations) contributes to improve the quality of microbial genome annotation, especially for genomes initially analyzed by automatic procedures alone.Database URLs: http://www.genoscope.cns.fr/agc/mage and http://www.genoscope.cns.fr/agc/microcyc.
Synthesizing Certified Code

NASA Technical Reports Server (NTRS)

Whalen, Michael; Schumann, Johann; Fischer, Bernd

2002-01-01

Code certification is a lightweight approach to demonstrate software quality on a formal level. Its basic idea is to require producers to provide formal proofs that their code satisfies certain quality properties. These proofs serve as certificates which can be checked independently. Since code certification uses the same underlying technology as program verification, it also requires many detailed annotations (e.g., loop invariants) to make the proofs possible. However, manually adding theses annotations to the code is time-consuming and error-prone. We address this problem by combining code certification with automatic program synthesis. We propose an approach to generate simultaneously, from a high-level specification, code and all annotations required to certify generated code. Here, we describe a certification extension of AUTOBAYES, a synthesis tool which automatically generates complex data analysis programs from compact specifications. AUTOBAYES contains sufficient high-level domain knowledge to generate detailed annotations. This allows us to use a general-purpose verification condition generator to produce a set of proof obligations in first-order logic. The obligations are then discharged using the automated theorem E-SETHEO. We demonstrate our approach by certifying operator safety for a generated iterative data classification program without manual annotation of the code.
Extending bicluster analysis to annotate unclassified ORFs and predict novel functional modules using expression data

PubMed Central

Bryan, Kenneth; Cunningham, Pádraig

2008-01-01

Background Microarrays have the capacity to measure the expressions of thousands of genes in parallel over many experimental samples. The unsupervised classification technique of bicluster analysis has been employed previously to uncover gene expression correlations over subsets of samples with the aim of providing a more accurate model of the natural gene functional classes. This approach also has the potential to aid functional annotation of unclassified open reading frames (ORFs). Until now this aspect of biclustering has been under-explored. In this work we illustrate how bicluster analysis may be extended into a 'semi-supervised' ORF annotation approach referred to as BALBOA. Results The efficacy of the BALBOA ORF classification technique is first assessed via cross validation and compared to a multi-class k-Nearest Neighbour (kNN) benchmark across three independent gene expression datasets. BALBOA is then used to assign putative functional annotations to unclassified yeast ORFs. These predictions are evaluated using existing experimental and protein sequence information. Lastly, we employ a related semi-supervised method to predict the presence of novel functional modules within yeast. Conclusion In this paper we demonstrate how unsupervised classification methods, such as bicluster analysis, may be extended using of available annotations to form semi-supervised approaches within the gene expression analysis domain. We show that such methods have the potential to improve upon supervised approaches and shed new light on the functions of unclassified ORFs and their co-regulation. PMID:18831786
DOMMINO 2.0: integrating structurally resolved protein-, RNA-, and DNA-mediated macromolecular interactions

PubMed Central

Kuang, Xingyan; Dhroso, Andi; Han, Jing Ginger; Shyu, Chi-Ren; Korkin, Dmitry

2016-01-01

Macromolecular interactions are formed between proteins, DNA and RNA molecules. Being a principle building block in macromolecular assemblies and pathways, the interactions underlie most of cellular functions. Malfunctioning of macromolecular interactions is also linked to a number of diseases. Structural knowledge of the macromolecular interaction allows one to understand the interaction’s mechanism, determine its functional implications and characterize the effects of genetic variations, such as single nucleotide polymorphisms, on the interaction. Unfortunately, until now the interactions mediated by different types of macromolecules, e.g. protein–protein interactions or protein–DNA interactions, are collected into individual and unrelated structural databases. This presents a significant obstacle in the analysis of macromolecular interactions. For instance, the homogeneous structural interaction databases prevent scientists from studying structural interactions of different types but occurring in the same macromolecular complex. Here, we introduce DOMMINO 2.0, a structural Database Of Macro-Molecular INteractiOns. Compared to DOMMINO 1.0, a comprehensive database on protein-protein interactions, DOMMINO 2.0 includes the interactions between all three basic types of macromolecules extracted from PDB files. DOMMINO 2.0 is automatically updated on a weekly basis. It currently includes ∼1 040 000 interactions between two polypeptide subunits (e.g. domains, peptides, termini and interdomain linkers), ∼43 000 RNA-mediated interactions, and ∼12 000 DNA-mediated interactions. All protein structures in the database are annotated using SCOP and SUPERFAMILY family annotation. As a result, protein-mediated interactions involving protein domains, interdomain linkers, C- and N- termini, and peptides are identified. Our database provides an intuitive web interface, allowing one to investigate interactions at three different resolution levels: whole subunit network, binary interaction and interaction interface. Database URL: http://dommino.org PMID:26827237
Biologically inspired EM image alignment and neural reconstruction.

PubMed

Knowles-Barley, Seymour; Butcher, Nancy J; Meinertzhagen, Ian A; Armstrong, J Douglas

2011-08-15

Three-dimensional reconstruction of consecutive serial-section transmission electron microscopy (ssTEM) images of neural tissue currently requires many hours of manual tracing and annotation. Several computational techniques have already been applied to ssTEM images to facilitate 3D reconstruction and ease this burden. Here, we present an alternative computational approach for ssTEM image analysis. We have used biologically inspired receptive fields as a basis for a ridge detection algorithm to identify cell membranes, synaptic contacts and mitochondria. Detected line segments are used to improve alignment between consecutive images and we have joined small segments of membrane into cell surfaces using a dynamic programming algorithm similar to the Needleman-Wunsch and Smith-Waterman DNA sequence alignment procedures. A shortest path-based approach has been used to close edges and achieve image segmentation. Partial reconstructions were automatically generated and used as a basis for semi-automatic reconstruction of neural tissue. The accuracy of partial reconstructions was evaluated and 96% of membrane could be identified at the cost of 13% false positive detections. An open-source reference implementation is available in the Supplementary information. seymour.kb@ed.ac.uk; douglas.armstrong@ed.ac.uk Supplementary data are available at Bioinformatics online.
Classification of Chemical Compounds to Support Complex Queries in a Pathway Database

PubMed Central

Weidemann, Andreas; Kania, Renate; Peiss, Christian; Rojas, Isabel

2004-01-01

Data quality in biological databases has become a topic of great discussion. To provide high quality data and to deal with the vast amount of biochemical data, annotators and curators need to be supported by software that carries out part of their work in an (semi-) automatic manner. The detection of errors and inconsistencies is a part that requires the knowledge of domain experts, thus in most cases it is done manually, making it very expensive and time-consuming. This paper presents two tools to partially support the curation of data on biochemical pathways. The tool enables the automatic classification of chemical compounds based on their respective SMILES strings. Such classification allows the querying and visualization of biochemical reactions at different levels of abstraction, according to the level of detail at which the reaction participants are described. Chemical compounds can be classified in a flexible manner based on different criteria. The support of the process of data curation is provided by facilitating the detection of compounds that are identified as different but that are actually the same. This is also used to identify similar reactions and, in turn, pathways. PMID:18629066
Semi-Automatic Modelling of Building FAÇADES with Shape Grammars Using Historic Building Information Modelling

NASA Astrophysics Data System (ADS)

Dore, C.; Murphy, M.

2013-02-01

This paper outlines a new approach for generating digital heritage models from laser scan or photogrammetric data using Historic Building Information Modelling (HBIM). HBIM is a plug-in for Building Information Modelling (BIM) software that uses parametric library objects and procedural modelling techniques to automate the modelling stage. The HBIM process involves a reverse engineering solution whereby parametric interactive objects representing architectural elements are mapped onto laser scan or photogrammetric survey data. A library of parametric architectural objects has been designed from historic manuscripts and architectural pattern books. These parametric objects were built using an embedded programming language within the ArchiCAD BIM software called Geometric Description Language (GDL). Procedural modelling techniques have been implemented with the same language to create a parametric building façade which automatically combines library objects based on architectural rules and proportions. Different configurations of the façade are controlled by user parameter adjustment. The automatically positioned elements of the façade can be subsequently refined using graphical editing while overlaying the model with orthographic imagery. Along with this semi-automatic method for generating façade models, manual plotting of library objects can also be used to generate a BIM model from survey data. After the 3D model has been completed conservation documents such as plans, sections, elevations and 3D views can be automatically generated for conservation projects.
GeneTopics - interpretation of gene sets via literature-driven topic models

PubMed Central

2013-01-01

Background Annotation of a set of genes is often accomplished through comparison to a library of labelled gene sets such as biological processes or canonical pathways. However, this approach might fail if the employed libraries are not up to date with the latest research, don't capture relevant biological themes or are curated at a different level of granularity than is required to appropriately analyze the input gene set. At the same time, the vast biomedical literature offers an unstructured repository of the latest research findings that can be tapped to provide thematic sub-groupings for any input gene set. Methods Our proposed method relies on a gene-specific text corpus and extracts commonalities between documents in an unsupervised manner using a topic model approach. We automatically determine the number of topics summarizing the corpus and calculate a gene relevancy score for each topic allowing us to eliminate non-specific topics. As a result we obtain a set of literature topics in which each topic is associated with a subset of the input genes providing directly interpretable keywords and corresponding documents for literature research. Results We validate our method based on labelled gene sets from the KEGG metabolic pathway collection and the genetic association database (GAD) and show that the approach is able to detect topics consistent with the labelled annotation. Furthermore, we discuss the results on three different types of experimentally derived gene sets, (1) differentially expressed genes from a cardiac hypertrophy experiment in mice, (2) altered transcript abundance in human pancreatic beta cells, and (3) genes implicated by GWA studies to be associated with metabolite levels in a healthy population. In all three cases, we are able to replicate findings from the original papers in a quick and semi-automated manner. Conclusions Our approach provides a novel way of automatically generating meaningful annotations for gene sets that are directly tied to relevant articles in the literature. Extending a general topic model method, the approach introduced here establishes a workflow for the interpretation of gene sets generated from diverse experimental scenarios that can complement the classical approach of comparison to reference gene sets. PMID:24564875
Comparison Of Semi-Automatic And Automatic Slick Detection Algorithms For Jiyeh Power Station Oil Spill, Lebanon

NASA Astrophysics Data System (ADS)

Osmanoglu, B.; Ozkan, C.; Sunar, F.

2013-10-01

After air strikes on July 14 and 15, 2006 the Jiyeh Power Station started leaking oil into the eastern Mediterranean Sea. The power station is located about 30 km south of Beirut and the slick covered about 170 km of coastline threatening the neighboring countries Turkey and Cyprus. Due to the ongoing conflict between Israel and Lebanon, cleaning efforts could not start immediately resulting in 12 000 to 15 000 tons of fuel oil leaking into the sea. In this paper we compare results from automatic and semi-automatic slick detection algorithms. The automatic detection method combines the probabilities calculated for each pixel from each image to obtain a joint probability, minimizing the adverse effects of atmosphere on oil spill detection. The method can readily utilize X-, C- and L-band data where available. Furthermore wind and wave speed observations can be used for a more accurate analysis. For this study, we utilize Envisat ASAR ScanSAR data. A probability map is generated based on the radar backscatter, effect of wind and dampening value. The semi-automatic algorithm is based on supervised classification. As a classifier, Artificial Neural Network Multilayer Perceptron (ANN MLP) classifier is used since it is more flexible and efficient than conventional maximum likelihood classifier for multisource and multi-temporal data. The learning algorithm for ANN MLP is chosen as the Levenberg-Marquardt (LM). Training and test data for supervised classification are composed from the textural information created from SAR images. This approach is semiautomatic because tuning the parameters of classifier and composing training data need a human interaction. We point out the similarities and differences between the two methods and their results as well as underlining their advantages and disadvantages. Due to the lack of ground truth data, we compare obtained results to each other, as well as other published oil slick area assessments.
Interactive segmentation of tongue contours in ultrasound video sequences using quality maps

NASA Astrophysics Data System (ADS)

Ghrenassia, Sarah; Ménard, Lucie; Laporte, Catherine

2014-03-01

Ultrasound (US) imaging is an effective and non invasive way of studying the tongue motions involved in normal and pathological speech, and the results of US studies are of interest for the development of new strategies in speech therapy. State-of-the-art tongue shape analysis techniques based on US images depend on semi-automated tongue segmentation and tracking techniques. Recent work has mostly focused on improving the accuracy of the tracking techniques themselves. However, occasional errors remain inevitable, regardless of the technique used, and the tongue tracking process must thus be supervised by a speech scientist who will correct these errors manually or semi-automatically. This paper proposes an interactive framework to facilitate this process. In this framework, the user is guided towards potentially problematic portions of the US image sequence by a segmentation quality map that is based on the normalized energy of an active contour model and automatically produced during tracking. When a problematic segmentation is identified, corrections to the segmented contour can be made on one image and propagated both forward and backward in the problematic subsequence, thereby improving the user experience. The interactive tools were tested in combination with two different tracking algorithms. Preliminary results illustrate the potential of the proposed framework, suggesting that the proposed framework generally improves user interaction time, with little change in segmentation repeatability.
An Infrastructure for Indexing and Organizing Best Practices

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhu, Liming; Staples, Mark; Gorton, Ian

Industry best practices are widely held but not necessarily empirically verified software engineering beliefs. Best practices can be documented in distributed web-based public repositories as pattern catalogues or practice libraries. There is a need to systematically index and organize these practices to enable their better practical use and scientific evaluation. In this paper, we propose a semi-automatic approach to index and organise best practices. A central repository acts as an information overlay on top of other pre-existing resources to facilitate organization, navigation, annotation and meta-analysis while maintaining synchronization with those resources. An initial population of the central repository is automatedmore » using Yahoo! contextual search services. The collected data is organized using semantic web technologies so that the data can be more easily shared and used for innovative analyses. A prototype has demonstrated the capability of the approach.« less
Incorporating Feature-Based Annotations into Automatically Generated Knowledge Representations

NASA Astrophysics Data System (ADS)

Lumb, L. I.; Lederman, J. I.; Aldridge, K. D.

2006-12-01

Earth Science Markup Language (ESML) is efficient and effective in representing scientific data in an XML- based formalism. However, features of the data being represented are not accounted for in ESML. Such features might derive from events (e.g., a gap in data collection due to instrument servicing), identifications (e.g., a scientifically interesting area/volume in an image), or some other source. In order to account for features in an ESML context, we consider them from the perspective of annotation, i.e., the addition of information to existing documents without changing the originals. Although it is possible to extend ESML to incorporate feature-based annotations internally (e.g., by extending the XML schema for ESML), there are a number of complicating factors that we identify. Rather than pursuing the ESML-extension approach, we focus on an external representation for feature-based annotations via XML Pointer Language (XPointer). In previous work (Lumb &Aldridge, HPCS 2006, IEEE, doi:10.1109/HPCS.2006.26), we have shown that it is possible to extract relationships from ESML-based representations, and capture the results in the Resource Description Format (RDF). Thus we explore and report on this same requirement for XPointer-based annotations of ESML representations. As in our past efforts, the Global Geodynamics Project (GGP) allows us to illustrate with a real-world example this approach for introducing annotations into automatically generated knowledge representations.
Annotation of Korean Learner Corpora for Particle Error Detection

ERIC Educational Resources Information Center

Lee, Sun-Hee; Jang, Seok Bae; Seo, Sang-Kyu

2009-01-01

In this study, we focus on particle errors and discuss an annotation scheme for Korean learner corpora that can be used to extract heuristic patterns of particle errors efficiently. We investigate different properties of particle errors so that they can be later used to identify learner errors automatically, and we provide resourceful annotation…

Automatic short axis orientation of the left ventricle in 3D ultrasound recordings

NASA Astrophysics Data System (ADS)

Pedrosa, João.; Heyde, Brecht; Heeren, Laurens; Engvall, Jan; Zamorano, Jose; Papachristidis, Alexandros; Edvardsen, Thor; Claus, Piet; D'hooge, Jan

2016-04-01

The recent advent of three-dimensional echocardiography has led to an increased interest from the scientific community in left ventricle segmentation frameworks for cardiac volume and function assessment. An automatic orientation of the segmented left ventricular mesh is an important step to obtain a point-to-point correspondence between the mesh and the cardiac anatomy. Furthermore, this would allow for an automatic division of the left ventricle into the standard 17 segments and, thus, fully automatic per-segment analysis, e.g. regional strain assessment. In this work, a method for fully automatic short axis orientation of the segmented left ventricle is presented. The proposed framework aims at detecting the inferior right ventricular insertion point. 211 three-dimensional echocardiographic images were used to validate this framework by comparison to manual annotation of the inferior right ventricular insertion point. A mean unsigned error of 8, 05° +/- 18, 50° was found, whereas the mean signed error was 1, 09°. Large deviations between the manual and automatic annotations (> 30°) only occurred in 3, 79% of cases. The average computation time was 666ms in a non-optimized MATLAB environment, which potentiates real-time application. In conclusion, a successful automatic real-time method for orientation of the segmented left ventricle is proposed.
GRN2SBML: automated encoding and annotation of inferred gene regulatory networks complying with SBML.

PubMed

Vlaic, Sebastian; Hoffmann, Bianca; Kupfer, Peter; Weber, Michael; Dräger, Andreas

2013-09-01

GRN2SBML automatically encodes gene regulatory networks derived from several inference tools in systems biology markup language. Providing a graphical user interface, the networks can be annotated via the simple object access protocol (SOAP)-based application programming interface of BioMart Central Portal and minimum information required in the annotation of models registry. Additionally, we provide an R-package, which processes the output of supported inference algorithms and automatically passes all required parameters to GRN2SBML. Therefore, GRN2SBML closes a gap in the processing pipeline between the inference of gene regulatory networks and their subsequent analysis, visualization and storage. GRN2SBML is freely available under the GNU Public License version 3 and can be downloaded from http://www.hki-jena.de/index.php/0/2/490. General information on GRN2SBML, examples and tutorials are available at the tool's web page.
Ontorat: automatic generation of new ontology terms, annotations, and axioms based on ontology design patterns.

PubMed

Xiang, Zuoshuang; Zheng, Jie; Lin, Yu; He, Yongqun

2015-01-01

It is time-consuming to build an ontology with many terms and axioms. Thus it is desired to automate the process of ontology development. Ontology Design Patterns (ODPs) provide a reusable solution to solve a recurrent modeling problem in the context of ontology engineering. Because ontology terms often follow specific ODPs, the Ontology for Biomedical Investigations (OBI) developers proposed a Quick Term Templates (QTTs) process targeted at generating new ontology classes following the same pattern, using term templates in a spreadsheet format. Inspired by the ODPs and QTTs, the Ontorat web application is developed to automatically generate new ontology terms, annotations of terms, and logical axioms based on a specific ODP(s). The inputs of an Ontorat execution include axiom expression settings, an input data file, ID generation settings, and a target ontology (optional). The axiom expression settings can be saved as a predesigned Ontorat setting format text file for reuse. The input data file is generated based on a template file created by a specific ODP (text or Excel format). Ontorat is an efficient tool for ontology expansion. Different use cases are described. For example, Ontorat was applied to automatically generate over 1,000 Japan RIKEN cell line cell terms with both logical axioms and rich annotation axioms in the Cell Line Ontology (CLO). Approximately 800 licensed animal vaccines were represented and annotated in the Vaccine Ontology (VO) by Ontorat. The OBI team used Ontorat to add assay and device terms required by ENCODE project. Ontorat was also used to add missing annotations to all existing Biobank specific terms in the Biobank Ontology. A collection of ODPs and templates with examples are provided on the Ontorat website and can be reused to facilitate ontology development. With ever increasing ontology development and applications, Ontorat provides a timely platform for generating and annotating a large number of ontology terms by following design patterns. http://ontorat.hegroup.org/.
A study of the effectiveness of machine learning methods for classification of clinical interview fragments into a large number of categories.

PubMed

Hasan, Mehedi; Kotov, Alexander; Carcone, April; Dong, Ming; Naar, Sylvie; Hartlieb, Kathryn Brogan

2016-08-01

This study examines the effectiveness of state-of-the-art supervised machine learning methods in conjunction with different feature types for the task of automatic annotation of fragments of clinical text based on codebooks with a large number of categories. We used a collection of motivational interview transcripts consisting of 11,353 utterances, which were manually annotated by two human coders as the gold standard, and experimented with state-of-art classifiers, including Naïve Bayes, J48 Decision Tree, Support Vector Machine (SVM), Random Forest (RF), AdaBoost, DiscLDA, Conditional Random Fields (CRF) and Convolutional Neural Network (CNN) in conjunction with lexical, contextual (label of the previous utterance) and semantic (distribution of words in the utterance across the Linguistic Inquiry and Word Count dictionaries) features. We found out that, when the number of classes is large, the performance of CNN and CRF is inferior to SVM. When only lexical features were used, interview transcripts were automatically annotated by SVM with the highest classification accuracy among all classifiers of 70.8%, 61% and 53.7% based on the codebooks consisting of 17, 20 and 41 codes, respectively. Using contextual and semantic features, as well as their combination, in addition to lexical ones, improved the accuracy of SVM for annotation of utterances in motivational interview transcripts with a codebook consisting of 17 classes to 71.5%, 74.2%, and 75.1%, respectively. Our results demonstrate the potential of using machine learning methods in conjunction with lexical, semantic and contextual features for automatic annotation of clinical interview transcripts with near-human accuracy. Copyright © 2016 Elsevier Inc. All rights reserved.
Developing a standard for de-identifying electronic patient records written in Swedish: precision, recall and F-measure in a manual and computerized annotation trial.

PubMed

Velupillai, Sumithra; Dalianis, Hercules; Hassel, Martin; Nilsson, Gunnar H

2009-12-01

Electronic patient records (EPRs) contain a large amount of information written in free text. This information is considered very valuable for research but is also very sensitive since the free text parts may contain information that could reveal the identity of a patient. Therefore, methods for de-identifying EPRs are needed. The work presented here aims to perform a manual and automatic Protected Health Information (PHI)-annotation trial for EPRs written in Swedish. This study consists of two main parts: the initial creation of a manually PHI-annotated gold standard, and the porting and evaluation of an existing de-identification software written for American English to Swedish in a preliminary automatic de-identification trial. Results are measured with precision, recall and F-measure. This study reports fairly high Inter-Annotator Agreement (IAA) results on the manually created gold standard, especially for specific tags such as names. The average IAA over all tags was 0.65 F-measure (0.84 F-measure highest pairwise agreement). For name tags the average IAA was 0.80 F-measure (0.91 F-measure highest pairwise agreement). Porting a de-identification software written for American English to Swedish directly was unfortunately non-trivial, yielding poor results. Developing gold standard sets as well as automatic systems for de-identification tasks in Swedish is feasible. However, discussions and definitions on identifiable information is needed, as well as further developments both on the tag sets and the annotation guidelines, in order to get a reliable gold standard. A completely new de-identification software needs to be developed.
Automatic medical image annotation and keyword-based image retrieval using relevance feedback.

PubMed

Ko, Byoung Chul; Lee, JiHyeon; Nam, Jae-Yeal

2012-08-01

This paper presents novel multiple keywords annotation for medical images, keyword-based medical image retrieval, and relevance feedback method for image retrieval for enhancing image retrieval performance. For semantic keyword annotation, this study proposes a novel medical image classification method combining local wavelet-based center symmetric-local binary patterns with random forests. For keyword-based image retrieval, our retrieval system use the confidence score that is assigned to each annotated keyword by combining probabilities of random forests with predefined body relation graph. To overcome the limitation of keyword-based image retrieval, we combine our image retrieval system with relevance feedback mechanism based on visual feature and pattern classifier. Compared with other annotation and relevance feedback algorithms, the proposed method shows both improved annotation performance and accurate retrieval results.
PANDORA: keyword-based analysis of protein sets by integration of annotation sources.

PubMed

Kaplan, Noam; Vaaknin, Avishay; Linial, Michal

2003-10-01

Recent advances in high-throughput methods and the application of computational tools for automatic classification of proteins have made it possible to carry out large-scale proteomic analyses. Biological analysis and interpretation of sets of proteins is a time-consuming undertaking carried out manually by experts. We have developed PANDORA (Protein ANnotation Diagram ORiented Analysis), a web-based tool that provides an automatic representation of the biological knowledge associated with any set of proteins. PANDORA uses a unique approach of keyword-based graphical analysis that focuses on detecting subsets of proteins that share unique biological properties and the intersections of such sets. PANDORA currently supports SwissProt keywords, NCBI Taxonomy, InterPro entries and the hierarchical classification terms from ENZYME, SCOP and GO databases. The integrated study of several annotation sources simultaneously allows a representation of biological relations of structure, function, cellular location, taxonomy, domains and motifs. PANDORA is also integrated into the ProtoNet system, thus allowing testing thousands of automatically generated clusters. We illustrate how PANDORA enhances the biological understanding of large, non-uniform sets of proteins originating from experimental and computational sources, without the need for prior biological knowledge on individual proteins.
Improved annotation of the insect vector of citrus greening disease: biocuration by a diverse genomics community

PubMed Central

Hosmani, Prashant S.; Villalobos-Ayala, Krystal; Miller, Sherry; Shippy, Teresa; Flores, Mirella; Rosendale, Andrew; Cordola, Chris; Bell, Tracey; Mann, Hannah; DeAvila, Gabe; DeAvila, Daniel; Moore, Zachary; Buller, Kyle; Ciolkevich, Kathryn; Nandyal, Samantha; Mahoney, Robert; Van Voorhis, Joshua; Dunlevy, Megan; Farrow, David; Hunter, David; Morgan, Taylar; Shore, Kayla; Guzman, Victoria; Izsak, Allison; Dixon, Danielle E.; Cridge, Andrew; Cano, Liliana; Cao, Xiaolong; Jiang, Haobo; Leng, Nan; Johnson, Shannon; Cantarel, Brandi L.; Richards, Stephen; English, Adam; Shatters, Robert G.; Childers, Chris; Chen, Mei-Ju; Hunter, Wayne; Cilia, Michelle; Mueller, Lukas A.; Munoz-Torres, Monica; Nelson, David; Poelchau, Monica F.; Benoit, Joshua B.; Wiersma-Koch, Helen; D’Elia, Tom; Brown, Susan J.

2017-01-01

Abstract The Asian citrus psyllid (Diaphorina citri Kuwayama) is the insect vector of the bacterium Candidatus Liberibacter asiaticus (CLas), the pathogen associated with citrus Huanglongbing (HLB, citrus greening). HLB threatens citrus production worldwide. Suppression or reduction of the insect vector using chemical insecticides has been the primary method to inhibit the spread of citrus greening disease. Accurate structural and functional annotation of the Asian citrus psyllid genome, as well as a clear understanding of the interactions between the insect and CLas, are required for development of new molecular-based HLB control methods. A draft assembly of the D. citri genome has been generated and annotated with automated pipelines. However, knowledge transfer from well-curated reference genomes such as that of Drosophila melanogaster to newly sequenced ones is challenging due to the complexity and diversity of insect genomes. To identify and improve gene models as potential targets for pest control, we manually curated several gene families with a focus on genes that have key functional roles in D. citri biology and CLas interactions. This community effort produced 530 manually curated gene models across developmental, physiological, RNAi regulatory and immunity-related pathways. As previously shown in the pea aphid, RNAi machinery genes putatively involved in the microRNA pathway have been specifically duplicated. A comprehensive transcriptome enabled us to identify a number of gene families that are either missing or misassembled in the draft genome. In order to develop biocuration as a training experience, we included undergraduate and graduate students from multiple institutions, as well as experienced annotators from the insect genomics research community. The resulting gene set (OGS v1.0) combines both automatically predicted and manually curated gene models. Database URL: https://citrusgreening.org/ PMID:29220441
A collaborative platform for consensus sessions in pathology over Internet.

PubMed

Zapletal, Eric; Le Bozec, Christel; Degoulet, Patrice; Jaulent, Marie-Christine

2003-01-01

The design of valid databases in pathology faces the problem of diagnostic disagreement between pathologists. Organizing consensus sessions between experts to reduce the variability is a difficult task. The TRIDEM platform addresses the issue to organize consensus sessions in pathology over the Internet. In this paper, we present the basis to achieve such collaborative platform. On the one hand, the platform integrates the functionalities of the IDEM consensus module that alleviates the consensus task by presenting to pathologists preliminary computed consensus through ergonomic interfaces (automatic step). On the other hand, a set of lightweight interaction tools such as vocal annotations are implemented to ease the communication between experts as they discuss a case (interactive step). The architecture of the TRIDEM platform is based on a Java-Server-Page web server that communicate with the ObjectStore PSE/PRO database used for the object storage. The HTML pages generated by the web server run Java applets to perform the different steps (automatic and interactive) of the consensus. The current limitations of the platform is to only handle a synchronous process. Moreover, improvements like re-writing the consensus workflow with a protocol such as BPML are already forecast.
Smart Annotation of Cyclic Data Using Hierarchical Hidden Markov Models.

PubMed

Martindale, Christine F; Hoenig, Florian; Strohrmann, Christina; Eskofier, Bjoern M

2017-10-13

Cyclic signals are an intrinsic part of daily life, such as human motion and heart activity. The detailed analysis of them is important for clinical applications such as pathological gait analysis and for sports applications such as performance analysis. Labeled training data for algorithms that analyze these cyclic data come at a high annotation cost due to only limited annotations available under laboratory conditions or requiring manual segmentation of the data under less restricted conditions. This paper presents a smart annotation method that reduces this cost of labeling for sensor-based data, which is applicable to data collected outside of strict laboratory conditions. The method uses semi-supervised learning of sections of cyclic data with a known cycle number. A hierarchical hidden Markov model (hHMM) is used, achieving a mean absolute error of 0.041 ± 0.020 s relative to a manually-annotated reference. The resulting model was also used to simultaneously segment and classify continuous, 'in the wild' data, demonstrating the applicability of using hHMM, trained on limited data sections, to label a complete dataset. This technique achieved comparable results to its fully-supervised equivalent. Our semi-supervised method has the significant advantage of reduced annotation cost. Furthermore, it reduces the opportunity for human error in the labeling process normally required for training of segmentation algorithms. It also lowers the annotation cost of training a model capable of continuous monitoring of cycle characteristics such as those employed to analyze the progress of movement disorders or analysis of running technique.
MIPS: analysis and annotation of proteins from whole genomes

PubMed Central

Mewes, H. W.; Amid, C.; Arnold, R.; Frishman, D.; Güldener, U.; Mannhaupt, G.; Münsterkötter, M.; Pagel, P.; Strack, N.; Stümpflen, V.; Warfsmann, J.; Ruepp, A.

2004-01-01

The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein–protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de). PMID:14681354
MIPS: analysis and annotation of proteins from whole genomes.

PubMed

Mewes, H W; Amid, C; Arnold, R; Frishman, D; Güldener, U; Mannhaupt, G; Münsterkötter, M; Pagel, P; Strack, N; Stümpflen, V; Warfsmann, J; Ruepp, A

2004-01-01

The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein-protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).
Trust, control strategies and allocation of function in human-machine systems.

PubMed

Lee, J; Moray, N

1992-10-01

As automated controllers supplant human intervention in controlling complex systems, the operators' role often changes from that of an active controller to that of a supervisory controller. Acting as supervisors, operators can choose between automatic and manual control. Improperly allocating function between automatic and manual control can have negative consequences for the performance of a system. Previous research suggests that the decision to perform the job manually or automatically depends, in part, upon the trust the operators invest in the automatic controllers. This paper reports an experiment to characterize the changes in operators' trust during an interaction with a semi-automatic pasteurization plant, and investigates the relationship between changes in operators' control strategies and trust. A regression model identifies the causes of changes in trust, and a 'trust transfer function' is developed using time series analysis to describe the dynamics of trust. Based on a detailed analysis of operators' strategies in response to system faults we suggest a model for the choice between manual and automatic control, based on trust in automatic controllers and self-confidence in the ability to control the system manually.
Application of a semi-automatic cartilage segmentation method for biomechanical modeling of the knee joint.

PubMed

Liukkonen, Mimmi K; Mononen, Mika E; Tanska, Petri; Saarakkala, Simo; Nieminen, Miika T; Korhonen, Rami K

2017-10-01

Manual segmentation of articular cartilage from knee joint 3D magnetic resonance images (MRI) is a time consuming and laborious task. Thus, automatic methods are needed for faster and reproducible segmentations. In the present study, we developed a semi-automatic segmentation method based on radial intensity profiles to generate 3D geometries of knee joint cartilage which were then used in computational biomechanical models of the knee joint. Six healthy volunteers were imaged with a 3T MRI device and their knee cartilages were segmented both manually and semi-automatically. The values of cartilage thicknesses and volumes produced by these two methods were compared. Furthermore, the influences of possible geometrical differences on cartilage stresses and strains in the knee were evaluated with finite element modeling. The semi-automatic segmentation and 3D geometry construction of one knee joint (menisci, femoral and tibial cartilages) was approximately two times faster than with manual segmentation. Differences in cartilage thicknesses, volumes, contact pressures, stresses, and strains between segmentation methods in femoral and tibial cartilage were mostly insignificant (p > 0.05) and random, i.e. there were no systematic differences between the methods. In conclusion, the devised semi-automatic segmentation method is a quick and accurate way to determine cartilage geometries; it may become a valuable tool for biomechanical modeling applications with large patient groups.
Transcription and Annotation of a Japanese Accented Spoken Corpus of L2 Spanish for the Development of CAPT Applications

ERIC Educational Resources Information Center

Carranza, Mario

2016-01-01

This paper addresses the process of transcribing and annotating spontaneous non-native speech with the aim of compiling a training corpus for the development of Computer Assisted Pronunciation Training (CAPT) applications, enhanced with Automatic Speech Recognition (ASR) technology. To better adapt ASR technology to CAPT tools, the recognition…
DiffNet: automatic differential functional summarization of dE-MAP networks.

PubMed

Seah, Boon-Siew; Bhowmick, Sourav S; Dewey, C Forbes

2014-10-01

The study of genetic interaction networks that respond to changing conditions is an emerging research problem. Recently, Bandyopadhyay et al. (2010) proposed a technique to construct a differential network (dE-MAPnetwork) from two static gene interaction networks in order to map the interaction differences between them under environment or condition change (e.g., DNA-damaging agent). This differential network is then manually analyzed to conclude that DNA repair is differentially effected by the condition change. Unfortunately, manual construction of differential functional summary from a dE-MAP network that summarizes all pertinent functional responses is time-consuming, laborious and error-prone, impeding large-scale analysis on it. To this end, we propose DiffNet, a novel data-driven algorithm that leverages Gene Ontology (go) annotations to automatically summarize a dE-MAP network to obtain a high-level map of functional responses due to condition change. We tested DiffNet on the dynamic interaction networks following MMS treatment and demonstrated the superiority of our approach in generating differential functional summaries compared to state-of-the-art graph clustering methods. We studied the effects of parameters in DiffNet in controlling the quality of the summary. We also performed a case study that illustrates its utility. Copyright © 2014 Elsevier Inc. All rights reserved.
compMS2Miner: An Automatable Metabolite Identification, Visualization, and Data-Sharing R Package for High-Resolution LC-MS Data Sets.

PubMed

Edmands, William M B; Petrick, Lauren; Barupal, Dinesh K; Scalbert, Augustin; Wilson, Mark J; Wickliffe, Jeffrey K; Rappaport, Stephen M

2017-04-04

A long-standing challenge of untargeted metabolomic profiling by ultrahigh-performance liquid chromatography-high-resolution mass spectrometry (UHPLC-HRMS) is efficient transition from unknown mass spectral features to confident metabolite annotations. The compMS 2 Miner (Comprehensive MS 2 Miner) package was developed in the R language to facilitate rapid, comprehensive feature annotation using a peak-picker-output and MS 2 data files as inputs. The number of MS 2 spectra that can be collected during a metabolomic profiling experiment far outweigh the amount of time required for pain-staking manual interpretation; therefore, a degree of software workflow autonomy is required for broad-scale metabolite annotation. CompMS 2 Miner integrates many useful tools in a single workflow for metabolite annotation and also provides a means to overview the MS 2 data with a Web application GUI compMS 2 Explorer (Comprehensive MS 2 Explorer) that also facilitates data-sharing and transparency. The automatable compMS 2 Miner workflow consists of the following steps: (i) matching unknown MS 1 features to precursor MS 2 scans, (ii) filtration of spectral noise (dynamic noise filter), (iii) generation of composite mass spectra by multiple similar spectrum signal summation and redundant/contaminant spectra removal, (iv) interpretation of possible fragment ion substructure using an internal database, (v) annotation of unknowns with chemical and spectral databases with prediction of mammalian biotransformation metabolites, wrapper functions for in silico fragmentation software, nearest neighbor chemical similarity scoring, random forest based retention time prediction, text-mining based false positive removal/true positive ranking, chemical taxonomic prediction and differential evolution based global annotation score optimization, and (vi) network graph visualizations, data curation, and sharing are made possible via the compMS 2 Explorer application. Metabolite identities and comments can also be recorded using an interactive table within compMS 2 Explorer. The utility of the package is illustrated with a data set of blood serum samples from 7 diet induced obese (DIO) and 7 nonobese (NO) C57BL/6J mice, which were also treated with an antibiotic (streptomycin) to knockdown the gut microbiota. The results of fully autonomous and objective usage of compMS 2 Miner are presented here. All automatically annotated spectra output by the workflow are provided in the Supporting Information and can alternatively be explored as publically available compMS 2 Explorer applications for both positive and negative modes ( https://wmbedmands.shinyapps.io/compMS2_mouseSera_POS and https://wmbedmands.shinyapps.io/compMS2_mouseSera_NEG ). The workflow provided rapid annotation of a diversity of endogenous and gut microbially derived metabolites affected by both diet and antibiotic treatment, which conformed to previously published reports. Composite spectra (n = 173) were autonomously matched to entries of the Massbank of North America (MoNA) spectral repository. These experimental and virtual (lipidBlast) spectra corresponded to 29 common endogenous compound classes (e.g., 51 lysophosphatidylcholines spectra) and were then used to calculate the ranking capability of 7 individual scoring metrics. It was found that an average of the 7 individual scoring metrics provided the most effective weighted average ranking ability of 3 for the MoNA matched spectra in spite of potential risk of false positive annotations emerging from automation. Minor structural differences such as relative carbon-carbon double bond positions were found in several cases to affect the correct rank of the MoNA annotated metabolite. The latest release and an example workflow is available in the package vignette ( https://github.com/WMBEdmands/compMS2Miner ) and a version of the published application is available on the shinyapps.io site ( https://wmbedmands.shinyapps.io/compMS2Example ).
Efficient Semi-Automatic 3D Segmentation for Neuron Tracing in Electron Microscopy Images

PubMed Central

Jones, Cory; Liu, Ting; Cohan, Nathaniel Wood; Ellisman, Mark; Tasdizen, Tolga

2015-01-01

0.1. Background In the area of connectomics, there is a significant gap between the time required for data acquisition and dense reconstruction of the neural processes contained in the same dataset. Automatic methods are able to eliminate this timing gap, but the state-of-the-art accuracy so far is insufficient for use without user corrections. If completed naively, this process of correction can be tedious and time consuming. 0.2. New Method We present a new semi-automatic method that can be used to perform 3D segmentation of neurites in EM image stacks. It utilizes an automatic method that creates a hierarchical structure for recommended merges of superpixels. The user is then guided through each predicted region to quickly identify errors and establish correct links. 0.3. Results We tested our method on three datasets with both novice and expert users. Accuracy and timing were compared with published automatic, semi-automatic, and manual results. 0.4. Comparison with Existing Methods Post-automatic correction methods have also been used in [1] and [2]. These methods do not provide navigation or suggestions in the manner we present. Other semi-automatic methods require user input prior to the automatic segmentation such as [3] and [4] and are inherently different than our method. 0.5. Conclusion Using this method on the three datasets, novice users achieved accuracy exceeding state-of-the-art automatic results, and expert users achieved accuracy on par with full manual labeling but with a 70% time improvement when compared with other examples in publication. PMID:25769273
Automatic analysis of microscopic images of red blood cell aggregates

NASA Astrophysics Data System (ADS)

Menichini, Pablo A.; Larese, Mónica G.; Riquelme, Bibiana D.

2015-06-01

Red blood cell aggregation is one of the most important factors in blood viscosity at stasis or at very low rates of flow. The basic structure of aggregates is a linear array of cell commonly termed as rouleaux. Enhanced or abnormal aggregation is seen in clinical conditions, such as diabetes and hypertension, producing alterations in the microcirculation, some of which can be analyzed through the characterization of aggregated cells. Frequently, image processing and analysis for the characterization of RBC aggregation were done manually or semi-automatically using interactive tools. We propose a system that processes images of RBC aggregation and automatically obtains the characterization and quantification of the different types of RBC aggregates. Present technique could be interesting to perform the adaptation as a routine used in hemorheological and Clinical Biochemistry Laboratories because this automatic method is rapid, efficient and economical, and at the same time independent of the user performing the analysis (repeatability of the analysis).
Tools and Equipment Modeling for Automobile Interactive Assembling Operating Simulation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wu Dianliang; Zhu Hongmin; Shanghai Key Laboratory of Advance Manufacturing Environment

Tools and equipment play an important role in the simulation of virtual assembly, especially in the assembly process simulation and plan. Because of variety in function and complexity in structure and manipulation, the simulation of tools and equipments remains to be a challenge for interactive assembly operation. Based on analysis of details and characteristics of interactive operations for automobile assembly, the functional requirement for tools and equipments of automobile assembly is given. Then, a unified modeling method for information expression and function realization of general tools and equipments is represented, and the handling methods of manual, semi-automatic, automatic tools andmore » equipments are discussed. Finally, the application in assembly simulation of rear suspension and front suspension of Roewe 750 automobile is given. The result shows that the modeling and handling methods are applicable in the interactive simulation of various tools and equipments, and can also be used for supporting assembly process planning in virtual environment.« less

Characterization of the mechanism of drug-drug interactions from PubMed using MeSH terms.

PubMed

Lu, Yin; Figler, Bryan; Huang, Hong; Tu, Yi-Cheng; Wang, Ju; Cheng, Feng

2017-01-01

Identifying drug-drug interaction (DDI) is an important topic for the development of safe pharmaceutical drugs and for the optimization of multidrug regimens for complex diseases such as cancer and HIV. There have been about 150,000 publications on DDIs in PubMed, which is a great resource for DDI studies. In this paper, we introduced an automatic computational method for the systematic analysis of the mechanism of DDIs using MeSH (Medical Subject Headings) terms from PubMed literature. MeSH term is a controlled vocabulary thesaurus developed by the National Library of Medicine for indexing and annotating articles. Our method can effectively identify DDI-relevant MeSH terms such as drugs, proteins and phenomena with high accuracy. The connections among these MeSH terms were investigated by using co-occurrence heatmaps and social network analysis. Our approach can be used to visualize relationships of DDI terms, which has the potential to help users better understand DDIs. As the volume of PubMed records increases, our method for automatic analysis of DDIs from the PubMed database will become more accurate.
A robust data-driven approach for gene ontology annotation.

PubMed

Li, Yanpeng; Yu, Hong

2014-01-01

Gene ontology (GO) and GO annotation are important resources for biological information management and knowledge discovery, but the speed of manual annotation became a major bottleneck of database curation. BioCreative IV GO annotation task aims to evaluate the performance of system that automatically assigns GO terms to genes based on the narrative sentences in biomedical literature. This article presents our work in this task as well as the experimental results after the competition. For the evidence sentence extraction subtask, we built a binary classifier to identify evidence sentences using reference distance estimator (RDE), a recently proposed semi-supervised learning method that learns new features from around 10 million unlabeled sentences, achieving an F1 of 19.3% in exact match and 32.5% in relaxed match. In the post-submission experiment, we obtained 22.1% and 35.7% F1 performance by incorporating bigram features in RDE learning. In both development and test sets, RDE-based method achieved over 20% relative improvement on F1 and AUC performance against classical supervised learning methods, e.g. support vector machine and logistic regression. For the GO term prediction subtask, we developed an information retrieval-based method to retrieve the GO term most relevant to each evidence sentence using a ranking function that combined cosine similarity and the frequency of GO terms in documents, and a filtering method based on high-level GO classes. The best performance of our submitted runs was 7.8% F1 and 22.2% hierarchy F1. We found that the incorporation of frequency information and hierarchy filtering substantially improved the performance. In the post-submission evaluation, we obtained a 10.6% F1 using a simpler setting. Overall, the experimental analysis showed our approaches were robust in both the two tasks. © The Author(s) 2014. Published by Oxford University Press.
Nonlinear Deep Kernel Learning for Image Annotation.

PubMed

Jiu, Mingyuan; Sahbi, Hichem

2017-02-08

Multiple kernel learning (MKL) is a widely used technique for kernel design. Its principle consists in learning, for a given support vector classifier, the most suitable convex (or sparse) linear combination of standard elementary kernels. However, these combinations are shallow and often powerless to capture the actual similarity between highly semantic data, especially for challenging classification tasks such as image annotation. In this paper, we redefine multiple kernels using deep multi-layer networks. In this new contribution, a deep multiple kernel is recursively defined as a multi-layered combination of nonlinear activation functions, each one involves a combination of several elementary or intermediate kernels, and results into a positive semi-definite deep kernel. We propose four different frameworks in order to learn the weights of these networks: supervised, unsupervised, kernel-based semisupervised and Laplacian-based semi-supervised. When plugged into support vector machines (SVMs), the resulting deep kernel networks show clear gain, compared to several shallow kernels for the task of image annotation. Extensive experiments and analysis on the challenging ImageCLEF photo annotation benchmark, the COREL5k database and the Banana dataset validate the effectiveness of the proposed method.
TRECVID: the utility of a content-based video retrieval evaluation

NASA Astrophysics Data System (ADS)

Hauptmann, Alexander G.

2006-01-01

TRECVID, an annual retrieval evaluation benchmark organized by NIST, encourages research in information retrieval from digital video. TRECVID benchmarking covers both interactive and manual searching by end users, as well as the benchmarking of some supporting technologies including shot boundary detection, extraction of semantic features, and the automatic segmentation of TV news broadcasts. Evaluations done in the context of the TRECVID benchmarks show that generally, speech transcripts and annotations provide the single most important clue for successful retrieval. However, automatically finding the individual images is still a tremendous and unsolved challenge. The evaluations repeatedly found that none of the multimedia analysis and retrieval techniques provide a significant benefit over retrieval using only textual information such as from automatic speech recognition transcripts or closed captions. In interactive systems, we do find significant differences among the top systems, indicating that interfaces can make a huge difference for effective video/image search. For interactive tasks efficient interfaces require few key clicks, but display large numbers of images for visual inspection by the user. The text search finds the right context region in the video in general, but to select specific relevant images we need good interfaces to easily browse the storyboard pictures. In general, TRECVID has motivated the video retrieval community to be honest about what we don't know how to do well (sometimes through painful failures), and has focused us to work on the actual task of video retrieval, as opposed to flashy demos based on technological capabilities.
Caveat emptor: limitations of the automated reconstruction of metabolic pathways in Plasmodium.

PubMed

Ginsburg, Hagai

2009-01-01

The functional reconstruction of metabolic pathways from an annotated genome is a tedious and demanding enterprise. Automation of this endeavor using bioinformatics algorithms could cope with the ever-increasing number of sequenced genomes and accelerate the process. Here, the manual reconstruction of metabolic pathways in the functional genomic database of Plasmodium falciparum--Malaria Parasite Metabolic Pathways--is described and compared with pathways generated automatically as they appear in PlasmoCyc, metaSHARK and the Kyoto Encyclopedia for Genes and Genomes. A critical evaluation of this comparison discloses that the automatic reconstruction of pathways generates manifold paths that need an expert manual verification to accept some and reject most others based on manually curated gene annotation.
EUCLID: automatic classification of proteins in functional classes by their database annotations.

PubMed

Tamames, J; Ouzounis, C; Casari, G; Sander, C; Valencia, A

1998-01-01

A tool is described for the automatic classification of sequences in functional classes using their database annotations. The Euclid system is based on a simple learning procedure from examples provided by human experts. Euclid is freely available for academics at http://www.gredos.cnb.uam.es/EUCLID, with the corresponding dictionaries for the generation of three, eight and 14 functional classes. E-mail: valencia@cnb.uam.es The results of the EUCLID classification of different genomes are available at http://www.sander.ebi.ac. uk/genequiz/. A detailed description of the different applications mentioned in the text is available at http://www.gredos.cnb.uam. es/EUCLID/Full_Paper
Application of MPEG-7 descriptors for content-based indexing of sports videos

NASA Astrophysics Data System (ADS)

Hoeynck, Michael; Auweiler, Thorsten; Ohm, Jens-Rainer

2003-06-01

The amount of multimedia data available worldwide is increasing every day. There is a vital need to annotate multimedia data in order to allow universal content access and to provide content-based search-and-retrieval functionalities. Since supervised video annotation can be time consuming, an automatic solution is appreciated. We review recent approaches to content-based indexing and annotation of videos for different kind of sports, and present our application for the automatic annotation of equestrian sports videos. Thereby, we especially concentrate on MPEG-7 based feature extraction and content description. We apply different visual descriptors for cut detection. Further, we extract the temporal positions of single obstacles on the course by analyzing MPEG-7 edge information and taking specific domain knowledge into account. Having determined single shot positions as well as the visual highlights, the information is jointly stored together with additional textual information in an MPEG-7 description scheme. Using this information, we generate content summaries which can be utilized in a user front-end in order to provide content-based access to the video stream, but further content-based queries and navigation on a video-on-demand streaming server.
Assemble: an interactive graphical tool to analyze and build RNA architectures at the 2D and 3D levels.

PubMed

Jossinet, Fabrice; Ludwig, Thomas E; Westhof, Eric

2010-08-15

Assemble is an intuitive graphical interface to analyze, manipulate and build complex 3D RNA architectures. It provides several advanced and unique features within the framework of a semi-automated modeling process that can be performed by homology and ab initio with or without electron density maps. Those include the interactive editing of a secondary structure and a searchable, embedded library of annotated tertiary structures. Assemble helps users with performing recurrent and otherwise tedious tasks in structural RNA research. Assemble is released under an open-source license (MIT license) and is freely available at http://bioinformatics.org/assemble. It is implemented in the Java language and runs on MacOSX, Linux and Windows operating systems.
Automatic segmentation of triaxial accelerometry signals for falls risk estimation.

PubMed

Redmond, Stephen J; Scalzi, Maria Elena; Narayanan, Michael R; Lord, Stephen R; Cerutti, Sergio; Lovell, Nigel H

2010-01-01

Falls-related injuries in the elderly population represent one of the most significant contributors to rising health care expense in developed countries. In recent years, falls detection technologies have become more common. However, very few have adopted a preferable falls prevention strategy through unsupervised monitoring in the free-living environment. The basis of the monitoring described herein was a self-administered directed-routine (DR) comprising three separate tests measured by way of a waist-mounted triaxial accelerometer. Using features extracted from the manually segmented signals, a reasonable estimate of falls risk can be achieved. We describe here a series of algorithms for automatically segmenting these recordings, enabling the use of the DR assessment in the unsupervised and home environments. The accelerometry signals, from 68 subjects performing the DR, were manually annotated by an observer. Using the proposed signal segmentation routines, an good agreement was observed between the manually annotated markers and the automatically estimated values. However, a decrease in the correlation with falls risk to 0.73 was observed using the automatic segmentation, compared to 0.81 when using markers manually placed by an observer.
An orthology-based analysis of pathogenic protozoa impacting global health: an improved comparative genomics approach with prokaryotes and model eukaryote orthologs.

PubMed

Cuadrat, Rafael R C; da Serra Cruz, Sérgio Manuel; Tschoeke, Diogo Antônio; Silva, Edno; Tosta, Frederico; Jucá, Henrique; Jardim, Rodrigo; Campos, Maria Luiza M; Mattoso, Marta; Dávila, Alberto M R

2014-08-01

A key focus in 21(st) century integrative biology and drug discovery for neglected tropical and other diseases has been the use of BLAST-based computational methods for identification of orthologous groups in pathogenic organisms to discern orthologs, with a view to evaluate similarities and differences among species, and thus allow the transfer of annotation from known/curated proteins to new/non-annotated ones. We used here a profile-based sensitive methodology to identify distant homologs, coupled to the NCBI's COG (Unicellular orthologs) and KOG (Eukaryote orthologs), permitting us to perform comparative genomics analyses on five protozoan genomes. OrthoSearch was used in five protozoan proteomes showing that 3901 and 7473 orthologs can be identified by comparison with COG and KOG proteomes, respectively. The core protozoa proteome inferred was 418 Protozoa-COG orthologous groups and 704 Protozoa-KOG orthologous groups: (i) 31.58% (132/418) belongs to the category J (translation, ribosomal structure, and biogenesis), and 9.81% (41/418) to the category O (post-translational modification, protein turnover, chaperones) using COG; (ii) 21.45% (151/704) belongs to the categories J, and 13.92% (98/704) to the O using KOG. The phylogenomic analysis showed four well-supported clades for Eukarya, discriminating Multicellular [(i) human, fly, plant and worm] and Unicellular [(ii) yeast, (iii) fungi, and (iv) protozoa] species. These encouraging results attest to the usefulness of the profile-based methodology for comparative genomics to accelerate semi-automatic re-annotation, especially of the protozoan proteomes. This approach may also lend itself for applications in global health, for example, in the case of novel drug target discovery against pathogenic organisms previously considered difficult to research with traditional drug discovery tools.
An Orthology-Based Analysis of Pathogenic Protozoa Impacting Global Health: An Improved Comparative Genomics Approach with Prokaryotes and Model Eukaryote Orthologs

PubMed Central

Cuadrat, Rafael R. C.; da Serra Cruz, Sérgio Manuel; Tschoeke, Diogo Antônio; Silva, Edno; Tosta, Frederico; Jucá, Henrique; Jardim, Rodrigo; Campos, Maria Luiza M.; Mattoso, Marta

2014-01-01

Abstract A key focus in 21st century integrative biology and drug discovery for neglected tropical and other diseases has been the use of BLAST-based computational methods for identification of orthologous groups in pathogenic organisms to discern orthologs, with a view to evaluate similarities and differences among species, and thus allow the transfer of annotation from known/curated proteins to new/non-annotated ones. We used here a profile-based sensitive methodology to identify distant homologs, coupled to the NCBI's COG (Unicellular orthologs) and KOG (Eukaryote orthologs), permitting us to perform comparative genomics analyses on five protozoan genomes. OrthoSearch was used in five protozoan proteomes showing that 3901 and 7473 orthologs can be identified by comparison with COG and KOG proteomes, respectively. The core protozoa proteome inferred was 418 Protozoa-COG orthologous groups and 704 Protozoa-KOG orthologous groups: (i) 31.58% (132/418) belongs to the category J (translation, ribosomal structure, and biogenesis), and 9.81% (41/418) to the category O (post-translational modification, protein turnover, chaperones) using COG; (ii) 21.45% (151/704) belongs to the categories J, and 13.92% (98/704) to the O using KOG. The phylogenomic analysis showed four well-supported clades for Eukarya, discriminating Multicellular [(i) human, fly, plant and worm] and Unicellular [(ii) yeast, (iii) fungi, and (iv) protozoa] species. These encouraging results attest to the usefulness of the profile-based methodology for comparative genomics to accelerate semi-automatic re-annotation, especially of the protozoan proteomes. This approach may also lend itself for applications in global health, for example, in the case of novel drug target discovery against pathogenic organisms previously considered difficult to research with traditional drug discovery tools. PMID:24960463
PipeOnline 2.0: automated EST processing and functional data sorting.

PubMed

Ayoubi, Patricia; Jin, Xiaojing; Leite, Saul; Liu, Xianghui; Martajaja, Jeson; Abduraham, Abdurashid; Wan, Qiaolan; Yan, Wei; Misawa, Eduardo; Prade, Rolf A

2002-11-01

Expressed sequence tags (ESTs) are generated and deposited in the public domain, as redundant, unannotated, single-pass reactions, with virtually no biological content. PipeOnline automatically analyses and transforms large collections of raw DNA-sequence data from chromatograms or FASTA files by calling the quality of bases, screening and removing vector sequences, assembling and rewriting consensus sequences of redundant input files into a unigene EST data set and finally through translation, amino acid sequence similarity searches, annotation of public databases and functional data. PipeOnline generates an annotated database, retaining the processed unigene sequence, clone/file history, alignments with similar sequences, and proposed functional classification, if available. Functional annotation is automatic and based on a novel method that relies on homology of amino acid sequence multiplicity within GenBank records. Records are examined through a function ordered browser or keyword queries with automated export of results. PipeOnline offers customization for individual projects (MyPipeOnline), automated updating and alert service. PipeOnline is available at http://stress-genomics.org.
Annotated chemical patent corpus: a gold standard for text mining.

PubMed

Akhondi, Saber A; Klenner, Alexander G; Tyrchan, Christian; Manchala, Anil K; Boppana, Kiran; Lowe, Daniel; Zimmermann, Marc; Jagarlapudi, Sarma A R P; Sayle, Roger; Kors, Jan A; Muresan, Sorel

2014-01-01

Exploring the chemical and biological space covered by patent applications is crucial in early-stage medicinal chemistry activities. Patent analysis can provide understanding of compound prior art, novelty checking, validation of biological assays, and identification of new starting points for chemical exploration. Extracting chemical and biological entities from patents through manual extraction by expert curators can take substantial amount of time and resources. Text mining methods can help to ease this process. To validate the performance of such methods, a manually annotated patent corpus is essential. In this study we have produced a large gold standard chemical patent corpus. We developed annotation guidelines and selected 200 full patents from the World Intellectual Property Organization, United States Patent and Trademark Office, and European Patent Office. The patents were pre-annotated automatically and made available to four independent annotator groups each consisting of two to ten annotators. The annotators marked chemicals in different subclasses, diseases, targets, and modes of action. Spelling mistakes and spurious line break due to optical character recognition errors were also annotated. A subset of 47 patents was annotated by at least three annotator groups, from which harmonized annotations and inter-annotator agreement scores were derived. One group annotated the full set. The patent corpus includes 400,125 annotations for the full set and 36,537 annotations for the harmonized set. All patents and annotated entities are publicly available at www.biosemantics.org.
Annotated Chemical Patent Corpus: A Gold Standard for Text Mining

PubMed Central

Akhondi, Saber A.; Klenner, Alexander G.; Tyrchan, Christian; Manchala, Anil K.; Boppana, Kiran; Lowe, Daniel; Zimmermann, Marc; Jagarlapudi, Sarma A. R. P.; Sayle, Roger; Kors, Jan A.; Muresan, Sorel

2014-01-01

Exploring the chemical and biological space covered by patent applications is crucial in early-stage medicinal chemistry activities. Patent analysis can provide understanding of compound prior art, novelty checking, validation of biological assays, and identification of new starting points for chemical exploration. Extracting chemical and biological entities from patents through manual extraction by expert curators can take substantial amount of time and resources. Text mining methods can help to ease this process. To validate the performance of such methods, a manually annotated patent corpus is essential. In this study we have produced a large gold standard chemical patent corpus. We developed annotation guidelines and selected 200 full patents from the World Intellectual Property Organization, United States Patent and Trademark Office, and European Patent Office. The patents were pre-annotated automatically and made available to four independent annotator groups each consisting of two to ten annotators. The annotators marked chemicals in different subclasses, diseases, targets, and modes of action. Spelling mistakes and spurious line break due to optical character recognition errors were also annotated. A subset of 47 patents was annotated by at least three annotator groups, from which harmonized annotations and inter-annotator agreement scores were derived. One group annotated the full set. The patent corpus includes 400,125 annotations for the full set and 36,537 annotations for the harmonized set. All patents and annotated entities are publicly available at www.biosemantics.org. PMID:25268232
Annotation Graphs: A Graph-Based Visualization for Meta-Analysis of Data Based on User-Authored Annotations.

PubMed

Zhao, Jian; Glueck, Michael; Breslav, Simon; Chevalier, Fanny; Khan, Azam

2017-01-01

User-authored annotations of data can support analysts in the activity of hypothesis generation and sensemaking, where it is not only critical to document key observations, but also to communicate insights between analysts. We present annotation graphs, a dynamic graph visualization that enables meta-analysis of data based on user-authored annotations. The annotation graph topology encodes annotation semantics, which describe the content of and relations between data selections, comments, and tags. We present a mixed-initiative approach to graph layout that integrates an analyst's manual manipulations with an automatic method based on similarity inferred from the annotation semantics. Various visual graph layout styles reveal different perspectives on the annotation semantics. Annotation graphs are implemented within C8, a system that supports authoring annotations during exploratory analysis of a dataset. We apply principles of Exploratory Sequential Data Analysis (ESDA) in designing C8, and further link these to an existing task typology in the visualization literature. We develop and evaluate the system through an iterative user-centered design process with three experts, situated in the domain of analyzing HCI experiment data. The results suggest that annotation graphs are effective as a method of visually extending user-authored annotations to data meta-analysis for discovery and organization of ideas.
MIPS bacterial genomes functional annotation benchmark dataset.

PubMed

Tetko, Igor V; Brauner, Barbara; Dunger-Kaltenbach, Irmtraud; Frishman, Goar; Montrone, Corinna; Fobo, Gisela; Ruepp, Andreas; Antonov, Alexey V; Surmeli, Dimitrij; Mewes, Hans-Wernen

2005-05-15

Any development of new methods for automatic functional annotation of proteins according to their sequences requires high-quality data (as benchmark) as well as tedious preparatory work to generate sequence parameters required as input data for the machine learning methods. Different program settings and incompatible protocols make a comparison of the analyzed methods difficult. The MIPS Bacterial Functional Annotation Benchmark dataset (MIPS-BFAB) is a new, high-quality resource comprising four bacterial genomes manually annotated according to the MIPS functional catalogue (FunCat). These resources include precalculated sequence parameters, such as sequence similarity scores, InterPro domain composition and other parameters that could be used to develop and benchmark methods for functional annotation of bacterial protein sequences. These data are provided in XML format and can be used by scientists who are not necessarily experts in genome annotation. BFAB is available at http://mips.gsf.de/proj/bfab
Semi-automatic, octave-spanning optical frequency counter.

PubMed

Liu, Tze-An; Shu, Ren-Huei; Peng, Jin-Long

2008-07-07

This work presents and demonstrates a semi-automatic optical frequency counter with octave-spanning counting capability using two fiber laser combs operated at different repetition rates. Monochromators are utilized to provide an approximate frequency of the laser under measurement to determine the mode number difference between the two laser combs. The exact mode number of the beating comb line is obtained from the mode number difference and the measured beat frequencies. The entire measurement process, except the frequency stabilization of the laser combs and the optimization of the beat signal-to-noise ratio, is controlled by a computer running a semi-automatic optical frequency counter.
Accurate and consistent automatic seismocardiogram annotation without concurrent ECG.

PubMed

Laurin, A; Khosrow-Khavar, F; Blaber, A P; Tavakolian, Kouhyar

2016-09-01

Seismocardiography (SCG) is the measurement of vibrations in the sternum caused by the beating of the heart. Precise cardiac mechanical timings that are easily obtained from SCG are critically dependent on accurate identification of fiducial points. So far, SCG annotation has relied on concurrent ECG measurements. An algorithm capable of annotating SCG without the use any other concurrent measurement was designed. We subjected 18 participants to graded lower body negative pressure. We collected ECG and SCG, obtained R peaks from the former, and annotated the latter by hand, using these identified peaks. We also annotated the SCG automatically. We compared the isovolumic moment timings obtained by hand to those obtained using our algorithm. Mean ± confidence interval of the percentage of accurately annotated cardiac cycles were [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text], and [Formula: see text] for levels of negative pressure 0, -20, -30, -40, and -50 mmHg. LF/HF ratios, the relative power of low-frequency variations to high-frequency variations in heart beat intervals, obtained from isovolumic moments were also compared to those obtained from R peaks. The mean differences ± confidence interval were [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text], and [Formula: see text] for increasing levels of negative pressure. The accuracy and consistency of the algorithm enables the use of SCG as a stand-alone heart monitoring tool in healthy individuals at rest, and could serve as a basis for an eventual application in pathological cases.
Real-time image annotation by manifold-based biased Fisher discriminant analysis

NASA Astrophysics Data System (ADS)

Ji, Rongrong; Yao, Hongxun; Wang, Jicheng; Sun, Xiaoshuai; Liu, Xianming

2008-01-01

Automatic Linguistic Annotation is a promising solution to bridge the semantic gap in content-based image retrieval. However, two crucial issues are not well addressed in state-of-art annotation algorithms: 1. The Small Sample Size (3S) problem in keyword classifier/model learning; 2. Most of annotation algorithms can not extend to real-time online usage due to their low computational efficiencies. This paper presents a novel Manifold-based Biased Fisher Discriminant Analysis (MBFDA) algorithm to address these two issues by transductive semantic learning and keyword filtering. To address the 3S problem, Co-Training based Manifold learning is adopted for keyword model construction. To achieve real-time annotation, a Bias Fisher Discriminant Analysis (BFDA) based semantic feature reduction algorithm is presented for keyword confidence discrimination and semantic feature reduction. Different from all existing annotation methods, MBFDA views image annotation from a novel Eigen semantic feature (which corresponds to keywords) selection aspect. As demonstrated in experiments, our manifold-based biased Fisher discriminant analysis annotation algorithm outperforms classical and state-of-art annotation methods (1.K-NN Expansion; 2.One-to-All SVM; 3.PWC-SVM) in both computational time and annotation accuracy with a large margin.
MIPS: curated databases and comprehensive secondary data resources in 2010.

PubMed

Mewes, H Werner; Ruepp, Andreas; Theis, Fabian; Rattei, Thomas; Walter, Mathias; Frishman, Dmitrij; Suhre, Karsten; Spannagl, Manuel; Mayer, Klaus F X; Stümpflen, Volker; Antonov, Alexey

2011-01-01

The Munich Information Center for Protein Sequences (MIPS at the Helmholtz Center for Environmental Health, Neuherberg, Germany) has many years of experience in providing annotated collections of biological data. Selected data sets of high relevance, such as model genomes, are subjected to careful manual curation, while the bulk of high-throughput data is annotated by automatic means. High-quality reference resources developed in the past and still actively maintained include Saccharomyces cerevisiae, Neurospora crassa and Arabidopsis thaliana genome databases as well as several protein interaction data sets (MPACT, MPPI and CORUM). More recent projects are PhenomiR, the database on microRNA-related phenotypes, and MIPS PlantsDB for integrative and comparative plant genome research. The interlinked resources SIMAP and PEDANT provide homology relationships as well as up-to-date and consistent annotation for 38,000,000 protein sequences. PPLIPS and CCancer are versatile tools for proteomics and functional genomics interfacing to a database of compilations from gene lists extracted from literature. A novel literature-mining tool, EXCERBT, gives access to structured information on classified relations between genes, proteins, phenotypes and diseases extracted from Medline abstracts by semantic analysis. All databases described here, as well as the detailed descriptions of our projects can be accessed through the MIPS WWW server (http://mips.helmholtz-muenchen.de).

MIPS: curated databases and comprehensive secondary data resources in 2010

PubMed Central

Mewes, H. Werner; Ruepp, Andreas; Theis, Fabian; Rattei, Thomas; Walter, Mathias; Frishman, Dmitrij; Suhre, Karsten; Spannagl, Manuel; Mayer, Klaus F.X.; Stümpflen, Volker; Antonov, Alexey

2011-01-01

The Munich Information Center for Protein Sequences (MIPS at the Helmholtz Center for Environmental Health, Neuherberg, Germany) has many years of experience in providing annotated collections of biological data. Selected data sets of high relevance, such as model genomes, are subjected to careful manual curation, while the bulk of high-throughput data is annotated by automatic means. High-quality reference resources developed in the past and still actively maintained include Saccharomyces cerevisiae, Neurospora crassa and Arabidopsis thaliana genome databases as well as several protein interaction data sets (MPACT, MPPI and CORUM). More recent projects are PhenomiR, the database on microRNA-related phenotypes, and MIPS PlantsDB for integrative and comparative plant genome research. The interlinked resources SIMAP and PEDANT provide homology relationships as well as up-to-date and consistent annotation for 38 000 000 protein sequences. PPLIPS and CCancer are versatile tools for proteomics and functional genomics interfacing to a database of compilations from gene lists extracted from literature. A novel literature-mining tool, EXCERBT, gives access to structured information on classified relations between genes, proteins, phenotypes and diseases extracted from Medline abstracts by semantic analysis. All databases described here, as well as the detailed descriptions of our projects can be accessed through the MIPS WWW server (http://mips.helmholtz-muenchen.de). PMID:21109531
TRAP: automated classification, quantification and annotation of tandemly repeated sequences.

PubMed

Sobreira, Tiago José P; Durham, Alan M; Gruber, Arthur

2006-02-01

TRAP, the Tandem Repeats Analysis Program, is a Perl program that provides a unified set of analyses for the selection, classification, quantification and automated annotation of tandemly repeated sequences. TRAP uses the results of the Tandem Repeats Finder program to perform a global analysis of the satellite content of DNA sequences, permitting researchers to easily assess the tandem repeat content for both individual sequences and whole genomes. The results can be generated in convenient formats such as HTML and comma-separated values. TRAP can also be used to automatically generate annotation data in the format of feature table and GFF files.
Calling on a million minds for community annotation in WikiProteins

PubMed Central

Mons, Barend; Ashburner, Michael; Chichester, Christine; van Mulligen, Erik; Weeber, Marc; den Dunnen, Johan; van Ommen, Gert-Jan; Musen, Mark; Cockerill, Matthew; Hermjakob, Henning; Mons, Albert; Packer, Abel; Pacheco, Roberto; Lewis, Suzanna; Berkeley, Alfred; Melton, William; Barris, Nickolas; Wales, Jimmy; Meijssen, Gerard; Moeller, Erik; Roes, Peter Jan; Borner, Katy; Bairoch, Amos

2008-01-01

WikiProteins enables community annotation in a Wiki-based system. Extracts of major data sources have been fused into an editable environment that links out to the original sources. Data from community edits create automatic copies of the original data. Semantic technology captures concepts co-occurring in one sentence and thus potential factual statements. In addition, indirect associations between concepts have been calculated. We call on a 'million minds' to annotate a 'million concepts' and to collect facts from the literature with the reward of collaborative knowledge discovery. The system is available for beta testing at . PMID:18507872
Label free cell-tracking and division detection based on 2D time-lapse images for lineage analysis of early embryo development.

PubMed

Cicconet, Marcelo; Gutwein, Michelle; Gunsalus, Kristin C; Geiger, Davi

2014-08-01

In this paper we report a database and a series of techniques related to the problem of tracking cells, and detecting their divisions, in time-lapse movies of mammalian embryos. Our contributions are (1) a method for counting embryos in a well, and cropping each individual embryo across frames, to create individual movies for cell tracking; (2) a semi-automated method for cell tracking that works up to the 8-cell stage, along with a software implementation available to the public (this software was used to build the reported database); (3) an algorithm for automatic tracking up to the 4-cell stage, based on histograms of mirror symmetry coefficients captured using wavelets; (4) a cell-tracking database containing 100 annotated examples of mammalian embryos up to the 8-cell stage; and (5) statistical analysis of various timing distributions obtained from those examples. Copyright © 2014 Elsevier Ltd. All rights reserved.
Automatic Generation of Cycle-Approximate TLMs with Timed RTOS Model Support

NASA Astrophysics Data System (ADS)

Hwang, Yonghyun; Schirner, Gunar; Abdi, Samar

This paper presents a technique for automatically generating cycle-approximate transaction level models (TLMs) for multi-process applications mapped to embedded platforms. It incorporates three key features: (a) basic block level timing annotation, (b) RTOS model integration, and (c) RTOS overhead delay modeling. The inputs to TLM generation are application C processes and their mapping to processors in the platform. A processor data model, including pipelined datapath, memory hierarchy and branch delay model is used to estimate basic block execution delays. The delays are annotated to the C code, which is then integrated with a generated SystemC RTOS model. Our abstract RTOS provides dynamic scheduling and inter-process communication (IPC) with processor- and RTOS-specific pre-characterized timing. Our experiments using a MP3 decoder and a JPEG encoder show that timed TLMs, with integrated RTOS models, can be automatically generated in less than a minute. Our generated TLMs simulated three times faster than real-time and showed less than 10% timing error compared to board measurements.
The language of gene ontology: a Zipf's law analysis.

PubMed

Kalankesh, Leila Ranandeh; Stevens, Robert; Brass, Andy

2012-06-07

Most major genome projects and sequence databases provide a GO annotation of their data, either automatically or through human annotators, creating a large corpus of data written in the language of GO. Texts written in natural language show a statistical power law behaviour, Zipf's law, the exponent of which can provide useful information on the nature of the language being used. We have therefore explored the hypothesis that collections of GO annotations will show similar statistical behaviours to natural language. Annotations from the Gene Ontology Annotation project were found to follow Zipf's law. Surprisingly, the measured power law exponents were consistently different between annotation captured using the three GO sub-ontologies in the corpora (function, process and component). On filtering the corpora using GO evidence codes we found that the value of the measured power law exponent responded in a predictable way as a function of the evidence codes used to support the annotation. Techniques from computational linguistics can provide new insights into the annotation process. GO annotations show similar statistical behaviours to those seen in natural language with measured exponents that provide a signal which correlates with the nature of the evidence codes used to support the annotations, suggesting that the measured exponent might provide a signal regarding the information content of the annotation.
A bootstrapping method for development of Treebank

NASA Astrophysics Data System (ADS)

Zarei, F.; Basirat, A.; Faili, H.; Mirain, M.

2017-01-01

Using statistical approaches beside the traditional methods of natural language processing could significantly improve both the quality and performance of several natural language processing (NLP) tasks. The effective usage of these approaches is subject to the availability of the informative, accurate and detailed corpora on which the learners are trained. This article introduces a bootstrapping method for developing annotated corpora based on a complex and rich linguistically motivated elementary structure called supertag. To this end, a hybrid method for supertagging is proposed that combines both of the generative and discriminative methods of supertagging. The method was applied on a subset of Wall Street Journal (WSJ) in order to annotate its sentences with a set of linguistically motivated elementary structures of the English XTAG grammar that is using a lexicalised tree-adjoining grammar formalism. The empirical results confirm that the bootstrapping method provides a satisfactory way for annotating the English sentences with the mentioned structures. The experiments show that the method could automatically annotate about 20% of WSJ with the accuracy of F-measure about 80% of which is particularly 12% higher than the F-measure of the XTAG Treebank automatically generated from the approach proposed by Basirat and Faili [(2013). Bridge the gap between statistical and hand-crafted grammars. Computer Speech and Language, 27, 1085-1104].
Semi-automatic tracking, smoothing and segmentation of hyoid bone motion from videofluoroscopic swallowing study.

PubMed

Kim, Won-Seok; Zeng, Pengcheng; Shi, Jian Qing; Lee, Youngjo; Paik, Nam-Jong

2017-01-01

Motion analysis of the hyoid bone via videofluoroscopic study has been used in clinical research, but the classical manual tracking method is generally labor intensive and time consuming. Although some automatic tracking methods have been developed, masked points could not be tracked and smoothing and segmentation, which are necessary for functional motion analysis prior to registration, were not provided by the previous software. We developed software to track the hyoid bone motion semi-automatically. It works even in the situation where the hyoid bone is masked by the mandible and has been validated in dysphagia patients with stroke. In addition, we added the function of semi-automatic smoothing and segmentation. A total of 30 patients' data were used to develop the software, and data collected from 17 patients were used for validation, of which the trajectories of 8 patients were partly masked. Pearson correlation coefficients between the manual and automatic tracking are high and statistically significant (0.942 to 0.991, P-value<0.0001). Relative errors between automatic tracking and manual tracking in terms of the x-axis, y-axis and 2D range of hyoid bone excursion range from 3.3% to 9.2%. We also developed an automatic method to segment each hyoid bone trajectory into four phases (elevation phase, anterior movement phase, descending phase and returning phase). The semi-automatic hyoid bone tracking from VFSS data by our software is valid compared to the conventional manual tracking method. In addition, the ability of automatic indication to switch the automatic mode to manual mode in extreme cases and calibration without attaching the radiopaque object is convenient and useful for users. Semi-automatic smoothing and segmentation provide further information for functional motion analysis which is beneficial to further statistical analysis such as functional classification and prognostication for dysphagia. Therefore, this software could provide the researchers in the field of dysphagia with a convenient, useful, and all-in-one platform for analyzing the hyoid bone motion. Further development of our method to track the other swallowing related structures or objects such as epiglottis and bolus and to carry out the 2D curve registration may be needed for a more comprehensive functional data analysis for dysphagia with big data.
FIJI Macro 3D ART VeSElecT: 3D Automated Reconstruction Tool for Vesicle Structures of Electron Tomograms

PubMed Central

Kaltdorf, Kristin Verena; Schulze, Katja; Helmprobst, Frederik; Kollmannsberger, Philip; Stigloher, Christian

2017-01-01

Automatic image reconstruction is critical to cope with steadily increasing data from advanced microscopy. We describe here the Fiji macro 3D ART VeSElecT which we developed to study synaptic vesicles in electron tomograms. We apply this tool to quantify vesicle properties (i) in embryonic Danio rerio 4 and 8 days past fertilization (dpf) and (ii) to compare Caenorhabditis elegans N2 neuromuscular junctions (NMJ) wild-type and its septin mutant (unc-59(e261)). We demonstrate development-specific and mutant-specific changes in synaptic vesicle pools in both models. We confirm the functionality of our macro by applying our 3D ART VeSElecT on zebrafish NMJ showing smaller vesicles in 8 dpf embryos then 4 dpf, which was validated by manual reconstruction of the vesicle pool. Furthermore, we analyze the impact of C. elegans septin mutant unc-59(e261) on vesicle pool formation and vesicle size. Automated vesicle registration and characterization was implemented in Fiji as two macros (registration and measurement). This flexible arrangement allows in particular reducing false positives by an optional manual revision step. Preprocessing and contrast enhancement work on image-stacks of 1nm/pixel in x and y direction. Semi-automated cell selection was integrated. 3D ART VeSElecT removes interfering components, detects vesicles by 3D segmentation and calculates vesicle volume and diameter (spherical approximation, inner/outer diameter). Results are collected in color using the RoiManager plugin including the possibility of manual removal of non-matching confounder vesicles. Detailed evaluation considered performance (detected vesicles) and specificity (true vesicles) as well as precision and recall. We furthermore show gain in segmentation and morphological filtering compared to learning based methods and a large time gain compared to manual segmentation. 3D ART VeSElecT shows small error rates and its speed gain can be up to 68 times faster in comparison to manual annotation. Both automatic and semi-automatic modes are explained including a tutorial. PMID:28056033
Community annotation experiment for ground truth generation for the i2b2 medication challenge

PubMed Central

Solti, Imre; Xia, Fei; Cadag, Eithon

2010-01-01

Objective Within the context of the Third i2b2 Workshop on Natural Language Processing Challenges for Clinical Records, the authors (also referred to as ‘the i2b2 medication challenge team’ or ‘the i2b2 team’ for short) organized a community annotation experiment. Design For this experiment, the authors released annotation guidelines and a small set of annotated discharge summaries. They asked the participants of the Third i2b2 Workshop to annotate 10 discharge summaries per person; each discharge summary was annotated by two annotators from two different teams, and a third annotator from a third team resolved disagreements. Measurements In order to evaluate the reliability of the annotations thus produced, the authors measured community inter-annotator agreement and compared it with the inter-annotator agreement of expert annotators when both the community and the expert annotators generated ground truth based on pooled system outputs. For this purpose, the pool consisted of the three most densely populated automatic annotations of each record. The authors also compared the community inter-annotator agreement with expert inter-annotator agreement when the experts annotated raw records without using the pool. Finally, they measured the quality of the community ground truth by comparing it with the expert ground truth. Results and conclusions The authors found that the community annotators achieved comparable inter-annotator agreement to expert annotators, regardless of whether the experts annotated from the pool. Furthermore, the ground truth generated by the community obtained F-measures above 0.90 against the ground truth of the experts, indicating the value of the community as a source of high-quality ground truth even on intricate and domain-specific annotation tasks. PMID:20819855
Integrated Bio-Entity Network: A System for Biological Knowledge Discovery

PubMed Central

Bell, Lindsey; Chowdhary, Rajesh; Liu, Jun S.; Niu, Xufeng; Zhang, Jinfeng

2011-01-01

A significant part of our biological knowledge is centered on relationships between biological entities (bio-entities) such as proteins, genes, small molecules, pathways, gene ontology (GO) terms and diseases. Accumulated at an increasing speed, the information on bio-entity relationships is archived in different forms at scattered places. Most of such information is buried in scientific literature as unstructured text. Organizing heterogeneous information in a structured form not only facilitates study of biological systems using integrative approaches, but also allows discovery of new knowledge in an automatic and systematic way. In this study, we performed a large scale integration of bio-entity relationship information from both databases containing manually annotated, structured information and automatic information extraction of unstructured text in scientific literature. The relationship information we integrated in this study includes protein–protein interactions, protein/gene regulations, protein–small molecule interactions, protein–GO relationships, protein–pathway relationships, and pathway–disease relationships. The relationship information is organized in a graph data structure, named integrated bio-entity network (IBN), where the vertices are the bio-entities and edges represent their relationships. Under this framework, graph theoretic algorithms can be designed to perform various knowledge discovery tasks. We designed breadth-first search with pruning (BFSP) and most probable path (MPP) algorithms to automatically generate hypotheses—the indirect relationships with high probabilities in the network. We show that IBN can be used to generate plausible hypotheses, which not only help to better understand the complex interactions in biological systems, but also provide guidance for experimental designs. PMID:21738677
NaviSE: superenhancer navigator integrating epigenomics signal algebra.

PubMed

Ascensión, Alex M; Arrospide-Elgarresta, Mikel; Izeta, Ander; Araúzo-Bravo, Marcos J

2017-06-06

Superenhancers are crucial structural genomic elements determining cell fate, and they are also involved in the determination of several diseases, such as cancer or neurodegeneration. Although there are pipelines which use independent pieces of software to predict the presence of superenhancers from genome-wide chromatin marks or DNA-interaction protein binding sites, there is not yet an integrated software tool that processes automatically algebra combinations of raw data sequencing into a comprehensive final annotated report of predicted superenhancers. We have developed NaviSE, a user-friendly streamlined tool which performs a fully-automated parallel processing of genome-wide epigenomics data from sequencing files into a final report, built with a comprehensive set of annotated files that are navigated through a graphic user interface dynamically generated by NaviSE. NaviSE also implements an 'epigenomics signal algebra' that allows the combination of multiple activation and repression epigenomics signals. NaviSE provides an interactive chromosomal landscaping of the locations of superenhancers, which can be navigated to obtain annotated information about superenhancer signal profile, associated genes, gene ontology enrichment analysis, motifs of transcription factor binding sites enriched in superenhancers, graphs of the metrics evaluating the superenhancers quality, protein-protein interaction networks and enriched metabolic pathways among other features. We have parallelised the most time-consuming tasks achieving a reduction up to 30% for a 15 CPUs machine. We have optimized the default parameters of NaviSE to facilitate its use. NaviSE allows different entry levels of data processing, from sra-fastq files to bed files; and unifies the processing of multiple replicates. NaviSE outperforms the more time-consuming processes required in a non-integrated pipeline. Alongside its high performance, NaviSE is able to provide biological insights, predicting cell type specific markers, such as SOX2 and ZIC3 in embryonic stem cells, CDK5R1 and REST in neurons and CD86 and TLR2 in monocytes. NaviSE is a user-friendly streamlined solution for superenhancer analysis, annotation and navigation, requiring only basic computer and next generation sequencing knowledge. NaviSE binaries and documentation are available at: https://sourceforge.net/projects/navise-superenhancer/ .
Automatic annotation of protein motif function with Gene Ontology terms.

PubMed

Lu, Xinghua; Zhai, Chengxiang; Gopalakrishnan, Vanathi; Buchanan, Bruce G

2004-09-02

Conserved protein sequence motifs are short stretches of amino acid sequence patterns that potentially encode the function of proteins. Several sequence pattern searching algorithms and programs exist foridentifying candidate protein motifs at the whole genome level. However, a much needed and important task is to determine the functions of the newly identified protein motifs. The Gene Ontology (GO) project is an endeavor to annotate the function of genes or protein sequences with terms from a dynamic, controlled vocabulary and these annotations serve well as a knowledge base. This paper presents methods to mine the GO knowledge base and use the association between the GO terms assigned to a sequence and the motifs matched by the same sequence as evidence for predicting the functions of novel protein motifs automatically. The task of assigning GO terms to protein motifs is viewed as both a binary classification and information retrieval problem, where PROSITE motifs are used as samples for mode training and functional prediction. The mutual information of a motif and aGO term association is found to be a very useful feature. We take advantage of the known motifs to train a logistic regression classifier, which allows us to combine mutual information with other frequency-based features and obtain a probability of correct association. The trained logistic regression model has intuitively meaningful and logically plausible parameter values, and performs very well empirically according to our evaluation criteria. In this research, different methods for automatic annotation of protein motifs have been investigated. Empirical result demonstrated that the methods have a great potential for detecting and augmenting information about the functions of newly discovered candidate protein motifs.
Diagnostic accuracy of semi-automatic quantitative metrics as an alternative to expert reading of CT myocardial perfusion in the CORE320 study.

PubMed

Ostovaneh, Mohammad R; Vavere, Andrea L; Mehra, Vishal C; Kofoed, Klaus F; Matheson, Matthew B; Arbab-Zadeh, Armin; Fujisawa, Yasuko; Schuijf, Joanne D; Rochitte, Carlos E; Scholte, Arthur J; Kitagawa, Kakuya; Dewey, Marc; Cox, Christopher; DiCarli, Marcelo F; George, Richard T; Lima, Joao A C

To determine the diagnostic accuracy of semi-automatic quantitative metrics compared to expert reading for interpretation of computed tomography perfusion (CTP) imaging. The CORE320 multicenter diagnostic accuracy clinical study enrolled patients between 45 and 85 years of age who were clinically referred for invasive coronary angiography (ICA). Computed tomography angiography (CTA), CTP, single photon emission computed tomography (SPECT), and ICA images were interpreted manually in blinded core laboratories by two experienced readers. Additionally, eight quantitative CTP metrics as continuous values were computed semi-automatically from myocardial and blood attenuation and were combined using logistic regression to derive a final quantitative CTP metric score. For the reference standard, hemodynamically significant coronary artery disease (CAD) was defined as a quantitative ICA stenosis of 50% or greater and a corresponding perfusion defect by SPECT. Diagnostic accuracy was determined by area under the receiver operating characteristic curve (AUC). Of the total 377 included patients, 66% were male, median age was 62 (IQR: 56, 68) years, and 27% had prior myocardial infarction. In patient based analysis, the AUC (95% CI) for combined CTA-CTP expert reading and combined CTA-CTP semi-automatic quantitative metrics was 0.87(0.84-0.91) and 0.86 (0.83-0.9), respectively. In vessel based analyses the AUC's were 0.85 (0.82-0.88) and 0.84 (0.81-0.87), respectively. No significant difference in AUC was found between combined CTA-CTP expert reading and CTA-CTP semi-automatic quantitative metrics in patient based or vessel based analyses(p > 0.05 for all). Combined CTA-CTP semi-automatic quantitative metrics is as accurate as CTA-CTP expert reading to detect hemodynamically significant CAD. Copyright © 2018 Society of Cardiovascular Computed Tomography. Published by Elsevier Inc. All rights reserved.
Does semi-automatic bone-fragment segmentation improve the reproducibility of the Letournel acetabular fracture classification?

PubMed

Boudissa, M; Orfeuvre, B; Chabanas, M; Tonetti, J

2017-09-01

The Letournel classification of acetabular fracture shows poor reproducibility in inexperienced observers, despite the introduction of 3D imaging. We therefore developed a method of semi-automatic segmentation based on CT data. The present prospective study aimed to assess: (1) whether semi-automatic bone-fragment segmentation increased the rate of correct classification; (2) if so, in which fracture types; and (3) feasibility using the open-source itksnap 3.0 software package without incurring extra cost for users. Semi-automatic segmentation of acetabular fractures significantly increases the rate of correct classification by orthopedic surgery residents. Twelve orthopedic surgery residents classified 23 acetabular fractures. Six used conventional 3D reconstructions provided by the center's radiology department (conventional group) and 6 others used reconstructions obtained by semi-automatic segmentation using the open-source itksnap 3.0 software package (segmentation group). Bone fragments were identified by specific colors. Correct classification rates were compared between groups on Chi 2 test. Assessment was repeated 2 weeks later, to determine intra-observer reproducibility. Correct classification rates were significantly higher in the "segmentation" group: 114/138 (83%) versus 71/138 (52%); P<0.0001. The difference was greater for simple (36/36 (100%) versus 17/36 (47%); P<0.0001) than complex fractures (79/102 (77%) versus 54/102 (53%); P=0.0004). Mean segmentation time per fracture was 27±3min [range, 21-35min]. The segmentation group showed excellent intra-observer correlation coefficients, overall (ICC=0.88), and for simple (ICC=0.92) and complex fractures (ICC=0.84). Semi-automatic segmentation, identifying the various bone fragments, was effective in increasing the rate of correct acetabular fracture classification on the Letournel system by orthopedic surgery residents. It may be considered for routine use in education and training. III: prospective case-control study of a diagnostic procedure. Copyright © 2017 Elsevier Masson SAS. All rights reserved.
GeneView: a comprehensive semantic search engine for PubMed.

PubMed

Thomas, Philippe; Starlinger, Johannes; Vowinkel, Alexander; Arzt, Sebastian; Leser, Ulf

2012-07-01

Research results are primarily published in scientific literature and curation efforts cannot keep up with the rapid growth of published literature. The plethora of knowledge remains hidden in large text repositories like MEDLINE. Consequently, life scientists have to spend a great amount of time searching for specific information. The enormous ambiguity among most names of biomedical objects such as genes, chemicals and diseases often produces too large and unspecific search results. We present GeneView, a semantic search engine for biomedical knowledge. GeneView is built upon a comprehensively annotated version of PubMed abstracts and openly available PubMed Central full texts. This semi-structured representation of biomedical texts enables a number of features extending classical search engines. For instance, users may search for entities using unique database identifiers or they may rank documents by the number of specific mentions they contain. Annotation is performed by a multitude of state-of-the-art text-mining tools for recognizing mentions from 10 entity classes and for identifying protein-protein interactions. GeneView currently contains annotations for >194 million entities from 10 classes for ∼21 million citations with 271,000 full text bodies. GeneView can be searched at http://bc3.informatik.hu-berlin.de/.
The Schistosoma mansoni phylome: using evolutionary genomics to gain insight into a parasite's biology.

PubMed

Silva, Larissa Lopes; Marcet-Houben, Marina; Nahum, Laila Alves; Zerlotini, Adhemar; Gabaldón, Toni; Oliveira, Guilherme

2012-11-13

Schistosoma mansoni is one of the causative agents of schistosomiasis, a neglected tropical disease that affects about 237 million people worldwide. Despite recent efforts, we still lack a general understanding of the relevant host-parasite interactions, and the possible treatments are limited by the emergence of resistant strains and the absence of a vaccine. The S. mansoni genome was completely sequenced and still under continuous annotation. Nevertheless, more than 45% of the encoded proteins remain without experimental characterization or even functional prediction. To improve our knowledge regarding the biology of this parasite, we conducted a proteome-wide evolutionary analysis to provide a broad view of the S. mansoni's proteome evolution and to improve its functional annotation. Using a phylogenomic approach, we reconstructed the S. mansoni phylome, which comprises the evolutionary histories of all parasite proteins and their homologs across 12 other organisms. The analysis of a total of 7,964 phylogenies allowed a deeper understanding of genomic complexity and evolutionary adaptations to a parasitic lifestyle. In particular, the identification of lineage-specific gene duplications pointed to the diversification of several protein families that are relevant for host-parasite interaction, including proteases, tetraspanins, fucosyltransferases, venom allergen-like proteins, and tegumental-allergen-like proteins. In addition to the evolutionary knowledge, the phylome data enabled us to automatically re-annotate 3,451 proteins through a phylogenetic-based approach rather than solely sequence similarity searches. To allow further exploitation of this valuable data, all information has been made available at PhylomeDB (http://www.phylomedb.org). In this study, we used an evolutionary approach to assess S. mansoni parasite biology, improve genome/proteome functional annotation, and provide insights into host-parasite interactions. Taking advantage of a proteome-wide perspective rather than focusing on individual proteins, we identified that this parasite has experienced specific gene duplication events, particularly affecting genes that are potentially related to the parasitic lifestyle. These innovations may be related to the mechanisms that protect S. mansoni against host immune responses being important adaptations for the parasite survival in a potentially hostile environment. Continuing this work, a comparative analysis involving genomic, transcriptomic, and proteomic data from other helminth parasites, other parasites, and vectors will supply more information regarding parasite's biology as well as host-parasite interactions.
CerebralWeb: a Cytoscape.js plug-in to visualize networks stratified by subcellular localization.

PubMed

Frias, Silvia; Bryan, Kenneth; Brinkman, Fiona S L; Lynn, David J

2015-01-01

CerebralWeb is a light-weight JavaScript plug-in that extends Cytoscape.js to enable fast and interactive visualization of molecular interaction networks stratified based on subcellular localization or other user-supplied annotation. The application is designed to be easily integrated into any website and is configurable to support customized network visualization. CerebralWeb also supports the automatic retrieval of Cerebral-compatible localizations for human, mouse and bovine genes via a web service and enables the automated parsing of Cytoscape compatible XGMML network files. CerebralWeb currently supports embedded network visualization on the InnateDB (www.innatedb.com) and Allergy and Asthma Portal (allergen.innatedb.com) database and analysis resources. Database tool URL: http://www.innatedb.com/CerebralWeb © The Author(s) 2015. Published by Oxford University Press.
JGI Plant Genomics Gene Annotation Pipeline

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shu, Shengqiang; Rokhsar, Dan; Goodstein, David

2014-07-14

Plant genomes vary in size and are highly complex with a high amount of repeats, genome duplication and tandem duplication. Gene encodes a wealth of information useful in studying organism and it is critical to have high quality and stable gene annotation. Thanks to advancement of sequencing technology, many plant species genomes have been sequenced and transcriptomes are also sequenced. To use these vastly large amounts of sequence data to make gene annotation or re-annotation in a timely fashion, an automatic pipeline is needed. JGI plant genomics gene annotation pipeline, called integrated gene call (IGC), is our effort toward thismore » aim with aid of a RNA-seq transcriptome assembly pipeline. It utilizes several gene predictors based on homolog peptides and transcript ORFs. See Methods for detail. Here we present genome annotation of JGI flagship green plants produced by this pipeline plus Arabidopsis and rice except for chlamy which is done by a third party. The genome annotations of these species and others are used in our gene family build pipeline and accessible via JGI Phytozome portal whose URL and front page snapshot are shown below.« less
Inductive creation of an annotation schema for manually indexing clinical conditions from emergency department reports

PubMed Central

Chapman, Wendy W.; Dowling, John N.

2006-01-01

Evaluating automated indexing applications requires comparing automatically indexed terms against manual reference standard annotations. However, there are no standard guidelines for determining which words from a textual document to include in manual annotations, and the vague task can result in substantial variation among manual indexers. We applied grounded theory to emergency department reports to create an annotation schema representing syntactic and semantic variables that could be annotated when indexing clinical conditions. We describe the annotation schema, which includes variables representing medical concepts (e.g., symptom, demographics), linguistic form (e.g., noun, adjective), and modifier types (e.g., anatomic location, severity). We measured the schema’s quality and found: (1) the schema was comprehensive enough to be applied to 20 unseen reports without changes to the schema; (2) agreement between author annotators applying the schema was high, with an F measure of 93%; and (3) an error analysis showed that the authors made complementary errors when applying the schema, demonstrating that the schema incorporates both linguistic and medical expertise. PMID:16230050

Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments

DTIC Science & Technology

2010-01-01

mostly American English1 tweets from October 27, 2010, automatically tokenized them using a Twitter tok - enizer (O’Connor et al., 2010b),2 and pre...for which our tagset, annotation guidelines, and tokenization rules were deficient or ambiguous. Based on these considerations we revised the tok ...tagger. 6Possessive endings only appear when a user or the tok - enizer has separated the possessive ending from a possessor; the tokenizer only does
Semantic integration of data on transcriptional regulation

PubMed Central

Baitaluk, Michael; Ponomarenko, Julia

2010-01-01

Motivation: Experimental and predicted data concerning gene transcriptional regulation are distributed among many heterogeneous sources. However, there are no resources to integrate these data automatically or to provide a ‘one-stop shop’ experience for users seeking information essential for deciphering and modeling gene regulatory networks. Results: IntegromeDB, a semantic graph-based ‘deep-web’ data integration system that automatically captures, integrates and manages publicly available data concerning transcriptional regulation, as well as other relevant biological information, is proposed in this article. The problems associated with data integration are addressed by ontology-driven data mapping, multiple data annotation and heterogeneous data querying, also enabling integration of the user's data. IntegromeDB integrates over 100 experimental and computational data sources relating to genomics, transcriptomics, genetics, and functional and interaction data concerning gene transcriptional regulation in eukaryotes and prokaryotes. Availability: IntegromeDB is accessible through the integrated research environment BiologicalNetworks at http://www.BiologicalNetworks.org Contact: baitaluk@sdsc.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20427517
Semantic integration of data on transcriptional regulation.

PubMed

Baitaluk, Michael; Ponomarenko, Julia

2010-07-01

Experimental and predicted data concerning gene transcriptional regulation are distributed among many heterogeneous sources. However, there are no resources to integrate these data automatically or to provide a 'one-stop shop' experience for users seeking information essential for deciphering and modeling gene regulatory networks. IntegromeDB, a semantic graph-based 'deep-web' data integration system that automatically captures, integrates and manages publicly available data concerning transcriptional regulation, as well as other relevant biological information, is proposed in this article. The problems associated with data integration are addressed by ontology-driven data mapping, multiple data annotation and heterogeneous data querying, also enabling integration of the user's data. IntegromeDB integrates over 100 experimental and computational data sources relating to genomics, transcriptomics, genetics, and functional and interaction data concerning gene transcriptional regulation in eukaryotes and prokaryotes. IntegromeDB is accessible through the integrated research environment BiologicalNetworks at http://www.BiologicalNetworks.org baitaluk@sdsc.edu Supplementary data are available at Bioinformatics online.
TOPSAN: a dynamic web database for structural genomics.

PubMed

Ellrott, Kyle; Zmasek, Christian M; Weekes, Dana; Sri Krishna, S; Bakolitsa, Constantina; Godzik, Adam; Wooley, John

2011-01-01

The Open Protein Structure Annotation Network (TOPSAN) is a web-based collaboration platform for exploring and annotating structures determined by structural genomics efforts. Characterization of those structures presents a challenge since the majority of the proteins themselves have not yet been characterized. Responding to this challenge, the TOPSAN platform facilitates collaborative annotation and investigation via a user-friendly web-based interface pre-populated with automatically generated information. Semantic web technologies expand and enrich TOPSAN's content through links to larger sets of related databases, and thus, enable data integration from disparate sources and data mining via conventional query languages. TOPSAN can be found at http://www.topsan.org.
Image annotation based on positive-negative instances learning

NASA Astrophysics Data System (ADS)

Zhang, Kai; Hu, Jiwei; Liu, Quan; Lou, Ping

2017-07-01

Automatic image annotation is now a tough task in computer vision, the main sense of this tech is to deal with managing the massive image on the Internet and assisting intelligent retrieval. This paper designs a new image annotation model based on visual bag of words, using the low level features like color and texture information as well as mid-level feature as SIFT, and mixture the pic2pic, label2pic and label2label correlation to measure the correlation degree of labels and images. We aim to prune the specific features for each single label and formalize the annotation task as a learning process base on Positive-Negative Instances Learning. Experiments are performed using the Corel5K Dataset, and provide a quite promising result when comparing with other existing methods.
SADI, SHARE, and the in silico scientific method

PubMed Central

2010-01-01

Background The emergence and uptake of Semantic Web technologies by the Life Sciences provides exciting opportunities for exploring novel ways to conduct in silico science. Web Service Workflows are already becoming first-class objects in “the new way”, and serve as explicit, shareable, referenceable representations of how an experiment was done. In turn, Semantic Web Service projects aim to facilitate workflow construction by biological domain-experts such that workflows can be edited, re-purposed, and re-published by non-informaticians. However the aspects of the scientific method relating to explicit discourse, disagreement, and hypothesis generation have remained relatively impervious to new technologies. Results Here we present SADI and SHARE - a novel Semantic Web Service framework, and a reference implementation of its client libraries. Together, SADI and SHARE allow the semi- or fully-automatic discovery and pipelining of Semantic Web Services in response to ad hoc user queries. Conclusions The semantic behaviours exhibited by SADI and SHARE extend the functionalities provided by Description Logic Reasoners such that novel assertions can be automatically added to a data-set without logical reasoning, but rather by analytical or annotative services. This behaviour might be applied to achieve the “semantification” of those aspects of the in silico scientific method that are not yet supported by Semantic Web technologies. We support this suggestion using an example in the clinical research space. PMID:21210986
A Modular Hierarchical Approach to 3D Electron Microscopy Image Segmentation

PubMed Central

Liu, Ting; Jones, Cory; Seyedhosseini, Mojtaba; Tasdizen, Tolga

2014-01-01

The study of neural circuit reconstruction, i.e., connectomics, is a challenging problem in neuroscience. Automated and semi-automated electron microscopy (EM) image analysis can be tremendously helpful for connectomics research. In this paper, we propose a fully automatic approach for intra-section segmentation and inter-section reconstruction of neurons using EM images. A hierarchical merge tree structure is built to represent multiple region hypotheses and supervised classification techniques are used to evaluate their potentials, based on which we resolve the merge tree with consistency constraints to acquire final intra-section segmentation. Then, we use a supervised learning based linking procedure for the inter-section neuron reconstruction. Also, we develop a semi-automatic method that utilizes the intermediate outputs of our automatic algorithm and achieves intra-segmentation with minimal user intervention. The experimental results show that our automatic method can achieve close-to-human intra-segmentation accuracy and state-of-the-art inter-section reconstruction accuracy. We also show that our semi-automatic method can further improve the intra-segmentation accuracy. PMID:24491638
Impact of translation on named-entity recognition in radiology texts

PubMed Central

Pedro, Vasco

2017-01-01

Abstract Radiology reports describe the results of radiography procedures and have the potential of being a useful source of information which can bring benefits to health care systems around the world. One way to automatically extract information from the reports is by using Text Mining tools. The problem is that these tools are mostly developed for English and reports are usually written in the native language of the radiologist, which is not necessarily English. This creates an obstacle to the sharing of Radiology information between different communities. This work explores the solution of translating the reports to English before applying the Text Mining tools, probing the question of what translation approach should be used. We created MRRAD (Multilingual Radiology Research Articles Dataset), a parallel corpus of Portuguese research articles related to Radiology and a number of alternative translations (human, automatic and semi-automatic) to English. This is a novel corpus which can be used to move forward the research on this topic. Using MRRAD we studied which kind of automatic or semi-automatic translation approach is more effective on the Named-entity recognition task of finding RadLex terms in the English version of the articles. Considering the terms extracted from human translations as our gold standard, we calculated how similar to this standard were the terms extracted using other translations. We found that a completely automatic translation approach using Google leads to F-scores (between 0.861 and 0.868, depending on the extraction approach) similar to the ones obtained through a more expensive semi-automatic translation approach using Unbabel (between 0.862 and 0.870). To better understand the results we also performed a qualitative analysis of the type of errors found in the automatic and semi-automatic translations. Database URL: https://github.com/lasigeBioTM/MRRAD PMID:29220455
Automatically identifying health outcome information in MEDLINE records.

PubMed

Demner-Fushman, Dina; Few, Barbara; Hauser, Susan E; Thoma, George

2006-01-01

Understanding the effect of a given intervention on the patient's health outcome is one of the key elements in providing optimal patient care. This study presents a methodology for automatic identification of outcomes-related information in medical text and evaluates its potential in satisfying clinical information needs related to health care outcomes. An annotation scheme based on an evidence-based medicine model for critical appraisal of evidence was developed and used to annotate 633 MEDLINE citations. Textual, structural, and meta-information features essential to outcome identification were learned from the created collection and used to develop an automatic system. Accuracy of automatic outcome identification was assessed in an intrinsic evaluation and in an extrinsic evaluation, in which ranking of MEDLINE search results obtained using PubMed Clinical Queries relied on identified outcome statements. The accuracy and positive predictive value of outcome identification were calculated. Effectiveness of the outcome-based ranking was measured using mean average precision and precision at rank 10. Automatic outcome identification achieved 88% to 93% accuracy. The positive predictive value of individual sentences identified as outcomes ranged from 30% to 37%. Outcome-based ranking improved retrieval accuracy, tripling mean average precision and achieving 389% improvement in precision at rank 10. Preliminary results in outcome-based document ranking show potential validity of the evidence-based medicine-model approach in timely delivery of information critical to clinical decision support at the point of service.
Automating annotation of information-giving for analysis of clinical conversation.

PubMed

Mayfield, Elijah; Laws, M Barton; Wilson, Ira B; Penstein Rosé, Carolyn

2014-02-01

Coding of clinical communication for fine-grained features such as speech acts has produced a substantial literature. However, annotation by humans is laborious and expensive, limiting application of these methods. We aimed to show that through machine learning, computers could code certain categories of speech acts with sufficient reliability to make useful distinctions among clinical encounters. The data were transcripts of 415 routine outpatient visits of HIV patients which had previously been coded for speech acts using the Generalized Medical Interaction Analysis System (GMIAS); 50 had also been coded for larger scale features using the Comprehensive Analysis of the Structure of Encounters System (CASES). We aggregated selected speech acts into information-giving and requesting, then trained the machine to automatically annotate using logistic regression classification. We evaluated reliability by per-speech act accuracy. We used multiple regression to predict patient reports of communication quality from post-visit surveys using the patient and provider information-giving to information-requesting ratio (briefly, information-giving ratio) and patient gender. Automated coding produces moderate reliability with human coding (accuracy 71.2%, κ=0.57), with high correlation between machine and human prediction of the information-giving ratio (r=0.96). The regression significantly predicted four of five patient-reported measures of communication quality (r=0.263-0.344). The information-giving ratio is a useful and intuitive measure for predicting patient perception of provider-patient communication quality. These predictions can be made with automated annotation, which is a practical option for studying large collections of clinical encounters with objectivity, consistency, and low cost, providing greater opportunity for training and reflection for care providers.
Deformably registering and annotating whole CLARITY brains to an atlas via masked LDDMM

NASA Astrophysics Data System (ADS)

Kutten, Kwame S.; Vogelstein, Joshua T.; Charon, Nicolas; Ye, Li; Deisseroth, Karl; Miller, Michael I.

2016-04-01

The CLARITY method renders brains optically transparent to enable high-resolution imaging in the structurally intact brain. Anatomically annotating CLARITY brains is necessary for discovering which regions contain signals of interest. Manually annotating whole-brain, terabyte CLARITY images is difficult, time-consuming, subjective, and error-prone. Automatically registering CLARITY images to a pre-annotated brain atlas offers a solution, but is difficult for several reasons. Removal of the brain from the skull and subsequent storage and processing cause variable non-rigid deformations, thus compounding inter-subject anatomical variability. Additionally, the signal in CLARITY images arises from various biochemical contrast agents which only sparsely label brain structures. This sparse labeling challenges the most commonly used registration algorithms that need to match image histogram statistics to the more densely labeled histological brain atlases. The standard method is a multiscale Mutual Information B-spline algorithm that dynamically generates an average template as an intermediate registration target. We determined that this method performs poorly when registering CLARITY brains to the Allen Institute's Mouse Reference Atlas (ARA), because the image histogram statistics are poorly matched. Therefore, we developed a method (Mask-LDDMM) for registering CLARITY images, that automatically finds the brain boundary and learns the optimal deformation between the brain and atlas masks. Using Mask-LDDMM without an average template provided better results than the standard approach when registering CLARITY brains to the ARA. The LDDMM pipelines developed here provide a fast automated way to anatomically annotate CLARITY images; our code is available as open source software at http://NeuroData.io.
Design and development of a prototypical software for semi-automatic generation of test methodologies and security checklists for IT vulnerability assessment in small- and medium-sized enterprises (SME)

NASA Astrophysics Data System (ADS)

Möller, Thomas; Bellin, Knut; Creutzburg, Reiner

2015-03-01

The aim of this paper is to show the recent progress in the design and prototypical development of a software suite Copra Breeder* for semi-automatic generation of test methodologies and security checklists for IT vulnerability assessment in small and medium-sized enterprises.
Build your own low-cost seismic/bathymetric recorder annotator

USGS Publications Warehouse

Robinson, W.

1994-01-01

An inexpensive programmable annotator, completely compatible with at least three models of widely used graphic recorders (Raytheon LSR-1811, Raytheon LSR-1807 M, and EDO 550) has been developed to automatically write event marks and print up to sixteen numbers on the paper record. Event mark and character printout intervals, character height and character position are all selectable with front panel switches. Operation is completely compatible with recorders running in either continuous or start-stop mode. ?? 1994.
Towards a Framework for Generating Tests to Satisfy Complex Code Coverage in Java Pathfinder

NASA Technical Reports Server (NTRS)

Staats, Matt

2009-01-01

We present work on a prototype tool based on the JavaPathfinder (JPF) model checker for automatically generating tests satisfying the MC/DC code coverage criterion. Using the Eclipse IDE, developers and testers can quickly instrument Java source code with JPF annotations covering all MC/DC coverage obligations, and JPF can then be used to automatically generate tests that satisfy these obligations. The prototype extension to JPF enables various tasks useful in automatic test generation to be performed, such as test suite reduction and execution of generated tests.
Comparative analysis of semantic localization accuracies between adult and pediatric DICOM CT images

NASA Astrophysics Data System (ADS)

Robertson, Duncan; Pathak, Sayan D.; Criminisi, Antonio; White, Steve; Haynor, David; Chen, Oliver; Siddiqui, Khan

2012-02-01

Existing literature describes a variety of techniques for semantic annotation of DICOM CT images, i.e. the automatic detection and localization of anatomical structures. Semantic annotation facilitates enhanced image navigation, linkage of DICOM image content and non-image clinical data, content-based image retrieval, and image registration. A key challenge for semantic annotation algorithms is inter-patient variability. However, while the algorithms described in published literature have been shown to cope adequately with the variability in test sets comprising adult CT scans, the problem presented by the even greater variability in pediatric anatomy has received very little attention. Most existing semantic annotation algorithms can only be extended to work on scans of both adult and pediatric patients by adapting parameters heuristically in light of patient size. In contrast, our approach, which uses random regression forests ('RRF'), learns an implicit model of scale variation automatically using training data. In consequence, anatomical structures can be localized accurately in both adult and pediatric CT studies without the need for parameter adaptation or additional information about patient scale. We show how the RRF algorithm is able to learn scale invariance from a combined training set containing a mixture of pediatric and adult scans. Resulting localization accuracy for both adult and pediatric data remains comparable with that obtained using RRFs trained and tested using only adult data.
A semi-automatic traffic sign detection, classification, and positioning system

NASA Astrophysics Data System (ADS)

Creusen, I. M.; Hazelhoff, L.; de With, P. H. N.

2012-01-01

The availability of large-scale databases containing street-level panoramic images offers the possibility to perform semi-automatic surveying of real-world objects such as traffic signs. These inventories can be performed significantly more efficiently than using conventional methods. Governmental agencies are interested in these inventories for maintenance and safety reasons. This paper introduces a complete semi-automatic traffic sign inventory system. The system consists of several components. First, a detection algorithm locates the 2D position of the traffic signs in the panoramic images. Second, a classification algorithm is used to identify the traffic sign. Third, the 3D position of the traffic sign is calculated using the GPS position of the photographs. Finally, the results are listed in a table for quick inspection and are also visualized in a web browser.
BusyBee Web: metagenomic data analysis by bootstrapped supervised binning and annotation

PubMed Central

Kiefer, Christina; Fehlmann, Tobias; Backes, Christina

2017-01-01

Abstract Metagenomics-based studies of mixed microbial communities are impacting biotechnology, life sciences and medicine. Computational binning of metagenomic data is a powerful approach for the culture-independent recovery of population-resolved genomic sequences, i.e. from individual or closely related, constituent microorganisms. Existing binning solutions often require a priori characterized reference genomes and/or dedicated compute resources. Extending currently available reference-independent binning tools, we developed the BusyBee Web server for the automated deconvolution of metagenomic data into population-level genomic bins using assembled contigs (Illumina) or long reads (Pacific Biosciences, Oxford Nanopore Technologies). A reversible compression step as well as bootstrapped supervised binning enable quick turnaround times. The binning results are represented in interactive 2D scatterplots. Moreover, bin quality estimates, taxonomic annotations and annotations of antibiotic resistance genes are computed and visualized. Ground truth-based benchmarks of BusyBee Web demonstrate comparably high performance to state-of-the-art binning solutions for assembled contigs and markedly improved performance for long reads (median F1 scores: 70.02–95.21%). Furthermore, the applicability to real-world metagenomic datasets is shown. In conclusion, our reference-independent approach automatically bins assembled contigs or long reads, exhibits high sensitivity and precision, enables intuitive inspection of the results, and only requires FASTA-formatted input. The web-based application is freely accessible at: https://ccb-microbe.cs.uni-saarland.de/busybee. PMID:28472498
Reconstruction of metabolic pathways by combining probabilistic graphical model-based and knowledge-based methods

PubMed Central

2014-01-01

Automatic reconstruction of metabolic pathways for an organism from genomics and transcriptomics data has been a challenging and important problem in bioinformatics. Traditionally, known reference pathways can be mapped into an organism-specific ones based on its genome annotation and protein homology. However, this simple knowledge-based mapping method might produce incomplete pathways and generally cannot predict unknown new relations and reactions. In contrast, ab initio metabolic network construction methods can predict novel reactions and interactions, but its accuracy tends to be low leading to a lot of false positives. Here we combine existing pathway knowledge and a new ab initio Bayesian probabilistic graphical model together in a novel fashion to improve automatic reconstruction of metabolic networks. Specifically, we built a knowledge database containing known, individual gene / protein interactions and metabolic reactions extracted from existing reference pathways. Known reactions and interactions were then used as constraints for Bayesian network learning methods to predict metabolic pathways. Using individual reactions and interactions extracted from different pathways of many organisms to guide pathway construction is new and improves both the coverage and accuracy of metabolic pathway construction. We applied this probabilistic knowledge-based approach to construct the metabolic networks from yeast gene expression data and compared its results with 62 known metabolic networks in the KEGG database. The experiment showed that the method improved the coverage of metabolic network construction over the traditional reference pathway mapping method and was more accurate than pure ab initio methods. PMID:25374614
ELECTROMAGNETIC AND ELECTROSTATIC GENERATORS: ANNOTATED BIBLIOGRAPHY.

DTIC Science & Technology

generator with split poles, ultrasonic-frequency generator, unipolar generator, single-phase micromotors , synchronous motor, asynchronous motor...asymmetrical rotor, magnetic circuit, dc micromotors , circuit for the automatic control of synchronized induction motors, induction torque micromotors , electric
Quantification of regional fat volume in rat MRI

NASA Astrophysics Data System (ADS)

Sacha, Jaroslaw P.; Cockman, Michael D.; Dufresne, Thomas E.; Trokhan, Darren

2003-05-01

Multiple initiatives in the pharmaceutical and beauty care industries are directed at identifying therapies for weight management. Body composition measurements are critical for such initiatives. Imaging technologies that can be used to measure body composition noninvasively include DXA (dual energy x-ray absorptiometry) and MRI (magnetic resonance imaging). Unlike other approaches, MRI provides the ability to perform localized measurements of fat distribution. Several factors complicate the automatic delineation of fat regions and quantification of fat volumes. These include motion artifacts, field non-uniformity, brightness and contrast variations, chemical shift misregistration, and ambiguity in delineating anatomical structures. We have developed an approach to deal practically with those challenges. The approach is implemented in a package, the Fat Volume Tool, for automatic detection of fat tissue in MR images of the rat abdomen, including automatic discrimination between abdominal and subcutaneous regions. We suppress motion artifacts using masking based on detection of implicit landmarks in the images. Adaptive object extraction is used to compensate for intensity variations. This approach enables us to perform fat tissue detection and quantification in a fully automated manner. The package can also operate in manual mode, which can be used for verification of the automatic analysis or for performing supervised segmentation. In supervised segmentation, the operator has the ability to interact with the automatic segmentation procedures to touch-up or completely overwrite intermediate segmentation steps. The operator's interventions steer the automatic segmentation steps that follow. This improves the efficiency and quality of the final segmentation. Semi-automatic segmentation tools (interactive region growing, live-wire, etc.) improve both the accuracy and throughput of the operator when working in manual mode. The quality of automatic segmentation has been evaluated by comparing the results of fully automated analysis to manual analysis of the same images. The comparison shows a high degree of correlation that validates the quality of the automatic segmentation approach.

The Influence of Endmember Selection Method in Extracting Impervious Surface from Airborne Hyperspectral Imagery

NASA Astrophysics Data System (ADS)

Wang, J.; Feng, B.

2016-12-01

Impervious surface area (ISA) has long been studied as an important input into moisture flux models. In general, ISA impedes groundwater recharge, increases stormflow/flood frequency, and alters in-stream and riparian habitats. Urban area is recognized as one of the richest ISA environment. Urban ISA mapping assists flood prevention and urban planning. Hyperspectral imagery (HI), for its ability to detect subtle spectral signature, becomes an ideal candidate in urban ISA mapping. To map ISA from HI involves endmember (EM) selection. The high degree of spatial and spectral heterogeneity of urban environment puts great difficulty in this task: a compromise point is needed between the automatic degree and the good representativeness of the method. The study tested one manual and two semi-automatic EM selection strategies. The manual and the first semi-automatic methods have been widely used in EM selection. The second semi-automatic EM selection method is rather new and has been only proposed for moderate spatial resolution satellite. The manual method visually selected the EM candidates from eight landcover types in the original image. The first semi-automatic method chose the EM candidates using a threshold over the pixel purity index (PPI) map. The second semi-automatic method used the triangle shape of the HI scatter plot in the n-Dimension visualizer to identify the V-I-S (vegetation-impervious surface-soil) EM candidates: the pixels locate at the triangle points. The initial EM candidates from the three methods were further refined by three indexes (EM average RMSE, minimum average spectral angle, and count based EM selection) and generated three spectral libraries, which were used to classify the test image. Spectral angle mapper was applied. The accuracy reports for the classification results were generated. The overall accuracy are 85% for the manual method, 81% for the PPI method, and 87% for the V-I-S method. The V-I-S EM selection method performs best in this study. This fact proves the value of V-I-S EM selection method in not only moderate spatial resolution satellite image but also the more and more accessible high spatial resolution airborne image. This semi-automatic EM selection method can be adopted into a wide range of remote sensing images and provide ISA map for hydrology analysis.
Achieving Accurate Automatic Sleep Staging on Manually Pre-processed EEG Data Through Synchronization Feature Extraction and Graph Metrics.

PubMed

Chriskos, Panteleimon; Frantzidis, Christos A; Gkivogkli, Polyxeni T; Bamidis, Panagiotis D; Kourtidou-Papadeli, Chrysoula

2018-01-01

Sleep staging, the process of assigning labels to epochs of sleep, depending on the stage of sleep they belong, is an arduous, time consuming and error prone process as the initial recordings are quite often polluted by noise from different sources. To properly analyze such data and extract clinical knowledge, noise components must be removed or alleviated. In this paper a pre-processing and subsequent sleep staging pipeline for the sleep analysis of electroencephalographic signals is described. Two novel methods of functional connectivity estimation (Synchronization Likelihood/SL and Relative Wavelet Entropy/RWE) are comparatively investigated for automatic sleep staging through manually pre-processed electroencephalographic recordings. A multi-step process that renders signals suitable for further analysis is initially described. Then, two methods that rely on extracting synchronization features from electroencephalographic recordings to achieve computerized sleep staging are proposed, based on bivariate features which provide a functional overview of the brain network, contrary to most proposed methods that rely on extracting univariate time and frequency features. Annotation of sleep epochs is achieved through the presented feature extraction methods by training classifiers, which are in turn able to accurately classify new epochs. Analysis of data from sleep experiments on a randomized, controlled bed-rest study, which was organized by the European Space Agency and was conducted in the "ENVIHAB" facility of the Institute of Aerospace Medicine at the German Aerospace Center (DLR) in Cologne, Germany attains high accuracy rates, over 90% based on ground truth that resulted from manual sleep staging by two experienced sleep experts. Therefore, it can be concluded that the above feature extraction methods are suitable for semi-automatic sleep staging.
Achieving Accurate Automatic Sleep Staging on Manually Pre-processed EEG Data Through Synchronization Feature Extraction and Graph Metrics

PubMed Central

Chriskos, Panteleimon; Frantzidis, Christos A.; Gkivogkli, Polyxeni T.; Bamidis, Panagiotis D.; Kourtidou-Papadeli, Chrysoula

2018-01-01

Sleep staging, the process of assigning labels to epochs of sleep, depending on the stage of sleep they belong, is an arduous, time consuming and error prone process as the initial recordings are quite often polluted by noise from different sources. To properly analyze such data and extract clinical knowledge, noise components must be removed or alleviated. In this paper a pre-processing and subsequent sleep staging pipeline for the sleep analysis of electroencephalographic signals is described. Two novel methods of functional connectivity estimation (Synchronization Likelihood/SL and Relative Wavelet Entropy/RWE) are comparatively investigated for automatic sleep staging through manually pre-processed electroencephalographic recordings. A multi-step process that renders signals suitable for further analysis is initially described. Then, two methods that rely on extracting synchronization features from electroencephalographic recordings to achieve computerized sleep staging are proposed, based on bivariate features which provide a functional overview of the brain network, contrary to most proposed methods that rely on extracting univariate time and frequency features. Annotation of sleep epochs is achieved through the presented feature extraction methods by training classifiers, which are in turn able to accurately classify new epochs. Analysis of data from sleep experiments on a randomized, controlled bed-rest study, which was organized by the European Space Agency and was conducted in the “ENVIHAB” facility of the Institute of Aerospace Medicine at the German Aerospace Center (DLR) in Cologne, Germany attains high accuracy rates, over 90% based on ground truth that resulted from manual sleep staging by two experienced sleep experts. Therefore, it can be concluded that the above feature extraction methods are suitable for semi-automatic sleep staging. PMID:29628883
BiobankConnect: software to rapidly connect data elements for pooled analysis across biobanks using ontological and lexical indexing.

PubMed

Pang, Chao; Hendriksen, Dennis; Dijkstra, Martijn; van der Velde, K Joeri; Kuiper, Joel; Hillege, Hans L; Swertz, Morris A

2015-01-01

Pooling data across biobanks is necessary to increase statistical power, reveal more subtle associations, and synergize the value of data sources. However, searching for desired data elements among the thousands of available elements and harmonizing differences in terminology, data collection, and structure, is arduous and time consuming. To speed up biobank data pooling we developed BiobankConnect, a system to semi-automatically match desired data elements to available elements by: (1) annotating the desired elements with ontology terms using BioPortal; (2) automatically expanding the query for these elements with synonyms and subclass information using OntoCAT; (3) automatically searching available elements for these expanded terms using Lucene lexical matching; and (4) shortlisting relevant matches sorted by matching score. We evaluated BiobankConnect using human curated matches from EU-BioSHaRE, searching for 32 desired data elements in 7461 available elements from six biobanks. We found 0.75 precision at rank 1 and 0.74 recall at rank 10 compared to a manually curated set of relevant matches. In addition, best matches chosen by BioSHaRE experts ranked first in 63.0% and in the top 10 in 98.4% of cases, indicating that our system has the potential to significantly reduce manual matching work. BiobankConnect provides an easy user interface to significantly speed up the biobank harmonization process. It may also prove useful for other forms of biomedical data integration. All the software can be downloaded as a MOLGENIS open source app from http://www.github.com/molgenis, with a demo available at http://www.biobankconnect.org. © The Author 2014. Published by Oxford University Press on behalf of the American Medical Informatics Association.
BiobankConnect: software to rapidly connect data elements for pooled analysis across biobanks using ontological and lexical indexing

PubMed Central

Pang, Chao; Hendriksen, Dennis; Dijkstra, Martijn; van der Velde, K Joeri; Kuiper, Joel; Hillege, Hans L; Swertz, Morris A

2015-01-01

Objective Pooling data across biobanks is necessary to increase statistical power, reveal more subtle associations, and synergize the value of data sources. However, searching for desired data elements among the thousands of available elements and harmonizing differences in terminology, data collection, and structure, is arduous and time consuming. Materials and methods To speed up biobank data pooling we developed BiobankConnect, a system to semi-automatically match desired data elements to available elements by: (1) annotating the desired elements with ontology terms using BioPortal; (2) automatically expanding the query for these elements with synonyms and subclass information using OntoCAT; (3) automatically searching available elements for these expanded terms using Lucene lexical matching; and (4) shortlisting relevant matches sorted by matching score. Results We evaluated BiobankConnect using human curated matches from EU-BioSHaRE, searching for 32 desired data elements in 7461 available elements from six biobanks. We found 0.75 precision at rank 1 and 0.74 recall at rank 10 compared to a manually curated set of relevant matches. In addition, best matches chosen by BioSHaRE experts ranked first in 63.0% and in the top 10 in 98.4% of cases, indicating that our system has the potential to significantly reduce manual matching work. Conclusions BiobankConnect provides an easy user interface to significantly speed up the biobank harmonization process. It may also prove useful for other forms of biomedical data integration. All the software can be downloaded as a MOLGENIS open source app from http://www.github.com/molgenis, with a demo available at http://www.biobankconnect.org. PMID:25361575
AutoFACT: An Automatic Functional Annotation and Classification Tool

PubMed Central

Koski, Liisa B; Gray, Michael W; Lang, B Franz; Burger, Gertraud

2005-01-01

Background Assignment of function to new molecular sequence data is an essential step in genomics projects. The usual process involves similarity searches of a given sequence against one or more databases, an arduous process for large datasets. Results We present AutoFACT, a fully automated and customizable annotation tool that assigns biologically informative functions to a sequence. Key features of this tool are that it (1) analyzes nucleotide and protein sequence data; (2) determines the most informative functional description by combining multiple BLAST reports from several user-selected databases; (3) assigns putative metabolic pathways, functional classes, enzyme classes, GeneOntology terms and locus names; and (4) generates output in HTML, text and GFF formats for the user's convenience. We have compared AutoFACT to four well-established annotation pipelines. The error rate of functional annotation is estimated to be only between 1–2%. Comparison of AutoFACT to the traditional top-BLAST-hit annotation method shows that our procedure increases the number of functionally informative annotations by approximately 50%. Conclusion AutoFACT will serve as a useful annotation tool for smaller sequencing groups lacking dedicated bioinformatics staff. It is implemented in PERL and runs on LINUX/UNIX platforms. AutoFACT is available at . PMID:15960857
Formalizing Evidence Type Definitions for Drug-Drug Interaction Studies to Improve Evidence Base Curation.

PubMed

Utecht, Joseph; Brochhausen, Mathias; Judkins, John; Schneider, Jodi; Boyce, Richard D

2017-01-01

In this research we aim to demonstrate that an ontology-based system can categorize potential drug-drug interaction (PDDI) evidence items into complex types based on a small set of simple questions. Such a method could increase the transparency and reliability of PDDI evidence evaluation, while also reducing the variations in content and seriousness ratings present in PDDI knowledge bases. We extended the DIDEO ontology with 44 formal evidence type definitions. We then manually annotated the evidence types of 30 evidence items. We tested an RDF/OWL representation of answers to a small number of simple questions about each of these 30 evidence items and showed that automatic inference can determine the detailed evidence types based on this small number of simpler questions. These results show proof-of-concept for a decision support infrastructure that frees the evidence evaluator from mastering relatively complex written evidence type definitions.
Applied and implied semantics in crystallographic publishing

PubMed Central

2012-01-01

Background Crystallography is a data-rich, software-intensive scientific discipline with a community that has undertaken direct responsibility for publishing its own scientific journals. That community has worked actively to develop information exchange standards allowing readers of structure reports to access directly, and interact with, the scientific content of the articles. Results Structure reports submitted to some journals of the International Union of Crystallography (IUCr) can be automatically validated and published through an efficient and cost-effective workflow. Readers can view and interact with the structures in three-dimensional visualization applications, and can access the experimental data should they wish to perform their own independent structure solution and refinement. The journals also layer on top of this facility a number of automated annotations and interpretations to add further scientific value. Conclusions The benefits of semantically rich information exchange standards have revolutionised the scholarly publishing process for crystallography, and establish a model relevant to many other physical science disciplines. PMID:22932420
A Graph-Based Recovery and Decomposition of Swanson’s Hypothesis using Semantic Predications

PubMed Central

Cameron, Delroy; Bodenreider, Olivier; Yalamanchili, Hima; Danh, Tu; Vallabhaneni, Sreeram; Thirunarayan, Krishnaprasad; Sheth, Amit P.; Rindflesch, Thomas C.

2014-01-01

Objectives This paper presents a methodology for recovering and decomposing Swanson’s Raynaud Syndrome–Fish Oil Hypothesis semi-automatically. The methodology leverages the semantics of assertions extracted from biomedical literature (called semantic predications) along with structured background knowledge and graph-based algorithms to semi-automatically capture the informative associations originally discovered manually by Swanson. Demonstrating that Swanson’s manually intensive techniques can be undertaken semi-automatically, paves the way for fully automatic semantics-based hypothesis generation from scientific literature. Methods Semantic predications obtained from biomedical literature allow the construction of labeled directed graphs which contain various associations among concepts from the literature. By aggregating such associations into informative subgraphs, some of the relevant details originally articulated by Swanson has been uncovered. However, by leveraging background knowledge to bridge important knowledge gaps in the literature, a methodology for semi-automatically capturing the detailed associations originally explicated in natural language by Swanson has been developed. Results Our methodology not only recovered the 3 associations commonly recognized as Swanson’s Hypothesis, but also decomposed them into an additional 16 detailed associations, formulated as chains of semantic predications. Altogether, 14 out of the 19 associations that can be attributed to Swanson were retrieved using our approach. To the best of our knowledge, such an in-depth recovery and decomposition of Swanson’s Hypothesis has never been attempted. Conclusion In this work therefore, we presented a methodology for semi- automatically recovering and decomposing Swanson’s RS-DFO Hypothesis using semantic representations and graph algorithms. Our methodology provides new insights into potential prerequisites for semantics-driven Literature-Based Discovery (LBD). These suggest that three critical aspects of LBD include: 1) the need for more expressive representations beyond Swanson’s ABC model; 2) an ability to accurately extract semantic information from text; and 3) the semantic integration of scientific literature with structured background knowledge. PMID:23026233
Semi-Automatic Extraction Algorithm for Images of the Ciliary Muscle

PubMed Central

Kao, Chiu-Yen; Richdale, Kathryn; Sinnott, Loraine T.; Ernst, Lauren E.; Bailey, Melissa D.

2011-01-01

Purpose To development and evaluate a semi-automatic algorithm for segmentation and morphological assessment of the dimensions of the ciliary muscle in Visante™ Anterior Segment Optical Coherence Tomography images. Methods Geometric distortions in Visante images analyzed as binary files were assessed by imaging an optical flat and human donor tissue. The appropriate pixel/mm conversion factor to use for air (n = 1) was estimated by imaging calibration spheres. A semi-automatic algorithm was developed to extract the dimensions of the ciliary muscle from Visante images. Measurements were also made manually using Visante software calipers. Interclass correlation coefficients (ICC) and Bland-Altman analyses were used to compare the methods. A multilevel model was fitted to estimate the variance of algorithm measurements that was due to differences within- and between-examiners in scleral spur selection versus biological variability. Results The optical flat and the human donor tissue were imaged and appeared without geometric distortions in binary file format. Bland-Altman analyses revealed that caliper measurements tended to underestimate ciliary muscle thickness at 3 mm posterior to the scleral spur in subjects with the thickest ciliary muscles (t = 3.6, p < 0.001). The percent variance due to within- or between-examiner differences in scleral spur selection was found to be small (6%) when compared to the variance due to biological difference across subjects (80%). Using the mean of measurements from three images achieved an estimated ICC of 0.85. Conclusions The semi-automatic algorithm successfully segmented the ciliary muscle for further measurement. Using the algorithm to follow the scleral curvature to locate more posterior measurements is critical to avoid underestimating thickness measurements. This semi-automatic algorithm will allow for repeatable, efficient, and masked ciliary muscle measurements in large datasets. PMID:21169877
Ontology design patterns to disambiguate relations between genes and gene products in GENIA

PubMed Central

2011-01-01

Motivation Annotated reference corpora play an important role in biomedical information extraction. A semantic annotation of the natural language texts in these reference corpora using formal ontologies is challenging due to the inherent ambiguity of natural language. The provision of formal definitions and axioms for semantic annotations offers the means for ensuring consistency as well as enables the development of verifiable annotation guidelines. Consistent semantic annotations facilitate the automatic discovery of new information through deductive inferences. Results We provide a formal characterization of the relations used in the recent GENIA corpus annotations. For this purpose, we both select existing axiom systems based on the desired properties of the relations within the domain and develop new axioms for several relations. To apply this ontology of relations to the semantic annotation of text corpora, we implement two ontology design patterns. In addition, we provide a software application to convert annotated GENIA abstracts into OWL ontologies by combining both the ontology of relations and the design patterns. As a result, the GENIA abstracts become available as OWL ontologies and are amenable for automated verification, deductive inferences and other knowledge-based applications. Availability Documentation, implementation and examples are available from http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/. PMID:22166341
EXTRACT: interactive extraction of environment metadata and term suggestion for metagenomic sample annotation.

PubMed

Pafilis, Evangelos; Buttigieg, Pier Luigi; Ferrell, Barbra; Pereira, Emiliano; Schnetzer, Julia; Arvanitidis, Christos; Jensen, Lars Juhl

2016-01-01

The microbial and molecular ecology research communities have made substantial progress on developing standards for annotating samples with environment metadata. However, sample manual annotation is a highly labor intensive process and requires familiarity with the terminologies used. We have therefore developed an interactive annotation tool, EXTRACT, which helps curators identify and extract standard-compliant terms for annotation of metagenomic records and other samples. Behind its web-based user interface, the system combines published methods for named entity recognition of environment, organism, tissue and disease terms. The evaluators in the BioCreative V Interactive Annotation Task found the system to be intuitive, useful, well documented and sufficiently accurate to be helpful in spotting relevant text passages and extracting organism and environment terms. Comparison of fully manual and text-mining-assisted curation revealed that EXTRACT speeds up annotation by 15-25% and helps curators to detect terms that would otherwise have been missed. Database URL: https://extract.hcmr.gr/. © The Author(s) 2016. Published by Oxford University Press.
EXTRACT: Interactive extraction of environment metadata and term suggestion for metagenomic sample annotation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pafilis, Evangelos; Buttigieg, Pier Luigi; Ferrell, Barbra

The microbial and molecular ecology research communities have made substantial progress on developing standards for annotating samples with environment metadata. However, sample manual annotation is a highly labor intensive process and requires familiarity with the terminologies used. We have therefore developed an interactive annotation tool, EXTRACT, which helps curators identify and extract standard-compliant terms for annotation of metagenomic records and other samples. Behind its web-based user interface, the system combines published methods for named entity recognition of environment, organism, tissue and disease terms. The evaluators in the BioCreative V Interactive Annotation Task found the system to be intuitive, useful, wellmore » documented and sufficiently accurate to be helpful in spotting relevant text passages and extracting organism and environment terms. Here the comparison of fully manual and text-mining-assisted curation revealed that EXTRACT speeds up annotation by 15–25% and helps curators to detect terms that would otherwise have been missed.« less
EXTRACT: Interactive extraction of environment metadata and term suggestion for metagenomic sample annotation

DOE PAGES

Pafilis, Evangelos; Buttigieg, Pier Luigi; Ferrell, Barbra; ...

2016-01-01

The microbial and molecular ecology research communities have made substantial progress on developing standards for annotating samples with environment metadata. However, sample manual annotation is a highly labor intensive process and requires familiarity with the terminologies used. We have therefore developed an interactive annotation tool, EXTRACT, which helps curators identify and extract standard-compliant terms for annotation of metagenomic records and other samples. Behind its web-based user interface, the system combines published methods for named entity recognition of environment, organism, tissue and disease terms. The evaluators in the BioCreative V Interactive Annotation Task found the system to be intuitive, useful, wellmore » documented and sufficiently accurate to be helpful in spotting relevant text passages and extracting organism and environment terms. Here the comparison of fully manual and text-mining-assisted curation revealed that EXTRACT speeds up annotation by 15–25% and helps curators to detect terms that would otherwise have been missed.« less
Hierarchical Event Descriptors (HED): Semi-Structured Tagging for Real-World Events in Large-Scale EEG

PubMed Central

Bigdely-Shamlo, Nima; Cockfield, Jeremy; Makeig, Scott; Rognon, Thomas; La Valle, Chris; Miyakoshi, Makoto; Robbins, Kay A.

2016-01-01

Real-world brain imaging by EEG requires accurate annotation of complex subject-environment interactions in event-rich tasks and paradigms. This paper describes the evolution of the Hierarchical Event Descriptor (HED) system for systematically describing both laboratory and real-world events. HED version 2, first described here, provides the semantic capability of describing a variety of subject and environmental states. HED descriptions can include stimulus presentation events on screen or in virtual worlds, experimental or spontaneous events occurring in the real world environment, and events experienced via one or multiple sensory modalities. Furthermore, HED 2 can distinguish between the mere presence of an object and its actual (or putative) perception by a subject. Although the HED framework has implicit ontological and linked data representations, the user-interface for HED annotation is more intuitive than traditional ontological annotation. We believe that hiding the formal representations allows for a more user-friendly interface, making consistent, detailed tagging of experimental, and real-world events possible for research users. HED is extensible while retaining the advantages of having an enforced common core vocabulary. We have developed a collection of tools to support HED tag assignment and validation; these are available at hedtags.org. A plug-in for EEGLAB (sccn.ucsd.edu/eeglab), CTAGGER, is also available to speed the process of tagging existing studies. PMID:27799907
An empirical study of ensemble-based semi-supervised learning approaches for imbalanced splice site datasets.

PubMed

Stanescu, Ana; Caragea, Doina

2015-01-01

Recent biochemical advances have led to inexpensive, time-efficient production of massive volumes of raw genomic data. Traditional machine learning approaches to genome annotation typically rely on large amounts of labeled data. The process of labeling data can be expensive, as it requires domain knowledge and expert involvement. Semi-supervised learning approaches that can make use of unlabeled data, in addition to small amounts of labeled data, can help reduce the costs associated with labeling. In this context, we focus on the problem of predicting splice sites in a genome using semi-supervised learning approaches. This is a challenging problem, due to the highly imbalanced distribution of the data, i.e., small number of splice sites as compared to the number of non-splice sites. To address this challenge, we propose to use ensembles of semi-supervised classifiers, specifically self-training and co-training classifiers. Our experiments on five highly imbalanced splice site datasets, with positive to negative ratios of 1-to-99, showed that the ensemble-based semi-supervised approaches represent a good choice, even when the amount of labeled data consists of less than 1% of all training data. In particular, we found that ensembles of co-training and self-training classifiers that dynamically balance the set of labeled instances during the semi-supervised iterations show improvements over the corresponding supervised ensemble baselines. In the presence of limited amounts of labeled data, ensemble-based semi-supervised approaches can successfully leverage the unlabeled data to enhance supervised ensembles learned from highly imbalanced data distributions. Given that such distributions are common for many biological sequence classification problems, our work can be seen as a stepping stone towards more sophisticated ensemble-based approaches to biological sequence annotation in a semi-supervised framework.
An empirical study of ensemble-based semi-supervised learning approaches for imbalanced splice site datasets

PubMed Central

2015-01-01

Background Recent biochemical advances have led to inexpensive, time-efficient production of massive volumes of raw genomic data. Traditional machine learning approaches to genome annotation typically rely on large amounts of labeled data. The process of labeling data can be expensive, as it requires domain knowledge and expert involvement. Semi-supervised learning approaches that can make use of unlabeled data, in addition to small amounts of labeled data, can help reduce the costs associated with labeling. In this context, we focus on the problem of predicting splice sites in a genome using semi-supervised learning approaches. This is a challenging problem, due to the highly imbalanced distribution of the data, i.e., small number of splice sites as compared to the number of non-splice sites. To address this challenge, we propose to use ensembles of semi-supervised classifiers, specifically self-training and co-training classifiers. Results Our experiments on five highly imbalanced splice site datasets, with positive to negative ratios of 1-to-99, showed that the ensemble-based semi-supervised approaches represent a good choice, even when the amount of labeled data consists of less than 1% of all training data. In particular, we found that ensembles of co-training and self-training classifiers that dynamically balance the set of labeled instances during the semi-supervised iterations show improvements over the corresponding supervised ensemble baselines. Conclusions In the presence of limited amounts of labeled data, ensemble-based semi-supervised approaches can successfully leverage the unlabeled data to enhance supervised ensembles learned from highly imbalanced data distributions. Given that such distributions are common for many biological sequence classification problems, our work can be seen as a stepping stone towards more sophisticated ensemble-based approaches to biological sequence annotation in a semi-supervised framework. PMID:26356316
Application of a Novel Semi-Automatic Technique for Determining the Bilateral Symmetry Plane of the Facial Skeleton of Normal Adult Males.

PubMed

Roumeliotis, Grayson; Willing, Ryan; Neuert, Mark; Ahluwalia, Romy; Jenkyn, Thomas; Yazdani, Arjang

2015-09-01

The accurate assessment of symmetry in the craniofacial skeleton is important for cosmetic and reconstructive craniofacial surgery. Although there have been several published attempts to develop an accurate system for determining the correct plane of symmetry, all are inaccurate and time consuming. Here, the authors applied a novel semi-automatic method for the calculation of craniofacial symmetry, based on principal component analysis and iterative corrective point computation, to a large sample of normal adult male facial computerized tomography scans obtained clinically (n = 32). The authors hypothesized that this method would generate planes of symmetry that would result in less error when one side of the face was compared to the other than a symmetry plane generated using a plane defined by cephalometric landmarks. When a three-dimensional model of one side of the face was reflected across the semi-automatic plane of symmetry there was less error than when reflected across the cephalometric plane. The semi-automatic plane was also more accurate when the locations of bilateral cephalometric landmarks (eg, frontozygomatic sutures) were compared across the face. The authors conclude that this method allows for accurate and fast measurements of craniofacial symmetry. This has important implications for studying the development of the facial skeleton, and clinical application for reconstruction.
Multi-label literature classification based on the Gene Ontology graph.

PubMed

Jin, Bo; Muller, Brian; Zhai, Chengxiang; Lu, Xinghua

2008-12-08

The Gene Ontology is a controlled vocabulary for representing knowledge related to genes and proteins in a computable form. The current effort of manually annotating proteins with the Gene Ontology is outpaced by the rate of accumulation of biomedical knowledge in literature, which urges the development of text mining approaches to facilitate the process by automatically extracting the Gene Ontology annotation from literature. The task is usually cast as a text classification problem, and contemporary methods are confronted with unbalanced training data and the difficulties associated with multi-label classification. In this research, we investigated the methods of enhancing automatic multi-label classification of biomedical literature by utilizing the structure of the Gene Ontology graph. We have studied three graph-based multi-label classification algorithms, including a novel stochastic algorithm and two top-down hierarchical classification methods for multi-label literature classification. We systematically evaluated and compared these graph-based classification algorithms to a conventional flat multi-label algorithm. The results indicate that, through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods can significantly improve predictions of the Gene Ontology terms implied by the analyzed text. Furthermore, the graph-based multi-label classifiers are capable of suggesting Gene Ontology annotations (to curators) that are closely related to the true annotations even if they fail to predict the true ones directly. A software package implementing the studied algorithms is available for the research community. Through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods have better potential than the conventional flat multi-label classification approach to facilitate protein annotation based on the literature.
An algorithm for optimal fusion of atlases with different labeling protocols

PubMed Central

Iglesias, Juan Eugenio; Sabuncu, Mert Rory; Aganj, Iman; Bhatt, Priyanka; Casillas, Christen; Salat, David; Boxer, Adam; Fischl, Bruce; Van Leemput, Koen

2014-01-01

In this paper we present a novel label fusion algorithm suited for scenarios in which different manual delineation protocols with potentially disparate structures have been used to annotate the training scans (hereafter referred to as “atlases”). Such scenarios arise when atlases have missing structures, when they have been labeled with different levels of detail, or when they have been taken from different heterogeneous databases. The proposed algorithm can be used to automatically label a novel scan with any of the protocols from the training data. Further, it enables us to generate new labels that are not present in any delineation protocol by defining intersections on the underling labels. We first use probabilistic models of label fusion to generalize three popular label fusion techniques to the multi-protocol setting: majority voting, semi-locally weighted voting and STAPLE. Then, we identify some shortcomings of the generalized methods, namely the inability to produce meaningful posterior probabilities for the different labels (majority voting, semi-locally weighted voting) and to exploit the similarities between the atlases (all three methods). Finally, we propose a novel generative label fusion model that can overcome these drawbacks. We use the proposed method to combine four brain MRI datasets labeled with different protocols (with a total of 102 unique labeled structures) to produce segmentations of 148 brain regions. Using cross-validation, we show that the proposed algorithm outperforms the generalizations of majority voting, semi-locally weighted voting and STAPLE (mean Dice score 83%, vs. 77%, 80% and 79%, respectively). We also evaluated the proposed algorithm in an aging study, successfully reproducing some well-known results in cortical and subcortical structures. PMID:25463466

Semantic Annotation of Computational Components

NASA Technical Reports Server (NTRS)

Vanderbilt, Peter; Mehrotra, Piyush

2004-01-01

This paper describes a methodology to specify machine-processable semantic descriptions of computational components to enable them to be shared and reused. A particular focus of this scheme is to enable automatic compositon of such components into simple work-flows.
Generating quality word sense disambiguation test sets based on MeSH indexing.

PubMed

Fan, Jung-Wei; Friedman, Carol

2009-11-14

Word sense disambiguation (WSD) determines the correct meaning of a word that has more than one meaning, and is a critical step in biomedical natural language processing, as interpretation of information in text can be correct only if the meanings of their component terms are correctly identified first. Quality evaluation sets are important to WSD because they can be used as representative samples for developing automatic programs and as referees for comparing different WSD programs. To help create quality test sets for WSD, we developed a MeSH-based automatic sense-tagging method that preferentially annotates terms being topical of the text. Preliminary results were promising and revealed important issues to be addressed in biomedical WSD research. We also suggest that, by cross-validating with 2 or 3 annotators, the method should be able to efficiently generate quality WSD test sets. Online supplement is available at: http://www.dbmi.columbia.edu/~juf7002/AMIA09.
AdjScales: Visualizing Differences between Adjectives for Language Learners

NASA Astrophysics Data System (ADS)

Sheinman, Vera; Tokunaga, Takenobu

In this study we introduce AdjScales, a method for scaling similar adjectives by their strength. It combines existing Web-based computational linguistic techniques in order to automatically differentiate between similar adjectives that describe the same property by strength. Though this kind of information is rarely present in most of the lexical resources and dictionaries, it may be useful for language learners that try to distinguish between similar words. Additionally, learners might gain from a simple visualization of these differences using unidimensional scales. The method is evaluated by comparison with annotation on a subset of adjectives from WordNet by four native English speakers. It is also compared against two non-native speakers of English. The collected annotation is an interesting resource in its own right. This work is a first step toward automatic differentiation of meaning between similar words for language learners. AdjScales can be useful for lexical resource enhancement.
Evaluation of an automatic MR-based gold fiducial marker localisation method for MR-only prostate radiotherapy

NASA Astrophysics Data System (ADS)

Maspero, Matteo; van den Berg, Cornelis A. T.; Zijlstra, Frank; Sikkes, Gonda G.; de Boer, Hans C. J.; Meijer, Gert J.; Kerkmeijer, Linda G. W.; Viergever, Max A.; Lagendijk, Jan J. W.; Seevinck, Peter R.

2017-10-01

An MR-only radiotherapy planning (RTP) workflow would reduce the cost, radiation exposure and uncertainties introduced by CT-MRI registrations. In the case of prostate treatment, one of the remaining challenges currently holding back the implementation of an RTP workflow is the MR-based localisation of intraprostatic gold fiducial markers (FMs), which is crucial for accurate patient positioning. Currently, MR-based FM localisation is clinically performed manually. This is sub-optimal, as manual interaction increases the workload. Attempts to perform automatic FM detection often rely on being able to detect signal voids induced by the FMs in magnitude images. However, signal voids may not always be sufficiently specific, hampering accurate and robust automatic FM localisation. Here, we present an approach that aims at automatic MR-based FM localisation. This method is based on template matching using a library of simulated complex-valued templates, and exploiting the behaviour of the complex MR signal in the vicinity of the FM. Clinical evaluation was performed on seventeen prostate cancer patients undergoing external beam radiotherapy treatment. Automatic MR-based FM localisation was compared to manual MR-based and semi-automatic CT-based localisation (the current gold standard) in terms of detection rate and the spatial accuracy and precision of localisation. The proposed method correctly detected all three FMs in 15/17 patients. The spatial accuracy (mean) and precision (STD) were 0.9 mm and 0.5 mm respectively, which is below the voxel size of 1.1 × 1.1 × 1.2 mm3 and comparable to MR-based manual localisation. FM localisation failed (3/51 FMs) in the presence of bleeding or calcifications in the direct vicinity of the FM. The method was found to be spatially accurate and precise, which is essential for clinical use. To overcome any missed detection, we envision the use of the proposed method along with verification by an observer. This will result in a semi-automatic workflow facilitating the introduction of an MR-only workflow.
Using Nonexperts for Annotating Pharmacokinetic Drug-Drug Interaction Mentions in Product Labeling: A Feasibility Study

PubMed Central

Ning, Yifan; Hernandez, Andres; Horn, John R; Jacobson, Rebecca; Boyce, Richard D

2016-01-01

Background Because vital details of potential pharmacokinetic drug-drug interactions are often described in free-text structured product labels, manual curation is a necessary but expensive step in the development of electronic drug-drug interaction information resources. The use of nonexperts to annotate potential drug-drug interaction (PDDI) mentions in drug product label annotation may be a means of lessening the burden of manual curation. Objective Our goal was to explore the practicality of using nonexpert participants to annotate drug-drug interaction descriptions from structured product labels. By presenting annotation tasks to both pharmacy experts and relatively naïve participants, we hoped to demonstrate the feasibility of using nonexpert annotators for drug-drug information annotation. We were also interested in exploring whether and to what extent natural language processing (NLP) preannotation helped improve task completion time, accuracy, and subjective satisfaction. Methods Two experts and 4 nonexperts were asked to annotate 208 structured product label sections under 4 conditions completed sequentially: (1) no NLP assistance, (2) preannotation of drug mentions, (3) preannotation of drug mentions and PDDIs, and (4) a repeat of the no-annotation condition. Results were evaluated within the 2 groups and relative to an existing gold standard. Participants were asked to provide reports on the time required to complete tasks and their perceptions of task difficulty. Results One of the experts and 3 of the nonexperts completed all tasks. Annotation results from the nonexpert group were relatively strong in every scenario and better than the performance of the NLP pipeline. The expert and 2 of the nonexperts were able to complete most tasks in less than 3 hours. Usability perceptions were generally positive (3.67 for expert, mean of 3.33 for nonexperts). Conclusions The results suggest that nonexpert annotation might be a feasible option for comprehensive labeling of annotated PDDIs across a broader range of drug product labels. Preannotation of drug mentions may ease the annotation task. However, preannotation of PDDIs, as operationalized in this study, presented the participants with difficulties. Future work should test if these issues can be addressed by the use of better performing NLP and a different approach to presenting the PDDI preannotations to users during the annotation workflow. PMID:27066806
Using Nonexperts for Annotating Pharmacokinetic Drug-Drug Interaction Mentions in Product Labeling: A Feasibility Study.

PubMed

Hochheiser, Harry; Ning, Yifan; Hernandez, Andres; Horn, John R; Jacobson, Rebecca; Boyce, Richard D

2016-04-11

Because vital details of potential pharmacokinetic drug-drug interactions are often described in free-text structured product labels, manual curation is a necessary but expensive step in the development of electronic drug-drug interaction information resources. The use of nonexperts to annotate potential drug-drug interaction (PDDI) mentions in drug product label annotation may be a means of lessening the burden of manual curation. Our goal was to explore the practicality of using nonexpert participants to annotate drug-drug interaction descriptions from structured product labels. By presenting annotation tasks to both pharmacy experts and relatively naïve participants, we hoped to demonstrate the feasibility of using nonexpert annotators for drug-drug information annotation. We were also interested in exploring whether and to what extent natural language processing (NLP) preannotation helped improve task completion time, accuracy, and subjective satisfaction. Two experts and 4 nonexperts were asked to annotate 208 structured product label sections under 4 conditions completed sequentially: (1) no NLP assistance, (2) preannotation of drug mentions, (3) preannotation of drug mentions and PDDIs, and (4) a repeat of the no-annotation condition. Results were evaluated within the 2 groups and relative to an existing gold standard. Participants were asked to provide reports on the time required to complete tasks and their perceptions of task difficulty. One of the experts and 3 of the nonexperts completed all tasks. Annotation results from the nonexpert group were relatively strong in every scenario and better than the performance of the NLP pipeline. The expert and 2 of the nonexperts were able to complete most tasks in less than 3 hours. Usability perceptions were generally positive (3.67 for expert, mean of 3.33 for nonexperts). The results suggest that nonexpert annotation might be a feasible option for comprehensive labeling of annotated PDDIs across a broader range of drug product labels. Preannotation of drug mentions may ease the annotation task. However, preannotation of PDDIs, as operationalized in this study, presented the participants with difficulties. Future work should test if these issues can be addressed by the use of better performing NLP and a different approach to presenting the PDDI preannotations to users during the annotation workflow.
Methods for eliciting, annotating, and analyzing databases for child speech development.

PubMed

Beckman, Mary E; Plummer, Andrew R; Munson, Benjamin; Reidy, Patrick F

2017-09-01

Methods from automatic speech recognition (ASR), such as segmentation and forced alignment, have facilitated the rapid annotation and analysis of very large adult speech databases and databases of caregiver-infant interaction, enabling advances in speech science that were unimaginable just a few decades ago. This paper centers on two main problems that must be addressed in order to have analogous resources for developing and exploiting databases of young children's speech. The first problem is to understand and appreciate the differences between adult and child speech that cause ASR models developed for adult speech to fail when applied to child speech. These differences include the fact that children's vocal tracts are smaller than those of adult males and also changing rapidly in size and shape over the course of development, leading to between-talker variability across age groups that dwarfs the between-talker differences between adult men and women. Moreover, children do not achieve fully adult-like speech motor control until they are young adults, and their vocabularies and phonological proficiency are developing as well, leading to considerably more within-talker variability as well as more between-talker variability. The second problem then is to determine what annotation schemas and analysis techniques can most usefully capture relevant aspects of this variability. Indeed, standard acoustic characterizations applied to child speech reveal that adult-centered annotation schemas fail to capture phenomena such as the emergence of covert contrasts in children's developing phonological systems, while also revealing children's nonuniform progression toward community speech norms as they acquire the phonological systems of their native languages. Both problems point to the need for more basic research into the growth and development of the articulatory system (as well as of the lexicon and phonological system) that is oriented explicitly toward the construction of age-appropriate computational models.
MEGANTE: A Web-Based System for Integrated Plant Genome Annotation

PubMed Central

Numa, Hisataka; Itoh, Takeshi

2014-01-01

The recent advancement of high-throughput genome sequencing technologies has resulted in a considerable increase in demands for large-scale genome annotation. While annotation is a crucial step for downstream data analyses and experimental studies, this process requires substantial expertise and knowledge of bioinformatics. Here we present MEGANTE, a web-based annotation system that makes plant genome annotation easy for researchers unfamiliar with bioinformatics. Without any complicated configuration, users can perform genomic sequence annotations simply by uploading a sequence and selecting the species to query. MEGANTE automatically runs several analysis programs and integrates the results to select the appropriate consensus exon–intron structures and to predict open reading frames (ORFs) at each locus. Functional annotation, including a similarity search against known proteins and a functional domain search, are also performed for the predicted ORFs. The resultant annotation information is visualized with a widely used genome browser, GBrowse. For ease of analysis, the results can be downloaded in Microsoft Excel format. All of the query sequences and annotation results are stored on the server side so that users can access their own data from virtually anywhere on the web. The current release of MEGANTE targets 24 plant species from the Brassicaceae, Fabaceae, Musaceae, Poaceae, Salicaceae, Solanaceae, Rosaceae and Vitaceae families, and it allows users to submit a sequence up to 10 Mb in length and to save up to 100 sequences with the annotation information on the server. The MEGANTE web service is available at https://megante.dna.affrc.go.jp/. PMID:24253915
Prepare-Participate-Connect: Active Learning with Video Annotation

ERIC Educational Resources Information Center

Colasante, Meg; Douglas, Kathy

2016-01-01

Annotation of video provides students with the opportunity to view and engage with audiovisual content in an interactive and participatory way rather than in passive-receptive mode. This article discusses research into the use of video annotation in four vocational programs at RMIT University in Melbourne, which allowed students to interact with…
Assisted annotation of medical free text using RapTAT

PubMed Central

Gobbel, Glenn T; Garvin, Jennifer; Reeves, Ruth; Cronin, Robert M; Heavirland, Julia; Williams, Jenifer; Weaver, Allison; Jayaramaraja, Shrimalini; Giuse, Dario; Speroff, Theodore; Brown, Steven H; Xu, Hua; Matheny, Michael E

2014-01-01

Objective To determine whether assisted annotation using interactive training can reduce the time required to annotate a clinical document corpus without introducing bias. Materials and methods A tool, RapTAT, was designed to assist annotation by iteratively pre-annotating probable phrases of interest within a document, presenting the annotations to a reviewer for correction, and then using the corrected annotations for further machine learning-based training before pre-annotating subsequent documents. Annotators reviewed 404 clinical notes either manually or using RapTAT assistance for concepts related to quality of care during heart failure treatment. Notes were divided into 20 batches of 19–21 documents for iterative annotation and training. Results The number of correct RapTAT pre-annotations increased significantly and annotation time per batch decreased by ∼50% over the course of annotation. Annotation rate increased from batch to batch for assisted but not manual reviewers. Pre-annotation F-measure increased from 0.5 to 0.6 to >0.80 (relative to both assisted reviewer and reference annotations) over the first three batches and more slowly thereafter. Overall inter-annotator agreement was significantly higher between RapTAT-assisted reviewers (0.89) than between manual reviewers (0.85). Discussion The tool reduced workload by decreasing the number of annotations needing to be added and helping reviewers to annotate at an increased rate. Agreement between the pre-annotations and reference standard, and agreement between the pre-annotations and assisted annotations, were similar throughout the annotation process, which suggests that pre-annotation did not introduce bias. Conclusions Pre-annotations generated by a tool capable of interactive training can reduce the time required to create an annotated document corpus by up to 50%. PMID:24431336
System for definition of the central-chest vasculature

NASA Astrophysics Data System (ADS)

Taeprasartsit, Pinyo; Higgins, William E.

2009-02-01

Accurate definition of the central-chest vasculature from three-dimensional (3D) multi-detector CT (MDCT) images is important for pulmonary applications. For instance, the aorta and pulmonary artery help in automatic definition of the Mountain lymph-node stations for lung-cancer staging. This work presents a system for defining major vascular structures in the central chest. The system provides automatic methods for extracting the aorta and pulmonary artery and semi-automatic methods for extracting the other major central chest arteries/veins, such as the superior vena cava and azygos vein. Automatic aorta and pulmonary artery extraction are performed by model fitting and selection. The system also extracts certain vascular structure information to validate outputs. A semi-automatic method extracts vasculature by finding the medial axes between provided important sites. Results of the system are applied to lymph-node station definition and guidance of bronchoscopic biopsy.
Note: An automated image analysis method for high-throughput classification of surface-bound bacterial cell motions.

PubMed

Shen, Simon; Syal, Karan; Tao, Nongjian; Wang, Shaopeng

2015-12-01

We present a Single-Cell Motion Characterization System (SiCMoCS) to automatically extract bacterial cell morphological features from microscope images and use those features to automatically classify cell motion for rod shaped motile bacterial cells. In some imaging based studies, bacteria cells need to be attached to the surface for time-lapse observation of cellular processes such as cell membrane-protein interactions and membrane elasticity. These studies often generate large volumes of images. Extracting accurate bacterial cell morphology features from these images is critical for quantitative assessment. Using SiCMoCS, we demonstrated simultaneous and automated motion tracking and classification of hundreds of individual cells in an image sequence of several hundred frames. This is a significant improvement from traditional manual and semi-automated approaches to segmenting bacterial cells based on empirical thresholds, and a first attempt to automatically classify bacterial motion types for motile rod shaped bacterial cells, which enables rapid and quantitative analysis of various types of bacterial motion.
Fuzzy logic and image processing techniques for the interpretation of seismic data

NASA Astrophysics Data System (ADS)

Orozco-del-Castillo, M. G.; Ortiz-Alemán, C.; Urrutia-Fucugauchi, J.; Rodríguez-Castellanos, A.

2011-06-01

Since interpretation of seismic data is usually a tedious and repetitive task, the ability to do so automatically or semi-automatically has become an important objective of recent research. We believe that the vagueness and uncertainty in the interpretation process makes fuzzy logic an appropriate tool to deal with seismic data. In this work we developed a semi-automated fuzzy inference system to detect the internal architecture of a mass transport complex (MTC) in seismic images. We propose that the observed characteristics of a MTC can be expressed as fuzzy if-then rules consisting of linguistic values associated with fuzzy membership functions. The constructions of the fuzzy inference system and various image processing techniques are presented. We conclude that this is a well-suited problem for fuzzy logic since the application of the proposed methodology yields a semi-automatically interpreted MTC which closely resembles the MTC from expert manual interpretation.
neXtA5: accelerating annotation of articles via automated approaches in neXtProt.

PubMed

Mottin, Luc; Gobeill, Julien; Pasche, Emilie; Michel, Pierre-André; Cusin, Isabelle; Gaudet, Pascale; Ruch, Patrick

2016-01-01

The rapid increase in the number of published articles poses a challenge for curated databases to remain up-to-date. To help the scientific community and database curators deal with this issue, we have developed an application, neXtA5, which prioritizes the literature for specific curation requirements. Our system, neXtA5, is a curation service composed of three main elements. The first component is a named-entity recognition module, which annotates MEDLINE over some predefined axes. This report focuses on three axes: Diseases, the Molecular Function and Biological Process sub-ontologies of the Gene Ontology (GO). The automatic annotations are then stored in a local database, BioMed, for each annotation axis. Additional entities such as species and chemical compounds are also identified. The second component is an existing search engine, which retrieves the most relevant MEDLINE records for any given query. The third component uses the content of BioMed to generate an axis-specific ranking, which takes into account the density of named-entities as stored in the Biomed database. The two ranked lists are ultimately merged using a linear combination, which has been specifically tuned to support the annotation of each axis. The fine-tuning of the coefficients is formally reported for each axis-driven search. Compared with PubMed, which is the system used by most curators, the improvement is the following: +231% for Diseases, +236% for Molecular Functions and +3153% for Biological Process when measuring the precision of the top-returned PMID (P0 or mean reciprocal rank). The current search methods significantly improve the search effectiveness of curators for three important curation axes. Further experiments are being performed to extend the curation types, in particular protein-protein interactions, which require specific relationship extraction capabilities. In parallel, user-friendly interfaces powered with a set of JSON web services are currently being implemented into the neXtProt annotation pipeline.Available on: http://babar.unige.ch:8082/neXtA5Database URL: http://babar.unige.ch:8082/neXtA5/fetcher.jsp. © The Author(s) 2016. Published by Oxford University Press.
neXtA5: accelerating annotation of articles via automated approaches in neXtProt

PubMed Central

Mottin, Luc; Gobeill, Julien; Pasche, Emilie; Michel, Pierre-André; Cusin, Isabelle; Gaudet, Pascale; Ruch, Patrick

2016-01-01

The rapid increase in the number of published articles poses a challenge for curated databases to remain up-to-date. To help the scientific community and database curators deal with this issue, we have developed an application, neXtA5, which prioritizes the literature for specific curation requirements. Our system, neXtA5, is a curation service composed of three main elements. The first component is a named-entity recognition module, which annotates MEDLINE over some predefined axes. This report focuses on three axes: Diseases, the Molecular Function and Biological Process sub-ontologies of the Gene Ontology (GO). The automatic annotations are then stored in a local database, BioMed, for each annotation axis. Additional entities such as species and chemical compounds are also identified. The second component is an existing search engine, which retrieves the most relevant MEDLINE records for any given query. The third component uses the content of BioMed to generate an axis-specific ranking, which takes into account the density of named-entities as stored in the Biomed database. The two ranked lists are ultimately merged using a linear combination, which has been specifically tuned to support the annotation of each axis. The fine-tuning of the coefficients is formally reported for each axis-driven search. Compared with PubMed, which is the system used by most curators, the improvement is the following: +231% for Diseases, +236% for Molecular Functions and +3153% for Biological Process when measuring the precision of the top-returned PMID (P0 or mean reciprocal rank). The current search methods significantly improve the search effectiveness of curators for three important curation axes. Further experiments are being performed to extend the curation types, in particular protein–protein interactions, which require specific relationship extraction capabilities. In parallel, user-friendly interfaces powered with a set of JSON web services are currently being implemented into the neXtProt annotation pipeline. Available on: http://babar.unige.ch:8082/neXtA5 Database URL: http://babar.unige.ch:8082/neXtA5/fetcher.jsp PMID:27374119
AIRS Maps from Space Processing Software

NASA Technical Reports Server (NTRS)

Thompson, Charles K.; Licata, Stephen J.

2012-01-01

This software package processes Atmospheric Infrared Sounder (AIRS) Level 2 swath standard product geophysical parameters, and generates global, colorized, annotated maps. It automatically generates daily and multi-day averaged colorized and annotated maps of various AIRS Level 2 swath geophysical parameters. It also generates AIRS input data sets for Eyes on Earth, Puffer-sphere, and Magic Planet. This program is tailored to AIRS Level 2 data products. It re-projects data into 1/4-degree grids that can be combined and averaged for any number of days. The software scales and colorizes global grids utilizing AIRS-specific color tables, and annotates images with title and color bar. This software can be tailored for use with other swath data products for the purposes of visualization.
Managing the data deluge: data-driven GO category assignment improves while complexity of functional annotation increases.

PubMed

Gobeill, Julien; Pasche, Emilie; Vishnyakova, Dina; Ruch, Patrick

2013-01-01

The available curated data lag behind current biological knowledge contained in the literature. Text mining can assist biologists and curators to locate and access this knowledge, for instance by characterizing the functional profile of publications. Gene Ontology (GO) category assignment in free text already supports various applications, such as powering ontology-based search engines, finding curation-relevant articles (triage) or helping the curator to identify and encode functions. Popular text mining tools for GO classification are based on so called thesaurus-based--or dictionary-based--approaches, which exploit similarities between the input text and GO terms themselves. But their effectiveness remains limited owing to the complex nature of GO terms, which rarely occur in text. In contrast, machine learning approaches exploit similarities between the input text and already curated instances contained in a knowledge base to infer a functional profile. GO Annotations (GOA) and MEDLINE make possible to exploit a growing amount of curated abstracts (97 000 in November 2012) for populating this knowledge base. Our study compares a state-of-the-art thesaurus-based system with a machine learning system (based on a k-Nearest Neighbours algorithm) for the task of proposing a functional profile for unseen MEDLINE abstracts, and shows how resources and performances have evolved. Systems are evaluated on their ability to propose for a given abstract the GO terms (2.8 on average) used for curation in GOA. We show that since 2006, although a massive effort was put into adding synonyms in GO (+300%), our thesaurus-based system effectiveness is rather constant, reaching from 0.28 to 0.31 for Recall at 20 (R20). In contrast, thanks to its knowledge base growth, our machine learning system has steadily improved, reaching from 0.38 in 2006 to 0.56 for R20 in 2012. Integrated in semi-automatic workflows or in fully automatic pipelines, such systems are more and more efficient to provide assistance to biologists. DATABASE URL: http://eagl.unige.ch/GOCat/
Improvement of the banana "Musa acuminata" reference sequence using NGS data and semi-automated bioinformatics methods.

PubMed

Martin, Guillaume; Baurens, Franc-Christophe; Droc, Gaëtan; Rouard, Mathieu; Cenci, Alberto; Kilian, Andrzej; Hastie, Alex; Doležel, Jaroslav; Aury, Jean-Marc; Alberti, Adriana; Carreel, Françoise; D'Hont, Angélique

2016-03-16

Recent advances in genomics indicate functional significance of a majority of genome sequences and their long range interactions. As a detailed examination of genome organization and function requires very high quality genome sequence, the objective of this study was to improve reference genome assembly of banana (Musa acuminata). We have developed a modular bioinformatics pipeline to improve genome sequence assemblies, which can handle various types of data. The pipeline comprises several semi-automated tools. However, unlike classical automated tools that are based on global parameters, the semi-automated tools proposed an expert mode for a user who can decide on suggested improvements through local compromises. The pipeline was used to improve the draft genome sequence of Musa acuminata. Genotyping by sequencing (GBS) of a segregating population and paired-end sequencing were used to detect and correct scaffold misassemblies. Long insert size paired-end reads identified scaffold junctions and fusions missed by automated assembly methods. GBS markers were used to anchor scaffolds to pseudo-molecules with a new bioinformatics approach that avoids the tedious step of marker ordering during genetic map construction. Furthermore, a genome map was constructed and used to assemble scaffolds into super scaffolds. Finally, a consensus gene annotation was projected on the new assembly from two pre-existing annotations. This approach reduced the total Musa scaffold number from 7513 to 1532 (i.e. by 80%), with an N50 that increased from 1.3 Mb (65 scaffolds) to 3.0 Mb (26 scaffolds). 89.5% of the assembly was anchored to the 11 Musa chromosomes compared to the previous 70%. Unknown sites (N) were reduced from 17.3 to 10.0%. The release of the Musa acuminata reference genome version 2 provides a platform for detailed analysis of banana genome variation, function and evolution. Bioinformatics tools developed in this work can be used to improve genome sequence assemblies in other species.
Optimizing Automatic Deployment Using Non-functional Requirement Annotations

NASA Astrophysics Data System (ADS)

Kugele, Stefan; Haberl, Wolfgang; Tautschnig, Michael; Wechs, Martin

Model-driven development has become common practice in design of safety-critical real-time systems. High-level modeling constructs help to reduce the overall system complexity apparent to developers. This abstraction caters for fewer implementation errors in the resulting systems. In order to retain correctness of the model down to the software executed on a concrete platform, human faults during implementation must be avoided. This calls for an automatic, unattended deployment process including allocation, scheduling, and platform configuration.
Chest wall segmentation in automated 3D breast ultrasound scans.

PubMed

Tan, Tao; Platel, Bram; Mann, Ritse M; Huisman, Henkjan; Karssemeijer, Nico

2013-12-01

In this paper, we present an automatic method to segment the chest wall in automated 3D breast ultrasound images. Determining the location of the chest wall in automated 3D breast ultrasound images is necessary in computer-aided detection systems to remove automatically detected cancer candidates beyond the chest wall and it can be of great help for inter- and intra-modal image registration. We show that the visible part of the chest wall in an automated 3D breast ultrasound image can be accurately modeled by a cylinder. We fit the surface of our cylinder model to a set of automatically detected rib-surface points. The detection of the rib-surface points is done by a classifier using features representing local image intensity patterns and presence of rib shadows. Due to attenuation of the ultrasound signal, a clear shadow is visible behind the ribs. Evaluation of our segmentation method is done by computing the distance of manually annotated rib points to the surface of the automatically detected chest wall. We examined the performance on images obtained with the two most common 3D breast ultrasound devices in the market. In a dataset of 142 images, the average mean distance of the annotated points to the segmented chest wall was 5.59 ± 3.08 mm. Copyright © 2012 Elsevier B.V. All rights reserved.

Automatic 3D reconstruction of electrophysiology catheters from two-view monoplane C-arm image sequences.

PubMed

Baur, Christoph; Milletari, Fausto; Belagiannis, Vasileios; Navab, Nassir; Fallavollita, Pascal

2016-07-01

Catheter guidance is a vital task for the success of electrophysiology interventions. It is usually provided through fluoroscopic images that are taken intra-operatively. The cardiologists, who are typically equipped with C-arm systems, scan the patient from multiple views rotating the fluoroscope around one of its axes. The resulting sequences allow the cardiologists to build a mental model of the 3D position of the catheters and interest points from the multiple views. We describe and compare different 3D catheter reconstruction strategies and ultimately propose a novel and robust method for the automatic reconstruction of 3D catheters in non-synchronized fluoroscopic sequences. This approach does not purely rely on triangulation but incorporates prior knowledge about the catheters. In conjunction with an automatic detection method, we demonstrate the performance of our method compared to ground truth annotations. In our experiments that include 20 biplane datasets, we achieve an average reprojection error of 0.43 mm and an average reconstruction error of 0.67 mm compared to gold standard annotation. In clinical practice, catheters suffer from complex motion due to the combined effect of heartbeat and respiratory motion. As a result, any 3D reconstruction algorithm via triangulation is imprecise. We have proposed a new method that is fully automatic and highly accurate to reconstruct catheters in three dimensions.
A Case Study of Using a Social Annotation Tool to Support Collaboratively Learning

ERIC Educational Resources Information Center

Gao, Fei

2013-01-01

The purpose of the study was to understand student interaction and learning supported by a collaboratively social annotation tool--Diigo. The researcher examined through a case study how students participated and interacted when learning an online text with the social annotation tool--Diigo, and how they perceived their experience. The findings…
Towards the automated analysis and database development of defibrillator data from cardiac arrest.

PubMed

Eftestøl, Trygve; Sherman, Lawrence D

2014-01-01

During resuscitation of cardiac arrest victims a variety of information in electronic format is recorded as part of the documentation of the patient care contact and in order to be provided for case review for quality improvement. Such review requires considerable effort and resources. There is also the problem of interobserver effects. We show that it is possible to efficiently analyze resuscitation episodes automatically using a minimal set of the available information. A minimal set of variables is defined which describe therapeutic events (compression sequences and defibrillations) and corresponding patient response events (annotated rhythm transitions). From this a state sequence representation of the resuscitation episode is constructed and an algorithm is developed for reasoning with this representation and extract review variables automatically. As a case study, the method is applied to the data abstraction process used in the King County EMS. The automatically generated variables are compared to the original ones with accuracies ≥ 90% for 18 variables and ≥ 85% for the remaining four variables. It is possible to use the information present in the CPR process data recorded by the AED along with rhythm and chest compression annotations to automate the episode review.
Optimizing the 3D-reconstruction technique for serial block-face scanning electron microscopy.

PubMed

Wernitznig, Stefan; Sele, Mariella; Urschler, Martin; Zankel, Armin; Pölt, Peter; Rind, F Claire; Leitinger, Gerd

2016-05-01

Elucidating the anatomy of neuronal circuits and localizing the synaptic connections between neurons, can give us important insights in how the neuronal circuits work. We are using serial block-face scanning electron microscopy (SBEM) to investigate the anatomy of a collision detection circuit including the Lobula Giant Movement Detector (LGMD) neuron in the locust, Locusta migratoria. For this, thousands of serial electron micrographs are produced that allow us to trace the neuronal branching pattern. The reconstruction of neurons was previously done manually by drawing cell outlines of each cell in each image separately. This approach was very time consuming and troublesome. To make the process more efficient a new interactive software was developed. It uses the contrast between the neuron under investigation and its surrounding for semi-automatic segmentation. For segmentation the user sets starting regions manually and the algorithm automatically selects a volume within the neuron until the edges corresponding to the neuronal outline are reached. Internally the algorithm optimizes a 3D active contour segmentation model formulated as a cost function taking the SEM image edges into account. This reduced the reconstruction time, while staying close to the manual reference segmentation result. Our algorithm is easy to use for a fast segmentation process, unlike previous methods it does not require image training nor an extended computing capacity. Our semi-automatic segmentation algorithm led to a dramatic reduction in processing time for the 3D-reconstruction of identified neurons. Copyright © 2016 Elsevier B.V. All rights reserved.
Integrating systems biology models and biomedical ontologies

PubMed Central

2011-01-01

Background Systems biology is an approach to biology that emphasizes the structure and dynamic behavior of biological systems and the interactions that occur within them. To succeed, systems biology crucially depends on the accessibility and integration of data across domains and levels of granularity. Biomedical ontologies were developed to facilitate such an integration of data and are often used to annotate biosimulation models in systems biology. Results We provide a framework to integrate representations of in silico systems biology with those of in vivo biology as described by biomedical ontologies and demonstrate this framework using the Systems Biology Markup Language. We developed the SBML Harvester software that automatically converts annotated SBML models into OWL and we apply our software to those biosimulation models that are contained in the BioModels Database. We utilize the resulting knowledge base for complex biological queries that can bridge levels of granularity, verify models based on the biological phenomenon they represent and provide a means to establish a basic qualitative layer on which to express the semantics of biosimulation models. Conclusions We establish an information flow between biomedical ontologies and biosimulation models and we demonstrate that the integration of annotated biosimulation models and biomedical ontologies enables the verification of models as well as expressive queries. Establishing a bi-directional information flow between systems biology and biomedical ontologies has the potential to enable large-scale analyses of biological systems that span levels of granularity from molecules to organisms. PMID:21835028
Annotation of phenotypic diversity: decoupling data curation and ontology curation using Phenex.

PubMed

Balhoff, James P; Dahdul, Wasila M; Dececchi, T Alexander; Lapp, Hilmar; Mabee, Paula M; Vision, Todd J

2014-01-01

Phenex (http://phenex.phenoscape.org/) is a desktop application for semantically annotating the phenotypic character matrix datasets common in evolutionary biology. Since its initial publication, we have added new features that address several major bottlenecks in the efficiency of the phenotype curation process: allowing curators during the data curation phase to provisionally request terms that are not yet available from a relevant ontology; supporting quality control against annotation guidelines to reduce later manual review and revision; and enabling the sharing of files for collaboration among curators. We decoupled data annotation from ontology development by creating an Ontology Request Broker (ORB) within Phenex. Curators can use the ORB to request a provisional term for use in data annotation; the provisional term can be automatically replaced with a permanent identifier once the term is added to an ontology. We added a set of annotation consistency checks to prevent common curation errors, reducing the need for later correction. We facilitated collaborative editing by improving the reliability of Phenex when used with online folder sharing services, via file change monitoring and continual autosave. With the addition of these new features, and in particular the Ontology Request Broker, Phenex users have been able to focus more effectively on data annotation. Phenoscape curators using Phenex have reported a smoother annotation workflow, with much reduced interruptions from ontology maintenance and file management issues.
A computational platform to maintain and migrate manual functional annotations for BioCyc databases.

PubMed

Walsh, Jesse R; Sen, Taner Z; Dickerson, Julie A

2014-10-12

BioCyc databases are an important resource for information on biological pathways and genomic data. Such databases represent the accumulation of biological data, some of which has been manually curated from literature. An essential feature of these databases is the continuing data integration as new knowledge is discovered. As functional annotations are improved, scalable methods are needed for curators to manage annotations without detailed knowledge of the specific design of the BioCyc database. We have developed CycTools, a software tool which allows curators to maintain functional annotations in a model organism database. This tool builds on existing software to improve and simplify annotation data imports of user provided data into BioCyc databases. Additionally, CycTools automatically resolves synonyms and alternate identifiers contained within the database into the appropriate internal identifiers. Automating steps in the manual data entry process can improve curation efforts for major biological databases. The functionality of CycTools is demonstrated by transferring GO term annotations from MaizeCyc to matching proteins in CornCyc, both maize metabolic pathway databases available at MaizeGDB, and by creating strain specific databases for metabolic engineering.
Annotations of Mexican bullfighting videos for semantic index

NASA Astrophysics Data System (ADS)

Montoya Obeso, Abraham; Oropesa Morales, Lester Arturo; Fernando Vázquez, Luis; Cocolán Almeda, Sara Ivonne; Stoian, Andrei; García Vázquez, Mireya Saraí; Zamudio Fuentes, Luis Miguel; Montiel Perez, Jesús Yalja; de la O Torres, Saul; Ramírez Acosta, Alejandro Alvaro

2015-09-01

The video annotation is important for web indexing and browsing systems. Indeed, in order to evaluate the performance of video query and mining techniques, databases with concept annotations are required. Therefore, it is necessary generate a database with a semantic indexing that represents the digital content of the Mexican bullfighting atmosphere. This paper proposes a scheme to make complex annotations in a video in the frame of multimedia search engine project. Each video is partitioned using our segmentation algorithm that creates shots of different length and different number of frames. In order to make complex annotations about the video, we use ELAN software. The annotations are done in two steps: First, we take note about the whole content in each shot. Second, we describe the actions as parameters of the camera like direction, position and deepness. As a consequence, we obtain a more complete descriptor of every action. In both cases we use the concepts of the TRECVid 2014 dataset. We also propose new concepts. This methodology allows to generate a database with the necessary information to create descriptors and algorithms capable to detect actions to automatically index and classify new bullfighting multimedia content.
A web-based video annotation system for crowdsourcing surveillance videos

NASA Astrophysics Data System (ADS)

Gadgil, Neeraj J.; Tahboub, Khalid; Kirsh, David; Delp, Edward J.

2014-03-01

Video surveillance systems are of a great value to prevent threats and identify/investigate criminal activities. Manual analysis of a huge amount of video data from several cameras over a long period of time often becomes impracticable. The use of automatic detection methods can be challenging when the video contains many objects with complex motion and occlusions. Crowdsourcing has been proposed as an effective method for utilizing human intelligence to perform several tasks. Our system provides a platform for the annotation of surveillance video in an organized and controlled way. One can monitor a surveillance system using a set of tools such as training modules, roles and labels, task management. This system can be used in a real-time streaming mode to detect any potential threats or as an investigative tool to analyze past events. Annotators can annotate video contents assigned to them for suspicious activity or criminal acts. First responders are then able to view the collective annotations and receive email alerts about a newly reported incident. They can also keep track of the annotators' training performance, manage their activities and reward their success. By providing this system, the process of video analysis is made more efficient.
A dorsolateral prefrontal cortex semi-automatic segmenter

NASA Astrophysics Data System (ADS)

Al-Hakim, Ramsey; Fallon, James; Nain, Delphine; Melonakos, John; Tannenbaum, Allen

2006-03-01

Structural, functional, and clinical studies in schizophrenia have, for several decades, consistently implicated dysfunction of the prefrontal cortex in the etiology of the disease. Functional and structural imaging studies, combined with clinical, psychometric, and genetic analyses in schizophrenia have confirmed the key roles played by the prefrontal cortex and closely linked "prefrontal system" structures such as the striatum, amygdala, mediodorsal thalamus, substantia nigra-ventral tegmental area, and anterior cingulate cortices. The nodal structure of the prefrontal system circuit is the dorsal lateral prefrontal cortex (DLPFC), or Brodmann area 46, which also appears to be the most commonly studied and cited brain area with respect to schizophrenia. 1, 2, 3, 4 In 1986, Weinberger et. al. tied cerebral blood flow in the DLPFC to schizophrenia.1 In 2001, Perlstein et. al. demonstrated that DLPFC activation is essential for working memory tasks commonly deficient in schizophrenia. 2 More recently, groups have linked morphological changes due to gene deletion and increased DLPFC glutamate concentration to schizophrenia. 3, 4 Despite the experimental and clinical focus on the DLPFC in structural and functional imaging, the variability of the location of this area, differences in opinion on exactly what constitutes DLPFC, and inherent difficulties in segmenting this highly convoluted cortical region have contributed to a lack of widely used standards for manual or semi-automated segmentation programs. Given these implications, we developed a semi-automatic tool to segment the DLPFC from brain MRI scans in a reproducible way to conduct further morphological and statistical studies. The segmenter is based on expert neuroanatomist rules (Fallon-Kindermann rules), inspired by cytoarchitectonic data and reconstructions presented by Rajkowska and Goldman-Rakic. 5 It is semi-automated to provide essential user interactivity. We present our results and provide details on our DLPFC open-source tool.
Alignment-Annotator web server: rendering and annotating sequence alignments.

PubMed

Gille, Christoph; Fähling, Michael; Weyand, Birgit; Wieland, Thomas; Gille, Andreas

2014-07-01

Alignment-Annotator is a novel web service designed to generate interactive views of annotated nucleotide and amino acid sequence alignments (i) de novo and (ii) embedded in other software. All computations are performed at server side. Interactivity is implemented in HTML5, a language native to web browsers. The alignment is initially displayed using default settings and can be modified with the graphical user interfaces. For example, individual sequences can be reordered or deleted using drag and drop, amino acid color code schemes can be applied and annotations can be added. Annotations can be made manually or imported (BioDAS servers, the UniProt, the Catalytic Site Atlas and the PDB). Some edits take immediate effect while others require server interaction and may take a few seconds to execute. The final alignment document can be downloaded as a zip-archive containing the HTML files. Because of the use of HTML the resulting interactive alignment can be viewed on any platform including Windows, Mac OS X, Linux, Android and iOS in any standard web browser. Importantly, no plugins nor Java are required and therefore Alignment-Anotator represents the first interactive browser-based alignment visualization. http://www.bioinformatics.org/strap/aa/ and http://strap.charite.de/aa/. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Alignment-Annotator web server: rendering and annotating sequence alignments

PubMed Central

Gille, Christoph; Fähling, Michael; Weyand, Birgit; Wieland, Thomas; Gille, Andreas

2014-01-01

Alignment-Annotator is a novel web service designed to generate interactive views of annotated nucleotide and amino acid sequence alignments (i) de novo and (ii) embedded in other software. All computations are performed at server side. Interactivity is implemented in HTML5, a language native to web browsers. The alignment is initially displayed using default settings and can be modified with the graphical user interfaces. For example, individual sequences can be reordered or deleted using drag and drop, amino acid color code schemes can be applied and annotations can be added. Annotations can be made manually or imported (BioDAS servers, the UniProt, the Catalytic Site Atlas and the PDB). Some edits take immediate effect while others require server interaction and may take a few seconds to execute. The final alignment document can be downloaded as a zip-archive containing the HTML files. Because of the use of HTML the resulting interactive alignment can be viewed on any platform including Windows, Mac OS X, Linux, Android and iOS in any standard web browser. Importantly, no plugins nor Java are required and therefore Alignment-Anotator represents the first interactive browser-based alignment visualization. Availability: http://www.bioinformatics.org/strap/aa/ and http://strap.charite.de/aa/. PMID:24813445
Automated computer-based detection of encounter behaviours in groups of honeybees.

PubMed

Blut, Christina; Crespi, Alessandro; Mersch, Danielle; Keller, Laurent; Zhao, Linlin; Kollmann, Markus; Schellscheidt, Benjamin; Fülber, Carsten; Beye, Martin

2017-12-15

Honeybees form societies in which thousands of members integrate their behaviours to act as a single functional unit. We have little knowledge on how the collaborative features are regulated by workers' activities because we lack methods that enable collection of simultaneous and continuous behavioural information for each worker bee. In this study, we introduce the Bee Behavioral Annotation System (BBAS), which enables the automated detection of bees' behaviours in small observation hives. Continuous information on position and orientation were obtained by marking worker bees with 2D barcodes in a small observation hive. We computed behavioural and social features from the tracking information to train a behaviour classifier for encounter behaviours (interaction of workers via antennation) using a machine learning-based system. The classifier correctly detected 93% of the encounter behaviours in a group of bees, whereas 13% of the falsely classified behaviours were unrelated to encounter behaviours. The possibility of building accurate classifiers for automatically annotating behaviours may allow for the examination of individual behaviours of worker bees in the social environments of small observation hives. We envisage that BBAS will be a powerful tool for detecting the effects of experimental manipulation of social attributes and sub-lethal effects of pesticides on behaviour.
Mapping PDB chains to UniProtKB entries.

PubMed

Martin, Andrew C R

2005-12-01

UniProtKB/SwissProt is the main resource for detailed annotations of protein sequences. This database provides a jumping-off point to many other resources through the links it provides. Among others, these include other primary databases, secondary databases, the Gene Ontology and OMIM. While a large number of links are provided to Protein Data Bank (PDB) files, obtaining a regularly updated mapping between UniProtKB entries and PDB entries at the chain or residue level is not straightforward. In particular, there is no regularly updated resource which allows a UniProtKB/SwissProt entry to be identified for a given residue of a PDB file. We have created a completely automatically maintained database which maps PDB residues to residues in UniProtKB/SwissProt and UniProtKB/trEMBL entries. The protocol uses links from PDB to UniProtKB, from UniProtKB to PDB and a brute-force sequence scan to resolve PDB chains for which no annotated link is available. Finally the sequences from PDB and UniProtKB are aligned to obtain a residue-level mapping. The resource may be queried interactively or downloaded from http://www.bioinf.org.uk/pdbsws/.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Hamdani, Hazrina Yusof, E-mail: hazrina@mfrlab.org; Advanced Medical and Dental Institute, Universiti Sains Malaysia, Bertam, Kepala Batas; Artymiuk, Peter J., E-mail: p.artymiuk@sheffield.ac.uk

A fundamental understanding of the atomic level interactions in ribonucleic acid (RNA) and how they contribute towards RNA architecture is an important knowledge platform to develop through the discovery of motifs from simple arrangements base pairs, to more complex arrangements such as triples and larger patterns involving non-standard interactions. The network of hydrogen bond interactions is important in connecting bases to form potential tertiary motifs. Therefore, there is an urgent need for the development of automated methods for annotating RNA 3D structures based on hydrogen bond interactions. COnnection tables Graphs for Nucleic ACids (COGNAC) is automated annotation system using graphmore » theoretical approaches that has been developed for the identification of RNA 3D motifs. This program searches for patterns in the unbroken networks of hydrogen bonds for RNA structures and capable of annotating base pairs and higher-order base interactions, which ranges from triples to sextuples. COGNAC was able to discover 22 out of 32 quadruples occurrences of the Haloarcula marismortui large ribosomal subunit (PDB ID: 1FFK) and two out of three occurrences of quintuple interaction reported by the non-canonical interactions in RNA (NCIR) database. These and several other interactions of interest will be discussed in this paper. These examples demonstrate that the COGNAC program can serve as an automated annotation system that can be used to annotate conserved base-base interactions and could be added as additional information to established RNA secondary structure prediction methods.« less
Marky: a tool supporting annotation consistency in multi-user and iterative document annotation projects.

PubMed

Pérez-Pérez, Martín; Glez-Peña, Daniel; Fdez-Riverola, Florentino; Lourenço, Anália

2015-02-01

Document annotation is a key task in the development of Text Mining methods and applications. High quality annotated corpora are invaluable, but their preparation requires a considerable amount of resources and time. Although the existing annotation tools offer good user interaction interfaces to domain experts, project management and quality control abilities are still limited. Therefore, the current work introduces Marky, a new Web-based document annotation tool equipped to manage multi-user and iterative projects, and to evaluate annotation quality throughout the project life cycle. At the core, Marky is a Web application based on the open source CakePHP framework. User interface relies on HTML5 and CSS3 technologies. Rangy library assists in browser-independent implementation of common DOM range and selection tasks, and Ajax and JQuery technologies are used to enhance user-system interaction. Marky grants solid management of inter- and intra-annotator work. Most notably, its annotation tracking system supports systematic and on-demand agreement analysis and annotation amendment. Each annotator may work over documents as usual, but all the annotations made are saved by the tracking system and may be further compared. So, the project administrator is able to evaluate annotation consistency among annotators and across rounds of annotation, while annotators are able to reject or amend subsets of annotations made in previous rounds. As a side effect, the tracking system minimises resource and time consumption. Marky is a novel environment for managing multi-user and iterative document annotation projects. Compared to other tools, Marky offers a similar visually intuitive annotation experience while providing unique means to minimise annotation effort and enforce annotation quality, and therefore corpus consistency. Marky is freely available for non-commercial use at http://sing.ei.uvigo.es/marky. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Initial Experience With Ultra High-Density Mapping of Human Right Atria.

PubMed

Bollmann, Andreas; Hilbert, Sebastian; John, Silke; Kosiuk, Jedrzej; Hindricks, Gerhard

2016-02-01

Recently, an automatic, high-resolution mapping system has been presented to accurately and quickly identify right atrial geometry and activation patterns in animals, but human data are lacking. This study aims to assess the clinical feasibility and accuracy of high-density electroanatomical mapping of various RA arrhythmias. Electroanatomical maps of the RA (35 partial and 24 complete) were created in 23 patients using a novel mini-basket catheter with 64 electrodes and automatic electrogram annotation. Median acquisition time was 6:43 minutes (0:39-23:05 minutes) with shorter times for partial (4.03 ± 4.13 minutes) than for complete maps (9.41 ± 4.92 minutes). During mapping 3,236 (710-16,306) data points were automatically annotated without manual correction. Maps obtained during sinus rhythm created geometry consistent with CT imaging and demonstrated activation originating at the middle to superior crista terminalis, while maps during CS pacing showed right atrial activation beginning at the infero-septal region. Activation patterns were consistent with cavotricuspid isthmus-dependent atrial flutter (n = 4), complex reentry tachycardia (n = 1), or ectopic atrial tachycardia (n = 2). His bundle and fractionated potentials in the slow pathway region were automatically detected in all patients. Ablation of the cavotricuspid isthmus (n = 9), the atrio-ventricular node (n = 2), atrial ectopy (n = 2), and the slow pathway (n = 3) was successfully and safely performed. RA mapping with this automatic high-density mapping system is fast, feasible, and safe. It is possible to reproducibly identify propagation of atrial activation during sinus rhythm, various tachycardias, and also complex reentrant arrhythmias. © 2015 Wiley Periodicals, Inc.
Assistive Technology as an artificial intelligence opportunity: Case study of letter-based, head movement driven communication.

PubMed

Miksztai-Réthey, Brigitta; Faragó, Kinga Bettina

2015-01-01

We studied an artificial intelligent assisted interaction between a computer and a human with severe speech and physical impairments (SSPI). In order to speed up AAC, we extended a former study of typing performance optimization using a framework that included head movement controlled assistive technology and an onscreen writing device. Quantitative and qualitative data were collected and analysed with mathematical methods, manual interpretation and semi-supervised machine video annotation. As the result of our research, in contrast to the former experiment's conclusions, we found that our participant had at least two different typing strategies. To maximize his communication efficiency, a more complex assistive tool is suggested, which takes the different methods into consideration.
SNAD: Sequence Name Annotation-based Designer.

PubMed

Sidorov, Igor A; Reshetov, Denis A; Gorbalenya, Alexander E

2009-08-14

A growing diversity of biological data is tagged with unique identifiers (UIDs) associated with polynucleotides and proteins to ensure efficient computer-mediated data storage, maintenance, and processing. These identifiers, which are not informative for most people, are often substituted by biologically meaningful names in various presentations to facilitate utilization and dissemination of sequence-based knowledge. This substitution is commonly done manually that may be a tedious exercise prone to mistakes and omissions. Here we introduce SNAD (Sequence Name Annotation-based Designer) that mediates automatic conversion of sequence UIDs (associated with multiple alignment or phylogenetic tree, or supplied as plain text list) into biologically meaningful names and acronyms. This conversion is directed by precompiled or user-defined templates that exploit wealth of annotation available in cognate entries of external databases. Using examples, we demonstrate how this tool can be used to generate names for practical purposes, particularly in virology. A tool for controllable annotation-based conversion of sequence UIDs into biologically meaningful names and acronyms has been developed and placed into service, fostering links between quality of sequence annotation, and efficiency of communication and knowledge dissemination among researchers.
Automated Gene Ontology annotation for anonymous sequence data.

PubMed

Hennig, Steffen; Groth, Detlef; Lehrach, Hans

2003-07-01

Gene Ontology (GO) is the most widely accepted attempt to construct a unified and structured vocabulary for the description of genes and their products in any organism. Annotation by GO terms is performed in most of the current genome projects, which besides generality has the advantage of being very convenient for computer based classification methods. However, direct use of GO in small sequencing projects is not easy, especially for species not commonly represented in public databases. We present a software package (GOblet), which performs annotation based on GO terms for anonymous cDNA or protein sequences. It uses the species independent GO structure and vocabulary together with a series of protein databases collected from various sites, to perform a detailed GO annotation by sequence similarity searches. The sensitivity and the reference protein sets can be selected by the user. GOblet runs automatically and is available as a public service on our web server. The paper also addresses the reliability of automated GO annotations by using a reference set of more than 6000 human proteins. The GOblet server is accessible at http://goblet.molgen.mpg.de.

EuCAP, a Eukaryotic Community Annotation Package, and its application to the rice genome

PubMed Central

Thibaud-Nissen, Françoise; Campbell, Matthew; Hamilton, John P; Zhu, Wei; Buell, C Robin

2007-01-01

Background Despite the improvements of tools for automated annotation of genome sequences, manual curation at the structural and functional level can provide an increased level of refinement to genome annotation. The Institute for Genomic Research Rice Genome Annotation (hereafter named the Osa1 Genome Annotation) is the product of an automated pipeline and, for this reason, will benefit from the input of biologists with expertise in rice and/or particular gene families. Leveraging knowledge from a dispersed community of scientists is a demonstrated way of improving a genome annotation. This requires tools that facilitate 1) the submission of gene annotation to an annotation project, 2) the review of the submitted models by project annotators, and 3) the incorporation of the submitted models in the ongoing annotation effort. Results We have developed the Eukaryotic Community Annotation Package (EuCAP), an annotation tool, and have applied it to the rice genome. The primary level of curation by community annotators (CA) has been the annotation of gene families. Annotation can be submitted by email or through the EuCAP Web Tool. The CA models are aligned to the rice pseudomolecules and the coordinates of these alignments, along with functional annotation, are stored in the MySQL EuCAP Gene Model database. Web pages displaying the alignments of the CA models to the Osa1 Genome models are automatically generated from the EuCAP Gene Model database. The alignments are reviewed by the project annotators (PAs) in the context of experimental evidence. Upon approval by the PAs, the CA models, along with the corresponding functional annotations, are integrated into the Osa1 Genome Annotation. The CA annotations, grouped by family, are displayed on the Community Annotation pages of the project website , as well as in the Community Annotation track of the Genome Browser. Conclusion We have applied EuCAP to rice. As of July 2007, the structural and/or functional annotation of 1,094 genes representing 57 families have been deposited and integrated into the current gene set. All of the EuCAP components are open-source, thereby allowing the implementation of EuCAP for the annotation of other genomes. EuCAP is available at . PMID:17961238
Semi-automatic volume measurement for orbital fat and total extraocular muscles based on Cube FSE-flex sequence in patients with thyroid-associated ophthalmopathy.

PubMed

Tang, X; Liu, H; Chen, L; Wang, Q; Luo, B; Xiang, N; He, Y; Zhu, W; Zhang, J

2018-05-24

To investigate the accuracy of two semi-automatic segmentation measurements based on magnetic resonance imaging (MRI) three-dimensional (3D) Cube fast spin echo (FSE)-flex sequence in phantoms, and to evaluate the feasibility of determining the volumetric alterations of orbital fat (OF) and total extraocular muscles (TEM) in patients with thyroid-associated ophthalmopathy (TAO) by semi-automatic segmentation. Forty-four fatty (n=22) and lean (n=22) phantoms were scanned by using Cube FSE-flex sequence with a 3 T MRI system. Their volumes were measured by manual segmentation (MS) and two semi-automatic segmentation algorithms (regional growing [RG], multi-dimensional threshold [MDT]). Pearson correlation and Bland-Altman analysis were used to evaluate the measuring accuracy of MS, RG, and MDT in phantoms as compared with the true volume. Then, OF and TEM volumes of 15 TAO patients and 15 normal controls were measured using MDT. Paired-sample t-tests were used to compare the volumes and volume ratios of different orbital tissues between TAO patients and controls. Each segmentation (MS RG, MDT) has a significant correlation (p<0.01) with true volume. There was a minimal bias for MS, and a stronger agreement between MDT and the true volume than RG and the true volume both in fatty and lean phantoms. The reproducibility of Cube FSE-flex determined MDT was adequate. The volumetric ratios of OF/globe (p<0.01), TEM/globe (p<0.01), whole orbit/globe (p<0.01) and bone orbit/globe (p<0.01) were significantly greater in TAO patients than those in healthy controls. MRI Cube FSE-flex determined MDT is a relatively accurate semi-automatic segmentation that can be used to evaluate OF and TEM volumes in clinic. Copyright © 2018 The Royal College of Radiologists. Published by Elsevier Ltd. All rights reserved.
DMET-analyzer: automatic analysis of Affymetrix DMET data.

PubMed

Guzzi, Pietro Hiram; Agapito, Giuseppe; Di Martino, Maria Teresa; Arbitrio, Mariamena; Tassone, Pierfrancesco; Tagliaferri, Pierosandro; Cannataro, Mario

2012-10-05

Clinical Bioinformatics is currently growing and is based on the integration of clinical and omics data aiming at the development of personalized medicine. Thus the introduction of novel technologies able to investigate the relationship among clinical states and biological machineries may help the development of this field. For instance the Affymetrix DMET platform (drug metabolism enzymes and transporters) is able to study the relationship among the variation of the genome of patients and drug metabolism, detecting SNPs (Single Nucleotide Polymorphism) on genes related to drug metabolism. This may allow for instance to find genetic variants in patients which present different drug responses, in pharmacogenomics and clinical studies. Despite this, there is currently a lack in the development of open-source algorithms and tools for the analysis of DMET data. Existing software tools for DMET data generally allow only the preprocessing of binary data (e.g. the DMET-Console provided by Affymetrix) and simple data analysis operations, but do not allow to test the association of the presence of SNPs with the response to drugs. We developed DMET-Analyzer a tool for the automatic association analysis among the variation of the patient genomes and the clinical conditions of patients, i.e. the different response to drugs. The proposed system allows: (i) to automatize the workflow of analysis of DMET-SNP data avoiding the use of multiple tools; (ii) the automatic annotation of DMET-SNP data and the search in existing databases of SNPs (e.g. dbSNP), (iii) the association of SNP with pathway through the search in PharmaGKB, a major knowledge base for pharmacogenomic studies. DMET-Analyzer has a simple graphical user interface that allows users (doctors/biologists) to upload and analyse DMET files produced by Affymetrix DMET-Console in an interactive way. The effectiveness and easy use of DMET Analyzer is demonstrated through different case studies regarding the analysis of clinical datasets produced in the University Hospital of Catanzaro, Italy. DMET Analyzer is a novel tool able to automatically analyse data produced by the DMET-platform in case-control association studies. Using such tool user may avoid wasting time in the manual execution of multiple statistical tests avoiding possible errors and reducing the amount of time needed for a whole experiment. Moreover annotations and the direct link to external databases may increase the biological knowledge extracted. The system is freely available for academic purposes at: https://sourceforge.net/projects/dmetanalyzer/files/
3D reconstruction of highly fragmented bone fractures

NASA Astrophysics Data System (ADS)

Willis, Andrew; Anderson, Donald; Thomas, Thad; Brown, Thomas; Marsh, J. Lawrence

2007-03-01

A system for the semi-automatic reconstruction of highly fragmented bone fractures, developed to aid in treatment planning, is presented. The system aligns bone fragment surfaces derived from segmentation of volumetric CT scan data. Each fragment surface is partitioned into intact- and fracture-surfaces, corresponding more or less to cortical and cancellous bone, respectively. A user then interactively selects fracture-surface patches in pairs that coarsely correspond. A final optimization step is performed automatically to solve the N-body rigid alignment problem. The work represents the first example of a 3D bone fracture reconstruction system and addresses two new problems unique to the reconstruction of fractured bones: (1) non-stationary noise inherent in surfaces generated from a difficult segmentation problem and (2) the possibility that a single fracture surface on a fragment may correspond to many other fragments.
Automatic Tool for Local Assembly Structures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Whole community shotgun sequencing of total DNA (i.e. metagenomics) and total RNA (i.e. metatranscriptomics) has provided a wealth of information in the microbial community structure, predicted functions, metabolic networks, and is even able to reconstruct complete genomes directly. Here we present ATLAS (Automatic Tool for Local Assembly Structures) a comprehensive pipeline for assembly, annotation, genomic binning of metagenomic and metatranscriptomic data with an integrated framework for Multi-Omics. This will provide an open source tool for the Multi-Omic community at large.
The BEL information extraction workflow (BELIEF): evaluation in the BioCreative V BEL and IAT track

PubMed Central

Madan, Sumit; Hodapp, Sven; Senger, Philipp; Ansari, Sam; Szostak, Justyna; Hoeng, Julia; Peitsch, Manuel; Fluck, Juliane

2016-01-01

Network-based approaches have become extremely important in systems biology to achieve a better understanding of biological mechanisms. For network representation, the Biological Expression Language (BEL) is well designed to collate findings from the scientific literature into biological network models. To facilitate encoding and biocuration of such findings in BEL, a BEL Information Extraction Workflow (BELIEF) was developed. BELIEF provides a web-based curation interface, the BELIEF Dashboard, that incorporates text mining techniques to support the biocurator in the generation of BEL networks. The underlying UIMA-based text mining pipeline (BELIEF Pipeline) uses several named entity recognition processes and relationship extraction methods to detect concepts and BEL relationships in literature. The BELIEF Dashboard allows easy curation of the automatically generated BEL statements and their context annotations. Resulting BEL statements and their context annotations can be syntactically and semantically verified to ensure consistency in the BEL network. In summary, the workflow supports experts in different stages of systems biology network building. Based on the BioCreative V BEL track evaluation, we show that the BELIEF Pipeline automatically extracts relationships with an F-score of 36.4% and fully correct statements can be obtained with an F-score of 30.8%. Participation in the BioCreative V Interactive task (IAT) track with BELIEF revealed a systems usability scale (SUS) of 67. Considering the complexity of the task for new users—learning BEL, working with a completely new interface, and performing complex curation—a score so close to the overall SUS average highlights the usability of BELIEF. Database URL: BELIEF is available at http://www.scaiview.com/belief/ PMID:27694210
The BEL information extraction workflow (BELIEF): evaluation in the BioCreative V BEL and IAT track.

PubMed

Madan, Sumit; Hodapp, Sven; Senger, Philipp; Ansari, Sam; Szostak, Justyna; Hoeng, Julia; Peitsch, Manuel; Fluck, Juliane

2016-01-01

Network-based approaches have become extremely important in systems biology to achieve a better understanding of biological mechanisms. For network representation, the Biological Expression Language (BEL) is well designed to collate findings from the scientific literature into biological network models. To facilitate encoding and biocuration of such findings in BEL, a BEL Information Extraction Workflow (BELIEF) was developed. BELIEF provides a web-based curation interface, the BELIEF Dashboard, that incorporates text mining techniques to support the biocurator in the generation of BEL networks. The underlying UIMA-based text mining pipeline (BELIEF Pipeline) uses several named entity recognition processes and relationship extraction methods to detect concepts and BEL relationships in literature. The BELIEF Dashboard allows easy curation of the automatically generated BEL statements and their context annotations. Resulting BEL statements and their context annotations can be syntactically and semantically verified to ensure consistency in the BEL network. In summary, the workflow supports experts in different stages of systems biology network building. Based on the BioCreative V BEL track evaluation, we show that the BELIEF Pipeline automatically extracts relationships with an F-score of 36.4% and fully correct statements can be obtained with an F-score of 30.8%. Participation in the BioCreative V Interactive task (IAT) track with BELIEF revealed a systems usability scale (SUS) of 67. Considering the complexity of the task for new users-learning BEL, working with a completely new interface, and performing complex curation-a score so close to the overall SUS average highlights the usability of BELIEF.Database URL: BELIEF is available at http://www.scaiview.com/belief/. © The Author(s) 2016. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Workflow oriented software support for image guided radiofrequency ablation of focal liver malignancies

NASA Astrophysics Data System (ADS)

Weihusen, Andreas; Ritter, Felix; Kröger, Tim; Preusser, Tobias; Zidowitz, Stephan; Peitgen, Heinz-Otto

2007-03-01

Image guided radiofrequency (RF) ablation has taken a significant part in the clinical routine as a minimally invasive method for the treatment of focal liver malignancies. Medical imaging is used in all parts of the clinical workflow of an RF ablation, incorporating treatment planning, interventional targeting and result assessment. This paper describes a software application, which has been designed to support the RF ablation workflow under consideration of the requirements of clinical routine, such as easy user interaction and a high degree of robust and fast automatic procedures, in order to keep the physician from spending too much time at the computer. The application therefore provides a collection of specialized image processing and visualization methods for treatment planning and result assessment. The algorithms are adapted to CT as well as to MR imaging. The planning support contains semi-automatic methods for the segmentation of liver tumors and the surrounding vascular system as well as an interactive virtual positioning of RF applicators and a concluding numerical estimation of the achievable heat distribution. The assessment of the ablation result is supported by the segmentation of the coagulative necrosis and an interactive registration of pre- and post-interventional image data for the comparison of tumor and necrosis segmentation masks. An automatic quantification of surface distances is performed to verify the embedding of the tumor area into the thermal lesion area. The visualization methods support representations in the commonly used orthogonal 2D view as well as in 3D scenes.
PFAAT version 2.0: a tool for editing, annotating, and analyzing multiple sequence alignments.

PubMed

Caffrey, Daniel R; Dana, Paul H; Mathur, Vidhya; Ocano, Marco; Hong, Eun-Jong; Wang, Yaoyu E; Somaroo, Shyamal; Caffrey, Brian E; Potluri, Shobha; Huang, Enoch S

2007-10-11

By virtue of their shared ancestry, homologous sequences are similar in their structure and function. Consequently, multiple sequence alignments are routinely used to identify trends that relate to function. This type of analysis is particularly productive when it is combined with structural and phylogenetic analysis. Here we describe the release of PFAAT version 2.0, a tool for editing, analyzing, and annotating multiple sequence alignments. Support for multiple annotations is a key component of this release as it provides a framework for most of the new functionalities. The sequence annotations are accessible from the alignment and tree, where they are typically used to label sequences or hyperlink them to related databases. Sequence annotations can be created manually or extracted automatically from UniProt entries. Once a multiple sequence alignment is populated with sequence annotations, sequences can be easily selected and sorted through a sophisticated search dialog. The selected sequences can be further analyzed using statistical methods that explicitly model relationships between the sequence annotations and residue properties. Residue annotations are accessible from the alignment viewer and are typically used to designate binding sites or properties for a particular residue. Residue annotations are also searchable, and allow one to quickly select alignment columns for further sequence analysis, e.g. computing percent identities. Other features include: novel algorithms to compute sequence conservation, mapping conservation scores to a 3D structure in Jmol, displaying secondary structure elements, and sorting sequences by residue composition. PFAAT provides a framework whereby end-users can specify knowledge for a protein family in the form of annotation. The annotations can be combined with sophisticated analysis to test hypothesis that relate to sequence, structure and function.
VESPA: Software to Facilitate Genomic Annotation of Prokaryotic Organisms Through Integration of Proteomic and Transcriptomic Data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Peterson, Elena S.; McCue, Lee Ann; Rutledge, Alexandra C.

2012-04-25

Visual Exploration and Statistics to Promote Annotation (VESPA) is an interactive visual analysis software tool that facilitates the discovery of structural mis-annotations in prokaryotic genomes. VESPA integrates high-throughput peptide-centric proteomics data and oligo-centric or RNA-Seq transcriptomics data into a genomic context. The data may be interrogated via visual analysis across multiple levels of genomic resolution, linked searches, exports and interaction with BLAST to rapidly identify location of interest within the genome and evaluate potential mis-annotations.
Clusters of orthologous genes for 41 archaeal genomes and implications for evolutionary genomics of archaea.

PubMed

Makarova, Kira S; Sorokin, Alexander V; Novichkov, Pavel S; Wolf, Yuri I; Koonin, Eugene V

2007-11-27

An evolutionary classification of genes from sequenced genomes that distinguishes between orthologs and paralogs is indispensable for genome annotation and evolutionary reconstruction. Shortly after multiple genome sequences of bacteria, archaea, and unicellular eukaryotes became available, an attempt on such a classification was implemented in Clusters of Orthologous Groups of proteins (COGs). Rapid accumulation of genome sequences creates opportunities for refining COGs but also represents a challenge because of error amplification. One of the practical strategies involves construction of refined COGs for phylogenetically compact subsets of genomes. New Archaeal Clusters of Orthologous Genes (arCOGs) were constructed for 41 archaeal genomes (13 Crenarchaeota, 27 Euryarchaeota and one Nanoarchaeon) using an improved procedure that employs a similarity tree between smaller, group-specific clusters, semi-automatically partitions orthology domains in multidomain proteins, and uses profile searches for identification of remote orthologs. The annotation of arCOGs is a consensus between three assignments based on the COGs, the CDD database, and the annotations of homologs in the NR database. The 7538 arCOGs, on average, cover approximately 88% of the genes in a genome compared to a approximately 76% coverage in COGs. The finer granularity of ortholog identification in the arCOGs is apparent from the fact that 4538 arCOGs correspond to 2362 COGs; approximately 40% of the arCOGs are new. The archaeal gene core (protein-coding genes found in all 41 genome) consists of 166 arCOGs. The arCOGs were used to reconstruct gene loss and gene gain events during archaeal evolution and gene sets of ancestral forms. The Last Archaeal Common Ancestor (LACA) is conservatively estimated to possess 996 genes compared to 1245 and 1335 genes for the last common ancestors of Crenarchaeota and Euryarchaeota, respectively. It is inferred that LACA was a chemoautotrophic hyperthermophile that, in addition to the core archaeal functions, encoded more idiosyncratic systems, e.g., the CASS systems of antivirus defense and some toxin-antitoxin systems. The arCOGs provide a convenient, flexible framework for functional annotation of archaeal genomes, comparative genomics and evolutionary reconstructions. Genomic reconstructions suggest that the last common ancestor of archaea might have been (nearly) as advanced as the modern archaeal hyperthermophiles. ArCOGs and related information are available at: ftp://ftp.ncbi.nih.gov/pub/koonin/arCOGs/.
A Benchmark of Vehicle Maintenance Training Between the U.S. Air Force and a Civilian Industry Leader

DTIC Science & Technology

1992-09-01

Aas :nosen to 113entlfy tasKs oerformed Dv reczcnizeo :omoe:ent automotive serv’ce Personnel :intry level o"ersonnei 4ere iot , ,ic udec i n tie sirve...Diagnose the cause of poor, intermittent, or no electric door and hatch/trunk lock operation. 10. Repair or replace switches, relays, actuators ...Semi-4utomative Temoerature Controls i. Cnecx ooeration of automatic ana semi-automatic neating, HP ventalation ana air-conaitioning ( HVAC ) control
Implementation of a microcontroller-based semi-automatic coagulator.

PubMed

Chan, K; Kirumira, A; Elkateeb, A

2001-01-01

The coagulator is an instrument used in hospitals to detect clot formation as a function of time. Generally, these coagulators are very expensive and therefore not affordable by a doctors' office and small clinics. The objective of this project is to design and implement a low cost semi-automatic coagulator (SAC) prototype. The SAC is capable of assaying up to 12 samples and can perform the following tests: prothrombin time (PT), activated partial thromboplastin time (APTT), and PT/APTT combination. The prototype has been tested successfully.
Semi-automatic version of the potentiometric titration method for characterization of uranium compounds.

PubMed

Cristiano, Bárbara F G; Delgado, José Ubiratan; da Silva, José Wanderley S; de Barros, Pedro D; de Araújo, Radier M S; Dias, Fábio C; Lopes, Ricardo T

2012-09-01

The potentiometric titration method was used for characterization of uranium compounds to be applied in intercomparison programs. The method is applied with traceability assured using a potassium dichromate primary standard. A semi-automatic version was developed to reduce the analysis time and the operator variation. The standard uncertainty in determining the total concentration of uranium was around 0.01%, which is suitable for uranium characterization and compatible with those obtained by manual techniques. Copyright © 2012 Elsevier Ltd. All rights reserved.
Social Media Mining for Toxicovigilance: Automatic Monitoring of Prescription Medication Abuse from Twitter.

PubMed

Sarker, Abeed; O'Connor, Karen; Ginn, Rachel; Scotch, Matthew; Smith, Karen; Malone, Dan; Gonzalez, Graciela

2016-03-01

Prescription medication overdose is the fastest growing drug-related problem in the USA. The growing nature of this problem necessitates the implementation of improved monitoring strategies for investigating the prevalence and patterns of abuse of specific medications. Our primary aims were to assess the possibility of utilizing social media as a resource for automatic monitoring of prescription medication abuse and to devise an automatic classification technique that can identify potentially abuse-indicating user posts. We collected Twitter user posts (tweets) associated with three commonly abused medications (Adderall(®), oxycodone, and quetiapine). We manually annotated 6400 tweets mentioning these three medications and a control medication (metformin) that is not the subject of abuse due to its mechanism of action. We performed quantitative and qualitative analyses of the annotated data to determine whether posts on Twitter contain signals of prescription medication abuse. Finally, we designed an automatic supervised classification technique to distinguish posts containing signals of medication abuse from those that do not and assessed the utility of Twitter in investigating patterns of abuse over time. Our analyses show that clear signals of medication abuse can be drawn from Twitter posts and the percentage of tweets containing abuse signals are significantly higher for the three case medications (Adderall(®): 23 %, quetiapine: 5.0 %, oxycodone: 12 %) than the proportion for the control medication (metformin: 0.3 %). Our automatic classification approach achieves 82 % accuracy overall (medication abuse class recall: 0.51, precision: 0.41, F measure: 0.46). To illustrate the utility of automatic classification, we show how the classification data can be used to analyze abuse patterns over time. Our study indicates that social media can be a crucial resource for obtaining abuse-related information for medications, and that automatic approaches involving supervised classification and natural language processing hold promises for essential future monitoring and intervention tasks.
Determining Effects of Non-synonymous SNPs on Protein-Protein Interactions using Supervised and Semi-supervised Learning

PubMed Central

Zhao, Nan; Han, Jing Ginger; Shyu, Chi-Ren; Korkin, Dmitry

2014-01-01

Single nucleotide polymorphisms (SNPs) are among the most common types of genetic variation in complex genetic disorders. A growing number of studies link the functional role of SNPs with the networks and pathways mediated by the disease-associated genes. For example, many non-synonymous missense SNPs (nsSNPs) have been found near or inside the protein-protein interaction (PPI) interfaces. Determining whether such nsSNP will disrupt or preserve a PPI is a challenging task to address, both experimentally and computationally. Here, we present this task as three related classification problems, and develop a new computational method, called the SNP-IN tool (non-synonymous SNP INteraction effect predictor). Our method predicts the effects of nsSNPs on PPIs, given the interaction's structure. It leverages supervised and semi-supervised feature-based classifiers, including our new Random Forest self-learning protocol. The classifiers are trained based on a dataset of comprehensive mutagenesis studies for 151 PPI complexes, with experimentally determined binding affinities of the mutant and wild-type interactions. Three classification problems were considered: (1) a 2-class problem (strengthening/weakening PPI mutations), (2) another 2-class problem (mutations that disrupt/preserve a PPI), and (3) a 3-class classification (detrimental/neutral/beneficial mutation effects). In total, 11 different supervised and semi-supervised classifiers were trained and assessed resulting in a promising performance, with the weighted f-measure ranging from 0.87 for Problem 1 to 0.70 for the most challenging Problem 3. By integrating prediction results of the 2-class classifiers into the 3-class classifier, we further improved its performance for Problem 3. To demonstrate the utility of SNP-IN tool, it was applied to study the nsSNP-induced rewiring of two disease-centered networks. The accurate and balanced performance of SNP-IN tool makes it readily available to study the rewiring of large-scale protein-protein interaction networks, and can be useful for functional annotation of disease-associated SNPs. SNIP-IN tool is freely accessible as a web-server at http://korkinlab.org/snpintool/. PMID:24784581
Reconstituting protein interaction networks using parameter-dependent domain-domain interactions

PubMed Central

2013-01-01

Background We can describe protein-protein interactions (PPIs) as sets of distinct domain-domain interactions (DDIs) that mediate the physical interactions between proteins. Experimental data confirm that DDIs are more consistent than their corresponding PPIs, lending support to the notion that analyses of DDIs may improve our understanding of PPIs and lead to further insights into cellular function, disease, and evolution. However, currently available experimental DDI data cover only a small fraction of all existing PPIs and, in the absence of structural data, determining which particular DDI mediates any given PPI is a challenge. Results We present two contributions to the field of domain interaction analysis. First, we introduce a novel computational strategy to merge domain annotation data from multiple databases. We show that when we merged yeast domain annotations from six annotation databases we increased the average number of domains per protein from 1.05 to 2.44, bringing it closer to the estimated average value of 3. Second, we introduce a novel computational method, parameter-dependent DDI selection (PADDS), which, given a set of PPIs, extracts a small set of domain pairs that can reconstruct the original set of protein interactions, while attempting to minimize false positives. Based on a set of PPIs from multiple organisms, our method extracted 27% more experimentally detected DDIs than existing computational approaches. Conclusions We have provided a method to merge domain annotation data from multiple sources, ensuring large and consistent domain annotation for any given organism. Moreover, we provided a method to extract a small set of DDIs from the underlying set of PPIs and we showed that, in contrast to existing approaches, our method was not biased towards DDIs with low or high occurrence counts. Finally, we used these two methods to highlight the influence of the underlying annotation density on the characteristics of extracted DDIs. Although increased annotations greatly expanded the possible DDIs, the lack of knowledge of the true biological false positive interactions still prevents an unambiguous assignment of domain interactions responsible for all protein network interactions. Executable files and examples are given at: http://www.bhsai.org/downloads/padds/ PMID:23651452
Rice SNP-seek database update: new SNPs, indels, and queries.

PubMed

Mansueto, Locedie; Fuentes, Roven Rommel; Borja, Frances Nikki; Detras, Jeffery; Abriol-Santos, Juan Miguel; Chebotarov, Dmytro; Sanciangco, Millicent; Palis, Kevin; Copetti, Dario; Poliakov, Alexandre; Dubchak, Inna; Solovyev, Victor; Wing, Rod A; Hamilton, Ruaraidh Sackville; Mauleon, Ramil; McNally, Kenneth L; Alexandrov, Nickolai

2017-01-04

We describe updates to the Rice SNP-Seek Database since its first release. We ran a new SNP-calling pipeline followed by filtering that resulted in complete, base, filtered and core SNP datasets. Besides the Nipponbare reference genome, the pipeline was run on genome assemblies of IR 64, 93-11, DJ 123 and Kasalath. New genotype query and display features are added for reference assemblies, SNP datasets and indels. JBrowse now displays BAM, VCF and other annotation tracks, the additional genome assemblies and an embedded VISTA genome comparison viewer. Middleware is redesigned for improved performance by using a hybrid of HDF5 and RDMS for genotype storage. Query modules for genotypes, varieties and genes are improved to handle various constraints. An integrated list manager allows the user to pass query parameters for further analysis. The SNP Annotator adds traits, ontology terms, effects and interactions to markers in a list. Web-service calls were implemented to access most data. These features enable seamless querying of SNP-Seek across various biological entities, a step toward semi-automated gene-trait association discovery. URL: http://snp-seek.irri.org. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Automatic segmentation of MR brain images of preterm infants using supervised classification.

PubMed

Moeskops, Pim; Benders, Manon J N L; Chiţ, Sabina M; Kersbergen, Karina J; Groenendaal, Floris; de Vries, Linda S; Viergever, Max A; Išgum, Ivana

2015-09-01

Preterm birth is often associated with impaired brain development. The state and expected progression of preterm brain development can be evaluated using quantitative assessment of MR images. Such measurements require accurate segmentation of different tissue types in those images. This paper presents an algorithm for the automatic segmentation of unmyelinated white matter (WM), cortical grey matter (GM), and cerebrospinal fluid in the extracerebral space (CSF). The algorithm uses supervised voxel classification in three subsequent stages. In the first stage, voxels that can easily be assigned to one of the three tissue types are labelled. In the second stage, dedicated analysis of the remaining voxels is performed. The first and the second stages both use two-class classification for each tissue type separately. Possible inconsistencies that could result from these tissue-specific segmentation stages are resolved in the third stage, which performs multi-class classification. A set of T1- and T2-weighted images was analysed, but the optimised system performs automatic segmentation using a T2-weighted image only. We have investigated the performance of the algorithm when using training data randomly selected from completely annotated images as well as when using training data from only partially annotated images. The method was evaluated on images of preterm infants acquired at 30 and 40weeks postmenstrual age (PMA). When the method was trained using random selection from the completely annotated images, the average Dice coefficients were 0.95 for WM, 0.81 for GM, and 0.89 for CSF on an independent set of images acquired at 30weeks PMA. When the method was trained using only the partially annotated images, the average Dice coefficients were 0.95 for WM, 0.78 for GM and 0.87 for CSF for the images acquired at 30weeks PMA, and 0.92 for WM, 0.80 for GM and 0.85 for CSF for the images acquired at 40weeks PMA. Even though the segmentations obtained using training data from the partially annotated images resulted in slightly lower Dice coefficients, the performance in all experiments was close to that of a second human expert (0.93 for WM, 0.79 for GM and 0.86 for CSF for the images acquired at 30weeks, and 0.94 for WM, 0.76 for GM and 0.87 for CSF for the images acquired at 40weeks). These results show that the presented method is robust to age and acquisition protocol and that it performs accurate segmentation of WM, GM, and CSF when the training data is extracted from complete annotations as well as when the training data is extracted from partial annotations only. This extends the applicability of the method by reducing the time and effort necessary to create training data in a population with different characteristics. Copyright © 2015 Elsevier Inc. All rights reserved.
Integrating GPCR-specific information with full text articles

PubMed Central

2011-01-01

Background With the continued growth in the volume both of experimental G protein-coupled receptor (GPCR) data and of the related peer-reviewed literature, the ability of GPCR researchers to keep up-to-date is becoming increasingly curtailed. Results We present work that integrates the biological data and annotations in the GPCR information system (GPCRDB) with next-generation methods for intelligently exploring, visualising and interacting with the scientific articles used to disseminate them. This solution automatically retrieves relevant information from GPCRDB and displays it both within and as an adjunct to an article. Conclusions This approach allows researchers to extract more knowledge more swiftly from literature. Importantly, it allows reinterpretation of data in articles published before GPCR structure data became widely available, thereby rescuing these valuable data from long-dormant sources. PMID:21910883

Protein function prediction using neighbor relativity in protein-protein interaction network.

PubMed

Moosavi, Sobhan; Rahgozar, Masoud; Rahimi, Amir

2013-04-01

There is a large gap between the number of discovered proteins and the number of functionally annotated ones. Due to the high cost of determining protein function by wet-lab research, function prediction has become a major task for computational biology and bioinformatics. Some researches utilize the proteins interaction information to predict function for un-annotated proteins. In this paper, we propose a novel approach called "Neighbor Relativity Coefficient" (NRC) based on interaction network topology which estimates the functional similarity between two proteins. NRC is calculated for each pair of proteins based on their graph-based features including distance, common neighbors and the number of paths between them. In order to ascribe function to an un-annotated protein, NRC estimates a weight for each neighbor to transfer its annotation to the unknown protein. Finally, the unknown protein will be annotated by the top score transferred functions. We also investigate the effect of using different coefficients for various types of functions. The proposed method has been evaluated on Saccharomyces cerevisiae and Homo sapiens interaction networks. The performance analysis demonstrates that NRC yields better results in comparison with previous protein function prediction approaches that utilize interaction network. Copyright © 2012 Elsevier Ltd. All rights reserved.
Comparison of the effects of model-based iterative reconstruction and filtered back projection algorithms on software measurements in pulmonary subsolid nodules.

PubMed

Cohen, Julien G; Kim, Hyungjin; Park, Su Bin; van Ginneken, Bram; Ferretti, Gilbert R; Lee, Chang Hyun; Goo, Jin Mo; Park, Chang Min

2017-08-01

To evaluate the differences between filtered back projection (FBP) and model-based iterative reconstruction (MBIR) algorithms on semi-automatic measurements in subsolid nodules (SSNs). Unenhanced CT scans of 73 SSNs obtained using the same protocol and reconstructed with both FBP and MBIR algorithms were evaluated by two radiologists. Diameter, mean attenuation, mass and volume of whole nodules and their solid components were measured. Intra- and interobserver variability and differences between FBP and MBIR were then evaluated using Bland-Altman method and Wilcoxon tests. Longest diameter, volume and mass of nodules and those of their solid components were significantly higher using MBIR (p < 0.05) with mean differences of 1.1% (limits of agreement, -6.4 to 8.5%), 3.2% (-20.9 to 27.3%) and 2.9% (-16.9 to 22.7%) and 3.2% (-20.5 to 27%), 6.3% (-51.9 to 64.6%), 6.6% (-50.1 to 63.3%), respectively. The limits of agreement between FBP and MBIR were within the range of intra- and interobserver variability for both algorithms with respect to the diameter, volume and mass of nodules and their solid components. There were no significant differences in intra- or interobserver variability between FBP and MBIR (p > 0.05). Semi-automatic measurements of SSNs significantly differed between FBP and MBIR; however, the differences were within the range of measurement variability. • Intra- and interobserver reproducibility of measurements did not differ between FBP and MBIR. • Differences in SSNs' semi-automatic measurement induced by reconstruction algorithms were not clinically significant. • Semi-automatic measurement may be conducted regardless of reconstruction algorithm. • SSNs' semi-automated classification agreement (pure vs. part-solid) did not significantly differ between algorithms.
A semi-automatic computer-aided method for surgical template design

NASA Astrophysics Data System (ADS)

Chen, Xiaojun; Xu, Lu; Yang, Yue; Egger, Jan

2016-02-01

This paper presents a generalized integrated framework of semi-automatic surgical template design. Several algorithms were implemented including the mesh segmentation, offset surface generation, collision detection, ruled surface generation, etc., and a special software named TemDesigner was developed. With a simple user interface, a customized template can be semi- automatically designed according to the preoperative plan. Firstly, mesh segmentation with signed scalar of vertex is utilized to partition the inner surface from the input surface mesh based on the indicated point loop. Then, the offset surface of the inner surface is obtained through contouring the distance field of the inner surface, and segmented to generate the outer surface. Ruled surface is employed to connect inner and outer surfaces. Finally, drilling tubes are generated according to the preoperative plan through collision detection and merging. It has been applied to the template design for various kinds of surgeries, including oral implantology, cervical pedicle screw insertion, iliosacral screw insertion and osteotomy, demonstrating the efficiency, functionality and generality of our method.
A semi-automatic computer-aided method for surgical template design

PubMed Central

Chen, Xiaojun; Xu, Lu; Yang, Yue; Egger, Jan

2016-01-01

This paper presents a generalized integrated framework of semi-automatic surgical template design. Several algorithms were implemented including the mesh segmentation, offset surface generation, collision detection, ruled surface generation, etc., and a special software named TemDesigner was developed. With a simple user interface, a customized template can be semi- automatically designed according to the preoperative plan. Firstly, mesh segmentation with signed scalar of vertex is utilized to partition the inner surface from the input surface mesh based on the indicated point loop. Then, the offset surface of the inner surface is obtained through contouring the distance field of the inner surface, and segmented to generate the outer surface. Ruled surface is employed to connect inner and outer surfaces. Finally, drilling tubes are generated according to the preoperative plan through collision detection and merging. It has been applied to the template design for various kinds of surgeries, including oral implantology, cervical pedicle screw insertion, iliosacral screw insertion and osteotomy, demonstrating the efficiency, functionality and generality of our method. PMID:26843434
A semi-automatic computer-aided method for surgical template design.

PubMed

Chen, Xiaojun; Xu, Lu; Yang, Yue; Egger, Jan

2016-02-04

This paper presents a generalized integrated framework of semi-automatic surgical template design. Several algorithms were implemented including the mesh segmentation, offset surface generation, collision detection, ruled surface generation, etc., and a special software named TemDesigner was developed. With a simple user interface, a customized template can be semi- automatically designed according to the preoperative plan. Firstly, mesh segmentation with signed scalar of vertex is utilized to partition the inner surface from the input surface mesh based on the indicated point loop. Then, the offset surface of the inner surface is obtained through contouring the distance field of the inner surface, and segmented to generate the outer surface. Ruled surface is employed to connect inner and outer surfaces. Finally, drilling tubes are generated according to the preoperative plan through collision detection and merging. It has been applied to the template design for various kinds of surgeries, including oral implantology, cervical pedicle screw insertion, iliosacral screw insertion and osteotomy, demonstrating the efficiency, functionality and generality of our method.
Towards the Real-Time Evaluation of Collaborative Activities: Integration of an Automatic Rater of Collaboration Quality in the Classroom from the Teacher's Perspective

ERIC Educational Resources Information Center

Chounta, Irene-Angelica; Avouris, Nikolaos

2016-01-01

This paper presents the integration of a real time evaluation method of collaboration quality in a monitoring application that supports teachers in class orchestration. The method is implemented as an automatic rater of collaboration quality and studied in a real time scenario of use. We argue that automatic and semi-automatic methods which…
MyWEST: my Web Extraction Software Tool for effective mining of annotations from web-based databanks.

PubMed

Masseroli, Marco; Stella, Andrea; Meani, Natalia; Alcalay, Myriam; Pinciroli, Francesco

2004-12-12

High-throughput technologies create the necessity to mine large amounts of gene annotations from diverse databanks, and to integrate the resulting data. Most databanks can be interrogated only via Web, for a single gene at a time, and query results are generally available only in the HTML format. Although some databanks provide batch retrieval of data via FTP, this requires expertise and resources for locally reimplementing the databank. We developed MyWEST, a tool aimed at researchers without extensive informatics skills or resources, which exploits user-defined templates to easily mine selected annotations from different Web-interfaced databanks, and aggregates and structures results in an automatically updated database. Using microarray results from a model system of retinoic acid-induced differentiation, MyWEST effectively gathered relevant annotations from various biomolecular databanks, highlighted significant biological characteristics and supported a global approach to the understanding of complex cellular mechanisms. MyWEST is freely available for non-profit use at http://www.medinfopoli.polimi.it/MyWEST/
Datasets of Odontocete Sounds Annotated for Developing Automatic Detection Methods

DTIC Science & Technology

2007-09-01

for each of the following species: 1. Mesoplodon densirostris (Blainville’s beaked whale) 2. Steno bredanensis (rough-toothed dolphin) 3...39 files, 70200s total Risso’s Dolphins (Grampus griseus) AUTEC (NUWC): 10 files; 17100s total Rough-toothed Dolphins ( Steno bredanensis
A sentence sliding window approach to extract protein annotations from biomedical articles

PubMed Central

Krallinger, Martin; Padron, Maria; Valencia, Alfonso

2005-01-01

Background Within the emerging field of text mining and statistical natural language processing (NLP) applied to biomedical articles, a broad variety of techniques have been developed during the past years. Nevertheless, there is still a great ned of comparative assessment of the performance of the proposed methods and the development of common evaluation criteria. This issue was addressed by the Critical Assessment of Text Mining Methods in Molecular Biology (BioCreative) contest. The aim of this contest was to assess the performance of text mining systems applied to biomedical texts including tools which recognize named entities such as genes and proteins, and tools which automatically extract protein annotations. Results The "sentence sliding window" approach proposed here was found to efficiently extract text fragments from full text articles containing annotations on proteins, providing the highest number of correctly predicted annotations. Moreover, the number of correct extractions of individual entities (i.e. proteins and GO terms) involved in the relationships used for the annotations was significantly higher than the correct extractions of the complete annotations (protein-function relations). Conclusion We explored the use of averaging sentence sliding windows for information extraction, especially in a context where conventional training data is unavailable. The combination of our approach with more refined statistical estimators and machine learning techniques might be a way to improve annotation extraction for future biomedical text mining applications. PMID:15960831
Literature classification for semi-automated updating of biological knowledgebases

PubMed Central

2013-01-01

Background As the output of biological assays increase in resolution and volume, the body of specialized biological data, such as functional annotations of gene and protein sequences, enables extraction of higher-level knowledge needed for practical application in bioinformatics. Whereas common types of biological data, such as sequence data, are extensively stored in biological databases, functional annotations, such as immunological epitopes, are found primarily in semi-structured formats or free text embedded in primary scientific literature. Results We defined and applied a machine learning approach for literature classification to support updating of TANTIGEN, a knowledgebase of tumor T-cell antigens. Abstracts from PubMed were downloaded and classified as either "relevant" or "irrelevant" for database update. Training and five-fold cross-validation of a k-NN classifier on 310 abstracts yielded classification accuracy of 0.95, thus showing significant value in support of data extraction from the literature. Conclusion We here propose a conceptual framework for semi-automated extraction of epitope data embedded in scientific literature using principles from text mining and machine learning. The addition of such data will aid in the transition of biological databases to knowledgebases. PMID:24564403
A System for the Semantic Multimodal Analysis of News Audio-Visual Content

NASA Astrophysics Data System (ADS)

Mezaris, Vasileios; Gidaros, Spyros; Papadopoulos, GeorgiosTh; Kasper, Walter; Steffen, Jörg; Ordelman, Roeland; Huijbregts, Marijn; de Jong, Franciska; Kompatsiaris, Ioannis; Strintzis, MichaelG

2010-12-01

News-related content is nowadays among the most popular types of content for users in everyday applications. Although the generation and distribution of news content has become commonplace, due to the availability of inexpensive media capturing devices and the development of media sharing services targeting both professional and user-generated news content, the automatic analysis and annotation that is required for supporting intelligent search and delivery of this content remains an open issue. In this paper, a complete architecture for knowledge-assisted multimodal analysis of news-related multimedia content is presented, along with its constituent components. The proposed analysis architecture employs state-of-the-art methods for the analysis of each individual modality (visual, audio, text) separately and proposes a novel fusion technique based on the particular characteristics of news-related content for the combination of the individual modality analysis results. Experimental results on news broadcast video illustrate the usefulness of the proposed techniques in the automatic generation of semantic annotations.
Automatic Compound Annotation from Mass Spectrometry Data Using MAGMa

PubMed Central

Ridder, Lars; van der Hooft, Justin J. J.; Verhoeven, Stefan

2014-01-01

The MAGMa software for automatic annotation of mass spectrometry based fragmentation data was applied to 16 MS/MS datasets of the CASMI 2013 contest. Eight solutions were submitted in category 1 (molecular formula assignments) and twelve in category 2 (molecular structure assignment). The MS/MS peaks of each challenge were matched with in silico generated substructures of candidate molecules from PubChem, resulting in penalty scores that were used for candidate ranking. In 6 of the 12 submitted solutions in category 2, the correct chemical structure obtained the best score, whereas 3 molecules were ranked outside the top 5. All top ranked molecular formulas submitted in category 1 were correct. In addition, we present MAGMa results generated retrospectively for the remaining challenges. Successful application of the MAGMa algorithm required inclusion of the relevant candidate molecules, application of the appropriate mass tolerance and a sufficient degree of in silico fragmentation of the candidate molecules. Furthermore, the effect of the exhaustiveness of the candidate lists and limitations of substructure based scoring are discussed. PMID:26819876
Automatic Compound Annotation from Mass Spectrometry Data Using MAGMa.

PubMed

Ridder, Lars; van der Hooft, Justin J J; Verhoeven, Stefan

2014-01-01

The MAGMa software for automatic annotation of mass spectrometry based fragmentation data was applied to 16 MS/MS datasets of the CASMI 2013 contest. Eight solutions were submitted in category 1 (molecular formula assignments) and twelve in category 2 (molecular structure assignment). The MS/MS peaks of each challenge were matched with in silico generated substructures of candidate molecules from PubChem, resulting in penalty scores that were used for candidate ranking. In 6 of the 12 submitted solutions in category 2, the correct chemical structure obtained the best score, whereas 3 molecules were ranked outside the top 5. All top ranked molecular formulas submitted in category 1 were correct. In addition, we present MAGMa results generated retrospectively for the remaining challenges. Successful application of the MAGMa algorithm required inclusion of the relevant candidate molecules, application of the appropriate mass tolerance and a sufficient degree of in silico fragmentation of the candidate molecules. Furthermore, the effect of the exhaustiveness of the candidate lists and limitations of substructure based scoring are discussed.
ATLAS (Automatic Tool for Local Assembly Structures) - A Comprehensive Infrastructure for Assembly, Annotation, and Genomic Binning of Metagenomic and Metaranscripomic Data

DOE Office of Scientific and Technical Information (OSTI.GOV)

White, Richard A.; Brown, Joseph M.; Colby, Sean M.

ATLAS (Automatic Tool for Local Assembly Structures) is a comprehensive multiomics data analysis pipeline that is massively parallel and scalable. ATLAS contains a modular analysis pipeline for assembly, annotation, quantification and genome binning of metagenomics and metatranscriptomics data and a framework for reference metaproteomic database construction. ATLAS transforms raw sequence data into functional and taxonomic data at the microbial population level and provides genome-centric resolution through genome binning. ATLAS provides robust taxonomy based on majority voting of protein coding open reading frames rolled-up at the contig level using modified lowest common ancestor (LCA) analysis. ATLAS provides robust taxonomy based onmore » majority voting of protein coding open reading frames rolled-up at the contig level using modified lowest common ancestor (LCA) analysis. ATLAS is user-friendly, easy install through bioconda maintained as open-source on GitHub, and is implemented in Snakemake for modular customizable workflows.« less
Semi-automatic computerized approach to radiological quantification in rheumatoid arthritis

NASA Astrophysics Data System (ADS)

Steiner, Wolfgang; Schoeffmann, Sylvia; Prommegger, Andrea; Boegl, Karl; Klinger, Thomas; Peloschek, Philipp; Kainberger, Franz

2004-04-01

Rheumatoid Arthritis (RA) is a common systemic disease predominantly involving the joints. Precise diagnosis and follow-up therapy requires objective quantification. For this purpose, radiological analyses using standardized scoring systems are considered to be the most appropriate method. The aim of our study is to develop a semi-automatic image analysis software, especially applicable for scoring of joints in rheumatic disorders. The X-Ray RheumaCoach software delivers various scoring systems (Larsen-Score and Ratingen-Rau-Score) which can be applied by the scorer. In addition to the qualitative assessment of joints performed by the radiologist, a semi-automatic image analysis for joint detection and measurements of bone diameters and swollen tissue supports the image assessment process. More than 3000 radiographs from hands and feet of more than 200 RA patients were collected, analyzed, and statistically evaluated. Radiographs were quantified using conventional paper-based Larsen score and the X-Ray RheumaCoach software. The use of the software shortened the scoring time by about 25 percent and reduced the rate of erroneous scorings in all our studies. Compared to paper-based scoring methods, the X-Ray RheumaCoach software offers several advantages: (i) Structured data analysis and input that minimizes variance by standardization, (ii) faster and more precise calculation of sum scores and indices, (iii) permanent data storing and fast access to the software"s database, (iv) the possibility of cross-calculation to other scores, (v) semi-automatic assessment of images, and (vii) reliable documentation of results in the form of graphical printouts.
A new method of automatic landmark tagging for shape model construction via local curvature scale

NASA Astrophysics Data System (ADS)

Rueda, Sylvia; Udupa, Jayaram K.; Bai, Li

2008-03-01

Segmentation of organs in medical images is a difficult task requiring very often the use of model-based approaches. To build the model, we need an annotated training set of shape examples with correspondences indicated among shapes. Manual positioning of landmarks is a tedious, time-consuming, and error prone task, and almost impossible in the 3D space. To overcome some of these drawbacks, we devised an automatic method based on the notion of c-scale, a new local scale concept. For each boundary element b, the arc length of the largest homogeneous curvature region connected to b is estimated as well as the orientation of the tangent at b. With this shape description method, we can automatically locate mathematical landmarks selected at different levels of detail. The method avoids the use of landmarks for the generation of the mean shape. The selection of landmarks on the mean shape is done automatically using the c-scale method. Then, these landmarks are propagated to each shape in the training set, defining this way the correspondences among the shapes. Altogether 12 strategies are described along these lines. The methods are evaluated on 40 MRI foot data sets, the object of interest being the talus bone. The results show that, for the same number of landmarks, the proposed methods are more compact than manual and equally spaced annotations. The approach is applicable to spaces of any dimensionality, although we have focused in this paper on 2D shapes.
Automatic and semi-automatic approaches for arteriolar-to-venular computation in retinal photographs

NASA Astrophysics Data System (ADS)

Mendonça, Ana Maria; Remeseiro, Beatriz; Dashtbozorg, Behdad; Campilho, Aurélio

2017-03-01

The Arteriolar-to-Venular Ratio (AVR) is a popular dimensionless measure which allows the assessment of patients' condition for the early diagnosis of different diseases, including hypertension and diabetic retinopathy. This paper presents two new approaches for AVR computation in retinal photographs which include a sequence of automated processing steps: vessel segmentation, caliber measurement, optic disc segmentation, artery/vein classification, region of interest delineation, and AVR calculation. Both approaches have been tested on the INSPIRE-AVR dataset, and compared with a ground-truth provided by two medical specialists. The obtained results demonstrate the reliability of the fully automatic approach which provides AVR ratios very similar to at least one of the observers. Furthermore, the semi-automatic approach, which includes the manual modification of the artery/vein classification if needed, allows to significantly reduce the error to a level below the human error.
Interobserver variability in identification of breast tumors in MRI and its implications for prognostic biomarkers and radiogenomics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Saha, Ashirbani, E-mail: as698@duke.edu; Grimm, La

Purpose: To assess the interobserver variability of readers when outlining breast tumors in MRI, study the reasons behind the variability, and quantify the effect of the variability on algorithmic imaging features extracted from breast MRI. Methods: Four readers annotated breast tumors from the MRI examinations of 50 patients from one institution using a bounding box to indicate a tumor. All of the annotated tumors were biopsy proven cancers. The similarity of bounding boxes was analyzed using Dice coefficients. An automatic tumor segmentation algorithm was used to segment tumors from the readers’ annotations. The segmented tumors were then compared between readersmore » using Dice coefficients as the similarity metric. Cases showing high interobserver variability (average Dice coefficient <0.8) after segmentation were analyzed by a panel of radiologists to identify the reasons causing the low level of agreement. Furthermore, an imaging feature, quantifying tumor and breast tissue enhancement dynamics, was extracted from each segmented tumor for a patient. Pearson’s correlation coefficients were computed between the features for each pair of readers to assess the effect of the annotation on the feature values. Finally, the authors quantified the extent of variation in feature values caused by each of the individual reasons for low agreement. Results: The average agreement between readers in terms of the overlap (Dice coefficient) of the bounding box was 0.60. Automatic segmentation of tumor improved the average Dice coefficient for 92% of the cases to the average value of 0.77. The mean agreement between readers expressed by the correlation coefficient for the imaging feature was 0.96. Conclusions: There is a moderate variability between readers when identifying the rectangular outline of breast tumors on MRI. This variability is alleviated by the automatic segmentation of the tumors. Furthermore, the moderate interobserver variability in terms of the bounding box does not translate into a considerable variability in terms of assessment of enhancement dynamics. The authors propose some additional ways to further reduce the interobserver variability.« less
Wide coverage biomedical event extraction using multiple partially overlapping corpora

PubMed Central

2013-01-01

Background Biomedical events are key to understanding physiological processes and disease, and wide coverage extraction is required for comprehensive automatic analysis of statements describing biomedical systems in the literature. In turn, the training and evaluation of extraction methods requires manually annotated corpora. However, as manual annotation is time-consuming and expensive, any single event-annotated corpus can only cover a limited number of semantic types. Although combined use of several such corpora could potentially allow an extraction system to achieve broad semantic coverage, there has been little research into learning from multiple corpora with partially overlapping semantic annotation scopes. Results We propose a method for learning from multiple corpora with partial semantic annotation overlap, and implement this method to improve our existing event extraction system, EventMine. An evaluation using seven event annotated corpora, including 65 event types in total, shows that learning from overlapping corpora can produce a single, corpus-independent, wide coverage extraction system that outperforms systems trained on single corpora and exceeds previously reported results on two established event extraction tasks from the BioNLP Shared Task 2011. Conclusions The proposed method allows the training of a wide-coverage, state-of-the-art event extraction system from multiple corpora with partial semantic annotation overlap. The resulting single model makes broad-coverage extraction straightforward in practice by removing the need to either select a subset of compatible corpora or semantic types, or to merge results from several models trained on different individual corpora. Multi-corpus learning also allows annotation efforts to focus on covering additional semantic types, rather than aiming for exhaustive coverage in any single annotation effort, or extending the coverage of semantic types annotated in existing corpora. PMID:23731785
Evaluation and integration of functional annotation pipelines for newly sequenced organisms: the potato genome as a test case.

PubMed

Amar, David; Frades, Itziar; Danek, Agnieszka; Goldberg, Tatyana; Sharma, Sanjeev K; Hedley, Pete E; Proux-Wera, Estelle; Andreasson, Erik; Shamir, Ron; Tzfadia, Oren; Alexandersson, Erik

2014-12-05

For most organisms, even if their genome sequence is available, little functional information about individual genes or proteins exists. Several annotation pipelines have been developed for functional analysis based on sequence, 'omics', and literature data. However, researchers encounter little guidance on how well they perform. Here, we used the recently sequenced potato genome as a case study. The potato genome was selected since its genome is newly sequenced and it is a non-model plant even if there is relatively ample information on individual potato genes, and multiple gene expression profiles are available. We show that the automatic gene annotations of potato have low accuracy when compared to a "gold standard" based on experimentally validated potato genes. Furthermore, we evaluate six state-of-the-art annotation pipelines and show that their predictions are markedly dissimilar (Jaccard similarity coefficient of 0.27 between pipelines on average). To overcome this discrepancy, we introduce a simple GO structure-based algorithm that reconciles the predictions of the different pipelines. We show that the integrated annotation covers more genes, increases by over 50% the number of highly co-expressed GO processes, and obtains much higher agreement with the gold standard. We find that different annotation pipelines produce different results, and show how to integrate them into a unified annotation that is of higher quality than each single pipeline. We offer an improved functional annotation of both PGSC and ITAG potato gene models, as well as tools that can be applied to additional pipelines and improve annotation in other organisms. This will greatly aid future functional analysis of '-omics' datasets from potato and other organisms with newly sequenced genomes. The new potato annotations are available with this paper.

GelScape: a web-based server for interactively annotating, manipulating, comparing and archiving 1D and 2D gel images.

PubMed

Young, Nelson; Chang, Zhan; Wishart, David S

2004-04-12

GelScape is a web-based tool that permits facile, interactive annotation, comparison, manipulation and storage of protein gel images. It uses Java applet-servlet technology to allow rapid, remote image handling and image processing in a platform-independent manner. It supports many of the features found in commercial, stand-alone gel analysis software including spot annotation, spot integration, gel warping, image resizing, HTML image mapping, image overlaying as well as the storage of gel image and gel annotation data in compliance with Federated Gel Database requirements.
BRENDA in 2013: integrated reactions, kinetic data, enzyme function data, improved disease classification: new options and contents in BRENDA.

PubMed

Schomburg, Ida; Chang, Antje; Placzek, Sandra; Söhngen, Carola; Rother, Michael; Lang, Maren; Munaretto, Cornelia; Ulas, Susanne; Stelzer, Michael; Grote, Andreas; Scheer, Maurice; Schomburg, Dietmar

2013-01-01

The BRENDA (BRaunschweig ENzyme DAtabase) enzyme portal (http://www.brenda-enzymes.org) is the main information system of functional biochemical and molecular enzyme data and provides access to seven interconnected databases. BRENDA contains 2.7 million manually annotated data on enzyme occurrence, function, kinetics and molecular properties. Each entry is connected to a reference and the source organism. Enzyme ligands are stored with their structures and can be accessed via their names, synonyms or via a structure search. FRENDA (Full Reference ENzyme DAta) and AMENDA (Automatic Mining of ENzyme DAta) are based on text mining methods and represent a complete survey of PubMed abstracts with information on enzymes in different organisms, tissues or organelles. The supplemental database DRENDA provides more than 910 000 new EC number-disease relations in more than 510 000 references from automatic search and a classification of enzyme-disease-related information. KENDA (Kinetic ENzyme DAta), a new amendment extracts and displays kinetic values from PubMed abstracts. The integration of the EnzymeDetector offers an automatic comparison, evaluation and prediction of enzyme function annotations for prokaryotic genomes. The biochemical reaction database BKM-react contains non-redundant enzyme-catalysed and spontaneous reactions and was developed to facilitate and accelerate the construction of biochemical models.
A Collaborative Multimedia Annotation Tool for Enhancing Knowledge Sharing in CSCL

ERIC Educational Resources Information Center

Yang, Stephen J. H.; Zhang, Jia; Su, Addison Y. S.; Tsai, Jeffrey J. P.

2011-01-01

Knowledge sharing in computer supported collaborative learning (CSCL) requires intensive social interactions among participants, typically in the form of annotations. An annotation refers to an explicit expression of knowledge that is attached to a document to reveal the conceptual meanings of an annotator's implicit thoughts. In this research, we…
VideoANT: Extending Online Video Annotation beyond Content Delivery

ERIC Educational Resources Information Center

Hosack, Bradford

2010-01-01

This paper expands the boundaries of video annotation in education by outlining the need for extended interaction in online video use, identifying the challenges faced by existing video annotation tools, and introducing Video-ANT, a tool designed to create text-based annotations integrated within the time line of a video hosted online. Several…
Design for Manufacturing and Assembly in Apparel. Part 1. Handbook

DTIC Science & Technology

1994-02-01

reduced and the inverted pleat was eliminated to take advantage of the automatic seam stitcher . The shape and size of the side back section seam...coin pocket. The size and shape of the pocket would be designed to best utilize the equipment. An automatic dart stitcher may be utilized to stitch the...with stacker Semi-automatic serging units with stacker Automatic seaming units/profile stitchers Programmable seaming units for various operations
Apollo: a sequence annotation editor

PubMed Central

Lewis, SE; Searle, SMJ; Harris, N; Gibson, M; Iyer, V; Richter, J; Wiel, C; Bayraktaroglu, L; Birney, E; Crosby, MA; Kaminker, JS; Matthews, BB; Prochnik, SE; Smith, CD; Tupy, JL; Rubin, GM; Misra, S; Mungall, CJ; Clamp, ME

2002-01-01

The well-established inaccuracy of purely computational methods for annotating genome sequences necessitates an interactive tool to allow biological experts to refine these approximations by viewing and independently evaluating the data supporting each annotation. Apollo was developed to meet this need, enabling curators to inspect genome annotations closely and edit them. FlyBase biologists successfully used Apollo to annotate the Drosophila melanogaster genome and it is increasingly being used as a starting point for the development of customized annotation editing tools for other genome projects. PMID:12537571
Extracting genetic alteration information for personalized cancer therapy from ClinicalTrials.gov

PubMed Central

Xu, Jun; Lee, Hee-Jin; Zeng, Jia; Wu, Yonghui; Zhang, Yaoyun; Huang, Liang-Chin; Johnson, Amber; Holla, Vijaykumar; Bailey, Ann M; Cohen, Trevor; Meric-Bernstam, Funda; Bernstam, Elmer V

2016-01-01

Objective: Clinical trials investigating drugs that target specific genetic alterations in tumors are important for promoting personalized cancer therapy. The goal of this project is to create a knowledge base of cancer treatment trials with annotations about genetic alterations from ClinicalTrials.gov. Methods: We developed a semi-automatic framework that combines advanced text-processing techniques with manual review to curate genetic alteration information in cancer trials. The framework consists of a document classification system to identify cancer treatment trials from ClinicalTrials.gov and an information extraction system to extract gene and alteration pairs from the Title and Eligibility Criteria sections of clinical trials. By applying the framework to trials at ClinicalTrials.gov, we created a knowledge base of cancer treatment trials with genetic alteration annotations. We then evaluated each component of the framework against manually reviewed sets of clinical trials and generated descriptive statistics of the knowledge base. Results and Discussion: The automated cancer treatment trial identification system achieved a high precision of 0.9944. Together with the manual review process, it identified 20 193 cancer treatment trials from ClinicalTrials.gov. The automated gene-alteration extraction system achieved a precision of 0.8300 and a recall of 0.6803. After validation by manual review, we generated a knowledge base of 2024 cancer trials that are labeled with specific genetic alteration information. Analysis of the knowledge base revealed the trend of increased use of targeted therapy for cancer, as well as top frequent gene-alteration pairs of interest. We expect this knowledge base to be a valuable resource for physicians and patients who are seeking information about personalized cancer therapy. PMID:27013523
Extracting genetic alteration information for personalized cancer therapy from ClinicalTrials.gov.

PubMed

Xu, Jun; Lee, Hee-Jin; Zeng, Jia; Wu, Yonghui; Zhang, Yaoyun; Huang, Liang-Chin; Johnson, Amber; Holla, Vijaykumar; Bailey, Ann M; Cohen, Trevor; Meric-Bernstam, Funda; Bernstam, Elmer V; Xu, Hua

2016-07-01

Clinical trials investigating drugs that target specific genetic alterations in tumors are important for promoting personalized cancer therapy. The goal of this project is to create a knowledge base of cancer treatment trials with annotations about genetic alterations from ClinicalTrials.gov. We developed a semi-automatic framework that combines advanced text-processing techniques with manual review to curate genetic alteration information in cancer trials. The framework consists of a document classification system to identify cancer treatment trials from ClinicalTrials.gov and an information extraction system to extract gene and alteration pairs from the Title and Eligibility Criteria sections of clinical trials. By applying the framework to trials at ClinicalTrials.gov, we created a knowledge base of cancer treatment trials with genetic alteration annotations. We then evaluated each component of the framework against manually reviewed sets of clinical trials and generated descriptive statistics of the knowledge base. The automated cancer treatment trial identification system achieved a high precision of 0.9944. Together with the manual review process, it identified 20 193 cancer treatment trials from ClinicalTrials.gov. The automated gene-alteration extraction system achieved a precision of 0.8300 and a recall of 0.6803. After validation by manual review, we generated a knowledge base of 2024 cancer trials that are labeled with specific genetic alteration information. Analysis of the knowledge base revealed the trend of increased use of targeted therapy for cancer, as well as top frequent gene-alteration pairs of interest. We expect this knowledge base to be a valuable resource for physicians and patients who are seeking information about personalized cancer therapy. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Next Generation Models for Storage and Representation of Microbial Biological Annotation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Quest, Daniel J; Land, Miriam L; Brettin, Thomas S

2010-01-01

Background Traditional genome annotation systems were developed in a very different computing era, one where the World Wide Web was just emerging. Consequently, these systems are built as centralized black boxes focused on generating high quality annotation submissions to GenBank/EMBL supported by expert manual curation. The exponential growth of sequence data drives a growing need for increasingly higher quality and automatically generated annotation. Typical annotation pipelines utilize traditional database technologies, clustered computing resources, Perl, C, and UNIX file systems to process raw sequence data, identify genes, and predict and categorize gene function. These technologies tightly couple the annotation software systemmore » to hardware and third party software (e.g. relational database systems and schemas). This makes annotation systems hard to reproduce, inflexible to modification over time, difficult to assess, difficult to partition across multiple geographic sites, and difficult to understand for those who are not domain experts. These systems are not readily open to scrutiny and therefore not scientifically tractable. The advent of Semantic Web standards such as Resource Description Framework (RDF) and OWL Web Ontology Language (OWL) enables us to construct systems that address these challenges in a new comprehensive way. Results Here, we develop a framework for linking traditional data to OWL-based ontologies in genome annotation. We show how data standards can decouple hardware and third party software tools from annotation pipelines, thereby making annotation pipelines easier to reproduce and assess. An illustrative example shows how TURTLE (Terse RDF Triple Language) can be used as a human readable, but also semantically-aware, equivalent to GenBank/EMBL files. Conclusions The power of this approach lies in its ability to assemble annotation data from multiple databases across multiple locations into a representation that is understandable to researchers. In this way, all researchers, experimental and computational, will more easily understand the informatics processes constructing genome annotation and ultimately be able to help improve the systems that produce them.« less
RysannMD: A biomedical semantic annotator balancing speed and accuracy.

PubMed

Cuzzola, John; Jovanović, Jelena; Bagheri, Ebrahim

2017-07-01

Recently, both researchers and practitioners have explored the possibility of semantically annotating large and continuously evolving collections of biomedical texts such as research papers, medical reports, and physician notes in order to enable their efficient and effective management and use in clinical practice or research laboratories. Such annotations can be automatically generated by biomedical semantic annotators - tools that are specifically designed for detecting and disambiguating biomedical concepts mentioned in text. The biomedical community has already presented several solid automated semantic annotators. However, the existing tools are either strong in their disambiguation capacity, i.e., the ability to identify the correct biomedical concept for a given piece of text among several candidate concepts, or they excel in their processing time, i.e., work very efficiently, but none of the semantic annotation tools reported in the literature has both of these qualities. In this paper, we present RysannMD (Ryerson Semantic Annotator for Medical Domain), a biomedical semantic annotation tool that strikes a balance between processing time and performance while disambiguating biomedical terms. In other words, RysannMD provides reasonable disambiguation performance when choosing the right sense for a biomedical term in a given context, and does that in a reasonable time. To examine how RysannMD stands with respect to the state of the art biomedical semantic annotators, we have conducted a series of experiments using standard benchmarking corpora, including both gold and silver standards, and four modern biomedical semantic annotators, namely cTAKES, MetaMap, NOBLE Coder, and Neji. The annotators were compared with respect to the quality of the produced annotations measured against gold and silver standards using precision, recall, and F 1 measure and speed, i.e., processing time. In the experiments, RysannMD achieved the best median F 1 measure across the benchmarking corpora, independent of the standard used (silver/gold), biomedical subdomain, and document size. In terms of the annotation speed, RysannMD scored the second best median processing time across all the experiments. The obtained results indicate that RysannMD offers the best performance among the examined semantic annotators when both quality of annotation and speed are considered simultaneously. Copyright © 2017 Elsevier Inc. All rights reserved.
Interactive 3D segmentation using connected orthogonal contours.

PubMed

de Bruin, P W; Dercksen, V J; Post, F H; Vossepoel, A M; Streekstra, G J; Vos, F M

2005-05-01

This paper describes a new method for interactive segmentation that is based on cross-sectional design and 3D modelling. The method represents a 3D model by a set of connected contours that are planar and orthogonal. Planar contours overlayed on image data are easily manipulated and linked contours reduce the amount of user interaction.1 This method solves the contour-to-contour correspondence problem and can capture extrema of objects in a more flexible way than manual segmentation of a stack of 2D images. The resulting 3D model is guaranteed to be free of geometric and topological errors. We show that manual segmentation using connected orthogonal contours has great advantages over conventional manual segmentation. Furthermore, the method provides effective feedback and control for creating an initial model for, and control and steering of, (semi-)automatic segmentation methods.
A procedural method for the efficient implementation of full-custom VLSI designs

NASA Technical Reports Server (NTRS)

Belk, P.; Hickey, N.

1987-01-01

An imbedded language system for the layout of very large scale integration (VLSI) circuits is examined. It is shown that through the judicious use of this system, a large variety of circuits can be designed with circuit density and performance comparable to traditional full-custom design methods, but with design costs more comparable to semi-custom design methods. The high performance of this methodology is attributable to the flexibility of procedural descriptions of VLSI layouts and to a number of automatic and semi-automatic tools within the system.
Application of semi-active RFID power meter in automatic verification pipeline and intelligent storage system

NASA Astrophysics Data System (ADS)

Chen, Xiangqun; Huang, Rui; Shen, Liman; chen, Hao; Xiong, Dezhi; Xiao, Xiangqi; Liu, Mouhai; Xu, Renheng

2018-03-01

In this paper, the semi-active RFID watt-hour meter is applied to automatic test lines and intelligent warehouse management, from the transmission system, test system and auxiliary system, monitoring system, realize the scheduling of watt-hour meter, binding, control and data exchange, and other functions, make its more accurate positioning, high efficiency of management, update the data quickly, all the information at a glance. Effectively improve the quality, efficiency and automation of verification, and realize more efficient data management and warehouse management.
A knowledge-based support system for mechanical ventilation of the lungs. The KUSIVAR concept and prototype.

PubMed

Rudowski, R; Frostell, C; Gill, H

1989-09-01

The KUSIVAR is an expert system for mechanical ventilation of adult patients suffering from respiratory insufficiency. Its main objective is to provide guidance in respirator management. The knowledge base includes both qualitative, rule-based knowledge and quantitative knowledge expressed in the form of mathematical models (expert control) which is used for prediction of arterial gas tensions and optimization purposes. The system is data driven and uses a forward chaining mechanism for rule invocation. The interaction with the user will be performed in advisory, critiquing, semi-automatic and automatic modes. The system is at present in an advanced prototype stage. Prototyping is performed using KEE (Knowledge Engineering Environment) on a Sperry Explorer workstation. For further development and clinical use the expert system will be downloaded to an advanced PC. The system is intended to support therapy with a Siemens-Elema Servoventilator 900 C.
Consistent model driven architecture

NASA Astrophysics Data System (ADS)

Niepostyn, Stanisław J.

2015-09-01

The goal of the MDA is to produce software systems from abstract models in a way where human interaction is restricted to a minimum. These abstract models are based on the UML language. However, the semantics of UML models is defined in a natural language. Subsequently the verification of consistency of these diagrams is needed in order to identify errors in requirements at the early stage of the development process. The verification of consistency is difficult due to a semi-formal nature of UML diagrams. We propose automatic verification of consistency of the series of UML diagrams originating from abstract models implemented with our consistency rules. This Consistent Model Driven Architecture approach enables us to generate automatically complete workflow applications from consistent and complete models developed from abstract models (e.g. Business Context Diagram). Therefore, our method can be used to check practicability (feasibility) of software architecture models.
DeTEXT: A Database for Evaluating Text Extraction from Biomedical Literature Figures

PubMed Central

Yin, Xu-Cheng; Yang, Chun; Pei, Wei-Yi; Man, Haixia; Zhang, Jun; Learned-Miller, Erik; Yu, Hong

2015-01-01

Hundreds of millions of figures are available in biomedical literature, representing important biomedical experimental evidence. Since text is a rich source of information in figures, automatically extracting such text may assist in the task of mining figure information. A high-quality ground truth standard can greatly facilitate the development of an automated system. This article describes DeTEXT: A database for evaluating text extraction from biomedical literature figures. It is the first publicly available, human-annotated, high quality, and large-scale figure-text dataset with 288 full-text articles, 500 biomedical figures, and 9308 text regions. This article describes how figures were selected from open-access full-text biomedical articles and how annotation guidelines and annotation tools were developed. We also discuss the inter-annotator agreement and the reliability of the annotations. We summarize the statistics of the DeTEXT data and make available evaluation protocols for DeTEXT. Finally we lay out challenges we observed in the automated detection and recognition of figure text and discuss research directions in this area. DeTEXT is publicly available for downloading at http://prir.ustb.edu.cn/DeTEXT/. PMID:25951377
Annotating longitudinal clinical narratives for de-identification: The 2014 i2b2/UTHealth corpus.

PubMed

Stubbs, Amber; Uzuner, Özlem

2015-12-01

The 2014 i2b2/UTHealth natural language processing shared task featured a track focused on the de-identification of longitudinal medical records. For this track, we de-identified a set of 1304 longitudinal medical records describing 296 patients. This corpus was de-identified under a broad interpretation of the HIPAA guidelines using double-annotation followed by arbitration, rounds of sanity checking, and proof reading. The average token-based F1 measure for the annotators compared to the gold standard was 0.927. The resulting annotations were used both to de-identify the data and to set the gold standard for the de-identification track of the 2014 i2b2/UTHealth shared task. All annotated private health information were replaced with realistic surrogates automatically and then read over and corrected manually. The resulting corpus is the first of its kind made available for de-identification research. This corpus was first used for the 2014 i2b2/UTHealth shared task, during which the systems achieved a mean F-measure of 0.872 and a maximum F-measure of 0.964 using entity-based micro-averaged evaluations. Copyright © 2015 Elsevier Inc. All rights reserved.
An ontology-based annotation of cardiac implantable electronic devices to detect therapy changes in a national registry.

PubMed

Rosier, Arnaud; Mabo, Philippe; Chauvin, Michel; Burgun, Anita

2015-05-01

The patient population benefitting from cardiac implantable electronic devices (CIEDs) is increasing. This study introduces a device annotation method that supports the consistent description of the functional attributes of cardiac devices and evaluates how this method can detect device changes from a CIED registry. We designed the Cardiac Device Ontology, an ontology of CIEDs and device functions. We annotated 146 cardiac devices with this ontology and used it to detect therapy changes with respect to atrioventricular pacing, cardiac resynchronization therapy, and defibrillation capability in a French national registry of patients with implants (STIDEFIX). We then analyzed a set of 6905 device replacements from the STIDEFIX registry. Ontology-based identification of therapy changes (upgraded, downgraded, or similar) was accurate (6905 cases) and performed better than straightforward analysis of the registry codes (F-measure 1.00 versus 0.75 to 0.97). This study demonstrates the feasibility and effectiveness of ontology-based functional annotation of devices in the cardiac domain. Such annotation allowed a better description and in-depth analysis of STIDEFIX. This method was useful for the automatic detection of therapy changes and may be reused for analyzing data from other device registries.
Interactive Classification of Construction Materials: Feedback Driven Framework for Annotation and Analysis of 3d Point Clouds

NASA Astrophysics Data System (ADS)

Hess, M. R.; Petrovic, V.; Kuester, F.

2017-08-01

Digital documentation of cultural heritage structures is increasingly more common through the application of different imaging techniques. Many works have focused on the application of laser scanning and photogrammetry techniques for the acquisition of threedimensional (3D) geometry detailing cultural heritage sites and structures. With an abundance of these 3D data assets, there must be a digital environment where these data can be visualized and analyzed. Presented here is a feedback driven visualization framework that seamlessly enables interactive exploration and manipulation of massive point cloud data. The focus of this work is on the classification of different building materials with the goal of building more accurate as-built information models of historical structures. User defined functions have been tested within the interactive point cloud visualization framework to evaluate automated and semi-automated classification of 3D point data. These functions include decisions based on observed color, laser intensity, normal vector or local surface geometry. Multiple case studies are presented here to demonstrate the flexibility and utility of the presented point cloud visualization framework to achieve classification objectives.
Cloud-Based Evaluation of Anatomical Structure Segmentation and Landmark Detection Algorithms: VISCERAL Anatomy Benchmarks.

PubMed

Jimenez-Del-Toro, Oscar; Muller, Henning; Krenn, Markus; Gruenberg, Katharina; Taha, Abdel Aziz; Winterstein, Marianne; Eggel, Ivan; Foncubierta-Rodriguez, Antonio; Goksel, Orcun; Jakab, Andras; Kontokotsios, Georgios; Langs, Georg; Menze, Bjoern H; Salas Fernandez, Tomas; Schaer, Roger; Walleyo, Anna; Weber, Marc-Andre; Dicente Cid, Yashin; Gass, Tobias; Heinrich, Mattias; Jia, Fucang; Kahl, Fredrik; Kechichian, Razmig; Mai, Dominic; Spanier, Assaf B; Vincent, Graham; Wang, Chunliang; Wyeth, Daniel; Hanbury, Allan

2016-11-01

Variations in the shape and appearance of anatomical structures in medical images are often relevant radiological signs of disease. Automatic tools can help automate parts of this manual process. A cloud-based evaluation framework is presented in this paper including results of benchmarking current state-of-the-art medical imaging algorithms for anatomical structure segmentation and landmark detection: the VISCERAL Anatomy benchmarks. The algorithms are implemented in virtual machines in the cloud where participants can only access the training data and can be run privately by the benchmark administrators to objectively compare their performance in an unseen common test set. Overall, 120 computed tomography and magnetic resonance patient volumes were manually annotated to create a standard Gold Corpus containing a total of 1295 structures and 1760 landmarks. Ten participants contributed with automatic algorithms for the organ segmentation task, and three for the landmark localization task. Different algorithms obtained the best scores in the four available imaging modalities and for subsets of anatomical structures. The annotation framework, resulting data set, evaluation setup, results and performance analysis from the three VISCERAL Anatomy benchmarks are presented in this article. Both the VISCERAL data set and Silver Corpus generated with the fusion of the participant algorithms on a larger set of non-manually-annotated medical images are available to the research community.

Automatic blood detection in capsule endoscopy video

NASA Astrophysics Data System (ADS)

Novozámský, Adam; Flusser, Jan; Tachecí, Ilja; Sulík, Lukáš; Bureš, Jan; Krejcar, Ondřej

2016-12-01

We propose two automatic methods for detecting bleeding in wireless capsule endoscopy videos of the small intestine. The first one uses solely the color information, whereas the second one incorporates the assumptions about the blood spot shape and size. The original idea is namely the definition of a new color space that provides good separability of blood pixels and intestinal wall. Both methods can be applied either individually or their results can be fused together for the final decision. We evaluate their individual performance and various fusion rules on real data, manually annotated by an endoscopist.
A method for automatically extracting infectious disease-related primers and probes from the literature

PubMed Central

2010-01-01

Background Primer and probe sequences are the main components of nucleic acid-based detection systems. Biologists use primers and probes for different tasks, some related to the diagnosis and prescription of infectious diseases. The biological literature is the main information source for empirically validated primer and probe sequences. Therefore, it is becoming increasingly important for researchers to navigate this important information. In this paper, we present a four-phase method for extracting and annotating primer/probe sequences from the literature. These phases are: (1) convert each document into a tree of paper sections, (2) detect the candidate sequences using a set of finite state machine-based recognizers, (3) refine problem sequences using a rule-based expert system, and (4) annotate the extracted sequences with their related organism/gene information. Results We tested our approach using a test set composed of 297 manuscripts. The extracted sequences and their organism/gene annotations were manually evaluated by a panel of molecular biologists. The results of the evaluation show that our approach is suitable for automatically extracting DNA sequences, achieving precision/recall rates of 97.98% and 95.77%, respectively. In addition, 76.66% of the detected sequences were correctly annotated with their organism name. The system also provided correct gene-related information for 46.18% of the sequences assigned a correct organism name. Conclusions We believe that the proposed method can facilitate routine tasks for biomedical researchers using molecular methods to diagnose and prescribe different infectious diseases. In addition, the proposed method can be expanded to detect and extract other biological sequences from the literature. The extracted information can also be used to readily update available primer/probe databases or to create new databases from scratch. PMID:20682041
Toward automated biochemotype annotation for large compound libraries.

PubMed

Chen, Xian; Liang, Yizeng; Xu, Jun

2006-08-01

Combinatorial chemistry allows scientists to probe large synthetically accessible chemical space. However, identifying the sub-space which is selectively associated with an interested biological target, is crucial to drug discovery and life sciences. This paper describes a process to automatically annotate biochemotypes of compounds in a library and thus to identify bioactivity related chemotypes (biochemotypes) from a large library of compounds. The process consists of two steps: (1) predicting all possible bioactivities for each compound in a library, and (2) deriving possible biochemotypes based on predictions. The Prediction of Activity Spectra for Substances program (PASS) was used in the first step. In second step, structural similarity and scaffold-hopping technologies are employed. These technologies are used to derive biochemotypes from bioactivity predictions and the corresponding annotated biochemotypes from MDL Drug Data Report (MDDR) database. About a one million (982,889) commercially available compound library (CACL) has been tested using this process. This paper demonstrates the feasibility of automatically annotating biochemotypes for large libraries of compounds. Nevertheless, some issues need to be considered in order to improve the process. First, the prediction accuracy of PASS program has no significant correlation with the number of compounds in a training set. Larger training sets do not necessarily increase the maximal error of prediction (MEP), nor do they increase the hit structural diversity. Smaller training sets do not necessarily decrease MEP, nor do they decrease the hit structural diversity. Second, the success of systematic bioactivity prediction relies on modeling, training data, and the definition of bioactivities (biochemotype ontology). Unfortunately, the biochemotype ontology was not well developed in the PASS program. Consequently, "ill-defined" bioactivities can reduce the quality of predictions. This paper suggests the ways in which the systematic bioactivities prediction program should be improved.
Design and implementation of a database for Brucella melitensis genome annotation.

PubMed

De Hertogh, Benoît; Lahlimi, Leïla; Lambert, Christophe; Letesson, Jean-Jacques; Depiereux, Eric

2008-03-18

The genome sequences of three Brucella biovars and of some species close to Brucella sp. have become available, leading to new relationship analysis. Moreover, the automatic genome annotation of the pathogenic bacteria Brucella melitensis has been manually corrected by a consortium of experts, leading to 899 modifications of start sites predictions among the 3198 open reading frames (ORFs) examined. This new annotation, coupled with the results of automatic annotation tools of the complete genome sequences of the B. melitensis genome (including BLASTs to 9 genomes close to Brucella), provides numerous data sets related to predicted functions, biochemical properties and phylogenic comparisons. To made these results available, alphaPAGe, a functional auto-updatable database of the corrected sequence genome of B. melitensis, has been built, using the entity-relationship (ER) approach and a multi-purpose database structure. A friendly graphical user interface has been designed, and users can carry out different kinds of information by three levels of queries: (1) the basic search use the classical keywords or sequence identifiers; (2) the original advanced search engine allows to combine (by using logical operators) numerous criteria: (a) keywords (textual comparison) related to the pCDS's function, family domains and cellular localization; (b) physico-chemical characteristics (numerical comparison) such as isoelectric point or molecular weight and structural criteria such as the nucleic length or the number of transmembrane helix (TMH); (c) similarity scores with Escherichia coli and 10 species phylogenetically close to B. melitensis; (3) complex queries can be performed by using a SQL field, which allows all queries respecting the database's structure. The database is publicly available through a Web server at the following url: http://www.fundp.ac.be/urbm/bioinfo/aPAGe.
Standards-based curation of a decade-old digital repository dataset of molecular information.

PubMed

Harvey, Matthew J; Mason, Nicholas J; McLean, Andrew; Murray-Rust, Peter; Rzepa, Henry S; Stewart, James J P

2015-01-01

The desirable curation of 158,122 molecular geometries derived from the NCI set of reference molecules together with associated properties computed using the MOPAC semi-empirical quantum mechanical method and originally deposited in 2005 into the Cambridge DSpace repository as a data collection is reported. The procedures involved in the curation included annotation of the original data using new MOPAC methods, updating the syntax of the CML documents used to express the data to ensure schema conformance and adding new metadata describing the entries together with a XML schema transformation to map the metadata schema to that used by the DataCite organisation. We have adopted a granularity model in which a DataCite persistent identifier (DOI) is created for each individual molecule to enable data discovery and data metrics at this level using DataCite tools. We recommend that the future research data management (RDM) of the scientific and chemical data components associated with journal articles (the "supporting information") should be conducted in a manner that facilitates automatic periodic curation. Graphical abstractStandards and metadata-based curation of a decade-old digital repository dataset of molecular information.
10 CFR Appendix J to Subpart B of... - Uniform Test Method for Measuring the Energy Consumption of Automatic and Semi-Automatic Clothes...

Code of Federal Regulations, 2010 CFR

2010-01-01

...'s true energy consumption characteristics as to provide materially inaccurate comparative data... clothes washers should be totally representative of the design, construction, and control system that will...
10 CFR Appendix J to Subpart B of... - Uniform Test Method for Measuring the Energy Consumption of Automatic and Semi-Automatic Clothes...

Code of Federal Regulations, 2012 CFR

2012-01-01

...'s true energy consumption characteristics as to provide materially inaccurate comparative data... clothes washers should be totally representative of the design, construction, and control system that will...
10 CFR Appendix J to Subpart B of... - Uniform Test Method for Measuring the Energy Consumption of Automatic and Semi-Automatic Clothes...

Code of Federal Regulations, 2011 CFR

2011-01-01

...'s true energy consumption characteristics as to provide materially inaccurate comparative data... clothes washers should be totally representative of the design, construction, and control system that will...
Semi-Supervised Recurrent Neural Network for Adverse Drug Reaction mention extraction.

PubMed

Gupta, Shashank; Pawar, Sachin; Ramrakhiyani, Nitin; Palshikar, Girish Keshav; Varma, Vasudeva

2018-06-13

Social media is a useful platform to share health-related information due to its vast reach. This makes it a good candidate for public-health monitoring tasks, specifically for pharmacovigilance. We study the problem of extraction of Adverse-Drug-Reaction (ADR) mentions from social media, particularly from Twitter. Medical information extraction from social media is challenging, mainly due to short and highly informal nature of text, as compared to more technical and formal medical reports. Current methods in ADR mention extraction rely on supervised learning methods, which suffer from labeled data scarcity problem. The state-of-the-art method uses deep neural networks, specifically a class of Recurrent Neural Network (RNN) which is Long-Short-Term-Memory network (LSTM). Deep neural networks, due to their large number of free parameters rely heavily on large annotated corpora for learning the end task. But in the real-world, it is hard to get large labeled data, mainly due to the heavy cost associated with the manual annotation. To this end, we propose a novel semi-supervised learning based RNN model, which can leverage unlabeled data also present in abundance on social media. Through experiments we demonstrate the effectiveness of our method, achieving state-of-the-art performance in ADR mention extraction. In this study, we tackle the problem of labeled data scarcity for Adverse Drug Reaction mention extraction from social media and propose a novel semi-supervised learning based method which can leverage large unlabeled corpus available in abundance on the web. Through empirical study, we demonstrate that our proposed method outperforms fully supervised learning based baseline which relies on large manually annotated corpus for a good performance.
A Hybrid Human-Computer Approach to the Extraction of Scientific Facts from the Literature.

PubMed

Tchoua, Roselyne B; Chard, Kyle; Audus, Debra; Qin, Jian; de Pablo, Juan; Foster, Ian

2016-01-01

A wealth of valuable data is locked within the millions of research articles published each year. Reading and extracting pertinent information from those articles has become an unmanageable task for scientists. This problem hinders scientific progress by making it hard to build on results buried in literature. Moreover, these data are loosely structured, encoded in manuscripts of various formats, embedded in different content types, and are, in general, not machine accessible. We present a hybrid human-computer solution for semi-automatically extracting scientific facts from literature. This solution combines an automated discovery, download, and extraction phase with a semi-expert crowd assembled from students to extract specific scientific facts. To evaluate our approach we apply it to a challenging molecular engineering scenario, extraction of a polymer property: the Flory-Huggins interaction parameter. We demonstrate useful contributions to a comprehensive database of polymer properties.
A multiresolution prostate representation for automatic segmentation in magnetic resonance images.

PubMed

Alvarez, Charlens; Martínez, Fabio; Romero, Eduardo

2017-04-01

Accurate prostate delineation is necessary in radiotherapy processes for concentrating the dose onto the prostate and reducing side effects in neighboring organs. Currently, manual delineation is performed over magnetic resonance imaging (MRI) taking advantage of its high soft tissue contrast property. Nevertheless, as human intervention is a consuming task with high intra- and interobserver variability rates, (semi)-automatic organ delineation tools have emerged to cope with these challenges, reducing the time spent for these tasks. This work presents a multiresolution representation that defines a novel metric and allows to segment a new prostate by combining a set of most similar prostates in a dataset. The proposed method starts by selecting the set of most similar prostates with respect to a new one using the proposed multiresolution representation. This representation characterizes the prostate through a set of salient points, extracted from a region of interest (ROI) that encloses the organ and refined using structural information, allowing to capture main relevant features of the organ boundary. Afterward, the new prostate is automatically segmented by combining the nonrigidly registered expert delineations associated to the previous selected similar prostates using a weighted patch-based strategy. Finally, the prostate contour is smoothed based on morphological operations. The proposed approach was evaluated with respect to the expert manual segmentation under a leave-one-out scheme using two public datasets, obtaining averaged Dice coefficients of 82% ± 0.07 and 83% ± 0.06, and demonstrating a competitive performance with respect to atlas-based state-of-the-art methods. The proposed multiresolution representation provides a feature space that follows a local salient point criteria and a global rule of the spatial configuration among these points to find out the most similar prostates. This strategy suggests an easy adaptation in the clinical routine, as supporting tool for annotation. © 2017 American Association of Physicists in Medicine.
To do it or to let an automatic tool do it? The priority of control over effort.

PubMed

Osiurak, François; Wagner, Clara; Djerbi, Sara; Navarro, Jordan

2013-01-01

The aim of the present study is to provide experimental data relevant to the issue of what leads humans to use automatic tools. Two answers can be offered. The first is that humans strive to minimize physical and/or cognitive effort (principle of least effort). The second is that humans tend to keep their perceived control over the environment (principle of more control). These two factors certainly play a role, but the question raised here is to what do people give priority in situations wherein both manual and automatic actions take the same time - minimizing effort or keeping perceived control? To answer that question, we built four experiments in which participants were confronted with a recurring choice between performing a task manually (physical effort) or in a semi-automatic way (cognitive effort) versus using an automatic tool that completes the task for them (no effort). In this latter condition, participants were required to follow the progression of the automatic tool step by step. Our results showed that participants favored the manual or semi-automatic condition over the automatic condition. However, when they were offered the opportunity to perform recreational tasks in parallel, the shift toward manual condition disappeared. The findings give support to the idea that people give priority to keeping control over minimizing effort.
A technology prototype system for rating therapist empathy from audio recordings in addiction counseling.

PubMed

Xiao, Bo; Huang, Chewei; Imel, Zac E; Atkins, David C; Georgiou, Panayiotis; Narayanan, Shrikanth S

2016-04-01

Scaling up psychotherapy services such as for addiction counseling is a critical societal need. One challenge is ensuring quality of therapy, due to the heavy cost of manual observational assessment. This work proposes a speech technology-based system to automate the assessment of therapist empathy-a key therapy quality index-from audio recordings of the psychotherapy interactions. We designed a speech processing system that includes voice activity detection and diarization modules, and an automatic speech recognizer plus a speaker role matching module to extract the therapist's language cues. We employed Maximum Entropy models, Maximum Likelihood language models, and a Lattice Rescoring method to characterize high vs. low empathic language. We estimated therapy-session level empathy codes using utterance level evidence obtained from these models. Our experiments showed that the fully automated system achieved a correlation of 0.643 between expert annotated empathy codes and machine-derived estimations, and an accuracy of 81% in classifying high vs. low empathy, in comparison to a 0.721 correlation and 86% accuracy in the oracle setting using manual transcripts. The results show that the system provides useful information that can contribute to automatic quality insurance and therapist training.
A technology prototype system for rating therapist empathy from audio recordings in addiction counseling

PubMed Central

Xiao, Bo; Huang, Chewei; Imel, Zac E.; Atkins, David C.; Georgiou, Panayiotis; Narayanan, Shrikanth S.

2016-01-01

Scaling up psychotherapy services such as for addiction counseling is a critical societal need. One challenge is ensuring quality of therapy, due to the heavy cost of manual observational assessment. This work proposes a speech technology-based system to automate the assessment of therapist empathy—a key therapy quality index—from audio recordings of the psychotherapy interactions. We designed a speech processing system that includes voice activity detection and diarization modules, and an automatic speech recognizer plus a speaker role matching module to extract the therapist's language cues. We employed Maximum Entropy models, Maximum Likelihood language models, and a Lattice Rescoring method to characterize high vs. low empathic language. We estimated therapy-session level empathy codes using utterance level evidence obtained from these models. Our experiments showed that the fully automated system achieved a correlation of 0.643 between expert annotated empathy codes and machine-derived estimations, and an accuracy of 81% in classifying high vs. low empathy, in comparison to a 0.721 correlation and 86% accuracy in the oracle setting using manual transcripts. The results show that the system provides useful information that can contribute to automatic quality insurance and therapist training. PMID:28286867
A challenging issue: Detection of white matter hyperintensities in neonatal brain MRI.

PubMed

Morel, Baptiste; Yongchao Xu; Virzi, Alessio; Geraud, Thierry; Adamsbaum, Catherine; Bloch, Isabelle

2016-08-01

The progress of magnetic resonance imaging (MRI) allows for a precise exploration of the brain of premature infants at term equivalent age. The so-called DEHSI (diffuse excessive high signal intensity) of the white matter of premature brains remains a challenging issue in terms of definition, and thus of interpretation. We propose a semi-automatic detection and quantification method of white matter hyperintensities in MRI relying on morphological operators and max-tree representations, which constitutes a powerful tool to help radiologists to improve their interpretation. Results show better reproducibility and robustness than interactive segmentation.
Incorporating the whole-mount prostate histology reconstruction program Histostitcher into the extensible imaging platform (XIP) framework

NASA Astrophysics Data System (ADS)

Toth, Robert; Chappelow, Jonathan; Vetter, Christoph; Kutter, Oliver; Russ, Christoph; Feldman, Michael; Tomaszewski, John; Shih, Natalie; Madabhushi, Anant

2012-03-01

There is a need for identifying quantitative imaging (e.g. MRI) signatures for prostate cancer (CaP), so that computer-aided diagnostic methods can be trained to detect disease extent in vivo. Determining CaP extent on in vivo MRI is difficult to do; however, with the availability of ex vivo surgical whole mount histological sections (WMHS) for CaP patients undergoing radical prostatectomy, co-registration methods can be applied to align and map disease extent onto pre-operative MR imaging from the post-operative histology. Yet obtaining digitized images of WHMS for co-registration with the pre-operative MRI is cumbersome since (a) most digital slide scanners are unable to accommodate the entire section, and (b) significant technical expertise is required for whole mount slide preparation. Consequently, most centers opt to construct quartered sections of each histology slice. Prior to co-registration with MRI, however, these quartered sections need to be digitally stitched together to reconstitute a digital, pseudo WMHS. Histostitcheris an interactive software program that uses semi-automatic registration tools to digitally stitch quartered sections into pseudo WMHS. Histostitcherwas originally developed using the GUI tools provided by the Matlab programming interface, but the clinical use was limited due to the inefficiency of the interface. The limitations of the Matlab based GUI include (a) an inability to edit the fiducials, (b) the rendering being extremely slow, and (c) lack of interactive and rapid visualization tools. In this work, Histostitcherhas been integrated into the eXtensible Imaging Platform (XIP TM ) framework (a set of libraries containing functionalities for analyzing and visualizing medical image data). XIP TM lends the stitching tool much greater flexibility and functionality by (a) allowing interactive and seamless navigation through the full resolution histology images, (b) the ability to easily add, edit, or remove fiducials and annotations in order to register the quadrants and map the disease extent. In this work, we showcase examples of digital stitching of quartered histological sections into pseudo-WHMS using Histostitcher via the new XIP TM interface. This tool will be particularly useful in clinical trials and large cohort studies where a quick, interactive way of digitally reconstructing pseudo WMHS is required.
GarlicESTdb: an online database and mining tool for garlic EST sequences.

PubMed

Kim, Dae-Won; Jung, Tae-Sung; Nam, Seong-Hyeuk; Kwon, Hyuk-Ryul; Kim, Aeri; Chae, Sung-Hwa; Choi, Sang-Haeng; Kim, Dong-Wook; Kim, Ryong Nam; Park, Hong-Seog

2009-05-18

Allium sativum., commonly known as garlic, is a species in the onion genus (Allium), which is a large and diverse one containing over 1,250 species. Its close relatives include chives, onion, leek and shallot. Garlic has been used throughout recorded history for culinary, medicinal use and health benefits. Currently, the interest in garlic is highly increasing due to nutritional and pharmaceutical value including high blood pressure and cholesterol, atherosclerosis and cancer. For all that, there are no comprehensive databases available for Expressed Sequence Tags(EST) of garlic for gene discovery and future efforts of genome annotation. That is why we developed a new garlic database and applications to enable comprehensive analysis of garlic gene expression. GarlicESTdb is an integrated database and mining tool for large-scale garlic (Allium sativum) EST sequencing. A total of 21,595 ESTs collected from an in-house cDNA library were used to construct the database. The analysis pipeline is an automated system written in JAVA and consists of the following components: automatic preprocessing of EST reads, assembly of raw sequences, annotation of the assembled sequences, storage of the analyzed information into MySQL databases, and graphic display of all processed data. A web application was implemented with the latest J2EE (Java 2 Platform Enterprise Edition) software technology (JSP/EJB/JavaServlet) for browsing and querying the database, for creation of dynamic web pages on the client side, and for mapping annotated enzymes to KEGG pathways, the AJAX framework was also used partially. The online resources, such as putative annotation, single nucleotide polymorphisms (SNP) and tandem repeat data sets, can be searched by text, explored on the website, searched using BLAST, and downloaded. To archive more significant BLAST results, a curation system was introduced with which biologists can easily edit best-hit annotation information for others to view. The GarlicESTdb web application is freely available at http://garlicdb.kribb.re.kr. GarlicESTdb is the first incorporated online information database of EST sequences isolated from garlic that can be freely accessed and downloaded. It has many useful features for interactive mining of EST contigs and datasets from each library, including curation of annotated information, expression profiling, information retrieval, and summary of statistics of functional annotation. Consequently, the development of GarlicESTdb will provide a crucial contribution to biologists for data-mining and more efficient experimental studies.
Mining of relations between proteins over biomedical scientific literature using a deep-linguistic approach.

PubMed

Rinaldi, Fabio; Schneider, Gerold; Kaljurand, Kaarel; Hess, Michael; Andronis, Christos; Konstandi, Ourania; Persidis, Andreas

2007-02-01

The amount of new discoveries (as published in the scientific literature) in the biomedical area is growing at an exponential rate. This growth makes it very difficult to filter the most relevant results, and thus the extraction of the core information becomes very expensive. Therefore, there is a growing interest in text processing approaches that can deliver selected information from scientific publications, which can limit the amount of human intervention normally needed to gather those results. This paper presents and evaluates an approach aimed at automating the process of extracting functional relations (e.g. interactions between genes and proteins) from scientific literature in the biomedical domain. The approach, using a novel dependency-based parser, is based on a complete syntactic analysis of the corpus. We have implemented a state-of-the-art text mining system for biomedical literature, based on a deep-linguistic, full-parsing approach. The results are validated on two different corpora: the manually annotated genomics information access (GENIA) corpus and the automatically annotated arabidopsis thaliana circadian rhythms (ATCR) corpus. We show how a deep-linguistic approach (contrary to common belief) can be used in a real world text mining application, offering high-precision relation extraction, while at the same time retaining a sufficient recall.
Integrating concept ontology and multitask learning to achieve more effective classifier training for multilevel image annotation.

PubMed

Fan, Jianping; Gao, Yuli; Luo, Hangzai

2008-03-01

In this paper, we have developed a new scheme for achieving multilevel annotations of large-scale images automatically. To achieve more sufficient representation of various visual properties of the images, both the global visual features and the local visual features are extracted for image content representation. To tackle the problem of huge intraconcept visual diversity, multiple types of kernels are integrated to characterize the diverse visual similarity relationships between the images more precisely, and a multiple kernel learning algorithm is developed for SVM image classifier training. To address the problem of huge interconcept visual similarity, a novel multitask learning algorithm is developed to learn the correlated classifiers for the sibling image concepts under the same parent concept and enhance their discrimination and adaptation power significantly. To tackle the problem of huge intraconcept visual diversity for the image concepts at the higher levels of the concept ontology, a novel hierarchical boosting algorithm is developed to learn their ensemble classifiers hierarchically. In order to assist users on selecting more effective hypotheses for image classifier training, we have developed a novel hyperbolic framework for large-scale image visualization and interactive hypotheses assessment. Our experiments on large-scale image collections have also obtained very positive results.
Using video-annotation software to identify interactions in group therapies for schizophrenia: assessing reliability and associations with outcomes.

PubMed

Orfanos, Stavros; Akther, Syeda Ferhana; Abdul-Basit, Muhammad; McCabe, Rosemarie; Priebe, Stefan

2017-02-10

Research has shown that interactions in group therapies for people with schizophrenia are associated with a reduction in negative symptoms. However, it is unclear which specific interactions in groups are linked with these improvements. The aims of this exploratory study were to i) develop and test the reliability of using video-annotation software to measure interactions in group therapies in schizophrenia and ii) explore the relationship between interactions in group therapies for schizophrenia with clinically relevant changes in negative symptoms. Video-annotation software was used to annotate interactions from participants selected across nine video-recorded out-patient therapy groups (N = 81). Using the Individual Group Member Interpersonal Process Scale, interactions were coded from participants who demonstrated either a clinically significant improvement (N = 9) or no change (N = 8) in negative symptoms at the end of therapy. Interactions were measured from the first and last sessions of attendance (>25 h of therapy). Inter-rater reliability between two independent raters was measured. Binary logistic regression analysis was used to explore the association between the frequency of interactive behaviors and changes in negative symptoms, assessed using the Positive and Negative Syndrome Scale. Of the 1275 statements that were annotated using ELAN, 1191 (93%) had sufficient audio and visual quality to be coded using the Individual Group Member Interpersonal Process Scale. Rater-agreement was high across all interaction categories (>95% average agreement). A higher frequency of self-initiated statements measured in the first session was associated with improvements in negative symptoms. The frequency of questions and giving advice measured in the first session of attendance was associated with improvements in negative symptoms; although this was only a trend. Video-annotation software can be used to reliably identify interactive behaviors in groups for schizophrenia. The results suggest that proactive communicative gestures, as assessed by the video-analysis, predict outcomes. Future research should use this novel method in larger and clinically different samples to explore which aspects of therapy facilitate such proactive communication early on in therapy.

Validation of a semi-automatic protocol for the assessment of the tear meniscus central area based on open-source software

NASA Astrophysics Data System (ADS)

Pena-Verdeal, Hugo; Garcia-Resua, Carlos; Yebra-Pimentel, Eva; Giraldez, Maria J.

2017-08-01

Purpose: Different lower tear meniscus parameters can be clinical assessed on dry eye diagnosis. The aim of this study was to propose and analyse the variability of a semi-automatic method for measuring lower tear meniscus central area (TMCA) by using open source software. Material and methods: On a group of 105 subjects, one video of the lower tear meniscus after fluorescein instillation was generated by a digital camera attached to a slit-lamp. A short light beam (3x5 mm) with moderate illumination in the central portion of the meniscus (6 o'clock) was used. Images were extracted from each video by a masked observer. By using an open source software based on Java (NIH ImageJ), a further observer measured in a masked and randomized order the TMCA in the short light beam illuminated area by two methods: (1) manual method, where TMCA images was "manually" measured; (2) semi-automatic method, where TMCA images were transformed in an 8-bit-binary image, then holes inside this shape were filled and on the isolated shape, the area size was obtained. Finally, both measurements, manual and semi-automatic, were compared. Results: Paired t-test showed no statistical difference between both techniques results (p = 0.102). Pearson correlation between techniques show a significant positive near to perfect correlation (r = 0.99; p < 0.001). Conclusions: This study showed a useful tool to objectively measure the frontal central area of the meniscus in photography by free open source software.
Semi-automatic mapping of cultural heritage from airborne laser scanning using deep learning

NASA Astrophysics Data System (ADS)

Due Trier, Øivind; Salberg, Arnt-Børre; Holger Pilø, Lars; Tonning, Christer; Marius Johansen, Hans; Aarsten, Dagrun

2016-04-01

This paper proposes to use deep learning to improve semi-automatic mapping of cultural heritage from airborne laser scanning (ALS) data. Automatic detection methods, based on traditional pattern recognition, have been applied in a number of cultural heritage mapping projects in Norway for the past five years. Automatic detection of pits and heaps have been combined with visual interpretation of the ALS data for the mapping of deer hunting systems, iron production sites, grave mounds and charcoal kilns. However, the performance of the automatic detection methods varies substantially between ALS datasets. For the mapping of deer hunting systems on flat gravel and sand sediment deposits, the automatic detection results were almost perfect. However, some false detections appeared in the terrain outside of the sediment deposits. These could be explained by other pit-like landscape features, like parts of river courses, spaces between boulders, and modern terrain modifications. However, these were easy to spot during visual interpretation, and the number of missed individual pitfall traps was still low. For the mapping of grave mounds, the automatic method produced a large number of false detections, reducing the usefulness of the semi-automatic approach. The mound structure is a very common natural terrain feature, and the grave mounds are less distinct in shape than the pitfall traps. Still, applying automatic mound detection on an entire municipality did lead to a new discovery of an Iron Age grave field with more than 15 individual mounds. Automatic mound detection also proved to be useful for a detailed re-mapping of Norway's largest Iron Age grave yard, which contains almost 1000 individual graves. Combined pit and mound detection has been applied to the mapping of more than 1000 charcoal kilns that were used by an iron work 350-200 years ago. The majority of charcoal kilns were indirectly detected as either pits on the circumference, a central mound, or both. However, kilns with a flat interior and a shallow ditch along the circumference were often missed by the automatic detection method. The successfulness of automatic detection seems to depend on two factors: (1) the density of ALS ground hits on the cultural heritage structures being sought, and (2) to what extent these structures stand out from natural terrain structures. The first factor may, to some extent, be improved by using a higher number of ALS pulses per square meter. The second factor is difficult to change, and also highlights another challenge: how to make a general automatic method that is applicable in all types of terrain within a country. The mixed experience with traditional pattern recognition for semi-automatic mapping of cultural heritage led us to consider deep learning as an alternative approach. The main principle is that a general feature detector has been trained on a large image database. The feature detector is then tailored to a specific task by using a modest number of images of true and false examples of the features being sought. Results of using deep learning are compared with previous results using traditional pattern recognition.
NCBI prokaryotic genome annotation pipeline.

PubMed

Tatusova, Tatiana; DiCuccio, Michael; Badretdin, Azat; Chetvernin, Vyacheslav; Nawrocki, Eric P; Zaslavsky, Leonid; Lomsadze, Alexandre; Pruitt, Kim D; Borodovsky, Mark; Ostell, James

2016-08-19

Recent technological advances have opened unprecedented opportunities for large-scale sequencing and analysis of populations of pathogenic species in disease outbreaks, as well as for large-scale diversity studies aimed at expanding our knowledge across the whole domain of prokaryotes. To meet the challenge of timely interpretation of structure, function and meaning of this vast genetic information, a comprehensive approach to automatic genome annotation is critically needed. In collaboration with Georgia Tech, NCBI has developed a new approach to genome annotation that combines alignment based methods with methods of predicting protein-coding and RNA genes and other functional elements directly from sequence. A new gene finding tool, GeneMarkS+, uses the combined evidence of protein and RNA placement by homology as an initial map of annotation to generate and modify ab initio gene predictions across the whole genome. Thus, the new NCBI's Prokaryotic Genome Annotation Pipeline (PGAP) relies more on sequence similarity when confident comparative data are available, while it relies more on statistical predictions in the absence of external evidence. The pipeline provides a framework for generation and analysis of annotation on the full breadth of prokaryotic taxonomy. For additional information on PGAP see https://www.ncbi.nlm.nih.gov/genome/annotation_prok/ and the NCBI Handbook, https://www.ncbi.nlm.nih.gov/books/NBK174280/. Published by Oxford University Press on behalf of Nucleic Acids Research 2016. This work is written by (a) US Government employee(s) and is in the public domain in the US.
Automated contour detection in X-ray left ventricular angiograms using multiview active appearance models and dynamic programming.

PubMed

Oost, Elco; Koning, Gerhard; Sonka, Milan; Oemrawsingh, Pranobe V; Reiber, Johan H C; Lelieveldt, Boudewijn P F

2006-09-01

This paper describes a new approach to the automated segmentation of X-ray left ventricular (LV) angiograms, based on active appearance models (AAMs) and dynamic programming. A coupling of shape and texture information between the end-diastolic (ED) and end-systolic (ES) frame was achieved by constructing a multiview AAM. Over-constraining of the model was compensated for by employing dynamic programming, integrating both intensity and motion features in the cost function. Two applications are compared: a semi-automatic method with manual model initialization, and a fully automatic algorithm. The first proved to be highly robust and accurate, demonstrating high clinical relevance. Based on experiments involving 70 patient data sets, the algorithm's success rate was 100% for ED and 99% for ES, with average unsigned border positioning errors of 0.68 mm for ED and 1.45 mm for ES. Calculated volumes were accurate and unbiased. The fully automatic algorithm, with intrinsically less user interaction was less robust, but showed a high potential, mostly due to a controlled gradient descent in updating the model parameters. The success rate of the fully automatic method was 91% for ED and 83% for ES, with average unsigned border positioning errors of 0.79 mm for ED and 1.55 mm for ES.
Exploring Plant Co-Expression and Gene-Gene Interactions with CORNET 3.0.

PubMed

Van Bel, Michiel; Coppens, Frederik

2017-01-01

Selecting and filtering a reference expression and interaction dataset when studying specific pathways and regulatory interactions can be a very time-consuming and error-prone task. In order to reduce the duplicated efforts required to amass such datasets, we have created the CORNET (CORrelation NETworks) platform which allows for easy access to a wide variety of data types: coexpression data, protein-protein interactions, regulatory interactions, and functional annotations. The CORNET platform outputs its results in either text format or through the Cytoscape framework, which is automatically launched by the CORNET website.CORNET 3.0 is the third iteration of the web platform designed for the user exploration of the coexpression space of plant genomes, with a focus on the model species Arabidopsis thaliana. Here we describe the platform: the tools, data, and best practices when using the platform. We indicate how the platform can be used to infer networks from a set of input genes, such as upregulated genes from an expression experiment. By exploring the network, new target and regulator genes can be discovered, allowing for follow-up experiments and more in-depth study. We also indicate how to avoid common pitfalls when evaluating the networks and how to avoid over interpretation of the results.All CORNET versions are available at http://bioinformatics.psb.ugent.be/cornet/ .
Semi-automated CCTV surveillance: the effects of system confidence, system accuracy and task complexity on operator vigilance, reliance and workload.

PubMed

Dadashi, N; Stedmon, A W; Pridmore, T P

2013-09-01

Recent advances in computer vision technology have lead to the development of various automatic surveillance systems, however their effectiveness is adversely affected by many factors and they are not completely reliable. This study investigated the potential of a semi-automated surveillance system to reduce CCTV operator workload in both detection and tracking activities. A further focus of interest was the degree of user reliance on the automated system. A simulated prototype was developed which mimicked an automated system that provided different levels of system confidence information. Dependent variable measures were taken for secondary task performance, reliance and subjective workload. When the automatic component of a semi-automatic CCTV surveillance system provided reliable system confidence information to operators, workload significantly decreased and spare mental capacity significantly increased. Providing feedback about system confidence and accuracy appears to be one important way of making the status of the automated component of the surveillance system more 'visible' to users and hence more effective to use. Copyright © 2012 Elsevier Ltd and The Ergonomics Society. All rights reserved.
Literature mining of protein-residue associations with graph rules learned through distant supervision.

PubMed

Ravikumar, Ke; Liu, Haibin; Cohn, Judith D; Wall, Michael E; Verspoor, Karin

2012-10-05

We propose a method for automatic extraction of protein-specific residue mentions from the biomedical literature. The method searches text for mentions of amino acids at specific sequence positions and attempts to correctly associate each mention with a protein also named in the text. The methods presented in this work will enable improved protein functional site extraction from articles, ultimately supporting protein function prediction. Our method made use of linguistic patterns for identifying the amino acid residue mentions in text. Further, we applied an automated graph-based method to learn syntactic patterns corresponding to protein-residue pairs mentioned in the text. We finally present an approach to automated construction of relevant training and test data using the distant supervision model. The performance of the method was assessed by extracting protein-residue relations from a new automatically generated test set of sentences containing high confidence examples found using distant supervision. It achieved a F-measure of 0.84 on automatically created silver corpus and 0.79 on a manually annotated gold data set for this task, outperforming previous methods. The primary contributions of this work are to (1) demonstrate the effectiveness of distant supervision for automatic creation of training data for protein-residue relation extraction, substantially reducing the effort and time involved in manual annotation of a data set and (2) show that the graph-based relation extraction approach we used generalizes well to the problem of protein-residue association extraction. This work paves the way towards effective extraction of protein functional residues from the literature.
Interactive Tree Of Life v2: online annotation and display of phylogenetic trees made easy.

PubMed

Letunic, Ivica; Bork, Peer

2011-07-01

Interactive Tree Of Life (http://itol.embl.de) is a web-based tool for the display, manipulation and annotation of phylogenetic trees. It is freely available and open to everyone. In addition to classical tree viewer functions, iTOL offers many novel ways of annotating trees with various additional data. Current version introduces numerous new features and greatly expands the number of supported data set types. Trees can be interactively manipulated and edited. A free personal account system is available, providing management and sharing of trees in user defined workspaces and projects. Export to various bitmap and vector graphics formats is supported. Batch access interface is available for programmatic access or inclusion of interactive trees into other web services.
Automatic, semi-automatic and manual validation of urban drainage data.

PubMed

Branisavljević, N; Prodanović, D; Pavlović, D

2010-01-01

Advances in sensor technology and the possibility of automated long distance data transmission have made continuous measurements the preferable way of monitoring urban drainage processes. Usually, the collected data have to be processed by an expert in order to detect and mark the wrong data, remove them and replace them with interpolated data. In general, the first step in detecting the wrong, anomaly data is called the data quality assessment or data validation. Data validation consists of three parts: data preparation, validation scores generation and scores interpretation. This paper will present the overall framework for the data quality improvement system, suitable for automatic, semi-automatic or manual operation. The first two steps of the validation process are explained in more detail, using several validation methods on the same set of real-case data from the Belgrade sewer system. The final part of the validation process, which is the scores interpretation, needs to be further investigated on the developed system.
Semi-automatic brain tumor segmentation by constrained MRFs using structural trajectories.

PubMed

Zhao, Liang; Wu, Wei; Corso, Jason J

2013-01-01

Quantifying volume and growth of a brain tumor is a primary prognostic measure and hence has received much attention in the medical imaging community. Most methods have sought a fully automatic segmentation, but the variability in shape and appearance of brain tumor has limited their success and further adoption in the clinic. In reaction, we present a semi-automatic brain tumor segmentation framework for multi-channel magnetic resonance (MR) images. This framework does not require prior model construction and only requires manual labels on one automatically selected slice. All other slices are labeled by an iterative multi-label Markov random field optimization with hard constraints. Structural trajectories-the medical image analog to optical flow and 3D image over-segmentation are used to capture pixel correspondences between consecutive slices for pixel labeling. We show robustness and effectiveness through an evaluation on the 2012 MICCAI BRATS Challenge Dataset; our results indicate superior performance to baselines and demonstrate the utility of the constrained MRF formulation.
Chemical Principles Revisited: Annotating Reaction Equations.

ERIC Educational Resources Information Center

Tykodi, R. J.

1987-01-01

Urges chemistry teachers to have students annotate the chemical reactions in aqueous-solutions that they see in their textbooks and witness in the laboratory. Suggests this will help students recognize the reaction type more readily. Examples are given for gas formation, precipitate formation, redox interaction, acid-base interaction, and…
Comparative Omics-Driven Genome Annotation Refinement: Application across Yersiniae

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rutledge, Alexandra C.; Jones, Marcus B.; Chauhan, Sadhana

2012-03-27

Genome sequencing continues to be a rapidly evolving technology, yet most downstream aspects of genome annotation pipelines remain relatively stable or are even being abandoned. To date, the perceived value of manual curation for genome annotations is not offset by the real cost and time associated with the process. In order to balance the large number of sequences generated, the annotation process is now performed almost exclusively in an automated fashion for most genome sequencing projects. One possible way to reduce errors inherent to automated computational annotations is to apply data from 'omics' measurements (i.e. transcriptional and proteomic) to themore » un-annotated genome with a proteogenomic-based approach. This approach does require additional experimental and bioinformatics methods to include omics technologies; however, the approach is readily automatable and can benefit from rapid developments occurring in those research domains as well. The annotation process can be improved by experimental validation of transcription and translation and aid in the discovery of annotation errors. Here the concept of annotation refinement has been extended to include a comparative assessment of genomes across closely related species, as is becoming common in sequencing efforts. Transcriptomic and proteomic data derived from three highly similar pathogenic Yersiniae (Y. pestis CO92, Y. pestis pestoides F, and Y. pseudotuberculosis PB1/+) was used to demonstrate a comprehensive comparative omic-based annotation methodology. Peptide and oligo measurements experimentally validated the expression of nearly 40% of each strain's predicted proteome and revealed the identification of 28 novel and 68 previously incorrect protein-coding sequences (e.g., observed frameshifts, extended start sites, and translated pseudogenes) within the three current Yersinia genome annotations. Gene loss is presumed to play a major role in Y. pestis acquiring its niche as a virulent pathogen, thus the discovery of many translated pseudogenes underscores a need for functional analyses to investigate hypotheses related to divergence. Refinements included the discovery of a seemingly essential ribosomal protein, several virulence-associated factors, and a transcriptional regulator, among other proteins, most of which are annotated as hypothetical, that were missed during annotation.« less
Annotation-Based Learner's Personality Modeling in Distance Learning Context

ERIC Educational Resources Information Center

Omheni, Nizar; Kalboussi, Anis; Mazhoud, Omar; Kacem, Ahmed Hadj

2016-01-01

Researchers in distance education are interested in observing and modeling learners' personality profiles, and adapting their learning experiences accordingly. When learners read and interact with their reading materials, they do unselfconscious activities like annotation which may be key feature of their personalities. Annotation activity…
iELM—a web server to explore short linear motif-mediated interactions

PubMed Central

Weatheritt, Robert J.; Jehl, Peter; Dinkel, Holger; Gibson, Toby J.

2012-01-01

The recent expansion in our knowledge of protein–protein interactions (PPIs) has allowed the annotation and prediction of hundreds of thousands of interactions. However, the function of many of these interactions remains elusive. The interactions of Eukaryotic Linear Motif (iELM) web server provides a resource for predicting the function and positional interface for a subset of interactions mediated by short linear motifs (SLiMs). The iELM prediction algorithm is based on the annotated SLiM classes from the Eukaryotic Linear Motif (ELM) resource and allows users to explore both annotated and user-generated PPI networks for SLiM-mediated interactions. By incorporating the annotated information from the ELM resource, iELM provides functional details of PPIs. This can be used in proteomic analysis, for example, to infer whether an interaction promotes complex formation or degradation. Furthermore, details of the molecular interface of the SLiM-mediated interactions are also predicted. This information is displayed in a fully searchable table, as well as graphically with the modular architecture of the participating proteins extracted from the UniProt and Phospho.ELM resources. A network figure is also presented to aid the interpretation of results. The iELM server supports single protein queries as well as large-scale proteomic submissions and is freely available at http://i.elm.eu.org. PMID:22638578
Development of Semi-Automatic Lathe by using Intelligent Soft Computing Technique

NASA Astrophysics Data System (ADS)

Sakthi, S.; Niresh, J.; Vignesh, K.; Anand Raj, G.

2018-03-01

This paper discusses the enhancement of conventional lathe machine to semi-automated lathe machine by implementing a soft computing method. In the present scenario, lathe machine plays a vital role in the engineering division of manufacturing industry. While the manual lathe machines are economical, the accuracy and efficiency are not up to the mark. On the other hand, CNC machine provide the desired accuracy and efficiency, but requires a huge capital. In order to over come this situation, a semi-automated approach towards the conventional lathe machine is developed by employing stepper motors to the horizontal and vertical drive, that can be controlled by Arduino UNO -microcontroller. Based on the input parameters of the lathe operation the arduino coding is been generated and transferred to the UNO board. Thus upgrading from manual to semi-automatic lathe machines can significantly increase the accuracy and efficiency while, at the same time, keeping a check on investment cost and consequently provide a much needed escalation to the manufacturing industry.
Using Ontologies to Interlink Linguistic Annotations and Improve Their Accuracy

ERIC Educational Resources Information Center

Pareja-Lora, Antonio

2016-01-01

For the new approaches to language e-learning (e.g. language blended learning, language autonomous learning or mobile-assisted language learning) to succeed, some automatic functions for error correction (for instance, in exercises) will have to be included in the long run in the corresponding environments and/or applications. A possible way to…
iLOG: A Framework for Automatic Annotation of Learning Objects with Empirical Usage Metadata

ERIC Educational Resources Information Center

Miller, L. D.; Soh, Leen-Kiat; Samal, Ashok; Nugent, Gwen

2012-01-01

Learning objects (LOs) are digital or non-digital entities used for learning, education or training commonly stored in repositories searchable by their associated metadata. Unfortunately, based on the current standards, such metadata is often missing or incorrectly entered making search difficult or impossible. In this paper, we investigate…
The Greek National Observatory of Forest Fires (NOFFi)

NASA Astrophysics Data System (ADS)

Tompoulidou, Maria; Stefanidou, Alexandra; Grigoriadis, Dionysios; Dragozi, Eleni; Stavrakoudis, Dimitris; Gitas, Ioannis Z.

2016-08-01

Efficient forest fire management is a key element for alleviating the catastrophic impacts of wildfires. Overall, the effective response to fire events necessitates adequate planning and preparedness before the start of the fire season, as well as quantifying the environmental impacts in case of wildfires. Moreover, the estimation of fire danger provides crucial information required for the optimal allocation and distribution of the available resources. The Greek National Observatory of Forest Fires (NOFFi)—established by the Greek Forestry Service in collaboration with the Laboratory of Forest Management and Remote Sensing of the Aristotle University of Thessaloniki and the International Balkan Center—aims to develop a series of modern products and services for supporting the efficient forest fire prevention management in Greece and the Balkan region, as well as to stimulate the development of transnational fire prevention and impacts mitigation policies. More specifically, NOFFi provides three main fire-related products and services: a) a remote sensing-based fuel type mapping methodology, b) a semi-automatic burned area mapping service, and c) a dynamically updatable fire danger index providing mid- to long-term predictions. The fuel type mapping methodology was developed and applied across the country, following an object-oriented approach and using Landsat 8 OLI satellite imagery. The results showcase the effectiveness of the generated methodology in obtaining highly accurate fuel type maps on a national level. The burned area mapping methodology was developed as a semi-automatic object-based classification process, carefully crafted to minimize user interaction and, hence, be easily applicable on a near real-time operational level as well as for mapping historical events. NOFFi's products can be visualized through the interactive Fire Forest portal, which allows the involvement and awareness of the relevant stakeholders via the Public Participation GIS (PPGIS) tool.
Automated prediction of protein function and detection of functional sites from structure.

PubMed

Pazos, Florencio; Sternberg, Michael J E

2004-10-12

Current structural genomics projects are yielding structures for proteins whose functions are unknown. Accordingly, there is a pressing requirement for computational methods for function prediction. Here we present PHUNCTIONER, an automatic method for structure-based function prediction using automatically extracted functional sites (residues associated to functions). The method relates proteins with the same function through structural alignments and extracts 3D profiles of conserved residues. Functional features to train the method are extracted from the Gene Ontology (GO) database. The method extracts these features from the entire GO hierarchy and hence is applicable across the whole range of function specificity. 3D profiles associated with 121 GO annotations were extracted. We tested the power of the method both for the prediction of function and for the extraction of functional sites. The success of function prediction by our method was compared with the standard homology-based method. In the zone of low sequence similarity (approximately 15%), our method assigns the correct GO annotation in 90% of the protein structures considered, approximately 20% higher than inheritance of function from the closest homologue.
Virus Particle Detection by Convolutional Neural Network in Transmission Electron Microscopy Images.

PubMed

Ito, Eisuke; Sato, Takaaki; Sano, Daisuke; Utagawa, Etsuko; Kato, Tsuyoshi

2018-06-01

A new computational method for the detection of virus particles in transmission electron microscopy (TEM) images is presented. Our approach is to use a convolutional neural network that transforms a TEM image to a probabilistic map that indicates where virus particles exist in the image. Our proposed approach automatically and simultaneously learns both discriminative features and classifier for virus particle detection by machine learning, in contrast to existing methods that are based on handcrafted features that yield many false positives and require several postprocessing steps. The detection performance of the proposed method was assessed against a dataset of TEM images containing feline calicivirus particles and compared with several existing detection methods, and the state-of-the-art performance of the developed method for detecting virus was demonstrated. Since our method is based on supervised learning that requires both the input images and their corresponding annotations, it is basically used for detection of already-known viruses. However, the method is highly flexible, and the convolutional networks can adapt themselves to any virus particles by learning automatically from an annotated dataset.

Automatic cerebrospinal fluid segmentation in non-contrast CT images using a 3D convolutional network

NASA Astrophysics Data System (ADS)

Patel, Ajay; van de Leemput, Sil C.; Prokop, Mathias; van Ginneken, Bram; Manniesing, Rashindra

2017-03-01

Segmentation of anatomical structures is fundamental in the development of computer aided diagnosis systems for cerebral pathologies. Manual annotations are laborious, time consuming and subject to human error and observer variability. Accurate quantification of cerebrospinal fluid (CSF) can be employed as a morphometric measure for diagnosis and patient outcome prediction. However, segmenting CSF in non-contrast CT images is complicated by low soft tissue contrast and image noise. In this paper we propose a state-of-the-art method using a multi-scale three-dimensional (3D) fully convolutional neural network (CNN) to automatically segment all CSF within the cranial cavity. The method is trained on a small dataset comprised of four manually annotated cerebral CT images. Quantitative evaluation of a separate test dataset of four images shows a mean Dice similarity coefficient of 0.87 +/- 0.01 and mean absolute volume difference of 4.77 +/- 2.70 %. The average prediction time was 68 seconds. Our method allows for fast and fully automated 3D segmentation of cerebral CSF in non-contrast CT, and shows promising results despite a limited amount of training data.
Automatic classification of radiological reports for clinical care.

PubMed

Gerevini, Alfonso Emilio; Lavelli, Alberto; Maffi, Alessandro; Maroldi, Roberto; Minard, Anne-Lyse; Serina, Ivan; Squassina, Guido

2018-06-07

Radiological reporting generates a large amount of free-text clinical narratives, a potentially valuable source of information for improving clinical care and supporting research. The use of automatic techniques to analyze such reports is necessary to make their content effectively available to radiologists in an aggregated form. In this paper we focus on the classification of chest computed tomography reports according to a classification schema proposed for this task by radiologists of the Italian hospital ASST Spedali Civili di Brescia. The proposed system is built exploiting a training data set containing reports annotated by radiologists. Each report is classified according to the schema developed by radiologists and textual evidences are marked in the report. The annotations are then used to train different machine learning based classifiers. We present in this paper a method based on a cascade of classifiers which make use of a set of syntactic and semantic features. The resulting system is a novel hierarchical classification system for the given task, that we have experimentally evaluated. Copyright © 2018 Elsevier B.V. All rights reserved.
Automatic detection of apical roots in oral radiographs

NASA Astrophysics Data System (ADS)

Wu, Yi; Xie, Fangfang; Yang, Jie; Cheng, Erkang; Megalooikonomou, Vasileios; Ling, Haibin

2012-03-01

The apical root regions play an important role in analysis and diagnosis of many oral diseases. Automatic detection of such regions is consequently the first step toward computer-aided diagnosis of these diseases. In this paper we propose an automatic method for periapical root region detection by using the state-of-theart machine learning approaches. Specifically, we have adapted the AdaBoost classifier for apical root detection. One challenge in the task is the lack of training cases especially for diseased ones. To handle this problem, we boost the training set by including more root regions that are close to the annotated ones and decompose the original images to randomly generate negative samples. Based on these training samples, the Adaboost algorithm in combination with Haar wavelets is utilized in this task to train an apical root detector. The learned detector usually generates a large amount of true and false positives. In order to reduce the number of false positives, a confidence score for each candidate detection result is calculated for further purification. We first merge the detected regions by combining tightly overlapped detected candidate regions and then we use the confidence scores from the Adaboost detector to eliminate the false positives. The proposed method is evaluated on a dataset containing 39 annotated digitized oral X-Ray images from 21 patients. The experimental results show that our approach can achieve promising detection accuracy.
Automatic Summarization of MEDLINE Citations for Evidence–Based Medical Treatment: A Topic-Oriented Evaluation

PubMed Central

Fiszman, Marcelo; Demner-Fushman, Dina; Kilicoglu, Halil; Rindflesch, Thomas C.

2009-01-01

As the number of electronic biomedical textual resources increases, it becomes harder for physicians to find useful answers at the point of care. Information retrieval applications provide access to databases; however, little research has been done on using automatic summarization to help navigate the documents returned by these systems. After presenting a semantic abstraction automatic summarization system for MEDLINE citations, we concentrate on evaluating its ability to identify useful drug interventions for fifty-three diseases. The evaluation methodology uses existing sources of evidence-based medicine as surrogates for a physician-annotated reference standard. Mean average precision (MAP) and a clinical usefulness score developed for this study were computed as performance metrics. The automatic summarization system significantly outperformed the baseline in both metrics. The MAP gain was 0.17 (p < 0.01) and the increase in the overall score of clinical usefulness was 0.39 (p < 0.05). PMID:19022398
AnnoLnc: a web server for systematically annotating novel human lncRNAs.

PubMed

Hou, Mei; Tang, Xing; Tian, Feng; Shi, Fangyuan; Liu, Fenglin; Gao, Ge

2016-11-16

Long noncoding RNAs (lncRNAs) have been shown to play essential roles in almost every important biological process through multiple mechanisms. Although the repertoire of human lncRNAs has rapidly expanded, their biological function and regulation remain largely elusive, calling for a systematic and integrative annotation tool. Here we present AnnoLnc ( http://annolnc.cbi.pku.edu.cn ), a one-stop portal for systematically annotating novel human lncRNAs. Based on more than 700 data sources and various tool chains, AnnoLnc enables a systematic annotation covering genomic location, secondary structure, expression patterns, transcriptional regulation, miRNA interaction, protein interaction, genetic association and evolution. An intuitive web interface is available for interactive analysis through both desktops and mobile devices, and programmers can further integrate AnnoLnc into their pipeline through standard JSON-based Web Service APIs. To the best of our knowledge, AnnoLnc is the only web server to provide on-the-fly and systematic annotation for newly identified human lncRNAs. Compared with similar tools, the annotation generated by AnnoLnc covers a much wider spectrum with intuitive visualization. Case studies demonstrate the power of AnnoLnc in not only rediscovering known functions of human lncRNAs but also inspiring novel hypotheses.
Semi-automatic breast ultrasound image segmentation based on mean shift and graph cuts.

PubMed

Zhou, Zhuhuang; Wu, Weiwei; Wu, Shuicai; Tsui, Po-Hsiang; Lin, Chung-Chih; Zhang, Ling; Wang, Tianfu

2014-10-01

Computerized tumor segmentation on breast ultrasound (BUS) images remains a challenging task. In this paper, we proposed a new method for semi-automatic tumor segmentation on BUS images using Gaussian filtering, histogram equalization, mean shift, and graph cuts. The only interaction required was to select two diagonal points to determine a region of interest (ROI) on an input image. The ROI image was shrunken by a factor of 2 using bicubic interpolation to reduce computation time. The shrunken image was smoothed by a Gaussian filter and then contrast-enhanced by histogram equalization. Next, the enhanced image was filtered by pyramid mean shift to improve homogeneity. The object and background seeds for graph cuts were automatically generated on the filtered image. Using these seeds, the filtered image was then segmented by graph cuts into a binary image containing the object and background. Finally, the binary image was expanded by a factor of 2 using bicubic interpolation, and the expanded image was processed by morphological opening and closing to refine the tumor contour. The method was implemented with OpenCV 2.4.3 and Visual Studio 2010 and tested for 38 BUS images with benign tumors and 31 BUS images with malignant tumors from different ultrasound scanners. Experimental results showed that our method had a true positive rate (TP) of 91.7%, a false positive (FP) rate of 11.9%, and a similarity (SI) rate of 85.6%. The mean run time on Intel Core 2.66 GHz CPU and 4 GB RAM was 0.49 ± 0.36 s. The experimental results indicate that the proposed method may be useful in BUS image segmentation. © The Author(s) 2014.
Robust extraction of the aorta and pulmonary artery from 3D MDCT image data

NASA Astrophysics Data System (ADS)

Taeprasartsit, Pinyo; Higgins, William E.

2010-03-01

Accurate definition of the aorta and pulmonary artery from three-dimensional (3D) multi-detector CT (MDCT) images is important for pulmonary applications. This work presents robust methods for defining the aorta and pulmonary artery in the central chest. The methods work on both contrast enhanced and no-contrast 3D MDCT image data. The automatic methods use a common approach employing model fitting and selection and adaptive refinement. During the occasional event that more precise vascular extraction is desired or the method fails, we also have an alternate semi-automatic fail-safe method. The semi-automatic method extracts the vasculature by extending the medial axes into a user-guided direction. A ground-truth study over a series of 40 human 3D MDCT images demonstrates the efficacy, accuracy, robustness, and efficiency of the methods.
Semi-automatic motion compensation of contrast-enhanced ultrasound images from abdominal organs for perfusion analysis.

PubMed

Schäfer, Sebastian; Nylund, Kim; Sævik, Fredrik; Engjom, Trond; Mézl, Martin; Jiřík, Radovan; Dimcevski, Georg; Gilja, Odd Helge; Tönnies, Klaus

2015-08-01

This paper presents a system for correcting motion influences in time-dependent 2D contrast-enhanced ultrasound (CEUS) images to assess tissue perfusion characteristics. The system consists of a semi-automatic frame selection method to find images with out-of-plane motion as well as a method for automatic motion compensation. Translational and non-rigid motion compensation is applied by introducing a temporal continuity assumption. A study consisting of 40 clinical datasets was conducted to compare the perfusion with simulated perfusion using pharmacokinetic modeling. Overall, the proposed approach decreased the mean average difference between the measured perfusion and the pharmacokinetic model estimation. It was non-inferior for three out of four patient cohorts to a manual approach and reduced the analysis time by 41% compared to manual processing. Copyright © 2014 Elsevier Ltd. All rights reserved.
UniProtKB/Swiss-Prot, the Manually Annotated Section of the UniProt KnowledgeBase: How to Use the Entry View.

PubMed

Boutet, Emmanuel; Lieberherr, Damien; Tognolli, Michael; Schneider, Michel; Bansal, Parit; Bridge, Alan J; Poux, Sylvain; Bougueleret, Lydie; Xenarios, Ioannis

2016-01-01

The Universal Protein Resource (UniProt, http://www.uniprot.org ) consortium is an initiative of the SIB Swiss Institute of Bioinformatics (SIB), the European Bioinformatics Institute (EBI) and the Protein Information Resource (PIR) to provide the scientific community with a central resource for protein sequences and functional information. The UniProt consortium maintains the UniProt KnowledgeBase (UniProtKB), updated every 4 weeks, and several supplementary databases including the UniProt Reference Clusters (UniRef) and the UniProt Archive (UniParc).The Swiss-Prot section of the UniProt KnowledgeBase (UniProtKB/Swiss-Prot) contains publicly available expertly manually annotated protein sequences obtained from a broad spectrum of organisms. Plant protein entries are produced in the frame of the Plant Proteome Annotation Program (PPAP), with an emphasis on characterized proteins of Arabidopsis thaliana and Oryza sativa. High level annotations provided by UniProtKB/Swiss-Prot are widely used to predict annotation of newly available proteins through automatic pipelines.The purpose of this chapter is to present a guided tour of a UniProtKB/Swiss-Prot entry. We will also present some of the tools and databases that are linked to each entry.
A Methodology and Implementation for Annotating Digital Images for Context-appropriate Use in an Academic Health Care Environment

PubMed Central

Goede, Patricia A.; Lauman, Jason R.; Cochella, Christopher; Katzman, Gregory L.; Morton, David A.; Albertine, Kurt H.

2004-01-01

Use of digital medical images has become common over the last several years, coincident with the release of inexpensive, mega-pixel quality digital cameras and the transition to digital radiology operation by hospitals. One problem that clinicians, medical educators, and basic scientists encounter when handling images is the difficulty of using business and graphic arts commercial-off-the-shelf (COTS) software in multicontext authoring and interactive teaching environments. The authors investigated and developed software-supported methodologies to help clinicians, medical educators, and basic scientists become more efficient and effective in their digital imaging environments. The software that the authors developed provides the ability to annotate images based on a multispecialty methodology for annotation and visual knowledge representation. This annotation methodology is designed by consensus, with contributions from the authors and physicians, medical educators, and basic scientists in the Departments of Radiology, Neurobiology and Anatomy, Dermatology, and Ophthalmology at the University of Utah. The annotation methodology functions as a foundation for creating, using, reusing, and extending dynamic annotations in a context-appropriate, interactive digital environment. The annotation methodology supports the authoring process as well as output and presentation mechanisms. The annotation methodology is the foundation for a Windows implementation that allows annotated elements to be represented as structured eXtensible Markup Language and stored separate from the image(s). PMID:14527971
Segmentation of whole cells and cell nuclei from 3-D optical microscope images using dynamic programming.

PubMed

McCullough, D P; Gudla, P R; Harris, B S; Collins, J A; Meaburn, K J; Nakaya, M A; Yamaguchi, T P; Misteli, T; Lockett, S J

2008-05-01

Communications between cells in large part drive tissue development and function, as well as disease-related processes such as tumorigenesis. Understanding the mechanistic bases of these processes necessitates quantifying specific molecules in adjacent cells or cell nuclei of intact tissue. However, a major restriction on such analyses is the lack of an efficient method that correctly segments each object (cell or nucleus) from 3-D images of an intact tissue specimen. We report a highly reliable and accurate semi-automatic algorithmic method for segmenting fluorescence-labeled cells or nuclei from 3-D tissue images. Segmentation begins with semi-automatic, 2-D object delineation in a user-selected plane, using dynamic programming (DP) to locate the border with an accumulated intensity per unit length greater that any other possible border around the same object. Then the two surfaces of the object in planes above and below the selected plane are found using an algorithm that combines DP and combinatorial searching. Following segmentation, any perceived errors can be interactively corrected. Segmentation accuracy is not significantly affected by intermittent labeling of object surfaces, diffuse surfaces, or spurious signals away from surfaces. The unique strength of the segmentation method was demonstrated on a variety of biological tissue samples where all cells, including irregularly shaped cells, were accurately segmented based on visual inspection.
Denoising and 4D visualization of OCT images

PubMed Central

Gargesha, Madhusudhana; Jenkins, Michael W.; Rollins, Andrew M.; Wilson, David L.

2009-01-01

We are using Optical Coherence Tomography (OCT) to image structure and function of the developing embryonic heart in avian models. Fast OCT imaging produces very large 3D (2D + time) and 4D (3D volumes + time) data sets, which greatly challenge ones ability to visualize results. Noise in OCT images poses additional challenges. We created an algorithm with a quick, data set specific optimization for reduction of both shot and speckle noise and applied it to 3D visualization and image segmentation in OCT. When compared to baseline algorithms (median, Wiener, orthogonal wavelet, basic non-orthogonal wavelet), a panel of experts judged the new algorithm to give much improved volume renderings concerning both noise and 3D visualization. Specifically, the algorithm provided a better visualization of the myocardial and endocardial surfaces, and the interaction of the embryonic heart tube with surrounding tissue. Quantitative evaluation using an image quality figure of merit also indicated superiority of the new algorithm. Noise reduction aided semi-automatic 2D image segmentation, as quantitatively evaluated using a contour distance measure with respect to an expert segmented contour. In conclusion, the noise reduction algorithm should be quite useful for visualization and quantitative measurements (e.g., heart volume, stroke volume, contraction velocity, etc.) in OCT embryo images. With its semi-automatic, data set specific optimization, we believe that the algorithm can be applied to OCT images from other applications. PMID:18679509
Framework for automatic information extraction from research papers on nanocrystal devices

PubMed Central

Yoshioka, Masaharu; Hara, Shinjiro; Newton, Marcus C

2015-01-01

Summary To support nanocrystal device development, we have been working on a computational framework to utilize information in research papers on nanocrystal devices. We developed an annotated corpus called “ NaDev” (Nanocrystal Device Development) for this purpose. We also proposed an automatic information extraction system called “NaDevEx” (Nanocrystal Device Automatic Information Extraction Framework). NaDevEx aims at extracting information from research papers on nanocrystal devices using the NaDev corpus and machine-learning techniques. However, the characteristics of NaDevEx were not examined in detail. In this paper, we conduct system evaluation experiments for NaDevEx using the NaDev corpus. We discuss three main issues: system performance, compared with human annotators; the effect of paper type (synthesis or characterization) on system performance; and the effects of domain knowledge features (e.g., a chemical named entity recognition system and list of names of physical quantities) on system performance. We found that overall system performance was 89% in precision and 69% in recall. If we consider identification of terms that intersect with correct terms for the same information category as the correct identification, i.e., loose agreement (in many cases, we can find that appropriate head nouns such as temperature or pressure loosely match between two terms), the overall performance is 95% in precision and 74% in recall. The system performance is almost comparable with results of human annotators for information categories with rich domain knowledge information (source material). However, for other information categories, given the relatively large number of terms that exist only in one paper, recall of individual information categories is not high (39–73%); however, precision is better (75–97%). The average performance for synthesis papers is better than that for characterization papers because of the lack of training examples for characterization papers. Based on these results, we discuss future research plans for improving the performance of the system. PMID:26665057
Design and pilot evaluation of a system to develop computer-based site-specific practice guidelines from decision models.

PubMed

Sanders, G D; Nease, R F; Owens, D K

2000-01-01

Local tailoring of clinical practice guidelines (CPGs) requires experts in medicine and evidence synthesis unavailable in many practice settings. The authors' computer-based system enables developers and users to create, disseminate, and tailor CPGs, using normative decision models (DMs). ALCHEMIST, a web-based system, analyzes a DM, creates a CPG in the form of an annotated algorithm, and displays for the guideline user the optimal strategy. ALCHEMIST'S interface enables remote users to tailor the guideline by changing underlying input variables and observing the new annotated algorithm that is developed automatically. In a pilot evaluation of the system, a DM was used to evaluate strategies for staging non-small-cell lung cancer. Subjects (n = 15) compared the automatically created CPG with published guidelines for this staging and critiqued both using a previously developed instrument to rate the CPGs' usability, accountability, and accuracy on a scale of 0 (worst) to 2 (best), with higher scores reflecting higher quality. The mean overall score for the ALCHEMIST CPG was 1.502, compared with the published-CPG score of 0.987 (p = 0.002). The ALCHEMIST CPG scores for usability, accountability, and accuracy were 1.683, 1.393, and 1.430, respectively; the published CPG scores were 1.192, 0.941, and 0.830 (each comparison p < 0.05). On a scale of 1 (worst) to 5 (best), users' mean ratings of ALCHEMIST'S ease of use, usefulness of content, and presentation format were 4.76, 3.98, and 4.64, respectively. The results demonstrate the feasibility of a web-based system that automatically analyzes a DM and creates a CPG as an annotated algorithm, enabling remote users to develop site-specific CPGs. In the pilot evaluation, the ALCHEMIST guidelines met established criteria for quality and compared favorably with national CPGs. The high usability and usefulness ratings suggest that such systems can be a good tool for guideline development.
Framework for automatic information extraction from research papers on nanocrystal devices.

PubMed

Dieb, Thaer M; Yoshioka, Masaharu; Hara, Shinjiro; Newton, Marcus C

2015-01-01

To support nanocrystal device development, we have been working on a computational framework to utilize information in research papers on nanocrystal devices. We developed an annotated corpus called " NaDev" (Nanocrystal Device Development) for this purpose. We also proposed an automatic information extraction system called "NaDevEx" (Nanocrystal Device Automatic Information Extraction Framework). NaDevEx aims at extracting information from research papers on nanocrystal devices using the NaDev corpus and machine-learning techniques. However, the characteristics of NaDevEx were not examined in detail. In this paper, we conduct system evaluation experiments for NaDevEx using the NaDev corpus. We discuss three main issues: system performance, compared with human annotators; the effect of paper type (synthesis or characterization) on system performance; and the effects of domain knowledge features (e.g., a chemical named entity recognition system and list of names of physical quantities) on system performance. We found that overall system performance was 89% in precision and 69% in recall. If we consider identification of terms that intersect with correct terms for the same information category as the correct identification, i.e., loose agreement (in many cases, we can find that appropriate head nouns such as temperature or pressure loosely match between two terms), the overall performance is 95% in precision and 74% in recall. The system performance is almost comparable with results of human annotators for information categories with rich domain knowledge information (source material). However, for other information categories, given the relatively large number of terms that exist only in one paper, recall of individual information categories is not high (39-73%); however, precision is better (75-97%). The average performance for synthesis papers is better than that for characterization papers because of the lack of training examples for characterization papers. Based on these results, we discuss future research plans for improving the performance of the system.
Computing of Learner's Personality Traits Based on Digital Annotations

ERIC Educational Resources Information Center

Omheni, Nizar; Kalboussi, Anis; Mazhoud, Omar; Kacem, Ahmed Hadj

2017-01-01

Researchers in education are interested in modeling of learner's profile and adapt their learning experiences accordingly. When learners read and interact with their reading materials, they do unconscious practices like annotations which may be, a key feature of their personalities. Annotation activity requires readers to be active, to think…
Recognition of Learner's Personality Traits through Digital Annotations in Distance Learning

ERIC Educational Resources Information Center

Omheni, Nizar; Kalboussi, Anis; Mazhoud, Omar; Kacem, Ahmed Hadj

2017-01-01

Researchers in distance education are interested in observing and modelling of learner's personality profile, and adapting their learning experiences accordingly. When learners read and interact with their reading materials, they do unselfconscious activities like annotation which may be a key feature of their personalities. Annotation activity…
Validating automatic semantic annotation of anatomy in DICOM CT images

NASA Astrophysics Data System (ADS)

Pathak, Sayan D.; Criminisi, Antonio; Shotton, Jamie; White, Steve; Robertson, Duncan; Sparks, Bobbi; Munasinghe, Indeera; Siddiqui, Khan

2011-03-01

In the current health-care environment, the time available for physicians to browse patients' scans is shrinking due to the rapid increase in the sheer number of images. This is further aggravated by mounting pressure to become more productive in the face of decreasing reimbursement. Hence, there is an urgent need to deliver technology which enables faster and effortless navigation through sub-volume image visualizations. Annotating image regions with semantic labels such as those derived from the RADLEX ontology can vastly enhance image navigation and sub-volume visualization. This paper uses random regression forests for efficient, automatic detection and localization of anatomical structures within DICOM 3D CT scans. A regression forest is a collection of decision trees which are trained to achieve direct mapping from voxels to organ location and size in a single pass. This paper focuses on comparing automated labeling with expert-annotated ground-truth results on a database of 50 highly variable CT scans. Initial investigations show that regression forest derived localization errors are smaller and more robust than those achieved by state-of-the-art global registration approaches. The simplicity of the algorithm's context-rich visual features yield typical runtimes of less than 10 seconds for a 5123 voxel DICOM CT series on a single-threaded, single-core machine running multiple trees; each tree taking less than a second. Furthermore, qualitative evaluation demonstrates that using the detected organs' locations as index into the image volume improves the efficiency of the navigational workflow in all the CT studies.
Measuring Patient Mobility in the ICU Using a Novel Noninvasive Sensor.

PubMed

Ma, Andy J; Rawat, Nishi; Reiter, Austin; Shrock, Christine; Zhan, Andong; Stone, Alex; Rabiee, Anahita; Griffin, Stephanie; Needham, Dale M; Saria, Suchi

2017-04-01

To develop and validate a noninvasive mobility sensor to automatically and continuously detect and measure patient mobility in the ICU. Prospective, observational study. Surgical ICU at an academic hospital. Three hundred sixty-two hours of sensor color and depth image data were recorded and curated into 109 segments, each containing 1,000 images, from eight patients. None. Three Microsoft Kinect sensors (Microsoft, Beijing, China) were deployed in one ICU room to collect continuous patient mobility data. We developed software that automatically analyzes the sensor data to measure mobility and assign the highest level within a time period. To characterize the highest mobility level, a validated 11-point mobility scale was collapsed into four categories: nothing in bed, in-bed activity, out-of-bed activity, and walking. Of the 109 sensor segments, the noninvasive mobility sensor was developed using 26 of these from three ICU patients and validated on 83 remaining segments from five different patients. Three physicians annotated each segment for the highest mobility level. The weighted Kappa (κ) statistic for agreement between automated noninvasive mobility sensor output versus manual physician annotation was 0.86 (95% CI, 0.72-1.00). Disagreement primarily occurred in the "nothing in bed" versus "in-bed activity" categories because "the sensor assessed movement continuously," which was significantly more sensitive to motion than physician annotations using a discrete manual scale. Noninvasive mobility sensor is a novel and feasible method for automating evaluation of ICU patient mobility.
Development and validation of automatic tools for interactive recurrence analysis in radiation therapy: optimization of treatment algorithms for locally advanced pancreatic cancer.

PubMed

Kessel, Kerstin A; Habermehl, Daniel; Jäger, Andreas; Floca, Ralf O; Zhang, Lanlan; Bendl, Rolf; Debus, Jürgen; Combs, Stephanie E

2013-06-07

In radiation oncology recurrence analysis is an important part in the evaluation process and clinical quality assurance of treatment concepts. With the example of 9 patients with locally advanced pancreatic cancer we developed and validated interactive analysis tools to support the evaluation workflow. After an automatic registration of the radiation planning CTs with the follow-up images, the recurrence volumes are segmented manually. Based on these volumes the DVH (dose volume histogram) statistic is calculated, followed by the determination of the dose applied to the region of recurrence and the distance between the boost and recurrence volume. We calculated the percentage of the recurrence volume within the 80%-isodose volume and compared it to the location of the recurrence within the boost volume, boost + 1 cm, boost + 1.5 cm and boost + 2 cm volumes. Recurrence analysis of 9 patients demonstrated that all recurrences except one occurred within the defined GTV/boost volume; one recurrence developed beyond the field border/outfield. With the defined distance volumes in relation to the recurrences, we could show that 7 recurrent lesions were within the 2 cm radius of the primary tumor. Two large recurrences extended beyond the 2 cm, however, this might be due to very rapid growth and/or late detection of the tumor progression. The main goal of using automatic analysis tools is to reduce time and effort conducting clinical analyses. We showed a first approach and use of a semi-automated workflow for recurrence analysis, which will be continuously optimized. In conclusion, despite the limitations of the automatic calculations we contributed to in-house optimization of subsequent study concepts based on an improved and validated target volume definition.

Self-calibrating models for dynamic monitoring and diagnosis

NASA Technical Reports Server (NTRS)

Kuipers, Benjamin

1996-01-01

A method for automatically building qualitative and semi-quantitative models of dynamic systems, and using them for monitoring and fault diagnosis, is developed and demonstrated. The qualitative approach and semi-quantitative method are applied to monitoring observation streams, and to design of non-linear control systems.
The BioC-BioGRID corpus: full text articles annotated for curation of protein–protein and genetic interactions

PubMed Central

Kim, Sun; Chatr-aryamontri, Andrew; Chang, Christie S.; Oughtred, Rose; Rust, Jennifer; Wilbur, W. John; Comeau, Donald C.; Dolinski, Kara; Tyers, Mike

2017-01-01

A great deal of information on the molecular genetics and biochemistry of model organisms has been reported in the scientific literature. However, this data is typically described in free text form and is not readily amenable to computational analyses. To this end, the BioGRID database systematically curates the biomedical literature for genetic and protein interaction data. This data is provided in a standardized computationally tractable format and includes structured annotation of experimental evidence. BioGRID curation necessarily involves substantial human effort by expert curators who must read each publication to extract the relevant information. Computational text-mining methods offer the potential to augment and accelerate manual curation. To facilitate the development of practical text-mining strategies, a new challenge was organized in BioCreative V for the BioC task, the collaborative Biocurator Assistant Task. This was a non-competitive, cooperative task in which the participants worked together to build BioC-compatible modules into an integrated pipeline to assist BioGRID curators. As an integral part of this task, a test collection of full text articles was developed that contained both biological entity annotations (gene/protein and organism/species) and molecular interaction annotations (protein–protein and genetic interactions (PPIs and GIs)). This collection, which we call the BioC-BioGRID corpus, was annotated by four BioGRID curators over three rounds of annotation and contains 120 full text articles curated in a dataset representing two major model organisms, namely budding yeast and human. The BioC-BioGRID corpus contains annotations for 6409 mentions of genes and their Entrez Gene IDs, 186 mentions of organism names and their NCBI Taxonomy IDs, 1867 mentions of PPIs and 701 annotations of PPI experimental evidence statements, 856 mentions of GIs and 399 annotations of GI evidence statements. The purpose, characteristics and possible future uses of the BioC-BioGRID corpus are detailed in this report. Database URL: http://bioc.sourceforge.net/BioC-BioGRID.html PMID:28077563
Semi-discrete Galerkin solution of the compressible boundary-layer equations with viscous-inviscid interaction

NASA Technical Reports Server (NTRS)

Day, Brad A.; Meade, Andrew J., Jr.

1993-01-01

A semi-discrete Galerkin (SDG) method is under development to model attached, turbulent, and compressible boundary layers for transonic airfoil analysis problems. For the boundary-layer formulation the method models the spatial variable normal to the surface with linear finite elements and the time-like variable with finite differences. A Dorodnitsyn transformed system of equations is used to bound the infinite spatial domain thereby providing high resolution near the wall and permitting the use of a uniform finite element grid which automatically follows boundary-layer growth. The second-order accurate Crank-Nicholson scheme is applied along with a linearization method to take advantage of the parabolic nature of the boundary-layer equations and generate a non-iterative marching routine. The SDG code can be applied to any smoothly-connected airfoil shape without modification and can be coupled to any inviscid flow solver. In this analysis, a direct viscous-inviscid interaction is accomplished between the Euler and boundary-layer codes through the application of a transpiration velocity boundary condition. Results are presented for compressible turbulent flow past RAE 2822 and NACA 0012 airfoils at various freestream Mach numbers, Reynolds numbers, and angles of attack.
Very Portable Remote Automatic Weather Stations

Treesearch

John R. Warren

1987-01-01

Remote Automatic Weather Stations (RAWS) were introduced to Forest Service and Bureau of Land Management field units in 1978 following development, test, and evaluation activities conducted jointly by the two agencies. The original configuration was designed for semi-permanent installation. Subsequently, a need for a more portable RAWS was expressed, and one was...
Research in Automatic Russian-English Scientific and Technical Lexicography. Final Report.

ERIC Educational Resources Information Center

Wayne State Univ., Detroit, MI.

Techniques of reversing English-Russian scientific and technical dictionaries into Russian-English versions through semi-automated compilation are described. Sections on manual and automatic processing discuss pre- and post-editing, the task program, updater (correction of errors and revision by specialist in a given field), the system employed…
Homology to peptide pattern for annotation of carbohydrate-active enzymes and prediction of function.

PubMed

Busk, P K; Pilgaard, B; Lezyk, M J; Meyer, A S; Lange, L

2017-04-12

Carbohydrate-active enzymes are found in all organisms and participate in key biological processes. These enzymes are classified in 274 families in the CAZy database but the sequence diversity within each family makes it a major task to identify new family members and to provide basis for prediction of enzyme function. A fast and reliable method for de novo annotation of genes encoding carbohydrate-active enzymes is to identify conserved peptides in the curated enzyme families followed by matching of the conserved peptides to the sequence of interest as demonstrated for the glycosyl hydrolase and the lytic polysaccharide monooxygenase families. This approach not only assigns the enzymes to families but also provides functional prediction of the enzymes with high accuracy. We identified conserved peptides for all enzyme families in the CAZy database with Peptide Pattern Recognition. The conserved peptides were matched to protein sequence for de novo annotation and functional prediction of carbohydrate-active enzymes with the Hotpep method. Annotation of protein sequences from 12 bacterial and 16 fungal genomes to families with Hotpep had an accuracy of 0.84 (measured as F1-score) compared to semiautomatic annotation by the CAZy database whereas the dbCAN HMM-based method had an accuracy of 0.77 with optimized parameters. Furthermore, Hotpep provided a functional prediction with 86% accuracy for the annotated genes. Hotpep is available as a stand-alone application for MS Windows. Hotpep is a state-of-the-art method for automatic annotation and functional prediction of carbohydrate-active enzymes.
Automated generation of patient-tailored electronic care pathways by translating computer-interpretable guidelines into hierarchical task networks.

PubMed

González-Ferrer, Arturo; ten Teije, Annette; Fdez-Olivares, Juan; Milian, Krystyna

2013-02-01

This paper describes a methodology which enables computer-aided support for the planning, visualization and execution of personalized patient treatments in a specific healthcare process, taking into account complex temporal constraints and the allocation of institutional resources. To this end, a translation from a time-annotated computer-interpretable guideline (CIG) model of a clinical protocol into a temporal hierarchical task network (HTN) planning domain is presented. The proposed method uses a knowledge-driven reasoning process to translate knowledge previously described in a CIG into a corresponding HTN Planning and Scheduling domain, taking advantage of HTNs known ability to (i) dynamically cope with temporal and resource constraints, and (ii) automatically generate customized plans. The proposed method, focusing on the representation of temporal knowledge and based on the identification of workflow and temporal patterns in a CIG, makes it possible to automatically generate time-annotated and resource-based care pathways tailored to the needs of any possible patient profile. The proposed translation is illustrated through a case study based on a 70 pages long clinical protocol to manage Hodgkin's disease, developed by the Spanish Society of Pediatric Oncology. We show that an HTN planning domain can be generated from the corresponding specification of the protocol in the Asbru language, providing a running example of this translation. Furthermore, the correctness of the translation is checked and also the management of ten different types of temporal patterns represented in the protocol. By interpreting the automatically generated domain with a state-of-art HTN planner, a time-annotated care pathway is automatically obtained, customized for the patient's and institutional needs. The generated care pathway can then be used by clinicians to plan and manage the patients long-term care. The described methodology makes it possible to automatically generate patient-tailored care pathways, leveraging an incremental knowledge-driven engineering process that starts from the expert knowledge of medical professionals. The presented approach makes the most of the strengths inherent in both CIG languages and HTN planning and scheduling techniques: for the former, knowledge acquisition and representation of the original clinical protocol, and for the latter, knowledge reasoning capabilities and an ability to deal with complex temporal and resource constraints. Moreover, the proposed approach provides immediate access to technologies such as business process management (BPM) tools, which are increasingly being used to support healthcare processes. Copyright © 2012 Elsevier B.V. All rights reserved.
New approaches for measuring changes in the cortical surface using an automatic reconstruction algorithm

NASA Astrophysics Data System (ADS)

Pham, Dzung L.; Han, Xiao; Rettmann, Maryam E.; Xu, Chenyang; Tosun, Duygu; Resnick, Susan; Prince, Jerry L.

2002-05-01

In previous work, the authors presented a multi-stage procedure for the semi-automatic reconstruction of the cerebral cortex from magnetic resonance images. This method suffered from several disadvantages. First, the tissue classification algorithm used can be sensitive to noise within the image. Second, manual interaction was required for masking out undesired regions of the brain image, such as the ventricles and putamen. Third, iterated median filters were used to perform a topology correction on the initial cortical surface, resulting in an overly smoothed initial surface. Finally, the deformable surface used to converge to the cortex had difficulty capturing narrow gyri. In this work, all four disadvantages of the procedure have been addressed. A more robust tissue classification algorithm is employed and the manual masking step is replaced by an automatic method involving level set deformable models. Instead of iterated median filters, an algorithm developed specifically for topology correction is used. The last disadvantage is addressed using an algorithm that artificially separates adjacent sulcal banks. The new procedure is more automated but also more accurate than the previous one. Its utility is demonstrated by performing a preliminary study on data from the Baltimore Longitudinal Study of Aging.
Construction of ontology augmented networks for protein complex prediction.

PubMed

Zhang, Yijia; Lin, Hongfei; Yang, Zhihao; Wang, Jian

2013-01-01

Protein complexes are of great importance in understanding the principles of cellular organization and function. The increase in available protein-protein interaction data, gene ontology and other resources make it possible to develop computational methods for protein complex prediction. Most existing methods focus mainly on the topological structure of protein-protein interaction networks, and largely ignore the gene ontology annotation information. In this article, we constructed ontology augmented networks with protein-protein interaction data and gene ontology, which effectively unified the topological structure of protein-protein interaction networks and the similarity of gene ontology annotations into unified distance measures. After constructing ontology augmented networks, a novel method (clustering based on ontology augmented networks) was proposed to predict protein complexes, which was capable of taking into account the topological structure of the protein-protein interaction network, as well as the similarity of gene ontology annotations. Our method was applied to two different yeast protein-protein interaction datasets and predicted many well-known complexes. The experimental results showed that (i) ontology augmented networks and the unified distance measure can effectively combine the structure closeness and gene ontology annotation similarity; (ii) our method is valuable in predicting protein complexes and has higher F1 and accuracy compared to other competing methods.
A Semi-Automatic Alignment Method for Math Educational Standards Using the MP (Materialization Pattern) Model

ERIC Educational Resources Information Center

Choi, Namyoun

2010-01-01

Educational standards alignment, which matches similar or equivalent concepts of educational standards, is a necessary task for educational resource discovery and retrieval. Automated or semi-automated alignment systems for educational standards have been recently available. However, existing systems frequently result in inconsistency in…
Semi-automated identification of leopard frogs

USGS Publications Warehouse

Petrovska-Delacrétaz, Dijana; Edwards, Aaron; Chiasson, John; Chollet, Gérard; Pilliod, David S.

2014-01-01

Principal component analysis is used to implement a semi-automatic recognition system to identify recaptured northern leopard frogs (Lithobates pipiens). Results of both open set and closed set experiments are given. The presented algorithm is shown to provide accurate identification of 209 individual leopard frogs from a total set of 1386 images.
SABRE--A Novel Software Tool for Bibliographic Post-Processing.

ERIC Educational Resources Information Center

Burge, Cecil D.

1989-01-01

Describes the software architecture and application of SABRE (Semi-Automated Bibliographic Environment), which is one of the first products to provide a semi-automatic environment for relevancy ranking of citations obtained from searches of bibliographic databases. Features designed to meet the review, categorization, culling, and reporting needs…
COMP Superscalar, an interoperable programming framework

NASA Astrophysics Data System (ADS)

Badia, Rosa M.; Conejero, Javier; Diaz, Carlos; Ejarque, Jorge; Lezzi, Daniele; Lordan, Francesc; Ramon-Cortes, Cristian; Sirvent, Raul

2015-12-01

COMPSs is a programming framework that aims to facilitate the parallelization of existing applications written in Java, C/C++ and Python scripts. For that purpose, it offers a simple programming model based on sequential development in which the user is mainly responsible for (i) identifying the functions to be executed as asynchronous parallel tasks and (ii) annotating them with annotations or standard Python decorators. A runtime system is in charge of exploiting the inherent concurrency of the code, automatically detecting and enforcing the data dependencies between tasks and spawning these tasks to the available resources, which can be nodes in a cluster, clouds or grids. In cloud environments, COMPSs provides scalability and elasticity features allowing the dynamic provision of resources.
Semi-automatic 3D lung nodule segmentation in CT using dynamic programming

NASA Astrophysics Data System (ADS)

Sargent, Dustin; Park, Sun Young

2017-02-01

We present a method for semi-automatic segmentation of lung nodules in chest CT that can be extended to general lesion segmentation in multiple modalities. Most semi-automatic algorithms for lesion segmentation or similar tasks use region-growing or edge-based contour finding methods such as level-set. However, lung nodules and other lesions are often connected to surrounding tissues, which makes these algorithms prone to growing the nodule boundary into the surrounding tissue. To solve this problem, we apply a 3D extension of the 2D edge linking method with dynamic programming to find a closed surface in a spherical representation of the nodule ROI. The algorithm requires a user to draw a maximal diameter across the nodule in the slice in which the nodule cross section is the largest. We report the lesion volume estimation accuracy of our algorithm on the FDA lung phantom dataset, and the RECIST diameter estimation accuracy on the lung nodule dataset from the SPIE 2016 lung nodule classification challenge. The phantom results in particular demonstrate that our algorithm has the potential to mitigate the disparity in measurements performed by different radiologists on the same lesions, which could improve the accuracy of disease progression tracking.
Rotor Wake/Stator Interaction Noise Prediction Code Technical Documentation and User's Manual

NASA Technical Reports Server (NTRS)

Topol, David A.; Mathews, Douglas C.

2010-01-01

This report documents the improvements and enhancements made by Pratt & Whitney to two NASA programs which together will calculate noise from a rotor wake/stator interaction. The code is a combination of subroutines from two NASA programs with many new features added by Pratt & Whitney. To do a calculation V072 first uses a semi-empirical wake prediction to calculate the rotor wake characteristics at the stator leading edge. Results from the wake model are then automatically input into a rotor wake/stator interaction analytical noise prediction routine which calculates inlet aft sound power levels for the blade-passage-frequency tones and their harmonics, along with the complex radial mode amplitudes. The code allows for a noise calculation to be performed for a compressor rotor wake/stator interaction, a fan wake/FEGV interaction, or a fan wake/core stator interaction. This report is split into two parts, the first part discusses the technical documentation of the program as improved by Pratt & Whitney. The second part is a user's manual which describes how input files are created and how the code is run.
Overview of the INEX 2008 Book Track

NASA Astrophysics Data System (ADS)

Kazai, Gabriella; Doucet, Antoine; Landoni, Monica

This paper provides an overview of the INEX 2008 Book Track. Now in its second year, the track aimed at broadening its scope by investigating topics of interest in the fields of information retrieval, human computer interaction, digital libraries, and eBooks. The main topics of investigation were defined around challenges for supporting users in reading, searching, and navigating the full texts of digitized books. Based on these themes, four tasks were defined: 1) The Book Retrieval task aimed at comparing traditional and book-specific retrieval approaches, 2) the Page in Context task aimed at evaluating the value of focused retrieval approaches for searching books, 3) the Structure Extraction task aimed to test automatic techniques for deriving structure from OCR and layout information, and 4) the Active Reading task aimed to explore suitable user interfaces for eBooks enabling reading, annotation, review, and summary across multiple books. We report on the setup and results of each of these tasks.
Interaction of a Vocabulary Quiz with Cognitive Instructional Strategies in First Language Learning among Japanese Undergraduates

ERIC Educational Resources Information Center

Tomita, Kei

2016-01-01

In response to concerns regarding effects of hyperlinked annotation on reading comprehension, this study was undertaken to compare hyperlinked annotation with student highlighting of unknown/difficult words. An online highlighting tool was used to help students reflect their prior vocabulary in a hyperlink-based annotated passage. Highlighting…
Automatic multi-label annotation of abdominal CT images using CBIR

NASA Astrophysics Data System (ADS)

Xue, Zhiyun; Antani, Sameer; Long, L. Rodney; Thoma, George R.

2017-03-01

We present a technique to annotate multiple organs shown in 2-D abdominal/pelvic CT images using CBIR. This annotation task is motivated by our research interests in visual question-answering (VQA). We aim to apply results from this effort in Open-iSM, a multimodal biomedical search engine developed by the National Library of Medicine (NLM). Understanding visual content of biomedical images is a necessary step for VQA. Though sufficient annotational information about an image may be available in related textual metadata, not all may be useful as descriptive tags, particularly for anatomy on the image. In this paper, we develop and evaluate a multi-label image annotation method using CBIR. We evaluate our method on two 2-D CT image datasets we generated from 3-D volumetric data obtained from a multi-organ segmentation challenge hosted in MICCAI 2015. Shape and spatial layout information is used to encode visual characteristics of the anatomy. We adapt a weighted voting scheme to assign multiple labels to the query image by combining the labels of the images identified as similar by the method. Key parameters that may affect the annotation performance, such as the number of images used in the label voting and the threshold for excluding labels that have low weights, are studied. The method proposes a coarse-to-fine retrieval strategy which integrates the classification with the nearest-neighbor search. Results from our evaluation (using the MICCAI CT image datasets as well as figures from Open-i) are presented.
Instance-Based Question Answering

DTIC Science & Technology

2006-12-01

answer clustering, composition, and scoring. Moreover, with the effort dedicated to improving monolingual system performance, system parameters are...text collections: document type, manual or automatic annotations (if any), and stylistic and notational differences in technical terms. Monolingual ...forum in which cross language retrieval systems and question answering systems are tested for various Eu- ropean languages. The CLEF QA monolingual task
Automatically monitoring driftwood in large rivers: preliminary results

NASA Astrophysics Data System (ADS)

Piegay, H.; Lemaire, P.; MacVicar, B.; Mouquet-Noppe, C.; Tougne, L.

2014-12-01

Driftwood in rivers impact sediment transport, riverine habitat and human infrastructures. Quantifying it, in particular large woods on fairly large rivers where it can move easily, would allow us to improve our knowledge on fluvial transport processes. There are several means of studying this phenomenon, amongst which RFID sensors tracking, photo and video monitoring. In this abstract, we are interested in the latter, being easier and cheaper to deploy. However, video monitoring of driftwood generates a huge amount of images and manually labeling it is tedious. It is essential to automate such a monitoring process, which is a difficult task in the field of computer vision, and more specifically automatic video analysis. Detecting foreground into dynamic background remains an open problem to date. We installed a video camera at the riverside of a gauging station on the Ain River, a 3500 km² Piedmont River in France. Several floods were manually annotated by a human operator. We developed software that automatically extracts and characterizes wood blocks within a video stream. This algorithm is based upon a statistical model and combines static, dynamic and spatial data. Segmented wood objects are further described with the help of a skeleton-based approach that helps us to automatically determine its shape, diameter and length. The first detailed comparisons between manual annotations and automatically extracted data show that we can fairly well detect large wood until a given size (approximately 120 cm in length or 15 cm in diameter) whereas smaller ones are difficult to detect and tend to be missed by either the human operator, either the algorithm. Detection is fairly accurate in high flow conditions where the water channel is usually brown because of suspended sediment transport. In low flow context, our algorithm still needs improvement to reduce the number of false positive so as to better distinguish shadow or turbulence structures from wood pieces.

Prokaryotic Contig Annotation Pipeline Server: Web Application for a Prokaryotic Genome Annotation Pipeline Based on the Shiny App Package.

PubMed

Park, Byeonghyeok; Baek, Min-Jeong; Min, Byoungnam; Choi, In-Geol

2017-09-01

Genome annotation is a primary step in genomic research. To establish a light and portable prokaryotic genome annotation pipeline for use in individual laboratories, we developed a Shiny app package designated as "P-CAPS" (Prokaryotic Contig Annotation Pipeline Server). The package is composed of R and Python scripts that integrate publicly available annotation programs into a server application. P-CAPS is not only a browser-based interactive application but also a distributable Shiny app package that can be installed on any personal computer. The final annotation is provided in various standard formats and is summarized in an R markdown document. Annotation can be visualized and examined with a public genome browser. A benchmark test showed that the annotation quality and completeness of P-CAPS were reliable and compatible with those of currently available public pipelines.
10 CFR Appendix J2 to Subpart B of... - Uniform Test Method for Measuring the Energy Consumption of Automatic and Semi-Automatic Clothes...

Code of Federal Regulations, 2014 CFR

2014-01-01

... hardness or less) using 27.0 grams + 4.0 grams per pound of cloth load of AHAM Standard detergent Formula 3... repellent finishes, such as fluoropolymer stain resistant finishes shall not be applied to the test cloth...
The Use of Opto-Electronics in Viscometry.

ERIC Educational Resources Information Center

Mazza, R. J.; Washbourn, D. H.

1982-01-01

Describes a semi-automatic viscometer which incorporates a microprocessor system and uses optoelectronics to detect flow of liquid through the capillary, flow time being displayed on a timer with accuracy of 0.01 second. The system could be made fully automatic with an additional microprocessor circuit and inclusion of a pump. (Author/JN)
Integration of wireless sensor networks into automatic irrigation scheduling of a center pivot

USDA-ARS?s Scientific Manuscript database

A six-span center pivot system was used as a platform for testing two wireless sensor networks (WSN) of infrared thermometers. The cropped field was a semi-circle, divided into six pie shaped sections of which three were irrigated manually and three were irrigated automatically based on the time tem...
Portable automatic text classification for adverse drug reaction detection via multi-corpus training.

PubMed

Sarker, Abeed; Gonzalez, Graciela

2015-02-01

Automatic detection of adverse drug reaction (ADR) mentions from text has recently received significant interest in pharmacovigilance research. Current research focuses on various sources of text-based information, including social media-where enormous amounts of user posted data is available, which have the potential for use in pharmacovigilance if collected and filtered accurately. The aims of this study are: (i) to explore natural language processing (NLP) approaches for generating useful features from text, and utilizing them in optimized machine learning algorithms for automatic classification of ADR assertive text segments; (ii) to present two data sets that we prepared for the task of ADR detection from user posted internet data; and (iii) to investigate if combining training data from distinct corpora can improve automatic classification accuracies. One of our three data sets contains annotated sentences from clinical reports, and the two other data sets, built in-house, consist of annotated posts from social media. Our text classification approach relies on generating a large set of features, representing semantic properties (e.g., sentiment, polarity, and topic), from short text nuggets. Importantly, using our expanded feature sets, we combine training data from different corpora in attempts to boost classification accuracies. Our feature-rich classification approach performs significantly better than previously published approaches with ADR class F-scores of 0.812 (previously reported best: 0.770), 0.538 and 0.678 for the three data sets. Combining training data from multiple compatible corpora further improves the ADR F-scores for the in-house data sets to 0.597 (improvement of 5.9 units) and 0.704 (improvement of 2.6 units) respectively. Our research results indicate that using advanced NLP techniques for generating information rich features from text can significantly improve classification accuracies over existing benchmarks. Our experiments illustrate the benefits of incorporating various semantic features such as topics, concepts, sentiments, and polarities. Finally, we show that integration of information from compatible corpora can significantly improve classification performance. This form of multi-corpus training may be particularly useful in cases where data sets are heavily imbalanced (e.g., social media data), and may reduce the time and costs associated with the annotation of data in the future. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
Portable Automatic Text Classification for Adverse Drug Reaction Detection via Multi-corpus Training

PubMed Central

Gonzalez, Graciela

2014-01-01

Objective Automatic detection of Adverse Drug Reaction (ADR) mentions from text has recently received significant interest in pharmacovigilance research. Current research focuses on various sources of text-based information, including social media — where enormous amounts of user posted data is available, which have the potential for use in pharmacovigilance if collected and filtered accurately. The aims of this study are: (i) to explore natural language processing approaches for generating useful features from text, and utilizing them in optimized machine learning algorithms for automatic classification of ADR assertive text segments; (ii) to present two data sets that we prepared for the task of ADR detection from user posted internet data; and (iii) to investigate if combining training data from distinct corpora can improve automatic classification accuracies. Methods One of our three data sets contains annotated sentences from clinical reports, and the two other data sets, built in-house, consist of annotated posts from social media. Our text classification approach relies on generating a large set of features, representing semantic properties (e.g., sentiment, polarity, and topic), from short text nuggets. Importantly, using our expanded feature sets, we combine training data from different corpora in attempts to boost classification accuracies. Results Our feature-rich classification approach performs significantly better than previously published approaches with ADR class F-scores of 0.812 (previously reported best: 0.770), 0.538 and 0.678 for the three data sets. Combining training data from multiple compatible corpora further improves the ADR F-scores for the in-house data sets to 0.597 (improvement of 5.9 units) and 0.704 (improvement of 2.6 units) respectively. Conclusions Our research results indicate that using advanced NLP techniques for generating information rich features from text can significantly improve classification accuracies over existing benchmarks. Our experiments illustrate the benefits of incorporating various semantic features such as topics, concepts, sentiments, and polarities. Finally, we show that integration of information from compatible corpora can significantly improve classification performance. This form of multi-corpus training may be particularly useful in cases where data sets are heavily imbalanced (e.g., social media data), and may reduce the time and costs associated with the annotation of data in the future. PMID:25451103
Automatic categorization of diverse experimental information in the bioscience literature

PubMed Central

2012-01-01

Background Curation of information from bioscience literature into biological knowledge databases is a crucial way of capturing experimental information in a computable form. During the biocuration process, a critical first step is to identify from all published literature the papers that contain results for a specific data type the curator is interested in annotating. This step normally requires curators to manually examine many papers to ascertain which few contain information of interest and thus, is usually time consuming. We developed an automatic method for identifying papers containing these curation data types among a large pool of published scientific papers based on the machine learning method Support Vector Machine (SVM). This classification system is completely automatic and can be readily applied to diverse experimental data types. It has been in use in production for automatic categorization of 10 different experimental datatypes in the biocuration process at WormBase for the past two years and it is in the process of being adopted in the biocuration process at FlyBase and the Saccharomyces Genome Database (SGD). We anticipate that this method can be readily adopted by various databases in the biocuration community and thereby greatly reducing time spent on an otherwise laborious and demanding task. We also developed a simple, readily automated procedure to utilize training papers of similar data types from different bodies of literature such as C. elegans and D. melanogaster to identify papers with any of these data types for a single database. This approach has great significance because for some data types, especially those of low occurrence, a single corpus often does not have enough training papers to achieve satisfactory performance. Results We successfully tested the method on ten data types from WormBase, fifteen data types from FlyBase and three data types from Mouse Genomics Informatics (MGI). It is being used in the curation work flow at WormBase for automatic association of newly published papers with ten data types including RNAi, antibody, phenotype, gene regulation, mutant allele sequence, gene expression, gene product interaction, overexpression phenotype, gene interaction, and gene structure correction. Conclusions Our methods are applicable to a variety of data types with training set containing several hundreds to a few thousand documents. It is completely automatic and, thus can be readily incorporated to different workflow at different literature-based databases. We believe that the work presented here can contribute greatly to the tremendous task of automating the important yet labor-intensive biocuration effort. PMID:22280404
Automatic categorization of diverse experimental information in the bioscience literature.

PubMed

Fang, Ruihua; Schindelman, Gary; Van Auken, Kimberly; Fernandes, Jolene; Chen, Wen; Wang, Xiaodong; Davis, Paul; Tuli, Mary Ann; Marygold, Steven J; Millburn, Gillian; Matthews, Beverley; Zhang, Haiyan; Brown, Nick; Gelbart, William M; Sternberg, Paul W

2012-01-26

Curation of information from bioscience literature into biological knowledge databases is a crucial way of capturing experimental information in a computable form. During the biocuration process, a critical first step is to identify from all published literature the papers that contain results for a specific data type the curator is interested in annotating. This step normally requires curators to manually examine many papers to ascertain which few contain information of interest and thus, is usually time consuming. We developed an automatic method for identifying papers containing these curation data types among a large pool of published scientific papers based on the machine learning method Support Vector Machine (SVM). This classification system is completely automatic and can be readily applied to diverse experimental data types. It has been in use in production for automatic categorization of 10 different experimental datatypes in the biocuration process at WormBase for the past two years and it is in the process of being adopted in the biocuration process at FlyBase and the Saccharomyces Genome Database (SGD). We anticipate that this method can be readily adopted by various databases in the biocuration community and thereby greatly reducing time spent on an otherwise laborious and demanding task. We also developed a simple, readily automated procedure to utilize training papers of similar data types from different bodies of literature such as C. elegans and D. melanogaster to identify papers with any of these data types for a single database. This approach has great significance because for some data types, especially those of low occurrence, a single corpus often does not have enough training papers to achieve satisfactory performance. We successfully tested the method on ten data types from WormBase, fifteen data types from FlyBase and three data types from Mouse Genomics Informatics (MGI). It is being used in the curation work flow at WormBase for automatic association of newly published papers with ten data types including RNAi, antibody, phenotype, gene regulation, mutant allele sequence, gene expression, gene product interaction, overexpression phenotype, gene interaction, and gene structure correction. Our methods are applicable to a variety of data types with training set containing several hundreds to a few thousand documents. It is completely automatic and, thus can be readily incorporated to different workflow at different literature-based databases. We believe that the work presented here can contribute greatly to the tremendous task of automating the important yet labor-intensive biocuration effort.
CRISPRDetect: A flexible algorithm to define CRISPR arrays.

PubMed

Biswas, Ambarish; Staals, Raymond H J; Morales, Sergio E; Fineran, Peter C; Brown, Chris M

2016-05-17

CRISPR (clustered regularly interspaced short palindromic repeats) RNAs provide the specificity for noncoding RNA-guided adaptive immune defence systems in prokaryotes. CRISPR arrays consist of repeat sequences separated by specific spacer sequences. CRISPR arrays have previously been identified in a large proportion of prokaryotic genomes. However, currently available detection algorithms do not utilise recently discovered features regarding CRISPR loci. We have developed a new approach to automatically detect, predict and interactively refine CRISPR arrays. It is available as a web program and command line from bioanalysis.otago.ac.nz/CRISPRDetect. CRISPRDetect discovers putative arrays, extends the array by detecting additional variant repeats, corrects the direction of arrays, refines the repeat/spacer boundaries, and annotates different types of sequence variations (e.g. insertion/deletion) in near identical repeats. Due to these features, CRISPRDetect has significant advantages when compared to existing identification tools. As well as further support for small medium and large repeats, CRISPRDetect identified a class of arrays with 'extra-large' repeats in bacteria (repeats 44-50 nt). The CRISPRDetect output is integrated with other analysis tools. Notably, the predicted spacers can be directly utilised by CRISPRTarget to predict targets. CRISPRDetect enables more accurate detection of arrays and spacers and its gff output is suitable for inclusion in genome annotation pipelines and visualisation. It has been used to analyse all complete bacterial and archaeal reference genomes.
A conceptual study of automatic and semi-automatic quality assurance techniques for round image processing

NASA Technical Reports Server (NTRS)

1983-01-01

This report summarizes the results of a study conducted by Engineering and Economics Research (EER), Inc. under NASA Contract Number NAS5-27513. The study involved the development of preliminary concepts for automatic and semiautomatic quality assurance (QA) techniques for ground image processing. A distinction is made between quality assessment and the more comprehensive quality assurance which includes decision making and system feedback control in response to quality assessment.
GANESH: software for customized annotation of genome regions.

PubMed

Huntley, Derek; Hummerich, Holger; Smedley, Damian; Kittivoravitkul, Sasivimol; McCarthy, Mark; Little, Peter; Sergot, Marek

2003-09-01

GANESH is a software package designed to support the genetic analysis of regions of human and other genomes. It provides a set of components that may be assembled to construct a self-updating database of DNA sequence, mapping data, and annotations of possible genome features. Once one or more remote sources of data for the target region have been identified, all sequences for that region are downloaded, assimilated, and subjected to a (configurable) set of standard database-searching and genome-analysis packages. The results are stored in compressed form in a relational database, and are updated automatically on a regular schedule so that they are always immediately available in their most up-to-date versions. A Java front-end, executed as a stand alone application or web applet, provides a graphical interface for navigating the database and for viewing the annotations. There are facilities for importing and exporting data in the format of the Distributed Annotation System (DAS), enabling a GANESH database to be used as a component of a DAS configuration. The system has been used to construct databases for about a dozen regions of human chromosomes and for three regions of mouse chromosomes.
Comparison of computer systems and ranking criteria for automatic melanoma detection in dermoscopic images.

PubMed

Møllersen, Kajsa; Zortea, Maciel; Schopf, Thomas R; Kirchesch, Herbert; Godtliebsen, Fred

2017-01-01

Melanoma is the deadliest form of skin cancer, and early detection is crucial for patient survival. Computer systems can assist in melanoma detection, but are not widespread in clinical practice. In 2016, an open challenge in classification of dermoscopic images of skin lesions was announced. A training set of 900 images with corresponding class labels and semi-automatic/manual segmentation masks was released for the challenge. An independent test set of 379 images, of which 75 were of melanomas, was used to rank the participants. This article demonstrates the impact of ranking criteria, segmentation method and classifier, and highlights the clinical perspective. We compare five different measures for diagnostic accuracy by analysing the resulting ranking of the computer systems in the challenge. Choice of performance measure had great impact on the ranking. Systems that were ranked among the top three for one measure, dropped to the bottom half when changing performance measure. Nevus Doctor, a computer system previously developed by the authors, was used to participate in the challenge, and investigate the impact of segmentation and classifier. The diagnostic accuracy when using an automatic versus the semi-automatic/manual segmentation is investigated. The unexpected small impact of segmentation method suggests that improvements of the automatic segmentation method w.r.t. resemblance to semi-automatic/manual segmentation will not improve diagnostic accuracy substantially. A small set of similar classification algorithms are used to investigate the impact of classifier on the diagnostic accuracy. The variability in diagnostic accuracy for different classifier algorithms was larger than the variability for segmentation methods, and suggests a focus for future investigations. From a clinical perspective, the misclassification of a melanoma as benign has far greater cost than the misclassification of a benign lesion. For computer systems to have clinical impact, their performance should be ranked by a high-sensitivity measure.
A methodology for the semi-automatic digital image analysis of fragmental impactites

NASA Astrophysics Data System (ADS)

Chanou, A.; Osinski, G. R.; Grieve, R. A. F.

2014-04-01

A semi-automated digital image analysis method is developed for the comparative textural study of impact melt-bearing breccias. This method uses the freeware software ImageJ developed by the National Institute of Health (NIH). Digital image analysis is performed on scans of hand samples (10-15 cm across), based on macroscopic interpretations of the rock components. All image processing and segmentation are done semi-automatically, with the least possible manual intervention. The areal fraction of components is estimated and modal abundances can be deduced, where the physical optical properties (e.g., contrast, color) of the samples allow it. Other parameters that can be measured include, for example, clast size, clast-preferred orientations, average box-counting dimension or fragment shape complexity, and nearest neighbor distances (NnD). This semi-automated method allows the analysis of a larger number of samples in a relatively short time. Textures, granulometry, and shape descriptors are of considerable importance in rock characterization. The methodology is used to determine the variations of the physical characteristics of some examples of fragmental impactites.
Methods for automatic detection of artifacts in microelectrode recordings.

PubMed

Bakštein, Eduard; Sieger, Tomáš; Wild, Jiří; Novák, Daniel; Schneider, Jakub; Vostatek, Pavel; Urgošík, Dušan; Jech, Robert

2017-10-01

Extracellular microelectrode recording (MER) is a prominent technique for studies of extracellular single-unit neuronal activity. In order to achieve robust results in more complex analysis pipelines, it is necessary to have high quality input data with a low amount of artifacts. We show that noise (mainly electromagnetic interference and motion artifacts) may affect more than 25% of the recording length in a clinical MER database. We present several methods for automatic detection of noise in MER signals, based on (i) unsupervised detection of stationary segments, (ii) large peaks in the power spectral density, and (iii) a classifier based on multiple time- and frequency-domain features. We evaluate the proposed methods on a manually annotated database of 5735 ten-second MER signals from 58 Parkinson's disease patients. The existing methods for artifact detection in single-channel MER that have been rigorously tested, are based on unsupervised change-point detection. We show on an extensive real MER database that the presented techniques are better suited for the task of artifact identification and achieve much better results. The best-performing classifiers (bagging and decision tree) achieved artifact classification accuracy of up to 89% on an unseen test set and outperformed the unsupervised techniques by 5-10%. This was close to the level of agreement among raters using manual annotation (93.5%). We conclude that the proposed methods are suitable for automatic MER denoising and may help in the efficient elimination of undesirable signal artifacts. Copyright © 2017 Elsevier B.V. All rights reserved.
Toward knowledge-enhanced viewing using encyclopedias and model-based segmentation

NASA Astrophysics Data System (ADS)

Kneser, Reinhard; Lehmann, Helko; Geller, Dieter; Qian, Yue-Chen; Weese, Jürgen

2009-02-01

To make accurate decisions based on imaging data, radiologists must associate the viewed imaging data with the corresponding anatomical structures. Furthermore, given a disease hypothesis possible image findings which verify the hypothesis must be considered and where and how they are expressed in the viewed images. If rare anatomical variants, rare pathologies, unfamiliar protocols, or ambiguous findings are present, external knowledge sources such as medical encyclopedias are consulted. These sources are accessed using keywords typically describing anatomical structures, image findings, pathologies. In this paper we present our vision of how a patient's imaging data can be automatically enhanced with anatomical knowledge as well as knowledge about image findings. On one hand, we propose the automatic annotation of the images with labels from a standard anatomical ontology. These labels are used as keywords for a medical encyclopedia such as STATdx to access anatomical descriptions, information about pathologies and image findings. On the other hand we envision encyclopedias to contain links to region- and finding-specific image processing algorithms. Then a finding is evaluated on an image by applying the respective algorithm in the associated anatomical region. Towards realization of our vision, we present our method and results of automatic annotation of anatomical structures in 3D MRI brain images. Thereby we develop a complex surface mesh model incorporating major structures of the brain and a model-based segmentation method. We demonstrate the validity by analyzing the results of several training and segmentation experiments with clinical data focusing particularly on the visual pathway.
Temporal Cyber Attack Detection.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ingram, Joey Burton; Draelos, Timothy J.; Galiardi, Meghan

Rigorous characterization of the performance and generalization ability of cyber defense systems is extremely difficult, making it hard to gauge uncertainty, and thus, confidence. This difficulty largely stems from a lack of labeled attack data that fully explores the potential adversarial space. Currently, performance of cyber defense systems is typically evaluated in a qualitative manner by manually inspecting the results of the system on live data and adjusting as needed. Additionally, machine learning has shown promise in deriving models that automatically learn indicators of compromise that are more robust than analyst-derived detectors. However, to generate these models, most algorithms requiremore » large amounts of labeled data (i.e., examples of attacks). Algorithms that do not require annotated data to derive models are similarly at a disadvantage, because labeled data is still necessary when evaluating performance. In this work, we explore the use of temporal generative models to learn cyber attack graph representations and automatically generate data for experimentation and evaluation. Training and evaluating cyber systems and machine learning models requires significant, annotated data, which is typically collected and labeled by hand for one-off experiments. Automatically generating such data helps derive/evaluate detection models and ensures reproducibility of results. Experimentally, we demonstrate the efficacy of generative sequence analysis techniques on learning the structure of attack graphs, based on a realistic example. These derived models can then be used to generate more data. Additionally, we provide a roadmap for future research efforts in this area.« less
Semi-automated quantitative Drosophila wings measurements.

PubMed

Loh, Sheng Yang Michael; Ogawa, Yoshitaka; Kawana, Sara; Tamura, Koichiro; Lee, Hwee Kuan

2017-06-28

Drosophila melanogaster is an important organism used in many fields of biological research such as genetics and developmental biology. Drosophila wings have been widely used to study the genetics of development, morphometrics and evolution. Therefore there is much interest in quantifying wing structures of Drosophila. Advancement in technology has increased the ease in which images of Drosophila can be acquired. However such studies have been limited by the slow and tedious process of acquiring phenotypic data. We have developed a system that automatically detects and measures key points and vein segments on a Drosophila wing. Key points are detected by performing image transformations and template matching on Drosophila wing images while vein segments are detected using an Active Contour algorithm. The accuracy of our key point detection was compared against key point annotations of users. We also performed key point detection using different training data sets of Drosophila wing images. We compared our software with an existing automated image analysis system for Drosophila wings and showed that our system performs better than the state of the art. Vein segments were manually measured and compared against the measurements obtained from our system. Our system was able to detect specific key points and vein segments from Drosophila wing images with high accuracy.
PharmMapper server: a web server for potential drug target identification using pharmacophore mapping approach

PubMed Central

Liu, Xiaofeng; Ouyang, Sisheng; Yu, Biao; Liu, Yabo; Huang, Kai; Gong, Jiayu; Zheng, Siyuan; Li, Zhihua; Li, Honglin; Jiang, Hualiang

2010-01-01

In silico drug target identification, which includes many distinct algorithms for finding disease genes and proteins, is the first step in the drug discovery pipeline. When the 3D structures of the targets are available, the problem of target identification is usually converted to finding the best interaction mode between the potential target candidates and small molecule probes. Pharmacophore, which is the spatial arrangement of features essential for a molecule to interact with a specific target receptor, is an alternative method for achieving this goal apart from molecular docking method. PharmMapper server is a freely accessed web server designed to identify potential target candidates for the given small molecules (drugs, natural products or other newly discovered compounds with unidentified binding targets) using pharmacophore mapping approach. PharmMapper hosts a large, in-house repertoire of pharmacophore database (namely PharmTargetDB) annotated from all the targets information in TargetBank, BindingDB, DrugBank and potential drug target database, including over 7000 receptor-based pharmacophore models (covering over 1500 drug targets information). PharmMapper automatically finds the best mapping poses of the query molecule against all the pharmacophore models in PharmTargetDB and lists the top N best-fitted hits with appropriate target annotations, as well as respective molecule’s aligned poses are presented. Benefited from the highly efficient and robust triangle hashing mapping method, PharmMapper bears high throughput ability and only costs 1 h averagely to screen the whole PharmTargetDB. The protocol was successful in finding the proper targets among the top 300 pharmacophore candidates in the retrospective benchmarking test of tamoxifen. PharmMapper is available at http://59.78.96.61/pharmmapper. PMID:20430828
MICRA: an automatic pipeline for fast characterization of microbial genomes from high-throughput sequencing data.

PubMed

Caboche, Ségolène; Even, Gaël; Loywick, Alexandre; Audebert, Christophe; Hot, David

2017-12-19

The increase in available sequence data has advanced the field of microbiology; however, making sense of these data without bioinformatics skills is still problematic. We describe MICRA, an automatic pipeline, available as a web interface, for microbial identification and characterization through reads analysis. MICRA uses iterative mapping against reference genomes to identify genes and variations. Additional modules allow prediction of antibiotic susceptibility and resistance and comparing the results of several samples. MICRA is fast, producing few false-positive annotations and variant calls compared to current methods, making it a tool of great interest for fully exploiting sequencing data.
VESPA: software to facilitate genomic annotation of prokaryotic organisms through integration of proteomic and transcriptomic data.

PubMed

Peterson, Elena S; McCue, Lee Ann; Schrimpe-Rutledge, Alexandra C; Jensen, Jeffrey L; Walker, Hyunjoo; Kobold, Markus A; Webb, Samantha R; Payne, Samuel H; Ansong, Charles; Adkins, Joshua N; Cannon, William R; Webb-Robertson, Bobbie-Jo M

2012-04-05

The procedural aspects of genome sequencing and assembly have become relatively inexpensive, yet the full, accurate structural annotation of these genomes remains a challenge. Next-generation sequencing transcriptomics (RNA-Seq), global microarrays, and tandem mass spectrometry (MS/MS)-based proteomics have demonstrated immense value to genome curators as individual sources of information, however, integrating these data types to validate and improve structural annotation remains a major challenge. Current visual and statistical analytic tools are focused on a single data type, or existing software tools are retrofitted to analyze new data forms. We present Visual Exploration and Statistics to Promote Annotation (VESPA) is a new interactive visual analysis software tool focused on assisting scientists with the annotation of prokaryotic genomes though the integration of proteomics and transcriptomics data with current genome location coordinates. VESPA is a desktop Java™ application that integrates high-throughput proteomics data (peptide-centric) and transcriptomics (probe or RNA-Seq) data into a genomic context, all of which can be visualized at three levels of genomic resolution. Data is interrogated via searches linked to the genome visualizations to find regions with high likelihood of mis-annotation. Search results are linked to exports for further validation outside of VESPA or potential coding-regions can be analyzed concurrently with the software through interaction with BLAST. VESPA is demonstrated on two use cases (Yersinia pestis Pestoides F and Synechococcus sp. PCC 7002) to demonstrate the rapid manner in which mis-annotations can be found and explored in VESPA using either proteomics data alone, or in combination with transcriptomic data. VESPA is an interactive visual analytics tool that integrates high-throughput data into a genomic context to facilitate the discovery of structural mis-annotations in prokaryotic genomes. Data is evaluated via visual analysis across multiple levels of genomic resolution, linked searches and interaction with existing bioinformatics tools. We highlight the novel functionality of VESPA and core programming requirements for visualization of these large heterogeneous datasets for a client-side application. The software is freely available at https://www.biopilot.org/docs/Software/Vespa.php.

VESPA: software to facilitate genomic annotation of prokaryotic organisms through integration of proteomic and transcriptomic data

PubMed Central

2012-01-01

Background The procedural aspects of genome sequencing and assembly have become relatively inexpensive, yet the full, accurate structural annotation of these genomes remains a challenge. Next-generation sequencing transcriptomics (RNA-Seq), global microarrays, and tandem mass spectrometry (MS/MS)-based proteomics have demonstrated immense value to genome curators as individual sources of information, however, integrating these data types to validate and improve structural annotation remains a major challenge. Current visual and statistical analytic tools are focused on a single data type, or existing software tools are retrofitted to analyze new data forms. We present Visual Exploration and Statistics to Promote Annotation (VESPA) is a new interactive visual analysis software tool focused on assisting scientists with the annotation of prokaryotic genomes though the integration of proteomics and transcriptomics data with current genome location coordinates. Results VESPA is a desktop Java™ application that integrates high-throughput proteomics data (peptide-centric) and transcriptomics (probe or RNA-Seq) data into a genomic context, all of which can be visualized at three levels of genomic resolution. Data is interrogated via searches linked to the genome visualizations to find regions with high likelihood of mis-annotation. Search results are linked to exports for further validation outside of VESPA or potential coding-regions can be analyzed concurrently with the software through interaction with BLAST. VESPA is demonstrated on two use cases (Yersinia pestis Pestoides F and Synechococcus sp. PCC 7002) to demonstrate the rapid manner in which mis-annotations can be found and explored in VESPA using either proteomics data alone, or in combination with transcriptomic data. Conclusions VESPA is an interactive visual analytics tool that integrates high-throughput data into a genomic context to facilitate the discovery of structural mis-annotations in prokaryotic genomes. Data is evaluated via visual analysis across multiple levels of genomic resolution, linked searches and interaction with existing bioinformatics tools. We highlight the novel functionality of VESPA and core programming requirements for visualization of these large heterogeneous datasets for a client-side application. The software is freely available at https://www.biopilot.org/docs/Software/Vespa.php. PMID:22480257
INDIGO – INtegrated Data Warehouse of MIcrobial GenOmes with Examples from the Red Sea Extremophiles

PubMed Central

Alam, Intikhab; Antunes, André; Kamau, Allan Anthony; Ba alawi, Wail; Kalkatawi, Manal; Stingl, Ulrich; Bajic, Vladimir B.

2013-01-01

Background The next generation sequencing technologies substantially increased the throughput of microbial genome sequencing. To functionally annotate newly sequenced microbial genomes, a variety of experimental and computational methods are used. Integration of information from different sources is a powerful approach to enhance such annotation. Functional analysis of microbial genomes, necessary for downstream experiments, crucially depends on this annotation but it is hampered by the current lack of suitable information integration and exploration systems for microbial genomes. Results We developed a data warehouse system (INDIGO) that enables the integration of annotations for exploration and analysis of newly sequenced microbial genomes. INDIGO offers an opportunity to construct complex queries and combine annotations from multiple sources starting from genomic sequence to protein domain, gene ontology and pathway levels. This data warehouse is aimed at being populated with information from genomes of pure cultures and uncultured single cells of Red Sea bacteria and Archaea. Currently, INDIGO contains information from Salinisphaera shabanensis, Haloplasma contractile, and Halorhabdus tiamatea - extremophiles isolated from deep-sea anoxic brine lakes of the Red Sea. We provide examples of utilizing the system to gain new insights into specific aspects on the unique lifestyle and adaptations of these organisms to extreme environments. Conclusions We developed a data warehouse system, INDIGO, which enables comprehensive integration of information from various resources to be used for annotation, exploration and analysis of microbial genomes. It will be regularly updated and extended with new genomes. It is aimed to serve as a resource dedicated to the Red Sea microbes. In addition, through INDIGO, we provide our Automatic Annotation of Microbial Genomes (AAMG) pipeline. The INDIGO web server is freely available at http://www.cbrc.kaust.edu.sa/indigo. PMID:24324765
INDIGO - INtegrated data warehouse of microbial genomes with examples from the red sea extremophiles.

PubMed

Alam, Intikhab; Antunes, André; Kamau, Allan Anthony; Ba Alawi, Wail; Kalkatawi, Manal; Stingl, Ulrich; Bajic, Vladimir B

2013-01-01

The next generation sequencing technologies substantially increased the throughput of microbial genome sequencing. To functionally annotate newly sequenced microbial genomes, a variety of experimental and computational methods are used. Integration of information from different sources is a powerful approach to enhance such annotation. Functional analysis of microbial genomes, necessary for downstream experiments, crucially depends on this annotation but it is hampered by the current lack of suitable information integration and exploration systems for microbial genomes. We developed a data warehouse system (INDIGO) that enables the integration of annotations for exploration and analysis of newly sequenced microbial genomes. INDIGO offers an opportunity to construct complex queries and combine annotations from multiple sources starting from genomic sequence to protein domain, gene ontology and pathway levels. This data warehouse is aimed at being populated with information from genomes of pure cultures and uncultured single cells of Red Sea bacteria and Archaea. Currently, INDIGO contains information from Salinisphaera shabanensis, Haloplasma contractile, and Halorhabdus tiamatea - extremophiles isolated from deep-sea anoxic brine lakes of the Red Sea. We provide examples of utilizing the system to gain new insights into specific aspects on the unique lifestyle and adaptations of these organisms to extreme environments. We developed a data warehouse system, INDIGO, which enables comprehensive integration of information from various resources to be used for annotation, exploration and analysis of microbial genomes. It will be regularly updated and extended with new genomes. It is aimed to serve as a resource dedicated to the Red Sea microbes. In addition, through INDIGO, we provide our Automatic Annotation of Microbial Genomes (AAMG) pipeline. The INDIGO web server is freely available at http://www.cbrc.kaust.edu.sa/indigo.
The coffee genome hub: a resource for coffee genomes

PubMed Central

Dereeper, Alexis; Bocs, Stéphanie; Rouard, Mathieu; Guignon, Valentin; Ravel, Sébastien; Tranchant-Dubreuil, Christine; Poncet, Valérie; Garsmeur, Olivier; Lashermes, Philippe; Droc, Gaëtan

2015-01-01

The whole genome sequence of Coffea canephora, the perennial diploid species known as Robusta, has been recently released. In the context of the C. canephora genome sequencing project and to support post-genomics efforts, we developed the Coffee Genome Hub (http://coffee-genome.org/), an integrative genome information system that allows centralized access to genomics and genetics data and analysis tools to facilitate translational and applied research in coffee. We provide the complete genome sequence of C. canephora along with gene structure, gene product information, metabolism, gene families, transcriptomics, syntenic blocks, genetic markers and genetic maps. The hub relies on generic software (e.g. GMOD tools) for easy querying, visualizing and downloading research data. It includes a Genome Browser enhanced by a Community Annotation System, enabling the improvement of automatic gene annotation through an annotation editor. In addition, the hub aims at developing interoperability among other existing South Green tools managing coffee data (phylogenomics resources, SNPs) and/or supporting data analyses with the Galaxy workflow manager. PMID:25392413
The Universal Protein Resource (UniProt): an expanding universe of protein information.

PubMed

Wu, Cathy H; Apweiler, Rolf; Bairoch, Amos; Natale, Darren A; Barker, Winona C; Boeckmann, Brigitte; Ferro, Serenella; Gasteiger, Elisabeth; Huang, Hongzhan; Lopez, Rodrigo; Magrane, Michele; Martin, Maria J; Mazumder, Raja; O'Donovan, Claire; Redaschi, Nicole; Suzek, Baris

2006-01-01

The Universal Protein Resource (UniProt) provides a central resource on protein sequences and functional annotation with three database components, each addressing a key need in protein bioinformatics. The UniProt Knowledgebase (UniProtKB), comprising the manually annotated UniProtKB/Swiss-Prot section and the automatically annotated UniProtKB/TrEMBL section, is the preeminent storehouse of protein annotation. The extensive cross-references, functional and feature annotations and literature-based evidence attribution enable scientists to analyse proteins and query across databases. The UniProt Reference Clusters (UniRef) speed similarity searches via sequence space compression by merging sequences that are 100% (UniRef100), 90% (UniRef90) or 50% (UniRef50) identical. Finally, the UniProt Archive (UniParc) stores all publicly available protein sequences, containing the history of sequence data with links to the source databases. UniProt databases continue to grow in size and in availability of information. Recent and upcoming changes to database contents, formats, controlled vocabularies and services are described. New download availability includes all major releases of UniProtKB, sequence collections by taxonomic division and complete proteomes. A bibliography mapping service has been added, and an ID mapping service will be available soon. UniProt databases can be accessed online at http://www.uniprot.org or downloaded at ftp://ftp.uniprot.org/pub/databases/.
Neurocarta: aggregating and sharing disease-gene relations for the neurosciences.

PubMed

Portales-Casamar, Elodie; Ch'ng, Carolyn; Lui, Frances; St-Georges, Nicolas; Zoubarev, Anton; Lai, Artemis Y; Lee, Mark; Kwok, Cathy; Kwok, Willie; Tseng, Luchia; Pavlidis, Paul

2013-02-26

Understanding the genetic basis of diseases is key to the development of better diagnoses and treatments. Unfortunately, only a small fraction of the existing data linking genes to phenotypes is available through online public resources and, when available, it is scattered across multiple access tools. Neurocarta is a knowledgebase that consolidates information on genes and phenotypes across multiple resources and allows tracking and exploring of the associations. The system enables automatic and manual curation of evidence supporting each association, as well as user-enabled entry of their own annotations. Phenotypes are recorded using controlled vocabularies such as the Disease Ontology to facilitate computational inference and linking to external data sources. The gene-to-phenotype associations are filtered by stringent criteria to focus on the annotations most likely to be relevant. Neurocarta is constantly growing and currently holds more than 30,000 lines of evidence linking over 7,000 genes to 2,000 different phenotypes. Neurocarta is a one-stop shop for researchers looking for candidate genes for any disorder of interest. In Neurocarta, they can review the evidence linking genes to phenotypes and filter out the evidence they're not interested in. In addition, researchers can enter their own annotations from their experiments and analyze them in the context of existing public annotations. Neurocarta's in-depth annotation of neurodevelopmental disorders makes it a unique resource for neuroscientists working on brain development.
Active subnetwork recovery with a mechanism-dependent scoring function; with application to angiogenesis and organogenesis studies

PubMed Central

2013-01-01

Background The learning active subnetworks problem involves finding subnetworks of a bio-molecular network that are active in a particular condition. Many approaches integrate observation data (e.g., gene expression) with the network topology to find candidate subnetworks. Increasingly, pathway databases contain additional annotation information that can be mined to improve prediction accuracy, e.g., interaction mechanism (e.g., transcription, microRNA, cleavage) annotations. We introduce a mechanism-based approach to active subnetwork recovery which exploits such annotations. We suggest that neighboring interactions in a network tend to be co-activated in a way that depends on the “correlation” of their mechanism annotations. e.g., neighboring phosphorylation and de-phosphorylation interactions may be more likely to be co-activated than neighboring phosphorylation and covalent bonding interactions. Results Our method iteratively learns the mechanism correlations and finds the most likely active subnetwork. We use a probabilistic graphical model with a Markov Random Field component which creates dependencies between the states (active or non-active) of neighboring interactions, that incorporates a mechanism-based component to the function. We apply a heuristic-based EM-based algorithm suitable for the problem. We validated our method’s performance using simulated data in networks downloaded from GeneGO against the same approach without the mechanism-based component, and two other existing methods. We validated our methods performance in correctly recovering (1) the true interaction states, and (2) global network properties of the original network against these other methods. We applied our method to networks generated from time-course gene expression studies in angiogenesis and lung organogenesis and validated the findings from a biological perspective against current literature. Conclusions The advantage of our mechanism-based approach is best seen in networks composed of connected regions with a large number of interactions annotated with a subset of mechanisms, e.g., a regulatory region of transcription interactions, or a cleavage cascade region. When applied to real datasets, our method recovered novel and biologically meaningful putative interactions, e.g., interactions from an integrin signaling pathway using the angiogenesis dataset, and a group of regulatory microRNA interactions in an organogenesis network. PMID:23432934
ANALYTiC: An Active Learning System for Trajectory Classification.

PubMed

Soares Junior, Amilcar; Renso, Chiara; Matwin, Stan

2017-01-01

The increasing availability and use of positioning devices has resulted in large volumes of trajectory data. However, semantic annotations for such data are typically added by domain experts, which is a time-consuming task. Machine-learning algorithms can help infer semantic annotations from trajectory data by learning from sets of labeled data. Specifically, active learning approaches can minimize the set of trajectories to be annotated while preserving good performance measures. The ANALYTiC web-based interactive tool visually guides users through this annotation process.
Modeling semantic aspects for cross-media image indexing.

PubMed

Monay, Florent; Gatica-Perez, Daniel

2007-10-01

To go beyond the query-by-example paradigm in image retrieval, there is a need for semantic indexing of large image collections for intuitive text-based image search. Different models have been proposed to learn the dependencies between the visual content of an image set and the associated text captions, then allowing for the automatic creation of semantic indices for unannotated images. The task, however, remains unsolved. In this paper, we present three alternatives to learn a Probabilistic Latent Semantic Analysis model (PLSA) for annotated images, and evaluate their respective performance for automatic image indexing. Under the PLSA assumptions, an image is modeled as a mixture of latent aspects that generates both image features and text captions, and we investigate three ways to learn the mixture of aspects. We also propose a more discriminative image representation than the traditional Blob histogram, concatenating quantized local color information and quantized local texture descriptors. The first learning procedure of a PLSA model for annotated images is a standard EM algorithm, which implicitly assumes that the visual and the textual modalities can be treated equivalently. The other two models are based on an asymmetric PLSA learning, allowing to constrain the definition of the latent space on the visual or on the textual modality. We demonstrate that the textual modality is more appropriate to learn a semantically meaningful latent space, which translates into improved annotation performance. A comparison of our learning algorithms with respect to recent methods on a standard dataset is presented, and a detailed evaluation of the performance shows the validity of our framework.
ODMSummary: A Tool for Automatic Structured Comparison of Multiple Medical Forms Based on Semantic Annotation with the Unified Medical Language System.

PubMed

Storck, Michael; Krumm, Rainer; Dugas, Martin

2016-01-01

Medical documentation is applied in various settings including patient care and clinical research. Since procedures of medical documentation are heterogeneous and developed further, secondary use of medical data is complicated. Development of medical forms, merging of data from different sources and meta-analyses of different data sets are currently a predominantly manual process and therefore difficult and cumbersome. Available applications to automate these processes are limited. In particular, tools to compare multiple documentation forms are missing. The objective of this work is to design, implement and evaluate the new system ODMSummary for comparison of multiple forms with a high number of semantically annotated data elements and a high level of usability. System requirements are the capability to summarize and compare a set of forms, enable to estimate the documentation effort, track changes in different versions of forms and find comparable items in different forms. Forms are provided in Operational Data Model format with semantic annotations from the Unified Medical Language System. 12 medical experts were invited to participate in a 3-phase evaluation of the tool regarding usability. ODMSummary (available at https://odmtoolbox.uni-muenster.de/summary/summary.html) provides a structured overview of multiple forms and their documentation fields. This comparison enables medical experts to assess multiple forms or whole datasets for secondary use. System usability was optimized based on expert feedback. The evaluation demonstrates that feedback from domain experts is needed to identify usability issues. In conclusion, this work shows that automatic comparison of multiple forms is feasible and the results are usable for medical experts.
Measuring Patient Mobility in the ICU Using a Novel Noninvasive Sensor

PubMed Central

Ma, Andy J.; Rawat, Nishi; Reiter, Austin; Shrock, Christine; Zhan, Andong; Stone, Alex; Rabiee, Anahita; Griffin, Stephanie; Needham, Dale M.; Saria, Suchi

2017-01-01

Objectives To develop and validate a noninvasive mobility sensor to automatically and continuously detect and measure patient mobility in the ICU. Design Prospective, observational study. Setting Surgical ICU at an academic hospital. Patients Three hundred sixty-two hours of sensor color and depth image data were recorded and curated into 109 segments, each containing 1,000 images, from eight patients. Interventions None. Measurements and Main Results Three Microsoft Kinect sensors (Microsoft, Beijing, China) were deployed in one ICU room to collect continuous patient mobility data. We developed software that automatically analyzes the sensor data to measure mobility and assign the highest level within a time period. To characterize the highest mobility level, a validated 11-point mobility scale was collapsed into four categories: nothing in bed, in-bed activity, out-of-bed activity, and walking. Of the 109 sensor segments, the noninvasive mobility sensor was developed using 26 of these from three ICU patients and validated on 83 remaining segments from five different patients. Three physicians annotated each segment for the highest mobility level. The weighted Kappa (κ) statistic for agreement between automated noninvasive mobility sensor output versus manual physician annotation was 0.86 (95% CI, 0.72–1.00). Disagreement primarily occurred in the “nothing in bed” versus “in-bed activity” categories because “the sensor assessed movement continuously,” which was significantly more sensitive to motion than physician annotations using a discrete manual scale. Conclusions Noninvasive mobility sensor is a novel and feasible method for automating evaluation of ICU patient mobility. PMID:28291092
Phenotyping for patient safety: algorithm development for electronic health record based automated adverse event and medical error detection in neonatal intensive care.

PubMed

Li, Qi; Melton, Kristin; Lingren, Todd; Kirkendall, Eric S; Hall, Eric; Zhai, Haijun; Ni, Yizhao; Kaiser, Megan; Stoutenborough, Laura; Solti, Imre

2014-01-01

Although electronic health records (EHRs) have the potential to provide a foundation for quality and safety algorithms, few studies have measured their impact on automated adverse event (AE) and medical error (ME) detection within the neonatal intensive care unit (NICU) environment. This paper presents two phenotyping AE and ME detection algorithms (ie, IV infiltrations, narcotic medication oversedation and dosing errors) and describes manual annotation of airway management and medication/fluid AEs from NICU EHRs. From 753 NICU patient EHRs from 2011, we developed two automatic AE/ME detection algorithms, and manually annotated 11 classes of AEs in 3263 clinical notes. Performance of the automatic AE/ME detection algorithms was compared to trigger tool and voluntary incident reporting results. AEs in clinical notes were double annotated and consensus achieved under neonatologist supervision. Sensitivity, positive predictive value (PPV), and specificity are reported. Twelve severe IV infiltrates were detected. The algorithm identified one more infiltrate than the trigger tool and eight more than incident reporting. One narcotic oversedation was detected demonstrating 100% agreement with the trigger tool. Additionally, 17 narcotic medication MEs were detected, an increase of 16 cases over voluntary incident reporting. Automated AE/ME detection algorithms provide higher sensitivity and PPV than currently used trigger tools or voluntary incident-reporting systems, including identification of potential dosing and frequency errors that current methods are unequipped to detect. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
Automatic pelvis segmentation from x-ray images of a mouse model

NASA Astrophysics Data System (ADS)

Al Okashi, Omar M.; Du, Hongbo; Al-Assam, Hisham

2017-05-01

The automatic detection and quantification of skeletal structures has a variety of different applications for biological research. Accurate segmentation of the pelvis from X-ray images of mice in a high-throughput project such as the Mouse Genomes Project not only saves time and cost but also helps achieving an unbiased quantitative analysis within the phenotyping pipeline. This paper proposes an automatic solution for pelvis segmentation based on structural and orientation properties of the pelvis in X-ray images. The solution consists of three stages including pre-processing image to extract pelvis area, initial pelvis mask preparation and final pelvis segmentation. Experimental results on a set of 100 X-ray images showed consistent performance of the algorithm. The automated solution overcomes the weaknesses of a manual annotation procedure where intra- and inter-observer variations cannot be avoided.
Supporting Semantic Annotation of Educational Content by Automatic Extraction of Hierarchical Domain Relationships

ERIC Educational Resources Information Center

Vrablecová, Petra; Šimko, Marián

2016-01-01

The domain model is an essential part of an adaptive learning system. For each educational course, it involves educational content and semantics, which is also viewed as a form of conceptual metadata about educational content. Due to the size of a domain model, manual domain model creation is a challenging and demanding task for teachers or…
Datasets of Odontocete Sounds Annotated for Developing Automatic Detection Methods

DTIC Science & Technology

2010-12-01

Passive acoustic detection of Minke whales (Balaenoptera acutorostrata) off the West Coast of Kauai, HI. Book of abstracts, Fourth International...Workshop on Detection , Classification and Localization of Marine Mammals using Passive Acoustics , Pavia, Italy, Sept. 10- 13, 2009, p. 57. Roch, M., Y...Mellinger, and D. Gillespie. 2010. Comparison of beaked whale detection algorithms. Applied Acoustics 71:1043-1049. 8 References
Information Retrieval and Text Mining Technologies for Chemistry.

PubMed

Krallinger, Martin; Rabal, Obdulia; Lourenço, Anália; Oyarzabal, Julen; Valencia, Alfonso

2017-06-28

Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
Zone analysis in biology articles as a basis for information extraction.

PubMed

Mizuta, Yoko; Korhonen, Anna; Mullen, Tony; Collier, Nigel

2006-06-01

In the field of biomedicine, an overwhelming amount of experimental data has become available as a result of the high throughput of research in this domain. The amount of results reported has now grown beyond the limits of what can be managed by manual means. This makes it increasingly difficult for the researchers in this area to keep up with the latest developments. Information extraction (IE) in the biological domain aims to provide an effective automatic means to dynamically manage the information contained in archived journal articles and abstract collections and thus help researchers in their work. However, while considerable advances have been made in certain areas of IE, pinpointing and organizing factual information (such as experimental results) remains a challenge. In this paper we propose tackling this task by incorporating into IE information about rhetorical zones, i.e. classification of spans of text in terms of argumentation and intellectual attribution. As the first step towards this goal, we introduce a scheme for annotating biological texts for rhetorical zones and provide a qualitative and quantitative analysis of the data annotated according to this scheme. We also discuss our preliminary research on automatic zone analysis, and its incorporation into our IE framework.
Effectiveness of image features and similarity measures in cluster-based approaches for content-based image retrieval

NASA Astrophysics Data System (ADS)

Du, Hongbo; Al-Jubouri, Hanan; Sellahewa, Harin

2014-05-01

Content-based image retrieval is an automatic process of retrieving images according to image visual contents instead of textual annotations. It has many areas of application from automatic image annotation and archive, image classification and categorization to homeland security and law enforcement. The key issues affecting the performance of such retrieval systems include sensible image features that can effectively capture the right amount of visual contents and suitable similarity measures to find similar and relevant images ranked in a meaningful order. Many different approaches, methods and techniques have been developed as a result of very intensive research in the past two decades. Among many existing approaches, is a cluster-based approach where clustering methods are used to group local feature descriptors into homogeneous regions, and search is conducted by comparing the regions of the query image against those of the stored images. This paper serves as a review of works in this area. The paper will first summarize the existing work reported in the literature and then present the authors' own investigations in this field. The paper intends to highlight not only achievements made by recent research but also challenges and difficulties still remaining in this area.
SURE reliability analysis: Program and mathematics

NASA Technical Reports Server (NTRS)

Butler, Ricky W.; White, Allan L.

1988-01-01

The SURE program is a new reliability analysis tool for ultrareliable computer system architectures. The computational methods on which the program is based provide an efficient means for computing accurate upper and lower bounds for the death state probabilities of a large class of semi-Markov models. Once a semi-Markov model is described using a simple input language, the SURE program automatically computes the upper and lower bounds on the probability of system failure. A parameter of the model can be specified as a variable over a range of values directing the SURE program to perform a sensitivity analysis automatically. This feature, along with the speed of the program, makes it especially useful as a design tool.
The web server of IBM's Bioinformatics and Pattern Discovery group.

PubMed

Huynh, Tien; Rigoutsos, Isidore; Parida, Laxmi; Platt, Daniel; Shibuya, Tetsuo

2003-07-01

We herein present and discuss the services and content which are available on the web server of IBM's Bioinformatics and Pattern Discovery group. The server is operational around the clock and provides access to a variety of methods that have been published by the group's members and collaborators. The available tools correspond to applications ranging from the discovery of patterns in streams of events and the computation of multiple sequence alignments, to the discovery of genes in nucleic acid sequences and the interactive annotation of amino acid sequences. Additionally, annotations for more than 70 archaeal, bacterial, eukaryotic and viral genomes are available on-line and can be searched interactively. The tools and code bundles can be accessed beginning at http://cbcsrv.watson.ibm.com/Tspd.html whereas the genomics annotations are available at http://cbcsrv.watson.ibm.com/Annotations/.

The web server of IBM's Bioinformatics and Pattern Discovery group

PubMed Central

Huynh, Tien; Rigoutsos, Isidore; Parida, Laxmi; Platt, Daniel; Shibuya, Tetsuo

2003-01-01

We herein present and discuss the services and content which are available on the web server of IBM's Bioinformatics and Pattern Discovery group. The server is operational around the clock and provides access to a variety of methods that have been published by the group's members and collaborators. The available tools correspond to applications ranging from the discovery of patterns in streams of events and the computation of multiple sequence alignments, to the discovery of genes in nucleic acid sequences and the interactive annotation of amino acid sequences. Additionally, annotations for more than 70 archaeal, bacterial, eukaryotic and viral genomes are available on-line and can be searched interactively. The tools and code bundles can be accessed beginning at http://cbcsrv.watson.ibm.com/Tspd.html whereas the genomics annotations are available at http://cbcsrv.watson.ibm.com/Annotations/. PMID:12824385
DASMI: exchanging, annotating and assessing molecular interaction data.

PubMed

Blankenburg, Hagen; Finn, Robert D; Prlić, Andreas; Jenkinson, Andrew M; Ramírez, Fidel; Emig, Dorothea; Schelhorn, Sven-Eric; Büch, Joachim; Lengauer, Thomas; Albrecht, Mario

2009-05-15

Ever increasing amounts of biological interaction data are being accumulated worldwide, but they are currently not readily accessible to the biologist at a single site. New techniques are required for retrieving, sharing and presenting data spread over the Internet. We introduce the DASMI system for the dynamic exchange, annotation and assessment of molecular interaction data. DASMI is based on the widely used Distributed Annotation System (DAS) and consists of a data exchange specification, web servers for providing the interaction data and clients for data integration and visualization. The decentralized architecture of DASMI affords the online retrieval of the most recent data from distributed sources and databases. DASMI can also be extended easily by adding new data sources and clients. We describe all DASMI components and demonstrate their use for protein and domain interactions. The DASMI tools are available at http://www.dasmi.de/ and http://ipfam.sanger.ac.uk/graph. The DAS registry and the DAS 1.53E specification is found at http://www.dasregistry.org/.
NCBI disease corpus: a resource for disease name recognition and concept normalization.

PubMed

Doğan, Rezarta Islamaj; Leaman, Robert; Lu, Zhiyong

2014-02-01

Information encoded in natural language in biomedical literature publications is only useful if efficient and reliable ways of accessing and analyzing that information are available. Natural language processing and text mining tools are therefore essential for extracting valuable information, however, the development of powerful, highly effective tools to automatically detect central biomedical concepts such as diseases is conditional on the availability of annotated corpora. This paper presents the disease name and concept annotations of the NCBI disease corpus, a collection of 793 PubMed abstracts fully annotated at the mention and concept level to serve as a research resource for the biomedical natural language processing community. Each PubMed abstract was manually annotated by two annotators with disease mentions and their corresponding concepts in Medical Subject Headings (MeSH®) or Online Mendelian Inheritance in Man (OMIM®). Manual curation was performed using PubTator, which allowed the use of pre-annotations as a pre-step to manual annotations. Fourteen annotators were randomly paired and differing annotations were discussed for reaching a consensus in two annotation phases. In this setting, a high inter-annotator agreement was observed. Finally, all results were checked against annotations of the rest of the corpus to assure corpus-wide consistency. The public release of the NCBI disease corpus contains 6892 disease mentions, which are mapped to 790 unique disease concepts. Of these, 88% link to a MeSH identifier, while the rest contain an OMIM identifier. We were able to link 91% of the mentions to a single disease concept, while the rest are described as a combination of concepts. In order to help researchers use the corpus to design and test disease identification methods, we have prepared the corpus as training, testing and development sets. To demonstrate its utility, we conducted a benchmarking experiment where we compared three different knowledge-based disease normalization methods with a best performance in F-measure of 63.7%. These results show that the NCBI disease corpus has the potential to significantly improve the state-of-the-art in disease name recognition and normalization research, by providing a high-quality gold standard thus enabling the development of machine-learning based approaches for such tasks. The NCBI disease corpus, guidelines and other associated resources are available at: http://www.ncbi.nlm.nih.gov/CBBresearch/Dogan/DISEASE/. Published by Elsevier Inc.
Functional annotation of regulatory pathways.

PubMed

Pandey, Jayesh; Koyutürk, Mehmet; Kim, Yohan; Szpankowski, Wojciech; Subramaniam, Shankar; Grama, Ananth

2007-07-01

Standardized annotations of biomolecules in interaction networks (e.g. Gene Ontology) provide comprehensive understanding of the function of individual molecules. Extending such annotations to pathways is a critical component of functional characterization of cellular signaling at the systems level. We propose a framework for projecting gene regulatory networks onto the space of functional attributes using multigraph models, with the objective of deriving statistically significant pathway annotations. We first demonstrate that annotations of pairwise interactions do not generalize to indirect relationships between processes. Motivated by this result, we formalize the problem of identifying statistically overrepresented pathways of functional attributes. We establish the hardness of this problem by demonstrating the non-monotonicity of common statistical significance measures. We propose a statistical model that emphasizes the modularity of a pathway, evaluating its significance based on the coupling of its building blocks. We complement the statistical model by an efficient algorithm and software, Narada, for computing significant pathways in large regulatory networks. Comprehensive results from our methods applied to the Escherichia coli transcription network demonstrate that our approach is effective in identifying known, as well as novel biological pathway annotations. Narada is implemented in Java and is available at http://www.cs.purdue.edu/homes/jpandey/narada/.
Co-complex protein membership evaluation using Maximum Entropy on GO ontology and InterPro annotation.

PubMed

Armean, Irina M; Lilley, Kathryn S; Trotter, Matthew W B; Pilkington, Nicholas C V; Holden, Sean B

2018-06-01

Protein-protein interactions (PPI) play a crucial role in our understanding of protein function and biological processes. The standardization and recording of experimental findings is increasingly stored in ontologies, with the Gene Ontology (GO) being one of the most successful projects. Several PPI evaluation algorithms have been based on the application of probabilistic frameworks or machine learning algorithms to GO properties. Here, we introduce a new training set design and machine learning based approach that combines dependent heterogeneous protein annotations from the entire ontology to evaluate putative co-complex protein interactions determined by empirical studies. PPI annotations are built combinatorically using corresponding GO terms and InterPro annotation. We use a S.cerevisiae high-confidence complex dataset as a positive training set. A series of classifiers based on Maximum Entropy and support vector machines (SVMs), each with a composite counterpart algorithm, are trained on a series of training sets. These achieve a high performance area under the ROC curve of ≤0.97, outperforming go2ppi-a previously established prediction tool for protein-protein interactions (PPI) based on Gene Ontology (GO) annotations. https://github.com/ima23/maxent-ppi. sbh11@cl.cam.ac.uk. Supplementary data are available at Bioinformatics online.
TnpPred: A Web Service for the Robust Prediction of Prokaryotic Transposases

PubMed Central

Riadi, Gonzalo; Medina-Moenne, Cristobal; Holmes, David S.

2012-01-01

Transposases (Tnps) are enzymes that participate in the movement of insertion sequences (ISs) within and between genomes. Genes that encode Tnps are amongst the most abundant and widely distributed genes in nature. However, they are difficult to predict bioinformatically and given the increasing availability of prokaryotic genomes and metagenomes, it is incumbent to develop rapid, high quality automatic annotation of ISs. This need prompted us to develop a web service, termed TnpPred for Tnp discovery. It provides better sensitivity and specificity for Tnp predictions than given by currently available programs as determined by ROC analysis. TnpPred should be useful for improving genome annotation. The TnpPred web service is freely available for noncommercial use. PMID:23251097
Software Tool for Researching Annotations of Proteins (STRAP): Open-Source Protein Annotation Software with Data Visualization

PubMed Central

Bhatia, Vivek N.; Perlman, David H.; Costello, Catherine E.; McComb, Mark E.

2009-01-01

In order that biological meaning may be derived and testable hypotheses may be built from proteomics experiments, assignments of proteins identified by mass spectrometry or other techniques must be supplemented with additional notation, such as information on known protein functions, protein-protein interactions, or biological pathway associations. Collecting, organizing, and interpreting this data often requires the input of experts in the biological field of study, in addition to the time-consuming search for and compilation of information from online protein databases. Furthermore, visualizing this bulk of information can be challenging due to the limited availability of easy-to-use and freely available tools for this process. In response to these constraints, we have undertaken the design of software to automate annotation and visualization of proteomics data in order to accelerate the pace of research. Here we present the Software Tool for Researching Annotations of Proteins (STRAP) – a user-friendly, open-source C# application. STRAP automatically obtains gene ontology (GO) terms associated with proteins in a proteomics results ID list using the freely accessible UniProtKB and EBI GOA databases. Summarized in an easy-to-navigate tabular format, STRAP includes meta-information on the protein in addition to complimentary GO terminology. Additionally, this information can be edited by the user so that in-house expertise on particular proteins may be integrated into the larger dataset. STRAP provides a sortable tabular view for all terms, as well as graphical representations of GO-term association data in pie (biological process, cellular component and molecular function) and bar charts (cross comparison of sample sets) to aid in the interpretation of large datasets and differential analyses experiments. Furthermore, proteins of interest may be exported as a unique FASTA-formatted file to allow for customizable re-searching of mass spectrometry data, and gene names corresponding to the proteins in the lists may be encoded in the Gaggle microformat for further characterization, including pathway analysis. STRAP, a tutorial, and the C# source code are freely available from http://cpctools.sourceforge.net. PMID:19839595
Groping for quantitative digital 3-D image analysis: an approach to quantitative fluorescence in situ hybridization in thick tissue sections of prostate carcinoma.

PubMed

Rodenacker, K; Aubele, M; Hutzler, P; Adiga, P S

1997-01-01

In molecular pathology numerical chromosome aberrations have been found to be decisive for the prognosis of malignancy in tumours. The existence of such aberrations can be detected by interphase fluorescence in situ hybridization (FISH). The gain or loss of certain base sequences in the desoxyribonucleic acid (DNA) can be estimated by counting the number of FISH signals per cell nucleus. The quantitative evaluation of such events is a necessary condition for a prospective use in diagnostic pathology. To avoid occlusions of signals, the cell nucleus has to be analyzed in three dimensions. Confocal laser scanning microscopy is the means to obtain series of optical thin sections from fluorescence stained or marked material to fulfill the conditions mentioned above. A graphical user interface (GUI) to a software package for display, inspection, count and (semi-)automatic analysis of 3-D images for pathologists is outlined including the underlying methods of 3-D image interaction and segmentation developed. The preparative methods are briefly described. Main emphasis is given to the methodical questions of computer-aided analysis of large 3-D image data sets for pathologists. Several automated analysis steps can be performed for segmentation and succeeding quantification. However tumour material is in contrast to isolated or cultured cells even for visual inspection, a difficult material. For the present a fully automated digital image analysis of 3-D data is not in sight. A semi-automatic segmentation method is thus presented here.
Automatic Calibration of a Semi-Distributed Hydrologic Model Using Particle Swarm Optimization

NASA Astrophysics Data System (ADS)

Bekele, E. G.; Nicklow, J. W.

2005-12-01

Hydrologic simulation models need to be calibrated and validated before using them for operational predictions. Spatially-distributed hydrologic models generally have a large number of parameters to capture the various physical characteristics of a hydrologic system. Manual calibration of such models is a very tedious and daunting task, and its success depends on the subjective assessment of a particular modeler, which includes knowledge of the basic approaches and interactions in the model. In order to alleviate these shortcomings, an automatic calibration model, which employs an evolutionary optimization technique known as Particle Swarm Optimizer (PSO) for parameter estimation, is developed. PSO is a heuristic search algorithm that is inspired by social behavior of bird flocking or fish schooling. The newly-developed calibration model is integrated to the U.S. Department of Agriculture's Soil and Water Assessment Tool (SWAT). SWAT is a physically-based, semi-distributed hydrologic model that was developed to predict the long term impacts of land management practices on water, sediment and agricultural chemical yields in large complex watersheds with varying soils, land use, and management conditions. SWAT was calibrated for streamflow and sediment concentration. The calibration process involves parameter specification, whereby sensitive model parameters are identified, and parameter estimation. In order to reduce the number of parameters to be calibrated, parameterization was performed. The methodology is applied to a demonstration watershed known as Big Creek, which is located in southern Illinois. Application results show the effectiveness of the approach and model predictions are significantly improved.
20 CFR 220.133 - Skill requirements.

Code of Federal Regulations, 2011 CFR

2011-04-01

... needs little or no judgment to do simple duties that can be learned on the job in a short period of time... claimant can usually learn to do the job in 30 days, and little job training and judgment are needed. The... machines which are automatic or operated by others); or (4) Machine tending. (c) Semi-skilled work. Semi...
20 CFR 220.133 - Skill requirements.

Code of Federal Regulations, 2010 CFR

2010-04-01

... needs little or no judgment to do simple duties that can be learned on the job in a short period of time... claimant can usually learn to do the job in 30 days, and little job training and judgment are needed. The... machines which are automatic or operated by others); or (4) Machine tending. (c) Semi-skilled work. Semi...
20 CFR 220.133 - Skill requirements.

Code of Federal Regulations, 2014 CFR

2014-04-01

... needs little or no judgment to do simple duties that can be learned on the job in a short period of time... claimant can usually learn to do the job in 30 days, and little job training and judgment are needed. The... machines which are automatic or operated by others); or (4) Machine tending. (c) Semi-skilled work. Semi...
20 CFR 220.133 - Skill requirements.

Code of Federal Regulations, 2012 CFR

2012-04-01

... needs little or no judgment to do simple duties that can be learned on the job in a short period of time... claimant can usually learn to do the job in 30 days, and little job training and judgment are needed. The... machines which are automatic or operated by others); or (4) Machine tending. (c) Semi-skilled work. Semi...
20 CFR 220.133 - Skill requirements.

Code of Federal Regulations, 2013 CFR

2013-04-01

... needs little or no judgment to do simple duties that can be learned on the job in a short period of time... claimant can usually learn to do the job in 30 days, and little job training and judgment are needed. The... machines which are automatic or operated by others); or (4) Machine tending. (c) Semi-skilled work. Semi...
Optimized Graph Learning Using Partial Tags and Multiple Features for Image and Video Annotation.

PubMed

Song, Jingkuan; Gao, Lianli; Nie, Feiping; Shen, Heng Tao; Yan, Yan; Sebe, Nicu

2016-11-01

In multimedia annotation, due to the time constraints and the tediousness of manual tagging, it is quite common to utilize both tagged and untagged data to improve the performance of supervised learning when only limited tagged training data are available. This is often done by adding a geometry-based regularization term in the objective function of a supervised learning model. In this case, a similarity graph is indispensable to exploit the geometrical relationships among the training data points, and the graph construction scheme essentially determines the performance of these graph-based learning algorithms. However, most of the existing works construct the graph empirically and are usually based on a single feature without using the label information. In this paper, we propose a semi-supervised annotation approach by learning an optimized graph (OGL) from multi-cues (i.e., partial tags and multiple features), which can more accurately embed the relationships among the data points. Since OGL is a transductive method and cannot deal with novel data points, we further extend our model to address the out-of-sample issue. Extensive experiments on image and video annotation show the consistent superiority of OGL over the state-of-the-art methods.
A method for semi-automatic segmentation and evaluation of intracranial aneurysms in bone-subtraction computed tomography angiography (BSCTA) images

NASA Astrophysics Data System (ADS)

Krämer, Susanne; Ditt, Hendrik; Biermann, Christina; Lell, Michael; Keller, Jörg

2009-02-01

The rupture of an intracranial aneurysm has dramatic consequences for the patient. Hence early detection of unruptured aneurysms is of paramount importance. Bone-subtraction computed tomography angiography (BSCTA) has proven to be a powerful tool for detection of aneurysms in particular those located close to the skull base. Most aneurysms though are chance findings in BSCTA scans performed for other reasons. Therefore it is highly desirable to have techniques operating on standard BSCTA scans available which assist radiologists and surgeons in evaluation of intracranial aneurysms. In this paper we present a semi-automatic method for segmentation and assessment of intracranial aneurysms. The only user-interaction required is placement of a marker into the vascular malformation. Termination ensues automatically as soon as the segmentation reaches the vessels which feed the aneurysm. The algorithm is derived from an adaptive region-growing which employs a growth gradient as criterion for termination. Based on this segmentation values of high clinical and prognostic significance, such as volume, minimum and maximum diameter as well as surface of the aneurysm, are calculated automatically. the segmentation itself as well as the calculated diameters are visualised. Further segmentation of the adjoining vessels provides the means for visualisation of the topographical situation of vascular structures associated to the aneurysm. A stereolithographic mesh (STL) can be derived from the surface of the segmented volume. STL together with parameters like the resiliency of vascular wall tissue provide for an accurate wall model of the aneurysm and its associated vascular structures. Consequently the haemodynamic situation in the aneurysm itself and close to it can be assessed by flow modelling. Significant values of haemodynamics such as pressure onto the vascular wall, wall shear stress or pathlines of the blood flow can be computed. Additionally a dynamic flow model can be generated. Thus the presented method supports a better understanding of the clinical situation and assists the evaluation of therapeutic options. Furthermore it contributes to future research addressing intervention planning and prognostic assessment of intracranial aneurysms.
Construction of an annotated corpus to support biomedical information extraction

PubMed Central

Thompson, Paul; Iqbal, Syed A; McNaught, John; Ananiadou, Sophia

2009-01-01

Background Information Extraction (IE) is a component of text mining that facilitates knowledge discovery by automatically locating instances of interesting biomedical events from huge document collections. As events are usually centred on verbs and nominalised verbs, understanding the syntactic and semantic behaviour of these words is highly important. Corpora annotated with information concerning this behaviour can constitute a valuable resource in the training of IE components and resources. Results We have defined a new scheme for annotating sentence-bound gene regulation events, centred on both verbs and nominalised verbs. For each event instance, all participants (arguments) in the same sentence are identified and assigned a semantic role from a rich set of 13 roles tailored to biomedical research articles, together with a biological concept type linked to the Gene Regulation Ontology. To our knowledge, our scheme is unique within the biomedical field in terms of the range of event arguments identified. Using the scheme, we have created the Gene Regulation Event Corpus (GREC), consisting of 240 MEDLINE abstracts, in which events relating to gene regulation and expression have been annotated by biologists. A novel method of evaluating various different facets of the annotation task showed that average inter-annotator agreement rates fall within the range of 66% - 90%. Conclusion The GREC is a unique resource within the biomedical field, in that it annotates not only core relationships between entities, but also a range of other important details about these relationships, e.g., location, temporal, manner and environmental conditions. As such, it is specifically designed to support bio-specific tool and resource development. It has already been used to acquire semantic frames for inclusion within the BioLexicon (a lexical, terminological resource to aid biomedical text mining). Initial experiments have also shown that the corpus may viably be used to train IE components, such as semantic role labellers. The corpus and annotation guidelines are freely available for academic purposes. PMID:19852798
Tool Efficiency Analysis model research in SEMI industry

NASA Astrophysics Data System (ADS)

Lei, Ma; Nana, Zhang; Zhongqiu, Zhang

2018-06-01

One of the key goals in SEMI industry is to improve equipment through put and ensure equipment production efficiency maximization. This paper is based on SEMI standards in semiconductor equipment control, defines the transaction rules between different tool states, and presents a TEA system model which is to analysis tool performance automatically based on finite state machine. The system was applied to fab tools and verified its effectiveness successfully, and obtained the parameter values used to measure the equipment performance, also including the advices of improvement.
Dynamic deformable models for 3D MRI heart segmentation

NASA Astrophysics Data System (ADS)

Zhukov, Leonid; Bao, Zhaosheng; Gusikov, Igor; Wood, John; Breen, David E.

2002-05-01

Automated or semiautomated segmentation of medical images decreases interstudy variation, observer bias, and postprocessing time as well as providing clincally-relevant quantitative data. In this paper we present a new dynamic deformable modeling approach to 3D segmentation. It utilizes recently developed dynamic remeshing techniques and curvature estimation methods to produce high-quality meshes. The approach has been implemented in an interactive environment that allows a user to specify an initial model and identify key features in the data. These features act as hard constraints that the model must not pass through as it deforms. We have employed the method to perform semi-automatic segmentation of heart structures from cine MRI data.
Modeling mechanical cardiopulmonary interactions for virtual environments.

PubMed

Kaye, J M

1997-01-01

We have developed a computer system for modeling mechanical cardiopulmonary behavior in an interactive, 3D virtual environment. The system consists of a compact, scalar description of cardiopulmonary mechanics, with an emphasis on respiratory mechanics, that drives deformable 3D anatomy to simulate mechanical behaviors of and interactions between physiological systems. Such an environment can be used to facilitate exploration of cardiopulmonary physiology, particularly in situations that are difficult to reproduce clinically. We integrate 3D deformable body dynamics with new, formal models of (scalar) cardiorespiratory physiology, associating the scalar physiological variables and parameters with corresponding 3D anatomy. Our approach is amenable to modeling patient-specific circumstances in two ways. First, using CT scan data, we apply semi-automatic methods for extracting and reconstructing the anatomy to use in our simulations. Second, our scalar models are defined in terms of clinically-measurable, patient-specific parameters. This paper describes our approach and presents a sample of results showing normal breathing and acute effects of pneumothoraces.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.