Science.gov

Sample records for open biomedical annotator

  1. Comparison of concept recognizers for building the Open Biomedical Annotator.

    PubMed

    Shah, Nigam H; Bhatia, Nipun; Jonquet, Clement; Rubin, Daniel; Chiang, Annie P; Musen, Mark A

    2009-01-01

    The National Center for Biomedical Ontology (NCBO) is developing a system for automated, ontology-based access to online biomedical resources (Shah NH, et al.: Ontology-driven indexing of public datasets for translational bioinformatics. BMC Bioinformatics 2009, 10(Suppl 2):S1). The system's indexing workflow processes the text metadata of diverse resources such as datasets from GEO and ArrayExpress to annotate and index them with concepts from appropriate ontologies. This indexing requires the use of a concept-recognition tool to identify ontology concepts in the resource's textual metadata. In this paper, we present a comparison of two concept recognizers - NLM's MetaMap and the University of Michigan's Mgrep. We utilize a number of data sources and dictionaries to evaluate the concept recognizers in terms of precision, recall, speed of execution, scalability and customizability. Our evaluations demonstrate that Mgrep has a clear edge over MetaMap for large-scale service-oriented applications. Based on our analysis we also suggest areas of potential improvement for Mgrep. We have subsequently used Mgrep to build the Open Biomedical Annotator service. The Annotator service has access to a large dictionary of biomedical terms derived from the Unified Medical Language System (UMLS) and NCBO ontologies. The Annotator also leverages the hierarchical structure of the ontologies and their mappings to expand annotations. The Annotator service is available to the community as a REST Web service for creating ontology-based annotations of their data. PMID:19761568
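As a rough illustration of how precision and recall might be computed for a dictionary-based concept recognizer, the sketch below uses a toy exact-match recognizer. It is not Mgrep or MetaMap; the dictionary, concept IDs, sample text, and gold annotations are all invented for the example.

```python
import re


def recognize(text, dictionary):
    """Return the set of (start, end, concept_id) spans for dictionary terms found in text.

    Toy exact-match recognizer (case-insensitive, whole words); an assumption
    for illustration only, not the matching strategy of any real tool.
    """
    found = set()
    lowered = text.lower()
    for term, concept_id in dictionary.items():
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        for match in re.finditer(pattern, lowered):
            found.add((match.start(), match.end(), concept_id))
    return found


def precision_recall(predicted, gold):
    """Compute precision and recall of predicted spans against gold spans."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical dictionary and gold standard (made-up concept IDs).
TOY_DICTIONARY = {"melanoma": "C:0001", "skin": "C:0002", "cell line": "C:0003"}
text = "Expression profiling of melanoma cell line samples"
gold = {(24, 32, "C:0001"), (33, 42, "C:0003"), (43, 50, "C:0004")}

predicted = recognize(text, TOY_DICTIONARY)
precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here the recognizer finds two of the three gold spans and nothing spurious, so precision is high while recall is limited by the dictionary's coverage.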

  2. Ranking Biomedical Annotations with Annotator's Semantic Relevancy

    PubMed Central

    2014-01-01

    Biomedical annotation is a common and effective artifact for researchers to discuss, express opinions, and share discoveries. It is becoming increasingly popular in many online research communities and carries much useful information. Ranking biomedical annotations is a critical problem for data users who need to find information efficiently. As the annotator's knowledge about the annotated entity normally determines the quality of the annotations, we evaluate this knowledge, that is, the semantic relationship between annotator and entity, in two ways. The first is extracting relational information from credible websites by mining association rules between an annotator and a biomedical entity. The second is frequent pattern mining from historical annotations, which reveals common features of biomedical entities that an annotator can annotate with high quality. We propose a weighted and concept-extended RDF model to represent an annotator, a biomedical entity, and their background attributes, and merge information from the two ways as the context of an annotator. Based on that, we present a method to rank the annotations by evaluating their correctness according to users' votes and the semantic relevancy between the annotator and the annotated entity. The experimental results show that the approach is applicable and efficient even when the data set is large. PMID:24899918

  3. Ranking biomedical annotations with annotator's semantic relevancy.

    PubMed

    Wu, Aihua

    2014-01-01

    Biomedical annotation is a common and effective artifact for researchers to discuss, express opinions, and share discoveries. It is becoming increasingly popular in many online research communities and carries much useful information. Ranking biomedical annotations is a critical problem for data users who need to find information efficiently. As the annotator's knowledge about the annotated entity normally determines the quality of the annotations, we evaluate this knowledge, that is, the semantic relationship between annotator and entity, in two ways. The first is extracting relational information from credible websites by mining association rules between an annotator and a biomedical entity. The second is frequent pattern mining from historical annotations, which reveals common features of biomedical entities that an annotator can annotate with high quality. We propose a weighted and concept-extended RDF model to represent an annotator, a biomedical entity, and their background attributes, and merge information from the two ways as the context of an annotator. Based on that, we present a method to rank the annotations by evaluating their correctness according to users' votes and the semantic relevancy between the annotator and the annotated entity. The experimental results show that the approach is applicable and efficient even when the data set is large. PMID:24899918

  4. Corpus annotation for mining biomedical events from literature

    PubMed Central

    Kim, Jin-Dong; Ohta, Tomoko; Tsujii, Jun'ichi

    2008-01-01

    Background: Advanced Text Mining (TM) such as semantic enrichment of papers, event or relation extraction, and intelligent Question Answering have increasingly attracted attention in the bio-medical domain. For such attempts to succeed, text annotation from the biological point of view is indispensable. However, due to the complexity of the task, semantic annotation has never been tried on a large scale, apart from relatively simple term annotation. Results: We have completed a new type of semantic annotation, event annotation, which is an addition to the existing annotations in the GENIA corpus. The corpus has already been annotated with POS (Parts of Speech), syntactic trees, terms, etc. The new annotation was made on half of the GENIA corpus, consisting of 1,000 Medline abstracts. It contains 9,372 sentences in which 36,114 events are identified. The major challenges during event annotation were (1) to design a scheme of annotation which meets specific requirements of text annotation, (2) to achieve biology-oriented annotation which reflects biologists' interpretation of text, and (3) to ensure the homogeneity of annotation quality across annotators. To meet these challenges, we introduced new concepts such as Single-facet Annotation and Semantic Typing, which have collectively contributed to successful completion of a large scale annotation. Conclusion: The resulting event-annotated corpus is the largest and one of the best in quality among similar annotation efforts. We expect it to become a valuable resource for NLP (Natural Language Processing)-based TM in the bio-medical domain. PMID:18182099

  5. BioCause: Annotating and analysing causality in the biomedical domain

    PubMed Central

    2013-01-01

    Background: Biomedical corpora annotated with event-level information represent an important resource for domain-specific information extraction (IE) systems. However, bio-event annotation alone cannot cater for all the needs of biologists. Unlike work on relation and event extraction, most of which focusses on specific events and named entities, we aim to build a comprehensive resource, covering all statements of causal association present in discourse. Causality lies at the heart of biomedical knowledge, such as diagnosis, pathology or systems biology, and, thus, automatic causality recognition can greatly reduce the human workload by suggesting possible causal connections and aiding in the curation of pathway models. A biomedical text corpus annotated with such relations is, hence, crucial for developing and evaluating biomedical text mining. Results: We have defined an annotation scheme for enriching biomedical domain corpora with causality relations. This schema has subsequently been used to annotate 851 causal relations to form BioCause, a collection of 19 open-access full-text biomedical journal articles belonging to the subdomain of infectious diseases. These documents have been pre-annotated with named entity and event information in the context of previous shared tasks. We report an inter-annotator agreement rate of over 60% for triggers and of over 80% for arguments using an exact match constraint. These increase significantly using a relaxed match setting. Moreover, we analyse and describe the causality relations in BioCause from various points of view. This information can then be leveraged for the training of automatic causality detection systems. Conclusion: Augmenting named entity and event annotations with information about causal discourse relations could benefit the development of more sophisticated IE systems. These will further influence the development of multiple tasks, such as enabling textual inference to detect entailments, discovering new

  6. Construction of an annotated corpus to support biomedical information extraction

    PubMed Central

    Thompson, Paul; Iqbal, Syed A; McNaught, John; Ananiadou, Sophia

    2009-01-01

    Background: Information Extraction (IE) is a component of text mining that facilitates knowledge discovery by automatically locating instances of interesting biomedical events from huge document collections. As events are usually centred on verbs and nominalised verbs, understanding the syntactic and semantic behaviour of these words is highly important. Corpora annotated with information concerning this behaviour can constitute a valuable resource in the training of IE components and resources. Results: We have defined a new scheme for annotating sentence-bound gene regulation events, centred on both verbs and nominalised verbs. For each event instance, all participants (arguments) in the same sentence are identified and assigned a semantic role from a rich set of 13 roles tailored to biomedical research articles, together with a biological concept type linked to the Gene Regulation Ontology. To our knowledge, our scheme is unique within the biomedical field in terms of the range of event arguments identified. Using the scheme, we have created the Gene Regulation Event Corpus (GREC), consisting of 240 MEDLINE abstracts, in which events relating to gene regulation and expression have been annotated by biologists. A novel method of evaluating various different facets of the annotation task showed that average inter-annotator agreement rates fall within the range of 66% - 90%. Conclusion: The GREC is a unique resource within the biomedical field, in that it annotates not only core relationships between entities, but also a range of other important details about these relationships, e.g., location, temporal, manner and environmental conditions. As such, it is specifically designed to support bio-specific tool and resource development. It has already been used to acquire semantic frames for inclusion within the BioLexicon (a lexical, terminological resource to aid biomedical text mining). Initial experiments have also shown that the corpus may viably be used to train IE

  7. Open semantic annotation of scientific publications using DOMEO

    PubMed Central

    2012-01-01

    Background: Our group has developed a useful shared software framework for performing, versioning, sharing and viewing Web annotations of a number of kinds, using an open representation model. Methods: The Domeo Annotation Tool was developed in tandem with this open model, the Annotation Ontology (AO). Development of both the Annotation Framework and the open model was driven by requirements of several different types of alpha users, including bench scientists and biomedical curators from university research labs, online scientific communities, publishing and pharmaceutical companies. Several use cases were incrementally implemented by the toolkit. These use cases in biomedical communications include personal note-taking, group document annotation, semantic tagging, claim-evidence-context extraction, reagent tagging, and curation of textmining results from entity extraction algorithms. Results: We report on the Domeo user interface here. Domeo has been deployed in beta release as part of the NIH Neuroscience Information Framework (NIF, http://www.neuinfo.org) and is scheduled for production deployment in the NIF’s next full release. Future papers will describe other aspects of this work in detail, including Annotation Framework Services and components for integrating with external textmining services, such as the NCBO Annotator web service, and with other textmining applications using the Apache UIMA framework. PMID:22541592

  8. Enriching a biomedical event corpus with meta-knowledge annotation

    PubMed Central

    2011-01-01

    Background: Biomedical papers contain rich information about entities, facts and events of biological relevance. To discover these automatically, we use text mining techniques, which rely on annotated corpora for training. In order to extract protein-protein interactions, genotype-phenotype/gene-disease associations, etc., we rely on event corpora that are annotated with classified, structured representations of important facts and findings contained within text. These provide an important resource for the training of domain-specific information extraction (IE) systems, to facilitate semantic-based searching of documents. Correct interpretation of these events is not possible without additional information, e.g., does an event describe a fact, a hypothesis, an experimental result or an analysis of results? How confident is the author about the validity of her analyses? These and other types of information, which we collectively term meta-knowledge, can be derived from the context of the event. Results: We have designed an annotation scheme for meta-knowledge enrichment of biomedical event corpora. The scheme is multi-dimensional, in that each event is annotated for 5 different aspects of meta-knowledge that can be derived from the textual context of the event. Textual clues used to determine the values are also annotated. The scheme is intended to be general enough to allow integration with different types of bio-event annotation, whilst being detailed enough to capture important subtleties in the nature of the meta-knowledge expressed in the text. We report here on both the main features of the annotation scheme, as well as its application to the GENIA event corpus (1000 abstracts with 36,858 events). High levels of inter-annotator agreement have been achieved, falling in the range of 0.84-0.93 Kappa. Conclusion: By augmenting event annotations with meta-knowledge, more sophisticated IE systems can be trained, which allow interpretative information to be specified as
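The Kappa figures quoted in the record above measure agreement corrected for chance. The abstract does not say which variant was used; the minimal sketch below assumes Cohen's kappa for two annotators, kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected by chance. The label sequences are invented for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for ten events on one meta-knowledge dimension.
annotator_a = ["fact", "fact", "hypothesis", "fact", "hypothesis",
               "fact", "fact", "hypothesis", "fact", "fact"]
annotator_b = ["fact", "fact", "hypothesis", "fact", "hypothesis",
               "fact", "fact", "fact", "fact", "fact"]

kappa = cohen_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.3f}")
```

In this toy case the annotators agree on 9 of 10 items, yet kappa is noticeably lower than 0.9 because "fact" dominates both label sets and much of the agreement could arise by chance.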

  9. An open annotation ontology for science on web 3.0

    PubMed Central

    2011-01-01

    Background: There is currently a gap between the rich and expressive collection of published biomedical ontologies, and the natural language expression of biomedical papers consumed on a daily basis by scientific researchers. The purpose of this paper is to provide an open, shareable structure for dynamic integration of biomedical domain ontologies with the scientific document, in the form of an Annotation Ontology (AO), thus closing this gap and enabling application of formal biomedical ontologies directly to the literature as it emerges. Methods: Initial requirements for AO were elicited by analysis of integration needs between biomedical web communities, and of needs for representing and integrating results of biomedical text mining. Analysis of strengths and weaknesses of previous efforts in this area was also performed. A series of increasingly refined annotation tools were then developed along with a metadata model in OWL, and deployed to users at a major pharmaceutical company and a major academic center for feedback and additional requirements. Further requirements and critiques of the model were also elicited through discussions with many colleagues and incorporated into the work. Results: This paper presents Annotation Ontology (AO), an open ontology in OWL-DL for annotating scientific documents on the web. AO supports both human and algorithmic content annotation. It enables “stand-off” or independent metadata anchored to specific positions in a web document by any one of several methods. In AO, the document may be annotated but is not required to be under update control of the annotator. AO contains a provenance model to support versioning, and a set model for specifying groups and containers of annotation. AO is freely available under open source license at http://purl.org/ao/, and extensive documentation including screencasts is available on AO’s Google Code page: http://code.google.com/p/annotation-ontology/ . Conclusions: The

  10. Biomedical article retrieval using multimodal features and image annotations in region-based CBIR

    NASA Astrophysics Data System (ADS)

    You, Daekeun; Antani, Sameer; Demner-Fushman, Dina; Rahman, Md Mahmudur; Govindaraju, Venu; Thoma, George R.

    2010-01-01

    Biomedical images are invaluable in establishing diagnosis, acquiring technical skills, and implementing best practices in many areas of medicine. At present, images needed for instructional purposes or in support of clinical decisions appear in specialized databases and in biomedical articles, and are often not easily accessible to retrieval tools. Our goal is to automatically annotate images extracted from scientific publications with respect to their usefulness for clinical decision support and instructional purposes, and project the annotations onto images stored in databases by linking images through content-based image similarity. Authors often use text labels and pointers overlaid on figures and illustrations in the articles to highlight regions of interest (ROI). These annotations are then referenced in the caption text or figure citations in the article text. In previous research we have developed two methods (a heuristic method and a dynamic time warping-based method) for localizing and recognizing such pointers on biomedical images. In this work, we add robustness to our previous efforts by using a machine learning based approach to localizing and recognizing the pointers. Identifying these can assist in extracting relevant image content at regions within the image that are likely to be highly relevant to the discussion in the article text. Image regions can then be annotated using biomedical concepts from extracted snippets of text pertaining to images in scientific biomedical articles that are identified using the National Library of Medicine's Unified Medical Language System® (UMLS) Metathesaurus. The resulting regional annotation and extracted image content are then used as indices for biomedical article retrieval using the multimodal features and region-based content-based image retrieval (CBIR) techniques. The hypothesis that such an approach would improve biomedical document retrieval is validated through experiments on an expert-marked biomedical article

  11. [Open access: an opportunity for biomedical research].

    PubMed

    Duchange, Nathalie; Autard, Delphine; Pinhas, Nicole

    2008-01-01

    Open access within the scientific community depends on the scientific context and the practices of the field. In the biomedical domain, the communication of research results is characterised by the importance of the peer reviewing process, the existence of a hierarchy among journals and the transfer of copyright to the editor. Biomedical publishing has become a lucrative market and the growth of electronic journals has not helped lower the costs. Indeed, it is difficult for today's public institutions to gain access to all the scientific literature. Open access is thus imperative, as demonstrated through the positions taken by a growing number of research funding bodies, the development of open access journals and efforts made in promoting open archives. This article describes the setting up of an Inserm portal for publication in the context of the French national protocol for open-access self-archiving and in an international context. PMID:18789227

  12. Leveraging biomedical ontologies and annotation services to organize microbiome data from Mammalian hosts.

    PubMed

    Sarkar, Indra Neil

    2010-01-01

    A better understanding of commensal microbiotic communities ("microbiomes") may provide valuable insights to human health. Towards this goal, an essential step may be the development of approaches to organize data that can enable comparative hypotheses across mammalian microbiomes. The present study explores the feasibility of using existing biomedical informatics resources - especially focusing on those available at the National Center for Biomedical Ontology - to organize microbiome data contained within large sequence repositories, such as GenBank. The results indicate that the Foundational Model of Anatomy and SNOMED CT can be used to organize greater than 90% of the bacterial organisms associated with 10 domesticated mammalian species. The promising findings suggest that the current biomedical informatics infrastructure may be used towards the organizing of microbiome data beyond humans. Furthermore, the results identify key concepts that might be organized into a semantic structure for incorporation into subsequent annotations that could facilitate comparative biomedical hypotheses pertaining to human health. PMID:21347072

  13. Generation of Silver Standard Concept Annotations from Biomedical Texts with Special Relevance to Phenotypes

    PubMed Central

    Oellrich, Anika; Collier, Nigel; Smedley, Damian; Groza, Tudor

    2015-01-01

    Electronic health records and scientific articles possess differing linguistic characteristics that may impact the performance of natural language processing tools developed for one or the other. In this paper, we investigate the performance of four extant concept recognition tools: the clinical Text Analysis and Knowledge Extraction System (cTAKES), the National Center for Biomedical Ontology (NCBO) Annotator, the Biomedical Concept Annotation System (BeCAS) and MetaMap. Each of the four concept recognition systems is applied to four different corpora: the i2b2 corpus of clinical documents, a PubMed corpus of Medline abstracts, a clinical trials corpus and the ShARe/CLEF corpus. In addition, we assess the individual system performances with respect to one gold standard annotation set, available for the ShARe/CLEF corpus. Furthermore, we built a silver standard annotation set from the individual systems’ output and assess the quality as well as the contribution of individual systems to the quality of the silver standard. Our results demonstrate that mainly the NCBO annotator and cTAKES contribute to the silver standard corpora (F1-measures in the range of 21% to 74%) and their quality (best F1-measure of 33%), independent from the type of text investigated. While BeCAS and MetaMap can contribute to the precision of silver standard annotations (precision of up to 42%), the F1-measure drops when combined with NCBO Annotator and cTAKES due to a low recall. In conclusion, the performances of individual systems need to be improved independently from the text types, and the leveraging strategies to best take advantage of individual systems’ annotations need to be revised. The textual content of the PubMed corpus, accession numbers for the clinical trials corpus, and assigned annotations of the four concept recognition systems as well as the generated silver standard annotation sets are available from http://purl.org/phenotype/resources. The textual content of the Sh

  14. Generation of silver standard concept annotations from biomedical texts with special relevance to phenotypes.

    PubMed

    Oellrich, Anika; Collier, Nigel; Smedley, Damian; Groza, Tudor

    2015-01-01

    Electronic health records and scientific articles possess differing linguistic characteristics that may impact the performance of natural language processing tools developed for one or the other. In this paper, we investigate the performance of four extant concept recognition tools: the clinical Text Analysis and Knowledge Extraction System (cTAKES), the National Center for Biomedical Ontology (NCBO) Annotator, the Biomedical Concept Annotation System (BeCAS) and MetaMap. Each of the four concept recognition systems is applied to four different corpora: the i2b2 corpus of clinical documents, a PubMed corpus of Medline abstracts, a clinical trials corpus and the ShARe/CLEF corpus. In addition, we assess the individual system performances with respect to one gold standard annotation set, available for the ShARe/CLEF corpus. Furthermore, we built a silver standard annotation set from the individual systems' output and assess the quality as well as the contribution of individual systems to the quality of the silver standard. Our results demonstrate that mainly the NCBO annotator and cTAKES contribute to the silver standard corpora (F1-measures in the range of 21% to 74%) and their quality (best F1-measure of 33%), independent from the type of text investigated. While BeCAS and MetaMap can contribute to the precision of silver standard annotations (precision of up to 42%), the F1-measure drops when combined with NCBO Annotator and cTAKES due to a low recall. In conclusion, the performances of individual systems need to be improved independently from the text types, and the leveraging strategies to best take advantage of individual systems' annotations need to be revised. The textual content of the PubMed corpus, accession numbers for the clinical trials corpus, and assigned annotations of the four concept recognition systems as well as the generated silver standard annotation sets are available from http://purl.org/phenotype/resources. The textual content of the Sh
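One common way to derive a silver standard from several systems' outputs is voting: keep an annotation when enough systems agree on it. The record above does not state the combination rule that was actually used, so the threshold below is an assumption, and the system names, spans, and concept IDs are placeholders rather than real tool output.

```python
from collections import Counter


def silver_standard(system_outputs, min_votes=2):
    """Keep annotations proposed by at least min_votes systems (assumed voting rule)."""
    votes = Counter()
    for annotations in system_outputs.values():
        votes.update(set(annotations))
    return {annotation for annotation, count in votes.items() if count >= min_votes}


def f1_score(predicted, reference):
    """F1-measure of a predicted annotation set against a reference set."""
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Placeholder outputs: (start, end, concept_id) tuples from four hypothetical systems.
outputs = {
    "system_a": {(0, 5, "X:1"), (10, 18, "X:2"), (20, 25, "X:3")},
    "system_b": {(0, 5, "X:1"), (10, 18, "X:2")},
    "system_c": {(0, 5, "X:1"), (30, 36, "X:4")},
    "system_d": {(20, 25, "X:3")},
}

silver = silver_standard(outputs, min_votes=2)
per_system_f1 = {name: f1_score(anns, silver) for name, anns in outputs.items()}
for name, score in sorted(per_system_f1.items()):
    print(f"{name}: F1 against silver standard = {score:.2f}")
```

Raising `min_votes` trades recall of the silver standard for precision, which is one reason the choice of combination strategy matters for the resulting scores.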

  15. Recommending MeSH terms for annotating biomedical articles

    PubMed Central

    Huang, Minlie; Névéol, Aurélie

    2011-01-01

    Background: Due to the high cost of manual curation of key aspects from the scientific literature, automated methods for assisting this process are greatly desired. Here, we report a novel approach to facilitate MeSH indexing, a challenging task of assigning MeSH terms to MEDLINE citations for their archiving and retrieval. Methods: Unlike previous methods for automatic MeSH term assignment, we reformulate the indexing task as a ranking problem such that relevant MeSH headings are ranked higher than irrelevant ones. Specifically, for each document we retrieve 20 neighbor documents, obtain a list of MeSH main headings from neighbors, and rank the MeSH main headings using ListNet, a learning-to-rank algorithm. We trained our algorithm on 200 documents and tested on a previously used benchmark set of 200 documents and a larger dataset of 1000 documents. Results: Tested on the benchmark dataset, our method achieved a precision of 0.390, recall of 0.712, and mean average precision (MAP) of 0.626. In comparison to the state of the art, we observe statistically significant improvements as large as 39% in MAP (p-value <0.001). Similar significant improvements were also obtained on the larger document set. Conclusion: Experimental results show that our approach makes the most accurate MeSH predictions to date, which suggests its great potential in making a practical impact on MeSH indexing. Furthermore, as discussed, the proposed learning framework is robust and can be adapted to many other similar tasks beyond MeSH indexing in the biomedical domain. All data sets are available at: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/indexing. PMID:21613640
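Mean average precision (MAP), the headline metric in the record above, rewards rankings that place relevant headings near the top. The sketch below uses the standard definition of average precision; the exact evaluation protocol of the paper may differ, and the ranked lists and relevant sets are invented examples.

```python
def average_precision(ranked, relevant):
    """Average of precision@k over the ranks k at which a relevant item appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant items missing from the ranking contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranked_list, relevant_set) pairs."""
    return sum(average_precision(ranked, relevant) for ranked, relevant in runs) / len(runs)


# Two hypothetical documents with ranked candidate headings and gold headings.
runs = [
    (["Humans", "Neoplasms", "Mice", "Algorithms"], {"Humans", "Mice"}),
    (["Mice", "Humans"], {"Humans"}),
]

map_score = mean_average_precision(runs)
print(f"MAP = {map_score:.4f}")
```

For the first document the relevant headings sit at ranks 1 and 3, giving (1/1 + 2/3) / 2; for the second the only relevant heading is at rank 2, giving 1/2.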

  16. A Maximum-Entropy approach for accurate document annotation in the biomedical domain

    PubMed Central

    2012-01-01

    The growing volume of scientific literature on the Web and the absence of efficient tools for classifying and searching documents are the two most important factors that influence the speed of the search and the quality of the results. Previous studies have shown that the usage of ontologies makes it possible to process document and query information at the semantic level, which greatly improves the search for the relevant information and takes a step further towards the Semantic Web. A fundamental step in these approaches is the annotation of documents with ontology concepts, which can also be seen as a classification task. In this paper we address this issue for the biomedical domain and present a new automated and robust method, based on a Maximum Entropy approach, for annotating biomedical literature documents with terms from the Medical Subject Headings (MeSH). The experimental evaluation shows that the suggested Maximum Entropy approach for annotating biomedical documents with MeSH terms is highly accurate, robust to the ambiguity of terms, and can provide very good performance even when a very small number of training documents is used. More precisely, we show that the proposed algorithm obtained an average F-measure of 92.4% (precision 99.41%, recall 86.77%) for the full range of the explored terms (4,078 MeSH terms), and that the algorithm’s performance is resilient to terms’ ambiguity, achieving an average F-measure of 92.42% (precision 99.32%, recall 86.87%) in the explored MeSH terms which were found to be ambiguous according to the Unified Medical Language System (UMLS) thesaurus. Finally, we compared the results of the suggested methodology with a Naive Bayes and a Decision Trees classification approach, and we show that the Maximum Entropy based approach achieved a higher F-measure for both ambiguous and monosemous MeSH terms. PMID:22541593
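The F-measure is the harmonic mean of precision and recall, F = 2PR / (P + R). Plugging the reported averages for the full term range (precision 99.41%, recall 86.77%) into that formula gives roughly 92.7%, slightly above the reported average F-measure of 92.4%. One possible explanation, which is an assumption here and not stated in the abstract, is that the F-measure was computed per term and then averaged. The sketch below shows that the two ways of averaging can differ; the per-term values are invented.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# F-measure computed directly from the averages reported in the abstract.
f_from_reported_averages = f_measure(0.9941, 0.8677)

# Invented per-term (precision, recall) pairs to contrast the two averaging orders.
per_term = [(1.0, 0.5), (0.8, 1.0)]
macro_f = mean(f_measure(p, r) for p, r in per_term)           # average of per-term F
f_of_means = f_measure(mean(p for p, _ in per_term),
                       mean(r for _, r in per_term))           # F of averaged P and R

print(f"F from reported averages: {f_from_reported_averages:.4f}")
print(f"average of per-term F:    {macro_f:.4f}")
print(f"F of averaged P and R:    {f_of_means:.4f}")
```

Because the harmonic mean penalizes imbalance between precision and recall within each term, averaging per-term F-measures generally yields a lower figure than taking the F-measure of the averaged precision and recall.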

  17. Open Biomedical Engineering education in Africa.

    PubMed

    Ahluwalia, Arti; Atwine, Daniel; De Maria, Carmelo; Ibingira, Charles; Kipkorir, Emmauel; Kiros, Fasil; Madete, June; Mazzei, Daniele; Molyneux, Elisabeth; Moonga, Kando; Moshi, Mainen; Nzomo, Martin; Oduol, Vitalice; Okuonzi, John

    2015-08-01

    Despite the virtual revolution, the mainstream academic community in most countries remains largely ignorant of the potential of web-based teaching resources and of the expansion of open source software, hardware and rapid prototyping. In the context of Biomedical Engineering (BME), where human safety and wellbeing are paramount, a high level of supervision and quality control is required before open source concepts can be embraced by universities and integrated into the curriculum. In the meantime, students, more than their teachers, have become attuned to continuous streams of digital information, and teaching methods need to adapt rapidly by giving them the skills to filter meaningful information and by supporting collaboration and co-construction of knowledge using open, cloud and crowd based technology. In this paper we present our experience in bringing these concepts to university education in Africa, as a way of enabling rapid development and self-sufficiency in health care. We describe the three summer schools held in sub-Saharan Africa where both students and teachers embraced the philosophy of open BME education with enthusiasm, and discuss the advantages and disadvantages of opening education in this way in the developing and developed world. PMID:26737093

  18. Annotare—a tool for annotating high-throughput biomedical investigations and resulting data

    PubMed Central

    Shankar, Ravi; Parkinson, Helen; Burdett, Tony; Hastings, Emma; Liu, Junmin; Miller, Michael; Srinivasa, Rashmi; White, Joseph; Brazma, Alvis; Sherlock, Gavin; Stoeckert, Christian J.; Ball, Catherine A.

    2010-01-01

    Summary: Computational methods in molecular biology will increasingly depend on standards-based annotations that describe biological experiments in an unambiguous manner. Annotare is a software tool that enables biologists to easily annotate their high-throughput experiments, biomaterials and data in a standards-compliant way that facilitates meaningful search and analysis. Availability and Implementation: Annotare is available from http://code.google.com/p/annotare/ under the terms of the open-source MIT License (http://www.opensource.org/licenses/mit-license.php). It has been tested on both Mac and Windows. Contact: rshankar@stanford.edu PMID:20733062

  19. Functional gene clustering via gene annotation sentences, MeSH and GO keywords from biomedical literature

    PubMed Central

    Natarajan, Jeyakumar; Ganapathy, Jawahar

    2007-01-01

    Gene function annotation remains a key challenge in modern biology. This is especially true for high-throughput techniques such as gene expression experiments. Vital information about genes is available electronically from biomedical literature in the form of full texts and abstracts. In addition, various publicly available databases (such as GenBank, Gene Ontology and Entrez) provide access to gene-related information at different levels of biological organization, granularity and data format. This information is being used to assess and interpret the results from high-throughput experiments. To improve keyword extraction for annotational clustering and other types of analyses, we have developed a novel text mining approach, which is based on keywords identified at the level of gene annotation sentences (in particular sentences characterizing biological function) instead of entire abstracts. Further, to improve the expressiveness and usefulness of gene annotation terms, we investigated the combination of sentence-level keywords with terms from the Medical Subject Headings (MeSH) and Gene Ontology (GO) resources. We find that sentence-level keywords combined with MeSH terms outperform the typical ‘baseline’ set-up (term frequencies at the level of abstracts) by a significant margin, whereas the addition of GO terms improves matters only marginally. We validated our approach on the basis of a manually annotated corpus of 200 abstracts generated on the basis of 2 cancer categories and 10 genes per category. We applied the method in the context of three sets of differentially expressed genes obtained from pediatric brain tumor samples. This analysis suggests novel interpretations of discovered gene expression patterns. PMID:18305827

  20. Integration and Querying of Genomic and Proteomic Semantic Annotations for Biomedical Knowledge Extraction.

    PubMed

    Masseroli, Marco; Canakoglu, Arif; Ceri, Stefano

    2016-01-01

    Understanding complex biological phenomena involves answering complex biomedical questions on multiple biomolecular information simultaneously, which are expressed through multiple genomic and proteomic semantic annotations scattered in many distributed and heterogeneous data sources; such heterogeneity and dispersion hamper the biologists' ability of asking global queries and performing global evaluations. To overcome this problem, we developed a software architecture to create and maintain a Genomic and Proteomic Knowledge Base (GPKB), which integrates several of the most relevant sources of such dispersed information (including Entrez Gene, UniProt, IntAct, Expasy Enzyme, GO, GOA, BioCyc, KEGG, Reactome, and OMIM). Our solution is general, as it uses a flexible, modular, and multilevel global data schema based on abstraction and generalization of integrated data features, and a set of automatic procedures for easing data integration and maintenance, also when the integrated data sources evolve in data content, structure, and number. These procedures also assure consistency, quality, and provenance tracking of all integrated data, and perform the semantic closure of the hierarchical relationships of the integrated biomedical ontologies. At http://www.bioinformatics.deib.polimi.it/GPKB/, a Web interface allows graphical easy composition of queries, although complex, on the knowledge base, supporting also semantic query expansion and comprehensive explorative search of the integrated data to better sustain biomedical knowledge extraction. PMID:27045824
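
    The "semantic closure of the hierarchical relationships" mentioned in this abstract can be read as a transitive closure over parent links in an ontology. The following is a minimal Python sketch under that assumption; it is not the GPKB implementation, and the function name, the input layout (a mapping from each term to its direct parents) and the term identifiers are hypothetical.

```python
from typing import Dict, Iterable, Set


def semantic_closure(parents: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map every term to the set of all its ancestors (direct and indirect)."""
    closure: Dict[str, Set[str]] = {}
    for term in parents:
        seen: Set[str] = set()
        stack = list(parents.get(term, ()))
        while stack:
            ancestor = stack.pop()
            if ancestor not in seen:          # also guards against cycles
                seen.add(ancestor)
                stack.extend(parents.get(ancestor, ()))
        closure[term] = seen
    return closure


# Hypothetical three-level hierarchy: term_c is a term_b, which is a term_a.
hierarchy = {"term_c": ["term_b"], "term_b": ["term_a"], "term_a": []}
closure = semantic_closure(hierarchy)
```

    With such a closure precomputed, a query for a general term can also return data annotated only with its more specific descendants, which is one plausible reading of the semantic query expansion the abstract describes.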

  1. Open Biomedical Ontology-based Medline exploration

    PubMed Central

    Xuan, Weijian; Dai, Manhong; Mirel, Barbara; Song, Jean; Athey, Brian; Watson, Stanley J; Meng, Fan

    2009-01-01

    Background Effective Medline database exploration is critical for the understanding of high throughput experimental results and the development of novel hypotheses about the mechanisms underlying the targeted biological processes. While existing solutions enhance Medline exploration through different approaches such as document clustering, network presentations of underlying conceptual relationships and the mapping of search results to MeSH and Gene Ontology trees, we believe the use of multiple ontologies from the Open Biomedical Ontology can greatly help researchers to explore literature from different perspectives as well as to quickly locate the most relevant Medline records for further investigation. Results We developed an ontology-based interactive Medline exploration solution called PubOnto to enable the interactive exploration and filtering of search results through the use of multiple ontologies from the OBO foundry. The PubOnto program is a rich internet application based on the FLEX platform. It contains a number of interactive tools, visualization capabilities, an open service architecture, and a customizable user interface. It is freely accessible at: . PMID:19426463

  2. ProteoAnnotator--open source proteogenomics annotation software supporting PSI standards.

    PubMed

    Ghali, Fawaz; Krishna, Ritesh; Perkins, Simon; Collins, Andrew; Xia, Dong; Wastling, Jonathan; Jones, Andrew R

    2014-12-01

    The recent massive increase in capability for sequencing genomes is producing enormous advances in our understanding of biological systems. However, there is a bottleneck in genome annotation--determining the structure of all transcribed genes. Experimental data from MS studies can play a major role in confirming and correcting gene structure--proteogenomics. However, there are some technical and practical challenges to overcome, since proteogenomics requires pipelines comprising a complex set of interconnected modules as well as bespoke routines, for example in protein inference and statistics. We are introducing a complete, open source pipeline for proteogenomics, called ProteoAnnotator, which incorporates a graphical user interface and implements the Proteomics Standards Initiative mzIdentML standard for each analysis stage. All steps are included as standalone modules with the mzIdentML library, allowing other groups to re-use the whole pipeline or constituent parts within other tools. We have developed new modules for pre-processing and combining multiple search databases, for performing peptide-level statistics on mzIdentML files, for scoring grouped protein identifications matched to a given genomic locus to validate that updates to the official gene models are statistically sound and for mapping end results back onto the genome. ProteoAnnotator is available from http://www.proteoannotator.org/. All MS data have been deposited in the ProteomeXchange with identifiers PXD001042 and PXD001390 (http://proteomecentral.proteomexchange.org/dataset/PXD001042; http://proteomecentral.proteomexchange.org/dataset/PXD001390). PMID:25297486

  3. WebMedSA: a web-based framework for segmenting and annotating medical images using biomedical ontologies

    NASA Astrophysics Data System (ADS)

    Vega, Francisco; Pérez, Wilson; Tello, Andrés; Saquicela, Victor; Espinoza, Mauricio; Solano-Quinde, Lizandro; Vidal, Maria-Esther; La Cruz, Alexandra

    2015-12-01

    Advances in medical imaging have fostered medical diagnosis based on digital images. Consequently, the number of studies based on medical image diagnosis is increasing, and collaborative work and tele-radiology systems are required to scale up effectively to this trend. We tackle the problem of collaborative access to medical images, and present WebMedSA, a framework to manage large datasets of medical images. WebMedSA relies on a PACS and supports ontological annotation, as well as segmentation and visualization of the images based on their semantic description. Ontological annotations can be performed directly on the volumetric image or at different image planes (e.g., axial, coronal, or sagittal); furthermore, annotations can be complemented after applying a segmentation technique. WebMedSA is based on three main steps: (1) RDF-ization process for extracting, anonymizing, and serializing metadata comprised in DICOM medical images into RDF/XML; (2) Integration of different biomedical ontologies (using L-MOM library), making this approach ontology independent; and (3) segmentation and visualization of annotated data which is further used to generate new annotations according to expert knowledge, and validation. Initial user evaluations suggest that WebMedSA facilitates the exchange of knowledge between radiologists, and provides the basis for collaborative work among them.

  4. A selected annotated bibliography of the core biomedical literature pertaining to stroke, cervical spine, manipulation and head/neck movement

    PubMed Central

    Gotlib, Allan C.; Thiel, Haymo

    1985-01-01

    This manuscript’s purpose was to establish a knowledge base of information related to stroke and the cervical spine vascular structures, from both historical and current perspectives. Both the indexed scientific biomedical literature (i.e. Index Medicus, CRAC) and non-indexed literature systems were scanned, and the pertinent manuscripts were annotated. Citation is by occurrence in the literature so that historical trends may be viewed more easily. No analysis of the reference material is offered. It is suggested, however, that: 1. complications of cervical spine manipulation are being recognized and reported with increasing frequency, 2. a cause and effect relationship between stroke and cervical spine manipulation has not been established, 3. a screening mechanism that is valid, reliable and reasonable needs to be established.

  5. Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research

    PubMed Central

    Köhler, Sebastian; Doelken, Sandra C; Ruef, Barbara J; Bauer, Sebastian; Washington, Nicole; Westerfield, Monte; Gkoutos, George; Schofield, Paul; Smedley, Damian; Lewis, Suzanna E; Robinson, Peter N; Mungall, Christopher J

    2014-01-01

    Phenotype analyses, e.g. investigating metabolic processes, tissue formation, or organism behavior, are an important element of most biological and medical research activities. Biomedical researchers are making increased use of ontological standards and methods to capture the results of such analyses, with one focus being the comparison and analysis of phenotype information between species. We have generated a cross-species phenotype ontology for human, mouse and zebrafish that contains classes from the Human Phenotype Ontology, Mammalian Phenotype Ontology, and generated classes for zebrafish phenotypes. We also provide up-to-date annotation data connecting human genes to phenotype classes from the generated ontology. We have included the data generation pipeline into our continuous integration system ensuring stable and up-to-date releases. This article describes the data generation process and is intended to help interested researchers access both the phenotype annotation data and the associated cross-species phenotype ontology. The resource described here can be used in sophisticated semantic similarity and gene set enrichment analyses for phenotype data across species. The stable releases of this resource can be obtained from http://purl.obolibrary.org/obo/hp/uberpheno/. PMID:24358873

  6. Concept annotation in the CRAFT corpus

    PubMed Central

    2012-01-01

    Background Manually annotated corpora are critical for the training and evaluation of automated methods to identify concepts in biomedical text. Results This paper presents the concept annotations of the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of 97 full-length, open-access biomedical journal articles that have been annotated both semantically and syntactically to serve as a research resource for the biomedical natural-language-processing (NLP) community. CRAFT identifies all mentions of nearly all concepts from nine prominent biomedical ontologies and terminologies: the Cell Type Ontology, the Chemical Entities of Biological Interest ontology, the NCBI Taxonomy, the Protein Ontology, the Sequence Ontology, the entries of the Entrez Gene database, and the three subontologies of the Gene Ontology. The first public release includes the annotations for 67 of the 97 articles, reserving two sets of 15 articles for future text-mining competitions (after which these too will be released). Concept annotations were created based on a single set of guidelines, which has enabled us to achieve consistently high interannotator agreement. Conclusions As the initial 67-article release contains more than 560,000 tokens (and the full set more than 790,000 tokens), our corpus is among the largest gold-standard annotated biomedical corpora. Unlike most others, the journal articles that comprise the corpus are drawn from diverse biomedical disciplines and are marked up in their entirety. Additionally, with a concept-annotation count of nearly 100,000 in the 67-article subset (and more than 140,000 in the full collection), the scale of conceptual markup is also among the largest of comparable corpora. The concept annotations of the CRAFT Corpus have the potential to significantly advance biomedical text mining by providing a high-quality gold standard for NLP systems. 
    The corpus, annotation guidelines, and other associated resources are freely available at http

  7. BIOSMILE web search: a web application for annotating biomedical entities and relations.

    PubMed

    Dai, Hong-Jie; Huang, Chi-Hsin; Lin, Ryan T K; Tsai, Richard Tzong-Han; Hsu, Wen-Lian

    2008-07-01

    BIOSMILE web search (BWS) is a web-based NCBI-PubMed search application that can analyze articles for selected biomedical verbs and give users relational information, such as subject, object, location, manner, time, etc. After receiving keyword query input, BWS retrieves matching PubMed abstracts and lists them along with snippets in order of relevance to protein-protein interaction. Users can then select articles for further analysis, and BWS will find and mark up biomedical relations in the text. The analysis results can be viewed in the abstract text or in table form. To date, BWS has been field tested by over 30 biologists and questionnaires have shown that subjects are highly satisfied with its capabilities and usability. BWS is accessible free of charge at http://bioservices.cse.yzu.edu.tw/BWS. PMID:18515840

  8. BIOSMILE web search: a web application for annotating biomedical entities and relations

    PubMed Central

    Dai, Hong-Jie; Huang, Chi-Hsin; Lin, Ryan T. K.; Tsai, Richard Tzong-Han; Hsu, Wen-Lian

    2008-01-01

    BIOSMILE web search (BWS) is a web-based NCBI-PubMed search application that can analyze articles for selected biomedical verbs and give users relational information, such as subject, object, location, manner, time, etc. After receiving keyword query input, BWS retrieves matching PubMed abstracts and lists them along with snippets in order of relevance to protein–protein interaction. Users can then select articles for further analysis, and BWS will find and mark up biomedical relations in the text. The analysis results can be viewed in the abstract text or in table form. To date, BWS has been field tested by over 30 biologists and questionnaires have shown that subjects are highly satisfied with its capabilities and usability. BWS is accessible free of charge at http://bioservices.cse.yzu.edu.tw/BWS. PMID:18515840

  9. OpenCL based machine learning labeling of biomedical datasets

    NASA Astrophysics Data System (ADS)

    Amoros, Oscar; Escalera, Sergio; Puig, Anna

    2011-03-01

    In this paper, we propose a two-stage method for labeling large biomedical datasets through a parallel approach on a single GPU. Diagnostic methods, structure volume measurements, and visualization systems are of major importance for surgery planning, intra-operative imaging and image-guided surgery. In all cases, it becomes imperative to provide an automatic and interactive method to label or tag the different structures contained in the input data. Several approaches to labeling or segmenting biomedical datasets have been proposed to discriminate different anatomical structures in an output tagged dataset. Among existing methods, supervised learning methods for segmentation have been devised so that biomedical datasets can be analyzed easily by a non-expert user. However, they still have some problems concerning practical application, such as slow learning and testing speeds. In addition, recent technological developments have led to widespread availability of multi-core CPUs and GPUs, as well as new software languages, such as NVIDIA's CUDA and OpenCL, allowing parallel programming paradigms to be applied on conventional personal computers. The Adaboost classifier is one of the most widely applied methods for labeling in the Machine Learning community. In a first stage, Adaboost trains a binary classifier from a set of pre-labeled samples described by a set of features. This binary classifier is defined as a weighted combination of weak classifiers. Each weak classifier is a simple decision function estimated on a single feature value. Then, at the testing stage, each weak classifier is independently applied on the features of a set of unlabeled samples. In this work, we propose an alternative representation of the Adaboost binary classifier. We use this proposed representation to define a new GPU-based parallelized Adaboost testing stage using OpenCL. We provide numerical experiments based on large available data sets and we compare our results to CPU-based strategies in terms of time and
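
    The testing stage described in this abstract (a weighted combination of weak classifiers, each deciding on a single feature value) can be illustrated with a short sequential Python sketch. This is not the authors' OpenCL implementation or their alternative representation; it assumes threshold-based decision stumps as the weak classifiers, and the class, field and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Stump:
    """Hypothetical weak classifier: a threshold test on one feature."""
    feature: int      # index of the single feature this stump inspects
    threshold: float  # decision threshold on that feature
    polarity: int     # +1 or -1: the vote cast when the value exceeds the threshold
    alpha: float      # weight of this stump in the final combination

    def vote(self, sample: Sequence[float]) -> int:
        # Returns +1 or -1 using only one feature value of the sample.
        above = sample[self.feature] > self.threshold
        return self.polarity if above else -self.polarity


def classify(stumps: List[Stump], sample: Sequence[float]) -> int:
    # Weighted sum of the weak votes; its sign is the binary label.
    score = sum(s.alpha * s.vote(sample) for s in stumps)
    return 1 if score >= 0 else -1


def label_all(stumps: List[Stump], samples: List[Sequence[float]]) -> List[int]:
    # Every sample is labeled independently of the others, which is the
    # property a parallel (for example GPU-based) testing stage can exploit.
    return [classify(stumps, x) for x in samples]


# Made-up weights and thresholds, for illustration only.
stumps = [Stump(feature=0, threshold=0.5, polarity=1, alpha=0.7),
          Stump(feature=1, threshold=2.0, polarity=-1, alpha=0.4)]
labels = label_all(stumps, [[1.0, 3.0], [0.0, 3.0], [0.0, 1.0]])
```

    Because no sample's label depends on any other sample, the loop in `label_all` could in principle be mapped to one work item per sample; how the paper actually partitions the work is not stated in the abstract.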

  10. Low cost open data acquisition system for biomedical applications

    NASA Astrophysics Data System (ADS)

    Zabolotny, Wojciech M.; Laniewski-Wollk, Przemyslaw; Zaworski, Wojciech

    2005-09-01

    In biomedical applications it is often necessary to collect measurement data from different devices. This is relatively easy if the devices are equipped with a MIB or Ethernet interface; however, they often feature only an asynchronous serial link, and sometimes the measured values are available only as analog signals. The system presented in the paper is a low cost alternative to commercially available data acquisition systems. The hardware and software architecture of the system is fully open, so it is possible to customize it for particular needs. The presented system offers various possibilities for connecting it to the computer based data processing unit, e.g. using the USB or Ethernet ports. Both interfaces also allow many such systems to be used in parallel to increase the number of serial and analog inputs. The open source software used in the system makes it possible to process the acquired data with standard tools like MATLAB, Scilab or Octave, or with a dedicated, user supplied application.

  11. Micropublications: a semantic model for claims, evidence, arguments and annotations in biomedical communications

    PubMed Central

    2014-01-01

    where simpler, formalized and purely statement-based models, such as the nanopublications model, will not be sufficient. At the same time they will add significant value to, and are intentionally compatible with, statement-based formalizations. We suggest that micropublications, generated by useful software tools supporting such activities as writing, editing, reviewing, and discussion, will be of great value in improving the quality and tractability of biomedical communications. PMID:26261718

  12. Data annotation, recording and mapping system for the US open skies aircraft

    SciTech Connect

    Brown, B.W.; Goede, W.F.; Farmer, R.G.

    1996-11-01

    This paper discusses the system developed by Northrop Grumman for the Defense Nuclear Agency (DNA), US Air Force, and the On-Site Inspection Agency (OSIA) to comply with the data annotation and reporting provisions of the Open Skies Treaty. This system, called the Data Annotation, Recording and Mapping System (DARMS), has been installed on the US OC-135 and meets or exceeds all annotation requirements for the Open Skies Treaty. The Open Skies Treaty, which will enter into force in the near future, allows any of the 26 signatory countries to fly fixed wing aircraft with imaging sensors over any of the other treaty participants, upon very short notice, and with no restricted flight areas. Sensor types presently allowed by the treaty are: optical framing and panoramic film cameras; video cameras ranging from analog PAL color television cameras to the more sophisticated digital monochrome and color line scanning or framing cameras; infrared line scanners; and synthetic aperture radars. Each sensor type has specific performance parameters which are limited by the treaty, as well as specific annotation requirements which must be achieved upon full entry into force. DARMS supports U.S. compliance with the Open Skies Treaty by means of three subsystems: the Data Annotation Subsystem (DAS), which annotates sensor media with data obtained from sensors and the aircraft's avionics system; the Data Recording System (DRS), which records all sensor and flight events on magnetic media for later use in generating Treaty mandated mission reports; and the Dynamic Sensor Mapping Subsystem (DSMS), which provides observers and sensor operators with real-time moving map displays of the progress of the mission, complete with instantaneous and cumulative sensor coverages. This paper will describe DARMS and its subsystems in greater detail, along with the supporting avionics sub-systems. 7 figs.

  13. For 481 biomedical open access journals, articles are not searchable in the Directory of Open Access Journals nor in conventional biomedical databases

    PubMed Central

    Andresen, Kristoffer; Pommergaard, Hans-Christian; Rosenberg, Jacob

    2015-01-01

    Background. Open access (OA) journals allow access to research papers free of charge to the reader. Traditionally, biomedical researchers use databases like MEDLINE and EMBASE to discover new advances. However, biomedical OA journals might not fulfill such databases’ criteria, hindering dissemination. The Directory of Open Access Journals (DOAJ) is a database exclusively listing OA journals. The aim of this study was to investigate DOAJ’s coverage of biomedical OA journals compared with the conventional biomedical databases. Methods. Information on all journals listed in four conventional biomedical databases (MEDLINE, PubMed Central, EMBASE and SCOPUS) and DOAJ was gathered. Journals were included if they were (1) actively publishing, (2) full OA, (3) prospectively indexed in one or more database, and (4) of biomedical subject. Impact factor and journal language were also collected. DOAJ was compared with conventional databases regarding the proportion of journals covered, along with their impact factor and publishing language. The proportion of journals with articles indexed by DOAJ was determined. Results. In total, 3,236 biomedical OA journals were included in the study. Of the included journals, 86.7% were listed in DOAJ. Combined, the conventional biomedical databases listed 75.0% of the journals; 18.7% in MEDLINE; 36.5% in PubMed Central; 51.5% in SCOPUS and 50.6% in EMBASE. Of the journals in DOAJ, 88.7% published in English and 20.6% had received impact factor for 2012 compared with 93.5% and 26.0%, respectively, for journals in the conventional biomedical databases. A subset of 51.1% and 48.5% of the journals in DOAJ had articles indexed from 2012 and 2013, respectively. Of journals exclusively listed in DOAJ, one journal had received an impact factor for 2012, and 59.6% of the journals had no content from 2013 indexed in DOAJ. Conclusions. DOAJ is the most complete registry of biomedical OA journals compared with five conventional biomedical

  14. For 481 biomedical open access journals, articles are not searchable in the Directory of Open Access Journals nor in conventional biomedical databases.

    PubMed

    Liljekvist, Mads Svane; Andresen, Kristoffer; Pommergaard, Hans-Christian; Rosenberg, Jacob

    2015-01-01

    Background. Open access (OA) journals allow access to research papers free of charge to the reader. Traditionally, biomedical researchers use databases like MEDLINE and EMBASE to discover new advances. However, biomedical OA journals might not fulfill such databases' criteria, hindering dissemination. The Directory of Open Access Journals (DOAJ) is a database exclusively listing OA journals. The aim of this study was to investigate DOAJ's coverage of biomedical OA journals compared with the conventional biomedical databases. Methods. Information on all journals listed in four conventional biomedical databases (MEDLINE, PubMed Central, EMBASE and SCOPUS) and DOAJ was gathered. Journals were included if they were (1) actively publishing, (2) full OA, (3) prospectively indexed in one or more database, and (4) of biomedical subject. Impact factor and journal language were also collected. DOAJ was compared with conventional databases regarding the proportion of journals covered, along with their impact factor and publishing language. The proportion of journals with articles indexed by DOAJ was determined. Results. In total, 3,236 biomedical OA journals were included in the study. Of the included journals, 86.7% were listed in DOAJ. Combined, the conventional biomedical databases listed 75.0% of the journals; 18.7% in MEDLINE; 36.5% in PubMed Central; 51.5% in SCOPUS and 50.6% in EMBASE. Of the journals in DOAJ, 88.7% published in English and 20.6% had received impact factor for 2012 compared with 93.5% and 26.0%, respectively, for journals in the conventional biomedical databases. A subset of 51.1% and 48.5% of the journals in DOAJ had articles indexed from 2012 and 2013, respectively. Of journals exclusively listed in DOAJ, one journal had received an impact factor for 2012, and 59.6% of the journals had no content from 2013 indexed in DOAJ. Conclusions. DOAJ is the most complete registry of biomedical OA journals compared with five conventional biomedical databases

  15. SORTA: a system for ontology-based re-coding and technical annotation of biomedical phenotype data

    PubMed Central

    Pang, Chao; Sollie, Annet; Sijtsma, Anna; Hendriksen, Dennis; Charbon, Bart; de Haan, Mark; de Boer, Tommy; Kelpin, Fleur; Jetten, Jonathan; van der Velde, Joeri K.; Smidt, Nynke; Sijmons, Rolf; Hillege, Hans; Swertz, Morris A.

    2015-01-01

    There is an urgent need to standardize the semantics of biomedical data values, such as phenotypes, to enable comparative and integrative analyses. However, it is unlikely that all studies will use the same data collection protocols. As a result, retrospective standardization is often required, which involves matching of original (unstructured or locally coded) data to widely used coding or ontology systems such as SNOMED CT (clinical terms), ICD-10 (International Classification of Disease) and HPO (Human Phenotype Ontology). This data curation process is usually a time-consuming process performed by a human expert. To help mechanize this process, we have developed SORTA, a computer-aided system for rapidly encoding free text or locally coded values to a formal coding system or ontology. SORTA matches original data values (uploaded in semicolon delimited format) to a target coding system (uploaded in Excel spreadsheet, OWL ontology web language or OBO open biomedical ontologies format). It then semi-automatically shortlists candidate codes for each data value using Lucene and n-gram based matching algorithms, and can also learn from matches chosen by human experts. We evaluated SORTA’s applicability in two use cases. For the LifeLines biobank, we used SORTA to recode 90 000 free text values (including 5211 unique values) about physical exercise to MET (Metabolic Equivalent of Task) codes. For the CINEAS clinical symptom coding system, we used SORTA to map to HPO, enriching HPO when necessary (315 terms matched so far). Out of the shortlists at rank 1, we found a precision/recall of 0.97/0.98 in LifeLines and of 0.58/0.45 in CINEAS. More importantly, users found the tool both a major time saver and a quality improvement because SORTA reduced the chances of human mistakes. Thus, SORTA can dramatically ease data (re)coding tasks and we believe it will prove useful for many more projects. Database URL: http://molgenis.org/sorta or as an open source download from
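
    The n-gram based shortlisting mentioned in this abstract can be illustrated with a small Python sketch. It is not SORTA's algorithm: it assumes character bigrams compared with the Dice coefficient, leaves out the Lucene step and the learning from expert choices, and uses hypothetical function names and made-up codes and labels.

```python
from typing import Dict, List, Set, Tuple


def ngrams(text: str, n: int = 2) -> Set[str]:
    """Character n-grams of a lower-cased, whitespace-normalized string."""
    s = " ".join(text.lower().split())
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def similarity(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over the two n-gram sets (1.0 means identical sets)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def shortlist(value: str, codes: Dict[str, str], top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank candidate codes by the similarity of their label to a free-text value."""
    scored = [(code, similarity(value, label)) for code, label in codes.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up target coding system: code -> label.
codes = {"A": "running", "B": "swimming", "C": "cycling"}
candidates = shortlist("runing", codes, top_k=2)  # misspelled free-text value
```

    A shortlist of this kind would still be shown to a human expert, who picks the correct code; in the abstract's terms, that is what keeps the matching semi-automatic.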

  16. SORTA: a system for ontology-based re-coding and technical annotation of biomedical phenotype data.

    PubMed

    Pang, Chao; Sollie, Annet; Sijtsma, Anna; Hendriksen, Dennis; Charbon, Bart; de Haan, Mark; de Boer, Tommy; Kelpin, Fleur; Jetten, Jonathan; van der Velde, Joeri K; Smidt, Nynke; Sijmons, Rolf; Hillege, Hans; Swertz, Morris A

    2015-01-01

    There is an urgent need to standardize the semantics of biomedical data values, such as phenotypes, to enable comparative and integrative analyses. However, it is unlikely that all studies will use the same data collection protocols. As a result, retrospective standardization is often required, which involves matching of original (unstructured or locally coded) data to widely used coding or ontology systems such as SNOMED CT (clinical terms), ICD-10 (International Classification of Disease) and HPO (Human Phenotype Ontology). This data curation process is usually a time-consuming process performed by a human expert. To help mechanize this process, we have developed SORTA, a computer-aided system for rapidly encoding free text or locally coded values to a formal coding system or ontology. SORTA matches original data values (uploaded in semicolon delimited format) to a target coding system (uploaded in Excel spreadsheet, OWL ontology web language or OBO open biomedical ontologies format). It then semi-automatically shortlists candidate codes for each data value using Lucene and n-gram based matching algorithms, and can also learn from matches chosen by human experts. We evaluated SORTA's applicability in two use cases. For the LifeLines biobank, we used SORTA to recode 90 000 free text values (including 5211 unique values) about physical exercise to MET (Metabolic Equivalent of Task) codes. For the CINEAS clinical symptom coding system, we used SORTA to map to HPO, enriching HPO when necessary (315 terms matched so far). Out of the shortlists at rank 1, we found a precision/recall of 0.97/0.98 in LifeLines and of 0.58/0.45 in CINEAS. More importantly, users found the tool both a major time saver and a quality improvement because SORTA reduced the chances of human mistakes. Thus, SORTA can dramatically ease data (re)coding tasks and we believe it will prove useful for many more projects. Database URL: http://molgenis.org/sorta or as an open source download from

  17. WIRM: An Open Source Toolkit for Building Biomedical Web Applications

    PubMed Central

    Jakobovits, Rex M.; Rosse, Cornelius; Brinkley, James F.

    2002-01-01

    This article describes an innovative software toolkit that allows the creation of web applications that facilitate the acquisition, integration, and dissemination of multimedia biomedical data over the web, thereby reducing the cost of knowledge sharing. There is a lack of high-level web application development tools suitable for use by researchers, clinicians, and educators who are not skilled programmers. Our Web Interfacing Repository Manager (WIRM) is a software toolkit that reduces the complexity of building custom biomedical web applications. WIRM’s visual modeling tools enable domain experts to describe the structure of their knowledge, from which WIRM automatically generates full-featured, customizable content management systems. PMID:12386108

  18. ORegAnno: an open-access community-driven resource for regulatory annotation

    PubMed Central

    Griffith, Obi L.; Montgomery, Stephen B.; Bernier, Bridget; Chu, Bryan; Kasaian, Katayoon; Aerts, Stein; Mahony, Shaun; Sleumer, Monica C.; Bilenky, Mikhail; Haeussler, Maximilian; Griffith, Malachi; Gallo, Steven M.; Giardine, Belinda; Hooghe, Bart; Van Loo, Peter; Blanco, Enrique; Ticoll, Amy; Lithwick, Stuart; Portales-Casamar, Elodie; Donaldson, Ian J.; Robertson, Gordon; Wadelius, Claes; De Bleser, Pieter; Vlieghe, Dominique; Halfon, Marc S.; Wasserman, Wyeth; Hardison, Ross; Bergman, Casey M.; Jones, Steven J.M.

    2008-01-01

    ORegAnno is an open-source, open-access database and literature curation system for community-based annotation of experimentally identified DNA regulatory regions, transcription factor binding sites and regulatory variants. The current release comprises 30 145 records curated from 922 publications and describing regulatory sequences for over 3853 genes and 465 transcription factors from 19 species. A new feature called the ‘publication queue’ allows users to input relevant papers from scientific literature as targets for annotation. The queue contains 4438 gene regulation papers entered by experts and another 54 351 identified by text-mining methods. Users can enter or ‘check out’ papers from the queue for manual curation using a series of user-friendly annotation pages. A typical record entry consists of species, sequence type, sequence, target gene, binding factor, experimental outcome and one or more lines of experimental evidence. An evidence ontology was developed to describe and categorize these experiments. Records are cross-referenced to Ensembl or Entrez gene identifiers, PubMed and dbSNP and can be visualized in the Ensembl or UCSC genome browsers. All data are freely available through search pages, XML data dumps or web services at: http://www.oreganno.org. PMID:18006570

  19. BioSig: The Free and Open Source Software Library for Biomedical Signal Processing

    PubMed Central

    Vidaurre, Carmen; Sander, Tilmann H.; Schlögl, Alois

    2011-01-01

    BioSig is an open source software library for biomedical signal processing. The aim of the BioSig project is to foster research in biomedical signal processing by providing free and open source software tools for many different application areas. Some of the areas where BioSig can be employed are neuroinformatics, brain-computer interfaces, neurophysiology, psychology, cardiovascular systems, and sleep research. Moreover, the analysis of biosignals such as the electroencephalogram (EEG), electrocorticogram (ECoG), electrocardiogram (ECG), electrooculogram (EOG), electromyogram (EMG), or respiration signals is a very relevant element of the BioSig project. Specifically, BioSig provides solutions for data acquisition, artifact processing, quality control, feature extraction, classification, modeling, and data visualization, to name a few. In this paper, we highlight several methods to help students and researchers to work more efficiently with biomedical signals. PMID:21437227

  20. Facilitating Full-text Access to Biomedical Literature Using Open Access Resources.

    PubMed

    Kang, Hongyu; Hou, Zhen; Li, Jiao

    2015-01-01

    Open access (OA) resources and local libraries often have their own literature databases, especially in the field of biomedicine. We have developed a method of linking a local library to a biomedical OA resource facilitating researchers' full-text article access. The method uses a model based on vector space to measure similarities between two articles in local library and OA resources. The method achieved an F-score of 99.61%. This method of article linkage and mapping between local library and OA resources is available for use. Through this work, we have improved the full-text access of the biomedical OA resources. PMID:26262422
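
    As an illustration of a vector space similarity measure of the kind mentioned above, the following minimal sketch compares two article titles by the cosine of their term-frequency vectors. The abstract does not state which fields, weighting scheme or threshold the authors used, so the tokenizer, the raw term-frequency weighting and the function names here are assumptions.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Build a raw term-frequency vector from lowercased alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical records: one from a local library, one from an OA resource.
local_record = term_vector("Open access to biomedical literature")
oa_record = term_vector("Open Access to Biomedical Literature.")
score = cosine_similarity(local_record, oa_record)
```

    Two records would be linked when the score exceeds a chosen threshold; the threshold itself would need to be tuned on labeled pairs.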

  1. Publishing biomedical journals on the World-Wide Web using an open architecture model.

    PubMed Central

    Shareck, E. P.; Greenes, R. A.

    1996-01-01

    BACKGROUND: In many respects, biomedical publications are ideally suited for distribution via the World-Wide Web, but economic concerns have prevented the rapid adoption of an on-line publishing model. PURPOSE: We report on our experiences with assisting biomedical journals in developing an online presence, issues that were encountered, and methods used to address these issues. Our approach is based on an open architecture that fosters adaptation and interconnection of biomedical resources. METHODS: We have worked with the New England Journal of Medicine (NEJM), as well as five other publishers. A set of tools and protocols was employed to develop a scalable and customizable solution for publishing journals on-line. RESULTS: In March, 1996, the New England Journal of Medicine published its first World-Wide Web issue. Explorations with other publishers have helped to generalize the model. CONCLUSIONS: Economic and technical issues play a major role in developing World-Wide Web publishing solutions. PMID:8947685

  2. Do open access biomedical journals benefit smaller countries? The Slovenian experience.

    PubMed

    Turk, Nana

    2011-06-01

    Scientists from smaller countries have problems gaining visibility for their research. Does open access publishing provide a solution? Slovenia is a small country with around 5000 medical doctors, 1300 dentists and 1000 pharmacists. A search of Slovenia's Bibliographic database was carried out to identify all biomedical journals and those which are open access. Slovenia has 18 medical open access journals, but none has an impact factor and only 10 are indexed by Slovenian and international bibliographic databases. The visibility and quality of medical papers is poor. The solution might be to reduce the number of journals and encourage Slovenian scientists to publish their best articles in them. PMID:21564498

  3. Harvest: an open platform for developing web-based biomedical data discovery and reporting applications

    PubMed Central

    Pennington, Jeffrey W; Ruth, Byron; Italia, Michael J; Miller, Jeffrey; Wrazien, Stacey; Loutrel, Jennifer G; Crenshaw, E Bryan; White, Peter S

    2014-01-01

    Biomedical researchers share a common challenge of making complex data understandable and accessible as they seek inherent relationships between attributes in disparate data types. Data discovery in this context is limited by a lack of query systems that efficiently show relationships between individual variables, but without the need to navigate underlying data models. We have addressed this need by developing Harvest, an open-source framework of modular components, and using it for the rapid development and deployment of custom data discovery software applications. Harvest incorporates visualizations of highly dimensional data in a web-based interface that promotes rapid exploration and export of any type of biomedical information, without exposing researchers to underlying data models. We evaluated Harvest with two cases: clinical data from pediatric cardiology and demonstration data from the OpenMRS project. Harvest's architecture and public open-source code offer a set of rapid application development tools to build data discovery applications for domain-specific biomedical data repositories. All resources, including the OpenMRS demonstration, can be found at http://harvest.research.chop.edu PMID:24131510

  4. Harvest: an open platform for developing web-based biomedical data discovery and reporting applications.

    PubMed

    Pennington, Jeffrey W; Ruth, Byron; Italia, Michael J; Miller, Jeffrey; Wrazien, Stacey; Loutrel, Jennifer G; Crenshaw, E Bryan; White, Peter S

    2014-01-01

    Biomedical researchers share a common challenge of making complex data understandable and accessible as they seek inherent relationships between attributes in disparate data types. Data discovery in this context is limited by a lack of query systems that efficiently show relationships between individual variables, but without the need to navigate underlying data models. We have addressed this need by developing Harvest, an open-source framework of modular components, and using it for the rapid development and deployment of custom data discovery software applications. Harvest incorporates visualizations of highly dimensional data in a web-based interface that promotes rapid exploration and export of any type of biomedical information, without exposing researchers to underlying data models. We evaluated Harvest with two cases: clinical data from pediatric cardiology and demonstration data from the OpenMRS project. Harvest's architecture and public open-source code offer a set of rapid application development tools to build data discovery applications for domain-specific biomedical data repositories. All resources, including the OpenMRS demonstration, can be found at http://harvest.research.chop.edu. PMID:24131510

  5. Data federation in the Biomedical Informatics Research Network: tools for semantic annotation and query of distributed multiscale brain data.

    PubMed

    Bug, William; Astahkov, Vadim; Boline, Jyl; Fennema-Notestine, Christine; Grethe, Jeffrey S; Gupta, Amarnath; Kennedy, David N; Rubin, Daniel L; Sanders, Brian; Turner, Jessica A; Martone, Maryann E

    2008-01-01

    The broadly defined mission of the Biomedical Informatics Research Network (BIRN, www.nbirn.net) is to better understand the causes of human disease and the specific ways in which animal models inform that understanding. To construct the community-wide infrastructure for gathering, organizing and managing this knowledge, BIRN is developing a federated architecture for linking multiple databases across sites contributing data and knowledge. Navigating across these distributed data sources requires a shared semantic scheme and supporting software framework to actively link the disparate repositories. At the core of this knowledge organization is BIRNLex, a formally-represented ontology facilitating data exchange. Source curators enable database interoperability by mapping their schema and data to BIRNLex semantic classes thereby providing a means to cast BIRNLex-based queries against specific data sources in the federation. We will illustrate use of the source registration, term mapping, and query tools. PMID:18999211

  6. Annotated bibliography of the biomedical literature pertaining to chiropractic, pediatrics and manipulation in relation to the treatment of health conditions

    PubMed Central

    Gotlib, Allan C; Beingessner, Melanie

    1995-01-01

    Biomedical literature retrieval, both indexed and non-indexed, with respect to the application of manipulative therapy with therapeutic intent and pediatric health conditions (ages 0 to 17 years) yielded 66 discrete documents which met specified inclusion and exclusion criteria. There was one experimental study (RCT), 3 observational (cohort, case control) studies and 62 descriptive studies (case series, case reports, surveys, literature reviews). An independent rating panel determined consistency with a modified quality of evidence scale adopted from procedure ratings system 1 of Clinical Guidelines for Chiropractic Practice in Canada. Results indicate minimal Class 1 and Class 2 and some Class 3 evidence for a variety of pediatric conditions utilizing the application of manipulation with therapeutic intent.

  7. A Survey of Quality Assurance Practices in Biomedical Open Source Software Projects

    PubMed Central

    Koru, Günes; Neisa, Angelica; Umarji, Medha

    2007-01-01

    Background Open source (OS) software is continuously gaining recognition and use in the biomedical domain, for example, in health informatics and bioinformatics. Objectives Given the mission critical nature of applications in this domain and their potential impact on patient safety, it is important to understand to what degree and how effectively biomedical OS developers perform standard quality assurance (QA) activities such as peer reviews and testing. This would allow the users of biomedical OS software to better understand the quality risks, if any, and the developers to identify process improvement opportunities to produce higher quality software. Methods A survey of developers working on biomedical OS projects was conducted to examine the QA activities that are performed. We took a descriptive approach to summarize the implementation of QA activities and then examined some of the factors that may be related to the implementation of such practices. Results Our descriptive results show that 63% (95% CI, 54-72) of projects did not include peer reviews in their development process, while 82% (95% CI, 75-89) did include testing. Approximately 74% (95% CI, 67-81) of developers did not have a background in computing, 80% (95% CI, 74-87) were paid for their contributions to the project, and 52% (95% CI, 43-60) had PhDs. A multivariate logistic regression model to predict the implementation of peer reviews was not significant (likelihood ratio test = 16.86, 9 df, P = .051) and neither was a model to predict the implementation of testing (likelihood ratio test = 3.34, 9 df, P = .95). Conclusions Less attention is paid to peer review than testing. However, the former is a complementary, and necessary, QA practice rather than an alternative. Therefore, one can argue that there are quality risks, at least at this point in time, in transitioning biomedical OS software into any critical settings that may have operational, financial, or safety implications. Developers of

  8. ASGARD: an open-access database of annotated transcriptomes for emerging model arthropod species.

    PubMed

    Zeng, Victor; Extavour, Cassandra G

    2012-01-01

    The increased throughput and decreased cost of next-generation sequencing (NGS) have shifted the bottleneck in genomic research from sequencing to annotation, analysis and accessibility. This is particularly challenging for research communities working on organisms that lack the basic infrastructure of a sequenced genome, or an efficient way to utilize whatever sequence data may be available. Here we present a new database, the Assembled Searchable Giant Arthropod Read Database (ASGARD). This database is a repository and search engine for transcriptomic data from arthropods that are of high interest to multiple research communities but currently lack sequenced genomes. We demonstrate the functionality and utility of ASGARD using de novo assembled transcriptomes from the milkweed bug Oncopeltus fasciatus, the cricket Gryllus bimaculatus and the amphipod crustacean Parhyale hawaiensis. We have annotated these transcriptomes to assign putative orthology, coding region determination, protein domain identification and Gene Ontology (GO) term annotation to all possible assembly products. ASGARD allows users to search all assemblies by orthology annotation, GO term annotation or Basic Local Alignment Search Tool. User-friendly features of ASGARD include search term auto-completion suggestions based on database content, the ability to download assembly product sequences in FASTA format, direct links to NCBI data for predicted orthologs and graphical representation of the location of protein domains and matches to similar sequences from the NCBI non-redundant database. ASGARD will be a useful repository for transcriptome data from future NGS studies on these and other emerging model arthropods, regardless of sequencing platform, assembly or annotation status. This database thus provides easy, one-stop access to multi-species annotated transcriptome information. We anticipate that this database will be useful for members of multiple research communities, including developmental

  9. 3D visualization of biomedical CT images based on OpenGL and VRML techniques

    NASA Astrophysics Data System (ADS)

    Yin, Meng; Luo, Qingming; Xia, Fuhua

    2002-04-01

    High-performance computers and advanced image processing capabilities have made three-dimensional visualization of biomedical computed tomography (CT) images a valuable aid to biomedical engineering research. To work with Internet-based technology, in which 3D data are typically stored and processed on powerful servers accessible via TCP/IP, isosurface results should be made generally applicable to medical visualization. This project is also a planned part of the PACS system our laboratory is developing. The system therefore uses the VRML 2.0 3D file format, which allows 3D models to be manipulated through a Web interface. The program generates and modifies triangular isosurface meshes using the marching cubes algorithm, and uses OpenGL and MFC to render the isosurface and manipulate voxel data. The software provides a more adequate visualization of volumetric data. Its drawbacks are that 3D image processing on personal computers is rather slow and the set of tools for 3D visualization is limited. However, these limitations have not affected the applicability of the platform to the tasks needed in elementary laboratory experiments or data preprocessing.

  10. Status of open access in the biomedical field in 2005*†

    PubMed Central

    Matsubayashi, Mamiko; Kurata, Keiko; Sakai, Yukiko; Morioka, Tomoko; Kato, Shinya; Mine, Shinji; Ueda, Shuichi

    2009-01-01

    Objectives: This study was designed to document the state of open access (OA) in the biomedical field in 2005. Methods: PubMed was used to collect bibliographic data on target articles published in 2005. PubMed, Google Scholar, Google, and OAIster were then used to establish the availability of free full text online for these publications. Articles were analyzed by type of OA, country, type of article, impact factor, publisher, and publishing model to provide insight into the current state of OA. Results: Twenty-seven percent of all the articles were accessible as OA articles. More than 70% of the OA articles were provided through journal websites. Mid-rank commercial publishers often provided OA articles in OA journals, while society publishers tended to provide OA articles in the context of a traditional subscription model. The rate of OA articles available from the websites of individual authors or in institutional repositories was quite low. Discussion/Conclusions: In 2005, OA in the biomedical field was achieved under an umbrella of existing scholarly communication systems. Typically, OA articles were published as part of subscription journals published by scholarly societies. OA journals published by BioMed Central contributed to a small portion of all OA articles. PMID:19159007

  11. Computing human image annotation.

    PubMed

    Channin, David S; Mongkolwat, Pattanasak; Kleper, Vladimir; Rubin, Daniel L

    2009-01-01

    An image annotation is the explanatory or descriptive information about the pixel data of an image that is generated by a human (or machine) observer. An image markup is the graphical symbols placed over the image to depict an annotation. In the majority of current, clinical and research imaging practice, markup is captured in proprietary formats and annotations are referenced only in free text radiology reports. This makes these annotations difficult to query, retrieve and compute upon, hampering their integration into other data mining and analysis efforts. This paper describes the National Cancer Institute's Cancer Biomedical Informatics Grid's (caBIG) Annotation and Image Markup (AIM) project, focusing on how to use AIM to query for annotations. The AIM project delivers an information model for image annotation and markup. The model uses controlled terminologies for important concepts. All of the classes and attributes of the model have been harmonized with the other models and common data elements in use at the National Cancer Institute. The project also delivers XML schemata necessary to instantiate AIMs in XML as well as a software application for translating AIM XML into DICOM S/R and HL7 CDA. Large collections of AIM annotations can be built and then queried as Grid or Web services. Using the tools of the AIM project, image annotations and their markup can be captured and stored in human and machine readable formats. This enables the inclusion of human image observation and inference as part of larger data mining and analysis activities. PMID:19964202

  12. The ImageJ ecosystem: An open platform for biomedical image analysis.

    PubMed

    Schindelin, Johannes; Rueden, Curtis T; Hiner, Mark C; Eliceiri, Kevin W

    2015-01-01

    Technology in microscopy advances rapidly, enabling increasingly affordable, faster, and more precise quantitative biomedical imaging, which necessitates correspondingly more-advanced image processing and analysis techniques. A wide range of software is available (from commercial to academic, special-purpose to Swiss army knife, small to large), but a key characteristic of software that is suitable for scientific inquiry is its accessibility. Open-source software is ideal for scientific endeavors because it can be freely inspected, modified, and redistributed; in particular, the open-software platform ImageJ has had a huge impact on the life sciences, and continues to do so. From its inception, ImageJ has grown significantly due largely to being freely available and its vibrant and helpful user community. Scientists as diverse as interested hobbyists, technical assistants, students, scientific staff, and advanced biology researchers use ImageJ on a daily basis, and exchange knowledge via its dedicated mailing list. Uses of ImageJ range from data visualization and teaching to advanced image processing and statistical analysis. The software's extensibility continues to attract biologists at all career stages as well as computer scientists who wish to effectively implement specific image-processing algorithms. In this review, we use the ImageJ project as a case study of how open-source software fosters its suites of software tools, making multitudes of image-analysis technology easily accessible to the scientific community. We specifically explore what makes ImageJ so popular, how it impacts the life sciences, how it inspires other projects, and how it is self-influenced by coevolving projects within the ImageJ ecosystem. PMID:26153368

  13. DeTEXT: A Database for Evaluating Text Extraction from Biomedical Literature Figures.

    PubMed

    Yin, Xu-Cheng; Yang, Chun; Pei, Wei-Yi; Man, Haixia; Zhang, Jun; Learned-Miller, Erik; Yu, Hong

    2015-01-01

    Hundreds of millions of figures are available in biomedical literature, representing important biomedical experimental evidence. Since text is a rich source of information in figures, automatically extracting such text may assist in the task of mining figure information. A high-quality ground truth standard can greatly facilitate the development of an automated system. This article describes DeTEXT: A database for evaluating text extraction from biomedical literature figures. It is the first publicly available, human-annotated, high quality, and large-scale figure-text dataset with 288 full-text articles, 500 biomedical figures, and 9308 text regions. This article describes how figures were selected from open-access full-text biomedical articles and how annotation guidelines and annotation tools were developed. We also discuss the inter-annotator agreement and the reliability of the annotations. We summarize the statistics of the DeTEXT data and make available evaluation protocols for DeTEXT. Finally we lay out challenges we observed in the automated detection and recognition of figure text and discuss research directions in this area. DeTEXT is publicly available for downloading at http://prir.ustb.edu.cn/DeTEXT/. PMID:25951377

  14. DeTEXT: A Database for Evaluating Text Extraction from Biomedical Literature Figures

    PubMed Central

    Yin, Xu-Cheng; Yang, Chun; Pei, Wei-Yi; Man, Haixia; Zhang, Jun; Learned-Miller, Erik; Yu, Hong

    2015-01-01

    Hundreds of millions of figures are available in biomedical literature, representing important biomedical experimental evidence. Since text is a rich source of information in figures, automatically extracting such text may assist in the task of mining figure information. A high-quality ground truth standard can greatly facilitate the development of an automated system. This article describes DeTEXT: A database for evaluating text extraction from biomedical literature figures. It is the first publicly available, human-annotated, high quality, and large-scale figure-text dataset with 288 full-text articles, 500 biomedical figures, and 9308 text regions. This article describes how figures were selected from open-access full-text biomedical articles and how annotation guidelines and annotation tools were developed. We also discuss the inter-annotator agreement and the reliability of the annotations. We summarize the statistics of the DeTEXT data and make available evaluation protocols for DeTEXT. Finally we lay out challenges we observed in the automated detection and recognition of figure text and discuss research directions in this area. DeTEXT is publicly available for downloading at http://prir.ustb.edu.cn/DeTEXT/. PMID:25951377

  15. Automatic discourse connective detection in biomedical text

    PubMed Central

    Polepalli Ramesh, Balaji; Prasad, Rashmi; Miller, Tim; Harrington, Brian

    2012-01-01

    Objective Relation extraction in biomedical text mining systems has largely focused on identifying clause-level relations, but increasing sophistication demands the recognition of relations at discourse level. A first step in identifying discourse relations involves the detection of discourse connectives: words or phrases used in text to express discourse relations. In this study supervised machine-learning approaches were developed and evaluated for automatically identifying discourse connectives in biomedical text. Materials and Methods Two supervised machine-learning models (support vector machines and conditional random fields) were explored for identifying discourse connectives in biomedical literature. In-domain supervised machine-learning classifiers were trained on the Biomedical Discourse Relation Bank, an annotated corpus of discourse relations over 24 full-text biomedical articles (∼112 000 word tokens), a subset of the GENIA corpus. Novel domain adaptation techniques were also explored to leverage the larger open-domain Penn Discourse Treebank (∼1 million word tokens). The models were evaluated using the standard evaluation metrics of precision, recall and F1 scores. Results and Conclusion Supervised machine-learning approaches can automatically identify discourse connectives in biomedical text, and the novel domain adaptation techniques yielded the best performance: 0.761 F1 score. A demonstration version of the fully implemented classifier BioConn is available at: http://bioconn.askhermes.org. PMID:22744958
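
    For reference, the precision, recall and F1 metrics named in this abstract can be computed from counts of true positives, false positives and false negatives. The sketch below is a generic illustration; the function name and the example counts are hypothetical and are not taken from the study.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 from raw counts.

    tp: predicted connectives that are correct
    fp: predicted connectives that are wrong
    fn: true connectives that the classifier missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for illustration only.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
```

    Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which is why it is commonly reported alongside the two component scores.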

  16. Community gene annotation in practice

    PubMed Central

    Loveland, Jane E.; Gilbert, James G.R.; Griffiths, Ed; Harrow, Jennifer L.

    2012-01-01

    Manual annotation of genomic data is extremely valuable to produce an accurate reference gene set but is expensive compared with automatic methods and so has been limited to model organisms. Annotation tools that have been developed at the Wellcome Trust Sanger Institute (WTSI, http://www.sanger.ac.uk/) are being used to fill that gap, as they can be used remotely and so open up viable community annotation collaborations. We introduce the ‘Blessed’ annotator and ‘Gatekeeper’ approach to Community Annotation using the Otterlace/ZMap genome annotation tool. We also describe the strategies adopted for annotation consistency, quality control and viewing of the annotation. Database URL: http://vega.sanger.ac.uk/index.html PMID:22434843

  17. KEGG orthology-based annotation of the predicted proteome of Acropora digitifera: ZoophyteBase - an open access and searchable database of a coral genome

    PubMed Central

    2013-01-01

    Background Contemporary coral reef research has firmly established that a genomic approach is urgently needed to better understand the effects of anthropogenic environmental stress and global climate change on coral holobiont interactions. Here we present KEGG orthology-based annotation of the complete genome sequence of the scleractinian coral Acropora digitifera and provide the first comprehensive view of the genome of a reef-building coral by applying advanced bioinformatics. Description Sequences from the KEGG database of protein function were used to construct hidden Markov models. These models were used to search the predicted proteome of A. digitifera to establish complete genomic annotation. The annotated dataset is published in ZoophyteBase, an open access format with different options for searching the data. A particularly useful feature is the ability to use a Google-like search engine that links query words to protein attributes. We present features of the annotation that underpin the molecular structure of key processes of coral physiology that include (1) regulatory proteins of symbiosis, (2) planula and early developmental proteins, (3) neural messengers, receptors and sensory proteins, (4) calcification and Ca2+-signalling proteins, (5) plant-derived proteins, (6) proteins of nitrogen metabolism, (7) DNA repair proteins, (8) stress response proteins, (9) antioxidant and redox-protective proteins, (10) proteins of cellular apoptosis, (11) microbial symbioses and pathogenicity proteins, (12) proteins of viral pathogenicity, (13) toxins and venom, (14) proteins of the chemical defensome and (15) coral epigenetics. Conclusions We advocate that providing annotation in an open-access searchable database available to the public domain will give an unprecedented foundation to interrogate the fundamental molecular structure and interactions of coral symbiosis and allow critical questions to be addressed at the genomic level based on combined aspects of

  18. @Note: a workbench for biomedical text mining.

    PubMed

    Lourenço, Anália; Carreira, Rafael; Carneiro, Sónia; Maia, Paulo; Glez-Peña, Daniel; Fdez-Riverola, Florentino; Ferreira, Eugénio C; Rocha, Isabel; Rocha, Miguel

    2009-08-01

    Biomedical Text Mining (BioTM) is providing valuable approaches to the automated curation of scientific literature. However, most efforts have addressed the benchmarking of new algorithms rather than user operational needs. Bridging the gap between BioTM researchers and biologists' needs is crucial to solve real-world problems and promote further research. We present @Note, a platform for BioTM that aims at the effective translation of the advances between three distinct classes of users: biologists, text miners and software developers. Its main functional contributions are the ability to process abstracts and full-texts; an information retrieval module enabling PubMed search and journal crawling; a pre-processing module with PDF-to-text conversion, tokenisation and stopword removal; a semantic annotation schema; a lexicon-based annotator; a user-friendly annotation view that allows annotations to be corrected; and a Text Mining Module supporting dataset preparation and algorithm evaluation. @Note improves the interoperability, modularity and flexibility when integrating in-home and open-source third-party components. Its component-based architecture allows the rapid development of new applications, emphasizing the principles of transparency and simplicity of use. Although it is still on-going, it has already allowed the development of applications that are currently being used. PMID:19393341

  19. A Unified Framework for Biomedical Terminologies and Ontologies

    PubMed Central

    Ceusters, Werner; Smith, Barry

    2011-01-01

    The goal of the OBO (Open Biomedical Ontologies) Foundry initiative is to create and maintain an evolving collection of non-overlapping interoperable ontologies that will offer unambiguous representations of the types of entities in biological and biomedical reality. These ontologies are designed to serve non-redundant annotation of data and scientific text. To achieve these ends, the Foundry imposes strict requirements upon the ontologies eligible for inclusion. While these requirements are not met by most existing biomedical terminologies, the latter may nonetheless support the Foundry’s goal of consistent and non-redundant annotation if appropriate mappings of data annotated with their aid can be achieved. To construct such mappings in reliable fashion, however, it is necessary to analyze terminological resources from an ontologically realistic perspective in such a way as to identify the exact import of the ‘concepts’ and associated terms which they contain. We propose a framework for such analysis that is designed to maximize the degree to which legacy terminologies and the data coded with their aid can be successfully used for information-driven clinical and translational research. PMID:20841844

  20. Annotating the Function of the Human Genome with Gene Ontology and Disease Ontology

    PubMed Central

    Hu, Yang; Zhou, Wenyang; Ren, Jun; Dong, Lixiang

    2016-01-01

    Increasing evidence indicates that functional annotation of the human genome at the molecular and phenotype levels is very important for the systematic analysis of genes. In this study, we present a framework named Gene2Function to annotate Gene Reference into Function records (GeneRIFs), in which each functional description in GeneRIFs is annotated by the text mining tool Open Biomedical Annotator (OBA), and each Entrez gene is mapped to a Human Genome Organisation Gene Nomenclature Committee (HGNC) gene symbol. After annotating all GeneRIF records about human genes, 288,869 associations between 13,148 mRNAs and 7,182 terms, 9,496 associations between 948 microRNAs and 533 terms, and 901 associations between 139 long noncoding RNAs (lncRNAs) and 297 terms were obtained as a comprehensive annotation resource for the human genome. The high consistency of the term frequency of individual genes (Pearson correlation = 0.6401, p = 2.2e-16) and the gene frequency of individual terms (Pearson correlation = 0.1298, p = 3.686e-14) between GeneRIFs and GOA shows that our annotation resource is very reliable.

  1. Semantic Similarity in Biomedical Ontologies

    PubMed Central

    Pesquita, Catia; Faria, Daniel; Falcão, André O.; Lord, Phillip; Couto, Francisco M.

    2009-01-01

    In recent years, ontologies have become a mainstream topic in biomedical research. When biological entities are described using a common schema, such as an ontology, they can be compared by means of their annotations. This type of comparison is called semantic similarity, since it assesses the degree of relatedness between two entities by the similarity in meaning of their annotations. The application of semantic similarity to biomedical ontologies is recent; nevertheless, several studies have been published in the last few years describing and evaluating diverse approaches. Semantic similarity has become a valuable tool for validating the results drawn from biomedical studies such as gene clustering, gene expression data analysis, prediction and validation of molecular interactions, and disease gene prioritization. We review semantic similarity measures applied to biomedical ontologies and propose their classification according to the strategies they employ: node-based versus edge-based and pairwise versus groupwise. We also present comparative assessment studies and discuss the implications of their results. We survey the existing implementations of semantic similarity measures, and we describe examples of applications to biomedical research. This will clarify how biomedical researchers can benefit from semantic similarity measures and help them choose the approach most suitable for their studies. Biomedical ontologies are evolving toward increased coverage, formality, and integration, and their use for annotation is increasingly becoming a focus of both effort by biomedical experts and application of automated annotation procedures to create corpora of higher quality and completeness than are currently available. Given that semantic similarity measures are directly dependent on these evolutions, we can expect to see them gaining more relevance and even becoming as essential as sequence similarity is today in biomedical research. PMID:19649320
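
    The node-based versus edge-based distinction drawn in this review can be illustrated with a minimal sketch. The toy ontology, the term names, and the annotation counts below are hypothetical and are not taken from the paper. The node-based measure shown is the information content of the most informative common ancestor (a Resnik-style measure), and the edge-based measure is an inverse of the shortest path through a common ancestor; both are chosen here only as simple representatives of their families.

```python
import math

# Hypothetical toy ontology: each term maps to its set of is_a parents.
PARENTS = {
    "root": set(),
    "process": {"root"},
    "metabolism": {"process"},
    "signaling": {"process"},
    "lipid_metabolism": {"metabolism"},
    "glucose_metabolism": {"metabolism"},
}

# Hypothetical number of entities annotated directly to each term.
DIRECT_COUNTS = {
    "root": 0, "process": 0, "metabolism": 2, "signaling": 4,
    "lipid_metabolism": 3, "glucose_metabolism": 1,
}


def up_distances(term):
    """Map the term and each of its ancestors to the minimum number of edges."""
    dist = {term: 0}
    frontier = [term]
    while frontier:
        next_frontier = []
        for node in frontier:
            for parent in PARENTS[node]:
                if parent not in dist:
                    dist[parent] = dist[node] + 1
                    next_frontier.append(parent)
        frontier = next_frontier
    return dist


def information_content(term):
    """IC(t) = -log p(t); p(t) counts annotations to t and all its descendants."""
    total = sum(DIRECT_COUNTS.values())
    covered = sum(n for t, n in DIRECT_COUNTS.items() if term in up_distances(t))
    return -math.log(covered / total)


def node_based_similarity(a, b):
    """Node-based: IC of the most informative common ancestor."""
    common = up_distances(a).keys() & up_distances(b).keys()
    return max(information_content(t) for t in common)


def edge_based_similarity(a, b):
    """Edge-based: 1 / (1 + shortest path length through a common ancestor)."""
    da, db = up_distances(a), up_distances(b)
    shortest = min(da[t] + db[t] for t in da.keys() & db.keys())
    return 1.0 / (1.0 + shortest)


if __name__ == "__main__":
    for pair in [("lipid_metabolism", "glucose_metabolism"),
                 ("lipid_metabolism", "signaling")]:
        print(pair, node_based_similarity(*pair), edge_based_similarity(*pair))
```

    In this sketch the node-based score depends on how the (hypothetical) annotations are distributed, while the edge-based score depends only on the graph structure, which is the contrast the classification is meant to capture.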

  2. xGDBvm: A Web GUI-Driven Workflow for Annotating Eukaryotic Genomes in the Cloud

    PubMed Central

    Merchant, Nirav

    2016-01-01

    Genome-wide annotation of gene structure requires the integration of numerous computational steps. Currently, annotation is arguably best accomplished through collaboration of bioinformatics and domain experts, with broad community involvement. However, such a collaborative approach is not scalable at today’s pace of sequence generation. To address this problem, we developed the xGDBvm software, which uses an intuitive graphical user interface to access a number of common genome analysis and gene structure tools, preconfigured in a self-contained virtual machine image. Once their virtual machine instance is deployed through iPlant’s Atmosphere cloud services, users access the xGDBvm workflow via a unified Web interface to manage inputs, set program parameters, configure links to high-performance computing (HPC) resources, view and manage output, apply analysis and editing tools, or access contextual help. The xGDBvm workflow will mask the genome, compute spliced alignments from transcript and/or protein inputs (locally or on a remote HPC cluster), predict gene structures and gene structure quality, and display output in a public or private genome browser complete with accessory tools. Problematic gene predictions are flagged and can be reannotated using the integrated yrGATE annotation tool. xGDBvm can also be configured to append or replace existing data or load precomputed data. Multiple genomes can be annotated and displayed, and outputs can be archived for sharing or backup. xGDBvm can be adapted to a variety of use cases including de novo genome annotation, reannotation, comparison of different annotations, and training or teaching. PMID:27020957

  3. The National Center for Biomedical Ontology: Advancing Biomedicine through Structured Organization of Scientific Knowledge

    SciTech Connect

    Rubin, Daniel L.; Lewis, Suzanna E.; Mungall, Chris J.; Misra,Sima; Westerfield, Monte; Ashburner, Michael; Sim, Ida; Chute,Christopher G.; Solbrig, Harold; Storey, Margaret-Anne; Smith, Barry; Day-Richter, John; Noy, Natalya F.; Musen, Mark A.

    2006-01-23

    The National Center for Biomedical Ontology (http://bioontology.org) is a consortium that comprises leading informaticians, biologists, clinicians, and ontologists funded by the NIH Roadmap to develop innovative technology and methods that allow scientists to record, manage, and disseminate biomedical information and knowledge in machine-processable form. The goals of the Center are: (1) to help unify the divergent and isolated efforts in ontology development by promoting high quality open-source, standards-based tools to create, manage, and use ontologies, (2) to create new software tools so that scientists can use ontologies to annotate and analyze biomedical data, (3) to provide a national resource for the ongoing evaluation, integration, and evolution of biomedical ontologies and associated tools and theories in the context of driving biomedical projects (DBPs), and (4) to disseminate the tools and resources of the Center and to identify, evaluate, and communicate best practices of ontology development to the biomedical community. The Center is working toward these objectives by providing tools to develop ontologies and to annotate experimental data, and by developing resources to integrate and relate existing ontologies as well as by creating repositories of biomedical data that are annotated using those ontologies. The Center is providing training workshops in ontology design, development, and usage, and is also pursuing research in ontology evaluation, quality, and use of ontologies to promote scientific discovery. Through the research activities within the Center, collaborations with the DBPs, and interactions with the biomedical community, our goal is to help scientists to work more effectively in the e-science paradigm, enhancing experiment design, experiment execution, data analysis, information synthesis, hypothesis generation and testing, and understand human disease.

  4. Opening Pathways for Underrepresented High School Students to Biomedical Research Careers: The Emory University RISE Program

    PubMed Central

    Rohrbaugh, Margaret C.; Corces, Victor G.

    2011-01-01

    Increasing the college graduation rates of underrepresented minority students in science disciplines is essential to attain a diverse workforce for the 21st century. The Research Internship and Science Education (RISE) program attempts to motivate and prepare students from the Atlanta Public School system, where underrepresented minority (URM) students comprise a majority of the population, for biomedical science careers by offering the opportunity to participate in an original research project. Students work in a research laboratory from the summer of their sophomore year until graduation, mentored by undergraduate and graduate students and postdoctoral fellows (postdocs). In addition, they receive instruction in college-level biology, scholastic assessment test (SAT) preparation classes, and help with the college application process. During the last 4 yr, RISE students have succeeded in the identification and characterization of a series of proteins involved in the regulation of nuclear organization and transcription. All but 1 of 39 RISE students have continued on to 4-year college undergraduate studies and 61% of those students are currently enrolled in science-related majors. These results suggest that the use of research-based experiences at the high school level may contribute to the increased recruitment of underrepresented students into science-related careers. PMID:21926301

  5. The Virtual Skeleton Database: An Open Access Repository for Biomedical Research and Collaboration

    PubMed Central

    Bonaretti, Serena; Pfahrer, Marcel; Niklaus, Roman; Büchler, Philippe

    2013-01-01

    Background Statistical shape models are widely used in biomedical research. They are routinely implemented for automatic image segmentation or object identification in medical images. In these fields, however, the acquisition of the large training datasets, required to develop these models, is usually a time-consuming process. Even after this effort, the collections of datasets are often lost or mishandled resulting in replication of work. Objective To solve these problems, the Virtual Skeleton Database (VSD) is proposed as a centralized storage system where the data necessary to build statistical shape models can be stored and shared. Methods The VSD provides an online repository system tailored to the needs of the medical research community. The processing of the most common image file types, a statistical shape model framework, and an ontology-based search provide the generic tools to store, exchange, and retrieve digital medical datasets. The hosted data are accessible to the community, and collaborative research catalyzes their productivity. Results To illustrate the need for an online repository for medical research, three exemplary projects of the VSD are presented: (1) an international collaboration to achieve improvement in cochlear surgery and implant optimization, (2) a population-based analysis of femoral fracture risk between genders, and (3) an online application developed for the evaluation and comparison of the segmentation of brain tumors. Conclusions The VSD is a novel system for scientific collaboration for the medical image community with a data-centric concept and semantically driven search option for anatomical structures. The repository has been proven to be a useful tool for collaborative model building, as a resource for biomechanical population studies, or to enhance segmentation algorithms. PMID:24220210

  6. Introducing meta-services for biomedical information extraction

    PubMed Central

    Leitner, Florian; Krallinger, Martin; Rodriguez-Penagos, Carlos; Hakenberg, Jörg; Plake, Conrad; Kuo, Cheng-Ju; Hsu, Chun-Nan; Tsai, Richard Tzong-Han; Hung, Hsi-Chuan; Lau, William W; Johnson, Calvin A; Sætre, Rune; Yoshida, Kazuhiro; Chen, Yan Hua; Kim, Sun; Shin, Soo-Yong; Zhang, Byoung-Tak; Baumgartner, William A; Hunter, Lawrence; Haddow, Barry; Matthews, Michael; Wang, Xinglong; Ruch, Patrick; Ehrler, Frédéric; Özgür, Arzucan; Erkan, Güneş; Radev, Dragomir R; Krauthammer, Michael; Luong, ThaiBinh; Hoffmann, Robert; Sander, Chris; Valencia, Alfonso

    2008-01-01

    We introduce the first meta-service for information extraction in molecular biology, the BioCreative MetaServer (BCMS). This prototype platform is a joint effort of 13 research groups and provides automatically generated annotations for PubMed/Medline abstracts. Annotation types cover gene names, gene IDs, species, and protein-protein interactions. The annotations are distributed by the meta-server in both human and machine readable formats (HTML/XML). This service is intended to be used by biomedical researchers and database annotators, and in biomedical language processing. The platform allows direct comparison, unified access, and result aggregation of the annotations. PMID:18834497

  7. Getting more out of biomedical documents with GATE's full lifecycle open source text analytics.

    PubMed

    Cunningham, Hamish; Tablan, Valentin; Roberts, Angus; Bontcheva, Kalina

    2013-01-01

    This software article describes the GATE family of open source text analysis tools and processes. GATE is one of the most widely used systems of its type with yearly download rates of tens of thousands and many active users in both academic and industrial contexts. In this paper we report three examples of GATE-based systems operating in the life sciences and in medicine. First, in genome-wide association studies which have contributed to discovery of a head and neck cancer mutation association. Second, medical records analysis which has significantly increased the statistical power of treatment/outcome models in the UK's largest psychiatric patient cohort. Third, richer constructs in drug-related searching. We also explore the ways in which the GATE family supports the various stages of the lifecycle present in our examples. We conclude that the deployment of text mining for document abstraction or rich search and navigation is best thought of as a process, and that with the right computational tools and data collection strategies this process can be made defined and repeatable. The GATE research programme is now 20 years old and has grown from its roots as a specialist development tool for text processing to become a rather comprehensive ecosystem, bringing together software developers, language engineers and research staff from diverse fields. GATE now has a strong claim to cover a uniquely wide range of the lifecycle of text analysis systems. It forms a focal point for the integration and reuse of advances that have been made by many people (the majority outside of the authors' own group) who work in text processing for biomedicine and other areas. GATE is available online under GNU open source licences and runs on all major operating systems. Support is available from an active user and developer community and also on a commercial basis. PMID:23408875

  8. Getting More Out of Biomedical Documents with GATE's Full Lifecycle Open Source Text Analytics

    PubMed Central

    Cunningham, Hamish; Tablan, Valentin; Roberts, Angus; Bontcheva, Kalina

    2013-01-01

    This software article describes the GATE family of open source text analysis tools and processes. GATE is one of the most widely used systems of its type with yearly download rates of tens of thousands and many active users in both academic and industrial contexts. In this paper we report three examples of GATE-based systems operating in the life sciences and in medicine. First, in genome-wide association studies which have contributed to discovery of a head and neck cancer mutation association. Second, medical records analysis which has significantly increased the statistical power of treatment/outcome models in the UK's largest psychiatric patient cohort. Third, richer constructs in drug-related searching. We also explore the ways in which the GATE family supports the various stages of the lifecycle present in our examples. We conclude that the deployment of text mining for document abstraction or rich search and navigation is best thought of as a process, and that with the right computational tools and data collection strategies this process can be made defined and repeatable. The GATE research programme is now 20 years old and has grown from its roots as a specialist development tool for text processing to become a rather comprehensive ecosystem, bringing together software developers, language engineers and research staff from diverse fields. GATE now has a strong claim to cover a uniquely wide range of the lifecycle of text analysis systems. It forms a focal point for the integration and reuse of advances that have been made by many people (the majority outside of the authors' own group) who work in text processing for biomedicine and other areas. GATE is available online under GNU open source licences and runs on all major operating systems. Support is available from an active user and developer community and also on a commercial basis. PMID:23408875

  9. Ethics of open access to biomedical research: Just a special case of ethics of open access to research

    PubMed Central

    Harnad, Stevan

    2007-01-01

    The ethical case for Open Access (OA) (free online access) to research findings is especially salient when it is public health that is being compromised by needless access restrictions. But the ethical imperative for OA is far more general: It applies to all scientific and scholarly research findings published in peer-reviewed journals. And peer-to-peer access is far more important than direct public access. Most research is funded so as to be conducted and published, by researchers, in order to be taken up, used, and built upon in further research and applications, again by researchers (pure and applied, including practitioners), for the benefit of the public that funded it – not in order to generate revenue for the peer-reviewed journal publishing industry (nor even because there is a burning public desire to read much of it). Hence OA needs to be mandated, by researchers' institutions and funders, for all research. PMID:18067660

  10. Management of Dynamic Biomedical Terminologies: Current Status and Future Challenges

    PubMed Central

    Dos Reis, J. C.; Pruski, C.

    2015-01-01

    Summary Objectives Controlled terminologies and their dependent artefacts provide a consensual understanding of a domain while reducing ambiguities and enabling reasoning. However, the evolution of a domain’s knowledge directly impacts these terminologies and generates inconsistencies in the underlying biomedical information systems. In this article, we review existing work addressing the dynamic aspect of terminologies as well as their effects on mappings and semantic annotations. Methods We investigate approaches related to the identification, characterization and propagation of changes in terminologies, mappings and semantic annotations including techniques to update their content. Results and conclusion Based on the explored issues and existing methods, we outline open research challenges requiring investigation in the near future. PMID:26293859

  11. Computer systems for annotation of single molecule fragments

    DOEpatents

    Schwartz, David Charles; Severin, Jessica

    2016-07-19

    There are provided computer systems for visualizing and annotating single molecule images. Annotation systems in accordance with this disclosure allow a user to mark and annotate single molecules of interest and their restriction enzyme cut sites thereby determining the restriction fragments of single nucleic acid molecules. The markings and annotations may be automatically generated by the system in certain embodiments and they may be overlaid translucently onto the single molecule images. An image caching system may be implemented in the computer annotation systems to reduce image processing time. The annotation systems include one or more connectors connecting to one or more databases capable of storing single molecule data as well as other biomedical data. Such diverse array of data can be retrieved and used to validate the markings and annotations. The annotation systems may be implemented and deployed over a computer network. They may be ergonomically optimized to facilitate user interactions.

  12. National Center for Biomedical Ontology: advancing biomedicine through structured organization of scientific knowledge.

    PubMed

    Rubin, Daniel L; Lewis, Suzanna E; Mungall, Chris J; Misra, Sima; Westerfield, Monte; Ashburner, Michael; Sim, Ida; Chute, Christopher G; Solbrig, Harold; Storey, Margaret-Anne; Smith, Barry; Day-Richter, John; Noy, Natalya F; Musen, Mark A

    2006-01-01

    The National Center for Biomedical Ontology is a consortium that comprises leading informaticians, biologists, clinicians, and ontologists, funded by the National Institutes of Health (NIH) Roadmap, to develop innovative technology and methods that allow scientists to record, manage, and disseminate biomedical information and knowledge in machine-processable form. The goals of the Center are (1) to help unify the divergent and isolated efforts in ontology development by promoting high quality open-source, standards-based tools to create, manage, and use ontologies, (2) to create new software tools so that scientists can use ontologies to annotate and analyze biomedical data, (3) to provide a national resource for the ongoing evaluation, integration, and evolution of biomedical ontologies and associated tools and theories in the context of driving biomedical projects (DBPs), and (4) to disseminate the tools and resources of the Center and to identify, evaluate, and communicate best practices of ontology development to the biomedical community. Through the research activities within the Center, collaborations with the DBPs, and interactions with the biomedical community, our goal is to help scientists to work more effectively in the e-science paradigm, enhancing experiment design, experiment execution, data analysis, information synthesis, hypothesis generation and testing, and understand human disease. PMID:16901225

  13. A multilingual gold-standard corpus for biomedical concept recognition: the Mantra GSC

    PubMed Central

    Clematide, Simon; Akhondi, Saber A; van Mulligen, Erik M; Rebholz-Schuhmann, Dietrich

    2015-01-01

    Objective To create a multilingual gold-standard corpus for biomedical concept recognition. Materials and methods We selected text units from different parallel corpora (Medline abstract titles, drug labels, biomedical patent claims) in English, French, German, Spanish, and Dutch. Three annotators per language independently annotated the biomedical concepts, based on a subset of the Unified Medical Language System and covering a wide range of semantic groups. To reduce the annotation workload, automatically generated preannotations were provided. Individual annotations were automatically harmonized and then adjudicated, and cross-language consistency checks were carried out to arrive at the final annotations. Results The number of final annotations was 5530. Inter-annotator agreement scores indicate good agreement (median F-score 0.79), and are similar to those between individual annotators and the gold standard. The automatically generated harmonized annotation set for each language performed equally well as the best annotator for that language. Discussion The use of automatic preannotations, harmonized annotations, and parallel corpora helped to keep the manual annotation efforts manageable. The inter-annotator agreement scores provide a reference standard for gauging the performance of automatic annotation techniques. Conclusion To our knowledge, this is the first gold-standard corpus for biomedical concept recognition in languages other than English. Other distinguishing features are the wide variety of semantic groups that are being covered, and the diversity of text genres that were annotated. PMID:25948699
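
    As a rough illustration of how an inter-annotator F-score such as the one reported above can be computed, the sketch below compares annotation sets pairwise and takes the median over annotator pairs. The representation of an annotation as a (start, end, concept) tuple, the exact-match criterion, and all sample values are assumptions made for this example; the abstract does not specify the matching rules the authors used.

```python
from itertools import combinations
from statistics import median


def pairwise_f_score(annotations_a, annotations_b):
    """F-score between two annotation sets under exact matching.

    With one set treated as the reference, precision is |A & B| / |A| and
    recall is |A & B| / |B|; their harmonic mean simplifies to
    2 * |A & B| / (|A| + |B|), which is symmetric in A and B.
    """
    a, b = set(annotations_a), set(annotations_b)
    if not a and not b:
        return 1.0  # two empty annotation sets agree trivially
    return 2 * len(a & b) / (len(a) + len(b))


def median_agreement(annotators):
    """Median pairwise F-score over all pairs of annotators."""
    scores = [pairwise_f_score(x, y)
              for x, y in combinations(annotators.values(), 2)]
    return median(scores)


# Hypothetical annotations: (start offset, end offset, concept label).
annotators = {
    "annotator_1": {(0, 8, "CONCEPT_A"), (12, 20, "CONCEPT_B"), (25, 30, "CONCEPT_C")},
    "annotator_2": {(0, 8, "CONCEPT_A"), (12, 20, "CONCEPT_B"), (40, 45, "CONCEPT_D")},
    "annotator_3": {(0, 8, "CONCEPT_A"), (12, 20, "CONCEPT_B")},
}

if __name__ == "__main__":
    print(median_agreement(annotators))
```

    The same function can score an automatic annotator against the gold standard by passing the system output as one set and the gold annotations as the other.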

  14. A modular framework for biomedical concept recognition

    PubMed Central

    2013-01-01

    Background Concept recognition is an essential task in biomedical information extraction, presenting several complex and unsolved challenges. The development of such solutions is typically performed in an ad-hoc manner or using general information extraction frameworks, which are not optimized for the biomedical domain and normally require the integration of complex external libraries and/or the development of custom tools. Results This article presents Neji, an open source framework optimized for biomedical concept recognition built around four key characteristics: modularity, scalability, speed, and usability. It integrates modules for biomedical natural language processing, such as sentence splitting, tokenization, lemmatization, part-of-speech tagging, chunking and dependency parsing. Concept recognition is provided through dictionary matching and machine learning with normalization methods. Neji also integrates an innovative concept tree implementation, supporting overlapped concept names and respective disambiguation techniques. The most popular input and output formats, namely Pubmed XML, IeXML, CoNLL and A1, are also supported. On top of the built-in functionalities, developers and researchers can implement new processing modules or pipelines, or use the provided command-line interface tool to build their own solutions, applying the most appropriate techniques to identify heterogeneous biomedical concepts. Neji was evaluated against three gold standard corpora with heterogeneous biomedical concepts (CRAFT, AnEM and NCBI disease corpus), achieving high performance results on named entity recognition (F1-measure for overlap matching: species 95%, cell 92%, cellular components 83%, gene and proteins 76%, chemicals 65%, biological processes and molecular functions 63%, disorders 85%, and anatomical entities 82%) and on entity normalization (F1-measure for overlap name matching and correct identifier included in the returned list of identifiers: species 88

  15. MAKER-P: A Tool Kit for the Rapid Creation, Management, and Quality Control of Plant Genome Annotations

    PubMed Central

    Campbell, Michael S.; Law, MeiYee; Holt, Carson; Stein, Joshua C.; Moghe, Gaurav D.; Hufnagel, David E.; Lei, Jikai; Achawanantakun, Rujira; Jiao, Dian; Lawrence, Carolyn J.; Ware, Doreen; Shiu, Shin-Han; Childs, Kevin L.; Sun, Yanni; Jiang, Ning; Yandell, Mark

    2014-01-01

    We have optimized and extended the widely used annotation engine MAKER in order to better support plant genome annotation efforts. New features include better parallelization for large repeat-rich plant genomes, noncoding RNA annotation capabilities, and support for pseudogene identification. We have benchmarked the resulting software tool kit, MAKER-P, using the Arabidopsis (Arabidopsis thaliana) and maize (Zea mays) genomes. Here, we demonstrate the ability of the MAKER-P tool kit to automatically update, extend, and revise the Arabidopsis annotations in light of newly available data and to annotate pseudogenes and noncoding RNAs absent from The Arabidopsis Information Resource 10 build. Our results demonstrate that MAKER-P can be used to manage and improve the annotations of even Arabidopsis, perhaps the best-annotated plant genome. We have also installed and benchmarked MAKER-P on the Texas Advanced Computing Center. We show that this public resource can de novo annotate the entire Arabidopsis and maize genomes in less than 3 h and produce annotations of comparable quality to those of the current The Arabidopsis Information Resource 10 and maize V2 annotation builds. PMID:24306534

  16. NCBI prokaryotic genome annotation pipeline.

    PubMed

    Tatusova, Tatiana; DiCuccio, Michael; Badretdin, Azat; Chetvernin, Vyacheslav; Nawrocki, Eric P; Zaslavsky, Leonid; Lomsadze, Alexandre; Pruitt, Kim D; Borodovsky, Mark; Ostell, James

    2016-08-19

    Recent technological advances have opened unprecedented opportunities for large-scale sequencing and analysis of populations of pathogenic species in disease outbreaks, as well as for large-scale diversity studies aimed at expanding our knowledge across the whole domain of prokaryotes. To meet the challenge of timely interpretation of structure, function and meaning of this vast genetic information, a comprehensive approach to automatic genome annotation is critically needed. In collaboration with Georgia Tech, NCBI has developed a new approach to genome annotation that combines alignment based methods with methods of predicting protein-coding and RNA genes and other functional elements directly from sequence. A new gene finding tool, GeneMarkS+, uses the combined evidence of protein and RNA placement by homology as an initial map of annotation to generate and modify ab initio gene predictions across the whole genome. Thus, the new NCBI's Prokaryotic Genome Annotation Pipeline (PGAP) relies more on sequence similarity when confident comparative data are available, while it relies more on statistical predictions in the absence of external evidence. The pipeline provides a framework for generation and analysis of annotation on the full breadth of prokaryotic taxonomy. For additional information on PGAP see https://www.ncbi.nlm.nih.gov/genome/annotation_prok/ and the NCBI Handbook, https://www.ncbi.nlm.nih.gov/books/NBK174280/. PMID:27342282

  17. Interpretation Errors related to the GO Annotation File Format

    PubMed Central

    Moreira, Dilvan A.; Shah, Nigam H.; Musen, Mark A.

    2007-01-01

    The Gene Ontology (GO) is the most widely used ontology for creating biomedical annotations. GO annotations are statements associating a biological entity with a GO term. These statements comprise a large dataset of biological knowledge that is used widely in biomedical research. GO Annotations are available as “gene association files” from the GO website in a tab-delimited file format (GO Annotation File Format) composed of rows of 15 tab-delimited fields. This simple format lacks the knowledge representation (KR) capabilities to represent unambiguously semantic relationships between each field. This paper demonstrates that this KR shortcoming leads users to interpret the files in ways that can be erroneous. We propose a complementary format to represent GO annotation files as knowledge bases using the W3C recommended Web Ontology Language (OWL). PMID:18693894
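
    To make the 15-field, tab-delimited layout described above concrete, here is a minimal parsing sketch. The field labels are descriptive names chosen for this example and follow the column order of the 15-column GO Annotation File format (later versions of the format add further columns); the sample record is made up for illustration and is not a real annotation.

```python
# Descriptive labels (chosen for this sketch) for the 15 columns, in order.
GAF_FIELDS = (
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_or_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type",
    "taxon", "date", "assigned_by",
)


def parse_gaf_line(line):
    """Parse one annotation row into a dict; return None for '!' comment lines."""
    if line.startswith("!"):
        return None
    # Strip only the line ending so that empty trailing fields are preserved.
    values = line.rstrip("\r\n").split("\t")
    if len(values) != len(GAF_FIELDS):
        raise ValueError(
            f"expected {len(GAF_FIELDS)} tab-delimited fields, got {len(values)}"
        )
    return dict(zip(GAF_FIELDS, values))


def is_negated(record):
    """True when the qualifier column carries NOT, which inverts the statement."""
    return "NOT" in record["qualifier"].split("|")


# Hypothetical record, for illustration only.
SAMPLE = "\t".join([
    "ExampleDB", "X0001", "geneX", "NOT", "GO:0000001", "PMID:0000000",
    "IEA", "", "P", "example protein", "", "protein",
    "taxon:9606", "20070101", "ExampleDB",
]) + "\n"

if __name__ == "__main__":
    record = parse_gaf_line(SAMPLE)
    print(record["db_object_symbol"], record["go_id"], is_negated(record))
```

    The sketch also hints at why a flat row can be misread: the meaning of the gene-to-term association in one column depends on the qualifier in another, and nothing in the format itself states that relationship.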

  18. The environment ontology: contextualising biological and biomedical entities

    PubMed Central

    2013-01-01

    As biological and biomedical research increasingly reference the environmental context of the biological entities under study, the need for formalisation and standardisation of environment descriptors is growing. The Environment Ontology (ENVO; http://www.environmentontology.org) is a community-led, open project which seeks to provide an ontology for specifying a wide range of environments relevant to multiple life science disciplines and, through an open participation model, to accommodate the terminological requirements of all those needing to annotate data using ontology classes. This paper summarises ENVO’s motivation, content, structure, adoption, and governance approach. The ontology is available from http://purl.obolibrary.org/obo/envo.owl - an OBO format version is also available by switching the file suffix to “obo”. PMID:24330602

  19. A Framework for Comparing Phenotype Annotations of Orthologous Genes

    PubMed Central

    Bodenreider, Olivier; Burgun, Anita

    2015-01-01

    Objectives Animal models are a key resource for the investigation of human diseases. In contrast to functional annotation, phenotype annotation is less standard, and comparing phenotypes across species remains challenging. The objective of this paper is to propose a framework for comparing phenotype annotations of orthologous genes based on the Medical Subject Headings (MeSH) indexing of biomedical articles in which these genes are discussed. Methods 17,769 pairs of orthologous genes (mouse and human) are downloaded from the Mouse Genome Informatics (MGI) system and linked to biomedical articles through Entrez Gene. MeSH index terms corresponding to diseases are extracted from Medline. Results 11,111 pairs of genes exhibited at least one phenotype annotation for each gene in the pair. Among these, 81% have at least one phenotype annotation in common, 80% have at least one annotation specific to the human gene and 84% have at least one annotation specific to the mouse gene. Four disease categories represent 54% of all phenotype annotations. Conclusions This framework supports the curation of phenotype annotation and the generation of research hypotheses based on comparative studies. PMID:20841896
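
    The three percentages above are not mutually exclusive: a single gene pair can have shared, human-specific and mouse-specific annotations at once. A minimal sketch of that partition, using set operations on invented disease terms (the function name and the example data are hypothetical, not taken from the paper):

```python
from typing import Dict, Set


def compare_pair(human_terms: Set[str], mouse_terms: Set[str]) -> Dict[str, Set[str]]:
    """Partition the disease terms attached to one orthologous gene pair."""
    return {
        "shared": human_terms & mouse_terms,
        "human_only": human_terms - mouse_terms,
        "mouse_only": mouse_terms - human_terms,
    }


# Hypothetical disease terms for one gene pair, invented for illustration.
human = {"Diabetes Mellitus", "Obesity"}
mouse = {"Obesity", "Hypertension"}
result = compare_pair(human, mouse)
```

    This one pair would count toward all three categories, which is why the reported percentages sum to more than 100%.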

  20. The National Center for Biomedical Ontology: Advancing Biomedicine through Structured Organization of Scientific Knowledge

    SciTech Connect

    Rubin, Daniel L.; Lewis, Suzanna E.; Mungall, Chris J.; Misra, Sima; Westerfield, Monte; Ashburner, Michael; Sim, Ida; Chute, Christopher G.; Solbrig, Harold; Storey, Margaret-Anne; Smith, Barry; Day-Richter, John; Noy, Natalya F.; Musen, Mark A.

    2006-01-23

    The National Center for Biomedical Ontology (http://bioontology.org) is a consortium that comprises leading informaticians, biologists, clinicians, and ontologists funded by the NIH Roadmap to develop innovative technology and methods that allow scientists to record, manage, and disseminate biomedical information and knowledge in machine-processable form. The goals of the Center are: (1) to help unify the divergent and isolated efforts in ontology development by promoting high quality open-source, standards-based tools to create, manage, and use ontologies, (2) to create new software tools so that scientists can use ontologies to annotate and analyze biomedical data, (3) to provide a national resource for the ongoing evaluation, integration, and evolution of biomedical ontologies and associated tools and theories in the context of driving biomedical projects (DBPs), and (4) to disseminate the tools and resources of the Center and to identify, evaluate, and communicate best practices of ontology development to the biomedical community. The Center is working toward these objectives by providing tools to develop ontologies and to annotate experimental data, and by developing resources to integrate and relate existing ontologies as well as by creating repositories of biomedical data that are annotated using those ontologies. The Center is providing training workshops in ontology design, development, and usage, and is also pursuing research in ontology evaluation, quality, and use of ontologies to promote scientific discovery. Through the research activities within the Center, collaborations with the DBPs, and interactions with the biomedical community, our goal is to help scientists to work more effectively in the e-science paradigm, enhancing experiment design, experiment execution, data analysis, information synthesis, hypothesis generation and testing, and understand human disease.

  1. Food environment, walkability, and public open spaces are associated with incident development of cardio-metabolic risk factors in a biomedical cohort.

    PubMed

    Paquet, Catherine; Coffee, Neil T; Haren, Matthew T; Howard, Natasha J; Adams, Robert J; Taylor, Anne W; Daniel, Mark

    2014-07-01

    We investigated whether residential environment characteristics related to food (unhealthful/healthful food sources ratio), walkability and public open spaces (POS; number, median size, greenness and type) were associated with incidence of four cardio-metabolic risk factors (pre-diabetes/diabetes, hypertension, dyslipidaemia, abdominal obesity) in a biomedical cohort (n=3205). Results revealed that the risk of developing pre-diabetes/diabetes was lower for participants in areas with larger POS and greater walkability. Incident abdominal obesity was positively associated with the unhealthful food environment index. No associations were found with hypertension or dyslipidaemia. Results provide new evidence for specific, prospective associations between the built environment and cardio-metabolic risk factors. PMID:24880234

  2. Enabling Ontology Based Semantic Queries in Biomedical Database Systems.

    PubMed

    Zheng, Shuai; Wang, Fusheng; Lu, James; Saltz, Joel

    2012-01-01

    While current biomedical ontology repositories offer primitive query capabilities, it is difficult or cumbersome to support ontology based semantic queries directly in semantically annotated biomedical databases. The problem may be largely attributed to the mismatch between the models of the ontologies and the databases, and the mismatch between the query interfaces of the two systems. To fully realize semantic query capabilities based on ontologies, we develop a system DBOntoLink to provide unified semantic query interfaces by extending database query languages. With DBOntoLink, semantic queries can be directly and naturally specified as extended functions of the database query languages without any programming needed. DBOntoLink is adaptable to different ontologies through customizations and supports major biomedical ontologies hosted at the NCBO BioPortal. We demonstrate the use of DBOntoLink in a real world biomedical database with semantically annotated medical image annotations. PMID:23404054

  3. Biomedical research

    NASA Technical Reports Server (NTRS)

    1981-01-01

    Biomedical problems encountered by man in space which have been identified as a result of previous experience in simulated or actual spaceflight include cardiovascular deconditioning, motion sickness, bone loss, muscle atrophy, red cell alterations, fluid and electrolyte loss, radiation effects, radiation protection, behavior, and performance. The investigations and the findings in each of these areas were reviewed. A description of how biomedical research is organized within NASA, how it is funded, and how it is being reoriented to meet the needs of future manned space missions is also provided.

  4. Gene ontology annotation by density and gravitation models.

    PubMed

    Hou, Wen-Juan; Lin, Kevin Hsin-Yih; Chen, Hsin-Hsi

    2006-01-01

    Gene Ontology (GO) is developed to provide standard vocabularies of gene products in different databases. The process of annotating GO terms to genes requires curators to read through lengthy articles. Methods for speeding up or automating the annotation process are thus of great importance. We propose a GO annotation approach using full-text biomedical documents for directing more relevant papers to curators. This system explores word density and gravitation relationships between genes and GO terms. Different density and gravitation models are built and several evaluation criteria are employed to assess the effects of the proposed methods. PMID:17503384

  5. The Otter Annotation System

    PubMed Central

    Searle, Stephen M.J.; Gilbert, James; Iyer, Vivek; Clamp, Michele

    2004-01-01

    With the completion of the human genome sequence and genome sequence available for other vertebrate genomes, the task of manual annotation at the large genome scale has become a priority. Possibly even more important, is the requirement to curate and improve this annotation in the light of future data. For this to be possible, there is a need for tools to access and manage the annotation. Ensembl provides an excellent means for storing gene structures, genome features, and sequence, but it does not support the extra textual data necessary for manual annotation. We have extended Ensembl to create the Otter manual annotation system. This comprises a relational database schema for storing the manual annotation data, an application-programming interface (API) to access it, an extensible markup language (XML) format to allow transfer of the data, and a server to allow multiuser/multimachine access to the data. We have also written a data-adaptor plugin for the Apollo Browser/Editor to enable it to utilize an Otter server. The otter database is currently used by the Vertebrate Genome Annotation (VEGA) site (http://vega.sanger.ac.uk), which provides access to manually curated human chromosomes. Support is also being developed for using the AceDB annotation editor, FMap, via a perl wrapper called Lace. The Human and Vertebrate Annotation (HAVANA) group annotators at the Sanger center are using this to annotate human chromosomes 1 and 20. PMID:15123593

  6. Publishing priorities of biomedical research funders

    PubMed Central

    Collins, Ellen

    2013-01-01

    Objectives To understand the publishing priorities, especially in relation to open access, of 10 UK biomedical research funders. Design Semistructured interviews. Setting 10 UK biomedical research funders. Participants 12 employees with responsibility for research management at 10 UK biomedical research funders; a purposive sample to represent a range of backgrounds and organisation types. Conclusions Publicly funded and large biomedical research funders are committed to open access publishing and are pleased with recent developments which have stimulated growth in this area. Smaller charitable funders are supportive of the aims of open access, but are concerned about the practical implications for their budgets and their funded researchers. Across the board, biomedical research funders are turning their attention to other priorities for sharing research outputs, including data, protocols and negative results. Further work is required to understand how smaller funders, including charitable funders, can support open access. PMID:24154520

  7. Using Amazon’s Mechanical Turk for Annotating Medical Named Entities

    PubMed Central

    Yetisgen-Yildiz, Meliha; Solti, Imre; Xia, Fei

    2010-01-01

    Amazon’s Mechanical Turk (AMT) service is becoming increasingly popular in Natural Language Processing (NLP) research. In this poster, we report our findings in using AMT to annotate biomedical text extracted from clinical trial descriptions with three entity types: medical condition, medication, and laboratory test. We also describe our observations on AMT workers’ annotations. PMID:21785667

  8. Biomedical Conferences

    NASA Technical Reports Server (NTRS)

    1976-01-01

    As a result of Biomedical Conferences, Vivo Metric Systems Co. has produced cardiac electrodes based on NASA technology. Frequently in science, one highly specialized discipline is unaware of relevant advances made in other areas. In an attempt to familiarize researchers in a variety of disciplines with medical problems and needs, NASA has sponsored conferences that bring together university scientists, practicing physicians and manufacturers of medical instruments.

  9. Constructing a semantic predication gold standard from the biomedical literature

    PubMed Central

    2011-01-01

    Background Semantic relations increasingly underpin biomedical text mining and knowledge discovery applications. The success of such practical applications crucially depends on the quality of extracted relations, which can be assessed against a gold standard reference. Most such references in biomedical text mining focus on narrow subdomains and adopt different semantic representations, rendering them difficult to use for benchmarking independently developed relation extraction systems. In this article, we present a multi-phase gold standard annotation study, in which we annotated 500 sentences randomly selected from MEDLINE abstracts on a wide range of biomedical topics with 1371 semantic predications. The UMLS Metathesaurus served as the main source for conceptual information and the UMLS Semantic Network for relational information. We measured interannotator agreement and analyzed the annotations closely to identify some of the challenges in annotating biomedical text with relations based on an ontology or a terminology. Results We obtain fair to moderate interannotator agreement in the practice phase (0.378-0.475). With improved guidelines and additional semantic equivalence criteria, the agreement increases by 12% (0.415 to 0.536) in the main annotation phase. In addition, we find that agreement increases to 0.688 when the agreement calculation is limited to those predications that are based only on the explicitly provided UMLS concepts and relations. Conclusions While interannotator agreement in the practice phase confirms that conceptual annotation is a challenging task, the increasing agreement in the main annotation phase points out that an acceptable level of agreement can be achieved in multiple iterations, by setting stricter guidelines and establishing semantic equivalence criteria. Mapping text to ontological concepts emerges as the main challenge in conceptual annotation. Annotating predications involving biomolecular entities and processes is

  10. Making web annotations persistent over time

    SciTech Connect

    Sanderson, Robert; Van De Sompel, Herbert

    2010-01-01

    As Digital Libraries (DL) become more aligned with the web architecture, their functional components need to be fundamentally rethought in terms of URIs and HTTP. Annotation, a core scholarly activity enabled by many DL solutions, exhibits a clearly unacceptable characteristic when existing models are applied to the web: due to the representations of web resources changing over time, an annotation made about a web resource today may no longer be relevant to the representation that is served from that same resource tomorrow. We assume the existence of archived versions of resources, and combine the temporal features of the emerging Open Annotation data model with the capability offered by the Memento framework that allows seamless navigation from the URI of a resource to archived versions of that resource, and arrive at a solution that provides guarantees regarding the persistence of web annotations over time. More specifically, we provide theoretical solutions and proof-of-concept experimental evaluations for two problems: reconstructing an existing annotation so that the correct archived version is displayed for all resources involved in the annotation, and retrieving all annotations that involve a given archived version of a web resource.
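
    The navigation from a resource URI to an archived version can be illustrated with the datetime negotiation used by the Memento protocol. This is a sketch under assumptions: the `Accept-Datetime` header name comes from the Memento specification rather than from the abstract, the helper name is hypothetical, and no request is actually sent.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict


def memento_request_headers(when: datetime) -> Dict[str, str]:
    """Build the header asking for the version of a resource as of `when`."""
    # Convert to UTC so the value can be written as an HTTP date ending in GMT.
    # A naive datetime would be interpreted as local time by astimezone().
    when_utc = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(when_utc, usegmt=True)}


# The annotation time here is an invented example value.
annotated_at = datetime(2007, 5, 31, 20, 35, tzinfo=timezone.utc)
headers = memento_request_headers(annotated_at)
```

    An annotation client could attach such a header when dereferencing each resource involved in an annotation, using the time at which the annotation was made.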

  11. Biomedical technology in Franconia.

    PubMed

    Efferth, T

    2000-01-01

    Medical instrumentation and biotechnology business is developing rapidly in Franconia. The universities of Bayreuth, Erlangen-Nürnberg, and Würzburg hold upper ranks in biomedical extramural funding research. They have a high competence in biomedical research, medical instrumentation, and biotechnology. The association "BioMedTec Franken e.V" has been founded at the beginning of 1999 both to foster the information exchange between universities, industry and politics and to facilitate the establishment of biomedical companies by means of science parks. In the IGZ (Innovation and Foundation Center Nürnberg-Fürth-Erlangen) 4,500 square meters of space are currently shared by 19 novel companies. Since 1985, 60 companies in the IGZ had a total turnover of about 74 Mio Euro. The TGZ (Technologie- und Gründerzentrum) in Würzburg provides space for 11 companies. For the specific needs of biomedical technology companies further science parks will be set up in the near future. A science park for medical instrumentation will be founded in Erlangen (IZMP, Innovations- und Gründerzentrum für Medizintechnik und Pharma in der Region Nürnberg, Fürth, Erlangen). Furthermore, a Biomedical Technology Center and a Research Center for Biocompatible Materials are to be founded in Würzburg and Bayreuth, respectively. Several communication platforms (Bayern Innovativ, FORWISS, FTT, KIM, N-TEC-VISIT, TBU, WETTI etc.) allow the transfer of local academic research activities to industrial utilization and open new co-operation possibilities. International pharmaceutical companies (Novartis, Nürnberg; Pharmacia Upjohn, Erlangen) are located in Franconia. Central Franconia represents a national focus for medical instrumentation. The Erlangen settlement of the Medical Engineering Section of Siemens employs 4,500 people including approximately 1,000 employees in the Siemens research center. PMID:10683721

  12. Semi-automatic conversion of BioProp semantic annotation to PASBio annotation

    PubMed Central

    Tsai, Richard Tzong-Han; Dai, Hong-Jie; Huang, Chi-Hsin; Hsu, Wen-Lian

    2008-01-01

    Background Semantic role labeling (SRL) is an important text analysis technique. In SRL, sentences are represented by one or more predicate-argument structures (PAS). Each PAS is composed of a predicate (verb) and several arguments (noun phrases, adverbial phrases, etc.) with different semantic roles, including main arguments (agent or patient) as well as adjunct arguments (time, manner, or location). PropBank is the most widely used PAS corpus and annotation format in the newswire domain. In the biomedical field, however, more detailed and restrictive PAS annotation formats such as PASBio are popular. Unfortunately, due to the lack of an annotated PASBio corpus, no publicly available machine-learning (ML) based SRL systems based on PASBio have been developed. In previous work, we constructed a biomedical corpus based on the PropBank standard called BioProp, on which we developed an ML-based SRL system, BIOSMILE. In this paper, we aim to build a system to convert BIOSMILE's BioProp annotation output to PASBio annotation. Our system consists of BIOSMILE in combination with a BioProp-PASBio rule-based converter, and an additional semi-automatic rule generator. Results Our first experiment evaluated our rule-based converter's performance independently from BIOSMILE performance. The converter achieved an F-score of 85.29%. The second experiment evaluated combined system (BIOSMILE + rule-based converter). The system achieved an F-score of 69.08% for PASBio's 29 verbs. Conclusion Our approach allows PAS conversion between BioProp and PASBio annotation using BIOSMILE alongside our newly developed semi-automatic rule generator and rule-based converter. Our system can match the performance of other state-of-the-art domain-specific ML-based SRL systems and can be easily customized for PASBio application development. PMID:19091017
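
    The F-scores reported above combine precision and recall into one number. The abstract does not give the underlying precision and recall values or say which weighting was used, so the sketch below assumes the common balanced F1 (beta = 1) and uses invented inputs.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta is 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid dividing by zero when nothing was correct
    beta_sq = beta * beta
    return (1.0 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical precision and recall, not taken from the paper.
example = f_score(precision=0.9, recall=0.8)
```

    Because the harmonic mean is dominated by the smaller of the two inputs, a converter with high precision but low recall would still receive a low F-score.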

  13. SEED Software Annotations.

    ERIC Educational Resources Information Center

    Bethke, Dee; And Others

    This document provides a composite index of the first five sets of software annotations produced by Project SEED. The software has been indexed by title, subject area, and grade level, and it covers sets of annotations distributed in September 1986, April 1987, September 1987, November 1987, and February 1988. The date column in the index…

  14. Annotation extension through protein family annotation coherence metrics

    PubMed Central

    Bastos, Hugo P.; Clarke, Luka A.; Couto, Francisco M.

    2013-01-01

    Protein functional annotation consists in associating proteins with textual descriptors elucidating their biological roles. The bulk of annotation is done via automated procedures that ultimately rely on annotation transfer. Despite a large number of existing protein annotation procedures the ever growing protein space is never completely annotated. One of the facets of annotation incompleteness derives from annotation uncertainty. Often when protein function cannot be predicted with enough specificity it is instead conservatively annotated with more generic terms. In a scenario of protein families or functionally related (or even dissimilar) sets this leads to a more difficult task of using annotations to compare the extent of functional relatedness among all family or set members. However, we postulate that identifying sub-sets of functionally coherent proteins annotated at a very specific level, can help the annotation extension of other incompletely annotated proteins within the same family or functionally related set. As an example we analyse the status of annotation of a set of CAZy families belonging to the Polysaccharide Lyase class. We show that through the use of visualization methods and semantic similarity based metrics it is possible to identify families and respective annotation terms within them that are suitable for possible annotation extension. Based on our analysis we then propose a semi-automatic methodology leading to the extension of single annotation terms within these partially annotated protein sets or families. PMID:24130572

  15. PREFACE: 17th International School on Condensed Matter Physics (ISCMP): Open Problems in Condensed Matter Physics, Biomedical Physics and their Applications

    NASA Astrophysics Data System (ADS)

    Dimova-Malinovska, Doriana; Nesheva, Diana; Pecheva, Emilia; Petrov, Alexander G.; Primatarowa, Marina T.

    2012-12-01

    We are pleased to introduce the Proceedings of the 17th International School on Condensed Matter Physics: Open Problems in Condensed Matter Physics, Biomedical Physics and their Applications, organized by the Institute of Solid State Physics of the Bulgarian Academy of Sciences. The Chairman of the School was Professor Alexander G Petrov. Like prior events, the School took place in the beautiful Black Sea resort of Saints Constantine and Helena near Varna, going back to the refurbished facilities of the Panorama hotel. Participants from 17 different countries delivered 31 invited lectures and 78 posters, contributing through three sessions of poster presentations. Papers submitted to the Proceedings were refereed according to the high standards of the Journal of Physics: Conference Series and the accepted papers illustrate the diversity and the high level of the contributions. Not least significant factor for the success of the 17 ISCMP was the social program, both the organized events (Welcome and Farewell Parties) and the variety of pleasant local restaurants and beaches. Visits to the Archaeological Museum (rich in valuable gold treasures of the ancient Thracian culture) and to the famous rock monastery Aladja were organized for the participants from the Varna Municipality. These Proceedings are published for the second time by the Journal of Physics: Conference Series. We are grateful to the Journal's staff for supporting this idea. The Committee decided that the next event will take place again in Saints Constantine and Helena, 1-5 September 2014. It will be entitled: Challenges of the Nanoscale Science: Theory, Materials and Applications. Doriana Dimova-Malinovska, Diana Nesheva, Emilia Pecheva, Alexander G Petrov and Marina T Primatarowa Editors

  16. National Space Biomedical Research Institute

    NASA Technical Reports Server (NTRS)

    1998-01-01

    The National Space Biomedical Research Institute (NSBRI) sponsors and performs fundamental and applied space biomedical research with the mission of leading a world-class, national effort in integrated, critical path space biomedical research that supports NASA's Human Exploration and Development of Space (HEDS) Strategic Plan. It focuses on the enabling of long-term human presence in, development of, and exploration of space. This will be accomplished by: designing, implementing, and validating effective countermeasures to address the biological and environmental impediments to long-term human space flight; defining the molecular, cellular, organ-level, integrated responses and mechanistic relationships that ultimately determine these impediments, where such activity fosters the development of novel countermeasures; establishing biomedical support technologies to maximize human performance in space, reduce biomedical hazards to an acceptable level, and deliver quality medical care; transferring and disseminating the biomedical advances in knowledge and technology acquired through living and working in space to the benefit of mankind in space and on Earth, including the treatment of patients suffering from gravity- and radiation-related conditions on Earth; and ensuring open involvement of the scientific community, industry, and the public at large in the Institute's activities and fostering a robust collaboration with NASA, particularly through Johnson Space Center.

  17. An analysis on the entity annotations in biological corpora

    PubMed Central

    Neves, Mariana

    2014-01-01

    Collection of documents annotated with semantic entities and relationships are crucial resources to support development and evaluation of text mining solutions for the biomedical domain. Here I present an overview of 36 corpora and show an analysis on the semantic annotations they contain. Annotations for entity types were classified into six semantic groups and an overview on the semantic entities which can be found in each corpus is shown. Results show that while some semantic entities, such as genes, proteins and chemicals are consistently annotated in many collections, corpora available for diseases, variations and mutations are still few, in spite of their importance in the biological domain. PMID:25254099

  18. SEF Annotated Bibliography on Informal Education.

    ERIC Educational Resources Information Center

    Metropolitan Toronto School Board (Ontario). Study of Educational Facilities.

    This bibliography on informal education grew out of a concern to understand the kinds of programs possible in open plan schools. The annotations are reading notes generally more descriptive than evaluative. Citations are grouped under nine headings: (1) general, (2) description of British informal education by British writers, (3) description of…

  19. Annotation and visualization of endogenous retroviral sequences using the Distributed Annotation System (DAS) and eBioX

    PubMed Central

    Martínez Barrio, Álvaro; Lagercrantz, Erik; Sperber, Göran O; Blomberg, Jonas; Bongcam-Rudloff, Erik

    2009-01-01

    Background The Distributed Annotation System (DAS) is a widely used network protocol for sharing biological information. The distributed aspects of the protocol enable the use of various reference and annotation servers for connecting biological sequence data to pertinent annotations in order to depict an integrated view of the data for the final user. Results An annotation server has been devised to provide information about the endogenous retroviruses detected and annotated by a specialized in silico tool called RetroTector. We describe the procedure to implement the DAS 1.5 protocol commands necessary for constructing the DAS annotation server. We use our server to exemplify those steps. Data distribution is kept separated from visualization which is carried out by eBioX, an easy to use open source program incorporating multiple bioinformatics utilities. Some well characterized endogenous retroviruses are shown in two different DAS clients. A rapid analysis of areas free from retroviral insertions could be facilitated by our annotations. Conclusion The DAS protocol has shown to be advantageous in the distribution of endogenous retrovirus data. The distributed nature of the protocol is also found to aid in combining annotation and visualization along a genome in order to enhance the understanding of ERV contribution to its evolution. Reference and annotation servers are conjointly used by eBioX to provide visualization of ERV annotations as well as other data sources. Our DAS data source can be found in the central public DAS service repository, , or at . PMID:19534743
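
    A client request to a DAS annotation server of the kind described above can be sketched as a URL builder. The `/das/<source>/features?segment=...` layout is an assumption based on the DAS 1.5 specification, not on the abstract; the server address and data source name are placeholders, and the code only constructs the URL without contacting any server.

```python
from urllib.parse import urlencode


def das_features_url(server: str, data_source: str,
                     reference: str, start: int, stop: int) -> str:
    """Build a features request for one segment of a reference sequence."""
    segment = f"{reference}:{start},{stop}"
    # Keep ':' and ',' unescaped so the segment stays readable in the query.
    query = urlencode({"segment": segment}, safe=":,")
    return f"{server.rstrip('/')}/das/{data_source}/features?{query}"


# Placeholder server and source names, invented for illustration.
url = das_features_url("http://das.example.org", "example_source", "1", 1000, 2000)
```

    Keeping the reference and annotation servers behind the same URL scheme is what lets a client such as eBioX combine several sources for one genomic region.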

  20. An annotated energy bibliography

    NASA Technical Reports Server (NTRS)

    Blow, S. J.

    1979-01-01

    Comprehensive annotated compilation of books, journals, periodicals, and reports on energy and energy related topics, contains approximately 10,0000 technical and nontechnical references from bibliographic and other sources dated January 1975 through May 1977.

  1. Opening up Academic Biomedical Research

    NASA Video Gallery

    Eva Guinan, MD, Associate Professor of Pediatrics, Associate Director, Center for Clinical and Translational Research at Harvard Medical School, was featured during the September 7, 2011 Innovatio...

  2. Biomedical ultrasonoscope

    NASA Technical Reports Server (NTRS)

    Lee, R. D. (Inventor)

    1979-01-01

    The combination of a "C" mode scan electronics in a portable, battery powered biomedical ultrasonoscope having "A" and "M" mode scan electronics, the latter including a clock generator for generating clock pulses, a cathode ray tube having X, Y and Z axis inputs, a sweep generator connected between the clock generator and the X axis input of the cathode ray tube for generating a cathode ray sweep signal synchronized by the clock pulses, and a receiver adapted to be connected to the Z axis input of the cathode ray tube. The "C" mode scan electronics comprises a plurality of transducer elements arranged in a row and adapted to be positioned on the skin of the patient's body for converting a pulsed electrical signal to a pulsed ultrasonic signal, radiating the ultrasonic signal into the patient's body, picking up the echoes reflected from interfaces in the patient's body and converting the echoes to electrical signals; a plurality of transmitters, each transmitter being coupled to a respective transducer for transmitting a pulsed electrical signal thereto and for transmitting the converted electrical echo signals directly to the receiver, a sequencer connected between the clock generator and the plurality of transmitters and responsive to the clock pulses for firing the transmitters in cyclic order; and a staircase voltage generator connected between the clock generator and the Y axis input of the cathode ray tube for generating a staircase voltage having steps synchronized by the clock pulses.

  3. Dizeez: An Online Game for Human Gene-Disease Annotation

    PubMed Central

    Loguercio, Salvatore; Good, Benjamin M.; Su, Andrew I.

    2013-01-01

    Structured gene annotations are a foundation upon which many bioinformatics and statistical analyses are built. However the structured annotations available in public databases are a sparse representation of biological knowledge as a whole. The rate of biomedical data generation is such that centralized biocuration efforts struggle to keep up. New models for gene annotation need to be explored that expand the pace at which we are able to structure biomedical knowledge. Recently, online games have emerged as an effective way to recruit, engage and organize large numbers of volunteers to help address difficult biological challenges. For example, games have been successfully developed for protein folding (Foldit), multiple sequence alignment (Phylo) and RNA structure design (EteRNA). Here we present Dizeez, a simple online game built with the purpose of structuring knowledge of gene-disease associations. Preliminary results from game play online and at scientific conferences suggest that Dizeez is producing valid gene-disease annotations not yet present in any public database. These early results provide a basic proof of principle that online games can be successfully applied to the challenge of gene annotation. Dizeez is available at http://genegames.org. PMID:23951102

  4. Dizeez: an online game for human gene-disease annotation.

    PubMed

    Loguercio, Salvatore; Good, Benjamin M; Su, Andrew I

    2013-01-01

    Structured gene annotations are a foundation upon which many bioinformatics and statistical analyses are built. However the structured annotations available in public databases are a sparse representation of biological knowledge as a whole. The rate of biomedical data generation is such that centralized biocuration efforts struggle to keep up. New models for gene annotation need to be explored that expand the pace at which we are able to structure biomedical knowledge. Recently, online games have emerged as an effective way to recruit, engage and organize large numbers of volunteers to help address difficult biological challenges. For example, games have been successfully developed for protein folding (Foldit), multiple sequence alignment (Phylo) and RNA structure design (EteRNA). Here we present Dizeez, a simple online game built with the purpose of structuring knowledge of gene-disease associations. Preliminary results from game play online and at scientific conferences suggest that Dizeez is producing valid gene-disease annotations not yet present in any public database. These early results provide a basic proof of principle that online games can be successfully applied to the challenge of gene annotation. Dizeez is available at http://genegames.org. PMID:23951102

  5. Semantic Annotation of Mutable Data

    PubMed Central

    Morris, Robert A.; Dou, Lei; Hanken, James; Kelly, Maureen; Lowery, David B.; Ludäscher, Bertram; Macklin, James A.; Morris, Paul J.

    2013-01-01

    Electronic annotation of scientific data is very similar to annotation of documents. Both types of annotation amplify the original object, add related knowledge to it, and dispute or support assertions in it. In each case, annotation is a framework for discourse about the original object, and, in each case, an annotation needs to clearly identify its scope and its own terminology. However, electronic annotation of data differs from annotation of documents: the content of the annotations, including expectations and supporting evidence, is more often shared among members of networks. Any consequent actions taken by the holders of the annotated data could be shared as well. But even those current annotation systems that admit data as their subject often make it difficult or impossible to annotate at fine-enough granularity to use the results in this way for data quality control. We address these kinds of issues by offering simple extensions to an existing annotation ontology and describe how the results support an interest-based distribution of annotations. We are using the result to design and deploy a platform that supports annotation services overlaid on networks of distributed data, with particular application to data quality control. Our initial instance supports a set of natural science collection metadata services. An important application is the support for data quality control and provision of missing data. A previous proof of concept demonstrated such use based on data annotations modeled with XML-Schema. PMID:24223697

  6. Detection of gene annotations and protein-protein interaction associated disorders through transitive relationships between integrated annotations

    PubMed Central

    2015-01-01

    the literature for the candidate associations detected between Cystic fibrosis disorder and the PPIs between the CFTR_HUMAN, DERL1_HUMAN, RNF5_HUMAN, AHSA1_HUMAN and GOPC_HUMAN proteins, and between the CHIP_HUMAN and HSP7C_HUMAN proteins. Conclusions Although identified gene annotations and PPI-genetic disorder candidate associations require biological validation, our approach intrinsically provides their in silico evidence based on available data. Public availability within the GPKB (http://www.bioinformatics.deib.polimi.it/GPKB/) of all identified and integrated annotations offers a valuable resource fostering new biomedical-molecular knowledge discoveries. PMID:26046679

  7. Algal functional annotation tool

    Energy Science and Technology Software Center (ESTSC)

    2012-07-12

    Abstract BACKGROUND: Progress in genome sequencing is proceeding at an exponential pace, and several new algal genomes are becoming available every year. One of the challenges facing the community is the association of protein sequences encoded in the genomes with biological function. While most genome assembly projects generate annotations for predicted protein sequences, they are usually limited and integrate functional terms from a limited number of databases. Another challenge is the use of annotations to interpret large lists of 'interesting' genes generated by genome-scale datasets. Previously, these gene lists had to be analyzed across several independent biological databases, often on a gene-by-gene basis. In contrast, several annotation databases, such as DAVID, integrate data from multiple functional databases and reveal underlying biological themes of large gene lists. While several such databases have been constructed for animals, none is currently available for the study of algae. Due to renewed interest in algae as potential sources of biofuels and the emergence of multiple algal genome sequences, a significant need has arisen for such a database to process the growing compendiums of algal genomic data. DESCRIPTION: The Algal Functional Annotation Tool is a web-based comprehensive analysis suite integrating annotation data from several pathway, ontology, and protein family databases. The current version provides annotation for the model alga Chlamydomonas reinhardtii, and in the future will include additional genomes. The site allows users to interpret large gene lists by identifying associated functional terms, and their enrichment. Additionally, expression data for several experimental conditions were compiled and analyzed to provide an expression-based enrichment search. A tool to search for functionally-related genes based on gene expression across these conditions is also provided. Other features include dynamic visualization of genes on

  8. Algal functional annotation tool

    SciTech Connect

    2012-07-12

    Abstract BACKGROUND: Progress in genome sequencing is proceeding at an exponential pace, and several new algal genomes are becoming available every year. One of the challenges facing the community is the association of protein sequences encoded in the genomes with biological function. While most genome assembly projects generate annotations for predicted protein sequences, they are usually limited and integrate functional terms from a limited number of databases. Another challenge is the use of annotations to interpret large lists of 'interesting' genes generated by genome-scale datasets. Previously, these gene lists had to be analyzed across several independent biological databases, often on a gene-by-gene basis. In contrast, several annotation databases, such as DAVID, integrate data from multiple functional databases and reveal underlying biological themes of large gene lists. While several such databases have been constructed for animals, none is currently available for the study of algae. Due to renewed interest in algae as potential sources of biofuels and the emergence of multiple algal genome sequences, a significant need has arisen for such a database to process the growing compendiums of algal genomic data. DESCRIPTION: The Algal Functional Annotation Tool is a web-based comprehensive analysis suite integrating annotation data from several pathway, ontology, and protein family databases. The current version provides annotation for the model alga Chlamydomonas reinhardtii, and in the future will include additional genomes. The site allows users to interpret large gene lists by identifying associated functional terms, and their enrichment. Additionally, expression data for several experimental conditions were compiled and analyzed to provide an expression-based enrichment search. A tool to search for functionally-related genes based on gene expression across these conditions is also provided. Other features include dynamic visualization of genes on KEGG

  9. Human Genome Annotation

    NASA Astrophysics Data System (ADS)

    Gerstein, Mark

    A central problem for 21st century science is annotating the human genome and making this annotation useful for the interpretation of personal genomes. My talk will focus on annotating the 99% of the genome that does not code for canonical genes, concentrating on intergenic features such as structural variants (SVs), pseudogenes (protein fossils), binding sites, and novel transcribed RNAs (ncRNAs). In particular, I will describe how we identify regulatory sites and variable blocks (SVs) based on processing next-generation sequencing experiments. I will further explain how we cluster together groups of sites to create larger annotations. Next, I will discuss a comprehensive pseudogene identification pipeline, which has enabled us to identify >10K pseudogenes in the genome and analyze their distribution with respect to age, protein family, and chromosomal location. Throughout, I will try to introduce some of the computational algorithms and approaches that are required for genome annotation. Much of this work has been carried out in the framework of the ENCODE, modENCODE, and 1000 genomes projects.

  10. Automated Update, Revision, and Quality Control of the Maize Genome Annotations Using MAKER-P Improves the B73 RefGen_v3 Gene Models and Identifies New Genes

    PubMed Central

    Law, MeiYee; Childs, Kevin L.; Campbell, Michael S.; Stein, Joshua C.; Olson, Andrew J.; Holt, Carson; Panchy, Nicholas; Lei, Jikai; Jiao, Dian; Andorf, Carson M.; Lawrence, Carolyn J.; Ware, Doreen; Shiu, Shin-Han; Sun, Yanni; Jiang, Ning; Yandell, Mark

    2015-01-01

    The large size and relative complexity of many plant genomes make creation, quality control, and dissemination of high-quality gene structure annotations challenging. In response, we have developed MAKER-P, a fast and easy-to-use genome annotation engine for plants. Here, we report the use of MAKER-P to update and revise the maize (Zea mays) B73 RefGen_v3 annotation build (5b+) in less than 3 h using the iPlant Cyberinfrastructure. MAKER-P identified and annotated 4,466 additional, well-supported protein-coding genes not present in the 5b+ annotation build, added additional untranslated regions to 1,393 5b+ gene models, identified 2,647 5b+ gene models that lack any supporting evidence (despite the use of large and diverse evidence data sets), identified 104,215 pseudogene fragments, and created an additional 2,522 noncoding gene annotations. We also describe a method for de novo training of MAKER-P for the annotation of newly sequenced grass genomes. Collectively, these results lead to the 6a maize genome annotation and demonstrate the utility of MAKER-P for rapid annotation, management, and quality control of grasses and other difficult-to-annotate plant genomes. PMID:25384563

  11. Algal functional annotation tool

    SciTech Connect

    Lopez, D.; Casero, D.; Cokus, S. J.; Merchant, S. S.; Pellegrini, M.

    2012-07-01

    The Algal Functional Annotation Tool is a web-based comprehensive analysis suite integrating annotation data from several pathway, ontology, and protein family databases. The current version provides annotation for the model alga Chlamydomonas reinhardtii, and in the future will include additional genomes. The site allows users to interpret large gene lists by identifying associated functional terms, and their enrichment. Additionally, expression data for several experimental conditions were compiled and analyzed to provide an expression-based enrichment search. A tool to search for functionally-related genes based on gene expression across these conditions is also provided. Other features include dynamic visualization of genes on KEGG pathway maps and batch gene identifier conversion.
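    Term enrichment for a gene list is commonly scored with a one-sided hypergeometric test. The abstract does not say which statistic the tool uses, so the following is a minimal sketch under that assumption; the function name and all counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """Return P(X >= hits) for a hypergeometric draw.

    population: genes in the genome (background)
    annotated:  background genes carrying the functional term
    sample:     genes in the user's list
    hits:       genes in the list that carry the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts, for illustration only.
p = hypergeom_enrichment_p(population=20, annotated=5, sample=4, hits=2)
print(p)
```

    A small p-value would suggest that the term appears in the list more often than expected by chance; in practice a multiple-testing correction is usually applied across all terms tested.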

  12. Semantic Reasoning with Image Annotations for Tumor Assessment

    PubMed Central

    Levy, Mia A.; O’Connor, Martin J.; Rubin, Daniel L.

    2009-01-01

    Identifying, tracking and reasoning about tumor lesions is a central task in cancer research and clinical practice that could potentially be automated. However, information about tumor lesions in imaging studies is not easily accessed by machines for automated reasoning. The Annotation and Image Markup (AIM) information model recently developed for the cancer Biomedical Informatics Grid provides a method for encoding the semantic information related to imaging findings, enabling their storage and transfer. However, it is currently not possible to apply automated reasoning methods to image information encoded in AIM. We have developed a methodology and a suite of tools for transforming AIM image annotations into OWL, and an ontology for reasoning with the resulting image annotations for tumor lesion assessment. Our methods enable automated inference of semantic information about cancer lesions in images. PMID:20351880

  13. Evaluation of two dependency parsers on biomedical corpus targeted at protein-protein interactions.

    PubMed

    Pyysalo, Sampo; Ginter, Filip; Pahikkala, Tapio; Boberg, Jorma; Järvinen, Jouni; Salakoski, Tapio

    2006-06-01

    We present an evaluation of Link Grammar and Connexor Machinese Syntax, two major broad-coverage dependency parsers, on a custom hand-annotated corpus consisting of sentences regarding protein-protein interactions. In the evaluation, we apply the notion of an interaction subgraph, which is the subgraph of a dependency graph expressing a protein-protein interaction. We measure the performance of the parsers for recovery of individual dependencies, fully correct parses, and interaction subgraphs. For Link Grammar, an open system that can be inspected in detail, we further perform a comprehensive failure analysis, report specific causes of error, and suggest potential modifications to the grammar. We find that both parsers perform worse on biomedical English than previously reported on general English. While Connexor Machinese Syntax significantly outperforms Link Grammar, the failure analysis suggests specific ways in which the latter could be modified for better performance in the domain. PMID:16099201
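    One simple way to picture an interaction subgraph is as the chain of dependencies that connects two protein mentions in a parse. The abstract does not define how the subgraph is extracted, so the sketch below assumes a shortest-path approximation over an undirected view of the dependency graph; the function name and the toy sentence are hypothetical.

```python
from collections import deque


def shortest_dependency_path(edges, source, target):
    """Breadth-first search over an undirected view of (head, dependent) edges."""
    graph = {}
    for head, dependent in edges:
        graph.setdefault(head, set()).add(dependent)
        graph.setdefault(dependent, set()).add(head)
    if source not in graph or target not in graph:
        return None
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the predecessor links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Toy parse of "ProteinA binds ProteinB strongly" (hypothetical example).
edges = [("binds", "ProteinA"), ("binds", "ProteinB"), ("binds", "strongly")]
path = shortest_dependency_path(edges, "ProteinA", "ProteinB")
print(path)
```

    A parser could then be scored on whether every dependency along such a path is recovered correctly, which is a stricter criterion than counting individual dependencies.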

  14. Annotation: The Savant Syndrome

    ERIC Educational Resources Information Center

    Heaton, Pamela; Wallace, Gregory L.

    2004-01-01

    Background: Whilst interest has focused on the origin and nature of the savant syndrome for over a century, it is only within the past two decades that empirical group studies have been carried out. Methods: The following annotation briefly reviews relevant research and also attempts to address outstanding issues in this research area.…

  15. Collaborative Movie Annotation

    NASA Astrophysics Data System (ADS)

    Zad, Damon Daylamani; Agius, Harry

    In this paper, we focus on metadata for self-created movies like those found on YouTube and Google Video, the duration of which is increasing in line with falling upload restrictions. While simple tags may have been sufficient for most purposes for traditionally very short video footage that contains a relatively small amount of semantic content, this is not the case for movies of longer duration, which embody more intricate semantics. Creating metadata is a time-consuming process that takes a great deal of individual effort; however, this effort can be greatly reduced by harnessing the power of Web 2.0 communities to create, update and maintain it. Consequently, we consider the annotation of movies within Web 2.0 environments, such that users create and share that metadata collaboratively, and propose an architecture for collaborative movie annotation. This architecture arises from the results of an empirical experiment where metadata creation tools, YouTube and an MPEG-7 modelling tool, were used by users to create movie metadata. The next section discusses related work in the areas of collaborative retrieval and tagging. Then, we describe the experiments that were undertaken on a sample of 50 users. Next, the results are presented, which provide some insight into how users interact with existing tools and systems for annotating movies. Based on these results, the paper then develops an architecture for collaborative movie annotation.

  16. Annotated Bibliography. First Edition.

    ERIC Educational Resources Information Center

    Haring, Norris G.

    An annotated bibliography which presents approximately 300 references from 1951 to 1973 on the education of severely/profoundly handicapped persons. Citations are grouped alphabetically by author's name within the following categories: characteristics and treatment, gross motor development, sensory and motor development, physical therapy for the…

  17. Ghostwriting: An Annotated Bibliography.

    ERIC Educational Resources Information Center

    Simmons, Donald B.

    Drawn from communication journals, historical and news magazines, business and industrial magazines, political science and world affairs journals, general interest periodicals, and literary and political review magazines, the approximately 90 entries in this annotated bibliography discuss ghostwriting as practiced through the ages and reveal the…

  18. Investigating heterogeneous protein annotations toward cross-corpora utilization

    PubMed Central

    2009-01-01

    Background The number of corpora, collections of structured texts, has been increasing, as a result of the growing interest in the application of natural language processing methods to biological texts. Many named entity recognition (NER) systems have been developed based on these corpora. However, in the biomedical community, there is yet no general consensus regarding named entity annotation; thus, the resources are largely incompatible, and it is difficult to compare the performance of systems developed on resources that were divergently annotated. On the other hand, from a practical application perspective, it is desirable to utilize as many existing annotated resources as possible, because annotation is costly. Thus, it becomes a task of interest to integrate the heterogeneous annotations in these resources. Results We explore the potential sources of incompatibility among gene and protein annotations that were made for three common corpora: GENIA, GENETAG and AIMed. To show the inconsistency in the corpora annotations, we first tackle the incompatibility problem caused by corpus integration, and we quantitatively measure the effect of this incompatibility on protein mention recognition. We find that the F-score performance declines tremendously when training with integrated data, instead of training with pure data; in some cases, the performance drops nearly 12%. This degradation may be caused by the newly added heterogeneous annotations, and cannot be fixed without an understanding of the heterogeneities that exist among the corpora. Motivated by the result of this preliminary experiment, we further qualitatively analyze a number of possible sources for these differences, and investigate the factors that would explain the inconsistencies, by performing a series of well-designed experiments. 
    Our analyses indicate that incompatibilities in the gene/protein annotations exist mainly in the following four areas: the boundary annotation conventions, the scope of

  19. Porting a lexicalized-grammar parser to the biomedical domain.

    PubMed

    Rimell, Laura; Clark, Stephen

    2009-10-01

    This paper introduces a state-of-the-art, linguistically motivated statistical parser to the biomedical text mining community, and proposes a method of adapting it to the biomedical domain requiring only limited resources for data annotation. The parser was originally developed using the Penn Treebank and is therefore tuned to newspaper text. Our approach takes advantage of a lexicalized grammar formalism, Combinatory Categorial Grammar (ccg), to train the parser at a lower level of representation than full syntactic derivations. The ccg parser uses three levels of representation: a first level consisting of part-of-speech (pos) tags; a second level consisting of more fine-grained ccg lexical categories; and a third, hierarchical level consisting of ccg derivations. We find that simply retraining the pos tagger on biomedical data leads to a large improvement in parsing performance, and that using annotated data at the intermediate lexical category level of representation improves parsing accuracy further. We describe the procedure involved in evaluating the parser, and obtain accuracies for biomedical data in the same range as those reported for newspaper text, and higher than those previously reported for the biomedical resource on which we evaluate. Our conclusion is that porting newspaper parsers to the biomedical domain, at least for parsers which use lexicalized grammars, may not be as difficult as first thought. PMID:19141332

  20. Figure content analysis for improved biomedical article retrieval

    NASA Astrophysics Data System (ADS)

    You, Daekeun; Apostolova, Emilia; Antani, Sameer; Demner-Fushman, Dina; Thoma, George R.

    2009-01-01

    Biomedical images are invaluable in medical education and in establishing clinical diagnoses. Clinical decision support (CDS) can be improved by combining biomedical text with automatically annotated images extracted from relevant biomedical publications. In a previous study we reported 76.6% accuracy using supervised machine learning on the feasibility of automatically classifying images by combining figure captions and image content for usefulness in finding clinical evidence. Image content extraction is traditionally applied to entire images or to pre-determined image regions. Figure images in articles vary greatly, which limits the benefit of whole-image extraction for CDS beyond gross categorization. However, text annotations and pointers on the images indicate regions of interest (ROIs) that are then referenced in the caption or in the discussion in the article text. We have previously reported 72.02% accuracy in text and symbol localization, but we failed to take advantage of the referenced image locality. In this work we combine article text analysis and figure image analysis to localize pointers (arrows, symbols) and extract the ROIs they point to, which can then be used to measure meaningful image content and associate it with the identified biomedical concepts for improved (text and image) content-based retrieval of biomedical articles. Biomedical concepts are identified using the National Library of Medicine's Unified Medical Language System (UMLS) Metathesaurus. Our methods report an average precision and recall of 92.3% and 75.3%, respectively, on identifying pointing symbols in images from a randomly selected image subset made available through the ImageCLEF 2008 campaign.
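    Precision and recall are often summarized as a single F1 score, their harmonic mean. The abstract reports only precision and recall, so the value computed below is derived arithmetic rather than a result stated by the authors; the function name is hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision 92.3% and recall 75.3%, as reported for pointer identification.
f1 = f1_score(0.923, 0.753)
print(round(f1, 3))  # prints 0.829
```

    Because the harmonic mean is pulled toward the lower of the two values, the combined score sits closer to the recall figure than a simple average would.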

  1. The annotation and the usage of scientific databases could be improved with public issue tracker software

    PubMed Central

    Dall'Olio, Giovanni Marco; Bertranpetit, Jaume; Laayouni, Hafid

    2010-01-01

    Since the publication of their longtime predecessor The Atlas of Protein Sequences and Structures in 1965 by Margaret Dayhoff, scientific databases have become a key factor in the organization of modern science. All the information and knowledge described in the novel scientific literature is translated into entries in many different scientific databases, making it possible to obtain very accurate information on a biological entity like genes or proteins without having to manually review the literature on it. However, even for the databases with the finest annotation procedures, errors or unclear parts sometimes appear in the publicly released version and influence the research of unaware scientists using them. The researcher who finds an error in a database is often left in an uncertain state, and often abandons the effort of reporting it because of a lack of a standard procedure to do so. In the present work, we propose that the simple adoption of a public error tracker application, as in many open software projects, could improve the quality of the annotations in many databases and encourage feedback from the scientific community on the data annotated publicly. In order to illustrate the situation, we describe a series of errors that we found and helped solve on the genes of a very well-known pathway in various biomedically relevant databases. We would like to show that, even if a majority of the most important scientific databases have procedures for reporting errors, these are usually not publicly visible, making the process of reporting errors time consuming and not useful. Also, the effort made by the user who reports the error often goes unacknowledged, putting them in a discouraging position. PMID:21186182

  2. AmiGO: online access to ontology and annotation data

    SciTech Connect

    Carbon, Seth; Ireland, Amelia; Mungall, Christopher J.; Shu, ShengQiang; Marshall, Brad; Lewis, Suzanna

    2009-01-15

    AmiGO is a web application that allows users to query, browse, and visualize ontologies and related gene product annotation (association) data. AmiGO can be used online at the Gene Ontology (GO) website to access the data provided by the GO Consortium; it can also be downloaded and installed to browse local ontologies and annotations. AmiGO is free open source software developed and maintained by the GO Consortium.

  3. Apollo: a sequence annotation editor

    PubMed Central

    Lewis, SE; Searle, SMJ; Harris, N; Gibson, M; Iyer, V; Richter, J; Wiel, C; Bayraktaroglu, L; Birney, E; Crosby, MA; Kaminker, JS; Matthews, BB; Prochnik, SE; Smith, CD; Tupy, JL; Rubin, GM; Misra, S; Mungall, CJ; Clamp, ME

    2002-01-01

    The well-established inaccuracy of purely computational methods for annotating genome sequences necessitates an interactive tool to allow biological experts to refine these approximations by viewing and independently evaluating the data supporting each annotation. Apollo was developed to meet this need, enabling curators to inspect genome annotations closely and edit them. FlyBase biologists successfully used Apollo to annotate the Drosophila melanogaster genome and it is increasingly being used as a starting point for the development of customized annotation editing tools for other genome projects. PMID:12537571

  4. BioInfer: a corpus for information extraction in the biomedical domain

    PubMed Central

    Pyysalo, Sampo; Ginter, Filip; Heimonen, Juho; Björne, Jari; Boberg, Jorma; Järvinen, Jouni; Salakoski, Tapio

    2007-01-01

    Background Lately, there has been a great interest in the application of information extraction methods to the biomedical domain, in particular, to the extraction of relationships of genes, proteins, and RNA from scientific publications. The development and evaluation of such methods requires annotated domain corpora. Results We present BioInfer (Bio Information Extraction Resource), a new public resource providing an annotated corpus of biomedical English. We describe an annotation scheme capturing named entities and their relationships along with a dependency analysis of sentence syntax. We further present ontologies defining the types of entities and relationships annotated in the corpus. Currently, the corpus contains 1100 sentences from abstracts of biomedical research articles annotated for relationships, named entities, as well as syntactic dependencies. Supporting software is provided with the corpus. The corpus is unique in the domain in combining these annotation types for a single set of sentences, and in the level of detail of the relationship annotation. Conclusion We introduce a corpus targeted at protein, gene, and RNA relationships which serves as a resource for the development of information extraction systems and their components such as parsers and domain analyzers. The corpus will be maintained and further developed with a current version being available at . PMID:17291334

  5. The Ontology for Biomedical Investigations

    PubMed Central

    Bandrowski, Anita; Brinkman, Ryan; Brochhausen, Mathias; Brush, Matthew H.; Chibucos, Marcus C.; Clancy, Kevin; Courtot, Mélanie; Derom, Dirk; Dumontier, Michel; Fan, Liju; Fostel, Jennifer; Fragoso, Gilberto; Gibson, Frank; Gonzalez-Beltran, Alejandra; Haendel, Melissa A.; He, Yongqun; Heiskanen, Mervi; Hernandez-Boussard, Tina; Jensen, Mark; Lin, Yu; Lister, Allyson L.; Lord, Phillip; Malone, James; Manduchi, Elisabetta; McGee, Monnie; Morrison, Norman; Overton, James A.; Parkinson, Helen; Peters, Bjoern; Rocca-Serra, Philippe; Ruttenberg, Alan; Sansone, Susanna-Assunta; Scheuermann, Richard H.; Schober, Daniel; Smith, Barry; Soldatova, Larisa N.; Stoeckert, Christian J.; Taylor, Chris F.; Torniai, Carlo; Turner, Jessica A.; Vita, Randi; Whetzel, Patricia L.; Zheng, Jie

    2016-01-01

    The Ontology for Biomedical Investigations (OBI) is an ontology that provides terms with precisely defined meanings to describe all aspects of how investigations in the biological and medical domains are conducted. OBI re-uses ontologies that provide a representation of biomedical knowledge from the Open Biological and Biomedical Ontologies (OBO) project and adds the ability to describe how this knowledge was derived. We here describe the state of OBI and several applications that are using it, such as adding semantic expressivity to existing databases, building data entry forms, and enabling interoperability between knowledge resources. OBI covers all phases of the investigation process, such as planning, execution and reporting. It represents information and material entities that participate in these processes, as well as roles and functions. Prior to OBI, it was not possible to use a single internally consistent resource that could be applied to multiple types of experiments for these applications. OBI has made this possible by creating terms for entities involved in biological and medical investigations and by importing parts of other biomedical ontologies such as GO, Chemical Entities of Biological Interest (ChEBI) and Phenotype Attribute and Trait Ontology (PATO) without altering their meaning. OBI is being used in a wide range of projects covering genomics, multi-omics, immunology, and catalogs of services. OBI has also spawned other ontologies (Information Artifact Ontology) and methods for importing parts of ontologies (Minimum information to reference an external ontology term (MIREOT)). The OBI project is an open cross-disciplinary collaborative effort, encompassing multiple research communities from around the globe. To date, OBI has created 2366 classes and 40 relations along with textual and formal definitions. 
    The OBI Consortium maintains a web resource (http://obi-ontology.org) providing details on the people, policies, and issues being addressed

  6. The Ontology for Biomedical Investigations.

    PubMed

    Bandrowski, Anita; Brinkman, Ryan; Brochhausen, Mathias; Brush, Matthew H; Bug, Bill; Chibucos, Marcus C; Clancy, Kevin; Courtot, Mélanie; Derom, Dirk; Dumontier, Michel; Fan, Liju; Fostel, Jennifer; Fragoso, Gilberto; Gibson, Frank; Gonzalez-Beltran, Alejandra; Haendel, Melissa A; He, Yongqun; Heiskanen, Mervi; Hernandez-Boussard, Tina; Jensen, Mark; Lin, Yu; Lister, Allyson L; Lord, Phillip; Malone, James; Manduchi, Elisabetta; McGee, Monnie; Morrison, Norman; Overton, James A; Parkinson, Helen; Peters, Bjoern; Rocca-Serra, Philippe; Ruttenberg, Alan; Sansone, Susanna-Assunta; Scheuermann, Richard H; Schober, Daniel; Smith, Barry; Soldatova, Larisa N; Stoeckert, Christian J; Taylor, Chris F; Torniai, Carlo; Turner, Jessica A; Vita, Randi; Whetzel, Patricia L; Zheng, Jie

    2016-01-01

    The Ontology for Biomedical Investigations (OBI) is an ontology that provides terms with precisely defined meanings to describe all aspects of how investigations in the biological and medical domains are conducted. OBI re-uses ontologies that provide a representation of biomedical knowledge from the Open Biological and Biomedical Ontologies (OBO) project and adds the ability to describe how this knowledge was derived. We here describe the state of OBI and several applications that are using it, such as adding semantic expressivity to existing databases, building data entry forms, and enabling interoperability between knowledge resources. OBI covers all phases of the investigation process, such as planning, execution and reporting. It represents information and material entities that participate in these processes, as well as roles and functions. Prior to OBI, it was not possible to use a single internally consistent resource that could be applied to multiple types of experiments for these applications. OBI has made this possible by creating terms for entities involved in biological and medical investigations and by importing parts of other biomedical ontologies such as GO, Chemical Entities of Biological Interest (ChEBI) and Phenotype Attribute and Trait Ontology (PATO) without altering their meaning. OBI is being used in a wide range of projects covering genomics, multi-omics, immunology, and catalogs of services. OBI has also spawned other ontologies (Information Artifact Ontology) and methods for importing parts of ontologies (Minimum information to reference an external ontology term (MIREOT)). The OBI project is an open cross-disciplinary collaborative effort, encompassing multiple research communities from around the globe. To date, OBI has created 2366 classes and 40 relations along with textual and formal definitions. 
The OBI Consortium maintains a web resource (http://obi-ontology.org) providing details on the people, policies, and issues being addressed

  7. MICROTASK CROWDSOURCING FOR DISEASE MENTION ANNOTATION IN PUBMED ABSTRACTS

    PubMed Central

    Good, Benjamin M; Nanis, Max; Wu, Chunlei; Su, Andrew I

    2014-01-01

    Identifying concepts and relationships in biomedical text enables knowledge to be applied in computational analyses. Many biological natural language processing (BioNLP) projects attempt to address this challenge, but the state of the art still leaves much room for improvement. Progress in BioNLP research depends on large, annotated corpora for evaluating information extraction systems and training machine learning models. Traditionally, such corpora are created by small numbers of expert annotators often working over extended periods of time. Recent studies have shown that workers on microtask crowdsourcing platforms such as Amazon’s Mechanical Turk (AMT) can, in aggregate, generate high-quality annotations of biomedical text. Here, we investigated the use of the AMT in capturing disease mentions in PubMed abstracts. We used the NCBI Disease corpus as a gold standard for refining and benchmarking our crowdsourcing protocol. After several iterations, we arrived at a protocol that reproduced the annotations of the 593 documents in the ‘training set’ of this gold standard with an overall F measure of 0.872 (precision 0.862, recall 0.883). The output can also be tuned to optimize for precision (max = 0.984 when recall = 0.269) or recall (max = 0.980 when precision = 0.436). Each document was completed by 15 workers, and their annotations were merged based on a simple voting method. In total 145 workers combined to complete all 593 documents in the span of 9 days at a cost of $.066 per abstract per worker. The quality of the annotations, as judged with the F measure, increases with the number of workers assigned to each task; however minimal performance gains were observed beyond 8 workers per task. These results add further evidence that microtask crowdsourcing can be a valuable tool for generating well-annotated corpora in BioNLP. Data produced for this analysis are available at http://figshare.com/articles/Disease_Mention_Annotation
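    The vote-based merging and the F measure reported in this abstract can be illustrated with a minimal sketch. The abstract only says that annotations were merged with "a simple voting method"; the threshold rule, the function names, and the toy spans below are assumptions made for illustration, not the authors' protocol. The last line checks that the reported precision (0.862) and recall (0.883) agree with the reported F measure (0.872) under the standard balanced formula F = 2PR / (P + R).

```python
from collections import Counter


def merge_by_vote(worker_annotations, min_votes):
    """Keep each annotated span that at least `min_votes` workers marked.

    Assumed merging rule (hypothetical): a simple vote threshold per span.
    """
    votes = Counter(span for spans in worker_annotations for span in set(spans))
    return {span for span, n in votes.items() if n >= min_votes}


def f_measure(precision, recall):
    """Balanced F measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: (start, end) character offsets marked by three workers.
workers = [
    {(0, 12), (40, 52)},
    {(0, 12)},
    {(0, 12), (40, 52), (70, 80)},
]

merged = merge_by_vote(workers, min_votes=2)
print(sorted(merged))                      # spans with at least two votes
print(round(f_measure(0.862, 0.883), 3))   # 0.872, matching the abstract
```

    Under this assumed rule, raising `min_votes` keeps only spans that many workers agree on, which would tend to favor precision, while lowering it would tend to favor recall.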

  8. Entity linking for biomedical literature

    PubMed Central

    2015-01-01

    Background: The Entity Linking (EL) task links entity mentions from an unstructured document to entities in a knowledge base. Although this problem is well-studied in news and social media, it has not received much attention in the life science domain. One outcome of tackling the EL problem in the life sciences domain is to enable scientists to build computational models of biological processes with more efficiency. However, simply applying a news-trained entity linker produces inadequate results. Methods: Since existing supervised approaches require a large amount of manually-labeled training data, which is currently unavailable for the life science domain, we propose a novel unsupervised collective inference approach to link entities from unstructured full texts of biomedical literature to 300 ontologies. The approach leverages the rich semantic information and structures in ontologies for similarity computation and entity ranking. Results: Without using any manual annotation, our approach significantly outperforms a state-of-the-art supervised EL method (9% absolute gain in linking accuracy). Furthermore, the state-of-the-art supervised EL method requires 15,000 manually annotated entity mentions for training. These promising results establish a benchmark for the EL task in the life science domain. We also provide in-depth analysis and discussion on both challenges and opportunities on automatic knowledge enrichment for scientific literature. Conclusions: In this paper, we propose a novel unsupervised collective inference approach to address the EL problem in a new domain. We show that our unsupervised approach is able to outperform a current state-of-the-art supervised approach that has been trained with a large amount of manually labeled data. Life science presents an underrepresented domain for applying EL techniques. 
By providing a small benchmark data set and identifying opportunities, we hope to stimulate discussions across natural language processing

  9. The Ensembl gene annotation system.

    PubMed

    Aken, Bronwen L; Ayling, Sarah; Barrell, Daniel; Clarke, Laura; Curwen, Valery; Fairley, Susan; Fernandez Banet, Julio; Billis, Konstantinos; García Girón, Carlos; Hourlier, Thibaut; Howe, Kevin; Kähäri, Andreas; Kokocinski, Felix; Martin, Fergal J; Murphy, Daniel N; Nag, Rishi; Ruffier, Magali; Schuster, Michael; Tang, Y Amy; Vogel, Jan-Hinnerk; White, Simon; Zadissa, Amonida; Flicek, Paul; Searle, Stephen M J

    2016-01-01

    The Ensembl gene annotation system has been used to annotate over 70 different vertebrate species across a wide range of genome projects. Furthermore, it generates the automatic alignment-based annotation for the human and mouse GENCODE gene sets. The system is based on the alignment of biological sequences, including cDNAs, proteins and RNA-seq reads, to the target genome in order to construct candidate transcript models. Careful assessment and filtering of these candidate transcripts ultimately leads to the final gene set, which is made available on the Ensembl website. Here, we describe the annotation process in detail. Database URL: http://www.ensembl.org/index.html. PMID:27337980

  10. The Ensembl gene annotation system

    PubMed Central

    Aken, Bronwen L.; Ayling, Sarah; Barrell, Daniel; Clarke, Laura; Curwen, Valery; Fairley, Susan; Fernandez Banet, Julio; Billis, Konstantinos; García Girón, Carlos; Hourlier, Thibaut; Howe, Kevin; Kähäri, Andreas; Kokocinski, Felix; Martin, Fergal J.; Murphy, Daniel N.; Nag, Rishi; Ruffier, Magali; Schuster, Michael; Tang, Y. Amy; Vogel, Jan-Hinnerk; White, Simon; Zadissa, Amonida; Flicek, Paul

    2016-01-01

    The Ensembl gene annotation system has been used to annotate over 70 different vertebrate species across a wide range of genome projects. Furthermore, it generates the automatic alignment-based annotation for the human and mouse GENCODE gene sets. The system is based on the alignment of biological sequences, including cDNAs, proteins and RNA-seq reads, to the target genome in order to construct candidate transcript models. Careful assessment and filtering of these candidate transcripts ultimately leads to the final gene set, which is made available on the Ensembl website. Here, we describe the annotation process in detail. Database URL: http://www.ensembl.org/index.html PMID:27337980

  11. Widowed Persons Service: Selected Annotated Bibliography.

    ERIC Educational Resources Information Center

    Bressler, Dawn, Comp.; And Others

    This document presents an annotated bibliography of books and articles on topics relevant to widowhood. These annotations are included: (1) 21 annotations on the grief process; (2) 11 annotations on personal observations about widowhood; (3) 16 annotations on practical problems surrounding widowhood, including legal and financial problems and job…

  12. National Space Biomedical Research Institute

    NASA Technical Reports Server (NTRS)

    2005-01-01

    NSBRI partners with NASA to develop countermeasures against the deleterious effects of long duration space flight. NSBRI's science and technology projects are directed toward this goal, which is accomplished by: 1. Designing, testing and validating effective countermeasures to address the biological and environmental impediments to long-term human space flight. 2. Defining the molecular, cellular, organ-level, integrated responses and mechanistic relationships that ultimately determine these impediments, where such activity fosters the development of novel countermeasures. 3. Establishing biomedical support technologies to maximize human performance in space, reduce biomedical hazards to an acceptable level and deliver quality medical care. 4. Transferring and disseminating the biomedical advances in knowledge and technology acquired through living and working in space to the general benefit of humankind, including the treatment of patients suffering from gravity- and radiation-related conditions on Earth. 5. Ensuring open involvement of the scientific community, industry and the public in the Institute's activities and fostering a robust collaboration with NASA, particularly through JSC.

  13. Biomedical Compounds from Marine organisms

    PubMed Central

    Jha, Rajeev Kumar; Zi-rong, Xu

    2004-01-01

    The Ocean, which is called the ‘mother of origin of life’, is also the source of structurally unique natural products that are mainly accumulated in living organisms. Several of these compounds show pharmacological activities and are helpful for the invention and discovery of bioactive compounds, primarily for deadly diseases like cancer, acquired immuno-deficiency syndrome (AIDS), arthritis, etc., while other compounds have been developed as analgesics or to treat inflammation, etc. The life-saving drugs are mainly found abundantly in microorganisms, algae and invertebrates, while they are scarce in vertebrates. Modern technologies have opened vast areas of research for the extraction of biomedical compounds from oceans and seas.

  14. PAZAR: a framework for collection and dissemination of cis-regulatory sequence annotation

    PubMed Central

    Portales-Casamar, Elodie; Kirov, Stefan; Lim, Jonathan; Lithwick, Stuart; Swanson, Magdalena I; Ticoll, Amy; Snoddy, Jay; Wasserman, Wyeth W

    2007-01-01

    PAZAR is an open-access and open-source database of transcription factor and regulatory sequence annotation with associated web interface and programming tools for data submission and extraction. Curated boutique data collections can be maintained and disseminated through the unified schema of the mall-like PAZAR repository. The Pleiades Promoter Project collection of brain-linked regulatory sequences is introduced to demonstrate the depth of annotation possible within PAZAR. PAZAR, located at , is open for business. PMID:17916232

  15. Biomedical applications engineering tasks

    NASA Technical Reports Server (NTRS)

    Laenger, C. J., Sr.

    1976-01-01

    The engineering tasks performed in response to needs articulated by clinicians are described. Initial contacts were made with these clinician-technology requestors by the Southwest Research Institute NASA Biomedical Applications Team. The basic purpose of the program was to effectively transfer aerospace technology into functional hardware to solve real biomedical problems.

  16. Trends in Biomedical Education.

    ERIC Educational Resources Information Center

    Peppas, Nicholas A.; Mallinson, Richard G.

    1982-01-01

    An analysis of trends in biomedical education within chemical education is presented. Data used for the analysis included: type/level of course, subjects taught, and textbook preferences. Among other results, the 1980 survey indicates that 28 of the 79 responding schools offer at least one course in biomedical engineering. (JN)

  17. Mapping annotations with textual evidence using an scLDA model

    PubMed Central

    Jin, Bo; Chen, Vicky; Chen, Lujia; Lu, Xinghua

    2011-01-01

    Most of the knowledge regarding genes and proteins is stored in biomedical literature as free text. Extracting information from complex biomedical texts demands techniques capable of inferring biological concepts from local text regions and mapping them to controlled vocabularies. To this end, we present a sentence-based correspondence latent Dirichlet allocation (scLDA) model which, when trained with a corpus of PubMed documents with known GO annotations, performs the following tasks: 1) learning major biological concepts from the corpus, 2) inferring the biological concepts existing within text regions (sentences), and 3) identifying the text regions in a document that provide evidence for the observed annotations. When applied to new gene-related documents, a trained scLDA model is capable of predicting GO annotations and identifying text regions as textual evidence supporting the predicted annotations. This study uses GO annotation data as a testbed; the approach can be generalized to other annotated data, such as MeSH and MEDLINE documents. PMID:22195141

  18. Exploring subdomain variation in biomedical language

    PubMed Central

    2011-01-01

    Background: Applications of Natural Language Processing (NLP) technology to biomedical texts have generated significant interest in recent years. In this paper we identify and investigate the phenomenon of linguistic subdomain variation within the biomedical domain, i.e., the extent to which different subject areas of biomedicine are characterised by different linguistic behaviour. While variation at a coarser domain level such as between newswire and biomedical text is well-studied and known to affect the portability of NLP systems, we are the first to conduct an extensive investigation into more fine-grained levels of variation. Results: Using the large OpenPMC text corpus, which spans the many subdomains of biomedicine, we investigate variation across a number of lexical, syntactic, semantic and discourse-related dimensions. These dimensions are chosen for their relevance to the performance of NLP systems. We use clustering techniques to analyse commonalities and distinctions among the subdomains. Conclusions: We find that while patterns of inter-subdomain variation differ somewhat from one feature set to another, robust clusters can be identified that correspond to intuitive distinctions such as that between clinical and laboratory subjects. In particular, subdomains relating to genetics and molecular biology, which are the most common sources of material for training and evaluating biomedical NLP tools, are not representative of all biomedical subdomains. We conclude that an awareness of subdomain variation is important when considering the practical use of language processing applications by biomedical researchers. PMID:21619603

  19. Communication and Gender: Annotated Bibliography

    ERIC Educational Resources Information Center

    Todd-Mancillas, William R.; Krug, Linda

    Focusing on the similarities and differences in men's and women's verbal and nonverbal communication behavior, this 33-item annotated bibliography presents a sample of articles appearing in speech communication publications on the subject. Categories of the annotated bibliography are books, sexism and sexual harassment in academia, theoretic…

  20. Drug Education: An Annotated Bibliography.

    ERIC Educational Resources Information Center

    Mathieson, Moira B.

    This bibliography consists of a total of 215 entries dealing with drug education, including curriculum guides, and drawn from documents in the ERIC system. There are two sections, the first containing 130 annotated citations of documents and journal articles, and the second containing 85 citations of journal articles without annotations, but with…

  1. Women in Communication: Annotated Bibliography.

    ERIC Educational Resources Information Center

    Mills, Carol A.

    This annotated bibliography is designed to survey the field of women in communication. The bibliography is centered on a specific context: who are and who were the women who worked in the communication field, and specifically, what were their writings like? The 56 annotations date from 1949 through 1990 and deal mostly with books (especially…

  2. Simbody: multibody dynamics for biomedical research

    PubMed Central

    Sherman, Michael A.; Seth, Ajay; Delp, Scott L.

    2015-01-01

    Multibody software designed for mechanical engineering has been successfully employed in biomedical research for many years. For real time operation some biomedical researchers have also adapted game physics engines. However, these tools were built for other purposes and do not fully address the needs of biomedical researchers using them to analyze the dynamics of biological structures and make clinically meaningful recommendations. We are addressing this problem through the development of an open source, extensible, high performance toolkit including a multibody mechanics library aimed at the needs of biomedical researchers. The resulting code, Simbody, supports research in a variety of fields including neuromuscular, prosthetic, and biomolecular simulation, and related research such as biologically-inspired design and control of humanoid robots and avatars. Simbody is the dynamics engine behind OpenSim, a widely used biomechanics simulation application. This article reviews issues that arise uniquely in biomedical research, and reports on the architecture, theory, and computational methods Simbody uses to address them. By addressing these needs explicitly Simbody provides a better match to the needs of researchers than can be obtained by adaptation of mechanical engineering or gaming codes. Simbody is a community resource, free for any purpose. We encourage wide adoption and invite contributions to the code base at https://simtk.org/home/simbody. PMID:25866705

  3. Variobox: automatic detection and annotation of human genetic variants.

    PubMed

    Gaspar, Paulo; Lopes, Pedro; Oliveira, Jorge; Santos, Rosário; Dalgleish, Raymond; Oliveira, José Luís

    2014-02-01

    Triggered by the sequencing of the human genome, personalized medicine has been one of the fastest growing research areas in the last decade. Multiple software and hardware technologies have been developed by several projects, culminating in the exponential growth of genetic data. Considering the technological developments in this field, it is now fairly easy and inexpensive to obtain genetic profiles for unique individuals, such as those performed by several genetic analysis companies. The availability of computational tools that simplify genetic data analysis and the disclosure of biomedical evidences are of utmost importance. We present Variobox, a desktop tool to annotate, analyze, and compare human genes. Variobox obtains variant annotation data from WAVe, protein metadata annotations from Protein Data Bank, and sequences are obtained from Locus Reference Genomic or RefSeq databases. To explore the data, Variobox provides an advanced sequence visualization that enables agile navigation through genetic regions. DNA sequencing data can be compared with reference sequences retrieved from LRG or RefSeq records, identifying and automatically annotating new potential variants. These features and data, ranging from patient sequences to HGVS-compliant variant descriptions, are combined in an intuitive interface to analyze genes and variants. Variobox is a Java application, available at http://bioinformatics.ua.pt/variobox. PMID:24186831

  4. 76 FR 1212 - Joint Biomedical Laboratory Research and Development and Clinical Science Research and...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-01-07

    ... AFFAIRS Joint Biomedical Laboratory Research and Development and Clinical Science Research and Development... Eligibility of the Joint Biomedical Laboratory Research and Development and Clinical Science Research and... areas of biomedical, behavioral and clinical science research. The panel meeting will be open to...

  5. 76 FR 79273 - Joint Biomedical Laboratory Research and Development and Clinical Science Research and...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-12-21

    ... AFFAIRS Joint Biomedical Laboratory Research and Development and Clinical Science Research and Development... Eligibility of the Joint Biomedical Laboratory Research and Development and Clinical Science Research and... biomedical, behavioral, and clinical science research. The panel meeting will be open to the public...

  6. Annotations in Refseq (GSC8 Meeting)

    ScienceCinema

    Tatusova, Tatiana

    2011-04-28

    The Genomic Standards Consortium was formed in September 2005. It is an international, open-membership working body which promotes standardization in the description of genomes and the exchange and integration of genomic data. The 2009 meeting was an activity of a five-year funding "Research Coordination Network" from the National Science Foundation and was held at the DOE Joint Genome Institute with organizational support provided by the JGI and by the University of California - San Diego. Tatiana Tatusova of NCBI discusses "Annotations in Refseq" at the Genomic Standards Consortium's 8th meeting at the DOE JGI in Walnut Creek, Calif. on Sept. 10, 2009.

  7. Expressed sequence tags: analysis and annotation.

    PubMed

    Parkinson, John; Blaxter, Mark

    2004-01-01

    Expressed sequence tags (ESTs) present a special set of problems for bioinformatic analysis. They are partial and error-prone, and large datasets can have significant internal redundancy. To facilitate analysis of small EST datasets from in-house projects, we present an integrated "pipeline" of tools that take EST data from sequence trace to database submission. These tools also can be used to provide clustering of ESTs into putative genes and to annotate these genes with preliminary sequence similarity searches. The systems are written to use the public-domain LINUX environment and other openly available analytical tools. PMID:15153624

  8. Annotations in Refseq (GSC8 Meeting)

    SciTech Connect

    Tatusova, Tatiana

    2009-09-10

    The Genomic Standards Consortium was formed in September 2005. It is an international, open-membership working body which promotes standardization in the description of genomes and the exchange and integration of genomic data. The 2009 meeting was an activity of a five-year funding "Research Coordination Network" from the National Science Foundation and was held at the DOE Joint Genome Institute with organizational support provided by the JGI and by the University of California - San Diego. Tatiana Tatusova of NCBI discusses "Annotations in Refseq" at the Genomic Standards Consortium's 8th meeting at the DOE JGI in Walnut Creek, Calif. on Sept. 10, 2009.

  9. An integrated computational pipeline and database to support whole-genome sequence annotation

    PubMed Central

    Mungall, CJ; Misra, S; Berman, BP; Carlson, J; Frise, E; Harris, N; Marshall, B; Shu, S; Kaminker, JS; Prochnik, SE; Smith, CD; Smith, E; Tupy, JL; Wiel, C; Rubin, GM; Lewis, SE

    2002-01-01

    We describe here our experience in annotating the Drosophila melanogaster genome sequence, in the course of which we developed several new open-source software tools and a database schema to support large-scale genome annotation. We have developed these into an integrated and reusable software system for whole-genome annotation. The key contributions to overall annotation quality are the marshalling of high-quality sequences for alignments and the design of a system with an adaptable and expandable flexible architecture. PMID:12537570

  10. Gene Ontology annotations and resources.

    PubMed

    Blake, J A; Dolan, M; Drabkin, H; Hill, D P; Li, Ni; Sitnikov, D; Bridges, S; Burgess, S; Buza, T; McCarthy, F; Peddinti, D; Pillai, L; Carbon, S; Dietze, H; Ireland, A; Lewis, S E; Mungall, C J; Gaudet, P; Chrisholm, R L; Fey, P; Kibbe, W A; Basu, S; Siegele, D A; McIntosh, B K; Renfro, D P; Zweifel, A E; Hu, J C; Brown, N H; Tweedie, S; Alam-Faruque, Y; Apweiler, R; Auchinchloss, A; Axelsen, K; Bely, B; Blatter, M -C; Bonilla, C; Bouguerleret, L; Boutet, E; Breuza, L; Bridge, A; Chan, W M; Chavali, G; Coudert, E; Dimmer, E; Estreicher, A; Famiglietti, L; Feuermann, M; Gos, A; Gruaz-Gumowski, N; Hieta, R; Hinz, C; Hulo, C; Huntley, R; James, J; Jungo, F; Keller, G; Laiho, K; Legge, D; Lemercier, P; Lieberherr, D; Magrane, M; Martin, M J; Masson, P; Mutowo-Muellenet, P; O'Donovan, C; Pedruzzi, I; Pichler, K; Poggioli, D; Porras Millán, P; Poux, S; Rivoire, C; Roechert, B; Sawford, T; Schneider, M; Stutz, A; Sundaram, S; Tognolli, M; Xenarios, I; Foulgar, R; Lomax, J; Roncaglia, P; Khodiyar, V K; Lovering, R C; Talmud, P J; Chibucos, M; Giglio, M Gwinn; Chang, H -Y; Hunter, S; McAnulla, C; Mitchell, A; Sangrador, A; Stephan, R; Harris, M A; Oliver, S G; Rutherford, K; Wood, V; Bahler, J; Lock, A; Kersey, P J; McDowall, D M; Staines, D M; Dwinell, M; Shimoyama, M; Laulederkind, S; Hayman, T; Wang, S -J; Petri, V; Lowry, T; D'Eustachio, P; Matthews, L; Balakrishnan, R; Binkley, G; Cherry, J M; Costanzo, M C; Dwight, S S; Engel, S R; Fisk, D G; Hitz, B C; Hong, E L; Karra, K; Miyasato, S R; Nash, R S; Park, J; Skrzypek, M S; Weng, S; Wong, E D; Berardini, T Z; Huala, E; Mi, H; Thomas, P D; Chan, J; Kishore, R; Sternberg, P; Van Auken, K; Howe, D; Westerfield, M

    2013-01-01

    The Gene Ontology (GO) Consortium (GOC, http://www.geneontology.org) is a community-based bioinformatics resource that classifies gene product function through the use of structured, controlled vocabularies. Over the past year, the GOC has implemented several processes to increase the quantity, quality and specificity of GO annotations. First, the number of manual, literature-based annotations has grown at an increasing rate. Second, as a result of a new 'phylogenetic annotation' process, manually reviewed, homology-based annotations are becoming available for a broad range of species. Third, the quality of GO annotations has been improved through a streamlined process for, and automated quality checks of, GO annotations deposited by different annotation groups. Fourth, the consistency and correctness of the ontology itself has increased by using automated reasoning tools. Finally, the GO has been expanded not only to cover new areas of biology through focused interaction with experts, but also to capture greater specificity in all areas of the ontology using tools for adding new combinatorial terms. The GOC works closely with other ontology developers to support integrated use of terminologies. The GOC supports its user community through the use of e-mail lists, social media and web-based resources. PMID:23161678

  11. Quantifying the Impact and Extent of Undocumented Biomedical Synonymy

    PubMed Central

    Blair, David R.; Wang, Kanix; Nestorov, Svetlozar; Evans, James A.; Rzhetsky, Andrey

    2014-01-01

    Synonymous relationships among biomedical terms are extensively annotated within specialized terminologies, implying that synonymy is important for practical computational applications within this field. It remains unclear, however, whether text mining actually benefits from documented synonymy and whether existing biomedical thesauri provide adequate coverage of these linguistic relationships. In this study, we examine the impact and extent of undocumented synonymy within a very large compendium of biomedical thesauri. First, we demonstrate that missing synonymy has a significant negative impact on named entity normalization, an important problem within the field of biomedical text mining. To estimate the amount of synonymy currently missing from thesauri, we develop a probabilistic model for the construction of synonym terminologies that is capable of handling a wide range of potential biases, and we evaluate its performance using the broader domain of near-synonymy among general English words. Our model predicts that over 90% of these relationships are currently undocumented, a result that we support experimentally through “crowd-sourcing.” Finally, we apply our model to biomedical terminologies and predict that they are missing the vast majority (>90%) of the synonymous relationships they intend to document. Overall, our results expose the dramatic incompleteness of current biomedical thesauri and suggest the need for “next-generation,” high-coverage lexical terminologies. PMID:25255227

  12. Resources for Exceptional Adult Education: An Annotated Bibliography.

    ERIC Educational Resources Information Center

    Beltran, Alejandro C.; And Others

    This annotated bibliography describes materials that can be helpful to adult educators working with exceptional adults. The bibliography includes 186 citations of resource materials, assessment materials, training guides, curriculum guides, research findings, films, and general information. The opening section consists of citations of general…

  13. Selective, annotated bibliography on current south Asian issues. Final report

    SciTech Connect

    Blood, P.; Ross, R.; Levy, R.; Makeig, D.

    1987-03-01

    This bibliography provides selective annotations of open-source material on two current issues: nuclear developments in South Asia, and tactics and organization of Afghan resistance groups. The monthly bibliography incorporates serials and monographs arranged alphabetically by author and title within each section.

  14. Selective, annotated bibliography on current south Asian issues. Final report

    SciTech Connect

    Blood, P.R.; Curtiss, E.R.; Heitzman, J.; LePoer, B.A.; Levy, R.J.

    1985-10-01

    This bibliography provides selective annotations of open-source material on two current issues: nuclear developments in South Asia, and tactics and organization of Afghan resistance groups. The monthly bibliography incorporates serials and monographs arranged alphabetically by author and title within each section.

  15. Selective, annotated bibliography on current south Asian issues. Final report

    SciTech Connect

    Blood, P.; Ross, R.; LePoer, B.; Levy, R.; Curtiss, E.

    1987-08-01

    This bibliography provides selective annotations of open-source material on two current issues: nuclear developments in South Asia, and tactics and organization of Afghan resistance groups. The monthly bibliography incorporates serials and monographs arranged alphabetically by author and title within each section.

  16. Selective, annotated bibliography on current south Asian issues. Final report

    SciTech Connect

    Blood, P.; Curtiss, E.; Heitzman, J.; LePoer, B.; Levy, R.

    1985-09-01

    This bibliography provides selective annotations of open-source material on two current issues: nuclear developments in South Asia and tactics and organization of Afghan resistance groups. The monthly bibliography incorporates serials and monographs arranged alphabetically by author and title within each section.

  17. Selective, annotated bibliography on current south Asian issues. Final report

    SciTech Connect

    Blood, P.R.; Curtiss, E.R.; Heitzman, J.; LePoer, B.A.; Levy, R.J.

    1986-01-01

    This bibliography provides selective annotations of open source material on two current issues: nuclear developments in South Asia, and tactics and organization of Afghan resistance groups. The monthly bibliography incorporates serials and monographs arranged alphabetically by author and title within each section.

  18. Selective, annotated bibliography on current south Asian issues. Final report

    SciTech Connect

    Blood, P.R.; Levy, R.J.; Heitzman, J.; LePoer, B.; Ross, R.

    1986-11-01

    This bibliography provides selective annotations of open-source material on two current issues: nuclear developments in South Asia, and tactics and organization of Afghan resistance groups. The monthly bibliography incorporates serials and monographs arranged alphabetically by author and title within each section.

  19. Selective, annotated bibliography on current south Asian issues. Final report

    SciTech Connect

    Blood, P.R.; Curtiss, E.R.; Heitzman, J.; LePoer, B.A.; Levy, R.J.

    1985-12-01

    This bibliography provides selective annotations of open-source material on two current issues: nuclear developments in South Asia, and tactics and organization of Afghan resistance groups. The monthly bibliography incorporates serials and monographs arranged alphabetically by author and title within each section.

  20. Selective, annotated bibliography on current south Asian issues. Final report

    SciTech Connect

    Blood, P.R.; Curtiss, E.R.; Heitzman, J.; LePoer, B.A.; Levy, R.J.

    1985-07-01

    This bibliography provides selective annotations of open-source material on two current issues: nuclear developments in South Asia and tactics and organization of Afghan resistance groups. The monthly bibliography incorporates serials and monographs arranged alphabetically by author and title within each section.

  1. Selective, annotated bibliography on current south Asian issues. Final report

    SciTech Connect

    Blood, P.R.; Levy, R.J.; Heitzman, J.; Ross, R.; Curtiss, E.

    1987-01-01

    This bibliography provides selective annotations of open source material on two current issues: nuclear developments in South Asia, and tactics and organization of Afghan resistance groups. The monthly bibliography incorporates serials and monographs arranged alphabetically by author and title within each section.

  2. Selective, annotated bibliography on current south Asian issues. Final report

    SciTech Connect

    Blood, P.R.; Curtiss, E.R.; Heitzman, J.; LePoer, B.A.; Levy, R.J.

    1985-11-01

    This bibliography provides selective annotations of open-source material on two current issues: nuclear developments in South Asia, and tactics and organization of Afghan resistance groups. The monthly bibliography incorporates serials and monographs arranged alphabetically by author and title within each section.

  3. Selective, annotated bibliography on current south Asian issues. Final report

    SciTech Connect

    Ross, R.R.; Blood, P.; Curtiss, E.; Heitzman, J.; LePoer, B.

    1986-05-01

    This bibliography provides selective annotations of open-source material on two current issues: nuclear developments in South Asia, and tactics and organization of Afghan resistance groups. The monthly bibliography incorporates serials and monographs arranged alphabetically by author and title within each section.

  4. Selective, annotated bibliography on current south Asian issues. Final report

    SciTech Connect

    Blood, P.; Heitzman, J.; Levy, R.; Ross, R.; Curtiss, E.

    1986-12-01

    This bibliography provides selective annotations of open-source material on two current issues: nuclear developments in South Asia and tactics and organization of Afghan resistance groups. The monthly bibliography incorporates serials and monographs arranged alphabetically by author and title within each section.

  5. Selective, annotated bibliography on current south Asian issues. Final report

    SciTech Connect

    Blood, P.; Curtiss, E.; Heitzman, J.; LePoer, B.; Levy, R.

    1985-08-01

    This bibliography provides selective annotations of open-source material on two current issues: nuclear developments in South Asia and tactics and organization of Afghan resistance groups. The monthly bibliography incorporates serials and monographs arranged alphabetically by author and title within each section.

  6. Selective, annotated bibliography on current south Asian issues. Final report

    SciTech Connect

    Blood, P.; Heitzman, J.; Levy, R.; Ross, R.; Curtiss, E.

    1986-03-01

    This bibliography provides selective annotations of open-source material on two current issues: nuclear developments in South Asia, and tactics and organization of Afghan resistance groups. The monthly bibliography incorporates serials and monographs arranged alphabetically by author and title within each section.

  7. Selective, annotated bibliography on current south Asian issues. Final report

    SciTech Connect

    Blood, P.; Heitzman, J.; Levy, R.; Ross, R.; Curtiss, E.

    1986-04-01

    This bibliography provides selective annotations of open-source material on two current issues: nuclear developments in South Asia and tactics and organization of Afghan resistance groups. The monthly bibliography incorporates serials and monographs arranged alphabetically by author and title within each section.

  8. Selective, annotated bibliography on current south Asian issues. Final report

    SciTech Connect

    Blood, P.; Makeig, D.C.; Heitzman, J.; Ross, R.; Curtiss, E.

    1986-10-01

    This bibliography provides selective annotations of open-source material on two current issues: nuclear developments in South Asia, and tactics and organization of Afghan resistance groups. The monthly bibliography incorporates serials and monographs arranged alphabetically by author and title within each section.

  9. Selective, annotated bibliography on current south Asian issues. Final report

    SciTech Connect

    Blood, P.; Curtiss, E.; Heitzman, J.; LePoer, B.; Levy, R.

    1987-02-01

    This bibliography provides selective annotations of open source material on two current issues: nuclear developments in South Asia and tactics and organization of Afghan resistance groups. The monthly bibliography incorporates serials and monographs arranged alphabetically by author and title within each section.

  10. Selective, annotated bibliography on current south Asian issues. Final report

    SciTech Connect

    Blood, P.; Heitzman, J.; Levy, R.; Levy, R.; Ross, R.

    1986-09-01

    This bibliography provides selective annotations of open-source material on two current issues: nuclear developments in South Asia, and tactics and organization of Afghan resistance groups. The monthly bibliography incorporates serials and monographs arranged alphabetically by author and title within each section.

  11. Selective, annotated bibliography on current south Asian issues. Final report

    SciTech Connect

    Blood, P.; Heitzman, J.; Levy, R.; Ross, R.; Curtiss, E.

    1986-06-01

    This bibliography provides selective annotations of open-source material on two current issues: nuclear developments in South Asia and tactics and organization of Afghan resistance groups. The monthly bibliography incorporates serials and monographs arranged alphabetically by author and title within each section.

  12. Selective, annotated bibliography on current south Asian issues. Final report

    SciTech Connect

    Blood, P.; Curtiss, E.; Heitzman, J.; LePoer, B.; Levy, R.

    1987-09-01

    This bibliography provides selective annotations of open-source material on two current issues: nuclear developments in South Asia, and tactics and organization of Afghan resistance groups. The monthly bibliography incorporates serials and monographs arranged alphabetically by author and title within each section.

  13. Selective, annotated bibliography on current south Asian issues. Final report

    SciTech Connect

    Blood, P.; Curtiss, E.; Heitzman, J.; LePoer, B.; Levy, R.

    1986-02-01

    This bibliography provides selective annotations of open-source material on two current issues: nuclear developments in South Asia and tactics and organization of Afghan resistance groups. The monthly bibliography incorporates serials and monographs arranged alphabetically by author and title within each section.

  14. Selective, annotated bibliography on current south Asian issues. Final report

    SciTech Connect

    Blood, P.; Heitzman, J.; Levy, R.; Ross, R.; Curtiss, E.

    1986-07-01

    This bibliography provides selective annotations of open-source material on two current issues: nuclear developments in South Asia, and tactics and organization of Afghan resistance groups. The monthly bibliography incorporates serials and monographs arranged alphabetically by author and title within each section.

  15. Selective, annotated bibliography on current south Asian issues. Final report

    SciTech Connect

    Blood, P.; Heitzman, J.; Levy, R.; Curtiss, E.; LePoer, B.

    1987-12-01

    This bibliography provides selective annotations of open source material on two current issues: nuclear developments in South Asia, and tactics and organization of Afghan resistance groups. The monthly bibliography incorporates serials and monographs arranged alphabetically by author and title within each section.

  16. Selective, annotated bibliography on current south Asian issues. Final report

    SciTech Connect

    Ross, R.; Makeig, D.; LePoer, B.; Heitzman, J.; Levy, R.

    1988-03-01

    This bibliography provides selective annotations of open-source material on two current issues: nuclear developments in South Asia, and tactics and organization of Afghan resistance groups. The monthly bibliography incorporates serials and monographs arranged alphabetically by author and title within each section.

  17. Selective, annotated bibliography on current south Asian issues. Final report

    SciTech Connect

    Blood, P.; Heitzman, J.; Levy, R.; Ross, R.; Curtiss, E.

    1988-08-01

    This bibliography provides selective annotations of open-source material on two current issues: nuclear developments in South Asia, and tactics and organization of Afghan resistance groups. The monthly bibliography incorporates serials and monographs arranged alphabetically by author and title within each section.

  18. Selective, annotated bibliography on current south Asian issues

    SciTech Connect

    1987-07-01

    This bibliography provides selective annotations of open-source material on two current issues: nuclear developments in South Asia, and tactics and organization of Afghan resistance groups. The monthly bibliography incorporates serials and monographs arranged alphabetically by author and title within each section.

  19. Annotated Bibliography; Freedom of Information Center Reports and Summary Papers.

    ERIC Educational Resources Information Center

    Freedom of Information Center, Columbia, MO.

    This bibliography lists and annotates almost 400 information reports, opinion papers, and summary papers dealing with freedom of information. Topics covered include the nature of press freedom and increased press efforts toward more open access to information; the press situation in many foreign countries, including France, Sweden, Communist…

  20. Microtask crowdsourcing for disease mention annotation in PubMed abstracts.

    PubMed

    Good, Benjamin M; Nanis, Max; Wu, Chunlei; Su, Andrew I

    2015-01-01

    Identifying concepts and relationships in biomedical text enables knowledge to be applied in computational analyses. Many biological natural language processing (BioNLP) projects attempt to address this challenge, but the state of the art still leaves much room for improvement. Progress in BioNLP research depends on large, annotated corpora for evaluating information extraction systems and training machine learning models. Traditionally, such corpora are created by small numbers of expert annotators, often working over extended periods of time. Recent studies have shown that workers on microtask crowdsourcing platforms such as Amazon's Mechanical Turk (AMT) can, in aggregate, generate high-quality annotations of biomedical text. Here, we investigated the use of AMT in capturing disease mentions in PubMed abstracts. We used the NCBI Disease corpus as a gold standard for refining and benchmarking our crowdsourcing protocol. After several iterations, we arrived at a protocol that reproduced the annotations of the 593 documents in the 'training set' of this gold standard with an overall F measure of 0.872 (precision 0.862, recall 0.883). The output can also be tuned to optimize for precision (max = 0.984 when recall = 0.269) or recall (max = 0.980 when precision = 0.436). Each document was completed by 15 workers, and their annotations were merged based on a simple voting method. In total, 145 workers combined to complete all 593 documents in the span of 9 days at a cost of $0.066 per abstract per worker. The quality of the annotations, as judged with the F measure, increases with the number of workers assigned to each task; however, minimal performance gains were observed beyond 8 workers per task. These results add further evidence that microtask crowdsourcing can be a valuable tool for generating well-annotated corpora in BioNLP. Data produced for this analysis are available at http://figshare.com/articles/Disease_Mention_Annotation_with_Mechanical_Turk/1126402
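
    The F measure quoted in the record above can be checked from the reported precision and recall. The sketch below assumes the abstract means the balanced F measure (F1, the harmonic mean of precision and recall); the function name `f_measure` is illustrative and is not taken from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F measure" is the balanced F1 score.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs quoted in the abstract.
overall = f_measure(0.862, 0.883)          # protocol chosen by the authors
precision_tuned = f_measure(0.984, 0.269)  # output tuned for maximum precision
recall_tuned = f_measure(0.436, 0.980)     # output tuned for maximum recall

print(f"overall F measure: {overall:.3f}")  # prints 0.872, matching the abstract
print(f"precision-tuned:   {precision_tuned:.3f}")
print(f"recall-tuned:      {recall_tuned:.3f}")
```

    Under this assumption, both tuned operating points give a lower F measure than the balanced protocol, since each gains on one metric at a large cost to the other.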

  1. ORegAnno 3.0: a community-driven resource for curated regulatory annotation.

    PubMed

    Lesurf, Robert; Cotto, Kelsy C; Wang, Grace; Griffith, Malachi; Kasaian, Katayoon; Jones, Steven J M; Montgomery, Stephen B; Griffith, Obi L

    2016-01-01

    The Open Regulatory Annotation database (ORegAnno) is a resource for curated regulatory annotation. It contains information about regulatory regions, transcription factor binding sites, RNA binding sites, regulatory variants, haplotypes, and other regulatory elements. ORegAnno differentiates itself from other regulatory resources by facilitating crowd-sourced interpretation and annotation of regulatory observations from the literature and highly curated resources. It contains a comprehensive annotation scheme that aims to describe both the elements and outcomes of regulatory events. Moreover, ORegAnno assembles these disparate data sources and annotations into a single, high quality catalogue of curated regulatory information. The current release is an update of the database previously featured in the NAR Database Issue, and now contains 1 948 307 records, across 18 species, with a combined coverage of 334 215 080 bp. Complete records, annotation, and other associated data are available for browsing and download at http://www.oreganno.org/. PMID:26578589

  2. ORegAnno 3.0: a community-driven resource for curated regulatory annotation

    PubMed Central

    Lesurf, Robert; Cotto, Kelsy C.; Wang, Grace; Griffith, Malachi; Kasaian, Katayoon; Jones, Steven J. M.; Montgomery, Stephen B.; Griffith, Obi L.

    2016-01-01

    The Open Regulatory Annotation database (ORegAnno) is a resource for curated regulatory annotation. It contains information about regulatory regions, transcription factor binding sites, RNA binding sites, regulatory variants, haplotypes, and other regulatory elements. ORegAnno differentiates itself from other regulatory resources by facilitating crowd-sourced interpretation and annotation of regulatory observations from the literature and highly curated resources. It contains a comprehensive annotation scheme that aims to describe both the elements and outcomes of regulatory events. Moreover, ORegAnno assembles these disparate data sources and annotations into a single, high quality catalogue of curated regulatory information. The current release is an update of the database previously featured in the NAR Database Issue, and now contains 1 948 307 records, across 18 species, with a combined coverage of 334 215 080 bp. Complete records, annotation, and other associated data are available for browsing and download at http://www.oreganno.org/. PMID:26578589

  3. Topics in Biomedical Optics: Introduction

    NASA Astrophysics Data System (ADS)

    Hebden, Jeremy C.; Boas, David A.; George, John S.; Durkin, Anthony J.

    2003-06-01

    The field of biomedical optics is experiencing tremendous growth. Biomedical technologies contribute to the creation of devices used in healthcare across various specialties (ophthalmology, cardiology, anesthesiology, immunology, etc.). Recent research in biomedical optics is discussed. Overviews of meetings held at the 2002 Optical Society of America Biomedical Topical Meetings are presented.

  4. Patient Education: An Annotated Bibliography.

    ERIC Educational Resources Information Center

    Simmons, Jeannette

    Topics included in this annotated bibliography on patient education are (1) background on development of patient education programs, (2) patient education interventions, (3) references for health professionals, and (4) research and evaluation in patient education. (TA)

  5. Manpower development for the biomedical industry space.

    PubMed

    Goh, James C H

    2013-01-01

    The Biomedical Sciences (BMS) Cluster is one of four key pillars of the Singapore economy. The Singapore Government has injected research funding for basic and translational research to attract companies to carry out their commercial R&D activities. To further intensify the R&D efforts, the National Research Foundation (NRF) was set up to coordinate the research activities of different agencies within the larger national framework and to fund strategic R&D initiatives. In recent years, funding agencies have begun to focus on support of translational and clinical research, particularly research with potential for commercialization. Translational research is beginning to gain traction, in particular research funding for the development of innovative medical devices. Therefore, the Biomedical Sciences sector is projected to grow, which means that there is a need to invest in human capital development to achieve sustainable growth. In support of this, education and training programs to strengthen the manpower capabilities for the Biomedical Sciences industry have been developed. In recent years, undergraduate and graduate degree courses in biomedical engineering/bioengineering have been developing at a rapid rate. The goal is to train students with skills to understand complex issues of biomedicine and to develop and implement advanced technological applications to these problems. There is a variety of career opportunities open to graduates in biomedical engineering; however, regardless of career choice, students must not focus only on achieving good grades. They have to develop their marketability to employers through internships, overseas exchange programs, and involvement in leadership-type activities. Furthermore, the curriculum has to be developed with biomedical innovation in mind and must ensure relevance to the industry. The objective of this paper is to present the NUS Bioengineering undergraduate program in relation to manpower development for the biomedical

  6. Gene Ontology Annotations and Resources

    PubMed Central

    2013-01-01

    The Gene Ontology (GO) Consortium (GOC, http://www.geneontology.org) is a community-based bioinformatics resource that classifies gene product function through the use of structured, controlled vocabularies. Over the past year, the GOC has implemented several processes to increase the quantity, quality and specificity of GO annotations. First, the number of manual, literature-based annotations has grown at an increasing rate. Second, as a result of a new ‘phylogenetic annotation’ process, manually reviewed, homology-based annotations are becoming available for a broad range of species. Third, the quality of GO annotations has been improved through a streamlined process for, and automated quality checks of, GO annotations deposited by different annotation groups. Fourth, the consistency and correctness of the ontology itself have increased through the use of automated reasoning tools. Finally, the GO has been expanded not only to cover new areas of biology through focused interaction with experts, but also to capture greater specificity in all areas of the ontology using tools for adding new combinatorial terms. The GOC works closely with other ontology developers to support integrated use of terminologies. The GOC supports its user community through the use of e-mail lists, social media and web-based resources. PMID:23161678

  7. Towards a Consensus Annotation System (GSC8 Meeting)

    ScienceCinema

    White, Owen [University of Maryland]

    2011-04-28

    The Genomic Standards Consortium was formed in September 2005. It is an international, open-membership working body which promotes standardization in the description of genomes and the exchange and integration of genomic data. The 2009 meeting was an activity of a five-year funding "Research Coordination Network" from the National Science Foundation and was held at the DOE Joint Genome Institute with organizational support provided by the JGI and by the University of California - San Diego. "Comparing Annotations: Towards Consensus Annotation" at the Genomic Standards Consortium's 8th meeting at the DOE JGI in Walnut Creek, Calif. on Sept. 10, 2009

  8. Towards a Consensus Annotation System (GSC8 Meeting)

    SciTech Connect

    White, Owen

    2009-09-10

    The Genomic Standards Consortium was formed in September 2005. It is an international, open-membership working body which promotes standardization in the description of genomes and the exchange and integration of genomic data. The 2009 meeting was an activity of a five-year funding "Research Coordination Network" from the National Science Foundation and was held at the DOE Joint Genome Institute with organizational support provided by the JGI and by the University of California - San Diego. "Comparing Annotations: Towards Consensus Annotation" at the Genomic Standards Consortium's 8th meeting at the DOE JGI in Walnut Creek, Calif. on Sept. 10, 2009

  9. Nanoparticles for biomedical imaging

    PubMed Central

    Nune, Satish K; Gunda, Padmaja; Thallapally, Praveen K; Lin, Ying-Ying; Forrest, M Laird; Berkland, Cory J

    2011-01-01

    Background: Synthetic nanoparticles are emerging as versatile tools in biomedical applications, particularly in the area of biomedical imaging. Nanoparticles 1–100 nm in diameter have dimensions comparable to biological functional units. Diverse surface chemistries, unique magnetic properties, tunable absorption and emission properties, and recent advances in the synthesis and engineering of various nanoparticles suggest their potential as probes for early detection of diseases such as cancer. Surface functionalization has expanded further the potential of nanoparticles as probes for molecular imaging. Objective: To summarize emerging research of nanoparticles for biomedical imaging with increased selectivity and reduced nonspecific uptake with increased spatial resolution containing stabilizers conjugated with targeting ligands. Methods: This review summarizes recent technological advances in the synthesis of various nanoparticle probes, and surveys methods to improve the targeting of nanoparticles for their application in biomedical imaging. Conclusion: Structural design of nanomaterials for biomedical imaging continues to expand and diversify. Synthetic methods have aimed to control the size and surface characteristics of nanoparticles to control distribution, half-life and elimination. Although molecular imaging applications using nanoparticles are advancing into clinical applications, challenges such as storage stability and long-term toxicology should continue to be addressed. PMID:19743894

  10. The center for expanded data annotation and retrieval.

    PubMed

    Musen, Mark A; Bean, Carol A; Cheung, Kei-Hoi; Dumontier, Michel; Durante, Kim A; Gevaert, Olivier; Gonzalez-Beltran, Alejandra; Khatri, Purvesh; Kleinstein, Steven H; O'Connor, Martin J; Pouliot, Yannick; Rocca-Serra, Philippe; Sansone, Susanna-Assunta; Wiser, Jeffrey A

    2015-11-01

    The Center for Expanded Data Annotation and Retrieval is studying the creation of comprehensive and expressive metadata for biomedical datasets to facilitate data discovery, data interpretation, and data reuse. We take advantage of emerging community-based standard templates for describing different kinds of biomedical datasets, and we investigate the use of computational techniques to help investigators to assemble templates and to fill in their values. We are creating a repository of metadata from which we plan to identify metadata patterns that will drive predictive data entry when filling in metadata templates. The metadata repository not only will capture annotations specified when experimental datasets are initially created, but also will incorporate links to the published literature, including secondary analyses and possible refinements or retractions of experimental interpretations. By working initially with the Human Immunology Project Consortium and the developers of the ImmPort data repository, we are developing and evaluating an end-to-end solution to the problems of metadata authoring and management that will generalize to other data-management environments. PMID:26112029

  11. Annotation of the Protein Coding Regions of the Equine Genome

    PubMed Central

    Hestand, Matthew S.; Kalbfleisch, Theodore S.; Coleman, Stephen J.; Zeng, Zheng; Liu, Jinze; Orlando, Ludovic; MacLeod, James N.

    2015-01-01

    Current gene annotation of the horse genome is largely derived from in silico predictions and cross-species alignments. Only a small number of genes are annotated based on equine EST and mRNA sequences. To expand the number of equine genes annotated from equine experimental evidence, we sequenced mRNA from a pool of forty-three different tissues. From these, we derived the structures of 68,594 transcripts. In addition, we identified 301,829 positions with SNPs or small indels within these transcripts relative to EquCab2. Interestingly, 780 variants extend the open reading frame of the transcript and appear to be small errors in the equine reference genome, since they are also identified as homozygous variants by genomic DNA resequencing of the reference horse. Taken together, we provide a resource of equine mRNA structures and protein coding variants that will enhance equine and cross-species transcriptional and genomic comparisons. PMID:26107351

  12. CoMAGC: a corpus with multi-faceted annotations of gene-cancer relations

    PubMed Central

    2013-01-01

    Background: In order to access the large amount of information in biomedical literature about genes implicated in various cancers both efficiently and accurately, the aid of text mining (TM) systems is invaluable. Current TM systems do target either gene-cancer relations or biological processes involving genes and cancers, but the former type produces information not comprehensive enough to explain how a gene affects a cancer, and the latter does not provide a concise summary of gene-cancer relations. Results: In this paper, we present a corpus for the development of TM systems that are specifically targeting gene-cancer relations but are still able to capture complex information in biomedical sentences. We describe CoMAGC, a corpus with multi-faceted annotations of gene-cancer relations. In CoMAGC, a piece of annotation is composed of four semantically orthogonal concepts that together express 1) how a gene changes, 2) how a cancer changes and 3) the causality between the gene and the cancer. The multi-faceted annotations are shown to have high inter-annotator agreement. In addition, we show that the annotations in CoMAGC allow us to infer the prospective roles of genes in cancers and to classify the genes into three classes according to the inferred roles. We encode the mapping between multi-faceted annotations and gene classes into 10 inference rules. The inference rules produce results with high accuracy as measured against human annotations. CoMAGC consists of 821 sentences on prostate, breast and ovarian cancers. Currently, we deal with changes in gene expression levels among other types of gene changes. The corpus is available at http://biopathway.org/CoMAGC under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0). Conclusions: The corpus will be an important resource for the development of advanced TM systems on gene-cancer relations. PMID:24225062

  13. Biomedical materials and devices

    SciTech Connect

    Hanker, J. S.; Giammara, B. L.

    1989-01-01

    This conference reports on how biomedical materials and devices are undergoing important changes that require interdisciplinary approaches, innovation expertise, and access to sophisticated preparative and analytical equipment and methodologies. The interaction of materials scientists with biomedical, biotechnological, bioengineering and clinical scientists in the last decade has resulted in major advances in therapy. New therapeutic modalities and bioengineering methods and devices for the continuous removal of toxins or pathologic products present in arthritis, atherosclerosis and malignancy are presented. Novel monitoring and controlled drug delivery systems are discussed, as are materials such as blood or plasma substitutes, artificial organs, and bone graft substitutes.

  14. Commercial Biomedical Experiments Payload

    NASA Technical Reports Server (NTRS)

    2003-01-01

    Experiments to seek solutions for a range of biomedical issues are at the heart of several investigations that will be hosted by the Commercial Instrumentation Technology Associates (ITA), Inc. Biomedical Experiments (CIBX-2) payload. CIBX-2 is unique, encompassing more than 20 separate experiments including cancer research, commercial experiments, and student hands-on experiments from 10 schools as part of ITA's ongoing University Among the Stars program. Here, Astronaut Story Musgrave activates the CMIX-5 (Commercial MDA ITA experiment) payload, which is similar to CIBX-2, in the Space Shuttle middeck during the STS-80 mission in 1996. The experiments are sponsored by NASA's Space Product Development Program (SPD).

  15. Commercial Biomedical Experiments

    NASA Technical Reports Server (NTRS)

    2003-01-01

    Experiments to seek solutions for a range of biomedical issues are at the heart of several investigations that will be hosted by the Commercial Instrumentation Technology Associates (ITA), Inc. Biomedical Experiments (CIBX-2) payload. CIBX-2 is unique, encompassing more than 20 separate experiments including cancer research, commercial experiments, and student hands-on experiments from 10 schools as part of ITA's ongoing University Among the Stars program. Valerie Cassanto of ITA checks the Canadian Protein Crystallization Experiment (CAPE) carried by STS-86 to Mir in 1997. The experiments are sponsored by NASA's Space Product Development Program (SPD).

  16. Noninvasive biomedical sensor

    NASA Astrophysics Data System (ADS)

    Ling, Daniel; Bullock, Audra

    2003-07-01

    A non-invasive biomedical sensor for monitoring glucose levels is described. The sensor utilizes laser light to determine glucose levels in urine, but could also be used for drug screening and diagnosis of other medical conditions. The glucose measurement is based on modulation spectroscopy with harmonic analysis. Active signal processing and filtering are used to increase the signal-to-noise ratio and decrease the measurement time to allow for real-time sample analysis. Preliminary data are given which show the concentration of glucose in a control sample. Future applications of this technology, for example, as a portable multipurpose biomedical analysis tool, are explored.

  17. Elastomers for biomedical applications.

    PubMed

    Yoda, R

    1998-01-01

    Current topics in elastomers for biomedical applications are reviewed. Elastomeric biomaterials, such as silicones, thermoplastic elastomers, polyolefin and polydiene elastomers, poly(vinyl chloride), natural rubber, heparinized polymers, hydrogels, polypeptide elastomers and others are described. In addition, biomedical applications, such as cardiovascular devices, prosthetic devices, general medical care products, transdermal therapeutic systems, orthodontics, and ophthalmology, are reviewed. Elastomers will find increasing use in medical products, offering biocompatibility, durability, design flexibility, and favorable performance/cost ratios. Elastomers will play a key role in medical technology of the future. PMID:9659600

  18. Supporting undergraduate biomedical entrepreneurship.

    PubMed

    Patterson, P E

    2004-01-01

    As biomedical innovations become more sophisticated and expensive to bring to market, an approach is needed to ensure the survival of the best ideas. The tactic used by Iowa State University to provide entrepreneurship opportunities for undergraduate students in biomedical areas is a model that has proven to be both distinctive and effective. Iowa State supports and fosters undergraduate student entrepreneurship efforts through the Pappajohn Center for Entrepreneurship. This unique partnership encourages ISU faculty, researchers, and students to become involved in the world of entrepreneurship, while allowing Iowa's business communities to gain access to a wide array of available resources, skills, and information from Iowa State University. PMID:15134007

  19. Biomedical enhancements as justice.

    PubMed

    Nam, Jeesoo

    2015-02-01

    Biomedical enhancements, the applications of medical technology to make better those who are neither ill nor deficient, have made great strides in the past few decades. Using Amartya Sen's capability approach as my framework, I argue in this article that far from being simply permissible, we have a prima facie moral obligation to use these new developments for the end goal of promoting social justice. In terms of both range and magnitude, the use of biomedical enhancements will mark a radical advance in how we compensate the most disadvantaged members of society. PMID:24117708

  20. Collective dynamics of social annotation

    PubMed Central

    Cattuto, Ciro; Barrat, Alain; Baldassarri, Andrea; Schehr, Gregory; Loreto, Vittorio

    2009-01-01

    The enormous increase in popularity and use of the worldwide web has led in recent years to important changes in the ways people communicate. An interesting example of this fact is provided by the now very popular social annotation systems, through which users annotate resources (such as web pages or digital photographs) with keywords known as “tags.” Understanding the rich emergent structures resulting from the uncoordinated actions of users calls for an interdisciplinary effort. In particular, concepts borrowed from statistical physics, such as random walks (RWs), and complex networks theory can effectively contribute to the mathematical modeling of social annotation systems. Here, we show that the process of social annotation can be seen as a collective but uncoordinated exploration of an underlying semantic space, pictured as a graph, through a series of RWs. This modeling framework reproduces several aspects, thus far unexplained, of social annotation, among which are the peculiar growth of the size of the vocabulary used by the community and its complex network structure that represents an externalization of semantic structures grounded in cognition and that are typically hard to access. PMID:19506244

  1. Collective dynamics of social annotation.

    PubMed

    Cattuto, Ciro; Barrat, Alain; Baldassarri, Andrea; Schehr, Gregory; Loreto, Vittorio

    2009-06-30

    The enormous increase in popularity and use of the worldwide web has led in recent years to important changes in the ways people communicate. An interesting example of this fact is provided by the now very popular social annotation systems, through which users annotate resources (such as web pages or digital photographs) with keywords known as "tags." Understanding the rich emergent structures resulting from the uncoordinated actions of users calls for an interdisciplinary effort. In particular, concepts borrowed from statistical physics, such as random walks (RWs), and complex networks theory can effectively contribute to the mathematical modeling of social annotation systems. Here, we show that the process of social annotation can be seen as a collective but uncoordinated exploration of an underlying semantic space, pictured as a graph, through a series of RWs. This modeling framework reproduces several aspects, thus far unexplained, of social annotation, among which are the peculiar growth of the size of the vocabulary used by the community and its complex network structure that represents an externalization of semantic structures grounded in cognition and that are typically hard to access. PMID:19506244

  2. National Space Biomedical Research Institute Annual Report

    NASA Technical Reports Server (NTRS)

    2000-01-01

    This report summarizes the activities of the National Space Biomedical Research Institute (NSBRI) during FY 2000. The NSBRI is responsible for the development of countermeasures against the deleterious effects of long-duration space flight and performs fundamental and applied space biomedical research directed towards this specific goal. Its mission is to lead a world-class, national effort in integrated, critical path space biomedical research that supports NASA's Human Exploration and Development of Space (HEDS) Strategic Plan by focusing on the enabling of long-term human presence in, development of, and exploration of space. This is accomplished by: designing, testing and validating effective countermeasures to address the biological and environmental impediments to long-term human space flight; defining the molecular, cellular, organ-level, integrated responses and mechanistic relationships that ultimately determine these impediments, where such activity fosters the development of novel countermeasures; establishing biomedical support technologies to maximize human performance in space, reduce biomedical hazards to an acceptable level, and deliver quality medical care; transferring and disseminating the biomedical advances in knowledge and technology acquired through living and working in space to the general benefit of mankind, including the treatment of patients suffering from gravity- and radiation-related conditions on Earth; and ensuring open involvement of the scientific community, industry and the public at large in the Institute's activities and fostering a robust collaboration with NASA, particularly through NASA's Lyndon B. Johnson Space Center. Attachment: Appendices A through P.

  3. Whiplash: a selective annotated bibliography

    PubMed Central

    Smith, Brad MT; Adams, Alan

    1997-01-01

    Objective: To review the literature on whiplash injury including an overview, collision mechanics, pathophysiology, neurobehavioral, imaging, treatment/management, prognosis, outcomes, and litigation. Design: An annotated bibliography. Methods: A literature search of MEDLINE from 1987 to 1995 and CHIROLARS from 1900 to 1996, with emphasis on the last ten years, was performed. Conference proceedings and the personal files of the authors were searched for relevant citations. Key words utilized in the search were whiplash injury, acceleration/deceleration injury, neck pain, head pain, cognitive impairment, treatment, imaging, prognosis and litigation. Results: This annotated bibliography identifies key studies and potential models for future research. Conclusions: There is currently a lack of clinical consensus both in practice and in the literature regarding the evaluation and management of an episode of whiplash injury. This annotated bibliography has been developed in an attempt to provide an overview of the literature regarding various issues surrounding an episode of whiplash injury.

  4. Biomedical Engineering in Modern Society

    ERIC Educational Resources Information Center

    Attinger, E. O.

    1971-01-01

    Considers definition of biomedical engineering (BME) and how biomedical engineers should be trained. State of the art descriptions of BME and BME education are followed by a brief look at the future of BME. (TS)

  5. Vcfanno: fast, flexible annotation of genetic variants.

    PubMed

    Pedersen, Brent S; Layer, Ryan M; Quinlan, Aaron R

    2016-01-01

    The integration of genome annotations is critical to the identification of genetic variants that are relevant to studies of disease or other traits. However, comprehensive variant annotation with diverse file formats is difficult with existing methods. Here we describe vcfanno, which flexibly extracts and summarizes attributes from multiple annotation files and integrates the annotations within the INFO column of the original VCF file. By leveraging a parallel "chromosome sweeping" algorithm, we demonstrate substantial performance gains by annotating ~85,000 variants per second with 50 attributes from 17 commonly used genome annotation resources. Vcfanno is available at https://github.com/brentp/vcfanno under the MIT license. PMID:27250555

  6. Anatomy for Biomedical Engineers

    ERIC Educational Resources Information Center

    Carmichael, Stephen W.; Robb, Richard A.

    2008-01-01

    There is a perceived need for anatomy instruction for graduate students enrolled in a biomedical engineering program. This appeared especially important for students interested in and using medical images. These students typically did not have a strong background in biology. The authors arranged for students to dissect regions of the body that…

  7. Biomedical applications in EELA.

    PubMed

    Cardenas, Miguel; Hernández, Vicente; Mayo, Rafael; Blanquer, Ignacio; Perez-Griffo, Javier; Isea, Raul; Nuñez, Luis; Mora, Henry Ricardo; Fernández, Manuel

    2006-01-01

    The current demand for Grid infrastructures to bring together collaborating groups in Latin America and Europe has created the EELA project. This e-infrastructure is used by biomedical groups in Latin America and Europe for studies of oncological analysis, neglected diseases, sequence alignments and computational phylogenetics. PMID:16823158

  8. What is biomedical informatics?

    PubMed Central

    Bernstam, Elmer V.; Smith, Jack W.; Johnson, Todd R.

    2009-01-01

    Biomedical informatics lacks a clear and theoretically grounded definition. Many proposed definitions focus on data, information, and knowledge, but do not provide an adequate definition of these terms. Leveraging insights from the philosophy of information, we define informatics as the science of information, where information is data plus meaning. Biomedical informatics is the science of information as applied to or studied in the context of biomedicine. Defining the object of study of informatics as data plus meaning clearly distinguishes the field from related fields, such as computer science, statistics and biomedicine, which have different objects of study. The emphasis on data plus meaning also suggests that biomedical informatics problems tend to be difficult when they deal with concepts that are hard to capture using formal, computational definitions. In other words, problems where meaning must be considered are more difficult than problems where manipulating data without regard for meaning is sufficient. Furthermore, the definition implies that informatics research, teaching, and service should focus on biomedical information as data plus meaning rather than only computer applications in biomedicine. PMID:19683067

  9. Biomedical Results of Apollo

    NASA Technical Reports Server (NTRS)

    Johnston, R. S. (Editor); Dietlein, L. F. (Editor); Berry, C. A. (Editor); Parker, James F. (Compiler); West, Vita (Compiler)

    1975-01-01

    The biomedical program developed for Apollo is described in detail. The findings of the investigations conducted to assess the effects of space flight on man's physiological and functional capacities are listed, and significant medical events in Apollo are documented. Topics discussed include crew health and inflight monitoring, preflight and postflight medical testing, inflight experiments, quarantine, and life support systems.

  10. Texture in Biomedical Images

    NASA Astrophysics Data System (ADS)

    Petrou, Maria

    An overview of texture analysis methods is given and the merits of each method for biomedical applications are discussed. Methods discussed include Markov random fields, Gibbs distributions, co-occurrence matrices, Gabor functions and wavelets, Karhunen-Loève basis images, and local symmetry and orientation from the monogenic signal. Some example applications of texture to medical image processing are reviewed.

  11. Holography In Biomedical Sciences

    NASA Astrophysics Data System (ADS)

    von Bally, G.

    1988-01-01

    Today not only physicists and engineers but also biological and medical scientists are exploring the potentials of holographic methods in their special field of work. Most of the underlying physical principles such as coherence, interference, diffraction and polarization as well as general features of holography e.g. storage and retrieval of amplitude and phase of a wavefront, 3-d-imaging, large field of depth, redundant storage of information, spatial filtering, high-resolving, non-contactive, 3-d form and motion analysis are explained in detail in other contributions to this book. Therefore, this article is confined to the applications of holography in biomedical sciences. Because of the great number of contributions and the variety of applications [1,2,3,4,5,6,7,8] in this review the investigations can only be mentioned briefly and the survey has to be confined to some examples. As in all fields of optics and laser metrology, a review of biomedical applications of holography would be incomplete if military developments and their utilization are not mentioned. As will be demonstrated by selected examples the increasing interlacing of science with the military does not stop at domains that traditionally are regarded as exclusively oriented to human welfare like biomedical research [9]. This fact is actually characterized and stressed by the expression "Star Wars Medicine", which becomes increasingly common as popular description for laser applications (including holography) in medicine [10]. Thus, the consequences - even in such highly specialized fields like biomedical applications of holography - have to be discussed.

  12. Implantable CMOS Biomedical Devices

    PubMed Central

    Ohta, Jun; Tokuda, Takashi; Sasagawa, Kiyotaka; Noda, Toshihiko

    2009-01-01

    The results of recent research on our implantable CMOS biomedical devices are reviewed. Topics include retinal prosthesis devices and deep-brain implantation devices for small animals. Fundamental device structures and characteristics as well as in vivo experiments are presented. PMID:22291554

  13. Principles of Biomedical Ethics

    PubMed Central

    Athar, Shahid

    2012-01-01

    In this presentation, I will discuss the principles of biomedical and Islamic medical ethics and an interfaith perspective on end-of-life issues. I will also discuss three cases to exemplify some of the conflicts in ethical decision-making. PMID:23610498

  14. Cloud Based Metalearning System for Predictive Modeling of Biomedical Data

    PubMed Central

    Vukićević, Milan

    2014-01-01

    Rapid growth and storage of biomedical data has enabled many opportunities for predictive modeling and improvement of healthcare processes. On the other hand, analysis of such large amounts of data is a difficult and computationally intensive task for most existing data mining algorithms. This problem is addressed by proposing a cloud based system that integrates a metalearning framework for ranking and selection of the best predictive algorithms for the data at hand with open source big data technologies for analysis of biomedical data. PMID:24892101

  15. Software Suite for Gene and Protein Annotation Prediction and Similarity Search.

    PubMed

    Chicco, Davide; Masseroli, Marco

    2015-01-01

    In the computational biology community, machine learning algorithms are key instruments for many applications, including the prediction of gene-functions based upon the available biomolecular annotations. Additionally, they may also be employed to compute similarity between genes or proteins. Here, we describe and discuss a software suite we developed to implement and make publicly available some of these prediction methods and a computational technique based upon Latent Semantic Indexing (LSI), which leverages both inferred and available annotations to search for semantically similar genes. The suite consists of three components. BioAnnotationPredictor is a computational software module to predict new gene-functions based upon Singular Value Decomposition of available annotations. SimilBio is a Web module that leverages annotations available or predicted by BioAnnotationPredictor to discover similarities between genes via LSI. The suite also includes SemSim, a new Web service built upon these modules to allow accessing them programmatically. We integrated SemSim in the Bio Search Computing framework (http://www.bioinformatics.deib.polimi.it/bio-seco/seco/), where users can exploit the Search Computing technology to run multi-topic complex queries on multiple integrated Web services. Accordingly, researchers may obtain ranked answers involving the computation of the functional similarity between genes in support of biomedical knowledge discovery. PMID:26357324

  16. Alignment-Annotator web server: rendering and annotating sequence alignments

    PubMed Central

    Gille, Christoph; Fähling, Michael; Weyand, Birgit; Wieland, Thomas; Gille, Andreas

    2014-01-01

    Alignment-Annotator is a novel web service designed to generate interactive views of annotated nucleotide and amino acid sequence alignments (i) de novo and (ii) embedded in other software. All computations are performed at server side. Interactivity is implemented in HTML5, a language native to web browsers. The alignment is initially displayed using default settings and can be modified with the graphical user interfaces. For example, individual sequences can be reordered or deleted using drag and drop, amino acid color code schemes can be applied and annotations can be added. Annotations can be made manually or imported (BioDAS servers, the UniProt, the Catalytic Site Atlas and the PDB). Some edits take immediate effect while others require server interaction and may take a few seconds to execute. The final alignment document can be downloaded as a zip-archive containing the HTML files. Because of the use of HTML the resulting interactive alignment can be viewed on any platform including Windows, Mac OS X, Linux, Android and iOS in any standard web browser. Importantly, neither plugins nor Java are required and therefore Alignment-Annotator represents the first interactive browser-based alignment visualization. Availability: http://www.bioinformatics.org/strap/aa/ and http://strap.charite.de/aa/. PMID:24813445

  17. Managing, analysing, and integrating big data in medical bioinformatics: open problems and future perspectives.

    PubMed

    Merelli, Ivan; Pérez-Sánchez, Horacio; Gesing, Sandra; D'Agostino, Daniele

    2014-01-01

    The explosion of data both in biomedical research and in healthcare systems demands urgent solutions. In particular, the research in omics sciences is moving from a hypothesis-driven to a data-driven approach. Healthcare is additionally always asking for a tighter integration with biomedical data in order to promote personalized medicine and to provide better treatments. Efficient analysis and interpretation of Big Data opens new avenues to explore molecular biology, new questions to ask about physiological and pathological states, and new ways to answer these open issues. Such analyses lead to better understanding of diseases and development of better and personalized diagnostics and therapeutics. However, such progress is directly related to the availability of new solutions to deal with this huge amount of information. New paradigms are needed to store and access data, for its annotation and integration and finally for inferring knowledge and making it available to researchers. Bioinformatics can be viewed as the "glue" for all these processes. A clear awareness of present high performance computing (HPC) solutions in bioinformatics, Big Data analysis paradigms for computational biology, and the issues that are still open in the biomedical and healthcare fields represent the starting point to win this challenge. PMID:25254202

  18. Managing, Analysing, and Integrating Big Data in Medical Bioinformatics: Open Problems and Future Perspectives

    PubMed Central

    Merelli, Ivan; Pérez-Sánchez, Horacio; Gesing, Sandra; D'Agostino, Daniele

    2014-01-01

    The explosion of data both in biomedical research and in healthcare systems demands urgent solutions. In particular, the research in omics sciences is moving from a hypothesis-driven to a data-driven approach. Healthcare is additionally always asking for a tighter integration with biomedical data in order to promote personalized medicine and to provide better treatments. Efficient analysis and interpretation of Big Data opens new avenues to explore molecular biology, new questions to ask about physiological and pathological states, and new ways to answer these open issues. Such analyses lead to better understanding of diseases and development of better and personalized diagnostics and therapeutics. However, such progress is directly related to the availability of new solutions to deal with this huge amount of information. New paradigms are needed to store and access data, for its annotation and integration and finally for inferring knowledge and making it available to researchers. Bioinformatics can be viewed as the “glue” for all these processes. A clear awareness of present high performance computing (HPC) solutions in bioinformatics, Big Data analysis paradigms for computational biology, and the issues that are still open in the biomedical and healthcare fields represent the starting point to win this challenge. PMID:25254202

  19. DEVA: An extensible ontology-based annotation model for visual document collections

    NASA Astrophysics Data System (ADS)

    Jelmini, Carlo; Marchand-Maillet, Stephane

    2003-01-01

    The description of visual documents is a fundamental aspect of any efficient information management system, but the process of manually annotating large collections of documents is tedious and far from perfect. The need for a generic and extensible annotation model therefore arises. In this paper, we present DEVA, an open, generic and expressive multimedia annotation framework. DEVA is an extension of the Dublin Core specification. The model can represent the semantic content of any visual document. It is described in the ontology language DAML+OIL and can easily be extended with external specialized ontologies, adapting the vocabulary to the given application domain. In parallel, we present the Magritte annotation tool, which is an early prototype that validates the DEVA features. Magritte allows users to manually annotate image collections. It is designed with a modular and extensible architecture, which enables the user to dynamically adapt the user interface to specialized ontologies merged into DEVA.

  20. Improving Genome Assemblies and Annotations for Nonhuman Primates

    PubMed Central

    Norgren, Robert B.

    2013-01-01

    The study of nonhuman primates (NHP) is key to understanding human evolution, in addition to being an important model for biomedical research. NHPs are especially important for translational medicine. There are now exciting opportunities to greatly increase the utility of these models by incorporating Next Generation (NextGen) sequencing into study design. Unfortunately, the draft status of nonhuman genomes greatly constrains what can currently be accomplished with available technology. Although all genomes contain errors, draft assemblies and annotations contain so many mistakes that they make currently available nonhuman primate genomes misleading to investigators conducting evolutionary studies; and these genomes are of insufficient quality to serve as references for NextGen studies. Fortunately, NextGen sequencing can be used in the production of greatly improved genomes. Existing Sanger sequences can be supplemented with NextGen whole genome, and exomic genomic sequences to create new, more complete and correct assemblies. Additional physical mapping, and an incorporation of information about gene structure, can be used to improve assignment of scaffolds to chromosomes. In addition, mRNA-sequence data can be used to economically acquire transcriptome information, which can be used for annotation. Some highly polymorphic and complex regions, for example MHC class I and immunoglobulin loci, will require extra effort to properly assemble and annotate. However, for the vast majority of genes, a modest investment in money, and a somewhat greater investment in time, can greatly improve assemblies and annotations sufficient to produce true, reference grade nonhuman primate genomes. Such resources can reasonably be expected to transform nonhuman primate research. PMID:24174438

  1. Lynx web services for annotations and systems analysis of multi-gene disorders.

    PubMed

    Sulakhe, Dinanath; Taylor, Andrew; Balasubramanian, Sandhya; Feng, Bo; Xie, Bingqing; Börnigen, Daniela; Dave, Utpal J; Foster, Ian T; Gilliam, T Conrad; Maltsev, Natalia

    2014-07-01

    Lynx is a web-based integrated systems biology platform that supports annotation and analysis of experimental data and generation of weighted hypotheses on molecular mechanisms contributing to human phenotypes and disorders of interest. Lynx has integrated multiple classes of biomedical data (genomic, proteomic, pathways, phenotypic, toxicogenomic, contextual and others) from various public databases as well as manually curated data from our group and collaborators (LynxKB). Lynx provides tools for gene list enrichment analysis using multiple functional annotations and network-based gene prioritization. Lynx provides access to the integrated database and the analytical tools via REST based Web Services (http://lynx.ci.uchicago.edu/webservices.html). This comprises data retrieval services for specific functional annotations, services to search across the complete LynxKB (powered by Lucene), and services to access the analytical tools built within the Lynx platform. PMID:24948611

  2. Lynx web services for annotations and systems analysis of multi-gene disorders

    PubMed Central

    Sulakhe, Dinanath; Taylor, Andrew; Balasubramanian, Sandhya; Feng, Bo; Xie, Bingqing; Börnigen, Daniela; Dave, Utpal J.; Foster, Ian T.; Gilliam, T. Conrad; Maltsev, Natalia

    2014-01-01

    Lynx is a web-based integrated systems biology platform that supports annotation and analysis of experimental data and generation of weighted hypotheses on molecular mechanisms contributing to human phenotypes and disorders of interest. Lynx has integrated multiple classes of biomedical data (genomic, proteomic, pathways, phenotypic, toxicogenomic, contextual and others) from various public databases as well as manually curated data from our group and collaborators (LynxKB). Lynx provides tools for gene list enrichment analysis using multiple functional annotations and network-based gene prioritization. Lynx provides access to the integrated database and the analytical tools via REST based Web Services (http://lynx.ci.uchicago.edu/webservices.html). This comprises data retrieval services for specific functional annotations, services to search across the complete LynxKB (powered by Lucene), and services to access the analytical tools built within the Lynx platform. PMID:24948611

  3. Systems Theory and Communication. Annotated Bibliography.

    ERIC Educational Resources Information Center

    Covington, William G., Jr.

    This annotated bibliography presents annotations of 31 books and journal articles dealing with systems theory and its relation to organizational communication, marketing, information theory, and cybernetics. Materials were published between 1963 and 1992 and are listed alphabetically by author. (RS)

  4. Annotated Bibliography, Grades K-6.

    ERIC Educational Resources Information Center

    Massachusetts Dept. of Education, Boston. Bureau of Nutrition Education and School Food Services.

    This annotated bibliography on nutrition is for the use of teachers at the elementary grade level. It contains a list of books suitable for reading about nutrition and foods for pupils from kindergarten through the sixth grade. Films and audiovisual presentations for classroom use are also listed. The names and addresses from which these materials…

  5. Vietnamese Amerasians: An Annotated Bibliography.

    ERIC Educational Resources Information Center

    Johnson, Mark C.; And Others

    This annotated bibliography on Vietnamese Amerasians includes primary and secondary sources as well as reviews of three documentary films. Sources were selected in order to provide an overview of the historical and political context of Amerasian resettlement and a review of the scant available research on coping and adaptation with this…

  6. Radiocarbon Dating: An Annotated Bibliography.

    ERIC Educational Resources Information Center

    Fortine, Suellen

    This selective annotated bibliography covers various sources of information on the radiocarbon dating method, including journal articles, conference proceedings, and reports, reflecting the most important and useful sources of the last 25 years. The bibliography is divided into five parts--general background on radiocarbon, radiocarbon dating,…

  7. Instructional Materials Centers; Annotated Bibliography.

    ERIC Educational Resources Information Center

    Poli, Rosario, Comp.

    An annotated bibliography lists 74 articles and reports on instructional materials centers (IMC) which appeared from 1967-70. The articles deal with such topics as the purposes of an IMC, guidelines for setting up an IMC, and the relationship of an IMC to technology. Most articles deal with use of an IMC on an elementary or secondary level, but…

  8. Bibliotherapy--An Annotated Bibliography.

    ERIC Educational Resources Information Center

    Riggs, Corinne W.

    This annotated bibliography on bibliotherapy is composed of 138 citations ranging in date from 1936 to 1967. It is designed to aid teachers and librarians in modifying the attitudes and behavior of boys and girls. Its listings are arranged alphabetically according to author under the general divisions of books, periodicals, and unpublished…

  9. MSDAC Resource Library Annotated Bibliography.

    ERIC Educational Resources Information Center

    Watson, Cristel; And Others

    This annotated bibliography lists books, films, filmstrips, recordings, and booklets on sex equity. Entries are arranged according to the following topics: career resources, curriculum resources, management, sex equity, sex roles, women's studies, student activities, and sex-fair fiction. Included in each entry are name of author, editor or…

  10. Teacher Evaluation: An Annotated Bibliography.

    ERIC Educational Resources Information Center

    McKenna, Bernard H.; And Others

    In his introduction to the 86-item annotated bibliography by Mueller and Poliakoff, McKenna discusses his views on teacher evaluation and his impressions of the documents cited. He observes, in part, that the current concern is with the process of evaluation and that most researchers continue to believe that student achievement is the most…

  11. Music Analysis: An Annotated Bibliography.

    ERIC Educational Resources Information Center

    Fink, Michael

    One hundred and forty citations comprise this annotated bibliography of books, articles, and selected dissertations that encompass trends in music theory and K-16 music education since the late 19th century. Special emphasis is upon writings since the 1950's. During earlier development, music analysts concentrated upon the elements of music (i.e.,…

  12. Teacher Aides; An Annotated Bibliography.

    ERIC Educational Resources Information Center

    Marin County Public Schools, Corte Madera, CA.

    This annotated bibliography lists 40 items, published between 1966 and 1971, that have to do with teacher aides. The listing is arranged alphabetically by author. In addition to the abstract and standard bibliographic information, addresses where the material can be purchased are often included. The items cited include handbooks, research studies,…

  13. Staff Differentiation. An Annotated Bibliography.

    ERIC Educational Resources Information Center

    Marin County Superintendent of Schools, Corte Madera, CA.

    This annotated bibliography reviews selected literature focusing on the concept of staff differentiation. Included are 62 items (dated 1966-1970), along with a list of mailing addresses where copies of individual items can be obtained. Also a list of 31 staff differentiation projects receiving financial assistance from the U.S. Office of Education…

  14. Infant Feeding: An Annotated Bibliography.

    ERIC Educational Resources Information Center

    Crowhurst, Christine Marie, Comp.; Kumer, Bonnie Lee, Comp.

    Intended for parents, health professionals and allied health workers, and others involved in caring for infants and young children, this annotated bibliography brings together in one selective listing a review of over 700 current publications related to infant feeding. Reflecting current knowledge in infant feeding, the bibliography has as its…

  15. English Language Learners: Annotated Bibliography

    ERIC Educational Resources Information Center

    Hector-Mason, Anestine; Bardack, Sarah

    2010-01-01

    This annotated bibliography represents a first step toward compiling a comprehensive overview of current research on issues related to English language learners (ELLs). It is intended to be a resource for researchers, policymakers, administrators, and educators who are engaged in efforts to bridge the divide between research, policy, and practice…

  16. Workforce Reductions. An Annotated Bibliography.

    ERIC Educational Resources Information Center

    Hickok, Thomas A.

    This report, which is based on a review of practitioner-oriented sources and scholarly journals, uses a three-part framework to organize annotated bibliographies that, together, list a total of 104 sources that provide the following three perspectives on work force reduction issues: organizational, organizational-individual relationship, and…

  17. Service Integration: An Annotated Bibliography.

    ERIC Educational Resources Information Center

    Chaudry, Ajay; And Others

    This annotated bibliography describes 53 books, papers, and articles written about efforts toward integrating and improving human services for children, youth, and families living in poverty. The bibliography has been developed for individuals working on and interested in service integration, including policymakers, program administrators,…

  18. Appalachian Women. An Annotated Bibliography.

    ERIC Educational Resources Information Center

    Hamm, Mary Margo

    This bibliography compiles annotations of 178 books, journal articles, ERIC documents, and dissertations on Appalachian women and their social, cultural, and economic environment. Entries were published 1966-93 and are listed in the following categories: (1) authors and literary criticism; (2) bibliographies and resource guides; (3) economics,…

  19. Determining similarity of scientific entities in annotation datasets.

    PubMed

    Palma, Guillermo; Vidal, Maria-Esther; Haag, Eric; Raschid, Louiqa; Thor, Andreas

    2015-01-01

    Linked Open Data initiatives have made available a diversity of scientific collections where scientists have annotated entities in the datasets with controlled vocabulary terms from ontologies. Annotations encode scientific knowledge, which is captured in annotation datasets. Determining relatedness between annotated entities becomes a building block for pattern mining, e.g. identifying drug-drug relationships may depend on the similarity of the targets that interact with each drug. A diversity of similarity measures has been proposed in the literature to compute relatedness between a pair of entities. Each measure exploits some knowledge including the name, function, relationships with other entities, taxonomic neighborhood and semantic knowledge. We propose a novel general-purpose annotation similarity measure called 'AnnSim' that measures the relatedness between two entities based on the similarity of their annotations. We model AnnSim as a 1-1 maximum weight bipartite match and exploit properties of existing solvers to provide an efficient solution. We empirically study the performance of AnnSim on real-world datasets of drugs and disease associations from clinical trials and relationships between drugs and (genomic) targets. Using baselines that include a variety of measures, we identify where AnnSim can provide a deeper understanding of the semantics underlying the relatedness of a pair of entities or where it could lead to predicting new links or identifying potential novel patterns. Although AnnSim does not exploit knowledge or properties of a particular domain, its performance compares well with a variety of state-of-the-art domain-specific measures. Database URL: http://www.yeastgenome.org/ PMID:25725057
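
    The 1-1 maximum weight bipartite matching idea in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the brute-force search stands in for the "existing solvers" the abstract mentions, the normalisation 2 * (matched weight) / (|A| + |B|) is one plausible way to turn the matching weight into a score in [0, 1], and the term names and similarity table are invented toy data, not taken from the paper.

```python
from itertools import permutations


def max_weight_matching(a, b, sim):
    """Brute-force 1-1 maximum weight bipartite matching.

    Only suitable for small annotation sets; a real system would use a
    polynomial-time assignment solver instead.
    """
    swapped = len(a) > len(b)
    small, large = (b, a) if swapped else (a, b)
    best_weight, best_pairs = -1.0, []
    # Try every way of assigning each item of the smaller side
    # to a distinct item of the larger side.
    for perm in permutations(large, len(small)):
        pairs = list(zip(small, perm))
        if swapped:
            pairs = [(y, x) for x, y in pairs]
        weight = sum(sim(x, y) for x, y in pairs)
        if weight > best_weight:
            best_weight, best_pairs = weight, pairs
    return best_weight, best_pairs


def annsim_like(a, b, sim):
    """Assumed normalisation: twice the matched weight over the set sizes."""
    if not a or not b:
        return 0.0
    weight, _ = max_weight_matching(a, b, sim)
    return 2.0 * weight / (len(a) + len(b))


# Hypothetical pairwise term similarities (toy data).
TOY_SIM = {("t2", "t3"): 0.5}


def toy_sim(x, y):
    if x == y:
        return 1.0
    return TOY_SIM.get((x, y), TOY_SIM.get((y, x), 0.0))


drug_1 = ["t1", "t2"]
drug_2 = ["t1", "t3", "t4"]
score = annsim_like(drug_1, drug_2, toy_sim)
```

    Restricting the match to 1-1 pairs means each annotation contributes at most once, so a single term that resembles many others cannot inflate the score.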

  20. Determining similarity of scientific entities in annotation datasets

    PubMed Central

    Palma, Guillermo; Vidal, Maria-Esther; Haag, Eric; Raschid, Louiqa; Thor, Andreas

    2015-01-01

    Linked Open Data initiatives have made available a diversity of scientific collections where scientists have annotated entities in the datasets with controlled vocabulary terms from ontologies. Annotations encode scientific knowledge, which is captured in annotation datasets. Determining relatedness between annotated entities becomes a building block for pattern mining, e.g. identifying drug–drug relationships may depend on the similarity of the targets that interact with each drug. A diversity of similarity measures has been proposed in the literature to compute relatedness between a pair of entities. Each measure exploits some knowledge including the name, function, relationships with other entities, taxonomic neighborhood and semantic knowledge. We propose a novel general-purpose annotation similarity measure called ‘AnnSim’ that measures the relatedness between two entities based on the similarity of their annotations. We model AnnSim as a 1–1 maximum weight bipartite match and exploit properties of existing solvers to provide an efficient solution. We empirically study the performance of AnnSim on real-world datasets of drugs and disease associations from clinical trials and relationships between drugs and (genomic) targets. Using baselines that include a variety of measures, we identify where AnnSim can provide a deeper understanding of the semantics underlying the relatedness of a pair of entities or where it could lead to predicting new links or identifying potential novel patterns. Although AnnSim does not exploit knowledge or properties of a particular domain, its performance compares well with a variety of state-of-the-art domain-specific measures. Database URL: http://www.yeastgenome.org/ PMID:25725057

  1. Next generation models for storage and representation of microbial biological annotation

    PubMed Central

    2010-01-01

    Background Traditional genome annotation systems were developed in a very different computing era, one where the World Wide Web was just emerging. Consequently, these systems are built as centralized black boxes focused on generating high quality annotation submissions to GenBank/EMBL supported by expert manual curation. The exponential growth of sequence data drives a growing need for increasingly higher quality and automatically generated annotation. Typical annotation pipelines utilize traditional database technologies, clustered computing resources, Perl, C, and UNIX file systems to process raw sequence data, identify genes, and predict and categorize gene function. These technologies tightly couple the annotation software system to hardware and third party software (e.g. relational database systems and schemas). This makes annotation systems hard to reproduce, inflexible to modification over time, difficult to assess, difficult to partition across multiple geographic sites, and difficult to understand for those who are not domain experts. These systems are not readily open to scrutiny and therefore not scientifically tractable. The advent of Semantic Web standards such as Resource Description Framework (RDF) and OWL Web Ontology Language (OWL) enables us to construct systems that address these challenges in a new comprehensive way. Results Here, we develop a framework for linking traditional data to OWL-based ontologies in genome annotation. We show how data standards can decouple hardware and third party software tools from annotation pipelines, thereby making annotation pipelines easier to reproduce and assess. An illustrative example shows how TURTLE (Terse RDF Triple Language) can be used as a human readable, but also semantically-aware, equivalent to GenBank/EMBL files. Conclusions The power of this approach lies in its ability to assemble annotation data from multiple databases across multiple locations into a representation that is understandable to

  2. Next Generation Models for Storage and Representation of Microbial Biological Annotation

    SciTech Connect

    Quest, Daniel J; Land, Miriam L; Brettin, Thomas S; Cottingham, Robert W

    2010-01-01

    Background Traditional genome annotation systems were developed in a very different computing era, one where the World Wide Web was just emerging. Consequently, these systems are built as centralized black boxes focused on generating high quality annotation submissions to GenBank/EMBL supported by expert manual curation. The exponential growth of sequence data drives a growing need for increasingly higher quality and automatically generated annotation. Typical annotation pipelines utilize traditional database technologies, clustered computing resources, Perl, C, and UNIX file systems to process raw sequence data, identify genes, and predict and categorize gene function. These technologies tightly couple the annotation software system to hardware and third party software (e.g. relational database systems and schemas). This makes annotation systems hard to reproduce, inflexible to modification over time, difficult to assess, difficult to partition across multiple geographic sites, and difficult to understand for those who are not domain experts. These systems are not readily open to scrutiny and therefore not scientifically tractable. The advent of Semantic Web standards such as Resource Description Framework (RDF) and OWL Web Ontology Language (OWL) enables us to construct systems that address these challenges in a new comprehensive way. Results Here, we develop a framework for linking traditional data to OWL-based ontologies in genome annotation. We show how data standards can decouple hardware and third party software tools from annotation pipelines, thereby making annotation pipelines easier to reproduce and assess. An illustrative example shows how TURTLE (Terse RDF Triple Language) can be used as a human readable, but also semantically-aware, equivalent to GenBank/EMBL files. Conclusions The power of this approach lies in its ability to assemble annotation data from multiple databases across multiple locations into a representation that is understandable to
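
    To make the idea of Turtle as a human-readable counterpart to a GenBank/EMBL feature more concrete, here is a minimal sketch that serialises one gene record as Turtle text. The `ex:` namespace, the property names (`ex:start`, `ex:end`, `ex:strand`, `ex:product`) and the sample record are hypothetical and are not taken from the paper's ontology; only the Turtle syntax itself (prefix declarations, `a` for rdf:type, `;` to continue a subject, bare integers) follows the standard.

```python
def turtle_string(text):
    """Quote a Python string as a Turtle string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def gene_to_turtle(gene):
    """Serialise one gene record (a dict) as Turtle.

    All ex: terms are placeholders for whatever ontology a real
    annotation pipeline would link to.
    """
    subject = f"ex:{gene['locus_tag']}"
    predicates = [
        "a ex:Gene",
        f"ex:start {int(gene['start'])}",
        f"ex:end {int(gene['end'])}",
        f"ex:strand {turtle_string(gene['strand'])}",
        f"ex:product {turtle_string(gene['product'])}",
    ]
    body = " ;\n    ".join(predicates)
    return (
        "@prefix ex: <http://example.org/annotation#> .\n\n"
        f"{subject} {body} .\n"
    )


# Invented sample record for illustration only.
record = {
    "locus_tag": "gene0001",
    "start": 190,
    "end": 255,
    "strand": "+",
    "product": 'hypothetical "leader" peptide',
}
turtle_text = gene_to_turtle(record)
print(turtle_text)
```

    Because each statement is a subject, predicate, object triple, records produced at different sites can be concatenated and queried together without agreeing on a shared relational schema first.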

  3. Adapting content-based image retrieval techniques for the semantic annotation of medical images.

    PubMed

    Kumar, Ashnil; Dyer, Shane; Kim, Jinman; Li, Changyang; Leong, Philip H W; Fulham, Michael; Feng, Dagan

    2016-04-01

    The automatic annotation of medical images is a prerequisite for building comprehensive semantic archives that can be used to enhance evidence-based diagnosis, physician education, and biomedical research. Annotation also has important applications in the automatic generation of structured radiology reports. Much of the prior research work has focused on annotating images with properties such as the modality of the image, or the biological system or body region being imaged. However, many challenges remain for the annotation of high-level semantic content in medical images (e.g., presence of calcification, vessel obstruction, etc.) due to the difficulty in discovering relationships and associations between low-level image features and high-level semantic concepts. This difficulty is further compounded by the lack of labelled training data. In this paper, we present a method for the automatic semantic annotation of medical images that leverages techniques from content-based image retrieval (CBIR). CBIR is a well-established image search technology that uses quantifiable low-level image features to represent the high-level semantic content depicted in those images. Our method extends CBIR techniques to identify or retrieve a collection of labelled images that have similar low-level features and then uses this collection to determine the best high-level semantic annotations. We demonstrate our annotation method using weighted nearest-neighbour retrieval and multi-class classification to show that our approach is viable regardless of the underlying retrieval strategy. We experimentally compared our method with several well-established baseline techniques (classification and regression) and showed that our method achieved the highest accuracy in the annotation of liver computed tomography (CT) images. PMID:26890880
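
    The retrieve-then-annotate idea described in the abstract above can be sketched as a distance-weighted nearest-neighbour vote. This is a generic illustration rather than the authors' method: the two-dimensional feature vectors, the labels, the choice of Euclidean distance, the inverse-distance weighting and the value of `k` are all assumptions made for the example.

```python
import math
from collections import defaultdict


def annotate(query, labelled, k=3, eps=1e-9):
    """Pick an annotation for `query` from its k nearest labelled images.

    `labelled` is a list of (feature_vector, label) pairs. Each retrieved
    neighbour votes for its label with weight 1 / (distance + eps), so
    closer images count for more.
    """
    nearest = sorted(labelled, key=lambda item: math.dist(query, item[0]))[:k]
    votes = defaultdict(float)
    for features, label in nearest:
        votes[label] += 1.0 / (math.dist(query, features) + eps)
    return max(votes, key=votes.get)


# Invented low-level feature vectors with high-level labels (toy data).
labelled_images = [
    ((0.9, 0.1), "calcification"),
    ((0.8, 0.2), "calcification"),
    ((0.1, 0.9), "no calcification"),
    ((0.2, 0.8), "no calcification"),
]

label_a = annotate((0.85, 0.15), labelled_images)
label_b = annotate((0.15, 0.85), labelled_images)
```

    The retrieval step and the voting step are separate here, which mirrors the abstract's point that the annotation stage can sit on top of different retrieval strategies.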

  4. New in protein structure and function annotation: hotspots, single nucleotide polymorphisms and the 'Deep Web'.

    PubMed

    Bromberg, Yana; Yachdav, Guy; Ofran, Yanay; Schneider, Reinhard; Rost, Burkhard

    2009-05-01

    The rapidly increasing quantity of protein sequence data continues to widen the gap between available sequences and annotations. Comparative modeling suggests some aspects of the 3D structures of approximately half of all known proteins; homology- and network-based inferences annotate some aspect of function for a similar fraction of the proteome. For most known protein sequences, however, there is detailed knowledge about neither their function nor their structure. Comprehensive efforts towards the expert curation of sequence annotations have failed to meet the demand of the rapidly increasing number of available sequences. Only the automated prediction of protein function in the absence of homology can close the gap between available sequences and annotations in the foreseeable future. This review focuses on two novel methods for automated annotation, and briefly presents an outlook on how modern web software may revolutionize the field of protein sequence annotation. First, predictions of protein binding sites and functional hotspots, and the evolution of these into the most successful type of prediction of protein function from sequence will be discussed. Second, a new tool, comprehensive in silico mutagenesis, which contributes important novel predictions of function and at the same time prepares for the onset of the next sequencing revolution, will be described. While these two new sub-fields of protein prediction represent the breakthroughs that have been achieved methodologically, it will then be argued that a different development might further change the way biomedical researchers benefit from annotations: modern web software can connect the worldwide web in any browser with the 'Deep Web' (ie, proprietary data resources). The availability of this direct connection, and the resulting access to a wealth of data, may impact drug discovery and development more than any existing method that contributes to protein annotation. PMID:19396742

  5. Annotation and Classification of Argumentative Writing Revisions

    ERIC Educational Resources Information Center

    Zhang, Fan; Litman, Diane

    2015-01-01

    This paper explores the annotation and classification of students' revision behaviors in argumentative writing. A sentence-level revision schema is proposed to capture why and how students make revisions. Based on the proposed schema, a small corpus of student essays and revisions was annotated. Studies show that manual annotation is reliable with…

  6. Alcohol Education Materials; An Annotated Bibliography.

    ERIC Educational Resources Information Center

    Milgram, Gail Gleason

    This 873-item annotated bibliography cites books, pamphlets, leaflets, and other materials produced for education about alcohol from 1950 to May 1973. The major part of each annotation is a brief summary of the contents. The annotation also contains a statement of orientation or type of presentation and evaluative comments. Each item is classified…

  7. Graphene for Biomedical Implants

    NASA Astrophysics Data System (ADS)

    Moore, Thomas; Podila, Ramakrishna; Alexis, Frank; Rao, Apparao; Clemson Bioengineering Team; Clemson Physics Team

    2013-03-01

    In this study, we used graphene, a one-atom thick sheet of carbon atoms, to modify the surfaces of existing implant materials to enhance both bio- and hemo-compatibility. This novel effort meets all functional criteria for a biomedical implant coating as it is chemically inert, atomically smooth and highly durable, with the potential for greatly enhancing the effectiveness of such implants. Specifically, graphene coatings on nitinol, a widely used implant and stent material, showed that graphene-coated nitinol (Gr-NiTi) supports excellent smooth muscle and endothelial cell growth leading to better cell proliferation. We further determined that the serum albumin adsorption on Gr-NiTi is greater than that of fibrinogen, an important and well understood criterion for promoting a lower thrombosis rate. These hemo- and biocompatible properties and associated charge transfer mechanisms, along with high strength, chemical inertness and durability, give graphene an edge over most antithrombogenic coatings for biomedical implants and devices.

  8. Ethics and biomedical information.

    PubMed

    France, F H

    1998-03-01

    Ethical rules are similar for physicians in most countries that follow the Hippocratic oath. They have no formal legal force, but can be used as a reference to provide answers to solve individual cases. It appears erroneous to believe that privacy is about information. It is about relationship. In medicine, there is a contract between a patient and a physician, where health care personnel have to respect secrecy, while integrity and availability of information should be obtained for continuity of care. These somewhat contradictory objectives have to be applied very carefully to computerised biomedical information. Ethical principles have to be made clear to everyone, and society should take the necessary steps to organise their enforcement. Several examples are given in the delivery of health care, telediagnosis, patient follow-up, and clinical research, as well as possible breakthroughs that could jeopardise privacy using biomedical information. PMID:9723809

  9. Biomedical Applications of Graphene

    PubMed Central

    Shen, He; Zhang, Liming; Liu, Min; Zhang, Zhijun

    2012-01-01

    Graphene exhibits a unique 2-D structure and exceptional physical and chemical properties that lead to many potential applications. Among various applications, biomedical applications of graphene have attracted ever-increasing interest over the last three years. In this review, we present an overview of current advances in applications of graphene in biomedicine with focus on drug delivery, cancer therapy and biological imaging, together with a brief discussion on the challenges and perspectives for future research in this field. PMID:22448195

  10. VariOtator, a Software Tool for Variation Annotation with the Variation Ontology.

    PubMed

    Schaafsma, Gerard C P; Vihinen, Mauno

    2016-04-01

    The Variation Ontology (VariO) is used for describing and annotating types, effects, consequences, and mechanisms of variations. To facilitate easy and consistent annotations, the online application VariOtator was developed. For variation type annotations, VariOtator is fully automated, accepting variant descriptions in Human Genome Variation Society (HGVS) format, and generating VariO terms, either with or without full lineage, that is, all parent terms. When a coding DNA variant description with a reference sequence is provided, VariOtator checks the description first with Mutalyzer and then generates the predicted RNA and protein descriptions with their respective VariO annotations. For the other sublevels, function, structure, and property, annotations cannot be automated, and VariOtator generates annotation based on provided details. For VariO terms relating to structure and property, one can use attribute terms as modifiers and evidence code terms for annotating experimental evidence. There is an online batch version, and stand-alone batch versions to be used with a Leiden Open Variation Database (LOVD) download file. A SOAP Web service allows client programs to access VariOtator programmatically. Thus, systematic variation effect and type annotations can be efficiently generated to allow easy use and integration of variations and their consequences. PMID:26773573
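
    The phrase "full lineage, that is, all parent terms" in the abstract above can be illustrated with a short sketch that walks from a term up to the root of a hierarchy. The term names and the single-parent table below are placeholders invented for the example; they are not real VariO terms, and a real ontology may give a term more than one parent.

```python
def lineage(term, parent_of):
    """Return all ancestors of `term`, nearest parent first.

    `parent_of` maps each term to its single parent (an assumption of
    this sketch); root terms are simply absent from the mapping.
    """
    chain = []
    seen = {term}
    current = parent_of.get(term)
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at {current!r}")
        chain.append(current)
        seen.add(current)
        current = parent_of.get(current)
    return chain


# Placeholder hierarchy, not taken from VariO.
toy_parents = {
    "leaf term": "intermediate term",
    "intermediate term": "root term",
}

with_lineage = ["leaf term"] + lineage("leaf term", toy_parents)
without_lineage = ["leaf term"]
```

    Emitting the ancestors alongside the specific term lets a later search on a broad parent term also find records that were annotated only with one of its descendants.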