named entity extraction: Topics by Science.gov

Sample records for named entity extraction

Active learning for ontological event extraction incorporating named entity recognition and unknown word handling.

PubMed

Han, Xu; Kim, Jung-jae; Kwoh, Chee Keong

2016-01-01

Biomedical text mining may target various kinds of valuable information embedded in the literature, but a critical obstacle to extending the mining targets is the cost of manually constructing labeled data, which are required for state-of-the-art supervised learning systems. Active learning chooses the most informative documents for supervised learning in order to reduce the amount of manual annotation required. Previous work on active learning, however, focused on the tasks of entity recognition and protein-protein interactions, but not on event extraction tasks for multiple event types. It also did not consider the evidence of event participants, which might be a clue to the presence of events in unlabeled documents. Moreover, the confidence scores of events produced by event extraction systems are not reliable for ranking documents in terms of informativity for supervised learning. We here propose a novel committee-based active learning method that supports multi-event extraction tasks and employs a new statistical method for informativity estimation instead of using the confidence scores from event extraction systems. Our method is based on a committee of two systems as follows: We first employ an event extraction system to filter potential false negatives among unlabeled documents, from which the system does not extract any event. We then develop a statistical method to rank the potential false negatives of unlabeled documents 1) by using a language model that measures the probabilities of the expression of multiple events in documents and 2) by using a named entity recognition system that locates the named entities that can be event arguments (e.g. proteins). The proposed method further deals with unknown words in test data by using word similarity measures. We also apply our active learning method to the task of named entity recognition. We evaluate the proposed method against the BioNLP Shared Tasks datasets, and show that our method can achieve better performance than such previous methods as entropy and Gibbs error based methods and a conventional committee-based method. We also show that the incorporation of named entity recognition into the active learning for event extraction and the unknown word handling further improve the active learning method. In addition, the adaptation of the active learning method to named entity recognition tasks also improves the document selection for manual annotation of named entities.
Unsupervised Biomedical Named Entity Recognition: Experiments with Clinical and Biological Texts

PubMed Central

Zhang, Shaodian; Elhadad, Noémie

2013-01-01

Named entity recognition is a crucial component of biomedical natural language processing, enabling information extraction and ultimately reasoning over and knowledge discovery from text. Much progress has been made in the design of rule-based and supervised tools, but they are often genre and task dependent. As such, adapting them to different genres of text or identifying new types of entities requires major effort in re-annotation or rule development. In this paper, we propose an unsupervised approach to extracting named entities from biomedical text. We describe a stepwise solution to tackle the challenges of entity boundary detection and entity type classification without relying on any handcrafted rules, heuristics, or annotated data. A noun phrase chunker followed by a filter based on inverse document frequency extracts candidate entities from free text. Classification of candidate entities into categories of interest is carried out by leveraging principles from distributional semantics. Experiments show that our system, especially the entity classification step, yields competitive results on two popular biomedical datasets of clinical notes and biological literature, and outperforms a baseline dictionary match approach. Detailed error analysis provides a road map for future work. PMID:23954592
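The candidate-filtering step described above (noun phrase candidates passed through an inverse-document-frequency filter) can be illustrated with a minimal sketch. The toy corpus, the mean-IDF scoring rule, the threshold value, and the function names `idf_scores` and `filter_candidates` are assumptions made for illustration; the abstract does not specify how the authors score or threshold candidates.

```python
import math
from collections import Counter

def idf_scores(documents):
    """Return the inverse document frequency of every token in a tokenized corpus."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each token once per document
    return {tok: math.log(n_docs / df) for tok, df in doc_freq.items()}

def filter_candidates(candidates, idf, threshold):
    """Keep candidate phrases whose mean token IDF is at least `threshold`."""
    # Assumption: tokens never seen in the corpus get the largest observed IDF.
    default = max(idf.values())
    kept = []
    for phrase in candidates:
        tokens = phrase.lower().split()
        mean_idf = sum(idf.get(t, default) for t in tokens) / len(tokens)
        if mean_idf >= threshold:
            kept.append(phrase)
    return kept

# Hypothetical toy corpus; a real system would use a large document collection.
corpus = [
    "the patient was given aspirin".split(),
    "the patient reported chest pain".split(),
    "the dose of aspirin was reduced".split(),
    "the patient was discharged".split(),
]
idf = idf_scores(corpus)
candidates = ["the patient", "aspirin", "chest pain"]  # stand-in for chunker output
selected = filter_candidates(candidates, idf, threshold=0.5)
print(selected)
```

In this sketch, phrases made of very common words score low and are dropped, while rarer, more content-bearing phrases survive to the classification step.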
Discovery of Predicate-Oriented Relations among Named Entities Extracted from Thai Texts

NASA Astrophysics Data System (ADS)

Tongtep, Nattapong; Theeramunkong, Thanaruk

Extracting named entities (NEs) and their relations is more difficult in Thai than in other languages due to several Thai-specific characteristics, including no explicit boundaries for words, phrases and sentences; few case markers and modifier clues; high ambiguity in compound words and serial verbs; and flexible word orders. Unlike most previous work, which focused on NE relations of specific actions, such as work_for, live_in, located_in, and kill, this paper proposes more general types of NE relations, called predicate-oriented relations (PoR), where an extracted action part (verb) is used as a core component to associate related named entities extracted from Thai texts. Lacking a practical parser for the Thai language, we present three types of surface features, i.e. punctuation marks (such as token spaces), entity types and the number of entities, and then apply five alternative commonly used learning schemes to investigate their performance on predicate-oriented relation extraction. The experimental results show that our approach achieves F-measures of 97.76%, 99.19%, 95.00% and 93.50% on four different types of predicate-oriented relation (action-location, location-action, action-person and person-action) in crime-related news documents using a data set of 1,736 entity pairs. The effects of NE extraction techniques, feature sets and class imbalance on the performance of relation extraction are explored.
Chemical named entities recognition: a review on approaches and applications

PubMed Central

2014-01-01

The rapid increase in the flow rate of published digital information in all disciplines has resulted in a pressing need for techniques that can simplify the use of this information. The chemistry literature is very rich with information about chemical entities. Extracting molecules and their related properties and activities from the scientific literature to “text mine” these extracted data and determine contextual relationships helps research scientists, particularly those in drug development. One of the most important challenges in chemical text mining is the recognition of chemical entities mentioned in the texts. In this review, the authors briefly introduce the fundamental concepts of chemical literature mining, the textual contents of chemical documents, and the methods of naming chemicals in documents. We sketch out dictionary-based, rule-based and machine learning, as well as hybrid chemical named entity recognition approaches with their applied solutions. We end with an outlook on the pros and cons of these approaches and the types of chemical entities extracted. PMID:24834132
Chemical named entities recognition: a review on approaches and applications.

PubMed

Eltyeb, Safaa; Salim, Naomie

2014-01-01

The rapid increase in the flow rate of published digital information in all disciplines has resulted in a pressing need for techniques that can simplify the use of this information. The chemistry literature is very rich with information about chemical entities. Extracting molecules and their related properties and activities from the scientific literature to "text mine" these extracted data and determine contextual relationships helps research scientists, particularly those in drug development. One of the most important challenges in chemical text mining is the recognition of chemical entities mentioned in the texts. In this review, the authors briefly introduce the fundamental concepts of chemical literature mining, the textual contents of chemical documents, and the methods of naming chemicals in documents. We sketch out dictionary-based, rule-based and machine learning, as well as hybrid chemical named entity recognition approaches with their applied solutions. We end with an outlook on the pros and cons of these approaches and the types of chemical entities extracted.
Improving Information Extraction and Translation Using Component Interactions

DTIC Science & Technology

2008-01-01

7. Case Study on Monolingual Interaction ... 7.1 Improving Name Tagging by ... interactions described above focused on the monolingual analysis pipeline. (Huang and Vogel, 2002) presented a cross-lingual joint inference example to ... improve the extracted named entity translation dictionary and the entity annotation in a bilingual training corpus. They used a more
Biomedical named entity extraction: some issues of corpus compatibilities.

PubMed

Ekbal, Asif; Saha, Sriparna; Sikdar, Utpal Kumar

2013-01-01

Named Entity (NE) extraction is one of the most fundamental and important tasks in biomedical information extraction. It involves identification of certain entities from text and their classification into some predefined categories. In the biomedical community, there is as yet no general consensus regarding named entity (NE) annotation; thus, it is very difficult to compare the existing systems due to corpus incompatibilities. Because of this problem, we also cannot exploit the advantages of using different corpora together. In our present work we address the issues of corpus compatibilities, and use a single objective optimization (SOO) based classifier ensemble technique that uses the search capability of a genetic algorithm (GA) for NE extraction in biomedicine. We hypothesize that the reliability of predictions of each classifier differs among the various output classes. We use Conditional Random Field (CRF) and Support Vector Machine (SVM) frameworks to build a number of models depending upon the various representations of the set of features and/or feature templates. Note that we tried to extract the features without using any deep domain knowledge and/or resources. In order to assess the challenges of corpus compatibilities, we experiment with the different benchmark datasets and their various combinations. Comparisons with the existing approaches demonstrate the efficacy of the technique. The GA-based ensemble achieves a performance improvement of around 2% over the individual classifiers. Degradation in performance on the integrated corpus clearly shows the difficulties of the task. In summary, our ensemble-based approach attains state-of-the-art performance levels for entity extraction in three different kinds of biomedical datasets. The possible reasons behind the better performance of our approach are (i) use of varied and rich features as described in Subsection "Features for named entity extraction"; (ii) use of a GA-based classifier ensemble technique to combine the outputs of multiple classifiers.
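The hypothesis above, that each classifier's reliability differs by output class, suggests a vote in which every classifier carries a separate weight per class. The following is a minimal sketch of that combination step only. The classifier names, the labels, and the weight values are hypothetical and hand-set here; in the abstract the weights would come from the genetic algorithm search, which is not shown.

```python
from collections import defaultdict

def weighted_vote(predictions, class_weights):
    """Combine one token's labels from several classifiers.

    predictions:   {classifier_name: predicted_label}
    class_weights: {classifier_name: {label: weight}}; hypothetical values that
                   a search procedure (a GA in the abstract) would normally tune.
    """
    scores = defaultdict(float)
    for clf, label in predictions.items():
        scores[label] += class_weights[clf].get(label, 0.0)
    # Highest total weight wins; ties break alphabetically for determinism.
    return max(sorted(scores), key=lambda lab: scores[lab])

# Hypothetical per-class reliabilities for three models.
weights = {
    "crf_a": {"B-protein": 0.9, "O": 0.6},
    "crf_b": {"B-protein": 0.5, "O": 0.8},
    "svm_a": {"B-protein": 0.7, "O": 0.5},
}
votes = {"crf_a": "B-protein", "crf_b": "O", "svm_a": "B-protein"}
label = weighted_vote(votes, weights)
print(label)
```

Because the weights are indexed by class as well as by classifier, a model that is trusted on one label can still be outvoted on another.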
Building a protein name dictionary from full text: a machine learning term extraction approach.

PubMed

Shi, Lei; Campagne, Fabien

2005-04-07

The majority of information in the biological literature resides in full text articles, instead of abstracts. Yet, abstracts remain the focus of many publicly available literature data mining tools. Most literature mining tools rely on pre-existing lexicons of biological names, often extracted from curated gene or protein databases. This is a limitation, because such databases have low coverage of the many name variants which are used to refer to biological entities in the literature. We present an approach to recognize named entities in full text. The approach collects high frequency terms in an article, and uses support vector machines (SVM) to identify biological entity names. It is also computationally efficient and robust to noise commonly found in full text material. We use the method to create a protein name dictionary from a set of 80,528 full text articles. Only 8.3% of the names in this dictionary match SwissProt description lines. We assess the quality of the dictionary by studying its protein name recognition performance in full text. This dictionary term lookup method compares favourably to other published methods, supporting the significance of our direct extraction approach. The method is strong in recognizing name variants not found in SwissProt.
Building a protein name dictionary from full text: a machine learning term extraction approach

PubMed Central

Shi, Lei; Campagne, Fabien

2005-01-01

Background: The majority of information in the biological literature resides in full text articles, instead of abstracts. Yet, abstracts remain the focus of many publicly available literature data mining tools. Most literature mining tools rely on pre-existing lexicons of biological names, often extracted from curated gene or protein databases. This is a limitation, because such databases have low coverage of the many name variants which are used to refer to biological entities in the literature. Results: We present an approach to recognize named entities in full text. The approach collects high frequency terms in an article, and uses support vector machines (SVM) to identify biological entity names. It is also computationally efficient and robust to noise commonly found in full text material. We use the method to create a protein name dictionary from a set of 80,528 full text articles. Only 8.3% of the names in this dictionary match SwissProt description lines. We assess the quality of the dictionary by studying its protein name recognition performance in full text. Conclusion: This dictionary term lookup method compares favourably to other published methods, supporting the significance of our direct extraction approach. The method is strong in recognizing name variants not found in SwissProt. PMID:15817129
A rule-based named-entity recognition method for knowledge extraction of evidence-based dietary recommendations

PubMed Central

2017-01-01

Evidence-based dietary information represented as unstructured text is crucial information that must be accessible to help dietitians keep up with the new knowledge that arrives daily in newly published scientific reports. Different named-entity recognition (NER) methods have been introduced previously to extract useful information from the biomedical literature. They focus on, for example, extracting gene mentions, protein mentions, relationships between genes and proteins, chemical concepts and relationships between drugs and diseases. In this paper, we present a novel NER method, called drNER, for knowledge extraction of evidence-based dietary information. To the best of our knowledge this is the first attempt at extracting dietary concepts. drNER is a rule-based NER method that consists of two phases. The first involves the detection and determination of entity mentions, and the second involves the selection and extraction of the entities. We evaluate the method by using text corpora from heterogeneous sources, including text from several scientifically validated web sites and text from scientific publications. Evaluation of the method showed that drNER gives good results and can be used for knowledge extraction of evidence-based dietary recommendations. PMID:28644863
Disease named entity recognition by combining conditional random fields and bidirectional recurrent neural networks.

PubMed

Wei, Qikang; Chen, Tao; Xu, Ruifeng; He, Yulan; Gui, Lin

2016-01-01

The recognition of disease and chemical named entities in scientific articles is a very important subtask in information extraction in the biomedical domain. Due to the diversity and complexity of disease names, the recognition of disease named entities is considerably harder than that of chemical names. Although there are some remarkable chemical named entity recognition systems available online such as ChemSpot and tmChem, publicly available recognition systems for disease named entities are rare. This article presents a system for disease named entity recognition (DNER) and normalization. First, two separate DNER models are developed. One is based on a conditional random fields model with a rule-based post-processing module. The other is based on bidirectional recurrent neural networks. Then the named entities recognized by each DNER model are fed into a support vector machine classifier for combining results. Finally, each recognized disease named entity is normalized to a medical subject heading disease name by using a vector space model based method. Experimental results show that using 1000 PubMed abstracts for training, our proposed system achieves an F1-measure of 0.8428 at the mention level and 0.7804 at the concept level on the testing data of the chemical-disease relation task in BioCreative V. Database URL: http://219.223.252.210:8080/SS/cdr.html. © The Author(s) 2016. Published by Oxford University Press.
NELasso: Group-Sparse Modeling for Characterizing Relations Among Named Entities in News Articles.

PubMed

Tariq, Amara; Karim, Asim; Foroosh, Hassan

2017-10-01

Named entities such as people, locations, and organizations play a vital role in characterizing online content. They often reflect information of interest and are frequently used in search queries. Although named entities can be detected reliably from textual content, extracting relations among them is more challenging, yet useful in various applications (e.g., news recommending systems). In this paper, we present a novel model and system for learning semantic relations among named entities from collections of news articles. We model each named entity occurrence with sparse structured logistic regression, and consider the words (predictors) to be grouped based on background semantics. This sparse group LASSO approach forces the weights of word groups that do not influence the prediction towards zero. The resulting sparse structure is utilized for defining the type and strength of relations. Our unsupervised system yields a named entities' network where each relation is typed, quantified, and characterized in context. These relations are the key to understanding news material over time and customizing newsfeeds for readers. Extensive evaluation of our system on articles from TIME magazine and BBC News shows that the learned relations correlate with static semantic relatedness measures like WLM, and capture the evolving relationships among named entities over time.
Cross domains Arabic named entity recognition system

NASA Astrophysics Data System (ADS)

Al-Ahmari, S. Saad; Abdullatif Al-Johar, B.

2016-07-01

Named Entity Recognition (NER) plays an important role in many Natural Language Processing (NLP) applications such as Information Extraction (IE), Question Answering (QA), Text Clustering, Text Summarization and Word Sense Disambiguation. This paper presents the development and implementation of a domain-independent system to recognize three types of Arabic named entities. The system is based on a set of domain-independent grammar rules along with an Arabic part-of-speech tagger, in addition to gazetteers and lists of trigger words. The experimental results show that the system performed as well as other systems, with better results in some cases on cross-domain corpora.
SPECTRa-T: machine-based data extraction and semantic searching of chemistry e-theses.

PubMed

Downing, Jim; Harvey, Matt J; Morgan, Peter B; Murray-Rust, Peter; Rzepa, Henry S; Stewart, Diana C; Tonge, Alan P; Townsend, Joe A

2010-02-22

The SPECTRa-T project has developed text-mining tools to extract named chemical entities (NCEs), such as chemical names and terms, and chemical objects (COs), e.g., experimental spectral assignments and physical chemistry properties, from electronic theses (e-theses). Although NCEs were readily identified within the two major document formats studied, only the use of structured documents enabled identification of chemical objects and their association with the relevant chemical entity (e.g., systematic chemical name). A corpus of theses was analyzed and it is shown that a high degree of semantic information can be extracted from structured documents. This integrated information has been deposited in a persistent Resource Description Framework (RDF) triple-store that allows users to conduct semantic searches. The strengths and weaknesses of several document formats are reviewed.
A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries.

PubMed

Jiang, Min; Chen, Yukun; Liu, Mei; Rosenbloom, S Trent; Mani, Subramani; Denny, Joshua C; Xu, Hua

2011-01-01

The authors' goal was to develop and evaluate machine-learning-based approaches to extracting clinical entities (including medical problems, tests, and treatments, as well as their asserted status) from hospital discharge summaries written using natural language. This project was part of the 2010 Center of Informatics for Integrating Biology and the Bedside/Veterans Affairs (VA) natural-language-processing challenge. The authors implemented a machine-learning-based named entity recognition system for clinical text and systematically evaluated the contributions of different types of features and ML algorithms, using a training corpus of 349 annotated notes. Based on the results from training data, the authors developed a novel hybrid clinical entity extraction system, which integrated heuristic rule-based modules with the ML-based named entity recognition module. The authors applied the hybrid system to the concept extraction and assertion classification tasks in the challenge and evaluated its performance using a test data set with 477 annotated notes. Standard measures including precision, recall, and F-measure were calculated using the evaluation script provided by the Center of Informatics for Integrating Biology and the Bedside/VA challenge organizers. The overall performance for all three types of clinical entities and all six types of assertions across 477 annotated notes was considered as the primary metric in the challenge. Systematic evaluation on the training set showed that Conditional Random Fields outperformed Support Vector Machines, and semantic information from existing natural-language-processing systems largely improved performance, although contributions from different types of features varied. The authors' hybrid entity extraction system achieved a maximum overall F-score of 0.8391 for concept extraction (ranked second) and 0.9313 for assertion classification (ranked fourth, but not statistically different from the first three systems) on the test data set in the challenge.
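For readers unfamiliar with the measures named above, the sketch below computes precision, recall, and F-measure over entity spans. It assumes exact matching of (start, end, type) triples and uses invented toy annotations; it is not the challenge's evaluation script, whose matching rules are not described in the abstract.

```python
def prf(gold, predicted):
    """Exact-match precision, recall and F1 over (start, end, type) spans."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # spans correct in both boundaries and type
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1

# Hypothetical annotations: token offsets plus an entity type.
gold = [(0, 2, "problem"), (5, 6, "test"), (9, 11, "treatment"), (14, 15, "problem")]
pred = [(0, 2, "problem"), (5, 6, "treatment"), (9, 11, "treatment")]
p, r, f = prf(gold, pred)
print(f"precision={p:.4f} recall={r:.4f} f1={f:.4f}")
```

Here the span at (5, 6) has the right boundaries but the wrong type, so under exact matching it counts as both a false positive and a false negative.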
Improving the Accuracy of Attribute Extraction using the Relatedness between Attribute Values

NASA Astrophysics Data System (ADS)

Bollegala, Danushka; Tani, Naoki; Ishizuka, Mitsuru

Extracting attribute-values related to entities from web texts is an important step in numerous web related tasks such as information retrieval, information extraction, and entity disambiguation (namesake disambiguation). For example, for a search query that contains a personal name, we can not only return documents that contain that personal name, but if we have attribute-values such as the organization for which that person works, we can also suggest documents that contain information related to that organization, thereby improving the user's search experience. Despite numerous potential applications of attribute extraction, it remains a challenging task due to the inherent noise in web data; often a single web page contains multiple entities and attributes. We propose a graph-based approach to select the correct attribute-values from a set of candidate attribute-values extracted for a particular entity. First, we build an undirected weighted graph in which attribute-values are represented by nodes, and the edge that connects two nodes in the graph represents the degree of relatedness between the corresponding attribute-values. Next, we find the maximum spanning tree of this graph that connects exactly one attribute-value for each attribute-type. The proposed method outperforms previously proposed attribute extraction methods on a dataset that contains 5000 web pages.
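The selection step described above can be sketched by brute force: try every way of picking one candidate value per attribute type, and keep the combination whose maximum spanning tree has the largest total relatedness. This is an illustrative stand-in, not the authors' algorithm, which the abstract does not detail. The candidate values, the relatedness scores, and the function names are all hypothetical, and the enumeration is only practical for a handful of candidates.

```python
from itertools import product

def max_spanning_tree_weight(nodes, weight):
    """Prim's algorithm on the complete graph over `nodes`, maximizing edge weight."""
    nodes = list(nodes)
    in_tree = {nodes[0]}
    total = 0.0
    while len(in_tree) < len(nodes):
        # Heaviest edge that links the current tree to a node outside it.
        w, nxt = max(
            (weight(u, v), v) for u in in_tree for v in nodes if v not in in_tree
        )
        total += w
        in_tree.add(nxt)
    return total

def select_values(candidates, relatedness):
    """Pick one value per attribute type so the spanning tree weight is largest."""
    def weight(u, v):
        return relatedness.get(frozenset((u, v)), 0.0)  # unknown pairs score 0
    best = max(
        product(*candidates.values()),
        key=lambda combo: max_spanning_tree_weight(combo, weight),
    )
    return dict(zip(candidates.keys(), best))

# Hypothetical candidates extracted for one person from noisy web pages.
candidates = {
    "affiliation": ["Univ. of Tokyo", "Acme Corp"],
    "occupation": ["professor", "plumber"],
    "location": ["Tokyo", "Springfield"],
}
relatedness = {
    frozenset(("Univ. of Tokyo", "professor")): 0.9,
    frozenset(("Univ. of Tokyo", "Tokyo")): 0.8,
    frozenset(("professor", "Tokyo")): 0.3,
    frozenset(("Acme Corp", "plumber")): 0.4,
    frozenset(("Acme Corp", "Springfield")): 0.5,
    frozenset(("plumber", "Springfield")): 0.2,
}
chosen = select_values(candidates, relatedness)
print(chosen)
```

Values that belong to the same real-world person tend to be mutually related, so the coherent combination outscores mixtures drawn from different namesakes.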
Information extraction system

DOEpatents

Lemmond, Tracy D; Hanley, William G; Guensche, Joseph Wendell; Perry, Nathan C; Nitao, John J; Kidwell, Paul Brandon; Boakye, Kofi Agyeman; Glaser, Ron E; Prenger, Ryan James

2014-05-13

An information extraction system and methods of operating the system are provided. In particular, an information extraction system for performing meta-extraction of named entities of people, organizations, and locations, as well as relationships and events, from text documents is described herein.
Developing a hybrid dictionary-based bio-entity recognition technique.

PubMed

Song, Min; Yu, Hwanjo; Han, Wook-Shin

2015-01-01

Bio-entity extraction is a pivotal component for information extraction from biomedical literature. Dictionary-based bio-entity extraction is the first generation of Named Entity Recognition (NER) techniques. This paper presents a hybrid dictionary-based bio-entity extraction technique. The approach expands the bio-entity dictionary by combining different data sources and improves the recall rate through the shortest path edit distance algorithm. In addition, the proposed technique adopts text mining techniques in the merging stage of similar entities such as Part of Speech (POS) expansion, stemming, and the exploitation of the contextual cues to further improve the performance. The experimental results show that the proposed technique achieves the best or at least equivalent performance among compared techniques, GENIA, MeSH, UMLS, and combinations of these three resources in F-measure. The results imply that the performance of dictionary-based extraction techniques is largely influenced by information resources used to build the dictionary. In addition, the edit distance algorithm shows steady performance with three different dictionaries in precision whereas the context-only technique achieves a high-end performance with three different dictionaries in recall.
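To illustrate how edit distance can raise the recall of a dictionary lookup, the sketch below matches a mention against dictionary terms within a small number of character edits. It uses the standard Levenshtein distance as a stand-in; the abstract's "shortest path edit distance algorithm" is not specified and may differ. The dictionary entries, the `max_distance` parameter, and the function names are assumptions for illustration.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def lookup(mention, dictionary, max_distance=1):
    """Return dictionary entries within `max_distance` edits of `mention`."""
    m = mention.lower()
    return sorted(
        term for term in dictionary if levenshtein(m, term.lower()) <= max_distance
    )

# Hypothetical mini-dictionary; a real one would be built from several resources.
dictionary = {"interleukin-2", "interleukin-6", "insulin", "p53"}
exact = lookup("insulin", dictionary, max_distance=0)
fuzzy = lookup("interleukin 2", dictionary, max_distance=1)
print(exact, fuzzy)
```

Allowing one edit lets the spelling variant "interleukin 2" reach "interleukin-2" without also matching "interleukin-6", which is two edits away; larger thresholds trade precision for recall.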
Developing a hybrid dictionary-based bio-entity recognition technique

PubMed Central

2015-01-01

Background: Bio-entity extraction is a pivotal component for information extraction from biomedical literature. Dictionary-based bio-entity extraction is the first generation of Named Entity Recognition (NER) techniques. Methods: This paper presents a hybrid dictionary-based bio-entity extraction technique. The approach expands the bio-entity dictionary by combining different data sources and improves the recall rate through the shortest path edit distance algorithm. In addition, the proposed technique adopts text mining techniques in the merging stage of similar entities such as Part of Speech (POS) expansion, stemming, and the exploitation of the contextual cues to further improve the performance. Results: The experimental results show that the proposed technique achieves the best or at least equivalent performance among compared techniques, GENIA, MeSH, UMLS, and combinations of these three resources in F-measure. Conclusions: The results imply that the performance of dictionary-based extraction techniques is largely influenced by information resources used to build the dictionary. In addition, the edit distance algorithm shows steady performance with three different dictionaries in precision whereas the context-only technique achieves a high-end performance with three different dictionaries in recall. PMID:26043907
PKDE4J: Entity and relation extraction for public knowledge discovery.

PubMed

Song, Min; Kim, Won Chul; Lee, Dahee; Heo, Go Eun; Kang, Keun Young

2015-10-01

Owing to the enormous number of scientific publications that cannot be handled manually, there is a rising interest in text-mining techniques for automated information extraction, especially in the biomedical field. Such techniques provide effective means of information search, knowledge discovery, and hypothesis generation. Most previous studies have primarily focused on the design and performance improvement of either named entity recognition or relation extraction. In this paper, we present PKDE4J, a comprehensive text-mining system that integrates dictionary-based entity extraction and rule-based relation extraction in a highly flexible and extensible framework. Starting with Stanford CoreNLP, we developed the system to cope with multiple types of entities and relations. The system also has fairly good performance in terms of accuracy as well as the ability to configure text-processing components. We demonstrate its competitive performance by evaluating it on many corpora and find that it surpasses existing systems with average F-measures of 85% for entity extraction and 81% for relation extraction. Copyright © 2015 Elsevier Inc. All rights reserved.

Sieve-based coreference resolution enhances semi-supervised learning model for chemical-induced disease relation extraction.

PubMed

Le, Hoang-Quynh; Tran, Mai-Vu; Dang, Thanh Hai; Ha, Quang-Thuy; Collier, Nigel

2016-07-01

The BioCreative V chemical-disease relation (CDR) track was proposed to accelerate the progress of text mining in facilitating integrative understanding of chemicals, diseases and their relations. In this article, we describe an extension of our system (namely UET-CAM) that participated in the BioCreative V CDR. The original UET-CAM system's performance was ranked fourth among 18 participating systems by the BioCreative CDR track committee. In the Disease Named Entity Recognition and Normalization (DNER) phase, our system employed joint inference (decoding) with a perceptron-based named entity recognizer (NER) and a back-off model with Semantic Supervised Indexing and Skip-gram for named entity normalization. In the chemical-induced disease (CID) relation extraction phase, we proposed a pipeline that includes a coreference resolution module and a Support Vector Machine relation extraction model. The former module utilized a multi-pass sieve to extend entity recall. In this article, the UET-CAM system was improved by adding a 'silver' CID corpus to train the prediction model. This silver standard corpus of more than 50 thousand sentences was automatically built based on the Comparative Toxicogenomics Database (CTD). We evaluated our method on the CDR test set. Results showed that our system could reach state-of-the-art performance with F1 of 82.44 for the DNER task and 58.90 for the CID task. Analysis demonstrated substantial benefits of both the multi-pass sieve coreference resolution method (F1 +4.13%) and the silver CID corpus (F1 +7.3%). Database URL: SilverCID, the silver-standard corpus for CID relation extraction, is freely available online at https://zenodo.org/record/34530 (doi:10.5281/zenodo.34530). © The Author(s) 2016. Published by Oxford University Press.
Clinical Named Entity Recognition Using Deep Learning Models.

PubMed

Wu, Yonghui; Jiang, Min; Xu, Jun; Zhi, Degui; Xu, Hua

2017-01-01

Clinical Named Entity Recognition (NER) is a critical natural language processing (NLP) task to extract important concepts (named entities) from clinical narratives. Researchers have extensively investigated machine learning models for clinical NER. Recently, there have been increasing efforts to apply deep learning models to improve the performance of current clinical NER systems. This study examined two popular deep learning architectures, the Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN), to extract concepts from clinical texts. We compared the two deep neural network architectures with three baseline Conditional Random Fields (CRFs) models and two state-of-the-art clinical NER systems using the i2b2 2010 clinical concept extraction corpus. The evaluation results showed that the RNN model trained with the word embeddings achieved a new state-of-the-art performance (a strict F1 score of 85.94%) for the defined clinical NER task, outperforming the best-reported system that used both manually defined and unsupervised learning features. This study demonstrates the advantage of using deep neural network architectures for clinical concept extraction, including distributed feature representation, automatic feature learning, and capture of long-term dependencies. This is one of the first studies to compare the two widely used deep learning models and demonstrate the superior performance of the RNN model for clinical NER.
Clinical Named Entity Recognition Using Deep Learning Models

PubMed Central

Wu, Yonghui; Jiang, Min; Xu, Jun; Zhi, Degui; Xu, Hua

2017-01-01

Clinical Named Entity Recognition (NER) is a critical natural language processing (NLP) task to extract important concepts (named entities) from clinical narratives. Researchers have extensively investigated machine learning models for clinical NER. Recently, there have been increasing efforts to apply deep learning models to improve the performance of current clinical NER systems. This study examined two popular deep learning architectures, the Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN), to extract concepts from clinical texts. We compared the two deep neural network architectures with three baseline Conditional Random Fields (CRFs) models and two state-of-the-art clinical NER systems using the i2b2 2010 clinical concept extraction corpus. The evaluation results showed that the RNN model trained with the word embeddings achieved a new state-of-the-art performance (a strict F1 score of 85.94%) for the defined clinical NER task, outperforming the best-reported system that used both manually defined and unsupervised learning features. This study demonstrates the advantage of using deep neural network architectures for clinical concept extraction, including distributed feature representation, automatic feature learning, and long-term dependencies capture. This is one of the first studies to compare the two widely used deep learning models and demonstrate the superior performance of the RNN model for clinical NER. PMID:29854252
Character-level neural network for biomedical named entity recognition.

PubMed

Gridach, Mourad

2017-06-01

Biomedical named entity recognition (BNER), which extracts important named entities such as genes and proteins, is a challenging task in automated systems that mine knowledge in biomedical texts. The previous state-of-the-art systems required large amounts of task-specific knowledge in the form of feature engineering, lexicons and data pre-processing to achieve high performance. In this paper, we introduce a novel neural network architecture that benefits from both word- and character-level representations automatically, by using a combination of bidirectional long short-term memory (LSTM) and conditional random field (CRF) eliminating the need for most feature engineering tasks. We evaluate our system on two datasets: JNLPBA corpus and the BioCreAtIvE II Gene Mention (GM) corpus. We obtained state-of-the-art performance by outperforming the previous systems. To the best of our knowledge, we are the first to investigate the combination of deep neural networks, CRF, word embeddings and character-level representation in recognizing biomedical named entities. Copyright © 2017 Elsevier Inc. All rights reserved.
Rapid Training of Information Extraction with Local and Global Data Views

DTIC Science & Technology

2012-05-01

relation type extension system based on active learning a relation type extension system based on semi-supervised learning, and a crossdomain...bootstrapping system for domain adaptive named entity extraction. The active learning procedure adopts features extracted at the sentence level as the local
A New Data Representation Based on Training Data Characteristics to Extract Drug Name Entity in Medical Text

PubMed Central

Basaruddin, T.

2016-01-01

One essential task in information extraction from the medical corpus is drug name recognition. Compared with text from other domains, medical text mining poses more challenges, for example, more unstructured text, the rapid addition of new terms, a wide range of name variations for the same drug, the lack of labeled datasets and external knowledge, and multiple token representations for a single drug name. Although many approaches have been proposed to tackle the task, some problems remain, with poor F-score performance (less than 0.75). This paper presents a new treatment in data representation techniques to overcome some of those challenges. We propose three data representation techniques based on the characteristics of word distribution and word similarities as a result of word embedding training. The first technique is evaluated with the standard NN model, that is, MLP. The second technique involves two deep network classifiers, that is, DBN and SAE. The third technique represents the sentence as a sequence that is evaluated with a recurrent NN model, that is, LSTM. In extracting the drug name entities, the third technique gives the best F-score performance compared to the state of the art, with its average F-score being 0.8645. PMID:27843447
BANNER: an executable survey of advances in biomedical named entity recognition.

PubMed

Leaman, Robert; Gonzalez, Graciela

2008-01-01

There has been an increasing amount of research on biomedical named entity recognition, the most basic text extraction problem, resulting in significant progress by different research teams around the world. This has created a need for a freely-available, open source system implementing the advances described in the literature. In this paper we present BANNER, an open-source, executable survey of advances in biomedical named entity recognition, intended to serve as a benchmark for the field. BANNER is implemented in Java as a machine-learning system based on conditional random fields and includes a wide survey of the best techniques recently described in the literature. It is designed to maximize domain independence by not employing brittle semantic features or rule-based processing steps, and achieves significantly better performance than existing baseline systems. It is therefore useful to developers as an extensible NER implementation, to researchers as a standard for comparing innovative techniques, and to biologists requiring the ability to find novel entities in large amounts of text.
FamPlex: a resource for entity recognition and relationship resolution of human protein families and complexes in biomedical text mining.

PubMed

Bachman, John A; Gyori, Benjamin M; Sorger, Peter K

2018-06-28

For automated reading of scientific publications to extract useful information about molecular mechanisms it is critical that genes, proteins and other entities be correctly associated with uniform identifiers, a process known as named entity linking or "grounding." Correct grounding is essential for resolving relationships among mined information, curated interaction databases, and biological datasets. The accuracy of this process is largely dependent on the availability of machine-readable resources associating synonyms and abbreviations commonly found in biomedical literature with uniform identifiers. In a task involving automated reading of ∼215,000 articles using the REACH event extraction software we found that grounding was disproportionately inaccurate for multi-protein families (e.g., "AKT") and complexes with multiple subunits (e.g., "NF-κB"). To address this problem we constructed FamPlex, a manually curated resource defining protein families and complexes as they are commonly encountered in biomedical text. In FamPlex the gene-level constituents of families and complexes are defined in a flexible format allowing for multi-level, hierarchical membership. To create FamPlex, text strings corresponding to entities were identified empirically from literature and linked manually to uniform identifiers; these identifiers were also mapped to equivalent entries in multiple related databases. FamPlex also includes curated prefix and suffix patterns that improve named entity recognition and event extraction. Evaluation of REACH extractions on a test corpus of ∼54,000 articles showed that FamPlex significantly increased grounding accuracy for families and complexes (from 15 to 71%). The hierarchical organization of entities in FamPlex also made it possible to integrate otherwise unconnected mechanistic information across families, subfamilies, and individual proteins. 
Applications of FamPlex to the TRIPS/DRUM reading system and the Biocreative VI Bioentity Normalization Task dataset demonstrated the utility of FamPlex in other settings. FamPlex is an effective resource for improving named entity recognition, grounding, and relationship resolution in automated reading of biomedical text. The content in FamPlex is available in both tabular and Open Biomedical Ontology formats at https://github.com/sorgerlab/famplex under the Creative Commons CC0 license and has been integrated into the TRIPS/DRUM and REACH reading systems.
Incremental Ontology-Based Extraction and Alignment in Semi-structured Documents

NASA Astrophysics Data System (ADS)

Thiam, Mouhamadou; Bennacer, Nacéra; Pernelle, Nathalie; Lô, Moussa

SHIRI is an ontology-based system for integration of semi-structured documents related to a specific domain. The system’s purpose is to allow users to access relevant parts of documents as answers to their queries. SHIRI uses RDF/OWL for representation of resources and SPARQL for their querying. It relies on an automatic, unsupervised and ontology-driven approach for extraction, alignment and semantic annotation of tagged elements of documents. In this paper, we focus on the Extract-Align algorithm, which exploits a set of named entity and term patterns to extract term candidates to be aligned with the ontology. It proceeds in an incremental manner in order to populate the ontology with terms describing instances of the domain and to reduce access to external resources such as the Web. We evaluate it on an HTML corpus related to calls for papers in computer science, and the results that we obtain are very promising. These results show how the incremental behaviour of the Extract-Align algorithm enriches the ontology and how the number of terms (or named entities) aligned directly with the ontology increases.
A method for named entity normalization in biomedical articles: application to diseases and plants.

PubMed

Cho, Hyejin; Choi, Wonjun; Lee, Hyunju

2017-10-13

In biomedical articles, a named entity recognition (NER) technique that identifies entity names from texts is an important element for extracting biological knowledge from articles. After NER is applied to articles, the next step is to normalize the identified names into standard concepts (i.e., disease names are mapped to the National Library of Medicine's Medical Subject Headings disease terms). In biomedical articles, many entity normalization methods rely on domain-specific dictionaries for resolving synonyms and abbreviations. However, the dictionaries are not comprehensive except for some entities such as genes. In recent years, biomedical articles have accumulated rapidly, and neural network-based algorithms that incorporate a large amount of unlabeled data have shown considerable success in several natural language processing problems. In this study, we propose an approach for normalizing biological entities, such as disease names and plant names, by using word embeddings to represent semantic spaces. For diseases, training data from the National Center for Biotechnology Information (NCBI) disease corpus and unlabeled data from PubMed abstracts were used to construct word representations. For plants, a training corpus that we manually constructed and unlabeled PubMed abstracts were used to represent word vectors. We showed that the proposed approach performed better than the use of only the training corpus or only the unlabeled data and showed that the normalization accuracy was improved by using our model even when the dictionaries were not comprehensive. We obtained F-scores of 0.808 and 0.690 for normalizing the NCBI disease corpus and manually constructed plant corpus, respectively. We further evaluated our approach using a data set in the disease normalization task of the BioCreative V challenge. When only the disease corpus was used as a dictionary, our approach significantly outperformed the best system of the task. 
The proposed approach shows robust performance for normalizing biological entities. The manually constructed plant corpus and the proposed model are available at http://gcancer.org/plant and http://gcancer.org/normalization, respectively.
Identifying interactions between chemical entities in biomedical text.

PubMed

Lamurias, Andre; Ferreira, João D; Couto, Francisco M

2014-10-23

Interactions between chemical compounds described in biomedical text can be of great importance to drug discovery and design, as well as pharmacovigilance. We developed a novel system, "Identifying Interactions between Chemical Entities" (IICE), to identify chemical interactions described in text. Kernel-based Support Vector Machines first identify the interactions and then an ensemble classifier validates and classifies the type of each interaction. This relation extraction module was evaluated with the corpus released for the DDI Extraction task of SemEval 2013, obtaining results comparable to state-of-the-art methods for this type of task. We integrated this module with our chemical named entity recognition module and made the whole system available as a web tool at www.lasige.di.fc.ul.pt/webtools/iice.
Identifying interactions between chemical entities in biomedical text.

PubMed

Lamurias, Andre; Ferreira, João D; Couto, Francisco M

2014-12-01

Interactions between chemical compounds described in biomedical text can be of great importance to drug discovery and design, as well as pharmacovigilance. We developed a novel system, "Identifying Interactions between Chemical Entities" (IICE), to identify chemical interactions described in text. Kernel-based Support Vector Machines first identify the interactions and then an ensemble classifier validates and classifies the type of each interaction. This relation extraction module was evaluated with the corpus released for the DDI Extraction task of SemEval 2013, obtaining results comparable to state-of-the-art methods for this type of task. We integrated this module with our chemical named entity recognition module and made the whole system available as a web tool at www.lasige.di.fc.ul.pt/webtools/iice.
Structured prediction models for RNN based sequence labeling in clinical text.

PubMed

Jagannatha, Abhyuday N; Yu, Hong

2016-11-01

Sequence labeling is a widely used method for named entity recognition and information extraction from unstructured natural language data. In the clinical domain, one major application of sequence labeling involves extraction of medical entities such as medication, indication, and side-effects from Electronic Health Record narratives. Sequence labeling in this domain presents its own set of challenges and objectives. In this work we experimented with various CRF based structured learning models with Recurrent Neural Networks. We extend the previously studied LSTM-CRF models with explicit modeling of pairwise potentials. We also propose an approximate version of skip-chain CRF inference with RNN potentials. We use these methodologies for structured prediction in order to improve the exact phrase detection of various medical entities.
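The LSTM-CRF models mentioned in the abstract above combine per-token label scores with pairwise label-transition scores. As a rough illustration only, and not the authors' implementation, the sketch below decodes the best label sequence for a linear-chain model with the Viterbi algorithm. The label set, the function name and all scores are made-up assumptions; in a real system the emission scores would come from the recurrent network.

```python
def viterbi_decode(emissions, transitions):
    """Return (best_path, best_score) for a linear-chain model.

    emissions[t][j]   : score of label j at position t (hypothetical values)
    transitions[i][j] : score of moving from label i to label j
    """
    n_labels = len(transitions)
    score = list(emissions[0])          # best score of a path ending in each label
    backpointers = []                   # one list of best predecessors per later position
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_labels):
            best_prev = max(range(n_labels),
                            key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + transitions[best_prev][j] + emit[j])
        score = new_score
        backpointers.append(pointers)
    last = max(range(n_labels), key=lambda i: score[i])
    path = [last]
    for pointers in reversed(backpointers):   # follow predecessors back to position 0
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]


# Toy example with labels O=0, B=1, I=2; the transition O -> I is heavily penalised.
LABELS = ["O", "B", "I"]
toy_transitions = [
    [0.0, 0.0, -10.0],   # from O
    [0.0, 0.0, 1.0],     # from B
    [0.0, 0.0, 1.0],     # from I
]
toy_emissions = [
    [2.0, 1.5, -10.0],   # token 1: O is locally best
    [0.0, 0.0, 3.0],     # token 2: I is locally best
    [2.0, 0.0, 0.0],     # token 3: O is locally best
]
best_path, best_score = viterbi_decode(toy_emissions, toy_transitions)
print([LABELS[i] for i in best_path], best_score)
```

In this toy case, picking the locally best label per token would yield O, I, O, which starts an entity with I; the pairwise scores steer the decoder to B, I, O instead.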
Structured prediction models for RNN based sequence labeling in clinical text

PubMed Central

Jagannatha, Abhyuday N; Yu, Hong

2016-01-01

Sequence labeling is a widely used method for named entity recognition and information extraction from unstructured natural language data. In the clinical domain, one major application of sequence labeling involves extraction of medical entities such as medication, indication, and side-effects from Electronic Health Record narratives. Sequence labeling in this domain presents its own set of challenges and objectives. In this work we experimented with various CRF based structured learning models with Recurrent Neural Networks. We extend the previously studied LSTM-CRF models with explicit modeling of pairwise potentials. We also propose an approximate version of skip-chain CRF inference with RNN potentials. We use these methodologies for structured prediction in order to improve the exact phrase detection of various medical entities. PMID:28004040
Spatial distribution and influence factors of interprovincial terrestrial physical geographical names in China

NASA Astrophysics Data System (ADS)

Zhang, S.; Wang, Y.; Ju, H.

2017-12-01

The interprovincial terrestrial physical geographical entities are the key areas of regional integrated management. Based on toponymy dictionaries and different thematic maps, the attributes and the spatial extent of the interprovincial terrestrial physical geographical names (ITPGN, including terrain ITPGN and water ITPGN) were extracted. The coefficient of variation and Moran's I were combined to measure the spatial variation and spatial association of ITPGN. The influencing factors of the distribution of ITPGN and the implications for regional management were further discussed. The results showed that 11325 ITPGN were extracted, including 7082 terrain ITPGN and 4243 water ITPGN. Hunan Province had the largest number of ITPGN in China, and Shanghai had the smallest number. The spatial variance of the terrain ITPGN was larger than that of the water ITPGN, and the ITPGN showed a significant agglomeration phenomenon in the southern part of China. Further analysis showed that the number of ITPGN was positively related with the relative elevation and the population where the relative elevation was lower than 2000 m and the population was less than 50 million. But the number of ITPGN showed a negative relationship with the two factors when their values became larger, indicating that a large number of unnamed entities existed in complex terrain areas and that the number of terrestrial physical geographical entities decreased in densely populated areas. Based on these analyses, we suggest the government take the ITPGN as management units to realize balanced development between different parts of the entities and strengthen the geographical names census and the nomination of unnamed interprovincial physical geographical entities. This study also demonstrated that the methods of literature survey, coefficient of variation and Moran's I can be combined to enhance the understanding of the spatial pattern of ITPGN.
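The two statistics named in the abstract above can be illustrated with a short sketch. It assumes the population standard deviation for the coefficient of variation and the standard global form of Moran's I with a binary adjacency weight matrix; the abstract does not state which variants the authors used, and the four regions and their counts below are invented for illustration.

```python
import math


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return math.sqrt(variance) / mean


def morans_i(values, weights):
    """Global Moran's I: (n / W) * sum_ij w_ij * d_i * d_j / sum_i d_i^2,
    where d_i is the deviation of region i from the mean and W is the sum of all weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / total_weight) * cross / sum(d * d for d in dev)


# Hypothetical counts of names in four regions arranged in a line (1-2-3-4),
# with weight 1 for neighbouring regions and 0 otherwise.
toy_counts = [1.0, 2.0, 3.0, 4.0]
toy_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(coefficient_of_variation(toy_counts))
print(morans_i(toy_counts, toy_weights))
```

A positive Moran's I, as in this toy gradient, indicates that similar values cluster among neighbours, which is the kind of agglomeration the abstract reports for southern China.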
A Statistical Model for Multilingual Entity Detection and Tracking

DTIC Science & Technology

2004-01-01

tomatic Content Extraction (ACE) evaluation achieved top-tier results in all three evaluation languages. 1 Introduction Detecting entities, whether named...of combining the detected mentions into groups of references to the same object. The work presented here is motivated by the ACE evaluation...Entropy (MaxEnt henceforth) (Berger et al., 1996) and Robust Risk Minimization (RRM henceforth) 1For a description of the ACE program see http
#nowplaying Madonna: a large-scale evaluation on estimating similarities between music artists and between movies from microblogs.

PubMed

Schedl, Markus

2012-01-01

Different term weighting techniques such as [Formula: see text] or BM25 have been used intensely for manifold text-based information retrieval tasks. Their use for modeling term profiles for named entities and subsequent calculation of similarities between these named entities have been studied to a much smaller extent. The recent trend of microblogging made available massive amounts of information about almost every topic around the world. Therefore, microblogs represent a valuable source for text-based named entity modeling. In this paper, we present a systematic and comprehensive evaluation of different term weighting measures, normalization techniques, query schemes, index term sets, and similarity functions for the task of inferring similarities between named entities, based on data extracted from microblog posts. We analyze several thousand combinations of choices for the above mentioned dimensions, which influence the similarity calculation process, and we investigate in which way they impact the quality of the similarity estimates. Evaluation is performed using three real-world data sets: two collections of microblogs related to music artists and one related to movies. For the music collections, we present results of genre classification experiments using as benchmark genre information from allmusic.com. For the movie collection, we present results of multi-class classification experiments using as benchmark categories from IMDb. We show that microblogs can indeed be exploited to model named entity similarity with remarkable accuracy, provided the correct settings for the analyzed aspects are used. We further compare the results to those obtained when using Web pages as data source.
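The general idea of building a weighted term profile per named entity and comparing profiles with a similarity function can be sketched as follows. This is one simple combination chosen for illustration (raw term frequency times a logarithmic inverse document frequency, compared by cosine similarity); it is an assumption and not necessarily any of the settings evaluated in the paper. The entity names and tokens are invented.

```python
import math
from collections import Counter


def tfidf_profiles(entity_tokens):
    """Map each entity to a {term: weight} profile.

    entity_tokens: dict of entity name -> list of tokens pooled from its posts.
    Weight = term frequency * log(N / document frequency), treating each
    entity's pooled posts as one document.
    """
    n_entities = len(entity_tokens)
    doc_freq = Counter()
    for tokens in entity_tokens.values():
        doc_freq.update(set(tokens))
    return {
        name: {term: count * math.log(n_entities / doc_freq[term])
               for term, count in Counter(tokens).items()}
        for name, tokens in entity_tokens.items()
    }


def cosine(profile_a, profile_b):
    """Cosine similarity between two sparse term profiles."""
    dot = sum(w * profile_b.get(term, 0.0) for term, w in profile_a.items())
    norm_a = math.sqrt(sum(w * w for w in profile_a.values()))
    norm_b = math.sqrt(sum(w * w for w in profile_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical pooled microblog tokens for three artists.
toy_posts = {
    "artist_pop_1": ["pop", "dance", "queen", "pop"],
    "artist_pop_2": ["pop", "dance", "disco"],
    "artist_metal": ["metal", "thrash", "guitar"],
}
profiles = tfidf_profiles(toy_posts)
print(cosine(profiles["artist_pop_1"], profiles["artist_pop_2"]))
print(cosine(profiles["artist_pop_1"], profiles["artist_metal"]))
```

Swapping the weighting function or the similarity function in such a pipeline corresponds to varying two of the dimensions the abstract lists.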
Chemical name extraction based on automatic training data generation and rich feature set.

PubMed

Yan, Su; Spangler, W Scott; Chen, Ying

2013-01-01

The automation of extracting chemical names from text has significant value to biomedical and life science research. A major barrier in this task is the difficulty of getting a sizable and good quality data to train a reliable entity extraction model. Another difficulty is the selection of informative features of chemical names, since comprehensive domain knowledge on chemistry nomenclature is required. Leveraging random text generation techniques, we explore the idea of automatically creating training sets for the task of chemical name extraction. Assuming the availability of an incomplete list of chemical names, called a dictionary, we are able to generate well-controlled, random, yet realistic chemical-like training documents. We statistically analyze the construction of chemical names based on the incomplete dictionary, and propose a series of new features, without relying on any domain knowledge. Compared to state-of-the-art models learned from manually labeled data and domain knowledge, our solution shows better or comparable results in annotating real-world data with less human effort. Moreover, we report an interesting observation about the language for chemical names. That is, both the structural and semantic components of chemical names follow a Zipfian distribution, which resembles many natural languages.
Automatically Recognizing Medication and Adverse Event Information From Food and Drug Administration’s Adverse Event Reporting System Narratives

PubMed Central

Polepalli Ramesh, Balaji; Belknap, Steven M; Li, Zuofeng; Frid, Nadya; West, Dennis P

2014-01-01

Background: The Food and Drug Administration’s (FDA) Adverse Event Reporting System (FAERS) is a repository of spontaneously-reported adverse drug events (ADEs) for FDA-approved prescription drugs. FAERS reports include both structured reports and unstructured narratives. The narratives often include essential information for evaluation of the severity, causality, and description of ADEs that are not present in the structured data. The timely identification of unknown toxicities of prescription drugs is an important, unsolved problem. Objective: The objective of this study was to develop an annotated corpus of FAERS narratives and biomedical named entity tagger to automatically identify ADE related information in the FAERS narratives. Methods: We developed an annotation guideline and annotated medication information and adverse event related entities on 122 FAERS narratives comprising approximately 23,000 word tokens. A named entity tagger using supervised machine learning approaches was built for detecting medication information and adverse event entities using various categories of features. Results: The annotated corpus had an agreement of over 0.9 Cohen’s kappa for medication and adverse event entities. The best performing tagger achieves an overall performance of 0.73 F1 score for detection of medication, adverse event and other named entities. Conclusions: In this study, we developed an annotated corpus of FAERS narratives and machine learning based models for automatically extracting medication and adverse event information from the FAERS narratives. Our study is an important step towards enriching the FAERS data for postmarketing pharmacovigilance. PMID:25600332
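The inter-annotator agreement reported above is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below computes it for two annotators over the same items; the label names and the ten toy annotations are invented and do not come from the FAERS corpus.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e) for two annotators.

    p_o is the observed agreement rate; p_e is the chance agreement
    computed from each annotator's label distribution.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both annotators must label the same items")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1.0 - expected)


# Hypothetical token-level labels from two annotators (they disagree on one token).
annotator_1 = ["DRUG", "DRUG", "ADE", "O", "O", "O", "ADE", "DRUG", "O", "O"]
annotator_2 = ["DRUG", "DRUG", "ADE", "O", "O", "ADE", "ADE", "DRUG", "O", "O"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))
```

Here the raw agreement is 0.9 but kappa is lower, because some of that agreement would be expected by chance given how often each label is used.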
Text mining in livestock animal science: introducing the potential of text mining to animal sciences.

PubMed

Sahadevan, S; Hofmann-Apitius, M; Schellander, K; Tesfaye, D; Fluck, J; Friedrich, C M

2012-10-01

In biological research, establishing the prior art by searching and collecting information already present in the domain has equal importance as the experiments done. To obtain a complete overview about the relevant knowledge, researchers mainly rely on 2 major information sources: i) various biological databases and ii) scientific publications in the field. The major difference between the 2 information sources is that information from databases is available, typically well structured and condensed. The information content in scientific literature is vastly unstructured; that is, dispersed among the many different sections of scientific text. The traditional method of information extraction from scientific literature occurs by generating a list of relevant publications in the field of interest and manually scanning these texts for relevant information, which is very time consuming. It is more than likely that in using this "classical" approach the researcher misses some relevant information mentioned in the literature or has to go through biological databases to extract further information. Text mining and named entity recognition methods have already been used in human genomics and related fields as a solution to this problem. These methods can process and extract information from large volumes of scientific text. Text mining is defined as the automatic extraction of previously unknown and potentially useful information from text. Named entity recognition (NER) is defined as the method of identifying named entities (names of real world objects; for example, gene/protein names, drugs, enzymes) in text. In animal sciences, text mining and related methods have been briefly used in murine genomics and associated fields, leaving behind other fields of animal sciences, such as livestock genomics. 
The aim of this work was to develop an information retrieval platform in the livestock domain focusing on livestock publications and the recognition of relevant data from cattle and pigs. For this purpose, the rather noncomprehensive resources of pig and cattle gene and protein terminologies were enriched with orthologue synonyms, integrated in the NER platform, ProMiner, which is successfully used in human genomics domain. Based on the performance tests done, the present system achieved a fair performance with precision 0.64, recall 0.74, and F(1) measure of 0.69 in a test scenario based on cattle literature.
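The F(1) measure reported above is the harmonic mean of precision and recall, so the three figures can be checked against each other. A minimal sketch (the function name is illustrative):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the cattle literature test scenario.
reported_precision, reported_recall = 0.64, 0.74
print(round(f1_score(reported_precision, reported_recall), 2))
```

With precision 0.64 and recall 0.74 this gives 2 × 0.64 × 0.74 / 1.38 ≈ 0.686, which rounds to the reported 0.69.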

Linked Data for Software Security Concepts and Vulnerability Descriptions

DTIC Science & Technology

2013-07-01

named entity (NE) extractors such as DBpedia Spotlight, Alchemy API1, Extractiv2, OpenCalais3 and Zemanta were compared for their overall performance...presents substantial agreement for URI disambiguation. Alchemy API, although preserving good performance in NE extraction and 1http://www.alchemyapi.com
An annotated corpus with nanomedicine and pharmacokinetic parameters

PubMed Central

Lewinski, Nastassja A; Jimenez, Ivan; McInnes, Bridget T

2017-01-01

A vast amount of data on nanomedicines is being generated and published, and natural language processing (NLP) approaches can automate the extraction of unstructured text-based data. Annotated corpora are a key resource for NLP and information extraction methods which employ machine learning. Although corpora are available for pharmaceuticals, resources for nanomedicines and nanotechnology are still limited. To foster nanotechnology text mining (NanoNLP) efforts, we have constructed a corpus of annotated drug product inserts taken from the US Food and Drug Administration’s Drugs@FDA online database. In this work, we present the development of the Engineered Nanomedicine Database corpus to support the evaluation of nanomedicine entity extraction. The data were manually annotated for 21 entity mentions consisting of nanomedicine physicochemical characterization, exposure, and biologic response information of 41 Food and Drug Administration-approved nanomedicines. We evaluate the reliability of the manual annotations and demonstrate the use of the corpus by evaluating two state-of-the-art named entity extraction systems, OpenNLP and Stanford NER. The annotated corpus is available open source and, based on these results, guidelines and suggestions for future development of additional nanomedicine corpora are provided. PMID:29066897
Towards an Obesity-Cancer Knowledge Base: Biomedical Entity Identification and Relation Detection

PubMed Central

Lossio-Ventura, Juan Antonio; Hogan, William; Modave, François; Hicks, Amanda; Hanna, Josh; Guo, Yi; He, Zhe; Bian, Jiang

2017-01-01

Obesity is associated with increased risks of various types of cancer, as well as a wide range of other chronic diseases. On the other hand, access to health information activates patient participation and improves health outcomes. However, existing online information on obesity and its relationship to cancer is heterogeneous, ranging from pre-clinical models and case studies to mere hypothesis-based scientific arguments. A formal knowledge representation (i.e., a semantic knowledge base) would help better organize and deliver the quality health information related to obesity and cancer that consumers need. Nevertheless, current ontologies describing obesity, cancer and related entities are not designed to guide automatic knowledge base construction from heterogeneous information sources. Thus, in this paper, we present methods for named-entity recognition (NER) to extract biomedical entities from scholarly articles and for detecting if two biomedical entities are related, with the long-term goal of building an obesity-cancer knowledge base. We leverage both linguistic and statistical approaches in the NER task, which surpasses the state-of-the-art results. Further, based on statistical features extracted from the sentences, our method for relation detection obtains an accuracy of 99.3% and an f-measure of 0.993. PMID:28503356
Human-machine interaction to disambiguate entities in unstructured text and structured datasets

NASA Astrophysics Data System (ADS)

Ward, Kevin; Davenport, Jack

2017-05-01

Creating entity network graphs is a manual, time consuming process for an intelligence analyst. Beyond the traditional big data problems of information overload, individuals are often referred to by multiple names and shifting titles as they advance in their organizations over time which quickly makes simple string or phonetic alignment methods for entities insufficient. Conversely, automated methods for relationship extraction and entity disambiguation typically produce questionable results with no way for users to vet results, correct mistakes or influence the algorithm's future results. We present an entity disambiguation tool, DRADIS, which aims to bridge the gap between human-centric and machine-centric methods. DRADIS automatically extracts entities from multi-source datasets and models them as a complex set of attributes and relationships. Entities are disambiguated across the corpus using a hierarchical model executed in Spark allowing it to scale to operational sized data. Resolution results are presented to the analyst complete with sourcing information for each mention and relationship allowing analysts to quickly vet the correctness of results as well as correct mistakes. Corrected results are used by the system to refine the underlying model allowing analysts to optimize the general model to better deal with their operational data. Providing analysts with the ability to validate and correct the model to produce a system they can trust enables them to better focus their time on producing higher quality analysis products.
An attention-based BiLSTM-CRF approach to document-level chemical named entity recognition.

PubMed

Luo, Ling; Yang, Zhihao; Yang, Pei; Zhang, Yin; Wang, Lei; Lin, Hongfei; Wang, Jian

2018-04-15

In biomedical research, chemicals are an important class of entities, and chemical named entity recognition (NER) is an important task in the field of biomedical information extraction. However, most popular chemical NER methods are based on traditional machine learning, and their performance depends heavily on feature engineering. Moreover, these methods operate at the sentence level and therefore suffer from the tagging inconsistency problem. In this paper, we propose a neural network approach, i.e. attention-based bidirectional Long Short-Term Memory with a conditional random field layer (Att-BiLSTM-CRF), to document-level chemical NER. The approach leverages document-level global information obtained by an attention mechanism to enforce tagging consistency across multiple instances of the same token in a document. With little feature engineering, it achieves better performance than other state-of-the-art methods on the BioCreative IV chemical compound and drug name recognition (CHEMDNER) corpus and the BioCreative V chemical-disease relation (CDR) task corpus (F-scores of 91.14% and 92.57%, respectively). Data and code are available at https://github.com/lingluodlut/Att-ChemdNER. Contact: yangzh@dlut.edu.cn or wangleibihami@gmail.com. Supplementary data are available at Bioinformatics online.
AutoMap User’s Guide

DTIC Science & Technology

2006-10-01

Hierarchy of Pre-Processing Techniques 3. NLP (Natural Language Processing) Utilities 3.1 Named-Entity Recognition 3.1.1 Example for Named-Entity... Recognition 3.2 Symbol RemovalN-Gram Identification: Bi-Grams 4. Stemming 4.1 Stemming Example 5. Delete List 5.1 Open a Delete List 5.1.1 Small...iterative and involves several key processes: • Named-Entity Recognition Named-Entity Recognition is an Automap feature that allows you to
Segregation of anterior temporal regions critical for retrieving names of unique and nonunique entities reflects underlying long-range connectivity

PubMed Central

Mehta, Sonya; Inoue, Kayo; Rudrauf, David; Damasio, Hanna; Tranel, Daniel; Grabowski, Thomas

2015-01-01

Lesion-deficit studies support the hypothesis that the left anterior temporal lobe (ATL) plays a critical role in retrieving names of concrete entities. They further suggest that different regions of the left ATL process different conceptual categories. Here we test the specificity of these relationships and whether the anatomical segregation is related to the underlying organization of white matter connections. We reanalyzed data from a previous lesion study of naming and recognition across five categories of concrete entities. In voxelwise logistic regressions of lesion-deficit associations, we formally incorporated measures of disconnection of long-range association fiber tracts (FTs) and covaried for recognition and non-category specific naming deficits. We also performed fiber tractwise analyses to assess whether damage to specific FTs was preferentially associated with category-selective naming deficits. Damage to the basolateral ATL was associated with naming deficits for both unique (famous faces) and non-unique entities, whereas the damage to the temporal pole was associated with naming deficits for unique entities only. This segregation pattern remained after accounting for comorbid recognition deficits or naming deficits in other categories. The tractwise analyses showed that damage to the uncinate fasciculus was associated with naming impairments for unique entities, while damage to the inferior longitudinal fasciculus was associated with naming impairments for non-unique entities. Covarying for FT transection in voxelwise analyses rendered the cortical association for unique entities more focal. These results are consistent with the partial segregation of brain system support for name retrieval of unique and non-unique entities at both the level of cortical components and underlying white matter fiber bundles. 
Our study reconciles theoretic accounts of the functional organization of the left ATL by revealing both category-related processing and semantic hub sectors. PMID:26707082
Designing Rules for Accounting Transaction Identification based on Indonesian NLP

NASA Astrophysics Data System (ADS)

Iswandi, I.; Suwardi, I. S.; Maulidevi, N. U.

2017-03-01

Accounting transactions are recorded on the basis of transaction evidence, such as invoices, receipts, letters of intent, electricity bills, telephone bills, etc. In this paper, we propose a design of rules to identify the entities located on a sales invoice. Several entities are identified in a sales invoice, namely: invoice date, company name, invoice number, product id, product name, quantity and total price. These entities are identified using a named entity recognition method. The entities generated from the rules are used as a basis for automating data input into the accounting system.
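A rule set of the kind described in this abstract can be sketched with regular expressions. The block below is only an illustration under stated assumptions: the field labels (`Company:`, `Invoice No:`, `Date:`, `Total:`), the date format, and the sample invoice are hypothetical and are not the authors' rules, which target Indonesian-language documents.

```python
import re

# Hypothetical rules: one pattern per entity type. The labels and formats
# are assumptions made for this sketch, not taken from the paper.
RULES = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:Number|No\.?)\s*[:#]?\s*([A-Z0-9-]+)", re.I),
    "invoice_date": re.compile(r"Date\s*:\s*(\d{4}-\d{2}-\d{2})", re.I),
    "company_name": re.compile(r"Company\s*:\s*(.+)", re.I),
    "total_price": re.compile(r"Total\s*:\s*([\d.,]+)", re.I),
}


def extract_invoice_entities(text):
    """Apply every rule once and keep the first match for each entity type."""
    entities = {}
    for label, pattern in RULES.items():
        match = pattern.search(text)
        if match:
            entities[label] = match.group(1).strip()
    return entities


# Made-up invoice text used only to exercise the rules.
SAMPLE = """Company: PT Contoh Sejahtera
Invoice No: INV-2017-001
Date: 2017-03-01
Total: 1.250.000
"""

print(extract_invoice_entities(SAMPLE))
```

The resulting dictionary is the kind of structured record that could then be passed on to a data-entry step; entity types without a matching rule are simply absent from it.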
Entity-based Stochastic Analysis of Search Results for Query Expansion and Results Re-Ranking

DTIC Science & Technology

2015-11-20

pages) and structured data (e.g. Linked Open Data (LOD) [8]) coexist in various forms. An important observation is that entity names (like names of...the top-L (e.g. L = 1,000) results are retrieved. Then, Named Entity Recognition (NER) is applied in these results for identifying LOD entities. In...the next (optional) step, more semantic information about the identified entities is retrieved from the LOD (like properties and related entities). A
Encoding of Fundamental Chemical Entities of Organic Reactivity Interest using chemical ontology and XML.

PubMed

Durairaj, Vijayasarathi; Punnaivanam, Sankar

2015-09-01

Fundamental chemical entities are identified in the context of organic reactivity and classified as appropriate concept classes namely ElectronEntity, AtomEntity, AtomGroupEntity, FunctionalGroupEntity and MolecularEntity. The entity classes and their subclasses are organized into a chemical ontology named "ChemEnt" for the purpose of assertion, restriction and modification of properties through entity relations. Individual instances of entity classes are defined and encoded as a library of chemical entities in XML. The instances of entity classes are distinguished with a unique notation and identification values in order to map them with the ontology definitions. A model GUI named Entity Table is created to view graphical representations of all the entity instances. The detection of chemical entities in chemical structures is achieved through suitable algorithms. The possibility of asserting properties to the entities at different levels and the mechanism of property flow within the hierarchical entity levels is outlined. Copyright © 2015 Elsevier Inc. All rights reserved.
Entity recognition in the biomedical domain using a hybrid approach.

PubMed

Basaldella, Marco; Furrer, Lenz; Tasso, Carlo; Rinaldi, Fabio

2017-11-09

This article describes a high-recall, high-precision approach for the extraction of biomedical entities from scientific articles. The approach uses a two-stage pipeline, combining a dictionary-based entity recognizer with a machine-learning classifier. First, the OGER entity recognizer, which has a bias towards high recall, annotates the terms that appear in selected domain ontologies. Subsequently, the Distiller framework uses this information as a feature for a machine learning algorithm to select the relevant entities only. For this step, we compare two different supervised machine-learning algorithms: Conditional Random Fields and Neural Networks. In an in-domain evaluation using the CRAFT corpus, we test the performance of the combined systems when recognizing chemicals, cell types, cellular components, biological processes, molecular functions, organisms, proteins, and biological sequences. Our best system combines dictionary-based candidate generation with Neural-Network-based filtering. It achieves an overall precision of 86% at a recall of 60% on the named entity recognition task, and a precision of 51% at a recall of 49% on the concept recognition task. These results are to our knowledge the best reported so far in this particular task.
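The two-stage idea in this abstract, high-recall dictionary lookup followed by a filtering step, can be sketched as follows. This is a toy illustration and not the OGER or Distiller code: the dictionary entries are hypothetical, and the filter is a single hand-written rule standing in for the trained classifier (CRF or neural network) that the abstract describes.

```python
# Toy dictionary standing in for terms drawn from domain ontologies.
# All entries are hypothetical examples.
DICTIONARY = {
    "insulin": "protein",
    "glucose": "chemical",
    "lead": "chemical",
    "mouse": "organism",
}


def generate_candidates(tokens):
    """Stage 1 (high recall): every token found in the dictionary is a candidate."""
    return [
        (i, tok, DICTIONARY[tok.lower()])
        for i, tok in enumerate(tokens)
        if tok.lower() in DICTIONARY
    ]


def keep_candidate(tokens, index):
    """Stage 2 stand-in: a hand-written rule where the paper trains a classifier.

    Rejects 'lead' when the next token is 'to', i.e. the likely verb reading.
    """
    token = tokens[index].lower()
    next_token = tokens[index + 1].lower() if index + 1 < len(tokens) else ""
    return not (token == "lead" and next_token == "to")


def recognize(sentence):
    """Run both stages on a whitespace-tokenized sentence."""
    tokens = sentence.split()
    return [
        (tok, label)
        for i, tok, label in generate_candidates(tokens)
        if keep_candidate(tokens, i)
    ]


print(recognize("High glucose levels lead to insulin release"))
print(recognize("Lead poisoning in mouse"))
```

The split mirrors the trade-off reported in the abstract: the lookup stage is deliberately permissive, and precision comes from the second stage discarding candidates that the context does not support.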
42 CFR 455.104 - Disclosure by Medicaid providers and fiscal agents: Information on ownership and control.

Code of Federal Regulations, 2013 CFR

2013-10-01

..., fiscal agents, and managed care entities provide the following disclosures: (1)(i) The name and address... entity, fiscal agent, or managed care entity. The address for corporate entities must include as... disclosing entity as a spouse, parent, child, or sibling. (3) The name of any other disclosing entity (or...
Finding geospatial pattern of unstructured data by clustering routes

NASA Astrophysics Data System (ADS)

Boustani, M.; Mattmann, C. A.; Ramirez, P.; Burke, W.

2016-12-01

Today the majority of data generated has a geospatial context, either in attribute form as a latitude or longitude, as the name of a location, or cross-referenceable using other means such as an external gazetteer or location service. Our research is interested in exploiting geospatial location and context in unstructured data such as that found on the web in HTML pages, images, videos, documents, and other areas, and in structured information repositories found on intranets, in scientific environments, and otherwise. We are working together on the DARPA MEMEX project to exploit open source software tools such as the Lucene Geo Gazetteer, Apache Tika, Apache Lucene, and Apache OpenNLP to automatically extract, and make meaning out of, geospatial information. In particular, we are interested in unstructured descriptors, e.g., a phone number or a named entity, and the ability to automatically learn geospatial paths related to these descriptors. For example, a particular phone number may represent an entity that travels on a monthly basis, according to patterns that are sometimes easy to identify and sometimes more difficult to track. We will present a set of automatic techniques to extract descriptors, and then to geospatially infer their paths across unstructured data.
Segregation of anterior temporal regions critical for retrieving names of unique and non-unique entities reflects underlying long-range connectivity.

PubMed

Mehta, Sonya; Inoue, Kayo; Rudrauf, David; Damasio, Hanna; Tranel, Daniel; Grabowski, Thomas

2016-02-01

Lesion-deficit studies support the hypothesis that the left anterior temporal lobe (ATL) plays a critical role in retrieving names of concrete entities. They further suggest that different regions of the left ATL process different conceptual categories. Here we test the specificity of these relationships and whether the anatomical segregation is related to the underlying organization of white matter connections. We reanalyzed data from a previous lesion study of naming and recognition across five categories of concrete entities. In voxelwise logistic regressions of lesion-deficit associations, we formally incorporated measures of disconnection of long-range association fiber tracts (FTs) and covaried for recognition and non-category-specific naming deficits. We also performed fiber tractwise analyses to assess whether damage to specific FTs was preferentially associated with category-selective naming deficits. Damage to the basolateral ATL was associated with naming deficits for both unique (famous faces) and non-unique entities, whereas the damage to the temporal pole was associated with naming deficits for unique entities only. This segregation pattern remained after accounting for comorbid recognition deficits or naming deficits in other categories. The tractwise analyses showed that damage to the uncinate fasciculus (UNC) was associated with naming impairments for unique entities, while damage to the inferior longitudinal fasciculus (ILF) was associated with naming impairments for non-unique entities. Covarying for FT transection in voxelwise analyses rendered the cortical association for unique entities more focal. These results are consistent with the partial segregation of brain system support for name retrieval of unique and non-unique entities at both the level of cortical components and underlying white matter fiber bundles. 
Our study reconciles theoretic accounts of the functional organization of the left ATL by revealing both category-related processing and semantic hub sectors. Copyright © 2015 Elsevier Ltd. All rights reserved.
Naming unique entities in the semantic variant of primary progressive aphasia and Alzheimer's disease: Towards a better understanding of the semantic impairment.

PubMed

Montembeault, M; Brambati, S M; Joubert, S; Boukadi, M; Chapleau, M; Laforce, R Jr; Wilson, M A; Macoir, J; Rouleau, I

2017-01-27

While the semantic variant of primary progressive aphasia (svPPA) is characterized by a predominant semantic memory impairment, episodic memory impairments are the clinical hallmark of Alzheimer's disease (AD). However, AD patients also present with semantic deficits, which are more severe for semantically unique entities (e.g. a famous person) than for common concepts (e.g. a beaver). Previous studies in these patient populations have largely focused on famous-person naming. Therefore, we aimed to evaluate if these impairments also extend to other semantically unique entities such as famous places and famous logos. In this study, 13 AD patients, 9 svPPA patients, and 12 cognitively unimpaired elderly subjects (CTRL) were tested with a picture-naming test of non-unique entities (Boston Naming Test) and three experimental tests of semantically unique entities assessing naming of famous persons, places, and logos. Both clinical groups were overall more impaired at naming semantically unique entities than non-unique entities. Naming impairments in AD and svPPA extended to the other types of semantically unique entities, since a CTRL>AD>svPPA pattern was found on the performance of all naming tests. Naming famous places and famous persons appeared to be most impaired in svPPA, and both specific and general semantic knowledge for these entities were affected in these patients. Although AD patients were most significantly impaired on famous-person naming, only their specific semantic knowledge was impaired, while general knowledge was preserved. Post-hoc neuroimaging analyses also showed that famous-person naming impairments in AD correlated with atrophy in the temporo-parietal junction, a region functionally associated with lexical access. 
In line with previous studies, svPPA patients' impairment in both naming and semantic knowledge suggests a more profound semantic impairment, while naming impairments in AD may arise to a greater extent from impaired lexical access, even though semantic impairment for specific knowledge is also present. These results highlight the critical importance of developing and using a variety of semantically-unique-entity naming tests in neuropsychological assessments of patients with neurodegenerative diseases, which may unveil different patterns of lexical-semantic deficits. Copyright © 2016 Elsevier Ltd. All rights reserved.
LispSEI: The Programmer’s Manual

DTIC Science & Technology

1988-01-01

defun print-entities (str entities etype) (format t str) (dolist (entity entities) (format t " ~A" (entity-name entity etype)))) (defun entity-name...fields are munged only after the filters are executed. This makes things much easier. ;; Algorithm: (1) get initial list. (2) take out those entities which...don't meet all the constraints. (3) pass the entities list through all the filters. (4) munge the appropriate fields. (5) return the result. (defn s
A stacked sequential learning method for investigator name recognition from web-based medical articles

NASA Astrophysics Data System (ADS)

Zhang, Xiaoli; Zou, Jie; Le, Daniel X.; Thoma, George

2010-01-01

"Investigator Names" is a newly required field in MEDLINE citations. It consists of personal names listed as members of corporate organizations in an article. Extracting investigator names automatically is necessary because of the increasing volume of articles reporting collaborative biomedical research in which a large number of investigators participate. In this paper, we present an SVM-based stacked sequential learning method in a novel application - recognizing named entities such as the first and last names of investigators from online medical journal articles. Stacked sequential learning is a meta-learning algorithm which can boost any base learner. It exploits contextual information by adding the predicted labels of the surrounding tokens as features. We apply this method to tag words in text paragraphs containing investigator names, and demonstrate that stacked sequential learning improves the performance of a nonsequential base learner such as an SVM classifier.
Department of Defense Data Model, Version 1, Fy 1998, Volume 6.

DTIC Science & Technology

1998-05-31

Definition: A REQUIREMENT TO WITHHOLD PAYMENT ON A SPECIFIC CONTRACT. (5104) (1) (A) 138 Entity Report DOD Data Model VI FY98 Attribute Names...424 Entity Report DOD Data Model VI FY98 Entity Name: PAYMENT -MEANS-FINANCIAL-INSTITUTION-ACCOUNT Definition: THE ASSOCIATION OF A FINANCIAL...A) 453 Entity Report DOD Data Model VI FY98 Definition: PETITION FOR PAYMENT PRIOR TO PERFORMANCE BY A PERSONNEL-RESOURCE. Attribute Names
Automatic information extraction from unstructured mammography reports using distributed semantics.

PubMed

Gupta, Anupama; Banerjee, Imon; Rubin, Daniel L

2018-02-01

To date, the methods developed for automated extraction of information from radiology reports are mainly rule-based or dictionary-based and therefore require substantial manual effort to build. Recent efforts to develop automated systems for entity detection have been undertaken, but little work has been done to automatically extract relations and their associated named entities in narrative radiology reports with accuracy comparable to rule-based methods. Our goal is to extract relations in an unsupervised way from radiology reports without specifying prior domain knowledge. We propose a hybrid approach for information extraction that combines dependency-based parse trees with distributed semantics for generating structured information frames about particular findings/abnormalities from free-text mammography reports. The proposed IE system obtains an F1-score of 0.94 in terms of completeness of the content in the information frames, which outperforms a state-of-the-art rule-based system in this domain by a significant margin. The proposed system can be leveraged in a variety of applications, such as decision support and information retrieval, and may also easily scale to other radiology domains, since there is no need to tune the system with hand-crafted information extraction rules. Copyright © 2018 Elsevier Inc. All rights reserved.
10 CFR 300.3 - Guidance for defining and naming the reporting entity.

Code of Federal Regulations, 2013 CFR

2013-01-01

... 10 Energy 3 2013-01-01 2013-01-01 false Guidance for defining and naming the reporting entity. 300.3 Section 300.3 Energy DEPARTMENT OF ENERGY CLIMATE CHANGE VOLUNTARY GREENHOUSE GAS REPORTING PROGRAM: GENERAL GUIDELINES § 300.3 Guidance for defining and naming the reporting entity. (a) A reporting...

10 CFR 300.3 - Guidance for defining and naming the reporting entity.

Code of Federal Regulations, 2012 CFR

2012-01-01

... 10 Energy 3 2012-01-01 2012-01-01 false Guidance for defining and naming the reporting entity. 300.3 Section 300.3 Energy DEPARTMENT OF ENERGY CLIMATE CHANGE VOLUNTARY GREENHOUSE GAS REPORTING PROGRAM: GENERAL GUIDELINES § 300.3 Guidance for defining and naming the reporting entity. (a) A reporting...
10 CFR 300.3 - Guidance for defining and naming the reporting entity.

Code of Federal Regulations, 2011 CFR

2011-01-01

... 10 Energy 3 2011-01-01 2011-01-01 false Guidance for defining and naming the reporting entity. 300.3 Section 300.3 Energy DEPARTMENT OF ENERGY CLIMATE CHANGE VOLUNTARY GREENHOUSE GAS REPORTING PROGRAM: GENERAL GUIDELINES § 300.3 Guidance for defining and naming the reporting entity. (a) A reporting...
10 CFR 300.3 - Guidance for defining and naming the reporting entity.

Code of Federal Regulations, 2014 CFR

2014-01-01

... 10 Energy 3 2014-01-01 2014-01-01 false Guidance for defining and naming the reporting entity. 300.3 Section 300.3 Energy DEPARTMENT OF ENERGY CLIMATE CHANGE VOLUNTARY GREENHOUSE GAS REPORTING PROGRAM: GENERAL GUIDELINES § 300.3 Guidance for defining and naming the reporting entity. (a) A reporting...
10 CFR 300.3 - Guidance for defining and naming the reporting entity.

Code of Federal Regulations, 2010 CFR

2010-01-01

... 10 Energy 3 2010-01-01 2010-01-01 false Guidance for defining and naming the reporting entity. 300.3 Section 300.3 Energy DEPARTMENT OF ENERGY CLIMATE CHANGE VOLUNTARY GREENHOUSE GAS REPORTING PROGRAM: GENERAL GUIDELINES § 300.3 Guidance for defining and naming the reporting entity. (a) A reporting...
The CHEMDNER corpus of chemicals and drugs and its annotation principles.

PubMed

Krallinger, Martin; Rabal, Obdulia; Leitner, Florian; Vazquez, Miguel; Salgado, David; Lu, Zhiyong; Leaman, Robert; Lu, Yanan; Ji, Donghong; Lowe, Daniel M; Sayle, Roger A; Batista-Navarro, Riza Theresa; Rak, Rafal; Huber, Torsten; Rocktäschel, Tim; Matos, Sérgio; Campos, David; Tang, Buzhou; Xu, Hua; Munkhdalai, Tsendsuren; Ryu, Keun Ho; Ramanan, S V; Nathan, Senthil; Žitnik, Slavko; Bajec, Marko; Weber, Lutz; Irmer, Matthias; Akhondi, Saber A; Kors, Jan A; Xu, Shuo; An, Xin; Sikdar, Utpal Kumar; Ekbal, Asif; Yoshioka, Masaharu; Dieb, Thaer M; Choi, Miji; Verspoor, Karin; Khabsa, Madian; Giles, C Lee; Liu, Hongfang; Ravikumar, Komandur Elayavilli; Lamurias, Andre; Couto, Francisco M; Dai, Hong-Jie; Tsai, Richard Tzong-Han; Ata, Caglar; Can, Tolga; Usié, Anabel; Alves, Rui; Segura-Bedmar, Isabel; Martínez, Paloma; Oyarzabal, Julen; Valencia, Alfonso

2015-01-01

The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. The abstracts of the CHEMDNER corpus were selected to be representative for all major chemical disciplines. Each of the chemical entity mentions was manually labeled according to its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic and trivial. The difficulty and consistency of tagging chemicals in text was measured using an agreement study between annotators, obtaining a percentage agreement of 91. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts) we provide not only the Gold Standard manual annotations, but also mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has been generated as well. We propose a standard for required minimum information about entity annotations for the construction of domain specific corpora on chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus/.
The CHEMDNER corpus of chemicals and drugs and its annotation principles

PubMed Central

2015-01-01

The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. The abstracts of the CHEMDNER corpus were selected to be representative for all major chemical disciplines. Each of the chemical entity mentions was manually labeled according to its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic and trivial. The difficulty and consistency of tagging chemicals in text was measured using an agreement study between annotators, obtaining a percentage agreement of 91. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts) we provide not only the Gold Standard manual annotations, but also mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has been generated as well. We propose a standard for required minimum information about entity annotations for the construction of domain specific corpora on chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus/ PMID:25810773
77 FR 61658 - Designation of Two Entities Pursuant to Executive Orders

Federal Register 2010, 2011, 2012, 2013, 2014

2012-10-10

... DEPARTMENT OF THE TREASURY Office of Foreign Assets Control Designation of Two Entities Pursuant... Treasury Department's Office of Foreign Assets Control (``OFAC'') is publishing the names of two entities....'' DATES: The designation by the Director of OFAC of the two entities named in this notice, pursuant to...
Using Ontology Fingerprints to disambiguate gene name entities in the biomedical literature

PubMed Central

Chen, Guocai; Zhao, Jieyi; Cohen, Trevor; Tao, Cui; Sun, Jingchun; Xu, Hua; Bernstam, Elmer V.; Lawson, Andrew; Zeng, Jia; Johnson, Amber M.; Holla, Vijaykumar; Bailey, Ann M.; Lara-Guerra, Humberto; Litzenburger, Beate; Meric-Bernstam, Funda; Jim Zheng, W.

2015-01-01

Ambiguous gene names in the biomedical literature are a barrier to accurate information extraction. To overcome this hurdle, we generated Ontology Fingerprints for selected genes that are relevant for personalized cancer therapy. These Ontology Fingerprints were used to evaluate the association between genes and biomedical literature to disambiguate gene names. We obtained 93.6% precision for the test gene set and 80.4% for the area under a receiver-operating characteristics curve for gene and article association. The core algorithm was implemented using a graphics processing unit-based MapReduce framework to handle big data and to improve performance. We conclude that Ontology Fingerprints can help disambiguate gene names mentioned in text and analyse the association between genes and articles. Database URL: http://www.ontologyfingerprint.org PMID:25858285
Enhancing of chemical compound and drug name recognition using representative tag scheme and fine-grained tokenization.

PubMed

Dai, Hong-Jie; Lai, Po-Ting; Chang, Yung-Chun; Tsai, Richard Tzong-Han

2015-01-01

The functions of chemical compounds and drugs that affect biological processes and their particular effect on the onset and treatment of diseases have attracted increasing interest with the advancement of research in the life sciences. To extract knowledge from the extensive literature on such compounds and drugs, the organizers of BioCreative IV administered the CHEMical Compound and Drug Named Entity Recognition (CHEMDNER) task to establish a standard dataset for evaluating state-of-the-art chemical entity recognition methods. This study introduces the approach of our CHEMDNER system. Instead of emphasizing the development of novel feature sets for machine learning, this study investigates the effect of various tag schemes on the recognition of the names of chemicals and drugs by using conditional random fields. Experiments were conducted using combinations of different tokenization strategies and tag schemes to investigate the effects of tag set selection and tokenization method on the CHEMDNER task. This study presents the CHEMDNER performance of three more representative tag schemes (IOBE, IOBES, and IOB12E) when applied to a widely utilized IOB tag set and combined with the coarse-/fine-grained tokenization methods. The experimental results reveal that the fine-grained tokenization strategy performs best in terms of precision, recall and F-scores when the IOBES tag set is utilized. The IOBES model with fine-grained tokenization yielded the best F-scores in the six chemical entity categories other than the "Multiple" entity category. Nonetheless, no significant improvement was observed when a more representative tag scheme was used with the coarse or fine-grained tokenization rules. The best F-scores that were achieved using the developed system on the test dataset of the CHEMDNER task were 0.833 and 0.815 for the chemical documents indexing and the chemical entity mention recognition tasks, respectively. 
The results herein highlight the importance of tag set selection and the use of different tokenization strategies. Fine-grained tokenization combined with the tag set IOBES most effectively recognizes chemical and drug names. To the best of the authors' knowledge, this investigation is the first comprehensive investigation of the use of various tag set schemes combined with different tokenization strategies for the recognition of chemical entities.
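To make the difference between the tag sets concrete, the sketch below converts a sequence tagged in the common IOB2 convention (B = begin, I = inside, O = outside) into IOBES, which adds E for the last token of a multi-token mention and S for a single-token mention. It is a generic illustration rather than the authors' code; the `CHEM` label and the example sequence are assumptions, and the input is assumed to be well-formed IOB2.

```python
def iob_to_iobes(tags):
    """Convert well-formed IOB2 tags (B-X, I-X, O) to the IOBES scheme."""
    converted = []
    for i, tag in enumerate(tags):
        if tag == "O":
            converted.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The mention continues only if the next tag is I- with the same label.
        continues = next_tag == "I-" + label
        if prefix == "B":
            converted.append(("B-" if continues else "S-") + label)
        else:  # prefix == "I"
            converted.append(("I-" if continues else "E-") + label)
    return converted


# Hypothetical tag sequence: a three-token mention, then a one-token mention.
iob = ["B-CHEM", "I-CHEM", "I-CHEM", "O", "B-CHEM", "O"]
print(iob_to_iobes(iob))
```

The richer scheme carries no new information about mention boundaries; it only makes the end and single-token cases explicit as separate labels for the sequence model.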
A sentence sliding window approach to extract protein annotations from biomedical articles

PubMed Central

Krallinger, Martin; Padron, Maria; Valencia, Alfonso

2005-01-01

Background Within the emerging field of text mining and statistical natural language processing (NLP) applied to biomedical articles, a broad variety of techniques have been developed during the past years. Nevertheless, there is still a great need for comparative assessment of the performance of the proposed methods and for the development of common evaluation criteria. This issue was addressed by the Critical Assessment of Text Mining Methods in Molecular Biology (BioCreative) contest. The aim of this contest was to assess the performance of text mining systems applied to biomedical texts, including tools which recognize named entities such as genes and proteins, and tools which automatically extract protein annotations. Results The "sentence sliding window" approach proposed here was found to efficiently extract text fragments from full text articles containing annotations on proteins, providing the highest number of correctly predicted annotations. Moreover, the number of correct extractions of individual entities (i.e. proteins and GO terms) involved in the relationships used for the annotations was significantly higher than the number of correct extractions of the complete annotations (protein-function relations). Conclusion We explored the use of averaging sentence sliding windows for information extraction, especially in a context where conventional training data is unavailable. The combination of our approach with more refined statistical estimators and machine learning techniques might be a way to improve annotation extraction for future biomedical text mining applications. PMID:15960831
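The abstract does not spell out how the sliding window is scored, so the following is only a minimal sketch under assumed details: a window of `size` consecutive sentences moves over the text, each sentence is scored by the fraction of query terms it contains, and the window with the highest average score is returned. The scoring function, the sentence splitter, the query terms, and the sample text are all hypothetical.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_score(sentence, terms):
    """Fraction of the (lower-case, non-empty) query terms present in the sentence."""
    words = set(re.findall(r"[a-z0-9]+", sentence.lower()))
    return sum(1 for term in terms if term in words) / len(terms)


def best_window(text, terms, size=2):
    """Return the window of `size` sentences with the highest average score."""
    sentences = split_sentences(text)
    if not sentences:
        return "", 0.0
    scores = [sentence_score(s, terms) for s in sentences]
    best_start, best_avg = 0, -1.0
    for start in range(max(1, len(sentences) - size + 1)):
        window = scores[start:start + size]
        avg = sum(window) / len(window)
        if avg > best_avg:
            best_start, best_avg = start, avg
    return " ".join(sentences[best_start:best_start + size]), best_avg


# Made-up passage and query terms (a protein name plus function words).
TEXT = ("We studied yeast cells. The protein Cdc28 is a kinase. "
        "Cdc28 kinase activity regulates the cell cycle. "
        "Samples were stored at low temperature.")
fragment, score = best_window(TEXT, ["cdc28", "kinase", "cycle"])
print(fragment, round(score, 3))
```

Averaging over neighbouring sentences lets a fragment be selected even when the protein and the function term are mentioned in adjacent sentences rather than in the same one, and it needs no labelled training data.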
A modular framework for biomedical concept recognition

PubMed Central

2013-01-01

Background Concept recognition is an essential task in biomedical information extraction, presenting several complex and unsolved challenges. The development of such solutions is typically performed in an ad-hoc manner or using general information extraction frameworks, which are not optimized for the biomedical domain and normally require the integration of complex external libraries and/or the development of custom tools. Results This article presents Neji, an open source framework optimized for biomedical concept recognition built around four key characteristics: modularity, scalability, speed, and usability. It integrates modules for biomedical natural language processing, such as sentence splitting, tokenization, lemmatization, part-of-speech tagging, chunking and dependency parsing. Concept recognition is provided through dictionary matching and machine learning with normalization methods. Neji also integrates an innovative concept tree implementation, supporting overlapped concept names and respective disambiguation techniques. The most popular input and output formats, namely Pubmed XML, IeXML, CoNLL and A1, are also supported. On top of the built-in functionalities, developers and researchers can implement new processing modules or pipelines, or use the provided command-line interface tool to build their own solutions, applying the most appropriate techniques to identify heterogeneous biomedical concepts. 
Neji was evaluated against three gold standard corpora with heterogeneous biomedical concepts (CRAFT, AnEM and NCBI disease corpus), achieving high performance results on named entity recognition (F1-measure for overlap matching: species 95%, cell 92%, cellular components 83%, gene and proteins 76%, chemicals 65%, biological processes and molecular functions 63%, disorders 85%, and anatomical entities 82%) and on entity normalization (F1-measure for overlap name matching and correct identifier included in the returned list of identifiers: species 88%, cell 71%, cellular components 72%, gene and proteins 64%, chemicals 53%, and biological processes and molecular functions 40%). Neji provides fast and multi-threaded data processing, annotating up to 1200 sentences/second when using dictionary-based concept identification. Conclusions Considering the provided features and underlying characteristics, we believe that Neji is an important contribution to the biomedical community, streamlining the development of complex concept recognition solutions. Neji is freely available at http://bioinformatics.ua.pt/neji. PMID:24063607
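The F1 figures above are reported for overlap matching, which is more lenient than exact span matching. The Python sketch below shows one common way to compute both; the precise overlap criterion used in the Neji evaluation is not stated in this abstract, so the definition here (any shared character counts as a match) and the example spans are assumptions.

```python
def overlaps(a, b):
    """True if half-open character spans a and b share at least one character."""
    return a[0] < b[1] and b[0] < a[1]

def f1(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

def overlap_f1(gold, predicted):
    """F1 where a predicted span is correct if it overlaps any gold span."""
    hit_pred = sum(1 for p in predicted if any(overlaps(p, g) for g in gold))
    hit_gold = sum(1 for g in gold if any(overlaps(g, p) for p in predicted))
    precision = hit_pred / len(predicted) if predicted else 0.0
    recall = hit_gold / len(gold) if gold else 0.0
    return f1(precision, recall)

def exact_f1(gold, predicted):
    """F1 where a predicted span is correct only if its boundaries match exactly."""
    hits = len(set(gold) & set(predicted))
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return f1(precision, recall)

# Illustrative spans only, not taken from any corpus.
gold = [(0, 5), (10, 20), (30, 35)]
predicted = [(0, 5), (12, 18), (40, 45)]
print(round(overlap_f1(gold, predicted), 3), round(exact_f1(gold, predicted), 3))
```

Here the prediction (12, 18) counts under overlap matching but not under exact matching, so the overlap F1 is 2/3 while the exact F1 is 1/3.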
Incorporating domain knowledge in chemical and biomedical named entity recognition with word representations.

PubMed

Munkhdalai, Tsendsuren; Li, Meijing; Batsuren, Khuyagbaatar; Park, Hyeon Ah; Choi, Nak Hyeon; Ryu, Keun Ho

2015-01-01

Chemical and biomedical Named Entity Recognition (NER) is an essential prerequisite task before effective text mining can begin for biochemical-text data. Exploiting unlabeled text data to leverage system performance has been an active and challenging research topic in text mining due to the recent growth in the amount of biomedical literature. We present a semi-supervised learning method that efficiently exploits unlabeled data in order to incorporate domain knowledge into a named entity recognition model and to leverage system performance. The proposed method includes Natural Language Processing (NLP) tasks for text preprocessing, learning word representation features from a large amount of text data for feature extraction, and conditional random fields for token classification. Other than the free text in the domain, the proposed method does not rely on any lexicon nor any dictionary in order to keep the system applicable to other NER tasks in bio-text data. We extended BANNER, a biomedical NER system, with the proposed method. This yields an integrated system that can be applied to chemical and drug NER or biomedical NER. We call our branch of the BANNER system BANNER-CHEMDNER, which is scalable over millions of documents, processing about 530 documents per minute, is configurable via XML, and can be plugged into other systems by using the BANNER Unstructured Information Management Architecture (UIMA) interface. BANNER-CHEMDNER achieved an 85.68% and an 86.47% F-measure on the testing sets of CHEMDNER Chemical Entity Mention (CEM) and Chemical Document Indexing (CDI) subtasks, respectively, and achieved an 87.04% F-measure on the official testing set of the BioCreative II gene mention task, showing remarkable performance in both chemical and biomedical NER. BANNER-CHEMDNER system is available at: https://bitbucket.org/tsendeemts/banner-chemdner.
Noun and knowledge retrieval for biological and non-biological entities following right occipitotemporal lesions.

PubMed

Bruffaerts, Rose; De Weer, An-Sofie; De Grauwe, Sophie; Thys, Miek; Dries, Eva; Thijs, Vincent; Sunaert, Stefan; Vandenbulcke, Mathieu; De Deyne, Simon; Storms, Gerrit; Vandenberghe, Rik

2014-09-01

We investigated the critical contribution of right ventral occipitotemporal cortex to knowledge of visual and functional-associative attributes of biological and non-biological entities and how this relates to category-specificity during confrontation naming. In a consecutive series of 7 patients with lesions confined to right ventral occipitotemporal cortex, we conducted an extensive assessment of oral generation of visual-sensory and functional-associative features in response to the names of biological and nonbiological entities. Subjects also performed a confrontation naming task for these categories. Our main novel finding related to a unique case with a small lesion confined to right medial fusiform gyrus who showed disproportionate naming impairment for nonbiological versus biological entities, specifically for tools. Generation of visual and functional-associative features was preserved for biological and non-biological entities. In two other cases, who had a relatively small posterior lesion restricted to primary visual and posterior fusiform cortex, retrieval of visual attributes was disproportionately impaired compared to functional-associative attributes, in particular for biological entities. However, these cases did not show a category-specific naming deficit. Two final cases with the largest lesions showed a classical dissociation between biological versus nonbiological entities during naming, with normal feature generation performance. This is the first lesion-based evidence of a critical contribution of the right medial fusiform cortex to tool naming. Second, dissociations along the dimension of attribute type during feature generation do not co-occur with category-specificity during naming in the current patient sample. Copyright © 2014 Elsevier Ltd. All rights reserved.
Collaborative human-machine analysis to disambiguate entities in unstructured text and structured datasets

NASA Astrophysics Data System (ADS)

Davenport, Jack H.

2016-05-01

Intelligence analysts demand rapid information fusion capabilities to develop and maintain accurate situational awareness and understanding of dynamic enemy threats in asymmetric military operations. The ability to extract relationships between people, groups, and locations from a variety of text datasets is critical to proactive decision making. The derived network of entities must be automatically created and presented to analysts to assist in decision making. DECISIVE ANALYTICS Corporation (DAC) provides capabilities to automatically extract entities, relationships between entities, semantic concepts about entities, and network models of entities from text and multi-source datasets. DAC's Natural Language Processing (NLP) Entity Analytics model entities as complex systems of attributes and interrelationships which are extracted from unstructured text via NLP algorithms. The extracted entities are automatically disambiguated via machine learning algorithms, and resolution recommendations are presented to the analyst for validation; the analyst's expertise is leveraged in this hybrid human/computer collaborative model. Military capability is enhanced by these NLP Entity Analytics because analysts can now create/update an entity profile with intelligence automatically extracted from unstructured text, thereby fusing entity knowledge from structured and unstructured data sources. Operational and sustainment costs are reduced since analysts do not have to manually tag and resolve entities.
Extracting and standardizing medication information in clinical text - the MedEx-UIMA system.

PubMed

Jiang, Min; Wu, Yonghui; Shah, Anushi; Priyanka, Priyanka; Denny, Joshua C; Xu, Hua

2014-01-01

Extraction of medication information embedded in clinical text is important for research using electronic health records (EHRs). However, most current medication information extraction systems identify drug and signature entities without mapping them to a standard representation. In this study, we introduced the open source Java implementation of MedEx, an existing high-performance medication information extraction system, based on the Unstructured Information Management Architecture (UIMA) framework. In addition, we developed new encoding modules in the MedEx-UIMA system, which mapped an extracted drug name/dose/form to both generalized and specific RxNorm concepts and translated drug frequency information to ISO standard. We processed 826 documents by both systems and verified that MedEx-UIMA and MedEx (the Python version) performed similarly by comparing both results. Using two manually annotated test sets that contained 300 drug entries from medication list and 300 drug entries from narrative reports, the MedEx-UIMA system achieved F-measures of 98.5% and 97.5% respectively for encoding drug names to corresponding RxNorm generic drug ingredients, and F-measures of 85.4% and 88.1% respectively for mapping drug names/dose/form to the most specific RxNorm concepts. It also achieved an F-measure of 90.4% for normalizing frequency information to ISO standard. The open source MedEx-UIMA system is freely available online at http://code.google.com/p/medex-uima/.
Using Ontology Fingerprints to disambiguate gene name entities in the biomedical literature.

PubMed

Chen, Guocai; Zhao, Jieyi; Cohen, Trevor; Tao, Cui; Sun, Jingchun; Xu, Hua; Bernstam, Elmer V; Lawson, Andrew; Zeng, Jia; Johnson, Amber M; Holla, Vijaykumar; Bailey, Ann M; Lara-Guerra, Humberto; Litzenburger, Beate; Meric-Bernstam, Funda; Jim Zheng, W

2015-01-01

Ambiguous gene names in the biomedical literature are a barrier to accurate information extraction. To overcome this hurdle, we generated Ontology Fingerprints for selected genes that are relevant for personalized cancer therapy. These Ontology Fingerprints were used to evaluate the association between genes and biomedical literature to disambiguate gene names. We obtained 93.6% precision for the test gene set and 80.4% for the area under a receiver-operating characteristics curve for gene and article association. The core algorithm was implemented using a graphics processing unit-based MapReduce framework to handle big data and to improve performance. We conclude that Ontology Fingerprints can help disambiguate gene names mentioned in text and analyse the association between genes and articles. Database URL: http://www.ontologyfingerprint.org © The Author(s) 2015. Published by Oxford University Press.
Exploring Social Meaning in Online Bilingual Text through Social Network Analysis

DTIC Science & Technology

2015-09-01

p. 1). 30 GATE development began in 1995. As techniques for natural language processing (NLP) are investigated by the research community and...become part of the NLP repertoire, developers incorporate them with wrappers, which allow the output from GATE processes to be recognized as input by...University NEE Named Entity Extraction NLP natural language processing OSD Office of the Secretary of Defense POS parts of speech SBIR Small Business
Sortal anaphora resolution to enhance relation extraction from biomedical literature.

PubMed

Kilicoglu, Halil; Rosemblat, Graciela; Fiszman, Marcelo; Rindflesch, Thomas C

2016-04-14

Entity coreference is common in biomedical literature and it can affect text understanding systems that rely on accurate identification of named entities, such as relation extraction and automatic summarization. Coreference resolution is a foundational yet challenging natural language processing task which, if performed successfully, is likely to enhance such systems significantly. In this paper, we propose a semantically oriented, rule-based method to resolve sortal anaphora, a specific type of coreference that forms the majority of coreference instances in biomedical literature. The method addresses all entity types and relies on linguistic components of SemRep, a broad-coverage biomedical relation extraction system. It has been incorporated into SemRep, extending its core semantic interpretation capability from sentence level to discourse level. We evaluated our sortal anaphora resolution method in several ways. The first evaluation specifically focused on sortal anaphora relations. Our methodology achieved an F1 score of 59.6 on the test portion of a manually annotated corpus of 320 Medline abstracts, a 4-fold improvement over the baseline method. Investigating the impact of sortal anaphora resolution on relation extraction, we found that the overall effect was positive, with 50% of the changes involving uninformative relations being replaced by more specific and informative ones, while 35% of the changes had no effect, and only 15% were negative. We estimate that anaphora resolution results in changes in about 1.5% of approximately 82 million semantic relations extracted from the entire PubMed. Our results demonstrate that a heavily semantic approach to sortal anaphora resolution is largely effective for biomedical literature. Our evaluation and error analysis highlight some areas for further improvements, such as coordination processing and intra-sentential antecedent selection.
Using nanoinformatics methods for automatically identifying relevant nanotoxicology entities from the literature.

PubMed

García-Remesal, Miguel; García-Ruiz, Alejandro; Pérez-Rey, David; de la Iglesia, Diana; Maojo, Víctor

2013-01-01

Nanoinformatics is an emerging research field that uses informatics techniques to collect, process, store, and retrieve data, information, and knowledge on nanoparticles, nanomaterials, and nanodevices and their potential applications in health care. In this paper, we have focused on the solutions that nanoinformatics can provide to facilitate nanotoxicology research. For this, we have taken a computational approach to automatically recognize and extract nanotoxicology-related entities from the scientific literature. The desired entities belong to four different categories: nanoparticles, routes of exposure, toxic effects, and targets. The entity recognizer was trained using a corpus that we specifically created for this purpose and was validated by two nanomedicine/nanotoxicology experts. We evaluated the performance of our entity recognizer using 10-fold cross-validation. The precisions range from 87.6% (targets) to 93.0% (routes of exposure), while recall values range from 82.6% (routes of exposure) to 87.4% (toxic effects). These results prove the feasibility of using computational approaches to reliably perform different named entity recognition (NER)-dependent tasks, such as for instance augmented reading or semantic searches. This research is a "proof of concept" that can be expanded to stimulate further developments that could assist researchers in managing data, information, and knowledge at the nanolevel, thus accelerating research in nanomedicine.
CD-REST: a system for extracting chemical-induced disease relation in literature.

PubMed

Xu, Jun; Wu, Yonghui; Zhang, Yaoyun; Wang, Jingqi; Lee, Hee-Jin; Xu, Hua

2016-01-01

Mining chemical-induced disease relations embedded in the vast biomedical literature could facilitate a wide range of computational biomedical applications, such as pharmacovigilance. The BioCreative V organized a Chemical Disease Relation (CDR) Track regarding chemical-induced disease relation extraction from biomedical literature in 2015. We participated in all subtasks of this challenge. In this article, we present our participation system Chemical Disease Relation Extraction SysTem (CD-REST), an end-to-end system for extracting chemical-induced disease relations in biomedical literature. CD-REST consists of two main components: (1) a chemical and disease named entity recognition and normalization module, which employs the Conditional Random Fields algorithm for entity recognition and a Vector Space Model-based approach for normalization; and (2) a relation extraction module that classifies both sentence-level and document-level candidate drug-disease pairs by support vector machines. Our system achieved the best performance on the chemical-induced disease relation extraction subtask in the BioCreative V CDR Track, demonstrating the effectiveness of our proposed machine learning-based approaches for automatic extraction of chemical-induced disease relations in biomedical literature. The CD-REST system provides web services using HTTP POST request. The web services can be accessed from http://clinicalnlptool.com/cdr. The online CD-REST demonstration system is available at http://clinicalnlptool.com/cdr/cdr.html. Database URL: http://clinicalnlptool.com/cdr; http://clinicalnlptool.com/cdr/cdr.html. © The Author(s) 2016. Published by Oxford University Press.

Context and Domain Knowledge Enhanced Entity Spotting in Informal Text

NASA Astrophysics Data System (ADS)

Gruhl, Daniel; Nagarajan, Meena; Pieper, Jan; Robson, Christine; Sheth, Amit

This paper explores the application of restricted relationship graphs (RDF) and statistical NLP techniques to improve named entity annotation in challenging Informal English domains. We validate our approach using on-line forums discussing popular music. Named entity annotation is particularly difficult in this domain because it is characterized by a large number of ambiguous entities, such as the Madonna album "Music" or Lily Allen's pop hit "Smile".
Automatic recognition of disorders, findings, pharmaceuticals and body structures from clinical text: an annotation and machine learning study.

PubMed

Skeppstedt, Maria; Kvist, Maria; Nilsson, Gunnar H; Dalianis, Hercules

2014-06-01

Automatic recognition of clinical entities in the narrative text of health records is useful for constructing applications for documentation of patient care, as well as for secondary usage in the form of medical knowledge extraction. There are a number of named entity recognition studies on English clinical text, but less work has been carried out on clinical text in other languages. This study was performed on Swedish health records, and focused on four entities that are highly relevant for constructing a patient overview and for medical hypothesis generation, namely the entities: Disorder, Finding, Pharmaceutical Drug and Body Structure. The study had two aims: to explore how well named entity recognition methods previously applied to English clinical text perform on similar texts written in Swedish; and to evaluate whether it is meaningful to divide the more general category Medical Problem, which has been used in a number of previous studies, into the two more granular entities, Disorder and Finding. Clinical notes from a Swedish internal medicine emergency unit were annotated for the four selected entity categories, and the inter-annotator agreement between two pairs of annotators was measured, resulting in an average F-score of 0.79 for Disorder, 0.66 for Finding, 0.90 for Pharmaceutical Drug and 0.80 for Body Structure. A subset of the developed corpus was thereafter used for finding suitable features for training a conditional random fields model. Finally, a new model was trained on this subset, using the best features and settings, and its ability to generalise to held-out data was evaluated. This final model obtained an F-score of 0.81 for Disorder, 0.69 for Finding, 0.88 for Pharmaceutical Drug, 0.85 for Body Structure and 0.78 for the combined category Disorder+Finding. 
The obtained results, which are in line with or slightly lower than those for similar studies on English clinical text, many of them conducted using a larger training data set, show that the approaches used for English are also suitable for Swedish clinical text. However, a small proportion of the errors made by the model are less likely to occur in English text, showing that results might be improved by further tailoring the system to clinical Swedish. The entity recognition results for the individual entities Disorder and Finding show that it is meaningful to separate the general category Medical Problem into these two more granular entity types, e.g. for knowledge mining of co-morbidity relations and disorder-finding relations. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
Transfer learning for biomedical named entity recognition with neural networks.

PubMed

Giorgi, John M; Bader, Gary D

2018-06-01

The explosive increase of biomedical literature has made information extraction an increasingly important tool for biomedical research. A fundamental task is the recognition of biomedical named entities in text (BNER) such as genes/proteins, diseases, and species. Recently, a domain-independent method based on deep learning and statistical word embeddings, called long short-term memory network-conditional random field (LSTM-CRF), has been shown to outperform state-of-the-art entity-specific BNER tools. However, this method is dependent on gold-standard corpora (GSCs) consisting of hand-labeled entities, which tend to be small but highly reliable. An alternative to GSCs are silver-standard corpora (SSCs), which are generated by harmonizing the annotations made by several automatic annotation systems. SSCs typically contain more noise than GSCs but have the advantage of containing many more training examples. Ideally, these corpora could be combined to achieve the benefits of both, which is an opportunity for transfer learning. In this work, we analyze to what extent transfer learning improves upon state-of-the-art results for BNER. We demonstrate that transferring a deep neural network (DNN) trained on a large, noisy SSC to a smaller, but more reliable GSC significantly improves upon state-of-the-art results for BNER. Compared to a state-of-the-art baseline evaluated on 23 GSCs covering four different entity classes, transfer learning results in an average reduction in error of approximately 11%. We found transfer learning to be especially beneficial for target data sets with a small number of labels (approximately 6000 or less). Source code for the LSTM-CRF is available at https://github.com/Franck-Dernoncourt/NeuroNER/ and links to the corpora are available at https://github.com/BaderLab/Transfer-Learning-BNER-Bioinformatics-2018/. john.giorgi@utoronto.ca. Supplementary data are available at Bioinformatics online.
Extracting and standardizing medication information in clinical text – the MedEx-UIMA system

PubMed Central

Jiang, Min; Wu, Yonghui; Shah, Anushi; Priyanka, Priyanka; Denny, Joshua C.; Xu, Hua

2014-01-01

Extraction of medication information embedded in clinical text is important for research using electronic health records (EHRs). However, most current medication information extraction systems identify drug and signature entities without mapping them to a standard representation. In this study, we introduced the open source Java implementation of MedEx, an existing high-performance medication information extraction system, based on the Unstructured Information Management Architecture (UIMA) framework. In addition, we developed new encoding modules in the MedEx-UIMA system, which mapped an extracted drug name/dose/form to both generalized and specific RxNorm concepts and translated drug frequency information to ISO standard. We processed 826 documents by both systems and verified that MedEx-UIMA and MedEx (the Python version) performed similarly by comparing both results. Using two manually annotated test sets that contained 300 drug entries from medication list and 300 drug entries from narrative reports, the MedEx-UIMA system achieved F-measures of 98.5% and 97.5% respectively for encoding drug names to corresponding RxNorm generic drug ingredients, and F-measures of 85.4% and 88.1% respectively for mapping drug names/dose/form to the most specific RxNorm concepts. It also achieved an F-measure of 90.4% for normalizing frequency information to ISO standard. The open source MedEx-UIMA system is freely available online at http://code.google.com/p/medex-uima/. PMID:25954575
Extraction of Pharmacokinetic Evidence of Drug–Drug Interactions from the Literature

PubMed Central

Kolchinsky, Artemy; Lourenço, Anália; Wu, Heng-Yi; Li, Lang; Rocha, Luis M.

2015-01-01

Drug-drug interaction (DDI) is a major cause of morbidity and mortality and a subject of intense scientific interest. Biomedical literature mining can aid DDI research by extracting evidence for large numbers of potential interactions from published literature and clinical databases. Though DDI is investigated in domains ranging in scale from intracellular biochemistry to human populations, literature mining has not been used to extract specific types of experimental evidence, which are reported differently for distinct experimental goals. We focus on pharmacokinetic evidence for DDI, essential for identifying causal mechanisms of putative interactions and as input for further pharmacological and pharmacoepidemiology investigations. We used manually curated corpora of PubMed abstracts and annotated sentences to evaluate the efficacy of literature mining on two tasks: first, identifying PubMed abstracts containing pharmacokinetic evidence of DDIs; second, extracting sentences containing such evidence from abstracts. We implemented a text mining pipeline and evaluated it using several linear classifiers and a variety of feature transforms. The most important textual features in the abstract and sentence classification tasks were analyzed. We also investigated the performance benefits of using features derived from PubMed metadata fields, various publicly available named entity recognizers, and pharmacokinetic dictionaries. Several classifiers performed very well in distinguishing relevant and irrelevant abstracts (reaching F1≈0.93, MCC≈0.74, iAUC≈0.99) and sentences (F1≈0.76, MCC≈0.65, iAUC≈0.83). We found that word bigram features were important for achieving optimal classifier performance and that features derived from Medical Subject Headings (MeSH) terms significantly improved abstract classification. 
We also found that some drug-related named entity recognition tools and dictionaries led to slight but significant improvements, especially in classification of evidence sentences. Based on our thorough analysis of classifiers and feature transforms and the high classification performance achieved, we demonstrate that literature mining can aid DDI discovery by supporting automatic extraction of specific types of experimental evidence. PMID:25961290
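Because the abstract reports both F1 and the Matthews correlation coefficient (MCC), a short Python sketch of how the two are computed from a confusion matrix may help in reading the numbers. The counts below are invented for illustration and are not results from the study.

```python
import math

def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); true negatives play no role."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; uses all four cells of the matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Illustrative counts (assumptions): same TP/FP/FN, different numbers of
# true negatives, to show that MCC reacts to class balance while F1 does not.
balanced = dict(tp=90, fp=10, fn=10, tn=90)
skewed = dict(tp=90, fp=10, fn=10, tn=10)

for name, c in (("balanced", balanced), ("skewed", skewed)):
    print(name,
          round(f1_score(c["tp"], c["fp"], c["fn"]), 2),
          round(mcc(c["tp"], c["fp"], c["fn"], c["tn"]), 2))
```

Both settings give F1 = 0.90, but MCC drops from 0.80 to 0.40 when few negatives are classified correctly, which is one reason an MCC can sit well below the F1 reported for the same task.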
12 CFR 1010.208 - General information.

Code of Federal Regulations, 2012 CFR

2012-01-01

... owner or developer are corporate entities, name the parent and/or corporate entity and state the... registration or prohibited sales, name the state involved and give the reasons cited by the state for their... made with the SEC, give the SEC identification number; identify the prospectus by name; date of filing...
12 CFR 1010.208 - General information.

Code of Federal Regulations, 2013 CFR

2013-01-01

... owner or developer are corporate entities, name the parent and/or corporate entity and state the... registration or prohibited sales, name the state involved and give the reasons cited by the state for their... made with the SEC, give the SEC identification number; identify the prospectus by name; date of filing...
Wnt pathway curation using automated natural language processing: combining statistical methods with partial and full parse for knowledge extraction.

PubMed

Santos, Carlos; Eggle, Daniela; States, David J

2005-04-15

Wnt signaling is a very active area of research with highly relevant publications appearing at a rate of more than one per day. Building and maintaining databases describing signal transduction networks is a time-consuming and demanding task that requires careful literature analysis and extensive domain-specific knowledge. For instance, more than 50 factors involved in Wnt signal transduction have been identified as of late 2003. In this work we describe a natural language processing (NLP) system that is able to identify references to biological interaction networks in free text and automatically assembles a protein association and interaction map. A 'gold standard' set of names and assertions was derived by manual scanning of the Wnt genes website (http://www.stanford.edu/~rnusse/wntwindow.html) including 53 interactions involved in Wnt signaling. This system was used to analyze a corpus of peer-reviewed articles related to Wnt signaling including 3369 Pubmed and 1230 full text papers. Names for key Wnt-pathway associated proteins and biological entities are identified using a chi-squared analysis of noun phrases over-represented in the Wnt literature as compared to the general signal transduction literature. Interestingly, we identified several instances where generic terms were used on the website when more specific terms occur in the literature, and one typographic error on the Wnt canonical pathway. Using the named entity list and performing an exhaustive assertion extraction of the corpus, 34 of the 53 interactions in the 'gold standard' Wnt signaling set were successfully identified (64% recall). In addition, the automated extraction found several interactions involving key Wnt-related molecules which were missing or different from those in the canonical diagram, and these were confirmed by manual review of the text. 
These results suggest that a combination of NLP techniques for information extraction can form a useful first-pass tool for assisting human annotation and maintenance of signal pathway databases. The pipeline software components are freely available on request to the authors. dstates@umich.edu http://stateslab.bioinformatics.med.umich.edu/software.html.
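Two quantities in this abstract lend themselves to a small worked example in Python: the reported recall (34 of 53 gold standard interactions) and a chi-squared test for whether a noun phrase is over-represented in one corpus relative to another. The abstract does not describe how the statistic was set up, so the 2x2 contingency layout and the phrase counts below are assumptions made for illustration.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for a 2x2 table:
         a = occurrences of the phrase in the target corpus
         b = all other phrase occurrences in the target corpus
         c = occurrences of the phrase in the background corpus
         d = all other phrase occurrences in the background corpus
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def is_over_represented(a, b, c, d, critical=3.841):
    """True if the phrase is relatively more frequent in the target corpus
    and the statistic exceeds the critical value (3.841 corresponds to
    p = 0.05 with one degree of freedom)."""
    more_frequent = a * (c + d) > c * (a + b)
    return more_frequent and chi_squared_2x2(a, b, c, d) > critical

# Hypothetical counts: a phrase seen 30 times per 1000 phrases in the
# target corpus versus 10 times per 1000 in the background corpus.
stat = chi_squared_2x2(30, 970, 10, 990)
print(round(stat, 3), is_over_represented(30, 970, 10, 990))

# Recall as reported in the abstract: 34 of 53 interactions recovered.
recall = 34 / 53
print(f"{recall:.0%}")
```

The recall line reproduces the 64% figure from the abstract. The chi-squared part is only a sketch of the general technique applied to made-up counts.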
2 CFR 170.110 - Types of entities to which this part applies.

Code of Federal Regulations, 2014 CFR

2014-01-01

... 2 Grants and Agreements 1 2014-01-01 2014-01-01 false Types of entities to which this part applies... or receive agency awards; or (2) Receive subawards under those awards. (b) Exceptions. (1) None of... her name). (2) None of the requirements regarding reporting names and total compensation of an entity...
2 CFR 170.110 - Types of entities to which this part applies.

Code of Federal Regulations, 2011 CFR

2011-01-01

... 2 Grants and Agreements 1 2011-01-01 2011-01-01 false Types of entities to which this part applies... or receive agency awards; or (2) Receive subawards under those awards. (b) Exceptions. (1) None of... her name). (2) None of the requirements regarding reporting names and total compensation of an entity...
A neural joint model for entity and relation extraction from biomedical text.

PubMed

Li, Fei; Zhang, Meishan; Fu, Guohong; Ji, Donghong

2017-03-31

Extracting biomedical entities and their relations from text has important applications in biomedical research. Previous work primarily utilized feature-based pipeline models to process this task. Feature-based models require considerable effort on feature engineering. Moreover, pipeline models may suffer from error propagation and are not able to utilize the interactions between subtasks. Therefore, we propose a neural joint model that extracts biomedical entities and their relations simultaneously and can alleviate the problems above. Our model was evaluated on two tasks, i.e., the task of extracting adverse drug events between drug and disease entities, and the task of extracting resident relations between bacteria and location entities. Compared with the state-of-the-art systems in these tasks, our model improved the F1 scores of the first task by 5.1% in entity recognition and 8.0% in relation extraction, and that of the second task by 9.2% in relation extraction. The proposed model achieves competitive performance with less work on feature engineering. We demonstrate that the model based on neural networks is effective for biomedical entity and relation extraction. In addition, parameter sharing is an alternative method for neural models to jointly process this task. Our work can facilitate the research on biomedical text mining.
Improving the dictionary lookup approach for disease normalization using enhanced dictionary and query expansion

PubMed Central

Jonnagaddala, Jitendra; Jue, Toni Rose; Chang, Nai-Wen; Dai, Hong-Jie

2016-01-01

The rapidly increasing biomedical literature calls for the need of an automatic approach in the recognition and normalization of disease mentions in order to increase the precision and effectivity of disease based information retrieval. A variety of methods have been proposed to deal with the problem of disease named entity recognition and normalization. Among all the proposed methods, conditional random fields (CRFs) and dictionary lookup method are widely used for named entity recognition and normalization respectively. We herein developed a CRF-based model to allow automated recognition of disease mentions, and studied the effect of various techniques in improving the normalization results based on the dictionary lookup approach. The dataset from the BioCreative V CDR track was used to report the performance of the developed normalization methods and compare with other existing dictionary lookup based normalization methods. The best configuration achieved an F-measure of 0.77 for the disease normalization, which outperformed the best dictionary lookup based baseline method studied in this work by an F-measure of 0.13. Database URL: https://github.com/TCRNBioinformatics/DiseaseExtract PMID:27504009
EXTRACT: interactive extraction of environment metadata and term suggestion for metagenomic sample annotation.

PubMed

Pafilis, Evangelos; Buttigieg, Pier Luigi; Ferrell, Barbra; Pereira, Emiliano; Schnetzer, Julia; Arvanitidis, Christos; Jensen, Lars Juhl

2016-01-01

The microbial and molecular ecology research communities have made substantial progress on developing standards for annotating samples with environment metadata. However, sample manual annotation is a highly labor intensive process and requires familiarity with the terminologies used. We have therefore developed an interactive annotation tool, EXTRACT, which helps curators identify and extract standard-compliant terms for annotation of metagenomic records and other samples. Behind its web-based user interface, the system combines published methods for named entity recognition of environment, organism, tissue and disease terms. The evaluators in the BioCreative V Interactive Annotation Task found the system to be intuitive, useful, well documented and sufficiently accurate to be helpful in spotting relevant text passages and extracting organism and environment terms. Comparison of fully manual and text-mining-assisted curation revealed that EXTRACT speeds up annotation by 15-25% and helps curators to detect terms that would otherwise have been missed. Database URL: https://extract.hcmr.gr/. © The Author(s) 2016. Published by Oxford University Press.
Update from the 4th Edition of the World Health Organization Classification of Head and Neck Tumours: Tumors of the Salivary Gland.

PubMed

Seethala, Raja R; Stenman, Göran

2017-03-01

The salivary gland section in the 4th edition of the World Health Organization classification of head and neck tumors features the description and inclusion of several entities, the most significant of which is represented by (mammary analogue) secretory carcinoma. This entity was extracted mainly from acinic cell carcinoma based on recapitulation of breast secretory carcinoma and a shared ETV6-NTRK3 gene fusion. Also new is the subsection of "Other epithelial lesions," for which key entities include sclerosing polycystic adenosis and intercalated duct hyperplasia. Many entities have been compressed into their broader categories given clinical and morphologic similarities, or transitioned to a different grouping as was the case with low-grade cribriform cystadenocarcinoma reclassified as intraductal carcinoma (with the applied qualifier of low-grade). Specific grade has been removed from the names of the salivary gland entities such as polymorphous adenocarcinoma, providing pathologists flexibility in assigning grade and allowing for recognition of a broader spectrum within an entity. Cribriform adenocarcinoma of (minor) salivary gland origin continues to be divisive in terms of whether it should be recognized as a distinct category. This chapter also features new key concepts such as high-grade transformation. The new paradigm of translocations and gene fusions being common in salivary gland tumors is featured heavily in this chapter.
17 CFR 229.1107 - (Item 1107) Issuing entities.

Code of Federal Regulations, 2011 CFR

2011-04-01

... 17 Commodity and Securities Exchanges 2 2011-04-01 2011-04-01 false (Item 1107) Issuing entities....1107 (Item 1107) Issuing entities. Provide the following information about the issuing entity: (a) State the issuing entity's name and describe the issuing entity's form of organization, including the...
17 CFR 229.1107 - (Item 1107) Issuing entities.

Code of Federal Regulations, 2010 CFR

2010-04-01

... 17 Commodity and Securities Exchanges 2 2010-04-01 2010-04-01 false (Item 1107) Issuing entities....1107 (Item 1107) Issuing entities. Provide the following information about the issuing entity: (a) State the issuing entity's name and describe the issuing entity's form of organization, including the...
Medical Named Entity Recognition for Indonesian Language Using Word Representations

NASA Astrophysics Data System (ADS)

Rahman, Arief

2018-03-01

Nowadays, Named Entity Recognition (NER) system is used in medical texts to obtain important medical information, like diseases, symptoms, and drugs. While most NER systems are applied to formal medical texts, informal ones like those from social media (also called semi-formal texts) are starting to get recognition as a gold mine for medical information. We propose a theoretical Named Entity Recognition (NER) model for semi-formal medical texts in our medical knowledge management system by comparing two kinds of word representations: cluster-based word representation and distributed representation.
Identifying non-elliptical entity mentions in a coordinated NP with ellipses.

PubMed

Chae, Jeongmin; Jung, Younghee; Lee, Taemin; Jung, Soonyoung; Huh, Chan; Kim, Gilhan; Kim, Hyeoncheol; Oh, Heungbum

2014-02-01

Named entities in the biomedical domain are often written using a Noun Phrase (NP) along with a coordinating conjunction such as 'and' and 'or'. In addition, repeated words among named entity mentions are frequently omitted. It is often difficult to identify named entities. Although various Named Entity Recognition (NER) methods have tried to solve this problem, these methods can only deal with relatively simple elliptical patterns in coordinated NPs. We propose a new NER method for identifying non-elliptical entity mentions with simple or complex ellipses using linguistic rules and an entity mention dictionary. The GENIA and CRAFT corpora were used to evaluate the performance of the proposed system. The GENIA corpus was used to evaluate the performance of the system according to the quality of the dictionary. The GENIA corpus comprises 3434 non-elliptical entity mentions in 1585 coordinated NPs with ellipses. The system achieves 92.11% precision, 95.20% recall, and 93.63% F-score in identification of non-elliptical entity mentions in coordinated NPs. The accuracy of the system in resolving simple and complex ellipses is 94.54% and 91.95%, respectively. The CRAFT corpus was used to evaluate the performance of the system under realistic conditions. The system achieved 78.47% precision, 67.10% recall, and 72.34% F-score in coordinated NPs. The performance evaluations of the system show that it efficiently solves the problem caused by ellipses, and improves NER performance. The algorithm is implemented in PHP and the code can be downloaded from https://code.google.com/p/medtextmining/. Copyright © 2013. Published by Elsevier Inc.
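The precision, recall, and F-score figures in the record above can be cross-checked against one another. The following is a minimal sketch, assuming the reported F-score is the balanced F1 (the harmonic mean of precision and recall); the abstract does not state which F-measure was used, and the function name `f_score` is illustrative only.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; with beta == 1 this is the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Precision/recall pairs (in percent) as reported in the abstract above.
genia_f = f_score(92.11, 95.20)
craft_f = f_score(78.47, 67.10)

print(round(genia_f, 2), round(craft_f, 2))
```

Under the F1 assumption, both computed values agree with the F-scores reported for the GENIA and CRAFT evaluations to two decimal places.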
Recognition of chemical entities: combining dictionary-based and grammar-based approaches.

PubMed

Akhondi, Saber A; Hettne, Kristina M; van der Horst, Eelke; van Mulligen, Erik M; Kors, Jan A

2015-01-01

The past decade has seen an upsurge in the number of publications in chemistry. The ever-swelling volume of available documents makes it increasingly hard to extract relevant new information from such unstructured texts. The BioCreative CHEMDNER challenge invites the development of systems for the automatic recognition of chemicals in text (CEM task) and for ranking the recognized compounds at the document level (CDI task). We investigated an ensemble approach where dictionary-based named entity recognition is used along with grammar-based recognizers to extract compounds from text. We assessed the performance of ten different commercial and publicly available lexical resources using an open source indexing system (Peregrine), in combination with three different chemical compound recognizers and a set of regular expressions to recognize chemical database identifiers. The effect of different stop-word lists, case-sensitivity matching, and use of chunking information was also investigated. We focused on lexical resources that provide chemical structure information. To rank the different compounds found in a text, we used a term confidence score based on the normalized ratio of the term frequencies in chemical and non-chemical journals. The use of stop-word lists greatly improved the performance of the dictionary-based recognition, but there was no additional benefit from using chunking information. A combination of ChEBI and HMDB as lexical resources, the LeadMine tool for grammar-based recognition, and the regular expressions, outperformed any of the individual systems. On the test set, the F-scores were 77.8% (recall 71.2%, precision 85.8%) for the CEM task and 77.6% (recall 71.7%, precision 84.6%) for the CDI task. Missed terms were mainly due to tokenization issues, poor recognition of formulas, and term conjunctions. 
We developed an ensemble system that combines dictionary-based and grammar-based approaches for chemical named entity recognition, outperforming any of the individual systems that we considered. The system is able to provide structure information for most of the compounds that are found. Improved tokenization and better recognition of specific entity types is likely to further improve system performance.
Recognition of chemical entities: combining dictionary-based and grammar-based approaches

PubMed Central

2015-01-01

Background: The past decade has seen an upsurge in the number of publications in chemistry. The ever-swelling volume of available documents makes it increasingly hard to extract relevant new information from such unstructured texts. The BioCreative CHEMDNER challenge invites the development of systems for the automatic recognition of chemicals in text (CEM task) and for ranking the recognized compounds at the document level (CDI task). We investigated an ensemble approach where dictionary-based named entity recognition is used along with grammar-based recognizers to extract compounds from text. We assessed the performance of ten different commercial and publicly available lexical resources using an open source indexing system (Peregrine), in combination with three different chemical compound recognizers and a set of regular expressions to recognize chemical database identifiers. The effect of different stop-word lists, case-sensitivity matching, and use of chunking information was also investigated. We focused on lexical resources that provide chemical structure information. To rank the different compounds found in a text, we used a term confidence score based on the normalized ratio of the term frequencies in chemical and non-chemical journals. Results: The use of stop-word lists greatly improved the performance of the dictionary-based recognition, but there was no additional benefit from using chunking information. A combination of ChEBI and HMDB as lexical resources, the LeadMine tool for grammar-based recognition, and the regular expressions, outperformed any of the individual systems. On the test set, the F-scores were 77.8% (recall 71.2%, precision 85.8%) for the CEM task and 77.6% (recall 71.7%, precision 84.6%) for the CDI task. Missed terms were mainly due to tokenization issues, poor recognition of formulas, and term conjunctions.
Conclusions: We developed an ensemble system that combines dictionary-based and grammar-based approaches for chemical named entity recognition, outperforming any of the individual systems that we considered. The system is able to provide structure information for most of the compounds that are found. Improved tokenization and better recognition of specific entity types is likely to further improve system performance. PMID:25810767

Leveraging Pattern Semantics for Extracting Entities in Enterprises

PubMed Central

Tao, Fangbo; Zhao, Bo; Fuxman, Ariel; Li, Yang; Han, Jiawei

2015-01-01

Entity Extraction is a process of identifying meaningful entities from text documents. In enterprises, extracting entities improves enterprise efficiency by facilitating numerous applications, including search, recommendation, etc. However, the problem is particularly challenging on enterprise domains due to several reasons. First, the lack of redundancy of enterprise entities makes previous web-based systems like NELL and OpenIE not effective, since using only high-precision/low-recall patterns like those systems would miss the majority of sparse enterprise entities, while using more low-precision patterns in sparse setting also introduces noise drastically. Second, semantic drift is common in enterprises (“Blue” refers to “Windows Blue”), such that public signals from the web cannot be directly applied on entities. Moreover, many internal entities never appear on the web. Sparse internal signals are the only source for discovering them. To address these challenges, we propose an end-to-end framework for extracting entities in enterprises, taking the input of enterprise corpus and limited seeds to generate a high-quality entity collection as output. We introduce the novel concept of Semantic Pattern Graph to leverage public signals to understand the underlying semantics of lexical patterns, reinforce pattern evaluation using mined semantics, and yield more accurate and complete entities. Experiments on Microsoft enterprise data show the effectiveness of our approach. PMID:26705540
Leveraging Pattern Semantics for Extracting Entities in Enterprises.

PubMed

Tao, Fangbo; Zhao, Bo; Fuxman, Ariel; Li, Yang; Han, Jiawei

2015-05-01

Entity Extraction is a process of identifying meaningful entities from text documents. In enterprises, extracting entities improves enterprise efficiency by facilitating numerous applications, including search, recommendation, etc. However, the problem is particularly challenging on enterprise domains due to several reasons. First, the lack of redundancy of enterprise entities makes previous web-based systems like NELL and OpenIE not effective, since using only high-precision/low-recall patterns like those systems would miss the majority of sparse enterprise entities, while using more low-precision patterns in sparse setting also introduces noise drastically. Second, semantic drift is common in enterprises ("Blue" refers to "Windows Blue"), such that public signals from the web cannot be directly applied on entities. Moreover, many internal entities never appear on the web. Sparse internal signals are the only source for discovering them. To address these challenges, we propose an end-to-end framework for extracting entities in enterprises, taking the input of enterprise corpus and limited seeds to generate a high-quality entity collection as output. We introduce the novel concept of Semantic Pattern Graph to leverage public signals to understand the underlying semantics of lexical patterns, reinforce pattern evaluation using mined semantics, and yield more accurate and complete entities. Experiments on Microsoft enterprise data show the effectiveness of our approach.
EXTRACT: Interactive extraction of environment metadata and term suggestion for metagenomic sample annotation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pafilis, Evangelos; Buttigieg, Pier Luigi; Ferrell, Barbra

The microbial and molecular ecology research communities have made substantial progress on developing standards for annotating samples with environment metadata. However, sample manual annotation is a highly labor intensive process and requires familiarity with the terminologies used. We have therefore developed an interactive annotation tool, EXTRACT, which helps curators identify and extract standard-compliant terms for annotation of metagenomic records and other samples. Behind its web-based user interface, the system combines published methods for named entity recognition of environment, organism, tissue and disease terms. The evaluators in the BioCreative V Interactive Annotation Task found the system to be intuitive, useful, well documented and sufficiently accurate to be helpful in spotting relevant text passages and extracting organism and environment terms. Here the comparison of fully manual and text-mining-assisted curation revealed that EXTRACT speeds up annotation by 15–25% and helps curators to detect terms that would otherwise have been missed.
EXTRACT: Interactive extraction of environment metadata and term suggestion for metagenomic sample annotation

DOE PAGES

Pafilis, Evangelos; Buttigieg, Pier Luigi; Ferrell, Barbra; ...

2016-01-01

The microbial and molecular ecology research communities have made substantial progress on developing standards for annotating samples with environment metadata. However, sample manual annotation is a highly labor intensive process and requires familiarity with the terminologies used. We have therefore developed an interactive annotation tool, EXTRACT, which helps curators identify and extract standard-compliant terms for annotation of metagenomic records and other samples. Behind its web-based user interface, the system combines published methods for named entity recognition of environment, organism, tissue and disease terms. The evaluators in the BioCreative V Interactive Annotation Task found the system to be intuitive, useful, well documented and sufficiently accurate to be helpful in spotting relevant text passages and extracting organism and environment terms. Here the comparison of fully manual and text-mining-assisted curation revealed that EXTRACT speeds up annotation by 15–25% and helps curators to detect terms that would otherwise have been missed.
Joint Extraction of Entities and Relations Using Reinforcement Learning and Deep Learning.

PubMed

Feng, Yuntian; Zhang, Hongjun; Hao, Wenning; Chen, Gang

2017-01-01

We use both reinforcement learning and deep learning to simultaneously extract entities and relations from unstructured texts. For reinforcement learning, we model the task as a two-step decision process. Deep learning is used to automatically capture the most important information from unstructured texts, which represent the state in the decision process. By designing the reward function per step, our proposed method can pass the information of entity extraction to relation extraction and obtain feedback in order to extract entities and relations simultaneously. Firstly, we use bidirectional LSTM to model the context information, which realizes preliminary entity extraction. On the basis of the extraction results, attention based method can represent the sentences that include target entity pair to generate the initial state in the decision process. Then we use Tree-LSTM to represent relation mentions to generate the transition state in the decision process. Finally, we employ Q-Learning algorithm to get control policy π in the two-step decision process. Experiments on ACE2005 demonstrate that our method attains better performance than the state-of-the-art method and gets a 2.4% increase in recall-score.
Joint Extraction of Entities and Relations Using Reinforcement Learning and Deep Learning

PubMed Central

Zhang, Hongjun; Chen, Gang

2017-01-01

We use both reinforcement learning and deep learning to simultaneously extract entities and relations from unstructured texts. For reinforcement learning, we model the task as a two-step decision process. Deep learning is used to automatically capture the most important information from unstructured texts, which represent the state in the decision process. By designing the reward function per step, our proposed method can pass the information of entity extraction to relation extraction and obtain feedback in order to extract entities and relations simultaneously. Firstly, we use bidirectional LSTM to model the context information, which realizes preliminary entity extraction. On the basis of the extraction results, attention based method can represent the sentences that include target entity pair to generate the initial state in the decision process. Then we use Tree-LSTM to represent relation mentions to generate the transition state in the decision process. Finally, we employ Q-Learning algorithm to get control policy π in the two-step decision process. Experiments on ACE2005 demonstrate that our method attains better performance than the state-of-the-art method and gets a 2.4% increase in recall-score. PMID:28894463
Mining protein phosphorylation information from biomedical literature using NLP parsing and Support Vector Machines.

PubMed

Raja, Kalpana; Natarajan, Jeyakumar

2018-07-01

Extraction of protein phosphorylation information from biomedical literature has gained much attention because of the importance in numerous biological processes. In this study, we propose a text mining methodology which consists of two phases, NLP parsing and SVM classification to extract phosphorylation information from literature. First, using NLP parsing we divide the data into three base-forms depending on the biomedical entities related to phosphorylation and further classify into ten sub-forms based on their distribution with phosphorylation keyword. Next, we extract the phosphorylation entity singles/pairs/triplets and apply SVM to classify the extracted singles/pairs/triplets using a set of features applicable to each sub-form. The performance of our methodology was evaluated on three corpora namely PLC, iProLink and hPP corpus. We obtained promising results of >85% F-score on ten sub-forms of training datasets on cross validation test. Our system achieved overall F-score of 93.0% on iProLink and 96.3% on hPP corpus test datasets. Furthermore, our proposed system achieved best performance on cross corpus evaluation and outperformed the existing system with recall of 90.1%. The performance analysis of our unique system on three corpora reveals that it extracts protein phosphorylation information efficiently in both non-organism specific general datasets such as PLC and iProLink, and human specific dataset such as hPP corpus. Copyright © 2018 Elsevier B.V. All rights reserved.
Impact of translation on named-entity recognition in radiology texts

PubMed Central

Pedro, Vasco

2017-01-01

Radiology reports describe the results of radiography procedures and have the potential of being a useful source of information which can bring benefits to health care systems around the world. One way to automatically extract information from the reports is by using Text Mining tools. The problem is that these tools are mostly developed for English and reports are usually written in the native language of the radiologist, which is not necessarily English. This creates an obstacle to the sharing of Radiology information between different communities. This work explores the solution of translating the reports to English before applying the Text Mining tools, probing the question of what translation approach should be used. We created MRRAD (Multilingual Radiology Research Articles Dataset), a parallel corpus of Portuguese research articles related to Radiology and a number of alternative translations (human, automatic and semi-automatic) to English. This is a novel corpus which can be used to move forward the research on this topic. Using MRRAD we studied which kind of automatic or semi-automatic translation approach is more effective on the Named-entity recognition task of finding RadLex terms in the English version of the articles. Considering the terms extracted from human translations as our gold standard, we calculated how similar to this standard were the terms extracted using other translations. We found that a completely automatic translation approach using Google leads to F-scores (between 0.861 and 0.868, depending on the extraction approach) similar to the ones obtained through a more expensive semi-automatic translation approach using Unbabel (between 0.862 and 0.870). To better understand the results we also performed a qualitative analysis of the type of errors found in the automatic and semi-automatic translations. Database URL: https://github.com/lasigeBioTM/MRRAD PMID:29220455
Question analysis for Indonesian comparative question

NASA Astrophysics Data System (ADS)

Saelan, A.; Purwarianti, A.; Widyantoro, D. H.

2017-01-01

Information seeking is a common human need today. Comparing things with a search engine takes more time than searching for a single thing. In this paper, we analyzed comparative questions for a comparative question answering system. A comparative question is a question that compares two or more entities. We grouped comparative questions into 5 types: selection between mentioned entities, selection between unmentioned entities, selection between any entity, comparison, and yes or no question. Then we extracted 4 types of information from comparative questions: entity, aspect, comparison, and constraint. We built classifiers for the classification task and the information extraction task. The features used for the classification task are bag of words, whereas for information extraction we used lexical features of the current word and of the 2 previous and following words, plus the previous label. We tried 2 scenarios: classification first and extraction first. For classification first, we used the classification result as a feature for extraction. For extraction first, we used the extraction result as features for classification. We found that the result is better if extraction is done before classification. For the extraction task, classification using SMO gave the best result (88.78%), while for classification, it is better to use naïve Bayes (82.35%).
CheNER: a tool for the identification of chemical entities and their classes in biomedical literature.

PubMed

Usié, Anabel; Cruz, Joaquim; Comas, Jorge; Solsona, Francesc; Alves, Rui

2015-01-01

Small chemical molecules regulate biological processes at the molecular level. Those molecules are often involved in causing or treating pathological states. Automatically identifying such molecules in biomedical text is difficult due to both, the diverse morphology of chemical names and the alternative types of nomenclature that are simultaneously used to describe them. To address these issues, the last BioCreAtIvE challenge proposed a CHEMDNER task, which is a Named Entity Recognition (NER) challenge that aims at labelling different types of chemical names in biomedical text. To address this challenge we tested various approaches to recognizing chemical entities in biomedical documents. These approaches range from linear Conditional Random Fields (CRFs) to a combination of CRFs with regular expression and dictionary matching, followed by a post-processing step to tag those chemical names in a corpus of Medline abstracts. We named our best performing systems CheNER. We evaluate the performance of the various approaches using the F-score statistics. Higher F-scores indicate better performance. The highest F-score we obtain in identifying unique chemical entities is 72.88%. The highest F-score we obtain in identifying all chemical entities is 73.07%. We also evaluate the F-Score of combining our system with ChemSpot, and find an increase from 72.88% to 73.83%. CheNER presents a valid alternative for automated annotation of chemical entities in biomedical documents. In addition, CheNER may be used to derive new features to train newer methods for tagging chemical entities. CheNER can be downloaded from http://metres.udl.cat and included in text annotation pipelines.
Integrated Bio-Entity Network: A System for Biological Knowledge Discovery

PubMed Central

Bell, Lindsey; Chowdhary, Rajesh; Liu, Jun S.; Niu, Xufeng; Zhang, Jinfeng

2011-01-01

A significant part of our biological knowledge is centered on relationships between biological entities (bio-entities) such as proteins, genes, small molecules, pathways, gene ontology (GO) terms and diseases. Accumulated at an increasing speed, the information on bio-entity relationships is archived in different forms at scattered places. Most of such information is buried in scientific literature as unstructured text. Organizing heterogeneous information in a structured form not only facilitates study of biological systems using integrative approaches, but also allows discovery of new knowledge in an automatic and systematic way. In this study, we performed a large scale integration of bio-entity relationship information from both databases containing manually annotated, structured information and automatic information extraction of unstructured text in scientific literature. The relationship information we integrated in this study includes protein–protein interactions, protein/gene regulations, protein–small molecule interactions, protein–GO relationships, protein–pathway relationships, and pathway–disease relationships. The relationship information is organized in a graph data structure, named integrated bio-entity network (IBN), where the vertices are the bio-entities and edges represent their relationships. Under this framework, graph theoretic algorithms can be designed to perform various knowledge discovery tasks. We designed breadth-first search with pruning (BFSP) and most probable path (MPP) algorithms to automatically generate hypotheses—the indirect relationships with high probabilities in the network. We show that IBN can be used to generate plausible hypotheses, which not only help to better understand the complex interactions in biological systems, but also provide guidance for experimental designs. PMID:21738677
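The idea of a most probable path over a graph whose edges carry relationship probabilities can be illustrated with a short example. The following is a minimal sketch, not the authors' MPP algorithm, whose details the abstract does not give. It assumes that edge probabilities are independent and that the path score is their product, and it swaps in Dijkstra's algorithm on negative log-probabilities. The function name, node names, and probabilities are all hypothetical.

```python
import heapq
import math

def most_probable_path(graph, source, target):
    """graph: {node: {neighbor: probability}}. Returns (probability, path), or (0.0, []) if unreachable."""
    # Maximizing a product of probabilities is the same as minimizing the sum of -log(p),
    # so a standard shortest-path search applies.
    heap = [(0.0, source, [source])]
    settled = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == target:
            return math.exp(-cost), path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, p in graph.get(node, {}).items():
            if neighbor not in settled and p > 0:
                heapq.heappush(heap, (cost - math.log(p), neighbor, path + [neighbor]))
    return 0.0, []

# Hypothetical toy network: vertices are bio-entities, edge weights are relationship probabilities.
toy = {
    "protein_A": {"pathway_X": 0.9, "molecule_M": 0.5},
    "pathway_X": {"disease_D": 0.8},
    "molecule_M": {"disease_D": 0.95},
}
prob, path = most_probable_path(toy, "protein_A", "disease_D")
print(prob, path)
```

In this toy network the route through `pathway_X` scores 0.9 × 0.8 = 0.72, which beats the route through `molecule_M` (0.5 × 0.95 = 0.475), so the indirect protein-disease link would be proposed via the pathway.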
Improving the dictionary lookup approach for disease normalization using enhanced dictionary and query expansion.

PubMed

Jonnagaddala, Jitendra; Jue, Toni Rose; Chang, Nai-Wen; Dai, Hong-Jie

2016-01-01

The rapidly increasing biomedical literature calls for the need of an automatic approach in the recognition and normalization of disease mentions in order to increase the precision and effectivity of disease based information retrieval. A variety of methods have been proposed to deal with the problem of disease named entity recognition and normalization. Among all the proposed methods, conditional random fields (CRFs) and dictionary lookup method are widely used for named entity recognition and normalization respectively. We herein developed a CRF-based model to allow automated recognition of disease mentions, and studied the effect of various techniques in improving the normalization results based on the dictionary lookup approach. The dataset from the BioCreative V CDR track was used to report the performance of the developed normalization methods and compare with other existing dictionary lookup based normalization methods. The best configuration achieved an F-measure of 0.77 for the disease normalization, which outperformed the best dictionary lookup based baseline method studied in this work by an F-measure of 0.13. Database URL: https://github.com/TCRNBioinformatics/DiseaseExtract. © The Author(s) 2016. Published by Oxford University Press.
76 FR 28503 - Identification of Three Entities as Government of Libya Entities Pursuant to Executive Order 13566

Federal Register 2010, 2011, 2012, 2013, 2014

2011-05-17

... DEPARTMENT OF THE TREASURY Office of Foreign Assets Control Identification of Three Entities as Government of Libya Entities Pursuant to Executive Order 13566 AGENCY: Department of the Treasury. ACTION... names of three entities identified on May 5, 2011 as persons whose property and interests in property...
University of Glasgow at TREC 2009: Experiments with Terrier

DTIC Science & Technology

2009-11-01

identify entities in the category B subset of the corpus, we resort to an efficient dictionary-based named entity recognition approach. In particular...we build a large dictionary of entity names using DBPedia, a structured representation of Wikipedia. Dictionary entries comprise all known...aliases for each unique entity, as obtained from DBPedia (e.g., ‘Barack Obama’ is represented by the dictionary entries ‘Barack Obama’ and ‘44th President
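The dictionary-based approach described in the snippet above can be sketched as a longest-match lookup over an alias table. This is a minimal illustration, not the Terrier implementation; the function names are hypothetical, and the alias list is a made-up stand-in for entries that the snippet says were drawn from DBPedia.

```python
def build_alias_index(entities):
    """Map each lowercased alias (as a token tuple) to its canonical entity.

    entities: dict of canonical name -> list of aliases (hypothetical data).
    """
    index = {}
    for canonical, aliases in entities.items():
        for alias in [canonical, *aliases]:
            index[tuple(alias.lower().split())] = canonical
    return index


def tag_entities(text, index):
    """Return (start_token, end_token, canonical) spans, longest match first."""
    tokens = text.split()
    # Normalize tokens: lowercase and strip surrounding punctuation.
    lowered = [t.lower().strip(".,;:!?") for t in tokens]
    max_len = max((len(key) for key in index), default=0)
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate window first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in index:
                matches.append((i, i + n, index[key]))
                i += n
                break
        else:
            i += 1  # no alias starts at this token
    return matches


# Hypothetical alias table; a real one would be far larger.
index = build_alias_index({"Barack Obama": ["Obama"]})
print(tag_entities("Obama met reporters; later Barack Obama spoke.", index))
```

Trying the longest window first keeps a full name from being split into a shorter alias plus leftover tokens.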
50 CFR 679.81 - Rockfish Program annual harvester and processor privileges.

Code of Federal Regulations, 2010 CFR

2010-10-01

... legal name; the type of business entity under which the rockfish cooperative is organized; the state in which the rockfish cooperative is legally registered as a business entity; Tax ID number, date of incorporation, the printed name of the rockfish cooperative's designated representative; the permanent business...
BioCreative V CDR task corpus: a resource for chemical disease relation extraction.

PubMed

Li, Jiao; Sun, Yueping; Johnson, Robin J; Sciaky, Daniela; Wei, Chih-Hsuan; Leaman, Robert; Davis, Allan Peter; Mattingly, Carolyn J; Wiegers, Thomas C; Lu, Zhiyong

2016-01-01

Community-run, formal evaluations and manually annotated text corpora are critically important for advancing biomedical text-mining research. Recently in BioCreative V, a new challenge was organized for the tasks of disease named entity recognition (DNER) and chemical-induced disease (CID) relation extraction. Given the nature of both tasks, a test collection is required to contain both disease/chemical annotations and relation annotations in the same set of articles. Despite previous efforts in biomedical corpus construction, none was found to be sufficient for the task. Thus, we developed our own corpus called BC5CDR during the challenge by inviting a team of Medical Subject Headings (MeSH) indexers for disease/chemical entity annotation and Comparative Toxicogenomics Database (CTD) curators for CID relation annotation. To ensure high annotation quality and productivity, detailed annotation guidelines and automatic annotation tools were provided. The resulting BC5CDR corpus consists of 1500 PubMed articles with 4409 annotated chemicals, 5818 diseases and 3116 chemical-disease interactions. Each entity annotation includes both the mention text spans and normalized concept identifiers, using MeSH as the controlled vocabulary. To ensure accuracy, the entities were first captured independently by two annotators followed by a consensus annotation: The average inter-annotator agreement (IAA) scores were 87.49% and 96.05% for the disease and chemicals, respectively, in the test set according to the Jaccard similarity coefficient. Our corpus was successfully used for the BioCreative V challenge tasks and should serve as a valuable resource for the text-mining research community. Database URL: http://www.biocreative.org/tasks/biocreative-v/track-3-cdr/. Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the United States.
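The Jaccard similarity coefficient mentioned above as the agreement measure is the size of the intersection of two annotation sets divided by the size of their union. The sketch below assumes each annotation is a (start, end, concept identifier) tuple and that only exact matches count as agreement; the abstract does not spell out the matching criterion, and the spans and identifiers here are hypothetical.

```python
def jaccard(annotations_a, annotations_b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two annotation collections."""
    a, b = set(annotations_a), set(annotations_b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets agree fully
    return len(a & b) / len(a | b)


# Hypothetical annotations: (start offset, end offset, concept identifier).
annotator_1 = {(0, 8, "ID_A"), (15, 22, "ID_B"), (30, 41, "ID_C")}
annotator_2 = {(0, 8, "ID_A"), (15, 22, "ID_B"), (50, 57, "ID_D")}

# Two shared annotations out of four distinct ones.
print(jaccard(annotator_1, annotator_2))
```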
Gimli: open source and high-performance biomedical name recognition

PubMed Central

2013-01-01

Background Automatic recognition of biomedical names is an essential task in biomedical information extraction, presenting several complex and unsolved challenges. In recent years, various solutions have been implemented to tackle this problem. However, limitations regarding system characteristics, customization and usability still hinder their wider application outside text mining research. Results We present Gimli, an open-source, state-of-the-art tool for automatic recognition of biomedical names. Gimli includes an extended set of implemented and user-selectable features, such as orthographic, morphological, linguistic-based, conjunctions and dictionary-based. A simple and fast method to combine different trained models is also provided. Gimli achieves an F-measure of 87.17% on GENETAG and 72.23% on JNLPBA corpus, significantly outperforming existing open-source solutions. Conclusions Gimli is an off-the-shelf, ready to use tool for named-entity recognition, providing trained and optimized models for recognition of biomedical entities from scientific text. It can be used as a command line tool, offering full functionality, including training of new models and customization of the feature set and model parameters through a configuration file. Advanced users can integrate Gimli in their text mining workflows through the provided library, and extend or adapt its functionalities. Based on the underlying system characteristics and functionality, both for final users and developers, and on the reported performance results, we believe that Gimli is a state-of-the-art solution for biomedical NER, contributing to faster and better research in the field. Gimli is freely available at http://bioinformatics.ua.pt/gimli. PMID:23413997
The left temporal pole is a heteromodal hub for retrieving proper names

PubMed Central

Waldron, Eric J.; Manzel, Kenneth; Tranel, Daniel

2015-01-01

The left temporal pole (LTP) has been posited to be a heteromodal hub for retrieving proper names for semantically unique entities. Previous investigations have demonstrated that LTP is important for retrieving names for famous faces and unique landmarks. However, whether such a relationship would hold for unique entities apprehended through stimulus modalities other than vision has not been well established, and such evidence is critical to adjudicate claims about the “heteromodal” nature of the LTP. Here, we tested the hypothesis that the LTP would be important for naming famous voices. Individuals with LTP lesions were asked to recognize and name famous persons speaking in audio clips. Relative to neurologically normal and brain damaged comparison participants, patients with LTP lesions were able to recognize famous persons from their voices normally, but were selectively impaired in naming famous persons from their voices. The current results extend previous research and provide further support for the notion that the LTP is a convergence region serving as a heteromodal hub for retrieving the names of semantically unique entities. PMID:24389260
76 FR 52384 - Designation of Additional Entities Pursuant to Executive Order 13405

Federal Register 2010, 2011, 2012, 2013, 2014

2011-08-22

... DEPARTMENT OF THE TREASURY Office of Foreign Assets Control Designation of Additional Entities... Assets Control (``OFAC'') is publishing the names of four newly-designated entities whose property and... the Director of OFAC of the four entities identified in this notice, pursuant to Executive [[Page...
31 CFR 306.88 - Political entities and public corporations.

Code of Federal Regulations, 2011 CFR

2011-07-01

... 31 Money and Finance:Treasury 2 2011-07-01 2011-07-01 false Political entities and public... entities and public corporations. Securities registered in the name of, or assigned to, a State, county, city, town, village, school district or other political entity, public body or corporation, may be...

31 CFR 306.88 - Political entities and public corporations.

Code of Federal Regulations, 2010 CFR

2010-07-01

... 31 Money and Finance: Treasury 2 2010-07-01 2010-07-01 false Political entities and public... entities and public corporations. Securities registered in the name of, or assigned to, a State, county, city, town, village, school district or other political entity, public body or corporation, may be...
Mining dynamic noteworthy functions in software execution sequences.

PubMed

Zhang, Bing; Huang, Guoyan; Wang, Yuqian; He, Haitao; Ren, Jiadong

2017-01-01

As the quality of crucial entities can directly affect that of software, their identification and protection become an important premise for effective software development, management, maintenance and testing, which thus contribute to improving the software quality and its attack-defending ability. Most analysis and evaluation on important entities like codes-based static structure analysis are on the destruction of the actual software running. In this paper, from the perspective of software execution process, we proposed an approach to mine dynamic noteworthy functions (DNFM) in software execution sequences. First, according to software decompiling and tracking stack changes, the execution traces composed of a series of function addresses were acquired. Then these traces were modeled as execution sequences and then simplified so as to get simplified sequences (SFS), followed by the extraction of patterns through pattern extraction (PE) algorithm from SFS. After that, evaluating indicators inner-importance and inter-importance were designed to measure the noteworthiness of functions in DNFM algorithm. Finally, these functions were sorted by their noteworthiness. Comparison and contrast were conducted on the experiment results from two traditional complex network-based node mining methods, namely PageRank and DegreeRank. The results show that the DNFM method can mine noteworthy functions in software effectively and precisely.
A System for Identifying Named Entities in Biomedical Text: how Results From two Evaluations Reflect on Both the System and the Evaluations

PubMed Central

Dingare, Shipra; Nissim, Malvina; Finkel, Jenny; Grover, Claire

2005-01-01

We present a maximum entropy-based system for identifying named entities (NEs) in biomedical abstracts and present its performance in the only two biomedical named entity recognition (NER) comparative evaluations that have been held to date, namely BioCreative and Coling BioNLP. Our system obtained an exact match F-score of 83.2% in the BioCreative evaluation and 70.1% in the BioNLP evaluation. We discuss our system in detail, including its rich use of local features, attention to correct boundary identification, innovative use of external knowledge resources, including parsing and web searches, and rapid adaptation to new NE sets. We also discuss in depth problems with data annotation in the evaluations which caused the final performance to be lower than optimal. PMID:18629295
A Novel Approach towards Medical Entity Recognition in Chinese Clinical Text

PubMed Central

Yu, Jian

2017-01-01

Medical entity recognition, a basic task in the language processing of clinical data, has been extensively studied in analyzing admission notes in alphabetic languages such as English. However, much less work has been done on nonstructural texts that are written in Chinese, or in the setting of differentiation of Chinese drug names between traditional Chinese medicine and Western medicine. Here, we propose a novel cascade-type Chinese medication entity recognition approach that aims at integrating the sentence category classifier from a support vector machine and the conditional random field-based medication entity recognition. We hypothesized that this approach could avoid the side effects of abundant negative samples and improve the performance of the named entity recognition from admission notes written in Chinese. Therefore, we applied this approach to a test set of 324 Chinese-written admission notes with manual annotation by medical experts. Our data demonstrated that this approach had a score of 94.2% in precision, 92.8% in recall, and 93.5% in F-measure for the recognition of traditional Chinese medicine drug names and 91.2% in precision, 92.6% in recall, and 91.7% F-measure for the recognition of Western medicine drug names. The differences in F-measure were significant compared with those in the baseline systems. PMID:29065612
Cloud Computing in Higher Education Sector for Sustainable Development

ERIC Educational Resources Information Center

Duan, Yuchao

2016-01-01

Cloud computing is considered a new frontier in the field of computing, as this technology comprises three major entities namely: software, hardware and network. The collective nature of all these entities is known as the Cloud. This research aims to examine the impacts of various aspects namely: cloud computing, sustainability, performance…
A transition-based joint model for disease named entity recognition and normalization.

PubMed

Lou, Yinxia; Zhang, Yue; Qian, Tao; Li, Fei; Xiong, Shufeng; Ji, Donghong

2017-08-01

Disease named entities play a central role in many areas of biomedical research, and automatic recognition and normalization of such entities have received increasing attention in biomedical research communities. Existing methods typically used pipeline models with two independent phases: (i) a disease named entity recognition (DER) system is used to find the boundaries of mentions in text and (ii) a disease named entity normalization (DEN) system is used to connect the mentions recognized to concepts in a controlled vocabulary. The main problems of such models are: (i) there is error propagation from DER to DEN and (ii) DEN is useful for DER, but pipeline models cannot utilize this. We propose a transition-based model to jointly perform disease named entity recognition and normalization, casting the output construction process into an incremental state transition process, learning sequences of transition actions globally, which correspond to joint structural outputs. Beam search and online structured learning are used, with learning being designed to guide search. Compared with the only existing method for joint DEN and DER, our method allows non-local features to be used, which significantly improves the accuracies. We evaluate our model on two corpora: the BioCreative V Chemical Disease Relation (CDR) corpus and the NCBI disease corpus. Experiments show that our joint framework achieves significantly higher performances compared to competitive pipeline baselines. Our method compares favourably to other state-of-the-art approaches. Data and code are available at https://github.com/louyinxia/jointRN. dhji@whu.edu.cn. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
ChemBrowser: a flexible framework for mining chemical documents.

PubMed

Wu, Xian; Zhang, Li; Chen, Ying; Rhodes, James; Griffin, Thomas D; Boyer, Stephen K; Alba, Alfredo; Cai, Keke

2010-01-01

The ability to extract chemical and biological entities and relations from text documents automatically has great value to biochemical research and development activities. The growing maturity of text mining and artificial intelligence technologies shows promise in enabling such automatic chemical entity extraction capabilities (called "Chemical Annotation" in this paper). Many techniques have been reported in the literature, ranging from dictionary and rule-based techniques to machine learning approaches. In practice, we found that no single technique works well in all cases. A combinatorial approach that allows one to quickly compose different annotation techniques together for a given situation is most effective. In this paper, we describe the key challenges we face in real-world chemical annotation scenarios. We then present a solution called ChemBrowser which has a flexible framework for chemical annotation. ChemBrowser includes a suite of customizable processing units that might be utilized in a chemical annotator, a high-level language that describes the composition of various processing units that would form a chemical annotator, and an execution engine that translates the composition language to an actual annotator that can generate annotation results for a given set of documents. We demonstrate the impact of this approach by tailoring an annotator for extracting chemical names from patent documents and show how this annotator can be easily modified with simple configuration alone.
Extraction of CYP chemical interactions from biomedical literature using natural language processing methods.

PubMed

Jiao, Dazhi; Wild, David J

2009-02-01

This paper proposes a system that automatically extracts CYP protein and chemical interactions from journal article abstracts, using natural language processing (NLP) and text mining methods. In our system, we employ a maximum entropy based learning method, using results from syntactic, semantic, and lexical analysis of texts. We first present our system architecture and then discuss the data set for training our machine learning based models and the methods in building components in our system, such as part of speech (POS) tagging, Named Entity Recognition (NER), dependency parsing, and relation extraction. An evaluation of the system is conducted at the end, yielding very promising results: The POS, dependency parsing, and NER components in our system have achieved a very high level of accuracy as measured by precision, ranging from 85.9% to 98.5%, and the precision and the recall of the interaction extraction component are 76.0% and 82.6%, and for the overall system are 68.4% and 72.2%, respectively.
77 FR 31806 - Changes to Implement Micro Entity Status for Paying Patent Fees

Federal Register 2010, 2011, 2012, 2013, 2014

2012-05-30

... legislative history of 35 U.S.C. 123 is clear that it is directed to a subset of small entities, namely... history do not, for example, contemplate a for-profit, large entity applicant becoming a ``micro entity... across government agencies and identified goals designed to promote innovation; (8) considered approaches...
2 CFR 170.110 - Types of entities to which this part applies.

Code of Federal Regulations, 2013 CFR

2013-01-01

... 2 Grants and Agreements 1 2013-01-01 2013-01-01 false Types of entities to which this part applies...) Apply for or receive agency awards; or (2) Receive subawards under those awards. (b) Exceptions. (1... his or her name). (2) None of the requirements regarding reporting names and total compensation of an...
2 CFR 170.110 - Types of entities to which this part applies.

Code of Federal Regulations, 2012 CFR

2012-01-01

... 2 Grants and Agreements 1 2012-01-01 2012-01-01 false Types of entities to which this part applies...) Apply for or receive agency awards; or (2) Receive subawards under those awards. (b) Exceptions. (1... his or her name). (2) None of the requirements regarding reporting names and total compensation of an...
Finding Related Entities by Retrieving Relations: UIUC at TREC 2009 Entity Track

DTIC Science & Technology

2009-11-01

classes, depending on the categories they belong to. A music album could have any generic name, whereas a laptop model has a more generalizable name. A...names of music albums are simply plain text often capitalized, and so on. Thus, we feel that a better approach would be to first identify the...origin domain of the text to be tagged (e.g., pharmaceutical, music, journal, etc.), and then apply tagging rules that are specific to that domain
Mining e-cigarette adverse events in social media using Bi-LSTM recurrent neural network with word embedding representation.

PubMed

Xie, Jiaheng; Liu, Xiao; Dajun Zeng, Daniel

2018-01-01

Recent years have seen increased worldwide popularity of e-cigarette use. However, the risks of e-cigarettes are underexamined. Most e-cigarette adverse event studies have achieved low detection rates due to limited subject sample sizes in the experiments and surveys. Social media provides a large data repository of consumers' e-cigarette feedback and experiences, which are useful for e-cigarette safety surveillance. However, it is difficult to automatically interpret the informal and nontechnical consumer vocabulary about e-cigarettes in social media. This issue hinders the use of social media content for e-cigarette safety surveillance. Recent developments in deep neural network methods have shown promise for named entity extraction from noisy text. Motivated by these observations, we aimed to design a deep neural network approach to extract e-cigarette safety information in social media. Our deep neural language model utilizes word embedding as the representation of text input and recognizes named entity types with the state-of-the-art Bidirectional Long Short-Term Memory (Bi-LSTM) Recurrent Neural Network. Our Bi-LSTM model achieved the best performance compared to 3 baseline models, with a precision of 94.10%, a recall of 91.80%, and an F-measure of 92.94%. We identified 1591 unique adverse events and 9930 unique e-cigarette components (ie, chemicals, flavors, and devices) from our research testbed. Although the conditional random field baseline model had slightly better precision than our approach, our Bi-LSTM model achieved much higher recall, resulting in the best F-measure. Our method can be generalized to extract medical concepts from social media for other medical applications. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com
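The F-measure reported above can be checked from the stated precision and recall, assuming it is the balanced F1 score, i.e. the harmonic mean of the two. The helper below is a small sketch with a hypothetical function name; the `beta` parameter is the standard F-beta weighting and is not something the abstract mentions.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported for the Bi-LSTM model, in percent.
print(round(f_measure(94.10, 91.80), 2))  # → 92.94
```

The result matches the 92.94% quoted in the abstract, which supports the balanced-F1 reading.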
LimTox: a web tool for applied text mining of adverse event and toxicity associations of compounds, drugs and genes

PubMed Central

Cañada, Andres; Rabal, Obdulia; Oyarzabal, Julen; Valencia, Alfonso

2017-01-01

A considerable effort has been devoted to retrieve systematically information for genes and proteins as well as relationships between them. Despite the importance of chemical compounds and drugs as a central bio-entity in pharmacological and biological research, only a limited number of freely available chemical text-mining/search engine technologies are currently accessible. Here we present LimTox (Literature Mining for Toxicology), a web-based online biomedical search tool with special focus on adverse hepatobiliary reactions. It integrates a range of text mining, named entity recognition and information extraction components. LimTox relies on machine-learning, rule-based, pattern-based and term lookup strategies. This system processes scientific abstracts, a set of full text articles and medical agency assessment reports. Although the main focus of LimTox is on adverse liver events, it enables also basic searches for other organ level toxicity associations (nephrotoxicity, cardiotoxicity, thyrotoxicity and phospholipidosis). This tool supports specialized search queries for: chemical compounds/drugs, genes (with additional emphasis on key enzymes in drug metabolism, namely P450 cytochromes, or CYPs) and biochemical liver markers. The LimTox website is free and open to all users and there is no login requirement. LimTox can be accessed at: http://limtox.bioinfo.cnio.es PMID:28531339
A Tale of Two Paradigms: Disambiguating Extracted Entities with Applications to a Digital Library and the Web

ERIC Educational Resources Information Center

Huang, Jian

2010-01-01

With the increasing wealth of information on the Web, information integration is ubiquitous as the same real-world entity may appear in a variety of forms extracted from different sources. This dissertation proposes supervised and unsupervised algorithms that are naturally integrated in a scalable framework to solve the entity resolution problem,…
Induced lexico-syntactic patterns improve information extraction from online medical forums.

PubMed

Gupta, Sonal; MacLean, Diana L; Heer, Jeffrey; Manning, Christopher D

2014-01-01

To reliably extract two entity types, symptoms and conditions (SCs), and drugs and treatments (DTs), from patient-authored text (PAT) by learning lexico-syntactic patterns from data annotated with seed dictionaries. Despite the increasing quantity of PAT (eg, online discussion threads), tools for identifying medical entities in PAT are limited. When applied to PAT, existing tools either fail to identify specific entity types or perform poorly. Identification of SC and DT terms in PAT would enable exploration of efficacy and side effects for not only pharmaceutical drugs, but also for home remedies and components of daily care. We use SC and DT term dictionaries compiled from online sources to label several discussion forums from MedHelp (http://www.medhelp.org). We then iteratively induce lexico-syntactic patterns corresponding strongly to each entity type to extract new SC and DT terms. Our system is able to extract symptom descriptions and treatments absent from our original dictionaries, such as 'LADA', 'stabbing pain', and 'cinnamon pills'. Our system extracts DT terms with 58-70% F1 score and SC terms with 66-76% F1 score on two forums from MedHelp. We show improvements over MetaMap, OBA, a conditional random field-based classifier, and a previous pattern learning approach. Our entity extractor based on lexico-syntactic patterns is a successful and preferable technique for identifying specific entity types in PAT. To the best of our knowledge, this is the first paper to extract SC and DT entities from PAT. We exhibit learning of informal terms often used in PAT but missing from typical dictionaries. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
31 CFR 598.408 - Alleged change in ownership or control of an entity designated as a specially designated...

Code of Federal Regulations, 2010 CFR

2010-07-01

... of capital; and contracts evidencing the sale of the entity to its new owners. (b) Any continuing... narcotics trafficker could lead to designation of the purchaser. Mere change in name of an entity will not...
Mining Patients' Narratives in Social Media for Pharmacovigilance: Adverse Effects and Misuse of Methylphenidate.

PubMed

Chen, Xiaoyi; Faviez, Carole; Schuck, Stéphane; Lillo-Le-Louët, Agnès; Texier, Nathalie; Dahamna, Badisse; Huot, Charles; Foulquié, Pierre; Pereira, Suzanne; Leroux, Vincent; Karapetiantz, Pierre; Guenegou-Arnoux, Armelle; Katsahian, Sandrine; Bousquet, Cédric; Burgun, Anita

2018-01-01

Background: The Food and Drug Administration (FDA) in the United States and the European Medicines Agency (EMA) have recognized social media as a new data source to strengthen their activities regarding drug safety. Objective: Our objective in the ADR-PRISM project was to provide text mining and visualization tools to explore a corpus of posts extracted from social media. We evaluated this approach on a corpus of 21 million posts from five patient forums, and conducted a qualitative analysis of the data available on methylphenidate in this corpus. Methods: We applied text mining methods based on named entity recognition and relation extraction in the corpus, followed by signal detection using proportional reporting ratio (PRR). We also used topic modeling based on the Correlated Topic Model to obtain the list of thematics in the corpus and classify the messages based on their topics. Results: We automatically identified 3443 posts about methylphenidate published between 2007 and 2016, among which 61 adverse drug reactions (ADR) were automatically detected. Two pharmacovigilance experts manually evaluated the quality of automatic identification, and an F-measure of 0.57 was reached. Patients' reports were mainly neuro-psychiatric effects. Applying PRR, 67% of the ADRs were signals, including most of the neuro-psychiatric symptoms but also palpitations. Topic modeling showed that the most represented topics were related to Childhood and Treatment initiation, but also Side effects. Cases of misuse were also identified in this corpus, including recreational use and abuse. Conclusion: Named entity recognition combined with signal detection and topic modeling have demonstrated their complementarity in mining social media data. An in-depth analysis focused on methylphenidate showed that this approach was able to detect potential signals and to provide better understanding of patients' behaviors regarding drugs, including misuse.
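The proportional reporting ratio (PRR) used for signal detection above is conventionally computed from a 2x2 table of report counts: the share of reports mentioning the event among reports for the drug of interest, divided by the same share among reports for all other drugs. The sketch below follows that standard definition. The abstract does not give the counts or the signal threshold used, so the numbers and the function name here are hypothetical.

```python
def proportional_reporting_ratio(a, b, c, d):
    """PRR = [a / (a + b)] / [c / (c + d)].

    a: posts mentioning the drug of interest and the event
    b: posts mentioning the drug of interest without the event
    c: posts mentioning other drugs and the event
    d: posts mentioning other drugs without the event
    """
    if a + b == 0 or c + d == 0:
        raise ValueError("each drug group needs at least one post")
    if c == 0:
        raise ValueError("PRR is undefined when no comparator post has the event")
    return (a / (a + b)) / (c / (c + d))


# Hypothetical counts: 20 of 100 posts for the drug mention the event,
# against 50 of 1000 posts for all other drugs.
print(round(proportional_reporting_ratio(20, 80, 50, 950), 2))
```

A ratio well above 1 means the event is reported proportionally more often with the drug of interest than with the comparator drugs; whether it counts as a signal depends on a threshold the abstract does not state.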
Mining dynamic noteworthy functions in software execution sequences

PubMed Central

Huang, Guoyan; Wang, Yuqian; He, Haitao; Ren, Jiadong

2017-01-01

As the quality of crucial entities can directly affect that of software, their identification and protection become an important premise for effective software development, management, maintenance and testing, which thus contribute to improving the software quality and its attack-defending ability. Most analysis and evaluation on important entities like codes-based static structure analysis are on the destruction of the actual software running. In this paper, from the perspective of software execution process, we proposed an approach to mine dynamic noteworthy functions (DNFM) in software execution sequences. First, according to software decompiling and tracking stack changes, the execution traces composed of a series of function addresses were acquired. Then these traces were modeled as execution sequences and then simplified so as to get simplified sequences (SFS), followed by the extraction of patterns through pattern extraction (PE) algorithm from SFS. After that, evaluating indicators inner-importance and inter-importance were designed to measure the noteworthiness of functions in DNFM algorithm. Finally, these functions were sorted by their noteworthiness. Comparison and contrast were conducted on the experiment results from two traditional complex network-based node mining methods, namely PageRank and DegreeRank. The results show that the DNFM method can mine noteworthy functions in software effectively and precisely. PMID:28278276
31 CFR 306.88 - Political entities and public corporations.

Code of Federal Regulations, 2013 CFR

2013-07-01

... corporations. 306.88 Section 306.88 Money and Finance: Treasury Regulations Relating to Money and Finance... entities and public corporations. Securities registered in the name of, or assigned to, a State, county, city, town, village, school district or other political entity, public body or corporation, may be...

31 CFR 306.88 - Political entities and public corporations.

Code of Federal Regulations, 2014 CFR

2014-07-01

... corporations. 306.88 Section 306.88 Money and Finance: Treasury Regulations Relating to Money and Finance... entities and public corporations. Securities registered in the name of, or assigned to, a State, county, city, town, village, school district or other political entity, public body or corporation, may be...
31 CFR 306.88 - Political entities and public corporations.

Code of Federal Regulations, 2012 CFR

2012-07-01

... corporations. 306.88 Section 306.88 Money and Finance: Treasury Regulations Relating to Money and Finance... entities and public corporations. Securities registered in the name of, or assigned to, a State, county, city, town, village, school district or other political entity, public body or corporation, may be...
Moving Hands, Moving Entities

ERIC Educational Resources Information Center

Setti, Annalisa; Borghi, Anna M.; Tessari, Alessia

2009-01-01

In this study we investigated with a priming paradigm whether uni and bimanual actions presented as primes differently affected language processing. Animals' (self-moving entities) and plants' (not self-moving entities) names were used as targets. As prime we used grasping hands, presented both as static images and videos. The results showed an…
Anatomical Entity Recognition with a Hierarchical Framework Augmented by External Resources

PubMed Central

Xu, Yan; Hua, Ji; Ni, Zhaoheng; Chen, Qinlang; Fan, Yubo; Ananiadou, Sophia; Chang, Eric I-Chao; Tsujii, Junichi

2014-01-01

References to anatomical entities in medical records consist not only of explicit references to anatomical locations, but also other diverse types of expressions, such as specific diseases, clinical tests, clinical treatments, which constitute implicit references to anatomical entities. In order to identify these implicit anatomical entities, we propose a hierarchical framework, in which two layers of named entity recognizers (NERs) work in a cooperative manner. Each of the NERs is implemented using the Conditional Random Fields (CRF) model, which use a range of external resources to generate features. We constructed a dictionary of anatomical entity expressions by exploiting four existing resources, i.e., UMLS, MeSH, RadLex and BodyPart3D, and supplemented information from two external knowledge bases, i.e., Wikipedia and WordNet, to improve inference of anatomical entities from implicit expressions. Experiments conducted on 300 discharge summaries showed a micro-averaged performance of 0.8509 Precision, 0.7796 Recall and 0.8137 F1 for explicit anatomical entity recognition, and 0.8695 Precision, 0.6893 Recall and 0.7690 F1 for implicit anatomical entity recognition. The use of the hierarchical framework, which combines the recognition of named entities of various types (diseases, clinical tests, treatments) with information embedded in external knowledge bases, resulted in a 5.08% increment in F1. The resources constructed for this research will be made publicly available. PMID:25343498
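The F1 values reported above can be checked against the stated precision and recall, assuming F1 is the usual harmonic mean of the two. The short sketch below is only an arithmetic check; the function name `f1_score` is chosen here for illustration and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the balanced F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
explicit_f1 = f1_score(0.8509, 0.7796)  # explicit anatomical entities
implicit_f1 = f1_score(0.8695, 0.6893)  # implicit anatomical entities

print(f"explicit F1 = {explicit_f1:.4f}")
print(f"implicit F1 = {implicit_f1:.4f}")
```

Both computed values agree with the reported 0.8137 and 0.7690 to four decimal places.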
High-recall protein entity recognition using a dictionary

PubMed Central

Kou, Zhenzhen; Cohen, William W.; Murphy, Robert F.

2010-01-01

Protein name extraction is an important step in mining biological literature. We describe two new methods for this task: semiCRFs and dictionary HMMs. SemiCRFs are a recently-proposed extension to conditional random fields that enables more effective use of dictionary information as features. Dictionary HMMs are a technique in which a dictionary is converted to a large HMM that recognizes phrases from the dictionary, as well as variations of these phrases. Standard training methods for HMMs can be used to learn which variants should be recognized. We compared the performance of our new approaches to that of Maximum Entropy (Max-Ent) and normal CRFs on three datasets, and improvement was obtained for all four methods over the best published results for two of the datasets. CRFs and semiCRFs achieved the highest overall performance according to the widely-used F-measure, while the dictionary HMMs performed the best at finding entities that actually appear in the dictionary—the measure of most interest in our intended application. PMID:15961466
Gene/protein name recognition based on support vector machine using dictionary as features.

PubMed

Mitsumori, Tomohiro; Fation, Sevrani; Murata, Masaki; Doi, Kouichi; Doi, Hirohumi

2005-01-01

Automated information extraction from biomedical literature is important because a vast amount of biomedical literature has been published. Recognition of the biomedical named entities is the first step in information extraction. We developed an automated recognition system based on the SVM algorithm and evaluated it in Task 1.A of BioCreAtIvE, a competition for automated gene/protein name recognition. In the work presented here, our recognition system uses the feature set of the word, the part-of-speech (POS), the orthography, the prefix, the suffix, and the preceding class. We call these features "internal resource features", i.e., features that can be found in the training data. Additionally, we consider the features of matching against dictionaries to be external resource features. We investigated and evaluated the effect of these features as well as the effect of tuning the parameters of the SVM algorithm. We found that the dictionary matching features contributed slightly to the improvement in the performance of the f-score. We attribute this to the possibility that the dictionary matching features might overlap with other features in the current multiple feature setting. During SVM learning, each feature alone had a marginally positive effect on system performance. This supports the fact that the SVM algorithm is robust on the high dimensionality of the feature vector space and means that feature selection is not required.
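The "internal resource features" listed above (word, POS, orthography, prefix, suffix, preceding class) can be pictured as one feature dictionary per token. The sketch below is a minimal illustration under stated assumptions: the orthography categories, the affix length of 3, and the names `orthography` and `token_features` are hypothetical, since the abstract does not give the actual feature templates.

```python
import re


def orthography(word: str) -> str:
    """Coarse word-shape class; these categories are illustrative only."""
    if word.isdigit():
        return "ALL_DIGITS"
    if word.isupper():
        return "ALL_CAPS"
    if re.search(r"\d", word) and re.search(r"[A-Za-z]", word):
        return "ALPHANUMERIC"
    if word[:1].isupper():
        return "INIT_CAP"
    return "LOWER_OR_OTHER"


def token_features(words, pos_tags, i, prev_label, affix_len=3):
    """Build the feature dictionary for the token at position i.

    prev_label is the class assigned to the preceding token, which a
    sequential tagger would supply while decoding left to right.
    """
    word = words[i]
    return {
        "word": word.lower(),
        "pos": pos_tags[i],
        "orth": orthography(word),
        "prefix": word[:affix_len],
        "suffix": word[-affix_len:],
        "prev_class": prev_label,
    }


feats = token_features(["the", "p53", "protein"], ["DT", "NN", "NN"], 1, "O")
print(feats)
```

A dictionary-matching (external resource) feature would be one more key in the same dictionary, for example a flag that is true when the token appears in a gene/protein lexicon.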
Lifelong-RL: Lifelong Relaxation Labeling for Separating Entities and Aspects in Opinion Targets.

PubMed

Shu, Lei; Liu, Bing; Xu, Hu; Kim, Annice

2016-11-01

It is well-known that opinions have targets. Extracting such targets is an important problem of opinion mining because without knowing the target of an opinion, the opinion is of limited use. So far many algorithms have been proposed to extract opinion targets. However, an opinion target can be an entity or an aspect (part or attribute) of an entity. An opinion about an entity is an opinion about the entity as a whole, while an opinion about an aspect is just an opinion about that specific attribute or aspect of an entity. Thus, opinion targets should be separated into entities and aspects before use because they represent very different things about opinions. This paper proposes a novel algorithm, called Lifelong-RL, to solve the problem based on lifelong machine learning and relaxation labeling. Extensive experiments show that the proposed algorithm Lifelong-RL outperforms baseline methods markedly.
75 FR 54697 - Unblocking of Thirteen Specially Designated Nationals Pursuant to Executive Order 13224

Federal Register 2010, 2011, 2012, 2013, 2014

2010-09-08

... removing the names of ten entities and three individuals from the list of Specially Designated Nationals... Commit, Threaten To Commit, or Support Terrorism. DATES: The removal of ten entities and three... Foreign Assets Control has determined that these ten entities and three individuals no longer meet the...
78 FR 59880 - Enhanced Consumer Protections for Charter Air Transportation

Federal Register 2010, 2011, 2012, 2013, 2014

2013-09-30

...) The name of the company in operational control of the aircraft during flight; (2) any other ``doing... disclosure of the entity in operational control of the aircraft during the flight and seven of those comments... different from the entity in operational control of the aircraft, primarily on the basis that these entities...
Moving beyond the Name: Defining Corporate Entities to Support Provenance-Based Access

ERIC Educational Resources Information Center

Light, Michelle

2007-01-01

The second edition of the "International Standard Archival Authority Records for Corporate Bodies, Persons, and Families (ISAAR(CPF)2)" focuses on describing entities as they exist in reality, rather than on establishing authorized terms. This change allows authority records to include multiple authorized terms representing an entity as it changed…
Using Workflows to Explore and Optimise Named Entity Recognition for Chemistry

PubMed Central

Kolluru, BalaKrishna; Hawizy, Lezan; Murray-Rust, Peter; Tsujii, Junichi; Ananiadou, Sophia

2011-01-01

Chemistry text mining tools should be interoperable and adaptable regardless of system-level implementation, installation or even programming issues. We aim to abstract the functionality of these tools from the underlying implementation via reconfigurable workflows for automatically identifying chemical names. To achieve this, we refactored an established named entity recogniser (in the chemistry domain), OSCAR, and studied the impact of each component on the net performance. We developed two reconfigurable workflows from OSCAR using an interoperable text mining framework, U-Compare. These workflows can be altered using the drag-&-drop mechanism of the graphical user interface of U-Compare. These workflows also provide a platform to study the relationship between text mining components such as tokenisation and named entity recognition (using maximum entropy Markov model (MEMM) and pattern recognition based classifiers). Results indicate that, for chemistry in particular, eliminating noise generated by tokenisation techniques leads to a slightly better performance than others, in terms of named entity recognition (NER) accuracy. Poor tokenisation translates into poorer input to the classifier components, which in turn leads to an increase in Type I or Type II errors, thus lowering the overall performance. On the Sciborg corpus, the workflow based system, which uses a new tokeniser whilst retaining the same MEMM component, increases the F-score from 82.35% to 84.44%. On the PubMed corpus, it recorded an F-score of 84.84% as against 84.23% by OSCAR. PMID:21633495
Using workflows to explore and optimise named entity recognition for chemistry.

PubMed

Kolluru, Balakrishna; Hawizy, Lezan; Murray-Rust, Peter; Tsujii, Junichi; Ananiadou, Sophia

2011-01-01

Chemistry text mining tools should be interoperable and adaptable regardless of system-level implementation, installation or even programming issues. We aim to abstract the functionality of these tools from the underlying implementation via reconfigurable workflows for automatically identifying chemical names. To achieve this, we refactored an established named entity recogniser (in the chemistry domain), OSCAR, and studied the impact of each component on the net performance. We developed two reconfigurable workflows from OSCAR using an interoperable text mining framework, U-Compare. These workflows can be altered using the drag-&-drop mechanism of the graphical user interface of U-Compare. These workflows also provide a platform to study the relationship between text mining components such as tokenisation and named entity recognition (using maximum entropy Markov model (MEMM) and pattern recognition based classifiers). Results indicate that, for chemistry in particular, eliminating noise generated by tokenisation techniques leads to a slightly better performance than others, in terms of named entity recognition (NER) accuracy. Poor tokenisation translates into poorer input to the classifier components, which in turn leads to an increase in Type I or Type II errors, thus lowering the overall performance. On the Sciborg corpus, the workflow based system, which uses a new tokeniser whilst retaining the same MEMM component, increases the F-score from 82.35% to 84.44%. On the PubMed corpus, it recorded an F-score of 84.84% as against 84.23% by OSCAR.
Evaluating Stream Filtering for Entity Profile Updates in TREC 2012, 2013, and 2014 (KBA Track Overview, Notebook Paper)

DTIC Science & Technology

2014-11-01

possible future directions that build on the KBA experience. Data Assets In addition to the three hundred run submissions from diverse systems...form name of an entity and assigning a confidence score based on the number of matches of tokens in the name. See code in github [6]. macro-P...131 64 GENDER 4 2 FoundedBy 56 30 NAME 2 2 DateOfDeath 54 12 TOP_MEMBERS_EMPLOYEES 2 1 EmployeeOf 44 19 WON_AWARD 1 1
Discovering biomedical semantic relations in PubMed queries for information retrieval and database curation

PubMed Central

Huang, Chung-Chi; Lu, Zhiyong

2016-01-01

Identifying relevant papers from the literature is a common task in biocuration. Most current biomedical literature search systems primarily rely on matching user keywords. Semantic search, on the other hand, seeks to improve search accuracy by understanding the entities and contextual relations in user keywords. However, past research has mostly focused on semantically identifying biological entities (e.g. chemicals, diseases and genes) with little effort on discovering semantic relations. In this work, we aim to discover biomedical semantic relations in PubMed queries in an automated and unsupervised fashion. Specifically, we focus on extracting and understanding the contextual information (or context patterns) that is used by PubMed users to represent semantic relations between entities such as ‘CHEMICAL-1 compared to CHEMICAL-2.’ With the advances in automatic named entity recognition, we first tag entities in PubMed queries and then use tagged entities as knowledge to recognize pattern semantics. More specifically, we transform PubMed queries into context patterns involving participating entities, which are subsequently projected to latent topics via latent semantic analysis (LSA) to avoid the data sparseness and specificity issues. Finally, we mine semantically similar contextual patterns or semantic relations based on LSA topic distributions. Our two separate evaluation experiments of chemical-chemical (CC) and chemical–disease (CD) relations show that the proposed approach significantly outperforms a baseline method, which simply measures pattern semantics by similarity in participating entities. The highest performance achieved by our approach is nearly 0.9 and 0.85 respectively for the CC and CD task when compared against the ground truth in terms of normalized discounted cumulative gain (nDCG), a standard measure of ranking quality. 
These results suggest that our approach can effectively identify and return related semantic patterns in a ranked order covering diverse bio-entity relations. To assess the potential utility of our automated top-ranked patterns of a given relation in semantic search, we performed a pilot study on frequently sought semantic relations in PubMed and observed improved literature retrieval effectiveness based on post-hoc human relevance evaluation. Further investigation in larger tests and in real-world scenarios is warranted. PMID:27016698
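For readers unfamiliar with nDCG, the measure cited above, the following sketch computes it for a single ranked list of graded relevance judgements. It assumes the common formulation with linear gain and a log2 rank discount; the abstract does not state which variant was used, and the function names are chosen here for illustration.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: gain at rank r is divided by log2(r + 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Toy relevance grades for ranked patterns (made-up values, not study data).
print(ndcg([3, 2, 1]))  # ideal ordering
print(ndcg([1, 2, 3]))  # reversed ordering scores lower
```

A score of 1.0 means the system ranked items in exactly the ideal order, so values near 0.9 indicate rankings close to the ground-truth ordering.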
LimTox: a web tool for applied text mining of adverse event and toxicity associations of compounds, drugs and genes.

PubMed

Cañada, Andres; Capella-Gutierrez, Salvador; Rabal, Obdulia; Oyarzabal, Julen; Valencia, Alfonso; Krallinger, Martin

2017-07-03

A considerable effort has been devoted to retrieve systematically information for genes and proteins as well as relationships between them. Despite the importance of chemical compounds and drugs as a central bio-entity in pharmacological and biological research, only a limited number of freely available chemical text-mining/search engine technologies are currently accessible. Here we present LimTox (Literature Mining for Toxicology), a web-based online biomedical search tool with special focus on adverse hepatobiliary reactions. It integrates a range of text mining, named entity recognition and information extraction components. LimTox relies on machine-learning, rule-based, pattern-based and term lookup strategies. This system processes scientific abstracts, a set of full text articles and medical agency assessment reports. Although the main focus of LimTox is on adverse liver events, it enables also basic searches for other organ level toxicity associations (nephrotoxicity, cardiotoxicity, thyrotoxicity and phospholipidosis). This tool supports specialized search queries for: chemical compounds/drugs, genes (with additional emphasis on key enzymes in drug metabolism, namely P450 cytochromes-CYPs) and biochemical liver markers. The LimTox website is free and open to all users and there is no login requirement. LimTox can be accessed at: http://limtox.bioinfo.cnio.es. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
ECO: A Framework for Entity Co-Occurrence Exploration with Faceted Navigation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Halliday, K. D.

2010-08-20

Even as highly structured databases and semantic knowledge bases become more prevalent, a substantial amount of human knowledge is reported as written prose. Typical textual reports, such as news articles, contain information about entities (people, organizations, and locations) and their relationships. Automatically extracting such relationships from large text corpora is a key component of corporate and government knowledge bases. The primary goal of the ECO project is to develop a scalable framework for extracting and presenting these relationships for exploration using an easily navigable faceted user interface. ECO uses entity co-occurrence relationships to identify related entities. The system aggregates and indexes information on each entity pair, allowing the user to rapidly discover and mine relational information.
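The idea of entity co-occurrence can be sketched as counting, for every pair of entities, the number of documents in which both appear. The example below is a minimal illustration and not ECO's implementation: it assumes entities have already been extracted per document, and the function name and toy entity names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count documents in which each unordered entity pair co-occurs.

    docs is an iterable of entity lists, one list per document.
    """
    counts = Counter()
    for entities in docs:
        # Deduplicate within a document and sort so each pair has one key.
        unique = sorted(set(entities))
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts


# Toy input with made-up entity names.
docs = [
    ["Alice", "Acme", "Paris"],
    ["Alice", "Acme"],
    ["Paris", "Alice", "Alice"],
]
counts = cooccurrence_counts(docs)
for pair, n in counts.most_common():
    print(pair, n)
```

Indexing the resulting pair counts by entity is one simple way to support a faceted view in which selecting an entity lists its most frequently co-occurring partners.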
41 CFR 102-173.50 - What is the naming convention for States?

Code of Federal Regulations, 2014 CFR

2014-01-01

...-INTERNET GOV DOMAIN Registration § 102-173.50 What is the naming convention for States? (a) To register any second-level domain within dot-gov, State government entities must register the full State name or clearly indicate the State postal code within the name. Examples of acceptable names include virginia.gov...
41 CFR 102-173.50 - What is the naming convention for States?

Code of Federal Regulations, 2011 CFR

2011-01-01

...-INTERNET GOV DOMAIN Registration § 102-173.50 What is the naming convention for States? (a) To register any second-level domain within dot-gov, State government entities must register the full State name or clearly indicate the State postal code within the name. Examples of acceptable names include virginia.gov...
41 CFR 102-173.50 - What is the naming convention for States?

Code of Federal Regulations, 2010 CFR

2010-07-01

...-INTERNET GOV DOMAIN Registration § 102-173.50 What is the naming convention for States? (a) To register any second-level domain within dot-gov, State government entities must register the full State name or clearly indicate the State postal code within the name. Examples of acceptable names include virginia.gov...
41 CFR 102-173.50 - What is the naming convention for States?

Code of Federal Regulations, 2013 CFR

2013-07-01

...-INTERNET GOV DOMAIN Registration § 102-173.50 What is the naming convention for States? (a) To register any second-level domain within dot-gov, State government entities must register the full State name or clearly indicate the State postal code within the name. Examples of acceptable names include virginia.gov...

41 CFR 102-173.50 - What is the naming convention for States?

Code of Federal Regulations, 2012 CFR

2012-01-01

...-INTERNET GOV DOMAIN Registration § 102-173.50 What is the naming convention for States? (a) To register any second-level domain within dot-gov, State government entities must register the full State name or clearly indicate the State postal code within the name. Examples of acceptable names include virginia.gov...
41 CFR 102-173.25 - What definitions apply to this part?

Code of Federal Regulations, 2014 CFR

2014-01-01

... Management Regulations System (Continued) FEDERAL MANAGEMENT REGULATION TELECOMMUNICATIONS 173-INTERNET GOV... Administration (GSA) is responsible for registrations in the dot-gov domain. Domain name is a name assigned to an... domain name server. A domain name locates the organization or other entity on the Internet. The dot gov...
41 CFR 102-173.25 - What definitions apply to this part?

Code of Federal Regulations, 2013 CFR

2013-07-01

... Management Regulations System (Continued) FEDERAL MANAGEMENT REGULATION TELECOMMUNICATIONS 173-INTERNET GOV... Administration (GSA) is responsible for registrations in the dot-gov domain. Domain name is a name assigned to an... domain name server. A domain name locates the organization or other entity on the Internet. The dot gov...
41 CFR 102-173.25 - What definitions apply to this part?

Code of Federal Regulations, 2012 CFR

2012-01-01

... Management Regulations System (Continued) FEDERAL MANAGEMENT REGULATION TELECOMMUNICATIONS 173-INTERNET GOV... Administration (GSA) is responsible for registrations in the dot-gov domain. Domain name is a name assigned to an... domain name server. A domain name locates the organization or other entity on the Internet. The dot gov...
Mutant swarms of a totivirus-like entities are present in the red macroalga Chondrus crispus and have been partially transferred to the nuclear genome.

PubMed

Rousvoal, Sylvie; Bouyer, Betty; López-Cristoffanini, Camilo; Boyen, Catherine; Collén, Jonas

2016-08-01

Chondrus crispus Stackhouse (Gigartinales) is a red seaweed found on North Atlantic rocky shores. Electrophoresis of RNA extracts showed a prominent band with a size of around 6,000 bp. Sequencing of the band revealed several sequences with similarity to totiviruses, double-stranded RNA viruses that normally infect fungi. This virus-like entity was named C. crispus virus (CcV). It should probably be regarded as an extreme viral quasispecies or a mutant swarm since low identity (<65%) was found between sequences. Totiviruses typically code for two genes: one capsid gene (gag) and one RNA-dependent RNA polymerase gene (pol) with a pseudoknot structure between the genes. Both the genes and the intergenic structures were found in the CcV sequences. A nonidentical gag gene was also found in the nuclear genome of C. crispus, with associated expressed sequence tags (EST) and upstream regulatory features. The gene was presumably horizontally transferred from the virus to the alga. Similar dsRNA bands were seen in extracts from different life cycle stages of C. crispus and from all geographic locations tested. In addition, similar bands were also observed in RNA extractions from other red algae; however, the significance of this apparently widespread phenomenon is unknown. Neither phenotype caused by the infection nor any virus particles or capsid proteins were identified; thus, the presence of viral particles has not been validated. These findings increase the known host range of totiviruses to include marine red algae. © 2016 Phycological Society of America.
12 CFR 602.24 - Responses to demands served on non-FCA employees or entities.

Code of Federal Regulations, 2010 CFR

2010-01-01

... 12 Banks and Banking 6 2010-01-01 2010-01-01 false Responses to demands served on non-FCA employees or entities. 602.24 Section 602.24 Banks and Banking FARM CREDIT ADMINISTRATION ADMINISTRATIVE... Not a Named Party § 602.24 Responses to demands served on non-FCA employees or entities. If you are...
Evaluation and Cross-Comparison of Lexical Entities of Biological Interest (LexEBI)

PubMed Central

Rebholz-Schuhmann, Dietrich; Kim, Jee-Hyub; Yan, Ying; Dixit, Abhishek; Friteyre, Caroline; Hoehndorf, Robert; Backofen, Rolf; Lewin, Ian

2013-01-01

Motivation Biomedical entities, their identifiers and names, are essential in the representation of biomedical facts and knowledge. In the same way, the complete set of biomedical and chemical terms, i.e. the biomedical “term space” (the “Lexeome”), forms a key resource to achieve the full integration of the scientific literature with biomedical data resources: any identified named entity can immediately be normalized to the correct database entry. This goal does not only require that we are aware of all existing terms, but would also profit from knowing all their senses and their semantic interpretation (ambiguities, nestedness). Result This study compiles a resource for lexical terms of biomedical interest in a standard format (called “LexEBI”), determines the overall number of terms, their reuse in different resources and the nestedness of terms. LexEBI comprises references for protein and gene entries and their term variants and chemical entities amongst other terms. In addition, disease terms have been identified from Medline and PubmedCentral and added to LexEBI. Our analysis demonstrates that the baseforms of terms from the different semantic types show only little polysemous use. Nonetheless, the term variants of protein and gene names (PGNs) frequently contain species mentions, which should have been avoided according to protein annotation guidelines. Furthermore, the protein and gene entities as well as the chemical entities, both do comprise enzymes leading to hierarchical polysemy, and a large portion of PGNs make reference to a chemical entity. Altogether, according to our analysis based on the Medline distribution, 401,869 unique PGNs in the documents contain a reference to 25,022 chemical entities, 3,125 disease terms or 1,576 species mentions. Conclusion LexEBI delivers the complete biomedical and chemical Lexeome in a standardized representation (http://www.ebi.ac.uk/Rebholz-srv/LexEBI/). 
The resource provides the disease terms as open source content, and fully interlinks terms across resources. PMID:24124474
OrganismTagger: detection, normalization and grounding of organism entities in biomedical documents.

PubMed

Naderi, Nona; Kappler, Thomas; Baker, Christopher J O; Witte, René

2011-10-01

Semantic tagging of organism mentions in full-text articles is an important part of literature mining and semantic enrichment solutions. Tagged organism mentions also play a pivotal role in disambiguating other entities in a text, such as proteins. A high-precision organism tagging system must be able to detect the numerous forms of organism mentions, including common names as well as the traditional taxonomic groups: genus, species and strains. In addition, such a system must resolve abbreviations and acronyms, assign the scientific name and if possible link the detected mention to the NCBI Taxonomy database for further semantic queries and literature navigation. We present the OrganismTagger, a hybrid rule-based/machine learning system to extract organism mentions from the literature. It includes tools for automatically generating lexical and ontological resources from a copy of the NCBI Taxonomy database, thereby facilitating system updates by end users. Its novel ontology-based resources can also be reused in other semantic mining and linked data tasks. Each detected organism mention is normalized to a canonical name through the resolution of acronyms and abbreviations and subsequently grounded with an NCBI Taxonomy database ID. In particular, our system combines a novel machine-learning approach with rule-based and lexical methods for detecting strain mentions in documents. On our manually annotated OT corpus, the OrganismTagger achieves a precision of 95%, a recall of 94% and a grounding accuracy of 97.5%. On the manually annotated corpus of Linnaeus-100, the results show a precision of 99%, recall of 97% and grounding accuracy of 97.4%. The OrganismTagger, including supporting tools, resources, training data and manual annotations, as well as end user and developer documentation, is freely available under an open-source license at http://www.semanticsoftware.info/organism-tagger.
Use of Co-occurrences for Temporal Expressions Annotation

NASA Astrophysics Data System (ADS)

Craveiro, Olga; Macedo, Joaquim; Madeira, Henrique

The annotation or extraction of temporal information from text documents is becoming increasingly important in many natural language processing applications such as text summarization, information retrieval and question answering. This paper presents an original method for easy recognition of temporal expressions in text documents. The method creates semantically classified temporal patterns, using word co-occurrences obtained from training corpora and a pre-defined set of seed keywords derived from the temporal references of the language used. Participation in a Portuguese named entity evaluation contest showed promising effectiveness and efficiency results. This approach can be adapted to recognize other types of expressions, or other languages, within other contexts, by defining suitable word sets and training corpora.
Boosting drug named entity recognition using an aggregate classifier.

PubMed

Korkontzelos, Ioannis; Piliouras, Dimitrios; Dowsey, Andrew W; Ananiadou, Sophia

2015-10-01

Drug named entity recognition (NER) is a critical step for complex biomedical NLP tasks such as the extraction of pharmacogenomic, pharmacodynamic and pharmacokinetic parameters. Large quantities of high quality training data are almost always a prerequisite for employing supervised machine-learning techniques to achieve high classification performance. However, the human labour needed to produce and maintain such resources is a significant limitation. In this study, we improve the performance of drug NER without relying exclusively on manual annotations. We perform drug NER using either a small gold-standard corpus (120 abstracts) or no corpus at all. In our approach, we develop a voting system to combine a number of heterogeneous models, based on dictionary knowledge, gold-standard corpora and silver annotations, to enhance performance. To improve recall, we employed genetic programming to evolve 11 regular-expression patterns that capture common drug suffixes and used them as an extra means for recognition. Our approach uses a dictionary of drug names, i.e. DrugBank, a small manually annotated corpus, i.e. the pharmacokinetic corpus, and a part of the UKPMC database, as raw biomedical text. Gold-standard and silver annotated data are used to train maximum entropy and multinomial logistic regression classifiers. Aggregating drug NER methods, based on gold-standard annotations, dictionary knowledge and patterns, improved the performance on models trained on gold-standard annotations, only, achieving a maximum F-score of 95%. In addition, combining models trained on silver annotations, dictionary knowledge and patterns are shown to achieve comparable performance to models trained exclusively on gold-standard data. The main reason appears to be the morphological similarities shared among drug names. We conclude that gold-standard data are not a hard requirement for drug NER. 
Combining heterogeneous models built on dictionary knowledge can achieve classification performance similar or comparable to that of the best performing model trained on gold-standard annotations. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
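The combination of dictionary lookup, suffix patterns and trained models by voting can be sketched as a per-token majority vote. Everything specific below is an assumption made for illustration: the three suffixes in `SUFFIX_PATTERN` are not the 11 patterns evolved in the study, the lexicon is a two-word toy, the third model's output is hard-coded, and strict majority voting is only one possible aggregation rule.

```python
import re
from collections import Counter

# Illustrative suffix pattern only; NOT one of the evolved patterns.
SUFFIX_PATTERN = re.compile(r"^\w+(?:azole|mycin|statin)$", re.IGNORECASE)


def suffix_tagger(tokens):
    """Label tokens that end in one of the illustrative drug suffixes."""
    return ["DRUG" if SUFFIX_PATTERN.match(t) else "O" for t in tokens]


def dictionary_tagger(tokens, lexicon):
    """Label tokens found in a lowercase lexicon of drug names."""
    return ["DRUG" if t.lower() in lexicon else "O" for t in tokens]


def majority_vote(predictions):
    """Combine equal-length label sequences; ties fall back to 'O'."""
    voted = []
    for labels in zip(*predictions):
        label, count = Counter(labels).most_common(1)[0]
        voted.append(label if count > len(labels) / 2 else "O")
    return voted


tokens = ["Patients", "received", "aspirin", "and", "fluconazole"]
toy_lexicon = {"aspirin", "fluconazole"}           # hypothetical dictionary
model_output = ["O", "O", "DRUG", "O", "DRUG"]     # stand-in for a trained model

combined = majority_vote([
    dictionary_tagger(tokens, toy_lexicon),
    suffix_tagger(tokens),
    model_output,
])
print(list(zip(tokens, combined)))
```

In this toy run "aspirin" is missed by the suffix pattern but still tagged because two of the three voters agree, which is the kind of complementarity an aggregate classifier relies on.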
78 FR 26244 - Updating of Employer Identification Numbers

Federal Register 2010, 2011, 2012, 2013, 2014

2013-05-06

... (including updated application information regarding the name and taxpayer identifying number of the... require these persons to update application information regarding the name and taxpayer identifying number..., Application for Employer Identification Number, requires entities to disclose the name of the EIN applicant's...
Chemical Entity Recognition and Resolution to ChEBI

PubMed Central

Grego, Tiago; Pesquita, Catia; Bastos, Hugo P.; Couto, Francisco M.

2012-01-01

Chemical entities are ubiquitous throughout the biomedical literature, and the development of text-mining systems that can efficiently identify those entities is required. Due to the lack of available corpora and data resources, the community has focused its efforts on the development of gene and protein named entity recognition systems, but with the release of ChEBI and the availability of an annotated corpus, this task can be addressed. We developed a machine-learning-based method for chemical entity recognition and a lexical-similarity-based method for chemical entity resolution and compared them with Whatizit, a popular dictionary-based method. Our methods outperformed the dictionary-based method in all tasks, yielding an improvement in F-measure of 20% for the entity recognition task, 2–5% for the entity-resolution task, and 15% for combined entity recognition and resolution tasks. PMID:25937941
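Lexical-similarity-based resolution can be pictured as mapping a recognised mention to the most similar name in a lexicon and accepting the match only above a threshold. The sketch below is a generic illustration and not the authors' method: it uses Python's `difflib.SequenceMatcher` as the similarity measure, a threshold of 0.8 chosen arbitrarily, and placeholder identifiers (`EX:1`, `EX:2`, `EX:3`) that are not real ChEBI IDs.

```python
from difflib import SequenceMatcher


def resolve(mention, lexicon, threshold=0.8):
    """Return (identifier, score) for the most similar name, or None.

    lexicon maps entity names to identifiers; similarity is the
    SequenceMatcher ratio on lowercased strings.
    """
    best_id, best_score = None, 0.0
    for name, identifier in lexicon.items():
        score = SequenceMatcher(None, mention.lower(), name.lower()).ratio()
        if score > best_score:
            best_id, best_score = identifier, score
    return (best_id, best_score) if best_score >= threshold else None


# Placeholder identifiers, not real database IDs.
toy_lexicon = {"ethanol": "EX:1", "methanol": "EX:2", "acetic acid": "EX:3"}

print(resolve("Ethanol", toy_lexicon))   # exact match after lowercasing
print(resolve("ethanoll", toy_lexicon))  # misspelling still resolves
print(resolve("benzene", toy_lexicon))   # below threshold, unresolved
```

The threshold trades recall for precision: lowering it resolves more noisy mentions but also raises the risk of linking to a chemically different entity with a similar name, such as ethanol versus methanol.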
PREDOSE: A Semantic Web Platform for Drug Abuse Epidemiology using Social Media

PubMed Central

Cameron, Delroy; Smith, Gary A.; Daniulaityte, Raminta; Sheth, Amit P.; Dave, Drashti; Chen, Lu; Anand, Gaurish; Carlson, Robert; Watkins, Kera Z.; Falck, Russel

2013-01-01

Objectives The role of social media in biomedical knowledge mining, including clinical, medical and healthcare informatics, prescription drug abuse epidemiology and drug pharmacology, has become increasingly significant in recent years. Social media offers opportunities for people to share opinions and experiences freely in online communities, which may contribute information beyond the knowledge of domain professionals. This paper describes the development of a novel Semantic Web platform called PREDOSE (PREscription Drug abuse Online Surveillance and Epidemiology), which is designed to facilitate the epidemiologic study of prescription (and related) drug abuse practices using social media. PREDOSE uses web forum posts and domain knowledge, modeled in a manually created Drug Abuse Ontology (DAO) (pronounced dow), to facilitate the extraction of semantic information from User Generated Content (UGC). A combination of lexical, pattern-based and semantics-based techniques is used together with the domain knowledge to extract fine-grained semantic information from UGC. In a previous study, PREDOSE was used to obtain the datasets from which new knowledge in drug abuse research was derived. Here, we report on various platform enhancements, including an updated DAO, new components for relationship and triple extraction, and tools for content analysis, trend detection and emerging patterns exploration, which enhance the capabilities of the PREDOSE platform. Given these enhancements, PREDOSE is now more equipped to impact drug abuse research by alleviating traditional labor-intensive content analysis tasks. Methods Using custom web crawlers that scrape UGC from publicly available web forums, PREDOSE first automates the collection of web-based social media content for subsequent semantic annotation. 
The annotation scheme is modeled in the DAO, and includes domain specific knowledge such as prescription (and related) drugs, methods of preparation, side effects, routes of administration, etc. The DAO is also used to help recognize three types of data, namely: 1) entities, 2) relationships and 3) triples. PREDOSE then uses a combination of lexical and semantic-based techniques to extract entities and relationships from the scraped content, and a top-down approach for triple extraction that uses patterns expressed in the DAO. In addition, PREDOSE uses publicly available lexicons to identify initial sentiment expressions in text, and then a probabilistic optimization algorithm (from related research) to extract the final sentiment expressions. Together, these techniques enable the capture of fine-grained semantic information from UGC, and querying, search, trend analysis and overall content analysis of social media related to prescription drug abuse. Moreover, extracted data are also made available to domain experts for the creation of training and test sets for use in evaluation and refinements in information extraction techniques. Results A recent evaluation of the information extraction techniques applied in the PREDOSE platform indicates 85% precision and 72% recall in entity identification, on a manually created gold standard dataset. In another study, PREDOSE achieved 36% precision in relationship identification and 33% precision in triple extraction, through manual evaluation by domain experts. Given the complexity of the relationship and triple extraction tasks and the abstruse nature of social media texts, we interpret these as favorable initial results. Extracted semantic information is currently in use in an online discovery support system, by prescription drug abuse researchers at the Center for Interventions, Treatment and Addictions Research (CITAR) at Wright State University. 
Conclusion A comprehensive platform for entity, relationship, triple and sentiment extraction from such abstruse texts has never been developed for drug abuse research. PREDOSE has already demonstrated the importance of mining social media by providing data from which new findings in drug abuse research were uncovered. Given the recent platform enhancements, including the refined DAO, components for relationship and triple extraction, and tools for content, trend and emerging pattern analysis, it is expected that PREDOSE will play a significant role in advancing drug abuse epidemiology in future. PMID:23892295
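The lexicon-based entity step and the top-down, pattern-based triple step described above can be illustrated with a minimal Python sketch. The lexicon entries, the class names `Drug` and `Route`, the predicate `administered_via`, and the single pattern are hypothetical placeholders chosen for illustration; none of them is taken from the DAO or from the PREDOSE implementation.

```python
import re

# Hypothetical mini-lexicon: surface form -> (canonical label, class).
# These entries are illustrative assumptions, not DAO content.
LEXICON = {
    "bupe": ("Buprenorphine", "Drug"),
    "buprenorphine": ("Buprenorphine", "Drug"),
    "loperamide": ("Loperamide", "Drug"),
    "injected": ("Injection", "Route"),
    "swallowed": ("Oral", "Route"),
}

# Hypothetical pattern: a Drug and a Route mentioned in the same sentence
# produce the triple (Drug, administered_via, Route).
TRIPLE_PATTERN = ("Drug", "administered_via", "Route")


def extract_entities(sentence):
    """Return (canonical label, class) pairs for lexicon terms in a sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [LEXICON[t] for t in tokens if t in LEXICON]


def extract_triples(text):
    """Apply the single pattern to each sentence of the text."""
    subj_cls, predicate, obj_cls = TRIPLE_PATTERN
    triples = []
    for sentence in re.split(r"[.!?]+", text):
        entities = extract_entities(sentence)
        subjects = [label for label, cls in entities if cls == subj_cls]
        objects = [label for label, cls in entities if cls == obj_cls]
        triples.extend((s, predicate, o) for s in subjects for o in objects)
    return triples


post = "The poster said bupe was injected. Loperamide was mentioned separately."
triples = extract_triples(post)
print(triples)
```

Sentence-level co-occurrence is a deliberately crude stand-in for the patterns expressed in the DAO; it tends to over-generate, which is one plausible reason why relationship and triple precision sit well below entity precision in evaluations of this kind.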
Taxonomic indexing--extending the role of taxonomy.

PubMed

Patterson, David J; Remsen, David; Marino, William A; Norton, Cathy

2006-06-01

Taxonomic indexing refers to a new array of taxonomically intelligent network services that use nomenclatural principles and elements of expert taxonomic knowledge to manage information about organisms. Taxonomic indexing was introduced to help manage the increasing amounts of digital information about biology. It has been designed to form a near basal layer in a layered cyberinfrastructure that deals with biological information. Taxonomic Indexing accommodates the special problems of using names of organisms to index biological material. It links alternative names for the same entity (reconciliation), and distinguishes between uses of the same name for different entities (disambiguation), and names are placed within an indefinite number of hierarchical schemes. In order to access all information on all organisms, Taxonomic indexing must be able to call on a registry of all names in all forms for all organisms. NameBank has been developed to meet that need. Taxonomic indexing is an area of informatics that overlaps with taxonomy, is dependent on the expert input of taxonomists, and reveals the relevance of the discipline to a wide audience.
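Reconciliation and disambiguation, as described above, can be sketched with a small in-memory name registry. This is only an illustration under stated assumptions: the `NameRegistry` class, its methods, and the `taxon:` identifiers are hypothetical and do not represent NameBank's data model or interface.

```python
from collections import defaultdict


class NameRegistry:
    """Toy registry linking name strings to taxon identifiers (hypothetical)."""

    def __init__(self):
        self._taxa_by_name = defaultdict(set)   # lower-cased name -> taxon ids
        self._names_by_taxon = defaultdict(set)  # taxon id -> names as given

    def register(self, taxon_id, name):
        self._taxa_by_name[name.lower()].add(taxon_id)
        self._names_by_taxon[taxon_id].add(name)

    def reconcile(self, name):
        """Return every registered name of each taxon the given name may denote."""
        taxa = self._taxa_by_name.get(name.lower(), ())
        return {tid: sorted(self._names_by_taxon[tid]) for tid in taxa}

    def is_ambiguous(self, name):
        """True when the same name string is used for more than one taxon."""
        return len(self._taxa_by_name.get(name.lower(), ())) > 1


registry = NameRegistry()
# Alternative names for one entity (reconciliation).
for name in ("Puma concolor", "Felis concolor", "cougar"):
    registry.register("taxon:puma", name)
# One name used for two entities (disambiguation): Oenanthe is both
# a plant genus and a bird genus.
registry.register("taxon:oenanthe-plant", "Oenanthe")
registry.register("taxon:oenanthe-bird", "Oenanthe")

print(registry.reconcile("Felis concolor"))
print(registry.is_ambiguous("Oenanthe"))
```

A real service would also have to place each name within multiple hierarchical schemes, which this sketch omits.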
Deep learning with word embeddings improves biomedical named entity recognition.

PubMed

Habibi, Maryam; Weber, Leon; Neves, Mariana; Wiegandt, David Luis; Leser, Ulf

2017-07-15

Text mining has become an important tool for biomedical research. The most fundamental text-mining task is the recognition of biomedical named entities (NER), such as genes, chemicals and diseases. Current NER methods rely on pre-defined features which try to capture the specific surface properties of entity types, properties of the typical local context, background knowledge, and linguistic information. State-of-the-art tools are entity-specific, as dictionaries and empirically optimal feature sets differ between entity types, which makes their development costly. Furthermore, features are often optimized for a specific gold standard corpus, which makes extrapolation of quality measures difficult. We show that a completely generic method based on deep learning and statistical word embeddings [called long short-term memory network-conditional random field (LSTM-CRF)] outperforms state-of-the-art entity-specific NER tools, and often by a large margin. To this end, we compared the performance of LSTM-CRF on 33 data sets covering five different entity classes with that of best-of-class NER tools and an entity-agnostic CRF implementation. On average, F1-score of LSTM-CRF is 5% above that of the baselines, mostly due to a sharp increase in recall. The source code for LSTM-CRF is available at https://github.com/glample/tagger and the links to the corpora are available at https://corposaurus.github.io/corpora/ . habibima@informatik.hu-berlin.de. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Deep learning with word embeddings improves biomedical named entity recognition

PubMed Central

Habibi, Maryam; Weber, Leon; Neves, Mariana; Wiegandt, David Luis; Leser, Ulf

2017-01-01

Abstract Motivation: Text mining has become an important tool for biomedical research. The most fundamental text-mining task is the recognition of biomedical named entities (NER), such as genes, chemicals and diseases. Current NER methods rely on pre-defined features which try to capture the specific surface properties of entity types, properties of the typical local context, background knowledge, and linguistic information. State-of-the-art tools are entity-specific, as dictionaries and empirically optimal feature sets differ between entity types, which makes their development costly. Furthermore, features are often optimized for a specific gold standard corpus, which makes extrapolation of quality measures difficult. Results: We show that a completely generic method based on deep learning and statistical word embeddings [called long short-term memory network-conditional random field (LSTM-CRF)] outperforms state-of-the-art entity-specific NER tools, and often by a large margin. To this end, we compared the performance of LSTM-CRF on 33 data sets covering five different entity classes with that of best-of-class NER tools and an entity-agnostic CRF implementation. On average, F1-score of LSTM-CRF is 5% above that of the baselines, mostly due to a sharp increase in recall. Availability and implementation: The source code for LSTM-CRF is available at https://github.com/glample/tagger and the links to the corpora are available at https://corposaurus.github.io/corpora/. Contact: habibima@informatik.hu-berlin.de PMID:28881963
Considering context: reliable entity networks through contextual relationship extraction

NASA Astrophysics Data System (ADS)

David, Peter; Hawes, Timothy; Hansen, Nichole; Nolan, James J.

2016-05-01

Existing information extraction techniques can only partially address the problem of exploiting unreadably large amounts of text. When discussion of events and relationships is limited to simple, past-tense, factual descriptions of events, current NLP-based systems can identify events and relationships and extract a limited amount of additional information. But the simple subset of available information that existing tools can extract from text is only useful to a small set of users and problems. Automated systems need to find and separate information based on what is threatened or planned to occur, has occurred in the past, or could potentially occur. We address the problem of advanced event and relationship extraction with our event and relationship attribute recognition system, which labels generic, planned, recurring, and potential events. The approach is based on a combination of new machine learning methods, novel linguistic features, and crowd-sourced labeling. The attribute labeler closes the gap between structured event and relationship models and the complicated and nuanced language that people use to describe them. Our operational-quality event and relationship attribute labeler enables Warfighters and analysts to more thoroughly exploit information in unstructured text. This is made possible through 1) More precise event and relationship interpretation, 2) More detailed information about extracted events and relationships, and 3) More reliable and informative entity networks that acknowledge the different attributes of entity-entity relationships.
LeadMine: a grammar and dictionary driven approach to entity recognition.

PubMed

Lowe, Daniel M; Sayle, Roger A

2015-01-01

Chemical entity recognition has traditionally been performed by machine learning approaches. Here we describe an approach using grammars and dictionaries. This approach has the advantage that the entities found can be directly related to a given grammar or dictionary, which allows the type of an entity to be known and, if an entity is misannotated, indicates which resource should be corrected. As recognition is driven by what is expected, if spelling errors occur, they can be corrected. Correcting such errors is highly useful when attempting to look up an entity in a database or, in the case of chemical names, to convert them to structures. Our system uses a mixture of expertly curated grammars and dictionaries, as well as dictionaries automatically derived from public resources. We show that the heuristics developed to filter our dictionary of trivial chemical names (from PubChem) yield a better-performing dictionary than the previously published Jochem dictionary. Our final system performs post-processing steps to modify the boundaries of entities and to detect abbreviations. These steps are shown to significantly improve performance (2.6% and 4.0% F1-score respectively). Our complete system, with incremental post-BioCreative workshop improvements, achieves 89.9% precision and 85.4% recall (87.6% F1-score) on the CHEMDNER test set. Grammar and dictionary approaches can produce results at least as good as the current state of the art in machine learning approaches. While machine learning approaches are commonly thought of as "black box" systems, our approach directly links the output entities to the input dictionaries and grammars. Our approach also allows correction of errors in detected entities, which can assist with entity resolution.
LeadMine: a grammar and dictionary driven approach to entity recognition

PubMed Central

2015-01-01

Background Chemical entity recognition has traditionally been performed by machine learning approaches. Here we describe an approach using grammars and dictionaries. This approach has the advantage that the entities found can be directly related to a given grammar or dictionary, which allows the type of an entity to be known and, if an entity is misannotated, indicates which resource should be corrected. As recognition is driven by what is expected, if spelling errors occur, they can be corrected. Correcting such errors is highly useful when attempting to look up an entity in a database or, in the case of chemical names, to convert them to structures. Results Our system uses a mixture of expertly curated grammars and dictionaries, as well as dictionaries automatically derived from public resources. We show that the heuristics developed to filter our dictionary of trivial chemical names (from PubChem) yield a better-performing dictionary than the previously published Jochem dictionary. Our final system performs post-processing steps to modify the boundaries of entities and to detect abbreviations. These steps are shown to significantly improve performance (2.6% and 4.0% F1-score respectively). Our complete system, with incremental post-BioCreative workshop improvements, achieves 89.9% precision and 85.4% recall (87.6% F1-score) on the CHEMDNER test set. Conclusions Grammar and dictionary approaches can produce results at least as good as the current state of the art in machine learning approaches. While machine learning approaches are commonly thought of as "black box" systems, our approach directly links the output entities to the input dictionaries and grammars. Our approach also allows correction of errors in detected entities, which can assist with entity resolution. PMID:25810776
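The reported F1-score follows from the reported precision and recall, since F1 is their harmonic mean. The short Python check below reproduces that arithmetic; the function name `f1_score` is simply a label chosen for this sketch.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the CHEMDNER test set.
f1 = f1_score(0.899, 0.854)
print(f"{f1 * 100:.1f}%")
```

The result rounds to the 87.6% stated in the abstract.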
BioC: a minimalist approach to interoperability for biomedical text processing

PubMed Central

Comeau, Donald C.; Islamaj Doğan, Rezarta; Ciccarese, Paolo; Cohen, Kevin Bretonnel; Krallinger, Martin; Leitner, Florian; Lu, Zhiyong; Peng, Yifan; Rinaldi, Fabio; Torii, Manabu; Valencia, Alfonso; Verspoor, Karin; Wiegers, Thomas C.; Wu, Cathy H.; Wilbur, W. John

2013-01-01

A vast amount of scientific information is encoded in natural language text, and the quantity of such text has become so great that it is no longer economically feasible to have a human as the first step in the search process. Natural language processing and text mining tools have become essential to facilitate the search for and extraction of information from text. This has led to vigorous research efforts to create useful tools and to create humanly labeled text corpora, which can be used to improve such tools. To encourage combining these efforts into larger, more powerful and more capable systems, a common interchange format to represent, store and exchange the data in a simple manner between different language processing systems and text mining tools is highly desirable. Here we propose a simple extensible mark-up language format to share text documents and annotations. The proposed annotation approach allows a large number of different annotations to be represented including sentences, tokens, parts of speech, named entities such as genes or diseases and relationships between named entities. In addition, we provide simple code to hold this data, read it from and write it back to extensible mark-up language files and perform some sample processing. We also describe completed as well as ongoing work to apply the approach in several directions. Code and data are available at http://bioc.sourceforge.net/. Database URL: http://bioc.sourceforge.net/ PMID:24048470
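To make the idea of a simple XML interchange format concrete, the following Python sketch writes one document with two named-entity annotations and reads it back. The element and attribute names (`collection`, `document`, `passage`, `annotation`, `infon`, `location`, and so on) are assumed to follow the BioC DTD and should be checked against the files distributed at the project site; the sample sentence, the `source`, `date` and `key` values, and the helper `build_collection` are hypothetical and are not the code the authors provide.

```python
import xml.etree.ElementTree as ET


def build_collection(doc_id, text, annotations):
    """Build a BioC-style collection holding one single-passage document.

    `annotations` is a list of (offset, length, type) tuples; element names
    are assumed to match the BioC DTD.
    """
    collection = ET.Element("collection")
    ET.SubElement(collection, "source").text = "example"      # placeholder
    ET.SubElement(collection, "date").text = "2013-01-01"     # placeholder
    ET.SubElement(collection, "key").text = "example.key"     # placeholder
    document = ET.SubElement(collection, "document")
    ET.SubElement(document, "id").text = doc_id
    passage = ET.SubElement(document, "passage")
    ET.SubElement(passage, "offset").text = "0"
    ET.SubElement(passage, "text").text = text
    for i, (start, length, ann_type) in enumerate(annotations):
        ann = ET.SubElement(passage, "annotation", id=str(i))
        ET.SubElement(ann, "infon", key="type").text = ann_type
        ET.SubElement(ann, "location", offset=str(start), length=str(length))
        ET.SubElement(ann, "text").text = text[start:start + length]
    return collection


text = "BRCA1 mutations are linked to breast cancer."
annotations = [
    (text.index("BRCA1"), len("BRCA1"), "Gene"),
    (text.index("breast cancer"), len("breast cancer"), "Disease"),
]
collection = build_collection("doc-1", text, annotations)

# Round trip: serialize to XML bytes, parse again, and list the annotations.
xml_bytes = ET.tostring(collection, encoding="utf-8")
parsed = ET.fromstring(xml_bytes)
recovered = [(a.find("infon").text, a.find("text").text)
             for a in parsed.iter("annotation")]
print(recovered)
```

Storing character offsets alongside the annotated text lets a consumer verify that an annotation still lines up with the passage after the file has passed through another tool.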

Recognizing the Emotional Valence of Names: An ERP Study

ERIC Educational Resources Information Center

Wang, Lin; Zhu, Zude; Bastiaansen, Marcel; Hagoort, Peter; Yang, Yufang

2013-01-01

Unlike common nouns, person names refer to unique entities and generally have a referring function. We used event-related potentials to investigate the time course of identifying the emotional meaning of nouns and names. The emotional valence of names and nouns was manipulated separately. The results show early N1 effects in response to emotional…
Discovering biomedical semantic relations in PubMed queries for information retrieval and database curation.

PubMed

Huang, Chung-Chi; Lu, Zhiyong

2016-01-01

Identifying relevant papers from the literature is a common task in biocuration. Most current biomedical literature search systems primarily rely on matching user keywords. Semantic search, on the other hand, seeks to improve search accuracy by understanding the entities and contextual relations in user keywords. However, past research has mostly focused on semantically identifying biological entities (e.g. chemicals, diseases and genes) with little effort on discovering semantic relations. In this work, we aim to discover biomedical semantic relations in PubMed queries in an automated and unsupervised fashion. Specifically, we focus on extracting and understanding the contextual information (or context patterns) that is used by PubMed users to represent semantic relations between entities such as 'CHEMICAL-1 compared to CHEMICAL-2'. With the advances in automatic named entity recognition, we first tag entities in PubMed queries and then use tagged entities as knowledge to recognize pattern semantics. More specifically, we transform PubMed queries into context patterns involving participating entities, which are subsequently projected to latent topics via latent semantic analysis (LSA) to avoid the data sparseness and specificity issues. Finally, we mine semantically similar contextual patterns or semantic relations based on LSA topic distributions. Our two separate evaluation experiments of chemical-chemical (CC) and chemical-disease (CD) relations show that the proposed approach significantly outperforms a baseline method, which simply measures pattern semantics by similarity in participating entities. The highest performance achieved by our approach is nearly 0.9 and 0.85 respectively for the CC and CD task when compared against the ground truth in terms of normalized discounted cumulative gain (nDCG), a standard measure of ranking quality.
These results suggest that our approach can effectively identify and return related semantic patterns in a ranked order covering diverse bio-entity relations. To assess the potential utility of our automated top-ranked patterns of a given relation in semantic search, we performed a pilot study on frequently sought semantic relations in PubMed and observed improved literature retrieval effectiveness based on post-hoc human relevance evaluation. Further investigation in larger tests and in real-world scenarios is warranted. Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the US.
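The nDCG figures reported in this record measure how close a ranked list of patterns comes to the ideal ordering. The Python sketch below uses one common formulation (linear gain with a log2 rank discount); the abstract does not say which variant the authors used, and the graded relevance values are invented for illustration.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance grades."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical relevance grades for patterns returned in ranked order.
perfect_ranking = [3, 2, 1]
reversed_ranking = [1, 2, 3]
print(ndcg(perfect_ranking), ndcg(reversed_ranking))
```

A score of 1.0 means the system returned the patterns in the best possible order, so values near 0.9 and 0.85 indicate rankings close to the ground truth.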
PREDOSE: a semantic web platform for drug abuse epidemiology using social media.

PubMed

Cameron, Delroy; Smith, Gary A; Daniulaityte, Raminta; Sheth, Amit P; Dave, Drashti; Chen, Lu; Anand, Gaurish; Carlson, Robert; Watkins, Kera Z; Falck, Russel

2013-12-01

The role of social media in biomedical knowledge mining, including clinical, medical and healthcare informatics, prescription drug abuse epidemiology and drug pharmacology, has become increasingly significant in recent years. Social media offers opportunities for people to share opinions and experiences freely in online communities, which may contribute information beyond the knowledge of domain professionals. This paper describes the development of a novel semantic web platform called PREDOSE (PREscription Drug abuse Online Surveillance and Epidemiology), which is designed to facilitate the epidemiologic study of prescription (and related) drug abuse practices using social media. PREDOSE uses web forum posts and domain knowledge, modeled in a manually created Drug Abuse Ontology (DAO--pronounced dow), to facilitate the extraction of semantic information from User Generated Content (UGC), through a combination of lexical, pattern-based and semantics-based techniques. In a previous study, PREDOSE was used to obtain the datasets from which new knowledge in drug abuse research was derived. Here, we report on various platform enhancements, including an updated DAO, new components for relationship and triple extraction, and tools for content analysis, trend detection and emerging patterns exploration, which enhance the capabilities of the PREDOSE platform. Given these enhancements, PREDOSE is now more equipped to impact drug abuse research by alleviating traditional labor-intensive content analysis tasks. Using custom web crawlers that scrape UGC from publicly available web forums, PREDOSE first automates the collection of web-based social media content for subsequent semantic annotation. The annotation scheme is modeled in the DAO, and includes domain specific knowledge such as prescription (and related) drugs, methods of preparation, side effects, and routes of administration.
The DAO is also used to help recognize three types of data, namely: (1) entities, (2) relationships and (3) triples. PREDOSE then uses a combination of lexical and semantic-based techniques to extract entities and relationships from the scraped content, and a top-down approach for triple extraction that uses patterns expressed in the DAO. In addition, PREDOSE uses publicly available lexicons to identify initial sentiment expressions in text, and then a probabilistic optimization algorithm (from related research) to extract the final sentiment expressions. Together, these techniques enable the capture of fine-grained semantic information, which facilitates search, trend analysis and overall content analysis using social media on prescription drug abuse. Moreover, extracted data are also made available to domain experts for the creation of training and test sets for use in evaluation and refinements in information extraction techniques. A recent evaluation of the information extraction techniques applied in the PREDOSE platform indicates 85% precision and 72% recall in entity identification, on a manually created gold standard dataset. In another study, PREDOSE achieved 36% precision in relationship identification and 33% precision in triple extraction, through manual evaluation by domain experts. Given the complexity of the relationship and triple extraction tasks and the abstruse nature of social media texts, we interpret these as favorable initial results. Extracted semantic information is currently in use in an online discovery support system, by prescription drug abuse researchers at the Center for Interventions, Treatment and Addictions Research (CITAR) at Wright State University. A comprehensive platform for entity, relationship, triple and sentiment extraction from such abstruse texts has never been developed for drug abuse research.
PREDOSE has already demonstrated the importance of mining social media by providing data from which new findings in drug abuse research were uncovered. Given the recent platform enhancements, including the refined DAO, components for relationship and triple extraction, and tools for content, trend and emerging pattern analysis, it is expected that PREDOSE will play a significant role in advancing drug abuse epidemiology in future. Copyright © 2013 Elsevier Inc. All rights reserved.
24 CFR 1710.208 - General information.

Code of Federal Regulations, 2010 CFR

2010-04-01

... any of the principals of the owner or developer are corporate entities, name the parent and/or... registration or prohibited sales, name the State involved and give the reasons cited by the State for their... made with the SEC, give the SEC identification number; identify the prospectus by name; date of filing...
24 CFR 1710.208 - General information.

Code of Federal Regulations, 2014 CFR

2014-04-01

... any of the principals of the owner or developer are corporate entities, name the parent and/or... registration or prohibited sales, name the State involved and give the reasons cited by the State for their... made with the SEC, give the SEC identification number; identify the prospectus by name; date of filing...
24 CFR 1710.208 - General information.

Code of Federal Regulations, 2013 CFR

2013-04-01

... any of the principals of the owner or developer are corporate entities, name the parent and/or... registration or prohibited sales, name the State involved and give the reasons cited by the State for their... made with the SEC, give the SEC identification number; identify the prospectus by name; date of filing...
24 CFR 1710.208 - General information.

Code of Federal Regulations, 2011 CFR

2011-04-01

... any of the principals of the owner or developer are corporate entities, name the parent and/or... registration or prohibited sales, name the State involved and give the reasons cited by the State for their... made with the SEC, give the SEC identification number; identify the prospectus by name; date of filing...
24 CFR 1710.208 - General information.

Code of Federal Regulations, 2012 CFR

2012-04-01

... any of the principals of the owner or developer are corporate entities, name the parent and/or... registration or prohibited sales, name the State involved and give the reasons cited by the State for their... made with the SEC, give the SEC identification number; identify the prospectus by name; date of filing...
Deadlock Detection in Computer Networks

DTIC Science & Technology

1977-09-01

it entity class name (ndm-procownerref) = -:"node tab5le" I procnode_name z res-rnode-name call then return; nc ll c eck -for-deadlock(p_obplref...demo12 ~-exlusive sae con Caobridg Fina Sttonaa con0 Official Distribution List Defense Documentation Center New York Area Office Cameron Station 715
Anaplastic sarcoma of the kidney.

PubMed

Labanaris, Apostolos; Zugor, Vahudin; Smiszek, Robert; Nützel, Reinhold; Kühn, Reinhard

2009-02-15

Wilms tumor can appear with a wide spectrum of morphologic features and can sometimes cover or delay the recognition of other clinicopathologic entities of the kidney. We present a case of a new tumor entity of the kidney, namely the anaplastic sarcoma of the kidney, a tumor of high malignancy.
78 FR 60378 - Request for Applications for the IRS Advisory Committee on Tax Exempt and Government Entities

Federal Register 2010, 2011, 2012, 2013, 2014

2013-10-01

... letter with the following information: Name; Other Name(s) Used and Date(s) (required for FBI check); Date of Birth (required for FBI check); City and State of Birth (required for FBI Check); Current..., among other things, pre-appointment and annual tax checks, and an FBI criminal and subversive name check...
Unsupervised Extraction of Diagnosis Codes from EMRs Using Knowledge-Based and Extractive Text Summarization Techniques

PubMed Central

Kavuluru, Ramakanth; Han, Sifei; Harris, Daniel

2017-01-01

Diagnosis codes are extracted from medical records for billing and reimbursement and for secondary uses such as quality control and cohort identification. In the US, these codes come from the standard terminology ICD-9-CM derived from the international classification of diseases (ICD). ICD-9 codes are generally extracted by trained human coders by reading all artifacts available in a patient’s medical record following specific coding guidelines. To assist coders in this manual process, this paper proposes an unsupervised ensemble approach to automatically extract ICD-9 diagnosis codes from textual narratives included in electronic medical records (EMRs). Earlier attempts on automatic extraction focused on individual documents such as radiology reports and discharge summaries. Here we use a more realistic dataset and extract ICD-9 codes from EMRs of 1000 inpatient visits at the University of Kentucky Medical Center. Using named entity recognition (NER), graph-based concept-mapping of medical concepts, and extractive text summarization techniques, we achieve an example based average recall of 0.42 with average precision 0.47; compared with a baseline of using only NER, we notice a 12% improvement in recall with the graph-based approach and a 7% improvement in precision using the extractive text summarization approach. Although diagnosis codes are complex concepts often expressed in text with significant long range non-local dependencies, our present work shows the potential of unsupervised methods in extracting a portion of codes. As such, our findings are especially relevant for code extraction tasks where obtaining large amounts of training data is difficult. PMID:28748227
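Example-based averaging, as used for the recall and precision figures above, scores each visit separately and then averages across visits. The Python sketch below shows that computation under stated assumptions: the two visits and their code sets are invented for illustration, and treating an empty gold or predicted set as a score of 0.0 is a convention chosen here, not one stated in the abstract.

```python
def example_based_scores(gold, predicted):
    """Average per-example precision and recall over paired code sets."""
    if not gold:
        return 0.0, 0.0
    precisions, recalls = [], []
    for gold_codes, pred_codes in zip(gold, predicted):
        correct = len(gold_codes & pred_codes)
        # Assumed convention: an empty set contributes a score of 0.0.
        precisions.append(correct / len(pred_codes) if pred_codes else 0.0)
        recalls.append(correct / len(gold_codes) if gold_codes else 0.0)
    n = len(gold)
    return sum(precisions) / n, sum(recalls) / n


# Two hypothetical inpatient visits with ICD-9-CM style codes.
gold = [{"250.00", "401.9"}, {"428.0"}]
predicted = [{"250.00"}, {"428.0", "486"}]
precision, recall = example_based_scores(gold, predicted)
print(precision, recall)
```

Because every visit carries equal weight, a visit with many codes does not dominate the average the way it would under micro-averaging.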
Discussion: Imagining the Languaged Worker's Language

ERIC Educational Resources Information Center

Urciuoli, Bonnie

2016-01-01

What people perceive as "a language"--a named entity--is abstracted from practices and notions about those practices. People take for granted that language is somehow a "thing," an objectively distinct and bounded entity. How languages come to be thus imagined indexes the conditions under which they are imagined. The articles…
24 CFR 202.5 - General approval standards.

Code of Federal Regulations, 2011 CFR

2011-04-01

... years from the date that the materials are circulated or used to advertise. (3) Non-FHA-approved entities. A lender or mortgagee that accepts a loan application from a non-FHA-approved entity must confirm..., including, but not limited to, mergers, terminations, name, location, control of ownership, and character of...
24 CFR 202.5 - General approval standards.

Code of Federal Regulations, 2013 CFR

2013-04-01

... years from the date that the materials are circulated or used to advertise. (3) Non-FHA-approved entities. A lender or mortgagee that accepts a loan application from a non-FHA-approved entity must confirm..., including, but not limited to, mergers, terminations, name, location, control of ownership, and character of...
24 CFR 202.5 - General approval standards.

Code of Federal Regulations, 2012 CFR

2012-04-01

... years from the date that the materials are circulated or used to advertise. (3) Non-FHA-approved entities. A lender or mortgagee that accepts a loan application from a non-FHA-approved entity must confirm..., including, but not limited to, mergers, terminations, name, location, control of ownership, and character of...
49 CFR Appendix C to Part 37 - Certifications

Code of Federal Regulations, 2010 CFR

2010-10-01

..., including individuals who use wheelchairs, is equivalent to the level and quality of service offered to... (name of public entity (ies)) has conducted a survey of existing paratransit services as required by 49... is to certify that service provided by other entities but included in the ADA paratransit plan...
Abstracts versus Full Texts and Patents: A Quantitative Analysis of Biomedical Entities

NASA Astrophysics Data System (ADS)

Müller, Bernd; Klinger, Roman; Gurulingappa, Harsha; Mevissen, Heinz-Theodor; Hofmann-Apitius, Martin; Fluck, Juliane; Friedrich, Christoph M.

In information retrieval, named entity recognition gives the opportunity to apply semantic search in domain specific corpora. Recently, more full text patents and journal articles became freely available. As the information distribution amongst the different sections is unknown, an analysis of the diversity is of interest.
77 FR 48609 - Additional Designations, Foreign Narcotics Kingpin Designation Act

Federal Register 2010, 2011, 2012, 2013, 2014

2012-08-14

... the names of three individuals and five entities whose property and interests in property have been... designation by the Director of OFAC of the three individuals and five entities identified in this notice... transactions involving U.S. companies and individuals. The Kingpin Act blocks all property and interests in...
22 CFR 96.32 - Internal structure and oversight.

Code of Federal Regulations, 2011 CFR

2011-04-01

... known, under either its current or any former form of organization, and the addresses and phone numbers used when such names were used; (2) The name, address, and phone number of each current director... number of such other provider; and (3) The name, address, and phone number of any entity it uses or...

22 CFR 96.32 - Internal structure and oversight.

Code of Federal Regulations, 2012 CFR

2012-04-01

... known, under either its current or any former form of organization, and the addresses and phone numbers used when such names were used; (2) The name, address, and phone number of each current director... number of such other provider; and (3) The name, address, and phone number of any entity it uses or...
22 CFR 96.32 - Internal structure and oversight.

Code of Federal Regulations, 2014 CFR

2014-04-01

... known, under either its current or any former form of organization, and the addresses and phone numbers used when such names were used; (2) The name, address, and phone number of each current director... number of such other provider; and (3) The name, address, and phone number of any entity it uses or...
22 CFR 96.32 - Internal structure and oversight.

Code of Federal Regulations, 2013 CFR

2013-04-01

... known, under either its current or any former form of organization, and the addresses and phone numbers used when such names were used; (2) The name, address, and phone number of each current director... number of such other provider; and (3) The name, address, and phone number of any entity it uses or...
30 CFR 1218.540 - How does ONRR serve official correspondence?

Code of Federal Regulations, 2014 CFR

2014-07-01

... reporting entity is responsible for notifying ONRR of any name or address changes on Form ONRR-4444. The... name and address, position title, or department name and address in our database, based on previous... registered agent; (ii) Any corporate officer; or (iii) The addressee of record shown in the files of any...
A Concept Hierarchy Based Ontology Mapping Approach

NASA Astrophysics Data System (ADS)

Wang, Ying; Liu, Weiru; Bell, David

Ontology mapping is one of the most important tasks for ontology interoperability and its main aim is to find semantic relationships between entities (i.e. concept, attribute, and relation) of two ontologies. However, most of the current methods only consider one to one (1:1) mappings. In this paper we propose a new approach (CHM: Concept Hierarchy based Mapping approach) which can find simple (1:1) mappings and complex (m:1 or 1:m) mappings simultaneously. First, we propose a new method to represent the concept names of entities. This method is based on the hierarchical structure of an ontology such that each concept name of entity in the ontology is included in a set. The parent-child relationship in the hierarchical structure of an ontology is then extended as a set-inclusion relationship between the sets for the parent and the child. Second, we compute the similarities between entities based on the new representation of entities in ontologies. Third, after generating the mapping candidates, we select the best mapping result for each source entity. We design a new algorithm based on the Apriori algorithm for selecting the mapping results. Finally, we obtain simple (1:1) and complex (m:1 or 1:m) mappings. Our experimental results and comparisons with related work indicate that utilizing this method in dealing with ontology mapping is a promising way to improve the overall mapping results.
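The set-based representation described in this abstract can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: each concept is represented by the set of names on its path to the root (so a parent's set is included in its child's set), and Jaccard overlap stands in for the paper's similarity measure. The toy ontologies and the function names `concept_set` and `jaccard` are hypothetical.

```python
# Toy ontologies (hypothetical): each concept maps to its parent, None marks the root.
source = {"Thing": None, "Publication": "Thing", "Article": "Publication"}
target = {"Thing": None, "Publication": "Thing", "Paper": "Publication", "Article": "Paper"}


def concept_set(concept, parent_of):
    """Collect the lower-cased names on the path from `concept` up to the root."""
    names = set()
    while concept is not None:
        names.add(concept.lower())
        concept = parent_of.get(concept)
    return names


def jaccard(a, b):
    """Set-overlap similarity; an assumed stand-in for the paper's own measure."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


# The parent-child relationship becomes set inclusion between the two sets.
parent_names = concept_set("Publication", source)
child_names = concept_set("Article", source)
inclusion_holds = parent_names <= child_names

# Similarity between a source entity and a candidate target entity.
similarity = jaccard(concept_set("Article", source), concept_set("Article", target))
```

Candidate generation and the Apriori-based selection step mentioned in the abstract are not covered by this sketch.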
Efficient authentication scheme based on near-ring root extraction problem

NASA Astrophysics Data System (ADS)

Muthukumaran, V.; Ezhilmaran, D.

2017-11-01

An authentication protocol is a type of computer communication protocol or cryptographic protocol specifically designed for the transfer of authentication data between two entities. In this article, we propose a new two-entity authentication scheme based on the root extraction problem over near-rings. We suggest that this problem is sufficiently difficult to serve as a cryptographic assumption over the platform of near-ring N. The security issues are also discussed.
77 FR 44717 - Unblocking of Specially Designated Nationals and Blocked Persons Pursuant to the Foreign...

Federal Register 2010, 2011, 2012, 2013, 2014

2012-07-30

... Foreign Assets Control (``OFAC'') is publishing the names of ten individuals and nine entities whose... ten individuals and nine entities identified in this notice whose property and interests in property... international narcotics trafficking. On July 24, 2012, the Director of OFAC removed from the SDN List the ten...
78 FR 37664 - Identification of Entities Pursuant to the Iranian Transactions and Sanctions Regulations and...

Federal Register 2010, 2011, 2012, 2013, 2014

2013-06-21

... Control (``OFAC'') is publishing the names of 38 entities identified as the Government of Iran under the... Government of Iran and Iranian Financial Institutions'' (the ``Order''). Section 1(a) of the Order blocks, with certain exceptions, all property and interests in property of the Government of Iran, including...
78 FR 78514 - Designation of One Individual and Three Entities Pursuant to Executive Order

Federal Register 2010, 2011, 2012, 2013, 2014

2013-12-26

... DEPARTMENT OF THE TREASURY Office of Foreign Assets Control Designation of One Individual and... publishing the name of one individual and three entities whose property and interests in property are blocked..., Security, or Stability of Burma.'' DATES: The designation by the Director of OFAC of the one individual and...
18 CFR 131.31 - FERC Form No. 561, Annual report of interlocking positions.

Code of Federal Regulations, 2010 CFR

2010-04-01

... supplies electric equipment (ELEQ) named in Column (3) enter the aggregate amount of revenues from... utility ELEQ Entity which produces/supplies electric equipment for the use of any public utility FUEL Entity which produces/supplies coal, natural gas, nuclear fuel, or other fuel for the use of any public...
Named Entity Recognition in Chinese Clinical Text Using Deep Neural Network.

PubMed

Wu, Yonghui; Jiang, Min; Lei, Jianbo; Xu, Hua

2015-01-01

Rapid growth in electronic health records (EHRs) use has led to an unprecedented expansion of available clinical data in electronic formats. However, much of the important healthcare information is locked in the narrative documents. Therefore Natural Language Processing (NLP) technologies, e.g., Named Entity Recognition that identifies boundaries and types of entities, have been extensively studied to unlock important clinical information in free text. In this study, we investigated a novel deep learning method to recognize clinical entities in Chinese clinical documents using the minimal feature engineering approach. We developed a deep neural network (DNN) to generate word embeddings from a large unlabeled corpus through unsupervised learning and another DNN for the NER task. The experiment results showed that the DNN with word embeddings trained from the large unlabeled corpus outperformed the state-of-the-art CRF model in the minimal feature engineering setting, achieving the highest F1-score of 0.9280. Further analysis showed that word embeddings derived through unsupervised learning from a large unlabeled corpus remarkably improved on the DNN with randomized embeddings, demonstrating the usefulness of unsupervised feature learning.
Chemical-induced disease relation extraction with various linguistic features.

PubMed

Gu, Jinghang; Qian, Longhua; Zhou, Guodong

2016-01-01

Understanding the relations between chemicals and diseases is crucial in various biomedical tasks such as new drug discoveries and new therapy developments. While manually mining these relations from the biomedical literature is costly and time-consuming, such a procedure is often difficult to keep up-to-date. To address these issues, the BioCreative-V community proposed a challenging task of automatic extraction of chemical-induced disease (CID) relations in order to benefit biocuration. This article describes our work on the CID relation extraction task on the BioCreative-V tasks. We built a machine learning based system that utilized simple yet effective linguistic features to extract relations with maximum entropy models. In addition to leveraging various features, the hypernym relations between entity concepts derived from the Medical Subject Headings (MeSH)-controlled vocabulary were also employed during both training and testing stages to obtain more accurate classification models and better extraction performance, respectively. We demoted relation extraction between entities in documents to relation extraction between entity mentions. In our system, pairs of chemical and disease mentions at both intra- and inter-sentence levels were first constructed as relation instances for training and testing, then two classification models at both levels were trained from the training examples and applied to the testing examples. Finally, we merged the classification results from mention level to document level to acquire final relations between chemicals and diseases. Our system achieved promising F-scores of 60.4% on the development dataset and 58.3% on the test dataset using gold-standard entity annotations, respectively. Database URL: https://github.com/JHnlp/BC5CIDTask. © The Author(s) 2016. Published by Oxford University Press.
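The final merging step, from mention-level decisions to document-level relations, might look like the sketch below. The merging rule is an assumption, since the abstract does not state it: a chemical-disease pair is kept for a document if at least one of its mention pairs was classified positive. The function name and the identifiers in the example are hypothetical placeholders.

```python
from collections import defaultdict


def merge_to_document_level(mention_predictions):
    """Merge mention-pair decisions into per-document relation sets.

    mention_predictions: iterable of (doc_id, chemical_id, disease_id, is_positive).
    Assumed rule: one positive mention pair is enough to assert the relation.
    """
    relations = defaultdict(set)
    for doc_id, chemical_id, disease_id, is_positive in mention_predictions:
        if is_positive:
            relations[doc_id].add((chemical_id, disease_id))
    return dict(relations)


# Placeholder identifiers; real systems would use MeSH concept IDs.
predictions = [
    ("doc1", "CHEM_A", "DIS_X", False),  # intra-sentence pair, rejected
    ("doc1", "CHEM_A", "DIS_X", True),   # inter-sentence pair, accepted
    ("doc1", "CHEM_B", "DIS_X", False),
    ("doc2", "CHEM_A", "DIS_Y", True),
]
document_relations = merge_to_document_level(predictions)
```

Other merging rules, such as thresholding the share of positive mention pairs, would fit the same interface.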
Extracting laboratory test information from biomedical text

PubMed Central

Kang, Yanna Shen; Kayaalp, Mehmet

2013-01-01

Background: No previous study reported the efficacy of current natural language processing (NLP) methods for extracting laboratory test information from narrative documents. This study investigates the pathology informatics question of how accurately such information can be extracted from text with the current tools and techniques, especially machine learning and symbolic NLP methods. The study data came from a text corpus maintained by the U.S. Food and Drug Administration, containing a rich set of information on laboratory tests and test devices. Methods: The authors developed a symbolic information extraction (SIE) system to extract device and test specific information about four types of laboratory test entities: Specimens, analytes, units of measures and detection limits. They compared the performance of SIE and three prominent machine learning based NLP systems, LingPipe, GATE and BANNER, each implementing a distinct supervised machine learning method, hidden Markov models, support vector machines and conditional random fields, respectively. Results: Machine learning systems recognized laboratory test entities with moderately high recall, but low precision rates. Their recall rates were relatively higher when the number of distinct entity values (e.g., the spectrum of specimens) was very limited or when lexical morphology of the entity was distinctive (as in units of measures), yet SIE outperformed them with statistically significant margins on extracting specimen, analyte and detection limit information in both precision and F-measure. Its high recall performance was statistically significant on analyte information extraction. Conclusions: Despite its shortcomings against machine learning methods, a well-tailored symbolic system may better discern relevancy among a pile of information of the same type and may outperform a machine learning system by tapping into lexically non-local contextual information such as the document structure. PMID:24083058
Classifying Web Pages by Using Knowledge Bases for Entity Retrieval

NASA Astrophysics Data System (ADS)

Kiritani, Yusuke; Ma, Qiang; Yoshikawa, Masatoshi

In this paper, we propose a novel method to classify Web pages by using knowledge bases for entity search, which is a kind of typical Web search for information related to a person, location or organization. First, we map a Web page to entities according to the similarities between the page and the entities. Various methods for computing such similarity are applied. For example, we can compute the similarity between a given page and a Wikipedia article describing a certain entity. The frequency of an entity appearing in the page is another factor used in computing the similarity. Second, we construct a directed acyclic graph, named PEC graph, based on the relations among Web pages, entities, and categories, by referring to YAGO, a knowledge base built on Wikipedia and WordNet. Finally, by analyzing the PEC graph, we classify Web pages into categories. The results of some preliminary experiments validate the methods proposed in this paper.
Extracting biomedical events from pairs of text entities

PubMed Central

2015-01-01

Background: Huge amounts of electronic biomedical documents, such as molecular biology reports or genomic papers are generated daily. Nowadays, these documents are mainly available in the form of unstructured free texts, which require heavy processing for their registration into organized databases. This organization is instrumental for information retrieval, making it possible to answer the advanced queries of researchers and practitioners in biology, medicine, and related fields. Hence, the massive data flow calls for efficient automatic methods of text-mining that extract high-level information, such as biomedical events, from biomedical text. The usual computational tools of Natural Language Processing cannot be readily applied to extract these biomedical events, due to the peculiarities of the domain. Indeed, biomedical documents contain highly domain-specific jargon and syntax. These documents also describe distinctive dependencies, making text-mining in molecular biology a specific discipline. Results: We address biomedical event extraction as the classification of pairs of text entities into the classes corresponding to event types. The candidate pairs of text entities are recursively provided to a multiclass classifier relying on Support Vector Machines. This recursive process extracts events involving other events as arguments. Compared to joint models based on Markov Random Fields, our model simplifies inference and hence requires shorter training and prediction times along with lower memory capacity. Compared to usual pipeline approaches, our model passes over a complex intermediate problem, while making a more extensive usage of sophisticated joint features between text entities. Our method focuses on the core event extraction of the Genia task of BioNLP challenges yielding the best result reported so far on the 2013 edition. PMID:26201478
12 CFR 204.130 - Eligibility for NOW accounts.

Code of Federal Regulations, 2010 CFR

2010-01-01

... clarify the types of entities that may maintain NOW accounts at member banks. (b) Individuals. (1) Any individual may maintain a NOW account regardless of the purposes that the funds will serve. Thus, deposits of... under a trade name is eligible to maintain a NOW account in the individual's name or in the “DBA” name...
30 CFR 1218.540 - How does ONRR serve official correspondence?

Code of Federal Regulations, 2012 CFR

2012-07-01

... correspondence at issue. The company or reporting entity is responsible for notifying ONRR of any name or address changes on Form MMS-4444. The addressee of record in a part 1290, appeal will be the person or...-4444, we may use the individual name and address, position title, or department name and address in our...
30 CFR 1218.540 - How does ONRR serve official correspondence?

Code of Federal Regulations, 2013 CFR

2013-07-01

... correspondence at issue. The company or reporting entity is responsible for notifying ONRR of any name or address changes on Form ONRR-4444. The addressee of record in a part 1290, appeal will be the person or...-4444, we may use the individual name and address, position title, or department name and address in our...
CLAMP - a toolkit for efficiently building customized clinical natural language processing pipelines.

PubMed

Soysal, Ergin; Wang, Jingqi; Jiang, Min; Wu, Yonghui; Pakhomov, Serguei; Liu, Hongfang; Xu, Hua

2017-11-24

Existing general clinical natural language processing (NLP) systems such as MetaMap and Clinical Text Analysis and Knowledge Extraction System have been successfully applied to information extraction from clinical text. However, end users often have to customize existing systems for their individual tasks, which can require substantial NLP skills. Here we present CLAMP (Clinical Language Annotation, Modeling, and Processing), a newly developed clinical NLP toolkit that provides not only state-of-the-art NLP components, but also a user-friendly graphic user interface that can help users quickly build customized NLP pipelines for their individual applications. Our evaluation shows that the CLAMP default pipeline achieved good performance on named entity recognition and concept encoding. We also demonstrate the efficiency of the CLAMP graphic user interface in building customized, high-performance NLP pipelines with 2 use cases, extracting smoking status and lab test values. CLAMP is publicly available for research use, and we believe it is a unique asset for the clinical NLP community. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications

PubMed Central

Masanz, James J; Ogren, Philip V; Zheng, Jiaping; Sohn, Sunghwan; Kipper-Schuler, Karin C; Chute, Christopher G

2010-01-01

We aim to build and evaluate an open-source natural language processing system for information extraction from electronic medical record clinical free-text. We describe and evaluate our system, the clinical Text Analysis and Knowledge Extraction System (cTAKES), released open-source at http://www.ohnlp.org. The cTAKES builds on existing open-source technologies—the Unstructured Information Management Architecture framework and OpenNLP natural language processing toolkit. Its components, specifically trained for the clinical domain, create rich linguistic and semantic annotations. Performance of individual components: sentence boundary detector accuracy=0.949; tokenizer accuracy=0.949; part-of-speech tagger accuracy=0.936; shallow parser F-score=0.924; named entity recognizer and system-level evaluation F-score=0.715 for exact and 0.824 for overlapping spans, and accuracy for concept mapping, negation, and status attributes for exact and overlapping spans of 0.957, 0.943, 0.859, and 0.580, 0.939, and 0.839, respectively. Overall performance is discussed against five applications. The cTAKES annotations are the foundation for methods and modules for higher-level semantic processing of clinical free-text. PMID:20819853
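The difference between the exact-span and overlapping-span F-scores reported above can be made concrete with a small sketch. It is not the cTAKES evaluation code. Spans are assumed to be half-open (start, end) character offsets, entity types are ignored, and the function names and example spans are hypothetical.

```python
def exact(pred, gold):
    """Spans match only if both boundaries agree."""
    return pred == gold


def overlap(pred, gold):
    """Half-open spans match if they share at least one position."""
    return pred[0] < gold[1] and gold[0] < pred[1]


def f_score(gold_spans, pred_spans, match):
    """F1 over spans under the given matching criterion."""
    matched_pred = sum(any(match(p, g) for g in gold_spans) for p in pred_spans)
    matched_gold = sum(any(match(p, g) for p in pred_spans) for g in gold_spans)
    precision = matched_pred / len(pred_spans) if pred_spans else 0.0
    recall = matched_gold / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [(0, 5), (10, 20), (30, 35)]
pred = [(0, 5), (12, 20), (40, 45)]  # one exact hit, one partial hit, one miss

f_exact = f_score(gold, pred, exact)
f_overlap = f_score(gold, pred, overlap)
```

The partial hit counts only under the overlap criterion, which is why overlapping-span scores are at least as high as exact-span scores for the same output.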

Aggregating and Predicting Sequence Labels from Crowd Annotations

PubMed Central

Nguyen, An T.; Wallace, Byron C.; Li, Junyi Jessy; Nenkova, Ani; Lease, Matthew

2017-01-01

Despite sequences being core to NLP, scant work has considered how to handle noisy sequence labels from multiple annotators for the same text. Given such annotations, we consider two complementary tasks: (1) aggregating sequential crowd labels to infer a best single set of consensus annotations; and (2) using crowd annotations as training data for a model that can predict sequences in unannotated text. For aggregation, we propose a novel Hidden Markov Model variant. To predict sequences in unannotated text, we propose a neural approach using Long Short Term Memory. We evaluate a suite of methods across two different applications and text genres: Named-Entity Recognition in news articles and Information Extraction from biomedical abstracts. Results show improvement over strong baselines. Our source code and data are available online. PMID:29093611
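For orientation, the aggregation task (1) can be illustrated with a per-token majority vote. This is a simple baseline of the kind such work is usually compared against, not the Hidden Markov Model variant the authors propose. The function name, tag set, and example annotations are assumptions.

```python
from collections import Counter


def majority_vote(annotations):
    """Aggregate label sequences from several annotators token by token.

    annotations: list of label sequences, one per annotator, all of equal length.
    Ties are resolved in favor of the label seen first among the annotators.
    """
    if len({len(seq) for seq in annotations}) != 1:
        raise ValueError("all annotators must label the same number of tokens")
    consensus = []
    for labels_for_token in zip(*annotations):
        label, _count = Counter(labels_for_token).most_common(1)[0]
        consensus.append(label)
    return consensus


# Three hypothetical crowd workers labelling the same four tokens.
crowd = [
    ["B-PER", "I-PER", "O", "B-LOC"],
    ["B-PER", "O", "O", "B-LOC"],
    ["B-PER", "I-PER", "O", "O"],
]
consensus = majority_vote(crowd)
```

Because each token is decided independently, this baseline can emit invalid tag sequences such as an I- tag without a preceding B- tag. Modelling label transitions, as a sequence model does, is one way to address that.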
78 FR 39021 - Privacy Act of 1974; Privacy and Civil Liberties Oversight Board; System of Records Notice

Federal Register 2010, 2011, 2012, 2013, 2014

2013-06-28

... requests or appeals on behalf of other persons or entities; individuals who are the subjects of FOIA or PA... number) information, and proof of identification; names and other information about persons who are the... oversight function. E. To appropriate agencies, entities, and persons when: 1. The Board suspects or has...
21 CFR 203.30 - Sample distribution by mail or common carrier.

Code of Federal Regulations, 2011 CFR

2011-04-01

... pharmacy of a hospital or other health care entity, by mail or common carrier, provided that: (1) The... to the pharmacy of a hospital or other health care entity is required to contain, in addition to all of the information in paragraph (b)(l) of this section, the name and address of the pharmacy of the...
21 CFR 203.30 - Sample distribution by mail or common carrier.

Code of Federal Regulations, 2013 CFR

2013-04-01

... pharmacy of a hospital or other health care entity, by mail or common carrier, provided that: (1) The... to the pharmacy of a hospital or other health care entity is required to contain, in addition to all of the information in paragraph (b)(l) of this section, the name and address of the pharmacy of the...
21 CFR 203.30 - Sample distribution by mail or common carrier.

Code of Federal Regulations, 2012 CFR

2012-04-01

... pharmacy of a hospital or other health care entity, by mail or common carrier, provided that: (1) The... to the pharmacy of a hospital or other health care entity is required to contain, in addition to all of the information in paragraph (b)(l) of this section, the name and address of the pharmacy of the...
21 CFR 203.30 - Sample distribution by mail or common carrier.

Code of Federal Regulations, 2014 CFR

2014-04-01

... pharmacy of a hospital or other health care entity, by mail or common carrier, provided that: (1) The... to the pharmacy of a hospital or other health care entity is required to contain, in addition to all of the information in paragraph (b)(l) of this section, the name and address of the pharmacy of the...
78 FR 38537 - Federal Acquisition Regulation; Federal Acquisition Circular 2005-68; Small Entity Compliance Guide

Federal Register 2010, 2011, 2012, 2013, 2014

2013-06-26

... Entity Compliance Guide. SUMMARY: This document is issued under the joint authority of DOD, GSA, and NASA... whose name appears in the table below. Please cite FAC 2005-68 and the FAR case number. For information... Listed in FAC 2005-68 Subject FAR Case Analyst *Expansion of Applicability of the Senior Executive...
[Nonspecific interstitial pneumonitis: a clinicopathologic entity, histologic pattern or unclassified group of heterogeneous interstitial pneumonitis?].

PubMed

Morais, António; Moura, M Conceição Souto; Cruz, M Rosa; Gomes, Isabel

2004-01-01

Nonspecific interstitial pneumonitis (NSIP), initially described by Katzenstein and Fiorelli in 1994, seems to be a distinct clinicopathologic entity among the idiopathic interstitial pneumonitides (IIP). Besides having histologic features that differ from those of other IIP, NSIP is characterized by a better long-term outcome and better steroid responsiveness than idiopathic pulmonary fibrosis (IPF), the category in which these cases were usually included. Thus, differentiating NSIP from other IIP, namely IPF, is very significant, since it has important therapeutic and prognostic implications. NSIP encompasses different pathologies, namely those with inflammatory predominance (cellular subtype) or fibrous predominance (fibrosing subtype). The authors review and discuss NSIP after describing two clinical cases.
Disambiguation of patent inventors and assignees using high-resolution geolocation data.

PubMed

Morrison, Greg; Riccaboni, Massimo; Pammolli, Fabio

2017-05-16

Patent data represent a significant source of information on innovation, knowledge production, and the evolution of technology through networks of citations, co-invention and co-assignment. A major obstacle to extracting useful information from this data is the problem of name disambiguation: linking alternate spellings of individuals or institutions to a single identifier to uniquely determine the parties involved in knowledge production and diffusion. In this paper, we describe a new algorithm that uses high-resolution geolocation to disambiguate both inventors and assignees on about 8.5 million patents found in the European Patent Office (EPO), under the Patent Cooperation Treaty (PCT), and in the US Patent and Trademark Office (USPTO). We show this disambiguation is consistent with a number of ground-truth benchmarks of both assignees and inventors, significantly outperforming the use of undisambiguated names to identify unique entities. A significant benefit of this work is the high quality assignee disambiguation with coverage across the world coupled with an inventor disambiguation (that is competitive with other state of the art approaches) in multiple patent offices.
Disambiguation of patent inventors and assignees using high-resolution geolocation data

PubMed Central

Morrison, Greg; Riccaboni, Massimo; Pammolli, Fabio

2017-01-01

Patent data represent a significant source of information on innovation, knowledge production, and the evolution of technology through networks of citations, co-invention and co-assignment. A major obstacle to extracting useful information from this data is the problem of name disambiguation: linking alternate spellings of individuals or institutions to a single identifier to uniquely determine the parties involved in knowledge production and diffusion. In this paper, we describe a new algorithm that uses high-resolution geolocation to disambiguate both inventors and assignees on about 8.5 million patents found in the European Patent Office (EPO), under the Patent Cooperation Treaty (PCT), and in the US Patent and Trademark Office (USPTO). We show this disambiguation is consistent with a number of ground-truth benchmarks of both assignees and inventors, significantly outperforming the use of undisambiguated names to identify unique entities. A significant benefit of this work is the high quality assignee disambiguation with coverage across the world coupled with an inventor disambiguation (that is competitive with other state of the art approaches) in multiple patent offices. PMID:28509897
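The idea of combining names with geolocation can be sketched as follows. This is a toy linking rule, not the authors' algorithm: two records are treated as the same party when their normalized names are identical and their coordinates lie within an assumed distance threshold. The record layout, the `max_km` value, and the example records are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def normalize(name):
    """Lower-case the name and drop periods, commas, and extra whitespace."""
    return " ".join(name.lower().replace(".", " ").replace(",", " ").split())


def same_entity(a, b, max_km=50.0):
    """Assumed rule: identical normalized name and locations within max_km."""
    close = haversine_km(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_km
    return normalize(a["name"]) == normalize(b["name"]) and close


rec_munich_1 = {"name": "Smith, J.", "lat": 48.137, "lon": 11.575}
rec_munich_2 = {"name": "smith j", "lat": 48.150, "lon": 11.580}
rec_new_york = {"name": "Smith, J.", "lat": 40.710, "lon": -74.000}

linked_nearby = same_entity(rec_munich_1, rec_munich_2)
linked_far = same_entity(rec_munich_1, rec_new_york)
```

Exact name equality after normalization is deliberately strict here. Linking alternate spellings, which the abstract names as the core problem, would need a fuzzy name comparison in place of the equality test.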
77 FR 63217 - Use of Additional Portable Oxygen Concentrators on Board Aircraft

Federal Register 2010, 2011, 2012, 2013, 2014

2012-10-16

..., organizations, and governmental jurisdictions subject to regulation. To achieve this principle, agencies are... small entities, including small businesses, not-for-profit organizations, and small governmental... manufacturer's names. In this final rule, the FAA will add those previous manufacturer's names (International...
PubMedPortable: A Framework for Supporting the Development of Text Mining Applications.

PubMed

Döring, Kersten; Grüning, Björn A; Telukunta, Kiran K; Thomas, Philippe; Günther, Stefan

2016-01-01

Information extraction from biomedical literature is continuously growing in scope and importance. Many tools exist that perform named entity recognition, e.g. of proteins, chemical compounds, and diseases. Furthermore, several approaches deal with the extraction of relations between identified entities. The BioCreative community supports these developments with yearly open challenges, which led to a standardised XML text annotation format called BioC. PubMed provides access to the largest open biomedical literature repository, but there is no unified way of connecting its data to natural language processing tools. Therefore, an appropriate data environment is needed as a basis to combine different software solutions and to develop customised text mining applications. PubMedPortable builds a relational database and a full text index on PubMed citations. It can be applied either to the complete PubMed data set or an arbitrary subset of downloaded PubMed XML files. The software provides the infrastructure to combine stand-alone applications by exporting different data formats, e.g. BioC. The presented workflows show how to use PubMedPortable to retrieve, store, and analyse a disease-specific data set. The provided use cases are well documented in the PubMedPortable wiki. The open-source software library is small, easy to use, and scalable to the user's system requirements. It is freely available for Linux on the web at https://github.com/KerstenDoering/PubMedPortable and for other operating systems as a virtual container. The approach was tested extensively and applied successfully in several projects.
PubMedPortable: A Framework for Supporting the Development of Text Mining Applications

PubMed Central

Döring, Kersten; Grüning, Björn A.; Telukunta, Kiran K.; Thomas, Philippe; Günther, Stefan

2016-01-01

Information extraction from biomedical literature is continuously growing in scope and importance. Many tools exist that perform named entity recognition, e.g. of proteins, chemical compounds, and diseases. Furthermore, several approaches deal with the extraction of relations between identified entities. The BioCreative community supports these developments with yearly open challenges, which led to a standardised XML text annotation format called BioC. PubMed provides access to the largest open biomedical literature repository, but there is no unified way of connecting its data to natural language processing tools. Therefore, an appropriate data environment is needed as a basis to combine different software solutions and to develop customised text mining applications. PubMedPortable builds a relational database and a full text index on PubMed citations. It can be applied either to the complete PubMed data set or an arbitrary subset of downloaded PubMed XML files. The software provides the infrastructure to combine stand-alone applications by exporting different data formats, e.g. BioC. The presented workflows show how to use PubMedPortable to retrieve, store, and analyse a disease-specific data set. The provided use cases are well documented in the PubMedPortable wiki. The open-source software library is small, easy to use, and scalable to the user’s system requirements. It is freely available for Linux on the web at https://github.com/KerstenDoering/PubMedPortable and for other operating systems as a virtual container. The approach was tested extensively and applied successfully in several projects. PMID:27706202
46 CFR 520.3 - Publication responsibilities.

Code of Federal Regulations, 2013 CFR

2013-10-01

... tariff, of its organization name, organization number, home office address, name and telephone number of... tariffs, by electronically submitting Form FMC-1 via the Commission's website at www.fmc.gov. Any changes... unique organization number to new entities operating as common carriers or conferences in the U.S...
46 CFR 520.3 - Publication responsibilities.

Code of Federal Regulations, 2014 CFR

2014-10-01

... tariff, of its organization name, organization number, home office address, name and telephone number of... tariffs, by electronically submitting Form FMC-1 via the Commission's website at www.fmc.gov. Any changes... unique organization number to new entities operating as common carriers or conferences in the U.S...
46 CFR 520.3 - Publication responsibilities.

Code of Federal Regulations, 2011 CFR

2011-10-01

... tariff, of its organization name, organization number, home office address, name and telephone number of... tariffs, by electronically submitting Form FMC-1 via the Commission's website at www.fmc.gov. Any changes... unique organization number to new entities operating as common carriers or conferences in the U.S...
46 CFR 520.3 - Publication responsibilities.

Code of Federal Regulations, 2012 CFR

2012-10-01

... tariff, of its organization name, organization number, home office address, name and telephone number of... tariffs, by electronically submitting Form FMC-1 via the Commission's website at www.fmc.gov. Any changes... unique organization number to new entities operating as common carriers or conferences in the U.S...
75 FR 18887 - FBI Criminal Justice Information Services Division User Fees

Federal Register 2010, 2011, 2012, 2013, 2014

2010-04-13

.... SUMMARY: This notice establishes the user fee schedule for fingerprint- based and name-based criminal... fingerprint-based and other identification services as authorized by federal law. These fees apply to federal, state and any other authorized entities requesting fingerprint identification records and name checks...
31 CFR 535.508 - Payments to blocked accounts in domestic banks.

Code of Federal Regulations, 2010 CFR

2010-07-01

... Iran or any Iranian entity is hereby authorized: Provided, Such payment or transfer shall not be made... the interest of Iran or an Iranian entity to any other country or person. (b) This section does not authorize: (1) Any payment or transfer to any blocked account held in a name other than that of Iran or the...
31 CFR 535.508 - Payments to blocked accounts in domestic banks.

Code of Federal Regulations, 2011 CFR

2011-07-01

... Iran or any Iranian entity is hereby authorized: Provided, Such payment or transfer shall not be made... the interest of Iran or an Iranian entity to any other country or person. (b) This section does not authorize: (1) Any payment or transfer to any blocked account held in a name other than that of Iran or the...

78 FR 80381 - Federal Acquisition Regulation; Federal Acquisition Circular 2005-72; Small Entity Compliance Guide

Federal Register 2010, 2011, 2012, 2013, 2014

2013-12-31

...: Small Entity Compliance Guide. SUMMARY: This document is issued under the joint authority of DOD, GSA..., contact the analyst whose name appears in the table below. Please cite FAC 2005-72 and the FAR case number... 202- 501-4755. Rules Listed in FAC 2005-72 Item Subject FAR Case Analyst *I Service 2010-010 Loeb...
Assessment of disease named entity recognition on a corpus of annotated sentences.

PubMed

Jimeno, Antonio; Jimenez-Ruiz, Ernesto; Lee, Vivian; Gaudan, Sylvain; Berlanga, Rafael; Rebholz-Schuhmann, Dietrich

2008-04-11

In recent years, the recognition of semantic types from the biomedical scientific literature has been focused on named entities like protein and gene names (PGNs) and gene ontology terms (GO terms). Other semantic types like diseases have not received the same level of attention. Different solutions have been proposed to identify disease named entities in the scientific literature. While matching the terminology with language patterns suffers from low recall (e.g., Whatizit) other solutions make use of morpho-syntactic features to better cover the full scope of terminological variability (e.g., MetaMap). Currently, MetaMap that is provided from the National Library of Medicine (NLM) is the state of the art solution for the annotation of concepts from UMLS (Unified Medical Language System) in the literature. Nonetheless, its performance has not yet been assessed on an annotated corpus. In addition, little effort has been invested so far to generate an annotated dataset that links disease entities in text to disease entries in a database, thesaurus or ontology and that could serve as a gold standard to benchmark text mining solutions. As part of our research work, we have taken a corpus that has been delivered in the past for the identification of associations of genes to diseases based on the UMLS Metathesaurus and we have reprocessed and re-annotated the corpus. We have gathered annotations for disease entities from two curators, analyzed their disagreement (0.51 in the kappa-statistic) and composed a single annotated corpus for public use. Thereafter, three solutions for disease named entity recognition including MetaMap have been applied to the corpus to automatically annotate it with UMLS Metathesaurus concepts. The resulting annotations have been benchmarked to compare their performance. The annotated corpus is publicly available at ftp://ftp.ebi.ac.uk/pub/software/textmining/corpora/diseases and can serve as a benchmark to other systems. 
In addition, we found that dictionary look-up already provides competitive results, indicating that the use of disease terminology is highly standardized throughout the terminologies and the literature. MetaMap generates precise results at the expense of insufficient recall, while our statistical method obtains better recall at a lower precision rate. Even better results in terms of precision are achieved by combining at least two of the three methods, but this approach again lowers recall. Altogether, our analysis gives a better understanding of the complexity of disease annotations in the literature. MetaMap and the dictionary based approach are available through the Whatizit web service infrastructure (Rebholz-Schuhmann D, Arregui M, Gaudan S, Kirsch H, Jimeno A: Text processing through Web services: Calling Whatizit. Bioinformatics 2008, 24:296-298).
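The precision/recall trade-off described in this abstract, where combining at least two of three methods raises precision but lowers recall, can be illustrated with a small voting sketch. This is not the authors' code: the annotation tuples, the concept identifiers, and the function names `combine_by_vote` and `precision_recall` are hypothetical and exist only for illustration.

```python
from collections import Counter

def combine_by_vote(annotation_sets, min_votes=2):
    """Keep annotations proposed by at least `min_votes` of the systems."""
    votes = Counter()
    for annotations in annotation_sets:
        votes.update(set(annotations))  # each system votes once per annotation
    return {a for a, n in votes.items() if n >= min_votes}

def precision_recall(predicted, gold):
    """Exact-match precision and recall of a predicted annotation set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Toy annotations as (start, end, concept_id); all values are made up.
gold = {(0, 8, "C1"), (20, 30, "C2"), (40, 52, "C3")}
system_a = {(0, 8, "C1"), (60, 70, "C9")}
system_b = {(0, 8, "C1"), (20, 30, "C2"), (75, 80, "C8")}
system_c = {(20, 30, "C2"), (40, 52, "C3"), (60, 65, "C7")}

voted = combine_by_vote([system_a, system_b, system_c], min_votes=2)
union = system_a | system_b | system_c

print("voted:", precision_recall(voted, gold))
print("union:", precision_recall(union, gold))
```

On this toy data the voted set is more precise than the union of all systems but misses the one gold annotation that only a single system found, which mirrors the behaviour the abstract reports.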
Algorithms and semantic infrastructure for mutation impact extraction and grounding.

PubMed

Laurila, Jonas B; Naderi, Nona; Witte, René; Riazanov, Alexandre; Kouznetsov, Alexandre; Baker, Christopher J O

2010-12-02

Mutation impact extraction is a hitherto unaccomplished task in state of the art mutation extraction systems. Protein mutations and their impacts on protein properties are hidden in scientific literature, making them poorly accessible for protein engineers and inaccessible for phenotype-prediction systems that currently depend on manually curated genomic variation databases. We present the first rule-based approach for the extraction of mutation impacts on protein properties, categorizing their directionality as positive, negative or neutral. Furthermore protein and mutation mentions are grounded to their respective UniProtKB IDs and selected protein properties, namely protein functions to concepts found in the Gene Ontology. The extracted entities are populated to an OWL-DL Mutation Impact ontology facilitating complex querying for mutation impacts using SPARQL. We illustrate retrieval of proteins and mutant sequences for a given direction of impact on specific protein properties. Moreover we provide programmatic access to the data through semantic web services using the SADI (Semantic Automated Discovery and Integration) framework. We address the problem of access to legacy mutation data in unstructured form through the creation of novel mutation impact extraction methods which are evaluated on a corpus of full-text articles on haloalkane dehalogenases, tagged by domain experts. Our approaches show state of the art levels of precision and recall for Mutation Grounding and respectable level of precision but lower recall for the task of Mutant-Impact relation extraction. The system is deployed using text mining and semantic web technologies with the goal of publishing to a broad spectrum of consumers.
DISEASES: text mining and data integration of disease-gene associations.

PubMed

Pletscher-Frankild, Sune; Pallejà, Albert; Tsafou, Kalliopi; Binder, Janos X; Jensen, Lars Juhl

2015-03-01

Text mining is a flexible technology that can be applied to numerous different tasks in biology and medicine. We present a system for extracting disease-gene associations from biomedical abstracts. The system consists of a highly efficient dictionary-based tagger for named entity recognition of human genes and diseases, which we combine with a scoring scheme that takes into account co-occurrences both within and between sentences. We show that this approach is able to extract half of all manually curated associations with a false positive rate of only 0.16%. Nonetheless, text mining should not stand alone, but be combined with other types of evidence. For this reason, we have developed the DISEASES resource, which integrates the results from text mining with manually curated disease-gene associations, cancer mutation data, and genome-wide association studies from existing databases. The DISEASES resource is accessible through a web interface at http://diseases.jensenlab.org/, where the text-mining software and all associations are also freely available for download. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
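A rough sketch of the idea in this abstract (dictionary-based tagging followed by a score that counts co-occurrences both within and between sentences) is given below. It is not the DISEASES implementation: the tiny dictionaries, the weights `w_sentence` and `w_abstract`, and the unnormalized additive score are all assumptions chosen for illustration.

```python
import re
from collections import defaultdict
from itertools import product

# Hypothetical mini-dictionaries; a real tagger would use large curated lexicons.
GENES = {"brca1", "tp53"}
DISEASES = {"breast cancer", "li-fraumeni syndrome"}

def tag(text, dictionary):
    """Return the dictionary terms that occur in `text` (case-insensitive)."""
    lowered = text.lower()
    return {t for t in dictionary
            if re.search(r"\b" + re.escape(t) + r"\b", lowered)}

def score_pairs(abstracts, w_sentence=1.0, w_abstract=0.25):
    """Sum weighted co-occurrence counts for every gene-disease pair.

    A pair earns `w_abstract` for appearing in the same abstract and an
    additional `w_sentence` for each sentence that mentions both terms.
    """
    scores = defaultdict(float)
    for abstract in abstracts:
        for pair in product(tag(abstract, GENES), tag(abstract, DISEASES)):
            scores[pair] += w_abstract
        for sentence in re.split(r"(?<=[.!?])\s+", abstract):
            for pair in product(tag(sentence, GENES), tag(sentence, DISEASES)):
                scores[pair] += w_sentence
    return dict(scores)

abstracts = [
    "BRCA1 mutations are frequent in breast cancer. TP53 was also sequenced.",
    "TP53 is mutated in Li-Fraumeni syndrome.",
]
scores = score_pairs(abstracts)
for pair, value in sorted(scores.items()):
    print(pair, value)
```

Pairs that share a sentence end up with a higher score than pairs that only share an abstract, which is the intuition behind weighting the two kinds of co-occurrence differently.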
Data Processing and Text Mining Technologies on Electronic Medical Records: A Review

PubMed Central

Sun, Wencheng; Li, Yangyang; Liu, Fang; Fang, Shengqun; Wang, Guoyan

2018-01-01

Currently, medical institutes generally use EMR to record patients' conditions, including diagnostic information, procedures performed, and treatment results. EMR has been recognized as a valuable resource for large-scale analysis. However, EMR has the characteristics of diversity, incompleteness, redundancy, and privacy, which make it difficult to carry out data mining and analysis directly. Therefore, it is necessary to preprocess the source data in order to improve data quality and improve the data mining results. Different types of data require different processing technologies. Most structured data commonly needs classic preprocessing technologies, including data cleansing, data integration, data transformation, and data reduction. Semistructured or unstructured data, such as medical text, contains more health information and requires more complex and challenging processing methods. The task of information extraction for medical texts mainly includes NER (named-entity recognition) and RE (relation extraction). This paper focuses on the process of EMR processing and emphatically analyzes the key techniques. In addition, we make an in-depth study on the applications developed based on text mining together with the open challenges and research issues for future work. PMID:29849998
Locating and parsing bibliographic references in HTML medical articles

PubMed Central

Zou, Jie; Le, Daniel; Thoma, George R.

2010-01-01

The set of references that typically appear toward the end of journal articles is sometimes, though not always, a field in bibliographic (citation) databases. But even if references do not constitute such a field, they can be useful as a preprocessing step in the automated extraction of other bibliographic data from articles, as well as in computer-assisted indexing of articles. Automation in data extraction and indexing to minimize human labor is key to the affordable creation and maintenance of large bibliographic databases. Extracting the components of references, such as author names, article title, journal name, publication date and other entities, is therefore a valuable and sometimes necessary task. This paper describes a two-step process using statistical machine learning algorithms, to first locate the references in HTML medical articles and then to parse them. Reference locating identifies the reference section in an article and then decomposes it into individual references. We formulate this step as a two-class classification problem based on text and geometric features. An evaluation conducted on 500 articles drawn from 100 medical journals achieves near-perfect precision and recall rates for locating references. Reference parsing identifies the components of each reference. For this second step, we implement and compare two algorithms. One relies on sequence statistics and trains a Conditional Random Field. The other focuses on local feature statistics and trains a Support Vector Machine to classify each individual word, followed by a search algorithm that systematically corrects low confidence labels if the label sequence violates a set of predefined rules. The overall performance of these two reference-parsing algorithms is about the same: above 99% accuracy at the word level, and over 97% accuracy at the chunk level. PMID:20640222
Locating and parsing bibliographic references in HTML medical articles.

PubMed

Zou, Jie; Le, Daniel; Thoma, George R

2010-06-01

The set of references that typically appear toward the end of journal articles is sometimes, though not always, a field in bibliographic (citation) databases. But even if references do not constitute such a field, they can be useful as a preprocessing step in the automated extraction of other bibliographic data from articles, as well as in computer-assisted indexing of articles. Automation in data extraction and indexing to minimize human labor is key to the affordable creation and maintenance of large bibliographic databases. Extracting the components of references, such as author names, article title, journal name, publication date and other entities, is therefore a valuable and sometimes necessary task. This paper describes a two-step process using statistical machine learning algorithms, to first locate the references in HTML medical articles and then to parse them. Reference locating identifies the reference section in an article and then decomposes it into individual references. We formulate this step as a two-class classification problem based on text and geometric features. An evaluation conducted on 500 articles drawn from 100 medical journals achieves near-perfect precision and recall rates for locating references. Reference parsing identifies the components of each reference. For this second step, we implement and compare two algorithms. One relies on sequence statistics and trains a Conditional Random Field. The other focuses on local feature statistics and trains a Support Vector Machine to classify each individual word, followed by a search algorithm that systematically corrects low confidence labels if the label sequence violates a set of predefined rules. The overall performance of these two reference-parsing algorithms is about the same: above 99% accuracy at the word level, and over 97% accuracy at the chunk level.
neXtA5: accelerating annotation of articles via automated approaches in neXtProt.

PubMed

Mottin, Luc; Gobeill, Julien; Pasche, Emilie; Michel, Pierre-André; Cusin, Isabelle; Gaudet, Pascale; Ruch, Patrick

2016-01-01

The rapid increase in the number of published articles poses a challenge for curated databases to remain up-to-date. To help the scientific community and database curators deal with this issue, we have developed an application, neXtA5, which prioritizes the literature for specific curation requirements. Our system, neXtA5, is a curation service composed of three main elements. The first component is a named-entity recognition module, which annotates MEDLINE over some predefined axes. This report focuses on three axes: Diseases, the Molecular Function and Biological Process sub-ontologies of the Gene Ontology (GO). The automatic annotations are then stored in a local database, BioMed, for each annotation axis. Additional entities such as species and chemical compounds are also identified. The second component is an existing search engine, which retrieves the most relevant MEDLINE records for any given query. The third component uses the content of BioMed to generate an axis-specific ranking, which takes into account the density of named-entities as stored in the Biomed database. The two ranked lists are ultimately merged using a linear combination, which has been specifically tuned to support the annotation of each axis. The fine-tuning of the coefficients is formally reported for each axis-driven search. Compared with PubMed, which is the system used by most curators, the improvement is the following: +231% for Diseases, +236% for Molecular Functions and +3153% for Biological Process when measuring the precision of the top-returned PMID (P0 or mean reciprocal rank). The current search methods significantly improve the search effectiveness of curators for three important curation axes. Further experiments are being performed to extend the curation types, in particular protein-protein interactions, which require specific relationship extraction capabilities. 
In parallel, user-friendly interfaces powered with a set of JSON web services are currently being implemented into the neXtProt annotation pipeline. Available on: http://babar.unige.ch:8082/neXtA5 Database URL: http://babar.unige.ch:8082/neXtA5/fetcher.jsp. © The Author(s) 2016. Published by Oxford University Press.
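The merging step mentioned in this abstract, where two ranked lists are combined linearly, can be sketched as follows. This is a minimal illustration rather than the neXtA5 code: the score dictionaries, the identifiers, the coefficient `alpha`, and the function name `merge_rankings` are assumptions, and the abstract does not give the tuned coefficients.

```python
def merge_rankings(search_scores, axis_scores, alpha=0.5):
    """Rank documents by alpha * search score + (1 - alpha) * axis score.

    Documents missing from one list contribute 0.0 for that list.
    Ties are broken by identifier so the output is deterministic.
    """
    ids = set(search_scores) | set(axis_scores)
    combined = {
        doc: alpha * search_scores.get(doc, 0.0)
             + (1.0 - alpha) * axis_scores.get(doc, 0.0)
        for doc in ids
    }
    return sorted(combined, key=lambda doc: (-combined[doc], doc))

# Hypothetical normalized scores for four records.
search_scores = {"pmid_a": 0.9, "pmid_b": 0.6, "pmid_c": 0.2}
axis_scores = {"pmid_b": 0.9, "pmid_c": 0.8, "pmid_d": 0.5}

balanced = merge_rankings(search_scores, axis_scores, alpha=0.5)
search_only = merge_rankings(search_scores, axis_scores, alpha=1.0)
print(balanced)
print(search_only)
```

Setting `alpha` to 1.0 reproduces the search-engine order, so the coefficient controls how strongly the entity-density ranking is allowed to reorder the results.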
neXtA5: accelerating annotation of articles via automated approaches in neXtProt

PubMed Central

Mottin, Luc; Gobeill, Julien; Pasche, Emilie; Michel, Pierre-André; Cusin, Isabelle; Gaudet, Pascale; Ruch, Patrick

2016-01-01

The rapid increase in the number of published articles poses a challenge for curated databases to remain up-to-date. To help the scientific community and database curators deal with this issue, we have developed an application, neXtA5, which prioritizes the literature for specific curation requirements. Our system, neXtA5, is a curation service composed of three main elements. The first component is a named-entity recognition module, which annotates MEDLINE over some predefined axes. This report focuses on three axes: Diseases, the Molecular Function and Biological Process sub-ontologies of the Gene Ontology (GO). The automatic annotations are then stored in a local database, BioMed, for each annotation axis. Additional entities such as species and chemical compounds are also identified. The second component is an existing search engine, which retrieves the most relevant MEDLINE records for any given query. The third component uses the content of BioMed to generate an axis-specific ranking, which takes into account the density of named-entities as stored in the Biomed database. The two ranked lists are ultimately merged using a linear combination, which has been specifically tuned to support the annotation of each axis. The fine-tuning of the coefficients is formally reported for each axis-driven search. Compared with PubMed, which is the system used by most curators, the improvement is the following: +231% for Diseases, +236% for Molecular Functions and +3153% for Biological Process when measuring the precision of the top-returned PMID (P0 or mean reciprocal rank). The current search methods significantly improve the search effectiveness of curators for three important curation axes. Further experiments are being performed to extend the curation types, in particular protein–protein interactions, which require specific relationship extraction capabilities. 
In parallel, user-friendly interfaces powered with a set of JSON web services are currently being implemented into the neXtProt annotation pipeline. Available on: http://babar.unige.ch:8082/neXtA5 Database URL: http://babar.unige.ch:8082/neXtA5/fetcher.jsp PMID:27374119
Automatic Extraction of Destinations, Origins and Route Parts from Human Generated Route Directions

NASA Astrophysics Data System (ADS)

Zhang, Xiao; Mitra, Prasenjit; Klippel, Alexander; Maceachren, Alan

Researchers from the cognitive and spatial sciences are studying text descriptions of movement patterns in order to examine how humans communicate and understand spatial information. In particular, route directions offer a rich source of information on how cognitive systems conceptualize movement patterns by segmenting them into meaningful parts. Route directions are composed using a plethora of cognitive spatial organization principles: changing levels of granularity, hierarchical organization, incorporation of cognitively and perceptually salient elements, and so forth. Identifying such information in text documents automatically is crucial for enabling machine-understanding of human spatial language. The benefits are: a) creating opportunities for large-scale studies of human linguistic behavior; b) extracting and georeferencing salient entities (landmarks) that are used by human route direction providers; c) developing methods to translate route directions to sketches and maps; and d) enabling queries on large corpora of crawled/analyzed movement data. In this paper, we introduce our approach and implementations that bring us closer to the goal of automatically processing linguistic route directions. We report on research directed at one part of the larger problem, that is, extracting the three most critical parts of route directions and movement patterns in general: origin, destination, and route parts. We use machine-learning based algorithms to extract these parts of routes, including, for example, destination names and types. We prove the effectiveness of our approach in several experiments using hand-tagged corpora.
Combination of Evidence for Effective Web Search

DTIC Science & Technology

2010-11-01

Approved for public release; distribution unlimited. Presented at the Nineteenth Text REtrieval Conference (TREC... use that page to expand. This happens often with named entity queries (such as ‘the secret garden’ or ‘starbucks’). However, when the query is
Semantic characteristics of NLP-extracted concepts in clinical notes vs. biomedical literature.

PubMed

Wu, Stephen; Liu, Hongfang

2011-01-01

Natural language processing (NLP) has become crucial in unlocking information stored in free text, from both clinical notes and biomedical literature. Clinical notes convey clinical information related to individual patient health care, while biomedical literature communicates scientific findings. This work focuses on semantic characterization of texts at an enterprise scale, comparing and contrasting the two domains and their NLP approaches. We analyzed the empirical distributional characteristics of NLP-discovered named entities in Mayo Clinic clinical notes from 2001-2010, and in the 2011 MetaMapped Medline Baseline. We give qualitative and quantitative measures of domain similarity and point to the feasibility of transferring resources and techniques. An important by-product for this study is the development of a weighted ontology for each domain, which gives distributional semantic information that may be used to improve NLP applications.
Framework for automatic information extraction from research papers on nanocrystal devices

PubMed Central

Yoshioka, Masaharu; Hara, Shinjiro; Newton, Marcus C

2015-01-01

To support nanocrystal device development, we have been working on a computational framework to utilize information in research papers on nanocrystal devices. We developed an annotated corpus called “NaDev” (Nanocrystal Device Development) for this purpose. We also proposed an automatic information extraction system called “NaDevEx” (Nanocrystal Device Automatic Information Extraction Framework). NaDevEx aims at extracting information from research papers on nanocrystal devices using the NaDev corpus and machine-learning techniques. However, the characteristics of NaDevEx were not examined in detail. In this paper, we conduct system evaluation experiments for NaDevEx using the NaDev corpus. We discuss three main issues: system performance, compared with human annotators; the effect of paper type (synthesis or characterization) on system performance; and the effects of domain knowledge features (e.g., a chemical named entity recognition system and list of names of physical quantities) on system performance. We found that overall system performance was 89% in precision and 69% in recall. If we consider identification of terms that intersect with correct terms for the same information category as the correct identification, i.e., loose agreement (in many cases, we can find that appropriate head nouns such as temperature or pressure loosely match between two terms), the overall performance is 95% in precision and 74% in recall. The system performance is almost comparable with results of human annotators for information categories with rich domain knowledge information (source material). However, for other information categories, given the relatively large number of terms that exist only in one paper, recall of individual information categories is not high (39–73%); however, precision is better (75–97%). 
The average performance for synthesis papers is better than that for characterization papers because of the lack of training examples for characterization papers. Based on these results, we discuss future research plans for improving the performance of the system. PMID:26665057
Framework for automatic information extraction from research papers on nanocrystal devices.

PubMed

Dieb, Thaer M; Yoshioka, Masaharu; Hara, Shinjiro; Newton, Marcus C

2015-01-01

To support nanocrystal device development, we have been working on a computational framework to utilize information in research papers on nanocrystal devices. We developed an annotated corpus called "NaDev" (Nanocrystal Device Development) for this purpose. We also proposed an automatic information extraction system called "NaDevEx" (Nanocrystal Device Automatic Information Extraction Framework). NaDevEx aims at extracting information from research papers on nanocrystal devices using the NaDev corpus and machine-learning techniques. However, the characteristics of NaDevEx were not examined in detail. In this paper, we conduct system evaluation experiments for NaDevEx using the NaDev corpus. We discuss three main issues: system performance, compared with human annotators; the effect of paper type (synthesis or characterization) on system performance; and the effects of domain knowledge features (e.g., a chemical named entity recognition system and list of names of physical quantities) on system performance. We found that overall system performance was 89% in precision and 69% in recall. If we consider identification of terms that intersect with correct terms for the same information category as the correct identification, i.e., loose agreement (in many cases, we can find that appropriate head nouns such as temperature or pressure loosely match between two terms), the overall performance is 95% in precision and 74% in recall. The system performance is almost comparable with results of human annotators for information categories with rich domain knowledge information (source material). However, for other information categories, given the relatively large number of terms that exist only in one paper, recall of individual information categories is not high (39-73%); however, precision is better (75-97%). The average performance for synthesis papers is better than that for characterization papers because of the lack of training examples for characterization papers. 
Based on these results, we discuss future research plans for improving the performance of the system.
Chemical-induced disease relation extraction via convolutional neural network.

PubMed

Gu, Jinghang; Sun, Fuqing; Qian, Longhua; Zhou, Guodong

2017-01-01

This article describes our work on the BioCreative-V chemical-disease relation (CDR) extraction task, which employed a maximum entropy (ME) model and a convolutional neural network model for relation extraction at inter- and intra-sentence level, respectively. In our work, relation extraction between entity concepts in documents was simplified to relation extraction between entity mentions. We first constructed pairs of chemical and disease mentions as relation instances for training and testing stages, then we trained and applied the ME model and the convolutional neural network model for inter- and intra-sentence level, respectively. Finally, we merged the classification results from mention level to document level to acquire the final relations between chemical and disease concepts. The evaluation on the BioCreative-V CDR corpus shows the effectiveness of our proposed approach. http://www.biocreative.org/resources/corpora/biocreative-v-cdr-corpus/. © The Author(s) 2017. Published by Oxford University Press.
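The final step in this abstract, merging mention-level classifications into document-level relations between concepts, can be sketched as below. The abstract does not state the merging rule, so the rule used here (a concept pair is kept if any of its mention pairs is classified positive) is an assumption, as are the tuple layout, the identifiers, and the function name.

```python
def merge_to_document_level(mention_predictions):
    """Collapse mention-pair predictions into document-level concept relations.

    `mention_predictions` is an iterable of
    (doc_id, chemical_concept_id, disease_concept_id, is_positive) tuples,
    one per classified pair of mentions. Assumed rule: a concept pair is a
    relation in a document if at least one of its mention pairs is positive.
    """
    relations = set()
    for doc_id, chemical_id, disease_id, is_positive in mention_predictions:
        if is_positive:
            relations.add((doc_id, chemical_id, disease_id))
    return relations

# Hypothetical classifier output: several mention pairs per concept pair.
predictions = [
    ("doc1", "CHEM_1", "DIS_1", False),  # intra-sentence pair, rejected
    ("doc1", "CHEM_1", "DIS_1", True),   # another mention pair, accepted
    ("doc1", "CHEM_1", "DIS_2", False),
    ("doc2", "CHEM_2", "DIS_1", True),
]
document_relations = merge_to_document_level(predictions)
print(sorted(document_relations))
```

Other merging rules, such as majority voting over mention pairs or thresholding an averaged classifier score, fit the same interface.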
78 FR 79007 - Certain Multiple Mode Outdoor Grills and Parts Thereof; Commission Determination Not To Review an...

Federal Register 2010, 2011, 2012, 2013, 2014

2013-12-27

... Antonio, Texas; Kmart Corporation of Hoffman Estates, Illinois; Sears Brands Management Corporation, Sears... Specialty Brands, LLC, (2) change the name of Respondent Rankam Group to Rankam Metal Products Manufactory... Kamado Joe Company is a trade name for the legal entity Premier Specialty Brands, LLC; Rankam Metal...
49 CFR Appendix E to Part 512 - Consumer Assistance to Recycle and Save (CARS) Class Determinations

Code of Federal Regulations, 2013 CFR

2013-10-01

... of the new vehicle owner's name, home address, telephone number, state identification number and last... harm to the competitive position of the entity submitting the information: (1) Vehicle Manufacturer Issued Dealer Identification Code; (2) Dealer Bank Name, ABA Routing Number and Bank Account Number; and...
77 FR 31356 - Pesticide Products; Receipt of Applications To Register New Uses

Federal Register 2010, 2011, 2012, 2013, 2014

2012-05-25

... Number: EPA-HQ-OPP-2012- 0241. Company name and address: Bayer CropScience LP, 2 T. W. Alexander Drive.... Registration Number: 264-825. Docket Number: EPA-HQ-OPP-2012- 0325. Company name and address: Bayer CropScience... pesticide manufacturer. Potentially affected entities may include, but are not limited to: Crop production...
Finding Cervical Cancer Symptoms in Swedish Clinical Text using a Machine Learning Approach and NegEx

PubMed Central

Weegar, Rebecka; Kvist, Maria; Sundström, Karin; Brunak, Søren; Dalianis, Hercules

2015-01-01

Detection of early symptoms in cervical cancer is crucial for early treatment and survival. To find symptoms of cervical cancer in clinical text, Named Entity Recognition is needed. In this paper the Clinical Entity Finder, a machine-learning tool trained on annotated clinical text from a Swedish internal medicine emergency unit, is evaluated on cervical cancer records. The Clinical Entity Finder identifies entities of the types body part, finding and disorder and is extended with negation detection using the rule-based tool NegEx, to distinguish between negated and non-negated entities. To measure the performance of the tools on this new domain, two physicians annotated a set of clinical notes from the health records of cervical cancer patients. The inter-annotator agreement for finding, disorder and body part obtained an average F-score of 0.677 and the Clinical Entity Finder extended with NegEx had an average F-score of 0.667. PMID:26958270
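The negation step described in this abstract can be illustrated with a heavily simplified, NegEx-style rule: an entity counts as negated if a negation trigger precedes it within a short window and no termination term intervenes. This is not the NegEx tool or its Swedish adaptation; the English trigger lists, the window size, and the function name `is_negated` are assumptions for illustration.

```python
# Hypothetical English trigger lists; NegEx itself ships much larger lists.
NEGATION_TRIGGERS = {"no", "not", "denies", "without"}
TERMINATION_TERMS = {"but", "however"}

def is_negated(tokens, entity_index, window=5):
    """Return True if the token at `entity_index` falls in a negation scope.

    Scans backwards over at most `window` tokens. A termination term closes
    any earlier negation scope, so the scan stops there.
    """
    start = max(0, entity_index - window)
    for i in range(entity_index - 1, start - 1, -1):
        token = tokens[i].lower()
        if token in TERMINATION_TERMS:
            return False
        if token in NEGATION_TRIGGERS:
            return True
    return False

tokens = "patient denies bleeding but reports pain".split()
bleeding_negated = is_negated(tokens, tokens.index("bleeding"))
pain_negated = is_negated(tokens, tokens.index("pain"))
print(bleeding_negated, pain_negated)
```

In a pipeline like the one described, such a check would run on each entity returned by the entity recognizer so that negated and non-negated findings can be told apart.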
Finding Cervical Cancer Symptoms in Swedish Clinical Text using a Machine Learning Approach and NegEx.

PubMed

Weegar, Rebecka; Kvist, Maria; Sundström, Karin; Brunak, Søren; Dalianis, Hercules

2015-01-01

Detection of early symptoms in cervical cancer is crucial for early treatment and survival. To find symptoms of cervical cancer in clinical text, Named Entity Recognition is needed. In this paper the Clinical Entity Finder, a machine-learning tool trained on annotated clinical text from a Swedish internal medicine emergency unit, is evaluated on cervical cancer records. The Clinical Entity Finder identifies entities of the types body part, finding and disorder and is extended with negation detection using the rule-based tool NegEx, to distinguish between negated and non-negated entities. To measure the performance of the tools on this new domain, two physicians annotated a set of clinical notes from the health records of cervical cancer patients. The inter-annotator agreement for finding, disorder and body part obtained an average F-score of 0.677 and the Clinical Entity Finder extended with NegEx had an average F-score of 0.667.

Network analysis of named entity co-occurrences in written texts

NASA Astrophysics Data System (ADS)

Amancio, Diego Raphael

2016-06-01

The use of methods borrowed from statistics and physics to analyze written texts has allowed the discovery of unprecedented patterns of human behavior and cognition by establishing links between model features and language structure. While current models have been useful to unveil patterns via analysis of syntactical and semantical networks, only a few works have probed the relevance of investigating the structure arising from the relationship between relevant entities such as characters, locations and organizations. In this study, we represent entities appearing in the same context as a co-occurrence network, where links are established according to a null model based on random, shuffled texts. Computational simulations performed in novels revealed that the proposed model displays interesting topological features, such as the small world feature, characterized by high values of clustering coefficient. The effectiveness of our model was verified in a practical pattern recognition task in real networks. When compared with traditional word adjacency networks, our model displayed optimized results in identifying unknown references in texts. Because the proposed representation plays a complementary role in characterizing unstructured documents via topological analysis of named entities, we believe that it could be useful to improve the characterization of written texts (and related systems), especially if combined with traditional approaches based on statistical and deeper paradigms.
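The network construction and the clustering coefficient mentioned in this abstract can be sketched as follows. The sketch links every pair of entities that share a context and deliberately omits the null model based on shuffled texts that the abstract uses to decide which links to keep; the toy contexts and the function names are assumptions.

```python
from collections import defaultdict
from itertools import combinations

def build_network(contexts):
    """Link every pair of distinct entities that share a context."""
    adjacency = defaultdict(set)
    for entities in contexts:
        for a, b in combinations(sorted(set(entities)), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency

def clustering_coefficient(adjacency, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    neighbours = adjacency[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    linked = sum(1 for a, b in combinations(sorted(neighbours), 2)
                 if b in adjacency[a])
    return 2.0 * linked / (k * (k - 1))

# Each inner list holds the named entities found in one context,
# for example a paragraph; the names are invented.
contexts = [
    ["Alice", "Bob", "London"],
    ["Alice", "Carol"],
    ["Bob", "London"],
]
network = build_network(contexts)
for name in ["Alice", "Bob", "Carol"]:
    print(name, clustering_coefficient(network, name))
```

A high average of this coefficient across nodes is the property the abstract associates with the small world feature.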
NeuroNames: an ontology for the BrainInfo portal to neuroscience on the web.

PubMed

Bowden, Douglas M; Song, Evan; Kosheleva, Julia; Dubach, Mark F

2012-01-01

BrainInfo ( http://braininfo.org ) is a growing portal to neuroscientific information on the Web. It is indexed by NeuroNames, an ontology designed to compensate for ambiguities in neuroanatomical nomenclature. The 20-year old ontology continues to evolve toward the ideal of recognizing all names of neuroanatomical entities and accommodating all structural concepts about which neuroscientists communicate, including multiple concepts of entities for which neuroanatomists have yet to determine the best or 'true' conceptualization. To make the definitions of structural concepts unambiguous and terminologically consistent we created a 'default vocabulary' of unique structure names selected from existing terminology. We selected standard names by criteria designed to maximize practicality for use in verbal communication as well as computerized knowledge management. The ontology of NeuroNames accommodates synonyms and homonyms of the standard terms in many languages. It defines complex structures as models composed of primary structures, which are defined in unambiguous operational terms. NeuroNames currently relates more than 16,000 names in eight languages to some 2,500 neuroanatomical concepts. The ontology is maintained in a relational database with three core tables: Names, Concepts and Models. BrainInfo uses NeuroNames to index information by structure, to interpret users' queries and to clarify terminology on remote web pages. NeuroNames is a resource vocabulary of the NLM's Unified Medical Language System (UMLS, 2011) and the basis for the brain regions component of NIFSTD (NeuroLex, 2011). The current version has been downloaded to hundreds of laboratories for indexing data and linking to BrainInfo, which attracts some 400 visitors/day, downloading 2,000 pages/day.
Cueing Animations: Dynamic Signaling Aids Information Extraction and Comprehension

ERIC Educational Resources Information Center

Boucheix, Jean-Michel; Lowe, Richard K.; Putri, Dian K.; Groff, Jonathan

2013-01-01

The effectiveness of animations containing two novel forms of animation cueing that target relations between event units rather than individual entities was compared with that of animations containing conventional entity-based cueing or no cues. These relational event unit cues ("progressive path" and "local coordinated" cues) were specifically…
Geoparsing text for characterizing urban operational environments through machine learning techniques

NASA Astrophysics Data System (ADS)

Garfinkle, Noah W.; Selig, Lucas; Perkins, Timothy K.; Calfas, George W.

2017-05-01

Increasing worldwide internet connectivity and access to sources of print and open social media has increased near realtime availability of textual information. Capabilities to structure and integrate textual data streams can contribute to more meaningful representations of operational environment factors (i.e., Political, Military, Economic, Social, Infrastructure, Information, Physical Environment, and Time [PMESII-PT]) and tactical civil considerations (i.e., Areas, Structures, Capabilities, Organizations, People and Events [ASCOPE]). However, relying upon human analysts to encode this information as it arrives quickly proves intractable. While human analysts possess an ability to comprehend context in unstructured text far beyond that of computers, automated geoparsing (the extraction of locations from unstructured text) can empower analysts to automate sifting through datasets for areas of interest. This research evaluates existing approaches to geoprocessing as well as initiating the research and development of locally-improved methods of tagging parts of text as possible locations, resolving possible locations into coordinates, and interfacing such results with human analysts. The objective of this ongoing research is to develop a more contextually-complete picture of an area of interest (AOI) including human-geographic context for events. In particular, our research is working to make improvements to geoparsing (i.e., the extraction of spatial context from documents), which requires development, integration, and validation of named-entity recognition (NER) tools, gazetteers, and entity-attribution. 
This paper provides an overview of NER models and methodologies as applied to geoparsing, explores several challenges encountered, presents preliminary results from the creation of a flexible geoparsing research pipeline, and introduces ongoing and future work with the intention of contributing to the efficient geocoding of information containing valuable insights into human activities in space.
LitVar: a semantic search engine for linking genomic variant data in PubMed and PMC.

PubMed

Allot, Alexis; Peng, Yifan; Wei, Chih-Hsuan; Lee, Kyubum; Phan, Lon; Lu, Zhiyong

2018-05-14

The identification and interpretation of genomic variants play a key role in the diagnosis of genetic diseases and related research. These tasks increasingly rely on accessing relevant manually curated information from domain databases (e.g. SwissProt or ClinVar). However, due to the sheer volume of medical literature and high cost of expert curation, curated variant information in existing databases is often incomplete and out-of-date. In addition, the same genetic variant can be mentioned in publications with various names (e.g. 'A146T' versus 'c.436G>A' versus 'rs121913527'). A search in PubMed using only one name usually cannot retrieve all relevant articles for the variant of interest. Hence, to help scientists, healthcare professionals, and database curators find the most up-to-date published variant research, we have developed LitVar for the search and retrieval of standardized variant information. In addition, LitVar uses advanced text mining techniques to compute and extract relationships between variants and other associated entities such as diseases and chemicals/drugs. LitVar is publicly available at https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/LitVar.
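The naming problem this abstract describes can be pictured with a small synonym lookup. The sketch below is not LitVar's implementation or API; it only shows, under simple assumptions, why normalizing mentions to one identifier retrieves documents that a single-name search would miss. The table, function names and documents are hypothetical; only the three variant names are taken from the abstract.

```python
# Hypothetical synonym table. Only the three names below come from the abstract,
# which gives them as names for the same variant.
SYNONYMS = {
    "A146T": "rs121913527",
    "c.436G>A": "rs121913527",
    "rs121913527": "rs121913527",
}


def normalize(mention):
    """Return the canonical identifier for a variant mention, or None if unknown."""
    return SYNONYMS.get(mention.strip())


def search(variant_name, documents):
    """Return ids of documents that mention any synonym of the queried variant."""
    target = normalize(variant_name)
    if target is None:
        return []
    names = [name for name, canon in SYNONYMS.items() if canon == target]
    return [doc_id for doc_id, text in documents.items()
            if any(name in text for name in names)]


# Made-up documents, for illustration only.
documents = {
    "doc1": "The A146T substitution was observed in several samples.",
    "doc2": "We report carriers of c.436G>A.",
    "doc3": "No variants were detected.",
}
hits = search("rs121913527", documents)
```

Here a query by rsID returns both "doc1" and "doc2", although neither document contains the rsID itself. Plain substring matching is used only to keep the sketch short.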
Exploring the Repeated Name Penalty and the Overt Pronoun Penalty in Spanish

ERIC Educational Resources Information Center

Gelormini-Lezama, Carlos

2018-01-01

Anaphoric expressions such as repeated names, overt pronouns, and null pronouns serve a major role in the creation and maintenance of discourse coherence. The felicitous use of an anaphoric expression is highly dependent on the discourse salience of the entity introduced by the antecedent. Gordon et al. ("Cogn Sci" 17:311-347, 1993)…
Machine learning to parse breast pathology reports in Chinese.

PubMed

Tang, Rong; Ouyang, Lizhi; Li, Clara; He, Yue; Griffin, Molly; Taghian, Alphonse; Smith, Barbara; Yala, Adam; Barzilay, Regina; Hughes, Kevin

2018-06-01

Large structured databases of pathology findings are valuable in deriving new clinical insights. However, they are labor intensive to create and generally require manual annotation. There has been some work in the bioinformatics community to support automating this work via machine learning in English. Our contribution is to provide an automated approach to construct such structured databases in Chinese, and to set the stage for extraction from other languages. We collected 2104 de-identified Chinese benign and malignant breast pathology reports from Hunan Cancer Hospital. Physicians with native Chinese proficiency reviewed the reports and annotated a variety of binary and numerical pathologic entities. After excluding 78 cases with a bilateral lesion in the same report, 1216 cases were used as a training set for the algorithm, which was then refined by 405 development cases. The natural language processing algorithm was tested on the remaining 405 cases to evaluate the machine learning outcome. The model was used to extract 13 binary entities and 8 numerical entities. When compared to physicians with native Chinese proficiency, the model showed a per-entity accuracy from 91 to 100% for all common diagnoses on the test set. The overall accuracy of binary entities was 98% and of numerical entities was 95%. In a per-report evaluation for binary entities with more than 100 training cases, 85% of all the testing reports were completely correct and 11% had an error in 1 out of 22 entities. We have demonstrated that Chinese breast pathology reports can be automatically parsed into structured data using standard machine learning approaches. The results of our study demonstrate that techniques effective in parsing English reports can be scaled to other languages.
Assessment of Orthographic Similarity of Drugs Names between Iran and Overseas Using the Solar Model

PubMed Central

ABOLHASSANI, Nazanin; AKBARI SARI, Ali; RASHIDIAN, Arash; RASTEGARPANAH, Mansoor

2017-01-01

Background: The recognition of patient safety is now occupying a prominent place on the health policy agenda since medical errors can result in adverse events. The existence of confusing drug names is one of the most common causes of medication errors. In Iran, the General Office of Trademarks Registry (GOTR) was responsible for approving drug proprietary names for four years (2010–2014). This study aimed to investigate the performance of the GOTR in terms of the orthographic similarity of drug names using the SOLAR model. Methods: First, 100 names were randomly selected from the GOTR’s database. Then, each name was searched through pharmaceutical websites including Martindale (the Complete Drug Reference published by Pharmaceutical Press), Drugs.com and Medicines Complete. Pairs of drugs whose names look orthographically similar but have different indications were identified. Then, the SOLAR model was utilized to determine orthographic similarity between all pairs of drug names. Results: The mean of the match values of these 100 pairs of drugs was 77%, indicating a high risk of similarity. The match value for most of the reviewed pairs (92%) was high (≥66%). This value was medium (≥33% and <66%) for only 8% of the pairs of drugs. These results indicate a high risk of confusion due to similarity of drug names. Conclusion: The stewardship of the GOTR in patient safety considerations is fundamentally problematic. Thus, as a best practice, we recommend that proprietary names of drugs be evaluated by an entity within the health system. While an entity within the health system should address patient safety considerations, the GOTR is responsible for intellectual property rights. PMID:29259940
Proceedings of Conference on Variable-Resolution Modeling, Washington, DC, 5-6 May 1992

DTIC Science & Technology

1992-05-01

of powerful new computer architectures for supporting object-oriented computing. Objects, as self-contained data-code packages with orderly...another entity structure. For example, (copy-entstr e:sys- tcm ’ new -system) creates an entity structure named c:new-system that has the same structure...324 Parry, S-H. (1984): A Self-contained Hierarchical Model Construct. In: Systems Analysis and Modeling in Defense (R.K. Huber, Ed.), New York
Mining heart disease risk factors in clinical text with named entity recognition and distributional semantic models.

PubMed

Urbain, Jay

2015-12-01

We present the design and analyze the performance of a multi-stage natural language processing system employing named entity recognition, Bayesian statistics, and rule logic to identify and characterize heart disease risk factor events in diabetic patients over time. The system was originally developed for the 2014 i2b2 Challenges in Natural Language in Clinical Data. The system's strengths included a high level of accuracy for identifying named entities associated with heart disease risk factor events. The system's primary weakness was inaccuracy when characterizing the attributes of some events: for example, determining the relative time of an event with respect to the record date, deciding whether an event is attributable to the patient's history or the patient's family history, and differentiating between current and prior smoking status. We believe these inaccuracies were due in large part to the lack of an effective approach for integrating context into our event detection model. To address these inaccuracies, we explore the addition of a distributional semantic model for characterizing contextual evidence of heart disease risk factor events. Using this semantic model, we raise our initial 2014 i2b2 Challenges in Natural Language of Clinical data F1 score of 0.838 to 0.890 and increased precision by 10.3% without use of any lexicons that might bias our results.
Automatic recognition of topic-classified relations between prostate cancer and genes using MEDLINE abstracts

PubMed Central

Chun, Hong-Woo; Tsuruoka, Yoshimasa; Kim, Jin-Dong; Shiba, Rie; Nagata, Naoki; Hishiki, Teruyoshi; Tsujii, Jun'ichi

2006-01-01

Background: Automatic recognition of relations between a specific disease term and its relevant genes or protein terms is an important practice of bioinformatics. Considering the utility of the results of this approach, we identified prostate cancer and gene terms with the ID tags of public biomedical databases. Moreover, considering that genetics experts will use our results, we classified them based on six topics that can be used to analyze the type of prostate cancers, genes, and their relations. Methods: We developed a maximum entropy-based named entity recognizer and a relation recognizer and applied them to a corpus-based approach. We collected prostate cancer-related abstracts from MEDLINE, and constructed an annotated corpus of gene and prostate cancer relations based on six topics by biologists. We used it to train the maximum entropy-based named entity recognizer and relation recognizer. Results: Topic-classified relation recognition achieved 92.1% precision for the relation (an increase of 11.0% from that obtained in a baseline experiment). For all topics, the precision was between 67.6 and 88.1%. Conclusion: A series of experimental results revealed two important findings: a carefully designed relation recognition system using named entity recognition can improve the performance of relation recognition, and topic-classified relation recognition can be effectively addressed through a corpus-based approach using manual annotation and machine learning techniques. PMID:17134477
Automatic recognition of topic-classified relations between prostate cancer and genes using MEDLINE abstracts.

PubMed

Chun, Hong-Woo; Tsuruoka, Yoshimasa; Kim, Jin-Dong; Shiba, Rie; Nagata, Naoki; Hishiki, Teruyoshi; Tsujii, Jun'ichi

2006-11-24

Automatic recognition of relations between a specific disease term and its relevant genes or protein terms is an important practice of bioinformatics. Considering the utility of the results of this approach, we identified prostate cancer and gene terms with the ID tags of public biomedical databases. Moreover, considering that genetics experts will use our results, we classified them based on six topics that can be used to analyze the type of prostate cancers, genes, and their relations. We developed a maximum entropy-based named entity recognizer and a relation recognizer and applied them to a corpus-based approach. We collected prostate cancer-related abstracts from MEDLINE, and constructed an annotated corpus of gene and prostate cancer relations based on six topics by biologists. We used it to train the maximum entropy-based named entity recognizer and relation recognizer. Topic-classified relation recognition achieved 92.1% precision for the relation (an increase of 11.0% from that obtained in a baseline experiment). For all topics, the precision was between 67.6 and 88.1%. A series of experimental results revealed two important findings: a carefully designed relation recognition system using named entity recognition can improve the performance of relation recognition, and topic-classified relation recognition can be effectively addressed through a corpus-based approach using manual annotation and machine learning techniques.
30 CFR 700.11 - Applicability.

Code of Federal Regulations, 2011 CFR

2011-07-01

.... Noncommercial use does not include the extraction of coal by one unit of an integrated company or other business or nonprofit entity which uses the coal in its own manufacturing or power plants; (2) The extraction... all coal exploration and surface coal mining and reclamation operations, except: (1) The extraction of...
30 CFR 700.11 - Applicability.

Code of Federal Regulations, 2010 CFR

2010-07-01

.... Noncommercial use does not include the extraction of coal by one unit of an integrated company or other business or nonprofit entity which uses the coal in its own manufacturing or power plants; (2) The extraction... all coal exploration and surface coal mining and reclamation operations, except: (1) The extraction of...
ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers.

PubMed

Xing, Yuting; Wu, Chengkun; Yang, Xi; Wang, Wei; Zhu, En; Yin, Jianping

2018-04-27

A prevailing way of extracting valuable information from biomedical literature is to apply text mining methods on unstructured texts. However, the massive amount of literature that needs to be analyzed poses a big data challenge to the processing efficiency of text mining. In this paper, we address this challenge by introducing parallel processing on a supercomputer. We developed paraBTM, a runnable framework that enables parallel text mining on the Tianhe-2 supercomputer. It employs a low-cost yet effective load balancing strategy to maximize the efficiency of parallel processing. We evaluated the performance of paraBTM on several datasets, utilizing three types of named entity recognition tasks as demonstration. Results show that, in most cases, the processing efficiency can be greatly improved with parallel processing, and the proposed load balancing strategy is simple and effective. In addition, our framework can be readily applied to other tasks of biomedical text mining besides NER.
A comprehensive and quantitative comparison of text-mining in 15 million full-text articles versus their corresponding abstracts.

PubMed

Westergaard, David; Stærfeldt, Hans-Henrik; Tønsberg, Christian; Jensen, Lars Juhl; Brunak, Søren

2018-02-01

Across academia and industry, text mining has become a popular strategy for keeping up with the rapid growth of the scientific literature. Text mining of the scientific literature has mostly been carried out on collections of abstracts, due to their availability. Here we present an analysis of 15 million English scientific full-text articles published during the period 1823-2016. We describe the development in article length and publication sub-topics during these nearly 250 years. We showcase the potential of text mining by extracting published protein-protein, disease-gene, and protein subcellular associations using a named entity recognition system, and quantitatively report on their accuracy using gold standard benchmark data sets. We subsequently compare the findings to corresponding results obtained on 16.5 million abstracts included in MEDLINE and show that text mining of full-text articles consistently outperforms using abstracts only.
A comprehensive and quantitative comparison of text-mining in 15 million full-text articles versus their corresponding abstracts

PubMed Central

Westergaard, David; Stærfeldt, Hans-Henrik

2018-01-01

Across academia and industry, text mining has become a popular strategy for keeping up with the rapid growth of the scientific literature. Text mining of the scientific literature has mostly been carried out on collections of abstracts, due to their availability. Here we present an analysis of 15 million English scientific full-text articles published during the period 1823–2016. We describe the development in article length and publication sub-topics during these nearly 250 years. We showcase the potential of text mining by extracting published protein–protein, disease–gene, and protein subcellular associations using a named entity recognition system, and quantitatively report on their accuracy using gold standard benchmark data sets. We subsequently compare the findings to corresponding results obtained on 16.5 million abstracts included in MEDLINE and show that text mining of full-text articles consistently outperforms using abstracts only. PMID:29447159
Building an Entity-Centric Stream Filtering Test Collection for TREC 2012

DTIC Science & Technology

2012-11-01

spikes correspond to events, such as James McCartney suggesting that the sons of The Beatles form “The Beatles -- the next generation.” 4. KBA Task...gslis_adaptive, gslis_mult: Initial queries consist of wikitext extracted from each entity's history. We impose a document prior favoring docs with high in
Search

EPA Pesticide Factsheets

2018-05-18

The Integrated Grants Management System (IGMS) is a web-based system that contains information on the recipient of the grant, fellowship, cooperative agreement and interagency agreement, including the name of the entity accepting the award.
7 CFR 1466.3 - Definitions.

Code of Federal Regulations, 2012 CFR

2012-01-01

...) means an individual, private-sector entity, or public agency certified by NRCS to provide technical...,” “land conservation committee,” “natural resource district,” or similar name. Conservation Innovation...

7 CFR 1466.3 - Definitions.

Code of Federal Regulations, 2013 CFR

2013-01-01

...) means an individual, private-sector entity, or public agency certified by NRCS to provide technical...,” “land conservation committee,” “natural resource district,” or similar name. Conservation Innovation...
7 CFR 1466.3 - Definitions.

Code of Federal Regulations, 2014 CFR

2014-01-01

...) means an individual, private-sector entity, or public agency certified by NRCS to provide technical...,” “land conservation committee,” “natural resource district,” or similar name. Conservation Innovation...
Data fusion in cyber security: first order entity extraction from common cyber data

NASA Astrophysics Data System (ADS)

Giacobe, Nicklaus A.

2012-06-01

The Joint Directors of Labs Data Fusion Process Model (JDL Model) provides a framework for how to handle sensor data to develop higher levels of inference in a complex environment. Beginning from a call to leverage data fusion techniques in intrusion detection, there have been a number of advances in the use of data fusion algorithms in this subdomain of cyber security. While it is tempting to jump directly to situation-level or threat-level refinement (levels 2 and 3) for more exciting inferences, a proper fusion process starts with lower levels of fusion in order to provide a basis for the higher fusion levels. The process begins with first order entity extraction, or the identification of important entities represented in the sensor data stream. Current cyber security operational tools and their associated data are explored for potential exploitation, identifying the first order entities that exist in the data and the properties of these entities that are described by the data. Cyber events that are represented in the data stream are added to the first order entities as their properties. This work explores typical cyber security data and the inferences that can be made at the lower fusion levels (0 and 1) with simple metrics. Depending on the types of events that are expected by the analyst, these relatively simple metrics can provide insight on their own, or could be used in fusion algorithms as a basis for higher levels of inference.
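To make the idea of first order entity extraction more concrete, the sketch below treats IP addresses found in log lines as entities, attaches the lines that mention them as their events, and derives one simple metric (events per entity). The log format, the sample lines and the function name are hypothetical and are not taken from the paper; the addresses come from ranges reserved for documentation.

```python
import re
from collections import defaultdict

# Loose IPv4 pattern; it does not check that each octet is in the 0-255 range.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_entities(log_lines):
    """Map each IP address (entity) to the log lines (events) that mention it."""
    entities = defaultdict(list)
    for line in log_lines:
        for ip in set(IPV4.findall(line)):
            entities[ip].append(line)
    return dict(entities)


# Hypothetical firewall-style log lines.
log_lines = [
    "2012-06-01T10:00:00 DENY src=192.0.2.10 dst=198.51.100.5",
    "2012-06-01T10:00:05 DENY src=192.0.2.10 dst=198.51.100.6",
    "2012-06-01T10:01:00 ALLOW src=192.0.2.20 dst=198.51.100.5",
]
entities = extract_entities(log_lines)

# A simple low-level metric: the number of events observed per entity.
event_counts = {ip: len(events) for ip, events in entities.items()}
```

Counts of this kind are the sort of relatively simple metric that could either be inspected directly or passed on as input to a higher fusion level.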
IGMS Overview

EPA Pesticide Factsheets

The Integrated Grants Management System (IGMS) is a web-based system that contains information on the recipient of the grant, fellowship, cooperative agreement and interagency agreement, including the name of the entity accepting the award.
IGMS Model

EPA Pesticide Factsheets

The Integrated Grants Management System (IGMS) is a web-based system that contains information on the recipient of the grant, fellowship, cooperative agreement and interagency agreement, including the name of the entity accepting the award.
KneeTex: an ontology-driven system for information extraction from MRI reports.

PubMed

Spasić, Irena; Zhao, Bo; Jones, Christopher B; Button, Kate

2015-01-01

In the realm of knee pathology, magnetic resonance imaging (MRI) has the advantage of visualising all structures within the knee joint, which makes it a valuable tool for increasing diagnostic accuracy and planning surgical treatments. Therefore, clinical narratives found in MRI reports convey valuable diagnostic information. A range of studies have proven the feasibility of natural language processing for information extraction from clinical narratives. However, no study focused specifically on MRI reports in relation to knee pathology, possibly due to the complexity of knee anatomy and a wide range of conditions that may be associated with different anatomical entities. In this paper we describe KneeTex, an information extraction system that operates in this domain. As an ontology-driven information extraction system, KneeTex makes active use of an ontology to strongly guide and constrain text analysis. We used automatic term recognition to facilitate the development of a domain-specific ontology with sufficient detail and coverage for text mining applications. In combination with the ontology, high regularity of the sublanguage used in knee MRI reports allowed us to model its processing by a set of sophisticated lexico-semantic rules with minimal syntactic analysis. The main processing steps involve named entity recognition combined with coordination, enumeration, ambiguity and co-reference resolution, followed by text segmentation. Ontology-based semantic typing is then used to drive the template filling process. We adopted an existing ontology, TRAK (Taxonomy for RehAbilitation of Knee conditions), for use within KneeTex. The original TRAK ontology expanded from 1,292 concepts, 1,720 synonyms and 518 relationship instances to 1,621 concepts, 2,550 synonyms and 560 relationship instances. This provided KneeTex with a very fine-grained lexico-semantic knowledge base, which is highly attuned to the given sublanguage. 
Information extraction results were evaluated on a test set of 100 MRI reports. A gold standard consisted of 1,259 filled template records with the following slots: finding, finding qualifier, negation, certainty, anatomy and anatomy qualifier. KneeTex extracted information with precision of 98.00%, recall of 97.63% and F-measure of 97.81%, the values of which are in line with human-like performance. KneeTex is an open-source, stand-alone application for information extraction from narrative reports that describe an MRI scan of the knee. Given an MRI report as input, the system outputs the corresponding clinical findings in the form of JavaScript Object Notation objects. The extracted information is mapped onto TRAK, an ontology that formally models knowledge relevant for the rehabilitation of knee conditions. As a result, formally structured and coded information allows for complex searches to be conducted efficiently over the original MRI reports, thereby effectively supporting epidemiologic studies of knee conditions.
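The reported F-measure can be checked from the reported precision and recall, assuming it is the balanced F-measure (the harmonic mean of the two). The helper name below is hypothetical.

```python
def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
f1 = f_measure(98.00, 97.63)
print(round(f1, 2))  # prints 97.81
```

The result agrees with the 97.81% stated in the abstract.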
Clinical Relation Extraction Toward Drug Safety Surveillance Using Electronic Health Record Narratives: Classical Learning Versus Deep Learning.

PubMed

Munkhdalai, Tsendsuren; Liu, Feifan; Yu, Hong

2018-04-25

Medication and adverse drug event (ADE) information extracted from electronic health record (EHR) notes can be a rich resource for drug safety surveillance. Existing observational studies have mainly relied on structured EHR data to obtain ADE information; however, ADEs are often buried in the EHR narratives and not recorded in structured data. To unlock ADE-related information from EHR narratives, there is a need to extract relevant entities and identify relations among them. In this study, we focus on relation identification. This study aimed to evaluate natural language processing and machine learning approaches using the expert-annotated medical entities and relations in the context of drug safety surveillance, and investigate how different learning approaches perform under different configurations. We have manually annotated 791 EHR notes with 9 named entities (eg, medication, indication, severity, and ADEs) and 7 different types of relations (eg, medication-dosage, medication-ADE, and severity-ADE). Then, we explored 3 supervised machine learning systems for relation identification: (1) a support vector machines (SVM) system, (2) an end-to-end deep neural network system, and (3) a supervised descriptive rule induction baseline system. For the neural network system, we exploited the state-of-the-art recurrent neural network (RNN) and attention models. We report the performance by macro-averaged precision, recall, and F1-score across the relation types. Our results show that the SVM model achieved the best average F1-score of 89.1% on test data, outperforming the long short-term memory (LSTM) model with attention (F1-score of 65.72%) as well as the rule induction baseline system (F1-score of 7.47%) by a large margin. The bidirectional LSTM model with attention achieved the best performance among different RNN models. With the inclusion of additional features in the LSTM model, its performance can be boosted to an average F1-score of 77.35%. 
It shows that classical learning models (SVM) remain advantageous over deep learning models (RNN variants) for clinical relation identification, especially for long-distance intersentential relations. However, RNNs demonstrate a great potential of significant improvement if more training data become available. Our work is an important step toward mining EHRs to improve the efficacy of drug safety surveillance. Most importantly, the annotated data used in this study will be made publicly available, which will further promote drug safety research in the community. ©Tsendsuren Munkhdalai, Feifan Liu, Hong Yu. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 25.04.2018.
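The study reports macro-averaged precision, recall and F1-score across relation types. As a minimal sketch of that metric, the code below computes each score per relation type and then takes the unweighted mean, so that rare relation types count as much as frequent ones. The two relation type names appear in the abstract; the counts are invented for illustration and the function names are hypothetical.

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_average(counts_by_type):
    """Unweighted mean of per-type precision, recall and F1."""
    scores = [prf(*counts) for counts in counts_by_type.values()]
    n = len(scores)
    return tuple(sum(score[i] for score in scores) / n for i in range(3))


# Invented (tp, fp, fn) counts per relation type, for illustration only.
counts_by_type = {
    "medication-dosage": (8, 2, 2),
    "medication-ADE": (1, 1, 3),
}
macro_p, macro_r, macro_f1 = macro_average(counts_by_type)
```

With these invented counts the per-type F1 values are 0.8 and about 0.333, so the macro-averaged F1 is about 0.567, noticeably lower than the score of the frequent type alone.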
Clinical Relation Extraction Toward Drug Safety Surveillance Using Electronic Health Record Narratives: Classical Learning Versus Deep Learning

PubMed Central

Munkhdalai, Tsendsuren; Liu, Feifan

2018-01-01

Background: Medication and adverse drug event (ADE) information extracted from electronic health record (EHR) notes can be a rich resource for drug safety surveillance. Existing observational studies have mainly relied on structured EHR data to obtain ADE information; however, ADEs are often buried in the EHR narratives and not recorded in structured data. Objective: To unlock ADE-related information from EHR narratives, there is a need to extract relevant entities and identify relations among them. In this study, we focus on relation identification. This study aimed to evaluate natural language processing and machine learning approaches using the expert-annotated medical entities and relations in the context of drug safety surveillance, and investigate how different learning approaches perform under different configurations. Methods: We have manually annotated 791 EHR notes with 9 named entities (eg, medication, indication, severity, and ADEs) and 7 different types of relations (eg, medication-dosage, medication-ADE, and severity-ADE). Then, we explored 3 supervised machine learning systems for relation identification: (1) a support vector machines (SVM) system, (2) an end-to-end deep neural network system, and (3) a supervised descriptive rule induction baseline system. For the neural network system, we exploited the state-of-the-art recurrent neural network (RNN) and attention models. We report the performance by macro-averaged precision, recall, and F1-score across the relation types. Results: Our results show that the SVM model achieved the best average F1-score of 89.1% on test data, outperforming the long short-term memory (LSTM) model with attention (F1-score of 65.72%) as well as the rule induction baseline system (F1-score of 7.47%) by a large margin. The bidirectional LSTM model with attention achieved the best performance among different RNN models. 
With the inclusion of additional features in the LSTM model, its performance can be boosted to an average F1-score of 77.35%. Conclusions Our results show that classical learning models (SVM) remain advantageous over deep learning models (RNN variants) for clinical relation identification, especially for long-distance intersentential relations. However, RNNs demonstrate great potential for significant improvement if more training data become available. Our work is an important step toward mining EHRs to improve the efficacy of drug safety surveillance. Most importantly, the annotated data used in this study will be made publicly available, which will further promote drug safety research in the community. PMID:29695376
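The abstract above reports macro-averaged precision, recall, and F1-score across relation types. As a rough illustration of that metric only (not the authors' evaluation code), the sketch below averages per-label scores with equal weight. The function name `macro_prf`, the "none" label for unrelated entity pairs, and the toy gold/predicted labels are all assumptions made for the example; macro F1 is taken here as the mean of per-label F1, which is one common convention.

```python
from collections import Counter

def macro_prf(gold, pred, labels):
    """Return macro-averaged (precision, recall, F1) over `labels`.

    `gold` and `pred` are parallel lists with one label per candidate pair.
    Labels not listed in `labels` (e.g. "none") are excluded from the average.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    per_label = []
    for lab in labels:
        prec = tp[lab] / (tp[lab] + fp[lab]) if tp[lab] + fp[lab] else 0.0
        rec = tp[lab] / (tp[lab] + fn[lab]) if tp[lab] + fn[lab] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        per_label.append((prec, rec, f1))
    n = len(per_label)
    return tuple(sum(vals) / n for vals in zip(*per_label))

# Toy data, invented for illustration only.
gold = ["medication-ADE", "medication-dosage", "none", "medication-ADE"]
pred = ["medication-ADE", "none", "none", "medication-dosage"]
labels = ["medication-ADE", "medication-dosage"]
precision, recall, f1 = macro_prf(gold, pred, labels)
```

Because every relation type contributes equally to the average, a rare relation type that is classified poorly lowers the macro score as much as a frequent one would.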
Materials in Manufacturing and Packaging Systems as Sources of Elemental Impurities in Packaged Drug Products: A Literature Review.

PubMed

Jenke, Dennis R; Stults, Cheryl L M; Paskiet, Diane M; Ball, Douglas J; Nagao, Lee M

Elemental impurities in drug products can arise from a number of different sources and via a number of different means, including the active pharmaceutical ingredient, excipients, the vehicle, and leaching of elemental entities that are present in the drug product's manufacturing or packaging systems. Thus, knowledge about the presence, level, and likelihood of leaching of elemental entities in manufacturing and packaging systems is relevant to understanding how these systems contribute to a drug product's total elemental impurity burden. To that end, a joint team from the Extractables and Leachables Safety Information Exchange (ELSIE) Consortium and the International Pharmaceutical Aerosol Consortium on Regulation and Science (IPAC-RS) has conducted a review of the available literature on elemental entities in pharmaceutically relevant polymers and the presence of these elemental entities in material extracts and/or drug products. This review article contains the information compiled from the available body of literature and considers two questions: (1) What elemental entities are present in the relevant polymers and materials and at what levels are they present? (2) To what extent are these elemental entities leached from these materials under conditions relevant to the manufacturing and storage/distribution of solution drug products? Conclusions drawn from the compiled data are as follows: (1) Elemental entities are present in the materials used to construct packaging and manufacturing systems as these materials either contain these elemental entities as additives or are exposed to elemental entities during their production. (2) Unless the elemental entities are parts of the materials themselves (for example, SiO2 in glass) or intentionally added to the materials (for example, metal stearates in polymers), their incidental amounts in the materials are generally low. 
(3) When elemental entities are present in materials and systems, generally only a very small fraction of the total available amount of the entity can be leached under conditions that are relevant to packaged drug products. Thus, while sources of certain elemental impurities may be ubiquitous in the natural environment, they are not ubiquitous in materials used in pharmaceutical packaging and manufacturing systems and when they are present, they are not extensively leached under relevant conditions. The information summarized here can be utilized to aid the elemental impurity risk assessment process by providing the identities of commonly reported elements and data to support probability estimates of those becoming elemental impurities in the drug product. Furthermore, recommendations are made related to establishing elements of potential product impact for individual materials. Extraneous impurities in drug products provide no therapeutic benefit and thus should be known and controlled. Elemental impurities can arise from a number of sources and by a number of means, including the leaching of elemental entities from drug product packaging and manufacturing systems. To understand the extent to which materials used in packaging systems contain elemental entities and the extent to which those entities leach into drug products to become elemental impurities, the Extractables and Leachables Safety Information Exchange (ELSIE) and International Pharmaceutical Aerosol Consortium on Regulation and Science (IPAC-RS) Consortia have jointly performed a literature review on this subject. Using the compiled information, it was concluded that while packaging materials may contain elemental entities, unless those entities are intentional parts of the materials, the amounts of those elemental entities are generally low. 
Furthermore, generally only a very small fraction of the total available amount of the entity can be leached under conditions that are relevant to packaged drug products. Thus, risk assessment of sources of elemental impurities in drug products that may be related to materials used in pharmaceutical packaging and manufacturing systems can utilize the information and recommendations presented here. © PDA, Inc. 2015.
A database of natural products and chemical entities from marine habitat

PubMed Central

Babu, Padavala Ajay; Puppala, Suma Sree; Aswini, Satyavarapu Lakshmi; Vani, Metta Ramya; Kumar, Chinta Narasimha; Prasanna, Tallapragada

2008-01-01

The marine compound database consists of marine natural products and chemical entities, collected from various literature sources, which are known to possess bioactivity against human diseases. The database is constructed using HTML. The 182 compounds, in 12 categories, are provided with the source, compound name, 2-dimensional structure, bioactivity and clinical trial information. The database is freely available online and can be accessed at http://www.progenebio.in/mcdb/index.htm PMID:19238254
Information Retrieval and Text Mining Technologies for Chemistry.

PubMed

Krallinger, Martin; Rabal, Obdulia; Lourenço, Anália; Oyarzabal, Julen; Valencia, Alfonso

2017-06-28

Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
BioEve Search: A Novel Framework to Facilitate Interactive Literature Search

PubMed Central

Ahmed, Syed Toufeeq; Davulcu, Hasan; Tikves, Sukru; Nair, Radhika; Zhao, Zhongming

2012-01-01

Background. Advances in computational and biological methods over the last two decades have remarkably changed the scale of biomedical research, bringing unprecedented growth in both the production of biomedical data and the amount of published literature discussing it. An automated extraction system coupled with a cognitive search and navigation service over these document collections would not only save time and effort, but also pave the way to discover hitherto unknown information implicitly conveyed in the texts. Results. We developed a novel framework (named “BioEve”) that seamlessly integrates Faceted Search (Information Retrieval) with an Information Extraction module to provide an interactive search experience for researchers in the life sciences. It enables guided step-by-step search query refinement by suggesting concepts and entities (like genes, drugs, and diseases) to quickly filter and modify search direction, thereby facilitating an enriched paradigm in which the user can discover related concepts and keywords to search while seeking information. Conclusions. The BioEve Search framework makes it easier to enable scalable interactive search over large collections of textual articles and to discover knowledge hidden in thousands of biomedical literature articles. PMID:22693501
Multi-Filter String Matching and Human-Centric Entity Matching for Information Extraction

ERIC Educational Resources Information Center

Sun, Chong

2012-01-01

More and more information is being generated in text documents, such as Web pages, emails and blogs. To effectively manage this unstructured information, one broadly used approach includes locating relevant content in documents, extracting structured information and integrating the extracted information for querying, mining or further analysis. In…
EPA Grants Information

EPA Pesticide Factsheets

The Integrated Grants Management System (IGMS) is a web-based system that contains information on the recipient of the grant, fellowship, cooperative agreement and interagency agreement, including the name of the entity accepting the award.
Gene and protein nomenclature in public databases

PubMed Central

Fundel, Katrin; Zimmer, Ralf

2006-01-01

Background Frequently, several alternative names are in use for biological objects such as genes and proteins. Applications like manual literature search, automated text-mining, named entity identification, gene/protein annotation, and linking of knowledge from different information sources require the knowledge of all used names referring to a given gene or protein. Various organism-specific or general public databases aim at organizing knowledge about genes and proteins. These databases can be used for deriving gene and protein name dictionaries. So far, little is known about the differences between databases in terms of size, ambiguities and overlap. Results We compiled five gene and protein name dictionaries for each of the five model organisms (yeast, fly, mouse, rat, and human) from different organism-specific and general public databases. We analyzed the degree of ambiguity of gene and protein names within and between dictionaries, to a lexicon of common English words and domain-related non-gene terms, and we compared different data sources in terms of size of extracted dictionaries and overlap of synonyms between those. The study shows that the number of genes/proteins and synonyms covered in individual databases varies significantly for a given organism, and that the degree of ambiguity of synonyms varies significantly between different organisms. Furthermore, it shows that, despite considerable efforts of co-curation, the overlap of synonyms in different data sources is rather moderate and that the degree of ambiguity of gene names with common English words and domain-related non-gene terms varies depending on the considered organism. Conclusion In conclusion, these results indicate that the combination of data contained in different databases allows the generation of gene and protein name dictionaries that contain significantly more used names than dictionaries obtained from individual data sources. 
Furthermore, curation of combined dictionaries considerably increases size and decreases ambiguity. The entries of the curated synonym dictionary are available for manual querying, editing, and PubMed- or Google-search via the ProThesaurus-wiki. For automated querying via custom software, we offer a web service and an exemplary client application. PMID:16899134
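The study above compares name dictionaries by ambiguity (one name pointing to several genes, or colliding with a common English word) and by overlap of synonyms between sources. A minimal sketch of such measurements follows. The identifiers, names, and word list are invented toy data, and two choices are assumptions not taken from the abstract: names are compared case-insensitively, and overlap is measured as a Jaccard ratio.

```python
from collections import defaultdict

def ambiguous_names(dictionary):
    """Names that refer to more than one identifier within one dictionary."""
    owners = defaultdict(set)
    for gene_id, names in dictionary.items():
        for name in names:
            owners[name.lower()].add(gene_id)
    return {name for name, ids in owners.items() if len(ids) > 1}

def all_names(dictionary):
    """Lower-cased set of every name in a dictionary."""
    return {n.lower() for names in dictionary.values() for n in names}

def synonym_overlap(dict_a, dict_b):
    """Jaccard overlap of the name sets of two dictionaries."""
    names_a, names_b = all_names(dict_a), all_names(dict_b)
    union = names_a | names_b
    return len(names_a & names_b) / len(union) if union else 0.0

# Toy dictionaries (hypothetical identifiers and names).
dict_a = {"G1": {"abc1", "white"}, "G2": {"white", "w"}}
dict_b = {"G1": {"abc1"}, "G3": {"xyz"}}
english_words = {"white", "for", "was"}  # stand-in for a common-word lexicon

within_ambiguous = ambiguous_names(dict_a)
english_clashes = all_names(dict_a) & english_words
overlap = synonym_overlap(dict_a, dict_b)
```

Merging dictionaries from several sources is then a union of the name sets per identifier, after which the same ambiguity checks can be rerun on the combined dictionary.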
IGMS Construction Grants Overview

EPA Pesticide Factsheets

The Integrated Grants Management System (IGMS) is a web-based system that contains information on the recipient of the grant, fellowship, cooperative agreement and interagency agreement, including the name of the entity accepting the award.
IGMS Search User Guide

EPA Pesticide Factsheets

The Integrated Grants Management System (IGMS) is a web-based system that contains information on the recipient of the grant, fellowship, cooperative agreement and interagency agreement, including the name of the entity accepting the award.
Ontology-based reusable clinical document template production system.

PubMed

Nam, Sejin; Lee, Sungin; Kim, James G Boram; Kim, Hong-Gee

2012-01-01

Clinical documents embody professional clinical knowledge. This paper shows an effective clinical document template (CDT) production system that uses a clinical description entity (CDE) model, a CDE ontology, and a knowledge management system called STEP that manages ontology-based clinical description entities. The ontology represents CDEs and their inter-relations, and the STEP system stores and manages CDE ontology-based information regarding CDTs. The system also provides Web Services interfaces for search and reasoning over clinical entities. The system was populated with entities and relations extracted from 35 CDTs that were used in admission, discharge, and progress reports, as well as those used in nursing and operation functions. A clinical document template editor is shown that uses STEP.
77 FR 9021 - Regulations Relating to Information Reporting by Foreign Financial Institutions and Withholding...

Federal Register 2010, 2011, 2012, 2013, 2014

2012-02-15

..., address, and taxpayer identifying number (TIN) of each account holder who is a specified U.S. person (or, in the case of an account holder that is a U.S. owned foreign entity, the name, address, and TIN of... that such beneficial owner does not have any substantial U.S. owners, or the name, address, and TIN of...
Rendering of Names of Corporate Bodies. Subject Analysis, With Special Reference to Social Sciences. Documentation Systems for Industry (8th Annual Seminar). Part 1: Papers.

ERIC Educational Resources Information Center

Documentation Research and Training Centre, Bangalore (India).

The four sections of the report cover the topics of cataloging, subject analysis, documentation systems for industry and the Documentation Research and Training Centre (DRTC) research report for 1970. The cataloging section covers the conflicts of cataloging, recall, corporate bodies, titles, publishers series and the entity name. The subject…

EMR-based medical knowledge representation and inference via Markov random fields and distributed representation learning.

PubMed

Zhao, Chao; Jiang, Jingchi; Guan, Yi; Guo, Xitong; He, Bin

2018-05-01

Electronic medical records (EMRs) contain medical knowledge that can be used for clinical decision support (CDS). Our objective is to develop a general system that can extract and represent knowledge contained in EMRs to support three CDS tasks (test recommendation, initial diagnosis, and treatment plan recommendation) given the condition of a patient. We extracted four kinds of medical entities from records and constructed an EMR-based medical knowledge network (EMKN), in which nodes are entities and edges reflect their co-occurrence in a record. Three bipartite subgraphs (bigraphs) were extracted from the EMKN, one to support each task. One part of the bigraph was the given condition (e.g., symptoms), and the other was the condition to be inferred (e.g., diseases). Each bigraph was regarded as a Markov random field (MRF) to support the inference. We proposed three graph-based energy functions and three likelihood-based energy functions. Two of these functions are based on knowledge representation learning and can provide distributed representations of medical entities. Two EMR datasets and three metrics were utilized to evaluate the performance. As a whole, the evaluation results indicate that the proposed system outperformed the baseline methods. The distributed representation of medical entities does reflect similarity relationships with respect to knowledge level. Combining EMKN and MRF is an effective approach for general medical knowledge representation and inference. Different tasks, however, require individually designed energy functions. Copyright © 2018 Elsevier B.V. All rights reserved.
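The network construction described above (entities as nodes, edges reflecting co-occurrence in a record, then a bipartite subgraph per task) can be sketched in a few lines. This covers only the graph-building step under simple assumptions: each record is already reduced to a list of entity strings, edge weights are raw co-occurrence counts, and the function names and toy records are hypothetical. The MRF inference and energy functions from the paper are not reproduced here.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(records):
    """Count how often each unordered entity pair appears in the same record."""
    edges = Counter()
    for entities in records:
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges

def bipartite_view(edges, left, right):
    """Keep only edges linking a `left` node to a `right` node, keyed (left, right)."""
    view = {}
    for (a, b), weight in edges.items():
        if a in left and b in right:
            view[(a, b)] = weight
        elif b in left and a in right:
            view[(b, a)] = weight
    return view

# Toy records, invented for illustration.
records = [
    ["fever", "cough", "influenza"],
    ["fever", "influenza"],
    ["cough", "bronchitis"],
]
symptoms = {"fever", "cough"}
diseases = {"influenza", "bronchitis"}

edges = cooccurrence_edges(records)
symptom_disease = bipartite_view(edges, symptoms, diseases)
```

In this toy case the symptom-symptom edge (cough, fever) is dropped from the bipartite view, leaving only the symptom-to-disease links that an inference step would use.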
46 CFR 515.34 - Regulated Persons Index.

Code of Federal Regulations, 2010 CFR

2010-10-01

... Commission § 515.34 Regulated Persons Index. The Regulated Persons Index is a database containing the names...-regulated entities. The database may be purchased for $108 by contacting the Bureau of Certification and...
46 CFR 515.34 - Regulated Persons Index.

Code of Federal Regulations, 2013 CFR

2013-10-01

... Commission § 515.34 Regulated Persons Index. The Regulated Persons Index is a database containing the names...-regulated entities. The database may be purchased for $108 by contacting the Bureau of Certification and...
46 CFR 515.34 - Regulated Persons Index.

Code of Federal Regulations, 2014 CFR

2014-10-01

... Commission § 515.34 Regulated Persons Index. The Regulated Persons Index is a database containing the names...-regulated entities. The database may be purchased for $108 by contacting the Bureau of Certification and...
46 CFR 515.34 - Regulated Persons Index.

Code of Federal Regulations, 2012 CFR

2012-10-01

... Commission § 515.34 Regulated Persons Index. The Regulated Persons Index is a database containing the names...-regulated entities. The database may be purchased for $108 by contacting the Bureau of Certification and...
46 CFR 515.34 - Regulated Persons Index.

Code of Federal Regulations, 2011 CFR

2011-10-01

... Commission § 515.34 Regulated Persons Index. The Regulated Persons Index is a database containing the names...-regulated entities. The database may be purchased for $108 by contacting the Bureau of Certification and...
Search | IGMS | Envirofacts | US EPA

EPA Pesticide Factsheets

2016-02-23

The Integrated Grants Management System (IGMS) is a web-based system that contains information on the recipient of the grant, fellowship, cooperative agreement and interagency agreement, including the name of the entity accepting the award.
Searching for the elusive neural substrates of body part terms: a neuropsychological study.

PubMed

Kemmerer, David; Tranel, Daniel

2008-06-01

Previous neuropsychological studies suggest that, compared to other categories of concrete entities, lexical and conceptual aspects of body part knowledge are frequently spared in brain-damaged patients. To further investigate this issue, we administered a battery of 12 tests assessing lexical and conceptual aspects of body part knowledge to 104 brain-damaged patients with lesions distributed throughout the telencephalon. There were two main outcomes. First, impaired oral naming of body parts, attributable to a disturbance of the mapping between lexical-semantic and lexical-phonological structures, was most reliably and specifically associated with lesions in the left frontal opercular and anterior/inferior parietal opercular cortices and in the white matter underlying these regions (8 patients). Also, 1 patient with body part anomia had a left occipital lesion that included the "extrastriate body area" (EBA). Second, knowledge of the meanings of body part terms was remarkably resistant to impairment, regardless of lesion site; in fact, we did not uncover a single patient who exhibited significantly impaired understanding of the meanings of these terms. In the 9 patients with body part anomia, oral naming of concrete entities was evaluated, and this revealed that 4 patients had disproportionately worse naming of body parts relative to other types of concrete entities. Taken together, these findings extend previous neuropsychological and functional neuroimaging studies of body part knowledge and add to our growing understanding of the nuances of how different linguistic and conceptual categories are operated by left frontal and parietal structures.
Searching for the Elusive Neural Substrates of Body Part Terms: A Neuropsychological Study

PubMed Central

Kemmerer, David; Tranel, Daniel

2010-01-01

Previous neuropsychological studies suggest that, compared to other categories of concrete entities, lexical and conceptual aspects of body part knowledge are frequently spared in brain-damaged patients. To further investigate this issue, we administered a battery of 12 tests assessing lexical and conceptual aspects of body part knowledge to 104 brain-damaged patients with lesions distributed throughout the telencephalon. There were two main outcomes. First, impaired oral naming of body parts, attributable to a disturbance of the mapping between lexical-semantic and lexical-phonological structures, was most reliably and specifically associated with lesions in the left frontal opercular and anterior/inferior parietal opercular cortices, and in the white matter underlying these regions (8 patients). Also, one patient with body part anomia had a left occipital lesion that included the “extrastriate body area” (EBA). Second, knowledge of the meanings of body part terms was remarkably resistant to impairment, regardless of lesion site; in fact, we did not uncover a single patient who exhibited significantly impaired understanding of the meanings of these terms. In the 9 patients with body part anomia, oral naming of concrete entities was evaluated, and this revealed that 4 patients had disproportionately worse naming of body parts relative to other types of concrete entities. Taken together, these findings extend previous neuropsychological and functional neuroimaging studies of body part knowledge, and add to our growing understanding of the nuances of how different linguistic and conceptual categories are operated by left frontal and parietal structures. PMID:18608319
Identifying Liver Cancer and Its Relations with Diseases, Drugs, and Genes: A Literature-Based Approach

PubMed Central

Song, Min

2016-01-01

In biomedicine, scientific literature is a valuable source for knowledge discovery. Mining knowledge from textual data has become an increasingly important task as the volume of scientific literature grows at an unprecedented rate. In this paper, we propose a framework for examining a certain disease based on existing information provided by scientific literature. Disease-related entities that include diseases, drugs, and genes are systematically extracted and analyzed using a three-level network-based approach. A paper-entity network and an entity co-occurrence network (macro-level) are explored and used to construct six entity specific networks (meso-level). Important diseases, drugs, and genes as well as salient entity relations (micro-level) are identified from these networks. Results obtained from the literature-based mining can serve to assist clinical applications. PMID:27195695
Normalizing biomedical terms by minimizing ambiguity and variability

PubMed Central

Tsuruoka, Yoshimasa; McNaught, John; Ananiadou, Sophia

2008-01-01

Background One of the difficulties in mapping biomedical named entities, e.g. genes, proteins, chemicals and diseases, to their concept identifiers stems from the potential variability of the terms. Soft string matching is a possible solution to the problem, but its inherent heavy computational cost discourages its use when the dictionaries are large or when real time processing is required. A less computationally demanding approach is to normalize the terms by using heuristic rules, which enables us to look up a dictionary in a constant time regardless of its size. The development of good heuristic rules, however, requires extensive knowledge of the terminology in question and thus is the bottleneck of the normalization approach. Results We present a novel framework for discovering a list of normalization rules from a dictionary in a fully automated manner. The rules are discovered in such a way that they minimize the ambiguity and variability of the terms in the dictionary. We evaluated our algorithm using two large dictionaries: a human gene/protein name dictionary built from BioThesaurus and a disease name dictionary built from UMLS. Conclusions The experimental results showed that automatically discovered rules can perform comparably to carefully crafted heuristic rules in term mapping tasks, and the computational overhead of rule application is small enough that a very fast implementation is possible. This work will help improve the performance of term-concept mapping tasks in biomedical information extraction especially when good normalization heuristics for the target terminology are not fully known. PMID:18426547
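The normalization approach described above replaces costly soft string matching with a cheap rewrite of each term followed by an exact dictionary lookup. The sketch below shows that lookup pattern with a few hand-written rules (lower-casing, treating hyphens, underscores, and slashes as spaces, then dropping whitespace). These rules, the concept identifiers, and the function names are assumptions for illustration; the paper's contribution is discovering such rules automatically, which this sketch does not attempt.

```python
import re

def normalize(term):
    """Apply a few illustrative heuristic rules to a term."""
    term = term.lower()
    term = re.sub(r"[-_/]", " ", term)  # treat hyphens, underscores, slashes as spaces
    term = re.sub(r"\s+", "", term)     # then drop all whitespace
    return term

def build_index(dictionary):
    """Map each normalized name to the set of concept identifiers it denotes."""
    index = {}
    for concept_id, names in dictionary.items():
        for name in names:
            index.setdefault(normalize(name), set()).add(concept_id)
    return index

def lookup(index, mention):
    """Hash lookup: average cost does not depend on the dictionary size."""
    return index.get(normalize(mention), set())

# Toy dictionary with hypothetical concept identifiers.
dictionary = {
    "C1": ["IL-2", "interleukin 2"],
    "C2": ["NF-kappa B"],
}
index = build_index(dictionary)

hit_il2 = lookup(index, "IL 2")
hit_nfkb = lookup(index, "nf kappaB")
miss = lookup(index, "TP53")
```

The trade-off noted in the abstract is visible here: aggressive rules reduce variability (more mentions map to a concept) but can increase ambiguity, since distinct names may collapse to the same normalized key and return several identifiers.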
TaggerOne: joint named entity recognition and normalization with semi-Markov Models

PubMed Central

Leaman, Robert; Lu, Zhiyong

2016-01-01

Motivation: Text mining is increasingly used to manage the accelerating pace of the biomedical literature. Many text mining applications depend on accurate named entity recognition (NER) and normalization (grounding). While high performing machine learning methods trainable for many entity types exist for NER, normalization methods are usually specialized to a single entity type. NER and normalization systems are also typically used in a serial pipeline, causing cascading errors and limiting the ability of the NER system to directly exploit the lexical information provided by the normalization. Methods: We propose the first machine learning model for joint NER and normalization during both training and prediction. The model is trainable for arbitrary entity types and consists of a semi-Markov structured linear classifier, with a rich feature approach for NER and supervised semantic indexing for normalization. We also introduce TaggerOne, a Java implementation of our model as a general toolkit for joint NER and normalization. TaggerOne is not specific to any entity type, requiring only annotated training data and a corresponding lexicon, and has been optimized for high throughput. Results: We validated TaggerOne with multiple gold-standard corpora containing both mention- and concept-level annotations. Benchmarking results show that TaggerOne achieves high performance on diseases (NCBI Disease corpus, NER f-score: 0.829, normalization f-score: 0.807) and chemicals (BioCreative 5 CDR corpus, NER f-score: 0.914, normalization f-score 0.895). These results compare favorably to the previous state of the art, notwithstanding the greater flexibility of the model. We conclude that jointly modeling NER and normalization greatly improves performance. 
Availability and Implementation: The TaggerOne source code and an online demonstration are available at: http://www.ncbi.nlm.nih.gov/bionlp/taggerone Contact: zhiyong.lu@nih.gov Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27283952
TaggerOne: joint named entity recognition and normalization with semi-Markov Models.

PubMed

Leaman, Robert; Lu, Zhiyong

2016-09-15

Text mining is increasingly used to manage the accelerating pace of the biomedical literature. Many text mining applications depend on accurate named entity recognition (NER) and normalization (grounding). While high performing machine learning methods trainable for many entity types exist for NER, normalization methods are usually specialized to a single entity type. NER and normalization systems are also typically used in a serial pipeline, causing cascading errors and limiting the ability of the NER system to directly exploit the lexical information provided by the normalization. We propose the first machine learning model for joint NER and normalization during both training and prediction. The model is trainable for arbitrary entity types and consists of a semi-Markov structured linear classifier, with a rich feature approach for NER and supervised semantic indexing for normalization. We also introduce TaggerOne, a Java implementation of our model as a general toolkit for joint NER and normalization. TaggerOne is not specific to any entity type, requiring only annotated training data and a corresponding lexicon, and has been optimized for high throughput. We validated TaggerOne with multiple gold-standard corpora containing both mention- and concept-level annotations. Benchmarking results show that TaggerOne achieves high performance on diseases (NCBI Disease corpus, NER f-score: 0.829, normalization f-score: 0.807) and chemicals (BioCreative 5 CDR corpus, NER f-score: 0.914, normalization f-score 0.895). These results compare favorably to the previous state of the art, notwithstanding the greater flexibility of the model. We conclude that jointly modeling NER and normalization greatly improves performance. The TaggerOne source code and an online demonstration are available at: http://www.ncbi.nlm.nih.gov/bionlp/taggerone Contact: zhiyong.lu@nih.gov Supplementary data are available at Bioinformatics online. Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the US.
Neural systems underlying lexical retrieval for sign language.

PubMed

Emmorey, Karen; Grabowski, Thomas; McCullough, Stephen; Damasio, Hanna; Ponto, Laura L B; Hichwa, Richard D; Bellugi, Ursula

2003-01-01

Positron emission tomography was used to investigate whether signed languages exhibit the same neural organization for lexical retrieval within classical and non-classical language areas as has been described for spoken English. Ten deaf native signers of American Sign Language (ASL) were shown pictures of unique entities (famous persons) and non-unique entities (animals) and were asked to name each stimulus with an overt signed response. Proper name signed responses to famous people were fingerspelled, and common noun responses to animals were both fingerspelled and signed with native ASL signs. In general, retrieving ASL signs activated neural sites similar to those activated by hearing subjects retrieving English words. Naming famous persons activated the left temporal pole (TP), whereas naming animals (whether fingerspelled or signed) activated left inferotemporal (IT) cortex. The retrieval of fingerspelled and native signs generally engaged the same cortical regions, but fingerspelled signs in addition activated a premotor region, perhaps due to the increased motor planning and sequencing demanded by fingerspelling. Native signs activated portions of the left supramarginal gyrus (SMG), an area previously implicated in the retrieval of phonological features of ASL signs. Overall, the findings indicate that similar neuroanatomical areas are involved in lexical retrieval for both signs and words. Copyright 2003 Elsevier Science Ltd.
Semantic annotation of consumer health questions.

PubMed

Kilicoglu, Halil; Ben Abacha, Asma; Mrabet, Yassine; Shooshan, Sonya E; Rodriguez, Laritza; Masterton, Kate; Demner-Fushman, Dina

2018-02-06

Consumers increasingly use online resources for their health information needs. While current search engines can address these needs to some extent, they generally do not take into account that most health information needs are complex and can only fully be expressed in natural language. Consumer health question answering (QA) systems aim to fill this gap. A major challenge in developing consumer health QA systems is extracting relevant semantic content from the natural language questions (question understanding). To develop effective question understanding tools, question corpora semantically annotated for relevant question elements are needed. In this paper, we present a two-part consumer health question corpus annotated with several semantic categories: named entities, question triggers/types, question frames, and question topic. The first part (CHQA-email) consists of relatively long email requests received by the U.S. National Library of Medicine (NLM) customer service, while the second part (CHQA-web) consists of shorter questions posed to MedlinePlus search engine as queries. Each question has been annotated by two annotators. The annotation methodology is largely the same between the two parts of the corpus; however, we also explain and justify the differences between them. Additionally, we provide information about corpus characteristics, inter-annotator agreement, and our attempts to measure annotation confidence in the absence of adjudication of annotations. The resulting corpus consists of 2614 questions (CHQA-email: 1740, CHQA-web: 874). Problems are the most frequent named entities, while treatment and general information questions are the most common question types. Inter-annotator agreement was generally modest: question types and topics yielded highest agreement, while the agreement for more complex frame annotations was lower. Agreement in CHQA-web was consistently higher than that in CHQA-email. 
Pairwise inter-annotator agreement proved most useful in estimating annotation confidence. To our knowledge, our corpus is the first focusing on annotation of uncurated consumer health questions. It is currently used to develop machine learning-based methods for question understanding. We make the corpus publicly available to stimulate further research on consumer health QA.
Neutrophilic leukocyte membrane proteins. I. Isolation.

PubMed

Hawkins, D; Sauvé, M

1978-03-01

Rabbit exudate-derived PMN were homogenized and the cell membranes isolated on a two-phase aqueous system. Glycoproteins were extracted from cell membranes with lithium diiodosalicylate. SDS polyacrylamide gel electrophoretic analysis showed a consistent pattern of three major glycoprotein entities. Cells radioiodinated supravitally showed most of the radioactivity associated with larger glycoprotein entities whereas PMN membranes radiolabeled after isolation yielded a single major peak of radioactivity associated with a much smaller protein entity. Heterologous antisera against rabbit PMN, PMN membranes, and membrane glycoproteins were all cytotoxic for PMN in the presence of complement, and all bound to the PMN surface as demonstrated with immunocolloidal gold on electron microscopy. The data suggest that one or more glycoprotein entities are membrane-associated ectoglycoproteins which can be radiolabeled supravitally.
Liberal Entity Extraction: Rapid Construction of Fine-Grained Entity Typing Systems.

PubMed

Huang, Lifu; May, Jonathan; Pan, Xiaoman; Ji, Heng; Ren, Xiang; Han, Jiawei; Zhao, Lin; Hendler, James A

2017-03-01

The ability of automatically recognizing and typing entities in natural language without prior knowledge (e.g., predefined entity types) is a major challenge in processing such data. Most existing entity typing systems are limited to certain domains, genres, and languages. In this article, we propose a novel unsupervised entity-typing framework by combining symbolic and distributional semantics. We start from learning three types of representations for each entity mention: general semantic representation, specific context representation, and knowledge representation based on knowledge bases. Then we develop a novel joint hierarchical clustering and linking algorithm to type all mentions using these representations. This framework does not rely on any annotated data, predefined typing schema, or handcrafted features; therefore, it can be quickly adapted to a new domain, genre, and/or language. Experiments on genres (news and discussion forum) show comparable performance with state-of-the-art supervised typing systems trained from a large amount of labeled data. Results on various languages (English, Chinese, Japanese, Hausa, and Yoruba) and domains (general and biomedical) demonstrate the portability of our framework.
Liberal Entity Extraction: Rapid Construction of Fine-Grained Entity Typing Systems

PubMed Central

Huang, Lifu; May, Jonathan; Pan, Xiaoman; Ji, Heng; Ren, Xiang; Han, Jiawei; Zhao, Lin; Hendler, James A.

2017-01-01

Abstract The ability of automatically recognizing and typing entities in natural language without prior knowledge (e.g., predefined entity types) is a major challenge in processing such data. Most existing entity typing systems are limited to certain domains, genres, and languages. In this article, we propose a novel unsupervised entity-typing framework by combining symbolic and distributional semantics. We start from learning three types of representations for each entity mention: general semantic representation, specific context representation, and knowledge representation based on knowledge bases. Then we develop a novel joint hierarchical clustering and linking algorithm to type all mentions using these representations. This framework does not rely on any annotated data, predefined typing schema, or handcrafted features; therefore, it can be quickly adapted to a new domain, genre, and/or language. Experiments on genres (news and discussion forum) show comparable performance with state-of-the-art supervised typing systems trained from a large amount of labeled data. Results on various languages (English, Chinese, Japanese, Hausa, and Yoruba) and domains (general and biomedical) demonstrate the portability of our framework. PMID:28328252
Unsupervised Medical Entity Recognition and Linking in Chinese Online Medical Text

PubMed Central

Gan, Liang; Cheng, Mian; Wu, Quanyuan

2018-01-01

Online medical text is full of references to medical entities (MEs), which are valuable in many applications, including medical knowledge base (KB) construction, decision support systems, and the treatment of diseases. However, the diverse and ambiguous nature of the surface forms makes ME identification very difficult. Many existing solutions have focused on supervised approaches, which are often task-dependent. In other words, applying them to different kinds of corpora or identifying new entity categories requires major effort in data annotation and feature definition. In this paper, we propose unMERL, an unsupervised framework for recognizing and linking medical entities mentioned in Chinese online medical text. For ME recognition, unMERL first exploits a knowledge-driven approach to extract candidate entities from free text. Then, the categories of the candidate entities are determined using a distributed semantic-based approach. For ME linking, we propose a collaborative inference approach which takes full advantage of heterogeneous entity knowledge and unstructured information in KB. Experimental results on real corpora demonstrate significant benefits compared to recent approaches with respect to both ME recognition and linking. PMID:29849994
Disambiguating the species of biomedical named entities using natural language parsers

PubMed Central

Wang, Xinglong; Tsujii, Jun'ichi; Ananiadou, Sophia

2010-01-01

Motivation: Text mining technologies have been shown to reduce the laborious work involved in organizing the vast amount of information hidden in the literature. One challenge in text mining is linking ambiguous word forms to unambiguous biological concepts. This article reports on a comprehensive study on resolving the ambiguity in mentions of biomedical named entities with respect to model organisms and presents an array of approaches, with focus on methods utilizing natural language parsers. Results: We build a corpus for organism disambiguation where every occurrence of a protein/gene entity is manually tagged with a species ID, and evaluate a number of methods on it. Promising results are obtained by training a machine learning model on syntactic parse trees, which is then used to decide whether an entity belongs to the model organism denoted by a neighbouring species-indicating word (e.g. yeast). The parser-based approaches are also compared with a supervised classification method and results indicate that the former are a more favorable choice when domain portability is of concern. The best overall performance is obtained by combining the strengths of syntactic features and supervised classification. Availability: The corpus and demo are available at http://www.nactem.ac.uk/deca_details/start.cgi, and the software is freely available as U-Compare components (Kano et al., 2009): NaCTeM Species Word Detector and NaCTeM Species Disambiguator. U-Compare is available at http://u-compare.org/ Contact: xinglong.wang@manchester.ac.uk PMID:20053840

Autism, Context/Noncontext Information Processing, and Atypical Development

PubMed Central

Skoyles, John R.

2011-01-01

Autism has been attributed to a deficit in contextual information processing. Attempts to understand autism in terms of such a defect, however, do not include more recent computational work upon context. This work has identified that context information processing depends upon the extraction and use of the information hidden in higher-order (or indirect) associations. Higher-order associations underlie the cognition of context rather than that of situations. This paper starts by examining the differences between higher-order and first-order (or direct) associations. Higher-order associations link entities not directly (as with first-order ones) but indirectly through all the connections they have via other entities. Extracting this information requires the processing of past episodes as a totality. As a result, this extraction depends upon specialised extraction processes separate from cognition. This information is then consolidated. Due to this difference, the extraction/consolidation of higher-order information can be impaired whilst cognition remains intact. Although not directly impaired, cognition will be indirectly impaired by knock on effects such as cognition compensating for absent higher-order information with information extracted from first-order associations. This paper discusses the implications of this for the inflexible, literal/immediate, and inappropriate information processing of autistic individuals. PMID:22937255
Detection and categorization of bacteria habitats using shallow linguistic analysis

PubMed Central

2015-01-01

Background: Information regarding bacteria biotopes is important for several research areas including health sciences, microbiology, and food processing and preservation. One of the challenges for scientists in these domains is the huge amount of information buried in the text of electronic resources. Developing methods to automatically extract bacteria habitat relations from the text of these electronic resources is crucial for facilitating research in these areas. Methods: We introduce a linguistically motivated rule-based approach for recognizing and normalizing names of bacteria habitats in biomedical text by using an ontology. Our approach is based on a shallow syntactic analysis of the text that includes sentence segmentation, part-of-speech (POS) tagging, partial parsing, and lemmatization. In addition, we propose two methods for identifying bacteria habitat localization relations. The underlying assumption for the first method is that discourse changes with a new paragraph. Therefore, it operates on a paragraph basis. The second method performs a more fine-grained analysis of the text and operates on a sentence basis. We also develop a novel anaphora resolution method for bacteria coreferences and incorporate it into the sentence-based relation extraction approach. Results: We participated in the Bacteria Biotope (BB) Task of the BioNLP Shared Task 2013. Our system (Boun) achieved the second best performance with 68% Slot Error Rate (SER) in Sub-task 1 (Entity Detection and Categorization), and ranked third with an F-score of 27% in Sub-task 2 (Localization Event Extraction). This paper reports the system that was implemented for the shared task, including the novel methods developed and the improvements obtained after the official evaluation. The extensions include the expansion of the OntoBiotope ontology using the training set for Sub-task 1, and the novel sentence-based relation extraction method incorporated with anaphora resolution for Sub-task 2. 
These extensions resulted in promising results for Sub-task 1 with a SER of 68%, and state-of-the-art performance for Sub-task 2 with an F-score of 53%. Conclusions: Our results show that a linguistically-oriented approach based on the shallow syntactic analysis of the text is as effective as machine learning approaches for the detection and ontology-based normalization of habitat entities. Furthermore, the newly developed sentence-based relation extraction system with the anaphora resolution module significantly outperforms the paragraph-based one, as well as the other systems that participated in the BB Shared Task 2013. PMID:26201262
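Because the abstract reports Slot Error Rate next to F-scores, it may help to recall that SER is an error measure, so lower is better. The sketch below uses the commonly cited definition (substitutions plus deletions plus insertions, divided by the number of reference slots); the function name and the counts are illustrative assumptions and are not taken from the paper or from the shared-task scorer.

```python
def slot_error_rate(substitutions: float, deletions: float,
                    insertions: float, reference_slots: int) -> float:
    """Commonly used SER definition: (S + D + I) / N.

    N is the number of slots (here: entities) in the reference annotation.
    Unlike an F-score, lower values are better and SER can exceed 1.0.
    """
    if reference_slots <= 0:
        raise ValueError("reference_slots must be positive")
    return (substitutions + deletions + insertions) / reference_slots


# Hypothetical counts chosen only to illustrate the arithmetic.
print(slot_error_rate(substitutions=20, deletions=30, insertions=18,
                      reference_slots=100))
```

Shared-task scorers may weight partial matches differently, so this should be read as the textbook form rather than the official evaluation.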
Knowledge environments representing molecular entities for the virtual physiological human.

PubMed

Hofmann-Apitius, Martin; Fluck, Juliane; Furlong, Laura; Fornes, Oriol; Kolárik, Corinna; Hanser, Susanne; Boeker, Martin; Schulz, Stefan; Sanz, Ferran; Klinger, Roman; Mevissen, Theo; Gattermayer, Tobias; Oliva, Baldo; Friedrich, Christoph M

2008-09-13

In essence, the virtual physiological human (VPH) is a multiscale representation of human physiology spanning from the molecular level via cellular processes and multicellular organization of tissues to complex organ function. The different scales of the VPH deal with different entities, relationships and processes, and in consequence the models used to describe and simulate biological functions vary significantly. Here, we describe methods and strategies to generate knowledge environments representing molecular entities that can be used for modelling the molecular scale of the VPH. Our strategy to generate knowledge environments representing molecular entities is based on the combination of information extraction from scientific text and the integration of information from biomolecular databases. We introduce @neuLink, a first prototype of an automatically generated, disease-specific knowledge environment combining biomolecular, chemical, genetic and medical information. Finally, we provide a perspective for the future implementation and use of knowledge environments representing molecular entities for the VPH.
A user-friendly tool for medical-related patent retrieval.

PubMed

Pasche, Emilie; Gobeill, Julien; Teodoro, Douglas; Gaudinat, Arnaud; Vishnyakova, Dina; Lovis, Christian; Ruch, Patrick

2012-01-01

Health-related information retrieval is complicated by the variety of nomenclatures available to name entities, since different communities of users name the same entity in different ways. In this report, we present the development and evaluation of a user-friendly interactive Web application that aims to facilitate health-related patent search. Our tool, called TWINC, relies on a search engine tuned during several patent retrieval competitions, enhanced with intelligent interaction modules, such as chemical query, normalization and expansion. While the related-article search functionality showed promising performance, the ad hoc search produced fairly mixed results. Nonetheless, TWINC performed well during the PatOlympics competition and was appreciated by intellectual property experts. This result should be weighed against the limited evaluation sample. We can also assume that it can be customized for corporate search environments to process domain- and company-specific vocabularies, including non-English literature and patent reports.
WHU at TREC KBA Vital Filtering Track 2014

DTIC Science & Technology

2014-11-01

view the problem as a classification problem and use Stanford NLP Toolkit to extract necessary information. Various kinds of features are leveraged to...profile of an entity. Our approach is to view the problem as a classification problem and use Stanford NLP Toolkit to extract necessary information
Challenges in Managing Information Extraction

ERIC Educational Resources Information Center

Shen, Warren H.

2009-01-01

This dissertation studies information extraction (IE), the problem of extracting structured information from unstructured data. Example IE tasks include extracting person names from news articles, product information from e-commerce Web pages, street addresses from emails, and names of emerging music bands from blogs. IE is an increasingly…
Identification of related gene/protein names based on an HMM of name variations.

PubMed

Yeganova, L; Smith, L; Wilbur, W J

2004-04-01

Gene and protein names follow few, if any, true naming conventions and are subject to great variation in different occurrences of the same name. This gives rise to two important problems in natural language processing. First, can one locate the names of genes or proteins in free text, and second, can one determine when two names denote the same gene or protein? The first of these problems is a special case of the problem of named entity recognition, while the second is a special case of the problem of automatic term recognition (ATR). We study the second problem, that of gene or protein name variation. Here we describe a system which, given a query gene or protein name, identifies related gene or protein names in a large list. The system is based on a dynamic programming algorithm for sequence alignment in which the mutation matrix is allowed to vary under the control of a fully trainable hidden Markov model.
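The dynamic-programming alignment mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' system: the paper lets a trainable hidden Markov model control the mutation matrix, whereas here a fixed, hand-written cost function (`toy_sub_cost`, a hypothetical stand-in) plays that role, and the gap cost is an arbitrary assumption.

```python
from typing import Callable


def alignment_cost(a: str, b: str,
                   sub_cost: Callable[[str, str], float],
                   gap_cost: float = 1.0) -> float:
    """Global alignment cost between two names via dynamic programming."""
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * gap_cost          # delete all of a[:i]
    for j in range(1, cols):
        dp[0][j] = j * gap_cost          # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            dp[i][j] = min(
                dp[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),  # match or substitute
                dp[i - 1][j] + gap_cost,                          # gap in b
                dp[i][j - 1] + gap_cost,                          # gap in a
            )
    return dp[-1][-1]


def toy_sub_cost(x: str, y: str) -> float:
    """Hypothetical fixed costs: case and hyphen/space changes are cheap."""
    if x == y:
        return 0.0
    if x.lower() == y.lower():
        return 0.1
    if x in "- " and y in "- ":
        return 0.1
    return 1.0


# "IL-2" and "il 2" differ only in case and punctuation, so the cost is low.
print(alignment_cost("IL-2", "il 2", toy_sub_cost))
print(alignment_cost("IL-2", "TNF", toy_sub_cost))
```

Ranking a large list of candidate names by this cost against a query name gives the flavour of the retrieval task; learning the costs from data is the part the abstract attributes to the HMM.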
New Paradigm Shift for the Green Synthesis of Antibacterial Silver Nanoparticles Utilizing Plant Extracts

PubMed Central

2014-01-01

This review covers general information regarding the green synthesis of antibacterial silver nanoparticles. Owing to their antibacterial properties, silver nanoparticles are widely used in many areas, especially biomedical applications. In green synthesis practices, the chemical reducing agents are eliminated, and biological entities are utilized to convert silver ions to silver nanoparticles. Among the various biological entities, natural plant extracts have emerged as green reducing agents, providing eco-friendly routes for the preparation of silver nanomaterials. The most obvious merits of green synthesis are the increased biocompatibility of the resulting silver nanoparticles and the ease with which the reaction can be carried out. This review summarizes some of the plant extracts that are used to produce antibacterial silver nanoparticles. Additionally, background information regarding the green synthesis and antibacterial activity of silver nanoparticles is provided. Finally, the toxicological aspects of silver nanoparticles are briefly mentioned. PMID:25343010
Stochastic Process Creation

NASA Astrophysics Data System (ADS)

Esparza, Javier

In many areas of computer science entities can “reproduce”, “replicate”, or “create new instances”. Paramount examples are threads in multithreaded programs, processes in operating systems, and computer viruses, but many others exist: procedure calls create new incarnations of the callees, web crawlers discover new pages to be explored (and so “create” new tasks), divide-and-conquer procedures split a problem into subproblems, and leaves of tree-based data structures become internal nodes with children. For lack of a better name, I use the generic term systems with process creation to refer to all these entities.
Named Entity Recognition in a Hungarian NL Based QA System

NASA Astrophysics Data System (ADS)

Tikk, Domonkos; Szidarovszky, P. Ferenc; Kardkovacs, Zsolt T.; Magyar, Gábor

In the WoW project, our purpose is to create a complex search interface with the following features: search in the deep web content of contracted partners' databases; processing Hungarian natural language (NL) questions and transforming them into SQL queries for database access; and image search supported by a visual thesaurus that describes the visual content of images in a structured form (also in Hungarian). This paper focuses primarily on one particular problem of the question processing task: entity recognition. Before going into details, we give a short overview of the project's aims.
Exploring Contextual Models in Chemical Patent Search

NASA Astrophysics Data System (ADS)

Urbain, Jay; Frieder, Ophir

We explore the development of probabilistic retrieval models for integrating term statistics with entity search using multiple levels of document context to improve the performance of chemical patent search. A distributed indexing model was developed to enable efficient named entity search and aggregation of term statistics at multiple levels of patent structure including individual words, sentences, claims, descriptions, abstracts, and titles. The system can be scaled to an arbitrary number of compute instances in a cloud computing environment to support concurrent indexing and query processing operations on large patent collections.
A Three Dimensional Electronic Retina Architecture.

DTIC Science & Technology

1987-12-01

not guarantee that a biological entity is in fact the best design because of the unique constraining factors of a biological organism and the associated... Report number: AFIT/GCS/ENG/87D-23. Organization: School of Engineering, AFIT/ENG.
Cybersecurity: Utilizing Fusion Centers to Protect State, Local, Tribal, and Territorial Entities Against Cyber Threats

DTIC Science & Technology

2016-09-01

Performing organization: Naval Postgraduate School, Monterey, CA 93943-5000. ...state- and local-level computer networks fertile ground for the cyber adversary. This research focuses on the threat to SLTT computer networks and how...institutions, and banking systems. The array of responsibilities and the cybersecurity threat landscape make state- and local-level computer networks fertile
Training the max-margin sequence model with the relaxed slack variables.

PubMed

Niu, Lingfeng; Wu, Jianmin; Shi, Yong

2012-09-01

Sequence models are widely used in applications such as natural language processing, information extraction, and optical character recognition. In this paper, we propose a new approach to training the max-margin sequence model by relaxing the slack variables. With the canonical feature mapping definition, the relaxed problem is solved by training a multiclass Support Vector Machine (SVM). Compared with state-of-the-art solutions for sequence learning, the new method has the following advantages: first, the sequence training problem is transformed into a multiclass classification problem, which is more widely studied and already has quite a few off-the-shelf training packages; second, the new approach reduces the complexity of training significantly while achieving prediction performance comparable to existing sequence models; third, when the training data are limited, by assigning different slack variables to different microlabel pairs, the new method can use the discriminative information more frugally and produce a more reliable model; finally, by employing kernels in the intermediate multiclass SVM, nonlinear feature spaces can be easily explored. Experimental results on named entity recognition, information extraction, and handwritten letter recognition with public datasets illustrate the efficiency and effectiveness of our method. Copyright © 2012 Elsevier Ltd. All rights reserved.
Graph Learning in Knowledge Bases

DOE Office of Scientific and Technical Information (OSTI.GOV)

Goldberg, Sean; Wang, Daisy Zhe

The amount of text data has been growing exponentially in recent years, giving rise to automatic information extraction methods that store text annotations in a database. The current state-of-the-art structured prediction methods, however, are likely to contain errors and it’s important to be able to manage the overall uncertainty of the database. On the other hand, the advent of crowdsourcing has enabled humans to aid machine algorithms at scale. As part of this project we introduced pi-CASTLE, a system that optimizes and integrates human and machine computing as applied to a complex structured prediction problem involving conditional random fields (CRFs). We proposed strategies grounded in information theory to select a token subset, formulate questions for the crowd to label, and integrate these labelings back into the database using a method of constrained inference. On both a text segmentation task over academic citations and a named entity recognition task over tweets we showed an order of magnitude improvement in accuracy gain over baseline methods.
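One simple way to picture an information-theoretic token selection step is to rank tokens by the entropy of their predicted label distribution and send the most uncertain ones to the crowd. The sketch below assumes that per-token marginals are already available from some CRF; the abstract does not state which criterion pi-CASTLE uses, so entropy ranking, the function names, and the example numbers are all assumptions made for illustration.

```python
import math
from typing import Dict, List


def token_entropy(marginals: Dict[str, float]) -> float:
    """Shannon entropy (in bits) of one token's label distribution."""
    return -sum(p * math.log2(p) for p in marginals.values() if p > 0.0)


def select_tokens(token_marginals: List[Dict[str, float]], k: int) -> List[int]:
    """Return the positions of the k most uncertain tokens, in text order."""
    ranked = sorted(range(len(token_marginals)),
                    key=lambda i: token_entropy(token_marginals[i]),
                    reverse=True)
    return sorted(ranked[:k])


# Hypothetical marginals for a three-token tweet.
marginals = [
    {"PER": 0.98, "O": 0.02},   # confident
    {"PER": 0.50, "O": 0.50},   # maximally uncertain
    {"PER": 0.70, "O": 0.30},   # somewhat uncertain
]
print(select_tokens(marginals, k=2))
```

Crowd answers for the selected positions could then be fixed as constraints before re-running inference, which is the role the abstract assigns to constrained inference.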
Discovering Peripheral Arterial Disease Cases from Radiology Notes Using Natural Language Processing

PubMed Central

Savova, Guergana K.; Fan, Jin; Ye, Zi; Murphy, Sean P.; Zheng, Jiaping; Chute, Christopher G.; Kullo, Iftikhar J.

2010-01-01

As part of the Electronic Medical Records and Genomics Network, we applied, extended and evaluated an open source clinical Natural Language Processing system, Mayo’s Clinical Text Analysis and Knowledge Extraction System, for the discovery of peripheral arterial disease cases from radiology reports. The manually created gold standard consisted of 223 positive, 19 negative, 63 probable and 150 unknown cases. Overall accuracy agreement between the system and the gold standard was 0.93 as compared to a named entity recognition baseline of 0.46. Sensitivity for the positive, probable and unknown cases was 0.93–0.96, and for the negative cases was 0.72. Specificity and negative predictive value for all categories were in the 90’s. The positive predictive value for the positive and unknown categories was in the high 90’s, for the negative category was 0.84, and for the probable category was 0.63. We outline the main sources of errors and suggest improvements. PMID:21347073
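The four measures reported in the abstract follow directly from a per-category confusion table, treating one category (for example "positive") against all others. The sketch below shows the standard definitions; the counts are hypothetical and are not the study's data.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard one-vs-rest measures for a single category."""
    return {
        "sensitivity": tp / (tp + fn),   # share of true cases that were found
        "specificity": tn / (tn + fp),   # share of non-cases correctly rejected
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Hypothetical counts for one category, for illustration only.
for name, value in diagnostic_metrics(tp=90, fp=5, tn=95, fn=10).items():
    print(f"{name}: {value:.3f}")
```

With a small category, such as the 19 negative cases in the gold standard, a handful of errors moves these ratios noticeably, which is worth keeping in mind when reading the per-category figures.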
Mesh Oriented datABase

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tautges, Timothy J.

MOAB is a component for representing and evaluating mesh data. MOAB can store structured and unstructured mesh, consisting of elements in the finite element "zoo". The functional interface to MOAB is simple yet powerful, allowing the representation of many types of metadata commonly found on the mesh. MOAB is optimized for efficiency in space and time, based on access to mesh in chunks rather than through individual entities, while also versatile enough to support individual entity access. The MOAB data model consists of a mesh interface instance, mesh entities (vertices and elements), sets, and tags. Entities are addressed through handles rather than pointers, to allow the underlying representation of an entity to change without changing the handle to that entity. Sets are arbitrary groupings of mesh entities and other sets. Sets also support parent/child relationships as a relation distinct from sets containing other sets. The directed graph provided by set parent/child relationships is useful for modeling topological relations from a geometric model or other metadata. Tags are named data which can be assigned to the mesh as a whole, individual entities, or sets. Tags are a mechanism for attaching data to individual entities and sets are a mechanism for describing relations between entities; the combination of these two mechanisms is a powerful yet simple interface for representing metadata or application-specific data. For example, sets and tags can be used together to describe geometric topology, boundary condition, and inter-processor interface groupings in a mesh. MOAB is used in several ways in various applications. MOAB serves as the underlying mesh data representation in the VERDE mesh verification code. MOAB can also be used as a mesh input mechanism, using mesh readers included with MOAB, or as a translator between mesh formats, using readers and writers included with MOAB.
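The data model described above (entities addressed by handles, sets with parent/child links, and named tags) can be pictured with a small toy class. This is not the MOAB API: the class `ToyMesh`, its method names, and the tag name `MATERIAL_ID` are hypothetical and exist only to make the four concepts concrete.

```python
from collections import defaultdict


class ToyMesh:
    """Toy handle-based mesh store with entities, sets, and tags."""

    def __init__(self):
        self._next_handle = 1
        self._entities = {}                  # handle -> (kind, payload)
        self._sets = {}                      # set handle -> member handles
        self._children = defaultdict(set)    # parent set -> child sets
        self._tags = defaultdict(dict)       # tag name -> {handle: value}

    def _new_handle(self) -> int:
        handle = self._next_handle
        self._next_handle += 1
        return handle

    def create_vertex(self, coords) -> int:
        handle = self._new_handle()
        self._entities[handle] = ("vertex", tuple(coords))
        return handle

    def create_element(self, kind: str, vertex_handles) -> int:
        handle = self._new_handle()
        self._entities[handle] = (kind, tuple(vertex_handles))
        return handle

    def create_set(self, members=()) -> int:
        handle = self._new_handle()
        self._sets[handle] = set(members)
        return handle

    def add_parent_child(self, parent: int, child: int) -> None:
        # Parent/child is a separate relation from set membership.
        self._children[parent].add(child)

    def tag_set(self, name: str, handle: int, value) -> None:
        self._tags[name][handle] = value

    def tag_get(self, name: str, handle: int):
        return self._tags[name][handle]

    def members(self, set_handle: int) -> set:
        return set(self._sets[set_handle])

    def children(self, parent: int) -> set:
        return set(self._children[parent])


mesh = ToyMesh()
verts = [mesh.create_vertex(c) for c in [(0, 0, 0), (1, 0, 0), (0, 1, 0)]]
tri = mesh.create_element("triangle", verts)
surface = mesh.create_set([tri])          # set groups entities
volume = mesh.create_set()                # empty set used as a topology node
mesh.add_parent_child(volume, surface)    # volume bounds surface
mesh.tag_set("MATERIAL_ID", surface, 7)   # named data attached to a set
print(mesh.members(surface), mesh.children(volume),
      mesh.tag_get("MATERIAL_ID", surface))
```

Because callers hold only integer handles, the internal storage could be swapped (for example to contiguous arrays) without invalidating them, which is the motivation the description gives for handles over pointers.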
A crowdsourcing workflow for extracting chemical-induced disease relations from free text

PubMed Central

Li, Tong Shu; Bravo, Àlex; Furlong, Laura I.; Good, Benjamin M.; Su, Andrew I.

2016-01-01

Relations between chemicals and diseases are one of the most queried biomedical interactions. Although expert manual curation is the standard method for extracting these relations from the literature, it is expensive and impractical to apply to large numbers of documents, and therefore alternative methods are required. We describe here a crowdsourcing workflow for extracting chemical-induced disease relations from free text as part of the BioCreative V Chemical Disease Relation challenge. Five non-expert workers on the CrowdFlower platform were shown each potential chemical-induced disease relation highlighted in the original source text and asked to make binary judgments about whether the text supported the relation. Worker responses were aggregated through voting, and relations receiving four or more votes were predicted as true. On the official evaluation dataset of 500 PubMed abstracts, the crowd attained a 0.505 F-score (0.475 precision, 0.540 recall), with a maximum theoretical recall of 0.751 due to errors with named entity recognition. The total crowdsourcing cost was $1290.67 ($2.58 per abstract) and took a total of 7 h. A qualitative error analysis revealed that 46.66% of sampled errors were due to task limitations and gold standard errors, indicating that performance can still be improved. All code and results are publicly available at https://github.com/SuLab/crowd_cid_relex Database URL: https://github.com/SuLab/crowd_cid_relex PMID:27087308
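The aggregation rule and the headline numbers in the abstract can be checked with a few lines. The vote data below are hypothetical; the threshold of four votes out of five, the precision and recall values, and the cost figures come from the abstract, while the function names are illustrative and not taken from the project's repository.

```python
from typing import Dict, List, Set, Tuple

Relation = Tuple[str, str]  # (chemical id, disease id)


def aggregate_votes(votes: Dict[Relation, List[int]], threshold: int = 4) -> Set[Relation]:
    """Predict a relation as true when at least `threshold` workers voted yes (1)."""
    return {rel for rel, ballots in votes.items() if sum(ballots) >= threshold}


def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical ballots from five workers per candidate relation.
votes = {
    ("chem_A", "disease_X"): [1, 1, 1, 1, 0],   # 4 yes votes: accepted
    ("chem_B", "disease_Y"): [1, 1, 0, 0, 1],   # 3 yes votes: rejected
}
print(aggregate_votes(votes))

# Figures reported in the abstract.
print(round(f_score(0.475, 0.540), 3))   # F-score from precision and recall
print(round(1290.67 / 500, 2))           # cost per abstract in dollars
```

The reported recall ceiling of 0.751 reflects relations that were never shown to workers because named entity recognition missed them, so no voting threshold could recover those cases.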
Taking on Nationalism in the Name of Intercultural Competence

ERIC Educational Resources Information Center

Meadows, Bryan

2010-01-01

Nationalism presents significant challenges to intercultural competence instruction. On the one hand, nationalism promotes the compartmentalization of communities into mutually-exclusive and discretely-defined nationalist entities. In complementary fashion, nationalism also advocates the homogenization of cultural and linguistic practices within…
20 CFR 627.215 - Relocation.

Code of Federal Regulations, 2010 CFR

2010-04-01

... the original location. (b) For 120 days after the commencement or the expansion of commercial... original location. (c) For the purposes of this section, relocating establishment means a business entity... review should include names under which the establishment does business, including successors-in-interest...

Evaluation of Fly Ash Quality Control Tools

DOT National Transportation Integrated Search

2010-06-30

Many entities currently use fly ash in portland cement concrete (PCC) pavements and structures. Although the body of knowledge is great concerning the use of fly ash, several projects per year are subject to poor performance where fly ash is named ...
Evaluation of fly ash quality control tools.

DOT National Transportation Integrated Search

2010-06-30

Many entities currently use fly ash in portland cement concrete (PCC) pavements and structures. Although the body of knowledge is great concerning the use of fly ash, several projects per year are subject to poor performance where fly ash is named ...
Event-based text mining for biology and functional genomics

PubMed Central

Thompson, Paul; Nawaz, Raheel; McNaught, John; Kell, Douglas B.

2015-01-01

The assessment of genome function requires a mapping between genome-derived entities and biochemical reactions, and the biomedical literature represents a rich source of information about reactions between biological components. However, the increasingly rapid growth in the volume of literature provides both a challenge and an opportunity for researchers to isolate information about reactions of interest in a timely and efficient manner. In response, recent text mining research in the biology domain has been largely focused on the identification and extraction of ‘events’, i.e. categorised, structured representations of relationships between biochemical entities, from the literature. Functional genomics analyses necessarily encompass events as so defined. Automatic event extraction systems facilitate the development of sophisticated semantic search applications, allowing researchers to formulate structured queries over extracted events, so as to specify the exact types of reactions to be retrieved. This article provides an overview of recent research into event extraction. We cover annotated corpora on which systems are trained, systems that achieve state-of-the-art performance and details of the community shared tasks that have been instrumental in increasing the quality, coverage and scalability of recent systems. Finally, several concrete applications of event extraction are covered, together with emerging directions of research. PMID:24907365
The J-Staff System, Network Synchronisation and Noise

DTIC Science & Technology

2014-06-01

... work. A key challenge of such structures is their tendency to fall into extreme dynamical modes. One is a ‘two-speed’ mode, where units interacting with...longer term planning, led by the J5 Planning Branch, fall into a slow cycle of work, while those entities interacting predominately with operations
Recurrent neural networks with specialized word embeddings for health-domain named-entity recognition.

PubMed

Jauregi Unanue, Iñigo; Zare Borzeshi, Ehsan; Piccardi, Massimo

2017-12-01

Previous state-of-the-art systems on Drug Name Recognition (DNR) and Clinical Concept Extraction (CCE) have focused on a combination of text "feature engineering" and conventional machine learning algorithms such as conditional random fields and support vector machines. However, developing good features is inherently time-consuming. Conversely, more modern machine learning approaches such as recurrent neural networks (RNNs) have proved capable of automatically learning effective features from either random assignments or automated word "embeddings". (i) To create a highly accurate DNR and CCE system that avoids conventional, time-consuming feature engineering. (ii) To create richer, more specialized word embeddings by using health domain datasets such as MIMIC-III. (iii) To evaluate our systems over three contemporary datasets. Two deep learning methods, namely the Bidirectional LSTM and the Bidirectional LSTM-CRF, are evaluated. A CRF model is set as the baseline to compare the deep learning systems to a traditional machine learning approach. The same features are used for all the models. We have obtained the best results with the Bidirectional LSTM-CRF model, which has outperformed all previously proposed systems. The specialized embeddings have helped to cover unusual words in DrugBank and MedLine, but not in the i2b2/VA dataset. We present a state-of-the-art system for DNR and CCE. Automated word embeddings have allowed us to avoid costly feature engineering and achieve higher accuracy. Nevertheless, the embeddings need to be retrained over datasets that are adequate for the domain, in order to adequately cover the domain-specific vocabulary. Copyright © 2017 Elsevier Inc. All rights reserved.
Information Extraction Using Controlled English to Support Knowledge-Sharing and Decision-Making

DTIC Science & Technology

2012-06-01

...terminology or language variants. CE-based information extraction will greatly facilitate the processes in the cognitive and social domains that enable forces...processor is run to turn the atomic CE into a more “stylistically felicitous” CE, using techniques such as: aggregating all information about an entity
45 CFR 164.508 - Uses and disclosures for which an authorization is required.

Code of Federal Regulations, 2011 CFR

2011-10-01

... is in the form of: (A) A face-to-face communication made by a covered entity to an individual; or (B... meaningful fashion. (ii) The name or other specific identification of the person(s), or class of persons...
45 CFR 164.508 - Uses and disclosures for which an authorization is required.

Code of Federal Regulations, 2010 CFR

2010-10-01

... is in the form of: (A) A face-to-face communication made by a covered entity to an individual; or (B... meaningful fashion. (ii) The name or other specific identification of the person(s), or class of persons...
Information Tailoring Enhancements for Large-Scale Social Data

DTIC Science & Technology

2016-06-15

Intelligent Automation Incorporated Progress Report No. 3, Information Tailoring Enhancements for Large-Scale Social Data. Submitted in accordance with...1 Work Performed within This Reporting Period; 1.1 Enhanced Named Entity Recognition (NER
On the creation of a clinical gold standard corpus in Spanish: Mining adverse drug reactions.

PubMed

Oronoz, Maite; Gojenola, Koldo; Pérez, Alicia; de Ilarraza, Arantza Díaz; Casillas, Arantza

2015-08-01

The advances achieved in Natural Language Processing make it possible to automatically mine information from electronically created documents. Many Natural Language Processing methods that extract information from texts make use of annotated corpora, but these are scarce in the clinical domain due to legal and ethical issues. In this paper we present the creation of the IxaMed-GS gold standard composed of real electronic health records written in Spanish and manually annotated by experts in pharmacology and pharmacovigilance. The experts mainly annotated entities related to diseases and drugs, but also relationships between entities indicating adverse drug reaction events. To help the experts in the annotation task, we adapted a general corpus linguistic analyzer to the medical domain. The quality of the annotation process in the IxaMed-GS corpus has been assessed by measuring the inter-annotator agreement, which was 90.53% for entities and 82.86% for events. In addition, the corpus has been used for the automatic extraction of adverse drug reaction events using machine learning. Copyright © 2015 Elsevier Inc. All rights reserved.
Automatic signal extraction, prioritizing and filtering approaches in detecting post-marketing cardiovascular events associated with targeted cancer drugs from the FDA Adverse Event Reporting System (FAERS).

PubMed

Xu, Rong; Wang, Quanqiu

2014-02-01

Targeted drugs dramatically improve the treatment outcomes in cancer patients; however, these innovative drugs are often associated with unexpectedly high cardiovascular toxicity. Currently, cardiovascular safety represents both a challenging issue for drug developers, regulators, researchers, and clinicians and a concern for patients. While FDA drug labels have captured many of these events, spontaneous reporting systems are a main source for post-marketing drug safety surveillance in 'real-world' (outside of clinical trials) cancer patients. In this study, we present approaches to extracting, prioritizing, filtering, and confirming cardiovascular events associated with targeted cancer drugs from the FDA Adverse Event Reporting System (FAERS). The dataset includes records of 4,285,097 patients from FAERS. We first extracted drug-cardiovascular event (drug-CV) pairs from FAERS through named entity recognition and mapping processes. We then compared six ranking algorithms in prioritizing true positive signals among extracted pairs using known drug-CV pairs derived from FDA drug labels. We also developed three filtering algorithms to further improve precision. Finally, we manually validated extracted drug-CV pairs using 21 million published MEDLINE records. We extracted a total of 11,173 drug-CV pairs from FAERS. We showed that ranking by frequency is significantly more effective than by the five standard signal detection methods (246% improvement in precision for top-ranked pairs). The filtering algorithm we developed further improved overall precision by 91.3%. By manual curation using literature evidence, we show that about 51.9% of the 617 drug-CV pairs that appeared in both FAERS and MEDLINE sentences are true positives. In addition, 80.6% of these positive pairs have not been captured by FDA drug labeling. 
The unique drug-CV association dataset that we created based on FAERS could facilitate our understanding and prediction of cardiotoxic events associated with targeted cancer drugs. Copyright © 2013 Elsevier Inc. All rights reserved.
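The frequency-based ranking and the precision measurement described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy drug and event labels, and the use of precision at a cutoff k are all assumptions made for illustration.

```python
from collections import Counter


def rank_by_frequency(pairs):
    """Rank (drug, event) pairs by how often they occur, most frequent first.

    `pairs` is a list with one entry per report mention (hypothetical input format).
    """
    return [pair for pair, _count in Counter(pairs).most_common()]


def precision_at_k(ranked_pairs, known_pairs, k):
    """Fraction of the top-k ranked pairs that appear in a reference set.

    `known_pairs` stands in for pairs derived from drug labels (an assumption).
    """
    top = ranked_pairs[:k]
    if not top:
        return 0.0
    return sum(1 for pair in top if pair in known_pairs) / len(top)


# Toy data, invented for illustration only.
reports = [
    ("drugA", "MI"), ("drugA", "MI"), ("drugB", "HF"),
    ("drugA", "MI"), ("drugB", "HF"), ("drugC", "MI"),
]
known = {("drugA", "MI"), ("drugC", "MI")}
ranked = rank_by_frequency(reports)
top2_precision = precision_at_k(ranked, known, 2)
```

Here the reference set plays the role of the known drug-CV pairs; a real evaluation would use the full extracted pair list and the label-derived pairs rather than toy values.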
Super-hydrophobicity fundamentals: implications to biofouling prevention.

PubMed

Marmur, Abraham

2006-01-01

The theory of wetting on super-hydrophobic surfaces is presented and discussed, within the general framework of equilibrium wetting and contact angles. Emphasis is put on the implications of super-hydrophobicity to the prevention of biofouling. Two main lines of thought are discussed, viz. i) "mirror imaging" of the Lotus effect, namely designing a surface that repels biological entities by being super-hydrophilic, and ii) designing a surface that minimises the water-wetted area when submerged in water (by keeping an air film between the water and the surface), so that the suspended biological entities have a low probability of encountering the solid surface.
Taking the fifth amendment in Turing's imitation game

NASA Astrophysics Data System (ADS)

Warwick, Kevin; Shah, Huma

2017-03-01

In this paper, we look at a specific issue with practical Turing tests, namely the right of the machine to remain silent during interrogation. In particular, we consider the possibility of a machine passing the Turing test simply by not saying anything. We include a number of transcripts from practical Turing tests in which silence has actually occurred on the part of a hidden entity. Each of the transcripts considered here resulted in a judge being unable to make the 'right identification', i.e., they could not say for certain which hidden entity was the machine.
NERBio: using selected word conjunctions, term normalization, and global patterns to improve biomedical named entity recognition.

PubMed

Tsai, Richard Tzong-Han; Sung, Cheng-Lung; Dai, Hong-Jie; Hung, Hsieh-Chuan; Sung, Ting-Yi; Hsu, Wen-Lian

2006-12-18

Biomedical named entity recognition (Bio-NER) is a challenging problem because, in general, biomedical named entities of the same category (e.g., proteins and genes) do not follow one standard nomenclature. They have many irregularities and sometimes appear in ambiguous contexts. In recent years, machine-learning (ML) approaches have become increasingly common and now represent the cutting edge of Bio-NER technology. This paper addresses three problems faced by ML-based Bio-NER systems. First, most ML approaches usually employ singleton features that comprise one linguistic property (e.g., the current word is capitalized) and at least one class tag (e.g., B-protein, the beginning of a protein name). However, such features may be insufficient in cases where multiple properties must be considered. Adding conjunction features that contain multiple properties can be beneficial, but it would be infeasible to include all conjunction features in an NER model since memory resources are limited and some features are ineffective. To resolve the problem, we use a sequential forward search algorithm to select an effective set of features. Second, variations in the numerical parts of biomedical terms (e.g., "2" in the biomedical term IL2) cause data sparseness and generate many redundant features. In this case, we apply numerical normalization, which solves the problem by replacing all numerals in a term with one representative numeral to help classify named entities. Third, the assignment of NE tags does not depend solely on the target word's closest neighbors, but may depend on words outside the context window (e.g., a context window of five consists of the current word plus two preceding and two subsequent words). We use global patterns generated by the Smith-Waterman local alignment algorithm to identify such structures and modify the results of our ML-based tagger. This is called pattern-based post-processing. 
To develop our ML-based Bio-NER system, we employ conditional random fields, which have performed effectively in several well-known tasks, as our underlying ML model. Adding selected conjunction features, applying numerical normalization, and employing pattern-based post-processing improve the F-scores by 1.67%, 1.04%, and 0.57%, respectively. The combined increase of 3.28% yields a total score of 72.98%, which is better than the baseline system that only uses singleton features. We demonstrate the benefits of using the sequential forward search algorithm to select effective conjunction feature groups. In addition, we show that numerical normalization can effectively reduce the number of redundant and unseen features. Furthermore, the Smith-Waterman local alignment algorithm can help ML-based Bio-NER deal with difficult cases that need longer context windows.
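Two of the ideas in this abstract, numerical normalization and the five-word context window, can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, and the choice to replace each run of digits with the single numeral "1" is an assumption, since the abstract does not say which representative numeral was used or whether digits are replaced individually.

```python
import re


def normalize_numerals(term, representative="1"):
    """Replace every run of digits in a term with one representative numeral.

    Assumption: a whole digit run (e.g. "12") maps to a single numeral.
    """
    return re.sub(r"\d+", representative, term)


def context_window(tokens, index, size=5):
    """Return the current token plus its neighbours.

    With size=5 this is the current word plus two preceding and two
    subsequent words, truncated at the sentence boundaries.
    """
    half = size // 2
    start = max(0, index - half)
    return tokens[start:index + half + 1]


# "IL2" and "IL12" collapse to the same feature string, reducing sparseness.
normalized = [normalize_numerals(t) for t in ["IL2", "IL12", "p53"]]
window = context_window(["a", "b", "c", "d", "e", "f", "g"], 3)
```

After normalization, terms that differ only in their numeric part share one feature, which is the data-sparseness effect the abstract describes.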
Motivation and Organizational Principles for Anatomical Knowledge Representation

PubMed Central

Rosse, Cornelius; Mejino, José L.; Modayur, Bharath R.; Jakobovits, Rex; Hinshaw, Kevin P.; Brinkley, James F.

1998-01-01

Abstract Objective: Conceptualization of the physical objects and spaces that constitute the human body at the macroscopic level of organization, specified as a machine-parseable ontology that, in its human-readable form, is comprehensible to both expert and novice users of anatomical information. Design: Conceived as an anatomical enhancement of the UMLS Semantic Network and Metathesaurus, the anatomical ontology was formulated by specifying defining attributes and differentia for classes and subclasses of physical anatomical entities based on their partitive and spatial relationships. The validity of the classification was assessed by instantiating the ontology for the thorax. Several transitive relationships were used for symbolically modeling aspects of the physical organization of the thorax. Results: By declaring Organ as the macroscopic organizational unit of the body, and defining the entities that constitute organs and higher level entities constituted by organs, all anatomical entities could be assigned to one of three top level classes (Anatomical structure, Anatomical spatial entity and Body substance). The ontology accommodates both the systemic and regional (topographical) views of anatomy, as well as diverse clinical naming conventions of anatomical entities. Conclusions: The ontology formulated for the thorax is extendible to microscopic and cellular levels, as well as to other body parts, in that its classes subsume essentially all anatomical entities that constitute the body. Explicit definitions of these entities and their relationships provide the first requirement for standards in anatomical concept representation. Conceived from an anatomical viewpoint, the ontology can be generalized and mapped to other biomedical domains and problem solving tasks that require anatomical knowledge. PMID:9452983
Type specimens and basic principles of avian taxonomy

USGS Publications Warehouse

Banks, Richard C.; Goodman, Steven M.; Lanyon, Scott M.; Schulenberg, Thomas S.

1993-01-01

"Ornithology" may be defined as the scientific study of birds. No aspect of avian biology, including management and conservation, can be carried out without reference by name to birds at some taxonomic level. Thus, the names of species of birds, and of groups of species, can fairly be considered to be of primary importance in ornithology. To be useful, these names themselves must be defined and related to biological entities. The definition of a name is accomplished by the designation of a "type." The International Code of Zoological Nomenclature, in paragraph (C) of Article 72 (third edition, 1985), establishes criteria for eligibility of a name-bearing type. The type of a species or sub-species name is the biological specimen defined by the name, and later use of the name implies specific or subspecific identity with the type. It is imperative, therefore, that a type be available for study and comparison so that the identity of other material with it can be established.
The BEL information extraction workflow (BELIEF): evaluation in the BioCreative V BEL and IAT track

PubMed Central

Madan, Sumit; Hodapp, Sven; Senger, Philipp; Ansari, Sam; Szostak, Justyna; Hoeng, Julia; Peitsch, Manuel; Fluck, Juliane

2016-01-01

Network-based approaches have become extremely important in systems biology to achieve a better understanding of biological mechanisms. For network representation, the Biological Expression Language (BEL) is well designed to collate findings from the scientific literature into biological network models. To facilitate encoding and biocuration of such findings in BEL, a BEL Information Extraction Workflow (BELIEF) was developed. BELIEF provides a web-based curation interface, the BELIEF Dashboard, that incorporates text mining techniques to support the biocurator in the generation of BEL networks. The underlying UIMA-based text mining pipeline (BELIEF Pipeline) uses several named entity recognition processes and relationship extraction methods to detect concepts and BEL relationships in literature. The BELIEF Dashboard allows easy curation of the automatically generated BEL statements and their context annotations. Resulting BEL statements and their context annotations can be syntactically and semantically verified to ensure consistency in the BEL network. In summary, the workflow supports experts in different stages of systems biology network building. Based on the BioCreative V BEL track evaluation, we show that the BELIEF Pipeline automatically extracts relationships with an F-score of 36.4% and fully correct statements can be obtained with an F-score of 30.8%. Participation in the BioCreative V Interactive task (IAT) track with BELIEF revealed a systems usability scale (SUS) of 67. Considering the complexity of the task for new users—learning BEL, working with a completely new interface, and performing complex curation—a score so close to the overall SUS average highlights the usability of BELIEF. Database URL: BELIEF is available at http://www.scaiview.com/belief/ PMID:27694210
BioNLP Shared Task--The Bacteria Track.

PubMed

Bossy, Robert; Jourde, Julien; Manine, Alain-Pierre; Veber, Philippe; Alphonse, Erick; van de Guchte, Maarten; Bessières, Philippe; Nédellec, Claire

2012-06-26

We present the BioNLP 2011 Shared Task Bacteria Track, the first Information Extraction challenge entirely dedicated to bacteria. It includes three tasks that cover different levels of biological knowledge. The Bacteria Gene Renaming supporting task is aimed at extracting gene renaming and gene name synonymy in PubMed abstracts. The Bacteria Gene Interaction is a gene/protein interaction extraction task from individual sentences. The interactions have been categorized into ten different sub-types, thus giving a detailed account of genetic regulations at the molecular level. Finally, the Bacteria Biotopes task focuses on the localization and environment of bacteria mentioned in textbook articles. We describe the process of creation for the three corpora, including document acquisition and manual annotation, as well as the metrics used to evaluate the participants' submissions. Three teams submitted to the Bacteria Gene Renaming task; the best team achieved an F-score of 87%. For the Bacteria Gene Interaction task, the only participant's score had reached a global F-score of 77%, although the system efficiency varies significantly from one sub-type to another. Three teams submitted to the Bacteria Biotopes task with very different approaches; the best team achieved an F-score of 45%. However, the detailed study of the participating systems efficiency reveals the strengths and weaknesses of each participating system. The three tasks of the Bacteria Track offer participants a chance to address a wide range of issues in Information Extraction, including entity recognition, semantic typing and coreference resolution. We found common trends in the most efficient systems: the systematic use of syntactic dependencies and machine learning. Nevertheless, the originality of the Bacteria Biotopes task encouraged the use of interesting novel methods and techniques, such as term compositionality, scopes wider than the sentence.
Deep Question Answering for protein annotation

PubMed Central

Gobeill, Julien; Gaudinat, Arnaud; Pasche, Emilie; Vishnyakova, Dina; Gaudet, Pascale; Bairoch, Amos; Ruch, Patrick

2015-01-01

Biomedical professionals have access to a huge amount of literature, but when they use a search engine, they often have to deal with too many documents to efficiently find the appropriate information in a reasonable time. In this perspective, question-answering (QA) engines are designed to display answers, which were automatically extracted from the retrieved documents. Standard QA engines in literature process a user question, then retrieve relevant documents and finally extract some possible answers out of these documents using various named-entity recognition processes. In our study, we try to answer complex genomics questions, which can be adequately answered only using Gene Ontology (GO) concepts. Such complex answers cannot be found using state-of-the-art dictionary- and redundancy-based QA engines. We compare the effectiveness of two dictionary-based classifiers for extracting correct GO answers from a large set of 100 retrieved abstracts per question. In the same way, we also investigate the power of GOCat, a GO supervised classifier. GOCat exploits the GOA database to propose GO concepts that were annotated by curators for similar abstracts. This approach is called deep QA, as it adds an original classification step, and exploits curated biological data to infer answers, which are not explicitly mentioned in the retrieved documents. We show that for complex answers such as protein functional descriptions, the redundancy phenomenon has a limited effect. Similarly usual dictionary-based approaches are relatively ineffective. In contrast, we demonstrate how existing curated data, beyond information extraction, can be exploited by a supervised classifier, such as GOCat, to massively improve both the quantity and the quality of the answers with a +100% improvement for both recall and precision. Database URL: http://eagl.unige.ch/DeepQA4PA/ PMID:26384372
The BEL information extraction workflow (BELIEF): evaluation in the BioCreative V BEL and IAT track.

PubMed

Madan, Sumit; Hodapp, Sven; Senger, Philipp; Ansari, Sam; Szostak, Justyna; Hoeng, Julia; Peitsch, Manuel; Fluck, Juliane

2016-01-01

Network-based approaches have become extremely important in systems biology to achieve a better understanding of biological mechanisms. For network representation, the Biological Expression Language (BEL) is well designed to collate findings from the scientific literature into biological network models. To facilitate encoding and biocuration of such findings in BEL, a BEL Information Extraction Workflow (BELIEF) was developed. BELIEF provides a web-based curation interface, the BELIEF Dashboard, that incorporates text mining techniques to support the biocurator in the generation of BEL networks. The underlying UIMA-based text mining pipeline (BELIEF Pipeline) uses several named entity recognition processes and relationship extraction methods to detect concepts and BEL relationships in literature. The BELIEF Dashboard allows easy curation of the automatically generated BEL statements and their context annotations. Resulting BEL statements and their context annotations can be syntactically and semantically verified to ensure consistency in the BEL network. In summary, the workflow supports experts in different stages of systems biology network building. Based on the BioCreative V BEL track evaluation, we show that the BELIEF Pipeline automatically extracts relationships with an F-score of 36.4% and fully correct statements can be obtained with an F-score of 30.8%. Participation in the BioCreative V Interactive task (IAT) track with BELIEF revealed a systems usability scale (SUS) of 67. Considering the complexity of the task for new users (learning BEL, working with a completely new interface, and performing complex curation), a score so close to the overall SUS average highlights the usability of BELIEF. Database URL: BELIEF is available at http://www.scaiview.com/belief/. © The Author(s) 2016. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

MemexGATE: Unearthing Latent Content Features for Improved Search and Relevancy Ranking Across Scientific Literature

NASA Astrophysics Data System (ADS)

Wilson, B. D.; McGibbney, L. J.; Mattmann, C. A.; Ramirez, P.; Joyce, M.; Whitehall, K. D.

2015-12-01

Quantifying scientific relevancy is of increasing importance to NASA and the research community. Scientific relevancy may be defined by mapping the impacts of a particular NASA mission, instrument, and/or retrieved variables to disciplines such as climate predictions, natural hazards detection and mitigation processes, education, and scientific discoveries. Related to relevancy is the ability to expose data with similar attributes. This in turn depends upon our ability to extract latent, implicit document features from scientific data and resources and make them explicit, accessible and useable for search activities amongst others. This paper presents MemexGATE, a server-side application, command line interface and computing environment for running large scale metadata extraction, general architecture text engineering, document classification and indexing tasks over document resources such as social media streams, scientific literature archives, legal documentation, etc. This work builds on existing experiences using MemexGATE (funded, developed and validated through the DARPA Memex Program, PI Mattmann) for extracting and leveraging latent content features from document resources within the Materials Research domain. We extend the software functionality to the domain of scientific literature, with emphasis on the expansion of gazetteer lists, named entity rules, and natural language construct labeling (e.g. synonym, antonym, hyponym, etc.) efforts to enable extraction of latent content features from data hosted by a wide variety of scientific literature vendors (AGU Meeting Abstract Database, Springer, Wiley Online, Elsevier, etc.) hosting earth science literature.
Such literature makes both implicit and explicit references to NASA datasets and relationships between such concepts stored across EOSDIS DAACs; hence we envisage that a significant part of this effort will also include development and understanding of relevancy signals which can ultimately be utilized for improved search and relevancy ranking across scientific literature.
Deep Question Answering for protein annotation.

PubMed

Gobeill, Julien; Gaudinat, Arnaud; Pasche, Emilie; Vishnyakova, Dina; Gaudet, Pascale; Bairoch, Amos; Ruch, Patrick

2015-01-01

Biomedical professionals have access to a huge amount of literature, but when they use a search engine, they often have to deal with too many documents to efficiently find the appropriate information in a reasonable time. In this perspective, question-answering (QA) engines are designed to display answers, which were automatically extracted from the retrieved documents. Standard QA engines in literature process a user question, then retrieve relevant documents and finally extract some possible answers out of these documents using various named-entity recognition processes. In our study, we try to answer complex genomics questions, which can be adequately answered only using Gene Ontology (GO) concepts. Such complex answers cannot be found using state-of-the-art dictionary- and redundancy-based QA engines. We compare the effectiveness of two dictionary-based classifiers for extracting correct GO answers from a large set of 100 retrieved abstracts per question. In the same way, we also investigate the power of GOCat, a GO supervised classifier. GOCat exploits the GOA database to propose GO concepts that were annotated by curators for similar abstracts. This approach is called deep QA, as it adds an original classification step, and exploits curated biological data to infer answers, which are not explicitly mentioned in the retrieved documents. We show that for complex answers such as protein functional descriptions, the redundancy phenomenon has a limited effect. Similarly usual dictionary-based approaches are relatively ineffective. In contrast, we demonstrate how existing curated data, beyond information extraction, can be exploited by a supervised classifier, such as GOCat, to massively improve both the quantity and the quality of the answers with a +100% improvement for both recall and precision. Database URL: http://eagl.unige.ch/DeepQA4PA/. © The Author(s) 2015. Published by Oxford University Press.
DOD’s POW/MIA Mission: Top-Level Leadership Attention Needed to Resolve Longstanding Challenges in Accounting for Missing Persons from Past Conflicts

DTIC Science & Technology

2013-07-01

...persons for whom DOD must account. A committee report accompanying the National Defense Authorization Act for Fiscal Year 2013 mandated GAO to...many organizations and each reports through a different line of authority. Thus, no single entity is responsible for communitywide personnel and
Renal effects of Mammea africana Sabine (Guttiferae) stem bark methanol/methylene chloride extract on L-NAME hypertensive rats.

PubMed

Nguelefack-Mbuyo, Elvine Pami; Dimo, Théophile; Nguelefack, Télesphore Benoit; Dongmo, Alain Bertrand; Kamtchouing, Pierre; Kamanyi, Albert

2010-08-01

The present study aims at evaluating the effects of methanol/methylene chloride extract of the stem bark of Mammea africana on the renal function of L-NAME treated rats. Normotensive male Wistar rats were divided into five groups respectively treated with distilled water, L-NAME (40 mg/kg/day), L-NAME + L-arginine (100 mg/kg/day), L-NAME + captopril (20 mg/kg/day) or L-NAME + M. africana extract (200 mg/kg/day) for 30 days. Systolic blood pressure was measured before and at the end of treatment. Body weight was measured at the end of each week. Urine was collected 6 and 24 h after the first administration and further on day 15 and 30 of treatment for creatinine, sodium and potassium quantification, while plasma was collected at the end of treatment for the creatinine assay. Two-way ANOVA followed by the Bonferroni test, or one-way ANOVA followed by the Tukey test, was used for statistical analysis. M. africana successfully prevented the rise in blood pressure and the acute natriuresis and diuresis induced by L-NAME. When given chronically, the extract produced a sustained antinatriuretic effect, a non-significant increase in urine excretion and reduced the glomerular hyperfiltration induced by L-NAME. The above results suggest that the methanol/methylene chloride extract of the stem bark of M. africana may protect the kidney against renal dysfunction and further demonstrate that its antihypertensive effect does not depend on a diuretic or natriuretic activity.
SKIMMR: facilitating knowledge discovery in life sciences by machine-aided skim reading

PubMed Central

Burns, Gully A.P.C.

2014-01-01

Background. Unlike full reading, ‘skim-reading’ involves the process of looking quickly over information in an attempt to cover more material whilst still being able to retain a superficial view of the underlying content. Within this work, we specifically emulate this natural human activity by providing a dynamic graph-based view of entities automatically extracted from text. For the extraction, we use shallow parsing, co-occurrence analysis and semantic similarity computation techniques. Our main motivation is to assist biomedical researchers and clinicians in coping with increasingly large amounts of potentially relevant articles that are being published ongoingly in life sciences. Methods. To construct the high-level network overview of articles, we extract weighted binary statements from the text. We consider two types of these statements, co-occurrence and similarity, both organised in the same distributional representation (i.e., in a vector-space model). For the co-occurrence weights, we use point-wise mutual information that indicates the degree of non-random association between two co-occurring entities. For computing the similarity statement weights, we use cosine distance based on the relevant co-occurrence vectors. These statements are used to build fuzzy indices of terms, statements and provenance article identifiers, which support fuzzy querying and subsequent result ranking. These indexing and querying processes are then used to construct a graph-based interface for searching and browsing entity networks extracted from articles, as well as articles relevant to the networks being browsed. Last but not least, we describe a methodology for automated experimental evaluation of the presented approach. The method uses formal comparison of the graphs generated by our tool to relevant gold standards based on manually curated PubMed, TREC challenge and MeSH data. Results. 
We provide a web-based prototype (called ‘SKIMMR’) that generates a network of inter-related entities from a set of documents which a user may explore through our interface. When a particular area of the entity network looks interesting to a user, the tool displays the documents that are the most relevant to those entities of interest currently shown in the network. We present this as a methodology for browsing a collection of research articles. To illustrate the practical applicability of SKIMMR, we present examples of its use in the domains of Spinal Muscular Atrophy and Parkinson’s Disease. Finally, we report on the results of experimental evaluation using the two domains and one additional dataset based on the TREC challenge. The results show how the presented method for machine-aided skim reading outperforms tools like PubMed regarding focused browsing and informativeness of the browsing context. PMID:25097821
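The two statement weights described in this abstract (point-wise mutual information for co-occurrence, and a cosine measure over co-occurrence vectors for similarity) can be sketched as follows. This is a minimal illustration, not SKIMMR's implementation: counting co-occurrence at whole-document level, the function names, and the toy entity labels are all assumptions. The code returns cosine similarity; the corresponding cosine distance is 1 minus that value.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_weights(documents):
    """Return PMI (log base 2) for every entity pair that co-occurs in a document.

    Assumption: each document is a list of already-extracted entity strings and
    co-occurrence is counted once per document.
    """
    n_docs = len(documents)
    entity_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        entities = sorted(set(doc))
        entity_counts.update(entities)
        pair_counts.update(combinations(entities, 2))
    weights = {}
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / n_docs
        p_a = entity_counts[a] / n_docs
        p_b = entity_counts[b] / n_docs
        weights[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return weights


def cooccurrence_vector(entity, weights):
    """Collect the PMI weights of one entity into a sparse vector (dict)."""
    vector = {}
    for (a, b), w in weights.items():
        if a == entity:
            vector[b] = w
        elif b == entity:
            vector[a] = w
    return vector


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy input (hypothetical entity labels, not data from the paper).
docs = [["SMN1", "SMA"], ["SMN1", "SMA"], ["SMN2", "SMA"], ["LRRK2", "PD"]]
weights = pmi_weights(docs)
similarity = cosine_similarity(
    cooccurrence_vector("SMN1", weights),
    cooccurrence_vector("SMN2", weights),
)
```

In the toy input, the two entities that share the same single neighbour come out as maximally similar, which is the kind of similarity statement the abstract describes placing alongside co-occurrence statements.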
To the Question about the Quality of Economic Education

ERIC Educational Resources Information Center

Dyshaeva, Lyudmila

2015-01-01

The article discusses the shortcomings of the methodology of neoclassical theory as the basic theory determining the content of the contemporary economic theory course at Russian educational institutions, namely the unrealistic conditions of perfect competition, rationality of economic behavior of business entities, completeness and authenticity of…
47 CFR 27.1170 - Payment Issues.

Code of Federal Regulations, 2010 CFR

2010-10-01

... is required to file a notice containing site-specific data with the clearinghouse. The notice... name of the transmitting base station, the geographic coordinates corresponding to that base station... the site-data filing requirement by submitting a copy of their PCN to the clearinghouse. AWS entities...
78 FR 52553 - Privacy Act of 1974; Department of Homeland Security/ALL-035 Common Entity Index Prototype System...

Federal Register 2010, 2011, 2012, 2013, 2014

2013-08-23

... data elements: Full Name; Alias(es); Gender; Date of Birth; Country of Birth; Country of Citizenship... locked drawer behind a locked door. The records may be stored on magnetic disc, tape, or digital media...
PLAN2L: a web tool for integrated text mining and literature-based bioentity relation extraction.

PubMed

Krallinger, Martin; Rodriguez-Penagos, Carlos; Tendulkar, Ashish; Valencia, Alfonso

2009-07-01

There is an increasing interest in using literature mining techniques to complement information extracted from annotation databases or generated by bioinformatics applications. Here we present PLAN2L, a web-based online search system that integrates text mining and information extraction techniques to access systematically information useful for analyzing genetic, cellular and molecular aspects of the plant model organism Arabidopsis thaliana. Our system facilitates a more efficient retrieval of information relevant to heterogeneous biological topics, from implications in biological relationships at the level of protein interactions and gene regulation, to sub-cellular locations of gene products and associations to cellular and developmental processes, i.e. cell cycle, flowering, root, leaf and seed development. Beyond single entities, also predefined pairs of entities can be provided as queries for which literature-derived relations together with textual evidences are returned. PLAN2L does not require registration and is freely accessible at http://zope.bioinfo.cnio.es/plan2l.
DISCRN: A Distributed Storytelling Framework for Intelligence Analysis.

PubMed

Shukla, Manu; Dos Santos, Raimundo; Chen, Feng; Lu, Chang-Tien

2017-09-01

Storytelling connects entities (people, organizations) using their observed relationships to establish meaningful storylines. This can be extended to spatiotemporal storytelling that incorporates locations, time, and graph computations to enhance coherence and meaning. But when performed sequentially these computations become a bottleneck because the massive number of entities make space and time complexity untenable. This article presents DISCRN, or distributed spatiotemporal ConceptSearch-based storytelling, a distributed framework for performing spatiotemporal storytelling. The framework extracts entities from microblogs and event data, and links these entities using a novel ConceptSearch to derive storylines in a distributed fashion utilizing key-value pair paradigm. Performing these operations at scale allows deeper and broader analysis of storylines. The novel parallelization techniques speed up the generation and filtering of storylines on massive datasets. Experiments with microblog posts such as Twitter data and Global Database of Events, Language, and Tone events show the efficiency of the techniques in DISCRN.
TwiMed: Twitter and PubMed Comparable Corpus of Drugs, Diseases, Symptoms, and Their Relations

PubMed Central

Miyao, Yusuke; Collier, Nigel

2017-01-01

Background: Work on pharmacovigilance systems using texts from PubMed and Twitter typically targets different elements and uses different annotation guidelines, resulting in a scenario where there is no comparable set of documents from both Twitter and PubMed annotated in the same manner. Objective: This study aimed to provide a comparable corpus of texts from PubMed and Twitter that can be used to study drug reports from these two sources of information, allowing researchers in the area of pharmacovigilance using natural language processing (NLP) to perform experiments to better understand the similarities and differences between drug reports in Twitter and PubMed. Methods: We produced a corpus comprising 1000 tweets and 1000 PubMed sentences selected using the same strategy and annotated at entity level by the same experts (pharmacists) using the same set of guidelines. Results: The resulting corpus, annotated by two pharmacists, comprises semantically correct annotations for a set of drugs, diseases, and symptoms. This corpus contains the annotations for 3144 entities, 2749 relations, and 5003 attributes. Conclusions: We present a corpus that is unique in its characteristics as this is the first corpus for pharmacovigilance curated from Twitter messages and PubMed sentences using the same data selection and annotation strategies. We believe this corpus will be of particular interest for researchers willing to compare results from pharmacovigilance systems (eg, classifiers and named entity recognition systems) when using data from Twitter and from PubMed. We hope that given the comprehensive set of drug names and the annotated entities and relations, this corpus becomes a standard resource to compare results from different pharmacovigilance studies in the area of NLP. PMID:28468748
TwiMed: Twitter and PubMed Comparable Corpus of Drugs, Diseases, Symptoms, and Their Relations.

PubMed

Alvaro, Nestor; Miyao, Yusuke; Collier, Nigel

2017-05-03

Work on pharmacovigilance systems using texts from PubMed and Twitter typically targets different elements and uses different annotation guidelines, resulting in a scenario where there is no comparable set of documents from both Twitter and PubMed annotated in the same manner. This study aimed to provide a comparable corpus of texts from PubMed and Twitter that can be used to study drug reports from these two sources of information, allowing researchers in the area of pharmacovigilance using natural language processing (NLP) to perform experiments to better understand the similarities and differences between drug reports in Twitter and PubMed. We produced a corpus comprising 1000 tweets and 1000 PubMed sentences selected using the same strategy and annotated at entity level by the same experts (pharmacists) using the same set of guidelines. The resulting corpus, annotated by two pharmacists, comprises semantically correct annotations for a set of drugs, diseases, and symptoms. This corpus contains the annotations for 3144 entities, 2749 relations, and 5003 attributes. We present a corpus that is unique in its characteristics as this is the first corpus for pharmacovigilance curated from Twitter messages and PubMed sentences using the same data selection and annotation strategies. We believe this corpus will be of particular interest for researchers willing to compare results from pharmacovigilance systems (eg, classifiers and named entity recognition systems) when using data from Twitter and from PubMed. We hope that given the comprehensive set of drug names and the annotated entities and relations, this corpus becomes a standard resource to compare results from different pharmacovigilance studies in the area of NLP. ©Nestor Alvaro, Yusuke Miyao, Nigel Collier. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 03.05.2017.
5 CFR 581.203 - Information minimally required to accompany legal process.

Code of Federal Regulations, 2014 CFR

2014-01-01

... accompany legal process. 581.203 Section 581.203 Administrative Personnel OFFICE OF PERSONNEL MANAGEMENT... Process § 581.203 Information minimally required to accompany legal process. (a) Sufficient identifying information must accompany the legal process in order to enable processing by the governmental entity named...
5 CFR 581.203 - Information minimally required to accompany legal process.

Code of Federal Regulations, 2011 CFR

2011-01-01

... accompany legal process. 581.203 Section 581.203 Administrative Personnel OFFICE OF PERSONNEL MANAGEMENT... Process § 581.203 Information minimally required to accompany legal process. (a) Sufficient identifying information must accompany the legal process in order to enable processing by the governmental entity named...
5 CFR 581.203 - Information minimally required to accompany legal process.

Code of Federal Regulations, 2013 CFR

2013-01-01

... accompany legal process. 581.203 Section 581.203 Administrative Personnel OFFICE OF PERSONNEL MANAGEMENT... Process § 581.203 Information minimally required to accompany legal process. (a) Sufficient identifying information must accompany the legal process in order to enable processing by the governmental entity named...
5 CFR 581.203 - Information minimally required to accompany legal process.

Code of Federal Regulations, 2012 CFR

2012-01-01

... accompany legal process. 581.203 Section 581.203 Administrative Personnel OFFICE OF PERSONNEL MANAGEMENT... Process § 581.203 Information minimally required to accompany legal process. (a) Sufficient identifying information must accompany the legal process in order to enable processing by the governmental entity named...
5 CFR 581.203 - Information minimally required to accompany legal process.

Code of Federal Regulations, 2010 CFR

2010-01-01

... accompany legal process. 581.203 Section 581.203 Administrative Personnel OFFICE OF PERSONNEL MANAGEMENT... Process § 581.203 Information minimally required to accompany legal process. (a) Sufficient identifying information must accompany the legal process in order to enable processing by the governmental entity named...
25 CFR 141.23 - Posted statement of ownership.

Code of Federal Regulations, 2010 CFR

2010-04-01

... legible to customers stating the form of the business entity, the names and addresses of all other... Indians BUREAU OF INDIAN AFFAIRS, DEPARTMENT OF THE INTERIOR FINANCIAL ACTIVITIES BUSINESS PRACTICES ON THE NAVAJO, HOPI AND ZUNI RESERVATIONS General Business Practices § 141.23 Posted statement of...
78 FR 50396 - Common Format for Federal Entity Transition Plans

Federal Register 2010, 2011, 2012, 2013, 2014

2013-08-19

..., Associate Administrator, Office of Spectrum Management. [FR Doc. 2013-20149 Filed 8-16-13; 8:45 am] BILLING..., Office of Spectrum Management. Each commenter should include the name of the person or organization... Spectrum Management, National Telecommunications and Information Administration, U.S. Department of...
BVDV: Detection, Risk Management and Control

USDA-ARS's Scientific Manuscript database

The terms bovine viral diarrhea (BVD) and bovine viral diarrhea viruses (BVDV) are difficult to define in simple straightforward statements because both are umbrella terms covering a wide range of observations and entities. While diarrhea is in the name, BVD, it is used in reference to a number of ...

Relation extraction for biological pathway construction using node2vec.

PubMed

Kim, Munui; Baek, Seung Han; Song, Min

2018-06-13

Systems biology is an important field for understanding whole biological mechanisms composed of interactions between biological components. One approach for understanding complex and diverse mechanisms is to analyze biological pathways. However, because these pathways consist of important interactions and information on these interactions is disseminated in a large number of biomedical reports, text-mining techniques are essential for extracting these relationships automatically. In this study, we applied node2vec, an algorithmic framework for feature learning in networks, for relationship extraction. To this end, we extracted genes from paper abstracts using pkde4j, a text-mining tool for detecting entities and relationships. Using the extracted genes, a co-occurrence network was constructed and node2vec was used with the network to generate a latent representation. To demonstrate the efficacy of node2vec in extracting relationships between genes, performance was evaluated for gene-gene interactions involved in a type 2 diabetes pathway. Moreover, we compared the results of node2vec to those of baseline methods such as co-occurrence and DeepWalk. Node2vec outperformed existing methods in detecting relationships in the type 2 diabetes pathway, demonstrating that this method is appropriate for capturing the relatedness between pairs of biological entities involved in biological pathways. The results demonstrated that node2vec is useful for automatic pathway construction.
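The pipeline in this abstract (extract genes per abstract, build a co-occurrence network, then run node2vec over it) can be sketched in part as follows. This is a minimal, standard-library illustration under stated assumptions: the gene lists are toy input rather than pkde4j output, the function names are hypothetical, and only the biased second-order random walk of node2vec is shown (return parameter p, in-out parameter q). The later step of training embeddings on the walks is omitted.

```python
import random
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(gene_lists):
    """Undirected weighted graph: edge weight = number of abstracts mentioning both genes."""
    graph = defaultdict(lambda: defaultdict(int))
    for genes in gene_lists:
        for a, b in combinations(sorted(set(genes)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=None):
    """One node2vec-style walk of at most `length` nodes.

    The unnormalised probability of stepping from the current node to x is the
    edge weight times a bias: 1/p if x is the previous node, 1 if x is also a
    neighbour of the previous node, and 1/q otherwise.
    """
    rng = rng or random.Random(0)
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbours = sorted(graph[current])
        if not neighbours:
            break  # isolated node: the walk cannot continue
        if len(walk) == 1:
            weights = [graph[current][x] for x in neighbours]
        else:
            previous = walk[-2]
            weights = []
            for x in neighbours:
                if x == previous:
                    bias = 1.0 / p
                elif x in graph[previous]:
                    bias = 1.0
                else:
                    bias = 1.0 / q
                weights.append(graph[current][x] * bias)
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


# Toy input: gene mentions per abstract (illustrative only, not results from the study).
abstracts = [["INS", "IRS1"], ["INS", "IRS1", "AKT2"], ["AKT2", "SLC2A4"]]
graph = build_cooccurrence_graph(abstracts)
walk = node2vec_walk(graph, "INS", length=5, p=0.5, q=2.0)
```

Many such walks, started from every node, would then be fed to a skip-gram model to obtain the latent representation; the raw edge weights alone correspond to the co-occurrence baseline mentioned in the abstract.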
Construction of a database for published phase II/III drug intervention clinical trials for the period 2009-2014 comprising 2,326 records, 90 disease categories, and 939 drug entities.

PubMed

Jeong, Sohyun; Han, Nayoung; Choi, Boyoon; Sohn, Minji; Song, Yun-Kyoung; Chung, Myeon-Woo; Na, Han-Sung; Ji, Eunhee; Kim, Hyunah; Rhew, Ki Yon; Kim, Therasa; Kim, In-Wha; Oh, Jung Mi

2016-06-01

This study aimed to construct a database of published clinical drug trials suitable for use 1) as a research tool in accessing clinical trial information and 2) in evidence-based decision-making by regulatory professionals, clinical research investigators, and medical practitioners. Comprehensive information was obtained from a search of design elements and results of clinical trials in peer reviewed journals using PubMed (http://www.ncbi.nlm.nih.gov/pubmed). The methodology to develop a structured database was devised by a panel composed of experts in medicine, pharmacy, and information technology, and members of the Ministry of Food and Drug Safety (MFDS), using a step by step approach. A double-sided system consisting of user mode and manager mode served as the framework for the database; elements of interest from each trial were entered via secure manager mode, enabling the input information to be accessed in a user-friendly manner (user mode). Information regarding the methodology used and the results of drug treatment was extracted as detail elements of each data set and then entered into the web-based database system. Comprehensive information comprising 2,326 clinical trial records, 90 disease states, and 939 drug entities, and concerning study objectives, background, methods used, results, and conclusion, could be extracted from published information on phase II/III drug intervention clinical trials appearing in SCI journals within the last 10 years. The extracted data was successfully assembled into a clinical drug trial database with easy access suitable for use as a research tool. The clinically most important therapeutic categories, i.e., cancer, cardiovascular, respiratory, neurological, metabolic, urogenital, gastrointestinal, psychological, and infectious diseases were covered by the database. Names of test and control drugs, details on primary and secondary outcomes and indexed keywords could also be retrieved and built into the database. 
The construction used in the database enables the user to sort and download targeted information as a Microsoft Excel spreadsheet. Because of the comprehensive and standardized nature of the clinical drug trial database and its ease of access, it should serve as a valuable information repository and research tool for accessing clinical trial information and making evidence-based decisions by regulatory professionals, clinical research investigators, and medical practitioners.
Beneficial Effects of Different Flavonoids on Vascular and Renal Function in L-NAME Hypertensive Rats

PubMed Central

Paredes, M. Dolores; Romecín, Paola; Castillo, Julián; Ortiz, M. Clara

2018-01-01

Background: We have evaluated the antihypertensive effect of several flavonoid extracts in a rat model of arterial hypertension caused by chronic administration (6 weeks) of the nitric oxide synthesis inhibitor, L-NAME. Methods: Sprague Dawley rats received L-NAME alone or L-NAME plus flavonoid-rich vegetal extracts (Lemon, Grapefruit + Bitter Orange, and Cocoa) or purified flavonoids (Apigenin and Diosmin) for 6 weeks. Results: L-NAME treatment resulted in a marked elevation of blood pressure, and treatment with Apigenin, Lemon Extract, and Grapefruit + Bitter Orange extracts significantly reduced the elevated blood pressure of these animals. Apigenin and some of these flavonoids also ameliorated nitric oxide-dependent and -independent aortic vasodilation and elevated nitrite urinary excretion. End-organ abnormalities such as cardiac infarcts, hyaline arteriopathy and fibrinoid necrosis in coronary arteries and aorta were improved by these treatments, reducing the end-organ vascular damage. Conclusions: The flavonoids included in this study, especially apigenin, may be used as functional food ingredients with potential therapeutic benefit in arterial hypertension. PMID:29652818
Beneficial Effects of Different Flavonoids on Vascular and Renal Function in L-NAME Hypertensive Rats.

PubMed

Paredes, M Dolores; Romecín, Paola; Atucha, Noemí M; O'Valle, Francisco; Castillo, Julián; Ortiz, M Clara; García-Estañ, Joaquín

2018-04-13

We have evaluated the antihypertensive effect of several flavonoid extracts in a rat model of arterial hypertension caused by chronic administration (6 weeks) of the nitric oxide synthesis inhibitor, L-NAME. Sprague Dawley rats received L-NAME alone or L-NAME plus flavonoid-rich vegetal extracts (Lemon, Grapefruit + Bitter Orange, and Cocoa) or purified flavonoids (Apigenin and Diosmin) for 6 weeks. L-NAME treatment resulted in a marked elevation of blood pressure, and treatment with Apigenin, Lemon Extract, and Grapefruit + Bitter Orange extracts significantly reduced the elevated blood pressure of these animals. Apigenin and some of these flavonoids also ameliorated nitric oxide-dependent and -independent aortic vasodilation and elevated nitrite urinary excretion. End-organ abnormalities such as cardiac infarcts, hyaline arteriopathy and fibrinoid necrosis in coronary arteries and aorta were improved by these treatments, reducing the end-organ vascular damage. The flavonoids included in this study, especially apigenin, may be used as functional food ingredients with potential therapeutic benefit in arterial hypertension.
Toward better public health reporting using existing off the shelf approaches: The value of medical dictionaries in automated cancer detection using plaintext medical data.

PubMed

Kasthurirathne, Suranga N; Dixon, Brian E; Gichoya, Judy; Xu, Huiping; Xia, Yuni; Mamlin, Burke; Grannis, Shaun J

2017-05-01

Existing approaches to derive decision models from plaintext clinical data frequently depend on medical dictionaries as the sources of potential features. Prior research suggests that decision models developed using non-dictionary based feature sourcing approaches and "off the shelf" tools could predict cancer with performance metrics between 80% and 90%. We sought to compare non-dictionary based models to models built using features derived from medical dictionaries. We evaluated the detection of cancer cases from free text pathology reports using decision models built with combinations of dictionary or non-dictionary based feature sourcing approaches, 4 feature subset sizes, and 5 classification algorithms. Each decision model was evaluated using the following performance metrics: sensitivity, specificity, accuracy, positive predictive value, and area under the receiver operating characteristics (ROC) curve. Decision models parameterized using dictionary and non-dictionary feature sourcing approaches produced performance metrics between 70 and 90%. The source of features and feature subset size had no impact on the performance of a decision model. Our study suggests there is little value in leveraging medical dictionaries for extracting features for decision model building. Decision models built using features extracted from the plaintext reports themselves achieve comparable results to those built using medical dictionaries. Overall, this suggests that existing "off the shelf" approaches can be leveraged to perform accurate cancer detection using less complex Named Entity Recognition (NER) based feature extraction, automated feature selection and modeling approaches. Copyright © 2017 Elsevier Inc. All rights reserved.
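Four of the performance metrics named in this abstract follow directly from a binary confusion matrix. The sketch below is a generic illustration with made-up counts, not the study's evaluation code; the function name is hypothetical. Area under the ROC curve is left out because it needs per-report scores rather than counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard metrics from confusion-matrix counts.

    tp/fn: cancer reports classified correctly/incorrectly,
    tn/fp: non-cancer reports classified correctly/incorrectly.
    A metric is None when its denominator is zero.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),                 # true positive rate
        "specificity": ratio(tn, tn + fp),                 # true negative rate
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "positive_predictive_value": ratio(tp, tp + fp),   # precision
    }


# Illustrative counts only (not figures from the study).
metrics = classification_metrics(tp=80, fp=10, tn=90, fn=20)
```

With these counts, sensitivity is 80/100, specificity is 90/100, accuracy is 170/200 and positive predictive value is 80/90.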
MMKG: An approach to generate metallic materials knowledge graph based on DBpedia and Wikipedia

NASA Astrophysics Data System (ADS)

Zhang, Xiaoming; Liu, Xin; Li, Xin; Pan, Dongyu

2017-02-01

The research and development of metallic materials are playing an important role in today's society, and in the meanwhile lots of metallic materials knowledge is generated and available on the Web (e.g., Wikipedia) for materials experts. However, due to the diversity and complexity of metallic materials knowledge, the knowledge utilization may encounter much inconvenience. The idea of knowledge graph (e.g., DBpedia) provides a good way to organize the knowledge into a comprehensive entity network. Therefore, the motivation of our work is to generate a metallic materials knowledge graph (MMKG) using available knowledge on the Web. In this paper, an approach is proposed to build MMKG based on DBpedia and Wikipedia. First, we use an algorithm based on directly linked sub-graph semantic distance (DLSSD) to preliminarily extract metallic materials entities from DBpedia according to some predefined seed entities; then based on the results of the preliminary extraction, we use an algorithm, which considers both semantic distance and string similarity (SDSS), to achieve the further extraction. Second, due to the absence of materials properties in DBpedia, we use an ontology-based method to extract properties knowledge from the HTML tables of corresponding Wikipedia Web pages for enriching MMKG. Materials ontology is used to locate materials properties tables as well as to identify the structure of the tables. The proposed approach is evaluated by precision, recall, F1 and time performance, and meanwhile the appropriate thresholds for the algorithms in our approach are determined through experiments. The experimental results show that our approach returns expected performance. A tool prototype is also designed to facilitate the process of building the MMKG as well as to demonstrate the effectiveness of our approach.
12 CFR 612.2145 - Director reporting.

Code of Federal Regulations, 2010 CFR

2010-01-01

...) The name and the nature of the business of any entity in which the director has a material financial... activity that is required to be reported under this section or could constitute a conflict of interest... determination of whether the relationship, transaction, or activity is, in fact, a conflict of interest. (d...
Proposal to conserve Tamarix ramosissima against T. pentandra (Tamaricaceae)

USDA-ARS's Scientific Manuscript database

Ledebour described Tamarix ramosissima in 1829 from plants collected in Kazakhstan (Lake Noor Zaisan). In the protologue he overlooked T. pentandra Pall. (l.c.) and T. pallasii Desv. (l.c.), two earlier names which apply to the same biological entity, also widespread through Central and Western Asia...
78 FR 23194 - Federal Acquisition Regulation; Commercial and Government Entity Code

Federal Register 2010, 2011, 2012, 2013, 2014

2013-04-18

... Award Management Name Change, Phase 1 Implementation) which will make a global update to all of the... outside the United States; and Support supply chain traceability and integrity efforts. II. Discussion and.... For Contractors registered in the System for Award Management (SAM), the DLA Logistics Information...
ALDOL REACTION VIA IN SITU OLEFIN MIGRATION IN WATER. (R828129)

EPA Science Inventory

Mingwen Wang and Chao-Jun Li

Department of Chemistry, Tulane University, Ne...
The Role of Instruments in Three Chemical Revolutions

ERIC Educational Resources Information Center

Chamizo, José Antonio

2014-01-01

This paper attempts to show one way in which the history of chemistry can be made teachable for chemistry teachers, that is, as something more than an undifferentiated mass of names and dates, by establishing a temporal framework based on chemical entities that all students use. It represents a difficult equilibrium between over-simplification versus…
47 CFR 52.15 - Central office code administration.

Code of Federal Regulations, 2010 CFR

2010-10-01

... forecast data to the NANPA. (ii) Reporting shall be by separate legal entity and must include company name, company headquarters address, Operating Company Number (OCN), parent company OCN, and the primary type of... headquarters address, OCN, parent company's OCN(s), and the primary type of business in which the numbering...
40 CFR 59.501 - Am I subject to this subpart?

Code of Federal Regulations, 2010 CFR

2010-07-01

... (CONTINUED) NATIONAL VOLATILE ORGANIC COMPOUND EMISSION STANDARDS FOR CONSUMER AND COMMERCIAL PRODUCTS... subpart? (a) The regulated entities for an aerosol coating product are the manufacturer or importer of an aerosol coating product and a distributor of an aerosol coating product if it is named on the label or if...
Americans With Disabilities Act (ADA) Accessibility Guidelines for Transportation Vehicles. Final rule.

PubMed

2016-12-14

The Architectural and Transportation Barriers Compliance Board (Access Board or Board) is issuing a final rule that revises its existing accessibility guidelines for non-rail vehicles--namely, buses, over-the-road buses, and vans--acquired or remanufactured by entities covered by the Americans with Disabilities Act. The revised guidelines ensure that such vehicles are readily accessible to, and usable by, individuals with disabilities. The U.S. Department of Transportation (DOT) is required to revise its accessibility standards for transportation vehicles acquired or remanufactured by entities covered by the Americans with Disabilities Act (ADA) to be consistent with the final rule.
Web 2.0-Based Crowdsourcing for High-Quality Gold Standard Development in Clinical Natural Language Processing

PubMed Central

Deleger, Louise; Li, Qi; Kaiser, Megan; Stoutenborough, Laura

2013-01-01

Background: A high-quality gold standard is vital for supervised, machine learning-based, clinical natural language processing (NLP) systems. In clinical NLP projects, expert annotators traditionally create the gold standard. However, traditional annotation is expensive and time-consuming. To reduce the cost of annotation, general NLP projects have turned to crowdsourcing based on Web 2.0 technology, which involves submitting smaller subtasks to a coordinated marketplace of workers on the Internet. Many studies have been conducted in the area of crowdsourcing, but only a few have focused on tasks in the general NLP field and only a handful in the biomedical domain, usually based upon very small pilot sample sizes. In addition, the quality of the crowdsourced biomedical NLP corpora was never exceptional when compared to traditionally-developed gold standards. The previously reported results on the medical named entity annotation task showed a 0.68 F-measure based agreement between crowdsourced and traditionally-developed corpora. Objective: Building upon previous work from the general crowdsourcing research, this study investigated the usability of crowdsourcing in the clinical NLP domain with special emphasis on achieving high agreement between crowdsourced and traditionally-developed corpora. Methods: To build the gold standard for evaluating the crowdsourcing workers’ performance, 1042 clinical trial announcements (CTAs) from the ClinicalTrials.gov website were randomly selected and double annotated for medication names, medication types, and linked attributes. For the experiments, we used CrowdFlower, an Amazon Mechanical Turk-based crowdsourcing platform. We calculated sensitivity, precision, and F-measure to evaluate the quality of the crowd’s work and tested the statistical significance (P<.001, chi-square test) to detect differences between the crowdsourced and traditionally-developed annotations. 
Results: The agreement between the crowd’s annotations and the traditionally-generated corpora was high for: (1) annotations (0.87, F-measure for medication names; 0.73, medication types), (2) correction of previous annotations (0.90, medication names; 0.76, medication types), and excellent for (3) linking medications with their attributes (0.96). Simple voting provided the best judgment aggregation approach. There was no statistically significant difference between the crowd and traditionally-generated corpora. Our results showed a 27.9% improvement over previously reported results on the medication named entity annotation task. Conclusions: This study offers three contributions. First, we proved that crowdsourcing is a feasible, inexpensive, fast, and practical approach to collect high-quality annotations for clinical text (when protected health information was excluded). We believe that well-designed user interfaces and a rigorous quality control strategy for entity annotation and linking were critical to the success of this work. Second, as a further contribution to the Internet-based crowdsourcing field, we will publicly release the JavaScript and CrowdFlower Markup Language infrastructure code that is necessary to utilize CrowdFlower’s quality control and crowdsourcing interfaces for named entity annotations. Finally, to spur future research, we will release the CTA annotations that were generated by traditional and crowdsourced approaches. PMID:23548263
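The agreement figures in this abstract are F-measures, and the reported 27.9% improvement is consistent with a relative change from the earlier 0.68 agreement to the 0.87 obtained for medication names. The sketch below shows both calculations; reading the improvement as a relative change is an assumption based only on the numbers quoted, and the function names are hypothetical.

```python
def f_measure(precision, sensitivity):
    """Balanced F-measure: harmonic mean of precision and sensitivity (recall)."""
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


def relative_improvement(baseline, new):
    """Relative change of `new` over `baseline`, as a fraction of the baseline."""
    return (new - baseline) / baseline


# Figures quoted in the abstract: 0.68 (previously reported) and 0.87 (medication names).
improvement_percent = round(relative_improvement(0.68, 0.87) * 100, 1)
```

Here `improvement_percent` works out to 27.9, matching the figure stated in the results.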
Detecting experimental techniques and selecting relevant documents for protein-protein interactions from biomedical literature.

PubMed

Wang, Xinglong; Rak, Rafal; Restificar, Angelo; Nobata, Chikashi; Rupp, C J; Batista-Navarro, Riza Theresa B; Nawaz, Raheel; Ananiadou, Sophia

2011-10-03

The selection of relevant articles for curation, and linking those articles to experimental techniques confirming the findings became one of the primary subjects of the recent BioCreative III contest. The contest's Protein-Protein Interaction (PPI) task consisted of two sub-tasks: Article Classification Task (ACT) and Interaction Method Task (IMT). ACT aimed to automatically select relevant documents for PPI curation, whereas the goal of IMT was to recognise the methods used in experiments for identifying the interactions in full-text articles. We proposed and compared several classification-based methods for both tasks, employing rich contextual features as well as features extracted from external knowledge sources. For IMT, a new method that classifies pair-wise relations between every text phrase and candidate interaction method obtained promising results with an F1 score of 64.49%, as tested on the task's development dataset. We also explored ways to combine this new approach and more conventional, multi-label document classification methods. For ACT, our classifiers exploited automatically detected named entities and other linguistic information. The evaluation results on the BioCreative III PPI test datasets showed that our systems were very competitive: one of our IMT methods yielded the best performance among all participants, as measured by F1 score, Matthew's Correlation Coefficient and AUC iP/R; whereas for ACT, our best classifier was ranked second as measured by AUC iP/R, and also competitive according to other metrics. Our novel approach that converts the multi-class, multi-label classification problem to a binary classification problem showed much promise in IMT. Nevertheless, on the test dataset the best performance was achieved by taking the union of the output of this method and that of a multi-class, multi-label document classifier, which indicates that the two types of systems complement each other in terms of recall. 
For ACT, our system exploited a rich set of features and also obtained encouraging results. We examined the features with respect to their contributions to the classification results, and concluded that contextual words surrounding named entities, as well as the MeSH headings associated with the documents were among the main contributors to the performance.
Determining similarity of scientific entities in annotation datasets

PubMed Central

Palma, Guillermo; Vidal, Maria-Esther; Haag, Eric; Raschid, Louiqa; Thor, Andreas

2015-01-01

Linked Open Data initiatives have made available a diversity of scientific collections where scientists have annotated entities in the datasets with controlled vocabulary terms from ontologies. Annotations encode scientific knowledge, which is captured in annotation datasets. Determining relatedness between annotated entities becomes a building block for pattern mining, e.g. identifying drug–drug relationships may depend on the similarity of the targets that interact with each drug. A diversity of similarity measures has been proposed in the literature to compute relatedness between a pair of entities. Each measure exploits some knowledge including the name, function, relationships with other entities, taxonomic neighborhood and semantic knowledge. We propose a novel general-purpose annotation similarity measure called ‘AnnSim’ that measures the relatedness between two entities based on the similarity of their annotations. We model AnnSim as a 1–1 maximum weight bipartite match and exploit properties of existing solvers to provide an efficient solution. We empirically study the performance of AnnSim on real-world datasets of drugs and disease associations from clinical trials and relationships between drugs and (genomic) targets. Using baselines that include a variety of measures, we identify where AnnSim can provide a deeper understanding of the semantics underlying the relatedness of a pair of entities or where it could lead to predicting new links or identifying potential novel patterns. Although AnnSim does not exploit knowledge or properties of a particular domain, its performance compares well with a variety of state-of-the-art domain-specific measures. Database URL: http://www.yeastgenome.org/ PMID:25725057
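The 1–1 maximum weight bipartite matching mentioned in the abstract above can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the pairwise similarity function `sim`, the toy exact-match similarity, the brute-force search (a real system would use a dedicated solver), and the normalization 2 × matched weight / (|A1| + |A2|), which the abstract does not spell out.

```python
from itertools import permutations
from typing import Callable, Sequence


def annotation_similarity(
    a: Sequence[str],
    b: Sequence[str],
    sim: Callable[[str, str], float],
) -> float:
    """Score two annotation sets via a 1-1 maximum weight bipartite matching.

    Brute force over all 1-1 assignments, so only suitable for tiny sets.
    `sim` is assumed to be symmetric and to return values in [0, 1].
    """
    if not a or not b:
        return 0.0
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    best = 0.0
    # Each permutation assigns every term of the smaller set to a distinct
    # term of the larger set, i.e. one candidate 1-1 matching.
    for perm in permutations(range(len(large)), len(small)):
        weight = sum(sim(small[i], large[j]) for i, j in enumerate(perm))
        best = max(best, weight)
    # Assumed normalization: twice the matched weight over the total set size.
    return 2.0 * best / (len(a) + len(b))


def toy_sim(x: str, y: str) -> float:
    """Hypothetical term similarity: 1.0 for identical terms, else 0.0."""
    return 1.0 if x == y else 0.0


if __name__ == "__main__":
    print(annotation_similarity(["a", "b"], ["b", "a"], toy_sim))
    print(annotation_similarity(["a", "b"], ["a", "c", "d"], toy_sim))
```

In practice `toy_sim` would be replaced by an ontology-based term similarity, and the permutation search by a polynomial-time assignment solver.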
Determining similarity of scientific entities in annotation datasets.

PubMed

Palma, Guillermo; Vidal, Maria-Esther; Haag, Eric; Raschid, Louiqa; Thor, Andreas

2015-01-01

Linked Open Data initiatives have made available a diversity of scientific collections where scientists have annotated entities in the datasets with controlled vocabulary terms from ontologies. Annotations encode scientific knowledge, which is captured in annotation datasets. Determining relatedness between annotated entities becomes a building block for pattern mining, e.g. identifying drug-drug relationships may depend on the similarity of the targets that interact with each drug. A diversity of similarity measures has been proposed in the literature to compute relatedness between a pair of entities. Each measure exploits some knowledge including the name, function, relationships with other entities, taxonomic neighborhood and semantic knowledge. We propose a novel general-purpose annotation similarity measure called 'AnnSim' that measures the relatedness between two entities based on the similarity of their annotations. We model AnnSim as a 1-1 maximum weight bipartite match and exploit properties of existing solvers to provide an efficient solution. We empirically study the performance of AnnSim on real-world datasets of drugs and disease associations from clinical trials and relationships between drugs and (genomic) targets. Using baselines that include a variety of measures, we identify where AnnSim can provide a deeper understanding of the semantics underlying the relatedness of a pair of entities or where it could lead to predicting new links or identifying potential novel patterns. Although AnnSim does not exploit knowledge or properties of a particular domain, its performance compares well with a variety of state-of-the-art domain-specific measures. Database URL: http://www.yeastgenome.org/ © The Author(s) 2015. Published by Oxford University Press.
Inventorship and Authorship

PubMed Central

Konski, Antoinette F.; Wu, Linda X.

2015-01-01

Ownership of a U.S. patent is based on inventorship. In the United States, an inventor is the owner of the claimed invention unless it is assigned to another entity. The correct naming of inventors is important, and the improper naming of inventors in a patent can be grounds for rendering the patent unenforceable. Each inventor must make an intellectual contribution, solely or jointly, to at least one element of a claim in the patent. This is in contrast to authorship of a research article, where authors may be named to acknowledge contribution to the reported research rather than an intellectual contribution. Thus, identifying inventors for a patent is not the same as identifying authors for a publication. PMID:26253091
A Framework for Classifying and Resolving Semantic Conflicts Using the Enhanced Entity-Relationship Model

DTIC Science & Technology

1992-09-01

rank, social security number, and date of birth, sex, race, etc. It also keeps data on marital status, number of dependents, and whether a member’s...specification as listed in the appendix. OPINS stores similar common personnel information to that in the ADMI database, such as name, rank, sex, etc. The...34+ NAME (comp) "+ DATE-OF-BIRTH (comp) "+ SEX "+ RACE-ETHNIC "+ ETHNIC-GROUP "+ PAY-ENTRY-BASE-DATE (comp) "+ SERVICE "+ MOS (comp) "+ DATE-OF

Analysis of a Probabilistic Model of Redundancy in Unsupervised Information Extraction

DTIC Science & Technology

2010-08-25

5f. WORK UNIT NUMBER 7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) University of Washington, Department of Computer Science and Engineering...Box 352350, Seattle, WA, 98195 8. PERFORMING ORGANIZATION REPORT NUMBER 9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES) 10. SPONSOR/MONITOR’S...approximation, with algebra we have: $P_{USC}(x \in C \mid x \text{ appears } k \text{ times in } n \text{ draws}) \approx \frac{1}{1 + \frac{|E|}{|C|} \left( \frac{p_E}{p_C} \right)^{k} e^{n(p_C - p_E)}}$ (2). In general, we expect the extraction
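As a rough numerical illustration, the approximation quoted in the snippet above can be read as 1 / (1 + (|E|/|C|) (p_E/p_C)^k e^{n(p_C − p_E)}) and evaluated directly. The function and parameter names below are hypothetical, and the interpretation of the symbols (|C| and |E| as the numbers of correct and erroneous labels, p_C and p_E as their per-draw probabilities) is an assumption based on the snippet alone.

```python
import math


def p_usc(k: int, n: int, num_correct: int, num_error: int,
          p_c: float, p_e: float) -> float:
    """Approximate probability that a label seen k times in n draws is correct.

    Assumed reading of the quoted formula:
        1 / (1 + (|E|/|C|) * (p_E/p_C)**k * exp(n * (p_C - p_E)))
    """
    ratio = (num_error / num_correct) * (p_e / p_c) ** k * math.exp(n * (p_c - p_e))
    return 1.0 / (1.0 + ratio)


if __name__ == "__main__":
    # Illustrative values only: equal-sized sets, with correct labels ten
    # times more likely per draw than erroneous ones.
    for k in (1, 2, 3):
        print(k, p_usc(k, n=100, num_correct=10, num_error=10, p_c=0.01, p_e=0.001))
```

With p_C > p_E, each additional repetition multiplies the odds of error by p_E/p_C, so the estimated probability of correctness rises with k.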
Migraines

MedlinePlus

... name: Excedrin Migraine); ibuprofen (one brand name: Motrin); naproxen (brand name: Aleve); and ketoprofen (brand name: Orudis ... and leaf and root extracts of the butterbur plant. What else can I do to prevent migraines? ...
Neural foundations and functional specificity of number representations.

PubMed

Piazza, Manuela; Eger, Evelyn

2016-03-01

Number is a complex category, as with the word "number" we may refer to different entities. First, it is a perceptual property that characterizes any set of individual items, namely its cardinality. The ability to extract the (approximate) cardinality of sets is almost universal in the animal domain and present in humans since birth. In primates, posterior parietal cortex seems to be a crucial site for this ability, even if the degree of selectivity of numerical representations in parietal cortex reported to date appears much lower compared to that of other semantic categories in the ventral stream. Number can also be intended as a mathematical object, which we humans use to count, measure, and order: a (verbal or visual) symbol that stands for the cardinality of a set, the intensity of a continuous quantity or the position of an item on a list. Evidence points to a convergence towards parietal cortex for the semantic coding of numerical symbols and to the bilateral occipitotemporal cortex for the shape coding of Arabic digits and other number symbols. Copyright © 2015 Elsevier Ltd. All rights reserved.
The College Readiness Data Catalog Tool: User Guide. REL 2014-042

ERIC Educational Resources Information Center

Rodriguez, Sheila M.; Estacion, Angela

2014-01-01

As the name indicates, the College Readiness Data Catalog Tool focuses on identifying data that can indicate a student's college readiness. While college readiness indicators may also signal career readiness, many states, districts, and other entities, including the U.S. Virgin Islands (USVI), do not systematically collect career readiness…
75 FR 5614 - Privacy Act of 1974; Department of Homeland Security/ALL-025 Law Enforcement Authority in Support...

Federal Register 2010, 2011, 2012, 2013, 2014

2010-02-03

...'s or entity's name; Digital photograph; Date of birth, place of birth, and age; Social security number; Duty/work address and telephone number; Alias; Race and ethnicity; Citizenship; Fingerprints; Sex... servers, magnetic disc, tape, digital media, and CD-ROM. RETRIEVABILITY: Records may be retrieved by...
26 CFR 1.1445-2 - Situations in which withholding is not required under section 1445(a).

Code of Federal Regulations, 2013 CFR

2013-04-01

... Revenue Code and Income Tax Regulations); 2. [Name of transferor] is not a disregarded entity as defined... Revenue Service when requested in accordance with the requirements of section 6001 and regulations... established securities market. This exemption shall apply if the disposition is incident to an initial public...
Set Size, Individuation, and Attention to Shape

ERIC Educational Resources Information Center

Cantrell, Lisa; Smith, Linda B.

2013-01-01

Much research has demonstrated a shape bias in categorizing and naming solid objects. This research has shown that when an entity is conceptualized as an individual object, adults and children attend to the object's shape. Separate research in the domain of numerical cognition suggest that there are distinct processes for quantifying small and…
76 FR 2268 - Viruses, Serums, Toxins, and Analogous Products; Packaging and Labeling

Federal Register 2010, 2011, 2012, 2013, 2014

2011-01-13

.... Effects on Small Entities The RFA requires agencies to evaluate the potential effects of their proposed... and placed giving equal emphasis to each word composing it. Descriptive terms used in the true name on the product license or permit shall also appear. Abbreviations of the descriptive terms may be used on...
Investigating the Importance of Relating with God for School Students' Spiritual Well-Being

ERIC Educational Resources Information Center

Fisher, John W.

2010-01-01

Fisher's spiritual well-being (SWB) questionnaires assessed students' levels of relationship in four domains, namely with themselves, others, the environment and with a Transcendent Other (commonly called God). Students also reported the extent to which different entities helped them develop relationships in the four domains of SWB. However,…
78 FR 41959 - State, Local, Tribal, and Private Sector Policy Advisory Committee (SLTPS-PAC); Notice of Meeting

Federal Register 2010, 2011, 2012, 2013, 2014

2013-07-12

...] State, Local, Tribal, and Private Sector Policy Advisory Committee (SLTPS-PAC); Notice of Meeting AGENCY... Classified National Security Information Program for State, Local, Tribal, and Private Sector Entities. FOR..., announcement is made for the following committee meeting. Name of Committee: State, Local, Tribal, and Private...
10 CFR 420.2 - Definitions.

Code of Federal Regulations, 2013 CFR

2013-01-01

... or cooling system, or both, or for a hot water system. Carpool means the sharing of a ride by two or... or other entity named in the notice of grant award as the recipient. HVAC means heating, ventilating... equipment or facility which is used in connection with, or as part of, any process or system for industrial...
10 CFR 420.2 - Definitions.

Code of Federal Regulations, 2012 CFR

2012-01-01

... or cooling system, or both, or for a hot water system. Carpool means the sharing of a ride by two or... or other entity named in the notice of grant award as the recipient. HVAC means heating, ventilating... equipment or facility which is used in connection with, or as part of, any process or system for industrial...
10 CFR 420.2 - Definitions.

Code of Federal Regulations, 2014 CFR

2014-01-01

... or cooling system, or both, or for a hot water system. Carpool means the sharing of a ride by two or... or other entity named in the notice of grant award as the recipient. HVAC means heating, ventilating... equipment or facility which is used in connection with, or as part of, any process or system for industrial...
11 CFR 104.8 - Uniform reporting of receipts.

Code of Federal Regulations, 2010 CFR

2010-01-01

... reported during the calendar year (or during the election cycle, in the case of an authorized committee... the aggregate exceeds $200 in a calendar year (or in an election cycle, in the case of an authorized... donating individual's or entity's name, mailing address, occupation or type of business, and the date of...
75 FR 39420 - Federal Acquisition Regulation; Federal Acquisition Circular 2005-44; Small Entity Compliance Guide

Federal Register 2010, 2011, 2012, 2013, 2014

2010-07-08

... Federal Acquisition Circular (FAC) 2005-44 which amends the Federal Acquisition Regulation (FAR). Interested parties may obtain further information regarding this rule by referring to FAC 2005-44 which... . FOR FURTHER INFORMATION CONTACT: The analyst whose name appears in the table below. Please cite FAC...
76 FR 4191 - Federal Acquisition Regulation; Federal Acquisition Circular 2005-49; Small Entity Compliance Guide

Federal Register 2010, 2011, 2012, 2013, 2014

2011-01-24

... Federal Acquisition Circular (FAC) 2005-49, which amend the Federal Acquisition Regulation (FAR). Interested parties may obtain further information regarding these rules by referring to FAC 2005-49, which... analyst whose name appears in the table below. Please cite FAC 2005-49 and the specific FAR case number...
5 CFR 843.205 - Designation of beneficiary-form and execution.

Code of Federal Regulations, 2010 CFR

2010-01-01

... execution. 843.205 Section 843.205 Administrative Personnel OFFICE OF PERSONNEL MANAGEMENT (CONTINUED) CIVIL... One-time Payments § 843.205 Designation of beneficiary—form and execution. (a) A designation of..., corporation, or legal entity may be named as beneficiary. (e) A change of beneficiary may be made at any time...
Combining Open-domain and Biomedical Knowledge for Topic Recognition in Consumer Health Questions.

PubMed

Mrabet, Yassine; Kilicoglu, Halil; Roberts, Kirk; Demner-Fushman, Dina

2016-01-01

Determining the main topics in consumer health questions is a crucial step in their processing as it allows narrowing the search space to a specific semantic context. In this paper we propose a topic recognition approach based on biomedical and open-domain knowledge bases. In the first step of our method, we recognize named entities in consumer health questions using an unsupervised method that relies on a biomedical knowledge base, UMLS, and an open-domain knowledge base, DBpedia. In the next step, we cast topic recognition as a binary classification problem of deciding whether a named entity is the question topic or not. We evaluated our approach on a dataset from the National Library of Medicine (NLM), introduced in this paper, and another from the Genetic and Rare Disease Information Center (GARD). The combination of knowledge bases outperformed the results obtained by individual knowledge bases by up to 16.5% F1 and achieved state-of-the-art performance. Our results demonstrate that combining open-domain knowledge bases with biomedical knowledge bases can lead to a substantial improvement in understanding user-generated health content.
Adapting Web content for low-literacy readers by using lexical elaboration and named entities labeling

NASA Astrophysics Data System (ADS)

Watanabe, W. M.; Candido, A.; Amâncio, M. A.; De Oliveira, M.; Pardo, T. A. S.; Fortes, R. P. M.; Aluísio, S. M.

2010-12-01

This paper presents an approach for assisting low-literacy readers in accessing Web online information. The "Educational FACILITA" tool is a Web content adaptation tool that provides innovative features and follows more intuitive interaction models regarding accessibility concerns. In particular, we propose an interaction model and a Web application that explore the natural language processing tasks of lexical elaboration and named entity labeling for improving Web accessibility. We report on the results obtained from a pilot study on usability analysis carried out with low-literacy users. The preliminary results show that "Educational FACILITA" improves the comprehension of text elements, although the assistance mechanisms might also confuse users when word sense ambiguity is introduced, by gathering, for a complex word, a list of synonyms with multiple meanings. This points to a future solution in which the correct sense of a complex word in a sentence is identified, addressing this pervasive characteristic of natural languages. The pilot study also identified that experienced computer users find the tool to be more useful than novice computer users do.
A bioinformatics knowledge discovery in text application for grid computing

PubMed Central

Castellano, Marcello; Mastronardi, Giuseppe; Bellotti, Roberto; Tarricone, Gianfranco

2009-01-01

Background A fundamental activity in biomedical research is Knowledge Discovery, which has the ability to search through large amounts of biomedical information such as documents and data. High performance computational infrastructures, such as Grid technologies, are emerging as a possible infrastructure to tackle the intensive use of Information and Communication resources in life science. The goal of this work was to develop a software middleware solution in order to exploit the many knowledge discovery applications on scalable and distributed computing systems to achieve intensive use of ICT resources. Methods The development of a grid application for Knowledge Discovery in Text using a middleware solution based methodology is presented. The system must be able to perform a user application model and process the jobs with the aim of creating many parallel jobs to distribute on the computational nodes. Finally, the system must be aware of the computational resources available and their status, and must be able to monitor the execution of parallel jobs. These operative requirements led to the design of a middleware that can be specialized using user application modules. It included a graphical user interface to access a node search system, a load balancing system and a transfer optimizer to reduce communication costs. Results A middleware solution prototype and its performance evaluation in terms of the speed-up factor are shown. It was written in Java on Globus Toolkit 4 to build the grid infrastructure based on GNU/Linux computer grid nodes. A test was carried out and the results are shown for the named entity recognition search of symptoms and pathologies. The search was applied to a collection of 5,000 scientific documents taken from PubMed. Conclusion In this paper we discuss the development of a grid application based on a middleware solution.
It has been tested on a knowledge discovery in text process to extract new and useful information about symptoms and pathologies from a large collection of unstructured scientific documents. As an example, a Knowledge Discovery in Database computation was applied on the output produced by the KDT user module to extract new knowledge about symptom and pathology bio-entities. PMID:19534749

A bioinformatics knowledge discovery in text application for grid computing.

PubMed

Castellano, Marcello; Mastronardi, Giuseppe; Bellotti, Roberto; Tarricone, Gianfranco

2009-06-16

A fundamental activity in biomedical research is Knowledge Discovery, which has the ability to search through large amounts of biomedical information such as documents and data. High performance computational infrastructures, such as Grid technologies, are emerging as a possible infrastructure to tackle the intensive use of Information and Communication resources in life science. The goal of this work was to develop a software middleware solution in order to exploit the many knowledge discovery applications on scalable and distributed computing systems to achieve intensive use of ICT resources. The development of a grid application for Knowledge Discovery in Text using a middleware solution based methodology is presented. The system must be able to perform a user application model and process the jobs with the aim of creating many parallel jobs to distribute on the computational nodes. Finally, the system must be aware of the computational resources available and their status, and must be able to monitor the execution of parallel jobs. These operative requirements led to the design of a middleware that can be specialized using user application modules. It included a graphical user interface to access a node search system, a load balancing system and a transfer optimizer to reduce communication costs. A middleware solution prototype and its performance evaluation in terms of the speed-up factor are shown. It was written in Java on Globus Toolkit 4 to build the grid infrastructure based on GNU/Linux computer grid nodes. A test was carried out and the results are shown for the named entity recognition search of symptoms and pathologies. The search was applied to a collection of 5,000 scientific documents taken from PubMed. In this paper we discuss the development of a grid application based on a middleware solution.
It has been tested on a knowledge discovery in text process to extract new and useful information about symptoms and pathologies from a large collection of unstructured scientific documents. As an example, a Knowledge Discovery in Database computation was applied on the output produced by the KDT user module to extract new knowledge about symptom and pathology bio-entities.
Antihypertensive and vasodilator effects of methanolic extract of Inula viscosa: Biological evaluation and POM analysis of cynarin, chlorogenic acid as potential hypertensive.

PubMed

Hakkou, Zineb; Maciuk, Alexandre; Leblais, Veronique; Bouanani, Nour Elhouda; Mekhfi, Hassane; Bnouham, Mohammed; Aziz, Mohammed; Ziyyat, Abderrahime; Rauf, Abdur; Hadda, Taibi Ben; Shaheen, Usama; Patel, Seema; Fischmeister, Rodolphe; Legssyer, Abdelkhaleq

2017-09-01

Inula viscosa L. (Asteraceae) is a medicinal plant widely used as a folk medicine in oriental Morocco, to treat hypertension. The antihypertensive effect of the methanolic extract obtained from I. viscosa leaves was evaluated in hypertensive L-NAME rats. Systolic blood pressure (SBP) was measured using a non-invasive indirect tail-cuff plethysmographic method. Four groups of rats were used: a control group; a hypertensive group treated with L-NAME (32mg/kg/day); a positive control group treated with L-NAME plus enalapril (15mg/kg/day) as a reference antihypertensive agent; and a group treated with L-NAME plus MeOH-extract (40mg/kg/day). Treatment with L-NAME alone caused a progressive increase in SBP. After 4 weeks, the value of SBP reached 160±2mmHg, which indicates the establishment of hypertension. Enalapril prevented the increase in SBP, which remained normal at 123±1mmHg after 4 weeks of treatment. The administration of MeOH-extract along with L-NAME prevented the increase in SBP as well, which remained constant at 115±1mmHg after 4 weeks of treatment. In ex-vivo models, MeOH-extract produced a relaxation of pre-contracted aortic rings (54 ± 2% of relaxation at 3g/L). However, when the rings were denuded, MeOH-extract failed to relax pre-contracted rings of aorta. Phytochemical study of I. viscosa MeOH-extract revealed the presence of phenolic compounds, such as cynarin and chlorogenic acid. The present results suggest that I. viscosa MeOH-extract has an antihypertensive effect, predominantly mediated by an endothelium-dependent vasodilatory effect. Cynarin and chlorogenic acid, which have a strong vasorelaxant effect, may be involved in the antihypertensive effect of the plant extract. The bioinformatic POM analysis confirms the therapeutic potential of cynarin and chlorogenic acids as inhibitors of various biotargets.
Based on the results, the coordination of these phytochemicals with calcium and transition metals should be studied, for better scope at antihypertensive drug development. Copyright © 2017 Elsevier Masson SAS. All rights reserved.
Inferring Group Processes from Computer-Mediated Affective Text Analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schryver, Jack C; Begoli, Edmon; Jose, Ajith

2011-02-01

Political communications in the form of unstructured text convey rich connotative meaning that can reveal underlying group social processes. Previous research has focused on sentiment analysis at the document level, but we extend this analysis to sub-document levels through a detailed analysis of affective relationships between entities extracted from a document. Instead of pure sentiment analysis, which is just positive or negative, we explore nuances of affective meaning in 22 affect categories. Our affect propagation algorithm automatically calculates and displays extracted affective relationships among entities in graphical form in our prototype (TEAMSTER), starting with seed lists of affect terms. Several useful metrics are defined to infer underlying group processes by aggregating affective relationships discovered in a text. Our approach has been validated with annotated documents from the MPQA corpus, achieving a performance gain of 74% over comparable random guessers.
Phacomatosis pigmentokeratotica: another epidermal nevus syndrome and a distinctive type of twin spotting.

PubMed

Boente, M C; Pizzi de Parra, N; Larralde de Luna, M; Bonet, H B; Santos Muñoz, A; Parra, V; Gramajo, P; Moreno, S; Asial, R A

2000-01-01

The name epidermal nevus syndrome could be applied to a group of clinically and histopathologically different entities as has been pointed out by Happle. Phacomatosis pigmentokeratotica is a further type of epidermal nevus syndrome distinguished by the presence of a sebaceous nevus and a contralateral speckled lentiginous nevus of the papular type, associated with skeletal or neurological abnormalities. Three new cases of this recently delineated syndrome are presented. A common origin may account for the temporal and spatial relationship between the epidermal and the speckled lentiginous nevus. The concept of melanocytic-epidermal twin spotting similar to the interpretation of vascular twin spotting could explain the pathogenesis of this entity.
BioTextQuest: a web-based biomedical text mining suite for concept discovery.

PubMed

Papanikolaou, Nikolas; Pafilis, Evangelos; Nikolaou, Stavros; Ouzounis, Christos A; Iliopoulos, Ioannis; Promponas, Vasilis J

2011-12-01

BioTextQuest combines automated discovery of significant terms in article clusters with structured knowledge annotation, via Named Entity Recognition services, offering interactive user-friendly visualization. A tag-cloud-based illustration of the terms labeling each document cluster, semantically annotated according to biological entity, and a list of document titles enable users to simultaneously compare terms and documents of each cluster, facilitating concept association and hypothesis generation. BioTextQuest allows customization of analysis parameters, e.g. clustering/stemming algorithms, exclusion of documents/significant terms, to better match the biological question addressed. Availability: http://biotextquest.biol.ucy.ac.cy Contact: vprobon@ucy.ac.cy; iliopj@med.uoc.gr Supplementary information: Supplementary data are available at Bioinformatics online.
MedXN: an open source medication extraction and normalization tool for clinical text

PubMed Central

Sohn, Sunghwan; Clark, Cheryl; Halgrim, Scott R; Murphy, Sean P; Chute, Christopher G; Liu, Hongfang

2014-01-01

Objective We developed the Medication Extraction and Normalization (MedXN) system to extract comprehensive medication information and normalize it to the most appropriate RxNorm concept unique identifier (RxCUI) as specifically as possible. Methods Medication descriptions in clinical notes were decomposed into medication name and attributes, which were separately extracted using RxNorm dictionary lookup and regular expression. Then, each medication name and its attributes were combined together according to RxNorm convention to find the most appropriate RxNorm representation. To do this, we employed serialized hierarchical steps implemented in Apache's Unstructured Information Management Architecture. We also performed synonym expansion, removed false medications, and employed inference rules to improve the medication extraction and normalization performance. Results An evaluation on test data of 397 medication mentions showed F-measures of 0.975 for medication name and over 0.90 for most attributes. The RxCUI assignment produced F-measures of 0.932 for medication name and 0.864 for full medication information. Most false negative RxCUI assignments in full medication information are due to human assumption of missing attributes and medication names in the gold standard. Conclusions The MedXN system (http://sourceforge.net/projects/ohnlp/files/MedXN/) was able to extract comprehensive medication information with high accuracy and demonstrated good normalization capability to RxCUI as long as explicit evidence existed. More sophisticated inference rules might result in further improvements to specific RxCUI assignments for incomplete medication descriptions. PMID:24637954
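The decomposition described in the abstract above (medication names found by dictionary lookup, attributes found by regular expressions) can be sketched in a few lines. This is only an illustrative toy, not the MedXN implementation: the dictionary entries, the attribute patterns and the rule of attaching attributes found between one medication mention and the next are all assumptions, and normalization to RxCUIs is left out entirely.

```python
import re
from typing import Dict, List, Optional

# Hypothetical toy dictionary standing in for an RxNorm name lookup.
MED_DICTIONARY = {"aspirin", "metformin", "lisinopril"}

# Hypothetical attribute patterns; a real system would need far more coverage.
STRENGTH_RE = re.compile(r"\b(\d+(?:\.\d+)?)\s*(mg|mcg|g|ml)\b", re.IGNORECASE)
FREQUENCY_RE = re.compile(
    r"\b(once daily|twice daily|daily|bid|tid|qid|prn)\b", re.IGNORECASE
)
NAME_RE = re.compile(
    r"\b(" + "|".join(sorted(map(re.escape, MED_DICTIONARY))) + r")\b",
    re.IGNORECASE,
)


def extract_medications(note: str) -> List[Dict[str, Optional[str]]]:
    """Find dictionary medication names and attach nearby attributes.

    Attributes are searched in the text between one medication mention
    and the next (an assumed, simplistic association rule).
    """
    mentions = list(NAME_RE.finditer(note))
    results = []
    for idx, mention in enumerate(mentions):
        end = mentions[idx + 1].start() if idx + 1 < len(mentions) else len(note)
        window = note[mention.end():end]
        strength = STRENGTH_RE.search(window)
        frequency = FREQUENCY_RE.search(window)
        results.append({
            "name": mention.group(1).lower(),
            "strength": (
                f"{strength.group(1)} {strength.group(2).lower()}" if strength else None
            ),
            "frequency": frequency.group(1).lower() if frequency else None,
        })
    return results


if __name__ == "__main__":
    note = "Patient takes Aspirin 81 mg daily and metformin 500mg twice daily."
    for med in extract_medications(note):
        print(med)
```

Keeping name lookup and attribute patterns separate, as the abstract describes, means either side can be extended without touching the other.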
Automated anatomical labeling of bronchial branches extracted from CT datasets based on machine learning and combination optimization and its application to bronchoscope guidance.

PubMed

Mori, Kensaku; Ota, Shunsuke; Deguchi, Daisuke; Kitasaka, Takayuki; Suenaga, Yasuhito; Iwano, Shingo; Hasegawa, Yosihnori; Takabatake, Hirotsugu; Mori, Masaki; Natori, Hiroshi

2009-01-01

This paper presents a method for the automated anatomical labeling of bronchial branches extracted from 3D CT images based on machine learning and combination optimization. We also show applications of anatomical labeling on a bronchoscopy guidance system. This paper performs automated labeling by using machine learning and combination optimization. The actual procedure consists of four steps: (a) extraction of tree structures of the bronchus regions extracted from CT images, (b) construction of AdaBoost classifiers, (c) computation of candidate names for all branches by using the classifiers, (d) selection of best combination of anatomical names. We applied the proposed method to 90 cases of 3D CT datasets. The experimental results showed that the proposed method can assign correct anatomical names to 86.9% of the bronchial branches up to the sub-segmental lobe branches. Also, we overlaid the anatomical names of bronchial branches on real bronchoscopic views to guide real bronchoscopy.
Synergistic Antihypertensive Effect of Carthamus tinctorius L. Extract and Captopril in l-NAME-Induced Hypertensive Rats via Restoration of eNOS and AT1R Expression

PubMed Central

Maneesai, Putcharawipa; Prasarttong, Patoomporn; Bunbupha, Sarawoot; Kukongviriyapan, Upa; Kukongviriyapan, Veerapol; Tangsucharit, Panot; Prachaney, Parichat; Pakdeechote, Poungrat

2016-01-01

This study examined the effect of Carthamus tinctorius (CT) extract plus captopril treatment on blood pressure, vascular function, nitric oxide (NO) bioavailability, oxidative stress and the renin-angiotensin system (RAS) in Nω-Nitro-l-arginine methyl ester (l-NAME)-induced hypertension. Rats were treated with l-NAME (40 mg/kg/day) for five weeks and given CT extract (75 or 150 or 300 or 500 mg/kg/day), captopril (5 mg/kg/day) or CT extract (300 mg/kg/day) plus captopril (5 mg/kg/day) for two consecutive weeks. CT extract reduced blood pressure dose-dependently, and the most effective dose was 300 mg/kg/day. l-NAME-induced hypertensive rats showed abnormalities including high blood pressure, high vascular resistance, impairment of acetylcholine-induced vasorelaxation in isolated aortic rings and mesenteric vascular beds, increased vascular superoxide production and plasma malondialdehyde levels, downregulation of eNOS, low level of plasma nitric oxide metabolites, upregulation of angiotensin II type 1 receptor and increased plasma angiotensin II. These abnormalities were alleviated by treatment with either CT extract or captopril. Combination treatment of CT extract and captopril normalized all the abnormalities found in hypertensive rats except endothelial dysfunction. These data indicate that there are synergistic antihypertensive effects of CT extract and captopril. These effects are likely mediated by their anti-oxidative properties and their inhibition of RAS. PMID:26938552
NEMO: Extraction and normalization of organization names from PubMed affiliations.

PubMed

Jonnalagadda, Siddhartha Reddy; Topham, Philip

2010-10-04

Today, there are more than 18 million articles related to biomedical research indexed in MEDLINE, and information derived from them could be used effectively to save the great amount of time and resources spent by government agencies in understanding the scientific landscape, including key opinion leaders and centers of excellence. Associating biomedical articles with organization names could significantly benefit the pharmaceutical marketing industry, health care funding agencies and public health officials and be useful for other scientists in normalizing author names, automatically creating citations, indexing articles and identifying potential resources or collaborators. The large amount of extracted information helps in disambiguating organization names using machine-learning algorithms. We propose NEMO, a system for extracting organization names in the affiliation and normalizing them to a canonical organization name. Our parsing process involves multi-layered rule matching with multiple dictionaries. The system achieves more than 98% f-score in extracting organization names. Our process of normalization involves clustering based on local sequence alignment metrics and local learning based on finding connected components. A high precision was also observed in normalization. NEMO is the missing link in associating each biomedical paper and its authors to an organization name in its canonical form and the geopolitical location of the organization. This research could potentially help in analyzing large social networks of organizations for landscaping a particular topic, improving performance of author disambiguation, adding weak links in the co-author network of authors, augmenting NLM's MARS system for correcting errors in OCR output of affiliation field, and automatically indexing the PubMed citations with the normalized organization name and country. Our system is available as a graphical user interface for download along with this paper.
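The normalization idea described above (link similar name strings, then treat each connected component as one organization) can be sketched roughly as follows. This is not NEMO's implementation: Python's `difflib.SequenceMatcher` ratio is swapped in as a stand-in for the local sequence alignment metric, and the threshold, function names, and sample strings are assumptions made for illustration.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Stand-in string similarity in [0, 1]; NOT a true local alignment score."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def cluster_names(names, threshold=0.75):
    """Group names whose pairwise similarity meets the (hypothetical) threshold.

    Similar pairs are treated as edges; clusters are the connected
    components of that graph, found with a small union-find structure.
    """
    parent = list(range(len(names)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if similarity(names[i], names[j]) >= threshold:
                parent[find(i)] = find(j)  # merge the two components

    clusters = {}
    for i, name in enumerate(names):
        clusters.setdefault(find(i), []).append(name)
    return list(clusters.values())

# Hypothetical affiliation strings.
sample = ["Mayo Clinic", "Mayo Clinic.", "Stanford University", "Stanford Univ"]
clusters = cluster_names(sample)
```

Using connected components means two variants end up in the same cluster even when they are only linked through an intermediate spelling, which is convenient for abbreviations but can also chain unrelated names together if the threshold is too low.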
The Developer’s Guide to Cursor on Target

DTIC Science & Technology

2005-08-01

attribute’s name once it’s in common use. This has resulted in a number of “warts” with CoT’s entity names. For example, why did we use ce, le and...this may be (completely fictitious type) “a-n-p-d-t-o-r” which would be short hand for an object class: atoms::neutral::plant::deciduous::tree...100% certain it’s a plant I’m 98% certain it’s a tree I’m 90% certain it’s an oak I’m 70% certain it’s a red oak So, what one confidence
Feasibility of feature-based indexing, clustering, and search of clinical trials: A case study of breast cancer trials from ClinicalTrials.gov

PubMed Central

Boland, Mary Regina; Miotto, Riccardo; Gao, Junfeng; Weng, Chunhua

2013-01-01

Background: When standard therapies fail, clinical trials provide experimental treatment opportunities for patients with drug-resistant illnesses or terminal diseases. Clinical trials can also provide free treatment and education for individuals who otherwise may not have access to such care. To find relevant clinical trials, patients often search online; however, they often encounter a significant barrier due to the large number of trials and ineffective indexing methods for reducing the trial search space. Objectives: This study explores the feasibility of feature-based indexing, clustering, and search of clinical trials and informs designs to automate these processes. Methods: We decomposed 80 randomly selected stage III breast cancer clinical trials into a vector of eligibility features, which were organized into a hierarchy. We clustered trials based on their eligibility feature similarities. In a simulated search process, manually selected features were used to generate specific eligibility questions to filter trials iteratively. Results: We extracted 1,437 distinct eligibility features and achieved an inter-rater agreement of 0.73 for feature extraction for 37 frequent features occurring in more than 20 trials. Using all the 1,437 features we stratified the 80 trials into six clusters containing trials recruiting similar patients by patient-characteristic features, five clusters by disease-characteristic features, and two clusters by mixed features. Most of the features were mapped to one or more Unified Medical Language System (UMLS) concepts, demonstrating the utility of named entity recognition prior to mapping with the UMLS for automatic feature extraction. Conclusions: It is feasible to develop feature-based indexing and clustering methods for clinical trials to identify trials with similar target populations and to improve trial search efficiency. PMID:23666475
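A small sketch may help make the feature-based similarity and iterative filtering concrete. It assumes each trial is reduced to a set of eligibility features and uses Jaccard similarity; the abstract does not state which similarity measure the authors used, and the trial identifiers and feature strings below are invented for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared features divided by all features in either set."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def rank_by_similarity(target_id, trials):
    """Rank the other trials by similarity of their eligibility features."""
    target = trials[target_id]
    scored = [(tid, jaccard(target, feats))
              for tid, feats in trials.items() if tid != target_id]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def filter_trials(trials, required_feature):
    """One step of the simulated search: keep trials listing the feature."""
    return {tid: feats for tid, feats in trials.items()
            if required_feature in feats}

# Hypothetical trials and eligibility features.
trials = {
    "trial_A": {"age>=18", "stage III", "HER2 positive"},
    "trial_B": {"age>=18", "stage III", "HER2 negative"},
    "trial_C": {"age>=65", "prior chemotherapy"},
}
ranking = rank_by_similarity("trial_A", trials)
remaining = filter_trials(trials, "stage III")
```

Calling `filter_trials` repeatedly with the answer to each eligibility question narrows the candidate set step by step, which mirrors the iterative filtering described in the Methods.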
Feasibility of feature-based indexing, clustering, and search of clinical trials. A case study of breast cancer trials from ClinicalTrials.gov.

PubMed

Boland, M R; Miotto, R; Gao, J; Weng, C

2013-01-01

When standard therapies fail, clinical trials provide experimental treatment opportunities for patients with drug-resistant illnesses or terminal diseases. Clinical trials can also provide free treatment and education for individuals who otherwise may not have access to such care. To find relevant clinical trials, patients often search online; however, they often encounter a significant barrier due to the large number of trials and ineffective indexing methods for reducing the trial search space. This study explores the feasibility of feature-based indexing, clustering, and search of clinical trials and informs designs to automate these processes. We decomposed 80 randomly selected stage III breast cancer clinical trials into a vector of eligibility features, which were organized into a hierarchy. We clustered trials based on their eligibility feature similarities. In a simulated search process, manually selected features were used to generate specific eligibility questions to filter trials iteratively. We extracted 1,437 distinct eligibility features and achieved an inter-rater agreement of 0.73 for feature extraction for 37 frequent features occurring in more than 20 trials. Using all the 1,437 features we stratified the 80 trials into six clusters containing trials recruiting similar patients by patient-characteristic features, five clusters by disease-characteristic features, and two clusters by mixed features. Most of the features were mapped to one or more Unified Medical Language System (UMLS) concepts, demonstrating the utility of named entity recognition prior to mapping with the UMLS for automatic feature extraction. It is feasible to develop feature-based indexing and clustering methods for clinical trials to identify trials with similar target populations and to improve trial search efficiency.
Large-scale extraction of brain connectivity from the neuroscientific literature

PubMed Central

Richardet, Renaud; Chappelier, Jean-Cédric; Telefont, Martin; Hill, Sean

2015-01-01

Motivation: In neuroscience, as in many other scientific domains, the primary form of knowledge dissemination is through published articles. One challenge for modern neuroinformatics is finding methods to make the knowledge from the tremendous backlog of publications accessible for search, analysis and the integration of such data into computational models. A key example of this is metascale brain connectivity, where results are not reported in a normalized repository. Instead, these experimental results are published in natural language, scattered among individual scientific publications. This lack of normalization and centralization hinders the large-scale integration of brain connectivity results. In this article, we present text-mining models to extract and aggregate brain connectivity results from 13.2 million PubMed abstracts and 630 216 full-text publications related to neuroscience. The brain regions are identified with three different named entity recognizers (NERs) and then normalized against two atlases: the Allen Brain Atlas (ABA) and the atlas from the Brain Architecture Management System (BAMS). We then use three different extractors to assess inter-region connectivity. Results: NERs and connectivity extractors are evaluated against a manually annotated corpus. The complete in litero extraction models are also evaluated against in vivo connectivity data from ABA with an estimated precision of 78%. The resulting database contains over 4 million brain region mentions and over 100 000 (ABA) and 122 000 (BAMS) potential brain region connections. This database drastically accelerates connectivity literature review, by providing a centralized repository of connectivity data to neuroscientists. Availability and implementation: The resulting models are publicly available at github.com/BlueBrain/bluima. Contact: renaud.richardet@epfl.ch Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25609795
Symbolic emblems of the Levantine Aurignacians as a regional entity identifier (Hayonim Cave, Lower Galilee, Israel).

PubMed

Tejero, José-Miguel; Belfer-Cohen, Anna; Bar-Yosef, Ofer; Gutkin, Vitaly; Rabinovich, Rivka

2018-05-15

The Levantine Aurignacian is a unique phenomenon in the local Upper Paleolithic sequence, showing greater similarity to the West European classic Aurignacian than to the local Levantine archaeological entities preceding and following it. Herewith we highlight another unique characteristic of this entity, namely, the presence of symbolic objects in the form of notched bones (mostly gazelle scapulae) from the Aurignacian levels of Hayonim Cave, Lower Galilee, Israel. Through both macroscopic and microscopic analyses of the items, we suggest that they are not mere cut marks but rather are intentional (decorative?) human-made markings. The significance of this evidence for symbolic behavior is discussed in its chrono-cultural and geographical contexts. Notched bones are among the oldest symbolic expressions of anatomically modern humans. However, unlike other Paleolithic sites where such findings were reported in single numbers, the number of these items recovered at Hayonim Cave is sufficient to assume they possibly served as an emblem of the Levantine Aurignacian.
77 FR 71575 - Initiation of Antidumping and Countervailing Duty Administrative Reviews and Request for...

Federal Register 2010, 2011, 2012, 2013, 2014

2012-12-03

... No Sales If a producer or exporter named in this notice of initiation had no exports, sales, or... from government control of its export activities to be entitled to a separate rate, the Department analyzes each entity exporting the subject merchandise under a test arising from the Final Determination of...
34 CFR 682.302 - Payment of special allowance on FFEL loans.

Code of Federal Regulations, 2010 CFR

2010-07-01

... State or non-profit entity's Chief Executive Officer (CEO) which— (1) Includes the name and lender... commercial paper (financial) rates in effect for each of the days in such quarter as reported by the Federal...) Determining the average of the bond equivalent rates of the quotes of the 3-month commercial paper (financial...
The Role of Sensory-Motor Information in Object Recognition: Evidence from Category-Specific Visual Agnosia

ERIC Educational Resources Information Center

Wolk, D.A.; Coslett, H.B.; Glosser, G.

2005-01-01

The role of sensory-motor representations in object recognition was investigated in experiments involving AD, a patient with mild visual agnosia who was impaired in the recognition of visually presented living as compared to non-living entities. AD named visually presented items for which sensory-motor information was available significantly more…
19 CFR 122.49b - Electronic manifest requirement for crew members and non-crew members onboard commercial aircraft...

Code of Federal Regulations, 2013 CFR

2013-04-01

... Name Record locator, if available; (xvi) International Air Transport Association (IATA) code of foreign... HOMELAND SECURITY; DEPARTMENT OF THE TREASURY AIR COMMERCE REGULATIONS Aircraft Entry and Entry Documents...” includes each entity that is an “aircraft operator” or “foreign air carrier” with a security program under...
77 FR 43413 - Imposition of Nonproliferation Measures on Five Syrian Entities

Federal Register 2010, 2011, 2012, 2013, 2014

2012-07-24

... Program Management, Office of Foreign Assets Control, Department of the Treasury (202-622-2500). On U.S... imposition of measures pursuant to sections 4(b), 4(c), and 4(d) of Executive Order 12938: Business Lab...) transfers of U.S.-origin defense articles and defense services from foreign destinations to the above-named...
76 FR 65781 - Request for Applications for the IRS Advisory Committee on Tax Exempt and Government Entities

Federal Register 2010, 2011, 2012, 2013, 2014

2011-10-24

... Date(s) (required for FBI check); Date of Birth (required for FBI check); City and State of Birth (required for FBI Check); Current Address; Telephone and Fax Numbers; and e-mail address, if any... things, pre-appointment and annual tax checks, and an FBI criminal and subversive name check, fingerprint...

77 FR 35462 - Self-Regulatory Organizations; ICE Clear Credit LLC; Notice of Filing and Immediate Effectiveness...

Federal Register 2010, 2011, 2012, 2013, 2014

2012-06-13

... Organizations; ICE Clear Credit LLC; Notice of Filing and Immediate Effectiveness of Proposed Rule Change To Amend Schedule 502 of the ICE Clear Credit LLC Rules To Amend the Reference Entity Name for Three Credit..., 2012, ICE Clear Credit LLC (``ICC'') filed with the Securities and Exchange Commission (``Commission...
78 FR 34646 - Pure Magnesium from the People's Republic of China: Preliminary Results of 2011-2012 Antidumping...

Federal Register 2010, 2011, 2012, 2013, 2014

2013-06-10

...(f),\\8\\ due to the high level of common ownership, interlocking boards and managers, and intertwined... will be unable to assign the collapsed entity a joint cash deposit rate under both company names, and may determine the cash deposit rate for TMM by relying upon adverse facts available. Methodology The...
77 FR 23371 - Federal Acquisition Regulation; Federal Acquisition Circular 2005-58; Small Entity Compliance Guide

Federal Register 2010, 2011, 2012, 2013, 2014

2012-04-18

... Federal Acquisition Circular (FAC) 2005-58, which amends the Federal Acquisition Regulation (FAR). An... parties may obtain further information regarding this rule by referring to FAC 2005-58, which precedes... analyst whose name appears in the table below. Please cite FAC 2005-58 and the FAR case number. For...
78 FR 13769 - Federal Acquisition Regulation; Federal Acquisition Circular 2005-66; Small Entity Compliance Guide

Federal Register 2010, 2011, 2012, 2013, 2014

2013-02-28

... Federal Acquisition Circular (FAC) 2005-66, which amends the Federal Acquisition Regulation (FAR). An... parties may obtain further information regarding this rule by referring to FAC 2005-66, which precedes... analyst whose name appears in the table below. Please cite FAC 2005-66 and the FAR case number. For...
Forgetting of Prose as a Function of Interpolated Passage Content and Organization.

ERIC Educational Resources Information Center

Andre, Thomas; And Others

In three studies subjects read two successive passages and then were tested for retention of the first. Each passage described the characteristics of a series of entities (diseases or countries) along a series of dimensions (symptoms, cause, etc., or climate, soil type, etc.). The first passage described five diseases and was organized by name;…
Justices challenge notion that prisons are exempt from ADA.

PubMed

1998-05-15

The U. S. Supreme Court questioned Paul Tufano, a Pennsylvania general council member, in a case involving whether prisoners are exempt from the Americans with Disabilities Act (ADA). The case, brought by former inmate [name removed] [name removed] against the Pennsylvania Department of Corrections, claimed that because [name removed] suffered from hypertension he was prevented from participating in a boot camp program or other programs that might have led to an earlier release. As a result, [name removed] was incarcerated a year longer than he might have been. Pennsylvania's position is that prisoners are exempt from the ADA. However, under sharp questioning by several justices, Tufano agreed that the statute does apply to prison employees and visitors. The verdict could have wide-ranging implications for prisoners with HIV. Circuit courts have been divided on the issue of what a public entity is and whether the ADA applies. A decision is expected by June 30.
Processing new and repeated names: Effects of coreference on repetition priming with speech and fast RSVP

PubMed Central

Camblin, C. Christine; Ledoux, Kerry; Boudewyn, Megan; Gordon, Peter C.; Swaab, Tamara Y.

2006-01-01

Previous research has shown that the process of establishing coreference with a repeated name can affect basic repetition priming. Specifically, repetition priming on some measures can be eliminated for repeated names that corefer with an entity that is prominent in the discourse model. However, the exact nature and timing of this modulating effect of discourse are not yet understood. Here, we present two ERP studies that further probe the nature of repeated name coreference by using naturally produced connected speech and fast-rate RSVP methods of presentation. With speech we found that repetition priming was eliminated for repeated names that coreferred with a prominent antecedent. In contrast, with fast-rate RSVP, we found a main effect of repetition that did not interact with sentence context. This indicates that the creation of a discourse model during comprehension can affect repetition priming, but the nature of this effect may depend on input speed. PMID:16904078
Collaborative biocuration--text-mining development task for document prioritization for curation.

PubMed

Wiegers, Thomas C; Davis, Allan Peter; Mattingly, Carolyn J

2012-01-01

The Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation is a community-wide effort for evaluating text mining and information extraction systems for the biological domain. The 'BioCreative Workshop 2012' subcommittee identified three areas, or tracks, that comprised independent, but complementary aspects of data curation in which they sought community input: literature triage (Track I); curation workflow (Track II) and text mining/natural language processing (NLP) systems (Track III). Track I participants were invited to develop tools or systems that would effectively triage and prioritize articles for curation and present results in a prototype web interface. Training and test datasets were derived from the Comparative Toxicogenomics Database (CTD; http://ctdbase.org) and consisted of manuscripts from which chemical-gene-disease data were manually curated. A total of seven groups participated in Track I. For the triage component, the effectiveness of participant systems was measured by aggregate gene, disease and chemical 'named-entity recognition' (NER) across articles; the effectiveness of 'information retrieval' (IR) was also measured based on 'mean average precision' (MAP). Top recall scores for gene, disease and chemical NER were 49, 65 and 82%, respectively; the top MAP score was 80%. Each participating group also developed a prototype web interface; these interfaces were evaluated based on functionality and ease-of-use by CTD's biocuration project manager. In this article, we present a detailed description of the challenge and a summary of the results.
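For readers unfamiliar with the mean average precision (MAP) measure mentioned above, the following minimal sketch shows the standard definition. The function names, document identifiers, and relevance sets are made up for illustration, and the challenge's official scoring code may differ in details such as how ties or unretrieved relevant documents are handled.

```python
def average_precision(ranked, relevant):
    """Average of precision@k taken at each rank k where a relevant item appears."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    # Relevant items that were never retrieved contribute zero.
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """Mean of average precision over (ranked_list, relevant_set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Hypothetical rankings for two queries.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d2", "d1"], {"d1"}),                    # AP = 1/2
]
map_score = mean_average_precision(runs)
```

Because precision is sampled only at ranks where a relevant document appears, MAP rewards systems that place articles worth curating near the top of the triage list.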
History of Diabetes Insipidus.

PubMed

Valenti, Giovanna; Tamma, Grazia

2016-02-01

Under physiological conditions, fluid and electrolyte homoeostasis is maintained by the kidney adjusting urine volume and composition according to body needs. Diabetes Insipidus is a complex and heterogeneous clinical syndrome affecting water balance and characterized by constant diuresis, resulting in large volumes of dilute urine. Compared with the similarly named Diabetes Mellitus, a disease already known in ancient Egypt, Greece and Asia, Diabetes Insipidus was described several thousand years later. In the 1670s, Thomas Willis noted the difference in taste of urine from polyuric subjects compared with healthy individuals and started the differentiation of Diabetes Mellitus from the rarer entity of Diabetes Insipidus. In 1794, Johann Peter Frank described polyuric patients excreting nonsaccharine urine and introduced the term Diabetes Insipidus. A historical milestone came in 1913, when Farini successfully used posterior pituitary extracts to treat Diabetes Insipidus. Until the 1920s the available evidence indicated Diabetes Insipidus as a disorder of the pituitary gland. In 1928, De Lange first observed that some patients with Diabetes Insipidus did not respond to posterior pituitary extracts, and subsequently Forssman and Waring in 1945 established that the kidney had a critical role in these forms of Diabetes Insipidus resistant to this treatment. In 1947 Williams and Henry introduced the term Nephrogenic Diabetes Insipidus for the congenital syndrome characterized by polyuria and renal concentrating defect resistant to vasopressin. In 1955, du Vigneaud received the Nobel Prize in Chemistry for the first synthesis of the hormone vasopressin, representing a milestone for the treatment of Central Diabetes Insipidus.
A review of criticisms of phylogenetic nomenclature: is taxonomic freedom the fundamental issue?

PubMed

Bryant, Harold N; Cantino, Philip D

2002-02-01

The proposal to implement a phylogenetic nomenclatural system (governed by the PhyloCode), in which taxon names are defined by explicit reference to common descent, has met with strong criticism from some proponents of phylogenetic taxonomy (taxonomy based on the principle of common descent in which only clades and species are recognized). We examine these criticisms and find that some of the perceived problems with phylogenetic nomenclature are based on misconceptions, some are equally true of the current rank-based nomenclatural system, and some will be eliminated by implementation of the PhyloCode. Most of the criticisms are related to an overriding concern that, because the meanings of names are associated with phylogenetic pattern which is subject to change, the adoption of phylogenetic nomenclature will lead to increased instability in the content of taxa. This concern is associated with the fact that, despite the widespread adoption of the view that taxa are historical entities that are conceptualized based on ancestry, many taxonomists also conceptualize taxa based on their content. As a result, critics of phylogenetic nomenclature have argued that taxonomists should be free to emend the content of taxa without constraints imposed by nomenclatural decisions. However, in phylogenetic nomenclature the contents of taxa are determined, not by the taxonomist, but by the combination of the phylogenetic definition of the name and a phylogenetic hypothesis. Because the contents of taxa, once their names are defined, can no longer be freely modified by taxonomists, phylogenetic nomenclature is perceived as limiting taxonomic freedom. We argue that the form of taxonomic freedom inherent to phylogenetic nomenclature is appropriate to phylogenetic taxonomy in which taxa are considered historical entities that are discovered through phylogenetic analysis and are not human constructs.
Vascular and antioxidant effects of an aqueous Mentha cordifolia extract in experimental N(G)-nitro-L-arginine methyl ester-induced hypertension.

PubMed

Pakdeechote, Poungrat; Prachaney, Parichat; Berkban, Warinee; Kukongviriyapan, Upa; Kukongviriyapan, Veerapol; Khrisanapant, Wilaiwan; Phirawatthakul, Yada

2014-01-01

The effect of an aqueous Mentha cordifolia (MC) extract on the haemodynamic status, vascular remodeling, function, and oxidative status in NG-nitro-L-arginine methyl ester (L-NAME)-induced hypertension was investigated. Male Sprague-Dawley rats were given L-NAME [50 mg/(kg body weight (BW) d)] in their drinking water for 5 weeks and were treated by intragastric administration with the MC extract [200 mg/(kg BW d)] for 2 consecutive weeks. Quercetin [25 mg/(kg BW d)] was used as a positive control. The effects of the MC extract on the haemodynamic status, thoracic aortic wall thickness, and oxidative stress markers were determined, and the vasorelaxant activity of the MC extract was tested in isolated mesenteric vascular beds in rats. Significant increases in the mean arterial pressure (MAP), heart rate (HR), hind limb vascular resistance (HVR), wall thickness, and cross-sectional area of the thoracic aorta, as well as oxidative stress markers were found in the L-NAME-treated group compared to the control (P < 0.05). MAP, HVR, wall thickness, cross-sectional area of the thoracic aorta, plasma malondialdehyde (MDA), and vascular superoxide anion production were significantly reduced in L-NAME hypertensive rats treated with the MC extract or quercetin. Furthermore, the MC extract induced vasorelaxation in the pre-constricted mesenteric vascular bed with intact and denuded endothelium of normotensive and hypertensive rats. Our results suggest that the MC extract exhibits an antihypertensive effect via its antioxidant capacity, vasodilator property, and reduced vascular remodeling.
Germ cell neoplasia in situ (GCNIS): evolution of the current nomenclature for testicular pre-invasive germ cell malignancy.

PubMed

Berney, Daniel M; Looijenga, Leendert H J; Idrees, Muhammad; Oosterhuis, J Wolter; Rajpert-De Meyts, Ewa; Ulbright, Thomas M; Skakkebaek, Niels E

2016-07-01

The pre-invasive lesion associated with post-pubertal malignant germ cell tumours of the testis was first recognized in the early 1970s and confirmed by a number of observational and follow-up studies. Until this year, this scientific story has been confused by resistance to the entity and disagreement on its name. Initially termed 'carcinoma in situ' (CIS), it has also been known as 'intratubular germ cell neoplasia, unclassified' (IGCNU) and 'testicular intraepithelial neoplasia' (TIN). In this paper, we review the history of discovery and controversy concerning these names and introduce the reasoning for uniting behind a new name, endorsed unanimously at the World Health Organization (WHO) consensus classification 2016: germ cell neoplasia in situ (GCNIS). © 2016 John Wiley & Sons Ltd.
Machine Reading for Extraction of Bacteria and Habitat Taxonomies

PubMed Central

Kordjamshidi, Parisa; Massa, Wouter; Provoost, Thomas; Moens, Marie-Francine

2015-01-01

There is a vast amount of scientific literature available from various resources such as the internet. Automating the extraction of knowledge from these resources is very helpful for biologists to easily access this information. This paper presents a system to extract the bacteria and their habitats, as well as the relations between them. We investigate to what extent current techniques are suited for this task and test a variety of models in this regard. We detect entities in a biological text and map the habitats into a given taxonomy. Our model uses a linear chain Conditional Random Field (CRF). For the prediction of relations between the entities, a model based on logistic regression is built. Designing a system upon these techniques, we explore several improvements for both the generation and selection of good candidates. One contribution to this lies in the extended flexibility of our ontology mapper that uses an advanced boundary detection and assigns the taxonomy elements to the detected habitats. Furthermore, we discover value in the combination of several distinct candidate generation rules. Using these techniques, we show results that significantly improve upon the state of the art for the BioNLP Bacteria Biotopes task. PMID:27077141
Sorting the Alphabet Soup of Renal Pathology: A Review.

PubMed

Curran-Melendez, Sheilah M; Hartman, Matthew S; Heller, Matthew T; Okechukwu, Nancy

2016-01-28

Diseases of the kidney often have their names shortened, creating an arcane set of acronyms which can be confusing to both radiologists and clinicians. This review of renal pathology aims to explain some of the most commonly used acronyms within the field. For each entity, a summary of the clinical features, pathophysiology, and radiological findings is included to aid in the understanding and differentiation of these entities. Discussed topics include acute cortical necrosis, autosomal dominant polycystic kidney disease, angiomyolipoma, autosomal recessive polycystic kidney disease, acute tubular necrosis, localized cystic renal disease, multicystic dysplastic kidney, multilocular cystic nephroma, multilocular cystic renal cell carcinoma, medullary sponge kidney, paroxysmal nocturnal hemoglobinuria, renal papillary necrosis, transitional cell carcinoma, and xanthogranulomatous pyelonephritis. Copyright © 2016 Mosby, Inc. All rights reserved.
Network Information System

DOE Office of Scientific and Technical Information (OSTI.GOV)

1996-05-01

The Network Information System (NWIS) was initially implemented in May 1996 as a system in which computing devices could be recorded so that unique names could be generated for each device. Since then the system has grown to be an enterprise-wide information system which is integrated with other systems to provide the seamless flow of data through the enterprise. The system tracks data for two main entities: people and computing devices. The following are the types of functions performed by NWIS for these two entities. People: provides source information to the enterprise person data repository for select contractors and visitors; generates and tracks unique usernames and Unix user IDs for every individual granted cyber access; tracks accounts for centrally managed computing resources, and monitors and controls the reauthorization of the accounts in accordance with the DOE mandated interval. Computing Devices: generates unique names for all computing devices registered in the system; tracks the following information for each computing device: manufacturer, make, model, Sandia property number, vendor serial number, operating system and operating system version, owner, device location, amount of memory, amount of disk space, and level of support provided for the machine; tracks the hardware address for network cards; tracks the IP address registered to computing devices along with the canonical and alias names for each address; updates the Dynamic Domain Name Service (DDNS) for canonical and alias names; creates the configuration files for DHCP to control the DHCP ranges and allow access to only properly registered computers; tracks and monitors classified security plans for stand-alone computers; tracks the configuration requirements used to set up the machine; tracks the roles people have on machines (system administrator, administrative access, user, etc.); allows systems administrators to track changes made on the machine (both hardware and software); generates an adjustment history of changes on selected fields.
Soil CO2 flux from three ecosystems in tropical peatland of Sarawak, Malaysia

NASA Astrophysics Data System (ADS)

Melling, Lulie; Hatano, Ryusuke; Goh, Kah Joo

2005-02-01

Soil CO2 flux was measured monthly over a year from tropical peatland of Sarawak, Malaysia using a closed-chamber technique. The soil CO2 flux ranged from 100 to 533 mg C m⁻² h⁻¹ for the forest ecosystem, 63 to 245 mg C m⁻² h⁻¹ for the sago and 46 to 335 mg C m⁻² h⁻¹ for the oil palm. Based on principal component analysis (PCA), the environmental variables over all sites could be classified into three components, namely, climate, soil moisture and soil bulk density, which accounted for 86% of the seasonal variability. A regression tree approach showed that CO2 flux in each ecosystem was related to different underlying environmental factors. They were relative humidity for forest, soil temperature at 5 cm for sago and water-filled pore space for oil palm. On an annual basis, the soil CO2 flux was highest in the forest ecosystem with an estimated production of 2.1 kg C m⁻² yr⁻¹ followed by oil palm at 1.5 kg C m⁻² yr⁻¹ and sago at 1.1 kg C m⁻² yr⁻¹. The different dominant controlling factors in CO2 flux among the studied ecosystems suggested that land use affected the exchange of CO2 between tropical peatland and the atmosphere.
Open-Source Data Collection Techniques for Weapons Transfer Information

DTIC Science & Technology

2012-03-01

IR Infrared ISO International Organization for Standardization ITAR International Traffic in Arms Regulations NER Named Entity Recognition NLP ...Control Protocol UAE United Arab Emirates URI Uniform Resource Identifier URL Uniform Resource Locator USSR Union of Soviet Socialist Republics UTF...KOREA, DEMOCRATIC PEOPLE’S REPUBLIC OF North Korea KOREA, REPUBLIC OF South Korea LIBYAN ARAB JAMAHIRIYA Libya RUSSIAN FEDERATION Russia Table 3
Dissecting spontaneous cerebrospinal fluid collection.

PubMed

Champagne, Pierre-Olivier; Decarie, Jean-Claude; Crevier, Louis; Weil, Alexander G

2018-04-01

Hydrocephalus is a common condition in the pediatric population known to have many causes and presentation patterns. We report from the analysis of 2 cases the existence of a new complication of pediatric hydrocephalus. Naming this entity "dissecting intraparenchymal cerebrospinal fluid collection", we advance a hypothesis regarding its pathophysiology and discuss its clinical implications and management. Copyright © 2018 Elsevier Ltd. All rights reserved.
77 FR 36599 - Self-Regulatory Organizations; ICE Clear Credit LLC; Notice of Filing and Immediate Effectiveness...

Federal Register 2010, 2011, 2012, 2013, 2014

2012-06-19

...-Regulatory Organizations; ICE Clear Credit LLC; Notice of Filing and Immediate Effectiveness of Proposed Rule Change To Amend Schedule 502 of the ICE Clear Credit LLC Rules to Amend the Reference Entity Name for... Effectiveness of Proposed Rule Change to Amend Schedule 502 of the ICE Clear Credit LLC Rules to Amend the...
26 CFR 1.6696-1 - Claims for credit or refund by tax return preparers or appraisers.

Code of Federal Regulations, 2010 CFR

2010-04-01

... social security account number (or such alternative number as may be prescribed by the IRS in forms... the form title or number, by the taxpayer's (or nontaxable entity's) name and taxpayer identification... based; and (ii) Facts sufficient to apprise the IRS of the exact basis of each such claim. (e) Form for...

A system for de-identifying medical message board text.

PubMed

Benton, Adrian; Hill, Shawndra; Ungar, Lyle; Chung, Annie; Leonard, Charles; Freeman, Cristin; Holmes, John H

2011-06-09

There are millions of public posts to medical message boards by users seeking support and information on a wide range of medical conditions. It has been shown that these posts can be used to gain a greater understanding of patients' experiences and concerns. As investigators continue to explore large corpora of medical discussion board data for research purposes, protecting the privacy of the members of these online communities becomes an important challenge that needs to be met. Extant entity recognition methods used for more structured text are not sufficient because message posts present additional challenges: the posts contain many typographical errors, a larger variety of possible names, terms and abbreviations specific to Internet posts or a particular message board, and mentions of the authors' personal lives. The main contribution of this paper is a system to de-identify the authors of message board posts automatically, taking into account the aforementioned challenges. We demonstrate our system on two different message board corpora, one on breast cancer and another on arthritis. We show that our approach significantly outperforms other publicly available named entity recognition and de-identification systems, which have been tuned for more structured text like operative reports, pathology reports, discharge summaries, or newswire.
A corpus for plant-chemical relationships in the biomedical domain.

PubMed

Choi, Wonjun; Kim, Baeksoo; Cho, Hyejin; Lee, Doheon; Lee, Hyunju

2016-09-20

Plants are natural products that humans consume in various ways including food and medicine. They have a long empirical history of treating diseases with relatively few side effects. Based on these strengths, many studies have been performed to verify the effectiveness of plants in treating diseases. It is crucial to understand the chemicals contained in plants because these chemicals can regulate activities of proteins that are key factors in causing diseases. With the accumulation of a large volume of biomedical literature in various databases such as PubMed, it is possible to automatically extract relationships between plants and chemicals in a large-scale way if we apply a text mining approach. A cornerstone of achieving this task is a corpus of relationships between plants and chemicals. In this study, we first constructed a corpus for plant and chemical entities and for the relationships between them. The corpus contains 267 plant entities, 475 chemical entities, and 1,007 plant-chemical relationships (550 and 457 positive and negative relationships, respectively), which are drawn from 377 sentences in 245 PubMed abstracts. Inter-annotator agreement scores for the corpus among three annotators were measured. The simple percent agreement scores for entities and trigger words for the relationships were 99.6 and 94.8 %, respectively, and the overall kappa score for the classification of positive and negative relationships was 79.8 %. We also developed a rule-based model to automatically extract such plant-chemical relationships. When we evaluated the rule-based model using the corpus and randomly selected biomedical articles, overall F-scores of 68.0 and 61.8 % were achieved, respectively. We expect that the corpus for plant-chemical relationships will be a useful resource for enhancing plant research. The corpus is available at http://combio.gist.ac.kr/plantchemicalcorpus .
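The difference between the simple percent agreement and the kappa score reported above can be illustrated with a minimal sketch. It assumes two annotators and Cohen's kappa; the abstract reports three annotators and does not say which kappa variant was used, and the labels below are invented toy data, not taken from the corpus.

```python
from collections import Counter

def percent_agreement(a, b):
    """Fraction of items on which both annotators chose the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohen_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability both annotators pick the same label at random.
    p_expected = sum(counts_a[l] * counts_b[l] for l in set(a) | set(b)) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical toy labels for ten relationships (positive / negative).
annotator_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "pos"]
annotator_2 = ["pos", "pos", "neg", "pos", "pos", "neg", "pos", "neg", "neg", "pos"]

agreement = percent_agreement(annotator_1, annotator_2)
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"percent agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

Because kappa discounts agreement expected by chance, it is typically lower than the raw percent agreement on the same labels, which is consistent with the pattern in the reported scores.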
A compilation of safety impact information for extractables associated with materials used in pharmaceutical packaging, delivery, administration, and manufacturing systems.

PubMed

Jenke, Dennis; Carlson, Tage

2014-01-01

Demonstrating suitability for intended use is necessary to register packaging, delivery/administration, or manufacturing systems for pharmaceutical products. During their use, such systems may interact with the pharmaceutical product, potentially adding extraneous entities to those products. These extraneous entities, termed leachables, have the potential to affect the product's performance and/or safety. To establish the potential safety impact, drug products and their packaging, delivery, or manufacturing systems are tested for leachables or extractables, respectively. This generally involves testing a sample (either the extract or the drug product) by a means that produces a test method response and then correlating the test method response with the identity and concentration of the entity causing the response. Oftentimes, analytical tests produce responses that cannot readily establish the associated entity's identity. Entities associated with un-interpretable responses are termed unknowns. Scientifically justifiable thresholds are used to establish those individual unknowns that represent an acceptable patient safety risk and thus which do not require further identification and, conversely, those unknowns whose potential safety impact require that they be identified. Such thresholds are typically based on the statistical analysis of datasets containing toxicological information for more or less relevant compounds. This article documents toxicological information for over 540 extractables identified in laboratory testing of polymeric materials used in pharmaceutical applications. Relevant toxicological endpoints, such as NOELs (no observed effects), NOAELs (no adverse effects), TDLOs (lowest published toxic dose), and others were collated for these extractables or their structurally similar surrogates and were systematically assessed to produce a risk index, which represents a daily intake value for life-long intravenous administration. 
This systematic approach uses four uncertainty factors, each assigned a factor of 10, which consider the quality and relevance of the data, differences in route of administration, non-human species to human extrapolations, and inter-individual variation among humans. In addition to the risk index values, all extractables and most of their surrogates were classified for structural safety alerts using Cramer rules and for mutagenicity alerts using an in silico approach (Benigni/Bossa rule base for mutagenicity via Toxtree). Lastly, in vitro mutagenicity data (Ames Salmonella typhimurium and Mouse Lymphoma tests) were collected from available databases (Chemical Carcinogenesis Research Information and Carcinogenic Potency Database). The frequency distributions of the resulting data were established; in general, risk index values were normally distributed around a band ranging from 5 to 20 mg/day. The risk index associated with 95% level of the cumulative distribution plot was approximately 0.1 mg/day. Thirteen extractables in the dataset had individual risk index values less than 0.1 mg/day, although four of these had additional risk indices, based on multiple different toxicological endpoints, above 0.1 mg/day. Additionally, approximately 50% of the extractables were classified in Cramer Class 1 (low risk of toxicity) and approximately 35% were in Cramer Class 3 (no basis to assume safety). Lastly, roughly 20% of the extractables triggered either an in vitro or in silico alert for mutagenicity. When Cramer classifications and the mutagenicity alerts were compared to the risk indices, extractables with safety alerts generally had lower risk index values, although the differences in the risk index data distributions, extractables with or without alerts, were small and subtle. Leachables from packaging systems, manufacturing systems, or delivery devices can accumulate in drug products and potentially affect the drug product. 
Although drug products can be analyzed for leachables (and material extracts can be analyzed for extractables), not all leachables or extractables can be fully identified. Safety thresholds can be used to establish whether the unidentified substances can be deemed to be safe or whether additional analytical efforts need to be made to secure the identities. These thresholds are typically based on the statistical analysis of datasets containing toxicological information for more or less relevant compounds. This article contains safety data for over 500 extractables that were identified in laboratory characterizations of polymers used in pharmaceutical applications. The safety data consists of structural toxicity classifications of the extractables as well as calculated risk indices, where the risk indices were obtained by subjecting toxicological safety data, such as NOELs (no observed effects), NOAELs (no adverse effects), TDLOs (lowest published toxic dose), and others to a systematic evaluation process using appropriate uncertainty factors. Thus the risk index values represent daily exposures for the lifetime intravenous administration of drugs. The frequency distributions of the risk indices and Cramer classifications were examined. The risk index values were normally distributed around a range of 5 to 20 mg/day, and the risk index associated with the 95% level of the cumulative frequency plot was 0.1 mg/day. Approximately 50% of the extractables were in Cramer Class 1 (low risk of toxicity) and approximately 35% were in Cramer Class 3 (high risk of toxicity). Approximately 20% of the extractables produced an in vitro or in silico mutagenicity alert. In general, the distribution of risk index values was not strongly correlated with either the extractables' Cramer classification or their mutagenicity alerts. However, extractables with either in vitro or in silico alerts were somewhat more likely to have low risk index values. © PDA, Inc. 2014.
A review on biogenic synthesis of ZnO nanoparticles using plant extracts and microbes: A prospect towards green chemistry.

PubMed

Ahmed, Shakeel; Annu; Chaudhry, Saif Ali; Ikram, Saiqa

2017-01-01

Nanotechnology is emerging as an important area of research with wide-ranging applications in all fields of science, engineering, medicine, pharmacy, etc. It involves materials having one dimension in the range of 1-100 nm, and their applications. Generally, various techniques are used for the synthesis of nanoparticles (NPs), viz. laser ablation, chemical reduction, milling, sputtering, etc. Some of these conventional techniques, e.g. the chemical reduction method, use hazardous chemicals whose toxicity creates numerous health risks and serious environmental concerns, while other approaches are expensive and need high energy for the synthesis of NPs. In contrast, the biogenic synthesis method is eco-friendly and free of chemical contaminants, which suits biological applications where purity is a concern. In the biological method, different biological entities such as extracts, enzymes or proteins of a natural product are used to reduce and stabilise the NPs as they form. The nature of these biological entities also influences the structure, shape, size and morphology of the synthesized NPs. In this review, the biogenic synthesis of zinc oxide (ZnO) NPs, the synthesis procedures, the mechanism of formation and their various applications are discussed. Various entities such as proteins, enzymes, phytochemicals, etc. available in the natural reductants are responsible for the synthesis of ZnO NPs. Copyright © 2016 Elsevier B.V. All rights reserved.
Detecting experimental techniques and selecting relevant documents for protein-protein interactions from biomedical literature

PubMed Central

2011-01-01

Background: The selection of relevant articles for curation, and linking those articles to experimental techniques confirming the findings became one of the primary subjects of the recent BioCreative III contest. The contest’s Protein-Protein Interaction (PPI) task consisted of two sub-tasks: Article Classification Task (ACT) and Interaction Method Task (IMT). ACT aimed to automatically select relevant documents for PPI curation, whereas the goal of IMT was to recognise the methods used in experiments for identifying the interactions in full-text articles. Results: We proposed and compared several classification-based methods for both tasks, employing rich contextual features as well as features extracted from external knowledge sources. For IMT, a new method that classifies pair-wise relations between every text phrase and candidate interaction method obtained promising results with an F1 score of 64.49%, as tested on the task’s development dataset. We also explored ways to combine this new approach and more conventional, multi-label document classification methods. For ACT, our classifiers exploited automatically detected named entities and other linguistic information. The evaluation results on the BioCreative III PPI test datasets showed that our systems were very competitive: one of our IMT methods yielded the best performance among all participants, as measured by F1 score, Matthews correlation coefficient and AUC iP/R; whereas for ACT, our best classifier was ranked second as measured by AUC iP/R, and also competitive according to other metrics. Conclusions: Our novel approach that converts the multi-class, multi-label classification problem to a binary classification problem showed much promise in IMT. 
Nevertheless, on the test dataset the best performance was achieved by taking the union of the output of this method and that of a multi-class, multi-label document classifier, which indicates that the two types of systems complement each other in terms of recall. For ACT, our system exploited a rich set of features and also obtained encouraging results. We examined the features with respect to their contributions to the classification results, and concluded that contextual words surrounding named entities, as well as the MeSH headings associated with the documents were among the main contributors to the performance. PMID:22151769
Phrase-based Multimedia Information Extraction

DTIC Science & Technology

2006-07-01

names with periods — J. K. Ramirez, T. Grant Smith, Lita S. Jones; names with commas — Hector Jones, Jr.; and conjoined names, such as Sherlock and Judy... Holmes . Using both the type and token metrics (described above), we tested these extensions and improvements to the name identification module on
nala: text mining natural language mutation mentions

PubMed Central

Cejuela, Juan Miguel; Bojchevski, Aleksandar; Uhlig, Carsten; Bekmukhametov, Rustem; Kumar Karn, Sanjeev; Mahmuti, Shpend; Baghudana, Ashish; Dubey, Ankit; Satagopam, Venkata P.; Rost, Burkhard

2017-01-01

Motivation: The extraction of sequence variants from the literature remains an important task. Existing methods primarily target standard (ST) mutation mentions (e.g. ‘E6V’), leaving relevant mentions in natural language (NL) largely untapped (e.g. ‘glutamic acid was substituted by valine at residue 6’). Results: We introduced three new corpora suggesting named-entity recognition (NER) to be more challenging than anticipated: 28–77% of all articles contained mentions only available in NL. Our new method nala captured NL and ST by combining conditional random fields with word embedding features learned unsupervised from the entire PubMed. In our hands, nala substantially outperformed the state-of-the-art. For instance, we compared all unique mentions in new discoveries correctly detected by any of three methods (SETH, tmVar, or nala). Neither SETH nor tmVar discovered anything missed by nala, while nala uniquely tagged 33% of mentions. For NL mentions the corresponding value shot up to 100% nala-only. Availability and Implementation: Source code, API and corpora freely available at: http://tagtog.net/-corpora/IDP4+. Contact: nala@rostlab.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28200120
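The gap between standard and natural-language mentions described in this abstract can be illustrated with a minimal sketch. The regular expression below is an assumption made for illustration only; it is not the pattern used by nala, SETH, or tmVar. It matches compact protein substitutions written as a one-letter amino acid code, a residue position, and a second one-letter code.

```python
import re

# Hypothetical pattern for standard (ST) substitutions such as 'E6V':
# one-letter amino acid code, residue position, one-letter amino acid code.
AMINO = "ACDEFGHIKLMNPQRSTVWY"
ST_MUTATION = re.compile(rf"\b[{AMINO}]\d+[{AMINO}]\b")

def find_st_mentions(text):
    """Return every compact substitution mention found in the text."""
    return ST_MUTATION.findall(text)

st_sentence = "The variant E6V was reported."
nl_sentence = "glutamic acid was substituted by valine at residue 6"

print(find_st_mentions(st_sentence))  # the compact mention is found
print(find_st_mentions(nl_sentence))  # the natural-language mention is missed
```

A pattern of this kind has nothing to anchor on in the second sentence, which is the sort of mention the abstract says requires a learned sequence model rather than surface patterns.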
Information extraction from multi-institutional radiology reports.

PubMed

Hassanpour, Saeed; Langlotz, Curtis P

2016-01-01

The radiology report is the most important source of clinical imaging information. It documents critical information about the patient's health and the radiologist's interpretation of medical findings. It also communicates information to the referring physicians and records that information for future clinical and research use. Although efforts to structure some radiology report information through predefined templates are beginning to bear fruit, a large portion of radiology report information is entered in free text. The free text format is a major obstacle for rapid extraction and subsequent use of information by clinicians, researchers, and healthcare information systems. This difficulty is due to the ambiguity and subtlety of natural language, complexity of described images, and variations among different radiologists and healthcare organizations. As a result, radiology reports are used only once by the clinician who ordered the study and rarely are used again for research and data mining. In this work, machine learning techniques and a large multi-institutional radiology report repository are used to extract the semantics of the radiology report and overcome the barriers to the re-use of radiology report information in clinical research and other healthcare applications. We describe a machine learning system to annotate radiology reports and extract report contents according to an information model. This information model covers the majority of clinically significant contents in radiology reports and is applicable to a wide variety of radiology study types. Our automated approach uses discriminative sequence classifiers for named-entity recognition to extract and organize clinically significant terms and phrases consistent with the information model. We evaluated our information extraction system on 150 radiology reports from three major healthcare organizations and compared its results to a commonly used non-machine learning information extraction method. 
We also evaluated the generalizability of our approach across different organizations by training and testing our system on data from different organizations. Our results show the efficacy of our machine learning approach in extracting the information model's elements (10-fold cross-validation average performance: precision: 87%, recall: 84%, F1 score: 85%) and its superiority and generalizability compared to the common non-machine learning approach (p-value<0.05). Our machine learning information extraction approach provides an effective automatic method to annotate and extract clinically significant information from a large collection of free text radiology reports. This information extraction system can help clinicians better understand the radiology reports and prioritize their review process. In addition, the extracted information can be used by researchers to link radiology reports to information from other data sources such as electronic health records and the patient's genome. Extracted information also can facilitate disease surveillance, real-time clinical decision support for the radiologist, and content-based image retrieval. Copyright © 2015 Elsevier B.V. All rights reserved.
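As a quick consistency check on the reported figures, the F1 score is the harmonic mean of precision and recall. The sketch below applies that formula to the averaged precision and recall quoted above; note that the abstract reports cross-validation averages, so the F1 of the averaged values need not in general equal the average of per-fold F1 scores. The function name is chosen here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Averages reported in the abstract (10-fold cross-validation).
reported_precision = 0.87
reported_recall = 0.84

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.2f}")
```

With these inputs the formula gives a value that rounds to the 85% stated in the abstract.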
Exploiting semantic patterns over biomedical knowledge graphs for predicting treatment and causative relations.

PubMed

Bakal, Gokhan; Talari, Preetham; Kakani, Elijah V; Kavuluru, Ramakanth

2018-06-01

Identifying new potential treatment options for medical conditions that cause human disease burden is a central task of biomedical research. Since not all candidate drugs can be tested with animal and clinical trials, in vitro approaches are first attempted to identify promising candidates. Likewise, identifying different causal relations between biomedical entities is also critical to understand biomedical processes. Generally, natural language processing (NLP) and machine learning are used to predict specific relations between any given pair of entities using the distant supervision approach. Our objective was to build high-accuracy supervised predictive models to predict previously unknown treatment and causative relations between biomedical entities based only on semantic graph pattern features extracted from biomedical knowledge graphs. We used 7000 treats and 2918 causes hand-curated relations from the UMLS Metathesaurus to train and test our models. Our graph pattern features are extracted from simple paths connecting biomedical entities in the SemMedDB graph (based on the well-known SemMedDB database made available by the U.S. National Library of Medicine). Using these graph patterns connecting biomedical entities as features of logistic regression and decision tree models, we computed mean performance measures (precision, recall, F-score) over 100 distinct 80-20% train-test splits of the datasets. For all experiments, we used a positive:negative class imbalance of 1:10 in the test set to model relatively more realistic scenarios. Our models predict treats and causes relations with high F-scores of 99% and 90% respectively. Logistic regression model coefficients also help us identify highly discriminative patterns that have an intuitive interpretation. We are also able to predict some new plausible relations based on false positives that our models scored highly based on our collaborations with two physician co-authors. 
Finally, our decision tree models are able to retrieve over 50% of treatment relations from a recently created external dataset. We employed semantic graph patterns connecting pairs of candidate biomedical entities in a knowledge graph as features to predict treatment/causative relations between them. We provide what we believe is the first evidence in direct prediction of biomedical relations based on graph features. Our work complements lexical pattern based approaches in that the graph patterns can be used as additional features for weakly supervised relation prediction. Copyright © 2018 Elsevier Inc. All rights reserved.
Playing biology's name game: identifying protein names in scientific text.

PubMed

Hanisch, Daniel; Fluck, Juliane; Mevissen, Heinz-Theodor; Zimmer, Ralf

2003-01-01

A growing body of work is devoted to the extraction of protein or gene interaction information from the scientific literature. Yet, the basis for most extraction algorithms, i.e. the specific and sensitive recognition of protein and gene names and their numerous synonyms, has not been adequately addressed. Here we describe the construction of a comprehensive general purpose name dictionary and an accompanying automatic curation procedure based on a simple token model of protein names. We designed an efficient search algorithm to analyze all abstracts in MEDLINE in a reasonable amount of time on standard computers. The parameters of our method are optimized using machine learning techniques. Used in conjunction, these ingredients lead to good search performance. A supplementary web page is available at http://cartan.gmd.de/ProMiner/.
76 FR 37349 - Agency Information Collection Activities; Submission to OMB for Review and Approval; Comment...

Federal Register 2010, 2011, 2012, 2013, 2014

2011-06-27

... Vegetable Oil Production (Renewal) AGENCY: Environmental Protection Agency (EPA). ACTION: Notice. SUMMARY... for Solvent Extraction for Vegetable Oil Production (Renewal). ICR Numbers: EPA ICR Number 1947.05... disclose the information. Respondents/Affected Entities: Owners or operators of vegetable oil production...
Acquired bilateral telangiectatic macules: a distinct clinical entity.

PubMed

Park, Ji-Hye; Lee, Dong Jun; Lee, Yoo-Jung; Jang, Yong Hyun; Kang, Hee Young; Kim, You Chan

2014-09-01

We evaluated 13 distinct patients with multiple telangiectatic pigmented macules confined mostly to the upper arms to determine if the clinical and histopathological features of these cases might represent a specific clinical entity. We retrospectively investigated the clinical, histopathologic, and immunohistochemical features of 13 patients with multiple telangiectatic pigmented macules on the upper arms who presented between January 2003 and December 2012. Epidermal pigmentation, melanogenic activity, melanocyte number, vascularity, epidermal thickness, and perivascular mast cell number of the specimens were evaluated. Clinically, the condition favored middle-aged men. On histopathologic examination, the lesional skin showed capillary proliferation and telangiectasia in the upper dermis. Histochemical and immunohistochemical analysis revealed basal hyperpigmentation and increased melanogenic activity in the lesional skin (P < .05). No significant difference in epidermal thickness or mast cell number was observed between the normal perilesional skin and the lesional skin. The clinical and histopathologic features of these lesions were relatively consistent in all patients. In addition, the features are quite distinct from other diseases. Based on clinical and histologic features, we suggest the name acquired bilateral telangiectatic macules for this new entity.
21 CFR 203.31 - Sample distribution by means other than mail or common carrier (direct delivery by a...

Code of Federal Regulations, 2012 CFR

2012-04-01

... practitioner, to the pharmacy of a hospital or other health care entity, provided that: (1) The manufacturer or... request. (2) A written request for delivery of a drug sample by a representative to the pharmacy of a... paragraph (b) of this section, the name and address of the pharmacy of the hospital or other health care...
21 CFR 203.31 - Sample distribution by means other than mail or common carrier (direct delivery by a...

Code of Federal Regulations, 2013 CFR

2013-04-01

... practitioner, to the pharmacy of a hospital or other health care entity, provided that: (1) The manufacturer or... request. (2) A written request for delivery of a drug sample by a representative to the pharmacy of a... paragraph (b) of this section, the name and address of the pharmacy of the hospital or other health care...
21 CFR 203.31 - Sample distribution by means other than mail or common carrier (direct delivery by a...

Code of Federal Regulations, 2011 CFR

2011-04-01

... practitioner, to the pharmacy of a hospital or other health care entity, provided that: (1) The manufacturer or... request. (2) A written request for delivery of a drug sample by a representative to the pharmacy of a... paragraph (b) of this section, the name and address of the pharmacy of the hospital or other health care...
21 CFR 203.31 - Sample distribution by means other than mail or common carrier (direct delivery by a...

Code of Federal Regulations, 2014 CFR

2014-04-01

... practitioner, to the pharmacy of a hospital or other health care entity, provided that: (1) The manufacturer or... request. (2) A written request for delivery of a drug sample by a representative to the pharmacy of a... paragraph (b) of this section, the name and address of the pharmacy of the hospital or other health care...
USSR Report, Military Affairs

DTIC Science & Technology

1984-12-12

extracted - Unfamiliar names rendered phonetically or transliterated are enclosed in parentheses. Words or names preceded by a question mark and...warrant officers, and six NCO’s and soldiers raised their scores. Moreover, St Lieutenants Ye. Shavrov, S. Boldin , and V. Radionov became specialists...absence—a scientific essay on the selected specialty; notarized copies of the VUZ graduation diploma and an extract from the academic record; service
The Novel Object and Unusual Name (NOUN) Database: A collection of novel images for use in experimental research.

PubMed

Horst, Jessica S; Hout, Michael C

2016-12-01

Many experimental research designs require images of novel objects. Here we introduce the Novel Object and Unusual Name (NOUN) Database. This database contains 64 primary novel object images and additional novel exemplars for ten basic- and nine global-level object categories. The objects' novelty was confirmed by both self-report and a lack of consensus on questions that required participants to name and identify the objects. We also found that object novelty correlated with qualifying naming responses pertaining to the objects' colors. The results from a similarity sorting task (and a subsequent multidimensional scaling analysis on the similarity ratings) demonstrated that the objects are complex and distinct entities that vary along several featural dimensions beyond simply shape and color. A final experiment confirmed that additional item exemplars comprised both sub- and superordinate categories. These images may be useful in a variety of settings, particularly for developmental psychology and other research in the language, categorization, perception, visual memory, and related domains.
GRAM-CNN: a deep learning approach with local context for named entity recognition in biomedical text.

PubMed

Zhu, Qile; Li, Xiaolin; Conesa, Ana; Pereira, Cécile

2018-05-01

Best performing named entity recognition (NER) methods for biomedical literature are based on hand-crafted features or task-specific rules, which are costly to produce and difficult to generalize to other corpora. End-to-end neural networks achieve state-of-the-art performance without hand-crafted features and task-specific knowledge in non-biomedical NER tasks. However, in the biomedical domain, using the same architecture does not yield competitive performance compared with conventional machine learning models. We propose a novel end-to-end deep learning approach for biomedical NER tasks that leverages the local contexts based on n-gram character and word embeddings via Convolutional Neural Network (CNN). We call this approach GRAM-CNN. To automatically label a word, this method uses the local information around a word. Therefore, the GRAM-CNN method does not require any specific knowledge or feature engineering and can be theoretically applied to a wide range of existing NER problems. The GRAM-CNN approach was evaluated on three well-known biomedical datasets containing different BioNER entities. It obtained an F1-score of 87.26% on the Biocreative II dataset, 87.26% on the NCBI dataset and 72.57% on the JNLPBA dataset. Those results put GRAM-CNN in the lead of the biological NER methods. To the best of our knowledge, we are the first to apply CNN based structures to BioNER problems. The GRAM-CNN source code, datasets and pre-trained model are available online at: https://github.com/valdersoul/GRAM-CNN. Contact: andyli@ece.ufl.edu or aconesa@ufl.edu. Supplementary data are available at Bioinformatics online.
GRAM-CNN: a deep learning approach with local context for named entity recognition in biomedical text

PubMed Central

Zhu, Qile; Li, Xiaolin; Conesa, Ana; Pereira, Cécile

2018-01-01

Motivation: Best performing named entity recognition (NER) methods for biomedical literature are based on hand-crafted features or task-specific rules, which are costly to produce and difficult to generalize to other corpora. End-to-end neural networks achieve state-of-the-art performance without hand-crafted features and task-specific knowledge in non-biomedical NER tasks. However, in the biomedical domain, using the same architecture does not yield competitive performance compared with conventional machine learning models. Results: We propose a novel end-to-end deep learning approach for biomedical NER tasks that leverages the local contexts based on n-gram character and word embeddings via Convolutional Neural Network (CNN). We call this approach GRAM-CNN. To automatically label a word, this method uses the local information around a word. Therefore, the GRAM-CNN method does not require any specific knowledge or feature engineering and can be theoretically applied to a wide range of existing NER problems. The GRAM-CNN approach was evaluated on three well-known biomedical datasets containing different BioNER entities. It obtained an F1-score of 87.26% on the Biocreative II dataset, 87.26% on the NCBI dataset and 72.57% on the JNLPBA dataset. Those results put GRAM-CNN in the lead of the biological NER methods. To the best of our knowledge, we are the first to apply CNN-based structures to BioNER problems. Availability and implementation: The GRAM-CNN source code, datasets and pre-trained model are available online at: https://github.com/valdersoul/GRAM-CNN. Contact: andyli@ece.ufl.edu or aconesa@ufl.edu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:29272325

Naming and recognizing famous faces in temporal lobe epilepsy.

PubMed

Glosser, G; Salvucci, A E; Chiaravalloti, N D

2003-07-08

To assess naming and recognition of faces of familiar famous people in patients with epilepsy before and after anterior temporal lobectomy (ATL). Color photographs of famous people were presented for naming and description to 63 patients with temporal lobe epilepsy (TLE) either before or after ATL and to 10 healthy age- and education-matched controls. Spontaneous naming of photographed famous people was impaired in all patient groups, but was most abnormal in patients who had undergone left ATL. When allowed to demonstrate knowledge of the famous faces through verbal descriptions, rather than naming, patients with left TLE, left ATL, and right TLE improved to normal levels, but patients with right ATL were still impaired, suggesting a new deficit in identifying famous faces. Naming of famous people was related to naming of other common objects, verbal memory, and perceptual discrimination of faces. Recognition of the identity of pictured famous people was more related to visuospatial perception and memory. Lesions in anterior regions of the right temporal lobe impair recognition of the identities of familiar faces, as well as the learning of new faces. Lesions in the left temporal lobe, especially in anterior regions, disrupt access to the names of known people, but do not affect recognition of the identities of famous faces. Results are consistent with the hypothesized role of lateralized anterior temporal lobe structures in facial recognition and naming of unique entities.
33 CFR 164.33 - Charts and publications.

Code of Federal Regulations, 2010 CFR

2010-07-01

... Ocean Service, U.S. Army Corps of Engineers, or a river authority that— (i) Are of a large enough scale...) For the area to be transited, the current edition of, or applicable current extract from: (i) Tide tables published by private entities using data provided by the National Ocean Service. (ii) Tidal...
76 FR 76541 - Medicare Program; Availability of Medicare Data for Performance Measurement

Federal Register 2010, 2011, 2012, 2013, 2014

2011-12-07

... Centers for Medicare & Medicaid Services 42 CFR Part 401 Medicare Program; Availability of Medicare Data...; Availability of Medicare Data for Performance Measurement AGENCY: Centers for Medicare & Medicaid Services (CMS... regarding the release and use of standardized extracts of Medicare claims data for qualified entities to...
Efficient Execution Methods of Pivoting for Bulk Extraction of Entity-Attribute-Value-Modeled Data

PubMed Central

Luo, Gang; Frey, Lewis J.

2017-01-01

Entity-attribute-value (EAV) tables are widely used to store data in electronic medical records and clinical study data management systems. Before they can be used by various analytical (e.g., data mining and machine learning) programs, EAV-modeled data usually must be transformed into conventional relational table format through pivot operations. This time-consuming and resource-intensive process is often performed repeatedly on a regular basis, e.g., to provide a daily refresh of the content in a clinical data warehouse. Thus, it would be beneficial to make pivot operations as efficient as possible. In this paper, we present three techniques for improving the efficiency of pivot operations: 1) filtering out EAV tuples related to unneeded clinical parameters early on; 2) supporting pivoting across multiple EAV tables; and 3) conducting multi-query optimization. We demonstrate the effectiveness of our techniques through implementation. We show that our optimized execution method of pivoting using these techniques significantly outperforms the current basic execution method of pivoting. Our techniques can be used to build a data extraction tool to simplify the specification of and improve the efficiency of extracting data from the EAV tables in electronic medical records and clinical study data management systems. PMID:25608318
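
As an illustration of the pivot operation described above, the following is a minimal sketch, not the authors' implementation. It shows the first technique only (filtering out EAV tuples for unneeded parameters before pivoting). The function name `pivot_eav`, the column name `entity`, and the sample attribute names are all hypothetical.

```python
from collections import defaultdict


def pivot_eav(rows, wanted_attributes):
    """Pivot (entity, attribute, value) tuples into one record per entity.

    Tuples whose attribute is not in wanted_attributes are dropped before
    any grouping happens, which is the early-filtering idea. If an entity
    has several tuples for the same attribute, the last value wins
    (an assumption of this sketch).
    """
    wanted = set(wanted_attributes)
    table = defaultdict(dict)
    for entity, attribute, value in rows:
        if attribute not in wanted:
            continue  # early filter: skip unneeded clinical parameters
        table[entity][attribute] = value

    columns = list(wanted_attributes)
    # Missing attributes become None so every record has the same columns.
    return [
        {"entity": entity, **{c: attrs.get(c) for c in columns}}
        for entity, attrs in sorted(table.items())
    ]


# Hypothetical sample data in EAV form.
eav_rows = [
    (1, "heart_rate", 72),
    (1, "temperature", 36.8),
    (1, "note_id", "n-17"),  # not requested, so filtered out
    (2, "heart_rate", 88),
]
pivoted = pivot_eav(eav_rows, ["heart_rate", "temperature"])
for record in pivoted:
    print(record)
```

Each output record corresponds to one row of a conventional relational table; pivoting across multiple EAV tables and multi-query optimization, the other two techniques named in the abstract, are not covered by this sketch.
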
The contribution of the left anterior ventrolateral temporal lobe to the retrieval of personal semantics.

PubMed

Grilli, Matthew D; Bercel, John J; Wank, Aubrey A; Rapcsak, Steven Z

2018-06-04

Autobiographical facts and personal trait knowledge are conceptualized as distinct types of personal semantics, but the cognitive and neural mechanisms that separate them remain underspecified. One distinction may be their level of specificity, with autobiographical facts reflecting idiosyncratic conceptual knowledge and personal traits representing basic level category knowledge about the self. Given the critical role of the left anterior ventrolateral temporal lobe (AVTL) in the storage and retrieval of semantic information about unique entities, we hypothesized that knowledge of autobiographical facts may depend on the integrity of this region to a greater extent than personal traits. To provide neuropsychological evidence relevant to this issue, we investigated personal semantics, semantic knowledge of non-personal unique entities, and episodic memory in two individuals with well-defined left (MK) versus right (DW) AVTL lesions. Relative to controls, MK demonstrated preserved personal trait knowledge but impaired "experience-far" (i.e., spatiotemporal independent) autobiographical fact knowledge, semantic memory for non-personal unique entities, and episodic memory. In contrast, both experience-far autobiographical facts and personal traits were spared in DW, whereas episodic memory and aspects of semantic memory for non-personal unique entities were impaired. These findings support the notion that autobiographical facts and personal traits have distinct cognitive features and neural mechanisms. They also suggest a common organizing principle for personal and non-personal semantics, namely the specificity of such knowledge to an entity, which is reflected in the contribution of the left AVTL to retrieval. Copyright © 2018 Elsevier Ltd. All rights reserved.
Biological network extraction from scientific literature: state of the art and challenges.

PubMed

Li, Chen; Liakata, Maria; Rebholz-Schuhmann, Dietrich

2014-09-01

Networks of molecular interactions explain complex biological processes, and all known information on molecular events is contained in a number of public repositories including the scientific literature. Metabolic and signalling pathways are often viewed separately, even though both types are composed of interactions involving proteins and other chemical entities. It is necessary to be able to combine data from all available resources to judge the functionality, complexity and completeness of any given network overall, but especially the full integration of relevant information from the scientific literature is still an ongoing and complex task. Currently, the text-mining research community is steadily moving towards processing the full body of the scientific literature by making use of rich linguistic features such as full text parsing, to extract biological interactions. The next step will be to combine these with information from scientific databases to support hypothesis generation for the discovery of new knowledge and the extension of biological networks. The generation of comprehensive networks requires technologies such as entity grounding, coordination resolution and co-reference resolution, which are not fully solved and are required to further improve the quality of results. Here, we analyse the state of the art for the extraction of network information from the scientific literature and the evaluation of extraction methods against reference corpora, discuss challenges involved and identify directions for future research. © The Author 2013. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
Prescription Drug Benefits: Cost Management Issues for Medicare

PubMed Central

Fox, Peter D.

2003-01-01

Little attention has been devoted in policy circles to how Medicare would manage an outpatient prescription drug benefit. This article first discusses the role of the pharmacy benefits manager (PBM), the entity that processes claims and otherwise helps administer the benefit. It then discusses the major decisions that will be necessary regarding such matters as: which drugs should be covered; how broad the pharmacy network should be; whether there should be incentives to obtain generic rather than brand-name drugs when available; whether, for drugs with no generic equivalent, there should be incentives to obtain less expensive, medically appropriate brand-name drugs; and how prescription drug utilization should be managed. PMID:15124374
Discovering body site and severity modifiers in clinical texts

PubMed Central

Dligach, Dmitriy; Bethard, Steven; Becker, Lee; Miller, Timothy; Savova, Guergana K

2014-01-01

Objective: To research computational methods for discovering body site and severity modifiers in clinical texts. Methods: We cast the task of discovering body site and severity modifiers as a relation extraction problem in the context of a supervised machine learning framework. We utilize rich linguistic features to represent the pairs of relation arguments and delegate the decision about the nature of the relationship between them to a support vector machine model. We evaluate our models using two corpora that annotate body site and severity modifiers. We also compare the model performance to a number of rule-based baselines. We conduct cross-domain portability experiments. In addition, we carry out feature ablation experiments to determine the contribution of various feature groups. Finally, we perform error analysis and report the sources of errors. Results: The performance of our method for discovering body site modifiers achieves F1 of 0.740–0.908 and our method for discovering severity modifiers achieves F1 of 0.905–0.929. Discussion: Results indicate that both methods perform well on both in-domain and out-domain data, approaching the performance of human annotators. The most salient features are token and named entity features, although syntactic dependency features also contribute to the overall performance. The dominant sources of errors are infrequent patterns in the data and inability of the system to discern deeper semantic structures. Conclusions: We investigated computational methods for discovering body site and severity modifiers in clinical texts. Our best system is released open source as part of the clinical Text Analysis and Knowledge Extraction System (cTAKES). PMID:24091648
Discovering body site and severity modifiers in clinical texts.

PubMed

Dligach, Dmitriy; Bethard, Steven; Becker, Lee; Miller, Timothy; Savova, Guergana K

2014-01-01

To research computational methods for discovering body site and severity modifiers in clinical texts. We cast the task of discovering body site and severity modifiers as a relation extraction problem in the context of a supervised machine learning framework. We utilize rich linguistic features to represent the pairs of relation arguments and delegate the decision about the nature of the relationship between them to a support vector machine model. We evaluate our models using two corpora that annotate body site and severity modifiers. We also compare the model performance to a number of rule-based baselines. We conduct cross-domain portability experiments. In addition, we carry out feature ablation experiments to determine the contribution of various feature groups. Finally, we perform error analysis and report the sources of errors. The performance of our method for discovering body site modifiers achieves F1 of 0.740-0.908 and our method for discovering severity modifiers achieves F1 of 0.905-0.929. Results indicate that both methods perform well on both in-domain and out-domain data, approaching the performance of human annotators. The most salient features are token and named entity features, although syntactic dependency features also contribute to the overall performance. The dominant sources of errors are infrequent patterns in the data and inability of the system to discern deeper semantic structures. We investigated computational methods for discovering body site and severity modifiers in clinical texts. Our best system is released open source as part of the clinical Text Analysis and Knowledge Extraction System (cTAKES).
Protective effects of long-term administration of Ziziphus jujuba fruit extract on cardiovascular responses in L-NAME hypertensive rats.

PubMed

Mohebbati, Reza; Bavarsad, Kosar; Rahimi, Maryam; Rakhshandeh, Hasan; Khajavi Rad, Abolfazl; Shafei, Mohammad Naser

2018-01-01

Ziziphus jujuba stimulates the release of nitric oxide (NO). Because NO is involved in cardiovascular regulation, this study evaluated the effects of a hydroalcoholic extract of Z. jujuba on cardiovascular responses in acute NG-nitro-L-arginine methyl ester (L-NAME) hypertensive rats. Rats were divided into 6 groups (n=6): 1) saline; 2) L-NAME, which received L-NAME (10 mg/kg) intravenously; 3) sodium nitroprusside (SNP) (50 µg/kg)+L-NAME, which received SNP before L-NAME; and 4-6) three Z. jujuba groups (100, 200 and 400 mg/kg) that were treated for four weeks and injected with L-NAME on the 28th day. The femoral artery and vein were cannulated for recording cardiovascular responses and drug injection, respectively. Systolic blood pressure (SBP), mean arterial pressure (MAP) and heart rate (HR) were recorded continuously. Maximal changes (∆) of SBP, MAP and HR were calculated and compared to control and L-NAME groups. In the L-NAME group, maximal ΔSBP (L-NAME: 44.15±4.0 mmHg vs control: 0.71±2.1 mmHg) and ΔMAP (L-NAME: 40.8±4.0 mmHg vs control: 0.57±1.6 mmHg) significantly increased (p<0.001 in both) but ∆HR was not significant as compared to control (p>0.05). All doses of Z. jujuba attenuated maximal ∆SBP and ∆MAP induced by L-NAME but only the lowest dose (100 mg/kg) had significant effects (ΔSBP: 20.36±5.6 mmHg vs L-NAME: 44.1±4.0 mmHg and ΔMAP: 20.8±4.5 mmHg vs L-NAME: 40.8±3.8 mmHg (p<0.05 to p<0.01)). The ∆HR at three doses was not significantly different from that of the L-NAME group (p>0.05). Because long-term consumption of Z. jujuba extract, especially its lowest dose, attenuated cardiovascular responses induced by L-NAME, we suggest that Z. jujuba has potential beneficial effects in prevention of hypertension induced by NO deficiency.
Text mining facilitates database curation - extraction of mutation-disease associations from Bio-medical literature.

PubMed

Ravikumar, Komandur Elayavilli; Wagholikar, Kavishwar B; Li, Dingcheng; Kocher, Jean-Pierre; Liu, Hongfang

2015-06-06

Advances in next generation sequencing technology have accelerated the pace of individualized medicine (IM), which aims to incorporate genetic/genomic information into medicine. One immediate need in interpreting sequencing data is the assembly of information about genetic variants and their corresponding associations with other entities (e.g., diseases or medications). Even with dedicated effort to capture such information in biological databases, much of this information remains 'locked' in the unstructured text of biomedical publications. There is a substantial lag between the publication and the subsequent abstraction of such information into databases. Multiple text mining systems have been developed, but most of them focus on sentence level association extraction, with performance evaluation based on gold standard text annotations specifically prepared for text mining systems. We developed and evaluated a text mining system, MutD, which extracts protein mutation-disease associations from MEDLINE abstracts by incorporating discourse level analysis, using a benchmark data set extracted from curated database records. MutD achieves an F-measure of 64.3% for reconstructing protein mutation-disease associations in curated database records. The discourse level analysis component of MutD contributed to a gain of more than 10% in F-measure when compared against sentence level association extraction. Our error analysis indicates that 23 of the 64 precision errors are true associations that were not captured by database curators and 68 of the 113 recall errors are caused by the absence of associated disease entities in the abstract. After adjusting for the defects in the curated database, the revised F-measure of MutD in association detection reaches 81.5%. Our quantitative analysis reveals that MutD can effectively extract protein mutation-disease associations when benchmarking based on curated database records. The analysis also demonstrates that incorporating discourse level analysis significantly improved the performance of extracting the protein-mutation-disease association. Future work includes the extension of MutD for full text articles.
[Optimization of supercritical fluid extraction of bioactive components in Ligusticum chuanxiong by orthogonal array design].

PubMed

Hu, Li-Cui; Wu, Xun; Yang, Xue-Dong

2013-10-01

With the yields of ferulic acid, coniferylferulate, Z-ligustilide, senkyunolide A, butylidenephthalide, butylphthalide, senkyunolide I, senkyunolide H, riligustilide, levistolide A, and total pharmacologically active ingredient as evaluation indexes, the extraction of Ligusticum chuanxiong by supercritical fluid technology was investigated through an orthogonal experiment L9(3^4). Four factors, namely temperature, pressure, flow rate of carbon dioxide, and co-solvent concentration of the supercritical fluid, were investigated and optimized. Under the optimized conditions, namely a temperature of 65 °C, a pressure of 35 MPa, a CO2 flow rate of 1 L·min^-1, and a co-solvent concentration of 8%, supercritical fluid extraction achieved a better yield than conventional reflux extraction using methanol. The supercritical fluid extraction process was validated to be stable and reliable.
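
For readers unfamiliar with the design, an L9 orthogonal array covers four three-level factors in nine runs instead of the 81 a full factorial would need. The sketch below builds one such array with a standard modular-arithmetic construction and checks its defining property. It is an illustration only: the row order need not match the table the authors used, the factor names are labels chosen here, and levels are given as indices 0 to 2 because the abstract reports only the optimized settings, not the tested levels.

```python
from itertools import combinations, product

# Labels for the four factors named in the abstract (identifiers chosen here).
FACTORS = ["temperature", "pressure", "co2_flow_rate", "cosolvent_concentration"]


def l9_array():
    """Return nine runs, each a tuple of four level indices in {0, 1, 2}.

    Columns 3 and 4 are derived from the first two as (a + b) mod 3 and
    (a + 2b) mod 3, a common way to construct a 9-run, 4-factor array.
    """
    return [
        (a, b, (a + b) % 3, (a + 2 * b) % 3)
        for a, b in product(range(3), repeat=2)
    ]


def is_orthogonal(runs):
    """Check that every pair of columns contains all 9 level pairs.

    With exactly nine runs, nine distinct pairs means each pair of levels
    occurs exactly once, which is the balance property of the design.
    """
    for i, j in combinations(range(len(runs[0])), 2):
        if len({(run[i], run[j]) for run in runs}) != 9:
            return False
    return True


runs = l9_array()
for run in runs:
    print(dict(zip(FACTORS, run)))
print("orthogonal:", is_orthogonal(runs))
```

Because every pair of factors is balanced, the mean yield at each level of one factor can be compared without being confounded by the levels of the others, which is what makes the nine-run screening workable.
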
Geographic names of the Antarctic

USGS Publications Warehouse

Alberts, Fred G.

1995-01-01

This gazetteer contains 12,710 names approved by the United States Board on Geographic Names and the Secretary of the Interior for features in Antarctica and the area extending northward to the Antarctic Convergence. Included in this geographic area, the Antarctic region, are the off-lying South Shetland Islands, the South Orkney Islands, the South Sandwich Islands, South Georgia, Bouvetøya, Heard Island, and the Balleny Islands. These names have been approved for use by U.S. Government agencies. Their use by the Antarctic specialist and the public is highly recommended for the sake of accuracy and uniformity. This publication, which supersedes previous Board gazetteers or lists for the area, contains names approved as recently as December 1994. The basic name coverage of this gazetteer corresponds to that of maps at the scale of 1:250,000 or larger for coastal Antarctica, the off-lying islands, and isolated mountains and ranges of the continent. Much of the interior of Antarctica is a featureless ice plateau. That area has been mapped at a smaller scale and is nearly devoid of toponyms. All of the names are for natural features, such as mountains, glaciers, peninsulas, capes, bays, islands, and subglacial entities. The names of scientific stations have not been listed alphabetically, but they may appear in the texts of some decisions. For the names of submarine features, reference should be made to the Gazetteer of Undersea Features, 4th edition, U.S. Board on Geographic Names, 1990.
Detection of IUPAC and IUPAC-like chemical names.

PubMed

Klinger, Roman; Kolárik, Corinna; Fluck, Juliane; Hofmann-Apitius, Martin; Friedrich, Christoph M

2008-07-01

Chemical compounds such as small signal molecules or other biologically active chemical substances are an important entity class in life science publications and patents. Several representations and nomenclatures for chemicals exist, including SMILES, InChI, IUPAC and trivial names. Only SMILES and InChI names allow a direct structure search, but in biomedical texts trivial names and IUPAC-like names are used more frequently. While trivial names can be found with a dictionary-based approach and in this way mapped to their corresponding structures, it is not possible to enumerate all IUPAC names. In this work, we present a new machine learning approach based on conditional random fields (CRF) to find mentions of IUPAC and IUPAC-like names in scientific text, as well as its evaluation and the conversion rate with available name-to-structure tools. We present an IUPAC name recognizer with an F1 measure of 85.6% on a MEDLINE corpus. The evaluation of different CRF orders and offset conjunction orders demonstrates the importance of these parameters. An evaluation of hand-selected patent sections containing large enumerations and terms with mixed nomenclature shows a good performance on these cases (F1 measure 81.5%). The remaining recognition problems concern detecting the correct borders of the typically long terms, especially when they occur in parentheses or enumerations. We demonstrate the scalability of our implementation by providing results from a full MEDLINE run. We plan to publish the corpora, annotation guideline as well as the conditional random field model as a UIMA component.
Naming, labeling, and packaging of pharmaceuticals.

PubMed

Kenagy, J W; Stein, G C

2001-11-01

The problem of medical errors associated with the naming, labeling, and packaging of pharmaceuticals is discussed. Sound-alike and look-alike drug names and packages can lead pharmacists and nurses to unintended interchanges of drugs that can result in patient injury or death. The existing medication-use system is flawed because its safety depends on human perfection. Simplicity, standardization, differentiation, lack of duplication, and unambiguous communication are human factors concepts that are relevant to the medication-use process. These principles have often been ignored in drug naming, labeling, and packaging. Instead, current methods are based on long-standing commercial considerations and bureaucratic procedures. The process for naming a marketable drug is lengthy and complex and involves submission of a new chemical entity and patent application, generic naming, brand naming, FDA review, and final approval. Drug companies seek the fastest possible approval and may believe that the incremental benefit of human factors evaluation is small. "Trade dress" is the concept that underlies labeling and packaging issues for the drug industry. Drug companies are resistant to changing trade dress and brand names. Although a variety of private-sector organizations have called for reforms in drug naming, labeling, and packaging, and standards have been proposed, the problem remains. Drug names, labels, and packages are not selected and designed in accordance with human factors principles. FDA standards do not require application of these principles, the drug industry has struggled with change, and private-sector initiatives have had only limited success.
Iran Sanctions

DTIC Science & Technology

2012-02-10

Korea-Syria Non-Proliferation Act (INKSNA) authorizes sanctions on foreign persons (individuals or corporations, not countries or governments) that...financial system and had asked the German government to order it closed. On May 23, 2011, the EU named EIH and about 100 other entities as Iran...The energy sector provides nearly 70% of Iran’s government revenues. Iran’s alarm stems from the potential loss of oil sales as a result of: • A
Actinic comedonal plaque.

PubMed

Eastern, J S; Martin, S

1980-12-01

Solitary plaques developed on the sun-exposed and damaged skin of five elderly, fair-skinned individuals. The lesions, erythematous to bluish confluent nodules and plaques with a cribriform appearance and comedone-like structures, presented a distinctive histologic picture of dilated, keratin-filled follicles within a matrix of amorphous, damaged collagen. We believe these cases demonstrate a distinct entity within the realm of actinic dermatoses, for which the name "actinic comedonal plaque" seems appropriate.
Input that Contradicts Young Children's Strategy for Mapping Novel Words Affects Their Phonological and Semantic Interpretation of Other Novel Words.

ERIC Educational Resources Information Center

Jarvis, Lorna Hernandez; Merriman, William E.; Barnett, Michelle; Hanba, Jessica; Van Haitsma, Kylee S.

2004-01-01

Children tend to choose an entity they cannot already label, rather than one they can, as the likely referent of a novel noun. The effect of input that contradicts this strategy on the interpretation of other novel nouns was investigated. In pre- and posttests, 4-year-olds were asked to judge whether novel nouns referred to "name-similar" familiar…
JPRS Report, East Europe

DTIC Science & Technology

1990-05-04

played a decisive role in preserving Romanian revolution’s victory. This is because the forces with an national identity. But now they are branding ...economy into a prosperous eco- up against private entities offering virtually identical nomic system will ultimately be a success. services, perhaps at a...tures in light industry and in brand-name consumer For CEMA countries this is a relatively new format, and goods production (the West-German Salamander
Energy concern list. [List of 22,900 names of persons, businesses, companies, corporations, etc. engaged in energy-related activities

DOE Office of Scientific and Technical Information (OSTI.GOV)

Not Available

1978-03-01

Subsection 603(a) of the Department of Energy Organization Act (P.L. 95-91, August 4, 1977) requires non-exempt employees to disclose the amount and source of income received from energy concerns (as defined in subsection 602(b) of the Act) by themselves, their spouses, or dependents and the identity and value of interests knowingly held in such concerns. In addition, supervisory employees (as defined in subsection 601(a) of the Act) are prohibited by subsection 601(a) of the Act from knowingly receiving compensation from or holding any official relation with any energy concern, or owning stock or bonds of any energy concern, or having any pecuniary interest therein. Subsection 601(c)(1) of the DOE Organization Act requires that a list of entities determined to be energy concerns be prepared and periodically published. This listing was prepared and is published to comply with that provision of law. The approximately 22,900 names appearing in this list are persons, businesses, companies, corporations and other entities, engaged in energy related activities, as described in section 601(b) of the DOE Organization Act. This list is based on information available as of February 24, 1978.
