Sample records for processing nlp tasks

  1. Semi-Automated Methods for Refining a Domain-Specific Terminology Base

    DTIC Science & Technology

    2011-02-01

    only as a resource for written and oral translation, but also for Natural Language Processing (NLP) applications, text retrieval, document indexing...Natural Language Processing (NLP) applications, text retrieval, document indexing, and other knowledge management tasks. The objective of this...also for Natural Language Processing (NLP) applications, text retrieval (1), document indexing, and other knowledge management tasks. The National

  2. Development and evaluation of task-specific NLP framework in China.

    PubMed

    Ge, Caixia; Zhang, Yinsheng; Huang, Zhenzhen; Jia, Zheng; Ju, Meizhi; Duan, Huilong; Li, Haomin

    2015-01-01

    Natural language processing (NLP) has been designed to convert narrative text into structured data. Although some general NLP architectures have been developed, a task-specific NLP framework to facilitate the effective use of data is still a challenge in lexical resource limited regions, such as China. The purpose of this study is to design and develop a task-specific NLP framework to extract targeted information from particular documents by adopting dedicated algorithms on current limited lexical resources. In this framework, a shared and evolving ontology mechanism was designed. The result has shown that such a free text driven platform will accelerate the NLP technology acceptance in China.

  3. Community challenges in biomedical text mining over 10 years: success, failure and the future

    PubMed Central

    Huang, Chung-Chi; Lu, Zhiyong

    2016-01-01

    One effective way to improve the state of the art is through competitions. Following the success of the Critical Assessment of protein Structure Prediction (CASP) in bioinformatics research, a number of challenge evaluations have been organized by the text-mining research community to assess and advance natural language processing (NLP) research for biomedicine. In this article, we review the different community challenge evaluations held from 2002 to 2014 and their respective tasks. Furthermore, we examine these challenge tasks through their targeted problems in NLP research and biomedical applications, respectively. Next, we describe the general workflow of organizing a Biomedical NLP (BioNLP) challenge and involved stakeholders (task organizers, task data producers, task participants and end users). Finally, we summarize the impact and contributions by taking into account different BioNLP challenges as a whole, followed by a discussion of their limitations and difficulties. We conclude with future trends in BioNLP challenge evaluations. PMID:25935162

  4. Usability Evaluation of an Unstructured Clinical Document Query Tool for Researchers.

    PubMed

    Hultman, Gretchen; McEwan, Reed; Pakhomov, Serguei; Lindemann, Elizabeth; Skube, Steven; Melton, Genevieve B

    2018-01-01

    Natural Language Processing - Patient Information Extraction for Researchers (NLP-PIER) was developed for clinical researchers for self-service Natural Language Processing (NLP) queries with clinical notes. This study conducted a user-centered analysis with clinical researchers to gain insight into NLP-PIER's usability and to understand the needs of clinical researchers when using an application for searching clinical notes. Clinical researcher participants (n=11) completed tasks using the system's two existing search interfaces and completed a set of surveys and an exit interview. Quantitative data including time on task, task completion rate, and survey responses were collected. Interviews were analyzed qualitatively. Survey scores, time on task and task completion proportions varied widely. Qualitative analysis indicated that participants found the system to be useful and usable in specific projects. This study identified several usability challenges and our findings will guide the improvement of NLP-PIER's interfaces.

  5. Community challenges in biomedical text mining over 10 years: success, failure and the future.

    PubMed

    Huang, Chung-Chi; Lu, Zhiyong

    2016-01-01

    One effective way to improve the state of the art is through competitions. Following the success of the Critical Assessment of protein Structure Prediction (CASP) in bioinformatics research, a number of challenge evaluations have been organized by the text-mining research community to assess and advance natural language processing (NLP) research for biomedicine. In this article, we review the different community challenge evaluations held from 2002 to 2014 and their respective tasks. Furthermore, we examine these challenge tasks through their targeted problems in NLP research and biomedical applications, respectively. Next, we describe the general workflow of organizing a Biomedical NLP (BioNLP) challenge and involved stakeholders (task organizers, task data producers, task participants and end users). Finally, we summarize the impact and contributions by taking into account different BioNLP challenges as a whole, followed by a discussion of their limitations and difficulties. We conclude with future trends in BioNLP challenge evaluations. Published by Oxford University Press 2015. This work is written by US Government employees and is in the public domain in the US.

  6. Natural language processing systems for capturing and standardizing unstructured clinical information: A systematic review.

    PubMed

    Kreimeyer, Kory; Foster, Matthew; Pandey, Abhishek; Arya, Nina; Halford, Gwendolyn; Jones, Sandra F; Forshee, Richard; Walderhaug, Mark; Botsis, Taxiarchis

    2017-09-01

    We followed a systematic approach based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses to identify existing clinical natural language processing (NLP) systems that generate structured information from unstructured free text. Seven literature databases were searched with a query combining the concepts of natural language processing and structured data capture. Two reviewers screened all records for relevance during two screening phases, and information about clinical NLP systems was collected from the final set of papers. A total of 7149 records (after removing duplicates) were retrieved and screened, and 86 were determined to fit the review criteria. These papers contained information about 71 different clinical NLP systems, which were then analyzed. The NLP systems address a wide variety of important clinical and research tasks. Certain tasks are well addressed by the existing systems, while others remain as open challenges that only a small number of systems attempt, such as extraction of temporal information or normalization of concepts to standard terminologies. This review has identified many NLP systems capable of processing clinical free text and generating structured output, and the information collected and evaluated here will be important for prioritizing development of new approaches for clinical NLP. Copyright © 2017 Elsevier Inc. All rights reserved.

  7. UMLS content views appropriate for NLP processing of the biomedical literature vs. clinical text.

    PubMed

    Demner-Fushman, Dina; Mork, James G; Shooshan, Sonya E; Aronson, Alan R

    2010-08-01

    Identification of medical terms in free text is a first step in such Natural Language Processing (NLP) tasks as automatic indexing of biomedical literature and extraction of patients' problem lists from the text of clinical notes. Many tools developed to perform these tasks use biomedical knowledge encoded in the Unified Medical Language System (UMLS) Metathesaurus. We continue our exploration of automatic approaches to creation of subsets (UMLS content views) which can support NLP processing of either the biomedical literature or clinical text. We found that suppression of highly ambiguous terms in the conservative AutoFilter content view can partially replace manual filtering for literature applications, and suppression of two character mappings in the same content view achieves 89.5% precision at 78.6% recall for clinical applications. Published by Elsevier Inc.
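
    The abstract above reports precision and recall separately. As a small worked example, the two figures can be combined into a balanced F-measure (F1). The F1 value is derived here for illustration and is not reported in the abstract; the function name is our own.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall quoted in the abstract for clinical applications.
f1 = f1_score(0.895, 0.786)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.837"
```

    The harmonic mean stays close to the lower of the two inputs, so a system cannot reach a high F1 by trading away recall for precision or the reverse.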

  8. Automating curation using a natural language processing pipeline

    PubMed Central

    Alex, Beatrice; Grover, Claire; Haddow, Barry; Kabadjov, Mijail; Klein, Ewan; Matthews, Michael; Tobin, Richard; Wang, Xinglong

    2008-01-01

    Background: The tasks in BioCreative II were designed to approximate some of the laborious work involved in curating biomedical research papers. The approach to these tasks taken by the University of Edinburgh team was to adapt and extend the existing natural language processing (NLP) system that we have developed as part of a commercial curation assistant. Although this paper concentrates on using NLP to assist with curation, the system can be equally employed to extract types of information from the literature that is immediately relevant to biologists in general. Results: Our system was among the highest performing on the interaction subtasks, and competitive performance on the gene mention task was achieved with minimal development effort. For the gene normalization task, a string matching technique that can be quickly applied to new domains was shown to perform close to average. Conclusion: The technologies being developed were shown to be readily adapted to the BioCreative II tasks. Although high performance may be obtained on individual tasks such as gene mention recognition and normalization, and document classification, tasks in which a number of components must be combined, such as detection and normalization of interacting protein pairs, are still challenging for NLP systems. PMID:18834488

  9. Rapid Training of Information Extraction with Local and Global Data Views

    DTIC Science & Technology

    2012-05-01

    4.1 An example of words and their bit string representations. Bold ones are transliterated Arabic words...Natural Language Processing (NLP) community faces new tasks and new domains all the time. Without enough labeled data of a new task or a new domain to...conduct supervised learning, semi-supervised learning is particularly attractive to NLP researchers since it only requires a handful of labeled examples

  10. Building gold standard corpora for medical natural language processing tasks.

    PubMed

    Deleger, Louise; Li, Qi; Lingren, Todd; Kaiser, Megan; Molnar, Katalin; Stoutenborough, Laura; Kouril, Michal; Marsolo, Keith; Solti, Imre

    2012-01-01

    We present the construction of three annotated corpora to serve as gold standards for medical natural language processing (NLP) tasks. Clinical notes from the medical record, clinical trial announcements, and FDA drug labels are annotated. We report high inter-annotator agreements (overall F-measures between 0.8467 and 0.9176) for the annotation of Personal Health Information (PHI) elements for a de-identification task and of medications, diseases/disorders, and signs/symptoms for an information extraction (IE) task. The annotated corpora of clinical trials and FDA labels will be publicly released, and to facilitate translational NLP tasks that require cross-corpora interoperability (e.g. clinical trial eligibility screening), their annotation schemas are aligned with a large-scale, NIH-funded clinical text annotation project.
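
    Inter-annotator agreement expressed as an F-measure, as in the abstract above, is commonly computed by treating one annotator as the reference and the other as the response. The sketch below assumes exact matching on (start, end, label) spans; the abstract does not state the matching criterion the authors used, and the spans and labels here are invented for illustration.

```python
from typing import Set, Tuple

# An annotation is (start offset, end offset, label). Hypothetical format.
Span = Tuple[int, int, str]


def agreement_f_measure(annotator_a: Set[Span], annotator_b: Set[Span]) -> float:
    """Pairwise F-measure, with annotator A as reference and B as response.

    Under exact matching the result is symmetric in A and B.
    """
    if not annotator_a and not annotator_b:
        return 1.0  # both annotators marked nothing: full agreement
    matched = len(annotator_a & annotator_b)
    if matched == 0:
        return 0.0
    precision = matched / len(annotator_b)
    recall = matched / len(annotator_a)
    return 2 * precision * recall / (precision + recall)


# Invented example: the annotators agree on two of three spans each.
a = {(0, 5, "MEDICATION"), (10, 18, "DISEASE"), (25, 30, "SYMPTOM")}
b = {(0, 5, "MEDICATION"), (10, 18, "DISEASE"), (40, 45, "SYMPTOM")}
print(round(agreement_f_measure(a, b), 4))  # prints 0.6667
```

    F-measure is often preferred over chance-corrected statistics for span annotation because the number of negative cases (text that neither annotator marked) is not well defined.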

  11. A Hybrid Approach to Clinical Question Answering

    DTIC Science & Technology

    2014-11-01

    participation in TREC, we submitted a single run using a hybrid Natural Language Processing (NLP)-driven approach to accomplish the given task. Evaluation re...for the CDS track uses a variety of NLP-based techniques to address the clinical questions provided. We present a description of our approach, and...discuss our experimental setup, results and evaluation in the subsequent sections. 2 Description of Our Approach Our hybrid NLP-driven method presents a

  12. Ground Truth Creation for Complex Clinical NLP Tasks - an Iterative Vetting Approach and Lessons Learned.

    PubMed

    Liang, Jennifer J; Tsou, Ching-Huei; Devarakonda, Murthy V

    2017-01-01

    Natural language processing (NLP) holds the promise of effectively analyzing patient record data to reduce cognitive load on physicians and clinicians in patient care, clinical research, and hospital operations management. A critical need in developing such methods is the "ground truth" dataset needed for training and testing the algorithms. Beyond localizable, relatively simple tasks, ground truth creation is a significant challenge because medical experts, just as physicians in patient care, have to assimilate vast amounts of data in EHR systems. To mitigate potential inaccuracies of the cognitive challenges, we present an iterative vetting approach for creating the ground truth for complex NLP tasks. In this paper, we present the methodology, and report on its use for an automated problem list generation task, its effect on the ground truth quality and system accuracy, and lessons learned from the effort.

  13. CLAMP - a toolkit for efficiently building customized clinical natural language processing pipelines.

    PubMed

    Soysal, Ergin; Wang, Jingqi; Jiang, Min; Wu, Yonghui; Pakhomov, Serguei; Liu, Hongfang; Xu, Hua

    2017-11-24

    Existing general clinical natural language processing (NLP) systems such as MetaMap and Clinical Text Analysis and Knowledge Extraction System have been successfully applied to information extraction from clinical text. However, end users often have to customize existing systems for their individual tasks, which can require substantial NLP skills. Here we present CLAMP (Clinical Language Annotation, Modeling, and Processing), a newly developed clinical NLP toolkit that provides not only state-of-the-art NLP components, but also a user-friendly graphic user interface that can help users quickly build customized NLP pipelines for their individual applications. Our evaluation shows that the CLAMP default pipeline achieved good performance on named entity recognition and concept encoding. We also demonstrate the efficiency of the CLAMP graphic user interface in building customized, high-performance NLP pipelines with 2 use cases, extracting smoking status and lab test values. CLAMP is publicly available for research use, and we believe it is a unique asset for the clinical NLP community. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  14. NOBLE - Flexible concept recognition for large-scale biomedical natural language processing.

    PubMed

    Tseytlin, Eugene; Mitchell, Kevin; Legowski, Elizabeth; Corrigan, Julia; Chavan, Girish; Jacobson, Rebecca S

    2016-01-14

    Natural language processing (NLP) applications are increasingly important in biomedical data analysis, knowledge engineering, and decision support. Concept recognition is an important component task for NLP pipelines, and can be either general-purpose or domain-specific. We describe a novel, flexible, and general-purpose concept recognition component for NLP pipelines, and compare its speed and accuracy against five commonly used alternatives on both a biological and clinical corpus. NOBLE Coder implements a general algorithm for matching terms to concepts from an arbitrary vocabulary set. The system's matching options can be configured individually or in combination to yield specific system behavior for a variety of NLP tasks. The software is open source, freely available, and easily integrated into UIMA or GATE. We benchmarked speed and accuracy of the system against the CRAFT and ShARe corpora as reference standards and compared it to MMTx, MGrep, Concept Mapper, cTAKES Dictionary Lookup Annotator, and cTAKES Fast Dictionary Lookup Annotator. We describe key advantages of the NOBLE Coder system and associated tools, including its greedy algorithm, configurable matching strategies, and multiple terminology input formats. These features provide unique functionality when compared with existing alternatives, including state-of-the-art systems. On two benchmarking tasks, NOBLE's performance exceeded commonly used alternatives, performing almost as well as the most advanced systems. Error analysis revealed differences in error profiles among systems. NOBLE Coder is comparable to other widely used concept recognition systems in terms of accuracy and speed. Advantages of NOBLE Coder include its interactive terminology builder tool, ease of configuration, and adaptability to various domains and tasks. NOBLE provides a term-to-concept matching system suitable for general concept recognition in biomedical NLP pipelines.
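
    To make the idea of greedy term-to-concept matching concrete, here is a minimal sketch of a generic longest-match dictionary lookup over tokens. It is not NOBLE Coder's actual algorithm, which the abstract does not describe in detail; the vocabulary, the concept identifiers, and the `max_len` parameter are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical vocabulary: lower-cased token tuples mapped to made-up concept IDs.
VOCABULARY: Dict[Tuple[str, ...], str] = {
    ("heart", "attack"): "C_HEART_ATTACK",
    ("heart",): "C_HEART",
    ("aspirin",): "C_ASPIRIN",
}


def greedy_match(
    tokens: List[str],
    vocabulary: Dict[Tuple[str, ...], str],
    max_len: int = 5,
) -> List[Tuple[int, int, str]]:
    """Return (start, end, concept) for the longest term found at each position."""
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, then shrink the window.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in vocabulary:
                matches.append((i, i + n, vocabulary[key]))
                i += n  # greedy: consume the matched tokens
                break
        else:
            i += 1  # no term starts here; move on
    return matches


tokens = "Patient had a heart attack and took aspirin".split()
print(greedy_match(tokens, VOCABULARY))
# prints [(3, 5, 'C_HEART_ATTACK'), (7, 8, 'C_ASPIRIN')]
```

    Because the longest candidate wins, "heart attack" is mapped as one concept rather than as "heart" alone. A configurable system would expose choices such as overlap handling or word-order tolerance as options.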

  15. Natural Language Processing-Enabled and Conventional Data Capture Methods for Input to Electronic Health Records: A Comparative Usability Study.

    PubMed

    Kaufman, David R; Sheehan, Barbara; Stetson, Peter; Bhatt, Ashish R; Field, Adele I; Patel, Chirag; Maisel, James Mark

    2016-10-28

    The process of documentation in electronic health records (EHRs) is known to be time consuming, inefficient, and cumbersome. The use of dictation coupled with manual transcription has become an increasingly common practice. In recent years, natural language processing (NLP)-enabled data capture has become a viable alternative for data entry. It enables the clinician to maintain control of the process and potentially reduce the documentation burden. The question remains how this NLP-enabled workflow will impact EHR usability and whether it can meet the structured data and other EHR requirements while enhancing the user's experience. The objective of this study is to evaluate the comparative effectiveness of an NLP-enabled data capture method using dictation and data extraction from transcribed documents (NLP Entry) in terms of documentation time, documentation quality, and usability versus standard EHR keyboard-and-mouse data entry. This formative study investigated the results of using 4 combinations of NLP Entry and Standard Entry methods ("protocols") of EHR data capture. We compared a novel dictation-based protocol using MediSapien NLP (NLP-NLP) for structured data capture against a standard structured data capture protocol (Standard-Standard) as well as 2 novel hybrid protocols (NLP-Standard and Standard-NLP). The 31 participants included neurologists, cardiologists, and nephrologists. Participants generated 4 consultation or admission notes using 4 documentation protocols. We recorded the time on task, documentation quality (using the Physician Documentation Quality Instrument, PDQI-9), and usability of the documentation processes. A total of 118 notes were documented across the 3 subject areas. The NLP-NLP protocol required a median of 5.2 minutes per cardiology note, 7.3 minutes per nephrology note, and 8.5 minutes per neurology note compared with 16.9, 20.7, and 21.2 minutes, respectively, using the Standard-Standard protocol and 13.8, 21.3, and 18.7 minutes using the Standard-NLP protocol (1 of 2 hybrid methods). Using 8 out of 9 characteristics measured by the PDQI-9 instrument, the NLP-NLP protocol received a median quality score sum of 24.5; the Standard-Standard protocol received a median sum of 29; and the Standard-NLP protocol received a median sum of 29.5. The mean total score of the usability measure was 36.7 when the participants used the NLP-NLP protocol compared with 30.3 when they used the Standard-Standard protocol. In this study, the feasibility of an approach to EHR data capture involving the application of NLP to transcribed dictation was demonstrated. This novel dictation-based approach has the potential to reduce the time required for documentation and improve usability while maintaining documentation quality. Future research will evaluate the NLP-based EHR data capture approach in a clinical setting. It is reasonable to assert that EHRs will increasingly use NLP-enabled data entry tools such as MediSapien NLP because they hold promise for enhancing the documentation process and end-user experience. ©David R. Kaufman, Barbara Sheehan, Peter Stetson, Ashish R. Bhatt, Adele I. Field, Chirag Patel, James Mark Maisel. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 28.10.2016.

  16. Natural Language Processing in Radiology: A Systematic Review.

    PubMed

    Pons, Ewoud; Braun, Loes M M; Hunink, M G Myriam; Kors, Jan A

    2016-05-01

    Radiological reporting has generated large quantities of digital content within the electronic health record, which is potentially a valuable source of information for improving clinical care and supporting research. Although radiology reports are stored for communication and documentation of diagnostic imaging, harnessing their potential requires efficient and automated information extraction: they exist mainly as free-text clinical narrative, from which it is a major challenge to obtain structured data. Natural language processing (NLP) provides techniques that aid the conversion of text into a structured representation, and thus enables computers to derive meaning from human (ie, natural language) input. Used on radiology reports, NLP techniques enable automatic identification and extraction of information. By exploring the various purposes for their use, this review examines how radiology benefits from NLP. A systematic literature search identified 67 relevant publications describing NLP methods that support practical applications in radiology. This review takes a close look at the individual studies in terms of tasks (ie, the extracted information), the NLP methodology and tools used, and their application purpose and performance results. Additionally, limitations, future challenges, and requirements for advancing NLP in radiology will be discussed. © RSNA, 2016. Online supplemental material is available for this article.

  17. Construct Validity in TOEFL iBT Speaking Tasks: Insights from Natural Language Processing

    ERIC Educational Resources Information Center

    Kyle, Kristopher; Crossley, Scott A.; McNamara, Danielle S.

    2016-01-01

    This study explores the construct validity of speaking tasks included in the TOEFL iBT (e.g., integrated and independent speaking tasks). Specifically, advanced natural language processing (NLP) tools, MANOVA difference statistics, and discriminant function analyses (DFA) are used to assess the degree to which and in what ways responses to these…

  18. Natural Language Processing–Enabled and Conventional Data Capture Methods for Input to Electronic Health Records: A Comparative Usability Study

    PubMed Central

    Kaufman, David R; Sheehan, Barbara; Stetson, Peter; Bhatt, Ashish R; Field, Adele I; Patel, Chirag; Maisel, James Mark

    2016-01-01

    Background: The process of documentation in electronic health records (EHRs) is known to be time consuming, inefficient, and cumbersome. The use of dictation coupled with manual transcription has become an increasingly common practice. In recent years, natural language processing (NLP)–enabled data capture has become a viable alternative for data entry. It enables the clinician to maintain control of the process and potentially reduce the documentation burden. The question remains how this NLP-enabled workflow will impact EHR usability and whether it can meet the structured data and other EHR requirements while enhancing the user’s experience. Objective: The objective of this study is to evaluate the comparative effectiveness of an NLP-enabled data capture method using dictation and data extraction from transcribed documents (NLP Entry) in terms of documentation time, documentation quality, and usability versus standard EHR keyboard-and-mouse data entry. Methods: This formative study investigated the results of using 4 combinations of NLP Entry and Standard Entry methods (“protocols”) of EHR data capture. We compared a novel dictation-based protocol using MediSapien NLP (NLP-NLP) for structured data capture against a standard structured data capture protocol (Standard-Standard) as well as 2 novel hybrid protocols (NLP-Standard and Standard-NLP). The 31 participants included neurologists, cardiologists, and nephrologists. Participants generated 4 consultation or admission notes using 4 documentation protocols. We recorded the time on task, documentation quality (using the Physician Documentation Quality Instrument, PDQI-9), and usability of the documentation processes. Results: A total of 118 notes were documented across the 3 subject areas. The NLP-NLP protocol required a median of 5.2 minutes per cardiology note, 7.3 minutes per nephrology note, and 8.5 minutes per neurology note compared with 16.9, 20.7, and 21.2 minutes, respectively, using the Standard-Standard protocol and 13.8, 21.3, and 18.7 minutes using the Standard-NLP protocol (1 of 2 hybrid methods). Using 8 out of 9 characteristics measured by the PDQI-9 instrument, the NLP-NLP protocol received a median quality score sum of 24.5; the Standard-Standard protocol received a median sum of 29; and the Standard-NLP protocol received a median sum of 29.5. The mean total score of the usability measure was 36.7 when the participants used the NLP-NLP protocol compared with 30.3 when they used the Standard-Standard protocol. Conclusions: In this study, the feasibility of an approach to EHR data capture involving the application of NLP to transcribed dictation was demonstrated. This novel dictation-based approach has the potential to reduce the time required for documentation and improve usability while maintaining documentation quality. Future research will evaluate the NLP-based EHR data capture approach in a clinical setting. It is reasonable to assert that EHRs will increasingly use NLP-enabled data entry tools such as MediSapien NLP because they hold promise for enhancing the documentation process and end-user experience. PMID:27793791

  19. Natural Language Processing in aid of FlyBase curators

    PubMed Central

    Karamanis, Nikiforos; Seal, Ruth; Lewin, Ian; McQuilton, Peter; Vlachos, Andreas; Gasperin, Caroline; Drysdale, Rachel; Briscoe, Ted

    2008-01-01

    Background: Despite increasing interest in applying Natural Language Processing (NLP) to biomedical text, whether this technology can facilitate tasks such as database curation remains unclear. Results: PaperBrowser is the first NLP-powered interface that was developed under a user-centered approach to improve the way in which FlyBase curators navigate an article. In this paper, we first discuss how observing curators at work informed the design and evaluation of PaperBrowser. Then, we present how we appraise PaperBrowser's navigational functionalities in a user-based study using a text highlighting task and evaluation criteria of Human-Computer Interaction. Our results show that PaperBrowser reduces the number of interactions between two highlighting events and therefore improves navigational efficiency by about 58% compared to the navigational mechanism that was previously available to the curators. Moreover, PaperBrowser is shown to provide curators with enhanced navigational utility by over 74% irrespective of the different ways in which they highlight text in the article. Conclusion: We show that state-of-the-art performance in certain NLP tasks such as Named Entity Recognition and Anaphora Resolution can be combined with the navigational functionalities of PaperBrowser to support curation quite successfully. PMID:18410678

  20. Overview of the Cancer Genetics and Pathway Curation tasks of BioNLP Shared Task 2013

    PubMed Central

    2015-01-01

    Background: Since their introduction in 2009, the BioNLP Shared Task events have been instrumental in advancing the development of methods and resources for the automatic extraction of information from the biomedical literature. In this paper, we present the Cancer Genetics (CG) and Pathway Curation (PC) tasks, two event extraction tasks introduced in the BioNLP Shared Task 2013. The CG task focuses on cancer, emphasizing the extraction of physiological and pathological processes at various levels of biological organization, and the PC task targets reactions relevant to the development of biomolecular pathway models, defining its extraction targets on the basis of established pathway representations and ontologies. Results: Six groups participated in the CG task and two groups in the PC task, together applying a wide range of extraction approaches including both established state-of-the-art systems and newly introduced extraction methods. The best-performing systems achieved F-scores of 55% on the CG task and 53% on the PC task, demonstrating a level of performance comparable to the best results achieved in similar previously proposed tasks. Conclusions: The results indicate that existing event extraction technology can generalize to meet the novel challenges represented by the CG and PC task settings, suggesting that extraction methods are capable of supporting the construction of knowledge bases on the molecular mechanisms of cancer and the curation of biomolecular pathway models. The CG and PC tasks continue as open challenges for all interested parties, with data, tools and resources available from the shared task homepage. PMID:26202570

  1. Recent Advances in Clinical Natural Language Processing in Support of Semantic Analysis.

    PubMed

    Velupillai, S; Mowery, D; South, B R; Kvist, M; Dalianis, H

    2015-08-13

    We present a review of recent advances in clinical Natural Language Processing (NLP), with a focus on semantic analysis and key subtasks that support such analysis. We conducted a literature review of clinical NLP research from 2008 to 2014, emphasizing recent publications (2012-2014), based on PubMed and ACL proceedings as well as relevant referenced publications from the included papers. Significant articles published within this time-span were included and are discussed from the perspective of semantic analysis. Three key clinical NLP subtasks that enable such analysis were identified: 1) developing more efficient methods for corpus creation (annotation and de-identification), 2) generating building blocks for extracting meaning (morphological, syntactic, and semantic subtasks), and 3) leveraging NLP for clinical utility (NLP applications and infrastructure for clinical use cases). Finally, we provide a reflection upon most recent developments and potential areas of future NLP development and applications. There has been an increase of advances within key NLP subtasks that support semantic analysis. Performance of NLP semantic analysis is, in many cases, close to that of agreement between humans. The creation and release of corpora annotated with complex semantic information models has greatly supported the development of new tools and approaches. Research on non-English languages is continuously growing. NLP methods have sometimes been successfully employed in real-world clinical tasks. However, there is still a gap between the development of advanced resources and their utilization in clinical settings. A plethora of new clinical use cases are emerging due to established health care initiatives and additional patient-generated sources through the extensive use of social media and other devices.

  2. v3NLP Framework: Tools to Build Applications for Extracting Concepts from Clinical Text

    PubMed Central

    Divita, Guy; Carter, Marjorie E.; Tran, Le-Thuy; Redd, Doug; Zeng, Qing T; Duvall, Scott; Samore, Matthew H.; Gundlapalli, Adi V.

    2016-01-01

    Introduction: Substantial amounts of clinically significant information are contained only within the narrative of the clinical notes in electronic medical records. The v3NLP Framework is a set of “best-of-breed” functionalities developed to transform this information into structured data for use in quality improvement, research, population health surveillance, and decision support. Background: MetaMap, cTAKES and similar well-known natural language processing (NLP) tools do not have sufficient scalability out of the box. The v3NLP Framework evolved out of the necessity to scale up these tools and provide a framework to customize and tune techniques that fit a variety of tasks, including document classification, tuned concept extraction for specific conditions, patient classification, and information retrieval. Innovation: Beyond scalability, several v3NLP Framework-developed projects have been efficacy tested and benchmarked. While v3NLP Framework includes annotators, pipelines and applications, its functionalities enable developers to create novel annotators and to place annotators into pipelines and scaled applications. Discussion: The v3NLP Framework has been successfully utilized in many projects including general concept extraction, risk factors for homelessness among veterans, and identification of mentions of the presence of an indwelling urinary catheter. Projects as diverse as predicting colonization with methicillin-resistant Staphylococcus aureus and extracting references to military sexual trauma are being built using v3NLP Framework components. Conclusion: The v3NLP Framework is a set of functionalities and components that provide Java developers with the ability to create novel annotators and to place those annotators into pipelines and applications to extract concepts from clinical text. There are scale-up and scale-out functionalities to process large numbers of records. PMID:27683667

  3. Recent Advances in Clinical Natural Language Processing in Support of Semantic Analysis

    PubMed Central

    Mowery, D.; South, B. R.; Kvist, M.; Dalianis, H.

    2015-01-01

    Objectives: We present a review of recent advances in clinical Natural Language Processing (NLP), with a focus on semantic analysis and key subtasks that support such analysis. Methods: We conducted a literature review of clinical NLP research from 2008 to 2014, emphasizing recent publications (2012-2014), based on PubMed and ACL proceedings as well as relevant referenced publications from the included papers. Results: Significant articles published within this time-span were included and are discussed from the perspective of semantic analysis. Three key clinical NLP subtasks that enable such analysis were identified: 1) developing more efficient methods for corpus creation (annotation and de-identification), 2) generating building blocks for extracting meaning (morphological, syntactic, and semantic subtasks), and 3) leveraging NLP for clinical utility (NLP applications and infrastructure for clinical use cases). Finally, we provide a reflection upon most recent developments and potential areas of future NLP development and applications. Conclusions: There has been an increase of advances within key NLP subtasks that support semantic analysis. Performance of NLP semantic analysis is, in many cases, close to that of agreement between humans. The creation and release of corpora annotated with complex semantic information models has greatly supported the development of new tools and approaches. Research on non-English languages is continuously growing. NLP methods have sometimes been successfully employed in real-world clinical tasks. However, there is still a gap between the development of advanced resources and their utilization in clinical settings. A plethora of new clinical use cases are emerging due to established health care initiatives and additional patient-generated sources through the extensive use of social media and other devices. PMID:26293867

  4. Functional evaluation of out-of-the-box text-mining tools for data-mining tasks

    PubMed Central

    Jung, Kenneth; LePendu, Paea; Iyer, Srinivasan; Bauer-Mehren, Anna; Percha, Bethany; Shah, Nigam H

    2015-01-01

    Objective: The trade-off between the speed and simplicity of dictionary-based term recognition and the richer linguistic information provided by more advanced natural language processing (NLP) is an area of active discussion in clinical informatics. In this paper, we quantify this trade-off among text processing systems that make different trade-offs between speed and linguistic understanding. We tested both types of systems in three clinical research tasks: phase IV safety profiling of a drug, learning adverse drug–drug interactions, and learning used-to-treat relationships between drugs and indications. Materials: We first benchmarked the accuracy of the NCBO Annotator and REVEAL in a manually annotated, publicly available dataset from the 2008 i2b2 Obesity Challenge. We then applied the NCBO Annotator and REVEAL to 9 million clinical notes from the Stanford Translational Research Integrated Database Environment (STRIDE) and used the resulting data for three research tasks. Results: There is no significant difference between using the NCBO Annotator and REVEAL in the results of the three research tasks when using large datasets. In one subtask, REVEAL achieved higher sensitivity with smaller datasets. Conclusions: For a variety of tasks, employing simple term recognition methods instead of advanced NLP methods results in little or no impact on accuracy when using large datasets. Simpler dictionary-based methods have the advantage of scaling well to very large datasets. Promoting the use of simple, dictionary-based methods for population level analyses can advance adoption of NLP in practice. PMID:25336595
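
    As an illustration of what dictionary-based term recognition means in this record, the following is a minimal sketch in Python. The lexicon, the labels, and the function name find_terms are hypothetical and invented for this example; they do not reflect the NCBO Annotator or REVEAL interfaces, and real systems add normalization, ontology lookup, and negation handling.

```python
import re

# Hypothetical toy lexicon mapping surface terms to coarse concept labels.
LEXICON = {
    "myocardial infarction": "DISORDER",
    "aspirin": "DRUG",
    "obesity": "DISORDER",
}


def find_terms(text, lexicon=LEXICON):
    """Return (start, end, term, label) for every lexicon term found in text."""
    hits = []
    lowered = text.lower()  # case-insensitive matching
    for term, label in lexicon.items():
        # Word boundaries keep a term from matching inside a longer word.
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, lowered):
            hits.append((match.start(), match.end(), term, label))
    return sorted(hits)  # ordered by position in the text


note = "Patient with obesity was started on aspirin after myocardial infarction."
matches = find_terms(note)
for start, end, term, label in matches:
    print(start, end, term, label)
```

    The lookup is a plain scan over a fixed vocabulary, which is why approaches of this kind tend to scale to very large note collections at the cost of linguistic detail.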

  5. Implicitly-Defined Neural Networks for Sequence Labeling

    DTIC Science & Technology

    2016-09-09

    this is to improve performance on long-range dependencies, and to improve stability (solution drift) in NLP tasks. We choose an implicit neural network...there have been NLP tasks, and there are many effective approaches to dealing with them. In the context of HMMs, there are the “Forward-Backward...Malyska for interesting discussion of related work, and Liz Salesky for NLP application suggestions! Tagger WSJ Accuracy Word vectors only 0.9626 Single

  6. A Natural Language Processing-based Model to Automate MRI Brain Protocol Selection and Prioritization.

    PubMed

    Brown, Andrew D; Marotta, Thomas R

    2017-02-01

    Incorrect imaging protocol selection can contribute to increased healthcare cost and waste. To help healthcare providers improve the quality and safety of medical imaging services, we developed and evaluated three natural language processing (NLP) models to determine whether NLP techniques could be employed to aid in clinical decision support for protocoling and prioritization of magnetic resonance imaging (MRI) brain examinations. To test the feasibility of using an NLP model to support clinical decision making for MRI brain examinations, we designed three different medical imaging prediction tasks, each with a unique outcome: selecting an examination protocol, evaluating the need for contrast administration, and determining priority. We created three models for each prediction task, each using a different classification algorithm (random forest, support vector machine, or k-nearest neighbor) to predict outcomes based on the narrative clinical indications and demographic data associated with 13,982 MRI brain examinations performed from January 1, 2013 to June 30, 2015. Test datasets were used to calculate the accuracy, sensitivity and specificity, predictive values, and the area under the curve. Our optimal results show an accuracy of 82.9%, 83.0%, and 88.2% for the protocol selection, contrast administration, and prioritization tasks, respectively, demonstrating that predictive algorithms can be used to aid in clinical decision support for examination protocoling. NLP models developed from the narrative clinical information provided by referring clinicians and demographic data are feasible methods to predict the protocol and priority of MRI brain examinations. Copyright © 2017 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.
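
    The evaluation measures named in this record (accuracy, sensitivity, specificity, and predictive values) all follow from the counts of a binary confusion matrix. The sketch below shows the standard definitions; the function name binary_metrics and the counts are made up for illustration and are not taken from the study, and the area under the curve is omitted because it needs ranked scores rather than counts.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard evaluation measures from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # share of all predictions that are correct
        "sensitivity": tp / (tp + fn),   # true positive rate (recall)
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value (precision)
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Illustrative counts only, e.g. for a "needs contrast" yes/no prediction.
metrics = binary_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```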

  7. Aspiring to Unintended Consequences of Natural Language Processing: A Review of Recent Developments in Clinical and Consumer-Generated Text Processing.

    PubMed

    Demner-Fushman, D; Elhadad, N

    2016-11-10

    This paper reviews work over the past two years in Natural Language Processing (NLP) applied to clinical and consumer-generated texts. We included any application or methodological publication that leverages text to facilitate healthcare and address the health-related needs of consumers and populations. Many important developments in clinical text processing, both foundational and task-oriented, were addressed in community-wide evaluations and discussed in corresponding special issues that are referenced in this review. These focused issues and in-depth reviews of several other active research areas, such as pharmacovigilance and summarization, allowed us to discuss in greater depth disease modeling and predictive analytics using clinical texts, and text analysis in social media for healthcare quality assessment, trends towards online interventions based on rapid analysis of health-related posts, and consumer health question answering, among other issues. Our analysis shows that although clinical NLP continues to advance towards practical applications and more NLP methods are used in large-scale live health information applications, more needs to be done to make NLP use in clinical applications a routine widespread reality. Progress in clinical NLP is mirrored by developments in social media text analysis: the research is moving from capturing trends to addressing individual health-related posts, thus showing potential to become a tool for precision medicine and a valuable addition to the standard healthcare quality evaluation tools.

  8. Adapting existing natural language processing resources for cardiovascular risk factors identification in clinical notes.

    PubMed

    Khalifa, Abdulrahman; Meystre, Stéphane

    2015-12-01

    The 2014 i2b2 natural language processing shared task focused on identifying cardiovascular risk factors such as high blood pressure, high cholesterol levels, obesity and smoking status among other factors found in health records of diabetic patients. In addition, the task involved detecting medications and time information associated with the extracted data. This paper presents the development and evaluation of a natural language processing (NLP) application conceived for this i2b2 shared task. For increased efficiency, the application's main components were adapted from two existing NLP tools implemented in the Apache UIMA framework: Textractor (for dictionary-based lookup) and cTAKES (for preprocessing and smoking status detection). The application achieved a final (micro-averaged) F1-measure of 87.5% on the final evaluation test set. Our attempt was mostly based on existing tools adapted with minimal changes and allowed for satisfying performance with limited development efforts. Copyright © 2015 Elsevier Inc. All rights reserved.
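
    A micro-averaged F1-measure, as reported in this record, pools true positives, false positives, and false negatives over all categories before computing precision and recall, so frequent categories weigh more than rare ones. The following sketch shows that pooling step; the category names and counts are invented for illustration and are not results from the shared task.

```python
def micro_f1(counts):
    """Micro-averaged F1: pool the counts over all categories, then score once."""
    tp = sum(c["tp"] for c in counts.values())
    fp = sum(c["fp"] for c in counts.values())
    fn = sum(c["fn"] for c in counts.values())
    if tp == 0:
        return 0.0  # no correct predictions, so precision and recall are zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-category counts for two risk-factor categories.
counts = {
    "hypertension": {"tp": 90, "fp": 10, "fn": 5},
    "smoker": {"tp": 30, "fp": 10, "fn": 15},
}
score = micro_f1(counts)
print(f"micro-averaged F1 = {score:.3f}")
```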

  9. Scaling-up NLP Pipelines to Process Large Corpora of Clinical Notes.

    PubMed

    Divita, G; Carter, M; Redd, A; Zeng, Q; Gupta, K; Trautner, B; Samore, M; Gundlapalli, A

    2015-01-01

    This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". This paper describes the scale-up efforts at the VA Salt Lake City Health Care System to address processing large corpora of clinical notes through a natural language processing (NLP) pipeline. The use case described is a current project focused on detecting the presence of an indwelling urinary catheter in hospitalized patients and subsequent catheter-associated urinary tract infections. An NLP algorithm using v3NLP was developed to detect the presence of an indwelling urinary catheter in hospitalized patients. The algorithm was tested on a small corpus of notes on patients for whom the presence or absence of a catheter was already known (reference standard). In planning for a scale-up, we estimated that the original algorithm would have taken 2.4 days to run on a larger corpus of notes for this project (550,000 notes), and 27 days for a corpus of 6 million records representative of a national sample of notes. We approached scaling-up NLP pipelines through three techniques: pipeline replication via multi-threading, intra-annotator threading for tasks that can be further decomposed, and remote annotator services which enable annotator scale-out. The scale-up resulted in reducing the average time to process a record from 206 milliseconds to 17 milliseconds, or a 12-fold increase in performance, when applied to a corpus of 550,000 notes. Purposely simplistic in nature, these scale-up efforts are the straightforward evolution from small scale NLP processing to larger scale extraction without incurring associated complexities that are inherited by the use of the underlying UIMA framework. These efforts represent generalizable and widely applicable techniques that will aid other computationally complex NLP pipelines that need to be scaled out for processing and analyzing big data.
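
    The first of the three techniques in this record, pipeline replication via multi-threading, can be sketched as several workers each running the same pipeline over different notes. The Python sketch below shows only that structure: the annotate function is a hypothetical stand-in (a keyword check), not the v3NLP algorithm, and in CPython a CPU-bound annotator would usually need processes rather than threads to see a real speed-up. The last lines check the reported fold increase from the two per-record times given in the abstract.

```python
from concurrent.futures import ThreadPoolExecutor


def annotate(note):
    """Hypothetical stand-in for one pipeline pass over a single note."""
    return {"text": note, "catheter_mentioned": "catheter" in note.lower()}


def run_replicated(notes, workers=4):
    """Run the same pipeline in several worker threads, one note per call."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input notes.
        return list(pool.map(annotate, notes))


notes = [
    "Foley catheter in place.",
    "No urinary symptoms.",
    "Catheter removed on day 3.",
]
results = run_replicated(notes)
print([r["catheter_mentioned"] for r in results])

# Reported average per-record times before and after the scale-up.
speedup = 206 / 17
print(f"speed-up = {speedup:.1f}x")
```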

  10. Natural Language Processing Technologies in Radiology Research and Clinical Applications.

    PubMed

    Cai, Tianrun; Giannopoulos, Andreas A; Yu, Sheng; Kelil, Tatiana; Ripley, Beth; Kumamaru, Kanako K; Rybicki, Frank J; Mitsouras, Dimitrios

    2016-01-01

    The migration of imaging reports to electronic medical record systems holds great potential in terms of advancing radiology research and practice by leveraging the large volume of data continuously being updated, integrated, and shared. However, there are significant challenges as well, largely due to the heterogeneity of how these data are formatted. Indeed, although there is movement toward structured reporting in radiology (ie, hierarchically itemized reporting with use of standardized terminology), the majority of radiology reports remain unstructured and use free-form language. To effectively "mine" these large datasets for hypothesis testing, a robust strategy for extracting the necessary information is needed. Manual extraction of information is a time-consuming and often unmanageable task. "Intelligent" search engines that instead rely on natural language processing (NLP), a computer-based approach to analyzing free-form text or speech, can be used to automate this data mining task. The overall goal of NLP is to translate natural human language into a structured format (ie, a fixed collection of elements), each with a standardized set of choices for its value, that is easily manipulated by computer programs to (among other things) order into subcategories or query for the presence or absence of a finding. The authors review the fundamentals of NLP and describe various techniques that constitute NLP in radiology, along with some key applications. ©RSNA, 2016.

  11. Natural Language Processing Technologies in Radiology Research and Clinical Applications

    PubMed Central

    Cai, Tianrun; Giannopoulos, Andreas A.; Yu, Sheng; Kelil, Tatiana; Ripley, Beth; Kumamaru, Kanako K.; Rybicki, Frank J.

    2016-01-01

    The migration of imaging reports to electronic medical record systems holds great potential in terms of advancing radiology research and practice by leveraging the large volume of data continuously being updated, integrated, and shared. However, there are significant challenges as well, largely due to the heterogeneity of how these data are formatted. Indeed, although there is movement toward structured reporting in radiology (ie, hierarchically itemized reporting with use of standardized terminology), the majority of radiology reports remain unstructured and use free-form language. To effectively “mine” these large datasets for hypothesis testing, a robust strategy for extracting the necessary information is needed. Manual extraction of information is a time-consuming and often unmanageable task. “Intelligent” search engines that instead rely on natural language processing (NLP), a computer-based approach to analyzing free-form text or speech, can be used to automate this data mining task. The overall goal of NLP is to translate natural human language into a structured format (ie, a fixed collection of elements), each with a standardized set of choices for its value, that is easily manipulated by computer programs to (among other things) order into subcategories or query for the presence or absence of a finding. The authors review the fundamentals of NLP and describe various techniques that constitute NLP in radiology, along with some key applications. ©RSNA, 2016 PMID:26761536

  12. An Ontology-Enabled Natural Language Processing Pipeline for Provenance Metadata Extraction from Biomedical Text (Short Paper).

    PubMed

    Valdez, Joshua; Rueschman, Michael; Kim, Matthew; Redline, Susan; Sahoo, Satya S

    2016-10-01

    Extraction of structured information from biomedical literature is a complex and challenging problem due to the complexity of the biomedical domain and the lack of appropriate natural language processing (NLP) techniques. High quality domain ontologies model both data and metadata information at a fine level of granularity, which can be effectively used to accurately extract structured information from biomedical text. Extraction of provenance metadata, which describes the history or source of information, from published articles is an important task to support scientific reproducibility. Reproducibility of results reported by previous research studies is a foundational component of scientific advancement. This is highlighted by the recent initiative by the US National Institutes of Health called "Principles of Rigor and Reproducibility". In this paper, we describe an effective approach to extract provenance metadata from published biomedical research literature using an ontology-enabled NLP platform as part of the Provenance for Clinical and Healthcare Research (ProvCaRe). The ProvCaRe-NLP tool extends the clinical Text Analysis and Knowledge Extraction System (cTAKES) platform using both provenance and biomedical domain ontologies. We demonstrate the effectiveness of the ProvCaRe-NLP tool using a corpus of 20 peer-reviewed publications. The results of our evaluation demonstrate that the ProvCaRe-NLP tool has significantly higher recall in extracting provenance metadata as compared to existing NLP pipelines such as MetaMap.

  13. GATECloud.net: a platform for large-scale, open-source text processing on the cloud.

    PubMed

    Tablan, Valentin; Roberts, Ian; Cunningham, Hamish; Bontcheva, Kalina

    2013-01-28

    Cloud computing is increasingly being regarded as a key enabler of the 'democratization of science', because on-demand, highly scalable cloud computing facilities enable researchers anywhere to carry out data-intensive experiments. In the context of natural language processing (NLP), algorithms tend to be complex, which makes their parallelization and deployment on cloud platforms a non-trivial task. This study presents a new, unique, cloud-based platform for large-scale NLP research: GATECloud.net. It enables researchers to carry out data-intensive NLP experiments by harnessing the vast, on-demand compute power of the Amazon cloud. Important infrastructural issues are dealt with by the platform, completely transparently for the researcher: load balancing, efficient data upload and storage, deployment on the virtual machines, security and fault tolerance. We also include a cost-benefit analysis and usage evaluation.

  14. Functional evaluation of out-of-the-box text-mining tools for data-mining tasks.

    PubMed

    Jung, Kenneth; LePendu, Paea; Iyer, Srinivasan; Bauer-Mehren, Anna; Percha, Bethany; Shah, Nigam H

    2015-01-01

    The trade-off between the speed and simplicity of dictionary-based term recognition and the richer linguistic information provided by more advanced natural language processing (NLP) is an area of active discussion in clinical informatics. In this paper, we quantify this trade-off among text processing systems that make different trade-offs between speed and linguistic understanding. We tested both types of systems in three clinical research tasks: phase IV safety profiling of a drug, learning adverse drug-drug interactions, and learning used-to-treat relationships between drugs and indications. We first benchmarked the accuracy of the NCBO Annotator and REVEAL in a manually annotated, publicly available dataset from the 2008 i2b2 Obesity Challenge. We then applied the NCBO Annotator and REVEAL to 9 million clinical notes from the Stanford Translational Research Integrated Database Environment (STRIDE) and used the resulting data for three research tasks. There is no significant difference between using the NCBO Annotator and REVEAL in the results of the three research tasks when using large datasets. In one subtask, REVEAL achieved higher sensitivity with smaller datasets. For a variety of tasks, employing simple term recognition methods instead of advanced NLP methods results in little or no impact on accuracy when using large datasets. Simpler dictionary-based methods have the advantage of scaling well to very large datasets. Promoting the use of simple, dictionary-based methods for population level analyses can advance adoption of NLP in practice. © The Author 2014. Published by Oxford University Press on behalf of the American Medical Informatics Association.

  15. Aspiring to Unintended Consequences of Natural Language Processing: A Review of Recent Developments in Clinical and Consumer-Generated Text Processing

    PubMed Central

    Elhadad, N.

    2016-01-01

    Objectives: This paper reviews work over the past two years in Natural Language Processing (NLP) applied to clinical and consumer-generated texts. Methods: We included any application or methodological publication that leverages text to facilitate healthcare and address the health-related needs of consumers and populations. Results: Many important developments in clinical text processing, both foundational and task-oriented, were addressed in community-wide evaluations and discussed in corresponding special issues that are referenced in this review. These focused issues and in-depth reviews of several other active research areas, such as pharmacovigilance and summarization, allowed us to discuss in greater depth disease modeling and predictive analytics using clinical texts, and text analysis in social media for healthcare quality assessment, trends towards online interventions based on rapid analysis of health-related posts, and consumer health question answering, among other issues. Conclusions: Our analysis shows that although clinical NLP continues to advance towards practical applications and more NLP methods are used in large-scale live health information applications, more needs to be done to make NLP use in clinical applications a routine widespread reality. Progress in clinical NLP is mirrored by developments in social media text analysis: the research is moving from capturing trends to addressing individual health-related posts, thus showing potential to become a tool for precision medicine and a valuable addition to the standard healthcare quality evaluation tools. PMID:27830255

  16. Adaptable, high recall, event extraction system with minimal configuration.

    PubMed

    Miwa, Makoto; Ananiadou, Sophia

    2015-01-01

    Biomedical event extraction has been a major focus of biomedical natural language processing (BioNLP) research since the first BioNLP shared task was held in 2009. Accordingly, a large number of event extraction systems have been developed. Most such systems, however, have been developed for specific tasks and/or incorporated task specific settings, making their application to new corpora and tasks problematic without modification of the systems themselves. There is thus a need for event extraction systems that can achieve high levels of accuracy when applied to corpora in new domains, without the need for exhaustive tuning or modification, whilst retaining competitive levels of performance. We have enhanced our state-of-the-art event extraction system, EventMine, to alleviate the need for task-specific tuning. Task-specific details are specified in a configuration file, while extensive task-specific parameter tuning is avoided through the integration of a weighting method, a covariate shift method, and their combination. The task-specific configuration and weighting method have been employed within the context of two different sub-tasks of BioNLP shared task 2013, i.e. Cancer Genetics (CG) and Pathway Curation (PC), removing the need to modify the system specifically for each task. With minimal task specific configuration and tuning, EventMine achieved the 1st place in the PC task, and 2nd in the CG, achieving the highest recall for both tasks. The system has been further enhanced following the shared task by incorporating the covariate shift method and entity generalisations based on the task definitions, leading to further performance improvements. We have shown that it is possible to apply a state-of-the-art event extraction system to new tasks with high levels of performance, without having to modify the system internally. Both covariate shift and weighting methods are useful in facilitating the production of high recall systems. 
    These methods and their combination can adapt a model to the target data with no deep tuning and little manual configuration.

  17. Adapting Semantic Natural Language Processing Technology to Address Information Overload in Influenza Epidemic Management

    PubMed Central

    Keselman, Alla; Rosemblat, Graciela; Kilicoglu, Halil; Fiszman, Marcelo; Jin, Honglan; Shin, Dongwook; Rindflesch, Thomas C.

    2013-01-01

    Explosion of disaster health information results in information overload among response professionals. The objective of this project was to determine the feasibility of applying semantic natural language processing (NLP) technology to addressing this overload. The project characterizes concepts and relationships commonly used in disaster health-related documents on influenza pandemics, as the basis for adapting an existing semantic summarizer to the domain. Methods include human review and semantic NLP analysis of a set of relevant documents. This is followed by a pilot-test in which two information specialists use the adapted application for a realistic information seeking task. According to the results, the ontology of influenza epidemics management can be described via a manageable number of semantic relationships that involve concepts from a limited number of semantic types. Test users demonstrate several ways to engage with the application to obtain useful information. This suggests that existing semantic NLP algorithms can be adapted to support information summarization and visualization in influenza epidemics and other disaster health areas. However, additional research is needed in the areas of terminology development (as many relevant relationships and terms are not part of existing standardized vocabularies), NLP, and user interface design. PMID:24311971

  18. Biological event composition

    PubMed Central

    2012-01-01

    Background: In recent years, biological event extraction has emerged as a key natural language processing task, aiming to address the information overload problem in accessing the molecular biology literature. The BioNLP shared task competitions have contributed to this recent interest considerably. The first competition (BioNLP'09) focused on extracting biological events from Medline abstracts from a narrow domain, while the theme of the latest competition (BioNLP-ST'11) was generalization and a wider range of text types, event types, and subject domains were considered. We view event extraction as a building block in larger discourse interpretation and propose a two-phase, linguistically-grounded, rule-based methodology. In the first phase, a general, underspecified semantic interpretation is composed from syntactic dependency relations in a bottom-up manner. The notion of embedding underpins this phase and it is informed by a trigger dictionary and argument identification rules. Coreference resolution is also performed at this step, allowing extraction of inter-sentential relations. The second phase is concerned with constraining the resulting semantic interpretation by shared task specifications. We evaluated our general methodology on core biological event extraction and speculation/negation tasks in three main tracks of BioNLP-ST'11 (GENIA, EPI, and ID). Results: We achieved competitive results in GENIA and ID tracks, while our results in the EPI track leave room for improvement. One notable feature of our system is that its performance across abstracts and article bodies is stable. Coreference resolution results in minor improvement in system performance. Due to our interest in discourse-level elements, such as speculation/negation and coreference, we provide a more detailed analysis of our system performance in these subtasks. Conclusions: The results demonstrate the viability of a robust, linguistically-oriented methodology, which clearly distinguishes general semantic interpretation from shared task specific aspects, for biological event extraction. Our error analysis pinpoints some shortcomings, which we plan to address in future work within our incremental system development methodology. PMID:22759461

  19. Natural Language Processing and Its Implications for the Future of Medication Safety: A Narrative Review of Recent Advances and Challenges.

    PubMed

    Wong, Adrian; Plasek, Joseph M; Montecalvo, Steven P; Zhou, Li

    2018-06-09

    The safety of medication use has been a priority in the United States since the late 1930s. Recently, it has gained prominence due to the increasing amount of data suggesting that a large amount of patient harm is preventable and can be mitigated with effective risk strategies that have not been sufficiently adopted. Adverse events from medications are part of clinical practice, but the ability to identify a patient's risk and to minimize that risk must be a priority. The ability to identify adverse events has been a challenge due to limitations of available data sources, which are often free text. The use of natural language processing (NLP) may help to address these limitations. NLP is the artificial intelligence domain of computer science that uses computers to manipulate unstructured data (i.e., narrative text or speech data) in the context of a specific task. In this narrative review, we illustrate the fundamentals of NLP and discuss NLP's application to medication safety in four data sources: electronic health records, Internet-based data, published literature, and reporting systems. Given the magnitude of available data from these sources, a growing area is the use of computer algorithms to help automatically detect associations between medications and adverse effects. The main benefit of NLP is in the time savings associated with automation of various medication safety tasks such as the medication reconciliation process facilitated by computers, as well as the potential for near-real time identification of adverse events for postmarketing surveillance such as those posted on social media that would otherwise go unanalyzed. NLP is limited by a lack of data sharing between health care organizations due to insufficient interoperability capabilities, inhibiting large-scale adverse event monitoring across populations. 
    We anticipate that future work in this area will focus on the integration of data sources from different domains to improve the ability to identify potential adverse events more quickly and to improve clinical decision support with regard to a patient's estimated risk for specific adverse events at the time of medication prescription or review. This article is protected by copyright. All rights reserved.

  20. Task-Driven Dynamic Text Summarization

    ERIC Educational Resources Information Center

    Workman, Terri Elizabeth

    2011-01-01

    The objective of this work is to examine the efficacy of natural language processing (NLP) in summarizing bibliographic text for multiple purposes. Researchers have noted the accelerating growth of bibliographic databases. Information seekers using traditional information retrieval techniques when searching large bibliographic databases are often…

  1. Adaptable, high recall, event extraction system with minimal configuration

    PubMed Central

    2015-01-01

    Background: Biomedical event extraction has been a major focus of biomedical natural language processing (BioNLP) research since the first BioNLP shared task was held in 2009. Accordingly, a large number of event extraction systems have been developed. Most such systems, however, have been developed for specific tasks and/or incorporated task specific settings, making their application to new corpora and tasks problematic without modification of the systems themselves. There is thus a need for event extraction systems that can achieve high levels of accuracy when applied to corpora in new domains, without the need for exhaustive tuning or modification, whilst retaining competitive levels of performance. Results: We have enhanced our state-of-the-art event extraction system, EventMine, to alleviate the need for task-specific tuning. Task-specific details are specified in a configuration file, while extensive task-specific parameter tuning is avoided through the integration of a weighting method, a covariate shift method, and their combination. The task-specific configuration and weighting method have been employed within the context of two different sub-tasks of BioNLP shared task 2013, i.e. Cancer Genetics (CG) and Pathway Curation (PC), removing the need to modify the system specifically for each task. With minimal task specific configuration and tuning, EventMine achieved the 1st place in the PC task, and 2nd in the CG, achieving the highest recall for both tasks. The system has been further enhanced following the shared task by incorporating the covariate shift method and entity generalisations based on the task definitions, leading to further performance improvements. Conclusions: We have shown that it is possible to apply a state-of-the-art event extraction system to new tasks with high levels of performance, without having to modify the system internally. Both covariate shift and weighting methods are useful in facilitating the production of high recall systems. These methods and their combination can adapt a model to the target data with no deep tuning and little manual configuration. PMID:26201408

  2. Identification of Patients with Family History of Pancreatic Cancer--Investigation of an NLP System Portability.

    PubMed

    Mehrabi, Saeed; Krishnan, Anand; Roch, Alexandra M; Schmidt, Heidi; Li, DingCheng; Kesterson, Joe; Beesley, Chris; Dexter, Paul; Schmidt, Max; Palakal, Mathew; Liu, Hongfang

    2015-01-01

    In this study we have developed a rule-based natural language processing (NLP) system to identify patients with family history of pancreatic cancer. The algorithm was developed in an Unstructured Information Management Architecture (UIMA) framework and consisted of section segmentation, relation discovery, and negation detection. The system was evaluated on data from two institutions. The family history identification precision was consistent across the institutions, shifting from 88.9% on the Indiana University (IU) dataset to 87.8% on the Mayo Clinic dataset. Customizing the algorithm on the Mayo Clinic data increased its precision to 88.1%. The family member relation discovery achieved precision, recall, and F-measure of 75.3%, 91.6% and 82.6% respectively. Negation detection resulted in precision of 99.1%. The results show that rule-based NLP approaches for specific information extraction tasks are portable across institutions; however, customization of the algorithm on the new dataset improves its performance.
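    The relation discovery figures above can be sanity-checked with a short calculation. This is a minimal sketch, assuming the reported F-measure is the balanced F1 (the abstract does not state the weighting); the function name `f_measure` and its `beta` parameter are illustrative, not taken from the study.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing was retrieved correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported for family member relation discovery.
f1 = f_measure(0.753, 0.916)
print(f"F1 = {f1:.3f}")
```

    Under that assumption the result lands within rounding distance of the reported 82.6%, which suggests the three figures are mutually consistent.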

  3. Analyzing Discourse Processing Using a Simple Natural Language Processing Tool

    ERIC Educational Resources Information Center

    Crossley, Scott A.; Allen, Laura K.; Kyle, Kristopher; McNamara, Danielle S.

    2014-01-01

    Natural language processing (NLP) provides a powerful approach for discourse processing researchers. However, there remains a notable degree of hesitation by some researchers to consider using NLP, at least on their own. The purpose of this article is to introduce and make available a "simple" NLP (SiNLP) tool. The overarching goal of…

  4. On Dataless Hierarchical Text Classification (Author’s Manuscript)

    DTIC Science & Technology

    2014-07-27

    compound talk.politics.mideast politics mideast israel arab jews jewish muslim talk.politics.misc politics gay homosexual sexual alt.atheism atheism...tion in NLP tasks; it was further used in several NLP works, such as by Liang (2005), to measure words’ distributional similarity. This method...embedding trained by neural networks has been used widely in the NLP community and has become a hot trend recently. In this paper, we test the suitability

  5. Improving performance of natural language processing part-of-speech tagging on clinical narratives through domain adaptation.

    PubMed

    Ferraro, Jeffrey P; Daumé, Hal; Duvall, Scott L; Chapman, Wendy W; Harkema, Henk; Haug, Peter J

    2013-01-01

    Natural language processing (NLP) tasks are commonly decomposed into subtasks, chained together to form processing pipelines. The residual error produced in these subtasks propagates, adversely affecting the end objectives. Limited availability of annotated clinical data remains a barrier to reaching state-of-the-art operating characteristics using statistically based NLP tools in the clinical domain. Here we explore the unique linguistic constructions of clinical texts and demonstrate the loss in operating characteristics when out-of-the-box part-of-speech (POS) tagging tools are applied to the clinical domain. We test a domain adaptation approach integrating a novel lexical-generation probability rule used in a transformation-based learner to boost POS performance on clinical narratives. Two target corpora from independent healthcare institutions were constructed from high frequency clinical narratives. Four leading POS taggers with their out-of-the-box models trained from general English and biomedical abstracts were evaluated against these clinical corpora. A high performing domain adaptation method, Easy Adapt, was compared to our newly proposed method ClinAdapt. The evaluated POS taggers drop in accuracy by 8.5-15% when tested on clinical narratives. The highest performing tagger reports an accuracy of 88.6%. Domain adaptation with Easy Adapt reports accuracies of 88.3-91.0% on clinical texts. ClinAdapt reports 93.2-93.9%. ClinAdapt successfully boosts POS tagging performance through domain adaptation requiring a modest amount of annotated clinical data. Improving the performance of critical NLP subtasks is expected to reduce pipeline error propagation leading to better overall results on complex processing tasks.
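    The abstract names Easy Adapt as its baseline but does not describe it. The sketch below assumes it refers to the feature-augmentation scheme usually published under that name, in which each feature vector is copied into a shared block plus a domain-specific block. The function name `easy_adapt` and the toy feature values are hypothetical, and the ClinAdapt method itself is not reproduced here.

```python
def easy_adapt(features, domain):
    """Map a feature vector to [shared, source-only, target-only] blocks.

    Assumption: source examples become <x, x, 0> and target examples
    become <x, 0, x>, so a linear model can learn shared weights and
    per-domain corrections at the same time.
    """
    x = list(features)
    zeros = [0.0] * len(x)
    if domain == "source":
        return x + x + zeros
    if domain == "target":
        return x + zeros + x
    raise ValueError(f"unknown domain: {domain!r}")


# Toy two-dimensional features (hypothetical values).
general_english = easy_adapt([1.0, 0.0], "source")
clinical_note = easy_adapt([0.0, 1.0], "target")
print(general_english)
print(clinical_note)
```

    Any standard classifier can then be trained on the augmented vectors; the tripled feature space is the price of needing no change to the learner.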

  6. Open Source Clinical NLP - More than Any Single System.

    PubMed

    Masanz, James; Pakhomov, Serguei V; Xu, Hua; Wu, Stephen T; Chute, Christopher G; Liu, Hongfang

    2014-01-01

    The number of Natural Language Processing (NLP) tools and systems for processing clinical free-text has grown as interest and processing capability have surged. Unfortunately any two systems typically cannot simply interoperate, even when both are built upon a framework designed to facilitate the creation of pluggable components. We present two ongoing activities promoting open source clinical NLP. The Open Health Natural Language Processing (OHNLP) Consortium was originally founded to foster a collaborative community around clinical NLP, releasing UIMA-based open source software. OHNLP's mission currently includes maintaining a catalog of clinical NLP software and providing interfaces to simplify the interaction of NLP systems. Meanwhile, Apache cTAKES aims to integrate best-of-breed annotators, providing a world-class NLP system for accessing clinical information within free-text. These two activities are complementary. OHNLP promotes open source clinical NLP activities in the research community and Apache cTAKES bridges research to the health information technology (HIT) practice.

  7. Transfer Learning for Adaptive Relation Extraction

    DTIC Science & Technology

    2011-09-13

    other NLP tasks, however, supervised learning approach fails when there is not a sufficient amount of labeled data for training, which is often the case...always 12 Syntactic Pattern Relation Instance Relation Type (Subtype) arg-2 arg-1 Arab leaders OTHER-AFF (Ethnic) his father PER-SOC (Family) South...for x. For sequence labeling tasks in NLP, linear-chain conditional random field has been rather successful. It is an undirected graphical model in

  8. Using Nonexperts for Annotating Pharmacokinetic Drug-Drug Interaction Mentions in Product Labeling: A Feasibility Study

    PubMed Central

    Ning, Yifan; Hernandez, Andres; Horn, John R; Jacobson, Rebecca; Boyce, Richard D

    2016-01-01

    Background Because vital details of potential pharmacokinetic drug-drug interactions are often described in free-text structured product labels, manual curation is a necessary but expensive step in the development of electronic drug-drug interaction information resources. The use of nonexperts to annotate potential drug-drug interaction (PDDI) mentions in drug product label annotation may be a means of lessening the burden of manual curation. Objective Our goal was to explore the practicality of using nonexpert participants to annotate drug-drug interaction descriptions from structured product labels. By presenting annotation tasks to both pharmacy experts and relatively naïve participants, we hoped to demonstrate the feasibility of using nonexpert annotators for drug-drug information annotation. We were also interested in exploring whether and to what extent natural language processing (NLP) preannotation helped improve task completion time, accuracy, and subjective satisfaction. Methods Two experts and 4 nonexperts were asked to annotate 208 structured product label sections under 4 conditions completed sequentially: (1) no NLP assistance, (2) preannotation of drug mentions, (3) preannotation of drug mentions and PDDIs, and (4) a repeat of the no-annotation condition. Results were evaluated within the 2 groups and relative to an existing gold standard. Participants were asked to provide reports on the time required to complete tasks and their perceptions of task difficulty. Results One of the experts and 3 of the nonexperts completed all tasks. Annotation results from the nonexpert group were relatively strong in every scenario and better than the performance of the NLP pipeline. The expert and 2 of the nonexperts were able to complete most tasks in less than 3 hours. Usability perceptions were generally positive (3.67 for expert, mean of 3.33 for nonexperts). 
Conclusions The results suggest that nonexpert annotation might be a feasible option for comprehensive labeling of annotated PDDIs across a broader range of drug product labels. Preannotation of drug mentions may ease the annotation task. However, preannotation of PDDIs, as operationalized in this study, presented the participants with difficulties. Future work should test if these issues can be addressed by the use of better performing NLP and a different approach to presenting the PDDI preannotations to users during the annotation workflow. PMID:27066806

  9. Using Nonexperts for Annotating Pharmacokinetic Drug-Drug Interaction Mentions in Product Labeling: A Feasibility Study.

    PubMed

    Hochheiser, Harry; Ning, Yifan; Hernandez, Andres; Horn, John R; Jacobson, Rebecca; Boyce, Richard D

    2016-04-11

    Because vital details of potential pharmacokinetic drug-drug interactions are often described in free-text structured product labels, manual curation is a necessary but expensive step in the development of electronic drug-drug interaction information resources. The use of nonexperts to annotate potential drug-drug interaction (PDDI) mentions in drug product label annotation may be a means of lessening the burden of manual curation. Our goal was to explore the practicality of using nonexpert participants to annotate drug-drug interaction descriptions from structured product labels. By presenting annotation tasks to both pharmacy experts and relatively naïve participants, we hoped to demonstrate the feasibility of using nonexpert annotators for drug-drug information annotation. We were also interested in exploring whether and to what extent natural language processing (NLP) preannotation helped improve task completion time, accuracy, and subjective satisfaction. Two experts and 4 nonexperts were asked to annotate 208 structured product label sections under 4 conditions completed sequentially: (1) no NLP assistance, (2) preannotation of drug mentions, (3) preannotation of drug mentions and PDDIs, and (4) a repeat of the no-annotation condition. Results were evaluated within the 2 groups and relative to an existing gold standard. Participants were asked to provide reports on the time required to complete tasks and their perceptions of task difficulty. One of the experts and 3 of the nonexperts completed all tasks. Annotation results from the nonexpert group were relatively strong in every scenario and better than the performance of the NLP pipeline. The expert and 2 of the nonexperts were able to complete most tasks in less than 3 hours. Usability perceptions were generally positive (3.67 for expert, mean of 3.33 for nonexperts). 
The results suggest that nonexpert annotation might be a feasible option for comprehensive labeling of annotated PDDIs across a broader range of drug product labels. Preannotation of drug mentions may ease the annotation task. However, preannotation of PDDIs, as operationalized in this study, presented the participants with difficulties. Future work should test if these issues can be addressed by the use of better performing NLP and a different approach to presenting the PDDI preannotations to users during the annotation workflow.

  10. Generating a Spanish Affective Dictionary with Supervised Learning Techniques

    ERIC Educational Resources Information Center

    Bermudez-Gonzalez, Daniel; Miranda-Jiménez, Sabino; García-Moreno, Raúl-Ulises; Calderón-Nepamuceno, Dora

    2016-01-01

    Nowadays, machine learning techniques are being used in several Natural Language Processing (NLP) tasks such as Opinion Mining (OM). OM is used to analyse and determine the affective orientation of texts. Usually, OM approaches use affective dictionaries in order to conduct sentiment analysis. These lexicons are labeled manually with affective…

  11. Natural language processing: an introduction.

    PubMed

    Nadkarni, Prakash M; Ohno-Machado, Lucila; Chapman, Wendy W

    2011-01-01

    To provide an overview and tutorial of natural language processing (NLP) and modern NLP-system design. This tutorial targets the medical informatics generalist who has limited acquaintance with the principles behind NLP and/or limited knowledge of the current state of the art. We describe the historical evolution of NLP, and summarize common NLP sub-problems in this extensive field. We then provide a synopsis of selected highlights of medical NLP efforts. After providing a brief description of common machine-learning approaches that are being used for diverse NLP sub-problems, we discuss how modern NLP architectures are designed, with a summary of the Apache Foundation's Unstructured Information Management Architecture. We finally consider possible future directions for NLP, and reflect on the possible impact of IBM Watson on the medical field.

  12. Natural language processing: an introduction

    PubMed Central

    Ohno-Machado, Lucila; Chapman, Wendy W

    2011-01-01

    Objectives To provide an overview and tutorial of natural language processing (NLP) and modern NLP-system design. Target audience This tutorial targets the medical informatics generalist who has limited acquaintance with the principles behind NLP and/or limited knowledge of the current state of the art. Scope We describe the historical evolution of NLP, and summarize common NLP sub-problems in this extensive field. We then provide a synopsis of selected highlights of medical NLP efforts. After providing a brief description of common machine-learning approaches that are being used for diverse NLP sub-problems, we discuss how modern NLP architectures are designed, with a summary of the Apache Foundation's Unstructured Information Management Architecture. We finally consider possible future directions for NLP, and reflect on the possible impact of IBM Watson on the medical field. PMID:21846786

  13. Acquiring Information from Wider Scope to Improve Event Extraction

    DTIC Science & Technology

    2012-05-01

    solve all the problems might be hard or even impossible: Word sense disambiguation is already a hard NLP task, and normalizing different expressions...blindfolded woman seen being shot in the head by a hooded militant on a video obtained but not aired by the Arab television station Al-Jazeera. She...imbalance Why are we interested in unsupervised topic features? There is a problem that arises in the evaluation of almost all the tasks in NLP, concerning

  14. Open Source Clinical NLP – More than Any Single System

    PubMed Central

    Masanz, James; Pakhomov, Serguei V.; Xu, Hua; Wu, Stephen T.; Chute, Christopher G.; Liu, Hongfang

    2014-01-01

    The number of Natural Language Processing (NLP) tools and systems for processing clinical free-text has grown as interest and processing capability have surged. Unfortunately any two systems typically cannot simply interoperate, even when both are built upon a framework designed to facilitate the creation of pluggable components. We present two ongoing activities promoting open source clinical NLP. The Open Health Natural Language Processing (OHNLP) Consortium was originally founded to foster a collaborative community around clinical NLP, releasing UIMA-based open source software. OHNLP’s mission currently includes maintaining a catalog of clinical NLP software and providing interfaces to simplify the interaction of NLP systems. Meanwhile, Apache cTAKES aims to integrate best-of-breed annotators, providing a world-class NLP system for accessing clinical information within free-text. These two activities are complementary. OHNLP promotes open source clinical NLP activities in the research community and Apache cTAKES bridges research to the health information technology (HIT) practice. PMID:25954581

  15. Evaluation of natural language processing systems: Issues and approaches

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Guida, G.; Mauri, G.

    This paper encompasses two main topics: a broad and general analysis of the issue of performance evaluation of NLP systems and a report on a specific approach developed by the authors and experimented on a sample test case. More precisely, it first presents a brief survey of the major works in the area of NLP systems evaluation. Then, after introducing the notion of the life cycle of an NLP system, it focuses on the concept of performance evaluation and analyzes the scope and the major problems of the investigation. The tools generally used within computer science to assess the quality of a software system are briefly reviewed, and their applicability to the task of evaluation of NLP systems is discussed. Particular attention is devoted to the concepts of efficiency, correctness, reliability, and adequacy, and how all of them basically fail in capturing the peculiar features of performance evaluation of an NLP system is discussed. Two main approaches to performance evaluation are later introduced; namely, black-box- and model-based, and their most important characteristics are presented. Finally, a specific model for performance evaluation proposed by the authors is illustrated, and the results of an experiment with a sample application are reported. The paper concludes with a discussion on research perspective, open problems, and importance of performance evaluation to industrial applications.

  16. Applying natural language processing techniques to develop a task-specific EMR interface for timely stroke thrombolysis: A feasibility study.

    PubMed

    Sung, Sheng-Feng; Chen, Kuanchin; Wu, Darren Philbert; Hung, Ling-Chien; Su, Yu-Hsiang; Hu, Ya-Han

    2018-04-01

    To reduce errors in determining eligibility for intravenous thrombolytic therapy (IVT) in stroke patients through use of an enhanced task-specific electronic medical record (EMR) interface powered by natural language processing (NLP) techniques. The information processing algorithm utilized MetaMap to extract medical concepts from IVT eligibility criteria and expanded the concepts using the Unified Medical Language System Metathesaurus. Concepts identified from clinical notes by MetaMap were compared to those from IVT eligibility criteria. The task-specific EMR interface displays IVT-relevant information by highlighting phrases that contain matched concepts. Clinical usability was assessed with clinicians staffing the acute stroke team by comparing user performance while using the task-specific and the current EMR interfaces. The algorithm identified IVT-relevant concepts with micro-averaged precisions, recalls, and F1 measures of 0.998, 0.812, and 0.895 at the phrase level and of 1, 0.972, and 0.986 at the document level. Users using the task-specific interface achieved a higher accuracy score than those using the current interface (91% versus 80%, p = 0.016) in assessing the IVT eligibility criteria. The completion time between the interfaces was statistically similar (2.46 min versus 1.70 min, p = 0.754). Although the information processing algorithm had room for improvement, the task-specific EMR interface significantly reduced errors in assessing IVT eligibility criteria. The study findings provide evidence to support an NLP enhanced EMR system to facilitate IVT decision-making by presenting meaningful and timely information to clinicians, thereby offering a new avenue for improvements in acute stroke care. Copyright © 2018 Elsevier B.V. All rights reserved.
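    Micro-averaging, as used for the phrase-level and document-level scores above, pools true positives, false positives, and false negatives over all concepts before computing the ratios. The following is a minimal sketch of that pooling; the `Counts` type, the function name, and the per-concept counts are hypothetical and are not data from the study.

```python
from collections import namedtuple

# Per-concept confusion counts: true positives, false positives, false negatives.
Counts = namedtuple("Counts", "tp fp fn")


def micro_average(per_concept):
    """Pool counts across concepts, then compute precision, recall, and F1."""
    tp = sum(c.tp for c in per_concept)
    fp = sum(c.fp for c in per_concept)
    fn = sum(c.fn for c in per_concept)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts for three eligibility concepts.
counts = [Counts(8, 0, 2), Counts(5, 1, 1), Counts(7, 1, 1)]
precision, recall, f1 = micro_average(counts)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

    Because the counts are pooled, frequent concepts dominate the result; a macro-average would instead give each concept equal weight.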

  17. Usability Evaluation of NLP-PIER: A Clinical Document Search Engine for Researchers.

    PubMed

    Hultman, Gretchen; McEwan, Reed; Pakhomov, Serguei; Lindemann, Elizabeth; Skube, Steven; Melton, Genevieve B

    2017-01-01

    NLP-PIER (Natural Language Processing - Patient Information Extraction for Research) is a self-service platform with a search engine for clinical researchers to perform natural language processing (NLP) queries using clinical notes. We conducted user-centered testing of NLP-PIER's usability to inform future design decisions. Quantitative and qualitative data were analyzed. Our findings will be used to improve the usability of NLP-PIER.

  18. Recognition of medication information from discharge summaries using ensembles of classifiers.

    PubMed

    Doan, Son; Collier, Nigel; Xu, Hua; Pham, Hoang Duy; Tu, Minh Phuong

    2012-05-07

    Extraction of clinical information such as medications or problems from clinical text is an important task of clinical natural language processing (NLP). Rule-based methods are often used in clinical NLP systems because they are easy to adapt and customize. Recently, supervised machine learning methods have proven to be effective in clinical NLP as well. However, combining different classifiers to further improve the performance of clinical entity recognition systems has not been investigated extensively. Combining classifiers into an ensemble classifier presents both challenges and opportunities to improve performance in such NLP tasks. We investigated ensemble classifiers that used different voting strategies to combine outputs from three individual classifiers: a rule-based system, a support vector machine (SVM) based system, and a conditional random field (CRF) based system. Three voting methods were proposed and evaluated using the annotated data sets from the 2009 i2b2 NLP challenge: simple majority, local SVM-based voting, and local CRF-based voting. Evaluation on 268 manually annotated discharge summaries from the i2b2 challenge showed that the local CRF-based voting method achieved the best F-score of 90.84% (94.11% Precision, 87.81% Recall) for 10-fold cross-validation. We then compared our systems with the first-ranked system in the challenge by using the same training and test sets. Our system based on majority voting achieved a better F-score of 89.65% (93.91% Precision, 85.76% Recall) than the previously reported F-score of 89.19% (93.78% Precision, 85.03% Recall) by the first-ranked system in the challenge. Our experimental results using the 2009 i2b2 challenge datasets showed that ensemble classifiers that combine individual classifiers into a voting system could achieve better performance than a single classifier in recognizing medication information from clinical text. 
It suggests that simple strategies that can be easily implemented such as majority voting could have the potential to significantly improve clinical entity recognition.
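    The simple majority strategy mentioned above can be sketched in a few lines. This is an illustration only, assuming each classifier emits one label per token; the label names, the example sequences, and the fallback to an "O" (outside) label when no label wins a strict majority are assumptions, not details taken from the paper.

```python
from collections import Counter


def majority_vote(predictions, fallback="O"):
    """Combine per-token labels from several classifiers by strict majority.

    `predictions` is a list of equal-length label sequences, one per
    classifier. Tokens without a strict majority receive `fallback`.
    """
    combined = []
    for labels in zip(*predictions):
        label, count = Counter(labels).most_common(1)[0]
        combined.append(label if count > len(labels) / 2 else fallback)
    return combined


# Hypothetical outputs of a rule-based, an SVM-based, and a CRF-based tagger.
rule_based = ["O", "B-MED", "O", "B-DOSE"]
svm_based = ["O", "B-MED", "B-MED", "O"]
crf_based = ["O", "B-MED", "O", "B-FREQ"]
voted = majority_vote([rule_based, svm_based, crf_based])
print(voted)
```

    The last token shows the tie case: all three systems disagree, so the sketch falls back to "O" rather than picking a label arbitrarily.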

  19. The effects of natural language processing on cross-institutional portability of influenza case detection for disease surveillance.

    PubMed

    Ferraro, Jeffrey P; Ye, Ye; Gesteland, Per H; Haug, Peter J; Tsui, Fuchiang Rich; Cooper, Gregory F; Van Bree, Rudy; Ginter, Thomas; Nowalk, Andrew J; Wagner, Michael

    2017-05-31

    This study evaluates the accuracy and portability of a natural language processing (NLP) tool for extracting clinical findings of influenza from clinical notes across two large healthcare systems. Effectiveness is evaluated on how well NLP supports downstream influenza case-detection for disease surveillance. We independently developed two NLP parsers, one at Intermountain Healthcare (IH) in Utah and the other at University of Pittsburgh Medical Center (UPMC) using local clinical notes from emergency department (ED) encounters of influenza. We measured NLP parser performance for the presence and absence of 70 clinical findings indicative of influenza. We then developed Bayesian network models from NLP processed reports and tested their ability to discriminate among cases of (1) influenza, (2) non-influenza influenza-like illness (NI-ILI), and (3) 'other' diagnosis. On Intermountain Healthcare reports, recall and precision of the IH NLP parser were 0.71 and 0.75, respectively, and UPMC NLP parser, 0.67 and 0.79. On University of Pittsburgh Medical Center reports, recall and precision of the UPMC NLP parser were 0.73 and 0.80, respectively, and IH NLP parser, 0.53 and 0.80. Bayesian case-detection performance measured by AUROC for influenza versus non-influenza on Intermountain Healthcare cases was 0.93 (using IH NLP parser) and 0.93 (using UPMC NLP parser). Case-detection on University of Pittsburgh Medical Center cases was 0.95 (using UPMC NLP parser) and 0.83 (using IH NLP parser). For influenza versus NI-ILI on Intermountain Healthcare cases performance was 0.70 (using IH NLP parser) and 0.76 (using UPMC NLP parser). On University of Pittsburgh Medical Center cases, 0.76 (using UPMC NLP parser) and 0.65 (using IH NLP parser). In all but one instance (influenza versus NI-ILI using IH cases), local parsers were more effective at supporting case-detection although performances of non-local parsers were reasonable.

  20. Identification of Patients with Family History of Pancreatic Cancer - Investigation of an NLP System Portability

    PubMed Central

    Mehrabi, Saeed; Krishnan, Anand; Roch, Alexandra M; Schmidt, Heidi; Li, DingCheng; Kesterson, Joe; Beesley, Chris; Dexter, Paul; Schmidt, Max; Palakal, Mathew; Liu, Hongfang

    2018-01-01

    In this study we have developed a rule-based natural language processing (NLP) system to identify patients with family history of pancreatic cancer. The algorithm was developed in an Unstructured Information Management Architecture (UIMA) framework and consisted of section segmentation, relation discovery, and negation detection. The system was evaluated on data from two institutions. The family history identification precision was consistent across the institutions, shifting from 88.9% on the Indiana University (IU) dataset to 87.8% on the Mayo Clinic dataset. Customizing the algorithm on the Mayo Clinic data increased its precision to 88.1%. The family member relation discovery achieved precision, recall, and F-measure of 75.3%, 91.6% and 82.6% respectively. Negation detection resulted in precision of 99.1%. The results show that rule-based NLP approaches for specific information extraction tasks are portable across institutions; however, customization of the algorithm on the new dataset improves its performance. PMID:26262122

  1. Syntactic dependency parsers for biomedical-NLP.

    PubMed

    Cohen, Raphael; Elhadad, Michael

    2012-01-01

    Syntactic parsers have made a leap in accuracy and speed in recent years. The high order structural information provided by dependency parsers is useful for a variety of NLP applications. We present a biomedical model for the EasyFirst parser, a fast and accurate parser for creating Stanford Dependencies. We evaluate the models trained in the biomedical domains of EasyFirst and Clear-Parser in a number of task oriented metrics. Both parsers provide state of the art speed and accuracy in the Genia of over 89%. We show that Clear-Parser excels at tasks relating to negation identification while EasyFirst excels at tasks relating to Named Entities and is more robust to changes in domain.

  2. Clinical Natural Language Processing in languages other than English: opportunities and challenges.

    PubMed

    Névéol, Aurélie; Dalianis, Hercules; Velupillai, Sumithra; Savova, Guergana; Zweigenbaum, Pierre

    2018-03-30

    Natural language processing applied to clinical text or aimed at a clinical outcome has been thriving in recent years. This paper offers the first broad overview of clinical Natural Language Processing (NLP) for languages other than English. Recent studies are summarized to offer insights and outline opportunities in this area. We envision three groups of intended readers: (1) NLP researchers leveraging experience gained in other languages, (2) NLP researchers faced with establishing clinical text processing in a language other than English, and (3) clinical informatics researchers and practitioners looking for resources in their languages in order to apply NLP techniques and tools to clinical practice and/or investigation. We review work in clinical NLP in languages other than English. We classify these studies into three groups: (i) studies describing the development of new NLP systems or components de novo, (ii) studies describing the adaptation of NLP architectures developed for English to another language, and (iii) studies focusing on a particular clinical application. We show the advantages and drawbacks of each method, and highlight the appropriate application context. Finally, we identify major challenges and opportunities that will affect the impact of NLP on clinical practice and public health studies in a context that encompasses English as well as other languages.

  3. Negation’s Not Solved: Generalizability Versus Optimizability in Clinical Natural Language Processing

    PubMed Central

    Wu, Stephen; Miller, Timothy; Masanz, James; Coarr, Matt; Halgrim, Scott; Carrell, David; Clark, Cheryl

    2014-01-01

    A review of published work in clinical natural language processing (NLP) may suggest that the negation detection task has been “solved.” This work proposes that an optimizable solution does not equal a generalizable solution. We introduce a new machine learning-based Polarity Module for detecting negation in clinical text, and extensively compare its performance across domains. Using four manually annotated corpora of clinical text, we show that negation detection performance suffers when there is no in-domain development (for manual methods) or training data (for machine learning-based methods). Various factors (e.g., annotation guidelines, named entity characteristics, the amount of data, and lexical and syntactic context) play a role in making generalizability difficult, but none completely explains the phenomenon. Furthermore, generalizability remains challenging because it is unclear whether to use a single source for accurate data, combine all sources into a single model, or apply domain adaptation methods. The most reliable means to improve negation detection is to manually annotate in-domain training data (or, perhaps, manually modify rules); this is a strategy for optimizing performance, rather than generalizing it. These results suggest a direction for future work in domain-adaptive and task-adaptive methods for clinical NLP. PMID:25393544

  4. Bag-of-visual-ngrams for histopathology image classification

    NASA Astrophysics Data System (ADS)

    López-Monroy, A. Pastor; Montes-y-Gómez, Manuel; Escalante, Hugo Jair; Cruz-Roa, Angel; González, Fabio A.

    2013-11-01

    This paper describes an extension of the Bag-of-Visual-Words (BoVW) representation for image categorization (IC) of histopathology images. This representation is one of the most used approaches in several high-level computer vision tasks. However, the BoVW representation has an important limitation: the disregarding of spatial information among visual words. This information may be useful to capture discriminative visual-patterns in specific computer vision tasks. In order to overcome this problem we propose the use of visual n-grams. N-gram-based representations are very popular in the field of natural language processing (NLP), in particular within text mining and information retrieval. We propose building a codebook of n-grams and then representing images by histograms of visual n-grams. We evaluate our proposal in the challenging task of classifying histopathology images. The novelty of our proposal lies in the fact that we use n-grams as attributes for a classification model (together with visual-words, i.e., 1-grams). This is common practice within NLP, although, to the best of our knowledge, this idea has not been explored yet within computer vision. We report experimental results in a database of histopathology images where our proposed method outperforms the traditional BoVWs formulation.
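    The idea of representing an image by a histogram of visual n-grams can be illustrated as follows. This is a rough sketch, assuming the image has already been quantized into a one-dimensional sequence of visual-word IDs (for example, along one row of patches); how the paper orders visual words spatially is not stated in the abstract, and the function name and the toy sequence are hypothetical.

```python
from collections import Counter


def ngram_histogram(visual_words, max_n=2):
    """Count all n-grams of visual-word IDs for n = 1 .. max_n.

    The 1-grams reproduce the plain bag-of-visual-words counts; longer
    n-grams add information about which words occur next to each other.
    """
    histogram = Counter()
    for n in range(1, max_n + 1):
        for start in range(len(visual_words) - n + 1):
            histogram[tuple(visual_words[start:start + n])] += 1
    return histogram


# Toy sequence of visual-word IDs for one row of image patches.
row = [3, 7, 7, 3, 7]
histogram = ngram_histogram(row)
for ngram, count in sorted(histogram.items()):
    print(ngram, count)
```

    The resulting counts can serve as attributes for any classifier, with the n-gram keys playing the role of codebook entries.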

  5. Natural Language Processing: Toward Large-Scale, Robust Systems.

    ERIC Educational Resources Information Center

    Haas, Stephanie W.

    1996-01-01

    Natural language processing (NLP) is concerned with getting computers to do useful things with natural language. Major applications include machine translation, text generation, information retrieval, and natural language interfaces. Reviews important developments since 1987 that have led to advances in NLP; current NLP applications; and problems…

  6. Web 2.0-Based Crowdsourcing for High-Quality Gold Standard Development in Clinical Natural Language Processing

    PubMed Central

    Deleger, Louise; Li, Qi; Kaiser, Megan; Stoutenborough, Laura

    2013-01-01

    Background A high-quality gold standard is vital for supervised, machine learning-based, clinical natural language processing (NLP) systems. In clinical NLP projects, expert annotators traditionally create the gold standard. However, traditional annotation is expensive and time-consuming. To reduce the cost of annotation, general NLP projects have turned to crowdsourcing based on Web 2.0 technology, which involves submitting smaller subtasks to a coordinated marketplace of workers on the Internet. Many studies have been conducted in the area of crowdsourcing, but only a few have focused on tasks in the general NLP field and only a handful in the biomedical domain, usually based upon very small pilot sample sizes. In addition, the quality of the crowdsourced biomedical NLP corpora was never exceptional when compared to traditionally-developed gold standards. The previously reported results on the medical named entity annotation task showed a 0.68 F-measure based agreement between crowdsourced and traditionally-developed corpora. Objective Building upon previous work from the general crowdsourcing research, this study investigated the usability of crowdsourcing in the clinical NLP domain with special emphasis on achieving high agreement between crowdsourced and traditionally-developed corpora. Methods To build the gold standard for evaluating the crowdsourcing workers’ performance, 1042 clinical trial announcements (CTAs) from the ClinicalTrials.gov website were randomly selected and double annotated for medication names, medication types, and linked attributes. For the experiments, we used CrowdFlower, an Amazon Mechanical Turk-based crowdsourcing platform. We calculated sensitivity, precision, and F-measure to evaluate the quality of the crowd’s work and tested the statistical significance (P<.001, chi-square test) to detect differences between the crowdsourced and traditionally-developed annotations. Results The agreement between the crowd’s annotations and the traditionally-generated corpora was high for: (1) annotations (0.87, F-measure for medication names; 0.73, medication types), (2) correction of previous annotations (0.90, medication names; 0.76, medication types), and excellent for (3) linking medications with their attributes (0.96). Simple voting provided the best judgment aggregation approach. There was no statistically significant difference between the crowd and traditionally-generated corpora. Our results showed a 27.9% improvement over previously reported results on the medication named entity annotation task. Conclusions This study offers three contributions. First, we proved that crowdsourcing is a feasible, inexpensive, fast, and practical approach to collect high-quality annotations for clinical text (when protected health information was excluded). We believe that well-designed user interfaces and a rigorous quality control strategy for entity annotation and linking were critical to the success of this work. Second, as a further contribution to the Internet-based crowdsourcing field, we will publicly release the JavaScript and CrowdFlower Markup Language infrastructure code that is necessary to utilize CrowdFlower’s quality control and crowdsourcing interfaces for named entity annotations. Finally, to spur future research, we will release the CTA annotations that were generated by traditional and crowdsourced approaches. PMID:23548263
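    The simple-voting aggregation and the F-measure agreement mentioned in this record can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's released code: the function names and the toy annotations are hypothetical, and the last helper only checks that the reported 27.9% is consistent with a relative improvement from an F-measure of 0.68 to 0.87.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

def f_measure(gold, predicted):
    """Balanced F-measure between two sets of annotations."""
    gold, predicted = set(gold), set(predicted)
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def relative_improvement(old, new):
    """Relative gain of `new` over `old`, as a fraction."""
    return (new - old) / old

# Hypothetical crowd judgments for one text span from three workers.
vote = majority_vote(["medication", "medication", "other"])

# Hypothetical gold and crowd annotation sets.
agreement = f_measure({"aspirin", "ibuprofen", "heparin"},
                      {"aspirin", "ibuprofen", "saline"})

# 0.68 (previously reported) versus 0.87 (medication names in this study).
gain_percent = round(relative_improvement(0.68, 0.87) * 100, 1)
```

    With these toy inputs two of three crowd items match the gold set, so precision and recall are both 2/3 and the F-measure is also 2/3.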

  7. Clinical Natural Language Processing in 2015: Leveraging the Variety of Texts of Clinical Interest.

    PubMed

    Névéol, A; Zweigenbaum, P

    2016-11-10

    To summarize recent research and present a selection of the best papers published in 2015 in the field of clinical Natural Language Processing (NLP). A systematic review of the literature was performed by the two section editors of the IMIA Yearbook NLP section by searching bibliographic databases with a focus on NLP efforts applied to clinical texts or aimed at a clinical outcome. Section editors first selected a shortlist of candidate best papers that were then peer-reviewed by independent external reviewers. The clinical NLP best paper selection shows that clinical NLP is making use of a variety of texts of clinical interest to contribute to the analysis of clinical information and the building of a body of clinical knowledge. The full review process highlighted five papers analyzing patient-authored texts or seeking to connect and aggregate multiple sources of information. They provide a contribution to the development of methods, resources, applications, and sometimes a combination of these aspects. The field of clinical NLP continues to thrive through the contributions of both NLP researchers and healthcare professionals interested in applying NLP techniques to impact clinical practice. Foundational progress in the field makes it possible to leverage a larger variety of texts of clinical interest for healthcare purposes.

  8. Comparing deep learning and concept extraction based methods for patient phenotyping from clinical narratives.

    PubMed

    Gehrmann, Sebastian; Dernoncourt, Franck; Li, Yeran; Carlson, Eric T; Wu, Joy T; Welt, Jonathan; Foote, John; Moseley, Edward T; Grant, David W; Tyler, Patrick D; Celi, Leo A

    2018-01-01

    In secondary analysis of electronic health records, a crucial task consists in correctly identifying the patient cohort under investigation. In many cases, the most valuable and relevant information for an accurate classification of medical conditions exists only in clinical narratives. Therefore, it is necessary to use natural language processing (NLP) techniques to extract and evaluate these narratives. The most commonly used approach to this problem relies on extracting a number of clinician-defined medical concepts from text and using machine learning techniques to identify whether a particular patient has a certain condition. However, recent advances in deep learning and NLP enable models to learn a rich representation of (medical) language. Convolutional neural networks (CNN) for text classification can augment the existing techniques by leveraging the representation of language to learn which phrases in a text are relevant for a given medical condition. In this work, we compare concept extraction based methods with CNNs and other commonly used models in NLP in ten phenotyping tasks using 1,610 discharge summaries from the MIMIC-III database. We show that CNNs outperform concept extraction based methods in almost all of the tasks, with improvements of up to 26 percentage points in F1-score and up to 7 percentage points in area under the ROC curve (AUC). We additionally assess the interpretability of both approaches by presenting and evaluating methods that calculate and extract the most salient phrases for a prediction. The results indicate that CNNs are a valid alternative to existing approaches in patient phenotyping and cohort identification, and should be further investigated. Moreover, the deep learning approach presented in this paper can be used to assist clinicians during chart review or support the extraction of billing codes from text by identifying and highlighting relevant phrases for various medical conditions.

  9. Augmenting Qualitative Text Analysis with Natural Language Processing: Methodological Study.

    PubMed

    Guetterman, Timothy C; Chang, Tammy; DeJonckheere, Melissa; Basu, Tanmay; Scruggs, Elizabeth; Vydiswaran, V G Vinod

    2018-06-29

    Qualitative research methods are increasingly being used across disciplines because of their ability to help investigators understand the perspectives of participants in their own words. However, qualitative analysis is a laborious and resource-intensive process. To achieve depth, researchers are limited to smaller sample sizes when analyzing text data. One potential method to address this concern is natural language processing (NLP). Qualitative text analysis involves researchers reading data, assigning code labels, and iteratively developing findings; NLP has the potential to automate part of this process. Unfortunately, little methodological research has been done to compare automatic coding using NLP techniques and qualitative coding, which is critical to establish the viability of NLP as a useful, rigorous analysis procedure. The purpose of this study was to compare the utility of a traditional qualitative text analysis, an NLP analysis, and an augmented approach that combines qualitative and NLP methods. We conducted a 2-arm cross-over experiment to compare qualitative and NLP approaches to analyze data generated through 2 text message (short message service) survey questions, one about prescription drugs and the other about police interactions, sent to youth aged 14-24 years. We randomly assigned a question to each of the 2 experienced qualitative analysis teams for independent coding and analysis before receiving NLP results. A third team separately conducted NLP analysis of the same 2 questions. We examined the results of our analyses to compare (1) the similarity of findings derived, (2) the quality of inferences generated, and (3) the time spent in analysis. The qualitative-only analysis for the drug question (n=58) yielded 4 major findings, whereas the NLP analysis yielded 3 findings that missed contextual elements. The qualitative and NLP-augmented analysis was the most comprehensive. For the police question (n=68), the qualitative-only analysis yielded 4 primary findings and the NLP-only analysis yielded 4 slightly different findings. Again, the augmented qualitative and NLP analysis was the most comprehensive and produced the highest quality inferences, increasing our depth of understanding (ie, details and frequencies). In terms of time, the NLP-only approach was quicker than the qualitative-only approach for the drug (120 vs 270 minutes) and police (40 vs 270 minutes) questions. An approach beginning with qualitative analysis followed by qualitative- or NLP-augmented analysis took longer than one beginning with NLP for both drug (450 vs 240 minutes) and police (390 vs 220 minutes) questions. NLP provides both a foundation to code qualitatively more quickly and a method to validate qualitative findings. NLP methods were able to identify major themes found with traditional qualitative analysis but were not useful in identifying nuances. Traditional qualitative text analysis added important details and context. ©Timothy C Guetterman, Tammy Chang, Melissa DeJonckheere, Tanmay Basu, Elizabeth Scruggs, VG Vinod Vydiswaran. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 29.06.2018.

  10. Validation of Eye Movements Model of NLP through Stressed Recalls.

    ERIC Educational Resources Information Center

    Sandhu, Daya S.

    Neurolinguistic Programming (NLP) has emerged as a new approach to counseling and psychotherapy. Though not to be confused with computer programming, NLP does claim to program, deprogram, and reprogram clients' behaviors with the precision and expedition akin to computer processes. It is as a tool for therapeutic communication that NLP has rapidly…

  11. BioNLP Shared Task--The Bacteria Track.

    PubMed

    Bossy, Robert; Jourde, Julien; Manine, Alain-Pierre; Veber, Philippe; Alphonse, Erick; van de Guchte, Maarten; Bessières, Philippe; Nédellec, Claire

    2012-06-26

    We present the BioNLP 2011 Shared Task Bacteria Track, the first Information Extraction challenge entirely dedicated to bacteria. It includes three tasks that cover different levels of biological knowledge. The Bacteria Gene Renaming supporting task is aimed at extracting gene renaming and gene name synonymy in PubMed abstracts. The Bacteria Gene Interaction is a gene/protein interaction extraction task from individual sentences. The interactions have been categorized into ten different sub-types, thus giving a detailed account of genetic regulations at the molecular level. Finally, the Bacteria Biotopes task focuses on the localization and environment of bacteria mentioned in textbook articles. We describe the process of creation for the three corpora, including document acquisition and manual annotation, as well as the metrics used to evaluate the participants' submissions. Three teams submitted to the Bacteria Gene Renaming task; the best team achieved an F-score of 87%. For the Bacteria Gene Interaction task, the only participant reached a global F-score of 77%, although the system efficiency varies significantly from one sub-type to another. Three teams submitted to the Bacteria Biotopes task with very different approaches; the best team achieved an F-score of 45%. However, the detailed study of the participating systems' efficiency reveals the strengths and weaknesses of each participating system. The three tasks of the Bacteria Track offer participants a chance to address a wide range of issues in Information Extraction, including entity recognition, semantic typing and coreference resolution. We found common trends in the most efficient systems: the systematic use of syntactic dependencies and machine learning. Nevertheless, the originality of the Bacteria Biotopes task encouraged the use of interesting novel methods and techniques, such as term compositionality and scopes wider than the sentence.

  12. Automated concept-level information extraction to reduce the need for custom software and rules development.

    PubMed

    D'Avolio, Leonard W; Nguyen, Thien M; Goryachev, Sergey; Fiore, Louis D

    2011-01-01

    Despite at least 40 years of promising empirical performance, very few clinical natural language processing (NLP) or information extraction systems currently contribute to medical science or care. The authors address this gap by reducing the need for custom software and rules development with a graphical user interface-driven, highly generalizable approach to concept-level retrieval. A 'learn by example' approach combines features derived from open-source NLP pipelines with open-source machine learning classifiers to automatically and iteratively evaluate top-performing configurations. The Fourth i2b2/VA Shared Task Challenge's concept extraction task provided the data sets and metrics used to evaluate performance. Top F-measure scores for each of the tasks were medical problems (0.83), treatments (0.82), and tests (0.83). Recall lagged precision in all experiments. Precision was near or above 0.90 in all tasks. With no customization for the tasks and less than 5 min of end-user time to configure and launch each experiment, the average F-measure was 0.83, one point behind the mean F-measure of the 22 entrants in the competition. Strong precision scores indicate the potential of applying the approach for more specific clinical information extraction tasks. There was not one best configuration, supporting an iterative approach to model creation. Acceptable levels of performance can be achieved using fully automated and generalizable approaches to concept-level information extraction. The described implementation and related documentation are available for download.

  13. Dependency-based Siamese long short-term memory network for learning sentence representations

    PubMed Central

    Zhu, Wenhao; Ni, Jianyue; Wei, Baogang; Lu, Zhiguo

    2018-01-01

    Textual representations play an important role in the field of natural language processing (NLP). The efficiency of NLP tasks, such as text comprehension and information extraction, can be significantly improved with proper textual representations. As neural networks are gradually applied to learn the representation of words and phrases, fairly efficient models of learning short text representations have been developed, such as the continuous bag of words (CBOW) and skip-gram models, and they have been extensively employed in a variety of NLP tasks. Because of the complex structure generated by longer texts, such as sentences, algorithms appropriate for learning short textual representations are not applicable for learning long textual representations. One method of learning long textual representations is the Long Short-Term Memory (LSTM) network, which is suitable for processing sequences. However, the standard LSTM does not adequately address the primary sentence structure (subject, predicate and object), which is an important factor for producing appropriate sentence representations. To resolve this issue, this paper proposes the dependency-based LSTM model (D-LSTM). The D-LSTM divides a sentence representation into two parts: a basic component and a supporting component. The D-LSTM uses a pre-trained dependency parser to obtain the primary sentence information and generate supporting components, and it also uses a standard LSTM model to generate the basic sentence components. A weight factor that can adjust the ratio of the basic and supporting components in a sentence is introduced to generate the sentence representation. Compared with the representation learned by the standard LSTM, the sentence representation learned by the D-LSTM contains a greater amount of useful information. The experimental results show that the D-LSTM is superior to the standard LSTM on the Sentences Involving Compositional Knowledge (SICK) data. PMID:29513748

  14. Computing Accurate Grammatical Feedback in a Virtual Writing Conference for German-Speaking Elementary-School Children: An Approach Based on Natural Language Generation

    ERIC Educational Resources Information Center

    Harbusch, Karin; Itsova, Gergana; Koch, Ulrich; Kuhner, Christine

    2009-01-01

    We built a natural language processing (NLP) system implementing a "virtual writing conference" for elementary-school children, with German as the target language. Currently, state-of-the-art computer support for writing tasks is restricted to multiple-choice questions or quizzes because automatic parsing of the often ambiguous and fragmentary…

  15. Applying Active Learning to Assertion Classification of Concepts in Clinical Text

    PubMed Central

    Chen, Yukun; Mani, Subramani; Xu, Hua

    2012-01-01

    Supervised machine learning methods for clinical natural language processing (NLP) research require a large number of annotated samples, which are very expensive to build because of the involvement of physicians. Active learning, an approach that actively samples from a large pool, provides an alternative solution. Its major goal in classification is to reduce the annotation effort while maintaining the quality of the predictive model. However, few studies have investigated its uses in clinical NLP. This paper reports an application of active learning to a clinical text classification task: to determine the assertion status of clinical concepts. The annotated corpus for the assertion classification task in the 2010 i2b2/VA Clinical NLP Challenge was used in this study. We implemented several existing and newly developed active learning algorithms and assessed their uses. The outcome is reported in the global ALC score, based on the Area under the average Learning Curve of the AUC (Area Under the Curve) score. Results showed that when the same number of annotated samples was used, active learning strategies could generate better classification models (best ALC – 0.7715) than the passive learning method (random sampling) (ALC – 0.7411). Moreover, to achieve the same classification performance, active learning strategies required fewer samples than the random sampling method. For example, to achieve an AUC of 0.79, the random sampling method used 32 samples, while our best active learning algorithm required only 12 samples, a reduction of 62.5% in manual annotation effort. PMID:22127105
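    As a rough illustration of how an active learner chooses what to annotate next, the sketch below uses uncertainty sampling, a standard strategy that is swapped in here for illustration; the abstract does not name the specific algorithms the authors implemented. The function names and the toy probabilities are hypothetical. The second helper reproduces the reported 62.5% reduction from the 32 and 12 sample counts given in the abstract.

```python
def most_uncertain(probabilities, k):
    """Indices of the k unlabeled samples whose predicted probability
    of the positive class is closest to 0.5 (binary uncertainty sampling)."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:k]

def annotation_reduction(passive_samples, active_samples):
    """Fraction of annotation effort saved relative to random sampling."""
    return (passive_samples - active_samples) / passive_samples

# Hypothetical model outputs for four unlabeled concepts.
query = most_uncertain([0.95, 0.52, 0.10, 0.45], k=2)

# Sample counts reported in the abstract for reaching an AUC of 0.79.
saved = annotation_reduction(32, 12)
```

    In a full loop the queried samples would be labeled by an annotator, added to the training set, and the model retrained before the next query.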

  16. Natural Language Processing As an Alternative to Manual Reporting of Colonoscopy Quality Metrics

    PubMed Central

    RAJU, GOTTUMUKKALA S.; LUM, PHILLIP J.; SLACK, REBECCA; THIRUMURTHI, SELVI; LYNCH, PATRICK M.; MILLER, ETHAN; WESTON, BRIAN R.; DAVILA, MARTA L.; BHUTANI, MANOOP S.; SHAFI, MEHNAZ A.; BRESALIER, ROBERT S.; DEKOVICH, ALEXANDER A.; LEE, JEFFREY H.; GUHA, SUSHOVAN; PANDE, MALA; BLECHACZ, BORIS; RASHID, ASIF; ROUTBORT, MARK; SHUTTLESWORTH, GLADIS; MISHRA, LOPA; STROEHLEIN, JOHN R.; ROSS, WILLIAM A.

    2015-01-01

    BACKGROUND & AIMS The adenoma detection rate (ADR) is a quality metric tied to interval colon cancer occurrence. However, manual extraction of data to calculate and track the ADR in clinical practice is labor-intensive. To overcome this difficulty, we developed a natural language processing (NLP) method to identify patients who underwent their first screening colonoscopy and to identify adenomas and sessile serrated adenomas (SSAs). We compared the NLP-generated results with those of manual data extraction to test the accuracy of NLP, and report on colonoscopy quality metrics using NLP. METHODS Identification of screening colonoscopies using NLP was compared with that using the manual method for 12,748 patients who underwent colonoscopies from July 2010 to February 2013. Also, identification of adenomas and SSAs using NLP was compared with that using the manual method with 2259 matched patient records. Colonoscopy ADRs using these methods were generated for each physician. RESULTS NLP correctly identified 91.3% of the screening examinations, whereas the manual method identified 87.8% of them. Both the manual method and NLP correctly identified examinations of patients with adenomas and SSAs in the matched records almost perfectly. Both NLP and the manual method produced comparable values for ADR for each endoscopist as well as the group as a whole. CONCLUSIONS NLP can correctly identify screening colonoscopies, accurately identify adenomas and SSAs in a pathology database, and provide real-time quality metrics for colonoscopy. PMID:25910665
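    Once screening colonoscopies and adenoma findings have been identified, the per-physician ADR is a simple proportion: screening examinations with at least one adenoma divided by all screening examinations. The Python sketch below assumes a hypothetical record layout of (physician, adenoma found) pairs; it does not reflect the study's actual data format or NLP pipeline.

```python
from collections import defaultdict

def adr_by_physician(exams):
    """Adenoma detection rate per physician.

    `exams` is an iterable of (physician_id, adenoma_found) pairs, one per
    screening colonoscopy (hypothetical layout chosen for illustration).
    """
    totals = defaultdict(int)
    detections = defaultdict(int)
    for physician, adenoma_found in exams:
        totals[physician] += 1
        if adenoma_found:
            detections[physician] += 1
    return {p: detections[p] / totals[p] for p in totals}

# Made-up examinations for two physicians.
rates = adr_by_physician([
    ("dr_a", True), ("dr_a", False), ("dr_a", False), ("dr_a", False),
    ("dr_b", True), ("dr_b", False),
])
```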

  17. Internship Abstract and Final Reflection

    NASA Technical Reports Server (NTRS)

    Sandor, Edward

    2016-01-01

    The primary objective for this internship is the evaluation of an embedded natural language processor (NLP) as a way to introduce voice control into future space suits. An embedded natural language processor would provide an astronaut hands-free control for making adjustments to the environment of the space suit and checking the status of consumables, procedures, and navigation. Additionally, the use of an embedded NLP could potentially reduce crew fatigue, increase the crewmember's situational awareness during extravehicular activity (EVA) and improve the ability to focus on mission critical details. The use of an embedded NLP may be valuable for other human spaceflight applications desiring hands-free control as well. An embedded NLP is unique because it is a small device that performs language tasks, including speech recognition, which normally require powerful processors. The dedicated device could perform speech recognition locally with a smaller form-factor and lower power consumption than traditional methods.

  18. Natural Language Processing in Game Studies Research: An Overview

    ERIC Educational Resources Information Center

    Zagal, Jose P.; Tomuro, Noriko; Shepitsen, Andriy

    2012-01-01

    Natural language processing (NLP) is a field of computer science and linguistics devoted to creating computer systems that use human (natural) language as input and/or output. The authors propose that NLP can also be used for game studies research. In this article, the authors provide an overview of NLP and describe some research possibilities…

  19. Video to Text (V2T) in Wide Area Motion Imagery

    DTIC Science & Technology

    2015-09-01

    microtext) or a document (e.g., using Sphinx or Apache NLP) as an automated approach [102]. Previous work in natural language full-text searching...language processing (NLP) based module. The heart of the structured text processing module includes the following seven key word banks...Features Tracker MHT Multiple Hypothesis Tracking MIL Multiple Instance Learning NLP Natural Language Processing OAB Online AdaBoost OF Optic Flow

  20. TEES 2.2: Biomedical Event Extraction for Diverse Corpora

    PubMed Central

    2015-01-01

    Background The Turku Event Extraction System (TEES) is a text mining program developed for the extraction of events, complex biomedical relationships, from scientific literature. Based on a graph-generation approach, the system detects events with the use of a rich feature set built via dependency parsing. The TEES system has achieved record performance in several of the shared tasks of its domain, and continues to be used in a variety of biomedical text mining tasks. Results The TEES system was quickly adapted to the BioNLP'13 Shared Task in order to provide a public baseline for derived systems. An automated approach was developed for learning the underlying annotation rules of event type, allowing immediate adaptation to the various subtasks, and leading to a first place in four out of eight tasks. The system for the automated learning of annotation rules is further enhanced in this paper to the point of requiring no manual adaptation to any of the BioNLP'13 tasks. Further, the scikit-learn machine learning library is integrated into the system, bringing a wide variety of machine learning methods usable with TEES in addition to the default SVM. A scikit-learn ensemble method is also used to analyze the importances of the features in the TEES feature sets. Conclusions The TEES system was introduced for the BioNLP'09 Shared Task and has since then demonstrated good performance in several other shared tasks. By applying the current TEES 2.2 system to multiple corpora from these past shared tasks, an overarching analysis of the most promising methods and possible pitfalls in the evolving field of biomedical event extraction is presented. PMID:26551925

  1. TEES 2.2: Biomedical Event Extraction for Diverse Corpora.

    PubMed

    Björne, Jari; Salakoski, Tapio

    2015-01-01

    The Turku Event Extraction System (TEES) is a text mining program developed for the extraction of events, complex biomedical relationships, from scientific literature. Based on a graph-generation approach, the system detects events with the use of a rich feature set built via dependency parsing. The TEES system has achieved record performance in several of the shared tasks of its domain, and continues to be used in a variety of biomedical text mining tasks. The TEES system was quickly adapted to the BioNLP'13 Shared Task in order to provide a public baseline for derived systems. An automated approach was developed for learning the underlying annotation rules of event type, allowing immediate adaptation to the various subtasks, and leading to a first place in four out of eight tasks. The system for the automated learning of annotation rules is further enhanced in this paper to the point of requiring no manual adaptation to any of the BioNLP'13 tasks. Further, the scikit-learn machine learning library is integrated into the system, bringing a wide variety of machine learning methods usable with TEES in addition to the default SVM. A scikit-learn ensemble method is also used to analyze the importances of the features in the TEES feature sets. The TEES system was introduced for the BioNLP'09 Shared Task and has since then demonstrated good performance in several other shared tasks. By applying the current TEES 2.2 system to multiple corpora from these past shared tasks, an overarching analysis of the most promising methods and possible pitfalls in the evolving field of biomedical event extraction is presented.

  2. A study of the transferability of influenza case detection systems between two large healthcare systems

    PubMed Central

    Wagner, Michael M.; Cooper, Gregory F.; Ferraro, Jeffrey P.; Su, Howard; Gesteland, Per H.; Haug, Peter J.; Millett, Nicholas E.; Aronis, John M.; Nowalk, Andrew J.; Ruiz, Victor M.; López Pineda, Arturo; Shi, Lingyun; Van Bree, Rudy; Ginter, Thomas; Tsui, Fuchiang

    2017-01-01

    Objectives This study evaluates the accuracy and transferability of Bayesian case detection systems (BCD) that use clinical notes from the emergency department (ED) to detect influenza cases. Methods A BCD uses natural language processing (NLP) to infer the presence or absence of clinical findings from ED notes, which are fed into a Bayesian network classifier (BN) to infer patients’ diagnoses. We developed BCDs at the University of Pittsburgh Medical Center (BCDUPMC) and Intermountain Healthcare in Utah (BCDIH). At each site, we manually built a rule-based NLP and trained a Bayesian network classifier from over 40,000 ED encounters between Jan. 2008 and May 2010 using feature selection, machine learning, and an expert debiasing approach. Transferability of a BCD in this study may be impacted by seven factors: development (source) institution, development parser, application (target) institution, application parser, NLP transfer, BN transfer, and classification task. We employed an ANOVA analysis to study their impacts on BCD performance. Results Both BCDs discriminated well between influenza and non-influenza on local test cases (AUCs > 0.92). When tested for transferability using the other institution’s cases, BCDUPMC discriminations declined minimally (AUC decreased from 0.95 to 0.94, p<0.01), and BCDIH discriminations declined more (from 0.93 to 0.87, p<0.0001). We attributed the BCDIH decline to the lower recall of the IH parser on UPMC notes. The ANOVA analysis showed five significant factors: development parser, application institution, application parser, BN transfer, and classification task. Conclusion We demonstrated high influenza case detection performance in two large healthcare systems in two geographically separated regions, providing evidentiary support for the use of automated case detection from routinely collected electronic clinical notes in national influenza surveillance. The transferability could be improved by training the Bayesian network classifier locally and increasing the accuracy of the NLP parser. PMID:28380048

  3. A study of the transferability of influenza case detection systems between two large healthcare systems.

    PubMed

    Ye, Ye; Wagner, Michael M; Cooper, Gregory F; Ferraro, Jeffrey P; Su, Howard; Gesteland, Per H; Haug, Peter J; Millett, Nicholas E; Aronis, John M; Nowalk, Andrew J; Ruiz, Victor M; López Pineda, Arturo; Shi, Lingyun; Van Bree, Rudy; Ginter, Thomas; Tsui, Fuchiang

    2017-01-01

    This study evaluates the accuracy and transferability of Bayesian case detection systems (BCD) that use clinical notes from the emergency department (ED) to detect influenza cases. A BCD uses natural language processing (NLP) to infer the presence or absence of clinical findings from ED notes, which are fed into a Bayesian network classifier (BN) to infer patients' diagnoses. We developed BCDs at the University of Pittsburgh Medical Center (BCDUPMC) and Intermountain Healthcare in Utah (BCDIH). At each site, we manually built a rule-based NLP and trained a Bayesian network classifier from over 40,000 ED encounters between Jan. 2008 and May 2010 using feature selection, machine learning, and an expert debiasing approach. Transferability of a BCD in this study may be impacted by seven factors: development (source) institution, development parser, application (target) institution, application parser, NLP transfer, BN transfer, and classification task. We employed an ANOVA analysis to study their impacts on BCD performance. Both BCDs discriminated well between influenza and non-influenza on local test cases (AUCs > 0.92). When tested for transferability using the other institution's cases, BCDUPMC discriminations declined minimally (AUC decreased from 0.95 to 0.94, p<0.01), and BCDIH discriminations declined more (from 0.93 to 0.87, p<0.0001). We attributed the BCDIH decline to the lower recall of the IH parser on UPMC notes. The ANOVA analysis showed five significant factors: development parser, application institution, application parser, BN transfer, and classification task. We demonstrated high influenza case detection performance in two large healthcare systems in two geographically separated regions, providing evidentiary support for the use of automated case detection from routinely collected electronic clinical notes in national influenza surveillance. The transferability could be improved by training the Bayesian network classifier locally and increasing the accuracy of the NLP parser.

  4. Natural language processing of clinical notes for identification of critical limb ischemia.

    PubMed

    Afzal, Naveed; Mallipeddi, Vishnu Priya; Sohn, Sunghwan; Liu, Hongfang; Chaudhry, Rajeev; Scott, Christopher G; Kullo, Iftikhar J; Arruda-Olson, Adelaide M

    2018-03-01

    Critical limb ischemia (CLI) is a complication of advanced peripheral artery disease (PAD) with diagnosis based on the presence of clinical signs and symptoms. However, automated identification of cases from electronic health records (EHRs) is challenging due to absence of a single definitive International Classification of Diseases (ICD-9 or ICD-10) code for CLI. In this study, we extend a previously validated natural language processing (NLP) algorithm for PAD identification to develop and validate a subphenotyping NLP algorithm (CLI-NLP) for identification of CLI cases from clinical notes. We compared performance of the CLI-NLP algorithm with CLI-related ICD-9 billing codes. The gold standard for validation was human abstraction of clinical notes from EHRs. Compared to billing codes the CLI-NLP algorithm had higher positive predictive value (PPV) (CLI-NLP 96%, billing codes 67%, p < 0.001), specificity (CLI-NLP 98%, billing codes 74%, p < 0.001) and F1-score (CLI-NLP 90%, billing codes 76%, p < 0.001). The sensitivity of these two methods was similar (CLI-NLP 84%; billing codes 88%; p < 0.12). The CLI-NLP algorithm for identification of CLI from narrative clinical notes in an EHR had excellent PPV and has potential for translation to patient care as it will enable automated identification of CLI cases for quality projects, clinical decision support tools and support a learning healthcare system. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
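
    The metrics reported in this record (PPV, sensitivity, specificity, F1-score) all derive from a 2x2 confusion matrix against the gold standard. The following is a minimal sketch of that arithmetic; the class name `ConfusionCounts`, the function `screening_metrics`, and the counts are hypothetical illustrations and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts of algorithm output versus gold-standard chart abstraction."""
    tp: int  # flagged by the algorithm, confirmed by abstraction
    fp: int  # flagged by the algorithm, not confirmed
    fn: int  # missed by the algorithm, present per abstraction
    tn: int  # correctly not flagged


def screening_metrics(c: ConfusionCounts) -> dict:
    """Return PPV, sensitivity, specificity and F1 as fractions in [0, 1]."""
    ppv = c.tp / (c.tp + c.fp)
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    # F1 is the harmonic mean of PPV (precision) and sensitivity (recall).
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "ppv": ppv,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts chosen only to exercise the formulas.
example = ConfusionCounts(tp=84, fp=4, fn=16, tn=196)
metrics = screening_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

    Because F1 ignores true negatives, a method can have high specificity and still a modest F1 when it misses many true cases.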

  5. Phylogenetic, expression and functional characterizations of the maize NLP transcription factor family reveal a role in nitrate assimilation and signaling.

    PubMed

    Wang, Zhangkui; Zhang, Lei; Sun, Ci; Gu, Riliang; Mi, Guohua; Yuan, Lixing

    2018-01-24

    Although nitrate represents an important nitrogen (N) source for maize, a major crop of dryland areas, the molecular mechanisms of nitrate uptake and assimilation remain poorly understood. Here, we identified nine maize NIN-like protein (ZmNLP) genes and analyzed the function of one member, ZmNLP3.1, in nitrate nutrition and signaling. The NLP family genes were clustered into three clades in a phylogenic tree. Comparative genomic analysis showed that most ZmNLP genes had collinear relationships to the corresponding NLPs in rice, and that the expansion of the ZmNLP family resulted from segmental duplications in the maize genome. Quantitative PCR analysis revealed the expression of ZmNLP2.1, ZmNLP2.2, ZmNLP3.1, ZmNLP3.2, ZmNLP3.3, and ZmNLP3.4 was induced by nitrate in maize roots. The function of ZmNLP3.1 was investigated by overexpressing it in the Arabidopsis nlp7-1 mutant, which is defective in the AtNLP7 gene for nitrate signaling and assimilation. Ectopic expression of ZmNLP3.1 restored the N-deficient phenotypes of nlp7-1 under nitrate-replete conditions in terms of shoot biomass, root morphology and nitrate assimilation. Furthermore, the nitrate induction of NRT2.1, NIA1, and NiR1 gene expression was recovered in the 35S::ZmNLP3.1/nlp7-1 transgenic lines, indicating that ZmNLP3.1 plays essential roles in nitrate signaling. Taken together, these results suggest that ZmNLP3.1 plays an essential role in regulating nitrate signaling and assimilation processes, and represents a valuable candidate for developing transgenic maize cultivars with high N-use efficiency. This article is protected by copyright. All rights reserved.

  6. Exploring Social Meaning in Online Bilingual Text through Social Network Analysis

    DTIC Science & Technology

    2015-09-01

    p. 1). 30 GATE development began in 1995. As techniques for natural language processing (NLP) are investigated by the research community and...become part of the NLP repertoire, developers incorporate them with wrappers, which allow the output from GATE processes to be recognized as input by...University NEE Named Entity Extraction NLP natural language processing OSD Office of the Secretary of Defense POS parts of speech SBIR Small Business

  7. Speculation detection for Chinese clinical notes: Impacts of word segmentation and embedding models.

    PubMed

    Zhang, Shaodian; Kang, Tian; Zhang, Xingting; Wen, Dong; Elhadad, Noémie; Lei, Jianbo

    2016-04-01

    Speculations represent uncertainty toward certain facts. In clinical texts, identifying speculations is a critical step of natural language processing (NLP). While it is a nontrivial task in many languages, detecting speculations in Chinese clinical notes can be particularly challenging because word segmentation may be necessary as an upstream operation. The objective of this paper is to construct a state-of-the-art speculation detection system for Chinese clinical notes and to investigate whether embedding features and word segmentations are worth exploiting toward this overall task. We propose a sequence labeling based system for speculation detection, which relies on features from bag of characters, bag of words, character embedding, and word embedding. We experiment on a novel dataset of 36,828 clinical notes with 5103 gold-standard speculation annotations on 2000 notes, and compare the systems in which word embeddings are calculated based on word segmentations given by general and by domain specific segmenters respectively. Our systems are able to reach performance as high as 92.2% measured by F score. We demonstrate that word segmentation is critical to produce high quality word embedding to facilitate downstream information extraction applications, and suggest that a domain dependent word segmenter can be vital to such a clinical NLP task in Chinese language. Copyright © 2016 Elsevier Inc. All rights reserved.

  8. Feature generation and representations for protein-protein interaction classification.

    PubMed

    Lan, Man; Tan, Chew Lim; Su, Jian

    2009-10-01

    Automatically detecting articles relevant to protein-protein interaction (PPI) is a crucial step for large-scale biological database curation. Previous work adopted POS tagging, shallow parsing and sentence splitting techniques, but these achieved worse performance than the simple bag-of-words representation. In this paper, we generated and investigated multiple types of feature representations in order to further improve the performance of the PPI text classification task. Besides the traditional domain-independent bag-of-words approach and the term weighting methods, we also explored other domain-dependent features, i.e. protein-protein interaction trigger keywords, protein named entities and the advanced ways of incorporating Natural Language Processing (NLP) output. The integration of these multiple features has been evaluated on the BioCreAtIvE II corpus. The experimental results showed that both the advanced way of using NLP output and the integration of bag-of-words and NLP output improved the performance of text classification. Specifically, in comparison with the best performance achieved in the BioCreAtIvE II IAS, the feature-level and classifier-level integration of multiple features improved the performance of classification by 2.71% and 3.95%, respectively.

  9. From Sour Grapes to Low-Hanging Fruit: A Case Study Demonstrating a Practical Strategy for Natural Language Processing Portability.

    PubMed

    Johnson, Stephen B; Adekkanattu, Prakash; Campion, Thomas R; Flory, James; Pathak, Jyotishman; Patterson, Olga V; DuVall, Scott L; Major, Vincent; Aphinyanaphongs, Yindalon

    2018-01-01

    Natural Language Processing (NLP) holds potential for patient care and clinical research, but a gap exists between promise and reality. While some studies have demonstrated portability of NLP systems across multiple sites, challenges remain. Strategies to mitigate these challenges can strive for complex NLP problems using advanced methods (hard-to-reach fruit), or focus on simple NLP problems using practical methods (low-hanging fruit). This paper investigates a practical strategy for NLP portability using extraction of left ventricular ejection fraction (LVEF) as a use case. We used a tool developed at the Department of Veterans Affairs (VA) to extract the LVEF values from free-text echocardiograms in the MIMIC-III database. The approach showed an accuracy of 98.4%, sensitivity of 99.4%, a positive predictive value of 98.7%, and F-score of 99.0%. This experience, in which a simple NLP solution proved highly portable with excellent performance, illustrates the point that simple NLP applications may be easier to disseminate and adapt, and in the short term may prove more useful, than complex applications.
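
    To illustrate why LVEF extraction counts as a "low-hanging fruit" task, the sketch below pulls ejection fraction values from free text with a single regular expression. It is not the VA tool described in this record: the pattern `LVEF_PATTERN`, the function `extract_lvef`, and the sample sentences are assumptions made for illustration, and a production extractor would need to handle many more phrasings.

```python
import re
from typing import List, Tuple

# Hypothetical pattern (an assumption, not the VA tool's rules): a concept
# mention, a short gap without digits, then a value or a range followed by "%".
LVEF_PATTERN = re.compile(
    r"""
    \b(?:LVEF|EF|ejection\s+fraction)\b   # concept mention
    [^\d%\n]{0,40}?                       # short gap, e.g. " is estimated at "
    (\d{1,2})                             # single value or lower bound
    (?:\s*(?:-|to)\s*(\d{1,2}))?          # optional upper bound of a range
    \s*%                                  # percent sign
    """,
    re.IGNORECASE | re.VERBOSE,
)


def extract_lvef(text: str) -> List[Tuple[int, int]]:
    """Return (low, high) percent pairs; a single value yields low == high."""
    results = []
    for match in LVEF_PATTERN.finditer(text):
        low = int(match.group(1))
        high = int(match.group(2)) if match.group(2) else low
        results.append((low, high))
    return results


# Made-up report sentences, used only to exercise the pattern.
range_result = extract_lvef("Left ventricular ejection fraction is estimated at 55-60%.")
single_result = extract_lvef("LVEF: 35 %")
empty_result = extract_lvef("No echocardiogram on file.")
print(range_result, single_result, empty_result)
```

    The word boundaries around the concept mention keep the short token "EF" from matching inside ordinary words such as "left".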

  10. A common type system for clinical natural language processing

    PubMed Central

    2013-01-01

    Background: One challenge in reusing clinical data stored in electronic medical records is that these data are heterogenous. Clinical Natural Language Processing (NLP) plays an important role in transforming information in clinical text to a standard representation that is comparable and interoperable. Information may be processed and shared when a type system specifies the allowable data structures. Therefore, we aim to define a common type system for clinical NLP that enables interoperability between structured and unstructured data generated in different clinical settings. Results: We describe a common type system for clinical NLP that has an end target of deep semantics based on Clinical Element Models (CEMs), thus interoperating with structured data and accommodating diverse NLP approaches. The type system has been implemented in UIMA (Unstructured Information Management Architecture) and is fully functional in a popular open-source clinical NLP system, cTAKES (clinical Text Analysis and Knowledge Extraction System) versions 2.0 and later. Conclusions: We have created a type system that targets deep semantics, thereby allowing for NLP systems to encapsulate knowledge from text and share it alongside heterogenous clinical data sources. Rather than surface semantics that are typically the end product of NLP algorithms, CEM-based semantics explicitly build in deep clinical semantics as the point of interoperability with more structured data types. PMID:23286462

  11. A common type system for clinical natural language processing.

    PubMed

    Wu, Stephen T; Kaggal, Vinod C; Dligach, Dmitriy; Masanz, James J; Chen, Pei; Becker, Lee; Chapman, Wendy W; Savova, Guergana K; Liu, Hongfang; Chute, Christopher G

    2013-01-03

    One challenge in reusing clinical data stored in electronic medical records is that these data are heterogenous. Clinical Natural Language Processing (NLP) plays an important role in transforming information in clinical text to a standard representation that is comparable and interoperable. Information may be processed and shared when a type system specifies the allowable data structures. Therefore, we aim to define a common type system for clinical NLP that enables interoperability between structured and unstructured data generated in different clinical settings. We describe a common type system for clinical NLP that has an end target of deep semantics based on Clinical Element Models (CEMs), thus interoperating with structured data and accommodating diverse NLP approaches. The type system has been implemented in UIMA (Unstructured Information Management Architecture) and is fully functional in a popular open-source clinical NLP system, cTAKES (clinical Text Analysis and Knowledge Extraction System) versions 2.0 and later. We have created a type system that targets deep semantics, thereby allowing for NLP systems to encapsulate knowledge from text and share it alongside heterogenous clinical data sources. Rather than surface semantics that are typically the end product of NLP algorithms, CEM-based semantics explicitly build in deep clinical semantics as the point of interoperability with more structured data types.

  12. What can Natural Language Processing do for Clinical Decision Support?

    PubMed Central

    Demner-Fushman, Dina; Chapman, Wendy W.; McDonald, Clement J.

    2009-01-01

    Computerized Clinical Decision Support (CDS) aims to aid decision making of health care providers and the public by providing easily accessible health-related information at the point and time it is needed. Natural Language Processing (NLP) is instrumental in using free-text information to drive CDS, representing clinical knowledge and CDS interventions in standardized formats, and leveraging clinical narrative. The early innovative NLP research of clinical narrative was followed by a period of stable research conducted at the major clinical centers and a shift of mainstream interest to biomedical NLP. This review primarily focuses on the recently renewed interest in development of fundamental NLP methods and advances in the NLP systems for CDS. The current solutions to challenges posed by distinct sublanguages, intended user groups, and support goals are discussed. PMID:19683066

  13. Mining Peripheral Arterial Disease Cases from Narrative Clinical Notes Using Natural Language Processing

    PubMed Central

    Afzal, Naveed; Sohn, Sunghwan; Abram, Sara; Scott, Christopher G.; Chaudhry, Rajeev; Liu, Hongfang; Kullo, Iftikhar J.; Arruda-Olson, Adelaide M.

    2016-01-01

    Objective: Lower extremity peripheral arterial disease (PAD) is highly prevalent and affects millions of individuals worldwide. We developed a natural language processing (NLP) system for automated ascertainment of PAD cases from clinical narrative notes and compared the performance of the NLP algorithm to billing code algorithms, using ankle-brachial index (ABI) test results as the gold standard. Methods: We compared the performance of the NLP algorithm to 1) results of gold standard ABI; 2) previously validated algorithms based on relevant ICD-9 diagnostic codes (simple model) and 3) a combination of ICD-9 codes with procedural codes (full model). A dataset of 1,569 PAD patients and controls was randomly divided into training (n = 935) and testing (n = 634) subsets. Results: We iteratively refined the NLP algorithm in the training set including narrative note sections, note types and service types, to maximize its accuracy. In the testing dataset, when compared with both simple and full models, the NLP algorithm had better accuracy (NLP: 91.8%, full model: 81.8%, simple model: 83%, P<.001), PPV (NLP: 92.9%, full model: 74.3%, simple model: 79.9%, P<.001), and specificity (NLP: 92.5%, full model: 64.2%, simple model: 75.9%, P<.001). Conclusions: A knowledge-driven NLP algorithm for automatic ascertainment of PAD cases from clinical notes had greater accuracy than billing code algorithms. Our findings highlight the potential of NLP tools for rapid and efficient ascertainment of PAD cases from electronic health records to facilitate clinical investigation and eventually improve care by clinical decision support. PMID:28189359

  14. Characterization of necrosis-inducing NLP proteins in Phytophthora capsici

    PubMed Central

    2014-01-01

    Background: Effector proteins function not only as toxins to induce plant cell death, but also enable pathogens to suppress or evade plant defense responses. NLP-like proteins are considered to be effector proteins, and they have been isolated from bacteria, fungi, and oomycete plant pathogens. There is increasing evidence that NLPs have the ability to induce cell death and ethylene accumulation in plants. Results: We evaluated the expression patterns of 11 targeted PcNLP genes by qRT-PCR at different time points after infection by P. capsici. Several PcNLP genes were strongly expressed at the early stages in the infection process, but the expression of other PcNLP genes gradually increased to a maximum at late stages of infection. The genes PcNLP2, PcNLP6 and PcNLP14 showed the highest expression levels during infection by P. capsici. The necrosis-inducing activity of all targeted PcNLP genes was evaluated using heterologous expression by PVX agroinfection of Capsicum annuum and Nicotiana benthamiana and by Western blot analysis. The members of the PcNLP family can induce chlorosis or necrosis during infection of pepper and tobacco leaves, but the chlorotic or necrotic response caused by PcNLP genes was stronger in pepper leaves than in tobacco leaves. Moreover, PcNLP2, PcNLP6, and PcNLP14 caused the largest chlorotic or necrotic areas in both host plants, indicating that these three genes contribute to strong virulence during infection by P. capsici. This was confirmed through functional evaluation of their silenced transformants. In addition, we further verified that four conserved residues are putatively active sites in PcNLP1 by site-directed mutagenesis. Conclusions: Each targeted PcNLP gene affects cells or tissues differently depending upon the stage of infection. Most PcNLP genes could trigger necrotic or chlorotic responses when expressed in the host C. annuum and the non-host N. benthamiana. Individual PcNLP genes have different phytotoxic effects, and PcNLP2, PcNLP6, and PcNLP14 may play important roles in symptom development and may be crucial for virulence, necrosis-inducing activity, or cell death during infection by P. capsici. PMID:24886309

  15. Characterization of necrosis-inducing NLP proteins in Phytophthora capsici.

    PubMed

    Feng, Bao-Zhen; Zhu, Xiao-Ping; Fu, Li; Lv, Rong-Fei; Storey, Dylan; Tooley, Paul; Zhang, Xiu-Guo

    2014-05-08

    Effector proteins function not only as toxins to induce plant cell death, but also enable pathogens to suppress or evade plant defense responses. NLP-like proteins are considered to be effector proteins, and they have been isolated from bacteria, fungi, and oomycete plant pathogens. There is increasing evidence that NLPs have the ability to induce cell death and ethylene accumulation in plants. We evaluated the expression patterns of 11 targeted PcNLP genes by qRT-PCR at different time points after infection by P. capsici. Several PcNLP genes were strongly expressed at the early stages in the infection process, but the expression of other PcNLP genes gradually increased to a maximum at late stages of infection. The genes PcNLP2, PcNLP6 and PcNLP14 showed the highest expression levels during infection by P. capsici. The necrosis-inducing activity of all targeted PcNLP genes was evaluated using heterologous expression by PVX agroinfection of Capsicum annuum and Nicotiana benthamiana and by Western blot analysis. The members of the PcNLP family can induce chlorosis or necrosis during infection of pepper and tobacco leaves, but the chlorotic or necrotic response caused by PcNLP genes was stronger in pepper leaves than in tobacco leaves. Moreover, PcNLP2, PcNLP6, and PcNLP14 caused the largest chlorotic or necrotic areas in both host plants, indicating that these three genes contribute to strong virulence during infection by P. capsici. This was confirmed through functional evaluation of their silenced transformants. In addition, we further verified that four conserved residues are putatively active sites in PcNLP1 by site-directed mutagenesis. Each targeted PcNLP gene affects cells or tissues differently depending upon the stage of infection. Most PcNLP genes could trigger necrotic or chlorotic responses when expressed in the host C. annuum and the non-host N. benthamiana. Individual PcNLP genes have different phytotoxic effects, and PcNLP2, PcNLP6, and PcNLP14 may play important roles in symptom development and may be crucial for virulence, necrosis-inducing activity, or cell death during infection by P. capsici.

  16. Ensembles of NLP Tools for Data Element Extraction from Clinical Notes

    PubMed Central

    Kuo, Tsung-Ting; Rao, Pallavi; Maehara, Cleo; Doan, Son; Chaparro, Juan D.; Day, Michele E.; Farcas, Claudiu; Ohno-Machado, Lucila; Hsu, Chun-Nan

    2016-01-01

    Natural Language Processing (NLP) is essential for concept extraction from narrative text in electronic health records (EHR). To extract numerous and diverse concepts, such as data elements (i.e., important concepts related to a certain medical condition), a plausible solution is to combine various NLP tools into an ensemble to improve extraction performance. However, it is unclear to what extent ensembles of popular NLP tools improve the extraction of numerous and diverse concepts. Therefore, we built an NLP ensemble pipeline to synergize the strength of popular NLP tools using seven ensemble methods, and to quantify the improvement in performance achieved by ensembles in the extraction of data elements for three very different cohorts. Evaluation results show that the pipeline can improve the performance of NLP tools, but there is high variability depending on the cohort. PMID:28269947

  17. Ensembles of NLP Tools for Data Element Extraction from Clinical Notes.

    PubMed

    Kuo, Tsung-Ting; Rao, Pallavi; Maehara, Cleo; Doan, Son; Chaparro, Juan D; Day, Michele E; Farcas, Claudiu; Ohno-Machado, Lucila; Hsu, Chun-Nan

    2016-01-01

    Natural Language Processing (NLP) is essential for concept extraction from narrative text in electronic health records (EHR). To extract numerous and diverse concepts, such as data elements (i.e., important concepts related to a certain medical condition), a plausible solution is to combine various NLP tools into an ensemble to improve extraction performance. However, it is unclear to what extent ensembles of popular NLP tools improve the extraction of numerous and diverse concepts. Therefore, we built an NLP ensemble pipeline to synergize the strength of popular NLP tools using seven ensemble methods, and to quantify the improvement in performance achieved by ensembles in the extraction of data elements for three very different cohorts. Evaluation results show that the pipeline can improve the performance of NLP tools, but there is high variability depending on the cohort.
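
    The record does not say which seven ensemble methods were used. As one plausible illustration of the general idea, the sketch below combines the concept sets returned by several tools with a simple vote threshold. The function `majority_vote`, the tool names, and the concepts are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def majority_vote(tool_outputs: Dict[str, Set[str]], min_votes: int) -> Set[str]:
    """Keep every concept reported by at least `min_votes` tools."""
    votes = Counter(
        concept
        for concepts in tool_outputs.values()
        for concept in concepts
    )
    return {concept for concept, n in votes.items() if n >= min_votes}


# Hypothetical outputs of three extraction tools run on the same note.
outputs = {
    "tool_a": {"fever", "cough"},
    "tool_b": {"fever"},
    "tool_c": {"fever", "cough", "rash"},
}

union_result = majority_vote(outputs, min_votes=1)     # favors recall
majority_result = majority_vote(outputs, min_votes=2)  # balanced
strict_result = majority_vote(outputs, min_votes=3)    # favors precision
print(sorted(majority_result))
```

    Raising `min_votes` trades recall for precision, which is one reason ensemble gains can vary from cohort to cohort.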

  18. A hybrid model for automatic identification of risk factors for heart disease.

    PubMed

    Yang, Hui; Garibaldi, Jonathan M

    2015-12-01

    Coronary artery disease (CAD) is the leading cause of death in both the UK and worldwide. The detection of related risk factors and tracking their progress over time is of great importance for early prevention and treatment of CAD. This paper describes an information extraction system that was developed to automatically identify risk factors for heart disease in medical records while the authors participated in the 2014 i2b2/UTHealth NLP Challenge. Our approaches rely on several natural language processing (NLP) techniques such as machine learning, rule-based methods, and dictionary-based keyword spotting to cope with complicated clinical contexts inherent in a wide variety of risk factors. Our system achieved encouraging performance on the challenge test data with an overall micro-averaged F-measure of 0.915, which was competitive with the best system (F-measure of 0.927) in this challenge task. Copyright © 2015 Elsevier Inc. All rights reserved.
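
    A micro-averaged F-measure pools true positives, false positives and false negatives across all categories before computing a single score, so frequent categories dominate the result. The sketch below contrasts it with the macro average; the function names, risk-factor labels and counts are hypothetical and are not taken from the challenge data.

```python
from typing import Dict, Tuple

# Per-category counts as (true positives, false positives, false negatives).
Counts = Dict[str, Tuple[int, int, int]]


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 written directly in terms of counts: 2TP / (2TP + FP + FN)."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def micro_f1(counts: Counts) -> float:
    """Pool counts over all categories, then compute one F1."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1_from_counts(tp, fp, fn)


def macro_f1(counts: Counts) -> float:
    """Compute F1 per category, then take the unweighted mean."""
    scores = [f1_from_counts(*c) for c in counts.values()]
    return sum(scores) / len(scores)


# Hypothetical counts: one frequent, well-handled category and one rare, hard one.
example_counts = {
    "hypertension": (90, 10, 5),
    "smoker": (20, 5, 15),
}
micro = micro_f1(example_counts)
macro = macro_f1(example_counts)
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
```

    With these counts the micro average is higher than the macro average because the frequent category contributes most of the pooled counts.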

  19. The NLP toxin family in Phytophthora sojae includes rapidly evolving groups that lack necrosis-inducing activity.

    PubMed

    Dong, Suomeng; Kong, Guanghui; Qutob, Dinah; Yu, Xiaoli; Tang, Junli; Kang, Jixiong; Dai, Tingting; Wang, Hai; Gijzen, Mark; Wang, Yuanchao

    2012-07-01

    Necrosis- and ethylene-inducing-like proteins (NLP) are widely distributed in eukaryotic and prokaryotic plant pathogens and are considered to be important virulence factors. We identified, in total, 70 potential Phytophthora sojae NLP genes but 37 were designated as pseudogenes. Sequence alignment of the remaining 33 NLP delineated six groups. Three of these groups include proteins with an intact heptapeptide (Gly-His-Arg-His-Asp-Trp-Glu) motif, which is important for necrosis-inducing activity, whereas the motif is not conserved in the other groups. In total, 19 representative NLP genes were assessed for necrosis-inducing activity by heterologous expression in Nicotiana benthamiana. Surprisingly, only eight genes triggered cell death. The expression of the NLP genes in P. sojae was examined, distinguishing 20 expressed and 13 nonexpressed NLP genes. Real-time reverse-transcriptase polymerase chain reaction results indicate that most NLP are highly expressed during cyst germination and infection stages. Amino acid substitution ratios (Ka/Ks) of 33 NLP sequences from four different P. sojae strains resulted in identification of positive selection sites in a distinct NLP group. Overall, our study indicates that expansion and pseudogenization of the P. sojae NLP family results from an ongoing birth-and-death process, and that varying patterns of expression, necrosis-inducing activity, and positive selection suggest that NLP have diversified in function.

  20. Robo-Sensei's NLP-Based Error Detection and Feedback Generation

    ERIC Educational Resources Information Center

    Nagata, Noriko

    2009-01-01

    This paper presents a new version of Robo-Sensei's NLP (Natural Language Processing) system which updates the version currently available as the software package "ROBO-SENSEI: Personal Japanese Tutor" (Nagata, 2004). Robo-Sensei's NLP system includes a lexicon, a morphological generator, a word segmentor, a morphological parser, a syntactic…

  1. Mining peripheral arterial disease cases from narrative clinical notes using natural language processing.

    PubMed

    Afzal, Naveed; Sohn, Sunghwan; Abram, Sara; Scott, Christopher G; Chaudhry, Rajeev; Liu, Hongfang; Kullo, Iftikhar J; Arruda-Olson, Adelaide M

    2017-06-01

    Lower extremity peripheral arterial disease (PAD) is highly prevalent and affects millions of individuals worldwide. We developed a natural language processing (NLP) system for automated ascertainment of PAD cases from clinical narrative notes and compared the performance of the NLP algorithm with billing code algorithms, using ankle-brachial index test results as the gold standard. We compared the performance of the NLP algorithm to (1) results of gold standard ankle-brachial index; (2) previously validated algorithms based on relevant International Classification of Diseases, Ninth Revision diagnostic codes (simple model); and (3) a combination of International Classification of Diseases, Ninth Revision codes with procedural codes (full model). A dataset of 1569 patients with PAD and controls was randomly divided into training (n = 935) and testing (n = 634) subsets. We iteratively refined the NLP algorithm in the training set including narrative note sections, note types, and service types, to maximize its accuracy. In the testing dataset, when compared with both simple and full models, the NLP algorithm had better accuracy (NLP, 91.8%; full model, 81.8%; simple model, 83%; P < .001), positive predictive value (NLP, 92.9%; full model, 74.3%; simple model, 79.9%; P < .001), and specificity (NLP, 92.5%; full model, 64.2%; simple model, 75.9%; P < .001). A knowledge-driven NLP algorithm for automatic ascertainment of PAD cases from clinical notes had greater accuracy than billing code algorithms. Our findings highlight the potential of NLP tools for rapid and efficient ascertainment of PAD cases from electronic health records to facilitate clinical investigation and eventually improve care by clinical decision support. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  2. HTP-NLP: A New NLP System for High Throughput Phenotyping.

    PubMed

    Schlegel, Daniel R; Crowner, Chris; Lehoullier, Frank; Elkin, Peter L

    2017-01-01

    Secondary use of clinical data for research requires a method to quickly process the data so that researchers can quickly extract cohorts. We present two advances in the High Throughput Phenotyping NLP system which support the aim of truly high throughput processing of clinical data, inspired by a characterization of the linguistic properties of such data. Semantic indexing to store and generalize partially-processed results and the use of compositional expressions for ungrammatical text are discussed, along with a set of initial timing results for the system.

  3. Using Natural Language Processing to Extract Abnormal Results From Cancer Screening Reports.

    PubMed

    Moore, Carlton R; Farrag, Ashraf; Ashkin, Evan

    2017-09-01

    Numerous studies show that follow-up of abnormal cancer screening results, such as mammography and Papanicolaou (Pap) smears, is frequently not performed in a timely manner. A contributing factor is that abnormal results may go unrecognized because they are buried in free-text documents in electronic medical records (EMRs), and, as a result, patients are lost to follow-up. By identifying abnormal results from free-text reports in EMRs and generating alerts to clinicians, natural language processing (NLP) technology has the potential for improving patient care. The goal of the current study was to evaluate the performance of NLP software for extracting abnormal results from free-text mammography and Pap smear reports stored in an EMR. A sample of 421 and 500 free-text mammography and Pap reports, respectively, were manually reviewed by a physician, and the results were categorized for each report. We tested the performance of NLP to extract results from the reports. The 2 assessments (criterion standard versus NLP) were compared to determine the precision, recall, and accuracy of NLP. When NLP was compared with manual review for mammography reports, the results were as follows: precision, 98% (96%-99%); recall, 100% (98%-100%); and accuracy, 98% (96%-99%). For Pap smear reports, the precision, recall, and accuracy of NLP were all 100%. Our study developed NLP models that accurately extract abnormal results from mammography and Pap smear reports. Plans include using NLP technology to generate real-time alerts and reminders for providers to facilitate timely follow-up of abnormal results.

  4. Neurolinguistic Programming in the Context of Group Counseling.

    ERIC Educational Resources Information Center

    Childers, John H. Jr.; Saltmarsh, Robert E.

    1986-01-01

    Describes neurolinguistic programming (NLP) in the context of group counseling. NLP is a model of communication that focuses on verbal and nonverbal patterns of behaviors as well as on the structures and processes of human subjectivity. Five stages of group development are described, and specific NLP techniques appropriate to the various stages…

  5. Network Analysis with Stochastic Grammars

    DTIC Science & Technology

    2015-09-17

    Language Processing (NLP) domain SCFG...sentence into starting symbol. Figure 2 is an NLP part-of-speech example modified from [38] of an SCFG production rule set that reads a limited set of...English sentences for the purpose of determining grammatical validity and meaning through part-of-speech assignment. In the NLP domain, each word is in

  6. Challenges in adapting existing clinical natural language processing systems to multiple, diverse health care settings.

    PubMed

    Carrell, David S; Schoen, Robert E; Leffler, Daniel A; Morris, Michele; Rose, Sherri; Baer, Andrew; Crockett, Seth D; Gourevitch, Rebecca A; Dean, Katie M; Mehrotra, Ateev

    2017-09-01

    Widespread application of clinical natural language processing (NLP) systems requires taking existing NLP systems and adapting them to diverse and heterogeneous settings. We describe the challenges faced and lessons learned in adapting an existing NLP system for measuring colonoscopy quality. We used colonoscopy and pathology reports from 4 settings during 2013-2015, which varied by geographic location, practice type, compensation structure, and electronic health record. Though successful, adaptation required considerably more time and effort than anticipated. Typical NLP challenges in assembling corpora, diverse report structures, and idiosyncratic linguistic content were greatly magnified. Strategies for addressing adaptation challenges include assessing site-specific diversity, setting realistic timelines, leveraging local electronic health record expertise, and undertaking extensive iterative development. More research is needed on how to make it easier to adapt NLP systems to new clinical settings. A key challenge in widespread application of NLP is adapting existing systems to new clinical settings. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  7. Soliton formation from a noise-like pulse during extreme events in a fibre ring laser

    NASA Astrophysics Data System (ADS)

    Pottiez, O.; Ibarra-Villalon, H. E.; Bracamontes-Rodriguez, Y.; Minguela-Gallardo, J. A.; Garcia-Sanchez, E.; Lauterio-Cruz, J. P.; Hernandez-Garcia, J. C.; Bello-Jimenez, M.; Kuzin, E. A.

    2017-10-01

    We study experimentally the interactions between soliton and noise-like pulse (NLP) components in a mode-locked fibre ring laser operating in a hybrid soliton-NLP regime. For proper polarization adjustments, one NLP and multiple packets of solitons coexist in the cavity, at 1530 nm and 1558 nm, respectively. By examining time-domain sequences measured using a 16 GHz real-time oscilloscope, we unveil the process of soliton genesis: they are produced during extreme-intensity episodes affecting the NLP. These extreme events can emerge sporadically, appear in small groups or even form quasi-periodic sequences. Once formed, the wavelength-shifted soliton packet drifts away from the NLP in the dispersive cavity, and eventually vanishes after a variable lifetime. Evidence of the inverse process, through which NLP formation is occasionally seeded by an extreme-intensity event affecting a bunch of solitons, is also provided. The quasi-stationary dynamics described here constitutes an impressive illustration of the connections and interactions between NLPs, extreme events and solitons in passively mode-locked fibre lasers.

  8. Techniques for Automatically Generating Biographical Summaries from News Articles

    DTIC Science & Technology

    2007-09-01

    non-trivial because of the many NLP areas that must be used to efficiently extract the relevant facts. Yet, no study has been done to determine how...also non-trivial because of the many NLP areas that must be used to efficiently extract the relevant facts. Yet, no study has been done to determine...AI) research is called Natural Language Processing ( NLP ). NLP seeks to find ways for computers to read and write documents in as human a way as

  9. Natural language processing to ascertain two key variables from operative reports in ophthalmology.

    PubMed

    Liu, Liyan; Shorstein, Neal H; Amsden, Laura B; Herrinton, Lisa J

    2017-04-01

    Antibiotic prophylaxis is critical to ophthalmology and other surgical specialties. We performed natural language processing (NLP) of 743 838 operative notes recorded for 315 246 surgeries to ascertain two variables needed to study the comparative effectiveness of antibiotic prophylaxis in cataract surgery. The first key variable was an exposure variable, intracameral antibiotic injection. The second was an intraoperative complication, posterior capsular rupture (PCR), which functioned as a potential confounder. To help other researchers use NLP in their settings, we describe our NLP protocol and lessons learned. For each of the two variables, we used SAS Text Miner and other SAS text-processing modules with a training set of 10 000 (1.3%) operative notes to develop a lexicon. The lexica identified misspellings, abbreviations, and negations, and linked words into concepts (e.g. "antibiotic" linked with "injection"). We confirmed the NLP tools by iteratively obtaining random samples of 2000 (0.3%) notes, with replacement. The NLP tools identified approximately 60 000 intracameral antibiotic injections and 3500 cases of PCR. The positive and negative predictive values for intracameral antibiotic injection exceeded 99%. For the intraoperative complication, they exceeded 94%. NLP was a valid and feasible method for obtaining critical variables needed for a research study of surgical safety. These NLP tools were intended for use in the study sample. Use with external datasets or future datasets in our own setting would require further testing. Copyright © 2017 John Wiley & Sons, Ltd.

  10. Natural Language Processing to Ascertain Two Key Variables from Operative Reports in Ophthalmology

    PubMed Central

    Liu, Liyan; Shorstein, Neal H.; Amsden, Laura B; Herrinton, Lisa J.

    2016-01-01

    Purpose: Antibiotic prophylaxis is critical to ophthalmology and other surgical specialties. We performed natural language processing (NLP) of 743,838 operative notes recorded for 315,246 surgeries to ascertain two variables needed to study the comparative effectiveness of antibiotic prophylaxis in cataract surgery. The first key variable was an exposure variable, intracameral antibiotic injection. The second was an intraoperative complication, posterior capsular rupture (PCR), that functioned as a potential confounder. To help other researchers use NLP in their settings, we describe our NLP protocol and lessons learned. Methods: For each of the two variables, we used SAS Text Miner and other SAS text-processing modules with a training set of 10,000 (1.3%) operative notes to develop a lexicon. The lexica identified misspellings, abbreviations, and negations, and linked words into concepts (e.g., “antibiotic” linked with “injection”). We confirmed the NLP tools by iteratively obtaining random samples of 2,000 (0.3%) notes, with replacement. Results: The NLP tools identified approximately 60,000 intracameral antibiotic injections and 3,500 cases of PCR. The positive and negative predictive values for intracameral antibiotic injection exceeded 99%. For the intraoperative complication, they exceeded 94%. Conclusion: NLP was a valid and feasible method for obtaining critical variables needed for a research study of surgical safety. These NLP tools were intended for use in the study sample. Use with external datasets or future datasets in our own setting would require further testing. PMID:28052483

  11. The use of natural language processing on pediatric diagnostic radiology reports in the electronic health record to identify deep venous thrombosis in children.

    PubMed

    Gálvez, Jorge A; Pappas, Janine M; Ahumada, Luis; Martin, John N; Simpao, Allan F; Rehman, Mohamed A; Witmer, Char

    2017-10-01

    Venous thromboembolism (VTE) is a potentially life-threatening condition that includes both deep vein thrombosis (DVT) and pulmonary embolism. We sought to improve detection and reporting of children with a new diagnosis of VTE by applying natural language processing (NLP) tools to radiologists' reports. We validated the performance of an NLP tool, Reveal NLP (Health Fidelity Inc, San Mateo, CA), and an inference rules engine in identifying reports with deep venous thrombosis using a curated set of ultrasound reports. We then configured the NLP tool to scan all available radiology reports on a daily basis for studies that met criteria for VTE between July 1, 2015, and March 31, 2016. The NLP tool and inference rules engine correctly identified 140 out of 144 reports with positive DVT findings and 98 out of 106 negative reports in the validation set. The tool's sensitivity was 97.2% (95% CI 93-99.2%) and its specificity was 92.5% (95% CI 85.7-96.7%). Subsequently, the NLP tool and inference rules engine processed 6373 radiology reports from 3371 hospital encounters. The NLP tool and inference rules engine identified 178 positive reports and 3193 negative reports with a sensitivity of 82.9% (95% CI 74.8-89.2) and specificity of 97.5% (95% CI 96.9-98). The system functions well as a safety net to screen patients for HA-VTE on a daily basis and offers value as an automated, redundant system. To our knowledge, this is the first pediatric study to apply NLP technology in a prospective manner for HA-VTE identification.
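
    As a rough check on the validation figures in this abstract, the sketch below recomputes sensitivity and specificity from the reported counts (140 of 144 positive reports and 98 of 106 negative reports correctly identified). The function and variable names are illustrative only and are not part of the study's tooling.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of truly positive reports that the tool flagged."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of truly negative reports that the tool left unflagged."""
    return true_neg / (true_neg + false_pos)


# Counts taken from the abstract's validation set.
positives_total, positives_found = 144, 140
negatives_total, negatives_found = 106, 98

sens = sensitivity(positives_found, positives_total - positives_found)
spec = specificity(negatives_found, negatives_total - negatives_found)

print(f"sensitivity = {sens:.1%}")  # 97.2%
print(f"specificity = {spec:.1%}")  # 92.5%
```

    Both values agree with the point estimates reported for the validation set; the confidence intervals are not recomputed here.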

  12. The Use of Systemic-Functional Linguistics in Automated Text Mining

    DTIC Science & Technology

    2009-03-01

    what degree two or more documents are similar in terms of their meaning. Simply put, such a cognitive model aims to link the physical manifestation...These features, both in terms of frequency and their chaining across a text, were taken as salient stylistic features that had a direct relationship to...because SFL attempts to model these cognitive processes, this has the potential to improve NLP tasks by making them more ’human-like’. Secondly

  13. A neuropeptide-mediated stretch response links muscle contraction to changes in neurotransmitter release

    PubMed Central

    Hu, Zhitao; Pym, Edward C.G.; Babu, Kavita; Vashlishan Murray, Amy B.; Kaplan, Joshua M.

    2011-01-01

    Although C. elegans has been utilized extensively to study synapse formation and function, relatively little is known about synaptic plasticity in C. elegans. We show that a brief treatment with the cholinesterase inhibitor aldicarb induces a form of presynaptic potentiation whereby ACh release at neuromuscular junctions (NMJs) is doubled. Aldicarb-induced potentiation was eliminated by mutations that block processing of pro-neuropeptides, by mutations inactivating a single pro-neuropeptide (NLP-12), and by those inactivating an NLP-12 receptor (CKR-2). NLP-12 expression is limited to a single stretch-activated neuron, DVA. Analysis of a YFP-tagged NLP-12 suggests that aldicarb stimulates DVA secretion of NLP-12. Mutations disrupting the DVA mechanoreceptor (TRP-4) decreased aldicarb-induced NLP-12 secretion and blocked aldicarb-induced synaptic potentiation. Mutants lacking NLP-12 or CKR-2 have decreased locomotion rates. Collectively, these results suggest that NLP-12 mediates a mechanosensory feedback loop that couples muscle contraction to changes in presynaptic release, thereby providing a mechanism for proprioceptive control of locomotion. PMID:21745640

  14. Building a Natural Language Processing Tool to Identify Patients With High Clinical Suspicion for Kawasaki Disease from Emergency Department Notes.

    PubMed

    Doan, Son; Maehara, Cleo K; Chaparro, Juan D; Lu, Sisi; Liu, Ruiling; Graham, Amanda; Berry, Erika; Hsu, Chun-Nan; Kanegaye, John T; Lloyd, David D; Ohno-Machado, Lucila; Burns, Jane C; Tremoulet, Adriana H

    2016-05-01

    Delayed diagnosis of Kawasaki disease (KD) may lead to serious cardiac complications. We sought to create and test the performance of a natural language processing (NLP) tool, the KD-NLP, in the identification of emergency department (ED) patients for whom the diagnosis of KD should be considered. We developed an NLP tool that recognizes the KD diagnostic criteria based on standard clinical terms and medical word usage using 22 pediatric ED notes augmented by Unified Medical Language System vocabulary. With high suspicion for KD defined as fever and three or more KD clinical signs, KD-NLP was applied to 253 ED notes from children ultimately diagnosed with either KD or another febrile illness. We evaluated KD-NLP performance against ED notes manually reviewed by clinicians and compared the results to a simple keyword search. KD-NLP identified high-suspicion patients with a sensitivity of 93.6% and specificity of 77.5% compared to notes manually reviewed by clinicians. The tool outperformed a simple keyword search (sensitivity = 41.0%; specificity = 76.3%). KD-NLP showed comparable performance to clinician manual chart review for identification of pediatric ED patients with a high suspicion for KD. This tool could be incorporated into the ED electronic health record system to alert providers to consider the diagnosis of KD. KD-NLP could serve as a model for decision support for other conditions in the ED. © 2016 by the Society for Academic Emergency Medicine.
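
    The high-suspicion definition used in this abstract (fever plus three or more KD clinical signs) can be sketched as a simple rule. This is a minimal illustration, not the KD-NLP implementation: the sign labels and the function name are assumptions made for the example, and the step that extracts findings from note text is omitted.

```python
# Illustrative labels for KD clinical signs. These names are an assumption
# made for this sketch and do not come from the abstract.
KD_SIGNS = {
    "rash",
    "conjunctival_injection",
    "oral_changes",
    "extremity_changes",
    "cervical_lymphadenopathy",
}


def is_high_suspicion(has_fever: bool, findings: set) -> bool:
    """Apply the abstract's rule: fever plus three or more KD clinical signs."""
    signs_present = findings & KD_SIGNS  # ignore findings that are not KD signs
    return has_fever and len(signs_present) >= 3


note_findings = {"rash", "oral_changes", "extremity_changes", "cough"}
print(is_high_suspicion(True, note_findings))   # True: fever and 3 signs
print(is_high_suspicion(False, note_findings))  # False: no fever
print(is_high_suspicion(True, {"rash"}))        # False: only 1 sign
```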

  15. Natural Language Processing as a Discipline at LLNL

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Firpo, M A

    The field of Natural Language Processing (NLP) is described as it applies to the needs of LLNL in handling free-text. The state of the practice is outlined with the emphasis placed on two specific aspects of NLP: Information Extraction and Discourse Integration. A brief description is included of the NLP applications currently being used at LLNL. A gap analysis provides a look at where the technology needs work in order to meet the needs of LLNL. Finally, recommendations are made to meet these needs.

  16. A natural language processing program effectively extracts key pathologic findings from radical prostatectomy reports.

    PubMed

    Kim, Brian J; Merchant, Madhur; Zheng, Chengyi; Thomas, Anil A; Contreras, Richard; Jacobsen, Steven J; Chien, Gary W

    2014-12-01

    Natural language processing (NLP) software programs have been widely developed to transform complex free text into simplified organized data. Potential applications in the field of medicine include automated report summaries, physician alerts, patient repositories, electronic medical record (EMR) billing, and quality metric reports. Despite these prospects and the recent widespread adoption of EMR, NLP has been relatively underutilized. The objective of this study was to evaluate the performance of an internally developed NLP program in extracting select pathologic findings from radical prostatectomy specimen reports in the EMR. An NLP program was generated by a software engineer to extract key variables from prostatectomy reports in the EMR within our healthcare system, which included the TNM stage, Gleason grade, presence of a tertiary Gleason pattern, histologic subtype, size of dominant tumor nodule, seminal vesicle invasion (SVI), perineural invasion (PNI), angiolymphatic invasion (ALI), extracapsular extension (ECE), and surgical margin status (SMS). The program was validated by comparing NLP results to a gold standard compiled by two blinded manual reviewers for 100 random pathology reports. NLP demonstrated 100% accuracy for identifying the Gleason grade, presence of a tertiary Gleason pattern, SVI, ALI, and ECE. It also demonstrated near-perfect accuracy for extracting histologic subtype (99.0%), PNI (98.9%), TNM stage (98.0%), SMS (97.0%), and dominant tumor size (95.7%). The overall accuracy of NLP was 98.7%. NLP generated a result in <1 second, whereas the manual reviewers averaged 3.2 minutes per report. This novel program demonstrated high accuracy and efficiency identifying key pathologic details from the prostatectomy report within an EMR system. NLP has the potential to assist urologists by summarizing and highlighting relevant information from verbose pathology reports. It may also facilitate future urologic research through the rapid and automated creation of large databases.

  17. Natural Language Processing for Asthma Ascertainment in Different Practice Settings.

    PubMed

    Wi, Chung-Il; Sohn, Sunghwan; Ali, Mir; Krusemark, Elizabeth; Ryu, Euijung; Liu, Hongfang; Juhn, Young J

    We developed and validated NLP-PAC, a natural language processing (NLP) algorithm based on predetermined asthma criteria (PAC) for asthma ascertainment using electronic health records at Mayo Clinic. We aimed to adapt NLP-PAC to a different health care setting, Sanford Children Hospital, by assessing its external validity. The study was designed as a retrospective cohort study that used a random sample of the 2011-2012 Sanford Birth cohort (n = 595). Manual chart review was performed on the cohort for asthma ascertainment on the basis of the PAC. We then used half of the cohort as a training cohort (n = 298) and the other half as a blind test cohort to evaluate the adapted NLP-PAC algorithm. Association of known asthma-related risk factors with the Sanford-NLP algorithm-driven asthma ascertainment was tested. Among the eligible test cohort (n = 297), 160 (53%) were males, 268 (90%) were white, and the median age was 2.3 years (range, 1.5-3.1 years). NLP-PAC, after adaptation, and the human abstractor identified 74 (25%) and 72 (24%) subjects, respectively, with 66 subjects identified by both approaches. Sensitivity, specificity, positive predictive value, and negative predictive value for the NLP algorithm in predicting asthma status were 92%, 96%, 89%, and 97%, respectively. The known risk factors for asthma identified by NLP (eg, smoking history) were similar to the ones identified by manual chart review. Successful implementation of NLP-PAC for asthma ascertainment in 2 different practice settings demonstrates the feasibility of automated asthma ascertainment leveraging electronic health record data with a potential to enable large-scale, multisite asthma studies to improve asthma care and research. Copyright © 2017 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.
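
    The four reported metrics can be reproduced from the counts given in this abstract. The sketch below rebuilds the 2x2 table for the test cohort, treating the human abstractor as the reference standard (an assumption consistent with the study design described above); variable names are illustrative.

```python
n_total = 297        # eligible test cohort
nlp_positive = 74    # subjects identified by the adapted NLP-PAC
human_positive = 72  # subjects identified by the human abstractor (reference)
both_positive = 66   # subjects identified by both approaches

# Rebuild the 2x2 table from the marginal counts and the overlap.
tp = both_positive
fp = nlp_positive - both_positive
fn = human_positive - both_positive
tn = n_total - tp - fp - fn

metrics = {
    "sensitivity": tp / (tp + fn),
    "specificity": tn / (tn + fp),
    "ppv": tp / (tp + fp),
    "npv": tn / (tn + fn),
}
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
```

    Rounded to whole percentages, the results (92%, 96%, 89%, 97%) match the values stated in the abstract.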

  18. Optimizing graph-based patterns to extract biomedical events from the literature

    PubMed Central

    2015-01-01

    In BioNLP-ST 2013: We participated in the BioNLP 2013 shared tasks on event extraction. Our extraction method is based on the search for an approximate subgraph isomorphism between key context dependencies of events and graphs of input sentences. Our system was able to address both the GENIA (GE) task focusing on 13 molecular biology related event types and the Cancer Genetics (CG) task targeting a challenging group of 40 cancer biology related event types with varying arguments concerning 18 kinds of biological entities. In addition to adapting our system to the two tasks, we also attempted to integrate semantics into the graph matching scheme using a distributional similarity model for more events, and evaluated the event extraction impact of using paths of all possible lengths as key context dependencies beyond using only the shortest paths in our system. We achieved a 46.38% F-score in the CG task (ranking 3rd) and a 48.93% F-score in the GE task (ranking 4th). After BioNLP-ST 2013: We explored three ways to further extend our event extraction system in our previously published work: (1) We allowed non-essential nodes to be skipped, and incorporated a node skipping penalty into the subgraph distance function of our approximate subgraph matching algorithm. (2) Instead of assigning a unified subgraph distance threshold to all patterns of an event type, we learned a customized threshold for each pattern. (3) We implemented the well-known Empirical Risk Minimization (ERM) principle to optimize the event pattern set by balancing prediction errors on training data against regularization. When evaluated on the official GE task test data, these extensions help to improve the extraction precision from 62% to 65%. However, the overall F-score stays equivalent to the previous performance due to a 1% drop in recall. PMID:26551594
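
    The closing claim, that a precision gain can leave the F-score essentially unchanged when recall drops slightly, follows from the F-score being the harmonic mean of precision and recall. The sketch below illustrates this under two loud assumptions that the abstract does not confirm: that the 48.93% GE F-score corresponds to the 62% precision baseline, and that "a 1% drop in recall" means one percentage point. The implied recall is derived here, not reported by the authors.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def recall_from_f(f: float, precision: float) -> float:
    """Invert F = 2PR / (P + R) to recover R from F and P."""
    return f * precision / (2 * precision - f)


# Assumption: the 48.93% F-score and the 62% precision describe the same run.
baseline_f, baseline_p = 0.4893, 0.62
baseline_r = recall_from_f(baseline_f, baseline_p)  # about 0.40

# Assumption: recall falls by one percentage point after the extensions.
extended_p = 0.65
extended_r = baseline_r - 0.01
extended_f = f_score(extended_p, extended_r)  # about 0.49

print(f"implied baseline recall: {baseline_r:.3f}")
print(f"F-score after extensions: {extended_f:.4f}")
print(f"change in F-score: {extended_f - baseline_f:+.4f}")
```

    Under these assumptions the F-score moves by only a fraction of a percentage point, which is consistent with the abstract's statement that overall performance stays roughly equivalent.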

  19. Building an Evaluation Scale using Item Response Theory.

    PubMed

    Lalor, John P; Wu, Hao; Yu, Hong

    2016-11-01

    Evaluation of NLP methods requires testing against a previously vetted gold-standard test set and reporting standard metrics (accuracy/precision/recall/F1). The current assumption is that all items in a given test set are equal with regards to difficulty and discriminating power. We propose Item Response Theory (IRT) from psychometrics as an alternative means for gold-standard test-set generation and NLP system evaluation. IRT is able to describe characteristics of individual items - their difficulty and discriminating power - and can account for these characteristics in its estimation of human intelligence or ability for an NLP task. In this paper, we demonstrate IRT by generating a gold-standard test set for Recognizing Textual Entailment. By collecting a large number of human responses and fitting our IRT model, we show that our IRT model compares NLP systems with the performance in a human population and is able to provide more insight into system performance than standard evaluation metrics. We show that a high accuracy score does not always imply a high IRT score, which depends on the item characteristics and the response pattern.
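
    To illustrate why accuracy and an IRT-based score can disagree, the sketch below uses a two-parameter logistic (2PL) model, one common IRT formulation. The abstract does not say which IRT model the authors fitted, so the 2PL choice, the item parameters, and the grid-search estimator are all assumptions made for illustration.

```python
import math


def p_correct(theta: float, a: float, b: float) -> float:
    """2PL item response function: probability of a correct answer given
    ability theta, item discrimination a, and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta, items, responses):
    """Log-likelihood of a 0/1 response pattern at ability theta."""
    total = 0.0
    for (a, b), r in zip(items, responses):
        p = p_correct(theta, a, b)
        total += math.log(p) if r else math.log(1.0 - p)
    return total


def estimate_ability(items, responses):
    """Maximum-likelihood ability estimate by grid search over [-4, 4]."""
    grid = [x / 100.0 for x in range(-400, 401)]
    return max(grid, key=lambda t: log_likelihood(t, items, responses))


# Hypothetical items as (discrimination, difficulty): easy, medium, hard.
ITEMS = [(0.5, -1.0), (1.0, 0.0), (2.0, 1.0)]

easy_pattern = [1, 1, 0]  # misses the hard, highly discriminating item
hard_pattern = [0, 1, 1]  # misses the easy, weakly discriminating item

print(sum(easy_pattern) / 3, estimate_ability(ITEMS, easy_pattern))
print(sum(hard_pattern) / 3, estimate_ability(ITEMS, hard_pattern))
```

    Both response patterns have the same accuracy (two of three items correct), yet the estimated ability is higher for the pattern that answers the hard, discriminating item correctly. This is the sense in which the score depends on item characteristics and the response pattern.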

  20. Building an Evaluation Scale using Item Response Theory

    PubMed Central

    Lalor, John P.; Wu, Hao; Yu, Hong

    2016-01-01

    Evaluation of NLP methods requires testing against a previously vetted gold-standard test set and reporting standard metrics (accuracy/precision/recall/F1). The current assumption is that all items in a given test set are equal with regards to difficulty and discriminating power. We propose Item Response Theory (IRT) from psychometrics as an alternative means for gold-standard test-set generation and NLP system evaluation. IRT is able to describe characteristics of individual items - their difficulty and discriminating power - and can account for these characteristics in its estimation of human intelligence or ability for an NLP task. In this paper, we demonstrate IRT by generating a gold-standard test set for Recognizing Textual Entailment. By collecting a large number of human responses and fitting our IRT model, we show that our IRT model compares NLP systems with the performance in a human population and is able to provide more insight into system performance than standard evaluation metrics. We show that a high accuracy score does not always imply a high IRT score, which depends on the item characteristics and the response pattern. PMID:28004039

  1. Screening pregnant women for suicidal behavior in electronic medical records: diagnostic codes vs. clinical notes processed by natural language processing.

    PubMed

    Zhong, Qiu-Yue; Karlson, Elizabeth W; Gelaye, Bizu; Finan, Sean; Avillach, Paul; Smoller, Jordan W; Cai, Tianxi; Williams, Michelle A

    2018-05-29

    We examined the comparative performance of structured, diagnostic codes vs. natural language processing (NLP) of unstructured text for screening suicidal behavior among pregnant women in electronic medical records (EMRs). Women aged 10-64 years with at least one diagnostic code related to pregnancy or delivery (N = 275,843) from Partners HealthCare were included as our "datamart." Diagnostic codes related to suicidal behavior were applied to the datamart to screen women for suicidal behavior. Among women without any diagnostic codes related to suicidal behavior (n = 273,410), 5880 women were randomly sampled, of whom 1120 had at least one mention of terms related to suicidal behavior in clinical notes. NLP was then used to process clinical notes for the 1120 women. Chart reviews were performed for subsamples of women. Using diagnostic codes, 196 pregnant women were screened positive for suicidal behavior, among whom 149 (76%) had confirmed suicidal behavior by chart review. Using NLP among those without diagnostic codes, 486 pregnant women were screened positive for suicidal behavior, among whom 146 (30%) had confirmed suicidal behavior by chart review. The use of NLP substantially improves the sensitivity of screening suicidal behavior in EMRs. However, the prevalence of confirmed suicidal behavior was lower among women who did not have diagnostic codes for suicidal behavior but screened positive by NLP. NLP should be used together with diagnostic codes for future EMR-based phenotyping studies for suicidal behavior.

  2. Development and Validation of a Natural Language Processing Tool to Identify Patients Treated for Pneumonia across VA Emergency Departments.

    PubMed

    Jones, B E; South, B R; Shao, Y; Lu, C C; Leng, J; Sauer, B C; Gundlapalli, A V; Samore, M H; Zeng, Q

    2018-01-01

    Identifying pneumonia using diagnosis codes alone may be insufficient for research on clinical decision making. Natural language processing (NLP) may enable the inclusion of cases missed by diagnosis codes. This article (1) develops an NLP tool that identifies the clinical assertion of pneumonia from physician emergency department (ED) notes, and (2) compares classification methods using diagnosis codes versus NLP against a gold standard of manual chart review to identify patients initially treated for pneumonia. Among a national population of ED visits occurring between 2006 and 2012 across the Veterans Affairs health system, we extracted 811 physician documents containing search terms for pneumonia for training, and 100 random documents for validation. Two reviewers annotated span- and document-level classifications of the clinical assertion of pneumonia. An NLP tool using a support vector machine was trained on the enriched documents. We extracted diagnosis codes assigned in the ED and upon hospital discharge and calculated performance characteristics for diagnosis codes, NLP, and NLP plus diagnosis codes against manual review in training and validation sets. Among the training documents, 51% contained clinical assertions of pneumonia; in the validation set, 9% were classified with pneumonia, of which 100% contained pneumonia search terms. After enriching with search terms, the NLP system alone demonstrated a recall/sensitivity of 0.72 (training) and 0.55 (validation), and a precision/positive predictive value (PPV) of 0.89 (training) and 0.71 (validation). ED-assigned diagnostic codes demonstrated lower recall/sensitivity (0.48 and 0.44) but higher precision/PPV (0.95 in training, 1.0 in validation); the NLP system identified more "possible-treated" cases than diagnostic coding. An approach combining NLP and ED-assigned diagnostic coding classification achieved the best performance (sensitivity 0.89 and PPV 0.80). System-wide application of NLP to clinical text can increase capture of initial diagnostic hypotheses, an important inclusion when studying diagnosis and clinical decision-making under uncertainty. Schattauer GmbH Stuttgart.
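
    The trade-off this abstract describes, where diagnostic codes are precise but miss cases, NLP recovers more cases at lower precision, and a combination does best on recall, can be illustrated with a toy example. The abstract does not state how the two classifiers were combined; the logical OR used below is an assumption, and the ten labels are made-up data rather than study results.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for parallel lists of 0/1 labels."""
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    fn = sum(1 for p, g in zip(predicted, gold) if not p and g)
    return tp / (tp + fp), tp / (tp + fn)


# Toy data for ten documents (hypothetical, not from the study).
gold = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]   # manual chart review
nlp = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]    # NLP classifier output
codes = [0, 0, 1, 1, 0, 0, 0, 0, 0, 0]  # diagnostic codes

# Assumed combination rule: flag a document if either source flags it.
combined = [int(n or c) for n, c in zip(nlp, codes)]

for name, predicted in [("nlp", nlp), ("codes", codes), ("combined", combined)]:
    precision, recall = precision_recall(predicted, gold)
    print(f"{name}: precision={precision:.2f} recall={recall:.2f}")
```

    In this toy set the OR rule raises recall above either source alone, while precision falls between the two, which mirrors the qualitative pattern reported above.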

  3. Computer Assisted Reading in German as a Foreign Language, Developing and Testing an NLP-Based Application

    ERIC Educational Resources Information Center

    Wood, Peter

    2011-01-01

    "QuickAssist," the program presented in this paper, uses natural language processing (NLP) technologies. It places a range of NLP tools at the disposal of learners, intended to enable them to independently read and comprehend a German text of their choice while they extend their vocabulary, learn about different uses of particular words,…

  4. Automatically Detecting Failures in Natural Language Processing Tools for Online Community Text.

    PubMed

    Park, Albert; Hartzler, Andrea L; Huh, Jina; McDonald, David W; Pratt, Wanda

    2015-08-31

    The prevalence and value of patient-generated health text are increasing, but processing such text remains problematic. Although existing biomedical natural language processing (NLP) tools are appealing, most were developed to process clinician- or researcher-generated text, such as clinical notes or journal articles. In addition to being constructed for different types of text, other challenges of using existing NLP include constantly changing technologies, source vocabularies, and characteristics of text. These continuously evolving challenges warrant the need for applying low-cost systematic assessment. However, the primarily accepted evaluation method in NLP, manual annotation, requires tremendous effort and time. The primary objective of this study is to explore an alternative approach: using low-cost, automated methods to detect failures (eg, incorrect boundaries, missed terms, mismapped concepts) when processing patient-generated text with existing biomedical NLP tools. We first characterize common failures that NLP tools can make in processing online community text. We then demonstrate the feasibility of our automated approach in detecting these common failures using one of the most popular biomedical NLP tools, MetaMap. Using 9657 posts from an online cancer community, we explored our automated failure detection approach in two steps: (1) to characterize the failure types, we first manually reviewed MetaMap's commonly occurring failures, grouped the inaccurate mappings into failure types, and then identified causes of the failures through iterative rounds of manual review using open coding, and (2) to automatically detect these failure types, we then explored combinations of existing NLP techniques and dictionary-based matching for each failure cause. Finally, we manually evaluated the automatically detected failures. From our manual review, we characterized three types of failure: (1) boundary failures, (2) missed term failures, and (3) word ambiguity failures. Within these three failure types, we discovered 12 causes of inaccurate mappings of concepts. We used automated methods to detect almost half of MetaMap's 383,572 mappings as problematic. Word sense ambiguity failure was the most widely occurring, comprising 82.22% of failures. Boundary failure was the second most frequent, amounting to 15.90% of failures, while missed term failures were the least common, making up 1.88% of failures. The automated failure detection achieved precision, recall, accuracy, and F1 score of 83.00%, 92.57%, 88.17%, and 87.52%, respectively. We illustrate the challenges of processing patient-generated online health community text and characterize failures of NLP tools on this patient-generated health text, demonstrating the feasibility of our low-cost approach to automatically detect those failures. Our approach shows the potential for scalable and effective solutions to automatically assess the constantly evolving NLP tools and source vocabularies to process patient-generated text.
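
    The reported figures in this abstract are internally consistent, which a short check makes visible: the F1 score is the harmonic mean of the stated precision and recall, and the three failure-type shares sum to 100%. The sketch only re-derives numbers from the abstract; the dictionary keys and function name are illustrative.

```python
def f1(precision: float, recall: float) -> float:
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract.
reported_precision = 0.8300
reported_recall = 0.9257
reported_f1 = 0.8752

computed_f1 = f1(reported_precision, reported_recall)
print(f"computed F1 = {computed_f1:.2%}")  # 87.52%

# Share of detected failures by type, in percent, as reported.
failure_shares = {
    "word sense ambiguity": 82.22,
    "boundary": 15.90,
    "missed term": 1.88,
}
total_share = sum(failure_shares.values())
print(f"total share = {total_share:.2f}%")  # 100.00%
```

    Reported accuracy (88.17%) cannot be re-derived this way because it also depends on the number of true negatives, which the abstract does not give.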

  5. Using Natural Language Processing to Improve Efficiency of Manual Chart Abstraction in Research: The Case of Breast Cancer Recurrence

    PubMed Central

    Carrell, David S.; Halgrim, Scott; Tran, Diem-Thy; Buist, Diana S. M.; Chubak, Jessica; Chapman, Wendy W.; Savova, Guergana

    2014-01-01

    The increasing availability of electronic health records (EHRs) creates opportunities for automated extraction of information from clinical text. We hypothesized that natural language processing (NLP) could substantially reduce the burden of manual abstraction in studies examining outcomes, like cancer recurrence, that are documented in unstructured clinical text, such as progress notes, radiology reports, and pathology reports. We developed an NLP-based system using open-source software to process electronic clinical notes from 1995 to 2012 for women with early-stage incident breast cancers to identify whether and when recurrences were diagnosed. We developed and evaluated the system using clinical notes from 1,472 patients receiving EHR-documented care in an integrated health care system in the Pacific Northwest. A separate study provided the patient-level reference standard for recurrence status and date. The NLP-based system correctly identified 92% of recurrences and estimated diagnosis dates within 30 days for 88% of these. Specificity was 96%. The NLP-based system overlooked 5 of 65 recurrences, 4 because electronic documents were unavailable. The NLP-based system identified 5 other recurrences incorrectly classified as nonrecurrent in the reference standard. If used in similar cohorts, NLP could reduce by 90% the number of EHR charts abstracted to identify confirmed breast cancer recurrence cases at a rate comparable to traditional abstraction. PMID:24488511

  6. Hierarchical semantic structures for medical NLP.

    PubMed

    Taira, Ricky K; Arnold, Corey W

    2013-01-01

    We present a framework for building a medical natural language processing (NLP) system capable of deep understanding of clinical text reports. The framework helps developers understand how various NLP-related efforts and knowledge sources can be integrated. The aspects considered include: 1) computational issues dealing with defining layers of intermediate semantic structures to reduce the dimensionality of the NLP problem; 2) algorithmic issues in which we survey the NLP literature and discuss state-of-the-art procedures used to map between various levels of the hierarchy; and 3) implementation issues for software developers with available resources. The objective of this poster is to educate readers about the various levels of semantic representation (e.g., word-level concepts, ontological concepts, logical relations, logical frames, discourse structures, etc.). The poster presents an architecture in which diverse efforts and resources in medical NLP can be integrated in a principled way.

  7. The Promise of NLP and Speech Processing Technologies in Language Assessment

    ERIC Educational Resources Information Center

    Chapelle, Carol A.; Chung, Yoo-Ree

    2010-01-01

    Advances in natural language processing (NLP) and automatic speech recognition and processing technologies offer new opportunities for language testing. Despite their potential uses on a range of language test item types, relatively little work has been done in this area, and it is therefore not well understood by test developers, researchers or…

  8. MedEx/J: A One-Scan Simple and Fast NLP Tool for Japanese Clinical Texts.

    PubMed

    Aramaki, Eiji; Yano, Ken; Wakamiya, Shoko

    2017-01-01

    Because of the recent replacement of physical documents with electronic medical records (EMR), the importance of information processing in the medical field has increased. In light of this trend, we have been developing MedEx/J, which retrieves important Japanese language information from medical reports. MedEx/J executes two tasks simultaneously: (1) term extraction, and (2) positive and negative event classification. We designate this approach as a one-scan approach, providing simplicity of systems and reasonable accuracy. MedEx/J performance on the two tasks is described herein: (1) term extraction (F(β=1) = 0.87) and (2) positive-negative classification (F(β=1) = 0.63). This paper also discusses remaining issues in the medical natural language processing field.
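    The F scores in this abstract use β = 1, which is the harmonic mean of precision and recall. The following is a minimal sketch of the general F-beta formula; the precision and recall values in the usage line are hypothetical, since the abstract reports only the combined score.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F-beta)."""
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0.0:
        return 0.0  # nothing retrieved correctly; avoid division by zero
    return (1 + b2) * precision * recall / denominator


# Hypothetical precision/recall pair, chosen only to illustrate the formula.
print(round(f_beta(0.90, 0.84), 2))
```

    With β = 1 the two quantities are weighted equally; β > 1 favours recall and β < 1 favours precision.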

  9. Overview of the ID, EPI and REL tasks of BioNLP Shared Task 2011.

    PubMed

    Pyysalo, Sampo; Ohta, Tomoko; Rak, Rafal; Sullivan, Dan; Mao, Chunhong; Wang, Chunxia; Sobral, Bruno; Tsujii, Jun'ichi; Ananiadou, Sophia

    2012-06-26

    We present the preparation, resources, results and analysis of three tasks of the BioNLP Shared Task 2011: the main tasks on Infectious Diseases (ID) and Epigenetics and Post-translational Modifications (EPI), and the supporting task on Entity Relations (REL). The two main tasks represent extensions of the event extraction model introduced in the BioNLP Shared Task 2009 (ST'09) to two new areas of biomedical scientific literature, each motivated by the needs of specific biocuration tasks. The ID task concerns the molecular mechanisms of infection, virulence and resistance, focusing in particular on the functions of a class of signaling systems that are ubiquitous in bacteria. The EPI task is dedicated to the extraction of statements regarding chemical modifications of DNA and proteins, with particular emphasis on changes relating to the epigenetic control of gene expression. By contrast to these two application-oriented main tasks, the REL task seeks to support extraction in general by separating challenges relating to part-of relations into a subproblem that can be addressed by independent systems. Seven groups participated in each of the two main tasks and four groups in the supporting task. The participating systems indicated advances in the capability of event extraction methods and demonstrated generalization in many aspects: from abstracts to full texts, from previously considered subdomains to new ones, and from the ST'09 extraction targets to other entities and events. The highest performance achieved in the supporting task REL, 58% F-score, is broadly comparable with levels reported for other relation extraction tasks. For the ID task, the highest-performing system achieved 56% F-score, comparable to the state-of-the-art performance at the established ST'09 task. 
    In the EPI task, the best result was 53% F-score for the full set of extraction targets and 69% F-score for a reduced set of core extraction targets, approaching a level of performance sufficient for user-facing applications. In this study, we extend on previously reported results and perform further analyses of the outputs of the participating systems. We place specific emphasis on aspects of system performance relating to real-world applicability, considering alternate evaluation metrics and performing additional manual analysis of system outputs. We further demonstrate that the strengths of extraction systems can be combined to improve on the performance achieved by any system in isolation. The manually annotated corpora, supporting resources, and evaluation tools for all tasks are available from http://www.bionlp-st.org and the tasks continue as open challenges for all interested parties.

  10. Overview of the ID, EPI and REL tasks of BioNLP Shared Task 2011

    PubMed Central

    2012-01-01

    We present the preparation, resources, results and analysis of three tasks of the BioNLP Shared Task 2011: the main tasks on Infectious Diseases (ID) and Epigenetics and Post-translational Modifications (EPI), and the supporting task on Entity Relations (REL). The two main tasks represent extensions of the event extraction model introduced in the BioNLP Shared Task 2009 (ST'09) to two new areas of biomedical scientific literature, each motivated by the needs of specific biocuration tasks. The ID task concerns the molecular mechanisms of infection, virulence and resistance, focusing in particular on the functions of a class of signaling systems that are ubiquitous in bacteria. The EPI task is dedicated to the extraction of statements regarding chemical modifications of DNA and proteins, with particular emphasis on changes relating to the epigenetic control of gene expression. By contrast to these two application-oriented main tasks, the REL task seeks to support extraction in general by separating challenges relating to part-of relations into a subproblem that can be addressed by independent systems. Seven groups participated in each of the two main tasks and four groups in the supporting task. The participating systems indicated advances in the capability of event extraction methods and demonstrated generalization in many aspects: from abstracts to full texts, from previously considered subdomains to new ones, and from the ST'09 extraction targets to other entities and events. The highest performance achieved in the supporting task REL, 58% F-score, is broadly comparable with levels reported for other relation extraction tasks. For the ID task, the highest-performing system achieved 56% F-score, comparable to the state-of-the-art performance at the established ST'09 task. 
    In the EPI task, the best result was 53% F-score for the full set of extraction targets and 69% F-score for a reduced set of core extraction targets, approaching a level of performance sufficient for user-facing applications. In this study, we extend on previously reported results and perform further analyses of the outputs of the participating systems. We place specific emphasis on aspects of system performance relating to real-world applicability, considering alternate evaluation metrics and performing additional manual analysis of system outputs. We further demonstrate that the strengths of extraction systems can be combined to improve on the performance achieved by any system in isolation. The manually annotated corpora, supporting resources, and evaluation tools for all tasks are available from http://www.bionlp-st.org and the tasks continue as open challenges for all interested parties. PMID:22759456

  11. All-optical polarization control and noise cleaning based on a nonlinear lossless polarizer

    NASA Astrophysics Data System (ADS)

    Barozzi, Matteo; Vannucci, Armando; Picchi, Giorgio

    2015-01-01

    We propose an all-optical fiber-based device able to accomplish both polarization control and OSNR enhancement of an amplitude modulated optical signal, affected by unpolarized additive white Gaussian noise, at the same time. The proposed noise cleaning device is made of a nonlinear lossless polarizer (NLP), that performs polarization control, followed by an ideal polarizing filter that removes the orthogonally polarized half of additive noise. The NLP transforms every input signal polarization into a unique, well defined output polarization (without any loss of signal energy) and its task is to impose a signal polarization aligned with the transparent eigenstate of the polarizing filter. In order to effectively control the polarization of the modulated signal, we show that two different NLP configurations (with counter- or co-propagating pump laser) are needed, as a function of the signal polarization coherence time. The NLP is designed so that polarization attraction is effective only on the "noiseless" (i.e., information-bearing) component of the signal and not on noise, that remains unpolarized at the NLP output. Hence, the proposed device is able to discriminate signal power (that is preserved) from in-band noise power (that is partly suppressed). Since signal repolarization is detrimental if applied to polarization-multiplexed formats, the noise cleaner application is limited here to "legacy" links, with 10 Gb/s OOK modulation, still representing the most common format in deployed networks. By employing the appropriate NLP configurations, we obtain an OSNR gain close to 3dB. Furthermore, we show how the achievable OSNR gain can be estimated theoretically.

  12. Speech Processing and Recognition (SPaRe)

    DTIC Science & Technology

    2011-01-01

    results in the areas of automatic speech recognition (ASR), speech processing, machine translation (MT), natural language processing (NLP), and...Processing (NLP), Information Retrieval (IR)...Figure 9, the IOC was only expected to provide document submission and search; automatic speech recognition (ASR) for English, Spanish, Arabic, and

  13. A bibliometric analysis of natural language processing in medical research.

    PubMed

    Chen, Xieling; Xie, Haoran; Wang, Fu Lee; Liu, Ziqing; Xu, Juan; Hao, Tianyong

    2018-03-22

    Natural language processing (NLP) plays an increasingly significant role in advancing medicine. Rich research achievements of NLP methods and applications for medical information processing are available. It is of great significance to conduct a deep analysis to understand the recent development of the NLP-empowered medical research field. However, few studies examining the research status of this field could be found. Therefore, this study aims to quantitatively assess the academic output of NLP in the medical research field. We conducted a bibliometric analysis on NLP-empowered medical research publications retrieved from PubMed in the period 2007-2016. The analysis focused on three aspects. Firstly, the literature distribution characteristics were obtained with a statistical analysis method. Secondly, a network analysis method was used to reveal scientific collaboration relations. Finally, thematic discovery and evolution were reflected using an affinity propagation clustering method. There were 1405 NLP-empowered medical research publications published during the 10 years, with an average annual growth rate of 18.39%. The 10 most productive publication sources together contributed more than 50% of the total publications. The USA had the highest number of publications. A moderately significant correlation between a country's publications and GDP per capita was revealed. Denny, Joshua C was the most productive author. Mayo Clinic was the most productive affiliation. The annual co-affiliation and co-country rates reached 64.04% and 15.79% in 2016, respectively. Ten main thematic areas were identified, including Computational biology, Terminology mining, Information extraction, Text classification, Social medium as data source, Information retrieval, etc. CONCLUSIONS: A bibliometric analysis of NLP-empowered medical research publications for uncovering the recent research status is presented. The results can assist relevant researchers, especially newcomers, in understanding the research development systematically, seeking scientific cooperation partners, optimizing research topic choices and monitoring new scientific or technological activities.

  14. Aurora B Interaction of Centrosomal Nlp Regulates Cytokinesis*

    PubMed Central

    Yan, Jie; Jin, Shunqian; Li, Jia; Zhan, Qimin

    2010-01-01

    Cytokinesis is a fundamental cellular process, which ensures equal abscission and fosters diploid progenies. Aberrant cytokinesis may result in genomic instability and cell transformation. However, the underlying regulatory machinery of cytokinesis is largely undefined. Here, we demonstrate that Nlp (Ninein-like protein), a recently identified BRCA1-associated centrosomal protein that is required for centrosomes maturation at interphase and spindle formation in mitosis, also contributes to the accomplishment of cytokinesis. Through immunofluorescent analysis, Nlp is found to localize at midbody during cytokinesis. Depletion of endogenous Nlp triggers aborted division and subsequently leads to multinucleated phenotypes. Nlp can be recruited by Aurora B to the midbody apparatus via their physical association at the late stage of mitosis. Disruption of their interaction induces aborted cytokinesis. Importantly, Nlp is characterized as a novel substrate of Aurora B and can be phosphorylated by Aurora B. The specific phosphorylation sites are mapped at Ser-185, Ser-448, and Ser-585. The phosphorylation at Ser-448 and Ser-585 is likely required for Nlp association with Aurora B and localization at midbody. Meanwhile, the phosphorylation at Ser-185 is vital to Nlp protein stability. Disruptions of these phosphorylation sites abolish cytokinesis and lead to chromosomal instability. Collectively, these observations demonstrate that regulation of Nlp by Aurora B is critical for the completion of cytokinesis, providing novel insights into understanding the machinery of cell cycle progression. PMID:20864540

  15. Aurora B interaction of centrosomal Nlp regulates cytokinesis.

    PubMed

    Yan, Jie; Jin, Shunqian; Li, Jia; Zhan, Qimin

    2010-12-17

    Cytokinesis is a fundamental cellular process, which ensures equal abscission and fosters diploid progenies. Aberrant cytokinesis may result in genomic instability and cell transformation. However, the underlying regulatory machinery of cytokinesis is largely undefined. Here, we demonstrate that Nlp (Ninein-like protein), a recently identified BRCA1-associated centrosomal protein that is required for centrosomes maturation at interphase and spindle formation in mitosis, also contributes to the accomplishment of cytokinesis. Through immunofluorescent analysis, Nlp is found to localize at midbody during cytokinesis. Depletion of endogenous Nlp triggers aborted division and subsequently leads to multinucleated phenotypes. Nlp can be recruited by Aurora B to the midbody apparatus via their physical association at the late stage of mitosis. Disruption of their interaction induces aborted cytokinesis. Importantly, Nlp is characterized as a novel substrate of Aurora B and can be phosphorylated by Aurora B. The specific phosphorylation sites are mapped at Ser-185, Ser-448, and Ser-585. The phosphorylation at Ser-448 and Ser-585 is likely required for Nlp association with Aurora B and localization at midbody. Meanwhile, the phosphorylation at Ser-185 is vital to Nlp protein stability. Disruptions of these phosphorylation sites abolish cytokinesis and lead to chromosomal instability. Collectively, these observations demonstrate that regulation of Nlp by Aurora B is critical for the completion of cytokinesis, providing novel insights into understanding the machinery of cell cycle progression.

  16. An Overview of Computer-Based Natural Language Processing.

    ERIC Educational Resources Information Center

    Gevarter, William B.

    Computer-based Natural Language Processing (NLP) is the key to enabling humans and their computer-based creations to interact with machines using natural languages (English, Japanese, German, etc.) rather than formal computer languages. NLP is a major research area in the fields of artificial intelligence and computational linguistics. Commercial…

  17. NLPIR: A Theoretical Framework for Applying Natural Language Processing to Information Retrieval.

    ERIC Educational Resources Information Center

    Zhou, Lina; Zhang, Dongsong

    2003-01-01

    Proposes a theoretical framework called NLPIR that integrates natural language processing (NLP) into information retrieval (IR) based on the assumption that there exists representation distance between queries and documents. Discusses problems in traditional keyword-based IR, including relevance, and describes some existing NLP techniques.…

  18. Semantic biomedical resource discovery: a Natural Language Processing framework.

    PubMed

    Sfakianaki, Pepi; Koumakis, Lefteris; Sfakianakis, Stelios; Iatraki, Galatia; Zacharioudakis, Giorgos; Graf, Norbert; Marias, Kostas; Tsiknakis, Manolis

    2015-09-30

    A plethora of publicly available biomedical resources currently exist and are constantly increasing at a fast rate. In parallel, specialized repositories are being developed, indexing numerous clinical and biomedical tools. The main drawback of such repositories is the difficulty in locating appropriate resources for a clinical or biomedical decision task, especially for non-Information Technology expert users. In parallel, although NLP research in the clinical domain has been active since the 1960s, progress in the development of NLP applications has been slow and lags behind progress in the general NLP domain. The aim of the present study is to investigate the use of semantics for biomedical resources annotation with domain specific ontologies and exploit Natural Language Processing methods in empowering the non-Information Technology expert users to efficiently search for biomedical resources using natural language. A Natural Language Processing engine which can "translate" free text into targeted queries, automatically transforming a clinical research question into a request description that contains only terms of ontologies, has been implemented. The implementation is based on information extraction techniques for text in natural language, guided by integrated ontologies. Furthermore, knowledge from robust text mining methods has been incorporated to map descriptions into suitable domain ontologies in order to ensure that the biomedical resources descriptions are domain oriented and enhance the accuracy of services discovery. The framework is freely available as a web application at (http://calchas.ics.forth.gr/). For our experiments, a range of clinical questions were established based on descriptions of clinical trials from the ClinicalTrials.gov registry as well as recommendations from clinicians. Domain experts manually identified the available tools in a tools repository which are suitable for addressing the clinical questions at hand, either individually or as a set of tools forming a computational pipeline. The results were compared with those obtained from an automated discovery of candidate biomedical tools. For the evaluation of the results, precision and recall measurements were used. Our results indicate that the proposed framework has a high precision and low recall, implying that the system returns essentially more relevant results than irrelevant. Adequate biomedical ontologies, sufficient NLP tools, and biomedical annotation systems of sufficient quality are already available for the implementation of a biomedical resources discovery framework, based on the semantic annotation of resources and the use of NLP techniques. The results of the present study demonstrate the clinical utility of the application of the proposed framework which aims to bridge the gap between clinical questions in natural language and efficient dynamic biomedical resources discovery.

  19. Automated chart review utilizing natural language processing algorithm for asthma predictive index.

    PubMed

    Kaur, Harsheen; Sohn, Sunghwan; Wi, Chung-Il; Ryu, Euijung; Park, Miguel A; Bachman, Kay; Kita, Hirohito; Croghan, Ivana; Castro-Rodriguez, Jose A; Voge, Gretchen A; Liu, Hongfang; Juhn, Young J

    2018-02-13

    Thus far, no algorithms have been developed to automatically identify patients who meet Asthma Predictive Index (API) criteria from electronic health records (EHR). Our objective is to develop and validate a natural language processing (NLP) algorithm to identify patients that meet API criteria. This is a cross-sectional study nested in a birth cohort study in Olmsted County, MN. Asthma status ascertained by manual chart review based on API criteria served as gold standard. NLP-API was developed on a training cohort (n = 87) and validated on a test cohort (n = 427). Criterion validity was measured by sensitivity, specificity, positive predictive value and negative predictive value of the NLP algorithm against manual chart review for asthma status. Construct validity was determined by associations of asthma status defined by NLP-API with known risk factors for asthma. Among the eligible 427 subjects of the test cohort, 48% were males and 74% were White. Median age was 5.3 years (interquartile range 3.6-6.8). 35 (8%) had a history of asthma by NLP-API vs. 36 (8%) by abstractor with 31 by both approaches. NLP-API predicted asthma status with sensitivity 86%, specificity 98%, positive predictive value 88%, negative predictive value 98%. Asthma status by both NLP and manual chart review were significantly associated with the known asthma risk factors, such as history of allergic rhinitis, eczema, family history of asthma, and maternal history of smoking during pregnancy (p value < 0.05). Maternal smoking [odds ratio: 4.4, 95% confidence interval 1.8-10.7] was associated with asthma status determined by NLP-API and abstractor, and the effect sizes were similar between the reviews with 4.4 vs 4.2 respectively. NLP-API was able to ascertain asthma status in children by mining EHRs and has the potential to enhance asthma care and research through population management and large-scale studies when identifying children who meet API criteria.
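    The four validity measures in this abstract can be reconstructed from the counts it gives for the test cohort (427 subjects, 35 positive by NLP-API, 36 positive by abstractor, 31 positive by both). The sketch below assumes that the 31 subjects identified by both approaches are the true positives; the 2x2 table is derived here and is not printed in the abstract.

```python
# Counts stated in the abstract (test cohort).
n_total = 427        # subjects in the test cohort
nlp_positive = 35    # asthma by NLP-API
ref_positive = 36    # asthma by manual chart review (gold standard)
both_positive = 31   # asthma by both approaches

# Derived 2x2 table (assumption: "both" = true positives).
tp = both_positive
fp = nlp_positive - tp
fn = ref_positive - tp
tn = n_total - tp - fp - fn

metrics = {
    "sensitivity": tp / (tp + fn),
    "specificity": tn / (tn + fp),
    "ppv": tp / (tp + fp),
    "npv": tn / (tn + fn),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

    Truncated to whole percentages, the derived values match the reported 86%, 98%, 88% and 98%.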

  20. Facilitating cancer research using natural language processing of pathology reports.

    PubMed

    Xu, Hua; Anderson, Kristin; Grann, Victor R; Friedman, Carol

    2004-01-01

    Many ongoing clinical research projects, such as projects involving studies associated with cancer, involve manual capture of information in surgical pathology reports so that the information can be used to determine the eligibility of recruited patients for the study and to provide other information, such as cancer prognosis. Natural language processing (NLP) systems offer an alternative to automated coding, but pathology reports have certain features that are difficult for NLP systems. This paper describes how a preprocessor was integrated with an existing NLP system (MedLEE) in order to reduce modification to the NLP system and to improve performance. The work was done in conjunction with an ongoing clinical research project that assesses disparities and risks of developing breast cancer for minority women. An evaluation of the system was performed using manually coded data from the research project's database as a gold standard. The evaluation outcome showed that the extended NLP system had a sensitivity of 90.6% and a precision of 91.6%. Results indicated that this system performed satisfactorily for capturing information for the cancer research project.

  1. Semantic characteristics of NLP-extracted concepts in clinical notes vs. biomedical literature.

    PubMed

    Wu, Stephen; Liu, Hongfang

    2011-01-01

    Natural language processing (NLP) has become crucial in unlocking information stored in free text, from both clinical notes and biomedical literature. Clinical notes convey clinical information related to individual patient health care, while biomedical literature communicates scientific findings. This work focuses on semantic characterization of texts at an enterprise scale, comparing and contrasting the two domains and their NLP approaches. We analyzed the empirical distributional characteristics of NLP-discovered named entities in Mayo Clinic clinical notes from 2001-2010, and in the 2011 MetaMapped Medline Baseline. We give qualitative and quantitative measures of domain similarity and point to the feasibility of transferring resources and techniques. An important by-product of this study is the development of a weighted ontology for each domain, which gives distributional semantic information that may be used to improve NLP applications.

  2. Role of PROLOG (Programming and Logic) in natural-language processing. Report for September-December 1987

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McHale, M.L.

    The field of artificial intelligence strives to produce computer programs that exhibit intelligent behavior. One of the areas of interest is the processing of natural language. This report discusses the role of the computer language PROLOG in Natural Language Processing (NLP) both from theoretic and pragmatic viewpoints. The reasons for using PROLOG for NLP are numerous. First, linguists can write natural-language grammars almost directly as PROLOG programs; this allows fast-prototyping of NLP systems and facilitates analysis of NLP theories. Second, semantic representations of natural-language texts that use logic formalisms are readily produced in PROLOG because of PROLOG's logical foundations. Third, PROLOG's built-in inferencing mechanisms are often sufficient for inferences on the logical forms produced by NLPs. Fourth, the logical, declarative nature of PROLOG may make it the language of choice for parallel computing systems. Finally, the fact that PROLOG has a de facto standard (Edinburgh) makes the porting of code from one computer system to another virtually trouble free. Perhaps the strongest tie one could make between NLP and PROLOG was stated by John Stuart Mill in his Inaugural Address at St. Andrews: "The structure of every sentence is a lesson in logic."

  3. Differentiation of ileostomy from colostomy procedures: assessing the accuracy of current procedural terminology codes and the utility of natural language processing.

    PubMed

    Vo, Elaine; Davila, Jessica A; Hou, Jason; Hodge, Krystle; Li, Linda T; Suliburk, James W; Kao, Lillian S; Berger, David H; Liang, Mike K

    2013-08-01

    Large databases provide a wealth of information for researchers, but identifying patient cohorts often relies on the use of current procedural terminology (CPT) codes. In particular, studies of stoma surgery have been limited by the accuracy of CPT codes in identifying and differentiating ileostomy procedures from colostomy procedures. It is important to make this distinction because the prevalence of complications associated with stoma formation and reversal differ dramatically between types of stoma. Natural language processing (NLP) is a process that allows text-based searching. The Automated Retrieval Console is an NLP-based software that allows investigators to design and perform NLP-assisted document classification. In this study, we evaluated the role of CPT codes and NLP in differentiating ileostomy from colostomy procedures. Using CPT codes, we conducted a retrospective study that identified all patients undergoing a stoma-related procedure at a single institution between January 2005 and December 2011. All operative reports during this time were reviewed manually to abstract the following variables: formation or reversal and ileostomy or colostomy. Sensitivity and specificity for validation of the CPT codes against the mastery surgery schedule were calculated. Operative reports were evaluated by use of NLP to differentiate ileostomy- from colostomy-related procedures. Sensitivity and specificity for identifying patients with ileostomy or colostomy procedures were calculated for CPT codes and NLP for the entire cohort. CPT codes performed well in identifying stoma procedures (sensitivity 87.4%, specificity 97.5%). A total of 664 stoma procedures were identified by CPT codes between 2005 and 2011. 
    The CPT codes were adequate in identifying stoma formation (sensitivity 97.7%, specificity 72.4%) and stoma reversal (sensitivity 74.1%, specificity 98.7%), but they were inadequate in identifying ileostomy (sensitivity 35.0%, specificity 88.1%) and colostomy (75.2% and 80.9%). NLP performed with greater sensitivity, specificity, and accuracy than CPT codes in identifying stoma procedures and stoma types. Major differences where NLP outperformed CPT included identifying ileostomy (specificity 95.8%, sensitivity 88.3%, and accuracy 91.5%) and colostomy (97.6%, 90.5%, and 92.8%, respectively). CPT codes can effectively identify patients who have had stoma procedures and are adequate in distinguishing between formation and reversal; however, CPT codes cannot differentiate ileostomy from colostomy. NLP can be used to differentiate between ileostomy- and colostomy-related procedures. The role of NLP in conjunction with electronic medical records in data retrieval warrants further investigation. Published by Mosby, Inc.

  4. A Cloud-based Approach to Medical NLP

    PubMed Central

    Chard, Kyle; Russell, Michael; Lussier, Yves A.; Mendonça, Eneida A; Silverstein, Jonathan C.

    2011-01-01

    Natural Language Processing (NLP) enables access to deep content embedded in medical texts. To date, NLP has not fulfilled its promise of enabling robust clinical encoding, clinical use, quality improvement, and research. We submit that this is in part due to poor accessibility, scalability, and flexibility of NLP systems. We describe here an approach and system which leverages cloud-based approaches such as virtual machines and Representational State Transfer (REST) to extract, process, synthesize, mine, compare/contrast, explore, and manage medical text data in a flexibly secure and scalable architecture. Available architectures in which our Smntx (pronounced as semantics) system can be deployed include: virtual machines in a HIPAA-protected hospital environment, brought up to run analysis over bulk data and destroyed in a local cloud; a commercial cloud for a large complex multi-institutional trial; and within other architectures such as caGrid, i2b2, or NHIN. PMID:22195072

  5. A cloud-based approach to medical NLP.

    PubMed

    Chard, Kyle; Russell, Michael; Lussier, Yves A; Mendonça, Eneida A; Silverstein, Jonathan C

    2011-01-01

    Natural Language Processing (NLP) enables access to deep content embedded in medical texts. To date, NLP has not fulfilled its promise of enabling robust clinical encoding, clinical use, quality improvement, and research. We submit that this is in part due to poor accessibility, scalability, and flexibility of NLP systems. We describe here an approach and system which leverages cloud-based approaches such as virtual machines and Representational State Transfer (REST) to extract, process, synthesize, mine, compare/contrast, explore, and manage medical text data in a flexibly secure and scalable architecture. Available architectures in which our Smntx (pronounced as semantics) system can be deployed include: virtual machines in a HIPAA-protected hospital environment, brought up to run analysis over bulk data and destroyed in a local cloud; a commercial cloud for a large complex multi-institutional trial; and within other architectures such as caGrid, i2b2, or NHIN.

  6. Indexing Anatomical Phrases in Neuro-Radiology Reports to the UMLS 2005AA

    PubMed Central

    Bashyam, Vijayaraghavan; Taira, Ricky K.

    2005-01-01

    This work describes a methodology to index anatomical phrases to the 2005AA release of the Unified Medical Language System (UMLS). A phrase chunking tool based on Natural Language Processing (NLP) was developed to identify semantically coherent phrases within medical reports. Using this phrase chunker, a set of 2,551 unique anatomical phrases was extracted from brain radiology reports. These phrases were mapped to the 2005AA release of the UMLS using a vector space model. Precision for the task of indexing unique phrases was 0.87. PMID:16778995
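    A vector space model of the kind mentioned in this abstract can be sketched as follows. This is a minimal illustration using bag-of-words term counts and cosine similarity, not the authors' implementation; the concept names are hypothetical stand-ins for UMLS entries, and the abstract does not specify the term weighting used.

```python
import math
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words term-frequency vector for a phrase."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(phrase: str, concepts: list[str]) -> tuple[str, float]:
    """Return the concept name most similar to the phrase, with its score."""
    query = vectorize(phrase)
    scored = ((c, cosine(query, vectorize(c))) for c in concepts)
    return max(scored, key=lambda pair: pair[1])


# Hypothetical concept names standing in for UMLS entries.
concepts = ["frontal lobe", "temporal lobe", "cerebellum"]
print(best_match("left frontal lobe", concepts))
```

    Each extracted phrase is compared against every candidate concept name and assigned to the highest-scoring one; a production system would typically add term weighting and a minimum-score threshold.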

  7. Development of a Natural Language Processing Engine to Generate Bladder Cancer Pathology Data for Health Services Research.

    PubMed

    Schroeck, Florian R; Patterson, Olga V; Alba, Patrick R; Pattison, Erik A; Seigne, John D; DuVall, Scott L; Robertson, Douglas J; Sirovich, Brenda; Goodney, Philip P

    2017-12-01

    To take the first step toward assembling population-based cohorts of patients with bladder cancer with longitudinal pathology data, we developed and validated a natural language processing (NLP) engine that abstracts pathology data from full-text pathology reports. Using 600 bladder pathology reports randomly selected from the Department of Veterans Affairs, we developed and validated an NLP engine to abstract data on histology, invasion (presence vs absence and depth), grade, the presence of muscularis propria, and the presence of carcinoma in situ. Our gold standard was based on an independent review of reports by 2 urologists, followed by adjudication. We assessed the NLP performance by calculating the accuracy, the positive predictive value, and the sensitivity. We subsequently applied the NLP engine to pathology reports from 10,725 patients with bladder cancer. When comparing the NLP output to the gold standard, NLP achieved the highest accuracy (0.98) for the presence vs the absence of carcinoma in situ. Accuracy for histology, invasion (presence vs absence), grade, and the presence of muscularis propria ranged from 0.83 to 0.96. The most challenging variable was depth of invasion (accuracy 0.68), with an acceptable positive predictive value for lamina propria (0.82) and for muscularis propria (0.87) invasion. The validated engine was capable of abstracting pathologic characteristics for 99% of the patients with bladder cancer. NLP had high accuracy for 5 of 6 variables and abstracted data for the vast majority of the patients. This now allows for the assembly of population-based cohorts with longitudinal pathology data. Published by Elsevier Inc.
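    The evaluation measures named in this abstract (accuracy, positive predictive value, sensitivity) can be computed from confusion-matrix counts. The sketch below is a generic illustration with made-up counts, not the study's data; the function name is hypothetical, and specificity is included only for completeness.

```python
def screening_metrics(tp, fp, tn, fn):
    """Compute standard measures from true/false positive/negative counts."""

    def ratio(num, den):
        # Return 0.0 rather than raising when a denominator is empty.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "ppv": ratio(tp, tp + fp),          # positive predictive value (precision)
        "sensitivity": ratio(tp, tp + fn),  # recall
        "specificity": ratio(tn, tn + fp),
    }


# Illustrative counts only (not taken from the study).
example = screening_metrics(tp=40, fp=10, tn=45, fn=5)
```

    With these illustrative counts, accuracy is 85/100 and PPV is 40/50; reporting several measures together matters because a variable can score well on one and poorly on another.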

  8. Clinical documentation variations and NLP system portability: a case study in asthma birth cohorts across institutions.

    PubMed

    Sohn, Sunghwan; Wang, Yanshan; Wi, Chung-Il; Krusemark, Elizabeth A; Ryu, Euijung; Ali, Mir H; Juhn, Young J; Liu, Hongfang

    2017-11-30

    To assess clinical documentation variations across health care institutions using different electronic medical record systems and investigate how they affect natural language processing (NLP) system portability. Birth cohorts from Mayo Clinic and Sanford Children's Hospital (SCH) were used in this study (n = 298 for each). Documentation variations regarding asthma between the 2 cohorts were examined in various aspects: (1) overall corpus at the word level (ie, lexical variation), (2) topics and asthma-related concepts (ie, semantic variation), and (3) clinical note types (ie, process variation). We compared those statistics and explored NLP system portability for asthma ascertainment in 2 stages: prototype and refinement. There exist notable lexical variations (word-level similarity = 0.669) and process variations (differences in major note types containing asthma-related concepts). However, semantic-level corpora were relatively homogeneous (topic similarity = 0.944, asthma-related concept similarity = 0.971). The NLP system for asthma ascertainment had an F-score of 0.937 at Mayo, and produced 0.813 (prototype) and 0.908 (refinement) when applied at SCH. The criteria for asthma ascertainment are largely dependent on asthma-related concepts. Therefore, we believe that semantic similarity is important to estimate NLP system portability. As the Mayo Clinic and SCH corpora were relatively homogeneous at a semantic level, the NLP system, developed at Mayo Clinic, was imported to SCH successfully with proper adjustments to deal with the intrinsic corpus heterogeneity. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  9. Building a common pipeline for rule-based document classification.

    PubMed

    Patterson, Olga V; Ginter, Thomas; DuVall, Scott L

    2013-01-01

    Instance-based classification of clinical text is a widely used natural language processing task employed as a step for patient classification, document retrieval, or information extraction. Rule-based approaches rely on concept identification and context analysis in order to determine the appropriate class. We propose a five-step process that enables even small research teams to develop simple but powerful rule-based NLP systems by taking advantage of a common UIMA AS based pipeline for classification. Our proposed methodology coupled with the general-purpose solution provides researchers with access to the data locked in clinical text in cases of limited human resources and compact timelines.

  10. Recurrent Artificial Neural Networks and Finite State Natural Language Processing.

    ERIC Educational Resources Information Center

    Moisl, Hermann

    It is argued that pessimistic assessments of the adequacy of artificial neural networks (ANNs) for natural language processing (NLP) on the grounds that they have a finite state architecture are unjustified, and that their adequacy in this regard is an empirical issue. First, arguments that counter standard objections to finite state NLP on the…

  11. The Application of Natural Language Processing to Augmentative and Alternative Communication

    ERIC Educational Resources Information Center

    Higginbotham, D. Jeffery; Lesher, Gregory W.; Moulton, Bryan J.; Roark, Brian

    2012-01-01

    Significant progress has been made in the application of natural language processing (NLP) to augmentative and alternative communication (AAC), particularly in the areas of interface design and word prediction. This article will survey the current state-of-the-science of NLP in AAC and discuss its future applications for the development of next…

  12. Comparing ICD9-encoded diagnoses and NLP-processed discharge summaries for clinical trials pre-screening: a case study.

    PubMed

    Li, Li; Chase, Herbert S; Patel, Chintan O; Friedman, Carol; Weng, Chunhua

    2008-11-06

    The prevalence of electronic medical record (EMR) systems has made mass-screening for clinical trials viable through secondary uses of clinical data, which often exist in both structured and free text formats. The tradeoffs of using information in either data format for clinical trials screening are understudied. This paper compares the results of clinical trial eligibility queries over ICD9-encoded diagnoses and NLP-processed textual discharge summaries. The strengths and weaknesses of both data sources are summarized along the following dimensions: information completeness, expressiveness, code granularity, and accuracy of temporal information. We conclude that NLP-processed patient reports supplement important information for eligibility screening and should be used in combination with structured data.

  13. Temporal data representation, normalization, extraction, and reasoning: A review from clinical domain

    PubMed Central

    Madkour, Mohcine; Benhaddou, Driss; Tao, Cui

    2016-01-01

    Background and Objective: We live our lives by the calendar and the clock, but time is also an abstraction, even an illusion. The sense of time can be both domain-specific and complex, and is often left implicit, requiring significant domain knowledge to accurately recognize and harness. In the clinical domain, the momentum gained from recent advances in infrastructure and governance practices has enabled the collection of a tremendous amount of data at each moment in time. Electronic Health Records (EHRs) have paved the way to making these data available for practitioners and researchers. However, temporal data representation, normalization, extraction and reasoning are very important in order to mine such massive data and therefore for constructing the clinical timeline. The objective of this work is to provide an overview of the problem of constructing a timeline at the clinical point of care and to summarize the state-of-the-art in processing temporal information of clinical narratives. Methods: This review surveys the methods used in three important areas: modeling and representing of time, medical NLP methods for extracting time, and methods of time reasoning and processing. The review emphasizes the existing gap between present methods and semantic web technologies and considers possible combinations of the two. Results: The main finding of this review is the importance of time processing not only in constructing timelines and clinical decision support systems but also as a vital component of EHR data models and operations. Conclusions: Extracting temporal information in clinical narratives is a challenging task. The inclusion of ontologies and semantic web will lead to better assessment of the annotation task and, together with medical NLP techniques, will help resolve granularity and co-reference resolution problems. PMID:27040831

  14. Integrating natural language processing expertise with patient safety event review committees to improve the analysis of medication events.

    PubMed

    Fong, Allan; Harriott, Nicole; Walters, Donna M; Foley, Hanan; Morrissey, Richard; Ratwani, Raj R

    2017-08-01

    Many healthcare providers have implemented patient safety event reporting systems to better understand and improve patient safety. Reviewing and analyzing these reports is often time consuming and resource intensive because of both the quantity of reports and length of free-text descriptions in the reports. Natural language processing (NLP) experts collaborated with clinical experts on a patient safety committee to assist in the identification and analysis of medication related patient safety events. Different NLP algorithmic approaches were developed to identify four types of medication related patient safety events and the models were compared. Well performing NLP models were generated to categorize medication related events into pharmacy delivery delays, dispensing errors, Pyxis discrepancies, and prescriber errors with receiver operating characteristic areas under the curve of 0.96, 0.87, 0.96, and 0.81 respectively. We also found that modeling the brief without the resolution text generally improved model performance. These models were integrated into a dashboard visualization to support the patient safety committee review process. We demonstrate the capabilities of various NLP models and the use of two text inclusion strategies at categorizing medication related patient safety events. The NLP models and visualization could be used to improve the efficiency of patient safety event data review and analysis. Copyright © 2017 Elsevier B.V. All rights reserved.

  15. Comparison of Three Information Sources for Smoking Information in Electronic Health Records

    PubMed Central

    Wang, Liwei; Ruan, Xiaoyang; Yang, Ping; Liu, Hongfang

    2016-01-01

    OBJECTIVE The primary aim was to compare independent and joint performance of retrieving smoking status through different sources, including narrative text processed by natural language processing (NLP), patient-provided information (PPI), and diagnosis codes (ie, International Classification of Diseases, Ninth Revision [ICD-9]). We also compared the performance of retrieving smoking strength information (ie, heavy/light smoker) from narrative text and PPI. MATERIALS AND METHODS Our study leveraged an existing lung cancer cohort for smoking status, amount, and strength information, which was manually chart-reviewed. On the NLP side, smoking-related electronic medical record (EMR) data were retrieved first. A pattern-based smoking information extraction module was then implemented to extract smoking-related information. After that, heuristic rules were used to obtain smoking status-related information. Smoking information was also obtained from structured data sources based on diagnosis codes and PPI. Sensitivity, specificity, and accuracy were measured using patients with coverage (ie, the proportion of patients whose smoking status/strength can be effectively determined). RESULTS NLP alone has the best overall performance for smoking status extraction (patient coverage: 0.88; sensitivity: 0.97; specificity: 0.70; accuracy: 0.88); combining PPI with NLP further improved patient coverage to 0.96. ICD-9 does not provide additional improvement to NLP and its combination with PPI. For smoking strength, combining NLP with PPI has slight improvement over NLP alone. CONCLUSION These findings suggest that narrative text could serve as a more reliable and comprehensive source for obtaining smoking-related information than structured data sources. PPI, the readily available structured data, could be used as a complementary source for more comprehensive patient coverage. PMID:27980387

  16. The application of natural language processing to augmentative and alternative communication.

    PubMed

    Higginbotham, D Jeffery; Lesher, Gregory W; Moulton, Bryan J; Roark, Brian

    2011-01-01

    Significant progress has been made in the application of natural language processing (NLP) to augmentative and alternative communication (AAC), particularly in the areas of interface design and word prediction. This article will survey the current state-of-the-science of NLP in AAC and discuss its future applications for the development of next generation of AAC technology.

  17. A generalizable NLP framework for fast development of pattern-based biomedical relation extraction systems.

    PubMed

    Peng, Yifan; Torii, Manabu; Wu, Cathy H; Vijay-Shanker, K

    2014-08-23

    Text mining is increasingly used in the biomedical domain because of its ability to automatically gather information from a large number of scientific articles. One important task in biomedical text mining is relation extraction, which aims to identify designated relations among biological entities reported in literature. A relation extraction system achieving high performance is expensive to develop because of the substantial time and effort required for its design and implementation. Here, we report a novel framework to facilitate the development of a pattern-based biomedical relation extraction system. It has several unique design features: (1) leveraging syntactic variations possible in a language and automatically generating extraction patterns in a systematic manner, (2) applying sentence simplification to improve the coverage of extraction patterns, and (3) identifying referential relations between a syntactic argument of a predicate and the actual target expected in the relation extraction task. A relation extraction system derived using the proposed framework achieved overall F-scores of 72.66% for the Simple events and 55.57% for the Binding events on the BioNLP-ST 2011 GE test set, comparing favorably with the top performing systems that participated in the BioNLP-ST 2011 GE task. We obtained similar results on the BioNLP-ST 2013 GE test set (80.07% and 60.58%, respectively). We conducted additional experiments on the training and development sets to provide a more detailed analysis of the system and its individual modules. This analysis indicates that without increasing the number of patterns, simplification and referential relation linking play a key role in the effective extraction of biomedical relations. In this paper, we present a novel framework for fast development of relation extraction systems. The framework requires only a list of triggers as input, and does not need information from an annotated corpus. Thus, we reduce the involvement of domain experts, who would otherwise have to provide manual annotations and help with the design of hand crafted patterns. We demonstrate how our framework is used to develop a system which achieves state-of-the-art performance on a public benchmark corpus.

  18. Integrating Natural Language Processing and Machine Learning Algorithms to Categorize Oncologic Response in Radiology Reports.

    PubMed

    Chen, Po-Hao; Zafar, Hanna; Galperin-Aizenberg, Maya; Cook, Tessa

    2018-04-01

    A significant volume of medical data remains unstructured. Natural language processing (NLP) and machine learning (ML) techniques have been shown to successfully extract insights from radiology reports. However, the codependent effects of NLP and ML in this context have not been well-studied. Between April 1, 2015 and November 1, 2016, 9418 cross-sectional abdomen/pelvis CT and MR examinations containing our internal structured reporting element for cancer were separated into four categories: Progression, Stable Disease, Improvement, or No Cancer. We combined each of three NLP techniques with five ML algorithms to predict the assigned label using the unstructured report text and compared the performance of each combination. The three NLP algorithms included term frequency-inverse document frequency (TF-IDF), term frequency weighting (TF), and 16-bit feature hashing. The ML algorithms included logistic regression (LR), random decision forest (RDF), one-vs-all support vector machine (SVM), one-vs-all Bayes point machine (BPM), and fully connected neural network (NN). The best-performing NLP model consisted of tokenized unigrams and bigrams with TF-IDF. Increasing N-gram length yielded little to no added benefit for most ML algorithms. With all parameters optimized, SVM had the best performance on the test dataset, with 90.6 average accuracy and F score of 0.813. The interplay between ML and NLP algorithms and their effect on interpretation accuracy is complex. The best accuracy is achieved when both algorithms are optimized concurrently.
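    To make the 16-bit feature hashing over unigrams and bigrams concrete, here is a minimal sketch. It assumes whitespace tokenization and CRC32 as the hash function; the abstract states neither, so both are illustrative choices, and the function names are hypothetical.

```python
import zlib
from collections import Counter

N_BITS = 16
N_BUCKETS = 1 << N_BITS  # 65,536 feature slots


def ngrams(tokens, max_n=2):
    """Return all n-grams of length 1..max_n as space-joined strings."""
    out = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            out.append(" ".join(tokens[i:i + n]))
    return out


def hashed_counts(text, max_n=2):
    """Map each n-gram to one of 2**16 buckets and count occurrences per bucket."""
    tokens = text.lower().split()
    counts = Counter()
    for gram in ngrams(tokens, max_n):
        # CRC32 is an assumed, deterministic stand-in for the unspecified hash.
        bucket = zlib.crc32(gram.encode("utf-8")) % N_BUCKETS
        counts[bucket] += 1
    return counts
```

    Hashing fixes the feature-vector width in advance at the cost of occasional collisions, whereas TF and TF-IDF weighting keep one dimension per distinct n-gram.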

  19. Novel Use of Natural Language Processing (NLP) to Predict Suicidal Ideation and Psychiatric Symptoms in a Text-Based Mental Health Intervention in Madrid.

    PubMed

    Cook, Benjamin L; Progovac, Ana M; Chen, Pei; Mullin, Brian; Hou, Sherry; Baca-Garcia, Enrique

    2016-01-01

    Natural language processing (NLP) and machine learning were used to predict suicidal ideation and heightened psychiatric symptoms among adults recently discharged from psychiatric inpatient or emergency room settings in Madrid, Spain. Participants responded to structured mental and physical health instruments at multiple follow-up points. Outcome variables of interest were suicidal ideation and psychiatric symptoms (GHQ-12). Predictor variables included structured items (e.g., relating to sleep and well-being) and responses to one unstructured question, "how do you feel today?" We compared NLP-based models using the unstructured question with logistic regression prediction models using structured data. The PPV, sensitivity, and specificity for NLP-based models of suicidal ideation were 0.61, 0.56, and 0.57, respectively, compared to 0.73, 0.76, and 0.62 of structured data-based models. The PPV, sensitivity, and specificity for NLP-based models of heightened psychiatric symptoms (GHQ-12 ≥ 4) were 0.56, 0.59, and 0.60, respectively, compared to 0.79, 0.79, and 0.85 in structured models. NLP-based models were able to generate relatively high predictive values based solely on responses to a simple general mood question. These models have promise for rapidly identifying persons at risk of suicide or psychological distress and could provide a low-cost screening alternative in settings where lengthy structured item surveys are not feasible.

  20. Towards symbiosis in knowledge representation and natural language processing for structuring clinical practice guidelines.

    PubMed

    Weng, Chunhua; Payne, Philip R O; Velez, Mark; Johnson, Stephen B; Bakken, Suzanne

    2014-01-01

    The successful adoption by clinicians of evidence-based clinical practice guidelines (CPGs) contained in clinical information systems requires efficient translation of free-text guidelines into computable formats. Natural language processing (NLP) has the potential to improve the efficiency of such translation. However, it is laborious to develop NLP to structure free-text CPGs using existing formal knowledge representations (KR). In response to this challenge, this vision paper discusses the value and feasibility of supporting symbiosis in text-based knowledge acquisition (KA) and KR. We compare two ontologies: (1) an ontology manually created by domain experts for CPG eligibility criteria and (2) an upper-level ontology derived from a semantic pattern-based approach for automatic KA from CPG eligibility criteria text. Then we discuss the strengths and limitations of interweaving KA and NLP for KR purposes and important considerations for achieving the symbiosis of KR and NLP for structuring CPGs to achieve evidence-based clinical practice.

  1. Identifying Falls Risk Screenings Not Documented with Administrative Codes Using Natural Language Processing

    PubMed Central

    Zhu, Vivienne J; Walker, Tina D; Warren, Robert W; Jenny, Peggy B; Meystre, Stephane; Lenert, Leslie A

    2017-01-01

    Quality reporting that relies on coded administrative data alone may not completely and accurately depict providers’ performance. To assess this concern with a test case, we developed and evaluated a natural language processing (NLP) approach to identify falls risk screenings documented in clinical notes of patients without coded falls risk screening data. Extracting information from 1,558 clinical notes (mainly progress notes) from 144 eligible patients, we generated a lexicon of 38 keywords relevant to falls risk screening, 26 terms for pre-negation, and 35 terms for post-negation. The NLP algorithm identified 62 (out of the 144) patients whose falls risk screening was documented only in clinical notes and not coded. Manual review confirmed 59 patients as true positives and 77 patients as true negatives. Our NLP approach scored 0.92 for precision, 0.95 for recall, and 0.93 for F-measure. These results support the concept of utilizing NLP to enhance healthcare quality reporting. PMID:29854264
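    A lexicon of keywords with pre-negation and post-negation terms, as described in this abstract, could be applied roughly as follows. This is a minimal sketch: the term lists, the three-token window, and the function name are all hypothetical and are not taken from the study's lexicon.

```python
import re

# Hypothetical term lists for illustration; the study's actual lexicon is not given here.
KEYWORDS = ["fall risk", "falls risk", "fall screening"]
PRE_NEGATION = ["no", "not", "denies"]
POST_NEGATION = ["not done", "declined", "deferred"]
WINDOW = 3  # number of tokens inspected on each side of a keyword (an assumption)


def screening_documented(sentence):
    """Return True if any keyword mention in the sentence is not negated."""
    text = " ".join(re.findall(r"[a-z]+", sentence.lower()))
    for keyword in KEYWORDS:
        for match in re.finditer(r"\b" + re.escape(keyword) + r"\b", text):
            before = text[:match.start()].split()[-WINDOW:]
            after = " ".join(text[match.end():].split()[:WINDOW])
            pre_negated = any(term in before for term in PRE_NEGATION)
            # Pad with spaces so multi-word terms match on whole-word boundaries.
            post_negated = any(f" {term} " in f" {after} " for term in POST_NEGATION)
            if not pre_negated and not post_negated:
                return True
    return False
```

    For example, "Falls risk screening completed today." counts as documented, while "No falls risk screening performed." and "Fall risk screening was declined." do not.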

  2. Natural Language Processing Accurately Calculates Adenoma and Sessile Serrated Polyp Detection Rates.

    PubMed

    Nayor, Jennifer; Borges, Lawrence F; Goryachev, Sergey; Gainer, Vivian S; Saltzman, John R

    2018-07-01

    ADR is a widely used colonoscopy quality indicator. Calculation of ADR is labor-intensive and cumbersome using current electronic medical databases. Natural language processing (NLP) is a method used to extract meaning from unstructured or free text data. (1) To develop and validate an accurate automated process for calculation of adenoma detection rate (ADR) and serrated polyp detection rate (SDR) on data stored in widely used electronic health record systems, specifically Epic electronic health record system, Provation ® endoscopy reporting system, and Sunquest PowerPath pathology reporting system. Screening colonoscopies performed between June 2010 and August 2015 were identified using the Provation ® reporting tool. An NLP pipeline was developed to identify adenomas and sessile serrated polyps (SSPs) on pathology reports corresponding to these colonoscopy reports. The pipeline was validated using a manual search. Precision, recall, and effectiveness of the natural language processing pipeline were calculated. ADR and SDR were then calculated. We identified 8032 screening colonoscopies that were linked to 3821 pathology reports (47.6%). The NLP pipeline had an accuracy of 100% for adenomas and 100% for SSPs. Mean total ADR was 29.3% (range 14.7-53.3%); mean male ADR was 35.7% (range 19.7-62.9%); and mean female ADR was 24.9% (range 9.1-51.0%). Mean total SDR was 4.0% (0-9.6%). We developed and validated an NLP pipeline that accurately and automatically calculates ADRs and SDRs using data stored in Epic, Provation ® and Sunquest PowerPath. This NLP pipeline can be used to evaluate colonoscopy quality parameters at both individual and practice levels.

  3. Retrieval of radiology reports citing critical findings with disease-specific customization.

    PubMed

    Lacson, Ronilda; Sugarbaker, Nathanael; Prevedello, Luciano M; Ivan, Ip; Mar, Wendy; Andriole, Katherine P; Khorasani, Ramin

    2012-01-01

    Communication of critical results from diagnostic procedures between caregivers is a Joint Commission national patient safety goal. Evaluating critical result communication often requires manual analysis of voluminous data, especially when reviewing unstructured textual results of radiologic findings. Information retrieval (IR) tools can facilitate this process by enabling automated retrieval of radiology reports that cite critical imaging findings. However, IR tools that have been developed for one disease or imaging modality often need substantial reconfiguration before they can be utilized for another disease entity. THIS PAPER: 1) describes the process of customizing two Natural Language Processing (NLP) and Information Retrieval/Extraction applications - an open-source toolkit, A Nearly New Information Extraction system (ANNIE); and an application developed in-house, Information for Searching Content with an Ontology-Utilizing Toolkit (iSCOUT) - to illustrate the varying levels of customization required for different disease entities and; 2) evaluates each application's performance in identifying and retrieving radiology reports citing critical imaging findings for three distinct diseases, pulmonary nodule, pneumothorax, and pulmonary embolus. Both applications can be utilized for retrieval. iSCOUT and ANNIE had precision values between 0.90-0.98 and recall values between 0.79 and 0.94. ANNIE had consistently higher precision but required more customization. Understanding the customizations involved in utilizing NLP applications for various diseases will enable users to select the most suitable tool for specific tasks.

  4. Retrieval of Radiology Reports Citing Critical Findings with Disease-Specific Customization

    PubMed Central

    Lacson, Ronilda; Sugarbaker, Nathanael; Prevedello, Luciano M; Ivan, IP; Mar, Wendy; Andriole, Katherine P; Khorasani, Ramin

    2012-01-01

    Background: Communication of critical results from diagnostic procedures between caregivers is a Joint Commission national patient safety goal. Evaluating critical result communication often requires manual analysis of voluminous data, especially when reviewing unstructured textual results of radiologic findings. Information retrieval (IR) tools can facilitate this process by enabling automated retrieval of radiology reports that cite critical imaging findings. However, IR tools that have been developed for one disease or imaging modality often need substantial reconfiguration before they can be utilized for another disease entity. Purpose: This paper: 1) describes the process of customizing two Natural Language Processing (NLP) and Information Retrieval/Extraction applications – an open-source toolkit, A Nearly New Information Extraction system (ANNIE); and an application developed in-house, Information for Searching Content with an Ontology-Utilizing Toolkit (iSCOUT) – to illustrate the varying levels of customization required for different disease entities and; 2) evaluates each application’s performance in identifying and retrieving radiology reports citing critical imaging findings for three distinct diseases, pulmonary nodule, pneumothorax, and pulmonary embolus. Results: Both applications can be utilized for retrieval. iSCOUT and ANNIE had precision values between 0.90-0.98 and recall values between 0.79 and 0.94. ANNIE had consistently higher precision but required more customization. Conclusion: Understanding the customizations involved in utilizing NLP applications for various diseases will enable users to select the most suitable tool for specific tasks. PMID:22934127

  5. Natural language processing and inference rules as strategies for updating problem list in an electronic health record.

    PubMed

    Plazzotta, Fernando; Otero, Carlos; Luna, Daniel; de Quiros, Fernan Gonzalez Bernaldo

    2013-01-01

    Physicians do not always keep the problem list accurate, complete and updated. To analyze natural language processing (NLP) techniques and inference rules as strategies to maintain completeness and accuracy of the problem list in EHRs. Non-systematic literature review in PubMed, in the last 10 years. Strategies to maintain the EHR problem list were analyzed in two ways: inputting and removing problems from the problem list. NLP and inference rules have acceptable performance for inputting problems into the problem list. No studies using these techniques for removing problems were published. Conclusion: Both tools, NLP and inference rules, have had acceptable results as tools for maintaining the completeness and accuracy of the problem list.

  6. Cognition-Based Approaches for High-Precision Text Mining

    ERIC Educational Resources Information Center

    Shannon, George John

    2017-01-01

    This research improves the precision of information extraction from free-form text via the use of cognitive-based approaches to natural language processing (NLP). Cognitive-based approaches are an important, and relatively new, area of research in NLP and search, as well as linguistics. Cognitive approaches enable significant improvements in both…

  7. Common Ground: An Interactive Visual Exploration and Discovery for Complex Health Data

    DTIC Science & Technology

    2015-04-01

    working with Intermountain Healthcare on a new rich dataset extracted directly from medical notes using natural language processing (NLP) algorithms...probabilities based on a state-of-the-art NLP classifiers. At that stage the data did not include geographic information or temporal information but we

  8. Using rule-based natural language processing to improve disease normalization in biomedical text.

    PubMed

    Kang, Ning; Singh, Bharat; Afzal, Zubair; van Mulligen, Erik M; Kors, Jan A

    2013-01-01

    In order for computers to extract useful information from unstructured text, a concept normalization system is needed to link relevant concepts in a text to sources that contain further information about the concept. Popular concept normalization tools in the biomedical field are dictionary-based. In this study we investigate the usefulness of natural language processing (NLP) as an adjunct to dictionary-based concept normalization. We compared the performance of two biomedical concept normalization systems, MetaMap and Peregrine, on the Arizona Disease Corpus, with and without the use of a rule-based NLP module. Performance was assessed for exact and inexact boundary matching of the system annotations with those of the gold standard and for concept identifier matching. Without the NLP module, MetaMap and Peregrine attained F-scores of 61.0% and 63.9%, respectively, for exact boundary matching, and 55.1% and 56.9% for concept identifier matching. With the aid of the NLP module, the F-scores of MetaMap and Peregrine improved to 73.3% and 78.0% for boundary matching, and to 66.2% and 69.8% for concept identifier matching. For inexact boundary matching, performances further increased to 85.5% and 85.4%, and to 73.6% and 73.3% for concept identifier matching. We have shown the added value of NLP for the recognition and normalization of diseases with MetaMap and Peregrine. The NLP module is general and can be applied in combination with any concept normalization system. Whether its use for concept types other than disease is equally advantageous remains to be investigated.
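    The difference between exact and inexact boundary matching can be sketched as below. This assumes annotations are (start, end) character offsets with an exclusive end, and that "inexact" means any overlap between a system span and a gold span; the abstract does not define its matching criteria in this detail, so treat both as assumptions.

```python
def overlaps(a, b):
    """True if two (start, end) spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def f_score(system, gold, exact=True):
    """F-score of system spans against gold spans under exact or overlap matching."""
    if exact:
        match = lambda s, g: s == g
    else:
        match = overlaps
    matched_system = sum(any(match(s, g) for g in gold) for s in system)
    matched_gold = sum(any(match(s, g) for s in system) for g in gold)
    precision = matched_system / len(system) if system else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative spans only (not from the Arizona Disease Corpus).
gold = [(0, 5), (10, 20), (30, 35)]
system = [(0, 5), (12, 20), (40, 45)]
```

    On these illustrative spans, the system span (12, 20) fails exact matching against (10, 20) but counts under overlap matching, which is why inexact scores are never lower than exact ones.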

  9. Epidemiology of angina pectoris: role of natural language processing of the medical record

    PubMed Central

    Pakhomov, Serguei; Hemingway, Harry; Weston, Susan A.; Jacobsen, Steven J.; Rodeheffer, Richard; Roger, Véronique L.

    2007-01-01

    Background The diagnosis of angina is challenging as it relies on symptom descriptions. Natural language processing (NLP) of the electronic medical record (EMR) can provide access to such information contained in free text that may not be fully captured by conventional diagnostic coding. Objective To test the hypothesis that NLP of the EMR improves angina pectoris (AP) ascertainment over diagnostic codes. Methods Billing records of in- and out-patients were searched for ICD-9 codes for AP, chronic ischemic heart disease and chest pain. EMR clinical reports were searched electronically for 50 specific non-negated natural language synonyms to these ICD-9 codes. The two methods were compared to a standardized assessment of angina by Rose questionnaire for three diagnostic levels: unspecified chest pain, exertional chest pain, and Rose angina. Results Compared to the Rose questionnaire, the true positive rate of EMR-NLP for unspecified chest pain was 62% (95%CI:55–67) vs. 51% (95%CI:44–58) for diagnostic codes (p<0.001). For exertional chest pain, the EMR-NLP true positive rate was 71% (95%CI:61–80) vs. 62% (95%CI:52–73) for diagnostic codes (p=0.10). Both approaches had 88% (95%CI:65–100) true positive rate for Rose angina. The EMR-NLP method consistently identified more patients with exertional chest pain over 28-month follow-up. Conclusion EMR-NLP method improves the detection of unspecified and exertional chest pain cases compared to diagnostic codes. These findings have implications for epidemiological and clinical studies of angina pectoris. PMID:17383310

  10. A Morphological Analyzer for Vocalized or Not Vocalized Arabic Language

    NASA Astrophysics Data System (ADS)

    El Amine Abderrahim, Med; Breksi Reguig, Fethi

    This research presents the implementation of a morphological analyzer for the Arabic language (vocalized or not vocalized). The analyzer is based on our object model for Arabic Natural Language Processing (NLP) and can be exploited by NLP applications such as machine translation, spelling correction, and information retrieval.

  11. Natural language processing, pragmatics, and verbal behavior

    PubMed Central

    Cherpas, Chris

    1992-01-01

    Natural Language Processing (NLP) is that part of Artificial Intelligence (AI) concerned with endowing computers with verbal and listener repertoires, so that people can interact with them more easily. Most attention has been given to accurately parsing and generating syntactic structures, although NLP researchers are finding ways of handling the semantic content of language as well. It is increasingly apparent that understanding the pragmatic (contextual and consequential) dimension of natural language is critical for producing effective NLP systems. While there are some techniques for applying pragmatics in computer systems, they are piecemeal, crude, and lack an integrated theoretical foundation. Unfortunately, there is little awareness that Skinner's (1957) Verbal Behavior provides an extensive, principled pragmatic analysis of language. The implications of Skinner's functional analysis for NLP and for verbal aspects of epistemology lead to a proposal for a “user expert,” a computer system whose area of expertise is the long-term computer user. The evolutionary nature of behavior suggests an AI technology known as genetic algorithms/programming for implementing such a system. PMID:22477052

  12. Opiates Modulate Noxious Chemical Nociception through a Complex Monoaminergic/Peptidergic Cascade

    PubMed Central

    Mills, Holly; Ortega, Amanda; Law, Wenjing; Hapiak, Vera; Summers, Philip; Clark, Tobias

    2016-01-01

    The ability to detect noxious stimuli, process the nociceptive signal, and elicit an appropriate behavioral response is essential for survival. In Caenorhabditis elegans, opioid receptor agonists, such as morphine, mimic serotonin, and suppress the overall withdrawal from noxious stimuli through a pathway requiring the opioid-like receptor, NPR-17. This serotonin- or morphine-dependent modulation can be rescued in npr-17-null animals by the expression of npr-17 or a human κ opioid receptor in the two ASI sensory neurons, with ASI opioid signaling selectively inhibiting ASI neuropeptide release. Serotonergic modulation requires peptides encoded by both nlp-3 and nlp-24, and either nlp-3 or nlp-24 overexpression mimics morphine and suppresses withdrawal. Peptides encoded by nlp-3 act differentially, with only NLP-3.3 mimicking morphine, whereas other nlp-3 peptides antagonize NLP-3.3 modulation. Together, these results demonstrate that opiates modulate nociception in Caenorhabditis elegans through a complex monoaminergic/peptidergic cascade, and suggest that this model may be useful for dissecting opiate signaling in mammals. SIGNIFICANCE STATEMENT Opiates are used extensively to treat chronic pain. In Caenorhabditis elegans, opioid receptor agonists suppress the overall withdrawal from noxious chemical stimuli through a pathway requiring an opioid-like receptor and two distinct neuropeptide-encoding genes, with individual peptides from the same gene functioning antagonistically to modulate nociception. Endogenous opioid signaling functions as part of a complex, monoaminergic/peptidergic signaling cascade and appears to selectively inhibit neuropeptide release, mediated by an α-adrenergic-like receptor, from two sensory neurons. Importantly, receptor null animals can be rescued by the expression of the human κ opioid receptor, and injection of human opioid receptor ligands mimics exogenous opiates, highlighting the utility of this model for dissecting opiate signaling in mammals. PMID:27194330

  13. Using automatically extracted information from mammography reports for decision-support

    PubMed Central

    Bozkurt, Selen; Gimenez, Francisco; Burnside, Elizabeth S.; Gulkesen, Kemal H.; Rubin, Daniel L.

    2016-01-01

    Objective To evaluate a system we developed that connects natural language processing (NLP) for information extraction from narrative text mammography reports with a Bayesian network for decision-support about breast cancer diagnosis. The ultimate goal of this system is to provide decision support as part of the workflow of producing the radiology report. Materials and methods We built a system that uses an NLP information extraction system (which extracts BI-RADS descriptors and clinical information from mammography reports) to provide the necessary inputs to a Bayesian network (BN) decision support system (DSS) that estimates lesion malignancy from BI-RADS descriptors. We used this integrated system to predict diagnosis of breast cancer from radiology text reports and evaluated it with a reference standard of 300 mammography reports. We collected two different outputs from the DSS: (1) the probability of malignancy and (2) the BI-RADS final assessment category. Since NLP may produce imperfect inputs to the DSS, we compared the difference between using perfect (“reference standard”) structured inputs to the DSS (“RS-DSS”) vs NLP-derived inputs (“NLP-DSS”) on the output of the DSS using the concordance correlation coefficient. We measured the classification accuracy of the BI-RADS final assessment category when using NLP-DSS, compared with the ground truth category established by the radiologist. Results The NLP-DSS and RS-DSS had closely matched probabilities, with a mean paired difference of 0.004 ± 0.025. The concordance correlation of these paired measures was 0.95. The accuracy of the NLP-DSS to predict the correct BI-RADS final assessment category was 97.58%. Conclusion The accuracy of the information extracted from mammography reports using the NLP system was sufficient to provide accurate DSS results. We believe our system could ultimately reduce the variation in practice in mammography related to assessment of malignant lesions and improve management decisions. PMID:27388877
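
    For readers unfamiliar with the concordance correlation coefficient used to compare the RS-DSS and NLP-DSS outputs, the following is a minimal sketch of Lin's formula, rho_c = 2*s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2), with population (1/n) moments. The abstract does not say which estimator variant the authors used, and the example values are toy numbers, not the study's probabilities.

```python
def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for paired measurements."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population (1/n) variances and covariance.
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0:
        raise ValueError("coefficient is undefined for identical constant inputs")
    return 2 * cov_xy / denominator


# Toy paired values: y equals x shifted by a constant offset of 1.
toy_x = [1.0, 2.0, 3.0]
toy_y = [2.0, 3.0, 4.0]
ccc_shifted = concordance_correlation(toy_x, toy_y)
ccc_identical = concordance_correlation(toy_x, toy_x)
```

    Unlike Pearson correlation, which would be 1 for the shifted pair, the coefficient penalizes the systematic offset, so it measures agreement rather than mere linear association.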

  14. Integrating UIMA annotators in a web-based text processing framework.

    PubMed

    Chen, Xiang; Arnold, Corey W

    2013-01-01

    The Unstructured Information Management Architecture (UIMA) framework is a growing platform for natural language processing (NLP) applications. However, such applications may be difficult for non-technical users to deploy. This project presents a web-based framework that wraps UIMA-based annotator systems into a graphical user interface for researchers and clinicians, and a web service for developers. An annotator that extracts data elements from lung cancer radiology reports is presented to illustrate the use of the system. Annotation results from the web system can be exported to multiple formats for users to utilize in other aspects of their research and workflow. This project demonstrates the benefits of a lay-user interface for complex NLP applications. Efforts such as this can lead to increased interest and support for NLP work in the clinical domain.

  15. An Evolving Ecosystem for Natural Language Processing in Department of Veterans Affairs.

    PubMed

    Garvin, Jennifer H; Kalsy, Megha; Brandt, Cynthia; Luther, Stephen L; Divita, Guy; Coronado, Gregory; Redd, Doug; Christensen, Carrie; Hill, Brent; Kelly, Natalie; Treitler, Qing Zeng

    2017-02-01

    In an ideal clinical Natural Language Processing (NLP) ecosystem, researchers and developers would be able to collaborate with others, undertake validation of NLP systems, components, and related resources, and disseminate them. We captured requirements and formative evaluation data from the Veterans Affairs (VA) Clinical NLP Ecosystem stakeholders using semi-structured interviews and meeting discussions. We developed a coding rubric to code interviews. We assessed inter-coder reliability using percent agreement and the kappa statistic. We undertook 15 interviews and held two workshop discussions. The main areas of requirements related to design and functionality, resources, and information. Stakeholders also confirmed the vision of the second generation of the Ecosystem, and recommendations included adding mechanisms to better understand terms, measuring collaboration to demonstrate value, and datasets/tools to navigate spelling errors with consumer language, among others. Stakeholders also recommended the capability to communicate with developers working on the next version of the VA electronic health record (VistA Evolution), to provide a mechanism to automatically monitor downloads of tools, and to automatically provide a summary of the downloads to Ecosystem contributors and funders. After three rounds of coding and discussion, we determined the percent agreement of two coders to be 97.2% and the kappa to be 0.7851. The vision of the VA Clinical NLP Ecosystem met stakeholder needs. Interviews and discussion provided key requirements that inform the design of the VA Clinical NLP Ecosystem.
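
    The two inter-coder reliability measures named in this abstract can be sketched as follows. The abstract does not state which kappa variant was used, so Cohen's kappa for two coders is an assumption here, and the code labels are hypothetical (they borrow the requirement areas mentioned above purely for illustration).

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Fraction of items on which the two coders assigned the same code."""
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(coder_a)
    p_observed = percent_agreement(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement from each coder's marginal label frequencies.
    p_expected = sum(
        counts_a[label] * counts_b[label] for label in set(counts_a) | set(counts_b)
    ) / (n * n)
    if p_expected == 1:
        raise ValueError("kappa is undefined when both coders use a single identical label")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical codes assigned by two coders to six interview segments.
coder_a = ["design", "design", "resources", "information", "resources", "design"]
coder_b = ["design", "resources", "resources", "information", "resources", "design"]

agreement = percent_agreement(coder_a, coder_b)
kappa = cohens_kappa(coder_a, coder_b)
```

    Kappa is lower than raw agreement because part of the observed agreement is expected by chance, which is consistent with the study reporting a high percent agreement alongside a more moderate kappa.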

  16. Automated processing of electronic medical records is a reliable method of determining aspirin use in populations at risk for cardiovascular events.

    PubMed

    Pakhomov, Serguei Vs; Shah, Nilay D; Hanson, Penny; Balasubramaniam, Saranya C; Smith, Steven A

    2010-01-01

    Low-dose aspirin reduces cardiovascular risk; however, monitoring over-the-counter medication use relies on the time-consuming and costly manual review of medical records. Our objective is to validate natural language processing (NLP) of the electronic medical record (EMR) for extracting medication exposure and contraindication information. The text of EMRs for 499 patients with type 2 diabetes was searched using NLP for evidence of aspirin use and its contraindications. The results were compared to a standardised manual records review. Of the 499 patients, 351 (70%) were using aspirin and 148 (30%) were not, according to manual review. NLP correctly identified 346 of the 351 aspirin-positive and 134 of the 148 aspirin-negative patients, indicating a sensitivity of 99% (95% CI 97-100) and specificity of 91% (95% CI 88-97). Of the 148 aspirin-negative patients, 66 (45%) had contraindications and 82 (55%) did not, according to manual review. NLP search for contraindications correctly identified 61 of the 66 patients with contraindications and 58 of the 82 patients without, yielding a sensitivity of 92% (95% CI 84-97) and a specificity of 71% (95% CI 60-80). NLP of the EMR is accurate in ascertaining documented aspirin use and could potentially be used for epidemiological research as a source of cardiovascular risk factor information.
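
    The sensitivity and specificity figures in this abstract follow directly from the counts it reports, as the short sketch below shows. Only the point estimates are reproduced; the confidence intervals are not recomputed because the abstract does not state the interval method.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of actual positives that were correctly identified."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Proportion of actual negatives that were correctly identified."""
    return true_negatives / (true_negatives + false_positives)


# Aspirin use: NLP found 346 of 351 users and 134 of 148 non-users.
aspirin_sensitivity = sensitivity(346, 351 - 346)
aspirin_specificity = specificity(134, 148 - 134)

# Contraindications: NLP found 61 of 66 patients with and 58 of 82 without.
contra_sensitivity = sensitivity(61, 66 - 61)
contra_specificity = specificity(58, 82 - 58)
```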

  17. Common data model for natural language processing based on two existing standard information models: CDA+GrAF.

    PubMed

    Meystre, Stéphane M; Lee, Sanghoon; Jung, Chai Young; Chevrier, Raphaël D

    2012-08-01

    An increasing need for collaboration and resources sharing in the Natural Language Processing (NLP) research and development community motivates efforts to create and share a common data model and a common terminology for all information annotated and extracted from clinical text. We have combined two existing standards: the HL7 Clinical Document Architecture (CDA), and the ISO Graph Annotation Format (GrAF; in development), to develop such a data model entitled "CDA+GrAF". We experimented with several methods to combine these existing standards, and eventually selected a method wrapping separate CDA and GrAF parts in a common standoff annotation (i.e., separate from the annotated text) XML document. Two use cases, clinical document sections, and the 2010 i2b2/VA NLP Challenge (i.e., problems, tests, and treatments, with their assertions and relations), were used to create examples of such standoff annotation documents, and were successfully validated with the XML schemata provided with both standards. We developed a tool to automatically translate annotation documents from the 2010 i2b2/VA NLP Challenge format to GrAF, and automatically generated 50 annotation documents using this tool, all successfully validated. Finally, we adapted the XSL stylesheet provided with HL7 CDA to allow viewing annotation XML documents in a web browser, and plan to adapt existing tools for translating annotation documents between CDA+GrAF and the UIMA and GATE frameworks. This common data model may ease directly comparing NLP tools and applications, combining their output, transforming and "translating" annotations between different NLP applications, and eventually "plug-and-play" of different modules in NLP applications.
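
    The idea of standoff annotation (annotations stored separately from the text and pointing into it by offset) can be shown with a minimal sketch. This is not the CDA+GrAF XML format described in the abstract; the `Annotation` class, the `annotate` helper, and the example sentence are all hypothetical, and only the "problem" and "treatment" labels echo the i2b2/VA categories mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A standoff annotation: character offsets into the text plus a label."""
    start: int
    end: int
    label: str


def annotate(text, phrase, label):
    """Build an annotation for the first occurrence of phrase in text."""
    start = text.index(phrase)
    return Annotation(start, start + len(phrase), label)


def covered_text(text, annotation):
    """Recover the annotated span from the untouched source text."""
    return text[annotation.start:annotation.end]


# The source text is never modified; annotations live in a separate list.
text = "Patient denies chest pain. Started aspirin 81 mg daily."
annotations = [
    annotate(text, "chest pain", "problem"),
    annotate(text, "aspirin", "treatment"),
]
labeled_spans = [(covered_text(text, a), a.label) for a in annotations]
```

    Because the text itself stays unchanged, several tools can attach their own annotation layers to the same document, which is what makes comparing or combining their output practical.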

  18. Neuro-Linguistic Programming, Matching Sensory Predicates, and Rapport.

    ERIC Educational Resources Information Center

    Schmedlen, George W.; And Others

    A key task for the therapist in psychotherapy is to build trust and rapport with the client. Neuro-Linguistic Programming (NLP) practitioners believe that matching the sensory modality (representational system) of a client's predicates (verbs, adverbs, and adjectives) improves rapport. In this study, 16 volunteer subjects participated in two…

  19. Visual Exploration of Semantic Relationships in Neural Word Embeddings

    DOE PAGES

    Liu, Shusen; Bremer, Peer-Timo; Thiagarajan, Jayaraman J.; ...

    2017-08-29

    Constructing distributed representations for words through neural language models and using the resulting vector spaces for analysis has become a crucial component of natural language processing (NLP). But, despite their widespread application, little is known about the structure and properties of these spaces. To gain insights into the relationship between words, the NLP community has begun to adapt high-dimensional visualization techniques. Particularly, researchers commonly use t-distributed stochastic neighbor embeddings (t-SNE) and principal component analysis (PCA) to create two-dimensional embeddings for assessing the overall structure and exploring linear relationships (e.g., word analogies), respectively. Unfortunately, these techniques often produce mediocre or even misleading results and cannot address domain-specific visualization challenges that are crucial for understanding semantic relationships in word embeddings. We introduce new embedding techniques for visualizing semantic and syntactic analogies, and the corresponding tests to determine whether the resulting views capture salient structures. Additionally, we introduce two novel views for a comprehensive study of analogy relationships. Finally, we augment t-SNE embeddings to convey uncertainty information in order to allow a reliable interpretation. Combined, the different views address a number of domain-specific tasks difficult to solve with existing tools.

  20. Visual Exploration of Semantic Relationships in Neural Word Embeddings

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Shusen; Bremer, Peer-Timo; Thiagarajan, Jayaraman J.

    Constructing distributed representations for words through neural language models and using the resulting vector spaces for analysis has become a crucial component of natural language processing (NLP). But, despite their widespread application, little is known about the structure and properties of these spaces. To gain insights into the relationship between words, the NLP community has begun to adapt high-dimensional visualization techniques. Particularly, researchers commonly use t-distributed stochastic neighbor embeddings (t-SNE) and principal component analysis (PCA) to create two-dimensional embeddings for assessing the overall structure and exploring linear relationships (e.g., word analogies), respectively. Unfortunately, these techniques often produce mediocre or even misleading results and cannot address domain-specific visualization challenges that are crucial for understanding semantic relationships in word embeddings. We introduce new embedding techniques for visualizing semantic and syntactic analogies, and the corresponding tests to determine whether the resulting views capture salient structures. Additionally, we introduce two novel views for a comprehensive study of analogy relationships. Finally, we augment t-SNE embeddings to convey uncertainty information in order to allow a reliable interpretation. Combined, the different views address a number of domain-specific tasks difficult to solve with existing tools.

  1. Mastering Overdetection and Underdetection in Learner-Answer Processing: Simple Techniques for Analysis and Diagnosis

    ERIC Educational Resources Information Center

    Blanchard, Alexia; Kraif, Olivier; Ponton, Claude

    2009-01-01

    This paper presents a "didactic triangulation" strategy to cope with the problem of reliability of NLP applications for computer-assisted language learning (CALL) systems. It is based on the implementation of basic but well mastered NLP techniques and puts the emphasis on an adapted gearing between computable linguistic clues and didactic features…

  2. Finding 'Evidence of Absence' in Medical Notes: Using NLP for Clinical Inferencing.

    PubMed

    Carter, Marjorie E; Divita, Guy; Redd, Andrew; Rubin, Michael A; Samore, Matthew H; Gupta, Kalpana; Trautner, Barbara W; Gundlapalli, Adi V

    2016-01-01

    Extracting evidence of the absence of a target of interest from medical text can be useful in clinical inferencing. The purpose of our study was to develop a natural language processing (NLP) pipeline to identify the presence of indwelling urinary catheters from electronic medical notes to aid in detection of catheter-associated urinary tract infections (CAUTI). Finding clear evidence that a patient does not have an indwelling urinary catheter is useful in making a determination regarding CAUTI. We developed a lexicon of seven core concepts to infer the absence of a urinary catheter. Of the 990,391 concepts extracted by NLP from a large corpus of 744,285 electronic medical notes from 5589 hospitalized patients, 63,516 were labeled as evidence of absence. Human review revealed three primary causes for false negatives. The lexicon and NLP pipeline were refined using this information, resulting in outputs with an acceptable false positive rate of 11%.

  3. Comparison of Caenorhabditis elegans NLP peptides with arthropod neuropeptides.

    PubMed

    Husson, Steven J; Lindemans, Marleen; Janssen, Tom; Schoofs, Liliane

    2009-04-01

    Neuropeptides are small messenger molecules that can be found in all metazoans, where they govern a diverse array of physiological processes. Because neuropeptides seem to be conserved among pest species, selected peptides can be considered as attractive targets for drug discovery. Much can be learned from the model system Caenorhabditis elegans because of the availability of a sequenced genome and state-of-the-art postgenomic technologies that enable characterization of endogenous peptides derived from neuropeptide-like protein (NLP) precursors. Here, we provide an overview of the NLP peptide family in C. elegans and discuss their resemblance with arthropod neuropeptides and their relevance for anthelmintic discovery.

  4. Chinese Sentence Classification Based on Convolutional Neural Network

    NASA Astrophysics Data System (ADS)

    Gu, Chengwei; Wu, Ming; Zhang, Chuang

    2017-10-01

    Sentence classification is one of the significant issues in Natural Language Processing (NLP). Feature extraction is often regarded as the key point for natural language processing. Traditional machine learning methods, such as the Naive Bayesian Model, cannot take high-level features into consideration. A neural network for sentence classification can make use of contextual information to achieve better results in sentence classification tasks. In this paper, we focus on classifying Chinese sentences. Most importantly, we propose a novel Convolutional Neural Network (CNN) architecture for Chinese sentence classification. In particular, whereas most previous methods use a softmax classifier for prediction, we embed a linear support vector machine in place of softmax in the deep neural network model, minimizing a margin-based loss to get a better result. We also use tanh as the activation function instead of ReLU. The CNN model improves the results of Chinese sentence classification tasks. Experimental results on the Chinese news title database validate the effectiveness of our model.
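
    The idea of replacing a softmax output with a linear SVM trained on a margin-based loss can be sketched as below. The abstract does not give the exact loss, so the multiclass hinge loss used here is an assumption, and the function names, weights, and scores are hypothetical. The convolutional layers themselves are omitted; `pooled` stands in for their output.

```python
import math


def tanh_features(pooled):
    """Apply the tanh activation to pooled convolutional features."""
    return [math.tanh(value) for value in pooled]


def linear_scores(features, weights, biases):
    """One linear score per class: w_k . features + b_k."""
    return [
        sum(w * f for w, f in zip(row, features)) + bias
        for row, bias in zip(weights, biases)
    ]


def multiclass_hinge_loss(scores, target, margin=1.0):
    """Sum of margin violations of every wrong class against the target class."""
    return sum(
        max(0.0, margin + score - scores[target])
        for index, score in enumerate(scores)
        if index != target
    )


# Hypothetical class scores for one sentence whose true class is 0.
loss_close = multiclass_hinge_loss([2.0, 0.5, 1.5], target=0)
loss_separated = multiclass_hinge_loss([3.0, 0.0, 0.0], target=0)

# Scores produced from toy features and hypothetical weights.
toy_scores = linear_scores(tanh_features([0.0, 0.0]), [[1.0, -1.0], [0.5, 0.5]], [0.25, -0.25])
```

    The loss is zero once the correct class outscores every other class by at least the margin, so training pushes for separation rather than calibrated probabilities, which is the main contrast with softmax cross-entropy.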

  5. Increased expression of Nlp, a potential oncogene in ovarian cancer, and its implication in carcinogenesis.

    PubMed

    Qu, Danni; Qu, Hongyan; Fu, Ming; Zhao, Xuelian; Liu, Rong; Sui, Lihua; Zhan, Qimin

    2008-08-01

    Nlp (Ninein-like protein), a novel centrosome protein involved in microtubule nucleation, has been studied extensively in our laboratory, and its overexpression has been found in some human tumors. To understand the role of Nlp in human ovarian cancer development, we studied the correlation of Nlp expression with clinicopathological parameters and survival in epithelial ovarian cancer, and the impact of Nlp overexpression on ovarian cancer cells. Nlp expression in normal, borderline, benign and malignant epithelial ovarian tissues was examined by immunohistochemistry. The correlation between Nlp expression and tumor grade, FIGO stage and histological type was also evaluated. Survival was calculated using Kaplan-Meier estimates. Cell proliferation and apoptosis were assayed after stable transfection of pEGFP-C3-Nlp or empty vector in human ovarian cancer cell line SKOV3. Nlp was positive in 1 of 10 (10%) normal ovarian tissues, 5 of 34 (14.7%) benign tumors, 9 of 26 (34.6%) borderline tumors and 73 of 131 (56.0%) ovarian tumors. Nlp immunoreactivity intensity significantly correlated with tumor grade, but not with FIGO stage or histological type. Kaplan-Meier curves showed that Nlp overexpression was marginally associated with decreased overall survival. Overexpression of Nlp enhanced proliferation and inhibited apoptosis induced by paclitaxel in the SKOV3 cell line. Overexpression of Nlp in ovarian tumors raises the possibility that Nlp may play a role in ovarian carcinogenesis.

  6. Overexpression of centrosomal protein Nlp confers breast carcinoma resistance to paclitaxel.

    PubMed

    Zhao, Weihong; Song, Yongmei; Xu, Binghe; Zhan, Qimin

    2012-02-01

    Nlp (ninein-like protein), an important molecule involved in centrosome maturation and spindle formation, plays an important role in tumorigenesis and its abnormal expression was recently observed in human breast and lung cancers. In this study, the correlation between overexpression of Nlp and paclitaxel chemosensitivity was investigated to explore the mechanisms of resistance to paclitaxel and to understand the effect of Nlp upon apoptosis induced by chemotherapeutic agents. Nlp expression vector was stably transfected into breast cancer MCF-7 cells. With Nlp overexpression, the survival rates, cell cycle distributions and apoptosis were analyzed in transfected MCF-7 cells by MTT test and FCM approach. The immunofluorescent assay was employed to detect the changes of microtubule after paclitaxel treatment. Immunoblotting analysis was used to examine expression of centrosomal proteins and apoptosis associated proteins. Subsequently, Nlp expression was retrospectively examined with 55 breast cancer samples derived from paclitaxel treated patients. Interestingly, the survival rates of Nlp-overexpressing MCF-7 cells were higher than those of controls after paclitaxel treatment. Nlp overexpression promoted G2-M arrest and attenuated apoptosis induced by paclitaxel, which was coupled with elevated Bcl-2 protein. Nlp expression significantly lessened the microtubule polymerization and bundling elicited by paclitaxel, which was attributed to alterations in the structure or dynamics of β-tubulin but not in its expression. The breast cancer patients with high expression of Nlp were likely resistant to the treatment of paclitaxel, as the response rate was 62.5% in Nlp-negative patients, whereas it was 58.3% and 15.8% in Nlp (+) and Nlp (++) patients, respectively (p = 0.015). Nlp expression was positively correlated with that of Plk1 and PCNA. These findings provide insights into more rational chemotherapeutic regimens in clinical practice, and more effective approaches might be developed through targeting Nlp to increase chemotherapeutic sensitivity.

  7. Advancing Research in Second Language Writing through Computational Tools and Machine Learning Techniques: A Research Agenda

    ERIC Educational Resources Information Center

    Crossley, Scott A.

    2013-01-01

    This paper provides an agenda for replication studies focusing on second language (L2) writing and the use of natural language processing (NLP) tools and machine learning algorithms. Specifically, it introduces a range of the available NLP tools and machine learning algorithms and demonstrates how these could be used to replicate seminal studies…

  8. Does Evidence-Based PTS Treatment Reduce PTS Symptoms and Suicide in Iraq and Afghanistan Veterans Seeking VA Care

    DTIC Science & Technology

    We succeeded in developing a Natural Language Processing (NLP) System with excellent performance characteristics for determining the type of...people (quadruple-annotated) and 7,226 of which were double annotated. We also developed an NLP system to extract PT Checklist (PCL) scores from clinical notes with excellent accuracy (98% positive predictive value).

  9. Toward a Learning Health-care System – Knowledge Delivery at the Point of Care Empowered by Big Data and NLP

    PubMed Central

    Kaggal, Vinod C.; Elayavilli, Ravikumar Komandur; Mehrabi, Saeed; Pankratz, Joshua J.; Sohn, Sunghwan; Wang, Yanshan; Li, Dingcheng; Rastegar, Majid Mojarad; Murphy, Sean P.; Ross, Jason L.; Chaudhry, Rajeev; Buntrock, James D.; Liu, Hongfang

    2016-01-01

    The concept of optimizing health care by understanding and generating knowledge from previous evidence, ie, the Learning Health-care System (LHS), has gained momentum and now has national prominence. Meanwhile, the rapid adoption of electronic health records (EHRs) enables the data collection required to form the basis for facilitating LHS. A prerequisite for using EHR data within the LHS is an infrastructure that enables access to EHR data longitudinally for health-care analytics and real time for knowledge delivery. Additionally, significant clinical information is embedded in the free text, making natural language processing (NLP) an essential component in implementing an LHS. Herein, we share our institutional implementation of a big data-empowered clinical NLP infrastructure, which not only enables health-care analytics but also has real-time NLP processing capability. The infrastructure has been utilized for multiple institutional projects including the MayoExpertAdvisor, an individualized care recommendation solution for clinical care. We compared the advantages of big data over two other environments. Big data infrastructure significantly outperformed other infrastructure in terms of computing speed, demonstrating its value in making the LHS a possibility in the near future. PMID:27385912

  10. Toward a Learning Health-care System - Knowledge Delivery at the Point of Care Empowered by Big Data and NLP.

    PubMed

    Kaggal, Vinod C; Elayavilli, Ravikumar Komandur; Mehrabi, Saeed; Pankratz, Joshua J; Sohn, Sunghwan; Wang, Yanshan; Li, Dingcheng; Rastegar, Majid Mojarad; Murphy, Sean P; Ross, Jason L; Chaudhry, Rajeev; Buntrock, James D; Liu, Hongfang

    2016-01-01

    The concept of optimizing health care by understanding and generating knowledge from previous evidence, ie, the Learning Health-care System (LHS), has gained momentum and now has national prominence. Meanwhile, the rapid adoption of electronic health records (EHRs) enables the data collection required to form the basis for facilitating LHS. A prerequisite for using EHR data within the LHS is an infrastructure that enables access to EHR data longitudinally for health-care analytics and real time for knowledge delivery. Additionally, significant clinical information is embedded in the free text, making natural language processing (NLP) an essential component in implementing an LHS. Herein, we share our institutional implementation of a big data-empowered clinical NLP infrastructure, which not only enables health-care analytics but also has real-time NLP processing capability. The infrastructure has been utilized for multiple institutional projects including the MayoExpertAdvisor, an individualized care recommendation solution for clinical care. We compared the advantages of big data over two other environments. Big data infrastructure significantly outperformed other infrastructure in terms of computing speed, demonstrating its value in making the LHS a possibility in the near future.

  11. Dietary Nanosized Lactobacillus plantarum Enhances the Anticancer Effect of Kimchi on Azoxymethane and Dextran Sulfate Sodium-Induced Colon Cancer in C57BL/6J Mice.

    PubMed

    Lee, Hyun Ah; Kim, Hyunung; Lee, Kwang-Won; Park, Kun-Young

    2016-01-01

    This study was undertaken to evaluate enhancement of the chemopreventive properties of kimchi by dietary nanosized Lactobacillus (Lab.) plantarum (nLp) in an azoxymethane (AOM)/dextran sulfate sodium (DSS)-induced colitis-associated colorectal cancer C57BL/6J mouse model. nLp is a dead, shrunken, processed form of Lab. plantarum isolated from kimchi that is 0.5-1.0 µm in size. The results obtained showed that animals fed kimchi with nLp (K-nLp) had longer colons and lower colon weight/length ratios and developed fewer tumors than mice fed kimchi alone (K). In addition, K-nLp administration reduced serum levels of proinflammatory cytokines and mediated the mRNA and protein expressions of inflammatory, apoptotic, and cell-cycle markers to suppress inflammation and induce tumor-cell apoptosis and cell-cycle arrest. Moreover, it elevated natural killer-cell cytotoxicity. The study suggests adding nLp to kimchi could improve the suppressive effect of kimchi on AOM/DSS-induced colorectal cancer. These findings indicate nLp has potential use as a functional chemopreventive ingredient in the food industry.

  12. Toward a complete dataset of drug-drug interaction information from publicly available sources.

    PubMed

    Ayvaz, Serkan; Horn, John; Hassanzadeh, Oktie; Zhu, Qian; Stan, Johann; Tatonetti, Nicholas P; Vilar, Santiago; Brochhausen, Mathias; Samwald, Matthias; Rastegar-Mojarad, Majid; Dumontier, Michel; Boyce, Richard D

    2015-06-01

    Although potential drug-drug interactions (PDDIs) are a significant source of preventable drug-related harm, there is currently no single complete source of PDDI information. In the current study, all publicly available sources of PDDI information that could be identified using a comprehensive and broad search were combined into a single dataset. The combined dataset merged fourteen different sources including 5 clinically-oriented information sources, 4 Natural Language Processing (NLP) Corpora, and 5 Bioinformatics/Pharmacovigilance information sources. As a comprehensive PDDI source, the merged dataset might benefit the pharmacovigilance text mining community by making it possible to compare the representativeness of NLP corpora for PDDI text extraction tasks, and specifying elements that can be useful for future PDDI extraction purposes. An analysis of the overlap between and across the data sources showed that there was little overlap. Even comprehensive PDDI lists such as DrugBank, KEGG, and the NDF-RT had less than 50% overlap with each other. Moreover, all of the comprehensive lists had incomplete coverage of two data sources that focus on PDDIs of interest in most clinical settings. Based on this information, we think that systems that provide access to the comprehensive lists, such as APIs into RxNorm, should be careful to inform users that the lists may be incomplete with respect to PDDIs that drug experts suggest clinicians be aware of. In spite of the low degree of overlap, several dozen cases were identified where PDDI information provided in drug product labeling might be augmented by the merged dataset. Moreover, the combined dataset was also shown to improve the performance of an existing PDDI NLP pipeline and a recently published PDDI pharmacovigilance protocol. Future work will focus on improvement of the methods for mapping between PDDI information sources, identifying methods to improve the use of the merged dataset in PDDI NLP algorithms, integrating high-quality PDDI information from the merged dataset into Wikidata, and making the combined dataset accessible as Semantic Web Linked Data. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
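
    The overlap analysis described in this abstract can be illustrated with a minimal sketch. The abstract does not say how overlap was computed, so the definition used here (the fraction of one source's drug pairs that also appear in another, with pairs treated as unordered) is an assumption, and the function names and the sample drug pairs are hypothetical placeholders rather than data from the study.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Pair = FrozenSet[str]

def normalize(pairs: Iterable[Tuple[str, str]]) -> Set[Pair]:
    """Treat each interaction as an unordered, case-insensitive drug pair."""
    return {frozenset((a.lower(), b.lower())) for a, b in pairs}

def overlap(source_a: Set[Pair], source_b: Set[Pair]) -> float:
    """Fraction of pairs in source_a that also occur in source_b (assumed definition)."""
    if not source_a:
        return 0.0
    return len(source_a & source_b) / len(source_a)

# Hypothetical sample sources, for illustration only.
list_a = normalize([("warfarin", "aspirin"), ("simvastatin", "clarithromycin")])
list_b = normalize([("Aspirin", "Warfarin")])

print(overlap(list_a, list_b))  # share of list_a covered by list_b
print(overlap(list_b, list_a))  # share of list_b covered by list_a
```

    Because this measure is asymmetric, a small source can be fully covered by a large one while the reverse coverage stays low, which is one reason to report both directions.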

  13. Inappropriate Expression of an NLP Effector in Colletotrichum orbiculare Impairs Infection on Cucurbitaceae Cultivars via Plant Recognition of the C-Terminal Region.

    PubMed

    Azmi, Nur Sabrina Ahmad; Singkaravanit-Ogawa, Suthitar; Ikeda, Kyoko; Kitakura, Saeko; Inoue, Yoshihiro; Narusaka, Yoshihiro; Shirasu, Ken; Kaido, Masanori; Mise, Kazuyuki; Takano, Yoshitaka

    2018-01-01

    The hemibiotrophic pathogen Colletotrichum orbiculare preferentially expresses a necrosis and ethylene-inducing peptide 1 (Nep1)-like protein named NLP1 during the switch to necrotrophy. Here, we report that the constitutive expression of NLP1 in C. orbiculare blocks pathogen infection in multiple Cucurbitaceae cultivars via their enhanced defense responses. NLP1 has a cytotoxic activity that induces cell death in Nicotiana benthamiana. However, C. orbiculare transgenic lines constitutively expressing a mutant NLP1 lacking the cytotoxic activity still failed to infect cucumber, indicating no clear relationship between cytotoxic activity and the NLP1-dependent enhanced defense. NLP1 also possesses the microbe-associated molecular pattern (MAMP) sequence called nlp24, recognized by Arabidopsis thaliana at its central region, similar to NLPs of other pathogens. Surprisingly, inappropriate expression of a mutant NLP1 lacking the MAMP signature is also effective for blocking pathogen infection, uncoupling the infection block from the corresponding MAMP. Notably, the deletion analyses of NLP1 suggested that the C-terminal region of NLP1 is critical to enhance defense in cucumber. The expression of mCherry fused with the C-terminal 32 amino acids of NLP1 was enough to trigger the defense of cucurbits, revealing that the C-terminal region of the NLP1 protein is recognized by cucurbits and, then, terminates C. orbiculare infection.

  14. Terminology model discovery using natural language processing and visualization techniques.

    PubMed

    Zhou, Li; Tao, Ying; Cimino, James J; Chen, Elizabeth S; Liu, Hongfang; Lussier, Yves A; Hripcsak, George; Friedman, Carol

    2006-12-01

    Medical terminologies are important for unambiguous encoding and exchange of clinical information. The traditional manual method of developing terminology models is time-consuming and limited in the number of phrases that a human developer can examine. In this paper, we present an automated method for developing medical terminology models based on natural language processing (NLP) and information visualization techniques. Surgical pathology reports were selected as the testing corpus for developing a pathology procedure terminology model. The use of a general NLP processor for the medical domain, MedLEE, provides an automated method for acquiring semantic structures from a free text corpus and sheds light on a new high-throughput method of medical terminology model development. The use of an information visualization technique supports the summarization and visualization of the large quantity of semantic structures generated from medical documents. We believe that a general method based on NLP and information visualization will facilitate the modeling of medical terminologies.

  15. Workshop on using natural language processing applications for enhancing clinical decision making: an executive summary

    PubMed Central

    Pai, Vinay M; Rodgers, Mary; Conroy, Richard; Luo, James; Zhou, Ruixia; Seto, Belinda

    2014-01-01

    In April 2012, the National Institutes of Health organized a two-day workshop entitled ‘Natural Language Processing: State of the Art, Future Directions and Applications for Enhancing Clinical Decision-Making’ (NLP-CDS). This report is a summary of the discussions during the second day of the workshop. Collectively, the workshop presenters and participants emphasized the need for unstructured clinical notes to be included in the decision making workflow and the need for individualized longitudinal data tracking. The workshop also discussed the need to: (1) combine evidence-based literature and patient records with machine-learning and prediction models; (2) provide trusted and reproducible clinical advice; (3) prioritize evidence and test results; and (4) engage healthcare professionals, caregivers, and patients. The overall consensus of the NLP-CDS workshop was that there are promising opportunities for NLP and CDS to deliver cognitive support for healthcare professionals, caregivers, and patients. PMID:23921193

  16. Towards comprehensive syntactic and semantic annotations of the clinical narrative

    PubMed Central

    Albright, Daniel; Lanfranchi, Arrick; Fredriksen, Anwen; Styler, William F; Warner, Colin; Hwang, Jena D; Choi, Jinho D; Dligach, Dmitriy; Nielsen, Rodney D; Martin, James; Ward, Wayne; Palmer, Martha; Savova, Guergana K

    2013-01-01

    Objective: To create annotated clinical narratives with layers of syntactic and semantic labels to facilitate advances in clinical natural language processing (NLP). To develop NLP algorithms and open source components. Methods: Manual annotation of a clinical narrative corpus of 127 606 tokens following the Treebank schema for syntactic information, PropBank schema for predicate-argument structures, and the Unified Medical Language System (UMLS) schema for semantic information. NLP components were developed. Results: The final corpus consists of 13 091 sentences containing 1772 distinct predicate lemmas. Of the 766 newly created PropBank frames, 74 are verbs. There are 28 539 named entity (NE) annotations spread over 15 UMLS semantic groups, one UMLS semantic type, and the Person semantic category. The most frequent annotations belong to the UMLS semantic groups of Procedures (15.71%), Disorders (14.74%), Concepts and Ideas (15.10%), Anatomy (12.80%), Chemicals and Drugs (7.49%), and the UMLS semantic type of Sign or Symptom (12.46%). Inter-annotator agreement results: Treebank (0.926), PropBank (0.891–0.931), NE (0.697–0.750). The part-of-speech tagger, constituency parser, dependency parser, and semantic role labeler are built from the corpus and released open source. A significant limitation uncovered by this project is the need for the NLP community to develop a widely agreed-upon schema for the annotation of clinical concepts and their relations. Conclusions: This project takes a foundational step towards bringing the field of clinical NLP up to par with NLP in the general domain. The corpus creation and NLP components provide a resource for research and application development that would have been previously impossible. PMID:23355458

  17. Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features.

    PubMed

    Nikfarjam, Azadeh; Sarker, Abeed; O'Connor, Karen; Ginn, Rachel; Gonzalez, Graciela

    2015-05-01

    Social media is becoming increasingly popular as a platform for sharing personal health-related information. This information can be utilized for public health monitoring tasks, particularly for pharmacovigilance, via the use of natural language processing (NLP) techniques. However, the language in social media is highly informal, and user-expressed medical concepts are often nontechnical, descriptive, and challenging to extract. There has been limited progress in addressing these challenges, and thus far, advanced machine learning-based NLP techniques have been underutilized. Our objective is to design a machine learning-based approach to extract mentions of adverse drug reactions (ADRs) from highly informal text in social media. We introduce ADRMine, a machine learning-based concept extraction system that uses conditional random fields (CRFs). ADRMine utilizes a variety of features, including a novel feature for modeling words' semantic similarities. The similarities are modeled by clustering words based on unsupervised, pretrained word representation vectors (embeddings) generated from unlabeled user posts in social media using a deep learning technique. ADRMine outperforms several strong baseline systems in the ADR extraction task by achieving an F-measure of 0.82. Feature analysis demonstrates that the proposed word cluster features significantly improve extraction performance. It is possible to extract complex medical concepts, with relatively high performance, from informal, user-generated content. Our approach is particularly scalable, suitable for social media mining, as it relies on large volumes of unlabeled data, thus diminishing the need for large, annotated training data sets. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.
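
    The word cluster features mentioned in this abstract can be sketched as follows. This is not ADRMine's implementation: the cluster lookup table, the window size, and the feature names are all hypothetical, and the sketch only shows how per-token feature dictionaries of the kind typically fed to a CRF sequence labeler might include cluster identifiers of neighbouring words.

```python
from typing import Dict, List

# Hypothetical word-to-cluster lookup. In the abstract, clusters are derived
# from embeddings trained on unlabeled user posts; these entries are invented.
CLUSTERS: Dict[str, int] = {"headache": 7, "migraine": 7, "dizzy": 12, "nauseous": 12}

def token_features(tokens: List[str], i: int, window: int = 1) -> Dict[str, str]:
    """Build a feature dict for token i, adding cluster IDs for a context window."""
    feats = {"word": tokens[i].lower()}
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            cluster = CLUSTERS.get(tokens[j].lower())
            # Words without a cluster get an explicit "none" value.
            feats[f"cluster[{offset}]"] = "none" if cluster is None else str(cluster)
    return feats

def sentence_features(tokens: List[str]) -> List[Dict[str, str]]:
    """One feature dict per token, in order."""
    return [token_features(tokens, i) for i in range(len(tokens))]

print(sentence_features(["felt", "dizzy", "today"]))
```

    Words that share a cluster ("headache" and "migraine" in the invented table) yield the same cluster feature, which is how such features let a labeler generalize to informal vocabulary that is rare or absent in annotated data.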

  18. Sociolinguistically Informed Natural Language Processing: Automating Irony Detection

    DTIC Science & Technology

    2017-10-23

    ML and NLP technologies fail to detect ironic intent empirically. We specifically proposed to assess quantitatively (using the collected dataset...Aim 2. To analyze when existing ML and NLP technologies fail to detect ironic intent empirically. We specifically proposed to assess quantitatively ...of the embedding reddit thread, and the other comments in this thread) constitute 4 sub-reddit (URL) description number of labeled comments politics

  19. Automatic detection of protected health information from clinic narratives.

    PubMed

    Yang, Hui; Garibaldi, Jonathan M

    2015-12-01

    This paper presents a natural language processing (NLP) system that was designed to participate in the 2014 i2b2 de-identification challenge. The challenge task aims to identify and classify seven main Protected Health Information (PHI) categories and 25 associated sub-categories. A hybrid model was proposed which combines machine learning techniques with keyword-based and rule-based approaches to deal with the complexity inherent in PHI categories. Our proposed approaches exploit a rich set of linguistic features, both syntactic and word surface-oriented, which are further enriched by task-specific features and regular expression template patterns to characterize the semantics of various PHI categories. Our system achieved promising accuracy on the challenge test data with an overall micro-averaged F-measure of 93.6%, making it the winning entry in this de-identification challenge. Copyright © 2015 Elsevier Inc. All rights reserved.
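
    For readers unfamiliar with the micro-averaged F-measure reported in this abstract, the following minimal sketch shows the standard calculation: true positives, false positives, and false negatives are pooled across categories before precision and recall are computed. The category names and counts are hypothetical and are not taken from the challenge data.

```python
from typing import Dict, Tuple

def micro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Micro-averaged F1 from per-category (true_pos, false_pos, false_neg) counts."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    if tp == 0:
        return 0.0  # no correct predictions, so precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical per-category counts, for illustration only.
example = {"NAME": (90, 10, 5), "DATE": (45, 5, 10)}
print(round(micro_f1(example), 3))
```

    Pooling counts means frequent categories dominate the score, in contrast to macro-averaging, which weights every category equally.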

  20. The role of centrosomal Nlp in the control of mitotic progression and tumourigenesis.

    PubMed

    Li, J; Zhan, Q

    2011-05-10

    The human centrosomal ninein-like protein (Nlp) is a new member of the γ-tubulin complexes binding proteins (GTBPs) that is essential for proper execution of various mitotic events. The primary function of Nlp is to promote microtubule nucleation that contributes to centrosome maturation, spindle formation and chromosome segregation. Its subcellular localisation and protein stability are regulated by several crucial mitotic kinases, such as Plk1, Nek2, Cdc2 and Aurora B. Several lines of evidence have linked Nlp to human cancer. Deregulation of Nlp in cell models results in aberrant spindles, chromosomal missegregation and multinuclei, and induces chromosomal instability and renders cells tumourigenic. Overexpression of Nlp induces anchorage-independent growth and immortalised primary cell transformation. In addition, we first demonstrate that the expression of Nlp is elevated primarily due to NLP gene amplification in human breast cancer and lung carcinoma. Consistently, transgenic mice overexpressing Nlp display spontaneous tumours in breast, ovary and testicle, and show rapid onset of radiation-induced lymphoma, indicating that Nlp is involved in tumourigenesis. This review summarises our current knowledge of physiological roles of Nlp, with an emphasis on its potentials in tumourigenesis.

  1. The role of centrosomal Nlp in the control of mitotic progression and tumourigenesis

    PubMed Central

    Li, J; Zhan, Q

    2011-01-01

    The human centrosomal ninein-like protein (Nlp) is a new member of the γ-tubulin complexes binding proteins (GTBPs) that is essential for proper execution of various mitotic events. The primary function of Nlp is to promote microtubule nucleation that contributes to centrosome maturation, spindle formation and chromosome segregation. Its subcellular localisation and protein stability are regulated by several crucial mitotic kinases, such as Plk1, Nek2, Cdc2 and Aurora B. Several lines of evidence have linked Nlp to human cancer. Deregulation of Nlp in cell models results in aberrant spindles, chromosomal missegregation and multinuclei, and induces chromosomal instability and renders cells tumourigenic. Overexpression of Nlp induces anchorage-independent growth and immortalised primary cell transformation. In addition, we first demonstrate that the expression of Nlp is elevated primarily due to NLP gene amplification in human breast cancer and lung carcinoma. Consistently, transgenic mice overexpressing Nlp display spontaneous tumours in breast, ovary and testicle, and show rapid onset of radiation-induced lymphoma, indicating that Nlp is involved in tumourigenesis. This review summarises our current knowledge of physiological roles of Nlp, with an emphasis on its potentials in tumourigenesis. PMID:21505454

  2. Direct transcriptional activation of BT genes by NLP transcription factors is a key component of the nitrate response in Arabidopsis.

    PubMed

    Sato, Takeo; Maekawa, Shugo; Konishi, Mineko; Yoshioka, Nozomi; Sasaki, Yuki; Maeda, Haruna; Ishida, Tetsuya; Kato, Yuki; Yamaguchi, Junji; Yanagisawa, Shuichi

    2017-01-29

    Nitrate modulates growth and development, functioning as a nutrient signal in plants. Although many changes in physiological processes in response to nitrate have been well characterized as nitrate responses, the molecular mechanisms underlying the nitrate response are not yet fully understood. Here, we show that NLP transcription factors, which are key regulators of the nitrate response, directly activate the nitrate-inducible expression of BT1 and BT2 encoding putative scaffold proteins with a plant-specific domain structure in Arabidopsis. Interestingly, the 35S promoter-driven expression of BT2 partially rescued growth inhibition caused by reductions in NLP activity in Arabidopsis. Furthermore, simultaneous disruption of BT1 and BT2 affected nitrate-dependent lateral root development. These results suggest that direct activation of BT1 and BT2 by NLP transcriptional activators is a key component of the molecular mechanism underlying the nitrate response in Arabidopsis. Copyright © 2016 Elsevier Inc. All rights reserved.

  3. Coordinate regulation of the mother centriole component nlp by nek2 and plk1 protein kinases.

    PubMed

    Rapley, Joseph; Baxter, Joanne E; Blot, Joelle; Wattam, Samantha L; Casenghi, Martina; Meraldi, Patrick; Nigg, Erich A; Fry, Andrew M

    2005-02-01

    Mitotic entry requires a major reorganization of the microtubule cytoskeleton. Nlp, a centrosomal protein that binds gamma-tubulin, is a G(2)/M target of the Plk1 protein kinase. Here, we show that human Nlp and its Xenopus homologue, X-Nlp, are also phosphorylated by the cell cycle-regulated Nek2 kinase. X-Nlp is a 213-kDa mother centriole-specific protein, implicating it in microtubule anchoring. Although constant in abundance throughout the cell cycle, it is displaced from centrosomes upon mitotic entry. Overexpression of active Nek2 or Plk1 causes premature displacement of Nlp from interphase centrosomes. Active Nek2 is also capable of phosphorylating and displacing a mutant form of Nlp that lacks Plk1 phosphorylation sites. Importantly, kinase-inactive Nek2 interferes with Plk1-induced displacement of Nlp from interphase centrosomes and displacement of endogenous Nlp from mitotic spindle poles, while active Nek2 stimulates Plk1 phosphorylation of Nlp in vitro. Unlike Plk1, Nek2 does not prevent association of Nlp with gamma-tubulin. Together, these results provide the first example of a protein involved in microtubule organization that is coordinately regulated at the G(2)/M transition by two centrosomal kinases. We also propose that phosphorylation by Nek2 may prime Nlp for phosphorylation by Plk1.

  4. Using natural language processing for identification of herpes zoster ophthalmicus cases to support population-based study.

    PubMed

    Zheng, Chengyi; Luo, Yi; Mercado, Cheryl; Sy, Lina; Jacobsen, Steven J; Ackerson, Brad; Lewin, Bruno; Tseng, Hung Fu

    2018-06-19

    Diagnosis codes are inadequate for accurately identifying herpes zoster ophthalmicus (HZO). There is a significant lack of population-based studies on HZO due to the high expense of manual review of medical records. To assess whether HZO can be identified from the clinical notes using natural language processing (NLP). To investigate the epidemiology of HZO among the HZ population based on the developed approach. A retrospective cohort analysis. A total of 49,914 southern California residents aged over 18 years, who had a new diagnosis of HZ. An NLP-based algorithm was developed and validated with the manually curated validation dataset (n=461). The algorithm was applied to over 1 million clinical notes associated with the study population. HZO versus non-HZO cases were compared by age, sex, race, and comorbidities. We measured the accuracy of the NLP algorithm. The NLP algorithm achieved 95.6% sensitivity and 99.3% specificity. Compared to the diagnosis codes, NLP identified significantly more HZO cases among the HZ population (13.9% versus 1.7%). Compared to the non-HZO group, the HZO group was older, had more males, had more Whites, and had more outpatient visits. We developed and validated an automatic method to identify HZO cases with high accuracy. As one of the largest studies on HZO, our finding emphasizes the importance of preventing HZ in the elderly population. This method can be a valuable tool to support population-based studies and clinical care of HZO in the era of big data. This article is protected by copyright. All rights reserved.
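
    As a reminder of how the sensitivity and specificity figures in this abstract are defined, the sketch below computes both from a confusion matrix of algorithm output against a manually reviewed reference standard. The function name and the counts are hypothetical; they do not reproduce the study's validation set.

```python
from typing import Tuple

def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): share of true cases the algorithm finds.
    specificity = TN / (TN + FP): share of non-cases it correctly rejects.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one true case and one non-case")
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical counts, for illustration only.
sens, spec = sensitivity_specificity(tp=95, fn=5, tn=90, fp=10)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```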

  5. The Sentence Fairy: A Natural-Language Generation System to Support Children's Essay Writing

    ERIC Educational Resources Information Center

    Harbusch, Karin; Itsova, Gergana; Koch, Ulrich; Kuhner, Christine

    2008-01-01

    We built an NLP system implementing a "virtual writing conference" for elementary-school children, with German as the target language. Currently, state-of-the-art computer support for writing tasks is restricted to multiple-choice questions or quizzes because automatic parsing of the often ambiguous and fragmentary texts produced by pupils…

  6. RNAi-mediated disruption of neuropeptide genes, nlp-3 and nlp-12, cause multiple behavioral defects in Meloidogyne incognita.

    PubMed

    Dash, Manoranjan; Dutta, Tushar K; Phani, Victor; Papolu, Pradeep K; Shivakumara, Tagginahalli N; Rao, Uma

    2017-08-26

    Owing to the current deficiencies in chemical control options and unavailability of novel management strategies, root-knot nematode (M. incognita) infections remain widespread with significant socio-economic impacts. Helminth nervous systems are peptide-rich and appear to be putative drug targets that could be exploited by antihelmintic chemotherapy. Herein, to characterize the novel peptidergic neurotransmitters, in silico mining of M. incognita genomic and transcriptomic datasets revealed the presence of 16 neuropeptide-like protein (nlp) genes with structural hallmarks of neuropeptide preproproteins; among which 13 nlps were PCR-amplified and sequenced. Two key nlp genes (Mi-nlp-3 and Mi-nlp-12) were localized to the basal bulb and tail region of the nematode body via in situ hybridization assay. Mi-nlp-3 and Mi-nlp-12 were highly expressed (in qRT-PCR assay) in the pre-parasitic juveniles and adult females, suggesting the association of these genes with host recognition, development and reproduction of M. incognita. In vitro knockdown of Mi-nlp-3 and Mi-nlp-12 via RNAi demonstrated a significant reduction in attraction and penetration of M. incognita in tomato root in Pluronic gel medium. A pronounced perturbation in development and reproduction of NLP-silenced worms was also documented in adzuki beans in CYG growth pouches. The deleterious phenotypes obtained due to NLP knockdown suggest that transgenic plants engineered to express RNA constructs targeting nlp genes may emerge as an environmentally viable option to manage nematode problems in crop plants. Copyright © 2017 Elsevier Inc. All rights reserved.

  7. Reliable Electronic Text: The Elusive Prerequisite for a Host of Human Language Technologies

    DTIC Science & Technology

    2010-09-30

    is not always the case—for example, ligatures in Latin-fonts, and glyphs in Arabic fonts (King, 2008; Carrier, 2009). This complexity, and others...such effects can render electronic text useless for natural language processing ( NLP ). Typically, file converters do not expose the details of the...the many component NLP technologies typically used inside information extraction and text categorization applications, such as tokenization, part-of

  8. Centrosomal Nlp is an oncogenic protein that is gene-amplified in human tumors and causes spontaneous tumorigenesis in transgenic mice.

    PubMed

    Shao, Shujuan; Liu, Rong; Wang, Yang; Song, Yongmei; Zuo, Lihui; Xue, Liyan; Lu, Ning; Hou, Ning; Wang, Mingrong; Yang, Xiao; Zhan, Qimin

    2010-02-01

    Disruption of mitotic events contributes greatly to genomic instability and results in mutator phenotypes. Indeed, abnormalities of mitotic components are closely associated with malignant transformation and tumorigenesis. Here we show that ninein-like protein (Nlp), a recently identified BRCA1-associated centrosomal protein involved in microtubule nucleation and spindle formation, is an oncogenic protein. Nlp was found to be overexpressed in approximately 80% of human breast and lung carcinomas analyzed. In human lung cancers, this deregulated expression was associated with NLP gene amplification. Further analysis revealed that Nlp exhibited strong oncogenic properties; for example, it conferred to NIH3T3 rodent fibroblasts the capacity for anchorage-independent growth in vitro and tumor formation in nude mice. Consistent with these data, transgenic mice overexpressing Nlp displayed spontaneous tumorigenesis in the breast, ovary, and testicle within 60 weeks. In addition, Nlp overexpression induced more rapid onset of radiation-induced lymphoma. Furthermore, mouse embryonic fibroblasts (MEFs) derived from Nlp transgenic mice showed centrosome amplification, suggesting that Nlp overexpression mimics BRCA1 loss. These findings demonstrate that Nlp abnormalities may contribute to genomic instability and tumorigenesis and suggest that Nlp might serve as a potential biomarker for clinical diagnosis and therapeutic target.

  9. Centrosomal Nlp is an oncogenic protein that is gene-amplified in human tumors and causes spontaneous tumorigenesis in transgenic mice

    PubMed Central

    Shao, Shujuan; Liu, Rong; Wang, Yang; Song, Yongmei; Zuo, Lihui; Xue, Liyan; Lu, Ning; Hou, Ning; Wang, Mingrong; Yang, Xiao; Zhan, Qimin

    2010-01-01

    Disruption of mitotic events contributes greatly to genomic instability and results in mutator phenotypes. Indeed, abnormalities of mitotic components are closely associated with malignant transformation and tumorigenesis. Here we show that ninein-like protein (Nlp), a recently identified BRCA1-associated centrosomal protein involved in microtubule nucleation and spindle formation, is an oncogenic protein. Nlp was found to be overexpressed in approximately 80% of human breast and lung carcinomas analyzed. In human lung cancers, this deregulated expression was associated with NLP gene amplification. Further analysis revealed that Nlp exhibited strong oncogenic properties; for example, it conferred to NIH3T3 rodent fibroblasts the capacity for anchorage-independent growth in vitro and tumor formation in nude mice. Consistent with these data, transgenic mice overexpressing Nlp displayed spontaneous tumorigenesis in the breast, ovary, and testicle within 60 weeks. In addition, Nlp overexpression induced more rapid onset of radiation-induced lymphoma. Furthermore, mouse embryonic fibroblasts (MEFs) derived from Nlp transgenic mice showed centrosome amplification, suggesting that Nlp overexpression mimics BRCA1 loss. These findings demonstrate that Nlp abnormalities may contribute to genomic instability and tumorigenesis and suggest that Nlp might serve as a potential biomarker for clinical diagnosis and therapeutic target. PMID:20093778

  10. Determining post-test risk in a national sample of stress nuclear myocardial perfusion imaging reports: Implications for natural language processing tools.

    PubMed

    Levy, Andrew E; Shah, Nishant R; Matheny, Michael E; Reeves, Ruth M; Gobbel, Glenn T; Bradley, Steven M

    2018-04-25

    Reporting standards promote clarity and consistency of stress myocardial perfusion imaging (MPI) reports, but do not require an assessment of post-test risk. Natural Language Processing (NLP) tools could potentially help estimate this risk, yet it is unknown whether reports contain adequate descriptive data to use NLP. Among VA patients who underwent stress MPI and coronary angiography between January 1, 2009 and December 31, 2011, 99 stress test reports were randomly selected for analysis. Two reviewers independently categorized each report for the presence of critical data elements essential to describing post-test ischemic risk. Few stress MPI reports provided a formal assessment of post-test risk within the impression section (3%) or the entire document (4%). In most cases, risk was determinable by combining critical data elements (74% impression, 98% whole). If ischemic risk was not determinable (25% impression, 2% whole), inadequate description of systolic function (9% impression, 1% whole) and inadequate description of ischemia (5% impression, 1% whole) were most commonly implicated. Post-test ischemic risk was determinable but rarely reported in this sample of stress MPI reports. This supports the potential use of NLP to help clarify risk. Further study of NLP in this context is needed.

  11. NLP based congestive heart failure case finding: A prospective analysis on statewide electronic medical records.

    PubMed

    Wang, Yue; Luo, Jin; Hao, Shiying; Xu, Haihua; Shin, Andrew Young; Jin, Bo; Liu, Rui; Deng, Xiaohong; Wang, Lijuan; Zheng, Le; Zhao, Yifan; Zhu, Chunqing; Hu, Zhongkai; Fu, Changlin; Hao, Yanpeng; Zhao, Yingzhen; Jiang, Yunliang; Dai, Dorothy; Culver, Devore S; Alfreds, Shaun T; Todd, Rogow; Stearns, Frank; Sylvester, Karl G; Widen, Eric; Ling, Xuefeng B

    2015-12-01

    In order to proactively manage congestive heart failure (CHF) patients, an effective CHF case finding algorithm is required to process both structured and unstructured electronic medical records (EMR) to allow complementary and cost-efficient identification of CHF patients. We set out to identify CHF cases from both EMR codified and natural language processing (NLP) found cases. Using narrative clinical notes from all Maine Health Information Exchange (HIE) patients, the NLP case finding algorithm was retrospectively (July 1, 2012-June 30, 2013) developed with a random subset of HIE associated facilities, and blind-tested with the remaining facilities. The NLP based method was integrated into a live HIE population exploration system and validated prospectively (July 1, 2013-June 30, 2014). A total of 18,295 codified CHF patients were included in Maine HIE. Among the 253,803 subjects without CHF codings, our case finding algorithm prospectively identified 2411 uncodified CHF cases. The positive predictive value (PPV) is 0.914, and 70.1% of these 2411 cases were found to be with CHF histories in the clinical notes. A CHF case finding algorithm was developed, tested and prospectively validated. The successful integration of the CHF case finding algorithm into the Maine HIE live system is expected to improve the Maine CHF care. Copyright © 2015. Published by Elsevier Ireland Ltd.

  12. BRCA1 interaction of centrosomal protein Nlp is required for successful mitotic progression.

    PubMed

    Jin, Shunqian; Gao, Hua; Mazzacurati, Lucia; Wang, Yang; Fan, Wenhong; Chen, Qiang; Yu, Wei; Wang, Mingrong; Zhu, Xueliang; Zhang, Chuanmao; Zhan, Qimin

    2009-08-21

    Breast cancer susceptibility gene BRCA1 is implicated in the control of mitotic progression, although the underlying mechanism(s) remains to be further defined. Deficiency of BRCA1 function leads to disrupted mitotic machinery and genomic instability. Here, we show that BRCA1 physically interacts and colocalizes with Nlp, an important molecule involved in centrosome maturation and spindle formation. Interestingly, Nlp centrosomal localization and its protein stability are regulated by normal cellular BRCA1 function because cells containing BRCA1 mutations or silenced for endogenous BRCA1 exhibit disrupted Nlp colocalization to centrosomes and enhanced Nlp degradation. It is likely that the BRCA1 regulation of Nlp stability involves Plk1 suppression. Inhibition of endogenous Nlp via the small interfering RNA approach results in aberrant spindle formation, aborted chromosomal segregation, and aneuploidy, which mimic the phenotypes of disrupted BRCA1. Thus, BRCA1 interaction of Nlp might be required for the successful mitotic progression, and abnormalities of Nlp lead to genomic instability.

  13. BRCA1 Interaction of Centrosomal Protein Nlp Is Required for Successful Mitotic Progression*♦

    PubMed Central

    Jin, Shunqian; Gao, Hua; Mazzacurati, Lucia; Wang, Yang; Fan, Wenhong; Chen, Qiang; Yu, Wei; Wang, Mingrong; Zhu, Xueliang; Zhang, Chuanmao; Zhan, Qimin

    2009-01-01

    Breast cancer susceptibility gene BRCA1 is implicated in the control of mitotic progression, although the underlying mechanism(s) remains to be further defined. Deficiency of BRCA1 function leads to disrupted mitotic machinery and genomic instability. Here, we show that BRCA1 physically interacts and colocalizes with Nlp, an important molecule involved in centrosome maturation and spindle formation. Interestingly, Nlp centrosomal localization and its protein stability are regulated by normal cellular BRCA1 function because cells containing BRCA1 mutations or silenced for endogenous BRCA1 exhibit disrupted Nlp colocalization to centrosomes and enhanced Nlp degradation. It is likely that the BRCA1 regulation of Nlp stability involves Plk1 suppression. Inhibition of endogenous Nlp via the small interfering RNA approach results in aberrant spindle formation, aborted chromosomal segregation, and aneuploidy, which mimic the phenotypes of disrupted BRCA1. Thus, BRCA1 interaction of Nlp might be required for the successful mitotic progression, and abnormalities of Nlp lead to genomic instability. PMID:19509300

  14. Cdc2/cyclin B1 regulates centrosomal Nlp proteolysis and subcellular localization.

    PubMed

    Zhao, Xuelian; Jin, Shunqian; Song, Yongmei; Zhan, Qimin

    2010-11-01

    The formation of proper mitotic spindles is required for appropriate chromosome segregation during cell division. Aberrant spindle formation often causes aneuploidy and results in tumorigenesis. However, the underlying mechanism of regulating spindle formation and chromosome separation remains to be further defined. Centrosomal Nlp (ninein-like protein) is a recently characterized BRCA1-regulated centrosomal protein and plays an important role in centrosome maturation and spindle formation. In this study, we show that Nlp can be phosphorylated by cell cycle protein kinase Cdc2/cyclin B1. The phosphorylation sites of Nlp are mapped at Ser185 and Ser589. Interestingly, the Cdc2/cyclin B1 phosphorylation site Ser185 of Nlp is required for its recognition by PLK1, which enables Nlp to depart from centrosomes to allow the establishment of a mitotic scaffold at the onset of mitosis. PLK1 fails to dissociate the Nlp mutant lacking Ser185 from the centrosome, suggesting that Cdc2/cyclin B1 might serve as a primary kinase of PLK1 in regulating Nlp subcellular localization. However, the phosphorylation at the site Ser589 by Cdc2/cyclin B1 plays an important role in Nlp protein stability, probably due to its effect on protein degradation. Furthermore, we show that deregulated expression or subcellular localization of Nlp leads to multinuclei in cells, indicating that scheduled levels of Nlp and proper subcellular localization of Nlp are critical for successful completion of normal cell mitosis. These findings demonstrate that Cdc2/cyclin B1 is a key regulator in maintaining appropriate degradation and subcellular localization of Nlp, providing novel insights into the role of Cdc2/cyclin B1 in mitotic progression.

  15. OntoMate: a text-mining tool aiding curation at the Rat Genome Database

    PubMed Central

    Liu, Weisong; Laulederkind, Stanley J. F.; Hayman, G. Thomas; Wang, Shur-Jen; Nigam, Rajni; Smith, Jennifer R.; De Pons, Jeff; Dwinell, Melinda R.; Shimoyama, Mary

    2015-01-01

    The Rat Genome Database (RGD) is the premier repository of rat genomic, genetic and physiologic data. Converting data from free text in the scientific literature to a structured format is one of the main tasks of all model organism databases. RGD spends considerable effort manually curating gene, Quantitative Trait Locus (QTL) and strain information. The rapidly growing volume of biomedical literature and the active research in the biological natural language processing (bioNLP) community have given RGD the impetus to adopt text-mining tools to improve curation efficiency. Recently, RGD has initiated a project to use OntoMate, an ontology-driven, concept-based literature search engine developed at RGD, as a replacement for the PubMed (http://www.ncbi.nlm.nih.gov/pubmed) search engine in the gene curation workflow. OntoMate tags abstracts with gene names, gene mutations, organism name and most of the 16 ontologies/vocabularies used at RGD. All terms/entities tagged to an abstract are listed with the abstract in the search results. All listed terms are linked both to data entry boxes and a term browser in the curation tool. OntoMate also provides user-activated filters for species, date and other parameters relevant to the literature search. Using the system for literature search and import has streamlined the process compared to using PubMed. The system was built with a scalable and open architecture, including features specifically designed to accelerate the RGD gene curation process. With the use of bioNLP tools, RGD has added more automation to its curation workflow. Database URL: http://rgd.mcw.edu PMID:25619558

  16. Natural language processing: state of the art and prospects for significant progress, a workshop sponsored by the National Library of Medicine.

    PubMed

    Friedman, Carol; Rindflesch, Thomas C; Corn, Milton

    2013-10-01

    Natural language processing (NLP) is crucial for advancing healthcare because it is needed to transform relevant information locked in text into structured data that can be used by computer processes aimed at improving patient care and advancing medicine. In light of the importance of NLP to health, the National Library of Medicine (NLM) recently sponsored a workshop to review the state of the art in NLP focusing on text in English, both in biomedicine and in the general language domain. Specific goals of the NLM-sponsored workshop were to identify the current state of the art, grand challenges and specific roadblocks, and to identify effective use and best practices. This paper reports on the main outcomes of the workshop, including an overview of the state of the art, strategies for advancing the field, and obstacles that need to be addressed, resulting in recommendations for a research agenda intended to advance the field. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.

  17. Bio-SimVerb and Bio-SimLex: wide-coverage evaluation sets of word similarity in biomedicine.

    PubMed

    Chiu, Billy; Pyysalo, Sampo; Vulić, Ivan; Korhonen, Anna

    2018-02-05

    Word representations support a variety of Natural Language Processing (NLP) tasks. The quality of these representations is typically assessed by comparing the distances in the induced vector spaces against human similarity judgements. Whereas comprehensive evaluation resources have recently been developed for the general domain, similar resources for biomedicine currently suffer from the lack of coverage, both in terms of word types included and with respect to the semantic distinctions. Notably, verbs have been excluded, although they are essential for the interpretation of biomedical language. Further, current resources do not discern between semantic similarity and semantic relatedness, although this has been proven as an important predictor of the usefulness of word representations and their performance in downstream applications. We present two novel comprehensive resources targeting the evaluation of word representations in biomedicine. These resources, Bio-SimVerb and Bio-SimLex, address the previously mentioned problems, and can be used for evaluations of verb and noun representations respectively. In our experiments, we have computed the Pearson's correlation between performances on intrinsic and extrinsic tasks using twelve popular state-of-the-art representation models (e.g. word2vec models). The intrinsic-extrinsic correlations using our datasets are notably higher than with previous intrinsic evaluation benchmarks such as UMNSRS and MayoSRS. In addition, when evaluating representation models for their abilities to capture verb and noun semantics individually, we show a considerable variation between performances across all models. Bio-SimVerb and Bio-SimLex enable intrinsic evaluation of word representations. This evaluation can serve as a predictor of performance on various downstream tasks in the biomedical domain. 
    The results on Bio-SimVerb and Bio-SimLex using standard word representation models highlight the importance of developing dedicated evaluation resources for NLP in biomedicine for particular word classes (e.g. verbs). These are needed to identify the most accurate methods for learning class-specific representations. Bio-SimVerb and Bio-SimLex are publicly available.
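
    The intrinsic evaluation described in this record (comparing distances in a vector space against human similarity judgements) can be sketched as follows. This is only an illustrative sketch: the toy vectors, the word pairs, the ratings, and the choice of Spearman rank correlation over cosine similarity are assumptions made here, not data or settings taken from Bio-SimVerb or Bio-SimLex.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def evaluate(vectors, judgements):
    """Correlate model similarities with human ratings for each word pair."""
    model = [cosine(vectors[a], vectors[b]) for a, b, _ in judgements]
    human = [rating for _, _, rating in judgements]
    return spearman(model, human)


# Hypothetical 2-dimensional embeddings and ratings, for illustration only.
toy_vectors = {
    "inhibit": [1.0, 0.0],
    "suppress": [0.9, 0.1],
    "bind": [0.5, 0.5],
    "activate": [0.0, 1.0],
}
toy_judgements = [
    ("inhibit", "suppress", 9.0),
    ("inhibit", "bind", 5.0),
    ("inhibit", "activate", 1.0),
]

score = evaluate(toy_vectors, toy_judgements)
print(f"Spearman correlation with human judgements: {score:.3f}")
```

    In this toy setup the model orders the three pairs exactly as the hypothetical annotators do, so the rank correlation is 1; a real evaluation would use the published pair lists and a trained representation model.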

  18. Filling the gaps between tools and users: a tool comparator, using protein-protein interaction as an example.

    PubMed

    Kano, Yoshinobu; Nguyen, Ngan; Saetre, Rune; Yoshida, Kazuhiro; Miyao, Yusuke; Tsuruoka, Yoshimasa; Matsubayashi, Yuichiro; Ananiadou, Sophia; Tsujii, Jun'ichi

    2008-01-01

    Recently, several text mining programs have reached a near-practical level of performance. Some systems are already being used by biologists and database curators. However, it has also been recognized that current Natural Language Processing (NLP) and Text Mining (TM) technology is not easy to deploy, since research groups tend to develop systems that cater specifically to their own requirements. One of the major reasons for the difficulty of deployment of NLP/TM technology is that re-usability and interoperability of software tools are typically not considered during development. While some effort has been invested in making interoperable NLP/TM toolkits, the developers of end-to-end systems still often struggle to reuse NLP/TM tools, and often opt to develop similar programs from scratch instead. This is particularly the case in BioNLP, since the requirements of biologists are so diverse that NLP tools have to be adapted and re-organized in a much more extensive manner than was originally expected. Although generic frameworks like UIMA (Unstructured Information Management Architecture) provide promising ways to solve this problem, the solution that they provide is only partial. In order for truly interoperable toolkits to become a reality, we also need sharable type systems and a developer-friendly environment for software integration that includes functionality for systematic comparisons of available tools, a simple I/O interface, and visualization tools. In this paper, we describe such an environment that was developed based on UIMA, and we show its feasibility through our experience in developing a protein-protein interaction (PPI) extraction system.

  19. Validating a strategy for psychosocial phenotyping using a large corpus of clinical text.

    PubMed

    Gundlapalli, Adi V; Redd, Andrew; Carter, Marjorie; Divita, Guy; Shen, Shuying; Palmer, Miland; Samore, Matthew H

    2013-12-01

    To develop algorithms to improve efficiency of patient phenotyping using natural language processing (NLP) on text data. Of a large number of note titles available in our database, we sought to determine those with highest yield and precision for psychosocial concepts. From a database of over 1 billion documents from US Department of Veterans Affairs medical facilities, a random sample of 1500 documents from each of 218 enterprise note titles were chosen. Psychosocial concepts were extracted using a UIMA-AS-based NLP pipeline (v3NLP), using a lexicon of relevant concepts with negation and template format annotators. Human reviewers evaluated a subset of documents for false positives and sensitivity. High-yield documents were identified by hit rate and precision. Reasons for false positivity were characterized. A total of 58 707 psychosocial concepts were identified from 316 355 documents for an overall hit rate of 0.2 concepts per document (median 0.1, range 1.6-0). Of 6031 concepts reviewed from a high-yield set of note titles, the overall precision for all concept categories was 80%, with variability among note titles and concept categories. Reasons for false positivity included templating, negation, context, and alternate meaning of words. The sensitivity of the NLP system was noted to be 49% (95% CI 43% to 55%). Phenotyping using NLP need not involve the entire document corpus. Our methods offer a generalizable strategy for scaling NLP pipelines to large free text corpora with complex linguistic annotations in attempts to identify patients of a certain phenotype.

  20. Validating a strategy for psychosocial phenotyping using a large corpus of clinical text

    PubMed Central

    Gundlapalli, Adi V; Redd, Andrew; Carter, Marjorie; Divita, Guy; Shen, Shuying; Palmer, Miland; Samore, Matthew H

    2013-01-01

    Objective: To develop algorithms to improve efficiency of patient phenotyping using natural language processing (NLP) on text data. Of a large number of note titles available in our database, we sought to determine those with highest yield and precision for psychosocial concepts. Materials and methods: From a database of over 1 billion documents from US Department of Veterans Affairs medical facilities, a random sample of 1500 documents from each of 218 enterprise note titles were chosen. Psychosocial concepts were extracted using a UIMA-AS-based NLP pipeline (v3NLP), using a lexicon of relevant concepts with negation and template format annotators. Human reviewers evaluated a subset of documents for false positives and sensitivity. High-yield documents were identified by hit rate and precision. Reasons for false positivity were characterized. Results: A total of 58 707 psychosocial concepts were identified from 316 355 documents for an overall hit rate of 0.2 concepts per document (median 0.1, range 1.6–0). Of 6031 concepts reviewed from a high-yield set of note titles, the overall precision for all concept categories was 80%, with variability among note titles and concept categories. Reasons for false positivity included templating, negation, context, and alternate meaning of words. The sensitivity of the NLP system was noted to be 49% (95% CI 43% to 55%). Conclusions: Phenotyping using NLP need not involve the entire document corpus. Our methods offer a generalizable strategy for scaling NLP pipelines to large free text corpora with complex linguistic annotations in attempts to identify patients of a certain phenotype. PMID:24169276
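
    The overall hit rate quoted in this record follows directly from the two reported counts. A minimal sketch of the arithmetic, assuming hit rate simply means extracted concepts divided by documents processed (the function name is hypothetical):

```python
def hit_rate(concepts, documents):
    """Average number of extracted concepts per document."""
    if documents <= 0:
        raise ValueError("documents must be positive")
    return concepts / documents


# Counts as reported in the abstract above.
overall = hit_rate(58707, 316355)
print(f"{overall:.3f} concepts per document (reported, rounded: {overall:.1f})")
```

    The unrounded value is a little under 0.19, which the abstract reports as 0.2 concepts per document.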

  1. Sales Training for Army Recruiter Success: Modeling the Sales Strategies and Skills of Excellent Recruiters

    DTIC Science & Technology

    1987-11-01

    strategies used by excellent Army recruiters. Neurolinguistic programming (NLP) was used as the protocol for modeling performance and acquiring...Behavioral and Social Sciences, 3001 Eisenhower Avenue, Alexandria, VA 22333-5600 10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT...Modeling; Expert knowledge; Neurolinguistics; Knowledge engineering; Recruiting; Sales; Sales cycle; Sales skills; Sales strategies

  2. A CCG-Based Method for Training a Semantic Role Labeler in the Absence of Explicit Syntactic Training Data

    ERIC Educational Resources Information Center

    Boxwell, Stephen A.

    2011-01-01

    Treebanks are a necessary prerequisite for many NLP tasks, including, but not limited to, semantic role labeling. For many languages, however, treebanks are either nonexistent or too small to be useful. Time-critical applications may require rapid deployment of natural language software for a new critical language--much faster than the development…

  3. Performance of a Natural Language Processing (NLP) Tool to Extract Pulmonary Function Test (PFT) Reports from Structured and Semistructured Veteran Affairs (VA) Data.

    PubMed

    Sauer, Brian C; Jones, Barbara E; Globe, Gary; Leng, Jianwei; Lu, Chao-Chin; He, Tao; Teng, Chia-Chen; Sullivan, Patrick; Zeng, Qing

    2016-01-01

    Pulmonary function tests (PFTs) are objective estimates of lung function, but are not reliably stored within the Veteran Health Affairs data systems as structured data. The aim of this study was to validate the natural language processing (NLP) tool we developed, which extracts spirometric values and responses to bronchodilator administration, against expert review, and to estimate the number of additional spirometric tests identified beyond the structured data. All patients at seven Veteran Affairs Medical Centers with a diagnostic code for asthma Jan 1, 2006-Dec 31, 2012 were included. Evidence of spirometry with a bronchodilator challenge (BDC) was extracted from structured data as well as clinical documents. NLP's performance was compared against a human reference standard using a random sample of 1,001 documents. In the validation set NLP demonstrated a precision of 98.9 percent (95 percent confidence intervals (CI): 93.9 percent, 99.7 percent), recall of 97.8 percent (95 percent CI: 92.2 percent, 99.7 percent), and an F-measure of 98.3 percent for the forced vital capacity pre- and post pairs and precision of 100 percent (95 percent CI: 96.6 percent, 100 percent), recall of 100 percent (95 percent CI: 96.6 percent, 100 percent), and an F-measure of 100 percent for the forced expiratory volume in one second pre- and post pairs for bronchodilator administration. Application of the NLP increased the proportion identified with complete bronchodilator challenge by 25 percent. This technology can improve identification of PFTs for epidemiologic research. Caution must be taken in assuming that a single domain of clinical data can completely capture the scope of a disease, treatment, or clinical test.
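
    The F-measures in this record are consistent with the balanced F-measure, the harmonic mean of precision and recall. A minimal sketch, assuming the authors used the balanced (F1) form, which the abstract does not state explicitly:

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Forced vital capacity pairs: precision 98.9 percent, recall 97.8 percent.
fvc = f_measure(0.989, 0.978)
# Forced expiratory volume in one second pairs: both 100 percent.
fev1 = f_measure(1.0, 1.0)
print(f"FVC F-measure: {fvc:.1%}, FEV1 F-measure: {fev1:.1%}")
```

    With the reported precision and recall this gives about 98.3 percent for the forced vital capacity pairs and 100 percent for the forced expiratory volume pairs, matching the figures in the abstract.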

  4. Mass Spectrometry of Single GABAergic Somatic Motorneurons Identifies a Novel Inhibitory Peptide, As-NLP-22, in the Nematode Ascaris suum.

    PubMed

    Konop, Christopher J; Knickelbine, Jennifer J; Sygulla, Molly S; Wruck, Colin D; Vestling, Martha M; Stretton, Antony O W

    2015-12-01

    Neuromodulators have become an increasingly important component of functional circuits, dramatically changing the properties of both neurons and synapses to affect behavior. To explore the role of neuropeptides in Ascaris suum behavior, we devised an improved method for cleanly dissecting single motorneuronal cell bodies from the many other cell processes and hypodermal tissue in the ventral nerve cord. We determined their peptide content using matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry (MS). The reduced complexity of the peptide mixture greatly aided the detection of peptides; peptide levels were sufficient to permit sequencing by tandem MS from single cells. Inhibitory motorneurons, known to be GABAergic, contain a novel neuropeptide, As-NLP-22 (SLASGRWGLRPamide). From this sequence and information from the A. suum expressed sequence tag (EST) database, we cloned the transcript (As-nlp-22) and synthesized a riboprobe for in situ hybridization, which labeled the inhibitory motorneurons; this validates the integrity of the dissection method, showing that the peptides detected originate from the cells themselves and not from adhering processes from other cells (e.g., synaptic terminals). Synthetic As-NLP-22 has potent inhibitory activity on acetylcholine-induced muscle contraction as well as on basal muscle tone. Both of these effects are dose-dependent: the inhibitory effect on ACh contraction has an IC50 of 8.3 × 10^-9 M. When injected into whole worms, As-NLP-22 produces a dose-dependent inhibition of locomotory movements and, at higher levels, complete paralysis. These experiments demonstrate the utility of MALDI TOF/TOF MS in identifying novel neuromodulators at the single-cell level.

  5. Mass Spectrometry of Single GABAergic Somatic Motorneurons Identifies a Novel Inhibitory Peptide, As-NLP-22, in the Nematode Ascaris suum

    NASA Astrophysics Data System (ADS)

    Konop, Christopher J.; Knickelbine, Jennifer J.; Sygulla, Molly S.; Wruck, Colin D.; Vestling, Martha M.; Stretton, Antony O. W.

    2015-12-01

    Neuromodulators have become an increasingly important component of functional circuits, dramatically changing the properties of both neurons and synapses to affect behavior. To explore the role of neuropeptides in Ascaris suum behavior, we devised an improved method for cleanly dissecting single motorneuronal cell bodies from the many other cell processes and hypodermal tissue in the ventral nerve cord. We determined their peptide content using matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry (MS). The reduced complexity of the peptide mixture greatly aided the detection of peptides; peptide levels were sufficient to permit sequencing by tandem MS from single cells. Inhibitory motorneurons, known to be GABAergic, contain a novel neuropeptide, As-NLP-22 (SLASGRWGLRPamide). From this sequence and information from the A. suum expressed sequence tag (EST) database, we cloned the transcript (As-nlp-22) and synthesized a riboprobe for in situ hybridization, which labeled the inhibitory motorneurons; this validates the integrity of the dissection method, showing that the peptides detected originate from the cells themselves and not from adhering processes from other cells (e.g., synaptic terminals). Synthetic As-NLP-22 has potent inhibitory activity on acetylcholine-induced muscle contraction as well as on basal muscle tone. Both of these effects are dose-dependent: the inhibitory effect on ACh contraction has an IC50 of 8.3 × 10^-9 M. When injected into whole worms, As-NLP-22 produces a dose-dependent inhibition of locomotory movements and, at higher levels, complete paralysis. These experiments demonstrate the utility of MALDI TOF/TOF MS in identifying novel neuromodulators at the single-cell level.

  6. An Evaluation of a Natural Language Processing Tool for Identifying and Encoding Allergy Information in Emergency Department Clinical Notes

    PubMed Central

    Goss, Foster R.; Plasek, Joseph M.; Lau, Jason J.; Seger, Diane L.; Chang, Frank Y.; Zhou, Li

    2014-01-01

    Emergency department (ED) visits due to allergic reactions are common. Allergy information is often recorded in free-text provider notes; however, this domain has not yet been widely studied by the natural language processing (NLP) community. We developed an allergy module built on the MTERMS NLP system to identify and encode food, drug, and environmental allergies and allergic reactions. The module included updates to our lexicon using standard terminologies, and novel disambiguation algorithms. We developed an annotation schema and annotated 400 ED notes that served as a gold standard for comparison to MTERMS output. MTERMS achieved an F-measure of 87.6% for the detection of allergen names and no known allergies, 90% for identifying true reactions in each allergy statement where true allergens were also identified, and 69% for linking reactions to their allergen. These preliminary results demonstrate the feasibility using NLP to extract and encode allergy information from clinical notes. PMID:25954363

  7. Universality of next-to-leading power threshold effects for colourless final states in hadronic collisions

    NASA Astrophysics Data System (ADS)

    Del Duca, V.; Laenen, E.; Magnea, L.; Vernazza, L.; White, C. D.

    2017-11-01

    We consider the production of an arbitrary number of colour-singlet particles near partonic threshold, and show that next-to-leading order cross sections for this class of processes have a simple universal form at next-to-leading power (NLP) in the energy of the emitted gluon radiation. Our analysis relies on a recently derived factorisation formula for NLP threshold effects at amplitude level, and therefore applies both if the leading-order process is tree-level and if it is loop-induced. It holds for differential distributions as well. The results can furthermore be seen as applications of recently derived next-to-soft theorems for gauge theory amplitudes. We use our universal expression to re-derive known results for the production of up to three Higgs bosons at NLO in the large top mass limit, and for the hadro-production of a pair of electroweak gauge bosons. Finally, we present new analytic results for Higgs boson pair production at NLO and NLP, with exact top-mass dependence.

  8. Molecular characterization and functional analysis of a necrosis- and ethylene-inducing, protein-encoding gene family from Verticillium dahliae.

    PubMed

    Zhou, Bang-Jun; Jia, Pei-Song; Gao, Feng; Guo, Hui-Shan

    2012-07-01

    Verticillium dahliae Kleb. is a hemibiotrophic, phytopathogenic fungus that causes wilt disease in a wide range of crops, including cotton. Successful host colonization by hemibiotrophic pathogens requires the induction of plant cell death to provide the saprophytic nutrition for the transition from the biotrophic to the necrotrophic stage. In this study, we identified a necrosis-inducing Phytophthora protein (NPP1) domain-containing protein family containing nine genes in a virulent, defoliating isolate of V. dahliae (V592), named the VdNLP genes. Functional analysis demonstrated that only two of these VdNLP genes, VdNLP1 and VdNLP2, encoded proteins that were capable of inducing necrotic lesions and triggering defense responses in Nicotiana benthamiana, Arabidopsis, and cotton plants. Both VdNLP1 and VdNLP2 induced the wilting of cotton seedling cotyledons. However, gene-deletion mutants targeted by VdNLP1, VdNLP2, or both did not affect the pathogenicity of V. dahliae V592 in cotton infection. Similar expression and induction patterns were found for seven of the nine VdNLP transcripts. Through a comparison of the conserved amino acid residues of VdNLP with different necrosis-inducing activities, combined with mutagenesis-based analyses, we identified several novel conserved amino acid residues, in addition to the known conserved heptapeptide GHRHDWE motif and the cysteine residues of the NPP domain-containing protein, that are indispensable for the necrosis-inducing activity of the VdNLP2 protein.

  9. The potential of zwitterionic nanoliposomes against neurotoxic alpha-synuclein aggregates in Parkinson's Disease.

    PubMed

    Aliakbari, Farhang; Mohammad-Beigi, Hossein; Rezaei-Ghaleh, Nasrollah; Becker, Stefan; Dehghani Esmatabad, Faezeh; Eslampanah Seyedi, Hadieh Alsadat; Bardania, Hassan; Tayaranian Marvian, Amir; Collingwood, Joanna F; Christiansen, Gunna; Zweckstetter, Markus; Otzen, Daniel E; Morshedi, Dina

    2018-05-17

    The protein α-synuclein (αSN) aggregates to form fibrils in neuronal cells of Parkinson's patients. Here we report on the effect of neutral (zwitterionic) nanoliposomes (NLPs), supplemented with cholesterol (NLP-Chol) and decorated with PEG (NLP-Chol-PEG), on αSN aggregation and neurotoxicity. Both NLPs retard αSN fibrillization in a concentration-independent fashion. They do so largely by increasing lag time (formation of fibrillization nuclei) rather than elongation (extension of existing nuclei). Interactions between neutral NLPs and αSN may locate to the N-terminus of the protein. This interaction can even perturb the interaction of αSN with negatively charged NLPs which induces an α-helical structure in αSN. This interaction was found to occur throughout the fibrillization process. Both NLP-Chol and NLP-Chol-PEG were shown to be biocompatible in vitro, and to reduce αSN neurotoxicity and reactive oxygen species (ROS) levels with no influence on intracellular calcium in neuronal cells, emphasizing a prospective role for NLPs in reducing αSN pathogenicity in vivo as well as utility as a vehicle for drug delivery.

  10. DNA-targeted 2-nitroimidazoles: studies of the influence of the phenanthridine-linked nitroimidazoles, 2-NLP-3 and 2-NLP-4, on DNA damage induced by ionizing radiation.

    PubMed

    Buchko, Garry W; Weinfeld, Michael

    2002-09-01

    The nitroimidazole-linked phenanthridines 2-NLP-3 (5-[3-(2-nitro-1-imidazoyl)-propyl]-phenanthridinium bromide) and 2-NLP-4 (5-[3-(2-nitro-1-imidazoyl)-butyl]-phenanthridinium bromide) are composed of the radiosensitizer, 2-nitroimidazole, attached to the DNA intercalator phenanthridine by a 3- and 4-carbon linker, respectively. Previous in vitro assays showed both compounds to be 10-100 times more efficient as hypoxic cell radiosensitizers (based on external drug concentrations) than the untargeted 2-nitroimidazole radiosensitizer, misonidazole (Cowan et al., Radiat. Res. 127, 81-89, 1991). Here we have used a (32)P postlabeling assay and 5'-end-labeled oligonucleotide assay to compare the radiation-induced DNA damage generated in the presence of 2-NLP-3, 2-NLP-4, phenanthridine and misonidazole. After irradiation of the DNA under anoxic conditions, we observed a significantly greater level of 3'-phosphoglycolate DNA damage in the presence of 2-NLP-3 or 2-NLP-4 compared to irradiation of the DNA in the presence of misonidazole. This may account at least in part for the greater cellular radiosensitization shown by the nitroimidazole-linked phenanthridines over misonidazole. Of the two nitroimidazole-linked phenanthridines, the better in vitro radiosensitizer, 2-NLP-4, generated more 3'-phosphoglycolate in DNA than did 2-NLP-3. At all concentrations, phenanthridine had little effect on the levels of DNA damage, suggesting that the enhanced radiosensitization displayed by 2-NLP-3 and 2-NLP-4 is due to the localization of the 2-nitroimidazole to the DNA by the phenanthridine substituent and not to radiosensitization by the phenanthridine moiety itself.

  11. Drosophila TAP/p32 is a core histone chaperone that cooperates with NAP-1, NLP, and nucleophosmin in sperm chromatin remodeling during fertilization

    PubMed Central

    Emelyanov, Alexander V.; Rabbani, Joshua; Mehta, Monika; Vershilova, Elena; Keogh, Michael C.

    2014-01-01

    Nuclear DNA in the male gamete of sexually reproducing animals is organized as sperm chromatin compacted primarily by sperm-specific protamines. Fertilization leads to sperm chromatin remodeling, during which protamines are expelled and replaced by histones. Despite our increased understanding of the factors that mediate nucleosome assembly in the nascent male pronucleus, the machinery for protamine removal remains largely unknown. Here we identify four Drosophila protamine chaperones that mediate the dissociation of protamine–DNA complexes: NAP-1, NLP, and nucleophosmin are previously characterized histone chaperones, and TAP/p32 has no known function in chromatin metabolism. We show that TAP/p32 is required for the removal of Drosophila protamine B in vitro, whereas NAP-1, NLP, and Nph share roles in the removal of protamine A. Embryos from P32-null females show defective formation of the male pronucleus in vivo. TAP/p32, similar to NAP-1, NLP, and Nph, facilitates nucleosome assembly in vitro and is therefore a histone chaperone. Furthermore, mutants of P32, Nlp, and Nph exhibit synthetic-lethal genetic interactions. In summary, we identified factors mediating protamine removal from DNA and reconstituted in a defined system the process of sperm chromatin remodeling that exchanges protamines for histones to form the nucleosome-based chromatin characteristic of somatic cells. PMID:25228646

  12. Mitotic regulator Nlp interacts with XPA/ERCC1 complexes and regulates nucleotide excision repair (NER) in response to UV radiation.

    PubMed

    Ma, Xiao-Juan; Shang, Li; Zhang, Wei-Min; Wang, Ming-Rong; Zhan, Qi-Min

    2016-04-10

    Cellular response to DNA damage, including ionizing radiation (IR) and UV radiation, is critical for the maintenance of genomic fidelity. Defects of DNA repair often result in genomic instability and malignant cell transformation. Centrosomal protein Nlp (ninein-like protein) has been characterized as an important cell cycle regulator that is required for proper mitotic progression. In this study, we demonstrate that Nlp is able to improve nucleotide excision repair (NER) activity and protects cells against UV radiation. Upon exposure of cells to UVC, Nlp is translocated into the nucleus. The C-terminus (1030-1382) of Nlp is necessary and sufficient for its nuclear import. Upon UVC radiation, Nlp interacts with XPA and ERCC1, and enhances their association. Interestingly, down-regulated expression of Nlp is found to be associated with human skin cancers, indicating that dysregulated Nlp might be related to the development of human skin cancers. Taken together, this study identifies mitotic protein Nlp as a new and important member of NER pathway and thus provides novel insights into understanding of regulatory machinery involved in NER. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  13. A Part-Of-Speech term weighting scheme for biomedical information retrieval.

    PubMed

    Wang, Yanshan; Wu, Stephen; Li, Dingcheng; Mehrabi, Saeed; Liu, Hongfang

    2016-10-01

    In the era of digitalization, information retrieval (IR), which retrieves and ranks documents from large collections according to users' search queries, has been popularly applied in the biomedical domain. Building patient cohorts using electronic health records (EHRs) and searching literature for topics of interest are some IR use cases. Meanwhile, natural language processing (NLP), such as tokenization or Part-Of-Speech (POS) tagging, has been developed for processing clinical documents or biomedical literature. We hypothesize that NLP can be incorporated into IR to strengthen the conventional IR models. In this study, we propose two NLP-empowered IR models, POS-BoW and POS-MRF, which incorporate automatic POS-based term weighting schemes into bag-of-word (BoW) and Markov Random Field (MRF) IR models, respectively. In the proposed models, the POS-based term weights are iteratively calculated by utilizing a cyclic coordinate method where golden section line search algorithm is applied along each coordinate to optimize the objective function defined by mean average precision (MAP). In the empirical experiments, we used the data sets from the Medical Records track in Text REtrieval Conference (TREC) 2011 and 2012 and the Genomics track in TREC 2004. The evaluation on TREC 2011 and 2012 Medical Records tracks shows that, for the POS-BoW models, the mean improvement rates for IR evaluation metrics, MAP, bpref, and P@10, are 10.88%, 4.54%, and 3.82%, compared to the BoW models; and for the POS-MRF models, these rates are 13.59%, 8.20%, and 8.78%, compared to the MRF models. Additionally, we experimentally verify that the proposed weighting approach is superior to the simple heuristic and frequency based weighting approaches, and validate our POS category selection. 
    Using the optimal weights calculated in this experiment, we tested the proposed models on the TREC 2004 Genomics track and obtained average improvement rates of 8.63% and 10.04% for POS-BoW and POS-MRF, respectively. These significant improvements verify the effectiveness of leveraging POS tagging for biomedical IR tasks. Copyright © 2016 Elsevier Inc. All rights reserved.
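
    The optimization loop named in this abstract (a cyclic coordinate method with a golden section line search along each coordinate) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the search bounds, and the toy objective standing in for mean average precision are all assumptions made for the example.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618


def golden_section_max(f, lo, hi, tol=1e-5):
    """Approximate the maximizer of a unimodal function f on [lo, hi]."""
    c = hi - INV_PHI * (hi - lo)
    d = lo + INV_PHI * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc > fd:
            # The maximum lies in [lo, d]; reuse c as the new upper probe.
            hi, d, fd = d, c, fc
            c = hi - INV_PHI * (hi - lo)
            fc = f(c)
        else:
            # The maximum lies in [c, hi]; reuse d as the new lower probe.
            lo, c, fc = c, d, fd
            d = lo + INV_PHI * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2


def cyclic_coordinate_max(objective, weights, bounds=(0.0, 2.0),
                          sweeps=20, tol=1e-6):
    """Optimize one weight at a time, cycling until a sweep stops helping."""
    w = list(weights)
    best = objective(w)
    for _ in range(sweeps):
        for i in range(len(w)):
            def along_axis(x, i=i):
                # Vary only coordinate i; all other weights stay fixed.
                return objective(w[:i] + [x] + w[i + 1:])
            w[i] = golden_section_max(along_axis, *bounds)
        current = objective(w)
        if current - best < tol:
            break
        best = current
    return w


def toy_map(w):
    """Hypothetical stand-in for MAP: a concave surface peaking at (1.2, 0.5)."""
    a, b = w[0] - 1.2, w[1] - 0.5
    return 1.0 - a * a - b * b - 0.5 * a * b


# Two hypothetical POS-category weights; the result should approach (1.2, 0.5).
print(cyclic_coordinate_max(toy_map, [0.0, 0.0]))
```

    In a real retrieval setting the objective would rerun the ranking and score it against relevance judgments, so each line search would be far more expensive than in this toy case. Golden section search also assumes the objective is unimodal along each coordinate, which is not guaranteed for MAP.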

  14. Using natural language processing to identify problem usage of prescription opioids.

    PubMed

    Carrell, David S; Cronkite, David; Palmer, Roy E; Saunders, Kathleen; Gross, David E; Masters, Elizabeth T; Hylan, Timothy R; Von Korff, Michael

    2015-12-01

    Accurate and scalable surveillance methods are critical to understand widespread problems associated with misuse and abuse of prescription opioids and for implementing effective prevention and control measures. Traditional diagnostic coding incompletely documents problem use. Relevant information for each patient is often obscured in vast amounts of clinical text. We developed and evaluated a method that combines natural language processing (NLP) and computer-assisted manual review of clinical notes to identify evidence of problem opioid use in electronic health records (EHRs). We used the EHR data and text of 22,142 patients receiving chronic opioid therapy (≥70 days' supply of opioids per calendar quarter) during 2006-2012 to develop and evaluate an NLP-based surveillance method and compare it to traditional methods based on International Classification of Diseases, Ninth Revision (ICD-9) codes. We developed a 1288-term dictionary for clinician mentions of opioid addiction, abuse, misuse or overuse, and an NLP system to identify these mentions in unstructured text. The system distinguished affirmative mentions from those that were negated or otherwise qualified. We applied this system to 7,336,445 electronic chart notes of the 22,142 patients. Trained abstractors using a custom computer-assisted software interface manually reviewed 7751 chart notes (from 3156 patients) selected by the NLP system and classified each note as to whether or not it contained textual evidence of problem opioid use. Traditional diagnostic codes for problem opioid use were found for 2240 (10.1%) patients. NLP-assisted manual review identified an additional 728 (3.1%) patients with evidence of clinically diagnosed problem opioid use in clinical notes. Inter-rater reliability among pairs of abstractors reviewing notes was high, with kappa=0.86 and 97% agreement for one pair, and kappa=0.71 and 88% agreement for another pair.
    Scalable, semi-automated NLP methods can efficiently and accurately identify evidence of problem opioid use in vast amounts of EHR text. Incorporating such methods into surveillance efforts may increase prevalence estimates by as much as one-third relative to traditional methods. Copyright © 2015. Published by Elsevier Ireland Ltd.
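
    The abstract reports both kappa and raw percent agreement for each pair of abstractors. The sketch below shows how the two relate, assuming Cohen's kappa for two raters (the abstract does not name the variant used). The label lists are invented for illustration and are not data from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of notes")
    n = len(rater_a)
    # Observed agreement: share of notes on which the raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if each rater labeled independently at their own rates.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels: 1 = note shows problem opioid use, 0 = it does not.
pair_a = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
pair_b = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
print(round(cohens_kappa(pair_a, pair_b), 3))  # 80% raw agreement gives kappa 0.6
```

    Because kappa discounts the agreement expected by chance, it is always at or below the raw agreement rate, which is consistent with the paired figures quoted above (0.86 alongside 97%, 0.71 alongside 88%).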

  15. Combining textual and visual information for image retrieval in the medical domain.

    PubMed

    Gkoufas, Yiannis; Morou, Anna; Kalamboukis, Theodore

    2011-01-01

    In this article we summarize the experience gained from our participation in the imageCLEF evaluation task over the past two years. We explored linear combinations of visual and textual sources for image retrieval. From our experiments we conclude that a mixed retrieval technique that alternates repeatedly between textual and visual retrieval improves performance while overcoming the scalability limitations of visual retrieval. In particular, the mean average precision (MAP) increased from 0.01 to 0.15 and 0.087 for the 2009 and 2010 data, respectively, when content-based image retrieval (CBIR) was performed on the top 1000 results from textual retrieval based on natural language processing (NLP).
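
    One way to read the last sentence of this abstract is as a two-stage pipeline: textual retrieval selects a candidate pool, and the costly visual comparison runs only on that pool before the two scores are combined linearly. The sketch below illustrates that idea under stated assumptions; the function name, the mixing weight alpha, and the score values are hypothetical and not taken from the article.

```python
def rerank_top_k(text_scores, visual_score, k=1000, alpha=0.5):
    """Re-rank the k best textual hits by a linear mix of text and visual scores.

    text_scores:  dict mapping document id to textual retrieval score
    visual_score: callable returning a visual similarity for a document id;
                  it is called only for the k candidates, which bounds its cost
    alpha:        weight on the textual score (assumed value, tune per collection)
    """
    candidates = sorted(text_scores, key=text_scores.get, reverse=True)[:k]
    combined = {
        doc: alpha * text_scores[doc] + (1.0 - alpha) * visual_score(doc)
        for doc in candidates
    }
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical scores, both assumed to be normalized to [0, 1].
text = {"img_a": 0.9, "img_b": 0.8, "img_c": 0.1}
visual = {"img_a": 0.1, "img_b": 0.9, "img_c": 1.0}
print(rerank_top_k(text, visual.get, k=2))  # img_c never reaches the visual stage
```

    Restricting the visual stage to the textual top k is what addresses the scalability limit mentioned in the abstract, at the price of never recovering images that textual retrieval ranked below the cutoff.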

  16. An information extraction framework for cohort identification using electronic health records.

    PubMed

    Liu, Hongfang; Bielinski, Suzette J; Sohn, Sunghwan; Murphy, Sean; Wagholikar, Kavishwar B; Jonnalagadda, Siddhartha R; Ravikumar, K E; Wu, Stephen T; Kullo, Iftikhar J; Chute, Christopher G

    2013-01-01

    Information extraction (IE), a natural language processing (NLP) task that automatically extracts structured or semi-structured information from free text, has become popular in the clinical domain for supporting automated systems at point-of-care and enabling secondary use of electronic health records (EHRs) for clinical and translational research. However, a high performance IE system can be very challenging to construct due to the complexity and dynamic nature of human language. In this paper, we report an IE framework for cohort identification using EHRs that is a knowledge-driven framework developed under the Unstructured Information Management Architecture (UIMA). A system to extract specific information can be developed by subject matter experts through expert knowledge engineering of the externalized knowledge resources used in the framework.

  17. Prediction task guided representation learning of medical codes in EHR.

    PubMed

    Cui, Liwen; Xie, Xiaolei; Shen, Zuojun

    2018-06-18

    There have been rapidly growing applications using machine learning models for predictive analytics in Electronic Health Records (EHR) to improve the quality of hospital services and the efficiency of healthcare resource utilization. A fundamental and crucial step in developing such models is to convert medical codes in EHR to feature vectors. These medical codes are used to represent diagnoses or procedures. Their vector representations have a tremendous impact on the performance of machine learning models. Recently, some researchers have utilized representation learning methods from Natural Language Processing (NLP) to learn vector representations of medical codes. However, most previous approaches are unsupervised, i.e. the generation of medical code vectors is independent from prediction tasks. Thus, the obtained feature vectors may be inappropriate for a specific prediction task. Moreover, unsupervised methods often require a lot of samples to obtain reliable results, but most practical problems have very limited patient samples. In this paper, we develop a new method called Prediction Task Guided Health Record Aggregation (PTGHRA), which aggregates health records guided by prediction tasks, to construct training corpus for various representation learning models. Compared with unsupervised approaches, representation learning models integrated with PTGHRA yield a significant improvement in predictive capability of generated medical code vectors, especially for limited training samples. Copyright © 2018. Published by Elsevier Inc.

  18. Eudicot plant-specific sphingolipids determine host selectivity of microbial NLP cytolysins.

    PubMed

    Lenarčič, Tea; Albert, Isabell; Böhm, Hannah; Hodnik, Vesna; Pirc, Katja; Zavec, Apolonija B; Podobnik, Marjetka; Pahovnik, David; Žagar, Ema; Pruitt, Rory; Greimel, Peter; Yamaji-Hasegawa, Akiko; Kobayashi, Toshihide; Zienkiewicz, Agnieszka; Gömann, Jasmin; Mortimer, Jenny C; Fang, Lin; Mamode-Cassim, Adiilah; Deleu, Magali; Lins, Laurence; Oecking, Claudia; Feussner, Ivo; Mongrand, Sébastien; Anderluh, Gregor; Nürnberger, Thorsten

    2017-12-15

    Necrosis and ethylene-inducing peptide 1-like (NLP) proteins constitute a superfamily of proteins produced by plant pathogenic bacteria, fungi, and oomycetes. Many NLPs are cytotoxins that facilitate microbial infection of eudicot, but not of monocot plants. Here, we report glycosylinositol phosphorylceramide (GIPC) sphingolipids as NLP toxin receptors. Plant mutants with altered GIPC composition were more resistant to NLP toxins. Binding studies and x-ray crystallography showed that NLPs form complexes with terminal monomeric hexose moieties of GIPCs that result in conformational changes within the toxin. Insensitivity to NLP cytolysins of monocot plants may be explained by the length of the GIPC head group and the architecture of the NLP sugar-binding site. We unveil early steps in NLP cytolysin action that determine plant clade-specific toxin selectivity. Copyright © 2017 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.

  19. Polo-like kinase 1 regulates Nlp, a centrosome protein involved in microtubule nucleation.

    PubMed

    Casenghi, Martina; Meraldi, Patrick; Weinhart, Ulrike; Duncan, Peter I; Körner, Roman; Nigg, Erich A

    2003-07-01

    In animal cells, most microtubules are nucleated at centrosomes. At the onset of mitosis, centrosomes undergo a structural reorganization, termed maturation, which leads to increased microtubule nucleation activity. Centrosome maturation is regulated by several kinases, including Polo-like kinase 1 (Plk1). Here, we identify a centrosomal Plk1 substrate, termed Nlp (ninein-like protein), whose properties suggest an important role in microtubule organization. Nlp interacts with two components of the gamma-tubulin ring complex and stimulates microtubule nucleation. Plk1 phosphorylates Nlp and disrupts both its centrosome association and its gamma-tubulin interaction. Overexpression of an Nlp mutant lacking Plk1 phosphorylation sites severely disturbs mitotic spindle formation. We propose that Nlp plays an important role in microtubule organization during interphase, and that the activation of Plk1 at the onset of mitosis triggers the displacement of Nlp from the centrosome, allowing the establishment of a mitotic scaffold with enhanced microtubule nucleation activity.

  20. NLP is a novel transcription regulator involved in VSG expression site control in Trypanosoma brucei.

    PubMed

    Narayanan, Mani Shankar; Kushwaha, Manish; Ersfeld, Klaus; Fullbrook, Alexander; Stanne, Tara M; Rudenko, Gloria

    2011-03-01

    Trypanosoma brucei mono-allelically expresses one of approximately 1500 variant surface glycoprotein (VSG) genes while multiplying in the mammalian bloodstream. The active VSG is transcribed by RNA polymerase I in one of approximately 15 telomeric VSG expression sites (ESs). T. brucei is unusual in controlling gene expression predominantly post-transcriptionally, and how ESs are mono-allelically controlled remains a mystery. Here we identify a novel transcription regulator, which resembles a nucleoplasmin-like protein (NLP) with an AT-hook motif. NLP is key for ES control in bloodstream form T. brucei, as NLP knockdown results in 45- to 65-fold derepression of the silent VSG221 ES. NLP is also involved in repression of transcription in the inactive VSG Basic Copy arrays, minichromosomes and procyclin loci. NLP is shown to be enriched on the 177- and 50-bp simple sequence repeats, the non-transcribed regions around rDNA and procyclin, and both active and silent ESs. Blocking NLP synthesis leads to downregulation of the active ES, indicating that NLP plays a role in regulating appropriate levels of transcription of ESs in both their active and silent state. Discovery of the unusual transcription regulator NLP provides new insight into the factors that are critical for ES control.

  1. An Introduction to Natural Language Processing: How You Can Get More From Those Electronic Notes You Are Generating.

    PubMed

    Kimia, Amir A; Savova, Guergana; Landschaft, Assaf; Harper, Marvin B

    2015-07-01

    Electronically stored clinical documents may contain both structured data and unstructured data. The use of structured clinical data varies by facility, but clinicians are familiar with coded data such as International Classification of Diseases, Ninth Revision, Systematized Nomenclature of Medicine-Clinical Terms codes, and commonly other data including patient chief complaints or laboratory results. Most electronic health records have much more clinical information stored as unstructured data, for example, clinical narrative such as history of present illness, procedure notes, and clinical decision making are stored as unstructured data. Despite the importance of this information, electronic capture or retrieval of unstructured clinical data has been challenging. The field of natural language processing (NLP) is undergoing rapid development, and existing tools can be successfully used for quality improvement, research, healthcare coding, and even billing compliance. In this brief review, we provide examples of successful uses of NLP using emergency medicine physician visit notes for various projects and the challenges of retrieving specific data and finally present practical methods that can run on a standard personal computer as well as high-end state-of-the-art funded processes run by leading NLP informatics researchers.

  2. NlpI contributes to Escherichia coli K1 strain RS218 interaction with human brain microvascular endothelial cells.

    PubMed

    Teng, Ching-Hao; Tseng, Yu-Ting; Maruvada, Ravi; Pearce, Donna; Xie, Yi; Paul-Satyaseela, Maneesh; Kim, Kwang Sik

    2010-07-01

    Escherichia coli K1 is the most common Gram-negative bacillary organism causing neonatal meningitis. E. coli K1 binding to and invasion of human brain microvascular endothelial cells (HBMECs) is a prerequisite for its traversal of the blood-brain barrier (BBB) and penetration into the brain. In the present study, we identified NlpI as a novel bacterial determinant contributing to E. coli K1 interaction with HBMECs. The deletion of nlpI did not affect the expression of the known bacterial determinants involved in E. coli K1-HBMEC interaction, such as type 1 fimbriae, flagella, and OmpA, and the contribution of NlpI to HBMECs binding and invasion was independent of those bacterial determinants. Previous reports have shown that the nlpI mutant of E. coli K-12 exhibits growth defect at 42 degrees C at low osmolarity, and its thermosensitive phenotype can be suppressed by a mutation on the spr gene. The nlpI mutant of strain RS218 exhibited similar thermosensitive phenotype, but additional spr mutation did not restore the ability of the nlpI mutant to interact with HBMECs. These findings suggest the decreased ability of the nlpI mutant to interact with HBMECs is not associated with the thermosensitive phenotype. NlpI was determined as an outer membrane-anchored protein in E. coli, and the nlpI mutant was defective in cytosolic phospholipase A(2)alpha (cPLA(2)alpha) phosphorylation compared to the parent strain. These findings illustrate the first demonstration of NlpI's contribution to E. coli K1 binding to and invasion of HBMECs, and its contribution is likely to involve cPLA(2)alpha.

  3. De-identification of clinical notes via recurrent neural network and conditional random field.

    PubMed

    Liu, Zengjian; Tang, Buzhou; Wang, Xiaolong; Chen, Qingcai

    2017-11-01

    De-identification, the identification and removal of information such as protected health information (PHI) present in clinical data, is a critical step to enable data to be shared or published. The 2016 Centers of Excellence in Genomic Science (CEGS) Neuropsychiatric Genome-scale and RDOC Individualized Domains (N-GRID) clinical natural language processing (NLP) challenge includes a track on de-identifying electronic medical records (EMRs) (i.e., track 1). The challenge organizers provide 1000 annotated mental health records for this track, 600 out of which are used as a training set and 400 as a test set. We develop a hybrid system for the de-identification task on the training set. Firstly, four individual subsystems, that is, a subsystem based on bidirectional LSTM (long short-term memory, a variant of recurrent neural network), a subsystem based on bidirectional LSTM with features, a subsystem based on conditional random field (CRF) and a rule-based subsystem, are used to identify PHI instances. Then, an ensemble learning-based classifier is deployed to combine all PHI instances predicted by the three machine learning-based subsystems. Finally, the results of the ensemble learning-based classifier and the rule-based subsystem are merged together. Experiments conducted on the official test set show that our system achieves the highest micro F1-scores of 93.07%, 91.43% and 95.23% under the "token", "strict" and "binary token" criteria respectively, ranking first in the 2016 CEGS N-GRID NLP challenge. In addition, on the dataset of 2014 i2b2 NLP challenge, our system achieves the highest micro F1-scores of 96.98%, 95.11% and 98.28% under the "token", "strict" and "binary token" criteria respectively, outperforming other state-of-the-art systems. All these experiments prove the effectiveness of our proposed method. Copyright © 2017. Published by Elsevier Inc.

  4. Dense Annotation of Free-Text Critical Care Discharge Summaries from an Indian Hospital and Associated Performance of a Clinical NLP Annotator.

    PubMed

    Ramanan, S V; Radhakrishna, Kedar; Waghmare, Abijeet; Raj, Tony; Nathan, Senthil P; Sreerama, Sai Madhukar; Sampath, Sriram

    2016-08-01

    Electronic Health Record (EHR) use in India is generally poor, and structured clinical information is mostly lacking. This work is the first attempt aimed at evaluating unstructured text mining for extracting relevant clinical information from Indian clinical records. We annotated a corpus of 250 discharge summaries from an Intensive Care Unit (ICU) in India, with markups for diseases, procedures, and lab parameters, their attributes, as well as key demographic information and administrative variables such as patient outcomes. In this process, we have constructed guidelines for an annotation scheme useful to clinicians in the Indian context. We evaluated the performance of an NLP engine, Cocoa, on a cohort of these Indian clinical records. We have produced an annotated corpus of roughly 90 thousand words, which to our knowledge is the first tagged clinical corpus from India. Cocoa was evaluated on a test corpus of 50 documents. The overlap F-scores across the major categories, namely disease/symptoms, procedures, laboratory parameters and outcomes, are 0.856, 0.834, 0.961 and 0.872 respectively. These results are competitive with results from recent shared tasks based on US records. The annotated corpus and associated results from the Cocoa engine indicate that unstructured text mining is a viable method for cohort analysis in the Indian clinical context, where structured EHR records are largely absent.

  5. Drosophila TAP/p32 is a core histone chaperone that cooperates with NAP-1, NLP, and nucleophosmin in sperm chromatin remodeling during fertilization.

    PubMed

    Emelyanov, Alexander V; Rabbani, Joshua; Mehta, Monika; Vershilova, Elena; Keogh, Michael C; Fyodorov, Dmitry V

    2014-09-15

    Nuclear DNA in the male gamete of sexually reproducing animals is organized as sperm chromatin compacted primarily by sperm-specific protamines. Fertilization leads to sperm chromatin remodeling, during which protamines are expelled and replaced by histones. Despite our increased understanding of the factors that mediate nucleosome assembly in the nascent male pronucleus, the machinery for protamine removal remains largely unknown. Here we identify four Drosophila protamine chaperones that mediate the dissociation of protamine-DNA complexes: NAP-1, NLP, and nucleophosmin are previously characterized histone chaperones, and TAP/p32 has no known function in chromatin metabolism. We show that TAP/p32 is required for the removal of Drosophila protamine B in vitro, whereas NAP-1, NLP, and Nph share roles in the removal of protamine A. Embryos from P32-null females show defective formation of the male pronucleus in vivo. TAP/p32, similar to NAP-1, NLP, and Nph, facilitates nucleosome assembly in vitro and is therefore a histone chaperone. Furthermore, mutants of P32, Nlp, and Nph exhibit synthetic-lethal genetic interactions. In summary, we identified factors mediating protamine removal from DNA and reconstituted in a defined system the process of sperm chromatin remodeling that exchanges protamines for histones to form the nucleosome-based chromatin characteristic of somatic cells. © 2014 Emelyanov et al.; Published by Cold Spring Harbor Laboratory Press.

  6. Natural language processing and visualization in the molecular imaging domain.

    PubMed

    Tulipano, P Karina; Tao, Ying; Millar, William S; Zanzonico, Pat; Kolbert, Katherine; Xu, Hua; Yu, Hong; Chen, Lifeng; Lussier, Yves A; Friedman, Carol

    2007-06-01

    Molecular imaging is at the crossroads of genomic sciences and medical imaging. Information within the molecular imaging literature could be used to link to genomic and imaging information resources and to organize and index images in a way that is potentially useful to researchers. A number of natural language processing (NLP) systems are available to automatically extract information from genomic literature. One existing NLP system, known as BioMedLEE, automatically extracts biological information consisting of biomolecular substances and phenotypic data. This paper focuses on the adaptation, evaluation, and application of BioMedLEE to the molecular imaging domain. In order to adapt BioMedLEE for this domain, we extend an existing molecular imaging terminology and incorporate it into BioMedLEE. BioMedLEE's performance is assessed with a formal evaluation study. The system's performance, measured as recall and precision, is 0.74 (95% CI: [.70-.76]) and 0.70 (95% CI [.63-.76]), respectively. We adapt a JAVA viewer known as PGviewer for the simultaneous visualization of images with NLP extracted information.

  7. An annotated corpus with nanomedicine and pharmacokinetic parameters

    PubMed Central

    Lewinski, Nastassja A; Jimenez, Ivan; McInnes, Bridget T

    2017-01-01

    A vast amount of data on nanomedicines is being generated and published, and natural language processing (NLP) approaches can automate the extraction of unstructured text-based data. Annotated corpora are a key resource for NLP and information extraction methods which employ machine learning. Although corpora are available for pharmaceuticals, resources for nanomedicines and nanotechnology are still limited. To foster nanotechnology text mining (NanoNLP) efforts, we have constructed a corpus of annotated drug product inserts taken from the US Food and Drug Administration’s Drugs@FDA online database. In this work, we present the development of the Engineered Nanomedicine Database corpus to support the evaluation of nanomedicine entity extraction. The data were manually annotated for 21 entity mentions consisting of nanomedicine physicochemical characterization, exposure, and biologic response information of 41 Food and Drug Administration-approved nanomedicines. We evaluate the reliability of the manual annotations and demonstrate the use of the corpus by evaluating two state-of-the-art named entity extraction systems, OpenNLP and Stanford NER. The annotated corpus is available open source and, based on these results, guidelines and suggestions for future development of additional nanomedicine corpora are provided. PMID:29066897

  8. An early illness recognition framework using a temporal Smith Waterman algorithm and NLP.

    PubMed

    Hajihashemi, Zahra; Popescu, Mihail

    2013-01-01

    In this paper we propose a framework for detecting health patterns based on non-wearable sensor sequence similarity and natural language processing (NLP). In TigerPlace, an aging in place facility in Columbia, MO, we deployed 47 sensor networks together with a nursing electronic health record (EHR) system to provide early illness recognition. The proposed framework utilizes sensor sequence similarity and NLP on EHR nursing comments to automatically notify the physician when health problems are detected. The reported methodology is inspired by genomic sequence annotation using similarity algorithms such as Smith-Waterman (SW). Similarly, for each sensor sequence, we associate health concepts extracted from the nursing notes using MetaMap, an NLP tool provided by the Unified Medical Language System (UMLS). Since sensor sequences, unlike genomic ones, have an associated time dimension, we propose a temporal variant of SW (TSW) to account for time. The main challenges presented by our framework are finding the most suitable time sequence similarity and aggregation of the retrieved UMLS concepts. On a pilot dataset from three TigerPlace residents, with a total of 1685 sensor days and 626 nursing records, we obtained an average precision of 0.64 and a recall of 0.37.
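
    For orientation, the standard Smith-Waterman local alignment score that the framework builds on can be computed as in the sketch below. It covers only the classic algorithm: the abstract does not specify the temporal variant (TSW), so no attempt is made to reproduce it. The scoring parameters and the encoding of sensor events as letters are assumptions chosen for the example.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between sequences a and b (classic SW)."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of an alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a fresh local alignment here
                H[i - 1][j - 1] + step,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,    # gap in b
                H[i][j - 1] + gap,    # gap in a
            )
            best = max(best, H[i][j])
    return best


# Hypothetical encoding: each letter is one discretized sensor reading.
print(smith_waterman_score("ABCD", "XBCY"))  # shared run "BC" scores 2 + 2 = 4
```

    Flooring every cell at zero is what makes the alignment local: a similar stretch is rewarded wherever it occurs, regardless of how different the rest of the two sequences are.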

  9. Neurolinguistic programming: a systematic review of the effects on health outcomes.

    PubMed

    Sturt, Jackie; Ali, Saima; Robertson, Wendy; Metcalfe, David; Grove, Amy; Bourne, Claire; Bridle, Chris

    2012-11-01

    Neurolinguistic programming (NLP) in health care has captured the interest of doctors, healthcare professionals, and managers. This systematic review of experimental studies evaluated the effects of NLP on health-related outcomes. The following data sources were searched: MEDLINE, PsycINFO, ASSIA, AMED, CINAHL, Web of Knowledge, CENTRAL, NLP specialist databases, reference lists, review articles, and NLP professional associations, training providers, and research groups. Searches revealed 1459 titles from which 10 experimental studies were included. Five studies were randomised controlled trials (RCTs) and five were pre-post studies. Targeted health conditions were anxiety disorders, weight maintenance, morning sickness, substance misuse, and claustrophobia during MRI scanning. NLP interventions were mainly delivered across 4-20 sessions although three were single session. Eighteen outcomes were reported and the RCT sample sizes ranged from 22 to 106. Four RCTs reported no significant between group differences with the fifth finding in favour of the NLP arm (F = 8.114, P<0.001). Three RCTs and five pre-post studies reported within group improvements. Risk of bias across all studies was high or uncertain. There is little evidence that NLP interventions improve health-related outcomes. This conclusion reflects the limited quantity and quality of NLP research, rather than robust evidence of no effect. There is currently insufficient evidence to support the allocation of NHS resources to NLP activities outside of research purposes.

  10. Creation of Lung-Targeted Dexamethasone Immunoliposome and Its Therapeutic Effect on Bleomycin-Induced Lung Injury in Rats

    PubMed Central

    Li, Nan; Hu, Yang; Zhang, Yuan; Xu, Jin-Fu; Li, Xia; Ren, Jie; Su, Bo; Yuan, Wei-Zhong; Teng, Xin-Rong; Zhang, Rong-Xuan; Jiang, Dian-hua; Mulet, Xavier; Li, Hui-Ping

    2013-01-01

    Objective: Acute lung injury (ALI) is a major cause of morbidity and mortality, which is routinely treated with the administration of systemic glucocorticoids. The current study investigated the distribution and therapeutic effect of a dexamethasone (DXM)-loaded immunoliposome (NLP) functionalized with pulmonary surfactant protein A (SP-A) antibody (SPA-DXM-NLP) in an animal model. Methods: DXM-NLP was prepared using film dispersion combined with extrusion techniques. SP-A antibody was used as the lung targeting agent. Tissue distribution of SPA-DXM-NLP was investigated in liver, spleen, kidney and lung tissue. The efficacy of SPA-DXM-NLP against lung injury was assessed in a rat model of bleomycin-induced acute lung injury. Results: The SPA-DXM-NLP complex was successfully synthesized and the particles were stable at 4°C. Pulmonary dexamethasone levels were 40 times higher with SPA-DXM-NLP than conventional dexamethasone injection. Administration of SPA-DXM-NLP significantly attenuated lung injury and inflammation, decreased incidence of infection, and increased survival in animal models. Conclusions: The administration of SPA-DXM-NLP to animal models resulted in increased levels of DXM in the lungs, indicating active targeting. The efficacy against ALI of the immunoliposomes was shown to be superior to conventional dexamethasone administration. These results demonstrate the potential of actively targeted glucocorticoid therapy in the treatment of lung disease in clinical practice. PMID:23516459

  11. Neurolinguistic programming: a systematic review of the effects on health outcomes

    PubMed Central

    Sturt, Jackie; Ali, Saima; Robertson, Wendy; Metcalfe, David; Grove, Amy; Bourne, Claire; Bridle, Chris

    2012-01-01

    Background: Neurolinguistic programming (NLP) in health care has captured the interest of doctors, healthcare professionals, and managers. Aim: To evaluate the effects of NLP on health-related outcomes. Design and setting: Systematic review of experimental studies. Method: The following data sources were searched: MEDLINE®, PsycINFO, ASSIA, AMED, CINAHL®, Web of Knowledge, CENTRAL, NLP specialist databases, reference lists, review articles, and NLP professional associations, training providers, and research groups. Results: Searches revealed 1459 titles from which 10 experimental studies were included. Five studies were randomised controlled trials (RCTs) and five were pre-post studies. Targeted health conditions were anxiety disorders, weight maintenance, morning sickness, substance misuse, and claustrophobia during MRI scanning. NLP interventions were mainly delivered across 4–20 sessions although three were single session. Eighteen outcomes were reported and the RCT sample sizes ranged from 22 to 106. Four RCTs reported no significant between group differences with the fifth finding in favour of the NLP arm (F = 8.114, P<0.001). Three RCTs and five pre-post studies reported within group improvements. Risk of bias across all studies was high or uncertain. Conclusion: There is little evidence that NLP interventions improve health-related outcomes. This conclusion reflects the limited quantity and quality of NLP research, rather than robust evidence of no effect. There is currently insufficient evidence to support the allocation of NHS resources to NLP activities outside of research purposes. PMID:23211179

  12. Using ontology network structure in text mining.

    PubMed

    Berndt, Donald J; McCart, James A; Luther, Stephen L

    2010-11-13

    Statistical text mining treats documents as bags of words, with a focus on term frequencies within documents and across document collections. Unlike natural language processing (NLP) techniques that rely on an engineered vocabulary or a full-featured ontology, statistical approaches do not make use of domain-specific knowledge. The freedom from biases can be an advantage, but at the cost of ignoring potentially valuable knowledge. The approach proposed here investigates a hybrid strategy based on computing graph measures of term importance over an entire ontology and injecting the measures into the statistical text mining process. As a starting point, we adapt existing search engine algorithms such as PageRank and HITS to determine term importance within an ontology graph. The graph-theoretic approach is evaluated using a smoking data set from the i2b2 National Center for Biomedical Computing, cast as a simple binary classification task for categorizing smoking-related documents, demonstrating consistent improvements in accuracy.
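
    The graph-measure idea in this abstract can be illustrated with a minimal sketch of PageRank computed by power iteration over a small ontology graph. The toy ontology, its edge direction (narrower term pointing to broader term), and the damping value are assumptions made for illustration; the abstract does not specify how the authors built their graph or injected the scores into the classifier.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {node: [linked nodes]}."""
    # Collect every node that appears as a source or a target.
    nodes = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the same "teleport" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy ontology (not from the paper): narrower -> broader term.
ontology = {
    "cigarette": ["tobacco product"],
    "cigar": ["tobacco product"],
    "tobacco product": ["smoking"],
    "nicotine patch": ["smoking cessation"],
    "smoking cessation": ["smoking"],
    "smoking": [],
}

term_importance = pagerank(ontology)
most_important = max(term_importance, key=term_importance.get)
```

    In this toy graph the broadest concept collects the most rank. Scores like these could then be used to reweight term frequencies before classification, although the exact weighting scheme is left open here.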

  13. A pharmacological study of NLP-12 neuropeptide signaling in free-living and parasitic nematodes.

    PubMed

    Peeters, Lise; Janssen, Tom; De Haes, Wouter; Beets, Isabel; Meelkop, Ellen; Grant, Warwick; Schoofs, Liliane

    2012-03-01

    NLP-12a and b have been identified as cholecystokinin/sulfakinin-like neuropeptides in the free-living nematode Caenorhabditis elegans. They are suggested to play an important role in the regulation of digestive enzyme secretion and fat storage. This study reports on the identification and characterization of an NLP-12-like peptide precursor gene in the rat parasitic nematode Strongyloides ratti. The S. ratti NLP-12 peptides are able to activate both C. elegans CKR-2 receptor isoforms in a dose-dependent way with affinities in the same nanomolar range as the native C. elegans NLP-12 peptides. The C-terminal RPLQFamide sequence motif of the NLP-12 peptides is perfectly conserved between free-living and parasitic nematodes. Based on systemic amino acid replacements the Arg-, Leu- and Phe- residues appear to be critical for high-affinity receptor binding. Finally, a SAR analysis revealed the essential pharmacophore in C. elegans NLP-12b to be the pentapeptide RPLQFamide. Copyright © 2011 Elsevier Inc. All rights reserved.

  14. Energy-modeled flight in a wind field

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Feldman, M.A.; Cliff, E.M.

    Optimal shaping of aerospace trajectories has provided the motivation for much modern study of optimization theory and algorithms. Current industrial practice favors approaches where the continuous-time optimal control problem is transcribed to a finite-dimensional nonlinear programming problem (NLP) by a discretization process. Two such formulations are implemented in the POST and the OTIS codes. In the present paper we use a discretization that is specially adapted to the flight problem of interest. Among the unique aspects of the present discretization are: a least-squares formulation for certain kinematic constraints; the use of energy ideas to enforce Newton's Laws; and the inclusion of large magnitude horizontal winds. In the next section we shall provide a description of the flight problem and its NLP representation. Following this we provide some details of the constraint formulation. Finally, we present an overview of the NLP problem.

  15. Identification of Long Bone Fractures in Radiology Reports Using Natural Language Processing to support Healthcare Quality Improvement.

    PubMed

    Grundmeier, Robert W; Masino, Aaron J; Casper, T Charles; Dean, Jonathan M; Bell, Jamie; Enriquez, Rene; Deakyne, Sara; Chamberlain, James M; Alpern, Elizabeth R

    2016-11-09

    Important information to support healthcare quality improvement is often recorded in free text documents such as radiology reports. Natural language processing (NLP) methods may help extract this information, but these methods have rarely been applied outside the research laboratories where they were developed. To implement and validate NLP tools to identify long bone fractures for pediatric emergency medicine quality improvement. Using freely available statistical software packages, we implemented NLP methods to identify long bone fractures from radiology reports. A sample of 1,000 radiology reports was used to construct three candidate classification models. A test set of 500 reports was used to validate the model performance. Blinded manual review of radiology reports by two independent physicians provided the reference standard. Each radiology report was segmented and word stem and bigram features were constructed. Common English "stop words" and rare features were excluded. We used 10-fold cross-validation to select optimal configuration parameters for each model. Accuracy, recall, precision and the F1 score were calculated. The final model was compared to the use of diagnosis codes for the identification of patients with long bone fractures. There were 329 unique word stems and 344 bigrams in the training documents. A support vector machine classifier with Gaussian kernel performed best on the test set with accuracy=0.958, recall=0.969, precision=0.940, and F1 score=0.954. Optimal parameters for this model were cost=4 and gamma=0.005. The three classification models that we tested all performed better than diagnosis codes in terms of accuracy, precision, and F1 score (diagnosis code accuracy=0.932, recall=0.960, precision=0.896, and F1 score=0.927). NLP methods using a corpus of 1,000 training documents accurately identified acute long bone fractures from radiology reports. 
    Strategic use of straightforward NLP methods, implemented with freely available software, offers quality improvement teams new opportunities to extract information from narrative documents.
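
    The F1 scores reported in this abstract follow directly from the stated precision and recall, since F1 is their harmonic mean. The short check below uses only the figures quoted above; the function and variable names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract.
svm_f1 = f1_score(precision=0.940, recall=0.969)    # SVM with Gaussian kernel
codes_f1 = f1_score(precision=0.896, recall=0.960)  # diagnosis codes

print(round(svm_f1, 3), round(codes_f1, 3))  # prints: 0.954 0.927
```

    Both values agree with the reported F1 scores of 0.954 and 0.927.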

  16. NLP-PIER: A Scalable Natural Language Processing, Indexing, and Searching Architecture for Clinical Notes.

    PubMed

    McEwan, Reed; Melton, Genevieve B; Knoll, Benjamin C; Wang, Yan; Hultman, Gretchen; Dale, Justin L; Meyer, Tim; Pakhomov, Serguei V

    2016-01-01

    Many design considerations must be addressed in order to provide researchers with full text and semantic search of unstructured healthcare data such as clinical notes and reports. Institutions looking at providing this functionality must also address the big data aspects of their unstructured corpora. Because these systems are complex and demand a non-trivial investment, there is an incentive to make the system capable of servicing future needs as well, further complicating the design. We present architectural best practices as lessons learned in the design and implementation of NLP-PIER (Patient Information Extraction for Research), a scalable, extensible, and secure system for processing, indexing, and searching clinical notes at the University of Minnesota.

  17. Effects of the ninein-like protein centrosomal protein on breast cancer cell invasion and migration

    PubMed Central

    LIU, QI; WANG, XINZHAO; LV, MINLIN; MU, DIANBIN; WANG, LEILEI; ZUO, WENSU; YU, ZHIYONG

    2015-01-01

    To investigate the effects of the centrosomal protein, ninein-like protein (Nlp), on the proliferation, invasion and metastasis of MCF-7 breast cancer cells, the present study established green fluorescent protein (GFP)-containing MCF7 plasmids with stable overexpression of Nlp (MCF7-GFP-Nlp) and blank plasmids (MCF7-GFP) using lentiviral transfection technology in the MCF7 breast cancer cell line. The expression of Nlp was determined by reverse transcription-quantitative polymerase chain reaction and western blot analysis. Differences in levels of proliferation, invasion and metastasis between the MCF7-GFP-Nlp group and the MCF7-GFP group were compared using MTT, plate colony formation and Transwell migration assays. The cell growth was more rapid and the colony forming rate was markedly increased in the MCF7-GFP-Nlp group (P<0.05) compared with the MCF7-GFP group. The numbers of cells in the MCF7-GFP-Nlp and MCF7-GFP groups that transferred across membranes were 878±18.22 and 398±8.02, respectively, in the migration assay. The invasive capacity was significantly increased in the MCF7-GFP-Nlp group (P<0.05) compared with the MCF7-GFP group. The western blotting results demonstrated high expression levels of C-X-C chemokine receptor type 4 in the MCF7-GFP-Nlp group. The increased expression of Nlp was associated with an increase in MCF7 cell proliferation, invasion and metastasis, which indicated that Nlp promoted breast tumorigenesis and may be used as a potent biological index to predict breast cancer metastasis and develop therapeutic regimes. PMID:25901761

  18. Bullying in Virtual Learning Communities.

    PubMed

    Nikiforos, Stefanos; Tzanavaris, Spyros; Kermanidis, Katia Lida

    2017-01-01

    Bullying through the internet has been investigated and analyzed mainly in the field of social media. This paper attempts to analyze bullying in Virtual Learning Communities using Natural Language Processing (NLP) techniques, mainly in the context of sociocultural learning theories. To that end, four case studies were carried out. We aim to apply NLP techniques to speech analysis on communication data of online communities. Emphasis is placed on qualitative data, taking into account the subjectivity of the collaborative activity. Finally, this is the first time this type of analysis has been attempted on Greek data.

  19. Gene expression and pharmacology of nematode NLP-12 neuropeptides.

    PubMed

    McVeigh, Paul; Leech, Suzie; Marks, Nikki J; Geary, Timothy G; Maule, Aaron G

    2006-05-31

    This study examines the biology of NLP-12 neuropeptides in Caenorhabditis elegans, and in the parasitic nematodes Ascaris suum and Trichostrongylus colubriformis. DYRPLQFamide (1 nM-10 µM; n ≥ 6) produced contraction of innervated dorsal and ventral Ascaris body wall muscle preparations (10 µM, 6.8 ± 1.9 g; 1 µM, 4.6 ± 1.8 g; 0.1 µM, 4.1 ± 2.0 g; 10 nM, 3.8 ± 2.0 g; n ≥ 6), and also caused a qualitatively similar, but quantitatively lower contractile response (10 µM, 4.0 ± 1.5 g, n=6) on denervated muscle strips. Ovijector muscle displayed no measurable response (10 µM, n=5). nlp-12 cDNAs were characterised from A. suum (As-nlp-12) and T. colubriformis (Tc-nlp-12), both of which show sequence similarity to C. elegans nlp-12, in that they encode multiple copies of -LQFamide peptides. In C. elegans, reverse transcriptase (RT)-PCR analysis showed that nlp-12 was transcribed throughout the life cycle, suggesting that DYRPLQFamide plays a constitutive role in the nervous system of this nematode. Transcription was also identified in both L3 and adult stages of T. colubriformis, in which Tc-nlp-12 is expressed in a single tail neurone. Conversely, As-nlp-12 is expressed in both head and tail tissue of adult female A. suum, suggesting species-specific differences in the transcription pattern of this gene.

  20. An Information Extraction Framework for Cohort Identification Using Electronic Health Records

    PubMed Central

    Liu, Hongfang; Bielinski, Suzette J.; Sohn, Sunghwan; Murphy, Sean; Wagholikar, Kavishwar B.; Jonnalagadda, Siddhartha R.; Ravikumar, K.E.; Wu, Stephen T.; Kullo, Iftikhar J.; Chute, Christopher G

    Information extraction (IE), a natural language processing (NLP) task that automatically extracts structured or semi-structured information from free text, has become popular in the clinical domain for supporting automated systems at point-of-care and enabling secondary use of electronic health records (EHRs) for clinical and translational research. However, a high performance IE system can be very challenging to construct due to the complexity and dynamic nature of human language. In this paper, we report an IE framework for cohort identification using EHRs that is a knowledge-driven framework developed under the Unstructured Information Management Architecture (UIMA). A system to extract specific information can be developed by subject matter experts through expert knowledge engineering of the externalized knowledge resources used in the framework. PMID:24303255

  1. Overexpression of Arabidopsis NLP7 improves plant growth under both nitrogen-limiting and -sufficient conditions by enhancing nitrogen and carbon assimilation.

    PubMed

    Yu, Lin-Hui; Wu, Jie; Tang, Hui; Yuan, Yang; Wang, Shi-Mei; Wang, Yu-Ping; Zhu, Qi-Sheng; Li, Shi-Gui; Xiang, Cheng-Bin

    2016-06-13

    Nitrogen is essential for plant survival and growth. Excessive application of nitrogenous fertilizer has generated serious environmental pollution and increased production cost in agriculture. To deal with this problem, tremendous efforts have been invested worldwide to increase the nitrogen use ability of crops. However, only limited success has been achieved to date. Here we report that NLP7 (NIN-LIKE PROTEIN 7) is a potential candidate to improve plant nitrogen use ability. When overexpressed in Arabidopsis, NLP7 increases plant biomass under both nitrogen-poor and -rich conditions with a better-developed root system and reduced shoot/root ratio. NLP7-overexpressing plants show a significant increase in key nitrogen metabolites, nitrogen uptake, total nitrogen content, and expression levels of genes involved in nitrogen assimilation and signalling. More importantly, overexpression of NLP7 also enhances photosynthesis rate and carbon assimilation, whereas knockout of NLP7 impaired both nitrogen and carbon assimilation. In addition, NLP7 improves plant growth and nitrogen use in transgenic tobacco (Nicotiana tabacum). Our results demonstrate that NLP7 significantly improves plant growth under both nitrogen-poor and -rich conditions by coordinately enhancing nitrogen and carbon assimilation, and shed light on crop improvement.

  2. Experimenting with semantic web services to understand the role of NLP technologies in healthcare.

    PubMed

    Jagannathan, V

    2006-01-01

    NLP technologies can play a significant role in healthcare where a predominant segment of the clinical documentation is in text form. In a graduate course focused on understanding semantic web services at West Virginia University, a class project was designed with the purpose of exploring potential use for NLP-based abstraction of clinical documentation. The role of NLP-technology was simulated using human abstractors and various workflows were investigated using public domain workflow and semantic web service technologies. This poster explores the potential use of NLP and the role of workflow and semantic web technologies in developing healthcare IT environments.

  3. Using natural language processing techniques to inform research on nanotechnology.

    PubMed

    Lewinski, Nastassja A; McInnes, Bridget T

    2015-01-01

    Literature in the field of nanotechnology is exponentially increasing with more and more engineered nanomaterials being created, characterized, and tested for performance and safety. With the deluge of published data, there is a need for natural language processing approaches to semi-automate the cataloguing of engineered nanomaterials and their associated physico-chemical properties, performance, exposure scenarios, and biological effects. In this paper, we review the different informatics methods that have been applied to patent mining, nanomaterial/device characterization, nanomedicine, and environmental risk assessment. Nine natural language processing (NLP)-based tools were identified: NanoPort, NanoMapper, TechPerceptor, a Text Mining Framework, a Nanodevice Analyzer, a Clinical Trial Document Classifier, Nanotoxicity Searcher, NanoSifter, and NEIMiner. We conclude with recommendations for sharing NLP-related tools through online repositories to broaden participation in nanoinformatics.

  4. Patent Retrieval in Chemistry based on Semantically Tagged Named Entities

    DTIC Science & Technology

    2009-11-01

    their corresponding synonyms. An example query for TS-15 is: ("Betaine" OR "Glycine betaine" OR "Glycocol betaine" OR "Glycylbetaine" OR ...) AND...task in an automatic way based on noun-phrase detection incorporating the OpenNLP chun- 3 Informative Term Synonyms Source Betaine Glycine betaine ...Glycocol betaine, Glycylbetaine etc. ATC Peripheral Artery Disease Peripheral Artery Disorder, Peripheral Arterial Disease etc. MeSH Diels-Alder reaction

  5. Implicitly Defined Neural Networks for Sequence Labeling

    DTIC Science & Technology

    2017-07-31

    network are coupled together, in order to improve performance on complex, long-range dependencies in either direction of a sequence. We contrast our...structure. 1.1 Related Work Long-range dependencies have been an issue as long as there have been NLP tasks, and there are many effective approaches...retain information about dependencies. The Bidirectional LSTM (b-LSTM) (Graves and Schmidhuber, 2005), a natural extension of (Schuster and Paliwal

  6. Automatic Lung-RADS™ classification with a natural language processing system.

    PubMed

    Beyer, Sebastian E; McKee, Brady J; Regis, Shawn M; McKee, Andrea B; Flacke, Sebastian; El Saadawi, Gilan; Wald, Christoph

    2017-09-01

    Our aim was to train a natural language processing (NLP) algorithm to capture imaging characteristics of lung nodules reported in a structured CT report and suggest the applicable Lung-RADS™ (LR) category. Our study included structured, clinical reports of consecutive CT lung screening (CTLS) exams performed from 08/2014 to 08/2015 at an ACR accredited Lung Cancer Screening Center. All patients screened were at high-risk for lung cancer according to the NCCN Guidelines®. All exams were interpreted by one of three radiologists credentialed to read CTLS exams using LR using a standard reporting template. Training and test sets consisted of consecutive exams. Lung screening exams were divided into two groups: three training sets (500, 120, and 383 reports each) and one final evaluation set (498 reports). NLP algorithm results were compared with the gold standard of LR category assigned by the radiologist. The sensitivity/specificity of the NLP algorithm to correctly assign LR categories for suspicious nodules (LR 4) and positive nodules (LR 3/4) were 74.1%/98.6% and 75.0%/98.8% respectively. The majority of mismatches occurred in cases where pulmonary findings were present not currently addressed by LR. Misclassifications also resulted from the failure to identify exams as follow-up and the failure to completely characterize part-solid nodules. In a sub-group analysis among structured reports with standardized language, the sensitivity and specificity to detect LR 4 nodules were 87.0% and 99.5%, respectively. An NLP system can accurately suggest the appropriate LR category from CTLS exam findings when standardized reporting is used.
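
    For readers less familiar with the two metrics reported above, the sketch below shows how sensitivity and specificity are derived from a comparison against a reference standard. The counts are invented toy values chosen only to make the arithmetic easy to follow; the abstract reports percentages, not the underlying confusion-matrix counts.

```python
def sensitivity(true_pos, false_neg):
    """Share of reference-positive cases that the algorithm also flagged."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of reference-negative cases that the algorithm also cleared."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts (NOT from the study): 10 positive and 90 negative reports.
toy = {"true_pos": 8, "false_neg": 2, "true_neg": 85, "false_pos": 5}

toy_sensitivity = sensitivity(toy["true_pos"], toy["false_neg"])
toy_specificity = specificity(toy["true_neg"], toy["false_pos"])
```

    A pattern like the one in the abstract, with high specificity and lower sensitivity, means the algorithm rarely raises a false alarm but misses some of the nodules the radiologist categorized as suspicious.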

  7. Automatic Lung-RADS™ classification with a natural language processing system

    PubMed Central

    Beyer, Sebastian E.; Regis, Shawn M.; McKee, Andrea B.; Flacke, Sebastian; El Saadawi, Gilan; Wald, Christoph

    2017-01-01

    Background Our aim was to train a natural language processing (NLP) algorithm to capture imaging characteristics of lung nodules reported in a structured CT report and suggest the applicable Lung-RADS™ (LR) category. Methods Our study included structured, clinical reports of consecutive CT lung screening (CTLS) exams performed from 08/2014 to 08/2015 at an ACR accredited Lung Cancer Screening Center. All patients screened were at high-risk for lung cancer according to the NCCN Guidelines®. All exams were interpreted by one of three radiologists credentialed to read CTLS exams using LR using a standard reporting template. Training and test sets consisted of consecutive exams. Lung screening exams were divided into two groups: three training sets (500, 120, and 383 reports each) and one final evaluation set (498 reports). NLP algorithm results were compared with the gold standard of LR category assigned by the radiologist. Results The sensitivity/specificity of the NLP algorithm to correctly assign LR categories for suspicious nodules (LR 4) and positive nodules (LR 3/4) were 74.1%/98.6% and 75.0%/98.8% respectively. The majority of mismatches occurred in cases where pulmonary findings were present not currently addressed by LR. Misclassifications also resulted from the failure to identify exams as follow-up and the failure to completely characterize part-solid nodules. In a sub-group analysis among structured reports with standardized language, the sensitivity and specificity to detect LR 4 nodules were 87.0% and 99.5%, respectively. Conclusions An NLP system can accurately suggest the appropriate LR category from CTLS exam findings when standardized reporting is used. PMID:29221286

  8. A study of active learning methods for named entity recognition in clinical text.

    PubMed

    Chen, Yukun; Lasko, Thomas A; Mei, Qiaozhu; Denny, Joshua C; Xu, Hua

    2015-12-01

    Named entity recognition (NER), a sequential labeling task, is one of the fundamental tasks for building clinical natural language processing (NLP) systems. Machine learning (ML) based approaches can achieve good performance, but they often require large amounts of annotated samples, which are expensive to build due to the requirement of domain experts in annotation. Active learning (AL), a sample selection approach integrated with supervised ML, aims to minimize the annotation cost while maximizing the performance of ML-based models. In this study, our goal was to develop and evaluate both existing and new AL methods for a clinical NER task to identify concepts of medical problems, treatments, and lab tests from the clinical notes. Using the annotated NER corpus from the 2010 i2b2/VA NLP challenge that contained 349 clinical documents with 20,423 unique sentences, we simulated AL experiments using a number of existing and novel algorithms in three different categories including uncertainty-based, diversity-based, and baseline sampling strategies. They were compared with the passive learning that uses random sampling. Learning curves that plot performance of the NER model against the estimated annotation cost (based on number of sentences or words in the training set) were generated to evaluate different active learning and the passive learning methods and the area under the learning curve (ALC) score was computed. Based on the learning curves of F-measure vs. number of sentences, uncertainty sampling algorithms outperformed all other methods in ALC. Most diversity-based methods also performed better than random sampling in ALC. To achieve an F-measure of 0.80, the best method based on uncertainty sampling could save 66% annotations in sentences, as compared to random sampling. For the learning curves of F-measure vs. number of words, uncertainty sampling methods again outperformed all other methods in ALC. 
    To achieve 0.80 in F-measure, in comparison to random sampling, the best uncertainty based method saved 42% annotations in words. But the best diversity based method reduced only 7% annotation effort. In the simulated setting, AL methods, particularly uncertainty-sampling based approaches, seemed to significantly save annotation cost for the clinical NER task. The actual benefit of active learning in clinical NER should be further evaluated in a real-time setting. Copyright © 2015 Elsevier Inc. All rights reserved.
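
    Uncertainty sampling, the family of methods that performed best in this study, can be sketched as follows. This is a generic least-confidence variant under simplifying assumptions: each unlabeled sentence is represented by one hypothetical probability vector from the current model, whereas a real NER model produces a label sequence per sentence. The abstract does not describe the specific algorithms the authors used.

```python
def least_confidence(probabilities):
    """Uncertainty of one prediction: 1 minus the top class probability."""
    return 1.0 - max(probabilities)


def select_for_annotation(pool, batch_size):
    """Return the ids of the `batch_size` most uncertain unlabeled sentences."""
    ranked = sorted(pool, key=lambda sid: least_confidence(pool[sid]), reverse=True)
    return ranked[:batch_size]


# Hypothetical model outputs for four unlabeled sentences (illustrative only).
pool = {
    "s1": [0.98, 0.01, 0.01],  # model is confident
    "s2": [0.40, 0.35, 0.25],  # model is unsure
    "s3": [0.70, 0.20, 0.10],
    "s4": [0.55, 0.30, 0.15],
}

batch = select_for_annotation(pool, batch_size=2)
```

    Passive learning would instead draw the batch at random from the pool, which is the baseline the study compares against.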

  9. Representing Information in Patient Reports Using Natural Language Processing and the Extensible Markup Language

    PubMed Central

    Friedman, Carol; Hripcsak, George; Shagina, Lyuda; Liu, Hongfang

    1999-01-01

    Objective: To design a document model that provides reliable and efficient access to clinical information in patient reports for a broad range of clinical applications, and to implement an automated method using natural language processing that maps textual reports to a form consistent with the model. Methods: A document model that encodes structured clinical information in patient reports while retaining the original contents was designed using the extensible markup language (XML), and a document type definition (DTD) was created. An existing natural language processor (NLP) was modified to generate output consistent with the model. Two hundred reports were processed using the modified NLP system, and the XML output that was generated was validated using an XML validating parser. Results: The modified NLP system successfully processed all 200 reports. The output of one report was invalid, and 199 reports were valid XML forms consistent with the DTD. Conclusions: Natural language processing can be used to automatically create an enriched document that contains a structured component whose elements are linked to portions of the original textual report. This integrated document model provides a representation where documents containing specific information can be accurately and efficiently retrieved by querying the structured components. If manual review of the documents is desired, the salient information in the original reports can also be identified and highlighted. Using an XML model of tagging provides an additional benefit in that software tools that manipulate XML documents are readily available. PMID:9925230
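
    The document model described here, in which a structured component is linked back to portions of the original text, can be sketched with Python's standard XML library. All element and attribute names below are hypothetical and are not taken from the paper's DTD; the standard library parser also checks only well-formedness, so DTD validation as described in the abstract would need a validating parser.

```python
import xml.etree.ElementTree as ET

report_text = "Chest x-ray shows a small right pleural effusion."

# Hypothetical layout: original text plus a structured component.
root = ET.Element("report")
text_el = ET.SubElement(root, "text")
sentence = ET.SubElement(text_el, "sentence", id="s1")
sentence.text = report_text  # original wording is retained

structured = ET.SubElement(root, "structured")
finding = ET.SubElement(structured, "finding", ref="s1")  # link to sentence s1
finding.set("name", "pleural effusion")
finding.set("certainty", "high")

xml_string = ET.tostring(root, encoding="unicode")


def sentences_for_finding(xml_text, finding_name):
    """Query the structured part, then follow links back to the source text."""
    doc = ET.fromstring(xml_text)
    by_id = {s.get("id"): s.text for s in doc.iter("sentence")}
    return [by_id[f.get("ref")] for f in doc.iter("finding")
            if f.get("name") == finding_name]


matches = sentences_for_finding(xml_string, "pleural effusion")
```

    Retrieval runs against the structured elements, while the linked sentence remains available for manual review or highlighting.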

  10. The pleiotropic transcriptional regulator NlpR contributes to the modulation of nitrogen metabolism, lipogenesis and triacylglycerol accumulation in oleaginous rhodococci.

    PubMed

    Hernández, Martín A; Lara, Julia; Gago, Gabriela; Gramajo, Hugo; Alvarez, Héctor M

    2017-01-01

    The regulatory mechanisms involved in lipogenesis and triacylglycerol (TAG) accumulation are largely unknown in oleaginous rhodococci. In this study a regulatory protein (here called NlpR: Nitrogen lipid Regulator), which contributes to the modulation of nitrogen metabolism, lipogenesis and triacylglycerol accumulation in oleaginous rhodococci was identified. Under nitrogen deprivation conditions, in which TAG accumulation is stimulated, the nlpR gene was significantly upregulated, whereas a significant decrease of its expression and TAG accumulation occurred when cerulenin was added. The nlpR disruption negatively affected the nitrate/nitrite reduction as well as lipid biosynthesis under nitrogen-limiting conditions. In contrast, its overexpression increased TAG production during cultivation of cells in nitrogen-rich media. A putative 'NlpR-binding motif' upstream of several genes related to nitrogen and lipid metabolism was found. The nlpR disruption in RHA1 strain led to a reduced transcription of genes involved in nitrate/nitrite assimilation, as well as in fatty acid and TAG biosynthesis. Purified NlpR was able to bind to narK, nirD, fasI, plsC and atf3 promoter regions. It was suggested that NlpR acts as a pleiotropic transcriptional regulator by activating nitrate/nitrite assimilation genes and other genes involved in fatty acid and TAG biosynthesis, in response to nitrogen deprivation. © 2016 John Wiley & Sons Ltd.

  11. Life-span extension by dietary restriction is mediated by NLP-7 signaling and coelomocyte endocytosis in C. elegans.

    PubMed

    Park, Sang-Kyu; Link, Christopher D; Johnson, Thomas E

    2010-02-01

    Recent studies have shown that the rate of aging can be modulated by diverse interventions. Dietary restriction is the most widely used intervention to promote longevity; however, the mechanisms underlying the effect of dietary restriction remain elusive. In a previous study, we identified two novel genes, nlp-7 and cup-4, required for normal longevity in Caenorhabditis elegans. nlp-7 is one of a set of neuropeptide-like protein genes; cup-4 encodes an ion-channel involved in endocytosis by coelomocytes. Here, we assess whether nlp-7 and cup-4 mediate longevity increases by dietary restriction. RNAi of nlp-7 or cup-4 significantly reduces the life span of the eat-2 mutant, a genetic model of dietary restriction, but has no effect on the life span of long-lived mutants resulting from reduced insulin/IGF-1 signaling or dysfunction of the mitochondrial electron transport chain. The life-span extension observed in wild-type N2 worms by dietary restriction using bacterial dilution is prevented significantly in nlp-7 and cup-4 mutants. RNAi knockdown of genes encoding candidate receptors of NLP-7 and genes involved in endocytosis by coelomocytes also specifically shorten the life span of the eat-2 mutant. We conclude that two novel pathways, NLP-7 signaling and endocytosis by coelomocytes, are required for life extension under dietary restriction in C. elegans.

  12. Synthesis and characterization of Her2-NLP peptide conjugates targeting circulating breast cancer cells: cellular uptake and localization by fluorescent microscopic imaging.

    PubMed

    Cai, Huawei; Singh, Ajay N; Sun, Xiankai; Peng, Fangyu

    2015-01-01

    To synthesize a fluorescent Her2-NLP peptide conjugate consisting of Her2/neu targeting peptide and nuclear localization sequence peptide (NLP) and assess its cellular uptake and intracellular localization for radionuclide cancer therapy targeting Her2/neu-positive circulating breast cancer cells (CBCC). Fluorescent Cy5.5 Her2-NLP peptide conjugate was synthesized by coupling a bivalent peptide sequence, which consisted of a Her2-binding peptide (NH2-GSGKCCYSL) and an NLP peptide (CGYGPKKKRKVGG) linked by a polyethylene glycol (PEG) chain with 6 repeating units, with an activated Cy5.5 ester. The conjugate was separated and purified by HPLC and then characterized by MALDI-MS. The intracellular localization of fluorescent Cy5.5 Her2-NLP peptide conjugate was assessed by fluorescent microscopic imaging using a confocal microscope after incubation of Cy5.5-Her2-NLP with Her2/neu positive breast cancer cells and Her2/neu negative control breast cancer cells, respectively. Fluorescent signals were detected in the cytoplasm of Her2/neu positive breast cancer cells (SKBR-3 and BT474 cell lines), but little or none in the cytoplasm of Her2/neu negative breast cancer cells (MDA-MB-231), after incubation of the breast cancer cells with Cy5.5-Her2-NLP conjugates in vitro. No fluorescent signals were detected within the nuclei of Her2/neu positive SKBR-3 and BT474 breast cancer cells, or of Her2/neu negative MDA-MB-231 cells, incubated with the Cy5.5-Her2-NLP peptide conjugates, suggesting poor nuclear localization of the Cy5.5-Her2-NLP conjugates localized within the cytoplasm after their cellular uptake and internalization by the Her2/neu positive breast cancer cells. Her2-binding peptide (KCCYSL) is a promising agent for radionuclide therapy of Her2/neu positive breast cancer using a β(-) or α emitting radionuclide, but poor nuclear localization of the Her2-NLP peptide conjugates may limit its use for eradication of Her2/neu-positive CBCC using I-125 or other Auger electron emitting radionuclide.

  13. Neuro-Linguistic Programming and Family Therapy.

    ERIC Educational Resources Information Center

    Davis, Susan L. R.; Davis, Donald I.

    1983-01-01

    Presents a brief introduction to Neuro-Linguistic Programming (NLP), followed by case examples which illustrate some of the substantive gains which NLP techniques have provided in work with couples and families. NLP's major contributions involve understanding new models of human experience. (WAS)

  14. Synergist: Collaborative Analyst Assistant

    DTIC Science & Technology

    2009-04-01

    NLP Framework...3.2 Identifying Concepts in Text...LIST OF FIGURES Figure 1: Lymba’s NLP Pipeline...events, general concepts, relations and context, and build representations that yield well to reasoning on text and providing information access. NLP

  15. Extracting important information from Chinese Operation Notes with natural language processing methods.

    PubMed

    Wang, Hui; Zhang, Weide; Zeng, Qiang; Li, Zuofeng; Feng, Kaiyan; Liu, Lei

    2014-04-01

    Extracting information from unstructured clinical narratives is valuable for many clinical applications. Although natural language processing (NLP) methods have been studied extensively in electronic medical records (EMR), few studies have explored NLP for extracting information from Chinese clinical narratives. In this study, we report the development and evaluation of a method for extracting tumor-related information from operation notes of hepatic carcinomas which were written in Chinese. Using 86 operation notes manually annotated by physicians as the training set, we explored both rule-based and supervised machine-learning approaches. Evaluated on 29 unseen operation notes, our best approach yielded 69.6% precision, 58.3% recall and a 63.5% F-score. Copyright © 2014 Elsevier Inc. All rights reserved.
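
    As a reading aid for the figures in the record above: assuming the reported F-score is the balanced F1 (the harmonic mean of precision and recall, which the abstract does not state explicitly), the three numbers can be checked against each other. The function name `f_score` is illustrative only and is not taken from the paper.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (69.6% and 58.3%).
value = f_score(0.696, 0.583)
print(f"F1 = {value:.3f}")
```

    Under that assumption the computed value agrees with the reported 63.5% after rounding.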

  16. NLP-PIER: A Scalable Natural Language Processing, Indexing, and Searching Architecture for Clinical Notes

    PubMed Central

    McEwan, Reed; Melton, Genevieve B.; Knoll, Benjamin C.; Wang, Yan; Hultman, Gretchen; Dale, Justin L.; Meyer, Tim; Pakhomov, Serguei V.

    2016-01-01

    Many design considerations must be addressed in order to provide researchers with full text and semantic search of unstructured healthcare data such as clinical notes and reports. Institutions looking at providing this functionality must also address the big data aspects of their unstructured corpora. Because these systems are complex and demand a non-trivial investment, there is an incentive to make the system capable of servicing future needs as well, further complicating the design. We present architectural best practices as lessons learned in the design and implementation of NLP-PIER (Patient Information Extraction for Research), a scalable, extensible, and secure system for processing, indexing, and searching clinical notes at the University of Minnesota. PMID:27570663

  17. Insights into substrate specificity of NlpC/P60 cell wall hydrolases containing bacterial SH3 domains

    DOE PAGES

    Xu, Qingping; Mengin-Lecreulx, Dominique; Liu, Xueqian W.; ...

    2015-09-15

    Bacterial SH3 (SH3b) domains are commonly fused with papain-like NlpC/P60 cell wall hydrolase domains. To understand how the modular architecture of SH3b and NlpC/P60 affects the activity of the catalytic domain, three putative NlpC/P60 cell wall hydrolases were biochemically and structurally characterized. In addition, these enzymes all have γ-d-Glu-A2pm (A2pm is diaminopimelic acid) cysteine amidase (or dl-endopeptidase) activities but with different substrate specificities. One enzyme is a cell wall lysin that cleaves peptidoglycan (PG), while the other two are cell wall recycling enzymes that only cleave stem peptides with an N-terminal l-Ala. Their crystal structures revealed a highly conserved structure consisting of two SH3b domains and a C-terminal NlpC/P60 catalytic domain, despite very low sequence identity. Interestingly, loops from the first SH3b domain dock into the ends of the active site groove of the catalytic domain, remodel the substrate binding site, and modulate substrate specificity. Two amino acid differences at the domain interface alter the substrate binding specificity in favor of stem peptides in recycling enzymes, whereas the SH3b domain may extend the peptidoglycan binding surface in the cell wall lysins. Remarkably, the cell wall lysin can be converted into a recycling enzyme with a single mutation. Peptidoglycan is a meshlike polymer that envelops the bacterial plasma membrane and bestows structural integrity. Cell wall lysins and recycling enzymes are part of a set of lytic enzymes that target covalent bonds connecting the amino acid and amino sugar building blocks of the PG network. These hydrolases are involved in processes such as cell growth and division, autolysis, invasion, and PG turnover and recycling. To avoid cleavage of unintended substrates, these enzymes have very selective substrate specificities. Our biochemical and structural analysis of three modular NlpC/P60 hydrolases, one lysin and two recycling enzymes, shows that they may have evolved from a common molecular architecture, where the substrate preference is modulated by local changes. These results also suggest that new pathways for recycling PG turnover products, such as tracheal cytotoxin, may have evolved in bacteria in the human gut microbiome that involve NlpC/P60 cell wall hydrolases.

  18. Insights into substrate specificity of NlpC/P60 cell wall hydrolases containing bacterial SH3 domains

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xu, Qingping; Mengin-Lecreulx, Dominique; Liu, Xueqian W.

    Bacterial SH3 (SH3b) domains are commonly fused with papain-like NlpC/P60 cell wall hydrolase domains. To understand how the modular architecture of SH3b and NlpC/P60 affects the activity of the catalytic domain, three putative NlpC/P60 cell wall hydrolases were biochemically and structurally characterized. In addition, these enzymes all have γ-d-Glu-A2pm (A2pm is diaminopimelic acid) cysteine amidase (or dl-endopeptidase) activities but with different substrate specificities. One enzyme is a cell wall lysin that cleaves peptidoglycan (PG), while the other two are cell wall recycling enzymes that only cleave stem peptides with an N-terminal l-Ala. Their crystal structures revealed a highly conserved structure consisting of two SH3b domains and a C-terminal NlpC/P60 catalytic domain, despite very low sequence identity. Interestingly, loops from the first SH3b domain dock into the ends of the active site groove of the catalytic domain, remodel the substrate binding site, and modulate substrate specificity. Two amino acid differences at the domain interface alter the substrate binding specificity in favor of stem peptides in recycling enzymes, whereas the SH3b domain may extend the peptidoglycan binding surface in the cell wall lysins. Remarkably, the cell wall lysin can be converted into a recycling enzyme with a single mutation. Peptidoglycan is a meshlike polymer that envelops the bacterial plasma membrane and bestows structural integrity. Cell wall lysins and recycling enzymes are part of a set of lytic enzymes that target covalent bonds connecting the amino acid and amino sugar building blocks of the PG network. These hydrolases are involved in processes such as cell growth and division, autolysis, invasion, and PG turnover and recycling. To avoid cleavage of unintended substrates, these enzymes have very selective substrate specificities. Our biochemical and structural analysis of three modular NlpC/P60 hydrolases, one lysin and two recycling enzymes, shows that they may have evolved from a common molecular architecture, where the substrate preference is modulated by local changes. These results also suggest that new pathways for recycling PG turnover products, such as tracheal cytotoxin, may have evolved in bacteria in the human gut microbiome that involve NlpC/P60 cell wall hydrolases.

  19. Insights into Substrate Specificity of NlpC/P60 Cell Wall Hydrolases Containing Bacterial SH3 Domains

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xu, Qingping; Mengin-Lecreulx, Dominique; Liu, Xueqian W.

    ABSTRACT: Bacterial SH3 (SH3b) domains are commonly fused with papain-like NlpC/P60 cell wall hydrolase domains. To understand how the modular architecture of SH3b and NlpC/P60 affects the activity of the catalytic domain, three putative NlpC/P60 cell wall hydrolases were biochemically and structurally characterized. These enzymes all have γ-d-Glu-A2pm (A2pm is diaminopimelic acid) cysteine amidase (or dl-endopeptidase) activities but with different substrate specificities. One enzyme is a cell wall lysin that cleaves peptidoglycan (PG), while the other two are cell wall recycling enzymes that only cleave stem peptides with an N-terminal l-Ala. Their crystal structures revealed a highly conserved structure consisting of two SH3b domains and a C-terminal NlpC/P60 catalytic domain, despite very low sequence identity. Interestingly, loops from the first SH3b domain dock into the ends of the active site groove of the catalytic domain, remodel the substrate binding site, and modulate substrate specificity. Two amino acid differences at the domain interface alter the substrate binding specificity in favor of stem peptides in recycling enzymes, whereas the SH3b domain may extend the peptidoglycan binding surface in the cell wall lysins. Remarkably, the cell wall lysin can be converted into a recycling enzyme with a single mutation. IMPORTANCE: Peptidoglycan is a meshlike polymer that envelops the bacterial plasma membrane and bestows structural integrity. Cell wall lysins and recycling enzymes are part of a set of lytic enzymes that target covalent bonds connecting the amino acid and amino sugar building blocks of the PG network. These hydrolases are involved in processes such as cell growth and division, autolysis, invasion, and PG turnover and recycling. To avoid cleavage of unintended substrates, these enzymes have very selective substrate specificities. Our biochemical and structural analysis of three modular NlpC/P60 hydrolases, one lysin and two recycling enzymes, shows that they may have evolved from a common molecular architecture, where the substrate preference is modulated by local changes. These results also suggest that new pathways for recycling PG turnover products, such as tracheal cytotoxin, may have evolved in bacteria in the human gut microbiome that involve NlpC/P60 cell wall hydrolases.

  20. Using natural language processing techniques to inform research on nanotechnology

    PubMed Central

    Lewinski, Nastassja A

    2015-01-01

    Summary Literature in the field of nanotechnology is exponentially increasing with more and more engineered nanomaterials being created, characterized, and tested for performance and safety. With the deluge of published data, there is a need for natural language processing approaches to semi-automate the cataloguing of engineered nanomaterials and their associated physico-chemical properties, performance, exposure scenarios, and biological effects. In this paper, we review the different informatics methods that have been applied to patent mining, nanomaterial/device characterization, nanomedicine, and environmental risk assessment. Nine natural language processing (NLP)-based tools were identified: NanoPort, NanoMapper, TechPerceptor, a Text Mining Framework, a Nanodevice Analyzer, a Clinical Trial Document Classifier, Nanotoxicity Searcher, NanoSifter, and NEIMiner. We conclude with recommendations for sharing NLP-related tools through online repositories to broaden participation in nanoinformatics. PMID:26199848

  1. Canary: An NLP Platform for Clinicians and Researchers.

    PubMed

    Malmasi, Shervin; Sandor, Nicolae L; Hosomura, Naoshi; Goldberg, Matt; Skentzos, Stephen; Turchin, Alexander

    2017-05-03

    Information Extraction methods can help discover critical knowledge buried in the vast repositories of unstructured clinical data. However, these methods are underutilized in clinical research, potentially due to the absence of free software geared towards clinicians with little technical expertise. The skills required for developing/using such software constitute a major barrier for medical researchers wishing to employ these methods. To address this, we have developed Canary, a free and open-source solution designed for users without natural language processing (NLP) or software engineering experience. It was designed to be fast and work out of the box via a user-friendly graphical interface.

  2. DNA-Targeted 2-Nitroimidazoles: Studies of the Influence of the Phenanthridine-Linked Nitroimidazoles, 2-NLP-3 and 2-NLP-4, on DNA Damage Induced by Ionizing Radiation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Buchko, Garry W.; Weinfeld, Michael

    The nitroimidazole-linked phenanthridines 2-NLP-3 (5-[3-(2-nitro-1-imidazoyl)-propyl]-phenanthridinium bromide) and 2-NLP-4 (5-[3-(2-nitro-1-imidazoyl)-butyl]-phenanthridinium bromide) are composed of the radiosensitizer, 2-nitroimidazole, attached to the DNA intercalator phenanthridine via a 3- and 4-carbon linker, respectively. Previous in vitro assays show both compounds to be 10-100 times more efficient as hypoxic cell radiosensitizers than misonidazole [Cowan et al., Radiat. Res. 127, 81-89, 1991]. Here we have used a 32P postlabeling assay and a 5'-end labeled oligonucleotide assay to compare the radiogenic DNA damage generated in the presence of 2-NLP-3 or 2-NLP-4 with that generated by irradiation in the presence of misonidazole. This may account, at least in part, for the greater cellular radiosensitization shown by the nitroimidazole-linked phenanthridines over misonidazole.

  3. A Conserved Dopamine-Cholecystokinin Signaling Pathway Shapes Context–Dependent Caenorhabditis elegans Behavior

    PubMed Central

    Bhattacharya, Raja; Touroutine, Denis; Barbagallo, Belinda; Climer, Jason; Lambert, Christopher M.; Clark, Christopher M.; Alkema, Mark J.; Francis, Michael M.

    2014-01-01

    An organism's ability to thrive in changing environmental conditions requires the capacity for making flexible behavioral responses. Here we show that, in the nematode Caenorhabditis elegans, foraging responses to changes in food availability require nlp-12, a homolog of the mammalian neuropeptide cholecystokinin (CCK). nlp-12 expression is limited to a single interneuron (DVA) that is postsynaptic to dopaminergic neurons involved in food-sensing, and presynaptic to locomotory control neurons. NLP-12 release from DVA is regulated through the D1-like dopamine receptor DOP-1, and both nlp-12 and dop-1 are required for normal local food searching responses. nlp-12/CCK overexpression recapitulates characteristics of local food searching, and DVA ablation or mutations disrupting muscle acetylcholine receptor function attenuate these effects. Conversely, nlp-12 deletion reverses behavioral and functional changes associated with genetically enhanced muscle acetylcholine receptor activity. Thus, our data suggest that dopamine-mediated sensory information about food availability shapes foraging in a context-dependent manner through peptide modulation of locomotory output. PMID:25167143

  4. The neuropeptide NLP-22 regulates a sleep-like state in Caenorhabditis elegans

    PubMed Central

    Nelson, MD; Trojanowski, NF; George-Raizen, JB; Smith, CJ; Yu, C-C; Fang-Yen, C; Raizen, DM

    2013-01-01

    Neuropeptides play central roles in the regulation of homeostatic behaviors such as sleep and feeding. Caenorhabditis elegans displays sleep-like quiescence of locomotion and feeding during a larval transition stage called lethargus and feeds during active larval and adult stages. Here we show that the neuropeptide NLP-22 is a regulator of Caenorhabditis elegans sleep-like quiescence observed during lethargus. nlp-22 shows cyclical mRNA expression in synchrony with lethargus; it is regulated by LIN-42, an orthologue of the core circadian protein PERIOD; and it is expressed solely in the two RIA interneurons. nlp-22 and the RIA interneurons are required for normal lethargus quiescence, and forced expression of nlp-22 during active stages causes anachronistic locomotion and feeding quiescence. Optogenetic stimulation of RIA interneurons has a movement-promoting effect, demonstrating functional complexity in a single neuron type. Our work defines a quiescence-regulating role for NLP-22 and expands our knowledge of the neural circuitry controlling Caenorhabditis elegans behavioral quiescence. PMID:24301180

  5. The neuropeptide NLP-22 regulates a sleep-like state in Caenorhabditis elegans.

    PubMed

    Nelson, M D; Trojanowski, N F; George-Raizen, J B; Smith, C J; Yu, C-C; Fang-Yen, C; Raizen, D M

    2013-01-01

    Neuropeptides have central roles in the regulation of homoeostatic behaviours such as sleep and feeding. Caenorhabditis elegans displays sleep-like quiescence of locomotion and feeding during a larval transition stage called lethargus and feeds during active larval and adult stages. Here we show that the neuropeptide NLP-22 is a regulator of Caenorhabditis elegans sleep-like quiescence observed during lethargus. nlp-22 shows cyclical mRNA expression in synchrony with lethargus; it is regulated by LIN-42, an orthologue of the core circadian protein PERIOD; and it is expressed solely in the two RIA interneurons. nlp-22 and the RIA interneurons are required for normal lethargus quiescence, and forced expression of nlp-22 during active stages causes anachronistic locomotion and feeding quiescence. Optogenetic stimulation of the RIA interneurons has a movement-promoting effect, demonstrating functional complexity in a single-neuron type. Our work defines a quiescence-regulating role for NLP-22 and expands our knowledge of the neural circuitry controlling Caenorhabditis elegans behavioural quiescence.

  6. Interacting TCP and NLP transcription factors control plant responses to nitrate availability.

    PubMed

    Guan, Peizhu; Ripoll, Juan-José; Wang, Renhou; Vuong, Lam; Bailey-Steinitz, Lindsay J; Ye, Dening; Crawford, Nigel M

    2017-02-28

    Plants have evolved adaptive strategies that involve transcriptional networks to cope with and survive environmental challenges. Key transcriptional regulators that mediate responses to environmental fluctuations in nitrate have been identified; however, little is known about how these regulators interact to orchestrate nitrogen (N) responses and cell-cycle regulation. Here we report that teosinte branched1/cycloidea/proliferating cell factor1-20 (TCP20) and NIN-like protein (NLP) transcription factors NLP6 and NLP7, which act as activators of nitrate assimilatory genes, bind to adjacent sites in the upstream promoter region of the nitrate reductase gene, NIA1, and physically interact under continuous nitrate and N-starvation conditions. Regions of these proteins necessary for these interactions were found to include the type I/II Phox and Bem1p (PB1) domains of NLP6&7, a protein-interaction module conserved in animals for nutrient signaling, and the histidine- and glutamine-rich domain of TCP20, which is conserved across plant species. Under N starvation, TCP20-NLP6&7 heterodimers accumulate in the nucleus, and this coincides with TCP20 and NLP6&7-dependent up-regulation of nitrate assimilation and signaling genes and down-regulation of the G2/M cell-cycle marker gene, CYCB1;1. TCP20 and NLP6&7 also support root meristem growth under N starvation. These findings provide insights into how plants coordinate responses to nitrate availability, linking nitrate assimilation and signaling with cell-cycle progression.

  7. Automated Extraction of Substance Use Information from Clinical Texts.

    PubMed

    Wang, Yan; Chen, Elizabeth S; Pakhomov, Serguei; Arsoniadis, Elliot; Carter, Elizabeth W; Lindemann, Elizabeth; Sarkar, Indra Neil; Melton, Genevieve B

    2015-01-01

    Within clinical discourse, social history (SH) includes important information about substance use (alcohol, drug, and nicotine use) as key risk factors for disease, disability, and mortality. In this study, we developed and evaluated a natural language processing (NLP) system for automated detection of substance use statements and extraction of substance use attributes (e.g., temporal and status) based on Stanford Typed Dependencies. The developed NLP system leveraged linguistic resources and domain knowledge from a multi-site social history study, Propbank and the MiPACQ corpus. The system attained F-scores of 89.8, 84.6 and 89.4 respectively for alcohol, drug, and nicotine use statement detection, as well as average F-scores of 82.1, 90.3, 80.8, 88.7, 96.6, and 74.5 respectively for extraction of attributes. Our results suggest that NLP systems can achieve good performance when augmented with linguistic resources and domain knowledge when applied to a wide breadth of substance use free text clinical notes.

  8. Scholarly Information Extraction Is Going to Make a Quantum Leap with PubMed Central (PMC).

    PubMed

    Matthies, Franz; Hahn, Udo

    2017-01-01

    With the increasing availability of complete full texts (journal articles), rather than their surrogates (titles, abstracts), as resources for text analytics, entirely new opportunities arise for information extraction and text mining from scholarly publications. Yet, we gathered evidence that a range of problems are encountered for full-text processing when biomedical text analytics simply reuse existing NLP pipelines which were developed on the basis of abstracts (rather than full texts). We conducted experiments with four different relation extraction engines, all of which were top performers in previous BioNLP Event Extraction Challenges. We found that abstract-trained engines lose up to 6.6% F-score points when run on full-text data. Hence, the reuse of existing abstract-based NLP software in a full-text scenario is considered harmful because of heavy performance losses. Given the current lack of annotated full-text resources to train on, our study quantifies the price paid for this shortcut.

  9. Minimum Fuel Trajectory Design in Multiple Dynamical Environments Utilizing Direct Transcription Methods and Particle Swarm Optimization

    DTIC Science & Technology

    2016-03-01

    3.1.3 NLP Improvement ... 3.2.1.2 NLP Improvement ... 3.2.2 Multiple-burn Planar LEO to GEO Transfer ... 3.2.2.1 PSO Initial Guess Generation ... 3.2.2.2 NLP Improvement

  10. Mayo clinic NLP system for patient smoking status identification.

    PubMed

    Savova, Guergana K; Ogren, Philip V; Duffy, Patrick H; Buntrock, James D; Chute, Christopher G

    2008-01-01

    This article describes our system entry for the 2006 I2B2 contest "Challenges in Natural Language Processing for Clinical Data" for the task of identifying the smoking status of patients. Our system makes the simplifying assumption that patient-level smoking status determination can be achieved by accurately classifying individual sentences from a patient's record. We created our system with reusable text analysis components built on the Unstructured Information Management Architecture and Weka. This reuse of code minimized the development effort related specifically to our smoking status classifier. We report precision, recall, F-score, and 95% exact confidence intervals for each metric. Recasting the classification task for the sentence level and reusing code from other text analysis projects allowed us to quickly build a classification system that performs with a system F-score of 92.64 based on held-out data tests and of 85.57 on the formal evaluation data. Our general medical natural language engine is easily adaptable to a real-world medical informatics application. Some of the limitations as applied to the use-case are negation detection and temporal resolution.

  11. Automatic Extraction of Drug Adverse Effects from Product Characteristics (SPCs): A Text Versus Table Comparison.

    PubMed

    Lamy, Jean-Baptiste; Ugon, Adrien; Berthelot, Hélène

    2016-01-01

    Potential adverse effects (AEs) of drugs are described in their summary of product characteristics (SPCs), a textual document. Automatic extraction of AEs from SPCs is useful for detecting AEs and for building drug databases. However, this task is difficult because each AE is associated with a frequency that must be extracted and the presentation of AEs in SPCs is heterogeneous, consisting of plain text and tables in many different formats. We propose a taxonomy for the presentation of AEs in SPCs. We set up natural language processing (NLP) and table parsing methods for extracting AEs from texts and tables of any format, and evaluate them on 10 SPCs. Automatic extraction performed better on tables than on texts. Tables should be recommended for the presentation of the AEs section of the SPCs.

  12. Designing visual displays and system models for safe reactor operations based on the user`s perspective of the system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brown-VanHoozer, S.A.

    Most designers are not schooled in the area of human-interaction psychology and therefore tend to rely on the traditional ergonomic aspects of human factors when designing complex human-interactive workstations related to reactor operations. They do not take into account the differences in user information processing behavior and how these behaviors may affect individual and team performance when accessing visual displays or utilizing system models in process and control room areas. Unfortunately, by ignoring the importance of the integration of the user interface at the information process level, the result can be sub-optimization and inherently error- and failure-prone systems. Therefore, to minimize or eliminate failures in human-interactive systems, it is essential that the designers understand how each user's processing characteristics affect how the user gathers information, and how the user communicates the information to the designer and other users. A different type of approach in achieving this understanding is Neuro Linguistic Programming (NLP). The material presented in this paper is based on two studies involving the design of visual displays, NLP, and the user's perspective model of a reactor system. The studies involve the methodology known as NLP, and its use in expanding design choices from the user's "model of the world," in the areas of virtual reality, workstation design, team structure, decision and learning style patterns, safety operations, pattern recognition, and much, much more.

  13. Assessing Question Quality Using NLP

    ERIC Educational Resources Information Center

    Kopp, Kristopher J.; Johnson, Amy M.; Crossley, Scott A.; McNamara, Danielle S.

    2017-01-01

    An NLP algorithm was developed to assess question quality to inform feedback on questions generated by students within iSTART (an intelligent tutoring system that teaches reading strategies). A corpus of 4575 questions was coded using a four-level taxonomy. NLP indices were calculated for each question and machine learning was used to predict…

  14. Identification of Long Bone Fractures in Radiology Reports Using Natural Language Processing to Support Healthcare Quality Improvement

    PubMed Central

    Masino, Aaron J.; Casper, T. Charles; Dean, Jonathan M.; Bell, Jamie; Enriquez, Rene; Deakyne, Sara; Chamberlain, James M.; Alpern, Elizabeth R.

    2016-01-01

    Background. Important information to support healthcare quality improvement is often recorded in free text documents such as radiology reports. Natural language processing (NLP) methods may help extract this information, but these methods have rarely been applied outside the research laboratories where they were developed. Objective. To implement and validate NLP tools to identify long bone fractures for pediatric emergency medicine quality improvement. Methods. Using freely available statistical software packages, we implemented NLP methods to identify long bone fractures from radiology reports. A sample of 1,000 radiology reports was used to construct three candidate classification models. A test set of 500 reports was used to validate the model performance. Blinded manual review of radiology reports by two independent physicians provided the reference standard. Each radiology report was segmented and word stem and bigram features were constructed. Common English “stop words” and rare features were excluded. We used 10-fold cross-validation to select optimal configuration parameters for each model. Accuracy, recall, precision and the F1 score were calculated. The final model was compared to the use of diagnosis codes for the identification of patients with long bone fractures. Results. There were 329 unique word stems and 344 bigrams in the training documents. A support vector machine classifier with Gaussian kernel performed best on the test set with accuracy=0.958, recall=0.969, precision=0.940, and F1 score=0.954. Optimal parameters for this model were cost=4 and gamma=0.005. The three classification models that we tested all performed better than diagnosis codes in terms of accuracy, precision, and F1 score (diagnosis code accuracy=0.932, recall=0.960, precision=0.896, and F1 score=0.927). Conclusions. NLP methods using a corpus of 1,000 training documents accurately identified acute long bone fractures from radiology reports. Strategic use of straightforward NLP methods, implemented with freely available software, offers quality improvement teams new opportunities to extract information from narrative documents. PMID:27826610
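
    The feature construction described in the record above (token and bigram features, with stop words and rare features excluded) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the stop-word list, the `min_docs` threshold and all function names are assumptions, and word stemming is omitted because the abstract does not name the stemmer that was used.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "is", "in", "and"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def features(text):
    """Count unigram and bigram features for one report."""
    tokens = tokenize(text)
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return Counter(tokens + bigrams)


def drop_rare(docs_features, min_docs=2):
    """Keep only features that occur in at least `min_docs` reports."""
    doc_freq = Counter()
    for feats in docs_features:
        doc_freq.update(feats.keys())
    keep = {f for f, n in doc_freq.items() if n >= min_docs}
    return [Counter({f: c for f, c in feats.items() if f in keep})
            for feats in docs_features]


reports = ["Fracture of the distal radius.", "No fracture of the femur."]
kept = drop_rare([features(r) for r in reports])
print(kept)
```

    With these two toy reports only the feature `fracture` survives the rare-feature filter, since every other token or bigram appears in a single report. The resulting count vectors are the kind of input a classifier such as the support vector machine mentioned in the abstract would consume.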

  15. The Effects of Clinical Hypnosis versus Neurolinguistic Programming (NLP) before External Cephalic Version (ECV): A Prospective Off-Centre Randomised, Double-Blind, Controlled Trial

    PubMed Central

    Reinhard, Joscha; Peiffer, Swati; Sänger, Nicole; Herrmann, Eva; Yuan, Juping; Louwen, Frank

    2012-01-01

    Objective. To examine the effects of clinical hypnosis versus NLP intervention on the success rate of ECV procedures in comparison to a control group. Methods. A prospective off-centre randomised trial of a clinical hypnosis intervention against NLP in women with a singleton breech fetus at or after 37 0/7 weeks (259 days) of gestation and normal amniotic fluid index. All 80 participants heard a 20-minute recorded intervention via headphones. The main outcome assessed was the success rate of ECV. The intervention groups were compared with a control group with standard medical care alone (n = 122). Results. A total of 42 women, who received a hypnosis intervention prior to ECV, had a 40.5% (n = 17) successful ECV rate, whereas 38 women, who received NLP, had a 44.7% (n = 17) successful ECV rate (P > 0.05). The control group had similar patient characteristics compared to the intervention groups (P > 0.05). In the control group (n = 122), 27.3% (n = 33) had a successful ECV procedure, a statistically significantly lower rate than NLP (P = 0.05) and than hypnosis and NLP (P = 0.03). Conclusions. These findings suggest that prior clinical hypnosis and NLP have similar success rates of ECV procedures and are both superior to standard medical care alone. PMID:22778774

  16. The Effects of Clinical Hypnosis versus Neurolinguistic Programming (NLP) before External Cephalic Version (ECV): A Prospective Off-Centre Randomised, Double-Blind, Controlled Trial.

    PubMed

    Reinhard, Joscha; Peiffer, Swati; Sänger, Nicole; Herrmann, Eva; Yuan, Juping; Louwen, Frank

    2012-01-01

    Objective. To examine the effects of clinical hypnosis versus NLP intervention on the success rate of ECV procedures in comparison to a control group. Methods. A prospective off-centre randomised trial of a clinical hypnosis intervention against NLP in women with a singleton breech fetus at or after 37 0/7 weeks (259 days) of gestation and normal amniotic fluid index. All 80 participants heard a 20-minute recorded intervention via headphones. The main outcome assessed was the success rate of ECV. The intervention groups were compared with a control group with standard medical care alone (n = 122). Results. A total of 42 women, who received a hypnosis intervention prior to ECV, had a 40.5% (n = 17) successful ECV rate, whereas 38 women, who received NLP, had a 44.7% (n = 17) successful ECV rate (P > 0.05). The control group had similar patient characteristics compared to the intervention groups (P > 0.05). In the control group (n = 122), 27.3% (n = 33) had a successful ECV procedure, a statistically significantly lower rate than NLP (P = 0.05) and than hypnosis and NLP (P = 0.03). Conclusions. These findings suggest that prior clinical hypnosis and NLP have similar success rates of ECV procedures and are both superior to standard medical care alone.

  17. Evidence-based Neuro Linguistic Psychotherapy: a meta-analysis.

    PubMed

    Zaharia, Cătălin; Reiner, Melita; Schütz, Peter

    2015-12-01

    Neuro Linguistic Programming (NLP) Framework has enjoyed enormous popularity in the field of applied psychology. NLP has been used in business, education, law, medicine and psychotherapy to identify people's patterns and alter their responses to stimuli, so they are better able to regulate their environment and themselves. NLP looks at achieving goals, creating stable relationships, eliminating barriers such as fears and phobias, building self-confidence, and self-esteem, and achieving peak performance. Neuro Linguistic Psychotherapy (NLPt) encompasses NLP as framework and set of interventions in the treatment of individuals with different psychological and/or social problems. We aimed systematically to analyse the available data regarding the effectiveness of Neuro Linguistic Psychotherapy (NLPt). The present work is a meta-analysis of studies, observational or randomized controlled trials, for evaluating the efficacy of Neuro Linguistic Programming in individuals with different psychological and/or social problems. The databases searched to identify studies in English and German language: CENTRAL in the Cochrane Library; PubMed; ISI Web of Knowledge (include results also from Medline and the Web of Science); PsycINFO (including PsycARTICLES); Psyndex; Deutschsprachige Diplomarbeiten der Psychologie (database of theses in Psychology in German language), Social SciSearch; National library of health and two NLP-specific research databases: one from the NLP Community (http://www.nlp.de/cgi-bin/research/nlprdb.cgi?action=res_entries) and one from the NLP Group (http://www.nlpgrup.com/bilimselarastirmalar/bilimsel-arastirmalar-4.html#Zweig154). From a total number of 425 studies, 350 were removed and considered not relevant based on the title and abstract. Included, in the final analysis, are 12 studies with numbers of participants ranging between 12 and 115 subjects. The vast majority of studies were prospective observational. 
    The present paper represents the first meta-analysis evaluating the effectiveness of NLP therapy for individuals with social/psychological problems. The overall meta-analysis found that the NLP therapy may add an overall standardized mean difference of 0.54 with a confidence interval of [0.20; 0.88]. Neuro-Linguistic Psychotherapy as a psychotherapeutic modality grounded in theoretical frameworks, methodologies and interventions scientifically developed, including models developed by NLP, shows results that can hold its ground in comparison with other psychotherapeutic methods.
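
    As a rough check on the interval reported above, the standard error implied by a confidence interval can be recovered from its width. The sketch below assumes a 95% confidence level and a normal approximation; the abstract states neither, so both are assumptions, and the function name `se_from_ci` is purely illustrative.

```python
from statistics import NormalDist


def se_from_ci(lower: float, upper: float, level: float = 0.95) -> float:
    """Standard error implied by a symmetric normal-theory confidence interval.

    Assumes the interval was built as estimate +/- z * SE, which the
    abstract does not confirm.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return (upper - lower) / (2 * z)


# Values quoted in the abstract: SMD 0.54, interval [0.20; 0.88].
smd = 0.54
se = se_from_ci(0.20, 0.88)                     # roughly 0.17
z_stat = smd / se                               # roughly 3.1
p_value = 2 * (1 - NormalDist().cdf(z_stat))    # two-sided p-value

print(f"implied SE = {se:.3f}, z = {z_stat:.2f}, two-sided p = {p_value:.4f}")
```

    Under these assumptions the interval excludes zero with some margin, which is consistent with the abstract's reading of a positive overall effect.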

  18. Correlating mammographic and pathologic findings in clinical decision support using natural language processing and data mining methods.

    PubMed

    Patel, Tejal A; Puppala, Mamta; Ogunti, Richard O; Ensor, Joe E; He, Tiancheng; Shewale, Jitesh B; Ankerst, Donna P; Kaklamani, Virginia G; Rodriguez, Angel A; Wong, Stephen T C; Chang, Jenny C

    2017-01-01

    A key challenge to mining electronic health records for mammography research is the preponderance of unstructured narrative text, which strikingly limits usable output. The imaging characteristics of breast cancer subtypes have been described previously, but without standardization of parameters for data mining. The authors searched the enterprise-wide data warehouse at the Houston Methodist Hospital, the Methodist Environment for Translational Enhancement and Outcomes Research (METEOR), for patients with Breast Imaging Reporting and Data System (BI-RADS) category 5 mammogram readings performed between January 2006 and May 2015 and an available pathology report. The authors developed natural language processing (NLP) software algorithms to automatically extract mammographic and pathologic findings from free text mammogram and pathology reports. The correlation between mammographic imaging features and breast cancer subtype was analyzed using one-way analysis of variance and the Fisher exact test. The NLP algorithm was able to obtain key characteristics for 543 patients who met the inclusion criteria. Patients with estrogen receptor-positive tumors were more likely to have spiculated margins (P = .0008), and those with tumors that overexpressed human epidermal growth factor receptor 2 (HER2) were more likely to have heterogeneous and pleomorphic calcifications (P = .0078 and P = .0002, respectively). Mammographic imaging characteristics, obtained from an automated text search and the extraction of mammogram reports using NLP techniques, correlated with pathologic breast cancer subtype. The results of the current study validate previously reported trends assessed by manual data collection. Furthermore, NLP provides an automated means with which to scale up data extraction and analysis for clinical decision support. Cancer 2017;114-121. © 2016 American Cancer Society.

  19. Enhancing Risk Assessment in Patients Receiving Chronic Opioid Analgesic Therapy Using Natural Language Processing.

    PubMed

    Haller, Irina V; Renier, Colleen M; Juusola, Mitch; Hitz, Paul; Steffen, William; Asmus, Michael J; Craig, Terri; Mardekian, Jack; Masters, Elizabeth T; Elliott, Thomas E

    2017-10-01

    Clinical guidelines for the use of opioids in chronic noncancer pain recommend assessing risk for aberrant drug-related behaviors prior to initiating opioid therapy. Despite recent dramatic increases in prescription opioid misuse and abuse, use of screening tools by clinicians continues to be underutilized. This research evaluated natural language processing (NLP) together with other data extraction techniques for risk assessment of patients considered for opioid therapy as a means of predicting opioid abuse. Using a retrospective cohort of 3,668 chronic noncancer pain patients with at least one opioid agreement between January 1, 2007, and December 31, 2012, we examined the availability of electronic health record structured and unstructured data to populate the Opioid Risk Tool (ORT) and other selected outcomes. Clinician-documented opioid agreement violations in the clinical notes were determined using NLP techniques followed by manual review of the notes. Confirmed through manual review, the NLP algorithm had 96.1% sensitivity, 92.8% specificity, and 92.6% positive predictive value in identifying opioid agreement violation. At the time of most recent opioid agreement, automated ORT identified 42.8% of patients as at low risk, 28.2% as at moderate risk, and 29.0% as at high risk for opioid abuse. During a year following the agreement, 22.5% of patients had opioid agreement violations. Patients classified as high risk were three times more likely to violate opioid agreements compared with those with low/moderate risk. Our findings suggest that NLP techniques have potential utility to support clinicians in screening chronic noncancer pain patients considered for long-term opioid therapy. © 2016 American Academy of Pain Medicine. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com

  20. Natural language processing in pathology: a scoping review.

    PubMed

    Burger, Gerard; Abu-Hanna, Ameen; de Keizer, Nicolette; Cornet, Ronald

    2016-07-22

    Encoded pathology data are key for medical registries and analyses, but pathology information is often expressed as free text. We reviewed and assessed the use of NLP (natural language processing) for encoding pathology documents. Papers addressing NLP in pathology were retrieved from PubMed, Association for Computing Machinery (ACM) Digital Library and Association for Computational Linguistics (ACL) Anthology. We reviewed and summarised the study objectives; NLP methods used and their validation; software implementations; the performance on the dataset used and any reported use in practice. The main objectives of the 38 included papers were encoding and extraction of clinically relevant information from pathology reports. Common approaches were word/phrase matching, probabilistic machine learning and rule-based systems. Five papers (13%) compared different methods on the same dataset. Four papers did not specify the method(s) used. 18 of the 26 studies that reported F-measure, recall or precision reported values of over 0.9. Proprietary software was the most frequently mentioned category (14 studies); General Architecture for Text Engineering (GATE) was the most applied architecture overall. Practical system use was reported in four papers. Most papers used expert annotation validation. Different methods are used in NLP research in pathology, and good performances, that is, high precision and recall, high retrieval/removal rates, are reported for all of these. Lack of validation and of shared datasets precludes performance comparison. More comparative analysis and validation are needed to provide better insight into the performance and merits of these methods. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/

  1. Performance of a Natural Language Processing (NLP) Tool to Extract Pulmonary Function Test (PFT) Reports from Structured and Semistructured Veteran Affairs (VA) Data

    PubMed Central

    Sauer, Brian C.; Jones, Barbara E.; Globe, Gary; Leng, Jianwei; Lu, Chao-Chin; He, Tao; Teng, Chia-Chen; Sullivan, Patrick; Zeng, Qing

    2016-01-01

    Introduction/Objective: Pulmonary function tests (PFTs) are objective estimates of lung function, but are not reliably stored within the Veteran Health Affairs data systems as structured data. The aim of this study was to validate the natural language processing (NLP) tool we developed—which extracts spirometric values and responses to bronchodilator administration—against expert review, and to estimate the number of additional spirometric tests identified beyond the structured data. Methods: All patients at seven Veteran Affairs Medical Centers with a diagnostic code for asthma Jan 1, 2006–Dec 31, 2012 were included. Evidence of spirometry with a bronchodilator challenge (BDC) was extracted from structured data as well as clinical documents. NLP’s performance was compared against a human reference standard using a random sample of 1,001 documents. Results: In the validation set NLP demonstrated a precision of 98.9 percent (95 percent confidence intervals (CI): 93.9 percent, 99.7 percent), recall of 97.8 percent (95 percent CI: 92.2 percent, 99.7 percent), and an F-measure of 98.3 percent for the forced vital capacity pre- and post pairs and precision of 100 percent (95 percent CI: 96.6 percent, 100 percent), recall of 100 percent (95 percent CI: 96.6 percent, 100 percent), and an F-measure of 100 percent for the forced expiratory volume in one second pre- and post pairs for bronchodilator administration. Application of the NLP increased the proportion identified with complete bronchodilator challenge by 25 percent. Discussion/Conclusion: This technology can improve identification of PFTs for epidemiologic research. Caution must be taken in assuming that a single domain of clinical data can completely capture the scope of a disease, treatment, or clinical test. PMID:27376095
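
    The F-measure values quoted above follow from the reported precision and recall. A minimal sketch, assuming the balanced F-measure (F1, the harmonic mean of precision and recall); the abstract does not say which weighting was used, and the function name is illustrative only.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when nothing was retrieved correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract.
fvc = f_measure(0.989, 0.978)   # forced vital capacity pre/post pairs
fev1 = f_measure(1.0, 1.0)      # forced expiratory volume in one second pairs

print(f"FVC pairs:  F = {fvc:.3f}")   # 0.983, matching the reported 98.3 percent
print(f"FEV1 pairs: F = {fev1:.3f}")  # 1.000
```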

  2. Mapping Partners Master Drug Dictionary to RxNorm using an NLP-based approach.

    PubMed

    Zhou, Li; Plasek, Joseph M; Mahoney, Lisa M; Chang, Frank Y; DiMaggio, Dana; Rocha, Roberto A

    2012-08-01

    To develop an automated method based on natural language processing (NLP) to facilitate the creation and maintenance of a mapping between RxNorm and a local medication terminology for interoperability and meaningful use purposes. We mapped 5961 terms from Partners Master Drug Dictionary (MDD) and 99 of the top prescribed medications to RxNorm. The mapping was conducted at both term and concept levels using an NLP tool, called MTERMS, followed by a manual review conducted by domain experts who created a gold standard mapping. The gold standard was used to assess the overall mapping between MDD and RxNorm and evaluate the performance of MTERMS. Overall, 74.7% of MDD terms and 82.8% of the top 99 terms had an exact semantic match to RxNorm. Compared to the gold standard, MTERMS achieved a precision of 99.8% and a recall of 73.9% when mapping all MDD terms, and a precision of 100% and a recall of 72.6% when mapping the top prescribed medications. The challenges and gaps in mapping MDD to RxNorm are mainly due to unique user or application requirements for representing drug concepts and the different modeling approaches inherent in the two terminologies. An automated approach based on NLP followed by human expert review is an efficient and feasible way for conducting dynamic mapping. Copyright © 2011 Elsevier Inc. All rights reserved.

  3. Head injury assessment of non-lethal projectile impacts: A combined experimental/computational method.

    PubMed

    Sahoo, Debasis; Robbe, Cyril; Deck, Caroline; Meyer, Frank; Papy, Alexandre; Willinger, Remy

    2016-11-01

    The main objective of this study is to develop a methodology to assess this risk based on experimental tests versus numerical predictive head injury simulations. A total of 16 non-lethal projectile (NLP) impacts were conducted against a rigid force plate at three different ranges of impact velocity (120, 72 and 55 m/s), and the force/deformation-time data were used for the validation of the finite element (FE) NLP. Good agreement between experimental and simulation data was obtained during validation of the FE NLP, with a high correlation value (>0.98) and a peak force discrepancy of less than 3%. A state-of-the-art finite element head model with enhanced brain and skull material laws and specific head injury criteria was used for numerical computation of NLP impacts. Frontal and lateral FE NLP impacts to the head model at different velocities were performed under LS-DYNA. This is the first time that the lethality of NLP has been assessed by axonal strain computation to predict diffuse axonal injury (DAI) in NLP impacts to the head. For temporo-parietal impacts, the min-max risk of DAI is 0-86%. With a velocity above 99.2 m/s there is a greater than 50% risk of DAI for temporo-parietal impacts. All the medium- and high-velocity impacts are susceptible to skull fracture, with a percentage risk higher than 90%. This study provides a tool for realistic injury (DAI and skull fracture) assessment during NLP impacts to the human head. Copyright © 2016 Elsevier Ltd. All rights reserved.

  4. Nesfatin-1-like peptide is a novel metabolic factor that suppresses feeding, and regulates whole-body energy homeostasis in male Wistar rats

    PubMed Central

    Gawli, Kavishankar; Ramesh, Naresh

    2017-01-01

    Nucleobindin-1 has high sequence similarity to nucleobindin-2, which encodes the anorectic and metabolic peptide, nesfatin-1. We previously reported a nesfatin-1-like peptide (NLP), anorectic in fish and insulinotropic in mice islet beta-like cells. The main objective of this research was to determine whether NLP is a metabolic regulator in male Wistar rats. A single intraperitoneal (IP) injection of NLP (100 μg/kg BW) decreased food intake and increased ambulatory movement, without causing any change in total activity or energy expenditure when compared to saline-treated rats. Continuous subcutaneous infusion of NLP (100 μg/kg BW) using osmotic mini-pumps for 7 days caused a reduction in food intake on days 3 and 4. Similarly, water intake was also reduced for two days (days 3 and 4) with the effect being observed during the dark phase. This was accompanied by an increased RER and energy expenditure. However, decreased whole-body fat oxidation, and total activity were observed during the long-term treatment (7 days). Body weight gain was not significantly different between control and NLP infused rats. The expression of mRNAs encoding adiponectin, resistin, ghrelin, cholecystokinin and uncoupling protein 1 (UCP1) were significantly upregulated, while leptin and peptide YY mRNA expression was downregulated in NLP-treated rats. These findings indicate that administration of NLP at 100 μg/kg BW reduces food intake and modulates whole body energy balance. In summary, NLP is a novel metabolic peptide in rats. PMID:28542568

  5. Identifying QT prolongation from ECG impressions using a general-purpose Natural Language Processor

    PubMed Central

    Denny, Joshua C.; Miller, Randolph A.; Waitman, Lemuel Russell; Arrieta, Mark; Peterson, Joshua F.

    2009-01-01

    Objective Typically detected via electrocardiograms (ECGs), QT interval prolongation is a known risk factor for sudden cardiac death. Since medications can promote or exacerbate the condition, detection of QT interval prolongation is important for clinical decision support. We investigated the accuracy of natural language processing (NLP) for identifying QT prolongation from cardiologist-generated, free-text ECG impressions compared to corrected QT (QTc) thresholds reported by ECG machines. Methods After integrating negation detection to a locally-developed natural language processor, the KnowledgeMap concept identifier, we evaluated NLP-based detection of QT prolongation compared to the calculated QTc on a set of 44,318 ECGs obtained from hospitalized patients. We also created a string query using regular expressions to identify QT prolongation. We calculated sensitivity and specificity of the methods using manual physician review of the cardiologist-generated reports as the gold standard. To investigate causes of “false positive” calculated QTc, we manually reviewed randomly selected ECGs with a long calculated QTc but no mention of QT prolongation. Separately, we validated the performance of the negation detection algorithm on 5,000 manually-categorized ECG phrases for any medical concept (not limited to QT prolongation) prior to developing the NLP query for QT prolongation. Results The NLP query for QT prolongation correctly identified 2,364 of 2,373 ECGs with QT prolongation with a sensitivity of 0.996 and a positive predictive value of 1.000. There were no false positives. The regular expression query had a sensitivity of 0.999 and positive predictive value of 0.982. In contrast, the positive predictive value of common QTc thresholds derived from ECG machines was 0.07–0.25 with corresponding sensitivities of 0.994–0.046. The negation detection algorithm had a recall of 0.973 and precision of 0.982 for 10,490 concepts found within ECG impressions. 
    Conclusions NLP and regular expression queries of cardiologists’ ECG interpretations can more effectively identify QT prolongation than the automated QTc intervals reported by ECG machines. Future clinical decision support could employ NLP queries to detect QTc prolongation and other reported ECG abnormalities. PMID:18938105
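
    The abstract above describes a regular expression string query combined with negation detection. The following is a minimal sketch of that general idea only, not the authors' query or the KnowledgeMap algorithm: the patterns, the negation cue list, and the 40-character window are all hypothetical choices made for illustration. The last lines recompute sensitivity and positive predictive value from the counts the abstract reports.

```python
import re

# Hypothetical pattern for QT prolongation mentions (not the study's query).
QT_PATTERN = re.compile(
    r"\b(?:prolonged\s+qtc?(?:\s+interval)?"
    r"|qtc?\s+(?:interval\s+)?prolong(?:ation|ed))\b",
    re.IGNORECASE,
)

# Hypothetical negation cue that must occur earlier in the same sentence,
# at most 40 characters before the mention.
NEGATION = re.compile(
    r"\b(?:no|not|without|negative for|absence of)\b[^.;]{0,40}$",
    re.IGNORECASE,
)


def mentions_qt_prolongation(impression: str) -> bool:
    """Return True if any QT prolongation mention is not preceded by a negation cue."""
    for match in QT_PATTERN.finditer(impression):
        preceding = impression[: match.start()]
        if not NEGATION.search(preceding):
            return True
    return False


# Counts reported in the abstract: 2,364 of 2,373 identified, no false positives.
tp, fn, fp = 2364, 2373 - 2364, 0
sensitivity = tp / (tp + fn)
ppv = tp / (tp + fp)
print(f"sensitivity = {sensitivity:.3f}, PPV = {ppv:.3f}")
```

    For example, `mentions_qt_prolongation("No evidence of QT prolongation.")` is false, while the same call on `"No ST changes. QT prolongation noted."` is true because the negation cue sits in an earlier sentence. A production system would need a far richer negation model than this sentence-local window.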

  6. Enhancing Comparative Effectiveness Research With Automated Pediatric Pneumonia Detection in a Multi-Institutional Clinical Repository: A PHIS+ Pilot Study.

    PubMed

    Meystre, Stephane; Gouripeddi, Ramkiran; Tieder, Joel; Simmons, Jeffrey; Srivastava, Rajendu; Shah, Samir

    2017-05-15

    Community-acquired pneumonia is a leading cause of pediatric morbidity. Administrative data are often used to conduct comparative effectiveness research (CER) with sufficient sample sizes to enhance detection of important outcomes. However, such studies are prone to misclassification errors because of the variable accuracy of discharge diagnosis codes. The aim of this study was to develop an automated, scalable, and accurate method to determine the presence or absence of pneumonia in children using chest imaging reports. The multi-institutional PHIS+ clinical repository was developed to support pediatric CER by expanding an administrative database of children's hospitals with detailed clinical data. To develop a scalable approach to find patients with bacterial pneumonia more accurately, we developed a Natural Language Processing (NLP) application to extract relevant information from chest diagnostic imaging reports. Domain experts established a reference standard by manually annotating 282 reports to train and then test the NLP application. Findings of pleural effusion, pulmonary infiltrate, and pneumonia were automatically extracted from the reports and then used to automatically classify whether a report was consistent with bacterial pneumonia. Compared with the annotated diagnostic imaging reports reference standard, the most accurate implementation of machine learning algorithms in our NLP application allowed extracting relevant findings with a sensitivity of .939 and a positive predictive value of .925. It allowed classifying reports with a sensitivity of .71, a positive predictive value of .86, and a specificity of .962. When compared with each of the domain experts manually annotating these reports, the NLP application allowed for significantly higher sensitivity (.71 vs .527) and similar positive predictive value and specificity.
    NLP-based pneumonia information extraction of pediatric diagnostic imaging reports performed better than domain experts in this pilot study. NLP is an efficient method to extract information from a large collection of imaging reports to facilitate CER. ©Stephane Meystre, Ramkiran Gouripeddi, Joel Tieder, Jeffrey Simmons, Rajendu Srivastava, Samir Shah. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 15.05.2017.

  7. Neuro-Linguistic Programming in Couple Therapy.

    ERIC Educational Resources Information Center

    Forman, Bruce D.

    Neuro-Linguistic Programming (NLP) is a method of understanding the organization of subjective human experience. The NLP model provides a theoretical framework for directing or guiding therapeutic change. According to NLP, people experience the so-called real world indirectly and operate on the real world as if it were like the model of it they…

  8. The Role of NLP in Teachers' Classroom Discourse

    ERIC Educational Resources Information Center

    Millrood, Radislav

    2004-01-01

    Neuro-linguistic programming (NLP) is an approach to language teaching which is claimed to help achieve excellence in learner performance. Yet there is little evidence of the impact that NLP techniques in teachers' discourse can have on learners. The article draws on workshops with teachers where classroom simulations were used to raise teachers'…

  9. Characterization of Change and Significance for Clinical Findings in Radiology Reports Through Natural Language Processing.

    PubMed

    Hassanpour, Saeed; Bay, Graham; Langlotz, Curtis P

    2017-06-01

    We built a natural language processing (NLP) method to automatically extract clinical findings in radiology reports and characterize their level of change and significance according to a radiology-specific information model. We utilized a combination of machine learning and rule-based approaches for this purpose. Our method is unique in capturing different features and levels of abstractions at surface, entity, and discourse levels in text analysis. This combination has enabled us to recognize the underlying semantics of radiology report narratives for this task. We evaluated our method on radiology reports from four major healthcare organizations. Our evaluation showed the efficacy of our method in highlighting important changes (accuracy 99.2%, precision 96.3%, recall 93.5%, and F1 score 94.7%) and identifying significant observations (accuracy 75.8%, precision 75.2%, recall 75.7%, and F1 score 75.3%) to characterize radiology reports. This method can help clinicians quickly understand the key observations in radiology reports and facilitate clinical decision support, review prioritization, and disease surveillance.

  10. Semi Automatic Ontology Instantiation in the domain of Risk Management

    NASA Astrophysics Data System (ADS)

    Makki, Jawad; Alquier, Anne-Marie; Prince, Violaine

    One of the challenging tasks in the context of Ontological Engineering is to automatically or semi-automatically support the process of Ontology Learning and Ontology Population from semi-structured documents (texts). In this paper we describe a Semi-Automatic Ontology Instantiation method from natural language text, in the domain of Risk Management. This method is composed of three steps: 1) Annotation with part-of-speech tags, 2) Semantic Relation Instances Extraction, 3) Ontology instantiation process. It is based on combined NLP techniques, using human intervention between steps 2 and 3 for control and validation. Since it relies heavily on linguistic knowledge, it is not domain dependent, which is a good feature for portability between the different fields of risk management application. The proposed methodology uses the ontology of the PRIMA project (supported by the European community) as a Generic Domain Ontology and populates it via an available corpus. A first validation of the approach is done through an experiment with Chemical Fact Sheets from the Environmental Protection Agency.

  11. Non-abelian factorisation for next-to-leading-power threshold logarithms

    NASA Astrophysics Data System (ADS)

    Bonocore, D.; Laenen, E.; Magnea, L.; Vernazza, L.; White, C. D.

    2016-12-01

    Soft and collinear radiation is responsible for large corrections to many hadronic cross sections, near thresholds for the production of heavy final states. There is much interest in extending our understanding of this radiation to next-to-leading power (NLP) in the threshold expansion. In this paper, we generalise a previously proposed all-order NLP factorisation formula to include non-abelian corrections. We define a nonabelian radiative jet function, organising collinear enhancements at NLP, and compute it for quark jets at one loop. We discuss in detail the issue of double counting between soft and collinear regions. Finally, we verify our prescription by reproducing all NLP logarithms in Drell-Yan production up to NNLO, including those associated with double real emission. Our results constitute an important step in the development of a fully general resummation formalism for NLP threshold effects.

  12. Evaluation of Natural Language Processing (NLP) Systems to Annotate Drug Product Labeling with MedDRA Terminology.

    PubMed

    Ly, Thomas; Pamer, Carol; Dang, Oanh; Brajovic, Sonja; Haider, Shahrukh; Botsis, Taxiarchis; Milward, David; Winter, Andrew; Lu, Susan; Ball, Robert

    2018-05-31

    The FDA Adverse Event Reporting System (FAERS) is a primary data source for identifying unlabeled adverse events (AEs) in a drug or biologic drug product's postmarketing phase. Many AE reports must be reviewed by drug safety experts to identify unlabeled AEs, even if the reported AEs are previously identified, labeled AEs. Integrating the labeling status of drug product AEs into FAERS could increase report triage and review efficiency. Medical Dictionary for Regulatory Activities (MedDRA) is the standard for coding AE terms in FAERS cases. However, drug manufacturers are not required to use MedDRA to describe AEs in product labels. We hypothesized that natural language processing (NLP) tools could assist in automating the extraction and MedDRA mapping of AE terms in drug product labels. We evaluated the performance of three NLP systems, (ETHER, I2E, MetaMap) for their ability to extract AE terms from drug labels and translate the terms to MedDRA Preferred Terms (PTs). Pharmacovigilance-based annotation guidelines for extracting AE terms from drug labels were developed for this study. We compared each system's output to MedDRA PT AE lists, manually mapped by FDA pharmacovigilance experts using the guidelines, for ten drug product labels known as the "gold standard AE list" (GSL) dataset. Strict time and configuration conditions were imposed in order to test each system's capabilities under conditions of no human intervention and minimal system configuration. Each NLP system's output was evaluated for precision, recall and F measure in comparison to the GSL. A qualitative error analysis (QEA) was conducted to categorize a random sample of each NLP system's false positive and false negative errors. A total of 417, 278, and 250 false positive errors occurred in the ETHER, I2E, and MetaMap outputs, respectively. A total of 100, 80, and 187 false negative errors occurred in ETHER, I2E, and MetaMap outputs, respectively. 
    Precision ranged from 64% to 77%, recall from 64% to 83% and F measure from 67% to 79%. I2E had the highest precision (77%), recall (83%) and F measure (79%). ETHER had the lowest precision (64%). MetaMap had the lowest recall (64%). The QEA found that the most prevalent false positive errors were context errors such as "Context error/General term", "Context error/Instructions or monitoring parameters", "Context error/Medical history preexisting condition underlying condition risk factor or contraindication", and "Context error/AE manifestations or secondary complication". The most prevalent false negative errors were in the "Incomplete or missed extraction" error category. Missing AE terms were typically due to long terms, or terms containing non-contiguous words which do not correspond exactly to MedDRA synonyms. MedDRA mapping errors were a minority of errors for ETHER and I2E but were the most prevalent false positive errors for MetaMap. The results demonstrate that it may be feasible to use NLP tools to extract and map AE terms to MedDRA PTs. However, the NLP tools we tested would need to be modified or reconfigured to lower the error rates to support their use in a regulatory setting. Tools specific for extracting AE terms from drug labels and mapping the terms to MedDRA PTs may need to be developed to support pharmacovigilance. Conducting research using additional NLP systems on a larger, diverse GSL would also be informative. Copyright © 2018. Published by Elsevier Inc.

  13. Automated pancreatic cyst screening using natural language processing: a new tool in the early detection of pancreatic cancer

    PubMed Central

    Roch, Alexandra M; Mehrabi, Saeed; Krishnan, Anand; Schmidt, Heidi E; Kesterson, Joseph; Beesley, Chris; Dexter, Paul R; Palakal, Mathew; Schmidt, C Max

    2015-01-01

    Introduction As many as 3% of computed tomography (CT) scans detect pancreatic cysts. Because pancreatic cysts are incidental, ubiquitous and poorly understood, follow-up is often not performed. Pancreatic cysts may have a significant malignant potential and their identification represents a ‘window of opportunity’ for the early detection of pancreatic cancer. The purpose of this study was to implement an automated Natural Language Processing (NLP)-based pancreatic cyst identification system. Method A multidisciplinary team was assembled. NLP-based identification algorithms were developed based on key words commonly used by physicians to describe pancreatic cysts and programmed for automated search of electronic medical records. A pilot study was conducted prospectively in a single institution. Results From March to September 2013, 566 233 reports belonging to 50 669 patients were analysed. The mean number of patients reported with a pancreatic cyst was 88/month (range 78–98). The mean sensitivity and specificity were 99.9% and 98.8%, respectively. Conclusion NLP is an effective tool to automatically identify patients with pancreatic cysts based on electronic medical records (EMR). This highly accurate system can help capture patients ‘at-risk’ of pancreatic cancer in a registry. PMID:25537257

  14. Assessment of commercial NLP engines for medication information extraction from dictated clinical notes.

    PubMed

    Jagannathan, V; Mullett, Charles J; Arbogast, James G; Halbritter, Kevin A; Yellapragada, Deepthi; Regulapati, Sushmitha; Bandaru, Pavani

    2009-04-01

    We assessed the current state of commercial natural language processing (NLP) engines for their ability to extract medication information from textual clinical documents. Two thousand de-identified discharge summaries and family practice notes were submitted to four commercial NLP engines with the request to extract all medication information. The four sets of returned results were combined to create a comparison standard which was validated against a manual, physician-derived gold standard created from a subset of 100 reports. Once validated, the individual vendor results for medication names, strengths, route, and frequency were compared against this automated standard with precision, recall, and F measures calculated. Compared with the manual, physician-derived gold standard, the automated standard was successful at accurately capturing medication names (F measure=93.2%), but performed less well with strength (85.3%) and route (80.3%), and relatively poorly with dosing frequency (48.3%). Moderate variability was seen in the strengths of the four vendors. The vendors performed better with the structured discharge summaries than with the clinic notes in an analysis comparing the two document types. Although automated extraction may serve as the foundation for a manual review process, it is not ready to automate medication lists without human intervention.

  15. Predicate Matching in NLP: A Review of Research on the Preferred Representational System.

    ERIC Educational Resources Information Center

    Sharpley, Christopher F.

    1984-01-01

    Reviews 15 studies that have investigated the use of the Preferred Representational System (PRS) in Neurolinguistic Programming (NLP). Aspects of design, methodology, population and dependent measures are evaluated, with comments on the outcomes obtained. Results suggested that there is little supportive evidence for the use of PRS in the NLP.…

  16. A Qualitative Investigation into the Experience of Neuro-Linguistic Programming Certification Training among Japanese Career Consultants

    ERIC Educational Resources Information Center

    Kotera, Yasuhiro

    2018-01-01

    Although the application of neuro-linguistic programming (NLP) has been reported worldwide, its scientific investigation is limited. Career consulting is one of the fields where NLP has been increasingly applied in Japan. This study explored why career consultants undertake NLP training, and what they find most useful to their practice. Thematic…

  17. Using the Natural Language Paradigm (NLP) to Increase Vocalizations of Older Adults with Cognitive Impairments

    ERIC Educational Resources Information Center

    LeBlanc, Linda A.; Geiger, Kaneen B.; Sautter, Rachael A.; Sidener, Tina M.

    2007-01-01

    The Natural Language Paradigm (NLP) has proven effective in increasing spontaneous verbalizations for children with autism. This study investigated the use of NLP with older adults with cognitive impairments served at a leisure-based adult day program for seniors. Three individuals with limited spontaneous use of functional language participated…

  18. Heavy quarkonium production at collider energies: Factorization and evolution

    NASA Astrophysics Data System (ADS)

    Kang, Zhong-Bo; Ma, Yan-Qing; Qiu, Jian-Wei; Sterman, George

    2014-08-01

    We present a perturbative QCD factorization formalism for inclusive production of heavy quarkonia of large transverse momentum, pT at collider energies, including both leading power (LP) and next-to-leading power (NLP) behavior in pT. We demonstrate that both LP and NLP contributions can be factorized in terms of perturbatively calculable short-distance partonic coefficient functions and universal nonperturbative fragmentation functions, and derive the evolution equations that are implied by the factorization. We identify projection operators for all channels of the factorized LP and NLP infrared safe short-distance partonic hard parts, and corresponding operator definitions of fragmentation functions. For the NLP, we focus on the contributions involving the production of a heavy quark pair, a necessary condition for producing a heavy quarkonium. We evaluate the first nontrivial order of evolution kernels for all relevant fragmentation functions, and discuss the role of NLP contributions.

  19. Jointly learning word embeddings using a corpus and a knowledge base

    PubMed Central

    Bollegala, Danushka; Maehara, Takanori; Kawarabayashi, Ken-ichi

    2018-01-01

    Methods for representing the meaning of words in vector spaces purely using the information distributed in text corpora have proved to be very valuable in various text mining and natural language processing (NLP) tasks. However, these methods still disregard the valuable semantic relational structure between words in co-occurring contexts. These beneficial semantic relational structures are contained in manually-created knowledge bases (KBs) such as ontologies and semantic lexicons, where the meanings of words are represented by defining the various relationships that exist among those words. We combine the knowledge in both a corpus and a KB to learn better word embeddings. Specifically, we propose a joint word representation learning method that uses the knowledge in the KBs, and simultaneously predicts the co-occurrences of two words in a corpus context. In particular, we use the corpus to define our objective function subject to the relational constraints derived from the KB. We further utilise the corpus co-occurrence statistics to propose two novel approaches, Nearest Neighbour Expansion (NNE) and Hedged Nearest Neighbour Expansion (HNE), that dynamically expand the KB and therefore derive more constraints that guide the optimisation process. Our experimental results over a wide range of benchmark tasks demonstrate that the proposed method statistically significantly improves the accuracy of the word embeddings learnt. It outperforms a corpus-only baseline and improves on a number of previously proposed methods that incorporate corpora and KBs in both semantic similarity prediction and word analogy detection tasks. PMID:29529052

  20. The Effect of Neuro-Linguistic Programming (NLP) on Reading Comprehension in English for Specific Purposes Courses

    ERIC Educational Resources Information Center

    Farahani, Fahimeh

    2018-01-01

    Neuro-Linguistic Programming (NLP) has potential to help language learners; however, it has received scant attention. The present study was an attempt to investigate the effect of NLP techniques on reading comprehension of English as a Foreign Language (EFL) learners at an English for Specific Purposes (ESP) course. To achieve this goal, two…

  1. Expression of Caenorhabditis elegans antimicrobial peptide NLP-31 in Escherichia coli

    NASA Astrophysics Data System (ADS)

    Lim, Mei-Perng; Nathan, Sheila

    2014-09-01

    Burkholderia pseudomallei is the causative agent of melioidosis, a fulminant disease endemic in Southeast Asia and Northern Australia. The standardized form of therapy is antibiotics treatment; however, the bacterium has become increasingly resistant to these antibiotics. This has spurred the need to search for alternative therapeutic agents. Antimicrobial peptides (AMPs) are small proteins that possess broad-spectrum antimicrobial activity. In a previous study, the nematode Caenorhabditis elegans was infected by B. pseudomallei and a whole animal transcriptome analysis identified a number of AMP-encoded genes which were induced significantly in the infected worms. One of the AMPs identified is NLP-31 and to date, there are no reports of anti-B. pseudomallei activity demonstrated by NLP-31. To produce NLP-31 protein for future studies, the gene encoding for NLP-31 was cloned into the pET32b expression vector and transformed into Escherichia coli BL21(DE3). Protein expression was induced with 1 mM IPTG for 20 hours at 20°C and recombinant NLP-31 was detected in the soluble fraction. Taken together, a simple optimized heterologous production of AMPs in an E. coli expression system has been successfully developed.

  2. NLP-1: a DNA intercalating hypoxic cell radiosensitizer and cytotoxin

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Panicucci, R.; Heal, R.; Laderoute, K.

    The 2-nitroimidazole linked phenanthridine, NLP-1 (5-(3-(2-nitro-1-imidazoyl)-propyl)-phenanthridinium bromide), was synthesized with the rationale of targeting the nitroimidazole to DNA via the phenanthridine ring. The drug is soluble in aqueous solution (greater than 25 mM) and stable at room temperature. It binds to DNA with a binding constant 1/30 that of ethidium bromide. At a concentration of 0.5 mM, NLP-1 is 8 times more toxic to hypoxic than aerobic cells at 37 degrees C. This concentration is 40 times less than the concentration of misonidazole, a non-intercalating 2-nitroimidazole, required for the same degree of hypoxic cell toxicity. The toxicity of NLP-1 is reduced at least 10-fold at 0 degrees C. Its ability to radiosensitize hypoxic cells is similar to misonidazole at 0 degrees C. Thus the putative targeting of the 2-nitroimidazole, NLP-1, to DNA, via its phenanthridine group, enhances its hypoxic toxicity, but not its radiosensitizing ability under the present test conditions. NLP-1 represents a lead compound for intercalating 2-nitroimidazoles with selective toxicity for hypoxic cells.

  3. Extraction of phenotypic traits from taxonomic descriptions for the tree of life using natural language processing.

    PubMed

    Endara, Lorena; Cui, Hong; Burleigh, J Gordon

    2018-03-01

    Phenotypic data sets are necessary to elucidate the genealogy of life, but assembling phenotypic data for taxa across the tree of life can be technically challenging and prohibitively time consuming. We describe a semi-automated protocol to facilitate and expedite the assembly of phenotypic character matrices of plants from formal taxonomic descriptions. This pipeline uses new natural language processing (NLP) techniques and a glossary of over 9000 botanical terms. Our protocol includes the Explorer of Taxon Concepts (ETC), an online application that assembles taxon-by-character matrices from taxonomic descriptions, and MatrixConverter, a Java application that enables users to evaluate and discretize the characters extracted by ETC. We demonstrate this protocol using descriptions from Araucariaceae. The NLP pipeline unlocks the phenotypic data found in taxonomic descriptions and makes them usable for evolutionary analyses.

  4. How Confounder Strength Can Affect Allocation of Resources in Electronic Health Records.

    PubMed

    Lynch, Kristine E; Whitcomb, Brian W; DuVall, Scott L

    2018-01-01

    When electronic health record (EHR) data are used, multiple approaches may be available for measuring the same variable, introducing potentially confounding factors. While additional information may be gleaned and residual confounding reduced through resource-intensive assessment methods such as natural language processing (NLP), whether the added benefits offset the added cost of the additional resources is not straightforward. We evaluated the implications of misclassification of a confounder when using EHRs. Using a combination of simulations and real data surrounding hospital readmission, we considered smoking as a potential confounder. We compared ICD-9 diagnostic code assignment, which is an easily available measure but has the possibility of substantial misclassification of smoking status, with NLP, a method of determining smoking status that is more expensive and time-consuming than ICD-9 code assignment but has less potential for misclassification. Classification of smoking status with NLP consistently produced less residual confounding than the use of ICD-9 codes; however, when minimal confounding was present, differences between the approaches were small. When considerable confounding is present, investing in a superior measurement tool becomes advantageous.

  5. Clinical Assistant Diagnosis for Electronic Medical Record Based on Convolutional Neural Network.

    PubMed

    Yang, Zhongliang; Huang, Yongfeng; Jiang, Yiran; Sun, Yuxi; Zhang, Yu-Jin; Luo, Pengcheng

    2018-04-20

    Automatically extracting useful information from electronic medical records along with conducting disease diagnoses is a promising task for both clinical decision support (CDS) and natural language processing (NLP). Most of the existing systems are based on manually constructed knowledge bases, with auxiliary diagnosis then done by rule matching. In this study, we present a clinical intelligent decision approach based on Convolutional Neural Networks (CNN), which can automatically extract high-level semantic information from electronic medical records and then perform automatic diagnosis without manual construction of rules or knowledge bases. We use 18,590 collected real-world clinical electronic medical records to train and test the proposed model. Experimental results show that the proposed model can achieve 98.67% accuracy and 96.02% recall, which strongly supports the feasibility and effectiveness of using a convolutional neural network to automatically learn high-level semantic features of electronic medical records and then conduct assisted diagnosis.

  6. Exploiting salient semantic analysis for information retrieval

    NASA Astrophysics Data System (ADS)

    Luo, Jing; Meng, Bo; Quan, Changqin; Tu, Xinhui

    2016-11-01

    Recently, many Wikipedia-based methods have been proposed to improve the performance of different natural language processing (NLP) tasks, such as semantic relatedness computation, text classification and information retrieval. Among these methods, salient semantic analysis (SSA) has been proven to be an effective way to generate conceptual representations for words or documents. However, its feasibility and effectiveness in information retrieval are mostly unknown. In this paper, we study how to efficiently use SSA to improve information retrieval performance, and propose an SSA-based retrieval method under the language model framework. First, the SSA model is adopted to build conceptual representations for documents and queries. Then, these conceptual representations and the bag-of-words (BOW) representations are used in combination to estimate the language models of queries and documents. The proposed method is evaluated on several standard Text REtrieval Conference (TREC) collections. Experimental results on these collections show that the proposed models consistently outperform the existing Wikipedia-based retrieval methods.
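
The abstract does not give the formula used to combine the conceptual and bag-of-words representations. As a rough sketch of one plausible combination, the example below scores a document by a weighted sum of two smoothed query log-likelihoods, one over words and one over concepts. The interpolation weight `lam`, the additive smoothing, the concept labels and the toy documents are all assumptions for illustration, not the authors' method.

```python
import math
from collections import Counter


def smoothed_log_likelihood(query_items, doc_items, vocabulary_size, mu=1.0):
    """Log-likelihood of the query items under an additively smoothed unigram model of the document."""
    counts = Counter(doc_items)
    total = len(doc_items)
    return sum(
        math.log((counts[item] + mu) / (total + mu * vocabulary_size))
        for item in query_items
    )


def combined_score(query, doc, term_vocab_size, concept_vocab_size, lam=0.5):
    """Weighted sum of the bag-of-words score and the concept-level score.

    query and doc are dicts with "terms" and "concepts" lists; lam is a
    hypothetical mixing weight between the two representations.
    """
    bow = smoothed_log_likelihood(query["terms"], doc["terms"], term_vocab_size)
    concept = smoothed_log_likelihood(query["concepts"], doc["concepts"], concept_vocab_size)
    return lam * bow + (1.0 - lam) * concept


# Toy data: the concept lists stand in for the output of a concept-mapping step.
query = {"terms": ["jaguar", "car"], "concepts": ["Automobile"]}
doc_a = {"terms": ["jaguar", "speed", "car"], "concepts": ["Automobile"]}
doc_b = {"terms": ["jaguar", "habitat", "cat"], "concepts": ["Felidae"]}

score_a = combined_score(query, doc_a, term_vocab_size=5, concept_vocab_size=2)
score_b = combined_score(query, doc_b, term_vocab_size=5, concept_vocab_size=2)
print(score_a > score_b)
```

In this toy case the concept-level score helps separate two documents that share the ambiguous word "jaguar", which is the kind of benefit a conceptual representation is meant to provide.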

  7. Identification and functional analysis of the NLP-encoding genes from the phytopathogenic oomycete Phytophthora capsici.

    PubMed

    Chen, Xiao-Ren; Huang, Shen-Xin; Zhang, Ye; Sheng, Gui-Lin; Li, Yan-Peng; Zhu, Feng

    2018-03-23

    Phytophthora capsici is a hemibiotrophic, phytopathogenic oomycete that infects a wide range of crops, resulting in significant economic losses worldwide. By means of a diverse arsenal of secreted effector proteins, hemibiotrophic pathogens may manipulate plant cell death to establish a successful infection and colonization. In this study, we analysed the gene family encoding necrosis- and ethylene-inducing peptide 1 (Nep1)-like proteins (NLPs) in P. capsici, and identified 39 real NLP genes and 26 NLP pseudogenes. Out of the 65 predicted NLP genes, 48 occur in groups with two or more genes, whereas the remainder appear to be singletons distributed randomly across the genome. Phylogenetic analysis of the 39 real NLPs delineated three groups. Key residues and motifs important for the effector activities are degenerated in most NLPs, including the nlp24 peptide consisting of the conserved region I (11-aa immunogenic part) and conserved region II (the heptapeptide GHRHDWE motif) that is important for phytotoxic activity. Transcriptional profiling of eight selected NLP genes indicated that they were differentially expressed during the developmental and plant infection phases of P. capsici. Functional analysis of ten cloned NLPs demonstrated that Pc11951, Pc107869, Pc109174 and Pc118548 were capable of inducing cell death in the Solanaceae, including Nicotiana benthamiana and hot pepper. This study provides an overview of the P. capsici NLP gene family, laying a foundation for further elucidating the pathogenicity mechanism of this devastating pathogen.

  8. Neuropeptide Secreted from a Pacemaker Activates Neurons to Control a Rhythmic Behavior

    PubMed Central

    Wang, Han; Girskis, Kelly; Janssen, Tom; Chan, Jason P.; Dasgupta, Krishnakali; Knowles, James A.; Schoofs, Liliane; Sieburth, Derek

    2013-01-01

    Summary. Background: Rhythmic behaviors are driven by endogenous biological clocks in pacemakers, which must reliably transmit timing information to target tissues that execute rhythmic outputs. During the defecation motor program in C. elegans, calcium oscillations in the pacemaker (intestine), which occur about every 50 seconds, trigger rhythmic enteric muscle contractions through downstream GABAergic neurons that innervate enteric muscles. However, the identity of the timing signal released by the pacemaker and the mechanism underlying the delivery of timing information to the GABAergic neurons are unknown. Results: Here we show that a neuropeptide-like protein (NLP-40) released by the pacemaker triggers a single rapid calcium transient in the GABAergic neurons during each defecation cycle. We find that mutants lacking nlp-40 have normal pacemaker function, but lack enteric muscle contractions. NLP-40 undergoes calcium-dependent release that is mediated by the calcium sensor, SNT-2/synaptotagmin. We identify AEX-2, the G protein-coupled receptor on the GABAergic neurons, as the receptor of NLP-40. Functional calcium imaging reveals that NLP-40 and AEX-2/GPCR are both necessary for rhythmic activation of these neurons. Furthermore, acute application of synthetic NLP-40-derived peptide depolarizes the GABAergic neurons in vivo. Conclusions: Our results show that NLP-40 carries the timing information from the pacemaker via calcium-dependent release and delivers it to the GABAergic neurons by instructing their activation. Thus, we propose that rhythmic release of neuropeptides can deliver temporal information from pacemakers to downstream neurons to execute rhythmic behaviors. PMID:23583549

  9. Nesfatin-1-Like Peptide Encoded in Nucleobindin-1 in Goldfish is a Novel Anorexigen Modulated by Sex Steroids, Macronutrients and Daily Rhythm

    PubMed Central

    Sundarrajan, Lakshminarasimhan; Blanco, Ayelén Melisa; Bertucci, Juan Ignacio; Ramesh, Naresh; Canosa, Luis Fabián; Unniappan, Suraj

    2016-01-01

    Nesfatin-1 is an 82 amino acid anorexigen encoded in a secreted precursor nucleobindin-2 (NUCB2). NUCB2 was named so due to its high sequence similarity with nucleobindin-1 (NUCB1). It was recently reported that NUCB1 encodes an insulinotropic nesfatin-1-like peptide (NLP) in mice. Here, we aimed to characterize NLP in fish. RT-qPCR showed NUCB1 expression in both central and peripheral tissues. Western blot analysis and/or fluorescence immunohistochemistry determined NUCB1/NLP in the brain, pituitary, testis, ovary and gut of goldfish. NUCB1 mRNA expression in goldfish pituitary and gut displayed a daily rhythmic pattern of expression. Pituitary NUCB1 mRNA expression was downregulated by estradiol, while testosterone upregulated its expression in female goldfish brain. High carbohydrate and fat suppressed NUCB1 mRNA expression in the brain and gut. Intraperitoneal injection of synthetic rat NLP and goldfish NLP at 10 and 100 ng/g body weight doses caused potent inhibition of food intake in goldfish. NLP injection also downregulated the expression of mRNAs encoding orexigens, preproghrelin and orexin-A, and upregulated anorexigen cocaine and amphetamine regulated transcript mRNA in goldfish brain. Collectively, these results provide the first set of results supporting the anorectic action of NLP, and the regulation of tissue specific expression of goldfish NUCB1. PMID:27329836

  10. Using the Natural Language Paradigm (NLP) to increase vocalizations of older adults with cognitive impairments.

    PubMed

    Leblanc, Linda A; Geiger, Kaneen B; Sautter, Rachael A; Sidener, Tina M

    2007-01-01

    The Natural Language Paradigm (NLP) has proven effective in increasing spontaneous verbalizations for children with autism. This study investigated the use of NLP with older adults with cognitive impairments served at a leisure-based adult day program for seniors. Three individuals with limited spontaneous use of functional language participated in a multiple baseline design across participants. Data were collected on appropriate and inappropriate vocalizations with appropriate vocalizations coded as prompted or unprompted during baseline and treatment sessions. All participants experienced increases in appropriate speech during NLP with variable response patterns. Additionally, the two participants with substantial inappropriate vocalizations showed decreases in inappropriate speech. Implications for intervention in day programs are discussed.

  11. Combining Machine Learning and Natural Language Processing to Assess Literary Text Comprehension

    ERIC Educational Resources Information Center

    Balyan, Renu; McCarthy, Kathryn S.; McNamara, Danielle S.

    2017-01-01

    This study examined how machine learning and natural language processing (NLP) techniques can be leveraged to assess the interpretive behavior that is required for successful literary text comprehension. We compared the accuracy of seven different machine learning classification algorithms in predicting human ratings of student essays about…

  12. Net-centric ACT-R-Based Cognitive Architecture with DEVS Unified Process

    DTIC Science & Technology

    2011-04-01

    effort has been spent in analyzing various forms of requirement specifications, viz, state-based, Natural Language based, UML-based, Rule-based, BPMN...requirement specifications in one of the chosen formats such as BPMN, DoDAF, Natural Language Processing (NLP) based, UML-based, DSL or simply

  13. Phosphorylation of Nlp by Plk1 negatively regulates its dynein-dynactin-dependent targeting to the centrosome.

    PubMed

    Casenghi, Martina; Barr, Francis A; Nigg, Erich A

    2005-11-01

    When cells enter mitosis the microtubule (MT) network undergoes a profound rearrangement, in part due to alterations in the MT nucleating and anchoring properties of the centrosome. Ninein and the ninein-like protein (Nlp) are centrosomal proteins involved in MT organisation in interphase cells. We show that the overexpression of these two proteins induces the fragmentation of the Golgi, and causes lysosomes to disperse toward the cell periphery. The ability of Nlp and ninein to perturb the cytoplasmic distribution of these organelles depends on their ability to interact with the dynein-dynactin motor complex. Our data also indicate that dynactin is required for the targeting of Nlp and ninein to the centrosome. Furthermore, phosphorylation of Nlp by the polo-like kinase 1 (Plk1) negatively regulates its association with dynactin. These findings uncover a mechanism through which Plk1 helps to coordinate changes in MT organisation with cell cycle progression, by controlling the dynein-dynactin-dependent transport of centrosomal proteins.

  14. Neuropeptide secreted from a pacemaker activates neurons to control a rhythmic behavior.

    PubMed

    Wang, Han; Girskis, Kelly; Janssen, Tom; Chan, Jason P; Dasgupta, Krishnakali; Knowles, James A; Schoofs, Liliane; Sieburth, Derek

    2013-05-06

    Rhythmic behaviors are driven by endogenous biological clocks in pacemakers, which must reliably transmit timing information to target tissues that execute rhythmic outputs. During the defecation motor program in C. elegans, calcium oscillations in the pacemaker (intestine), which occur about every 50 s, trigger rhythmic enteric muscle contractions through downstream GABAergic neurons that innervate enteric muscles. However, the identity of the timing signal released by the pacemaker and the mechanism underlying the delivery of timing information to the GABAergic neurons are unknown. Here, we show that a neuropeptide-like protein (NLP-40) released by the pacemaker triggers a single rapid calcium transient in the GABAergic neurons during each defecation cycle. We find that mutants lacking nlp-40 have normal pacemaker function, but lack enteric muscle contractions. NLP-40 undergoes calcium-dependent release that is mediated by the calcium sensor, SNT-2/synaptotagmin. We identify AEX-2, the G-protein-coupled receptor on the GABAergic neurons, as the receptor for NLP-40. Functional calcium imaging reveals that NLP-40 and AEX-2/GPCR are both necessary for rhythmic activation of these neurons. Furthermore, acute application of synthetic NLP-40-derived peptide depolarizes the GABAergic neurons in vivo. Our results show that NLP-40 carries the timing information from the pacemaker via calcium-dependent release and delivers it to the GABAergic neurons by instructing their activation. Thus, we propose that rhythmic release of neuropeptides can deliver temporal information from pacemakers to downstream neurons to execute rhythmic behaviors. Copyright © 2013 Elsevier Ltd. All rights reserved.

  15. Incorporating domain knowledge in chemical and biomedical named entity recognition with word representations.

    PubMed

    Munkhdalai, Tsendsuren; Li, Meijing; Batsuren, Khuyagbaatar; Park, Hyeon Ah; Choi, Nak Hyeon; Ryu, Keun Ho

    2015-01-01

    Chemical and biomedical Named Entity Recognition (NER) is an essential prerequisite task before effective text mining can begin for biochemical-text data. Exploiting unlabeled text data to leverage system performance has been an active and challenging research topic in text mining due to the recent growth in the amount of biomedical literature. We present a semi-supervised learning method that efficiently exploits unlabeled data in order to incorporate domain knowledge into a named entity recognition model and to leverage system performance. The proposed method includes Natural Language Processing (NLP) tasks for text preprocessing, learning word representation features from a large amount of text data for feature extraction, and conditional random fields for token classification. Other than the free text in the domain, the proposed method does not rely on any lexicon nor any dictionary in order to keep the system applicable to other NER tasks in bio-text data. We extended BANNER, a biomedical NER system, with the proposed method. This yields an integrated system that can be applied to chemical and drug NER or biomedical NER. We call our branch of the BANNER system BANNER-CHEMDNER, which is scalable over millions of documents, processing about 530 documents per minute, is configurable via XML, and can be plugged into other systems by using the BANNER Unstructured Information Management Architecture (UIMA) interface. BANNER-CHEMDNER achieved an 85.68% and an 86.47% F-measure on the testing sets of CHEMDNER Chemical Entity Mention (CEM) and Chemical Document Indexing (CDI) subtasks, respectively, and achieved an 87.04% F-measure on the official testing set of the BioCreative II gene mention task, showing remarkable performance in both chemical and biomedical NER. BANNER-CHEMDNER system is available at: https://bitbucket.org/tsendeemts/banner-chemdner.

  16. Behind the scenes: A medical natural language processing project.

    PubMed

    Wu, Joy T; Dernoncourt, Franck; Gehrmann, Sebastian; Tyler, Patrick D; Moseley, Edward T; Carlson, Eric T; Grant, David W; Li, Yeran; Welt, Jonathan; Celi, Leo Anthony

    2018-04-01

    Advancement of Artificial Intelligence (AI) capabilities in medicine can help address many pressing problems in healthcare. However, AI research endeavors in healthcare may not be clinically relevant, may have unrealistic expectations, or may not be explicit enough about their limitations. A diverse and well-functioning multidisciplinary team (MDT) can help identify appropriate and achievable AI research agendas in healthcare, and advance medical AI technologies by developing AI algorithms as well as addressing the shortage of appropriately labeled datasets for machine learning. In this paper, our team of engineers, clinicians and machine learning experts share their experience and lessons learned from their two-year-long collaboration on a natural language processing (NLP) research project. We highlight specific challenges encountered in cross-disciplinary teamwork, dataset creation for NLP research, and expectation setting for current medical AI technologies. Copyright © 2017. Published by Elsevier B.V.

  17. From Web Directories to Ontologies: Natural Language Processing Challenges

    NASA Astrophysics Data System (ADS)

    Zaihrayeu, Ilya; Sun, Lei; Giunchiglia, Fausto; Pan, Wei; Ju, Qi; Chi, Mingmin; Huang, Xuanjing

    Hierarchical classifications are used pervasively by humans as a means to organize their data and knowledge about the world. One of their main advantages is that natural language labels, used to describe their contents, are easily understood by human users. However, at the same time, this is also one of their main disadvantages, as these same labels are ambiguous and very hard for software agents to reason about. This fact creates an insuperable hindrance to classifications being embedded in the Semantic Web infrastructure. This paper presents an approach to converting classifications into lightweight ontologies, and it makes the following contributions: (i) it identifies the main NLP problems related to the conversion process and shows how they are different from the classical problems of NLP; (ii) it proposes heuristic solutions to these problems, which are especially effective in this domain; and (iii) it evaluates the proposed solutions by testing them on DMoz data.

  18. Cell-free production of a functional oligomeric form of a Chlamydia major outer-membrane protein (MOMP) for vaccine development

    DOE PAGES

    He, Wei; Felderman, Martina; Evans, Angela C.; ...

    2017-07-24

    Chlamydia is a prevalent sexually transmitted disease that infects more than 100 million people worldwide. Although most individuals infected with Chlamydia trachomatis are initially asymptomatic, symptoms can arise if left undiagnosed. Long-term infection can result in debilitating conditions such as pelvic inflammatory disease, infertility, and blindness. Chlamydia infection, therefore, constitutes a significant public health threat, underscoring the need for a Chlamydia-specific vaccine. Chlamydia strains express a major outer-membrane protein (MOMP) that has been shown to be an effective vaccine antigen. However, approaches to produce a functional recombinant MOMP protein for vaccine development are limited by poor solubility, low yield, and protein misfolding. For this study, we used an Escherichia coli-based cell-free system to express a MOMP protein from the mouse-specific species Chlamydia muridarum (MoPn-MOMP or mMOMP). The codon-optimized mMOMP gene was co-translated with Δ49apolipoprotein A1 (Δ49ApoA1), a truncated version of mouse ApoA1 in which the N-terminal 49 amino acids were removed. This co-translation process produced mMOMP supported within a telodendrimer nanolipoprotein particle (mMOMP–tNLP). The cell-free expressed mMOMP–tNLPs contain mMOMP multimers similar to the native MOMP protein. This cell-free process produced on average 1.5 mg of purified, water-soluble mMOMP–tNLP complex in a 1-ml cell-free reaction. The mMOMP–tNLP particle also accommodated the co-localization of CpG oligodeoxynucleotide 1826, a single-stranded synthetic DNA adjuvant, eliciting an enhanced humoral immune response in vaccinated mice. Using our mMOMP–tNLP formulation, we demonstrate a unique approach to solubilizing and administering membrane-bound proteins for future vaccine development. This method can be applied to other previously difficult-to-obtain antigens while maintaining full functionality and immunogenicity.

  19. Cell-free production of a functional oligomeric form of a Chlamydia major outer-membrane protein (MOMP) for vaccine development

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    He, Wei; Felderman, Martina; Evans, Angela C.

    Chlamydia is a prevalent sexually transmitted disease that infects more than 100 million people worldwide. Although most individuals infected with Chlamydia trachomatis are initially asymptomatic, symptoms can arise if left undiagnosed. Long-term infection can result in debilitating conditions such as pelvic inflammatory disease, infertility, and blindness. Chlamydia infection, therefore, constitutes a significant public health threat, underscoring the need for a Chlamydia-specific vaccine. Chlamydia strains express a major outer-membrane protein (MOMP) that has been shown to be an effective vaccine antigen. However, approaches to produce a functional recombinant MOMP protein for vaccine development are limited by poor solubility, low yield, and protein misfolding. For this study, we used an Escherichia coli-based cell-free system to express a MOMP protein from the mouse-specific species Chlamydia muridarum (MoPn-MOMP or mMOMP). The codon-optimized mMOMP gene was co-translated with Δ49apolipoprotein A1 (Δ49ApoA1), a truncated version of mouse ApoA1 in which the N-terminal 49 amino acids were removed. This co-translation process produced mMOMP supported within a telodendrimer nanolipoprotein particle (mMOMP–tNLP). The cell-free expressed mMOMP–tNLPs contain mMOMP multimers similar to the native MOMP protein. This cell-free process produced on average 1.5 mg of purified, water-soluble mMOMP–tNLP complex in a 1-ml cell-free reaction. The mMOMP–tNLP particle also accommodated the co-localization of CpG oligodeoxynucleotide 1826, a single-stranded synthetic DNA adjuvant, eliciting an enhanced humoral immune response in vaccinated mice. Using our mMOMP–tNLP formulation, we demonstrate a unique approach to solubilizing and administering membrane-bound proteins for future vaccine development. This method can be applied to other previously difficult-to-obtain antigens while maintaining full functionality and immunogenicity.

  20. Combating Weapons of Mass Destruction: Models, Complexity, and Algorithms in Complex Dynamic and Evolving Networks

    DTIC Science & Technology

    2015-11-01

    [Figure residue: plots of NMI versus µ for N = 1000 and N = 5000, comparing SCD, SCD-NLP, Blondel, Oslom, and Infomap; networks with minC, maxC unconstrained.]

  1. Automated encoding of clinical documents based on natural language processing.

    PubMed

    Friedman, Carol; Shagina, Lyudmila; Lussier, Yves; Hripcsak, George

    2004-01-01

    The aim of this study was to develop a method based on natural language processing (NLP) that automatically maps an entire clinical document to codes with modifiers and to quantitatively evaluate the method. An existing NLP system, MedLEE, was adapted to automatically generate codes. The method involves matching of structured output generated by MedLEE consisting of findings and modifiers to obtain the most specific code. Recall and precision applied to Unified Medical Language System (UMLS) coding were evaluated in two separate studies. Recall was measured using a test set of 150 randomly selected sentences, which were processed using MedLEE. Results were compared with a reference standard determined manually by seven experts. Precision was measured using a second test set of 150 randomly selected sentences from which UMLS codes were automatically generated by the method and then validated by experts. Recall of the system for UMLS coding of all terms was .77 (95% CI .72-.81), and for coding terms that had corresponding UMLS codes recall was .83 (.79-.87). Recall of the system for extracting all terms was .84 (.81-.88). Recall of the experts ranged from .69 to .91 for extracting terms. The precision of the system was .89 (.87-.91), and precision of the experts ranged from .61 to .91. Extraction of relevant clinical information and UMLS coding were accomplished using a method based on NLP. The method appeared to be comparable to or better than six experts. The advantage of the method is that it maps text to codes along with other related information, rendering the coded output suitable for effective retrieval.
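
    The recall and precision figures in this record can be illustrated with a small sketch. It assumes, purely for illustration, that the system output and the reference standard are each available as a set of code identifiers; the function name and the sample codes are hypothetical and do not come from MedLEE or the study.

```python
def recall_precision(system_codes, reference_codes):
    """Return (recall, precision) of system codes against a reference standard.

    Both arguments are iterables of code identifiers (hypothetical here).
    """
    system = set(system_codes)
    reference = set(reference_codes)
    true_positives = len(system & reference)
    # Recall: share of reference codes that the system found.
    recall = true_positives / len(reference) if reference else 0.0
    # Precision: share of system codes that are in the reference.
    precision = true_positives / len(system) if system else 0.0
    return recall, precision


if __name__ == "__main__":
    system = {"C0001", "C0002", "C0003"}
    reference = {"C0002", "C0003", "C0004", "C0005"}
    print(recall_precision(system, reference))
```

    In the study itself the two measures were estimated on separate test sets, so a real evaluation would call such a function once per test set rather than on a single pair of sets.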

  2. Data-Driven Approaches for Paraphrasing across Language Variations

    ERIC Educational Resources Information Center

    Xu, Wei

    2014-01-01

    Our language changes very rapidly, accompanying political, social and cultural trends, as well as the evolution of science and technology. The Internet, especially the social media, has accelerated this process of change. This poses a severe challenge for both human beings and natural language processing (NLP) systems, which usually only model a…

  3. Corpora Processing and Computational Scaffolding for a Web-Based English Learning Environment: The CANDLE Project

    ERIC Educational Resources Information Center

    Liou, Hsien-Chin; Chang, Jason S; Chen, Hao-Jan; Lin, Chih-Cheng; Liaw, Meei-Ling; Gao, Zhao-Ming; Jang, Jyh-Shing Roger; Yeh, Yuli; Chuang, Thomas C.; You, Geeng-Neng

    2006-01-01

    This paper describes the development of an innovative web-based environment for English language learning with advanced data-driven and statistical approaches. The project uses various corpora, including a Chinese-English parallel corpus ("Sinorama") and various natural language processing (NLP) tools to construct effective English…

  4. Expression of an oxalate decarboxylase impairs the necrotic effect induced by Nep1-like protein (NLP) of Moniliophthora perniciosa in transgenic tobacco.

    PubMed

    da Silva, Leonardo F; Dias, Cristiano V; Cidade, Luciana C; Mendes, Juliano S; Pirovani, Carlos P; Alvim, Fátima C; Pereira, Gonçalo A G; Aragão, Francisco J L; Cascardo, Júlio C M; Costa, Marcio G C

    2011-07-01

    Oxalic acid (OA) and Nep1-like proteins (NLP) are recognized as elicitors of programmed cell death (PCD) in plants, which is crucial for the pathogenic success of necrotrophic plant pathogens and involves reactive oxygen species (ROS). To determine the importance of oxalate as a source of ROS for OA- and NLP-induced cell death, a full-length cDNA coding for an oxalate decarboxylase (FvOXDC) from the basidiomycete Flammulina velutipes, which converts OA into CO(2) and formate, was overexpressed in tobacco plants. The transgenic plants contained less OA and more formic acid compared with the control plants and showed enhanced resistance to cell death induced by exogenous OA and MpNEP2, an NLP of the hemibiotrophic fungus Moniliophthora perniciosa. This resistance was correlated with the inhibition of ROS formation in the transgenic plants inoculated with OA, MpNEP2, or a combination of both PCD elicitors. Taken together, these results have established a pivotal function for oxalate as a source of ROS required for the PCD-inducing activity of OA and NLP. The results also indicate that FvOXDC represents a potentially novel source of resistance against OA- and NLP-producing pathogens such as M. perniciosa, the causal agent of witches' broom disease of cacao (Theobroma cacao L.).

  5. Integer-ambiguity resolution in astronomy and geodesy

    NASA Astrophysics Data System (ADS)

    Lannes, A.; Prieur, J.-L.

    2014-02-01

    Recent theoretical developments in astronomical aperture synthesis have revealed the existence of integer-ambiguity problems. Those problems, which appear in the self-calibration procedures of radio imaging, have been shown to be similar to the nearest-lattice point (NLP) problems encountered in high-precision geodetic positioning and in global navigation satellite systems. In this paper we analyse the theoretical aspects of the matter and propose new methods for solving those NLP problems. The related optimization aspects concern both the preconditioning stage, and the discrete-search stage in which the integer ambiguities are finally fixed. Our algorithms, which are described in an explicit manner, can easily be implemented. They lead to substantial gains in the processing time of both stages. Their efficiency was shown via intensive numerical tests.

  6. Part-of-speech tagging for clinical text: wall or bridge between institutions?

    PubMed Central

    Fan, Jung-wei; Prasad, Rashmi; Yabut, Rommel M.; Loomis, Richard M.; Zisook, Daniel S.; Mattison, John E.; Huang, Yang

    2011-01-01

    Part-of-speech (POS) tagging is a fundamental step required by various NLP systems. The training of a POS tagger relies on sufficient quality annotations. However, the annotation process is both knowledge-intensive and time-consuming in the clinical domain. A promising solution appears to be for institutions to share their annotation efforts, and yet there is little research on associated issues. We performed experiments to understand how POS tagging performance would be affected by using a pre-trained tagger versus raw training data across different institutions. We manually annotated a set of clinical notes at Kaiser Permanente Southern California (KPSC) and a set from the University of Pittsburgh Medical Center (UPMC), and trained/tested POS taggers with intra- and inter-institution settings. The cTAKES POS tagger was also included in the comparison to represent a tagger partially trained from the notes of a third institution, Mayo Clinic at Rochester. Intra-institution 5-fold cross-validation estimated an accuracy of 0.953 and 0.945 on the KPSC and UPMC notes respectively. Trained purely on KPSC notes, the accuracy was 0.897 when tested on UPMC notes. Trained purely on UPMC notes, the accuracy was 0.904 when tested on KPSC notes. Applying the cTAKES tagger pre-trained with Mayo Clinic’s notes, the accuracy was 0.881 on KPSC notes and 0.883 on UPMC notes. After adding UPMC annotations to KPSC training data, the average accuracy on tested KPSC notes increased to 0.965. After adding KPSC annotations to UPMC training data, the average accuracy on tested UPMC notes increased to 0.953. The results indicated: first, the performance of pre-trained POS taggers dropped about 5% when applied directly across the institutions; second, mixing annotations from another institution following the same guideline increased tagging accuracy by about 1%. Our findings suggest that institutions can benefit more from sharing raw annotations but less from sharing pre-trained models for the POS tagging task. We believe the study could also provide general insights on cross-institution data sharing for other types of NLP tasks. PMID:22195091
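
    The intra-institution evaluation described in this record relies on 5-fold cross-validation and token-level accuracy. The sketch below shows one simple way to build the folds and score a tagger. It is an assumption-laden illustration: the helper names are hypothetical, the round-robin fold assignment is a choice made here, and the abstract does not say how the authors partitioned their notes.

```python
def k_fold_splits(items, k=5):
    """Yield (train, test) lists for k-fold cross-validation.

    Items are assigned to folds round-robin; a real experiment would
    normally shuffle the annotated notes first.
    """
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


def tagging_accuracy(predicted_tags, gold_tags):
    """Fraction of tokens whose predicted POS tag equals the gold tag."""
    if not gold_tags or len(predicted_tags) != len(gold_tags):
        raise ValueError("tag sequences must be non-empty and equally long")
    correct = sum(p == g for p, g in zip(predicted_tags, gold_tags))
    return correct / len(gold_tags)


if __name__ == "__main__":
    notes = list(range(10))  # stand-ins for annotated notes
    for train, test in k_fold_splits(notes, k=5):
        print(len(train), len(test))
    print(tagging_accuracy(["NN", "VB", "DT", "NN"], ["NN", "VB", "DT", "JJ"]))
```

    The cross-institution settings in the abstract correspond to skipping the fold loop and scoring a tagger trained on one institution's notes directly against the other institution's gold tags.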

  7. Part-of-speech tagging for clinical text: wall or bridge between institutions?

    PubMed

    Fan, Jung-wei; Prasad, Rashmi; Yabut, Rommel M; Loomis, Richard M; Zisook, Daniel S; Mattison, John E; Huang, Yang

    2011-01-01

    Part-of-speech (POS) tagging is a fundamental step required by various NLP systems. The training of a POS tagger relies on sufficient quality annotations. However, the annotation process is both knowledge-intensive and time-consuming in the clinical domain. A promising solution appears to be for institutions to share their annotation efforts, and yet there is little research on associated issues. We performed experiments to understand how POS tagging performance would be affected by using a pre-trained tagger versus raw training data across different institutions. We manually annotated a set of clinical notes at Kaiser Permanente Southern California (KPSC) and a set from the University of Pittsburgh Medical Center (UPMC), and trained/tested POS taggers with intra- and inter-institution settings. The cTAKES POS tagger was also included in the comparison to represent a tagger partially trained from the notes of a third institution, Mayo Clinic at Rochester. Intra-institution 5-fold cross-validation estimated an accuracy of 0.953 and 0.945 on the KPSC and UPMC notes respectively. Trained purely on KPSC notes, the accuracy was 0.897 when tested on UPMC notes. Trained purely on UPMC notes, the accuracy was 0.904 when tested on KPSC notes. Applying the cTAKES tagger pre-trained with Mayo Clinic's notes, the accuracy was 0.881 on KPSC notes and 0.883 on UPMC notes. After adding UPMC annotations to KPSC training data, the average accuracy on tested KPSC notes increased to 0.965. After adding KPSC annotations to UPMC training data, the average accuracy on tested UPMC notes increased to 0.953. The results indicated: first, the performance of pre-trained POS taggers dropped about 5% when applied directly across the institutions; second, mixing annotations from another institution following the same guideline increased tagging accuracy by about 1%. Our findings suggest that institutions can benefit more from sharing raw annotations but less from sharing pre-trained models for the POS tagging task. We believe the study could also provide general insights on cross-institution data sharing for other types of NLP tasks.

  8. A factorization approach to next-to-leading-power threshold logarithms

    NASA Astrophysics Data System (ADS)

    Bonocore, D.; Laenen, E.; Magnea, L.; Melville, S.; Vernazza, L.; White, C. D.

    2015-06-01

    Threshold logarithms become dominant in partonic cross sections when the selected final state forces gluon radiation to be soft or collinear. Such radiation factorizes at the level of scattering amplitudes, and this leads to the resummation of threshold logarithms which appear at leading power in the threshold variable. In this paper, we consider the extension of this factorization to include effects suppressed by a single power of the threshold variable. Building upon the Low-Burnett-Kroll-Del Duca (LBKD) theorem, we propose a decomposition of radiative amplitudes into universal building blocks, which contain all effects ultimately responsible for next-to-leading-power (NLP) threshold logarithms in hadronic cross sections for electroweak annihilation processes. In particular, we provide a NLO evaluation of the radiative jet function, responsible for the interference of next-to-soft and collinear effects in these cross sections. As a test, using our expression for the amplitude, we reproduce all abelian-like NLP threshold logarithms in the NNLO Drell-Yan cross section, including the interplay of real and virtual emissions. Our results are a significant step towards developing a generally applicable resummation formalism for NLP threshold effects, and illustrate the breakdown of next-to-soft theorems for gauge theory amplitudes at loop level.

  9. Collaborative human-machine analysis to disambiguate entities in unstructured text and structured datasets

    NASA Astrophysics Data System (ADS)

    Davenport, Jack H.

    2016-05-01

    Intelligence analysts demand rapid information fusion capabilities to develop and maintain accurate situational awareness and understanding of dynamic enemy threats in asymmetric military operations. The ability to extract relationships between people, groups, and locations from a variety of text datasets is critical to proactive decision making. The derived network of entities must be automatically created and presented to analysts to assist in decision making. DECISIVE ANALYTICS Corporation (DAC) provides capabilities to automatically extract entities, relationships between entities, semantic concepts about entities, and network models of entities from text and multi-source datasets. DAC's Natural Language Processing (NLP) Entity Analytics model entities as complex systems of attributes and interrelationships which are extracted from unstructured text via NLP algorithms. The extracted entities are automatically disambiguated via machine learning algorithms, and resolution recommendations are presented to the analyst for validation; the analyst's expertise is leveraged in this hybrid human/computer collaborative model. Military capability is enhanced by these NLP Entity Analytics because analysts can now create/update an entity profile with intelligence automatically extracted from unstructured text, thereby fusing entity knowledge from structured and unstructured data sources. Operational and sustainment costs are reduced since analysts do not have to manually tag and resolve entities.

  10. A homogeneous superconducting magnet design using a hybrid optimization algorithm

    NASA Astrophysics Data System (ADS)

    Ni, Zhipeng; Wang, Qiuliang; Liu, Feng; Yan, Luguang

    2013-12-01

    This paper employs a hybrid optimization algorithm with a combination of linear programming (LP) and nonlinear programming (NLP) to design the highly homogeneous superconducting magnets for magnetic resonance imaging (MRI). The whole work is divided into two stages. The first LP stage provides a global optimal current map with several non-zero current clusters, and the mathematical model for the LP was updated by taking into account the maximum axial and radial magnetic field strength limitations. In the second NLP stage, the non-zero current clusters were discretized into practical solenoids. The superconducting conductor consumption was set as the objective function both in the LP and NLP stages to minimize the construction cost. In addition, the peak-peak homogeneity over the volume of imaging (VOI), the scope of 5 Gauss fringe field, and maximum magnetic field strength within superconducting coils were set as constraints. The detailed design process for a dedicated 3.0 T animal MRI scanner was presented. The homogeneous magnet produces a magnetic field quality of 6.0 ppm peak-peak homogeneity over a 16 cm by 18 cm elliptical VOI, and the 5 Gauss fringe field was limited within a 1.5 m by 2.0 m elliptical region.

  11. Weight maintenance through behaviour modification with a cooking course or neurolinguistic programming.

    PubMed

    Sørensen, Lone Brinkmann; Greve, Tine; Kreutzer, Martin; Pedersen, Ulla; Nielsen, Claus Meyer; Toubro, Søren; Astrup, Arne

    2011-01-01

    We compared the effect on weight regain of behaviour modification consisting of either a gourmet cooking course or neurolinguistic programming (NLP) therapy. Fifty-six overweight and obese subjects participated. The first step was a 12-week weight loss program. Participants achieving at least 8% weight loss were randomized to five months of either NLP therapy or a course in gourmet cooking. Follow-up occurred after two and three years. Forty-nine participants lost at least 8% of their initial body weight and were randomized to the next step. The NLP group lost an additional 1.8 kg and the cooking group lost 0.2 kg during the five months of weight maintenance (NS). The dropout rate in the cooking group was 4%, compared with 26% in the NLP group (p=0.04). There was no difference in weight maintenance after two and three years of follow-up. In conclusion, weight loss in overweight and obese participants was maintained equally efficiently with a healthy cooking course or NLP therapy, but the dropout rate was lower during the active cooking treatment.

  12. Comparison of linear and nonlinear programming approaches for "worst case dose" and "minmax" robust optimization of intensity-modulated proton therapy dose distributions.

    PubMed

    Zaghian, Maryam; Cao, Wenhua; Liu, Wei; Kardar, Laleh; Randeniya, Sharmalee; Mohan, Radhe; Lim, Gino

    2017-03-01

    Robust optimization of intensity-modulated proton therapy (IMPT) takes uncertainties into account during spot weight optimization and leads to dose distributions that are resilient to uncertainties. Previous studies demonstrated benefits of linear programming (LP) for IMPT in terms of delivery efficiency by considerably reducing the number of spots required for the same quality of plans. However, a reduction in the number of spots may lead to loss of robustness. The purpose of this study was to evaluate and compare the performance in terms of plan quality and robustness of two robust optimization approaches using LP and nonlinear programming (NLP) models. The so-called "worst case dose" and "minmax" robust optimization approaches and conventional planning target volume (PTV)-based optimization approach were applied to designing IMPT plans for five patients: two with prostate cancer, one with skull-based cancer, and two with head and neck cancer. For each approach, both LP and NLP models were used. Thus, for each case, six sets of IMPT plans were generated and assessed: LP-PTV-based, NLP-PTV-based, LP-worst case dose, NLP-worst case dose, LP-minmax, and NLP-minmax. The four robust optimization methods behaved differently from patient to patient, and no method emerged as superior to the others in terms of nominal plan quality and robustness against uncertainties. The plans generated using LP-based robust optimization were more robust regarding patient setup and range uncertainties than were those generated using NLP-based robust optimization for the prostate cancer patients. However, the robustness of plans generated using NLP-based methods was superior for the skull-based and head and neck cancer patients. Overall, LP-based methods were suitable for the less challenging cancer cases in which all uncertainty scenarios were able to satisfy tight dose constraints, while NLP performed better in more difficult cases in which most uncertainty scenarios struggled to meet tight dose limits. For robust optimization, the worst case dose approach was less sensitive to uncertainties than was the minmax approach for the prostate and skull-based cancer patients, whereas the minmax approach was superior for the head and neck cancer patients. The robustness of the IMPT plans was remarkably better after robust optimization than after PTV-based optimization, and the NLP-PTV-based optimization outperformed the LP-PTV-based optimization regarding robustness of clinical target volume coverage. In addition, plans generated using LP-based methods had notably fewer scanning spots than did those generated using NLP-based methods. © 2017 The Authors. Journal of Applied Clinical Medical Physics published by Wiley Periodicals, Inc. on behalf of American Association of Physicists in Medicine.

  13. Significant lexical relationships

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pedersen, T.; Kayaalp, M.; Bruce, R.

    Statistical NLP inevitably deals with a large number of rare events. As a consequence, NLP data often violates the assumptions implicit in traditional statistical procedures such as significance testing. We describe a significance test, an exact conditional test, that is appropriate for NLP data and can be performed using freely available software. We apply this test to the study of lexical relationships and demonstrate that the results obtained using this test are both theoretically more reliable and different from the results obtained using previously applied tests.
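
    The abstract does not name the exact conditional test it describes. As a hedged illustration, the sketch below assumes Fisher's exact test on a 2x2 contingency table of bigram counts, which is one standard exact conditional test for rare events; the function name and the sample counts are hypothetical, and only the Python standard library is used.

```python
from math import comb


def fisher_right_tail(a, b, c, d):
    """One-sided (right-tail) Fisher exact p-value for a 2x2 table.

    Table layout (hypothetical bigram counts for words w1, w2):
        a = count(w1 followed by w2)   b = count(w1 followed by other)
        c = count(other followed by w2) d = count(other followed by other)
    The p-value is the probability, with all margins fixed, of seeing
    a joint count of at least `a`.
    """
    n = a + b + c + d
    row1 = a + b
    col1 = a + c
    denominator = comb(n, col1)
    max_a = min(row1, col1)
    # Sum hypergeometric probabilities for joint counts a, a+1, ..., max_a.
    numerator = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, max_a + 1)
    )
    return numerator / denominator


if __name__ == "__main__":
    print(fisher_right_tail(3, 1, 1, 3))
```

    Because the p-value is computed from exact hypergeometric probabilities, it needs no large-sample approximation, which is the property the abstract highlights for sparse lexical data.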

  14. BIOSSES: a semantic sentence similarity estimation system for the biomedical domain.

    PubMed

    Sogancioglu, Gizem; Öztürk, Hakime; Özgür, Arzucan

    2017-07-15

    The amount of information available in textual format is rapidly increasing in the biomedical domain. Therefore, natural language processing (NLP) applications are becoming increasingly important to facilitate the retrieval and analysis of these data. Computing the semantic similarity between sentences is an important component in many NLP tasks including text retrieval and summarization. A number of approaches have been proposed for semantic sentence similarity estimation for generic English. However, our experiments showed that such approaches do not effectively cover biomedical knowledge and produce poor results for biomedical text. We propose several approaches for sentence-level semantic similarity computation in the biomedical domain, including string similarity measures and measures based on the distributed vector representations of sentences learned in an unsupervised manner from a large biomedical corpus. In addition, ontology-based approaches are presented that utilize general and domain-specific ontologies. Finally, a supervised regression-based model is developed that effectively combines the different similarity computation metrics. A benchmark data set consisting of 100 sentence pairs from the biomedical literature is manually annotated by five human experts and used for evaluating the proposed methods. The experiments showed that the supervised semantic sentence similarity computation approach obtained the best performance (0.836 correlation with gold standard human annotations) and improved over the state-of-the-art domain-independent systems by up to 42.6% in terms of the Pearson correlation metric. A web-based system for biomedical semantic sentence similarity computation, the source code, and the annotated benchmark data set are available at: http://tabilab.cmpe.boun.edu.tr/BIOSSES/. Contact: gizemsogancioglu@gmail.com or arzucan.ozgur@boun.edu.tr. © The Author 2017. Published by Oxford University Press. All rights reserved.
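
    To make the evaluation setup in this record concrete, the sketch below pairs a very simple string similarity measure (token-level Jaccard overlap) with the Pearson correlation used to compare system scores against human annotations. Both functions are illustrative assumptions: the abstract does not state which string measures BIOSSES uses, and the names and sample sentences here are hypothetical.

```python
from math import sqrt


def jaccard_similarity(sentence_a, sentence_b):
    """Token-level Jaccard overlap between two sentences (lowercased)."""
    tokens_a = set(sentence_a.lower().split())
    tokens_b = set(sentence_b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def pearson_correlation(xs, ys):
    """Pearson correlation between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / sqrt(spread_x * spread_y)


if __name__ == "__main__":
    print(jaccard_similarity("the gene regulates growth", "the gene inhibits growth"))
    print(pearson_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

    In a benchmark of this kind, the first list passed to the correlation function would hold system similarity scores for the sentence pairs and the second the averaged human ratings.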

  15. A reduced successive quadratic programming strategy for errors-in-variables estimation.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tjoa, I.-B.; Biegler, L. T.; Carnegie-Mellon Univ.

    Parameter estimation problems in process engineering represent a special class of nonlinear optimization problems, because the maximum likelihood structure of the objective function can be exploited. Within this class, the errors in variables method (EVM) is particularly interesting. Here we seek a weighted least-squares fit to the measurements with an underdetermined process model. Thus, both the number of variables and degrees of freedom available for optimization increase linearly with the number of data sets. Large optimization problems of this type can be particularly challenging and expensive to solve because, for general-purpose nonlinear programming (NLP) algorithms, the computational effort increases at least quadratically with problem size. In this study we develop a tailored NLP strategy for EVM problems. The method is based on a reduced Hessian approach to successive quadratic programming (SQP), but with the decomposition performed separately for each data set. This leads to the elimination of all variables but the model parameters, which are determined by a QP coordination step. In this way the computational effort remains linear in the number of data sets. Moreover, unlike previous approaches to the EVM problem, global and superlinear properties of the SQP algorithm apply naturally. Also, the method directly incorporates inequality constraints on the model parameters (although not on the fitted variables). This approach is demonstrated on five example problems with up to 102 degrees of freedom. Compared to general-purpose NLP algorithms, large improvements in computational performance are observed.

  16. Interdisciplinary Research at the Intersection of CALL, NLP, and SLA: Methodological Implications from an Input Enhancement Project

    ERIC Educational Resources Information Center

    Ziegler, Nicole; Meurers, Detmar; Rebuschat, Patrick; Ruiz, Simón; Moreno-Vega, José L.; Chinkina, Maria; Li, Wenjing; Grey, Sarah

    2017-01-01

    Despite the promise of research conducted at the intersection of computer-assisted language learning (CALL), natural language processing, and second language acquisition, few studies have explored the potential benefits of using intelligent CALL systems to deepen our understanding of the process and products of second language (L2) learning. The…

  17. Combining Natural Language Processing and Statistical Text Mining: A Study of Specialized versus Common Languages

    ERIC Educational Resources Information Center

    Jarman, Jay

    2011-01-01

    This dissertation focuses on developing and evaluating hybrid approaches for analyzing free-form text in the medical domain. This research draws on natural language processing (NLP) techniques that are used to parse and extract concepts based on a controlled vocabulary. Once important concepts are extracted, additional machine learning algorithms,…

  18. Advanced Natural Language Processing and Temporal Mining for Clinical Discovery

    ERIC Educational Resources Information Center

    Mehrabi, Saeed

    2016-01-01

    There has been a vast and growing amount of healthcare data, especially with the rapid adoption of electronic health records (EHRs) as a result of the HITECH Act of 2009. It is estimated that around 80% of the clinical information resides in the unstructured narrative of an EHR. Recently, natural language processing (NLP) techniques have offered…

  19. Performance and carcass characteristics of guinea fowl fed on dietary Neem (Azadirachta indica) leaf powder as a growth promoter.

    PubMed

    Singh, M K; Singh, S K; Sharma, R K; Singh, B; Kumar, Sh; Joshi, S K; Kumar, S; Sathapathy, S

    2015-01-01

    The present work aimed at studying growth pattern and carcass traits in pearl grey guinea fowl fed on dietary Neem (Azadirachta indica) leaf powder (NLP) over a period of 12 weeks. Day-old guinea fowl keets (n=120) were randomly assigned to four treatment groups, each with 3 replicates. The first treatment was designated as control (T0), in which no supplement was added to the feed, while in treatments T1, T2 and T3, NLP was provided at 1, 2 and 3 g per kg of feed, respectively. The results revealed a significant increase in body weight at 12 weeks: 1229.7 g for T1, 1249.8 g for T2, and 1266.2 g for T3, compared to 1220.0 g for the control group (P<0.05). The results also showed that the supplementation of NLP significantly increased feed intake (P≤0.05), which might be due to the hypoglycaemic activity of Neem. A significant increase was also found in the feed conversion ratio (FCR) of the treated groups over the control, showing that feeding NLP to the treated groups has lowered their residual feed efficiency. The results of the study demonstrate the beneficial effects of supplementing NLP on body weight gain and dressed yield in the treated groups in guinea fowl. NLP is, therefore, suggested to be used as a feed supplement in guinea fowl for higher profitability.

  20. Evaluation of Nanolipoprotein Particles (NLPs) as an In Vivo Delivery Platform

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fischer, Nicholas O.; Weilhammer, Dina R.; Dunkle, Alexis

    Nanoparticles hold great promise for the delivery of therapeutics, yet limitations remain with regards to the use of these nanosystems for efficient long-lasting targeted delivery of therapeutics, including imparting functionality to the platform, in vivo stability, drug entrapment efficiency and toxicity. In order to begin to address these limitations, we evaluated the functionality, stability, cytotoxicity, toxicity, immunogenicity and in vivo biodistribution of nanolipoprotein particles (NLPs), which are mimetics of naturally occurring high-density lipoproteins (HDLs). We also found that a wide range of molecules could be reliably conjugated to the NLP, including proteins, single-stranded DNA, and small molecules. The NLP was also found to be relatively stable in complex biological fluids and displayed no cytotoxicity in vitro at doses as high as 320 µg/ml. In addition, we observed that in vivo administration of the NLP daily for 14 consecutive days did not induce significant weight loss or result in lesions on excised organs. Furthermore, the NLPs did not display overt immunogenicity with respect to antibody generation. Finally, the biodistribution of the NLP in vivo was found to be highly dependent on the route of administration, where intranasal administration resulted in prolonged retention in the lung tissue. Though only a select number of NLP compositions were evaluated, the findings of this study suggest that the NLP platform holds promise for use as both a targeted and non-targeted in vivo delivery vehicle for a range of therapeutics.

  1. Evaluation of Nanolipoprotein Particles (NLPs) as an In Vivo Delivery Platform

    PubMed Central

    Fischer, Nicholas O.; Weilhammer, Dina R.; Dunkle, Alexis; Thomas, Cynthia; Hwang, Mona; Corzett, Michele; Lychak, Cheri; Mayer, Wasima; Urbin, Salustra; Collette, Nicole; Chiun Chang, Jiun; Loots, Gabriela G.; Rasley, Amy; Blanchette, Craig D.

    2014-01-01

    Nanoparticles hold great promise for the delivery of therapeutics, yet limitations remain with regards to the use of these nanosystems for efficient long-lasting targeted delivery of therapeutics, including imparting functionality to the platform, in vivo stability, drug entrapment efficiency and toxicity. To begin to address these limitations, we evaluated the functionality, stability, cytotoxicity, toxicity, immunogenicity and in vivo biodistribution of nanolipoprotein particles (NLPs), which are mimetics of naturally occurring high-density lipoproteins (HDLs). We found that a wide range of molecules could be reliably conjugated to the NLP, including proteins, single-stranded DNA, and small molecules. The NLP was also found to be relatively stable in complex biological fluids and displayed no cytotoxicity in vitro at doses as high as 320 µg/ml. In addition, we observed that in vivo administration of the NLP daily for 14 consecutive days did not induce significant weight loss or result in lesions on excised organs. Furthermore, the NLPs did not display overt immunogenicity with respect to antibody generation. Finally, the biodistribution of the NLP in vivo was found to be highly dependent on the route of administration, where intranasal administration resulted in prolonged retention in the lung tissue. Although only a select number of NLP compositions were evaluated, the findings of this study suggest that the NLP platform holds promise for use as both a targeted and non-targeted in vivo delivery vehicle for a range of therapeutics. PMID:24675794

  2. Stefanyshyn-Piper works with NLP-Vaccine-2 on MDDK

    NASA Image and Video Library

    2008-11-19

    S126-E-008302 (19 Nov. 2008) --- Astronaut Heidemarie M. Stefanyshyn-Piper, STS-126 mission specialist, works with the Microbe Group Activation Pack containing eight Fluid Processing Apparatuses on the middeck of Space Shuttle Endeavour while docked with the International Space Station.

  3. Methods to Develop an Electronic Medical Record Phenotype Algorithm to Compare the Risk of Coronary Artery Disease across 3 Chronic Disease Cohorts.

    PubMed

    Liao, Katherine P; Ananthakrishnan, Ashwin N; Kumar, Vishesh; Xia, Zongqi; Cagan, Andrew; Gainer, Vivian S; Goryachev, Sergey; Chen, Pei; Savova, Guergana K; Agniel, Denis; Churchill, Susanne; Lee, Jaeyoung; Murphy, Shawn N; Plenge, Robert M; Szolovits, Peter; Kohane, Isaac; Shaw, Stanley Y; Karlson, Elizabeth W; Cai, Tianxi

    2015-01-01

    Typically, algorithms to classify phenotypes using electronic medical record (EMR) data were developed to perform well in a specific patient population. There is increasing interest in analyses which can allow study of a specific outcome across different diseases. Such a study in the EMR would require an algorithm that can be applied across different patient populations. Our objectives were: (1) to develop an algorithm that would enable the study of coronary artery disease (CAD) across diverse patient populations; (2) to study the impact of adding narrative data extracted using natural language processing (NLP) in the algorithm. Additionally, we demonstrate how to implement CAD algorithm to compare risk across 3 chronic diseases in a preliminary study. We studied 3 established EMR based patient cohorts: diabetes mellitus (DM, n = 65,099), inflammatory bowel disease (IBD, n = 10,974), and rheumatoid arthritis (RA, n = 4,453) from two large academic centers. We developed a CAD algorithm using NLP in addition to structured data (e.g. ICD9 codes) in the RA cohort and validated it in the DM and IBD cohorts. The CAD algorithm using NLP in addition to structured data achieved specificity >95% with a positive predictive value (PPV) 90% in the training (RA) and validation sets (IBD and DM). The addition of NLP data improved the sensitivity for all cohorts, classifying an additional 17% of CAD subjects in IBD and 10% in DM while maintaining PPV of 90%. The algorithm classified 16,488 DM (26.1%), 457 IBD (4.2%), and 245 RA (5.0%) with CAD. In a cross-sectional analysis, CAD risk was 63% lower in RA and 68% lower in IBD compared to DM (p<0.0001) after adjusting for traditional cardiovascular risk factors. We developed and validated a CAD algorithm that performed well across diverse patient populations. The addition of NLP into the CAD algorithm improved the sensitivity of the algorithm, particularly in cohorts where the prevalence of CAD was low. Preliminary data suggest that CAD risk was significantly lower in RA and IBD compared to DM.

  4. Methods to Develop an Electronic Medical Record Phenotype Algorithm to Compare the Risk of Coronary Artery Disease across 3 Chronic Disease Cohorts

    PubMed Central

    Liao, Katherine P.; Ananthakrishnan, Ashwin N.; Kumar, Vishesh; Xia, Zongqi; Cagan, Andrew; Gainer, Vivian S.; Goryachev, Sergey; Chen, Pei; Savova, Guergana K.; Agniel, Denis; Churchill, Susanne; Lee, Jaeyoung; Murphy, Shawn N.; Plenge, Robert M.; Szolovits, Peter; Kohane, Isaac; Shaw, Stanley Y.; Karlson, Elizabeth W.; Cai, Tianxi

    2015-01-01

    Background: Typically, algorithms to classify phenotypes using electronic medical record (EMR) data were developed to perform well in a specific patient population. There is increasing interest in analyses which can allow study of a specific outcome across different diseases. Such a study in the EMR would require an algorithm that can be applied across different patient populations. Our objectives were: (1) to develop an algorithm that would enable the study of coronary artery disease (CAD) across diverse patient populations; (2) to study the impact of adding narrative data extracted using natural language processing (NLP) in the algorithm. Additionally, we demonstrate how to implement CAD algorithm to compare risk across 3 chronic diseases in a preliminary study. Methods and Results: We studied 3 established EMR based patient cohorts: diabetes mellitus (DM, n = 65,099), inflammatory bowel disease (IBD, n = 10,974), and rheumatoid arthritis (RA, n = 4,453) from two large academic centers. We developed a CAD algorithm using NLP in addition to structured data (e.g. ICD9 codes) in the RA cohort and validated it in the DM and IBD cohorts. The CAD algorithm using NLP in addition to structured data achieved specificity >95% with a positive predictive value (PPV) 90% in the training (RA) and validation sets (IBD and DM). The addition of NLP data improved the sensitivity for all cohorts, classifying an additional 17% of CAD subjects in IBD and 10% in DM while maintaining PPV of 90%. The algorithm classified 16,488 DM (26.1%), 457 IBD (4.2%), and 245 RA (5.0%) with CAD. In a cross-sectional analysis, CAD risk was 63% lower in RA and 68% lower in IBD compared to DM (p<0.0001) after adjusting for traditional cardiovascular risk factors. Conclusions: We developed and validated a CAD algorithm that performed well across diverse patient populations. The addition of NLP into the CAD algorithm improved the sensitivity of the algorithm, particularly in cohorts where the prevalence of CAD was low. Preliminary data suggest that CAD risk was significantly lower in RA and IBD compared to DM. PMID:26301417

  5. The Old Brain, the New Mirror: Matching Teaching and Learning Styles in Foreign Language Class (Based on Neuro-Linguistic Programming).

    ERIC Educational Resources Information Center

    Knowles, John K.

    The process of matching teaching materials and methods to the student's learning style and ability level in foreign language classes is explored. The Neuro-Linguistic Programming (NLP) model offers a diagnostic process for the identification of style. This process can be applied to the language learning setting as a way of presenting material to…

  6. Neuro Linguistic Programming for Counselors.

    ERIC Educational Resources Information Center

    Harman, Robert L.; O'Neill, Charles

    1981-01-01

    Describes contributions of Neuro Linguistic Programming (NLP) to counseling practice. The Meta-Model, representational systems, anchoring, and reframing are described. Counselors interested in learning NLP can integrate many valuable new ways of communicating with clients and changing client behaviors. (Author)

  7. An effective convolutional neural network model for Chinese sentiment analysis

    NASA Astrophysics Data System (ADS)

    Zhang, Yu; Chen, Mengdong; Liu, Lianzhong; Wang, Yadong

    2017-06-01

    Microblogs are becoming more and more popular. People are increasingly accustomed to expressing their opinions on Twitter, Facebook and Sina Weibo. Sentiment analysis of microblogs has received significant attention, both in academia and in industry, but Chinese microblog analysis still needs much further work. In recent years CNNs have also been applied to NLP tasks and have achieved good results. However, these methods do not make effective use of the large number of existing sentiment resources. To address this, we propose a Lexicon-based Sentiment Convolutional Neural Networks (LSCNN) model focused on Weibo sentiment analysis, which combines two CNNs, trained individually on sentiment features and on word embeddings, at the fully connected hidden layer. The experimental results show that our model outperforms a CNN model with only word embedding features on the microblog sentiment analysis task.

  8. DEEPEN: A negation detection system for clinical text incorporating dependency relation into NegEx

    PubMed Central

    Mehrabi, Saeed; Krishnan, Anand; Sohn, Sunghwan; Roch, Alexandra M; Schmidt, Heidi; Kesterson, Joe; Beesley, Chris; Dexter, Paul; Schmidt, C. Max; Liu, Hongfang; Palakal, Mathew

    2018-01-01

    In Electronic Health Records (EHRs), much of valuable information regarding patients’ conditions is embedded in free text format. Natural language processing (NLP) techniques have been developed to extract clinical information from free text. One challenge faced in clinical NLP is that the meaning of clinical entities is heavily affected by modifiers such as negation. A negation detection algorithm, NegEx, applies a simplistic approach that has been shown to be powerful in clinical NLP. However, due to the failure to consider the contextual relationship between words within a sentence, NegEx fails to correctly capture the negation status of concepts in complex sentences. Incorrect negation assignment could cause inaccurate diagnosis of patients’ condition or contaminated study cohorts. We developed a negation algorithm called DEEPEN to decrease NegEx’s false positives by taking into account the dependency relationship between negation words and concepts within a sentence using Stanford dependency parser. The system was developed and tested using EHR data from Indiana University (IU) and it was further evaluated on Mayo Clinic dataset to assess its generalizability. The evaluation results demonstrate DEEPEN, which incorporates dependency parsing into NegEx, can reduce the number of incorrect negation assignment for patients with positive findings, and therefore improve the identification of patients with the target clinical findings in EHRs. PMID:25791500
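
    The kind of false positive that the abstract above attributes to NegEx can be illustrated with a toy window-based negation check. This is only a minimal sketch in the spirit of that approach: the trigger list, the window size, the function name `is_negated` and the example sentences are all assumptions made for illustration, and the code is not the actual NegEx or DEEPEN implementation.

```python
# Toy window-based negation check (illustrative only; not NegEx or DEEPEN).
# NEGATION_TRIGGERS and WINDOW are hypothetical values chosen for this sketch.
NEGATION_TRIGGERS = {"no", "not", "denies", "without"}
WINDOW = 5  # number of tokens before the concept that are searched


def is_negated(tokens, concept_index):
    """Return True if a negation trigger appears within WINDOW tokens
    before the concept at position concept_index."""
    start = max(0, concept_index - WINDOW)
    preceding = tokens[start:concept_index]
    return any(tok.lower() in NEGATION_TRIGGERS for tok in preceding)


# Simple sentence: the concept "pain" really is negated.
simple = "patient denies chest pain".split()
print(is_negated(simple, 3))

# Complex sentence: "no" modifies "change", not "mass", yet the window rule
# still flags "mass" as negated. A dependency parse linking "no" to "change"
# is the kind of information that could rule this out.
complex_case = "no change in the large mass".split()
print(is_negated(complex_case, 5))
```

    Both calls report the concept as negated, although only the first sentence actually negates it; the second is the sort of incorrect assignment that dependency information is meant to filter out.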

  9. Structural centrosome aberrations sensitize polarized epithelia to basal cell extrusion.

    PubMed

    Ganier, Olivier; Schnerch, Dominik; Nigg, Erich A

    2018-06-01

    Centrosome aberrations disrupt tissue architecture and may confer invasive properties to cancer cells. Here we show that structural centrosome aberrations, induced by overexpression of either Ninein-like protein (NLP) or CEP131/AZI1, sensitize polarized mammalian epithelia to basal cell extrusion. While unperturbed epithelia typically dispose of damaged cells through apical dissemination into luminal cavities, certain oncogenic mutations cause a switch in directionality towards basal cell extrusion, raising the potential for metastatic cell dissemination. Here we report that NLP-induced centrosome aberrations trigger the preferential extrusion of damaged cells towards the basal surface of epithelial monolayers. This switch in directionality from apical to basal dissemination coincides with a profound reorganization of the microtubule cytoskeleton, which in turn prevents the contractile ring repositioning that is required to support extrusion towards the apical surface. While the basal extrusion of cells harbouring NLP-induced centrosome aberrations requires exogenously induced cell damage, structural centrosome aberrations induced by excess CEP131 trigger the spontaneous dissemination of dying cells towards the basal surface from MDCK cysts. Thus, similar to oncogenic mutations, structural centrosome aberrations can favour basal extrusion of damaged cells from polarized epithelia. Assuming that additional mutations may promote cell survival, this process could sensitize epithelia to disseminate potentially metastatic cells. © 2018 The Authors.

  10. AutoMap User’s Guide

    DTIC Science & Technology

    2006-10-01

    Hierarchy of Pre-Processing Techniques 3. NLP (Natural Language Processing) Utilities 3.1 Named-Entity Recognition 3.1.1 Example for Named-Entity... Recognition 3.2 Symbol RemovalN-Gram Identification: Bi-Grams 4. Stemming 4.1 Stemming Example 5. Delete List 5.1 Open a Delete List 5.1.1 Small...iterative and involves several key processes: • Named-Entity Recognition Named-Entity Recognition is an Automap feature that allows you to

  11. Ease of adoption of clinical natural language processing software: An evaluation of five systems.

    PubMed

    Zheng, Kai; Vydiswaran, V G Vinod; Liu, Yang; Wang, Yue; Stubbs, Amber; Uzuner, Özlem; Gururaj, Anupama E; Bayer, Samuel; Aberdeen, John; Rumshisky, Anna; Pakhomov, Serguei; Liu, Hongfang; Xu, Hua

    2015-12-01

    In recognition of potential barriers that may inhibit the widespread adoption of biomedical software, the 2014 i2b2 Challenge introduced a special track, Track 3 - Software Usability Assessment, in order to develop a better understanding of the adoption issues that might be associated with the state-of-the-art clinical NLP systems. This paper reports the ease of adoption assessment methods we developed for this track, and the results of evaluating five clinical NLP system submissions. A team of human evaluators performed a series of scripted adoptability test tasks with each of the participating systems. The evaluation team consisted of four "expert evaluators" with training in computer science, and eight "end user evaluators" with mixed backgrounds in medicine, nursing, pharmacy, and health informatics. We assessed how easy it is to adopt the submitted systems along the following three dimensions: communication effectiveness (i.e., how effective a system is in communicating its designed objectives to intended audience), effort required to install, and effort required to use. We used a formal software usability testing tool, TURF, to record the evaluators' interactions with the systems and 'think-aloud' data revealing their thought processes when installing and using the systems and when resolving unexpected issues. Overall, the ease of adoption ratings that the five systems received are unsatisfactory. Installation of some of the systems proved to be rather difficult, and some systems failed to adequately communicate their designed objectives to intended adopters. Further, the average ratings provided by the end user evaluators on ease of use and ease of interpreting output are -0.35 and -0.53, respectively, indicating that this group of users generally deemed the systems extremely difficult to work with. While the ratings provided by the expert evaluators are higher, 0.6 and 0.45, respectively, these ratings are still low indicating that they also experienced considerable struggles. The results of the Track 3 evaluation show that the adoptability of the five participating clinical NLP systems has a great margin for improvement. Remedy strategies suggested by the evaluators included (1) more detailed and operation system specific use instructions; (2) provision of more pertinent onscreen feedback for easier diagnosis of problems; (3) including screen walk-throughs in use instructions so users know what to expect and what might have gone wrong; (4) avoiding jargon and acronyms in materials intended for end users; and (5) packaging prerequisites required within software distributions so that prospective adopters of the software do not have to obtain each of the third-party components on their own. Copyright © 2015 Elsevier Inc. All rights reserved.

  12. Clinical Named Entity Recognition Using Deep Learning Models.

    PubMed

    Wu, Yonghui; Jiang, Min; Xu, Jun; Zhi, Degui; Xu, Hua

    2017-01-01

    Clinical Named Entity Recognition (NER) is a critical natural language processing (NLP) task to extract important concepts (named entities) from clinical narratives. Researchers have extensively investigated machine learning models for clinical NER. Recently, there have been increasing efforts to apply deep learning models to improve the performance of current clinical NER systems. This study examined two popular deep learning architectures, the Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN), to extract concepts from clinical texts. We compared the two deep neural network architectures with three baseline Conditional Random Fields (CRFs) models and two state-of-the-art clinical NER systems using the i2b2 2010 clinical concept extraction corpus. The evaluation results showed that the RNN model trained with the word embeddings achieved a new state-of-the-art performance (a strict F1 score of 85.94%) for the defined clinical NER task, outperforming the best-reported system that used both manually defined and unsupervised learning features. This study demonstrates the advantage of using deep neural network architectures for clinical concept extraction, including distributed feature representation, automatic feature learning, and long-term dependencies capture. This is one of the first studies to compare the two widely used deep learning models and demonstrate the superior performance of the RNN model for clinical NER.

  13. Clinical Named Entity Recognition Using Deep Learning Models

    PubMed Central

    Wu, Yonghui; Jiang, Min; Xu, Jun; Zhi, Degui; Xu, Hua

    2017-01-01

    Clinical Named Entity Recognition (NER) is a critical natural language processing (NLP) task to extract important concepts (named entities) from clinical narratives. Researchers have extensively investigated machine learning models for clinical NER. Recently, there have been increasing efforts to apply deep learning models to improve the performance of current clinical NER systems. This study examined two popular deep learning architectures, the Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN), to extract concepts from clinical texts. We compared the two deep neural network architectures with three baseline Conditional Random Fields (CRFs) models and two state-of-the-art clinical NER systems using the i2b2 2010 clinical concept extraction corpus. The evaluation results showed that the RNN model trained with the word embeddings achieved a new state-of-the-art performance (a strict F1 score of 85.94%) for the defined clinical NER task, outperforming the best-reported system that used both manually defined and unsupervised learning features. This study demonstrates the advantage of using deep neural network architectures for clinical concept extraction, including distributed feature representation, automatic feature learning, and long-term dependencies capture. This is one of the first studies to compare the two widely used deep learning models and demonstrate the superior performance of the RNN model for clinical NER. PMID:29854252

  14. Comparison of a semi-automatic annotation tool and a natural language processing application for the generation of clinical statement entries.

    PubMed

    Lin, Ching-Heng; Wu, Nai-Yuan; Lai, Wei-Shao; Liou, Der-Ming

    2015-01-01

    Electronic medical records with encoded entries should enhance the semantic interoperability of document exchange. However, it remains a challenge to encode the narrative concept and to transform the coded concepts into a standard entry-level document. This study aimed to use a novel approach for the generation of entry-level interoperable clinical documents. Using HL7 clinical document architecture (CDA) as the example, we developed three pipelines to generate entry-level CDA documents. The first approach was a semi-automatic annotation pipeline (SAAP), the second was a natural language processing (NLP) pipeline, and the third merged the above two pipelines. We randomly selected 50 test documents from the i2b2 corpora to evaluate the performance of the three pipelines. The 50 randomly selected test documents contained 9365 words, including 588 Observation terms and 123 Procedure terms. For the Observation terms, the merged pipeline had a significantly higher F-measure than the NLP pipeline (0.89 vs 0.80, p<0.0001), but a similar F-measure to that of the SAAP (0.89 vs 0.87). For the Procedure terms, the F-measure was not significantly different among the three pipelines. The combination of a semi-automatic annotation approach and the NLP application seems to be a solution for generating entry-level interoperable clinical documents. © The Author 2014. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  15. Building a comprehensive syntactic and semantic corpus of Chinese clinical texts.

    PubMed

    He, Bin; Dong, Bin; Guan, Yi; Yang, Jinfeng; Jiang, Zhipeng; Yu, Qiubin; Cheng, Jianyi; Qu, Chunyan

    2017-05-01

    To build a comprehensive corpus covering syntactic and semantic annotations of Chinese clinical texts with corresponding annotation guidelines and methods as well as to develop tools trained on the annotated corpus, which supplies baselines for research on Chinese texts in the clinical domain. An iterative annotation method was proposed to train annotators and to develop annotation guidelines. Then, by using annotation quality assurance measures, a comprehensive corpus was built, containing annotations of part-of-speech (POS) tags, syntactic tags, entities, assertions, and relations. Inter-annotator agreement (IAA) was calculated to evaluate the annotation quality and a Chinese clinical text processing and information extraction system (CCTPIES) was developed based on our annotated corpus. The syntactic corpus consists of 138 Chinese clinical documents with 47,426 tokens and 2612 full parsing trees, while the semantic corpus includes 992 documents that annotated 39,511 entities with their assertions and 7693 relations. IAA evaluation shows that this comprehensive corpus is of good quality, and the system modules are effective. The annotated corpus makes a considerable contribution to natural language processing (NLP) research into Chinese texts in the clinical domain. However, this corpus has a number of limitations. Some additional types of clinical text should be introduced to improve corpus coverage and active learning methods should be utilized to promote annotation efficiency. In this study, several annotation guidelines and an annotation method for Chinese clinical texts were proposed, and a comprehensive corpus with its NLP modules were constructed, providing a foundation for further study of applying NLP techniques to Chinese texts in the clinical domain. Copyright © 2017. Published by Elsevier Inc.

  16. A CRF-based system for recognizing chemical entity mentions (CEMs) in biomedical literature

    PubMed Central

    2015-01-01

    Background: In order to improve information access on chemical compounds and drugs (chemical entities) described in text repositories, it is crucial to be able to identify chemical entity mentions (CEMs) automatically within text. The CHEMDNER challenge in BioCreative IV was specially designed to promote the implementation of corresponding systems that are able to detect mentions of chemical compounds and drugs, which has two subtasks: CDI (Chemical Document Indexing) and CEM. Results: Our system processing pipeline consists of three major components: pre-processing (sentence detection, tokenization), recognition (CRF-based approach), and post-processing (rule-based approach and format conversion). In our post-challenge system, the cost parameter in CRF model was optimized by 10-fold cross validation with grid search, and word representations feature induced by Brown clustering method was introduced. For the CEM subtask, our official runs were ranked in top position by obtaining maximum 88.79% precision, 69.08% recall and 77.70% balanced F-measure, which were improved further to 88.43% precision, 76.48% recall and 82.02% balanced F-measure in our post-challenge system. Conclusions: In our system, instead of extracting a CEM as a whole, we regarded it as a sequence labeling problem. Though our current system has much room for improvement, our system is valuable in showing that the performance in terms of balanced F-measure can be improved largely by utilizing large amounts of relatively inexpensive un-annotated PubMed abstracts and optimizing the cost parameter in CRF model. From our practice and lessons, if one directly utilizes some open-source natural language processing (NLP) toolkits, such as OpenNLP or Stanford CoreNLP, the false positive (FP) rate may be very high. It is better to develop some additional rules to minimize the FP rate if one does not want to re-train the related models. Our CEM recognition system is available at: http://www.SciTeMiner.org/XuShuo/Demo/CEM. PMID:25810768

  17. A la Recherche du Temps Perdu: extracting temporal relations from medical text in the 2012 i2b2 NLP challenge.

    PubMed

    Cherry, Colin; Zhu, Xiaodan; Martin, Joel; de Bruijn, Berry

    2013-01-01

    An analysis of the timing of events is critical for a deeper understanding of the course of events within a patient record. The 2012 i2b2 NLP challenge focused on the extraction of temporal relationships between concepts within textual hospital discharge summaries. The team from the National Research Council Canada (NRC) submitted three system runs to the second track of the challenge: typifying the time-relationship between pre-annotated entities. The NRC system was designed around four specialist modules containing statistical machine learning classifiers. Each specialist targeted distinct sets of relationships: local relationships, 'sectime'-type relationships, non-local overlap-type relationships, and non-local causal relationships. The best NRC submission achieved a precision of 0.7499, a recall of 0.6431, and an F1 score of 0.6924, resulting in a statistical tie for first place. Post hoc improvements led to a precision of 0.7537, a recall of 0.6455, and an F1 score of 0.6954, giving the highest scores reported on this task to date. Methods for general relation extraction extended well to temporal relations, and gave top-ranked state-of-the-art results. Careful ordering of predictions within result sets proved critical to this success.
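
    The F1 scores quoted in the abstract above can be checked from the reported precision and recall, since F1 is the harmonic mean of the two. The sketch below is only a worked arithmetic check; the function name `f1_score` is chosen here for illustration and is not taken from the NRC system.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract above.
official = f1_score(0.7499, 0.6431)   # best official submission
post_hoc = f1_score(0.7537, 0.6455)   # after post hoc improvements

print(round(official, 4), round(post_hoc, 4))  # → 0.6924 0.6954
```

    Both rounded values agree with the F1 scores stated in the abstract.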

  18. Tailoring vocabularies for NLP in sub-domains: a method to detect unused word sense.

    PubMed

    Figueroa, Rosa L; Zeng-Treitler, Qing; Goryachev, Sergey; Wiechmann, Eduardo P

    2009-11-14

    We developed a method to help tailor a comprehensive vocabulary system (e.g. the UMLS) for a sub-domain (e.g. clinical reports) in support of natural language processing (NLP). The method detects unused sense in a sub-domain by comparing the relational neighborhood of a word/term in the vocabulary with the semantic neighborhood of the word/term in the sub-domain. The semantic neighborhood of the word/term in the sub-domain is determined using latent semantic analysis (LSA). We trained and tested the unused sense detection on two clinical text corpora: one contains discharge summaries and the other outpatient visit notes. We were able to detect unused senses with precision from 79% to 87%, recall from 48% to 74%, and an area under receiver operation curve (AUC) of 72% to 87%.

  19. Using NLP to identify cancer cases in imaging reports drawn from radiology information systems.

    PubMed

    Patrick, Jon; Asgari, Pooyan; Li, Min; Nguyen, Dung

    2013-01-01

    A Natural Language processing (NLP) classifier has been developed for the Victorian and NSW Cancer Registries with the purpose of automatically identifying cancer reports from imaging services, transmitting them to the Registries and then extracting pertinent cancer information. Large scale trials conducted on over 40,000 reports show the sensitivity for identifying reportable cancer reports is above 98% with a specificity above 96%. Detection of tumour stream, report purpose, and a variety of extracted content is generally above 90% specificity. The differences between report layout and authoring strategies across imaging services appear to require different classifiers to retain this high level of accuracy. Linkage of the imaging data with existing registry records (hospital and pathology reports) to derive stage and recurrence of cancer has commenced and shown very promising results.

  20. The information exchange.

    PubMed

    Hendron, Brid

    2015-02-01

    This article has been written to highlight the importance of unconscious communication in the dental environment using Neuro-Linguistic Programming (NLP) principles. A single aspect of unconscious communication is described to demonstrate the value to dental team members of studying NLP in order to improve their communication skills.

  1. Observations concerning Research Literature on Neuro-Linguistic Programming.

    ERIC Educational Resources Information Center

    Einspruch, Eric L.; Forman, Bruce D.

    1985-01-01

    Identifies six categories of design and methodological errors contained in the 39 empirical studies of neurolinguistic programming (NLP) documented through April 1984. Representative reports reflecting each category are discussed. Suggestions are offered for improving the quality of research on NLP. (Author/MCF)

  2. Neurolinguistic Programming Examined: Imagery, Sensory Mode, and Communication.

    ERIC Educational Resources Information Center

    Fromme, Donald K.; Daniell, Jennifer

    1984-01-01

    Tested Neurolinguistic Programming (NLP) assumptions by examining intercorrelations among response times of students (N=64) for extracting visual, auditory, and kinesthetic information from alphabetic images. Large positive intercorrelations were obtained, the only outcome not compatible with NLP. Good visualizers were significantly better in…

  3. Data-Informed Language Learning

    ERIC Educational Resources Information Center

    Godwin-Jones, Robert

    2017-01-01

    Although data collection has been used in language learning settings for some time, it is only in recent decades that large corpora have become available, along with efficient tools for their use. Advances in natural language processing (NLP) have enabled rich tagging and annotation of corpus data, essential for their effective use in language…

  4. Working Effectively with People: Contributions of Neurolinguistic Programming (NLP) to Visual Literacy.

    ERIC Educational Resources Information Center

    Ragan, Janet M.; Ragan, Tillman J.

    1982-01-01

    Briefly summarizes the history of neurolinguistic programming, which set out to model elements and processes of effective communication and to reduce these to formulas that can be taught to others. Potential areas of inquiry for neurolinguistic programmers that should be of concern to visual literacists are discussed. (MBR)

  5. Leveraging Code Comments to Improve Software Reliability

    ERIC Educational Resources Information Center

    Tan, Lin

    2009-01-01

    Commenting source code has long been a common practice in software development. This thesis, consisting of three pieces of work, made novel use of the code comments written in natural language to improve software reliability. Our solution combines Natural Language Processing (NLP), Machine Learning, Statistics, and Program Analysis techniques to…

  6. Lexical Link Analysis Application: Improving Web Service to Acquisition Visibility Portal

    DTIC Science & Technology

    2013-09-30

    during the Empire Challenge 2008 and 2009 (EC08/09) field experiments and for numerous other field experiments of new technologies during Trident Warrior...Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC-2000) (pp. 63–70). Retrieved from http://nlp.stanford.edu/manning

  7. English Complex Verb Constructions: Identification and Inference

    ERIC Educational Resources Information Center

    Tu, Yuancheng

    2012-01-01

    The fundamental problem faced by automatic text understanding in Natural Language Processing (NLP) is to identify semantically related pieces of text and integrate them together to compute the meaning of the whole text. However, the principle of compositionality runs into trouble very quickly when real language is examined with its frequent…

  8. Close association between metal allergy and nail lichen planus: detection of causative metals in nail lesions.

    PubMed

    Nishizawa, A; Satoh, T; Yokozeki, H

    2013-02-01

    Lichen planus (LP) is a common skin disorder of unknown aetiology that affects the skin, mucous membranes and nails. Although metal allergies have been implicated in the development of oral LP (OLP), the contribution of these allergies to nail LP (NLP) has yet to be studied in detail. To elucidate the link between metal allergy and NLP. We retrospectively analysed 115 LP patients with respect to the contribution of metals to either NLP or OLP. We also attempted to detect the specific metals involved in these nail lesions. Of the 79 patients that received a metal patch test (PT), 24 (30%) were positive for at least one of the metal compounds tested. Notably, the prevalence of positive reactions to metals in the NLP patients was significantly higher as compared with the OLP patients (59% vs. 27%, P < 0.05). Among the 10 PT-positive patients with NLP, improvement of the skin lesions was seen in six of the patients after removal of dental materials containing causative metals or systemic disodium cromoglycate therapy. On the other hand, only 3 of 16 PT-positive patients with OLP exhibited improvement after the removal of dental materials. Causative metals in the dental fillings/braces were detected in the involved nail tissues. This study suggests that metal allergies are more closely associated with NLP vs. OLP, and that deposited metals in the nail apparatus contribute to the development of lichenoid tissue reactions in the nail bed and matrix. © 2012 The Authors. Journal of the European Academy of Dermatology and Venereology © 2012 European Academy of Dermatology and Venereology.

  9. On the formation of noise-like pulses in fiber ring cavity configurations

    NASA Astrophysics Data System (ADS)

    Jeong, Yoonchan; Vazquez-Zuniga, Luis Alonso; Lee, Seungjong; Kwon, Youngchul

    2014-12-01

    We give an overview of the current status of fiber-based noise-like pulse (NLP) research conducted over the past decade, together with presenting the newly conducted, systematic study on their temporal, spectral, and coherence characteristics in nonlinear polarization rotation (NPR)-based erbium-doped fiber ring cavity configurations. Firstly, our study includes experimental investigations on the characteristic features of NLPs both in the net anomalous dispersion regime and in the net normal dispersion regime, in comparison with coherent optical pulses that can alternatively be obtained from the same cavity configurations, i.e., with the conventional and dissipative solitons. Secondly, our study includes numerical simulations on the formation of NLPs, utilizing a simplified, scalar-field model based on the characteristic transfer function of the NPR mechanism in conjunction with the split-step Fourier algorithm, which offer a great help in exploring the interrelationship between the NLP formation and various cavity parameters, and eventually present good agreement with the experimental results. We stress that if the cavity operates with excessively high gain, i.e., higher than the levels just required for generating coherent mode-locked pulses, i.e., conventional solitons and dissipative solitons, it may trigger NLPs, depending on the characteristic transfer function of the NPR mechanism induced in the cavity. In particular, the NPR transfer function is characterized by the critical saturation power and the linear loss ratio. Finally, we also report on the applications of the fiber-based NLP sources, including supercontinuum generation in a master-oscillator power amplifier configuration seeded by a fiber-based NLP source, as one typical example. We expect that the NLP-related research area will continue to expand, and that NLP-based sources will also find more applications in the future.

  10. A natural language processing pipeline for pairing measurements uniquely across free-text CT reports.

    PubMed

    Sevenster, Merlijn; Bozeman, Jeffrey; Cowhy, Andrea; Trost, William

    2015-02-01

    To standardize and objectivize treatment response assessment in oncology, guidelines have been proposed that are driven by radiological measurements, which are typically communicated in free-text reports defying automated processing. We study through inter-annotator agreement and natural language processing (NLP) algorithm development the task of pairing measurements that quantify the same finding across consecutive radiology reports, such that each measurement is paired with at most one other ("partial uniqueness"). Ground truth is created based on 283 abdomen and 311 chest CT reports of 50 patients each. A pre-processing engine segments reports and extracts measurements. Thirteen features are developed based on volumetric similarity between measurements, semantic similarity between their respective narrative contexts and structural properties of their report positions. A Random Forest classifier (RF) integrates all features. A "mutual best match" (MBM) post-processor ensures partial uniqueness. In an end-to-end evaluation, RF has precision 0.841, recall 0.807, F-measure 0.824 and AUC 0.971; with MBM, which performs above chance level (P<0.001), it has precision 0.899, recall 0.776, F-measure 0.833 and AUC 0.935. RF (RF+MBM) has error-free performance on 52.7% (57.4%) of report pairs. Inter-annotator agreement of three domain specialists with the ground truth (κ>0.960) indicates that the task is well defined. Domain properties and inter-section differences are discussed to explain superior performance in abdomen. Enforcing partial uniqueness has mixed but minor effects on performance. A combined machine learning-filtering approach is proposed for pairing measurements, which can support prospective (supporting treatment response assessment) and retrospective purposes (data mining). Copyright © 2014 Elsevier Inc. All rights reserved.
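
    The "mutual best match" step described above can be illustrated with a minimal sketch. It assumes that a classifier has already produced a similarity score for each candidate pair of measurements; the measurement names and scores below are hypothetical, and the abstract does not specify the authors' actual implementation. A pair is kept only when each member is the other's highest-scoring partner, so every measurement ends up in at most one pair.

```python
from typing import Dict, List, Tuple


def mutual_best_match(scores: Dict[Tuple[str, str], float]) -> List[Tuple[str, str]]:
    """Keep a pair (a, b) only if b is a's best partner and a is b's best partner."""
    best_for_a: Dict[str, Tuple[str, float]] = {}
    best_for_b: Dict[str, Tuple[str, float]] = {}
    for (a, b), score in scores.items():
        if a not in best_for_a or score > best_for_a[a][1]:
            best_for_a[a] = (b, score)
        if b not in best_for_b or score > best_for_b[b][1]:
            best_for_b[b] = (a, score)
    # A pair survives only when the preference is mutual.
    return sorted(
        (a, b) for a, (b, _) in best_for_a.items() if best_for_b[b][0] == a
    )


# Hypothetical classifier scores for measurements in a prior and a current report.
scores = {
    ("prior_m1", "current_m1"): 0.92,
    ("prior_m1", "current_m2"): 0.40,
    ("prior_m2", "current_m1"): 0.75,
    ("prior_m2", "current_m2"): 0.30,
}
pairs = mutual_best_match(scores)
print(pairs)
```

    In this toy input, prior_m2 prefers current_m1, but current_m1 prefers prior_m1, so prior_m2 stays unpaired. This is consistent with the reported trade-off of higher precision and lower recall once the filter is applied.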

  11. The eyes don't have it: lie detection and Neuro-Linguistic Programming.

    PubMed

    Wiseman, Richard; Watt, Caroline; ten Brinke, Leanne; Porter, Stephen; Couper, Sara-Louise; Rankin, Calum

    2012-01-01

    Proponents of Neuro-Linguistic Programming (NLP) claim that certain eye-movements are reliable indicators of lying. According to this notion, a person looking up to their right suggests a lie whereas looking up to their left is indicative of truth telling. Despite widespread belief in this claim, no previous research has examined its validity. In Study 1 the eye movements of participants who were lying or telling the truth were coded, but did not match the NLP patterning. In Study 2 one group of participants were told about the NLP eye-movement hypothesis whilst a second control group were not. Both groups then undertook a lie detection test. No significant differences emerged between the two groups. Study 3 involved coding the eye movements of both liars and truth tellers taking part in high profile press conferences. Once again, no significant differences were discovered. Taken together the results of the three studies fail to support the claims of NLP. The theoretical and practical implications of these findings are discussed.

  12. The Eyes Don’t Have It: Lie Detection and Neuro-Linguistic Programming

    PubMed Central

    Wiseman, Richard; Watt, Caroline; ten Brinke, Leanne; Porter, Stephen; Couper, Sara-Louise; Rankin, Calum

    2012-01-01

    Proponents of Neuro-Linguistic Programming (NLP) claim that certain eye-movements are reliable indicators of lying. According to this notion, a person looking up to their right suggests a lie whereas looking up to their left is indicative of truth telling. Despite widespread belief in this claim, no previous research has examined its validity. In Study 1 the eye movements of participants who were lying or telling the truth were coded, but did not match the NLP patterning. In Study 2 one group of participants were told about the NLP eye-movement hypothesis whilst a second control group were not. Both groups then undertook a lie detection test. No significant differences emerged between the two groups. Study 3 involved coding the eye movements of both liars and truth tellers taking part in high profile press conferences. Once again, no significant differences were discovered. Taken together the results of the three studies fail to support the claims of NLP. The theoretical and practical implications of these findings are discussed. PMID:22808128

  13. NLP-12 engages different UNC-13 proteins to potentiate tonic and evoked release.

    PubMed

    Hu, Zhitao; Vashlishan-Murray, Amy B; Kaplan, Joshua M

    2015-01-21

    A neuropeptide (NLP-12) and its receptor (CKR-2) potentiate tonic and evoked ACh release at Caenorhabditis elegans neuromuscular junctions. Increased evoked release is mediated by a presynaptic pathway (egl-30 Gαq and egl-8 PLCβ) that produces DAG, and by DAG binding to short and long UNC-13 proteins. Potentiation of tonic ACh release persists in mutants deficient for egl-30 Gαq and egl-8 PLCβ and requires DAG binding to UNC-13L (but not UNC-13S). Thus, NLP-12 adjusts tonic and evoked release by distinct mechanisms. Copyright © 2015 the authors 0270-6474/15/351038-05$15.00/0.

  14. WHU at TREC KBA Vital Filtering Track 2014

    DTIC Science & Technology

    2014-11-01

    view the problem as a classification problem and use Stanford NLP Toolkit to extract necessary information. Various kinds of features are leveraged to...profile of an entity. Our approach is to view the problem as a classification problem and use Stanford NLP Toolkit to extract necessary information

  15. Neuro-Linguistic Programming: A Discussion of Why and How.

    ERIC Educational Resources Information Center

    Partridge, Susan

    Intended for teachers, this article offers a definition of neuro-linguistic programming (NLP), discusses its relevance to instruction, and provides illustrations of the implementation of neuro-linguistic programming in instructional contexts. NLP is defined as an approach to instruction that recognizes the familiar visual, auditory, and…

  16. A missense mutation in the agouti signaling protein gene (ASIP) is associated with the no light points coat phenotype in donkeys.

    PubMed

    Abitbol, Marie; Legrand, Romain; Tiret, Laurent

    2015-04-08

    Seven donkey breeds are recognized by the French studbook and are characterized by a black, bay or grey coat colour including light cream-to-white points (LP). Occasionally, Normand bay donkeys give birth to dark foals that lack LP and display the no light points (NLP) pattern. This pattern is more frequent and officially recognized in American miniature donkeys. The LP (or pangare) phenotype resembles that of the light bellied agouti pattern in mouse, while the NLP pattern resembles that of the mammalian recessive black phenotype; both phenotypes are associated with the agouti signaling protein gene (ASIP). We used a panel of 127 donkeys to identify a recessive missense c.349 T > C variant in ASIP that was shown to be in complete association with the NLP phenotype. This variant results in a cysteine to arginine substitution at position 117 in the ASIP protein. This cysteine is highly-conserved among vertebrate ASIP proteins and was previously shown by mutagenesis experiments to lie within a functional site. Altogether, our results strongly support that the identified mutation is causative of the NLP phenotype. Thus, we propose to name the c.[349 T > C] allele in donkeys, the a(nlp) allele, which enlarges the panel of coat colour alleles in donkeys and ASIP recessive loss-of-function alleles in animals.

  17. Acquisition and evaluation of verb subcategorization resources for biomedicine.

    PubMed

    Rimell, Laura; Lippincott, Thomas; Verspoor, Karin; Johnson, Helen L; Korhonen, Anna

    2013-04-01

    Biomedical natural language processing (NLP) applications that have access to detailed resources about the linguistic characteristics of biomedical language demonstrate improved performance on tasks such as relation extraction and syntactic or semantic parsing. Such applications are important for transforming the growing unstructured information buried in the biomedical literature into structured, actionable information. In this paper, we address the creation of linguistic resources that capture how individual biomedical verbs behave. We specifically consider verb subcategorization, or the tendency of verbs to "select" co-occurrence with particular phrase types, which influences the interpretation of verbs and identification of verbal arguments in context. There are currently a limited number of biomedical resources containing information about subcategorization frames (SCFs), and these are the result of either labor-intensive manual collation, or automatic methods that use tools adapted to a single biomedical subdomain. Either method may result in resources that lack coverage. Moreover, the quality of existing verb SCF resources for biomedicine is unknown, due to a lack of available gold standards for evaluation. This paper presents three new resources related to verb subcategorization frames in biomedicine, and four experiments making use of the new resources. We present the first biomedical SCF gold standards, capturing two different but widely-used definitions of subcategorization, and a new SCF lexicon, BioCat, covering a large number of biomedical sub-domains. We evaluate the SCF acquisition methodologies for BioCat with respect to the gold standards, and compare the results with the accuracy of the only previously existing automatically-acquired SCF lexicon for biomedicine, the BioLexicon. Our results show that the BioLexicon has greater precision while BioCat has better coverage of SCFs. Finally, we explore the definition of subcategorization using these resources and its implications for biomedical NLP. All resources are made publicly available. The SCF resources we have evaluated still show considerably lower accuracy than that reported with general English lexicons, demonstrating the need for domain- and subdomain-specific SCF acquisition tools for biomedicine. Our new gold standards reveal major differences when annotators use the different definitions. Moreover, evaluation of BioCat yields major differences in accuracy depending on the gold standard, demonstrating that the definition of subcategorization adopted will have a direct impact on perceived system accuracy for specific tasks. Copyright © 2013 Elsevier Inc. All rights reserved.

  18. 49 CFR 563.8 - Data format.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... the first acceleration data point; (3) The number of the last point (NLP), which is an integer that...; and (4) NLP - NFP + 1 acceleration values sequentially beginning with the acceleration at time NFP * TS and continue sampling the acceleration at TS increments in time until the time NLP * TS is reached...
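
    The sample count in item (4) follows from the indexing: stepping from time NFP * TS to time NLP * TS in increments of TS visits NLP - NFP + 1 points. The sketch below checks this arithmetic. It assumes that NFP is the integer index of the first point, by analogy with NLP, and that TS is the time step in seconds; the numeric values are hypothetical and are not taken from the regulation.

```python
from typing import List


def sample_times(nfp: int, nlp: int, ts: float) -> List[float]:
    """Return the time stamps k * ts for k = nfp, nfp + 1, ..., nlp."""
    if nlp < nfp:
        raise ValueError("nlp must not be smaller than nfp")
    return [k * ts for k in range(nfp, nlp + 1)]


# Hypothetical values chosen for illustration only.
NFP, NLP, TS = -3, 5, 0.01
times = sample_times(NFP, NLP, TS)
print(len(times))  # number of acceleration values: NLP - NFP + 1
```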

  19. 49 CFR 563.8 - Data format

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... number of the last point (NLP), which is an integer that when multiplied by the TS equals the time relative to time zero of the last acceleration data point; and (4) NLP - NFP + 1 acceleration values... increments in time until the time NLP * TS is reached. [73 FR 2183, Jan. 14, 2008] ...

  20. SUBTLE: Situation Understanding Bot through Language and Environment

    DTIC Science & Technology

    2016-01-06

    a 4 day “hackathon” by Stuart Young’s small robots group which successfully ported the SUBTLE MURI NLP robot interface to the Packbot platform they...null element restoration, a step typically ignored in NLP systems, allows for correct parsing of imperatives and questions, critical structures

  1. Instructor-Aided Asynchronous Question Answering System for Online Education and Distance Learning

    ERIC Educational Resources Information Center

    Wen, Dunwei; Cuzzola, John; Brown, Lorna; Kinshuk

    2012-01-01

    Question answering systems have frequently been explored for educational use. However, their value was somewhat limited due to the quality of the answers returned to the student. Recent question answering (QA) research has started to incorporate deep natural language processing (NLP) in order to improve these answers. However, current NLP…

  2. Automatic Selection of Suitable Sentences for Language Learning Exercises

    ERIC Educational Resources Information Center

    Pilán, Ildikó; Volodina, Elena; Johansson, Richard

    2013-01-01

    In our study we investigated second and foreign language (L2) sentence readability, an area little explored so far in the case of several languages, including Swedish. The outcome of our research consists of two methods for sentence selection from native language corpora based on Natural Language Processing (NLP) and machine learning (ML)…

  3. Enhancing Grammatical Structures in Web-Based Texts

    ERIC Educational Resources Information Center

    Zilio, Leonardo; Wilkens, Rodrigo; Fairon, Cédrick

    2017-01-01

    Presentation of raw text to language learners is not enough to ensure learning. Thus, we present the Smart and Immersive Language Learning Environment (SMILLE), a system that uses Natural Language Processing (NLP) for enhancing grammatical information in texts chosen by a given user. The enhancements, carried out by means of text highlighting, are…

  4. Human-Level Natural Language Understanding: False Progress and Real Challenges

    ERIC Educational Resources Information Center

    Bignoli, Perrin G.

    2013-01-01

    The field of Natural Language Processing (NLP) focuses on the study of how utterances composed of human-level languages can be understood and generated. Typically, there are considered to be three intertwined levels of structure that interact to create meaning in language: syntax, semantics, and pragmatics. Not only is a large amount of…

  5. An Intelligent Computer Assisted Language Learning System for Arabic Learners

    ERIC Educational Resources Information Center

    Shaalan, Khaled F.

    2005-01-01

    This paper describes the development of an intelligent computer-assisted language learning (ICALL) system for learning Arabic. This system could be used for learning Arabic by students at primary schools or by learners of Arabic as a second or foreign language. It explores the use of Natural Language Processing (NLP) techniques for learning…

  6. Natural language processing and advanced information management

    NASA Technical Reports Server (NTRS)

    Hoard, James E.

    1989-01-01

    Integrating diverse information sources and application software in a principled and general manner will require a very capable advanced information management (AIM) system. In particular, such a system will need a comprehensive addressing scheme to locate the material in its docuverse. It will also need a natural language processing (NLP) system of great sophistication. It seems that the NLP system must serve three functions. First, it provides a natural language interface (NLI) for the users. Second, it serves as the core component that understands and makes use of the real-world interpretations (RWIs) contained in the docuverse. Third, it enables the reasoning specialists (RSs) to arrive at conclusions that can be transformed into procedures that will satisfy the users' requests. The best candidate for an intelligent agent that can satisfactorily make use of RSs and transform documents (TDs) appears to be an object oriented data base (OODB). OODBs have, apparently, an inherent capacity to use the large numbers of RSs and TDs that will be required by an AIM system and an inherent capacity to use them in an effective way.

  7. Feasibility and Utility of Lexical Analysis for Occupational Health Text.

    PubMed

    Harber, Philip; Leroy, Gondy

    2017-06-01

    Assess feasibility and potential utility of natural language processing (NLP) for storing and analyzing occupational health data. Basic NLP lexical analysis methods were applied to 89,000 Mine Safety and Health Administration (MSHA) free text records. Steps included tokenization, term and co-occurrence counts, term annotation, and identifying exposure-health effect relationships. Presence of terms in the Unified Medical Language System (UMLS) was assessed. The methods efficiently demonstrated common exposures, health effects, and exposure-injury relationships. Many workplace terms are not present in UMLS or map inaccurately. Use of free text rather than narrowly defined numerically coded fields is feasible, flexible, and efficient. It has potential to encourage workers and clinicians to provide more data and to support automated knowledge creation. The lexical method used is easily generalizable to other areas. The UMLS vocabularies should be enhanced to be relevant to occupational health.
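
    The first lexical analysis steps named above (tokenization, term counts, and co-occurrence counts) can be sketched with the Python standard library. The three records below are invented for illustration and are not MSHA data; the study's actual tooling is not described in the abstract, and the UMLS lookup step is omitted.

```python
import re
from collections import Counter
from itertools import combinations
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


# Invented free-text records, for illustration only.
records = [
    "Miner struck by falling rock, injured shoulder",
    "Dust exposure caused cough",
    "Falling rock struck miner on hand",
]

term_counts: Counter = Counter()
cooccurrence: Counter = Counter()
for record in records:
    tokens = tokenize(record)
    term_counts.update(tokens)
    # Count each unordered pair of distinct terms once per record.
    for pair in combinations(sorted(set(tokens)), 2):
        cooccurrence[pair] += 1

print(term_counts.most_common(3))
print(cooccurrence[("miner", "rock")])
```

    Pairs with high counts, such as an exposure term and an injury term appearing in the same records, are candidates for the exposure-health effect relationships the abstract mentions.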

  8. Nonlinear-programming mathematical modeling of coal blending for power plant

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tang Longhua; Zhou Junhu; Yao Qiang

    At present, most coal blending is guided by experience or by linear programming (LP), which cannot properly reflect the complicated characteristics of coal. Experimental and theoretical research shows that most coal blend properties cannot always be expressed as a linear function of the properties of the individual coals in the blend. The authors introduced nonlinear functions or processes (including neural networks and fuzzy mathematics), established from experiments directed by the authors and other researchers, to quantitatively describe the complex coal blend parameters. Finally, nonlinear-programming (NLP) mathematical modeling of coal blends is introduced and utilized in the Hangzhou Coal Blending Center. Predictions based on the new method differed from those based on LP modeling. The authors conclude that it is very important to introduce NLP modeling, instead of LP modeling, into the work of coal blending.

  9. Validation of psoriatic arthritis diagnoses in electronic medical records using natural language processing

    PubMed Central

    Cai, Tianxi; Karlson, Elizabeth W.

    2013-01-01

    Objectives To test whether data extracted from full text patient visit notes from an electronic medical record (EMR) would improve the classification of PsA compared to an algorithm based on codified data. Methods From the > 1,350,000 adults in a large academic EMR, all 2318 patients with a billing code for PsA were extracted and 550 were randomly selected for chart review and algorithm training. Using codified data and phrases extracted from narrative data using natural language processing, 31 predictors were extracted and three random forest algorithms trained using coded, narrative, and combined predictors. The receiver operator curve (ROC) was used to identify the optimal algorithm and a cut point was chosen to achieve the maximum sensitivity possible at a 90% positive predictive value (PPV). The algorithm was then used to classify the remaining 1768 charts and finally validated in a random sample of 300 cases predicted to have PsA. Results The PPV of a single PsA code was 57% (95%CI 55%–58%). Using a combination of coded data and NLP the random forest algorithm reached a PPV of 90% (95%CI 86%–93%) at sensitivity of 87% (95% CI 83% – 91%) in the training data. The PPV was 93% (95%CI 89%–96%) in the validation set. Adding NLP predictors to codified data increased the area under the ROC (p < 0.001). Conclusions Using NLP with text notes from electronic medical records improved the performance of the prediction algorithm significantly. Random forests were a useful tool to accurately classify psoriatic arthritis cases to enable epidemiological research. PMID:20701955
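
    The cut-point rule in the Methods (maximum sensitivity subject to a 90% PPV) can be sketched as a simple threshold search. The scores and chart-review labels below are hypothetical, and the function is an illustration of the selection rule only, not the authors' code.

```python
from typing import List, Optional, Tuple


def choose_cut_point(
    scores: List[float], labels: List[int], min_ppv: float = 0.90
) -> Optional[Tuple[float, float, float]]:
    """Return (sensitivity, threshold, ppv) with the highest sensitivity
    among thresholds whose PPV is at least min_ppv, or None if none qualify."""
    total_pos = sum(labels)
    if total_pos == 0:
        return None
    best: Optional[Tuple[float, float, float]] = None
    for threshold in sorted(set(scores)):
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
        if tp + fp == 0:
            continue
        ppv = tp / (tp + fp)
        sensitivity = tp / total_pos
        if ppv >= min_ppv and (best is None or sensitivity > best[0]):
            best = (sensitivity, threshold, ppv)
    return best


# Hypothetical classifier scores and chart-review labels (1 = confirmed case).
scores = [0.95, 0.90, 0.85, 0.80, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]
best = choose_cut_point(scores, labels)
print(best)
```

    Lowering the threshold below 0.80 in this toy data would capture more true cases but would let the PPV fall under 90%, which is the trade-off the cut point is chosen to control.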

  10. Wnt pathway curation using automated natural language processing: combining statistical methods with partial and full parse for knowledge extraction.

    PubMed

    Santos, Carlos; Eggle, Daniela; States, David J

    2005-04-15

    Wnt signaling is a very active area of research with highly relevant publications appearing at a rate of more than one per day. Building and maintaining databases describing signal transduction networks is a time-consuming and demanding task that requires careful literature analysis and extensive domain-specific knowledge. For instance, more than 50 factors involved in Wnt signal transduction have been identified as of late 2003. In this work we describe a natural language processing (NLP) system that is able to identify references to biological interaction networks in free text and automatically assembles a protein association and interaction map. A 'gold standard' set of names and assertions was derived by manual scanning of the Wnt genes website (http://www.stanford.edu/~rnusse/wntwindow.html) including 53 interactions involved in Wnt signaling. This system was used to analyze a corpus of peer-reviewed articles related to Wnt signaling including 3369 Pubmed and 1230 full text papers. Names for key Wnt-pathway associated proteins and biological entities are identified using a chi-squared analysis of noun phrases over-represented in the Wnt literature as compared to the general signal transduction literature. Interestingly, we identified several instances where generic terms were used on the website when more specific terms occur in the literature, and one typographic error on the Wnt canonical pathway. Using the named entity list and performing an exhaustive assertion extraction of the corpus, 34 of the 53 interactions in the 'gold standard' Wnt signaling set were successfully identified (64% recall). In addition, the automated extraction found several interactions involving key Wnt-related molecules which were missing or different from those in the canonical diagram, and these were confirmed by manual review of the text. These results suggest that a combination of NLP techniques for information extraction can form a useful first-pass tool for assisting human annotation and maintenance of signal pathway databases. The pipeline software components are freely available on request to the authors. dstates@umich.edu http://stateslab.bioinformatics.med.umich.edu/software.html.

  11. Speaker Recognition Through NLP and CWT Modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brown-VanHoozer, S.A.; Kercel, S.W.; Tucker, R.W.

    The objective of this research is to develop a system capable of identifying speakers on wiretaps from a large database (>500 speakers) with a short search time duration (<30 seconds), and with better than 90% accuracy. Much previous research in speaker recognition has led to algorithms that produced encouraging preliminary results, but were overwhelmed when applied to populations of more than a dozen or so different speakers. The authors are investigating a solution to the "large population" problem by seeking two completely different kinds of characterizing features. These features are extracted using the techniques of Neuro-Linguistic Programming (NLP) and the continuous wavelet transform (CWT). NLP extracts precise neurological, verbal and non-verbal information, and assimilates the information into useful patterns. These patterns are based on specific cues demonstrated by each individual, and provide ways of determining congruency between verbal and non-verbal cues. The primary NLP modalities are characterized through word spotting (or verbal predicate cues, e.g., see, sound, feel, etc.) while the secondary modalities would be characterized through the speech transcription used by the individual. This has the practical effect of reducing the size of the search space, and greatly speeding up the process of identifying an unknown speaker. The wavelet-based line of investigation concentrates on using vowel phonemes and non-verbal cues, such as tempo. The rationale for concentrating on vowels is that there are a limited number of vowel phonemes, and at least one of them usually appears in even the shortest of speech segments. Using the fast CWT algorithm, the details of both the formant frequency and the glottal excitation characteristics can be easily extracted from voice waveforms. The differences in the glottal excitation waveforms as well as the formant frequency are evident in the CWT output. More significantly, the CWT reveals significant detail of the glottal excitation waveform.

  12. Speaker recognition through NLP and CWT modeling.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brown-VanHoozer, A.; Kercel, S. W.; Tucker, R. W.

    The objective of this research is to develop a system capable of identifying speakers on wiretaps from a large database (>500 speakers) with a short search time duration (<30 seconds), and with better than 90% accuracy. Much previous research in speaker recognition has led to algorithms that produced encouraging preliminary results, but were overwhelmed when applied to populations of more than a dozen or so different speakers. The authors are investigating a solution to the "huge population" problem by seeking two completely different kinds of characterizing features. These features are extracted using the techniques of Neuro-Linguistic Programming (NLP) and the continuous wavelet transform (CWT). NLP extracts precise neurological, verbal and non-verbal information, and assimilates the information into useful patterns. These patterns are based on specific cues demonstrated by each individual, and provide ways of determining congruency between verbal and non-verbal cues. The primary NLP modalities are characterized through word spotting (or verbal predicate cues, e.g., see, sound, feel, etc.) while the secondary modalities would be characterized through the speech transcription used by the individual. This has the practical effect of reducing the size of the search space, and greatly speeding up the process of identifying an unknown speaker. The wavelet-based line of investigation concentrates on using vowel phonemes and non-verbal cues, such as tempo. The rationale for concentrating on vowels is that there are a limited number of vowel phonemes, and at least one of them usually appears in even the shortest of speech segments. Using the fast CWT algorithm, the details of both the formant frequency and the glottal excitation characteristics can be easily extracted from voice waveforms. The differences in the glottal excitation waveforms as well as the formant frequency are evident in the CWT output. More significantly, the CWT reveals significant detail of the glottal excitation waveform.

  13. Performance of a Machine Learning Classifier of Knee MRI Reports in Two Large Academic Radiology Practices: A Tool to Estimate Diagnostic Yield.

    PubMed

    Hassanpour, Saeed; Langlotz, Curtis P; Amrhein, Timothy J; Befera, Nicholas T; Lungren, Matthew P

    2017-04-01

    The purpose of this study is to evaluate the performance of a natural language processing (NLP) system in classifying a database of free-text knee MRI reports at two separate academic radiology practices. An NLP system that uses terms and patterns in manually classified narrative knee MRI reports was constructed. The NLP system was trained and tested on expert-classified knee MRI reports from two major health care organizations. Radiology reports were modeled in the training set as vectors, and a support vector machine framework was used to train the classifier. A separate test set from each organization was used to evaluate the performance of the system. We evaluated the performance of the system both within and across organizations. Standard evaluation metrics, such as accuracy, precision, recall, and F1 score (i.e., the weighted average of the precision and recall), and their respective 95% CIs were used to measure the efficacy of our classification system. The accuracy for radiology reports that belonged to the model's clinically significant concept classes after training data from the same institution was good, yielding an F1 score greater than 90% (95% CI, 84.6-97.3%). Performance of the classifier on cross-institutional application without institution-specific training data yielded F1 scores of 77.6% (95% CI, 69.5-85.7%) and 90.2% (95% CI, 84.5-95.9%) at the two organizations studied. The results show excellent accuracy by the NLP machine learning classifier in classifying free-text knee MRI reports, supporting the institution-independent reproducibility of knee MRI report classification. Furthermore, the machine learning classifier performed well on free-text knee MRI reports from another institution. These data support the feasibility of multiinstitutional classification of radiologic imaging text reports with a single machine learning classifier without requiring institution-specific training data.
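
    As a minimal sketch of the evaluation metrics named in this abstract, the Python below computes precision, recall, and F1 from confusion counts. The function name and the counts are hypothetical and are not data from the study. Note that F1 is conventionally defined as the harmonic mean of precision and recall, which is what the sketch implements.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from true-positive, false-positive
    and false-negative counts. Empty denominators yield 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, for illustration only (not taken from the study).
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

    Because the harmonic mean is pulled toward the smaller of its two inputs, a classifier cannot reach a high F1 by maximizing precision or recall alone.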

  14. Extracting semantically enriched events from biomedical literature.

    PubMed

    Miwa, Makoto; Thompson, Paul; McNaught, John; Kell, Douglas B; Ananiadou, Sophia

    2012-05-23

    Research into event-based text mining from the biomedical literature has been growing in popularity to facilitate the development of advanced biomedical text mining systems. Such technology permits advanced search, which goes beyond document or sentence-based retrieval. However, existing event-based systems typically ignore additional information within the textual context of events that can determine, amongst other things, whether an event represents a fact, hypothesis, experimental result or analysis of results, whether it describes new or previously reported knowledge, and whether it is speculated or negated. We refer to such contextual information as meta-knowledge. The automatic recognition of such information can permit the training of systems allowing finer-grained searching of events according to the meta-knowledge that is associated with them. Based on a corpus of 1,000 MEDLINE abstracts, fully manually annotated with both events and associated meta-knowledge, we have constructed a machine learning-based system that automatically assigns meta-knowledge information to events. This system has been integrated into EventMine, a state-of-the-art event extraction system, in order to create a more advanced system (EventMine-MK) that not only extracts events from text automatically, but also assigns five different types of meta-knowledge to these events. The meta-knowledge assignment module of EventMine-MK performs with macro-averaged F-scores in the range of 57-87% on the BioNLP'09 Shared Task corpus. EventMine-MK has been evaluated on the BioNLP'09 Shared Task subtask of detecting negated and speculated events. Our results show that EventMine-MK can outperform other state-of-the-art systems that participated in this task. We have constructed the first practical system that extracts both events and associated, detailed meta-knowledge information from biomedical literature. 
The automatically assigned meta-knowledge information can be used to refine search systems, in order to provide an extra search layer beyond entities and assertions, dealing with phenomena such as rhetorical intent, speculations, contradictions and negations. This finer grained search functionality can assist in several important tasks, e.g., database curation (by locating new experimental knowledge) and pathway enrichment (by providing information for inference). To allow easy integration into text mining systems, EventMine-MK is provided as a UIMA component that can be used in the interoperable text mining infrastructure, U-Compare.

  15. Neuro-linguistic programming and application in treatment of phobias.

    PubMed

    Karunaratne, Mahishika

    2010-11-01

    Phobias are a prevalent and often debilitating mental health problem all over the world. This article aims to explore what is known about the use of Neuro-linguistic Programming (NLP) as a treatment for this condition. Whilst there is abundant experiential evidence from NLP practitioners attesting to the efficacy of this method as a treatment for phobias, experimental research in this area is somewhat limited. This paper reviews evidence available in literature produced in the UK and US and reveals that NLP is a successful treatment for phobias as well as being particularly efficient due to the relatively brief time period it takes to effect an improvement. Copyright © 2010 Elsevier Ltd. All rights reserved.

  16. DE and NLP Based QPLS Algorithm

    NASA Astrophysics Data System (ADS)

    Yu, Xiaodong; Huang, Dexian; Wang, Xiong; Liu, Bo

    As a novel evolutionary computing technique, Differential Evolution (DE) has been considered an effective optimization method for complex optimization problems, and has achieved many successful applications in engineering. In this paper, a new algorithm of Quadratic Partial Least Squares (QPLS) based on Nonlinear Programming (NLP) is presented. DE is used to solve the NLP so as to calculate the optimal input weights and the parameters of the inner relationship. Simulation results based on the soft measurement of diesel oil solidifying point on a real crude distillation unit demonstrate the superiority of the proposed algorithm over linear PLS and over QPLS based on Sequential Quadratic Programming (SQP) in terms of fitting accuracy and computational costs.
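
    To illustrate how Differential Evolution can minimize a nonlinear objective, here is a minimal sketch of the common DE/rand/1/bin variant in plain Python. It is not the authors' QPLS formulation: the toy objective, the box bounds, and all parameter values (pop_size, f, cr, generations) are assumptions chosen for illustration.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=200, seed=0):
    """Minimise `objective` inside the box `bounds` using DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for d in range(dim):
                if rng.random() < cr or d == j_rand:
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])  # mutation
                    lo, hi = bounds[d]
                    v = min(max(v, lo), hi)  # clip back into the box
                else:
                    v = pop[i][d]  # inherit from the target vector
                trial.append(v)
            s = objective(trial)
            if s <= scores[i]:  # greedy selection
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def toy_objective(x):
    # Hypothetical smooth nonlinear objective with its minimum at (1, -2).
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


best_x, best_val = differential_evolution(toy_objective, [(-5, 5), (-5, 5)])
print(best_x, best_val)
```

    DE needs only objective values, not gradients, which is one reason it is applied to problems where gradient-based solvers such as SQP are awkward to use.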

  17. Neuro-Linguistic Programming: Developing Effective Communication in the Classroom.

    ERIC Educational Resources Information Center

    Torres, Cresencio; Katz, Judy H.

    Neuro-Linguistic Programming (NLP) is a method that teachers can use to increase their communication effectiveness by matching their communication patterns with those of their students. The basic premise of NLP is that people operate and make sense of their experience through information received from the world around them. This information is…

  18. Research Findings on Neurolinguistic Programming: Nonsupportive Data or an Untestable Theory?

    ERIC Educational Resources Information Center

    Sharpley, Christopher F.

    1987-01-01

    Examines the experimental literature on neurolinguistic programming (NLP). Sharpley (1984) and Einspruch and Forman (1985) concluded that the effectiveness of this therapy was yet to be demonstrated. Presents data from seven recent studies that further question the basic tenets of NLP and their application in counseling situations. (Author/KS)

  19. Neuro-Linguistic Programming as an Innovation in Education and Teaching

    ERIC Educational Resources Information Center

    Tosey, Paul; Mathison, Jane

    2010-01-01

    Neuro-linguistic programming (NLP)--an emergent, contested approach to communication and personal development created in the 1970s--has become increasingly familiar in education and teaching. There is little academic work on NLP to date. This article offers an informed introduction to, and appraisal of, the field for educators. We review the…

  20. 49 CFR 563.8 - Data format.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... point (NLP), which is an integer that when multiplied by the TS equals the time relative to time zero of the last acceleration data point; and (4) NLP—NFP + 1 acceleration values sequentially beginning with... until the time NLP * TS is reached. [73 FR 2183, Jan. 14, 2008] § 563.8, Nt. Effective Date Note: At 76...
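
    The indexing described in this excerpt can be sketched as follows. This is an illustration, not the regulatory text: the excerpt elides the definition of NFP, so treating it as the index of the first data point is an assumption, and the function name and sample values are hypothetical.

```python
def acceleration_sample_times(ts: float, nfp: int, nlp: int) -> list[float]:
    """Return the time, relative to time zero, of each acceleration sample.

    ts  -- time step between samples (the "TS" of the excerpt)
    nfp -- index of the first point (assumed meaning of "NFP")
    nlp -- index of the last point ("NLP"); nlp * ts is the last sample time
    """
    if nlp < nfp:
        raise ValueError("nlp must be >= nfp")
    # One sample per integer index from nfp to nlp inclusive,
    # giving nlp - nfp + 1 values in total.
    return [k * ts for k in range(nfp, nlp + 1)]


# Hypothetical values, for illustration only.
times = acceleration_sample_times(ts=0.01, nfp=-3, nlp=25)
print(len(times), times[0], times[-1])
```

    The count of returned values matches the "NLP - NFP + 1 acceleration values" stated in the excerpt, and the final entry is NLP * TS.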

  1. Noise-like pulse generation in an ytterbium-doped fiber laser using tungsten disulphide

    NASA Astrophysics Data System (ADS)

    Zhang, Wenping; Song, Yanrong; Guoyu, Heyang; Xu, Runqin; Dong, Zikai; Li, Kexuan; Tian, Jinrong; Gong, Shuang

    2017-12-01

    We demonstrated noise-like pulse (NLP) generation in an ytterbium-doped fiber (YDF) laser with tungsten disulphide (WS2). Stable fundamental mode locking and second-order harmonic mode locking were observed. The saturable absorber (SA) was a WS2-polyvinyl alcohol film. The modulation depth of the WS2 film was 2.4%, and the saturable optical intensity was 155 MW cm^-2. Based on this SA, the fundamental NLP with a pulse width of 20 ns and a repetition rate of 7 MHz was observed. The autocorrelation trace of the output pulses had a coherent spike, which came from the NLP. The average pulse width of the spike was 550 fs on the top of a broad pedestal. The second-order harmonic NLP had a spectral bandwidth of 1.3 nm and a pulse width of 10 ns. With a pump power of 400 mW, the maximum output power was 22.2 mW. To the best of our knowledge, this is the first time that noise-like mode locking in a YDF laser based on a WS2-SA in an all-normal-dispersion regime has been obtained.

  2. The ACODEA Framework: Developing Segmentation and Classification Schemes for Fully Automatic Analysis of Online Discussions

    ERIC Educational Resources Information Center

    Mu, Jin; Stegmann, Karsten; Mayfield, Elijah; Rose, Carolyn; Fischer, Frank

    2012-01-01

    Research related to online discussions frequently faces the problem of analyzing huge corpora. Natural Language Processing (NLP) technologies may allow automating this analysis. However, the state-of-the-art in machine learning and text mining approaches yields models that do not transfer well between corpora related to different topics. Also,…

  3. Adaptive Reading and Writing Instruction in iSTART and W-Pal

    ERIC Educational Resources Information Center

    Johnson, Amy M.; McCarthy, Kathryn S.; Kopp, Kristopher J.; Perret, Cecile A.; McNamara, Danielle S.

    2017-01-01

    Intelligent tutoring systems for ill-defined domains, such as reading and writing, are critically needed, yet uncommon. Two such systems, the Interactive Strategy Training for Active Reading and Thinking (iSTART) and Writing Pal (W-Pal) use natural language processing (NLP) to assess learners' written (i.e., typed) responses and provide immediate,…

  4. The Impact of Anonymization for Automated Essay Scoring

    ERIC Educational Resources Information Center

    Shermis, Mark D.; Lottridge, Sue; Mayfield, Elijah

    2015-01-01

    This study investigated the impact of anonymizing text on predicted scores made by two kinds of automated scoring engines: one that incorporates elements of natural language processing (NLP) and one that does not. Eight data sets (N = 22,029) were used to form both training and test sets in which the scoring engines had access to both text and…

  5. Entity Relation Detection with Factorial Hidden Markov Models and Maximum Entropy Discriminant Latent Dirichlet Allocations

    ERIC Educational Resources Information Center

    Li, Dingcheng

    2011-01-01

    Coreference resolution (CR) and entity relation detection (ERD) aim at finding predefined relations between pairs of entities in text. CR focuses on resolving identity relations while ERD focuses on detecting non-identity relations. Both CR and ERD are important as they can potentially improve other natural language processing (NLP) related tasks…

  6. Combining active learning and semi-supervised learning techniques to extract protein interaction sentences.

    PubMed

    Song, Min; Yu, Hwanjo; Han, Wook-Shin

    2011-11-24

    Protein-protein interaction (PPI) extraction has been a focal point of many biomedical research and database curation tools. Both Active Learning and Semi-supervised SVMs have recently been applied to extract PPI automatically. In this paper, we explore combining the AL with the SSL to improve the performance of the PPI task. We propose a novel PPI extraction technique called PPISpotter by combining Deterministic Annealing-based SSL and an AL technique to extract protein-protein interaction. In addition, we extract a comprehensive set of features from MEDLINE records by Natural Language Processing (NLP) techniques, which further improve the SVM classifiers. In our feature selection technique, syntactic, semantic, and lexical properties of text are incorporated into feature selection that boosts the system performance significantly. By conducting experiments with three different PPI corpuses, we show that PPISpotter is superior to the other techniques incorporated into semi-supervised SVMs such as Random Sampling, Clustering, and Transductive SVMs by precision, recall, and F-measure. Our system is a novel, state-of-the-art technique for efficiently extracting protein-protein interaction pairs.

  7. Named Entity Recognition in Chinese Clinical Text Using Deep Neural Network.

    PubMed

    Wu, Yonghui; Jiang, Min; Lei, Jianbo; Xu, Hua

    2015-01-01

    Rapid growth in electronic health records (EHRs) use has led to an unprecedented expansion of available clinical data in electronic formats. However, much of the important healthcare information is locked in the narrative documents. Therefore Natural Language Processing (NLP) technologies, e.g., Named Entity Recognition that identifies boundaries and types of entities, have been extensively studied to unlock important clinical information in free text. In this study, we investigated a novel deep learning method to recognize clinical entities in Chinese clinical documents using the minimal feature engineering approach. We developed a deep neural network (DNN) to generate word embeddings from a large unlabeled corpus through unsupervised learning and another DNN for the NER task. The experiment results showed that the DNN with word embeddings trained from the large unlabeled corpus outperformed the state-of-the-art CRF model in the minimal feature engineering setting, achieving the highest F1-score of 0.9280. Further analysis showed that word embeddings derived through unsupervised learning from a large unlabeled corpus remarkably improved on the DNN with randomized embeddings, indicating the usefulness of unsupervised feature learning.

  8. A Sibling-Mediated Intervention for Children with Autism Spectrum Disorder: Using the Natural Language Paradigm (NLP)

    ERIC Educational Resources Information Center

    Spector, Vicki; Charlop, Marjorie H.

    2018-01-01

    We taught three typically developing siblings to occasion speech by implementing the Natural Language Paradigm (NLP) with their brothers with autism spectrum disorder (ASD). A non-concurrent multiple baseline design across children with ASD and sibling dyads was used. Ancillary behaviors of happiness, play, and joint attention for the children…

  9. Applications of NLP Techniques to Computer-Assisted Authoring of Test Items for Elementary Chinese

    ERIC Educational Resources Information Center

    Liu, Chao-Lin; Lin, Jen-Hsiang; Wang, Yu-Chun

    2010-01-01

    The authors report an implemented environment for computer-assisted authoring of test items and provide a brief discussion about the applications of NLP techniques for computer assisted language learning. Test items can serve as a tool for language learners to examine their competence in the target language. The authors apply techniques for…

  10. 49 CFR 563.8 - Data format.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... number of the last point (NLP), which is an integer that when multiplied by the TS equals the time relative to time zero of the last acceleration data point; and (4) NLP—NFP + 1 acceleration values... increments in time until the time NLP * TS is reached. [73 FR 2183, Jan. 14, 2008, as amended at 76 FR 47488...

  11. Parent-Implemented Natural Language Paradigm to Increase Language and Play in Children with Autism

    ERIC Educational Resources Information Center

    Gillett, Jill N.; LeBlanc, Linda A.

    2007-01-01

    Three parents of children with autism were taught to implement the Natural Language Paradigm (NLP). Data were collected on parent implementation, multiple measures of child language, and play. The parents were able to learn to implement the NLP procedures quickly and accurately with beneficial results for their children. Increases in the overall…

  12. Applying "What Works" in Psychology to Enhancing Examination Success in Schools: The Potential Contribution of NLP

    ERIC Educational Resources Information Center

    Kudliskis, Voldis; Burden, Robert

    2009-01-01

    The strengths and weaknesses of Neuro-Linguistic Programming (NLP) are described with reference to its origins, previous research and comments from critics and supporters. A case is made for this allegedly theoretical approach to provide the kind of outcomes focused intervention that psychology and psychologists can offer to schools. In…

  13. Research on trust-region algorithms for nonlinear programming. Final technical report, 1 January 1990--31 December 1992

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dennis, J.E. Jr.; Tapia, R.A.

    Goal of the research was to develop and test effective, robust algorithms for general nonlinear programming (NLP) problems, particularly large or otherwise expensive NLP problems. We discuss the research conducted over the 3-year period Jan. 1990-Dec. 1992. We also describe current and future directions of our research.

  14. 49 CFR 563.8 - Data format.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... number of the last point (NLP), which is an integer that when multiplied by the TS equals the time relative to time zero of the last acceleration data point; and (4) NLP—NFP + 1 acceleration values... increments in time until the time NLP * TS is reached. [73 FR 2183, Jan. 14, 2008, as amended at 76 FR 47488...

  15. A hybrid nonlinear programming method for design optimization

    NASA Technical Reports Server (NTRS)

    Rajan, S. D.

    1986-01-01

    Solutions to engineering design problems formulated as nonlinear programming (NLP) problems usually require the use of more than one optimization technique. Moreover, the interaction between the user (analysis/synthesis) program and the NLP system can lead to interface, scaling, or convergence problems. An NLP solution system is presented that seeks to solve these problems by providing a programming system to ease the user-system interface. A simple set of rules is used to select an optimization technique or to switch from one technique to another in an attempt to detect, diagnose, and solve some potential problems. Numerical examples involving finite element based optimal design of space trusses and rotor bearing systems are used to illustrate the applicability of the proposed methodology.

  16. Mining protein phosphorylation information from biomedical literature using NLP parsing and Support Vector Machines.

    PubMed

    Raja, Kalpana; Natarajan, Jeyakumar

    2018-07-01

    Extraction of protein phosphorylation information from biomedical literature has gained much attention because of the importance in numerous biological processes. In this study, we propose a text mining methodology which consists of two phases, NLP parsing and SVM classification to extract phosphorylation information from literature. First, using NLP parsing we divide the data into three base-forms depending on the biomedical entities related to phosphorylation and further classify into ten sub-forms based on their distribution with phosphorylation keyword. Next, we extract the phosphorylation entity singles/pairs/triplets and apply SVM to classify the extracted singles/pairs/triplets using a set of features applicable to each sub-form. The performance of our methodology was evaluated on three corpora namely PLC, iProLink and hPP corpus. We obtained promising results of >85% F-score on ten sub-forms of training datasets on cross validation test. Our system achieved overall F-score of 93.0% on iProLink and 96.3% on hPP corpus test datasets. Furthermore, our proposed system achieved best performance on cross corpus evaluation and outperformed the existing system with recall of 90.1%. The performance analysis of our unique system on three corpora reveals that it extracts protein phosphorylation information efficiently in both non-organism specific general datasets such as PLC and iProLink, and human specific dataset such as hPP corpus. Copyright © 2018 Elsevier B.V. All rights reserved.

  17. Towards a semantic lexicon for biological language processing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Verspoor, K.

    It is well understood that natural language processing (NLP) applications require sophisticated lexical resources to support their processing goals. In the biomedical domain, we are privileged to have access to extensive terminological resources in the form of controlled vocabularies and ontologies, which have been integrated into the framework of the National Library of Medicine's Unified Medical Language System's (UMLS) Metathesaurus. However, the existence of such terminological resources does not guarantee their utility for NLP. In particular, we have two core requirements for lexical resources for NLP in addition to the basic enumeration of important domain terms: representation of morphosyntactic information about those terms, specifically part of speech information and inflectional patterns to support parsing and lemma assignment, and representation of semantic information indicating general categorical information about terms, and significant relations between terms to support text understanding and inference (Hahn et al., 1999). Biomedical vocabularies by and large commonly leave out morphosyntactic information, and where they address semantic considerations, they often do so in an unprincipled manner, for instance by indicating a relation between two concepts without indicating the type of that relation. But all is not lost. The UMLS knowledge sources include two additional resources which are relevant - the SPECIALIST lexicon, a lexicon addressing our morphosyntactic requirements, and the Semantic Network, a representation of core conceptual categories in the biomedical domain. The coverage of these two knowledge sources with respect to the full coverage of the Metathesaurus is, however, not entirely clear. Furthermore, when our goals are specifically to process biological text - and often more specifically, text in the molecular biology domain - it is difficult to say whether the coverage of these resources is meaningful. The utility of the UMLS knowledge sources for medical language processing (MLP) has been explored (Johnson, 1999; Friedman et al., 2001); the time has now come to repeat these experiments with respect to biological language processing (BLP). To that end, this paper presents an analysis of the UMLS resources, specifically with an eye towards constructing lexical resources suitable for BLP. We follow the paradigm presented in Johnson (1999) for medical language, exploring overlap between the UMLS Metathesaurus and SPECIALIST lexicon to construct a morphosyntactic and semantically-specified lexicon, and then further explore the overlap with a relevant domain corpus for molecular biology.

  18. Efficacy of neurolinguistic programming training on mental health in nursing and midwifery students.

    PubMed

    Sahebalzamani, Mohammad

    2014-09-01

    Neurolinguistic programming (NLP) refers to the science and art of reaching success and perfection. It is a collection of the skills based on human beings' psychological characteristics through which the individuals obtain the ability to use their personal capabilities as much as possible. This study aimed to investigate the efficacy of NLP training on mental health in nursing and midwifery students in Islamic Azad University Tehran Medical Sciences branch. In this quasi-experimental study, the study population comprised all nursing and midwifery students in Islamic Azad University, Tehran Medical branch, of whom 52 were selected and assigned to two groups through random sampling. Data collection tool was Goldberg General Health Questionnaire (28-item version). After primary evaluation, NLP training was given in five 120-min sessions and the groups were re-evaluated. The obtained data were analyzed. In the nursing group, paired t-test showed a significant difference in the scores of mental health (with 39 points decrease), physical signs (with 7.96 scores decrease), anxiety (with 10.75 scores decrease), social function (with 7.05 scores decrease) and depression (with 9.38 scores decrease). In the midwifery group, it showed a significant difference in mental health (with 22.63 scores decrease), physical signs (with 6.54 scores decrease), anxiety (with nine scores decrease), and depression (with 8.38 scores decrease). This study showed that NLP strategies are effective in the improvement of general health and its various dimensions. Therefore, it is essential to conduct structured and executive programs concerning NLP among the students.

  19. The effect of neuro-linguistic programming on occupational stress in critical care nurses

    PubMed Central

    HemmatiMaslakpak, Masumeh; Farhadi, Masumeh; Fereidoni, Javid

    2016-01-01

    Background: The use of coping strategies in reducing the adverse effects of stress can be helpful. Neuro-linguistic programming (NLP) is one of the modern methods of psychotherapy. This study aimed to determine the effect of NLP on occupational stress in nurses working in critical care units of Urmia. Materials and Methods: This study was carried out quasi-experimentally (before–after) with control and experimental groups. Of all the nurses working in the critical care units of Urmia Imam Khomeini and Motahari educational/therapeutic centers, 60 people participated in this survey. Eighteen sessions of intervention were done, each for 180 min. The experimental group received an NLP program (such as goal setting, time management, assertiveness skills, representational system, and neurological levels, as well as some practical and useful NLP techniques). The Expanded Nursing Stress Scale (ENSS) was used as the data gathering tool. Data were analyzed using SPSS version 16. Descriptive statistics and Chi-square test, Mann–Whitney test, and independent t-test were used to analyze the data. Results: The baseline score average of job stress was 120.88 and 121.36 for the intervention and control groups, respectively (P = 0.65). After intervention, the score average of job stress decreased to 64.53 in the experimental group while that of the control group remained relatively unchanged (120.96). Mann–Whitney test results showed that the difference in stress scores between the two groups was statistically significant (P = 0.0001). Conclusions: The results showed that the use of NLP can increase coping with stressful situations, and it can reduce the adverse effects of occupational stress. PMID:26985221

  20. Efficacy of neurolinguistic programming training on mental health in nursing and midwifery students

    PubMed Central

    Sahebalzamani, Mohammad

    2014-01-01

    Background: Neurolinguistic programming (NLP) refers to the science and art of reaching success and perfection. It is a collection of the skills based on human beings’ psychological characteristics through which the individuals obtain the ability to use their personal capabilities as much as possible. This study aimed to investigate the efficacy of NLP training on mental health in nursing and midwifery students in Islamic Azad University Tehran Medical Sciences branch. Materials and Methods: In this quasi-experimental study, the study population comprised all nursing and midwifery students in Islamic Azad University, Tehran Medical branch, of whom 52 were selected and assigned to two groups through random sampling. Data collection tool was Goldberg General Health Questionnaire (28-item version). After primary evaluation, NLP training was given in five 120-min sessions and the groups were re-evaluated. The obtained data were analyzed. Results: In the nursing group, paired t-test showed a significant difference in the scores of mental health (with 39 points decrease), physical signs (with 7.96 scores decrease), anxiety (with 10.75 scores decrease), social function (with 7.05 scores decrease) and depression (with 9.38 scores decrease). In the midwifery group, it showed a significant difference in mental health (with 22.63 scores decrease), physical signs (with 6.54 scores decrease), anxiety (with nine scores decrease), and depression (with 8.38 scores decrease). Conclusions: This study showed that NLP strategies are effective in the improvement of general health and its various dimensions. Therefore, it is essential to conduct structured and executive programs concerning NLP among the students. PMID:25400679

  1. Basic quantitative assessment of visual performance in patients with very low vision.

    PubMed

    Bach, Michael; Wilke, Michaela; Wilhelm, Barbara; Zrenner, Eberhart; Wilke, Robert

    2010-02-01

    A variety of approaches to developing visual prostheses are being pursued: subretinal, epiretinal, via the optic nerve, or via the visual cortex. This report presents a method of comparing their efficacy at genuinely improving visual function, starting at no light perception (NLP). A test battery (a computer program, Basic Assessment of Light and Motion [BaLM]) was developed in four basic visual dimensions: (1) light perception (light/no light), with an unstructured large-field stimulus; (2) temporal resolution, with single versus double flash discrimination; (3) localization of light, where a wedge extends from the center into four possible directions; and (4) motion, with a coarse pattern moving in one of four directions. Two- or four-alternative, forced-choice paradigms were used. The participants' responses were self-paced and delivered with a keypad. The feasibility of the BaLM was tested in 73 eyes of 51 patients with low vision. The light and time test modules discriminated between NLP and light perception (LP). The localization and motion modules showed no significant response for NLP but discriminated between LP and hand movement (HM). All four modules reached their ceilings in the acuity categories higher than HM. BaLM results systematically differed between the very-low-acuity categories NLP, LP, and HM. Light and time yielded similar results, as did localization and motion; still, for assessing the visual prostheses with differing temporal characteristics, they are not redundant. The results suggest that this simple test battery provides a quantitative assessment of visual function in the very-low-vision range from NLP to HM.
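
    For the two- and four-alternative forced-choice paradigms mentioned above, chance performance is 1/2 and 1/4 respectively. The sketch below shows one way to ask whether a score exceeds chance, using a one-sided binomial tail probability. The abstract does not say which statistical test the authors used, so this choice, the function name, and the example numbers are all assumptions.

```python
from math import comb


def chance_p_value(correct: int, trials: int, alternatives: int) -> float:
    """Probability of at least `correct` right answers in `trials`
    forced-choice trials if the participant is purely guessing,
    i.e. P(X >= correct) for X ~ Binomial(trials, 1 / alternatives)."""
    p = 1.0 / alternatives
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# Hypothetical example: 18 of 24 correct in a four-alternative task.
print(chance_p_value(correct=18, trials=24, alternatives=4))
```

    A small tail probability indicates that the responses are unlikely to come from guessing alone, which is the sense in which a module can "discriminate" one acuity category from another.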

  2. Teaching Assistants, Neuro-Linguistic Programming (NLP) and Special Educational Needs: "Reframing" the Learning Experience for Students with Mild SEN

    ERIC Educational Resources Information Center

    Kudliskis, Voldis

    2014-01-01

    This study examines how an understanding of two NLP concepts, the meta-model of language and the implementation of reframing, could be used to help teaching assistants enhance class-based interactions with students with mild SEN. Participants (students) completed a pre-intervention and a post-intervention "Beliefs About my Learning…

  3. Application of Sequential Quadratic Programming to Minimize Smart Active Flap Rotor Hub Loads

    NASA Technical Reports Server (NTRS)

    Kottapalli, Sesi; Leyland, Jane

    2014-01-01

    In an analytical study, SMART active flap rotor hub loads have been minimized using nonlinear programming constrained optimization methodology. The recently developed NLPQLP system (Schittkowski, 2010) that employs Sequential Quadratic Programming (SQP) as its core algorithm was embedded into a driver code (NLP10x10) specifically designed to minimize active flap rotor hub loads (Leyland, 2014). Three types of practical constraints on the flap deflections have been considered. To validate the current application, two other optimization methods have been used: i) the standard, linear unconstrained method, and ii) the nonlinear Generalized Reduced Gradient (GRG) method with constraints. The new software code NLP10x10 has been systematically checked out. It has been verified that NLP10x10 is functioning as desired. The following are briefly covered in this paper: relevant optimization theory; implementation of the capability of minimizing a metric of all, or a subset, of the hub loads as well as the capability of using all, or a subset, of the flap harmonics; and finally, solutions for the SMART rotor. The eventual goal is to implement NLP10x10 in a real-time wind tunnel environment.

  4. Single-shot spectroscopy of broadband Yb fiber laser

    NASA Astrophysics Data System (ADS)

    Suzuki, Masayuki; Yoneya, Shin; Kuroda, Hiroto

    2017-02-01

    We report real-time single-shot spectroscopy of a broadband Yb-doped fiber (YDF) laser based on nonlinear polarization evolution, using a time-stretched dispersive Fourier transformation technique. We measured 8000 consecutive single-shot spectra of mode locking and noise-like pulses (NLP), because our broadband YDF oscillator can operate in either the mode-locking or the NLP regime depending on the pump LD power and the angles of the waveplates. Shot-to-shot spectral fluctuation was observed in NLP. To investigate pulse formation dynamics, we measured, for the first time, the spectral evolution during the initial fluctuations of the mode-locked broadband YDF laser at intracavity dispersions of 1500 and 6200 fs². In both cases, the build-up time between cw and steady-state mode locking was estimated to be 50 µs; the dynamics of spectral evolution between cw and mode locking, however, were completely different. A strong shot-to-shot spectral fluctuation, as seen in the NLP spectra, was observed in the initial 20 µs at an intracavity dispersion of 1500 fs². These new findings should contribute to understanding the birth of broadband spectral formation in fiber laser oscillators.

  5. Natural language processing-based COTS software and related technologies survey.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stickland, Michael G.; Conrad, Gregory N.; Eaton, Shelley M.

    Natural language processing-based knowledge management software, traditionally developed for security organizations, is now becoming commercially available. An informal survey was conducted to discover and examine current NLP and related technologies and potential applications for information retrieval, information extraction, summarization, categorization, terminology management, link analysis, and visualization for possible implementation at Sandia National Laboratories. This report documents our current understanding of the technologies, lists software vendors and their products, and identifies potential applications of these technologies.

  6. Ordinal convolutional neural networks for predicting RDoC positive valence psychiatric symptom severity scores.

    PubMed

    Rios, Anthony; Kavuluru, Ramakanth

    2017-11-01

    The CEGS N-GRID 2016 Shared Task in Clinical Natural Language Processing (NLP) provided a set of 1000 neuropsychiatric notes to participants as part of a competition to predict psychiatric symptom severity scores. This paper summarizes our methods, results, and experiences based on our participation in the second track of the shared task. Classical methods of text classification usually fall into one of three problem types: binary, multi-class, and multi-label classification. In this effort, we study ordinal regression problems with text data where misclassifications are penalized differently based on how far apart the ground truth and model predictions are on the ordinal scale. Specifically, we present our entries (methods and results) in the N-GRID shared task in predicting research domain criteria (RDoC) positive valence ordinal symptom severity scores (absent, mild, moderate, and severe) from psychiatric notes. We propose a novel convolutional neural network (CNN) model designed to handle ordinal regression tasks on psychiatric notes. Broadly speaking, our model combines an ordinal loss function, a CNN, and conventional feature engineering (wide features) into a single model which is learned end-to-end. Given interpretability is an important concern with nonlinear models, we apply a recent approach called locally interpretable model-agnostic explanation (LIME) to identify important words that lead to instance specific predictions. Our best model entered into the shared task placed third among 24 teams and scored a macro mean absolute error (MMAE) based normalized score (100·(1-MMAE)) of 83.86. Since the competition, we improved our score (using basic ensembling) to 85.55, comparable with the winning shared task entry. Applying LIME to model predictions, we demonstrate the feasibility of instance specific prediction interpretation by identifying words that led to a particular decision. 
    In this paper, we present a method that successfully uses wide features and an ordinal loss function applied to convolutional neural networks for ordinal text classification specifically in predicting psychiatric symptom severity scores. Our approach leads to excellent performance on the N-GRID shared task and is also amenable to interpretability using existing model-agnostic approaches.
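
    The normalized score 100·(1-MMAE) mentioned above can be illustrated with a short sketch. It assumes that the macro-averaged error is the mean of per-class mean absolute errors, each divided by the largest error possible for that class; the shared task's exact definition may differ, and the function name and toy labels are hypothetical.

```python
from collections import defaultdict


def normalized_mmae_score(y_true, y_pred, n_classes=4):
    """Return 100 * (1 - MMAE), macro-averaged over the true classes present.

    Assumption: each class's mean absolute error is divided by the largest
    error possible for that class, so the MMAE lies in [0, 1].
    """
    errors = defaultdict(list)
    for t, p in zip(y_true, y_pred):
        errors[t].append(abs(p - t))
    per_class = []
    for t, errs in errors.items():
        worst = max(t, n_classes - 1 - t)  # distance to the farthest label
        per_class.append(sum(errs) / len(errs) / worst)
    mmae = sum(per_class) / len(per_class)
    return 100.0 * (1.0 - mmae)


# Labels: 0 = absent, 1 = mild, 2 = moderate, 3 = severe (toy data).
truth = [0, 0, 1, 2, 3, 3]
preds = [0, 1, 1, 2, 3, 1]
print(normalized_mmae_score(truth, preds))
```

    Because the error grows with the distance on the ordinal scale, predicting "mild" for a "severe" note costs more than predicting "moderate", which is the property an ordinal loss is meant to capture.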

  7. Training parents to use the natural language paradigm to increase their autistic children's speech.

    PubMed Central

    Laski, K E; Charlop, M H; Schreibman, L

    1988-01-01

    Parents of four nonverbal and four echolalic autistic children were trained to increase their children's speech by using the Natural Language Paradigm (NLP), a loosely structured procedure conducted in a play environment with a variety of toys. Parents were initially trained to use the NLP in a clinic setting, with subsequent parent-child speech sessions occurring at home. The results indicated that following training, parents increased the frequency with which they required their children to speak (i.e., modeled words and phrases, prompted answers to questions). Correspondingly, all children increased the frequency of their verbalizations in three nontraining settings. Thus, the NLP appears to be an efficacious program for parents to learn and use in the home to increase their children's speech. PMID:3225256

  8. A Sibling-Mediated Intervention for Children with Autism Spectrum Disorder: Using the Natural Language Paradigm (NLP).

    PubMed

    Spector, Vicki; Charlop, Marjorie H

    2018-05-01

    We taught three typically developing siblings to occasion speech by implementing the Natural Language Paradigm (NLP) with their brothers with autism spectrum disorder (ASD). A non-concurrent multiple baseline design across children with ASD and sibling dyads was used. Ancillary behaviors of happiness, play, and joint attention for the children with ASD were recorded. Generalization of speech for the children with ASD across setting and peers was also measured. During baseline, the children with ASD displayed few target speech behaviors and the siblings inconsistently occasioned speech from their brothers. After sibling training, however, they successfully delivered NLP, and in turn, for two of the brothers with ASD, speech reached criterion. Implications of this research suggest the inclusion of siblings in interventions.

  9. Identifying Suicide Ideation and Suicidal Attempts in a Psychiatric Clinical Research Database using Natural Language Processing.

    PubMed

    Fernandes, Andrea C; Dutta, Rina; Velupillai, Sumithra; Sanyal, Jyoti; Stewart, Robert; Chandran, David

    2018-05-09

    Research into suicide prevention has been hampered by methodological limitations such as low sample size and recall bias. Recently, Natural Language Processing (NLP) strategies have been used with Electronic Health Records to increase information extraction from free text notes as well as structured fields concerning suicidality, which allows access to much larger cohorts than previously possible. This paper presents two novel NLP approaches - a rule-based approach to classify the presence of suicide ideation and a hybrid machine learning and rule-based approach to identify suicide attempts in a psychiatric clinical database. Good performance of the two classifiers in the evaluation study suggests they can be used to accurately detect mentions of suicide ideation and attempt within free-text documents in this psychiatric database. The novelty of the two approaches lies in the malleability of each classifier if a need arises to refine performance or meet alternative classification requirements. The algorithms can also be adapted to fit infrastructures of other clinical datasets given sufficient clinical recording practice knowledge, without dependency on medical codes or additional data extraction of known risk factors to predict suicidal behaviour.

  10. Applying quality by design (QbD) concept for fabrication of chitosan coated nanoliposomes.

    PubMed

    Pandey, Abhijeet P; Karande, Kiran P; Sonawane, Raju O; Deshmukh, Prashant K

    2014-03-01

    In the present investigation, a quality by design (QbD) strategy was successfully applied to the fabrication of chitosan-coated nanoliposomes (CH-NLPs) encapsulating a hydrophilic drug. The effects of the processing variables on the particle size, encapsulation efficiency (%EE) and coating efficiency (%CE) of CH-NLPs (prepared using a modified ethanol injection method) were investigated. The concentrations of lipid, cholesterol, drug and chitosan; stirring speed, sonication time; organic:aqueous phase ratio; and temperature were identified as the key factors after risk analysis for conducting a screening design study. A separate study was designed to investigate the robustness of the predicted design space. The particle size, %EE and %CE of the optimized CH-NLPs were 111.3 nm, 33.4% and 35.2%, respectively. The observed responses were in accordance with the predicted response, which confirms the suitability and robustness of the design space for CH-NLP formulation. In conclusion, optimization of the selected key variables will help minimize the problems related to size, %EE and %CE that are generally encountered when scaling up processes for NLP formulations. The robustness of the design space will help minimize both intra-batch and inter-batch variations, which are quite common in the pharmaceutical industry.

  11. Concept-Based Retrieval from Critical Incident Reports.

    PubMed

    Denecke, Kerstin

    2017-01-01

    Critical incident reporting systems (CIRS) are used to collect anonymously entered information about incidents that occurred, for example, in a hospital. Analyzing this information helps to identify, among other things, problems in the workflow, in the infrastructure, or in processes. The full potential of these sources of experiential knowledge often remains untapped, since retrieving and analyzing relevant reports is difficult and time-consuming, and the reporting systems often do not support these tasks. The objective of this work is to develop a method for retrieving reports from the CIRS related to a specific user query. Natural language processing (NLP) and information retrieval (IR) methods are exploited for realizing the retrieval. We compare standard retrieval methods that rely upon word frequency with an approach that includes a semantic mapping of natural language to concepts of a medical ontology. Through an evaluation, we demonstrate the feasibility of semantic document enrichment to improve recall in incident report retrieval. It is shown that a combination of standard keyword-based retrieval with semantic search results in highly satisfactory recall values. In future work, the evaluation should be repeated on a larger data set, and a real-time user evaluation needs to be performed to assess user satisfaction with the system and its results.
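
    The difference between keyword-based retrieval and retrieval over ontology concepts can be sketched as follows. The mini-ontology, the concept identifiers and the sample reports are invented for illustration; the abstract does not name the medical ontology or the matching rules actually used.

```python
# Hypothetical mini-ontology: surface term -> concept identifier.
CONCEPTS = {
    "fall": "C_FALL", "fell": "C_FALL", "slipped": "C_FALL",
    "medication": "C_DRUG", "drug": "C_DRUG", "tablet": "C_DRUG",
}


def tokens(text):
    """Lower-cased words with trailing punctuation removed."""
    return [w.strip(".,;:").lower() for w in text.split()]


def keyword_match(query, report):
    """Standard retrieval: the report shares a literal word with the query."""
    return bool(set(tokens(query)) & set(tokens(report)))


def concept_match(query, report):
    """Semantic retrieval: compare the ontology concepts behind the words."""
    q = {CONCEPTS[w] for w in tokens(query) if w in CONCEPTS}
    r = {CONCEPTS[w] for w in tokens(report) if w in CONCEPTS}
    return bool(q & r)


def retrieve(query, reports):
    """Combine keyword-based and concept-based matching."""
    return [r for r in reports
            if keyword_match(query, r) or concept_match(query, r)]


reports = [
    "Patient slipped in the corridor.",
    "Wrong tablet handed out at night.",
    "Fall from bed during transfer.",
]
print([r for r in reports if keyword_match("fall", r)])
print(retrieve("fall", reports))
```

    In this toy case the keyword search misses the report that says "slipped", while the concept mapping recovers it, which is the recall gain the abstract describes.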

  12. A bootstrapping method for development of Treebank

    NASA Astrophysics Data System (ADS)

    Zarei, F.; Basirat, A.; Faili, H.; Mirain, M.

    2017-01-01

    Using statistical approaches alongside the traditional methods of natural language processing can significantly improve both the quality and performance of several natural language processing (NLP) tasks. The effective use of these approaches depends on the availability of informative, accurate and detailed corpora on which the learners are trained. This article introduces a bootstrapping method for developing annotated corpora based on a complex and rich linguistically motivated elementary structure called a supertag. To this end, a hybrid method for supertagging is proposed that combines the generative and discriminative methods of supertagging. The method was applied to a subset of the Wall Street Journal (WSJ) in order to annotate its sentences with a set of linguistically motivated elementary structures of the English XTAG grammar, which uses a lexicalised tree-adjoining grammar formalism. The empirical results confirm that the bootstrapping method provides a satisfactory way of annotating English sentences with these structures. The experiments show that the method can automatically annotate about 20% of WSJ with an F-measure of about 80%, which is 12% higher than the F-measure of the XTAG Treebank automatically generated by the approach proposed by Basirat and Faili [(2013). Bridge the gap between statistical and hand-crafted grammars. Computer Speech and Language, 27, 1085-1104].

  13. Multilingual Information Retrieval in Thoracic Radiology: Feasibility Study

    PubMed Central

    Castilla, André Coutinho; Furuie, Sérgio Shiguemi; Mendonça, Eneida A.

    2014-01-01

    Most of the essential information contained in the Electronic Medical Record is stored as text, which makes automated data extraction and retrieval difficult. Natural language processing is an approach that can unlock clinical information from free text. The proposed methodology uses the specialized natural language processor MEDLEE, developed for the English language. To use this processor on Portuguese medical texts, chest x-ray reports were machine translated (MT) into English. The result of serially coupling MT and NLP is tagged text, which needs further investigation for extracting clinical findings. The objective of this experiment was to investigate normal reports and reports with device descriptions in a set of 165 chest x-ray reports. We obtained a sensitivity and specificity of 1 and 0.71 for the first condition and 0.97 and 0.97 for the second, respectively. The reference standard was formed from the opinion of two radiologists. The results of this experiment indicate the viability of extracting clinical findings from chest x-ray reports by coupling MT and NLP. PMID:17911745
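
    For readers less familiar with the two measures reported above, a minimal sketch of how sensitivity and specificity follow from a confusion matrix is given here. The counts are hypothetical and are not the study's data.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts, compared against a radiologist reference standard.
sens, spec = sensitivity_specificity(tp=45, fn=5, tn=80, fp=20)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```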

  14. Automated Non-Alphanumeric Symbol Resolution in Clinical Texts

    PubMed Central

    Moon, SungRim; Pakhomov, Serguei; Ryan, James; Melton, Genevieve B.

    2011-01-01

    Although clinical texts contain many symbols, relatively little attention has been given to symbol resolution by medical natural language processing (NLP) researchers. Interpreting the meaning of symbols may be viewed as a special case of Word Sense Disambiguation (WSD). One thousand instances of four common non-alphanumeric symbols (‘+’, ‘–’, ‘/’, and ‘#’) were randomly extracted from a clinical document repository and annotated by experts. The symbols and their surrounding context, in addition to bag-of-Words (BoW), and heuristic rules were evaluated as features for the following classifiers: Naïve Bayes, Support Vector Machine, and Decision Tree, using 10-fold cross-validation. Accuracies for ‘+’, ‘–’, ‘/’, and ‘#’ were 80.11%, 80.22%, 90.44%, and 95.00% respectively, with Naïve Bayes. While symbol context contributed the most, BoW was also helpful for disambiguation of some symbols. Symbol disambiguation with supervised techniques can be implemented with reasonable accuracy as a module for medical NLP systems. PMID:22195157
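
    The idea of treating symbol resolution as word sense disambiguation can be sketched with a small Naïve Bayes classifier over the words surrounding a symbol. This is an illustration only: the sense labels, training phrases and window size are invented, cross-validation is omitted, and the study's actual features and implementation are not described in the abstract.

```python
import math
from collections import Counter, defaultdict


def context_words(tokens, index, window=2):
    """Bag of words from `window` tokens on each side of the symbol."""
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    return left + right


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, samples, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for words, label in zip(samples, labels):
            self.word_counts[label].update(words)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, words):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(count / total)  # log prior
            for w in words:
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical examples for '+': (tokens, symbol position, made-up sense).
train = [
    ("culture + for strep".split(), 1, "positive"),
    ("test result + today".split(), 2, "positive"),
    ("aspirin + statin daily".split(), 1, "and"),
    ("metformin + insulin started".split(), 1, "and"),
]
X = [context_words(toks, i) for toks, i, _ in train]
y = [label for _, _, label in train]
model = NaiveBayes().fit(X, y)
print(model.predict(context_words("urine culture + again".split(), 2)))
```

    A realistic evaluation would hold out data, for example with the 10-fold cross-validation the abstract mentions, rather than predicting on phrases that resemble the training set.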

  15. Automated Assessment of Medical Students' Clinical Exposures according to AAMC Geriatric Competencies.

    PubMed

    Chen, Yukun; Wrenn, Jesse; Xu, Hua; Spickard, Anderson; Habermann, Ralf; Powers, James; Denny, Joshua C

    2014-01-01

    Competence is essential for health care professionals. Current methods to assess competency, however, do not efficiently capture medical students' experience. In this preliminary study, we used machine learning and natural language processing (NLP) to identify geriatric competency exposures from students' clinical notes. The system applied NLP to generate the concepts and related features from notes. We extracted a refined list of concepts associated with corresponding competencies. This system was evaluated through 10-fold cross validation for six geriatric competency domains: "medication management (MedMgmt)", "cognitive and behavioral disorders (CBD)", "falls, balance, gait disorders (Falls)", "self-care capacity (SCC)", "palliative care (PC)", "hospital care for elders (HCE)" - each an Association of American Medical Colleges competency for medical students. The system could accurately assess MedMgmt, SCC, HCE, and Falls competencies with F-measures of 0.94, 0.86, 0.85, and 0.84, respectively, but did not attain good performance for PC and CBD (0.69 and 0.62 in F-measure, respectively).

  16. Kinase Pathway Database: An Integrated Protein-Kinase and NLP-Based Protein-Interaction Resource

    PubMed Central

    Koike, Asako; Kobayashi, Yoshiyuki; Takagi, Toshihisa

    2003-01-01

    Protein kinases play a crucial role in the regulation of cellular functions. Various kinds of information about these molecules are important for understanding signaling pathways and organism characteristics. We have developed the Kinase Pathway Database, an integrated database involving major completely sequenced eukaryotes. It contains the classification of protein kinases and their functional conservation, ortholog tables among species, protein–protein, protein–gene, and protein–compound interaction data, domain information, and structural information. It also provides an automatic pathway graphic image interface. The protein, gene, and compound interactions are automatically extracted from abstracts for all genes and proteins by natural-language processing (NLP). The method of automatic extraction uses phrase patterns and the GENA protein, gene, and compound name dictionary, which was developed by our group. With this database, pathways are easily compared among species using data with more than 47,000 protein interactions and protein kinase ortholog tables. The database is available for querying and browsing at http://kinasedb.ontology.ims.u-tokyo.ac.jp/. PMID:12799355

  17. TextHunter – A User Friendly Tool for Extracting Generic Concepts from Free Text in Clinical Research

    PubMed Central

    Jackson, Richard G.; Ball, Michael; Patel, Rashmi; Hayes, Richard D.; Dobson, Richard J.B.; Stewart, Robert

    2014-01-01

    Observational research using data from electronic health records (EHR) is a rapidly growing area, which promises both increased sample size and data richness - therefore unprecedented study power. However, in many medical domains, large amounts of potentially valuable data are contained within the free text clinical narrative. Manually reviewing free text to obtain desired information is an inefficient use of researcher time and skill. Previous work has demonstrated the feasibility of applying Natural Language Processing (NLP) to extract information. However, in real world research environments, the demand for NLP skills outweighs supply, creating a bottleneck in the secondary exploitation of the EHR. To address this, we present TextHunter, a tool for the creation of training data, construction of concept extraction machine learning models and their application to documents. Using confidence thresholds to ensure high precision (>90%), we achieved recall measurements as high as 99% in real world use cases. PMID:25954379
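
    The use of confidence thresholds to trade recall for precision can be sketched as below. The function names and the scored examples are hypothetical; the abstract does not describe how TextHunter selects its thresholds.

```python
def precision_recall(scored, threshold):
    """`scored` holds (confidence, is_truly_positive) pairs."""
    accepted = [truth for conf, truth in scored if conf >= threshold]
    tp = sum(accepted)
    total_pos = sum(truth for _, truth in scored)
    precision = tp / len(accepted) if accepted else 1.0
    recall = tp / total_pos if total_pos else 0.0
    return precision, recall


def lowest_threshold_for(scored, min_precision):
    """Smallest observed confidence whose precision meets the target."""
    for t in sorted({conf for conf, _ in scored}):
        p, r = precision_recall(scored, t)
        if p >= min_precision:
            return t, p, r
    return None  # no threshold reaches the requested precision


# Hypothetical classifier outputs on a labelled validation set.
scored = [(0.95, True), (0.90, True), (0.80, True), (0.70, False),
          (0.60, True), (0.40, False), (0.20, False)]
print(lowest_threshold_for(scored, 0.90))
```

    Recall can only fall as the threshold rises, so taking the lowest threshold that satisfies the precision target keeps as much recall as that target allows.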

  18. Automated extraction of family history information from clinical notes.

    PubMed

    Bill, Robert; Pakhomov, Serguei; Chen, Elizabeth S; Winden, Tamara J; Carter, Elizabeth W; Melton, Genevieve B

    2014-01-01

    Despite increased functionality for obtaining family history in a structured format within electronic health record systems, clinical notes often still contain this information. We developed and evaluated an Unstructured Information Management Application (UIMA)-based natural language processing (NLP) module for automated extraction of family history information with functionality for identifying statements, observations (e.g., disease or procedure), relative or side of family with attributes (i.e., vital status, age of diagnosis, certainty, and negation), and predication ("indicator phrases"), the latter of which was used to establish relationships between observations and family member. The family history NLP system demonstrated F-scores of 66.9, 92.4, 82.9, 57.3, 97.7, and 61.9 for detection of family history statements, family member identification, observation identification, negation identification, vital status, and overall extraction of the predications between family members and observations, respectively. While the system performed well for detection of family history statements and predication constituents, further work is needed to improve extraction of certainty and temporal modifications.

  19. Automated Extraction of Family History Information from Clinical Notes

    PubMed Central

    Bill, Robert; Pakhomov, Serguei; Chen, Elizabeth S.; Winden, Tamara J.; Carter, Elizabeth W.; Melton, Genevieve B.

    2014-01-01

    Despite increased functionality for obtaining family history in a structured format within electronic health record systems, clinical notes often still contain this information. We developed and evaluated an Unstructured Information Management Application (UIMA)-based natural language processing (NLP) module for automated extraction of family history information with functionality for identifying statements, observations (e.g., disease or procedure), relative or side of family with attributes (i.e., vital status, age of diagnosis, certainty, and negation), and predication (“indicator phrases”), the latter of which was used to establish relationships between observations and family member. The family history NLP system demonstrated F-scores of 66.9, 92.4, 82.9, 57.3, 97.7, and 61.9 for detection of family history statements, family member identification, observation identification, negation identification, vital status, and overall extraction of the predications between family members and observations, respectively. While the system performed well for detection of family history statements and predication constituents, further work is needed to improve extraction of certainty and temporal modifications. PMID:25954443

  20. Bengali-English Relevant Cross Lingual Information Access Using Finite Automata

    NASA Astrophysics Data System (ADS)

    Banerjee, Avishek; Bhattacharyya, Swapan; Hazra, Simanta; Mondal, Shatabdi

    2010-10-01

    CLIR techniques search unrestricted texts and typically extract terms and relationships from bilingual electronic dictionaries or bilingual text collections, using them to translate query and/or document representations into a compatible set of representations with a common feature set. In this paper, we focus on a dictionary-based approach that uses a bilingual data dictionary in combination with statistics-based methods to avoid the problem of ambiguity; the paper also addresses the development of the human-computer interface aspects of NLP (natural language processing). Intelligent web search in a regional language such as Bengali depends on two major aspects: CLIA (cross-language information access) and NLP. In our previous work with IIT, KGP, we developed content-based CLIA in which content-based searching is trained on Bengali corpora with the help of a Bengali data dictionary. Here we introduce intelligent search, which recognizes the sense of a sentence and offers a more realistic approach to human-computer interaction.

  1. Processing biological literature with customizable Web services supporting interoperable formats.

    PubMed

    Rak, Rafal; Batista-Navarro, Riza Theresa; Carter, Jacob; Rowley, Andrew; Ananiadou, Sophia

    2014-01-01

    Web services have become a popular means of interconnecting solutions for processing a body of scientific literature. This has fuelled research on high-level data exchange formats suitable for a given domain and ensuring the interoperability of Web services. In this article, we focus on the biological domain and consider four interoperability formats, BioC, BioNLP, XMI and RDF, that represent domain-specific and generic representations and include well-established as well as emerging specifications. We use the formats in the context of customizable Web services created in our Web-based, text-mining workbench Argo that features an ever-growing library of elementary analytics and capabilities to build and deploy Web services straight from a convenient graphical user interface. We demonstrate a 2-fold customization of Web services: by building task-specific processing pipelines from a repository of available analytics, and by configuring services to accept and produce a combination of input and output data interchange formats. We provide qualitative evaluation of the formats as well as quantitative evaluation of automatic analytics. The latter was carried out as part of our participation in the fourth edition of the BioCreative challenge. Our analytics built into Web services for recognizing biochemical concepts in BioC collections achieved the highest combined scores out of 10 participating teams. Database URL: http://argo.nactem.ac.uk. © The Author(s) 2014. Published by Oxford University Press.

  2. Processing biological literature with customizable Web services supporting interoperable formats

    PubMed Central

    Rak, Rafal; Batista-Navarro, Riza Theresa; Carter, Jacob; Rowley, Andrew; Ananiadou, Sophia

    2014-01-01

    Web services have become a popular means of interconnecting solutions for processing a body of scientific literature. This has fuelled research on high-level data exchange formats suitable for a given domain and ensuring the interoperability of Web services. In this article, we focus on the biological domain and consider four interoperability formats, BioC, BioNLP, XMI and RDF, that represent domain-specific and generic representations and include well-established as well as emerging specifications. We use the formats in the context of customizable Web services created in our Web-based, text-mining workbench Argo that features an ever-growing library of elementary analytics and capabilities to build and deploy Web services straight from a convenient graphical user interface. We demonstrate a 2-fold customization of Web services: by building task-specific processing pipelines from a repository of available analytics, and by configuring services to accept and produce a combination of input and output data interchange formats. We provide qualitative evaluation of the formats as well as quantitative evaluation of automatic analytics. The latter was carried out as part of our participation in the fourth edition of the BioCreative challenge. Our analytics built into Web services for recognizing biochemical concepts in BioC collections achieved the highest combined scores out of 10 participating teams. Database URL: http://argo.nactem.ac.uk. PMID:25006225

  3. Our personal space.

    PubMed

    Suthers, M

    2000-10-01

    Neuro Linguistic Programming (NLP) as a model of human behaviour is presented. Its basic tenets and the factors that give rise to the physiological and emotional response to an external event are described. A number of psychotherapeutic interventions are also described, along with the influence of NLP on sporting and academic success. Finally, an exploration of these ideas for the purpose of contributing to personal well-being is given.

  4. Artificial Intelligence Learning Semantics via External Resources for Classifying Diagnosis Codes in Discharge Notes.

    PubMed

    Lin, Chin; Hsu, Chia-Jung; Lou, Yu-Sheng; Yeh, Shih-Jen; Lee, Chia-Cheng; Su, Sui-Lung; Chen, Hsiang-Cheng

    2017-11-06

    Automated disease code classification using free-text medical information is important for public health surveillance. However, traditional natural language processing (NLP) pipelines are limited, so we propose a method combining word embedding with a convolutional neural network (CNN). Our objective was to compare the performance of traditional pipelines (NLP plus supervised machine learning models) with that of word embedding combined with a CNN in conducting a classification task identifying International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) diagnosis codes in discharge notes. We used 2 classification methods: (1) extracting from discharge notes some features (terms, n-gram phrases, and SNOMED CT categories) that we used to train a set of supervised machine learning models (support vector machine, random forests, and gradient boosting machine), and (2) building a feature matrix, by a pretrained word embedding model, that we used to train a CNN. We used these methods to identify the chapter-level ICD-10-CM diagnosis codes in a set of discharge notes. We conducted the evaluation using 103,390 discharge notes covering patients hospitalized from June 1, 2015 to January 31, 2017 in the Tri-Service General Hospital in Taipei, Taiwan. We used the receiver operating characteristic curve as an evaluation measure, and calculated the area under the curve (AUC) and F-measure as the global measure of effectiveness. In 5-fold cross-validation tests, our method had a higher testing accuracy (mean AUC 0.9696; mean F-measure 0.9086) than traditional NLP-based approaches (mean AUC range 0.8183-0.9571; mean F-measure range 0.5050-0.8739). A real-world simulation that split the training sample and the testing sample by date verified this result (mean AUC 0.9645; mean F-measure 0.9003 using the proposed method). 
    Further analysis showed that the convolutional layers of the CNN effectively identified a large number of keywords and automatically extracted enough concepts to predict the diagnosis codes. Word embedding combined with a CNN showed outstanding performance compared with traditional methods, needing very little data preprocessing. This shows that future studies will not be limited by incomplete dictionaries. A large amount of unstructured information from free-text medical writing will be extracted by automated approaches in the future, and we believe that the health care field is about to enter the age of big data. ©Chin Lin, Chia-Jung Hsu, Yu-Sheng Lou, Shih-Jen Yeh, Chia-Cheng Lee, Sui-Lung Su, Hsiang-Cheng Chen. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 06.11.2017.
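
    The two evaluation measures used in this record, AUC and F-measure, can be computed as in the sketch below. It uses the pairwise-ranking definition of AUC, which is equivalent to the area under the ROC curve; the scores and labels are hypothetical and unrelated to the study's data.

```python
def auc(scores, labels):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical model scores for one diagnosis chapter (1 = code present).
scores = [0.9, 0.2, 0.8, 0.1]
labels = [1, 1, 0, 0]
print(auc(scores, labels))
print(f_measure(0.5, 0.5))
```

    The pairwise form is quadratic in the number of notes, so for a data set the size of the one described above a rank-based or library implementation would be the practical choice.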

  5. Enhancing the photoelectrochemical response of TiO2 nanotubes through their nanodecoration by pulsed-laser-deposited Ag nanoparticles

    NASA Astrophysics Data System (ADS)

    Trabelsi, K.; Hajjaji, A.; Gaidi, M.; Bessais, B.; El Khakani, M. A.

    2017-08-01

    We report on the pulsed laser deposition (PLD) based nanodecoration of titanium dioxide (TiO2) nanotube arrays (NTAs) by Ag nanoparticles (NPs). We focus here on the investigation of the effect of the number of laser ablation pulses (NLP) of the silver target on both the average size of the Ag-NPs and the photoelectrochemical conversion efficiency of the Ag-NP decorated TiO2-NT based photoanodes. By varying the NLP, we were able to not only control the size of the PLD-deposited Ag nanoparticles from 20 to ~50 nm, but also to increase concomitantly the surface coverage of the TiO2 NTAs by Ag-NPs. The red-shifting of the surface plasmon resonance peak of the PLD-deposited Ag-NPs deposited onto quartz substrates confirmed the increase of their size as the NLP is increased from 500 to 10 000. By investigating the photo-electrochemical properties of Ag-NP decorated TiO2-NTAs, by means of linear sweep cyclic voltammetry under UV-Vis illumination, we found that the generated photocurrent is sensitive to the size of the Ag-NPs and reaches a maximum value at NLP = 500 (i.e., an Ag-NP size of ~20 nm). For NLP = 500, the photoconversion efficiency of the Ag-NP decorated TiO2-NTAs is shown to reach a maximum of 4.5% (at 0.5 V vs Ag/AgCl). The photocurrent enhancement of Ag-NP decorated TiO2-NTAs is believed to result from the additional light harvesting enabled by the ability of Ag-NPs to absorb visible irradiation caused by various localized surface plasmon resonances, which in turn depend on the size and interdistance of the Ag nanoparticles.

  6. The nucleoplasmin homolog NLP mediates centromere clustering and anchoring to the nucleolus.

    PubMed

    Padeken, Jan; Mendiburo, María José; Chlamydas, Sarantis; Schwarz, Hans-Jürgen; Kremmer, Elisabeth; Heun, Patrick

    2013-04-25

    Centromere clustering during interphase is a phenomenon known to occur in many different organisms and cell types, yet neither the factors involved nor their physiological relevance is well understood. Using Drosophila tissue culture cells and flies, we identified a network of proteins, including the nucleoplasmin-like protein (NLP), the insulator protein CTCF, and the nucleolus protein Modulo, to be essential for the positioning of centromeres. Artificial targeting further demonstrated that NLP and CTCF are sufficient for clustering, while Modulo serves as the anchor to the nucleolus. Centromere clustering was found to depend on centric chromatin rather than specific DNA sequences. Moreover, unclustering of centromeres results in the spatial destabilization of pericentric heterochromatin organization, leading to partial defects in the silencing of repetitive elements, defects during chromosome segregation, and genome instability. Copyright © 2013 Elsevier Inc. All rights reserved.

  7. Comparison of Natural Language Processing Rules-based and Machine-learning Systems to Identify Lumbar Spine Imaging Findings Related to Low Back Pain.

    PubMed

    Tan, W Katherine; Hassanpour, Saeed; Heagerty, Patrick J; Rundell, Sean D; Suri, Pradeep; Huhdanpaa, Hannu T; James, Kathryn; Carrell, David S; Langlotz, Curtis P; Organ, Nancy L; Meier, Eric N; Sherman, Karen J; Kallmes, David F; Luetmer, Patrick H; Griffith, Brent; Nerenz, David R; Jarvik, Jeffrey G

    2018-03-28

    To evaluate a natural language processing (NLP) system built with open-source tools for identification of lumbar spine imaging findings related to low back pain on magnetic resonance and x-ray radiology reports from four health systems. We used a limited data set (de-identified except for dates) sampled from lumbar spine imaging reports of a prospectively assembled cohort of adults. From N = 178,333 reports, we randomly selected N = 871 to form a reference-standard dataset, consisting of N = 413 x-ray reports and N = 458 MR reports. Using standardized criteria, four spine experts annotated the presence of 26 findings, where 71 reports were annotated by all four experts and 800 were each annotated by two experts. We calculated inter-rater agreement and finding prevalence from annotated data. We randomly split the annotated data into development (80%) and testing (20%) sets. We developed an NLP system from both rule-based and machine-learned models. We validated the system using accuracy metrics such as sensitivity, specificity, and area under the receiver operating characteristic curve (AUC). The multirater annotated dataset achieved inter-rater agreement of Cohen's kappa > 0.60 (substantial agreement) for 25 of 26 findings, with finding prevalence ranging from 3% to 89%. In the testing sample, rule-based and machine-learned predictions both had comparable average specificity (0.97 and 0.95, respectively). The machine-learned approach had a higher average sensitivity (0.94, compared to 0.83 for rules-based), and a higher overall AUC (0.98, compared to 0.90 for rules-based). Our NLP system performed well in identifying the 26 lumbar spine findings, as benchmarked by reference-standard annotation by medical experts. Machine-learned models provided substantial gains in model sensitivity with slight loss of specificity, and overall higher AUC. Copyright © 2018 The Association of University Radiologists. All rights reserved.
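
    The inter-rater agreement statistic mentioned in this abstract, Cohen's kappa, compares observed agreement with the agreement expected by chance. The sketch below covers the two-rater case only; the function name `cohens_kappa` and the example annotations are hypothetical, and the abstract does not say how agreement was computed for the 71 reports annotated by all four experts.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected from each rater's label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivial.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations of one finding on four reports (1 = present).
expert_1 = [1, 0, 1, 0]
expert_2 = [1, 0, 0, 0]
kappa = cohens_kappa(expert_1, expert_2)
```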

  8. Informatics can identify systemic sclerosis (SSc) patients at risk for scleroderma renal crisis.

    PubMed

    Redd, Doug; Frech, Tracy M; Murtaugh, Maureen A; Rhiannon, Julia; Zeng, Qing T

    2014-10-01

    Electronic medical records (EMR) provide an ideal opportunity for the detection, diagnosis, and management of systemic sclerosis (SSc) patients within the Veterans Health Administration (VHA). The objective of this project was to use informatics to identify potential SSc patients in the VHA that were on prednisone, in order to inform an outreach project to prevent scleroderma renal crisis (SRC). The electronic medical data for this study came from Veterans Informatics and Computing Infrastructure (VINCI). For natural language processing (NLP) analysis, a set of retrieval criteria was developed for documents expected to have a high correlation to SSc. The two annotators reviewed the ratings to assemble a single adjudicated set of ratings, from which a support vector machine (SVM) based document classifier was trained. Any patient having at least one document positively classified for SSc was considered positive for SSc and the use of prednisone≥10mg in the clinical document was reviewed to determine whether it was an active medication on the prescription list. In the VHA, there were 4272 patients that have a diagnosis of SSc determined by the presence of an ICD-9 code. From these patients, 1118 patients (21%) had the use of prednisone≥10mg. Of these patients, 26 had a concurrent diagnosis of hypertension, thus these patients should not be on prednisone. By the use of natural language processing (NLP) an additional 16,522 patients were identified as possible SSc, highlighting that cases of SSc in the VHA may exist that are unidentified by ICD-9. A 10-fold cross validation of the classifier resulted in a precision (positive predictive value) of 0.814, recall (sensitivity) of 0.973, and f-measure of 0.873. Our study demonstrated that current clinical practice in the VHA includes the potentially dangerous use of prednisone for veterans with SSc. 
This present study also suggests there may be many undetected cases of SSc and NLP can successfully identify these patients. Copyright © 2014 Elsevier Ltd. All rights reserved.

  9. Optimization-Based Selection of Influential Agents in a Rural Afghan Social Network

    DTIC Science & Technology

    2010-06-01

    nonlethal targeting model, a nonlinear programming ( NLP ) optimization formulation that identifies the k US agent assignment strategy producing the greatest...leader social network, and 3) the nonlethal targeting model, a nonlinear programming ( NLP ) optimization formulation that identifies the k US agent...NATO Coalition in Afghanistan. 55 for Afghanistan ( [54], [31], [48], [55], [30]). While Arab tribes tend to be more hierarchical, Pashtun tribes are

  10. NLP as a communication strategy tool in libraries

    NASA Astrophysics Data System (ADS)

    Koulouris, Alexandros; Sakas, Damianos P.; Giannakopoulos, Georgios

    2015-02-01

    Communication is a catalyst for the proper functioning of an organization. This paper focuses on libraries, where communication is crucial for their success. In our opinion, libraries in Greece are suffering from the lack of a communication and marketing strategy. Communication has many forms and manifestations. A key aspect of communication is body language, for which neuro-linguistic programming (NLP) is a dominant communication tool. Body language is a system that expresses and transfers messages, thoughts and emotions. More and more organizations in the public sector and companies in the private sector base their success on the communication skills of their personnel. NLP suggests several methods to obtain excellent relations in the workplace and to develop ideal communication. NLP theory is mainly based on the development of standards (a communication model) that guarantee the expected results. This research was conducted and analyzed in two parts, the qualitative and the quantitative. The findings mainly confirm the need for proper communication within libraries. In the qualitative research, the interviewees were aware of communication issues, although some gaps in that knowledge were observed. Even this slight lack of knowledge highlights the need for constant information through educational programs. This is particularly necessary for senior executives of libraries, who should attend relevant seminars and refresh their knowledge on communication-related issues.

  11. Interpreting Hypernymic Propositions in an Online Medical Encyclopedia

    PubMed Central

    Fiszman, Marcelo; Rindflesch, Thomas C.; Kilicoglu, Halil

    2003-01-01

    Interpretation of semantic propositions from biomedical text documents would provide valuable support to natural language processing (NLP) applications. We are developing a methodology to interpret a kind of semantic proposition, the hypernymic proposition, in MEDLINE abstracts. In this paper, we expanded the system to identify these structures in a different discourse domain: the Medical Encyclopedia from the National Library of Medicine’s MEDLINEplus® Website. PMID:14728345

  12. Interpreting hypernymic propositions in an online medical encyclopedia.

    PubMed

    Fiszman, Marcelo; Rindflesch, Thomas C; Kilicoglu, Halil

    2003-01-01

    Interpretation of semantic propositions from biomedical text documents would provide valuable support to natural language processing (NLP) applications. We are developing a methodology to interpret a kind of semantic proposition, the hypernymic proposition, in MEDLINE abstracts. In this paper, we expanded the system to identify these structures in a different discourse domain: the Medical Encyclopedia from the National Library of Medicine's MEDLINEplus Website.

  13. Advocate: A Distributed Architecture for Speech-to-Speech Translation

    DTIC Science & Technology

    2009-01-01

    tecture, are either wrapped natural-language processing ( NLP ) components or objects developed from scratch using the architecture’s API. GATE is...framework, we put together a demonstration Arabic -to- English speech translation system using both internally developed ( Arabic speech recognition and MT...conditions of our Arabic S2S demonstration system described earlier. Once again, the data size was varied and eighty identical requests were

  14. Automating Assessment of Lifestyle Counseling in Electronic Health Records

    PubMed Central

    Hazlehurst, Brian L.; Lawrence, Jean M.; Donahoo, William T.; Sherwood, Nancy E.; Kurtz, Stephen E.; Xu, Stan; Steiner, John F.

    2015-01-01

    Background: Numerous population-based surveys indicate that overweight and obese patients can benefit from lifestyle counseling during routine clinical care. Purpose: To determine if natural language processing (NLP) could be applied to information in the electronic health record (EHR) to automatically assess delivery of counseling related to weight management in clinical health care encounters. Methods: The MediClass system with NLP capabilities was used to identify weight management counseling in EHR encounter records. Knowledge for the NLP application was derived from the 5As framework for behavior counseling: Ask (evaluate weight and related disease), Advise at-risk patients to lose weight, Assess patients’ readiness to change behavior, Assist through discussion of weight loss methods and programs, and Arrange follow-up efforts including referral. Using samples of EHR data from the 1/1/2007-3/31/2011 period from two health systems, the accuracy of the MediClass processor for identifying these counseling elements was evaluated in post-partum visits of 600 women with gestational diabetes mellitus (GDM) compared to manual chart review as gold standard. Data were analyzed in 2013. Results: Mean sensitivity and specificity for each of the 5As compared to the gold standard was at or above 85%, with the exception of sensitivity for Assist, which was measured at 40% and 60% respectively for each of the two health systems. The automated method identified many valid cases of Assist not identified in the gold standard. Conclusions: The MediClass processor has performance capability sufficiently similar to human abstractors to permit automated assessment of counseling for weight loss in post-partum encounter records. PMID:24745635

  15. Automating assessment of lifestyle counseling in electronic health records.

    PubMed

    Hazlehurst, Brian L; Lawrence, Jean M; Donahoo, William T; Sherwood, Nancy E; Kurtz, Stephen E; Xu, Stan; Steiner, John F

    2014-05-01

    Numerous population-based surveys indicate that overweight and obese patients can benefit from lifestyle counseling during routine clinical care. To determine if natural language processing (NLP) could be applied to information in the electronic health record (EHR) to automatically assess delivery of weight management-related counseling in clinical healthcare encounters. The MediClass system with NLP capabilities was used to identify weight-management counseling in EHRs. Knowledge for the NLP application was derived from the 5As framework for behavior counseling: Ask (evaluate weight and related disease), Advise at-risk patients to lose weight, Assess patients' readiness to change behavior, Assist through discussion of weight-loss methods and programs, and Arrange follow-up efforts including referral. Using samples of EHR data between January 1, 2007, and March 31, 2011, from two health systems, the accuracy of the MediClass processor for identifying these counseling elements was evaluated in postpartum visits of 600 women with gestational diabetes mellitus (GDM) compared to manual chart review as the gold standard. Data were analyzed in 2013. Mean sensitivity and specificity for each of the 5As compared to the gold standard was at or above 85%, with the exception of sensitivity for Assist, which was 40% and 60% for each of the two health systems. The automated method identified many valid Assist cases not identified in the gold standard. The MediClass processor has performance capability sufficiently similar to human abstractors to permit automated assessment of counseling for weight loss in postpartum encounter records. Copyright © 2014 American Journal of Preventive Medicine. Published by Elsevier Inc. All rights reserved.

  16. Assessing the similarity of surface linguistic features related to epilepsy across pediatric hospitals.

    PubMed

    Connolly, Brian; Matykiewicz, Pawel; Bretonnel Cohen, K; Standridge, Shannon M; Glauser, Tracy A; Dlugos, Dennis J; Koh, Susan; Tham, Eric; Pestian, John

    2014-01-01

    The constant progress in computational linguistic methods provides amazing opportunities for discovering information in clinical text and enables the clinical scientist to explore novel approaches to care. However, these new approaches need evaluation. We describe an automated system to compare descriptions of epilepsy patients at three different organizations: Cincinnati Children's Hospital, the Children's Hospital Colorado, and the Children's Hospital of Philadelphia. To our knowledge, there have been no similar previous studies. In this work, a support vector machine (SVM)-based natural language processing (NLP) algorithm is trained to classify epilepsy progress notes as belonging to a patient with a specific type of epilepsy from a particular hospital. The same SVM is then used to classify notes from another hospital. Our null hypothesis is that an NLP algorithm cannot be trained using epilepsy-specific notes from one hospital and subsequently used to classify notes from another hospital better than a random baseline classifier. The hypothesis is tested using epilepsy progress notes from the three hospitals. We are able to reject the null hypothesis at the 95% level. It is also found that classification was improved by including notes from a second hospital in the SVM training sample. With a reasonably uniform epilepsy vocabulary and an NLP-based algorithm able to use this uniformity to classify epilepsy progress notes across different hospitals, we can pursue automated comparisons of patient conditions, treatments, and diagnoses across different healthcare settings. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  17. Using electronic medical records to enable large-scale studies in psychiatry: treatment resistant depression as a model

    PubMed Central

    Perlis, R. H.; Iosifescu, D. V.; Castro, V. M.; Murphy, S. N.; Gainer, V. S.; Minnier, J.; Cai, T.; Goryachev, S.; Zeng, Q.; Gallagher, P. J.; Fava, M.; Weilburg, J. B.; Churchill, S. E.; Kohane, I. S.; Smoller, J. W.

    2013-01-01

    Background: Electronic medical records (EMR) provide a unique opportunity for efficient, large-scale clinical investigation in psychiatry. However, such studies will require development of tools to define treatment outcome. Method: Natural language processing (NLP) was applied to classify notes from 127 504 patients with a billing diagnosis of major depressive disorder, drawn from out-patient psychiatry practices affiliated with multiple, large New England hospitals. Classifications were compared with results using billing data (ICD-9 codes) alone and to a clinical gold standard based on chart review by a panel of senior clinicians. These cross-sectional classifications were then used to define longitudinal treatment outcomes, which were compared with a clinician-rated gold standard. Results: Models incorporating NLP were superior to those relying on billing data alone for classifying current mood state (area under receiver operating characteristic curve of 0.85–0.88 v. 0.54–0.55). When these cross-sectional visits were integrated to define longitudinal outcomes and incorporate treatment data, 15% of the cohort remitted with a single antidepressant treatment, while 13% were identified as failing to remit despite at least two antidepressant trials. Non-remitting patients were more likely to be non-Caucasian (p<0.001). Conclusions: The application of bioinformatics tools such as NLP should enable accurate and efficient determination of longitudinal outcomes, enabling existing EMR data to be applied to clinical research, including biomarker investigations. Continued development will be required to better address moderators of outcome such as adherence and co-morbidity. PMID:21682950

  18. Creation of a simple natural language processing tool to support an imaging utilization quality dashboard.

    PubMed

    Swartz, Jordan; Koziatek, Christian; Theobald, Jason; Smith, Silas; Iturrate, Eduardo

    2017-05-01

    Testing for venous thromboembolism (VTE) is associated with cost and risk to patients (e.g. radiation). To assess the appropriateness of imaging utilization at the provider level, it is important to know that provider's diagnostic yield (percentage of tests positive for the diagnostic entity of interest). However, determining diagnostic yield typically requires either time-consuming, manual review of radiology reports or the use of complex and/or proprietary natural language processing software. The objectives of this study were twofold: 1) to develop and implement a simple, user-configurable, and open-source natural language processing tool to classify radiology reports with high accuracy and 2) to use the results of the tool to design a provider-specific VTE imaging dashboard, consisting of both utilization rate and diagnostic yield. Two physicians reviewed a training set of 400 lower extremity ultrasound (UTZ) and computed tomography pulmonary angiogram (CTPA) reports to understand the language used in VTE-positive and VTE-negative reports. The insights from this review informed the arguments to the five modifiable parameters of the NLP tool. A validation set of 2,000 studies was then independently classified by the reviewers and by the tool; the classifications were compared and the performance of the tool was calculated. The tool was highly accurate in classifying the presence and absence of VTE for both the UTZ (sensitivity 95.7%; 95% CI 91.5-99.8, specificity 100%; 95% CI 100-100) and CTPA reports (sensitivity 97.1%; 95% CI 94.3-99.9, specificity 98.6%; 95% CI 97.8-99.4). The diagnostic yield was then calculated at the individual provider level and the imaging dashboard was created. We have created a novel NLP tool designed for users without a background in computer programming, which has been used to classify venous thromboembolism reports with a high degree of accuracy. 
The tool is open-source and available for download at http://iturrate.com/simpleNLP. Results obtained using this tool can be applied to enhance quality by presenting information about utilization and yield to providers via an imaging dashboard. Copyright © 2017 Elsevier B.V. All rights reserved.
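
    The sensitivity and specificity figures in this abstract are reported with 95% confidence intervals. The abstract does not state how the intervals were computed; the sketch below uses the normal-approximation (Wald) interval as an assumption. The counts (88 true positives among 92 VTE-positive reports) are hypothetical and were chosen only because they produce an interval of similar width to the one reported for UTZ sensitivity.

```python
import math


def wald_interval(successes, total, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.

    Returns (lower, upper), clipped to [0, 1]. z = 1.96 corresponds to
    a two-sided 95% interval.
    """
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total and total > 0")
    p = successes / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts, not taken from the study.
lower, upper = wald_interval(88, 92)
```

    When every case is classified correctly the Wald interval collapses to a single point, which matches the shape of the reported "100%; 95% CI 100-100"; alternatives such as the Wilson interval avoid that degenerate case.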

  19. A Preliminary Study of Clinical Abbreviation Disambiguation in Real Time.

    PubMed

    Wu, Y; Denny, J C; Rosenbloom, S T; Miller, R A; Giuse, D A; Song, M; Xu, H

    2015-01-01

    To save time, healthcare providers frequently use abbreviations while authoring clinical documents. Nevertheless, abbreviations that authors deem unambiguous often confuse other readers, including clinicians, patients, and natural language processing (NLP) systems. Most current clinical NLP systems "post-process" notes long after clinicians enter them into electronic health record systems (EHRs). Such post-processing cannot guarantee 100% accuracy in abbreviation identification and disambiguation, since multiple alternative interpretations exist. The authors describe a prototype system for real-time Clinical Abbreviation Recognition and Disambiguation (rCARD), i.e., a system that interacts with authors during note generation to verify correct abbreviation senses. The rCARD system design anticipates future integration with web-based clinical documentation systems to improve the quality of healthcare records. When clinicians enter documents, rCARD will automatically recognize each abbreviation. For abbreviations with multiple possible senses, rCARD will show a ranked list of possible meanings with the best predicted sense at the top. The prototype application embodies three word sense disambiguation (WSD) methods to predict the correct senses of abbreviations. We then conducted three experiments to evaluate rCARD, including 1) a performance evaluation of different WSD methods; 2) a time evaluation of real-time WSD methods; and 3) a user study of typing clinical sentences with abbreviations using rCARD. Using 4,721 sentences containing 25 commonly observed, highly ambiguous clinical abbreviations, our evaluation showed that the best profile-based method implemented in rCARD achieved a reasonable WSD accuracy of 88.8% (comparable to SVM, 89.5%), and the time costs of the different WSD methods were also acceptable (ranging from 0.630 to 1.649 milliseconds within the same network). 
The preliminary user study also showed that the extra time costs by rCARD were about 5% of total document entry time and users did not feel a significant delay when using rCARD for clinical document entry. The study indicates that it is feasible to integrate a real-time, NLP-enabled abbreviation recognition and disambiguation module with clinical documentation systems.
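
    A profile-based WSD method of the general kind mentioned in this abstract can be sketched as follows: each sense gets a bag-of-words profile built from labelled example sentences, and candidate senses are ranked by cosine similarity to the current context. This is an illustrative assumption about what "profile-based" means here; the abstract does not give rCARD's features or scoring function, and the names `build_profiles` and `rank_senses` and the example sentences are hypothetical.

```python
import math
from collections import Counter


def build_profiles(examples):
    """Build one bag-of-words profile per sense.

    examples: iterable of (sense, context_sentence) pairs.
    """
    profiles = {}
    for sense, sentence in examples:
        profiles.setdefault(sense, Counter()).update(sentence.lower().split())
    return profiles


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_senses(profiles, sentence):
    """Return senses ordered from most to least similar to the context."""
    context = Counter(sentence.lower().split())
    return sorted(profiles, key=lambda s: cosine(profiles[s], context), reverse=True)


# Hypothetical training sentences for the abbreviation "pt".
profiles = build_profiles([
    ("patient", "pt denies chest pain"),
    ("physical therapy", "referred to pt for gait training"),
])
ranking = rank_senses(profiles, "pt reports chest pain today")
```

    Returning the full ranking rather than only the top sense mirrors the ranked list of candidate meanings that the abstract says is shown to the clinician.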

  20. Applying Natural Language Processing to Understand Motivational Profiles for Maintaining Physical Activity After a Mobile App and Accelerometer-Based Intervention: The mPED Randomized Controlled Trial.

    PubMed

    Fukuoka, Yoshimi; Lindgren, Teri G; Mintz, Yonatan Dov; Hooper, Julie; Aswani, Anil

    2018-06-20

    Regular physical activity is associated with reduced risk of chronic illnesses. Despite various types of successful physical activity interventions, maintenance of activity over the long term is extremely challenging. The aims of this original paper are to 1) describe physical activity engagement post intervention, 2) identify motivational profiles using natural language processing (NLP) and clustering techniques in a sample of women who completed the physical activity intervention, and 3) compare sociodemographic and clinical data among these identified cluster groups. In this cross-sectional analysis, 203 women who completed a 12-month study exit (telephone) interview in the mobile phone-based physical activity education study were examined. The mobile phone-based physical activity education study was a randomized, controlled trial to test the efficacy of the app and accelerometer intervention and its sustainability over a 9-month period. All subjects returned the accelerometer and stopped accessing the app at the last 9-month research office visit. Physical engagement and motivational profiles were assessed by both closed and open-ended questions, such as "Since your 9-month study visit, has your physical activity been more, less, or about the same (compared to the first 9 months of the study)?" and, "What motivates you the most to be physically active?" NLP and cluster analysis were used to classify motivational profiles. Descriptive statistics were used to compare participants' baseline characteristics among identified groups. Approximately half of the 2 intervention groups (Regular and Plus) reported that they were still wearing an accelerometer and engaging in brisk walking as they were directed during the intervention phases. These numbers in the 2 intervention groups were much higher than the control group (overall P=.01 and P=.003, respectively). 
Three clusters were identified through NLP and named as the Weight Loss group (n=19), the Illness Prevention group (n=138), and the Health Promotion group (n=46). The Weight Loss group was significantly younger than the Illness Prevention and Health Promotion groups (overall P<.001). The Illness Prevention group had a larger number of Caucasians as compared to the Weight Loss group (P=.001), which was composed mostly of those who identified as African American, Hispanic, or mixed race. Additionally, the Health Promotion group tended to have lower BMI scores compared to the Illness Prevention group (overall P=.02). However, no difference was noted in the baseline moderate-to-vigorous intensity activity level among the 3 groups (overall P>.05). The findings could be relevant to tailoring a physical activity maintenance intervention. Furthermore, the findings from NLP and cluster analysis are useful methods to analyze short free text to differentiate motivational profiles. As more sophisticated NL tools are developed in the future, the potential of NLP application in behavioral research will broaden. ClinicalTrials.gov NCT01280812; https://clinicaltrials.gov/ct2/show/NCT01280812 (Archived by WebCite at http://www.webcitation.org/70IkGagAJ). ©Yoshimi Fukuoka, Teri G Lindgren, Yonatan Dov Mintz, Julie Hooper, Anil Aswani. Originally published in JMIR Mhealth and Uhealth (http://mhealth.jmir.org), 20.06.2018.

  1. Extending the evaluation of Genia Event task toward knowledge base construction and comparison to Gene Regulation Ontology task

    PubMed Central

    2015-01-01

    Background: The third edition of the BioNLP Shared Task was held with the grand theme "knowledge base construction (KB)". The Genia Event (GE) task was re-designed and implemented in light of this theme. For its final report, the participating systems were evaluated from a perspective of annotation. To further explore the grand theme, we extended the evaluation from a perspective of KB construction. Also, the Gene Regulation Ontology (GRO) task was newly introduced in the third edition. The final evaluation of the participating systems resulted in relatively low performance. The reason was attributed to the large size and complex semantic representation of the ontology. To investigate potential benefits of resource exchange between the presumably similar tasks, we measured the overlap between the datasets of the two tasks, and tested whether the dataset for one task can be used to enhance performance on the other. Results: We report an extended evaluation on all the participating systems in the GE task, incorporating a KB perspective. For the evaluation, the final submission of each participant was converted to RDF statements, and evaluated using 8 queries that were formulated in SPARQL. The results suggest that the evaluation may be concluded differently between the two different perspectives, annotation vs. KB. We also provide a comparison of the GE and GRO tasks by converting their datasets into each other's format. More than 90% of the GE data could be converted into the GRO task format, while only half of the GRO data could be mapped to the GE task format. The imbalance in conversion indicates that the GRO is a comprehensive extension of the GE task ontology. We further used the converted GRO data as additional training data for the GE task, which helped improve GE task participant system performance. However, the converted GE data did not help GRO task participants, due to overfitting and the ontology gap. PMID:26202680

  2. Informatics in radiology: RADTF: a semantic search-enabled, natural language processor-generated radiology teaching file.

    PubMed

    Do, Bao H; Wu, Andrew; Biswal, Sandip; Kamaya, Aya; Rubin, Daniel L

    2010-11-01

    Storing and retrieving radiology cases is an important activity for education and clinical research, but this process can be time-consuming. In the process of structuring reports and images into organized teaching files, incidental pathologic conditions not pertinent to the primary teaching point can be omitted, as when a user saves images of an aortic dissection case but disregards the incidental osteoid osteoma. An alternate strategy for identifying teaching cases is text search of reports in radiology information systems (RIS), but retrieved reports are unstructured, teaching-related content is not highlighted, and patient identifying information is not removed. Furthermore, searching unstructured reports requires sophisticated retrieval methods to achieve useful results. An open-source, RadLex®-compatible teaching file solution called RADTF, which uses natural language processing (NLP) methods to process radiology reports, was developed to create a searchable teaching resource from the RIS and the picture archiving and communication system (PACS). The NLP system extracts and de-identifies teaching-relevant statements from full reports to generate a stand-alone database, thus converting existing RIS archives into an on-demand source of teaching material. Using RADTF, the authors generated a semantic search-enabled, Web-based radiology archive containing over 700,000 cases with millions of images. RADTF combines a compact representation of the teaching-relevant content in radiology reports and a versatile search engine with the scale of the entire RIS-PACS collection of case material. ©RSNA, 2010

  3. C-5M Super Galaxy Utilization with Joint Precision Airdrop System

    DTIC Science & Technology

    2012-03-22

    System Notes FireFly 900-2,200 Steerable Parafoil Screamer 500-2,200 Steerable Parafoil w/additional chutes to slow touchdown Dragonfly...setting . This initial feasible solution provides the Nonlinear Program algorithm a starting point to continue its calculations. The model continues...provides the NLP with a starting point of 1. This provides the NLP algorithm a point within the feasible region to begin its calculations in an attempt

  4. An exploratory study of neuro linguistic programming and communication anxiety

    NASA Astrophysics Data System (ADS)

    Brunner, Lois M.

    1993-12-01

    This thesis is an exploratory study of Neuro-Linguistic Programming (NLP), and its capabilities to provide a technique or a composite technique that will reduce the anxiety associated with making an oral brief or presentation before a group, sometimes referred to as Communication Apprehension. The composite technique comes from NLP and Time Line Therapy, which is an extension to NLP. Student volunteers (17) from a Communications course given by the Administrative Sciences Department were taught this technique. For each volunteer, an informational oral presentation was made and videotaped before the training and another informational oral presentation made and videotaped following the training. The before and after training presentations for each individual volunteer were evaluated against criteria for communications anxiety and analyzed to determine if there was a noticeable reduction of anxiety after the training. Anxiety was reduced in all of the volunteers in this study.

  5. Global optimization algorithm for heat exchanger networks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Quesada, I.; Grossmann, I.E.

    This paper deals with the global optimization of heat exchanger networks with fixed topology. It is shown that if linear area cost functions are assumed, as well as arithmetic mean driving force temperature differences in networks with isothermal mixing, the corresponding nonlinear programming (NLP) optimization problem involves linear constraints and a sum of linear fractional functions in the objective which are nonconvex. A rigorous algorithm is proposed that is based on a convex NLP underestimator that involves linear and nonlinear estimators for fractional and bilinear terms which provide a tight lower bound to the global optimum. This NLP problem is used within a spatial branch and bound method for which branching rules are given. Basic properties of the proposed method are presented, and its application is illustrated with several example problems. The results show that the proposed method requires only a few nodes in the branch and bound search.
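
    As an illustration of what a linear estimator for a bilinear term can look like, the block below states the standard McCormick underestimators. This is a textbook construction offered as an assumption about the general idea; the abstract does not say which estimators the authors actually use. The symbols $w$, $x$, $y$ and the bounds $x^L, x^U, y^L, y^U$ are introduced here for illustration only.

```latex
% Bilinear term w = x y on the box x^L <= x <= x^U, y^L <= y <= y^U.
% Expanding (x - x^L)(y - y^L) >= 0 and (x^U - x)(y^U - y) >= 0 gives
% two linear functions that never exceed x y on the box:
\begin{align}
  w &\ge x^L y + x\, y^L - x^L y^L, \\
  w &\ge x^U y + x\, y^U - x^U y^U .
\end{align}
```

    Replacing each bilinear term by such a variable $w$ yields a relaxation whose optimum is a lower bound on the original nonconvex problem, and the bound tightens as branching shrinks the box.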

  6. Launch flexibility using NLP guidance and remote wind sensing

    NASA Technical Reports Server (NTRS)

    Cramer, Evin J.; Bradt, Jerre E.; Hardtla, John W.

    1990-01-01

    This paper examines the use of lidar wind measurements in the implementation of a guidance strategy for a nonlinear programming (NLP) launch guidance algorithm. The NLP algorithm uses B-spline command function representation for flexibility in the design of the guidance steering commands. Using this algorithm, the guidance system solves a two-point boundary value problem at each guidance update. The specification of different boundary value problems at each guidance update provides flexibility that can be used in the design of the guidance strategy. The algorithm can use lidar wind measurements for on pad guidance retargeting and for load limiting guidance steering commands. Examples presented in the paper use simulated wind updates to correct wind induced final orbit errors and to adjust the guidance steering commands to limit the product of the dynamic pressure and angle-of-attack for launch vehicle load alleviation.

  7. Angular momentum projection for a Nilsson mean-field plus pairing model

    NASA Astrophysics Data System (ADS)

    Wang, Yin; Pan, Feng; Launey, Kristina D.; Luo, Yan-An; Draayer, J. P.

    2016-06-01

    The angular momentum projection for the axially deformed Nilsson mean-field plus a modified standard pairing (MSP) or the nearest-level pairing (NLP) model is proposed. Both the exact projection, in which all intrinsic states are taken into consideration, and the approximate projection, in which only intrinsic states with K = 0 are taken in the projection, are considered. The analysis shows that the approximate projection with only K = 0 intrinsic states seems reasonable, of which the configuration subspace considered is greatly reduced. As simple examples for the model application, low-lying spectra and electromagnetic properties of 18O and 18Ne are described by using both the exact and approximate angular momentum projection of the MSP or the NLP, while those of 20Ne and 24Mg are described by using the approximate angular momentum projection of the MSP or NLP.

  8. Double Parton Fragmentation Function and its Evolution in Quarkonium Production

    NASA Astrophysics Data System (ADS)

    Kang, Zhong-Bo

    2014-01-01

    We summarize the results of a recent study on a new perturbative QCD factorization formalism for the production of heavy quarkonia of large transverse momentum p_T at collider energies. Such a new factorization formalism includes both the leading power (LP) and next-to-leading power (NLP) contributions to the cross section in the m_Q^2/p_T^2 expansion for heavy quark mass m_Q. For the NLP contribution, the so-called double parton fragmentation functions are involved, whose evolution equations have been derived. We estimate fragmentation functions in the non-relativistic QCD formalism, and find that their contributions reproduce the bulk of the large enhancement found in explicit NLO calculations in the color singlet model. Heavy quarkonia produced from NLP channels prefer longitudinal polarization, in contrast to the single parton fragmentation function. This might shed some light on the heavy quarkonium polarization puzzle.

  9. Filtering large-scale event collections using a combination of supervised and unsupervised learning for event trigger classification.

    PubMed

    Mehryary, Farrokh; Kaewphan, Suwisa; Hakala, Kai; Ginter, Filip

    2016-01-01

    Biomedical event extraction is one of the key tasks in biomedical text mining, supporting various applications such as database curation and hypothesis generation. Several systems, some of which have been applied at a large scale, have been introduced to solve this task. Past studies have shown that the identification of the phrases describing biological processes, also known as trigger detection, is a crucial part of event extraction, and notable overall performance gains can be obtained by solely focusing on this sub-task. In this paper we propose a novel approach for filtering falsely identified triggers from large-scale event databases, thus improving the quality of knowledge extraction. Our method relies on state-of-the-art word embeddings, event statistics gathered from the whole biomedical literature, and both supervised and unsupervised machine learning techniques. We focus on EVEX, an event database covering the whole PubMed and PubMed Central Open Access literature containing more than 40 million extracted events. The top most frequent EVEX trigger words are hierarchically clustered, and the resulting cluster tree is pruned to identify words that can never act as triggers regardless of their context. For rarely occurring trigger words we introduce a supervised approach trained on the combination of trigger word classification produced by the unsupervised clustering method and manual annotation. The method is evaluated on the official test set of BioNLP Shared Task on Event Extraction. The evaluation shows that the method can be used to improve the performance of the state-of-the-art event extraction systems. This successful effort also translates into removing 1,338,075 of potentially incorrect events from EVEX, thus greatly improving the quality of the data. The method is not solely bound to the EVEX resource and can be thus used to improve the quality of any event extraction system or database. 
    The data and source code for this work are available at: http://bionlp-www.utu.fi/trigger-clustering/.

  10. An Investigation of the "e-rater"® Automated Scoring Engine's Grammar, Usage, Mechanics, and Style Microfeatures and Their Aggregation Model. Research Report. ETS RR-17-04

    ERIC Educational Resources Information Center

    Chen, Jing; Zhang, Mo; Bejar, Isaac I.

    2017-01-01

    Automated essay scoring (AES) generally computes essay scores as a function of macrofeatures derived from a set of microfeatures extracted from the text using natural language processing (NLP). In the "e-rater"® automated scoring engine, developed at "Educational Testing Service" (ETS) for the automated scoring of essays, each…

  11. Disambiguating ambiguous biomedical terms in biomedical narrative text: an unsupervised method.

    PubMed

    Liu, H; Lussier, Y A; Friedman, C

    2001-08-01

    With the growing use of Natural Language Processing (NLP) techniques for information extraction and concept indexing in the biomedical domain, a method that quickly and efficiently assigns the correct sense of an ambiguous biomedical term in a given context is needed concurrently. The current status of word sense disambiguation (WSD) in the biomedical domain is that handcrafted rules are used based on contextual material. The disadvantages of this approach are (i) generating WSD rules manually is a time-consuming and tedious task, (ii) maintenance of rule sets becomes increasingly difficult over time, and (iii) handcrafted rules are often incomplete and perform poorly in new domains comprised of specialized vocabularies and different genres of text. This paper presents a two-phase unsupervised method to build a WSD classifier for an ambiguous biomedical term W. The first phase automatically creates a sense-tagged corpus for W, and the second phase derives a classifier for W using the derived sense-tagged corpus as a training set. A formative experiment was performed, which demonstrated that classifiers trained on the derived sense-tagged corpora achieved an overall accuracy of about 97%, with greater than 90% accuracy for each individual ambiguous term.

  12. A Novel Approach to Semantic and Coreference Annotation at LLNL

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Firpo, M

    A case is made for the importance of high quality semantic and coreference annotation. The challenges of providing such annotation are described. Asperger's Syndrome is introduced, and the connections are drawn between the needs of text annotation and the abilities of persons with Asperger's Syndrome to meet those needs. Finally, a pilot program is recommended wherein semantic annotation is performed by people with Asperger's Syndrome. The primary points embodied in this paper are as follows: (1) Document annotation is essential to the Natural Language Processing (NLP) projects at Lawrence Livermore National Laboratory (LLNL); (2) LLNL does not currently have a system in place to meet its need for text annotation; (3) Text annotation is challenging for a variety of reasons, many related to its very rote nature; (4) Persons with Asperger's Syndrome are particularly skilled at rote verbal tasks, and behavioral experts agree that they would excel at text annotation; and (6) A pilot study is recommended in which two to three people with Asperger's Syndrome annotate documents and then the quality and throughput of their work is evaluated relative to that of their neuro-typical peers.

  13. Electronic Health Record Phenotypes for Precision Medicine: Perspectives and Caveats From Treatment of Breast Cancer at a Single Institution

    PubMed Central

    Liu, Hongfang; Maxwell, Kara N.; Pathak, Jyotishman; Zhang, Rui

    2018-01-01

    Precision medicine is at the forefront of biomedical research. Cancer registries provide rich perspectives and electronic health records (EHRs) are commonly utilized to gather additional clinical data elements needed for translational research. However, manual annotation is resource‐intense and not readily scalable. Informatics‐based phenotyping presents an ideal solution, but perspectives obtained can be impacted by both data source and algorithm selection. We derived breast cancer (BC) receptor status phenotypes from structured and unstructured EHR data using rule‐based algorithms, including natural language processing (NLP). Overall, the use of NLP increased BC receptor status coverage by 39.2% from 69.1% with structured medication information alone. Using all available EHR data, estrogen receptor‐positive BC cases were ascertained with high precision (P = 0.976) and recall (R = 0.987) compared with gold standard chart‐reviewed patients. However, status negation (R = 0.591) decreased 40.2% when relying on structured medications alone. Using multiple EHR data types (and a thorough understanding of the perspectives offered) is necessary to derive robust EHR‐based precision medicine phenotypes. PMID:29084368
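
    The precision and recall figures above compare algorithm output against chart-reviewed patients. A minimal sketch of that kind of comparison follows, assuming both sides are available as sets of patient identifiers; the function name and the IDs are hypothetical and are not taken from the study.

```python
def precision_recall_f1(predicted, gold):
    """Return (precision, recall, F1) for two collections of identifiers."""
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)  # flagged by the algorithm and confirmed by review
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical patient IDs, for illustration only.
gold_er_positive = {"p01", "p02", "p03", "p04", "p05"}
flagged_by_algorithm = {"p01", "p02", "p03", "p04", "p09"}

p, r, f1 = precision_recall_f1(flagged_by_algorithm, gold_er_positive)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

    Treating the inputs as sets keeps the sketch short; a real evaluation would also need to decide how to count patients whose status is missing from one source.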

  14. Linguistic analysis of large-scale medical incident reports for patient safety.

    PubMed

    Fujita, Katsuhide; Akiyama, Masanori; Park, Keunsik; Yamaguchi, Etsuko Nakagami; Furukawa, Hiroyuki

    2012-01-01

    The analysis of medical incident reports is indispensable for patient safety. The cycle between analysis of incident reports and proposals to medical staff is key to improving patient safety in the hospital. Most incident reports consist of free-text descriptions, but an analysis of such free descriptions is not sufficient in the medical field. In this study, we aim to accumulate and reinterpret findings using structured incident information, to clarify improvements that should be made to address the root causes of accidents, and to ensure safe medical treatment through such improvements. We employ natural language processing (NLP) and network analysis to identify effective categories of medical incident reports. Network analysis can find relationships that are indirect as well as direct. In addition, we compare bottom-up results obtained by NLP with existing categories based on experts' judgment. In the bottom-up analysis, the class of patient management regarding patient falls and medicines from the top-down analysis emerges clearly. Finally, we present new perspectives on ways of improving patient safety.

  15. Automated Assessment of Medical Students’ Clinical Exposures according to AAMC Geriatric Competencies

    PubMed Central

    Chen, Yukun; Wrenn, Jesse; Xu, Hua; Spickard, Anderson; Habermann, Ralf; Powers, James; Denny, Joshua C.

    2014-01-01

    Competence is essential for health care professionals. Current methods to assess competency, however, do not efficiently capture medical students’ experience. In this preliminary study, we used machine learning and natural language processing (NLP) to identify geriatric competency exposures from students’ clinical notes. The system applied NLP to generate the concepts and related features from notes. We extracted a refined list of concepts associated with corresponding competencies. This system was evaluated through 10-fold cross validation for six geriatric competency domains: “medication management (MedMgmt)”, “cognitive and behavioral disorders (CBD)”, “falls, balance, gait disorders (Falls)”, “self-care capacity (SCC)”, “palliative care (PC)”, “hospital care for elders (HCE)” – each an Association of American Medical Colleges (AAMC) competency for medical students. The system could accurately assess MedMgmt, SCC, HCE, and Falls competencies with F-measures of 0.94, 0.86, 0.85, and 0.84, respectively, but did not attain good performance for PC and CBD (0.69 and 0.62 in F-measure, respectively). PMID:25954341
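
    The 10-fold cross validation mentioned above can be sketched as follows. This is a generic illustration of how the folds are formed, assuming notes are addressed by integer index; the function name and the fixed seed are hypothetical, and the abstract does not describe how the authors actually partitioned their data.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs so that each item is tested exactly once."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and n_items")
    order = list(range(n_items))
    random.Random(seed).shuffle(order)        # fixed seed keeps the split reproducible
    folds = [order[i::k] for i in range(k)]   # k disjoint folds of near-equal size
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Example: 25 hypothetical clinical notes split into 10 folds.
for train_idx, test_idx in k_fold_indices(25, k=10):
    print(len(train_idx), "training notes,", len(test_idx), "held-out notes")
```

    A per-domain F-measure would then be computed on each held-out fold and averaged over the ten folds.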

  16. Integration of Neuroimaging and Microarray Datasets through Mapping and Model-Theoretic Semantic Decomposition of Unstructured Phenotypes

    PubMed Central

    Pantazatos, Spiro P.; Li, Jianrong; Pavlidis, Paul; Lussier, Yves A.

    2009-01-01

    An approach towards heterogeneous neuroscience dataset integration is proposed that uses Natural Language Processing (NLP) and a knowledge-based phenotype organizer system (PhenOS) to link ontology-anchored terms to underlying data from each database, and then maps these terms based on a computable model of disease (SNOMED CT®). The approach was implemented using sample datasets from fMRIDC, GEO, The Whole Brain Atlas and Neuronames, and allowed for complex queries such as “List all disorders with a finding site of brain region X, and then find the semantically related references in all participating databases based on the ontological model of the disease or its anatomical and morphological attributes”. Precision of the NLP-derived coding of the unstructured phenotypes in each dataset was 88% (n = 50), and precision of the semantic mapping between these terms across datasets was 98% (n = 100). To our knowledge, this is the first example of the use of both semantic decomposition of disease relationships and hierarchical information found in ontologies to integrate heterogeneous phenotypes across clinical and molecular datasets. PMID:20495688

  17. Automated clinical trial eligibility prescreening: increasing the efficiency of patient identification for clinical trials in the emergency department

    PubMed Central

    Ni, Yizhao; Kennebeck, Stephanie; Dexheimer, Judith W; McAneney, Constance M; Tang, Huaxiu; Lingren, Todd; Li, Qi; Zhai, Haijun; Solti, Imre

    2015-01-01

    Objectives (1) To develop an automated eligibility screening (ES) approach for clinical trials in an urban tertiary care pediatric emergency department (ED); (2) to assess the effectiveness of natural language processing (NLP), information extraction (IE), and machine learning (ML) techniques on real-world clinical data and trials. Data and methods We collected eligibility criteria for 13 randomly selected, disease-specific clinical trials actively enrolling patients between January 1, 2010 and August 31, 2012. In parallel, we retrospectively selected data fields including demographics, laboratory data, and clinical notes from the electronic health record (EHR) to represent profiles of all 202795 patients visiting the ED during the same period. Leveraging NLP, IE, and ML technologies, the automated ES algorithms identified patients whose profiles matched the trial criteria to reduce the pool of candidates for staff screening. The performance was validated on both a physician-generated gold standard of trial–patient matches and a reference standard of historical trial–patient enrollment decisions, where workload, mean average precision (MAP), and recall were assessed. Results Compared with the case without automation, the workload with automated ES was reduced by 92% on the gold standard set, with a MAP of 62.9%. The automated ES achieved a 450% increase in trial screening efficiency. The findings on the gold standard set were confirmed by large-scale evaluation on the reference set of trial–patient matches. Discussion and conclusion By exploiting the text of trial criteria and the content of EHRs, we demonstrated that NLP-, IE-, and ML-based automated ES could successfully identify patients for clinical trials. PMID:25030032
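
    Mean average precision (MAP), one of the measures reported above, can be sketched as below. The code assumes each trial yields a ranked list of unique candidate patient IDs plus a set of truly eligible patients, and it normalizes by the number of eligible patients, which is one common convention; the abstract does not state the exact variant used, and all names and IDs here are hypothetical.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked list (items assumed unique)."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank  # precision at the rank of each relevant item
    return total / len(relevant)


def mean_average_precision(rankings):
    """rankings: list of (ranked_list, relevant_set) pairs, one per trial."""
    if not rankings:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in rankings) / len(rankings)


# Two hypothetical trials with ranked candidate patients.
trials = [
    (["a", "b", "c"], {"a", "c"}),
    (["x", "a"], {"a"}),
]
print(round(mean_average_precision(trials), 4))
```

    A higher MAP means eligible patients tend to appear near the top of each trial's candidate list, which is what reduces the number of charts staff need to screen.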

  18. Extracting laboratory test information from biomedical text

    PubMed Central

    Kang, Yanna Shen; Kayaalp, Mehmet

    2013-01-01

    Background: No previous study reported the efficacy of current natural language processing (NLP) methods for extracting laboratory test information from narrative documents. This study investigates the pathology informatics question of how accurately such information can be extracted from text with the current tools and techniques, especially machine learning and symbolic NLP methods. The study data came from a text corpus maintained by the U.S. Food and Drug Administration, containing a rich set of information on laboratory tests and test devices. Methods: The authors developed a symbolic information extraction (SIE) system to extract device and test specific information about four types of laboratory test entities: Specimens, analytes, units of measures and detection limits. They compared the performance of SIE and three prominent machine learning based NLP systems, LingPipe, GATE and BANNER, each implementing a distinct supervised machine learning method, hidden Markov models, support vector machines and conditional random fields, respectively. Results: Machine learning systems recognized laboratory test entities with moderately high recall, but low precision rates. Their recall rates were relatively higher when the number of distinct entity values (e.g., the spectrum of specimens) was very limited or when lexical morphology of the entity was distinctive (as in units of measures), yet SIE outperformed them with statistically significant margins on extracting specimen, analyte and detection limit information in both precision and F-measure. Its high recall performance was statistically significant on analyte information extraction. Conclusions: Despite its shortcomings against machine learning methods, a well-tailored symbolic system may better discern relevancy among a pile of information of the same type and may outperform a machine learning system by tapping into lexically non-local contextual information such as the document structure. PMID:24083058

  19. Healthcare costs and resource utilization of patients with binge-eating disorder and eating disorder not otherwise specified in the Department of Veterans Affairs.

    PubMed

    Bellows, Brandon K; DuVall, Scott L; Kamauu, Aaron W C; Supina, Dylan; Babcock, Thomas; LaFleur, Joanne

    2015-12-01

    The objective of this study was to compare the one-year healthcare costs and utilization of patients with binge-eating disorder (BED) to patients with eating disorder not otherwise specified without BED (EDNOS-only) and to matched patients without an eating disorder (NED). A natural language processing (NLP) algorithm identified adults with BED from clinical notes in the Department of Veterans Affairs (VA) electronic health record database from 2000 to 2011. Patients with EDNOS-only were identified using ICD-9 code (307.50) and those with NLP-identified BED were excluded. First diagnosis date defined the index date for both groups. Patients with NED were randomly matched 4:1, as available, to patients with BED on age, sex, BMI, depression diagnosis, and index month. Patients with cost data (2005-2011) were included. Total healthcare, inpatient, outpatient, and pharmacy costs were examined. Generalized linear models were used to compare total one-year healthcare costs while adjusting for baseline patient characteristics. There were 257 BED, 743 EDNOS-only, and 823 matched NED patients identified. The mean (SD) total unadjusted one-year costs, in 2011 US dollars, were $33,716 ($38,928) for BED, $37,052 ($40,719) for EDNOS-only, and $19,548 ($35,780) for NED patients. When adjusting for patient characteristics, BED patients had one-year total healthcare costs $5,589 higher than EDNOS-only (p = 0.06) and $18,152 higher than matched NED patients (p < 0.001). This study is the first to use NLP to identify BED patients and quantify their healthcare costs and utilization. Patients with BED had similar one-year total healthcare costs to EDNOS-only patients, but significantly higher costs than patients with NED. © 2015 Wiley Periodicals, Inc.

  20. Using Information from the Electronic Health Record to Improve Measurement of Unemployment in Service Members and Veterans with mTBI and Post-Deployment Stress

    PubMed Central

    Dillahunt-Aspillaga, Christina; Finch, Dezon; Massengale, Jill; Kretzmer, Tracy; Luther, Stephen L.; McCart, James A.

    2014-01-01

    Objective The purpose of this pilot study is 1) to develop an annotation schema and a training set of annotated notes to support the future development of a natural language processing (NLP) system to automatically extract employment information, and 2) to determine if information about employment status, goals and work-related challenges reported by service members and Veterans with mild traumatic brain injury (mTBI) and post-deployment stress can be identified in the Electronic Health Record (EHR). Design Retrospective cohort study using data from selected progress notes stored in the EHR. Setting Post-deployment Rehabilitation and Evaluation Program (PREP), an in-patient rehabilitation program for Veterans with TBI at the James A. Haley Veterans' Hospital in Tampa, Florida. Participants Service members and Veterans with TBI who participated in the PREP program (N = 60). Main Outcome Measures Documentation of employment status, goals, and work-related challenges reported by service members and recorded in the EHR. Results Two hundred notes were examined and unique vocational information was found indicating a variety of self-reported employment challenges. Current employment status and future vocational goals along with information about cognitive, physical, and behavioral symptoms that may affect return-to-work were extracted from the EHR. The annotation schema developed for this study provides an excellent tool upon which NLP studies can be developed. Conclusions Information related to employment status and vocational history is stored in text notes in the EHR system. Information stored in text does not lend itself to easy extraction or summarization for research and rehabilitation planning purposes. Development of NLP systems to automatically extract text-based employment information provides data that may improve the understanding and measurement of employment in this important cohort. PMID:25541956
