An Overview of Computer-Based Natural Language Processing.
ERIC Educational Resources Information Center
Gevarter, William B.
Computer-based Natural Language Processing (NLP) is the key to enabling humans and their computer-based creations to interact with machines using natural languages (English, Japanese, German, etc.) rather than formal computer languages. NLP is a major research area in the fields of artificial intelligence and computational linguistics. Commercial…
Emerging Approach of Natural Language Processing in Opinion Mining: A Review
NASA Astrophysics Data System (ADS)
Kim, Tai-Hoon
Natural language processing (NLP) is a subfield of artificial intelligence and computational linguistics. It studies the problems of automated generation and understanding of natural human languages. This paper outlines a framework for using computer and natural language techniques to help learners at various levels learn foreign languages in a computer-based learning environment. We propose some ideas for using the computer as a practical tool for learning a foreign language, where most of the courseware is generated automatically. We then describe how to build computer-based learning tools, discuss their effectiveness, and conclude with some possibilities for using online resources.
Net-centric ACT-R-Based Cognitive Architecture with DEVS Unified Process
2011-04-01
effort has been spent in analyzing various forms of requirement specifications, viz., state-based, Natural Language based, UML-based, Rule-based, BPMN...requirement specifications in one of the chosen formats such as BPMN, DoDAF, Natural Language Processing (NLP) based, UML-based, DSL or simply
Semi-Automated Methods for Refining a Domain-Specific Terminology Base
2011-02-01
only as a resource for written and oral translation, but also for Natural Language Processing (NLP) applications, text retrieval, document indexing...Natural Language Processing (NLP) applications, text retrieval, document indexing, and other knowledge management tasks. The objective of this...also for Natural Language Processing (NLP) applications, text retrieval (1), document indexing, and other knowledge management tasks. The National
Generating and Executing Complex Natural Language Queries across Linked Data.
Hamon, Thierry; Mougin, Fleur; Grabar, Natalia
2015-01-01
With the recent and intensive research in the biomedical area, the knowledge accumulated is disseminated through various knowledge bases. Links between these knowledge bases are needed in order to use them jointly. Linked Data, the SPARQL language, and natural language question-answering interfaces provide interesting solutions for querying such knowledge bases. We propose a method for translating natural language questions into SPARQL queries. We use Natural Language Processing tools, semantic resources, and the RDF triples description. The method was designed on 50 questions over 3 biomedical knowledge bases and evaluated on 27 questions. It achieves 0.78 F-measure on the test set. The method for translating natural language questions into SPARQL queries is implemented as a Perl module available at http://search.cpan.org/ thhamon/RDF-NLP-SPARQLQuery.
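To make the idea of translating a question into SPARQL concrete, here is a minimal sketch. It is not the authors' Perl module or their method: it swaps in plain regular-expression templates for the NLP tools and semantic resources the abstract mentions. The `ex:` namespace, the predicates `ex:treats` and `ex:sideEffect`, the question patterns, and the function name `to_sparql` are all hypothetical.

```python
import re

# Hypothetical prefixes; the ex: namespace is a placeholder, not a real knowledge base.
PREFIXES = (
    "PREFIX ex: <http://example.org/>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
)

# Each entry pairs a question pattern with a SPARQL template.
# Doubled braces become literal braces after str.format().
PATTERNS = [
    (
        re.compile(r"^which drugs treat (?P<disease>.+?)\??$", re.IGNORECASE),
        'SELECT ?drug WHERE {{ ?drug ex:treats ?d . ?d rdfs:label "{disease}" . }}',
    ),
    (
        re.compile(r"^what are the side effects of (?P<drug>.+?)\??$", re.IGNORECASE),
        'SELECT ?effect WHERE {{ ?d rdfs:label "{drug}" . ?d ex:sideEffect ?effect . }}',
    ),
]


def to_sparql(question):
    """Return a SPARQL query string for a recognised question, else None."""
    text = question.strip()
    for pattern, template in PATTERNS:
        match = pattern.match(text)
        if match:
            # Note: captured text is inserted unescaped; a real system
            # would need to escape quotes before building the query.
            return PREFIXES + template.format(**match.groupdict())
    return None


query = to_sparql("Which drugs treat asthma?")
print(query)
```

A question that matches no pattern returns `None`, which leaves the caller free to fall back to another strategy.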
An overview of computer-based natural language processing
NASA Technical Reports Server (NTRS)
Gevarter, W. B.
1983-01-01
Computer-based Natural Language Processing (NLP) is the key to enabling humans and their computer-based creations to interact with machines in natural language (like English, Japanese, German, etc., in contrast to formal computer languages). The doors that such an achievement can open have made this a major research area in Artificial Intelligence and Computational Linguistics. Commercial natural language interfaces to computers have recently entered the market, and the future looks bright for other applications as well. This report reviews the basic approaches to such systems, the techniques utilized, applications, the state of the art of the technology, issues and research requirements, the major participants, and, finally, future trends and expectations. It is anticipated that this report will prove useful to engineering and research managers, potential users, and others who will be affected by this field as it unfolds.
NLPIR: A Theoretical Framework for Applying Natural Language Processing to Information Retrieval.
ERIC Educational Resources Information Center
Zhou, Lina; Zhang, Dongsong
2003-01-01
Proposes a theoretical framework called NLPIR that integrates natural language processing (NLP) into information retrieval (IR) based on the assumption that there exists representation distance between queries and documents. Discusses problems in traditional keyword-based IR, including relevance, and describes some existing NLP techniques.…
A Natural Language Interface Concordant with a Knowledge Base.
Han, Yong-Jin; Park, Seong-Bae; Park, Se-Young
2016-01-01
The discordance between expressions interpretable by a natural language interface (NLI) system and those answerable by a knowledge base is a critical problem in the field of NLIs. In order to solve this discordance problem, this paper proposes a method to translate natural language questions into formal queries that can be generated from a graph-based knowledge base. The proposed method considers a subgraph of a knowledge base as a formal query. Thus, all formal queries corresponding to a concept or a predicate in the knowledge base can be generated prior to query time and all possible natural language expressions corresponding to each formal query can also be collected in advance. A natural language expression has a one-to-one mapping with a formal query. Hence, a natural language question is translated into a formal query by matching the question with the most appropriate natural language expression. If the confidence of this matching is not sufficiently high the proposed method rejects the question and does not answer it. Multipredicate queries are processed by regarding them as a set of collected expressions. The experimental results show that the proposed method thoroughly handles answerable questions from the knowledge base and rejects unanswerable ones effectively.
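The match-or-reject step described above can be sketched as follows. This is only an illustration under stated assumptions: the token-overlap (Jaccard) score stands in for whatever confidence measure the paper uses, and the expression table, the query strings, the threshold of 0.5, and the function names are all hypothetical.

```python
def tokens(text):
    """Lowercase a string, drop question marks, and return its set of words."""
    return set(text.lower().replace("?", " ").split())


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def translate(question, expression_to_query, threshold=0.5):
    """Map a question to the formal query of its best-matching expression.

    Returns None (the question is rejected) when the best match scores
    below the confidence threshold.
    """
    q = tokens(question)
    best_expr, best_score = None, 0.0
    for expr in expression_to_query:
        score = jaccard(q, tokens(expr))
        if score > best_score:
            best_expr, best_score = expr, score
    if best_expr is None or best_score < threshold:
        return None
    return expression_to_query[best_expr]


# Hypothetical expressions collected in advance, one formal query each.
EXPRESSIONS = {
    "who directed the film X": "SELECT ?p WHERE { X directedBy ?p }",
    "where was X born": "SELECT ?c WHERE { X birthPlace ?c }",
}

print(translate("Who directed the film X?", EXPRESSIONS))
print(translate("What is the weather today?", EXPRESSIONS))
```

Raising the threshold trades coverage for precision: more questions are rejected, but fewer are answered with a poorly matching query.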
ERIC Educational Resources Information Center
Jarman, Jay
2011-01-01
This dissertation focuses on developing and evaluating hybrid approaches for analyzing free-form text in the medical domain. This research draws on natural language processing (NLP) techniques that are used to parse and extract concepts based on a controlled vocabulary. Once important concepts are extracted, additional machine learning algorithms,…
Video to Text (V2T) in Wide Area Motion Imagery
2015-09-01
microtext) or a document (e.g., using Sphinx or Apache NLP) as an automated approach [102]. Previous work in natural language full-text searching...language processing (NLP) based module. The heart of the structured text processing module includes the following seven key word banks...Features Tracker; MHT: Multiple Hypothesis Tracking; MIL: Multiple Instance Learning; NLP: Natural Language Processing; OAB: Online AdaBoost; OF: Optic Flow
Applying language technology to nursing documents: pros and cons with a focus on ethics.
Suominen, Hanna; Lehtikunnas, Tuija; Back, Barbro; Karsten, Helena; Salakoski, Tapio; Salanterä, Sanna
2007-10-01
The present study discusses ethics in building and using applications based on natural language processing in electronic nursing documentation. Specifically, we first focus on the question of how patient confidentiality can be ensured in developing language technology for the nursing documentation domain. Then, we identify and theoretically analyze the ethical outcomes which arise when using natural language processing to support clinical judgement and decision-making. In total, we put forward and justify 10 claims related to ethics in applying language technology to nursing documents. A review of recent scientific articles related to ethics in electronic patient records or in the utilization of large databases was conducted. Then, the results were compared with ethical guidelines for nurses and the Finnish legislation covering health care and processing of personal data. Finally, the practical experiences of the authors in applying the methods of natural language processing to nursing documents were appended. Patient records supplemented with natural language processing capabilities may help nurses give better, more efficient and more individualized care for their patients. In addition, language technology may facilitate patients' possibility to receive truthful information about their health and improve the nature of narratives. Because of these benefits, research about the use of language technology in narratives should be encouraged. In contrast, privacy-sensitive health care documentation brings specific ethical concerns and difficulties to the natural language processing of nursing documents. Therefore, when developing natural language processing tools, patient confidentiality must be ensured. While using the tools, health care personnel should always be responsible for the clinical judgement and decision-making. 
One should also consider that the use of language technology in nursing narratives may threaten patients' rights by using documentation collected for other purposes. Applying language technology to nursing documents may, on the one hand, contribute to the quality of care, but, on the other hand, threaten patient confidentiality. As an overall conclusion, natural language processing of nursing documents holds the promise of great benefits if the potential risks are taken into consideration.
ERIC Educational Resources Information Center
Rahimi, Zahra; Litman, Diane; Correnti, Richard; Wang, Elaine; Matsumura, Lindsay Clare
2017-01-01
This paper presents an investigation of score prediction based on natural language processing for two targeted constructs within analytic text-based writing: 1) students' effective use of evidence and, 2) their organization of ideas and evidence in support of their claim. With the long-term goal of producing feedback for students and teachers, we…
An expert system for natural language processing
NASA Technical Reports Server (NTRS)
Hennessy, John F.
1988-01-01
A solution to the natural language processing problem that uses a rule-based system, written in OPS5, to replace the traditional parsing method is proposed. The advantages of using a rule-based system are explored. Specifically, the extensibility of a rule-based solution is discussed, as well as the value of maintaining rules that function independently. Finally, the power of using semantics to supplement the syntactic analysis of a sentence is considered.
A Large-Scale Analysis of Variance in Written Language.
Johns, Brendan T; Jamieson, Randall K
2018-01-22
The collection of very large text sources has revolutionized the study of natural language, leading to the development of several models of language learning and distributional semantics that extract sophisticated semantic representations of words based on the statistical redundancies contained within natural language (e.g., Griffiths, Steyvers, & Tenenbaum; Jones & Mewhort; Landauer & Dumais; Mikolov, Sutskever, Chen, Corrado, & Dean). The models treat knowledge as an interaction of processing mechanisms and the structure of language experience. But language experience is often treated agnostically. We report a distributional semantic analysis that shows written language in fiction books varies appreciably between books from the different genres, books from the same genre, and even books written by the same author. Given that current theories assume that word knowledge reflects an interaction between processing mechanisms and the language environment, the analysis shows the need for the field to engage in a more deliberate consideration and curation of the corpora used in computational studies of natural language processing. Copyright © 2018 Cognitive Science Society, Inc.
Generating structure from experience: A retrieval-based model of language processing.
Johns, Brendan T; Jones, Michael N
2015-09-01
Standard theories of language generally assume that some abstraction of linguistic input is necessary to create higher level representations of linguistic structures (e.g., a grammar). However, the importance of individual experiences with language has recently been emphasized by both usage-based theories (Tomasello, 2003) and grounded and situated theories (e.g., Zwaan & Madden, 2005). Following the usage-based approach, we present a formal exemplar model that stores instances of sentences across a natural language corpus, applying recent advances from models of semantic memory. In this model, an exemplar memory is used to generate expectations about the future structure of sentences, using a mechanism for prediction in language processing (Altmann & Mirković, 2009). The model successfully captures a broad range of behavioral effects: reduced relative clause processing (Reali & Christiansen, 2007), the role of contextual constraint (Rayner & Well, 1996), and event knowledge activation (Ferretti, Kutas, & McRae, 2007), among others. We further demonstrate how perceptual knowledge could be integrated into this exemplar-based framework, with the goal of grounding language processing in perception. Finally, we illustrate how an exemplar memory system could have been used in the cultural evolution of language. The model provides evidence that an impressive amount of language processing may be bottom-up in nature, built on the storage and retrieval of individual linguistic experiences. (c) 2015 APA, all rights reserved.
ERIC Educational Resources Information Center
Chowdhury, Gobinda G.
2003-01-01
Discusses issues related to natural language processing, including theoretical developments; natural language understanding; tools and techniques; natural language text processing systems; abstracting; information extraction; information retrieval; interfaces; software; Internet, Web, and digital library applications; machine translation for…
Natural language processing-based COTS software and related technologies survey.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stickland, Michael G.; Conrad, Gregory N.; Eaton, Shelley M.
Natural language processing-based knowledge management software, traditionally developed for security organizations, is now becoming commercially available. An informal survey was conducted to discover and examine current NLP and related technologies and potential applications for information retrieval, information extraction, summarization, categorization, terminology management, link analysis, and visualization for possible implementation at Sandia National Laboratories. This report documents our current understanding of the technologies, lists software vendors and their products, and identifies potential applications of these technologies.
CPP-TRS(C): On using visual cognitive symbols to enhance communication effectiveness
NASA Technical Reports Server (NTRS)
Tonfoni, Graziella
1994-01-01
Communicative Positioning Program/Text Representation Systems (CPP-TRS) is a visual language based on a system of 12 canvasses, 10 signals and 14 symbols. CPP-TRS is based on the fact that every communication action is the result of a set of cognitive processes, and the whole system is based on the concept that you can enhance communication by visually perceiving text. With a simple syntax, CPP-TRS is capable of representing meaning and intention as well as communication functions visually. Those are precisely the invisible aspects of natural language that are most relevant to getting the global meaning of a text. CPP-TRS reinforces natural language in human-machine interaction systems. It complements natural language by adding certain important elements that are not represented by natural language by itself. These include the communication intention and function of the text expressed by the sender, as well as the role the reader is supposed to play. The communication intention and function of a text and the reader's role are invisible in natural language because neither specific words nor punctuation conveys them sufficiently and unambiguously; they are therefore non-transparent.
A Qualitative Analysis Framework Using Natural Language Processing and Graph Theory
ERIC Educational Resources Information Center
Tierney, Patrick J.
2012-01-01
This paper introduces a method of extending natural language-based processing of qualitative data analysis with the use of a very quantitative tool--graph theory. It is not an attempt to convert qualitative research to a positivist approach with a mathematical black box, nor is it a "graphical solution". Rather, it is a method to help qualitative…
2012-01-01
Background We introduce the linguistic annotation of a corpus of 97 full-text biomedical publications, known as the Colorado Richly Annotated Full Text (CRAFT) corpus. We further assess the performance of existing tools for performing sentence splitting, tokenization, syntactic parsing, and named entity recognition on this corpus. Results Many biomedical natural language processing systems demonstrated large differences between their previously published results and their performance on the CRAFT corpus when tested with the publicly available models or rule sets. Trainable systems differed widely with respect to their ability to build high-performing models based on this data. Conclusions The finding that some systems were able to train high-performing models based on this corpus is additional evidence, beyond high inter-annotator agreement, that the quality of the CRAFT corpus is high. The overall poor performance of various systems indicates that considerable work needs to be done to enable natural language processing systems to work well when the input is full-text journal articles. The CRAFT corpus provides a valuable resource to the biomedical natural language processing community for evaluation and training of new models for biomedical full text publications. PMID:22901054
Do neural nets learn statistical laws behind natural language?
Takahashi, Shuntaro; Tanaka-Ishii, Kumiko
2017-01-01
The performance of deep learning in natural language processing has been spectacular, but the reasons for this success remain unclear because of the inherent complexity of deep learning. This paper provides empirical evidence of its effectiveness and of a limitation of neural networks for language engineering. Precisely, we demonstrate that a neural language model based on long short-term memory (LSTM) effectively reproduces Zipf's law and Heaps' law, two representative statistical properties underlying natural language. We discuss the quality of reproducibility and the emergence of Zipf's law and Heaps' law as training progresses. We also point out that the neural language model has a limitation in reproducing long-range correlation, another statistical property of natural language. This understanding could provide a direction for improving the architectures of neural networks.
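For readers unfamiliar with the two laws named above: in their standard formulations, Zipf's law says that the frequency of the r-th most frequent word falls off roughly as a power of r, and Heaps' law says that vocabulary size grows roughly as a power of text length. The sketch below only computes the two raw curves one would inspect for these laws; it is not the paper's evaluation procedure, and the toy sentence and function names are assumptions for illustration.

```python
from collections import Counter


def rank_frequency(tokens):
    """Word frequencies sorted from most to least frequent (the Zipf curve)."""
    return sorted(Counter(tokens).values(), reverse=True)


def vocabulary_growth(tokens):
    """Number of distinct words seen after each token (the Heaps curve)."""
    seen = set()
    growth = []
    for tok in tokens:
        seen.add(tok)
        growth.append(len(seen))
    return growth


# Toy text; real analyses use corpora with millions of tokens.
toks = "the cat sat on the mat and the dog sat on the log".split()

freqs = rank_frequency(toks)      # frequency at rank 1, 2, 3, ...
growth = vocabulary_growth(toks)  # vocabulary size after 1, 2, 3, ... tokens

print(freqs)
print(growth)
```

On a large corpus, plotting `freqs` against rank and `growth` against token count on log-log axes should give roughly straight lines if the two laws hold; the same curves can be computed on text sampled from a language model for comparison.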
Do neural nets learn statistical laws behind natural language?
Takahashi, Shuntaro
2017-01-01
The performance of deep learning in natural language processing has been spectacular, but the reasons for this success remain unclear because of the inherent complexity of deep learning. This paper provides empirical evidence of its effectiveness and of a limitation of neural networks for language engineering. Precisely, we demonstrate that a neural language model based on long short-term memory (LSTM) effectively reproduces Zipf’s law and Heaps’ law, two representative statistical properties underlying natural language. We discuss the quality of reproducibility and the emergence of Zipf’s law and Heaps’ law as training progresses. We also point out that the neural language model has a limitation in reproducing long-range correlation, another statistical property of natural language. This understanding could provide a direction for improving the architectures of neural networks. PMID:29287076
ERIC Educational Resources Information Center
Liou, Hsien-Chin; Chang, Jason S; Chen, Hao-Jan; Lin, Chih-Cheng; Liaw, Meei-Ling; Gao, Zhao-Ming; Jang, Jyh-Shing Roger; Yeh, Yuli; Chuang, Thomas C.; You, Geeng-Neng
2006-01-01
This paper describes the development of an innovative web-based environment for English language learning with advanced data-driven and statistical approaches. The project uses various corpora, including a Chinese-English parallel corpus ("Sinorama") and various natural language processing (NLP) tools to construct effective English…
Jednoróg, Katarzyna; Bola, Łukasz; Mostowski, Piotr; Szwed, Marcin; Boguszewski, Paweł M; Marchewka, Artur; Rutkowski, Paweł
2015-05-01
In several countries natural sign languages were considered inadequate for education. Instead, new sign-supported systems were created, based on the belief that spoken/written language is grammatically superior. One such system, called SJM (system językowo-migowy), preserves the grammatical and lexical structure of spoken Polish and since the 1960s has been extensively employed in schools and on TV. Nevertheless, the Deaf community avoids using SJM for everyday communication, its preferred language being PJM (polski język migowy), a natural sign language, structurally and grammatically independent of spoken Polish and featuring classifier constructions (CCs). Here, for the first time, we use fMRI to compare the neural bases of natural vs. devised communication systems. Deaf signers were presented with three types of signed sentences (SJM and PJM with/without CCs). Consistent with previous findings, PJM with CCs compared to either SJM or PJM without CCs recruited the parietal lobes. The reverse comparison revealed activation in the anterior temporal lobes, suggesting increased semantic combinatory processes in lexical sign comprehension. Finally, PJM compared with SJM engaged left posterior superior temporal gyrus and anterior temporal lobe, areas crucial for sentence-level speech comprehension. We suggest that activity in these two areas reflects greater processing efficiency for naturally evolved sign language. Copyright © 2015 Elsevier Ltd. All rights reserved.
ERIC Educational Resources Information Center
Vlas, Radu Eduard
2012-01-01
Open source projects do have requirements; they are, however, mostly informal, text descriptions found in requests, forums, and other correspondence. Understanding such requirements provides insight into the nature of open source projects. Unfortunately, manual analysis of natural language requirements is time-consuming, and for large projects,…
Automatic Selection of Suitable Sentences for Language Learning Exercises
ERIC Educational Resources Information Center
Pilán, Ildikó; Volodina, Elena; Johansson, Richard
2013-01-01
In our study we investigated second and foreign language (L2) sentence readability, an area little explored so far in the case of several languages, including Swedish. The outcome of our research consists of two methods for sentence selection from native language corpora based on Natural Language Processing (NLP) and machine learning (ML)…
An ontology model for nursing narratives with natural language generation technology.
Min, Yul Ha; Park, Hyeoun-Ae; Jeon, Eunjoo; Lee, Joo Yun; Jo, Soo Jung
2013-01-01
The purpose of this study was to develop an ontology model to generate nursing narratives as natural as human language from the entity-attribute-value triplets of a detailed clinical model using natural language generation technology. The model was based on the types of information and documentation time of the information along the nursing process. The types of information are data characterizing the patient status, inferences made by the nurse from the patient data, and nursing actions selected by the nurse to change the patient status. This information was linked to the nursing process based on the time of documentation. We describe a case study illustrating the application of this model in an acute-care setting. The proposed model provides a strategy for designing an electronic nursing record system.
ERIC Educational Resources Information Center
Harbusch, Karin; Itsova, Gergana; Koch, Ulrich; Kuhner, Christine
2009-01-01
We built a natural language processing (NLP) system implementing a "virtual writing conference" for elementary-school children, with German as the target language. Currently, state-of-the-art computer support for writing tasks is restricted to multiple-choice questions or quizzes because automatic parsing of the often ambiguous and fragmentary…
Reconciliation of ontology and terminology to cope with linguistics.
Baud, Robert H; Ceusters, Werner; Ruch, Patrick; Rassinoux, Anne-Marie; Lovis, Christian; Geissbühler, Antoine
2007-01-01
To discuss the relationships between ontologies, terminologies and language in the context of Natural Language Processing (NLP) applications in order to show the negative consequences of confusing them. The viewpoints of the terminologist and (computational) linguist are developed separately, and then compared, leading to the presentation of reconciliation among these points of view, with consideration of the role of the ontologist. In order to encourage appropriate usage of terminologies, guidelines are presented advocating the simultaneous publication of pragmatic vocabularies supported by terminological material based on adequate ontological analysis. Ontologies, terminologies and natural languages each have their own purpose. Ontologies support machine understanding, natural languages support human communication, and terminologies should form the bridge between them. Therefore, future terminology standards should be based on sound ontology and do justice to the diversities in natural languages. Moreover, they should support local vocabularies, in order to be easily adaptable to local needs and practices.
Natural Language Processing and Game-Based Practice in iSTART
ERIC Educational Resources Information Center
Jackson, G. Tanner; Boonthum-Denecke, Chutima; McNamara, Danielle S.
2015-01-01
Intelligent Tutoring Systems (ITSs) are situated in a potential struggle between effective pedagogy and system enjoyment and engagement. iSTART, a reading strategy tutoring system in which students practice generating self-explanations and using reading strategies, employs two devices to engage the user. The first is natural language processing…
Linguistically Motivated Features for CCG Realization Ranking
ERIC Educational Resources Information Center
Rajkumar, Rajakrishnan
2012-01-01
Natural Language Generation (NLG) is the process of generating natural language text from an input, which is a communicative goal and a database or knowledge base. Informally, the architecture of a standard NLG system consists of the following modules (Reiter and Dale, 2000): content determination, sentence planning (or microplanning) and surface…
2017-03-01
Warfare. 14. SUBJECT TERMS: data mining, natural language processing, machine learning, algorithm design, information warfare, propaganda. 15. NUMBER OF...Speech Tags. Adapted from [12]. CC: Coordinating conjunction; PRP$: Possessive pronoun; CD: Cardinal number; RB: Adverb; DT: Determiner; RBR: Adverb, comparative ... comparative; UH: Interjection; JJS: Adjective, superlative; VB: Verb, base form; LS: List item marker; VBD: Verb, past tense; MD: Modal; VBG: Verb, gerund or
ERIC Educational Resources Information Center
Vilaseca, R.M.; Del Rio, M-J.
2004-01-01
Many child language studies emphasize the value of verbal and social support, of 'scaffolding' processes and mutual adjustments that naturally occur in adult-child interactions in everyday contexts. Based on such theories, this study attempted to improve the language and communication skills in children with special educational needs through…
Natural Language Processing: Toward Large-Scale, Robust Systems.
ERIC Educational Resources Information Center
Haas, Stephanie W.
1996-01-01
Natural language processing (NLP) is concerned with getting computers to do useful things with natural language. Major applications include machine translation, text generation, information retrieval, and natural language interfaces. Reviews important developments since 1987 that have led to advances in NLP; current NLP applications; and problems…
ERIC Educational Resources Information Center
Snyder, Robin M.
2015-01-01
In 2014, in conjunction with doing research in natural language processing and attending a global conference on computational linguistics, the author decided to learn a new foreign language, Greek, that uses a non-English character set. This paper/session will present/discuss an overview of the current state of natural language processing and…
Using natural language processing techniques to inform research on nanotechnology.
Lewinski, Nastassja A; McInnes, Bridget T
2015-01-01
Literature in the field of nanotechnology is exponentially increasing with more and more engineered nanomaterials being created, characterized, and tested for performance and safety. With the deluge of published data, there is a need for natural language processing approaches to semi-automate the cataloguing of engineered nanomaterials and their associated physico-chemical properties, performance, exposure scenarios, and biological effects. In this paper, we review the different informatics methods that have been applied to patent mining, nanomaterial/device characterization, nanomedicine, and environmental risk assessment. Nine natural language processing (NLP)-based tools were identified: NanoPort, NanoMapper, TechPerceptor, a Text Mining Framework, a Nanodevice Analyzer, a Clinical Trial Document Classifier, Nanotoxicity Searcher, NanoSifter, and NEIMiner. We conclude with recommendations for sharing NLP-related tools through online repositories to broaden participation in nanoinformatics.
ERIC Educational Resources Information Center
Klein, Ariel; Badia, Toni
2015-01-01
In this study we show how complex creative relations can arise from fairly frequent semantic relations observed in everyday language. By doing this, we reflect on some key cognitive aspects of linguistic and general creativity. In our experimentation, we automated the process of solving a battery of Remote Associates Test tasks. By applying…
Survey of Natural Language Processing Techniques in Bioinformatics.
Zeng, Zhiqiang; Shi, Hua; Wu, Yun; Hong, Zhiling
2015-01-01
Informatics methods, such as text mining and natural language processing, are always involved in bioinformatics research. In this study, we discuss text mining and natural language processing methods in bioinformatics from two perspectives. First, we aim to search for knowledge on biology, retrieve references using text mining methods, and reconstruct databases. For example, protein-protein interactions and gene-disease relationships can be mined from PubMed. Then, we analyze the applications of text mining and natural language processing techniques in bioinformatics, including predicting protein structure and function and detecting noncoding RNA. Finally, numerous methods and applications, as well as their contributions to bioinformatics, are discussed for future use by text mining and natural language processing researchers.
Khalifa, Abdulrahman; Meystre, Stéphane
2015-12-01
The 2014 i2b2 natural language processing shared task focused on identifying cardiovascular risk factors such as high blood pressure, high cholesterol levels, obesity and smoking status, among other factors found in health records of diabetic patients. In addition, the task involved detecting medications and time information associated with the extracted data. This paper presents the development and evaluation of a natural language processing (NLP) application conceived for this i2b2 shared task. For increased efficiency, the application's main components were adapted from two existing NLP tools implemented in the Apache UIMA framework: Textractor (for dictionary-based lookup) and cTAKES (for preprocessing and smoking status detection). The application achieved a final (micro-averaged) F1-measure of 87.5% on the final evaluation test set. Our attempt was mostly based on existing tools adapted with minimal changes and allowed for satisfying performance with limited development efforts. Copyright © 2015 Elsevier Inc. All rights reserved.
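As a reminder of what a micro-averaged F1-measure is: true positives, false positives and false negatives are pooled across all categories before precision and recall are computed, so frequent categories weigh more than rare ones. The sketch below shows that calculation; the category names and counts are invented for illustration and are not results from the shared task.

```python
def micro_f1(counts):
    """Micro-averaged F1 from per-category (true_pos, false_pos, false_neg) counts."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: category -> (true positives, false positives, false negatives).
example = {
    "hypertension": (8, 2, 1),
    "smoking": (4, 0, 3),
}

print(micro_f1(example))
```

With the pooled counts here (12 true positives, 2 false positives, 4 false negatives), precision is 12/14, recall is 12/16, and the resulting F1 is 24/30 = 0.8.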
Trombert-Paviot, B; Rodrigues, J M; Rogers, J E; Baud, R; van der Haring, E; Rassinoux, A M; Abrial, V; Clavel, L; Idir, H
2000-09-01
Generalised architecture for languages, encyclopedia and nomenclatures in medicine (GALEN) has developed a new generation of terminology tools based on a language-independent model that describes semantics and allows computer processing and multiple reuses, as well as natural language understanding applications, to facilitate the sharing and maintenance of consistent medical knowledge. During the European Union 4th Framework Programme project GALEN-IN-USE, and later within two contracts with the national health authorities, we applied the modelling and the tools to the development of a new multipurpose coding system for surgical procedures, named CCAM, in a minority-language country, France. On one hand, we contributed to a language-independent knowledge repository and multilingual semantic dictionaries for multicultural Europe. On the other hand, we supported the traditional, very labour-intensive process of creating a new coding system in medicine with artificial intelligence tools using a medically oriented recursive ontology and natural language processing. We used integrated software named CLAW (for classification workbench) to process French professional medical language rubrics, produced by domain experts from the national colleges of surgeons, into intermediate dissections and into the Grail reference ontology model representation. From this language-independent concept model representation, on one hand, we generate controlled French natural language with the LNAT natural language generator to support the finalization of the linguistic labels (first generation) in relation to the meanings of the conceptual system structure. On the other hand, the CLAW classification manager proves to be very powerful for retrieving the initial domain experts' rubrics list with different categories of concepts (second generation) within a semantically structured representation (third generation) that bridges to the detailed terminology of the electronic patient record.
Natural language information retrieval in digital libraries
DOE Office of Scientific and Technical Information (OSTI.GOV)
Strzalkowski, T.; Perez-Carballo, J.; Marinescu, M.
In this paper we report on some recent developments in the joint NYU and GE natural language information retrieval system. The main characteristic of this system is the use of advanced natural language processing to enhance the effectiveness of term-based document retrieval. The system is designed around a traditional statistical backbone consisting of the indexer module, which builds inverted index files from pre-processed documents, and a retrieval engine which searches and ranks the documents in response to user queries. Natural language processing is used to (1) preprocess the documents in order to extract content-carrying terms, (2) discover inter-term dependencies and build a conceptual hierarchy specific to the database domain, and (3) process user's natural language requests into effective search queries. This system has been used in NIST-sponsored Text Retrieval Conferences (TREC), where we worked with approximately 3.3 GBytes of text articles including material from the Wall Street Journal, the Associated Press newswire, the Federal Register, Ziff Communications's Computer Library, Department of Energy abstracts, U.S. Patents and the San Jose Mercury News, totaling more than 500 million words of English. The system has been designed to facilitate its scalability to deal with ever increasing amounts of data. In particular, a randomized index-splitting mechanism has been installed which allows the system to create a number of smaller indexes that can be independently and efficiently searched.
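The abstract does not say how the randomized index splitting is done. As a rough illustration only, the sketch below assumes each document is assigned at random to one of several smaller inverted indexes, each of which is then searched on its own. The names `build_split_indexes` and `search`, the whitespace tokenization and the Boolean AND matching are all assumptions, not details of the NYU/GE system.

```python
import random
from collections import defaultdict


def build_split_indexes(docs, n_splits, seed=0):
    """Assign each document to a random shard and build one inverted index per shard."""
    rng = random.Random(seed)
    indexes = [defaultdict(set) for _ in range(n_splits)]
    for doc_id, text in docs.items():
        shard = rng.randrange(n_splits)          # random shard for the whole document
        for term in text.lower().split():        # naive whitespace tokenization
            indexes[shard][term].add(doc_id)
    return indexes


def search(indexes, query):
    """Return ids of documents containing every query term, searching shards independently."""
    terms = query.lower().split()
    hits = set()
    if not terms:
        return hits
    for index in indexes:
        postings = [index.get(term, set()) for term in terms]
        hits |= set.intersection(*postings)      # AND within a shard, union across shards
    return hits


docs = {
    1: "wall street journal",
    2: "federal register notice",
    3: "street notice",
}
indexes = build_split_indexes(docs, n_splits=2)
result = search(indexes, "street notice")
```

Since a document lives entirely in one shard, the union of per-shard results equals the result from a single large index, so the shards could also be searched in parallel.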
Laboratory process control using natural language commands from a personal computer
NASA Technical Reports Server (NTRS)
Will, Herbert A.; Mackin, Michael A.
1989-01-01
PC software is described which provides flexible natural language process control capability with an IBM PC or compatible machine. Hardware requirements include the PC and suitable hardware interfaces to all controlled devices. Software required includes the Microsoft Disk Operating System (MS-DOS), a PC-based FORTRAN-77 compiler, and user-written device drivers. Instructions for use of the software are given as well as a description of an application of the system.
Using natural language processing techniques to inform research on nanotechnology
Lewinski, Nastassja A
2015-01-01
Literature in the field of nanotechnology is increasing exponentially, with more and more engineered nanomaterials being created, characterized, and tested for performance and safety. With the deluge of published data, there is a need for natural language processing approaches to semi-automate the cataloguing of engineered nanomaterials and their associated physico-chemical properties, performance, exposure scenarios, and biological effects. In this paper, we review the different informatics methods that have been applied to patent mining, nanomaterial/device characterization, nanomedicine, and environmental risk assessment. Nine natural language processing (NLP)-based tools were identified: NanoPort, NanoMapper, TechPerceptor, a Text Mining Framework, a Nanodevice Analyzer, a Clinical Trial Document Classifier, Nanotoxicity Searcher, NanoSifter, and NEIMiner. We conclude with recommendations for sharing NLP-related tools through online repositories to broaden participation in nanoinformatics. PMID:26199848
TOWARDS A MULTI-SCALE AGENT-BASED PROGRAMMING LANGUAGE METHODOLOGY
Somogyi, Endre; Hagar, Amit; Glazier, James A.
2017-01-01
Living tissues are dynamic, heterogeneous compositions of objects, including molecules, cells and extra-cellular materials, which interact via chemical, mechanical and electrical processes and reorganize via transformation, birth, death and migration processes. Current programming languages have difficulty describing the dynamics of tissues because: 1: Dynamic sets of objects participate simultaneously in multiple processes, 2: Processes may be either continuous or discrete, and their activity may be conditional, 3: Objects and processes form complex, heterogeneous relationships and structures, 4: Objects and processes may be hierarchically composed, 5: Processes may create, destroy and transform objects and processes. Some modeling languages support these concepts, but most cannot translate models into executable simulations. We present a new hybrid executable modeling language paradigm, the Continuous Concurrent Object Process Methodology (CCOPM), which naturally expresses tissue models, enabling users to visually create agent-based models of tissues, and also allows computer simulation of these models. PMID:29282379
TOWARDS A MULTI-SCALE AGENT-BASED PROGRAMMING LANGUAGE METHODOLOGY.
Somogyi, Endre; Hagar, Amit; Glazier, James A
2016-12-01
Living tissues are dynamic, heterogeneous compositions of objects, including molecules, cells and extra-cellular materials, which interact via chemical, mechanical and electrical processes and reorganize via transformation, birth, death and migration processes. Current programming languages have difficulty describing the dynamics of tissues because: 1: Dynamic sets of objects participate simultaneously in multiple processes, 2: Processes may be either continuous or discrete, and their activity may be conditional, 3: Objects and processes form complex, heterogeneous relationships and structures, 4: Objects and processes may be hierarchically composed, 5: Processes may create, destroy and transform objects and processes. Some modeling languages support these concepts, but most cannot translate models into executable simulations. We present a new hybrid executable modeling language paradigm, the Continuous Concurrent Object Process Methodology (CCOPM), which naturally expresses tissue models, enabling users to visually create agent-based models of tissues, and also allows computer simulation of these models.
The Function of Semantics in Automated Language Processing.
ERIC Educational Resources Information Center
Pacak, Milos; Pratt, Arnold W.
This paper is a survey of some of the major semantic models that have been developed for automated semantic analysis of natural language. Current approaches to semantic analysis and logical inference are based mainly on models of human cognitive processes such as Quillian's semantic memory, Simmons' Protosynthex III and others. All existing…
Zheng, Kai; Mei, Qiaozhu; Yang, Lei; Manion, Frank J.; Balis, Ulysses J.; Hanauer, David A.
2011-01-01
In this study, we comparatively examined the linguistic properties of narrative clinician notes created through voice dictation versus those directly entered by clinicians via a computer keyboard. Intuitively, the nature of voice-dictated notes would resemble that of natural language, while typed-in notes may demonstrate distinctive language features for reasons such as intensive usage of acronyms. The study analyses were based on an empirical dataset retrieved from our institutional electronic health records system. The dataset contains 30,000 voice-dictated notes and 30,000 notes that were entered manually; both were encounter notes generated in ambulatory care settings. The results suggest that there are considerable lexical and distributional differences between the narrative clinician notes created via these two methods. Such differences could have a significant impact on the performance of natural language processing tools, so the two types of documents may need to be treated differently. PMID:22195229
Language Analysis Package (L.A.P.) Version I System Design.
ERIC Educational Resources Information Center
Porch, Ann
To permit researchers to use the speed and versatility of the computer to process natural language text as well as numerical data without undergoing special training in programing or computer operations, a language analysis package has been developed partially based on several existing programs. An overview of the design is provided and system…
Enhancing Grammatical Structures in Web-Based Texts
ERIC Educational Resources Information Center
Zilio, Leonardo; Wilkens, Rodrigo; Fairon, Cédrick
2017-01-01
Presentation of raw text to language learners is not enough to ensure learning. Thus, we present the Smart and Immersive Language Learning Environment (SMILLE), a system that uses Natural Language Processing (NLP) for enhancing grammatical information in texts chosen by a given user. The enhancements, carried out by means of text highlighting, are…
A Morphological Analyzer for Vocalized or Not Vocalized Arabic Language
NASA Astrophysics Data System (ADS)
El Amine Abderrahim, Med; Breksi Reguig, Fethi
This research presents the realization of a morphological analyzer for the Arabic language (vocalized or not vocalized). The analyzer is based on our object model for Arabic Natural Language Processing (NLP) and can be exploited by NLP applications such as machine translation, orthographic correction and information retrieval.
The "Anchor" Method: Principle and Practice.
ERIC Educational Resources Information Center
Selgin, Paul
This report discusses the "anchor" language learning method that is based upon derivation rather than construction, using Italian as an example of a language to be learned. This method borrows from the natural process of language learning as it asks the student to remember whole expressions that serve as vehicles for learning both words…
The semantic web and computer vision: old AI meets new AI
NASA Astrophysics Data System (ADS)
Mundy, J. L.; Dong, Y.; Gilliam, A.; Wagner, R.
2018-04-01
There has been vast progress in linking semantic information across the billions of web pages through the use of ontologies encoded in the Web Ontology Language (OWL) based on the Resource Description Framework (RDF). A prime example is Wikipedia, where the knowledge contained in its more than four million pages is encoded in an ontological database called DBPedia http://wiki.dbpedia.org/. Web-based query tools can retrieve semantic information from DBPedia encoded in interlinked ontologies that can be accessed using natural language. This paper will show how this vast context can be used to automate the process of querying images and other geospatial data in support of reporting changes in structures and activities. Computer vision algorithms are selected and provided with context based on natural language requests for monitoring and analysis. The resulting reports provide semantically linked observations from images and 3D surface models.
NASA Technical Reports Server (NTRS)
Gomez, Fernando
1989-01-01
It is shown how certain kinds of domain independent expert systems based on classification problem-solving methods can be constructed directly from natural language descriptions by a human expert. The expert knowledge is not translated into production rules. Rather, it is mapped into conceptual structures which are integrated into long-term memory (LTM). The resulting system is one in which problem-solving, retrieval and memory organization are integrated processes. In other words, the same algorithm and knowledge representation structures are shared by these processes. As a result of this, the system can answer questions, solve problems or reorganize LTM.
Efficient Caption-Based Retrieval of Multimedia Information
1993-10-09
in the design of transportable natural language interfaces. Artificial Intelligence, 32 (1987), 173-243. - 13- (101 Jones, M. and Eisner, J. A...systems for multimedia data. They exploit captions on the data and perform natural-language processing of them and English retrieval requests. Some...content analysis of the data is also performed to obtain additional descriptive information. The key to getting this approach to work is sufficiently
A Cognitive Neural Architecture Able to Learn and Communicate through Natural Language.
Golosio, Bruno; Cangelosi, Angelo; Gamotina, Olesya; Masala, Giovanni Luca
2015-01-01
Communicative interactions involve a kind of procedural knowledge that is used by the human brain for processing verbal and nonverbal inputs and for language production. Although considerable work has been done on modeling human language abilities, it has been difficult to bring them together to a comprehensive tabula rasa system compatible with current knowledge of how verbal information is processed in the brain. This work presents a cognitive system, entirely based on a large-scale neural architecture, which was developed to shed light on the procedural knowledge involved in language elaboration. The main component of this system is the central executive, which is a supervising system that coordinates the other components of the working memory. In our model, the central executive is a neural network that takes as input the neural activation states of the short-term memory and yields as output mental actions, which control the flow of information among the working memory components through neural gating mechanisms. The proposed system is capable of learning to communicate through natural language starting from tabula rasa, without any a priori knowledge of the structure of phrases, the meaning of words, or the role of the different classes of words, only by interacting with a human through a text-based interface, using an open-ended incremental learning process. It is able to learn nouns, verbs, adjectives, pronouns and other word classes, and to use them in expressive language. The model was validated on a corpus of 1587 input sentences, based on literature on early language assessment, at the level of a child about 4 years old, and produced 521 output sentences, expressing a broad range of language processing functionalities.
NASA Astrophysics Data System (ADS)
Rustamov, Samir; Mustafayev, Elshan; Clements, Mark A.
2018-04-01
This paper investigates context analysis of customer requests in the natural language call routing problem. One of the most significant problems in natural language call routing is comprehension of the client's request. To address this issue, hybrid HMM and ANFIS models are examined. Combining different types of models (ANFIS and HMM) can prevent the dialogue system from misidentifying user intention. Because the classification process does not rely on lexical or syntactic analysis, a hybrid system based on these models may be employed in various language and call routing domains.
The KIT Motion-Language Dataset.
Plappert, Matthias; Mandery, Christian; Asfour, Tamim
2016-12-01
Linking human motion and natural language is of great interest for the generation of semantic representations of human activities as well as for the generation of robot activities based on natural language input. However, although there have been years of research in this area, no standardized and openly available data set exists to support the development and evaluation of such systems. We, therefore, propose the Karlsruhe Institute of Technology (KIT) Motion-Language Dataset, which is large, open, and extensible. We aggregate data from multiple motion capture databases and include them in our data set using a unified representation that is independent of the capture system or marker set, making it easy to work with the data regardless of its origin. To obtain motion annotations in natural language, we apply a crowd-sourcing approach and a web-based tool that was specifically built for this purpose, the Motion Annotation Tool. We thoroughly document the annotation process itself and discuss gamification methods that we used to keep annotators motivated. We further propose a novel method, perplexity-based selection, which systematically selects motions for further annotation that are either under-represented in our data set or that have erroneous annotations. We show that our method mitigates the two aforementioned problems and ensures a systematic annotation process. We provide an in-depth analysis of the structure and contents of our resulting data set, which, as of October 10, 2016, contains 3911 motions with a total duration of 11.23 hours and 6278 annotations in natural language that contain 52,903 words. We believe this makes our data set an excellent choice that enables more transparent and comparable research in this important area.
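The abstract names perplexity-based selection but does not specify the language model behind it. The sketch below assumes a simple add-one-smoothed unigram model trained on all existing annotations: annotations that the model finds surprising (high perplexity) are ranked first, on the reasoning that they describe rare motions or contain errors. The function names and the example annotations are hypothetical and are not taken from the KIT tooling.

```python
import math
from collections import Counter


def unigram_perplexities(annotations):
    """Perplexity of each annotation under an add-one-smoothed unigram model of all annotations."""
    tokenized = [a.lower().split() for a in annotations]
    counts = Counter(tok for toks in tokenized for tok in toks)
    total = sum(counts.values())
    vocab = len(counts)
    perplexities = []
    for toks in tokenized:
        if not toks:
            perplexities.append(float("inf"))   # an empty annotation is maximally suspicious
            continue
        log_prob = sum(math.log((counts[t] + 1) / (total + vocab)) for t in toks)
        perplexities.append(math.exp(-log_prob / len(toks)))
    return perplexities


def select_for_annotation(annotations, k):
    """Indices of the k annotations with the highest perplexity (stable for ties)."""
    ppl = unigram_perplexities(annotations)
    return sorted(range(len(annotations)), key=lambda i: ppl[i], reverse=True)[:k]


# Hypothetical annotations (illustration only).
annotations = [
    "a person walks forward",
    "a person walks forward",
    "a person walks backward",
    "someone performs cartwheel",
]
chosen = select_for_annotation(annotations, k=2)
```

A stronger language model would change the scores but not the selection principle of ranking by perplexity.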
Integrated Intelligence: Robot Instruction via Interactive Grounded Learning
2016-02-14
ADDRESS (ES) U.S. Army Research Office P.O. Box 12211 Research Triangle Park, NC 27709-2211 Robotics; Natural Language Processing; Grounded Language ...Logical Forms for Referring Expression Generation, Empirical Methods in Natural Language Processing (EMNLP). 18-OCT-13, . : , Tom Kwiatkowska, Eunsol...Choi, Yoav Artzi, Luke Zettlemoyer. Scaling Semantic Parsers with On-the-fly Ontology Matching, Empirical Methods in Natural Language Processing
ERIC Educational Resources Information Center
Russell, Dale W.
An obstacle in Natural Language understanding is the existence of lexical gaps, i.e. words or word senses that are not in the lexicon of the system. This thesis describes the implementation of MURRAY, a learning mechanism which infers the properties of a new lexical item from its syntactical environment and infers its meaning based on context and…
Sharma, Vivekanand; Law, Wayne; Balick, Michael J; Sarkar, Indra Neil
2017-01-01
The growing amount of data describing historical medicinal uses of plants from digitization efforts provides the opportunity to develop systematic approaches for identifying potential plant-based therapies. However, cataloguing plant use information from natural language text is a challenging task for ethnobotanists. To date, there has been only limited adoption of informatics approaches for supporting the identification of ethnobotanical information associated with medicinal uses. This study explored the feasibility of using biomedical terminologies and natural language processing approaches for extracting relevant plant-associated therapeutic use information from the historical biodiversity literature collection available from the Biodiversity Heritage Library. The results from this preliminary study suggest that there is potential utility of informatics methods to identify medicinal plant knowledge from digitized resources as well as highlight opportunities for improvement.
Sharma, Vivekanand; Law, Wayne; Balick, Michael J.; Sarkar, Indra Neil
2017-01-01
The growing amount of data describing historical medicinal uses of plants from digitization efforts provides the opportunity to develop systematic approaches for identifying potential plant-based therapies. However, cataloguing plant use information from natural language text is a challenging task for ethnobotanists. To date, there has been only limited adoption of informatics approaches for supporting the identification of ethnobotanical information associated with medicinal uses. This study explored the feasibility of using biomedical terminologies and natural language processing approaches for extracting relevant plant-associated therapeutic use information from the historical biodiversity literature collection available from the Biodiversity Heritage Library. The results from this preliminary study suggest that there is potential utility of informatics methods to identify medicinal plant knowledge from digitized resources as well as highlight opportunities for improvement. PMID:29854223
Automated speech understanding: the next generation
NASA Astrophysics Data System (ADS)
Picone, J.; Ebel, W. J.; Deshmukh, N.
1995-04-01
Modern speech understanding systems merge interdisciplinary technologies from Signal Processing, Pattern Recognition, Natural Language, and Linguistics into a unified statistical framework. These systems, which have applications in a wide range of signal processing problems, represent a revolution in Digital Signal Processing (DSP). Once a field dominated by vector-oriented processors and linear algebra-based mathematics, the current generation of DSP-based systems relies on sophisticated statistical models implemented using a complex software paradigm. Such systems are now capable of understanding continuous speech input for vocabularies of several thousand words in operational environments. The current generation of deployed systems, based on small vocabularies of isolated words, will soon be replaced by a new technology offering natural language access to vast information resources such as the Internet and providing completely automated voice interfaces for mundane tasks such as travel planning and directory assistance.
Neural Network Computing and Natural Language Processing.
ERIC Educational Resources Information Center
Borchardt, Frank
1988-01-01
Considers the application of neural network concepts to traditional natural language processing and demonstrates that neural network computing architecture can: (1) learn from actual spoken language; (2) observe rules of pronunciation; and (3) reproduce sounds from the patterns derived by its own processes. (Author/CB)
Text Information Extraction System (TIES) | Informatics Technology for Cancer Research (ITCR)
TIES is a service based software system for acquiring, deidentifying, and processing clinical text reports using natural language processing, and also for querying, sharing and using this data to foster tissue and image based research, within and between institutions.
Danforth, Kim N; Early, Megan I; Ngan, Sharon; Kosco, Anne E; Zheng, Chengyi; Gould, Michael K
2012-08-01
Lung nodules are commonly encountered in clinical practice, yet little is known about their management in community settings. An automated method for identifying patients with lung nodules would greatly facilitate research in this area. Using members of a large, community-based health plan from 2006 to 2010, we developed a method to identify patients with lung nodules, by combining five diagnostic codes, four procedural codes, and a natural language processing algorithm that performed free text searches of radiology transcripts. An experienced pulmonologist reviewed a random sample of 116 radiology transcripts, providing a reference standard for the natural language processing algorithm. With the use of an automated method, we identified 7112 unique members as having one or more incident lung nodules. The mean age of the patients was 65 years (standard deviation 14 years). There were slightly more women (54%) than men, and Hispanics and non-whites comprised 45% of the lung nodule cohort. Thirty-six percent were never smokers whereas 11% were current smokers. Fourteen percent of the patients were subsequently diagnosed with lung cancer. The sensitivity and specificity of the natural language processing algorithm for identifying the presence of lung nodules were 96% and 86%, respectively, compared with clinician review. Among the true positive transcripts in the validation sample, only 35% were solitary and unaccompanied by one or more associated findings, and 56% measured 8 to 30 mm in diameter. A combination of diagnostic codes, procedural codes, and a natural language processing algorithm for free text searching of radiology reports can accurately and efficiently identify patients with incident lung nodules, many of whom are subsequently diagnosed with lung cancer.
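Sensitivity and specificity, as reported above against clinician review, can be computed from paired reference and algorithm labels. The sketch below shows the arithmetic only; the function name and the five example labels are hypothetical and do not reproduce the study's 116-transcript validation sample.

```python
def sensitivity_specificity(reference, predicted):
    """Sensitivity and specificity of boolean predictions against a boolean reference standard."""
    tp = sum(1 for r, p in zip(reference, predicted) if r and p)
    fn = sum(1 for r, p in zip(reference, predicted) if r and not p)
    tn = sum(1 for r, p in zip(reference, predicted) if not r and not p)
    fp = sum(1 for r, p in zip(reference, predicted) if not r and p)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # share of true nodules detected
    specificity = tn / (tn + fp) if tn + fp else 0.0   # share of non-nodules correctly rejected
    return sensitivity, specificity


# Hypothetical labels: True means "transcript describes a lung nodule".
reference = [True, True, True, False, False]   # clinician review
predicted = [True, True, False, False, True]   # algorithm output
sens, spec = sensitivity_specificity(reference, predicted)
```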
Policy Process Editor for P3BM Software
NASA Technical Reports Server (NTRS)
James, Mark; Chang, Hsin-Ping; Chow, Edward T.; Crichton, Gerald A.
2010-01-01
A computer program enables generation, in the form of graphical representations of process flows with embedded natural-language policy statements, input to a suite of policy-, process-, and performance-based management (P3BM) software. This program (1) serves as an interface between users and the Hunter software, which translates the input into machine-readable form; and (2) enables users to initialize and monitor the policy-implementation process. This program provides an intuitive graphical interface for incorporating natural-language policy statements into business-process flow diagrams. Thus, the program enables users who dictate policies to intuitively embed their intended process flows as they state the policies, reducing the likelihood of errors and reducing the time between declaration and execution of policy.
Rodrigues, J M; Trombert-Paviot, B; Baud, R; Wagner, J; Meusnier-Carriot, F
1998-01-01
GALEN has developed a language-independent common reference model based on a medically oriented ontology, together with practical tools and techniques for managing healthcare terminology, including natural language processing. GALEN-IN-USE is the current phase, which applies the modelling and the tools to the development or updating of coding systems for surgical procedures in different national coding centers co-operating within the European Federation of Coding Centre (EFCC) to create a language-independent knowledge repository for multicultural Europe. We used an integrated set of artificial intelligence terminology tools named CLAssification Manager workbench to process French professional medical language rubrics into intermediate dissections and into the Grail reference ontology model representation. From this language-independent concept model representation we generate controlled French natural language. The French national coding centre is then able to retrieve the initial professional rubrics with different categories of concepts, to compare the professional language proposed by expert clinicians to the French generated controlled vocabulary and to finalize the linguistic labels of the coding system in relation to the meanings of the conceptual system structure.
Natural Language Processing in Game Studies Research: An Overview
ERIC Educational Resources Information Center
Zagal, Jose P.; Tomuro, Noriko; Shepitsen, Andriy
2012-01-01
Natural language processing (NLP) is a field of computer science and linguistics devoted to creating computer systems that use human (natural) language as input and/or output. The authors propose that NLP can also be used for game studies research. In this article, the authors provide an overview of NLP and describe some research possibilities…
Neural correlates of fixation duration in natural reading: Evidence from fixation-related fMRI.
Henderson, John M; Choi, Wonil; Luke, Steven G; Desai, Rutvik H
2015-10-01
A key assumption of current theories of natural reading is that fixation duration reflects underlying attentional, language, and cognitive processes associated with text comprehension. The neurocognitive correlates of this relationship are currently unknown. To investigate this relationship, we compared neural activation associated with fixation duration in passage reading and a pseudo-reading control condition. The results showed that fixation duration was associated with activation in oculomotor and language areas during text reading. Fixation duration during pseudo-reading, on the other hand, showed greater involvement of frontal control regions, suggesting flexibility and task dependency of the eye movement network. Consistent with current models, these results provide support for the hypothesis that fixation duration in reading reflects attentional engagement and language processing. The results also demonstrate that fixation-related fMRI provides a method for investigating the neurocognitive bases of natural reading. Copyright © 2015 Elsevier Inc. All rights reserved.
Paradigms of Evaluation in Natural Language Processing: Field Linguistics for Glass Box Testing
ERIC Educational Resources Information Center
Cohen, Kevin Bretonnel
2010-01-01
Although software testing has been well-studied in computer science, it has received little attention in natural language processing. Nonetheless, a fully developed methodology for glass box evaluation and testing of language processing applications already exists in the field methods of descriptive linguistics. This work lays out a number of…
ERIC Educational Resources Information Center
Serna Dimas, Hector Manuel
2013-01-01
Literacy is one of the most fundamental processes in the life of people. It is complex enough when people develop these processes in their first language, and the nature of the task becomes even more challenging when it is developed with students in a second language within the context of a bilingual setting. Bilingual education has been based on…
Somogyi, Endre; Glazier, James A.
2017-01-01
Biological cells are the prototypical example of active matter. Cells sense and respond to mechanical, chemical and electrical environmental stimuli with a range of behaviors, including dynamic changes in morphology and mechanical properties, chemical uptake and secretion, cell differentiation, proliferation, death, and migration. Modeling and simulation of such dynamic phenomena pose a number of computational challenges. A modeling language describing cellular dynamics must naturally represent complex intra- and extra-cellular spatial structures and coupled mechanical, chemical and electrical processes. Domain experts will find a modeling language most useful when it is based on concepts, terms and principles native to the problem domain. A compiler must then be able to generate an executable model from this physically motivated description. Finally, an executable model must efficiently calculate the time evolution of such dynamic and inhomogeneous phenomena. We present a spatial hybrid systems modeling language, compiler and mesh-free Lagrangian-based simulation engine which will enable domain experts to define models using natural, biologically motivated constructs and to simulate time evolution of coupled cellular, mechanical and chemical processes acting on a time-varying number of cells and their environment. PMID:29303160
Acquiring and processing verb argument structure: distributional learning in a miniature language.
Wonnacott, Elizabeth; Newport, Elissa L; Tanenhaus, Michael K
2008-05-01
Adult knowledge of a language involves correctly balancing lexically-based and more language-general patterns. For example, verb argument structures may sometimes readily generalize to new verbs, yet with particular verbs may resist generalization. From the perspective of acquisition, this creates significant learnability problems, with some researchers claiming a crucial role for verb semantics in the determination of when generalization may and may not occur. Similarly, there has been debate regarding how verb-specific and more generalized constraints interact in sentence processing and on the role of semantics in this process. The current work explores these issues using artificial language learning. In three experiments using languages without semantic cues to verb distribution, we demonstrate that learners can acquire both verb-specific and verb-general patterns, based on distributional information in the linguistic input regarding each of the verbs as well as across the language as a whole. As with natural languages, these factors are shown to affect production, judgments and real-time processing. We demonstrate that learners apply a rational procedure in determining their usage of these different input statistics and conclude by suggesting that a Bayesian perspective on statistical learning may be an appropriate framework for capturing our findings.
Trombert-Paviot, B; Rodrigues, J M; Rogers, J E; Baud, R; van der Haring, E; Rassinoux, A M; Abrial, V; Clavel, L; Idir, H
1999-01-01
GALEN has developed a new generation of terminology tools based on a language-independent concept reference model using a compositional formalism that allows computer processing and multiple reuses. During the 4th framework program project Galen-In-Use, we applied the modelling and the tools to the development of a new multipurpose coding system for surgical procedures (CCAM) in France. On one hand, we contributed to a language-independent knowledge repository for multicultural Europe. On the other hand, we supported the traditional, very labour-intensive process of creating a new coding system in medicine with artificial intelligence tools using a medically oriented recursive ontology and natural language processing. We used an integrated software tool named CLAW to process French professional medical language rubrics produced by the national colleges of surgeons into intermediate dissections and then into the Grail reference ontology model representation. From this language-independent concept model representation, on one hand we generate controlled French natural language to support the finalization of the linguistic labels in relation to the meanings of the conceptual system structure. On the other hand, the third-generation classification manager proves to be very powerful for retrieving the initial professional rubrics with different categories of concepts within a semantic network.
NASA Astrophysics Data System (ADS)
Vasishth, Shravan
2017-07-01
This interesting and informative review by Liu and colleagues [17] in this issue covers the full spectrum of research on the idea that in natural language, dependency distance tends to be small. The authors discuss two distinct research threads: experimental work from psycholinguistics on online processes in comprehension and production, and text-corpus studies of dependency length distributions.
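As a rough illustration of the quantity discussed in the abstract above, dependency distance is commonly taken to be the absolute difference between the linear positions of a word and its syntactic head. The sketch below assumes a sentence is given as a list of 1-based head positions with 0 marking the root; the function names and the example parse are illustrative assumptions, not taken from the reviewed article.

```python
def dependency_distances(heads):
    """Return |position - head position| for every non-root word.

    heads[i] is the 1-based position of the head of word i+1;
    a value of 0 marks the root, which has no incoming dependency.
    """
    return [abs(pos - head)
            for pos, head in enumerate(heads, start=1)
            if head != 0]


def mean_dependency_distance(heads):
    """Average dependency distance of one sentence (0.0 if it has no dependencies)."""
    distances = dependency_distances(heads)
    return sum(distances) / len(distances) if distances else 0.0


# Hypothetical parse of "The dog chased the cat":
# The -> dog, dog -> chased, chased = root, the -> cat, cat -> chased
example_heads = [2, 3, 0, 5, 3]
print(dependency_distances(example_heads))
print(mean_dependency_distance(example_heads))
```

Averaging this value over a treebank gives the kind of corpus-level statistic that text-corpus studies of dependency length report; the exact normalisation used in any particular study may differ.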
Integration of Speech and Natural Language
1988-04-01
major activities: • Development of the syntax and semantics components for natural language processing. • Integration of the developed syntax and...evaluating the performance of speech recognition algorithms developed under the Strategic Computing Program. Our work on natural language processing...included the development of a grammar (syntax) that uses the Unification grammar formalism (an augmented context-free formalism). The Unification
Chen, W; Kowatch, R; Lin, S; Splaingard, M; Huang, Y
2015-01-01
Nationwide Children's Hospital established an i2b2 (Informatics for Integrating Biology & the Bedside) application for sleep disorder cohort identification. Discrete data were gleaned from semi-structured sleep study reports. The system was shown to work more efficiently than the traditional manual chart review method, and it also enabled searching capabilities that were previously not possible. We report on the development and implementation of the sleep disorder i2b2 cohort identification system using natural language processing of semi-structured documents. We developed a natural language processing approach to automatically parse concepts and their values from semi-structured sleep study documents. Two parsers were developed: a regular expression parser for extracting numeric concepts and an NLP-based tree parser for extracting textual concepts. Concepts were further organized into i2b2 ontologies based on document structures and in-domain knowledge. 26,550 concepts were extracted, with 99% being textual concepts. 1.01 million facts were extracted from sleep study documents, such as demographic information, sleep study lab results, medications, procedures, and diagnoses, among others. The average accuracy of terminology parsing was over 83% when compared against parsing by experts. The system is capable of capturing both standard and non-standard terminologies. The time for cohort identification has been reduced significantly, from a few weeks to a few seconds. Natural language processing was shown to be powerful for quickly converting large amounts of semi-structured or unstructured clinical data into discrete concepts, which, in combination with intuitive domain-specific ontologies, allows fast and effective interactive cohort identification through the i2b2 platform for research and clinical use.
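The regular expression parser for numeric concepts mentioned in the abstract above can be pictured with a minimal sketch. The report layout ("name: value unit" on each line), the field names and the pattern itself are assumptions made for illustration; the abstract does not describe the actual document format or the expressions the authors used.

```python
import re

# Assumed layout: "<concept name>: <number> <optional unit>" on one line.
NUMERIC_FIELD = re.compile(
    r"^\s*(?P<name>[A-Za-z][A-Za-z0-9 /()%-]*?)\s*:\s*"
    r"(?P<value>-?\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-z%/]+)?\s*$"
)


def extract_numeric_concepts(report):
    """Map each numeric concept name to a (value, unit) pair; skip free-text lines."""
    facts = {}
    for line in report.splitlines():
        match = NUMERIC_FIELD.match(line)
        if match:
            name = match.group("name").strip()
            facts[name] = (float(match.group("value")), match.group("unit"))
    return facts


# Hypothetical report fragment, invented for this example.
sample_report = """Total sleep time: 412.5 min
Sleep efficiency: 87 %
AHI: 3.2 /hr
Impression: mild obstructive sleep apnea"""

print(extract_numeric_concepts(sample_report))
```

Lines whose value is free text, such as the impression line, fall through the pattern; in the system described above those would presumably be left to the tree parser for textual concepts.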
Massaro, Dominic W
2012-01-01
I review 2 seminal research reports published in this journal during its second decade more than a century ago. Given psychology's subdisciplines, they would not normally be reviewed together because one involves reading and the other speech perception. The small amount of interaction between these domains might have limited research and theoretical progress. In fact, the 2 early research reports revealed common processes involved in these 2 forms of language processing. Their illustration of the role of Wundt's apperceptive process in reading and speech perception anticipated descriptions of contemporary theories of pattern recognition, such as the fuzzy logical model of perception. Based on the commonalities between reading and listening, one can question why they have been viewed so differently. It is commonly believed that learning to read requires formal instruction and schooling, whereas spoken language is acquired from birth onward through natural interactions with people who talk. Most researchers and educators believe that spoken language is acquired naturally from birth onward and even prenatally. Learning to read, on the other hand, is not possible until the child has acquired spoken language, reaches school age, and receives formal instruction. If an appropriate form of written text is made available early in a child's life, however, the current hypothesis is that reading will also be learned inductively and emerge naturally, with no significant negative consequences. If this proposal is true, it should soon be possible to create an interactive system, Technology Assisted Reading Acquisition, to allow children to acquire literacy naturally.
Modelling of internal architecture of kinesin nanomotor as a machine language.
Khataee, H R; Ibrahim, M Y
2012-09-01
Kinesin is a protein-based natural nanomotor that transports molecular cargoes within cells by walking along microtubules. The kinesin nanomotor is considered a bio-nanoagent which is able to sense the cell through its sensors (i.e. its heads and tail), make decisions internally and perform actions on the cell through its actuator (i.e. its motor domain). The study maps the agent-based architectural model of the internal decision-making process of the kinesin nanomotor to a machine language using an automata algorithm. The applied automata algorithm receives the internal agent-based architectural model of the kinesin nanomotor as a deterministic finite automaton (DFA) model and generates a regular machine language. The generated regular machine language was accepted by the architectural DFA model of the nanomotor and was also in good agreement with its natural behaviour. The internal agent-based architectural model of the kinesin nanomotor indicates the degree of autonomy and intelligence of the nanomotor's interactions with its cell. Thus, our developed regular machine language can model the degree of autonomy and intelligence of the kinesin nanomotor's interactions with its cell as a language. Modelling the internal architectures of autonomous and intelligent bio-nanosystems as machine languages can lay the foundation for the concept of bio-nanoswarms and the next phases of bio-nanorobotic systems development.
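The relationship between a DFA and the regular language it accepts can be sketched as follows. The three states and the symbols "a", "h" and "r" are a toy cycle invented for this example (loosely standing for ATP binding, hydrolysis and release); they are not the architectural model from the study, whose states and transitions are not given in the abstract.

```python
from itertools import product


class DFA:
    """Deterministic finite automaton with a partial transition function."""

    def __init__(self, transitions, start, accepting):
        self.transitions = transitions  # {(state, symbol): next_state}
        self.start = start
        self.accepting = accepting

    def accepts(self, word):
        """Run the automaton over word; a missing transition rejects."""
        state = self.start
        for symbol in word:
            state = self.transitions.get((state, symbol))
            if state is None:
                return False
        return state in self.accepting

    def language(self, alphabet, max_len):
        """Enumerate all accepted words up to max_len by brute force."""
        return ["".join(word)
                for length in range(max_len + 1)
                for word in product(alphabet, repeat=length)
                if self.accepts(word)]


# Hypothetical three-state cycle: idle -a-> bound -h-> stepped -r-> idle
toy_motor = DFA(
    transitions={("idle", "a"): "bound",
                 ("bound", "h"): "stepped",
                 ("stepped", "r"): "idle"},
    start="idle",
    accepting={"idle"},
)

print(toy_motor.language("ahr", 6))
```

For this toy automaton the accepted words are repetitions of "ahr", i.e. the regular language (ahr)*. Brute-force enumeration is only practical for small alphabets and short words; it is used here to keep the sketch short.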
Selecting the Best Mobile Information Service with Natural Language User Input
NASA Astrophysics Data System (ADS)
Feng, Qiangze; Qi, Hongwei; Fukushima, Toshikazu
Information services accessed via mobile phones provide information directly relevant to subscribers' daily lives and are an area of dynamic market growth worldwide. Although many information services are currently offered by mobile operators, many of the existing solutions require a unique gateway for each service, and it is inconvenient for users to have to remember a large number of such gateways. Furthermore, the Short Message Service (SMS) is very popular in China, and Chinese users would prefer to access these services in natural language via SMS. This chapter describes a Natural Language Based Service Selection System (NL3S) for use with a large number of mobile information services. The system can accept user queries in natural language and route them to the required service. Since it is difficult for existing methods to achieve high accuracy and high coverage and to anticipate which other services a user might want to query, the NL3S is developed based on a Multi-service Ontology (MO) and a Multi-service Query Language (MQL). The MO and MQL provide semantic and linguistic knowledge, respectively, to facilitate service selection for a user query and to provide adaptive service recommendations. Experiments show that the NL3S can achieve 75-95% accuracy and 85-95% user satisfaction when processing various styles of natural language queries. A trial involving navigation of 30 different mobile services shows that the NL3S can provide a viable commercial solution for mobile operators.
NASA Astrophysics Data System (ADS)
Liu, Bingli; Chen, Xinying
2017-07-01
In the target article [1], Liu et al. provide an informative introduction to dependency distance studies and propose that syntactic patterns related to dependency distance are associated with human cognitive mechanisms, such as limited working memory and syntax processing. Therefore, such syntactic patterns are probably 'human-driven' language universals. The article also gives sufficient evidence based on big data analysis to support this idea. The hypotheses generally seem very convincing, yet they still need further tests from various perspectives. Diachronic linguistic study based on authentic language data can, in our opinion, be one of those 'further tests'.
A grammar-based semantic similarity algorithm for natural language sentences.
Lee, Ming Che; Chang, Jia Wei; Hsieh, Tung Cheng
2014-01-01
This paper presents a grammar and semantic corpus based similarity algorithm for natural language sentences. Natural language, as opposed to "artificial language" such as computer programming languages, is the language used by the general public for daily communication. Traditional information retrieval approaches, such as vector models, LSA, HAL, or even the ontology-based approaches that extend to include concept similarity comparison instead of co-occurring terms/words, may not always find a good match when there is no obvious relation or concept overlap between two natural language sentences. This paper proposes a sentence similarity algorithm that takes advantage of corpus-based ontology and grammatical rules to overcome these problems. Experiments on two well-known benchmarks demonstrate that the proposed algorithm yields a significant performance improvement on sentences and short texts with arbitrary syntax and structure.
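To make the limitation of the traditional vector models mentioned above concrete, the following sketch computes a plain bag-of-words cosine similarity. It is a baseline of the kind the abstract contrasts against, not the grammar-based algorithm the paper proposes; the tokenisation rule and the example sentences are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercase the sentence and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine_similarity(sentence_a, sentence_b):
    """Cosine of the angle between the two term-count vectors."""
    vec_a, vec_b = bag_of_words(sentence_a), bag_of_words(sentence_b)
    dot = sum(vec_a[term] * vec_b[term] for term in vec_a.keys() & vec_b.keys())
    norm = (math.sqrt(sum(c * c for c in vec_a.values()))
            * math.sqrt(sum(c * c for c in vec_b.values())))
    return dot / norm if norm else 0.0


# Two sentences with related meaning but no shared words.
print(cosine_similarity("The boy plays guitar", "A youngster strums an instrument"))
```

Because the two example sentences share no tokens, the score is zero even though a reader would judge them closely related. This is the case of "no obvious relation or concept overlap" that motivates bringing in ontologies and grammatical structure.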
Perceptual Decoding Processes for Language in a Visual Mode and for Language in an Auditory Mode.
ERIC Educational Resources Information Center
Myerson, Rosemarie Farkas
The purpose of this paper is to gain insight into the nature of the reading process through an understanding of the general nature of sensory processing mechanisms which reorganize and restructure input signals for central recognition, and an understanding of how the grammar of the language functions in defining the set of possible sentences in…
The neurobiology of syntax: beyond string sets.
Petersson, Karl Magnus; Hagoort, Peter
2012-07-19
The human capacity to acquire language is an outstanding scientific challenge to understand. Somehow our language capacities arise from the way the human brain processes, develops and learns in interaction with its environment. To set the stage, we begin with a summary of what is known about the neural organization of language and what our artificial grammar learning (AGL) studies have revealed. We then review the Chomsky hierarchy in the context of the theory of computation and formal learning theory. Finally, we outline a neurobiological model of language acquisition and processing based on an adaptive, recurrent, spiking network architecture. This architecture implements an asynchronous, event-driven, parallel system for recursive processing. We conclude that the brain represents grammars (or more precisely, the parser/generator) in its connectivity, and its ability for syntax is based on neurobiological infrastructure for structured sequence processing. The acquisition of this ability is accounted for in an adaptive dynamical systems framework. Artificial language learning (ALL) paradigms might be used to study the acquisition process within such a framework, as well as the processing properties of the underlying neurobiological infrastructure. However, it is necessary to combine and constrain the interpretation of ALL results by theoretical models and empirical studies on natural language processing. Given that the faculty of language is captured by classical computational models to a significant extent, and that these can be embedded in dynamic network architectures, there is hope that significant progress can be made in understanding the neurobiology of the language faculty.
The crustal dynamics intelligent user interface anthology
NASA Technical Reports Server (NTRS)
Short, Nicholas M., Jr.; Campbell, William J.; Roelofs, Larry H.; Wattawa, Scott L.
1987-01-01
The National Space Science Data Center (NSSDC) has initiated an Intelligent Data Management (IDM) research effort which has, as one of its components, the development of an Intelligent User Interface (IUI). The intent of the IUI is to develop a friendly and intelligent user interface service based on expert systems and natural language processing technologies. The purpose of such a service is to support the large number of potential scientific and engineering users that have need of space and land-related research and technical data, but have little or no experience in query languages or understanding of the information content or architecture of the databases of interest. This document presents the design concepts, development approach and evaluation of the performance of a prototype IUI system for the Crustal Dynamics Project Database, which was developed using a microcomputer-based expert system tool (M.1), the natural language query processor THEMIS, and the graphics software system GSS. The IUI design is based on a multiple view representation of a database from both the user and database perspective, with intelligent processes to translate between the views.
Natural language processing and the Now-or-Never bottleneck.
Gómez-Rodríguez, Carlos
2016-01-01
Researchers, motivated by the need to improve the efficiency of natural language processing tools to handle web-scale data, have recently arrived at models that remarkably match the expected features of human language processing under the Now-or-Never bottleneck framework. This provides additional support for said framework and highlights the research potential in the interaction between applied computational linguistics and cognitive science.
Development and Evaluation of a Thai Learning System on the Web Using Natural Language Processing.
ERIC Educational Resources Information Center
Dansuwan, Suyada; Nishina, Kikuko; Akahori, Kanji; Shimizu, Yasutaka
2001-01-01
Describes the Thai Learning System, which is designed to help learners acquire the Thai word order system. The system facilitates the lessons on the Web using HyperText Markup Language and Perl programming, which interfaces with natural language processing by means of Prolog. (Author/VWL)
Automatic Item Generation via Frame Semantics: Natural Language Generation of Math Word Problems.
ERIC Educational Resources Information Center
Deane, Paul; Sheehan, Kathleen
This paper is an exploration of the conceptual issues that have arisen in the course of building a natural language generation (NLG) system for automatic test item generation. While natural language processing techniques are applicable to general verbal items, mathematics word problems are particularly tractable targets for natural language…
DOE Office of Scientific and Technical Information (OSTI.GOV)
McHale, M.L.
The field of artificial intelligence strives to produce computer programs that exhibit intelligent behavior. One of the areas of interest is the processing of natural language. This report discusses the role of the computer language PROLOG in Natural Language Processing (NLP) from both theoretical and pragmatic viewpoints. The reasons for using PROLOG for NLP are numerous. First, linguists can write natural-language grammars almost directly as PROLOG programs; this allows fast prototyping of NLP systems and facilitates analysis of NLP theories. Second, semantic representations of natural-language texts that use logic formalisms are readily produced in PROLOG because of PROLOG's logical foundations. Third, PROLOG's built-in inferencing mechanisms are often sufficient for inferences on the logical forms produced by NLPs. Fourth, the logical, declarative nature of PROLOG may make it the language of choice for parallel computing systems. Finally, the fact that PROLOG has a de facto standard (Edinburgh) makes the porting of code from one computer system to another virtually trouble free. Perhaps the strongest tie one could make between NLP and PROLOG was stated by John Stuart Mill in his inaugural address at St. Andrews: The structure of every sentence is a lesson in logic.
Natural Resource Information System, design analysis
NASA Technical Reports Server (NTRS)
1972-01-01
The computer-based system stores, processes, and displays map data relating to natural resources. The system was designed on the basis of requirements established in a user survey and an analysis of decision flow. The design analysis effort is described, and the rationale behind major design decisions, including map processing, cell vs. polygon, choice of classification systems, mapping accuracy, system hardware, and software language is summarized.
CE-SAM: a conversational interface for ISR mission support
NASA Astrophysics Data System (ADS)
Pizzocaro, Diego; Parizas, Christos; Preece, Alun; Braines, Dave; Mott, David; Bakdash, Jonathan Z.
2013-05-01
There is considerable interest in natural language conversational interfaces. These allow for complex user interactions with systems, such as fulfilling information requirements in dynamic environments, without requiring extensive training or a technical background (e.g. in formal query languages or schemas). To leverage the advantages of conversational interactions we propose CE-SAM (Controlled English Sensor Assignment to Missions), a system that guides users through refining and satisfying their information needs in the context of Intelligence, Surveillance, and Reconnaissance (ISR) operations. The rapidly-increasing availability of sensing assets and other information sources poses substantial challenges to effective ISR resource management. In a coalition context, the problem is even more complex, because assets may be "owned" by different partners. We show how CE-SAM allows a user to refine and relate their ISR information needs to pre-existing concepts in an ISR knowledge base, via conversational interaction implemented on a tablet device. The knowledge base is represented using Controlled English (CE) - a form of controlled natural language that is both human-readable and machine processable (i.e. can be used to implement automated reasoning). Users interact with the CE-SAM conversational interface using natural language, which the system converts to CE for feeding-back to the user for confirmation (e.g. to reduce misunderstanding). We show that this process not only allows users to access the assets that can support their mission needs, but also assists them in extending the CE knowledge base with new concepts.
NASA Technical Reports Server (NTRS)
2001-01-01
In this contract, which is a component of a larger contract that we plan to submit in the coming months, we plan to study the preprocessing issues which arise in applying natural language processing techniques to NASA-KSC problem reports. The goals of this work will be to deal with the issues of: a) automatically obtaining the problem reports from NASA-KSC data bases, b) the format of these reports and c) the conversion of these reports to a format that will be adequate for our natural language software. At the end of this contract, we expect that these problems will be solved and that we will be ready to apply our natural language software to a text database of over 1000 KSC problem reports.
Semantic biomedical resource discovery: a Natural Language Processing framework.
Sfakianaki, Pepi; Koumakis, Lefteris; Sfakianakis, Stelios; Iatraki, Galatia; Zacharioudakis, Giorgos; Graf, Norbert; Marias, Kostas; Tsiknakis, Manolis
2015-09-30
A plethora of publicly available biomedical resources currently exist and are increasing at a fast rate. In parallel, specialized repositories are being developed, indexing numerous clinical and biomedical tools. The main drawback of such repositories is the difficulty in locating appropriate resources for a clinical or biomedical decision task, especially for non-Information Technology expert users. In parallel, although NLP research in the clinical domain has been active since the 1960s, progress in the development of NLP applications has been slow and lags behind progress in the general NLP domain. The aim of the present study is to investigate the use of semantics for biomedical resources annotation with domain specific ontologies and exploit Natural Language Processing methods in empowering the non-Information Technology expert users to efficiently search for biomedical resources using natural language. A Natural Language Processing engine which can "translate" free text into targeted queries, automatically transforming a clinical research question into a request description that contains only terms of ontologies, has been implemented. The implementation is based on information extraction techniques for text in natural language, guided by integrated ontologies. Furthermore, knowledge from robust text mining methods has been incorporated to map descriptions into suitable domain ontologies in order to ensure that the biomedical resources descriptions are domain oriented and enhance the accuracy of services discovery. The framework is freely available as a web application at (http://calchas.ics.forth.gr/). For our experiments, a range of clinical questions were established based on descriptions of clinical trials from the ClinicalTrials.gov registry as well as recommendations from clinicians.
Domain experts manually identified the available tools in a tools repository which are suitable for addressing the clinical questions at hand, either individually or as a set of tools forming a computational pipeline. The results were compared with those obtained from an automated discovery of candidate biomedical tools. For the evaluation of the results, precision and recall measurements were used. Our results indicate that the proposed framework has high precision and low recall, implying that the system returns essentially more relevant results than irrelevant ones. Adequate biomedical ontologies, sufficient NLP tools and biomedical annotation systems of sufficient quality are already available for the implementation of a biomedical resources discovery framework based on the semantic annotation of resources and the use of NLP techniques. The results of the present study demonstrate the clinical utility of the application of the proposed framework, which aims to bridge the gap between clinical questions in natural language and efficient dynamic biomedical resources discovery.
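The evaluation described above compares an automatically discovered set of tools against an expert-selected set. A minimal sketch of the two measures follows; the tool names are invented placeholders, and the abstract does not report the actual counts.

```python
def precision_recall(retrieved, relevant):
    """Precision = TP / |retrieved|; recall = TP / |relevant|."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical question: experts marked four tools as suitable,
# and the automated discovery returned two of them and nothing else.
expert_selection = {"tool_a", "tool_b", "tool_c", "tool_d"}
automated_result = {"tool_a", "tool_b"}

precision, recall = precision_recall(automated_result, expert_selection)
print(precision, recall)
```

In this made-up case every returned tool is relevant (precision 1.0) but half of the suitable tools are missed (recall 0.5), which is the high-precision, low-recall pattern the abstract reports.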
Clinical Natural Language Processing in languages other than English: opportunities and challenges.
Névéol, Aurélie; Dalianis, Hercules; Velupillai, Sumithra; Savova, Guergana; Zweigenbaum, Pierre
2018-03-30
Natural language processing applied to clinical text or aimed at a clinical outcome has been thriving in recent years. This paper offers the first broad overview of clinical Natural Language Processing (NLP) for languages other than English. Recent studies are summarized to offer insights and outline opportunities in this area. We envision three groups of intended readers: (1) NLP researchers leveraging experience gained in other languages, (2) NLP researchers faced with establishing clinical text processing in a language other than English, and (3) clinical informatics researchers and practitioners looking for resources in their languages in order to apply NLP techniques and tools to clinical practice and/or investigation. We review work in clinical NLP in languages other than English. We classify these studies into three groups: (i) studies describing the development of new NLP systems or components de novo, (ii) studies describing the adaptation of NLP architectures developed for English to another language, and (iii) studies focusing on a particular clinical application. We show the advantages and drawbacks of each method, and highlight the appropriate application context. Finally, we identify major challenges and opportunities that will affect the impact of NLP on clinical practice and public health studies in a context that encompasses English as well as other languages.
Dutta, Sayon; Long, William J; Brown, David F M; Reisner, Andrew T
2013-08-01
As use of radiology studies increases, there is a concurrent increase in incidental findings (eg, lung nodules) for which the radiologist issues recommendations for additional imaging for follow-up. Busy emergency physicians may be challenged to carefully communicate recommendations for additional imaging not relevant to the patient's primary evaluation. The emergence of electronic health records and natural language processing algorithms may help address this quality gap. We seek to describe recommendations for additional imaging from our institution and develop and validate an automated natural language processing algorithm to reliably identify recommendations for additional imaging. We developed a natural language processing algorithm to detect recommendations for additional imaging, using 3 iterative cycles of training and validation. The third cycle used 3,235 radiology reports (1,600 for algorithm training and 1,635 for validation) of discharged emergency department (ED) patients from which we determined the incidence of discharge-relevant recommendations for additional imaging and the frequency of appropriate discharge documentation. The test characteristics of the 3 natural language processing algorithm iterations were compared, using blinded chart review as the criterion standard. Discharge-relevant recommendations for additional imaging were found in 4.5% (95% confidence interval [CI] 3.5% to 5.5%) of ED radiology reports, but 51% (95% CI 43% to 59%) of discharge instructions failed to note those findings. The final natural language processing algorithm had 89% (95% CI 82% to 94%) sensitivity and 98% (95% CI 97% to 98%) specificity for detecting recommendations for additional imaging. For discharge-relevant recommendations for additional imaging, sensitivity improved to 97% (95% CI 89% to 100%). 
Recommendations for additional imaging are common, and failure to document relevant recommendations for additional imaging in ED discharge instructions occurs frequently. The natural language processing algorithm's performance improved with each iteration and offers a promising error-prevention tool. Copyright © 2013 American College of Emergency Physicians. Published by Mosby, Inc. All rights reserved.
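Sensitivity and specificity, the test characteristics reported in this abstract, can be computed from a confusion matrix as in the sketch below. The counts are hypothetical, chosen only to give round numbers; they are not the study's data.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp/fn: reports with a recommendation that were detected / missed.
    tn/fp: reports without a recommendation that were passed / wrongly flagged.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical counts, compared against chart review as the criterion standard.
sens, spec = sensitivity_specificity(tp=89, fn=11, tn=980, fp=20)
print(round(sens, 2), round(spec, 2))  # 0.89 0.98
```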
DOE Office of Scientific and Technical Information (OSTI.GOV)
Azevedo, S.G.; Fitch, J.P.
1987-10-21
Conventional software interfaces that use imperative computer commands or menu interactions are often restrictive environments when used for researching new algorithms or analyzing processed experimental data. We found this to be true with current signal-processing software (SIG). As an alternative, "functional language" interfaces provide features such as command nesting for a more natural interaction with the data. The Image and Signal LISP Environment (ISLE) is an example of an interpreted functional language interface based on Common LISP. Advantages of ISLE include multidimensional and multiple data-type independence through dispatching functions, dynamic loading of new functions, and connections to artificial intelligence (AI) software. 10 refs.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Azevedo, S.G.; Fitch, J.P.
1987-05-01
Conventional software interfaces which utilize imperative computer commands or menu interactions are often restrictive environments when used for researching new algorithms or analyzing processed experimental data. We found this to be true with current signal processing software (SIG). Existing "functional language" interfaces provide features such as command nesting for a more natural interaction with the data. The Image and Signal Lisp Environment (ISLE) will be discussed as an example of an interpreted functional language interface based on Common LISP. Additional benefits include multidimensional and multiple data-type independence through dispatching functions, dynamic loading of new functions, and connections to artificial intelligence software.
Storytelling, behavior planning, and language evolution in context
McBride, Glen
2014-01-01
An attempt is made to specify the structure of the hominin bands that began steps to language. Storytelling could evolve without need for language yet be strongly subject to natural selection and could provide a major feedback process in evolving language. A storytelling model is examined, including its effects on the evolution of consciousness and the possible timing of language evolution. Behavior planning is presented as a model of language evolution from storytelling. The behavior programming mechanism in both directions provides a model of creating and understanding behavior and language. Culture began with societies, then family evolution, family life in troops, but storytelling created a culture of experiences, a final step in the long process of achieving experienced adults by natural selection. Most language evolution occurred in conversations where evolving non-verbal feedback ensured mutual agreements on understanding. Natural language evolved in conversations with feedback providing understanding of changes. PMID:25360123
Three Dimensions of Reproducibility in Natural Language Processing.
Cohen, K Bretonnel; Xia, Jingbo; Zweigenbaum, Pierre; Callahan, Tiffany J; Hargraves, Orin; Goss, Foster; Ide, Nancy; Névéol, Aurélie; Grouin, Cyril; Hunter, Lawrence E
2018-05-01
Despite considerable recent attention to problems with reproducibility of scientific research, there is a striking lack of agreement about the definition of the term. That is a problem, because the lack of a consensus definition makes it difficult to compare studies of reproducibility, and thus to have even a broad overview of the state of the issue in natural language processing. This paper proposes an ontology of reproducibility in that field. Its goal is to enhance both future research and communication about the topic, and retrospective meta-analyses. We show that three dimensions of reproducibility, corresponding to three kinds of claims in natural language processing papers, can account for a variety of types of research reports. These dimensions are reproducibility of a conclusion, of a finding, and of a value. Three biomedical natural language processing papers by the authors of this paper are analyzed with respect to these dimensions.
Thought beyond language: neural dissociation of algebra and natural language.
Monti, Martin M; Parsons, Lawrence M; Osherson, Daniel N
2012-08-01
A central question in cognitive science is whether natural language provides combinatorial operations that are essential to diverse domains of thought. In the study reported here, we addressed this issue by examining the role of linguistic mechanisms in forging the hierarchical structures of algebra. In a 3-T functional MRI experiment, we showed that processing of the syntax-like operations of algebra does not rely on the neural mechanisms of natural language. Our findings indicate that processing the syntax of language elicits the known substrate of linguistic competence, whereas algebraic operations recruit bilateral parietal brain regions previously implicated in the representation of magnitude. This double dissociation argues against the view that language provides the structure of thought across all cognitive domains.
Extraction of UMLS® Concepts Using Apache cTAKES™ for German Language.
Becker, Matthias; Böckmann, Britta
2016-01-01
Automatic information extraction of medical concepts and classification with semantic standards from medical reports is useful for standardization and for clinical research. This paper presents an approach for UMLS concept extraction with a customized natural language processing pipeline for German clinical notes using Apache cTAKES. The objective is to test whether the natural language processing tool is suitable for identifying UMLS concepts in German-language text and mapping them to SNOMED-CT. The German UMLS database and German OpenNLP models extended the natural language processing pipeline, so the pipeline can normalize to domain ontologies such as SNOMED-CT using the German concepts. For testing, the ShARe/CLEF eHealth 2013 training dataset translated into German was used. The implemented algorithms were tested with a set of 199 German reports, obtaining an average F1 measure of 0.36 without German stemming or pre- and post-processing of the reports.
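For readers unfamiliar with the F1 measure reported here, it is the harmonic mean of precision and recall. The sketch below uses hypothetical precision and recall values for illustration; the abstract does not report the underlying figures.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical values chosen for illustration; not taken from the study.
print(round(f1_score(0.5, 0.25), 3))  # 0.333
```

Because the harmonic mean is dominated by the smaller of the two values, a low F1 can result from weak recall even when precision is moderate.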
Proposal: A Hybrid Dictionary Modelling Approach for Malay Tweet Normalization
NASA Astrophysics Data System (ADS)
Muhamad, Nor Azlizawati Binti; Idris, Norisma; Arshi Saloot, Mohammad
2017-02-01
Malay Twitter messages deviate markedly from the standard language. Malay tweets are now widely used by Twitter users, especially in the Malay Archipelago. It is therefore important to build a normalization system that can translate Malay tweet language into standard Malay. Research in natural language processing has focused mainly on normalizing English Twitter messages, while few studies have addressed the normalization of Malay tweets. This paper proposes an approach to normalize Malay Twitter messages based on hybrid dictionary modelling methods. The approach normalizes noisy Malay Twitter messages, such as colloquial language, novel words, and interjections, into standard Malay. The research will use a language model and an N-gram model.
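The dictionary side of such a normalizer can be sketched as a simple token lookup. This is a minimal illustration, not the authors' method: the dictionary entries are hypothetical examples of common Malay shorthand, and the ranking of competing candidates with a language model or N-grams, which the abstract mentions, is left out.

```python
# Hypothetical normalization dictionary: noisy token -> candidate standard forms.
NORMALIZATION_DICT = {
    "yg": ["yang"],
    "sy": ["saya"],
    "x": ["tidak"],
}


def normalize(tweet, dictionary=NORMALIZATION_DICT):
    """Replace each known noisy token with its first candidate; keep others as-is."""
    tokens = tweet.lower().split()
    return " ".join(dictionary.get(token, [token])[0] for token in tokens)


print(normalize("sy x tahu"))  # saya tidak tahu
```

A fuller system would keep several candidates per token and pick the one that scores best in context.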
ERIC Educational Resources Information Center
Wood, Peter
2011-01-01
"QuickAssist," the program presented in this paper, uses natural language processing (NLP) technologies. It places a range of NLP tools at the disposal of learners, intended to enable them to independently read and comprehend a German text of their choice while they extend their vocabulary, learn about different uses of particular words,…
Buckley, Julliette M; Coopey, Suzanne B; Sharko, John; Polubriaginof, Fernanda; Drohan, Brian; Belli, Ahmet K; Kim, Elizabeth M H; Garber, Judy E; Smith, Barbara L; Gadd, Michele A; Specht, Michelle C; Roche, Constance A; Gudewicz, Thomas M; Hughes, Kevin S
2012-01-01
The opportunity to integrate clinical decision support systems into clinical practice is limited due to the lack of structured, machine readable data in the current format of the electronic health record. Natural language processing has been designed to convert free text into machine readable data. The aim of the current study was to ascertain the feasibility of using natural language processing to extract clinical information from >76,000 breast pathology reports. APPROACH AND PROCEDURE: Breast pathology reports from three institutions were analyzed using natural language processing software (Clearforest, Waltham, MA) to extract information on a variety of pathologic diagnoses of interest. Data tables were created from the extracted information according to date of surgery, side of surgery, and medical record number. The variety of ways in which each diagnosis could be represented was recorded, as a means of demonstrating the complexity of machine interpretation of free text. There was widespread variation in how pathologists reported common pathologic diagnoses. We report, for example, 124 ways of saying invasive ductal carcinoma and 95 ways of saying invasive lobular carcinoma. There were >4000 ways of saying invasive ductal carcinoma was not present. Natural language processor sensitivity and specificity were 99.1% and 96.5% when compared to expert human coders. We have demonstrated how a large body of free text medical information such as seen in breast pathology reports, can be converted to a machine readable format using natural language processing, and described the inherent complexities of the task.
Analyzing Discourse Processing Using a Simple Natural Language Processing Tool
ERIC Educational Resources Information Center
Crossley, Scott A.; Allen, Laura K.; Kyle, Kristopher; McNamara, Danielle S.
2014-01-01
Natural language processing (NLP) provides a powerful approach for discourse processing researchers. However, there remains a notable degree of hesitation by some researchers to consider using NLP, at least on their own. The purpose of this article is to introduce and make available a "simple" NLP (SiNLP) tool. The overarching goal of…
Relaxation of selection, niche construction, and the Baldwin effect in language evolution.
Yamauchi, Hajime; Hashimoto, Takashi
2010-01-01
Deacon has suggested that one of the key factors of language evolution is not characterized by an increase in genetic contribution, often known as the Baldwin effect, but rather by a decrease. This process effectively increases linguistic learning capability by organizing a novel synergy of multiple lower-order functions previously irrelevant to the process of language acquisition. Deacon posits that this transition is not caused by natural selection. Rather, it is due to the relaxation of natural selection. While there are some cases in which relaxation caused by some external factors indeed induces the transition, we do not know what kind of relaxation has worked in language evolution. In this article, a genetic-algorithm-based computer simulation is used to investigate how the niche-constructing aspect of linguistic behavior may trigger the degradation of genetic predisposition related to language learning. The results show that agents initially increase their genetic predisposition for language learning—the Baldwin effect. They create a highly uniform sociolinguistic environment—a linguistic niche construction. This means that later generations constantly receive very similar inputs from adult agents, and subsequently the selective pressure to retain the genetic predisposition is relaxed.
A Grammar-Based Semantic Similarity Algorithm for Natural Language Sentences
Chang, Jia Wei; Hsieh, Tung Cheng
2014-01-01
This paper presents a grammar and semantic corpus based similarity algorithm for natural language sentences. Natural language, in contrast to “artificial language”, such as computer programming languages, is the language used by the general public for daily communication. Traditional information retrieval approaches, such as vector models, LSA, HAL, or even the ontology-based approaches that extend to include concept similarity comparison instead of co-occurring terms/words, may not always find the correct match when there is no obvious relation or concept overlap between two natural language sentences. This paper proposes a sentence similarity algorithm that takes advantage of corpus-based ontology and grammatical rules to overcome these problems. Experiments on two well-known benchmarks demonstrate that the proposed algorithm achieves a significant performance improvement on sentences and short texts with arbitrary syntax and structure. PMID:24982952
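The limitation of plain vector models described in this abstract can be seen in a small sketch. The code below is a bag-of-words cosine similarity baseline, not the algorithm proposed in the paper, and the example sentences are made up for illustration.

```python
import math
from collections import Counter


def cosine_similarity(sentence_a, sentence_b):
    """Cosine similarity between bag-of-words count vectors of two sentences."""
    vec_a = Counter(sentence_a.lower().split())
    vec_b = Counter(sentence_b.lower().split())
    dot = sum(vec_a[word] * vec_b[word] for word in vec_a.keys() & vec_b.keys())
    norm_a = sum(count * count for count in vec_a.values())
    norm_b = sum(count * count for count in vec_b.values())
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / math.sqrt(norm_a * norm_b)


# Two sentences with related meaning but no shared words score zero.
print(cosine_similarity("dogs bark loudly", "canines howl noisily"))  # 0.0
```

Because the score depends only on shared surface words, paraphrases without lexical overlap look unrelated, which is the gap that ontology-based and grammar-based methods try to close.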
Deciphering the language of nature: cryptography, secrecy, and alterity in Francis Bacon.
Clody, Michael C
2011-01-01
The essay argues that Francis Bacon's considerations of parables and cryptography reflect larger interpretative concerns of his natural philosophic project. Bacon describes nature as having a language distinct from those of God and man, and, in so doing, establishes a central problem of his natural philosophy: namely, how can the language of nature be accessed through scientific representation? Ultimately, Bacon's solution relies on a theory of differential and duplicitous signs that conceal within them the hidden voice of nature, which is best recognized in the natural forms of efficient causality. The "alphabet of nature" (those tables of natural occurrences) consequently plays a central role in his program, as it renders nature's language susceptible to a process of decryption that mirrors the model of the biliteral cipher. It is argued that while the writing of Bacon's natural philosophy strives for literality, its investigative process preserves a space for alterity within scientific representation that is made accessible to those with the interpretative key.
Sequence Memory Constraints Give Rise to Language-Like Structure through Iterated Learning
Cornish, Hannah; Dale, Rick; Kirby, Simon; Christiansen, Morten H.
2017-01-01
Human language is composed of sequences of reusable elements. The origin of the sequential structure of language is a hotly debated topic in evolutionary linguistics. In this paper, we show that sets of sequences with language-like statistical properties can emerge from a process of cultural evolution under pressure from chunk-based memory constraints. We employ a novel experimental task that is non-linguistic and non-communicative in nature, in which participants are trained on and later asked to recall a set of sequences one-by-one. Recalled sequences from one participant become training data for the next participant. In this way, we simulate cultural evolution in the laboratory. Our results show a cumulative increase in structure, and by comparing this structure to data from existing linguistic corpora, we demonstrate a close parallel between the sets of sequences that emerge in our experiment and those seen in natural language. PMID:28118370
Using Language Learning Conditions in Mathematics. PEN 68.
ERIC Educational Resources Information Center
Stoessiger, Rex
This pamphlet reports on a project in Tasmania exploring whether the "natural learning conditions" approach to language learning could be adapted for mathematics. The connections between language and mathematics, as well as the natural learning processes of language learning are described in the pamphlet. The project itself is…
Cognition-Based Approaches for High-Precision Text Mining
ERIC Educational Resources Information Center
Shannon, George John
2017-01-01
This research improves the precision of information extraction from free-form text via the use of cognitive-based approaches to natural language processing (NLP). Cognitive-based approaches are an important, and relatively new, area of research in NLP and search, as well as linguistics. Cognitive approaches enable significant improvements in both…
Computer Aided Management for Information Processing Projects.
ERIC Educational Resources Information Center
Akman, Ibrahim; Kocamustafaogullari, Kemal
1995-01-01
Outlines the nature of information processing projects and discusses some project management programming packages. Describes an in-house interface program developed to utilize a selected project management package (TIMELINE) by using Oracle Data Base Management System tools and Pascal programming language for the management of information system…
Testing of a Natural Language Retrieval System for a Full Text Knowledge Base.
ERIC Educational Resources Information Center
Bernstein, Lionel M.; Williamson, Robert E.
1984-01-01
The Hepatitis Knowledge Base (text of prototype information system) was used for modifying and testing "A Navigator of Natural Language Organized (Textual) Data" (ANNOD), a retrieval system which combines probabilistic, linguistic, and empirical means to rank individual paragraphs of full text for similarity to natural language queries…
Patterson, Olga V; Freiberg, Matthew S; Skanderson, Melissa; J Fodeh, Samah; Brandt, Cynthia A; DuVall, Scott L
2017-06-12
In order to investigate the mechanisms of cardiovascular disease in HIV infected and uninfected patients, an analysis of echocardiogram reports is required for a large longitudinal multi-center study. A natural language processing system using a dictionary lookup, rules, and patterns was developed to extract heart function measurements that are typically recorded in echocardiogram reports as measurement-value pairs. Curated semantic bootstrapping was used to create a custom dictionary that extends existing terminologies based on terms that actually appear in the medical record. A novel disambiguation method based on semantic constraints was created to identify and discard erroneous alternative definitions of the measurement terms. The system was built utilizing a scalable framework, making it available for processing large datasets. The system was developed for and validated on notes from three sources: general clinic notes, echocardiogram reports, and radiology reports. The system achieved F-scores of 0.872, 0.844, and 0.877 with precision of 0.936, 0.982, and 0.969 for each dataset respectively averaged across all extracted values. Left ventricular ejection fraction (LVEF) is the most frequently extracted measurement. The precision of extraction of the LVEF measure ranged from 0.968 to 1.0 across different document types. This system illustrates the feasibility and effectiveness of a large-scale information extraction on clinical data. New clinical questions can be addressed in the domain of heart failure using retrospective clinical data analysis because key heart function measurements can be successfully extracted using natural language processing.
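The combination of dictionary lookup and patterns for extracting measurement-value pairs can be sketched roughly as follows. The term dictionary, the regular expression, and the sample sentence are all hypothetical; the actual system's dictionary, rules, and disambiguation method are not described in enough detail here to reproduce.

```python
import re

# Hypothetical dictionary mapping surface terms to a canonical measurement name.
TERM_DICTIONARY = {
    "lvef": "LVEF",
    "ejection fraction": "LVEF",
    "ef": "LVEF",
}

# Longest terms first so "ejection fraction" is tried before "ef".
_TERMS = sorted(TERM_DICTIONARY, key=len, reverse=True)
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in _TERMS) + r")\b"
    r"\D{0,20}?"                 # short non-numeric gap such as " is estimated at "
    r"(\d{1,3}(?:\.\d+)?)\s*%",  # numeric value followed by a percent sign
    re.IGNORECASE,
)


def extract_measurements(text):
    """Return a list of (canonical_name, value) pairs found in the text."""
    return [
        (TERM_DICTIONARY[match.group(1).lower()], float(match.group(2)))
        for match in PATTERN.finditer(text)
    ]


report = "LVEF is estimated at 55%. Prior study: ejection fraction = 40 %."
print(extract_measurements(report))  # [('LVEF', 55.0), ('LVEF', 40.0)]
```

This sketch does not handle ranges such as "60-65%" or ambiguous abbreviations, which is where rules and semantic constraints like those described in the abstract would come in.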
Tasking and sharing sensing assets using controlled natural language
NASA Astrophysics Data System (ADS)
Preece, Alun; Pizzocaro, Diego; Braines, David; Mott, David
2012-06-01
We introduce an approach to representing intelligence, surveillance, and reconnaissance (ISR) tasks at a relatively high level in controlled natural language. We demonstrate that this facilitates both human interpretation and machine processing of tasks. More specifically, it allows the automatic assignment of sensing assets to tasks, and the informed sharing of tasks between collaborating users in a coalition environment. To enable automatic matching of sensor types to tasks, we created a machine-processable knowledge representation based on the Military Missions and Means Framework (MMF), and implemented a semantic reasoner to match task types to sensor types. We combined this mechanism with a sensor-task assignment procedure based on a well-known distributed protocol for resource allocation. In this paper, we re-formulate the MMF ontology in Controlled English (CE), a type of controlled natural language designed to be readable by a native English speaker whilst representing information in a structured, unambiguous form to facilitate machine processing. We show how CE can be used to describe both ISR tasks (for example, detection, localization, or identification of particular kinds of object) and sensing assets (for example, acoustic, visual, or seismic sensors, mounted on motes or unmanned vehicles). We show how these representations enable an automatic sensor-task assignment process. Where a group of users are cooperating in a coalition, we show how CE task summaries give users in the field a high-level picture of ISR coverage of an area of interest. This allows them to make efficient use of sensing resources by sharing tasks.
Weng, Chunhua; Payne, Philip R O; Velez, Mark; Johnson, Stephen B; Bakken, Suzanne
2014-01-01
The successful adoption by clinicians of evidence-based clinical practice guidelines (CPGs) contained in clinical information systems requires efficient translation of free-text guidelines into computable formats. Natural language processing (NLP) has the potential to improve the efficiency of such translation. However, it is laborious to develop NLP to structure free-text CPGs using existing formal knowledge representations (KR). In response to this challenge, this vision paper discusses the value and feasibility of supporting symbiosis in text-based knowledge acquisition (KA) and KR. We compare two ontologies: (1) an ontology manually created by domain experts for CPG eligibility criteria and (2) an upper-level ontology derived from a semantic pattern-based approach for automatic KA from CPG eligibility criteria text. Then we discuss the strengths and limitations of interweaving KA and NLP for KR purposes and important considerations for achieving the symbiosis of KR and NLP for structuring CPGs to achieve evidence-based clinical practice.
The Promise of NLP and Speech Processing Technologies in Language Assessment
ERIC Educational Resources Information Center
Chapelle, Carol A.; Chung, Yoo-Ree
2010-01-01
Advances in natural language processing (NLP) and automatic speech recognition and processing technologies offer new opportunities for language testing. Despite their potential uses on a range of language test item types, relatively little work has been done in this area, and it is therefore not well understood by test developers, researchers or…
LSTM-CRF | Informatics Technology for Cancer Research (ITCR)
LSTM-CRF uses Natural Language Processing methods for detecting Adverse Drug Events, Drugname, Indication and other medically relevant information from Electronic Health Records. It implements Recurrent Neural Networks using several CRF based inference methods.
Boguslav, Mayla; Cohen, Kevin Bretonnel
2017-01-01
Human-annotated data is a fundamental part of natural language processing system development and evaluation. The quality of that data is typically assessed by calculating the agreement between the annotators. It is widely assumed that this agreement between annotators is the upper limit on system performance in natural language processing: if humans can't agree with each other about the classification more than some percentage of the time, we don't expect a computer to do any better. We trace the logical positivist roots of the motivation for measuring inter-annotator agreement, demonstrate the prevalence of the widely-held assumption about the relationship between inter-annotator agreement and system performance, and present data that suggest that inter-annotator agreement is not, in fact, an upper bound on language processing system performance.
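One common way to calculate agreement between two annotators is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The abstract does not say which agreement measure the authors consider, so the choice of kappa and the labels below are assumptions made for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    all_labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[label] * counts_b[label] for label in all_labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical annotations of four items by two annotators.
annotator_1 = ["yes", "yes", "no", "no"]
annotator_2 = ["yes", "no", "no", "no"]
print(cohen_kappa(annotator_1, annotator_2))  # 0.5
```

Here the annotators agree on three of four items (0.75), chance agreement is 0.5, and kappa is therefore 0.5.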
CONSTRUCT: In Search of a Theory of Meaning. Technical Report No. 238.
ERIC Educational Resources Information Center
Smith, R. L.; And Others
A new language-processing system, CONSTRUCT, is described and defined as a question-answering system for elementary mathematical language using natural language input. The primary goal is said to be an attempt to reach a better understanding of the relationship between syntactic and semantic components of natural language. The "meaning…
Human-Level Natural Language Understanding: False Progress and Real Challenges
ERIC Educational Resources Information Center
Bignoli, Perrin G.
2013-01-01
The field of Natural Language Processing (NLP) focuses on the study of how utterances composed of human-level languages can be understood and generated. Typically, there are considered to be three intertwined levels of structure that interact to create meaning in language: syntax, semantics, and pragmatics. Not only is a large amount of…
Baneyx, Audrey; Charlet, Jean; Jaulent, Marie-Christine
2007-01-01
Pathologies and acts are classified in thesauri to help physicians to code their activity. In practice, the use of thesauri is not sufficient to reduce variability in coding and thesauri are not suitable for computer processing. We think the automation of the coding task requires a conceptual modeling of medical items: an ontology. Our task is to help lung specialists code acts and diagnoses with software that represents medical knowledge of this concerned specialty by an ontology. The objective of the reported work was to build an ontology of pulmonary diseases dedicated to the coding process. To carry out this objective, we develop a precise methodological process for the knowledge engineer in order to build various types of medical ontologies. This process is based on the need to express precisely in natural language the meaning of each concept using differential semantics principles. A differential ontology is a hierarchy of concepts and relationships organized according to their similarities and differences. Our main research hypothesis is to apply natural language processing tools to corpora to develop the resources needed to build the ontology. We consider two corpora, one composed of patient discharge summaries and the other being a teaching book. We propose to combine two approaches to enrich the ontology building: (i) a method which consists of building terminological resources through distributional analysis and (ii) a method based on the observation of corpus sequences in order to reveal semantic relationships. Our ontology currently includes 1550 concepts and the software implementing the coding process is still under development. Results show that the proposed approach is operational and indicates that the combination of these methods and the comparison of the resulting terminological structures give interesting clues to a knowledge engineer for the building of an ontology.
Kreimeyer, Kory; Foster, Matthew; Pandey, Abhishek; Arya, Nina; Halford, Gwendolyn; Jones, Sandra F; Forshee, Richard; Walderhaug, Mark; Botsis, Taxiarchis
2017-09-01
We followed a systematic approach based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses to identify existing clinical natural language processing (NLP) systems that generate structured information from unstructured free text. Seven literature databases were searched with a query combining the concepts of natural language processing and structured data capture. Two reviewers screened all records for relevance during two screening phases, and information about clinical NLP systems was collected from the final set of papers. A total of 7149 records (after removing duplicates) were retrieved and screened, and 86 were determined to fit the review criteria. These papers contained information about 71 different clinical NLP systems, which were then analyzed. The NLP systems address a wide variety of important clinical and research tasks. Certain tasks are well addressed by the existing systems, while others remain as open challenges that only a small number of systems attempt, such as extraction of temporal information or normalization of concepts to standard terminologies. This review has identified many NLP systems capable of processing clinical free text and generating structured output, and the information collected and evaluated here will be important for prioritizing development of new approaches for clinical NLP. Copyright © 2017 Elsevier Inc. All rights reserved.
Inferring heuristic classification hierarchies from natural language input
NASA Technical Reports Server (NTRS)
Hull, Richard; Gomez, Fernando
1993-01-01
A methodology for inferring hierarchies representing heuristic knowledge about the check out, control, and monitoring sub-system (CCMS) of the space shuttle launch processing system from natural language input is explained. Our method identifies failures explicitly and implicitly described in natural language by domain experts and uses those descriptions to recommend classifications for inclusion in the experts' heuristic hierarchies.
Flexible processing and the design of grammar.
Sag, Ivan A; Wasow, Thomas
2015-02-01
We explore the consequences of letting the incremental and integrative nature of language processing inform the design of competence grammar. What emerges is a view of grammar as a system of local monotonic constraints that provide a direct characterization of the signs (the form-meaning correspondences) of a given language. This "sign-based" conception of grammar has provided precise solutions to the key problems long thought to motivate movement-based analyses, has supported three decades of computational research developing large-scale grammar implementations, and is now beginning to play a role in computational psycholinguistics research that explores the use of underspecification in the incremental computation of partial meanings.
Temporal pattern processing in songbirds.
Comins, Jordan A; Gentner, Timothy Q
2014-10-01
Understanding how the brain perceives, organizes and uses patterned information is directly related to the neurobiology of language. Given the present limitations, such knowledge at the scale of neurons, neural circuits and neural populations can only come from non-human models, focusing on shared capacities that are relevant to language processing. Here we review recent advances in the behavioral and neural basis of temporal pattern processing of natural auditory communication signals in songbirds, focusing on European starlings. We suggest a general inhibitory circuit for contextual modulation that can act to control sensory representations based on patterning rules. Copyright © 2014. Published by Elsevier Ltd.
Linguistics and Information Science
ERIC Educational Resources Information Center
Montgomery, Christine A.
1972-01-01
This paper defines the relationship between linguistics and information science in terms of a common interest in natural language. The concept of a natural language information system is introduced as a framework for reviewing automated language processing efforts by computational linguists and information scientists. (96 references) (Author)
Intelligent CAI: An Author Aid for a Natural Language Interface.
ERIC Educational Resources Information Center
Burton, Richard R.; Brown, John Seely
This report addresses the problems of using natural language (English) as the communication language for advanced computer-based instructional systems. The instructional environment places requirements on a natural language understanding system that exceed the capabilities of all existing systems, including: (1) efficiency, (2) habitability, (3)…
Reading Process and Practice: From Socio-Psycholinguistics to Whole Language.
ERIC Educational Resources Information Center
Weaver, Constance
Based on the thesis that reading is not a passive process by which readers soak up words and information from the page, but an active process by which they predict, sample, and confirm or correct their hypotheses about the written text, this book is an introduction to the theories of the psycholinguistic nature of the reading process and reading…
How Involved Are American L2 Learners of Spanish in Lexical Input Processing Tasks during Reading?
ERIC Educational Resources Information Center
Pulido, Diana
2009-01-01
This study examines the nature of the involvement load (Laufer & Hulstijn, 2001) in second language (L2) lexical input processing through reading by considering the effects of the reader-based factors of L2 reading proficiency and background knowledge. The lexical input processing aspects investigated were lexical inferencing (search), attentional…
Wang, Hui; Zhang, Weide; Zeng, Qiang; Li, Zuofeng; Feng, Kaiyan; Liu, Lei
2014-04-01
Extracting information from unstructured clinical narratives is valuable for many clinical applications. Although natural language processing (NLP) methods have been extensively studied in electronic medical records (EMR), few studies have explored NLP for extracting information from Chinese clinical narratives. In this study, we report the development and evaluation of an approach for extracting tumor-related information from operation notes of hepatic carcinomas written in Chinese. Using 86 operation notes manually annotated by physicians as the training set, we explored both rule-based and supervised machine-learning approaches. Evaluated on 29 unseen operation notes, our best approach yielded 69.6% precision, 58.3% recall and a 63.5% F-score. Copyright © 2014 Elsevier Inc. All rights reserved.
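As a worked check of how the three figures in this abstract relate, the sketch below computes an F-score from precision and recall. It assumes the reported F-score is the balanced F1 (harmonic mean), which the abstract does not state explicitly; the function name `f_score` and the `beta` parameter are illustrative only.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing was extracted correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract (69.6% and 58.3%).
f1 = f_score(0.696, 0.583)
print(f"F1 = {f1:.3f}")
```

Under the balanced-F1 assumption the result is about 0.635, which agrees with the 63.5% quoted above.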
Natural language processing and advanced information management
NASA Technical Reports Server (NTRS)
Hoard, James E.
1989-01-01
Integrating diverse information sources and application software in a principled and general manner will require a very capable advanced information management (AIM) system. In particular, such a system will need a comprehensive addressing scheme to locate the material in its docuverse. It will also need a natural language processing (NLP) system of great sophistication. It seems that the NLP system must serve three functions. First, it provides a natural language interface (NLI) for the users. Second, it serves as the core component that understands and makes use of the real-world interpretations (RWIs) contained in the docuverse. Third, it enables the reasoning specialists (RSs) to arrive at conclusions that can be transformed into procedures that will satisfy the users' requests. The best candidate for an intelligent agent that can satisfactorily make use of RSs and transform documents (TDs) appears to be an object-oriented database (OODB). OODBs have, apparently, an inherent capacity to use the large numbers of RSs and TDs that will be required by an AIM system and an inherent capacity to use them in an effective way.
What Artificial Intelligence Is Doing for Training.
ERIC Educational Resources Information Center
Kirrane, Peter R.; Kirrane, Diane E.
1989-01-01
Discusses the three areas of research and application of artificial intelligence: (1) robotics, (2) natural language processing, and (3) knowledge-based or expert systems. Focuses on what expert systems can do, especially in the area of training. (JOW)
Terminology model discovery using natural language processing and visualization techniques.
Zhou, Li; Tao, Ying; Cimino, James J; Chen, Elizabeth S; Liu, Hongfang; Lussier, Yves A; Hripcsak, George; Friedman, Carol
2006-12-01
Medical terminologies are important for unambiguous encoding and exchange of clinical information. The traditional manual method of developing terminology models is time-consuming and limited in the number of phrases that a human developer can examine. In this paper, we present an automated method for developing medical terminology models based on natural language processing (NLP) and information visualization techniques. Surgical pathology reports were selected as the testing corpus for developing a pathology procedure terminology model. The use of a general NLP processor for the medical domain, MedLEE, provides an automated method for acquiring semantic structures from a free text corpus and sheds light on a new high-throughput method of medical terminology model development. The use of an information visualization technique supports the summarization and visualization of the large quantity of semantic structures generated from medical documents. We believe that a general method based on NLP and information visualization will facilitate the modeling of medical terminologies.
Open Source Clinical NLP - More than Any Single System.
Masanz, James; Pakhomov, Serguei V; Xu, Hua; Wu, Stephen T; Chute, Christopher G; Liu, Hongfang
2014-01-01
The number of Natural Language Processing (NLP) tools and systems for processing clinical free-text has grown as interest and processing capability have surged. Unfortunately any two systems typically cannot simply interoperate, even when both are built upon a framework designed to facilitate the creation of pluggable components. We present two ongoing activities promoting open source clinical NLP. The Open Health Natural Language Processing (OHNLP) Consortium was originally founded to foster a collaborative community around clinical NLP, releasing UIMA-based open source software. OHNLP's mission currently includes maintaining a catalog of clinical NLP software and providing interfaces to simplify the interaction of NLP systems. Meanwhile, Apache cTAKES aims to integrate best-of-breed annotators, providing a world-class NLP system for accessing clinical information within free-text. These two activities are complementary. OHNLP promotes open source clinical NLP activities in the research community and Apache cTAKES bridges research to the health information technology (HIT) practice.
QATT: a Natural Language Interface for QPE. M.S. Thesis
NASA Technical Reports Server (NTRS)
White, Douglas Robert-Graham
1989-01-01
QATT, a natural language interface developed for the Qualitative Process Engine (QPE) system is presented. The major goal was to evaluate the use of a preexisting natural language understanding system designed to be tailored for query processing in multiple domains of application. The other goal of QATT is to provide a comfortable environment in which to query envisionments in order to gain insight into the qualitative behavior of physical systems. It is shown that the use of the preexisting system made possible the development of a reasonably useful interface in a few months.
BIT BY BIT: A Game Simulating Natural Language Processing in Computers
ERIC Educational Resources Information Center
Kato, Taichi; Arakawa, Chuichi
2008-01-01
BIT BY BIT is an encryption game that is designed to improve students' understanding of natural language processing in computers. Participants encode clear words into binary code using an encryption key and exchange them in the game. BIT BY BIT enables participants who do not understand the concept of binary numbers to perform the process of…
RGSS-ID: an approach to new radiologic reporting system.
Ikeda, M; Sakuma, S; Maruyama, K
1990-01-01
RGSS-ID is a developmental computer system that applies artificial intelligence (AI) methods to a reporting system. A representation scheme called Generalized Finding Representation (GFR) is proposed to bridge the gap between natural language expressions in the radiology report and AI methods. Data entry in RGSS-ID is done mainly by selecting items; the system allows a radiologist to compose a sentence that can be completely parsed by the computer. RGSS-ID then encodes the findings into the corresponding GFR expression and stores this expression in the knowledge database. The final printed report is produced in natural language.
Eliminating Unpredictable Variation through Iterated Learning
ERIC Educational Resources Information Center
Smith, Kenny; Wonnacott, Elizabeth
2010-01-01
Human languages may be shaped not only by the (individual psychological) processes of language acquisition, but also by population-level processes arising from repeated language learning and use. One prevalent feature of natural languages is that they avoid unpredictable variation. The current work explores whether linguistic predictability might…
Greenwald, Jeffrey L; Cronin, Patrick R; Carballo, Victoria; Danaei, Goodarz; Choy, Garry
2017-03-01
With the increasing focus on reducing hospital readmissions in the United States, numerous readmissions risk prediction models have been proposed, mostly developed through analyses of structured data fields in electronic medical records and administrative databases. Three areas that may have an impact on readmission but are poorly captured using structured data sources are patients' physical function, cognitive status, and psychosocial environment and support. The objective of the study was to build a discriminative model using information germane to these 3 areas to identify hospitalized patients' risk for 30-day all cause readmissions. We conducted clinician focus groups to identify language used in the clinical record regarding these 3 areas. We then created a dataset including 30,000 inpatients, 10,000 from each of 3 hospitals, and searched those records for the focus group-derived language using natural language processing. A 30-day readmission prediction model was developed on 75% of the dataset and validated on the other 25% and also on hospital specific subsets. Focus group language was aggregated into 35 variables. The final model had 16 variables, a validated C-statistic of 0.74, and was well calibrated. Subset validation of the model by hospital yielded C-statistics of 0.70-0.75. Deriving a 30-day readmission risk prediction model through identification of physical, cognitive, and psychosocial issues using natural language processing yielded a model that performs similarly to the better performing models previously published with the added advantage of being based on clinically relevant factors and also automated and scalable. Because of the clinical relevance of the variables in the model, future research may be able to test if targeting interventions to identified risks results in reductions in readmissions.
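The C-statistic quoted in this abstract can be read as the probability that a randomly chosen readmitted patient receives a higher predicted risk than a randomly chosen patient who was not readmitted. The sketch below computes it by direct pair counting; the risk scores and outcomes are made-up illustrative values, not data from the study, and the function name `c_statistic` is hypothetical.

```python
from itertools import product


def c_statistic(scores, labels):
    """Share of (readmitted, not readmitted) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    for p, n in product(pos, neg):
        if p > n:
            concordant += 1.0
        elif p == n:
            concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Hypothetical predicted risks and observed outcomes (1 = readmitted within 30 days).
risks = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2]
readmitted = [1, 0, 1, 0, 1, 0]
print(round(c_statistic(risks, readmitted), 3))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, so the reported 0.74 sits between the two. Pair counting is quadratic in the number of patients; for a dataset of 30,000 records a rank-based computation would likely be preferred.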
Hampton Wray, Amanda; Weber-Fox, Christine
2013-07-01
The neural activity mediating language processing in young children is characterized by large individual variability that is likely related in part to individual strengths and weaknesses across various cognitive abilities. The current study addresses the following question: How does proficiency in specific cognitive and language functions impact neural indices mediating language processing in children? Thirty typically developing seven- and eight-year-olds were divided into high-normal and low-normal proficiency groups based on performance on nonverbal IQ, auditory word recall, and grammatical morphology tests. Event-related brain potentials (ERPs) were elicited by semantic anomalies and phrase structure violations in naturally spoken sentences. The proficiency for each of the specific cognitive and language tasks uniquely contributed to specific aspects (e.g., timing and/or resource allocation) of neural indices underlying semantic (N400) and syntactic (P600) processing. These results suggest that distinct aptitudes within broader domains of cognition and language, even within the normal range, influence the neural signatures of semantic and syntactic processing. Furthermore, the current findings have important implications for the design and interpretation of developmental studies of ERPs indexing language processing, and they highlight the need to take into account cognitive abilities both within and outside the classic language domain. Copyright © 2013 Elsevier Ltd. All rights reserved.
The language faculty that wasn't: a usage-based account of natural language recursion
Christiansen, Morten H.; Chater, Nick
2015-01-01
In the generative tradition, the language faculty has been shrinking—perhaps to include only the mechanism of recursion. This paper argues that even this view of the language faculty is too expansive. We first argue that a language faculty is difficult to reconcile with evolutionary considerations. We then focus on recursion as a detailed case study, arguing that our ability to process recursive structure does not rely on recursion as a property of the grammar, but instead emerges gradually by piggybacking on domain-general sequence learning abilities. Evidence from genetics, comparative work on non-human primates, and cognitive neuroscience suggests that humans have evolved complex sequence learning skills, which were subsequently pressed into service to accommodate language. Constraints on sequence learning therefore have played an important role in shaping the cultural evolution of linguistic structure, including our limited abilities for processing recursive structure. Finally, we re-evaluate some of the key considerations that have often been taken to require the postulation of a language faculty. PMID:26379567
Automatic Requirements Specification Extraction from Natural Language (ARSENAL)
2014-10-01
designers, implementers) involved in the design of software systems. However, natural language descriptions can be informal, incomplete, imprecise...communication of technical descriptions between the various stakeholders (e.g., customers, designers, implementers) involved in the design of software systems...the accuracy of the natural language processing stage, the degree of automation, and robustness to noise...Software systems operate in
Bibliography of Research in Natural Language Generation
1993-11-01
on 1397] Barbara J. Gross Focusing and description in Artificial Intelligence (GWAI-88), Geseke, West natural language dialogues, In Joshi et al. (557...Proceedings of the Fifth Canadian Conference from information in a frame structure. Data and on Artificial Intelligence, pages Ŕ-24, London, Knowledge...generation workshops (IWNLGS, ENLGWS), natural language processing conferences (ANLP, TINLAP, SPEECH), artificial intelligence conferences (AAAI, SCA
Chinese Sentence Classification Based on Convolutional Neural Network
NASA Astrophysics Data System (ADS)
Gu, Chengwei; Wu, Ming; Zhang, Chuang
2017-10-01
Sentence classification is one of the significant problems in Natural Language Processing (NLP). Feature extraction is often regarded as the key step in natural language processing. Traditional machine learning methods, such as the Naive Bayes model, cannot take high-level features into consideration. A neural network for sentence classification can make use of contextual information to achieve better results in sentence classification tasks. In this paper, we focus on classifying Chinese sentences. Most importantly, we propose a novel Convolutional Neural Network (CNN) architecture for Chinese sentence classification. Whereas most previous methods use a softmax classifier for prediction, we embed a linear support vector machine in place of softmax in the deep neural network model and minimize a margin-based loss to get a better result. We also use tanh as the activation function instead of ReLU. The CNN model improves the results of Chinese sentence classification tasks. Experimental results on the Chinese news title database validate the effectiveness of our model.
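To make the idea of swapping softmax for a linear SVM output layer concrete, here is a minimal sketch of a margin-based loss applied to the class scores of a final linear layer fed by tanh features. The abstract does not say which margin loss the authors used; the Weston-Watkins multiclass hinge form shown here is an assumption, as are all function names, weights, and input values.

```python
import math


def tanh_features(pre_activations):
    """Apply tanh element-wise, standing in for the network's hidden activations."""
    return [math.tanh(x) for x in pre_activations]


def linear_scores(weights, biases, features):
    """One score per class: w_k . h + b_k (the linear SVM output layer)."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, biases)]


def multiclass_hinge_loss(scores, target, margin=1.0):
    """Sum over wrong classes j of max(0, margin + s_j - s_target)."""
    return sum(max(0.0, margin + s - scores[target])
               for j, s in enumerate(scores) if j != target)


# Hypothetical two-dimensional features and a three-class output layer.
h = tanh_features([0.5, -1.0])
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
b = [0.0, 0.0, 0.0]
scores = linear_scores(W, b, h)
print(multiclass_hinge_loss(scores, target=0))
```

Unlike cross-entropy over softmax outputs, this loss is exactly zero once the correct class beats every other class by at least the margin, so training pressure concentrates on examples near the decision boundary.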
Rank and Sparsity in Language Processing
ERIC Educational Resources Information Center
Hutchinson, Brian
2013-01-01
Language modeling is one of many problems in language processing that have to grapple with naturally high ambient dimensions. Even in large datasets, the number of unseen sequences is overwhelmingly larger than the number of observed ones, posing clear challenges for estimation. Although existing methods for building smooth language models tend to…
Combining Machine Learning and Natural Language Processing to Assess Literary Text Comprehension
ERIC Educational Resources Information Center
Balyan, Renu; McCarthy, Kathryn S.; McNamara, Danielle S.
2017-01-01
This study examined how machine learning and natural language processing (NLP) techniques can be leveraged to assess the interpretive behavior that is required for successful literary text comprehension. We compared the accuracy of seven different machine learning classification algorithms in predicting human ratings of student essays about…
Verification Processes in Recognition Memory: The Role of Natural Language Mediators
ERIC Educational Resources Information Center
Marshall, Philip H.; Smith, Randolph A. S.
1977-01-01
The existence of verification processes in recognition memory was confirmed in the context of Adams' (Adams & Bray, 1970) closed-loop theory. Subjects' recognition was tested following a learning session. The expectation was that data would reveal consistent internal relationships supporting the position that natural language mediation plays…
Formalizing Knowledge in Multi-Scale Agent-Based Simulations
Somogyi, Endre; Sluka, James P.; Glazier, James A.
2017-01-01
Multi-scale, agent-based simulations of cellular and tissue biology are increasingly common. These simulations combine and integrate a range of components from different domains. Simulations continuously create, destroy and reorganize constituent elements causing their interactions to dynamically change. For example, the multi-cellular tissue development process coordinates molecular, cellular and tissue scale objects with biochemical, biomechanical, spatial and behavioral processes to form a dynamic network. Different domain specific languages can describe these components in isolation, but cannot describe their interactions. No current programming language is designed to represent in human readable and reusable form the domain specific knowledge contained in these components and interactions. We present a new hybrid programming language paradigm that naturally expresses the complex multi-scale objects and dynamic interactions in a unified way and allows domain knowledge to be captured, searched, formalized, extracted and reused. PMID:29338063
Formal ontology for natural language processing and the integration of biomedical databases.
Simon, Jonathan; Dos Santos, Mariana; Fielding, James; Smith, Barry
2006-01-01
The central hypothesis underlying this communication is that the methodology and conceptual rigor of a philosophically inspired formal ontology can bring significant benefits in the development and maintenance of application ontologies [A. Flett, M. Dos Santos, W. Ceusters, Some Ontology Engineering Procedures and their Supporting Technologies, EKAW2002, 2003]. This hypothesis has been tested in the collaboration between Language and Computing (L&C), a company specializing in software for supporting natural language processing especially in the medical field, and the Institute for Formal Ontology and Medical Information Science (IFOMIS), an academic research institution concerned with the theoretical foundations of ontology. In the course of this collaboration L&C's ontology, LinKBase, which is designed to integrate and support reasoning across a plurality of external databases, has been subjected to a thorough auditing on the basis of the principles underlying IFOMIS's Basic Formal Ontology (BFO) [B. Smith, Basic Formal Ontology, 2002. http://ontology.buffalo.edu/bfo]. The goal is to transform a large terminology-based ontology into one with the ability to support reasoning applications. Our general procedure has been the implementation of a meta-ontological definition space in which the definitions of all the concepts and relations in LinKBase are standardized in the framework of first-order logic. In this paper we describe how this principles-based standardization has led to a greater degree of internal coherence of the LinKBase structure, and how it has facilitated the construction of mappings between external databases using LinKBase as translation hub. We argue that the collaboration here described represents a new phase in the quest to solve the so-called "Tower of Babel" problem of ontology integration [F. Montayne, J. Flanagan, Formal Ontology: The Foundation for Natural Language Processing, 2003. http://www.landcglobal.com/].
Apprentissage naturel et apprentissage guide (Natural Learning and Guided Learning).
ERIC Educational Resources Information Center
Veronique, Daniel
1984-01-01
Although second language pedagogy has tended increasingly toward simulation, role-playing, and natural communication, it has not profited from existing research on natural learning in second languages. The emphasis should be on understanding how the processes of guided learning and natural learning differ, psychologically and sociologically, and…
Gundlapalli, Adi V; Divita, Guy; Redd, Andrew; Carter, Marjorie E; Ko, Danette; Rubin, Michael; Samore, Matthew; Strymish, Judith; Krein, Sarah; Gupta, Kalpana; Sales, Anne; Trautner, Barbara W
2017-07-01
To develop a natural language processing pipeline to extract positively asserted concepts related to the presence of an indwelling urinary catheter in hospitalized patients from the free text of the electronic medical note. The goal is to assist infection preventionists and other healthcare professionals in determining whether a patient has an indwelling urinary catheter when a catheter-associated urinary tract infection is suspected. Currently, data on indwelling urinary catheters is not consistently captured in the electronic medical record in structured format and thus cannot be reliably extracted for clinical and research purposes. We developed a lexicon of terms related to indwelling urinary catheters and urinary symptoms based on domain knowledge, prior experience in the field, and review of medical notes. A reference standard of 1595 randomly selected documents from inpatient admissions was annotated by human reviewers to identify all positively and negatively asserted concepts related to indwelling urinary catheters. We trained a natural language processing pipeline based on the V3NLP framework using 1050 documents and tested on 545 documents to determine agreement with the human reference standard. Metrics reported are positive predictive value and recall. The lexicon contained 590 terms related to the presence of an indwelling urinary catheter in various categories including insertion, care, change, and removal of urinary catheters and 67 terms for urinary symptoms. Nursing notes were the most frequent inpatient note titles in the reference standard document corpus; these also yielded the highest number of positively asserted concepts with respect to urinary catheters. Comparing the performance of the natural language processing pipeline against the human reference standard, the overall recall was 75% and positive predictive value was 99% on the training set; on the testing set, the recall was 72% and positive predictive value was 98%. 
The performance on extracting urinary symptoms (including fever) was high with recall and precision greater than 90%. We have shown that it is possible to identify the presence of an indwelling urinary catheter and urinary symptoms from the free text of electronic medical notes from inpatients using natural language processing. These are two key steps in developing automated protocols to assist humans in large-scale review of patient charts for catheter-associated urinary tract infection. The challenges associated with extracting indwelling urinary catheter-related concepts also inform the design of electronic medical record templates to reliably and consistently capture data on indwelling urinary catheters. Published by Elsevier Inc.
Hardjojo, Antony; Gunachandran, Arunan; Pang, Long; Abdullah, Mohammed Ridzwan Bin; Wah, Win; Chong, Joash Wen Chen; Goh, Ee Hui; Teo, Sok Huang; Lim, Gilbert; Lee, Mong Li; Hsu, Wynne; Lee, Vernon; Chen, Mark I-Cheng; Wong, Franco; Phang, Jonathan Siung King
2018-06-11
Free-text clinical records provide a source of information that complements traditional disease surveillance. To electronically harness these records, they need to be transformed into codified fields by natural language processing algorithms. The aim of this study was to develop, train, and validate Clinical History Extractor for Syndromic Surveillance (CHESS), a natural language processing algorithm to extract clinical information from free-text primary care records. CHESS is a keyword-based natural language processing algorithm to extract 48 signs and symptoms suggesting respiratory infections, gastrointestinal infections, constitutional, as well as other signs and symptoms potentially associated with infectious diseases. The algorithm also captured the assertion status (affirmed, negated, or suspected) and symptom duration. Electronic medical records from the National Healthcare Group Polyclinics, a major public sector primary care provider in Singapore, were randomly extracted and manually reviewed by 2 human reviewers, with a third reviewer as the adjudicator. The algorithm was evaluated based on 1680 notes against the human-coded result as the reference standard, with half of the data used for training and the other half for validation. The symptoms most commonly present within the 1680 clinical records at the episode level were those typically present in respiratory infections such as cough (744/7703, 9.66%), sore throat (591/7703, 7.67%), rhinorrhea (552/7703, 7.17%), and fever (928/7703, 12.04%). At the episode level, CHESS had an overall performance of 96.7% precision and 97.6% recall on the training dataset and 96.0% precision and 93.1% recall on the validation dataset. Symptoms suggesting respiratory and gastrointestinal infections were all detected with more than 90% precision and recall. 
CHESS correctly assigned the assertion status in 97.3%, 97.9%, and 89.8% of affirmed, negated, and suspected signs and symptoms, respectively (97.6% overall accuracy). Symptom episode duration was correctly identified in 81.2% of records with known duration status. We have developed a natural language processing algorithm dubbed CHESS that achieves good performance in extracting signs and symptoms from primary care free-text clinical records. In addition to the presence of symptoms, our algorithm can also accurately distinguish affirmed, negated, and suspected assertion statuses and extract symptom durations. ©Antony Hardjojo, Arunan Gunachandran, Long Pang, Mohammed Ridzwan Bin Abdullah, Win Wah, Joash Wen Chen Chong, Ee Hui Goh, Sok Huang Teo, Gilbert Lim, Mong Li Lee, Wynne Hsu, Vernon Lee, Mark I-Cheng Chen, Franco Wong, Jonathan Siung King Phang. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 11.06.2018.
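The general shape of a keyword-based extractor with assertion status can be sketched as follows. This is not the CHESS implementation: the abstract does not publish its lexicon or rules, so the three-symptom lexicon, the cue word lists, and the rule "a cue earlier in the same clause sets the status" are all assumptions chosen for illustration. Symptom duration is not handled here.

```python
import re

# Hypothetical mini-lexicon and cue lists (the real CHESS lexicon covers 48 signs and symptoms).
SYMPTOM_KEYWORDS = {
    "cough": ["cough"],
    "fever": ["fever", "febrile"],
    "sore throat": ["sore throat"],
}
NEGATION_CUES = {"no", "not", "denies", "without"}
UNCERTAINTY_CUES = {"possible", "suspected", "likely"}


def extract_symptoms(note: str) -> dict:
    """Map each symptom found in the note to affirmed, negated, or suspected."""
    found = {}
    # Split into clauses so a cue only affects keywords in its own clause.
    for clause in re.split(r"[.;,\n]+", note.lower()):
        for symptom, keywords in SYMPTOM_KEYWORDS.items():
            for kw in keywords:
                # Word boundaries stop "febrile" from matching inside "afebrile".
                match = re.search(r"\b" + re.escape(kw) + r"\b", clause)
                if not match:
                    continue
                preceding = set(clause[:match.start()].split())
                if preceding & NEGATION_CUES:
                    status = "negated"
                elif preceding & UNCERTAINTY_CUES:
                    status = "suspected"
                else:
                    status = "affirmed"
                found[symptom] = status
                break  # one keyword hit per symptom per clause is enough
    return found


note = "Cough for 3 days. Denies fever, possible sore throat."
print(extract_symptoms(note))
```

On the sample note this labels cough as affirmed, fever as negated, and sore throat as suspected. A production system would need a far larger lexicon, handling of misspellings and abbreviations, and cue scoping that is more careful than "anywhere earlier in the clause".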
Categorization of Survey Text Utilizing Natural Language Processing and Demographic Filtering
2017-09-01
Master's thesis by Christine M. Cairoli, September 2017. Thesis Advisor: Lyn... Thousands of Navy survey free text comments are overlooked every year because reading and interpreting comments is expensive, time consuming...
Case-based medical informatics
Pantazi, Stefan V; Arocha, José F; Moehr, Jochen R
2004-01-01
Background The "applied" nature distinguishes applied sciences from theoretical sciences. To emphasize this distinction, we begin with a general, meta-level overview of the scientific endeavor. We introduce the knowledge spectrum and four interconnected modalities of knowledge. In addition to the traditional differentiation between implicit and explicit knowledge we outline the concepts of general and individual knowledge. We connect general knowledge with the "frame problem," a fundamental issue of artificial intelligence, and individual knowledge with another important paradigm of artificial intelligence, case-based reasoning, a method of individual knowledge processing that aims at solving new problems based on the solutions to similar past problems. We outline the fundamental differences between Medical Informatics and theoretical sciences and propose that Medical Informatics research should advance individual knowledge processing (case-based reasoning) and that natural language processing research is an important step towards this goal that may have ethical implications for patient-centered health medicine. Discussion We focus on fundamental aspects of decision-making, which connect human expertise with individual knowledge processing. We continue with a knowledge spectrum perspective on biomedical knowledge and conclude that case-based reasoning is the paradigm that can advance towards personalized healthcare and that can enable the education of patients and providers. We center the discussion on formal methods of knowledge representation around the frame problem. We propose a context-dependent view on the notion of "meaning" and advocate the need for case-based reasoning research and natural language processing. 
In the context of memory based knowledge processing, pattern recognition, comparison and analogy-making, we conclude that while humans seem to naturally support the case-based reasoning paradigm (memory of past experiences of problem-solving and powerful case matching mechanisms), technical solutions are challenging. Finally, we discuss the major challenges for a technical solution: case record comprehensiveness, organization of information on similarity principles, development of pattern recognition and solving ethical issues. Summary Medical Informatics is an applied science that should be committed to advancing patient-centered medicine through individual knowledge processing. Case-based reasoning is the technical solution that enables a continuous individual knowledge processing and could be applied providing that challenges and ethical issues arising are addressed appropriately. PMID:15533257
Advances in natural language processing.
Hirschberg, Julia; Manning, Christopher D
2015-07-17
Natural language processing employs computational techniques for the purpose of learning, understanding, and producing human language content. Early computational approaches to language research focused on automating the analysis of the linguistic structure of language and developing basic technologies such as machine translation, speech recognition, and speech synthesis. Today's researchers refine and make use of such tools in real-world applications, creating spoken dialogue systems and speech-to-speech translation engines, mining social media for information about health or finance, and identifying sentiment and emotion toward products and services. We describe successes and challenges in this rapidly advancing area. Copyright © 2015, American Association for the Advancement of Science.
A UMLS-based spell checker for natural language processing in vaccine safety.
Tolentino, Herman D; Matters, Michael D; Walop, Wikke; Law, Barbara; Tong, Wesley; Liu, Fang; Fontelo, Paul; Kohl, Katrin; Payne, Daniel C
2007-02-12
The Institute of Medicine has identified patient safety as a key goal for health care in the United States. Detecting vaccine adverse events is an important public health activity that contributes to patient safety. Reports about adverse events following immunization (AEFI) from surveillance systems contain free-text components that can be analyzed using natural language processing. To extract Unified Medical Language System (UMLS) concepts from free text and classify AEFI reports based on concepts they contain, we first needed to clean the text by expanding abbreviations and shortcuts and correcting spelling errors. Our objective in this paper was to create a UMLS-based spelling error correction tool as a first step in the natural language processing (NLP) pipeline for AEFI reports. We developed spell checking algorithms using open source tools. We used de-identified AEFI surveillance reports to create free-text data sets for analysis. After expansion of abbreviated clinical terms and shortcuts, we performed spelling correction in four steps: (1) error detection, (2) word list generation, (3) word list disambiguation and (4) error correction. We then measured the performance of the resulting spell checker by comparing it to manual correction. We used 12,056 words to train the spell checker and tested its performance on 8,131 words. During testing, sensitivity, specificity, and positive predictive value (PPV) for the spell checker were 74% (95% CI: 74-75), 100% (95% CI: 100-100), and 47% (95% CI: 46%-48%), respectively. We created a prototype spell checker that can be used to process AEFI reports. We used the UMLS Specialist Lexicon as the primary source of dictionary terms and the WordNet lexicon as a secondary source. We used the UMLS as a domain-specific source of dictionary terms to compare potentially misspelled words in the corpus. The prototype sensitivity was comparable to currently available tools, but the specificity was much superior. 
The slow processing speed may be improved by trimming it down to the most useful component algorithms. Other investigators may find the methods we developed useful for cleaning text using lexicons specific to their area of interest.
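The four correction steps named in the abstract above (error detection, word list generation, word list disambiguation, error correction) can be sketched as follows. This is only an illustration under stated assumptions: the tiny lexicon and its frequency counts are invented stand-ins for the UMLS Specialist Lexicon and WordNet, and the similarity measure from Python's standard `difflib` module is a substitute for whatever algorithms the authors used.

```python
import difflib

# Stand-in lexicon with made-up frequencies (assumption).
LEXICON = {
    "fever": 120, "swelling": 80, "injection": 95,
    "site": 150, "rash": 60, "at": 500, "and": 900,
}


def correct_text(text):
    """Correct each out-of-lexicon word with its best lexicon match, if any."""
    corrected = []
    for word in text.lower().split():
        # (1) Error detection: a word is suspect if it is not in the lexicon.
        if word in LEXICON:
            corrected.append(word)
            continue
        # (2) Word list generation: collect similar lexicon entries.
        candidates = difflib.get_close_matches(word, list(LEXICON), n=3, cutoff=0.7)
        if not candidates:
            corrected.append(word)  # nothing plausible; leave the word unchanged
            continue
        # (3) Disambiguation: prefer higher similarity, then higher frequency.
        best = max(
            candidates,
            key=lambda c: (difflib.SequenceMatcher(None, word, c).ratio(), LEXICON[c]),
        )
        # (4) Error correction: substitute the chosen candidate.
        corrected.append(best)
    return " ".join(corrected)


print(correct_text("sweling at injecton site"))
```

Treating every out-of-lexicon word as a possible error is what makes the choice of dictionary so important: a domain term missing from the lexicon would be "corrected" into something else, which lowers positive predictive value.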
A UMLS-based spell checker for natural language processing in vaccine safety
Tolentino, Herman D; Matters, Michael D; Walop, Wikke; Law, Barbara; Tong, Wesley; Liu, Fang; Fontelo, Paul; Kohl, Katrin; Payne, Daniel C
2007-01-01
Background The Institute of Medicine has identified patient safety as a key goal for health care in the United States. Detecting vaccine adverse events is an important public health activity that contributes to patient safety. Reports about adverse events following immunization (AEFI) from surveillance systems contain free-text components that can be analyzed using natural language processing. To extract Unified Medical Language System (UMLS) concepts from free text and classify AEFI reports based on concepts they contain, we first needed to clean the text by expanding abbreviations and shortcuts and correcting spelling errors. Our objective in this paper was to create a UMLS-based spelling error correction tool as a first step in the natural language processing (NLP) pipeline for AEFI reports. Methods We developed spell checking algorithms using open source tools. We used de-identified AEFI surveillance reports to create free-text data sets for analysis. After expansion of abbreviated clinical terms and shortcuts, we performed spelling correction in four steps: (1) error detection, (2) word list generation, (3) word list disambiguation and (4) error correction. We then measured the performance of the resulting spell checker by comparing it to manual correction. Results We used 12,056 words to train the spell checker and tested its performance on 8,131 words. During testing, sensitivity, specificity, and positive predictive value (PPV) for the spell checker were 74% (95% CI: 74–75), 100% (95% CI: 100–100), and 47% (95% CI: 46%–48%), respectively. Conclusion We created a prototype spell checker that can be used to process AEFI reports. We used the UMLS Specialist Lexicon as the primary source of dictionary terms and the WordNet lexicon as a secondary source. We used the UMLS as a domain-specific source of dictionary terms to compare potentially misspelled words in the corpus. 
The prototype sensitivity was comparable to currently available tools, but the specificity was much superior. The slow processing speed may be improved by trimming it down to the most useful component algorithms. Other investigators may find the methods we developed useful for cleaning text using lexicons specific to their area of interest. PMID:17295907
Trivedi, Hari; Mesterhazy, Joseph; Laguna, Benjamin; Vu, Thienkhai; Sohn, Jae Ho
2018-04-01
Magnetic resonance imaging (MRI) protocoling can be time- and resource-intensive, and protocols can often be suboptimal dependent upon the expertise or preferences of the protocoling radiologist. Providing a best-practice recommendation for an MRI protocol has the potential to improve efficiency and decrease the likelihood of a suboptimal or erroneous study. The goal of this study was to develop and validate a machine learning-based natural language classifier that can automatically assign the use of intravenous contrast for musculoskeletal MRI protocols based upon the free-text clinical indication of the study, thereby improving efficiency of the protocoling radiologist and potentially decreasing errors. We utilized a deep learning-based natural language classification system from IBM Watson, a question-answering supercomputer that gained fame after challenging the best human players on Jeopardy! in 2011. We compared this solution to a series of traditional machine learning-based natural language processing techniques that utilize a term-document frequency matrix. Each classifier was trained with 1240 MRI protocols plus their respective clinical indications and validated with a test set of 280. Ground truth of contrast assignment was obtained from the clinical record. For evaluation of inter-reader agreement, a blinded second reader radiologist analyzed all cases and determined contrast assignment based on only the free-text clinical indication. In the test set, Watson demonstrated overall accuracy of 83.2% when compared to the original protocol. This was similar to the overall accuracy of 80.2% achieved by an ensemble of eight traditional machine learning algorithms based on a term-document matrix. When compared to the second reader's contrast assignment, Watson achieved 88.6% agreement. When evaluating only the subset of cases where the original protocol and second reader were concordant (n = 251), agreement climbed further to 90.0%. 
The classifier was relatively robust to spelling and grammatical errors, which were frequent. Implementation of this automated MR contrast determination system as a clinical decision support tool may save considerable time and effort of the radiologist while potentially decreasing error rates, and require no change in order entry or workflow.
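A term-document matrix classifier of the general kind used as the comparison baseline above can be sketched in a few lines. The abstract does not name the eight algorithms in the ensemble, so the multinomial naive Bayes model here is an assumption chosen for brevity, and the four training indications and their labels are invented for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def build_term_document_matrix(docs):
    """Rows are documents, columns are vocabulary terms, cells are term counts."""
    vocab = sorted({term for doc in docs for term in tokenize(doc)})
    matrix = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        matrix.append([counts[term] for term in vocab])
    return vocab, matrix


def train_naive_bayes(docs, labels):
    """Fit a multinomial naive Bayes model with Laplace smoothing."""
    vocab, matrix = build_term_document_matrix(docs)
    model = {"vocab": vocab, "prior": {}, "loglik": {}}
    for label in set(labels):
        rows = [row for row, y in zip(matrix, labels) if y == label]
        totals = [sum(column) for column in zip(*rows)]
        denom = sum(totals) + len(vocab)
        model["prior"][label] = math.log(len(rows) / len(docs))
        model["loglik"][label] = [math.log((c + 1) / denom) for c in totals]
    return model


def predict(model, text):
    counts = Counter(tokenize(text))  # terms outside the vocabulary are ignored

    def score(label):
        logliks = model["loglik"][label]
        return model["prior"][label] + sum(
            counts[term] * ll for term, ll in zip(model["vocab"], logliks)
        )

    return max(model["prior"], key=score)


# Invented examples (assumption); not taken from the study data.
docs = [
    "evaluate for osteomyelitis",
    "soft tissue mass evaluate",
    "knee pain after injury",
    "meniscal tear knee pain",
]
labels = ["contrast", "contrast", "no contrast", "no contrast"]
model = train_naive_bayes(docs, labels)
print(predict(model, "knee injury pain"))
```

Because the model only sees exact tokens, a misspelled indication simply contributes nothing to the score, which is one reason robustness to spelling errors is worth reporting for this kind of input.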
Evidence-based interventions for reading and language difficulties: creating a virtuous circle.
Snowling, Margaret J; Hulme, Charles
2011-03-01
BACKGROUND. Children may experience two very different forms of reading problem: decoding difficulties (dyslexia) and reading comprehension difficulties. Decoding difficulties appear to be caused by problems with phonological (speech sound) processing. Reading comprehension difficulties in contrast appear to be caused by problems with 'higher level' language difficulties including problems with semantics (including deficient knowledge of word meanings) and grammar (knowledge of morphology and syntax). AIMS. We review evidence concerning the nature, causes of, and treatments for children's reading difficulties. We argue that any well-founded educational intervention must be based on a sound theory of the causes of a particular form of learning difficulty, which in turn must be based on an understanding of how a given skill is learned by typically developing children. Such theoretically motivated interventions should in turn be evaluated in randomized controlled trials (RCTs) to establish whether they are effective, and for whom. RESULTS. There is now considerable evidence showing that phonologically based interventions are effective in ameliorating children's word level decoding difficulties, and a smaller evidence base showing that reading and oral language (OL) comprehension difficulties can be ameliorated by suitable interventions to boost vocabulary and broader OL skills. CONCLUSIONS. The process of developing theories about the origins of children's educational difficulties and evaluating theoretically motivated treatments in RCTs, produces a 'virtuous circle' whereby theory informs practice, and the evaluation of effective interventions in turn feeds back to inform and refine theories about the nature and causes of children's reading and language difficulties. ©2010 The British Psychological Society.
Memory Interference as a Determinant of Language Comprehension
Van Dyke, Julie A.; Johns, Clinton L.
2012-01-01
The parameters of the human memory system constrain the operation of language comprehension processes. In the memory literature, both decay and interference have been proposed as causes of forgetting; however, while there is a long history of research establishing the nature of interference effects in memory, the effects of decay are much more poorly supported. Nevertheless, research investigating the limitations of the human sentence processing mechanism typically focuses on decay-based explanations, emphasizing the role of capacity, while the role of interference has received comparatively little attention. This paper reviews both accounts of difficulty in language comprehension by drawing direct connections to research in the memory domain. Capacity-based accounts are found to be untenable, diverging substantially from what is known about the operation of the human memory system. In contrast, recent research investigating comprehension difficulty using a retrieval-interference paradigm is shown to be wholly consistent with both behavioral and neuropsychological memory phenomena. The implications of adopting a retrieval-interference approach to investigating individual variation in language comprehension are discussed. PMID:22773927
Color as a language in architecture
NASA Astrophysics Data System (ADS)
Smedal, Grete
2002-06-01
This paper considers the role of color as a non-verbal language between human beings and the environment. The communication is based on the function of color vision to separate and identify, and a language about color can be based on the same function. The concept behind the Natural Color System is color differentiation and color identification, which I find very useful both in color education of design students and in environmental color design work. A commission to plan the exterior use of color for a whole mining town at 78° North in Longyearbyen, Spitsbergen, will illustrate my ideas. This will serve as an example of how these different 'languages' can work together. After an ongoing process of twenty years, this work is now almost complete, and the result will be shown in the presentation.
Intelligent Agents as a Basis for Natural Language Interfaces
1988-01-01
language analysis component of UC, which produces a semantic representation of the input. This representation is in the form of a KODIAK network (see...Appendix A). Next, UC’s Concretion Mechanism performs concretion inferences ([Wilensky, 1983] and [Norvig, 1983]) based on the semantic network...The first step in UC’s processing is done by UC’s parser/understander component which produces a KODIAK semantic network representation of
Linguistic Knowledge and Reasoning for Error Diagnosis and Feedback Generation.
ERIC Educational Resources Information Center
Delmonte, Rodolfo
2003-01-01
Presents four sets of natural language processing-based exercises for which error correction and feedback are produced by means of a rich database in which linguistic information is encoded either at the lexical or the grammatical level. (Author/VWL)
Automatic reconstruction of a bacterial regulatory network using Natural Language Processing
Rodríguez-Penagos, Carlos; Salgado, Heladia; Martínez-Flores, Irma; Collado-Vides, Julio
2007-01-01
Background Manual curation of biological databases, an expensive and labor-intensive process, is essential for high quality integrated data. In this paper we report the implementation of a state-of-the-art Natural Language Processing system that creates computer-readable networks of regulatory interactions directly from different collections of abstracts and full-text papers. Our major aim is to understand how automatic annotation using Text-Mining techniques can complement manual curation of biological databases. We implemented a rule-based system to generate networks from different sets of documents dealing with regulation in Escherichia coli K-12. Results Performance evaluation is based on the most comprehensive transcriptional regulation database for any organism, the manually-curated RegulonDB, 45% of which we were able to recreate automatically. From our automated analysis we were also able to find some new interactions from papers not already curated, or that were missed in the manual filtering and review of the literature. We also put forward a novel Regulatory Interaction Markup Language better suited than SBML for simultaneously representing data of interest for biologists and text miners. Conclusion Manual curation of the output of automatic processing of text is a good way to complement a more detailed review of the literature, either for validating the results of what has been already annotated, or for discovering facts and information that might have been overlooked at the triage or curation stages. PMID:17683642
Friedman, Carol; Hripcsak, George; Shagina, Lyuda; Liu, Hongfang
1999-01-01
Objective: To design a document model that provides reliable and efficient access to clinical information in patient reports for a broad range of clinical applications, and to implement an automated method using natural language processing that maps textual reports to a form consistent with the model. Methods: A document model that encodes structured clinical information in patient reports while retaining the original contents was designed using the extensible markup language (XML), and a document type definition (DTD) was created. An existing natural language processor (NLP) was modified to generate output consistent with the model. Two hundred reports were processed using the modified NLP system, and the XML output that was generated was validated using an XML validating parser. Results: The modified NLP system successfully processed all 200 reports. The output of one report was invalid, and 199 reports were valid XML forms consistent with the DTD. Conclusions: Natural language processing can be used to automatically create an enriched document that contains a structured component whose elements are linked to portions of the original textual report. This integrated document model provides a representation where documents containing specific information can be accurately and efficiently retrieved by querying the structured components. If manual review of the documents is desired, the salient information in the original reports can also be identified and highlighted. Using an XML model of tagging provides an additional benefit in that software tools that manipulate XML documents are readily available. PMID:9925230
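The document model described above, in which a structured component is linked back to portions of the original text, can be sketched with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`report`, `sentence`, `finding`, `ref`, and so on) are hypothetical and are not taken from the authors' DTD. The standard library parser checks well-formedness only, so DTD validation as performed in the study is outside this sketch.

```python
import xml.etree.ElementTree as ET


def build_report(sentences, findings):
    """Build a report with the original text plus a linked structured component."""
    report = ET.Element("report")
    text = ET.SubElement(report, "text")
    for index, sentence in enumerate(sentences, start=1):
        node = ET.SubElement(text, "sentence", {"id": f"s{index}"})
        node.text = sentence
    structured = ET.SubElement(report, "structured")
    for name, certainty, sentence_id in findings:
        ET.SubElement(
            structured,
            "finding",
            {"name": name, "certainty": certainty, "ref": sentence_id},
        )
    return report


def sentences_for(report, finding_name):
    """Query the structured part, then follow the links to the original text."""
    by_id = {s.get("id"): s.text for s in report.iter("sentence")}
    return [
        by_id[f.get("ref")]
        for f in report.iter("finding")
        if f.get("name") == finding_name
    ]


report = build_report(
    ["Chest film shows right lower lobe infiltrate.", "No pleural effusion."],
    [("infiltrate", "high", "s1"), ("pleural effusion", "no", "s2")],
)
xml_text = ET.tostring(report, encoding="unicode")
ET.fromstring(xml_text)  # raises ParseError if the output is not well-formed
print(sentences_for(report, "infiltrate"))
```

Keeping the original sentences in the same document as the coded findings is what allows retrieval by structured query followed by highlighting of the source text for manual review.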
Small Knowledge-Based Systems in Education and Training: Something New Under the Sun.
ERIC Educational Resources Information Center
Wilson, Brent G.; Welsh, Jack R.
1986-01-01
Discusses artificial intelligence, robotics, natural language processing, and expert or knowledge-based systems research; examines two large expert systems, MYCIN and XCON; and reviews the resources required to build large expert systems and affordable smaller systems (intelligent job aids) for training. Expert system vendors and products are…
System and method for deriving a process-based specification
NASA Technical Reports Server (NTRS)
Hinchey, Michael Gerard (Inventor); Rouff, Christopher A. (Inventor); Rash, James Larry (Inventor)
2009-01-01
A system and method for deriving a process-based specification for a system is disclosed. The process-based specification is mathematically inferred from a trace-based specification. The trace-based specification is derived from a non-empty set of traces or natural language scenarios. The process-based specification is mathematically equivalent to the trace-based specification. Code is generated, if applicable, from the process-based specification. A process, or phases of a process, using the features disclosed can be reversed and repeated to allow for an interactive development and modification of legacy systems. The process is applicable to any class of system, including, but not limited to, biological and physical systems, electrical and electro-mechanical systems in addition to software, hardware and hybrid hardware-software systems.
Patton, Desmond Upton; MacBeth, Jamie; Schoenebeck, Sarita; Shear, Katherine; McKeown, Kathleen
2018-01-01
There is a dearth of research investigating youths' experience of grief and mourning after the death of close friends or family. Even less research has explored the question of how youth use social media sites to engage in the grieving process. This study employs qualitative analysis and natural language processing to examine tweets that follow 2 deaths. First, we conducted a close textual read on a sample of tweets by Gakirah Barnes, a gang-involved teenaged girl in Chicago, and members of her Twitter network, over a 19-day period in 2014 during which 2 significant deaths occurred: that of Raason "Lil B" Shaw and Gakirah's own death. We leverage the grief literature to understand the way Gakirah and her peers express thoughts, feelings, and behaviors at the time of these deaths. We also present and explain the rich and complex style of online communication among gang-involved youth, one that has been overlooked in prior research. Next, we overview the natural language processing output for expressions of loss and grief in our data set based on qualitative findings and present an error analysis on its output for grief. We conclude with a call for interdisciplinary research that analyzes online and offline behaviors to help understand physical and emotional violence and other problematic behaviors prevalent among marginalized communities.
Patton, Desmond Upton; MacBeth, Jamie; Schoenebeck, Sarita; Shear, Katherine; McKeown, Kathleen
2018-01-01
There is a dearth of research investigating youths’ experience of grief and mourning after the death of close friends or family. Even less research has explored the question of how youth use social media sites to engage in the grieving process. This study employs qualitative analysis and natural language processing to examine tweets that follow 2 deaths. First, we conducted a close textual read on a sample of tweets by Gakirah Barnes, a gang-involved teenaged girl in Chicago, and members of her Twitter network, over a 19-day period in 2014 during which 2 significant deaths occurred: that of Raason “Lil B” Shaw and Gakirah’s own death. We leverage the grief literature to understand the way Gakirah and her peers express thoughts, feelings, and behaviors at the time of these deaths. We also present and explain the rich and complex style of online communication among gang-involved youth, one that has been overlooked in prior research. Next, we overview the natural language processing output for expressions of loss and grief in our data set based on qualitative findings and present an error analysis on its output for grief. We conclude with a call for interdisciplinary research that analyzes online and offline behaviors to help understand physical and emotional violence and other problematic behaviors prevalent among marginalized communities. PMID:29636619
Conclusiveness of natural languages and recognition of images
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wojcik, Z.M.
1983-01-01
The conclusiveness is investigated using recognition processes and one-one correspondence between expressions of a natural language and graphs representing events. The graphs, as conceived in psycholinguistics, are obtained as a result of perception processes. It is possible to generate and process the graphs automatically, using computers and then to convert the resulting graphs into expressions of a natural language. Correctness and conclusiveness of the graphs and sentences are investigated using the fundamental condition for events representation processes. Some consequences of the conclusiveness are discussed, e.g. undecidability of arithmetic, human brain asymmetry, correctness of statistical calculations and operations research. It is suggested that the group theory should be imposed on mathematical models of any real system. Proof of the fundamental condition is also presented. 14 references.
Robust Resilience of the Frontotemporal Syntax System to Aging
Samu, Dávid; Davis, Simon W.; Geerligs, Linda; Mustafa, Abdur; Tyler, Lorraine K.
2016-01-01
Brain function is thought to become less specialized with age. However, this view is largely based on findings of increased activation during tasks that fail to separate task-related processes (e.g., attention, decision making) from the cognitive process under examination. Here we take a systems-level approach to separate processes specific to language comprehension from those related to general task demands and to examine age differences in functional connectivity both within and between those systems. A large population-based sample (N = 111; 22–87 years) from the Cambridge Centre for Aging and Neuroscience (Cam-CAN) was scanned using functional MRI during two versions of an experiment: a natural listening version in which participants simply listened to spoken sentences and an explicit task version in which they rated the acceptability of the same sentences. Independent components analysis across the combined data from both versions showed that although task-free language comprehension activates only the auditory and frontotemporal (FTN) syntax networks, performing a simple task with the same sentences recruits several additional networks. Remarkably, functionality of the critical FTN is maintained across age groups, showing no difference in within-network connectivity or responsivity to syntactic processing demands despite gray matter loss and reduced connectivity to task-related networks. We found no evidence for reduced specialization or compensation with age. Overt task performance was maintained across the lifespan and performance in older, but not younger, adults related to crystallized knowledge, suggesting that decreased between-network connectivity may be compensated for by older adults' richer knowledge base. SIGNIFICANCE STATEMENT Understanding spoken language requires the rapid integration of information at many different levels of analysis. Given the complexity and speed of this process, it is remarkably well preserved with age. 
Although previous work claims that this preserved functionality is due to compensatory activation of regions outside the frontotemporal language network, we use a novel systems-level approach to show that these “compensatory” activations simply reflect age differences in response to experimental task demands. Natural, task-free language comprehension solely recruits auditory and frontotemporal networks, the latter of which is similarly responsive to language-processing demands across the lifespan. These findings challenge the conventional approach to neurocognitive aging by showing that the neural underpinnings of a given cognitive function depend on how you test it. PMID:27170120
First Language Acquisition and Teaching
ERIC Educational Resources Information Center
Cruz-Ferreira, Madalena
2011-01-01
"First language acquisition" commonly means the acquisition of a single language in childhood, regardless of the number of languages in a child's natural environment. Language acquisition is variously viewed as predetermined, wondrous, a source of concern, and as developing through formal processes. "First language teaching" concerns schooling in…
State of the Art of Natural Language Processing
1987-11-15
work of Chomsky, Hewlett-Packard, Generalized Phase Structure Grammar. D. Lunar, DARPA speech understanding, Schank’s Conceptual Dependency Theory...of computers that a machine which understood natural languages was highly desirable. It also was evident from the work of Chomsky* and others that...computers. *Noam Chomsky, Aspects of the Theory of Syntax (Cambridge, Mass.: MIT Press, 1965). One of the earliest attempts at Natural Language
Jiang, Min; Chen, Yukun; Liu, Mei; Rosenbloom, S Trent; Mani, Subramani; Denny, Joshua C; Xu, Hua
2011-01-01
The authors' goal was to develop and evaluate machine-learning-based approaches to extracting clinical entities (including medical problems, tests, and treatments, as well as their asserted status) from hospital discharge summaries written using natural language. This project was part of the 2010 Center of Informatics for Integrating Biology and the Bedside/Veterans Affairs (VA) natural-language-processing challenge. The authors implemented a machine-learning-based named entity recognition system for clinical text and systematically evaluated the contributions of different types of features and ML algorithms, using a training corpus of 349 annotated notes. Based on the results from training data, the authors developed a novel hybrid clinical entity extraction system, which integrated heuristic rule-based modules with the ML-based named entity recognition module. The authors applied the hybrid system to the concept extraction and assertion classification tasks in the challenge and evaluated its performance using a test data set with 477 annotated notes. Standard measures including precision, recall, and F-measure were calculated using the evaluation script provided by the Center of Informatics for Integrating Biology and the Bedside/VA challenge organizers. The overall performance for all three types of clinical entities and all six types of assertions across 477 annotated notes was considered as the primary metric in the challenge. Systematic evaluation on the training set showed that Conditional Random Fields outperformed Support Vector Machines, and semantic information from existing natural-language-processing systems largely improved performance, although contributions from different types of features varied.
The authors' hybrid entity extraction system achieved a maximum overall F-score of 0.8391 for concept extraction (ranked second) and 0.9313 for assertion classification (ranked fourth, but not statistically different than the first three systems) on the test data set in the challenge.
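The standard measures mentioned above (precision, recall, and F-measure) follow directly from counts of matching entities. The sketch below is not the challenge's evaluation script; it assumes entities are represented as hypothetical `(start, end, type)` tuples and that only exact matches count as correct.

```python
def precision_recall_f1(gold, predicted):
    """Exact-match precision, recall, and balanced F-measure over entity sets."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented annotations (assumption): (start offset, end offset, entity type).
gold = [(0, 5, "problem"), (10, 18, "test"), (25, 33, "treatment"), (40, 47, "problem")]
predicted = [(0, 5, "problem"), (10, 18, "test"), (25, 33, "treatment"), (50, 55, "test")]
print(precision_recall_f1(gold, predicted))
```

With three of four predictions correct and three of four gold entities found, all three measures come out equal here; in general the F-measure is the harmonic mean and is pulled toward the lower of precision and recall.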
Xu, Hua; AbdelRahman, Samir; Lu, Yanxin; Denny, Joshua C.; Doan, Son
2011-01-01
Semantic-based sublanguage grammars have been shown to be an efficient method for medical language processing. However, given the complexity of the medical domain, parsers using such grammars inevitably encounter ambiguous sentences, which could be interpreted by different groups of production rules and consequently result in two or more parse trees. One possible solution, which has not been extensively explored previously, is to augment productions in medical sublanguage grammars with probabilities to resolve the ambiguity. In this study, we associated probabilities with production rules in a semantic-based grammar for medication findings and evaluated its performance on reducing parsing ambiguity. Using the existing data set from 2009 i2b2 NLP (Natural Language Processing) challenge for medication extraction, we developed a semantic-based CFG (Context Free Grammar) for parsing medication sentences and manually created a Treebank of 4,564 medication sentences from discharge summaries. Using the Treebank, we derived a semantic-based PCFG (probabilistic Context Free Grammar) for parsing medication sentences. Our evaluation using a 10-fold cross validation showed that the PCFG parser dramatically improved parsing performance when compared to the CFG parser. PMID:21856440
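The step from a treebank to a probabilistic grammar described above can be sketched as relative-frequency estimation: each production's probability is its count divided by the count of all productions with the same left-hand side, and a parse is scored by the product of its rule probabilities. The semantic categories (`MED`, `DRUG`, `DOSE`, `FREQ`), the nested-tuple tree encoding, and the three-tree treebank are assumptions for illustration, not the authors' grammar.

```python
from collections import Counter


def productions(tree):
    """Yield (lhs, rhs) for each node of a nested-tuple tree.

    A tree is (label, child, ...); preterminal nodes have only string children.
    """
    label, *children = tree
    if all(isinstance(child, str) for child in children):
        yield (label, tuple(children))  # lexical rule, e.g. DRUG -> aspirin
        return
    yield (label, tuple(child[0] for child in children))
    for child in children:
        yield from productions(child)


def estimate_pcfg(treebank):
    """Relative-frequency estimate: count(lhs -> rhs) / count(lhs)."""
    counts = Counter(rule for tree in treebank for rule in productions(tree))
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}


def tree_probability(tree, pcfg):
    """Product of rule probabilities; rules unseen in the treebank score zero."""
    probability = 1.0
    for rule in productions(tree):
        probability *= pcfg.get(rule, 0.0)
    return probability


# Invented treebank (assumption).
treebank = [
    ("MED", ("DRUG", "aspirin"), ("DOSE", "81mg")),
    ("MED", ("DRUG", "aspirin"), ("DOSE", "81mg"), ("FREQ", "daily")),
    ("MED", ("DRUG", "lisinopril"), ("DOSE", "10mg")),
]
pcfg = estimate_pcfg(treebank)

# Two competing analyses of the same sentence; keep the more probable one.
candidates = [
    ("MED", ("DRUG", "aspirin"), ("DOSE", "81mg")),
    ("MED", ("DRUG", "aspirin"), ("FREQ", "81mg")),
]
best = max(candidates, key=lambda tree: tree_probability(tree, pcfg))
print(best)
```

A plain CFG would accept both candidate trees as equally valid; the probabilities are what let the parser rank them, which is the ambiguity-reduction effect the study evaluates.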
ERIC Educational Resources Information Center
Ziegler, Nicole; Meurers, Detmar; Rebuschat, Patrick; Ruiz, Simón; Moreno-Vega, José L.; Chinkina, Maria; Li, Wenjing; Grey, Sarah
2017-01-01
Despite the promise of research conducted at the intersection of computer-assisted language learning (CALL), natural language processing, and second language acquisition, few studies have explored the potential benefits of using intelligent CALL systems to deepen our understanding of the process and products of second language (L2) learning. The…
Sources of Difficulty in the Processing of Written Language. Report Series 4.3.
ERIC Educational Resources Information Center
Chafe, Wallace
Ease of language processing varies with the nature of the language involved. Ordinary spoken language is the easiest kind to produce and understand, while writing is a relatively new development. On thoughtful inspection, the readability of writing has shown itself to be a complex topic requiring insights from many academic disciplines and…
ERIC Educational Resources Information Center
Kamio, Yoko; Robins, Diana; Kelley, Elizabeth; Swainson, Brook; Fein, Deborah
2007-01-01
Although autism is associated with impaired language functions, the nature of semantic processing in high-functioning pervasive developmental disorders (HFPDD) without a history of early language delay has been debated. In this study, we aimed to examine whether the automatic lexical/semantic aspect of language is impaired or intact in these…
Musical Experience Influences Statistical Learning of a Novel Language
Shook, Anthony; Marian, Viorica; Bartolotti, James; Schroeder, Scott R.
2014-01-01
Musical experience may benefit learning a new language by enhancing the fidelity with which the auditory system encodes sound. In the current study, participants with varying degrees of musical experience were exposed to two statistically-defined languages consisting of auditory Morse-code sequences which varied in difficulty. We found an advantage for highly-skilled musicians, relative to less-skilled musicians, in learning novel Morse-code based words. Furthermore, in the more difficult learning condition, performance of lower-skilled musicians was mediated by their general cognitive abilities. We suggest that musical experience may lead to enhanced processing of statistical information and that musicians’ enhanced ability to learn statistical probabilities in a novel Morse-code language may extend to natural language learning. PMID:23505962
Artificial intelligence, expert systems, computer vision, and natural language processing
NASA Technical Reports Server (NTRS)
Gevarter, W. B.
1984-01-01
An overview of artificial intelligence (AI), its core ingredients, and its applications is presented. The knowledge representation, logic, problem solving approaches, languages, and computers pertaining to AI are examined, and the state of the art in AI is reviewed. The use of AI in expert systems, computer vision, natural language processing, speech recognition and understanding, speech synthesis, problem solving, and planning is examined. Basic AI topics, including automation, search-oriented problem solving, knowledge representation, and computational logic, are discussed.
Efficient Embedded Decoding of Neural Network Language Models in a Machine Translation System.
Zamora-Martinez, Francisco; Castro-Bleda, Maria Jose
2018-02-22
Neural Network Language Models (NNLMs) are a successful approach to Natural Language Processing tasks, such as Machine Translation. We introduce in this work a Statistical Machine Translation (SMT) system which fully integrates NNLMs in the decoding stage, breaking with the traditional approach based on N-best list rescoring. The neural net models (both language models (LMs) and translation models) are fully coupled in the decoding stage, allowing them to influence translation quality more strongly. Computational issues were solved by using a novel idea based on memorization and smoothing of the softmax constants to avoid their computation, which introduces a trade-off between LM quality and computational cost. These ideas were studied in a machine translation task with different combinations of neural networks used both as translation models and as target LMs, comparing phrase-based and n-gram-based systems, showing that the integrated approach seems more promising for n-gram-based systems, even with nonfull-quality NNLMs.
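One possible reading of "memorization and smoothing of the softmax constants" is sketched below. It is an assumption, not the authors' implementation: the log normalization constant is stored per context when it has been precomputed, and a smoothed value (here simply the mean of the stored constants) stands in for contexts that were never memorized. The class name, the toy weights, and the choice of the mean as the smoothing rule are all hypothetical.

```python
import math


class CachedSoftmaxLM:
    """Toy output layer: score(w) = dot(weights[w], hidden) + biases[w]."""

    def __init__(self, weights, biases):
        self.weights = weights
        self.biases = biases
        self.log_z_cache = {}  # context -> memorized log normalization constant

    def _score(self, word_id, hidden):
        return sum(w * h for w, h in zip(self.weights[word_id], hidden)) + self.biases[word_id]

    def _log_z(self, hidden):
        """Exact log normalization constant; costs one pass over the vocabulary."""
        scores = [self._score(w, hidden) for w in range(len(self.weights))]
        top = max(scores)
        return top + math.log(sum(math.exp(s - top) for s in scores))

    def memorize(self, context, hidden):
        """Precompute and store the constant for one context."""
        self.log_z_cache[context] = self._log_z(hidden)

    def log_prob(self, word_id, context, hidden):
        log_z = self.log_z_cache.get(context)
        if log_z is None:
            if self.log_z_cache:
                # Assumed smoothing rule: reuse the mean of the memorized constants.
                log_z = sum(self.log_z_cache.values()) / len(self.log_z_cache)
            else:
                log_z = self._log_z(hidden)
        return self._score(word_id, hidden) - log_z


lm = CachedSoftmaxLM(
    weights=[[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]],
    biases=[0.0, 0.1, -0.1],
)
hidden = [1.0, 2.0]
lm.memorize(("the", "patient"), hidden)
total = sum(math.exp(lm.log_prob(w, ("the", "patient"), hidden)) for w in range(3))
```

For a memorized context the scores normalize exactly and only one word score is computed per query. For other contexts the result is an approximation that need not sum to one, which illustrates the quality versus cost trade-off the abstract mentions.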
Revealing the Naturalization of Language and Literacy: The Common Sense of Text Complexity
ERIC Educational Resources Information Center
Newhouse, Erica H.
2017-01-01
This article illustrates the process and obstacles encountered when applying the Common Core's three-part model of determining text complexity to an urban literature text. This analysis revealed how the model privileges language and literacy practices that limit the range of texts used in classrooms through a process of naturalization and by…
Research at Yale in Natural Language Processing. Research Report #84.
ERIC Educational Resources Information Center
Schank, Roger C.
This report summarizes the capabilities of five computer programs at Yale that do automatic natural language processing as of the end of 1976. For each program an introduction to its overall intent is given, followed by the input/output, a short discussion of the research underlying the program, and a prognosis for future development. The programs…
Recurrent Artificial Neural Networks and Finite State Natural Language Processing.
ERIC Educational Resources Information Center
Moisl, Hermann
It is argued that pessimistic assessments of the adequacy of artificial neural networks (ANNs) for natural language processing (NLP) on the grounds that they have a finite state architecture are unjustified, and that their adequacy in this regard is an empirical issue. First, arguments that counter standard objections to finite state NLP on the…
The Application of Natural Language Processing to Augmentative and Alternative Communication
ERIC Educational Resources Information Center
Higginbotham, D. Jeffery; Lesher, Gregory W.; Moulton, Bryan J.; Roark, Brian
2012-01-01
Significant progress has been made in the application of natural language processing (NLP) to augmentative and alternative communication (AAC), particularly in the areas of interface design and word prediction. This article will survey the current state-of-the-science of NLP in AAC and discuss its future applications for the development of next…
Construct Validity in TOEFL iBT Speaking Tasks: Insights from Natural Language Processing
ERIC Educational Resources Information Center
Kyle, Kristopher; Crossley, Scott A.; McNamara, Danielle S.
2016-01-01
This study explores the construct validity of speaking tasks included in the TOEFL iBT (e.g., integrated and independent speaking tasks). Specifically, advanced natural language processing (NLP) tools, MANOVA difference statistics, and discriminant function analyses (DFA) are used to assess the degree to which and in what ways responses to these…
Advanced Natural Language Processing and Temporal Mining for Clinical Discovery
ERIC Educational Resources Information Center
Mehrabi, Saeed
2016-01-01
There has been a vast and growing amount of healthcare data, especially with the rapid adoption of electronic health records (EHRs) as a result of the HITECH Act of 2009. It is estimated that around 80% of the clinical information resides in the unstructured narrative of an EHR. Recently, natural language processing (NLP) techniques have offered…
You Are Your Words: Modeling Students' Vocabulary Knowledge with Natural Language Processing Tools
ERIC Educational Resources Information Center
Allen, Laura K.; McNamara, Danielle S.
2015-01-01
The current study investigates the degree to which the lexical properties of students' essays can inform stealth assessments of their vocabulary knowledge. In particular, we used indices calculated with the natural language processing tool, TAALES, to predict students' performance on a measure of vocabulary knowledge. To this end, two corpora were…
Divide and Recombine for Large Complex Data
2017-12-01
Empirical Methods in Natural Language Processing, October 2014...low-latency data processing systems. Declarative Languages for Interactive Visualization: The Reactive Vega Stack Another thread of XDATA research...for array processing operations embedded in the R programming language. Vector virtual machines work well for long vectors. One of the most
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sharp, J.K.
1997-11-01
This seminar describes a process and methodology that uses structured natural language to enable the construction of precise information requirements directly from users, experts, and managers. The main focus of this natural language approach is to create the precise information requirements and to do it in such a way that the business and technical experts are fully accountable for the results. These requirements can then be implemented using appropriate tools and technology. This requirement set is also a universal learning tool because it has all of the knowledge that is needed to understand a particular process (e.g., expense vouchers, project management, budget reviews, tax, laws, machine function).
Diminished Auditory Responses during NREM Sleep Correlate with the Hierarchy of Language Processing
Wilf, Meytal; Ramot, Michal; Furman-Haran, Edna; Arzi, Anat; Levkovitz, Yechiel; Malach, Rafael
2016-01-01
Natural sleep provides a powerful model system for studying the neuronal correlates of awareness and state changes in the human brain. To quantitatively map the nature of sleep-induced modulations in sensory responses we presented participants with auditory stimuli possessing different levels of linguistic complexity. Ten participants were scanned using functional magnetic resonance imaging (fMRI) during the waking state and after falling asleep. Sleep staging was based on heart rate measures validated independently on 20 participants using concurrent EEG and heart rate measurements and the results were confirmed using permutation analysis. Participants were exposed to three types of auditory stimuli: scrambled sounds, meaningless word sentences and comprehensible sentences. During non-rapid eye movement (NREM) sleep, we found diminishing brain activation along the hierarchy of language processing, more pronounced in higher processing regions. Specifically, the auditory thalamus showed similar activation levels during sleep and waking states, primary auditory cortex remained activated but showed a significant reduction in auditory responses during sleep, and the high order language-related representation in inferior frontal gyrus (IFG) cortex showed a complete abolishment of responses during NREM sleep. In addition to an overall activation decrease in language processing regions in superior temporal gyrus and IFG, those areas manifested a loss of semantic selectivity during NREM sleep. Our results suggest that the decreased awareness to linguistic auditory stimuli during NREM sleep is linked to diminished activity in high order processing stations. PMID:27310812
A Large-Scale Analysis of Variance in Written Language
ERIC Educational Resources Information Center
Johns, Brendan T.; Jamieson, Randall K.
2018-01-01
The collection of very large text sources has revolutionized the study of natural language, leading to the development of several models of language learning and distributional semantics that extract sophisticated semantic representations of words based on the statistical redundancies contained within natural language (e.g., Griffiths, Steyvers,…
Natural Language Processing Technologies in Radiology Research and Clinical Applications.
Cai, Tianrun; Giannopoulos, Andreas A; Yu, Sheng; Kelil, Tatiana; Ripley, Beth; Kumamaru, Kanako K; Rybicki, Frank J; Mitsouras, Dimitrios
2016-01-01
The migration of imaging reports to electronic medical record systems holds great potential in terms of advancing radiology research and practice by leveraging the large volume of data continuously being updated, integrated, and shared. However, there are significant challenges as well, largely due to the heterogeneity of how these data are formatted. Indeed, although there is movement toward structured reporting in radiology (ie, hierarchically itemized reporting with use of standardized terminology), the majority of radiology reports remain unstructured and use free-form language. To effectively “mine” these large datasets for hypothesis testing, a robust strategy for extracting the necessary information is needed. Manual extraction of information is a time-consuming and often unmanageable task. “Intelligent” search engines that instead rely on natural language processing (NLP), a computer-based approach to analyzing free-form text or speech, can be used to automate this data mining task. The overall goal of NLP is to translate natural human language into a structured format (ie, a fixed collection of elements), each with a standardized set of choices for its value, that is easily manipulated by computer programs to (among other things) order into subcategories or query for the presence or absence of a finding. The authors review the fundamentals of NLP and describe various techniques that constitute NLP in radiology, along with some key applications. ©RSNA, 2016 PMID:26761536
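The goal stated above, turning free-form report text into a fixed set of elements that can be queried for the presence or absence of a finding, can be illustrated with a deliberately simple sketch. It is a toy keyword-and-negation rule, not one of the techniques the authors review; the cue list, the function name, and the sample report are hypothetical.

```python
import re

# Hypothetical and far from complete list of negation cues.
NEGATION = re.compile(r"\b(no|without|negative for)\b")


def finding_status(report: str, finding: str) -> str:
    """Return 'present', 'absent', or 'not mentioned' for one finding term."""
    status = "not mentioned"
    for sentence in re.split(r"(?<=[.!?])\s+", report.lower()):
        position = sentence.find(finding.lower())
        if position == -1:
            continue
        # Assumption: a cue anywhere before the term in the same sentence negates it.
        if NEGATION.search(sentence[:position]):
            if status == "not mentioned":
                status = "absent"
        else:
            status = "present"
    return status


report = "There is no pleural effusion. A 4 mm nodule is seen in the right upper lobe."
structured = {
    term: finding_status(report, term)
    for term in ("pleural effusion", "nodule", "pneumothorax")
}
```

Each key in `structured` is one element of the fixed collection, and its value comes from a small standardized set of choices, which is what makes the result easy to filter or count.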
Open Source Clinical NLP – More than Any Single System
Masanz, James; Pakhomov, Serguei V.; Xu, Hua; Wu, Stephen T.; Chute, Christopher G.; Liu, Hongfang
2014-01-01
The number of Natural Language Processing (NLP) tools and systems for processing clinical free-text has grown as interest and processing capability have surged. Unfortunately any two systems typically cannot simply interoperate, even when both are built upon a framework designed to facilitate the creation of pluggable components. We present two ongoing activities promoting open source clinical NLP. The Open Health Natural Language Processing (OHNLP) Consortium was originally founded to foster a collaborative community around clinical NLP, releasing UIMA-based open source software. OHNLP’s mission currently includes maintaining a catalog of clinical NLP software and providing interfaces to simplify the interaction of NLP systems. Meanwhile, Apache cTAKES aims to integrate best-of-breed annotators, providing a world-class NLP system for accessing clinical information within free-text. These two activities are complementary. OHNLP promotes open source clinical NLP activities in the research community and Apache cTAKES bridges research to the health information technology (HIT) practice. PMID:25954581
Restrictions on biological adaptation in language evolution.
Chater, Nick; Reali, Florencia; Christiansen, Morten H
2009-01-27
Language acquisition and processing are governed by genetic constraints. A crucial unresolved question is how far these genetic constraints have coevolved with language, perhaps resulting in a highly specialized and species-specific language "module," and how much language acquisition and processing redeploy preexisting cognitive machinery. In the present work, we explored the circumstances under which genes encoding language-specific properties could have coevolved with language itself. We present a theoretical model, implemented in computer simulations, of key aspects of the interaction of genes and language. Our results show that genes for language could have coevolved only with highly stable aspects of the linguistic environment; a rapidly changing linguistic environment does not provide a stable target for natural selection. Thus, a biological endowment could not coevolve with properties of language that began as learned cultural conventions, because cultural conventions change much more rapidly than genes. We argue that this rules out the possibility that arbitrary properties of language, including abstract syntactic principles governing phrase structure, case marking, and agreement, have been built into a "language module" by natural selection. The genetic basis of human language acquisition and processing did not coevolve with language, but primarily predates the emergence of language. As suggested by Darwin, the fit between language and its underlying mechanisms arose because language has evolved to fit the human brain, rather than the reverse.
Jiao, Dazhi; Wild, David J
2009-02-01
This paper proposes a system that automatically extracts CYP protein and chemical interactions from journal article abstracts, using natural language processing (NLP) and text mining methods. In our system, we employ a maximum entropy based learning method, using results from syntactic, semantic, and lexical analysis of texts. We first present our system architecture and then discuss the data set for training our machine learning based models and the methods in building components in our system, such as part of speech (POS) tagging, Named Entity Recognition (NER), dependency parsing, and relation extraction. An evaluation of the system is conducted at the end, yielding very promising results: The POS, dependency parsing, and NER components in our system have achieved a very high level of accuracy as measured by precision, ranging from 85.9% to 98.5%, and the precision and the recall of the interaction extraction component are 76.0% and 82.6%, and for the overall system are 68.4% and 72.2%, respectively.
Crowley, Rebecca S; Castine, Melissa; Mitchell, Kevin; Chavan, Girish; McSherry, Tara; Feldman, Michael
2010-01-01
The authors report on the development of the Cancer Tissue Information Extraction System (caTIES)--an application that supports collaborative tissue banking and text mining by leveraging existing natural language processing methods and algorithms, grid communication and security frameworks, and query visualization methods. The system fills an important need for text-derived clinical data in translational research such as tissue-banking and clinical trials. The design of caTIES addresses three critical issues for informatics support of translational research: (1) federation of research data sources derived from clinical systems; (2) expressive graphical interfaces for concept-based text mining; and (3) regulatory and security model for supporting multi-center collaborative research. Implementation of the system at several Cancer Centers across the country is creating a potential network of caTIES repositories that could provide millions of de-identified clinical reports to users. The system provides an end-to-end application of medical natural language processing to support multi-institutional translational research programs.
Visual sign phonology: insights into human reading and language from a natural soundless phonology.
Petitto, L A; Langdon, C; Stone, A; Andriola, D; Kartheiser, G; Cochran, C
2016-11-01
Among the most prevailing assumptions in science and society about the human reading process is that sound and sound-based phonology are critical to young readers. The child's sound-to-letter decoding is viewed as universal and vital to deriving meaning from print. We offer a different view. The crucial link for early reading success is not between segmental sounds and print. Instead the human brain's capacity to segment, categorize, and discern linguistic patterning makes possible the capacity to segment all languages. This biological process includes the segmentation of languages on the hands in signed languages. Exposure to natural sign language in early life equally affords the child's discovery of silent segmental units in visual sign phonology (VSP) that can also facilitate segmental decoding of print. We consider powerful biological evidence about the brain, how it builds sound and sign phonology, and why sound and sign phonology are equally important in language learning and reading. We offer a testable theoretical account, reading model, and predictions about how VSP can facilitate segmentation and mapping between print and meaning. We explain how VSP can be a powerful facilitator of all children's reading success (deaf and hearing), an account with profound transformative impact on learning to read in deaf children with different language backgrounds. The existence of VSP has important implications for understanding core properties of all human language and reading, challenges assumptions about language and reading as being tied to sound, and provides novel insight into a remarkable biological equivalence in signed and spoken languages. WIREs Cogn Sci 2016, 7:366-381. doi: 10.1002/wcs.1404 © 2016 Wiley Periodicals, Inc.
NASA Technical Reports Server (NTRS)
Dominick, Wayne D. (Editor); Triantafyllopoulos, Spiros
1985-01-01
A collection of presentation visuals associated with the companion report entitled KARL: A Knowledge-Assisted Retrieval Language is presented. Information is given on data retrieval, natural language database front ends, generic design objectives, processing capabilities and the query processing cycle.
Assessment of Language and Literacy: A Process of Hypothesis Testing for Individual Differences
ERIC Educational Resources Information Center
Scott, Cheryl M.
2011-01-01
Purpose: Older school-aged children and adolescents with persistent language and literacy impairments vary in their individual profiles of linguistic strengths and weaknesses. Given the multidimensional nature and complexity of language, designing an assessment protocol capable of uncovering linguistic variation is challenging. A process of…
Towards Automatic Treatment of Natural Language.
ERIC Educational Resources Information Center
Lonsdale, Deryle
1984-01-01
Because automated natural language processing relies heavily on the still developing fields of linguistics, knowledge representation, and computational linguistics, no system is capable of mimicking human linguistic capabilities. For the present, interactive systems may be used to augment today's technology. (MSE)
Natural-Annotation-based Unsupervised Construction of Korean-Chinese Domain Dictionary
NASA Astrophysics Data System (ADS)
Liu, Wuying; Wang, Lin
2018-03-01
Large-scale bilingual parallel resources are important for statistical learning and deep learning in natural language processing. This paper addresses the automatic construction of a Korean-Chinese domain dictionary and presents a novel unsupervised construction method based on the natural annotations in a raw corpus. We first extract all Korean-Chinese word pairs from Korean texts according to natural annotations, then transform the traditional Chinese characters into simplified ones, and finally distill a bilingual domain dictionary by looking up the simplified Chinese words in an additional Chinese domain dictionary. The experimental results show that our method can automatically and efficiently build multiple Korean-Chinese domain dictionaries.
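The three steps in this abstract (extract pairs, convert to simplified characters, filter against a Chinese domain dictionary) can be sketched as follows. The form of a "natural annotation" is an assumption here: a Hangul word immediately followed by Chinese characters in parentheses. The regular expression, the tiny conversion table, and the sample sentence are hypothetical stand-ins for real resources.

```python
import re

# Assumed annotation pattern: Hangul word followed by Chinese characters in parentheses.
ANNOTATION = re.compile(r"([\uac00-\ud7a3]+)\(([\u4e00-\u9fff]+)\)")

# Toy traditional-to-simplified table; a real system would need a full mapping.
TRAD_TO_SIMP = {"經": "经", "濟": "济", "銀": "银"}


def to_simplified(word: str) -> str:
    """Convert character by character, leaving unmapped characters unchanged."""
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in word)


def build_dictionary(korean_text: str, chinese_domain_words: set) -> dict:
    """Keep only pairs whose simplified Chinese side is in the domain dictionary."""
    entries = {}
    for korean, chinese in ANNOTATION.findall(korean_text):
        simplified = to_simplified(chinese)
        if simplified in chinese_domain_words:
            entries[korean] = simplified
    return entries


text = "한국 경제(經濟)는 은행(銀行)과 문화(文化)에 의존한다."
domain = {"经济", "银行"}  # hypothetical finance-domain word list
result = build_dictionary(text, domain)
```

The pair for 문화(文化) is extracted but dropped at the last step because its Chinese side is not in the finance word list, which is how the filter yields a domain-specific dictionary.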
ERIC Educational Resources Information Center
Patterson, Olga
2012-01-01
Domain adaptation of natural language processing systems is challenging because it requires human expertise. While manual effort is effective in creating a high quality knowledge base, it is expensive and time consuming. Clinical text adds another layer of complexity to the task due to privacy and confidentiality restrictions that hinder the…
Studies of Human Memory and Language Processing.
ERIC Educational Resources Information Center
Collins, Allan M.
The purposes of this study were to determine the nature of human semantic memory and to obtain knowledge usable in the future development of computer systems that can converse with people. The work was based on a computer model which is designed to comprehend English text, relating the text to information stored in a semantic data base that is…
Neurolinguistics and psycholinguistics as a basis for computer acquisition of natural language
DOE Office of Scientific and Technical Information (OSTI.GOV)
Powers, D.M.W.
1983-04-01
Research into natural language understanding systems for computers has concentrated on implementing particular grammars and grammatical models of the language concerned. This paper presents a rationale for research into natural language understanding systems based on neurological and psychological principles. Important features of the approach are that it seeks to place the onus of learning the language on the computer, and that it seeks to make use of the vast wealth of relevant psycholinguistic and neurolinguistic theory. 22 references.
Karakülah, Gökhan; Dicle, Oğuz; Koşaner, Ozgün; Suner, Aslı; Birant, Çağdaş Can; Berber, Tolga; Canbek, Sezin
2014-01-01
The lack of laboratory tests for the diagnosis of most congenital anomalies makes physical examination crucial for diagnosis, and cases in the diagnostic phase are mostly evaluated in the light of the literature. For accurate diagnosis, it is therefore of great importance to support the decision maker by presenting the literature knowledge relevant to a particular case. Here, we demonstrated a methodology for automatically scanning case reports on congenital anomalies in the literature and identifying their phenotypic features with text and natural language processing methods, and we created a framework of an information source for a potential diagnostic decision support system for congenital anomalies.
The application of natural language processing to augmentative and alternative communication.
Higginbotham, D Jeffery; Lesher, Gregory W; Moulton, Bryan J; Roark, Brian
2011-01-01
Significant progress has been made in the application of natural language processing (NLP) to augmentative and alternative communication (AAC), particularly in the areas of interface design and word prediction. This article will survey the current state-of-the-science of NLP in AAC and discuss its future applications for the development of the next generation of AAC technology.
ERIC Educational Resources Information Center
Duran, Nicholas D.; Hall, Charles; McCarthy, Philip M.; McNamara, Danielle S.
2010-01-01
The words people use and the way they use them can reveal a great deal about their mental states when they attempt to deceive. The challenge for researchers is how to reliably distinguish the linguistic features that characterize these hidden states. In this study, we use a natural language processing tool called Coh-Metrix to evaluate deceptive…
ERIC Educational Resources Information Center
Sager, Naomi
This investigation matches the emerging techniques in computerized natural language processing against emerging needs for such techniques in the information field to evaluate and extend such techniques for future applications and to establish a basis and direction for further research toward these goals. An overview describes developments in the…
Exploring Social Meaning in Online Bilingual Text through Social Network Analysis
2015-09-01
p. 1). GATE development began in 1995. As techniques for natural language processing (NLP) are investigated by the research community and...become part of the NLP repertoire, developers incorporate them with wrappers, which allow the output from GATE processes to be recognized as input by...University NEE Named Entity Extraction NLP natural language processing OSD Office of the Secretary of Defense POS parts of speech SBIR Small Business
The language of gene ontology: a Zipf's law analysis.
Kalankesh, Leila Ranandeh; Stevens, Robert; Brass, Andy
2012-06-07
Most major genome projects and sequence databases provide a GO annotation of their data, either automatically or through human annotators, creating a large corpus of data written in the language of GO. Texts written in natural language show a statistical power law behaviour, Zipf's law, the exponent of which can provide useful information on the nature of the language being used. We have therefore explored the hypothesis that collections of GO annotations will show similar statistical behaviours to natural language. Annotations from the Gene Ontology Annotation project were found to follow Zipf's law. Surprisingly, the measured power law exponents were consistently different between annotation captured using the three GO sub-ontologies in the corpora (function, process and component). On filtering the corpora using GO evidence codes we found that the value of the measured power law exponent responded in a predictable way as a function of the evidence codes used to support the annotation. Techniques from computational linguistics can provide new insights into the annotation process. GO annotations show similar statistical behaviours to those seen in natural language with measured exponents that provide a signal which correlates with the nature of the evidence codes used to support the annotations, suggesting that the measured exponent might provide a signal regarding the information content of the annotation.
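A power law exponent of the kind discussed above can be estimated in several ways. The sketch below uses the simplest, an ordinary least-squares fit of log frequency against log rank. The abstract does not say which estimator the authors used, so this choice is an assumption, and the toy corpus of `term1` to `term6` is constructed to follow Zipf's law exactly.

```python
import math
from collections import Counter


def zipf_exponent(tokens):
    """Negated least-squares slope of log(frequency) against log(rank).

    Requires at least two distinct tokens.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx


# Toy corpus: the term of rank r occurs 120 / r times (120, 60, 40, 30, 24, 20).
tokens = []
for rank in range(1, 7):
    tokens.extend([f"term{rank}"] * (120 // rank))
exponent = zipf_exponent(tokens)
```

On real annotation corpora the fit is noisier, and log-log regression is known to be sensitive to the low-frequency tail, so the value should be read as a rough descriptive statistic.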
A Codasyl-Type Schema for Natural Language Medical Records
Sager, N.; Tick, L.; Story, G.; Hirschman, L.
1980-01-01
This paper describes a CODASYL (network) database schema for information derived from narrative clinical reports. The goal of this work is to create an automated process that accepts natural language documents as input and maps this information into a database of a type managed by existing database management systems. The schema described here represents the medical events and facts identified through the natural language processing. This processing decomposes each narrative into a set of elementary assertions, represented as MEDFACT records in the database. Each assertion in turn consists of a subject and a predicate classed according to a limited number of medical event types, e.g., signs/symptoms, laboratory tests, etc. The subject and predicate are represented by EVENT records which are owned by the MEDFACT record associated with the assertion. The CODASYL-type network structure was found to be suitable for expressing most of the relations needed to represent the natural language information. However, special mechanisms were developed for storing the time relations between EVENT records and for recording connections (such as causality) between certain MEDFACT records. This schema has been implemented using the UNIVAC DMS-1100 DBMS.
Topaz, Maxim; Radhakrishnan, Kavita; Blackley, Suzanne; Lei, Victor; Lai, Kenneth; Zhou, Li
2016-09-14
This study developed an innovative natural language processing algorithm to automatically identify heart failure (HF) patients with ineffective self-management status (in the domains of diet, physical activity, medication adherence, and adherence to clinician appointments) from narrative discharge summary notes. We also analyzed the association between self-management status and preventable 30-day hospital readmissions. Our natural language system achieved relatively high accuracy (F-measure = 86.3%; precision = 95%; recall = 79.2%) on a testing sample of 300 notes annotated by two human reviewers. In a sample of 8,901 HF patients admitted to our healthcare system, 14.4% (n = 1,282) had documentation of ineffective HF self-management. Adjusted regression analyses indicated that presence of any skill-related self-management deficit (odds ratio [OR] = 1.3, 95% confidence interval [CI] = [1.1, 1.6]) and non-specific ineffective self-management (OR = 1.5, 95% CI = [1.2, 2]) was significantly associated with readmissions. We have demonstrated the feasibility of identifying ineffective HF self-management from electronic discharge summaries with natural language processing. © The Author(s) 2016.
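The relationship between the reported precision, recall and F-measure can be checked with a short sketch. It assumes the balanced F-measure (harmonic mean of precision and recall, beta = 1), which the abstract does not state explicitly; the function name is hypothetical.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (beta = 1 gives the balanced F1)."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Rounded values quoted in the abstract: precision = 95%, recall = 79.2%.
f1 = f_measure(0.95, 0.792)
print(f"{f1:.3f}")
```

With these rounded inputs the result is about 0.864, which is in line with the reported 86.3% once rounding of the published precision and recall is taken into account.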
Proceedings of the international conference on cybernetics and societ
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
1985-01-01
This book presents the papers given at a conference on artificial intelligence, expert systems and knowledge bases. Topics considered at the conference included automating expert system development, modeling expert systems, causal maps, data covariances, robot vision, image processing, multiprocessors, parallel processing, VLSI structures, man-machine systems, human factors engineering, cognitive decision analysis, natural language, computerized control systems, and cybernetics.
Honda, Masayuki; Matsumoto, Takehiro
2017-01-01
Several kinds of event log data produced in daily clinical activities have yet to be used for secure and efficient improvement of hospital activities. Data Warehouse systems in Hospital Information Systems used for the analysis of structured data such as disease, lab-tests, and medications, have also shown efficient outcomes. This article is focused on two kinds of essential functions: process mining using log data and non-structured data analysis via Natural Language Processing.
Assessing Group Interaction with Social Language Network Analysis
NASA Astrophysics Data System (ADS)
Scholand, Andrew J.; Tausczik, Yla R.; Pennebaker, James W.
In this paper we discuss a new methodology, social language network analysis (SLNA), that combines tools from social language processing and network analysis to assess socially situated working relationships within a group. Specifically, SLNA aims to identify and characterize the nature of working relationships by processing artifacts generated with computer-mediated communication systems, such as instant message texts or emails. Because social language processing is able to identify psychological, social, and emotional processes that individuals are not able to fully mask, social language network analysis can clarify and highlight complex interdependencies between group members, even when these relationships are latent or unrecognized.
Understanding a technical language: A schema-based approach
NASA Technical Reports Server (NTRS)
Falzon, P.
1984-01-01
Workers in many job categories tend to develop technical languages, which are restricted subsets of natural language. A better knowledge of these restrictions provides guidelines for the design of the restricted languages of interactive systems. Accordingly, a technical language used by air-traffic controllers in their communications with pilots was studied. A method of analysis is presented that allows the schemata underlying each category of messages to be identified. This schematic knowledge was implemented in programs, which assume that the goal-oriented aspect of technical languages (and particularly the restricted domain of discourse) limits the processes and the data necessary in order to understand the messages (monosemy, limited vocabulary, evocation of the schemata by some command words, absence of syntax). The programs can interpret, and translate into sequences of action, the messages emitted by the controllers.
ERIC Educational Resources Information Center
Amaral, Luiz; Meurers, Detmar; Ziai, Ramon
2011-01-01
Intelligent language tutoring systems (ILTS) typically analyze learner input to diagnose learner language properties and provide individualized feedback. Despite a long history of ILTS research, such systems are virtually absent from real-life foreign language teaching (FLT). Taking a step toward more closely linking ILTS research to real-life…
Interactive natural language acquisition in a multi-modal recurrent neural architecture
NASA Astrophysics Data System (ADS)
Heinrich, Stefan; Wermter, Stefan
2018-01-01
For the complex human brain that enables us to communicate in natural language, we gathered good understandings of principles underlying language acquisition and processing, knowledge about sociocultural conditions, and insights into activity patterns in the brain. However, we were not yet able to understand the behavioural and mechanistic characteristics for natural language and how mechanisms in the brain allow us to acquire and process language. In bridging the insights from behavioural psychology and neuroscience, the goal of this paper is to contribute a computational understanding of appropriate characteristics that favour language acquisition. Accordingly, we provide concepts and refinements in cognitive modelling regarding principles and mechanisms in the brain and propose a neurocognitively plausible model for embodied language acquisition from real-world interaction of a humanoid robot with its environment. In particular, the architecture consists of a continuous time recurrent neural network, where parts have different leakage characteristics and thus operate on multiple timescales for every modality and the association of the higher level nodes of all modalities into cell assemblies. The model is capable of learning language production grounded in both temporal dynamic somatosensation and vision, and features hierarchical concept abstraction, concept decomposition, multi-modal integration, and self-organisation of latent representations.
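The idea of units with different leakage characteristics operating on different timescales can be sketched with a generic continuous-time recurrent network update. This is a minimal illustration assuming the textbook leaky-integrator form tau * dy/dt = -y + W tanh(y) + I, integrated with a forward Euler step; the paper's actual equations, layer layout and parameters are not given in the abstract, so every name and value below is hypothetical.

```python
import math


def ctrnn_step(state, weights, inputs, taus, dt=1.0):
    """One forward-Euler step of a leaky-integrator recurrent network.

    Each unit i has its own time constant taus[i]; a larger value means
    slower leakage and therefore a slower timescale.
    """
    activations = [math.tanh(y) for y in state]
    new_state = []
    for i, y in enumerate(state):
        drive = sum(w * a for w, a in zip(weights[i], activations)) + inputs[i]
        new_state.append(y + (dt / taus[i]) * (-y + drive))
    return new_state


# Two uncoupled units receive the same constant input but differ in timescale.
weights = [[0.0, 0.0], [0.0, 0.0]]
taus = [2.0, 20.0]
state = [0.0, 0.0]
for _ in range(10):
    state = ctrnn_step(state, weights, [1.0, 1.0], taus)
fast, slow = state
print(fast > slow)  # the small-tau unit has moved much closer to the input value
```

After ten steps the fast unit has almost converged to the input while the slow unit is still integrating, which is the property that lets slow units summarise longer stretches of a sequence.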
1993-04-01
the use of thus seems more natural. It eliminates the parameter a symbolic manipulation program. Their robustness is 790 questionable. variance and...and learning (UMd/GMU), IU and reasoning (ISI/USC), IU and natural language (SUNY Buffalo), and IU and neural nets (new BAA; contracts to be awarded...visual navigation is defined as different natures. Among these are theoretical questions, the process of motion control based on an analysis of im
ERIC Educational Resources Information Center
Burk, Robin K.
2010-01-01
Computational natural language understanding and generation have been a goal of artificial intelligence since McCarthy, Minsky, Rochester and Shannon first proposed to spend the summer of 1956 studying this and related problems. Although statistical approaches dominate current natural language applications, two current research trends bring…
Sound Evidence: The Missing Piece of the Jigsaw in Formulaic Language Research
ERIC Educational Resources Information Center
Lin, Phoebe M. S.
2012-01-01
With the ever increasing number of studies on formulaic language, we are beginning to learn more about the processing of formulaic language (e.g. Ellis et al. 2008; Siyanova et al. 2011), its use in speech (e.g. Aijmer 1996; Wood 2012) and writing (e.g. Hyland 2008a, 2008b) and its application in natural language processing (e.g. Tschichold 2000).…
Introduction to the special issue: parsimony and redundancy in models of language.
Wiechmann, Daniel; Kerz, Elma; Snider, Neal; Jaeger, T Florian
2013-09-01
One of the most fundamental goals in linguistic theory is to understand the nature of linguistic knowledge, that is, the representations and mechanisms that figure in a cognitively plausible model of human language-processing. The past 50 years have witnessed the development and refinement of various theories about what kind of 'stuff' human knowledge of language consists of, and technological advances now permit the development of increasingly sophisticated computational models implementing key assumptions of different theories from both rationalist and empiricist perspectives. The present special issue does not aim to present or discuss the arguments for and against the two epistemological stances or discuss evidence that supports either of them (cf. Bod, Hay, & Jannedy, 2003; Christiansen & Chater, 2008; Hauser, Chomsky, & Fitch, 2002; Oaksford & Chater, 2007; O'Donnell, Hauser, & Fitch, 2005). Rather, the research presented in this issue, which we label usage-based here, conceives of linguistic knowledge as being induced from experience. According to the strongest of such accounts, the acquisition and processing of language can be explained with reference to general cognitive mechanisms alone (rather than with reference to innate language-specific mechanisms). Defined in these terms, usage-based approaches encompass approaches referred to as experience-based, performance-based and/or emergentist approaches (Arnon & Snider, 2010; Bannard, Lieven, & Tomasello, 2009; Bannard & Matthews, 2008; Chater & Manning, 2006; Clark & Lappin, 2010; Gerken, Wilson, & Lewis, 2005; Gomez, 2002;
Natural language processing, pragmatics, and verbal behavior
Cherpas, Chris
1992-01-01
Natural Language Processing (NLP) is that part of Artificial Intelligence (AI) concerned with endowing computers with verbal and listener repertoires, so that people can interact with them more easily. Most attention has been given to accurately parsing and generating syntactic structures, although NLP researchers are finding ways of handling the semantic content of language as well. It is increasingly apparent that understanding the pragmatic (contextual and consequential) dimension of natural language is critical for producing effective NLP systems. While there are some techniques for applying pragmatics in computer systems, they are piecemeal, crude, and lack an integrated theoretical foundation. Unfortunately, there is little awareness that Skinner's (1957) Verbal Behavior provides an extensive, principled pragmatic analysis of language. The implications of Skinner's functional analysis for NLP and for verbal aspects of epistemology lead to a proposal for a “user expert”, a computer system whose area of expertise is the long-term computer user. The evolutionary nature of behavior suggests an AI technology known as genetic algorithms/programming for implementing such a system. PMID:22477052
ChemicalTagger: A tool for semantic text-mining in chemistry.
Hawizy, Lezan; Jessop, David M; Adams, Nico; Murray-Rust, Peter
2011-05-16
The primary method for scientific communication is in the form of published scientific articles and theses which use natural language combined with domain-specific terminology. As such, they contain free-flowing unstructured text. Given the usefulness of data extraction from unstructured literature, we aim to show how this can be achieved for the discipline of chemistry. The highly formulaic style of writing most chemists adopt makes their contributions well suited to high-throughput Natural Language Processing (NLP) approaches. We have developed the ChemicalTagger parser as a medium-depth, phrase-based semantic NLP tool for the language of chemical experiments. Tagging is based on a modular architecture and uses a combination of OSCAR, domain-specific regex and English taggers to identify parts-of-speech. The ANTLR grammar is used to structure this into tree-based phrases. Using a metric that allows for overlapping annotations, we achieved machine-annotator agreements of 88.9% for phrase recognition and 91.9% for phrase-type identification (Action names). It is possible to parse chemical experimental text using rule-based techniques in conjunction with a formal grammar parser. ChemicalTagger has been deployed for over 10,000 patents and has identified solvents from their linguistic context with >99.5% precision.
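The regex-based tagging step described in this abstract can be illustrated with a toy sketch. The tag names, patterns and word lists below are invented for illustration and are not ChemicalTagger's actual rules, which also rely on OSCAR, English part-of-speech taggers and an ANTLR grammar.

```python
import re

# Hypothetical tag set and patterns (assumptions, not ChemicalTagger's rules).
RULES = [
    ("QUANTITY", re.compile(r"^\d+(?:\.\d+)?$")),
    ("UNIT", re.compile(r"^(?:mL|g|mg|mmol|min)$")),
    ("ACTION", re.compile(r"^(?:added|stirred|heated|filtered|dissolved)$", re.IGNORECASE)),
    ("SOLVENT", re.compile(r"^(?:ethanol|methanol|water|toluene)$", re.IGNORECASE)),
]


def tag_tokens(sentence):
    """Split a sentence into tokens and give each the first matching tag, else OTHER."""
    tokens = re.findall(r"\d+(?:\.\d+)?|[A-Za-z]+|[^\sA-Za-z\d]", sentence)
    tagged = []
    for token in tokens:
        tag = next((name for name, pattern in RULES if pattern.match(token)), "OTHER")
        tagged.append((token, tag))
    return tagged


for token, tag in tag_tokens("The solid was dissolved in 25 mL ethanol."):
    print(token, tag)
```

In a fuller system the tagged token stream would be handed to a grammar that groups tokens into phrases such as an action with its solvent and quantity; that stage is omitted here.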
Data-Driven Approaches for Paraphrasing across Language Variations
ERIC Educational Resources Information Center
Xu, Wei
2014-01-01
Our language changes very rapidly, accompanying political, social and cultural trends, as well as the evolution of science and technology. The Internet, especially the social media, has accelerated this process of change. This poses a severe challenge for both human beings and natural language processing (NLP) systems, which usually only model a…
Comeau, Donald C.; Liu, Haibin; Islamaj Doğan, Rezarta; Wilbur, W. John
2014-01-01
BioC is a new format and associated code libraries for sharing text and annotations. We have implemented BioC natural language preprocessing pipelines in two popular programming languages: C++ and Java. The current implementations interface with the well-known MedPost and Stanford natural language processing tool sets. The pipeline functionality includes sentence segmentation, tokenization, part-of-speech tagging, lemmatization and sentence parsing. These pipelines can be easily integrated along with other BioC programs into any BioC compliant text mining systems. As an application, we converted the NCBI disease corpus to BioC format, and the pipelines have successfully run on this corpus to demonstrate their functionality. Code and data can be downloaded from http://bioc.sourceforge.net. Database URL: http://bioc.sourceforge.net PMID:24935050
A natural command language for C/3/I applications
NASA Astrophysics Data System (ADS)
Mergler, J. P.
1980-03-01
The article discusses the development of a natural command language and a control and analysis console designed to simplify the task of the operator in the field of Command, Control, Communications, and Intelligence. The console is based on a DEC LSI-11 microcomputer, supported by 16-K words of memory and a serial interface component. Discussion covers the language, which utilizes English and a natural syntax, and how it is integrated with the hardware. It is concluded that results have demonstrated the effectiveness of this natural command language.
An Analysis of College Students' Attitudes towards Error Correction in EFL Context
ERIC Educational Resources Information Center
Zhu, Honglin
2010-01-01
This article is based on a survey of college students' attitudes towards error correction by their teachers in the process of teaching and learning, and it is intended to improve language teachers' understanding of the nature of error correction. Based on the analysis, the article expounds some principles and techniques that can be applied in the process…
Review of Knowledge Enhanced Electronic Logic (KEEL) Technology
2016-09-01
compiled. Two KEEL Engine processing models are available for most languages: The “Normal Model” processes information as if it was processed on an... language also makes it easy to “see” the functional relationships and the dynamic (interactive) nature of the language, allows one to interact with...for the Accelerated Processing Model (Patent number 7,512,581 (3/31/2009)). In June 2006, application US 11/446/801 was submitted to support
Natural Language Processing and Game-Based Practice in iSTART
ERIC Educational Resources Information Center
Jackson, Tanner; Boonthum-Denecke, Chutima; McNamara, Danielle
2015-01-01
Intelligent Tutoring Systems (ITSs) are situated in a potential struggle between effective pedagogy and system enjoyment and engagement. iSTART (Interactive Strategy Training for Active Reading and Thinking), a reading strategy tutoring system in which students practice generating self-explanations and using reading strategies, employs two devices…
Automated Guidance for Thermodynamics Essays: Critiquing versus Revisiting
ERIC Educational Resources Information Center
Donnelly, Dermot F.; Vitale, Jonathan M.; Linn, Marcia C.
2015-01-01
Middle school students struggle to explain thermodynamics concepts. In this study, to help students succeed, we use a natural language processing program to analyze their essays explaining the aspects of thermodynamics and provide guidance based on the automated score. The 346 sixth-grade students were assigned to either the critique condition…
Semantic Search of Web Services
ERIC Educational Resources Information Center
Hao, Ke
2013-01-01
This dissertation addresses semantic search of Web services using natural language processing. We first survey various existing approaches, focusing on the fact that the expensive costs of current semantic annotation frameworks result in limited use of semantic search for large scale applications. We then propose a vector space model based service…
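A vector space model for matching a query against service descriptions can be sketched as follows. This is a generic illustration that assumes raw term-frequency vectors and cosine similarity; the dissertation's actual model is not described in the truncated abstract, and the service names and descriptions are invented.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector for a whitespace-tokenised text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_services(query, descriptions):
    """Return service names ordered by decreasing similarity to the query."""
    q = tf_vector(query)
    scored = [(cosine(q, tf_vector(text)), name) for name, text in descriptions.items()]
    return [name for _, name in sorted(scored, key=lambda item: (-item[0], item[1]))]


# Hypothetical service descriptions.
services = {
    "weather": "get weather forecast for a city",
    "currency": "convert currency amount between two currencies",
    "geocode": "convert a city name to coordinates",
}
print(rank_services("weather forecast for city", services))
```

Real systems usually weight terms (for example with TF-IDF) and remove stop words before computing similarity; raw counts are used here to keep the sketch short.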
Common Ground: An Interactive Visual Exploration and Discovery for Complex Health Data
2015-04-01
working with Intermountain Healthcare on a new rich dataset extracted directly from medical notes using natural language processing (NLP) algorithms...probabilities based on a state-of-the-art NLP classifiers. At that stage the data did not include geographic information or temporal information but we
Robo-Sensei's NLP-Based Error Detection and Feedback Generation
ERIC Educational Resources Information Center
Nagata, Noriko
2009-01-01
This paper presents a new version of Robo-Sensei's NLP (Natural Language Processing) system which updates the version currently available as the software package "ROBO-SENSEI: Personal Japanese Tutor" (Nagata, 2004). Robo-Sensei's NLP system includes a lexicon, a morphological generator, a word segmentor, a morphological parser, a syntactic…
Implicit Schemata and Categories in Memory-Based Language Processing
ERIC Educational Resources Information Center
van den Bosch, Antal; Daelemans, Walter
2013-01-01
Memory-based language processing (MBLP) is an approach to language processing based on exemplar storage during learning and analogical reasoning during processing. From a cognitive perspective, the approach is attractive as a model for human language processing because it does not make any assumptions about the way abstractions are shaped, nor any…
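Exemplar storage during learning and analogical reasoning during processing can be sketched as a nearest-neighbour classifier. This is a minimal illustration that assumes a simple overlap distance (the number of mismatching feature positions) and majority voting; the class name, features and labels are hypothetical, and actual MBLP implementations use more refined metrics and feature weighting.

```python
from collections import Counter


class MemoryBasedClassifier:
    """Stores every training exemplar and classifies by its nearest neighbours."""

    def __init__(self, k=1):
        self.k = k
        self.memory = []  # list of (features, label) exemplars, kept verbatim

    def learn(self, features, label):
        # Learning is plain storage: no abstraction or rule induction happens here.
        self.memory.append((tuple(features), label))

    def classify(self, features):
        if not self.memory:
            raise ValueError("no exemplars stored")

        def distance(stored):
            # Overlap distance: count positions where the feature values differ.
            return sum(1 for a, b in zip(stored, features) if a != b)

        nearest = sorted(self.memory, key=lambda exemplar: distance(exemplar[0]))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


# Toy task: label the middle word from (previous word, word, next word).
clf = MemoryBasedClassifier(k=1)
clf.learn(("the", "run", "was"), "NOUN")
clf.learn(("to", "run", "fast"), "VERB")
clf.learn(("a", "walk", "was"), "NOUN")
clf.learn(("to", "walk", "home"), "VERB")
print(clf.classify(("to", "jump", "fast")))  # VERB
```

The unseen word "jump" is labelled by analogy with the stored exemplar that shares its left and right context, without any explicit rule about infinitives ever being formed.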
Constructing Concept Schemes From Astronomical Telegrams Via Natural Language Clustering
NASA Astrophysics Data System (ADS)
Graham, Matthew; Zhang, M.; Djorgovski, S. G.; Donalek, C.; Drake, A. J.; Mahabal, A.
2012-01-01
The rapidly emerging field of time domain astronomy is one of the most exciting and vibrant new research frontiers, ranging in scientific scope from studies of the Solar System to extreme relativistic astrophysics and cosmology. It is being enabled by a new generation of large synoptic digital sky surveys - LSST, PanStarrs, CRTS - that cover large areas of sky repeatedly, looking for transient objects and phenomena. One of the biggest challenges facing these is the automated classification of transient events, a process that needs machine-processible astronomical knowledge. Semantic technologies enable the formal representation of concepts and relations within a particular domain. ATELs (http://www.astronomerstelegram.org) are a commonly-used means for reporting and commenting upon new astronomical observations of transient sources (supernovae, stellar outbursts, blazar flares, etc). However, they are loose and unstructured and employ scientific natural language for description: this makes automated processing of them - a necessity within the next decade with petascale data rates - a challenge. Nevertheless they represent a potentially rich corpus of information that could lead to new and valuable insights into transient phenomena. This project lies in the cutting-edge field of astrosemantics, a branch of astroinformatics, which applies semantic technologies to astronomy. The ATELs have been used to develop an appropriate concept scheme - a representation of the information they contain - for transient astronomy using hierarchical clustering of processed natural language. This allows us to automatically organize ATELs based on the vocabulary used. We conclude that we can use simple algorithms to process and extract meaning from astronomical textual data.
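Organising telegrams by the vocabulary they use can be sketched with a small agglomerative clustering routine. This is a generic illustration that assumes single-linkage clustering over Jaccard distances between token sets; the abstract does not specify the linkage, the distance measure or the text preprocessing, and the example texts are invented.

```python
def jaccard_distance(a, b):
    """1 minus the ratio of shared to total vocabulary of two token sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def agglomerate(docs, threshold):
    """Single-linkage agglomerative clustering of documents by vocabulary.

    Repeatedly merges the two closest clusters and stops once the closest
    pair is farther apart than `threshold`. Returns lists of document indices.
    """
    vocab = [set(doc.lower().split()) for doc in docs]
    clusters = [[i] for i in range(len(docs))]

    def linkage(c1, c2):
        return min(jaccard_distance(vocab[i], vocab[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        dist, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if dist > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(cluster) for cluster in clusters]


# Invented telegram-like titles.
telegrams = [
    "supernova discovered in spiral galaxy",
    "spectroscopic classification of supernova in galaxy",
    "blazar flare detected in gamma rays",
    "optical flare of blazar detected",
]
print(agglomerate(telegrams, threshold=0.7))  # [[0, 1], [2, 3]]
```

Recording the merge order instead of stopping at a threshold would yield the full hierarchy, from which broader and narrower groupings of a concept scheme could be read off.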
PASTE: patient-centered SMS text tagging in a medication management system.
Stenner, Shane P; Johnson, Kevin B; Denny, Joshua C
2012-01-01
To evaluate the performance of a system that extracts medication information and administration-related actions from patient short message service (SMS) messages. Mobile technologies provide a platform for electronic patient-centered medication management. MyMediHealth (MMH) is a medication management system that includes a medication scheduler, a medication administration record, and a reminder engine that sends text messages to cell phones. The object of this work was to extend MMH to allow two-way interaction using mobile phone-based SMS technology. Unprompted text-message communication with patients using natural language could engage patients in their healthcare, but presents unique natural language processing challenges. The authors developed a new functional component of MMH, the Patient-centered Automated SMS Tagging Engine (PASTE). The PASTE web service uses natural language processing methods, custom lexicons, and existing knowledge sources to extract and tag medication information from patient text messages. A pilot evaluation of PASTE was completed using 130 medication messages anonymously submitted by 16 volunteers via a website. System output was compared with manually tagged messages. Verified medication names, medication terms, and action terms reached high F-measures of 91.3%, 94.7%, and 90.4%, respectively. The overall medication name F-measure was 79.8%, and the medication action term F-measure was 90%. Other studies have demonstrated systems that successfully extract medication information from clinical documents using semantic tagging, regular expression-based approaches, or a combination of both approaches. This evaluation demonstrates the feasibility of extracting medication information from patient-generated medication messages.
Research in Knowledge Representation for Natural Language Understanding
1980-11-01
artificial intelligence, natural language understanding , parsing, syntax, semantics, speaker meaning, knowledge representation, semantic networks...TinB PAGE map M W006 1Report No. 4513 L RESEARCH IN KNOWLEDGE REPRESENTATION FOR NATURAL LANGUAGE UNDERSTANDING Annual Report 1 September 1979 to 31... understanding , knowledge representation, and knowledge based inference. The work that we have been doing falls into three classes, successively motivated by
NASA Astrophysics Data System (ADS)
Gómez-Rodríguez, Carlos
2017-07-01
Liu et al. [1] provide a comprehensive account of research on dependency distance in human languages. While the article is a very rich and useful report on this complex subject, here I will expand on a few specific issues where research in computational linguistics (specifically natural language processing) can inform DDM research, and vice versa. These aspects have not been explored much in [1] or elsewhere, probably due to the little overlap between both research communities, but they may provide interesting insights for improving our understanding of the evolution of human languages, the mechanisms by which the brain processes and understands language, and the construction of effective computer systems to achieve this goal.
SEMCARE: Multilingual Semantic Search in Semi-Structured Clinical Data.
López-García, Pablo; Kreuzthaler, Markus; Schulz, Stefan; Scherr, Daniel; Daumke, Philipp; Markó, Kornél; Kors, Jan A; van Mulligen, Erik M; Wang, Xinkai; Gonna, Hanney; Behr, Elijah; Honrado, Ángel
2016-01-01
The vast amount of clinical data in electronic health records constitutes a great potential for secondary use. However, most of this content consists of unstructured or semi-structured texts, which are difficult to process. Several challenges are still pending: medical language idiosyncrasies in different natural languages, and the large variety of medical terminology systems. In this paper we present SEMCARE, a European initiative designed to minimize these problems by providing a multi-lingual platform (English, German, and Dutch) that allows users to express complex queries and obtain relevant search results from clinical texts. SEMCARE is based on a selection of adapted biomedical terminologies, together with Apache UIMA and Apache Solr as open source state-of-the-art natural language pipeline and indexing technologies. SEMCARE has been deployed and is currently being tested at three medical institutions in the UK, Austria, and the Netherlands, showing promising results in a cardiology use case.
Payne, Philip R O; Kwok, Alan; Dhaval, Rakesh; Borlawsky, Tara B
2009-03-01
The conduct of large-scale translational studies presents significant challenges related to the storage, management and analysis of integrative data sets. Ideally, the application of methodologies such as conceptual knowledge discovery in databases (CKDD) provides a means for moving beyond intuitive hypothesis discovery and testing in such data sets, and towards the high-throughput generation and evaluation of knowledge-anchored relationships between complex bio-molecular and phenotypic variables. However, the induction of such high-throughput hypotheses is non-trivial, and requires correspondingly high-throughput validation methodologies. In this manuscript, we describe an evaluation of the efficacy of a natural language processing-based approach to validating such hypotheses. As part of this evaluation, we will examine a phenomenon that we have labeled as "Conceptual Dissonance" in which conceptual knowledge derived from two or more sources of comparable scope and granularity cannot be readily integrated or compared using conventional methods and automated tools.
Extracting Sexual Trauma Mentions from Electronic Medical Notes Using Natural Language Processing.
Divita, Guy; Brignone, Emily; Carter, Marjorie E; Suo, Ying; Blais, Rebecca K; Samore, Matthew H; Fargo, Jamison D; Gundlapalli, Adi V
2017-01-01
Patient history of sexual trauma is of clinical relevance to healthcare providers as survivors face adverse health-related outcomes. This paper describes a method for identifying mentions of sexual trauma within the free text of electronic medical notes. A natural language processing pipeline for information extraction was developed and scaled to handle a large corpus of electronic medical notes used for this study from US Veterans Health Administration medical facilities. The tool was used to identify sexual trauma mentions and create snippets around every asserted mention based on a domain-specific lexicon developed for this purpose. All snippets were evaluated by trained human reviewers. An overall positive predictive value (PPV) of 0.90 for identifying sexual trauma mentions from the free text and a PPV of 0.71 at the patient level are reported. The metrics are superior for records from female patients.
Pai, Vinay M; Rodgers, Mary; Conroy, Richard; Luo, James; Zhou, Ruixia; Seto, Belinda
2014-01-01
In April 2012, the National Institutes of Health organized a two-day workshop entitled ‘Natural Language Processing: State of the Art, Future Directions and Applications for Enhancing Clinical Decision-Making’ (NLP-CDS). This report is a summary of the discussions during the second day of the workshop. Collectively, the workshop presenters and participants emphasized the need for unstructured clinical notes to be included in the decision making workflow and the need for individualized longitudinal data tracking. The workshop also discussed the need to: (1) combine evidence-based literature and patient records with machine-learning and prediction models; (2) provide trusted and reproducible clinical advice; (3) prioritize evidence and test results; and (4) engage healthcare professionals, caregivers, and patients. The overall consensus of the NLP-CDS workshop was that there are promising opportunities for NLP and CDS to deliver cognitive support for healthcare professionals, caregivers, and patients. PMID:23921193
Semantic based man-machine interface for real-time communication
NASA Technical Reports Server (NTRS)
Ali, M.; Ai, C.-S.
1988-01-01
A flight expert system (FLES) was developed to assist pilots in monitoring, diagnosing and recovering from in-flight faults. To provide a communications interface between the flight crew and FLES, a natural language interface (NALI) was implemented. Input to NALI is processed by three processors: (1) the semantics parser; (2) the knowledge retriever; and (3) the response generator. First the semantic parser extracts meaningful words and phrases to generate an internal representation of the query. At this point, the semantic parser has the ability to map different input forms related to the same concept into the same internal representation. Then the knowledge retriever analyzes and stores the context of the query to aid in resolving ellipses and pronoun references. At the end of this process, a sequence of retrieval functions is created as a first step in generating the proper response. Finally, the response generator generates the natural language response to the query. The architecture of NALI was designed to process both temporal and nontemporal queries. The architecture and implementation of NALI are described.
Albadr, Musatafa Abbas Abbood; Tiun, Sabrina; Al-Dhief, Fahad Taha; Sammour, Mahmoud A M
2018-01-01
Spoken Language Identification (LID) is the process of determining and classifying natural language from a given content and dataset. Typically, data must be processed to extract useful features to perform LID. The extracting features for LID, based on literature, is a mature process where the standard features for LID have already been developed using Mel-Frequency Cepstral Coefficients (MFCC), Shifted Delta Cepstral (SDC), the Gaussian Mixture Model (GMM) and ending with the i-vector based framework. However, the process of learning based on extract features remains to be improved (i.e. optimised) to capture all embedded knowledge on the extracted features. The Extreme Learning Machine (ELM) is an effective learning model used to perform classification and regression analysis and is extremely useful to train a single hidden layer neural network. Nevertheless, the learning process of this model is not entirely effective (i.e. optimised) due to the random selection of weights within the input hidden layer. In this study, the ELM is selected as a learning model for LID based on standard feature extraction. One of the optimisation approaches of ELM, the Self-Adjusting Extreme Learning Machine (SA-ELM) is selected as the benchmark and improved by altering the selection phase of the optimisation process. The selection process is performed incorporating both the Split-Ratio and K-Tournament methods, the improved SA-ELM is named Enhanced Self-Adjusting Extreme Learning Machine (ESA-ELM). The results are generated based on LID with the datasets created from eight different languages. The results of the study showed excellent superiority relating to the performance of the Enhanced Self-Adjusting Extreme Learning Machine LID (ESA-ELM LID) compared with the SA-ELM LID, with ESA-ELM LID achieving an accuracy of 96.25%, as compared to the accuracy of SA-ELM LID of only 95.00%. PMID:29672546
Cascading Air Power Effects Simulation (CAPES)
2010-05-01
governments using automated natural language processing techniques. New data on the attitudes of the masses and Arabic mass media were also collected using...environmental contexts. To meet this objective, new data was collected on the behavior of groups and governments using automated natural language processing...illustrate one fruitful avenue for future research. We show that we can model the first, second, and third order effects of US actions on group violence
Usability Evaluation of NLP-PIER: A Clinical Document Search Engine for Researchers.
Hultman, Gretchen; McEwan, Reed; Pakhomov, Serguei; Lindemann, Elizabeth; Skube, Steven; Melton, Genevieve B
2017-01-01
NLP-PIER (Natural Language Processing - Patient Information Extraction for Research) is a self-service platform with a search engine for clinical researchers to perform natural language processing (NLP) queries using clinical notes. We conducted user-centered testing of NLP-PIER's usability to inform future design decisions. Quantitative and qualitative data were analyzed. Our findings will be used to improve the usability of NLP-PIER.
Gross, Alexander; Murthy, Dhiraj
2014-10-01
This paper explores a variety of methods for applying the Latent Dirichlet Allocation (LDA) automated topic modeling algorithm to the modeling of the structure and behavior of virtual organizations found within modern social media and social networking environments. As the field of Big Data reveals, an increase in the scale of social data available presents new challenges which are not tackled by merely scaling up hardware and software. Rather, they necessitate new methods and, indeed, new areas of expertise. Natural language processing provides one such method. This paper applies LDA to the study of scientific virtual organizations whose members employ social technologies. Because of the vast data footprint in these virtual platforms, we found that natural language processing was needed to 'unlock' and render visible latent, previously unseen conversational connections across large textual corpora (spanning profiles, discussion threads, forums, and other social media incarnations). We introduce variants of LDA and ultimately argue that natural language processing is a critical interdisciplinary methodology for making better sense of social 'Big Data'. We were able to successfully model nested discussion topics from forums and blog posts using LDA. Importantly, we found that LDA can move us beyond the state-of-the-art in conventional Social Network Analysis techniques. Copyright © 2014 Elsevier Ltd. All rights reserved.
Comeau, Donald C; Liu, Haibin; Islamaj Doğan, Rezarta; Wilbur, W John
2014-01-01
BioC is a new format and associated code libraries for sharing text and annotations. We have implemented BioC natural language preprocessing pipelines in two popular programming languages: C++ and Java. The current implementations interface with the well-known MedPost and Stanford natural language processing tool sets. The pipeline functionality includes sentence segmentation, tokenization, part-of-speech tagging, lemmatization and sentence parsing. These pipelines can be easily integrated along with other BioC programs into any BioC compliant text mining systems. As an application, we converted the NCBI disease corpus to BioC format, and the pipelines have successfully run on this corpus to demonstrate their functionality. Code and data can be downloaded from http://bioc.sourceforge.net. Database URL: http://bioc.sourceforge.net. © The Author(s) 2014. Published by Oxford University Press.
Automated encoding of clinical documents based on natural language processing.
Friedman, Carol; Shagina, Lyudmila; Lussier, Yves; Hripcsak, George
2004-01-01
The aim of this study was to develop a method based on natural language processing (NLP) that automatically maps an entire clinical document to codes with modifiers and to quantitatively evaluate the method. An existing NLP system, MedLEE, was adapted to automatically generate codes. The method involves matching of structured output generated by MedLEE consisting of findings and modifiers to obtain the most specific code. Recall and precision applied to Unified Medical Language System (UMLS) coding were evaluated in two separate studies. Recall was measured using a test set of 150 randomly selected sentences, which were processed using MedLEE. Results were compared with a reference standard determined manually by seven experts. Precision was measured using a second test set of 150 randomly selected sentences from which UMLS codes were automatically generated by the method and then validated by experts. Recall of the system for UMLS coding of all terms was .77 (95% CI .72-.81), and for coding terms that had corresponding UMLS codes recall was .83 (.79-.87). Recall of the system for extracting all terms was .84 (.81-.88). Recall of the experts ranged from .69 to .91 for extracting terms. The precision of the system was .89 (.87-.91), and precision of the experts ranged from .61 to .91. Extraction of relevant clinical information and UMLS coding were accomplished using a method based on NLP. The method appeared to be comparable to or better than six experts. The advantage of the method is that it maps text to codes along with other related information, rendering the coded output suitable for effective retrieval.
Language Design in the Processing of Non-Restrictive Relative Clauses in French as a Second Language
ERIC Educational Resources Information Center
Lorente Lapole, Amandine
2012-01-01
Recent years have witnessed a lively debate on the nature of learners' morphological competence and use. Some argue that a breakdown in acquisition of second-language (L2) is expected whenever features required for the analysis of L2 input are not present in the L1. Others argue that features have the same nature and etiology in first…
Knowledge-Based Extensible Natural Language Interface Technology Program
1989-11-30
natural language as its own meta-language to explain the meaning and attributes of the words and idioms of the language. Educational courses in language...understood and used by Lydia for human-computer dialogue. The KL enables a systems developer or "teacher-user" to build the system to a point where new...language can be "formal" as in a structured educational language program or it can be "informal" as in the case of a person consulting a dictionary for the
The ALICE System: A Workbench for Learning and Using Language.
ERIC Educational Resources Information Center
Levin, Lori; And Others
1991-01-01
ALICE, a multimedia framework for intelligent computer-assisted language instruction (ICALI) at Carnegie Mellon University (PA), consists of a set of tools for building a number of different types of ICALI programs in any language. Its Natural Language Processing tools for syntactic error detection, morphological analysis, and generation of…
Zhang, Xingyu; Kim, Joyce; Patzer, Rachel E; Pitts, Stephen R; Patzer, Aaron; Schrager, Justin D
2017-10-26
To describe and compare logistic regression and neural network modeling strategies to predict hospital admission or transfer following initial presentation to Emergency Department (ED) triage with and without the addition of natural language processing elements. Using data from the National Hospital Ambulatory Medical Care Survey (NHAMCS), a cross-sectional probability sample of United States EDs from 2012 and 2013 survey years, we developed several predictive models with the outcome being admission to the hospital or transfer vs. discharge home. We included patient characteristics immediately available after the patient has presented to the ED and undergone a triage process. We used this information to construct logistic regression (LR) and multilayer neural network models (MLNN) which included natural language processing (NLP) and principal component analysis from the patient's reason for visit. Ten-fold cross validation was used to test the predictive capacity of each model, and the area under the receiver operating characteristic curve (AUC) was then calculated for each model. Of the 47,200 ED visits from 642 hospitals, 6,335 (13.42%) resulted in hospital admission (or transfer). A total of 48 principal components were extracted by NLP from the reason for visit fields, which explained 75% of the overall variance for hospitalization. In the model including only structured variables, the AUC was 0.824 (95% CI 0.818-0.830) for logistic regression and 0.823 (95% CI 0.817-0.829) for MLNN. Models including only free-text information generated AUC of 0.742 (95% CI 0.731-0.753) for logistic regression and 0.753 (95% CI 0.742-0.764) for MLNN. When both structured variables and free text variables were included, the AUC reached 0.846 (95% CI 0.839-0.853) for logistic regression and 0.844 (95% CI 0.836-0.852) for MLNN.
The predictive accuracy of hospital admission or transfer for patients who presented to ED triage overall was good, and was improved with the inclusion of free text data from a patient's reason for visit regardless of modeling approach. Natural language processing and neural networks that incorporate patient-reported outcome free text may increase predictive accuracy for hospital admission.
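For readers unfamiliar with the AUC figures quoted in this abstract, the following Python sketch shows one standard way to compute the area under the ROC curve from predicted scores and binary outcomes, using the rank (Mann-Whitney) formulation. It is a generic illustration, not the authors' code; the function name and the toy inputs are assumptions.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    Equals the probability that a randomly chosen positive case (label 1,
    e.g. admitted) receives a higher score than a randomly chosen negative
    case (label 0, e.g. discharged); tied scores count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of cases, which is fine for a small illustration; a survey-sized dataset would normally use a sort-based computation instead.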
A Look at Natural Language Retrieval Systems
ERIC Educational Resources Information Center
Townley, Helen M.
1971-01-01
Natural language systems are seen as falling into two classes - those which process and analyse the input and store it in an ordered fashion, and those which employ controls at the output stage. A variety of systems of both types is reviewed, and their respective features are discussed. (12 references) (Author/NH)
Leveraging Code Comments to Improve Software Reliability
ERIC Educational Resources Information Center
Tan, Lin
2009-01-01
Commenting source code has long been a common practice in software development. This thesis, consisting of three pieces of work, made novel use of the code comments written in natural language to improve software reliability. Our solution combines Natural Language Processing (NLP), Machine Learning, Statistics, and Program Analysis techniques to…
First Toronto Conference on Database Users. Systems that Enhance User Performance.
ERIC Educational Resources Information Center
Doszkocs, Tamas E.; Toliver, David
1987-01-01
The first of two papers discusses natural language searching as a user performance enhancement tool, focusing on artificial intelligence applications for information retrieval and problems with natural language processing. The second presents a conceptual framework for further development and future design of front ends to online bibliographic…
Learning a Foreign Language: A New Path to Enhancement of Cognitive Functions
ERIC Educational Resources Information Center
Shoghi Javan, Sara; Ghonsooly, Behzad
2018-01-01
The complicated cognitive processes involved in natural (primary) bilingualism lead to significant cognitive development. Executive functions as a fundamental component of human cognition are deemed to be affected by language learning. To date, a large number of studies have investigated how natural (primary) bilingualism influences executive…
Exploiting salient semantic analysis for information retrieval
NASA Astrophysics Data System (ADS)
Luo, Jing; Meng, Bo; Quan, Changqin; Tu, Xinhui
2016-11-01
Recently, many Wikipedia-based methods have been proposed to improve the performance of different natural language processing (NLP) tasks, such as semantic relatedness computation, text classification and information retrieval. Among these methods, salient semantic analysis (SSA) has been proven to be an effective way to generate conceptual representations for words or documents. However, its feasibility and effectiveness in information retrieval are mostly unknown. In this paper, we study how to efficiently use SSA to improve information retrieval performance, and propose an SSA-based retrieval method under the language model framework. First, the SSA model is adopted to build conceptual representations for documents and queries. Then, these conceptual representations and the bag-of-words (BOW) representations are used in combination to estimate the language models of queries and documents. The proposed method is evaluated on several standard Text Retrieval Conference (TREC) collections. Experimental results show that the proposed models consistently outperform the existing Wikipedia-based retrieval methods.
A role for relaxed selection in the evolution of the language capacity
Deacon, Terrence W.
2010-01-01
Explaining the extravagant complexity of the human language and our competence to acquire it has long posed challenges for natural selection theory. To answer his critics, Darwin turned to sexual selection to account for the extreme development of language. Many contemporary evolutionary theorists have invoked incredibly lucky mutation or some variant of the assimilation of acquired behaviors to innate predispositions in an effort to explain it. Recent evo-devo approaches have identified developmental processes that help to explain how complex functional synergies can evolve by Darwinian means. Interestingly, many of these developmental mechanisms bear a resemblance to aspects of Darwin's mechanism of natural selection, often differing only in one respect (e.g., form of duplication, kind of variation, competition/cooperation). A common feature is an interplay between processes of stabilizing selection and processes of relaxed selection at different levels of organism function. These may play important roles in the many levels of evolutionary process contributing to language. Surprisingly, the relaxation of selection at the organism level may have been a source of many complex synergistic features of the human language capacity, and may help explain why so much language information is “inherited” socially. PMID:20445088
A Tutorial on Techniques and Applications for Natural Language Processing
1983-10-17
mentioned above as specific to context-free grammars were tackled by linguists, in particular Chomsky [21, 22] through Transformational Grammar. As shown...17 October 1983, Department of Computer Science, Carnegie-Mellon University...A Tutorial on Techniques and Applications for Natural Language Processing Philip J. Hayes and Jaime G. Carbonell Carnegie-Mellon University 17
A Natural Language Interface to Databases
NASA Technical Reports Server (NTRS)
Ford, D. R.
1990-01-01
The development of a Natural Language Interface (NLI) is presented which is semantic-based and uses Conceptual Dependency representation. The system was developed using Lisp and currently runs on a Symbolics Lisp machine.
A Text Knowledge Base from the AI Handbook.
ERIC Educational Resources Information Center
Simmons, Robert F.
1987-01-01
Describes a prototype natural language text knowledge system (TKS) that was used to organize 50 pages of a handbook on artificial intelligence as an inferential knowledge base with natural language query and command capabilities. Representation of text, database navigation, query systems, discourse structuring, and future research needs are…
Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models
2009-01-01
4 Monolingually-Derived Phrasal Paraphrase Generation for Statistical Machine Translation; 4.1...4.4 Spanish-English (S2E) results; 4.5 Gains from using larger monolingual corpora for...4.2 Visual example of a phrasal distributional profile; 4.3 Monolingual corpus-based distributional
Indexing Anatomical Phrases in Neuro-Radiology Reports to the UMLS 2005AA
Bashyam, Vijayaraghavan; Taira, Ricky K.
2005-01-01
This work describes a methodology to index anatomical phrases to the 2005AA release of the Unified Medical Language System (UMLS). A phrase chunking tool based on Natural Language Processing (NLP) was developed to identify semantically coherent phrases within medical reports. Using this phrase chunker, a set of 2,551 unique anatomical phrases was extracted from brain radiology reports. These phrases were mapped to the 2005AA release of the UMLS using a vector space model. Precision for the task of indexing unique phrases was 0.87. PMID:16778995
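A vector space model of the kind mentioned in this abstract can be sketched in a few lines of Python: each phrase and each candidate concept name becomes a bag-of-words vector, and the phrase is mapped to the concept with the highest cosine similarity. This is a generic illustration under stated assumptions. The concept identifiers ("X1", "X2", "X3") and names are placeholders rather than real UMLS entries, and the abstract does not say which term weighting the authors used; raw term counts are assumed here.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term-frequency vector (assumed weighting: raw counts)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def best_match(phrase, concepts):
    """Return (concept_id, score) for the concept name closest to the phrase."""
    query = vectorize(phrase)
    scored = [(cosine(query, vectorize(name)), cid)
              for cid, name in concepts.items()]
    score, cid = max(scored)
    return cid, score


# Placeholder vocabulary; identifiers and names are made up for the example.
CONCEPTS = {
    "X1": "left frontal lobe",
    "X2": "right temporal lobe",
    "X3": "lateral ventricle",
}
```

Calling `best_match("frontal lobe on the left", CONCEPTS)` selects "X1", because three of the phrase's five tokens overlap with that concept name regardless of word order.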
Topaz, Maxim; Lai, Kenneth; Dowding, Dawn; Lei, Victor J; Zisberg, Anna; Bowles, Kathryn H; Zhou, Li
2016-12-01
Electronic health records are being increasingly used by nurses with up to 80% of the health data recorded as free text. However, only a few studies have developed nursing-relevant tools that help busy clinicians to identify information they need at the point of care. This study developed and validated one of the first automated natural language processing applications to extract wound information (wound type, pressure ulcer stage, wound size, anatomic location, and wound treatment) from free text clinical notes. First, two human annotators manually reviewed a purposeful training sample (n=360) and random test sample (n=1100) of clinical notes (including 50% discharge summaries and 50% outpatient notes), identified wound cases, and created a gold standard dataset. We then trained and tested our natural language processing system (known as MTERMS) to process the wound information. Finally, we assessed our automated approach by comparing system-generated findings against the gold standard. We also compared the prevalence of wound cases identified from free-text data with coded diagnoses in the structured data. The testing dataset included 101 notes (9.2%) with wound information. The overall system performance was good (F-measure, a combined measure of the system's accuracy, was 92.7%), with best results for wound treatment (F-measure=95.7%) and poorest results for wound size (F-measure=81.9%). Only 46.5% of wound notes had a structured code for a wound diagnosis. The natural language processing system achieved good performance on a subset of randomly selected discharge summaries and outpatient notes. In more than half of the wound notes, there were no coded wound diagnoses, which highlights the significance of using natural language processing to enrich clinical decision making. Our future steps will include expansion of the application's information coverage to other relevant wound factors and validation of the model with external data. Copyright © 2016 Elsevier Ltd. All rights reserved.
Memory Reconsolidation and Computational Learning
2010-03-01
Cooper and H.T. Siegelmann, "Memory Reconsolidation for Natural Language Processing," Cognitive Neurodynamics, 3, 2009: 365-372. M.M. Olsen, N...computerized memories and other state of the art cognitive architectures, our memory system has the ability to process on-line and in real-time as...on both continuous and binary inputs, unlike state of the art methods in case based reasoning and in cognitive architectures, which are bound to
NASA Astrophysics Data System (ADS)
Sakakibara, Kai; Hagiwara, Masafumi
In this paper, we propose a 3-dimensional self-organizing memory and describe its application to knowledge extraction from natural language. First, the proposed system extracts relations between words using JUMAN (a morpheme analysis system) and KNP (a syntax analysis system), and stores them in short-term memory. In the short-term memory, the relations are attenuated as processing proceeds. However, relations with a high frequency of appearance are stored in the long-term memory without attenuation. The relations in the long-term memory are placed in the proposed 3-dimensional self-organizing memory. In the learning phase, we used a new learning algorithm called "Potential Firing". In the recall phase, the proposed system recalls relational knowledge from the learned knowledge based on the input sentence, using a new recall algorithm called "Waterfall Recall". We added a function to respond to questions in natural language with "yes/no" in order to confirm the validity of the proposed system by evaluating the number of correct answers.
Natural Language Processing in aid of FlyBase curators
Karamanis, Nikiforos; Seal, Ruth; Lewin, Ian; McQuilton, Peter; Vlachos, Andreas; Gasperin, Caroline; Drysdale, Rachel; Briscoe, Ted
2008-01-01
Background Despite increasing interest in applying Natural Language Processing (NLP) to biomedical text, whether this technology can facilitate tasks such as database curation remains unclear. Results PaperBrowser is the first NLP-powered interface that was developed under a user-centered approach to improve the way in which FlyBase curators navigate an article. In this paper, we first discuss how observing curators at work informed the design and evaluation of PaperBrowser. Then, we present how we appraise PaperBrowser's navigational functionalities in a user-based study using a text highlighting task and evaluation criteria of Human-Computer Interaction. Our results show that PaperBrowser reduces the number of interactions between two highlighting events and therefore improves navigational efficiency by about 58% compared to the navigational mechanism that was previously available to the curators. Moreover, PaperBrowser is shown to provide curators with enhanced navigational utility by over 74% irrespective of the different ways in which they highlight text in the article. Conclusion We show that state-of-the-art performance in certain NLP tasks such as Named Entity Recognition and Anaphora Resolution can be combined with the navigational functionalities of PaperBrowser to support curation quite successfully. PMID:18410678
Emadzadeh, Ehsan; Sarker, Abeed; Nikfarjam, Azadeh; Gonzalez, Graciela
2017-01-01
Social networks, such as Twitter, have become important sources for active monitoring of user-reported adverse drug reactions (ADRs). Automatic extraction of ADR information can be crucial for healthcare providers, drug manufacturers, and consumers. However, because of the non-standard nature of social media language, automatically extracted ADR mentions need to be mapped to standard forms before they can be used by operational pharmacovigilance systems. We propose a modular natural language processing pipeline for mapping (normalizing) colloquial mentions of ADRs to their corresponding standardized identifiers. We seek to accomplish this task and enable customization of the pipeline so that distinct unlabeled free text resources can be incorporated to use the system for other normalization tasks. Our approach, which we call Hybrid Semantic Analysis (HSA), sequentially employs rule-based and semantic matching algorithms for mapping user-generated mentions to concept IDs in the Unified Medical Language System vocabulary. The semantic matching component of HSA is adaptive in nature and uses a regression model to combine various measures of semantic relatedness and resources to optimize normalization performance on the selected data source. On a publicly available corpus, our normalization method achieves 0.502 recall and 0.823 precision (F-measure: 0.624). Our proposed method outperforms a baseline based on latent semantic analysis and another that uses MetaMap.
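The reported F-measure is consistent with the balanced harmonic mean of the reported precision and recall. The short Python sketch below checks that arithmetic; it assumes the standard F-beta definition with beta = 1, which the abstract does not state explicitly but which reproduces the published figure.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values quoted in the abstract: precision 0.823, recall 0.502.
reported = round(f_measure(0.823, 0.502), 3)  # 2*0.823*0.502 / (0.823+0.502)
```

Because the harmonic mean is pulled toward the smaller of the two values, the score sits much closer to the recall of 0.502 than a simple average of the two figures would.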
Carrell, David S.; Halgrim, Scott; Tran, Diem-Thy; Buist, Diana S. M.; Chubak, Jessica; Chapman, Wendy W.; Savova, Guergana
2014-01-01
The increasing availability of electronic health records (EHRs) creates opportunities for automated extraction of information from clinical text. We hypothesized that natural language processing (NLP) could substantially reduce the burden of manual abstraction in studies examining outcomes, like cancer recurrence, that are documented in unstructured clinical text, such as progress notes, radiology reports, and pathology reports. We developed an NLP-based system using open-source software to process electronic clinical notes from 1995 to 2012 for women with early-stage incident breast cancers to identify whether and when recurrences were diagnosed. We developed and evaluated the system using clinical notes from 1,472 patients receiving EHR-documented care in an integrated health care system in the Pacific Northwest. A separate study provided the patient-level reference standard for recurrence status and date. The NLP-based system correctly identified 92% of recurrences and estimated diagnosis dates within 30 days for 88% of these. Specificity was 96%. The NLP-based system overlooked 5 of 65 recurrences, 4 because electronic documents were unavailable. The NLP-based system identified 5 other recurrences incorrectly classified as nonrecurrent in the reference standard. If used in similar cohorts, NLP could reduce by 90% the number of EHR charts abstracted to identify confirmed breast cancer recurrence cases at a rate comparable to traditional abstraction. PMID:24488511
Toledo, Cíntia Matsuda; Cunha, Andre; Scarton, Carolina; Aluísio, Sandra
2014-01-01
Discourse production is an important aspect in the evaluation of brain-injured individuals. We believe that studies comparing the performance of brain-injured subjects with that of healthy controls must use groups with compatible education. A pioneering application of machine learning methods using Brazilian Portuguese for clinical purposes is described, highlighting education as an important variable in the Brazilian scenario. The aims were to describe how to: (i) develop machine learning classifiers using features generated by natural language processing tools to distinguish descriptions produced by healthy individuals into classes based on their years of education; and (ii) automatically identify the features that best distinguish the groups. The approach proposed here extracts linguistic features automatically from the written descriptions with the aid of two Natural Language Processing tools: Coh-Metrix-Port and AIC. It also includes nine task-specific features (three new ones, two extracted manually, besides description time; type of scene described - simple or complex; presentation order - which type of picture was described first; and age). In this study, the descriptions by 144 of the subjects studied in Toledo 18 were used, which included 200 healthy Brazilians of both genders. A Support Vector Machine (SVM) with a radial basis function (RBF) kernel is the most recommended approach for the binary classification of our data, classifying three of the four initial classes. CfsSubsetEval (CFS) is a strong candidate to replace manual feature selection methods.
Positionalism of Relations and Its Consequences for Fact-Oriented Modelling
NASA Astrophysics Data System (ADS)
Keet, C. Maria
Natural language-based conceptual modelling as well as the use of diagrams have been essential components of fact-oriented modelling from its inception. However, transforming natural language to its corresponding object-role modelling diagram, and vice versa, is not trivial. This is due to the more fundamental problem of the different underlying ontological commitments concerning positionalism of the fact types. The natural language-based approach adheres to the standard view whereas the diagram-based approach has a positionalist commitment, which is, from an ontological perspective, incompatible with the former. This hinders seamless transition between the two approaches and affects interoperability with other conceptual modelling languages. One can adopt either the limited standard view or the positionalist commitment with fact types that may not be easily verbalisable but which facilitates data integration and reusability of conceptual models with ontological foundations.
van der Slik, Frans W P; van Hout, Roeland W N M; Schepens, Job J
2015-01-01
Gender differences were analyzed across countries of origin and continents, and across mother tongues and language families, using a large-scale database containing information on 27,119 adult learners of Dutch as a second language. Female learners consistently outperformed male learners in speaking and writing proficiency in Dutch as a second language. This gender gap remained remarkably robust and constant when other learner characteristics were taken into account, such as education, age of arrival, length of residence and hours studying Dutch. For reading and listening skills in Dutch, no gender gap was found. In addition, we found a general gender by education effect for all four language skills in Dutch: speaking, writing, reading, and listening. Female language learners turned out to profit more from higher educational training than male learners do in adult second language acquisition. These findings do not seem to match nurture-oriented explanatory frameworks based for instance on a human capital approach or gender-specific acculturation processes. Rather, they seem to corroborate a nature-based, gene-environment correlational framework in which language proficiency is a genetically influenced ability that interacts with environmental factors such as motivation, orientation, education, and learner strategies, which still mediate between endowment and acquiring language proficiency at an adult stage.
ChemicalTagger: A tool for semantic text-mining in chemistry
2011-01-01
Background The primary method for scientific communication is in the form of published scientific articles and theses which use natural language combined with domain-specific terminology. As such, they contain free-flowing unstructured text. Given the usefulness of data extraction from unstructured literature, we aim to show how this can be achieved for the discipline of chemistry. The highly formulaic style of writing most chemists adopt makes their contributions well suited to high-throughput Natural Language Processing (NLP) approaches. Results We have developed the ChemicalTagger parser as a medium-depth, phrase-based semantic NLP tool for the language of chemical experiments. Tagging is based on a modular architecture and uses a combination of OSCAR, domain-specific regex and English taggers to identify parts-of-speech. The ANTLR grammar is used to structure this into tree-based phrases. Using a metric that allows for overlapping annotations, we achieved machine-annotator agreements of 88.9% for phrase recognition and 91.9% for phrase-type identification (Action names). Conclusions It is possible to parse chemical experimental text using rule-based techniques in conjunction with a formal grammar parser. ChemicalTagger has been deployed for over 10,000 patents and has identified solvents from their linguistic context with >99.5% precision. PMID:21575201
Incorporating advanced language models into the P300 speller using particle filtering
NASA Astrophysics Data System (ADS)
Speier, W.; Arnold, C. W.; Deshpande, A.; Knall, J.; Pouratian, N.
2015-08-01
Objective. The P300 speller is a common brain-computer interface (BCI) application designed to communicate language by detecting event related potentials in a subject’s electroencephalogram signal. Information about the structure of natural language can be valuable for BCI communication, but attempts to use this information have thus far been limited to rudimentary n-gram models. While more sophisticated language models are prevalent in natural language processing literature, current BCI analysis methods based on dynamic programming cannot handle their complexity. Approach. Sampling methods can overcome this complexity by estimating the posterior distribution without searching the entire state space of the model. In this study, we implement sequential importance resampling, a commonly used particle filtering (PF) algorithm, to integrate a probabilistic automaton language model. Main result. This method was first evaluated offline on a dataset of 15 healthy subjects, which showed significant increases in speed and accuracy when compared to standard classification methods as well as a recently published approach using a hidden Markov model (HMM). An online pilot study verified these results as the average speed and accuracy achieved using the PF method was significantly higher than that using the HMM method. Significance. These findings strongly support the integration of domain-specific knowledge into BCI classification to improve system performance.
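The sampling idea in this abstract can be illustrated with a minimal sketch of one sequential importance resampling (SIR) step. The abstract gives no implementation details, so everything here is an assumption made for illustration: the function names `sir_step`, `toy_transition`, and `toy_likelihood`, the two-letter alphabet, and the fixed probabilities standing in for the language model and the EEG classifier scores.

```python
import random


def sir_step(particles, weights, transition, likelihood, observation, rng):
    """Run one generic SIR step: propagate, reweight, resample."""
    # Propagate each particle through the (hypothetical) transition model.
    proposed = [transition(p, rng) for p in particles]
    # Reweight by how well each proposed state explains the observation.
    new_weights = [w * likelihood(observation, p) for w, p in zip(weights, proposed)]
    total = sum(new_weights)
    if total == 0.0:
        # No particle explains the observation: fall back to uniform weights.
        new_weights = [1.0 / len(proposed)] * len(proposed)
    else:
        new_weights = [w / total for w in new_weights]
    # Resample with replacement in proportion to the weights, then reset them.
    resampled = rng.choices(proposed, weights=new_weights, k=len(proposed))
    uniform = [1.0 / len(resampled)] * len(resampled)
    return resampled, uniform


def toy_transition(prefix, rng):
    # Stand-in language model (assumed): append 'a' or 'b' with fixed odds.
    return prefix + rng.choices("ab", weights=[0.7, 0.3])[0]


def toy_likelihood(observation, state):
    # Stand-in classifier score (assumed): high when the last letter matches.
    return 0.9 if state[-1] == observation else 0.1


rng = random.Random(0)
particles = [""] * 200
weights = [1.0 / 200] * 200
particles, weights = sir_step(
    particles, weights, toy_transition, toy_likelihood, "b", rng
)
share_b = sum(p.endswith("b") for p in particles) / len(particles)
print(f"share of particles ending in 'b': {share_b:.2f}")
```

The particles approximate the posterior over typed strings without enumerating every state of the language model, which is the property the abstract attributes to sampling methods.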
SWAN: An expert system with natural language interface for tactical air capability assessment
NASA Technical Reports Server (NTRS)
Simmons, Robert M.
1987-01-01
SWAN is an expert system and natural language interface for assessing the war fighting capability of Air Force units in Europe. The expert system is an object oriented knowledge based simulation with an alternate worlds facility for performing what-if excursions. Responses from the system take the form of generated text, tables, or graphs. The natural language interface is an expert system in its own right, with a knowledge base and rules which understand how to access external databases, models, or expert systems. The distinguishing feature of the Air Force expert system is its use of meta-knowledge to generate explanations in the frame and procedure based environment.
ERIC Educational Resources Information Center
Kiraz, George Anton
This book presents a tractable computational model that can cope with complex morphological operations, especially in Semitic languages, and less complex morphological systems present in Western languages. It outlines a new generalized regular rewrite rule system that uses multiple finite-state automata to cater to root-and-pattern morphology,…
An Intelligent Computer Assisted Language Learning System for Arabic Learners
ERIC Educational Resources Information Center
Shaalan, Khaled F.
2005-01-01
This paper describes the development of an intelligent computer-assisted language learning (ICALL) system for learning Arabic. This system could be used for learning Arabic by students at primary schools or by learners of Arabic as a second or foreign language. It explores the use of Natural Language Processing (NLP) techniques for learning…
Natural Language Processing: A Tutorial. Revision
1990-01-01
English in word-for-word language translations. An oft-repeated (although fictional) anecdote illustrates the ... English by a language translation program, became: "The vodka is strong but the steak is rotten." The point made is that vast amounts of knowledge...are required for effective language translations. The initial goal for Language Translation was "fully-automatic high-quality translation" (FAHQT).
Encoding Standards for Linguistic Corpora.
ERIC Educational Resources Information Center
Ide, Nancy
The demand for extensive reusability of large language text collections for natural languages processing research requires development of standardized encoding formats. Such formats must be capable of representing different kinds of information across the spectrum of text types and languages, capable of representing different levels of…
The Effect of Bilingual Term List Size on Dictionary-Based Cross-Language Information Retrieval
2006-01-01
The Effect of Bilingual Term List Size on Dictionary-Based Cross-Language Information Retrieval Dina Demner-Fushman Department of Computer Science... dictionary-based Cross-Language Information Retrieval (CLIR), in which the goal is to find documents written in one natural language based on queries that...in which the documents are written. In dictionary-based CLIR techniques, the principal source of translation knowledge is a translation lexicon
The native-language benefit for talker identification is robust in 7.5-month-old infants.
Fecher, Natalie; Johnson, Elizabeth K
2018-04-26
Adults recognize talkers better when the talkers speak a familiar language than when they speak an unfamiliar language. This language familiarity effect (LFE) demonstrates the inseparable nature of linguistic and indexical information in adult spoken language processing. Relatively little is known about children's integration of linguistic and indexical information in speech. For example, to date, only one study has explored the LFE in infants. Here, we sought to better understand the maturation of speech processing abilities in infants by replicating this earlier study using a more stringent experimental design (eliminating a potential voice-language confound), a different test population (English- rather than Dutch-learning infants), and a new language pairing (English vs. Polish rather than Dutch vs. Italian or Japanese). Furthermore, we explored the language exposure conditions required for infants to develop an LFE for a formerly unfamiliar language. We hypothesized based on previous studies (including the perceptual narrowing literature) that infants might develop an LFE more readily than would adults. Although our findings replicate those of the earlier study, demonstrating that the LFE is robust in 7.5-month-olds, we found no evidence that infants need less language exposure than do adults to develop an LFE. We concluded that both infants and adults need extensive (potentially live) exposure to an unfamiliar language before talker identification in that language improves. Moreover, our study suggests that the LFE is likely rooted in early emerging phonology rather than shared lexical knowledge and that infants already closely resemble adults in their processing of linguistic and indexical information.
Chapman, Wendy W; Christensen, Lee M; Wagner, Michael M; Haug, Peter J; Ivanov, Oleg; Dowling, John N; Olszewski, Robert T
2005-01-01
Develop and evaluate a natural language processing application for classifying chief complaints into syndromic categories for syndromic surveillance. Much of the input data for artificial intelligence applications in the medical field are free-text patient medical records, including dictated medical reports and triage chief complaints. To be useful for automated systems, the free-text must be translated into encoded form. We implemented a biosurveillance detection system from Pennsylvania to monitor the 2002 Winter Olympic Games. Because input data was in free-text format, we used a natural language processing text classifier to automatically classify free-text triage chief complaints into syndromic categories used by the biosurveillance system. The classifier was trained on 4700 chief complaints from Pennsylvania. We evaluated the ability of the classifier to classify free-text chief complaints into syndromic categories with a test set of 800 chief complaints from Utah. The classifier produced the following areas under the ROC curve: Constitutional = 0.95; Gastrointestinal = 0.97; Hemorrhagic = 0.99; Neurological = 0.96; Rash = 1.0; Respiratory = 0.99; Other = 0.96. Using information stored in the system's semantic model, we extracted from the Respiratory classifications lower respiratory complaints and lower respiratory complaints with fever with a precision of 0.97 and 0.96, respectively. Results suggest that a trainable natural language processing text classifier can accurately extract data from free-text chief complaints for biosurveillance.
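The per-syndrome figures above are areas under the ROC curve. As a reminder of what that number measures, here is a minimal sketch that computes it from its pairwise-ranking definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the four toy scores are assumptions for illustration and are not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = complaint belongs to the syndrome, 0 = it does not.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(toy_labels, toy_scores))  # 3 of 4 pairs ranked correctly -> 0.75
```

An area of 1.0, as reported for the Rash category, means every positive complaint in the test set was scored above every negative one.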
PASTE: patient-centered SMS text tagging in a medication management system
Johnson, Kevin B; Denny, Joshua C
2011-01-01
Objective To evaluate the performance of a system that extracts medication information and administration-related actions from patient short message service (SMS) messages. Design Mobile technologies provide a platform for electronic patient-centered medication management. MyMediHealth (MMH) is a medication management system that includes a medication scheduler, a medication administration record, and a reminder engine that sends text messages to cell phones. The object of this work was to extend MMH to allow two-way interaction using mobile phone-based SMS technology. Unprompted text-message communication with patients using natural language could engage patients in their healthcare, but presents unique natural language processing challenges. The authors developed a new functional component of MMH, the Patient-centered Automated SMS Tagging Engine (PASTE). The PASTE web service uses natural language processing methods, custom lexicons, and existing knowledge sources to extract and tag medication information from patient text messages. Measurements A pilot evaluation of PASTE was completed using 130 medication messages anonymously submitted by 16 volunteers via a website. System output was compared with manually tagged messages. Results Verified medication names, medication terms, and action terms reached high F-measures of 91.3%, 94.7%, and 90.4%, respectively. The overall medication name F-measure was 79.8%, and the medication action term F-measure was 90%. Conclusion Other studies have demonstrated systems that successfully extract medication information from clinical documents using semantic tagging, regular expression-based approaches, or a combination of both approaches. This evaluation demonstrates the feasibility of extracting medication information from patient-generated medication messages. PMID:21984605
Trébuchon-Da Fonseca, Agnès; Bénar, Christian-G; Bartoloméi, Fabrice; Régis, Jean; Démonet, Jean-François; Chauvel, Patrick; Liégeois-Chauvel, Catherine
2009-03-01
Regions involved in language processing have been observed in the inferior part of the left temporal lobe. Although collectively labelled 'the Basal Temporal Language Area' (BTLA), these territories are functionally heterogeneous and are involved in language perception (i.e. reading or semantic task) or language production (speech arrest after stimulation). The objective of this study was to clarify the role of BTLA in the language network in an epileptic patient who displayed jargonaphasia. Intracerebral evoked related potentials to verbal and non-verbal stimuli in auditory and visual modalities were recorded from BTLA. Time-frequency analysis was performed during ictal events. Evoked potentials and induced gamma-band activity provided direct evidence that BTLA is sensitive to language stimuli in both modalities, 350 ms after stimulation. In addition, spontaneous gamma-band discharges were recorded from this region during which we observed phonological jargon. The findings emphasize the multimodal nature of this region in speech perception. In the context of transient dysfunction, the patient's lexical semantic processing network is disrupted, reducing spoken output to meaningless phoneme combinations. This rare opportunity to study the BTLA "in vivo" demonstrates its pivotal role in lexico-semantic processing for speech production and its multimodal nature in speech perception.
Teaching where We Are: Place-Based Language Arts
ERIC Educational Resources Information Center
Lundahl, Merrilyne
2011-01-01
This article discusses building ecoliteracy through place-based education (PBE) within English language arts: some ideas of what PBE is, why it's important, and examples of how it might be applied. The author contends that observing nature and creating personal metaphors from the natural world can help students develop keener writing skills and…
The Complex Nature of Bilinguals' Language Usage Modulates Task-Switching Outcomes
Yang, Hwajin; Hartanto, Andree; Yang, Sujin
2016-01-01
In view of inconsistent findings regarding bilingual advantages in executive functions (EF), we reviewed the literature to determine whether bilinguals' different language usage causes measurable changes in the shifting aspects of EF. By drawing on the theoretical framework of the adaptive control hypothesis, which postulates a critical link between bilinguals' varying demands on language control and adaptive cognitive control (Green and Abutalebi, 2013), we examined three factors that characterize bilinguals' language-switching experience: (a) the interactional context of conversational exchanges, (b) frequency of language switching, and (c) typology of code-switching. We also examined whether methodological variations in previous task-switching studies modulate task-specific demands on control processing and lead to inconsistencies in the literature. Our review demonstrates that not only methodological rigor but also a more finely grained, theory-based approach will be required to understand the cognitive consequences of bilinguals' varied linguistic practices in shifting EF. PMID:27199800
"Thinking-for-Writing": A Prolegomenon on Writing Signed Languages.
Rosen, Russell S; Hartman, Maria C; Wang, Ye
2017-01-01
In his article in this American Annals of the Deaf special issue that also includes the present article, Grushkin argues that the writing difficulties of many deaf and hard of hearing children result primarily from the orthographic nature of the writing system; he proposes a new system based on features found in signed languages. In response, the present authors review the literature on D/HH children's writing difficulties, outline the main precepts of and assumptions about writing signed languages, discuss "thinking-for-writing" as a process in developing writing skills, offer research designs to test the effectiveness of writing signed language systems, and provide strategies for adopting "thinking-for-writing" in education. They conclude that until empirical studies show that writing signed languages effectively reflects writers' "thinking-for-writing," the alphabetic orthographic system of English should still be used, and ways should be found to teach D/HH children to use English writing to express their thoughts.
From emblems to diagrams: Kepler's new pictorial language of scientific representation.
Chen-Morris, Raz
2009-01-01
Kepler's treatise on optics of 1604 furnished, along with technical solutions to problems in medieval perspective, a mathematically-based visual language for the observation of nature. This language, based on Kepler's theory of retinal pictures, ascribed a new role to geometrical diagrams. This paper examines Kepler's pictorial language against the backdrop of alchemical emblems that flourished in and around the court of Rudolf II in Prague. It highlights the cultural context in which Kepler's optics was immersed, and the way in which Kepler attempted to demarcate his new science from other modes of the investigation of nature.
Caregiver communication to the child as moderator and mediator of genes for language.
Onnis, Luca
2017-05-15
Human language appears to be unique among natural communication systems, and such uniqueness impinges on both nature and nurture. Human babies are endowed with cognitive abilities that predispose them to learn language, and this process cannot operate in an impoverished environment. To be effectively completed, the acquisition of human language in human children requires highly socialised forms of learning, scaffolded over years of prolonged and intense caretaker-child interactions. How genes and environment operate in shaping language is unknown. These two components have traditionally been considered as independent, and often pitted against each other in terms of the nature versus nurture debate. This perspective article considers how innate abilities and experience might instead work together. In particular, it envisages potential scenarios for research, in which early caregiver verbal and non-verbal attachment practices may mediate or moderate the expression of human genetic systems for language.
Roch, Alexandra M; Mehrabi, Saeed; Krishnan, Anand; Schmidt, Heidi E; Kesterson, Joseph; Beesley, Chris; Dexter, Paul R; Palakal, Mathew; Schmidt, C Max
2015-01-01
Introduction As many as 3% of computed tomography (CT) scans detect pancreatic cysts. Because pancreatic cysts are incidental, ubiquitous and poorly understood, follow-up is often not performed. Pancreatic cysts may have a significant malignant potential and their identification represents a ‘window of opportunity’ for the early detection of pancreatic cancer. The purpose of this study was to implement an automated Natural Language Processing (NLP)-based pancreatic cyst identification system. Method A multidisciplinary team was assembled. NLP-based identification algorithms were developed based on key words commonly used by physicians to describe pancreatic cysts and programmed for automated search of electronic medical records. A pilot study was conducted prospectively in a single institution. Results From March to September 2013, 566 233 reports belonging to 50 669 patients were analysed. The mean number of patients reported with a pancreatic cyst was 88/month (range 78–98). The mean sensitivity and specificity were 99.9% and 98.8%, respectively. Conclusion NLP is an effective tool to automatically identify patients with pancreatic cysts based on electronic medical records (EMR). This highly accurate system can help capture patients ‘at-risk’ of pancreatic cancer in a registry. PMID:25537257
The Naturalization Process in New Mexico. A Guide for ESL Teachers and Advocates.
ERIC Educational Resources Information Center
Irvine, Patricia, Ed.; And Others
This guide provides an overview of the naturalization process and what it means to Hispanic immigrants, describes techniques for integrating English-as-a-Second-Language (ESL) and civics/history content in multilevel classes, offers directions for filling out the naturalization forms and completing the legal steps to naturalization, and provides…
ERIC Educational Resources Information Center
LeBlanc, Linda A.; Geiger, Kaneen B.; Sautter, Rachael A.; Sidener, Tina M.
2007-01-01
The Natural Language Paradigm (NLP) has proven effective in increasing spontaneous verbalizations for children with autism. This study investigated the use of NLP with older adults with cognitive impairments served at a leisure-based adult day program for seniors. Three individuals with limited spontaneous use of functional language participated…
Evolution: Language Use and the Evolution of Languages
NASA Astrophysics Data System (ADS)
Croft, William
Language change can be understood as an evolutionary process. Language change occurs at two different timescales, corresponding to the two steps of the evolutionary process. The first timescale is very short, namely, the production of an utterance: this is where linguistic structures are replicated and language variation is generated. The second timescale is (or can be) very long, namely, the propagation of linguistic variants in the speech community: this is where certain variants are selected over others. At both timescales, the evolutionary process is driven by social interaction and the role language plays in it. An understanding of social interaction at the micro-level—face-to-face interactions—and at the macro-level—the structure of speech communities—gives us the basis for understanding the generation and propagation of language structures, and understanding the nature of language itself.
Building a common pipeline for rule-based document classification.
Patterson, Olga V; Ginter, Thomas; DuVall, Scott L
2013-01-01
Instance-based classification of clinical text is a widely used natural language processing task employed as a step for patient classification, document retrieval, or information extraction. Rule-based approaches rely on concept identification and context analysis in order to determine the appropriate class. We propose a five-step process that enables even small research teams to develop simple but powerful rule-based NLP systems by taking advantage of a common UIMA AS based pipeline for classification. Our proposed methodology coupled with the general-purpose solution provides researchers with access to the data locked in clinical text in cases of limited human resources and compact timelines.
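The two ingredients this abstract names, concept identification and context analysis, can be illustrated with a minimal sketch. This is not the authors' UIMA AS pipeline or their five-step process; it is a plain-Python toy in which the concept pattern, the negation cues, and the names `CONCEPT`, `NEGATION_BEFORE`, and `classify_document` are all assumptions chosen for illustration.

```python
import re

# Concept identification (assumed toy rule): mentions of fever.
CONCEPT = re.compile(r"\bfever\b|\bfebrile\b", re.IGNORECASE)

# Context analysis (assumed toy rule): a negation cue shortly before the
# concept, within the same sentence, cancels the mention.
NEGATION_BEFORE = re.compile(r"\b(no|denies|without)\b[^.]{0,40}$", re.IGNORECASE)


def classify_document(text):
    """Return 'positive' if any non-negated concept mention is found."""
    # Split into sentences so that negation does not leak across them.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for match in CONCEPT.finditer(sentence):
            preceding = sentence[: match.start()]
            if not NEGATION_BEFORE.search(preceding):
                return "positive"
    return "negative"


print(classify_document("Patient denies fever. Cough present."))
print(classify_document("Fever since Monday."))
```

A document-level class then follows from the mention-level decisions, which is why rule-based systems of this kind depend on both the concept rules and the context rules being accurate.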
Artificial Intelligence and CALL.
ERIC Educational Resources Information Center
Underwood, John H.
The potential application of artificial intelligence (AI) to computer-assisted language learning (CALL) is explored. Two areas of AI that hold particular interest to those who deal with language meaning--knowledge representation and expert systems, and natural-language processing--are described and examples of each are presented. AI contribution…
What's So Hard about Understanding Language?
ERIC Educational Resources Information Center
Read, Walter; And Others
A discussion of the application of artificial intelligence to natural language processing looks at several problems in language comprehension, involving semantic ambiguity, anaphoric reference, and metonymy. Examples of these problems are cited, and the importance of the computational approach in analyzing them is explained. The approach applies…
Knowledge-Driven Event Extraction in Russian: Corpus-Based Linguistic Resources
Solovyev, Valery; Ivanov, Vladimir
2016-01-01
Automatic event extraction from text is an important step in knowledge acquisition and knowledge base population. Manual work in the development of an extraction system is indispensable, either in corpus annotation or in the creation of vocabularies and patterns for a knowledge-based system. Recent work has focused on the adaptation of existing systems (for extraction from English texts) to new domains. Event extraction in other languages has not been studied, owing to the lack of resources and algorithms necessary for natural language processing. In this paper we define a set of linguistic resources that are necessary in the development of a knowledge-based event extraction system in Russian: a vocabulary of subordination models, a vocabulary of event triggers, and a vocabulary of Frame Elements that are the basic building blocks for semantic patterns. We propose a set of methods for the creation of such vocabularies in Russian and other languages using the Google Books NGram Corpus. The methods are evaluated in the development of an event extraction system for Russian. PMID:26955386
Bengali-English Relevant Cross Lingual Information Access Using Finite Automata
NASA Astrophysics Data System (ADS)
Banerjee, Avishek; Bhattacharyya, Swapan; Hazra, Simanta; Mondal, Shatabdi
2010-10-01
CLIR techniques search unrestricted texts and typically extract terms and relationships from bilingual electronic dictionaries or bilingual text collections, using them to translate query and/or document representations into a compatible set of representations with a common feature set. In this paper, we focus on a dictionary-based approach that combines a bilingual data dictionary with statistics-based methods to avoid the problem of ambiguity; the paper also addresses the development of the human-computer interface aspects of NLP (Natural Language Processing). Intelligent web search in a regional language such as Bengali depends on two major aspects: CLIA (cross-language information access) and NLP. In our previous work with IIT, KGP we developed content-based CLIA, in which content-based searching is trained on Bengali corpora with the help of a Bengali data dictionary. Here we introduce intelligent search that recognizes the sense of a sentence and offers a more realistic approach to human-computer interaction.
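One common way to pair a bilingual dictionary with a statistical signal for disambiguation is to choose, among the candidate translations of each query term, the combination whose members co-occur most often in a target-language corpus. The sketch below illustrates that general idea only; the abstract does not describe the authors' actual method, and the function name `translate_query`, the transliterated lexicon entries, and the co-occurrence counts are toy assumptions.

```python
from itertools import product


def translate_query(query_terms, lexicon, cooccurrence):
    """Pick the translation combination with the highest total co-occurrence."""
    # Terms missing from the lexicon are passed through untranslated.
    candidates = [lexicon.get(term, [term]) for term in query_terms]

    def score(combo):
        # Sum target-corpus co-occurrence counts over all pairs of chosen words.
        return sum(
            cooccurrence.get(frozenset((a, b)), 0)
            for i, a in enumerate(combo)
            for b in combo[i + 1:]
        )

    return list(max(product(*candidates), key=score))


# Toy bilingual lexicon (assumed): one unambiguous and one ambiguous entry.
toy_lexicon = {"nodi": ["river"], "tir": ["arrow", "bank"]}
# Toy target-language co-occurrence counts (assumed).
toy_counts = {
    frozenset(("river", "bank")): 12,
    frozenset(("river", "arrow")): 1,
}

print(translate_query(["nodi", "tir"], toy_lexicon, toy_counts))
```

Enumerating every combination is only practical for short queries; for longer queries a greedy or beam-limited search over candidates would be the usual substitute.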
Benchmarking natural-language parsers for biological applications using dependency graphs.
Clegg, Andrew B; Shepherd, Adrian J
2007-01-25
Interest is growing in the application of syntactic parsers to natural language processing problems in biology, but assessing their performance is difficult because differences in linguistic convention can falsely appear to be errors. We present a method for evaluating their accuracy using an intermediate representation based on dependency graphs, in which the semantic relationships important in most information extraction tasks are closer to the surface. We also demonstrate how this method can be easily tailored to various application-driven criteria. Using the GENIA corpus as a gold standard, we tested four open-source parsers which have been used in bioinformatics projects. We first present overall performance measures, and test the two leading tools, the Charniak-Lease and Bikel parsers, on subtasks tailored to reflect the requirements of a system for extracting gene expression relationships. These two tools clearly outperform the other parsers in the evaluation, and achieve accuracy levels comparable to or exceeding native dependency parsers on similar tasks in previous biological evaluations. Evaluating using dependency graphs allows parsers to be tested easily on criteria chosen according to the semantics of particular biological applications, drawing attention to important mistakes and soaking up many insignificant differences that would otherwise be reported as errors. Generating high-accuracy dependency graphs from the output of phrase-structure parsers also provides access to the more detailed syntax trees that are used in several natural-language processing techniques.
Benchmarking natural-language parsers for biological applications using dependency graphs
Clegg, Andrew B; Shepherd, Adrian J
2007-01-01
Background Interest is growing in the application of syntactic parsers to natural language processing problems in biology, but assessing their performance is difficult because differences in linguistic convention can falsely appear to be errors. We present a method for evaluating their accuracy using an intermediate representation based on dependency graphs, in which the semantic relationships important in most information extraction tasks are closer to the surface. We also demonstrate how this method can be easily tailored to various application-driven criteria. Results Using the GENIA corpus as a gold standard, we tested four open-source parsers which have been used in bioinformatics projects. We first present overall performance measures, and test the two leading tools, the Charniak-Lease and Bikel parsers, on subtasks tailored to reflect the requirements of a system for extracting gene expression relationships. These two tools clearly outperform the other parsers in the evaluation, and achieve accuracy levels comparable to or exceeding native dependency parsers on similar tasks in previous biological evaluations. Conclusion Evaluating using dependency graphs allows parsers to be tested easily on criteria chosen according to the semantics of particular biological applications, drawing attention to important mistakes and soaking up many insignificant differences that would otherwise be reported as errors. Generating high-accuracy dependency graphs from the output of phrase-structure parsers also provides access to the more detailed syntax trees that are used in several natural-language processing techniques. PMID:17254351
The State-of-the-Art in Natural Language Understanding.
1981-01-28
driven text analysis. If we know a story is about a restaurant, we expect that we may encounter a waitress, menu, table, a bill, food, and other... Front ends for Data Bases During the 70's a number of natural language data base front ends appeared: LUNAR [Woods et al 1972] has already been briefly...like to look at in... handling of novel language, especially metaphor; ... understanding systems: the handling of
ERIC Educational Resources Information Center
Weeber, Marc; Klein, Henny; de Jong-van den Berg, Lolkje T. W.; Vos, Rein
2001-01-01
Proposes a two-step model of discovery in which new scientific hypotheses can be generated and subsequently tested. Applying advanced natural language processing techniques to find biomedical concepts in text, the model is implemented in a versatile interactive discovery support tool. This tool is used to successfully simulate Don R. Swanson's…
Task Proficiency and L1 Private Speech
ERIC Educational Resources Information Center
Yamada, Minako
2005-01-01
There is a growing volume of research on task-based language use; however, the nature of "task proficiency" has not yet been clearly defined. In order to gain new insights, this study examines the relationship between the process of communication in an L2 and a task outcome by analysing lexical density, as obtained from the pattern of a…
A Model Based Framework for Semantic Interpretation of Architectural Construction Drawings
ERIC Educational Resources Information Center
Babalola, Olubi Oluyomi
2011-01-01
The study addresses the automated translation of architectural drawings from 2D Computer Aided Drafting (CAD) data into a Building Information Model (BIM), with emphasis on the nature, possible role, and limitations of a drafting language Knowledge Representation (KR) on the problem and process. The central idea is that CAD to BIM translation is a…
Systematic Model-in-the-Loop Test of Embedded Control Systems
NASA Astrophysics Data System (ADS)
Krupp, Alexander; Müller, Wolfgang
Current model-based development processes offer new opportunities for verification automation, e.g., in automotive development. The duty of functional verification is the detection of design flaws. Current functional verification approaches exhibit a major gap between requirement definition and formal property definition, especially when analog signals are involved. Besides the lack of methodical support for natural language formalization, there is no standardized and accepted means for formal property definition as a target for verification planning. This article addresses several shortcomings of embedded system verification. An Enhanced Classification Tree Method is developed based on the established Classification Tree Method for Embedded Systems (CTM/ES), which applies a hardware verification language to define a verification environment.
Keijzer, Merel; de Bot, Kees
2018-01-01
Cognitive advantages for bilinguals have inconsistently been observed in different populations, with different operationalisations of bilingualism, cognitive performance, and the process by which language control transfers to cognitive control. This calls for studies investigating which aspects of multilingualism drive a cognitive advantage, in which populations and under which conditions. This study reports on two cognitive tasks coupled with an extensive background questionnaire on health, wellbeing, personality, language knowledge and language use, administered to 387 older adults in the northern Netherlands, a small but highly multilingual area. Using linear mixed effects regression modeling, we find that when different languages are used frequently in different contexts, enhanced attentional control is observed. Subsequently, a PLS regression model targeting also other influential factors yielded a two-component solution whereby only more sensitive measures of language proficiency and language usage in different social contexts were predictive of cognitive performance above and beyond the contribution of age, gender, income and education. We discuss these findings in light of previous studies that try to uncover more about the nature of bilingualism and the cognitive processes that may drive an advantage. With an unusually large sample size our study advocates for a move away from dichotomous, knowledge-based operationalisations of multilingualism and offers new insights for future studies at the individual level. PMID:29783764
Learning a Foreign Language through the Media. CLCS Occasional Paper No. 18.
ERIC Educational Resources Information Center
Devitt, Sean M.
The use of mass media as a means of learning a foreign language from the beginning of language study is discussed. Using the media enables many of the features of the natural language acquisition process to be brought into play in a way that much current language teaching material does not. This position is supported by recent research into the…
ERIC Educational Resources Information Center
Bilgin, Sezen Seymen
2016-01-01
Code switching involves the interplay of two languages and as well as serving linguistic functions, it has social and psychological implications. In the context of English language teaching, these psychological implications reveal themselves as teachers' thought processes. While the nature of code switching in language classrooms has been widely…
Using natural language processing to analyze physician modifications to data entry templates.
Wilcox, Adam B.; Narus, Scott P.; Bowes, Watson A.
2002-01-01
Efficient data entry by clinicians remains a significant challenge for electronic medical records. Current approaches have largely focused on either structured data entry, which can be limiting in expressive power, or free-text entry, which restricts the use of the data for automated decision support. Text-based templates are a semi-structured data entry method that has been used to assist physicians in manually entering clinical notes, by allowing them to edit predefined example notes. We analyzed changes made to 18,726 sentences from text templates, using a natural language processor. The most common changes were addition or deletion of normal observations, or changes in certainty. We identified common modifications that could be captured in structured form by a graphical user interface. PMID:12463955
Natural Language Processing as a Discipline at LLNL
DOE Office of Scientific and Technical Information (OSTI.GOV)
Firpo, M A
The field of Natural Language Processing (NLP) is described as it applies to the needs of LLNL in handling free-text. The state of the practice is outlined with the emphasis placed on two specific aspects of NLP: Information Extraction and Discourse Integration. A brief description is included of the NLP applications currently being used at LLNL. A gap analysis provides a look at where the technology needs work in order to meet the needs of LLNL. Finally, recommendations are made to meet these needs.
The Relevance of Second Language Acquisition Theory to the Written Error Correction Debate
ERIC Educational Resources Information Center
Polio, Charlene
2012-01-01
The controversies surrounding written error correction can be traced to Truscott (1996) in his polemic against written error correction. He claimed that empirical studies showed that error correction was ineffective and that this was to be expected "given the nature of the correction process and the nature of language learning" (p. 328, emphasis…
Understanding and representing natural language meaning
NASA Astrophysics Data System (ADS)
Waltz, D. L.; Maran, L. R.; Dorfman, M. H.; Dinitz, R.; Farwell, D.
1982-12-01
During this contract period the authors have: (1) continued investigation of events and actions by means of representation schemes called 'event shape diagrams'; (2) written a parsing program which selects appropriate word and sentence meanings by a parallel process known as activation and inhibition; (3) begun investigation of the point of a story or event by modeling the motivations and emotional behaviors of story characters; (4) started work on combining and translating two machine-readable dictionaries into a lexicon and knowledge base which will form an integral part of our natural language understanding programs; (5) made substantial progress toward a general model for the representation of cognitive relations by comparing English scene and event descriptions with similar descriptions in other languages; (6) constructed a general model for the representation of tense and aspect of verbs; (7) made progress toward the design of an integrated robotics system which accepts English requests, and uses visual and tactile inputs in making decisions and learning new tasks.
Flexible Processing and the Design of Grammar
ERIC Educational Resources Information Center
Sag, Ivan A.; Wasow, Thomas
2015-01-01
We explore the consequences of letting the incremental and integrative nature of language processing inform the design of competence grammar. What emerges is a view of grammar as a system of local monotonic constraints that provide a direct characterization of the signs (the form-meaning correspondences) of a given language. This…
Vivona, Jeanine M
2013-12-01
Like psychoanalysis, poetry is possible because of the nature of verbal language, particularly its potentials to evoke the sensations of lived experience. These potentials are vestiges of the personal relational context in which language is learned, without which there would be no poetry and no psychoanalysis. Such a view of language infuses psychoanalytic writings on poetry, yet has not been fully elaborated. To further that elaboration, a poem by Billy Collins is presented to illustrate the sensorial and imagistic potentials of words, after which the interpersonal processes of language development are explored in an attempt to elucidate the original nature of words as imbued with personal meaning, embodied resonance, and emotion. This view of language and the verbal form allows a fuller understanding of the therapeutic processes of speech and conversation at the heart of psychoanalysis, including the relational potentials of speech between present individuals, which are beyond the reach of poetry. In one sense, the work of the analyst is to create language that mobilizes the experiential, memorial, and relational potentials of words, and in so doing to make a poet out of the patient so that she too can create such language.
Carlson, Laura; Skubic, Marjorie; Miller, Jared; Huo, Zhiyu; Alexenko, Tatiana
2014-07-01
This contribution presents a corpus of spatial descriptions and describes the development of a human-driven spatial language robot system for their comprehension. The domain of application is an eldercare setting in which an assistive robot is asked to "fetch" an object for an elderly resident based on a natural language spatial description given by the resident. In Part One, we describe a corpus of naturally occurring descriptions elicited from a group of older adults within a virtual 3D home that simulates the eldercare setting. We contrast descriptions elicited when participants offered descriptions to a human versus robot avatar, and under instructions to tell the addressee how to find the target versus where the target is. We summarize the key features of the spatial descriptions, including their dynamic versus static nature and the perspective adopted by the speaker. In Part Two, we discuss critical cognitive and perceptual processing capabilities necessary for the robot to establish a common ground with the human user and perform the "fetch" task. Based on the collected corpus, we focus here on resolving the perspective ambiguity and recognizing furniture items used as landmarks in the descriptions. Taken together, the work presented here offers the key building blocks of a robust system that takes as input natural spatial language descriptions and produces commands that drive the robot to successfully fetch objects within our eldercare scenario. Copyright © 2014 Cognitive Science Society, Inc.
Sevenster, M; Buurman, J; Liu, P; Peters, J F; Chang, P J
2015-01-01
Accumulating quantitative outcome parameters may contribute to constructing a healthcare organization in which outcomes of clinical procedures are reproducible and predictable. In imaging studies, measurements are the principal category of quantitative parameters. The purpose of this work is to develop and evaluate two natural language processing engines that extract finding and organ measurements from narrative radiology reports and to categorize extracted measurements by their "temporality". The measurement extraction engine is developed as a set of regular expressions. The engine was evaluated against a manually created ground truth. Automated categorization of measurement temporality is defined as a machine learning problem. A ground truth was manually developed based on a corpus of radiology reports. A maximum entropy model was created using features that characterize the measurement itself and its narrative context. The model was evaluated in a ten-fold cross validation protocol. The measurement extraction engine has precision 0.994 and recall 0.991. Accuracy of the measurement classification engine is 0.960. The work contributes to machine understanding of radiology reports and may find application in software applications that process medical data.
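To make the regular-expression idea above concrete, the following is a minimal sketch in Python. It is not the engine described in the abstract: the pattern, the supported units (mm, cm), and the function name `extract_measurements` are all assumptions chosen for illustration, and a production engine would need many more patterns.

```python
import re

# Hypothetical pattern (not from the paper): one to three numbers
# separated by "x", followed by a length unit.
MEASUREMENT = re.compile(
    r"(?P<dims>\d+(?:\.\d+)?(?:\s*x\s*\d+(?:\.\d+)?){0,2})\s*(?P<unit>mm|cm)\b",
    re.IGNORECASE,
)


def extract_measurements(text):
    """Return a list of (dimensions, unit) tuples found in a report sentence."""
    results = []
    for match in MEASUREMENT.finditer(text):
        # Split "2.3 x 1.5" into its numeric dimensions.
        parts = re.split(r"\s*x\s*", match.group("dims"), flags=re.IGNORECASE)
        results.append(([float(p) for p in parts], match.group("unit").lower()))
    return results


report = "Liver lesion measures 2.3 x 1.5 cm, previously 18 mm."
print(extract_measurements(report))
# [([2.3, 1.5], 'cm'), ([18.0], 'mm')]
```

Classifying each extracted measurement by temporality (for example, current versus prior) would be a separate step that uses the surrounding words as features, as the abstract describes.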
NASA Technical Reports Server (NTRS)
Gill, E. N.
1986-01-01
The requirements are identified for a very high order natural language to be used by crew members on board the Space Station. The hardware facilities, databases, realtime processes, and software support are discussed. The operations and capabilities that will be required in both normal (routine) and abnormal (nonroutine) situations are evaluated. A structure and syntax for an interface (front-end) language to satisfy the above requirements are recommended.
NASA Technical Reports Server (NTRS)
Havelund, Klaus; Smith, Margaret H.; Barringer, Howard; Groce, Alex
2012-01-01
LogScope is a software package for analyzing log files. The intended use is for offline post-processing of such logs, after the execution of the system under test. LogScope can, however, in principle, also be used to monitor systems online during their execution. Logs are checked against requirements formulated as monitors expressed in a rule-based specification language. This language has similarities to a state machine language, but is more expressive, for example, in its handling of data parameters. The specification language is user friendly, simple, and yet expressive enough for many practical scenarios. The LogScope software was initially developed to specifically assist in testing JPL's Mars Science Laboratory (MSL) flight software, but it is very generic in nature and can be applied to any application that produces some form of logging information (which almost any software does).
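The idea of checking a log against a monitor that carries data parameters can be sketched as follows. This is plain Python, not LogScope's specification language, and the event vocabulary (`COMMAND`, `SUCCESS`), the field names, and the function name are hypothetical. The monitor encodes one requirement: every dispatched command must later be reported as successful.

```python
def check_commands_succeed(log):
    """Return the names of commands that were dispatched but never succeeded.

    `log` is a list of event dicts such as {"type": "COMMAND", "name": "PICT"}.
    The event types and fields are illustrative assumptions.
    """
    pending = set()  # monitor state: commands still awaiting a SUCCESS event
    for event in log:
        if event["type"] == "COMMAND":
            pending.add(event["name"])
        elif event["type"] == "SUCCESS":
            pending.discard(event["name"])
    # Anything still pending at the end of the log violates the requirement.
    return sorted(pending)


log = [
    {"type": "COMMAND", "name": "PICT"},
    {"type": "COMMAND", "name": "DRIVE"},
    {"type": "SUCCESS", "name": "PICT"},
]
print(check_commands_succeed(log))  # ['DRIVE']
```

The `name` field plays the role of a data parameter: the monitor tracks each command instance separately rather than only counting events, which is the kind of expressiveness a plain state machine lacks.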
Social media based NPL system to find and retrieve ARM data: Concept paper
DOE Office of Scientific and Technical Information (OSTI.GOV)
Devarakonda, Ranjeet; Giansiracusa, Michael T.; Kumar, Jitendra
Information connectivity and retrieval has a role in our daily lives. The most pervasive source of online information is databases. The amount of data is growing at a rapid rate, and database technology is improving and having a profound effect. Almost all online applications store and retrieve information from databases. One challenge in supplying the public with wider access to informational databases is the need for knowledge of database languages like Structured Query Language (SQL). Although the SQL language has been published in many forms, not everybody is able to write SQL queries. Another challenge is that it may not be practical to make the public aware of the structure of the database. There is a need for novice users to query relational databases using their natural language. To solve this problem, many natural language interfaces to structured databases have been developed. The goal is to provide a more intuitive method for generating database queries and delivering responses. Social media makes it possible to interact with a wide section of the population. Through this medium, and with the help of Natural Language Processing (NLP), we can make the data of the Atmospheric Radiation Measurement Data Center (ADC) more accessible to the public. We propose an architecture for using Apache Lucene/Solr [1], OpenML [2,3], and Kafka [4] to generate an automated query/response system with inputs from Twitter [5], our Cassandra DB, and our log database. Using the Twitter API and NLP we can give the public the ability to ask questions of our database and get automated responses.
Can, Doğan; Marín, Rebeca A.; Georgiou, Panayiotis G.; Imel, Zac E.; Atkins, David C.; Narayanan, Shrikanth S.
2016-01-01
The dissemination and evaluation of evidence based behavioral treatments for substance abuse problems rely on the evaluation of counselor interventions. In Motivational Interviewing (MI), a treatment that directs the therapist to utilize a particular linguistic style, proficiency is assessed via behavioral coding - a time consuming, non-technological approach. Natural language processing techniques have the potential to scale up the evaluation of behavioral treatments like MI. We present a novel computational approach to assessing components of MI, focusing on one specific counselor behavior – reflections – that are believed to be a critical MI ingredient. Using 57 sessions from 3 MI clinical trials, we automatically detected counselor reflections in a Maximum Entropy Markov Modeling framework using the raw linguistic data derived from session transcripts. We achieved 93% recall, 90% specificity, and 73% precision. Results provide insight into the linguistic information used by coders to make ratings and demonstrate the feasibility of new computational approaches to scaling up the evaluation of behavioral treatments. PMID:26784286
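The three figures reported above measure different things, and a short sketch may help show how high recall and specificity can coexist with lower precision when the target behavior is relatively rare. The counts below are hypothetical and are not taken from the study; only the metric definitions are standard.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute recall, specificity, and precision from confusion-matrix counts."""
    return {
        "recall": tp / (tp + fn),        # share of true reflections detected
        "specificity": tn / (tn + fp),   # share of non-reflections correctly rejected
        "precision": tp / (tp + fp),     # share of detections that are reflections
    }


# Hypothetical counts: 100 reflections among 400 counselor utterances.
metrics = detection_metrics(tp=90, fp=30, tn=270, fn=10)
print(metrics)
```

With these made-up counts, recall and specificity are both 0.9 while precision is 0.75: because non-reflections outnumber reflections three to one, even a 10% false-positive rate produces enough false alarms to pull precision down.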
Hassanpour, Saeed; Bay, Graham; Langlotz, Curtis P
2017-06-01
We built a natural language processing (NLP) method to automatically extract clinical findings in radiology reports and characterize their level of change and significance according to a radiology-specific information model. We utilized a combination of machine learning and rule-based approaches for this purpose. Our method is unique in capturing different features and levels of abstractions at surface, entity, and discourse levels in text analysis. This combination has enabled us to recognize the underlying semantics of radiology report narratives for this task. We evaluated our method on radiology reports from four major healthcare organizations. Our evaluation showed the efficacy of our method in highlighting important changes (accuracy 99.2%, precision 96.3%, recall 93.5%, and F1 score 94.7%) and identifying significant observations (accuracy 75.8%, precision 75.2%, recall 75.7%, and F1 score 75.3%) to characterize radiology reports. This method can help clinicians quickly understand the key observations in radiology reports and facilitate clinical decision support, review prioritization, and disease surveillance.
Performance of a Lexical and POS Tagger for Sanskrit
NASA Astrophysics Data System (ADS)
Hellwig, Oliver
Due to the phonetic, morphological, and lexical complexity of Sanskrit, the automatic analysis of this language is a real challenge in the area of natural language processing. The paper describes a series of tests that were performed to assess the accuracy of the tagging program SanskritTagger. To our knowledge, it offers the first reliable benchmark data for evaluating the quality of taggers for Sanskrit using an unrestricted dictionary and texts from different domains. Based on a detailed analysis of the test results, the paper points out possible directions for future improvements of statistical tagging procedures for Sanskrit.
A common type system for clinical natural language processing
2013-01-01
Background: One challenge in reusing clinical data stored in electronic medical records is that these data are heterogeneous. Clinical Natural Language Processing (NLP) plays an important role in transforming information in clinical text to a standard representation that is comparable and interoperable. Information may be processed and shared when a type system specifies the allowable data structures. Therefore, we aim to define a common type system for clinical NLP that enables interoperability between structured and unstructured data generated in different clinical settings. Results: We describe a common type system for clinical NLP that has an end target of deep semantics based on Clinical Element Models (CEMs), thus interoperating with structured data and accommodating diverse NLP approaches. The type system has been implemented in UIMA (Unstructured Information Management Architecture) and is fully functional in a popular open-source clinical NLP system, cTAKES (clinical Text Analysis and Knowledge Extraction System) versions 2.0 and later. Conclusions: We have created a type system that targets deep semantics, thereby allowing for NLP systems to encapsulate knowledge from text and share it alongside heterogeneous clinical data sources. Rather than surface semantics that are typically the end product of NLP algorithms, CEM-based semantics explicitly build in deep clinical semantics as the point of interoperability with more structured data types. PMID:23286462
A common type system for clinical natural language processing.
Wu, Stephen T; Kaggal, Vinod C; Dligach, Dmitriy; Masanz, James J; Chen, Pei; Becker, Lee; Chapman, Wendy W; Savova, Guergana K; Liu, Hongfang; Chute, Christopher G
2013-01-03
One challenge in reusing clinical data stored in electronic medical records is that these data are heterogeneous. Clinical Natural Language Processing (NLP) plays an important role in transforming information in clinical text to a standard representation that is comparable and interoperable. Information may be processed and shared when a type system specifies the allowable data structures. Therefore, we aim to define a common type system for clinical NLP that enables interoperability between structured and unstructured data generated in different clinical settings. We describe a common type system for clinical NLP that has an end target of deep semantics based on Clinical Element Models (CEMs), thus interoperating with structured data and accommodating diverse NLP approaches. The type system has been implemented in UIMA (Unstructured Information Management Architecture) and is fully functional in a popular open-source clinical NLP system, cTAKES (clinical Text Analysis and Knowledge Extraction System) versions 2.0 and later. We have created a type system that targets deep semantics, thereby allowing for NLP systems to encapsulate knowledge from text and share it alongside heterogeneous clinical data sources. Rather than surface semantics that are typically the end product of NLP algorithms, CEM-based semantics explicitly build in deep clinical semantics as the point of interoperability with more structured data types.
Gender, Identity and Intercultural Transformation in Second Language Socialisation
ERIC Educational Resources Information Center
Shi, Xingsong
2006-01-01
In L2 learners' second language socialisation process, males and females from different sociocultural backgrounds have diverse attitudes and access to second language acquisition. In this study, informed by feminist poststructuralist theory, we can see the highly context-sensitive nature of the gendered practices and the corresponding outcomes of…
Data-Informed Language Learning
ERIC Educational Resources Information Center
Godwin-Jones, Robert
2017-01-01
Although data collection has been used in language learning settings for some time, it is only in recent decades that large corpora have become available, along with efficient tools for their use. Advances in natural language processing (NLP) have enabled rich tagging and annotation of corpus data, essential for their effective use in language…
Redundancy and reduction: Speakers manage syntactic information density
Florian Jaeger, T.
2010-01-01
A principle of efficient language production based on information theoretic considerations is proposed: Uniform Information Density predicts that language production is affected by a preference to distribute information uniformly across the linguistic signal. This prediction is tested against data from syntactic reduction. A single multilevel logit model analysis of naturally distributed data from a corpus of spontaneous speech is used to assess the effect of information density on complementizer that-mentioning, while simultaneously evaluating the predictions of several influential alternative accounts: availability, ambiguity avoidance, and dependency processing accounts. Information density emerges as an important predictor of speakers’ preferences during production. As information is defined in terms of probabilities, it follows that production is probability-sensitive, in that speakers’ preferences are affected by the contextual probability of syntactic structures. The merits of a corpus-based approach to the study of language production are discussed as well. PMID:20434141
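The statement that information is defined in terms of probabilities can be made concrete with the standard information-theoretic measure: the information (surprisal) of an event with contextual probability p is -log2(p) bits. The sketch below only illustrates that definition; the probabilities are invented for the example and are not estimates from the corpus used in the study.

```python
import math


def surprisal_bits(probability):
    """Information content, in bits, of an event with the given probability."""
    return -math.log2(probability)


# Hypothetical contextual probabilities of a complement clause after two verbs.
print(surprisal_bits(0.5))    # 1.0 bit: the clause is expected in this context
print(surprisal_bits(0.125))  # 3.0 bits: the clause is unexpected in this context
```

Under Uniform Information Density, the less expected clause onset carries more information, so speakers would be predicted to be more likely to produce the optional "that" there, spreading the information over more words.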
Developing Formal Correctness Properties from Natural Language Requirements
NASA Technical Reports Server (NTRS)
Nikora, Allen P.
2006-01-01
This viewgraph presentation reviews the rationale of the program to transform natural language specifications into formal notation. Specifically, the aim is to automate the generation of Linear Temporal Logic (LTL) correctness properties from natural language temporal specifications. There are several reasons for this approach: (1) model-based techniques are becoming more widely accepted; (2) analytical verification techniques (e.g., model checking, theorem proving) are significantly more effective at detecting certain types of specification and design errors (e.g., race conditions, deadlock) than manual inspection; (3) many requirements are still written in natural language, while the high learning curve for specification languages and associated tools, together with increased schedule and budget pressure on projects, reduces training opportunities for engineers; and (4) formulation of correctness properties for system models can be a difficult problem. This has relevance to NASA in that it would simplify development of formal correctness properties, lead to more widespread use of model-based specification and design techniques, assist in earlier identification of defects, and reduce residual defect content for space mission software systems. The presentation also discusses potential applications, accomplishments and/or technological transfer potential, and the next steps.
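As a minimal illustration of the kind of translation involved, consider a temporal requirement such as "whenever a request is raised, a grant eventually follows". The requirement wording and the proposition names request and grant are assumptions made for this example and do not come from the presentation. In standard LTL notation, with G read as "always" and F as "eventually", it would typically be written as:

```latex
\[
  \mathbf{G}\,\bigl(\mathit{request} \rightarrow \mathbf{F}\,\mathit{grant}\bigr)
\]
```

A model checker can then verify this property against a system model, which is where race conditions and deadlocks of the kind mentioned above tend to surface.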
Dynamic Approaches to Language Processing
ERIC Educational Resources Information Center
Srinivasan, Narayanan
2007-01-01
Symbolic rule-based approaches have been a preferred way to study language and cognition. Dissatisfaction with rule-based approaches in the 1980s led to alternative approaches to study language, the most notable being the dynamic approaches to language processing. Dynamic approaches provide a significant alternative by not being rule-based and…
van der Slik, Frans W. P.; van Hout, Roeland W. N. M.; Schepens, Job J.
2015-01-01
Gender differences were analyzed across countries of origin and continents, and across mother tongues and language families, using a large-scale database containing information on 27,119 adult learners of Dutch as a second language. Female learners consistently outperformed male learners in speaking and writing proficiency in Dutch as a second language. This gender gap remained remarkably robust and constant when other learner characteristics were taken into account, such as education, age of arrival, length of residence and hours studying Dutch. For reading and listening skills in Dutch, no gender gap was found. In addition, we found a general gender by education effect for all four language skills in Dutch: speaking, writing, reading, and listening. Female language learners turned out to profit more from higher educational training than male learners do in adult second language acquisition. These findings do not seem to match nurture-oriented explanatory frameworks based for instance on a human capital approach or gender-specific acculturation processes. Rather, they seem to corroborate a nature-based, gene-environment correlational framework in which language proficiency is a genetically influenced ability that interacts with environmental factors such as motivation, orientation, education, and learner strategies, which still mediate between endowment and the acquisition of language proficiency at an adult stage. PMID:26540465
Toledo, Cíntia Matsuda; Cunha, Andre; Scarton, Carolina; Aluísio, Sandra
2014-01-01
Discourse production is an important aspect in the evaluation of brain-injured individuals. We believe that studies comparing the performance of brain-injured subjects with that of healthy controls must use groups with compatible education. A pioneering application of machine learning methods using Brazilian Portuguese for clinical purposes is described, highlighting education as an important variable in the Brazilian scenario. Objective: The aims were to describe how to: (i) develop machine learning classifiers using features generated by natural language processing tools to distinguish descriptions produced by healthy individuals into classes based on their years of education; and (ii) automatically identify the features that best distinguish the groups. Methods: The approach proposed here extracts linguistic features automatically from the written descriptions with the aid of two Natural Language Processing tools: Coh-Metrix-Port and AIC. It also includes nine task-specific features (three new ones, two extracted manually, besides description time; type of scene described – simple or complex; presentation order – which type of picture was described first; and age). In this study, the descriptions by 144 of the subjects studied in Toledo [18] were used, which included 200 healthy Brazilians of both genders. Results and Conclusion: A Support Vector Machine (SVM) with a radial basis function (RBF) kernel is the most recommended approach for the binary classification of our data, classifying three of the four initial classes. CfsSubsetEval (CFS) is a strong candidate to replace manual feature selection methods. PMID:29213908
ERIC Educational Resources Information Center
Alexopoulou, Theodora; Michel, Marije; Murakami, Akira; Meurers, Detmar
2017-01-01
Large-scale learner corpora collected from online language learning platforms, such as the EF-Cambridge Open Language Database (EFCAMDAT), provide opportunities to analyze learner data at an unprecedented scale. However, interpreting the learner language in such corpora requires a precise understanding of tasks: How does the prompt and input of a…
ERIC Educational Resources Information Center
Khany, Reza; Amiri, Majid
2018-01-01
Theoretical developments in second or foreign language motivation research have led to a better understanding of the convoluted nature of motivation in the process of language acquisition. Among these theories, action control theory has recently shown a good deal of explanatory power in second language learning contexts and in the presence of…
Multilingual natural language generation as part of a medical terminology server.
Wagner, J C; Solomon, W D; Michel, P A; Juge, C; Baud, R H; Rector, A L; Scherrer, J R
1995-01-01
Re-usable and sharable, and therefore language-independent concept models are of increasing importance in the medical domain. The GALEN project (Generalized Architecture for Languages Encyclopedias and Nomenclatures in Medicine) aims at developing language-independent concept representation systems as the foundations for the next generation of multilingual coding systems. For use within clinical applications, the content of the model has to be mapped to natural language. A so-called Multilingual Information Module (MM) establishes the link between the language-independent concept model and different natural languages. This text generation software must be versatile enough to cope at the same time with different languages and with different parts of a compositional model. It has to meet, on the one hand, the properties of the language as used in the medical domain and, on the other hand, the specific characteristics of the underlying model and its representation formalism. We propose a semantic-oriented approach to natural language generation that is based on linguistic annotations to a concept model. This approach is realized as an integral part of a Terminology Server, built around the concept model and offering different terminological services for clinical applications.
Modeling Coevolution between Language and Memory Capacity during Language Origin
Gong, Tao; Shuai, Lan
2015-01-01
Memory is essential to many cognitive tasks including language. Apart from empirical studies of memory effects on language acquisition and use, there is a lack of evolutionary explorations of whether a high level of memory capacity is a prerequisite for language and whether language origin could influence memory capacity. In line with evolutionary theories that natural selection refined language-related cognitive abilities, we advocated a coevolution scenario between language and memory capacity, which incorporated the genetic transmission of individual memory capacity, cultural transmission of idiolects, and natural and cultural selections on individual reproduction and language teaching. To illustrate the coevolution dynamics, we adopted a multi-agent computational model simulating the emergence of lexical items and simple syntax through iterated communications. Simulations showed that, along with the origin of a communal language, an initially low memory capacity for acquired linguistic knowledge was boosted; that such a coherent increase in linguistic understandability and memory capacities reflected a language-memory coevolution; and that such coevolution stopped once memory capacities became sufficient for language communications. Statistical analyses revealed that the coevolution was realized mainly by natural selection based on individual communicative success in cultural transmissions. This work elaborated the biology-culture parallelism of language evolution, demonstrated the driving force of culturally-constituted factors for natural selection of individual cognitive abilities, and suggested that the degree difference in language-related cognitive abilities between humans and nonhuman animals could result from a coevolution with language. PMID:26544876
Intelligent Performance Analysis with a Natural Language Interface
NASA Astrophysics Data System (ADS)
Juuso, Esko K.
2017-09-01
Performance improvement is taken as the primary goal in asset management. Advanced data analysis is needed to efficiently integrate condition monitoring data into operation and maintenance. Intelligent stress and condition indices have been developed for control and condition monitoring by combining generalized norms with efficient nonlinear scaling. These nonlinear scaling methodologies can also be used to handle performance measures used for management, since management-oriented indicators can be presented on the same scale as intelligent condition and stress indices. Performance indicators are responses of the process, machine or system to the stress contributions analyzed from process and condition monitoring data. Scaled values are used directly in intelligent temporal analysis to calculate fluctuations and trends. All these methodologies can be used in prognostics and fatigue prediction. The meanings of the variables are beneficial in extracting expert knowledge and representing information in natural language. The idea of dividing the problems into variable-specific meanings and directions of interactions provides various improvements for performance monitoring and decision making. The integrated temporal analysis and uncertainty processing facilitate the efficient use of domain expertise. Measurements can be monitored with generalized statistical process control (GSPC) based on the same scaling functions.
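As a rough illustration of the "generalized norms" mentioned in this abstract, the sketch below assumes the term refers to a power mean of absolute signal values of order p; the abstract itself does not give the formula, so the function name, the formula, and the sample values are assumptions for illustration only.

```python
def generalized_norm(values, p):
    """Power mean of absolute values: (1/N * sum(|x_i|**p)) ** (1/p).

    Assumed reading of "generalized norm"; p = 1 gives the mean absolute
    value and p = 2 gives the root mean square.
    """
    if p == 0:
        raise ValueError("order p must be non-zero")
    if not values:
        raise ValueError("need at least one value")
    n = len(values)
    return (sum(abs(v) ** p for v in values) / n) ** (1.0 / p)


# Hypothetical vibration-like samples, not taken from the paper.
signal = [3.0, -4.0]
rms = generalized_norm(signal, 2)        # root mean square
mean_abs = generalized_norm(signal, 1)   # mean absolute value
```

Higher orders weight large excursions more heavily, which is one plausible reason such norms are useful as stress indices.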
A natural language interface plug-in for cooperative query answering in biological databases.
Jamil, Hasan M
2012-06-11
One of the many unique features of biological databases is that the mere existence of a ground data item is not always a precondition for a query response. It may be argued that from a biologist's standpoint, queries are not always best posed using a structured language. By this we mean that approximate and flexible responses to natural language like queries are well suited for this domain. This is partly due to biologists' tendency to seek simpler interfaces and partly due to the fact that questions in biology involve high level concepts that are open to interpretations computed using sophisticated tools. In such highly interpretive environments, rigidly structured databases do not always perform well. In this paper, our goal is to propose a semantic correspondence plug-in to aid natural language query processing over arbitrary biological database schema with an aim to providing cooperative responses to queries tailored to users' interpretations. Natural language interfaces for databases are generally effective when they are tuned to the underlying database schema and its semantics. Therefore, changes in database schema become impossible to support, or a substantial reorganization cost must be absorbed to reflect any change. We leverage developments in natural language parsing, rule languages and ontologies, and data integration technologies to assemble a prototype query processor that is able to transform a natural language query into a semantically equivalent structured query over the database. We allow knowledge rules and their frequent modifications as part of the underlying database schema. The approach we adopt in our plug-in overcomes some of the serious limitations of many contemporary natural language interfaces, including support for schema modifications and independence from underlying database schema. 
The plug-in introduced in this paper is generic and facilitates connecting user selected natural language interfaces to arbitrary databases using a semantic description of the intended application. We demonstrate the feasibility of our approach with a practical example.
Ahlberg, Daniela Katharina; Bischoff, Heike; Strozyk, Jessica Vanessa; Bryant, Doreen; Kaup, Barbara
2018-01-01
While much support is found for embodied language processing in a first language (L1), evidence for embodiment in second language (L2) processing is rather sparse. In a recent study, we found support for L2 embodiment, but also an influence of L1 on L2 processing in adult learners. In the present study, we compared bilingual schoolchildren who speak German as one of their languages with monolingual German schoolchildren. We presented the German prepositions auf (on), über (above), and unter (under) in a Stroop-like task. Upward or downward responses were made depending on the font colour, resulting in compatible and incompatible trials. We found compatibility effects for all children, but in contrast to the adult sample, there were no processing differences between the children depending on the nature of their other language, suggesting that the processing of German prepositions of bilingual children is embodied in a similar way as in monolingual German children. PMID:29538404
A Portable Natural Language Interface.
1987-09-01
regrets. BIBLIOGRAPHY Bayer, Samuel. "A Theory of Linearization in Relational Grammar," Senior essay, Yale University, 1984. Dyer, Michael. In... Grammar 1. Chicago: University of Chicago Press, 1983. Rustin, R., ed., Natural Language Processing. New York: Algorithmics Press, 1973. Wasow, Tom...most notably, the theory of relational grammar developed by Perlmutter and his associates, and the theory of discourse developed by Barbara Grosz
Sakji, Saoussen; Gicquel, Quentin; Pereira, Suzanne; Kergourlay, Ivan; Proux, Denys; Darmoni, Stéfan; Metzger, Marie-Hélène
2010-01-01
Surveillance of healthcare-associated infections is essential to prevention. A new collaborative project, ALADIN, was launched in January 2009 and aims to develop an automated detection tool based on natural language processing of medical documents. The objective of this study was to evaluate the annotation of natural language medical reports of healthcare-associated infections. An MS Access application (NosIndex) was developed to interface the ECMT XML output with the manual annotation work. ECMT performance was evaluated by an infection control practitioner (ICP). Precision was evaluated for the 2 modules and recall only for the default module. The exclusion rate was defined as the ratio between the number of medical terms not found by ECMT and the total number of terms evaluated. The medical discharge summaries were randomly selected in 4 medical wards. For the 247 medical terms evaluated, ECMT proposed 428 and 3,721 codes for the default and expansion modules, respectively. The precision was higher with the default module (P1=0.62) than with the expansion module (P2=0.47). The performance of ECMT as a support tool for medical annotation was satisfactory.
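The three evaluation measures named in this abstract can be sketched as follows. The precision and recall formulas are the standard ones; the exclusion rate follows the definition given in the abstract. All counts in the usage lines are invented for illustration, since the abstract reports only the resulting ratios.

```python
def precision(true_positives, false_positives):
    """Share of proposed codes that the reviewer judged correct."""
    return true_positives / (true_positives + false_positives)


def recall(true_positives, false_negatives):
    """Share of expected codes that the tool actually proposed."""
    return true_positives / (true_positives + false_negatives)


def exclusion_rate(terms_not_found, terms_evaluated):
    """Ratio of medical terms not found by the tool to all terms evaluated."""
    return terms_not_found / terms_evaluated


# Hypothetical counts: 62 correct codes out of 100 proposed,
# 20 expected codes missed, 25 of 250 terms not found at all.
p = precision(62, 38)
r = recall(62, 20)
x = exclusion_rate(25, 250)
```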
Communicative Discourse in Second Language Classrooms: From Building Skills to Becoming Skillful
ERIC Educational Resources Information Center
Suleiman, Mahmoud
2013-01-01
The dynamics of the communicative discourse is a natural process that requires an application of a wide range of skills and strategies. In particular, linguistic discourse and the interaction process have a huge impact on promoting literacy and academic skills in all students especially English language learners (ELLs). Using interactive…
Sučević, Jelena; Savić, Andrej M; Popović, Mirjana B; Styles, Suzy J; Ković, Vanja
2015-01-01
There is something about the sound of a pseudoword like takete that goes better with a spiky, than a curvy shape (Köhler, 1929:1947). Yet despite decades of research into sound symbolism, the role of this effect on real words in the lexicons of natural languages remains controversial. We report one behavioural and one ERP study investigating whether sound symbolism is active during normal language processing for real words in a speaker's native language, in the same way as for novel word forms. The results indicate that sound-symbolic congruence has a number of influences on natural language processing: Written forms presented in a congruent visual context generate more errors during lexical access, as well as a chain of differences in the ERP. These effects have a very early onset (40-80 ms, 100-160 ms, 280-320 ms) and are later overshadowed by familiar types of semantic processing, indicating that sound symbolism represents an early sensory-co-activation effect. Copyright © 2015 Elsevier Inc. All rights reserved.
oRis: multiagents approach for image processing
NASA Astrophysics Data System (ADS)
Rodin, Vincent; Harrouet, Fabrice; Ballet, Pascal; Tisseau, Jacques
1998-09-01
In this article, we present a parallel image processing system based on the concept of reactive agents. In our system, each agent has a very simple behavior that allows it to make a decision (finding an edge, a region, ...) according to its position in the image and the information found there. Our system relies on the oRis language, which allows the agents' behaviors to be described finely and simply. oRis is an interpreted and dynamic multiagent language. First of all, oRis is an object language, with classes grouping attributes and methods. The syntax is close to that of C++ and includes multiple inheritance. oRis is also an agent language: every object with a method `main()' becomes an agent. This method is executed cyclically by the system scheduler and corresponds to the agent's behavior. We also present an application made with oRis that detects concentric striae on different natural `objects' (tree age rings, fish otolith growth rings, striae of some minerals, ...). The multiagent system is stopped using a technique borrowed from immunology: apoptosis.
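The agent model described in this abstract (objects whose `main()` method is called cyclically by a scheduler, each reacting only to its local position in the image) can be sketched in Python. This is not oRis syntax and not the authors' application: the class name, the threshold, the scheduler loop, and the tiny image are all hypothetical, and the self-termination of agents is only a loose analogy to the apoptosis mechanism the abstract mentions.

```python
class EdgeAgent:
    """Reactive agent that walks along one image row looking for an edge."""

    def __init__(self, image, row, threshold=50):
        self.image = image
        self.row = row
        self.col = 0
        self.threshold = threshold
        self.found = None    # (row, col) of the first edge, if any
        self.alive = True

    def main(self):
        """One behavior cycle: compare the current pixel with its right neighbor."""
        line = self.image[self.row]
        if self.col + 1 >= len(line):
            self.alive = False           # reached the border: agent terminates
            return
        if abs(line[self.col + 1] - line[self.col]) >= self.threshold:
            self.found = (self.row, self.col + 1)
            self.alive = False           # job done: agent terminates
        else:
            self.col += 1                # otherwise move one pixel to the right


def run_scheduler(agents):
    """Call main() on every living agent in turn until none remain."""
    while any(agent.alive for agent in agents):
        for agent in agents:
            if agent.alive:
                agent.main()


image = [
    [10, 10, 10, 200, 200],   # intensity jump between columns 2 and 3
    [10, 10, 10, 10, 10],     # flat row, no edge
]
agents = [EdgeAgent(image, row) for row in range(len(image))]
run_scheduler(agents)
```

Each agent knows nothing beyond its own row and position, which is the sense in which the behavior is "reactive".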
ERIC Educational Resources Information Center
Eckhaus, Eyal; Davidovitch, Nitza
2018-01-01
This pilot study focuses on the impact of academic conferences from a gender-based perspective. What motivates faculty members to attend conferences? Which conferences do they choose? Can differences be found between men and women in their attitude to the effect of the conference and its contribution to their academic work, in light of many…
ERIC Educational Resources Information Center
Khojasteh, Laleh; Kafipour, Reza
2012-01-01
Using corpus approach, a growing number of researchers blamed textbooks for neglecting important information on the use of grammatical structures in natural English. Likewise, the prescribed Malaysian English textbooks used in schools are reportedly prepared through a process of material development that involves intuition. Hence, a corpus-based…
We succeeded in developing a Natural Language Processing (NLP) system with excellent performance characteristics for determining the type of...people (quadruple-annotated) and 7,226 of which were double annotated. We also developed an NLP system to extract PT Checklist (PCL) scores from clinical notes with excellent accuracy (98% positive predictive value).
Priming English Past Tense Verbs: Rules or Statistics?
ERIC Educational Resources Information Center
Kielar, A.; Joanisse, Marc F.; Hare, M. L.
2008-01-01
A key question in language processing concerns the rule-like nature of many aspects of grammar. Much research on this topic has focused on English past tense morphology, which comprises a regular, rule-like pattern (e.g., bake-baked) and a set of irregular forms that defy a rule-based description (e.g., take-took). Previous studies have used past…
Wu, Stephen; Miller, Timothy; Masanz, James; Coarr, Matt; Halgrim, Scott; Carrell, David; Clark, Cheryl
2014-01-01
A review of published work in clinical natural language processing (NLP) may suggest that the negation detection task has been “solved.” This work proposes that an optimizable solution does not equal a generalizable solution. We introduce a new machine learning-based Polarity Module for detecting negation in clinical text, and extensively compare its performance across domains. Using four manually annotated corpora of clinical text, we show that negation detection performance suffers when there is no in-domain development (for manual methods) or training data (for machine learning-based methods). Various factors (e.g., annotation guidelines, named entity characteristics, the amount of data, and lexical and syntactic context) play a role in making generalizability difficult, but none completely explains the phenomenon. Furthermore, generalizability remains challenging because it is unclear whether to use a single source for accurate data, combine all sources into a single model, or apply domain adaptation methods. The most reliable means to improve negation detection is to manually annotate in-domain training data (or, perhaps, manually modify rules); this is a strategy for optimizing performance, rather than generalizing it. These results suggest a direction for future work in domain-adaptive and task-adaptive methods for clinical NLP. PMID:25393544
Fernandes, Andrea C; Dutta, Rina; Velupillai, Sumithra; Sanyal, Jyoti; Stewart, Robert; Chandran, David
2018-05-09
Research into suicide prevention has been hampered by methodological limitations such as low sample size and recall bias. Recently, Natural Language Processing (NLP) strategies have been used with Electronic Health Records to increase information extraction from free-text notes as well as structured fields concerning suicidality, allowing access to much larger cohorts than previously possible. This paper presents two novel NLP approaches: a rule-based approach to classify the presence of suicide ideation and a hybrid machine learning and rule-based approach to identify suicide attempts in a psychiatric clinical database. The good performance of the two classifiers in the evaluation study suggests they can be used to accurately detect mentions of suicide ideation and attempt within free-text documents in this psychiatric database. The novelty of the two approaches lies in the malleability of each classifier if a need arises to refine performance or meet alternative classification requirements. The algorithms can also be adapted to fit the infrastructures of other clinical datasets, given sufficient knowledge of clinical recording practice, without dependency on medical codes or additional data extraction of known risk factors to predict suicidal behaviour.
A meta-analysis of fMRI studies on Chinese orthographic, phonological, and semantic processing.
Wu, Chiao-Yi; Ho, Moon-Ho Ringo; Chen, Shen-Hsing Annabel
2012-10-15
A growing body of neuroimaging evidence has shown that Chinese character processing recruits differential activation from alphabetic languages due to its unique linguistic features. As more investigations on Chinese character processing have recently become available, we applied a meta-analytic approach to summarize previous findings and examined the neural networks for orthographic, phonological, and semantic processing of Chinese characters independently. The activation likelihood estimation (ALE) method was used to analyze eight studies in the orthographic task category, eleven in the phonological and fifteen in the semantic task categories. Converging activation among three language-processing components was found in the left middle frontal gyrus, the left superior parietal lobule and the left mid-fusiform gyrus, suggesting a common sub-network underlying the character recognition process regardless of the task nature. With increasing task demands, the left inferior parietal lobule and the right superior temporal gyrus were specialized for phonological processing, while the left middle temporal gyrus was involved in semantic processing. Functional dissociation was identified in the left inferior frontal gyrus, with the posterior dorsal part for phonological processing and the anterior ventral part for semantic processing. Moreover, bilateral involvement of the ventral occipito-temporal regions was found for both phonological and semantic processing. The results provide better understanding of the neural networks underlying Chinese orthographic, phonological, and semantic processing, and consolidate the findings of additional recruitment of the left middle frontal gyrus and the right fusiform gyrus for Chinese character processing as compared with the universal language network that has been based on alphabetic languages. Copyright © 2012 Elsevier Inc. All rights reserved.
Huang, Yang; Lowe, Henry J; Klein, Dan; Cucina, Russell J
2005-01-01
The aim of this study was to develop and evaluate a method of extracting noun phrases with full phrase structures from a set of clinical radiology reports using natural language processing (NLP) and to investigate the effects of using the UMLS(R) Specialist Lexicon to improve noun phrase identification within clinical radiology documents. The noun phrase identification (NPI) module is composed of a sentence boundary detector, a statistical natural language parser trained on a nonmedical domain, and a noun phrase (NP) tagger. The NPI module processed a set of 100 XML-represented clinical radiology reports in Health Level 7 (HL7)(R) Clinical Document Architecture (CDA)-compatible format. Computed output was compared with manual markups made by four physicians and one author for maximal (longest) NP and those made by one author for base (simple) NP, respectively. An extended lexicon of biomedical terms was created from the UMLS Specialist Lexicon and used to improve NPI performance. The test set was 50 randomly selected reports. The sentence boundary detector achieved 99.0% precision and 98.6% recall. The overall maximal NPI precision and recall were 78.9% and 81.5% before using the UMLS Specialist Lexicon and 82.1% and 84.6% after. The overall base NPI precision and recall were 88.2% and 86.8% before using the UMLS Specialist Lexicon and 93.1% and 92.6% after, reducing false-positives by 31.1% and false-negatives by 34.3%. The sentence boundary detector performs excellently. After the adaptation using the UMLS Specialist Lexicon, the statistical parser's NPI performance on radiology reports increased to levels comparable to the parser's native performance in its newswire training domain and to that reported by other researchers in the general nonmedical domain.
Natural Language Processing Based Instrument for Classification of Free Text Medical Records
2016-01-01
According to the Ministry of Labor, Health and Social Affairs of Georgia, a new health management system is to be introduced in the near future. In this context arises the problem of structuring and classifying documents containing the full history of medical services provided. The present work introduces an instrument for classifying medical records written in the Georgian language. It is the first attempt at such classification of Georgian-language medical records. In total, 24,855 examination records were studied. The documents were classified into three main groups (ultrasonography, endoscopy, and X-ray) and 13 subgroups using two well-known methods: Support Vector Machine (SVM) and K-Nearest Neighbor (KNN). The results demonstrated that both machine learning methods performed successfully, with SVM performing slightly better. During classification, a “shrink” method based on feature selection was introduced and applied. At the first stage of classification the results of the “shrink” case were better; however, at the second stage, classification into subclasses, 23% of all documents could not be linked to a single subclass (liver or binary system) because of features common to these subclasses. The overall results of the study were successful. PMID:27668260
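To make the K-Nearest Neighbor step concrete, here is a minimal sketch of a bag-of-words KNN text classifier using cosine similarity. It is not the authors' instrument: the study worked on Georgian records with its own feature selection, whereas the English toy records, the whitespace tokenizer, and the choice of k below are assumptions made purely for illustration.

```python
from collections import Counter
from math import sqrt


def vectorize(text):
    """Bag-of-words term counts from a whitespace tokenizer (an assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(text, training, k=3):
    """Label a record by majority vote among its k most similar training records."""
    query = vectorize(text)
    ranked = sorted(training,
                    key=lambda pair: cosine(query, vectorize(pair[0])),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented English stand-ins for examination records.
TRAINING = [
    ("ultrasound scan of liver shows normal echo structure", "ultrasonography"),
    ("abdominal ultrasound scan gallbladder without stones", "ultrasonography"),
    ("endoscope passed into stomach mucosa normal", "endoscopy"),
    ("endoscope shows erosion of stomach mucosa", "endoscopy"),
    ("chest x-ray lungs clear no infiltrates", "x-ray"),
    ("x-ray of chest shows rib fracture", "x-ray"),
]

label = knn_classify("ultrasound scan of gallbladder", TRAINING)
```

Records that share most of their vocabulary, as the abstract reports for two of the subclasses, would receive near-identical similarity scores under a scheme like this, which is one way such ambiguity can arise.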
ng: What next-generation languages can teach us about HENP frameworks in the manycore era
NASA Astrophysics Data System (ADS)
Binet, Sébastien
2011-12-01
Current High Energy and Nuclear Physics (HENP) frameworks were written before multicore systems became widely deployed. A 'single-thread' execution model naturally emerged from that environment; however, this no longer fits the processing model at the dawn of the manycore era. Although previous work focused on minimizing the changes to be applied to the LHC frameworks (because of the data taking phase) while still trying to reap the benefits of the parallel-enhanced CPU architectures, this paper explores what new languages could bring to the design of the next-generation frameworks. Parallel programming is still in an intensive phase of R&D and no silver bullet exists despite the 30+ years of literature on the subject. Yet, several parallel programming styles have emerged: actors, message passing, communicating sequential processes, task-based programming, data flow programming, ... to name a few. We present the prototyping of a next-generation framework in new and expressive languages (Python and Go) to investigate how code clarity and robustness are affected and what the downsides are of using languages younger than FORTRAN/C/C++.
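One of the styles listed in this abstract, communicating sequential processes, can be sketched with Python's standard `queue` and `threading` modules: independent stages share no state and exchange data only through channels. The stage names and the squaring step are hypothetical placeholders, not part of the framework described in the paper, and because of CPython's global interpreter lock this shows the communication pattern rather than any speed-up.

```python
import queue
import threading

STOP = None  # sentinel that tells the downstream stage to shut down


def source(events, out_q):
    """First stage: push every event into the channel, then signal the end."""
    for event in events:
        out_q.put(event)
    out_q.put(STOP)


def square_stage(in_q, out_q):
    """Second stage: read events, transform them, forward the results."""
    while True:
        item = in_q.get()
        if item is STOP:
            out_q.put(STOP)
            return
        out_q.put(item * item)   # placeholder for real event processing


def run_pipeline(events):
    """Wire the two stages together with queues and collect the output."""
    raw, processed = queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=source, args=(events, raw)),
        threading.Thread(target=square_stage, args=(raw, processed)),
    ]
    for worker in workers:
        worker.start()
    results = []
    while True:
        item = processed.get()
        if item is STOP:
            break
        results.append(item)
    for worker in workers:
        worker.join()
    return results


squares = run_pipeline([1, 2, 3])
```

With a single worker per stage the output order matches the input order; adding more workers to a stage would require an explicit ordering or merging step.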
Cook, Benjamin L; Progovac, Ana M; Chen, Pei; Mullin, Brian; Hou, Sherry; Baca-Garcia, Enrique
2016-01-01
Natural language processing (NLP) and machine learning were used to predict suicidal ideation and heightened psychiatric symptoms among adults recently discharged from psychiatric inpatient or emergency room settings in Madrid, Spain. Participants responded to structured mental and physical health instruments at multiple follow-up points. Outcome variables of interest were suicidal ideation and psychiatric symptoms (GHQ-12). Predictor variables included structured items (e.g., relating to sleep and well-being) and responses to one unstructured question, "how do you feel today?" We compared NLP-based models using the unstructured question with logistic regression prediction models using structured data. The PPV, sensitivity, and specificity for NLP-based models of suicidal ideation were 0.61, 0.56, and 0.57, respectively, compared to 0.73, 0.76, and 0.62 of structured data-based models. The PPV, sensitivity, and specificity for NLP-based models of heightened psychiatric symptoms (GHQ-12 ≥ 4) were 0.56, 0.59, and 0.60, respectively, compared to 0.79, 0.79, and 0.85 in structured models. NLP-based models were able to generate relatively high predictive values based solely on responses to a simple general mood question. These models have promise for rapidly identifying persons at risk of suicide or psychological distress and could provide a low-cost screening alternative in settings where lengthy structured item surveys are not feasible.
A human mirror neuron system for language: Perspectives from signed languages of the deaf.
Knapp, Heather Patterson; Corina, David P
2010-01-01
Language is proposed to have developed atop the human analog of the macaque mirror neuron system for action perception and production [Arbib M.A. 2005. From monkey-like action recognition to human language: An evolutionary framework for neurolinguistics (with commentaries and author's response). Behavioral and Brain Sciences, 28, 105-167; Arbib M.A. (2008). From grasp to language: Embodied concepts and the challenge of abstraction. Journal de Physiologie Paris 102, 4-20]. Signed languages of the deaf are fully-expressive, natural human languages that are perceived visually and produced manually. We suggest that if a unitary mirror neuron system mediates the observation and production of both language and non-linguistic action, three predictions can be made: (1) damage to the human mirror neuron system should non-selectively disrupt both sign language and non-linguistic action processing; (2) within the domain of sign language, a given mirror neuron locus should mediate both perception and production; and (3) the action-based tuning curves of individual mirror neurons should support the highly circumscribed set of motions that form the "vocabulary of action" for signed languages. In this review we evaluate data from the sign language and mirror neuron literatures and find that these predictions are only partially upheld. 2009 Elsevier Inc. All rights reserved.
Alt, Mary; Arizmendi, Genesis D; Beal, Carole R
2014-07-01
The present study examined the relationship between mathematics and language to better understand the nature of the deficit and the academic implications associated with specific language impairment (SLI) and academic implications for English language learners (ELLs). School-age children (N = 61; 20 SLI, 20 ELL, 21 native monolingual English [NE]) were assessed using a norm-referenced mathematics instrument and 3 experimental computer-based mathematics games that varied in language demands. Group means were compared with analyses of variance. The ELL group was less accurate than the NE group only when tasks were language heavy. In contrast, the group with SLI was less accurate than the groups with NE and ELLs on language-heavy tasks and some language-light tasks. Specifically, the group with SLI was less accurate on tasks that involved comparing numerical symbols and using visual working memory for patterns. However, there were no group differences between children with SLI and peers without SLI on language-light mathematics tasks that involved visual working memory for numerical symbols. Mathematical difficulties of children who are ELLs appear to be related to the language demands of mathematics tasks. In contrast, children with SLI appear to have difficulty with mathematics tasks because of linguistic as well as nonlinguistic processing constraints.
Beliefs about Learning English as a Second Language among Native Groups in Rural Sabah, Malaysia
ERIC Educational Resources Information Center
Krishnasamy, Hariharan N.; Veloo, Arsaythamby; Lu, Ho Fui
2013-01-01
This paper identifies differences between the three ethnic groups, namely, Kadazans/Dusuns, Bajaus, and other minority ethnic groups on the beliefs about learning English as a second language based on the five variables, that is, language aptitude, language learning difficulty, language learning and communicating strategies, nature of language…
ERIC Educational Resources Information Center
Ryder, Nuala; Leinonen, Eeva; Schulz, Joerg
2008-01-01
Background: Pragmatic language impairment in children with specific language impairment has proved difficult to assess, and the nature of their abilities to comprehend pragmatic meaning has not been fully investigated. Aims: To develop both a cognitive approach to pragmatic language assessment based on Relevance Theory and an assessment tool for…
Defining the Language Assessment Literacy Gap: Evidence from a Parliamentary Inquiry
ERIC Educational Resources Information Center
Pill, John; Harding, Luke
2013-01-01
This study identifies a unique context for exploring lay understandings of language testing and, by extension, for characterizing the nature of language assessment literacy among non-practitioners, stemming from data in an inquiry into the registration processes and support for overseas trained doctors by the Australian House of Representatives…
Integrated Processing in Planning and Understanding.
1986-12-01
to language analysis seemed necessary. The second observation was the rather commonsense one that it is easier to understand a foreign language ...syntactic analysis Probably the most widely employed method for natural language analysis is augmented transition network parsing, or ATNs (Thorne, Bratley...accomplished. It is for this reason that the programming language Prolog, which implements that general method, has proven so well-suited to writing ATN
Marques, J Frederico; Canessa, Nicola; Cappa, Stefano
2009-06-01
The inquiry into the nature of truth in language comprehension has a long history of opposing perspectives. These perspectives hold either that there are qualitative differences in the processing of true and false statements, or that these processes are fundamentally the same and differ only in quantitative terms. The present study evaluated the processing of true and false statements in terms of patterns of brain activity using event-related functional magnetic resonance imaging (fMRI). We show that when true and false concept-feature statements are controlled for relation strength/ambiguity, their processing is associated with qualitatively different processes. Verifying true statements activates the left inferior parietal cortex and the caudate nucleus, a neural correlate compatible with an extended search and matching process for particular stored information. In contrast, verifying false statements activates the fronto-polar cortex and is compatible with a reasoning process of finding and evaluating a contradiction between the sentence information and stored knowledge.
ERIC Educational Resources Information Center
Commissaire, Eva; Duncan, Lynne G.; Casalis, Severine
2011-01-01
This study explores the nature of orthographic processing skills among French-speaking children in Grades 6 and 8 who are learning English at school as a second language (L2). Two aspects of orthographic processing skills are thought to form a convergent construct in monolingual beginning readers: word-specific knowledge (e.g. "rain-rane") and…
Toward a theory of distributed word expert natural language parsing
NASA Technical Reports Server (NTRS)
Rieger, C.; Small, S.
1981-01-01
An approach to natural language meaning-based parsing in which the unit of linguistic knowledge is the word rather than the rewrite rule is described. In the word expert parser, knowledge about language is distributed across a population of procedural experts, each representing a word of the language, and each an expert at diagnosing that word's intended usage in context. The parser is structured around a coroutine control environment in which the generator-like word experts ask questions and exchange information in coming to collective agreement on sentence meaning. The word expert theory is advanced as a better cognitive model of human language expertise than the traditional rule-based approach. The technical discussion is organized around examples taken from the prototype LISP system which implements parts of the theory.
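The control structure described in this abstract, generator-like word experts that ask questions and receive answers inside a coroutine environment, maps naturally onto Python generators. The sketch below is not the authors' LISP prototype: the single "bank" expert, its question format, and the cue words are hypothetical, chosen only to show a word-level procedure disambiguating itself by querying its context.

```python
def bank_expert():
    """Word expert for 'bank': asks one question about context, then decides."""
    water_nearby = yield ("context_has_any", {"river", "water", "shore"})
    sense = "riverside" if water_nearby else "financial institution"
    yield ("sense", sense)


def default_expert(word):
    """Fallback expert factory: the word simply stands for itself."""
    def expert():
        yield ("sense", word)
    return expert


EXPERTS = {"bank": bank_expert}   # hypothetical expert population


def parse(sentence):
    """Run one expert per word, answering its questions from the sentence."""
    words = sentence.lower().split()
    senses = {}
    for word in words:
        expert = EXPERTS.get(word, default_expert(word))()
        message = next(expert)                 # run until the first yield
        while message[0] != "sense":
            _, cue_words = message             # only one question type here
            answer = bool(cue_words & set(words))
            message = expert.send(answer)      # resume the expert with the answer
        senses[word] = message[1]
    return senses


reading = parse("the bank of the river")
```

Here the controller answers every question itself; in the word expert parser the experts are described as exchanging information with one another, which this sketch does not attempt to model.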
Innateness and culture in the evolution of language
Kirby, Simon; Dowman, Mike; Griffiths, Thomas L.
2007-01-01
Human language arises from biological evolution, individual learning, and cultural transmission, but the interaction of these three processes has not been widely studied. We set out a formal framework for analyzing cultural transmission, which allows us to investigate how innate learning biases are related to universal properties of language. We show that cultural transmission can magnify weak biases into strong linguistic universals, undermining one of the arguments for strong innate constraints on language learning. As a consequence, the strength of innate biases can be shielded from natural selection, allowing these genes to drift. Furthermore, even when there is no natural selection, cultural transmission can produce apparent adaptations. Cultural transmission thus provides an alternative to traditional nativist and adaptationist explanations for the properties of human languages. PMID:17360393
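The claim that cultural transmission can magnify a weak bias can be illustrated with a toy chain of Bayesian learners. The sketch assumes a setup common in the iterated-learning literature rather than the paper's exact framework: two candidate languages, noisy utterances, and a learner that either samples a language from its posterior or picks the most probable one (MAP). The parameter values and function names are invented for illustration.

```python
from math import comb


def likelihood(k, m, language, noise):
    """Probability of hearing k type-0 utterances out of m from this language."""
    p_type0 = 1 - noise if language == 0 else noise
    return comb(m, k) * p_type0 ** k * (1 - p_type0) ** (m - k)


def posterior_h0(k, m, prior0, noise):
    """Posterior probability of language 0 after hearing k type-0 utterances."""
    a = prior0 * likelihood(k, m, 0, noise)
    b = (1 - prior0) * likelihood(k, m, 1, noise)
    return a / (a + b)


def stationary_h0(prior0, noise, m, learner):
    """Long-run share of language 0 in a chain of learners ('sample' or 'map')."""
    def prob_next_is_h0(current):
        total = 0.0
        for k in range(m + 1):
            post = posterior_h0(k, m, prior0, noise)
            choose_h0 = post if learner == "sample" else float(post > 0.5)
            total += likelihood(k, m, current, noise) * choose_h0
        return total

    stay = prob_next_is_h0(0)      # language 0 -> language 0
    enter = prob_next_is_h0(1)     # language 1 -> language 0
    leave = 1 - stay               # language 0 -> language 1
    return enter / (enter + leave)  # stationary distribution of a 2-state chain


# Weak innate bias of 0.6 toward language 0, two utterances per generation.
sampler_share = stationary_h0(0.6, 0.4, 2, "sample")
map_share = stationary_h0(0.6, 0.4, 2, "map")
```

Under these assumptions the sampling learners settle at the prior itself, while the MAP learners settle at a noticeably larger share for the favored language, so the observed regularity is stronger than the bias that produced it.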
Odean, Rosalie; Nazareth, Alina; Pruden, Shannon M.
2015-01-01
Developmental systems theory posits that development cannot be segmented by influences acting in isolation, but should be studied through a scientific lens that highlights the complex interactions between these forces over time (Overton, 2013a). This poses a unique challenge for developmental psychologists studying complex processes like language development. In this paper, we advocate for the combining of highly sophisticated data collection technologies in an effort to move toward a more systemic approach to studying language development. We investigate the efficiency and appropriateness of combining eye-tracking technology and the LENA (Language Environment Analysis) system, an automated language analysis tool, in an effort to explore the relation between language processing in early development, and external dynamic influences like parent and educator language input in the home and school environments. Eye-tracking allows us to study language processing via eye movement analysis; these eye movements have been linked to both conscious and unconscious cognitive processing, and thus provide one means of evaluating cognitive processes underlying language development that does not require the use of subjective parent reports or checklists. The LENA system, on the other hand, provides automated language output that describes a child’s language-rich environment. In combination, these technologies provide critical information not only about a child’s language processing abilities but also about the complexity of the child’s language environment. Thus, when used in conjunction these technologies allow researchers to explore the nature of interacting systems involved in language development. PMID:26379591
ERIC Educational Resources Information Center
Geluso, Joe
2013-01-01
Usage-based theories of language learning suggest that native speakers of a language are acutely aware of formulaic language due in large part to frequency effects. Corpora and data-driven learning can offer useful insights into frequent patterns of naturally occurring language to second/foreign language learners who, unlike native speakers, are…
A bootstrapping method for development of Treebank
NASA Astrophysics Data System (ADS)
Zarei, F.; Basirat, A.; Faili, H.; Mirain, M.
2017-01-01
Using statistical approaches alongside the traditional methods of natural language processing (NLP) can significantly improve both the quality and the performance of several NLP tasks. The effective use of these approaches depends on the availability of informative, accurate and detailed corpora on which the learners are trained. This article introduces a bootstrapping method for developing annotated corpora based on a complex and rich linguistically motivated elementary structure called a supertag. To this end, a hybrid method for supertagging is proposed that combines generative and discriminative methods of supertagging. The method was applied to a subset of the Wall Street Journal (WSJ) corpus in order to annotate its sentences with a set of linguistically motivated elementary structures of the English XTAG grammar, which uses a lexicalised tree-adjoining grammar formalism. The empirical results confirm that the bootstrapping method provides a satisfactory way of annotating English sentences with these structures. The experiments show that the method could automatically annotate about 20% of WSJ with an F-measure of about 80%, which is 12% higher than the F-measure of the XTAG Treebank automatically generated by the approach proposed by Basirat and Faili [(2013). Bridge the gap between statistical and hand-crafted grammars. Computer Speech and Language, 27, 1085-1104].
Machine-aided indexing for NASA STI
NASA Technical Reports Server (NTRS)
Wilson, John
1987-01-01
One of the major components of the NASA/STI processing system is machine-aided indexing (MAI). MAI is a computer process that generates a set of indexing terms selected from NASA's thesaurus, is used for indexing technical reports, is based on text, and is reviewed by indexers. This paper summarizes the MAI objectives and discusses the NASA Lexical Dictionary, subject switching, and phrase matching or natural languages. The benefits of using MAI are mentioned, and MAI production improvement and the future of MAI are briefly addressed.
The Now-or-Never bottleneck: A fundamental constraint on language.
Christiansen, Morten H; Chater, Nick
2016-01-01
Memory is fleeting. New material rapidly obliterates previous material. How, then, can the brain deal successfully with the continual deluge of linguistic input? We argue that, to deal with this "Now-or-Never" bottleneck, the brain must compress and recode linguistic input as rapidly as possible. This observation has strong implications for the nature of language processing: (1) the language system must "eagerly" recode and compress linguistic input; (2) as the bottleneck recurs at each new representational level, the language system must build a multilevel linguistic representation; and (3) the language system must deploy all available information predictively to ensure that local linguistic ambiguities are dealt with "Right-First-Time"; once the original input is lost, there is no way for the language system to recover. This is "Chunk-and-Pass" processing. Similarly, language learning must also occur in the here and now, which implies that language acquisition is learning to process, rather than inducing, a grammar. Moreover, this perspective provides a cognitive foundation for grammaticalization and other aspects of language change. Chunk-and-Pass processing also helps explain a variety of core properties of language, including its multilevel representational structure and duality of patterning. This approach promises to create a direct relationship between psycholinguistics and linguistic theory. More generally, we outline a framework within which to integrate often disconnected inquiries into language processing, language acquisition, and language change and evolution.
ERIC Educational Resources Information Center
Hawson, Anne
1997-01-01
Three threshold hypotheses proposed by Cummins (1976) and Diaz (1985) as explanations of data on the cognitive consequences of bilingualism are examined in depth and compared to one another. A neuroscientifically updated information-processing perspective on the interaction of second-language comprehension and visual-processing ability is…
Geva, Esther; Massey-Garrison, Angela
2013-01-01
The overall objective of this article is to examine how oral language abilities relate to reading profiles in English language learners (ELLs) and English as a first language (EL1) learners, and the extent of similarities and differences between ELLs and EL1s in three reading subgroups: normal readers, poor decoders, and poor comprehenders. The study included 100 ELLs and 50 EL1s in Grade 5. The effect of language group (ELL/EL1) and reading group on cognitive and linguistic skills was examined. Except for vocabulary, there was no language group effect on any measure. However, within ELL and EL1 alike, significant differences were found between reading groups: Normal readers outperformed the two other groups on all the oral language measures. Distinct cognitive and linguistic profiles were associated with poor decoders and poor comprehenders, regardless of language group. The ELL and EL1 poor decoders outperformed the poor comprehenders on listening comprehension and inferencing. The poor decoders displayed phonological-based weaknesses, whereas the poor comprehenders displayed a more generalized language processing weakness that is nonphonological in nature. Regardless of language status, students with poor decoding or comprehension problems display difficulties with various aspects of language.
Quantifiable and objective approach to organizational performance enhancement.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Scholand, Andrew Joseph; Tausczik, Yla R.
This report describes a new methodology, social language network analysis (SLNA), that combines tools from social language processing and network analysis to identify socially situated relationships between individuals which, though subtle, are highly influential. Specifically, SLNA aims to identify and characterize the nature of working relationships by processing artifacts generated with computer-mediated communication systems, such as instant message texts or emails. Because social language processing is able to identify psychological, social, and emotional processes that individuals are not able to fully mask, social language network analysis can clarify and highlight complex interdependencies between group members, even when these relationships are latent or unrecognized. This report outlines the philosophical antecedents of SLNA, the mechanics of preprocessing, processing, and post-processing stages, and some example results obtained by applying this approach to a 15-month corporate discussion archive.
Model-based query language for analyzing clinical processes.
Barzdins, Janis; Barzdins, Juris; Rencis, Edgars; Sostaks, Agris
2013-01-01
Nowadays large databases of clinical process data exist in hospitals. However, these data are rarely used in full scope. In order to perform queries on hospital processes, one must either choose from the predefined queries or develop queries using an MS Excel-type software system, which is not always a trivial task. In this paper we propose a new query language for analyzing clinical processes that is easily understood by non-IT professionals as well. We develop this language based on a process modeling language which is also described in this paper. Prototypes of both languages have already been verified using real examples from hospitals.
Koizumi, Masatoshi; Imamura, Satoshi
2017-02-01
The effects of syntactic and information structures on sentence processing load were investigated using two reading comprehension experiments in Japanese, a head-final SOV language. In the first experiment, we discovered the main effects of syntactic and information structures, as well as their interaction, showing that interaction of these two factors is not restricted to head-initial languages. The second experiment revealed that the interaction between syntactic structure and information structure occurs at the second NP (O of SOV and S of OSV), which, crucially, is a pre-head position, suggesting the incremental nature of the processing of both syntactic structure and information structure in head-final languages.
Comparison of LISP and MUMPS as implementation languages for knowledge-based systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Curtis, A.C.
1984-01-01
Major components of knowledge-based systems are summarized, along with the programming language features generally useful in their implementation. LISP and MUMPS are briefly described and compared as vehicles for building knowledge-based systems. The paper concludes with suggestions for extensions to MUMPS which might increase its usefulness in artificial intelligence applications without affecting the essential nature of the language. 8 references.
Cross-Language Information Retrieval: An Analysis of Errors.
ERIC Educational Resources Information Center
Ruiz, Miguel E.; Srinivasan, Padmini
1998-01-01
Investigates an automatic method for Cross Language Information Retrieval (CLIR) that utilizes the multilingual Unified Medical Language System (UMLS) Metathesaurus to translate Spanish natural-language queries into English. Results indicate that for Spanish, the UMLS Metathesaurus-based CLIR method is at least equivalent to if not better than…
Clinical and Educational Perspectives on Language Intervention for Children with Autism.
ERIC Educational Resources Information Center
Kamhi, Alan G.; And Others
The paper examines aspects of effective language intervention with autistic children. An overview is presented about the nature of language, its perception and comprehension, and the production of speech-language. Assessment strategies are considered. The second part of the paper analyzes traditional and communications-based intervention programs.…
Informal Language Learning Setting: Technology or Social Interaction?
ERIC Educational Resources Information Center
Bahrani, Taher; Sim, Tam Shu
2012-01-01
Based on the informal language learning theory, language learning can occur outside the classroom setting unconsciously and incidentally through interaction with the native speakers or exposure to authentic language input through technology. However, an EFL context lacks the social interaction which naturally occurs in an ESL context. To explore…
A novel robust Arabic light stemmer
NASA Astrophysics Data System (ADS)
Abainia, Kheireddine; Ouamour, Siham; Sayoud, Halim
2017-05-01
Stemming is the process of transforming a word into its root or stem; hence, it is considered a crucial pre-processing step before tackling any task of natural language processing or information retrieval. However, in the case of Arabic, finding an effective stemming algorithm seems to be quite difficult, since Arabic has a specific morphology that differs from that of many other languages. Although several algorithms in the literature address the Arabic stemming issue, most of them are restricted to a limited number of words, confuse original letters with affixes, and usually employ a dictionary of words or patterns. For that purpose, we propose the design and implementation of a novel Arabic light stemmer, which is based on new rules for stripping prefixes, suffixes and infixes in a smart way. To our knowledge, it is the first work dealing with Arabic infixes with regard to their irregular rules. The empirical evaluation was conducted on a new Arabic data-set (called ARASTEM), which was conceived and collected from several Arabic discussion forums containing dialectical Arabic and modern pseudo-Arabic languages. We present a comparative investigation between our new stemmer and other existing stemmers using Paice's parameters, namely: Under Stemming Index (UI), Over Stemming Index (OI) and Stemming Weight (SW). Results show that the proposed Arabic light stemmer maintains consistently high performance and outperforms several existing light stemmers.
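The general idea of light stemming (stripping a prefix and a suffix without consulting a dictionary) can be sketched as follows. This is a minimal illustration only: the affix lists, the `min_len` guard and the function name `light_stem` are assumptions chosen for the example, not the rules of the stemmer described in the abstract, and infix handling is not attempted.

```python
# Illustrative affix lists (assumptions for this sketch, not the paper's rules).
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي"]


def light_stem(word: str, min_len: int = 3) -> str:
    """Strip at most one prefix and one suffix, longest match first.

    An affix is removed only if at least `min_len` characters remain,
    which guards against stripping letters that belong to the stem.
    """
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]
            break
    return word


if __name__ == "__main__":
    for w in ["والكتاب", "المعلمون"]:
        print(w, "->", light_stem(w))
```

The length guard is the simplest possible safeguard against over-stemming; a real light stemmer would need more careful rules to distinguish original letters from affixes, which is the difficulty the abstract points to.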
A natural language interface to databases
NASA Technical Reports Server (NTRS)
Ford, D. R.
1988-01-01
The development of a Natural Language Interface which is semantic-based and uses Conceptual Dependency representation is presented. The system was developed using Lisp and currently runs on a Symbolics Lisp machine. A key point is that the parser handles morphological analysis, which expands its capabilities of understanding more words.
Two Interpretive Systems for Natural Language?
ERIC Educational Resources Information Center
Frazier, Lyn
2015-01-01
It is proposed that humans have available to them two systems for interpreting natural language. One system is familiar from formal semantics. It is a type based system that pairs a syntactic form with its interpretation using grammatical rules of composition. This system delivers both plausible and implausible meanings. The other proposed system…
NASA Astrophysics Data System (ADS)
Knoeferle, Pia
2016-03-01
In his review article [19], Arbib outlines an ambitious research agenda: to accommodate within a unified framework the evolution, the development, and the processing of language in natural settings (implicating other systems such as vision). He does so with neuro-computationally explicit modeling in mind [1,2] and inspired by research on the mirror neuron system in primates. Similar research questions have received substantial attention also among other scientists [3,4,12].
The nature of the language input affects brain activation during learning from a natural language
Plante, Elena; Patterson, Dianne; Gómez, Rebecca; Almryde, Kyle R.; White, Milo G.; Asbjørnsen, Arve E.
2015-01-01
Artificial language studies have demonstrated that learners are able to segment individual word-like units from running speech using the transitional probability information. However, this skill has rarely been examined in the context of natural languages, where stimulus parameters can be quite different. In this study, two groups of English-speaking learners were exposed to Norwegian sentences over the course of three fMRI scans. One group was provided with input in which transitional probabilities predicted the presence of target words in the sentences. This group quickly learned to identify the target words and fMRI data revealed an extensive and highly dynamic learning network. These results were markedly different from activation seen for a second group of participants. This group was provided with highly similar input that was modified so that word learning based on syllable co-occurrences was not possible. These participants showed a much more restricted network. The results demonstrate that the nature of the input strongly influenced the nature of the network that learners employ to learn the properties of words in a natural language. PMID:26257471
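The transitional-probability cue mentioned above can be illustrated with a small sketch. It assumes a toy syllable stream built from three made-up words ("baby", "gola", "tudo") and a hypothetical boundary threshold; it is not the stimulus set or the analysis used in the study.

```python
from collections import Counter


def transitional_probabilities(syllables):
    """Return P(next | current) for every adjacent syllable pair."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    # Count each syllable only where it has a successor.
    first_counts = Counter(syllables[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


def segment(syllables, tp, threshold=0.9):
    """Place a word boundary wherever the transitional probability dips."""
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tp[(a, b)] < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words


# Toy stream (an assumption for illustration): within-word transitions are
# fully predictable, transitions across word boundaries are not.
stream = "ba by go la tu do go la ba by tu do ba by go la".split()
tp = transitional_probabilities(stream)

if __name__ == "__main__":
    print(tp[("ba", "by")])  # within-word transition
    print(tp[("by", "go")])  # across-word transition
    print(segment(stream, tp))
```

In this toy stream the within-word pairs have probability 1.0 and the cross-boundary pairs have lower values, so a fixed threshold recovers the words; natural-language input is noisier, which is part of what the study examines.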
2011-01-01
Background: The identification of patients who pose an epidemic hazard when they are admitted to a health facility plays a role in preventing the risk of hospital acquired infection. An automated clinical decision support system to detect suspected cases, based on the principle of syndromic surveillance, is being developed at the University of Lyon's Hôpital de la Croix-Rousse. This tool will analyse structured data and narrative reports from computerized emergency department (ED) medical records. The first step consists of developing an application (UrgIndex) which automatically extracts and encodes information found in narrative reports. The purpose of the present article is to describe and evaluate this natural language processing system. Methods: Narrative reports have to be pre-processed before utilizing the French-language medical multi-terminology indexer (ECMT) for standardized encoding. UrgIndex identifies and excludes syntagmas containing a negation and replaces non-standard terms (abbreviations, acronyms, spelling errors...). Then, the phrases are sent to the ECMT through an Internet connection. The indexer's reply, based on Extensible Markup Language, returns codes and literals corresponding to the concepts found in phrases. UrgIndex filters codes corresponding to suspected infections. Recall is defined as the number of relevant processed medical concepts divided by the number of concepts evaluated (coded manually by the medical epidemiologist). Precision is defined as the number of relevant processed concepts divided by the number of concepts proposed by UrgIndex. Recall and precision were assessed for respiratory and cutaneous syndromes. Results: Evaluation of 1,674 processed medical concepts contained in 100 ED medical records (50 for respiratory syndromes and 50 for cutaneous syndromes) showed an overall recall of 85.8% (95% CI: 84.1-87.3). Recall varied from 84.5% for respiratory syndromes to 87.0% for cutaneous syndromes. The most frequent cause of lack of processing was non-recognition of the term by UrgIndex (9.7%). Overall precision was 79.1% (95% CI: 77.3-80.8). It varied from 81.4% for respiratory syndromes to 77.0% for cutaneous syndromes. Conclusions: This study demonstrates the feasibility of and interest in developing an automated method for extracting and encoding medical concepts from ED narrative reports, the first step required for the detection of potentially infectious patients at epidemic risk. PMID:21798029
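The recall and precision definitions given in the Methods can be written out as a short sketch. The function names and the counts in the usage example are hypothetical; the abstract reports only the resulting percentages, not the underlying numerators and denominators.

```python
def recall(relevant_processed: int, evaluated: int) -> float:
    """Relevant processed concepts / concepts coded manually by the expert."""
    if evaluated <= 0:
        raise ValueError("evaluated must be positive")
    return relevant_processed / evaluated


def precision(relevant_processed: int, proposed: int) -> float:
    """Relevant processed concepts / concepts proposed by the system."""
    if proposed <= 0:
        raise ValueError("proposed must be positive")
    return relevant_processed / proposed


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the arithmetic.
    relevant, manually_coded, system_proposed = 170, 200, 210
    print(f"recall    = {recall(relevant, manually_coded):.3f}")
    print(f"precision = {precision(relevant, system_proposed):.3f}")
```

The two measures share a numerator and differ only in the denominator, which is why a system can score higher on one than on the other, as in the reported results.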
Integrating language models into classifiers for BCI communication: a review
NASA Astrophysics Data System (ADS)
Speier, W.; Arnold, C.; Pouratian, N.
2016-06-01
Objective. The present review systematically examines the integration of language models to improve classifier performance in brain-computer interface (BCI) communication systems. Approach. The domain of natural language has been studied extensively in linguistics and has been used in the natural language processing field in applications including information extraction, machine translation, and speech recognition. While these methods have been used for years in traditional augmentative and assistive communication devices, information about the output domain has largely been ignored in BCI communication systems. Over the last few years, BCI communication systems have started to leverage this information through the inclusion of language models. Main results. Although this movement began only recently, studies have already shown the potential of language integration in BCI communication and it has become a growing field in BCI research. BCI communication systems using language models in their classifiers have progressed down several parallel paths, including: word completion; signal classification; integration of process models; dynamic stopping; unsupervised learning; error correction; and evaluation. Significance. Each of these methods has shown significant progress, but they have largely been addressed separately. Combining these methods could use the full potential of language models, yielding further performance improvements. This integration should be a priority as the field works to create a BCI system that meets the needs of the amyotrophic lateral sclerosis population.
Peeling the Onion of Auditory Processing Disorder: A Language/Curricular-Based Perspective
ERIC Educational Resources Information Center
Wallach, Geraldine P.
2011-01-01
Purpose: This article addresses auditory processing disorder (APD) from a language-based perspective. The author asks speech-language pathologists to evaluate the functionality (or not) of APD as a diagnostic category for children and adolescents with language-learning and academic difficulties. Suggestions are offered from a…
Parton, Becky Sue
2006-01-01
In recent years, research has progressed steadily in regard to the use of computers to recognize and render sign language. This paper reviews significant projects in the field beginning with finger-spelling hands such as "Ralph" (robotics), CyberGloves (virtual reality sensors to capture isolated and continuous signs), camera-based projects such as the CopyCat interactive American Sign Language game (computer vision), and sign recognition software (Hidden Markov Modeling and neural network systems). Avatars such as "Tessa" (Text and Sign Support Assistant; three-dimensional imaging) and spoken language to sign language translation systems such as Poland's project entitled "THETOS" (Text into Sign Language Automatic Translator, which operates in Polish; natural language processing) are addressed. The application of this research to education is also explored. The "ICICLE" (Interactive Computer Identification and Correction of Language Errors) project, for example, uses intelligent computer-aided instruction to build a tutorial system for deaf or hard-of-hearing children that analyzes their English writing and makes tailored lessons and recommendations. Finally, the article considers synthesized sign, which is being added to educational material and has the potential to be developed by students themselves.
Liberal Entity Extraction: Rapid Construction of Fine-Grained Entity Typing Systems.
Huang, Lifu; May, Jonathan; Pan, Xiaoman; Ji, Heng; Ren, Xiang; Han, Jiawei; Zhao, Lin; Hendler, James A
2017-03-01
The ability of automatically recognizing and typing entities in natural language without prior knowledge (e.g., predefined entity types) is a major challenge in processing such data. Most existing entity typing systems are limited to certain domains, genres, and languages. In this article, we propose a novel unsupervised entity-typing framework by combining symbolic and distributional semantics. We start from learning three types of representations for each entity mention: general semantic representation, specific context representation, and knowledge representation based on knowledge bases. Then we develop a novel joint hierarchical clustering and linking algorithm to type all mentions using these representations. This framework does not rely on any annotated data, predefined typing schema, or handcrafted features; therefore, it can be quickly adapted to a new domain, genre, and/or language. Experiments on genres (news and discussion forum) show comparable performance with state-of-the-art supervised typing systems trained from a large amount of labeled data. Results on various languages (English, Chinese, Japanese, Hausa, and Yoruba) and domains (general and biomedical) demonstrate the portability of our framework.
Liberal Entity Extraction: Rapid Construction of Fine-Grained Entity Typing Systems
Huang, Lifu; May, Jonathan; Pan, Xiaoman; Ji, Heng; Ren, Xiang; Han, Jiawei; Zhao, Lin; Hendler, James A.
2017-01-01
The ability of automatically recognizing and typing entities in natural language without prior knowledge (e.g., predefined entity types) is a major challenge in processing such data. Most existing entity typing systems are limited to certain domains, genres, and languages. In this article, we propose a novel unsupervised entity-typing framework by combining symbolic and distributional semantics. We start from learning three types of representations for each entity mention: general semantic representation, specific context representation, and knowledge representation based on knowledge bases. Then we develop a novel joint hierarchical clustering and linking algorithm to type all mentions using these representations. This framework does not rely on any annotated data, predefined typing schema, or handcrafted features; therefore, it can be quickly adapted to a new domain, genre, and/or language. Experiments on genres (news and discussion forum) show comparable performance with state-of-the-art supervised typing systems trained from a large amount of labeled data. Results on various languages (English, Chinese, Japanese, Hausa, and Yoruba) and domains (general and biomedical) demonstrate the portability of our framework. PMID:28328252
The digital language of amino acids.
Kurić, L
2007-11-01
The subject of this paper is a digital approach to the investigation of the biochemical basis of genetic processes. The digital mechanism of nucleic acid and protein bio-syntheses, the evolution of biomacromolecules and, especially, the biochemical evolution of genetic language have been analyzed by the application of cybernetic methods, information theory and system theory, respectively. This paper reports the discovery of new methods for developing the new technologies in genetics. It is about the most advanced digital technology which is based on program, cybernetics and informational systems and laws. The results in the practical application of the new technology could be useful in bioinformatics, genetics, biochemistry, medicine and other natural sciences.
ERIC Educational Resources Information Center
Kolodny, Oren; Lotem, Arnon; Edelman, Shimon
2015-01-01
We introduce a set of biologically and computationally motivated design choices for modeling the learning of language, or of other types of sequential, hierarchically structured experience and behavior, and describe an implemented system that conforms to these choices and is capable of unsupervised learning from raw natural-language corpora. Given…
NLPReViz: an interactive tool for natural language processing on clinical text.
Trivedi, Gaurav; Pham, Phuong; Chapman, Wendy W; Hwa, Rebecca; Wiebe, Janyce; Hochheiser, Harry
2018-01-01
The gap between domain experts and natural language processing expertise is a barrier to extracting understanding from clinical text. We describe a prototype tool for interactive review and revision of natural language processing models of binary concepts extracted from clinical notes. We evaluated our prototype in a user study involving 9 physicians, who used our tool to build and revise models for 2 colonoscopy quality variables. We report changes in performance relative to the quantity of feedback. Using initial training sets as small as 10 documents, expert review led to final F1 scores for the "appendiceal-orifice" variable between 0.78 and 0.91 (with improvements ranging from 13.26% to 29.90%). F1 for "biopsy" ranged between 0.88 and 0.94 (-1.52% to 11.74% improvements). The average System Usability Scale score was 70.56. Subjective feedback also suggests possible design improvements. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association.
Second teaching: An exploration of cognitive factors in small group physics learning
NASA Astrophysics Data System (ADS)
Novemsky, Lisa Forman
This inquiry was focused on an exploration of introductory physics teaching. Alan Van Heuvelen's Overview Case Study (OCS) physics was the pedagogical approach involving guided small group problem solving and stressing concepts first, before mathematics. Second teaching is a new pedagogical construct based on Vygotsky's ideas. Structured small group activity follows traditional instruction facilitating learning for non-traditional students. It is a model of structured small group activity designed to follow traditional instruction to facilitate the learning process for students who find a physics optic (way of seeing) and physics language foreign. In informal small group settins students describe, explain, elaborate, test, and defend ideas in their own familiar vernacular as they collaborate in solving problems. Collective wisdom of a collaborative group, somewhat beyond the level for each individual member, is created then recreated through self-correction. Students improved significantly in physics knowledge. In a classroom setting, small groups of non-traditional physics students engaged in second teaching were observed. Written explanations to conceptual physics questions were analyzed. Development of language usage in relationship to introductory physics concept learning was studied. Overall physics learning correlated positively with gains in language clarity thus confirming the hypothesis that language development can be linked with gains in physics knowledge. Males and females were found to be significantly different in this respect. Male gains in language clarity were closely coupled with physics learning whereas female gains in the two measures were not coupled. Physics discourse, particularly in relationship to force and motion, seems to resonate with natural developmentally acquired sex-typical male but not female discourse. 
Thus, for males but not for females, physics learning proceeds in a seamless fashion wherein knowledge gains are coupled with language development. Average frequency in use of the indeterminate pronoun "it" per person decreased. Reification of qualifying terms appeared in the form of a word-form problem. In the process of reifying adjectival properties students may be recapitulating the language-bound history of natural science.
Prediction of psychosis across protocols and risk cohorts using automated language analysis
Corcoran, Cheryl M.; Carrillo, Facundo; Fernández‐Slezak, Diego; Bedi, Gillinder; Klim, Casimir; Javitt, Daniel C.; Bearden, Carrie E.; Cecchi, Guillermo A.
2018-01-01
Language and speech are the primary source of data for psychiatrists to diagnose and treat mental disorders. In psychosis, the very structure of language can be disturbed, including semantic coherence (e.g., derailment and tangentiality) and syntactic complexity (e.g., concreteness). Subtle disturbances in language are evident in schizophrenia even prior to first psychosis onset, during prodromal stages. Using computer‐based natural language processing analyses, we previously showed that, among English‐speaking clinical (e.g., ultra) high‐risk youths, baseline reduction in semantic coherence (the flow of meaning in speech) and in syntactic complexity could predict subsequent psychosis onset with high accuracy. Herein, we aimed to cross‐validate these automated linguistic analytic methods in a second larger risk cohort, also English‐speaking, and to discriminate speech in psychosis from normal speech. We identified an automated machine‐learning speech classifier – comprising decreased semantic coherence, greater variance in that coherence, and reduced usage of possessive pronouns – that had an 83% accuracy in predicting psychosis onset (intra‐protocol), a cross‐validated accuracy of 79% of psychosis onset prediction in the original risk cohort (cross‐protocol), and a 72% accuracy in discriminating the speech of recent‐onset psychosis patients from that of healthy individuals. The classifier was highly correlated with previously identified manual linguistic predictors. Our findings support the utility and validity of automated natural language processing methods to characterize disturbances in semantics and syntax across stages of psychotic disorder. The next steps will be to apply these methods in larger risk cohorts to further test reproducibility, also in languages other than English, and identify sources of variability. 
This technology has the potential to improve prediction of psychosis outcome among at‐risk youths and identify linguistic targets for remediation and preventive intervention. More broadly, automated linguistic analysis can be a powerful tool for diagnosis and treatment across neuropsychiatry. PMID:29352548
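The three classifier features named in the abstract above (semantic coherence, variance in that coherence, and possessive pronoun usage) can be illustrated with a minimal sketch. This is not the authors' pipeline: coherence is approximated here by cosine similarity between bag-of-words vectors of consecutive sentences, and the `POSSESSIVES` word list, the tokenizer, and the function names are all assumptions made for illustration.

```python
import math
import re
from collections import Counter
from statistics import mean, pvariance

# Hypothetical, deliberately crude list of possessive forms (assumption).
POSSESSIVES = {"my", "your", "his", "her", "its", "our", "their",
               "mine", "yours", "hers", "ours", "theirs"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def speech_features(sentences):
    """Return coherence mean/variance and possessive-pronoun rate."""
    vectors = [Counter(tokenize(s)) for s in sentences]
    # Similarity of each sentence to the one that follows it.
    sims = [cosine(u, v) for u, v in zip(vectors, vectors[1:])]
    tokens = [t for s in sentences for t in tokenize(s)]
    possessive_count = sum(t in POSSESSIVES for t in tokens)
    return {
        "mean_coherence": mean(sims) if sims else 0.0,
        "coherence_variance": pvariance(sims) if sims else 0.0,
        "possessive_rate": possessive_count / len(tokens) if tokens else 0.0,
    }
```

A real system would replace the word-count vectors with a trained semantic embedding and feed the resulting features to a classifier; the sketch only shows how the three quantities relate to a transcript.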
The Design of Lexical Database for Indonesian Language
NASA Astrophysics Data System (ADS)
Gunawan, D.; Amalia, A.
2017-03-01
Kamus Besar Bahasa Indonesia (KBBI), an official dictionary of the Indonesian language, provides lists of words with their meanings. The online version can be accessed via the Internet. Another online dictionary is Kateglo. KBBI online and Kateglo provide an interface only for humans; a machine cannot easily retrieve data from these dictionaries without using advanced techniques. However, lexical information about words is required in research and application development related to natural language processing, text mining, information retrieval, or sentiment analysis. To address this requirement, we need to build a lexical database that provides well-defined, structured information about words. A well-known lexical database is WordNet, which provides the relations among words in English. This paper proposes the design of a lexical database for the Indonesian language based on a combination of KBBI 4th edition, Kateglo, and the WordNet structure. Knowledge representation using semantic networks depicts the relations among words and provides a new structure for a lexical database of the Indonesian language. The result of this design can be used as the foundation for building the lexical database for the Indonesian language.
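A WordNet-style semantic network of the kind the abstract describes can be sketched as a set of synsets connected by typed relations. The class names, synset identifiers, and sample entries below are hypothetical and are not taken from the paper's design; the sketch only illustrates how structured, machine-readable lexical data might be stored and queried.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Synset:
    """One sense: a set of lemmas sharing a gloss, plus typed relations."""
    synset_id: str
    lemmas: list
    gloss: str
    relations: dict = field(default_factory=lambda: defaultdict(set))


class LexicalDatabase:
    """Minimal semantic network: synsets are nodes, relations are edges."""

    def __init__(self):
        self.synsets = {}
        self.lemma_index = defaultdict(set)

    def add_synset(self, synset_id, lemmas, gloss):
        self.synsets[synset_id] = Synset(synset_id, list(lemmas), gloss)
        for lemma in lemmas:
            self.lemma_index[lemma].add(synset_id)

    def relate(self, source_id, relation, target_id):
        """Add a directed, typed edge such as 'hypernym'."""
        self.synsets[source_id].relations[relation].add(target_id)

    def lookup(self, lemma):
        """Return all synsets that contain the given lemma."""
        return [self.synsets[s] for s in sorted(self.lemma_index.get(lemma, ()))]

    def hypernym_chain(self, synset_id):
        """Follow 'hypernym' edges upward, guarding against cycles."""
        chain, seen, current = [], {synset_id}, synset_id
        while True:
            parents = sorted(self.synsets[current].relations.get("hypernym", ()))
            if not parents or parents[0] in seen:
                return chain
            current = parents[0]
            seen.add(current)
            chain.append(current)


# Toy entries (illustrative only, not drawn from KBBI or Kateglo).
db = LexicalDatabase()
db.add_synset("n-001", ["kucing"], "cat")
db.add_synset("n-002", ["hewan", "binatang"], "animal")
db.add_synset("n-003", ["makhluk"], "creature")
db.relate("n-001", "hypernym", "n-002")
db.relate("n-002", "hypernym", "n-003")
```

With these toy entries, `db.lookup("binatang")` returns the "animal" synset and `db.hypernym_chain("n-001")` walks from "kucing" up to "makhluk".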
A Guide to IRUS-II Application Development
1989-09-01
Stallard (editors). Research and Development in Natural Language Understanding as Part of the Strategic Computing Program. chapter 3, pages 27-34...Development in Natural Language Processing in the Strategic Computing Program. Computational Linguistics 12(2):132-136. April-June, 1986. [24] Sidner, C.L...assist developers interested in adapting IRUS-II to new application domains. Chapter 2 provides a general introduction and overview. Chapter 3 describes
Velan, Hadas; Frost, Ram
2010-01-01
Recent studies suggest that basic effects which are markers of visual word recognition in Indo-European languages cannot be obtained in Hebrew or in Arabic. Although Hebrew has an alphabetic writing system, just like English, French, or Spanish, a series of studies consistently suggested that simple form-orthographic priming and letter-transposition priming are not found in Hebrew. In four experiments, we tested the hypothesis that this is due to the fact that Semitic words have an underlying structure that constrains the possible alignment of phonemes and their respective letters. The experiments contrasted typical Semitic words, which are root-derived, with Hebrew words of non-Semitic origin, which are morphologically simple and resemble base words in European languages. Using RSVP, TL priming, and form-priming manipulations, we show that Hebrew readers process morphologically simple Hebrew words similarly to the way they process English words. These words indeed reveal the typical form-priming and TL priming effects reported in European languages. In contrast, words with internal structure are processed differently, and require a different code for lexical access. We discuss the implications of these findings for current models of visual word recognition. PMID:21163472
Direct brain recordings reveal hippocampal rhythm underpinnings of language processing.
Piai, Vitória; Anderson, Kristopher L; Lin, Jack J; Dewar, Callum; Parvizi, Josef; Dronkers, Nina F; Knight, Robert T
2016-10-04
Language is classically thought to be supported by perisylvian cortical regions. Here we provide intracranial evidence linking the hippocampal complex to linguistic processing. We used direct recordings from the hippocampal structures to investigate whether theta oscillations, pivotal in memory function, track the amount of contextual linguistic information provided in sentences. Twelve participants heard sentences that were either constrained ("She locked the door with the") or unconstrained ("She walked in here with the") before presentation of the final word ("key"), shown as a picture that participants had to name. Hippocampal theta power increased for constrained relative to unconstrained contexts during sentence processing, preceding picture presentation. Our study implicates hippocampal theta oscillations in a language task using natural language associations that do not require memorization. These findings reveal that the hippocampal complex contributes to language in an active fashion, relating incoming words to stored semantic knowledge, a necessary process in the generation of sentence meaning.
On Religion and Language Evolutions Seen Through Mathematical and Agent Based Models
NASA Astrophysics Data System (ADS)
Ausloos, M.
Religions and languages are social variables, like age, sex, wealth or political opinions, to be studied like any other organizational parameter. In fact, religiosity is one of the most important sociological aspects of populations. Languages are also obvious characteristics of the human species. Religions and languages appear, though they also disappear. All religions and languages evolve and survive when they adapt to developments in society. On the other hand, the number of adherents of a given religion, or the number of persons speaking a language, is not fixed in time, nor in space. Several questions can be raised. E.g. from a "macroscopic" point of view: How many religions/languages exist at a given time? What is their distribution? What is their life time? How do they evolve? From a "microscopic" view point: can one invent agent based models to describe "macroscopic" aspects? Do simple evolution equations exist? How complicated must a model be? These aspects are considered in the present note. Basic evolution equations are outlined and critically, though briefly, discussed. Similarities and differences between religions and languages are summarized. Cases can be illustrated with historical facts and data. It is stressed that characteristic time scales are different. It is emphasized that "external fields" are historically very relevant in the case of religions, rendering the study more "interesting" within a mechanistic approach based on parity and symmetry of clusters concepts. Yet the modern description of human societies through networks in reported simulations is still lacking some mandatory ingredients, i.e. the non scalar nature of the nodes, and the non binary aspects of nodes and links, though for the latter this is already often taken into account, including directions. From an analytical point of view one can consider a population independently of the others. 
It is intuitively accepted, but also found from the statistical analysis of the frequency distribution that an attachment process is the primary cause of the distribution evolution in the number of adepts: usually the initial religion/language is that of the mother. However later on, changes can occur either due to "heterogeneous agent interaction" processes or due to "external field" constraints, - or both. In so doing one has to consider competition-like processes, in a general environment with different rates of reproduction. More general equations are thus proposed for future work.
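The attachment process described above (a newcomer usually inherits the religion or language of the mother, with later changes driven by interactions or an "external field") can be illustrated with a toy agent-based sketch. The model below is an assumption made for illustration, not the evolution equations of the note: the function name, the `switch_prob` parameter, and the uniform switching rule are all hypothetical.

```python
import random
from collections import Counter


def simulate_attachment(n_steps, n_groups=5, switch_prob=0.05, seed=0):
    """Grow a population in which each newcomer inherits a parent's group.

    Choosing a random existing member as the parent makes a group's growth
    proportional to its current size (an attachment process). With
    probability switch_prob the newcomer instead adopts a uniformly random
    group, a crude stand-in for an "external field" (assumption).
    """
    rng = random.Random(seed)
    population = list(range(n_groups))  # one founding member per group
    for _ in range(n_steps):
        parent_group = rng.choice(population)
        if rng.random() < switch_prob:
            child_group = rng.randrange(n_groups)
        else:
            child_group = parent_group
        population.append(child_group)
    return Counter(population)


sizes = simulate_attachment(n_steps=10_000)
```

Inspecting `sizes.most_common()` for different seeds and values of `switch_prob` gives a feel for how unevenly groups can grow under inheritance alone and how switching redistributes adherents.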
Devarajan, Karthik; Wang, Guoli; Ebrahimi, Nader
2015-04-01
Non-negative matrix factorization (NMF) is a powerful machine learning method for decomposing a high-dimensional nonnegative matrix V into the product of two nonnegative matrices, W and H, such that V ∼ WH. It has been shown to have a parts-based, sparse representation of the data. NMF has been successfully applied in a variety of areas such as natural language processing, neuroscience, information retrieval, image processing, speech recognition and computational biology for the analysis and interpretation of large-scale data. There has also been simultaneous development of a related statistical latent class modeling approach, namely, probabilistic latent semantic indexing (PLSI), for analyzing and interpreting co-occurrence count data arising in natural language processing. In this paper, we present a generalized statistical approach to NMF and PLSI based on Renyi's divergence between two non-negative matrices, stemming from the Poisson likelihood. Our approach unifies various competing models and provides a unique theoretical framework for these methods. We propose a unified algorithm for NMF and provide a rigorous proof of monotonicity of multiplicative updates for W and H. In addition, we generalize the relationship between NMF and PLSI within this framework. We demonstrate the applicability and utility of our approach as well as its superior performance relative to existing methods using real-life and simulated document clustering data. PMID:25821345
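To make the idea of multiplicative updates for W and H concrete, here is a minimal NumPy sketch. It implements the classic Lee and Seung updates for the generalized Kullback-Leibler objective, which corresponds to a Poisson likelihood; it is not the Renyi-divergence algorithm proposed in the paper. The function names and the assumption of a strictly positive input matrix V are choices made for this illustration.

```python
import numpy as np


def kl_divergence(V, WH):
    """Generalized KL divergence D(V || WH); assumes V and WH are > 0."""
    return float(np.sum(V * np.log(V / WH) - V + WH))


def nmf_kl(V, rank, n_iter=200, seed=0):
    """Factor V (n x m, strictly positive) as W @ H with W, H >= 0.

    Uses the Lee-Seung multiplicative updates for the KL objective.
    Returns W, H and the objective value after each iteration.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 0.1  # positive initialization
    H = rng.random((rank, m)) + 0.1
    history = []
    for _ in range(n_iter):
        # Update H: scale each entry by a ratio of nonnegative terms.
        WH = W @ H
        H *= (W.T @ (V / WH)) / W.sum(axis=0)[:, None]
        # Update W using the refreshed H.
        WH = W @ H
        W *= ((V / WH) @ H.T) / H.sum(axis=1)[None, :]
        history.append(kl_divergence(V, W @ H))
    return W, H, history
```

Because every update multiplies by a nonnegative ratio, W and H stay nonnegative without any projection step, and the recorded objective values should be non-increasing, which is the monotonicity property the abstract refers to for its generalized updates.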
Naturalistic Language Intervention in Inclusive Environments.
ERIC Educational Resources Information Center
Lowenthal, Barbara
1995-01-01
This review considers use of natural language instruction by early childhood teachers for children with language disabilities in inclusive environments. The following factors are addressed: child-centered approach, family involvement, classroom strategies, activity-based intervention, environmental influences, the function of play, preliteracy…
Domain-general neural correlates of dependency formation: Using complex tones to simulate language.
Brilmayer, Ingmar; Sassenhagen, Jona; Bornkessel-Schlesewsky, Ina; Schlesewsky, Matthias
2017-08-01
There is an ongoing debate whether the P600 event-related potential component following syntactic anomalies reflects syntactic processes per se, or if it is an instance of the P300, a domain-general ERP component associated with attention and cognitive reorientation. A direct comparison of both components is challenging because of the huge discrepancy in experimental designs and stimulus choice between language and 'classic' P300 experiments. In the present study, we develop a new approach to mimic the interplay of sequential position as well as categorical and relational information in natural language syntax (word category and agreement) in a non-linguistic target detection paradigm using musical instruments. Participants were instructed to (covertly) detect target tones which were defined by instrument change and pitch rise between subsequent tones at the last two positions of four-tone sequences. We analysed the EEG using event-related averaging and time-frequency decomposition. Our results show striking similarities to results obtained from linguistic experiments. We found a P300 that showed sensitivity to sequential position and a late positivity sensitive to stimulus type and position. A time-frequency decomposition revealed significant effects of sequential position on the theta band and a significant influence of stimulus type on the delta band. Our results suggest that the detection of non-linguistic targets defined via complex feature conjunctions in the present study and the detection of syntactic anomalies share the same underlying processes: attentional shift and memory based matching processes that act upon multi-feature conjunctions. We discuss the results as supporting domain-general accounts of the P600 during natural language comprehension. Copyright © 2017 Elsevier Ltd. All rights reserved.
Creation of structured documentation templates using Natural Language Processing techniques.
Kashyap, Vipul; Turchin, Alexander; Morin, Laura; Chang, Frank; Li, Qi; Hongsermeier, Tonya
2006-01-01
Structured Clinical Documentation is a fundamental component of the healthcare enterprise, linking both clinical (e.g., electronic health record, clinical decision support) and administrative functions (e.g., evaluation and management coding, billing). One of the challenges in creating good quality documentation templates has been the inability to address specialized clinical disciplines and adapt to local clinical practices. A one-size-fits-all approach leads to poor adoption and inefficiencies in the documentation process. On the other hand, the cost associated with manual generation of documentation templates is significant. Consequently there is a need for at least partial automation of the template generation process. We propose an approach and methodology for the creation of structured documentation templates for diabetes using Natural Language Processing (NLP).
ERIC Educational Resources Information Center
Gallas, Karen
Noting children's natural proclivity to interpret language freely and use that potential to expand and develop as learners, this book offers a new approach to understanding how young children communicate their knowledge of the world and how that understanding can transform the educative process. The book also describes the process of conducting…
Neural Substrates of Interactive Musical Improvisation: An fMRI Study of ‘Trading Fours’ in Jazz
Donnay, Gabriel F.; Rankin, Summer K.; Lopez-Gonzalez, Monica; Jiradejvong, Patpong; Limb, Charles J.
2014-01-01
Interactive generative musical performance provides a suitable model for communication because, like natural linguistic discourse, it involves an exchange of ideas that is unpredictable, collaborative, and emergent. Here we show that interactive improvisation between two musicians is characterized by activation of perisylvian language areas linked to processing of syntactic elements in music, including inferior frontal gyrus and posterior superior temporal gyrus, and deactivation of angular gyrus and supramarginal gyrus, brain structures directly implicated in semantic processing of language. These findings support the hypothesis that musical discourse engages language areas of the brain specialized for processing of syntax but in a manner that is not contingent upon semantic processing. Therefore, we argue that neural regions for syntactic processing are not domain-specific for language but instead may be domain-general for communication. PMID:24586366
ERIC Educational Resources Information Center
Gan, Linda; Chong, Sylvia
1998-01-01
Examined the effectiveness of a year-long integrated language and music program (the Expressive Language and Music Project) to enhance Singaporean kindergartners' English oral-language competency. Found that the natural communicative setting and creative use of resources and activities based on the Orff and Kodaly approaches facilitated language…
The Languages of Communication. A Logical and Psychological Examination.
ERIC Educational Resources Information Center
Gordon, George N.
Two methods of analysis, logical and psychological (or, loosely, aesthetic and functional) are used to investigate the many kinds of languages man uses to communicate, the ways in which these languages operate, and the reasons for communication failures. Based on a discussion of the nature of symbols, since most languages of communication draw…
Consistent model driven architecture
NASA Astrophysics Data System (ADS)
Niepostyn, Stanisław J.
2015-09-01
The goal of the MDA is to produce software systems from abstract models in a way where human interaction is restricted to a minimum. These abstract models are based on the UML language. However, the semantics of UML models is defined in a natural language. Consequently, verification of the consistency of these diagrams is needed in order to identify errors in requirements at an early stage of the development process. The verification of consistency is difficult due to the semi-formal nature of UML diagrams. We propose automatic verification of consistency of the series of UML diagrams originating from abstract models implemented with our consistency rules. This Consistent Model Driven Architecture approach enables us to automatically generate complete workflow applications from consistent and complete models developed from abstract models (e.g. Business Context Diagram). Therefore, our method can be used to check practicability (feasibility) of software architecture models.
Generation of Natural-Language Textual Summaries from Longitudinal Clinical Records.
Goldstein, Ayelet; Shahar, Yuval
2015-01-01
Physicians are required to interpret, abstract and present in free-text large amounts of clinical data in their daily tasks. This is especially true for chronic-disease domains, but holds also in other clinical domains. We have recently developed a prototype system, CliniText, which, given a time-oriented clinical database, and appropriate formal abstraction and summarization knowledge, combines the computational mechanisms of knowledge-based temporal data abstraction, textual summarization, abduction, and natural-language generation techniques, to generate an intelligent textual summary of longitudinal clinical data. We demonstrate our methodology, and the feasibility of providing a free-text summary of longitudinal electronic patient records, by generating summaries in two very different domains - Diabetes Management and Cardiothoracic surgery. In particular, we explain the process of generating a discharge summary of a patient who had undergone a Coronary Artery Bypass Graft operation, and a brief summary of the treatment of a diabetes patient for five years.
NASA Astrophysics Data System (ADS)
Armando, Alessandro; Giunchiglia, Enrico; Ponta, Serena Elisa
We present an approach to the formal specification and automatic analysis of business processes under authorization constraints based on the action language C. The use of C allows for a natural and concise modeling of the business process and the associated security policy and for the automatic analysis of the resulting specification by using the Causal Calculator (CCALC). Our approach improves upon previous work by greatly simplifying the specification step while retaining the ability to perform a fully automatic analysis. To illustrate the effectiveness of the approach we describe its application to a version of a business process taken from the banking domain and use CCALC to determine resource allocation plans complying with the security policy.
Music and language perception: expectations, structural integration, and cognitive sequencing.
Tillmann, Barbara
2012-10-01
Music can be described as sequences of events that are structured in pitch and time. Studying music processing provides insight into how complex event sequences are learned, perceived, and represented by the brain. Given the temporal nature of sound, expectations, structural integration, and cognitive sequencing are central in music perception (i.e., which sounds are most likely to come next and at what moment should they occur?). This paper focuses on similarities in music and language cognition research, showing that music cognition research provides insight into the understanding of not only music processing but also language processing and the processing of other structured stimuli. The hypothesis of shared resources between music and language processing and of domain-general dynamic attention has motivated the development of research to test music as a means to stimulate sensory, cognitive, and motor processes. Copyright © 2012 Cognitive Science Society, Inc.
ERIC Educational Resources Information Center
D'Mello, Sidney K.; Dowell, Nia; Graesser, Arthur
2011-01-01
There is the question of whether learning differs when students speak versus type their responses when interacting with intelligent tutoring systems with natural language dialogues. Theoretical bases exist for three contrasting hypotheses. The "speech facilitation" hypothesis predicts that spoken input will "increase" learning,…
Translation of Japanese Noun Compounds at Super-Function Based MT System
NASA Astrophysics Data System (ADS)
Zhao, Xin; Ren, Fuji; Kuroiwa, Shingo
Noun compounds are a frequently encountered construction in natural language processing (NLP), consisting of a sequence of two or more nouns which functions syntactically as one noun. The translation of noun compounds has become a major issue in Machine Translation (MT) due to their frequency of occurrence and high productivity. In our previous studies on Super-Function Based Machine Translation (SFBMT), we have found that noun compounds are very frequently used and difficult to translate correctly. The overgeneration of noun compounds can be dangerous, as it may introduce ambiguity in the translation. In this paper, we discuss the challenges in handling Japanese noun compounds in an SFBMT system, and we present a shallow method for translating noun compounds by using a word-level translation dictionary and a target-language monolingual corpus.
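The shallow method outlined above, a word-level dictionary combined with a target-language monolingual corpus, can be sketched as candidate generation followed by frequency ranking. This is only an illustration under stated assumptions: the dictionary entries, the corpus counts, the function name, and the choice to keep source word order are all hypothetical and not taken from the paper.

```python
from itertools import product


def translate_compound(nouns, dictionary, corpus_counts):
    """Pick the candidate translation most frequent in the target corpus.

    nouns:         source nouns of the compound, in order
    dictionary:    source noun -> list of candidate target words
    corpus_counts: target phrase -> frequency in a monolingual corpus
    Word order is kept as in the source (assumption for this sketch).
    """
    options = [dictionary[noun] for noun in nouns]
    candidates = [" ".join(words) for words in product(*options)]
    # Unseen candidates count as zero; ties keep the first candidate.
    return max(candidates, key=lambda phrase: corpus_counts.get(phrase, 0))


# Toy data (illustrative only).
toy_dictionary = {
    "機械": ["machine", "mechanism"],
    "翻訳": ["translation", "version"],
}
toy_counts = {"machine translation": 120, "machine version": 3}

best = translate_compound(["機械", "翻訳"], toy_dictionary, toy_counts)
```

Here the four dictionary combinations are scored against the toy counts and "machine translation" is selected. Ranking by corpus frequency is one simple way to suppress overgenerated candidates, since implausible combinations rarely occur in real target-language text.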
Discourses of prejudice in the professions: the case of sign languages
Humphries, Tom; Kushalnagar, Poorna; Mathur, Gaurav; Napoli, Donna Jo; Padden, Carol; Rathmann, Christian; Smith, Scott
2017-01-01
There is no evidence that learning a natural human language is cognitively harmful to children. To the contrary, multilingualism has been argued to be beneficial to all. Nevertheless, many professionals advise the parents of deaf children that their children should not learn a sign language during their early years, despite strong evidence across many research disciplines that sign languages are natural human languages. Their recommendations are based on a combination of misperceptions about (1) the difficulty of learning a sign language, (2) the effects of bilingualism, and particularly bimodalism, (3) the bona fide status of languages that lack a written form, (4) the effects of a sign language on acquiring literacy, (5) the ability of technologies to address the needs of deaf children and (6) the effects that use of a sign language will have on family cohesion. We expose these misperceptions as based in prejudice and urge institutions involved in educating professionals concerned with the healthcare, raising and educating of deaf children to include appropriate information about first language acquisition and the importance of a sign language for deaf children. We further urge such professionals to advise the parents of deaf children properly, which means to strongly advise the introduction of a sign language as soon as hearing loss is detected. PMID:28280057
ERIC Educational Resources Information Center
Marchman, Virginia A.; Fernald, Anne
2008-01-01
The nature of predictive relations between early language and later cognitive function is a fundamental question in research on human cognition. In a longitudinal study assessing speed of language processing in infancy, Fernald, Perfors and Marchman (2006) found that reaction time at 25 months was strongly related to lexical and grammatical…
ERIC Educational Resources Information Center
Crossley, Scott A.
2013-01-01
This paper provides an agenda for replication studies focusing on second language (L2) writing and the use of natural language processing (NLP) tools and machine learning algorithms. Specifically, it introduces a range of the available NLP tools and machine learning algorithms and demonstrates how these could be used to replicate seminal studies…
ERIC Educational Resources Information Center
Ozturk, Mustafa; Yildirim, Ali
2012-01-01
This study aimed to investigate the nature of the induction process of English as a foreign language (EFL) teachers teaching at tertiary level through individual interviews. In order to gather intended data, fifteen novice instructors teaching at four different public universities in Ankara were interviewed on a basis of two criteria: (a) having 1…
ERIC Educational Resources Information Center
Granger, Sylviane; Kraif, Olivier; Ponton, Claude; Antoniadis, Georges; Zampa, Virginie
2007-01-01
Learner corpora, electronic collections of spoken or written data from foreign language learners, offer unparalleled access to many hitherto uncovered aspects of learner language, particularly in their error-tagged format. This article aims to demonstrate the role that the learner corpus can play in CALL, particularly when used in conjunction with…
Natural Language as a Tool for Analyzing the Proving Process: The Case of Plane Geometry Proof
ERIC Educational Resources Information Center
Robotti, Elisabetta
2012-01-01
In the field of human cognition, language plays a special role that is connected directly to thinking and mental development (e.g., Vygotsky, "1938"). Thanks to "verbal thought", language allows humans to go beyond the limits of immediately perceived information, to form concepts and solve complex problems (Luria, "1975"). So, it appears language…
Verbal Counting in Bilingual Contexts
ERIC Educational Resources Information Center
Donevska-Todorova, Ana
2015-01-01
Informal experiences in mathematics often include playful competitions among young children in counting numbers in as many different languages as possible. Can these enjoyable experiences result in excellence in the formal processes of education? This article discusses connections between mathematical achievements and natural languages within…
Culture and biology in the origins of linguistic structure.
Kirby, Simon
2017-02-01
Language is systematically structured at all levels of description, arguably setting it apart from all other instances of communication in nature. In this article, I survey work over the last 20 years that emphasises the contributions of individual learning, cultural transmission, and biological evolution to explaining the structural design features of language. These 3 complex adaptive systems exist in a network of interactions: individual learning biases shape the dynamics of cultural evolution; universal features of linguistic structure arise from this cultural process and form the ultimate linguistic phenotype; the nature of this phenotype affects the fitness landscape for the biological evolution of the language faculty; and in turn this determines individuals' learning bias. Using a combination of computational simulation, laboratory experiments, and comparison with real-world cases of language emergence, I show that linguistic structure emerges as a natural outcome of cultural evolution once certain minimal biological requirements are in place.
ROPE: Recoverable Order-Preserving Embedding of Natural Language
DOE Office of Scientific and Technical Information (OSTI.GOV)
Widemann, David P.; Wang, Eric X.; Thiagarajan, Jayaraman J.
We present a novel Recoverable Order-Preserving Embedding (ROPE) of natural language. ROPE maps natural language passages from sparse concatenated one-hot representations to distributed vector representations of predetermined fixed length. We use Euclidean distance to return search results that are both grammatically and semantically similar. ROPE is based on a series of random projections of distributed word embeddings. We show that our technique typically forms a dictionary with sufficient incoherence such that sparse recovery of the original text is possible. We then show how our embedding allows for efficient and meaningful natural search and retrieval on Microsoft’s COCO dataset and the IMDB Movie Review dataset.
Sentence alignment using feed forward neural network.
Fattah, Mohamed Abdel; Ren, Fuji; Kuroiwa, Shingo
2006-12-01
Parallel corpora have become an essential resource for work in multilingual natural language processing. However, sentence-aligned parallel corpora are more efficient than non-aligned parallel corpora for cross-language information retrieval and machine translation applications. In this paper, we present a new approach to aligning sentences in bilingual parallel corpora based on a feed-forward neural network classifier. A feature parameter vector is extracted from the text pair under consideration. This vector contains text features such as length, punctuation score, and cognate score values. A set of manually prepared training data was used to train the feed-forward neural network. Another set of data was used for testing. Using this new approach, we achieved an error reduction of 60% over the length-based approach when applied to English-Arabic parallel documents. Moreover, this new approach is valid for any language pair and is quite flexible, since the feature parameter vector may contain more, fewer, or different features than those used in our system, such as a lexical match feature.
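The feature parameter vector described in this abstract can be illustrated with a short sketch. The abstract names the three features (length, punctuation score, cognate score) but does not define them, so the formulas below are assumptions chosen for illustration, as are all function names. The resulting vector is the kind of input a feed-forward classifier would be trained on.

```python
from collections import Counter
import string


def length_score(src: str, tgt: str) -> float:
    """Ratio of the shorter to the longer sentence, in characters (assumed definition)."""
    a, b = len(src), len(tgt)
    return min(a, b) / max(a, b) if max(a, b) else 1.0


def punctuation_score(src: str, tgt: str) -> float:
    """Share of punctuation marks that the two sentences have in common (assumed definition)."""
    ps = Counter(ch for ch in src if ch in string.punctuation)
    pt = Counter(ch for ch in tgt if ch in string.punctuation)
    total = max(sum(ps.values()), sum(pt.values()))
    if total == 0:
        return 1.0  # neither sentence has punctuation, so they trivially match
    return sum((ps & pt).values()) / total


def cognate_score(src: str, tgt: str) -> float:
    """Share of tokens (numbers, names, symbols) that appear unchanged on both sides (assumed definition)."""
    ts = {tok.strip(string.punctuation).lower() for tok in src.split()}
    tt = {tok.strip(string.punctuation).lower() for tok in tgt.split()}
    ts.discard("")
    tt.discard("")
    if not ts or not tt:
        return 0.0
    return len(ts & tt) / min(len(ts), len(tt))


def feature_vector(src: str, tgt: str) -> list[float]:
    """Hypothetical three-element feature vector for one candidate sentence pair."""
    return [length_score(src, tgt), punctuation_score(src, tgt), cognate_score(src, tgt)]


vector = feature_vector("Hello, world 2006.", "Bonjour, monde 2006.")
```

Note that `string.punctuation` covers ASCII marks only, so a real English-Arabic system would need to extend the punctuation set (for example with the Arabic comma).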
The bridge of iconicity: from a world of experience to the experience of language.
Perniss, Pamela; Vigliocco, Gabriella
2014-09-19
Iconicity, a resemblance between properties of linguistic form (both in spoken and signed languages) and meaning, has traditionally been considered to be a marginal, irrelevant phenomenon for our understanding of language processing, development and evolution. Rather, the arbitrary and symbolic nature of language has long been taken as a design feature of the human linguistic system. In this paper, we propose an alternative framework in which iconicity in face-to-face communication (spoken and signed) is a powerful vehicle for bridging between language and human sensori-motor experience, and, as such, iconicity provides a key to understanding language evolution, development and processing. In language evolution, iconicity might have played a key role in establishing displacement (the ability of language to refer beyond what is immediately present), which is core to what language does; in ontogenesis, iconicity might play a critical role in supporting referentiality (learning to map linguistic labels to objects, events, etc., in the world), which is core to vocabulary development. Finally, in language processing, iconicity could provide a mechanism to account for how language comes to be embodied (grounded in our sensory and motor systems), which is core to meaningful communication.
The bridge of iconicity: from a world of experience to the experience of language
Perniss, Pamela; Vigliocco, Gabriella
2014-01-01
Iconicity, a resemblance between properties of linguistic form (both in spoken and signed languages) and meaning, has traditionally been considered to be a marginal, irrelevant phenomenon for our understanding of language processing, development and evolution. Rather, the arbitrary and symbolic nature of language has long been taken as a design feature of the human linguistic system. In this paper, we propose an alternative framework in which iconicity in face-to-face communication (spoken and signed) is a powerful vehicle for bridging between language and human sensori-motor experience, and, as such, iconicity provides a key to understanding language evolution, development and processing. In language evolution, iconicity might have played a key role in establishing displacement (the ability of language to refer beyond what is immediately present), which is core to what language does; in ontogenesis, iconicity might play a critical role in supporting referentiality (learning to map linguistic labels to objects, events, etc., in the world), which is core to vocabulary development. Finally, in language processing, iconicity could provide a mechanism to account for how language comes to be embodied (grounded in our sensory and motor systems), which is core to meaningful communication. PMID:25092668
Learning for Semantic Parsing with Kernels under Various Forms of Supervision
2007-08-01
natural language sentences to their formal executable meaning representations. This is a challenging problem and is critical for developing computing...sentences are semantically tractable. This indicates that Geoquery is more challenging domain for semantic parsing than ATIS. In the past, there have been a...Combining parsers. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC-99), pp. 187–194
ERIC Educational Resources Information Center
Tode, Tomoko
2012-01-01
This article examines how learners of English as a foreign language process reduced relative clauses (RRCs) from the perspective of usage-based language learning, which posits that language knowledge forms a hierarchy from item-based knowledge consisting only of entrenched frequent exemplars to more advanced schematized knowledge. Twenty-eight…
An Infinite Game in a Finite Setting: Visualizing Foreign Language Teaching and Learning in America.
ERIC Educational Resources Information Center
Mantero, Miguel
According to contemporary thought and foundational research, this paper presents various elements of the foreign language teaching profession and language learning environment in the United States as either product-driven or process-based. It is argued that a process-based approach to language teaching and learning benefits not only second…
Tanana, Michael; Hallgren, Kevin A; Imel, Zac E; Atkins, David C; Srikumar, Vivek
2016-06-01
Motivational interviewing (MI) is an efficacious treatment for substance use disorders and other problem behaviors. Studies on MI fidelity and mechanisms of change typically use human raters to code therapy sessions, which requires considerable time, training, and financial costs. Natural language processing techniques have recently been utilized for coding MI sessions using machine learning techniques, rather than human coders, and preliminary results have suggested these methods hold promise. The current study extends this previous work by introducing two natural language processing models for automatically coding MI sessions via computer. The two models differ in the way they semantically represent session content, utilizing either 1) simple discrete sentence features (DSF model) or 2) more complex recursive neural networks (RNN model). Utterance- and session-level predictions from these models were compared to ratings provided by human coders using a large sample of MI sessions (N=341 sessions; 78,977 clinician and client talk turns) from 6 MI studies. Results show that the DSF model generally had slightly better performance compared to the RNN model. The DSF model had "good" or higher utterance-level agreement with human coders (Cohen's kappa>0.60) for open and closed questions, affirm, giving information, and follow/neutral (all therapist codes); considerably higher agreement was obtained for session-level indices, and many estimates were competitive with human-to-human agreement. However, there was poor agreement for client change talk, client sustain talk, and therapist MI-inconsistent behaviors. Natural language processing methods provide accurate representations of human-derived behavioral codes and could offer substantial improvements to the efficiency and scale at which MI mechanisms of change research and fidelity monitoring are conducted. Copyright © 2016 Elsevier Inc. All rights reserved.
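The agreement statistic reported in this abstract, Cohen's kappa, corrects raw agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch of that standard formula. The function name and the toy labels are illustrative assumptions and are not taken from the study's data or code.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies, summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy example: hypothetical utterance codes from a human coder and a model.
human = ["open_question", "open_question", "affirm", "affirm"]
model = ["open_question", "open_question", "affirm", "open_question"]
kappa = cohens_kappa(human, model)
```

In this toy case the raters agree on 3 of 4 items (p_o = 0.75) and chance agreement is p_e = 0.5, so kappa is 0.5, which would fall below the "good" threshold of 0.60 cited in the abstract.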
A Research on Second Language Acquisition and College English Teaching
ERIC Educational Resources Information Center
Li, Changyu
2009-01-01
It was in the 1970s that American linguist S.D. Krashen created the theory of "language acquisition". The theories on second language acquisition were proposed based on the study of the second language acquisition process and its rules. Here, the second language acquisition process refers to the process in which a learner with the…
ERIC Educational Resources Information Center
Koffi, Phil Yao
A study suggests that the nature of linguistic borrowing in a group of 14 African languages termed Togo remnant languages--Basila, Lelemie (Buem), Aogba, Adele, Likpe, Santrokofi, Akpafu-Lolobi, Avatime, Nyangbo-Tafi, Bowili, Aklo, Kposo, Kebu, Animere--is similar to that of the Akebu language. Analysis focuses on the origins and itineraries of…
Alor-Hernández, Giner; Sánchez-Cervantes, José Luis; Juárez-Martínez, Ulises; Posada-Gómez, Rubén; Cortes-Robles, Guillermo; Aguilar-Laserre, Alberto
2012-03-01
Emergency healthcare is one of the emerging application domains for information services, and it requires highly multimodal information services. The time consumed by the pre-hospital emergency process is critical. Therefore, minimizing the time required to provide primary care and consultation to patients is one of the crucial factors in improving healthcare delivery in emergency situations. In this sense, the dynamic location of medical entities is a complex, time-consuming process, and it can be critical when a person requires medical attention. This work presents a multimodal location-based system for locating and assigning medical entities called ITOHealth. ITOHealth provides a multimodal, middleware-oriented integrated architecture built on a service-oriented architecture in order to deliver information about medical entities to mobile devices and web browsers through enriched interfaces with multimodality support. ITOHealth's multimodality is based on the use of Microsoft Agent Characters, the integration of natural language voice into the characters, and multi-language and multi-character support, which provides an advantage for users with visual impairments.
ERIC Educational Resources Information Center
Norris, John M.
2016-01-01
Language program evaluation is a pragmatic mode of inquiry that illuminates the complex nature of language-related interventions of various kinds, the factors that foster or constrain them, and the consequences that ensue. Program evaluation enables a variety of evidence-based decisions and actions, from designing programs and implementing…
BioC: a minimalist approach to interoperability for biomedical text processing
Comeau, Donald C.; Islamaj Doğan, Rezarta; Ciccarese, Paolo; Cohen, Kevin Bretonnel; Krallinger, Martin; Leitner, Florian; Lu, Zhiyong; Peng, Yifan; Rinaldi, Fabio; Torii, Manabu; Valencia, Alfonso; Verspoor, Karin; Wiegers, Thomas C.; Wu, Cathy H.; Wilbur, W. John
2013-01-01
A vast amount of scientific information is encoded in natural language text, and the quantity of such text has become so great that it is no longer economically feasible to have a human as the first step in the search process. Natural language processing and text mining tools have become essential to facilitate the search for and extraction of information from text. This has led to vigorous research efforts to create useful tools and to create humanly labeled text corpora, which can be used to improve such tools. To encourage combining these efforts into larger, more powerful and more capable systems, a common interchange format to represent, store and exchange the data in a simple manner between different language processing systems and text mining tools is highly desirable. Here we propose a simple extensible mark-up language format to share text documents and annotations. The proposed annotation approach allows a large number of different annotations to be represented including sentences, tokens, parts of speech, named entities such as genes or diseases and relationships between named entities. In addition, we provide simple code to hold this data, read it from and write it back to extensible mark-up language files and perform some sample processing. We also describe completed as well as ongoing work to apply the approach in several directions. Code and data are available at http://bioc.sourceforge.net/. Database URL: http://bioc.sourceforge.net/ PMID:24048470
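To make the idea of a simple XML interchange format for text and annotations concrete, here is a small round-trip sketch using Python's standard library. The element layout (collection, document, passage, annotation, infon, location) is a simplified, BioC-inspired assumption and is not a validated instance of the BioC DTD. The helper names and the example sentence are likewise hypothetical. The code distributed by the BioC project itself should be preferred for real work.

```python
import xml.etree.ElementTree as ET


def build_collection(doc_id: str, text: str, annotations: list[tuple[int, int, str]]) -> ET.Element:
    """Build a simplified, BioC-like collection holding one document with one passage."""
    collection = ET.Element("collection")
    ET.SubElement(collection, "source").text = "example"  # placeholder value
    document = ET.SubElement(collection, "document")
    ET.SubElement(document, "id").text = doc_id
    passage = ET.SubElement(document, "passage")
    ET.SubElement(passage, "offset").text = "0"
    ET.SubElement(passage, "text").text = text
    for index, (start, length, label) in enumerate(annotations):
        annotation = ET.SubElement(passage, "annotation", id=str(index))
        ET.SubElement(annotation, "infon", key="type").text = label
        ET.SubElement(annotation, "location", offset=str(start), length=str(length))
        ET.SubElement(annotation, "text").text = text[start:start + length]
    return collection


def read_annotations(xml_string: str) -> list[tuple[str, int, int, str]]:
    """Read (type, offset, length, covered text) for every annotation in the XML."""
    root = ET.fromstring(xml_string)
    results = []
    for annotation in root.iter("annotation"):
        location = annotation.find("location")
        results.append((
            annotation.find("infon").text,
            int(location.get("offset")),
            int(location.get("length")),
            annotation.find("text").text,
        ))
    return results


sentence = "BRCA1 mutations raise breast cancer risk."
xml_string = ET.tostring(
    build_collection("doc1", sentence, [(0, 5, "gene"), (22, 13, "disease")]),
    encoding="unicode",
)
round_trip = read_annotations(xml_string)
```

Storing offsets and lengths alongside the covered text, as done here, lets a consumer verify that an annotation still lines up with the passage text after the file has passed through several tools.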
ERIC Educational Resources Information Center
Dominey, Peter Ford; Inui, Toshio; Hoen, Michel
2009-01-01
A central issue in cognitive neuroscience today concerns how distributed neural networks in the brain that are used in language learning and processing can be involved in non-linguistic cognitive sequence learning. This issue is informed by a wealth of functional neurophysiology studies of sentence comprehension, along with a number of recent…
ERIC Educational Resources Information Center
North, Brian; Piccardo, Enrica
2016-01-01
The notion of mediation has been the object of growing interest in second language education in recent years. The increasing awareness of the complex nature of the process of learning--and teaching--stretches our collective reflection towards less explored areas. In mediation, the immediate focus is on the role of language in processes like…
Spanish language generation engine to enhance the syntactic quality of AAC systems
NASA Astrophysics Data System (ADS)
Narváez A., Cristian; Sastoque H., Sebastián.; Iregui G., Marcela
2015-12-01
People with Complex Communication Needs (CCN) face difficulties in communicating their ideas, feelings and needs. Augmentative and Alternative Communication (AAC) approaches aim to provide support to enhance socialization of these individuals. However, there are many limitations in current applications related to system operation, target scenarios and language consistency. This work presents an AAC approach to enhance produced messages by applying elements of Natural Language Generation. Specifically, a Spanish language engine, composed of a grammar ontology and a set of linguistic rules, is proposed to improve the naturalness of the communication process when persons with CCN tell stories about their daily activities to non-disabled receivers. The assessment of the proposed method confirms the validity of the model to improve message quality.
Right Lateral Cerebellum Represents Linguistic Predictability.
Lesage, Elise; Hansen, Peter C; Miall, R Chris
2017-06-28
Mounting evidence indicates that posterolateral portions of the cerebellum (right Crus I/II) contribute to language processing, but the nature of this role remains unclear. Based on a well-supported theory of cerebellar motor function, which ascribes to the cerebellum a role in short-term prediction through internal modeling, we hypothesize that right cerebellar Crus I/II supports prediction of upcoming sentence content. We tested this hypothesis using event-related fMRI in male and female human subjects by manipulating the predictability of written sentences. Our design controlled for motor planning and execution, as well as for linguistic features and working memory load; it also allowed separation of the prediction interval from the presentation of the final sentence item. In addition, three further fMRI tasks captured semantic, phonological, and orthographic processing to shed light on the nature of the information processed. As hypothesized, activity in right posterolateral cerebellum correlated with the predictability of the upcoming target word. This cerebellar region also responded to prediction error during the outcome of the trial. Further, this region was engaged in phonological, but not semantic or orthographic, processing. This is the first imaging study to demonstrate a right cerebellar contribution in language comprehension independently from motor, cognitive, and linguistic confounds. These results complement our work using other methodologies showing cerebellar engagement in linguistic prediction and suggest that internal modeling of phonological representations aids language production and comprehension. SIGNIFICANCE STATEMENT The cerebellum is traditionally seen as a motor structure that allows for smooth movement by predicting upcoming signals. However, the cerebellum is also consistently implicated in nonmotor functions such as language and working memory. 
Using fMRI, we identify a cerebellar area that is active when words are predicted and when these predictions are violated. This area is active in a separate task that requires phonological processing, but not in tasks that require semantic or visuospatial processing. Our results support the idea of prediction as a unifying cerebellar function in motor and nonmotor domains. We provide new insights by linking the cerebellar role in prediction to its role in verbal working memory, suggesting that these predictions involve phonological processing. Copyright © 2017 Lesage et al.
Voice-enabled Knowledge Engine using Flood Ontology and Natural Language Processing
NASA Astrophysics Data System (ADS)
Sermet, M. Y.; Demir, I.; Krajewski, W. F.
2015-12-01
The Iowa Flood Information System (IFIS) is a web-based platform developed by the Iowa Flood Center (IFC) to provide access to flood inundation maps, real-time flood conditions, flood forecasts, flood-related data, information and interactive visualizations for communities in Iowa. The IFIS is designed for use by the general public, often people with no domain knowledge and a limited general science background. To improve communication with such an audience, we have introduced a voice-enabled knowledge engine on flood-related issues in IFIS. Instead of requiring users to navigate the many features and interfaces of the information system and web-based sources, the system provides dynamic computations based on a collection of built-in data, analysis, and methods. The IFIS Knowledge Engine connects to real-time stream gauges, in-house data sources, and analysis and visualization tools to answer natural language questions. Our goal is the systematization of data and modeling results on flood-related issues in Iowa, and to provide an interface for definitive answers to factual queries. The goal of the knowledge engine is to make all flood-related knowledge in Iowa easily accessible to everyone, and to support voice-enabled natural language input. We aim to integrate and curate all flood-related data, implement analytical and visualization tools, and make it possible to compute answers from questions. The IFIS explicitly implements analytical methods and models as algorithms, and curates all flood-related data and resources so that all these resources are computable. The IFIS Knowledge Engine computes the answer by deriving it from its computational knowledge base. The knowledge engine processes the statement, accesses the data warehouse, runs complex database queries on the server side and returns outputs in various formats. This presentation provides an overview of the IFIS Knowledge Engine, its unique information interface and functionality as an educational tool, and discusses the future plans for providing knowledge on flood-related issues and resources. The IFIS Knowledge Engine provides an alternative access method to the comprehensive set of tools and data resources available in IFIS. The current implementation of the system accepts free-form input and supports voice recognition within browser and mobile applications.
Pakulak, Eric; Neville, Helen J.
2010-01-01
While anecdotally there appear to be differences in the way native speakers use and comprehend their native language, most empirical investigations of language processing study university students and none have studied differences in language proficiency which may be independent of resource limitations such as working memory span. We examined differences in language proficiency in adult monolingual native speakers of English using an event-related potential (ERP) paradigm. ERPs were recorded to insertion phrase structure violations in naturally spoken English sentences. Participants recruited from a wide spectrum of society were given standardized measures of English language proficiency, and two complementary ERP analyses were performed. In between-groups analyses, participants were divided, based on standardized proficiency scores, into Lower Proficiency (LP) and Higher Proficiency (HP) groups. Compared to LP participants, HP participants showed an early anterior negativity that was more focal, both spatially and temporally, and a larger and more widely distributed positivity (P600) to violations. In correlational analyses, we utilized a wide spectrum of proficiency scores to examine the degree to which individual proficiency scores correlated with individual neural responses to syntactic violations in regions and time windows identified in the between-group analyses. This approach also employed partial correlation analyses to control for possible confounding variables. These analyses provided evidence for the effects of proficiency that converged with the between-groups analyses. These results suggest that adult monolingual native speakers of English who vary in language proficiency differ in the recruitment of syntactic processes that are hypothesized to be at least in part automatic as well as of those thought to be more controlled. 
These results also suggest that in order to fully characterize neural organization for language in native speakers it is necessary to include participants of varying proficiency. PMID:19925188
ERIC Educational Resources Information Center
Harbusch, Karin; Cameran, Christel-Joy; Härtel, Johannes
2014-01-01
We present a new feedback strategy implemented in a natural language generation-based e-learning system for German as a second language (L2). Although the system recognizes a large proportion of the grammar errors in learner-produced written sentences, its automatically generated feedback only addresses errors against rules that are relevant at…
Drawing Dynamic Geometry Figures Online with Natural Language for Junior High School Geometry
ERIC Educational Resources Information Center
Wong, Wing-Kwong; Yin, Sheng-Kai; Yang, Chang-Zhe
2012-01-01
This paper presents a tool for drawing dynamic geometric figures by understanding the texts of geometry problems. With the tool, teachers and students can construct dynamic geometric figures on a web page by inputting a geometry problem in natural language. First we need to build the knowledge base for understanding geometry problems. With the…
Teaching the Tacit Knowledge of Programming to Novices with Natural Language Tutoring
ERIC Educational Resources Information Center
Lane, H. Chad; VanLehn, Kurt
2005-01-01
For beginning programmers, inadequate problem solving and planning skills are among the most salient of their weaknesses. In this paper, we test the efficacy of natural language tutoring to teach and scaffold acquisition of these skills. We describe ProPL (Pro-PELL), a dialogue-based intelligent tutoring system that elicits goal decompositions and…
Named Entity Recognition in a Hungarian NL Based QA System
NASA Astrophysics Data System (ADS)
Tikkl, Domonkos; Szidarovszky, P. Ferenc; Kardkovacs, Zsolt T.; Magyar, Gábor
In the WoW project, our purpose is to create a complex search interface with the following features: search in the deep web content of contracted partners' databases, processing of Hungarian natural language (NL) questions and their transformation into SQL queries for database access, and image search supported by a visual thesaurus that describes the visual content of images in a structured form (also in Hungarian). This paper primarily focuses on a particular problem of the question processing task: entity recognition. Before going into details, we give a short overview of the project's aims.
Corina, David P; Knapp, Heather Patterson
2008-12-01
In the quest to further understand the neural underpinning of human communication, researchers have turned to studies of naturally occurring signed languages used in Deaf communities. The comparison of the commonalities and differences between spoken and signed languages provides an opportunity to determine core neural systems responsible for linguistic communication independent of the modality in which a language is expressed. The present article examines such studies, and in addition asks what we can learn about human languages by contrasting formal visual-gestural linguistic systems (signed languages) with more general human action perception. To understand visual language perception, it is important to distinguish the demands of general human motion processing from the highly task-dependent demands associated with extracting linguistic meaning from arbitrary, conventionalized gestures. This endeavor is particularly important because theorists have suggested close homologies between perception and production of actions and functions of human language and social communication. We review recent behavioral, functional imaging, and neuropsychological studies that explore dissociations between the processing of human actions and signed languages. These data suggest incomplete overlap between the mirror-neuron systems proposed to mediate human action and language.
Dynamic changes in network activations characterize early learning of a natural language.
Plante, Elena; Patterson, Dianne; Dailey, Natalie S; Kyle, R Almyrde; Fridriksson, Julius
2014-09-01
Those who are initially exposed to an unfamiliar language have difficulty separating running speech into individual words, but over time will recognize both words and the grammatical structure of the language. Behavioral studies have used artificial languages to demonstrate that humans are sensitive to distributional information in language input, and can use this information to discover the structure of that language. This is done without direct instruction and learning occurs over the course of minutes rather than days or months. Moreover, learners may attend to different aspects of the language input as their own learning progresses. Here, we examine processing associated with the early stages of exposure to a natural language, using fMRI. Listeners were exposed to an unfamiliar language (Icelandic) while undergoing four consecutive fMRI scans. The Icelandic stimuli were constrained in ways known to produce rapid learning of aspects of language structure. After approximately 4 min of exposure to the Icelandic stimuli, participants began to differentiate between correct and incorrect sentences at above chance levels, with significant improvement between the first and last scan. An independent component analysis of the imaging data revealed four task-related components, two of which were associated with behavioral performance early in the experiment, and two with performance later in the experiment. This outcome suggests dynamic changes occur in the recruitment of neural resources even within the initial period of exposure to an unfamiliar natural language. Copyright © 2014 Elsevier Ltd. All rights reserved.
Linguistically informed digital fingerprints for text
NASA Astrophysics Data System (ADS)
Uzuner, Özlem
2006-02-01
Digital fingerprinting, watermarking, and tracking technologies have gained importance in recent years in response to growing problems such as digital copyright infringement. While fingerprints and watermarks can be generated in many different ways, use of natural language processing for these purposes has so far been limited. Measuring similarity of literary works for automatic copyright infringement detection requires identifying and comparing creative expression of content in documents. In this paper, we present a linguistic approach to automatically fingerprinting novels based on their expression of content. We use natural language processing techniques to generate "expression fingerprints". These fingerprints consist of both syntactic and semantic elements of language, i.e., syntactic and semantic elements of expression. Our experiments indicate that syntactic and semantic elements of expression enable accurate identification of novels and their paraphrases, providing a significant improvement over techniques used in text classification literature for automatic copy recognition. We show that these elements of expression can be used to fingerprint, label, or watermark works; they represent features that are essential to the character of works and that remain fairly consistent in the works even when works are paraphrased. These features can be directly extracted from the contents of the works on demand and can be used to recognize works that would not be correctly identified either in the absence of pre-existing labels or by verbatim-copy detectors.
Truth and probability in evolutionary games
NASA Astrophysics Data System (ADS)
Barrett, Jeffrey A.
2017-01-01
This paper concerns two composite Lewis-Skyrms signalling games. Each consists in a base game that evolves a language descriptive of nature and a metagame that coevolves a language descriptive of the base game and its evolving language. The first composite game shows how a pragmatic notion of truth might coevolve with a simple descriptive language. The second shows how a pragmatic notion of probability might similarly coevolve. Each of these pragmatic notions is characterised by the particular game and role that it comes to play in the game.
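For readers unfamiliar with Lewis-Skyrms signalling games, the sketch below simulates only the basic two-state, two-signal, two-act game with simple urn-style reinforcement learning. It is an illustrative assumption about the kind of base game involved and does not reproduce the composite games or the metagame studied in the paper. All names and parameters are hypothetical.

```python
import random


def play_signalling_game(rounds: int, seed: int = 0):
    """Simulate a 2-state / 2-signal / 2-act signalling game with urn reinforcement."""
    rng = random.Random(seed)
    # sender[state][signal] and receiver[signal][act] hold urn weights,
    # starting with one "ball" of each type.
    sender = [[1.0, 1.0], [1.0, 1.0]]
    receiver = [[1.0, 1.0], [1.0, 1.0]]
    successes = 0
    for _ in range(rounds):
        state = rng.randrange(2)  # nature picks a state uniformly at random
        signal = rng.choices([0, 1], weights=sender[state])[0]
        act = rng.choices([0, 1], weights=receiver[signal])[0]
        if act == state:
            # Success: reinforce the choices that led to it.
            sender[state][signal] += 1.0
            receiver[signal][act] += 1.0
            successes += 1
    return sender, receiver, successes


sender_urns, receiver_urns, wins = play_signalling_game(rounds=5000, seed=1)
success_rate = wins / 5000
```

Because only successful plays are reinforced, the urn weights tend to drift toward a convention in which each state is tied to its own signal, which is the sense in which a simple descriptive language "evolves" in such a game.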
Natural language acquisition in large scale neural semantic networks
NASA Astrophysics Data System (ADS)
Ealey, Douglas
This thesis puts forward the view that a purely signal-based approach to natural language processing is both plausible and desirable. By questioning the veracity of symbolic representations of meaning, it argues for a unified, non-symbolic model of knowledge representation that is both biologically plausible and, potentially, highly efficient. Processes to generate a grounded, neural form of this model, dubbed the semantic filter, are discussed. The combined effects of local neural organisation, coincident with perceptual maturation, are used to hypothesise its nature. This theoretical model is then validated in light of a number of fundamental neurological constraints and milestones. The mechanisms of semantic and episodic development that the model predicts are then used to explain linguistic properties, such as propositions and verbs, syntax and scripting. To mimic the growth of locally densely connected structures upon an unbounded neural substrate, a system is developed that can grow arbitrarily large, data-dependent structures composed of individual self-organising neural networks. The maturational nature of the data used results in a structure in which the perception of concepts is refined by the networks, but demarcated by subsequent structure. As a consequence, the overall structure shows significant memory and computational benefits, as predicted by the cognitive and neural models. Furthermore, the localised nature of the neural architecture also avoids the increasing error sensitivity and redundancy of traditional systems as the training domain grows. The semantic and episodic filters have been demonstrated to perform as well as, or better than, more specialist networks, whilst using significantly larger vocabularies, more complex sentence forms and more natural corpora.
A Semantic Parsing Method for Mapping Clinical Questions to Logical Forms
Roberts, Kirk; Patra, Braja Gopal
2017-01-01
This paper presents a method for converting natural language questions about structured data in the electronic health record (EHR) into logical forms. The logical forms can then subsequently be converted to EHR-dependent structured queries. The natural language processing task, known as semantic parsing, has the potential to convert questions to logical forms with extremely high precision, resulting in a system that is usable and trusted by clinicians for real-time use in clinical settings. We propose a hybrid semantic parsing method, combining rule-based methods with a machine learning-based classifier. The overall semantic parsing precision on a set of 212 questions is 95.6%. The parser’s rules furthermore allow it to “know what it does not know”, enabling the system to indicate when unknown terms prevent it from understanding the question’s full logical structure. When combined with a module for converting a logical form into an EHR-dependent query, this high-precision approach allows for a question answering system to provide a user with a single, verifiably correct answer. PMID:29854217
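The idea of a parser that "knows what it does not know" can be illustrated with a toy sketch. This is not the authors' method: the lexicon, the operator names, the logical-form syntax, and the function `parse_question` are all hypothetical, and the machine learning-based classifier of the hybrid approach is omitted. The sketch only shows how a rule-based step can refuse to emit a logical form when a term falls outside its vocabulary.

```python
import re

# Hypothetical toy vocabulary (not taken from the paper).
FUNCTION_WORDS = {"what", "is", "was", "the", "patient's", "her", "his"}
OPERATORS = {"latest": "latest", "highest": "max", "lowest": "min"}
CONCEPTS = {"glucose": "lab(glucose)", "creatinine": "lab(creatinine)"}

def parse_question(question):
    """Map a question to a toy logical form, or report the unknown terms."""
    tokens = re.findall(r"[a-z0-9']+", question.lower())
    operator, concept, unknown = None, None, []
    for tok in tokens:
        if tok in FUNCTION_WORDS:
            continue
        if tok in OPERATORS:
            operator = OPERATORS[tok]
        elif tok in CONCEPTS:
            concept = CONCEPTS[tok]
        else:
            unknown.append(tok)  # term the rules cannot interpret
    # Emit a logical form only when every content word was understood.
    complete = not unknown and operator is not None and concept is not None
    form = f"{operator}({concept})" if complete else None
    return {"logical_form": form, "unknown_terms": unknown}

print(parse_question("What is the latest glucose?"))
print(parse_question("What is the latest troponin?"))
```

In this sketch the second question yields no logical form and lists `troponin` as an unknown term, which is the behaviour a high-precision system would surface to the user instead of guessing.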
Machine Learning and Radiology
Wang, Shijun; Summers, Ronald M.
2012-01-01
In this paper, we give a short introduction to machine learning and survey its applications in radiology. We focused on six categories of applications in radiology: medical image segmentation, registration, computer aided detection and diagnosis, brain function or activity analysis and neurological disease diagnosis from fMR images, content-based image retrieval systems for CT or MRI images, and text analysis of radiology reports using natural language processing (NLP) and natural language understanding (NLU). This survey shows that machine learning plays a key role in many radiology applications. Machine learning identifies complex patterns automatically and helps radiologists make intelligent decisions on radiology data such as conventional radiographs, CT, MRI, and PET images and radiology reports. In many applications, the performance of machine learning-based automatic detection and diagnosis systems has been shown to be comparable to that of a well-trained and experienced radiologist. Technology development in machine learning and radiology will benefit from each other in the long run. Key contributions and common characteristics of machine learning techniques in radiology are discussed. We also discuss the problem of translating machine learning applications to the radiology clinical setting, including advantages and potential barriers. PMID:22465077
Metaphor Identification in Large Texts Corpora
Neuman, Yair; Assaf, Dan; Cohen, Yohai; Last, Mark; Argamon, Shlomo; Howard, Newton; Frieder, Ophir
2013-01-01
Identifying metaphorical language-use (e.g., sweet child) is one of the challenges facing natural language processing. This paper describes three novel algorithms for automatic metaphor identification. The algorithms are variations of the same core algorithm. We evaluate the algorithms on two corpora of Reuters and the New York Times articles. The paper presents the most comprehensive study of metaphor identification in terms of scope of metaphorical phrases and annotated corpora size. Algorithms’ performance in identifying linguistic phrases as metaphorical or literal has been compared to human judgment. Overall, the algorithms outperform the state-of-the-art algorithm with 71% precision and 27% averaged improvement in prediction over the base-rate of metaphors in the corpus. PMID:23658625
Olsher, Daniel
2014-10-01
Noise-resistant and nuanced, COGBASE makes 10 million pieces of commonsense data and a host of novel reasoning algorithms available via a family of semantically-driven prior probability distributions. Machine learning, Big Data, natural language understanding/processing, and social AI can draw on COGBASE to determine lexical semantics, infer goals and interests, simulate emotion and affect, calculate document gists and topic models, and link commonsense knowledge to domain models and social, spatial, cultural, and psychological data. COGBASE is especially ideal for social Big Data, which tends to involve highly implicit contexts, cognitive artifacts, difficult-to-parse texts, and deep domain knowledge dependencies. Copyright © 2014 Elsevier Ltd. All rights reserved.
Natural language generation of surgical procedures.
Wagner, J C; Rogers, J E; Baud, R H; Scherrer, J R
1999-01-01
A number of compositional Medical Concept Representation systems are being developed. Although these provide for a detailed conceptual representation of the underlying information, they have to be translated back to natural language for use by end-users and applications. The GALEN programme has been developing one such representation and we report here on a tool developed to generate natural language phrases from the GALEN conceptual representations. This tool can be adapted to different source modelling schemes and to different destination languages or sublanguages of a domain. It is based on a multilingual approach to natural language generation, realised through a clean separation of the domain model from the linguistic model and their link by well defined structures. Specific knowledge structures and operations have been developed for bridging between the modelling 'style' of the conceptual representation and natural language. Using the example of the scheme developed for modelling surgical operative procedures within the GALEN-IN-USE project, we show how the generator is adapted to such a scheme. The basic characteristics of the surgical procedures scheme are presented together with the basic principles of the generation tool. Using worked examples, we discuss the transformation operations which change the initial source representation into a form which can more directly be translated to a given natural language. In particular, the linguistic knowledge which has to be introduced, such as definitions of concepts and relationships, is described. We explain the overall generator strategy and how particular transformation operations are triggered by language-dependent and conceptual parameters. Results are shown for generated French phrases corresponding to surgical procedures from the urology domain.
Music, neurology, and psychology in the nineteenth century.
Graziano, Amy B; Johnson, Julene K
2015-01-01
This chapter examines connections between research in music, neurology, and psychology during the late-nineteenth century. Researchers in all three disciplines investigated how music is processed by the brain. Psychologists and comparative musicologists, such as Carl Stumpf, thought in terms of multiple levels of sensory processing and mental representation. Early thinking about music processing can be linked to the start of Gestalt psychology. Neurologists such as August Knoblauch also discussed multiple levels of music processing, basing speculation on ideas about language processing. Knoblauch and others attempted to localize music function in the brain. Other neurologists, such as John Hughlings Jackson, discussed a dissociation between music as an emotional system and language as an intellectual system. Richard Wallaschek seems to have been the only one from the late-nineteenth century to synthesize ideas from musicology, psychology, and neurology. He used ideas from psychology to explain music processing and audience reactions and also used case studies from neurology to support arguments about the nature of music. Understanding the history of this research sheds light on the development of all three disciplines: musicology, neurology, and psychology. © 2015 Elsevier B.V. All rights reserved.
On a Possible Relationship between Linguistic Expertise and EEG Gamma Band Phase Synchrony
Reiterer, Susanne; Pereda, Ernesto; Bhattacharya, Joydeep
2011-01-01
Recent research has shown that extensive training in and exposure to a second language can modify the language organization in the brain by causing both structural and functional changes. However, it is not yet known how these changes are manifested by the dynamic brain oscillations and synchronization patterns subserving the language networks. In search of synchronization correlates of proficiency and expertise in second language acquisition, multivariate EEG signals were recorded from 44 high and low proficiency bilinguals during processing of natural language in their first and second languages. Gamma band (30–45 Hz) phase synchronization (PS) was calculated mainly by two recently developed methods: coarse-graining of Markov chains (estimating global phase synchrony, measuring the degree of PS between one electrode and all other electrodes), and phase lag index (PLI; estimating bivariate phase synchrony, measuring the degree of PS between a pair of electrodes). On comparing second versus first language processing, global PS by coarse-graining Markov chains indicated that processing of the second language needs significantly higher synchronization strength than the first language. On comparing the proficiency groups, bivariate PS measure (i.e., PLI) revealed that during second language processing the low proficiency group showed stronger and broader network patterns than the high proficiency group, with interconnectivities between a left fronto-parietal network. Mean phase coherence analysis also indicated that the network activity was globally stronger in the low proficiency group during second language processing. PMID:22125542
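As an illustration of the bivariate measure named above, the following is a minimal sketch of the phase lag index in its commonly cited form, PLI = |mean over time of sign(sin(Δφ))|, where Δφ is the phase difference between two electrodes. The abstract does not state the formula or the preprocessing, so this definition, the function name `phase_lag_index`, and the synthetic 40 Hz phase series are assumptions; in a real analysis the phases would first be extracted from band-passed EEG.

```python
import math

def phase_lag_index(phases_a, phases_b):
    """PLI for two equally long series of instantaneous phases (radians)."""
    if len(phases_a) != len(phases_b) or not phases_a:
        raise ValueError("phase series must be non-empty and equally long")
    total = 0
    for pa, pb in zip(phases_a, phases_b):
        s = math.sin(pa - pb)
        total += (s > 0) - (s < 0)  # sign of the phase difference: -1, 0 or 1
    return abs(total / len(phases_a))

# Synthetic example: a 40 Hz phase ramp sampled at 500 Hz (assumed values).
lead = [2 * math.pi * 40 * t / 500 for t in range(500)]
lagged = [p - math.pi / 4 for p in lead]   # constant quarter-pi lag

print(phase_lag_index(lead, lagged))  # 1.0: one signal consistently leads
print(phase_lag_index(lead, lead))    # 0.0: zero-lag coupling is ignored
```

The second call shows the design point usually given for this measure: coupling with exactly zero phase lag contributes nothing, so the index is less sensitive to a common source picked up by both electrodes.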
1974-07-01
This document was generated by the Stanford Artificial Intelligence Laboratory’s document compiler, "PUB", and reproduced on a...for more sophisticated artificial (programming) languages. The new issues became those of how to represent a grammar as precise syntactic structures...challenge lies in discovering - either by synthesis of an artificial system, or by analysis of a natural one - the underlying logical (as opposed to
Memory Operations and Structures in Sentence Comprehension: Evidence from Ellipsis
ERIC Educational Resources Information Center
Martin, Andrea Eyleen
2010-01-01
Natural language often contains dependencies that span words, phrases, or even sentences. Thus, language comprehension relies on recovering recently processed information from memory for subsequent interpretation. This dissertation investigates the memory operations that subserve dependency resolution through the lens of "verb-phrase ellipsis"…
A Novel Approach to Creating Disambiguated Multilingual Dictionaries
ERIC Educational Resources Information Center
Boguslavsky, Igor; Cardenosa, Jesus; Gallardo, Carolina
2009-01-01
Multilingual lexicons are needed in various applications, such as cross-lingual information retrieval, machine translation, and some others. Often, these applications suffer from the ambiguity of dictionary items, especially when an intermediate natural language is involved in the process of the dictionary construction, since this language adds…
NASA Astrophysics Data System (ADS)
Colucci-Gray, Laura; Perazzone, Anna; Dodman, Martin; Camino, Elena
2013-03-01
In this three-part article we seek to establish connections between the emerging framework of sustainability science and the methodological basis of research and practice in science education in order to bring forth knowledge and competences for sustainability. The first and second parts deal with the implications of taking a sustainability view in relation to knowledge processes. The complexity, uncertainty and urgency of global environmental problems challenge the foundations of reductionist Western science. Within such debate, the proposal of sustainability science advocates for inter-disciplinary and inter-paradigmatic collaboration and it includes the requirements of post-normal science proposing a respectful dialogue between experts and non-experts in the construction of new scientific knowledge. Such a change of epistemology is rooted in participation, deliberation and the gathering of extended-facts where cultural framings and values are the hard components in the face of soft facts. A reflection on language and communication processes is thus the focus of knowledge practices and educational approaches aimed at sustainability. Language contains the roots of conceptual thinking (including scientific knowledge) and each culture and society are defined and limited by the language that is used to describe and act upon the world. Within a scenario of sustainability, a discussion of scientific language is in order to retrace the connections between language and culture, and to promote a holistic view based on pluralism and dialogue. Drawing on the linguistic reflection, the third part gives examples of teaching and learning situations involving prospective science teachers in action-research contexts: these activities are set out to promote linguistic integration and to introduce reflexive process into science learning.
Discussion will focus on the methodological features of a learning process that is akin to a communal and emancipatory research process within a sustainability scenario.
Hu, Weiming; Tian, Guodong; Kang, Yongxin; Yuan, Chunfeng; Maybank, Stephen
2017-09-25
In this paper, a new nonparametric Bayesian model called the dual sticky hierarchical Dirichlet process hidden Markov model (HDP-HMM) is proposed for mining activities from a collection of time series data such as trajectories. All the time series data are clustered. Each cluster of time series data, corresponding to a motion pattern, is modeled by an HMM. Our model postulates a set of HMMs that share a common set of states (topics in an analogy with topic models for document processing), but have unique transition distributions. For the application to motion trajectory modeling, topics correspond to motion activities. The learnt topics are clustered into atomic activities which are assigned predicates. We propose a Bayesian inference method to decompose a given trajectory into a sequence of atomic activities. On combining the learnt sources and sinks, semantic motion regions, and the learnt sequence of atomic activities, the action represented by the trajectory can be described in natural language in as automatic a way as possible. The effectiveness of our dual sticky HDP-HMM is validated on several trajectory datasets. The effectiveness of the natural language descriptions for motions is demonstrated on the vehicle trajectories extracted from a traffic scene.
Östling, Robert; Börstell, Carl; Courtaux, Servane
2018-01-01
We use automatic processing of 120,000 sign videos in 31 different sign languages to show a cross-linguistic pattern for two types of iconic form–meaning relationships in the visual modality. First, we demonstrate that the degree of inherent plurality of concepts, based on individual ratings by non-signers, strongly correlates with the number of hands used in the sign forms encoding the same concepts across sign languages. Second, we show that certain concepts are iconically articulated around specific parts of the body, as predicted by the associational intuitions by non-signers. The implications of our results are both theoretical and methodological. With regard to theoretical implications, we corroborate previous research by demonstrating and quantifying, using a much larger material than previously available, the iconic nature of languages in the visual modality. As for the methodological implications, we show how automatic methods are, in fact, useful for performing large-scale analysis of sign language data, to a high level of accuracy, as indicated by our manual error analysis. PMID:29867684
An Infinite Mixture Model for Coreference Resolution in Clinical Notes
Liu, Sijia; Liu, Hongfang; Chaudhary, Vipin; Li, Dingcheng
2016-01-01
It is widely acknowledged that natural language processing is indispensable for processing electronic health records (EHRs). However, poor performance in relation detection tasks, such as coreference (linguistic expressions pertaining to the same entity/event), may affect the quality of EHR processing. Hence, there is a critical need to advance the research for relation detection from EHRs. Most of the clinical coreference resolution systems are based on either supervised machine learning or rule-based methods. The need for a manually annotated corpus hampers the use of such systems at large scale. In this paper, we present an infinite mixture model method using definite sampling to resolve coreferent relations among mentions in clinical notes. A similarity measure function is proposed to determine the coreferent relations. Our system achieved a 0.847 F-measure on the i2b2 2011 coreference corpus. These promising results and the unsupervised nature of the method make it possible to apply the system in big-data clinical settings. PMID:27595047
LLOGO: An Implementation of LOGO in LISP. Artificial Intelligence Memo Number 307.
ERIC Educational Resources Information Center
Goldstein, Ira; And Others
LISP LOGO is a computer language invented for the beginning student of man-machine interaction. The language has the advantages of simplicity and naturalness as well as that of emphasizing the difference between programs and data. The language is based on the LOGO language and uses mnemonic syllables as commands. It can be used in conjunction with…
On the nature and evolution of the neural bases of human language
NASA Technical Reports Server (NTRS)
Lieberman, Philip
2002-01-01
The traditional theory equating the brain bases of language with Broca's and Wernicke's neocortical areas is wrong. Neural circuits linking activity in anatomically segregated populations of neurons in subcortical structures and the neocortex throughout the human brain regulate complex behaviors such as walking, talking, and comprehending the meaning of sentences. When we hear or read a word, neural structures involved in the perception or real-world associations of the word are activated as well as posterior cortical regions adjacent to Wernicke's area. Many areas of the neocortex and subcortical structures support the cortical-striatal-cortical circuits that confer complex syntactic ability, speech production, and a large vocabulary. However, many of these structures also form part of the neural circuits regulating other aspects of behavior. For example, the basal ganglia, which regulate motor control, are also crucial elements in the circuits that confer human linguistic ability and abstract reasoning. The cerebellum, traditionally associated with motor control, is active in motor learning. The basal ganglia are also key elements in reward-based learning. Data from studies of Broca's aphasia, Parkinson's disease, hypoxia, focal brain damage, and a genetically transmitted brain anomaly (the putative "language gene," family KE), and from comparative studies of the brains and behavior of other species, demonstrate that the basal ganglia sequence the discrete elements that constitute a complete motor act, syntactic process, or thought process. Imaging studies of intact human subjects and electrophysiologic and tracer studies of the brains and behavior of other species confirm these findings. As Dobzhansky put it, "Nothing in biology makes sense except in the light of evolution" (cited in Mayr, 1982). That applies with as much force to the human brain and the neural bases of language as it does to the human foot or jaw.
The converse follows: the mark of evolution on the brains of human beings and other species provides insight into the evolution of the brain bases of human language. The neural substrate that regulated motor control in the common ancestor of apes and humans most likely was modified to enhance cognitive and linguistic ability. Speech communication played a central role in this process. However, the process that ultimately resulted in the human brain may have started when our earliest hominid ancestors began to walk.
Neurophysiological mechanisms involved in language learning in adults
Rodríguez-Fornells, Antoni; Cunillera, Toni; Mestres-Missé, Anna; de Diego-Balaguer, Ruth
2009-01-01
Little is known about the brain mechanisms involved in word learning during infancy and in second language acquisition and about the way these new words become stable representations that sustain language processing. In several studies we have adopted the human simulation perspective, studying the effects of brain-lesions and combining different neuroimaging techniques such as event-related potentials and functional magnetic resonance imaging in order to examine the language learning (LL) process. In the present article, we review this evidence focusing on how different brain signatures relate to (i) the extraction of words from speech, (ii) the discovery of their embedded grammatical structure, and (iii) how meaning derived from verbal contexts can inform us about the cognitive mechanisms underlying the learning process. We compile these findings and frame them into an integrative neurophysiological model that tries to delineate the major neural networks that might be involved in the initial stages of LL. Finally, we propose that LL simulations can help us to understand natural language processing and how the recovery from language disorders in infants and adults can be accomplished. PMID:19933142
An Evaluation Methodology for Natural Language Processing Systems
1992-12-01
Griffiss Air Force Base...each "trial" or evaluation item. This approach to assessing reliability has some similarity to the reliability study performed by Hix and Schulman for...the clinic. Criteria: Demonstrated understanding that the object or entity expressed by the first noun benefits from the object expressed by the second
A Hybrid Approach to Clinical Question Answering
2014-11-01
participation in TREC, we submitted a single run using a hybrid Natural Language Processing (NLP)-driven approach to accomplish the given task. Evaluation re...for the CDS track uses a variety of NLP-based techniques to address the clinical questions provided. We present a description of our approach, and...discuss our experimental setup, results and evaluation in the subsequent sections. 2 Description of Our Approach Our hybrid NLP-driven method presents a
Compositional and enumerative designs for medical language representation.
Rassinoux, A. M.; Miller, R. A.; Baud, R. H.; Scherrer, J. R.
1997-01-01
Medical language is in essence highly compositional, allowing complex information to be expressed from more elementary pieces. Embedding the expressive power of medical language into formal systems of representation is recognized in the medical informatics community as a key step towards sharing such information among medical record, decision support, and information retrieval systems. Accordingly, such representation requires managing both the expressiveness of the formalism and its computational tractability, while coping with the level of detail expected by clinical applications. These desiderata can be supported by enumerative as well as compositional approaches, as argued in this paper. These principles have been applied in recasting a frame-based system for general medical findings developed during the 1980s. The new system captures the precise meaning of a subset of over 1500 medical terms for general internal medicine identified from the Quick Medical Reference (QMR) lexicon. In order to evaluate the adequacy of this formal structure in reflecting the deep meaning of the QMR findings, a validation process was implemented. It consists of automatically rebuilding the semantic representation of the QMR findings by analyzing them through the RECIT natural language analyzer, whose semantic components have been adjusted to this frame-based model for the understanding task. PMID:9357700
Automatic generation of the index of productive syntax for child language transcripts.
Hassanali, Khairun-nisa; Liu, Yang; Iglesias, Aquiles; Solorio, Thamar; Dollaghan, Christine
2014-03-01
The index of productive syntax (IPSyn; Scarborough, Applied Psycholinguistics 11:1-22, 1990) is a measure of syntactic development in child language that has been used in research and clinical settings to investigate the grammatical development of various groups of children. However, IPSyn is mostly calculated manually, which is an extremely laborious process. In this article, we describe the AC-IPSyn system, which automatically calculates the IPSyn score for child language transcripts using natural language processing techniques. Our results show that the AC-IPSyn system performs at levels comparable to scores computed manually. The AC-IPSyn system can be downloaded from www.hlt.utdallas.edu/~nisa/ipsyn.html.
Language Supports for Journal Abstract Writing across Disciplines
ERIC Educational Resources Information Center
Liou, H.-C.; Yang, P.-C.; Chang, J. S.
2012-01-01
Various writing assistance tools have been developed through efforts in the areas of natural language processing with different degrees of success of curriculum integration depending on their functional rigor and pedagogical designs. In this paper, we developed a system, WriteAhead, that provides six types of suggestions when non-native graduate…
Inferring Speaker Affect in Spoken Natural Language Communication
ERIC Educational Resources Information Center
Pon-Barry, Heather Roberta
2013-01-01
The field of spoken language processing is concerned with creating computer programs that can understand human speech and produce human-like speech. Regarding the problem of understanding human speech, there is currently growing interest in moving beyond speech recognition (the task of transcribing the words in an audio stream) and towards…
English Complex Verb Constructions: Identification and Inference
ERIC Educational Resources Information Center
Tu, Yuancheng
2012-01-01
The fundamental problem faced by automatic text understanding in Natural Language Processing (NLP) is to identify semantically related pieces of text and integrate them together to compute the meaning of the whole text. However, the principle of compositionality runs into trouble very quickly when real language is examined with its frequent…
Evaluation of Natural Language Processors.
1980-11-01
techniques described. Common practice in describing natural language processors is to describe the programs, then give about 20 examples of correctly...make a decision based on performance as to which approaches are most promising for further research and development. The lack of evaluation leaves...successively more difficult problems. This approach might be compared to children taking achievement tests in school. A 90% score on problems involving
ERIC Educational Resources Information Center
Parker, Catherine Frieda
2010-01-01
A possible contributing factor to students' difficulty in learning advanced mathematics is the conflict between students' "natural" learning styles and the formal structure of mathematics, which is based on definitions, theorems, and proofs. Students' natural learning styles may be a function of their intuition and language skills. The purpose of…
Process for selecting engineering tools : applied to selecting a SysML tool.
DOE Office of Scientific and Technical Information (OSTI.GOV)
De Spain, Mark J.; Post, Debra S.; Taylor, Jeffrey L.
2011-02-01
Process for Selecting Engineering Tools outlines the process and tools used to select a SysML (Systems Modeling Language) tool. The process is general in nature and users could use the process to select most engineering tools and software applications.
Nowak, Martin A.; Krakauer, David C.
1999-01-01
The emergence of language was a defining moment in the evolution of modern humans. It was an innovation that changed radically the character of human society. Here, we provide an approach to language evolution based on evolutionary game theory. We explore the ways in which protolanguages can evolve in a nonlinguistic society and how specific signals can become associated with specific objects. We assume that early in the evolution of language, errors in signaling and perception would be common. We model the probability of misunderstanding a signal and show that this limits the number of objects that can be described by a protolanguage. This “error limit” is not overcome by employing more sounds but by combining a small set of more easily distinguishable sounds into words. The process of “word formation” enables a language to encode an essentially unlimited number of objects. Next, we analyze how words can be combined into sentences and specify the conditions for the evolution of very simple grammatical rules. We argue that grammar originated as a simplified rule system that evolved by natural selection to reduce mistakes in communication. Our theory provides a systematic approach for thinking about the origin and evolution of human language. PMID:10393942
Huang, Yang; Lowe, Henry J.; Klein, Dan; Cucina, Russell J.
2005-01-01
Objective: The aim of this study was to develop and evaluate a method of extracting noun phrases with full phrase structures from a set of clinical radiology reports using natural language processing (NLP) and to investigate the effects of using the UMLS® Specialist Lexicon to improve noun phrase identification within clinical radiology documents. Design: The noun phrase identification (NPI) module is composed of a sentence boundary detector, a statistical natural language parser trained on a nonmedical domain, and a noun phrase (NP) tagger. The NPI module processed a set of 100 XML-represented clinical radiology reports in Health Level 7 (HL7)® Clinical Document Architecture (CDA)–compatible format. Computed output was compared with manual markups made by four physicians and one author for maximal (longest) NP and those made by one author for base (simple) NP, respectively. An extended lexicon of biomedical terms was created from the UMLS Specialist Lexicon and used to improve NPI performance. Results: The test set was 50 randomly selected reports. The sentence boundary detector achieved 99.0% precision and 98.6% recall. The overall maximal NPI precision and recall were 78.9% and 81.5% before using the UMLS Specialist Lexicon and 82.1% and 84.6% after. The overall base NPI precision and recall were 88.2% and 86.8% before using the UMLS Specialist Lexicon and 93.1% and 92.6% after, reducing false-positives by 31.1% and false-negatives by 34.3%. Conclusion: The sentence boundary detector performs excellently. After the adaptation using the UMLS Specialist Lexicon, the statistical parser's NPI performance on radiology reports increased to levels comparable to the parser's native performance in its newswire training domain and to that reported by other researchers in the general nonmedical domain. PMID:15684131
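The precision and recall figures reported above compare system-identified noun phrases with manually marked ones. A minimal sketch of that kind of comparison follows, assuming each noun phrase is represented as a (start, end) token span and that only exact span matches count as correct. The span representation, the function name, and the sample data are illustrative assumptions, not the authors' evaluation code.

```python
from typing import Set, Tuple

# Hypothetical representation: a noun phrase is a (start_token, end_token) span.
Span = Tuple[int, int]


def precision_recall(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float]:
    """Return (precision, recall) for exact-match spans."""
    true_positives = len(predicted & gold)
    # Precision: share of predicted spans that also appear in the manual markup.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of manually marked spans that the system found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up spans for illustration only (not data from the study).
gold = {(0, 2), (4, 6), (8, 9), (11, 14)}
predicted = {(0, 2), (4, 6), (8, 10), (11, 14), (15, 16)}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f}")
```

In this toy data three of the five predicted spans match the manual markup, and three of the four marked spans are found. A partial-overlap criterion would give different numbers; exact matching is simply the easiest rule to state.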
Zheng, Chengyi; Luo, Yi; Mercado, Cheryl; Sy, Lina; Jacobsen, Steven J; Ackerson, Brad; Lewin, Bruno; Tseng, Hung Fu
2018-06-19
Diagnosis codes are inadequate for accurately identifying herpes zoster ophthalmicus (HZO), and population-based studies on HZO are scarce because manual review of medical records is expensive. This study assessed whether HZO can be identified from clinical notes using natural language processing (NLP) and investigated the epidemiology of HZO among the herpes zoster (HZ) population using the developed approach. The design was a retrospective cohort analysis of 49,914 southern California residents aged over 18 years who had a new diagnosis of HZ. An NLP-based algorithm was developed and validated against a manually curated validation dataset (n=461), and it was applied to over 1 million clinical notes associated with the study population. HZO and non-HZO cases were compared by age, sex, race, and comorbidities, and the accuracy of the NLP algorithm was measured. The NLP algorithm achieved 95.6% sensitivity and 99.3% specificity. Compared with diagnosis codes, NLP identified significantly more HZO cases among the HZ population (13.9% versus 1.7%). Compared with the non-HZO group, the HZO group was older, had more males, had more Whites, and had more outpatient visits. We developed and validated an automatic method to identify HZO cases with high accuracy. As one of the largest studies on HZO, our finding emphasizes the importance of preventing HZ in the elderly population. This method can be a valuable tool to support population-based studies and clinical care of HZO in the era of big data.
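Sensitivity and specificity, the two accuracy measures reported for the algorithm, can be computed from paired algorithm and chart-review labels. The sketch below assumes binary labels (True for HZO, False for non-HZO); the function name and the sample labels are hypothetical and are not taken from the study's validation dataset.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    predicted: Sequence[bool], reference: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for paired binary labels."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)          # cases correctly flagged
    fn = sum(1 for p, r in pairs if not p and r)      # cases missed
    tn = sum(1 for p, r in pairs if not p and not r)  # non-cases correctly cleared
    fp = sum(1 for p, r in pairs if p and not r)      # non-cases wrongly flagged
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Made-up labels for illustration only.
reference = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, False, False, False, False, True]

sens, spec = sensitivity_specificity(predicted, reference)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Sensitivity is the fraction of true cases the algorithm flags, and specificity is the fraction of non-cases it correctly leaves unflagged.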