Sample records for natural language parser

  1. Benchmarking natural-language parsers for biological applications using dependency graphs.

    PubMed

    Clegg, Andrew B; Shepherd, Adrian J

    2007-01-25

    Interest is growing in the application of syntactic parsers to natural language processing problems in biology, but assessing their performance is difficult because differences in linguistic convention can falsely appear to be errors. We present a method for evaluating their accuracy using an intermediate representation based on dependency graphs, in which the semantic relationships important in most information extraction tasks are closer to the surface. We also demonstrate how this method can be easily tailored to various application-driven criteria. Using the GENIA corpus as a gold standard, we tested four open-source parsers which have been used in bioinformatics projects. We first present overall performance measures, and test the two leading tools, the Charniak-Lease and Bikel parsers, on subtasks tailored to reflect the requirements of a system for extracting gene expression relationships. These two tools clearly outperform the other parsers in the evaluation, and achieve accuracy levels comparable to or exceeding native dependency parsers on similar tasks in previous biological evaluations. Evaluating using dependency graphs allows parsers to be tested easily on criteria chosen according to the semantics of particular biological applications, drawing attention to important mistakes and soaking up many insignificant differences that would otherwise be reported as errors. Generating high-accuracy dependency graphs from the output of phrase-structure parsers also provides access to the more detailed syntax trees that are used in several natural-language processing techniques.
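
The dependency-graph comparison this record describes can be sketched in a few lines: a parser's output graph is scored against a gold graph by precision, recall and F1 over labelled (head, dependent, relation) edges. The function name and the toy edges below are illustrative, not taken from the paper.

```python
# Score a predicted dependency graph against a gold-standard one by
# treating each graph as a set of labelled (head, dependent, relation)
# edges. Hypothetical sketch; edge labels below are invented examples.

def dependency_f1(gold_edges, predicted_edges):
    """Precision/recall/F1 over labelled dependency edges."""
    gold, pred = set(gold_edges), set(predicted_edges)
    tp = len(gold & pred)                      # edges both graphs agree on
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = {("expresses", "gene", "dobj"), ("expresses", "cell", "nsubj")}
pred = {("expresses", "gene", "dobj"), ("expresses", "protein", "nsubj")}
p, r, f = dependency_f1(gold, pred)
```

Because the comparison is over edge sets, superficial bracketing differences between parsers dissolve and only the semantic attachments are scored, which is the "soaking up insignificant differences" effect the abstract mentions.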

  2. Benchmarking natural-language parsers for biological applications using dependency graphs

    PubMed Central

    Clegg, Andrew B; Shepherd, Adrian J

    2007-01-01

    Background Interest is growing in the application of syntactic parsers to natural language processing problems in biology, but assessing their performance is difficult because differences in linguistic convention can falsely appear to be errors. We present a method for evaluating their accuracy using an intermediate representation based on dependency graphs, in which the semantic relationships important in most information extraction tasks are closer to the surface. We also demonstrate how this method can be easily tailored to various application-driven criteria. Results Using the GENIA corpus as a gold standard, we tested four open-source parsers which have been used in bioinformatics projects. We first present overall performance measures, and test the two leading tools, the Charniak-Lease and Bikel parsers, on subtasks tailored to reflect the requirements of a system for extracting gene expression relationships. These two tools clearly outperform the other parsers in the evaluation, and achieve accuracy levels comparable to or exceeding native dependency parsers on similar tasks in previous biological evaluations. Conclusion Evaluating using dependency graphs allows parsers to be tested easily on criteria chosen according to the semantics of particular biological applications, drawing attention to important mistakes and soaking up many insignificant differences that would otherwise be reported as errors. Generating high-accuracy dependency graphs from the output of phrase-structure parsers also provides access to the more detailed syntax trees that are used in several natural-language processing techniques. PMID:17254351

  3. The effects of natural language processing on cross-institutional portability of influenza case detection for disease surveillance.

    PubMed

    Ferraro, Jeffrey P; Ye, Ye; Gesteland, Per H; Haug, Peter J; Tsui, Fuchiang Rich; Cooper, Gregory F; Van Bree, Rudy; Ginter, Thomas; Nowalk, Andrew J; Wagner, Michael

    2017-05-31

This study evaluates the accuracy and portability of a natural language processing (NLP) tool for extracting clinical findings of influenza from clinical notes across two large healthcare systems. Effectiveness is evaluated on how well NLP supports downstream influenza case-detection for disease surveillance. We independently developed two NLP parsers, one at Intermountain Healthcare (IH) in Utah and the other at University of Pittsburgh Medical Center (UPMC), using local clinical notes from emergency department (ED) encounters of influenza. We measured NLP parser performance for the presence and absence of 70 clinical findings indicative of influenza. We then developed Bayesian network models from NLP-processed reports and tested their ability to discriminate among cases of (1) influenza, (2) non-influenza influenza-like illness (NI-ILI), and (3) 'other' diagnosis. On Intermountain Healthcare reports, recall and precision of the IH NLP parser were 0.71 and 0.75, respectively, and of the UPMC NLP parser, 0.67 and 0.79. On University of Pittsburgh Medical Center reports, recall and precision of the UPMC NLP parser were 0.73 and 0.80, respectively, and of the IH NLP parser, 0.53 and 0.80. Bayesian case-detection performance measured by AUROC for influenza versus non-influenza on Intermountain Healthcare cases was 0.93 (using IH NLP parser) and 0.93 (using UPMC NLP parser). Case-detection on University of Pittsburgh Medical Center cases was 0.95 (using UPMC NLP parser) and 0.83 (using IH NLP parser). For influenza versus NI-ILI on Intermountain Healthcare cases, performance was 0.70 (using IH NLP parser) and 0.76 (using UPMC NLP parser). On University of Pittsburgh Medical Center cases, 0.76 (using UPMC NLP parser) and 0.65 (using IH NLP parser). In all but one instance (influenza versus NI-ILI using IH cases), local parsers were more effective at supporting case-detection, although the performance of non-local parsers was reasonable.

  4. COD::CIF::Parser: an error-correcting CIF parser for the Perl language.

    PubMed

    Merkys, Andrius; Vaitkus, Antanas; Butkus, Justas; Okulič-Kazarinas, Mykolas; Kairys, Visvaldas; Gražulis, Saulius

    2016-02-01

A syntax-correcting CIF parser, COD::CIF::Parser, is presented that can parse CIF 1.1 files and accurately report the position and the nature of the discovered syntactic problems. In addition, the parser is able to automatically fix the most common and the most obvious syntactic deficiencies of the input files. Bindings for the Perl, C and Python programming environments are available. Based on COD::CIF::Parser, the cod-tools package for manipulating the CIFs in the Crystallography Open Database (COD) has been developed. The cod-tools package has been successfully used for continuous updates of the data in the automated COD data deposition pipeline, and to check the validity of COD data against the IUCr data validation guidelines. The performance, capabilities and applications of different parsers are compared.

  5. Toward a theory of distributed word expert natural language parsing

    NASA Technical Reports Server (NTRS)

    Rieger, C.; Small, S.

    1981-01-01

    An approach to natural language meaning-based parsing in which the unit of linguistic knowledge is the word rather than the rewrite rule is described. In the word expert parser, knowledge about language is distributed across a population of procedural experts, each representing a word of the language, and each an expert at diagnosing that word's intended usage in context. The parser is structured around a coroutine control environment in which the generator-like word experts ask questions and exchange information in coming to collective agreement on sentence meaning. The word expert theory is advanced as a better cognitive model of human language expertise than the traditional rule-based approach. The technical discussion is organized around examples taken from the prototype LISP system which implements parts of the theory.
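
The word-expert idea above can be caricatured with Python generators standing in for the coroutine word experts; the experts, the scheduler and the sense labels below are invented for illustration and bear no relation to the prototype LISP system's actual code.

```python
# Toy word-expert parsing: each word is a small procedural expert
# (a generator) that inspects its neighbours and commits to a sense;
# a simple scheduler steps the experts. All names here are invented.

def expert_deep(position, words, senses):
    # "deep" disambiguates itself by looking at the following word.
    nxt = words[position + 1] if position + 1 < len(words) else None
    senses[position] = "philosophical" if nxt == "thought" else "spatial"
    yield  # hand control back to the scheduler

def expert_default(position, words, senses):
    senses[position] = "literal"
    yield

EXPERTS = {"deep": expert_deep}

def parse(words):
    senses = [None] * len(words)
    # Coroutine-style scheduling: one expert per word, each run to completion.
    tasks = [EXPERTS.get(w, expert_default)(i, words, senses)
             for i, w in enumerate(words)]
    for task in tasks:
        for _ in task:
            pass
    return senses

result = parse(["a", "deep", "thought"])
```

In the real system the experts ask each other questions and may suspend repeatedly; the generators here yield only once, which is enough to show the control structure.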

  6. Integrated Intelligence: Robot Instruction via Interactive Grounded Learning

    DTIC Science & Technology

    2016-02-14

U.S. Army Research Office, P.O. Box 12211, Research Triangle Park, NC 27709-2211. Robotics; Natural Language Processing; Grounded Language ...Logical Forms for Referring Expression Generation, Empirical Methods in Natural Language Processing (EMNLP), 18-OCT-13. Tom Kwiatkowska, Eunsol Choi, Yoav Artzi, Luke Zettlemoyer. Scaling Semantic Parsers with On-the-fly Ontology Matching, Empirical Methods in Natural Language Processing

  7. A natural language interface to databases

    NASA Technical Reports Server (NTRS)

    Ford, D. R.

    1988-01-01

    The development of a Natural Language Interface which is semantic-based and uses Conceptual Dependency representation is presented. The system was developed using Lisp and currently runs on a Symbolics Lisp machine. A key point is that the parser handles morphological analysis, which expands its capabilities of understanding more words.

  8. Semantic based man-machine interface for real-time communication

    NASA Technical Reports Server (NTRS)

    Ali, M.; Ai, C.-S.

    1988-01-01

A flight expert system (FLES) was developed to assist pilots in monitoring, diagnosing and recovering from in-flight faults. To provide a communications interface between the flight crew and FLES, a natural language interface (NALI) was implemented. Input to NALI is processed by three processors: (1) the semantic parser; (2) the knowledge retriever; and (3) the response generator. First, the semantic parser extracts meaningful words and phrases to generate an internal representation of the query. At this point, the semantic parser has the ability to map different input forms related to the same concept into the same internal representation. Then the knowledge retriever analyzes and stores the context of the query to aid in resolving ellipses and pronoun references. At the end of this process, a sequence of retrieval functions is created as a first step in generating the proper response. Finally, the response generator generates the natural language response to the query. The architecture of NALI was designed to process both temporal and nontemporal queries. The architecture and implementation of NALI are described.

  9. La Description des langues naturelles en vue d'applications linguistiques: Actes du colloque (The Description of Natural Languages with a View to Linguistic Applications: Conference Papers). Publication K-10.

    ERIC Educational Resources Information Center

    Ouellon, Conrad, Comp.

    Presentations from a colloquium on applications of research on natural languages to computer science address the following topics: (1) analysis of complex adverbs; (2) parser use in computerized text analysis; (3) French language utilities; (4) lexicographic mapping of official language notices; (5) phonographic codification of Spanish; (6)…

  10. Extracting noun phrases for all of MEDLINE.

    PubMed Central

    Bennett, N. A.; He, Q.; Powell, K.; Schatz, B. R.

    1999-01-01

    A natural language parser that could extract noun phrases for all medical texts would be of great utility in analyzing content for information retrieval. We discuss the extraction of noun phrases from MEDLINE, using a general parser not tuned specifically for any medical domain. The noun phrase extractor is made up of three modules: tokenization; part-of-speech tagging; noun phrase identification. Using our program, we extracted noun phrases from the entire MEDLINE collection, encompassing 9.3 million abstracts. Over 270 million noun phrases were generated, of which 45 million were unique. The quality of these phrases was evaluated by examining all phrases from a sample collection of abstracts. The precision and recall of the phrases from our general parser compared favorably with those from three other parsers we had previously evaluated. We are continuing to improve our parser and evaluate our claim that a generic parser can effectively extract all the different phrases across the entire medical literature. PMID:10566444
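
The three-module pipeline described above (tokenization, part-of-speech tagging, noun phrase identification) can be sketched roughly as follows; the tiny lexicon, the default-to-noun tagger and the chunking rule are stand-ins, not the authors' implementation.

```python
import re

# Sketch of a tokenize -> tag -> chunk noun-phrase extractor.
# The lexicon and chunk rule are invented placeholders.

LEXICON = {"the": "DT", "a": "DT", "acute": "JJ", "myeloid": "JJ",
           "leukemia": "NN", "patients": "NN", "respond": "VB",
           "to": "TO", "therapy": "NN"}

def tokenize(text):
    return re.findall(r"[A-Za-z]+", text.lower())

def tag(tokens):
    # Unknown words default to noun, a common heuristic for medical terms.
    return [(t, LEXICON.get(t, "NN")) for t in tokens]

def noun_phrases(tagged):
    """Chunk maximal runs of DT/JJ/NN that contain at least one noun."""
    phrases, current = [], []
    for word, pos in tagged + [("", "END")]:   # sentinel flushes the last run
        if pos in ("DT", "JJ", "NN"):
            current.append((word, pos))
        else:
            if any(p == "NN" for _, p in current):
                phrases.append(" ".join(w for w, _ in current))
            current = []
    return phrases

nps = noun_phrases(tag(tokenize(
    "acute myeloid leukemia patients respond to the therapy")))
```

At MEDLINE scale the interesting engineering is in throughput and deduplication (45 million unique phrases out of 270 million), but the per-abstract logic is essentially this pipeline.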

  11. Improved identification of noun phrases in clinical radiology reports using a high-performance statistical natural language parser augmented with the UMLS specialist lexicon.

    PubMed

    Huang, Yang; Lowe, Henry J; Klein, Dan; Cucina, Russell J

    2005-01-01

    The aim of this study was to develop and evaluate a method of extracting noun phrases with full phrase structures from a set of clinical radiology reports using natural language processing (NLP) and to investigate the effects of using the UMLS(R) Specialist Lexicon to improve noun phrase identification within clinical radiology documents. The noun phrase identification (NPI) module is composed of a sentence boundary detector, a statistical natural language parser trained on a nonmedical domain, and a noun phrase (NP) tagger. The NPI module processed a set of 100 XML-represented clinical radiology reports in Health Level 7 (HL7)(R) Clinical Document Architecture (CDA)-compatible format. Computed output was compared with manual markups made by four physicians and one author for maximal (longest) NP and those made by one author for base (simple) NP, respectively. An extended lexicon of biomedical terms was created from the UMLS Specialist Lexicon and used to improve NPI performance. The test set was 50 randomly selected reports. The sentence boundary detector achieved 99.0% precision and 98.6% recall. The overall maximal NPI precision and recall were 78.9% and 81.5% before using the UMLS Specialist Lexicon and 82.1% and 84.6% after. The overall base NPI precision and recall were 88.2% and 86.8% before using the UMLS Specialist Lexicon and 93.1% and 92.6% after, reducing false-positives by 31.1% and false-negatives by 34.3%. The sentence boundary detector performs excellently. After the adaptation using the UMLS Specialist Lexicon, the statistical parser's NPI performance on radiology reports increased to levels comparable to the parser's native performance in its newswire training domain and to that reported by other researchers in the general nonmedical domain.

  12. Policy-Based Management Natural Language Parser

    NASA Technical Reports Server (NTRS)

    James, Mark

    2009-01-01

The Policy-Based Management Natural Language Parser (PBEM) is a rules-based approach to enterprise management that can be used to automate certain management tasks. This parser simplifies the management of a given endeavor by establishing policies to deal with situations that are likely to occur. Policies are operating rules that can be referred to as a means of maintaining order, security, consistency, or other ways of successfully furthering a goal or mission. PBEM provides a way of managing the configuration of network elements, applications, and processes via a set of high-level rules or business policies, rather than managing individual elements, thus switching control to a higher level. This software allows unique management rules (or commands) to be specified and applied to a cross-section of the Global Information Grid (GIG). This software embodies a parser that is capable of recognizing and understanding conversational English. Because all possible dialect variants cannot be anticipated, a unique capability was developed that parses based on conversational intent rather than on the exact way the words are used. This software can increase productivity by enabling a user to converse with the system in conversational English to define network policies. PBEM can be used in both manned and unmanned science-gathering programs. Because policy statements can be domain-independent, this software can be applied equally to a wide variety of applications.

  13. Interactive Cohort Identification of Sleep Disorder Patients Using Natural Language Processing and i2b2.

    PubMed

    Chen, W; Kowatch, R; Lin, S; Splaingard, M; Huang, Y

    2015-01-01

Nationwide Children's Hospital established an i2b2 (Informatics for Integrating Biology & the Bedside) application for sleep disorder cohort identification. Discrete data were gleaned from semistructured sleep study reports. The system was shown to work more efficiently than the traditional manual chart review method, and it also enabled searching capabilities that were previously not possible. We report on the development and implementation of the sleep disorder i2b2 cohort identification system using natural language processing of semi-structured documents. We developed a natural language processing approach to automatically parse concepts and their values from semi-structured sleep study documents. Two parsers were developed: a regular expression parser for extracting numeric concepts and an NLP-based tree parser for extracting textual concepts. Concepts were further organized into i2b2 ontologies based on document structures and in-domain knowledge. 26,550 concepts were extracted, with 99% being textual concepts. 1.01 million facts were extracted from sleep study documents, such as demographic information, sleep study lab results, medications, procedures, and diagnoses, among others. The average accuracy of terminology parsing was over 83% when compared against expert annotations. The system is capable of capturing both standard and non-standard terminologies. The time for cohort identification has been reduced significantly, from a few weeks to a few seconds. Natural language processing was shown to be powerful for quickly converting large amounts of semi-structured or unstructured clinical data into discrete concepts, which, in combination with intuitive domain-specific ontologies, allows fast and effective interactive cohort identification through the i2b2 platform for research and clinical use.
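
A minimal sketch of what the "regular expression parser for extracting numeric concepts" might look like; the concept names, patterns and sample report below are invented for illustration and are not those of the Nationwide Children's system.

```python
import re

# Pull numeric concepts out of semi-structured sleep-report text with
# per-concept regular expressions. All patterns here are hypothetical.

PATTERNS = {
    "AHI": re.compile(r"apnea[- ]hypopnea index[^0-9]*([\d.]+)", re.I),
    "SpO2_nadir": re.compile(r"oxygen saturation nadir[^0-9]*([\d.]+)\s*%",
                             re.I),
}

def extract_numeric_concepts(text):
    facts = {}
    for concept, pattern in PATTERNS.items():
        m = pattern.search(text)
        if m:
            facts[concept] = float(m.group(1))
    return facts

report = ("The apnea-hypopnea index was 12.4 events/hour; "
          "oxygen saturation nadir of 84 %.")
facts = extract_numeric_concepts(report)
```

Each extracted concept/value pair would then be mapped to a node in the i2b2 ontology, which is what makes the facts queryable in seconds rather than weeks.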

  14. Interactive Cohort Identification of Sleep Disorder Patients Using Natural Language Processing and i2b2

    PubMed Central

    Chen, W.; Kowatch, R.; Lin, S.; Splaingard, M.

    2015-01-01

Summary Nationwide Children’s Hospital established an i2b2 (Informatics for Integrating Biology & the Bedside) application for sleep disorder cohort identification. Discrete data were gleaned from semistructured sleep study reports. The system was shown to work more efficiently than the traditional manual chart review method, and it also enabled searching capabilities that were previously not possible. Objective We report on the development and implementation of the sleep disorder i2b2 cohort identification system using natural language processing of semi-structured documents. Methods We developed a natural language processing approach to automatically parse concepts and their values from semi-structured sleep study documents. Two parsers were developed: a regular expression parser for extracting numeric concepts and an NLP-based tree parser for extracting textual concepts. Concepts were further organized into i2b2 ontologies based on document structures and in-domain knowledge. Results 26,550 concepts were extracted, with 99% being textual concepts. 1.01 million facts were extracted from sleep study documents, such as demographic information, sleep study lab results, medications, procedures, and diagnoses, among others. The average accuracy of terminology parsing was over 83% when compared against expert annotations. The system is capable of capturing both standard and non-standard terminologies. The time for cohort identification has been reduced significantly, from a few weeks to a few seconds. Conclusion Natural language processing was shown to be powerful for quickly converting large amounts of semi-structured or unstructured clinical data into discrete concepts, which, in combination with intuitive domain-specific ontologies, allows fast and effective interactive cohort identification through the i2b2 platform for research and clinical use. PMID:26171080

  15. Improved Identification of Noun Phrases in Clinical Radiology Reports Using a High-Performance Statistical Natural Language Parser Augmented with the UMLS Specialist Lexicon

    PubMed Central

    Huang, Yang; Lowe, Henry J.; Klein, Dan; Cucina, Russell J.

    2005-01-01

    Objective: The aim of this study was to develop and evaluate a method of extracting noun phrases with full phrase structures from a set of clinical radiology reports using natural language processing (NLP) and to investigate the effects of using the UMLS® Specialist Lexicon to improve noun phrase identification within clinical radiology documents. Design: The noun phrase identification (NPI) module is composed of a sentence boundary detector, a statistical natural language parser trained on a nonmedical domain, and a noun phrase (NP) tagger. The NPI module processed a set of 100 XML-represented clinical radiology reports in Health Level 7 (HL7)® Clinical Document Architecture (CDA)–compatible format. Computed output was compared with manual markups made by four physicians and one author for maximal (longest) NP and those made by one author for base (simple) NP, respectively. An extended lexicon of biomedical terms was created from the UMLS Specialist Lexicon and used to improve NPI performance. Results: The test set was 50 randomly selected reports. The sentence boundary detector achieved 99.0% precision and 98.6% recall. The overall maximal NPI precision and recall were 78.9% and 81.5% before using the UMLS Specialist Lexicon and 82.1% and 84.6% after. The overall base NPI precision and recall were 88.2% and 86.8% before using the UMLS Specialist Lexicon and 93.1% and 92.6% after, reducing false-positives by 31.1% and false-negatives by 34.3%. Conclusion: The sentence boundary detector performs excellently. After the adaptation using the UMLS Specialist Lexicon, the statistical parser's NPI performance on radiology reports increased to levels comparable to the parser's native performance in its newswire training domain and to that reported by other researchers in the general nonmedical domain. PMID:15684131

  16. Parsing clinical text: how good are the state-of-the-art parsers?

    PubMed

    Jiang, Min; Huang, Yang; Fan, Jung-wei; Tang, Buzhou; Denny, Josh; Xu, Hua

    2015-01-01

Parsing, which generates a syntactic structure of a sentence (a parse tree), is a critical component of natural language processing (NLP) research in any domain, including medicine. Although parsers developed in the general English domain, such as the Stanford parser, have been applied to clinical text, there are no formal evaluations and comparisons of their performance in the medical domain. In this study, we investigated the performance of three state-of-the-art parsers: the Stanford parser, the Bikel parser, and the Charniak parser, using the following two datasets: (1) a Treebank containing 1,100 sentences that were randomly selected from progress notes used in the 2010 i2b2 NLP challenge and manually annotated according to a Penn Treebank-based guideline; and (2) the MiPACQ Treebank, which was developed based on pathology notes and clinical notes, containing 13,091 sentences. We conducted three experiments on both datasets. First, we measured the performance of the three state-of-the-art parsers on the clinical Treebanks with their default settings. Then we re-trained the parsers using the clinical Treebanks and evaluated their performance using the 10-fold cross-validation method. Finally, we re-trained the parsers by combining the clinical Treebanks with the Penn Treebank. Our results showed that the original parsers achieved lower performance on clinical text (Bracketing F-measure in the range of 66.6%-70.3%) compared to general English text. After retraining on the clinical Treebanks, all parsers achieved better performance, with the best performance from the Stanford parser, which reached the highest Bracketing F-measure of 73.68% on progress notes and 83.72% on the MiPACQ corpus using 10-fold cross-validation. When the combined clinical Treebanks and Penn Treebank were used, of the three parsers, the Charniak parser achieved the highest Bracketing F-measure of 73.53% on progress notes and the Stanford parser reached the highest F-measure of 84.15% on the MiPACQ corpus. Our study demonstrates that re-training on clinical Treebanks is critical for improving general English parsers' performance on clinical text, and that combining clinical and open-domain corpora might achieve optimal performance for parsing clinical text.
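
The Bracketing F-measure quoted throughout this record compares labelled constituent spans between a gold tree and a test tree; a minimal sketch, with invented toy spans:

```python
# Bracketing F-measure over labelled constituent spans (label, start, end).
# The example spans are invented, not from the study's Treebanks.

def bracketing_f(gold_spans, test_spans):
    gold, test = set(gold_spans), set(test_spans)
    match = len(gold & test)
    precision = match / len(test) if test else 0.0
    recall = match / len(gold) if gold else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

# Spans over the token positions of "the patient denies chest pain":
gold = {("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5), ("S", 0, 5)}
test = {("NP", 0, 2), ("VP", 2, 5), ("NP", 4, 5), ("S", 0, 5)}
score = bracketing_f(gold, test)
```

One misattached noun phrase costs both a precision and a recall point, which is why the metric is sensitive to exactly the attachment errors that retraining on clinical Treebanks repairs.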

  17. Applying Semantic-based Probabilistic Context-Free Grammar to Medical Language Processing – A Preliminary Study on Parsing Medication Sentences

    PubMed Central

    Xu, Hua; AbdelRahman, Samir; Lu, Yanxin; Denny, Joshua C.; Doan, Son

    2011-01-01

    Semantic-based sublanguage grammars have been shown to be an efficient method for medical language processing. However, given the complexity of the medical domain, parsers using such grammars inevitably encounter ambiguous sentences, which could be interpreted by different groups of production rules and consequently result in two or more parse trees. One possible solution, which has not been extensively explored previously, is to augment productions in medical sublanguage grammars with probabilities to resolve the ambiguity. In this study, we associated probabilities with production rules in a semantic-based grammar for medication findings and evaluated its performance on reducing parsing ambiguity. Using the existing data set from 2009 i2b2 NLP (Natural Language Processing) challenge for medication extraction, we developed a semantic-based CFG (Context Free Grammar) for parsing medication sentences and manually created a Treebank of 4,564 medication sentences from discharge summaries. Using the Treebank, we derived a semantic-based PCFG (probabilistic Context Free Grammar) for parsing medication sentences. Our evaluation using a 10-fold cross validation showed that the PCFG parser dramatically improved parsing performance when compared to the CFG parser. PMID:21856440
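
The disambiguation mechanism this record describes, attaching probabilities to productions so the parser can rank competing parse trees, can be illustrated with a toy Viterbi-CKY parser. The grammar, the probabilities and the classic attachment ambiguity below are invented stand-ins for the study's medication grammar.

```python
from collections import defaultdict

# Viterbi CKY over a tiny PCFG in Chomsky normal form: when a string has
# two derivations, rule probabilities pick the more likely tree.
# Grammar and numbers are invented for illustration.

LEXICAL = {"saw": [("V", 1.0)], "man": [("NP", 0.4)],
           "with": [("P", 1.0)], "telescope": [("NP", 0.4)]}
BINARY = [("VP", "V", "NP", 0.6), ("VP", "VP", "PP", 0.4),
          ("NP", "NP", "PP", 0.2), ("PP", "P", "NP", 1.0)]

def viterbi_cky(words):
    n = len(words)
    best = defaultdict(dict)      # best[(i, j)][symbol] = (prob, backpointer)
    for i, w in enumerate(words):
        for sym, p in LEXICAL[w]:
            best[(i, i + 1)][sym] = (p, w)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # try every split point
                for lhs, r1, r2, p in BINARY:
                    left = best[(i, k)].get(r1)
                    right = best[(k, j)].get(r2)
                    if left and right:
                        prob = p * left[0] * right[0]
                        if prob > best[(i, j)].get(lhs, (0, None))[0]:
                            best[(i, j)][lhs] = (prob, (r1, r2, k))
    return best[(0, n)]

analyses = viterbi_cky("saw man with telescope".split())
prob, back = analyses["VP"]      # best-scoring VP analysis of the string
```

Here the VP-attachment reading wins (0.4 x 0.24 x 0.4 = 0.0384) over the NP-attachment reading (0.0192); a CFG without the probabilities would have to report both trees, which is exactly the ambiguity the study's PCFG resolves for medication sentences.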

  18. Learning for Semantic Parsing with Kernels under Various Forms of Supervision

    DTIC Science & Technology

    2007-08-01

natural language sentences to their formal executable meaning representations. This is a challenging problem and is critical for developing computing...sentences are semantically tractable. This indicates that Geoquery is a more challenging domain for semantic parsing than ATIS. In the past, there have been a...Combining parsers. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC-99), pp. 187-194

  19. Parsing clinical text: how good are the state-of-the-art parsers?

    PubMed Central

    2015-01-01

Background Parsing, which generates a syntactic structure of a sentence (a parse tree), is a critical component of natural language processing (NLP) research in any domain, including medicine. Although parsers developed in the general English domain, such as the Stanford parser, have been applied to clinical text, there are no formal evaluations and comparisons of their performance in the medical domain. Methods In this study, we investigated the performance of three state-of-the-art parsers: the Stanford parser, the Bikel parser, and the Charniak parser, using the following two datasets: (1) a Treebank containing 1,100 sentences that were randomly selected from progress notes used in the 2010 i2b2 NLP challenge and manually annotated according to a Penn Treebank-based guideline; and (2) the MiPACQ Treebank, which was developed based on pathology notes and clinical notes, containing 13,091 sentences. We conducted three experiments on both datasets. First, we measured the performance of the three state-of-the-art parsers on the clinical Treebanks with their default settings. Then we re-trained the parsers using the clinical Treebanks and evaluated their performance using the 10-fold cross-validation method. Finally, we re-trained the parsers by combining the clinical Treebanks with the Penn Treebank. Results Our results showed that the original parsers achieved lower performance on clinical text (Bracketing F-measure in the range of 66.6%-70.3%) compared to general English text. After retraining on the clinical Treebanks, all parsers achieved better performance, with the best performance from the Stanford parser, which reached the highest Bracketing F-measure of 73.68% on progress notes and 83.72% on the MiPACQ corpus using 10-fold cross-validation. When the combined clinical Treebanks and Penn Treebank were used, of the three parsers, the Charniak parser achieved the highest Bracketing F-measure of 73.53% on progress notes and the Stanford parser reached the highest F-measure of 84.15% on the MiPACQ corpus. Conclusions Our study demonstrates that re-training on clinical Treebanks is critical for improving general English parsers' performance on clinical text, and that combining clinical and open-domain corpora might achieve optimal performance for parsing clinical text. PMID:26045009

  20. A python tool for the implementation of domain-specific languages

    NASA Astrophysics Data System (ADS)

    Dejanović, Igor; Vaderna, Renata; Milosavljević, Gordana; Simić, Miloš; Vuković, Željko

    2017-07-01

    In this paper we describe textX, a meta-language and a tool for building Domain-Specific Languages. It is implemented in Python using Arpeggio PEG (Parsing Expression Grammar) parser library. From a single language description (grammar) textX will build a parser and a meta-model (a.k.a. abstract syntax) of the language. The parser is used to parse textual representations of models conforming to the meta-model. As a result of parsing, a Python object graph will be automatically created. The structure of the object graph will conform to the meta-model defined by the grammar. This approach frees a developer from the need to manually analyse a parse tree and transform it to other suitable representation. The textX library is independent of any integrated development environment and can be easily integrated in any Python project. The textX tool works as a grammar interpreter. The parser is configured at run-time using the grammar. The textX tool is a free and open-source project available at GitHub.
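
The grammar-interpreter idea (the parser is configured at run time from the language description rather than generated as code) can be shown in miniature. This toy engine only illustrates the concept; it uses none of textX's actual grammar language, meta-model machinery or API.

```python
import re

# A recursive-descent parser driven entirely by a run-time grammar table:
# change GRAMMAR/TOKENS and the same engine parses a different language.
# Grammar, token names and the sample input are invented.

GRAMMAR = {                       # rule -> list of alternatives (sequences)
    "greeting": [["HELLO", "name"]],
    "name": [["ID"]],
}
TOKENS = {"HELLO": r"hello", "ID": r"[A-Za-z]+"}

def skip_ws(text, pos):
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos

def parse(rule, text, pos=0):
    """Return (parse_node, new_position) or None on failure."""
    for alternative in GRAMMAR[rule]:
        children, p = [], pos
        for part in alternative:
            p = skip_ws(text, p)
            if part in GRAMMAR:                 # non-terminal: recurse
                result = parse(part, text, p)
                if result is None:
                    break
                node, p = result
                children.append(node)
            else:                               # terminal: match its regex
                m = re.match(TOKENS[part], text[p:])
                if not m:
                    break
                children.append((part, m.group()))
                p += m.end()
        else:                                   # whole alternative matched
            return (rule, children), p
    return None

tree, end = parse("greeting", "hello World")
```

textX additionally derives a meta-model from the grammar and instantiates a Python object graph instead of a raw tree, but the run-time-interpretation principle is the same.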

  1. ChemicalTagger: A tool for semantic text-mining in chemistry.

    PubMed

    Hawizy, Lezan; Jessop, David M; Adams, Nico; Murray-Rust, Peter

    2011-05-16

The primary method for scientific communication is in the form of published scientific articles and theses, which use natural language combined with domain-specific terminology. As such, they contain free-flowing unstructured text. Given the usefulness of data extraction from unstructured literature, we aim to show how this can be achieved for the discipline of chemistry. The highly formulaic style of writing most chemists adopt makes their contributions well suited to high-throughput Natural Language Processing (NLP) approaches. We have developed the ChemicalTagger parser as a medium-depth, phrase-based semantic NLP tool for the language of chemical experiments. Tagging is based on a modular architecture and uses a combination of OSCAR, domain-specific regex and English taggers to identify parts-of-speech. The ANTLR grammar is used to structure this into tree-based phrases. Using a metric that allows for overlapping annotations, we achieved machine-annotator agreements of 88.9% for phrase recognition and 91.9% for phrase-type identification (Action names). It is possible to parse chemical experimental text using rule-based techniques in conjunction with a formal grammar parser. ChemicalTagger has been deployed for over 10,000 patents and has identified solvents from their linguistic context with >99.5% precision.

  2. Robo-Sensei's NLP-Based Error Detection and Feedback Generation

    ERIC Educational Resources Information Center

    Nagata, Noriko

    2009-01-01

    This paper presents a new version of Robo-Sensei's NLP (Natural Language Processing) system which updates the version currently available as the software package "ROBO-SENSEI: Personal Japanese Tutor" (Nagata, 2004). Robo-Sensei's NLP system includes a lexicon, a morphological generator, a word segmentor, a morphological parser, a syntactic…

  3. Intelligent Agents as a Basis for Natural Language Interfaces

    DTIC Science & Technology

    1988-01-01

    language analysis component of UC, which produces a semantic representation of the input. This representation is in the form of a KODIAK network (see...Appendix A). Next, UC's Concretion Mechanism performs concretion inferences ([Wilensky, 1983] and [Norvig, 1983]) based on the semantic network...The first step in UC's processing is done by UC's parser/understander component, which produces a KODIAK semantic network representation of

  4. Disambiguating the species of biomedical named entities using natural language parsers

    PubMed Central

    Wang, Xinglong; Tsujii, Jun'ichi; Ananiadou, Sophia

    2010-01-01

    Motivation: Text mining technologies have been shown to reduce the laborious work involved in organizing the vast amount of information hidden in the literature. One challenge in text mining is linking ambiguous word forms to unambiguous biological concepts. This article reports on a comprehensive study on resolving the ambiguity in mentions of biomedical named entities with respect to model organisms and presents an array of approaches, with a focus on methods utilizing natural language parsers. Results: We build a corpus for organism disambiguation where every occurrence of a protein/gene entity is manually tagged with a species ID, and evaluate a number of methods on it. Promising results are obtained by training a machine learning model on syntactic parse trees, which is then used to decide whether an entity belongs to the model organism denoted by a neighbouring species-indicating word (e.g. yeast). The parser-based approaches are also compared with a supervised classification method, and the results indicate that the former are a more favorable choice when domain portability is of concern. The best overall performance is obtained by combining the strengths of syntactic features and supervised classification. Availability: The corpus and demo are available at http://www.nactem.ac.uk/deca_details/start.cgi, and the software is freely available as U-Compare components (Kano et al., 2009): NaCTeM Species Word Detector and NaCTeM Species Disambiguator. U-Compare is available at http://-compare.org/ Contact: xinglong.wang@manchester.ac.uk PMID:20053840

  5. Intelligent interfaces for expert systems

    NASA Technical Reports Server (NTRS)

    Villarreal, James A.; Wang, Lui

    1988-01-01

    Vital to the success of an expert system is a user interface that performs intelligently. A generic intelligent interface is being developed for expert systems. This intelligent interface was developed around the in-house developed Expert System for the Flight Analysis System (ESFAS). The Flight Analysis System (FAS) is comprised of 84 configuration-controlled FORTRAN subroutines that are used in the preflight analysis of the space shuttle. In order to use FAS proficiently, a person must be knowledgeable in flight mechanics and in the procedures involved in deploying a certain payload, and must have an overall understanding of the FAS. ESFAS, still in its developmental stage, is taking much of this knowledge into account. The generic intelligent interface involves the integration of a speech recognizer and synthesizer, a preparser, and a natural language parser with ESFAS. The speech recognizer being used is capable of recognizing 1000 words of connected speech. The natural language parser is a commercial software package that uses caseframe instantiation in processing the streams of words from the speech recognizer or the keyboard. The system's configuration is described, along with its capabilities and drawbacks.

  6. Domain Adaptation of Parsing for Operative Notes

    PubMed Central

    Wang, Yan; Pakhomov, Serguei; Ryan, James O.; Melton, Genevieve B.

    2016-01-01

    Background Full syntactic parsing of clinical text as a part of clinical natural language processing (NLP) is critical for a wide range of applications, such as identification of adverse drug reactions, patient cohort identification, and gene interaction extraction. Several robust syntactic parsers are publicly available to produce linguistic representations for sentences. However, these existing parsers are mostly trained on general English text and often require adaptation for optimal performance on clinical text. Our objective was to adapt an existing general English parser for the clinical text of operative reports via lexicon augmentation, statistics adjustment, and grammar rule modification based on a set of biomedical texts. Method The Stanford unlexicalized probabilistic context-free grammar (PCFG) parser lexicon was expanded with the SPECIALIST lexicon, along with statistics collected from a limited set of operative notes tagged with two POS taggers (the GENIA tagger and MedPost). The most frequently occurring verb entries of the SPECIALIST lexicon were adjusted based on manual review of verb usage in operative notes. Stanford parser grammar production rules were also modified based on linguistic features of operative reports. An analogous approach was then applied to the GENIA corpus to test the generalizability of this approach to biomedical text. Results The new unlexicalized PCFG parser, extended with the extra lexicon from SPECIALIST along with accurate statistics collected from an operative note corpus tagged with the GENIA POS tagger, improved parser performance by 2.26%, from 87.64% to 89.90%. There was a progressive improvement with the addition of multiple approaches. Most of the improvement occurred with lexicon augmentation combined with statistics from the operative notes corpus. Application of this approach to the GENIA corpus showed that parsing performance was boosted by 3.81% with a simple new grammar and the addition of the GENIA corpus lexicon. Conclusion Using statistics collected from clinical text tagged with POS taggers, along with proper modification of the grammar and lexicon of an unlexicalized PCFG parser, can improve parsing performance. PMID:25661593

  7. The parser generator as a general purpose tool

    NASA Technical Reports Server (NTRS)

    Noonan, R. E.; Collins, W. R.

    1985-01-01

    The parser generator has proven to be an extremely useful, general-purpose tool. It can be used effectively by programmers having only a knowledge of grammars and no training at all in the theory of formal parsing. Some of the application areas for which a table-driven parser can be used include interactive query languages, menu systems, translators, and programming support tools. Each of these is illustrated by an example grammar.
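    A table-driven parser of the kind described can be sketched as a small LL(1) push-down interpreter: the grammar lives entirely in a table, and the driver loop is generic. The menu-command grammar and the table below are invented for illustration, not taken from the paper.

```python
# A minimal table-driven (LL(1)) parser for a toy menu-command grammar.
# Grammar (illustrative only):
#   CMD  -> 'show' ITEM | 'quit'
#   ITEM -> 'files' | 'users'
# TABLE maps (nonterminal, lookahead token) -> production right-hand side.
TABLE = {
    ("CMD", "show"): ["show", "ITEM"],
    ("CMD", "quit"): ["quit"],
    ("ITEM", "files"): ["files"],
    ("ITEM", "users"): ["users"],
}
NONTERMINALS = {"CMD", "ITEM"}

def parse(tokens, start="CMD"):
    """Return True iff `tokens` derive from `start` under TABLE."""
    stack, pos = [start], 0
    while stack:
        top = stack.pop()
        look = tokens[pos] if pos < len(tokens) else None
        if top in NONTERMINALS:
            prod = TABLE.get((top, look))
            if prod is None:
                return False          # no table entry: syntax error
            stack.extend(reversed(prod))  # push rhs, leftmost symbol on top
        elif top == look:
            pos += 1                  # terminal matched, consume token
        else:
            return False
    return pos == len(tokens)
```

    The generic driver never changes; retargeting the parser to a different language means regenerating only `TABLE`, which is exactly what makes a parser generator usable by programmers who know grammars but not parsing theory.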

  8. Morphosyntactic annotation of CHILDES transcripts*

    PubMed Central

    SAGAE, KENJI; DAVIS, ERIC; LAVIE, ALON; MACWHINNEY, BRIAN; WINTNER, SHULY

    2014-01-01

    Corpora of child language are essential for research in child language acquisition and psycholinguistics. Linguistic annotation of the corpora provides researchers with better means for exploring the development of grammatical constructions and their usage. We describe a project whose goal is to annotate the English section of the CHILDES database with grammatical relations in the form of labeled dependency structures. We have produced a corpus of over 18,800 utterances (approximately 65,000 words) with manually curated gold-standard grammatical relation annotations. Using this corpus, we have developed a highly accurate data-driven parser for the English CHILDES data, which we used to automatically annotate the remainder of the English section of CHILDES. We have also extended the parser to Spanish, and are currently working on supporting more languages. The parser and the manually and automatically annotated data are freely available for research purposes. PMID:20334720

  9. ChemicalTagger: A tool for semantic text-mining in chemistry

    PubMed Central

    2011-01-01

    Background The primary method for scientific communication is in the form of published scientific articles and theses, which use natural language combined with domain-specific terminology. As such, they contain free-flowing unstructured text. Given the usefulness of data extraction from unstructured literature, we aim to show how this can be achieved for the discipline of chemistry. The highly formulaic style of writing most chemists adopt makes their contributions well suited to high-throughput Natural Language Processing (NLP) approaches. Results We have developed the ChemicalTagger parser as a medium-depth, phrase-based semantic NLP tool for the language of chemical experiments. Tagging is based on a modular architecture and uses a combination of OSCAR, domain-specific regex and English taggers to identify parts-of-speech. An ANTLR grammar is used to structure this into tree-based phrases. Using a metric that allows for overlapping annotations, we achieved machine-annotator agreements of 88.9% for phrase recognition and 91.9% for phrase-type identification (Action names). Conclusions It is possible to parse chemical experimental text using rule-based techniques in conjunction with a formal grammar parser. ChemicalTagger has been deployed for over 10,000 patents and has identified solvents from their linguistic context with >99.5% precision. PMID:21575201

  10. Structure before Meaning: Sentence Processing, Plausibility, and Subcategorization

    PubMed Central

    Kizach, Johannes; Nyvad, Anne Mette; Christensen, Ken Ramshøj

    2013-01-01

    Natural language processing is a fast and automatized process. A crucial part of this process is parsing, the online incremental construction of a syntactic structure. The aim of this study was to test whether a wh-filler extracted from an embedded clause is initially attached as the object of the matrix verb with subsequent reanalysis, and if so, whether the plausibility of such an attachment has an effect on reaction time. Finally, we wanted to examine whether subcategorization plays a role. We used a method called G-Maze to measure response time in a self-paced reading design. The experiments confirmed that there is early attachment of fillers to the matrix verb. When this attachment is implausible, the off-line acceptability of the whole sentence is significantly reduced. The on-line results showed that G-Maze was highly suited for this type of experiment. In accordance with our predictions, the results suggest that the parser ignores (or has no access to information about) implausibility and attaches fillers as soon as possible to the matrix verb. However, the results also show that the parser uses the subcategorization frame of the matrix verb. In short, the parser ignores semantic information and allows implausible attachments but adheres to information about which type of object a verb can take, ensuring that the parser does not make impossible attachments. We argue that the evidence supports a syntactic parser informed by syntactic cues, rather than one guided by semantic cues or one that is blind, or completely autonomous. PMID:24116101

  11. Structure before meaning: sentence processing, plausibility, and subcategorization.

    PubMed

    Kizach, Johannes; Nyvad, Anne Mette; Christensen, Ken Ramshøj

    2013-01-01

    Natural language processing is a fast and automatized process. A crucial part of this process is parsing, the online incremental construction of a syntactic structure. The aim of this study was to test whether a wh-filler extracted from an embedded clause is initially attached as the object of the matrix verb with subsequent reanalysis, and if so, whether the plausibility of such an attachment has an effect on reaction time. Finally, we wanted to examine whether subcategorization plays a role. We used a method called G-Maze to measure response time in a self-paced reading design. The experiments confirmed that there is early attachment of fillers to the matrix verb. When this attachment is implausible, the off-line acceptability of the whole sentence is significantly reduced. The on-line results showed that G-Maze was highly suited for this type of experiment. In accordance with our predictions, the results suggest that the parser ignores (or has no access to information about) implausibility and attaches fillers as soon as possible to the matrix verb. However, the results also show that the parser uses the subcategorization frame of the matrix verb. In short, the parser ignores semantic information and allows implausible attachments but adheres to information about which type of object a verb can take, ensuring that the parser does not make impossible attachments. We argue that the evidence supports a syntactic parser informed by syntactic cues, rather than one guided by semantic cues or one that is blind, or completely autonomous.

  12. ANTLR Tree Grammar Generator and Extensions

    NASA Technical Reports Server (NTRS)

    Craymer, Loring

    2005-01-01

    A computer program implements two extensions of ANTLR (Another Tool for Language Recognition), which is a set of software tools for translating source codes between different computing languages. ANTLR supports predicated-LL(k) lexer and parser grammars, a notation for annotating parser grammars to direct tree construction, and predicated tree grammars. [LL(k) signifies left-right, leftmost derivation with k tokens of look-ahead, referring to certain characteristics of a grammar.] One of the extensions is a syntax for tree transformations. The other extension is the generation of tree grammars from annotated parser or input tree grammars. These extensions can simplify the process of generating source-to-source language translators and they make possible an approach, called "polyphase parsing," to translation between computing languages. The typical approach to translator development is to identify high-level semantic constructs such as "expressions," "declarations," and "definitions" as fundamental building blocks in the grammar specification used for language recognition. The polyphase approach is to lump ambiguous syntactic constructs during parsing and then disambiguate the alternatives in subsequent tree transformation passes. Polyphase parsing is believed to be useful for generating efficient recognizers for C++ and other languages that, like C++, have significant ambiguities.

  13. Sorry Dave, I’m Afraid I Can’t Do That: Explaining Unachievable Robot Tasks using Natural Language

    DTIC Science & Technology

    2013-06-24

    processing components used by Brooks et al. [6]: the Bikel parser [3] combined with the null element (understood subject) restoration of Gabbard et al...Intelligent Robots and Systems (IROS), pages 1988–1993, 2010. [12] Ryan Gabbard, Mitch Marcus, and Seth Kulick. Fully parsing the Penn Treebank. In Human

  14. A study of the transferability of influenza case detection systems between two large healthcare systems

    PubMed Central

    Wagner, Michael M.; Cooper, Gregory F.; Ferraro, Jeffrey P.; Su, Howard; Gesteland, Per H.; Haug, Peter J.; Millett, Nicholas E.; Aronis, John M.; Nowalk, Andrew J.; Ruiz, Victor M.; López Pineda, Arturo; Shi, Lingyun; Van Bree, Rudy; Ginter, Thomas; Tsui, Fuchiang

    2017-01-01

    Objectives This study evaluates the accuracy and transferability of Bayesian case detection systems (BCD) that use clinical notes from the emergency department (ED) to detect influenza cases. Methods A BCD uses natural language processing (NLP) to infer the presence or absence of clinical findings from ED notes, which are fed into a Bayesian network classifier (BN) to infer patients' diagnoses. We developed BCDs at the University of Pittsburgh Medical Center (BCDUPMC) and Intermountain Healthcare in Utah (BCDIH). At each site, we manually built a rule-based NLP system and trained a Bayesian network classifier from over 40,000 ED encounters between Jan. 2008 and May 2010 using feature selection, machine learning, and an expert-debiasing approach. Transferability of a BCD in this study may be impacted by seven factors: development (source) institution, development parser, application (target) institution, application parser, NLP transfer, BN transfer, and classification task. We employed an ANOVA analysis to study their impacts on BCD performance. Results Both BCDs discriminated well between influenza and non-influenza on local test cases (AUCs > 0.92). When tested for transferability using the other institution's cases, BCDUPMC discrimination declined minimally (AUC decreased from 0.95 to 0.94, p<0.01), and BCDIH discrimination declined more (from 0.93 to 0.87, p<0.0001). We attributed the BCDIH decline to the lower recall of the IH parser on UPMC notes. The ANOVA analysis showed five significant factors: development parser, application institution, application parser, BN transfer, and classification task. Conclusion We demonstrated high influenza case detection performance in two large healthcare systems in two geographically separated regions, providing evidentiary support for the use of automated case detection from routinely collected electronic clinical notes in national influenza surveillance. Transferability could be improved by training the Bayesian network classifier locally and by increasing the accuracy of the NLP parser. PMID:28380048
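    The findings-to-diagnosis step can be pictured, in heavily simplified form, as a naive Bayes special case of the Bayesian network classifier: binary findings extracted by NLP update a prior on influenza. All finding names and probabilities below are invented for illustration; the actual BCD networks are learned from tens of thousands of encounters.

```python
import math

# Heavily simplified sketch of the case-detection idea: a naive Bayes
# classifier over binary clinical findings extracted by NLP. The finding
# names and every probability below are invented for illustration.

PRIOR_FLU = 0.05
# P(finding present | class)
P_GIVEN_FLU = {"fever": 0.80, "cough": 0.70, "rash": 0.05}
P_GIVEN_NOT = {"fever": 0.10, "cough": 0.20, "rash": 0.05}

def posterior_flu(findings):
    """findings: dict finding -> True/False (present/absent in the note)."""
    log_flu = math.log(PRIOR_FLU)
    log_not = math.log(1 - PRIOR_FLU)
    for f, present in findings.items():
        p1, p0 = P_GIVEN_FLU[f], P_GIVEN_NOT[f]
        log_flu += math.log(p1 if present else 1 - p1)
        log_not += math.log(p0 if present else 1 - p0)
    # normalize the two log scores into a posterior probability
    m = max(log_flu, log_not)
    e1, e0 = math.exp(log_flu - m), math.exp(log_not - m)
    return e1 / (e1 + e0)

p = posterior_flu({"fever": True, "cough": True, "rash": False})
```

    The sketch also makes the transferability finding intuitive: if the target site's NLP misses findings (lower recall), the classifier sees "absent" where it should see "present", and the posterior degrades even when the probability tables themselves transfer well.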

  15. A study of the transferability of influenza case detection systems between two large healthcare systems.

    PubMed

    Ye, Ye; Wagner, Michael M; Cooper, Gregory F; Ferraro, Jeffrey P; Su, Howard; Gesteland, Per H; Haug, Peter J; Millett, Nicholas E; Aronis, John M; Nowalk, Andrew J; Ruiz, Victor M; López Pineda, Arturo; Shi, Lingyun; Van Bree, Rudy; Ginter, Thomas; Tsui, Fuchiang

    2017-01-01

    This study evaluates the accuracy and transferability of Bayesian case detection systems (BCD) that use clinical notes from the emergency department (ED) to detect influenza cases. A BCD uses natural language processing (NLP) to infer the presence or absence of clinical findings from ED notes, which are fed into a Bayesian network classifier (BN) to infer patients' diagnoses. We developed BCDs at the University of Pittsburgh Medical Center (BCDUPMC) and Intermountain Healthcare in Utah (BCDIH). At each site, we manually built a rule-based NLP system and trained a Bayesian network classifier from over 40,000 ED encounters between Jan. 2008 and May 2010 using feature selection, machine learning, and an expert-debiasing approach. Transferability of a BCD in this study may be impacted by seven factors: development (source) institution, development parser, application (target) institution, application parser, NLP transfer, BN transfer, and classification task. We employed an ANOVA analysis to study their impacts on BCD performance. Both BCDs discriminated well between influenza and non-influenza on local test cases (AUCs > 0.92). When tested for transferability using the other institution's cases, BCDUPMC discrimination declined minimally (AUC decreased from 0.95 to 0.94, p<0.01), and BCDIH discrimination declined more (from 0.93 to 0.87, p<0.0001). We attributed the BCDIH decline to the lower recall of the IH parser on UPMC notes. The ANOVA analysis showed five significant factors: development parser, application institution, application parser, BN transfer, and classification task. We demonstrated high influenza case detection performance in two large healthcare systems in two geographically separated regions, providing evidentiary support for the use of automated case detection from routinely collected electronic clinical notes in national influenza surveillance. Transferability could be improved by training the Bayesian network classifier locally and by increasing the accuracy of the NLP parser.

  16. Designing a Constraint Based Parser for Sanskrit

    NASA Astrophysics Data System (ADS)

    Kulkarni, Amba; Pokar, Sheetal; Shukl, Devanand

    Verbal understanding (śābdabodha) of any utterance requires knowledge of how the words in that utterance are related to each other. Such knowledge is usually available in the form of cognition of grammatical relations. Generative grammars describe how a language codes these relations. Thus the knowledge of what information various grammatical relations convey is available from the generation point of view, not the analysis point of view. In order to develop a parser based on any grammar, one should know precisely the semantic content of the grammatical relations expressed in a language string, the clues for extracting these relations, and whether these relations are expressed explicitly or implicitly. Based on the design principles that emerge from this knowledge, we model the parser as finding a directed tree, given a graph with nodes representing the words and edges representing the possible relations between them. Further, we use the Mīmāṃsā constraint of ākāṅkṣā (expectancy) to rule out non-solutions and sannidhi (proximity) to prioritize the solutions. We have implemented a parser based on these principles; its performance was found to be satisfactory, giving us confidence to extend its functionality to handle complex sentences.
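    The formulation above can be sketched in a few lines: parsing as selecting, for each word, one incoming edge from a graph of candidate relations so that the result is a directed tree rooted at the verb, with an expectancy-style constraint ruling out non-solutions. The words, relations, and toy constraint below are invented for this example (and cycle checks are unnecessary here because every candidate head is the root).

```python
from itertools import product

# Illustrative sketch: a graph of candidate (head, dependent, relation)
# edges over a three-word clause, filtered by a toy expectancy (akanksha)
# constraint. All words, relations and numbers are invented.

WORDS = ["eats", "Rama", "fruit"]  # index 0 is the verb/root
CANDIDATES = [
    (0, 1, "agent"), (0, 1, "object"),
    (0, 2, "agent"), (0, 2, "object"),
]

def expectancy_ok(tree):
    # toy constraint: the verb expects exactly one agent and one object
    rels = [rel for _, _, rel in tree]
    return rels.count("agent") == 1 and rels.count("object") == 1

def parse():
    # choose one incoming edge per non-root word, keep constraint-satisfying
    # combinations; each surviving combination is a directed tree
    per_word = [[e for e in CANDIDATES if e[1] == w]
                for w in range(1, len(WORDS))]
    return [combo for combo in product(*per_word) if expectancy_ok(combo)]

analyses = parse()  # two readings survive: Rama as agent or as object
```

    In the real parser a proximity (sannidhi) score would then rank the surviving trees; here the constraint alone cuts the four edge combinations down to two.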

  17. Natural Language Sourcebook

    DTIC Science & Technology

    1990-01-01

    Identification of Syntactic Units Exemplar I.A. (#1) Problem (1) The tough coach the young. (2) The tough coach married a star. (3) The tough coach married ..."the tough" vs. "the tough coach" and (b) "people" vs. "married people." The problem could also be considered a problem of determining lexical...and "married" in example (2). Once the parser specifies a verb, the structure of the rest of the sentence is determined: specifying "coach" as a

  18. Errors and Intelligence in Computer-Assisted Language Learning: Parsers and Pedagogues. Routledge Studies in Computer Assisted Language Learning

    ERIC Educational Resources Information Center

    Heift, Trude; Schulze, Mathias

    2012-01-01

    This book provides the first comprehensive overview of theoretical issues, historical developments and current trends in ICALL (Intelligent Computer-Assisted Language Learning). It assumes a basic familiarity with Second Language Acquisition (SLA) theory and teaching, CALL and linguistics. It is of interest to upper undergraduate and/or graduate…

  19. Predicting complex syntactic structure in real time: Processing of negative sentences in Russian.

    PubMed

    Kazanina, Nina

    2017-11-01

    In Russian negative sentences the verb's direct object may appear either in the accusative case, which is licensed by the verb (as is common cross-linguistically), or in the genitive case, which is licensed by the negation (Russian-specific "genitive-of-negation" phenomenon). Such sentences were used to investigate whether case marking is employed for anticipating syntactic structure, and whether lexical heads other than the verb can be predicted on the basis of a case-marked noun phrase. Experiment 1, a completion task, confirmed that genitive-of-negation is part of Russian speakers' active grammatical repertoire. In Experiments 2 and 3, the genitive/accusative case manipulation on the preverbal object led to shorter reading times at the negation and verb in the genitive versus accusative condition. Furthermore, Experiment 3 manipulated linear order of the direct object and the negated verb in order to distinguish whether the abovementioned facilitatory effect was predictive or integrative in nature, and concluded that the parser actively predicts a verb and (otherwise optional) negation on the basis of a preceding genitive-marked object. Similarly to a head-final language, case-marking information on preverbal noun phrases (NPs) is used by the parser to enable incremental structure building in a free-word-order language such as Russian.

  20. Natural-Language Parser for PBEM

    NASA Technical Reports Server (NTRS)

    James, Mark

    2010-01-01

    A computer program called "Hunter" accepts, as input, a colloquial-English description of a set of policy-based-management rules, and parses that description into a form useable by policy-based enterprise management (PBEM) software. PBEM is a rules-based approach suitable for automating some management tasks. PBEM simplifies the management of a given enterprise through establishment of policies addressing situations that are likely to occur. Hunter was developed to have a unique capability to extract the intended meaning instead of focusing on parsing the exact ways in which individual words are used.

  1. Grammar as a Programming Language. Artificial Intelligence Memo 391.

    ERIC Educational Resources Information Center

    Rowe, Neil

    Student projects that involve writing generative grammars in the computer language, "LOGO," are described in this paper, which presents a grammar-running control structure that allows students to modify and improve the grammar interpreter itself while learning how a simple kind of computer parser works. Included are procedures for…

  2. The time course of syntactic activation during language processing: a model based on neuropsychological and neurophysiological data.

    PubMed

    Friederici, A D

    1995-09-01

    This paper presents a model describing the temporal and neurotopological structure of syntactic processes during comprehension. It postulates three distinct phases of language comprehension, two of which are primarily syntactic in nature. During the first phase the parser assigns the initial syntactic structure on the basis of word category information. These early structural processes are assumed to be subserved by the anterior parts of the left hemisphere, as event-related brain potentials show this area to be maximally activated when phrase structure violations are processed and as circumscribed lesions in this area lead to an impairment of the on-line structural assignment. During the second phase lexical-semantic and verb-argument structure information is processed. This phase is neurophysiologically manifest in a negative component in the event-related brain potential around 400 ms after stimulus onset which is distributed over the left and right temporo-parietal areas when lexical-semantic information is processed and over left anterior areas when verb-argument structure information is processed. During the third phase the parser tries to map the initial syntactic structure onto the available lexical-semantic and verb-argument structure information. In case of an unsuccessful match between the two types of information reanalyses may become necessary. These processes of structural reanalysis are correlated with a centroparietally distributed late positive component in the event-related brain potential.(ABSTRACT TRUNCATED AT 250 WORDS)

  3. Two models of minimalist, incremental syntactic analysis.

    PubMed

    Stabler, Edward P

    2013-07-01

    Minimalist grammars (MGs) and multiple context-free grammars (MCFGs) are weakly equivalent in the sense that they define the same languages, a large mildly context-sensitive class that properly includes context-free languages. But in addition, for each MG, there is an MCFG which is strongly equivalent in the sense that it defines the same language with isomorphic derivations. However, the structure-building rules of MGs but not MCFGs are defined in a way that generalizes across categories. Consequently, MGs can be exponentially more succinct than their MCFG equivalents, and this difference shows in parsing models too. An incremental, top-down beam parser for MGs is defined here, sound and complete for all MGs, and hence also capable of parsing all MCFG languages. But since the parser represents its grammar transparently, the relative succinctness of MGs is again evident. Although the determinants of MG structure are narrowly and discretely defined, probabilistic influences from a much broader domain can influence even the earliest analytic steps, allowing frequency and context effects to come early and from almost anywhere, as expected in incremental models. Copyright © 2013 Cognitive Science Society, Inc.

  4. Representing Information in Patient Reports Using Natural Language Processing and the Extensible Markup Language

    PubMed Central

    Friedman, Carol; Hripcsak, George; Shagina, Lyuda; Liu, Hongfang

    1999-01-01

    Objective: To design a document model that provides reliable and efficient access to clinical information in patient reports for a broad range of clinical applications, and to implement an automated method using natural language processing that maps textual reports to a form consistent with the model. Methods: A document model that encodes structured clinical information in patient reports while retaining the original contents was designed using the extensible markup language (XML), and a document type definition (DTD) was created. An existing natural language processor (NLP) was modified to generate output consistent with the model. Two hundred reports were processed using the modified NLP system, and the XML output that was generated was validated using an XML validating parser. Results: The modified NLP system successfully processed all 200 reports. The output of one report was invalid, and 199 reports were valid XML forms consistent with the DTD. Conclusions: Natural language processing can be used to automatically create an enriched document that contains a structured component whose elements are linked to portions of the original textual report. This integrated document model provides a representation where documents containing specific information can be accurately and efficiently retrieved by querying the structured components. If manual review of the documents is desired, the salient information in the original reports can also be identified and highlighted. Using an XML model of tagging provides an additional benefit in that software tools that manipulate XML documents are readily available. PMID:9925230
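    The document model described above can be sketched with the standard library: NLP-extracted findings are stored as structured elements whose offsets point back into the original report text. The element and attribute names below are invented for illustration; the paper defines its own XML DTD, and a validating parser would check conformance to it.

```python
import xml.etree.ElementTree as ET

# Sketch of the structured-document idea: wrap a clinical report and its
# NLP-extracted findings in one XML document, with each finding linked to
# a span of the original text. Names and the sample report are invented.

report = "No acute infiltrate. Mild cardiomegaly."
start = report.index("cardiomegaly")
findings = [("cardiomegaly", start, start + len("cardiomegaly"))]

doc = ET.Element("document")
ET.SubElement(doc, "text").text = report
structured = ET.SubElement(doc, "structured")
for concept, s, e in findings:
    ET.SubElement(structured, "finding",
                  concept=concept, start=str(s), end=str(e))

# round-trip through serialization to show the structured component is
# queryable while the original narrative text is retained unchanged
parsed = ET.fromstring(ET.tostring(doc))
concepts = [f.get("concept") for f in parsed.iter("finding")]
```

    Queries run against the structured elements, while the retained `text` element supports manual review, with offsets available for highlighting the salient spans.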

  5. Automatic Parsing of Parental Verbal Input

    PubMed Central

    Sagae, Kenji; MacWhinney, Brian; Lavie, Alon

    2006-01-01

    To evaluate theoretical proposals regarding the course of child language acquisition, researchers often need to rely on the processing of large numbers of syntactically parsed utterances, both from children and their parents. Because it is so difficult to do this by hand, there are currently no parsed corpora of child language input data. To automate this process, we developed a system that combined the MOR tagger, a rule-based parser, and statistical disambiguation techniques. The resultant system obtained nearly 80% correct parses for the sentences spoken to children. To achieve this level, we had to construct a particular processing sequence that minimizes problems caused by the coverage/ambiguity trade-off in parser design. These procedures are particularly appropriate for use with the CHILDES database, an international corpus of transcripts. The data and programs are now freely available over the Internet. PMID:15190707

  6. A translator writing system for microcomputer high-level languages and assemblers

    NASA Technical Reports Server (NTRS)

    Collins, W. R.; Knight, J. C.; Noonan, R. E.

    1980-01-01

    In order to implement high level languages whenever possible, a translator writing system of advanced design was developed. It is intended for routine production use by many programmers working on different projects. As well as a fairly conventional parser generator, it includes a system for the rapid generation of table driven code generators. The parser generator was developed from a prototype version. The translator writing system includes various tools for the management of the source text of a compiler under construction. In addition, it supplies various default source code sections so that its output is always compilable and executable. The system thereby encourages iterative enhancement as a development methodology by ensuring an executable program from the earliest stages of a compiler development project. The translator writing system includes PASCAL/48 compiler, three assemblers, and two compilers for a subset of HAL/S.

  7. Learning to Understand Natural Language with Less Human Effort

    DTIC Science & Technology

    2015-05-01

    [Full-text excerpt garbled in indexing. The recoverable fragments concern training a semantic parser: parameters θ are optimized to predict a logical form and derivation (Y, Z) given a sentence S, with j indexing entity tuples (e1, e2); this is followed by a broken CCG derivation table for the phrase "beautiful London". No coherent abstract is recoverable.]

  8. A Risk Assessment System with Automatic Extraction of Event Types

    NASA Astrophysics Data System (ADS)

    Capet, Philippe; Delavallade, Thomas; Nakamura, Takuya; Sandor, Agnes; Tarsitano, Cedric; Voyatzi, Stavroula

    In this article we describe the joint effort of experts in linguistics, information extraction and risk assessment to integrate EventSpotter, an automatic event extraction engine, into ADAC, an automated early warning system. By detecting weak signals of emerging risks as early as possible, ADAC provides a dynamic synthetic picture of situations involving risk. The ADAC system calculates risk on the basis of fuzzy logic rules operated on a template graph whose leaves are event types. EventSpotter is based on a general-purpose natural language dependency parser, XIP, enhanced with domain-specific lexical resources (Lexicon-Grammar). Its role is to automatically feed the leaves with input data.

  9. "gnparser": a powerful parser for scientific names based on Parsing Expression Grammar.

    PubMed

    Mozzherin, Dmitry Y; Myltsev, Alexander A; Patterson, David J

    2017-05-26

    Scientific names in biology act as universal links. They allow us to cross-reference information about organisms globally. However, variations in the spelling of scientific names greatly diminish their ability to interconnect data. Such variations may include abbreviations, annotations, misspellings, etc. Authorship is a part of a scientific name and may also differ significantly. To match all possible variations of a name, we need to divide names into their elements and classify each element according to its role. We refer to this as 'parsing' the name. Parsing categorizes a name's elements into those that are stable and those that are prone to change. Names are matched first by combining them according to their stable elements. Matches are then refined by examining their varying elements. This two-stage process dramatically improves the number and quality of matches. It is especially useful for automatic data exchange within the context of "Big Data" in biology. We introduce the Global Names Parser (gnparser), a tool for parsing scientific names written in Scala (a language for the Java Virtual Machine) and based on a Parsing Expression Grammar. The parser can be applied to scientific names of any complexity. It assigns a semantic meaning (such as genus name, species epithet, rank, year of publication, authorship, annotations, etc.) to all elements of a name. It is able to work with nested structures, as in the names of hybrids. gnparser performs with ≈99% accuracy and processes 30 million name-strings/hour per CPU thread. The gnparser library is compatible with Scala, Java, R, Jython, and JRuby. The parser can be used as a command-line application, as a socket server, as a web-app, or as a RESTful HTTP service. It is released under an open-source MIT license. Global Names Parser (gnparser) is a fast, high-precision tool for biodiversity informaticians and biologists working with large numbers of scientific names. It can replace expensive and error-prone manual parsing and standardization of scientific names in many situations, and can quickly enhance the interoperability of distributed biological information.
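
    The element-and-role decomposition described above can be illustrated with a toy sketch (in Python rather than gnparser's Scala, and with a single regular expression rather than a full Parsing Expression Grammar; the pattern below is an illustrative assumption covering only simple binomials with optional authorship and year):

```python
import re

# Toy splitter for simple binomial names. gnparser's real grammar handles far
# more cases (hybrids, annotations, nested authorship, ranks, etc.).
NAME = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)\s+"
    r"(?P<epithet>[a-z]+)"
    r"(?:\s+(?P<authorship>[A-Z][^,]*?))?"
    r"(?:,\s*(?P<year>\d{4}))?$"
)

def parse_name(name: str) -> dict:
    """Assign a semantic role (genus, epithet, authorship, year) to each element."""
    m = NAME.match(name.strip())
    if not m:
        raise ValueError(f"unparseable name: {name!r}")
    return {k: v for k, v in m.groupdict().items() if v is not None}
```

    Matching on the stable elements (genus, epithet) while treating authorship and year as refining, variable elements mirrors the two-stage matching process the abstract describes.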

  10. Parser Combinators: a Practical Application for Generating Parsers for NMR Data

    PubMed Central

    Fenwick, Matthew; Weatherby, Gerard; Ellis, Heidi JC; Gryk, Michael R.

    2013-01-01

    Nuclear Magnetic Resonance (NMR) spectroscopy is a technique for acquiring protein data at atomic resolution and determining the three-dimensional structure of large protein molecules. A typical structure determination process results in the deposition of large data sets to the BMRB (Bio-Magnetic Resonance Data Bank). This data is stored and shared in a file format called NMR-Star. This format is syntactically and semantically complex, making it challenging to parse. Nevertheless, parsing these files is crucial to applying the vast amounts of biological information stored in NMR-Star files, allowing researchers to harness the results of previous studies to direct and validate future work. One powerful approach for parsing files is to apply a Backus-Naur Form (BNF) grammar, which is a high-level model of a file format; translation of the grammatical model to an executable parser may then be accomplished automatically. This paper shows how we applied a model BNF grammar of the NMR-Star format to create a free, open-source parser, using a method that originated in the functional programming world known as “parser combinators”. This paper demonstrates the effectiveness of a principled approach to file specification and parsing. It also builds upon our previous work [1], in that 1) it applies concepts from functional programming (which are relevant even though the implementation language, Java, is more mainstream than functional programming), and 2) all work and accomplishments from this project will be made available under standard open-source licenses to provide the community with the opportunity to learn from our techniques and methods. PMID:24352525
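
    The parser-combinator idea the paper applies can be sketched minimally as follows (a hedged illustration in Python, not the authors' Java implementation): a parser is a function from an input string and position to a result or failure, and combinators such as `seq` and `many1` build larger parsers from smaller ones, mirroring the grammar's structure.

```python
# A parser is a function: (string, position) -> (value, new position) or None.

def char(c):
    """Parser that matches a single literal character."""
    def p(s, i):
        return (c, i + 1) if i < len(s) and s[i] == c else None
    return p

def one_of(chars):
    """Parser that matches any one character from a set."""
    def p(s, i):
        return (s[i], i + 1) if i < len(s) and s[i] in chars else None
    return p

def many1(p):
    """Combinator: one or more repetitions of p."""
    def q(s, i):
        vals = []
        r = p(s, i)
        while r is not None:
            v, i = r
            vals.append(v)
            r = p(s, i)
        return (vals, i) if vals else None
    return q

def seq(*ps):
    """Combinator: all of ps in order."""
    def q(s, i):
        vals = []
        for p in ps:
            r = p(s, i)
            if r is None:
                return None
            v, i = r
            vals.append(v)
        return (vals, i)
    return q

# Grammar-like composition: an integer is one or more digits.
digit = one_of("0123456789")
integer = many1(digit)
```

    The executable parser is assembled directly from the grammar rules, which is what makes the approach attractive for a complex format like NMR-Star.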

  11. DBPQL: A view-oriented query language for the Intel Data Base Processor

    NASA Technical Reports Server (NTRS)

    Fishwick, P. A.

    1983-01-01

    An interactive query language (DBPQL) for the Intel Data Base Processor (DBP) is defined. DBPQL includes a parser generator package which permits the analyst to easily create and manipulate the query statement syntax and semantics. The prototype language, DBPQL, includes trace and performance commands to aid the analyst when implementing new commands and analyzing the execution characteristics of the DBP. The DBPQL grammar file and associated key procedures are included as an appendix to this report.

  12. Building pathway graphs from BioPAX data in R.

    PubMed

    Benis, Nirupama; Schokker, Dirkjan; Kramer, Frank; Smits, Mari A; Suarez-Diez, Maria

    2016-01-01

    Biological pathways are increasingly available in the BioPAX format which uses an RDF model for data storage. One can retrieve the information in this data model in the scripting language R using the package rBiopaxParser, which converts the BioPAX format to one readable in R. It also has a function to build a regulatory network from the pathway information. Here we describe an extension of this function. The new function allows the user to build graphs of entire pathways, including regulated as well as non-regulated elements, and therefore provides a maximum of information. This function is available as part of the rBiopaxParser distribution from Bioconductor.

  13. Solving LR Conflicts Through Context Aware Scanning

    NASA Astrophysics Data System (ADS)

    Leon, C. Rodriguez; Forte, L. Garcia

    2011-09-01

    This paper presents a new algorithm to compute the exact list of tokens expected by any LR syntax analyzer at any point of the scanning process. The lexer can thus, at any time, compute the exact set of valid tokens and return only tokens in this set. When more than one matching token is in the valid set, the lexer can resort to a nested LR parser to disambiguate. Allowing nested LR parsing requires some slight modifications when building the LR parsing tables. We also show how LR parsers can parse conflicting and inherently ambiguous languages using a combination of nested parsing and context-aware scanning. These expanded lexical analyzers can be generated from high-level specifications.
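
    The core of context-aware scanning can be sketched as follows (an illustrative Python fragment, not the paper's generator; the token names and patterns are invented for the example). The lexer receives the set of token kinds the parser currently expects and tries only those, so the same characters can yield different tokens in different parser states:

```python
import re

# Invented token kinds for the example; KEYWORD_IF overlaps with ID, and only
# the parser's current expected-token set decides which one is returned.
TOKEN_PATTERNS = {
    "NUMBER":     r"\d+",
    "ID":         r"[a-z]+",
    "KEYWORD_IF": r"if",
}

def next_token(text, pos, expected):
    """Return (kind, lexeme, new_pos) for the first expected kind matching at pos."""
    for kind in expected:                      # only kinds valid in this LR state
        m = re.match(TOKEN_PATTERNS[kind], text[pos:])
        if m:
            return kind, m.group(0), pos + m.end()
    raise SyntaxError(f"none of {sorted(expected)} match at position {pos}")
```

    With `expected = ["KEYWORD_IF", "ID"]` the input `"if x"` yields the keyword, while with `expected = ["ID", "NUMBER"]` the same characters are scanned as an identifier; this is the mechanism that lets the scanner resolve lexical conflicts the parser would otherwise face.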

  14. Integrated verification and testing system (IVTS) for HAL/S programs

    NASA Technical Reports Server (NTRS)

    Senn, E. H.; Ames, K. R.; Smith, K. A.

    1983-01-01

    The IVTS is a large software system designed to support user-controlled verification analysis and testing activities for programs written in the HAL/S language. The system is composed of a user interface and user command language, analysis tools and an organized data base of host system files. The analysis tools are of four major types: (1) static analysis, (2) symbolic execution, (3) dynamic analysis (testing), and (4) documentation enhancement. The IVTS requires a split HAL/S compiler, divided at the natural separation point between the parser/lexical analyzer phase and the target machine code generator phase. The IVTS uses the internal program form (HALMAT) between these two phases as primary input for the analysis tools. The dynamic analysis component requires some way to 'execute' the object HAL/S program. The execution medium may be an interpretive simulation or an actual host or target machine.

  15. Multimedia CALLware: The Developer's Responsibility.

    ERIC Educational Resources Information Center

    Dodigovic, Marina

    The early computer-assisted-language-learning (CALL) programs were silent and mostly limited to screen or printer supported written text as the prevailing communication resource. The advent of powerful graphics, sound and video combined with AI-based parsers and sound recognition devices gradually turned the computer into a rather anthropomorphic…

  16. MetaJC++: A flexible and automatic program transformation technique using meta framework

    NASA Astrophysics Data System (ADS)

    Beevi, Nadera S.; Reghu, M.; Chitraprasad, D.; Vinodchandra, S. S.

    2014-09-01

    A compiler is a tool to translate abstract code containing natural language terms into machine code. Meta compilers are available to compile more than one language. We have developed a meta framework that combines two dissimilar programming languages, namely C++ and Java, to provide a flexible object-oriented programming platform for the user. Suitable constructs from both languages have been combined, thereby forming a new and stronger meta-language. The framework is developed using the compiler writing tools Flex and Yacc to design the front end of the compiler. The lexer and parser have been developed to accommodate the complete keyword set and syntax set of both languages. Two intermediate representations are used in the translation of the source program to machine code. An Abstract Syntax Tree is used as a high-level intermediate representation that preserves the hierarchical properties of the source program. A new machine-independent stack-based byte-code has also been devised to act as a low-level intermediate representation. The byte-code is organised into an output class file that can be used to produce an interpreted output. The results, especially in the sphere of providing C++ concepts in Java, give an insight into the potentially strong features of the resultant meta-language.
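
    The two-stage lowering described (an AST as the high-level representation, a stack-based byte-code as the low-level one) can be shown in miniature. This is a hedged Python sketch, not MetaJC++'s actual byte-code format; the node shapes and opcode names are invented for the example:

```python
def compile_expr(node):
    """Lower an AST node, ('num', n) or ('add'/'mul', left, right), to stack ops."""
    op, *args = node
    if op == "num":
        return [("PUSH", args[0])]
    left, right = args
    # Post-order traversal: operands first, then the operator, as on a stack machine.
    return compile_expr(left) + compile_expr(right) + [(op.upper(), None)]

def run(code):
    """Interpret the stack byte-code and return the top of the stack."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack[-1]
```

    Because the byte-code is machine-independent, the same interpreter can execute output compiled from either source language, which is the point of the shared low-level representation.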

  17. The Unification Space implemented as a localist neural net: predictions and error-tolerance in a constraint-based parser.

    PubMed

    Vosse, Theo; Kempen, Gerard

    2009-12-01

    We introduce a novel computer implementation of the Unification-Space parser (Vosse and Kempen in Cognition 75:105-143, 2000) in the form of a localist neural network whose dynamics is based on interactive activation and inhibition. The wiring of the network is determined by Performance Grammar (Kempen and Harbusch in Verb constructions in German and Dutch. Benjamins, Amsterdam, 2003), a lexicalist formalism with feature unification as binding operation. While the network is processing input word strings incrementally, the evolving shape of parse trees is represented in the form of changing patterns of activation in nodes that code for syntactic properties of words and phrases, and for the grammatical functions they fulfill. The system is capable, at least qualitatively and rudimentarily, of simulating several important dynamic aspects of human syntactic parsing, including garden-path phenomena and reanalysis, effects of complexity (various types of clause embeddings), fault-tolerance in case of unification failures and unknown words, and predictive parsing (expectation-based analysis, surprisal effects). English is the target language of the parser described.

  18. Saying What You're Looking For: Linguistics Meets Video Search.

    PubMed

    Barrett, Daniel Paul; Barbu, Andrei; Siddharth, N; Siskind, Jeffrey Mark

    2016-10-01

    We present an approach to searching large video corpora for clips which depict a natural-language query in the form of a sentence. Compositional semantics is used to encode subtle meaning differences lost in other approaches, such as the difference between two sentences which have identical words but entirely different meaning: The person rode the horse versus The horse rode the person. Given a sentential query and a natural-language parser, we produce a score indicating how well a video clip depicts that sentence for each clip in a corpus and return a ranked list of clips. Two fundamental problems are addressed simultaneously: detecting and tracking objects, and recognizing whether those tracks depict the query. Because both tracking and object detection are unreliable, our approach uses the sentential query to focus the tracker on the relevant participants and ensures that the resulting tracks are described by the sentential query. While most earlier work was limited to single-word queries which correspond to either verbs or nouns, we search for complex queries which contain multiple phrases, such as prepositional phrases, and modifiers, such as adverbs. We demonstrate this approach by searching for 2,627 naturally elicited sentential queries in 10 Hollywood movies.

  19. Syntactic dependency parsers for biomedical-NLP.

    PubMed

    Cohen, Raphael; Elhadad, Michael

    2012-01-01

    Syntactic parsers have made a leap in accuracy and speed in recent years. The high-order structural information provided by dependency parsers is useful for a variety of NLP applications. We present a biomedical model for the EasyFirst parser, a fast and accurate parser for creating Stanford Dependencies. We evaluate the models trained in the biomedical domain for EasyFirst and Clear-Parser on a number of task-oriented metrics. Both parsers provide state-of-the-art speed and accuracy on the GENIA corpus, at over 89%. We show that Clear-Parser excels at tasks relating to negation identification while EasyFirst excels at tasks relating to named entities and is more robust to changes in domain.

  20. Development of clinical contents model markup language for electronic health records.

    PubMed

    Yun, Ji-Hyun; Ahn, Sun-Ju; Kim, Yoon

    2012-09-01

    To develop dedicated markup language for clinical contents models (CCM) to facilitate the active use of CCM in electronic health record systems. Based on analysis of the structure and characteristics of CCM in the clinical domain, we designed extensible markup language (XML) based CCM markup language (CCML) schema manually. CCML faithfully reflects CCM in both the syntactic and semantic aspects. As this language is based on XML, it can be expressed and processed in computer systems and can be used in a technology-neutral way. CCML has the following strengths: it is machine-readable and highly human-readable, it does not require a dedicated parser, and it can be applied for existing electronic health record systems.

  1. Towards comprehensive syntactic and semantic annotations of the clinical narrative

    PubMed Central

    Albright, Daniel; Lanfranchi, Arrick; Fredriksen, Anwen; Styler, William F; Warner, Colin; Hwang, Jena D; Choi, Jinho D; Dligach, Dmitriy; Nielsen, Rodney D; Martin, James; Ward, Wayne; Palmer, Martha; Savova, Guergana K

    2013-01-01

    Objective To create annotated clinical narratives with layers of syntactic and semantic labels to facilitate advances in clinical natural language processing (NLP). To develop NLP algorithms and open source components. Methods Manual annotation of a clinical narrative corpus of 127 606 tokens following the Treebank schema for syntactic information, PropBank schema for predicate-argument structures, and the Unified Medical Language System (UMLS) schema for semantic information. NLP components were developed. Results The final corpus consists of 13 091 sentences containing 1772 distinct predicate lemmas. Of the 766 newly created PropBank frames, 74 are verbs. There are 28 539 named entity (NE) annotations spread over 15 UMLS semantic groups, one UMLS semantic type, and the Person semantic category. The most frequent annotations belong to the UMLS semantic groups of Procedures (15.71%), Disorders (14.74%), Concepts and Ideas (15.10%), Anatomy (12.80%), Chemicals and Drugs (7.49%), and the UMLS semantic type of Sign or Symptom (12.46%). Inter-annotator agreement results: Treebank (0.926), PropBank (0.891–0.931), NE (0.697–0.750). The part-of-speech tagger, constituency parser, dependency parser, and semantic role labeler are built from the corpus and released open source. A significant limitation uncovered by this project is the need for the NLP community to develop a widely agreed-upon schema for the annotation of clinical concepts and their relations. Conclusions This project takes a foundational step towards bringing the field of clinical NLP up to par with NLP in the general domain. The corpus creation and NLP components provide a resource for research and application development that would have been previously impossible. PMID:23355458

  2. Semantic Role Labeling of Clinical Text: Comparing Syntactic Parsers and Features

    PubMed Central

    Zhang, Yaoyun; Jiang, Min; Wang, Jingqi; Xu, Hua

    2016-01-01

    Semantic role labeling (SRL), which extracts shallow semantic relation representation from different surface textual forms of free text sentences, is important for understanding clinical narratives. Since semantic roles are formed by syntactic constituents in the sentence, an effective parser, as well as an effective syntactic feature set are essential to build a practical SRL system. Our study initiates a formal evaluation and comparison of SRL performance on a clinical text corpus MiPACQ, using three state-of-the-art parsers, the Stanford parser, the Berkeley parser, and the Charniak parser. First, the original parsers trained on the open domain syntactic corpus Penn Treebank were employed. Next, those parsers were retrained on the clinical Treebank of MiPACQ for further comparison. Additionally, state-of-the-art syntactic features from open domain SRL were also examined for clinical text. Experimental results showed that retraining the parsers on clinical Treebank improved the performance significantly, with an optimal F1 measure of 71.41% achieved by the Berkeley parser. PMID:28269926

  3. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications

    PubMed Central

    Masanz, James J; Ogren, Philip V; Zheng, Jiaping; Sohn, Sunghwan; Kipper-Schuler, Karin C; Chute, Christopher G

    2010-01-01

    We aim to build and evaluate an open-source natural language processing system for information extraction from electronic medical record clinical free-text. We describe and evaluate our system, the clinical Text Analysis and Knowledge Extraction System (cTAKES), released open-source at http://www.ohnlp.org. The cTAKES builds on existing open-source technologies—the Unstructured Information Management Architecture framework and OpenNLP natural language processing toolkit. Its components, specifically trained for the clinical domain, create rich linguistic and semantic annotations. Performance of individual components: sentence boundary detector accuracy=0.949; tokenizer accuracy=0.949; part-of-speech tagger accuracy=0.936; shallow parser F-score=0.924; named entity recognizer and system-level evaluation F-score=0.715 for exact and 0.824 for overlapping spans, and accuracy for concept mapping, negation, and status attributes for exact and overlapping spans of 0.957, 0.943, 0.859, and 0.580, 0.939, and 0.839, respectively. Overall performance is discussed against five applications. The cTAKES annotations are the foundation for methods and modules for higher-level semantic processing of clinical free-text. PMID:20819853

  4. Development of Clinical Contents Model Markup Language for Electronic Health Records

    PubMed Central

    Yun, Ji-Hyun; Kim, Yoon

    2012-01-01

    Objectives To develop dedicated markup language for clinical contents models (CCM) to facilitate the active use of CCM in electronic health record systems. Methods Based on analysis of the structure and characteristics of CCM in the clinical domain, we designed extensible markup language (XML) based CCM markup language (CCML) schema manually. Results CCML faithfully reflects CCM in both the syntactic and semantic aspects. As this language is based on XML, it can be expressed and processed in computer systems and can be used in a technology-neutral way. Conclusions CCML has the following strengths: it is machine-readable and highly human-readable, it does not require a dedicated parser, and it can be applied for existing electronic health record systems. PMID:23115739

  5. The neurobiology of syntax: beyond string sets.

    PubMed

    Petersson, Karl Magnus; Hagoort, Peter

    2012-07-19

    The human capacity to acquire language is an outstanding scientific challenge to understand. Somehow our language capacities arise from the way the human brain processes, develops and learns in interaction with its environment. To set the stage, we begin with a summary of what is known about the neural organization of language and what our artificial grammar learning (AGL) studies have revealed. We then review the Chomsky hierarchy in the context of the theory of computation and formal learning theory. Finally, we outline a neurobiological model of language acquisition and processing based on an adaptive, recurrent, spiking network architecture. This architecture implements an asynchronous, event-driven, parallel system for recursive processing. We conclude that the brain represents grammars (or more precisely, the parser/generator) in its connectivity, and its ability for syntax is based on neurobiological infrastructure for structured sequence processing. The acquisition of this ability is accounted for in an adaptive dynamical systems framework. Artificial language learning (ALL) paradigms might be used to study the acquisition process within such a framework, as well as the processing properties of the underlying neurobiological infrastructure. However, it is necessary to combine and constrain the interpretation of ALL results by theoretical models and empirical studies on natural language processing. Given that the faculty of language is captured by classical computational models to a significant extent, and that these can be embedded in dynamic network architectures, there is hope that significant progress can be made in understanding the neurobiology of the language faculty.

  6. The neurobiology of syntax: beyond string sets

    PubMed Central

    Petersson, Karl Magnus; Hagoort, Peter

    2012-01-01

    The human capacity to acquire language is an outstanding scientific challenge to understand. Somehow our language capacities arise from the way the human brain processes, develops and learns in interaction with its environment. To set the stage, we begin with a summary of what is known about the neural organization of language and what our artificial grammar learning (AGL) studies have revealed. We then review the Chomsky hierarchy in the context of the theory of computation and formal learning theory. Finally, we outline a neurobiological model of language acquisition and processing based on an adaptive, recurrent, spiking network architecture. This architecture implements an asynchronous, event-driven, parallel system for recursive processing. We conclude that the brain represents grammars (or more precisely, the parser/generator) in its connectivity, and its ability for syntax is based on neurobiological infrastructure for structured sequence processing. The acquisition of this ability is accounted for in an adaptive dynamical systems framework. Artificial language learning (ALL) paradigms might be used to study the acquisition process within such a framework, as well as the processing properties of the underlying neurobiological infrastructure. However, it is necessary to combine and constrain the interpretation of ALL results by theoretical models and empirical studies on natural language processing. Given that the faculty of language is captured by classical computational models to a significant extent, and that these can be embedded in dynamic network architectures, there is hope that significant progress can be made in understanding the neurobiology of the language faculty. PMID:22688633

  7. An error-resistant linguistic protocol for air traffic control

    NASA Technical Reports Server (NTRS)

    Cushing, Steven

    1989-01-01

    The research results described here are intended to enhance the effectiveness of the DATALINK interface that is scheduled by the Federal Aviation Administration (FAA) to be deployed during the 1990's to improve the safety of various aspects of aviation. While voice has a natural appeal as the preferred means of communication both among humans and between humans and machines, being the form of communication that people find most convenient, the complexity and flexibility of natural language are problematic because of the confusions and misunderstandings that can arise from ambiguity, unclear reference, intonation peculiarities, implicit inference, and presupposition. The DATALINK interface will avoid many of these problems by replacing voice with vision and speech with written instructions. This report describes results achieved to date in an ongoing research effort to refine the protocol of the DATALINK system so as to avoid many of the linguistic problems that still remain in the visual mode. In particular, a working prototype DATALINK simulator system has been developed, consisting of an unambiguous, context-free grammar and parser, based on the current air-traffic-control language and incorporated into a visual display involving simulated touch-screen buttons and three levels of menu screens. The system is written in the C programming language and runs on the Macintosh II computer. After reviewing work already done on the project, new tasks for further development are described.

  8. Exploiting multiple sources of information in learning an artificial language: human data and modeling.

    PubMed

    Perruchet, Pierre; Tillmann, Barbara

    2010-03-01

    This study investigates the joint influences of three factors on the discovery of new word-like units in a continuous artificial speech stream: the statistical structure of the ongoing input, the initial word-likeness of parts of the speech flow, and the contextual information provided by the earlier emergence of other word-like units. Results of an experiment conducted with adult participants show that these sources of information have strong and interactive influences on word discovery. The authors then examine the ability of different models of word segmentation to account for these results. PARSER (Perruchet & Vinter, 1998) is compared to the view that word segmentation relies on the exploitation of transitional probabilities between successive syllables, and with the models based on the Minimum Description Length principle, such as INCDROP. The authors submit arguments suggesting that PARSER has the advantage of accounting for the whole pattern of data without ad-hoc modifications, while relying exclusively on general-purpose learning principles. This study strengthens the growing notion that nonspecific cognitive processes, mainly based on associative learning and memory principles, are able to account for a larger part of early language acquisition than previously assumed. Copyright © 2009 Cognitive Science Society, Inc.

  9. Lexical and sublexical units in speech perception.

    PubMed

    Giroux, Ibrahima; Rey, Arnaud

    2009-03-01

    Saffran, Newport, and Aslin (1996a) found that human infants are sensitive to statistical regularities corresponding to lexical units when hearing an artificial spoken language. Two sorts of segmentation strategies have been proposed to account for this early word-segmentation ability: bracketing strategies, in which infants are assumed to insert boundaries into continuous speech, and clustering strategies, in which infants are assumed to group certain speech sequences together into units (Swingley, 2005). In the present study, we test the predictions of two computational models instantiating each of these strategies (Simple Recurrent Networks: Elman, 1990; and PARSER: Perruchet & Vinter, 1998) in an experiment where we compare the lexical and sublexical recognition performance of adults after hearing 2 or 10 min of an artificial spoken language. The results are consistent with PARSER's predictions and the clustering approach, showing that performance on words is better than performance on part-words only after 10 min. This result suggests that word segmentation abilities are not merely due to stronger associations between sublexical units but to the emergence of stronger lexical representations during the development of speech perception processes. Copyright © 2009, Cognitive Science Society, Inc.
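
    The transitional-probability (bracketing) account that PARSER is contrasted with can be sketched as follows (an illustrative Python fragment; PARSER itself instead builds and strengthens chunk representations). A low transitional probability between adjacent syllables signals a likely word boundary:

```python
from collections import Counter

def transitional_probs(syllables):
    """Estimate P(next | current) for each adjacent syllable pair in a stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(a, b): c / first_counts[a] for (a, b), c in pair_counts.items()}
```

    In a stream built from a repeated word "ba bi bu", within-word transitions such as ba→bi have probability 1.0, while transitions at word edges are lower; bracketing models place boundaries at these dips.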

  10. Storing files in a parallel computing system based on user-specified parser function

    DOEpatents

    Faibish, Sorin; Bent, John M; Tzelnic, Percy; Grider, Gary; Manzanares, Adam; Torres, Aaron

    2014-10-21

    Techniques are provided for storing files in a parallel computing system based on a user-specified parser function. A plurality of files generated by a distributed application in a parallel computing system are stored by obtaining a parser from the distributed application for processing the plurality of files prior to storage; and storing one or more of the plurality of files in one or more storage nodes of the parallel computing system based on the processing by the parser. The plurality of files comprise one or more of a plurality of complete files and a plurality of sub-files. The parser can optionally store only those files that satisfy one or more semantic requirements of the parser. The parser can also extract metadata from one or more of the files and the extracted metadata can be stored with one or more of the plurality of files and used for searching for files.
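
    The scheme described can be sketched abstractly as follows (a hedged Python illustration; all names are invented, not the patent's API). A user-supplied parser both filters files against semantic requirements and extracts searchable metadata:

```python
def store_files(files, parser):
    """files: {name: bytes}. The parser returns a metadata dict, or None to reject."""
    stored = []
    for name, data in files.items():
        metadata = parser(name, data)
        if metadata is None:        # file fails the parser's semantic requirements
            continue
        stored.append({"name": name, "data": data, "metadata": metadata})
    return stored

def header_parser(name, data):
    """Example parser: accept only files with a MAGIC header; record their size."""
    return {"size": len(data)} if data.startswith(b"MAGIC") else None
```

    The extracted metadata is what would later support searching for files without re-reading their contents.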

  11. Linking Semantic and Knowledge Representations in a Multi-Domain Dialogue System

    DTIC Science & Technology

    2007-06-01

    [Full-text excerpt garbled in indexing. The recoverable fragments mention an accuracy evaluation showing that the generic version of the grammar performs similarly well on two evaluation domains, the insertion of discourse adverbials such as "now" when present in the lattice, and an automatic lexicon specialization technique that improves parser speed and accuracy. No coherent abstract is recoverable.]

  12. Using a CLIPS expert system to automatically manage TCP/IP networks and their components

    NASA Technical Reports Server (NTRS)

    Faul, Ben M.

    1991-01-01

    An expert system that can directly manage network components on a Transmission Control Protocol/Internet Protocol (TCP/IP) network is described. Previous expert systems for managing networks have focused on managing network faults after they occur. However, this proactive expert system can monitor and control network components in near real time. The ability to directly manage network elements from the C Language Integrated Production System (CLIPS) is accomplished by the integration of the Simple Network Management Protocol (SNMP) and an Abstract Syntax Notation (ASN) parser into the CLIPS artificial intelligence language.

  13. Identifying the null subject: evidence from event-related brain potentials.

    PubMed

    Demestre, J; Meltzer, S; García-Albea, J E; Vigil, A

    1999-05-01

    Event-related brain potentials (ERPs) were recorded during spoken language comprehension to study the on-line effects of gender agreement violations in controlled infinitival complements. Spanish sentences were constructed in which the complement clause contained a predicate adjective marked for syntactic gender. By manipulating the gender of the antecedent (i.e., the controller) of the implicit subject while holding constant the gender of the adjective, pairs of grammatical and ungrammatical sentences were created. The detection of such a gender agreement violation would indicate that the parser had established the coreference relation between the null subject and its antecedent. The results showed a complex biphasic ERP (i.e., an early negativity with prominence at anterior and central sites, followed by a centroparietal positivity) in the violating condition as compared to the non-violating conditions. The brain reacts to NP-adjective gender agreement violations within a few hundred milliseconds of their occurrence. The data imply that the parser has properly coindexed the null subject of an infinitive clause with its antecedent.

  14. Speed up of XML parsers with PHP language implementation

    NASA Astrophysics Data System (ADS)

    Georgiev, Bozhidar; Georgieva, Adriana

    2012-11-01

    In this paper, the authors introduce PHP5's XML implementation and show how to read, parse, and write a short, uncomplicated XML file using SimpleXML in a PHP environment. The possibilities for the interplay of the PHP5 language and the XML standard are described, and the details of the parsing process with SimpleXML are clarified. A practical PHP-XML-MySQL project demonstrates the advantages of implementing XML in PHP modules. This approach allows comparatively simple searching of hierarchical XML data by means of PHP software tools. The proposed project includes a database, which can be extended with new data and new XML parsing functions.
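    The read-parse-write workflow described above can be sketched in miniature. As a hedged illustration (in Python with the stdlib xml.etree.ElementTree standing in for PHP's SimpleXML; the document contents are invented):

```python
import xml.etree.ElementTree as ET

# A small, invented XML document, analogous to the short files handled with SimpleXML.
doc = """
<library>
  <book id="1"><title>XML Basics</title><year>2010</year></book>
  <book id="2"><title>PHP and XML</title><year>2012</year></book>
</library>
"""

root = ET.fromstring(doc)

# Read: collect (id, title) pairs by walking the hierarchy.
books = [(b.get("id"), b.findtext("title")) for b in root.iter("book")]

# Write: add a new element and serialize the tree back to a string.
new = ET.SubElement(root, "book", id="3")
ET.SubElement(new, "title").text = "MySQL Integration"
serialized = ET.tostring(root, encoding="unicode")
```

    The same three steps (load, navigate, emit) are what SimpleXML provides in PHP, which is what makes hierarchical XML data searchable from ordinary application code.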

  15. GBParsy: a GenBank flatfile parser library with high speed.

    PubMed

    Lee, Tae-Ho; Kim, Yeon-Ki; Nahm, Baek Hie

    2008-07-25

    The GenBank flatfile (GBF) format is one of the most popular sequence file formats because of its detailed sequence features and its readability. To use the data in such a file by computer, a parsing process is required, performed according to a given grammar for the sequence and the description in a GBF. Several parser libraries for the GBF have been developed. However, with the accumulation of DNA sequence information from eukaryotic chromosomes, parsing a eukaryotic genome sequence with these libraries inevitably takes a long time, due to the large GBF file and its correspondingly large genomic nucleotide sequence and related feature information. Thus, there is a significant need for a parsing program with high speed and efficient use of system memory. We developed GBParsy, a C-based library that parses GBF files. Parsing speed was maximized by using content-specified functions in place of regular expressions, which are flexible but slow. In addition, we optimized an algorithm related to memory usage, which further increased parsing performance and memory efficiency. GBParsy is at least 5-100x faster than current parsers in benchmark tests and is estimated to extract the annotated information from almost 100 Mb of a GenBank flatfile of chromosomal sequence information within a second. Thus, it should be useful for applications such as on-the-fly visualization of a genome at a web site.
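    The core idea of replacing regular expressions with content-specified functions can be shown with a toy sketch (in Python, not GBParsy's actual C; the field layout is simplified and the function names are invented). Each line type is dispatched to a dedicated handler keyed on its keyword, rather than matched against flexible but slower regular expressions:

```python
# Hypothetical sketch of regex-free, content-specified GBF parsing.

def parse_locus(line):
    # LOCUS lines have fixed, whitespace-separated fields.
    fields = line.split()
    return {"name": fields[1], "length": int(fields[2])}

def parse_definition(line):
    # DEFINITION lines carry free text after the keyword.
    return {"definition": line[len("DEFINITION"):].strip()}

HANDLERS = {"LOCUS": parse_locus, "DEFINITION": parse_definition}

def parse_gbf(text):
    # Dispatch each line to its content-specified handler, if any.
    record = {}
    for line in text.splitlines():
        keyword = line.split(" ", 1)[0]
        handler = HANDLERS.get(keyword)
        if handler:
            record.update(handler(line))
    return record
```

    Because every handler knows the exact layout of its line type, there is no backtracking and no pattern compilation, which is the kind of saving that matters when the input is a chromosome-scale flatfile.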

  16. L3 Interactive Data Language

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hohn, Michael; Adams, Paul

    2006-09-05

    The L3 system is a computational steering environment for image processing and scientific computing. It consists of an interactive graphical language and interface. Its purpose is to help advanced users control their computational software and to assist in the management of data accumulated during numerical experiments. L3 provides a combination of features not found in other environments: textual and graphical construction of programs; persistence of programs and associated data; direct mapping between the scripts, the parameters, and the produced data; implicit hierarchical data organization; full programmability, including conditionals and functions; and incremental execution of programs. The software includes the l3 language and the graphical environment. The language is a single-assignment functional language; the implementation consists of a lexer, parser, interpreter, storage handler, and editing support. The graphical environment is an event-driven nested list viewer/editor providing graphical elements corresponding to the language. These elements are both the representation of a user's program and active interfaces to the values computed by that program.

  17. An automatic indexing method for medical documents.

    PubMed Central

    Wagner, M. M.

    1991-01-01

    This paper describes MetaIndex, an automatic indexing program that creates symbolic representations of documents for the purpose of document retrieval. MetaIndex uses a simple transition network parser to recognize a language that is derived from the set of main concepts in the Unified Medical Language System Metathesaurus (Meta-1). MetaIndex uses a hierarchy of medical concepts, also derived from Meta-1, to represent the content of documents. The goal of this approach is to improve document retrieval performance by better representation of documents. An evaluation method is described, and the performance of MetaIndex on the task of indexing the Slice of Life medical image collection is reported. PMID:1807564

  18. A Protocol for Annotating Parser Differences. Research Report. ETS RR-16-02

    ERIC Educational Resources Information Center

    Bruno, James V.; Cahill, Aoife; Gyawali, Binod

    2016-01-01

    We present an annotation scheme for classifying differences in the outputs of syntactic constituency parsers when a gold standard is unavailable or undesired, as in the case of texts written by nonnative speakers of English. We discuss its automated implementation and the results of a case study that uses the scheme to choose a parser best suited…

  19. Processing of ICARTT Data Files Using Fuzzy Matching and Parser Combinators

    NASA Technical Reports Server (NTRS)

    Rutherford, Matthew T.; Typanski, Nathan D.; Wang, Dali; Chen, Gao

    2014-01-01

    In this paper, the task of parsing and matching inconsistent, poorly formed text data through the use of parser combinators and fuzzy matching is discussed. An object-oriented implementation of the parser combinator technique is used to provide a relatively simple interface for adapting base parsers. For matching tasks, a fuzzy matching algorithm with Levenshtein distance calculations is implemented to match string pairs that are otherwise difficult to match due to the aforementioned irregularities and errors in one or both pair members. Used in concert, the two techniques allow parsing and matching operations to be performed that had previously been done only manually.
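    Both techniques are standard and easy to sketch. The minimal Python versions below (not the authors' implementation) show a base parser plus one combinator, and a Levenshtein-based fuzzy matcher:

```python
def literal(s):
    # Base parser: consume the exact string s from the front of the input.
    def parse(text):
        return (s, text[len(s):]) if text.startswith(s) else None
    return parse

def seq(p, q):
    # Combinator: run p, then q on the remaining input; fail if either fails.
    def parse(text):
        first = p(text)
        if first is None:
            return None
        v1, rest = first
        second = q(rest)
        if second is None:
            return None
        v2, rest = second
        return ((v1, v2), rest)
    return parse

def levenshtein(a, b):
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def fuzzy_match(query, candidates, max_dist=2):
    # Return the closest candidate within max_dist edits, else None.
    dist, best = min((levenshtein(query, c), c) for c in candidates)
    return best if dist <= max_dist else None
```

    In a pipeline like the one described, combinators such as seq are composed into parsers for each field of the ICARTT header, while fuzzy_match reconciles the inconsistently spelled identifiers that exact comparison would reject.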

  20. Using Medical Text Extraction, Reasoning and Mapping System (MTERMS) to Process Medication Information in Outpatient Clinical Notes

    PubMed Central

    Zhou, Li; Plasek, Joseph M; Mahoney, Lisa M; Karipineni, Neelima; Chang, Frank; Yan, Xuemin; Chang, Fenny; Dimaggio, Dana; Goldman, Debora S.; Rocha, Roberto A.

    2011-01-01

    Clinical information is often coded using different terminologies and is therefore not interoperable. Our goal is to develop a general natural language processing (NLP) system, called the Medical Text Extraction, Reasoning and Mapping System (MTERMS), which encodes clinical text using different terminologies and simultaneously establishes dynamic mappings between them. MTERMS applies a modular pipeline approach, flowing from a preprocessor, semantic tagger, terminology mapper, and context analyzer to a parser that structures input clinical notes. Evaluators manually reviewed 30 free-text and 10 structured outpatient clinical notes and compared them to MTERMS output. MTERMS achieved overall F-measures of 90.6 and 94.0 for medication and temporal information in free-text and structured notes, respectively. The local medication terminology had 83.0% coverage, compared to RxNorm's 98.0% coverage, for free-text notes. 61.6% of the mappings between the terminologies were exact matches. Capture of duration was significantly improved (91.7% vs. 52.5%) over systems in the third i2b2 challenge. PMID:22195230

  1. Multi-lingual search engine to access PubMed monolingual subsets: a feasibility study.

    PubMed

    Darmoni, Stéfan J; Soualmia, Lina F; Griffon, Nicolas; Grosjean, Julien; Kerdelhué, Gaétan; Kergourlay, Ivan; Dahamna, Badisse

    2013-01-01

    PubMed contains many articles in languages other than English, but it is difficult to find them using the English version of the Medical Subject Headings (MeSH) thesaurus. The aim of this work is to propose a tool allowing access to a PubMed subset in one language, and to evaluate its performance. Translations of MeSH were enriched and gathered in the information system. PubMed subsets in the main European languages were also added to our database, using a dedicated parser. The CISMeF generic semantic search engine was evaluated on the response time for simple queries. MeSH descriptors are currently available in 11 languages in the information system. All 654,000 PubMed citations in French were integrated into the CISMeF database. None of the response times exceeded the threshold defined for usability (2 seconds). It is now possible to freely access the biomedical literature in French using a tool in French; health professionals and lay people with low English proficiency may find it useful. The tool will be extended to several other European languages: German, Spanish, Norwegian and Portuguese.

  2. PDB file parser and structure class implemented in Python.

    PubMed

    Hamelryck, Thomas; Manderick, Bernard

    2003-11-22

    The biopython project provides a set of bioinformatics tools implemented in Python. Recently, biopython was extended with a set of modules that deal with macromolecular structure. Biopython now contains a parser for PDB files that makes the atomic information available in an easy-to-use but powerful data structure. The parser and data structure deal with features that are often left out or handled inadequately by other packages, e.g. atom and residue disorder (if point mutants are present in the crystal), anisotropic B factors, multiple models and insertion codes. In addition, the parser performs some sanity checking to detect obvious errors. The Biopython distribution (including source code and documentation) is freely available (under the Biopython license) from http://www.biopython.org

  3. jmzML, an open-source Java API for mzML, the PSI standard for MS data.

    PubMed

    Côté, Richard G; Reisinger, Florian; Martens, Lennart

    2010-04-01

    We here present jmzML, a Java API for the Proteomics Standards Initiative mzML data standard. Based on the Java Architecture for XML Binding and an XPath-based, random-access XML indexer, jmzML can handle arbitrarily large files in minimal memory, allowing easy and efficient processing of mzML files using the Java programming language. jmzML also automatically resolves internal XML references on the fly. The library (which includes a viewer) can be downloaded from http://jmzml.googlecode.com.

  4. Modular implementation of a digital hardware design automation system

    NASA Astrophysics Data System (ADS)

    Masud, M.

    An automation system based on AHPL (A Hardware Programming Language) was developed. The project may be divided into three distinct phases: (1) upgrading AHPL to make it more universally applicable; (2) implementing a compiler for the language; and (3) illustrating how the compiler may be used to support several phases of design activity. Several new features were added to AHPL, including application-dependent parameters, multiple clocks, asynchronous results, functional registers, and primitive functions. The new language, called Universal AHPL, has been defined rigorously. The compiler design is modular. The parsing is done by an automatic parser generated from the SLR(1) BNF grammar of the language. The compiler produces two databases from the AHPL description of a circuit: the first is a tabular representation of the circuit, and the second is a detailed interconnection linked list. The two databases provide a means to interface the compiler to application-dependent CAD systems.

  5. GazeParser: an open-source and multiplatform library for low-cost eye tracking and analysis.

    PubMed

    Sogo, Hiroyuki

    2013-09-01

    Eye movement analysis is an effective method for research on visual perception and cognition. However, recordings of eye movements present practical difficulties related to the cost of the recording devices and the programming of device controls for use in experiments. GazeParser is an open-source library for low-cost eye tracking and data analysis; it consists of a video-based eyetracker and libraries for data recording and analysis. The libraries are written in Python and can be used in conjunction with PsychoPy and VisionEgg experimental control libraries. Three eye movement experiments are reported on performance tests of GazeParser. These showed that the means and standard deviations for errors in sampling intervals were less than 1 ms. Spatial accuracy ranged from 0.7° to 1.2°, depending on participant. In gap/overlap tasks and antisaccade tasks, the latency and amplitude of the saccades detected by GazeParser agreed with those detected by a commercial eyetracker. These results showed that the GazeParser demonstrates adequate performance for use in psychological experiments.

  6. ACPYPE - AnteChamber PYthon Parser interfacE.

    PubMed

    Sousa da Silva, Alan W; Vranken, Wim F

    2012-07-23

    ACPYPE (or AnteChamber PYthon Parser interfacE) is a wrapper script around the ANTECHAMBER software that simplifies the generation of small molecule topologies and parameters for a variety of molecular dynamics programmes like GROMACS, CHARMM and CNS. It is written in the Python programming language and was developed as a tool for interfacing with other Python based applications such as the CCPN software suite (for NMR data analysis) and ARIA (for structure calculations from NMR data). ACPYPE is open source code, under GNU GPL v3, and is available as a stand-alone application at http://www.ccpn.ac.uk/acpype and as a web portal application at http://webapps.ccpn.ac.uk/acpype. We verified the topologies generated by ACPYPE in three ways: by comparing with default AMBER topologies for standard amino acids; by generating and verifying topologies for a large set of ligands from the PDB; and by recalculating the structures for 5 protein-ligand complexes from the PDB. ACPYPE is a tool that simplifies the automatic generation of topology and parameters in different formats for different molecular mechanics programmes, including calculation of partial charges, while being object oriented for integration with other applications.

  7. Expressions Module for the Satellite Orbit Analysis Program

    NASA Technical Reports Server (NTRS)

    Edmonds, Karina

    2008-01-01

    The Expressions Module is a software module that has been incorporated into the Satellite Orbit Analysis Program (SOAP). The module includes an expressions-parser submodule built on top of an analytical system, enabling the user to define logical and numerical variables and constants. The variables can capture output from SOAP orbital-prediction and geometric-engine computations. The module can combine variables and constants with built-in logical operators (such as Boolean AND, OR, and NOT), relational operators (such as >, <, or =), and mathematical operators (such as addition, subtraction, multiplication, division, modulus, exponentiation, differentiation, and integration). Parentheses can be used to specify precedence of operations. The module contains a library of mathematical functions and operations, including logarithms, trigonometric functions, Bessel functions, minimum/maximum operations, and floating-point-to-integer conversions. The module supports combinations of time, distance, and angular units and has a dimensional-analysis component that checks for correct usage of units. A parser based on the Flex language and the Bison program looks for and indicates errors in syntax. SOAP expressions can be built using other expressions as arguments, thus enabling the user to build analytical trees. A graphical user interface facilitates use.
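    The precedence and parenthesization behavior described above is what a recursive-descent grammar gives naturally. The toy sketch below (Python; the SOAP module itself uses a Flex/Bison-generated parser, and this grammar covers only numbers, the four arithmetic operators, and parentheses) illustrates the idea:

```python
import re

def tokenize(s):
    # Split into numbers and single-character operators/parentheses.
    return re.findall(r"\d+\.?\d*|[-+*/()]", s)

def parse_expr(tokens):
    # expr := term (('+'|'-') term)*   -- lowest precedence
    value = parse_term(tokens)
    while tokens and tokens[0] in "+-":
        op = tokens.pop(0)
        rhs = parse_term(tokens)
        value = value + rhs if op == "+" else value - rhs
    return value

def parse_term(tokens):
    # term := factor (('*'|'/') factor)*   -- binds tighter than +/-
    value = parse_factor(tokens)
    while tokens and tokens[0] in "*/":
        op = tokens.pop(0)
        rhs = parse_factor(tokens)
        value = value * rhs if op == "*" else value / rhs
    return value

def parse_factor(tokens):
    # factor := NUMBER | '(' expr ')'   -- parentheses override precedence
    tok = tokens.pop(0)
    if tok == "(":
        value = parse_expr(tokens)
        tokens.pop(0)  # consume ')'
        return value
    return float(tok)

def evaluate(text):
    return parse_expr(tokenize(text))
```

    Each grammar rule lives one precedence level below the rule that calls it, so "2+3*4" groups the multiplication first while "(2+3)*4" forces the addition.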

  8. Light at Night Markup Language (LANML): XML Technology for Light at Night Monitoring Data

    NASA Astrophysics Data System (ADS)

    Craine, B. L.; Craine, E. R.; Craine, E. M.; Crawford, D. L.

    2013-05-01

    Light at Night Markup Language (LANML) is a standard, based upon XML, useful in acquiring, validating, transporting, archiving and analyzing multi-dimensional light at night (LAN) datasets of any size. The LANML standard can accommodate a variety of measurement scenarios including single spot measures, static time-series, web based monitoring networks, mobile measurements, and airborne measurements. LANML is human-readable, machine-readable, and does not require a dedicated parser. In addition, LANML is flexible, ensuring that future extensions of the format will remain backward compatible with analysis software. The XML technology is at the heart of communicating over the internet and can be equally useful at the desktop level, making this standard particularly attractive for web based applications, educational outreach and efficient collaboration between research groups.

  9. The Importance of Reading Naturally: Evidence from Combined Recordings of Eye Movements and Electric Brain Potentials

    ERIC Educational Resources Information Center

    Metzner, Paul; von der Malsburg, Titus; Vasishth, Shravan; Rösler, Frank

    2017-01-01

    How important is the ability to freely control eye movements for reading comprehension? And how does the parser make use of this freedom? We investigated these questions using coregistration of eye movements and event-related brain potentials (ERPs) while participants read either freely or in a computer-controlled word-by-word format (also known…

  10. Is human sentence parsing serial or parallel? Evidence from event-related brain potentials.

    PubMed

    Hopf, Jens-Max; Bader, Markus; Meng, Michael; Bayer, Josef

    2003-01-01

    In this ERP study we investigate the processes that occur in syntactically ambiguous German sentences at the point of disambiguation. Whereas most psycholinguistic theories agree on the view that processing difficulties arise when parsing preferences are disconfirmed (so-called garden-path effects), important differences exist with respect to theoretical assumptions about the parser's recovery from a misparse. A key distinction can be made between parsers that compute all alternative syntactic structures in parallel (parallel parsers) and parsers that compute only a single preferred analysis (serial parsers). To distinguish empirically between parallel and serial parsing models, we compare ERP responses to garden-path sentences with ERP responses to truly ungrammatical sentences. Garden-path sentences contain a temporary and ultimately curable ungrammaticality, whereas truly ungrammatical sentences remain so permanently--a difference which gives rise to different predictions in the two classes of parsing architectures. At the disambiguating word, ERPs in both sentence types show negative shifts of similar onset latency, amplitude, and scalp distribution in an initial time window between 300 and 500 ms. In a following time window (500-700 ms), the negative shift to garden-path sentences disappears at right central parietal sites, while it continues in permanently ungrammatical sentences. These data are taken as evidence for a strictly serial parser. The absence of a difference in the early time window indicates that temporary and permanent ungrammaticalities trigger the same kind of parsing responses. Later differences can be related to successful reanalysis in garden-path but not in ungrammatical sentences. Copyright 2003 Elsevier Science B.V.

  11. A person is not a number: discourse involvement in subject-verb agreement computation.

    PubMed

    Mancini, Simona; Molinaro, Nicola; Rizzi, Luigi; Carreiras, Manuel

    2011-09-02

    Agreement is a very important mechanism for language processing. Mainstream psycholinguistic research on subject-verb agreement processing has emphasized the purely formal and encapsulated nature of this phenomenon, positing an equivalent access to person and number features. However, person and number are intrinsically different, because person conveys extra-syntactic information concerning the participants in the speech act. To test the person-number dissociation hypothesis we investigated the neural correlates of subject-verb agreement in Spanish, using person and number violations. While number agreement violations produced a left-anterior negativity followed by a P600 with a posterior distribution, the negativity elicited by person anomalies had a centro-posterior maximum and was followed by a P600 effect that was frontally distributed in the early phase and posteriorly distributed in the late phase. These data reveal that the parser is differentially sensitive to the two features and that it deals with the two anomalies by adopting different strategies, due to the different levels of analysis affected by the person and number violations. Copyright © 2011 Elsevier B.V. All rights reserved.

  12. The CMS DBS query language

    NASA Astrophysics Data System (ADS)

    Kuznetsov, Valentin; Riley, Daniel; Afaq, Anzar; Sekhri, Vijay; Guo, Yuyi; Lueking, Lee

    2010-04-01

    The CMS experiment has implemented a flexible and powerful system enabling users to find data within the CMS physics data catalog. The Dataset Bookkeeping Service (DBS) comprises a database and the services used to store and access metadata related to CMS physics data. To this, we have added a generalized query system alongside the existing web and programmatic interfaces to the DBS. This query system is based on a query language that hides the complexity of the underlying database structure by discovering the join conditions between database tables. This provides a way of querying the system that is simple and straightforward for CMS data managers and physicists to use, without requiring knowledge of the database tables or keys. The DBS Query Language uses the ANTLR tool to build the input query parser and tokenizer, followed by a query builder that uses a graph representation of the DBS schema to construct the SQL query sent to the underlying database. We describe the design of the query system, provide details of the language components, and give an overview of how this component fits into the overall data-discovery system architecture.
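    The join-discovery step can be pictured as a shortest-path search over the schema graph. The sketch below (Python; the table names and join conditions are invented for illustration, and the real DBS implementation differs) finds a chain of join conditions between two tables with breadth-first search:

```python
from collections import deque

# Schema graph: nodes are tables, edges carry the join condition.
# These names are hypothetical, not the actual DBS schema.
SCHEMA = {
    "dataset": {"block": "dataset.id = block.dataset_id"},
    "block":   {"dataset": "dataset.id = block.dataset_id",
                "file":    "block.id = file.block_id"},
    "file":    {"block":   "block.id = file.block_id"},
}

def join_path(src, dst):
    # Breadth-first search for the shortest chain of join conditions.
    queue = deque([(src, [])])
    seen = {src}
    while queue:
        table, conds = queue.popleft()
        if table == dst:
            return conds
        for nbr, cond in SCHEMA[table].items():
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, conds + [cond]))
    return None  # tables are not connected
```

    A query that mentions only dataset and file attributes can then be completed into SQL by splicing the discovered conditions into the WHERE clause, which is what lets users query without knowing the tables or keys.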

  13. Highs and Lows in English Attachment.

    PubMed

    Grillo, Nino; Costa, João; Fernandes, Bruno; Santi, Andrea

    2015-11-01

    Grillo and Costa (2014) claim that Relative-Clause attachment ambiguity resolution is largely dependent on whether or not a Pseudo-Relative interpretation is available. Data from Italian, and other languages allowing Pseudo-Relatives, support this hypothesis. Pseudo-Relative availability, however, covaries with the semantics of the main predicate (e.g., perceptual vs. stative). Experiment 1 assesses whether this predicate distinction alone can account for prior attachment results by testing it with a language that disallows Pseudo-Relatives (i.e. English). Low Attachment was found independent of Predicate-Type. Predicate-Type did however have a minor modulatory role. Experiment 2 shows that English, traditionally classified as a Low Attachment language, can demonstrate High Attachment with sentences globally ambiguous between a Small-Clause and a reduced Relative-Clause interpretation. These results support a grammatical account of previous effects and provide novel evidence for the parser's preference of a Small-Clause over a Restrictive interpretation, crosslinguistically. Copyright © 2015 Elsevier B.V. All rights reserved.

  14. Computer-Assisted Update of a Consumer Health Vocabulary Through Mining of Social Network Data

    PubMed Central

    2011-01-01

    Background Consumer health vocabularies (CHVs) have been developed to aid consumer health informatics applications. This purpose is best served if the vocabulary evolves with consumers’ language. Objective Our objective was to create a computer assisted update (CAU) system that works with live corpora to identify new candidate terms for inclusion in the open access and collaborative (OAC) CHV. Methods The CAU system consisted of three main parts: a Web crawler and an HTML parser, a candidate term filter that utilizes natural language processing tools including term recognition methods, and a human review interface. In evaluation, the CAU system was applied to the health-related social network website PatientsLikeMe.com. The system’s utility was assessed by comparing the candidate term list it generated to a list of valid terms hand extracted from the text of the crawled webpages. Results The CAU system identified 88,994 unique terms 1- to 7-grams (“n-grams” are n consecutive words within a sentence) in 300 crawled PatientsLikeMe.com webpages. The manual review of the crawled webpages identified 651 valid terms not yet included in the OAC CHV or the Unified Medical Language System (UMLS) Metathesaurus, a collection of vocabularies amalgamated to form an ontology of medical terms, (ie, 1 valid term per 136.7 candidate n-grams). The term filter selected 774 candidate terms, of which 237 were valid terms, that is, 1 valid term among every 3 or 4 candidates reviewed. Conclusion The CAU system is effective for generating a list of candidate terms for human review during CHV development. PMID:21586386

  15. Computer-assisted update of a consumer health vocabulary through mining of social network data.

    PubMed

    Doing-Harris, Kristina M; Zeng-Treitler, Qing

    2011-05-17

    Consumer health vocabularies (CHVs) have been developed to aid consumer health informatics applications. This purpose is best served if the vocabulary evolves with consumers' language. Our objective was to create a computer assisted update (CAU) system that works with live corpora to identify new candidate terms for inclusion in the open access and collaborative (OAC) CHV. The CAU system consisted of three main parts: a Web crawler and an HTML parser, a candidate term filter that utilizes natural language processing tools including term recognition methods, and a human review interface. In evaluation, the CAU system was applied to the health-related social network website PatientsLikeMe.com. The system's utility was assessed by comparing the candidate term list it generated to a list of valid terms hand extracted from the text of the crawled webpages. The CAU system identified 88,994 unique terms 1- to 7-grams ("n-grams" are n consecutive words within a sentence) in 300 crawled PatientsLikeMe.com webpages. The manual review of the crawled webpages identified 651 valid terms not yet included in the OAC CHV or the Unified Medical Language System (UMLS) Metathesaurus, a collection of vocabularies amalgamated to form an ontology of medical terms, (ie, 1 valid term per 136.7 candidate n-grams). The term filter selected 774 candidate terms, of which 237 were valid terms, that is, 1 valid term among every 3 or 4 candidates reviewed. The CAU system is effective for generating a list of candidate terms for human review during CHV development.
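    The 1- to 7-gram enumeration at the heart of the candidate-generation step is simple to sketch (Python; whitespace tokenization is an assumption, and the function name is invented):

```python
def ngrams(sentence, max_n=7):
    # Enumerate all 1- to max_n-grams: runs of n consecutive words.
    tokens = sentence.lower().split()
    out = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            out.add(" ".join(tokens[i:i + n]))
    return out
```

    Applied over every sentence of the crawled pages, this is how a 300-page crawl yields on the order of 88,994 unique candidate terms, which the term filter then winnows before human review.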

  16. MASCOT HTML and XML parser: an implementation of a novel object model for protein identification data.

    PubMed

    Yang, Chunguang G; Granite, Stephen J; Van Eyk, Jennifer E; Winslow, Raimond L

    2006-11-01

    Protein identification using MS is an important technique in proteomics as well as a major generator of proteomics data. We have designed the protein identification data object model (PDOM) and developed a parser based on this model to facilitate the analysis and storage of these data. The parser works with HTML or XML files saved or exported from MASCOT MS/MS ions search in peptide summary report or MASCOT PMF search in protein summary report. The program creates PDOM objects, eliminates redundancy in the input file, and has the capability to output any PDOM object to a relational database. This program facilitates additional analysis of MASCOT search results and aids the storage of protein identification information. The implementation is extensible and can serve as a template to develop parsers for other search engines. The parser can be used as a stand-alone application or can be driven by other Java programs. It is currently being used as the front end for a system that loads HTML and XML result files of MASCOT searches into a relational database. The source code is freely available at http://www.ccbm.jhu.edu and the program uses only free and open-source Java libraries.

  17. Investigating AI with BASIC and Logo: Helping the Computer to Understand INPUTS.

    ERIC Educational Resources Information Center

    Mandell, Alan; Lucking, Robert

    1988-01-01

    Investigates using the microcomputer to develop a sentence parser to simulate intelligent conversation used in artificial intelligence applications. Compares the ability of LOGO and BASIC for this use. Lists and critiques several LOGO and BASIC parser programs. (MVL)

  18. Performance evaluation of continuity of care records (CCRs): parsing models in a mobile health management system.

    PubMed

    Chen, Hung-Ming; Liou, Yong-Zan

    2014-10-01

    In a mobile health management system, mobile devices act as the application hosting devices for personal health records (PHRs), and healthcare servers are constructed to exchange and analyze PHRs. One of the most popular PHR standards is the continuity of care record (CCR), which is expressed in XML format. However, parsing is an expensive operation that can degrade XML processing performance. Hence, the objective of this study was to identify the different operational and performance characteristics of CCR parsing models, including the XML DOM parser, the SAX parser, the PULL parser, and the JSON parser applied to JSON data converted from XML-based CCRs. Developers can thus make sensible choices about how to parse CCRs in their target PHR applications when using mobile devices or servers with different system resources. Furthermore, simulation experiments on four case studies were conducted to compare parsing performance on Android mobile devices and on a server with large quantities of CCR data.
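    Two of the four models compared above can be illustrated with Python's standard library standing in for the Java/Android parsers (the CCR fragment is invented and heavily simplified): a DOM parser materializes the whole tree in memory, while a SAX parser streams events with roughly constant memory.

```python
import xml.dom.minidom
import xml.sax

# A tiny, invented CCR-like fragment.
CCR = ("<ContinuityOfCareRecord><Body>"
       "<Medication>aspirin</Medication>"
       "</Body></ContinuityOfCareRecord>")

# DOM model: parse everything into an in-memory tree, then navigate freely.
dom = xml.dom.minidom.parseString(CCR)
dom_med = dom.getElementsByTagName("Medication")[0].firstChild.data

# SAX model: register callbacks and stream through the document once.
class MedHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.in_med = False
        self.meds = []
    def startElement(self, name, attrs):
        self.in_med = (name == "Medication")
    def characters(self, content):
        if self.in_med:
            self.meds.append(content)
    def endElement(self, name):
        self.in_med = False

handler = MedHandler()
xml.sax.parseString(CCR.encode("utf-8"), handler)
```

    The trade-off the paper measures follows directly: DOM's random access costs memory proportional to document size, which matters on a mobile device, while SAX (and similarly PULL) keeps memory flat at the cost of a less convenient programming model.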

  19. Progress in The Semantic Analysis of Scientific Code

    NASA Technical Reports Server (NTRS)

    Stewart, Mark

    2000-01-01

    This paper concerns a procedure that analyzes aspects of the meaning or semantics of scientific and engineering code. This procedure involves taking a user's existing code, adding semantic declarations for some primitive variables, and parsing this annotated code using multiple, independent expert parsers. These semantic parsers encode domain knowledge and recognize formulae in different disciplines including physics, numerical methods, mathematics, and geometry. The parsers will automatically recognize and document some static, semantic concepts and help locate some program semantic errors. These techniques may apply to a wider range of scientific codes. If so, the techniques could reduce the time, risk, and effort required to develop and modify scientific codes.

  20. Thermo-msf-parser: an open source Java library to parse and visualize Thermo Proteome Discoverer msf files.

    PubMed

    Colaert, Niklaas; Barsnes, Harald; Vaudel, Marc; Helsens, Kenny; Timmerman, Evy; Sickmann, Albert; Gevaert, Kris; Martens, Lennart

    2011-08-05

The Thermo Proteome Discoverer program integrates both peptide identification and quantification into a single workflow for peptide-centric proteomics. Furthermore, its close integration with Thermo mass spectrometers has made it increasingly popular in the field. Here, we present a Java library to parse the msf files that constitute the output of Proteome Discoverer. The parser is also implemented in a graphical user interface allowing convenient access to the information found in the msf files, and in Rover, a program to analyze and validate quantitative proteomics information. All code, binaries, and documentation are freely available at http://thermo-msf-parser.googlecode.com.

  1. Construction of a robust, large-scale, collaborative database for raw data in computational chemistry: the Collaborative Chemistry Database Tool (CCDBT).

    PubMed

    Chen, Mingyang; Stott, Amanda C; Li, Shenggang; Dixon, David A

    2012-04-01

A robust metadata database called the Collaborative Chemistry Database Tool (CCDBT) for massive amounts of computational chemistry raw data has been designed and implemented. It performs data synchronization and simultaneously extracts the metadata. Computational chemistry data in various formats from different computing sources, software packages, and users can be parsed into uniform metadata for storage in a MySQL database. Parsing is performed by a parsing pyramid: parsers written for different levels of data types and data sets, created by the parser loader after it loads the parser engines and configurations. Copyright © 2011 Elsevier Inc. All rights reserved.
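The loader-plus-engines pattern can be sketched as a registry that dispatches each raw-data format to a matching parser engine, each returning metadata in one uniform shape. The formats and fields below are invented for illustration; CCDBT's actual engines, pyramid levels, and MySQL schema are not reproduced here.

```python
PARSERS = {}

def register(fmt):
    """Decorator: register a parser engine for one raw-data format."""
    def wrap(fn):
        PARSERS[fmt] = fn
        return fn
    return wrap

# The uniform metadata schema every engine must produce (illustrative).
UNIFORM_FIELDS = ("program", "method", "energy")

@register("progA")
def parse_prog_a(text):
    # Hypothetical output format: "METHOD=B3LYP ENERGY=-76.4"
    kv = dict(tok.split("=") for tok in text.split())
    return {"program": "progA", "method": kv["METHOD"],
            "energy": float(kv["ENERGY"])}

@register("progB")
def parse_prog_b(text):
    # Hypothetical output format: "B3LYP -76.4" on one line
    method, energy = text.split()
    return {"program": "progB", "method": method, "energy": float(energy)}

def to_metadata(fmt, text):
    """Dispatch to the registered engine and enforce the uniform schema."""
    record = PARSERS[fmt](text)
    assert set(record) == set(UNIFORM_FIELDS)
    return record
```

Two differently formatted outputs thus land in the same metadata shape, ready for storage in a single database table.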

  2. Memory Retrieval in Parsing and Interpretation

    ERIC Educational Resources Information Center

    Schlueter, Ananda Lila Zoe

    2017-01-01

    This dissertation explores the relationship between the parser and the grammar in error-driven retrieval by examining the mechanism underlying the illusory licensing of subject-verb agreement violations ("agreement attraction"). Previous work motivates a two-stage model of agreement attraction in which the parser predicts the verb's…

  3. Looking forwards and backwards: The real-time processing of Strong and Weak Crossover

    PubMed Central

    Lidz, Jeffrey; Phillips, Colin

    2017-01-01

    We investigated the processing of pronouns in Strong and Weak Crossover constructions as a means of probing the extent to which the incremental parser can use syntactic information to guide antecedent retrieval. In Experiment 1 we show that the parser accesses a displaced wh-phrase as an antecedent for a pronoun when no grammatical constraints prohibit binding, but the parser ignores the same wh-phrase when it stands in a Strong Crossover relation to the pronoun. These results are consistent with two possibilities. First, the parser could apply Principle C at antecedent retrieval to exclude the wh-phrase on the basis of the c-command relation between its gap and the pronoun. Alternatively, retrieval might ignore any phrases that do not occupy an Argument position. Experiment 2 distinguished between these two possibilities by testing antecedent retrieval under Weak Crossover. In Weak Crossover binding of the pronoun is ruled out by the argument condition, but not Principle C. The results of Experiment 2 indicate that antecedent retrieval accesses matching wh-phrases in Weak Crossover configurations. On the basis of these findings we conclude that the parser can make rapid use of Principle C and c-command information to constrain retrieval. We discuss how our results support a view of antecedent retrieval that integrates inferences made over unseen syntactic structure into constraints on backward-looking processes like memory retrieval. PMID:28936483

  4. Towards automated processing of clinical Finnish: sublanguage analysis and a rule-based parser.

    PubMed

    Laippala, Veronika; Ginter, Filip; Pyysalo, Sampo; Salakoski, Tapio

    2009-12-01

In this paper, we present steps taken towards more efficient automated processing of clinical Finnish, focusing on daily nursing notes in a Finnish Intensive Care Unit (ICU). First, we analyze ICU Finnish as a sublanguage, identifying its specific features facilitating, for example, the development of a specialized syntactic analyzer. The identified features include frequent omission of finite verbs, limitations in allowed syntactic structures, and domain-specific vocabulary. Second, we develop a formal grammar and a parser for ICU Finnish, thus providing better tools for the development of further applications in the clinical domain. The grammar is implemented in the LKB system in a typed feature structure formalism. The lexicon is automatically generated based on the output of the FinTWOL morphological analyzer adapted to the clinical domain. As an additional experiment, we study the effect of using Finnish constraint grammar to reduce the size of the lexicon. The parser construction thus makes efficient use of existing resources for Finnish. The grammar currently covers 76.6% of ICU Finnish sentences, producing highly accurate best-parse analyses with an F-score of 91.1%. We find that building a parser for the highly specialized domain sublanguage is not only feasible, but also surprisingly efficient, given an existing morphological analyzer with broad vocabulary coverage. The resulting parser enables a deeper analysis of the text than was previously possible.

  5. Deriving a probabilistic syntacto-semantic grammar for biomedicine based on domain-specific terminologies

    PubMed Central

    Fan, Jung-Wei; Friedman, Carol

    2011-01-01

Biomedical natural language processing (BioNLP) is a useful technique that unlocks valuable information stored in textual data for practice and/or research. Syntactic parsing is a critical component of BioNLP applications that rely on correctly determining the sentence and phrase structure of free text. In addition to dealing with the vast amount of domain-specific terms, a robust biomedical parser needs to model the semantic grammar to obtain viable syntactic structures. With either a rule-based or corpus-based approach, the grammar engineering process requires substantial time and knowledge from experts, and does not always yield a semantically transferable grammar. To reduce the human effort and to promote semantic transferability, we propose an automated method for deriving a probabilistic grammar based on a training corpus consisting of concept strings and semantic classes from the Unified Medical Language System (UMLS), a comprehensive terminology resource widely used by the community. The grammar is designed to specify noun phrases only due to the nominal nature of the majority of biomedical terminological concepts. Evaluated on manually parsed clinical notes, the derived grammar achieved a recall of 0.644, precision of 0.737, and average cross-bracketing of 0.61, which demonstrated better performance than a control grammar with the semantic information removed. Error analysis revealed shortcomings that could be addressed to improve performance. The results indicated the feasibility of an approach which automatically incorporates terminology semantics in the building of an operational grammar. Although the current performance of the unsupervised solution does not adequately replace manual engineering, we believe once the performance issues are addressed, it could serve as an aid in a semi-supervised solution. PMID:21549857
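The probabilistic part of such a grammar is typically estimated by relative frequency: count each production observed in the training trees, then normalize per left-hand side. The sketch below uses toy semantic-class trees as stand-ins; the actual UMLS classes and derivation procedure are richer.

```python
from collections import Counter, defaultdict

# Toy "parsed concept strings" flattened into (LHS, children) productions
# over invented semantic-class labels (not real UMLS semantic types).
TREEBANK = [
    ("NP", ("DiseaseOrSyndrome",)),
    ("NP", ("BodyPart", "DiseaseOrSyndrome")),
    ("NP", ("BodyPart", "DiseaseOrSyndrome")),
    ("NP", ("GeneOrGenome",)),
]

def estimate_pcfg(productions):
    """Relative-frequency estimation: P(rule) = count(rule) / count(LHS)."""
    counts = Counter(productions)
    lhs_totals = defaultdict(int)
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {prod: n / lhs_totals[prod[0]] for prod, n in counts.items()}

probs = estimate_pcfg(TREEBANK)
```

With this treebank, the production NP -> BodyPart DiseaseOrSyndrome gets probability 0.5, and the probabilities of all NP productions sum to 1, which is the invariant a probabilistic parser relies on.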

  6. Linking Parser Development to Acquisition of Syntactic Knowledge

    ERIC Educational Resources Information Center

    Omaki, Akira; Lidz, Jeffrey

    2015-01-01

    Traditionally, acquisition of syntactic knowledge and the development of sentence comprehension behaviors have been treated as separate disciplines. This article reviews a growing body of work on the development of incremental sentence comprehension mechanisms and discusses how a better understanding of the developing parser can shed light on two…

  7. The value of parsing as feature generation for gene mention recognition

    PubMed Central

    Smith, Larry H; Wilbur, W John

    2009-01-01

    We measured the extent to which information surrounding a base noun phrase reflects the presence of a gene name, and evaluated seven different parsers in their ability to provide information for that purpose. Using the GENETAG corpus as a gold standard, we performed machine learning to recognize from its context when a base noun phrase contained a gene name. Starting with the best lexical features, we assessed the gain of adding dependency or dependency-like relations from a full sentence parse. Features derived from parsers improved performance in this partial gene mention recognition task by a small but statistically significant amount. There were virtually no differences between parsers in these experiments. PMID:19345281
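The study's "parsing as feature generation" setup can be sketched as follows: for a base noun phrase, emit lexical features plus the dependency relation linking the phrase head to its governor, and feed the resulting feature set to a classifier. The sentence, relation labels, and feature names below are illustrative; GENETAG annotations and the actual feature sets are richer.

```python
def np_features(tokens, deps, np_span):
    """Features for one base NP.
    tokens: list of words; deps: {dependent_index: (head_index, relation)};
    np_span: (start, end) token indices of the NP, end exclusive."""
    start, end = np_span
    head = end - 1  # crude NP-head heuristic: rightmost token
    feats = {f"w={tokens[i]}" for i in range(start, end)}  # lexical features
    gov, rel = deps.get(head, (None, "root"))
    feats.add(f"rel={rel}")          # dependency relation of the NP head
    if gov is not None:
        feats.add(f"gov={tokens[gov]}")  # governing word, e.g. the verb
    return feats

# Toy parse of "BRCA1 regulates repair": BRCA1 is the subject of regulates.
tokens = ["BRCA1", "regulates", "repair"]
deps = {0: (1, "nsubj"), 2: (1, "dobj")}
```

A feature like `rel=nsubj` together with `gov=regulates` is the kind of contextual signal the paper found gave a small but significant gain over lexical features alone.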

  8. Retrieval Interference in Syntactic Processing: The Case of Reflexive Binding in English.

    PubMed

    Patil, Umesh; Vasishth, Shravan; Lewis, Richard L

    2016-01-01

It has been proposed that in online sentence comprehension the dependency between a reflexive pronoun such as himself/herself and its antecedent is resolved using exclusively syntactic constraints. Under this strictly syntactic search account, Principle A of the binding theory (which requires that the antecedent c-command the reflexive within the same clause in which the reflexive occurs) constrains the parser's search for an antecedent. The parser thus ignores candidate antecedents that might match agreement features of the reflexive (e.g., gender) but are ineligible as potential antecedents because they are in structurally illicit positions. An alternative possibility accords no special status to structural constraints: in addition to using Principle A, the parser also uses non-structural cues such as gender to access the antecedent. According to cue-based retrieval theories of memory (e.g., Lewis and Vasishth, 2005), the use of non-structural cues should result in increased retrieval times and occasional errors when candidates partially match the cues, even if the candidates are in structurally illicit positions. In this paper, we first show how the retrieval processes that underlie reflexive binding are naturally realized in the Lewis and Vasishth (2005) model. We present the predictions of the model under the assumption that both structural and non-structural cues are used during retrieval, and provide a critical analysis of previous empirical studies that failed to find evidence for the use of non-structural cues, suggesting that these failures may be Type II errors. We use this analysis and the results of further modeling to motivate a new empirical design that we use in an eye-tracking study. The results of this study confirm the key predictions of the model concerning the use of non-structural cues, and are inconsistent with the strictly syntactic search account. These results present a challenge for theories advocating the infallibility of the human parser in the case of reflexive resolution, and provide support for the inclusion of agreement features such as gender in the set of retrieval cues.

  9. Retrieval Interference in Syntactic Processing: The Case of Reflexive Binding in English

    PubMed Central

    Patil, Umesh; Vasishth, Shravan; Lewis, Richard L.

    2016-01-01

It has been proposed that in online sentence comprehension the dependency between a reflexive pronoun such as himself/herself and its antecedent is resolved using exclusively syntactic constraints. Under this strictly syntactic search account, Principle A of the binding theory (which requires that the antecedent c-command the reflexive within the same clause in which the reflexive occurs) constrains the parser's search for an antecedent. The parser thus ignores candidate antecedents that might match agreement features of the reflexive (e.g., gender) but are ineligible as potential antecedents because they are in structurally illicit positions. An alternative possibility accords no special status to structural constraints: in addition to using Principle A, the parser also uses non-structural cues such as gender to access the antecedent. According to cue-based retrieval theories of memory (e.g., Lewis and Vasishth, 2005), the use of non-structural cues should result in increased retrieval times and occasional errors when candidates partially match the cues, even if the candidates are in structurally illicit positions. In this paper, we first show how the retrieval processes that underlie reflexive binding are naturally realized in the Lewis and Vasishth (2005) model. We present the predictions of the model under the assumption that both structural and non-structural cues are used during retrieval, and provide a critical analysis of previous empirical studies that failed to find evidence for the use of non-structural cues, suggesting that these failures may be Type II errors. We use this analysis and the results of further modeling to motivate a new empirical design that we use in an eye-tracking study. The results of this study confirm the key predictions of the model concerning the use of non-structural cues, and are inconsistent with the strictly syntactic search account. These results present a challenge for theories advocating the infallibility of the human parser in the case of reflexive resolution, and provide support for the inclusion of agreement features such as gender in the set of retrieval cues. PMID:27303315

  10. An Experiment in Scientific Code Semantic Analysis

    NASA Technical Reports Server (NTRS)

    Stewart, Mark E. M.

    1998-01-01

This paper concerns a procedure that analyzes aspects of the meaning or semantics of scientific and engineering code. This procedure involves taking a user's existing code, adding semantic declarations for some primitive variables, and parsing this annotated code using multiple, distributed expert parsers. These semantic parsers are designed to recognize formulae in different disciplines, including physical and mathematical formulae and geometrical position in a numerical scheme. The parsers will automatically recognize and document some static, semantic concepts and locate some program semantic errors. Results are shown for a subroutine test case and a collection of combustion code routines. This ability to locate some semantic errors and document semantic concepts in scientific and engineering code should reduce the time, risk, and effort of developing and using these codes.

  11. XAFSmass: a program for calculating the optimal mass of XAFS samples

    NASA Astrophysics Data System (ADS)

    Klementiev, K.; Chernikov, R.

    2016-05-01

We present a new implementation of the XAFSmass program that calculates the optimal mass of XAFS samples. It has several improvements compared to the old Windows-based XAFSmass program: 1) it is truly platform independent, being written in Python, and 2) it has an improved parser of chemical formulas that enables parentheses and nested inclusion-to-matrix weight percentages. The program calculates the absorption edge height given the total optical thickness, operates with differently determined sample amounts (mass, pressure, density, or sample area) depending on the aggregate state of the sample, and solves the inverse problem of finding the elemental composition given the experimental absorption edge jump and the chemical formula.
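Support for parentheses in a formula parser is naturally handled by recursive descent: a parenthesized group is parsed as a sub-formula whose element counts are multiplied by the trailing count. The sketch below is an independent illustration of that technique, not XAFSmass code.

```python
import re
from collections import Counter

# Tokens: an element symbol (e.g. Ca), an integer count, or a parenthesis.
TOKEN = re.compile(r"([A-Z][a-z]?)|(\d+)|([()])")

def parse_formula(formula):
    """Return element counts, e.g. Ca(OH)2 -> {'Ca': 1, 'O': 2, 'H': 2}."""
    tokens = [m.group(0) for m in TOKEN.finditer(formula)]
    pos = 0

    def read_count():
        nonlocal pos
        if pos < len(tokens) and tokens[pos].isdigit():
            n = int(tokens[pos])
            pos += 1
            return n
        return 1  # an omitted count means 1

    def parse_group():
        nonlocal pos
        counts = Counter()
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                pos += 1              # consume "("
                inner = parse_group() # recurse into the sub-formula
                pos += 1              # consume ")"
                mult = read_count()
                for elem, n in inner.items():
                    counts[elem] += n * mult
            else:
                elem = tokens[pos]
                pos += 1
                counts[elem] += read_count()
        return counts

    return dict(parse_group())
```

Nesting falls out of the recursion for free: each `(` opens another call to `parse_group`, so formulas like Fe2(SO4)3 need no special casing.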

  12. ImageParser: a tool for finite element generation from three-dimensional medical images

    PubMed Central

    Yin, HM; Sun, LZ; Wang, G; Yamada, T; Wang, J; Vannier, MW

    2004-01-01

Background The finite element method (FEM) is a powerful mathematical tool to simulate and visualize the mechanical deformation of tissues and organs during medical examinations or interventions. It is yet a challenge to build up an FEM mesh directly from a volumetric image, partially because the regions (or structures) of interest (ROIs) may be irregular and fuzzy. Methods A software package, ImageParser, is developed to generate an FEM mesh from 3-D tomographic medical images. This software uses a semi-automatic method to detect ROIs from the context of the image, including neighboring tissues and organs, completes segmentation of different tissues, and meshes the organ into elements. Results The ImageParser is shown to build up an FEM model for simulating the mechanical responses of the breast based on 3-D CT images. The breast is compressed by two plate paddles under an overall displacement as large as 20% of the initial distance between the paddles. The strain and tangential Young's modulus distributions are specified for the biomechanical analysis of breast tissues. Conclusion The ImageParser can successfully extract the geometry of ROIs from a complex medical image and generate the FEM mesh with customer-defined segmentation information. PMID:15461787

  13. An Experiment in Scientific Program Understanding

    NASA Technical Reports Server (NTRS)

    Stewart, Mark E. M.; Owen, Karl (Technical Monitor)

    2000-01-01

This paper concerns a procedure that analyzes aspects of the meaning or semantics of scientific and engineering code. This procedure involves taking a user's existing code, adding semantic declarations for some primitive variables, and parsing this annotated code using multiple, independent expert parsers. These semantic parsers encode domain knowledge and recognize formulae in different disciplines, including physics, numerical methods, mathematics, and geometry. The parsers will automatically recognize and document some static, semantic concepts and help locate some program semantic errors. Results are shown for three intensively studied codes and seven blind test cases; all test cases are state-of-the-art scientific codes. These techniques may apply to a wider range of scientific codes. If so, the techniques could reduce the time, risk, and effort required to develop and modify scientific codes.

  14. Locating Anomalies in Complex Data Sets Using Visualization and Simulation

    NASA Technical Reports Server (NTRS)

    Panetta, Karen

    2001-01-01

The research goals are to create a simulation framework that can accept any combination of models written at the gate or behavioral level. The framework provides the ability to fault-simulate and to create scenarios of experiments using concurrent simulation. In order to meet these goals we had to fulfill the following requirements: the ability to accept models written in VHDL, Verilog, or C; the ability to propagate faults through any model type; the ability to create experiment scenarios efficiently without generating every possible combination of variables; and the ability to accept a diversity of fault models beyond the single stuck-at model. Major development effort has gone into a parser that can accept models written in various languages. This work has generated considerable attention from other universities and industry for its flexibility and usefulness. The parser uses Lex and Yacc to parse Verilog and C. We have also utilized our industrial partnership with Alternative Systems Inc. to import VHDL into our simulator. For multilevel simulation, we needed to modify the simulator architecture to accept models that contained multiple outputs. This enabled us to accept behavioral components. The next major accomplishment was the addition of "functional fault models". Functional fault models change the behavior of a gate or model. For example, a bridging fault can make an OR gate behave like an AND gate. This has applications beyond fault simulation: the added modeling flexibility makes the simulator more useful for verification and model comparison. For instance, two or more versions of an ALU can be comparatively simulated in a single execution. The results will show where and how the models differed so that the performance and correctness of the models may be evaluated. A considerable amount of time has been dedicated to validating the simulator performance on larger models provided by industry and other universities.

  15. Speech rhythm facilitates syntactic ambiguity resolution: ERP evidence.

    PubMed

    Roncaglia-Denissen, Maria Paula; Schmidt-Kassow, Maren; Kotz, Sonja A

    2013-01-01

    In the current event-related potential (ERP) study, we investigated how speech rhythm impacts speech segmentation and facilitates the resolution of syntactic ambiguities in auditory sentence processing. Participants listened to syntactically ambiguous German subject- and object-first sentences that were spoken with either regular or irregular speech rhythm. Rhythmicity was established by a constant metric pattern of three unstressed syllables between two stressed ones that created rhythmic groups of constant size. Accuracy rates in a comprehension task revealed that participants understood rhythmically regular sentences better than rhythmically irregular ones. Furthermore, the mean amplitude of the P600 component was reduced in response to object-first sentences only when embedded in rhythmically regular but not rhythmically irregular context. This P600 reduction indicates facilitated processing of sentence structure possibly due to a decrease in processing costs for the less-preferred structure (object-first). Our data suggest an early and continuous use of rhythm by the syntactic parser and support language processing models assuming an interactive and incremental use of linguistic information during language processing.

  16. Speech Rhythm Facilitates Syntactic Ambiguity Resolution: ERP Evidence

    PubMed Central

    Roncaglia-Denissen, Maria Paula; Schmidt-Kassow, Maren; Kotz, Sonja A.

    2013-01-01

    In the current event-related potential (ERP) study, we investigated how speech rhythm impacts speech segmentation and facilitates the resolution of syntactic ambiguities in auditory sentence processing. Participants listened to syntactically ambiguous German subject- and object-first sentences that were spoken with either regular or irregular speech rhythm. Rhythmicity was established by a constant metric pattern of three unstressed syllables between two stressed ones that created rhythmic groups of constant size. Accuracy rates in a comprehension task revealed that participants understood rhythmically regular sentences better than rhythmically irregular ones. Furthermore, the mean amplitude of the P600 component was reduced in response to object-first sentences only when embedded in rhythmically regular but not rhythmically irregular context. This P600 reduction indicates facilitated processing of sentence structure possibly due to a decrease in processing costs for the less-preferred structure (object-first). Our data suggest an early and continuous use of rhythm by the syntactic parser and support language processing models assuming an interactive and incremental use of linguistic information during language processing. PMID:23409109

  17. An English language interface for constrained domains

    NASA Technical Reports Server (NTRS)

    Page, Brenda J.

    1989-01-01

The Multi-Satellite Operations Control Center (MSOCC) Jargon Interpreter (MJI) demonstrates an English language interface for a constrained domain. A constrained domain is defined as one with a small and well-delineated set of actions and objects. The set of actions chosen for the MJI is from the domain of MSOCC Applications Executive (MAE) Systems Test and Operations Language (STOL) directives and contains directives for signing a cathode ray tube (CRT) on or off, calling up or clearing a display page, starting or stopping a procedure, and controlling history recording. The set of objects chosen consists of CRTs, display pages, STOL procedures, and history files. Translation from English sentences to STOL directives is done in two phases. In the first phase, an augmented transition network (ATN) parser and dictionary are used to determine grammatically correct parsings of input sentences. In the second phase, grammatically typed sentences are submitted to a forward-chaining rule-based system for interpretation and translation into equivalent MAE STOL directives. Tests of the MJI show that it is able to translate individual, clearly stated sentences into the subset of directives selected for the prototype. This approach to an English language interface may be used in similarly constrained situations by modifying the MJI's dictionary and rules to reflect the change of domain.
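The two-phase pipeline can be sketched in miniature: a first pass reduces a sentence to a grammatical frame, and a second pass applies rules mapping frames to directives. This is a crude stand-in for the ATN parser and rule-based system, and the directive syntax below (PAGE ... ON/OFF, START, STOP) is invented for illustration, not actual MAE STOL.

```python
import re

# Phase 1: reduce the sentence to an (object, verb, name) frame.
PATTERNS = [
    (re.compile(r"(?:please )?(call up|clear) (?:the )?display page (\w+)", re.I),
     lambda m: ("display_page", m.group(1).lower(), m.group(2))),
    (re.compile(r"(?:please )?(start|stop) (?:the )?procedure (\w+)", re.I),
     lambda m: ("procedure", m.group(1).lower(), m.group(2))),
]

def parse_sentence(sentence):
    for pattern, build in PATTERNS:
        m = pattern.search(sentence)
        if m:
            return build(m)
    return None  # outside the constrained domain

# Phase 2: rules mapping frames to (hypothetical) directives.
RULES = {
    ("display_page", "call up"): "PAGE {name} ON",
    ("display_page", "clear"):   "PAGE {name} OFF",
    ("procedure", "start"):      "START {name}",
    ("procedure", "stop"):       "STOP {name}",
}

def translate(sentence):
    frame = parse_sentence(sentence)
    if frame is None:
        return None
    obj, verb, name = frame
    return RULES[(obj, verb)].format(name=name)
```

The point of the constrained domain is visible here: because the actions and objects are few and well delineated, a small grammar plus a small rule set covers the whole interface, and anything unrecognized is simply rejected.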

  18. Syntactic analysis in sentence comprehension: effects of dependency types and grammatical constraints.

    PubMed

    De Vincenzi, M

    1996-01-01

This paper presents three experiments on the parsing of Italian wh-questions that manipulate the wh-type (who vs. which-N) and the wh-extraction site (main clause, or dependent clause with or without a complementizer). The aim of these manipulations is to see whether the parser is sensitive to the type of dependency being processed and whether the processing effects can be explained by a single processing principle, the minimal chain principle (MCP; De Vincenzi, 1991). The results show that the parser, following the MCP, prefers structures with fewer and less complex chains. In particular: (1) there is a processing advantage for wh-subject extractions, the structures with less complex chains; (2) there is a processing dissociation between who and which questions; and (3) the parser respects the principle that governs the well-formedness of empty categories (the ECP).

  19. Hyper-active gap filling

    PubMed Central

    Omaki, Akira; Lau, Ellen F.; Davidson White, Imogen; Dakan, Myles L.; Apple, Aaron; Phillips, Colin

    2015-01-01

    Much work has demonstrated that speakers of verb-final languages are able to construct rich syntactic representations in advance of verb information. This may reflect general architectural properties of the language processor, or it may only reflect a language-specific adaptation to the demands of verb-finality. The present study addresses this issue by examining whether speakers of a verb-medial language (English) wait to consult verb transitivity information before constructing filler-gap dependencies, where internal arguments are fronted and hence precede the verb. This configuration makes it possible to investigate whether the parser actively makes representational commitments on the gap position before verb transitivity information becomes available. A key prediction of the view that rich pre-verbal structure building is a general architectural property is that speakers of verb-medial languages should predictively construct dependencies in advance of verb transitivity information, and therefore that disruption should be observed when the verb has intransitive subcategorization frames that are incompatible with the predicted structure. In three reading experiments (self-paced and eye-tracking) that manipulated verb transitivity, we found evidence for reading disruption when the verb was intransitive, although no such reading difficulty was observed when the critical verb was embedded inside a syntactic island structure, which blocks filler-gap dependency completion. These results are consistent with the hypothesis that in English, as in verb-final languages, information from preverbal noun phrases is sufficient to trigger active dependency completion without having access to verb transitivity information. PMID:25914658

  20. Hyper-active gap filling.

    PubMed

    Omaki, Akira; Lau, Ellen F; Davidson White, Imogen; Dakan, Myles L; Apple, Aaron; Phillips, Colin

    2015-01-01

    Much work has demonstrated that speakers of verb-final languages are able to construct rich syntactic representations in advance of verb information. This may reflect general architectural properties of the language processor, or it may only reflect a language-specific adaptation to the demands of verb-finality. The present study addresses this issue by examining whether speakers of a verb-medial language (English) wait to consult verb transitivity information before constructing filler-gap dependencies, where internal arguments are fronted and hence precede the verb. This configuration makes it possible to investigate whether the parser actively makes representational commitments on the gap position before verb transitivity information becomes available. A key prediction of the view that rich pre-verbal structure building is a general architectural property is that speakers of verb-medial languages should predictively construct dependencies in advance of verb transitivity information, and therefore that disruption should be observed when the verb has intransitive subcategorization frames that are incompatible with the predicted structure. In three reading experiments (self-paced and eye-tracking) that manipulated verb transitivity, we found evidence for reading disruption when the verb was intransitive, although no such reading difficulty was observed when the critical verb was embedded inside a syntactic island structure, which blocks filler-gap dependency completion. These results are consistent with the hypothesis that in English, as in verb-final languages, information from preverbal noun phrases is sufficient to trigger active dependency completion without having access to verb transitivity information.

  1. iBIOMES Lite: Summarizing Biomolecular Simulation Data in Limited Settings

    PubMed Central

    2015-01-01

    As the amount of data generated by biomolecular simulations dramatically increases, new tools need to be developed to help manage this data at the individual investigator or small research group level. In this paper, we introduce iBIOMES Lite, a lightweight tool for biomolecular simulation data indexing and summarization. The main goal of iBIOMES Lite is to provide a simple interface to summarize computational experiments in a setting where the user might have limited privileges and limited access to IT resources. A command-line interface allows the user to summarize, publish, and search local simulation data sets. Published data sets are accessible via static hypertext markup language (HTML) pages that summarize the simulation protocols and also display data analysis graphically. The publication process is customized via extensible markup language (XML) descriptors while the HTML summary template is customized through extensible stylesheet language (XSL). iBIOMES Lite was tested on different platforms and at several national computing centers using various data sets generated through classical and quantum molecular dynamics, quantum chemistry, and QM/MM. The associated parsers currently support AMBER, GROMACS, Gaussian, and NWChem data set publication. The code is available at https://github.com/jcvthibault/ibiomes. PMID:24830957

  2. An Improved Tarpit for Network Deception

    DTIC Science & Technology

    2016-03-25

(The record text for this report is fragmentary.) The recoverable fragments describe a class diagram in which GreaseMonkey contains three packet handler classes (a part-whole relationship), and a usage relationship in which the Greasy module uses functions from config_parser to parse the configuration.

  3. Extracting BI-RADS Features from Portuguese Clinical Texts.

    PubMed

    Nassif, Houssam; Cunha, Filipe; Moreira, Inês C; Cruz-Correia, Ricardo; Sousa, Eliana; Page, David; Burnside, Elizabeth; Dutra, Inês

    2012-01-01

    In this work we build the first BI-RADS parser for Portuguese free texts, modeled after existing approaches to extract BI-RADS features from English medical records. Our concept finder uses a semantic grammar based on the BI-RADS lexicon and on iteratively transferred expert knowledge. We compare the performance of our algorithm to manual annotation by a specialist in mammography. Our results show that our parser's performance is comparable to the manual method.

  4. A Semantic Analysis Method for Scientific and Engineering Code

    NASA Technical Reports Server (NTRS)

    Stewart, Mark E. M.

    1998-01-01

    This paper develops a procedure to statically analyze aspects of the meaning or semantics of scientific and engineering code. The analysis involves adding semantic declarations to a user's code and parsing this semantic knowledge with the original code using multiple expert parsers. These semantic parsers are designed to recognize formulae in different disciplines including physical and mathematical formulae and geometrical position in a numerical scheme. In practice, a user would submit code with semantic declarations of primitive variables to the analysis procedure, and its semantic parsers would automatically recognize and document some static, semantic concepts and locate some program semantic errors. A prototype implementation of this analysis procedure is demonstrated. Further, the relationship between the fundamental algebraic manipulations of equations and the parsing of expressions is explained. This ability to locate some semantic errors and document semantic concepts in scientific and engineering code should reduce the time, risk, and effort of developing and using these codes.
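    The flavor of such semantic checking can be illustrated with a toy dimensional-consistency check, in which user-declared dimensions on primitive variables let simple semantic errors be caught statically (the variable names and units here are invented, not from the paper):

```python
# Toy semantic checker: variables carry user-declared physical dimensions
# (maps from base unit to exponent), and arithmetic is checked for
# dimensional consistency.
def dim(**powers):
    return {k: v for k, v in powers.items() if v}

def mul(a, b):
    # Multiplying quantities adds their dimension exponents.
    out = dict(a)
    for k, v in b.items():
        out[k] = out.get(k, 0) + v
    return {k: v for k, v in out.items() if v}

def add(a, b):
    # Adding quantities of unlike dimension is a semantic error.
    if a != b:
        raise TypeError(f"semantic error: adding {a} and {b}")
    return a

velocity = dim(m=1, s=-1)   # declared: metres per second
time     = dim(s=1)         # declared: seconds
length   = dim(m=1)         # declared: metres

distance = mul(velocity, time)    # v * t has dimensions of length
total = add(distance, length)     # consistent: both are lengths
```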

  5. NASA Tech Briefs, April 2010

    NASA Technical Reports Server (NTRS)

    2010-01-01

    Topics covered include: Active and Passive Hybrid Sensor; Quick-Response Thermal Actuator for Use as a Heat Switch; System for Hydrogen Sensing; Method for Detecting Perlite Compaction in Large Cryogenic Tanks; Using Thin-Film Thermometers as Heaters in Thermal Control Applications; Directional Spherical Cherenkov Detector; AlGaN Ultraviolet Detectors for Dual-Band UV Detection; K-Band Traveling-Wave Tube Amplifier; Simplified Load-Following Control for a Fuel Cell System; Modified Phase-meter for a Heterodyne Laser Interferometer; Loosely Coupled GPS-Aided Inertial Navigation System for Range Safety; Sideband-Separating, Millimeter-Wave Heterodyne Receiver; Coaxial Propellant Injectors With Faceplate Annulus Control; Adaptable Diffraction Gratings With Wavefront Transformation; Optimizing a Laser Process for Making Carbon Nanotubes; Thermogravimetric Analysis of Single-Wall Carbon Nanotubes; Robotic Arm Comprising Two Bending Segments; Magnetostrictive Brake; Low-Friction, Low-Profile, High-Moment Two-Axis Joint; Foil Gas Thrust Bearings for High-Speed Turbomachinery; Miniature Multi-Axis Mechanism for Hand Controllers; Digitally Enhanced Heterodyne Interferometry; Focusing Light Beams To Improve Atomic-Vapor Optical Buffers; Landmark Detection in Orbital Images Using Salience Histograms; Efficient Bit-to-Symbol Likelihood Mappings; Capacity Maximizing Constellations; Natural-Language Parser for PBEM; Policy Process Editor for P(sup 3)BM Software; A Quality System Database; Trajectory Optimization: OTIS 4; and Computer Software Configuration Item-Specific Flight Software Image Transfer Script Generator.

  6. DEEPEN: A negation detection system for clinical text incorporating dependency relation into NegEx

    PubMed Central

    Mehrabi, Saeed; Krishnan, Anand; Sohn, Sunghwan; Roch, Alexandra M; Schmidt, Heidi; Kesterson, Joe; Beesley, Chris; Dexter, Paul; Schmidt, C. Max; Liu, Hongfang; Palakal, Mathew

    2018-01-01

    In Electronic Health Records (EHRs), much of the valuable information regarding patients’ conditions is embedded in free-text format. Natural language processing (NLP) techniques have been developed to extract clinical information from free text. One challenge faced in clinical NLP is that the meaning of clinical entities is heavily affected by modifiers such as negation. A negation detection algorithm, NegEx, applies a simplistic approach that has been shown to be powerful in clinical NLP. However, because it fails to consider the contextual relationship between words within a sentence, NegEx cannot correctly capture the negation status of concepts in complex sentences. Incorrect negation assignment could cause inaccurate diagnosis of a patient’s condition or contaminated study cohorts. We developed a negation algorithm called DEEPEN to decrease NegEx’s false positives by taking into account the dependency relationship between negation words and concepts within a sentence, using the Stanford dependency parser. The system was developed and tested using EHR data from Indiana University (IU), and it was further evaluated on a Mayo Clinic dataset to assess its generalizability. The evaluation results demonstrate that DEEPEN, which incorporates dependency parsing into NegEx, can reduce the number of incorrect negation assignments for patients with positive findings, and therefore improve the identification of patients with the target clinical findings in EHRs. PMID:25791500
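    The core idea, deciding negation from graph proximity in a dependency parse rather than from surface word order, can be sketched as follows. This is an illustrative simplification, not the actual DEEPEN algorithm, and the example edges stand in for real Stanford-parser output:

```python
from collections import defaultdict

# A concept counts as negated only if a negation cue lies within a few hops
# of it in the dependency graph, not merely nearby in the token sequence
# (the hop limit is an invented stand-in for DEEPEN's actual rules).
def negated(concept, neg_cues, edges, max_hops=2):
    graph = defaultdict(set)
    for head, dep in edges:        # (head, dependent) pairs from a parse
        graph[head].add(dep)
        graph[dep].add(head)
    frontier, seen = {concept}, {concept}
    for _ in range(max_hops):
        frontier = {n for node in frontier for n in graph[node]} - seen
        if frontier & neg_cues:
            return True
        seen |= frontier
    return False

# "no evidence of pneumonia" (prepositions collapsed): pneumonia is negated.
edges1 = [("evidence", "no"), ("evidence", "pneumonia")]
# "pneumonia resolved; no fever": fever is negated, pneumonia is not,
# even though "no" precedes both concepts in a naive window-based scan.
edges2 = [("resolved", "pneumonia"), ("resolved", "fever"), ("fever", "no")]
```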

  7. SOL - SIZING AND OPTIMIZATION LANGUAGE COMPILER

    NASA Technical Reports Server (NTRS)

    Scotti, S. J.

    1994-01-01

    SOL is a computer language which is geared to solving design problems. SOL includes the mathematical modeling and logical capabilities of a computer language like FORTRAN but also includes the additional power of non-linear mathematical programming methods (i.e. numerical optimization) at the language level (as opposed to the subroutine level). The language-level use of optimization has several advantages over the traditional, subroutine-calling method of using an optimizer: first, the optimization problem is described in a concise and clear manner which closely parallels the mathematical description of optimization; second, a seamless interface is automatically established between the optimizer subroutines and the mathematical model of the system being optimized; third, the results of an optimization (objective, design variables, constraints, termination criteria, and some or all of the optimization history) are output in a form directly related to the optimization description; and finally, automatic error checking and recovery from an ill-defined system model or optimization description is facilitated by the language-level specification of the optimization problem. Thus, SOL enables rapid generation of models and solutions for optimum design problems with greater confidence that the problem is posed correctly. The SOL compiler takes SOL-language statements and generates the equivalent FORTRAN code and system calls. Because of this approach, the modeling capabilities of SOL are extended by the ability to incorporate existing FORTRAN code into a SOL program. In addition, SOL has a powerful MACRO capability. The MACRO capability of the SOL compiler effectively gives the user the ability to extend the SOL language and can be used to develop easy-to-use shorthand methods of generating complex models and solution strategies. 
The SOL compiler provides syntactic and semantic error-checking, error recovery, and detailed reports containing cross-references to show where each variable was used. The listings summarize all optimizations, listing the objective functions, design variables, and constraints. The compiler offers error-checking specific to optimization problems, so that simple mistakes will not cost hours of debugging time. The optimization engine used by and included with the SOL compiler is a version of Vanderplaats' ADS system (Version 1.1) modified specifically to work with the SOL compiler. SOL allows the use of more than 100 ADS optimization choices, such as Sequential Quadratic Programming, Modified Feasible Directions, interior and exterior penalty functions, and variable metric methods. Default choices of the many control parameters of ADS are made for the user; however, the user can override any of the ADS control parameters for each individual optimization. The SOL language and compiler were developed with an advanced compiler-generation system to ensure correctness and simplify program maintenance. Thus, SOL's syntax was defined precisely by a LALR(1) grammar, and the SOL compiler's parser was generated automatically from that grammar with a parser-generator. Hence, unlike ad hoc, manually coded interfaces, the SOL compiler ensures that it recognizes all legal SOL programs, can recover from and correct many errors, and reports the location of errors to the user. This version of the SOL compiler has been implemented on VAX/VMS computer systems and requires 204 KB of virtual memory to execute. Since the SOL compiler produces FORTRAN code, it requires the VAX FORTRAN compiler to produce an executable program. The SOL compiler consists of 13,000 lines of Pascal code. It was developed in 1986 and last updated in 1988. The ADS and other utility subroutines amount to 14,000 lines of FORTRAN code and were also updated in 1988.

  8. Adding a Medical Lexicon to an English Parser

    PubMed Central

    Szolovits, Peter

    2003-01-01

    We present a heuristic method to map lexical (syntactic) information from one lexicon to another, and apply the technique to augment the lexicon of the Link Grammar Parser with an enormous medical vocabulary drawn from the Specialist lexicon developed by the National Library of Medicine. This paper presents and justifies the mapping method and addresses technical problems that have to be overcome. It illustrates the utility of the method with respect to a large corpus of emergency department notes. PMID:14728251

  9. A study of actions in operative notes.

    PubMed

    Wang, Yan; Pakhomov, Serguei; Burkart, Nora E; Ryan, James O; Melton, Genevieve B

    2012-01-01

    Operative notes contain rich information about techniques, instruments, and materials used in procedures. To assist the development of effective information extraction (IE) techniques for operative notes, we investigated the sublanguage used to describe actions within the operative report 'procedure description' section. Deep parsing of 362,310 operative notes with an expanded Stanford parser using the SPECIALIST Lexicon yielded 200 verbs (92% coverage), including 147 action verbs. Nominal action predicates for each action verb were gathered from WordNet, the SPECIALIST Lexicon, the New Oxford American Dictionary, and Stedman's Medical Dictionary. Coverage gaps were seen in existing lexical, domain, and semantic resources (the Unified Medical Language System (UMLS) Metathesaurus, the SPECIALIST Lexicon, WordNet, and FrameNet). Our findings demonstrate the need to construct surgical domain-specific semantic resources for IE from operative notes.

  10. Software Development Of XML Parser Based On Algebraic Tools

    NASA Astrophysics Data System (ADS)

    Georgiev, Bozhidar; Georgieva, Adriana

    2011-12-01

    This paper presents the development and implementation of an algebraic method for XML data processing that accelerates the XML parsing process. The nontraditional approach proposed here for fast XML navigation with algebraic tools contributes to ongoing efforts toward an easier, user-friendly API for XML transformations. The proposed software for processing XML documents (the parser) is easy to use and can manage files with a strictly defined data structure. The purpose of the presented algorithm is to offer a new approach to searching and restructuring hierarchical XML data. This approach permits fast processing of XML documents, using an algebraic model developed in detail in previous works by the same authors. The proposed parsing mechanism is easily accessible to a web consumer, who can control XML file processing, search for different elements (tags), delete them, and add new XML content. Various tests show higher speed and lower resource consumption in comparison with some existing commercial parsers.
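    The abstract does not show the authors' algebraic API itself, so as a point of reference, the three operations it describes (searching elements by tag, deleting a node, adding new content) look like this with the Python standard library's ElementTree:

```python
import xml.etree.ElementTree as ET

# A small document with a strictly defined structure.
root = ET.fromstring("<library><book id='1'/><book id='2'/></library>")

hits = root.findall("book")            # search for elements (tags)
root.remove(hits[0])                   # delete a node
ET.SubElement(root, "book", id="3")    # add new XML content
result = ET.tostring(root, encoding="unicode")
```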

  11. Overview of the ArbiTER edge plasma eigenvalue code

    NASA Astrophysics Data System (ADS)

    Baver, Derek; Myra, James; Umansky, Maxim

    2011-10-01

    The Arbitrary Topology Equation Reader, or ArbiTER, is a flexible eigenvalue solver that is currently under development for plasma physics applications. The ArbiTER code builds on the equation parser framework of the existing 2DX code, extending it to include a topology parser. This will give the code the capability to model problems with complicated geometries (such as multiple X-points and scrape-off layers) or model equations with arbitrary numbers of dimensions (e.g. for kinetic analysis). In the equation parser framework, model equations are not included in the program's source code. Instead, an input file contains instructions for building a matrix from profile functions and elementary differential operators. The program then executes these instructions in a sequential manner. These instructions may also be translated into analytic form, thus giving the code transparency as well as flexibility. We will present an overview of how the ArbiTER code is to work, as well as preliminary results from early versions of this code. Work supported by the U.S. DOE.
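    A minimal sketch of the equation-parser idea, an input's instructions assembled sequentially into an operator matrix from profile functions and elementary differential operators, might look as follows. The instruction format, operator names, and profiles are invented for illustration and are not ArbiTER's actual input language:

```python
# Toy "equation parser": each instruction scales an elementary operator by a
# profile function and accumulates it into the model matrix.
N = 4  # grid points

def identity():
    return [[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]

def d2dx2():
    # Second-difference stencil [1, -2, 1] (unit grid spacing).
    return [[1.0 if abs(i - j) == 1 else (-2.0 if i == j else 0.0)
             for j in range(N)] for i in range(N)]

OPS = {"identity": identity, "d2dx2": d2dx2}

def build(instructions, profiles):
    M = [[0.0] * N for _ in range(N)]
    for coeff, op in instructions:              # executed sequentially
        block = OPS[op]()
        for i in range(N):
            for j in range(N):
                M[i][j] += profiles[coeff][i] * block[i][j]
    return M

profiles = {"diffusivity": [1.0] * N, "damping": [0.1] * N}
# "diffusivity * d2/dx2 + damping * I", built without touching source code.
M = build([("diffusivity", "d2dx2"), ("damping", "identity")], profiles)
```

The resulting matrix would then be handed to an eigenvalue solver, mirroring the way model equations live in the input file rather than in the program's source.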

  12. Archetype Model-Driven Development Framework for EHR Web System.

    PubMed

    Kobayashi, Shinji; Kimura, Eizen; Ishihara, Ken

    2013-12-01

    This article describes the Web application framework for Electronic Health Records (EHRs) we have developed to reduce construction costs for EHR systems. The openEHR project has developed a clinical model-driven architecture for future-proof, interoperable EHR systems. This project provides the specifications to standardize clinical domain model implementations, upon which the ISO/CEN 13606 standards are based. The reference implementation has been formally described in Eiffel; C# and Java implementations have also been developed as references. Although scripting languages have become more popular in recent years because of their higher efficiency and faster development cycles, they had not been used in openEHR implementations. Since 2007, we have used the Ruby language and Ruby on Rails (RoR) as an agile development platform to implement EHR systems in conformity with the openEHR specifications. We implemented almost all of the specifications, an Archetype Definition Language parser, and an RoR scaffold generator from archetypes. Although some problems emerged, most of them have been resolved. We have provided an agile EHR Web framework that can build up Web systems from archetype models using RoR. The feasibility of the archetype model to provide semantic interoperability of EHRs has been demonstrated, and we have verified that it is suitable for the construction of EHR systems.

  13. ULTRA: Universal Grammar as a Universal Parser

    PubMed Central

    Medeiros, David P.

    2018-01-01

    A central concern of generative grammar is the relationship between hierarchy and word order, traditionally understood as two dimensions of a single syntactic representation. A related concern is directionality in the grammar. Traditional approaches posit process-neutral grammars, embodying knowledge of language, put to use with infinite facility both for production and comprehension. This has crystallized in the view of Merge as the central property of syntax, perhaps its only novel feature. A growing number of approaches explore grammars with different directionalities, often with more direct connections to performance mechanisms. This paper describes a novel model of universal grammar as a one-directional, universal parser. Mismatch between word order and interpretation order is pervasive in comprehension; in the present model, word order is language-particular and interpretation order (i.e., hierarchy) is universal. These orders are not two dimensions of a unified abstract object (e.g., precedence and dominance in a single tree); rather, both are temporal sequences, and UG is an invariant real-time procedure (based on Knuth's stack-sorting algorithm) transforming word order into hierarchical order. This shift in perspective has several desirable consequences. It collapses linearization, displacement, and composition into a single performance process. The architecture provides a novel source of brackets (labeled unambiguously and without search), which are understood not as part-whole constituency relations, but as storage and retrieval routines in parsing. It also explains why neutral word order within single syntactic cycles avoids 213-like permutations. The model identifies cycles as extended projections of lexical heads, grounding the notion of phase. This is achieved with a universal processor, dispensing with parameters. The empirical focus is word order in noun phrases. 
This domain provides some of the clearest evidence for 213-avoidance as a cross-linguistic word order generalization. Importantly, recursive phrase structure “bottoms out” in noun phrases, which are typically a single cycle (though further cycles may be embedded, e.g., relative clauses). By contrast, a simple transitive clause plausibly involves two cycles (vP and CP), embedding further nominal cycles. In the present theory, recursion is fundamentally distinct from structure-building within a single cycle, and different word order restrictions might emerge in larger domains like clauses. PMID:29497394
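    The real-time procedure at the heart of the model builds on Knuth's classic one-pass stack sort, which transforms one temporal sequence into another. A minimal implementation is below; note that in Knuth's formulation the single-pass-sortable permutations are exactly those avoiding the pattern 231, while the paper states the constraint as 213-avoidance under its own ordering conventions:

```python
# Knuth's one-pass stack sort: elements arrive in input order, are buffered
# on a stack, and are emitted in (attempted) sorted order.
def stack_sort(perm):
    stack, out = [], []
    for x in perm:
        # Pop everything smaller than the incoming element before pushing.
        while stack and stack[-1] < x:
            out.append(stack.pop())
        stack.append(x)
    while stack:
        out.append(stack.pop())
    return out

print(stack_sort([3, 1, 2]))   # [1, 2, 3]: sortable in one pass
print(stack_sort([2, 3, 1]))   # [2, 1, 3]: contains the forbidden pattern
```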

  14. Detecting modification of biomedical events using a deep parsing approach.

    PubMed

    Mackinlay, Andrew; Martinez, David; Baldwin, Timothy

    2012-04-30

    This work describes a system for identifying event mentions in bio-molecular research abstracts that are either speculative (e.g. analysis of IkappaBalpha phosphorylation, where it is not specified whether phosphorylation did or did not occur) or negated (e.g. inhibition of IkappaBalpha phosphorylation, where phosphorylation did not occur). The data comes from a standard dataset created for the BioNLP 2009 Shared Task. The system uses a machine-learning approach, where the features used for classification are a combination of shallow features derived from the words of the sentences and more complex features based on the semantic outputs produced by a deep parser. To detect event modification, we use a Maximum Entropy learner with features extracted from the data relative to the trigger words of the events. The shallow features are bag-of-words features based on a small sliding context window of 3-4 tokens on either side of the trigger word. The deep parser features are derived from parses produced by the English Resource Grammar and the RASP parser. The outputs of these parsers are converted into the Minimal Recursion Semantics formalism, and from this, we extract features motivated by linguistics and the data itself. All of these features are combined to create training or test data for the machine learning algorithm. Over the test data, our methods produce approximately a 4% absolute increase in F-score for detection of event modification compared to a baseline based only on the shallow bag-of-words features. Our results indicate that grammar-based techniques can enhance the accuracy of methods for detecting event modification.
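    The shallow part of the feature set described above is easy to sketch: bag-of-words features from a small sliding window around the event trigger. The window size and feature naming below are illustrative, not the paper's exact configuration:

```python
# Bag-of-words features from a sliding context window of `size` tokens on
# either side of the trigger word (the trigger itself is excluded).
def window_features(tokens, trigger_index, size=3):
    lo = max(0, trigger_index - size)
    hi = min(len(tokens), trigger_index + size + 1)
    return {f"bow={tokens[i]}" for i in range(lo, hi) if i != trigger_index}

tokens = "analysis of IkappaBalpha phosphorylation in T cells".split()
feats = window_features(tokens, tokens.index("phosphorylation"))
```

These shallow features would then be combined with the deep-parser-derived features before training the Maximum Entropy learner.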

  15. [The role of animacy in European Portuguese relative clause attachment: evidence from production and comprehension tasks].

    PubMed

    Soares, Ana Paula; Fraga, Isabel; Comesaña, Montserrat; Piñeiro, Ana

    2010-11-01

    This work presents an analysis of the role of animacy in attachment preferences of relative clauses to complex noun phrases in European Portuguese (EP). The question of how the human parser resolves this kind of syntactic ambiguity has been the focus of extensive research. However, what is known about EP is both limited and puzzling. Additionally, as recent studies have stressed the importance of extra-syntactic variables in this process, two experiments were carried out to assess EP attachment preferences under four animacy conditions: Study 1 used a sentence-completion task, and Study 2 a self-paced reading task. Both studies indicate a significant preference for high attachment in EP. Furthermore, they showed that this preference was modulated by the animacy of the host NPs: if the first host was inanimate and the second one animate, the parser shifted to a low attachment preference. These findings shed light on previous results regarding EP and strengthen the idea that, even in early stages of processing, the parser seems to be sensitive to extra-syntactic information.

  16. Synonym set extraction from the biomedical literature by lexical pattern discovery.

    PubMed

    McCrae, John; Collier, Nigel

    2008-03-24

    Although there are a large number of thesauri for the biomedical domain, many of them lack coverage of terms and their variant forms. Automatic thesaurus construction based on patterns was first suggested by Hearst [1], but it is still not clear how to automatically construct such patterns for different semantic relations and domains. In particular, it is not certain which patterns are useful for capturing synonymy. The assumption of extant resources such as parsers is also a limiting factor for many languages, so it is desirable to find patterns that do not require syntactic analysis. Finally, to give a more consistent and applicable result, it is desirable to use these patterns to form synonym sets in a sound way. We present a method that automatically generates regular-expression patterns by expanding seed patterns in a heuristic search, and then develops a feature vector based on the occurrence of term pairs in each developed pattern. This allows a binary classification of term pairs as synonymous or non-synonymous. We then model this result as a probability graph to find synonym sets, which is equivalent to the well-studied problem of finding an optimal set cover. We achieved 73.2% precision and 29.7% recall with our method, outperforming hand-made resources such as MeSH and Wikipedia. We conclude that automatic methods can play a practical role in developing new thesauri or expanding existing ones, with only a small amount of training data and no need for resources such as parsers. We also conclude that accuracy can be improved by grouping into synonym sets.
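    The set-cover step at the end admits a standard greedy approximation: repeatedly pick the candidate synonym set covering the most still-uncovered term pairs. This sketch is illustrative; the paper frames the grouping probabilistically, and the term pairs below are invented:

```python
# Greedy set cover: at each step, choose the candidate set that covers the
# largest number of still-uncovered classified-synonymous term pairs.
def greedy_cover(universe, candidates):
    uncovered, chosen = set(universe), []
    while uncovered:
        best = max(candidates, key=lambda s: len(uncovered & s))
        if not uncovered & best:
            break                # remaining pairs cannot be covered
        chosen.append(best)
        uncovered -= best
    return chosen

pairs = {("tumour", "tumor"), ("cancer", "carcinoma"), ("tumor", "neoplasm")}
sets = [frozenset({("tumour", "tumor"), ("tumor", "neoplasm")}),
        frozenset({("cancer", "carcinoma")})]
cover = greedy_cover(pairs, sets)
```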

  17. SAGA: A project to automate the management of software production systems

    NASA Technical Reports Server (NTRS)

    Campbell, R. H.

    1983-01-01

    The current work in progress on the SAGA project is described. The highlights of this research are: a parser-independent SAGA editor; a design for the screen-editing facilities of the editor; delivery to NASA of release 1 of Olorin, the SAGA parser generator; personal workstation environment research; release 1 of the SAGA symbol table manager; delta generation in SAGA; requirements for a proof management system; documentation for and testing of the Cyber Pascal make prototype; a prototype Cyber-based slicing facility; a June 1984 demonstration plan; SAGA utility programs; a summary of UNIX software engineering support; and a theorem prover review.

  18. Detecting modification of biomedical events using a deep parsing approach

    PubMed Central

    2012-01-01

    Background This work describes a system for identifying event mentions in bio-molecular research abstracts that are either speculative (e.g. analysis of IkappaBalpha phosphorylation, where it is not specified whether phosphorylation did or did not occur) or negated (e.g. inhibition of IkappaBalpha phosphorylation, where phosphorylation did not occur). The data comes from a standard dataset created for the BioNLP 2009 Shared Task. The system uses a machine-learning approach, where the features used for classification are a combination of shallow features derived from the words of the sentences and more complex features based on the semantic outputs produced by a deep parser. Method To detect event modification, we use a Maximum Entropy learner with features extracted from the data relative to the trigger words of the events. The shallow features are bag-of-words features based on a small sliding context window of 3-4 tokens on either side of the trigger word. The deep parser features are derived from parses produced by the English Resource Grammar and the RASP parser. The outputs of these parsers are converted into the Minimal Recursion Semantics formalism, and from this, we extract features motivated by linguistics and the data itself. All of these features are combined to create training or test data for the machine learning algorithm. Results Over the test data, our methods produce approximately a 4% absolute increase in F-score for detection of event modification compared to a baseline based only on the shallow bag-of-words features. Conclusions Our results indicate that grammar-based techniques can enhance the accuracy of methods for detecting event modification. PMID:22595089

  19. Archetype Model-Driven Development Framework for EHR Web System

    PubMed Central

    Kimura, Eizen; Ishihara, Ken

    2013-01-01

    Objectives This article describes the Web application framework for Electronic Health Records (EHRs) we have developed to reduce construction costs for EHR systems. Methods The openEHR project has developed a clinical model-driven architecture for future-proof, interoperable EHR systems. This project provides the specifications to standardize clinical domain model implementations, upon which the ISO/CEN 13606 standards are based. The reference implementation has been formally described in Eiffel; C# and Java implementations have also been developed as references. Although scripting languages have become more popular in recent years because of their higher efficiency and faster development cycles, they had not been used in openEHR implementations. Since 2007, we have used the Ruby language and Ruby on Rails (RoR) as an agile development platform to implement EHR systems in conformity with the openEHR specifications. Results We implemented almost all of the specifications, an Archetype Definition Language parser, and an RoR scaffold generator from archetypes. Although some problems emerged, most of them have been resolved. Conclusions We have provided an agile EHR Web framework that can build up Web systems from archetype models using RoR. The feasibility of the archetype model to provide semantic interoperability of EHRs has been demonstrated, and we have verified that it is suitable for the construction of EHR systems. PMID:24523991

  20. Transformation as a Design Process and Runtime Architecture for High Integrity Software

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bespalko, S.J.; Winter, V.L.

    1999-04-05

    We have discussed two aspects of creating high integrity software that greatly benefit from the availability of transformation technology, which in this case is manifested by the requirement for a sophisticated backtracking parser. First, because of the potential for correctly manipulating programs via small changes, an automated non-procedural transformation system can be a valuable tool for constructing high assurance software. Second, modeling the process of translating data into information as a (perhaps context-dependent) grammar leads to an efficient, compact implementation. From a practical perspective, the transformation process should begin in the domain language in which a problem is initially expressed. Thus, in order for a transformation system to be practical, it must be flexible with respect to domain-specific languages. We have argued that transformation applied to specification results in a highly reliable system. We also attempted to briefly demonstrate that transformation technology applied to the runtime environment will result in a safe and secure system. We thus believe that sophisticated multi-lookahead backtracking parsing technology is central to the task of demonstrating the existence of high integrity software.

  1. Syntactic Prediction in Language Comprehension: Evidence From Either…or

    PubMed Central

    Staub, Adrian; Clifton, Charles

    2006-01-01

    Readers’ eye movements were monitored as they read sentences in which two noun phrases or two independent clauses were connected by the word or (NP-coordination and S-coordination, respectively). The word either could be present or absent earlier in the sentence. When either was present, the material immediately following or was read more quickly, across both sentence types. In addition, there was evidence that readers misanalyzed the S-coordination structure as an NP-coordination structure only when either was absent. The authors interpret the results as indicating that the word either enabled readers to predict the arrival of a coordination structure; this predictive activation facilitated processing of this structure when it ultimately arrived, and in the case of S-coordination sentences, enabled readers to avoid the incorrect NP-coordination analysis. The authors argue that these results support parsing theories according to which the parser can build predictable syntactic structure before encountering the corresponding lexical input. PMID:16569157

  2. Microsoft Biology Initiative: .NET Bioinformatics Platform and Tools

    PubMed Central

    Diaz Acosta, B.

    2011-01-01

    The Microsoft Biology Initiative (MBI) is an effort in Microsoft Research to bring new technology and tools to the area of bioinformatics and biology. This initiative comprises two primary components, the Microsoft Biology Foundation (MBF) and the Microsoft Biology Tools (MBT). MBF is a language-neutral bioinformatics toolkit built as an extension to the Microsoft .NET Framework—initially aimed at the area of Genomics research. Currently, it implements a range of parsers for common bioinformatics file formats; a range of algorithms for manipulating DNA, RNA, and protein sequences; and a set of connectors to biological web services such as NCBI BLAST. MBF is available under an open source license, and executables, source code, demo applications, documentation and training materials are freely downloadable from http://research.microsoft.com/bio. MBT is a collection of tools that enable biology and bioinformatics researchers to be more productive in making scientific discoveries.

  3. Model-based object classification using unification grammars and abstract representations

    NASA Astrophysics Data System (ADS)

    Liburdy, Kathleen A.; Schalkoff, Robert J.

    1993-04-01

    The design and implementation of a high level computer vision system which performs object classification is described. General object labelling and functional analysis require models of classes which display a wide range of geometric variations. A large representational gap exists between abstract criteria such as `graspable' and current geometric image descriptions. The vision system developed and described in this work addresses this problem and implements solutions based on a fusion of semantics, unification, and formal language theory. Object models are represented using unification grammars, which provide a framework for the integration of structure and semantics. A methodology for the derivation of symbolic image descriptions capable of interacting with the grammar-based models is described and implemented. A unification-based parser developed for this system achieves object classification by determining if the symbolic image description can be unified with the abstract criteria of an object model. Future research directions are indicated.

  4. The parser doesn't ignore intransitivity, after all

    PubMed Central

    Staub, Adrian

    2015-01-01

    Several previous studies (Adams, Clifton, & Mitchell, 1998; Mitchell, 1987; van Gompel & Pickering, 2001) have explored the question of whether the parser initially analyzes a noun phrase that follows an intransitive verb as the verb's direct object. Three eyetracking experiments examined this issue in more detail. Experiment 1 strongly replicated the finding (van Gompel & Pickering, 2001) that readers experience difficulty on this noun phrase in normal reading, and found that this difficulty occurs even with a class of intransitive verbs for which a direct object is categorically prohibited. Experiment 2, however, demonstrated that this effect is not due to syntactic misanalysis, but is instead due to disruption that occurs when a comma is absent at a subordinate clause/main clause boundary. Exploring a different construction, Experiment 3 replicated the finding (Pickering & Traxler, 2003; Traxler & Pickering, 1996) that when a noun phrase “filler” is an implausible direct object for an optionally transitive relative clause verb, processing difficulty results; however, there was no evidence for such difficulty when the relative clause verb was strictly intransitive. Taken together, the three experiments undermine the support for the claim that the parser initially ignores a verb's subcategorization restrictions. PMID:17470005

  5. How Architecture-Driven Modernization Is Changing the Game in Information System Modernization

    DTIC Science & Technology

    2010-04-01

    (Briefing-chart table fragments; recoverable details:) …Health Administration: MUMPS to Java, 300K, 4 mo. State of OR Employee Retirement System: COBOL to C#/.NET, 250K, 4 mo. Civilian: State of WA Off. of Super of… Source languages handled include Jovial, MUMPS, MagnaX, Natural, PVL, PowerBuilder, SQL, VAX Basic, VB6, and others; the "To Be" target system uses C and C#. A new JANUS(TM) MUMPS parser was created and successfully implemented in 4 months, along with final "To-Be" documentation and a JANUS rules engine.

  6. The Mystro system: A comprehensive translator toolkit

    NASA Technical Reports Server (NTRS)

    Collins, W. R.; Noonan, R. E.

    1985-01-01

    Mystro is a system that facilitates the construction of compilers, assemblers, code generators, query interpreters, and similar programs. It provides features to encourage the use of iterative enhancement. Mystro was developed in response to the needs of NASA Langley Research Center (LaRC) and enjoys a number of advantages over similar systems. There are other programs available that can be used in building translators. These typically build parser tables, usually supply the source of a parser and parts of a lexical analyzer, but provide little or no aid for code generation. In general, only the front end of the compiler is addressed. Mystro, on the other hand, emphasizes tools for both ends of a compiler.

  7. Parsley: a Command-Line Parser for Astronomical Applications

    NASA Astrophysics Data System (ADS)

    Deich, William

    Parsley is a sophisticated keyword + value parser, packaged as a library of routines that offers an easy method for providing command-line arguments to programs. It makes it easy for the user to enter values, and it makes it easy for the programmer to collect and validate the user's entries. Parsley is tuned for astronomical applications: for example, dates entered in Julian, Modified Julian, calendar, or several other formats are all recognized without special effort by the user or by the programmer; angles can be entered using decimal degrees or dd:mm:ss; time-like intervals as decimal hours, hh:mm:ss, or a variety of other units. Vectors of data are accepted as readily as scalars.
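    The kind of multi-format value handling Parsley performs can be illustrated with a short sketch. This is Python and not Parsley's actual API; the `parse_angle` helper is a hypothetical example of accepting an angle as either decimal degrees or dd:mm:ss.

    ```python
    import re

    def parse_angle(text):
        """Accept an angle as decimal degrees ('45.25') or sexagesimal
        dd:mm:ss ('12:30:00'); return the value in decimal degrees."""
        m = re.fullmatch(r"([+-]?)(\d+):(\d+):(\d+(?:\.\d+)?)", text.strip())
        if m:
            sign = -1.0 if m.group(1) == "-" else 1.0
            deg, mins, secs = (float(g) for g in m.groups()[1:])
            return sign * (deg + mins / 60.0 + secs / 3600.0)
        return float(text)  # fall back to plain decimal degrees

    print(parse_angle("12:30:00"))  # 12.5
    print(parse_angle("-0:30:00"))  # -0.5
    ```

    The point of this design is that neither the user nor the programmer has to declare which format will be used: the parser recognizes the format from the value itself.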

  8. Aural mapping of STEM concepts using literature mining

    NASA Astrophysics Data System (ADS)

    Bharadwaj, Venkatesh

    Recent technological advances have made people's lives heavily dependent on Science, Technology, Engineering, and Mathematics (STEM) and its applications. Understanding basic science is essential in order to use and contribute to this technological revolution. Science education at the middle and high school levels, however, depends heavily on visual representations such as models, diagrams, figures, animations and presentations. This leaves visually impaired students with very few options to learn science and secure a career in STEM-related areas. Recent experiments have shown that small aural cues called audemes are helpful in the understanding and memorization of science concepts among visually impaired students. Audemes are non-verbal sound translations of a science concept. In order to present science concepts as audemes for visually impaired students, this thesis presents an automatic system for audeme generation from STEM textbooks. It describes the systematic application of multiple Natural Language Processing tools and techniques, such as a dependency parser, a POS tagger, an information retrieval algorithm, semantic mapping of aural words, and machine learning, to transform a science concept into a combination of atomic sounds, thus forming an audeme. We present a rule-based classification method for all STEM-related concepts. This work also presents a novel way of mapping and extracting the sounds most closely related to the words used in a textbook. Additionally, machine learning methods are used in the system to tailor the output to a user's perception. The system presented is robust, scalable, fully automatic, and dynamically adaptable for audeme generation.

  9. Use of General-purpose Negation Detection to Augment Concept Indexing of Medical Documents

    PubMed Central

    Mutalik, Pradeep G.; Deshpande, Aniruddha; Nadkarni, Prakash M.

    2001-01-01

    Objectives: To test the hypothesis that most instances of negated concepts in dictated medical documents can be detected by a strategy that relies on tools developed for the parsing of formal (computer) languages—specifically, a lexical scanner (“lexer”) that uses regular expressions to generate a finite state machine, and a parser that relies on a restricted subset of context-free grammars, known as LALR(1) grammars. Methods: A diverse training set of 40 medical documents from a variety of specialties was manually inspected and used to develop a program (Negfinder) that contained rules to recognize a large set of negated patterns occurring in the text. Negfinder's lexer and parser were developed using tools normally used to generate programming language compilers. The input to Negfinder consisted of medical narrative that was preprocessed to recognize UMLS concepts: the text of a recognized concept had been replaced with a coded representation that included its UMLS concept ID. The program generated an index with one entry per instance of a concept in the document, where the presence or absence of negation of that concept was recorded. This information was used to mark up the text of each document by color-coding it to make it easier to inspect. The parser was then evaluated in two ways: 1) a test set of 60 documents (30 discharge summaries, 30 surgical notes) marked up by Negfinder was inspected visually to quantify false-positive and false-negative results; and 2) a different test set of 10 documents was independently examined for negatives by a human observer and by Negfinder, and the results were compared. Results: In the first evaluation using marked-up documents, 8,358 instances of UMLS concepts were detected in the 60 documents, of which 544 were negations detected by the program and verified by human observation (true-positive results, or TPs). 
Thirteen instances were wrongly flagged as negated (false-positive results, or FPs), and the program missed 27 instances of negation (false-negative results, or FNs), yielding a sensitivity of 95.3 percent and a specificity of 97.7 percent. In the second evaluation using independent negation detection, 1,869 concepts were detected in 10 documents, with 135 TPs, 12 FPs, and 6 FNs, yielding a sensitivity of 95.7 percent and a specificity of 91.8 percent. One of the words “no,” “denies/denied,” “not,” or “without” was present in 92.5 percent of all negations. Conclusions: Negation of most concepts in medical narrative can be reliably detected by a simple strategy. The reliability of detection depends on several factors, the most important being the accuracy of concept matching. PMID:11687566
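    The finding that four cue words cover 92.5 percent of all negations suggests that a much simpler baseline than Negfinder's LALR(1) grammar already captures most cases. The following Python sketch is illustrative only: the `flag_negated` helper and its fixed token window are assumptions, not Negfinder's actual design, which uses grammar-based scope rules.

    ```python
    import re

    # The cue words reported to cover 92.5% of negations in the study.
    CUES = re.compile(r"no|not|denies|denied|without", re.IGNORECASE)

    def flag_negated(sentence, concepts, window=3):
        """Flag each concept as negated if a cue word occurs at most
        `window` tokens before it. A crude stand-in for Negfinder's
        grammar-based scope handling, for illustration only."""
        tokens = sentence.lower().split()
        cue_at = [i for i, t in enumerate(tokens) if CUES.fullmatch(t)]
        flags = {}
        for concept in concepts:
            if concept.lower() in tokens:
                pos = tokens.index(concept.lower())
                flags[concept] = any(0 < pos - c <= window for c in cue_at)
            else:
                flags[concept] = False
        return flags

    print(flag_negated("The patient denies fever and reports cough", ["fever", "cough"]))
    # {'fever': True, 'cough': False}
    ```

    A window-based heuristic like this misses long-range scopes and double negations, which is precisely what the grammar-based approach in the paper is designed to handle.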

  10. Recognition of speaker-dependent continuous speech with KEAL

    NASA Astrophysics Data System (ADS)

    Mercier, G.; Bigorgne, D.; Miclet, L.; Le Guennec, L.; Querre, M.

    1989-04-01

    A description of the speaker-dependent continuous speech recognition system KEAL is given. An unknown utterance is recognized by means of the following procedures: acoustic analysis, phonetic segmentation and identification, and word and sentence analysis. The combination of feature-based, speaker-independent coarse phonetic segmentation with speaker-dependent statistical classification techniques is one of the main design features of the acoustic-phonetic decoder. The lexical access component is essentially based on a statistical dynamic programming technique which aims at matching a phonemic lexical entry, containing various phonological forms, against a phonetic lattice. Sentence recognition is achieved by use of a context-free grammar and a parsing algorithm derived from Earley's parser. A speaker adaptation module allows some of the system parameters to be adjusted by matching known utterances with their acoustical representation. The task to be performed, described by its vocabulary and its grammar, is given as a parameter of the system. Continuously spoken sentences extracted from a 'pseudo-Logo' language are analyzed and results are presented.
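    KEAL's sentence recognizer derives from Earley's parser. A minimal Earley recognizer, shown here as a self-contained Python sketch (not KEAL's implementation; the toy grammar is invented), makes the predict/scan/complete structure of the algorithm concrete:

    ```python
    def earley_recognize(grammar, start, tokens):
        """Earley recognizer: returns True iff `tokens` derives from `start`.
        `grammar` maps each nonterminal to a list of right-hand sides
        (tuples of symbols); a symbol is a nonterminal iff it is a key."""
        # A chart item is (lhs, rhs, dot, origin).
        chart = [set() for _ in range(len(tokens) + 1)]
        for rhs in grammar[start]:
            chart[0].add((start, rhs, 0, 0))
        for i in range(len(tokens) + 1):
            changed = True
            while changed:                       # iterate chart i to a fixed point
                changed = False
                for lhs, rhs, dot, origin in list(chart[i]):
                    if dot < len(rhs):
                        sym = rhs[dot]
                        if sym in grammar:       # predict: expand a nonterminal
                            for alt in grammar[sym]:
                                item = (sym, alt, 0, i)
                                if item not in chart[i]:
                                    chart[i].add(item)
                                    changed = True
                        elif i < len(tokens) and tokens[i] == sym:   # scan a terminal
                            chart[i + 1].add((lhs, rhs, dot + 1, origin))
                    else:                        # complete: advance waiting items
                        for plhs, prhs, pdot, porig in list(chart[origin]):
                            if pdot < len(prhs) and prhs[pdot] == lhs:
                                item = (plhs, prhs, pdot + 1, porig)
                                if item not in chart[i]:
                                    chart[i].add(item)
                                    changed = True
        return any((start, rhs, len(rhs), 0) in chart[len(tokens)]
                   for rhs in grammar[start])

    # A toy command grammar in the spirit of a 'pseudo-Logo' task language:
    grammar = {"S": [("move", "N"), ("turn", "N")], "N": [("10",), ("20",)]}
    print(earley_recognize(grammar, "S", ["move", "10"]))  # True
    print(earley_recognize(grammar, "S", ["move", "30"]))  # False
    ```

    Because the grammar is a runtime parameter, the same recognizer handles any task vocabulary, mirroring KEAL's design of taking the task grammar as a system parameter.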

  11. DICOM index tracker enterprise: advanced system for enterprise-wide quality assurance and patient safety monitoring

    NASA Astrophysics Data System (ADS)

    Zhang, Min; Pavlicek, William; Panda, Anshuman; Langer, Steve G.; Morin, Richard; Fetterly, Kenneth A.; Paden, Robert; Hanson, James; Wu, Lin-Wei; Wu, Teresa

    2015-03-01

    DICOM Index Tracker (DIT) is an integrated platform to harvest the rich information available from Digital Imaging and Communications in Medicine (DICOM) to improve quality assurance in radiology practices. It is designed to capture and maintain longitudinal patient-specific exam indices of interest for all diagnostic and procedural uses of imaging modalities. Thus, it effectively serves as a quality assurance and patient safety monitoring tool. The foundation of DIT is an intelligent database system which stores the information accepted and parsed via a DICOM receiver and parser. The database system enables basic dosimetry analysis. The success of the DIT implementation at Mayo Clinic Arizona calls for DIT deployment at the enterprise level, which requires significant improvements. First, for a geographically distributed multi-site implementation, one bottleneck is communication (network) delay; another is the scalability of the DICOM parser to handle the large volume of exams from different sites. To address this issue, the DICOM receiver and parser are separated and decentralized by site. Second, to facilitate enterprise-wide quality assurance (QA), a notable challenge is the great diversity of manufacturers, modalities, and software versions; as a solution, DIT Enterprise provides standardization tools for device naming, protocol naming, and physician naming across sites. Third, advanced analytic engines are implemented online to support proactive QA in DIT Enterprise.

  12. The power and limits of a rule-based morpho-semantic parser.

    PubMed Central

    Baud, R. H.; Rassinoux, A. M.; Ruch, P.; Lovis, C.; Scherrer, J. R.

    1999-01-01

    The advent of the Electronic Patient Record (EPR) implies an increasing amount of medical texts readily available for processing, as soon as convenient tools are made available. The chief application is text analysis, from which one can derive other disciplines like indexing for retrieval, knowledge representation, translation and inferencing for medical intelligent systems. Prerequisites for a convenient analyzer of medical texts are: building the lexicon, developing a semantic representation of the domain, having a large corpus of texts available for statistical analysis, and finally mastering robust and powerful parsing techniques in order to satisfy the constraints of the medical domain. This article aims at presenting an easy-to-use parser ready to be adapted in different settings. It describes its power together with its practical limitations as experienced by the authors. PMID:10566313

  13. The power and limits of a rule-based morpho-semantic parser.

    PubMed

    Baud, R H; Rassinoux, A M; Ruch, P; Lovis, C; Scherrer, J R

    1999-01-01

    The advent of the Electronic Patient Record (EPR) implies an increasing amount of medical texts readily available for processing, as soon as convenient tools are made available. The chief application is text analysis, from which one can derive other disciplines like indexing for retrieval, knowledge representation, translation and inferencing for medical intelligent systems. Prerequisites for a convenient analyzer of medical texts are: building the lexicon, developing a semantic representation of the domain, having a large corpus of texts available for statistical analysis, and finally mastering robust and powerful parsing techniques in order to satisfy the constraints of the medical domain. This article aims at presenting an easy-to-use parser ready to be adapted in different settings. It describes its power together with its practical limitations as experienced by the authors.

  14. A search engine to access PubMed monolingual subsets: proof of concept and evaluation in French.

    PubMed

    Griffon, Nicolas; Schuers, Matthieu; Soualmia, Lina Fatima; Grosjean, Julien; Kerdelhué, Gaétan; Kergourlay, Ivan; Dahamna, Badisse; Darmoni, Stéfan Jacques

    2014-12-01

    PubMed contains numerous articles in languages other than English. However, existing solutions to access these articles in the language in which they were written remain unconvincing. The aim of this study was to propose a practical search engine, called Multilingual PubMed, which will permit access to a PubMed subset in 1 language and to evaluate the precision and coverage for the French version (Multilingual PubMed-French). To create this tool, translations of MeSH were enriched (eg, adding synonyms and translations in French) and integrated into a terminology portal. PubMed subsets in several European languages were also added to our database using a dedicated parser. The response time for the generic semantic search engine was evaluated for simple queries. BabelMeSH, Multilingual PubMed-French, and 3 different PubMed strategies were compared by searching for literature in French. Precision and coverage were measured for 20 randomly selected queries. The results were evaluated as relevant to title and abstract, the evaluator being blind to search strategy. More than 650,000 PubMed citations in French were integrated into the Multilingual PubMed-French information system. The response times were all below the threshold defined for usability (2 seconds). Two search strategies (Multilingual PubMed-French and 1 PubMed strategy) showed high precision (0.93 and 0.97, respectively), but coverage was 4 times higher for Multilingual PubMed-French. It is now possible to freely access biomedical literature using a practical search tool in French. This tool will be of particular interest for health professionals and other end users who do not read or query sufficiently in English. The information system is theoretically well suited to expand the approach to other European languages, such as German, Spanish, Norwegian, and Portuguese.

  15. A Search Engine to Access PubMed Monolingual Subsets: Proof of Concept and Evaluation in French

    PubMed Central

    Schuers, Matthieu; Soualmia, Lina Fatima; Grosjean, Julien; Kerdelhué, Gaétan; Kergourlay, Ivan; Dahamna, Badisse; Darmoni, Stéfan Jacques

    2014-01-01

    Background PubMed contains numerous articles in languages other than English. However, existing solutions to access these articles in the language in which they were written remain unconvincing. Objective The aim of this study was to propose a practical search engine, called Multilingual PubMed, which will permit access to a PubMed subset in 1 language and to evaluate the precision and coverage for the French version (Multilingual PubMed-French). Methods To create this tool, translations of MeSH were enriched (eg, adding synonyms and translations in French) and integrated into a terminology portal. PubMed subsets in several European languages were also added to our database using a dedicated parser. The response time for the generic semantic search engine was evaluated for simple queries. BabelMeSH, Multilingual PubMed-French, and 3 different PubMed strategies were compared by searching for literature in French. Precision and coverage were measured for 20 randomly selected queries. The results were evaluated as relevant to title and abstract, the evaluator being blind to search strategy. Results More than 650,000 PubMed citations in French were integrated into the Multilingual PubMed-French information system. The response times were all below the threshold defined for usability (2 seconds). Two search strategies (Multilingual PubMed-French and 1 PubMed strategy) showed high precision (0.93 and 0.97, respectively), but coverage was 4 times higher for Multilingual PubMed-French. Conclusions It is now possible to freely access biomedical literature using a practical search tool in French. This tool will be of particular interest for health professionals and other end users who do not read or query sufficiently in English. The information system is theoretically well suited to expand the approach to other European languages, such as German, Spanish, Norwegian, and Portuguese. PMID:25448528

  16. Incremental Refinement of Façade Models with Attribute Grammar from 3D Point Clouds

    NASA Astrophysics Data System (ADS)

    Dehbi, Y.; Staat, C.; Mandtler, L.; Plümer, L.

    2016-06-01

    Data acquisition using unmanned aerial vehicles (UAVs) has received increasing attention over the last years. Especially in the field of building reconstruction, the incremental interpretation of such data is a demanding task. In this context formal grammars play an important role for the top-down identification and reconstruction of building objects. Up to now, the available approaches expect offline data in order to parse an a priori known grammar. For mapping on demand, an on-the-fly reconstruction based on UAV data is required; an incremental interpretation of the data stream is inevitable. This paper presents an incremental parser of grammar rules for automatic 3D building reconstruction. The parser enables model refinement based on new observations with respect to a weighted attribute context-free grammar (WACFG). The falsification or rejection of hypotheses is supported as well. The parser can deal with, and adapt, parse trees acquired from previous interpretations or predictions. Parse trees derived so far are updated in an iterative way using transformation rules. A diagnostic step searches for mismatches between current and new nodes. Prior knowledge on façades is incorporated; it is given by probability densities as well as architectural patterns. Since normal distributions cannot always be assumed, the derivation of location and shape parameters of building objects is based on a kernel density estimation (KDE). While the level of detail is continuously improved, geometrical, semantic and topological consistency is ensured.
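    The kernel density estimation step can be sketched in a few lines of Python. This is illustrative only: the Gaussian kernel, bandwidth, and the sample sill heights are assumptions, not values from the paper.

    ```python
    import math

    def gaussian_kde(samples, x, bandwidth):
        """Kernel density estimate at point x from 1-D samples (Gaussian kernel).
        Avoids assuming any parametric distribution for the observations."""
        kern = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
        return sum(kern((x - s) / bandwidth) for s in samples) / (len(samples) * bandwidth)

    # Hypothetical window-sill heights (metres) observed in a point cloud:
    heights = [1.02, 0.98, 1.01, 0.97, 1.03]
    # The density peaks near the observed cluster rather than at an assumed mean.
    print(gaussian_kde(heights, 1.0, 0.05) > gaussian_kde(heights, 1.5, 0.05))  # True
    ```

    The mode of such an estimate can serve as the location parameter of a building object when a normal distribution cannot be assumed, which is the motivation given in the abstract.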

  17. Neuroanatomical term generation and comparison between two terminologies.

    PubMed

    Srinivas, Prashanti R; Gusfield, Daniel; Mason, Oliver; Gertz, Michael; Hogarth, Michael; Stone, James; Jones, Edward G; Gorin, Fredric A

    2003-01-01

    An approach and software tools are described for identifying and extracting compound terms (CTs), acronyms, and their associated contexts from textual material associated with neuroanatomical atlases. A set of simple syntactic rules was appended to the output of a commercially available part-of-speech (POS) tagger (Qtag v 3.01) to extract CTs and their associated context from the texts of neuroanatomical atlases. This "hybrid" parser appears to be highly sensitive and recognized 96% of the potentially germane neuroanatomical CTs and acronyms present in the cat and primate thalamic atlases. A comparison of neuroanatomical CTs and acronyms between the cat and primate atlas texts was initially performed using exact-term matching. The implementation of string-matching algorithms significantly improved the identification of relevant terms and acronyms between the two domains. The End Gap Free string matcher identified 98% of CTs and the Needleman-Wunsch (NW) string matcher matched 36% of acronyms between the two atlases. Combining several simple grammatical and lexical rules with the POS tagger (the "hybrid" parser) (1) extracted complex neuroanatomical terms and acronyms from selected cat and primate thalamic atlases and (2) facilitated the semi-automated generation of a highly granular thalamic terminology. The implementation of string-matching algorithms (1) reconciled terminological errors generated by the optical character recognition (OCR) software used to generate the neuroanatomical text and (2) increased the sensitivity of matching neuroanatomical terms and acronyms between the two neuroanatomical domains generated by the "hybrid" parser.
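    The Needleman-Wunsch matcher used for acronyms computes a global alignment score between two strings. A minimal Python sketch follows; the scoring parameters (match +1, mismatch -1, gap -1) are assumptions for illustration, as the paper's settings are not given in the abstract, and the acronyms in the example are hypothetical.

    ```python
    def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
        """Global alignment score between strings a and b (Needleman-Wunsch).
        Higher scores indicate more similar strings, which tolerates
        OCR-style single-character errors better than exact matching."""
        rows, cols = len(a) + 1, len(b) + 1
        score = [[0] * cols for _ in range(rows)]
        for i in range(1, rows):          # aligning a prefix of a against nothing
            score[i][0] = i * gap
        for j in range(1, cols):          # aligning a prefix of b against nothing
            score[0][j] = j * gap
        for i in range(1, rows):
            for j in range(1, cols):
                diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
                score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
        return score[-1][-1]

    # An exact acronym match scores highest; an OCR variant still aligns well.
    print(needleman_wunsch("VPL", "VPL"))   # 3
    print(needleman_wunsch("VPL", "VPLo"))  # 2
    ```

    Exact-term matching would reject the second pair outright; a score threshold on the alignment recovers such near-misses, which is how string matching raised the acronym match rate over exact matching.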

  18. Discovery of Predicate-Oriented Relations among Named Entities Extracted from Thai Texts

    NASA Astrophysics Data System (ADS)

    Tongtep, Nattapong; Theeramunkong, Thanaruk

    Extracting named entities (NEs) and their relations is more difficult in Thai than in other languages due to several Thai-specific characteristics, including no explicit boundaries for words, phrases and sentences; few case markers and modifier clues; high ambiguity in compound words and serial verbs; and flexible word orders. Unlike most previous works, which focused on NE relations of specific actions, such as work_for, live_in, located_in, and kill, this paper proposes a more general type of NE relation, called a predicate-oriented relation (PoR), where an extracted action part (verb) is used as a core component to associate related named entities extracted from Thai texts. Lacking a practical parser for the Thai language, we present three types of surface features, i.e. punctuation marks (such as token spaces), entity types and the number of entities, and then apply five alternative commonly used learning schemes to investigate their performance on predicate-oriented relation extraction. The experimental results show that our approach achieves F-measures of 97.76%, 99.19%, 95.00% and 93.50% on four different types of predicate-oriented relation (action-location, location-action, action-person and person-action) in crime-related news documents using a data set of 1,736 entity pairs. The effects of NE extraction techniques, feature sets and class unbalance on the performance of relation extraction are explored.

  19. FLIP for FLAG model visualization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wooten, Hasani Omar

    A graphical user interface has been developed for FLAG users. FLIP (FLAG Input deck Parser) provides users with an organized view of FLAG models and a means for efficiently and easily navigating and editing nodes, parameters, and variables.

  20. Python/Lua Benchmarks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Busby, L.

    This is an adaptation of the pre-existing Scimark benchmark code to a variety of Python and Lua implementations. It also measures performance of the Fparser expression parser and C and C++ code on a variety of simple scientific expressions.

  1. Fortran for the nineties

    NASA Technical Reports Server (NTRS)

    Himer, J. T.

    1992-01-01

    Fortran has largely enjoyed prominence for the past few decades as the computer programming language of choice for numerically intensive scientific, engineering, and process control applications. Fortran's well understood static language syntax has allowed resulting parsers and compiler optimizing technologies to often generate among the most efficient and fastest run-time executables, particularly on high-end scalar and vector supercomputers. Computing architectures and paradigms have changed considerably since the last ANSI/ISO Fortran release in 1978, and while FORTRAN 77 has more than survived, its aged features provide only partial functionality for today's demanding computing environments. The simple block procedural languages have been necessarily evolving, or giving way, to specialized supercomputing, network resource, and object-oriented paradigms. To address these new computing demands, ANSI has worked for the last 12 years, with three international public reviews, to deliver Fortran 90. Fortran 90 has superseded and replaced ISO FORTRAN 77 internationally as the sole Fortran standard; while in the US, Fortran 90 is expected to be adopted as the ANSI standard this summer, coexisting with ANSI FORTRAN 77 until at least 1996. The development path and current state of Fortran will be briefly described, highlighting the many new Fortran 90 syntactic and semantic additions which support (among others): free form source; array syntax; new control structures; modules and interfaces; pointers; derived data types; dynamic memory; enhanced I/O; operator overloading; data abstraction; user optional arguments; new intrinsics for array, bit manipulation, and system inquiry; and enhanced portability through better generic control of underlying system arithmetic models. 
Examples from dynamical astronomy, signal and image processing will attempt to illustrate Fortran 90's applicability to today's general scalar, vector, and parallel scientific and engineering requirements and object oriented programming paradigms. Time permitting, current work proceeding on the future development of Fortran 2000 and collateral standards will be introduced.

  2. Vulnerabilities in Bytecode Removed by Analysis, Nuanced Confinement and Diversification (VIBRANCE)

    DTIC Science & Technology

    2015-06-01

    The VIBRANCE tool starts with a vulnerable Java application and automatically hardens it against SQL injection, OS command injection, file path traversal… (Only table-of-contents fragments survive extraction: 2.2 Java Front End; 2.2.2 Java Byte Code Parser.)

  3. Specification, Design, and Analysis of Advanced HUMS Architectures

    NASA Technical Reports Server (NTRS)

    Mukkamala, Ravi

    2004-01-01

    During the two-year project period, we have worked on several aspects of domain-specific architectures for HUMS. In particular, we looked at using scenario-based approach for the design and designed a language for describing such architectures. The language is now being used in all aspects of our HUMS design. In particular, we have made contributions in the following areas. 1) We have employed scenarios in the development of HUMS in three main areas. They are: (a) To improve reusability by using scenarios as a library indexing tool and as a domain analysis tool; (b) To improve maintainability by recording design rationales from two perspectives - problem domain and solution domain; (c) To evaluate the software architecture. 2) We have defined a new architectural language called HADL or HUMS Architectural Definition Language. It is a customized version of xArch/xADL. It is based on XML and, hence, is easily portable from domain to domain, application to application, and machine to machine. Specifications written in HADL can be easily read and parsed using the currently available XML parsers. Thus, there is no need to develop a plethora of software to support HADL. 3) We have developed an automated design process that involves two main techniques: (a) Selection of solutions from a large space of designs; (b) Synthesis of designs. However, the automation process is not an absolute Artificial Intelligence (AI) approach though it uses a knowledge-based system that epitomizes a specific HUMS domain. The process uses a database of solutions as an aid to solve the problems rather than creating a new design in the literal sense. 
    Since searching is adopted as the main technique, the challenges involved are: (a) to minimize the effort in searching the database, where a very large number of possibilities exist; (b) to develop representations that could conveniently allow us to depict design knowledge evolved over many years; and (c) to capture the required information that aids the automation process.

  4. chemf: A purely functional chemistry toolkit.

    PubMed

    Höck, Stefan; Riedl, Rainer

    2012-12-20

    Although programming in a type-safe and referentially transparent style offers several advantages over working with mutable data structures and side effects, this style of programming has not seen much use in chemistry-related software. Since functional programming languages were designed with referential transparency in mind, these languages offer a lot of support when writing immutable data structures and side-effect-free code. We therefore started implementing our own toolkit based on the above programming paradigms in a modern, versatile programming language. We present our initial results with functional programming in chemistry by first describing an immutable data structure for molecular graphs together with a couple of simple algorithms to calculate basic molecular properties before writing a complete SMILES parser in accordance with the OpenSMILES specification. Along the way we show how to deal with input validation, error handling, bulk operations, and parallelization in a purely functional way. At the end we also analyze and improve our algorithms and data structures in terms of performance and compare them to existing toolkits, both object-oriented and purely functional. All code was written in Scala, a modern multi-paradigm programming language with strong support for functional programming and a highly sophisticated type system. We have successfully made the first important steps towards a purely functional chemistry toolkit. The data structures and algorithms presented in this article perform well while at the same time they can be safely used in parallelized applications, such as computer-aided drug design experiments, without further adjustments. This stands in contrast to existing object-oriented toolkits, where thread safety of data structures and algorithms is a deliberate design decision that can be hard to implement. 
Finally, the level of type safety achieved by Scala greatly increased the reliability of our code as well as the productivity of the programmers involved in this project.
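    chemf's SMILES parser is written in Scala; as a language-neutral illustration of the tokenization step, here is a minimal Python sketch that recognizes bracket atoms, the organic subset, and aromatic atoms as described in the OpenSMILES specification. It counts atoms only and is not a full parser; the `atom_count` helper is an invented example, not chemf's API.

    ```python
    import re

    # Organic-subset atoms that may appear without brackets in SMILES
    # (two-letter symbols must be matched before one-letter symbols).
    ORGANIC = ["Cl", "Br", "B", "C", "N", "O", "P", "S", "F", "I"]
    TOKEN = re.compile(r"Cl|Br|\[[^\]]+\]|[BCNOPSFI]|[bcnops]|[-=#:/\\()]|\d|%\d\d")

    def atom_count(smiles):
        """Count atoms in a SMILES string: bracket atoms, organic-subset
        atoms, and aromatic (lowercase) atoms. Bonds, ring-closure digits
        and branch parentheses are tokenized but not counted."""
        count = 0
        for tok in TOKEN.findall(smiles):
            if tok.startswith("[") or tok in ORGANIC or tok in "bcnops":
                count += 1
        return count

    print(atom_count("CC(=O)O"))   # acetic acid: 4 heavy atoms
    print(atom_count("c1ccccc1"))  # benzene: 6 aromatic carbons
    ```

    Even this fragment shows why SMILES tokenization rewards careful ordering of rules: `Cl` must be tried before `C`, exactly the kind of case a specification-driven parser handles systematically.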

  5. chemf: A purely functional chemistry toolkit

    PubMed Central

    2012-01-01

    Background Although programming in a type-safe and referentially transparent style offers several advantages over working with mutable data structures and side effects, this style of programming has not seen much use in chemistry-related software. Since functional programming languages were designed with referential transparency in mind, these languages offer a lot of support when writing immutable data structures and side-effect-free code. We therefore started implementing our own toolkit based on the above programming paradigms in a modern, versatile programming language. Results We present our initial results with functional programming in chemistry by first describing an immutable data structure for molecular graphs together with a couple of simple algorithms to calculate basic molecular properties before writing a complete SMILES parser in accordance with the OpenSMILES specification. Along the way we show how to deal with input validation, error handling, bulk operations, and parallelization in a purely functional way. At the end we also analyze and improve our algorithms and data structures in terms of performance and compare them to existing toolkits, both object-oriented and purely functional. All code was written in Scala, a modern multi-paradigm programming language with strong support for functional programming and a highly sophisticated type system. Conclusions We have successfully made the first important steps towards a purely functional chemistry toolkit. The data structures and algorithms presented in this article perform well while at the same time they can be safely used in parallelized applications, such as computer-aided drug design experiments, without further adjustments. This stands in contrast to existing object-oriented toolkits, where thread safety of data structures and algorithms is a deliberate design decision that can be hard to implement. 
Finally, the level of type-safety achieved by Scala highly increased the reliability of our code as well as the productivity of the programmers involved in this project. PMID:23253942
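The toolkit itself is written in Scala; as a language-neutral sketch of the immutable-molecular-graph idea described above, the following Python fragment models a molecule as frozen data. The `Molecule` class, its fields, and the ethanol example are illustrative assumptions, not chemf's actual API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Molecule:
    """Immutable molecular graph (illustrative, not chemf's API):
    atoms holds one element symbol per atom index; bonds is a
    frozenset of frozenset({i, j}) atom-index pairs."""
    atoms: tuple
    bonds: frozenset

    def degree(self, i):
        """Number of explicit bonds touching atom i."""
        return sum(1 for bond in self.bonds if i in bond)

    def formula(self):
        """Heavy-atom formula string from element counts (hydrogens omitted)."""
        counts = {}
        for element in self.atoms:
            counts[element] = counts.get(element, 0) + 1
        return ''.join(f"{el}{n if n > 1 else ''}" for el, n in sorted(counts.items()))

# Ethanol's heavy-atom skeleton C-C-O; the object cannot be mutated after
# creation, so it can be shared across threads without locks.
ethanol = Molecule(atoms=('C', 'C', 'O'),
                   bonds=frozenset({frozenset({0, 1}), frozenset({1, 2})}))
```

Because every field is frozen, any "modification" must build a new `Molecule` value, which is exactly the property that makes parallel use safe without further adjustment.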

  6. Knowledge Acquisition and Management for the NASA Earth Exchange (NEX)

    NASA Astrophysics Data System (ADS)

    Votava, P.; Michaelis, A.; Nemani, R. R.

    2013-12-01

    NASA Earth Exchange (NEX) is a data, computing and knowledge collaboratory that houses NASA satellite, climate and ancillary data where a focused community can come together to share modeling and analysis codes, scientific results, knowledge and expertise on a centralized platform with access to large supercomputing resources. As more and more projects are being executed on NEX, we are increasingly focusing on capturing the knowledge of the NEX users and providing mechanisms for sharing it with the community in order to facilitate reuse and accelerate research. There are many possible knowledge contributions to NEX: a contribution can be a wiki entry on the NEX portal contributed by a developer, information extracted from a publication in an automated way, or a workflow captured during code execution on the supercomputing platform. The goal of the NEX knowledge platform is to capture and organize this information and make it easily accessible to the NEX community and beyond. The knowledge acquisition process consists of three main facets - data and metadata, workflows and processes, and web-based information. Once the knowledge is acquired, it is processed in a number of ways ranging from custom metadata parsers to entity extraction using natural language processing techniques. The processed information is linked with existing taxonomies and aligned with an internal ontology (which heavily reuses a number of external ontologies). This forms a knowledge graph that can then be used to improve users' search query results as well as provide additional analytics capabilities to the NEX system. Such a knowledge graph will be an important building block in creating a dynamic knowledge base for the NEX community where knowledge is both generated and easily shared.
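As a toy illustration of the kind of knowledge graph the abstract describes, extracted facts can be stored as subject-predicate-object triples and queried. The entity names and helper functions below are hypothetical, not part of NEX:

```python
def add_triple(graph, subject, predicate, obj):
    """Record one (subject, predicate, object) fact, indexed by subject."""
    graph.setdefault(subject, []).append((predicate, obj))
    return graph

def query(graph, subject, predicate=None):
    """Return objects linked from subject, optionally filtered by predicate."""
    return [o for p, o in graph.get(subject, []) if predicate is None or p == predicate]

# Hypothetical facts, as might come from metadata parsing and entity extraction.
kg = {}
add_triple(kg, 'MOD17', 'usesDataset', 'MODIS')
add_triple(kg, 'MOD17', 'producedBy', 'workflow-42')
```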

  7. Dependency-based Siamese long short-term memory network for learning sentence representations

    PubMed Central

    Zhu, Wenhao; Ni, Jianyue; Wei, Baogang; Lu, Zhiguo

    2018-01-01

    Textual representations play an important role in the field of natural language processing (NLP). The efficiency of NLP tasks, such as text comprehension and information extraction, can be significantly improved with proper textual representations. As neural networks are gradually applied to learn the representation of words and phrases, fairly efficient models of learning short text representations have been developed, such as the continuous bag of words (CBOW) and skip-gram models, and they have been extensively employed in a variety of NLP tasks. Because longer texts, such as sentences, have a more complex structure, algorithms appropriate for learning short textual representations are not applicable to learning long textual representations. One method of learning long textual representations is the Long Short-Term Memory (LSTM) network, which is suitable for processing sequences. However, the standard LSTM does not adequately address the primary sentence structure (subject, predicate and object), which is an important factor for producing appropriate sentence representations. To resolve this issue, this paper proposes the dependency-based LSTM model (D-LSTM). The D-LSTM divides a sentence representation into two parts: a basic component and a supporting component. The D-LSTM uses a pre-trained dependency parser to obtain the primary sentence information and generate supporting components, and it also uses a standard LSTM model to generate the basic sentence components. A weight factor that can adjust the ratio of the basic and supporting components in a sentence is introduced to generate the sentence representation. Compared with the representation learned by the standard LSTM, the sentence representation learned by the D-LSTM contains a greater amount of useful information. The experimental results show that the D-LSTM is superior to the standard LSTM on the Sentences Involving Compositional Knowledge (SICK) data. PMID:29513748
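The weight factor described above can be pictured as a simple convex blend of the two component vectors. This sketch is an interpretation of the abstract's description, not the authors' code:

```python
def combine(basic, supporting, weight):
    """Blend the basic (standard-LSTM) component with the supporting
    (dependency-derived) component using one scalar weight factor."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * b + (1 - weight) * s for b, s in zip(basic, supporting)]
```

At `weight=1.0` the representation reduces to the plain LSTM output; lowering the weight mixes in more of the dependency-derived information.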

  8. Wh-filler-gap dependency formation guides reflexive antecedent search

    PubMed Central

    Frazier, Michael; Ackerman, Lauren; Baumann, Peter; Potter, David; Yoshida, Masaya

    2015-01-01

    Prior studies on online sentence processing have shown that the parser can resolve non-local dependencies rapidly and accurately. This study investigates the interaction between the processing of two such non-local dependencies: wh-filler-gap dependencies (WhFGD) and reflexive-antecedent dependencies. We show that reflexive-antecedent dependency resolution is sensitive to the presence of a WhFGD, and argue that the filler-gap dependency established by WhFGD resolution is selected online as the antecedent of a reflexive dependency. We investigate the processing of constructions like (1), where two NPs might be possible antecedents for the reflexive, namely which cowgirl and Mary. Even though Mary is linearly closer to the reflexive, the only grammatically licit antecedent for the reflexive is the more distant wh-NP, which cowgirl. (1). Which cowgirl did Mary expect to have injured herself due to negligence? Four eye-tracking text-reading experiments were conducted on examples like (1), differing in whether the embedded clause was non-finite (1 and 3) or finite (2 and 4), and in whether the tail of the wh-dependency intervened between the reflexive and its closest overt antecedent (1 and 2) or the wh-dependency was associated with a position earlier in the sentence (3 and 4). The results of Experiments 1 and 2 indicate the parser accesses the result of WhFGD formation during reflexive antecedent search. The resolution of a wh-dependency alters the representation that reflexive antecedent search operates over, allowing the grammatical but linearly distant antecedent to be accessed rapidly. In the absence of a long-distance WhFGD (Experiments 3 and 4), wh-NPs were not found to impact reading times of the reflexive, indicating that the parser's ability to select distant wh-NPs as reflexive antecedents crucially involves syntactic structure. PMID:26500579

  9. Construction of a menu-based system

    NASA Technical Reports Server (NTRS)

    Noonan, R. E.; Collins, W. R.

    1985-01-01

    The development of the user interface to a software code management system is discussed. The user interface was specified using a grammar and implemented using an LR parser generator. This was found to be an effective method for the rapid prototyping of a menu-based system.
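The core idea, deriving the menus directly from a grammar specification, can be sketched as follows. The grammar content is invented, and the original system used an LR parser generator rather than a simple lookup table:

```python
# Hypothetical grammar: each nonterminal lists the alternatives the user may
# pick, so the menu for any dialogue state is read straight off the productions.
GRAMMAR = {
    'session': ['checkout', 'commit', 'quit'],
    'checkout': ['checkout <file>'],
    'commit': ['commit <file> <message>'],
}

def menu_for(state):
    """Return the menu entries for a dialogue state, derived from the grammar."""
    return GRAMMAR.get(state, [])
```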

  10. Mention Detection: Heuristics for the OntoNotes Annotations

    DTIC Science & Technology

    2011-01-01

    Mention Detection: Heuristics for the OntoNotes annotations Jonathan K. Kummerfeld, Mohit Bansal, David Burkett and Dan Klein Computer Science...considered the provided parses and parses produced by the Berkeley parser (Petrov et al., 2006) trained on the provided training data. We added a

  11. Natural Language Processing.

    ERIC Educational Resources Information Center

    Chowdhury, Gobinda G.

    2003-01-01

    Discusses issues related to natural language processing, including theoretical developments; natural language understanding; tools and techniques; natural language text processing systems; abstracting; information extraction; information retrieval; interfaces; software; Internet, Web, and digital library applications; machine translation for…

  12. The Effect of Syntactic Constraints on the Processing of Backwards Anaphora

    ERIC Educational Resources Information Center

    Kazanina, Nina; Lau, Ellen F.; Lieberman, Moti; Yoshida, Masaya; Phillips, Colin

    2007-01-01

    This article presents three studies that investigate when syntactic constraints become available during the processing of long-distance backwards pronominal dependencies ("backwards anaphora" or "cataphora"). Earlier work demonstrated that in such structures the parser initiates an active search for an antecedent for a pronoun, leading to gender…

  13. Brain Responses to Filled Gaps

    ERIC Educational Resources Information Center

    Hestvik, Arild; Maxfield, Nathan; Schwartz, Richard G.; Shafer, Valerie

    2007-01-01

    An unresolved issue in the study of sentence comprehension is whether the process of gap-filling is mediated by the construction of empty categories (traces), or whether the parser relates fillers directly to the associated verb's argument structure. We conducted an event-related potentials (ERP) study that used the violation paradigm to examine…

  14. Automated insertion of sequences into a ribosomal RNA alignment: An application of computational linguistics in molecular biology

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Taylor, R.C.

    This thesis involved the construction of (1) a grammar that incorporates knowledge on base invariancy and secondary structure in a molecule and (2) a parser engine that uses the grammar to position bases into the structural subunits of the molecule. These concepts were combined with a novel pinning technique to form a tool that semi-automates insertion of a new species into the alignment for the 16S rRNA molecule (a component of the ribosome) maintained by Dr. Carl Woese's group at the University of Illinois at Urbana. The tool was tested on species extracted from the alignment and on a group of entirely new species. The results were very encouraging, and the tool should be a substantial aid to the curators of the 16S alignment. The construction of the grammar was itself automated, allowing application of the tool to alignments for other molecules. The logic programming language Prolog was used to construct all programs involved. The computational linguistics approach used here was found to be a useful way to attack the problem of insertion into an alignment.
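A flavor of how a grammar can encode base invariancy: invariant positions must match exactly while variable positions accept any base. The pattern syntax and helper below are invented for illustration and are far simpler than the thesis's Prolog grammar:

```python
import re

def pattern_to_regex(pattern):
    """Compile a toy alignment pattern: literal bases are invariant,
    'N' matches any base, '-' marks a position that may be absent."""
    parts = []
    for ch in pattern:
        if ch == 'N':
            parts.append('[ACGU]')
        elif ch == '-':
            parts.append('[ACGU]?')
        else:
            parts.append(ch)
    return re.compile('^' + ''.join(parts) + '$')
```

For example, the pattern `'GNNA'` requires G and A at the ends but allows any bases in between.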

  15. Automated insertion of sequences into a ribosomal RNA alignment: An application of computational linguistics in molecular biology

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Taylor, Ronald C.

    This thesis involved the construction of (1) a grammar that incorporates knowledge on base invariancy and secondary structure in a molecule and (2) a parser engine that uses the grammar to position bases into the structural subunits of the molecule. These concepts were combined with a novel pinning technique to form a tool that semi-automates insertion of a new species into the alignment for the 16S rRNA molecule (a component of the ribosome) maintained by Dr. Carl Woese's group at the University of Illinois at Urbana. The tool was tested on species extracted from the alignment and on a group of entirely new species. The results were very encouraging, and the tool should be a substantial aid to the curators of the 16S alignment. The construction of the grammar was itself automated, allowing application of the tool to alignments for other molecules. The logic programming language Prolog was used to construct all programs involved. The computational linguistics approach used here was found to be a useful way to attack the problem of insertion into an alignment.

  16. Recon2Neo4j: applying graph database technologies for managing comprehensive genome-scale networks.

    PubMed

    Balaur, Irina; Mazein, Alexander; Saqi, Mansoor; Lysenko, Artem; Rawlings, Christopher J; Auffray, Charles

    2017-04-01

    The goal of this work is to offer a computational framework for exploring data from the Recon2 human metabolic reconstruction model. Advanced user access features have been developed using the Neo4j graph database technology and this paper describes key features such as efficient management of the network data, examples of the network querying for addressing particular tasks, and how query results are converted back to the Systems Biology Markup Language (SBML) standard format. The Neo4j-based metabolic framework facilitates exploration of highly connected and comprehensive human metabolic data and identification of metabolic subnetworks of interest. A Java-based parser component has been developed to convert query results (available in the JSON format) into SBML and SIF formats in order to facilitate further results exploration, enhancement or network sharing. The Neo4j-based metabolic framework is freely available from: https://diseaseknowledgebase.etriks.org/metabolic/browser/ . The Java code files developed for this work are available from the following URL: https://github.com/ibalaur/MetabolicFramework . ibalaur@eisbm.org. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
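The JSON-to-SIF direction of the described parser component can be sketched as below. The record field names ('source', 'relation', 'target') are assumptions about the query-result shape, not the tool's actual schema:

```python
import json

def json_to_sif(payload):
    """Flatten a JSON list of interaction records into SIF lines,
    one 'source<TAB>relation<TAB>target' row per record."""
    rows = json.loads(payload)
    return '\n'.join(f"{r['source']}\t{r['relation']}\t{r['target']}" for r in rows)

# Hypothetical query result in JSON form.
result = json.dumps([
    {'source': 'glucose', 'relation': 'substrateOf', 'target': 'hexokinase'},
    {'source': 'hexokinase', 'relation': 'produces', 'target': 'glucose-6-phosphate'},
])
```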

  17. Recon2Neo4j: applying graph database technologies for managing comprehensive genome-scale networks

    PubMed Central

    Mazein, Alexander; Saqi, Mansoor; Lysenko, Artem; Rawlings, Christopher J.; Auffray, Charles

    2017-01-01

    Abstract Summary: The goal of this work is to offer a computational framework for exploring data from the Recon2 human metabolic reconstruction model. Advanced user access features have been developed using the Neo4j graph database technology and this paper describes key features such as efficient management of the network data, examples of the network querying for addressing particular tasks, and how query results are converted back to the Systems Biology Markup Language (SBML) standard format. The Neo4j-based metabolic framework facilitates exploration of highly connected and comprehensive human metabolic data and identification of metabolic subnetworks of interest. A Java-based parser component has been developed to convert query results (available in the JSON format) into SBML and SIF formats in order to facilitate further results exploration, enhancement or network sharing. Availability and Implementation: The Neo4j-based metabolic framework is freely available from: https://diseaseknowledgebase.etriks.org/metabolic/browser/. The Java code files developed for this work are available from the following URL: https://github.com/ibalaur/MetabolicFramework. Contact: ibalaur@eisbm.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27993779

  18. Structural syntactic prediction measured with ELAN: evidence from ERPs.

    PubMed

    Fonteneau, Elisabeth

    2013-02-08

    The current study used event-related potentials (ERPs) to investigate how and when argument structure information is used during the processing of sentences with a filler-gap dependency. We hypothesize that one specific property - animacy (living vs. non-living) - is used by the parser during the building of the syntactic structure. Participants heard sentences that were rated off-line as having an expected noun (Who did the Lion King chase the caravan with?) or an unexpected noun (Who did the Lion King chase the animal with?). This prediction is based on the animacy relation between the wh-word and the noun in the object position. ERPs from the noun in the unexpected condition (animal) elicited a typical Early Left Anterior Negativity (ELAN)/P600 complex compared to the noun in the expected condition (caravan). Firstly, these results demonstrate that the ELAN reflects not only grammatical category violations but also animacy expectations in filler-gap dependencies. Secondly, our data suggest that the language comprehension system is able to make detailed predictions about aspects of the upcoming words to build up the syntactic structure. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  19. Activate/Inhibit KGCS Gateway via Master Console EIC Pad-B Display

    NASA Technical Reports Server (NTRS)

    Ferreira, Pedro Henrique

    2014-01-01

    My internship consisted of two major projects for the Launch Control System. The purpose of the first project was to implement the Application Control Language (ACL) to Activate Data Acquisition (ADA) and Inhibit Data Acquisition (IDA) on the Kennedy Ground Control Sub-Systems (KGCS) Gateway, to update the existing Pad-B End Item Control (EIC) Display to program the ADA and IDA buttons with the new ACL, and to test and release the ACL Display. The second project consisted of unit testing all of the Application Services Framework (ASF) by March 21st. The XmlFileReader was unit tested and reached 100% coverage. The XmlFileReader class is used to grab information from XML files and use it to initialize elements in the other framework elements by using the Xerces C++ XML Parser, which is open-source, commercial off-the-shelf software. The ScriptThread was also tested. ScriptThread manages the creation and activation of script threads. A large amount of the time was spent initializing the environment, learning how to set up unit tests, and getting familiar with the specific segments of the project that were assigned to us.
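The XmlFileReader pattern described above (read an XML file and use its values to initialize framework elements) looks roughly like this using Python's standard library, with ElementTree standing in for the Xerces C++ parser; the tag names are invented:

```python
import xml.etree.ElementTree as ET

def read_config(xml_text):
    """Parse an XML configuration string and return child-element
    names mapped to their text values, for initializing components."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}

# Hypothetical configuration fragment.
cfg = read_config('<config><host>pad-b</host><port>5150</port></config>')
```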

  20. Language evolution and human-computer interaction

    NASA Technical Reports Server (NTRS)

    Grudin, Jonathan; Norman, Donald A.

    1991-01-01

    Many of the issues that confront designers of interactive computer systems also appear in natural language evolution. Natural languages and human-computer interfaces share as their primary mission the support of extended 'dialogues' between responsive entities. Because in each case one participant is a human being, some of the pressures operating on natural languages, causing them to evolve in order to better support such dialogue, also operate on human-computer 'languages' or interfaces. This does not necessarily push interfaces in the direction of natural language - since one entity in this dialogue is not a human, this is not to be expected. Nonetheless, by discerning where the pressures that guide natural language evolution also appear in human-computer interaction, we can contribute to the design of computer systems and obtain a new perspective on natural languages.

  1. Local anaphor licensing in an SOV language: implications for retrieval strategies

    PubMed Central

    Kush, Dave; Phillips, Colin

    2014-01-01

    Because morphological and syntactic constraints govern the distribution of potential antecedents for local anaphors, local antecedent retrieval might be expected to make equal use of both syntactic and morphological cues. However, previous research (e.g., Dillon et al., 2013) has shown that local antecedent retrieval is not susceptible to the same morphological interference effects observed during the resolution of morphologically-driven grammatical dependencies, such as subject-verb agreement checking (e.g., Pearlmutter et al., 1999). Although this lack of interference has been taken as evidence that syntactic cues are given priority over morphological cues in local antecedent retrieval, the absence of interference could also be the result of a confound in the materials used: the post-verbal position of local anaphors in prior studies may obscure morphological interference that would otherwise be visible if the critical anaphor were in a different position. We investigated the licensing of local anaphors (reciprocals) in Hindi, an SOV language, in order to determine whether pre-verbal anaphors are subject to morphological interference from feature-matching distractors in a way that post-verbal anaphors are not. Computational simulations using a version of the ACT-R parser (Lewis and Vasishth, 2005) predicted that a feature-matching distractor should facilitate the processing of an unlicensed reciprocal if morphological cues are used in antecedent retrieval. In a self-paced reading study we found no evidence that distractors eased processing of an unlicensed reciprocal. However, the presence of a distractor increased difficulty of processing following the reciprocal. We discuss the significance of these results for theories of cue selection in retrieval. PMID:25414680

  2. Natural Language Processing: Toward Large-Scale, Robust Systems.

    ERIC Educational Resources Information Center

    Haas, Stephanie W.

    1996-01-01

    Natural language processing (NLP) is concerned with getting computers to do useful things with natural language. Major applications include machine translation, text generation, information retrieval, and natural language interfaces. Reviews important developments since 1987 that have led to advances in NLP; current NLP applications; and problems…

  3. The Effect of Semantic Transparency on the Processing of Morphologically Derived Words: Evidence from Decision Latencies and Event-Related Potentials

    ERIC Educational Resources Information Center

    Jared, Debra; Jouravlev, Olessia; Joanisse, Marc F.

    2017-01-01

    Decomposition theories of morphological processing in visual word recognition posit an early morpho-orthographic parser that is blind to semantic information, whereas parallel distributed processing (PDP) theories assume that the transparency of orthographic-semantic relationships influences processing from the beginning. To test these…

  4. Disfluencies along the Garden Path: Brain Electrophysiological Evidence of Disrupted Sentence Processing

    ERIC Educational Resources Information Center

    Maxfield, Nathan D.; Lyon, Justine M.; Silliman, Elaine R.

    2009-01-01

    Bailey and Ferreira (2003) hypothesized and reported behavioral evidence that disfluencies (filled and silent pauses) undesirably affect sentence processing when they appear before disambiguating verbs in Garden Path (GP) sentences. Disfluencies here cause the parser to "linger" on, and apparently accept as correct, an erroneous parse. Critically,…

  5. Evolution of the Generic Lock System at Jefferson Lab

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brian Bevins; Yves Roblin

    2003-10-13

    The Generic Lock system is a software framework that allows highly flexible feedback control of large distributed systems. It allows system operators to implement new feedback loops between arbitrary process variables quickly and with no disturbance to the underlying control system. Several different types of feedback loops are provided and more are being added. This paper describes the further evolution of the system since it was first presented at ICALEPCS 2001 and reports on two years of successful use in accelerator operations. The framework has been enhanced in several key ways. Multiple-input, multiple-output (MIMO) lock types have been added for accelerator orbit and energy stabilization. The general purpose Proportional-Integral-Derivative (PID) locks can now be tuned automatically. The generic lock server now makes use of the Proxy IOC (PIOC) developed at Jefferson Lab to allow the locks to be monitored from any EPICS Channel Access aware client. (Previously clients had to be Cdev aware.) The dependency on the Qt XML parser has been replaced with the freely available Xerces DOM parser from the Apache project.
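A minimal discrete PID update, of the general kind the auto-tuned PID locks implement (gains and time step here are arbitrary; this is not Jefferson Lab's code):

```python
class PID:
    """Discrete PID controller: output = Kp*e + Ki*sum(e*dt) + Kd*de/dt."""
    def __init__(self, kp, ki, kd, dt=1.0):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint, measured):
        """One control step: accumulate the integral, difference the error."""
        error = setpoint - measured
        self.integral += error * self.dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

Auto-tuning, as mentioned in the abstract, would amount to choosing `kp`, `ki`, and `kd` from the loop's measured response rather than by hand.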

  6. A Python library for FAIRer access and deposition to the Metabolomics Workbench Data Repository.

    PubMed

    Smelter, Andrey; Moseley, Hunter N B

    2018-01-01

    The Metabolomics Workbench Data Repository is a public repository of mass spectrometry and nuclear magnetic resonance data and metadata derived from a wide variety of metabolomics studies. The data and metadata for each study is deposited, stored, and accessed via files in the domain-specific 'mwTab' flat file format. In order to improve the accessibility, reusability, and interoperability of the data and metadata stored in 'mwTab' formatted files, we implemented a Python library and package. This Python package, named 'mwtab', is a parser for the domain-specific 'mwTab' flat file format, which provides facilities for reading, accessing, and writing 'mwTab' formatted files. Furthermore, the package provides facilities to validate both the format and required metadata elements of a given 'mwTab' formatted file. In order to develop the 'mwtab' package we used the official 'mwTab' format specification. We used Git version control along with the Python unit-testing framework and a continuous integration service to run those tests on multiple versions of Python. Package documentation was developed using the Sphinx documentation generator. The 'mwtab' package provides both Python programmatic library interfaces and command-line interfaces for reading, writing, and validating 'mwTab' formatted files. Data and associated metadata are stored within Python dictionary- and list-based data structures, enabling straightforward, 'pythonic' access and manipulation of data and metadata. Also, the package provides facilities to convert 'mwTab' files into a JSON formatted equivalent, enabling easy reusability of the data by all modern programming languages that implement JSON parsers. The 'mwtab' package implements its metadata validation functionality based on a pre-defined JSON schema that can be easily specialized for specific types of metabolomics studies.
The library also provides a command-line interface for interconversion between 'mwTab' and JSONized formats in raw text and a variety of compressed binary file formats. The 'mwtab' package is an easy-to-use Python package that provides FAIRer utilization of the Metabolomics Workbench Data Repository. The source code is freely available on GitHub and via the Python Package Index. Documentation includes a 'User Guide', 'Tutorial', and 'API Reference'. The GitHub repository also provides 'mwtab' package unit-tests via a continuous integration service.
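In spirit, parsing a sectioned flat file into dictionaries and re-serializing it as JSON looks like the sketch below; the section and key names are invented, and the real mwTab format is considerably richer than this:

```python
import json

def parse_flat(text):
    """Parse a toy mwTab-like flat file: '#SECTION' headers introduce blocks,
    'KEY<TAB>VALUE' lines populate the current block."""
    data, section = {}, None
    for line in text.splitlines():
        line = line.rstrip()
        if not line:
            continue
        if line.startswith('#'):
            section = line[1:]
            data[section] = {}
        elif section is not None and '\t' in line:
            key, value = line.split('\t', 1)
            data[section][key] = value
    return data

sample = "#STUDY\nSTUDY_TITLE\tUrine metabolomics\n#ANALYSIS\nANALYSIS_TYPE\tMS\n"
parsed = parse_flat(sample)
as_json = json.dumps(parsed)  # JSONized equivalent, readable from any language
```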

  7. Computational Natural Language Inference: Robust and Interpretable Question Answering

    ERIC Educational Resources Information Center

    Sharp, Rebecca Reynolds

    2017-01-01

    We address the challenging task of "computational natural language inference," by which we mean bridging two or more natural language texts while also providing an explanation of how they are connected. In the context of question answering (i.e., finding short answers to natural language questions), this inference connects the question…

  8. Anaphora and Logical Form: On Formal Meaning Representations for Natural Language. Technical Report No. 36.

    ERIC Educational Resources Information Center

    Nash-Webber, Bonnie; Reiter, Raymond

    This paper describes a computational approach to certain problems of anaphora in natural language and argues in favor of formal meaning representation languages (MRLs) for natural language. After presenting arguments in favor of formal meaning representation languages, appropriate MRLs are discussed. Minimal requirements include provisions for…

  9. Reading Orthographically Strange Nonwords: Modelling Backup Strategies in Reading

    ERIC Educational Resources Information Center

    Perry, Conrad

    2018-01-01

    The latest version of the connectionist dual process model of reading (CDP++.parser) was tested on a set of nonwords, many of which were orthographically strange (e.g., PSIZ). A grapheme-by-grapheme read-out strategy was used because the normal strategy produced many poor responses. The new strategy allowed the model to produce results similar to…

  10. Working Memory in the Processing of Long-Distance Dependencies: Interference and Filler Maintenance

    ERIC Educational Resources Information Center

    Ness, Tal; Meltzer-Asscher, Aya

    2017-01-01

    During the temporal delay between the filler and gap sites in long-distance dependencies, the "active filler" strategy can be implemented in two ways: the filler phrase can be actively maintained in working memory ("maintenance account"), or it can be retrieved only when the parser posits a gap ("retrieval account").…

  11. Research in Knowledge Representation for Natural Language Understanding.

    DTIC Science & Technology

    1984-09-01

    Annual Report, 9/1/83 - 8/31/84: Research in Knowledge Representation for Natural Language Understanding. Keywords: artificial intelligence, natural language understanding, knowledge representation, semantics, semantic networks, KL-TWO, NIKL, belief and...attempting to understand and react to a complex, evolving situation. This report summarizes our research in knowledge representation and natural language

  12. Ideas on Learning a New Language Intertwined with the Current State of Natural Language Processing and Computational Linguistics

    ERIC Educational Resources Information Center

    Snyder, Robin M.

    2015-01-01

    In 2014, in conjunction with doing research in natural language processing and attending a global conference on computational linguistics, the author decided to learn a new foreign language, Greek, that uses a non-English character set. This paper/session will present/discuss an overview of the current state of natural language processing and…

  13. A fast and efficient python library for interfacing with the Biological Magnetic Resonance Data Bank.

    PubMed

    Smelter, Andrey; Astra, Morgan; Moseley, Hunter N B

    2017-03-17

    The Biological Magnetic Resonance Data Bank (BMRB) is a public repository of Nuclear Magnetic Resonance (NMR) spectroscopic data of biological macromolecules. It is an important resource for many researchers using NMR to study structural, biophysical, and biochemical properties of biological macromolecules. It is primarily maintained and accessed in a flat file ASCII format known as NMR-STAR. While the format is human readable, the size of most BMRB entries makes computer readability and explicit representation a practical requirement for almost any rigorous systematic analysis. To aid in the use of this public resource, we have developed a package called nmrstarlib in the popular open-source programming language Python. The nmrstarlib's implementation is very efficient, both in design and execution. The library has facilities for reading and writing both NMR-STAR version 2.1 and 3.1 formatted files, parsing them into usable Python dictionary- and list-based data structures, making access and manipulation of the experimental data very natural within Python programs (i.e. "saveframe" and "loop" records represented as individual Python dictionary data structures). Another major advantage of this design is that data stored in original NMR-STAR can be easily converted into its equivalent JavaScript Object Notation (JSON) format, a lightweight data interchange format, facilitating data access and manipulation using Python and any other programming language that implements a JSON parser/generator (i.e., all popular programming languages). We have also developed tools to visualize assigned chemical shift values and to convert between NMR-STAR and JSONized NMR-STAR formatted files. Full API Reference Documentation, User Guide and Tutorial with code examples are also available. We have tested this new library on all current BMRB entries: 100% of all entries are parsed without any errors for both NMR-STAR version 2.1 and version 3.1 formatted files. 
We also compared our software to three currently available Python libraries for parsing NMR-STAR formatted files: PyStarLib, NMRPyStar, and PyNMRSTAR. The nmrstarlib package is a simple, fast, and efficient library for accessing data from the BMRB. The library provides an intuitive dictionary-based interface with which Python programs can read, edit, and write NMR-STAR formatted files and their equivalent JSONized NMR-STAR files. The nmrstarlib package can be used as a library for accessing and manipulating data stored in NMR-STAR files and as a command-line tool to convert from NMR-STAR file format into its equivalent JSON file format and vice versa, and to visualize chemical shift values. Furthermore, the nmrstarlib implementation provides a guide for effectively JSONizing other older scientific formats, improving the FAIRness of data in these formats.
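The loop records described above amount to tagged tabular data; a toy parser for a loop-like fragment might read as follows. The tags, values, and `stop_` convention here are a drastic simplification of real NMR-STAR:

```python
def parse_loop(lines):
    """Parse a toy NMR-STAR-like loop: leading '_tag' lines name the columns,
    subsequent whitespace-separated rows fill them, 'stop_' terminates."""
    tags, rows = [], []
    for raw in lines:
        line = raw.strip()
        if line == 'stop_':
            break
        if line.startswith('_'):
            tags.append(line.lstrip('_'))
        elif line:
            rows.append(dict(zip(tags, line.split())))
    return rows

# Hypothetical chemical-shift loop.
shifts = parse_loop(["_Atom_ID", "_Chem_shift", "1 8.25", "2 120.4", "stop_"])
```

Because each row becomes a plain dictionary, the result JSONizes directly, mirroring the NMR-STAR-to-JSON conversion the abstract describes.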

  14. Emerging Approach of Natural Language Processing in Opinion Mining: A Review

    NASA Astrophysics Data System (ADS)

    Kim, Tai-Hoon

    Natural language processing (NLP) is a subfield of artificial intelligence and computational linguistics. It studies the problems of automated generation and understanding of natural human languages. This paper outlines a framework for using computer and natural language techniques to help learners at various levels learn foreign languages in a computer-based learning environment. We propose some ideas for using the computer as a practical tool for learning a foreign language, where most of the courseware is generated automatically. We then describe how to build computer-based learning tools, discuss their effectiveness, and conclude with some possibilities for using on-line resources.

  15. An Overview of Computer-Based Natural Language Processing.

    ERIC Educational Resources Information Center

    Gevarter, William B.

    Computer-based Natural Language Processing (NLP) is the key to enabling humans and their computer-based creations to interact with machines using natural languages (English, Japanese, German, etc.) rather than formal computer languages. NLP is a major research area in the fields of artificial intelligence and computational linguistics. Commercial…

  16. Intelligent CAI: An Author Aid for a Natural Language Interface.

    ERIC Educational Resources Information Center

    Burton, Richard R.; Brown, John Seely

    This report addresses the problems of using natural language (English) as the communication language for advanced computer-based instructional systems. The instructional environment places requirements on a natural language understanding system that exceed the capabilities of all existing systems, including: (1) efficiency, (2) habitability, (3)…

  17. Conceptual Complexity and Apparent Contradictions in Mathematics Language

    ERIC Educational Resources Information Center

    Gough, John

    2007-01-01

    Mathematics is like a language, although technically it is not a natural or informal human language, but a formal, that is, artificially constructed language. Importantly, educators use their natural everyday language to teach the formal language of mathematics. At times, however, instructors encounter problems when the technical words they use,…

  18. What Is a Language?

    ERIC Educational Resources Information Center

    Le Page, R. B.

    A discussion on the nature of language argues the following: (1) the concept of a closed and finite rule system is inadequate for the description of natural languages; (2) as a consequence, the writing of variable rules to modify such rule systems so as to accommodate the properties of natural language is inappropriate; (3) the concept of such…

  19. Expressing Biomedical Ontologies in Natural Language for Expert Evaluation.

    PubMed

    Amith, Muhammad; Manion, Frank J; Harris, Marcelline R; Zhang, Yaoyun; Xu, Hua; Tao, Cui

    2017-01-01

    We report on a study of our custom Hootation software for the purposes of assessing its ability to produce clear and accurate natural language phrases from axioms embedded in three biomedical ontologies. Using multiple domain experts and three discrete rating scales, we evaluated the tool on clarity of the natural language produced, fidelity of the natural language produced from the ontology to the axiom, and the fidelity of the domain knowledge represented by the axioms. Results show that Hootation provided relatively clear natural language equivalents for a select set of OWL axioms, although the clarity of statements hinges on the accuracy and representation of axioms in the ontology.

  20. Predictive processing of novel compounds: evidence from Japanese.

    PubMed

    Hirose, Yuki; Mazuka, Reiko

    2015-03-01

    Our study argues that pre-head anticipatory processing operates at a level below that of the sentence. A visual-world eye-tracking study demonstrated that, in the processing of Japanese novel compounds, the compound structure can be constructed prior to the head if the prosodic information on the preceding modifier constituent signals that the Compound Accent Rule (CAR) is being applied. This prosodic cue rules out the single-head analysis of the modifier noun, which would otherwise be a natural and economical choice. Once the structural representation for the head is computed in advance, the parser becomes faster at identifying the compound meaning. This poses a challenge to models maintaining that structural integration and word recognition are separate processes. At the same time, our results, together with previous findings, suggest some degree of staging in the processing of different sources of information during the comprehension of compound nouns. Copyright © 2014 Elsevier B.V. All rights reserved.

  1. RRE: a tool for the extraction of non-coding regions surrounding annotated genes from genomic datasets.

    PubMed

    Lazzarato, F; Franceschinis, G; Botta, M; Cordero, F; Calogero, R A

    2004-11-01

    RRE allows the extraction of non-coding regions surrounding a coding sequence [i.e. the gene upstream region, 5'-untranslated region (5'-UTR), introns, 3'-UTR, and downstream region] from annotated genomic datasets available at NCBI. The RRE parser and web-based interface are accessible at http://www.bioinformatica.unito.it/bioinformatics/rre/rre.html

  2. The Universal Parser and Interlanguage: Domain-Specific Mental Organization in the Comprehension of "Combien" Interrogatives in English-French Interlanguage.

    ERIC Educational Resources Information Center

    Dekydtspotter, Laurent

    2001-01-01

    From the perspective of Fodor's (1983) theory of mental organization and Chomsky's (1995) Minimalist theory of grammar, considers constraints on the interpretation of French-type and English-type cardinality interrogatives in the task of sentence comprehension, as a function of a universal parsing algorithm and hypotheses embodied in a French-type…

  3. jmzReader: A Java parser library to process and visualize multiple text and XML-based mass spectrometry data formats.

    PubMed

    Griss, Johannes; Reisinger, Florian; Hermjakob, Henning; Vizcaíno, Juan Antonio

    2012-03-01

    We here present the jmzReader library: a collection of Java application programming interfaces (APIs) to parse the most commonly used peak list and XML-based mass spectrometry (MS) data formats: DTA, MS2, MGF, PKL, mzXML, mzData, and mzML (based on the already existing API jmzML). The library is optimized to be used in conjunction with mzIdentML, the recently released standard data format for reporting protein and peptide identifications, developed by the HUPO proteomics standards initiative (PSI). mzIdentML files do not contain spectra data but contain references to different kinds of external MS data files. As a key functionality, all parsers implement a common interface that supports the various methods used by mzIdentML to reference external spectra. Thus, when developing software for mzIdentML, programmers no longer have to support multiple MS data file formats but only this one interface. The library (which includes a viewer) is open source and, together with detailed documentation, can be downloaded from http://code.google.com/p/jmzreader/. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  4. Generating and Executing Complex Natural Language Queries across Linked Data.

    PubMed

    Hamon, Thierry; Mougin, Fleur; Grabar, Natalia

    2015-01-01

    With the recent and intensive research in the biomedical area, accumulated knowledge is disseminated through various knowledge bases, and links between these knowledge bases are needed in order to use them jointly. Linked Data, the SPARQL language, and natural-language question-answering interfaces provide interesting solutions for querying such knowledge bases. We propose a method for translating natural language questions into SPARQL queries, using Natural Language Processing tools, semantic resources, and the RDF triples description. The method was designed on 50 questions over 3 biomedical knowledge bases and evaluated on 27 questions, achieving 0.78 F-measure on the test set. The method is implemented as a Perl module available at http://search.cpan.org/~thhamon/RDF-NLP-SPARQLQuery.
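
As a rough illustration of the general idea (not the authors' Perl implementation), translating a restricted set of questions into SPARQL can be sketched with pattern templates. The question patterns and `ex:` predicates below are invented:

```python
import re
from typing import Optional

# Toy question-to-SPARQL translation via pattern templates. The actual
# RDF-NLP-SPARQLQuery module uses NLP tools and semantic resources; this
# only shows the shape of the question-to-query mapping.
TEMPLATES = [
    (re.compile(r"what genes are associated with (.+)\?", re.I),
     "SELECT ?gene WHERE {{ ?gene ex:associatedWith ex:{0} . }}"),
    (re.compile(r"which drugs treat (.+)\?", re.I),
     "SELECT ?drug WHERE {{ ?drug ex:treats ex:{0} . }}"),
]

def to_sparql(question: str) -> Optional[str]:
    for pattern, skeleton in TEMPLATES:
        match = pattern.match(question.strip())
        if match:
            entity = match.group(1).strip().replace(" ", "_")
            return skeleton.format(entity)
    return None  # question not covered by the known templates

print(to_sparql("Which drugs treat asthma?"))
```

A real system replaces the fixed templates with syntactic analysis and entity linking against the knowledge bases, but the output remains an executable SPARQL query as above.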

  5. Survey of Natural Language Processing Techniques in Bioinformatics.

    PubMed

    Zeng, Zhiqiang; Shi, Hua; Wu, Yun; Hong, Zhiling

    2015-01-01

    Informatics methods such as text mining and natural language processing are often involved in bioinformatics research. In this study, we discuss text mining and natural language processing methods in bioinformatics from two perspectives. First, we aim to search for biological knowledge, retrieve references using text mining methods, and reconstruct databases; for example, protein-protein interactions and gene-disease relationships can be mined from PubMed. Then, we analyze the applications of text mining and natural language processing techniques in bioinformatics, including predicting protein structure and function and detecting noncoding RNA. Finally, numerous methods and applications, as well as their contributions to bioinformatics, are discussed for future use by text mining and natural language processing researchers.

  6. A grammar-based semantic similarity algorithm for natural language sentences.

    PubMed

    Lee, Ming Che; Chang, Jia Wei; Hsieh, Tung Cheng

    2014-01-01

    This paper presents a grammar- and semantic-corpus-based similarity algorithm for natural language sentences. Natural language, as opposed to an "artificial language" such as a computer programming language, is the language used by the general public for daily communication. Traditional information retrieval approaches, such as vector models, LSA, HAL, or even ontology-based approaches that extend to concept-similarity comparison instead of co-occurring terms/words, may not determine a good match when there is no obvious relation or concept overlap between two natural language sentences. This paper proposes a sentence similarity algorithm that takes advantage of a corpus-based ontology and grammatical rules to overcome these problems. Experiments on two well-known benchmarks demonstrate that the proposed algorithm yields a significant performance improvement on sentences/short texts with arbitrary syntax and structure.
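
For contrast, the co-occurrence baseline that such grammar- and ontology-aware methods aim to improve on can be sketched as a plain bag-of-words cosine similarity (a deliberately naive baseline, not the paper's algorithm):

```python
import math
from collections import Counter

def cosine_bow(s1: str, s2: str) -> float:
    """Bag-of-words cosine similarity: a purely co-occurrence-based baseline."""
    a, b = Counter(s1.lower().split()), Counter(s2.lower().split())
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Two sentences with similar meaning but no shared words score zero --
# exactly the failure case a grammar/ontology-based algorithm targets.
print(cosine_bow("the cat sat on the mat", "a feline rested upon a rug"))  # 0.0
```

The zero score on the paraphrase pair illustrates why the paper adds grammatical structure and ontology-based concept similarity on top of term overlap.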

  7. Do neural nets learn statistical laws behind natural language?

    PubMed

    Takahashi, Shuntaro; Tanaka-Ishii, Kumiko

    2017-01-01

    The performance of deep learning in natural language processing has been spectacular, but the reasons for this success remain unclear because of the inherent complexity of deep learning. This paper provides empirical evidence of its effectiveness and of a limitation of neural networks for language engineering. Precisely, we demonstrate that a neural language model based on long short-term memory (LSTM) effectively reproduces Zipf's law and Heaps' law, two representative statistical properties underlying natural language. We discuss the quality of reproducibility and the emergence of Zipf's law and Heaps' law as training progresses. We also point out that the neural language model has a limitation in reproducing long-range correlation, another statistical property of natural language. This understanding could provide a direction for improving the architectures of neural networks.
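
The two laws in question are easy to state operationally. On a toy token stream (invented here; the paper uses real corpora and LSTM-generated text), the raw quantities behind Zipf's law and Heaps' law can be computed as:

```python
from collections import Counter

# Toy token stream; a real test of the laws needs a large corpus.
tokens = "the cat sat on the mat and the dog sat on the rug".split()

# Zipf's law: when word frequencies are ranked, frequency falls roughly
# in inverse proportion to rank.
freqs = sorted(Counter(tokens).values(), reverse=True)
print(freqs)  # [4, 2, 2, 1, 1, 1, 1, 1] -- the top word dominates

# Heaps' law: vocabulary size grows sublinearly with text length.
seen, growth = set(), []
for tok in tokens:
    seen.add(tok)
    growth.append(len(seen))
print(growth)  # monotone, with the slope flattening as tokens repeat
```

Testing whether a language model reproduces these laws then amounts to computing the same rank-frequency and vocabulary-growth curves on its generated text and comparing them with the training corpus on log-log axes.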

  8. Do neural nets learn statistical laws behind natural language?

    PubMed Central

    Takahashi, Shuntaro

    2017-01-01

    The performance of deep learning in natural language processing has been spectacular, but the reasons for this success remain unclear because of the inherent complexity of deep learning. This paper provides empirical evidence of its effectiveness and of a limitation of neural networks for language engineering. Precisely, we demonstrate that a neural language model based on long short-term memory (LSTM) effectively reproduces Zipf’s law and Heaps’ law, two representative statistical properties underlying natural language. We discuss the quality of reproducibility and the emergence of Zipf’s law and Heaps’ law as training progresses. We also point out that the neural language model has a limitation in reproducing long-range correlation, another statistical property of natural language. This understanding could provide a direction for improving the architectures of neural networks. PMID:29287076

  9. Automatic Item Generation via Frame Semantics: Natural Language Generation of Math Word Problems.

    ERIC Educational Resources Information Center

    Deane, Paul; Sheehan, Kathleen

    This paper is an exploration of the conceptual issues that have arisen in the course of building a natural language generation (NLG) system for automatic test item generation. While natural language processing techniques are applicable to general verbal items, mathematics word problems are particularly tractable targets for natural language…

  10. Linguistic Analysis of Natural Language Communication with Computers.

    ERIC Educational Resources Information Center

    Thompson, Bozena Henisz

    Interaction with computers in natural language requires a language that is flexible and suited to the task. This study of natural dialogue was undertaken to reveal those characteristics which can make computer English more natural. Experiments were made in three modes of communication: face-to-face, terminal-to-terminal, and human-to-computer,…

  11. New Ways to Learn a Foreign Language.

    ERIC Educational Resources Information Center

    Hall, Robert A., Jr.

    This text focuses on the nature of language learning in the light of modern linguistic analysis. Common linguistic problems encountered by students of eight major languages are examined--Latin, Greek, French, Spanish, Portuguese, Italian, German, and Russian. The text discusses the nature of language, building new language habits, overcoming…

  12. Applying language technology to nursing documents: pros and cons with a focus on ethics.

    PubMed

    Suominen, Hanna; Lehtikunnas, Tuija; Back, Barbro; Karsten, Helena; Salakoski, Tapio; Salanterä, Sanna

    2007-10-01

    The present study discusses ethics in building and using applications based on natural language processing in electronic nursing documentation. Specifically, we first focus on the question of how patient confidentiality can be ensured in developing language technology for the nursing documentation domain. Then, we identify and theoretically analyze the ethical outcomes that arise when using natural language processing to support clinical judgement and decision-making. In total, we put forward and justify 10 claims related to ethics in applying language technology to nursing documents. A review of recent scientific articles related to ethics in electronic patient records or in the utilization of large databases was conducted. The results were then compared with ethical guidelines for nurses and the Finnish legislation covering health care and the processing of personal data. Finally, the practical experiences of the authors in applying the methods of natural language processing to nursing documents were appended. Patient records supplemented with natural language processing capabilities may help nurses give better, more efficient, and more individualized care to their patients. In addition, language technology may improve patients' ability to receive truthful information about their health and improve the quality of narratives. Because of these benefits, research on the use of language technology in narratives should be encouraged. In contrast, privacy-sensitive health care documentation brings specific ethical concerns and difficulties to the natural language processing of nursing documents. Therefore, when developing natural language processing tools, patient confidentiality must be ensured. When using the tools, health care personnel should always remain responsible for clinical judgement and decision-making. 
One should also consider that the use of language technology in nursing narratives may threaten patients' rights by using documentation collected for other purposes. Applying language technology to nursing documents may, on the one hand, contribute to the quality of care, but, on the other hand, threaten patient confidentiality. As an overall conclusion, natural language processing of nursing documents holds the promise of great benefits if the potential risks are taken into consideration.

  13. A Natural Language Interface Concordant with a Knowledge Base.

    PubMed

    Han, Yong-Jin; Park, Seong-Bae; Park, Se-Young

    2016-01-01

    The discordance between expressions interpretable by a natural language interface (NLI) system and those answerable by a knowledge base is a critical problem in the field of NLIs. To solve this discordance problem, this paper proposes a method to translate natural language questions into formal queries that can be generated from a graph-based knowledge base. The proposed method considers a subgraph of a knowledge base as a formal query. Thus, all formal queries corresponding to a concept or a predicate in the knowledge base can be generated prior to query time, and all possible natural language expressions corresponding to each formal query can also be collected in advance. A natural language expression has a one-to-one mapping with a formal query; hence, a natural language question is translated into a formal query by matching the question with the most appropriate natural language expression. If the confidence of this matching is not sufficiently high, the proposed method rejects the question and does not answer it. Multi-predicate queries are processed by regarding them as a set of collected expressions. The experimental results show that the proposed method handles answerable questions from the knowledge base thoroughly and rejects unanswerable ones effectively.
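
The match-or-reject step described above can be sketched as follows; the canned expressions, queries, similarity measure, and threshold are all stand-ins for illustration, not the paper's actual components:

```python
from difflib import SequenceMatcher
from typing import Optional

# Each formal query is paired in advance with a canonical natural-language
# expression (reduced to one expression per query here for brevity).
EXPRESSIONS = {
    "who directed x": "SELECT ?d WHERE { ?x :directedBy ?d }",
    "when was x born": "SELECT ?y WHERE { ?x :birthYear ?y }",
}

def answer(question: str, threshold: float = 0.7) -> Optional[str]:
    """Map a question to the best-matching expression, or reject it."""
    best_expr, best_score = None, 0.0
    for expr in EXPRESSIONS:
        score = SequenceMatcher(None, question.lower(), expr).ratio()
        if score > best_score:
            best_expr, best_score = expr, score
    if best_score < threshold:
        return None  # confidence too low: refuse to answer
    return EXPRESSIONS[best_expr]

print(answer("Who directed X?"))                     # matched -> formal query
print(answer("What is the airspeed of a swallow?"))  # rejected -> None
```

The design choice mirrored here is that rejection is built into the interface: a question that does not match any pre-collected expression confidently enough is unanswerable by construction, rather than being mis-translated.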

  14. Neurolinguistics and psycholinguistics as a basis for computer acquisition of natural language

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Powers, D.M.W.

    1983-04-01

    Research into natural language understanding systems for computers has concentrated on implementing particular grammars and grammatical models of the language concerned. This paper presents a rationale for research into natural language understanding systems based on neurological and psychological principles. Important features of the approach are that it seeks to place the onus of learning the language on the computer, and that it seeks to make use of the vast wealth of relevant psycholinguistic and neurolinguistic theory. 22 references.

  15. Natural language interface for command and control

    NASA Technical Reports Server (NTRS)

    Shuler, Robert L., Jr.

    1986-01-01

    A working prototype of a flexible 'natural language' interface for command and control situations is presented. This prototype is analyzed from two standpoints. First is the role of natural language for command and control, its realistic requirements, and how well the role can be filled with current practical technology. Second, technical concepts for implementation are discussed and illustrated by their application in the prototype system. It is also shown how adaptive or 'learning' features can greatly ease the task of encoding language knowledge in the language processor.

  16. The Hermod Behavioral Synthesis System

    DTIC Science & Technology

    1988-06-08


  17. Discourse Understanding. Technical Report No. 391.

    ERIC Educational Resources Information Center

    Scha, R. J. H.; And Others

    Artificial intelligence research on natural language understanding is discussed in this report using the notions that (1) natural language understanding systems must "see" sentences as elements whose significance resides in the contribution they make to the larger whole, and (2) a natural language understanding computer system must…

  18. Modeling Coevolution between Language and Memory Capacity during Language Origin

    PubMed Central

    Gong, Tao; Shuai, Lan

    2015-01-01

    Memory is essential to many cognitive tasks, including language. Apart from empirical studies of memory effects on language acquisition and use, there has been little evolutionary exploration of whether a high level of memory capacity is a prerequisite for language and whether language origin could influence memory capacity. In line with evolutionary theories that natural selection refined language-related cognitive abilities, we advocated a coevolution scenario between language and memory capacity, which incorporated the genetic transmission of individual memory capacity, cultural transmission of idiolects, and natural and cultural selection on individual reproduction and language teaching. To illustrate the coevolution dynamics, we adopted a multi-agent computational model simulating the emergence of lexical items and simple syntax through iterated communications. Simulations showed that, along with the origin of a communal language, an initially low memory capacity for acquired linguistic knowledge was boosted; this coherent increase in linguistic understandability and memory capacity reflected a language-memory coevolution; and the coevolution stopped once memory capacities became sufficient for language communication. Statistical analyses revealed that the coevolution was realized mainly by natural selection based on individual communicative success in cultural transmission. This work elaborated the biology-culture parallelism of language evolution, demonstrated the driving force of culturally-constituted factors in the natural selection of individual cognitive abilities, and suggested that the difference in degree of language-related cognitive abilities between humans and nonhuman animals could result from coevolution with language. PMID:26544876

  19. Modeling Coevolution between Language and Memory Capacity during Language Origin.

    PubMed

    Gong, Tao; Shuai, Lan

    2015-01-01

    Memory is essential to many cognitive tasks, including language. Apart from empirical studies of memory effects on language acquisition and use, there has been little evolutionary exploration of whether a high level of memory capacity is a prerequisite for language and whether language origin could influence memory capacity. In line with evolutionary theories that natural selection refined language-related cognitive abilities, we advocated a coevolution scenario between language and memory capacity, which incorporated the genetic transmission of individual memory capacity, cultural transmission of idiolects, and natural and cultural selection on individual reproduction and language teaching. To illustrate the coevolution dynamics, we adopted a multi-agent computational model simulating the emergence of lexical items and simple syntax through iterated communications. Simulations showed that, along with the origin of a communal language, an initially low memory capacity for acquired linguistic knowledge was boosted; this coherent increase in linguistic understandability and memory capacity reflected a language-memory coevolution; and the coevolution stopped once memory capacities became sufficient for language communication. Statistical analyses revealed that the coevolution was realized mainly by natural selection based on individual communicative success in cultural transmission. This work elaborated the biology-culture parallelism of language evolution, demonstrated the driving force of culturally-constituted factors in the natural selection of individual cognitive abilities, and suggested that the difference in degree of language-related cognitive abilities between humans and nonhuman animals could result from coevolution with language.

  20. Understanding Student Language: An Unsupervised Dialogue Act Classification Approach

    ERIC Educational Resources Information Center

    Ezen-Can, Aysu; Boyer, Kristy Elizabeth

    2015-01-01

    Within the landscape of educational data, textual natural language is an increasingly vast source of learning-centered interactions. In natural language dialogue, student contributions hold important information about knowledge and goals. Automatically modeling the dialogue act of these student utterances is crucial for scaling natural language…

  1. How Much Language Is Enough? Some Immigrant Language Lessons from Canada and Germany. Discussion Paper.

    ERIC Educational Resources Information Center

    DeVoretz, Don J.; Hinte, Holger; Werner, Christiane

    Germany and Canada are at opposite ends of the debate over language integration and accession to citizenship. German naturalization law contains an explicit language criterion. The first German immigration act will not only concentrate on control aspects but will also focus on language as a criterion for legal immigration. Canada does…

  2. Teaching Language-Deviant Children to Generalize Newly Taught Language: A Socio-Ecological Approach. Volume I. Final Report.

    ERIC Educational Resources Information Center

    Schiefelbusch, R. L.; Rogers-Warren, Ann

    The report examines longitudinal research on language generalization in the natural environments of 32 severely retarded, moderately retarded, and mildly language-delayed preschool children. All Ss received language training in one of two programs, and Ss' speech samples in a natural environment were collected and analyzed for evidence of…

  3. Natural Language Query System Design for Interactive Information Storage and Retrieval Systems. M.S. Thesis

    NASA Technical Reports Server (NTRS)

    Dominick, Wayne D. (Editor); Liu, I-Hsiung

    1985-01-01

    The currently developed multi-level language interfaces of information systems are generally designed for experienced users. These interfaces commonly ignore the nature and needs of the largest user group, i.e., casual users. This research identifies the importance of natural language query system research within information storage and retrieval system development; addresses the topics of developing such a query system; and finally, proposes a framework for the development of natural language query systems in order to facilitate the communication between casual users and information storage and retrieval systems.

  4. A natural command language for C/3/I applications

    NASA Astrophysics Data System (ADS)

    Mergler, J. P.

    1980-03-01

    The article discusses the development of a natural command language and a control and analysis console designed to simplify the operator's task in the field of Command, Control, Communications, and Intelligence. The console is based on a DEC LSI-11 microcomputer supported by 16K words of memory and a serial interface component. The discussion covers the language, which uses English with a natural syntax, and how it is integrated with the hardware. Results have demonstrated the effectiveness of this natural command language.

  5. Dependency distances in natural mixed languages. Comment on "Dependency distance: A new perspective on syntactic patterns in natural languages" by Haitao Liu et al.

    NASA Astrophysics Data System (ADS)

    Wang, Lin

    2017-07-01

    Haitao Liu et al.'s article [1] offers a comprehensive account of the diversity of syntactic patterns in human languages in terms of an important index of memory burden and syntactic difficulty: dependency distance. Natural languages, as complex systems, exhibit overall shorter dependency distances under the universal pressure for dependency distance minimization; however, some relatively long-distance dependencies remain, reflecting that language constantly adapts itself to deep-level biological or functional constraints.

  6. User-defined functions in the Arden Syntax: An extension proposal.

    PubMed

    Karadimas, Harry; Ebrahiminia, Vahid; Lepage, Eric

    2015-12-11

    The Arden Syntax is a knowledge-encoding standard, started in 1989 and now in its 10th revision, maintained by the Health Level Seven (HL7) organization. It has constructs borrowed from several language concepts that were available at that time (mainly the HELP hospital information system and the Regenstrief medical record system (RMRS), but also the Pascal language, functional languages, and the frame data structure used in artificial intelligence). The syntax has a rationale for its constructs and has restrictions that follow this rationale. The main goal of the standard is to promote knowledge sharing by avoiding the complexity of traditional programs, so that a medical logic module (MLM) written in the Arden Syntax can remain shareable and understandable across institutions. One of the restrictions of the syntax is that user-defined functions and subroutines cannot be declared inside an MLM. An MLM can, however, call another MLM, which then serves as a function; this adds an additional dependency between MLMs, a known criticism of the Arden Syntax knowledge model. This article explains why we believe the Arden Syntax would benefit from a construct for user-defined functions and discusses the need for, benefits of, and limitations of such a construct. We used the recent grammar of Arden Syntax v.2.10, with both the Arden Syntax standard document and the Arden Syntax rationale article as guidelines. We gradually introduced production rules into the grammar and used the CUP parsing tool to verify that no ambiguities were introduced. A new grammar was produced that supports user-defined functions: 22 production rules were added, and a parser was built using the CUP parsing tool. A few examples are given to illustrate the concepts, and all were parsed correctly. It is possible to add user-defined functions to the Arden Syntax in a way that remains coherent with the standard. 
We believe that this enhances the readability and the robustness of MLMs. A detailed proposal will be submitted by the end of the year to the HL7 workgroup on Arden Syntax. Copyright © 2015 Elsevier B.V. All rights reserved.

  7. Multilingual natural language generation as part of a medical terminology server.

    PubMed

    Wagner, J C; Solomon, W D; Michel, P A; Juge, C; Baud, R H; Rector, A L; Scherrer, J R

    1995-01-01

    Re-usable, sharable, and therefore language-independent concept models are of increasing importance in the medical domain. The GALEN project (Generalised Architecture for Languages, Encyclopaedias and Nomenclatures in Medicine) aims at developing language-independent concept representation systems as the foundation for the next generation of multilingual coding systems. For use within clinical applications, the content of the model has to be mapped to natural language. A so-called Multilingual Information Module (MM) establishes the link between the language-independent concept model and different natural languages. This text-generation software must be versatile enough to cope simultaneously with different languages and with different parts of a compositional model. It has to meet, on the one hand, the properties of the language as used in the medical domain and, on the other, the specific characteristics of the underlying model and its representation formalism. We propose a semantics-oriented approach to natural language generation that is based on linguistic annotations to a concept model. This approach is realized as an integral part of a Terminology Server, built around the concept model and offering different terminological services to clinical applications.

  8. Statistical Learning in a Natural Language by 8-Month-Old Infants

    PubMed Central

    Pelucchi, Bruna; Hay, Jessica F.; Saffran, Jenny R.

    2013-01-01

    Numerous studies over the past decade support the claim that infants are equipped with powerful statistical language learning mechanisms. The primary evidence for statistical language learning in word segmentation comes from studies using artificial languages, continuous streams of synthesized syllables that are highly simplified relative to real speech. To what extent can these conclusions be scaled up to natural language learning? In the current experiments, English-learning 8-month-old infants’ ability to track transitional probabilities in fluent infant-directed Italian speech was tested (N = 72). The results suggest that infants are sensitive to transitional probability cues in unfamiliar natural language stimuli, and support the claim that statistical learning is sufficiently robust to support aspects of real-world language acquisition. PMID:19489896

  9. Statistical learning in a natural language by 8-month-old infants.

    PubMed

    Pelucchi, Bruna; Hay, Jessica F; Saffran, Jenny R

    2009-01-01

    Numerous studies over the past decade support the claim that infants are equipped with powerful statistical language learning mechanisms. The primary evidence for statistical language learning in word segmentation comes from studies using artificial languages, continuous streams of synthesized syllables that are highly simplified relative to real speech. To what extent can these conclusions be scaled up to natural language learning? In the current experiments, English-learning 8-month-old infants' ability to track transitional probabilities in fluent infant-directed Italian speech was tested (N = 72). The results suggest that infants are sensitive to transitional probability cues in unfamiliar natural language stimuli, and support the claim that statistical learning is sufficiently robust to support aspects of real-world language acquisition.
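
    The statistic these infant studies probe can be stated compactly: the transitional probability TP(x → y) is the frequency of the pair xy divided by the frequency of x, and word boundaries are where TPs dip. A minimal sketch (the syllables below are made up for illustration, not the Italian stimuli from the study):

    ```python
    from collections import Counter

    def transitional_probabilities(stream):
        """TP(x -> y) = frequency of the pair xy / frequency of x."""
        pairs = Counter(zip(stream, stream[1:]))
        firsts = Counter(stream[:-1])
        return {(x, y): n / firsts[x] for (x, y), n in pairs.items()}

    # Toy stream built from two made-up "words" (fu-ga and bi-ci): within-word
    # TPs are high, while pairs that span a word boundary have lower TPs.
    stream = ["fu", "ga", "bi", "ci", "fu", "ga", "fu", "ga", "bi", "ci"]
    tps = transitional_probabilities(stream)
    print(tps[("fu", "ga")])            # 1.0: "ga" always follows "fu"
    print(round(tps[("ga", "bi")], 2))  # 0.67: boundary-spanning pair
    ```

    Segmenting at TP minima recovers the "words" of the toy stream, which is the behavior the infant experiments test with real speech.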

  10. Using Language Learning Conditions in Mathematics. PEN 68.

    ERIC Educational Resources Information Center

    Stoessiger, Rex

    This pamphlet reports on a project in Tasmania exploring whether the "natural learning conditions" approach to language learning could be adapted for mathematics. The connections between language and mathematics, as well as the natural learning processes of language learning are described in the pamphlet. The project itself is…

  11. A Large-Scale Analysis of Variance in Written Language

    ERIC Educational Resources Information Center

    Johns, Brendan T.; Jamieson, Randall K.

    2018-01-01

    The collection of very large text sources has revolutionized the study of natural language, leading to the development of several models of language learning and distributional semantics that extract sophisticated semantic representations of words based on the statistical redundancies contained within natural language (e.g., Griffiths, Steyvers,…

  12. Programming Languages, Natural Languages, and Mathematics

    ERIC Educational Resources Information Center

    Naur, Peter

    1975-01-01

    Analogies are drawn between the social aspects of programming and similar aspects of mathematics and natural languages. By analogy with the history of auxiliary languages it is suggested that Fortran and Cobol will remain dominant. (Available from the Association of Computing Machinery, 1133 Avenue of the Americas, New York, NY 10036.) (Author/TL)

  13. Testing of a Natural Language Retrieval System for a Full Text Knowledge Base.

    ERIC Educational Resources Information Center

    Bernstein, Lionel M.; Williamson, Robert E.

    1984-01-01

    The Hepatitis Knowledge Base (text of prototype information system) was used for modifying and testing "A Navigator of Natural Language Organized (Textual) Data" (ANNOD), a retrieval system which combines probabilistic, linguistic, and empirical means to rank individual paragraphs of full text for similarity to natural language queries…

  14. A natural language interface plug-in for cooperative query answering in biological databases.

    PubMed

    Jamil, Hasan M

    2012-06-11

    One of the many unique features of biological databases is that the mere existence of a ground data item is not always a precondition for a query response. It may be argued that, from a biologist's standpoint, queries are not always best posed using a structured language. By this we mean that approximate and flexible responses to natural-language-like queries are well suited to this domain. This is partly due to biologists' tendency to seek simpler interfaces and partly due to the fact that questions in biology involve high-level concepts that are open to interpretations computed using sophisticated tools. In such highly interpretive environments, rigidly structured databases do not always perform well. In this paper, our goal is to propose a semantic correspondence plug-in to aid natural language query processing over arbitrary biological database schemas, with the aim of providing cooperative responses to queries tailored to users' interpretations. Natural language interfaces for databases are generally effective when they are tuned to the underlying database schema and its semantics. Therefore, changes in database schema become impossible to support, or a substantial reorganization cost must be absorbed to reflect any change. We leverage developments in natural language parsing, rule languages and ontologies, and data integration technologies to assemble a prototype query processor that is able to transform a natural language query into a semantically equivalent structured query over the database. We allow knowledge rules and their frequent modifications as part of the underlying database schema. The approach we adopt in our plug-in overcomes some of the serious limitations of many contemporary natural language interfaces, including support for schema modifications and independence from the underlying database schema. 
The plug-in introduced in this paper is generic and facilitates connecting user selected natural language interfaces to arbitrary databases using a semantic description of the intended application. We demonstrate the feasibility of our approach with a practical example.
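
    The core move, rewriting a natural-language question into a structured query via externally supplied rules rather than hard-wired schema knowledge, can be sketched as follows. This is an illustration of the idea only: the rule patterns, table names (`interactions`, `go_terms`), and column names are invented, and the actual plug-in uses ontologies and a rule language rather than regular expressions.

    ```python
    import re

    # A swappable rule table: each rule pairs a question pattern with a
    # structured-query template over a hypothetical schema. Supporting a new
    # schema means supplying new rules, not rewriting the interface.
    RULES = [
        (re.compile(r"which genes interact with (\w+)", re.I),
         "SELECT gene FROM interactions WHERE partner = '{0}'"),
        (re.compile(r"what is the function of (\w+)", re.I),
         "SELECT annotation FROM go_terms WHERE gene = '{0}'"),
    ]

    def to_structured_query(question):
        for pattern, template in RULES:
            m = pattern.search(question)
            if m:
                return template.format(*m.groups())
        return None  # no rule matched; a real system would fall back gracefully

    print(to_structured_query("Which genes interact with TP53?"))
    # SELECT gene FROM interactions WHERE partner = 'TP53'
    ```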

  15. A Wittgenstein Approach to the Learning of OO-modeling

    NASA Astrophysics Data System (ADS)

    Holmboe, Christian

    2004-12-01

    The paper uses Ludwig Wittgenstein's theories about the relationship between thought, language, and objects of the world to explore the assumption that OO-thinking resembles natural thinking. The paper imports insights from linguistic philosophy into computer science education research. I show how UML class diagrams (i.e., an artificial context-free language) correspond to the logically perfect languages described in Tractatus Logico-Philosophicus. In Philosophical Investigations, Wittgenstein disputes his previous theories by showing that natural languages are not constructed by rules of mathematical logic, but are language games in which the meaning of a word is constructed through its use in social contexts. Contradicting the claim that OO-thinking is easy to learn because of its similarity to natural thinking, I claim that OO-thinking is difficult to learn because of its differences from natural thinking. The nature of these differences is not currently well known or appreciated. I suggest how explicit attention to the nature and implications of different language games may improve the teaching and learning of OO-modeling as well as programming.

  16. Storytelling, behavior planning, and language evolution in context.

    PubMed

    McBride, Glen

    2014-01-01

    An attempt is made to specify the structure of the hominin bands that began steps to language. Storytelling could evolve without need for language yet be strongly subject to natural selection and could provide a major feedback process in evolving language. A storytelling model is examined, including its effects on the evolution of consciousness and the possible timing of language evolution. Behavior planning is presented as a model of language evolution from storytelling. The behavior programming mechanism, in both directions, provides a model of creating and understanding behavior and language. Culture began with societies, then family evolution, family life in troops, but storytelling created a culture of experiences, a final step in the long process of achieving experienced adults by natural selection. Most language evolution occurred in conversations where evolving non-verbal feedback ensured mutual agreements on understanding. Natural language evolved in conversations with feedback providing understanding of changes.

  17. Storytelling, behavior planning, and language evolution in context

    PubMed Central

    McBride, Glen

    2014-01-01

    An attempt is made to specify the structure of the hominin bands that began steps to language. Storytelling could evolve without need for language yet be strongly subject to natural selection and could provide a major feedback process in evolving language. A storytelling model is examined, including its effects on the evolution of consciousness and the possible timing of language evolution. Behavior planning is presented as a model of language evolution from storytelling. The behavior programming mechanism, in both directions, provides a model of creating and understanding behavior and language. Culture began with societies, then family evolution, family life in troops, but storytelling created a culture of experiences, a final step in the long process of achieving experienced adults by natural selection. Most language evolution occurred in conversations where evolving non-verbal feedback ensured mutual agreements on understanding. Natural language evolved in conversations with feedback providing understanding of changes. PMID:25360123

  18. A Grammar-Based Semantic Similarity Algorithm for Natural Language Sentences

    PubMed Central

    Chang, Jia Wei; Hsieh, Tung Cheng

    2014-01-01

    This paper presents a grammar and semantic corpus based similarity algorithm for natural language sentences. Natural language, in opposition to “artificial language”, such as computer programming languages, is the language used by the general public for daily communication. Traditional information retrieval approaches, such as vector models, LSA, HAL, or even the ontology-based approaches that extend to include concept similarity comparison instead of co-occurrence terms/words, may fail to find a match when there is no obvious relation or concept overlap between two natural language sentences. This paper proposes a sentence similarity algorithm that takes advantage of corpus-based ontology and grammatical rules to overcome the addressed problems. Experiments on two famous benchmarks demonstrate that the proposed algorithm has a significant performance improvement in sentences/short-texts with arbitrary syntax and structure. PMID:24982952

  19. Xyce Parallel Electronic Simulator : reference guide, version 2.0.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hoekstra, Robert John; Waters, Lon J.; Rankin, Eric Lamont

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide.

  20. Effective Cyber Situation Awareness (CSA) Assessment and Training

    DTIC Science & Technology

    2013-11-01

    activity/scenario. y. Save Wireshark captures. z. Save SNORT logs. aa. Save MySQL databases. 4. After the completion of the scenario, the reversion... • Cisco ASA Parser: builds normalized vendor-neutral firewall rule specifications from Cisco ASA and PIX firewalls... The Service tool lets analysts build Cauldron models from either the command line or from custom Java code.

  1. Xyce™ Parallel Electronic Simulator Reference Guide Version 6.8

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Keiter, Eric R.; Aadithya, Karthik Venkatraman; Mei, Ting

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide.

  2. A Formal Model of Ambiguity and its Applications in Machine Translation

    DTIC Science & Technology

    2010-01-01

    structure indicates linguistically implausible segmentation that might be generated using dictionary-driven approaches... derivation. As was done in the monolingual case, the functions LHS, RHSi, RHSo and υ can be extended to a derivation δ. D(q), where q ∈ V, denotes the... monolingual parses. My algorithm runs more efficiently than O(n^6) with many grammars (including those that required using heuristic search with other parsers

  3. GENPLOT: A formula-based Pascal program for data manipulation and plotting

    NASA Astrophysics Data System (ADS)

    Kramer, Matthew J.

    Geochemical processes involving alteration, differentiation, fractionation, or migration of elements may be elucidated by a number of discrimination or variation diagrams (e.g., AFM, Harker, Pearce, and many others). The construction of these diagrams involves arithmetic combination of selective elements (involving major, minor, or trace elements). GENPLOT utilizes a formula-based algorithm (an expression parser) that enables the program to manipulate multiparameter databases and plot XY, ternary, tetrahedron, and REE-type plots without needing to change either the source code or rearrange databases. Formulae may be any quadratic expression whose variables are the column headings of the data matrix. A full-screen editor with limited equations and arithmetic functions (spreadsheet) has been incorporated into the program to aid data entry and editing. Data are stored as ASCII files to facilitate interchange of data between other programs and computers. GENPLOT was developed in Turbo Pascal for the IBM and compatible computers but is also available in Apple Pascal for the Apple IIe and III. Because the source code is too extensive to list here (about 5200 lines of Pascal code), the expression parsing routine, which is central to GENPLOT's flexibility, is incorporated into a smaller demonstration program named SOLVE. The following paper includes a discussion of how the expression parser works and a detailed description of GENPLOT's capabilities.
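
    The central trick, evaluating a user-typed formula whose variables are column headings, is what an expression parser buys over hard-coded axes. A minimal sketch (GENPLOT is Pascal; this Python version and its column names are illustrative, leaning on the standard `ast` module rather than a hand-written parser):

    ```python
    import ast
    import operator

    # Map AST operator nodes to arithmetic functions.
    OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv,
           ast.Pow: operator.pow, ast.USub: operator.neg}

    def eval_formula(formula, row):
        """Evaluate an arithmetic formula whose variables are column headings."""
        def ev(node):
            if isinstance(node, ast.Expression):
                return ev(node.body)
            if isinstance(node, ast.BinOp):
                return OPS[type(node.op)](ev(node.left), ev(node.right))
            if isinstance(node, ast.UnaryOp):
                return OPS[type(node.op)](ev(node.operand))
            if isinstance(node, ast.Constant):   # numeric literal
                return node.value
            if isinstance(node, ast.Name):       # column heading lookup
                return row[node.id]
            raise ValueError(f"unsupported syntax: {ast.dump(node)}")
        return ev(ast.parse(formula, mode="eval"))

    # e.g. an AFM-style alkali sum over made-up major-element columns:
    row = {"Na2O": 3.5, "K2O": 2.25, "FeO": 7.5, "MgO": 4.0}
    print(eval_formula("Na2O + K2O", row))  # 5.75
    print(eval_formula("FeO / (FeO + MgO)", row))
    ```

    Applying the same formula to every row of a data matrix yields plot axes, so new diagram types need only a new formula string, not new source code.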

  4. Deciphering the language of nature: cryptography, secrecy, and alterity in Francis Bacon.

    PubMed

    Clody, Michael C

    2011-01-01

    The essay argues that Francis Bacon's considerations of parables and cryptography reflect larger interpretative concerns of his natural philosophic project. Bacon describes nature as having a language distinct from those of God and man, and, in so doing, establishes a central problem of his natural philosophy—namely, how can the language of nature be accessed through scientific representation? Ultimately, Bacon's solution relies on a theory of differential and duplicitous signs that conceal within them the hidden voice of nature, which is best recognized in the natural forms of efficient causality. The "alphabet of nature"—those tables of natural occurrences—consequently plays a central role in his program, as it renders nature's language susceptible to a process and decryption that mirrors the model of the bilateral cipher. It is argued that while the writing of Bacon's natural philosophy strives for literality, its investigative process preserves a space for alterity within scientific representation, that is made accessible to those with the interpretative key.

  5. Natural language generation of surgical procedures.

    PubMed

    Wagner, J C; Rogers, J E; Baud, R H; Scherrer, J R

    1998-01-01

    The GALEN-IN-USE project has developed a compositional scheme for the conceptual representation of surgical operative procedure rubrics. The complex representations which result are translated back to surface language by a tool for multilingual natural language generation. This generator can be adapted to the specific characteristics of the scheme by introducing particular definitions of concepts and relationships. We discuss how the generator uses such definitions to bridge between the modelling 'style' of the GALEN scheme and natural language.

  6. Concepts and implementations of natural language query systems

    NASA Technical Reports Server (NTRS)

    Dominick, Wayne D. (Editor); Liu, I-Hsiung

    1984-01-01

    The currently developed user language interfaces of information systems are generally intended for serious users. These interfaces commonly ignore potentially the largest user group, i.e., casual users. This project discusses the concepts and implementations of a natural query language system which satisfy the nature and information needs of casual users by allowing them to communicate with the system in the form of their native (natural) language. In addition, a framework for the development of such an interface is also introduced for the MADAM (Multics Approach to Data Access and Management) system at the University of Southwestern Louisiana.

  7. Inferring heuristic classification hierarchies from natural language input

    NASA Technical Reports Server (NTRS)

    Hull, Richard; Gomez, Fernando

    1993-01-01

    A methodology is explained for inferring hierarchies representing heuristic knowledge about the checkout, control, and monitoring subsystem (CCMS) of the Space Shuttle launch processing system from natural language input. Our method identifies failures explicitly and implicitly described in natural language by domain experts and uses those descriptions to recommend classifications for inclusion in the experts' heuristic hierarchies.

  8. Natural Language Processing in Game Studies Research: An Overview

    ERIC Educational Resources Information Center

    Zagal, Jose P.; Tomuro, Noriko; Shepitsen, Andriy

    2012-01-01

    Natural language processing (NLP) is a field of computer science and linguistics devoted to creating computer systems that use human (natural) language as input and/or output. The authors propose that NLP can also be used for game studies research. In this article, the authors provide an overview of NLP and describe some research possibilities…

  9. Toward a Theory-Based Natural Language Capability in Robots and Other Embodied Agents: Evaluating Hausser's SLIM Theory and Database Semantics

    ERIC Educational Resources Information Center

    Burk, Robin K.

    2010-01-01

    Computational natural language understanding and generation have been a goal of artificial intelligence since McCarthy, Minsky, Rochester and Shannon first proposed to spend the summer of 1956 studying this and related problems. Although statistical approaches dominate current natural language applications, two current research trends bring…

  10. The Boolean Is Dead, Long Live the Boolean! Natural Language versus Boolean Searching in Introductory Undergraduate Instruction

    ERIC Educational Resources Information Center

    Lowe, M. Sara; Maxson, Bronwen K.; Stone, Sean M.; Miller, Willie; Snajdr, Eric; Hanna, Kathleen

    2018-01-01

    Boolean logic can be a difficult concept for first-year, introductory students to grasp. This paper compares the results of Boolean and natural language searching across several databases with searches created from student research questions. Performance differences between databases varied. Overall, natural language searching is at least as good as…

  11. A Framework for Representing and Jointly Reasoning over Linguistic and Non-Linguistic Knowledge

    ERIC Educational Resources Information Center

    Murugesan, Arthi

    2009-01-01

    Natural language poses several challenges to developing computational systems for modeling it. Natural language is not a precise problem but is rather ridden with a number of uncertainties in the form of either alternate words or interpretations. Furthermore, natural language is a generative system where the problem size is potentially infinite.…

  12. CONSTRUCT: In Search of a Theory of Meaning. Technical Report No. 238.

    ERIC Educational Resources Information Center

    Smith, R. L.; And Others

    A new language-processing system, CONSTRUCT, is described and defined as a question-answering system for elementary mathematical language using natural language input. The primary goal is said to be an attempt to reach a better understanding of the relationship between syntactic and semantic components of natural language. The "meaning…

  13. Human-Level Natural Language Understanding: False Progress and Real Challenges

    ERIC Educational Resources Information Center

    Bignoli, Perrin G.

    2013-01-01

    The field of Natural Language Processing (NLP) focuses on the study of how utterances composed of human-level languages can be understood and generated. Typically, there are considered to be three intertwined levels of structure that interact to create meaning in language: syntax, semantics, and pragmatics. Not only is a large amount of…

  14. Role of PROLOG (Programming and Logic) in natural-language processing. Report for September-December 1987

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McHale, M.L.

    The field of artificial intelligence strives to produce computer programs that exhibit intelligent behavior. One of the areas of interest is the processing of natural language. This report discusses the role of the computer language PROLOG in Natural Language Processing (NLP) from both theoretical and pragmatic viewpoints. The reasons for using PROLOG for NLP are numerous. First, linguists can write natural-language grammars almost directly as PROLOG programs; this allows fast prototyping of NLP systems and facilitates analysis of NLP theories. Second, semantic representations of natural-language texts that use logic formalisms are readily produced in PROLOG because of PROLOG's logical foundations. Third, PROLOG's built-in inferencing mechanisms are often sufficient for inferences on the logical forms produced by NLPs. Fourth, the logical, declarative nature of PROLOG may make it the language of choice for parallel computing systems. Finally, the fact that PROLOG has a de facto standard (Edinburgh) makes the porting of code from one computer system to another virtually trouble-free. Perhaps the strongest tie one could make between NLP and PROLOG was stated by John Stuart Mill in his Inaugural Address at St. Andrews: "The structure of every sentence is a lesson in logic."

  15. Modeling Memory for Language Understanding.

    DTIC Science & Technology

    1982-02-01

    Research on natural language understanding by computer has shown that the nature and organization of memory plays a central role in the... understanding mechanism. Further we claim that such reminding is at the root of how we learn. Issues such as these have played an important part in shaping the

  16. Knowledge-Based Extensible Natural Language Interface Technology Program

    DTIC Science & Technology

    1989-11-30

    natural language as its own meta-language to explain the meaning and attributes of the words and idioms of the language. Educational courses in language... understood and used by Lydia for human-computer dialogue. The KL enables a systems developer or "teacher-user" to build the system to a point where new... language can be "formal" as in a structured educational language program or it can be "informal" as in the case of a person consulting a dictionary for the

  17. Integration of Speech and Natural Language

    DTIC Science & Technology

    1988-04-01

    major activities: • Development of the syntax and semantics components for natural language processing. • Integration of the developed syntax and... evaluating the performance of speech recognition algorithms developed under the Strategic Computing Program. Our work on natural language processing... included the development of a grammar (syntax) that uses the Unification grammar formalism (an augmented context-free formalism). The Unification

  18. The language of nature matters: we need a more public ecology

    Treesearch

    Bruce R. Hull; David P. Robertson

    2000-01-01

    The language we use to describe nature matters. It is used by policy analysts to set goals for ecological restoration and management, by scientists to describe the nature that did, does, or could exist, and by all of us to imagine possible and acceptable conditions of environmental quality. Participants in environmental decision making demand a lot of the language and...

  19. Automatic Requirements Specification Extraction from Natural Language (ARSENAL)

    DTIC Science & Technology

    2014-10-01

    designers, implementers) involved in the design of software systems. However, natural language descriptions can be informal, incomplete, imprecise... communication of technical descriptions between the various stakeholders (e.g., customers, designers, implementers) involved in the design of software systems... the accuracy of the natural language processing stage, the degree of automation, and robustness to noise. 1 2 Introduction Software systems operate in

  20. Semi-Automated Methods for Refining a Domain-Specific Terminology Base

    DTIC Science & Technology

    2011-02-01

    only as a resource for written and oral translation, but also for Natural Language Processing (NLP) applications, text retrieval, document indexing... Natural Language Processing (NLP) applications, text retrieval, document indexing, and other knowledge management tasks. The objective of this... also for Natural Language Processing (NLP) applications, text retrieval (1), document indexing, and other knowledge management tasks. The National

  1. Bibliography of Research in Natural Language Generation

    DTIC Science & Technology

    1993-11-01

    on 1397] Barbara J. Grosz, Focusing and description in Artificial Intelligence (GWAI-88), Geseke, West natural language dialogues, In Joshi et al. (557... Proceedings of the Fifth Canadian Conference from information in a frame structure. Data and on Artificial Intelligence, pages Ŕ-24, London, Knowledge... generation workshops (IWNLGS, ENLGWS), natural language processing conferences (ANLP, TINLAP, SPEECH), artificial intelligence conferences (AAAI, SCA

  2. Research in Knowledge Representation for Natural Language Understanding

    DTIC Science & Technology

    1980-11-01

    artificial intelligence, natural language understanding, parsing, syntax, semantics, speaker meaning, knowledge representation, semantic networks... Report No. 4513, RESEARCH IN KNOWLEDGE REPRESENTATION FOR NATURAL LANGUAGE UNDERSTANDING, Annual Report 1 September 1979 to 31... understanding, knowledge representation, and knowledge-based inference. The work that we have been doing falls into three classes, successively motivated by

  3. Sociolinguistic Typology and Sign Languages.

    PubMed

    Schembri, Adam; Fenlon, Jordan; Cormier, Kearsy; Johnston, Trevor

    2018-01-01

    This paper examines the possible relationship between proposed social determinants of morphological 'complexity' and how this contributes to linguistic diversity, specifically via the typological nature of the sign languages of deaf communities. We sketch how the notion of morphological complexity, as defined by Trudgill (2011), applies to sign languages. Using these criteria, sign languages appear to be languages with low to moderate levels of morphological complexity. This may partly reflect the influence of key social characteristics of communities on the typological nature of languages. Although many deaf communities are relatively small and may involve dense social networks (both social characteristics that Trudgill claimed may lend themselves to morphological 'complexification'), the picture is complicated by the highly variable nature of the sign language acquisition for most deaf people, and the ongoing contact between native signers, hearing non-native signers, and those deaf individuals who only acquire sign languages in later childhood and early adulthood. These are all factors that may work against the emergence of morphological complexification. The relationship between linguistic typology and these key social factors may lead to a better understanding of the nature of sign language grammar. This perspective stands in contrast to other work where sign languages are sometimes presented as having complex morphology despite being young languages (e.g., Aronoff et al., 2005); in some descriptions, the social determinants of morphological complexity have not received much attention, nor has the notion of complexity itself been specifically explored.

  4. Understanding the Nature of Learners' Out-of-Class Language Learning Experience with Technology

    ERIC Educational Resources Information Center

    Lai, Chun; Hu, Xiao; Lyu, Boning

    2018-01-01

    Out-of-class learning with technology comprises an essential context of second language development. Understanding the nature of out-of-class language learning with technology is the initial step towards safeguarding its quality. This study examined the types of learning experiences that language learners engaged in outside the classroom and the…

  5. "Use Your Words:" Reconsidering the Language of Conflict in the Early Years

    ERIC Educational Resources Information Center

    Blank, Jolyn; Schneider, Jenifer Jasinski

    2011-01-01

    This article explores the nature of classroom conflict as language practice. The authors describe the enactment of conflict events in one kindergarten classroom and analyze the events in order to identify the language practices teachers use, considering teachers' desires for language use in relation to conflict and exploring the nature of the…

  6. Parent-Implemented Natural Language Paradigm to Increase Language and Play in Children with Autism

    ERIC Educational Resources Information Center

    Gillett, Jill N.; LeBlanc, Linda A.

    2007-01-01

    Three parents of children with autism were taught to implement the Natural Language Paradigm (NLP). Data were collected on parent implementation, multiple measures of child language, and play. The parents were able to learn to implement the NLP procedures quickly and accurately with beneficial results for their children. Increases in the overall…

  7. Beliefs about Language Learning in Study Abroad: Advocating for a Language Ideology Approach

    ERIC Educational Resources Information Center

    Surtees, Victoria

    2016-01-01

    Study Abroad (SA) has long enjoyed the unquestioning support of the general public, governments, and its benefits for language learning in many ways have been naturalized as "common sense" (Twombly et al., 2012). Language ideology scholars would say that this naturalization itself is indication that there are strong ideological forces at…

  8. Thought beyond language: neural dissociation of algebra and natural language.

    PubMed

    Monti, Martin M; Parsons, Lawrence M; Osherson, Daniel N

    2012-08-01

    A central question in cognitive science is whether natural language provides combinatorial operations that are essential to diverse domains of thought. In the study reported here, we addressed this issue by examining the role of linguistic mechanisms in forging the hierarchical structures of algebra. In a 3-T functional MRI experiment, we showed that processing of the syntax-like operations of algebra does not rely on the neural mechanisms of natural language. Our findings indicate that processing the syntax of language elicits the known substrate of linguistic competence, whereas algebraic operations recruit bilateral parietal brain regions previously implicated in the representation of magnitude. This double dissociation argues against the view that language provides the structure of thought across all cognitive domains.

  9. PPInterFinder--a mining tool for extracting causal relations on human proteins from literature.

    PubMed

    Raja, Kalpana; Subramani, Suresh; Natarajan, Jeyakumar

    2013-01-01

    One of the most common and challenging problems in biomedical text mining is to mine protein-protein interactions (PPIs) from MEDLINE abstracts and full-text research articles, because PPIs play a major role in understanding various biological processes and the impact of proteins in diseases. We implemented PPInterFinder, a web-based text mining tool to extract human PPIs from biomedical literature. PPInterFinder uses relation keyword co-occurrences with protein names to extract information on PPIs from MEDLINE abstracts and consists of three phases. First, it identifies the relation keyword using a parser with Tregex and a relation keyword dictionary. Next, it automatically identifies the candidate PPI pairs with a set of rules related to PPI recognition. Finally, it extracts the relations by matching the sentence with a set of 11 specific patterns based on the syntactic nature of the PPI pair. We find that PPInterFinder is capable of predicting PPIs with an accuracy of 66.05% on the AIMED corpus and outperforms most of the existing systems. DATABASE URL: http://www.biomining-bu.in/ppinterfinder/
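
    The first phase of the pipeline described above, relation-keyword co-occurrence with protein names, can be sketched in miniature. The keyword list and helper function below are invented for illustration; the actual tool uses Tregex over parse trees and a curated relation-keyword dictionary.

```python
import re

# Hypothetical miniature relation-keyword list (illustrative only).
RELATION_KEYWORDS = {"interacts", "binds", "phosphorylates", "activates"}

def find_candidate_ppis(sentence, protein_names):
    """Return (protein_a, keyword, protein_b) triples: a candidate PPI is
    a relation keyword with a protein mention on each side."""
    tokens = re.findall(r"\w+", sentence)
    proteins = [(i, t) for i, t in enumerate(tokens) if t in protein_names]
    keywords = [(i, t) for i, t in enumerate(tokens) if t.lower() in RELATION_KEYWORDS]
    triples = []
    for ki, kw in keywords:
        left = [p for i, p in proteins if i < ki]   # nearest protein before keyword
        right = [p for i, p in proteins if i > ki]  # nearest protein after keyword
        if left and right:
            triples.append((left[-1], kw, right[0]))
    return triples

print(find_candidate_ppis("BRCA1 interacts with BARD1 in vivo", {"BRCA1", "BARD1"}))
# → [('BRCA1', 'interacts', 'BARD1')]
```

    A real system would then filter these candidates with recognition rules and syntactic patterns, as the abstract describes.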

  10. PPInterFinder—a mining tool for extracting causal relations on human proteins from literature

    PubMed Central

    Raja, Kalpana; Subramani, Suresh; Natarajan, Jeyakumar

    2013-01-01

    One of the most common and challenging problems in biomedical text mining is to mine protein–protein interactions (PPIs) from MEDLINE abstracts and full-text research articles, because PPIs play a major role in understanding various biological processes and the impact of proteins in diseases. We implemented PPInterFinder, a web-based text mining tool to extract human PPIs from biomedical literature. PPInterFinder uses relation keyword co-occurrences with protein names to extract information on PPIs from MEDLINE abstracts and consists of three phases. First, it identifies the relation keyword using a parser with Tregex and a relation keyword dictionary. Next, it automatically identifies the candidate PPI pairs with a set of rules related to PPI recognition. Finally, it extracts the relations by matching the sentence with a set of 11 specific patterns based on the syntactic nature of the PPI pair. We find that PPInterFinder is capable of predicting PPIs with an accuracy of 66.05% on the AIMED corpus and outperforms most of the existing systems. Database URL: http://www.biomining-bu.in/ppinterfinder/ PMID:23325628

  11. A Large-Scale Analysis of Variance in Written Language.

    PubMed

    Johns, Brendan T; Jamieson, Randall K

    2018-01-22

    The collection of very large text sources has revolutionized the study of natural language, leading to the development of several models of language learning and distributional semantics that extract sophisticated semantic representations of words based on the statistical redundancies contained within natural language (e.g., Griffiths, Steyvers, & Tenenbaum; Jones & Mewhort; Landauer & Dumais; Mikolov, Sutskever, Chen, Corrado, & Dean). The models treat knowledge as an interaction of processing mechanisms and the structure of language experience. But language experience is often treated agnostically. We report a distributional semantic analysis showing that written language in fiction books varies appreciably between books from different genres, between books from the same genre, and even between books written by the same author. Given that current theories assume that word knowledge reflects an interaction between processing mechanisms and the language environment, the analysis shows the need for the field to engage in a more deliberate consideration and curation of the corpora used in computational studies of natural language processing. Copyright © 2018 Cognitive Science Society, Inc.
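
    The "statistical redundancies" such models exploit can be illustrated with the simplest distributional representation: a window-based co-occurrence count. This is a bare sketch of the general idea, not any of the cited models.

```python
from collections import Counter, defaultdict

def cooccurrence_vectors(tokens, window=2):
    """Represent each word by counts of words appearing within +/- window positions."""
    vectors = defaultdict(Counter)
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                vectors[w][tokens[j]] += 1
    return vectors

toks = "the cat sat on the mat".split()
v = cooccurrence_vectors(toks)
print(v["cat"]["sat"])  # → 1
```

    Corpus variance of the kind the article reports shows up directly in such vectors: the same word accumulates different neighbor counts in different books.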

  12. Defense Resource Planning Under Uncertainty: An Application of Robust Decision Making to Munitions Mix Planning

    DTIC Science & Technology

    2016-02-01

    In addition, the parser updates some parameters based on uncertainties. For example, Analytica was very slow to update Pk values based on...moderate range. The additional security environments helped to fill gaps in lower severity. Weapons Effectiveness Pk values were modified to account for two...project is to help improve the value and character of defense resource planning in an era of growing uncertainty and complex strategic challenges

  13. Units in the VO Version 1.0

    NASA Astrophysics Data System (ADS)

    Derriere, Sebastien; Gray, Norman; Demleitner, Markus; Louys, Mireille; Ochsenbein, Francois; Derriere, Sebastien; Gray, Norman

    2014-05-01

    This document describes a recommended syntax for writing the string representation of unit labels ("VOUnits"). In addition, it describes a set of recognised and deprecated units, which is as far as possible consistent with other relevant standards (BIPM, ISO/IEC and the IAU). The intention is that units written to conform to this specification will likely also be parsable by other well-known parsers. To this end, we include machine-readable grammars for other units syntaxes.
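
    As an illustration of what a unit-label parser does, here is a minimal sketch handling only a toy subset of dot-separated factors with `**` exponents. This is not the VOUnits grammar itself, just a small example of parsing a unit product string.

```python
import re

# Matches one unit factor, e.g. "km" or "s**-1" (toy subset only).
UNIT_RE = re.compile(r"(?P<name>[A-Za-z]+)(?:\*\*(?P<exp>-?\d+))?")

def parse_units(label):
    """Split a dot-separated unit product into (name, exponent) pairs."""
    factors = []
    for part in label.split("."):
        m = UNIT_RE.fullmatch(part)
        if m is None:
            raise ValueError(f"unparsable unit factor: {part!r}")
        factors.append((m.group("name"), int(m.group("exp") or 1)))
    return factors

print(parse_units("km.s**-1"))  # → [('km', 1), ('s', -1)]
```

    A conforming parser would additionally validate each factor name against the recognised-units list and handle prefixes, which this sketch omits.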

  14. Xyce parallel electronic simulator reference guide, Version 6.0.1.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.

    2014-01-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide [1]. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide [1].

  15. Xyce parallel electronic simulator reference guide, version 6.0.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.

    2013-08-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide [1]. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide [1].

  16. Interservice/Industry Training, Simulation and Education Conference Partnerships for Learning in the New Millennium Abstracts

    DTIC Science & Technology

    2000-01-01

    for flight test data, and both generic and specialized tools of data filtering, data calibration, modeling, system identification, and simulation...GRAMMATICAL MODEL AND PARSER FOR AIR TRAFFIC CONTROLLER'S COMMANDS 11 A SPEECH-CONTROLLED INTERACTIVE VIRTUAL ENVIRONMENT FOR SHIP FAMILIARIZATION 12...MODELING AND SIMULATION IN THE 21ST CENTURY 23 NEW COTS HARDWARE AND SOFTWARE REDUCE THE COST AND EFFORT IN REPLACING AGING FLIGHT SIMULATORS SUBSYSTEMS

  17. Criteria for Evaluating the Performance of Compilers

    DTIC Science & Technology

    1974-10-01

    cannot be made to fit, then an auxiliary mechanism outside the parser might be used. Finally, changing the choice of parsing technique to a...was not useful in providing a basis for compiler evaluation. The study of the first question established criteria and methods for assigning four...program. The study of the second question established criteria for defining a "compiler Gibson mix", and established methods for using this "mix" to

  18. CPP-TRS(C): On using visual cognitive symbols to enhance communication effectiveness

    NASA Technical Reports Server (NTRS)

    Tonfoni, Graziella

    1994-01-01

    Communicative Positioning Program/Text Representation Systems (CPP-TRS) is a visual language based on a system of 12 canvasses, 10 signals, and 14 symbols. CPP-TRS rests on the fact that every communication action is the result of a set of cognitive processes, and the whole system is built on the concept that communication can be enhanced by visually perceiving text. With a simple syntax, CPP-TRS is capable of representing meaning and intention as well as communication functions visually. These are precisely the invisible aspects of natural language that are most relevant to grasping the global meaning of a text. CPP-TRS reinforces natural language in human-machine interaction systems. It complements natural language by adding certain important elements that are not represented by natural language by itself. These include the communication intention and function of the text expressed by the sender, as well as the role the reader is supposed to play. The communication intention and function of a text and the reader's role are invisible in natural language because neither specific words nor punctuation conveys them sufficiently and unambiguously; they are therefore non-transparent.

  19. Linguistics and Information Science

    ERIC Educational Resources Information Center

    Montgomery, Christine A.

    1972-01-01

    This paper defines the relationship between linguistics and information science in terms of a common interest in natural language. The concept of a natural language information system is introduced as a framework for reviewing automated language processing efforts by computational linguists and information scientists. (96 references) (Author)

  20. The Exploring Nature of Definitions and Classifications of Language Learning Strategies (LLSs) in the Current Studies of Second/Foreign Language Learning

    ERIC Educational Resources Information Center

    Fazeli, Seyed Hossein

    2011-01-01

    This study aims to explore the nature of definitions and classifications of Language Learning Strategies (LLSs) in the current studies of second/foreign language learning in order to show the current problems regarding such definitions and classifications. The present study shows that there is not a universal agreeable definition and…

  1. An overview of computer-based natural language processing

    NASA Technical Reports Server (NTRS)

    Gevarter, W. B.

    1983-01-01

    Computer-based Natural Language Processing (NLP) is the key to enabling humans and their computer-based creations to interact with machines in natural language (like English, Japanese, or German, in contrast to formal computer languages). The doors that such an achievement can open have made this a major research area in Artificial Intelligence and Computational Linguistics. Commercial natural language interfaces to computers have recently entered the market, and the future looks bright for other applications as well. This report reviews the basic approaches to such systems, the techniques utilized, applications, the state of the art of the technology, issues and research requirements, the major participants, and finally, future trends and expectations. It is anticipated that this report will prove useful to engineering and research managers, potential users, and others who will be affected by this field as it unfolds.

  2. A System for Natural Language Sentence Generation.

    ERIC Educational Resources Information Center

    Levison, Michael; Lessard, Gregory

    1992-01-01

    Describes the natural language computer program, "Vinci." Explains that using an attribute grammar formalism, Vinci can simulate components of several current linguistic theories. Considers the design of the system and its applications in linguistic modelling and second language acquisition research. Notes Vinci's uses in linguistics…

  3. Neural Network Computing and Natural Language Processing.

    ERIC Educational Resources Information Center

    Borchardt, Frank

    1988-01-01

    Considers the application of neural network concepts to traditional natural language processing and demonstrates that neural network computing architecture can: (1) learn from actual spoken language; (2) observe rules of pronunciation; and (3) reproduce sounds from the patterns derived by its own processes. (Author/CB)

  4. The nature of the language input affects brain activation during learning from a natural language

    PubMed Central

    Plante, Elena; Patterson, Dianne; Gómez, Rebecca; Almryde, Kyle R.; White, Milo G.; Asbjørnsen, Arve E.

    2015-01-01

    Artificial language studies have demonstrated that learners are able to segment individual word-like units from running speech using the transitional probability information. However, this skill has rarely been examined in the context of natural languages, where stimulus parameters can be quite different. In this study, two groups of English-speaking learners were exposed to Norwegian sentences over the course of three fMRI scans. One group was provided with input in which transitional probabilities predicted the presence of target words in the sentences. This group quickly learned to identify the target words and fMRI data revealed an extensive and highly dynamic learning network. These results were markedly different from activation seen for a second group of participants. This group was provided with highly similar input that was modified so that word learning based on syllable co-occurrences was not possible. These participants showed a much more restricted network. The results demonstrate that the nature of the input strongly influenced the nature of the network that learners employ to learn the properties of words in a natural language. PMID:26257471
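
    The statistic driving this kind of segmentation learning is the forward transitional probability P(B|A) between adjacent syllables: within words it is high, across word boundaries it drops. A minimal sketch of the computation:

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Forward transitional probability P(b|a) for each adjacent syllable pair."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(a, b): c / first_counts[a] for (a, b), c in pair_counts.items()}

# A toy syllable stream in which "ki" always follows "to".
stream = ["to", "ki", "bu", "to", "ki", "gi", "to", "ki", "bu"]
tp = transitional_probabilities(stream)
print(tp[("to", "ki")])  # → 1.0
```

    In the modified input for the second group described above, such co-occurrence statistics would no longer predict word boundaries, which is what removed the cue.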

  5. Sociolinguistic Typology and Sign Languages

    PubMed Central

    Schembri, Adam; Fenlon, Jordan; Cormier, Kearsy; Johnston, Trevor

    2018-01-01

    This paper examines the possible relationship between proposed social determinants of morphological ‘complexity’ and how this contributes to linguistic diversity, specifically via the typological nature of the sign languages of deaf communities. We sketch how the notion of morphological complexity, as defined by Trudgill (2011), applies to sign languages. Using these criteria, sign languages appear to be languages with low to moderate levels of morphological complexity. This may partly reflect the influence of key social characteristics of communities on the typological nature of languages. Although many deaf communities are relatively small and may involve dense social networks (both social characteristics that Trudgill claimed may lend themselves to morphological ‘complexification’), the picture is complicated by the highly variable nature of the sign language acquisition for most deaf people, and the ongoing contact between native signers, hearing non-native signers, and those deaf individuals who only acquire sign languages in later childhood and early adulthood. These are all factors that may work against the emergence of morphological complexification. The relationship between linguistic typology and these key social factors may lead to a better understanding of the nature of sign language grammar. This perspective stands in contrast to other work where sign languages are sometimes presented as having complex morphology despite being young languages (e.g., Aronoff et al., 2005); in some descriptions, the social determinants of morphological complexity have not received much attention, nor has the notion of complexity itself been specifically explored. PMID:29515506

  6. Natural language generation of surgical procedures.

    PubMed

    Wagner, J C; Rogers, J E; Baud, R H; Scherrer, J R

    1999-01-01

    A number of compositional Medical Concept Representation systems are being developed. Although these provide for a detailed conceptual representation of the underlying information, they have to be translated back to natural language for use by end-users and applications. The GALEN programme has been developing one such representation, and we report here on a tool developed to generate natural language phrases from the GALEN conceptual representations. This tool can be adapted to different source modelling schemes and to different destination languages or sublanguages of a domain. It is based on a multilingual approach to natural language generation, realised through a clean separation of the domain model from the linguistic model and their link by well-defined structures. Specific knowledge structures and operations have been developed for bridging between the modelling 'style' of the conceptual representation and natural language. Using the example of the scheme developed for modelling surgical operative procedures within the GALEN-IN-USE project, we show how the generator is adapted to such a scheme. The basic characteristics of the surgical procedures scheme are presented together with the basic principles of the generation tool. Using worked examples, we discuss the transformation operations which change the initial source representation into a form which can more directly be translated to a given natural language. In particular, the linguistic knowledge which has to be introduced, such as definitions of concepts and relationships, is described. We explain the overall generator strategy and how particular transformation operations are triggered by language-dependent and conceptual parameters. Results are shown for generated French phrases corresponding to surgical procedures from the urology domain.

  7. State of the Art of Natural Language Processing

    DTIC Science & Technology

    1987-11-15

    work of Chomsky, Hewlett-Packard, Generalized Phrase Structure Grammar. D. Lunar, DARPA speech understanding, Schank's Conceptual Dependency Theory...of computers that a machine which understood natural languages was highly desirable. It also was evident from the work of Chomsky and others that...computers. Noam Chomsky, Aspects of the Theory of Syntax (Cambridge, Mass.: MIT Press, 1965). One of the earliest attempts at Natural Language

  8. Subgroups in Language Trajectories from 4 to 11 Years: The Nature and Predictors of Stable, Improving and Decreasing Language Trajectory Groups

    ERIC Educational Resources Information Center

    McKean, Cristina; Wraith, Darren; Eadie, Patricia; Cook, Fallon; Mensah, Fiona; Reilly, Sheena

    2017-01-01

    Background: Little is known about the nature, range and prevalence of different subgroups in language trajectories extant in a population from 4 to 11 years. This hinders strategic targeting and design of interventions, particularly targeting those whose difficulties will likely persist. Methods: Children's language abilities from 4 to 11 years…

  9. Language Teaching with the Help of Multiple Methods. Collection d'"Etudes linguistiques," No. 21.

    ERIC Educational Resources Information Center

    Nivette, Jos, Ed.

    This book presents articles on language teaching media. Among the titles are: (1) "Il Foreign Language Teaching e l'impiego degli audio-visivi" (Foreign Language Teaching and the Use of Audio Visual Methods) by D'Agostino, (2) "Le role et la nature de l'image dans l'enseignement programme de l'anglais, langue seconde" (The Role and Nature of the…

  10. Taxa: An R package implementing data standards and methods for taxonomic data

    PubMed Central

    Foster, Zachary S.L.; Chamberlain, Scott; Grünwald, Niklaus J.

    2018-01-01

    The taxa R package provides a set of tools for defining and manipulating taxonomic data. The recent and widespread application of DNA sequencing to community composition studies is making large data sets with taxonomic information commonplace. However, compared to typical tabular data, this information is encoded in many different ways and the hierarchical nature of taxonomic classifications makes it difficult to work with. There are many R packages that use taxonomic data to varying degrees but there is currently no cross-package standard for how this information is encoded and manipulated. We developed the R package taxa to provide a robust and flexible solution to storing and manipulating taxonomic data in R and any application-specific information associated with it. Taxa provides parsers that can read common sources of taxonomic information (taxon IDs, sequence IDs, taxon names, and classifications) from nearly any format while preserving associated data. Once parsed, the taxonomic data and any associated data can be manipulated using a cohesive set of functions modeled after the popular R package dplyr. These functions take into account the hierarchical nature of taxa and can modify the taxonomy or associated data in such a way that both are kept in sync. Taxa is currently being used by the metacoder and taxize packages, which provide broadly useful functionality that we hope will speed adoption by users and developers. PMID:29707201
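
    The package itself is R, but its core idea, keeping a hierarchical taxonomy and its per-taxon data in sync so that modifying one updates the other, can be sketched in Python. The class and method names below are invented for illustration and are not the taxa API.

```python
class TaxonomyTable:
    """Toy taxonomy: removing a subtree also removes its associated data rows."""

    def __init__(self, parents, data):
        self.parents = dict(parents)  # taxon -> parent taxon (None for root)
        self.data = dict(data)        # taxon -> associated record

    def subtree(self, root):
        """All taxa at or below `root`, found by repeated parent lookups."""
        found = {root}
        changed = True
        while changed:
            changed = False
            for taxon, parent in self.parents.items():
                if parent in found and taxon not in found:
                    found.add(taxon)
                    changed = True
        return found

    def remove_subtree(self, root):
        """Drop a taxon and its descendants, keeping taxonomy and data in sync."""
        for taxon in self.subtree(root):
            self.parents.pop(taxon, None)
            self.data.pop(taxon, None)

tax = TaxonomyTable(
    parents={"Fungi": None, "Ascomycota": "Fungi", "Basidiomycota": "Fungi"},
    data={"Ascomycota": 120, "Basidiomycota": 80},
)
tax.remove_subtree("Basidiomycota")
print(sorted(tax.data))  # → ['Ascomycota']
```

    The dplyr-style functions mentioned in the abstract generalize exactly this kind of synchronized filter over both structures.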

  11. Apprentissage naturel et apprentissage guide (Natural Learning and Guided Learning).

    ERIC Educational Resources Information Center

    Veronique, Daniel

    1984-01-01

    Although second language pedagogy has tended increasingly toward simulation, role-playing, and natural communication, it has not profited from existing research on natural learning in second languages. The emphasis should be on understanding how the processes of guided learning and natural learning differ, psychologically and sociologically, and…

  12. Analysis of the English morphology by semantic networks

    NASA Astrophysics Data System (ADS)

    Žáček, Martin; Homola, Dan

    2017-11-01

    The article is devoted to the study of the morphology of natural language, in this case English. The language is examined from the perspective of knowledge representation, where we look at the word as a concept in concept languages. The research concerns the relationships between individual words and their classification in the sentence. Several methods are used for the analysis (syntax, lexical categories, morphology). This article focuses mainly on the word, as the foundation of every natural language (English).

  13. The Limited Role of Number of Nested Syntactic Dependencies in Accounting for Processing Cost: Evidence from German Simplex and Complex Verbal Clusters

    PubMed Central

    Bader, Markus

    2018-01-01

    This paper presents three acceptability experiments investigating German verb-final clauses in order to explore possible sources of sentence complexity during human parsing. The point of departure was De Vries et al.'s (2011) generalization that sentences with three or more crossed or nested dependencies are too complex for being processed by the human parsing mechanism without difficulties. This generalization is partially based on findings from Bach et al. (1986) concerning the acceptability of complex verb clusters in German and Dutch. The first experiment tests this generalization by comparing two sentence types: (i) sentences with three nested dependencies within a single clause that contains three verbs in a complex verb cluster; (ii) sentences with four nested dependencies distributed across two embedded clauses, one center-embedded within the other, each containing a two-verb cluster. The results show that sentences with four nested dependencies are judged as acceptable as control sentences with only two nested dependencies, whereas sentences with three nested dependencies are judged as only marginally acceptable. This argues against De Vries et al.'s (2011) claim that the human parser can process no more than two nested dependencies. The results are used to refine the Verb-Cluster Complexity Hypothesis of Bader and Schmid (2009a). The second and the third experiment investigate sentences with four nested dependencies in more detail in order to explore alternative sources of sentence complexity: the number of predicted heads to be held in working memory (storage cost in terms of the Dependency Locality Theory [DLT], Gibson, 2000) and the length of the involved dependencies (integration cost in terms of the DLT). Experiment 2 investigates sentences for which storage cost and integration cost make conflicting predictions. The results show that storage cost outweighs integration cost. 
Experiment 3 shows that increasing integration cost in sentences with two degrees of center embedding leads to decreased acceptability. Taken together, the results argue in favor of a multifactorial account of the limitations on center embedding in natural languages. PMID:29410633

  14. The Limited Role of Number of Nested Syntactic Dependencies in Accounting for Processing Cost: Evidence from German Simplex and Complex Verbal Clusters.

    PubMed

    Bader, Markus

    2017-01-01

    This paper presents three acceptability experiments investigating German verb-final clauses in order to explore possible sources of sentence complexity during human parsing. The point of departure was De Vries et al.'s (2011) generalization that sentences with three or more crossed or nested dependencies are too complex for being processed by the human parsing mechanism without difficulties. This generalization is partially based on findings from Bach et al. (1986) concerning the acceptability of complex verb clusters in German and Dutch. The first experiment tests this generalization by comparing two sentence types: (i) sentences with three nested dependencies within a single clause that contains three verbs in a complex verb cluster; (ii) sentences with four nested dependencies distributed across two embedded clauses, one center-embedded within the other, each containing a two-verb cluster. The results show that sentences with four nested dependencies are judged as acceptable as control sentences with only two nested dependencies, whereas sentences with three nested dependencies are judged as only marginally acceptable. This argues against De Vries et al.'s (2011) claim that the human parser can process no more than two nested dependencies. The results are used to refine the Verb-Cluster Complexity Hypothesis of Bader and Schmid (2009a). The second and the third experiment investigate sentences with four nested dependencies in more detail in order to explore alternative sources of sentence complexity: the number of predicted heads to be held in working memory (storage cost in terms of the Dependency Locality Theory [DLT], Gibson, 2000) and the length of the involved dependencies (integration cost in terms of the DLT). Experiment 2 investigates sentences for which storage cost and integration cost make conflicting predictions. The results show that storage cost outweighs integration cost. 
Experiment 3 shows that increasing integration cost in sentences with two degrees of center embedding leads to decreased acceptability. Taken together, the results argue in favor of a multifactorial account of the limitations on center embedding in natural languages.

  15. Effects of Tasks on BOLD Signal Responses to Sentence Contrasts: Review and Commentary

    PubMed Central

    Caplan, David; Gow, David

    2010-01-01

    Functional neuroimaging studies of syntactic processing have been interpreted as identifying the neural locations of parsing and interpretive operations. However, current behavioral studies of sentence processing indicate that many operations occur simultaneously with parsing and interpretation. In this review, we point to issues that arise in discriminating the effects of these concurrent processes from those of the parser/interpreter in neural measures and to approaches that may help resolve them. PMID:20932562

  16. Analysis of the Impact of Data Normalization on Cyber Event Correlation Query Performance

    DTIC Science & Technology

    2012-03-01

    2003). Organizations use it in planning, target marketing, decision-making, data analysis, and customer services (Shin, 2003). Organizations that...Following this IP address is a router message sequence number. This is a globally unique number for each router terminal and can range from...Appendix G, invokes the PERL parser for the log files from a particular USAF base, and invokes the CTL file that loads the resultant CSV file into the

  17. Open Source Software Projects Needing Security Investments

    DTIC Science & Technology

    2015-06-19

    modtls, BouncyCastle, gpg, otr, axolotl. 7. Static analyzers: Clang, Frama-C. 8. Nginx. 9. OpenVPN . It was noted that the funding model may be similar...to OpenSSL, where consulting funds the company. It was also noted that OpenVPN needs to correctly use OpenSSL in order to be secure, so focusing on...Dovecot 4. Other high-impact network services: OpenSSH, OpenVPN , BIND, ISC DHCP, University of Delaware NTPD 5. Core infrastructure data parsers

  18. Xyce parallel electronic simulator reference guide, version 6.1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.

    2014-03-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide [1]. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide [1].

  19. System Data Model (SDM) Source Code

    DTIC Science & Technology

    2012-08-23

    CROSS_COMPILE=/opt/gumstix/build_arm_nofpu/staging_dir/bin/arm-linux-uclibcgnueabi- CC=$(CROSS_COMPILE)gcc CXX=$(CROSS_COMPILE)g++ AR...and flags to pass to it LEX=flex LEXFLAGS=-B ## The parser generator to invoke and flags to pass to it YACC=bison YACCFLAGS...# Point to default PetaLinux root directory ifndef ROOTDIR ROOTDIR=$(PETALINUX)/software/petalinux-dist endif PATH:=$(PATH

  20. Understanding and Capturing People’s Mobile App Privacy Preferences

    DTIC Science & Technology

    2013-10-28

    The entire apps’ metadata takes up about 500MB of storage space when stored in a MySQL database and all the binary files take approximately 300GB of...functionality that can de- compile Dalvik bytecodes to Java source code faster than other de-compilers. Given the scale of the app analysis we planned on... java libraries, such as parser, sql connectors, etc Targeted Ads 137 admob, adwhirl, greystripe… Provided by mobile behavioral ads company to

  1. DSS 13 Microprocessor Antenna Controller

    NASA Technical Reports Server (NTRS)

    Gosline, R. M.

    1984-01-01

    A microprocessor-based antenna controller system developed as part of the unattended station project for DSS 13 is described. Both the hardware and software top-level designs are presented and the major problems encountered are discussed. Developments useful to related projects include a JPL standard 15-line interface using a single-board computer, a general-purpose parser, a fast floating-point-to-ASCII conversion technique, and experience gained in using off-board floating-point processors with the 8080 CPU.

  2. Intelligent Information Retrieval for a Multimedia Database Using Captions

    DTIC Science & Technology

    1992-07-23

    The user was allowed to retrieve any of several multimedia types depending on the descriptors entered. An example mentioned was the assembly of a...statistics showed some performance improvements over a keyword search. Similar work was described by Wong et al. (1987), where a vector space representation...(keyword) lists for searching the lexicon (a syntactic parser is not used); a type hierarchy of terms was used in the process. The system then checked the

  3. Extracting BI-RADS Features from Portuguese Clinical Texts

    PubMed Central

    Nassif, Houssam; Cunha, Filipe; Moreira, Inês C.; Cruz-Correia, Ricardo; Sousa, Eliana; Page, David; Burnside, Elizabeth; Dutra, Inês

    2013-01-01

    In this work we build the first BI-RADS parser for Portuguese free texts, modeled after existing approaches to extract BI-RADS features from English medical records. Our concept finder uses a semantic grammar based on the BI-RADS lexicon and on iteratively transferred expert knowledge. We compare the performance of our algorithm to manual annotation by a specialist in mammography. Our results show that our parser’s performance is comparable to the manual method. PMID:23797461

  4. Reconciliation of ontology and terminology to cope with linguistics.

    PubMed

    Baud, Robert H; Ceusters, Werner; Ruch, Patrick; Rassinoux, Anne-Marie; Lovis, Christian; Geissbühler, Antoine

    2007-01-01

    To discuss the relationships between ontologies, terminologies and language in the context of Natural Language Processing (NLP) applications, in order to show the negative consequences of confusing them. The viewpoints of the terminologist and the (computational) linguist are developed separately and then compared, leading to a reconciliation of these points of view that takes into account the role of the ontologist. In order to encourage appropriate usage of terminologies, guidelines are presented advocating the simultaneous publication of pragmatic vocabularies supported by terminological material based on adequate ontological analysis. Ontologies, terminologies and natural languages each have their own purpose. Ontologies support machine understanding, natural languages support human communication, and terminologies should form the bridge between them. Therefore, future terminology standards should be based on sound ontology and do justice to the diversity of natural languages. Moreover, they should support local vocabularies, in order to be easily adaptable to local needs and practices.

  5. Dynamical Systems in Psychology: Linguistic Approaches

    NASA Astrophysics Data System (ADS)

    Sulis, William

    Major goals for psychoanalysis and psychology are the description, analysis, prediction, and control of behaviour. Natural language has long provided the medium for the formulation of our theoretical understanding of behaviour. But with the advent of nonlinear dynamics, a new language has appeared which promises a quantitative theory of behaviour. In this paper, some of the limitations of natural and formal languages are discussed. Several approaches to understanding the links between natural and formal languages, as applied to the study of behaviour, are discussed. These include symbolic dynamics, Moore's generalized shifts, Crutchfield's ε-machines, and dynamical automata.

  6. Getting Answers to Natural Language Questions on the Web.

    ERIC Educational Resources Information Center

    Radev, Dragomir R.; Libner, Kelsey; Fan, Weiguo

    2002-01-01

    Describes a study that investigated the use of natural language questions on Web search engines. Highlights include query languages; differences in search engine syntax; and results of logistic regression and analysis of variance that showed aspects of questions that predicted significantly different performances, including the number of words,…

  7. Structured Natural-Language Descriptions for Semantic Content Retrieval of Visual Materials.

    ERIC Educational Resources Information Center

    Tam, A. M.; Leung, C. H. C.

    2001-01-01

    Proposes a structure for natural language descriptions of the semantic content of visual materials that requires descriptions to be (modified) keywords, phrases, or simple sentences, with components that are grammatical relations common to many languages. This structure makes it easy to implement a collection's descriptions as a relational…

  8. Integrating a Natural Language Message Pre-Processor with UIMA

    DTIC Science & Technology

    2008-01-01

    Integrating a Natural Language Message Pre-Processor with UIMA. Eric Nyberg, Eric Riebling, Richard C. Wang & Robert Frederking, Language Technologies Institute, Carnegie Mellon. Copyright © 2008, Carnegie Mellon. All Rights Reserved.

  9. Semantic Grammar: An Engineering Technique for Constructing Natural Language Understanding Systems.

    ERIC Educational Resources Information Center

    Burton, Richard R.

    In an attempt to overcome the lack of natural means of communication between student and computer, this thesis addresses the problem of developing a system which can understand natural language within an educational problem-solving environment. The nature of the environment imposes efficiency, habitability, self-teachability, and awareness of…

  10. Language Revitalization.

    ERIC Educational Resources Information Center

    Hinton, Leanne

    2003-01-01

    Surveys developments in language revitalization and language death. Focusing on indigenous languages, discusses the role and nature of appropriate linguistic documentation, possibilities for bilingual education, and methods of promoting oral fluency and intergenerational transmission in affected languages. (Author/VWL)

  11. Three-dimensional grammar in the brain: Dissociating the neural correlates of natural sign language and manually coded spoken language.

    PubMed

    Jednoróg, Katarzyna; Bola, Łukasz; Mostowski, Piotr; Szwed, Marcin; Boguszewski, Paweł M; Marchewka, Artur; Rutkowski, Paweł

    2015-05-01

    In several countries natural sign languages were considered inadequate for education. Instead, new sign-supported systems were created, based on the belief that spoken/written language is grammatically superior. One such system, called SJM (system językowo-migowy), preserves the grammatical and lexical structure of spoken Polish and since the 1960s has been extensively employed in schools and on TV. Nevertheless, the Deaf community avoids using SJM for everyday communication, its preferred language being PJM (polski język migowy), a natural sign language, structurally and grammatically independent of spoken Polish and featuring classifier constructions (CCs). Here, for the first time, we compare, using fMRI, the neural bases of natural vs. devised communication systems. Deaf signers were presented with three types of signed sentences (SJM and PJM with/without CCs). Consistent with previous findings, PJM with CCs, compared to either SJM or PJM without CCs, recruited the parietal lobes. The reverse comparison revealed activation in the anterior temporal lobes, suggesting increased semantic combinatory processes in lexical sign comprehension. Finally, PJM compared with SJM engaged the left posterior superior temporal gyrus and anterior temporal lobe, areas crucial for sentence-level speech comprehension. We suggest that activity in these two areas reflects greater processing efficiency for naturally evolved sign language. Copyright © 2015 Elsevier Ltd. All rights reserved.

  12. Clinical Natural Language Processing in languages other than English: opportunities and challenges.

    PubMed

    Névéol, Aurélie; Dalianis, Hercules; Velupillai, Sumithra; Savova, Guergana; Zweigenbaum, Pierre

    2018-03-30

    Natural language processing applied to clinical text or aimed at a clinical outcome has been thriving in recent years. This paper offers the first broad overview of clinical Natural Language Processing (NLP) for languages other than English. Recent studies are summarized to offer insights and outline opportunities in this area. We envision three groups of intended readers: (1) NLP researchers leveraging experience gained in other languages, (2) NLP researchers faced with establishing clinical text processing in a language other than English, and (3) clinical informatics researchers and practitioners looking for resources in their languages in order to apply NLP techniques and tools to clinical practice and/or investigation. We review work in clinical NLP in languages other than English. We classify these studies into three groups: (i) studies describing the development of new NLP systems or components de novo, (ii) studies describing the adaptation of NLP architectures developed for English to another language, and (iii) studies focusing on a particular clinical application. We show the advantages and drawbacks of each method, and highlight the appropriate application context. Finally, we identify major challenges and opportunities that will affect the impact of NLP on clinical practice and public health studies in a context that encompasses English as well as other languages.

  13. Paradigms of Evaluation in Natural Language Processing: Field Linguistics for Glass Box Testing

    ERIC Educational Resources Information Center

    Cohen, Kevin Bretonnel

    2010-01-01

    Although software testing has been well-studied in computer science, it has received little attention in natural language processing. Nonetheless, a fully developed methodology for glass box evaluation and testing of language processing applications already exists in the field methods of descriptive linguistics. This work lays out a number of…

  14. Semantics of Context-Free Fragments of Natural Languages.

    ERIC Educational Resources Information Center

    Suppes, Patrick

    The objective of this paper is to combine the viewpoint of model-theoretic semantics and generative grammar, to define semantics for context-free languages, and to apply the results to some fragments of natural language. Following the introduction in the first section, Section 2 describes a simple artificial example to illustrate how a semantic…

  15. Development and Evaluation of a Thai Learning System on the Web Using Natural Language Processing.

    ERIC Educational Resources Information Center

    Dansuwan, Suyada; Nishina, Kikuko; Akahori, Kanji; Shimizu, Yasutaka

    2001-01-01

    Describes the Thai Learning System, which is designed to help learners acquire the Thai word order system. The system facilitates the lessons on the Web using HyperText Markup Language and Perl programming, which interfaces with natural language processing by means of Prolog. (Author/VWL)

  16. Syntactic Complexity and Ambiguity Resolution in a Free Word Order Language: Behavioral and Electrophysiological Evidences from Basque

    ERIC Educational Resources Information Center

    Erdocia, Kepa; Laka, Itziar; Mestres-Misse, Anna; Rodriguez-Fornells, Antoni

    2009-01-01

    In natural languages some syntactic structures are simpler than others. Syntactically complex structures require further computation that is not required by syntactically simple structures. In particular, canonical, basic word order represents the simplest sentence-structure. Natural languages have different canonical word orders, and they vary in…

  17. A Diagrammatic Language for Biochemical Networks

    NASA Astrophysics Data System (ADS)

    Maimon, Ron

    2002-03-01

    I present a diagrammatic language for representing the structure of biochemical networks. The language is designed to represent modular structure in a computational fashion, with composition of reactions replacing functional composition. This notation is used to represent arbitrarily large networks efficiently. The notation finds its most natural use in representing biological interaction networks, but it is a general computing language appropriate to any naturally occurring computation. Unlike lambda-calculus, or text-derived languages, it does not impose a tree structure on the diagrams, and so is more effective at representing biological function than competing notations.

  18. Caregiver communication to the child as moderator and mediator of genes for language.

    PubMed

    Onnis, Luca

    2017-05-15

    Human language appears to be unique among natural communication systems, and such uniqueness impinges on both nature and nurture. Human babies are endowed with cognitive abilities that predispose them to learn language, but this process cannot operate in an impoverished environment. To be complete, the acquisition of human language in children requires highly socialised forms of learning, scaffolded over years of prolonged and intense caretaker-child interactions. How genes and environment operate in shaping language is unknown. These two components have traditionally been considered independent, and are often pitted against each other in the nature versus nurture debate. This perspective article considers how innate abilities and experience might instead work together. In particular, it envisages potential scenarios for research in which early caregiver verbal and non-verbal attachment practices may mediate or moderate the expression of human genetic systems for language. Copyright © 2017 Elsevier B.V. All rights reserved.

  19. Ethical dilemmas experienced by speech-language pathologists working in private practice.

    PubMed

    Flatley, Danielle R; Kenny, Belinda J; Lincoln, Michelle A

    2014-06-01

    Speech-language pathologists experience ethical dilemmas as they fulfil their professional roles and responsibilities. Previous research findings indicated that speech-language pathologists working in publicly funded settings identified ethical dilemmas when they managed complex clients, negotiated professional relationships, and addressed service delivery issues. However, little is known about ethical dilemmas experienced by speech-language pathologists working in private practice settings. The aim of this qualitative study was to describe the nature of ethical dilemmas experienced by speech-language pathologists working in private practice. Data were collected through semi-structured interviews with 10 speech-language pathologists employed in diverse private practice settings. Participants explained the nature of ethical dilemmas they experienced at work and identified their most challenging and frequently occurring ethical conflicts. Qualitative content analysis was used to analyse transcribed data and generate themes. Four themes reflected the nature of speech-language pathologists' ethical dilemmas: balancing benefit and harm, fidelity of business practices, distributing funds, and personal and professional integrity. Findings support the need for professional development activities that are specifically targeted towards facilitating ethical practice for speech-language pathologists in the private sector.

  20. Dependency distance: A new perspective on syntactic patterns in natural languages

    NASA Astrophysics Data System (ADS)

    Liu, Haitao; Xu, Chunshan; Liang, Junying

    2017-07-01

    Dependency distance, measured by the linear distance between two syntactically related words in a sentence, is generally held to be an important index of memory burden and an indicator of syntactic difficulty. Since this memory constraint is common to all human beings, there may well be a universal preference for dependency distance minimization (DDM) for the sake of reducing memory burden. This human-driven language universal is supported by big-data analyses of various corpora, which consistently report shorter overall dependency distances in natural languages than in artificial random languages, as well as long-tailed distributions featuring a majority of short dependencies and a minority of long ones. Human languages, as complex systems, seem to have evolved to come up with diverse syntactic patterns under the universal pressure for dependency distance minimization. However, there always exist a small number of long-distance dependencies in natural languages, which may reflect other biological or functional constraints. The language system may adapt itself to these sporadic long-distance dependencies. It is these universal constraints that have shaped such a rich diversity of syntactic patterns in human languages.
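    The measure described in this record can be sketched in a few lines of Python. The example sentence and its head indices are illustrative choices, not data from the paper:

```python
# Minimal sketch of dependency distance: the absolute difference between
# the linear positions of a dependent word and its syntactic head.

def mean_dependency_distance(heads):
    """heads[i] is the 1-based position of the head of word i+1;
    0 marks the root, which contributes no dependency."""
    distances = [abs((i + 1) - h) for i, h in enumerate(heads) if h != 0]
    return sum(distances) / len(distances)

# "She ate the apple": She->ate, ate=root, the->apple, apple->ate
# distances 1, 1, 2 -> mean 4/3
print(mean_dependency_distance([2, 0, 4, 2]))
```

Comparing this mean against the same sentence with randomly permuted word order is, in essence, how the corpus studies cited above detect the minimization effect.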

  1. BIBLIOGRAPHY ON LANGUAGE DEVELOPMENT.

    ERIC Educational Resources Information Center

    Harvard Univ., Cambridge, MA. Graduate School of Education.

    THIS BIBLIOGRAPHY LISTS MATERIAL ON VARIOUS ASPECTS OF LANGUAGE DEVELOPMENT. APPROXIMATELY 65 UNANNOTATED REFERENCES ARE PROVIDED TO DOCUMENTS DATING FROM 1958 TO 1966. JOURNALS, BOOKS, AND REPORT MATERIALS ARE LISTED. SUBJECT AREAS INCLUDED ARE THE NATURE OF LANGUAGE, LINGUISTICS, LANGUAGE LEARNING, LANGUAGE SKILLS, LANGUAGE PATTERNS, AND…

  2. The Organization of Knowledge in a Multi-Lingual, Integrated Parser.

    DTIC Science & Technology

    1984-11-01

    presunto maniático sexual que dio muerte a golpes y a puñaladas a una mujer de 55 años, informaron fuentes allegadas a la investigación. Literally in...el hospital la joven Rosa Areas, la que fue herida de bala por un uniformado. English: Rosa Areas is still in the hospital after being shot and wounded...by a soldier. In this sentence, the subject, "joven" (young person), is found after the verb, "se encuentra" (finds herself). To handle situations

  3. Extract and visualize geolocation from any text file

    NASA Astrophysics Data System (ADS)

    Boustani, M.

    2015-12-01

    There are a variety of text file formats, such as PDF and HTML, that contain references to locations (countries, cities, regions, and more). GeoParser was developed as one of the sub-projects under DARPA Memex to help find geolocation information in crawled website data. It is a web application that uses Apache Tika to extract locations from any text file format and visualize the geolocations on a map. https://github.com/MBoustani/GeoParser https://github.com/chrismattmann/tika-python http://www.darpa.mil/program/memex

  4. Numerical Function Generators Using LUT Cascades

    DTIC Science & Technology

    2007-06-01

    either algebraically (for example, sin(x)) or as a table of input/output values. The user defines the numerical function by using the syntax of Scilab...defined function in Scilab or specify it directly. Note that, by changing the parser of our system, any format can be used for the design entry. First...Methods for Multiple-Valued Input Address Generators,” Proc. 36th IEEE Int’l Symp. Multiple-Valued Logic (ISMVL ’06), May 2006. [29] Scilab 3.0, INRIA-ENPC

  5. Catalog Descriptions Using VOTable Files

    NASA Astrophysics Data System (ADS)

    Thompson, R.; Levay, K.; Kimball, T.; White, R.

    2008-08-01

    Additional information is frequently required to describe database table contents and make them understandable to users. For this reason, the Multimission Archive at Space Telescope (MAST) creates “description files” for each table/catalog. After trying various XML and CSV formats, we finally chose VOTable. These files are easy to update via an HTML form, are easily read using an XML parser such as (in our case) the PHP5 SimpleXML extension, and have found multiple uses in our data access/retrieval process.
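    As an illustration of how little code such description files demand, here is a minimal sketch that reads a hypothetical, stripped-down VOTable fragment with Python's standard-library XML parser (the record itself uses the PHP5 SimpleXML extension; the element names follow the VOTable convention, but this fragment is invented for the example):

```python
# Sketch: pull field names and units out of a VOTable-style description file.
import xml.etree.ElementTree as ET

# Hypothetical, minimal VOTable fragment (not a real MAST description file).
votable = """<VOTABLE>
  <RESOURCE>
    <TABLE name="catalog">
      <FIELD name="ra" datatype="double" unit="deg"/>
      <FIELD name="dec" datatype="double" unit="deg"/>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""

root = ET.fromstring(votable)
fields = [(f.get("name"), f.get("unit")) for f in root.iter("FIELD")]
print(fields)  # [('ra', 'deg'), ('dec', 'deg')]
```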

  6. Parser for Sabin-to-Mahoney Transition Model of Quasispecies Replication

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ecale Zhou, Carol

    2016-01-03

    This code is a data parser for preparing output from the Qspp agent-based stochastic simulation model for plotting in Excel. The code is specific to a set of simulations that were run for the purpose of preparing data for a publication. It is necessary to make this code open-source in order to publish the model code (Qspp), which has already been released. It is also necessary to assure that results from using Qspp for a publication

  7. Open Radio Communications Architecture Core Framework V1.1.0 Volume 1 Software Users Manual

    DTIC Science & Technology

    2005-02-01

    on a PC utilizing the KDE desktop that comes with Red Hat Linux. The default desktop for most Red Hat Linux installations is the GNOME desktop. The...(SCA) v2.2. The software was designed for a desktop computer running the Linux operating system (OS). It was developed in C++, uses ACE/TAO for CORBA...middleware, Xerces for the XML parser, and Red Hat Linux for the operating system. The software is referred to as Open Radio Communication

  8. The language of gene ontology: a Zipf's law analysis.

    PubMed

    Kalankesh, Leila Ranandeh; Stevens, Robert; Brass, Andy

    2012-06-07

    Most major genome projects and sequence databases provide a GO annotation of their data, either automatically or through human annotators, creating a large corpus of data written in the language of GO. Texts written in natural language show a statistical power law behaviour, Zipf's law, the exponent of which can provide useful information on the nature of the language being used. We have therefore explored the hypothesis that collections of GO annotations will show similar statistical behaviours to natural language. Annotations from the Gene Ontology Annotation project were found to follow Zipf's law. Surprisingly, the measured power law exponents were consistently different between annotation captured using the three GO sub-ontologies in the corpora (function, process and component). On filtering the corpora using GO evidence codes we found that the value of the measured power law exponent responded in a predictable way as a function of the evidence codes used to support the annotation. Techniques from computational linguistics can provide new insights into the annotation process. GO annotations show similar statistical behaviours to those seen in natural language with measured exponents that provide a signal which correlates with the nature of the evidence codes used to support the annotations, suggesting that the measured exponent might provide a signal regarding the information content of the annotation.
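    The Zipf analysis described in this record amounts to fitting a straight line in log-log space between term rank and term frequency. A minimal sketch, using toy frequencies rather than GO annotation data, might look like:

```python
# Sketch: estimate a Zipf exponent by ordinary least squares on
# log(rank) vs log(frequency). Toy input, not Gene Ontology data.
import math

def zipf_exponent(frequencies):
    """Fit log f = c - a * log r over ranked frequencies; return a."""
    freqs = sorted(frequencies, reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope  # Zipf's law: frequency ∝ rank^(-exponent)

# An exactly Zipfian sample (f = 1200 / rank) recovers an exponent of 1.
print(round(zipf_exponent([1200 / r for r in range(1, 50)]), 3))
```

The study's observation is that this exponent shifts in a predictable way when the corpus is filtered by GO evidence code, which is what makes it a useful signal.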

  9. First Language Acquisition and Teaching

    ERIC Educational Resources Information Center

    Cruz-Ferreira, Madalena

    2011-01-01

    "First language acquisition" commonly means the acquisition of a single language in childhood, regardless of the number of languages in a child's natural environment. Language acquisition is variously viewed as predetermined, wondrous, a source of concern, and as developing through formal processes. "First language teaching" concerns schooling in…

  10. Directly Comparing Computer and Human Performance in Language Understanding and Visual Reasoning.

    ERIC Educational Resources Information Center

    Baker, Eva L.; And Others

    Evaluation models are being developed for assessing artificial intelligence (AI) systems in terms of similar performance by groups of people. Natural language understanding and vision systems are the areas of concentration. In simplest terms, the goal is to norm a given natural language system's performance on a sample of people. The specific…

  11. Combining Natural Language Processing and Statistical Text Mining: A Study of Specialized versus Common Languages

    ERIC Educational Resources Information Center

    Jarman, Jay

    2011-01-01

    This dissertation focuses on developing and evaluating hybrid approaches for analyzing free-form text in the medical domain. This research draws on natural language processing (NLP) techniques that are used to parse and extract concepts based on a controlled vocabulary. Once important concepts are extracted, additional machine learning algorithms,…

  12. Native American Rhetoric and the Pre-Socratic Ideal of "Physis."

    ERIC Educational Resources Information Center

    Miller, Bernard A.

    "House Made of Dawn" by N. Scott Momaday is about language and the sacredness of the word and about what can be understood as a peculiarly Native American theory of rhetoric. All things are hinged to the physical landscape, nature, and the implications nature bears upon language. In Momaday's book, language does not represent external…

  13. Using the Natural Language Paradigm (NLP) to Increase Vocalizations of Older Adults with Cognitive Impairments

    ERIC Educational Resources Information Center

    LeBlanc, Linda A.; Geiger, Kaneen B.; Sautter, Rachael A.; Sidener, Tina M.

    2007-01-01

    The Natural Language Paradigm (NLP) has proven effective in increasing spontaneous verbalizations for children with autism. This study investigated the use of NLP with older adults with cognitive impairments served at a leisure-based adult day program for seniors. Three individuals with limited spontaneous use of functional language participated…

  14. A Natural Language Interface to Databases

    NASA Technical Reports Server (NTRS)

    Ford, D. R.

    1990-01-01

    The development of a Natural Language Interface (NLI) is presented which is semantic-based and uses Conceptual Dependency representation. The system was developed using Lisp and currently runs on a Symbolics Lisp machine.

  15. Linguistics in Language Education

    ERIC Educational Resources Information Center

    Kumar, Rajesh; Yunus, Reva

    2014-01-01

    This article looks at the contribution of insights from theoretical linguistics to an understanding of language acquisition and the nature of language in terms of their potential benefit to language education. We examine the ideas of innateness and universal language faculty, as well as multilingualism and the language-society relationship. Modern…

  16. Deductive Coordination of Multiple Geospatial Knowledge Sources

    NASA Astrophysics Data System (ADS)

    Waldinger, R.; Reddy, M.; Culy, C.; Hobbs, J.; Jarvis, P.; Dungan, J. L.

    2002-12-01

    Deductive inference is applied to choreograph the cooperation of multiple knowledge sources to respond to geospatial queries. When no one source can provide an answer, the response may be deduced from pieces of the answer provided by many sources. Examples of sources include (1) The Alexandria Digital Library Gazetteer, a repository that gives the locations for almost six million place names, (2) The CIA World Factbook, an online almanac with basic information about more than 200 countries, (3) The SRI TerraVision 3D Terrain Visualization System, which displays a flight-simulator-like interactive display of geographic data held in a database, (4) The NASA GDACC WebGIS client for searching satellite and other geographic data available through OpenGIS Consortium (OGC) Web Map Servers, and (5) The Northern Arizona University Latitude/Longitude Distance Calculator. Queries are phrased in English and are translated into logical theorems by the Gemini Natural Language Parser. The theorems are proved by SNARK, a first-order-logic theorem prover, in the context of an axiomatic geospatial theory. The theory embodies a representational scheme that takes into account the fact that the same place may have many names, and the same name may refer to many places. SNARK has built-in procedures (RCC8 and the Allen calculus, respectively) for reasoning about spatial and temporal concepts. External knowledge sources may be consulted by SNARK as the proof is in progress, so that most knowledge need not be stored axiomatically. The Open Agent Architecture (OAA) facilitates communication between sources that may be implemented on different machines in different computer languages. An answer to the query, in the form of text or an image, is extracted from the proof. Currently, three-dimensional images are displayed by TerraVision, but other displays are possible. The combined system is called Geo-Logica.
Some example queries that can be handled by Geo-Logica include: (1) show the petrified forests in Oregon north of Portland, (2) show the lake in Argentina with the highest elevation, and (3) show the IGBP land cover classification, derived using MODIS, of Montana for July 2000. Use of a theorem prover allows sources to cooperate even if they adopt different notational conventions and representation schemes and were never designed to work together. New sources can be added without reprogramming the system, by providing axioms that advertise their capabilities. Future directions include entering into a dialogue with the user to clarify ambiguities, elaborate on previous questions, or provide new information necessary to answer the question. In addition, of particular interest is dealing with temporally varying data, with answers displayed as animated images.

  17. Knowledge portal for Six Sigma DMAIC process

    NASA Astrophysics Data System (ADS)

    ThanhDat, N.; Claudiu, K. V.; Zobia, R.; Lobont, Lucian

    2016-08-01

    Knowledge plays a crucial role in the success of DMAIC (Define, Measure, Analyze, Improve, Control) execution. It is therefore necessary to share and renew that knowledge. Yet one problem that arises is how to create a place where knowledge is collected and shared effectively. We believe that a Knowledge Portal (KP) is an important solution to this problem. In this article, the work concerning requirements and functionalities for a KP is first reviewed. Afterwards, a procedure with the necessary tools to develop and implement a KP for DMAIC (KPD) is proposed. In particular, the KPD is built on the basis of free and open-source content and learning management systems and on ontology engineering. In order to structure and store knowledge, tools such as Protégé, OWL, and OWL-RDF parsers are used. A Knowledge Reasoner module is developed in PHP with ARC2, MySQL and a SPARQL endpoint, for the purpose of querying and inferring knowledge available from the ontologies. In order to validate the procedure, a KPD is built with the proposed functionalities and tools. The authors find that the KPD benefits an organization by allowing it to construct Web sites itself, with simple implementation steps and low initial costs. It creates a space for knowledge exchange and effectively supports collecting DMAIC reports as well as sharing the knowledge created. The authors’ evaluation shows that DMAIC knowledge is found with a high success rate and good query response times.

  18. The feasibility of using natural language processing to extract clinical information from breast pathology reports.

    PubMed

    Buckley, Julliette M; Coopey, Suzanne B; Sharko, John; Polubriaginof, Fernanda; Drohan, Brian; Belli, Ahmet K; Kim, Elizabeth M H; Garber, Judy E; Smith, Barbara L; Gadd, Michele A; Specht, Michelle C; Roche, Constance A; Gudewicz, Thomas M; Hughes, Kevin S

    2012-01-01

    The opportunity to integrate clinical decision support systems into clinical practice is limited due to the lack of structured, machine-readable data in the current format of the electronic health record. Natural language processing has been designed to convert free text into machine-readable data. The aim of the current study was to ascertain the feasibility of using natural language processing to extract clinical information from >76,000 breast pathology reports. APPROACH AND PROCEDURE: Breast pathology reports from three institutions were analyzed using natural language processing software (Clearforest, Waltham, MA) to extract information on a variety of pathologic diagnoses of interest. Data tables were created from the extracted information according to date of surgery, side of surgery, and medical record number. The variety of ways in which each diagnosis could be represented was recorded, as a means of demonstrating the complexity of machine interpretation of free text. There was widespread variation in how pathologists reported common pathologic diagnoses. We report, for example, 124 ways of saying invasive ductal carcinoma and 95 ways of saying invasive lobular carcinoma. There were >4000 ways of saying invasive ductal carcinoma was not present. Natural language processor sensitivity and specificity were 99.1% and 96.5% when compared to expert human coders. We have demonstrated how a large body of free-text medical information, such as that seen in breast pathology reports, can be converted to a machine-readable format using natural language processing, and have described the inherent complexities of the task.
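    To see why free-text diagnoses resist exact string matching, consider a toy normalization sketch. The report phrasings and the pattern below are hypothetical illustrations, not the study's actual data or method (the study used commercial NLP software):

```python
# Sketch: one pattern absorbing several spellings of the same diagnosis,
# illustrating why "124 ways of saying" a diagnosis defeats exact matching.
import re

# Hypothetical pattern covering a few abbreviation/casing variants.
PATTERN = re.compile(r"\binvasive\s+duct(?:al)?\s+ca(?:rcinoma)?\b", re.I)

reports = [
    "Invasive ductal carcinoma, grade 2.",
    "INVASIVE DUCT CA identified in upper outer quadrant.",
    "No evidence of invasive lobular carcinoma.",
]
print([bool(PATTERN.search(r)) for r in reports])  # [True, True, False]
```

Note that even this sketch handles only surface variants; negations such as the study's ">4000 ways of saying invasive ductal carcinoma was not present" require genuine NLP rather than pattern matching.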

  19. Dependency distance: A new perspective on the syntactic development in second language acquisition. Comment on "Dependency distance: A new perspective on syntactic patterns in natural language" by Haitao Liu et al.

    NASA Astrophysics Data System (ADS)

    Jiang, Jingyang; Ouyang, Jinghui

    2017-07-01

    Liu et al. [1] offer a clear and informative account of the use of dependency distance in studying natural languages, with a focus on the viewpoint that dependency distance minimization (DDM) can be regarded as a linguistic universal. We would like to add the perspective of employing dependency distance in the study of second language acquisition (SLA), particularly studies of syntactic development.

  20. Integrating Best Practices in Language Intervention and Curriculum Design to Facilitate First Words

    ERIC Educational Resources Information Center

    Lederer, Susan Hendler

    2014-01-01

    For children developing language typically, exposure to language through the natural, general language stimulation provided by families, siblings, and others is sufficient to facilitate language learning (Bloom & Lahey, 1978; Nelson, 1973; Owens, 2008). However, children with language delays (even those who are receptively and…

  1. A SUGGESTED BIBLIOGRAPHY FOR FOREIGN LANGUAGE TEACHERS.

    ERIC Educational Resources Information Center

    MICHEL, JOSEPH

    DESIGNED FOR FOREIGN LANGUAGE TEACHERS AND PERSONS PREPARING TO BECOME FOREIGN LANGUAGE TEACHERS, THIS BIBLIOGRAPHY OF WORKS PUBLISHED BETWEEN 1892 AND 1966 CONTAINS SECTIONS OF--(1) THE NATURE AND FUNCTION OF LANGUAGE, (2) LINGUISTICS, INCLUDING APPLIED LINGUISTICS FOR SPECIFIC LANGUAGES, (3) PSYCHOLOGY OF LANGUAGE, (4) PHYSIOLOGY OF SPEECH, (5)…

  2. Vectorial Representations of Meaning for a Computational Model of Language Comprehension

    ERIC Educational Resources Information Center

    Wu, Stephen Tze-Inn

    2010-01-01

    This thesis aims to define and extend a line of computational models for text comprehension that are humanly plausible. Since natural language is human by nature, computational models of human language will always be just that--models. To the degree that they miss out on information that humans would tap into, they may be improved by considering…

  3. Perceptual Decoding Processes for Language in a Visual Mode and for Language in an Auditory Mode.

    ERIC Educational Resources Information Center

    Myerson, Rosemarie Farkas

    The purpose of this paper is to gain insight into the nature of the reading process through an understanding of the general nature of sensory processing mechanisms which reorganize and restructure input signals for central recognition, and an understanding of how the grammar of the language functions in defining the set of possible sentences in…

  4. Assistance and Feedback Mechanism in an Intelligent Tutoring System for Teaching Conversion of Natural Language into Logic

    ERIC Educational Resources Information Center

    Perikos, Isidoros; Grivokostopoulou, Foteini; Hatzilygeroudis, Ioannis

    2017-01-01

    Logic as a knowledge representation and reasoning language is a fundamental topic of an Artificial Intelligence (AI) course and includes a number of sub-topics. One of them, which brings difficulties to students to deal with, is converting natural language (NL) sentences into first-order logic (FOL) formulas. To assist students to overcome those…

  5. CLIL in physics lessons at grammar school

    NASA Astrophysics Data System (ADS)

    Štefančínová, Iveta; Valovičová, Ľubomíra

    2017-01-01

    Content and Language Integrated Learning (CLIL) is one of the most outstanding approaches in foreign language teaching. This teaching method has promising prospects for the future of modern education, as subject teaching and foreign languages are combined to offer better preparation for life in Europe, especially as mobility becomes a highly significant factor of everyday life. We carried out a project called Foreign languages in popularizing science at grammar school. Within the project, five teachers with approbation in English, French, German and Physics attended methodological courses abroad. The teachers applied the experience gained in their teaching, linking science teaching with the teaching of foreign languages. Outputs of the project (e.g. an English-German-French-Slovak glossary of natural science terminology, student activity sheets, videos with a natural science orientation in a foreign language, physical experiments in foreign languages, multimedia fairy tales with natural science content, posters of some scientists) are prepared for CLIL-oriented lessons. We collected questionnaire data from students concerning their attitudes towards CLIL. A questionnaire for teachers provided data about the attitudes, experience, and needs of teachers employing CLIL in their lessons.

  6. Interactive natural language acquisition in a multi-modal recurrent neural architecture

    NASA Astrophysics Data System (ADS)

    Heinrich, Stefan; Wermter, Stefan

    2018-01-01

    For the complex human brain that enables us to communicate in natural language, we have gathered a good understanding of the principles underlying language acquisition and processing, knowledge about sociocultural conditions, and insights into activity patterns in the brain. However, we do not yet understand the behavioural and mechanistic characteristics of natural language acquisition, or how mechanisms in the brain allow language to be acquired and processed. In bridging the insights from behavioural psychology and neuroscience, the goal of this paper is to contribute a computational understanding of the characteristics that favour language acquisition. Accordingly, we provide concepts and refinements in cognitive modelling regarding principles and mechanisms in the brain, and propose a neurocognitively plausible model for embodied language acquisition from real-world interaction of a humanoid robot with its environment. In particular, the architecture consists of a continuous-time recurrent neural network in which parts have different leakage characteristics, and thus operate on multiple timescales, for every modality, and in which the higher-level nodes of all modalities are associated into cell assemblies. The model is capable of learning language production grounded in both temporal dynamic somatosensation and vision, and features hierarchical concept abstraction, concept decomposition, multi-modal integration, and self-organisation of latent representations.

  7. Dependency distance: A new perspective on syntactic patterns in natural languages.

    PubMed

    Liu, Haitao; Xu, Chunshan; Liang, Junying

    2017-07-01

    Dependency distance, measured by the linear distance between two syntactically related words in a sentence, is generally held as an important index of memory burden and an indicator of syntactic difficulty. Since this constraint of memory is common to all human beings, there may well be a universal preference for dependency distance minimization (DDM) for the sake of reducing memory burden. This human-driven language universal is supported by big-data analyses of various corpora, which consistently report shorter overall dependency distance in natural languages than in artificial random languages, as well as long-tailed distributions featuring a majority of short dependencies and a minority of long ones. Human languages, as complex systems, seem to have evolved to come up with diverse syntactic patterns under the universal pressure for dependency distance minimization. However, there always exist a small number of long-distance dependencies in natural languages, which may reflect other biological or functional constraints. The language system may adapt itself to these sporadic long-distance dependencies. It is these universal constraints that have shaped such a rich diversity of syntactic patterns in human languages. Copyright © 2017. Published by Elsevier B.V.
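The measure the abstract defines — the linear distance between a word and its syntactic head — can be computed directly from a head-index annotation. A minimal sketch, using a toy sentence with an illustrative Universal-Dependencies-style analysis (not data from the paper):

```python
def mean_dependency_distance(heads):
    """Mean |position - head position| over all non-root words.

    heads[i] is the 1-based index of the head of word i+1; 0 marks the root.
    """
    distances = [abs((i + 1) - h) for i, h in enumerate(heads) if h != 0]
    return sum(distances) / len(distances)

# "The cat sat on the mat": The->cat, cat->sat, sat=root,
# on->mat, the->mat, mat->sat
heads = [2, 3, 0, 6, 6, 3]
print(mean_dependency_distance(heads))  # 1.6
```

Corpus-level DDM studies average this quantity over many sentences and compare it against randomized baselines.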

  8. Cultural Perspectives Toward Language Learning

    ERIC Educational Resources Information Center

    Lin, Li-Li

    2008-01-01

    Cultural conflicts may be derived from using inappropriate language. Appropriate linguistic-pragmatic competence may also be produced by providing various and multicultural backgrounds. Culture and language are linked together naturally, unconsciously, and closely in daily social lives. Culture affects language and language affects culture through…

  9. Teaching Additional Languages. Educational Practices Series 6.

    ERIC Educational Resources Information Center

    Judd, Elliot L.; Tan, Lihua; Walberg, Herbert J.

    This booklet describes key principles of and research on teaching additional languages. The 10 chapters focus on the following: (1) "Comprehensible Input" (learners need exposure to meaningful, understandable language); (2) "Language Opportunities" (classroom activities should let students use natural and meaningful language with their…

  10. Social Network Development, Language Use, and Language Acquisition during Study Abroad: Arabic Language Learners' Perspectives

    ERIC Educational Resources Information Center

    Dewey, Dan P.; Belnap, R. Kirk; Hillstrom, Rebecca

    2013-01-01

    Language learners and educators have subscribed to the belief that those who go abroad will have many opportunities to use the target language and will naturally become proficient. They also assume that language learners will develop relationships with native speakers allowing them to use the language and become more fluent, an assumption…

  11. GALEN: a third generation terminology tool to support a multipurpose national coding system for surgical procedures.

    PubMed

    Trombert-Paviot, B; Rodrigues, J M; Rogers, J E; Baud, R; van der Haring, E; Rassinoux, A M; Abrial, V; Clavel, L; Idir, H

    2000-09-01

    Generalised Architecture for Languages, Encyclopaedias and Nomenclatures in medicine (GALEN) has developed a new generation of terminology tools based on a language-independent model that describes the semantics and allows computer processing and multiple reuses, as well as natural language understanding applications, to facilitate the sharing and maintenance of consistent medical knowledge. During the European Union 4th Framework Programme project GALEN-IN-USE, and later within two contracts with the national health authorities, we applied the modelling and the tools to the development of a new multipurpose coding system for surgical procedures, named CCAM, in a minority-language country, France. On the one hand, we contributed to a language-independent knowledge repository and multilingual semantic dictionaries for a multicultural Europe. On the other hand, we supported the traditional process of creating a new coding system in medicine, which is highly labour-intensive, with artificial intelligence tools using a medically oriented recursive ontology and natural language processing. We used integrated software named CLAW (for classification workbench) to process French professional medical language rubrics, produced by domain experts from the national colleges of surgeons, into intermediate dissections and into the GRAIL reference ontology model representation. From this language-independent concept model representation, we generated, with the LNAT natural language generator, controlled French natural language to support the finalization of the linguistic labels (first generation) in relation to the meanings of the conceptual system structure. The CLAW classification manager also proved very powerful for retrieving the initial domain experts' rubric list with different categories of concepts (second generation) within a semantically structured representation (third generation), a bridge to the detailed terminology of the electronic patient record.

  12. Modeling the Emergence of Lexicons in Homesign Systems

    PubMed Central

    Richie, Russell; Yang, Charles; Coppola, Marie

    2014-01-01

    It is largely acknowledged that natural languages emerge not just from human brains, but also from rich communities of interacting human brains (Senghas, 2005). Yet the precise role of such communities and such interaction in the emergence of core properties of language has largely gone uninvestigated in naturally emerging systems, leaving the few existing computational investigations of this issue in artificial settings. Here we take a step towards investigating the precise role of community structure in the emergence of linguistic conventions, with both naturalistic empirical data and computational modeling. We first show conventionalization of lexicons in two different classes of naturally emerging signed systems: (1) protolinguistic “homesigns” invented by linguistically isolated Deaf individuals, and (2) a natural sign language emerging in a recently formed rich Deaf community. We find that the latter conventionalized faster than the former. Second, we model conventionalization as a population of interacting individuals who adjust their probability of sign use in response to other individuals' actual sign use, following an independently motivated model of language learning (Yang 2002, 2004). Simulations suggest that a richer social network, like that of natural (signed) languages, conventionalizes faster than a sparser social network, like that of homesign systems. We discuss our behavioral and computational results in light of other work on language emergence and on behavior in complex networks. PMID:24482343
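The modeling idea — agents adjusting their probability of sign use toward what they hear, on networks of different density — can be sketched in a few lines. This is a toy re-implementation in the spirit of the description only; the update rule (a linear reward-penalty learner), network shapes, rates, and sizes are all illustrative assumptions, not the authors' parameters.

```python
import random

def simulate(pairs, n_agents=10, gamma=0.05, steps=4000, seed=0):
    """Run speaker-listener interactions over a fixed set of directed pairs.

    Each agent holds P(use variant A); after hearing a variant, the listener
    nudges its probability toward what it observed. Returns the mean residual
    disagreement (0 = fully conventionalized, 0.5 = maximal mixing).
    """
    rng = random.Random(seed)
    p = [rng.random() for _ in range(n_agents)]
    for _ in range(steps):
        speaker, listener = rng.choice(pairs)
        if rng.random() < p[speaker]:          # speaker produced variant A
            p[listener] += gamma * (1 - p[listener])
        else:                                  # speaker produced variant B
            p[listener] -= gamma * p[listener]
    return sum(min(x, 1 - x) for x in p) / n_agents

n = 10
dense = [(i, j) for i in range(n) for j in range(n) if i != j]  # rich community
sparse = [(i, (i + 1) % n) for i in range(n)]                   # sparse ring

print("dense :", round(simulate(dense), 3))
print("sparse:", round(simulate(sparse), 3))
```

Under the paper's hypothesis, the denser interaction graph should tend to show lower residual disagreement after the same number of interactions.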

  13. QATT: a Natural Language Interface for QPE. M.S. Thesis

    NASA Technical Reports Server (NTRS)

    White, Douglas Robert-Graham

    1989-01-01

    QATT, a natural language interface developed for the Qualitative Process Engine (QPE) system is presented. The major goal was to evaluate the use of a preexisting natural language understanding system designed to be tailored for query processing in multiple domains of application. The other goal of QATT is to provide a comfortable environment in which to query envisionments in order to gain insight into the qualitative behavior of physical systems. It is shown that the use of the preexisting system made possible the development of a reasonably useful interface in a few months.

  14. Language as a Liberal Art.

    ERIC Educational Resources Information Center

    Stein, Jack M.

    Language, considered as a liberal art, is examined in the light of other philosophical viewpoints concerning the nature of language in relation to second language instruction in this paper. Critical of an earlier mechanistic audio-lingual learning theory, translation approaches to language learning, vocabulary list-oriented courses, graduate…

  15. Dynamical Languages

    NASA Astrophysics Data System (ADS)

    Xie, Huimin

    The following sections are included: * Definition of Dynamical Languages * Distinct Excluded Blocks * Definition and Properties * L and L″ in Chomsky Hierarchy * A Natural Equivalence Relation * Symbolic Flows * Symbolic Flows and Dynamical Languages * Subshifts of Finite Type * Sofic Systems * Graphs and Dynamical Languages * Graphs and Shannon-Graphs * Transitive Languages * Topological Entropy

  16. Attitudes and Language. Multilingual Matters: 83.

    ERIC Educational Resources Information Center

    Baker, Colin

    This book examines language attitudes, focusing on individual attitudes toward majority and minority languages and bilingualism. Special emphasis is placed on research conducted on language attitudes in Wales toward the Welsh and English languages. Six chapters address the following: (1) the nature, definition, and measurement of language…

  17. Automated detection using natural language processing of radiologists recommendations for additional imaging of incidental findings.

    PubMed

    Dutta, Sayon; Long, William J; Brown, David F M; Reisner, Andrew T

    2013-08-01

    As use of radiology studies increases, there is a concurrent increase in incidental findings (eg, lung nodules) for which the radiologist issues recommendations for additional imaging for follow-up. Busy emergency physicians may be challenged to carefully communicate recommendations for additional imaging not relevant to the patient's primary evaluation. The emergence of electronic health records and natural language processing algorithms may help address this quality gap. We seek to describe recommendations for additional imaging from our institution and develop and validate an automated natural language processing algorithm to reliably identify recommendations for additional imaging. We developed a natural language processing algorithm to detect recommendations for additional imaging, using 3 iterative cycles of training and validation. The third cycle used 3,235 radiology reports (1,600 for algorithm training and 1,635 for validation) of discharged emergency department (ED) patients from which we determined the incidence of discharge-relevant recommendations for additional imaging and the frequency of appropriate discharge documentation. The test characteristics of the 3 natural language processing algorithm iterations were compared, using blinded chart review as the criterion standard. Discharge-relevant recommendations for additional imaging were found in 4.5% (95% confidence interval [CI] 3.5% to 5.5%) of ED radiology reports, but 51% (95% CI 43% to 59%) of discharge instructions failed to note those findings. The final natural language processing algorithm had 89% (95% CI 82% to 94%) sensitivity and 98% (95% CI 97% to 98%) specificity for detecting recommendations for additional imaging. For discharge-relevant recommendations for additional imaging, sensitivity improved to 97% (95% CI 89% to 100%). Recommendations for additional imaging are common, and failure to document relevant recommendations for additional imaging in ED discharge instructions occurs frequently. The natural language processing algorithm's performance improved with each iteration and offers a promising error-prevention tool. Copyright © 2013 American College of Emergency Physicians. Published by Mosby, Inc. All rights reserved.
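The sensitivity and specificity figures the study reports are standard confusion-matrix quantities, computed against a criterion standard (here, blinded chart review). A minimal sketch with made-up counts, not the study's data:

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    sensitivity = tp / (tp + fn)   # true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    return sensitivity, specificity

# Illustrative counts: 100 reports with a recommendation, 1000 without.
sens, spec = sensitivity_specificity(tp=89, fn=11, tn=980, fp=20)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
# sensitivity=0.89 specificity=0.98
```

In a screening application like this one, high sensitivity matters most, since a missed recommendation cannot be flagged downstream.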

  18. Analyzing Learner Language: Towards a Flexible Natural Language Processing Architecture for Intelligent Language Tutors

    ERIC Educational Resources Information Center

    Amaral, Luiz; Meurers, Detmar; Ziai, Ramon

    2011-01-01

    Intelligent language tutoring systems (ILTS) typically analyze learner input to diagnose learner language properties and provide individualized feedback. Despite a long history of ILTS research, such systems are virtually absent from real-life foreign language teaching (FLT). Taking a step toward more closely linking ILTS research to real-life…

  19. Signs of Change: Contemporary Attitudes to Australian Sign Language

    ERIC Educational Resources Information Center

    Slegers, Claudia

    2010-01-01

    This study explores contemporary attitudes to Australian Sign Language (Auslan). Since at least the 1960s, sign languages have been accepted by linguists as natural languages with all of the key ingredients common to spoken languages. However, these visual-spatial languages have historically been subject to ignorance and myth in Australia and…

  20. Three Dimensions of Reproducibility in Natural Language Processing.

    PubMed

    Cohen, K Bretonnel; Xia, Jingbo; Zweigenbaum, Pierre; Callahan, Tiffany J; Hargraves, Orin; Goss, Foster; Ide, Nancy; Névéol, Aurélie; Grouin, Cyril; Hunter, Lawrence E

    2018-05-01

    Despite considerable recent attention to problems with reproducibility of scientific research, there is a striking lack of agreement about the definition of the term. That is a problem, because the lack of a consensus definition makes it difficult to compare studies of reproducibility, and thus to have even a broad overview of the state of the issue in natural language processing. This paper proposes an ontology of reproducibility in that field. Its goal is to enhance both future research and communication about the topic, and retrospective meta-analyses. We show that three dimensions of reproducibility, corresponding to three kinds of claims in natural language processing papers, can account for a variety of types of research reports. These dimensions are reproducibility of a conclusion, of a finding, and of a value. Three biomedical natural language processing papers by the authors of this paper are analyzed with respect to these dimensions.

  1. Language of Uncertainty: the Expression of Decisional Conflict Related to Skin Cancer Prevention Recommendations.

    PubMed

    Strekalova, Yulia A; James, Vaughan S

    2017-09-01

    User-generated information on the Internet provides opportunities for the monitoring of health information consumer attitudes. For example, information about cancer prevention may cause decisional conflict. Yet posts and conversations shared by health information consumers online are often not readily actionable for interpretation and decision-making due to their unstandardized format. This study extends prior research on the use of natural language as a predictor of consumer attitudes and provides a link to decision-making by evaluating the predictive role of uncertainty indicators expressed in natural language. Analyzed data included free-text comments and structured scale responses related to information about skin cancer prevention options. The study identified natural language indicators of uncertainty and showed that it can serve as a predictor of decisional conflict. The natural indicators of uncertainty reported here can facilitate the monitoring of health consumer perceptions about cancer prevention recommendations and inform education and communication campaign planning and evaluation.
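The abstract's core move — treating hedged expressions in free text as surface indicators of uncertainty — can be sketched with a trivial cue counter. The cue list below is an illustrative assumption, not the study's actual lexicon or model.

```python
# Hypothetical hedging cues; a real study would derive these empirically.
UNCERTAINTY_CUES = ("not sure", "maybe", "might", "confused", "don't know")

def uncertainty_score(comment: str) -> int:
    """Count occurrences of hedging cues in a free-text comment."""
    text = comment.lower()
    return sum(text.count(cue) for cue in UNCERTAINTY_CUES)

print(uncertainty_score("I'm not sure sunscreen alone is enough; maybe I should ask."))  # 2
```

Scores like this could then be correlated with structured decisional-conflict scale responses, as the study does with its own indicators.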

  2. Natural language processing and the Now-or-Never bottleneck.

    PubMed

    Gómez-Rodríguez, Carlos

    2016-01-01

    Researchers, motivated by the need to improve the efficiency of natural language processing tools to handle web-scale data, have recently arrived at models that remarkably match the expected features of human language processing under the Now-or-Never bottleneck framework. This provides additional support for said framework and highlights the research potential in the interaction between applied computational linguistics and cognitive science.

  3. Computing Accurate Grammatical Feedback in a Virtual Writing Conference for German-Speaking Elementary-School Children: An Approach Based on Natural Language Generation

    ERIC Educational Resources Information Center

    Harbusch, Karin; Itsova, Gergana; Koch, Ulrich; Kuhner, Christine

    2009-01-01

    We built a natural language processing (NLP) system implementing a "virtual writing conference" for elementary-school children, with German as the target language. Currently, state-of-the-art computer support for writing tasks is restricted to multiple-choice questions or quizzes because automatic parsing of the often ambiguous and fragmentary…

  4. Extraction of UMLS® Concepts Using Apache cTAKES™ for German Language.

    PubMed

    Becker, Matthias; Böckmann, Britta

    2016-01-01

    Automatic information extraction of medical concepts, and classification with semantic standards, from medical reports is useful for standardization and for clinical research. This paper presents an approach to UMLS concept extraction with a customized natural language processing pipeline for German clinical notes using Apache cTAKES. The objective is to test whether the natural language processing tool is suitable for German, i.e., whether it can identify UMLS concepts and map them to SNOMED-CT. The German UMLS database and German OpenNLP models extended the natural language processing pipeline, so that the pipeline can normalize to domain ontologies such as SNOMED-CT using the German concepts. For testing, the ShARe/CLEF eHealth 2013 training dataset translated into German was used. The implemented algorithms were tested with a set of 199 German reports, obtaining an average F1 measure of 0.36 without German stemming or pre- and post-processing of the reports.
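The F1 measure reported for the extraction evaluation is the harmonic mean of precision and recall over the extracted concepts. A short sketch with illustrative counts (not the paper's data):

```python
def f1(tp, fp, fn):
    """F1 = harmonic mean of precision (TP/(TP+FP)) and recall (TP/(TP+FN))."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up counts chosen to land near the reported average of 0.36.
print(round(f1(tp=36, fp=60, fn=68), 2))  # 0.36
```

F1 is the standard headline metric for concept extraction because it penalizes both spurious and missed concepts.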

  5. Advances in natural language processing.

    PubMed

    Hirschberg, Julia; Manning, Christopher D

    2015-07-17

    Natural language processing employs computational techniques for the purpose of learning, understanding, and producing human language content. Early computational approaches to language research focused on automating the analysis of the linguistic structure of language and developing basic technologies such as machine translation, speech recognition, and speech synthesis. Today's researchers refine and make use of such tools in real-world applications, creating spoken dialogue systems and speech-to-speech translation engines, mining social media for information about health or finance, and identifying sentiment and emotion toward products and services. We describe successes and challenges in this rapidly advancing area. Copyright © 2015, American Association for the Advancement of Science.

  6. Evolution, brain, and the nature of language.

    PubMed

    Berwick, Robert C; Friederici, Angela D; Chomsky, Noam; Bolhuis, Johan J

    2013-02-01

    Language serves as a cornerstone for human cognition, yet much about its evolution remains puzzling. Recent research on this question parallels Darwin's attempt to explain both the unity of all species and their diversity. What has emerged from this research is that the unified nature of human language arises from a shared, species-specific computational ability. This ability has identifiable correlates in the brain and has remained fixed since the origin of language approximately 100 thousand years ago. Although songbirds share with humans a vocal imitation learning ability, with a similar underlying neural organization, language is uniquely human. Copyright © 2012 Elsevier Ltd. All rights reserved.

  7. Positivity of the English Language

    PubMed Central

    Kloumann, Isabel M.; Danforth, Christopher M.; Harris, Kameron Decker; Bliss, Catherine A.; Dodds, Peter Sheridan

    2012-01-01

    Over the last million years, human language has emerged and evolved as a fundamental instrument of social communication and semiotic representation. People use language in part to convey emotional information, leading to the central and contingent questions: (1) What is the emotional spectrum of natural language? and (2) Are natural languages neutrally, positively, or negatively biased? Here, we report that the human-perceived positivity of over 10,000 of the most frequently used English words exhibits a clear positive bias. More deeply, we characterize and quantify distributions of word positivity for four large and distinct corpora, demonstrating that their form is broadly invariant with respect to frequency of word use. PMID:22247779
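Once per-word positivity scores like the study's exist, text-level positivity is just a lexicon lookup and average. A minimal sketch; the mini-lexicon and its scores below are invented for illustration, not values from the paper.

```python
# Hypothetical happiness scores on a 1-9 scale (higher = more positive).
LEXICON = {"laughter": 8.5, "happy": 8.3, "rain": 5.0, "terrorist": 1.3}

def mean_positivity(text: str):
    """Average the lexicon scores of the words present in the text."""
    scores = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(scores) / len(scores) if scores else None

print(mean_positivity("Laughter and rain"))  # 6.75
```

Averaging over large corpora in this way is what lets the study compare positivity distributions across word-frequency bands.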

  8. The emergence of Zipf's law - Spontaneous encoding optimization by users of a command language

    NASA Technical Reports Server (NTRS)

    Ellis, S. R.; Hitchcock, R. J.

    1986-01-01

    The distribution of commands issued by experienced users of a computer operating system allowing command customization tends to conform to Zipf's law. This result documents the emergence of a statistical property of natural language as users master an artificial language. Analysis of Zipf's law by Mandelbrot and Cherry shows that its emergence in the computer interaction of experienced users may be interpreted as evidence that these users optimize their encoding of commands. Accordingly, the extent to which users of a command language exhibit Zipf's law can provide a metric of the naturalness and efficiency with which that language is used.
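The degree of conformance to Zipf's law can be estimated by regressing log frequency on log rank; Zipf's law predicts a slope near -1. A sketch with a toy command stream constructed to follow the law exactly (real analyses use large interaction logs):

```python
import math
from collections import Counter

def zipf_exponent(tokens):
    """Least-squares slope of log(frequency) vs. log(rank), negated."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope  # Zipf's law predicts an exponent near 1

# Toy data with frequency proportional to 1/rank (60, 30, 20, 15, 12).
commands = ["ls"] * 60 + ["cd"] * 30 + ["grep"] * 20 + ["mv"] * 15 + ["cat"] * 12
print(round(zipf_exponent(commands), 2))  # 1.0
```

The exponent estimated from a user's real command log could then serve as the kind of efficiency metric the abstract proposes.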

  9. Dynamic changes in network activations characterize early learning of a natural language.

    PubMed

    Plante, Elena; Patterson, Dianne; Dailey, Natalie S; Kyle, R Almyrde; Fridriksson, Julius

    2014-09-01

    Those who are initially exposed to an unfamiliar language have difficulty separating running speech into individual words, but over time will recognize both words and the grammatical structure of the language. Behavioral studies have used artificial languages to demonstrate that humans are sensitive to distributional information in language input, and can use this information to discover the structure of that language. This is done without direct instruction and learning occurs over the course of minutes rather than days or months. Moreover, learners may attend to different aspects of the language input as their own learning progresses. Here, we examine processing associated with the early stages of exposure to a natural language, using fMRI. Listeners were exposed to an unfamiliar language (Icelandic) while undergoing four consecutive fMRI scans. The Icelandic stimuli were constrained in ways known to produce rapid learning of aspects of language structure. After approximately 4 min of exposure to the Icelandic stimuli, participants began to differentiate between correct and incorrect sentences at above chance levels, with significant improvement between the first and last scan. An independent component analysis of the imaging data revealed four task-related components, two of which were associated with behavioral performance early in the experiment, and two with performance later in the experiment. This outcome suggests dynamic changes occur in the recruitment of neural resources even within the initial period of exposure to an unfamiliar natural language. Copyright © 2014 Elsevier Ltd. All rights reserved.

  10. Context Analysis of Customer Requests using a Hybrid Adaptive Neuro Fuzzy Inference System and Hidden Markov Models in the Natural Language Call Routing Problem

    NASA Astrophysics Data System (ADS)

    Rustamov, Samir; Mustafayev, Elshan; Clements, Mark A.

    2018-04-01

    The context analysis of customer requests in a natural language call routing problem is investigated in the paper. One of the most significant problems in natural language call routing is comprehension of the client's request. To address this issue, hybrid HMM and ANFIS models are examined. Combining different types of models (ANFIS and HMM) can prevent the system from misidentifying user intention in a dialogue system. Because the classification process uses no lexical or syntactic analysis, the hybrid system based on these models may be employed in various language and call-routing domains.

  11. Networks of lexical borrowing and lateral gene transfer in language and genome evolution

    PubMed Central

    List, Johann-Mattis; Nelson-Sathi, Shijulal; Geisler, Hans; Martin, William

    2014-01-01

    Like biological species, languages change over time. As noted by Darwin, there are many parallels between language evolution and biological evolution. Insights into these parallels have also undergone change in the past 150 years. Just like genes, words change over time, and language evolution can be likened to genome evolution accordingly, but what kind of evolution? There are fundamental differences between eukaryotic and prokaryotic evolution. In the former, natural variation entails the gradual accumulation of minor mutations in alleles. In the latter, lateral gene transfer is an integral mechanism of natural variation. The study of language evolution using biological methods has attracted much interest of late, most approaches focusing on language tree construction. These approaches may underestimate the important role that borrowing plays in language evolution. Network approaches that were originally designed to study lateral gene transfer may provide more realistic insights into the complexities of language evolution. PMID:24375688

  12. The Tao of Whole Language.

    ERIC Educational Resources Information Center

    Zola, Meguido

    1989-01-01

    Uses the philosophy of Taoism as a metaphor in describing the whole language approach to language arts instruction. The discussion covers the key principles that inform the whole language approach, the resulting holistic nature of language programs, and the role of the teacher in this approach. (16 references) (CLB)

  13. Natural language processing, pragmatics, and verbal behavior

    PubMed Central

    Cherpas, Chris

    1992-01-01

    Natural Language Processing (NLP) is that part of Artificial Intelligence (AI) concerned with endowing computers with verbal and listener repertoires, so that people can interact with them more easily. Most attention has been given to accurately parsing and generating syntactic structures, although NLP researchers are finding ways of handling the semantic content of language as well. It is increasingly apparent that understanding the pragmatic (contextual and consequential) dimension of natural language is critical for producing effective NLP systems. While there are some techniques for applying pragmatics in computer systems, they are piecemeal, crude, and lack an integrated theoretical foundation. Unfortunately, there is little awareness that Skinner's (1957) Verbal Behavior provides an extensive, principled pragmatic analysis of language. The implications of Skinner's functional analysis for NLP and for verbal aspects of epistemology lead to a proposal for a “user expert”—a computer system whose area of expertise is the long-term computer user. The evolutionary nature of behavior suggests an AI technology known as genetic algorithms/programming for implementing such a system. ImagesFig. 1 PMID:22477052

  14. Developing Formal Correctness Properties from Natural Language Requirements

    NASA Technical Reports Server (NTRS)

    Nikora, Allen P.

    2006-01-01

This viewgraph presentation reviews the rationale of a program to transform natural language specifications into formal notation, specifically, to automate the generation of Linear Temporal Logic (LTL) correctness properties from natural language temporal specifications. There are several reasons for this approach: (1) model-based techniques are becoming more widely accepted; (2) analytical verification techniques (e.g., model checking, theorem proving) are significantly more effective at detecting certain types of specification design errors (e.g., race conditions, deadlock) than manual inspection; (3) many requirements are still written in natural language, which results in a high learning curve for specification languages and associated tools, while schedule and budget pressure on projects reduces training opportunities for engineers; and (4) formulating correctness properties for system models can be a difficult problem. This has relevance to NASA in that it would simplify development of formal correctness properties, lead to more widespread use of model-based specification and design techniques, assist in earlier identification of defects, and reduce residual defect content for space mission software systems. The presentation also discusses potential applications, accomplishments and/or technology transfer potential, and next steps.

  15. ROPE: Recoverable Order-Preserving Embedding of Natural Language

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Widemann, David P.; Wang, Eric X.; Thiagarajan, Jayaraman J.

We present a novel Recoverable Order-Preserving Embedding (ROPE) of natural language. ROPE maps natural language passages from sparse concatenated one-hot representations to distributed vector representations of predetermined fixed length. We use Euclidean distance to return search results that are both grammatically and semantically similar. ROPE is based on a series of random projections of distributed word embeddings. We show that our technique typically forms a dictionary with sufficient incoherence such that sparse recovery of the original text is possible. We then show how our embedding allows for efficient and meaningful natural search and retrieval on Microsoft’s COCO dataset and the IMDB Movie Review dataset.
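    The abstract's core mechanism — applying a position-specific random projection to each word embedding and summing the results into a fixed-length code that is queried by Euclidean distance — can be illustrated with a toy sketch. Everything below (the 16-dimensional random stand-in "word embeddings", the vocabulary, and the corpus) is invented for illustration and is not the authors' implementation:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy vocabulary with random vectors standing in for trained word embeddings.
    vocab = ["the", "cat", "sat", "on", "mat", "dog", "ran"]
    dim = 16
    word_vecs = {w: rng.normal(size=dim) for w in vocab}

    # One random projection per token position; projecting before summing is
    # what lets the fixed-length code preserve word order.
    max_len = 5
    projections = [rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(max_len)]

    def rope_embed(tokens):
        """Map a token sequence to one fixed-length vector."""
        v = np.zeros(dim)
        for i, tok in enumerate(tokens[:max_len]):
            v += projections[i] @ word_vecs[tok]
        return v

    corpus = [["the", "cat", "sat"], ["the", "dog", "ran"], ["cat", "sat", "on", "mat"]]
    codes = np.stack([rope_embed(s) for s in corpus])

    def search(query_tokens):
        """Return the index of the nearest passage by Euclidean distance."""
        q = rope_embed(query_tokens)
        dists = np.linalg.norm(codes - q, axis=1)
        return int(np.argmin(dists))

    print(search(["the", "cat", "sat"]))  # 0: the exact passage is its own nearest neighbor
    ```

    Because each position gets its own projection, "cat sat" and "sat cat" map to different codes, which is the order-preserving property the title refers to.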

  16. A Requirements-Based Exploration of Open-Source Software Development Projects--Towards a Natural Language Processing Software Analysis Framework

    ERIC Educational Resources Information Center

    Vlas, Radu Eduard

    2012-01-01

    Open source projects do have requirements; they are, however, mostly informal, text descriptions found in requests, forums, and other correspondence. Understanding such requirements provides insight into the nature of open source projects. Unfortunately, manual analysis of natural language requirements is time-consuming, and for large projects,…

  17. Towards Automatic Treatment of Natural Language.

    ERIC Educational Resources Information Center

    Lonsdale, Deryle

    1984-01-01

    Because automated natural language processing relies heavily on the still developing fields of linguistics, knowledge representation, and computational linguistics, no system is capable of mimicking human linguistic capabilities. For the present, interactive systems may be used to augment today's technology. (MSE)

  18. Bilingual Language Switching in the Laboratory versus in the Wild: The Spatiotemporal Dynamics of Adaptive Language Control

    PubMed Central

    2017-01-01

    For a bilingual human, every utterance requires a choice about which language to use. This choice is commonly regarded as part of general executive control, engaging prefrontal and anterior cingulate cortices similarly to many types of effortful task switching. However, although language control within artificial switching paradigms has been heavily studied, the neurobiology of natural switching within socially cued situations has not been characterized. Additionally, although theoretical models address how language control mechanisms adapt to the distinct demands of different interactional contexts, these predictions have not been empirically tested. We used MEG (RRID: NIFINV:nlx_inv_090918) to investigate language switching in multiple contexts ranging from completely artificial to the comprehension of a fully natural bilingual conversation recorded “in the wild.” Our results showed less anterior cingulate and prefrontal cortex involvement for more natural switching. In production, voluntary switching did not engage the prefrontal cortex or elicit behavioral switch costs. In comprehension, while laboratory switches recruited executive control areas, fully natural switching within a conversation only engaged auditory cortices. Multivariate pattern analyses revealed that, in production, interlocutor identity was represented in a sustained fashion throughout the different stages of language planning until speech onset. In comprehension, however, a biphasic pattern was observed: interlocutor identity was first represented at the presentation of the interlocutor and then again at the presentation of the auditory word. In all, our findings underscore the importance of ecologically valid experimental paradigms and offer the first neurophysiological characterization of language control in a range of situations simulating real life to various degrees. 
SIGNIFICANCE STATEMENT Bilingualism is an inherently social phenomenon, interactional context fully determining language choice. This research addresses the neural mechanisms underlying multilingual individuals' ability to successfully adapt to varying conversational contexts both while speaking and listening. Our results showed that interactional context critically determines language control networks' engagement: switching under external constraints heavily recruited prefrontal control regions, whereas natural, voluntary switching did not. These findings challenge conclusions derived from artificial switching paradigms, which suggested that language switching is intrinsically effortful. Further, our results predict that the so-called bilingual advantage should be limited to individuals who need to control their languages according to external cues and thus would not occur by virtue of an experience in which switching is fully free. PMID:28821648

  19. Gendered Language in Interactive Discourse

    ERIC Educational Resources Information Center

    Hussey, Karen A.; Katz, Albert N.; Leith, Scott A.

    2015-01-01

    Over two studies, we examined the nature of gendered language in interactive discourse. In the first study, we analyzed gendered language from a chat corpus to see whether tokens of gendered language proposed in the gender-as-culture hypothesis (Maltz and Borker in "Language and social identity." Cambridge University Press, Cambridge, pp…

  20. Language and Social Identity Construction: A Study of a Russian Heritage Language Orthodox Christian School

    ERIC Educational Resources Information Center

    Moore, Ekaterina Leonidovna

    2012-01-01

    Grounded in discourse analytic and language socialization paradigms, this dissertation examines issues of language and social identity construction in children attending a Russian Heritage Language Orthodox Christian Saturday School in California. By conducting micro-analysis of naturally-occurring talk-in-interaction combined with longitudinal…

  1. Using a Language Generation System for Second Language Learning.

    ERIC Educational Resources Information Center

    Levison, Michael; Lessard, Greg

    1996-01-01

    Describes a language generation system, which, given data files describing a natural language, generates utterances of the class the user has specified. The system can exercise control over the syntax, lexicon, morphology, and semantics of the language. This article explores a range of the system's potential applications to second-language…

  2. The Relationship between Artificial and Second Language Learning

    ERIC Educational Resources Information Center

    Ettlinger, Marc; Morgan-Short, Kara; Faretta-Stutenberg, Mandy; Wong, Patrick C. M.

    2016-01-01

    Artificial language learning (ALL) experiments have become an important tool in exploring principles of language and language learning. A persistent question in all of this work, however, is whether ALL engages the linguistic system and whether ALL studies are ecologically valid assessments of natural language ability. In the present study, we…

  3. Assessment Measures for Specific Contexts of Language Use.

    ERIC Educational Resources Information Center

    Chalhoub-Deville, Micheline; Tarone, Elaine

    A discussion of second language testing focuses on the need for collaboration among researchers in second language learning, teaching, and testing concerning development of context-appropriate language tests. It is argued that the nature of the proficiency construct in language is not constant, but that different linguistic, functional, and…

  4. "Speaking English Naturally": The Language Ideologies of English as an Official Language at a Korean University

    ERIC Educational Resources Information Center

    Choi, Jinsook

    2016-01-01

    This study explores language ideologies of English at a Korean university where English has been adopted as an official language. This study draws on ethnographic data in order to understand how speakers respond to and experience the institutional language policy. The findings show that language ideologies in this university represent the…

  5. Factors Influencing Sensitivity to Lexical Tone in an Artificial Language: Implications for Second Language Learning

    ERIC Educational Resources Information Center

    Caldwell-Harris, Catherine L.; Lancaster, Alia; Ladd, D. Robert; Dediu, Dan; Christiansen, Morten H.

    2015-01-01

    This study examined whether musical training, ethnicity, and experience with a natural tone language influenced sensitivity to tone while listening to an artificial tone language. The language was designed with three tones, modeled after level-tone African languages. Participants listened to a 15-min random concatenation of six 3-syllable words.…

  6. The Relationship between Mathematics and Language: Academic Implications for Children with Specific Language Impairment and English Language Learners

    ERIC Educational Resources Information Center

    Alt, Mary; Arizmendi, Genesis D.; Beal, Carole R.

    2014-01-01

    Purpose: The present study examined the relationship between mathematics and language to better understand the nature of the deficit and the academic implications associated with specific language impairment (SLI) and academic implications for English language learners (ELLs). Method: School-age children (N = 61; 20 SLI, 20 ELL, 21 native…

  7. Sentence Repetition in Deaf Children with Specific Language Impairment in British Sign Language

    ERIC Educational Resources Information Center

    Marshall, Chloë; Mason, Kathryn; Rowley, Katherine; Herman, Rosalind; Atkinson, Joanna; Woll, Bencie; Morgan, Gary

    2015-01-01

    Children with specific language impairment (SLI) perform poorly on sentence repetition tasks across different spoken languages, but until now, this methodology has not been investigated in children who have SLI in a signed language. Users of a natural sign language encode different sentence meanings through their choice of signs and by altering…

  8. Language Design in the Processing of Non-Restrictive Relative Clauses in French as a Second Language

    ERIC Educational Resources Information Center

    Lorente Lapole, Amandine

    2012-01-01

    Recent years have witnessed a lively debate on the nature of learners' morphological competence and use. Some argue that a breakdown in acquisition of second-language (L2) is expected whenever features required for the analysis of L2 input are not present in the L1. Others argue that features have the same nature and etiology in first…

  9. Video to Text (V2T) in Wide Area Motion Imagery

    DTIC Science & Technology

    2015-09-01

microtext) or a document (e.g., using Sphinx or Apache NLP) as an automated approach [102]. Previous work in natural language full-text searching... natural language processing (NLP) based module. The heart of the structured text processing module includes the following seven key word banks... Features Tracker, MHT Multiple Hypothesis Tracking, MIL Multiple Instance Learning, NLP Natural Language Processing, OAB Online AdaBoost, OF Optic Flow

  10. Sexual Self-Schemas in the Real World: Investigating the Ecological Validity of Language-Based Markers of Childhood Sexual Abuse

    PubMed Central

    Stanton, Amelia M.; Meston, Cindy M.

    2017-01-01

This is the first study to examine language use and sexual self-schemas in natural language data extracted from posts to a large online forum. Recently, two studies applied advanced text analysis techniques to examine differences in language use and sexual self-schemas between women with and without a history of childhood sexual abuse. The aim of the current study was to test the ecological validity of the differences in language use and sexual self-schema themes that emerged between these two groups of women in the laboratory. Archival natural language data were extracted from a social media website and analyzed using LIWC2015, a computerized text analysis program, and other word counting approaches. The differences in both language use and sexual self-schema themes that manifested in recent laboratory research were replicated and validated in the large online sample. To our knowledge, these results provide the first empirical examination of sexual cognitions as they occur in the real world. These results also suggest that natural language analysis of text extracted from social media sites may be a potentially viable precursor or alternative to laboratory measurement of sexual trauma phenomena, as well as clinical phenomena, more generally. PMID:28570129

  11. Look Who's Talking: Speech Style and Social Context in Language Input to Infants Are Linked to Concurrent and Future Speech Development

    ERIC Educational Resources Information Center

    Ramírez-Esparza, Nairán; García-Sierra, Adrián; Kuhl, Patricia K.

    2014-01-01

    Language input is necessary for language learning, yet little is known about whether, in natural environments, the speech style and social context of language input to children impacts language development. In the present study we investigated the relationship between language input and language development, examining both the style of parental…

  12. Draco,Version 6.x.x

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Thompson, Kelly; Budge, Kent; Lowrie, Rob

    2016-03-03

    Draco is an object-oriented component library geared towards numerically intensive, radiation (particle) transport applications built for parallel computing hardware. It consists of semi-independent packages and a robust build system. The packages in Draco provide a set of components that can be used by multiple clients to build transport codes. The build system can also be extracted for use in clients. Software includes smart pointers, Design-by-Contract assertions, unit test framework, wrapped MPI functions, a file parser, unstructured mesh data structures, a random number generator, root finders and an angular quadrature component.

  13. Xyce™ Parallel Electronic Simulator Reference Guide, Version 6.5

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Keiter, Eric R.; Aadithya, Karthik V.; Mei, Ting

    2016-06-01

This document is a reference guide to the Xyce Parallel Electronic Simulator and is a companion to the Xyce Users’ Guide. The focus of this document is to list, as exhaustively as possible, the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users’ Guide. The information herein is subject to change without notice. Copyright © 2002-2016 Sandia Corporation. All rights reserved.

  14. Sterling Software: An NLToolset-based System for MUC-6

    DTIC Science & Technology

    1995-11-01

COCA-COLA ADVERTISING... "Coca-Cola". Since we weren't using the parser, the part-of-speech obtained by a lexical lookup was of interest mainly if it was something like city-name... any contextual clues (such as "White House", "Fannie Mae", "Big Board", "Coca-Cola" and "Coke", "Macy's", "Exxon", etc.).

  15. Signal Processing Expert Code (SPEC)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ames, H.S.

    1985-12-01

The purpose of this paper is to describe a prototype expert system called SPEC, developed to demonstrate the utility of providing an intelligent interface for users of SIG, a general-purpose signal processing code. The expert system is written in NIL, runs on a VAX 11/750, and consists of a backward-chaining inference engine and an English-like parser. The inference engine uses knowledge encoded as rules about the formats of SIG commands and about how to perform frequency analyses using SIG. The system demonstrated that an expert system can be used to control existing codes.
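    A backward-chaining engine of the kind the abstract describes works by reducing a goal to subgoals until every branch bottoms out in known facts. A minimal sketch of the control flow — the rule and fact names here are hypothetical, not SPEC's actual knowledge base:

    ```python
    # Each rule maps a goal to one or more alternative lists of subgoals.
    # (Hypothetical signal-processing goals, purely for illustration.)
    rules = {
        "run_fft": [["file_loaded", "window_chosen"]],
        "file_loaded": [["file_named"]],
    }

    # Facts the "user" has already established.
    facts = {"file_named", "window_chosen"}

    def prove(goal):
        """Backward chaining: a goal holds if it is a fact, or if all
        subgoals of some rule for it can themselves be proven."""
        if goal in facts:
            return True
        for subgoals in rules.get(goal, []):
            if all(prove(g) for g in subgoals):
                return True
        return False

    print(prove("run_fft"))  # True: reduces to file_named and window_chosen
    ```

    In a real system such as SPEC, unproven leaf goals would trigger a question to the user or the emission of a SIG command rather than simply returning False.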

  16. Parallel File System I/O Performance Testing On LANL Clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wiens, Isaac Christian; Green, Jennifer Kathleen

    2016-08-18

These are slides from a presentation on parallel file system I/O performance testing on LANL clusters. I/O is a known bottleneck for HPC applications. Performance optimization of I/O is often required. This summer project entailed integrating IOR under Pavilion and automating the results analysis. The slides cover the following topics: scope of the work, tools utilized, IOR-Pavilion test workflow, build script, IOR parameters, how parameters are passed to IOR, run_ior functionality, Python IOR-Output Parser, Splunk data format, Splunk dashboard and features, and future work.

  17. Language Learning--An Intellectual Challenge?

    ERIC Educational Resources Information Center

    Ager, Dennis E.

    1985-01-01

    Looks at the debate over whether foreign language study is intellectually challenging. Examines four points in the debate: the contrast between content and skill; the nature of the learning and teaching material; the nature of classroom interaction; and the idea of osmosis. (SED)

  18. The continuing legacy of nature versus nurture in biolinguistics.

    PubMed

    Bowling, Daniel L

    2017-02-01

    Theories of language evolution that separate biological and cultural contributions perpetuate a false dichotomy between nature and nurture. The explanatory power of future theories will depend on acknowledging the reality of gene-culture interaction and how it makes language possible.

  19. Selecting the Best Mobile Information Service with Natural Language User Input

    NASA Astrophysics Data System (ADS)

    Feng, Qiangze; Qi, Hongwei; Fukushima, Toshikazu

Information services accessed via mobile phones provide information directly relevant to subscribers’ daily lives and are an area of dynamic market growth worldwide. Although many information services are currently offered by mobile operators, many of the existing solutions require a unique gateway for each service, and it is inconvenient for users to have to remember a large number of such gateways. Furthermore, the Short Message Service (SMS) is very popular in China, and Chinese users would prefer to access these services in natural language via SMS. This chapter describes a Natural Language Based Service Selection System (NL3S) for use with a large number of mobile information services. The system can accept user queries in natural language and navigate them to the required service. Since it is difficult for existing methods to achieve high accuracy and high coverage, or to anticipate which other services a user might want to query, the NL3S is built on a Multi-service Ontology (MO) and a Multi-service Query Language (MQL). The MO and MQL provide semantic and linguistic knowledge, respectively, to facilitate service selection for a user query and to provide adaptive service recommendations. Experiments show that the NL3S achieves accuracies of 75-95% and satisfaction rates of 85-95% when processing various styles of natural language queries. A trial involving navigation of 30 different mobile services shows that the NL3S can provide a viable commercial solution for mobile operators.

  20. Programming Languages.

    ERIC Educational Resources Information Center

    Tesler, Lawrence G.

    1984-01-01

    Discusses the nature of programing languages, considering the features of BASIC, LOGO, PASCAL, COBOL, FORTH, APL, and LISP. Also discusses machine/assembly codes, the operation of a compiler, and trends in the evolution of programing languages (including interest in notational systems called object-oriented languages). (JN)

  1. Incidence Rate of Canonical vs. Derived Medical Terminology in Natural Language.

    PubMed

    Topac, Vasile; Jurcau, Daniel-Alexandru; Stoicu-Tivadar, Vasile

    2015-01-01

Medical terminology appears in natural language in multiple forms: canonical, derived, or inflected. This research presents an analysis of the forms in which medical terminology appears in Romanian and English. The sources of medical language used for the study are web pages presenting medical information for patients and other lay users. The results show that, in English, medical terminology tends to appear more often in canonical form, while in Romanian the opposite holds. This paper also presents the service that was created to perform this analysis. The tool is available to the general public and is designed to be easily extensible, allowing the addition of other languages.
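    The distinction the abstract draws — counting whether a medical term occurs in its canonical (dictionary) form or in a derived/inflected form — can be sketched with a tiny surface-form lexicon. The mini-lexicon below is hypothetical; a real service of this kind would rely on a morphological analyzer or lemmatizer:

    ```python
    import re
    from collections import Counter

    # Hypothetical mini-lexicon mapping surface forms of medical terms to
    # their canonical form (invented for illustration).
    term_forms = {
        "diabetes": "diabetes",    # canonical
        "diabetic": "diabetes",    # derived
        "infection": "infection",  # canonical
        "infections": "infection", # inflected
    }

    def incidence(text):
        """Count canonical vs. non-canonical occurrences of known terms."""
        counts = Counter()
        for tok in re.findall(r"[a-z]+", text.lower()):
            canon = term_forms.get(tok)
            if canon is not None:
                counts["canonical" if tok == canon else "derived"] += 1
        return counts

    sample = "Diabetic patients face infections; diabetes raises infection risk."
    print(dict(incidence(sample)))  # {'derived': 2, 'canonical': 2}
    ```

    Comparing the two counts per language is exactly the incidence-rate contrast the study reports between English and Romanian.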

  2. Cross-lingual neighborhood effects in generalized lexical decision and natural reading.

    PubMed

    Dirix, Nicolas; Cop, Uschi; Drieghe, Denis; Duyck, Wouter

    2017-06-01

    The present study assessed intra- and cross-lingual neighborhood effects, using both a generalized lexical decision task and an analysis of a large-scale bilingual eye-tracking corpus (Cop, Dirix, Drieghe, & Duyck, 2016). Using new neighborhood density and frequency measures, the general lexical decision task yielded an inhibitory cross-lingual neighborhood density effect on reading times of second language words, replicating van Heuven, Dijkstra, and Grainger (1998). Reaction times for native language words were not influenced by neighborhood density or frequency but error rates showed cross-lingual neighborhood effects depending on target word frequency. The large-scale eye movement corpus confirmed effects of cross-lingual neighborhood on natural reading, even though participants were reading a novel in a unilingual context. Especially second language reading and to a lesser extent native language reading were influenced by lexical candidates from the nontarget language, although these effects in natural reading were largely facilitatory. These results offer strong and direct support for bilingual word recognition models that assume language-independent lexical access. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
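    In this literature, orthographic neighbors are typically defined Coltheart-style: words of the same length differing in exactly one letter. Cross-lingual neighborhood density is then just the neighbor count of a word computed over the other language's lexicon. A minimal sketch with tiny hypothetical lexicons (the Dutch-like word list is invented for illustration):

    ```python
    def is_neighbor(w1, w2):
        """Coltheart neighbors: same length, exactly one differing letter."""
        if len(w1) != len(w2):
            return False
        return sum(a != b for a, b in zip(w1, w2)) == 1

    def neighborhood_density(word, lexicon):
        """Count of a word's orthographic neighbors in a given lexicon."""
        return sum(is_neighbor(word, other) for other in lexicon)

    english = ["word", "work", "ward", "cord", "wine"]
    dutch = ["wort", "bord", "woud"]  # hypothetical nontarget-language lexicon

    print(neighborhood_density("word", english))  # 3: work, ward, cord
    print(neighborhood_density("word", dutch))    # 3: wort, bord, woud
    ```

    In studies like this one, such counts (often frequency-weighted) are computed for every target word and entered as predictors of reaction times or fixation durations.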

  3. Children with Developmental Language Impairment Have Vocabulary Deficits Characterized by Limited Breadth and Depth

    ERIC Educational Resources Information Center

    McGregor, Karla K.; Oleson, Jacob; Bahnsen, Alison; Duff, Dawna

    2013-01-01

    Background: Deficient vocabulary is a frequently reported symptom of developmental language impairment, but the nature of the deficit and its developmental course are not well documented. Aims: To describe the nature of the deficit in terms of breadth and depth of vocabulary knowledge and to determine whether the nature and the extent of the…

  4. Using Edit Distance to Analyse Errors in a Natural Language to Logic Translation Corpus

    ERIC Educational Resources Information Center

    Barker-Plummer, Dave; Dale, Robert; Cox, Richard; Romanczuk, Alex

    2012-01-01

    We have assembled a large corpus of student submissions to an automatic grading system, where the subject matter involves the translation of natural language sentences into propositional logic. Of the 2.3 million translation instances in the corpus, 286,000 (approximately 12%) are categorized as being in error. We want to understand the nature of…
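    Edit distance (Levenshtein distance) between a student's formula string and the reference translation gives a simple, symbol-level measure of how far an erroneous submission is from correct. A minimal single-row dynamic-programming sketch; the example formulas are hypothetical, not drawn from the corpus:

    ```python
    def edit_distance(a, b):
        """Levenshtein distance: minimum insertions, deletions,
        and substitutions needed to turn string a into string b."""
        m, n = len(a), len(b)
        dp = list(range(n + 1))  # distances for the empty prefix of a
        for i in range(1, m + 1):
            prev, dp[0] = dp[0], i
            for j in range(1, n + 1):
                cur = dp[j]
                dp[j] = min(dp[j] + 1,                        # delete a[i-1]
                            dp[j - 1] + 1,                    # insert b[j-1]
                            prev + (a[i - 1] != b[j - 1]))    # substitute
                prev = cur
        return dp[n]

    # Hypothetical student answer vs. reference propositional-logic translation:
    print(edit_distance("(A & B) -> C", "(A | B) -> C"))  # 1: one connective substituted
    ```

    Clustering submissions by their distance to the reference (or to each other) is one way to surface recurring categories of translation error in a corpus like this.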

  5. Phraseology and Frequency of Occurrence on the Web: Native Speakers' Perceptions of Google-Informed Second Language Writing

    ERIC Educational Resources Information Center

    Geluso, Joe

    2013-01-01

    Usage-based theories of language learning suggest that native speakers of a language are acutely aware of formulaic language due in large part to frequency effects. Corpora and data-driven learning can offer useful insights into frequent patterns of naturally occurring language to second/foreign language learners who, unlike native speakers, are…

  6. Autistic Symptomatology and Language Ability in Autism Spectrum Disorder and Specific Language Impairment

    ERIC Educational Resources Information Center

    Loucas, Tom; Charman, Tony; Pickles, Andrew; Simonoff, Emily; Chandler, Susie; Meldrum, David; Baird, Gillian

    2008-01-01

    Background: Autism spectrum disorders (ASD) and specific language impairment (SLI) are common developmental disorders characterised by deficits in language and communication. The nature of the relationship between them continues to be a matter of debate. This study investigates whether the co-occurrence of ASD and language impairment is associated…

  7. Literacy through Languages: Connecting with the Common Core

    ERIC Educational Resources Information Center

    Sandrock, Paul

    2013-01-01

    The Common Core Standards have defined literacy and outlined the mission for English Language Arts in a way that provides a natural fit with the National Standards for Language Learning. Taking advantage of this connection, language teachers can showcase the importance of learning languages by demonstrating how literacy is learned, practiced, and…

  8. Beliefs about Learning English as a Second Language among Native Groups in Rural Sabah, Malaysia

    ERIC Educational Resources Information Center

    Krishnasamy, Hariharan N.; Veloo, Arsaythamby; Lu, Ho Fui

    2013-01-01

    This paper identifies differences between the three ethnic groups, namely, Kadazans/Dusuns, Bajaus, and other minority ethnic groups on the beliefs about learning English as a second language based on the five variables, that is, language aptitude, language learning difficulty, language learning and communicating strategies, nature of language…

  9. The Two-Way Language Bridge: Co-Constructing Bilingual Language Learning Opportunities

    ERIC Educational Resources Information Center

    Martin-Beltran, Melinda

    2010-01-01

    Using a sociocultural theoretical lens, this study examines the nature of student interactions in a dual immersion school to analyze affordances for bilingual language learning, language exchange, and co-construction of language expertise. This article focuses on data from audio- and video-recorded interactions of fifth-grade students engaged in…

  10. Automatic Selection of Suitable Sentences for Language Learning Exercises

    ERIC Educational Resources Information Center

    Pilán, Ildikó; Volodina, Elena; Johansson, Richard

    2013-01-01

    In our study we investigated second and foreign language (L2) sentence readability, an area little explored so far in the case of several languages, including Swedish. The outcome of our research consists of two methods for sentence selection from native language corpora based on Natural Language Processing (NLP) and machine learning (ML)…

  11. Pinky Extension as a Phonestheme in Mongolian Sign Language

    ERIC Educational Resources Information Center

    Healy, Christina

    2011-01-01

    Mongolian Sign Language (MSL) is a visual-gestural language that developed from multiple languages interacting as a result of both geographic proximity and political relations and of the natural development of a communication system by deaf community members. Similar to the phonological systems of other signed languages, MSL combines handshapes,…

  12. On Teaching Strategies in Second Language Acquisition

    ERIC Educational Resources Information Center

    Yang, Hong

    2008-01-01

    How to acquire a second language is a question of obvious importance to teachers and language learners, and how to teach a second language has also become a matter of concern to the linguists' interest in the nature of primary linguistic data. Starting with the development stages of second language acquisition and Stephen Krashen's theory, this…

  13. Cognitive Approach to Assessing Pragmatic Language Comprehension in Children with Specific Language Impairment

    ERIC Educational Resources Information Center

    Ryder, Nuala; Leinonen, Eeva; Schulz, Joerg

    2008-01-01

    Background: Pragmatic language impairment in children with specific language impairment has proved difficult to assess, and the nature of their abilities to comprehend pragmatic meaning has not been fully investigated. Aims: To develop both a cognitive approach to pragmatic language assessment based on Relevance Theory and an assessment tool for…

  14. Restrictions on biological adaptation in language evolution.

    PubMed

    Chater, Nick; Reali, Florencia; Christiansen, Morten H

    2009-01-27

    Language acquisition and processing are governed by genetic constraints. A crucial unresolved question is how far these genetic constraints have coevolved with language, perhaps resulting in a highly specialized and species-specific language "module," and how much language acquisition and processing redeploy preexisting cognitive machinery. In the present work, we explored the circumstances under which genes encoding language-specific properties could have coevolved with language itself. We present a theoretical model, implemented in computer simulations, of key aspects of the interaction of genes and language. Our results show that genes for language could have coevolved only with highly stable aspects of the linguistic environment; a rapidly changing linguistic environment does not provide a stable target for natural selection. Thus, a biological endowment could not coevolve with properties of language that began as learned cultural conventions, because cultural conventions change much more rapidly than genes. We argue that this rules out the possibility that arbitrary properties of language, including abstract syntactic principles governing phrase structure, case marking, and agreement, have been built into a "language module" by natural selection. The genetic basis of human language acquisition and processing did not coevolve with language, but primarily predates the emergence of language. As suggested by Darwin, the fit between language and its underlying mechanisms arose because language has evolved to fit the human brain, rather than the reverse.

  15. From emblems to diagrams: Kepler's new pictorial language of scientific representation.

    PubMed

    Chen-Morris, Raz

    2009-01-01

    Kepler's treatise on optics of 1604 furnished, along with technical solutions to problems in medieval perspective, a mathematically-based visual language for the observation of nature. This language, based on Kepler's theory of retinal pictures, ascribed a new role to geometrical diagrams. This paper examines Kepler's pictorial language against the backdrop of alchemical emblems that flourished in and around the court of Rudolf II in Prague. It highlights the cultural context in which Kepler's optics was immersed, and the way in which Kepler attempted to demarcate his new science from other modes of the investigation of nature.

  16. Context and the Psychoeducational Assessment of Hearing Impaired Children.

    ERIC Educational Resources Information Center

    Ray, Steven

    1989-01-01

    This discussion of psychoeducational assessment of hearing-impaired students and the influence of language competence focuses on: the nature of the interaction between cognition and language, the nonpragmatic nature of traditional assessments, approaches to reducing intelligence test bias, pragmatic violations in intellectual assessment, and…

  17. CITE NLM: Natural-Language Searching in an Online Catalog.

    ERIC Educational Resources Information Center

    Doszkocs, Tamas E.

    1983-01-01

    The National Library of Medicine's Current Information Transfer in English public access online catalog offers unique subject search capabilities--natural-language query input, automatic medical subject headings display, closest match search strategy, ranked document output, dynamic end user feedback for search refinement. References, description…

  18. Natural Environment Language Assessment and Intervention with Severely Impaired Preschoolers.

    ERIC Educational Resources Information Center

    Halle, James W.; And Others

    1984-01-01

    The paper presents a rationale for assessing and intervening with severely impaired preschoolers in the natural environment, identifies three prerequisites for language training (content and motivation, reinforcing social and physical environment, and a communicative repertoire), and examines two levels of intervention. (CL)

  19. Voice-Dictated versus Typed-in Clinician Notes: Linguistic Properties and the Potential Implications on Natural Language Processing

    PubMed Central

    Zheng, Kai; Mei, Qiaozhu; Yang, Lei; Manion, Frank J.; Balis, Ulysses J.; Hanauer, David A.

    2011-01-01

    In this study, we comparatively examined the linguistic properties of narrative clinician notes created through voice dictation versus those directly entered by clinicians via a computer keyboard. Intuitively, the nature of voice-dictated notes would resemble that of natural language, while typed-in notes may demonstrate distinctive language features for reasons such as intensive usage of acronyms. The study analyses were based on an empirical dataset retrieved from our institutional electronic health records system. The dataset contains 30,000 voice-dictated notes and 30,000 notes that were entered manually; both were encounter notes generated in ambulatory care settings. The results suggest that between the narrative clinician notes created via these two different methods, there exists a considerable amount of lexical and distributional differences. Such differences could have a significant impact on the performance of natural language processing tools, necessitating these two different types of documents being differentially treated. PMID:22195229

  20. Quantization, Frobenius and Bialgebras from the Categorical Framework of Quantum Mechanics to Natural Language Semantics

    NASA Astrophysics Data System (ADS)

    Sadrzadeh, Mehrnoosh

    2017-07-01

    Compact closed categories and Frobenius and bialgebras have been applied to model and reason about quantum protocols. The same constructions have also been applied to reason about natural language semantics under the name ``categorical distributional compositional'' semantics, or in short, the ``DisCoCat'' model. This model combines the statistical vector models of word meaning with the compositional models of grammatical structure. It has been applied to natural language tasks such as disambiguation, paraphrasing and entailment of phrases and sentences. The passage from the grammatical structure to vectors is provided by a functor, similar to the quantization functor of Quantum Field Theory. The original DisCoCat model only used compact closed categories. Later, Frobenius algebras were added to it to model long-distance dependencies such as relative pronouns. Recently, bialgebras have been added to the pack to reason about quantifiers. This paper reviews these constructions and their application to natural language semantics. We go over the theory and present some of the core experimental results.
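The composition step the abstract describes (a functor sending words to vectors and relational words to linear maps) can be illustrated with a toy sketch. The vectors, the verb matrix, and the elementwise subject interaction below are invented purely for illustration and are not the paper's construction:

```python
# Toy DisCoCat-style composition: a transitive verb acts as a linear map
# on its object vector, and the result interacts with the subject vector.
# All numbers here are made up for illustration.

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(mi * vi for mi, vi in zip(row, v)) for row in m]

# 2-d toy meaning space; the dimensions might stand for "animate" and "food".
subject = [1.0, 0.0]   # hypothetical vector for "cat"
obj     = [0.0, 1.0]   # hypothetical vector for "fish"

# Hypothetical matrix for the verb "eats", applied to the object.
eats = [[0.0, 0.9],
        [0.1, 0.0]]

# Sentence vector: subject interacts elementwise with verb(object).
sentence = [s * c for s, c in zip(subject, matvec(eats, obj))]
score = sum(sentence)   # crude plausibility score for "cat eats fish"
```

A real DisCoCat verb lives in a tensor product space; collapsing it to a single matrix is the simplification that keeps this sketch short.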

  1. Clinician-Oriented Access to Data - C.O.A.D.: A Natural Language Interface to a VA DHCP Database

    PubMed Central

    Levy, Christine; Rogers, Elizabeth

    1995-01-01

    Hospitals collect enormous amounts of data related to the on-going care of patients. Unfortunately, a clinician's access to the data is limited by the complexities of the database structure and/or the programming skills required to access the database. The COAD project attempts to bridge the gap between the clinical user's need for specific information from the database and the wealth of data residing in the hospital information system. The project design includes a natural language interface to data contained in a VA DHCP database. We have developed a prototype which links natural language software to certain DHCP data elements, including patient demographics, prescriptions, diagnoses, laboratory data, and provider information. English queries can be typed into the system, and answers to the questions are returned. Future work includes refinement of natural language/DHCP connections to enable more sophisticated queries, and optimization of the system to reduce response time to user questions.

  2. The Measurement of Language Diversity.

    ERIC Educational Resources Information Center

    Brougham, James

    Accepting that language diversity is functionally related to other variables characterizing human societies, much discussion concerns the advantageous or disadvantageous nature of language diversity in terms of national development and national unity. To discover ways of measuring language diversity would help, in part, to solve the language…

  3. Categorization of Survey Text Utilizing Natural Language Processing and Demographic Filtering

    DTIC Science & Technology

    2017-09-01

    Categorization of Survey Text Utilizing Natural Language Processing and Demographic Filtering, by Christine M. Cairoli. Master's thesis, September 2017. Thousands of Navy survey free-text comments are overlooked every year because reading and interpreting comments is expensive and time consuming…

  4. Exploring the Ancestral Roots of American Sign Language: Lexical Borrowing from Cistercian Sign Language and French Sign Language

    ERIC Educational Resources Information Center

    Cagle, Keith Martin

    2010-01-01

    American Sign Language (ASL) is the natural and preferred language of the Deaf community in both the United States and Canada. Woodward (1978) estimated that approximately 60% of the ASL lexicon is derived from early 19th century French Sign Language, which is known as "langue des signes francaise" (LSF). The lexicon of LSF and ASL may…

  5. The Nature of the Language Faculty and Its Implications for Evolution of Language (Reply to Fitch, Hauser, and Chomsky)

    ERIC Educational Resources Information Center

    Jackendoff, Ray; Pinker, Steven

    2005-01-01

    In a continuation of the conversation with Fitch, Chomsky, and Hauser on the evolution of language, we examine their defense of the claim that the uniquely human, language-specific part of the language faculty (the ''narrow language faculty'') consists only of recursion, and that this part cannot be considered an adaptation to communication. We…

  6. Naturalism and Ideological Work: How Is Family Language Policy Renegotiated as Both Parents and Children Learn a Threatened Minority Language?

    ERIC Educational Resources Information Center

    Armstrong, Timothy Currie

    2014-01-01

    Parents who enroll their children to be educated through a threatened minority language frequently do not speak that language themselves and classes in the language are sometimes offered to parents in the expectation that this will help them to support their children's education and to use the minority language in the home. Providing…

  7. A Pragmatic Study on the Functions of Vague Language in Commercial Advertising

    ERIC Educational Resources Information Center

    Wenzhong, Zhu; Jingyi, Li

    2013-01-01

    Vagueness is one of the basic attributes of natural language, and the same is true of advertising language. Vague language is a subject of increasing interest, and both foreign and domestic studies have attained success in it. Nevertheless, the study of the application of vague language in the context of English commercial advertising is relatively…

  8. Drop Everything and Write (DEAW): An Innovative Program to Improve Literacy Skills

    ERIC Educational Resources Information Center

    Joshi, R. Malatesha; Aaron, P. G.; Hill, Nancy; Ocker Dean, Emily; Boulware-Gooden, Regina; Rupley, William H.

    2008-01-01

    It is believed that language is an innate ability and, therefore, spoken language is acquired naturally and informally. In contrast, written language is thought to be an invention and, therefore, has to be learned through formal instruction. An alternate view, however, is that spoken language and written language are two forms of manifestations of…

  9. A Stronger Reason for the Right to Sign Languages

    ERIC Educational Resources Information Center

    Trovato, Sara

    2013-01-01

    Is the right to sign language only the right to a minority language? Holding a capability (not a disability) approach, and building on the psycholinguistic literature on sign language acquisition, I make the point that this right is of a stronger nature, since only sign languages can guarantee that each deaf child will properly develop the…

  10. Defining English Language Proficiency for Malaysian Tertiary Education: Past, Present and Future Efforts

    ERIC Educational Resources Information Center

    Heng, Chan Swee

    2012-01-01

    Any attempt to define English language proficiency can never be divorced from the theories that describe the nature of language, language acquisition and human cognition. By virtue of such theories being socially constructed, the descriptions are necessarily value-laden. Thus, a definition of language proficiency can only, at best, be described as…

  11. Natural language modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sharp, J.K.

    1997-11-01

    This seminar describes a process and methodology that uses structured natural language to enable the construction of precise information requirements directly from users, experts, and managers. The main focus of this natural language approach is to create the precise information requirements and to do it in such a way that the business and technical experts are fully accountable for the results. These requirements can then be implemented using appropriate tools and technology. This requirement set is also a universal learning tool because it has all of the knowledge that is needed to understand a particular process (e.g., expense vouchers, project management, budget reviews, tax laws, machine function).

  12. Positionalism of Relations and Its Consequences for Fact-Oriented Modelling

    NASA Astrophysics Data System (ADS)

    Keet, C. Maria

    Natural language-based conceptual modelling as well as the use of diagrams have been essential components of fact-oriented modelling from its inception. However, transforming natural language to its corresponding object-role modelling diagram, and vice versa, is not trivial. This is due to the more fundamental problem of the different underlying ontological commitments concerning positionalism of the fact types. The natural language-based approach adheres to the standard view whereas the diagram-based approach has a positionalist commitment, which is, from an ontological perspective, incompatible with the former. This hinders seamless transition between the two approaches and affects interoperability with other conceptual modelling languages. One can adopt either the limited standard view or the positionalist commitment with fact types that may not be easily verbalisable but which facilitates data integration and reusability of conceptual models with ontological foundations.

  13. Natural language processing pipelines to annotate BioC collections with an application to the NCBI disease corpus

    PubMed Central

    Comeau, Donald C.; Liu, Haibin; Islamaj Doğan, Rezarta; Wilbur, W. John

    2014-01-01

    BioC is a new format and associated code libraries for sharing text and annotations. We have implemented BioC natural language preprocessing pipelines in two popular programming languages: C++ and Java. The current implementations interface with the well-known MedPost and Stanford natural language processing tool sets. The pipeline functionality includes sentence segmentation, tokenization, part-of-speech tagging, lemmatization and sentence parsing. These pipelines can be easily integrated along with other BioC programs into any BioC compliant text mining systems. As an application, we converted the NCBI disease corpus to BioC format, and the pipelines have successfully run on this corpus to demonstrate their functionality. Code and data can be downloaded from http://bioc.sourceforge.net. Database URL: http://bioc.sourceforge.net PMID:24935050
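The first two pipeline stages named here, sentence segmentation and tokenization, can be sketched in a few lines of regex-based Python. This is only an illustrative stand-in, far cruder than the MedPost and Stanford tools the paper actually wraps, and it does no POS tagging, lemmatization, or parsing:

```python
import re

def split_sentences(text):
    """Naive sentence segmentation: split after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

def tokenize(sentence):
    """Naive tokenization: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)

doc = "BioC is a new format. It shares text and annotations."
sentences = [tokenize(s) for s in split_sentences(doc)]
```

Real biomedical text defeats rules this simple (abbreviations like "E. coli" split wrongly), which is exactly why the paper delegates to mature toolkits.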

  14. Inter-Annotator Agreement and the Upper Limit on Machine Performance: Evidence from Biomedical Natural Language Processing.

    PubMed

    Boguslav, Mayla; Cohen, Kevin Bretonnel

    2017-01-01

    Human-annotated data is a fundamental part of natural language processing system development and evaluation. The quality of that data is typically assessed by calculating the agreement between the annotators. It is widely assumed that this agreement between annotators is the upper limit on system performance in natural language processing: if humans can't agree with each other about the classification more than some percentage of the time, we don't expect a computer to do any better. We trace the logical positivist roots of the motivation for measuring inter-annotator agreement, demonstrate the prevalence of the widely-held assumption about the relationship between inter-annotator agreement and system performance, and present data that suggest that inter-annotator agreement is not, in fact, an upper bound on language processing system performance.
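Inter-annotator agreement of the kind discussed here is commonly summarized with Cohen's kappa, which discounts raw agreement by the agreement expected from chance. A minimal sketch (the paper does not prescribe this particular coefficient, and the label sequences below are invented):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators' parallel label sequences."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items labelled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["pos", "pos", "neg", "neg", "pos", "neg"]
b = ["pos", "neg", "neg", "neg", "pos", "pos"]
kappa = cohens_kappa(a, b)  # 1.0 would mean perfect agreement
```

Treating a figure like this as a performance ceiling is precisely the assumption the paper questions.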

  15. A novel evaluation of two related and two independent algorithms for eye movement classification during reading.

    PubMed

    Friedman, Lee; Rigas, Ioannis; Abdulin, Evgeny; Komogortsev, Oleg V

    2018-05-15

    Nyström and Holmqvist have published a method for the classification of eye movements during reading (ONH) (Nyström & Holmqvist, 2010). When we applied this algorithm to our data, the results were not satisfactory, so we modified the algorithm (now the MNH) to better classify our data. The changes included: (1) reducing the amount of signal filtering, (2) excluding a new type of noise, (3) removing several adaptive thresholds and replacing them with fixed thresholds, (4) changing the way that the start and end of each saccade was determined, (5) employing a new algorithm for detecting PSOs, and (6) allowing a fixation period to either begin or end with noise. A new method for the evaluation of classification algorithms is presented. It was designed to provide comprehensive feedback to an algorithm developer, in a time-efficient manner, about the types and numbers of classification errors that an algorithm produces. This evaluation was conducted by three expert raters independently, across 20 randomly chosen recordings, each classified by both algorithms. The MNH made many fewer errors in determining when saccades start and end, and it also detected some fixations and saccades that the ONH did not. The MNH fails to detect very small saccades. We also evaluated two additional algorithms: the EyeLink Parser and a more current, machine-learning-based algorithm. The EyeLink Parser tended to find more saccades that ended too early than did the other methods, and we found numerous problems with the output of the machine-learning-based algorithm.
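The move from adaptive to fixed thresholds mentioned in change (3) can be illustrated with a minimal velocity-threshold classifier. The threshold, sampling rate, and one-dimensional gaze trace below are invented, and a real implementation would also handle filtering, noise, and PSOs as the abstract describes:

```python
def classify_samples(positions, sampling_hz=1000.0, vel_threshold=30.0):
    """Label each gaze sample 'saccade' or 'fixation' using a fixed
    velocity threshold (degrees/second). The first sample inherits the
    label of the first inter-sample velocity."""
    velocities = [abs(b - a) * sampling_hz
                  for a, b in zip(positions, positions[1:])]
    labels = ['saccade' if v > vel_threshold else 'fixation'
              for v in velocities]
    return [labels[0]] + labels

# Synthetic 1-D trace (degrees): fixation, a rapid 5-degree jump, fixation.
trace = [0.0, 0.0, 0.01, 2.5, 5.0, 5.0, 5.01]
labels = classify_samples(trace)
```

The appeal of a fixed threshold, as the modification suggests, is that its behaviour is identical across recordings rather than drifting with each recording's noise level.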

  16. Replacing Fortran Namelists with JSON

    NASA Astrophysics Data System (ADS)

    Robinson, T. E., Jr.

    2017-12-01

    Maintaining a log of input parameters for a climate model is very important to understanding potential causes for answer changes during the development stages. Additionally, since modern Fortran is now interoperable with C, a more modern approach to software infrastructure to include code written in C is necessary. Merging these two separate facets of climate modeling requires a quality control for monitoring changes to input parameters and model defaults that can work with both Fortran and C. JSON will soon replace namelists as the preferred key/value pair input in the GFDL model. By adding a JSON parser written in C into the model, the input can be used by all functions and subroutines in the model, errors can be handled by the model instead of by the internal namelist parser, and the values can be output into a single file that is easily parsable by readily available tools. Input JSON files can handle all of the functionality of a namelist while being portable between C and Fortran. Fortran wrappers using unlimited polymorphism are crucial to allow for simple and compact code which avoids the need for many subroutines contained in an interface. Errors can be handled with more detail by providing information about location of syntax errors or typos. The output JSON provides a ground truth for values that the model actually uses by providing not only the values loaded through the input JSON, but also any default values that were not included. This kind of quality control on model input is crucial for maintaining reproducibility and understanding any answer changes resulting from changes in the input.
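The quality-control workflow described here, user JSON merged over model defaults with the merged result written back out as a ground-truth record, can be sketched as follows. The parameter names are hypothetical, and the GFDL model's actual parser is written in C, not Python:

```python
import json

# Hypothetical model defaults; a real model would define many more.
defaults = {"timestep_seconds": 1800, "use_new_radiation": False, "layers": 33}

def load_config(user_json_text, defaults):
    """Merge user-supplied JSON over defaults and return (config, log_text).
    The log records every value the model will actually use, including
    defaults the user never set, so answer changes can be traced."""
    user = json.loads(user_json_text)
    unknown = set(user) - set(defaults)
    if unknown:
        # Let the model report typos itself rather than silently ignore them.
        raise ValueError(f"unknown input parameters: {sorted(unknown)}")
    config = {**defaults, **user}
    return config, json.dumps(config, indent=2, sort_keys=True)

config, log_text = load_config('{"layers": 50}', defaults)
```

Writing out the merged dictionary, rather than only the user's input, is what provides the "ground truth" the abstract emphasizes.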

  17. AAC Language Activity Monitoring: Entering the New Millennium.

    ERIC Educational Resources Information Center

    Hill, Katya; Romich, Barry

    This report describes how augmentative and alternative communication (AAC) automated language activity monitoring can provide clinicians with the tools they need to collect and analyze language samples from the natural environment of children with disabilities for clinical intervention and outcomes measurements. The Language Activity Monitor (LAM)…

  18. AAC Best Practice Using Automated Language Activity Monitoring.

    ERIC Educational Resources Information Center

    Hill, Katya; Romich, Barry

    This brief paper describes automated language activity monitoring (LAM), an augmentative and alternative communication (AAC) methodology for the collection, editing, and analysis of language data in structured or natural situations with people who have severe communication disorders. The LAM function records each language event (letters, words,…

  19. Eliminating Unpredictable Variation through Iterated Learning

    ERIC Educational Resources Information Center

    Smith, Kenny; Wonnacott, Elizabeth

    2010-01-01

    Human languages may be shaped not only by the (individual psychological) processes of language acquisition, but also by population-level processes arising from repeated language learning and use. One prevalent feature of natural languages is that they avoid unpredictable variation. The current work explores whether linguistic predictability might…

  20. The Evolution of Musicality: What Can Be Learned from Language Evolution Research?

    PubMed Central

    Ravignani, Andrea; Thompson, Bill; Filippi, Piera

    2018-01-01

    Language and music share many commonalities, both as natural phenomena and as subjects of intellectual inquiry. Rather than exhaustively reviewing these connections, we focus on potential cross-pollination of methodological inquiries and attitudes. We highlight areas in which scholarship on the evolution of language may inform the evolution of music. We focus on the value of coupled empirical and formal methodologies, and on the futility of mysterianism, the declining view that the nature, origins and evolution of language cannot be addressed empirically. We identify key areas in which the evolution of language as a discipline has flourished historically, and suggest ways in which these advances can be integrated into the study of the evolution of music. PMID:29467601

  1. Innateness and culture in the evolution of language

    PubMed Central

    Kirby, Simon; Dowman, Mike; Griffiths, Thomas L.

    2007-01-01

    Human language arises from biological evolution, individual learning, and cultural transmission, but the interaction of these three processes has not been widely studied. We set out a formal framework for analyzing cultural transmission, which allows us to investigate how innate learning biases are related to universal properties of language. We show that cultural transmission can magnify weak biases into strong linguistic universals, undermining one of the arguments for strong innate constraints on language learning. As a consequence, the strength of innate biases can be shielded from natural selection, allowing these genes to drift. Furthermore, even when there is no natural selection, cultural transmission can produce apparent adaptations. Cultural transmission thus provides an alternative to traditional nativist and adaptationist explanations for the properties of human languages. PMID:17360393

  2. The Evolution of Musicality: What Can Be Learned from Language Evolution Research?

    PubMed

    Ravignani, Andrea; Thompson, Bill; Filippi, Piera

    2018-01-01

    Language and music share many commonalities, both as natural phenomena and as subjects of intellectual inquiry. Rather than exhaustively reviewing these connections, we focus on potential cross-pollination of methodological inquiries and attitudes. We highlight areas in which scholarship on the evolution of language may inform the evolution of music. We focus on the value of coupled empirical and formal methodologies, and on the futility of mysterianism, the declining view that the nature, origins and evolution of language cannot be addressed empirically. We identify key areas in which the evolution of language as a discipline has flourished historically, and suggest ways in which these advances can be integrated into the study of the evolution of music.

  3. Research and Development in Natural Language Understanding as Part of the Strategic Computing Program.

    DTIC Science & Technology

    1987-04-01

    BBN is developing a series of increasingly sophisticated natural language understanding systems which will serve as an integrated interface…

  4. Research in Progress: Invited Colloquium--Foreign Languages in an Age of Globalization

    ERIC Educational Resources Information Center

    Kramsch, Claire

    2013-01-01

    With the advent of globalization and the increasingly multilingual and multicultural nature of nations, institutions and classrooms, the fundamental nature of foreign language instruction is changing. Such traditional notions as: "native speaker", "target culture", "standard L2" are becoming problematic with the…

  5. Natural language information retrieval in digital libraries

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Strzalkowski, T.; Perez-Carballo, J.; Marinescu, M.

    In this paper we report on some recent developments in the joint NYU and GE natural language information retrieval system. The main characteristic of this system is the use of advanced natural language processing to enhance the effectiveness of term-based document retrieval. The system is designed around a traditional statistical backbone consisting of the indexer module, which builds inverted index files from pre-processed documents, and a retrieval engine which searches and ranks the documents in response to user queries. Natural language processing is used to (1) preprocess the documents in order to extract content-carrying terms, (2) discover inter-term dependencies and build a conceptual hierarchy specific to the database domain, and (3) process users' natural language requests into effective search queries. This system has been used in NIST-sponsored Text Retrieval Conferences (TREC), where we worked with approximately 3.3 GBytes of text articles including material from the Wall Street Journal, the Associated Press newswire, the Federal Register, Ziff Communications's Computer Library, Department of Energy abstracts, U.S. Patents and the San Jose Mercury News, totaling more than 500 million words of English. The system has been designed to facilitate its scalability to deal with ever increasing amounts of data. In particular, a randomized index-splitting mechanism has been installed which allows the system to create a number of smaller indexes that can be independently and efficiently searched.
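The statistical backbone described here, an indexer that builds inverted files and a retrieval engine that ranks documents against queries, can be sketched minimally. This toy version uses plain term frequencies and none of the NLP preprocessing the system adds:

```python
import re
from collections import defaultdict

def build_index(docs):
    """Inverted index: map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in re.findall(r"\w+", text.lower()):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

def search(index, query):
    """Rank documents by summed term frequency over the query terms."""
    scores = defaultdict(int)
    for term in re.findall(r"\w+", query.lower()):
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    return sorted(scores, key=scores.get, reverse=True)

docs = {1: "natural language retrieval",
        2: "statistical retrieval engine",
        3: "cooking recipes"}
index = build_index(docs)
ranking = search(index, "retrieval engine")
```

The paper's NLP stages slot in around exactly this skeleton: term extraction replaces the naive tokenizer, and query processing rewrites the raw query before it reaches the ranking step.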

  6. Linear separability in superordinate natural language concepts.

    PubMed

    Ruts, Wim; Storms, Gert; Hampton, James

    2004-01-01

    Two experiments are reported in which linear separability was investigated in superordinate natural language concept pairs (e.g., toiletry-sewing gear). Representations of the exemplars of semantically related concept pairs were derived in two to five dimensions using multidimensional scaling (MDS) of similarities based on possession of the concept features. Next, category membership, obtained from an exemplar generation study (in Experiment 1) and from a forced-choice classification task (in Experiment 2) was predicted from the coordinates of the MDS representation using log linear analysis. The results showed that all natural kind concept pairs were perfectly linearly separable, whereas artifact concept pairs showed several violations. Clear linear separability of natural language concept pairs is in line with independent cue models. The violations in the artifact pairs, however, yield clear evidence against the independent cue models.
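Whether two labelled exemplar sets in a low-dimensional MDS space are linearly separable can be checked directly with the perceptron algorithm, which finds a separating boundary whenever one exists, given enough passes. The 2-D coordinates below are invented, not the study's data:

```python
def perceptron_separable(points, labels, epochs=100):
    """Return True if a linear boundary (with bias) is found that
    classifies every 2-D point correctly; labels must be +1 or -1.
    For separable data the perceptron converges; for non-separable
    data it keeps making errors until the epoch budget runs out."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for (x, y), t in zip(points, labels):
            if t * (w[0] * x + w[1] * y + b) <= 0:  # misclassified
                w[0] += t * x
                w[1] += t * y
                b += t
                errors += 1
        if errors == 0:
            return True
    return False

# Two toy "concept" clusters in a 2-D MDS-like space.
points = [(0.0, 0.0), (0.1, 0.4), (1.0, 1.0), (0.9, 1.2)]
labels = [-1, -1, 1, 1]
separable = perceptron_separable(points, labels)
```

With a finite epoch budget a False answer is only evidence of non-separability; in low dimensions and with small exemplar sets like these, the budget is ample in practice.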

  7. New Directions: Communication Development in Persons with Severe Disabilities.

    ERIC Educational Resources Information Center

    Goetz, Lori; Sailor, Wayne

    1988-01-01

    To produce spontaneous and generalized language use by severely disabled individuals, the language training context and content must be examined. Training methods can better approximate the conditions of natural language use when they involve: generation of spontaneous language responses to effect real-world changes, single performance "trials,"…

  8. The ALICE System: A Workbench for Learning and Using Language.

    ERIC Educational Resources Information Center

    Levin, Lori; And Others

    1991-01-01

    ALICE, a multimedia framework for intelligent computer-assisted language instruction (ICALI) at Carnegie Mellon University (PA), consists of a set of tools for building a number of different types of ICALI programs in any language. Its Natural Language Processing tools for syntactic error detection, morphological analysis, and generation of…

  9. American Indian Language Proficiency Assessment; Considerations and Resources.

    ERIC Educational Resources Information Center

    Arizona State Dept. of Education, Phoenix.

    A primary concern affecting the more than 300 American Indian tribes and their educational institutions is the promotion, maintenance, and preservation of their approximately 200 native languages. The nature of language use must be documented and assessed to ascertain whether tribal members, particularly children, possess native language skills…

  10. An Instrument for Investigating Chinese Language Learning Environments in Singapore Secondary Schools

    ERIC Educational Resources Information Center

    Chua, Siew Lian; Wong, Angela F. L.; Chen, Der-Thanq

    2009-01-01

    This paper describes how a new classroom environment instrument, the "Chinese Language Classroom Environment Inventory (CLCEI)", was developed to investigate the nature of Chinese language classroom learning environments in Singapore secondary schools. The CLCEI is a bilingual instrument (English and Chinese Language) with 48 items…

  11. Cross-Language Information Retrieval: An Analysis of Errors.

    ERIC Educational Resources Information Center

    Ruiz, Miguel E.; Srinivasan, Padmini

    1998-01-01

    Investigates an automatic method for Cross Language Information Retrieval (CLIR) that utilizes the multilingual Unified Medical Language System (UMLS) Metathesaurus to translate Spanish natural-language queries into English. Results indicate that for Spanish, the UMLS Metathesaurus-based CLIR method is at least equivalent to if not better than…

  12. Merleau-Ponty's Phenomenology of Language and General Semantics.

    ERIC Educational Resources Information Center

    Lapointe, Francois H.

    A survey of Maurice Merleau-Ponty's views on the phenomenology of language yields insight into the basic semiotic nature of language. Merleau-ponty's conceptions stand in opposition to Saussure's linguistic postulations and Korzybski's scientism. That is, if language is studied phenomenologically, the acts of speech and gesture take on greater…

  13. El Espanol como Idioma Universal (Spanish as a Universal Language)

    ERIC Educational Resources Information Center

    Mijares, Jose

    1977-01-01

    A proposal to transform Spanish into a universal language because it possesses the prerequisites: it is a living language, spoken in several countries; it is a natural language; and it uses the ordinary alphabet. Details on simplification and standardization are given. (Text is in Spanish.) (AMH)

  14. Language Arts Program Guide, K-12.

    ERIC Educational Resources Information Center

    Hawaii State Dept. of Education, Honolulu. Office of Instructional Services.

    Intended for use by administrators, teachers, and district and state personnel, this guide provides a framework for Hawaii's kindergarten through grade 12 language arts program. Various sections of the guide contain (1) a statement of beliefs concerning the nature of language, language and learning, the student, and the school climate; (2) program…

  15. Mirror Neurons and the Evolution of Language

    ERIC Educational Resources Information Center

    Corballis, Michael C.

    2010-01-01

    The mirror system provided a natural platform for the subsequent evolution of language. In nonhuman primates, the system provides for the understanding of biological action, and possibly for imitation, both prerequisites for language. I argue that language evolved from manual gestures, initially as a system of pantomime, but with gestures…

  16. Clinical and Educational Perspectives on Language Intervention for Children with Autism.

    ERIC Educational Resources Information Center

    Kamhi, Alan G.; And Others

    The paper examines aspects of effective language intervention with autistic children. An overview is presented about the nature of language, its perception and comprehension, and the production of speech-language. Assessment strategies are considered. The second part of the paper analyzes traditional and communications-based intervention programs.…

  17. Teachers' and Students' Beliefs regarding Aspects of Language Learning

    ERIC Educational Resources Information Center

    Davis, Adrian

    2003-01-01

    The similarities and dissimilarities between teachers' and students' conceptions of language learning were addressed through a questionnaire survey concerning the nature and methods of language learning. The results indicate points of congruence between teachers' and students' beliefs about language learning in respect of eight main areas.…

  18. Rank and Sparsity in Language Processing

    ERIC Educational Resources Information Center

    Hutchinson, Brian

    2013-01-01

    Language modeling is one of many problems in language processing that have to grapple with naturally high ambient dimensions. Even in large datasets, the number of unseen sequences is overwhelmingly larger than the number of observed ones, posing clear challenges for estimation. Although existing methods for building smooth language models tend to…

  19. Informal Language Learning Setting: Technology or Social Interaction?

    ERIC Educational Resources Information Center

    Bahrani, Taher; Sim, Tam Shu

    2012-01-01

    Based on the informal language learning theory, language learning can occur outside the classroom setting unconsciously and incidentally through interaction with the native speakers or exposure to authentic language input through technology. However, an EFL context lacks the social interaction which naturally occurs in an ESL context. To explore…

  20. Discourses of prejudice in the professions: the case of sign languages

    PubMed Central

    Humphries, Tom; Kushalnagar, Poorna; Mathur, Gaurav; Napoli, Donna Jo; Padden, Carol; Rathmann, Christian; Smith, Scott

    2017-01-01

    There is no evidence that learning a natural human language is cognitively harmful to children. To the contrary, multilingualism has been argued to be beneficial to all. Nevertheless, many professionals advise the parents of deaf children that their children should not learn a sign language during their early years, despite strong evidence across many research disciplines that sign languages are natural human languages. Their recommendations are based on a combination of misperceptions about (1) the difficulty of learning a sign language, (2) the effects of bilingualism, and particularly bimodalism, (3) the bona fide status of languages that lack a written form, (4) the effects of a sign language on acquiring literacy, (5) the ability of technologies to address the needs of deaf children and (6) the effects that use of a sign language will have on family cohesion. We expose these misperceptions as based in prejudice and urge institutions involved in educating professionals concerned with the healthcare, raising and educating of deaf children to include appropriate information about first language acquisition and the importance of a sign language for deaf children. We further urge such professionals to advise the parents of deaf children properly, which means to strongly advise the introduction of a sign language as soon as hearing loss is detected. PMID:28280057

  1. A Look at Natural Language Retrieval Systems

    ERIC Educational Resources Information Center

    Townley, Helen M.

    1971-01-01

    Natural language systems are seen as falling into two classes - those which process and analyse the input and store it in an ordered fashion, and those which employ controls at the output stage. A variety of systems of both types is reviewed, and their respective features are discussed. (12 references) (Author/NH)

  2. Reconceptualizing the Nature of Goals and Outcomes in Language/s Education

    ERIC Educational Resources Information Center

    Leung, Constant; Scarino, Angela

    2016-01-01

    Transformations associated with the increasing speed, scale, and complexity of mobilities, together with the information technology revolution, have changed the demography of most countries of the world and brought about accompanying social, cultural, and economic shifts (Heugh, 2013). This complex diversity has changed the very nature of…

  3. Learning by Communicating in Natural Language with Conversational Agents

    ERIC Educational Resources Information Center

    Graesser, Arthur; Li, Haiying; Forsyth, Carol

    2014-01-01

    Learning is facilitated by conversational interactions both with human tutors and with computer agents that simulate human tutoring and ideal pedagogical strategies. In this article, we describe some intelligent tutoring systems (e.g., AutoTutor) in which agents interact with students in natural language while being sensitive to their cognitive…

  4. Combining Machine Learning and Natural Language Processing to Assess Literary Text Comprehension

    ERIC Educational Resources Information Center

    Balyan, Renu; McCarthy, Kathryn S.; McNamara, Danielle S.

    2017-01-01

    This study examined how machine learning and natural language processing (NLP) techniques can be leveraged to assess the interpretive behavior that is required for successful literary text comprehension. We compared the accuracy of seven different machine learning classification algorithms in predicting human ratings of student essays about…

  5. Natural Language Processing and Game-Based Practice in iSTART

    ERIC Educational Resources Information Center

    Jackson, G. Tanner; Boonthum-Denecke, Chutima; McNamara, Danielle S.

    2015-01-01

    Intelligent Tutoring Systems (ITSs) are situated in a potential struggle between effective pedagogy and system enjoyment and engagement. iSTART, a reading strategy tutoring system in which students practice generating self-explanations and using reading strategies, employs two devices to engage the user. The first is natural language processing…

  6. Linguistically Motivated Features for CCG Realization Ranking

    ERIC Educational Resources Information Center

    Rajkumar, Rajakrishnan

    2012-01-01

    Natural Language Generation (NLG) is the process of generating natural language text from an input, which is a communicative goal and a database or knowledge base. Informally, the architecture of a standard NLG system consists of the following modules (Reiter and Dale, 2000): content determination, sentence planning (or microplanning) and surface…

  7. Design of Lexicons in Some Natural Language Systems.

    ERIC Educational Resources Information Center

    Cercone, Nick; Mercer, Robert

    1980-01-01

    Discusses an investigation of certain problems concerning the structural design of lexicons used in computational approaches to natural language understanding. Emphasizes three aspects of design: retrieval of relevant portions of lexicals items, storage requirements, and representation of meaning in the lexicon. (Available from ALLC, Dr. Rex Last,…

  8. Leveraging Code Comments to Improve Software Reliability

    ERIC Educational Resources Information Center

    Tan, Lin

    2009-01-01

    Commenting source code has long been a common practice in software development. This thesis, consisting of three pieces of work, made novel use of the code comments written in natural language to improve software reliability. Our solution combines Natural Language Processing (NLP), Machine Learning, Statistics, and Program Analysis techniques to…

  9. On the Margins of Discourse: The Relation of Literature to Language.

    ERIC Educational Resources Information Center

    Smith, Barbara Herrnstein

    This centrally focused collection of articles and lectures examines literary interpretation and the relation of literature to language. The first of the book's three parts introduces the distinction between natural discourse and fictive discourse (verbal structures that function as representatives of natural utterances). It also deals with the…

  10. Dealing with Quantifier Scope Ambiguity in Natural Language Understanding

    ERIC Educational Resources Information Center

    Hafezi Manshadi, Mohammad

    2014-01-01

    Quantifier scope disambiguation (QSD) is one of the most challenging problems in deep natural language understanding (NLU) systems. The most popular approach for dealing with QSD is to simply leave the semantic representation (scope-) underspecified and to incrementally add constraints to filter out unwanted readings. Scope underspecification has…

  11. Verification Processes in Recognition Memory: The Role of Natural Language Mediators

    ERIC Educational Resources Information Center

    Marshall, Philip H.; Smith, Randolph A. S.

    1977-01-01

    The existence of verification processes in recognition memory was confirmed in the context of Adams' (Adams & Bray, 1970) closed-loop theory. Subjects' recognition was tested following a learning session. The expectation was that data would reveal consistent internal relationships supporting the position that natural language mediation plays…

  12. Two Interpretive Systems for Natural Language?

    ERIC Educational Resources Information Center

    Frazier, Lyn

    2015-01-01

    It is proposed that humans have available to them two systems for interpreting natural language. One system is familiar from formal semantics. It is a type based system that pairs a syntactic form with its interpretation using grammatical rules of composition. This system delivers both plausible and implausible meanings. The other proposed system…

  13. A Text Knowledge Base from the AI Handbook.

    ERIC Educational Resources Information Center

    Simmons, Robert F.

    1987-01-01

    Describes a prototype natural language text knowledge system (TKS) that was used to organize 50 pages of a handbook on artificial intelligence as an inferential knowledge base with natural language query and command capabilities. Representation of text, database navigation, query systems, discourse structuring, and future research needs are…

  14. First Toronto Conference on Database Users. Systems that Enhance User Performance.

    ERIC Educational Resources Information Center

    Doszkocs, Tamas E.; Toliver, David

    1987-01-01

    The first of two papers discusses natural language searching as a user performance enhancement tool, focusing on artificial intelligence applications for information retrieval and problems with natural language processing. The second presents a conceptual framework for further development and future design of front ends to online bibliographic…

  15. Incremental Bayesian Category Learning from Natural Language

    ERIC Educational Resources Information Center

    Frermann, Lea; Lapata, Mirella

    2016-01-01

    Models of category learning have been extensively studied in cognitive science and primarily tested on perceptual abstractions or artificial stimuli. In this paper, we focus on categories acquired from natural language stimuli, that is, words (e.g., "chair" is a member of the furniture category). We present a Bayesian model that, unlike…

  16. Net-centric ACT-R-Based Cognitive Architecture with DEVS Unified Process

    DTIC Science & Technology

    2011-04-01

    effort has been spent in analyzing various forms of requirement specifications, viz., state-based, Natural Language-based, UML-based, Rule-based, BPMN ...requirement specifications in one of the chosen formats such as BPMN, DoDAF, Natural Language Processing (NLP) based, UML-based, DSL, or simply

  17. NLPIR: A Theoretical Framework for Applying Natural Language Processing to Information Retrieval.

    ERIC Educational Resources Information Center

    Zhou, Lina; Zhang, Dongsong

    2003-01-01

    Proposes a theoretical framework called NLPIR that integrates natural language processing (NLP) into information retrieval (IR) based on the assumption that there exists representation distance between queries and documents. Discusses problems in traditional keyword-based IR, including relevance, and describes some existing NLP techniques.…

  18. Language and Interactional Discourse: Deconstructing the Talk-Generating Machinery in Natural Conversation

    ERIC Educational Resources Information Center

    Enyi, Amaechi Uneke

    2015-01-01

    The study entitled "Language and Interactional Discourse: Deconstructing the Talk-Generating Machinery in Natural Conversation" is an analysis of spontaneous and informal conversation. The study, carried out in the theoretical and methodological tradition of Ethnomethodology, was aimed at explicating how ordinary talk is organized and…

  19. Learning a Foreign Language: A New Path to Enhancement of Cognitive Functions

    ERIC Educational Resources Information Center

    Shoghi Javan, Sara; Ghonsooly, Behzad

    2018-01-01

    The complicated cognitive processes involved in natural (primary) bilingualism lead to significant cognitive development. Executive functions as a fundamental component of human cognition are deemed to be affected by language learning. To date, a large number of studies have investigated how natural (primary) bilingualism influences executive…

  20. Resolving anaphoras for the extraction of drug-drug interactions in pharmacological documents

    PubMed Central

    2010-01-01

    Background Drug-drug interactions are frequently reported in the growing body of biomedical literature. Information Extraction (IE) techniques have been devised as a useful instrument to manage this knowledge. Nevertheless, IE at the sentence level has a limited effect because of the frequent references to previous entities in the discourse, a phenomenon known as 'anaphora'. DrugNerAR, a drug anaphora resolution system, is presented to address the problem of co-referring expressions in pharmacological literature. This development is part of a larger study on automatic drug-drug interaction extraction. Methods The system uses a set of linguistic rules drawn from Centering Theory over the analysis provided by a biomedical syntactic parser. Semantic information provided by the Unified Medical Language System (UMLS) is also integrated in order to improve the recognition and resolution of nominal drug anaphors. In addition, a corpus has been developed in order to analyze the phenomena and evaluate the current approach. Each possible case of anaphoric expression was examined to determine the most effective way of resolving it. Results An F-score of 0.76 in anaphora resolution was achieved, significantly outperforming the baseline by almost 73%. This ad hoc baseline was developed to check the results, as there is no previous work on anaphora resolution in pharmacological documents. The results obtained resemble those found in related semantic domains. Conclusions The present approach shows very promising results on the challenge of accounting for anaphoric expressions in pharmacological texts. DrugNerAR obtains results similar to other approaches dealing with anaphora resolution in the biomedical domain, but, unlike these approaches, it focuses on documents describing drug interactions. Centering Theory has proved effective at selecting antecedents in anaphora resolution.
    A key component in the success of this framework is the analysis provided by the MMTx program and the DrugNer system, which together make it possible to handle the complexity of pharmacological language. It is expected that the positive results of the resolver will increase the performance of our future drug-drug interaction extraction system. PMID:20406499
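    The F-score this record reports combines precision and recall into a single number. As a reminder of the metric, here is a minimal Python sketch of the standard F-beta score; the precision and recall values in the example are illustrative only, not taken from the paper:

```python
def f_score(precision, recall, beta=1.0):
    # F-beta score: weights recall beta times as much as precision;
    # beta=1 gives the standard (balanced) F1 score.
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical precision/recall values, for illustration only.
print(round(f_score(0.8, 0.7), 3))
```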

  1. The Rhythm of Language: Fostering Oral and Listening Skills in Singapore Pre-School Children through an Integrated Music and Language Arts Program.

    ERIC Educational Resources Information Center

    Gan, Linda; Chong, Sylvia

    1998-01-01

    Examined the effectiveness of a year-long integrated language and music program (the Expressive Language and Music Project) to enhance Singaporean kindergartners' English oral-language competency. Found that the natural communicative setting and creative use of resources and activities based on the Orff and Kodaly approaches facilitated language…

  2. Natural Language Processing: A Tutorial. Revision

    DTIC Science & Technology

    1990-01-01

    English in word-for-word language translations. An oft-repeated (although fictional) anecdote illustrates the ... English by a language translation program, became: "The vodka is strong but the steak is rotten." The point made is that vast amounts of knowledge...are required for effective language translations. The initial goal for Language Translation was "fully-automatic high-quality translation" (FAHQT).

  3. "Seamlessly" Learning Chinese: Contextual Meaning Making and Vocabulary Growth in a Seamless Chinese as a Second Language Learning Environment

    ERIC Educational Resources Information Center

    Wong, Lung-Hsiang; King, Ronnel B.; Chai, Ching Sing; Liu, May

    2016-01-01

    Second language learners are typically hampered by the lack of a natural environment to use the target language for authentic communication purpose (as a means for "learning by applying"). Thus, we propose MyCLOUD, a mobile-assisted seamless language learning approach that aims to nurture a second language social network that bridges…

  4. Numeral-Incorporating Roots in Numeral Systems: A Comparative Analysis of Two Sign Languages

    ERIC Educational Resources Information Center

    Fuentes, Mariana; Massone, Maria Ignacia; Fernandez-Viader, Maria del Pilar; Makotrinsky, Alejandro; Pulgarin, Francisca

    2010-01-01

    Numeral-incorporating roots in the numeral systems of Argentine Sign Language (LSA) and Catalan Sign Language (LSC), as well as the main features of the number systems of both languages, are described and compared. Informants discussed the use of numerals and roots in both languages (in most cases in natural contexts). Ten informants took part in…

  5. The State-of-the-Art in Natural Language Understanding.

    DTIC Science & Technology

    1981-01-28

    driven text analysis. If we know a story is about a restaurant, we expect that we may encounter a waitress, menu, table, a bill, food, and other... Front ends for Data Bases During the 70's a number of natural language data base front ends appeared: LUNAR [Woods et al 1972] has already been briefly... the handling of novel language, especially metaphor; and... understanding systems: the handling of

  6. A study of the very high order natural user language (with AI capabilities) for the NASA space station common module

    NASA Technical Reports Server (NTRS)

    Gill, E. N.

    1986-01-01

    The requirements are identified for a very high order natural language to be used by crew members on board the Space Station. The hardware facilities, databases, realtime processes, and software support are discussed. The operations and capabilities that will be required in both normal (routine) and abnormal (nonroutine) situations are evaluated. A structure and syntax for an interface (front-end) language to satisfy the above requirements are recommended.

  7. Artificial intelligence, expert systems, computer vision, and natural language processing

    NASA Technical Reports Server (NTRS)

    Gevarter, W. B.

    1984-01-01

    An overview of artificial intelligence (AI), its core ingredients, and its applications is presented. The knowledge representation, logic, problem solving approaches, languages, and computers pertaining to AI are examined, and the state of the art in AI is reviewed. The use of AI in expert systems, computer vision, natural language processing, speech recognition and understanding, speech synthesis, problem solving, and planning is examined. Basic AI topics, including automation, search-oriented problem solving, knowledge representation, and computational logic, are discussed.

  8. A Review of Language: The Cultural Tool by Daniel L. Everett

    PubMed Central

    Weitzman, Raymond S.

    2013-01-01

    Language: The Cultural Tool by Daniel Everett covers a broad spectrum of issues concerning the nature of language from the perspective of an anthropological linguist who has had considerable fieldwork experience studying the language and culture of the Pirahã, an indigenous Amazonian tribe in Brazil, as well as a number of other indigenous languages and cultures. This review focuses mainly on the key elements of his approach to language: language as a solution to the communication problem; Everett's conception of language; what makes language possible; how language and culture influence each other.

  9. Visual statistical learning is related to natural language ability in adults: An ERP study.

    PubMed

    Daltrozzo, Jerome; Emerson, Samantha N; Deocampo, Joanne; Singh, Sonia; Freggens, Marjorie; Branum-Martin, Lee; Conway, Christopher M

    2017-03-01

    Statistical learning (SL) is believed to enable language acquisition by allowing individuals to learn regularities within linguistic input. However, neural evidence supporting a direct relationship between SL and language ability is scarce. We investigated whether there are associations between event-related potential (ERP) correlates of SL and language abilities while controlling for the general level of selective attention. Seventeen adults completed tests of visual SL, receptive vocabulary, grammatical ability, and sentence completion. Response times and ERPs showed that SL is related to receptive vocabulary and grammatical ability. ERPs indicated that the relationship between SL and grammatical ability was independent of attention while the association between SL and receptive vocabulary depended on attention. The implications of these dissociative relationships in terms of underlying mechanisms of SL and language are discussed. These results further elucidate the cognitive nature of the links between SL mechanisms and language abilities. Copyright © 2017 Elsevier Inc. All rights reserved.

  10. Visual statistical learning is related to natural language ability in adults: An ERP Study

    PubMed Central

    Daltrozzo, Jerome; Emerson, Samantha N.; Deocampo, Joanne; Singh, Sonia; Freggens, Marjorie; Branum-Martin, Lee; Conway, Christopher M.

    2017-01-01

    Statistical learning (SL) is believed to enable language acquisition by allowing individuals to learn regularities within linguistic input. However, neural evidence supporting a direct relationship between SL and language ability is scarce. We investigated whether there are associations between event-related potential (ERP) correlates of SL and language abilities while controlling for the general level of selective attention. Seventeen adults completed tests of visual SL, receptive vocabulary, grammatical ability, and sentence completion. Response times and ERPs showed that SL is related to receptive vocabulary and grammatical ability. ERPs indicated that the relationship between SL and grammatical ability was independent of attention while the association between SL and receptive vocabulary depended on attention. The implications of these dissociative relationships in terms of underlying mechanisms of SL and language are discussed. These results further elucidate the cognitive nature of the links between SL mechanisms and language abilities. PMID:28086142

  11. Natural language processing pipelines to annotate BioC collections with an application to the NCBI disease corpus.

    PubMed

    Comeau, Donald C; Liu, Haibin; Islamaj Doğan, Rezarta; Wilbur, W John

    2014-01-01

    BioC is a new format and associated code libraries for sharing text and annotations. We have implemented BioC natural language preprocessing pipelines in two popular programming languages: C++ and Java. The current implementations interface with the well-known MedPost and Stanford natural language processing tool sets. The pipeline functionality includes sentence segmentation, tokenization, part-of-speech tagging, lemmatization and sentence parsing. These pipelines can be easily integrated along with other BioC programs into any BioC compliant text mining systems. As an application, we converted the NCBI disease corpus to BioC format, and the pipelines have successfully run on this corpus to demonstrate their functionality. Code and data can be downloaded from http://bioc.sourceforge.net. Database URL: http://bioc.sourceforge.net. © The Author(s) 2014. Published by Oxford University Press.
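    The pipeline stages this record lists (sentence segmentation, tokenization, tagging, parsing) can be illustrated with a self-contained sketch. The following Python is not the BioC, MedPost, or Stanford API; it is just a naive regex version of the first two stages, and real biomedical pipelines need far more careful handling of abbreviations and symbols:

```python
import re

def split_sentences(text):
    # Naive sentence segmentation: split on whitespace that follows
    # ., ! or ? and precedes an uppercase letter. Biomedical text
    # (e.g. "E. coli") defeats this rule; real tools use trained models.
    return re.split(r'(?<=[.!?])\s+(?=[A-Z])', text.strip())

def tokenize(sentence):
    # Split a sentence into word tokens (allowing internal hyphens
    # and apostrophes) and single punctuation tokens.
    return re.findall(r"\w+(?:[-']\w+)*|[^\w\s]", sentence)

text = "BioC is a new format. It has code libraries for sharing annotations."
for sentence in split_sentences(text):
    print(tokenize(sentence))
```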

  12. Culture and biology in the origins of linguistic structure.

    PubMed

    Kirby, Simon

    2017-02-01

    Language is systematically structured at all levels of description, arguably setting it apart from all other instances of communication in nature. In this article, I survey work over the last 20 years that emphasises the contributions of individual learning, cultural transmission, and biological evolution to explaining the structural design features of language. These 3 complex adaptive systems exist in a network of interactions: individual learning biases shape the dynamics of cultural evolution; universal features of linguistic structure arise from this cultural process and form the ultimate linguistic phenotype; the nature of this phenotype affects the fitness landscape for the biological evolution of the language faculty; and in turn this determines individuals' learning bias. Using a combination of computational simulation, laboratory experiments, and comparison with real-world cases of language emergence, I show that linguistic structure emerges as a natural outcome of cultural evolution once certain minimal biological requirements are in place.

  13. On the relation between dependency distance, crossing dependencies, and parsing. Comment on "Dependency distance: a new perspective on syntactic patterns in natural languages" by Haitao Liu et al.

    NASA Astrophysics Data System (ADS)

    Gómez-Rodríguez, Carlos

    2017-07-01

    Liu et al. [1] provide a comprehensive account of research on dependency distance in human languages. While the article is a very rich and useful report on this complex subject, here I will expand on a few specific issues where research in computational linguistics (specifically natural language processing) can inform DDM research, and vice versa. These aspects have not been explored much in [1] or elsewhere, probably due to the little overlap between both research communities, but they may provide interesting insights for improving our understanding of the evolution of human languages, the mechanisms by which the brain processes and understands language, and the construction of effective computer systems to achieve this goal.

  14. Automated database design from natural language input

    NASA Technical Reports Server (NTRS)

    Gomez, Fernando; Segami, Carlos; Delaune, Carl

    1995-01-01

    Users and programmers of small systems typically do not have the skills needed to design a database schema from an English description of a problem. This paper describes a system that automatically designs databases for such small applications from English descriptions provided by end-users. Although the system has been motivated by the space applications at Kennedy Space Center, and portions of it have been designed with that idea in mind, it can be applied to different situations. The system consists of two major components: a natural language understander and a problem-solver. The paper describes briefly the knowledge representation structures constructed by the natural language understander, and, then, explains the problem-solver in detail.

  15. SWAN: An expert system with natural language interface for tactical air capability assessment

    NASA Technical Reports Server (NTRS)

    Simmons, Robert M.

    1987-01-01

    SWAN is an expert system and natural language interface for assessing the war fighting capability of Air Force units in Europe. The expert system is an object oriented knowledge based simulation with an alternate worlds facility for performing what-if excursions. Responses from the system take the form of generated text, tables, or graphs. The natural language interface is an expert system in its own right, with a knowledge base and rules which understand how to access external databases, models, or expert systems. The distinguishing feature of the Air Force expert system is its use of meta-knowledge to generate explanations in the frame and procedure based environment.

  16. pymzML--Python module for high-throughput bioinformatics on mass spectrometry data.

    PubMed

    Bald, Till; Barth, Johannes; Niehues, Anna; Specht, Michael; Hippler, Michael; Fufezan, Christian

    2012-04-01

    pymzML is an extension to Python that offers (i) easy access to mass spectrometry (MS) data that allows the rapid development of tools, (ii) a very fast parser for mzML data, the standard data format in MS, and (iii) a set of functions to compare or handle spectra. pymzML requires Python 2.6.5+ and is fully compatible with Python 3. The module is freely available on http://pymzml.github.com or PyPI, is published under the LGPL license and requires no additional modules to be installed. christian@fufezan.net.
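    As an illustration of the kind of spectrum comparison this record mentions, here is a minimal, self-contained Python sketch of cosine similarity between two centroided spectra. The {m/z: intensity} dictionaries and exact-key peak matching are simplifying assumptions for the sketch, not pymzML's actual API; a real tool matches peaks within an m/z tolerance:

```python
import math

def spectrum_cosine(a, b):
    # Cosine similarity between two centroided spectra given as
    # {m/z: intensity} dicts. Peaks are matched on exact m/z keys,
    # which is a simplification of tolerance-based matching.
    shared = set(a) & set(b)
    dot = sum(a[mz] * b[mz] for mz in shared)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Two toy spectra sharing two of their peaks.
s1 = {100.0: 50.0, 200.1: 80.0, 300.2: 20.0}
s2 = {100.0: 55.0, 200.1: 75.0, 400.3: 10.0}
print(round(spectrum_cosine(s1, s2), 3))
```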

  17. KEGGParser: parsing and editing KEGG pathway maps in Matlab.

    PubMed

    Arakelyan, Arsen; Nersisyan, Lilit

    2013-02-15

    KEGG pathway database is a collection of manually drawn pathway maps accompanied by KGML format files intended for use in automatic analysis. KGML files, however, do not contain the information required for complete reproduction of all the events indicated in the static image of a pathway map. Several parsers and editors of KEGG pathways exist for processing KGML files. We introduce KEGGParser, a MATLAB-based tool for KEGG pathway parsing, semiautomatic fixing, editing, visualization and analysis in the MATLAB environment. It also works with Scilab. The source code is available at http://www.mathworks.com/matlabcentral/fileexchange/37561.
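    KGML files are XML, so the parsing step this record describes can be sketched in a few lines (in Python here; KEGGParser itself is MATLAB). The pathway fragment below is a toy example following the KGML schema's entry/relation elements, not a real KEGG file:

```python
import xml.etree.ElementTree as ET

# Toy fragment in the shape of KGML: <entry> nodes name genes,
# <relation> nodes link entries, <subtype> names the interaction.
KGML = """<pathway name="path:hsa04010" title="MAPK signaling pathway">
  <entry id="1" name="hsa:5594" type="gene"/>
  <entry id="2" name="hsa:5595" type="gene"/>
  <relation entry1="1" entry2="2" type="PPrel">
    <subtype name="activation" value="--&gt;"/>
  </relation>
</pathway>"""

root = ET.fromstring(KGML)
# Map entry ids to their KEGG names.
entries = {e.get("id"): e.get("name") for e in root.iter("entry")}
# Collect relations as (source id, target id, [subtype names]) edges.
edges = [(r.get("entry1"), r.get("entry2"),
          [s.get("name") for s in r.iter("subtype")])
         for r in root.iter("relation")]
print(entries)
print(edges)
```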

  18. All Together Now: Concurrent Learning of Multiple Structures in an Artificial Language

    ERIC Educational Resources Information Center

    Romberg, Alexa R.; Saffran, Jenny R.

    2013-01-01

    Natural languages contain many layers of sequential structure, from the distribution of phonemes within words to the distribution of phrases within utterances. However, most research modeling language acquisition using artificial languages has focused on only one type of distributional structure at a time. In two experiments, we investigated adult…

  19. Implicit Language Learning: Adults' Ability to Segment Words in Norwegian

    ERIC Educational Resources Information Center

    Kittleson, Megan M.; Aguilar, Jessica M.; Tokerud, Gry Line; Plante, Elena; Asbjornsen, Arve E.

    2010-01-01

    Previous language learning research reveals that the statistical properties of the input offer sufficient information to allow listeners to segment words from fluent speech in an artificial language. The current pair of studies uses a natural language to test the ecological validity of these findings and to determine whether a listener's language…

  20. Assessment of Academic Literacy Skills: Preparing Minority and LEP (Limited English Proficient) Students for Postsecondary Education.

    ERIC Educational Resources Information Center

    Kuehn, Phyllis

    Addressing the problem of the language-related barriers to successful postsecondary education for underprepared college students, an assessment of academic language proficiency and a curriculum to help students improve their academic language skills were developed. The nature of the language tasks required in the undergraduate curriculum was…
